, which is one of the primary uses of web
proxies. Caching is very important as a way of speeding up transfers
and reducing the amount of data transferred across crowded links.
Once cache servers are set up, the next logical step is to use
multiple cache servers and have them coordinate operations. Several
coordination protocols are under active development, and it's not at
all clear which one is going to win out in the long run.
15.5.1. Internet Cache Protocol (ICP)
ICP is the oldest of the cache
management protocols in current use and is supported by the largest
number of caches, including Netscape Proxy, Harvest, and Squid. The
principle behind ICP is that cache servers operate independently, but
when a cache server gets a request for a document that it does not
have cached, it asks other cache servers for the document, and
retrieves the document from its source only if no other cache server
has the document. ICP has a number of drawbacks: it requires a
considerable amount of communication between caches, it slows down
document retrieval, it provides no security or authentication, and it
searches the cache based only on URL, not on document header
information, which may cause it to return incorrect document
versions. On the other hand, it has the noticeable advantage of being
both standardized (it is documented in IETF RFCs 2186 and 2187) and
in widespread use.
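The lookup order described above (local cache, then peer caches, then the original source) can be sketched as a small simulation. This is only an illustration: the `Cache` class, its `icp_query` and `fetch` methods, and the in-memory `origin` dictionary are hypothetical names, and no real ICP or HTTP traffic is generated.

```python
from typing import Dict, List, Optional


class Cache:
    """A toy cache server keyed only by URL, as ICP lookups are."""

    def __init__(self, name: str, peers: Optional[List["Cache"]] = None):
        self.name = name
        self.store: Dict[str, bytes] = {}
        self.peers = peers or []

    def icp_query(self, url: str) -> bool:
        # Stand-in for an ICP query: the peer answers hit or miss by URL alone.
        return url in self.store

    def fetch(self, url: str, origin: Dict[str, bytes]) -> str:
        """Return where the document came from: 'local', a peer name, or 'origin'."""
        if url in self.store:
            return "local"
        for peer in self.peers:
            if peer.icp_query(url):
                # A real cache would now retrieve the document from the peer over HTTP.
                self.store[url] = peer.store[url]
                return peer.name
        # No peer has the document, so go to the original source.
        self.store[url] = origin[url]
        return "origin"
```

Because the lookup key is the URL alone, a peer holding a stale or differently negotiated copy would still report a hit, which is the incorrect-version problem mentioned above.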
15.5.1.1. Packet filtering characteristics of ICP
ICP normally uses UDP; the port number is configurable but defaults
to 3130. ICP can also be run over TCP, again at any port. Caches
exchange documents via HTTP; the port used for HTTP is also
configurable, but it defaults to 3128.
Direction | Source Addr. | Dest. Addr. | Protocol | Source Port | Dest. Port | ACK Set | Notes |
In | Ext | Int | UDP | >1023 | 3130[44] | [45] | ICP request or response, external cache to internal cache |
Out | Int | Ext | UDP | 3130[44] | >1023 | [45] | ICP request or response, internal cache to external cache |
In | Ext | Int | TCP | >1023 | 3128[46] | [47] | HTTP request, external cache to internal cache |
Out | Int | Ext | TCP | 3128[46] | >1023 | Yes | HTTP response, internal cache to external cache |
Out | Int | Ext | TCP | >1023 | 3128[46] | [47] | HTTP request, internal cache to external cache |
In | Ext | Int | TCP | 3128[46] | >1023 | Yes | HTTP response, external cache to internal cache |
[44] 3130 is the standard port number for ICP, but
some servers run on different port numbers.
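The rules in the table above could be encoded for a simple first-match check along the following lines. This is a sketch under stated assumptions, not a real filtering engine: the `Rule` tuple, `port_matches`, and `allowed` are hypothetical names, the default ports 3130 and 3128 are assumed, and the ACK entries that carry only a footnote marker in the table are treated here as "not checked".

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    direction: str       # "in" or "out"
    protocol: str        # "udp" or "tcp"
    src_port: str        # "high" means >1023; otherwise an exact port as text
    dst_port: str
    ack: Optional[bool]  # True = ACK must be set; None = not checked (assumption)


ICP_PORT = "3130"   # default ICP port; configurable
HTTP_PORT = "3128"  # default port for HTTP between caches; configurable

RULES = [
    Rule("in", "udp", "high", ICP_PORT, None),    # ICP, external to internal
    Rule("out", "udp", ICP_PORT, "high", None),   # ICP, internal to external
    Rule("in", "tcp", "high", HTTP_PORT, None),   # HTTP request, external to internal
    Rule("out", "tcp", HTTP_PORT, "high", True),  # HTTP response, internal to external
    Rule("out", "tcp", "high", HTTP_PORT, None),  # HTTP request, internal to external
    Rule("in", "tcp", HTTP_PORT, "high", True),   # HTTP response, external to internal
]


def port_matches(spec: str, port: int) -> bool:
    """Match a port against either the ">1023" class or an exact number."""
    return port > 1023 if spec == "high" else port == int(spec)


def allowed(direction: str, protocol: str, src_port: int,
            dst_port: int, ack: bool = False) -> bool:
    """Return True if any rule permits the packet; everything else is denied."""
    for r in RULES:
        if (r.direction == direction and r.protocol == protocol
                and port_matches(r.src_port, src_port)
                and port_matches(r.dst_port, dst_port)
                and (r.ack is None or r.ack == ack)):
            return True
    return False
```

For example, `allowed("out", "tcp", 3128, 40000, ack=False)` is rejected, because traffic leaving from the cache's HTTP port is permitted only as a response with ACK set.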
15.5.1.2. Proxying characteristics of ICP
ICP, like SMTP and NNTP, is a self-proxying protocol, one that allows
for queries to be passed from server to server. In general, if you
are configuring ICP in a firewall environment, you will use this
facility and set all internal cache servers to peer with a cache
server that's part of the firewall and serves as a proxy.
Since ICP can also be run over TCP, it would be possible to
proxy it through a proxy system like SOCKS; the only
difficulty is that you would end up with a one-way relationship,
since the external cache would not be able to send queries to the
internal cache. This would slow down performance without providing
any more security than doing self-proxying, and no current
implementations support it.
15.5.1.3. Network address translation characteristics of ICP
ICP does contain embedded IP addresses, but they aren't
actually used for anything. It will work without problems through
network address translation systems, as long as you configure a
static translation (to allow for requests from other peers) and
don't mind the fact that the internal address will be visible
to anybody watching traffic.
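To make the embedded addresses concrete, here is a minimal sketch that builds an ICP query and reads the two address fields back out. It assumes the ICP version 2 layout as described in RFC 2186 (a 20-byte header ending in a sender host address, followed for queries by a requester host address and a null-terminated URL); the function names and the sample addresses are illustrative only.

```python
import socket
import struct

ICP_OP_QUERY = 1  # opcode for a query
ICP_VERSION = 2

# opcode, version, message length, request number, options,
# option data, sender host address: 20 bytes in network byte order
HEADER = struct.Struct("!BBHIII4s")


def build_icp_query(url: str, request_number: int,
                    sender_ip: str = "0.0.0.0",
                    requester_ip: str = "0.0.0.0") -> bytes:
    """Build an ICP query datagram for the given URL."""
    payload = socket.inet_aton(requester_ip) + url.encode("ascii") + b"\x00"
    length = HEADER.size + len(payload)
    header = HEADER.pack(ICP_OP_QUERY, ICP_VERSION, length,
                         request_number, 0, 0, socket.inet_aton(sender_ip))
    return header + payload


def embedded_addresses(packet: bytes):
    """Return the (sender, requester) addresses carried inside a query."""
    sender = socket.inet_ntoa(packet[16:20])
    requester = socket.inet_ntoa(packet[20:24])
    return sender, requester
```

A network address translation system rewrites only the IP header, so whatever internal address a cache places in these fields travels unchanged to the peer and to anyone watching the traffic.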
15.5.2. Cache Array Routing Protocol (CARP)
CARP uses a completely
different approach. Rather than having caches communicate with each
other, CARP does load balancing between multiple cache servers by
having a client or a proxy server use different caches for different
requests, depending on the URL being requested and published
information about the cache server. The information about available
cache servers is distributed through HTTP, so CARP adds no extra
protocol complexity. For both packet filtering and proxying, CARP is
identical to other uses of HTTP. However, CARP does have difficulties
with network address translation, since the documents it uses are
guaranteed to have IP addresses in them (the addresses of the cache
servers). Netscape and Microsoft both support CARP as well as ICP.
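The idea of choosing a cache from the URL and the published list of cache servers can be sketched as follows. Note that this is not the hash function that CARP itself specifies; it substitutes SHA-256 in a highest-score-wins scheme purely to illustrate the approach, and the `score` and `pick_cache` names and the cache host names are hypothetical.

```python
import hashlib
from typing import List


def score(url: str, cache: str) -> int:
    """Combine the cache name and the URL into a deterministic score."""
    digest = hashlib.sha256((cache + "|" + url).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_cache(url: str, caches: List[str]) -> str:
    """Every client computes the same winner for the same URL and cache list."""
    return max(caches, key=lambda c: score(url, c))


caches = ["cache-a.example", "cache-b.example", "cache-c.example"]
chosen = pick_cache("http://www.example.com/index.html", caches)
```

Because each client computes the choice independently from the same published list, the caches never need to query each other, and removing one cache changes the assignment only for the URLs that cache was handling.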
15.5.3. Web Cache Coordination Protocol (WCCP)
WCCP is a protocol
developed by Cisco, which takes a third, completely different
approach. In order to use WCCP, you need a router that is placed so
that it can intercept all HTTP traffic that should be handled by your
cache servers. The router will detect any packet addressed to TCP
port 80 at any destination and redirect the packet to a cache server.
The cache server then replies directly to the requestor as if the
request had been received normally. WCCP is used for communication
between the router and the cache servers, so that the router knows
what cache servers are currently running, what load each one is
running under, and which URLs should be directed to which servers,
and can appropriately balance traffic.
15.5.3.1. Packet filtering characteristics of WCCP
WCCP uses UDP at port 2048. In addition, routers that use WCCP
redirect HTTP traffic to cache servers by encapsulating it in GRE
packets (GRE is a form of IP over IP, discussed in
Chapter 4, "Packets and Protocols"). WCCP uses GRE protocol type 883E (hexadecimal).
Note that neither UDP nor GRE uses ACK bits.
Direction | Source Addr. | Dest. Addr. | Protocol | Source Port | Dest. Port | Notes |
In | Ext | Int | UDP | [48] | 2048 | WCCP update, external participant to internal participant |
Out | Int | Ext | UDP | [48] | 2048 | WCCP update, internal participant to external participant |
In | Ext | Int | GRE | [49] | [49] | HTTP query redirected by external router to internal cache server |
Out | Int | Ext | GRE | [49] | [49] | HTTP query redirected by internal router to external cache server |
[48] The WCCP protocol does not define a source port; it
is likely to be 2048.
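The GRE encapsulation used for redirected traffic can be illustrated with a minimal sketch. It assumes only the basic four-byte GRE header (16 bits of flags and version, all zero here, followed by the 16-bit protocol type) with no optional fields; actual WCCP implementations may carry additional data, and the function names are hypothetical.

```python
import struct

WCCP_GRE_PROTOCOL_TYPE = 0x883E  # protocol type given in the text


def gre_encapsulate(inner_packet: bytes) -> bytes:
    """Prefix a redirected packet with a basic GRE header."""
    # Flags/version word is zero: no checksum, key, or sequence number.
    return struct.pack("!HH", 0, WCCP_GRE_PROTOCOL_TYPE) + inner_packet


def gre_protocol_type(gre_packet: bytes) -> int:
    """Read the protocol type, which is what a filter would match on."""
    return struct.unpack("!H", gre_packet[2:4])[0]
```

Since the header has no port fields, a packet filter can distinguish this traffic only by IP protocol and, if the filter supports it, by the GRE protocol type.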
15.5.3.2. Proxying characteristics of WCCP
Because WCCP uses both UDP and GRE, it is going to be difficult to
proxy. Although UDP proxies have become relatively common, GRE is
still unknown territory for proxy servers.
15.5.3.3. Network address translation characteristics of WCCP
WCCP communications include embedded IP addresses and will not work
through network address translation. The architecture of WCCP assumes
that your router and your cache servers are near each other (in
network terms) in any case.