<!doctype linuxdoc system>
<article>
<titlepag>
<TITLE>SQUID Frequently Asked Questions</TITLE>
<author>&copy; 2001 Duane Wessels, <tt/wessels@squid-cache.org/
<abstract>
Frequently Asked Questions (with answers!) about the Squid Internet
Object Cache software.
</abstract>
</titlepag>
<toc>
<p>
You can download the FAQ as
<url url="FAQ.ps.gz" name="compressed PostScript">, and
<url url="FAQ.txt" name="plain text">.
</p>
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>About Squid, this FAQ, and other Squid information resources
<sect1>What is Squid?
<P>
Squid is a high-performance proxy caching server for web clients,
supporting FTP, gopher, and HTTP data objects. Unlike traditional
caching software, Squid handles all requests in a single,
non-blocking, I/O-driven process.
Squid keeps
meta data and especially hot objects cached in RAM, caches
DNS lookups, supports non-blocking DNS lookups, and implements
negative caching of failed requests.
Squid supports SSL, extensive
access controls, and full request logging. By using the
lightweight Internet Cache Protocol, Squid caches can be arranged
in a hierarchy or mesh for additional bandwidth savings.
<P>
Squid consists of a main server program <em/squid/, a Domain Name System
lookup program <em/dnsserver/, some optional programs for rewriting
requests and performing authentication, and some management and client
tools. When <em/squid/ starts up, it spawns a configurable number of
<em/dnsserver/ processes, each of which can perform a single, blocking
Domain Name System (DNS) lookup. This reduces the amount of time the
cache waits for DNS lookups.
<P>
Squid is derived from the ARPA-funded
<url url="http://harvest.cs.colorado.edu/"
name="Harvest project">.
<sect1>What is Internet object caching?
<P>
Internet object caching is a way to store requested Internet objects
(i.e., data available via the HTTP, FTP, and gopher protocols) on a
system closer to the requesting site than to the source. Web browsers
can then use the local Squid cache as a proxy HTTP server, reducing
access time as well as bandwidth consumption.
<sect1>Why is it called Squid?
<P>
Harris' Lament says, ``All the good ones are taken.''
<P>
We needed to distinguish this new version from the Harvest
cache software. Squid was the code name for initial
development, and it stuck.
<sect1>What is the latest version of Squid?
<P>
Squid is updated often; please see
<url url="http://www.squid-cache.org/"
name="the Squid home page">
for the most recent versions.
<sect1>Who is responsible for Squid?
<P>
Squid is the result of efforts by numerous individuals from
the Internet community.
<url url="mailto:wessels@ircache.net"
name="Duane Wessels">
of the National Laboratory for Applied Network Research (funded by
the National Science Foundation) leads code development.
Please see
<url url="http://www.squid-cache.org/CONTRIBUTORS"
name="the CONTRIBUTORS file">
for a list of our excellent contributors.
<sect1>Where can I get Squid?
<P>
You can download Squid via FTP from
<url url="ftp://ftp.squid-cache.org/pub/"
name="the primary FTP site">
or one of the many worldwide
<url url="http://www.squid-cache.org/mirrors.html"
name="mirror sites">.
<P>
Many sushi bars also have Squid.
<sect1>What Operating Systems does Squid support?
<P>
The software is designed to operate on any modern Unix system, and
is known to work on at least the following platforms:
<itemize>
<item> Linux
<item> FreeBSD
<item> NetBSD
<item> BSDI
<item> OSF and Digital Unix
<item> IRIX
<item> SunOS/Solaris
<item> NeXTStep
<item> SCO Unix
<item> AIX
<item> HP-UX
<item> <ref id="building-os2" name="OS/2">
</itemize>
<P>
For more specific information, please see
<url url="http://www.squid-cache.org/platforms.html" name="platforms.html">.
If you encounter any platform-specific problems, please
let us know by sending email to
<url url="mailto:squid-bugs@ircache.net"
name="squid-bugs">.
<sect1>Does Squid run on Windows NT?
<P>
Recent versions of Squid will <em/compile and run/ on Windows NT
with the
<url url="http://www.cygnus.com/misc/gnu-win32/"
name="GNU-Win32 package">.
<p>
<url url="http://www.logisense.com/" name="LogiSense">
has ported Squid to Windows NT and sells a supported
version. You can also download the source from
<url url="ftp://ftp.logisense.com/pub/cachexpress/" name="their FTP site">.
Thanks to LogiSense for making the code available as required by the GPL terms.
<p>
<url url="mailto: robert dot collins at itdomain dot com dot au" name="Robert Collins">
is working on a Windows NT port as well. You can find more information from him
at <url url="http://www.ideal.net.au/~collinsdial/Squid2.4.htm" name="his page">.
<sect1>What Squid mailing lists are available?
<P>
<itemize>
<item> squid-users@ircache.net: general discussions about the
Squid cache software. Subscribe via
<it/squid-users-request@ircache.net/.
Previous messages are available for browsing at
<url url="http://www.squid-cache.org/mail-archive/squid-users/"
name="the Squid Users Archive">,
and also at <url url="http://marc.theaimsgroup.com/?l=squid-users&amp;r=1&amp;w=2" name="theaimsgroup.com">.
<item>
squid-users-digest: digested (daily) version of
above. Subscribe via
<it/squid-users-digest-request@ircache.net/.
<item>
squid-announce@ircache.net: A receive-only list for
announcements of new versions.
Subscribe via
<it/squid-announce-request@ircache.net/.
<item>
<it/squid-bugs@ircache.net/:
A closed list for sending us bug reports.
Bug reports received here are given priority over
those mentioned on squid-users.
<item>
<it/squid@ircache.net/:
A closed list for sending us feed-back and ideas.
<item>
<it/squid-faq@ircache.net/:
A closed list for sending us feed-back, updates, and additions to
the Squid FAQ.
</itemize>
<P>
We also have a few other mailing lists which are not strictly
Squid-related.
<itemize>
<item>
<it/cache-snmp@ircache.net/:
A public list for discussion of Web Caching and SNMP issues and developments.
Eventually we hope to put forth a standard Web Caching MIB.
<item>
<it/icp-wg@ircache.net/:
Mostly-idle mailing list for the nonexistent ICP Working Group within
the IETF. It may be resurrected some day, you never know!
</itemize>
<sect1>I can't figure out how to unsubscribe from your mailing list.
<P>
All of our mailing lists have ``-request'' addresses that you must
use for subscribe and unsubscribe requests. To unsubscribe from
the squid-users list, you send a message to <em/squid-users-request@ircache.net/
and in the subject and/or body of your message, you put the magic word
``unsubscribe.''
<sect1>What Squid web pages are available?
<P>
Several Squid and Caching-related web pages are available:
<itemize>
<item>
<url url="http://www.squid-cache.org/" name="The Squid home page">
for information on the Squid software
<item>
<url url="http://www.ircache.net/Cache/" name="The IRCache Mesh">
gives information on our operational mesh of caches.
<item>
<url url="http://www.squid-cache.org/Doc/FAQ/" name="The Squid FAQ"> (uh, you're reading it).
<item>
<url url="http://cache.is.co.za" name="Oskar's Squid Users Guide">.
<item>
<url url="http://www.ircache.net/Cache/FAQ/" name="The Information Resource Caching FAQ">
<item>
<url url="http://www.squid-cache.org/Doc/Prog-Guide/prog-guide.html" name="Squid Programmers Guide">.
Yeah, it's extremely incomplete. I assure you this is the most recent version.
<item>
<url url="http://www.ircache.net/Cache/reading.html" name="Web Caching Reading list">
<item><url url="/Versions/1.0/Release-Notes-1.0.txt" name="Squid-1.0 Release Notes">
<item><url url="/Versions/1.1/Release-Notes-1.1.txt" name="Squid-1.1 Release Notes">
<item><url url="http://www.squid-cache.org/Doc/Hierarchy-Tutorial/" name="Tutorial on Configuring Hierarchical Squid Caches">
<item><url url="http://info.internet.isi.edu/in-notes/rfc/files/rfc2186.txt" name="RFC 2186"> ICPv2 -- Protocol
<item><url url="http://info.internet.isi.edu/in-notes/rfc/files/rfc2187.txt" name="RFC 2187"> ICPv2 -- Application
<item><url url="http://info.internet.isi.edu/in-notes/rfc/files/rfc1016.txt" name="RFC 1016">
</itemize>
<sect1>Does Squid support SSL/HTTPS/TLS?
<P>
Squid supports these encrypted protocols by ``tunnelling'' traffic between
clients and servers.
Squid can relay the encrypted bits between a client and a server.
<p>
Normally, when your browser comes across an <em/https/ URL, it
does one of two things:
<enum>
<item>The browser opens an SSL connection directly to the origin
server.
<item>The browser tunnels the request through Squid with the
<em/CONNECT/ request method.
</enum>
<p>
The <em/CONNECT/ method is a way to tunnel any kind of
connection through an HTTP proxy. The proxy doesn't
understand or interpret the contents. It just passes
bytes back and forth between the client and server.
For the gory details on tunnelling and the CONNECT
method, please see
<url url="ftp://ftp.isi.edu/in-notes/rfc2817.txt" name="RFC 2817">
and <url url="http://www.web-cache.com/Writings/Internet-Drafts/draft-luotonen-web-proxy-tunneling-01.txt"
name="Tunneling TCP based protocols through Web proxy servers"> (expired).
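<P>
As a rough illustration of the tunnelling described above, the sketch below
prints the request a browser would send to the proxy to open a tunnel.
The function name <em/connect_request/ and the host <em/www.example.com/ are
placeholders invented for this example; only the <em/CONNECT/ method itself
comes from RFC 2817.

```shell
# connect_request HOST PORT
# Print the request that asks an HTTP proxy to open a tunnel to HOST:PORT.
# Hypothetical helper for illustration; it is not part of Squid.
connect_request() {
    # HTTP header lines end in CRLF, and an empty line ends the request.
    printf 'CONNECT %s:%s HTTP/1.1\r\nHost: %s:%s\r\n\r\n' "$1" "$2" "$1" "$2"
}

# Example: the request for an https URL on www.example.com (placeholder host).
connect_request www.example.com 443
```

If the proxy accepts the request it answers with a 2xx status line, and from
then on it only relays bytes in both directions.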
<p>
Squid cannot (yet) encrypt or decrypt such connections, however.
Some folks are working on a patch, using OpenSSL, that allows Squid to do this.
<sect1>What's the legal status of Squid?
<P>
Squid is <url url="squid-copyright.txt" name="copyrighted">
by the University of California San Diego.
Squid uses some <url url="squid-credits.txt" name="code developed by others">.
<P>
Squid is
<url url="http://www.gnu.org/philosophy/free-sw.html"
name="Free Software">.
<P>
Squid is licensed under the terms of the
<url url="http://www.gnu.org/copyleft/gpl.html"
name="GNU General Public License">.
<sect1>Is Squid year-2000 compliant?
<P>
We think so. Squid uses the Unix time format for all internal time
representations. Potential problem areas are in printing and
parsing other time representations. We have made the following
fixes to address the year 2000:
<itemize>
<item>
<em/cache.log/ timestamps use 4-digit years instead of just 2 digits.
<item>
<em/parse_rfc1123()/ assumes years less than "70" are after 2000.
<item>
<em/parse_iso3307_time()/ checks all four year digits.
</itemize>
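<P>
The two-digit-year rule used by <em/parse_rfc1123()/ can be sketched as
follows. This is an illustrative shell function, not the C code from Squid;
the name <em/expand_year/ is made up for the example, and a two-digit input
is assumed.

```shell
# expand_year YY
# Expand a two-digit year with the pivot described above:
# values below 70 are taken as 20YY, everything else as 19YY.
# Hypothetical helper, not Squid's parse_rfc1123() itself.
expand_year() {
    yy=$1
    # Drop one leading zero so that "08" is not treated as an octal number.
    yy=${yy#0}
    [ -n "$yy" ] || yy=0
    if [ "$yy" -lt 70 ]; then
        echo $((2000 + yy))
    else
        echo $((1900 + yy))
    fi
}

expand_year 99    # prints 1999
expand_year 05    # prints 2005
```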
<P>
Year-2000 fixes were applied to the following Squid versions:
<itemize>
<item>
<url url="/Versions/v2/2.1/" name="squid-2.1">:
Year parsing bug fixed for dates in the "Wed Jun 9 01:29:59 1993 GMT"
format (Richard Kettlewell).
<item>
squid-1.1.22:
Fixed likely year-2000 bug in ftpget's timestamp parsing (Henrik Nordstrom).
<item>
squid-1.1.20:
Misc fixes (Arjan de Vet).
</itemize>
<P>Patches:
<itemize>
<item>
<url url="../Y2K/patch3" name="Richard's lib/rfc1123.c patch">.
If you are still running 1.1.X, then you should apply this patch to
your source and recompile.
<item>
<url url="../Y2K/patch2" name="Henrik's src/ftpget.c patch">.
<item>
<url url="../Y2K/patch1" name="Arjan's lib/rfc1123.c patch">.
</itemize>
<p>
Squid-2.2 and earlier versions have a <url
url="http://www.squid-cache.org/Versions/v2/2.2/bugs/index.html#squid-2.2.stable5-mkhttpdlogtime-end-of-year" name="New
Year bug">. This is not strictly a Year-2000 bug; it would happen on the first day of any year.
<sect1>Can I pay someone for Squid support?
<P>
Yep. Please see the <url url="/Support/services.html"
name="commercial support page">.
<sect1>Squid FAQ contributors
<P>
The following people have made contributions to this document:
<itemize>
<item>
<url url="mailto:JLarmour@origin-at.co.uk" name="Jonathan Larmour">
<item>
<url url="mailto:cord@Wunder-Nett.org" name="Cord Beermann">
<item>
<url url="mailto:tony@nlanr.net" name="Tony Sterrett">
<item>
<url url="mailto:ghynes@compusult.nf.ca" name="Gerard Hynes">
<item>
<url url="mailto:tkatayam@pi.titech.ac.jp" name="Katayama, Takeo">
<item>
<url url="mailto:wessels@ircache.net" name="Duane Wessels">
<item>
<url url="mailto:kc@caida.org" name="K Claffy">
<item>
<url url="mailto:pauls@etext.org" name="Paul Southworth">
<item>
<url url="mailto:oskar@is.co.za" name="Oskar Pearson">
<item>
<url url="mailto:ongbh@zpoprp.zpo.dec.com" name="Ong Beng Hui">
<item>
<url url="mailto:torsten.sturm@axis.de" name="Torsten Sturm">
<item>
<url url="mailto:jrg@blodwen.demon.co.uk" name="James R Grinter">
<item>
<url url="mailto:roever@nse.simac.nl" name="Rodney van den Oever">
<item>
<url url="mailto:bertold@tohotom.vein.hu" name="Kolics Bertold">
<item>
<url url="mailto:carson@cugc.org" name="Carson Gaspar">
<item>
<url url="mailto:michael@metal.iinet.net.au" name="Michael O'Reilly">
<item>
<url url="mailto:hclsmith@tallships.istar.ca" name="Hume Smith">
<item>
<url url="mailto:RichardA@noho.co.uk" name="Richard Ayres">
<item>
<url url="mailto:John.Saunders@scitec.com.au" name="John Saunders">
<item>
<url url="mailto:miquels@cistron.nl" name="Miquel van Smoorenburg">
<item>
<url url="mailto:david@avarice.nepean.uws.edu.au" name="David J N Begley">
<item>
<url url="mailto:SarKev@topnz.ac.nz" name="Kevin Sartorelli">
<item>
<url url="mailto:doering@usf.uni-kassel.de" name="Andreas Doering">
<item>
<url url="mailto:mark@cal026031.student.utwente.nl" name="Mark Visser">
<item>
<url url="mailto:tom@interact.net.au" name="tom minchin">
<item>
<url url="mailto:voeckler@rvs.uni-hannover.de" name="Jens-S. V&ouml;ckler">
<item>
<url url="mailto:andre.albsmeier@mchp.siemens.de" name="Andre Albsmeier">
<item>
<url url="mailto:nazard@man-assoc.on.ca" name="Doug Nazar">
<item>
<url url="mailto:hno@hem.passagen.se" name="Henrik Nordstrom">
<item>
<url url="mailto:mark@rts.com.au" name="Mark Reynolds">
<item>
<url url="mailto:Arjan.deVet@adv.IAEhv.nl" name="Arjan de Vet">
<item>
<url url="mailto:peter@spinner.dialix.com.au" name="Peter Wemm">
<item>
<url url="mailto:webadm@info.cam.ac.uk" name="John Line">
<item>
<url url="mailto:ARMISTEJ@oeca.otis.com" name="Jason Armistead">
<item>
<url url="mailto:cudch@csv.warwick.ac.uk" name="Chris Tilbury">
<item>
<url url="mailto:jeff@sisna.com" name="Jeff Madison">
<item>
<url url="mailto:mbatchelor@citysearch.com" name="Mike Batchelor">
<item>
<url url="mailto:bogstad@pobox.com" name="Bill Bogstad">
<item>
<url url="mailto:radu at netsoft dot ro" name="Radu Greab">
<item>
<url url="mailto:f.j.bosscha@nhl.nl" name="F.J. Bosscha">
<item>
<url url="mailto:signal@shreve.net" name="Brian Feeny">
<item>
<url url="mailto:Support@dnet.co.uk" name="Martin Lyons">
<item>
<url url="mailto:luyer@ucs.uwa.edu.au" name="David Luyer">
<item>
<url url="mailto:chris@senet.com.au" name="Chris Foote">
<item>
<url url="mailto:elkner@wotan.cs.Uni-Magdeburg.DE" name="Jens Elkner">
</itemize>
<P>
Please send corrections, updates, and comments to:
<url url="mailto:squid-faq@ircache.net"
name="squid-faq@ircache.net">.
<sect1>About This Document
<P>
This document is copyrighted (2000) by Duane Wessels.
<P>
This document was written in SGML and converted with the
<url url="http://www.sgmltools.org/"
name="SGML-Tools package">.
<sect2>Want to contribute? Please write in SGML...
<P>
It is easier for us if you send us text which is close to "correct" SGML.
The SQUID FAQ currently uses the LINUXDOC DTD. It's probably easiest
to follow examples in this file.
Here are the basics:
<P>
Use the &lt;url&gt; tag for links, instead of HTML &lt;A HREF ...&gt;
<verb>
&lt;url url="http://www.squid-cache.org" name="Squid Home Page"&gt;
</verb>
<P>
Use &lt;em&gt; for emphasis, config options, and pathnames:
<verb>
&lt;em&gt;/usr/local/squid/etc/squid.conf&lt;/em&gt;
&lt;em/cache_peer/
</verb>
<P>
Here is how you do lists:
<verb>
&lt;itemize&gt;
&lt;item&gt;foo
&lt;item&gt;bar
&lt;/itemize&gt;
</verb>
<P>
Use &lt;verb&gt;, just like HTML's &lt;PRE&gt;, to show
unformatted text.
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>Getting and Compiling Squid
<label id="compiling">
<sect1>Which file do I download to get Squid?
<P>
You must download a source archive file of the form
squid-x.y.z-src.tar.gz (e.g., squid-1.1.6-src.tar.gz) from
<url url="http://www.squid-cache.org/"
name="the Squid home page">, or
<url url="ftp://www.squid-cache.org/pub/"
name="the Squid FTP site">.
Context diffs are available for upgrading to new versions.
These can be applied with the <em/patch/ program (available from
<url url="ftp://prep.ai.mit.edu/pub/gnu/"
name="the GNU FTP site">).
<sect1>How do I compile Squid?
<P>
For <bf/Squid-1.0/ and <bf/Squid-1.1/ versions, you can just
type <em/make/ from the top-level directory after unpacking
the source files. For example:
<verb>
% tar xzf squid-1.1.21-src.tar.gz
% cd squid-1.1.21
% make
</verb>
<P>
For <bf/Squid-2/ you must run the <em/configure/ script yourself
before running <em/make/:
<verb>
% tar xzf squid-2.0.RELEASE-src.tar.gz
% cd squid-2.0.RELEASE
% ./configure
% make
</verb>
<sect1>What kind of compiler do I need?
<P>
To compile Squid, you will need an ANSI C compiler. Almost all
modern Unix systems come with pre-installed compilers which work
just fine. The old <em/SunOS/ compilers do not have support for ANSI
C, and the Sun compiler for <em/Solaris/ is a product which
must be purchased separately.
<P>
If you are uncertain about your system's C compiler, the GNU C compiler is
available at
<url url="ftp://prep.ai.mit.edu/pub/gnu/"
name="the GNU FTP site">.
In addition to gcc, you may also want or need to install the <em/binutils/
package.
<sect1>What else do I need to compile Squid?
<p>
You will need <url url="http://www.perl.com/" name="Perl"> installed
on your system.
<sect1>Do you have pre-compiled binaries available?
<!-- Binaries list replicated at /binaries.html -->
<P>
The developers do not have the resources to make pre-compiled
binaries available. Instead, we invest effort into making
the source code very portable. Some people have made
binary packages available. Please see our
<url url="http://www.squid-cache.org/platforms.html" name="Platforms Page">.
<p>
The <url url="http://freeware.sgi.com/" name="SGI Freeware"> site
has pre-compiled packages for SGI IRIX.
<p>
Squid binaries for
<url url="http://www.freebsd.org/cgi/ports.cgi?query=squid-2&amp;stype=all"
name="FreeBSD on Alpha and Intel">.
<p>
Squid binaries for
<url url="ftp://ftp.netbsd.org/pub/NetBSD/packages/pkgsrc/www/squid/README.html"
name="NetBSD on everything">
<sect1>How do I apply a patch or a diff?
<P>
You need the <tt/patch/ program. You should probably duplicate the
entire directory structure before applying the patch. For example, if
you are upgrading from squid-1.1.10 to 1.1.11, you would run
these commands:
<verb>
cd squid-1.1.10
mkdir ../squid-1.1.11
find . -depth -print | cpio -pdv ../squid-1.1.11
cd ../squid-1.1.11
patch < /tmp/diff-1.1.10-1.1.11
</verb>
After the patch has been applied, you must rebuild Squid from the
very beginning, i.e.:
<verb>
make realclean
./configure
make
make install
</verb>
Note: in later distributions (Squid 2), 'realclean' has been changed
to 'distclean'.
<P>
If patch keeps asking for a file name, try adding ``-p0'':
<verb>
patch -p0 < filename
</verb>
<P>
If your <tt/patch/ program seems to complain or refuses to work,
you should get a more recent version, from the
<url url="ftp://ftp.gnu.ai.mit.edu/pub/gnu/"
name="GNU FTP site">, for example.
<sect1><em/configure/ options
<P>
The configure script can take numerous options. The most
useful is <tt/--prefix/ to install it in a different directory.
The default installation directory is <em>/usr/local/squid/</em>. To
change the default, you could do:
<verb>
% cd squid-x.y.z
% ./configure --prefix=/some/other/directory/squid
</verb>
<P>
Type
<verb>
% ./configure --help
</verb>
to see all available options. You will need to specify some
of these options to enable or disable certain features.
Some options which are used often include:
<verb>
  --prefix=PREFIX         install architecture-independent files in PREFIX
                          [/usr/local/squid]
  --enable-dlmalloc[=LIB] Compile & use the malloc package by Doug Lea
  --enable-gnuregex       Compile GNUregex
  --enable-splaytree      Use SPLAY trees to store ACL lists
  --enable-xmalloc-debug  Do some simple malloc debugging
  --enable-xmalloc-debug-trace
                          Detailed trace of memory allocations
  --enable-xmalloc-statistics
                          Show malloc statistics in status page
  --enable-carp           Enable CARP support
  --enable-async-io       Do ASYNC disk I/O using threads
  --enable-icmp           Enable ICMP pinging
  --enable-delay-pools    Enable delay pools to limit bandwidth usage
  --enable-mem-gen-trace  Do trace of memory stuff
  --enable-useragent-log  Enable logging of User-Agent header
  --enable-kill-parent-hack
                          Kill parent on shutdown
  --enable-snmp           Enable SNMP monitoring
  --enable-time-hack      Update internal timestamp only once per second
  --enable-cachemgr-hostname[=hostname]
                          Make cachemgr.cgi default to this host
  --enable-arp-acl        Enable use of ARP ACL lists (ether address)
  --enable-htcp           Enable HTCP protocol
  --enable-forw-via-db    Enable Forw/Via database
  --enable-cache-digests  Use Cache Digests
                          see http://www.squid-cache.org/Doc/FAQ/FAQ-16.html
  --enable-err-language=lang
                          Select language for Error pages (see errors dir)
</verb>
<sect1>undefined reference to __inet_ntoa
<P>
by <url url="mailto:SarKev@topnz.ac.nz" name="Kevin Sartorelli">
and <url url="mailto:doering@usf.uni-kassel.de" name="Andreas Doering">.
<P>
Probably you've recently installed bind 8.x. There is a mismatch between
the header files and DNS library that Squid has found. There are a couple
of things you can try.
<P>
First, try adding <tt/-lbind/ to <em/XTRA_LIBS/ in <em>src/Makefile</em>.
If <tt/-lresolv/ is already there, remove it.
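<P>
The first suggestion can be scripted along these lines. This is only a
sketch: it assumes <em>src/Makefile</em> contains a single line beginning
with XTRA_LIBS, and the function name <em/use_libbind/ is hypothetical. It
writes the edited Makefile to standard output so you can review the result
before replacing the original.

```shell
# use_libbind MAKEFILE
# Print MAKEFILE with -lresolv dropped from the XTRA_LIBS line and
# -lbind appended to it. Hypothetical helper; the layout of the
# XTRA_LIBS line is an assumption about your generated Makefile.
use_libbind() {
    sed '/^XTRA_LIBS/ { s/ -lresolv//; s/$/ -lbind/; }' "$1"
}

# Typical use (paths are examples):
#   use_libbind src/Makefile > src/Makefile.new
#   diff src/Makefile src/Makefile.new
```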
<P>
If that doesn't seem to work, edit your <em>arpa/inet.h</em> file and comment out the following:
<verb>
#define inet_addr __inet_addr
#define inet_aton __inet_aton
#define inet_lnaof __inet_lnaof
#define inet_makeaddr __inet_makeaddr
#define inet_neta __inet_neta
#define inet_netof __inet_netof
#define inet_network __inet_network
#define inet_net_ntop __inet_net_ntop
#define inet_net_pton __inet_net_pton
#define inet_ntoa __inet_ntoa
#define inet_pton __inet_pton
#define inet_ntop __inet_ntop
#define inet_nsap_addr __inet_nsap_addr
#define inet_nsap_ntoa __inet_nsap_ntoa
</verb>
<sect1>How can I get true DNS TTL info into Squid's IP cache?
<label id="dns-ttl-hack">
<P>
If you have source for BIND, you can modify it as indicated in the diff
below. It causes the global variable _dns_ttl_ to be set with the TTL
of the most recent lookup. Then, when you compile Squid, the configure
script will look for the _dns_ttl_ symbol in libresolv.a. If found,
dnsserver will return the TTL value for every lookup.
<P>
This hack was contributed by
<url url="mailto:bne@CareNet.hu" name="Endre Balint Nagy">.
<verb>
diff -ru bind-4.9.4-orig/res/gethnamaddr.c bind-4.9.4/res/gethnamaddr.c
--- bind-4.9.4-orig/res/gethnamaddr.c Mon Aug 5 02:31:35 1996
+++ bind-4.9.4/res/gethnamaddr.c Tue Aug 27 15:33:11 1996
@@ -133,6 +133,7 @@
} align;
extern int h_errno;
+int _dns_ttl_;
#ifdef DEBUG
static void
@@ -223,6 +224,7 @@
host.h_addr_list = h_addr_ptrs;
haveanswer = 0;
had_error = 0;
+ _dns_ttl_ = -1;
while (ancount-- > 0 && cp < eom && !had_error) {
n = dn_expand(answer->buf, eom, cp, bp, buflen);
if ((n < 0) || !(*name_ok)(bp)) {
@@ -232,8 +234,11 @@
cp += n; /* name */
type = _getshort(cp);
cp += INT16SZ; /* type */
- class = _getshort(cp);
- cp += INT16SZ + INT32SZ; /* class, TTL */
+ class = _getshort(cp);
+ cp += INT16SZ; /* class */
+ if (qtype == T_A && type == T_A)
+ _dns_ttl_ = _getlong(cp);
+ cp += INT32SZ; /* TTL */
n = _getshort(cp);
cp += INT16SZ; /* len */
if (class != C_IN) {
</verb>
<P>
And here is a patch for BIND-8:
<verb>
*** src/lib/irs/dns_ho.c.orig Tue May 26 21:55:51 1998
--- src/lib/irs/dns_ho.c Tue May 26 21:59:57 1998
***************
*** 87,92 ****
--- 87,93 ----
#endif
extern int h_errno;
+ int _dns_ttl_;
/* Definitions. */
***************
*** 395,400 ****
--- 396,402 ----
pvt->host.h_addr_list = pvt->h_addr_ptrs;
haveanswer = 0;
had_error = 0;
+ _dns_ttl_ = -1;
while (ancount-- > 0 && cp < eom && !had_error) {
n = dn_expand(ansbuf, eom, cp, bp, buflen);
if ((n < 0) || !(*name_ok)(bp)) {
***************
*** 404,411 ****
cp += n; /* name */
type = ns_get16(cp);
cp += INT16SZ; /* type */
! class = ns_get16(cp);
! cp += INT16SZ + INT32SZ; /* class, TTL */
n = ns_get16(cp);
cp += INT16SZ; /* len */
if (class != C_IN) {
--- 406,416 ----
cp += n; /* name */
type = ns_get16(cp);
cp += INT16SZ; /* type */
! class = _getshort(cp);
! cp += INT16SZ; /* class */
! if (qtype == T_A && type == T_A)
! _dns_ttl_ = _getlong(cp);
! cp += INT32SZ; /* TTL */
n = ns_get16(cp);
cp += INT16SZ; /* len */
if (class != C_IN) {
</verb>
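<P>
After rebuilding BIND with either patch, you may want to confirm that the
resolver library really exports the new symbol before running Squid's
<em/configure/. The check below is a sketch; the function name
<em/has_dns_ttl/ and the library path in the usage comment are assumptions,
so substitute the location of your own <em/libresolv.a/.

```shell
# has_dns_ttl LIBRARY
# Succeed if the symbol table of LIBRARY mentions _dns_ttl_.
# Hypothetical helper; it only inspects the archive with nm.
has_dns_ttl() {
    nm "$1" 2>/dev/null | grep -q '_dns_ttl_'
}

# Typical use (the path is an example):
#   has_dns_ttl /usr/lib/libresolv.a || echo "_dns_ttl_ not found"
```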
<sect1>My platform is BSD/OS or BSDI and I can't compile Squid
<label id="bsdi-compile">
<P>
<verb>
cache_cf.c: In function `parseConfigFile':
cache_cf.c:1353: yacc stack overflow before `token'
...
</verb>
<P>
You may need to upgrade your gcc installation to a more recent version.
Check your gcc version with
<verb>
gcc -v
</verb>
If it is earlier than 2.7.2, you might consider upgrading.
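<P>
If you would rather script the comparison than read the version by eye, a
sketch such as the following can help. The function name
<em/version_older/ is invented for this example, and it assumes plain dotted
numeric versions of up to three parts.

```shell
# version_older A B
# Succeed if dotted version A is older than dotted version B.
# Hypothetical helper; missing parts are treated as zero.
version_older() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, "."); split(b, y, ".")
        for (i = 1; i <= 3; i++) {
            if (x[i] + 0 < y[i] + 0) exit 0
            if (x[i] + 0 > y[i] + 0) exit 1
        }
        exit 1    # the versions are equal
    }'
}

# Typical use, assuming your gcc understands -dumpversion:
#   version_older "$(gcc -dumpversion)" 2.7.2 && echo "consider upgrading gcc"
```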
<P>
Alternatively, you can get pre-compiled Squid binaries for BSD/OS 2.1 at
the <url url="ftp://ftp.bsdi.com/patches/patches-2.1" name="BSD patches FTP site">,
patch <url url="ftp://ftp.bsdi.com/patches/patches-2.1/U210-019" name="U210-019">.
<sect1>Problems compiling <em/libmiscutil.a/ on Solaris
<P>
The following error occurs on Solaris systems using gcc when the Solaris C
compiler is not installed:
<verb>
/usr/bin/rm -f libmiscutil.a
/usr/bin/false r libmiscutil.a rfc1123.o rfc1738.o util.o ...
make[1]: *** [libmiscutil.a] Error 255
make[1]: Leaving directory `/tmp/squid-1.1.11/lib'
make: *** [all] Error 1
</verb>
Note on the second line the <bf>/usr/bin/false</bf>. This is supposed
to be a path to the <em/ar/ program. If <em/configure/ cannot find <em/ar/
on your system, then it substitutes <em/false/.
<P>
To fix this you either need to:
<itemize>
<item>
Add <em>/usr/ccs/bin</em> to your PATH. This is where the <em/ar/
command should be. You need to install SUNWbtool if <em/ar/
is not there. Otherwise,
<item>
Install the <bf/binutils/ package from
<url url="ftp://prep.ai.mit.edu/pub/gnu/" name="the GNU FTP site">.
This package includes programs such as <em/ar/, <em/as/, and <em/ld/.
</itemize>
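<P>
The first option can be sketched in Bourne shell as follows. The function
name <em/add_to_path/ is hypothetical; the directory is the one named above.
After fixing your PATH you will most likely need to run <em/configure/ again
so that it finds <em/ar/ instead of substituting <em/false/.

```shell
# add_to_path DIR
# Append DIR to PATH unless it is already listed there.
# Hypothetical helper for illustration.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

add_to_path /usr/ccs/bin
# Report if ar still cannot be found after the change.
command -v ar >/dev/null 2>&1 || echo "ar not found; install SUNWbtool or binutils"
```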
<sect1>I have problems compiling Squid on Platform Foo.
<P>
Please check the
<url url="/platforms.html" name="page of platforms">
on which Squid is known to compile. Your problem might be listed
there together with a solution. If it isn't listed there, mail
us what you are trying, your Squid version, and the problems
you encounter.
<sect1>I see a lot of warnings while compiling Squid.
<P>
Warnings are usually not a big concern, and can be common with software
designed to operate on multiple platforms. If you feel like fixing
compile-time warnings, please do so and send us the patches.
<sect1>Building Squid on OS/2
<label id="building-os2">
<P>
by <url url="mailto:nazard@man-assoc.on.ca" name="Doug Nazar">
<P>
In order to compile Squid, you need to have a reasonable facsimile of a
Unix system installed. This includes <em/bash/, <em/make/, <em/sed/,
<em/emx/, various file utilities and a few more. I've set up a TVFS
drive that matches a Unix file system but this probably isn't strictly
necessary.
<P>
I made a few modifications to the pristine EMX 0.9d install.
<enum>
<item>
added defines for <em/strcasecmp()/ &amp; <em/strncasecmp()/ to <em/string.h/
<item>
changed all occurrences of time_t to signed long instead
of unsigned long
<item>
hacked ld.exe
<enum>
<item>
to search for both xxxx.a and libxxxx.a
<item>
to produce the correct filename when using the
-Zexe option
</enum>
</enum>
<P>
You will need to run <em>scripts/convert.configure.to.os2</em> (in the
Squid source distribution) to modify
the configure script so that it can search for the various programs.
<P>
Next, you need to set a few environment variables (see EMX docs
for meaning):
<verb>
export EMXOPT="-h256 -c"
export LDFLAGS="-Zexe -Zbin -s"
</verb>
<P>
Now you are ready to configure squid:
<verb>
./configure
</verb>
<P>
Compile everything:
<verb>
make
</verb>
<P>
and finally, install:
<verb>
make install
</verb>
<P>
This will, by default, install into <em>/usr/local/squid</em>. If you wish
to install somewhere else, see the <em/--prefix/ option for configure.
<P>
Now, don't forget to set EMXOPT before running squid each time. I
recommend using the -Y and -N options.
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>Installing and Running Squid
<sect1>How big of a system do I need to run Squid?
<P>
There are no hard-and-fast rules. The most important resource
for Squid is physical memory. Your processor does not need
to be ultra-fast. Your disk system will be the major bottleneck,
so fast disks are important for high-volume caches. Do not use
IDE disks if you can help it.
<P>
In late 1998, if you are buying a new machine for
a cache, I would recommend the following configuration:
<itemize>
<item>300 MHz Pentium II CPU
<item>512 MB RAM
<item>Five 9 GB UW-SCSI disks
</itemize>
Your system disk and logfile disk can probably be IDE without losing
any cache performance.
<P>
Also, see <url url="http://wwwcache.ja.net/servers/squids.html"
name="Squid Sizing for Intel Platforms"> by Martin Hamilton. This is a
very nice page summarizing system configurations people are using for
large Squid caches.
<sect1>How do I install Squid?
<P>
After <ref id="compiling" name="compiling Squid">, you can install it
with this simple command:
<verb>
% make install
</verb>
If you have enabled the
<ref id="using-icmp" name="ICMP features">
then you will also want to type
<verb>
% su
# make install-pinger
</verb>
<P>
After installing, you will want to edit and customize
the <em/squid.conf/ file. By default, this file is
located at <em>/usr/local/squid/etc/squid.conf</em>.
<P>
Also, a QUICKSTART guide has been included with the source
distribution. Please see the directory where you
unpacked the source archive.
<sect1>What does the <em/squid.conf/ file do?
<P>
The <em/squid.conf/ file defines the configuration for
<em/squid/. The configuration includes (but is not limited to) the
HTTP port number, the ICP request port number, incoming and outgoing
requests, information about firewall access, and various timeout
information.
<sect1>Do you have a <em/squid.conf/ example?
<P>
Yes, after you <tt/make install/, a sample <em/squid.conf/ file will
exist in the ``etc'' directory under the Squid installation directory.
The sample <em/squid.conf/ file contains comments explaining each
option.
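<P>
Because the sample file is mostly comments, it can be handy to list only the
settings that are actually in effect. The sketch below does that; the
function name <em/active_directives/ is made up for this example, and it only
filters blank lines and lines that start with a ``#'' character.

```shell
# active_directives FILE
# Print the lines of FILE that are neither comments nor blank.
# Hypothetical helper for illustration.
active_directives() {
    grep -v '^#' "$1" | grep -v '^[[:space:]]*$'
}

# Typical use with the default installation path:
#   active_directives /usr/local/squid/etc/squid.conf
```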
<P>
<sect1>How do I start Squid?
<P>
After you've finished editing the configuration file, you can
start Squid for the first time. The procedure depends a little
bit on which version you are using.
<sect2>Squid version 2.X
<p>
First, you must create the swap directories. Do this by
running Squid with the -z option:
<verb>
% /usr/local/squid/bin/squid -z
</verb>
Once that completes, you can start Squid and try it out.
Probably the best thing to do is run it from your terminal
and watch the debugging output. Use this command:
<verb>
% /usr/local/squid/bin/squid -NCd1
</verb>
If everything is working okay, you will see the line:
<verb>
Ready to serve requests.
</verb>
If you want to run squid in the background, as a daemon process,
just leave off all options:
<verb>
% /usr/local/squid/bin/squid
</verb>
<p>
NOTE: depending on your configuration, you may need to start
squid as root.
<sect2>Squid version 1.1.X
<P>
With version 1.1.16 and later, you must first run Squid with the
<bf/-z/ option to create the cache swap directories.
<verb>
% /usr/local/squid/bin/squid -z
</verb>
Squid will exit when it finishes creating all of the directories.
Next you can start <em/RunCache/:
<verb>
% /usr/local/squid/bin/RunCache &
</verb>
<P>
For versions before 1.1.6 you should just start <em/RunCache/
immediately, instead of running <em/squid -z/ first.
<sect1>How do I start Squid automatically when the system boots?
<sect2>Squid Version 2.X
<P>
Squid-2 has a restart feature built in. This greatly simplifies
starting Squid and means that you don't need to use <em/RunCache/
or <em/inittab/. At the minimum, you only need to enter the
pathname to the Squid executable. For example:
<verb>
/usr/local/squid/bin/squid
</verb>
<P>
Squid will automatically background itself and then spawn
a child process. In your <em/syslog/ messages file, you
should see something like this:
<verb>
Sep 23 23:55:58 kitty squid[14616]: Squid Parent: child process 14617 started
</verb>
That means that process ID 14616 is the parent process which monitors the child
process (pid 14617). The child process is the one that does all of the
work. The parent process just waits for the child process to exit. If the
child process exits unexpectedly, the parent will automatically start another
child process. In that case, <em/syslog/ shows:
<verb>
Sep 23 23:56:02 kitty squid[14616]: Squid Parent: child process 14617 exited with status 1
Sep 23 23:56:05 kitty squid[14616]: Squid Parent: child process 14619 started
</verb>
<p>
If there is some problem, and Squid can not start, the parent process will give up
after a while. Your <em/syslog/ will show:
<verb>
Sep 23 23:56:12 kitty squid[14616]: Exiting due to repeated, frequent failures
</verb>
When this happens you should check your <em/syslog/ messages and
<em/cache.log/ file for error messages.
<p>
When you look at a process (<em/ps/ command) listing, you'll see two squid processes:
<verb>
24353 ?? Ss 0:00.00 /usr/local/squid/bin/squid
24354 ?? R 0:03.39 (squid) (squid)
</verb>
The first is the parent process, and the child process is the one called ``(squid)''.
Note that if you accidentally kill the parent process, the child process will not
notice.
<p>
If you want to run Squid from your terminal and prevent it from
backgrounding and spawning a child process, use the <em/-N/ command
line option.
<verb>
/usr/local/squid/bin/squid -N
</verb>
<sect2>Squid Version 1.1.X
<sect3>From inittab
<P>
On systems which have an <em>/etc/inittab</em> file (Digital Unix,
Solaris, IRIX, HP-UX, Linux), you can add a line like this:
<verb>
sq:3:respawn:/usr/local/squid/bin/squid.sh < /dev/null >> /tmp/squid.log 2>&1
</verb>
We recommend using a <em/squid.sh/ shell script, but you could instead call
Squid directly. A sample <em/squid.sh/ script is shown below:
<verb>
#!/bin/sh
C=/usr/local/squid
PATH=/usr/bin:$C/bin
TZ=PST8PDT
export PATH TZ
notify="root"
cd $C
umask 022
sleep 10
while [ -f /tmp/nosquid ]; do
sleep 1
done
/usr/bin/tail -20 $C/logs/cache.log \
| Mail -s "Squid restart on `hostname` at `date`" $notify
exec bin/squid -CYs
</verb>
<sect3>From rc.local
<P>
On BSD-ish systems, you will need to start Squid from the ``rc'' files,
usually <em>/etc/rc.local</em>. For example:
<verb>
if [ -f /usr/local/squid/bin/RunCache ]; then
echo -n ' Squid'
(/usr/local/squid/bin/RunCache &)
fi
</verb>
<sect3>From init.d
<P>
Some people may want to use the ``init.d'' startup system.
If you start Squid (or RunCache) from an ``init.d'' script, then you
should probably use <em/nohup/, e.g.:
<verb>
nohup squid -sY $conf >> $logdir/squid.out 2>&1
</verb>
Also, you may need to add a line to trap certain signals
and prevent them from being sent to the Squid process.
Add this line at the top of your script:
<verb>
trap '' 1 2 3 18
</verb>
<sect1>How do I tell if Squid is running?
<P>
You can use the <em/client/ program:
<verb>
% client http://www.netscape.com/ > test
</verb>
<P>
There are other command-line HTTP client programs available
as well. Two that you may find useful are
<url url="ftp://gnjilux.cc.fer.hr/pub/unix/util/wget/"
name="wget">
and
<url url="ftp://ftp.internatif.org/pub/unix/echoping/"
name="echoping">.
<P>
Another way is to use Squid itself to see if it can signal a running
Squid process:
<verb>
% squid -k check
</verb>
And then check the shell's exit status variable.
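<P>
A minimal Bourne shell sketch of that check follows. In <em/sh/ the exit
status is in <tt/$?/, while <em/csh/ uses <tt/$status/. The <tt/SQUID/
variable and its default path are assumptions; adjust them for your
installation.
<verb>
#!/bin/sh
# Assumed location of the squid binary; override by setting SQUID.
SQUID=${SQUID:-/usr/local/squid/bin/squid}

# Succeeds (exit status 0) only when "squid -k check" finds a running process.
squid_is_running() {
    "$SQUID" -k check > /dev/null 2> /dev/null
}

if squid_is_running; then
    echo "squid is running"
else
    echo "squid is not running"
fi
</verb>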
<P>
Also, check the log files, most importantly the <em/access.log/ and
<em/cache.log/ files.
<sect1><em/squid/ command line options
<P>
These are the command line options for <bf/Squid-2/:
<descrip>
<tag/-a/
Specify an alternate port number for incoming HTTP requests.
Useful for testing a configuration file on a non-standard port.
<tag/-d/
Debugging level for ``stderr'' messages. If you use this
option, then debugging messages up to the specified level will
also be written to stderr.
<tag/-f/
Specify an alternate <em/squid.conf/ file instead of the
pathname compiled into the executable.
<tag/-h/
Prints the usage and help message.
<tag/-k reconfigure/
Sends a <em/HUP/ signal, which causes Squid to re-read
its configuration files.
<tag/-k rotate/
Sends an <em/USR1/ signal, which causes Squid to
rotate its log files. Note, if <em/logfile_rotate/
is set to zero, Squid still closes and re-opens
all log files.
<tag/-k shutdown/
Sends a <em/TERM/ signal, which causes Squid to
wait briefly for current connections to finish and then
exit. The amount of time to wait is specified with
<em/shutdown_lifetime/.
<tag/-k interrupt/
Sends an <em/INT/ signal, which causes Squid to
shut down immediately, without waiting for
current connections.
<tag/-k kill/
Sends a <em/KILL/ signal, which causes the Squid
process to exit immediately, without closing
any connections or log files. Use this only
as a last resort.
<tag/-k debug/
Sends an <em/USR2/ signal, which causes Squid
to generate full debugging messages until the
next <em/USR2/ signal is received. Obviously
very useful for debugging problems.
<tag/-k check/
Sends a ``<em/ZERO/'' signal to the Squid process.
This simply checks whether or not the process
is actually running.
<tag/-s/
Send debugging (level 0 only) messages to syslog.
<tag/-u/
Specify an alternate port number for ICP messages.
Useful for testing a configuration file on a non-standard port.
<tag/-v/
Prints the Squid version.
<tag/-z/
Creates disk swap directories. You must use this option when
installing Squid for the first time, or when you add or
modify the <em/cache_dir/ configuration.
<tag/-D/
Do not make initial DNS tests. Normally, Squid looks up
some well-known DNS hostnames to ensure that your DNS
name resolution service is working properly.
<tag/-F/
If the <em/swap.state/ logs are clean, then the cache is
rebuilt in the ``foreground'' before any requests are
served. This will decrease the time required to rebuild
the cache, but HTTP requests will not be satisfied during
this time.
<tag/-N/
Do not automatically become a background daemon process.
<tag/-R/
Do not set the SO_REUSEADDR option on sockets.
<tag/-V/
Enable virtual host support for the httpd-accelerator mode.
This is identical to writing <em/httpd_accel_host virtual/
in the config file.
<tag/-X/
Enable full debugging while parsing the config file.
<tag/-Y/
Return ICP_OP_MISS_NOFETCH instead of ICP_OP_MISS while
the <em/swap.state/ file is being read. If your cache has
mostly child caches which use ICP, this will allow your
cache to rebuild faster.
</descrip>
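<P>
Because each <em/-k/ action simply sends a signal, the list above can be
summarized as a lookup. The sketch below maps the action names to their
signals; the function name <tt/squid_k_signal/ is made up for illustration
and is not part of Squid.
<verb>
#!/bin/sh
# Print the signal that "squid -k ACTION" sends, per the list above.
# Returns non-zero for an action that is not in the list.
squid_k_signal() {
    case "$1" in
        reconfigure) echo HUP ;;
        rotate)      echo USR1 ;;
        shutdown)    echo TERM ;;
        interrupt)   echo INT ;;
        kill)        echo KILL ;;
        debug)       echo USR2 ;;
        check)       echo 0 ;;
        *)           return 1 ;;
    esac
}

squid_k_signal rotate
</verb>
If you know the process ID of the running Squid, sending that signal with
<em/kill/ should have the same effect as the matching <em/-k/ action.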
<sect1>How do I see how Squid works?
<P>
<itemize>
<item>
Check the <em/cache.log/ file in your logs directory. It logs
interesting (and boring) things as a part of its normal operation.
<item>
Install and use the
<ref id="cachemgr-section" name="Cache Manager">.
</itemize>
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>Configuration issues
<sect1>How do I join a cache hierarchy?
<P>
To place your cache in a hierarchy, use the <tt/cache_host/
directive in <em/squid.conf/ to specify the parent and sibling
nodes.
<P>
For example, the following <em/squid.conf/ file on
<tt/childcache.example.com/ configures its cache to retrieve
data from one parent cache and two sibling caches:
<verb>
# squid.conf - On the host: childcache.example.com
#
# Format is: hostname type http_port udp_port
#
cache_host parentcache.example.com parent 3128 3130
cache_host childcache2.example.com sibling 3128 3130
cache_host childcache3.example.com sibling 3128 3130
</verb>
The <tt/cache_host_domain/ directive allows you to specify that
certain caches are siblings or parents for certain domains:
<verb>
# squid.conf - On the host: sv.cache.nlanr.net
#
# Format is: hostname type http_port udp_port
#
cache_host electraglide.geog.unsw.edu.au parent 3128 3130
cache_host cache1.nzgate.net.nz parent 3128 3130
cache_host pb.cache.nlanr.net parent 3128 3130
cache_host it.cache.nlanr.net parent 3128 3130
cache_host sd.cache.nlanr.net parent 3128 3130
cache_host uc.cache.nlanr.net sibling 3128 3130
cache_host bo.cache.nlanr.net sibling 3128 3130
cache_host_domain electraglide.geog.unsw.edu.au .au
cache_host_domain cache1.nzgate.net.nz .au .aq .fj .nz
cache_host_domain pb.cache.nlanr.net .uk .de .fr .no .se .it
cache_host_domain it.cache.nlanr.net .uk .de .fr .no .se .it
cache_host_domain sd.cache.nlanr.net .mx .za .mu .zm
</verb>
The configuration above indicates that the cache will use
<tt/pb.cache.nlanr.net/ and <tt/it.cache.nlanr.net/
for domains uk, de, fr, no, se and it, <tt/sd.cache.nlanr.net/
for domains mx, za, mu and zm, <tt/electraglide.geog.unsw.edu.au/
for domain au, and <tt/cache1.nzgate.net.nz/
for domains au, aq, fj, and nz.
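<P>
In Squid-2 these directives are named <em/cache_peer/ and
<em/cache_peer_domain/; the arguments shown here keep the same order.
A sketch of the first example in Squid-2 form, reusing the same example
hostnames (the <tt/.com/ restriction is purely illustrative):
<verb>
# squid.conf (Squid-2) - On the host: childcache.example.com
#
# Format is: hostname type http_port udp_port
#
cache_peer parentcache.example.com parent 3128 3130
cache_peer childcache2.example.com sibling 3128 3130
cache_peer childcache3.example.com sibling 3128 3130
# Only ask this parent about .com domains
cache_peer_domain parentcache.example.com .com
</verb>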
<sect1>How do I join NLANR's cache hierarchy?
<P>
We have a simple set of
<url url="http://www.ircache.net/Cache/joining.html"
name="guidelines for joining">
the NLANR cache hierarchy.
<sect1>Why should I want to join NLANR's cache hierarchy?
<P>
The NLANR hierarchy can provide you with an initial source for parent or
sibling caches. Joining the NLANR global cache system will frequently
improve the performance of your caching service.
<sect1>How do I register my cache with NLANR's registration service?
<P>
Just enable these options in your <em/squid.conf/ and you'll be
registered:
<verb>
cache_announce 24
announce_to sd.cache.nlanr.net:3131
</verb>
<em/NOTE:/ announcing your cache <bf/is not/ the same thing as
joining the NLANR cache hierarchy.
You can join the NLANR cache hierarchy without registering, and
you can register without joining the NLANR cache hierarchy.
<P>
<sect1>How do I find other caches close to me and arrange parent/child/sibling relationships with them?
<P>
Visit the NLANR cache
<url url="http://www.ircache.net/Cache/Tracker/"
name="registration database">
to discover other caches near you. Keep in mind that just because
a cache is registered in the database <bf/does not/ mean they
are willing to be your parent/sibling/child. But it can't hurt to ask...
<P>
<sect1>My cache registration is not appearing in the Tracker database.
<P>
<itemize>
<item>
Your site will not be listed if your cache IP address does not have
a DNS PTR record. If we can't map the IP address back to a domain
name, it will be listed as ``Unknown.''
<item>
The registration messages are sent with UDP. We may not be receiving
your announcement message due to firewalls which block UDP, or
dropped packets due to congestion.
</itemize>
<sect1>What is the httpd-accelerator mode?
<P>
This entry has been moved to <ref id="what-is-httpd-accelerator" name="a different section">.
<sect1>How do I configure Squid to work behind a firewall?
<p>
<em>Note: The information here is current for version 2.2.</em>
<P>
If you are behind a firewall then you can't make direct connections
to the outside world, so you <bf/must/ use a
parent cache. Squid doesn't use ICP queries for a request if it's
behind a firewall or if there is only one parent.
<P>
You can use the <tt/never_direct/ access list in
<em/squid.conf/ to specify which requests must be forwarded to
your parent cache outside the firewall. For example, if Squid
can connect directly to all servers that end with <em/mydomain.com/, but
must use the parent for all others, you would write:
<verb>
acl INSIDE dstdomain mydomain.com
never_direct deny INSIDE
</verb>
Note that the outside domains will not match the <em/INSIDE/
acl. When there are no matches, the default action is
the opposite of the last action. It's as if there is
an implicit <em/never_direct allow all/ as the final rule.
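<P>
Written out in full, the rule set above behaves roughly like the following
sketch. The <em/all/ acl is defined the same way as in other examples in
this FAQ.
<verb>
acl all src 0.0.0.0/0.0.0.0
acl INSIDE dstdomain mydomain.com
never_direct deny INSIDE
# What Squid assumes when no rule matches: the opposite of the last action
never_direct allow all
</verb>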
<p>
You could also specify internal servers by IP address
<verb>
acl INSIDE_IP dst 1.2.3.4/24
never_direct deny INSIDE_IP
</verb>
Note, however, that when you use IP addresses, Squid must
perform a DNS lookup to convert URL hostnames to an
address. Your internal DNS servers may not be able to
look up external domains.
<p>
If you use <em/never_direct/ and you have multiple parent caches,
then you probably will want to mark one of them as a default
choice in case Squid can't decide which one to use. That is
done with the <em/default/ keyword on a <em/cache_peer/
line. For example:
<verb>
cache_peer xyz.mydomain.com parent 3128 0 default
</verb>
<sect1>How do I configure Squid to forward all requests to another proxy?
<p>
<em>Note: The information here is current for version 2.2.</em>
<p>
First, you need to give Squid a parent cache. Second, you need
to tell Squid it can not connect directly to origin servers. This is done
with three configuration file lines:
<verb>
cache_peer parentcache.foo.com parent 3128 0 no-query default
acl all src 0.0.0.0/0.0.0.0
never_direct allow all
</verb>
Note, with this configuration, if the parent cache fails or becomes
unreachable, then every request will result in an error message.
<p>
In case you want to be able to use direct connections when all the
parents go down you should use a different approach:
<verb>
cache_peer parentcache.foo.com parent 3128 0 no-query
prefer_direct off
</verb>
The default behaviour of Squid in the absence of positive ICP, HTCP, etc
replies is to connect to the origin server instead of using parents.
The <em>prefer_direct off</em> directive tells Squid to try parents first.
<sect1>I have <em/dnsserver/ processes that aren't being used, should I lower the number in <em/squid.conf/?
<P>
The <em/dnsserver/ processes are used by <em/squid/ because the <tt/gethostbyname(3)/ library routine used to
convert web site names to their internet addresses
blocks until the function returns (i.e., the process that calls
it has to wait for a reply). Since there is only one <em/squid/
process, everyone who uses the cache would have to wait each
time the routine was called. This is why the <em/dnsserver/ is
a separate process, so that these processes can block,
without causing blocking in <em/squid/.
<P>
It's very important that there are enough <em/dnsserver/
processes to cope with every access you will need, otherwise
<em/squid/ will stop occasionally. A good rule of thumb is to
make sure you have at least the maximum number of dnsservers
<em/squid/ has <bf/ever/ needed on your system,
and probably add two to be on the safe side. In other words, if
you have only ever seen at most three <em/dnsserver/ processes
in use, make at least five. Remember that a <em/dnsserver/ is
small and, if unused, will be swapped out.
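<P>
Following that rule of thumb, a site that has seen at most three busy
<em/dnsserver/ processes might use the setting below. The value is only
the example figure from the paragraph above, not a recommendation.
<verb>
# three seen in use, plus two spare
dns_children 5
</verb>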
<sect1>My <em/dnsserver/ average/median service time seems high, how can I reduce it?
<P>
First, find out if you have enough <em/dnsserver/ processes running by
looking at the Cachemanager <em/dns/ output. Ideally, you should see
that the first <em/dnsserver/ handles a lot of requests, the second one
less than the first, etc. The last <em/dnsserver/ should have serviced
relatively few requests. If there is not an obvious decreasing trend, then
you need to increase the number of <em/dns_children/ in the configuration
file. If the last <em/dnsserver/ has zero requests, then you definitely
have enough.
<P>
Another factor which affects the <em/dnsserver/ service time is the
proximity of your DNS resolver. Normally we do not recommend running
Squid and <em/named/ on the same host. Instead you should try to use a
DNS resolver (<em/named/) on a different host, but on the same LAN.
If your DNS traffic must pass through one or more routers, this could
be causing unnecessary delays.
<sect1>How can I easily change the default HTTP port?
<P>
Before you run the configure script, simply set the <em/CACHE_HTTP_PORT/
environment variable.
<verb>
setenv CACHE_HTTP_PORT 8080
./configure
make
make install
</verb>
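<P>
If you only need a different port at run time, rather than a different
compiled-in default, the <em/http_port/ directive in <em/squid.conf/ is an
alternative. The port number here just mirrors the example above.
<verb>
http_port 8080
</verb>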
<sect1>Is it possible to control how big each <em/cache_dir/ is?
<P>
With Squid-1.1 it is NOT possible. Each <em/cache_dir/ is assumed
to be the same size. The <em/cache_swap/ setting defines the size of
all <em/cache_dir/'s taken together. If you have N <em/cache_dir/'s
then each one will hold <em/cache_swap/ &divide; N Megabytes.
<sect1>What <em/cache_dir/ size should I use?
<p>
Most people have a disk partition dedicated to the Squid cache.
You don't want to use the entire partition size. You have to leave
some extra room. Currently, Squid is not very tolerant of running
out of disk space.
<p>
Let's say you have a 9GB disk.
Remember that disk manufacturers lie about the space available.
A so-called 9GB disk usually results in about 8.5GB of raw, usable space.
First, put a filesystem on it, and mount
it. Then check the ``available space'' with your <em/df/ program.
Note that you lose some disk space to filesystem overheads, like superblocks,
inodes, and directory entries. Also note that Unix normally keeps
10% free for itself. So with a 9GB disk, you're probably down to
about 8GB after formatting.
<p>
Next, I suggest taking off another 10%
or so for Squid overheads, and a ``safe buffer.'' Squid normally puts
its <em/swap.state/ files in each cache directory. These grow in size
until you rotate the logs, or restart squid.
Also note that Squid performs better when there is
more free space. So if performance is important to you, then take off
even more space. Typically, for a 9GB disk, I recommend a <em/cache_dir/
setting of 6000 to 7500 Megabytes:
<verb>
cache_dir ... 7000 16 256
</verb>
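<P>
The arithmetic above can be sketched as a small Bourne shell function.
The function name and the default reserve of 10 percent are illustrative
assumptions; feed it the ``available space'' that <em/df/ reports, in
Megabytes.
<verb>
#!/bin/sh
# suggest_cache_dir_mb AVAILABLE_MB [RESERVE_PERCENT]
# Prints a cache_dir size in Megabytes after holding back a reserve
# (default 10 percent) for Squid overheads and a safety buffer.
suggest_cache_dir_mb() {
    avail_mb=$1
    reserve=${2:-10}
    echo $(( avail_mb * (100 - reserve) / 100 ))
}

suggest_cache_dir_mb 8000      # about 8GB left after formatting
suggest_cache_dir_mb 8000 25   # hold back more when performance matters
</verb>
With 8000 Megabytes available this prints 7200 and 6000, both within the
range suggested above.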
<p>
It's better to start out conservative. After the cache becomes full,
look at the disk usage. If you think there is plenty of unused space,
then increase the <em/cache_dir/ setting a little.
<p>
If you're getting ``disk full'' write errors, then you definitely need
to decrease your cache size.
<sect1>I'm adding a new <em/cache_dir/. Will I lose my cache?
<P>
With Squid-1.1, yes, you will lose your cache. This is because
version 1.1 uses a simplistic algorithm to distribute files
between cache directories.
<P>
With Squid-2, you will not lose your existing cache.
You can add and delete <em/cache_dir/'s without affecting
any of the others.
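<P>
As a sketch, adding a second directory in Squid-2 means adding one more
<em/cache_dir/ line and then running Squid with the <em/-z/ option again,
as the description of that option requires. The paths below are
hypothetical, and some Squid-2 releases expect a storage type before the
path, so check the comments in your own <em/squid.conf/.
<verb>
# existing directory, left unchanged
cache_dir /cache1 7000 16 256
# newly added directory
cache_dir /cache2 7000 16 256
</verb>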
<sect1>Squid and <em/http-gw/ from the TIS toolkit.
<P>
Several people on both the <em/fwtk-users/ and the
<em/squid-users/ mailing lists have asked
about using Squid in combination with http-gw from the
<url url="http://www.tis.com/"
name="TIS toolkit">.
The most elegant way in my opinion is to run an internal Squid caching
proxyserver which handles client requests and let this server forward
its requests to the http-gw running on the firewall. Cache hits won't
need to be handled by the firewall.
<P>
In this example Squid runs on the same server as the http-gw. Squid uses
port 8000 and http-gw uses port 8080 (web). The local domain is <em/home.nl/.
<sect2>Firewall configuration:
<P>
Either run http-gw as a daemon from the <em>/etc/rc.d/rc.local</em> (Linux
Slackware):
<verb>
exec /usr/local/fwtk/http-gw -daemon 8080
</verb>
or run it from inetd like this:
<verb>
web stream tcp nowait.100 root /usr/local/fwtk/http-gw http-gw
</verb>
I increased the watermark to 100 because a lot of people run into
problems with the default value.
<P>
Make sure you have at least the following line in
<em>/usr/local/etc/netperm-table</em>:
<verb>
http-gw: hosts 127.0.0.1
</verb>
You could add the IP-address of your own workstation to this rule and
make sure the http-gw by itself works, like:
<verb>
http-gw: hosts 127.0.0.1 10.0.0.1
</verb>
<sect2>Squid configuration:
<P>
The following settings are important:
<verb>
http_port 8000
icp_port 0
cache_host localhost.home.nl parent 8080 0 default
acl HOME dstdomain .home.nl
never_direct deny HOME
</verb>
This tells Squid to use the parent for all domains other than <em/home.nl/.
Below, <em/access.log/ entries show what happens if you do a reload on the
Squid-homepage:
<verb>
872739961.631 1566 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://www.squid-cache.org/ - DEFAULT_PARENT/localhost.home.nl -
872739962.976 1266 10.0.0.21 TCP_CLIENT_REFRESH/304 88 GET http://www.nlanr.net/Images/cache_now.gif - DEFAULT_PARENT/localhost.home.nl -
872739963.007 1299 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://www.squid-cache.org/Icons/squidnow.gif - DEFAULT_PARENT/localhost.home.nl -
872739963.061 1354 10.0.0.21 TCP_CLIENT_REFRESH/304 83 GET http://www.squid-cache.org/Icons/Squidlogo2.gif - DEFAULT_PARENT/localhost.home.nl
</verb>
<P>
http-gw entries in syslog:
<verb>
Aug 28 02:46:00 memo http-gw[2052]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:00 memo http-gw[2052]: log host=localhost/127.0.0.1 protocol=HTTP cmd=dir dest=www.squid-cache.org path=/
Aug 28 02:46:01 memo http-gw[2052]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=1
Aug 28 02:46:01 memo http-gw[2053]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:01 memo http-gw[2053]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.squid-cache.org path=/Icons/Squidlogo2.gif
Aug 28 02:46:01 memo http-gw[2054]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:01 memo http-gw[2054]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.squid-cache.org path=/Icons/squidnow.gif
Aug 28 02:46:01 memo http-gw[2055]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:01 memo http-gw[2055]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.nlanr.net path=/Images/cache_now.gif
Aug 28 02:46:02 memo http-gw[2055]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=1
Aug 28 02:46:03 memo http-gw[2053]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=2
Aug 28 02:46:04 memo http-gw[2054]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=3
</verb>
<P>
To summarize:
<P>
Advantages:
<itemize>
<item>
http-gw allows you to selectively block ActiveX and Java, and its
primary design goal is security.
<item>
The firewall doesn't need to run large applications like Squid.
<item>
The internal Squid-server still gives you the benefit of caching.
</itemize>
<P>
Disadvantages:
<itemize>
<item>
The internal Squid proxyserver can't (and shouldn't) work with other
parent or neighbor caches.
<item>
Initial requests are slower because they go through http-gw, which
also does reverse lookups. Run a nameserver on the firewall or use an
internal nameserver.
</itemize>
<quote>
--<url url="mailto:RvdOever@baan.nl" name="Rodney van den Oever">
</quote>
<sect1>What is ``HTTP_X_FORWARDED_FOR''? Why does squid provide it to WWW servers, and how can I stop it?
<P>
When a proxy-cache is used, a server does not see the connection
coming from the originating client. Many people like to implement
access controls based on the client address.
To accommodate these people, Squid adds its own request header
called ``X-Forwarded-For'' which looks like this:
<verb>
X-Forwarded-For: 128.138.243.150, unknown, 192.52.106.30
</verb>
Entries are always IP addresses, or the word <em/unknown/ if the address
could not be determined or if it has been disabled with the
<em/forwarded_for/ configuration option.
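<P>
A minimal sketch of disabling it in <em/squid.conf/, after which Squid
sends <em/unknown/ in place of the client address:
<verb>
forwarded_for off
</verb>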
<P>
We must note that access controls based on this header are extremely
weak and simple to fake. Anyone may hand-enter a request with any IP
address whatsoever. This is perhaps the reason why client IP addresses
have been omitted from the HTTP/1.1 specification.
<sect1>Can Squid anonymize HTTP requests?
<p>
Yes, it can. However, the way of doing it has changed from earlier versions
of squid. As of squid-2.2 a more customisable method has been introduced.
Please follow the instructions for the version of squid that you are using.
As a default, no anonymizing is done.
<p>
If you choose to use the anonymizer, you might wish to investigate the <em/forwarded_for/
option to prevent the client address being disclosed. Failure to turn off the
<em/forwarded_for/ option will reduce the effectiveness of the anonymizer. Finally,
if you filter the User-Agent header, using the <em/fake_user_agent/ option can
prevent some user problems, as some sites require the User-Agent header.
<sect2>Squid 2.2
<p>
With the introduction of squid 2.2 the anonymizer has become more customisable.
It now allows specification of exactly which headers will be allowed to pass.
The new anonymizer uses the 'anonymize_headers' tag. It has two modes. The first
denies all headers except those specified. The following example simulates the old
paranoid mode.
<verb>
anonymize_headers allow Allow Authorization Cache-Control
anonymize_headers allow Content-Encoding Content-Length
anonymize_headers allow Content-Type Date Expires Host
anonymize_headers allow If-Modified-Since Last-Modified
anonymize_headers allow Location Pragma Accept Accept-Charset
anonymize_headers allow Accept-Encoding Accept-Language
anonymize_headers allow Content-Language Mime-Version
anonymize_headers allow Retry-After Title Connection
anonymize_headers allow Proxy-Connection
</verb>
This will prevent any headers other than those listed from being passed by the
proxy.
<p>
The second mode allows all headers except those specified. The example
replicates the old standard mode.
<verb>
anonymize_headers deny From Referer Server
anonymize_headers deny User-Agent WWW-Authenticate Link
</verb>
It allows all headers to pass unless they are listed.
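<P>
Combining this mode with the <em/fake_user_agent/ option mentioned earlier
might look like the sketch below. The agent string is an arbitrary
placeholder, not a recommended value.
<verb>
anonymize_headers deny From Referer Server
anonymize_headers deny User-Agent WWW-Authenticate Link
# Sites that insist on a User-Agent header still receive one
fake_user_agent Mozilla/4.0 (compatible)
</verb>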
<p>
You cannot mix allow and deny in a squid configuration; it is either one
or the other!
<sect2>Squid 2.1 and Earlier
<P>
There are three modes: <em/none/, <em/standard/, and
<em/paranoid/. The mode is set with the <em>http_anonymizer</em>
configuration option.
<P>
With no anonymizing (the default), Squid forwards all request headers
as received from the client, to the origin server (subject to the regular
rules of HTTP).
<P>
In the <em/standard/ mode, Squid filters out the following specific request
headers:
<itemize>
<item>From:
<item>Referer:
<item>Server:
<item>User-Agent:
<item>WWW-Authenticate:
<item>Link:
</itemize>
<P>
In the <em/paranoid/ mode, Squid allows only the following specific
request headers:
<itemize>
<item>Allow:
<item>Authorization:
<item>Cache-Control:
<item>Content-Encoding:
<item>Content-Length:
<item>Content-Type:
<item>Date:
<item>Expires:
<item>Host:
<item>If-Modified-Since:
<item>Last-Modified:
<item>Location:
<item>Pragma:
<item>Accept:
<item>Accept-Charset:
<item>Accept-Encoding:
<item>Accept-Language:
<item>Content-Language:
<item>Mime-Version:
<item>Retry-After:
<item>Title:
<item>Connection:
<item>Proxy-Connection:
</itemize>
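<P>
For example, selecting the paranoid mode in a Squid 2.1 or earlier
<em/squid.conf/ would presumably be a single line. This is a sketch based
only on the option and mode names given above.
<verb>
http_anonymizer paranoid
</verb>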
<P>
References:
<url url="http://www.iks-jena.de/mitarb/lutz/anon/web.en.html"
name="Anonymous WWW">
<sect1>Can I make Squid go direct for some sites?
<p>
Sure, just use the <em/always_direct/ access list.
<p>
For example, if you want Squid to connect directly to <em/hotmail.com/ servers,
you can use these lines in your config file:
<verb>
acl hotmail dstdomain .hotmail.com
always_direct allow hotmail
</verb>
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>Communication between browsers and Squid
<P>
Most web browsers available today support proxying and are easily configured
to use a Squid server as a proxy. Some browsers support advanced features
such as lists of domains or URL patterns that shouldn't be fetched through
the proxy, or JavaScript automatic proxy configuration.
<sect1>Netscape manual configuration
<P>
Select <bf/Network Preferences/ from the
<bf/Options/ menu. On the <bf/Proxies/
page, click the radio button next to <bf/Manual Proxy
Configuration/ and then click on the <bf/View/
button. For each protocol that your Squid server supports (by default,
HTTP, FTP, and gopher) enter the Squid server's hostname or IP address
and put the HTTP port number for the Squid server (by default, 3128) in
the <bf/Port/ column. For any protocols that your Squid
does not support, leave the fields blank.
<P>
Here is a
<url url="/Doc/FAQ/navigator.jpg"
name="screen shot"> of the Netscape Navigator manual proxy
configuration screen.
<P>
<sect1>Netscape automatic configuration
<label id="netscape-pac">
<P>
Netscape Navigator's proxy configuration can be automated with
JavaScript (for Navigator versions 2.0 or higher). Select
<bf/Network Preferences/ from the <bf/Options/
menu. On the <bf/Proxies/ page, click the radio button
next to <bf/Automatic Proxy Configuration/ and then
fill in the URL for your JavaScript proxy configuration file in the
text box. The box is too small, but the text will scroll to the
right as you go.
<P>
Here is a
<url url="/Doc/FAQ/navigator-auto.jpg"
name="screen shot">
of the Netscape Navigator automatic proxy configuration screen.
You may also wish to consult Netscape's documentation for the Navigator
<url
url="http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/proxy-live.html"
name="JavaScript proxy configuration">.
<P>
Here is a sample auto configuration JavaScript from Oskar Pearson:
<code>
//We (www.is.co.za) run a central cache for our customers that they
//access through a firewall - thus if they want to connect to their intranet
//system (or anything in their domain at all) they have to connect
//directly - hence all the "fiddling" to see if they are trying to connect
//to their local domain.
//Replace each occurrence of company.com with your domain name
//and if you have some kind of intranet system, make sure
//that you put its name in place of "internal" below.
//We also assume that your cache is called "cache.company.com", and
//that it runs on port 8080. Change it down at the bottom.
//(C) Oskar Pearson and the Internet Solution (http://www.is.co.za)
function FindProxyForURL(url, host)
{
//If they have only specified a hostname, go directly.
if (isPlainHostName(host))
return "DIRECT";
//These connect directly if the machine they are trying to
//connect to starts with "intranet" - ie http://intranet
//Connect directly if it is intranet.*
//If you have another machine that you want them to
//access directly, replace "internal*" with that
//machine's name
if (shExpMatch( host, "intranet*")||
shExpMatch(host, "internal*"))
return "DIRECT";
//Connect directly to our domains (NB for Important News)
if (dnsDomainIs( host,"company.com")||
//If you have another domain that you wish to connect to
//directly, put it in here
dnsDomainIs(host,"sistercompany.com"))
return "DIRECT";
//So the error message "no such host" will appear through the
//normal Netscape box - less support queries :)
if (!isResolvable(host))
return "DIRECT";
//We only cache http, ftp and gopher
if (url.substring(0, 5) == "http:" ||
url.substring(0, 4) == "ftp:"||
url.substring(0, 7) == "gopher:")
//Change the ":8080" to the port that your cache
//runs on, and "cache.company.com" to the machine that
//you run the cache on
return "PROXY cache.company.com:8080; DIRECT";
//We don't cache WAIS
if (url.substring(0, 5) == "wais:")
return "DIRECT";
else
return "DIRECT";
}
</code>
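<P>
To check the decision order of a script like the one above without a
browser, the rules can be mirrored in a few lines of Python. This is only a
rough sketch: the helper functions are approximations of the PAC built-ins
(suffix match for <em/dnsDomainIs/, shell glob for <em/shExpMatch/), the
<em/isResolvable/ test is left out because it needs live DNS, and all Python
names are made up for illustration.

```python
from fnmatch import fnmatchcase


def is_plain_host_name(host):
    """Approximates PAC isPlainHostName(): true when the host has no dots."""
    return "." not in host


def dns_domain_is(host, domain):
    """Approximates PAC dnsDomainIs() as a plain suffix match."""
    return host.endswith(domain)


def sh_exp_match(text, pattern):
    """Approximates PAC shExpMatch() with a case-sensitive shell glob."""
    return fnmatchcase(text, pattern)


def find_proxy_for_url(url, host):
    """Mirror of the sample script's rules (the isResolvable check is omitted)."""
    if is_plain_host_name(host):
        return "DIRECT"
    if sh_exp_match(host, "intranet*") or sh_exp_match(host, "internal*"):
        return "DIRECT"
    if dns_domain_is(host, "company.com") or dns_domain_is(host, "sistercompany.com"):
        return "DIRECT"
    # Only http, ftp and gopher go through the cache.
    if url.startswith(("http:", "ftp:", "gopher:")):
        return "PROXY cache.company.com:8080; DIRECT"
    return "DIRECT"


if __name__ == "__main__":
    for url, host in [("http://intranet/", "intranet"),
                      ("http://www.example.org/", "www.example.org"),
                      ("wais://wais.example.org/", "wais.example.org")]:
        print(url, "->", find_proxy_for_url(url, host))
```

Running the file prints the decision for each test URL, which makes it easy
to spot a rule that is shadowed by an earlier one.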
<sect1>Lynx and Mosaic configuration
<P>
For Mosaic and Lynx, you can set environment variables
before starting the application. For example (assuming csh or tcsh):
<P>
<verb>
% setenv http_proxy http://mycache.example.com:3128/
% setenv gopher_proxy http://mycache.example.com:3128/
% setenv ftp_proxy http://mycache.example.com:3128/
</verb>
<P>
For Lynx you can also edit the <em/lynx.cfg/ file to configure
proxy usage. This has the added benefit of causing all Lynx users on
a system to access the proxy without making environment variable changes
for each user. For example:
<verb>
http_proxy:http://mycache.example.com:3128/
ftp_proxy:http://mycache.example.com:3128/
gopher_proxy:http://mycache.example.com:3128/
</verb>
<sect1>Redundant Auto-Proxy-Configuration
<P>
There's one nasty side-effect to using auto-proxy scripts: every time you
start the web browser, it tries to load the auto-proxy script.
<P>
If your script isn't available either because the web server hosting the
script is down or your workstation can't reach the web server (e.g.
because you're working off-line with your notebook and just want to
read a previously saved HTML-file) you'll get different errors depending
on the browser you use.
<P>
The Netscape browser will just return an error after a timeout (after
that it tries to find the site 'www.proxy.com' if the script you use is
called 'proxy.pac').
<P>
Microsoft Internet Explorer, on the other hand, won't even start: no
window is displayed, and only after about one minute does it show a dialog
asking whether you want to go on with or without proxy configuration.
<P>
The point is that your workstations always need to locate the
proxy-script. I created some extra redundancy by hosting the script on
two web servers (actually Apache web servers on the proxy servers
themselves) and adding the following records to my primary nameserver:
<verb>
proxy CNAME proxy1
CNAME proxy2
</verb>
The clients just refer to 'http://proxy/proxy.pac'. This script looks like this:
<verb>
function FindProxyForURL(url,host)
{
// Hostname without domainname or host within our own domain?
// Try them directly:
// http://www.domain.com actually lives before the firewall, so
// make an exception:
if ((isPlainHostName(host)||dnsDomainIs( host,".domain.com")) &&
!localHostOrDomainIs(host, "www.domain.com"))
return "DIRECT";
// First try proxy1 then proxy2. One server mostly caches '.com'
// to make sure both servers are not
// caching the same data in the normal situation. The other
// server caches the other domains normally.
// If one of them is down the client will try the other server.
else if (shExpMatch(host, "*.com"))
return "PROXY proxy1.domain.com:8080; PROXY proxy2.domain.com:8081; DIRECT";
return "PROXY proxy2.domain.com:8081; PROXY proxy1.domain.com:8080; DIRECT";
}
</verb>
<P>
I made sure every client domain has the appropriate 'proxy' entry.
The clients are automatically configured with two nameservers using
DHCP.
<quote>
--<url url="mailto:RvdOever@baan.nl"
name="Rodney van den Oever">
</quote>
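<P>
The failover behaviour relies on the browser walking the semicolon-separated
list returned by <em/FindProxyForURL/ from left to right. The following
Python sketch shows that ordering; it is not browser code, and the
function names and the <em/is_up/ callback are hypothetical helpers
introduced only for this illustration.

```python
def parse_pac_result(result):
    """Split a FindProxyForURL() return string into (kind, address) pairs."""
    choices = []
    for entry in result.split(";"):
        fields = entry.split()
        if not fields:
            continue  # skip empty entries such as a trailing ';'
        kind = fields[0].upper()
        address = fields[1] if len(fields) > 1 else None
        choices.append((kind, address))
    return choices


def first_usable(choices, is_up):
    """Return the first entry that is DIRECT or whose proxy is_up() accepts."""
    for kind, address in choices:
        if kind == "DIRECT" or is_up(address):
            return kind, address
    return None


if __name__ == "__main__":
    result = "PROXY proxy1.domain.com:8080; PROXY proxy2.domain.com:8081; DIRECT"
    choices = parse_pac_result(result)
    # Pretend proxy1 is down: the second entry is picked.
    print(first_usable(choices, lambda addr: addr != "proxy1.domain.com:8080"))
```

With both proxies reported down, the same call falls through to the final
<em/DIRECT/ entry.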
<sect1>Microsoft Internet Explorer configuration
<P>
Select <bf/Options/ from the <bf/View/
menu. Click on the <bf/Connection/ tab. Tick the
<bf/Connect through Proxy Server/ option and hit the
<bf/Proxy Settings/ button. For each protocol that
your Squid server supports (by default, HTTP, FTP, and gopher)
enter the Squid server's hostname or IP address and put the HTTP
port number for the Squid server (by default, 3128) in the
<bf/Port/ column. For any protocols that your Squid
does not support, leave the fields blank.
<P>
Here is a
<url url="/Doc/FAQ/msie.jpg"
name="screen shot"> of the Internet Explorer proxy
configuration screen.
<P>
Microsoft is also starting to support Netscape-style JavaScript
automated proxy configuration. As of now, only MSIE version 3.0a
for Windows 3.1 and Windows NT 3.51 supports this feature (i.e.,
as of version 3.01 build 1225 for Windows 95 and NT 4.0, the feature
was not included).
<P>
If you have a version of MSIE that does have this feature, select
<bf/Options/ from the <bf/View/ menu.
Click on the <bf/Advanced/ tab. In the lower left-hand
corner, click on the <bf/Automatic Configuration/
button. Fill in the URL for your JavaScript file in the dialog
box it presents you. Then exit MSIE and restart it for the changes
to take effect. MSIE will reload the JavaScript file every time
it starts.
<sect1>Netmanage Internet Chameleon WebSurfer configuration
<P>
Netmanage WebSurfer supports manual proxy configuration and exclusion
lists for hosts or domains that should not be fetched via proxy
(this information is current as of WebSurfer 5.0). Select
<bf/Preferences/ from the <bf/Settings/
menu. Click on the <bf/Proxies/ tab. Select the
<bf/Use Proxy/ options for HTTP, FTP, and gopher. For
each protocol that your Squid server supports, enter the Squid server's hostname or IP address
and put the HTTP port number for the Squid server (by default,
3128) in the <bf/Port/ boxes. For any protocols that
your Squid does not support, leave the fields blank.
<P>
Take a look at this
<url url="/Doc/FAQ/netmanage.jpg"
name="screen shot">
if the instructions confused you.
<P>
On the same configuration window, you'll find a button to bring up
the exclusion list dialog box, which will let you enter some hosts
or domains that you don't want fetched via proxy. It should be
self-explanatory, but you might look at this
<url url="/Doc/FAQ/netmanage-exclusion.jpg"
name="screen shot">
just for fun anyway.
<sect1>Opera 2.12 proxy configuration
<P>
Select <em/Proxy Servers.../ from the <em/Preferences/ menu. Check each
protocol that your Squid server supports (by default, HTTP, FTP, and
Gopher) and enter the Squid server's address as hostname:port (e.g.
mycache.example.com:3128 or 123.45.67.89:3128). Click on <em/Okay/ to accept the
setup.
<P>
Notes:
<itemize>
<item>
Opera 2.12 doesn't support gopher on its own, but requires a proxy; therefore
Squid's gopher proxying can extend the utility of your Opera immensely.
<item>
Unfortunately, Opera 2.12 chokes on some HTTP requests, for example
<url url="http://spam.abuse.net/spam/"
name="abuse.net">.
At the moment I think it has something to do with cookies. If you have
trouble with a site, try disabling the HTTP proxying by unchecking
that protocol in the <em/Preferences/|<em/Proxy Servers.../ dialogue. Opera will
remember the address, so reenabling is easy.
</itemize>
<quote>
--<url url="mailto:hclsmith@tallships.istar.ca" name="Hume Smith">
</quote>
<sect1>How do I tell Squid to use a specific username for FTP urls?
<P>
Insert your username in the host part of the URL, for example:
<verb>
ftp://joecool@ftp.foo.org/
</verb>
Squid should then prompt you for your account password. Alternatively,
you can specify both your username and password in the URL itself:
<verb>
ftp://joecool:secret@ftp.foo.org/
</verb>
However, we certainly do not recommend this, as it could be very
easy for someone to see or grab your password.
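<P>
To see which part of such a URL is treated as the username, password and
host, you can take it apart with Python's standard <em/urllib.parse/
module. This is merely an illustration of the URL syntax, not of Squid's own
parser; the function name <em/ftp_login/ is made up.

```python
from urllib.parse import urlsplit


def ftp_login(url):
    """Return (username, password, hostname) as encoded in an FTP URL."""
    parts = urlsplit(url)
    return parts.username, parts.password, parts.hostname


if __name__ == "__main__":
    print(ftp_login("ftp://joecool@ftp.foo.org/"))
    print(ftp_login("ftp://joecool:secret@ftp.foo.org/"))
```

When no password is given in the URL, the password element is
<em/None/, which corresponds to the case where you are prompted for it.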
<sect1>Configuring Browsers for WPAD
<P>
by <url url="mailto:mark@rts.com.au" name="Mark Reynolds">
<P>
You may like to start by reading the
<url url="http://www.ietf.org/internet-drafts/draft-ietf-wrec-wpad-01.txt" name="Internet-Draft">
that describes WPAD.
<P>
After reading the 8 steps below, if you don't understand any of the
terms or methods mentioned, you probably shouldn't be doing this.
Implementing wpad requires you to <bf/fully/ understand:
<enum>
<item>web server installations and modifications.
<item>squid proxy server (or others) installation etc.
<item>Domain Name System maintenance etc.
</enum>
Please don't bombard the squid list with web server or dns questions. See
your system administrator, or do some more research on those topics.
<P>
This is not a recommendation for any product or version. As far as I
know IE5 is the only browser out now implementing wpad. I think wpad
is an excellent feature that will return several hours of life per month.
Hopefully, all browser clients will implement it as well, but it will take
years for all the older browsers to fade away.
<P>
I have only focused on the domain name method, to the exclusion of the
DHCP method. I think the dns method might be easier for most people.
I don't currently, and may never, fully understand wpad and IE5, but this
method worked for me. It <bf/may/ work for you.
<P>
But if you'd rather just have a go ...
<enum>
<item>
Create a standard <ref id="netscape-pac" name="netscape auto
proxy config file">. The sample provided there is more than
adequate to get you going. No doubt all the other load balancing
and backup scripts will be fine also.
<item>
Store the resultant file in the document root directory of a
handy web server as <em/wpad.dat/ (Not <em/proxy.pac/ as you
may have previously done.)
<p>
<url url="mailto:ira at racoon.riga.lv" name="Andrei Ivanov">
notes that you should be able to use an HTTP redirect if you
want to store the wpad.dat file somewhere else. You can probably
even redirect <em/wpad.dat/ to <em/proxy.pac/:
<verb>
Redirect /wpad.dat http://racoon.riga.lv/proxy.pac
</verb>
<item>
If you do nothing more, a url like
<tt>http://www.your.domain.name/wpad.dat</tt> should bring up
the script text in your browser window.
<item>
Insert the following entry into your web server <em/mime.types/ file.
Maybe in addition to your pac file type, if you've done this before.
<verb>
application/x-ns-proxy-autoconfig dat
</verb>
Then restart your web server so that the new MIME type takes effect.
<item>
Assuming Internet Explorer 5, under <em/Tools/, <em/Internet
Options/, <em/Connections/, <em/Settings/ <bf/or/ <em/Lan
Settings/, set <bf/ONLY/ <em/Use Automatic Configuration Script/
to be the URL for where your new <em/wpad.dat/ file can be found.
i.e. <tt>http://www.your.domain.name/wpad.dat</tt>. Test that
it all works as per your script and network. There's no point
continuing until this works ...
<item>
Create/install/implement a DNS record so that
<tt>wpad.your.domain.name</tt> resolves to the host above where
you have a functioning auto config script running. You should
now be able to use <tt>http://wpad.your.domain.name/wpad.dat</tt>
as the Auto Config Script location in step 5 above.
<item>
And finally, go back to the setup screen detailed in 5 above,
and choose nothing but the <em/Automatically Detect Settings/
option, turning everything else off. Best to restart IE5, as
you normally do with any Microsoft product... And it should all
work. Did for me anyway.
<item>
One final question might be 'Which domain name does the client
(IE5) use for the wpad... lookup?' It uses the hostname from
the control panel setting. It starts the search by adding the
hostname "WPAD" to the current fully-qualified domain name. For
instance, a client in a.b.Microsoft.com would search for a WPAD
server at wpad.a.b.microsoft.com. If it could not locate one,
it would remove the bottom-most domain and try again; for
instance, it would try wpad.b.microsoft.com next. IE 5 would
stop searching when it found a WPAD server or reached the
third-level domain, wpad.microsoft.com.
</enum>
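<P>
The search order described in the last step can be written down as a small
Python function. It is only a sketch of the behaviour as described here (strip
the left-most label until the third-level name is reached), not a
statement about what any particular IE build really does; the function name
is hypothetical.

```python
def wpad_candidates(client_domain):
    """List the WPAD host names tried for a client in client_domain.

    Starts at wpad.<client_domain>, drops the left-most label each round,
    and stops once the third-level name (wpad.<domain>.<tld>) was produced.
    """
    labels = client_domain.lower().strip(".").split(".")
    candidates = []
    while len(labels) >= 2:
        candidates.append("wpad." + ".".join(labels))
        labels = labels[1:]
    return candidates


if __name__ == "__main__":
    for name in wpad_candidates("a.b.Microsoft.com"):
        print(name)
```

Each name in the returned list needs a DNS record only if you want clients
at that level to find the script there; the first one that resolves wins.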
<P>
Anybody using these steps to install and test, please feel free to make
notes, corrections or additions for improvements, and post back to the
squid list...
<P>
There are probably many more tricks and tips which hopefully will be
detailed here in the future. Things like <em/wpad.dat/ files being served
from the proxy server themselves, maybe with a round robin dns setup
for the WPAD host.
<sect1>IE 5.0x crops trailing slashes from FTP URLs
<p>
by <url url="mailto:reuben at reub dot net" name="Reuben Farrelly">
<p>
There was a bug in the 5.0x releases of Internet Explorer in which IE
cropped any trailing slash off an FTP URL. The URL showed up correctly in
the browser's ``Address:'' field; however, Squid logs showed that the
trailing slash was being taken off.
<p>
An example of where this affected Squid is a setup in which Squid
would go direct for FTP directory listings but forward a request to a
parent for FTP file transfers. This was useful if your upstream proxy was
an older version of Squid or another vendor's software which displayed
directory listings with broken icons and you wanted your own local version
of Squid to generate proper FTP directory listings instead.
<p>
The workaround for this is to add a double slash to any directory listing
in which the slash was important, or else upgrade to IE 5.5 (or use Netscape).
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>Squid Log Files
<P>
The logs are a valuable source of information about Squid workloads and
performance. The logs record not only access information, but also system
configuration errors and resource consumption (eg, memory, disk
space). There are several log files maintained by Squid. Some have to be
explicitly activated at compile time; others can safely be deactivated
at run time.
<P>
There are a few basic points common to all log files. The time stamps
logged into the log files are usually UTC seconds unless stated otherwise.
The initial time stamp usually contains a millisecond extension.
<P>
The frequent time lookups on busy caches may have a performance impact on
some systems. The compile time configuration option
<em/--enable-time-hack/ makes Squid only look up a new time in one
second intervals. The implementation uses Unix's <em/alarm()/
functionality. Note that the resolution of logged times is much coarser
afterwards, and may not suffice for some log file analysis programs.
Usually there is no need to fiddle with the timestamp hack.
<sect1><em/squid.out/
<P>
If you run your Squid from the <em/RunCache/ script, a file
<em/squid.out/ contains the Squid startup times, and also all fatal
errors, e.g. as produced by an <em/assert()/ failure. If you are not
using <em/RunCache/, you will not see such a file.
<sect1><em/cache.log/
<P>
The <em/cache.log/ file contains the debug and error messages that Squid
generates. If you start your Squid using the default <em/RunCache/ script,
or start it with the <em/-s/ command line option, a copy of certain
messages will go into your syslog facilities. Whether to use a separate
file for the Squid log data is a matter of personal preference.
<P>
For automatic log file analysis, the <em/cache.log/ file does
not have much to offer. You will usually look into this file for automated
error reports, when programming Squid, when testing new features, or when
searching for the cause of a perceived misbehaviour.
<sect1><em/useragent.log/
<P>
The user agent log file is only maintained if
<enum>
<item>you configured the compile time <em/--enable-useragent-log/
option, and
<item>you pointed the <em/useragent_log/ configuration option to a
file.
</enum>
<P>
From the user agent log file you can find out about the distribution
of browsers among your clients. Enabling this option on a heavily loaded
production Squid might not be the best idea.
<sect1><em/store.log/
<P>
The <em/store.log/ file covers the objects currently kept on disk or
removed ones. As a kind of transaction log it is usually used for
debugging purposes. A definitive statement about whether an object resides
on your disks is only possible after analysing the <em/complete/ log file.
The release (deletion) of an object may be logged at a later time than the
swap out (save to disk).
<P>
The <em/store.log/ file may be of interest to log file analysis which
looks into the objects on your disks and the time they spend there, or how
many times a hot object was accessed. The latter may be covered by another
log file, too. With knowledge of the <em/cache_dir/ configuration option,
this log file allows for a URL to filename mapping without recursing your
cache disks. However, the Squid developers recommend treating
<em/store.log/ primarily as a debug file, and so should you, unless you
know what you are doing.
<P>
The print format for a store log entry (one line) consists of eleven
space-separated columns, compare with the <em/storeLog()/ function in file
<em>src/store_log.c</em>:
<verb>
"%9d.%03d %-7s %08X %4d %9d %9d %9d %s %d/%d %s %s\n"
</verb>
<descrip>
<tag/time/
<P>
The timestamp when the line was logged in UTC with a millisecond fraction.
<tag/action/
<P>
The action the object was subjected to, compare with <em>src/store_log.c</em>:
<itemize>
<item><bf/CREATE/ Seems to be unused.
<item><bf/RELEASE/ The object was removed from the cache (see also
<ref id="log-fileno" name="file number">).
<item><bf/SWAPOUT/ The object was saved to disk.
<item><bf/SWAPIN/ The object existed on disk and was read into memory.
</itemize>
<tag/file number/<label id="log-fileno">
<P>
The file number for the object storage file. Please note that the path to
this file is calculated according to your <em/cache_dir/ configuration.
<P>
A file number of <em/FFFFFFFF/ denotes "memory only" objects. Any
action code for such a file number refers to an object which existed only
in memory, not on disk. For instance, if a <em/RELEASE/ code was logged
with file number <em/FFFFFFFF/, the object existed only in memory, and was
released from memory.
<tag/status/
<P>
The HTTP reply status code.
<tag/datehdr/<label id="log-Date">
<P>
The value of the HTTP "Date: " reply header.
<tag/lastmod/<label id="log-LM">
<P>
The value of the HTTP "Last-Modified: " reply header.
<tag/expires/<label id="log-Expires">
<P>
The value of the HTTP "Expires: " reply header.
<tag/type/
<P>
The HTTP "Content-Type" major value, or "unknown" if it cannot be
determined.
<tag/sizes/
<P>
This column consists of two slash separated fields:
<enum>
<item>The advertised content length from the HTTP "Content-Length: " reply
header.
<item>The size actually read.
</enum>
<P>
If the advertised (or expected) length is missing, it will be set to
zero. If the advertised length is not zero, but not equal to the real
length, the object will be released from the cache.
<tag/method/
<P>
The request method for the object, e.g. <em/GET/.
<tag/key/<label id="log-key">
<P>
The key to the object, usually the URL.
</descrip>
<P>
The timestamps in the columns <ref id="log-Date" name="Date"> to
<ref id="log-Expires" name="Expires"> are all expressed in UTC seconds. The
actual values are parsed from the HTTP reply headers. An unparsable header
is represented by a value of -1, and a missing header is represented by a
value of -2.
<P>
The column <ref id="log-key" name="key"> usually contains just the URL of
the object. Some objects though will never become public. Thus the key is
said to include a unique integer number and the request method in addition
to the URL.
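<P>
For those who do want to look at <em/store.log/ with a script, the
eleven columns described above can be split as in the following Python
sketch. The field names follow the descriptions above, the sample line is
invented for the example, and real entries may contain cases this
simple parser does not handle.

```python
from collections import namedtuple

StoreEntry = namedtuple(
    "StoreEntry",
    "time action fileno status datehdr lastmod expires "
    "content_type expected_len real_len method key",
)

MEMORY_ONLY = 0xFFFFFFFF  # file number logged for "memory only" objects


def parse_store_line(line):
    """Parse one store.log line into a StoreEntry (raises ValueError if short)."""
    fields = line.strip().split(None, 10)
    if len(fields) != 11:
        raise ValueError("expected 11 columns, got %d" % len(fields))
    expected_len, real_len = fields[8].split("/", 1)
    return StoreEntry(
        time=float(fields[0]),
        action=fields[1],
        fileno=int(fields[2], 16),   # logged as eight hex digits
        status=int(fields[3]),
        datehdr=int(fields[4]),      # -1 unparsable, -2 missing
        lastmod=int(fields[5]),
        expires=int(fields[6]),
        content_type=fields[7],
        expected_len=int(expected_len),
        real_len=int(real_len),
        method=fields[9],
        key=fields[10],
    )


if __name__ == "__main__":
    sample = ("981173030.123 SWAPOUT 00001A2B  200 981173000 980000000 "
              "       -1 text/html 1234/1234 GET http://www.example.com/")
    entry = parse_store_line(sample)
    print(entry.action, hex(entry.fileno), entry.fileno == MEMORY_ONLY, entry.key)
```

Remember that a single line says little on its own: a later
<em/RELEASE/ for the same file number overrides an earlier
<em/SWAPOUT/.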
<sect1><em/hierarchy.log/
<P>
This logfile exists for Squid-1.0 only. The format is
<verb>
[date] URL peerstatus peerhost
</verb>
<sect1><em/access.log/
<P>
Most log file analysis programs are based on the entries in
<em/access.log/. Currently, there are two file formats possible for the log
file, depending on your configuration for the <em/emulate_httpd_log/
option. By default, Squid will log in its native log file format. If the
above option is enabled, Squid will log in the common log file format as
defined by the CERN web daemon.
<P>
The common log file format contains different, and less, information than
the native log file format. The native format contains more information for
the administrator interested in cache evaluation.
<sect2><em/The common log file format/
<P>
The
<url url="http://www.w3.org/pub/WWW/Daemon/User/Config/Logging.html&num;common-logfile-format"
name="Common Logfile Format">
is used by numerous HTTP servers. This format consists of the following
seven fields:
<verb>
remotehost rfc931 authuser [date] "method URL" status bytes
</verb>
<P>
It is parsable by a variety of tools. The common format contains different
information than the native log file format. It logs the HTTP version,
which the native log file format does not.
<sect2><em/The native log file format/
<P>
The native format is different for different major versions of Squid. For
Squid-1.0 it is:
<verb>
time elapsed remotehost code/status/peerstatus bytes method URL
</verb>
<P>
For Squid-1.1, the information from the <em/hierarchy.log/ was moved
into <em/access.log/. The format is:
<verb>
time elapsed remotehost code/status bytes method URL rfc931 peerstatus/peerhost type
</verb>
<P>
For Squid-2 the columns stay the same, though the content within may change
a little.
<P>
The native log file format logs more and different information than the
common log file format: the request duration, some timeout information,
the next upstream server address, and the content type.
<P>
There exist tools which convert one file format into the other. Please
mind that even though the log formats share most information, both formats
contain information which is not part of the other format, and thus this
part of the information is lost when converting. In particular, converting
back and forth is not possible without loss.
<P>
<em/squid2common.pl/ is a conversion utility which converts any of the
squid log file formats into the old CERN proxy style output. There exist
tools to analyse, evaluate and graph results from that format.
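<P>
To illustrate why the conversion loses information, here is a minimal Python
sketch that turns one Squid-1.1/Squid-2 native line into the seven common
format fields. It is not a replacement for <em/squid2common.pl/: the
function name is made up, the authuser field is always written as ``-''
because the native line does not carry it, and the sketch assumes the URL
contains no whitespace.

```python
import time


def native_to_common(line):
    """Convert one native access.log line to a common-log-format line."""
    fields = line.split()
    if len(fields) < 10:
        raise ValueError("not a native access.log line")
    stamp, _elapsed, client, result, size, method, url, ident = fields[:8]
    status = result.split("/", 1)[1]          # drop the Squid result code
    when = time.strftime("%d/%b/%Y:%H:%M:%S +0000", time.gmtime(float(stamp)))
    # Elapsed time, result code, hierarchy code and content type are dropped.
    return '%s %s - [%s] "%s %s" %s %s' % (
        client, ident, when, method, url, status, size)


if __name__ == "__main__":
    native = ("981173030.123    456 192.0.2.10 TCP_MISS/200 4120 GET "
              "http://www.example.com/ - DIRECT/www.example.com text/html")
    print(native_to_common(native))
```

The comment in the function lists the native columns that have no place in
the common format, which is exactly the information that cannot be recovered
when converting back.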
<sect2><em/access.log native format in detail/
<P>
We recommend using Squid's native log format because it makes more
information available for later analysis. The print
format line for native <em/access.log/ entries looks like this:
<verb>
"%9d.%03d %6d %s %s/%03d %d %s %s %s %s%s/%s %s"
</verb>
<P>
Therefore, an <em/access.log/ entry usually consists of (at least) 10
columns separated by one or more spaces:
<descrip>
<tag/time/
<P>
A Unix timestamp as UTC seconds with a millisecond resolution. You
can convert Unix timestamps into something more human readable using
this short perl script:
<verb>
#! /usr/bin/perl -p
s/^\d+\.\d+/localtime $&/e;
</verb>
<tag/duration/
<P>
The elapsed time is the number of milliseconds the transaction
occupied the cache. Its interpretation differs between TCP and UDP:
<P>
<itemize>
<item>For HTTP/1.0, this is basically the time between <em/accept()/
and <em/close()/.
<item>For persistent connections, this ought to be the time between
scheduling the reply and finishing sending it.
<item>For ICP, this is the time between scheduling a reply and actually
sending it.
</itemize>
<P>
Please note that the entries are logged <em/after/ the reply finished
being sent, <em/not/ during the lifetime of the transaction.
<tag/client address/
<P>
The IP address of the requesting client. The
<em/client_netmask/ configuration option can mask the client addresses
for data protection reasons, but it makes analysis more difficult. Often it
is better to use one of the log file anonymizers.
<P>
Also, the <em/log_fqdn/ configuration option may log the fully qualified
domain name of the client instead of the dotted quad. The use of that
option is discouraged due to its performance impact.
<tag/result codes/<label id="log-resultcode">
<P>
This column is made up of two entries separated by a slash. This column
encodes the transaction result:
<enum>
<item>The cache result of the request contains information on the kind of
request, how it was satisfied, or in what way it failed. Please refer
to section <ref id="cache-result-codes" name="Squid result codes">
for valid symbolic result codes.
<P>
Several codes from older versions are no longer available, were
renamed, or split. Especially the <em/ERR_/ codes do not seem to
appear in the log file any more. Also refer to section
<ref id="cache-result-codes" name="Squid result codes"> for details
on the codes no longer available in Squid-2.
<P>
The NOVM versions and Squid-2 also rely on the Unix buffer cache, thus
you will see fewer <em/TCP_MEM_HIT/s than with Squid-1.
Basically, the NOVM feature relies on <em/read()/ to obtain an
object, but due to the kernel buffer cache, no disk activity is needed.
Only small objects (below 8KByte) are kept in Squid's part of main
memory.
<item>The status part contains the HTTP result codes with some Squid specific
extensions. Squid uses a subset of the RFC defined error codes for
HTTP. Refer to section <ref id="http-status-codes" name="status codes">
for details of the status codes recognized by a Squid-2.
</enum>
<tag/bytes/
<P>
The size is the amount of data delivered to the client. Mind that this does
not constitute the net object size, as headers are also counted. Also,
failed requests may deliver an error page, the size of which is also logged
here.
<tag/request method/
<P>
The request method to obtain an object. Please refer to section
<ref id="request-methods"> for available methods.
If you turned off <em/log_icp_queries/ in your configuration, you
will not see (and thus will be unable to analyse) ICP exchanges. The
<em/PURGE/ method is only available if you have an ACL for
``method purge'' enabled in your configuration file.
<tag/URL/
<P>
This column contains the URL requested. Please note that the URL in the
log file may contain whitespace. The default configuration for
<em/uri_whitespace/ denies whitespace, though.
<tag/rfc931/
<P>
The eighth column may contain the ident lookups for the requesting
client. Since ident lookups have a performance impact, the default
configuration turns <em/ident_lookups/ off. If turned off, or no ident
information is available, a ``-'' will be logged.
<tag/hierarchy code/
<P>
The hierarchy information consists of three items:
<P>
<enum>
<item>Any hierarchy tag may be prefixed with <em/TIMEOUT_/, if a
timeout occurred while waiting for all ICP replies to return from the
neighbours. The timeout is either dynamic, if
<em/icp_query_timeout/ was not set, or the time configured
there.
<item>A code that explains how the request was handled, e.g. by
forwarding it to a peer, or going straight to the source. Refer to
section <ref id="hier-codes"> for details on hierarchy codes and
removed hierarchy codes.
<item>The name of the host the object was requested from. This host may
be the origin site, a parent or any other peer. Also note that the
hostname may be numerical.
</enum>
<tag/type/
<P>
The content type of the object as seen in the HTTP reply
header. Please note that ICP exchanges usually don't have any content
type, and thus are logged ``-''. Also, some weird replies have content
types ``:'' or even empty ones.
</descrip>
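<P>
The ten columns described above can be pulled apart with a short Python
sketch like the one below. The tuple and function names are made up for
this example, the sample line is invented, and the sketch assumes that
<em/uri_whitespace/ keeps whitespace out of the URL column.

```python
from collections import namedtuple

AccessEntry = namedtuple(
    "AccessEntry",
    "time elapsed_ms client code status size method url ident "
    "hier_code peer_host content_type",
)


def parse_access_line(line):
    """Parse one native access.log line into an AccessEntry."""
    fields = line.split()
    if len(fields) < 10:
        raise ValueError("expected at least 10 columns, got %d" % len(fields))
    code, status = fields[3].split("/", 1)       # e.g. TCP_MISS/200
    hier_code, peer_host = fields[8].split("/", 1)
    return AccessEntry(
        time=float(fields[0]),
        elapsed_ms=int(fields[1]),
        client=fields[2],
        code=code,
        status=int(status),
        size=int(fields[4]),
        method=fields[5],
        url=fields[6],
        ident=fields[7],
        hier_code=hier_code,
        peer_host=peer_host,
        content_type=fields[9],
    )


def timed_out(entry):
    """True when the hierarchy code carries the TIMEOUT_ prefix."""
    return entry.hier_code.startswith("TIMEOUT_")


if __name__ == "__main__":
    sample = ("981173030.123    456 192.0.2.10 TCP_MISS/200 4120 GET "
              "http://www.example.com/ - DIRECT/www.example.com text/html")
    entry = parse_access_line(sample)
    print(entry.code, entry.status, entry.hier_code, entry.peer_host)
```

Splitting the two slash-separated columns early keeps later analysis simple,
since the cache result, the HTTP status and the hierarchy code are usually
evaluated separately.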
<P>
There may be two more columns in the <em/access.log/, if the (debug) option
<em/log_mime_headers/ is enabled. In this case, the HTTP request headers are
logged between a ``['' and a ``]'', and the HTTP reply headers are also
logged between ``['' and ``]''. All control characters like CR and LF are
URL-escaped, but spaces are <em/not/ escaped! Parsers should watch out for
this.
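<P>
One way a parser can cope with the unescaped spaces is to split off the
first ten columns only and treat the remainder as the two bracketed header
blocks. The Python sketch below does this under two assumptions that are not
guaranteed by the log format: the URL column contains no whitespace, and the
sequence ``] ['' does not occur inside the request headers. The function
name and the sample line are invented.

```python
from urllib.parse import unquote


def split_mime_headers(line):
    """Return (request_headers, reply_headers) as lists of header lines."""
    fields = line.rstrip("\n").split(None, 10)
    if len(fields) != 11:
        raise ValueError("no header columns found")
    rest = fields[10].strip()
    if not (rest.startswith("[") and rest.endswith("]")):
        raise ValueError("header columns are not bracketed")
    # Assumption: the first "] [" separates request from reply headers.
    request_raw, sep, reply_raw = rest[1:-1].partition("] [")
    if not sep:
        raise ValueError("could not find the second header block")

    def header_lines(raw):
        # CR and LF were URL-escaped by Squid; undo that, then split.
        return [h for h in unquote(raw).splitlines() if h]

    return header_lines(request_raw), header_lines(reply_raw)


if __name__ == "__main__":
    sample = ("981173030.123    456 192.0.2.10 TCP_MISS/200 4120 GET "
              "http://www.example.com/ - DIRECT/www.example.com text/html "
              "[Host: www.example.com%0d%0aUser-Agent: Lynx/2.8%0d%0a] "
              "[Content-Type: text/html%0d%0a]")
    request, reply = split_mime_headers(sample)
    print(request)
    print(reply)
```

Limiting the split to ten columns is what keeps the spaces inside header
values from being mistaken for column separators.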
<sect1>Squid result codes
<label id="cache-result-codes">
<P>
The <bf/TCP_/ codes refer to requests on the HTTP port (usually 3128). The
<bf/UDP_/ codes refer to requests on the ICP port (usually 3130). If
ICP logging was disabled using the <em/log_icp_queries/ option, no ICP
replies will be logged.
<P>
The following result codes were taken from a Squid-2, compare with the
<em/log_tags/ struct in <em>src/access_log.c</em>:
<descrip>
<tag/TCP_HIT/
A valid copy of the requested object was in the cache.
<tag/TCP_MISS/
The requested object was not in the cache.
<tag/TCP_REFRESH_HIT/
The requested object was cached but <em/STALE/. The IMS query
for the object resulted in "304 not modified".
<tag/TCP_REF_FAIL_HIT/
The requested object was cached but <em/STALE/. The IMS query
failed and the stale object was delivered.
<tag/TCP_REFRESH_MISS/
The requested object was cached but <em/STALE/. The IMS query
returned the new content.
<tag/TCP_CLIENT_REFRESH_MISS/<label id="tcp-client-refresh-miss">
The client issued a "no-cache" pragma, or some analogous cache
control command along with the request. Thus, the cache has to
refetch the object.
<tag/TCP_IMS_HIT/<label id="tcp-ims-hit">
The client issued an IMS request for an object which was in the
cache and fresh.
<tag/TCP_SWAPFAIL_MISS/<label id="tcp-swapfail-miss">
The object was believed to be in the cache,
but could not be accessed.
<tag/TCP_NEGATIVE_HIT/
Request for a negatively cached object,
e.g. "404 not found", which the cache believes to be
inaccessible. Also refer to the explanation of
<em/negative_ttl/ in your <em/squid.conf/ file.
<tag/TCP_MEM_HIT/
A valid copy of the requested object was in the
cache <em/and/ it was in memory, thus avoiding disk accesses.
<tag/TCP_DENIED/
Access was denied for this request.
<tag/TCP_OFFLINE_HIT/
The requested object was retrieved from the
cache during offline mode. The offline mode never
validates any object, see <em/offline_mode/ in
<em/squid.conf/ file.
<tag/UDP_HIT/
A valid copy of the requested object was in the cache.
<tag/UDP_MISS/
The requested object is not in this cache.
<tag/UDP_DENIED/
Access was denied for this request.
<tag/UDP_INVALID/
An invalid request was received.
<tag/UDP_MISS_NOFETCH/<label id="udp-miss-nofetch">
During "-Y" startup, or during frequent
failures, a cache in hit only mode will return either UDP_HIT or
this code. Neighbours will thus only fetch hits.
<tag/NONE/
Seen with errors and cachemgr requests.
</descrip>
<P>
The following codes are no longer available in Squid-2:
<descrip>
<tag/ERR_*/
Errors are now contained in the status code.
<tag/TCP_CLIENT_REFRESH/
See: <ref id="tcp-client-refresh-miss" name="TCP_CLIENT_REFRESH_MISS">.
<tag/TCP_SWAPFAIL/
See: <ref id="tcp-swapfail-miss" name="TCP_SWAPFAIL_MISS">.
<tag/TCP_IMS_MISS/
Deleted, <ref id="tcp-ims-hit" name="TCP_IMS_HIT"> used instead.
<tag/UDP_HIT_OBJ/
Hit objects are no longer available.
<tag/UDP_RELOADING/
See: <ref id="udp-miss-nofetch" name="UDP_MISS_NOFETCH">.
</descrip>
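<P>
A common first use of the result codes is a rough request hit ratio. The
Python sketch below tallies the codes from native <em/access.log/ lines
and counts every <bf/TCP_/ code ending in <em/_HIT/ as a hit. That
classification is an assumption made for this example (you may, for
instance, prefer to leave <em/TCP_DENIED/ out of the total), and the
function names are hypothetical.

```python
from collections import Counter


def tally_result_codes(lines):
    """Count the Squid result codes (fourth column, before the slash)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        counts[fields[3].split("/", 1)[0]] += 1
    return counts


def tcp_hit_ratio(counts):
    """Fraction of TCP_ requests whose result code ends in _HIT."""
    tcp = {code: n for code, n in counts.items() if code.startswith("TCP_")}
    total = sum(tcp.values())
    hits = sum(n for code, n in tcp.items() if code.endswith("_HIT"))
    return hits / total if total else 0.0


if __name__ == "__main__":
    sample = [
        "1.000 5 192.0.2.1 TCP_HIT/200 100 GET http://a/ - NONE/- text/html",
        "2.000 5 192.0.2.1 TCP_MISS/200 100 GET http://b/ - DIRECT/b text/html",
        "3.000 5 192.0.2.1 TCP_MEM_HIT/200 100 GET http://a/ - NONE/- text/html",
        "4.000 0 192.0.2.2 UDP_HIT/000 60 ICP_QUERY http://a/ - NONE/- -",
    ]
    counts = tally_result_codes(sample)
    print(dict(counts), round(tcp_hit_ratio(counts), 3))
```

Note that this is a request hit ratio; a byte hit ratio would weight each
line by the bytes column instead.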
<sect1>HTTP status codes
<label id="http-status-codes">
<P>
These are taken from
<url url="http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2616.txt"
name="RFC 2616"> and verified for Squid. Squid-2 uses almost all
codes except 307 (Temporary Redirect), 416 (Request Range Not Satisfiable),
and 417 (Expectation Failed). Extra codes include 0 for a result code being
unavailable, and 600 to signal an invalid header, a proxy error. Also, some
definitions were added as per
<url url="http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2518.txt"
name="RFC 2518"> (WebDAV).
Yes, there are really two entries for status code
424, compare with <em/http_status/ in <em>src/enums.h</em>:
<verb>
000 Used mostly with UDP traffic.
100 Continue
101 Switching Protocols
*102 Processing
200 OK
201 Created
202 Accepted
203 Non-Authoritative Information
204 No Content
205 Reset Content
206 Partial Content
*207 Multi Status
300 Multiple Choices
301 Moved Permanently
302 Moved Temporarily
303 See Other
304 Not Modified
305 Use Proxy
[307 Temporary Redirect]
400 Bad Request
401 Unauthorized
402 Payment Required
403 Forbidden
404 Not Found
405 Method Not Allowed
406 Not Acceptable
407 Proxy Authentication Required
408 Request Timeout
409 Conflict
410 Gone
411 Length Required
412 Precondition Failed
413 Request Entity Too Large
414 Request URI Too Large
415 Unsupported Media Type
[416 Request Range Not Satisfiable]
[417 Expectation Failed]
*424 Locked
*424 Failed Dependency
*433 Unprocessable Entity
500 Internal Server Error
501 Not Implemented
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
505 HTTP Version Not Supported
*507 Insufficient Storage
600 Squid header parsing error
</verb>
<sect1>Request methods
<label id="request-methods">
<P>
Squid recognizes several request methods as defined in
<url url="http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2616.txt"
name="RFC 2616">. Newer versions of Squid (2.2.STABLE5 and above)
also recognize
<url url="http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2518.txt"
name="RFC 2518"> ``HTTP Extensions for Distributed Authoring --
WEBDAV'' extensions.
<verb>
method    defined    cachabil.  meaning
--------- ---------- ---------- -------------------------------------------
GET       HTTP/0.9   possibly   object retrieval and simple searches.
HEAD      HTTP/1.0   possibly   metadata retrieval.
POST      HTTP/1.0   CC or Exp. submit data (to a program).
PUT       HTTP/1.1   never      upload data (e.g. to a file).
DELETE    HTTP/1.1   never      remove resource (e.g. file).
TRACE     HTTP/1.1   never      appl. layer trace of request route.
OPTIONS   HTTP/1.1   never      request available comm. options.
CONNECT   HTTP/1.1r3 never      tunnel SSL connection.
ICP_QUERY Squid      never      used for ICP based exchanges.
PURGE     Squid      never      remove object from cache.
PROPFIND  rfc2518    ?          retrieve properties of an object.
PROPPATCH rfc2518    ?          change properties of an object.
MKCOL     rfc2518    never      create a new collection.
MOVE      rfc2518    never      atomically move src to dst.
COPY      rfc2518    never      create a duplicate of src in dst.
LOCK      rfc2518    never      lock an object against modifications.
UNLOCK    rfc2518    never      unlock an object.
</verb>
<sect1>Hierarchy Codes
<label id="hier-codes">
<P>
The following hierarchy codes are used with Squid-2:
<descrip>
<tag/NONE/
For TCP HIT, TCP failures, cachemgr requests and all UDP
requests, there is no hierarchy information.
<tag/DIRECT/
The object was fetched from the origin server.
<tag/SIBLING_HIT/
The object was fetched from a sibling cache which replied with
UDP_HIT.
<tag/PARENT_HIT/
The object was requested from a parent cache which replied with
UDP_HIT.
<tag/DEFAULT_PARENT/
No ICP queries were sent. This parent was chosen because it was
marked ``default'' in the config file.
<tag/SINGLE_PARENT/
The object was requested from the only parent appropriate for the
given URL.
<tag/FIRST_UP_PARENT/
The object was fetched from the first parent in the list of
parents.
<tag/NO_PARENT_DIRECT/
The object was fetched from the origin server, because no parents
existed for the given URL.
<tag/FIRST_PARENT_MISS/
The object was fetched from the parent with the fastest (possibly
weighted) round trip time.
<tag/CLOSEST_PARENT_MISS/
This parent was chosen because it included the lowest RTT
measurement to the origin server. See also the <em/closest-only/
peer configuration option.
<tag/CLOSEST_PARENT/
The parent selection was based on our own RTT measurements.
<tag/CLOSEST_DIRECT/
Our own RTT measurements returned a shorter time than any parent.
<tag/NO_DIRECT_FAIL/
The object could not be requested because of a firewall
configuration, see also <em/never_direct/ and related material,
and no parents were available.
<tag/SOURCE_FASTEST/
The origin site was chosen, because the source ping arrived fastest.
<tag/ROUNDROBIN_PARENT/
No ICP replies were received from any parent. The parent was
chosen, because it was marked for round robin in the config file
and had the lowest usage count.
<tag/CACHE_DIGEST_HIT/
The peer was chosen, because the cache digest predicted a
hit. This option was later replaced in order to distinguish
between parents and siblings.
<tag/CD_PARENT_HIT/
The parent was chosen, because the cache digest predicted a
hit.
<tag/CD_SIBLING_HIT/
The sibling was chosen, because the cache digest predicted a
hit.
<tag/NO_CACHE_DIGEST_DIRECT/
This output seems to be unused?
<tag/CARP/
The peer was selected by CARP.
<tag/ANY_PARENT/
part of <em>src/peer_select.c:hier_strings[]</em>.
<tag/INVALID CODE/
part of <em>src/peer_select.c:hier_strings[]</em>.
</descrip>
<P>
Almost any of these may be preceded by 'TIMEOUT_' if the two-second
(default) timeout occurs waiting for all ICP replies to arrive from
neighbors, see also the <em/icp_query_timeout/ configuration option.
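<P>
When post-processing logs, the optional prefix can be separated from the
hierarchy code itself. The following is a minimal sketch, assuming the
field has the form code/host as in the examples elsewhere in this
document; <em/parse_hierarchy/ is a hypothetical helper, not part of Squid.

```python
def parse_hierarchy(field):
    """Split 'TIMEOUT_FIRST_UP_PARENT/host' into (timed_out, code, peer)."""
    code, _, peer = field.partition("/")
    timed_out = code.startswith("TIMEOUT_")
    if timed_out:
        code = code[len("TIMEOUT_"):]   # strip the prefix, keep the real code
    return timed_out, code, peer


if __name__ == "__main__":
    print(parse_hierarchy("TIMEOUT_FIRST_UP_PARENT/it.cache.nlanr.net"))
    print(parse_hierarchy("CLOSEST_PARENT_MISS/it.cache.nlanr.net"))
```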
<P>
The following hierarchy codes were removed from Squid-2:
<verb>
code                 meaning
-------------------- -------------------------------------------------
PARENT_UDP_HIT_OBJ   hit objects are no longer available.
SIBLING_UDP_HIT_OBJ  hit objects are no longer available.
SSL_PARENT_MISS      SSL can now be handled by squid.
FIREWALL_IP_DIRECT   No special logging for hosts inside the firewall.
LOCAL_IP_DIRECT      No special logging for local networks.
</verb>
<sect1><em>cache/log</em> (Squid-1.x)
<label id="swaplog">
<P>
This file has a rather unfortunate name. It also is often called the
<em/swap log/. It is a record of every cache object written to disk.
It is read when Squid starts up to ``reload'' the cache. If you remove
this file when squid is NOT running, you will effectively wipe out your
cache contents. If you remove this file while squid IS running,
you can easily recreate it. The safest way is to simply shut down
the running process:
<verb>
% squid -k shutdown
</verb>
This will disrupt service, but at least you will have your swap log
back.
Alternatively, you can tell squid to rotate its log files. This also
causes a clean swap log to be written.
<verb>
% squid -k rotate
</verb>
<P>
For Squid-1.1, there are six fields:
<enum>
<item>
<bf/fileno/:
The swap file number holding the object data. This is mapped to a pathname on your filesystem.
<item>
<bf/timestamp/:
This is the time when the object was last verified to be current. The time is a
hexadecimal representation of Unix time.
<item>
<bf/expires/:
This is the value of the Expires header in the HTTP reply. If an Expires header
was not present, this will be -2 or fffffffe. If the Expires header was
present, but invalid (unparsable), this will be -1 or ffffffff.
<item>
<bf/lastmod/:
Value of the HTTP reply Last-Modified header. If missing it will be -2,
if invalid it will be -1.
<item>
<bf/size/:
Size of the object, including headers.
<item>
<bf/url/:
The URL naming this object.
</enum>
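<P>
The special values of the time fields can be decoded as follows. This is
a sketch under the assumption that each time field is a 32-bit value
written in hexadecimal, as described above; <em/decode_time_field/ is a
hypothetical helper name.

```python
def decode_time_field(hex_text):
    """Interpret a hexadecimal time field as a signed 32-bit Unix time."""
    value = int(hex_text, 16)
    if value >= 0x80000000:        # fold the unsigned form back to a negative number
        value -= 0x100000000
    if value == -2:
        return "missing"           # header was not present (fffffffe)
    if value == -1:
        return "invalid"           # header could not be parsed (ffffffff)
    return value                   # seconds since the Unix epoch


if __name__ == "__main__":
    for text in ("fffffffe", "ffffffff", "33ce3f54"):
        print(text, decode_time_field(text))
```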
<sect1><em>swap.state</em> (Squid-2.x)
<P>
In Squid-2, the swap log file is now called <em/swap.state/. This is
a binary file that includes MD5 checksums, and <em/StoreEntry/ fields.
Please see the <url url="../Prog-Guide/" name="Programmers Guide"> for
information on the contents and format of that file.
<p>
If you remove <em/swap.state/ while Squid is running, simply send
Squid the signal to rotate its log files:
<verb>
% squid -k rotate
</verb>
Alternatively, you can tell Squid to shut down and it will
rewrite this file before it exits.
<p>
If you remove the <em/swap.state/ while Squid is not running, you will
not lose your entire cache. In this case, Squid will scan all of
the cache directories and read each swap file to rebuild the cache.
This can take a very long time, so you'll have to be patient.
<p>
By default the <em/swap.state/ file is stored in the top-level
of each <em/cache_dir/. You can move the logs to a different
location with the <em/cache_swap_log/ option.
<sect1>Which log files can I delete safely?
<p>
You should never delete <em/access.log/, <em/store.log/,
<em/cache.log/, or <em/swap.state/ while Squid is running.
With Unix, you can delete a file while a process
still has it open. However, the filesystem space is
not reclaimed until the process closes the file.
<p>
If you accidentally delete <em/swap.state/ while Squid is running,
you can recover it by following the instructions in the previous
questions. If you delete the others while Squid is running,
you can not recover them.
<p>
The correct way to maintain your log files is with Squid's ``rotate''
feature. You should rotate your log files at least once per day.
The current log files are closed and then renamed with numeric extensions
(.0, .1, etc). If you want to, you can write your own scripts
to archive or remove the old log files. If not, Squid will
only keep up to <em/logfile_rotate/ versions of each log file.
The logfile rotation procedure also writes a clean <em/swap.state/
file, but it does not leave numbered versions of the old files.
<P>
To rotate Squid's logs, simply use this command:
<verb>
squid -k rotate
</verb>
For example, use this cron entry to rotate the logs at midnight:
<verb>
0 0 * * * /usr/local/squid/bin/squid -k rotate
</verb>
<sect1>How can I disable Squid's log files?
<P>
To disable <em/access.log/:
<verb>
cache_access_log /dev/null
</verb>
<P>
To disable <em/store.log/:
<verb>
cache_store_log none
</verb>
<P>
It is a bad idea to disable the <em/cache.log/ because this file
contains many important status and debugging messages. However,
if you really want to, you can disable <em/cache.log/ with:
<verb>
cache_log /dev/null
</verb>
<sect1>My log files get very big!
<label id="log-large">
<P>
You need to <em/rotate/ your log files with a cron job. For example:
<verb>
0 0 * * * /usr/local/squid/bin/squid -k rotate
</verb>
<sect1>Managing log files
<P>
The preferred log file for analysis is the <em/access.log/ file in native
format. For long term evaluations, the log file should be obtained at
regular intervals. Squid offers an easy to use API for rotating log files,
in order that they may be moved (or removed) without disturbing the cache
operations in progress. The procedures were described above.
<P>
Depending on the disk space allocated for log file storage, it is
recommended to set up a cron job which rotates the log files every 24, 12,
or 8 hours. You will need to set your <em/logfile_rotate/ to a sufficiently
large number. During a time of some idleness, you can safely transfer the
log files to your analysis host in one burst.
<P>
Before transport, the log files can be compressed during off-peak time. On
the analysis host, the log files are concatenated into one file, so the
yield is one file per 24 hours. Also note that with <em/log_icp_queries/
enabled, you might have around 1 GB of uncompressed log information per day
on a busy cache. Look at your cache manager info page to make an educated
guess about the size of your log files.
<P>
The EU project <url url="http://www.desire.org/" name="DESIRE">
developed
<url url="http://www.uninett.no/prosjekt/desire/arneberg/statistics.html" name="some basic rules">
to obey when handling and processing log files:
<itemize>
<item>Respect the privacy of your clients when publishing results.
<item>Keep logs unavailable unless anonymized. Most countries have laws on
privacy protection, and some even on how long you are legally allowed to
keep certain kinds of information.
<item>Rotate and process log files at least once a day. Even if you don't
process the log files, they will grow quite large, see section
<ref id="log-large">. If you rely on processing the log files, reserve
a large enough partition solely for log files.
<item>Keep the size in mind when processing. It might take longer to
process log files than to generate them!
<item>Limit yourself to the numbers you are interested in. There is data
beyond your dreams available in your log file, some quite obvious, others
by combination of different views. Here are some examples for figures to
watch:
<itemize>
<item>The hosts using your cache.
<item>The elapsed time for HTTP requests - this is the latency the user
sees. Usually, you will want to make a distinction for HITs and MISSes
and overall times. Also, medians are preferred over averages.
<item>The requests handled per interval (e.g. second, minute or hour).
</itemize>
</itemize>
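<P>
As a rough sketch of the ``medians for HITs and MISSes'' figure mentioned
above, the following assumes native-format <em/access.log/ lines in which
the second column is the elapsed time in milliseconds and the fourth
column is the result code (for example TCP_MISS/200). The function name
and the sample lines are invented for illustration.

```python
from statistics import median


def median_elapsed(lines):
    """Return the median elapsed time in ms for hits and for misses."""
    samples = {"HIT": [], "MISS": []}
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue                       # skip blank or truncated lines
        elapsed_ms = int(fields[1])        # assumed: 2nd column, milliseconds
        action = fields[3].split("/")[0]   # assumed: 4th column, e.g. TCP_MISS/200
        for kind in samples:
            if kind in action:
                samples[kind].append(elapsed_ms)
    return {kind: median(values) for kind, values in samples.items() if values}


if __name__ == "__main__":
    # Invented sample lines, for illustration only.
    sample = [
        "874012345.100 20 10.0.0.1 TCP_HIT/200 1024 GET http://www.example.com/a - NONE/- text/html",
        "874012345.200 350 10.0.0.2 TCP_MISS/200 4096 GET http://www.example.com/b - DIRECT/www.example.com text/html",
        "874012345.300 40 10.0.0.1 TCP_IMS_HIT/304 210 GET http://www.example.com/a - NONE/- text/html",
    ]
    print(median_elapsed(sample))
```

Result codes that contain neither HIT nor MISS (denied requests, for
instance) are ignored by this sketch.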
<sect1>Why do I get ERR_NO_CLIENTS_BIG_OBJ messages so often?
<P>
This message means that the requested object was in ``Delete Behind''
mode and the user aborted the transfer. An object will go into
``Delete Behind'' mode if
<itemize>
<item>It is larger than <em/maximum_object_size/
<item>It is being fetched from a neighbor which has the <em/proxy-only/ option set.
</itemize>
<sect1>What does ERR_LIFETIME_EXP mean?
<P>
This means that a timeout occurred while the object was being transferred. Most
likely the retrieval of this object was very slow (or it stalled before finishing)
and the user aborted the request. However, depending on your settings for
<em/quick_abort/, Squid may have continued to try retrieving the object.
Squid imposes a maximum amount of time on all open sockets, so after some amount
of time the stalled request was aborted and logged with an ERR_LIFETIME_EXP
message.
<sect1>Retrieving ``lost'' files from the cache
<P>
<quote><it>
I've been asked to retrieve an object which was accidentally
destroyed at the source for recovery.
So, how do I figure out where the things are so I can copy
them out and strip off the headers?
</it></quote>
<P>
The following method applies only to the Squid-1.1 versions:
<P>
Use <em>grep</em> to find the named object (URL) in the
<ref id="swaplog" name="cache/log"> file. The first field in
this file is an integer <em/file number/.
<P>
Then, find the file <em/fileno-to-pathname.pl/ from the ``scripts''
directory of the Squid source distribution. The usage is
<verb>
perl fileno-to-pathname.pl [-c squid.conf]
</verb>
File numbers are read on stdin, and pathnames are printed on
stdout.
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>Operational issues
<sect1>How do I see system level Squid statistics?
<P>
The Squid distribution includes a CGI utility called <em/cachemgr.cgi/
which can be used to view squid statistics with a web browser.
This document has a section devoted to <em/cachemgr.cgi/ usage
which you should consult for more information.
<sect1>How can I find the biggest objects in my cache?
<P>
<verb>
sort -r -n +4 -5 access.log | awk '{print $5, $7}' | head -25
</verb>
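<P>
The same report can be sketched in Python. Like the command above, it
assumes that the fifth column of <em/access.log/ is the object size in
bytes and the seventh is the URL; <em/biggest_objects/ and the sample
lines are invented for illustration.

```python
import heapq


def biggest_objects(lines, count=25):
    """Return the `count` largest (size, url) pairs from native-format lines."""
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue                                  # skip malformed lines
        entries.append((int(fields[4]), fields[6]))   # 5th column: bytes, 7th: URL
    return heapq.nlargest(count, entries)


if __name__ == "__main__":
    # Invented sample lines; in practice, pass an open access.log file instead.
    sample = [
        "874012345.100 20 10.0.0.1 TCP_HIT/200 1024 GET http://www.example.com/small - NONE/- text/html",
        "874012345.200 350 10.0.0.2 TCP_MISS/200 409600 GET http://www.example.com/big - DIRECT/www.example.com image/gif",
    ]
    for size, url in biggest_objects(sample):
        print(size, url)
```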
<sect1>I want to restart Squid with a clean cache
<p>
<em>Note: The information here is current for version 2.2.</em>
<P>
First of all, you must stop Squid of course. You can use
the command:
<verb>
% squid -k shutdown
</verb>
<p>
The fastest way to restart with an entirely clean cache is
to overwrite the <em/swap.state/ files for each <em/cache_dir/
in your config file. Note, you can not just remove the
<em/swap.state/ file, or truncate it to zero size. Instead,
you should put just one byte of garbage there. For example:
<verb>
% echo "" > /cache1/swap.state
</verb>
Repeat that for every <em/cache_dir/, then restart Squid.
Be sure to leave the <em/swap.state/ file with the same
owner and permissions that it had before!
<p>
Another way, which takes longer, is to have squid recreate all the
<em/cache_dir/ directories. But first you must move the existing
directories out of the way. For example, you can try this:
<verb>
% cd /cache1
% mkdir JUNK
% mv ?? swap.state* JUNK
% rm -rf JUNK &
</verb>
Repeat this for your other <em/cache_dir/'s, then tell Squid
to create new directories:
<verb>
% squid -z
</verb>
<sect1>How can I proxy/cache Real Audio?
<P>
by <url url="mailto:roever@nse.simac.nl" name="Rodney van den Oever">,
and <url url="mailto:jrg@blodwen.demon.co.uk" name="James R Grinter">
<P>
<itemize>
<item>
Point the RealPlayer at your Squid server's HTTP port (e.g. 3128).
<item>
Using the Preferences->Transport tab, select <em/Use specified transports/
and with the <em/Specified Transports/ button, select use <em/HTTP Only/.
</itemize>
The RealPlayer (and RealPlayer Plus) manual states:
<verb>
Use HTTP Only
Select this option if you are behind a firewall and cannot
receive data through TCP. All data will be streamed through
HTTP.
Note: You may not be able to receive some content if you select
this option.
</verb>
<P>
Again, from the documentation:
<verb>
RealPlayer 4.0 identifies itself to the firewall when making a
request for content to a RealServer. The following string is
attached to any URL that the Player requests using HTTP GET:
/SmpDsBhgRl
Thus, to identify an HTTP GET request from the RealPlayer, look
for:
http://[^/]+/SmpDsBhgRl
The Player can also be identified by the mime type in a POST to
the RealServer. The RealPlayer POST has the following mime
type:
"application/x-pncmd"
</verb>
Note that the first request is a POST, and the second has a '?' in the URL, so
standard Squid configurations would treat it as non-cachable. It also looks
rather ``magic.''
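<P>
The two identification rules quoted above can be expressed as a small
check, for example when scanning logs for RealPlayer traffic. This is
only a sketch: <em/is_realplayer_request/ is a hypothetical helper, and
the pattern and MIME type are copied from the quoted documentation.

```python
import re

# Pattern quoted from the RealPlayer documentation above.
REALPLAYER_GET = re.compile(r"http://[^/]+/SmpDsBhgRl")


def is_realplayer_request(method, url, content_type=""):
    """Guess whether a request comes from RealPlayer, per the quoted rules."""
    if method == "GET":
        return REALPLAYER_GET.match(url) is not None
    if method == "POST":
        return content_type == "application/x-pncmd"
    return False


if __name__ == "__main__":
    print(is_realplayer_request("GET", "http://media.example.com/SmpDsBhgRl1234"))
    print(is_realplayer_request("POST", "http://media.example.com/", "application/x-pncmd"))
```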
<P>
HTTP is an alternative delivery mechanism introduced with version 3 players,
and it allows a reasonable approximation to ``streaming'' data - that is playing
it as you receive it. For more details, see their notes on
<url url="http://www.real.com/products/encoder/realvideo/httpstream.html"
name="HTTP Pseudo-Streaming">.
<P>
It isn't available in the general case: only if someone has made the RealAudio
file available via an HTTP server, or they're using a version 4 server, they've
switched it on, and you're using a version 4 client. If someone has made the
file available via their HTTP server, then it'll be cachable. Otherwise, it
won't be (as far as we can tell.)
<P>
The more common RealAudio link connects via their own <em/pnm:/ method and is
transferred using their proprietary protocol (via TCP or UDP) and not using
HTTP. It can't be cached nor proxied by Squid, and requires something such as
the simple proxy that Progressive Networks themselves have made available, if
you're in a firewall/no direct route situation. Their product does not cache
(and I don't know of any software available that does.)
<P>
Some confusion arises because there is also a configuration option to use an
HTTP proxy (such as Squid) with the RealAudio/RealVideo players. This is
because the players can fetch the ``<tt/.ram/'' file that contains the <em/pnm:/
reference for the audio/video stream. They fetch that .ram file from an HTTP
server, using HTTP.
<sect1>How can I purge an object from my cache?
<label id="purging-objects">
<P>
Squid does not allow
you to purge objects unless it is configured with access controls
in <em/squid.conf/. First you must add something like
<verb>
acl PURGE method PURGE
acl localhost src 127.0.0.1
http_access allow PURGE localhost
http_access deny PURGE
</verb>
The above only allows purge requests which come from the local host and
denies all other purge requests.
<P>
To purge an object, you can use the <em/client/ program:
<verb>
client -m PURGE http://www.miscreant.com/
</verb>
If the purge was successful, you will see a ``200 OK'' response:
<verb>
HTTP/1.0 200 OK
Date: Thu, 17 Jul 1997 16:03:32 GMT
Server: Squid/1.1.14
</verb>
If the object was not found in the cache, you will see a ``404 Not Found''
response:
<verb>
HTTP/1.0 404 Not Found
Date: Thu, 17 Jul 1997 16:03:22 GMT
Server: Squid/1.1.14
</verb>
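<P>
If the <em/client/ program is not at hand, a purge request can also be
sent from a script. The following is a minimal sketch using Python's
standard <em/http.client/ module; the function name <em/purge/ and the
default proxy address (localhost, port 3128) are assumptions to adapt
to your installation.

```python
import http.client


def purge(url, proxy_host="localhost", proxy_port=3128):
    """Send a PURGE request for `url` through the proxy; return the status code."""
    conn = http.client.HTTPConnection(proxy_host, proxy_port, timeout=10)
    try:
        # A proxy request carries the absolute URL on the request line.
        conn.request("PURGE", url)
        response = conn.getresponse()
        response.read()                 # drain the (normally empty) body
        return response.status
    finally:
        conn.close()
```

Calling <em>purge("http://www.miscreant.com/")</em> from the local host
should then return 200 or 404, matching the two responses shown above.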
<sect1>Using ICMP to Measure the Network
<label id="using-icmp">
<P>
As of version 1.1.9, Squid is able to utilize ICMP Round-Trip-Time (RTT)
measurements to select the optimal location to forward a cache miss.
Previously, cache misses would be forwarded to the parent cache
which returned the first ICP reply message. These were logged
with FIRST_PARENT_MISS in the access.log file. Now we can
select the parent which is closest (RTT-wise) to the origin
server.
<sect2>Supporting ICMP in your Squid cache
<P>
It is more important that your parent caches enable the ICMP
features. If you are acting as a parent, then you may want
to enable ICMP on your cache. Also, if your cache makes
RTT measurements, it will fetch objects directly if your
cache is closer than any of the parents.
<P>
If you want your Squid cache to measure RTT's to origin servers,
Squid must be compiled with the USE_ICMP option. This is easily
accomplished by uncommenting "-DUSE_ICMP=1" in <em>src/Makefile</em> and/or
<em>src/Makefile.in</em>.
<P>
An external program called <em/pinger/ is responsible for sending and
receiving ICMP packets. It must run with root privileges. After
Squid has been compiled, the pinger program must be installed
separately. A special Makefile target will install <em/pinger/ with
appropriate permissions.
<verb>
% make install
% su
# make install-pinger
</verb>
There are three configuration file options for tuning the
measurement database on your cache. <em/netdb_low/ and <em/netdb_high/
specify low and high water marks for keeping the database to a
certain size (e.g. just like with the IP cache). The <em/netdb_ttl/
option specifies the minimum interval between pings to a site. If
<em/netdb_ttl/ is set to 300 seconds (5 minutes) then an ICMP packet
will not be sent to the same site more than once every five
minutes. Note that a site is only pinged when an HTTP request for
the site is received.
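<P>
The effect of <em/netdb_ttl/ can be pictured as a simple per-site rate
limit. The class below is an illustrative sketch of that idea only, not
Squid's implementation; the name <em/PingLimiter/ is hypothetical.

```python
import time


class PingLimiter:
    """Allow at most one ping per site every `ttl` seconds (illustrative)."""

    def __init__(self, ttl=300):
        self.ttl = ttl
        self.last_ping = {}             # site -> time of the last ping

    def should_ping(self, site, now=None):
        """Called when a request for `site` arrives; True if a ping is due."""
        now = time.time() if now is None else now
        last = self.last_ping.get(site)
        if last is not None and now - last < self.ttl:
            return False                # pinged too recently
        self.last_ping[site] = now
        return True


if __name__ == "__main__":
    limiter = PingLimiter(ttl=300)
    print(limiter.should_ping("www.example.com", now=0))     # first request
    print(limiter.should_ping("www.example.com", now=120))   # two minutes later
    print(limiter.should_ping("www.example.com", now=300))   # five minutes later
```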
<P>
Another option, <em/minimum_direct_hops/ can be used to try finding
servers which are close to your cache. If the measured hop count
to the origin server is less than or equal to <em/minimum_direct_hops/,
the request will be forwarded directly to the origin server.
<sect2>Utilizing your parents' database
<P>
Your parent caches can be asked to include the RTT measurements
in their ICP replies. To do this, you must enable <em/query_icmp/
in your config file:
<verb>
query_icmp on
</verb>
This causes a flag to be set in your outgoing ICP queries.
<P>
If your parent caches return ICMP RTT measurements then
the ninth column of your access.log will have entries
similar to:
<verb>
CLOSEST_PARENT_MISS/it.cache.nlanr.net
</verb>
In this case, it means that <em/it.cache.nlanr.net/ returned
the lowest RTT to the origin server. If your cache measured
a lower RTT than any of the parents, the request will
be logged with
<verb>
CLOSEST_DIRECT/www.sample.com
</verb>
<sect2>Inspecting the database
<P>
The measurement database can be viewed from the cachemgr by
selecting "Network Probe Database." Hostnames are aggregated
into /24 networks. All measurements made are averaged over
time. Measurements are made to specific hosts, taken from
the URLs of HTTP requests. The recv and sent fields are the
number of ICMP packets received and sent. At this time they
are only informational.
<P>
A typical database entry looks something like this:
<verb>
Network          recv/sent     RTT  Hops Hostnames
192.41.10.0        20/  21    82.3   6.0 www.jisedu.org www.dozo.com
    bo.cache.nlanr.net        42.0   7.0
    uc.cache.nlanr.net        48.0  10.0
    pb.cache.nlanr.net        55.0  10.0
    it.cache.nlanr.net       185.0  13.0
</verb>
This means we have sent 21 pings to both www.jisedu.org and
www.dozo.com. The average RTT is 82.3 milliseconds. The
next four lines show the measured values from our parent
caches. Since <em/bo.cache.nlanr.net/ has the lowest RTT,
it would be selected as the location to forward a request
for a www.jisedu.org or www.dozo.com URL.
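<P>
The selection described above amounts to picking the smallest RTT. The
sketch below compares only RTT values, which is a simplification;
<em/choose_forwarding/ is a hypothetical name and the figures are taken
from the sample database entry.

```python
def choose_forwarding(own_rtt, parent_rtts):
    """Return the hierarchy code and host with the lowest RTT to the origin."""
    best_parent = min(parent_rtts, key=parent_rtts.get) if parent_rtts else None
    if best_parent is None or own_rtt < parent_rtts[best_parent]:
        return "CLOSEST_DIRECT", None   # our own measurement is the lowest
    return "CLOSEST_PARENT_MISS", best_parent


if __name__ == "__main__":
    parents = {"bo.cache.nlanr.net": 42.0, "uc.cache.nlanr.net": 48.0,
               "pb.cache.nlanr.net": 55.0, "it.cache.nlanr.net": 185.0}
    # Prints ('CLOSEST_PARENT_MISS', 'bo.cache.nlanr.net')
    print(choose_forwarding(82.3, parents))
```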
<sect1>Why are so few requests logged as TCP_IMS_MISS?
<P>
When Squid receives an <em/If-Modified-Since/ request, it will
not forward the request unless the object needs to be refreshed
according to the <em/refresh_pattern/ rules. If the object
does need to be refreshed, then it will be logged as TCP_REFRESH_HIT
or TCP_REFRESH_MISS.
<P>
If the request is not forwarded, Squid replies to the IMS request
according to the object in its cache. If the modification times are the
same, then Squid returns TCP_IMS_HIT. If the modification times are
different, then Squid returns TCP_IMS_MISS. In most cases, the cached
object will not have changed, so the result is TCP_IMS_HIT. Squid will
only return TCP_IMS_MISS if some other client causes a newer version of
the object to be pulled into the cache.
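<P>
The decision described in this answer can be summarized as a small
function. It is a sketch of the logic as explained here, not of Squid's
source code; <em/ims_result/ and its parameters are hypothetical.

```python
def ims_result(needs_refresh, cached_lastmod, request_ims):
    """Return the log tag for an If-Modified-Since request (sketch)."""
    if needs_refresh:
        # Forwarded for validation; logged as TCP_REFRESH_HIT or TCP_REFRESH_MISS.
        return "TCP_REFRESH_*"
    if cached_lastmod == request_ims:
        return "TCP_IMS_HIT"            # modification times are the same
    return "TCP_IMS_MISS"               # the cache holds a different version


if __name__ == "__main__":
    print(ims_result(False, 869000000, 869000000))
    print(ims_result(False, 869500000, 869000000))
    print(ims_result(True, 869000000, 869000000))
```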
<sect1>How can I make Squid NOT cache some servers or URLs?
<P>
In Squid-2, you use the <em/no_cache/ option to specify uncachable
requests. For example, this makes all responses from origin servers
in the 10.0.1.0/24 network uncachable:
<verb>
acl Local dst 10.0.1.0/24
no_cache deny Local
</verb>
<p>
This example makes all URLs ending in '.html' uncachable:
<verb>
acl HTML url_regex \.html$
no_cache deny HTML
</verb>
<p>
This example makes a specific URL uncachable:
<verb>
acl XYZZY url_regex ^http://www\.i\.suck\.com/foo\.html$
no_cache deny XYZZY
</verb>
<p>
This example caches nothing between 8AM and 11AM:
<verb>
acl Morning time 08:00-11:00
no_cache deny Morning
</verb>
<P>
In Squid-1.1,
whether or not an object gets cached is controlled by the
<em/cache_stoplist/, and <em/cache_stoplist_pattern/ options. So, you may add:
<verb>
cache_stoplist my.domain.com
</verb>
Specifying uncachable objects by IP address is harder. The <url url="../1.1/patches.html"
name="1.1 patch page"> includes a patch called <em/no-cache-local.patch/ which
changes the behaviour of the <em/local_ip/ and <em/local_domain/ so
that matching requests are NOT CACHED, in addition to being fetched directly.
<sect1>How can I delete and recreate a cache directory?
<P>
Deleting an existing cache directory is not too difficult. Unfortunately,
you can't simply change squid.conf and then reconfigure. You can't
stop using a <em>cache_dir</em> while Squid is running. Also note
that Squid requires at least one <em>cache_dir</em> to run.
<enum>
<item>
Edit your <em/squid.conf/ file and comment out, or delete
the <em/cache_dir/ line for the cache directory that you want to
remove.
<item>
If you don't have any <em>cache_dir</em> lines in your squid.conf,
then Squid was using the default. You'll need to add a new
<em>cache_dir</em> line because Squid will continue to use
the default otherwise. You can add a small, temporary directory, for
example:
<verb>
cache_dir /usr/local/squid/cachetmp ....
</verb>
If you add a new <em>cache_dir</em> you have to run <em>squid -z</em>
to initialize that directory.
<item>
Remember that
you can not delete a cache directory from a running Squid process;
you can not simply reconfigure squid. You must
shut down Squid:
<verb>
squid -k shutdown
</verb>
<item>
Once Squid exits, you may immediately start it up again. Since you
deleted the old <em>cache_dir</em> from squid.conf, Squid won't
try to access that directory.
If you
use the RunCache script, Squid should start up again automatically.
<item>
Now Squid is no longer using the cache directory that you removed
from the config file. You can verify this by checking "Store Directory"
information with the cache manager. From the command line, type:
<verb>
client mgr:storedir
</verb>
<item>
Now that Squid is not using the cache directory, you can <em/rm -rf/ it,
format the disk, build a new filesystem, or whatever.
</enum>
<P>
The procedure for recreating the directory is similar.
<enum>
<item>
Edit <em/squid.conf/ and add a new <em/cache_dir/ line.
<item>
Initialize the new directory by running
<verb>
% squid -z
</verb>
NOTE: it is safe to run this even if Squid is already running. <em/squid -z/
will harmlessly try to create all of the subdirectories that already exist.
<item>
Reconfigure Squid
<verb>
squid -k reconfigure
</verb>
Unlike deleting, you can add new cache directories while Squid is
already running.
</enum>
<sect1>Why can't I run Squid as root?
<P>
by Dave J Woolley
<P>
If someone were to discover a buffer overrun bug in Squid and it runs as
a user other than root, they can only corrupt the files writeable by
that user, but if it runs as root, they can take over the whole machine.
This applies to all programs that don't absolutely need root status, not
just squid.
<sect1>Can you tell me a good way to upgrade Squid with minimal downtime?
<P>
Here is a technique that was described by <url url="mailto:radu@netsoft.ro"
name="Radu Greab">.
<P>
Start a second Squid server on an unused HTTP port (say 4128). This
instance of Squid probably doesn't need a large disk cache. When this
second server has finished reloading the disk store, change the
<em/http_port/ values in the two <em/squid.conf/ files. Set the
original Squid to use port 5128, and the second one to use 3128. Next,
run ``squid -k reconfigure'' for both Squids. New requests will go to
the second Squid, now on port 3128 and the first Squid will finish
handling its current requests. After a few minutes, it should be safe
to fully shut down the first Squid and upgrade it. Later you can simply
repeat this process in reverse.
<sect1>Can Squid listen on more than one HTTP port?
<p>
<em>Note: The information here is current for version 2.3.</em>
<p>
Yes, you can specify multiple <em/http_port/ lines in your <em/squid.conf/
file. Squid attempts to bind() to each port that you specify. Sometimes
Squid may not be able to bind to a port, either because of permissions
or because the port is already in use. If Squid can bind to at least
one port, then it will continue running. If it can not bind to
any of the ports, then Squid stops.
<p>
With version 2.3 and later you can specify IP addresses
and port numbers together (see the squid.conf comments).
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>Memory
<sect1>Why does Squid use so much memory!?
<P>
Squid uses a lot of memory for performance reasons. It takes much, much
longer to read something from disk than it does to read directly from
memory.
<P>
A small amount of metadata for each cached object is kept in memory.
This is the <em/StoreEntry/ data structure. For <em/Squid-2/ this is
56 bytes on "small" pointer architectures (Intel, Sparc, MIPS, etc) and
88 bytes on "large" pointer architectures (Alpha). In addition, there
is a 16-byte cache key (MD5 checksum) associated with each
<em/StoreEntry/. This means there are 72 or 104 bytes of metadata in
memory for every object in your cache. A cache with 1,000,000
objects therefore requires 72&nbsp;MB of memory for <em/metadata only/.
In practice it requires much more than that.
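<P>
The arithmetic above is easy to repeat for your own cache size. A small
sketch, using the Squid-2 figures quoted in this answer;
<em/metadata_bytes/ is a hypothetical helper name.

```python
def metadata_bytes(objects, large_pointers=False):
    """Estimate in-memory metadata for `objects` cached objects (Squid-2 figures)."""
    store_entry = 88 if large_pointers else 56   # StoreEntry size in bytes
    cache_key = 16                               # MD5 cache key
    return objects * (store_entry + cache_key)


if __name__ == "__main__":
    print(metadata_bytes(1000000) / 1e6, "MB")          # 72.0 MB
    print(metadata_bytes(1000000, True) / 1e6, "MB")    # 104.0 MB
```

This counts the <em/StoreEntry/ and cache key only, so it is a lower bound.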
<P>
Squid-1.1 also uses a lot of memory to store in-transit objects.
This version stores incoming objects only in memory, until the transfer
is complete. At that point it decides whether or not to store the object
on disk. This means that when users download large files, your memory
usage will increase significantly. The squid.conf parameter <em/maximum_object_size/
determines how much memory an in-transit object can consume before we
mark it as uncachable. When an object is marked uncachable, there is no
need to keep all of the object in memory, so the memory is freed for
the part of the object which has already been written to the client.
In other words, lowering <em/maximum_object_size/ also lowers Squid-1.1
memory usage.
<P>
Other uses of memory by Squid include:
<itemize>
<item>
Disk buffers for reading and writing
<item>
Network I/O buffers
<item>
IP Cache contents
<item>
FQDN Cache contents
<item>
Netdb ICMP measurement database
<item>
Per-request state information, including full request and
reply headers
<item>
Miscellaneous statistics collection.
<item>
``Hot objects'' which are kept entirely in memory.
</itemize>
<sect1>How can I tell how much memory my Squid process is using?
<P>
One way is to simply look at <em/ps/ output on your system.
For BSD-ish systems, you probably want to use the <em/-u/ option
and look at the <em/VSZ/ and <em/RSS/ fields:
<verb>
wessels &tilde; 236% ps -axuhm
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
squid 9631 4.6 26.4 141204 137852 ?? S 10:13PM 78:22.80 squid -NCYs
</verb>
For SYSV-ish, you probably want to use the <em/-l/ option.
When interpreting the <em/ps/ output, be sure to check your <em/ps/
manual page. It may not be obvious if the reported numbers are kbytes,
or pages (usually 4 kb).
<P>
A nicer way to check the memory usage is with a program called
<em/top/:
<verb>
last pid: 20128; load averages: 0.06, 0.12, 0.11 14:10:58
46 processes: 1 running, 45 sleeping
CPU states: % user, % nice, % system, % interrupt, % idle
Mem: 187M Active, 1884K Inact, 45M Wired, 268M Cache, 8351K Buf, 1296K Free
Swap: 1024M Total, 256K Used, 1024M Free
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
9631 squid 2 0 138M 135M select 78:45 3.93% 3.93% squid
</verb>
<P>
Finally, you can ask the Squid process to report its own memory
usage. This is available on the Cache Manager <em/info/ page.
Your output may vary depending upon your operating system and
Squid version, but it looks similar to this:
<verb>
Resource usage for squid:
Maximum Resident Size: 137892 KB
Memory usage for squid via mstats():
Total space in arena: 140144 KB
Total free: 8153 KB 6%
</verb>
<P>
If your RSS (Resident Set Size) value is much lower than your
process size, then your cache performance is most likely suffering
due to <ref id="paging" name="paging">.
<sect1>My Squid process grows without bounds.
<P>
You might just have your <em/cache_mem/ parameter set too high.
See the ``<ref id="lower-mem-usage" name="What can I do to reduce Squid's memory usage?">''
entry below.
<P>
When a process continually grows in size, without levelling off
or slowing down, it often indicates a memory leak. A memory leak
occurs when some chunk of memory is allocated, but not freed when it is
no longer needed.
<P>
Memory leaks are a real problem for programs (like Squid) which do all
of their processing within a single process. Historically, Squid has
had real memory leak problems. But as the software has matured, we
believe almost all of Squid's memory leaks have been eliminated, and
new ones are at least easy to identify.
<P>
Memory leaks may also be present in your system's libraries, such
as <em/libc.a/ or even <em/libmalloc.a/. If you experience the ever-growing
process size phenomenon, we suggest you first try an
<ref id="alternate-malloc" name="alternative malloc library">.
<sect1>I set <em/cache_mem/ to XX, but the process grows beyond that!
<P>
The <em/cache_mem/ parameter <bf/does NOT/ specify the maximum
size of the process. It only specifies how much memory to use
for caching ``hot'' (very popular) replies. Squid's actual memory
usage depends very strongly on your cache size (disk space) and
your incoming request load. Reducing <em/cache_mem/ will usually
also reduce the process size, but not necessarily, and there are
other ways to reduce Squid's memory usage (see below).
<sect1>How do I analyze memory usage from the cache manager output?
<label id="analyze-memory-usage">
<P>
<it>
Note: This information is specific to Squid-1.1 versions
</it>
<P>
Look at your <em/cachemgr.cgi/ <tt/Cache
Information/ page. For example:
<verb>
Memory usage for squid via mallinfo():
Total space in arena: 94687 KB
Ordinary blocks: 32019 KB 210034 blks
Small blocks: 44364 KB 569500 blks
Holding blocks: 0 KB 5695 blks
Free Small blocks: 6650 KB
Free Ordinary blocks: 11652 KB
Total in use: 76384 KB 81%
Total free: 18302 KB 19%
Meta Data:
StoreEntry 246043 x 64 bytes = 15377 KB
IPCacheEntry 971 x 88 bytes = 83 KB
Hash link 2 x 24 bytes = 0 KB
URL strings = 11422 KB
Pool MemObject structures 514 x 144 bytes = 72 KB ( 70 free)
Pool for Request structur 516 x 4380 bytes = 2207 KB ( 2121 free)
Pool for in-memory object 6200 x 4096 bytes = 24800 KB ( 22888 free)
Pool for disk I/O 242 x 8192 bytes = 1936 KB ( 1888 free)
Miscellaneous = 2600 KB
total Accounted = 58499 KB
</verb>
<P>
First note that <tt/mallinfo()/ reports 94M in ``arena.'' This
is pretty close to what <em/top/ says (97M).
<P>
Of that 94M, 81% (76M) is actually being used at the moment. The
rest has been freed, or pre-allocated by <tt/malloc(3)/
and not yet used.
<P>
Of the 76M in use, we can account for 58.5M (76%). There are some
calls to <tt/malloc(3)/ for which we can't account.
<P>
The <tt/Meta Data/ list gives the breakdown of where the
accounted memory has gone. 45% has gone to <tt/StoreEntry/
and URL strings. Another 42% has gone to buffers holding objects
in VM while they are fetched and relayed to the clients (<tt/Pool
for in-memory object/).
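<P>
The percentages quoted above can be reproduced from the sample output
with a little arithmetic. This is just a sketch using <em/awk/ from a
Bourne-style shell; <tt/pct/ is a hypothetical helper name, and the
figures are copied from the example listing. The results agree with
the rounded figures in the text to within a percentage point.
<verb>
# pct PART TOTAL: print PART as a percentage of TOTAL, one decimal place
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f%%\n", part * 100 / total }'
}

pct 76384 94687                   # in use / arena
pct 58499 76384                   # accounted / in use
pct $((15377 + 11422)) 58499      # StoreEntry + URL strings / accounted
pct 24800 58499                   # in-memory object pool / accounted
</verb>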
<P>
The pool sizes are specified by <em/squid.conf/ parameters.
In version 1.0, these pools are somewhat broken: we keep a stack
of unused pages instead of freeing the block. In the <tt/Pool
for in-memory object/, the unused stack size is 1/2 of
<tt/cache_mem/. The <tt>Pool for disk I/O</tt> is
hardcoded at 200. For <tt/MemObject/ and <tt/Request/
it's 1/8 of your system's <tt/FD_SETSIZE/ value.
<P>
If you need to lower your process size, we recommend lowering the
max object sizes in the 'http', 'ftp' and 'gopher' config lines.
You may also want to lower <tt/cache_mem/ to suit your
needs. But if you make <tt/cache_mem/ too low, then some
objects may not get saved to disk during high-load periods. Newer
Squid versions allow you to set <tt/memory_pools off/ to
disable the free memory pools.
<sect1>The ``Total memory accounted'' value is less than the size of my Squid process.
<P>
We are not able to account for <em/all/ memory that Squid uses. This
would require excessive amounts of code to keep track of every last byte.
We do our best to account for the major uses of memory.
<P>
Also, note that the <em/malloc/ and <em/free/ functions have
their own overhead. Some additional memory is required to keep
track of which chunks are in use, and which are free. Additionally,
most operating systems do not allow processes to shrink in size.
When a process gives up memory by calling <em/free/, the total
process size does not shrink. So the process size really
represents the maximum size your Squid process has reached.
<sect1>xmalloc: Unable to allocate 4096 bytes!
<label id="malloc-death">
<P>
by <url url="mailto:hno@hem.passagen.se" name="Henrik Nordstrom">
<P>
Messages like "FATAL: xcalloc: Unable to allocate 4096 blocks of 1 bytes!"
appear when Squid can't allocate more memory, and on most operating systems
(including BSD) there are only two possible reasons:
<enum>
<item>The machine is out of swap
<item>The process' maximum data segment size has been reached
</enum>
The first case is detected using the normal swap monitoring tools
available on the platform (<em/pstat/ on SunOS, perhaps <em/pstat/ is
used on BSD as well).
<P>
To tell if it is the second case, first rule out the first case and then
monitor the size of the Squid process. If it dies at a certain size with
plenty of swap left, then the maximum data segment size has undoubtedly
been reached.
<P>
The data segment size can be limited by two factors:
<enum>
<item>Kernel imposed maximum, which no user can go above
<item>The size set with ulimit, which the user can control.
</enum>
<P>
When Squid starts it sets the data and file ulimits to the hard level. If
you manually tune ulimit before starting Squid, make sure that you set
the hard limit and not only the soft limit (the default operation of
ulimit is to only change the soft limit). Only root is allowed to raise
the hard limit.
<P>
This command prints the hard limits:
<verb>
ulimit -aH
</verb>
<P>
This command sets the data size to unlimited:
<verb>
ulimit -HSd unlimited
</verb>
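<P>
To see whether the soft limit is still below the hard limit, you can
print the two values separately. This is a minimal sketch for a
Bourne-style shell whose <em/ulimit/ accepts the same <tt/-S/,
<tt/-H/ and <tt/-d/ flags used above; the unit reported is typically
kilobytes, but check your shell's manual page.
<verb>
ulimit -Sd      # current soft data segment limit
ulimit -Hd      # current hard data segment limit
</verb>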
<sect2>BSD/OS
<P>
by <url url="mailto:Arjan.deVet@adv.IAEhv.nl" name="Arjan de Vet">
<P>
The default kernel limit on BSD/OS for datasize is 64MB (at least on 3.0
which I'm using).
<P>
Recompile a kernel with larger datasize settings:
<verb>
maxusers 128
# Support for large inpcb hash tables, e.g. busy WEB servers.
options INET_SERVER
# support for large routing tables, e.g. gated with full Internet routing:
options "KMEMSIZE=\(16*1024*1024\)"
options "DFLDSIZ=\(128*1024*1024\)"
options "DFLSSIZ=\(8*1024*1024\)"
options "SOMAXCONN=128"
options "MAXDSIZ=\(256*1024*1024\)"
</verb>
See <em>/usr/share/doc/bsdi/config.n</em> for more info.
<P>
In /etc/login.conf I have this:
<verb>
default:\
:path=/bin /usr/bin /usr/contrib/bin:\
:datasize-cur=256M:\
:openfiles-cur=1024:\
:openfiles-max=1024:\
:maxproc-cur=1024:\
:stacksize-cur=64M:\
:radius-challenge-styles=activ,crypto,skey,snk,token:\
:tc=auth-bsdi-defaults:\
:tc=auth-ftp-bsdi-defaults:
#
# Settings used by /etc/rc and root
# This must be set properly for daemons started as root by inetd as well.
# Be sure reset these values back to system defaults in the default class!
#
daemon:\
:path=/bin /usr/bin /sbin /usr/sbin:\
:widepasswords:\
:tc=default:
# :datasize-cur=128M:\
# :openfiles-cur=256:\
# :maxproc-cur=256:\
</verb>
<P>
This should give enough space for a 256MB squid process.
<sect2>FreeBSD (2.2.X)
<P>
by Duane Wessels
<P>
The procedure is almost identical to that for BSD/OS above.
Increase the open filedescriptor limit in <em>/sys/conf/param.c</em>:
<verb>
int maxfiles = 4096;
int maxfilesperproc = 1024;
</verb>
Increase the maximum and default data segment size in your kernel
config file, e.g. <em>/sys/conf/i386/CONFIG</em>:
<verb>
options "MAXDSIZ=(512*1024*1024)"
options "DFLDSIZ=(128*1024*1024)"
</verb>
We also found it necessary to increase the number of mbuf clusters:
<verb>
options "NMBCLUSTERS=10240"
</verb>
And, if you have more than 256 MB of physical memory, you probably
have to disable BOUNCE_BUFFERS (whatever that is), so comment
out this line:
<verb>
#options BOUNCE_BUFFERS #include support for DMA bounce buffers
</verb>
Also, update limits in <em>/etc/login.conf</em>:
<verb>
# Settings used by /etc/rc
#
daemon:\
:coredumpsize=infinity:\
:datasize=infinity:\
:maxproc=256:\
:maxproc-cur@:\
:memoryuse-cur=64M:\
:memorylocked-cur=64M:\
:openfiles=4096:\
:openfiles-cur@:\
:stacksize=64M:\
:tc=default:
</verb>
And don't forget to run ``cap_mkdb /etc/login.conf'' after editing that file.
<sect2>OSF, Digital Unix
<P>
by <url url="mailto:ongbh@zpoprp.zpo.dec.com" name="Ong Beng Hui">
<P>
To increase the data size for Digital UNIX, edit the file <tt>/etc/sysconfigtab</tt>
and add the entry...
<verb>
proc:
per-proc-data-size=1073741824
</verb>
Or, with csh, use the limit command, such as
<verb>
&gt; limit datasize 1024M
</verb>
<P>
Editing <tt>/etc/sysconfigtab</tt> requires a reboot, but the limit command
doesn't.
<sect1>fork: (12) Cannot allocate memory
<P>
When Squid is reconfigured (SIGHUP) or the logs are rotated (SIGUSR1),
some of the helper processes (dnsserver) must be killed and
restarted. If your system does not have enough virtual memory,
the Squid process may not be able to fork to start the new helper
processes.
The best way to fix this is to increase your virtual memory by adding
swap space. Normally your system uses raw disk partitions for swap
space, but most operating systems also support swapping on regular
files (Digital Unix excepted). See your system manual pages for
<em/swap/, <em/swapon/, and <em/mkfile/.
<sect1>What can I do to reduce Squid's memory usage?
<label id="lower-mem-usage">
<P>
If your cache performance is suffering because of memory limitations,
you might consider buying more memory. But if that is not an option,
there are a number of things to try:
<itemize>
<item>
Try a <ref id="alternate-malloc" name="different malloc library">.
<item>
Reduce the <em/cache_mem/ parameter in the config file. This controls
how many ``hot'' objects are kept in memory. Reducing this parameter
will not significantly affect performance, but you may receive
some warnings in <em/cache.log/ if your cache is busy.
<item>
Set <em/memory_pools off/ in the config file. This causes
Squid to give up unused memory by calling <em/free()/ instead of
holding on to the chunk for potential, future use.
<item>
Reduce the <em/cache_swap/ parameter in your config file. This will
reduce the number of objects Squid keeps. Your overall hit ratio may go down a
little, but your cache will perform significantly better.
<item>
Reduce the <em/maximum_object_size/ parameter (Squid-1.1 only).
You won't be able to
cache the larger objects, and your byte volume hit ratio may go down,
but Squid will perform better overall.
<item>
If you are using Squid-1.1.x, try the ``NOVM'' version.
</itemize>
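<P>
As a rough illustration, several of these suggestions might be combined
in <em/squid.conf/ as shown below. The values are placeholders chosen
for the example, not recommendations, and the accepted syntax and units
differ between Squid versions, so check the comments in your own
<em/squid.conf/ before copying anything.
<verb>
# keep fewer hot objects in memory (example value)
cache_mem 8 MB
# free unused memory instead of holding it in pools
memory_pools off
# do not cache very large objects (example value)
maximum_object_size 2048 KB
</verb>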
<sect1>Using an alternate <em/malloc/ library.
<label id="alternate-malloc">
<P>
Many users have found improved performance and memory utilization when
linking Squid with an external malloc library. We recommend either
GNU malloc, or dlmalloc.
<sect2>Using GNU malloc
<P>
To make Squid use GNU malloc follow these simple steps:
<enum>
<item>Download the GNU malloc source, available from one of
<url url="http://www.gnu.org/order/ftp.html"
name="The GNU FTP Mirror sites">.
<item>Compile GNU malloc
<verb>
% gzip -dc malloc.tar.gz | tar xf -
% cd malloc
% vi Makefile # edit as needed
% make
</verb>
<item>Copy libmalloc.a to your system's library directory and be sure to
name it <em/libgnumalloc.a/.
<verb>
% su
# cp malloc.a /usr/lib/libgnumalloc.a
</verb>
<item>(Optional) Copy the GNU malloc.h to your system's include directory and
be sure to name it <em/gnumalloc.h/. This step is not required, but if
you do this, then Squid will be able to use the <em/mstats()/ function to
report memory usage statistics on the cachemgr info page.
<verb>
# cp malloc.h /usr/include/gnumalloc.h
</verb>
<item>Reconfigure and recompile Squid
<verb>
% make realclean
% ./configure ...
% make
% make install
</verb>
Note: in later distributions, 'realclean' has been changed to 'distclean'.
As the configure script runs, watch its output. You should find that
it locates libgnumalloc.a and optionally gnumalloc.h.
</enum>
<sect2>dlmalloc
<P>
<url url="http://g.oswego.edu/dl/html/malloc.html" name="dlmalloc">
has been written by <url url="mailto:dl@cs.oswego.edu"
name="Doug Lea">. According to Doug:
<quote>
This is not the fastest, most space-conserving, most portable, or
most tunable malloc ever written. However it is among the fastest
while also being among the most space-conserving, portable and tunable.
</quote>
<P>
dlmalloc is included with the <em/Squid-2/ source distribution.
To use this library, you simply give an option to the <em/configure/
script:
<verb>
% ./configure --enable-dlmalloc ...
</verb>
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>The Cache Manager
<label id="cachemgr-section">
<P>
by <url url="mailto:JLarmour@origin-at.co.uk" name="Jonathan Larmour">
<sect1>What is the cache manager?
<P>
The cache manager (<em/cachemgr.cgi/) is a CGI utility for
displaying statistics about the <em/squid/ process as it runs.
The cache manager is a convenient way to manage the cache and view
statistics without logging into the server.
<sect1>How do you set it up?
<P>
That depends on which web server you're using. Below you will
find instructions for configuring the CERN and Apache servers
to permit <em/cachemgr.cgi/ usage.
<P>
<em/EDITOR'S NOTE: readers are encouraged to submit instructions
for configuration of cachemgr.cgi on other web server platforms, such
as Netscape./
<P>
After you edit the server configuration files, you will probably
need to either restart your web server or send it a <tt/SIGHUP/ signal
to tell it to re-read its configuration files.
<P>
When you're done configuring your web server, you'll connect to
the cache manager with a web browser, using a URL such as:
<verb>
http://www.example.com/Squid/cgi-bin/cachemgr.cgi/
</verb>
<sect1>Cache manager configuration for CERN httpd 3.0
<P>
First, you should ensure that only specified workstations can access
the cache manager. That is done in your CERN <em/httpd.conf/, not in
<em/squid.conf/.
<verb>
Protection MGR-PROT {
Mask @(workstation.example.com)
}
</verb>
Wildcards are acceptable, IP addresses are acceptable, and others
can be added with a comma-separated list of IP addresses. There
are many other ways to protect the script. Your server documentation has
details.
<P>
You also need to add:
<verb>
Protect /Squid/* MGR-PROT
Exec /Squid/cgi-bin/*.cgi /usr/local/squid/bin/*.cgi
</verb>
This marks the script as executable to those in <tt/MGR-PROT/.
<sect1>Cache manager configuration for Apache
<P>
First, make sure the cgi-bin directory you're using is listed with a
<tt/ScriptAlias/ in your Apache <em/srm.conf/ file like this:
<verb>
ScriptAlias /Squid/cgi-bin/ /usr/local/squid/cgi-bin/
</verb>
It's probably a <bf/bad/ idea to <tt/ScriptAlias/
the entire <em//usr/local/squid/bin/ directory where all the
Squid executables live.
<P>
Next, you should ensure that only specified workstations can access
the cache manager. That is done in your Apache <em/access.conf/,
not in <em/squid.conf/. At the bottom of <em/access.conf/
file, insert:
<verb>
<Location /Squid/cgi-bin/cachemgr.cgi>
order deny,allow
deny from all
allow from workstation.example.com
&etago;Location>
</verb>
You can have more than one allow line, and you can allow
domains or networks.
<P>
Alternately, <em/cachemgr.cgi/ can be password-protected. You'd
add the following to <em/access.conf/:
<verb>
<Location /Squid/cgi-bin/cachemgr.cgi>
AuthUserFile /path/to/password/file
AuthGroupFile /dev/null
AuthName User/Password Required
AuthType Basic
require user cachemanager
&etago;Location>
</verb>
Consult the Apache documentation for information on using <em/htpasswd/
to set a password for this ``user.''
<sect1>Cache manager configuration for Roxen 2.0 and later
<p>
by Francesco ``kinkie'' Chemolli
<p>
Notice: this is <em/not/ how things would best be done
with Roxen, but this is what you need to do to adhere to the
example.
Also, knowledge of basic Roxen configuration is required.
<p>
This is what's required to start up a fresh Virtual Server, only
serving the cache manager. If you already have some Virtual Server
you wish to use to host the Cache Manager, just add a new CGI
support module to it.
<p>
Create a new virtual server, and set it to host http://www.example.com/.
Add to it at least the following modules:
<itemize>
<item>Content Types
<item>CGI scripting support
</itemize>
<p>
In the <em/CGI scripting support/ module, section <em/Settings/,
change the following settings:
<itemize>
<item>CGI-bin path: set to /Squid/cgi-bin/
<item>Handle *.cgi: set to <em/no/
<item>Run user scripts as owner: set to <em/no/
<item>Search path: set to the directory containing the cachemgr.cgi file
</itemize>
<p>
In section <em/Security/, set <em/Patterns/ to:
<verb>
allow ip=1.2.3.4
</verb>
where 1.2.3.4 is the IP address for workstation.example.com.
<p>
Save the configuration, and you're done.
<sect1>Cache manager ACLs in <em/squid.conf/
<P>
The default cache manager access configuration in <em/squid.conf/ is:
<verb>
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl all src 0.0.0.0/0.0.0.0
</verb>
With the following rules:
<verb>
http_access deny manager !localhost
http_access allow all
</verb>
<P>
The first ACL is the most important as the cache manager program
interrogates squid using a special <tt/cache_object/ protocol.
Try it yourself by doing:
<P>
<verb>
telnet mycache.example.com 3128
GET cache_object://mycache.example.com/info HTTP/1.0
</verb>
<P>
The default ACLs say that if the request is for a
<tt/cache_object/, and it isn't the local host, then deny
access; otherwise allow access.
<P>
In fact, only allowing localhost access means that on the
initial <em/cachemgr.cgi/ form you can only specify the cache
host as <tt/localhost/. We recommend the following:
<verb>
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl example src 123.123.123.123/255.255.255.255
acl all src 0.0.0.0/0.0.0.0
</verb>
Where <tt/123.123.123.123/ is the IP address of your web server.
Then modify the rules like this:
<verb>
http_access allow manager localhost
http_access allow manager example
http_access deny manager
http_access allow all
</verb>
If you're using <em/miss_access/, then don't forget to also add
a <em/miss_access/ rule for the cache manager:
<verb>
miss_access allow manager
</verb>
<P>
The default ACLs assume that your web server is on the same machine
as <em/squid/. Remember that the connection from the cache
manager program to squid originates at the web server, not the
browser. So if your web server lives somewhere else, you should
make sure that IP address of the web server that has <em/cachemgr.cgi/
installed on it is in the <tt/example/ ACL above.
<P>
Always be sure to send a <tt/SIGHUP/ signal to <em/squid/
any time you change the <em/squid.conf/ file.
<sect1>Why does it say I need a password and a URL?
<P>
If you ``drop'' the list box, and browse it, you will see that the
password is only required to shut down the cache, and the URL is
required to refresh an object (i.e., retrieve it from its original
source again). Otherwise these fields can be left blank: a password
is not required to obtain access to the informational aspects of
<em/cachemgr.cgi/.
<sect1>I want to shut down the cache remotely. What's the password?
<P>
See the <tt/cachemgr_passwd/ directive in <em/squid.conf/.
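<P>
For example, a line like the following would set a password for the
<tt/shutdown/ action only. The password shown is a placeholder, and
the list of action names depends on your Squid version, so treat this
as a sketch and check the comments in your own <em/squid.conf/.
<verb>
cachemgr_passwd secret shutdown
</verb>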
<sect1>How do I make the cache host default to <em/my/ cache?
<P>
When you run <em/configure/ use the <em/--enable-cachemgr-hostname/ option:
<verb>
% ./configure --enable-cachemgr-hostname=`hostname` ...
</verb>
<p>
Note: if you do this after you have already installed Squid, you need to
make sure <em/cachemgr.cgi/ gets recompiled. For example:
<verb>
% cd src
% rm cachemgr.o cachemgr.cgi
% make cachemgr.cgi
</verb>
<p>
Then copy <em/cachemgr.cgi/ to your HTTP server's <em/cgi-bin/ directory.
<sect1>What's the difference between Squid TCP connections and Squid UDP connections?
<P>
Browsers and caches use TCP connections to retrieve web objects
from web servers or caches. UDP connections are used when another
cache using you as a sibling or parent wants to find out if you
have an object in your cache that it's looking for. The UDP
connections are ICP queries.
<sect1>It says the storage expiration will happen in 1970!
<P>
Don't worry. The default (and sensible) behavior of <em/squid/
is to expire an object when it happens to overwrite it. It doesn't
explicitly garbage collect (unless you tell it to in other ways).
<sect1>What do the Meta Data entries mean?
<P>
<descrip>
<tag/StoreEntry/
Entry describing an object in the cache.
<tag/IPCacheEntry/
An entry in the DNS cache.
<tag/Hash link/
Link in the cache hash table structure.
<tag/URL strings/
The strings of the URLs themselves that map to
an object number in the cache, allowing access to the
StoreEntry.
</descrip>
<P>
Basically just like the <tt/log/ file in your cache directory:
<descrip>
<tag/Pool MemObject structures/
Info about objects currently in memory
(e.g., in the process of being transferred).
<tag/Pool for Request structures/
Information about each request as it happens.
<tag/Pool for in-memory object/
Space for object data as it is retrieved.
</descrip>
<P>
If <em/squid/ is much smaller than this field, run for cover!
Something is very wrong, and you should probably restart <em/squid/.
<sect1>In the utilization section, what is <tt/Other/?
<P>
<tt/Other/ is a default category to track objects which
don't fall into one of the defined categories.
<sect1>In the utilization section, why is the <tt>Transfer KB/sec</tt>
column always zero?
<P>
This column contains gross estimations of data transfer rates
averaged over the entire time the cache has been running. These
numbers are unreliable and mostly useless.
<sect1>In the utilization section, what is the <tt/Object Count/?
<P>
The number of objects of that type in the cache right now.
<sect1>In the utilization section, what is the <tt>Max/Current/Min KB</tt>?
<P>
These refer to the size all the objects of this type have grown
to/currently are/shrunk to.
<sect1>What is the <tt>I/O</tt> section about?
<P>
These are histograms on the number of bytes read from the network
per <tt/read(2)/ call. Somewhat useful for determining
maximum buffer sizes.
<sect1>What is the <tt/Objects/ section for?
<P>
<bf><em/Warning:/</bf> this will download to your browser
a list of every URL in the cache and statistics about it. It can
be very, very large. <bf><em/Sometimes it will be larger than
the amount of available memory in your client!/</bf> You
probably don't need this information anyway.
<sect1>What is the <tt/VM Objects/ section for?
<P>
<tt/VM Objects/ are the objects which are in Virtual Memory.
These are objects which are currently being retrieved and
those which were kept in memory for fast access (accelerator
mode).
<sect1>What does <tt/AVG RTT/ mean?
<P>
Average Round Trip Time. This is the average time between sending
an ICP ping and receiving a reply.
<sect1>In the IP cache section, what's the difference between a hit, a negative hit and a miss?
<P>
A HIT means that the hostname was found in the IP cache. A
MISS means that it wasn't found in the cache. A negative hit
means that it was found in the cache, but the cached answer is
that the name does not exist.
<sect1>What do the IP cache contents mean anyway?
<P>
The hostname is the name that was requested to be resolved.
<P>
For the <tt/Flags/ column:
<itemize>
<item><tt/C/ Means positively cached.
<item><tt/N/ Means negatively cached.
<item><tt/P/ Means the request is pending being dispatched.
<item><tt/D/ Means the request has been dispatched and we're waiting for an answer.
<item><tt/L/ Means it is a locked entry because it represents a parent or sibling.
</itemize>
The <tt/TTL/ column represents ``Time To Live'' (i.e., how long
the cache entry is valid). (May be negative if the document has
expired.)
<P>
The <tt/N/ column is the number of IP addresses from which
the cache has documents.
<P>
The rest of the line lists all the IP addresses that have been associated
with that IP cache entry.
<P>
<sect1>What is the fqdncache and how is it different from the ipcache?
<P>
IPCache contains data for the Hostname to IP-Number mapping, and
FQDNCache does it the other way round. For example:
<em/IP Cache Contents:/
<verb>
Hostname Flags lstref TTL N [IP-Number]
gorn.cc.fh-lippe.de C 0 21581 1 193.16.112.73
lagrange.uni-paderborn.de C 6 21594 1 131.234.128.245
www.altavista.digital.com C 10 21299 4 204.123.2.75 ...
2/ftp.symantec.com DL 1583 -772855 0
Flags: C --> Cached
D --> Dispatched
N --> Negative Cached
L --> Locked
lstref: Time since last use
TTL: Time-To-Live until information expires
N: Count of addresses
</verb>
<P>
<em/FQDN Cache Contents:/
<verb>
IP-Number Flags TTL N Hostname
130.149.17.15 C -45570 1 andele.cs.tu-berlin.de
194.77.122.18 C -58133 1 komet.teuto.de
206.155.117.51 N -73747 0
Flags: C --> Cached
D --> Dispatched
N --> Negative Cached
L --> Locked
TTL: Time-To-Live until information expires
N: Count of names
</verb>
<sect1>What does ``Page faults with physical i/o: 4897'' mean?
<label id="paging">
<P>
This question was asked on the <em/squid-users/ mailing list, to which
there were three excellent replies.
<P>
by <url url="mailto:JLarmour@origin-at.co.uk" name="Jonathan Larmour">
<P>
You get a ``page fault'' when your OS tries to access something in memory
which is actually swapped to disk. The term ``page fault'' while correct at
the kernel and CPU level, is a bit deceptive to a user, as there's no
actual error - this is a normal feature of operation.
<P>
Also, this doesn't necessarily mean your squid is swapping by that much.
Most operating systems also implement paging for executables, so that only
sections of the executable which are actually used are read from disk into
memory. Also, whenever squid needs more memory, the fact that the memory
was allocated will show up in the page faults.
<P>
However, if the number of faults is unusually high, and getting bigger,
this could mean that squid is swapping. Another way to verify this is using
a program called ``vmstat'' which is found on most UNIX platforms. If you run
this as ``vmstat 5'' this will update a display every 5 seconds. This can
tell you if the system as a whole is swapping a lot (see your local man
page for vmstat for more information).
<P>
It is very bad for squid to swap, as every single request will be blocked
until the requested data is swapped in. It is better to tweak the <em/cache_mem/
and/or <em/memory_pools/ setting in squid.conf, or switch to the NOVM versions
of squid, than allow this to happen.
<P>
by <url url="mailto:peter@spinner.dialix.com.au" name="Peter Wemm">
<P>
There are two different operations at work: paging and swapping. Paging
is when individual pages are shuffled (either discarded or swapped
to/from disk), while ``swapping'' <em/generally/ means the entire
process got sent to/from disk.
<P>
Needless to say, swapping a process is a pretty drastic event, and usually
only reserved for when there's a memory crunch and paging out cannot free
enough memory quickly enough. Also, there's some variation on how
swapping is implemented in OS's. Some don't do it at all or do a hybrid
of paging and swapping instead.
<P>
As you say, paging out doesn't necessarily involve disk IO, eg: text (code)
pages are read-only and can simply be discarded if they are not used (and
reloaded if/when needed). Data pages are also discarded if unmodified, and
paged out if there's been any changes. Allocated memory (malloc) is always
saved to disk since there's no executable file to recover the data from.
mmap() memory is variable.. If it's backed from a file, it uses the same
rules as the data segment of a file - ie: either discarded if unmodified or
paged out.
<P>
There's also ``demand zeroing'' of pages as well that cause faults.. If you
malloc memory and it calls brk()/sbrk() to allocate new pages, the chances
are that you are allocated demand zero pages. Ie: the pages are not
``really'' attached to your process yet, but when you access them for the
first time, the page fault causes the page to be connected to the process
address space and zeroed - this saves unnecessary zeroing of pages that are
allocated but never used.
<P>
The ``page faults with physical IO'' comes from the OS via getrusage(). It's
highly OS dependent on what it means. Generally, it means that the process
accessed a page that was not present in memory (for whatever reason) and
there was disk access to fetch it. Many OS's load executables by demand
paging as well, so the act of starting squid implicitly causes page faults
with disk IO - however, many (but not all) OS's use ``read ahead'' and
``prefault'' heuristics to streamline the loading. Some OS's maintain
``intent queues'' so that pages can be selected as pageout candidates ahead
of time. When (say) squid touches a freshly allocated demand zero page and
one is needed, the OS can page out one of the candidates on the spot,
causing a 'fault with physical IO' with demand zeroing of allocated memory
which doesn't happen on many other OS's. (The other OS's generally put
the process to sleep while the pageout daemon finds a page for it).
<P>
The meaning of ``swapping'' varies. On FreeBSD for example, swapping out is
implemented as unlocking upages, kernel stack, PTD etc for aggressive
pageout with the process. The only thing left of the process in memory is
the 'struct proc'. The FreeBSD paging system is highly adaptive and can
resort to paging in a way that is equivalent to the traditional swapping
style operation (ie: entire process). FreeBSD also tries stealing pages
from active processes in order to make space for disk cache. I suspect
this is why setting 'memory_pools off' on the non-NOVM squids on FreeBSD is
reported to work better - the VM/buffer system could be competing with
squid to cache the same pages. It's a pity that squid cannot use mmap() to
do file IO on the 4K chunks in its memory pool (I can see that this is not
a simple thing to do though, but that won't stop me wishing. :-).
<P>
by <url url="mailto:webadm@info.cam.ac.uk" name="John Line">
<P>
The comments so far have been about what paging/swapping figures mean in
a ``traditional'' context, but it's worth bearing in mind that on some systems
(Sun's Solaris 2, at least), the virtual memory and filesystem handling are
unified and what a user process sees as reading or writing a file, the system
simply sees as paging something in from disk or a page being updated so it
needs to be paged out. (I suppose you could view it as similar to the operating
system memory-mapping the files behind-the-scenes.)
<P>
The effect of this is that on Solaris 2, paging figures will also include file
I/O. Or rather, the figures from vmstat certainly appear to include file I/O,
and I presume (but can't quickly test) that figures such as those quoted by
Squid will also include file I/O.
<P>
To confirm the above (which represents an impression from what I've read and
observed, rather than 100% certain facts...), using an otherwise idle Sun Ultra
1 system I just tried using cat (small, shouldn't need to page) to copy
(a) one file to another, (b) a file to /dev/null, (c) /dev/zero to a file, and
(d) /dev/zero to /dev/null (interrupting the last two with control-C after a
while!), while watching with vmstat. 300-600 page-ins or page-outs per second
when reading or writing a file (rather than a device), essentially zero in
other cases (and when not cat-ing).
<P>
So ... beware assuming that all systems are similar and that paging figures
represent *only* program code and data being shuffled to/from disk - they
may also include the work in reading/writing all those files you were
accessing...
<sect2>Ok, so what is unusually high?
<P>
You'll probably want to compare the number of page faults to the number of
HTTP requests. If this ratio is close to, or exceeding&nbsp;1, then
Squid is paging too much.
<sect1>What does the IGNORED field mean in the 'cache server list'?
<P>
This refers to ICP replies which Squid ignored, for one of these
reasons:
<itemize>
<item>
The URL in the reply could not be found in the cache at all.
<item>
The URL in the reply was already being fetched. Probably
this ICP reply arrived too late.
<item>
The URL in the reply did not have a MemObject associated with
it. Either the request is already finished, or the user aborted
before the ICP arrived.
<item>
The reply came from a multicast-responder, but the
<em/cache_peer_access/ configuration does not allow us to
forward this request to that
neighbor.
<item>
Source-Echo replies from known neighbors are ignored.
<item>
ICP_OP_DENIED replies are ignored after the first 100.
</itemize>
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>Access Controls
<label id="access-controls">
<sect1>Introduction
<p>
Squid's access control scheme is relatively comprehensive and difficult
for some people to understand. There are two different components: <em/ACL elements/,
and <em/access lists/. An access list consists of an <em/allow/ or <em/deny/ action
followed by a number of ACL elements.
<sect2>ACL elements
<p>
<em>Note: The information here is current for version 2.4.</em>
<p>
Squid knows about the following types of ACL elements:
<itemize>
<item>
<bf/src/: source (client) IP addresses
<item>
<bf/dst/: destination (server) IP addresses
<item>
<bf/myip/: the local IP address of a client's connection
<item>
<bf/srcdomain/: source (client) domain name
<item>
<bf/dstdomain/: destination (server) domain name
<item>
<bf/srcdom_regex/: source (client) regular expression pattern matching
<item>
<bf/dstdom_regex/: destination (server) regular expression pattern matching
<item>
<bf/time/: time of day, and day of week
<item>
<bf/url_regex/: URL regular expression pattern matching
<item>
<bf/urlpath_regex/: URL-path regular expression pattern matching, leaves out the protocol and hostname
<item>
<bf/port/: destination (server) port number
<item>
<bf/myport/: local port number that client connected to
<item>
<bf/proto/: transfer protocol (http, ftp, etc)
<item>
<bf/method/: HTTP request method (get, post, etc)
<item>
<bf/browser/: regular expression pattern matching on the request's user-agent header
<item>
<bf/ident/: string matching on the user's name
<item>
<bf/ident_regex/: regular expression pattern matching on the user's name
<item>
<bf/src_as/: source (client) Autonomous System number
<item>
<bf/dst_as/: destination (server) Autonomous System number
<item>
<bf/proxy_auth/: user authentication via external processes
<item>
<bf/proxy_auth_regex/: regular expression pattern matching on the names of users authenticated via external processes
<item>
<bf/snmp_community/: SNMP community string matching
<item>
<bf/maxconn/: a limit on the maximum number of connections from a single client IP address
<item>
<bf/req_mime_type/: regular expression pattern matching on the request content-type header
<item>
<bf/arp/: Ethernet (MAC) address matching
</itemize>
<p>
Notes:
<p>
Not all of the ACL elements can be used with all types of access lists (described below).
For example, <em/snmp_community/ is only meaningful when used with <em/snmp_access/. The
<em/src_as/ and <em/dst_as/ types are only used in <em/cache_peer_access/ access lists.
<p>
The <em/arp/ ACL requires the special configure option --enable-arp-acl. Furthermore, the
ARP ACL code is not portable to all operating systems. It works on Linux, Solaris, and
some *BSD variants.
<p>
The SNMP ACL element and access list require the --enable-snmp configure option.
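<p>
As a minimal sketch of how these pieces fit together (assuming Squid was
built with --enable-snmp, and using the hypothetical ACL name
<em/snmppublic/ with the conventional ``public'' community string):
<verb>
acl all src 0.0.0.0/0.0.0.0
acl localhost src 127.0.0.1/255.255.255.255
acl snmppublic snmp_community public
snmp_access allow snmppublic localhost
snmp_access deny all
</verb>
Here only SNMP queries that carry the ``public'' community string
<em/and/ originate from the local host are answered.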
<p>
Some ACL elements can cause processing delays. For example, use of <em/srcdomain/ and <em/srcdom_regex/
requires a reverse DNS lookup on the client's IP address. This lookup adds some delay to the request.
<p>
Each ACL element is assigned a unique <em/name/. A named ACL element consists of a <em/list of values/.
When checking for a match, the multiple values use OR logic. In other words, an ACL element is <em/matched/
when any one of its values is a match.
<p>
You can't give the same name to two different types of ACL elements. It will generate a syntax error.
<p>
You can put different values for the same ACL name on different lines. Squid combines them into
one list.
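<p>
For illustration, here is a sketch of one ACL name spread over two lines
(the name <em/mynets/ and the addresses are made up for this example):
<verb>
acl all src 0.0.0.0/0.0.0.0
acl mynets src 10.0.1.0/24
acl mynets src 10.0.2.0/24 10.0.3.0/24
http_access allow mynets
http_access deny all
</verb>
Squid treats <em/mynets/ as a single list of three networks, and the ACL
matches when the client address falls in any one of them.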
<sect2>Access Lists
<p>
There are a number of different access lists:
<itemize>
<item>
<bf/http_access/: Allows HTTP clients (browsers) to access the HTTP port. This is the primary access control list.
<item>
<bf/icp_access/: Allows neighbor caches to query your cache with ICP.
<item>
<bf/miss_access/: Allows certain clients to forward cache misses through your cache.
<item>
<bf/no_cache/: Defines responses that should not be cached.
<item>
<bf/redirector_access/: Controls which requests are sent through the redirector pool.
<item>
<bf/ident_lookup_access/: Controls which requests need an Ident lookup.
<item>
<bf/always_direct/: Controls which requests should always be forwarded directly to origin servers.
<item>
<bf/never_direct/: Controls which requests should never be forwarded directly to origin servers.
<item>
<bf/snmp_access/: Controls SNMP client access to the cache.
<item>
<bf/broken_posts/: Defines requests for which squid appends an extra CRLF after POST message bodies as required by some broken origin servers.
<item>
<bf/cache_peer_access/: Controls which requests can be forwarded to a given neighbor (peer).
</itemize>
<p>
Notes:
<p>
An access list <em/rule/ consists of an <em/allow/ or <em/deny/ keyword, followed by a list of ACL element names.
<p>
An access list consists of one or more access list rules.
<p>
Access list rules are checked in the order they are written. List searching terminates as soon as one
of the rules is a match.
<p>
If a rule has multiple ACL elements, it uses AND logic. In other
words, <em/all/ ACL elements of the rule must be a match in order
for the rule to be a match. This means that it is possible to
write a rule that can never be matched. For example, a port number
can never be equal to both 80 AND 8000 at the same time.
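<p>
The port example can be sketched like this (the ACL names are
hypothetical):
<verb>
# Never matches: one request cannot go to port 80 AND port 8000.
acl port80 port 80
acl port8000 port 8000
http_access deny port80 port8000

# Probably what was intended: a single ACL whose values are OR'ed.
acl webports port 80 8000
http_access deny webports
</verb>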
<p>
If none of the rules are matched, then the default action is the
<em/opposite/ of the last rule in the list. It's a good idea to
be explicit with the default action. The best way is to use
the <em/all/ ACL. For example:
<verb>
acl all src 0/0
http_access deny all
</verb>
<sect1>How do I allow my clients to use the cache?
<p>
Define an ACL that corresponds to your clients' IP addresses.
For example:
<verb>
acl myclients src 172.16.5.0/24
</verb>
Next, allow those clients in the <em/http_access/ list:
<verb>
http_access allow myclients
</verb>
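<p>
Putting the two steps together with an explicit default, a complete
fragment might look like this. It is only a sketch: the address range
is the example one from above, and your real <em/squid.conf/ will
usually contain other <em/http_access/ lines as well.
<verb>
acl all src 0.0.0.0/0.0.0.0
acl myclients src 172.16.5.0/24
http_access allow myclients
http_access deny all
</verb>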
<sect1>How do I configure Squid not to cache a specific server?
<p>
Define an ACL that matches the server, then deny it in the <em/no_cache/ list:
<verb>
acl someserver dstdomain .someserver.com
no_cache deny someserver
</verb>
<sect1>How do I implement an ACL ban list?
<P>
As an example, we will assume that you would like to prevent users from
accessing cooking recipes.
<P>
One way to implement this would be to deny access to any URLs
that contain the words ``cooking'' or ``recipe.''
You would use these configuration lines:
<verb>
acl Cooking1 url_regex cooking
acl Recipe1 url_regex recipe
http_access deny Cooking1
http_access deny Recipe1
http_access allow all
</verb>
The <em/url_regex/ means to search the entire URL for the regular
expression you specify. Note that these regular expressions are case-sensitive,
so a URL containing ``Cooking'' would not be denied.
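<p>
If you want the match to ignore case, the regular-expression ACL types
in Squid 2 accept a <tt/-i/ option. A sketch, assuming your version
supports it (check the comments in your default <em/squid.conf/):
<verb>
acl Cooking1 url_regex -i cooking
acl Recipe1 url_regex -i recipe
http_access deny Cooking1
http_access deny Recipe1
http_access allow all
</verb>
With this variant a URL containing ``Cooking'' or ``RECIPE'' is denied too.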
<P>
Another way is to deny access to specific servers which are known
to hold recipes. For example:
<verb>
acl Cooking2 dstdomain gourmet-chef.com
http_access deny Cooking2
http_access allow all
</verb>
The <em/dstdomain/ means to search the hostname in the URL for the
string ``gourmet-chef.com.''
Note that when IP addresses are used in URLs (instead of domain names),
Squid-1.1 implements relaxed access controls. If a domain name
for the IP address has been saved in Squid's ``FQDN cache,'' then
Squid can compare the destination domain against the access controls.
However, if the domain is not immediately available, Squid allows
the request and makes a lookup for the IP address so that it may
be available for future requests.
<sect1>How do I block specific users or groups from accessing my cache?
<sect2>Ident
<P>
You can use
<url url="http://info.internet.isi.edu:80/in-notes/rfc/files/rfc931.txt"
name="ident lookups">
to allow specific users access to your cache. This requires that an
<url url="ftp://ftp.lysator.liu.se/pub/ident/servers"
name="ident server">
process runs on the user's machine(s).
In your <em/squid.conf/ configuration
file you would write something like this:
<verb>
ident_lookup on
acl friends user kim lisa frank joe
http_access allow friends
http_access deny all
</verb>
<sect2>Proxy Authentication
<label id="proxy-auth-acl">
<P>
Another option is to use proxy-authentication. In this scheme, you assign
usernames and passwords to individuals. When they first use the proxy
they are asked to authenticate themselves by entering their username and
password.
<P>
In Squid v2 this authentication is handled via external processes. For
information on how to configure this, please see
<ref id="configuring-proxy-auth" name="Configuring Proxy Authentication">.
<sect1>Do you have a CGI program which lets users change their own proxy passwords?
<P>
<url url="mailto:orso@ineparnet.com.br" name="Pedro L Orso">
has adapted Apache's <em/htpasswd/ into a CGI program
called <url url="/htpasswd/chpasswd-cgi.tar.gz" name="chpasswd.cgi">.
<sect1>Is there a way to do ident lookups only for a certain host and compare the result with a userlist in squid.conf?
<P>
Sort of.
<P>
If you use a <em/user/ ACL in <em/squid.conf/, then Squid will perform
an
<url url="http://info.internet.isi.edu:80/in-notes/rfc/files/rfc931.txt"
name="ident lookup">
for every client request. In other words, Squid-1.1 will perform
ident lookups for all requests or no requests. Defining a <em/user/ ACL
enables ident lookups, regardless of the <em/ident_lookup/ setting.
<P>
However, even though ident lookups are performed for every request, Squid does
not wait for the lookup to complete unless the ACL rules require it. Consider this
configuration:
<verb>
acl host1 src 10.0.0.1
acl host2 src 10.0.0.2
acl pals user kim lisa frank joe
http_access allow host1
http_access allow host2 pals
</verb>
Requests coming from 10.0.0.1 will be allowed immediately because
there are no user requirements for that host. However, requests
from 10.0.0.2 will be allowed only after the ident lookup completes, and
if the username is in the set kim, lisa, frank, or joe.
<sect1>Common Mistakes
<sect2>And/Or logic
<P>
You've probably noticed (and been frustrated by) the fact that
you cannot combine access controls with terms like ``and'' or ``or.''
These operations are already built in to the access control scheme
in a fundamental way which you must understand.
<itemize>
<item>
<bf>All elements of an <em/acl/ entry are OR'ed together</bf>.
<item>
<bf>All elements of an <em/access/ entry are AND'ed together</bf>.
e.g. <em/http_access/ and <em/icp_access/.
</itemize>
<P>
For example, the following access control configuration will never work:
<verb>
acl ME src 10.0.0.1
acl YOU src 10.0.0.2
http_access allow ME YOU
</verb>
In order for the request to be allowed, it must match the ``ME'' acl AND the ``YOU'' acl.
This is impossible because any IP address could only match one or the other. This
should instead be rewritten as:
<verb>
acl ME src 10.0.0.1
acl YOU src 10.0.0.2
http_access allow ME
http_access allow YOU
</verb>
Or, alternatively, this would also work:
<verb>
acl US src 10.0.0.1 10.0.0.2
http_access allow US
</verb>
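<p>
The two kinds of logic are often combined in one rule. In this sketch
(the ACL name <em/WEB/ and the port list are chosen only for
illustration), the values inside each <em/acl/ line are OR'ed, and the
two ACLs on the <em/http_access/ line are AND'ed:
<verb>
acl all src 0.0.0.0/0.0.0.0
acl US src 10.0.0.1 10.0.0.2
acl WEB port 80 443
http_access allow US WEB
http_access deny all
</verb>
A request is allowed when it comes from 10.0.0.1 or 10.0.0.2
<em/and/ its destination port is 80 or 443.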
<sect2>allow/deny mixups
<P>
<it>
I have read through my squid.conf numerous times, spoken to my
neighbors, read the FAQ and Squid Docs and cannot for the life of
me work out why the following will not work.
</it>
<P>
<it>
I can successfully access cachemgr.cgi from our web server machine here,
but I would like to use MRTG to monitor various aspects of our proxy.
When I try to use 'client' or GET cache_object from the machine the
proxy is running on, I always get access denied.
</it>
<verb>
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl server src 1.2.3.4/255.255.255.255
acl all src 0.0.0.0/0.0.0.0
acl ourhosts src 1.2.0.0/255.255.0.0
http_access deny manager !localhost !server
http_access allow ourhosts
http_access deny all
</verb>
<P>
The intent here is to allow cache manager requests from the <em/localhost/
and <em/server/ addresses, and deny all others. This policy has been
expressed here:
<verb>
http_access deny manager !localhost !server
</verb>
<P>
The problem here is that for allowable requests, this access rule is
not matched. For example, if the source IP address is <em/localhost/,
then ``!localhost'' is <em/false/ and the access rule is not matched, so
Squid continues checking the other rules. Cache manager requests from
the <em/server/ address work because <em/server/ is a subset of <em/ourhosts/
and the second access rule will match and allow the request. Also note that
this means any cache manager request from <em/ourhosts/ would be allowed.
<P>
To implement the desired policy correctly, the access rules should be
rewritten as
<verb>
http_access allow manager localhost
http_access allow manager server
http_access deny manager
http_access allow ourhosts
http_access deny all
</verb>
If you're using <em/miss_access/, then don't forget to also add
a <em/miss_access/ rule for the cache manager:
<verb>
miss_access allow manager
</verb>
<P>
You may be concerned that having five access rules instead of three
may have an impact on the cache performance. In our experience this is
not the case. Squid is able to handle a moderate amount of access control
checking without degrading overall performance. You may like to verify
that for yourself, however.
<sect2>Differences between <em/src/ and <em/srcdomain/ ACL types.
<P>
For the <em/srcdomain/ ACL type, Squid does a reverse lookup
of the client's IP address and checks the result with the domains
given on the <em/acl/ line. With the <em/src/ ACL type, Squid
converts hostnames to IP addresses at startup and then only compares
the client's IP address. The <em/src/ ACL is preferred over <em/srcdomain/
because it does not require address-to-name lookups for each request.
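<p>
A sketch of the two styles side by side (the network and domain are
placeholders, not values from a real site):
<verb>
# Preferred: only the client's IP address is compared.
acl ourclients src 192.168.1.0/255.255.255.0

# Needs a reverse DNS lookup of the client's address.
acl ourdomain srcdomain .example.com
</verb>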
<sect1>I set up my access controls, but they don't work! Why?
<P>
You can debug your access control configuration by setting the
<em/debug_options/ parameter in <em/squid.conf/ and
watching <em/cache.log/ as requests are made. The access control
routines correspond to debug section 28, so you might enter:
<verb>
debug_options ALL,1 28,9
</verb>
<sect1>Proxy-authentication and neighbor caches
<P>
The problem...
<quote>
<verb>
[ Parents ]
/ \
/ \
[ Proxy A ] --- [ Proxy B ]
|
|
USER
</verb>
<P>
<it>
Proxy A sends an ICP query to Proxy B about an object, Proxy B replies with an
ICP_HIT. Proxy A forwards the HTTP request to Proxy B, but
does not pass on the authentication details, therefore the HTTP GET from
Proxy A fails.
</it>
</quote>
<P>
Only ONE proxy cache in a chain is allowed to ``use'' the Proxy-Authentication
request header. Once the header is used, it must not be passed on to
other proxies.
<P>
Therefore, you must allow the neighbor caches to request from each other
without proxy authentication. This is simply accomplished by listing
the neighbor ACLs first in the list of <em/http_access/ lines. For example:
<verb>
acl proxy-A src 10.0.0.1
acl proxy-B src 10.0.0.2
acl user_passwords proxy_auth /tmp/user_passwds
http_access allow proxy-A
http_access allow proxy-B
http_access allow user_passwords
http_access deny all
</verb>
<sect1>Is there an easy way of banning all Destination addresses except one?
<P>
<verb>
acl GOOD dst 10.0.0.1
acl BAD dst 0.0.0.0/0.0.0.0
http_access allow GOOD
http_access deny BAD
</verb>
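<p>
The same idea works with a name-based ACL. This is a sketch only; the
host name is a placeholder:
<verb>
acl all src 0.0.0.0/0.0.0.0
acl GOOD dstdomain www.example.com
http_access allow GOOD
http_access deny all
</verb>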
<sect1>Does anyone have a ban list of porn sites and such?
<P>
<itemize>
<item><url url="http://web.onda.com.br/orso/" name="Pedro Lineu Orso's List">
<item><url url="http://www.hklc.com/squidblock/" name="Linux Center Hong Kong's List">
<item>
Snerpa, an ISP in Iceland operates a DNS-database of
IP-addresses of blacklisted sites containing porn, violence,
etc. which is utilized using a small perl-script redirector.
Information on this on the <url
url="http://www.snerpa.is/notendur/infilter/infilter-en.phtml"
name="INfilter"> webpage.
</itemize>
<sect1>Squid doesn't match my subdomains
<P>
There is a subtle problem with domain-name based access controls
when a single ACL element has an entry that is a subdomain of
another entry. For example, consider this list:
<verb>
acl FOO dstdomain boulder.co.us vail.co.us co.us
</verb>
<P>
In the first place, the above list is simply wrong because
the first two (<em/boulder.co.us/ and <em/vail.co.us/) are
unnecessary. Any domain name that matches one of the first two
will also match the last one (<em/co.us/). Ok, but why does this
happen?
<P>
The problem stems from the data structure used to index domain
names in an access control list. Squid uses <em/Splay trees/
for lists of domain names. As with other tree-based data structures,
the searching algorithm requires a comparison function that returns
-1, 0, or +1 for any pair of keys (domain names). This is similar
to the way that <em/strcmp()/ works.
<P>
The problem is that it is wrong to say that <em/co.us/ is greater-than,
equal-to, or less-than <em/boulder.co.us/.
<P>
For example, if you
said that <em/co.us/ is LESS than <em/fff.co.us/, then
the Splay tree searching algorithm might never discover
<em/co.us/ as a match for <em/kkk.co.us/.
<P>
Similarly, if you said that <em/co.us/ is GREATER than <em/fff.co.us/,
then the Splay tree searching algorithm might never
discover <em/co.us/ as a match for <em/bbb.co.us/.
<P>
The bottom line is that you can't have one entry that is a subdomain
of another. Squid-2.2 will warn you if it detects this condition.
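<p>
For the list above, the fix is simply to keep the most general entry
and drop the two that it already covers:
<verb>
acl FOO dstdomain co.us
</verb>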
<sect1>Why does Squid deny some port numbers?
<P>
It is dangerous to allow Squid to connect to certain port numbers.
For example, it has been demonstrated that someone can use Squid
as an SMTP (email) relay. As I'm sure you know, SMTP relays are
one of the ways that spammers are able to flood our mailboxes.
To prevent mail relaying, Squid denies requests when the URL port
number is 25. Other ports should be blocked as well, as a precaution.
<P>
There are two ways to filter by port number: either allow specific
ports, or deny specific ports. By default, Squid does the first. This
is the ACL entry that comes in the default <em/squid.conf/:
<verb>
acl Safe_ports port 80 21 443 563 70 210 1025-65535
http_access deny !Safe_ports
</verb>
The above configuration denies requests when the URL port number is
not in the list. The list allows connections to the standard
ports for HTTP, FTP, Gopher, SSL, WAIS, and all non-privileged
ports.
<P>
Another approach is to deny dangerous ports. The dangerous
port list should look something like:
<verb>
acl Dangerous_ports port 7 9 19 22 23 25 53 109 110 119
http_access deny Dangerous_ports
</verb>
...and probably many others.
<P>
Please consult the <em>/etc/services</em> file on your system
for a list of known ports and protocols.
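<p>
If you keep the default allow-list approach and need one more
privileged port, you can add it without editing the original line,
because values for the same ACL name on different lines are combined.
A sketch (port 488 is just an arbitrary example of a port below 1025):
<verb>
acl Safe_ports port 80 21 443 563 70 210 1025-65535
acl Safe_ports port 488
http_access deny !Safe_ports
</verb>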
<sect1>Does Squid support the use of a database such as mySQL for storing the ACL list?
<p>
<em>Note: The information here is current for version 2.2.</em>
<p>
No, it does not.
<sect1>How can I allow a single address to access a specific URL?
<p>
This example allows only the <em/special_client/ to access
the <em/special_url/. Any other client that tries to access
the <em/special_url/ is denied.
<verb>
acl special_client src 10.1.2.3
acl special_url url_regex ^http://www.squid-cache.org/Doc/FAQ/$
http_access allow special_client special_url
http_access deny special_url
</verb>
<sect1>How can I allow some clients to use the cache at specific times?
<p>
Let's say you have two workstations that should only be allowed access
to the Internet during working hours (8:30 - 17:30). You can use
something like this:
<verb>
acl FOO src 10.1.2.3 10.1.2.4
acl WORKING time MTWHF 08:30-17:30
http_access allow FOO WORKING
http_access deny FOO
</verb>
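<p>
A rule set can hold more than one time window. In this sketch the same
two workstations also get Saturday mornings; the <em/SATURDAY/ ACL and
its hours are invented for the example, and it assumes the day code
<tt/A/ stands for Saturday, as in the comments of the default
<em/squid.conf/:
<verb>
acl FOO src 10.1.2.3 10.1.2.4
acl WORKING time MTWHF 08:30-17:30
acl SATURDAY time A 09:00-12:00
http_access allow FOO WORKING
http_access allow FOO SATURDAY
http_access deny FOO
</verb>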
<sect1>Problems with IP ACL's that have complicated netmasks
<p>
<em>Note: The information here is current for version 2.3.</em>
<p>
The following ACL entry gives inconsistent or unexpected results:
<verb>
acl restricted src 10.0.0.128/255.0.0.128 10.85.0.0/16
</verb>
The reason is that IP access lists are stored in ``splay'' tree
data structures. These trees require the keys to be sortable.
When you use a complicated, or non-standard, netmask (255.0.0.128), it confuses
the function that compares two address/mask pairs.
<p>
The best way to fix this problem is to use separate ACL names
for each ACL value. For example, change the above to:
<verb>
acl restricted1 src 10.0.0.128/255.0.0.128
acl restricted2 src 10.85.0.0/16
</verb>
<p>
Then, of course, you'll have to rewrite your <em/http_access/
lines as well.
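<p>
For example, assuming the original configuration had the single rule
<tt/http_access deny restricted/ (an assumption, since the rule is not
shown above), it would become:
<verb>
acl restricted1 src 10.0.0.128/255.0.0.128
acl restricted2 src 10.85.0.0/16
http_access deny restricted1
http_access deny restricted2
</verb>
Two separate rules are needed because ACLs listed on one
<em/http_access/ line are AND'ed together.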
<sect1>Can I set up ACL's based on MAC address rather than IP?
<p>
Yes, for some operating systems. Squid calls these ``ARP ACLs'' and
they are supported on Linux, Solaris, and probably BSD variants.
<p>
NOTE: Squid can only determine the MAC address for clients that
are on the same subnet. If the client is on a different subnet,
then Squid cannot find out its MAC address.
<p>
To use ARP (MAC) access controls, you
first need to compile in the optional code. Do this with
the <em/--enable-arp-acl/ configure option:
<verb>
% ./configure --enable-arp-acl ...
% make clean
% make
</verb>
If <em>src/acl.c</em> doesn't compile, then ARP ACLs are probably not
supported on your system.
<p>
If everything compiles, then you can add some ARP ACL lines to
your <em/squid.conf/:
<verb>
acl M1 arp 01:02:03:04:05:06
acl M2 arp 11:12:13:14:15:16
http_access allow M1
http_access allow M2
http_access deny all
</verb>
<sect1>Debugging ACLs
<p>
If ACLs are giving you problems and you don't know why they
aren't working, you can use this tip to debug them.
<p>
In <em>squid.conf</em> enable debugging for section 32 at level 2.
For example:
<verb>
debug_options ALL,1 32,2
</verb>
Then restart or reconfigure Squid.
<p>
From now on, your <em/cache.log/ should contain a line for every
request that explains if it was allowed, or denied, and which
ACL was the last one that it matched.
<sect1>Can I limit the number of connections from a client?
<p>
Yes, use the <em/maxconn/ ACL type in conjunction with <em/http_access deny/.
For example:
<verb>
acl losers src 1.2.3.0/24
acl 5CONN maxconn 5
http_access deny 5CONN losers
</verb>
<p>
Given the above configuration, when a client whose source IP address
is in the 1.2.3.0/24 subnet tries to establish 6 or more connections
at once, Squid returns an error page. Unless you use the
<em/deny_info/ feature, the error message will just say ``access
denied.''
<p>
Note, the <em/maxconn/ ACL type is kind of tricky because it
uses a greater-than comparison. The ACL is a match when the number
of established connections is <em/greater/ than the value you
specify. Because of that, you don't want to use the <em/maxconn/
ACL with <em/http_access allow/.
<p>
Also note that you could use <em/maxconn/ in conjunction with
a user type (ident, proxy_auth), rather than an IP address type.
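<p>
A sketch of that variant, using an <em/ident/ ACL with made-up user
names (it assumes ident lookups work for your clients):
<verb>
acl students ident kim lisa
acl 5CONN maxconn 5
http_access deny 5CONN students
</verb>
The connection count itself is still kept per client IP address; the
user ACL only selects whom the limit applies to.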
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>Troubleshooting
<sect1>Why am I getting ``Proxy Access Denied?''
<P>
You may need to set up the <em/http_access/ option to allow
requests from your IP addresses. Please see <ref id="access-controls"
name="the Access Controls section"> for information about that.
<P>
If <em/squid/ is in httpd-accelerator mode, it will accept normal
HTTP requests and forward them to an HTTP server, but it will not
honor proxy requests. If you want your cache to also accept
proxy-HTTP requests then you must enable this feature:
<verb>
httpd_accel_with_proxy on
</verb>
Alternatively, you may have misconfigured one of your ACLs. Check the
<em/access.log/ and <em/squid.conf/ files for clues.
<sect1>I can't get <tt/local_domain/ to work; <em/Squid/ is caching the objects from my local servers.
<P>
The <tt/local_domain/ directive does not prevent local
objects from being cached. It prevents the use of sibling caches
when fetching local objects. If you want to prevent objects from
being cached, use the <tt/cache_stoplist/ or <tt/http_stop/
configuration options (depending on your version).
<sect1>I get <tt/Connection Refused/ when the cache tries to retrieve an object located on a sibling, even though the sibling thinks it delivered the object to my cache.
<P>
If the HTTP port number is wrong but the ICP port is correct, your
cache sends ICP queries correctly, and the ICP replies fool it into
thinking the configuration is correct. Large objects will still
fail, because you don't have the correct HTTP port for the sibling
in your <em/squid.conf/ file. If your sibling changed their
<tt/http_port/, you could have this problem for some time
before noticing.
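<p>
In Squid 2 both port numbers appear on the <em/cache_peer/ line (older
versions use <em/cache_host/). A sketch with a placeholder hostname and
the usual default ports:
<verb>
# cache_peer hostname type http-port icp-port
cache_peer sibling.example.com sibling 3128 3130
</verb>
The third field (3128 here) must match the sibling's current
<tt/http_port/; the fourth (3130) is its ICP port.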
<sect1>Running out of filedescriptors
<label id="filedescriptors">
<P>
If you see the <tt/Too many open files/ error message, you
are most likely running out of file descriptors. This may be due
to running Squid on an operating system with a low filedescriptor
limit. This limit is often configurable in the kernel or with
other system tuning tools. There are two ways to run out of file
descriptors: first, you can hit the per-process limit on file
descriptors. Second, you can hit the system limit on total file
descriptors for all processes.
<sect2>Linux
<P>
Dancer has a <url url="http://www2.simegen.com/~dancer/minihowto.html"
name="Mini-'Adding File-descriptors-to-linux for squid' HOWTO">, but
this information seems specific to the Linux 2.0.36 kernel.
<p>
Henrik has a <url url="http://squid.sourceforge.net/hno/linux-lfd.html" name="How to get many filedescriptors on Linux 2.2.X"> page.
<P>
You also might want to
have a look at
<url url="http://www.linux.org.za/oskar/patches/kernel/filehandle/"
name="filehandle patch">
by
<url url="mailto:michael@metal.iinet.net.au"
name="Michael O'Reilly">
<P>
If your kernel version is 2.2.x or greater, you can read and write
the maximum number of file handles and/or inodes
simply by accessing the special files:
<verb>
/proc/sys/fs/file-max
/proc/sys/fs/inode-max
</verb>
So, to increase your file descriptor limit:
<verb>
echo 3072 > /proc/sys/fs/file-max
</verb>
<P>
If your kernel version is between 2.0.35 and 2.1.x (?), you can read and write
the maximum number of file handles and/or inodes
simply by accessing the special files:
<verb>
/proc/sys/kernel/file-max
/proc/sys/kernel/inode-max
</verb>
<P>
While this does increase the current number of file descriptors,
Squid's <em/configure/ script probably won't figure out the
new value unless you also update the include files, specifically
the value of <em/OPEN_MAX/ in
<em>/usr/include/linux/limits.h</em>.
<sect2>Solaris
<P>
Add the following to your <em>/etc/system</em> file to
increase your maximum file descriptors per process:
<P>
<verb>
set rlim_fd_max = 4096
</verb>
<P>
Next you should re-run the <em>configure</em> script
in the top directory so that it finds the new value.
If it does not find the new limit, then you might try
editing <em>include/autoconf.h</em> and setting
<tt/#define DEFAULT_FD_SETSIZE/ by hand. Note that
<em>include/autoconf.h</em> is created from <em>autoconf.h.in</em>
every time you run configure. Thus, if you edit it by
hand, you might lose your changes later on.
<P>
If you have a very old version of Squid (1.1.X), and you
want to use more than 1024 descriptors, then you must
edit <em>src/Makefile</em> and enable
<tt/&dollar;(USE_POLL_OPT)/. Then recompile <em/squid/.
<p>
<url url="mailto:voeckler at rvs dot uni-hannover dot de" name="Jens-S. Voeckler">
advises that you should NOT change the soft limit (<em/rlim_fd_cur/) to anything
larger than 256. It will break other programs, such as the license
manager needed for the SUN workshop compiler. Jens-S. also says that it
should be safe to raise the limit as high as 16,384.
<sect2>IRIX
<p>
For some hints, please see SGI's <url
url="http://www.sgi.com/tech/web/irix62.html" name="Tuning IRIX 6.2 for
a Web Server"> document.
<sect2>FreeBSD
<P>
by <url url="mailto:torsten.sturm@axis.de" name="Torsten Sturm">
<enum>
<item>How do I check my maximum filedescriptors?
<P>Do <tt/sysctl -a/ and look for the value of
<tt/kern.maxfilesperproc/.
<item>How do I increase them?
<verb>
sysctl -w kern.maxfiles=XXXX
sysctl -w kern.maxfilesperproc=XXXX
</verb>
<bf>Warning</bf>: You probably want <tt/maxfiles
&gt; maxfilesperproc/ if you're going to be pushing the
limit.
<item>What is the upper limit?
<P>I don't think there is a formal upper limit inside the kernel.
All the data structures are dynamically allocated. In practice
there might be unintended metaphenomena (kernel spending too much
time searching tables, for example).
</enum>
<sect2>General BSD
<P>
For most BSD-derived systems (SunOS, 4.4BSD, OpenBSD, FreeBSD,
NetBSD, BSD/OS, 386BSD, Ultrix) you can also use the ``brute force''
method to increase these values in the kernel (requires a kernel
rebuild):
<enum>
<item>How do I check my maximum filedescriptors?
<P>Do <tt/pstat -T/ and look for the <tt/files/
value, typically expressed as the ratio of <tt/current/maximum/.
<item>How do I increase them the easy way?
<P>One way is to increase the value of the <tt/maxusers/ variable
in the kernel configuration file and build a new kernel. This method
is quick and easy but also has the effect of increasing a wide variety of
other variables that you may not need or want increased.
<item>Is there a more precise method?
<P>Another way is to find the <em/param.c/ file in your kernel
build area and change the arithmetic behind the relationship between
<tt/maxusers/ and the maximum number of open files.
</enum>
Here are a few examples which should lead you in the right direction:
<enum>
<item>SunOS
<P>Change the value of <tt/nfile/ in <tt>/usr/kvm/sys/conf.common/param.c</tt> by altering this equation:
<verb>
int nfile = 16 * (NPROC + 16 + MAXUSERS) / 10 + 64;
</verb>
Where <tt/NPROC/ is defined by:
<verb>
#define NPROC (10 + 16 * MAXUSERS)
</verb>
<item>FreeBSD (from the 2.1.6 kernel)
<P>Very similar to SunOS, edit <em>/usr/src/sys/conf/param.c</em>
and alter the relationship between <tt/maxusers/ and the
<tt>maxfiles</tt> and <tt>maxfilesperproc</tt> variables:
<verb>
int maxfiles = NPROC*2;
int maxfilesperproc = NPROC*2;
</verb>
Where <tt>NPROC</tt> is defined by:
<tt>#define NPROC (20 + 16 * MAXUSERS)</tt>
The per-process limit can also be adjusted directly in the kernel
configuration file with the following directive:
<tt>options OPEN_MAX=128</tt>
<item>BSD/OS (from the 2.1 kernel)
<P>Edit <tt>/usr/src/sys/conf/param.c</tt> and adjust the
<tt>maxfiles</tt> math here:
<verb>
int maxfiles = 3 * (NPROC + MAXUSERS) + 80;
</verb>
Where <tt>NPROC</tt> is defined by:
<tt>#define NPROC (20 + 16 * MAXUSERS)</tt>
You should also set the <tt>OPEN_MAX</tt> value in your kernel
configuration file to change the per-process limit.
</enum>
<sect2>Reconfigure afterwards
<P>
<bf/NOTE:/ After you rebuild/reconfigure your kernel with more
filedescriptors, you must then recompile Squid. Squid's configure
script determines how many filedescriptors are available, so you
must make sure the configure script runs again as well. For example:
<verb>
cd squid-1.1.x
make realclean
./configure --prefix=/usr/local/squid
make
</verb>
<sect1>What are these strange lines about removing objects?
<P>
For example:
<verb>
97/01/23 22:31:10| Removed 1 of 9 objects from bucket 3913
97/01/23 22:33:10| Removed 1 of 5 objects from bucket 4315
97/01/23 22:35:40| Removed 1 of 14 objects from bucket 6391
</verb>
These log entries are normal, and do not indicate that <em/squid/ has
reached <tt/cache_swap_high/.
<P>
Consult your cache information page in <em/cachemgr.cgi/ for
a line like this:
<verb>
Storage LRU Expiration Age: 364.01 days
</verb>
Objects which have not been used for that amount of time are removed as
a part of the regular maintenance. You can set an upper limit on the
<tt/LRU Expiration Age/ value with <tt/reference_age/ in the config
file.
<sect1>Can I change a Windows NT FTP server to list directories in Unix format?
<P>
Why, yes you can! Select the following menus:
<itemize>
<item>Start
<item>Programs
<item>Microsoft Internet Server (Common)
<item>Internet Service Manager
</itemize>
<P>
This will bring up a box with icons for your various services. One of
them should be a little ftp ``folder.'' Double click on this.
<P>
You will then have to select the server (there should only be one).
Select it, choose ``Properties'' from the menu, and then choose the
``directories'' tab along the top.
<P>
There will be an option at the bottom saying ``Directory listing style.''
Choose the ``Unix'' type, not the ``MS-DOS'' type.
<P>
<quote>
--Oskar Pearson &lt;oskar@is.co.za&gt;
</quote>
<sect1>Why am I getting ``Ignoring MISS from non-peer x.x.x.x?''
<P>
You are receiving ICP MISSes (via UDP) from a parent or sibling cache
whose IP address your cache does not know about. This may happen
in two situations.
<P>
<enum>
<item>
If the peer is multihomed, it is sending packets out an interface
which is not advertised in the DNS. Unfortunately, this is a
configuration problem at the peer site. You can tell them to either
add the interface's IP address to their DNS, or use Squid's
'udp_outgoing_address' option to force the replies
out a specific interface. For example:
<P>
<em/on your parent squid.conf:/
<verb>
udp_outgoing_address proxy.parent.com
</verb>
<em/on your squid.conf:/
<verb>
cache_host proxy.parent.com parent 3128 3130
</verb>
<item>
You can also see this warning when sending ICP queries to
multicast addresses. For security reasons, Squid requires
your configuration to list all other caches listening on the
multicast group address. If an unknown cache listens to that address
and sends replies, your cache will log the warning message. To fix
this situation, either tell the unknown cache to stop listening
on the multicast address, or if they are legitimate, add them
to your configuration file.
</enum>
<sect1>DNS lookups for domain names with underscores (_) always fail.
<P>
The standards for naming hosts
(<url url="http://ds.internic.net/rfc/rfc952.txt" name="RFC 952">,
<url url="http://ds.internic.net/rfc/rfc1101.txt" name="RFC 1101">)
do not allow underscores in domain names:
<quote>
A "name" (Net, Host, Gateway, or Domain name) is a text string up
to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus
sign (-), and period (.).
</quote>
The resolver library that ships with recent versions of BIND enforces
this restriction, returning an error for any host with an underscore in
the hostname. The best solution is to complain to the hostmaster of the
offending site, and ask them to rename their host.
<p>
See also the
<url url="http://www.intac.com/~cdp/cptd-faq/section4.html#underscore"
name="comp.protocols.tcp-ip.domains FAQ">.
<P>
Some people have noticed that
<url url="http://ds.internic.net/rfc/rfc1033.txt" name="RFC 1033">
implies that underscores <bf/are/ allowed. However, this is an
<em/informational/ RFC with a poorly chosen
example, and not a <em/standard/ by any means.
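<P>
As a rough illustration of the character rule quoted above, the following
Python sketch reports which characters in a hostname fall outside the
RFC 952 set (letters, digits, minus sign, and period). The name
<tt/illegal_hostname_chars/ is made up for this example and is not part of
Squid or BIND, and the sketch checks only the character set, not the
length or label rules.

```python
import string

# Characters RFC 952 permits in a name: letters, digits, "-" and ".".
ALLOWED = set(string.ascii_letters + string.digits + "-.")


def illegal_hostname_chars(hostname):
    """Return a sorted list of characters that RFC 952 does not allow."""
    return sorted(set(hostname) - ALLOWED)


if __name__ == "__main__":
    # Hypothetical hostnames, used only to exercise the check.
    for name in ("www.example.com", "my_host.example.com"):
        print(name, illegal_hostname_chars(name))
```

An empty list means the name uses only permitted characters; a list
containing the underscore is the case this question is about.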
<sect1>Why does Squid say: ``Illegal character in hostname; underscores are not allowed?''
<P>
See the above question. The underscore character is not
valid for hostnames.
<P>
Some DNS resolvers allow the underscore, so yes, the hostname
might work fine when you don't use Squid.
<P>
To make Squid allow underscores in hostnames, re-run the
<em>configure</em> script with this option:
<verb>
% ./configure --enable-underscores ...
</verb>
and then recompile:
<verb>
% make clean
% make
</verb>
<sect1>Why am I getting access denied from a sibling cache?
<P>
The answer to this is somewhat complicated, so please hold on.
<em/NOTE:/ most of this text is taken from
<url url="http://www.nlanr.net/&percnt;7ewessels/Papers/icp-squid.ps.gz"
name="ICP and the Squid Web Cache">.
<P>
An ICP query does not include any parent or sibling designation,
so the receiver really has no indication of how the peer
cache is configured to use it. This issue becomes important
when a cache is willing to serve cache hits to anyone, but only
handle cache misses for its paying users or customers. In other
words, whether or not to allow the request depends on whether the
result is a hit or a miss. To accomplish this,
Squid acquired the <tt/miss_access/ feature
in October of 1996.
<P>
The necessity of ``miss access'' makes life a little bit complicated,
and not only because it was awkward to implement. Miss access
means that the ICP query reply must be an extremely accurate prediction
of the result of a subsequent HTTP request. Ascertaining
this result is actually very hard, if not impossible to
do, since the ICP request cannot convey the
full HTTP request.
Additionally, there are more types of HTTP request results than there
are for ICP. The ICP query reply will either be a hit or miss.
However, the HTTP request might result in a ``<tt/304 Not Modified/'' reply
sent from the origin server. Such a reply is not strictly a hit since the peer
needed to forward a conditional request to the source. At the same time,
it's not strictly a miss either since the local object data is still valid,
and the Not-Modified reply is quite small.
<P>
One serious problem for cache hierarchies is mismatched freshness
parameters. Consider a cache <em/C/ using ``strict''
freshness parameters so its users get maximally current data.
<em/C/ has a sibling <em/S/ with less strict freshness parameters.
When an object is requested at <em/C/, <em/C/ might
find that <em/S/ already has the object via an ICP query and
ICP HIT response. <em/C/ then retrieves the object
from <em/S/.
<P>
In an HTTP/1.0 world, <em/C/ (and <em/C/'s client)
will receive an object that was never
subject to its local freshness rules. Neither HTTP/1.0 nor ICP provides
any way to ask only for objects less than a certain age. If the
retrieved object is stale by <em/C/'s rules,
it will be removed from <em/C/'s cache, but
it will subsequently be fetched from <em/S/ so long as it
remains fresh there. This configuration miscoupling
problem is a significant deterrent to establishing
both parent and sibling relationships.
<P>
HTTP/1.1 provides numerous request headers to specify freshness
requirements, which actually introduces
a different problem for cache hierarchies: ICP
still does not include any age information, neither in query nor
reply. So <em/S/ may return an ICP HIT if its
copy of the object is fresh by its configuration
parameters, but the subsequent HTTP request may result
in a cache miss due to any
<tt/Cache-control:/ headers originated by <em/C/ or by
<em/C/'s client. Situations now emerge where the ICP reply
no longer matches the HTTP request result.
<P>
In the end, the fundamental problem is that the ICP query does not
provide enough information to accurately predict whether
the HTTP request
will be a hit or miss. In fact, the current ICP Internet Draft is very
vague on this subject. What does ICP HIT really mean? Does it mean
``I know a little about that URL and have some copy of the object?'' Or
does it mean ``I have a valid copy of that object and you are allowed to
get it from me?''
<P>
So, what can be done about this problem? We really need to change ICP
so that freshness parameters are included. Until that happens, the members
of a cache hierarchy have only two options to totally eliminate the ``access
denied'' messages from sibling caches:
<enum>
<item>Make sure all members have the same <tt/refresh_rules/ parameters.
<item>Do not use <tt/miss_access/ at all. Promise your sibling cache
administrator that <em/your/ cache is properly configured and that you
will not abuse their generosity. The sibling cache administrator can
check his log files to make sure you are keeping your word.
</enum>
If neither of these is realistic, then the sibling relationship should not
exist.
<sect1>Cannot bind socket FD NN to *:8080 (125) Address already in use
<P>
This means that another process is already listening on port 8080
(or whatever you're using). It could mean that you have a Squid process
already running, or it could be from another program. To verify, use
the <em/netstat/ command:
<verb>
netstat -naf inet | grep LISTEN
</verb>
That will show all sockets in the LISTEN state. You might also try
<verb>
netstat -naf inet | grep 8080
</verb>
If you find that some process has bound to your port, but you're not sure
which process it is, you might be able to use the excellent
<url url="ftp://vic.cc.purdue.edu/pub/tools/unix/lsof/"
name="lsof">
program. It will show you which processes own every open file descriptor
on your system.
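<P>
If you just want to confirm that a port is taken, you can also try to bind
to it yourself. The following Python sketch does that. The helper name
<tt/port_in_use/ is hypothetical, and the sketch only tells you
<em/whether/ the port is occupied, not which process owns it (use
<em/lsof/ for that). Note that binding to ports below 1024 without root
privileges fails with a different error, which the sketch passes on
unchanged.

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding to (host, port) fails with EADDRINUSE."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as err:
        if err.errno == errno.EADDRINUSE:
            return True
        # Any other failure (for example, permission denied) is re-raised.
        raise
    finally:
        sock.close()
    return False


if __name__ == "__main__":
    # 8080 is the port from the error message above; use your own http_port.
    print("port 8080 in use:", port_in_use(8080))
```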
<sect1>icpDetectClientClose: ERROR xxx.xxx.xxx.xxx: (32) Broken pipe
<P>
This means that the client socket was closed by the client
before Squid was finished sending data to it. Squid detects this
by trying to <tt/read(2)/ some data from the socket. If the
<tt/read(2)/ call fails, then Squid knows the socket has been
closed. Normally the <tt/read(2)/ call returns <em/ECONNRESET: Connection reset by peer/
and these are NOT logged. Any other error messages (such as
<em/EPIPE: Broken pipe/) are logged to <em/cache.log/. See the ``intro'' of
section 2 of your Unix manual for a list of all error codes.
<sect1>icpDetectClientClose: FD 135, 255 unexpected bytes
<P>
These are caused by misbehaving Web clients attempting to use persistent
connections. Squid-1.1 does not support persistent connections.
<sect1>Does Squid work with NTLM Authentication?
<P>
<url url="/Versions/v2/2.5/" name="Version 2.5"> will
support Microsoft NTLM authentication. However, there are some
limits on our support: We cannot proxy connections to an origin
server that uses NTLM authentication, but we can act as a web
accelerator or proxy server and authenticate the client connection
using NTLM.
<p>
We support NT4, Samba, and Windows 2000 Domain Controllers. For
more information get squid 2.5 and run <em>./configure --help</em>.
<p>
Why can't we proxy NTLM even though we can use it?
Quoting from the summary at the end of the browser authentication section in
<url url="http://support.microsoft.com/support/kb/articles/Q198/1/16.ASP"
name="this article">:
<quote>
In summary, Basic authentication does not require an implicit end-to-end
state, and can therefore be used through a proxy server. Windows NT
Challenge/Response authentication requires implicit end-to-end state and
will not work through a proxy server.
</quote>
<P>
Squid transparently passes the NTLM request and response headers between
clients and servers. NTLM relies on a single end-to-end connection (possibly
with men-in-the-middle, but a single connection every step of the way). This
implies that for NTLM authentication to work at all with proxy caches, the
proxy would need to tightly link the client-proxy and proxy-server links, as
well as understand the state of the link at any one time. NTLM through a
CONNECT might work, but as far as we know that hasn't been implemented
by anyone, and it would prevent the pages being cached - removing the value
of the proxy.
<p>
NTLM authentication is carried entirely inside the HTTP protocol, but is
different from Basic authentication in many ways.
<enum>
<item>
It depends on a stateful end-to-end connection, which collides with
the RFC 2616 model in which proxy servers disjoin the client-proxy and
proxy-server connections.
<item>
It takes place only once per connection, not per request. Once the
connection is authenticated, all future requests on the same connection
inherit the authentication. The connection must be reestablished to set
up other authentication or re-identify the user.
</enum>
<p>
The reasons why it is not implemented in Netscape are probably:
<itemize>
<item> It is very specific for the Windows platform
<item> It is not defined in any RFC or even internet draft.
<item> The protocol has several shortcomings, where the most apparent one is
that it cannot be proxied.
<item> There exists an open internet standard which does mostly the same but
without the shortcomings or platform dependencies: <url url="ftp://ftp.isi.edu/in-notes/rfc2617.txt" name="digest authentication">.
</itemize>
<sect1>The <em/default/ parent option isn't working!
<P>
This message was received at <em/squid-bugs/:
<quote>
<it>If you have only one parent, configured as:</it>
<verb>
cache_host xxxx parent 3128 3130 no-query default
</verb>
<it>nothing is sent to the parent; neither UDP packets, nor TCP connections.</it>
</quote>
<P>
Simply adding <em/default/ to a parent does not force all requests to be sent
to that parent. The term <em/default/ is perhaps a poor choice of words. A <em/default/
parent is only used as a <bf/last resort/. If the cache is able to make direct connections,
direct will be preferred over default. If you want to force all requests to your parent
cache(s), use the <em/never_direct/ option:
<verb>
acl all src 0.0.0.0/0.0.0.0
never_direct allow all
</verb>
<sect1>``Hot Mail'' complains about: Intrusion Logged. Access denied.
<P>
``Hot Mail'' is proxy-unfriendly and requires all requests to come from
the same IP address. You can fix this by adding to your
<em/squid.conf/:
<verb>
hierarchy_stoplist hotmail.com
</verb>
<sect1>My Squid becomes very slow after it has been running for some time.
<P>
This is most likely because Squid is using more memory than it should be
for your system. When the Squid process becomes large, it experiences a lot
of paging. This will very rapidly degrade the performance of Squid.
Memory usage is a complicated problem. There are a number
of things to consider.
<P>
First, examine the Cache Manager <em/Info/ output and look at these two lines:
<verb>
Number of HTTP requests received: 121104
Page faults with physical i/o: 16720
</verb>
Note, if your system does not have the <em/getrusage()/ function, then you will
not see the page faults line.
<P>
Divide the number of page faults by the number of HTTP requests. In this
case 16720/121104 = 0.14. Ideally this ratio should be in the 0.0 - 0.1
range. It may be acceptable to be in the 0.1 - 0.2 range. Above that,
however, and you will most likely find that Squid's performance is
unacceptably slow.
<P>
If the ratio is too high, you will need to make some changes to
<ref id="lower-mem-usage" name="lower the
amount of memory Squid uses">.
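<P>
The arithmetic is easy to script if you check it regularly. The Python
sketch below computes the ratio from the two Cache Manager numbers and
sorts it into the ranges given above. The function names and the labels
``ideal'', ``borderline'' and ``too high'' are inventions for this example,
not Squid terminology.

```python
def page_fault_ratio(page_faults, http_requests):
    """Page faults with physical i/o per HTTP request received."""
    if http_requests == 0:
        return 0.0
    return page_faults / http_requests


def classify(ratio):
    """Map a ratio onto the ranges described in the text."""
    if ratio > 0.2:
        return "too high"
    if ratio > 0.1:
        return "borderline"
    return "ideal"


if __name__ == "__main__":
    # Values taken from the sample Cache Manager output above.
    ratio = page_fault_ratio(16720, 121104)
    print(f"{ratio:.2f} {classify(ratio)}")
```

With the sample numbers the ratio rounds to 0.14, which falls in the
range that may still be acceptable.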
<sect1>WARNING: Failed to start 'dnsserver'
<P>
This could be a permission problem. Does the Squid userid have
permission to execute the <em/dnsserver/ program?
<P>
You might also try testing <em/dnsserver/ from the command line:
<verb>
> echo oceana.nlanr.net | ./dnsserver
</verb>
This should produce something like:
<verb>
$name oceana.nlanr.net
$h_name oceana.nlanr.net
$h_len 4
$ipcount 1
132.249.40.200
$aliascount 0
$ttl 82067
$end
</verb>
<sect1>Sending in Squid bug reports
<P>
Bug reports for Squid should be sent to the <url url="mailto:squid-bugs@ircache.net"
name="squid-bugs alias">. Any bug report must include
<itemize>
<item>The Squid version
<item>Your Operating System type and version
</itemize>
<sect2>crashes and core dumps
<label id="coredumps">
<P>
There are two conditions under which squid will exit abnormally and
generate a coredump. First, a SIGSEGV or SIGBUS signal will cause Squid
to exit and dump core. Second, many functions include consistency
checks. If one of those checks fails, Squid calls abort() to generate a
core dump.
<P>
Many people report that Squid doesn't leave a coredump anywhere. This may be
due to one of the following reasons:
<itemize>
<item>
Resource Limits. The shell has limits on the size of a coredump
file. You may need to increase the limit.
<item>
No debugging symbols.
The Squid binary must have debugging symbols in order to get
a meaningful coredump.
<item>
Threads and Linux. On Linux, threaded applications do not generate
core dumps. When you use --enable-async-io, it uses threads and
you can't get a coredump.
<item>
It did leave a coredump file, you just can't find it.
</itemize>
<p>
<bf/Resource Limits/:
These limits can usually be changed in
shell scripts. The command to change the resource limits is usually
either <em/limit/ or <em/limits/. Sometimes it is a shell-builtin function,
and sometimes it is a regular program. Also note that you can set resource
limits in the <em>/etc/login.conf</em> file on FreeBSD and maybe other BSD
systems.
<P>
To change the coredumpsize limit you might use a command like:
<verb>
limit coredumpsize unlimited
</verb>
or
<verb>
limits coredump unlimited
</verb>
<p>
<bf/Debugging Symbols/:
To see if your Squid binary has debugging symbols, use this command:
<verb>
% nm /usr/local/squid/bin/squid | head
</verb>
The binary has debugging symbols if you see gobbledegook like this:
<verb>
0812abec B AS_tree_head
080a7540 D AclMatchedName
080a73fc D ActionTable
080908a4 r B_BYTES_STR
080908bc r B_GBYTES_STR
080908ac r B_KBYTES_STR
080908b4 r B_MBYTES_STR
080a7550 D Biggest_FD
08097c0c R CacheDigestHashFuncCount
08098f00 r CcAttrs
</verb>
There are no debugging symbols if you see this instead:
<verb>
/usr/local/squid/bin/squid: no symbols
</verb>
Debugging symbols may have been
removed by your <em/install/ program. If you look at the
squid binary from the source directory, then it might have
the debugging symbols.
<P>
<bf/Coredump Location/:
The core dump file will be left in one of the following locations:
<enum>
<item>The <em/coredump_dir/ directory, if you set that option.
<item>The first <em/cache_dir/ directory if you have used the
<em/cache_effective_user/ option.
<item>The current directory when Squid was started
</enum>
Recent versions of Squid report their current directory after
starting, so look there first:
<verb>
2000/03/14 00:12:36| Set Current Directory to /usr/local/squid/cache
</verb>
If you cannot find a core file, then either Squid does not have
permission to write in its current directory, or perhaps your shell
limits (csh and clones) are preventing the core file from being written.
<p>
Often you can get a coredump if you run Squid from the
command line like this:
<verb>
% limit core un
% /usr/local/squid/bin/squid -NCd1
</verb>
<P>
Once you have located the core dump file, use a debugger such as
<em/dbx/ or <em/gdb/ to generate a stack trace:
<verb>
tirana-wessels squid/src 270% gdb squid /T2/Cache/core
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.15.1 (hppa1.0-hp-hpux10.10), Copyright 1995 Free Software Foundation, Inc...
Core was generated by `squid'.
Program terminated with signal 6, Aborted.
[...]
(gdb) where
#0 0xc01277a8 in _kill ()
#1 0xc00b2944 in _raise ()
#2 0xc007bb08 in abort ()
#3 0x53f5c in __eprintf (string=0x7b037048 "", expression=0x5f <Address 0x5f out of bounds>, line=8, filename=0x6b <Address 0x6b out of bounds>)
#4 0x29828 in fd_open (fd=10918, type=3221514150, desc=0x95e4 "HTTP Request") at fd.c:71
#5 0x24f40 in comm_accept (fd=2063838200, peer=0x7b0390b0, me=0x6b) at comm.c:574
#6 0x23874 in httpAccept (sock=33, notused=0xc00467a6) at client_side.c:1691
#7 0x25510 in comm_select_incoming () at comm.c:784
#8 0x25954 in comm_select (sec=29) at comm.c:1052
#9 0x3b04c in main (argc=1073745368, argv=0x40000dd8) at main.c:671
</verb>
<P>
If possible, you might keep the coredump file around for a day or
two. It is often helpful if we can ask you to send additional
debugger output, such as the contents of some variables.
<sect1>Debugging Squid
<P>
If you believe you have found a non-fatal bug (such as incorrect HTTP
processing) please send us a section of your cache.log with debugging to
demonstrate the problem. The cache.log file can become very large, so
alternatively, you may want to copy it to an FTP or HTTP server where we
can download it.
<P>
It is very simple to
enable full debugging on a running squid process. Simply use the <em/-k debug/
command line option:
<verb>
% ./squid -k debug
</verb>
This causes every <em/debug()/ statement in the source code to write a line
in the <em/cache.log/ file.
You can use the same command again to restore Squid to normal debugging.
<P>
To enable selective debugging (e.g. for one source file only), you
need to edit <em/squid.conf/ and add to the <em/debug_options/ line.
Every Squid source file is assigned a different debugging <em/section/.
The debugging section assignments can be found by looking at the top
of individual source files, or by reading the file <em>doc/debug-levels.txt</em>
(correctly renamed to <em/debug-sections.txt/ for Squid-2).
You also specify the debugging <em/level/ to control the amount of
debugging. Higher levels result in more debugging messages.
For example, to enable full debugging of Access Control functions,
you would use
<verb>
debug_options ALL,1 28,9
</verb>
Then you have to restart or reconfigure Squid.
<P>
Once you have the debugging captured to <em/cache.log/, take a look
at it yourself and see if you can make sense of the behaviour which
you see. If not, please feel free to send your debugging output
to the <em/squid-users/ or <em/squid-bugs/ lists.
<sect1>FATAL: ipcache_init: DNS name lookup tests failed
<P>
Squid normally tests your system's DNS configuration before
it starts serving requests. Squid tries to resolve some
common DNS names, as defined in the <em/dns_testnames/ configuration
directive. If Squid cannot resolve these names, it could mean:
<enum>
<item>your DNS nameserver is unreachable or not running.
<item>your <em>/etc/resolv.conf</em> file may contain incorrect information.
<item>your <em>/etc/resolv.conf</em> file may have incorrect permissions, and
may be unreadable by Squid.
</enum>
<P>
To disable this feature, use the <em/-D/ command line option.
<P>
Note, Squid does NOT use the <em/dnsservers/ to test the DNS. The
test is performed internally, before the <em/dnsservers/ start.
<sect1>FATAL: Failed to make swap directory /var/spool/cache: (13) Permission denied
<P>
Starting with version 1.1.15, we have required that you first run
<verb>
squid -z
</verb>
to create the swap directories on your filesystem. If you have set the
<em/cache_effective_user/ option, then the Squid process takes on the
given userid before making the directories. If the <em/cache_dir/
directory (e.g. /var/spool/cache) does not exist, and the Squid userid
does not have permission to create it, then you will get the ``permission
denied'' error. This can be simply fixed by manually creating the
cache directory.
<verb>
# mkdir /var/spool/cache
# chown <userid>:<groupid> /var/spool/cache
# squid -z
</verb>
<P>
Alternatively, if the directory already exists, then your operating
system may be returning ``Permission Denied'' instead of ``File Exists''
on the mkdir() system call. This
<url url="store.c-mkdir.patch" name="patch">
by
<url url="mailto:miquels@cistron.nl" name="Miquel van Smoorenburg">
should fix it.
<sect1>FATAL: Cannot open HTTP Port
<P>
Either (1) the Squid userid does not have permission to bind to the port, or
(2) some other process has bound itself to the port.
Remember that root privileges are required to open port numbers
less than 1024. If you see this message when using a high port number,
or even when starting Squid as root, then the port has already been
opened by another process.
Maybe you are running in the HTTP Accelerator mode and there is
already an HTTP server running on port 80? If you're really stuck,
install the way cool
<url url="ftp://vic.cc.purdue.edu/pub/tools/unix/lsof/"
name="lsof">
utility to show you which process has your port in use.
<sect1>FATAL: All redirectors have exited!
<P>
This is explained in the <ref id="redirectors-exit" name="Redirector section">.
<sect1>FATAL: file_map_allocate: Exceeded filemap limit
<p>
See the next question.
<sect1>FATAL: You've run out of swap file numbers.
<p>
<em>Note: The information here applies to version 2.2 and earlier.</em>
<P>
Squid keeps an in-memory bitmap of disk files that are
available for use, or are being used. The size of this
bitmap is determined at run time, based on two things:
the size of your cache, and the average (mean) cache object size.
The size of your cache is specified in squid.conf, on the
<em/cache_dir/ lines. The mean object size can also
be specified in squid.conf, with the 'store_avg_object_size'
directive. By default, Squid uses 13 Kbytes as the average size.
<P>
When allocating the bitmaps, Squid allocates this many bits:
<verb>
2 * cache_size / store_avg_object_size
</verb>
So, if you exactly specify the correct average object size,
Squid should have 50% filemap bits free when the cache is full.
You can see how many filemap bits are being used by looking
at the 'storedir' cache manager page. It looks like this:
<verb>
Store Directory #0: /usr/local/squid/cache
First level subdirectories: 4
Second level subdirectories: 4
Maximum Size: 1024000 KB
Current Size: 924837 KB
Percent Used: 90.32%
Filemap bits in use: 77308 of 157538 (49%)
Flags:
</verb>
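<P>
The figures in that sample output can be reproduced from the formula. The
Python sketch below assumes the default average object size of 13 Kbytes
and assumes that the division truncates to a whole number, which matches
the sample but is not confirmed from the Squid source here. The function
names are made up for the example.

```python
def filemap_bits(cache_size_kb, store_avg_object_size_kb=13):
    """Bits allocated: 2 * cache_size / store_avg_object_size (truncated)."""
    return 2 * cache_size_kb // store_avg_object_size_kb


def percent_in_use(bits_in_use, bits_total):
    """Whole-number percentage of filemap bits currently in use."""
    return 100 * bits_in_use // bits_total


if __name__ == "__main__":
    # Maximum Size and "in use" count taken from the storedir sample above.
    total = filemap_bits(1024000)
    print(total, percent_in_use(77308, total))
```

For a 1024000 KB cache this gives 157538 bits, and 77308 bits in use
works out to 49%, the same values shown on the 'storedir' page.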
<P>
Now, if you see the ``You've run out of swap file numbers'' message,
then it means one of two things:
<enum>
<item>
You've found a Squid bug.
<item>
Your cache's average file size is much smaller
than the 'store_avg_object_size' value.
</enum>
To check the average file size of objects currently in your
cache, look at the cache manager 'info' page, and you will
find a line like:
<verb>
Mean Object Size: 11.96 KB
</verb>
<P>
To make the warning message go away, set 'store_avg_object_size'
to that value (or lower) and then restart Squid.
<sect1>I am using up over 95% of the filemap bits?!!
<p>
<em>Note: The information here is current for version 2.3</em>
<p>
Calm down, this is now normal. Squid now dynamically allocates
filemap bits based on the number of objects in your cache.
You won't run out of them, we promise.
<sect1>FATAL: Cannot open /usr/local/squid/logs/access.log: (13) Permission denied
<p>
In Unix, things like <em/processes/ and <em/files/ have an <em/owner/.
For Squid, the process owner and file owner should be the same. If they
are not the same, you may get messages like ``permission denied.''
<p>
To find out who owns a file, use the <em/ls -l/ command:
<verb>
% ls -l /usr/local/squid/logs/access.log
</verb>
<p>
A process is normally owned by the user who starts it. However,
Unix sometimes allows a process to change its owner. If you
specified a value for the <em/cache_effective_user/
option in <em/squid.conf/, then that will be the process owner.
The files must be owned by this same userid.
<p>
If all this is confusing, then you probably should not be
running Squid until you learn some more about Unix.
As a reference, I suggest <url url="http://www.oreilly.com/catalog/lunix4/"
name="Learning the UNIX Operating System, 4th Edition">.
<sect1>When using a username and password, I can not access some files.
<P>
<it>If I try by way of a test, to access</it>
<verb>
ftp://username:password@ftpserver/somewhere/foo.tar.gz
</verb>
<it>I get</it>
<verb>
somewhere/foo.tar.gz: Not a directory.
</verb>
<P>
Use this URL instead:
<verb>
ftp://username:password@ftpserver/%2fsomewhere/foo.tar.gz
</verb>
<sect1>pingerOpen: icmp_sock: (13) Permission denied
<P>
This means your <em/pinger/ program does not have root privileges.
You should either do this:
<verb>
% su
# make install-pinger
</verb>
or
<verb>
# chown root /usr/local/squid/bin/pinger
# chmod 4755 /usr/local/squid/bin/pinger
</verb>
<sect1>What is a forwarding loop?
<P>
A forwarding loop is when a request passes through one proxy more than
once. You can get a forwarding loop if
<itemize>
<item>a cache forwards requests to itself. This might happen with
transparent caching (or server acceleration) configurations.
<item>a pair or group of caches forward requests to each other. This can
happen when Squid uses ICP, Cache Digests, or the ICMP RTT database
to select a next-hop cache.
</itemize>
<P>
Forwarding loops are detected by examining the <em/Via/ request header.
Each cache which "touches" a request must add its hostname to the
<em/Via/ header. If a cache notices its own hostname in this header
for an incoming request, it knows there is a forwarding loop somewhere.
NOTE:
A pair of caches which have the same <em/visible_hostname/ value
will report forwarding loops.
<P>
When Squid detects a forwarding loop, it is logged to the <em/cache.log/
file with the received <em/Via/ header. From this header you can determine
which cache (the last in the list) forwarded the request to you.
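<P>
The detection idea can be sketched in a few lines of Python. This is an
illustration of the principle only, not Squid's actual code: it assumes
each <em/Via/ entry has the form ``version host:port (comment)'', that
comments contain no commas, and that hosts are not IPv6 literals. The
function names are hypothetical.

```python
def via_hosts(via_header):
    """Return the hostnames listed in a Via header, in order."""
    hosts = []
    for entry in via_header.split(","):
        fields = entry.split()
        if len(fields) >= 2:
            # fields[0] is the protocol version, fields[1] is host[:port].
            hosts.append(fields[1].split(":")[0].lower())
    return hosts


def is_forwarding_loop(via_header, my_hostname):
    """True if our own hostname already appears in the Via header."""
    return my_hostname.lower() in via_hosts(via_header)


if __name__ == "__main__":
    # Hypothetical header value, using the example.com peers from this FAQ.
    via = "1.0 A.example.com:3128 (Squid/2.3), 1.0 B.example.com:3128 (Squid/2.3)"
    print(is_forwarding_loop(via, "a.example.com"))
    print(via_hosts(via)[-1])
```

The last element returned by <tt/via_hosts/ corresponds to the cache that
forwarded the request to you.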
<P>
One way to reduce forwarding loops is to change a <em/parent/
relationship to a <em/sibling/ relationship.
<P>
Another way is to use <em/cache_peer_access/ rules. For example:
<verb>
# Our parent caches
cache_peer A.example.com parent 3128 3130
cache_peer B.example.com parent 3128 3130
cache_peer C.example.com parent 3128 3130
# An ACL list
acl PEERS src A.example.com
acl PEERS src B.example.com
acl PEERS src C.example.com
# Prevent forwarding loops
cache_peer_access A.example.com allow !PEERS
cache_peer_access B.example.com allow !PEERS
cache_peer_access C.example.com allow !PEERS
</verb>
The above configuration instructs squid to NOT forward a request
to parents A, B, or C when a request is received from any one
of those caches.
<sect1>accept failure: (71) Protocol error
<P>
This error message is seen mostly on Solaris systems.
<url url="mailto:mtk@ny.ubs.com" name="Mark Kennedy">
gives a great explanation:
<quote>
Error 71 &lsqb;EPROTO&rsqb; is an obscure way of reporting that clients made it onto your
server's TCP incoming connection queue but the client tore down the
connection before the server could accept it. I.e. your server ignored
its clients for too long. We've seen this happen when we ran out of
file descriptors. I guess it could also happen if something made squid
block for a long time.
</quote>
<sect1>storeSwapInFileOpened: ... Size mismatch
<P>
<it>
Got these messages in my cache log - I guess it means that the index
contents do not match the contents on disk.
</it>
<verb>
1998/09/23 09:31:30| storeSwapInFileOpened: /var/cache/00/00/00000015: Size mismatch: 776(fstat) != 3785(object)
1998/09/23 09:31:31| storeSwapInFileOpened: /var/cache/00/00/00000017: Size mismatch: 2571(fstat) != 4159(object)
</verb>
<P>
<it>
What does Squid do in this case?
</it>
<P>
NOTE, these messages are specific to Squid-2. These happen when Squid
reads an object from disk for a cache hit. After it opens the file,
Squid checks to see if the size is what it expects it should be. If the
size doesn't match, the error is printed. In this case, Squid does not
send the wrong object to the client. It will re-fetch the object from
the source.
<sect1>Why do I get <em>fwdDispatch: Cannot retrieve 'https://www.buy.com/corp/ordertracking.asp'</em>
<P>
These messages are caused by buggy clients, mostly Netscape Navigator.
What happens is, Netscape sends an HTTPS/SSL request over a persistent HTTP connection.
Normally, when Squid gets an SSL request, it looks like this:
<verb>
CONNECT www.buy.com:443 HTTP/1.0
</verb>
Then Squid opens a TCP connection to the destination host and port, and
the <em/real/ request is sent encrypted over this connection. That's the
whole point of SSL, that all of the information must be sent encrypted.
<P>
With this client bug, however, Squid receives a request like this:
<verb>
GET https://www.buy.com/corp/ordertracking.asp HTTP/1.0
Accept: */*
User-agent: Netscape ...
...
</verb>
Now all of the headers and the message body have been sent <em/unencrypted/
to Squid. There is no way for Squid to somehow turn this into an SSL request.
The only thing we can do is return the error message.
<P>
Note, this browser bug does represent a security risk because the browser
is sending sensitive information unencrypted over the network.
<sect1>Squid can't access URLs like http://3626046468/ab2/cybercards/moreinfo.html
<p>
by Dave J Woolley (DJW at bts dot co dot uk)
<p>
These are illegal URLs, generally only used by illegal sites;
typically the web site that supports a spammer and is expected to
survive a few hours longer than the spamming account.
<p>
Their intention is to:
<itemize>
<item>
confuse content filtering rules on proxies, and possibly
some browsers' idea of whether they are trusted sites on
the local intranet;
<item>
confuse whois (?);
<item>
make people think they are not IP addresses and unknown
domain names, in an attempt to stop them trying to locate
and complain to the ISP.
</itemize>
<p>
Any browser or proxy that works with them should be considered a
security risk.
<p>
<url url="http://www.ietf.org/rfc/rfc1738.txt" name="RFC 1738">
has this to say about the hostname part of a URL:
<quote>
The fully qualified domain name of a network host, or its IP
address as a set of four decimal digit groups separated by
".". Fully qualified domain names take the form as described
in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123
[5]: a sequence of domain labels separated by ".", each domain
label starting and ending with an alphanumerical character and
possibly also containing "-" characters. The rightmost domain
label will never start with a digit, though, which
syntactically distinguishes all domain names from the IP
addresses.
</quote>
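<P>
To see what such a URL really points at, the decimal host can be unpacked
into the dotted-quad form that RFC 1738 expects. This is a minimal Python
sketch and not part of Squid; the helper names <tt/decode_decimal_host/ and
<tt/is_legal_host/ are made up for illustration, and the check covers only
the rules quoted above.

```python
import ipaddress
import re

def decode_decimal_host(host):
    """Return the dotted-quad address hidden in an all-digit host, else None."""
    if not (host.isascii() and host.isdigit()):
        return None
    value = int(host)
    if value > 0xFFFFFFFF:
        return None
    return str(ipaddress.IPv4Address(value))

def is_legal_host(host):
    """Rough check of the RFC 1738 hostname rules quoted above."""
    # Four decimal digit groups separated by "." form an IP address.
    if re.fullmatch(r"\d+\.\d+\.\d+\.\d+", host):
        return True
    labels = host.split(".")
    label_re = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?")
    if not all(label_re.fullmatch(label) for label in labels):
        return False
    # The rightmost label must not start with a digit.
    return not labels[-1][0].isdigit()

print(decode_decimal_host("3626046468"))  # 216.33.20.4
print(is_legal_host("3626046468"))
print(is_legal_host("www.squid-cache.org"))
```

The single number is just the 32-bit IP address written in decimal, which is
why it fails the "rightmost label never starts with a digit" rule.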
<sect1>I get a lot of ``URI has whitespace'' error messages in my cache log, what should I do?
<p>
Whitespace characters (space, tab, newline, carriage return) are
not allowed in URIs and URLs. Unfortunately, a number of Web services
generate URLs with whitespace. Of course your favorite browser silently
accommodates these bad URLs. The servers (or people) that generate
these URLs are in violation of Internet standards. The whitespace
characters should be encoded.
<P>
If you want Squid to accept URLs with whitespace, you have to
decide how to handle them. There are four choices that you
can set with the <em/uri_whitespace/ option:
<enum>
<item>
DENY:
The request is denied with an ``Invalid Request'' message.
This is the default.
<item>
ALLOW:
The request is allowed and the URL remains unchanged.
<item>
ENCODE:
The whitespace characters are encoded according to
<url url="http://www.ietf.org/rfc/rfc1738.txt"
name="RFC 1738">. This can be considered a violation
of the HTTP specification.
<item>
CHOP:
The URL is chopped at the first whitespace character
and then processed normally. This also can be considered
a violation of HTTP.
</enum>
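<P>
The four choices can be pictured with a small Python sketch. This is only
an illustration of the behaviour described above, not Squid's code; the
function name <tt/handle_whitespace/ is hypothetical.

```python
WHITESPACE = " \t\r\n"

def handle_whitespace(url, choice):
    """Return the URL to process, or None if the request is denied."""
    first = next((i for i, c in enumerate(url) if c in WHITESPACE), None)
    if first is None:
        return url  # no whitespace, nothing to decide
    if choice == "DENY":
        return None  # caller answers with "Invalid Request"
    if choice == "ALLOW":
        return url  # passed through unchanged
    if choice == "ENCODE":
        # Percent-encode each whitespace character, e.g. space becomes %20.
        return "".join("%%%02X" % ord(c) if c in WHITESPACE else c for c in url)
    if choice == "CHOP":
        return url[:first]  # drop everything from the first whitespace on
    raise ValueError("unknown choice: " + choice)

for choice in ("DENY", "ALLOW", "ENCODE", "CHOP"):
    print(choice, handle_whitespace("http://example.com/my file.html", choice))
```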
<sect1>commBind: Cannot bind socket FD 5 to 127.0.0.1:0: (49) Can't assign requested address
<label id="comm-bind-loopback-fail">
<p>
This likely means that your system does not have a loopback network device, or
that device is not properly configured.
All Unix systems should have a loopback network device (named <em/lo0/ on
BSD-derived systems and <em/lo/ on Linux), and it should
be configured with the address 127.0.0.1. If not, you may get the above
error message.
To check your system, run:
<verb>
% ifconfig lo0
</verb>
The result should look something like:
<verb>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
inet 127.0.0.1 netmask 0xff000000
</verb>
<p>
If you use FreeBSD, see <ref id="freebsd-no-lo0" name="this">.
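<P>
You can also reproduce the failing operation outside of Squid. The Python
sketch below simply tries to bind a socket to 127.0.0.1 port 0, which is
what the error message says Squid attempted; the function name
<tt/loopback_bind_error/ is made up for this example.

```python
import socket

def loopback_bind_error(address="127.0.0.1"):
    """Try to bind to address:0. Return None on success, else the error text."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((address, 0))  # port 0 lets the OS pick any free port
        return None
    except OSError as err:
        return "(%s) %s" % (err.errno, err.strerror)
    finally:
        sock.close()

error = loopback_bind_error()
print("loopback is usable" if error is None else "bind failed: " + error)
```

If this reports a failure, fix the loopback interface before looking at Squid.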
<sect1>Unknown cache_dir type '/var/squid/cache'
<p>
The format of the <em/cache_dir/ option changed with version
2.3. It now takes a <em/type/ argument. All you need to do
is insert <tt/ufs/ in the line, like this:
<verb>
cache_dir ufs /var/squid/cache ...
</verb>
<sect1>unrecognized: 'cache_dns_program /usr/local/squid/bin/dnsserver'
<p>
As of Squid 2.3, the default is to use internal DNS lookup code.
The <em/cache_dns_program/ and <em/dns_children/ options are not
known squid.conf directives in this case. Simply comment out
these two options.
<p>
If you want to use external DNS lookups, with the <em/dnsserver/
program, then add this to your configure command:
<verb>
--disable-internal-dns
</verb>
<sect1>Is <em/dns_defnames/ broken in 2.3.STABLE1 and STABLE2?
<p>
Sort of. As of Squid 2.3, the default is to use internal DNS lookup code.
The <em/dns_defnames/ option is only used with the external <em/dnsserver/
processes. If you relied on <em/dns_defnames/ before, you have three choices:
<enum>
<item>
See if the <em/append_domain/ option will work for you instead.
<item>
Configure squid with --disable-internal-dns to use the external
dnsservers.
<item>
Enhance <em>src/dns_internal.c</em> to understand the <tt/search/
and <tt/domain/ lines from <em>/etc/resolv.conf</em>.
</enum>
<sect1>What does <em>sslReadClient: FD 14: read failure: (104) Connection reset by peer</em> mean?
<p>
``Connection reset by peer'' is an error code that Unix operating systems
sometimes return for <em/read/, <em/write/, <em/connect/, and other
system calls.
<p>
Connection reset means that the other host, the peer, sent us a RESET
packet on a TCP connection. A host sends a RESET when it receives
an unexpected packet for a nonexistent connection. For example, if
one side sends data at the same time that the other side closes
a connection, when the other side receives the data it may send
a reset back.
<p>
The fact that these messages appear in Squid's log might indicate
a problem, such as a broken origin server or parent cache. On
the other hand, they might be ``normal,'' especially since
some applications are known to force connection resets rather
than a proper close.
<p>
You probably don't need to worry about them, unless you receive
a lot of user complaints relating to SSL sites.
<p>
<url url="raj at cup dot hp dot com" name="Rick Jones"> notes that
if the server is running a Microsoft TCP stack, clients
receive RST segments whenever the listen queue overflows. In other words,
if the server is really busy, new connections receive the reset message.
This is contrary to rational behaviour, but is unlikely to change.
<sect1>What does <em>Connection refused</em> mean?
<p>
This is an error message, generated by your operating system,
in response to a <em/connect()/ system call. It happens when
there is no server at the other end listening on the port number
that we tried to connect to.
<p>
It's quite easy to generate this error on your own. Simply
telnet to a random, high-numbered port:
<verb>
% telnet localhost 12345
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused
</verb>
It happens because there is no server listening for connections
on port 12345.
<p>
When you see this in response to a URL request, it probably means
the origin server web site is temporarily down. It may also mean
that your parent cache is down, if you have one.
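<P>
The same experiment can be done from a script. This is a minimal Python
sketch, assuming nothing is listening on the port you probe; the helper name
<tt/try_connect/ is hypothetical.

```python
import socket

def try_connect(host, port, timeout=5.0):
    """Attempt a TCP connection and report how the connect() call ended."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "connected"
    except ConnectionRefusedError:
        return "refused"  # no server is listening on that port
    except OSError as err:
        return "other error: %s" % err  # timeouts, unreachable hosts, etc.
    finally:
        sock.close()

print(try_connect("127.0.0.1", 12345))
```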
<sect1>squid: ERROR: no running copy
<p>
You may get this message when you run commands like <tt/squid -krotate/.
<p>
This error message usually means that the <em/squid.pid/ file is
missing. Since the PID file is normally present when squid is running,
the absence of the PID file usually means Squid is not running.
If you accidentally delete the PID file, Squid will continue running, and
you won't be able to send it any signals.
<p>
If you accidentally removed the PID file, there are two ways to get it back.
<enum>
<item>run <tt/ps/ and find the Squid process id. You'll probably see
two processes, like this:
<verb>
bender-wessels % ps ax | grep squid
83617 ?? Ss 0:00.00 squid -s
83619 ?? S 0:00.48 (squid) -s (squid)
</verb>
You want the second process id, 83619 in this case. Create the PID file and put the
process id number there. For example:
<verb>
echo 83619 > /usr/local/squid/logs/squid.pid
</verb>
<item>
Use the above technique to find the Squid process id. Send the process a HUP
signal, which is the same as <tt/squid -kreconfigure/:
<verb>
kill -HUP 83619
</verb>
The reconfigure process creates a new PID file automatically.
</enum>
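<P>
Before repairing anything it can help to check what state the PID file is
in. The following Python sketch is one way to do that, assuming the path
shown earlier; <tt/pid_file_status/ is a hypothetical helper, not a Squid
tool.

```python
import os

def pid_file_status(path):
    """Classify a PID file as missing, unreadable, stale, or running."""
    try:
        with open(path) as handle:
            pid = int(handle.read().strip())
    except FileNotFoundError:
        return "missing"
    except ValueError:
        return "unreadable"
    if pid <= 0:
        return "unreadable"
    try:
        os.kill(pid, 0)  # signal 0 only tests whether the process exists
    except ProcessLookupError:
        return "stale"  # file is there, but no such process
    except PermissionError:
        pass  # process exists but belongs to another user
    return "running"

print(pid_file_status("/usr/local/squid/logs/squid.pid"))
```

A result of "missing" while <tt/ps/ still shows Squid is the situation
described above.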
<sect1>FATAL: getgrnam failed to find groupid for effective group 'nogroup'
<p>
You are probably starting Squid as root. Squid is trying to find
an unprivileged group-id to run as. The default is <em/nogroup/,
but this may not be defined
on your system. You need to edit <em/squid.conf/ and set
<em/cache_effective_group/ to the name of an unprivileged group
from <em>/etc/group</em>. There is a good chance that <em/nobody/
will work for you.
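<P>
To find out which group names exist on your system without reading
<em>/etc/group</em> by hand, you can ask the same system database that
<em/getgrnam/ consults. A minimal Python sketch (Unix only; the helper name
<tt/group_exists/ is made up):

```python
import grp

def group_exists(name):
    """Return True if the system group database knows this group name."""
    try:
        grp.getgrnam(name)
        return True
    except KeyError:
        return False

for candidate in ("nogroup", "nobody"):
    print(candidate, group_exists(candidate))
```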
<sect1>``Unsupported Request Method and Protocol'' for <em/https/ URLs.
<p>
<em>Note: The information here is current for version 2.3.</em>
<p>
This is correct. Squid does not know what to do with an <em/https/
URL. To handle such a URL, Squid would need to speak the SSL
protocol. Unfortunately, it does not (yet).
<p>
Normally, when you type an <em/https/ URL into your browser, one of
two things happens.
<enum>
<item>The browser opens an SSL connection directly to the origin
server.
<item>The browser tunnels the request through Squid with the
<em/CONNECT/ request method.
</enum>
<p>
The <em/CONNECT/ method is a way to tunnel any kind of
connection through an HTTP proxy. The proxy doesn't
understand or interpret the contents. It just passes
bytes back and forth between the client and server.
For the gory details on tunnelling and the CONNECT
method, please see
<url url="ftp://ftp.isi.edu/in-notes/rfc2817.txt" name="RFC 2817">
and <url url="http://www.web-cache.com/Writings/Internet-Drafts/draft-luotonen-web-proxy-tunneling-01.txt"
name="Tunneling TCP based protocols through Web proxy servers"> (expired).
<sect1>Squid uses 100% CPU
<p>
There may be many causes for this.
<p>
Andrew Doroshenko reports that removing <em>/dev/null</em>, or
mounting a filesystem with the <em>nodev</em> option, can cause
Squid to use 100% of CPU. His suggested solution is to ``touch /dev/null.''
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>How does Squid work?
<sect1>What are cachable objects?
<P>
An Internet Object is a file, document or response to a query for
an Internet service such as FTP, HTTP, or gopher. A client requests
an Internet object from a caching proxy; if the object
is not already cached, the proxy server fetches
the object (either from the host specified in the URL or from a
parent or sibling cache) and delivers it to the client.
<sect1>What is the ICP protocol?
<label id="what-is-icp">
<P>
ICP is a protocol used for communication among squid caches.
The ICP protocol is defined in two Internet RFC's.
<url url="http://www.ircache.net/Cache/ICP/rfc2186.txt"
name="RFC 2186">
describes the protocol itself, while
<url url="http://www.ircache.net/Cache/ICP/rfc2187.txt"
name="RFC 2187">
describes the application of ICP to hierarchical Web caching.
<P>
ICP is primarily used within a cache hierarchy to locate specific
objects in sibling caches. If a squid cache does not have a
requested document, it sends an ICP query to its siblings, and the
siblings respond with ICP replies indicating a ``HIT'' or a ``MISS.''
The cache then uses the replies to choose from which cache to
resolve its own MISS.
<P>
ICP also supports multiplexed transmission of multiple object
streams over a single TCP connection. ICP is currently implemented
on top of UDP. Current versions of Squid also support ICP via
multicast.
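<P>
As an illustration of what an ICP query looks like on the wire, here is a
Python sketch that builds one following the message layout in RFC 2186
(a 20-byte header, then the requester address and the null-terminated URL).
The function name <tt/build_icp_query/ and the placeholder addresses are
assumptions for this example; it does not send anything.

```python
import socket
import struct

ICP_OP_QUERY = 1
ICP_VERSION = 2
# opcode, version, message length, request number,
# options, option data, sender host address
ICP_HEADER = struct.Struct("!BBHIII4s")

def build_icp_query(url, request_number, sender="0.0.0.0", requester="0.0.0.0"):
    """Return the bytes of an ICP_OP_QUERY message for the given URL."""
    payload = socket.inet_aton(requester) + url.encode("ascii") + b"\0"
    length = ICP_HEADER.size + len(payload)
    header = ICP_HEADER.pack(ICP_OP_QUERY, ICP_VERSION, length,
                             request_number, 0, 0, socket.inet_aton(sender))
    return header + payload

packet = build_icp_query("http://www.squid-cache.org/", 42)
print(len(packet), "bytes")
```

A sibling would answer with the same request number and an opcode that
indicates a HIT or a MISS.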
<sect1>What is the <em/dnsserver/?
<P>
The <em/dnsserver/ is a process forked by <em/squid/ to
resolve IP addresses from domain names. This is necessary because
the <tt>gethostbyname(3)</tt> function blocks the calling process
until the DNS query is completed.
<P>
Squid must use non-blocking I/O at all times, so DNS lookups are
implemented external to the main process. The <em/dnsserver/
processes do not cache DNS lookups; caching is implemented inside the
<em/squid/ process.
<P>
<sect1>What is the <em/ftpget/ program for?
<P>
<em/ftpget/ exists only in Squid 1.1 and Squid 1.0 versions.
<P>
The <em/ftpget/ program is an FTP client used for retrieving
files from FTP servers. Because the FTP protocol is complicated,
it is easier to implement it separately from the main <em/squid/
code.
<P>
<sect1>FTP PUT's don't work!
<P>
FTP PUT should work with Squid-2.0 and later versions. If you
are using Squid-1.1, then you need to upgrade before PUT will work.
<sect1>What is a cache hierarchy? What are parents and siblings?
<P>
A cache hierarchy is a collection of caching proxy servers organized
in a logical parent/child and sibling arrangement so that caches
closest to Internet gateways (closest to the backbone transit
entry-points) act as parents to caches at locations farther from
the backbone. The parent caches resolve ``misses'' for their children.
In other words, when a cache requests an object from its parent,
and the parent does not have the object in its cache, the parent
fetches the object, caches it, and delivers it to the child. This
ensures that the hierarchy achieves the maximum reduction in
bandwidth utilization on the backbone transit links, helps reduce
load on Internet information servers outside the network served by
the hierarchy, and builds a rich cache on the parents so that the
other child caches in the hierarchy will obtain better ``hit'' rates
against their parents.
<P>
In addition to the parent-child relationships, squid supports the
notion of siblings: caches at the same level in the hierarchy,
provided to distribute cache server load. Each cache in the
hierarchy independently decides whether to fetch the reference from
the object's home site or from parent or sibling caches, using
a simple resolution protocol. Siblings will not fetch an object
for another sibling to resolve a cache ``miss.''
<sect1>What is the Squid cache resolution algorithm?
<P>
<itemize>
<item>Send ICP queries to all appropriate siblings
<item>Wait for all replies to arrive with a configurable timeout
(the default is two seconds).
<item>Begin fetching the object upon receipt of the first HIT reply,
or
<item>Fetch the object from the first parent which replied with MISS
(subject to weighting values), or
<item>Fetch the object from the source
</itemize>
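<P>
The selection steps above can be sketched in Python as follows. This is a
simplification, not Squid's code: it works on replies that have already
been collected in arrival order, ignores the timeout and the weighting
values, and the name <tt/choose_source/ is hypothetical.

```python
def choose_source(replies):
    """Pick where to fetch from.

    replies is a list of (peer_name, peer_type, reply) tuples in order of
    arrival, with peer_type "parent" or "sibling" and reply "HIT" or "MISS".
    """
    # The first HIT wins, whether it came from a parent or a sibling.
    for name, _peer_type, reply in replies:
        if reply == "HIT":
            return name
    # Otherwise use the first parent that answered with a MISS.
    for name, peer_type, reply in replies:
        if peer_type == "parent" and reply == "MISS":
            return name
    # No usable neighbor: go to the source.
    return "DIRECT"

print(choose_source([("sib1", "sibling", "MISS"), ("par1", "parent", "MISS")]))
```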
<P>
The algorithm is somewhat more complicated when firewalls
are involved.
<P>
The <tt/single_parent_bypass/ directive can be used to skip
the ICP queries if the only appropriate neighbor is a parent cache
(i.e., if there's only one place you'd fetch the object from, why
bother querying?).
<sect1>What features are Squid developers currently working on?
<P>
There are several open issues for the caching project, namely
more automatic load balancing and (both configured and
dynamic) selection of parents, routing, multicast
cache-to-cache communication, and better recognition of URLs
that are not worth caching.
<P>
For our other to-do list items, please
see our ``TODO'' file in the recent source distributions.
<P>
Prospective developers should review the resources available at the
<url url="http://www.squid-cache.org/Devel/"
name="Squid developers corner">.
<sect1>Tell me more about Internet traffic workloads
<P>
Workload can be characterized as the burden a client or
group of clients imposes on a system. Understanding the
nature of workloads is important to managing system
capacity.
If you are interested in Internet traffic workloads then NLANR's
<url url="http://www.nlanr.net/NA/"
name="Network Analysis activities"> is a good place to start.
<sect1>What are the tradeoffs of caching with the NLANR cache system?
<P>
The NLANR root caches are at the NSF supercomputer centers (SCCs),
which are interconnected via NSF's high speed backbone service
(vBNS). So inter-cache communication between the NLANR root caches
does not cross the Internet.
<P>
The benefits of hierarchical caching (namely, reduced network
bandwidth consumption, reduced access latency, and improved
resiliency) come at a price. Caches higher in the hierarchy must
field the misses of their descendants. If the equilibrium hit rate
of a leaf cache is 50%, half of all leaf references have to be
resolved through a second level cache rather than directly from
the object's source. If this second level cache has most of the
documents, it is usually still a win, but if higher level caches
often don't have the document, or become overloaded, then they
could actually increase access latency, rather than reduce it.
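<P>
A back-of-the-envelope model makes the tradeoff concrete. The Python
sketch below is only an illustration: the latency figures are invented,
and the model assumes that every leaf miss pays one extra hop to the
parent and that every parent miss then pays the full trip to the source.

```python
def latency_via_parent(leaf_hit, parent_hit, t_leaf, t_parent, t_origin):
    """Mean latency when leaf misses are resolved through a parent cache."""
    leaf_miss = 1.0 - leaf_hit
    return t_leaf + leaf_miss * (t_parent + (1.0 - parent_hit) * t_origin)

def latency_direct(leaf_hit, t_leaf, t_origin):
    """Mean latency when leaf misses go straight to the source."""
    return t_leaf + (1.0 - leaf_hit) * t_origin

# Hypothetical timings in seconds: leaf 0.01, parent hop 0.1, source 1.0.
print(latency_direct(0.5, 0.01, 1.0))
print(latency_via_parent(0.5, 0.50, 0.01, 0.1, 1.0))  # parent has many objects
print(latency_via_parent(0.5, 0.05, 0.01, 0.1, 1.0))  # parent rarely has them
```

With these made-up numbers the parent helps when it holds half of the
missing objects, and hurts slightly when it holds almost none of them.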
<P>
<sect1>Where can I find out more about firewalls?
<P>
Please see the
<url url="http://lists.gnac.net/firewalls/"
name="Firewalls mailing list and FAQ">
information site.
<sect1>What is the ``Storage LRU Expiration Age?''
<P>
For example:
<verb>
Storage LRU Expiration Age: 4.31 days
</verb>
<P>
The LRU expiration age is a dynamically-calculated value. Any objects
which have not been accessed for this amount of time will be removed from
the cache to make room for new, incoming objects. Another way of looking
at this is that it would
take your cache approximately this many days to go from empty to full at
your current traffic levels.
<P>
As your cache becomes more busy, the LRU age becomes lower so that more
objects will be removed to make room for the new ones. Ideally, your
cache will have an LRU age value of at least 3 days. If the
LRU age is lower than 3 days, then your cache is probably not big enough
to handle the volume of requests it receives. By adding more disk space
you could increase your cache hit ratio.
<P>
The configuration parameter <em/reference_age/ places an upper limit on
your cache's LRU expiration age.
<sect1>What is ``Failure Ratio at 1.01; Going into hit-only-mode for 5 minutes''?
<P>
Consider a pair of caches named A and B. It may be the case that A can
reach B, and vice-versa, but B has poor reachability to the rest of the
Internet.
In this case, we would like B to recognize that it has poor reachability
and somehow convey this fact to its neighbor caches.
<P>
Squid will track the ratio of failed-to-successful requests over short
time periods. A failed request is one which is logged as ERR_DNS_FAIL, ERR_CONNECT_FAIL, or ERR_READ_ERROR. When the failed-to-successful ratio exceeds 1.0,
then Squid will return ICP_MISS_NOFETCH instead of ICP_MISS to neighbors.
Note, Squid will still return ICP_HIT for cache hits.
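<P>
The decision can be sketched in a few lines of Python. This is a
simplified illustration that counts log codes over one period; how Squid
actually averages the ratio over time is not covered here, and the
function names are hypothetical.

```python
FAILURE_CODES = {"ERR_DNS_FAIL", "ERR_CONNECT_FAIL", "ERR_READ_ERROR"}

def failure_ratio(log_codes):
    """Ratio of failed to successful requests in a list of log codes."""
    failed = sum(1 for code in log_codes if code in FAILURE_CODES)
    successful = len(log_codes) - failed
    if successful == 0:
        return float("inf") if failed else 0.0
    return failed / successful

def icp_reply(is_hit, ratio):
    """Choose the ICP reply to send to a neighbor."""
    if is_hit:
        return "ICP_HIT"  # hits are still served in hit-only mode
    return "ICP_MISS_NOFETCH" if ratio > 1.0 else "ICP_MISS"

codes = ["TCP_MISS"] * 100 + ["ERR_CONNECT_FAIL"] * 101
ratio = failure_ratio(codes)
print(ratio, icp_reply(False, ratio), icp_reply(True, ratio))
```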
<sect1>Does squid periodically re-read its configuration file?
<P>
No, you must send a HUP signal to have Squid re-read its configuration file,
including access control lists. An easy way to do this is with the <em/-k/
command line option:
<verb>
squid -k reconfigure
</verb>
<sect1>How does <em/unlinkd/ work?
<P>
<em/unlinkd/ is an external process used for unlinking unused cache files.
Performing the unlink operation in an external process opens up some
race-condition problems for Squid. If we are not careful, the following
sequence of events could occur:
<enum>
<item>
An object with swap file number <bf/S/ is removed from the cache.
<item>
We want to unlink file <bf/F/ which corresponds to swap file number <bf/S/,
so we write pathname <bf/F/ to the <em/unlinkd/ socket.
We also mark <bf/S/ as available in the filemap.
<item>
We have a new object to swap out. It is allocated to the first available
file number, which happens to be <bf/S/. Squid opens file <bf/F/ for writing.
<item>
The <em/unlinkd/ process reads the request to unlink <bf/F/ and issues the
actual unlink call.
</enum>
<P>
So, the problem is, how can we guarantee that <em/unlinkd/ will not
remove a cache file that Squid has recently allocated to a new object?
The approach we have taken is to have Squid keep a stack of unused (but
not deleted!) swap file numbers. The stack size is hard-coded at 128
entries. We only give unlink requests to <em/unlinkd/ when the unused
file number stack is full. Thus, if we ever have to start unlinking
files, we have a pool of 128 file numbers to choose from which we know
will not be removed by <em/unlinkd/.
<P>
In terms of implementation, the only way to send unlink requests to
the <em/unlinkd/ process is via the <em/storePutUnusedFileno/ function.
<P>
Unfortunately there are times when Squid cannot use the <em/unlinkd/ process
but must call <em/unlink(2)/ directly. One of these times is when the cache
swap size is over the high water mark. If we push the released file numbers
onto the unused file number stack, and the stack is not full, then no files
will be deleted, and the actual disk usage will remain unchanged. So, when
we exceed the high water mark, we must call <em/unlink(2)/ directly.
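<P>
The unused file number stack can be modelled with a short Python sketch.
It only illustrates the bookkeeping described above; the class name
<tt/UnusedFilenoStack/ is made up, and the list <tt/sent_to_unlinkd/ stands
in for writing a pathname to the <em/unlinkd/ socket.

```python
STACK_LIMIT = 128  # the hard-coded stack size mentioned above

class UnusedFilenoStack:
    def __init__(self, limit=STACK_LIMIT):
        self.limit = limit
        self.stack = []            # released numbers whose files still exist
        self.sent_to_unlinkd = []  # stand-in for requests written to unlinkd

    def put(self, fileno):
        """Called when an object is released from the cache."""
        if len(self.stack) < self.limit:
            self.stack.append(fileno)  # keep the file; the number may be reused
        else:
            self.sent_to_unlinkd.append(fileno)  # stack full: really unlink

    def get(self):
        """Return a reusable file number, or None if the stack is empty."""
        return self.stack.pop() if self.stack else None

unused = UnusedFilenoStack()
for number in range(130):
    unused.put(number)
print(len(unused.stack), unused.sent_to_unlinkd, unused.get())
```

Because a number handed out by <tt/get/ was never given to <em/unlinkd/,
the race described above cannot remove the newly written file.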
<sect1>What is an icon URL?
<P>
One of the most unpleasant things Squid must do is generate HTML
pages of Gopher and FTP directory listings. For some strange
reason, people like to have little <em/icons/ next to each
listing entry, denoting the type of object to which the
link refers (image, text file, etc.).
<P>
In Squid 1.0 and 1.1, we used internal browser icons with names
like <em/gopher-internal-image/. Unfortunately, these were
not very portable. Not all browsers had internal icons, or
even used the same names. Perhaps only Netscape and Mosaic
used these names.
<P>
For Squid 2 we include a set of icons in the source distribution.
These icon files are loaded by Squid as cached objects at runtime.
Thus, every Squid cache now has its own icons to use in Gopher and FTP
listings. Just like other objects available on the web, we refer to
the icons with
<url url="http://info.internet.isi.edu/in-notes/rfc/files/rfc1738.txt"
name="Uniform Resource Locators">, or <em/URLs/.
<sect1>Can I make my regular FTP clients use a Squid cache?
<P>
Nope, it's not possible. Squid only accepts HTTP requests. It speaks
FTP on the <em/server-side/, but <bf/not/ on the <em/client-side/.
<P>
The very cool
<url url="ftp://gnjilux.cc.fer.hr/pub/unix/util/wget/"
name="wget">
will download FTP URLs via Squid (and probably any other proxy cache).
<sect1>Why is the select loop average time so high?
<P>
<it>
Is there any way to speed up the time spent dealing with select? Cachemgr
shows:
</it>
<verb>
Select loop called: 885025 times, 714.176 ms avg
</verb>
<P>
This number is NOT how much time it takes to handle file descriptor I/O.
We simply count the number of times select was called, and divide the
total process running time by the number of select calls.
<P>
This means, on average it takes your cache 0.714 seconds to check all
the open file descriptors once. But this also includes time select()
spends in a wait state when there is no I/O on any file descriptors.
My relatively idle workstation cache has similar numbers:
<verb>
Select loop called: 336782 times, 715.938 ms avg
</verb>
But my busy caches have much lower times:
<verb>
Select loop called: 16940436 times, 10.427 ms avg
Select loop called: 80524058 times, 10.030 ms avg
Select loop called: 10590369 times, 8.675 ms avg
Select loop called: 84319441 times, 9.578 ms avg
</verb>
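<P>
Since the average is just running time divided by the number of calls, you
can turn the numbers around to see how long the process has been running.
A small Python sketch of the arithmetic (the function names are made up):

```python
def select_loop_average_ms(running_seconds, select_calls):
    """The figure cachemgr prints: running time per select call, in ms."""
    return 1000.0 * running_seconds / select_calls

def implied_running_days(select_calls, average_ms):
    """Invert the formula to recover the process running time in days."""
    return select_calls * average_ms / 1000.0 / 86400.0

# The idle cache from the question has been up for roughly a week.
print(round(implied_running_days(885025, 714.176), 1))  # 7.3
```

A busy cache calls select far more often in the same period, so its
average is much lower even though no single call is faster.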
<sect1>How does Squid deal with Cookies?
<P>
The presence of <em/Cookie/ headers in <bf/requests/ does not affect whether
or not an HTTP reply can be cached. Similarly, the presence of
<em/Set-Cookie/ headers in <bf/replies/ does not affect whether
the reply can be cached.
<P>
The proper way to deal with <em/Set-Cookie/ reply headers, according
to <url url="http://info.internet.isi.edu/in-notes/rfc/files/rfc2109.txt" name="RFC 2109">
is to cache the whole object, <em/EXCEPT/ the <em/Set-Cookie/ header lines.
<P>
With Squid-1.1, we cannot filter out specific HTTP headers, so
Squid-1.1 does not cache any response which contains a <em/Set-Cookie/
header.
<P>
With Squid-2, however, we can filter out specific HTTP headers. But instead
of filtering them on the receiving-side, we filter them on the sending-side.
Thus, Squid-2 does cache replies with <em/Set-Cookie/ headers, but
it filters out the <em/Set-Cookie/ header itself for cache hits.
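<P>
Filtering on the sending side amounts to something like the following
Python sketch. It is an illustration only, with a hypothetical function
name; the stored reply keeps all of its headers and the filter is applied
each time a hit is served.

```python
def headers_for_cache_hit(stored_headers):
    """Return the stored reply headers minus any Set-Cookie lines."""
    return [(name, value) for name, value in stored_headers
            if name.lower() != "set-cookie"]

stored = [
    ("Content-Type", "text/html"),
    ("Set-Cookie", "session=abc123"),
    ("Last-Modified", "Wed, 23 Sep 1998 09:31:30 GMT"),
]
print(headers_for_cache_hit(stored))
```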
<sect1>How does Squid decide when to refresh a cached object?
<P>
When checking the object freshness, we calculate these values:
<itemize>
<item>
<em/OBJ_DATE/ is the time when the object was given out by the
origin server. This is taken from the HTTP Date reply header.
<item>
<em/OBJ_LASTMOD/ is the time when the object was last modified,
given by the HTTP Last-Modified reply header.
<item>
<em/OBJ_AGE/ is how much the object has aged <em/since/ it was retrieved:
<verb>
OBJ_AGE = NOW - OBJ_DATE
</verb>
<item>
<em/LM_AGE/ is how old the object was <em/when/ it was retrieved:
<verb>
LM_AGE = OBJ_DATE - OBJ_LASTMOD
</verb>
<item>
<em/LM_FACTOR/ is the ratio of <em/OBJ_AGE/ to <em/LM_AGE/:
<verb>
LM_FACTOR = OBJ_AGE / LM_AGE
</verb>
<item>
<em/CLIENT_MAX_AGE/ is the (optional) maximum object age the client will
accept as taken from the HTTP/1.1 Cache-Control request header.
<item>
<em/EXPIRES/ is the (optional) expiry time from the server reply headers.
</itemize>
<P>
These values are compared with the parameters of the <em/refresh_pattern/
rules. The refresh parameters are:
<itemize>
<item>URL regular expression
<item><em/CONF_MIN/:
The time (in minutes) an object without an explicit expiry
time should be considered fresh. The recommended value is 0; any higher
value may cause dynamic applications to be erroneously cached unless the
application designer has taken the appropriate actions.
<item><em/CONF_PERCENT/:
A percentage of the object's age (time since last
modification) for which an object without an explicit expiry time will be
considered fresh.
<item><em/CONF_MAX/:
An upper limit on how long objects without an explicit
expiry time will be considered fresh.
</itemize>
<P>
The URL regular expressions are checked in the order listed until a
match is found. Then the algorithms below are applied for determining
if an object is fresh or stale.
<sect2>Squid-1.1 and Squid-1.NOVM algorithm
<P>
<verb>
if (CLIENT_MAX_AGE)
if (OBJ_AGE > CLIENT_MAX_AGE)
return STALE
if (OBJ_AGE <= CONF_MIN)
return FRESH
if (EXPIRES) {
if (EXPIRES <= NOW)
return STALE
else
return FRESH
}
if (OBJ_AGE > CONF_MAX)
return STALE
if (LM_FACTOR < CONF_PERCENT)
return FRESH
return STALE
</verb>
<P>
<url url="mailto:bertold@tohotom.vein.hu" name="Kolics Bertold">
has made an excellent
<url url="http://www.squid-cache.org/Doc/FAQ/refresh-flowchart.gif"
name="flow chart diagram"> showing this process.
<sect2>Squid-2 algorithm
<P>
For Squid-2 the refresh algorithm has been slightly modified to give the
<em/EXPIRES/ value a higher precedence, and the <em/CONF_MIN/ value
lower precedence:
<verb>
if (EXPIRES) {
if (EXPIRES <= NOW)
return STALE
else
return FRESH
}
if (CLIENT_MAX_AGE)
if (OBJ_AGE > CLIENT_MAX_AGE)
return STALE
if (OBJ_AGE > CONF_MAX)
return STALE
if (OBJ_DATE > OBJ_LASTMOD) {
if (LM_FACTOR < CONF_PERCENT)
return FRESH
else
return STALE
}
if (OBJ_AGE <= CONF_MIN)
return FRESH
return STALE
</verb>
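<P>
For experimenting with different settings, the Squid-2 pseudocode above can
be transcribed into runnable Python. This is a sketch under a few
assumptions: all times are in seconds, <tt/conf_percent/ is given as a
fraction (0.2 for 20%), and the default parameter values are merely
examples.

```python
def is_fresh(now, obj_date, obj_lastmod=None, expires=None,
             client_max_age=None, conf_min=0, conf_percent=0.2,
             conf_max=259200):
    """Return True for FRESH, False for STALE (Squid-2 ordering)."""
    obj_age = now - obj_date
    if expires is not None:
        return expires > now  # an explicit expiry time always decides
    if client_max_age is not None and obj_age > client_max_age:
        return False
    if obj_age > conf_max:
        return False
    if obj_lastmod is not None and obj_date > obj_lastmod:
        lm_age = obj_date - obj_lastmod
        return obj_age / lm_age < conf_percent  # LM_FACTOR test
    return obj_age <= conf_min

# Retrieved 100 s ago; last modified 900 s before it was retrieved.
print(is_fresh(now=1000, obj_date=900, obj_lastmod=0))
```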
<sect1>What exactly is a <em/deferred read/?
<P>
The cachemanager I/O page lists <em/deferred reads/ for various
server-side protocols.
<P>
Sometimes reading on the server-side gets ahead of writing to the
client-side, especially if your cache is on a fast network and your
clients are connected at modem speeds. Squid-1.1 will read up to 256k
(per request) ahead before it starts to defer the server-side reads.
<sect1>Why is my cache's inbound traffic equal to the outbound traffic?
<P>
<it>
I've been monitoring
the traffic on my cache's ethernet adapter and found a behavior I can't explain:
the inbound traffic is equal to the outbound traffic. The differences are
negligible. The hit ratio reports 40%.
Shouldn't the outbound be at least 40% greater than the inbound?
</it>
<P>
by <url url="mailto:david@avarice.nepean.uws.edu.au" name="David J N Begley">
<P>
I can't account for the exact behavior you're seeing, but I can offer this
advice; whenever you start measuring raw Ethernet or IP traffic on
interfaces, you can forget about getting all the numbers to exactly match what
Squid reports as the amount of traffic it has sent/received.
<P>
Why?
<P>
Squid is an application - it counts whatever data is sent to, or received
from, the lower-level networking functions; at each successively lower layer,
additional traffic is involved (such as header overhead, retransmits and
fragmentation, unrelated broadcasts/traffic, etc.). The additional traffic is
never seen by Squid and thus isn't counted - but if you run MRTG (or any
SNMP/RMON measurement tool) against a specific interface, all this additional
traffic will "magically appear".
<P>
Also remember that an interface has no concept of upper-layer networking (so
an Ethernet interface doesn't distinguish between IP traffic that's entirely
internal to your organization, and traffic that's to/from the Internet); this
means that when you start measuring an interface, you have to be aware of
*what* you are measuring before you can start comparing numbers elsewhere.
<P>
It is possible (though by no means guaranteed) that you are seeing roughly
equivalent input/output because you're measuring an interface that both
retrieves data from the outside world (Internet), *and* serves it to end users
(internal clients). That wouldn't be the whole answer, but hopefully it gives
you a few ideas to start applying to your own circumstance.
<P>
To interpret any statistic, you have to first know what you are measuring;
for example, an interface counts inbound and outbound bytes - that's it. The
interface doesn't distinguish between inbound bytes from external Internet
sites or from internal (to the organization) clients (making requests). If
you want that, try looking at RMON2.
<P>
Also, if you're talking about a 40% hit rate in terms of object
requests/counts then there's absolutely no reason why you should expect a 40%
reduction in traffic; after all, not every request/object is going to be the
same size so you may be saving a lot in terms of requests but very little in
terms of actual traffic.
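<P>
The difference between counting requests and counting bytes is easy to see
with a small Python sketch. The request sizes here are invented for the
example and <tt/hit_ratios/ is a hypothetical helper.

```python
def hit_ratios(requests):
    """requests is a list of (size_in_bytes, was_hit) tuples."""
    hits = sum(1 for _size, was_hit in requests if was_hit)
    hit_bytes = sum(size for size, was_hit in requests if was_hit)
    total_bytes = sum(size for size, _was_hit in requests)
    return hits / len(requests), hit_bytes / total_bytes

# Four small hits and six large misses: 40% of requests are hits.
sample = [(1000, True)] * 4 + [(50000, False)] * 6
request_ratio, byte_ratio = hit_ratios(sample)
print(request_ratio, round(byte_ratio, 3))
```

In this made-up sample the 40% request hit ratio saves only a little over
1% of the bytes.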
<sect1>How come some objects do not get cached?
<P>
To determine whether a given object may be cached, Squid takes many
things into consideration. The current algorithm (for Squid-2)
goes something like this:
<enum>
<item>
Responses with <em/Cache-Control: Private/ are NOT cachable.
<item>
Responses with <em/Cache-Control: No-Cache/ are NOT cachable.
<item>
Responses with <em/Cache-Control: No-Store/ are NOT cachable.
<item>
Responses for requests with an <em/Authorization/ header
are cachable ONLY if the response includes <em/Cache-Control: Public/.
<item>
Responses with <em/Vary/ headers are NOT cachable because Squid
does not yet support Vary features.
<item>
The following HTTP status codes are cachable:
<itemize>
<item>200 OK
<item>203 Non-Authoritative Information
<item>300 Multiple Choices
<item>301 Moved Permanently
<item>410 Gone
</itemize>
However, if Squid receives one of these responses from a neighbor
cache, it will NOT be cached if ALL of the <em/Date/, <em/Last-Modified/,
and <em/Expires/ reply headers are missing. This prevents such objects
from bouncing back-and-forth between siblings forever.
<item>
A 302 Moved Temporarily response is cachable ONLY if the response
also includes an <em/Expires/ header.
<item>
The following HTTP status codes are ``negatively cached'' for
a short amount of time (configurable):
<itemize>
<item>204 No Content
<item>305 Use Proxy
<item>400 Bad Request
<item>403 Forbidden
<item>404 Not Found
<item>405 Method Not Allowed
<item>414 Request-URI Too Large
<item>500 Internal Server Error
<item>501 Not Implemented
<item>502 Bad Gateway
<item>503 Service Unavailable
<item>504 Gateway Time-out
</itemize>
<item>
All other HTTP status codes are NOT cachable, including:
<itemize>
<item>206 Partial Content
<item>303 See Other
<item>304 Not Modified
<item>401 Unauthorized
<item>407 Proxy Authentication Required
</itemize>
</enum>
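<P>
The status code part of the list above can be summarized in a Python
sketch. It looks only at the status code and the <em/Expires/ rule for 302
responses; the header checks and the neighbor cache exception are left out,
and the function name is hypothetical.

```python
CACHABLE = {200, 203, 300, 301, 410}
NEGATIVELY_CACHED = {204, 305, 400, 403, 404, 405, 414,
                     500, 501, 502, 503, 504}

def classify_status(status, has_expires=False):
    """Classify a reply by HTTP status code alone."""
    if status in CACHABLE:
        return "cachable"
    if status == 302:
        # Moved Temporarily is cachable only with an Expires header.
        return "cachable" if has_expires else "not cachable"
    if status in NEGATIVELY_CACHED:
        return "negatively cached"
    return "not cachable"  # 206, 303, 304, 401, 407 and everything else

for code in (200, 302, 404, 206):
    print(code, classify_status(code))
```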
<sect1>What does <em/keep-alive ratio/ mean?
<P>
The <em/keep-alive ratio/ shows up in the <em/server_list/
cache manager page for Squid 2.
<P>
This is a mechanism to try detecting neighbor caches which might
not be able to deal with HTTP/1.1 persistent connections. We count
how many times we send a <em/proxy-connection: keep-alive/ request header
to a neighbor, and how many times the neighbor sends us
a <em/proxy-connection: keep-alive/ reply header. The
<em/keep-alive ratio/ is the ratio of these two counters.
<P>
If the ratio stays above 0.5, then we continue to assume the neighbor
properly implements persistent connections. Otherwise, we will stop
sending the keep-alive request header to that neighbor.
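<P>
As a rough illustration of the decision described above (the function
names are hypothetical, and the behaviour when nothing has been sent yet
is an assumption made for this sketch):

```python
def keepalive_ratio(sent, received):
    """Keep-alive reply headers received divided by request headers sent."""
    return received / sent if sent else 0.0


def should_send_keepalive(sent, received, threshold=0.5):
    # Assumption: with no requests sent yet there is nothing to judge,
    # so keep trying persistent connections.
    if sent == 0:
        return True
    return keepalive_ratio(sent, received) > threshold
```

With 10 keep-alive requests sent and 8 keep-alive replies received the
ratio is 0.8, so the header continues to be sent. With only 3 replies
the ratio is 0.3, and it does not.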
<sect1>How does Squid's cache replacement algorithm work?
<P>
Squid uses an LRU (least recently used) algorithm to replace old cache
objects. This means objects which have not been accessed for the
longest time are removed first. In the source code, the
StoreEntry->lastref value is updated every time an object is accessed.
<P>
Objects are not necessarily removed ``on-demand.'' Instead, a regularly
scheduled event runs to periodically remove objects. Normally this
event runs every second.
<P>
Squid keeps the cache disk usage between the low and high water marks.
By default the low mark is 90%, and the high mark is 95% of the total
configured cache size. When the disk usage is close to the low mark,
the replacement is less aggressive (fewer objects removed). When the
usage is close to the high mark, the replacement is more aggressive
(more objects removed).
<P>
When selecting objects for removal, Squid examines some number of objects
and determines which can be removed and which cannot.
A number of factors determine whether or not any given object can be
removed. If the object is currently being requested, or retrieved
from an upstream site, it will not be removed. If the object is
``negatively-cached'' it will be removed. If the object has a private
cache key, it will be removed (there would be no reason to keep it --
because the key is private, it can never be ``found'' by subsequent requests).
Finally, if the time since last access is greater than the LRU threshold,
the object is removed.
<P>
The LRU threshold value is dynamically calculated based on the current
cache size and the low and high marks. The LRU threshold is scaled
exponentially between the high and low water marks. When the store swap
size is near the low water mark, the LRU threshold is large. When the
store swap size is near the high water mark, the LRU threshold is small.
The threshold automatically adjusts to the rate of incoming requests.
In fact, when your cache size has stabilized, the LRU threshold
represents how long it takes to fill (or fully replace) your cache at
the current request rate. Typical values for the LRU threshold are 1 to
10 days.
<P>
Back to selecting objects for removal. Obviously it is not possible to
check every object in the cache every time we need to remove some of them.
We can only check a small subset each time. The way in which
this is implemented is very different between Squid-1.1 and Squid-2.
<sect2>Squid 1.1
<P>
The Squid cache storage is implemented as a hash table with some number
of "hash buckets." Squid-1.1 scans one bucket at a time and sorts all the
objects in the bucket by their LRU age. Objects with an LRU age
over the threshold are removed. The scan rate is adjusted so that
it takes approximately 24 hours to scan the entire cache. The
store buckets are randomized so that we don't always scan the same
buckets at the same time of the day.
<P>
This algorithm has some flaws. Because we only scan one bucket,
there are going to be better candidates for removal in some of
the other 16,000 or so buckets. Also, the qsort() function
might take a non-trivial amount of CPU time, depending on how many
entries are in each bucket.
<sect2>Squid 2
<P>
For Squid-2 we eliminated the need to use qsort() by indexing
cached objects into an automatically sorted linked list. Every time
an object is accessed, it gets moved to the top of the list. Over time,
the least used objects migrate to the bottom of the list. When looking
for objects to remove, we only need to check the last 100 or so objects
in the list. Unfortunately this approach increases our memory usage
because of the need to store three additional pointers per cache object.
But for Squid-2 we're still ahead of the game because we also replaced
plain-text cache keys with MD5 hashes.
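<P>
The idea of an automatically sorted list can be sketched in a few lines.
This is a toy model only: the class <em/LruIndex/ is hypothetical, and it
uses Python's <em/OrderedDict/ in place of Squid's linked list.

```python
from collections import OrderedDict


class LruIndex:
    """Toy LRU index: most recently used at the top, oldest at the bottom."""

    def __init__(self):
        self._entries = OrderedDict()

    def touch(self, key):
        # Accessing an object moves it to the top of the list.
        self._entries[key] = True
        self._entries.move_to_end(key, last=False)

    def removal_candidates(self, count=100):
        # Only the bottom of the list needs to be examined; no sorting.
        return list(reversed(self._entries))[:count]
```

After touching a, b, c and then a again, <em/removal_candidates(2)/
returns b and c, the two least recently used keys.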
<sect1>What are private and public keys?
<label id="pub-priv-keys">
<P>
<em/keys/ refers to the database keys which Squid uses to index
cache objects. Every object in the cache--whether saved on disk
or currently being downloaded--has a cache key. For Squid-1.0 and
Squid-1.1 the cache key was basically the URL. Squid-2 uses
MD5 checksums for cache keys.
<P>
The Squid cache uses the notions of <em/private/ and <em/public/
cache keys. An object can start out as being private, but may later be
changed to public status. Private objects are associated with only a single
client whereas a public object may be sent to multiple clients at the
same time. In other words, public objects can be located by any cache
client. Private objects can only be located by a single client--the one
who requested them.
<P>
Objects are changed from private to public after all of the HTTP
reply headers have been received and parsed. In some cases, the
reply headers will indicate the object should not be made public,
for example when the <em/no-cache/ Cache-Control directive is used.
<sect1>What is FORW_VIA_DB for?
<P>
We use it to collect data for <url
url="http://www.ircache.net/Cache/Plankton/" name="Plankton">.
<sect1>Does Squid send packets to port 7 (echo)? If so, why?
<P>
It may. This is an old feature from the Harvest cache software.
The cache would send an ICP ``SECHO'' message to the echo ports of
origin servers. If the SECHO message came back before any of the
other ICP replies, then it meant the origin server was probably
closer than any neighbor cache. In that case Harvest/Squid sent
the request directly to the origin server.
<P>
With more attention focused on security, many administrators filter
UDP packets to port 7. The Computer Emergency Response Team (CERT)
once issued an advisory note (<url
url="http://www.cert.org/advisories/CA-96.01.UDP_service_denial.html"
name="CA-96.01: UDP Port Denial-of-Service Attack">) that says UDP
echo and chargen services can be used for a denial of service
attack. This made admins extremely nervous about any packets
hitting port 7 on their systems, and they made complaints.
<P>
The <em/source_ping/ feature has been disabled in Squid-2.
If you're seeing packets to port 7 that are coming from a
Squid cache (remote port 3130), then it's probably a
very old version of Squid.
<sect1>What does ``WARNING: Reply from unknown nameserver &lsqb;a.b.c.d&rsqb;'' mean?
<P>
It means Squid sent a DNS query to one IP address, but the response
came back from a different IP address. By default Squid checks that
the addresses match. If not, Squid ignores the response.
<P>There are a number of reasons why this would happen:
<enum>
<item>
Your DNS name server just works this way, either because
it's been configured to, or because it's stupid and doesn't
know any better.
<item>
You have a weird broadcast address, like 0.0.0.0, in
your <em>/etc/resolv.conf</em> file.
<item>
Somebody is trying to send spoofed DNS responses to
your cache.
</enum>
<P>
If you recognize the IP address in the warning as one of your
name server hosts, then it's probably case (1) or (2).
<P>
You can make these warnings stop, and allow responses from
``unknown'' name servers by setting this configuration option:
<verb>
ignore_unknown_nameservers off
</verb>
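<P>
The check can be pictured like this. It is a simplified sketch, not
Squid's code: the function name is hypothetical, and it assumes the
reply's source address is compared against the list of configured
name servers.

```python
def accept_dns_reply(source_ip, nameservers, ignore_unknown_nameservers=True):
    """Decide whether to use a DNS reply that arrived from source_ip."""
    if source_ip in nameservers:
        return True
    # The reply came from an address we did not query.
    return not ignore_unknown_nameservers
```

With the default setting, a reply from an address outside the list is
dropped. With the option turned off, it is accepted.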
<sect1>How does Squid distribute cache files among the available directories?
<p>
<em>Note: The information here is current for version 2.2.</em>
<p>
See <em/storeDirMapAllocate()/ in the source code.
<p>
When Squid wants to create a new disk file for storing an object, it
first selects which <em/cache_dir/ the object will go into. This is done
with the <em/storeDirSelectSwapDir()/ function. If you have <em/N/
cache directories, the function identifies the <em>3N/4</em> (75%)
of them with the most available space. These directories are
then used, in order of having the most available space. When Squid has
stored one URL to each of the
<em>3N/4</em> <em/cache_dir/'s, the process repeats and
<em/storeDirSelectSwapDir()/ finds a new set of <em>3N/4</em>
cache directories with the most available space.
<p>
Once the <em/cache_dir/ has been selected, the next step is to find
an available <em/swap file number/. This is accomplished
by checking the <em/file map/, with the <em/file_map_allocate()/
function. Essentially the swap file numbers are allocated
sequentially. For example, if the last number allocated
happens to be 1000, then the next one will be the first
number after 1000 that is not already being used.
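<p>
A minimal sketch of this sequential allocation follows. The function and
its arguments are hypothetical, and wrapping around at the end of the
file map is an assumption of the sketch rather than something stated
above.

```python
def allocate_swap_file_number(in_use, last_allocated, map_size):
    """Return the first free number after last_allocated.

    in_use is the set of numbers already taken; map_size is the
    (assumed) total number of slots in the file map.
    """
    for offset in range(1, map_size + 1):
        # Assumption: the search wraps around to zero at the end of the map.
        candidate = (last_allocated + offset) % map_size
        if candidate not in in_use:
            return candidate
    raise RuntimeError("file map is full")
```

If the last number allocated was 1000 and numbers 1001 and 1002 are
already taken, the function returns 1003.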
<sect1>Why do I see negative byte hit ratio?
<p>
Byte hit ratio is calculated a bit differently than
Request hit ratio. Squid counts the number of bytes read
from the network on the server-side, and the number of bytes written to
the client-side. The byte hit ratio is calculated as
<verb>
(client_bytes - server_bytes) / client_bytes
</verb>
If server_bytes is greater than client_bytes, you end up
with a negative value.
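<p>
A short worked example of the formula (the function name is
hypothetical, and returning zero when no bytes were sent to clients is
an assumption made to avoid dividing by zero):

```python
def byte_hit_ratio(client_bytes, server_bytes):
    """(client_bytes - server_bytes) / client_bytes, as in the formula above."""
    if client_bytes == 0:
        return 0.0  # assumption: report zero when nothing was sent to clients
    return (client_bytes - server_bytes) / client_bytes
```

With 1000 bytes written to clients and 600 read from servers, the ratio
is 0.4. With 1200 bytes read from servers it becomes -0.2.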
<p>
The server_bytes may be greater than client_bytes for a number
of reasons, including:
<itemize>
<item>
Cache Digests and other internally generated requests.
Cache Digest messages are quite large. They are counted
in the server_bytes, but since they are consumed internally,
they do not count in client_bytes.
<item>
User-aborted requests. If your <em/quick_abort/ setting
allows it, Squid sometimes continues to fetch aborted
requests from the server-side, without sending any
data to the client-side.
<item>
Some range requests, in combination with Squid bugs, can
consume more bandwidth on the server-side than on the
client-side. In a range request, the client is asking for
only some part of the object. Squid may decide to retrieve
the whole object anyway, so that it can be used later on.
This means downloading more from the server than sending
to the client. You can affect this behavior with
the <em/range_offset_limit/ option.
</itemize>
<sect1>What does ``Disabling use of private keys'' mean?
<p>
First you need to understand the
<ref id="pub-priv-keys" name="difference between public and private
keys">.
<p>
When Squid sends ICP queries, it uses the ICP <em/reqnum/ field
to hold the private key data. In other words, when Squid gets an
ICP reply, it uses the <em/reqnum/ value to build the private cache key for
the pending object.
<p>
Some ICP implementations always set the <em/reqnum/ field to zero
when they send a reply. Squid cannot use private cache keys with
such neighbor caches because Squid will not be able to
locate cache keys for those ICP replies. Thus, if Squid detects a neighbor
cache that sends zero reqnum's, it
disables the use of private cache keys.
<p>
Not having private cache keys has some important privacy
implications. Two users could receive one response that was
meant for only one of the users. This response could contain
personal, confidential information. You will need to disable
the ``zero reqnum'' neighbor if you want Squid to use private
cache keys.
<sect1>What is a half-closed filedescriptor?
<p>
TCP allows connections to be in a ``half-closed'' state. This
is accomplished with the <em/shutdown(2)/ system call. In Squid,
this means that a client has closed its side of the connection for
writing, but leaves it open for reading. Half-closed connections
are tricky because Squid can't tell the difference between a
half-closed connection, and a fully closed one.
<p>
If Squid tries to read a connection, and <em/read()/ returns
0, and Squid knows that the client doesn't have the whole
response yet, Squid marks the filedescriptor as half-closed.
Most likely the client has aborted the request and the connection
is really closed. However, there is a slight chance that
the client is using the <em/shutdown()/ call, and that it
can still read the response.
<p>
To disable half-closed connections, simply put this in
squid.conf:
<verb>
half_closed_clients off
</verb>
Then, Squid will always close its side of the connection
instead of marking it as half-closed.
<sect1>What does --enable-heap-replacement do?
<p>
Squid has traditionally used an LRU replacement algorithm. As of
<url url="/Versions/v2/2.3/" name="version 2.3">, you can use
some other replacement algorithms by using the <em/--enable-heap-replacement/
configure option. Currently, the heap replacement code supports two
additional algorithms: LFUDA, and GDS.
<p>
The heap replacement code was contributed by John Dilley and others
from Hewlett-Packard. Their work is described in these papers:
<enum>
<item>
<url url="http://www.hpl.hp.com/techreports/1999/HPL-1999-69.html"
name="Enhancement and Validation of Squid's Cache Replacement Policy">
(HP Tech Report).
<item>
<url url="http://workshop.ircache.net/Papers/dilley-abstract.html"
name="Enhancement and Validation of the Squid Cache Replacement Policy">
(WCW 1999 paper).
</enum>
<sect1>Why is actual filesystem space used greater than what Squid thinks?
<p>
If you compare <em/df/ output and cachemgr <em/storedir/ output,
you will notice that actual disk usage is greater than what Squid
reports. This may be due to a number of reasons:
<itemize>
<item>
Squid doesn't keep track of the size of the <em/swap.state/
file, which normally resides on each <em/cache_dir/.
<item>
Directory entries also take up filesystem space.
<item>
Other applications might be using the same disk partition.
<item>
Your filesystem block size might be larger than what Squid
thinks. When calculating total disk usage, Squid rounds
file sizes up to a whole number of 1024 byte blocks. If
your filesystem uses larger blocks, then some "wasted" space
is not accounted for.
</itemize>
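<p>
The block-size effect can be estimated with a small calculation. This is
a simplified sketch with hypothetical function names. It assumes every
file occupies a whole number of filesystem blocks, which real
filesystems with fragments or tail packing may not do.

```python
def rounded_size(file_size, block_size):
    """Round file_size up to a whole number of blocks."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


def unaccounted_bytes(file_sizes, fs_block_size, squid_block_size=1024):
    """Space the filesystem uses beyond what 1024-byte rounding predicts."""
    squid_view = sum(rounded_size(s, squid_block_size) for s in file_sizes)
    fs_view = sum(rounded_size(s, fs_block_size) for s in file_sizes)
    return fs_view - squid_view
```

For two files of 1500 and 100 bytes on a filesystem with 8192-byte
blocks, Squid counts 3072 bytes while the filesystem uses 16384, so
13312 bytes go unaccounted.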
<sect1>How do <em/positive_dns_ttl/ and <em/negative_dns_ttl/ work?
<p>
<em/positive_dns_ttl/ is how long Squid caches a successful DNS
lookup. Similarly, <em/negative_dns_ttl/ is how long Squid caches
a failed DNS lookup.
<p>
<em/positive_dns_ttl/ is not always used. It is NOT used in the following
cases:
<itemize>
<item>Squid-2.3 and later versions with internal DNS lookups. Internal
lookups are the default for Squid-2.3 and later.
<item>If you applied the ``DNS TTL'' <ref id="dns-ttl-hack" name="patch">
for BIND.
<item>If you are using FreeBSD, then it already has the DNS TTL patch
built in.
</itemize>
<p>
Let's say you have the following settings:
<verb>
positive_dns_ttl 1 hours
negative_dns_ttl 1 minutes
</verb>
When Squid looks up a name like <em/www.squid-cache.org/, it gets back
an IP address like 204.144.128.89. The address is cached for the
next hour. That means, when Squid needs to know the address for
<em/www.squid-cache.org/ again, it uses the cached answer for the
next hour. After one hour, the cached information expires, and Squid
makes a new query for the address of <em/www.squid-cache.org/.
<p>
If you have the DNS TTL patch, or are using internal lookups, then
each hostname has its own TTL value, which was set by the domain
name administrator. You can see these values in the 'ipcache'
cache manager page. For example:
<verb>
Hostname Flags lstref TTL N
www.squid-cache.org C 73043 12784 1( 0) 204.144.128.89-OK
www.ircache.net C 73812 10891 1( 0) 192.52.106.12-OK
polygraph.ircache.net C 241768 -181261 1( 0) 192.52.106.12-OK
</verb>
The TTL field shows how many seconds remain until the entry expires.
Negative values mean the entry is already expired, and will be refreshed
upon next use.
<p>
The <em/negative_dns_ttl/ specifies how long to cache failed DNS lookups.
When Squid fails to resolve a hostname, you can be pretty sure that
it is a real failure, and you are not likely to get a successful
answer within a short time period. Squid retries its lookups
many times before declaring a lookup has failed.
If you like, you can set <em/negative_dns_ttl/ to zero.
<sect1>What does <em>swapin MD5 mismatch</em> mean?
<p>
It means that Squid opened up a disk file to serve a cache hit, but
it found that the stored object doesn't match the user's request.
Squid stores the MD5 digest of the URL at the start of each disk file.
When the file is opened, Squid checks that the disk file MD5 matches the
MD5 of the URL requested by the user. If they don't match, the warning
is printed and Squid forwards the request to the origin server.
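<p>
The check can be sketched as follows. The function names are
hypothetical, and the sketch assumes the digest is computed over the URL
alone. It is meant only to show the comparison, not Squid's on-disk
format.

```python
import hashlib


def url_md5(url):
    # Assumption: the key is the MD5 digest of the URL bytes alone.
    return hashlib.md5(url.encode("utf-8")).digest()


def check_swapin(stored_md5, requested_url):
    """Return 'hit' if the stored digest matches the request, else 'miss'."""
    if stored_md5 == url_md5(requested_url):
        return "hit"
    # Mismatch: the warning is printed and the request goes to the origin.
    return "miss"
```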
<p>
You do not need to worry about this warning. It means that Squid is
recovering from a corrupted cache directory.
<sect1>What does <em>failed to unpack swapfile meta data</em> mean?
<p>
Each of Squid's disk cache files has a metadata section at the beginning.
This header is used to store the URL MD5, some StoreEntry data, and more.
When Squid opens a disk file for reading, it looks for the meta data
header and unpacks it.
<p>
This warning means that Squid couldn't unpack the meta data. This is
a non-fatal bug, from which Squid can recover. Perhaps
the meta data was just missing, or perhaps the file got corrupted.
<p>
You do not need to worry about this warning. It means that Squid is
double-checking that the disk file matches what Squid thinks should
be there, and the check failed. Squid recovers and generates
a cache miss in this case.
<sect1>Why doesn't Squid make <em/ident/ lookups in interception mode?
<p>
It's a side-effect of the way interception proxying works.
<p>
When Squid is configured for interception proxying, the operating system
pretends that it is the origin server. That means that the "local" socket
address for intercepted TCP
connections is really the origin server's IP address. If you run
<em/netstat -n/ on your interception proxy, you'll see a lot of
foreign IP addresses in the <em/Local Address/ column.
<p>
When Squid wants to make an ident query, it creates a new TCP socket
and <em/binds/ the local endpoint to the same IP address as the
local end of the client's TCP connection. Since the local address
isn't really local (it's some far away origin server's IP address),
the <em/bind()/ system call fails. Squid handles this as a failed
ident lookup.
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>Multicast
<sect1>What is Multicast?
<P>
Multicast is essentially the ability to send one IP packet to multiple
receivers. Multicast is often used for audio and video conferencing systems.
<P>
You often hear about <url url="http://www.mbone.com/" name="the Mbone"> in
reference to Multicast. The Mbone is essentially a ``virtual backbone''
which exists in the Internet itself. If you want to send and/or receive
Multicast, you need to be ``on the Mbone.''
<sect1>How do I know if I'm on the Mbone?
<P>
One way is to ask someone who manages your network. If your network manager
doesn't know, or looks at you funny, then you are very likely NOT on the Mbone.
<P>
Another way is to use the <em/mtrace/ program, which can be found
on the <url url="ftp://parcftp.xerox.com/pub/net-research/ipmulti/"
name="Xerox PARC FTP site">. Mtrace is similar to traceroute. It will
tell you about the multicast path between your site and another. For example:
<verb>
> mtrace mbone.ucar.edu
mtrace: WARNING: no multicast group specified, so no statistics printed
Mtrace from 128.117.64.29 to 192.172.226.25 via group 224.2.0.1
Querying full reverse path... * switching to hop-by-hop:
0 oceana-ether.nlanr.net (192.172.226.25)
-1 avidya-ether.nlanr.net (192.172.226.57) DVMRP thresh^ 1
-2 mbone.sdsc.edu (198.17.46.39) DVMRP thresh^ 1
-3 * nccosc-mbone.dren.net (138.18.5.224) DVMRP thresh^ 48
-4 * * FIXW-MBONE.NSN.NASA.GOV (192.203.230.243) PIM/Special thresh^ 64
-5 dec3800-2-fddi-0.SanFrancisco.mci.net (204.70.158.61) DVMRP thresh^ 64
-6 dec3800-2-fddi-0.Denver.mci.net (204.70.152.61) DVMRP thresh^ 1
-7 mbone.ucar.edu (192.52.106.7) DVMRP thresh^ 64
-8 mbone.ucar.edu (128.117.64.29)
Round trip time 196 ms; total ttl of 68 required.
</verb>
<P>
If you think you need to be on the Mbone, this is
<url url="http://www.mbone.com/mbone/how-to-join.html" name="how you can join">.
<sect1>Should I be using Multicast ICP?
<P>
Short answer: No, probably not.
<P>
Reasons why you SHOULD use Multicast:
<enum>
<item>
It reduces the number of times Squid calls <em/sendto()/ to put a UDP
packet onto the network.
<item>
It's trendy and cool to use Multicast.
</enum>
<P>
Reasons why you SHOULD NOT use Multicast:
<enum>
<item>
Multicast tunnels/configurations/infrastructure are often unstable.
You may lose multicast connectivity but still have unicast connectivity.
<item>
Multicast does not simplify your Squid configuration file. Every trusted
neighbor cache must still be specified.
<item>
Multicast does not reduce the number of ICP replies being sent around.
It does reduce the number of ICP queries sent, but not the number of replies.
<item>
Multicast exposes your cache to some privacy issues. There are no special
permissions required to join a multicast group. Anyone may join your
group and eavesdrop on ICP query messages. However, the scope of your
multicast traffic can be controlled such that it does not exceed certain
boundaries.
</enum>
<P>
We recommend using Multicast ICP only over network
infrastructure over which you have close control. In other words, only
use Multicast over your local area network, or maybe your wide area
network if you are an ISP. We think it is probably a bad idea to use
Multicast ICP over congested links or commodity backbones.
<sect1>How do I configure Squid to send Multicast ICP queries?
<P>
To configure Squid to send ICP queries to a Multicast address, you
need to create another neighbour cache entry specified as <em/multicast/.
For example:
<verb>
cache_host 224.9.9.9 multicast 3128 3130 ttl=64
</verb>
224.9.9.9 is a sample multicast group address.
<em/multicast/ indicates that this
is a special type of neighbour. The HTTP-port argument (3128)
is ignored for multicast peers, but the ICP-port (3130) is
very important. The final argument, <em/ttl=64/
specifies the multicast TTL value for queries sent to this
address.
It is probably a good
idea to increment the minimum TTL by a few to provide a margin
for error and changing conditions.
<P>
You must also specify which of your neighbours will respond
to your multicast queries, since it would
be a bad idea to implicitly trust any ICP reply from an unknown
address. Note that ICP replies are sent back to <em/unicast/
addresses; they are NOT multicast, so Squid has no indication
whether a reply is from a regular query or a multicast
query. To configure your multicast group neighbours, use the
<em/cache_host/ directive and the <em/multicast-responder/
option:
<verb>
cache_host cache1 sibling 3128 3130 multicast-responder
cache_host cache2 sibling 3128 3130 multicast-responder
</verb>
Here all fields are relevant. The ICP port number (3130)
must be the same as in the <em/cache_host/ line defining the
multicast peer above. The third field must either be
<em/parent/ or <em/sibling/ to indicate how Squid should treat replies.
With the <em/multicast-responder/ flag set for a peer,
Squid will NOT send ICP queries to it directly (i.e. unicast).
<sect1>How do I know what Multicast TTL to use?
<P>
The Multicast TTL (which is specified on the <em/cache_host/ line
of your multicast group) determines how ``far'' your ICP queries
will go. In the Mbone, there is a certain TTL threshold defined
for each network interface or tunnel. A multicast packet's TTL must
be larger than the defined TTL for that packet to be forwarded across
that link. For example, the <em/mrouted/ manual page recommends:
<verb>
32 for links that separate sites within an organization.
64 for links that separate communities or organizations, and are
attached to the Internet MBONE.
128 for links that separate continents on the MBONE.
</verb>
<P>
A good way to determine the TTL you need is to run <em/mtrace/ as shown above
and look at the last line. It will show you the minimum TTL required to
reach the other host.
<P>
If you set your TTL too high, then your ICP messages may travel ``too far''
and will be subject to eavesdropping by others.
If you're only using multicast on your LAN, as we suggest, then your TTL will
be quite small, for example <em/ttl=4/.
<sect1>How do I configure Squid to receive and respond to Multicast ICP?
<P>
You must tell Squid to join a multicast group address with the
<em/mcast_groups/ directive. For example:
<verb>
mcast_groups 224.9.9.9
</verb>
Of course, all members of your Multicast ICP group will need to use the
exact same multicast group address.
<P>
<bf/NOTE:/ Choose a multicast group address with care! If two organizations
happen to choose the same multicast address, then they may find that their
groups ``overlap'' at some point. This will be especially true if one of the
querying caches uses a large TTL value. There are two ways to reduce the risk
of group overlap:
<enum>
<item>
Use a unique group address
<item>
Limit the scope of multicast messages with TTLs or administrative scoping.
</enum>
<P>
Using a unique address is a good idea, but not without some potential
problems. If you choose an address randomly, how do you know that
someone else will not also randomly choose the same address? NLANR
has been assigned a block of multicast addresses by the IANA for use
in situations such as this. If you would like to be assigned one
of these addresses, please <url url="mailto:nlanr-cache@nlanr.net"
name="write to us">. However, note that neither NLANR nor IANA has
any authority to prevent anyone from using an address assigned to you.
<P>
Limiting the scope of your multicast messages is probably a better
solution. They can be limited with the TTL value discussed above, or
with some newer techniques known as administratively scoped
addresses. Here you can configure well-defined boundaries for the
traffic to a specific address. The
<url url="http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2365.txt" name="Administratively Scoped IP Multicast RFC">
describes this.
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>System-Dependent Weirdnesses
<sect1>Solaris
<sect2>select()
<P>
<em/select(3c)/ won't handle more than 1024 file descriptors. The
<em/configure/ script should enable <em/poll()/ by default for
Solaris. <em/poll()/ allows you to use many more filedescriptors,
probably 8192 or more.
<p>
For older Squid versions you can enable <em/poll()/
manually by changing HAVE_POLL in <em>include/autoconf.h</em>, or
by adding -DUSE_POLL=1 to the DEFINES in src/Makefile.
<sect2>malloc
<P>
libmalloc.a is leaky. Squid's configure does not use -lmalloc on Solaris.
<sect2>DNS lookups and <em/nscd/
<P>
by <url url="mailto:david@avarice.nepean.uws.edu.au" name="David J N Begley">.
<P>
DNS lookups can be slow because of some mysterious thing called
<bf/nscd/. You should edit <em>/etc/nscd.conf</em> and make it say:
<verb>
enable-cache hosts no
</verb>
<P>
Apparently nscd serializes DNS queries thus slowing everything down when
an application (such as Squid) hits the resolver hard. You may notice
something similar if you run a log processor executing many DNS resolver
queries - the resolver starts to slow.. right.. down.. . . .
<p>
According to
<url url="mailto:andre at online dot ee" name="Andres Kroonmaa">,
users of Solaris starting from version 2.6 and up should NOT
completely disable <em/nscd/ daemon. <em/nscd/ should be running and
caching passwd and group files, although it is suggested to
disable hosts caching as it may interfere with DNS lookups.
<p>
Several library calls rely on having a free file descriptor
available with FD &lt; 256. Systems running without nscd may fail on such calls
if the first 256 file descriptors are all in use.
<p>
Since Solaris 2.6, Sun has changed the way some system calls
work and uses the <em/nscd/ daemon to implement them. To
communicate with <em/nscd/, Solaris uses undocumented door calls.
Basically, <em/nscd/ is used to reduce the memory usage of user-space
system libraries that use the passwd and group files. Before 2.6,
Solaris cached the full passwd file in library memory on first
use, but as this was considered to use too much RAM on large
multiuser systems, Sun decided to move the implementation of
these calls out of the libraries and into a single dedicated daemon.
<sect2>DNS lookups and <em>/etc/nsswitch.conf</em>
<P>
by <url url="mailto:ARMISTEJ@oeca.otis.com" name="Jason Armistead">.
<P>
The <em>/etc/nsswitch.conf</em> file determines the order of searches
for lookups (amongst other things). You might only have it set up to
allow NIS and HOSTS files to work. You definitely want the "hosts:"
line to include the word <em/dns/, e.g.:
<verb>
hosts: nis dns [NOTFOUND=return] files
</verb>
<sect2>DNS lookups and NIS
<P>
by <url url="mailto:cudch@csv.warwick.ac.uk" name="Chris Tilbury">.
<P>
Our site cache is running on a Solaris 2.6 machine. We use NIS to distribute
authentication and local hosts information around and in common with our
multiuser systems, we run a slave NIS server on it to help the response of
NIS queries.
<P>
We were seeing very high name-ip lookup times (avg &tilde;2sec)
and ip->name lookup times (avg &tilde;8 sec), although there didn't
seem to be that much of a problem with response times for valid
sites until the cache was being placed under high load. Then,
performance went down the toilet.
<P>
After some time, and a bit of detective work, we found the problem.
On Solaris 2.6, if you have a local NIS server running (<em/ypserv/)
and you have NIS in your <em>/etc/nsswitch.conf</em> hosts entry,
then check the flags it is being started with. The 2.6 ypstart
script checks to see if there is a <em/resolv.conf/ file present
when it starts ypserv. If there is, then it starts it with the
<em/-d/ option.
<P>
This has the same effect as putting the <em/YP_INTERDOMAIN/ key in
the hosts table -- namely, that failed NIS host lookups are tried
against the DNS by the NIS server.
<P>
This is a <bf/bad thing(tm)/! If NIS itself tries to resolve names
using the DNS, then the requests are serialised through the NIS
server, creating a bottleneck (This is the same basic problem that
is seen with <em/nscd/). Thus, one failing or slow lookup can, if
you have NIS before DNS in the service switch file (which is the
most common setup), hold up every other lookup taking place.
<P>
If you're running in this kind of setup, then you will want to make
sure that
<enum>
<item>ypserv doesn't start with the <em/-d/ flag.
<item>you don't have the <em/YP_INTERDOMAIN/ key in the hosts table
(find the <em/B=-b/ line in the yp Makefile and change it to <em/B=/)
</enum>
<P>
We changed these here, and saw our average lookup times drop by up
to an order of magnitude (&tilde;150msec for name-ip queries and
&tilde;1.5sec for ip-name queries, the latter still so high, I
suspect, because more of these fail and timeout since they are not
made so often and the entries are frequently non-existent anyway).
<sect2>Tuning
<P>
<url url="http://www.rvs.uni-hannover.de/people/voeckler/tune/EN/tune.html"
name="Solaris 2.x - tuning your TCP/IP stack and more"> by <url
url="http://www.rvs.uni-hannover.de/people/voeckler/" name="Jens-S.
V&ouml;ckler">
<sect2>disk write error: (28) No space left on device
<P>
You might get this error even if your disk is not full, and is not out
of inodes. Check your syslog logs (/var/adm/messages, normally) for
messages like either of these:
<verb>
NOTICE: realloccg /proxy/cache: file system full
NOTICE: alloc: /proxy/cache: file system full
</verb>
<P>
In a nutshell, the UFS filesystem used by Solaris can't cope with the
workload squid presents to it very well. The filesystem will end up
becoming highly fragmented, until it reaches a point where there are
insufficient free blocks left to create files with, and only fragments
available. At this point, you'll get this error and squid will revise
its idea of how much space is actually available to it. You can do a
"fsck -n raw_device" (no need to unmount, this checks in read only mode)
to look at the fragmentation level of the filesystem. It will probably
be quite high (>15%).
<P>
Sun suggest two solutions to this problem. One costs money, the other is
free but may result in a loss of performance (although Sun do claim it
shouldn't, given the already highly random nature of squid disk access).
<P>
The first is to buy a copy of VxFS, the Veritas Filesystem. This is an
extent-based filesystem and it's capable of having online defragmentation
performed on mounted filesystems. This costs money, however (VxFS is not
very cheap!)
<P>
The second is to change certain parameters of the UFS filesystem. Unmount
your cache filesystems and use tunefs to change optimization to "space" and
to reduce the "minfree" value to 3-5% (under Solaris 2.6 and higher, very
large filesystems will almost certainly have a minfree of 2% already and you
shouldn't increase this). You should be able to get fragmentation down to
around 3% by doing this, with an accompanying increase in the amount of space
available.
<P>
Thanks to <url url="mailto:cudch@csv.warwick.ac.uk" name="Chris Tilbury">.
<sect2>Solaris X86 and IPFilter
<P>
by <url url="mailto:jeff@sisna.com" name="Jeff Madison">
<P>
Important update regarding Squid running on Solaris x86. I have been
working for several months to resolve what appeared to be a memory leak in
squid when running on Solaris x86 regardless of the malloc that was used. I
have made 2 discoveries that anyone running Squid on this platform may be
interested in.
<P>
Number 1: There is not a memory leak in Squid, even though after the system
runs for some amount of time (this varies depending on the load the system
is under) Top reports that there is very little memory free. True to the
claims of the Sun engineer I spoke to, this statistic from Top is incorrect.
The odd thing is that you do begin to see performance suffer substantially
as time goes on and the only way to correct the situation is to reboot the
system. This leads me to discovery number 2.
<P>
Number 2: There is some type of resource problem, memory or other, with
IPFilter on Solaris x86. I have not taken the time to investigate what the
problem is because we are no longer using IPFilter. We have switched to an
Alteon ACE 180 Gigabit switch which will do the trans-proxy for you. After
moving the trans-proxy redirection process out to the Alteon switch, Squid
has run for 3 days straight under a huge load with no problem whatsoever.
We currently have 2 boxes with 40 GB of cached objects on each box. This 40
GB was accumulated in the 3 days; from this you can see what type of load
these boxes are under. Prior to this change we were never able to operate
for more than 4 hours.
<P>
Because the problem appears to be with IPFilter I would guess that you
would only run into this issue if you are trying to run Squid as a
transparent proxy using IPFilter. That makes sense. If there is anyone
with information that would indicate my findings are incorrect I am willing
to investigate further.
<sect2>Changing the directory lookup cache size
<P>
by <url url="mailto:mbatchelor@citysearch.com" name="Mike Batchelor">
<P>
On Solaris, the kernel variable for the directory name lookup cache size is
<em>ncsize</em>. In <em>/etc/system</em>, you might want to try
<verb>
set ncsize = 8192
</verb>
or even higher. The kernel variable <em/ufs_ninode/ - which is the size of the inode
cache itself - scales with <em/ncsize/ in Solaris 2.5.1 and later. Previous
versions of Solaris required both to be adjusted independently, but now, it is
not recommended to adjust <em/ufs_ninode/ directly on 2.5.1 and later.
<P>
You can set <em/ncsize/ quite high, but at some point - dependent on the
application - a too-large <em/ncsize/ will increase the latency of lookups.
<P>
Defaults are:
<verb>
Solaris 2.5.1 : (max_nprocs + 16 + maxusers) + 64
Solaris 2.6/Solaris 7 : 4 * (max_nprocs + maxusers) + 320
</verb>
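<P>
As a worked example of the default formulas above, the following Python
sketch evaluates both of them. The function names and the sample values
of <em/max_nprocs/ and <em/maxusers/ are illustrative assumptions, not
values read from a real system:
<verb>
def ncsize_default_251(max_nprocs, maxusers):
    # Solaris 2.5.1 formula as quoted above
    return (max_nprocs + 16 + maxusers) + 64

def ncsize_default_26(max_nprocs, maxusers):
    # Solaris 2.6 / Solaris 7 formula as quoted above
    return 4 * (max_nprocs + maxusers) + 320

# Hypothetical machine: max_nprocs = 1034, maxusers = 64
print(ncsize_default_251(1034, 64))
print(ncsize_default_26(1034, 64))
</verb>
<P>
Comparing the result with the value you intend to put in
<em>/etc/system</em> shows whether your setting actually raises the default.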
<sect2>The priority_paging algorithm
<P>
by <url url="mailto:mbatchelor@citysearch.com" name="Mike Batchelor">
<P>
Another new tuneable (actually a toggle) in Solaris 2.5.1, 2.6 or Solaris 7 is
the <em/priority_paging/ algorithm. This is actually a complete rewrite of the
virtual memory system on Solaris. It will page out application data last, and
filesystem pages first, if you turn it on (set <em/priority_paging/ = 1 in
<em>/etc/system</em>). As you may know, the Solaris buffer cache grows to fill
available pages, and under the old VM system, applications could get paged out
to make way for the buffer cache, which can lead to swap thrashing and
degraded application performance. The new <em/priority_paging/ helps keep
application and shared library pages in memory, preventing the buffer cache
from paging them out, until memory gets REALLY short. Solaris 2.5.1 requires
patch 103640-25 or higher and Solaris 2.6 requires 105181-10 or higher to get
priority_paging. Solaris 7 needs no patch, but all versions have it turned
off by default.
<sect1>FreeBSD
<sect2>T/TCP bugs
<P>
We have found that with FreeBSD-2.2.2-RELEASE, there are some bugs with T/TCP. FreeBSD will
try to use T/TCP if you've enabled the ``TCP Extensions.'' To disable T/TCP,
use <em/sysinstall/ to disable TCP Extensions,
or edit <em>/etc/rc.conf</em> and set
<verb>
tcp_extensions="NO" # Allow RFC1323 & RFC1644 extensions (or NO).
</verb>
or add this to your /etc/rc files:
<verb>
sysctl -w net.inet.tcp.rfc1644=0
</verb>
<sect2>mbuf size
<P>
We noticed an odd thing with some of Squid's interprocess communication.
Often, output from the <em/dnsserver/ processes would NOT be read in
one chunk. With full debugging, it looks like this:
<verb>
1998/04/02 15:18:48| comm_select: FD 46 ready for reading
1998/04/02 15:18:48| ipcache_dnsHandleRead: Result from DNS ID 2 (100 bytes)
1998/04/02 15:18:48| ipcache_dnsHandleRead: Incomplete reply
....other processing occurs...
1998/04/02 15:18:48| comm_select: FD 46 ready for reading
1998/04/02 15:18:48| ipcache_dnsHandleRead: Result from DNS ID 2 (9 bytes)
1998/04/02 15:18:48| ipcache_parsebuffer: parsing:
$name www.karup.com
$h_name www.karup.inter.net
$h_len 4
$ipcount 2
38.15.68.128
38.15.67.128
$ttl 2348
$end
</verb>
Interestingly, it is very common to get only 100 bytes on the first
read. When two read() calls are required, this adds additional latency
to the overall request. On our caches running Digital Unix, the median
<em/dnsserver/ response time was measured at 0.01 seconds. On our
FreeBSD cache, however, the median latency was 0.10 seconds.
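<P>
The effect of a split reply can be sketched as follows. This is not
Squid's code; it is a minimal Python illustration, assuming only that a
reply is complete once the <tt/$end/ line has arrived, of why the reader
has to keep buffering across several read() calls. The function name and
the sample reply are made up for the example:
<verb>
import socket

def read_reply(sock, bufsize=4096):
    """Read until the buffer ends with the "$end" marker line.
    Returns the reply text and the number of recv() calls used."""
    buf = b""
    reads = 0
    while not buf.endswith(b"$end\n"):
        chunk = sock.recv(bufsize)
        if not chunk:
            break               # the writer closed the connection
        buf += chunk
        reads += 1
    return buf.decode("ascii"), reads

# Demonstration over a local socket pair. A small bufsize stands in for
# a kernel that hands the data over in short pieces.
writer, reader = socket.socketpair()
reply = (b"$name www.example.com\n$h_len 4\n$ipcount 1\n"
         b"192.0.2.1\n$ttl 2348\n$end\n")
writer.sendall(reply)
text, reads = read_reply(reader, bufsize=50)
print(reads, "reads for", len(text), "bytes")
writer.close()
reader.close()
</verb>
<P>
Every extra pass through the loop corresponds to another wait for the
descriptor to become readable, which is where the added latency comes from.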
<P>
Here is a simple patch to fix the bug:
<verb>
===================================================================
RCS file: /home/ncvs/src/sys/kern/uipc_socket.c,v
retrieving revision 1.40
retrieving revision 1.41
diff -p -u -r1.40 -r1.41
--- src/sys/kern/uipc_socket.c 1998/05/15 20:11:30 1.40
+++ /home/ncvs/src/sys/kern/uipc_socket.c 1998/07/06 19:27:14 1.41
@@ -31,7 +31,7 @@
* SUCH DAMAGE.
*
* @(#)uipc_socket.c 8.3 (Berkeley) 4/15/94
- * $Id: FAQ.sgml,v 1.2 2004/09/09 12:36:20 cvsdist Exp $
+ * $Id: FAQ.sgml,v 1.2 2004/09/09 12:36:20 cvsdist Exp $
*/
#include <sys/param.h>
@@ -491,6 +491,7 @@ restart:
mlen = MCLBYTES;
len = min(min(mlen, resid), space);
} else {
+ atomic = 1;
nopages:
len = min(min(mlen, resid), space);
/*
</verb>
<P>
Another technique which may help, but does not fix the bug, is to
increase the kernel's mbuf size.
The default is 128 bytes. The MSIZE symbol is defined in
<em>/usr/include/machine/param.h</em>. However, to change it we added
this line to our kernel configuration file:
<verb>
options MSIZE="256"
</verb>
<sect2>Dealing with NIS
<P>
<em>/var/yp/Makefile</em> has the following section:
<verb>
# The following line encodes the YP_INTERDOMAIN key into the hosts.byname
# and hosts.byaddr maps so that ypserv(8) will do DNS lookups to resolve
# hosts not in the current domain. Commenting this line out will disable
# the DNS lookups.
B=-b
</verb>
You will want to comment out the <em/B=-b/ line so that <em/ypserv/ does not
do DNS lookups.
<sect2>FreeBSD 3.3: The lo0 (loop-back) device is not configured on startup
<label id="freebsd-no-lo0">
<p>
Squid requires the loopback interface to be up and configured. If it is not, you will
get errors such as <ref id="comm-bind-loopback-fail" name="commBind">.
<p>
From <url url="http://www.freebsd.org/releases/3.3R/errata.html" name="FreeBSD 3.3 Errata Notes">:
<p>
<quote>
Fix: Assuming that you experience this problem at all, edit <em>/etc/rc.conf</em>
and search for where the network_interfaces variable is set. In
its value, change the word <em/auto/ to <em/lo0/ since the auto keyword
doesn't bring the loop-back device up properly, for reasons yet to
be adequately determined. Since your other interface(s) will already
be set in the network_interfaces variable after initial installation,
it's reasonable to simply s/auto/lo0/ in rc.conf and move on.
</quote>
<p>
Thanks to <url url="mailto:robl at lentil dot org" name="Robert Lister">.
<sect2>FreeBSD 3.x or newer: Speed up disk writes using Softupdates
<label id="freebsd-softupdates">
<p>
by <url url="mailto:andre.albsmeier@mchp.siemens.de" name="Andre Albsmeier">
<p>
FreeBSD 3.x and newer support Softupdates. This is a mechanism to
speed up disk writes, much as is possible by mounting ufs volumes
async. However, Softupdates achieves performance similar to or better
than async without losing safety in the event of a system crash. For
more detailed information and the
copyright terms see <em>/sys/contrib/softupdates/README</em> and
<em>/sys/ufs/ffs/README.softupdate</em>.
<p>
To build a system supporting softupdates, you have to build
a kernel with <tt>options SOFTUPDATES</tt> set (see <em/LINT/ for a commented
out example). After rebooting with the new kernel, you can enable
softupdates on a per-filesystem basis with the command:
<verb>
$ tunefs -n enable /mountpoint
</verb>
The filesystem in question MUST NOT be mounted at
this time. After that, softupdates are permanently enabled and the
filesystem can be mounted normally. To verify that the softupdates
code is running, simply issue a mount command and an output similar
to the following will appear:
<verb>
$ mount
/dev/da2a on /usr/local/squid/cache (ufs, local, noatime, soft-updates, writes: sync 70 async 225)
</verb>
<sect1>OSF1/3.2
<P>
If you compile both libgnumalloc.a and Squid with <em/cc/, the <em/mstats()/
function returns bogus values. However, if you compile libgnumalloc.a with
<em/gcc/, and Squid with <em/cc/, the values are correct.
<sect1>BSD/OS
<sect2>gcc/yacc
<P>
Some people report
<ref id="bsdi-compile" name="difficulties compiling squid on BSD/OS">.
<sect2>process priority
<P>
<it>
I've noticed that my Squid process
seems to stick at a nice value of four, and clicks back to that even
after I renice it to a higher priority. However, looking through the
Squid source, I can't find any instance of a setpriority() call, or
anything else that would seem to indicate Squid's adjusting its own
priority.
</it>
<P>
by <url url="mailto:bogstad@pobox.com" name="Bill Bogstad">
<P>
BSD Unices traditionally have auto-niced non-root processes to 4 after
they used a lot (4 minutes???) of CPU time. My guess is that it's the BSD/OS
not Squid that is doing this. I don't know offhand if there is a way to
disable this on BSD/OS.
<P>
by <url url="mailto:Arjan.deVet@adv.iae.nl" name="Arjan de Vet">
<P>
You can get around this by
starting Squid with nice-level -4 (or another negative value).
<p>
by <url url="mailto:bert_driehuis at nl dot compuware dot com" name="Bert Driehuis">
<p>
The autonice behavior is a leftover from the history of BSD as a
university OS. It penalises CPU bound jobs by nicing them after using 600
CPU seconds.
Adding
<verb>
sysctl -w kern.autonicetime=0
</verb>
to <em>/etc/rc.local</em> will disable the behavior systemwide.
<sect1>Linux
<sect2>Cannot bind socket FD 5 to 127.0.0.1:0: (49) Can't assign requested address
<P>
Try a different version of Linux. We have received many reports of this
``bug'' from people running Linux 2.0.30. The <em/bind(2)/ system
call should NEVER give this error when binding to port 0.
<sect2>FATAL: Don't run Squid as root, set 'cache_effective_user'!
<P>
Some users have reported that setting <tt/cache_effective_user/
to <tt/nobody/ under Linux does not work.
However, it appears that using any <tt/cache_effective_user/ other
than <tt/nobody/ will succeed. One solution is to create a
user account for Squid and set <tt/cache_effective_user/ to that.
Alternately you can change the UID for the <tt/nobody/ account
from 65535 to 65534.
<P>
Another problem is that RedHat 5.0 Linux seems to have a broken
<em/setresuid()/ function. There are two ways to fix this.
Before running configure:
<verb>
% setenv ac_cv_func_setresuid no
% ./configure ...
% make clean
% make install
</verb>
Or after running configure, manually edit include/autoconf.h and
change the HAVE_SETRESUID line to:
<verb>
#define HAVE_SETRESUID 0
</verb>
<P>
Also, some users report this error is due to a NIS configuration
problem. By adding <em/compat/ to the <em/passwd/ and <em/group/
lines of <em>/etc/nsswitch.conf</em>, the problem goes away.
(<url url="mailto:acli@ada.ddns.org" name="Ambrose Li">).
<P>
<URL URL="mailto:galifrey@crown.net" name="Russ Mellon"> notes
that these problems with <em/cache_effective_user/ are fixed in
version 2.2.x of the Linux kernel.
<sect2>Large ACL lists make Squid slow
<P>
The regular expression library which comes with Linux is known
to be very slow. Some people report it entirely fails to work
after long periods of time.
<P>
To fix, use the GNUregex library included with the Squid source code.
With Squid-2, use the <em/--enable-gnuregex/ configure option.
<sect2>gethostbyname() leaks memory in RedHat 6.0 with glibc 2.1.1.
<p>
by <url url="mailto:radu at netsoft dot ro" name="Radu Greab">
<p>
The gethostbyname() function leaks memory in RedHat
6.0 with glibc 2.1.1. The quick fix is to delete nisplus service from
hosts entry in <em>/etc/nsswitch.conf</em>. In my tests dnsserver memory use
remained stable after I made the above change.
<p>
See <url url="http://developer.redhat.com/bugzilla/show_bug.cgi?id=3919" name="RedHat bug id 3919">.
<sect2>assertion failed: StatHist.c:91: `statHistBin(H, max) == H->capacity - 1' on Alpha system.
<p>
by <url url="mailto:jraymond@gnu.org" name="Jamie Raymond">
<p>
Some early versions of Linux have a kernel bug that causes this.
All that is needed is a recent kernel that doesn't have the mentioned bug.
<sect2>tools.c:605: storage size of `rl' isn't known
<p>
This is a bug with some versions of glibc. The glibc headers
incorrectly depended on the contents of some kernel headers.
Everything broke down when the kernel folks rearranged a bit in
the kernel-specific header files.
<p>
We think this glibc bug is present in versions
2.1.1 (or 2.1.0) and earlier. There are two solutions:
<enum>
<item>
Make sure /usr/include/linux and /usr/include/asm are from the kernel
version glibc is built/configured for, not any other kernel version.
Only compiling of loadable kernel modules outside of the kernel sources
depends on having the current versions of these, and for such builds
-I/usr/src/linux/include (or wherever the new kernel headers are
located) can be used to resolve the matter.
<item>
Upgrade glibc to 2.1.2 or later. This is always a good idea anyway,
provided a prebuilt upgrade package exists for the Linux distribution
used. Note: Do not attempt to manually build and install glibc from
source unless you know exactly what you are doing, as this can easily
render the system unusable.
</enum>
<sect1>HP-UX
<sect2>StatHist.c:74: failed assertion `statHistBin(H, min) == 0'
<P>
This was a very mysterious and unexplainable bug with GCC on HP-UX.
Certain functions, when specified as <em/static/, would cause
math bugs. The compiler also failed to handle implied
int-double conversions properly. These bugs should all be
handled correctly in Squid version 2.2.
<sect1>IRIX
<sect2><em/dnsserver/ always returns 255.255.255.255
<P>
There is a problem with GCC (2.8.1 at least) on
Irix 6 which causes it to always return the string 255.255.255.255 for _ANY_
address when calling inet_ntoa(). If this happens to you, compile Squid
with the native C compiler instead of GCC.
<sect1>SCO-UNIX
<P>
by <url url="mailto:f.j.bosscha@nhl.nl" name="F.J. Bosscha">
<P>
To make Squid run comfortably on SCO-unix you need to do the following:
<P>
Increase the <em/NOFILES/ parameter and the <em/NUMSP/ parameter, then
compile Squid. Although Squid reported in the cache.log file that it had
3000 filedescriptors, I still had problems with messages saying that no
more filedescriptors were available. After I also increased the NUMSP
value, the problems were gone.
<P>
One thing left is the number of tcp-connections the system can handle.
The default is 256, but I increased that as well because of the number of
clients we have.
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>Redirectors
<sect1>What is a redirector?
<P>
Squid has the ability to rewrite requested URLs. Implemented
as an external process (similar to a dnsserver), Squid can be
configured to pass every incoming URL through a <em/redirector/ process
that returns either a new URL, or a blank line to indicate no change.
<P>
The <em/redirector/ program is <bf/NOT/ a standard part of the Squid
package. However, some examples are provided below, and in the
"contrib/" directory of the source distribution. Since everyone has
different needs, it is up to the individual administrators to write
their own implementation.
<sect1>Why use a redirector?
<P>
A redirector allows the administrator to control which locations
users can go to. Using this in conjunction with transparent proxies
allows simple but effective porn control.
<sect1>How does it work?
<P>
The redirector program must read URLs (one per line) on standard input,
and write rewritten URLs or blank lines on standard output. Note that
the redirector program cannot use buffered I/O. Squid writes
additional information after the URL which a redirector can use to make
a decision. The input line consists of four fields:
<verb>
URL ip-address/fqdn ident method
</verb>
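<P>
The interface described above can be sketched in Python. This is a
minimal illustration rather than a production redirector: the host names
are placeholders, the function names <tt/rewrite/ and <tt/serve/ are our
own, and the sample request lines are invented for the self-check at the
end.
<verb>
import io
import sys

def rewrite(line):
    """Return the reply for one request line: a new URL, or an empty
    string (sent as a blank line) to leave the URL unchanged."""
    fields = line.split()
    if len(fields) < 4:
        return ""                   # unexpected input: no change
    url, client, ident, method = fields[:4]
    old = "http://fromhost.example/"
    new = "http://tohost.example/"
    if url.startswith(old):
        return new + url[len(old):]
    return ""

def serve(infile, outfile):
    # Answer each line as soon as it is read, and flush every reply
    # so that nothing sits in an output buffer.
    while True:
        line = infile.readline()
        if not line:
            break                   # end of input: the pipe was closed
        outfile.write(rewrite(line) + "\n")
        outfile.flush()

# Small self-check using in-memory files instead of Squid.
# A real redirector would call: serve(sys.stdin, sys.stdout)
requests = io.StringIO(
    "http://fromhost.example/a 10.0.0.1/- - GET\n"
    "http://other.example/ 10.0.0.1/- - GET\n")
replies = io.StringIO()
serve(requests, replies)
print(repr(replies.getvalue()))
</verb>
<P>
Keeping the decision in a separate function makes it easy to test the
rewriting rules without running Squid at all.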
<sect1>Do you have any examples?
<P>
A simple, very fast redirector called <url
url="http://www.senet.com.au/squirm/" name="SQUIRM"> is a good place to
start. It uses the regex library to allow pattern matching.
<P>
Also see <url url="http://ivs.cs.uni-magdeburg.de/&percnt;7eelkner/webtools/jesred/"
name="jesred">.
<P>
The following Perl script may also be used as a template for writing
your own redirector:
<verb>
#!/usr/local/bin/perl
$|=1;
while (<>) {
s@http://fromhost.com@http://tohost.org@;
print;
}
</verb>
<sect1>Can I use the redirector to return HTTP redirect messages?
<P>
Normally, the <em/redirector/ feature is used to rewrite requested URLs.
Squid then transparently requests the new URL. However, in some situations,
it may be desirable to return an HTTP "301" or "302" redirect message
to the client. This is now possible with Squid version 1.1.19.
<P>
Simply modify your redirector program to prepend either "301:" or "302:"
to the new URL. For example, the following script might be used
to direct external clients to a secure Web server for internal documents:
<verb>
#!/usr/local/bin/perl
$|=1;
while (<>) {
@X = split;
$url = $X[0];
if ($url =~ /^http:\/\/internal\.foo\.com/) {
$url =~ s/^http/https/;
$url =~ s/internal/secure/;
print "302:$url\n";
} else {
print "$url\n";
}
}
</verb>
<P>
Please see sections 10.3.2 and 10.3.3 of
<url url="http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2068.txt"
name="RFC 2068">
for an explanation of the 301 and 302 HTTP reply codes.
<sect1>FATAL: All redirectors have exited!
<label id="redirectors-exit">
<P>
A redirector process must <bf/never/ exit (stop running). If you see
the ``All redirectors have exited'' message, it probably means your
redirector program has a bug. Maybe it runs out of memory or has memory
access errors. You may want to test your redirector program outside of
squid with a big input list, taken from your <em/access.log/ perhaps.
Also, check for <ref id="coredumps" name="coredump"> files from the redirector program.
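<P>
One way to exercise a redirector outside of Squid is to drive it from a
small script. The following Python sketch is only an assumption about
how such a test might look; the function name and the stand-in command
are placeholders for your own program. It feeds request lines to the
program and checks that one reply line comes back for each and that the
process is still running afterwards. Note that a redirector which
buffers its output will make this script wait forever, which is itself a
useful symptom.
<verb>
import subprocess
import sys

def check_redirector(cmd, request_lines):
    """Send each request line to the redirector and collect one reply
    line per request. Raises RuntimeError if the program has exited."""
    replies = []
    with subprocess.Popen(cmd, stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE, text=True) as proc:
        for line in request_lines:
            proc.stdin.write(line.rstrip("\n") + "\n")
            proc.stdin.flush()
            reply = proc.stdout.readline()
            if reply == "":
                raise RuntimeError("redirector exited at: " + line)
            replies.append(reply.rstrip("\n"))
        if proc.poll() is not None:
            raise RuntimeError("redirector exited after the last request")
    return replies

# Stand-in redirector for the demonstration: answers every request with
# a blank line. Replace this command with the path to your own program.
dummy = [sys.executable, "-u", "-c",
         "import sys\n"
         "while sys.stdin.readline():\n"
         "    print(flush=True)\n"]
sample = ["http://www.example.com/ 10.0.0.1/- - GET"] * 3
print(check_redirector(dummy, sample))
</verb>
<P>
For a realistic run, build the request lines from the URLs in your
<em/access.log/ instead of the repeated sample line.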
<sect1>Redirector interface is broken re IDENT values
<p>
<it>
I added a redirector consisting of
</it>
<verb>
#! /bin/sh
/usr/bin/tee /tmp/squid.log
</verb>
<it>
and many of the redirector requests don't have a username in the
ident field.
</it>
<p>
Squid does not delay a request to wait for an ident lookup,
unless you use the ident ACLs. Thus, it is very likely that
the ident was not available at the time of calling the redirector,
but became available by the time the request is complete and
logged to access.log.
<p>
If you want to block requests waiting for ident lookup, try something
like this:
<verb>
acl foo ident REQUIRED
http_access allow foo
</verb>
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>Cache Digests
<label id="cache-digests">
<P>
<EM>Cache Digest FAQs compiled by
<URL url="mailto:ndoherty@eei.ericsson.se" name="Niall Doherty">.
</EM>
<SECT1>What is a Cache Digest?
<P>
A Cache Digest is a summary of the contents of an Internet Object Caching
Server.
It contains, in a compact (i.e. compressed) format, an indication of whether
or not particular URLs are in the cache.
<P>
A "lossy" technique is used for compression, which means that very high
compression factors can be achieved at the expense of not having 100%
correct information.
<SECT1>How and why are they used?
<P>
Cache servers periodically exchange their digests with each other.
<P>
When a request for an object (URL) is received from a client a cache
can use digests from its peers to find out which of its peers (if any)
have that object.
The cache can then request the object from the closest peer (Squid
uses the NetDB database to determine this).
<P>
Note that Squid will only make digest queries in those digests that are
<EM>enabled</EM>.
It will disable a peer's digest if and only if it cannot fetch a valid digest
for that peer.
It will enable that peer's digest again when a valid one is fetched.
<P>
The checks in the digest are very fast and they eliminate the need
for per-request queries to peers. Hence:
<P>
<ITEMIZE>
<ITEM>The latency of per-request peer queries is eliminated, so client response time should improve.
<ITEM>Network utilisation may be improved.
</ITEMIZE>
<P>
Note that the use of Cache Digests (for querying the cache contents of peers)
and the generation of a Cache Digest (for retrieval by peers) are independent.
So, it is possible for a cache to make a digest available for peers, and not
use the functionality itself and vice versa.
<SECT1>What is the theory behind Cache Digests?
<P>
Cache Digests are based on Bloom Filters - they are a method for
representing a set of keys with lookup capabilities;
where lookup means "is the key in the filter or not?".
<P>
In building a cache digest:
<P>
<ITEMIZE>
<ITEM> A vector (1-dimensional array) of m bits is allocated, with all
bits initially set to 0.
<ITEM> A number, k, of independent hash functions are chosen, h1, h2,
..., hk, with range { 1, ..., m }
(i.e. a key hashed with any of these functions gives a value between 1
and m inclusive).
<ITEM> The set of n keys to be operated on are denoted by:
A = { a1, a2, a3, ..., an }.
</ITEMIZE>
<SECT2>Adding a Key
<P>
To add a key the value of each hash function for that key is calculated.
So, if the key was denoted by <EM>a</EM>, then h1(a), h2(a), ...,
hk(a) are calculated.
<P>
The value of each hash function for that key represents an index into
the array and the corresponding bits are set to 1. So, a digest with
6 hash functions would have 6 bits to be set to 1 for each key added.
<P>
Note that the addition of a number of <EM>different</EM> keys could
cause one particular bit to be set to 1 multiple times.
<SECT2>Querying a Key
<P>
To query for the existence of a key the indices into the array are
calculated from the hash functions as above.
<P>
<ITEMIZE>
<ITEM> If any of the corresponding bits in the array are 0 then the key is
not present.
<ITEM> If all of the corresponding bits in the array are 1 then the key is
<EM>likely</EM> to be present.
</ITEMIZE>
<P>
Note the term <EM>likely</EM>.
It is possible that a <EM>collision</EM> in the digest can occur, whereby
the digest incorrectly indicates a key is present.
This is the price paid for the compact representation.
While the probability of a collision can never be reduced to zero it can
be controlled.
Larger values for the ratio of the digest size to the number of entries added
lower the probability.
The number of hash functions chosen also influences the probability.
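<P>
The add and query operations can be sketched in Python. This is a
generic Bloom filter for illustration, not Squid's implementation: the
class name, the way the k hash functions are simulated (SHA-256 of a
counter plus the key) and the use of one byte per bit are all
assumptions made to keep the example short, and indices run from 0 to
m-1 rather than 1 to m. The last function is the standard textbook
approximation for the collision probability, which is not taken from
this FAQ.
<VERB>
import hashlib
import math

class BloomFilter:
    def __init__(self, m, k):
        self.m = m                   # number of bits in the vector
        self.k = k                   # number of hash functions
        self.bits = bytearray(m)     # one byte per bit for clarity, all 0

    def _indices(self, key):
        # Assumption: k hash functions are simulated by hashing the key
        # with k different prefixes.
        for i in range(self.k):
            digest = hashlib.sha256(b"%d:" % i + key).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, key):
        for idx in self._indices(key):
            self.bits[idx] = 1       # may already be 1 from another key

    def might_contain(self, key):
        # Any 0 bit proves absence; all 1 bits only mean "likely present".
        return all(self.bits[idx] for idx in self._indices(key))

def false_positive_rate(m, n, k):
    # Standard approximation for n keys in m bits with k hash functions.
    return (1.0 - math.exp(-k * n / m)) ** k

bf = BloomFilter(m=1000, k=6)
bf.add(b"http://www.example.com/")
print(bf.might_contain(b"http://www.example.com/"))   # True
print(sum(bf.bits))                                   # at most 6
print(false_positive_rate(m=6144000, n=588327, k=4))
</VERB>
<P>
The figures in the last line are borrowed from the sample statistics
shown later in this section, with the four hash functions Squid uses;
the approximation comes out at about 1%.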
<SECT2>Deleting a Key
<P>
To delete a key, it is not possible to simply set the associated bits
to 0 since any one of those bits could have been set to 1 by the addition
of a different key!
<P>
Therefore, to support deletions a counter is required for each bit position
in the array.
The procedures to follow would be:
<P>
<ITEMIZE>
<ITEM> When adding a key, set appropriate bits to 1 and increment the
corresponding counters.
<ITEM> When deleting a key, decrement the appropriate counters (while > 0),
and if a counter reaches 0 <EM>then</EM> the corresponding bit is set to 0.
</ITEMIZE>
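<P>
The two procedures above can be sketched as a counting filter in
Python. As before this is a generic illustration, not code from Squid
(which does not support deletions): the class name and the hashing
scheme are assumptions, and a bit is treated as set whenever its counter
is above zero.
<VERB>
import hashlib

class CountingBloomFilter:
    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.counts = [0] * m        # one counter per bit position

    def _indices(self, key):
        # Assumption: k hash functions simulated with k prefixes.
        for i in range(self.k):
            digest = hashlib.sha256(b"%d:" % i + key).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, key):
        for idx in self._indices(key):
            self.counts[idx] += 1

    def delete(self, key):
        # Only meaningful for keys that were added earlier.
        for idx in self._indices(key):
            if self.counts[idx] > 0:
                self.counts[idx] -= 1

    def might_contain(self, key):
        return all(self.counts[idx] > 0 for idx in self._indices(key))

cbf = CountingBloomFilter(m=1000, k=4)
cbf.add(b"http://a.example/")
cbf.add(b"http://b.example/")
cbf.delete(b"http://a.example/")
print(cbf.might_contain(b"http://b.example/"))   # True: b is unaffected
</VERB>
<P>
The price is memory: each bit position now needs a counter instead of a
single bit.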
<SECT1>How is the size of the Cache Digest in Squid determined?
<P>
Upon initialisation, the <EM>capacity</EM> is set to the number
of objects that can be (are) stored in the cache.
Note that there are upper and lower limits here.
<P>
An arbitrary constant, bits_per_entry (currently set to 5), is
used to calculate the size of the array using the following formula:
<P>
<VERB>
number of bits in array = capacity * bits_per_entry + 7
</VERB>
<P>
The size of the digest, in bytes, is therefore:
<P>
<VERB>
digest size = int (number of bits in array / 8)
</VERB>
<P>
When a digest rebuild occurs, the change in the cache size (capacity)
is measured.
If the capacity has changed by a large enough amount (10%) then
the digest array is freed and reallocated; otherwise the
same digest is re-used.
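<P>
As a quick check of the formulas, here is a small Python sketch. The
function names are ours, and the way the 10% change is measured
(relative to the old capacity) is an assumption, since the text above
does not spell it out.
<VERB>
def digest_size_bytes(capacity, bits_per_entry=5):
    # number of bits in array = capacity * bits_per_entry + 7
    bits = capacity * bits_per_entry + 7
    # digest size = int(number of bits in array / 8)
    return bits // 8

def needs_realloc(old_capacity, new_capacity, threshold=0.10):
    # Assumption: the change is compared against the old capacity.
    return abs(new_capacity - old_capacity) >= threshold * old_capacity

print(digest_size_bytes(1228800))
print(needs_realloc(1228800, 1300000))
</VERB>
<P>
A capacity of 1228800 entries gives 768000 bytes, which matches the
sample statistics shown later in this section.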
<SECT1>What hash functions (and how many of them) does Squid use?
<P>
The protocol design allows for a variable number of hash functions (k).
However, Squid employs a very efficient method using a fixed number - four.
<P>
Rather than computing a number of independent hash functions over a URL
Squid uses a 128-bit MD5 hash of the key (actually a combination of the URL
and the HTTP retrieval method) and then splits this into four equal
chunks.
Each chunk, modulo the digest size (m), is used as the value for one of
the hash functions - i.e. an index into the bit array.
<P>
Note: As Squid retrieves objects and stores them in its cache on disk,
it adds them to the in-RAM index using a lookup key which is an MD5 hash
- the very one discussed above.
This means that the values for the Cache Digest hash functions are
already available and consequently the operations are extremely
efficient!
<P>
Obviously, modifying the code to support a variable number of hash functions
would prove a little more difficult and would most likely reduce efficiency.
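<P>
The splitting step can be sketched in Python. The details here are
assumptions made for the example: the exact way Squid combines the
method and the URL into the key, and the byte order used when reading
the four chunks, may well differ from what is shown.
<VERB>
import hashlib
import struct

def digest_indices(method, url, m):
    # Assumption: the key is the MD5 of the method name followed by
    # the URL.
    key = hashlib.md5(method.encode("ascii") + url.encode("ascii"))
    # Split the 16-byte hash into four 32-bit unsigned integers.
    # Assumption: network (big-endian) byte order.
    chunks = struct.unpack("!4I", key.digest())
    # Each chunk, modulo the number of bits, is one index.
    return [chunk % m for chunk in chunks]

print(digest_indices("GET", "http://www.example.com/", 6144000))
</VERB>
<P>
Since the MD5 value already exists as the store key, the only per-entry
work is four modulo operations.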
<SECT1>How are objects added to the Cache Digest in Squid?
<P>
Every object referenced in the index in RAM is checked to see if
it is suitable for addition to the digest.
A number of objects are not suitable, e.g. those that are private,
not cachable, negatively cached etc. and are skipped immediately.
<P>
A <EM>freshness</EM> test is next made in an attempt to guess if
the object will expire soon, since if it does, it is not worthwhile
adding it to the digest.
The object is checked against the refresh patterns for staleness...
<P>
Since Squid stores references to objects in its index using the MD5 key
discussed earlier there is no URL actually available for each object -
which means that the pattern used will fall back to the default pattern, ".".
This is an unfortunate state of affairs, but little can be done about
it.
A <EM>cd_refresh_pattern</EM> option will be added to the configuration
file soon which will at least make the confusion a little clearer :-)
<P>
Note that it is best to be conservative with your refresh pattern
for the Cache Digest, i.e.
do <EM>not</EM> add objects if they might become stale soon.
This will reduce the number of False Hits.
<SECT1>Does Squid support deletions in Cache Digests? What are diffs/deltas?
<P>
Squid does not support deletions from the digest.
Because of this the digest must, periodically, be rebuilt from scratch to
erase stale bits and prevent digest pollution.
<P>
A more sophisticated option is to use <EM>diffs</EM> or <EM>deltas</EM>.
These would be created by building a new digest and comparing with the
current/old one.
They would essentially consist of aggregated deletions and additions
since the <EM>previous</EM> digest.
<P>
Since less bandwidth should be required using these it would be possible
to have more frequent updates (and hence, more accurate information).
<P>
Costs:
<P>
<ITEMIZE>
<ITEM>RAM - extra RAM needed to hold two digests while comparisons take place.
<ITEM>CPU - probably a negligible amount.
</ITEMIZE>
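<P>
To make the idea of a diff concrete, here is a hypothetical Python
sketch. Squid does not implement this; the function names and the
representation (lists of 0/1 values and lists of bit positions) are
invented for the illustration.
<VERB>
def digest_diff(old_bits, new_bits):
    """Return (added, removed): the positions to set and to clear in
    order to turn the old digest into the new one."""
    if len(old_bits) != len(new_bits):
        raise ValueError("digests must have the same size")
    pairs = list(zip(old_bits, new_bits))
    added = [i for i, (o, n) in enumerate(pairs) if n and not o]
    removed = [i for i, (o, n) in enumerate(pairs) if o and not n]
    return added, removed

def apply_diff(bits, added, removed):
    result = list(bits)
    for i in added:
        result[i] = 1
    for i in removed:
        result[i] = 0
    return result

old = [1, 0, 1, 0]
new = [1, 1, 0, 0]
added, removed = digest_diff(old, new)
print(added, removed)                          # [1] [2]
print(apply_diff(old, added, removed) == new)  # True
</VERB>
<P>
A peer holding the old digest would only need the two position lists,
which is where the bandwidth saving would come from.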
<SECT1>When and how often is the local digest built?
<P>
The local digest is built:
<P>
<ITEMIZE>
<ITEM> when store_rebuild completes after startup
(the cache contents have been indexed in RAM), and
<ITEM> periodically thereafter. Currently, it is rebuilt every hour
(more data and experience is required before other periods, whether
fixed or dynamically varying, can "intelligently" be chosen).
The good thing is that the local cache decides on the expiry time and
peers must obey (see later).
</ITEMIZE>
<P>
While the [new] digest is being built in RAM the old version (stored
on disk) is still valid, and will be returned to any peer requesting it.
When the digest has completed building it is then swapped out to disk,
overwriting the old version.
<P>
The rebuild is CPU intensive, but not overly so.
Since Squid is programmed using an event-handling model, the approach
taken is to split the digest building task into chunks (i.e. chunks
of entries to add) and to register each chunk as an event.
If CPU load is overly high, it is possible to extend the build period
- as long as it is finished before the next rebuild is due!
<P>
It may prove more efficient to implement the digest building as a separate
process/thread in the future...
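<P>
The chunking approach can be illustrated with a Python generator. This
is a sketch of the general idea only, not of Squid's event code: the
function name, the chunk size and the use of a set as a stand-in for the
digest are assumptions.
<VERB>
def rebuild_steps(entries, add_entry, chunk_size):
    """Add entries chunk by chunk. Yields the number of entries done
    after each chunk so the caller can do other work in between."""
    done = 0
    while done < len(entries):
        for entry in entries[done:done + chunk_size]:
            add_entry(entry)
        done = min(done + chunk_size, len(entries))
        yield done

new_digest = set()              # stand-in for the digest being built
for done in rebuild_steps(list(range(250)), new_digest.add, 100):
    # In an event-driven server each pass would be a separate event,
    # with client requests handled between passes.
    print("entries processed:", done)
</VERB>
<P>
Making the chunks smaller, or spacing the events further apart, spreads
the same work over a longer build period.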
<SECT1>How are Cache Digests transferred between peers?
<P>
Cache Digests are fetched from peers using the standard HTTP protocol
(note that a <EM>pull</EM> rather than <EM>push</EM> technique is
used).
<P>
After the first access to a peer, a <EM>peerDigestValidate</EM> event
is queued
(this event decides if it is time to fetch a new version of a digest
from a peer).
The queuing delay depends on the number of peers already queued
for validation - so that all digests from different peers are not
fetched simultaneously.
<P>
A peer answering a request for its digest will specify an expiry
time for that digest by using the HTTP <EM>Expires</EM> header.
The requesting cache thus knows when it should request a fresh
copy of that peer's digest.
<P>
Note: requesting caches use an If-Modified-Since request in case the peer
has not rebuilt its digest for some reason since the last time it was
fetched.
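<P>
The two headers involved can be sketched in Python using only the
standard library. This illustrates the HTTP mechanics, not Squid's
internal logic; the function names and the sample dates are made up.
<VERB>
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def digest_expired(expires_header, now):
    """True once the time in the peer's Expires header has passed."""
    return now >= parsedate_to_datetime(expires_header)

def ims_header(last_fetch):
    """Header asking for the digest only if it changed since last_fetch."""
    return {"If-Modified-Since": format_datetime(last_fetch, usegmt=True)}

expires = "Thu, 01 Jan 1998 01:00:00 GMT"
early = datetime(1998, 1, 1, 0, 30, tzinfo=timezone.utc)
late = datetime(1998, 1, 1, 2, 0, tzinfo=timezone.utc)
print(digest_expired(expires, early))   # False: still valid
print(digest_expired(expires, late))    # True: time to fetch again
print(ims_header(datetime(1998, 1, 1, tzinfo=timezone.utc)))
</VERB>
<P>
If the peer has not rebuilt its digest, it can answer the conditional
request with a short "304 Not Modified" reply instead of sending the
whole digest again.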
<SECT1>How and where are Cache Digests stored?
<P>
<SECT2>Cache Digest built locally
<P>
Since the local digest is generated purely for the benefit of its neighbours
keeping it in RAM is not strictly required.
However, it was decided to keep the local digest in RAM partly because of
the following:
<P>
<ITEMIZE>
<ITEM> Approximately the same amount of memory will be (re-)allocated on every
rebuild of the digest,
<ITEM> the memory requirements are probably quite small (when compared to other
requirements of the cache server),
<ITEM> if ongoing updates of the digest are to be supported (e.g.
additions/deletions) it will be necessary to perform these operations
on a digest in RAM, and
<ITEM> if diffs/deltas are to be supported the "old" digest would have to
be swapped into RAM anyway for the comparisons.
</ITEMIZE>
<P>
When the digest is built in RAM, it is then swapped out to disk, where it is
stored as a "normal" cache item - which is how peers request it.
<SECT2>Cache Digest fetched from peer
<P>
When a query from a client arrives, <EM>fast lookups</EM> are
required to decide if a request should be made to a neighbour cache.
It is therefore required to keep all peer digests in RAM.
<P>
Peer digests are also stored on disk for the following reasons:
<P>
<ITEMIZE>
<ITEM><EM>Recovery</EM>
- If Squid is stopped and restarted, peer digests can be reused from the local
on-disk copy (they will soon be validated using an HTTP IMS request
to the appropriate peers as discussed earlier), and
<ITEM><EM>Sharing</EM>
- peer digests are stored as normal objects in the cache. This
allows them to be given to neighbour caches.
</ITEMIZE>
<SECT1>How are the Cache Digest statistics in the Cache Manager to be interpreted?
<P>
Cache Digest statistics can be seen from the Cache Manager or through the
<EM>client</EM> utility.
The following examples show how to use the <EM>client</EM> utility
to request the list of possible operations from the localhost, local
digest statistics from the localhost, refresh statistics from the
localhost and local digest statistics from another cache, respectively.
<P>
<VERB>
./client mgr:menu
./client mgr:store_digest
./client mgr:refresh
./client -h peer mgr:store_digest
</VERB>
<P>
The available statistics provide a lot of useful debugging information.
The refresh statistics include a section for Cache Digests which
explains why items were added (or not) to the digest.
<P>
The following example shows local digest statistics for a 16GB
cache in a corporate intranet environment
(may be a useful reference for the discussion below).
<P>
<VERB>
store digest: size: 768000 bytes
entries: count: 588327 capacity: 1228800 util: 48%
deletion attempts: 0
bits: per entry: 5 on: 1953311 capacity: 6144000 util: 32%
bit-seq: count: 2664350 avg.len: 2.31
added: 588327 rejected: 528703 ( 47.33 %) del-ed: 0
collisions: on add: 0.23 % on rej: 0.23 %
</VERB>
<P>
<EM>entries:capacity</EM> is a measure of how many items "are likely" to
be added to the digest.
It represents the number of items that were in the local cache at the
start of digest creation - however, upper and lower limits currently
apply.
This value is multiplied by <EM>bits: per entry</EM> (an arbitrary constant)
to give <EM>bits:capacity</EM>, which is the size of the cache digest in bits.
Dividing this by 8 will give <EM>store digest: size</EM> which is the
size in bytes.
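<P>
As a quick check of this arithmetic, the following sketch recomputes the
size figures from the sample output above. The function name is made up
for illustration; only the numbers come from the statistics shown.
<VERB>
```python
def digest_size(entries_capacity, bits_per_entry):
    """Return (bits_capacity, size_in_bytes) for a digest."""
    bits_capacity = entries_capacity * bits_per_entry
    size_bytes = bits_capacity // 8
    return bits_capacity, size_bytes


# Values taken from the sample statistics above.
bits_capacity, size_bytes = digest_size(1228800, 5)
print(bits_capacity)  # 6144000, matches "bits: capacity"
print(size_bytes)     # 768000, matches "store digest: size"
```
</VERB>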
<P>
The number of items represented in the digest is given by
<EM>entries:count</EM>.
This should be equal to <EM>added</EM> minus <EM>deletion attempts</EM>.
Since (currently) no modifications are made to the digest after the initial
build (no additions are made and deletions are not supported)
<EM>deletion attempts</EM> will always be 0 and <EM>entries:count</EM>
should simply be equal to <EM>added</EM>.
<P>
<EM>entries:util</EM> is not really a significant statistic.
At most it gives a measure of how many of the items in the store were
deemed suitable for entry into the digest compared to how many were
"prepared" for.
<P>
<EM>rejected</EM> shows how many objects were rejected.
Objects will not be added for a number of reasons, the most common being
refresh pattern settings.
Remember that (currently) the default refresh pattern will be used for
checking for entry here and also note that changing this pattern can
significantly affect the number of items added to the digest!
Too relaxed and False Hits increase, too strict and False Misses increase.
Remember also that at time of validation (on the peer) the "real" refresh
pattern will be used - so it is wise to keep the default refresh pattern
conservative.
<P>
<EM>bits: on</EM> indicates the number of bits in the digest that are set
to 1.
<EM>bits: util</EM> gives this figure as a percentage of the total number
of bits in the digest.
As we saw earlier, a figure of 50% represents the optimal trade-off.
Values too high (say > 75%) would cause a larger number of collisions,
and hence False Hits,
while lower values mean the digest is under-utilised (using unnecessary RAM).
Note that low values are normal for caches that are starting to fill up.
<P>
A bit sequence is an uninterrupted sequence of bits with the same value.
<EM>bit-seq: avg.len</EM> gives some insight into the quality of the hash
functions.
Large values indicate a problem, even if <EM>bits:util</EM> is 50%
(> 3 = suspicious, > 10 = very suspicious).
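<P>
To make the definition concrete, here is a small sketch that counts bit
sequences and their average length for a list of 0/1 values. It is an
illustration only (the function name is an assumption, and this is not how
Squid computes the figure internally). Note that the average length is
simply the total number of bits divided by the number of sequences, which
agrees with the sample output: 6144000 / 2664350 is about 2.31.
<VERB>
```python
from itertools import groupby


def bit_sequence_stats(bits):
    """Return (sequence_count, average_length) for an iterable of 0/1 values."""
    run_lengths = [sum(1 for _ in run) for _, run in groupby(bits)]
    count = len(run_lengths)
    average = sum(run_lengths) / count if count else 0.0
    return count, average


# Four sequences: 00, 1, 000, 11
count, average = bit_sequence_stats([0, 0, 1, 0, 0, 0, 1, 1])
print(count, average)  # 4 2.0
```
</VERB>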
<SECT1>What are False Hits and how should they be handled?
<P>
A False Hit occurs when a cache believes a peer has an object
and asks the peer for it <EM>but</EM> the peer is not able to
satisfy the request.
<P>
Expiring or stale objects on the peer are frequent causes of False
Hits.
At the time of the query actual refresh patterns are used on the
peer and stale entries are marked for revalidation.
However, revalidation is prohibited unless the peer is behaving
as a parent, or <EM>miss_access</EM> is enabled.
Thus, clients can receive error messages instead of revalidated
objects!
<P>
The frequency of False Hits can be reduced but never eliminated
completely, therefore there must be a robust way of handling them
when they occur.
The philosophy behind the design of Squid is to use lightweight
techniques and optimise for the common case and robustly handle the
unusual case (False Hits).
<P>
Squid will soon support the HTTP <EM>only-if-cached</EM> directive of
the Cache-Control header.
Requests for objects made to a peer will carry this directive and, if
the objects are not available, the peer can reply appropriately,
allowing Squid to recognise the situation.
The following describes what Squid is aiming towards:
<P>
<ITEMIZE>
<ITEM>Cache Digests used to obtain good estimates of where a
requested object is located in a Cache Hierarchy.
<ITEM>Persistent HTTP Connections between peers.
There will be no TCP startup overhead and both latency and
network load will be similar to ICP (i.e. fast).
<ITEM>HTTP False Hit Recognition using the <EM>only-if-cached</EM>
HTTP header - allowing fall back to another peer or, if no other
peers are available with the object, then going direct (or
<EM>through</EM> a parent if behind a firewall).
</ITEMIZE>
<SECT1>How can Cache Digest related activity be traced/debugged?
<P>
<SECT2>Enabling Cache Digests
<P>
If you wish to use Cache Digests (available in Squid version 2) you need to
add a <EM>configure</EM> option, so that the relevant code is compiled in:
<P>
<VERB>
./configure --enable-cache-digests ...
</VERB>
<SECT2>What do the access.log entries look like?
<P>
If a request is forwarded to a neighbour due to a HIT in that neighbour's
Cache Digest the hierarchy (9th) field of the access.log file for
the <EM>local cache</EM> will look like <EM>CACHE_DIGEST_HIT/neighbour</EM>.
The Log Tag (4th field) should obviously show a MISS.
<P>
On the peer cache the request should appear as a normal HTTP request
from the first cache.
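<P>
A minimal sketch of picking such entries out of access.log is shown below.
It assumes the native log format with whitespace-separated fields; the
function name and the sample log lines are made up for illustration.
<VERB>
```python
def digest_hits(log_lines):
    """Yield (url, peer) for requests forwarded because of a digest hit."""
    for line in log_lines:
        fields = line.split()
        if len(fields) >= 9:
            # 9th field is "hierarchy/peer", 7th field is the URL.
            hierarchy, _, peer = fields[8].partition("/")
            if hierarchy == "CACHE_DIGEST_HIT":
                yield fields[6], peer


# Hypothetical log lines, for illustration only.
sample = [
    "994932000.123 250 10.0.0.5 TCP_MISS/200 4512 GET "
    "http://www.example.com/ - CACHE_DIGEST_HIT/peer.example.net text/html",
    "994932001.456 900 10.0.0.6 TCP_MISS/200 1024 GET "
    "http://www.example.org/ - DIRECT/www.example.org text/html",
]
print(list(digest_hits(sample)))
# [('http://www.example.com/', 'peer.example.net')]
```
</VERB>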
<SECT2>What does a False Hit look like?
<P>
The easiest situation to analyse is when two caches (say A and B) are
involved neither of which uses the other as a parent.
In this case, a False Hit would show up as a CACHE_DIGEST_HIT on A and
<EM>NOT</EM> as a TCP_HIT on B (or vice versa).
If B does not fetch the object for A then the hierarchy field will
look like <EM>NONE/-</EM> (and A will have received an Access Denied
or Forbidden message).
This will happen if the object is not "available" on B and B does not
have <EM>miss_access</EM> enabled for A (or is not acting as a parent
for A).
<SECT2>How is the cause of a False Hit determined?
<P>
Assume A requests a URL from B and receives a False Hit:
<ITEMIZE>
<ITEM> Using the <EM>client</EM> utility <EM>PURGE</EM> the URL from A, e.g.
<P>
<VERB>
./client -m PURGE 'URL'
</VERB>
<ITEM> Using the <EM>client</EM> utility request the object from A, e.g.
<P>
<VERB>
./client 'URL'
</VERB>
</ITEMIZE>
<P>
The HTTP headers of the request are available.
Two header types are of particular interest:
<P>
<ITEMIZE>
<ITEM> <EM>X-Cache</EM> - this shows whether an object is available or not.
<ITEM> <EM>X-Cache-Lookup</EM> - this keeps the result of a store table lookup
<EM>before</EM> refresh causing rules are checked (i.e. it indicates if the
object is available before any validation would be attempted).
</ITEMIZE>
<P>
The X-Cache and X-Cache-Lookup headers from A should both show MISS.
<P>
If A requests the object from B (which it will if the digest lookup indicates
B has it - assuming B is the closest peer of course :-) then there will be another
set of these headers from B.
<P>
If the X-Cache header from B shows a MISS, a False Hit has occurred.
This means that A thought B had an object but B tells A it does not
have it available for retrieval.
The reason why it is not available for retrieval is indicated by the
X-Cache-Lookup header. If:
<P>
<ITEMIZE>
<ITEM>
<EM>X-Cache-Lookup = MISS</EM> then either A's (version of
B's) digest is out-of-date or corrupt OR a collision occurred
in the digest (very small probability) OR B recently purged
the object.
<ITEM>
<EM>X-Cache-Lookup = HIT</EM> then B had the object, but
refresh rules (or A's max-age requirements) prevent A from
getting a HIT (validation failed).
</ITEMIZE>
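<P>
The decision procedure above can be summarised in a few lines of code.
This is only a sketch: the function name and return strings are made up,
and it assumes each header value begins with HIT or MISS (followed, for
example, by the name of the cache that produced it).
<VERB>
```python
def diagnose_peer_reply(x_cache, x_cache_lookup):
    """Classify peer B's reply from its X-Cache and X-Cache-Lookup values."""
    cache_hit = x_cache.strip().upper().startswith("HIT")
    lookup_hit = x_cache_lookup.strip().upper().startswith("HIT")
    if cache_hit:
        return "true hit"
    if lookup_hit:
        return "false hit: B had the object but validation failed"
    return "false hit: stale or corrupt digest, collision, or recent purge"


# Hypothetical header values from peer B.
print(diagnose_peer_reply("MISS from b.example.net",
                          "HIT from b.example.net:3128"))
```
</VERB>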
<SECT2>Use The Source
<P>
If there is something else you need to check you can always look at the
source code.
The main Cache Digest functionality is organised as follows:
<P>
<ITEMIZE>
<ITEM> <EM>CacheDigest.c (debug section 70)</EM> Generic Cache Digest routines
<ITEM> <EM>store_digest.c (debug section 71)</EM> Local Cache Digest routines
<ITEM> <EM>peer_digest.c (debug section 72)</EM> Peer Cache Digest routines
</ITEMIZE>
<P>
Note that in the source the term <EM>Store Digest</EM> refers to the digest
created locally.
The Cache Digest code is fairly self-explanatory (once you understand how Cache
Digests work).
<SECT1>What about ICP?
<P>
COMING SOON!
<SECT1>Is there a Cache Digest Specification?
<P>
There is now, thanks to
<URL url="mailto:martin@net.lut.ac.uk" name="Martin Hamilton"> and
<URL url="mailto:rousskov@ircache.net" name="Alex Rousskov">.
<P>
Cache Digests, as implemented in Squid 2.1.PATCH2, are described in
<URL url="/CacheDigest/cache-digest-v5.txt" name="cache-digest-v5.txt">.
You'll notice the format is similar to an Internet Draft.
We decided not to submit this document as a draft because Cache Digests
will likely undergo some important changes before we want to try to make
it a standard.
<sect1>Would it be possible to stagger the timings when cache_digests are retrieved from peers?
<p>
<em>Note: The information here is current for version 2.2.</em>
<p>
Squid already has code to spread the digest updates. The algorithm is
currently controlled by a few hard-coded constants in <em/peer_digest.c/. For
example, the <em/GlobDigestReqMinGap/ variable determines the minimum interval
between two requests for a digest. You may want to try to increase the
value of GlobDigestReqMinGap from 60 seconds to whatever you feel
comfortable with (but it should be smaller than hour/number_of_peers, of
course).
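<p>
As a rough illustration of that upper bound, the sketch below computes
hour/number_of_peers and checks a candidate gap against it. The function
names are made up for this example; they do not appear in the Squid source.
<verb>
```python
def max_digest_request_gap(number_of_peers, period_seconds=3600):
    """Upper bound (in seconds) for the gap between two digest requests."""
    return period_seconds / number_of_peers


def gap_is_safe(gap_seconds, number_of_peers):
    """True if every peer digest can still be requested within the hour."""
    return max_digest_request_gap(number_of_peers) > gap_seconds


print(max_digest_request_gap(20))  # 180.0
print(gap_is_safe(60, 20))         # True: the default gap fits
print(gap_is_safe(300, 20))        # False: too large for 20 peers
```
</verb>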
<p>
Note that whatever you do, you still need to give Squid enough time and
bandwidth to fetch all the digests. Depending on your environment, that
bandwidth may be more or less than an ICP would require. Upcoming digest
deltas (x10 smaller than the digests themselves) may be the only way to
solve the ``big scale'' problem.
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>Transparent Caching/Proxying
<label id="trans-caching">
<P>
<it>
How can I make my users' browsers use my cache without configuring
the browsers for proxying?
</it>
First, it is <em/critical/ to read the full comments in the squid.conf
file! That is the only authoritative source for configuration
information. However, the following instructions are correct as of
this writing (July 1999.)
<P>
Getting transparent caching to work requires four distinct steps:
<enum>
<item>
<bf/Compile and run a version of Squid which accepts
connections for other addresses/. For some operating systems,
you need to have configured and built a version of Squid which
can recognize the hijacked connections and discern the
destination addresses. For Linux this seems to work
automatically. For *BSD-based systems, you probably have to
configure squid with the <em/--enable-ipf-transparent/ option.
(Do a <em/make clean/ if you previously configured without that
option, or the correct settings may not be present.)
<item>
<bf/Configure Squid to accept and process the connections/.
You have to change the Squid configuration settings to
recognize the hijacked connections and discern the destination
addresses. Here are the important settings in <em/squid.conf/:
<verb>
http_port 8080
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
</verb>
<item>
<bf/Get your cache server to accept the packets/. You have to
configure your cache host to accept the redirected packets -
any IP address, on port 80 - and deliver them to your cache
application. This is typically done with IP
filtering/forwarding features built into the kernel.
On Linux this is called <em/iptables/ (kernel 2.4.x),
<em/ipchains/ (2.2.x) or <em/ipfwadm/ (2.0.x).
On FreeBSD and other
*BSD systems they call it <em/ip filter/ or <em/ipnat/; on many
systems, it may require rebuilding the kernel or adding a new
loadable kernel module.
<item>
<bf/Get the packets to your cache server/. There are several
ways to do this. First, if your proxy machine is already in
the path of the packets (i.e. it is routing between your proxy
users and the Internet) then you don't have to worry about this
step. This would be true if you install Squid on a firewall
machine, or on a UNIX-based router. If the cache is not in the
natural path of the connections, then you have to divert the
packets from the normal path to your cache host using a router
or switch. You may be able to do this with a Cisco router using
their "route maps" feature, depending on your IOS version. You
might also use a so-called layer-4 switch, such as the Alteon
ACE-director or the Foundry Networks ServerIron. Finally, you
might be able to use a stand-alone router/load-balancer type
product, or routing capabilities of an access server.
</enum>
<bf/Notes/:
<itemize>
<item>The <em/http_port 8080/ in this example assumes you will redirect
incoming port 80 packets to port 8080 on your cache machine. If you
are running Squid on port 3128 (for example) you can leave it there via
<em/http_port 3128/, and redirect to that port via your IP filtering or
forwarding commands.
<item>In the <em/httpd_accel_host/ option, <em/virtual/ is the magic word!
<item>The <em/httpd_accel_with_proxy on/ is required to enable transparent
proxy mode; essentially in transparent proxy mode Squid thinks it is acting
both as an accelerator (hence accepting packets for other IPs on port 80) and
a caching proxy (hence serving files out of cache.)
<item> You <bf/must/ use <em/httpd_accel_uses_host_header on/ to get
the cache to work properly in transparent mode. This enables the cache
to index its stored objects under the true hostname, as is done in a
normal proxy, rather than under the IP address. This is especially
important if you want to use a parent cache hierarchy, or to share
cache data between transparent proxy users and non-transparent proxy
users, which you can do with Squid in this configuration.
</itemize>
<sect1>Transparent caching for Solaris, SunOS, and BSD systems
<sect2>Install IP Filter
<P>
First, get and install the
<url url="ftp://coombs.anu.edu.au/pub/net/ip-filter/"
name="IP Filter package">.
<sect2>Configure ipnat
<P>
Put these lines in <em>/etc/ipnat.rules</em>:
<verb>
# Redirect direct web traffic to local web server.
rdr de0 1.2.3.4/32 port 80 -> 1.2.3.4 port 80 tcp
# Redirect everything else to squid on port 8080
rdr de0 0.0.0.0/0 port 80 -> 1.2.3.4 port 8080 tcp
</verb>
<P>
Modify your startup scripts to enable ipnat. For example, on FreeBSD it
looks something like this:
<verb>
/sbin/modload /lkm/if_ipl.o
/sbin/ipnat -f /etc/ipnat.rules
chgrp nobody /dev/ipnat
chmod 644 /dev/ipnat
</verb>
<sect2>Configure Squid
<sect3>Squid-2
<P>
Squid-2 (after version beta25) has IP filter support built in.
Simply enable it when you run <em/configure/:
<verb>
./configure --enable-ipf-transparent
</verb>
Add these lines to your <em/squid.conf/ file:
<verb>
http_port 8080
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
</verb>
Note, you don't have to use port 8080, but it must match whatever you
used in the <em>/etc/ipnat.rules</em> file.
<sect3>Squid-1.1
<P>
Patches for Squid-1.X are available from
<url url="http://www.fan.net.au/~q/squid/" name="Quinton Dolan's Squid page">.
Add these lines to <em/squid.conf/:
<verb>
http_port 8080
httpd_accel virtual 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
</verb>
<P>
Thanks to <url url="mailto:q@fan.net.au" name="Quinton Dolan">.
<sect1>Transparent caching with Linux
<label id="trans-linux-1">
<P>
by <url url="mailto:Rodney.van.den.Oever@tip.nl" name="Rodney van den Oever">
<P><bf/Note:/ Transparent proxying does NOT work with Linux&nbsp;2.0.30!
Linux&nbsp;2.0.29 is known to work well. If you're using a more recent
kernel, like 2.2.X, then you should probably use an ipchains configuration,
<ref id="trans-linux-2" name="as described below">.
<P>
<bf/Warning:/ this technique has some shortcomings.
<enum>
<item><bf/This method only supports the HTTP protocol, not gopher or FTP/
<item>Since the browser wasn't set up to use a proxy server, it uses
the FTP protocol (with destination port 21) and not the required
HTTP protocol. You can't set up a redirection-rule to the proxy
server since the browser is speaking the wrong protocol. A similar
problem occurs with gopher. Normally all proxy requests are
translated by the client into the HTTP protocol, but since the
client isn't aware of the redirection, this never happens.
</enum>
<P>
If you can live with the side-effects, go ahead and compile your
kernel with firewalling and redirection support. Here are the
important parameters from <em>/usr/src/linux/.config</em>:
<verb>
#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
#
# Networking options
#
CONFIG_FIREWALL=y
# CONFIG_NET_ALIAS is not set
CONFIG_INET=y
CONFIG_IP_FORWARD=y
# CONFIG_IP_MULTICAST is not set
CONFIG_IP_FIREWALL=y
# CONFIG_IP_FIREWALL_VERBOSE is not set
CONFIG_IP_MASQUERADE=y
CONFIG_IP_TRANSPARENT_PROXY=y
CONFIG_IP_ALWAYS_DEFRAG=y
# CONFIG_IP_ACCT is not set
CONFIG_IP_ROUTER=y
</verb>
<P>
You may also need to enable <bf/IP Forwarding/. One way to do it
is to add this line to your startup scripts:
<verb>
echo 1 > /proc/sys/net/ipv4/ip_forward
</verb>
<P>
Go to the
<url url="http://www.xos.nl/linux/ipfwadm/"
name="Linux IP Firewall and Accounting"> page,
obtain the source distribution to <em/ipfwadm/ and install it.
Older versions of <em/ipfwadm/ may not work. You might need
at least version <bf/2.3.0/.
You'll use <em/ipfwadm/ to set up the redirection rules. I
added this rule to the script that runs from <em>/etc/rc.d/rc.inet1</em>
(Slackware) which sets up the interfaces at boot-time. The redirection
should be done before any other Input-accept rule. To really make
sure it worked I disabled the forwarding (masquerading) I normally
do.
<P>
<em>/etc/rc.d/rc.firewall</em>:
<verb>
#!/bin/sh
# rc.firewall Linux kernel firewalling rules
FW=/sbin/ipfwadm
# Flush rules, for testing purposes
for i in I O F # A # If we enabled accounting too
do
${FW} -$i -f
done
# Default policies:
${FW} -I -p rej # Incoming policy: reject (quick error)
${FW} -O -p acc # Output policy: accept
${FW} -F -p den # Forwarding policy: deny
# Input Rules:
# Loopback-interface (local access, eg, to local nameserver):
${FW} -I -a acc -S localhost/32 -D localhost/32
# Local Ethernet-interface:
# Redirect to Squid proxy server:
${FW} -I -a acc -P tcp -D default/0 80 -r 8080
# Accept packets from local network:
${FW} -I -a acc -P all -S localnet/8 -D default/0 -W eth0
# Only required for other types of traffic (FTP, Telnet):
# Forward localnet with masquerading (udp and tcp, no icmp!):
${FW} -F -a m -P tcp -S localnet/8 -D default/0
${FW} -F -a m -P udp -S localnet/8 -D default/0
</verb>
<P>
Here all port 80 traffic from the local LAN, with any destination, gets
redirected to the local port 8080. Rules can be viewed like this:
<verb>
IP firewall input rules, default policy: reject
type prot source destination ports
acc all 127.0.0.1 127.0.0.1 n/a
acc/r tcp 10.0.0.0/8 0.0.0.0/0 * -> 80 => 8080
acc all 10.0.0.0/8 0.0.0.0/0 n/a
acc tcp 0.0.0.0/0 0.0.0.0/0 * -> *
</verb>
<P>
I did some testing on Windows 95 with both Microsoft Internet
Explorer 3.01 and Netscape Communicator pre-release and it worked
with both browsers with the proxy-settings disabled.
<P>
At one time <em/squid/ seemed to get in a loop when I pointed the
browser to the local port 80. But this could be avoided by adding a
rule that rejects client connections to this address:
<verb>
${FW} -I -a rej -P tcp -S localnet/8 -D hostname/32 80
IP firewall input rules, default policy: reject
type prot source destination ports
acc all 127.0.0.1 127.0.0.1 n/a
rej tcp 10.0.0.0/8 10.0.0.1 * -> 80
acc/r tcp 10.0.0.0/8 0.0.0.0/0 * -> 80 => 8080
acc all 10.0.0.0/8 0.0.0.0/0 n/a
acc tcp 0.0.0.0/0 0.0.0.0/0 * -> *
</verb>
<P>
<em/NOTE on resolving names/: Instead of just
passing the URLs to the proxy server, the browser itself has to
resolve the hostnames in the URLs. Make sure the workstations are set up
to query a local nameserver, to minimize outgoing traffic.
<P>
If you're already running a nameserver at the firewall or proxy server
(which is a good idea anyway IMHO) let the workstations use this
nameserver.
<P>
Additional notes from
<url url="mailto:RichardA@noho.co.uk"
name="Richard Ayres">
<quote>
<P>
I'm using such a setup. The only issues so far have been that:
<enum>
<item>
It's fairly useless to use my service providers parent caches
(cache-?.www.demon.net) because by proxying squid only sees IP addresses,
not host names and demon aren't generally asked for IP addresses by other
users;
<item>
Linux kernel 2.0.30 is a no-no as transparent proxying is broken (I use
2.0.29);
<item>
Client browsers must do host name lookups themselves, as they don't know
they're using a proxy;
<item>
The Microsoft Network won't authorize its users through a proxy, so I
have to specifically *not* redirect those packets (my company is a MSN
content provider).
</enum>
Aside from this, I get a 30-40% hit rate on a 50MB cache for 30-40 users and
am quite pleased with the results.
</quote>
<P>
See also <url url="http://www.unxsoft.com/transproxy.html"
name="Daniel Kiracofe's page">.
<sect1>Transparent caching with Cisco routers
<P>
by <url url="mailto:John.Saunders@scitec.com.au" name="John Saunders">
<P>
This works with at least IOS 11.1 and later, I guess. Possibly earlier;
as I'm no Cisco expert I can't say for sure. If your router is doing
anything more complicated than shuffling packets between an ethernet
interface and either a serial port or BRI port, then you should work
through whether this will work for you.
<P>
First define a route map with a name of proxy-redirect (name doesn't
matter) and specify the next hop to be the machine Squid runs on.
<verb>
!
route-map proxy-redirect permit 10
match ip address 110
set ip next-hop 203.24.133.2
!
</verb>
Define an access list to trap HTTP requests. The second line allows
the Squid host direct access so a routing loop is not formed.
By carefully writing your access list as shown below, common
cases are found quickly and this can greatly reduce the load on your
router's processor.
<verb>
!
access-list 110 deny tcp any any neq www
access-list 110 deny tcp host 203.24.133.2 any
access-list 110 permit tcp any any
!
</verb>
Apply the route map to the ethernet interface.
<verb>
!
interface Ethernet0
ip policy route-map proxy-redirect
!
</verb>
<sect2>possible bugs
<P>
<url url="mailto:morgan@curtin.net" name="Bruce Morgan"> notes that
there is a Cisco bug relating to transparent proxying using IP
policy route maps, that causes NFS and other applications to break.
Apparently there are two bug reports raised in Cisco, but they are
not available for public dissemination.
<P>
The problem occurs with o/s packets with more than 1472 data bytes. If you try
to ping a host with more than 1472 data bytes across a Cisco interface with the
access-lists and ip policy route map, the icmp request will fail. The
packet will be fragmented, and the first fragment is checked against the
access-list and rejected - it goes the "normal path" as it is an icmp
packet - however when the second fragment is checked against the
access-list it is accepted (it isn't regarded as an icmp packet), and
goes to the action determined by the policy route map!
<P>
<url url="mailto:John.Saunders@scitec.com.au" name="John"> notes that you
may be able to get around this bug by carefully writing your access lists.
If the last/default rule is to permit, then this bug
is a problem, but if the last/default rule is to deny, then
it is not a problem. I guess fragments, other than the first,
don't have the information available to properly policy route them.
Normally TCP packets should not be fragmented, at least my network
runs an MTU of 1500 everywhere to avoid fragmentation. So this would
affect UDP and ICMP traffic only.
<P>
Basically, you will have to choose between better performance with the
bug, and worse performance that works for all protocols. This set has
better performance, but suffers from the bug:
<verb>
access-list 110 deny tcp any any neq www
access-list 110 deny tcp host 10.1.2.3 any
access-list 110 permit tcp any any
</verb>
Conversely, this set has worse performance, but works for all protocols:
<verb>
access-list 110 deny tcp host 10.1.2.3 any
access-list 110 permit tcp any any eq www
access-list 110 deny tcp any any
</verb>
<sect1>Transparent caching with LINUX 2.0.29 and CISCO IOS 11.1
<P>
Just for kicks, here's an email message posted to squid-users
on how to make transparent proxying work with a Cisco router
and Squid running on Linux.
<P>
by <url url="mailto:signal@shreve.net" name="Brian Feeny">
<P>
Here is how I have Transparent proxying working for me, in an environment
where my router is a Cisco 2501 running IOS 11.1, and Squid machine is
running Linux 2.0.33.
<P>
Many thanks to the following individuals and the squid-users list for
helping me get redirection and transparent proxying working on my
Cisco/Linux box.
<itemize>
<item>Lincoln Dale
<item>Riccardo Vratogna
<item>Mark White
<item>Henrik Nordstrom
</itemize>
<P>
First, here is what I added to my Cisco, which is running IOS 11.1. In
IOS 11.1 the route-map command is "process switched" as opposed to the
faster "fast-switched" route-map which is found in IOS 11.2 and later.
You may wish to be running IOS 11.2. I am running 11.1, and have had no
problems with my current load of about 150 simultaneous connections to
squid:
<verb>
!
interface Ethernet0
description To Office Ethernet
ip address 208.206.76.1 255.255.255.0
no ip directed-broadcast
no ip mroute-cache
ip policy route-map proxy-redir
!
access-list 110 deny tcp host 208.206.76.44 any eq www
access-list 110 permit tcp any any eq www
route-map proxy-redir permit 10
match ip address 110
set ip next-hop 208.206.76.44
</verb>
<P>
So basically from above you can see I added the "route-map" declaration,
and an access-list, and then turned the route-map on under int e0 "ip
policy route-map proxy-redir"
<P>
ok, so the Cisco is taken care of at this point. The host above:
208.206.76.44, is the ip number of my squid host.
<P>
My squid box runs Linux, so I had to do the following on it:
<P>
my kernel (2.0.33) config looks like this:
<verb>
#
# Networking options
#
CONFIG_FIREWALL=y
# CONFIG_NET_ALIAS is not set
CONFIG_INET=y
CONFIG_IP_FORWARD=y
CONFIG_IP_MULTICAST=y
CONFIG_SYN_COOKIES=y
# CONFIG_RST_COOKIES is not set
CONFIG_IP_FIREWALL=y
# CONFIG_IP_FIREWALL_VERBOSE is not set
CONFIG_IP_MASQUERADE=y
# CONFIG_IP_MASQUERADE_IPAUTOFW is not set
CONFIG_IP_MASQUERADE_ICMP=y
CONFIG_IP_TRANSPARENT_PROXY=y
CONFIG_IP_ALWAYS_DEFRAG=y
# CONFIG_IP_ACCT is not set
CONFIG_IP_ROUTER=y
</verb>
<P>
You will need Firewalling and Transparent Proxy turned on at a minimum.
<P>
Then some ipfwadm stuff:
<verb>
# Accept all on loopback
ipfwadm -I -a accept -W lo
# Accept my own IP, to prevent loops (repeat for each interface/alias)
ipfwadm -I -a accept -P tcp -D 208.206.76.44 80
# Send all traffic destined to port 80 to Squid on port 3128
ipfwadm -I -a accept -P tcp -D 0/0 80 -r 3128
</verb>
<P>
it accepts packets on port 80 (redirected from the Cisco), and redirects
them to 3128 which is the port my squid process is sitting on. I put all
this in /etc/rc.d/rc.local
<P>
I am using
<url url="/Versions/1.1/1.1.20/" name="v1.1.20 of Squid"> with
<url url="http://hem.passagen.se/hno/squid/squid-1.1.20.host&lowbar;and&lowbar;virtual.patch"
name="Henrik's patch">
installed. You will want to install this patch if using a setup similar
to mine.
<sect1>The cache is trying to connect to itself...
<P>
by <url url="mailto:hno@hem.passagen.se" name="Henrik Nordstrom">
<P>
I think almost everyone who has tried to build a transparent proxy
setup has been bitten by this one.
<P>
Measures you can take:
<itemize>
<item>
Deny Squid from fetching objects from itself (using ACL lists).
<item>
Apply a small patch that prevents Squid from looping infinitely
(available from <url url="http://hem.passagen.se/hno/squid/" name="Henrik's Squid Patches">)
<item>
Don't run Squid on port 80, and redirect port 80 not destined for
the local machine to Squid (redirection == ipfilter/ipfw/ipfwadm). This
avoids the most common loops.
<item>
If you are using ipfilter then you should also use transproxyd in
front of Squid. Squid does not yet know how to interface to ipfilter
(patches are welcome: squid-bugs@ircache.net).
</itemize>
<sect1>Transparent caching with FreeBSD
<label id="trans-freebsd">
<P>
by Duane Wessels
<P>
I set out yesterday to make transparent caching work with Squid and
FreeBSD. It was, uh, fun.
<P>
It was relatively easy to configure a cisco to divert port 80
packets to my FreeBSD box. Configuration goes something like this:
<verb>
access-list 110 deny tcp host 10.0.3.22 any eq www
access-list 110 permit tcp any any eq www
route-map proxy-redirect permit 10
match ip address 110
set ip next-hop 10.0.3.22
int eth2/0
ip policy route-map proxy-redirect
</verb>
Here, 10.0.3.22 is the IP address of the FreeBSD cache machine.
<P>
Once I have packets going to the FreeBSD box, I need to get the
kernel to deliver them to Squid.
I started on FreeBSD-2.2.7, and then downloaded
<url url="ftp://coombs.anu.edu.au/pub/net/ip-filter/"
name="IPFilter">. This was a dead end for me. The IPFilter distribution
includes patches to the FreeBSD kernel sources, but many of these had
conflicts. Then I noticed that the IPFilter page says
``It comes as a part of [FreeBSD-2.2 and later].'' Fair enough. Unfortunately,
you can't hijack connections with the FreeBSD-2.2.X IPFIREWALL code (<em/ipfw/), and
you can't (or at least I couldn't) do it with <em/natd/ either.
<P>
FreeBSD-3.0 has much better support for connection hijacking, so I suggest
you start with that. You need to build a kernel with the following options:
<verb>
options IPFIREWALL
options IPFIREWALL_FORWARD
</verb>
<P>
Next, it's time to configure the IP firewall rules with <em/ipfw/.
By default, there are no "allow" rules and all packets are denied.
I added this command to <em>/etc/rc.local</em>
just to be able to use the machine on my network:
<verb>
ipfw add 60000 allow all from any to any
</verb>
But we're still not hijacking connections. To accomplish that,
add these rules:
<verb>
ipfw add 49 allow tcp from 10.0.3.22 to any
ipfw add 50 fwd 127.0.0.1 tcp from any to any 80
</verb>
The second line (rule 50) is the one which hijacks the connection.
The first line makes sure we never hit rule 50 for traffic originated
by the local machine. This prevents forwarding loops.
<P>
Note that I am not changing the port number here. That is,
port 80 packets are simply diverted to Squid on port 80.
My Squid configuration is:
<verb>
http_port 80
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
</verb>
<P>
If you don't want Squid to listen on port 80 (because that
requires root privileges) then you can use another port.
In that case your ipfw redirect rule looks like:
<verb>
ipfw add 50 fwd 127.0.0.1,3128 tcp from any to any 80
</verb>
and the <em/squid.conf/ lines are:
<verb>
http_port 3128
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
</verb>
<sect1>Transparent caching with Linux and ipchains
<label id="trans-linux-2">
<P>
by <url url="mailto:Support@dnet.co.uk" name="Martin Lyons">
<P>
You need to configure your kernel for ipchains.
Configuring Linux kernels is beyond the scope of
this FAQ. One way to do it is:
<verb>
# cd /usr/src/linux
# make menuconfig
</verb>
<p>
The following shows important kernel features to include:
<verb>
[*] Network firewalls
[ ] Socket Filtering
[*] Unix domain sockets
[*] TCP/IP networking
[ ] IP: multicasting
[ ] IP: advanced router
[ ] IP: kernel level autoconfiguration
[*] IP: firewalling
[ ] IP: firewall packet netlink device
[*] IP: always defragment (required for masquerading)
[*] IP: transparent proxy support
</verb>
<P>
You must include the <em>IP: always defragment</em> option; without it
you cannot use the REDIRECT target.
<P>
You can use this script as a template for your own <em/rc.firewall/
to configure ipchains:
<verb>
#!/bin/sh
# rc.firewall Linux kernel firewalling rules
# Leon Brooks (leon at brooks dot fdns dot net)
FW=/sbin/ipchains
ADD="$FW -A"
# Flush rules, for testing purposes (ipchains uses the full chain names)
for i in input output forward
do
${FW} -F $i
done
# Default policies:
${FW} -P input REJECT # Incoming policy: reject (quick error)
${FW} -P output ACCEPT # Output policy: accept
${FW} -P forward DENY # Forwarding policy: deny
# Input Rules:
# Loopback-interface (local access, eg, to local nameserver):
${ADD} input -j ACCEPT -s localhost/32 -d localhost/32
# Local Ethernet-interface:
# Redirect to Squid proxy server:
${ADD} input -p tcp -d 0/0 80 -j REDIRECT 8080
# Accept packets from local network:
${ADD} input -j ACCEPT -s localnet/8 -d 0/0 -i eth0
# Only required for other types of traffic (FTP, Telnet):
# Forward localnet with masquerading (udp and tcp, no icmp!):
${ADD} forward -j MASQ -p tcp -s localnet/8 -d 0/0
${ADD} forward -j MASQ -p udp -s localnet/8 -d 0/0
</verb>
<P>
Also, <url url="mailto:andrew@careless.net" name="Andrew Shipton">
notes that with 2.0.x kernels you don't need to enable packet forwarding,
but with the 2.1.x and 2.2.x kernels using ipchains you do. Packet
forwarding is enabled with the following command:
<verb>
echo 1 > /proc/sys/net/ipv4/ip_forward
</verb>
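<P>
If you are not sure whether forwarding is already switched on, a small
check like the following could go near the top of <em/rc.firewall/.
This is only a sketch: the function name <em/forwarding_status/ and its
optional file argument are made up for this example; only the
<em>/proc</em> path comes from the command above.
<verb>
#!/bin/sh
# forwarding_status [FILE]
# Print "enabled" if FILE contains 1, otherwise "disabled".
# FILE defaults to the kernel's ip_forward switch; the argument exists
# only so the function can be tried out on an ordinary file.
forwarding_status() {
    file=${1:-/proc/sys/net/ipv4/ip_forward}
    if [ "$(cat "$file" 2>/dev/null)" = "1" ]
    then
        echo enabled
    else
        echo disabled
    fi
}
</verb>
Called with no argument, <em/forwarding_status/ reports the live kernel
setting. A missing or unreadable file is reported as disabled.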
<sect1>Transparent caching with ACC Tigris digital access server
<P>
by <url url="mailto:John.Saunders@scitec.com.au" name="John Saunders">
<P>
This describes how to configure transparent proxying
on an ACC Tigris digital access server (similar to a Cisco 5200/5300
or an Ascend MAX 4000). I've found that doing this in the NAS
reduces traffic on the LAN and reduces processing load on the
Cisco. The Tigris has ample CPU for filtering.
<P>
Step 1 is to create filters that allow local traffic to pass.
Add as many as needed for all of your address ranges.
<verb>
ADD PROFILE IP FILTER ENTRY local1 INPUT 10.0.3.0 255.255.255.0 0.0.0.0 0.0.0.0 NORMAL
ADD PROFILE IP FILTER ENTRY local2 INPUT 10.0.4.0 255.255.255.0 0.0.0.0 0.0.0.0 NORMAL
</verb>
<P>
Step 2 is to create a filter to trap port 80 traffic.
<verb>
ADD PROFILE IP FILTER ENTRY http INPUT 0.0.0.0 0.0.0.0 0.0.0.0 0.0.0.0 = 0x6 D= 80 NORMAL
</verb>
<P>
Step 3 is to set the "APPLICATION_ID" on port 80 traffic to 80.
This causes all packets matching this filter to have ID 80
instead of the default ID of 0.
<verb>
SET PROFILE IP FILTER APPLICATION_ID http 80
</verb>
<P>
Step 4 is to create a special route that is used for
packets with "APPLICATION_ID" set to 80. The routing
engine uses the ID to select which routes to use.
<verb>
ADD IP ROUTE ENTRY 0.0.0.0 0.0.0.0 PROXY-IP 1
SET IP ROUTE APPLICATION_ID 0.0.0.0 0.0.0.0 PROXY-IP 80
</verb>
<P>
Step 5 is to bind everything to a filter ID called transproxy.
List all local filters first and the http one last.
<verb>
ADD PROFILE ENTRY transproxy local1 local2 http
</verb>
<P>
With this in place use your RADIUS server to send back the
``Framed-Filter-Id = transproxy'' key/value pair to the NAS.
<P>
You can check if the filter is being assigned to logins with
the following command:
<verb>
display profile port table
</verb>
<sect1>``Connection reset by peer'' and Cisco policy routing
<P>
<url url="mailto:fygrave at tigerteam dot net" name="Fyodor">
has tracked down the cause of unusual ``connection reset by peer'' messages
when using Cisco policy routing to hijack HTTP requests.
<P>
When the network link between the router and the cache goes down for just a
moment, the packets that are supposed to be redirected are instead sent
out the default route. If this happens, a TCP ACK from the client host
may be sent to the origin server, instead of being diverted to the
cache. The origin server, upon receiving an unexpected ACK packet,
sends a TCP RESET back to the client, which aborts the client's request.
<P>
To work around this problem, you can install a static route to the
<em/null0/ interface for the cache address with a higher metric (lower
precedence), such as 250. Then, when the link goes down, packets from the client
just get dropped instead of sent out the default route. For example, if
1.2.3.4 is the IP address of your Squid cache, you may add:
<verb>
ip route 1.2.3.4 255.255.255.255 Null0 250
</verb>
This appears to cause the correct behaviour.
<sect1>WCCP - Web Cache Coordination Protocol
<p>
Contributors: <url url="mailto:glenn@ircache.net" name="Glenn Chisholm"> and
<url url="mailto:ltd@cisco.com" name="Lincoln Dale">.
<sect2>Does Squid support WCCP?
<p>
Cisco's Web Cache Coordination Protocol V1.0 is supported in Squid
2.3 and later. Squid does not yet support WCCP V2.0. Now that WCCP V2 is an open protocol,
Squid may be able to support it in the future.
<sect2>Configuring your Router
<p>
There are two different methods of configuring WCCP on CISCO routers.
The first method is for routers that only support V1.0 of the
protocol. The second is for routers that support both.
<sect3>IOS Version 11.x
<P>
It is possible that later versions of IOS 11.x will support V2.0 of the
protocol. If that is the case follow the 12.x instructions. Several
people have reported that the Squid implementation of WCCP does not
work with their 11.x routers. If you experience this, please mail the
debug output from your router to <em/squid-bugs/.
<verb>
conf t
wccp enable
!
interface [Interface Carrying Outgoing Traffic]x/x
!
ip wccp web-cache redirect
!
CTRL Z
write mem
</verb>
<sect3> IOS Version 12.x
<P>
Some of the early versions of 12.x do not have the 'ip wccp version'
command. You will need to upgrade your IOS version to use V1.0.
<p>
You will need to be running at least IOS Software Release <em/12.0(5)T/
if you're running the 12.0 T-train. IOS Software Releases <em/12.0(3)T/
and <em/12.0(4)T/ do not have WCCPv1, but <em/12.0(5)T/ does.
<verb>
conf t
ip wccp version 1
ip wccp web-cache
!
interface [Interface Carrying Outgoing/Incoming Traffic]x/x
ip wccp web-cache redirect out|in
!
CTRL Z
write mem
</verb>
<sect2>IOS 12.3 problems
<p>
Some people report problems with WCCP and IOS 12.3. They see
truncated or fragmented GRE packets arriving at the cache. Apparently
it works if you disable Cisco Express Forwarding for the interface:
<verb>
conf t
ip cef # some systems may need 'ip cef global'
int Ethernet0/0
no ip route-cache cef
CTRL Z
</verb>
<sect2>Configuring FreeBSD
<P>
FreeBSD first needs to be configured to receive and strip the GRE
encapsulation from the packets from the router. To do this you will
need to patch and recompile your kernel.
<P>
First, a patch needs to be applied to your kernel for GRE
support. Apply the
<url url="../../WCCP-support/FreeBSD-3.x/gre.patch" name="patch for FreeBSD-3.x kernels">
or the
<url url="../../WCCP-support/FreeBSD-4.x/gre.patch" name="patch for FreeBSD-4.x kernels">
as appropriate.
<P>
Secondly you will need to download
<url url="../../WCCP-support/FreeBSD-3.x/gre.c" name="gre.c for FreeBSD-3.x">
or
<url url="../../WCCP-support/FreeBSD-4.x/gre.c" name="gre.c for FreeBSD-4.x">
and copy it to <em>/usr/src/sys/netinet/gre.c</em>.
<P>
Finally, add ``options GRE'' to your kernel config file and rebuild
your kernel. Note that the <em/opt_gre.h/ file is
created when you run <em/config/.
Once your kernel is installed you will need to
<ref id="trans-freebsd" name="configure FreeBSD for transparent proxying">.
<sect2>Configuring Linux 2.2
<p>
Al Blake has written a <url url="http://www.spc.int/it/TechHead/Wccp-squid.html"
name="Cookbook for setting up transparent WCCP using Squid on RedHat Linux and a cisco access server">.
<P>
There are currently two methods for supporting WCCP with Linux 2.2:
a special-purpose module, or the standard Linux GRE tunneling
driver. People have reported difficulty with the standard GRE
tunneling driver; however, it does allow GRE functionality other
than WCCP. You should choose the method that suits your environment.
<sect3>Standard Linux GRE Tunnel
<P>
Linux 2.2 kernels already support GRE, as long as the GRE module is
compiled into the kernel.
<P>
You will need to patch the <em/ip_gre.c/ code that comes with your Linux
kernel with this <url url="http://www.vsb.cz/~hal01/cache/wccp/ip_gre.patch"
name="patch"> supplied by <url url="mailto:Jan.Haluza@vsb.cz" name="Jan Haluza">.
<P>
Ensure that the GRE code is built either statically or as a module by choosing
the appropriate option in your kernel config. Then rebuild your kernel.
If it is a module you will need to:
<verb>
modprobe ip_gre
</verb>
The next step is to tell Linux to establish an IP tunnel between the router and
your host. Daniele Orlandi <!-- daniele@orlandi.com --> reports
that you have to give the gre1 interface an address, but any old
address seems to work.
<verb>
iptunnel add gre1 mode gre remote &lt;Router-IP&gt; local &lt;Host-IP&gt; dev &lt;interface&gt;
ifconfig gre1 127.0.0.2 up
</verb>
&lt;Router-IP&gt; is the IP address of your router that is intercepting the
HTTP packets. &lt;Host-IP&gt; is the IP address of your cache, and
&lt;interface&gt; is the network interface that receives those packets (probably eth0).
<sect3>WCCP Specific Module
<P>
This module is not part of the standard Linux distribution. It needs
to be compiled as a module and loaded on your system to function.
Do not attempt to build this in as a static part of your kernel.
<P>
Download the <url url="../../WCCP-support/Linux/ip_wccp.c" name="Linux WCCP module">
and compile it as you would any Linux network module.
<P>
Copy the module to <em>/lib/modules/kernel-version/ipv4/ip_wccp.o</em>. Edit
<em>/lib/modules/kernel-version/modules.dep</em> and add:
<verb>
/lib/modules/kernel-version/ipv4/ip_wccp.o:
</verb>
<P>
Finally you will need to load the module:
<verb>
modprobe ip_wccp
</verb>
<sect3>Common Steps
<P>
The machine should now be stripping the GRE encapsulation from any packets
received and requeuing them. The system will also need to be configured
for transparent proxying, either with <ref id="trans-linux-1" name="ipfwadm">
or with <ref id="trans-linux-2" name="ipchains">.
<sect2>Configuring Others
<P>
If you have managed to configure your operating system to support WCCP
with Squid,
please contact us with the details so we may share them with others.
<sect1>Can someone tell me what version of cisco IOS WCCP is added in?
<p>
IOS releases:
<itemize>
<item>11.1(19?)CA/CC or later
<item>11.2(14)P or later
<item>12.0(anything) or later
</itemize>
<sect1>What about WCCPv2?
<p>
Cisco has published WCCPv2 as an <url url="http://www.ietf.org/internet-drafts/draft-wilson-wrec-wccp-v2-00.txt"
name="Internet Draft"> (expires Jan 2001).
At this point, Squid does not support WCCPv2, but anyone
is welcome to code it up and contribute to the Squid project.
<sect1>Transparent caching with Foundry L4 switches
<p>
by <url url="mailto:signal at shreve dot net" name="Brian Feeny">.
<p>
First, configure Squid for transparent caching as detailed
at the <ref id="trans-caching" name="beginning of this section">.
<p>
Next, configure
the Foundry layer 4 switch to
transparently redirect traffic to your Squid box or boxes. By default,
the Foundry
redirects to port 80 of your squid box. This can
be changed to a different port if needed, but won't be covered
here.
<p>
In addition, the switch does a "health check" of the port to make
sure your Squid is answering. If your Squid does not answer, the
switch defaults to sending traffic directly through instead of
redirecting it. When Squid comes back up, the switch begins
redirecting once again.
<p>
This example assumes you have two squid caches:
<verb>
squid1.foo.com 192.168.1.10
squid2.foo.com 192.168.1.11
</verb>
<p>
We will assume you have various workstations, customers, etc., plugged
into the switch that you want to be transparently proxied.
The squid caches themselves should be plugged into the switch as well.
Only the interface that the router is connected to is important. Where you
put the squid caches or other connections does not matter.
<p>
This example assumes your router is plugged into interface <bf/17/
of the switch. If not, adjust the following commands accordingly.
<enum>
<item>
Enter configuration mode:
<verb>
telnet@ServerIron#conf t
</verb>
<item>
Configure each squid on the Foundry:
<verb>
telnet@ServerIron(config)# server cache-name squid1 192.168.1.10
telnet@ServerIron(config)# server cache-name squid2 192.168.1.11
</verb>
<item>
Add the squids to a cache-group:
<verb>
telnet@ServerIron(config)#server cache-group 1
telnet@ServerIron(config-tc-1)#cache-name squid1
telnet@ServerIron(config-tc-1)#cache-name squid2
</verb>
<item>
Create a policy for caching http on a local port
<verb>
telnet@ServerIron(config)# ip policy 1 cache tcp http local
</verb>
<item>
Enable that policy on the port connected to your router
<verb>
telnet@ServerIron(config)#int e 17
telnet@ServerIron(config-if-17)# ip-policy 1
</verb>
</enum>
<p>
Since all outbound traffic to the Internet goes out interface
<bf/17/ (the router), and interface <bf/17/ has the caching policy applied to
it, HTTP traffic is going to be intercepted and redirected to the
caches you have configured.
<p>
The default port to redirect to can be changed. The load balancing
algorithm used can be changed (Least Used, Round Robin, etc). Ports
can be exempted from caching if needed. Access Lists can be applied
so that only certain source IP Addresses are redirected, etc. This
information was left out of this document since this was just a quick
howto that would apply for most people, not meant to be a comprehensive
manual of how to configure a Foundry switch. I can however revise this
with any information necessary if people feel it should be included.
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>SNMP
<P>
Contributors: <url url="mailto:glenn@ircache.net" name="Glenn Chisholm">.
<sect1>Does Squid support SNMP?
<P>
True SNMP support is available in Squid 2 and above. A significant change in the implementation
occurred starting with the development 2.2 code. Therefore there are two sets of instructions
on how to configure SNMP in Squid. Please make sure that you follow the correct one.
<sect1>Enabling SNMP in Squid
<P>
To use SNMP, it must first be enabled with the <em/configure/ script,
and Squid rebuilt. To enable it, first run the script:
<verb>
./configure --enable-snmp [ ... other configure options ]
</verb>
Next, recompile after cleaning the source tree:
<verb>
make clean
make all
make install
</verb>
Once the compile is completed and the new binary is installed the <em/squid.conf/ file
needs to be configured to allow access; the default is to deny all requests. The
instructions on how to do this have been broken into two parts, the first for all versions
of Squid from 2.2 onwards and the second for 2.1 and below.
<sect1>Configuring Squid 2.2
<P>
To configure SNMP first specify a list of communities that you would like to allow access
by using a standard acl of the form:
<verb>
acl aclname snmp_community string
</verb>
For example:
<verb>
acl snmppublic snmp_community public
acl snmpjoebloggs snmp_community joebloggs
</verb>
This creates two acl's, with two different communities, public and joebloggs. You can
name the acl's and the community strings anything that you like.
<P>
To specify the port that the agent will listen on, modify the <em/snmp_port/ parameter;
it defaults to 3401. Requests that cannot be fulfilled by this agent can be
forwarded to another agent on the port set by <em/forward_snmpd_port/. This option is off
by default and must be configured for forwarding to work. Remember that as the forwarded
requests will originate from this agent, you will need to make sure that you configure
your access accordingly.
<P>
To allow access to Squid's SNMP agent, define an <em/snmp_access/ ACL with the community
strings that you previously defined.
For example:
<verb>
snmp_access allow snmppublic localhost
snmp_access deny all
</verb>
The above will allow anyone on the localhost who uses the community <em/public/ to
access the agent. It will deny all others access.
<p>
If you do not define any <em/snmp_access/ ACL's, then
SNMP access is denied by default.
<P>
Finally, Squid allows you to configure the addresses that the agent will bind to
for incoming and outgoing traffic. These default to 0.0.0.0; changing them
will cause the agent to bind to a specific address on the host, rather than the
default, which is all addresses.
<verb>
snmp_incoming_address 0.0.0.0
snmp_outgoing_address 0.0.0.0
</verb>
<sect1>Configuring Squid 2.1
<P>
In Squid 2.1 and earlier the SNMP code had a number of issues with the ACL's. If you are
a frequent user of SNMP with Squid, please upgrade to 2.2 or higher.
<p>
A sort of default, working configuration is:
<verb>
snmp_port 3401
snmp_mib_path /local/squid/etc/mib.txt
snmp_agent_conf view all .1.3.6 included
snmp_agent_conf view squid .1.3.6 included
snmp_agent_conf user squid - all all public
snmp_agent_conf user all all all all squid
snmp_agent_conf community public squid squid
snmp_agent_conf community readwrite all all
</verb>
<P>
Note that for security you are advised to restrict SNMP access to your
caches. You can do this easily as follows:
<verb>
acl snmpmanagementhosts 1.2.3.4/255.255.255.255 1.2.3.0/255.255.255.0
snmp_acl public deny all !snmpmanagementhosts
snmp_acl readwrite deny all
</verb>
You must follow these instructions for 2.1 and below exactly or you are
likely to have problems. The parser has some issues which have been corrected
in 2.2.
<sect1>How can I query the Squid SNMP Agent?
<P>
You can test if your Squid supports SNMP with the <em/snmpwalk/ program
(<em/snmpwalk/ is a part of the
<url url="http://www.ece.ucdavis.edu/ucd-snmp/" name="UCD-SNMP project">).
Note that you have to specify the SNMP port, which in Squid defaults to
3401.
<verb>
snmpwalk -p 3401 hostname communitystring .1.3.6.1.4.1.3495.1.1
</verb>
If it gives output like:
<verb>
enterprises.nlanr.squid.cacheSystem.cacheSysVMsize = 7970816
enterprises.nlanr.squid.cacheSystem.cacheSysStorage = 2796142
enterprises.nlanr.squid.cacheSystem.cacheUptime = Timeticks: (766299) 2:07:42.99
</verb>
then it is working ok, and you should be able to make nice statistics out of it.
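<P>
If you only want one number out of that output, for example to feed a
graphing script, a tiny filter will do. The sketch below is an
illustration only: <em/snmp_value/ is a made-up name, and it assumes
the ``name = value'' line format shown above.
<verb>
#!/bin/sh
# snmp_value NAME
# Read snmpwalk output on stdin and print the value of the object whose
# name ends in ".NAME". Assumes lines of the form "some.oid.NAME = value".
snmp_value() {
    sed -n "s/^.*[.]$1 = //p"
}

# Usage, with the same placeholders as the snmpwalk example above:
#   snmpwalk -p 3401 hostname communitystring .1.3.6.1.4.1.3495.1.1 | snmp_value cacheSysVMsize
</verb>
Lines that do not mention the requested name are silently dropped, so
an empty result means the object was not in the output.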
<P>
For an explanation of what every string (OID) does, you should
refer to the <url url="http://www.ircache.net/Cache/cache-snmp/"
name="Cache SNMP web pages">.
<sect1>What can I use SNMP and Squid for?
<P>
There are a lot of things you can do with SNMP and Squid. It can be useful
to some extent for a longer-term overview of how your proxy is doing. It can
also be used as a problem solver. For example: how is your
file descriptor usage doing? Or how much does your LRU vary over a day? These are things
you can't monitor very well normally, aside from clicking at the cachemgr
frequently. Why not let MRTG do it for you?
<sect1>How can I use SNMP with Squid?
<p>
There are a number of tools that you can use to monitor Squid via SNMP. A very popular one
is MRTG, but there are a number of others. To learn what they are and to get additional
documentation, please visit the <url url="http://www.ircache.net/Cache/cache-snmp/"
name="Cache SNMP web pages">.
<sect2>MRTG
<P>
We use <url url="http://ee-staff.ethz.ch/&percnt;7eoetiker/webtools/mrtg/mrtg.html" name="MRTG">
to query Squid through its <url url="http://www.nlanr.net/Cache/cache-snmp/" name="SNMP interface">.
<P>
For instructions on using MRTG with Squid, please visit these pages:
<enum>
<item><url url="http://unary.calvin.edu/squid.html" name="Squid + MRTG graphs">
<item><url url="http://www.ircache.net/Cache/cache-snmp/" name="Cache SNMP web pages">
</enum>
<sect1>Where can I get more information/discussion about Squid and SNMP?
<P>
General Discussion: <url url="mailto:cache-snmp@ircache.net" name="cache-snmp@ircache.net">.
These messages are <url url="http://www.squid-cache.org/mail-archive/cache-snmp/"
name="archived">.
<P>
Subscriptions should be sent to: <url url="mailto:cache-snmp-request@ircache.net"
name="cache-snmp-request@ircache.net">.
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>Squid version 2
<sect1>What are the new features?
<P>
<itemize>
<item>HTTP/1.1 persistent connections.
<item>Lower VM usage; in-transit objects are not held fully in memory.
<item>Totally independent swap directories.
<item>Customizable error texts.
<item>FTP supported internally; no more ftpget.
<item>Asynchronous disk operations (optional, requires pthreads library).
<item>Internal icons for FTP and gopher directories.
<item>snprintf() used everywhere instead of sprintf().
<item>SNMP.
<item><url url="/urn-support.html" name="URN support">
<item>Routing requests based on AS numbers.
<item><url url="FAQ-16.html" name="Cache Digests">
<item>...and many more!
</itemize>
<sect1>How do I configure 'ssl_proxy' now?
<P>
By default, Squid connects directly to origin servers for SSL requests.
But if you must force SSL requests through a parent, first tell Squid
it can not go direct for SSL:
<verb>
acl SSL method CONNECT
never_direct allow SSL
</verb>
With this in place, Squid <em/should/ pick one of your parents to
use for SSL requests. If you want it to pick a particular parent,
you must use the <em/cache_peer_access/ configuration:
<verb>
cache_peer parent1 parent 3128 3130
cache_peer parent2 parent 3128 3130
cache_peer_access parent2 allow !SSL
</verb>
The above lines tell Squid to NOT use <em/parent2/ for SSL, so it
should always use <em/parent1/.
<sect1>Logfile rotation doesn't work with Async I/O
<P>
This is a known limitation when using Async I/O on Linux. The
LinuxThreads package steals (uses internally) the SIGUSR1 signal that Squid uses
to rotate logs.
<P>
In order not to disturb the threads package, SIGUSR1 use is disabled in
Squid when threads are enabled on Linux.
<sect1>Adding a new cache disk
<P>
Simply add your new <em/cache_dir/ line to <em/squid.conf/, then
run <em/squid -z/ again. Squid will create swap directories on the
new disk and leave the existing ones in place.
<sect1>Squid 2 performs badly on Linux
<P>
by <url url="mailto:hno@hem.passagen.se" name="Henrik Nordstrom">
<P>
You may have enabled Asynchronous I/O with the <em/--enable-async-io/
configure option.
Be careful when using threads on Linux. Most versions of libc5 and
early versions of glibc have problems with threaded applications. I
would not recommend <em/--enable-async-io/ on Linux unless your system
uses a recent version of glibc.
<P>
You should also know that <em/--enable-async-io/ is not optimal unless
you have a very busy cache. For low loads the cache performs slightly
better without <em/--enable-async-io/.
Try recompiling Squid without <em/--enable-async-io/. If a non-threaded
Squid performs better, then your libc probably can't handle threads
correctly. (Don't forget ``make clean'' after running configure.)
<sect1>How do I configure proxy authentication with Squid-2?
<label id="configuring-proxy-auth">
<P>
For Squid-2, the implementation and configuration have changed.
Authentication is now handled via external processes.
Arjan's <url url="http://www.iae.nl/users/devet/squid/proxy&lowbar;auth/" name="proxy auth page">
describes how to set it up. Some simple instructions are given below as well.
<enum>
<item>
We assume you have configured an ACL entry with proxy_auth, for example:
<verb>
acl foo proxy_auth REQUIRED
http_access allow foo
</verb>
<item>
You will need to compile and install an external authenticator program.
Most people will want to use <em/ncsa_auth/. The source for this program
is included in the source distribution, in the <em>auth_modules/NCSA</em>
directory.
<verb>
% cd auth_modules/NCSA
% make
% make install
</verb>
You should now have an <em/ncsa_auth/ program in the same directory where
your <em/squid/ binary lives.
<item>
You may need to create a password file. If you have been using
proxy authentication before, you probably already have such a file.
You can get <url url="../../htpasswd/" name="apache's htpasswd program">
from our server. Pick a pathname for your password file. We will assume
you will want to put it in the same directory as your squid.conf.
<item>
Configure the external authenticator in <em/squid.conf/.
For <em/ncsa_auth/ you need to give the pathname to the executable and
the password file as an argument. For example:
<verb>
authenticate_program /usr/local/squid/bin/ncsa_auth /usr/local/squid/etc/passwd
</verb>
</enum>
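<P>
It can help to know what Squid expects from an authenticator. As far as
we know, Squid-2 writes one ``username password'' line to the program's
standard input for each check and reads back a line saying OK or ERR.
The toy below follows that convention with a single hard-coded account.
The name <em/toy_auth/ and the account alice/secret are invented for
this example, and it is no substitute for <em/ncsa_auth/.
<verb>
#!/bin/sh
# toy_auth: answer "OK" or "ERR" for each "username password" line on stdin.
# The single account (alice / secret) is hard-coded for illustration only.
toy_auth() {
    while read user pass
    do
        case "$user $pass" in
            "alice secret") echo OK ;;
            *)              echo ERR ;;
        esac
    done
}

# The same kind of check should work for a real authenticator, for example:
#   echo "username password" | /usr/local/squid/bin/ncsa_auth /usr/local/squid/etc/passwd
</verb>
Testing your authenticator by hand like this, before pointing
<em/authenticate_program/ at it, makes it easier to tell password file
problems from Squid configuration problems.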
<P>
After all that, you should be able to start up Squid. If we left something out, or
haven't been clear enough, please let us know (squid-faq@ircache.net).
<sect1>Why does proxy-auth reject all users with Squid-2.2?
<P>
The ACL for proxy-authentication has changed from:
<verb>
acl foo proxy_auth timeout
</verb>
to:
<verb>
acl foo proxy_auth username
</verb>
Please update your ACL appropriately - a username of <em/REQUIRED/ will permit
all valid usernames. The timeout is now specified with the configuration
option:
<verb>
authenticate_ttl timeout
</verb>
<sect1>Delay Pools
<P>
by <url url="mailto:luyer@ucs.uwa.edu.au" name="David Luyer">.
<P>
<bf>
The information here is current for version 2.2. It is strongly
recommended that you use at least Squid 2.2 if you wish to use delay pools.
</bf>
<P>
Delay pools provide a way to limit the bandwidth of certain requests
based on any list of criteria. The idea came from a Western Australian
university who wanted to restrict student traffic costs (without
affecting staff traffic, and still getting cache and local peering hits
at full speed). There was some early Squid 1.0 code by Central Network
Services at Murdoch University, which I then developed (at the University
of Western Australia) into a much more complex patch for Squid 1.0
called ``DELAY_HACK.'' I then tried to code it in a much cleaner style
and with slightly more generic options than I personally needed, and
called this ``delay pools'' in Squid 2. I almost completely recoded
this in Squid 2.2 to provide the greater flexibility requested by people
using the feature.
<P>
To enable delay pools features in Squid 2.2, you must use the
<em>--enable-delay-pools</em> configure option before compilation.
<P>
Terminology for this FAQ entry:
<descrip>
<tag/pool/
a collection of bucket groups as appropriate to a given class
<tag/bucket group/
a group of buckets within a pool, such as the per-host bucket
group, the per-network bucket group or the aggregate bucket
group (the aggregate bucket group is actually a single bucket)
<tag/bucket/
an individual delay bucket represents a traffic allocation
which is replenished at a given rate (up to a given limit) and
causes traffic to be delayed when empty
<tag/class/
the class of a delay pool determines how the delay is applied,
ie, whether the different client IPs are treated separately or
as a group (or both)
<tag/class 1/
a class 1 delay pool contains a single unified bucket which is
used for all requests from hosts subject to the pool
<tag/class 2/
a class 2 delay pool contains one unified bucket and 255
buckets, one for each host on an 8-bit network (IPv4 class C)
<tag/class 3/
contains 255 buckets for the subnets in a 16-bit network, and
individual buckets for every host on these networks (IPv4 class
B)
</descrip>
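<P>
To make the class 3 layout concrete: as we read the description above,
the third octet of a client's address selects the per-network bucket
and the fourth octet selects the per-host bucket within it. The sketch
below only illustrates that split; <em/class3_buckets/ is a
hypothetical name, not something Squid provides.
<verb>
#!/bin/sh
# class3_buckets ADDRESS
# Show which network and host bucket a dotted-quad client address
# would fall into, assuming octet 3 = network and octet 4 = host.
# The body runs in a subshell so the IFS change does not leak out.
class3_buckets() (
    IFS=.
    set -- $1
    echo "network bucket $3, host bucket $4"
)

class3_buckets 192.168.5.17
</verb>
Under this reading, two clients that share the first three octets share
a network bucket but draw on separate host buckets.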
<P>
Delay pools allow you to limit traffic for clients or client groups,
with various features:
<itemize>
<item>
can specify peer hosts which aren't affected by delay pools,
ie, local peering or other 'free' traffic (with the
<em>no-delay</em> peer option).
<item>
delay behavior is selected by ACLs (low and high priority
traffic, staff vs students or student vs authenticated student
or so on).
<item>
each group of users has a number of buckets; a bucket has an
amount coming into it each second and a maximum amount it can
grow to; when it reaches zero, object reads are deferred
until one of the object's clients has some traffic allowance.
<item>
any number of pools can be configured with a given class and
any set of limits within the pools can be disabled, for example
you might only want to use the aggregate and per-host bucket
groups of class 3, not the per-network one.
</itemize>
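<P>
The bucket behaviour described above can be tried out with a toy
model. This is a simplified sketch and not Squid's actual code: the
name <em/simulate_bucket/ is made up, the bucket is assumed to start
full, and it is replenished once per second.
<verb>
#!/bin/sh
# simulate_bucket RESTORE MAX DEMAND SECONDS
# RESTORE  bytes added to the bucket each second
# MAX      largest number of bytes the bucket can hold
# DEMAND   bytes the client would like to read each second
# Prints what the client is allowed to read in each second.
simulate_bucket() {
    restore=$1
    max=$2
    demand=$3
    seconds=$4
    level=$max                  # assumption: the bucket starts full
    t=1
    while [ $t -le $seconds ]
    do
        # The client takes what it wants, limited by the bucket level.
        if [ $demand -lt $level ]
        then
            sent=$demand
        else
            sent=$level
        fi
        level=$(( level - sent ))
        echo "second $t: sent $sent, bucket now $level"
        # Replenish the bucket, never beyond its maximum.
        level=$(( level + restore ))
        if [ $level -gt $max ]
        then
            level=$max
        fi
        t=$(( t + 1 ))
    done
}
</verb>
For example, <em/simulate_bucket 16000 64000 40000 4/ shows a client
that wants 40000 bytes per second getting them for the first two
seconds while the full bucket drains, and then being held to the
16000 bytes per second restore rate.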
<P>
This allows options such as creating a number of class 1 delay pools
and allowing a certain amount of bandwidth to given object types (by
using URL regular expressions or similar), and many other uses I'm sure
I haven't even thought of beyond the original fair balancing of a
relatively small traffic allocation across a large number of users.
<P>
There are some limitations of delay pools:
<itemize>
<item>
delay pools are incompatible with slow aborts; quick abort
should be set fairly low to prevent objects being retrieved at
full speed once there are no clients requesting them (as the
traffic allocation is based on the current clients, and when
there are no clients attached to the object there is no way to
determine the traffic allocation).
<item>
delay pools only limit the actual data transferred and are not
inclusive of overheads such as TCP overheads, ICP, DNS, ICMP
pings, etc.
<item>
it is possible for one connection or a small number of
connections to take all the bandwidth from a given bucket and
the other connections to be starved completely, which can be a
major problem if there are a number of large objects being
transferred and the parameters are set in a way that a few
large objects will cause all clients to be starved (potentially
fixed by a currently experimental patch).
</itemize>
<sect2>How can I limit Squid's total bandwidth to, say, 512 Kbps?
<P>
<verb>
acl all src 0.0.0.0/0.0.0.0 # might already be defined
delay_pools 1
delay_class 1 1
delay_access 1 allow all
delay_parameters 1 64000/64000 # 512 kbits == 64 kbytes per second
</verb>
<bf>
For an explanation of these tags please see the configuration file.
</bf>
<P>
The 1 second buffer (max = restore = 64kbytes/sec) is because a limit
is requested, and no responsiveness to a burst is requested. If you
want it to be able to respond to a burst, increase the aggregate_max to
a larger value, and traffic bursts will be handled. It is recommended
that the maximum is at least twice the restore value - if there is only
a single object being downloaded, sometimes the download rate will fall
below the requested throughput as the bucket is not empty when it comes
to be replenished.
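<P>
The values in the example above are in bytes, while link speeds are
usually quoted in bits, so it is easy to be off by a factor of eight.
A small helper makes the conversion explicit. This is only a sketch:
<em/kbits_to_bytes/ is a made-up name, and it takes 1 kbit as 1000
bits, as the comment in the example above does.
<verb>
#!/bin/sh
# kbits_to_bytes KBITS
# Convert a rate in kbits per second to bytes per second,
# taking 1 kbit as 1000 bits.
kbits_to_bytes() {
    echo $(( $1 * 1000 / 8 ))
}

# Print a class 1 delay_parameters line for a 512 kbit/s limit
# with a one second buffer (max = restore).
rate=$(kbits_to_bytes 512)
echo "delay_parameters 1 $rate/$rate"
</verb>
Run as is, this prints <em>delay_parameters 1 64000/64000</em>, the line
used in the example above.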
<sect2>How to limit a single connection to 128 Kbps?
<P>
You can not limit a single HTTP request's connection speed. You
<EM>can</EM> limit individual hosts to some bandwidth rate. To limit a
specific host, define an <EM>acl</EM> for that host and use the example
above. To limit a group of hosts, then you must use a delay pool of
class 2 or 3. For example:
<verb>
acl only128kusers src 192.168.1.0/255.255.192.0
acl all src 0.0.0.0/0.0.0.0
delay_pools 1
delay_class 1 3
delay_access 1 allow only128kusers
delay_access 1 deny all
delay_parameters 1 64000/64000 -1/-1 16000/64000
</verb>
<bf>
For an explanation of these tags please see the configuration file.
</bf>
The above gives a solution where a cache is given a total of 512 kbits/sec to
operate in, and each IP address gets at most 128 kbits/sec out of that pool.
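<P>
To limit one specific host instead, the earlier class 1 example can be
reused with an <em/acl/ that matches only that host. This is a sketch;
the name <em/slowhost/ and the address 192.168.1.25 are made up for
illustration:
<verb>
acl slowhost src 192.168.1.25/255.255.255.255
acl all src 0.0.0.0/0.0.0.0
delay_pools 1
delay_class 1 1
delay_access 1 allow slowhost
delay_access 1 deny all
delay_parameters 1 16000/16000   # 128 kbits == 16 kbytes per second
</verb>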
<sect2>How do you personally use delay pools?
<P>
We have six local cache peers, all with the options 'proxy-only no-delay'
since they are fast machines connected via a fast ethernet and microwave (ATM)
network.
<P>
For our local access we use a dstdomain ACL, and for delay pool exceptions
we use a dst ACL as well since the delay pool ACL processing is done using
'fast lookups', which means (among other things) it won't wait for a DNS
lookup if it would need one.
<P>
Our proxy has two virtual interfaces, one which requires student
authentication to connect from machines where a department is not
paying for traffic, and one which uses delay pools. Also, users of the
main Unix system are allowed to choose slow or fast traffic, but must
pay for any traffic they do using the fast cache. Ident lookups are
disabled for accesses through the slow cache since they aren't needed.
Slow accesses are delayed using a class 3 delay pool to give fairness
between departments as well as between users. We recognize users of
Lynx on the main host are grouped together in one delay bucket but they
are mostly viewing text pages anyway, so this isn't considered a
serious problem. If it was we could take those hosts into a class 1
delay pool and give it a larger allocation.
<P>
I prefer using a slow restore rate and a large maximum rate to give
preference to people who are looking at web pages as their individual
bucket fills while they are reading, and those downloading large
objects are disadvantaged. This depends on which clients you believe
are more important. Also, one individual 8 bit network (a residential
college) has paid extra to get more bandwidth.
<P>
The relevant parts of my configuration file are (IP addresses, etc, all
changed):
<verb>
# ACL definitions
# Local network definitions, domains a.net, b.net
acl LOCAL-NET dstdomain a.net b.net
# Local network; nets 64 - 127. Also nearby network class A, 10.
acl LOCAL-IP dst 192.168.64.0/255.255.192.0 10.0.0.0/255.0.0.0
# Virtual i/f used for slow access
acl virtual_slowcache myip 192.168.100.13/255.255.255.255
# All permitted slow access, nets 96 - 127
acl slownets src 192.168.96.0/255.255.224.0
# Special 'fast' slow access, net 123
acl fast_slow src 192.168.123.0/255.255.255.0
# User hosts
acl my_user_hosts src 192.168.100.2/255.255.255.254
# "All" ACL
acl all src 0.0.0.0/0.0.0.0
# Don't need ident lookups for billing on (free) slow cache
ident_lookup_access allow my_user_hosts !virtual_slowcache
ident_lookup_access deny all
# Security access checks
http_access [...]
# These people get in for slow cache access
http_access allow virtual_slowcache slownets
http_access deny virtual_slowcache
# Access checks for main cache
http_access [...]
# Delay definitions (read config file for clarification)
delay_pools 2
delay_initial_bucket_level 50
delay_class 1 3
delay_access 1 allow virtual_slowcache !LOCAL-NET !LOCAL-IP !fast_slow
delay_access 1 deny all
delay_parameters 1 8192/131072 1024/65536 256/32768
delay_class 2 2
delay_access 2 allow virtual_slowcache !LOCAL-NET !LOCAL-IP fast_slow
delay_access 2 deny all
delay_parameters 2 2048/65536 512/32768
</verb>
<P>
The same code is also used by some departments using class 2 delay
pools to give them more flexibility in giving different performance to
different labs or students.
<sect2>Where else can I find out about delay pools?
<P>
This is also pretty well documented in the configuration file, with
examples. Since people seem to lose their config files, here's a copy
of the relevant section.
<verb>
# DELAY POOL PARAMETERS (all require DELAY_POOLS compilation option)
# -----------------------------------------------------------------------------
# TAG: delay_pools
# This represents the number of delay pools to be used. For example,
# if you have one class 2 delay pool and one class 3 delay pool, you
# have a total of 2 delay pools.
#
# To enable this option, you must use --enable-delay-pools with the
# configure script.
#delay_pools 0
# TAG: delay_class
# This defines the class of each delay pool. There must be exactly one
# delay_class line for each delay pool. For example, to define two
# delay pools, one of class 2 and one of class 3, the settings above
# and here would be:
#
#delay_pools 2 # 2 delay pools
#delay_class 1 2 # pool 1 is a class 2 pool
#delay_class 2 3 # pool 2 is a class 3 pool
#
# The delay pool classes are:
#
# class 1 Everything is limited by a single aggregate
# bucket.
#
# class 2 Everything is limited by a single aggregate
# bucket as well as an "individual" bucket chosen
# from bits 25 through 32 of the IP address.
#
# class 3 Everything is limited by a single aggregate
# bucket as well as a "network" bucket chosen
# from bits 17 through 24 of the IP address and a
# "individual" bucket chosen from bits 17 through
# 32 of the IP address.
#
# NOTE: If an IP address is a.b.c.d
# -> bits 25 through 32 are "d"
# -> bits 17 through 24 are "c"
# -> bits 17 through 32 are "c * 256 + d"
# TAG: delay_access
# This is used to determine which delay pool a request falls into.
# The first matched delay pool is always used, ie, if a request falls
# into delay pool number one, no more delay pools are checked, otherwise the
# rest are checked in order of their delay pool number until they have
# all been checked. For example, if you want some_big_clients in delay
# pool 1 and lotsa_little_clients in delay pool 2:
#
#delay_access 1 allow some_big_clients
#delay_access 1 deny all
#delay_access 2 allow lotsa_little_clients
#delay_access 2 deny all
# TAG: delay_parameters
# This defines the parameters for a delay pool. Each delay pool has
# a number of "buckets" associated with it, as explained in the
# description of delay_class. For a class 1 delay pool, the syntax is:
#
#delay_parameters pool aggregate
#
# For a class 2 delay pool:
#
#delay_parameters pool aggregate individual
#
# For a class 3 delay pool:
#
#delay_parameters pool aggregate network individual
#
# The variables here are:
#
# pool a pool number - ie, a number between 1 and the
# number specified in delay_pools as used in
# delay_class lines.
#
# aggregate the "delay parameters" for the aggregate bucket
# (class 1, 2, 3).
#
# individual the "delay parameters" for the individual
# buckets (class 2, 3).
#
# network the "delay parameters" for the network buckets
# (class 3).
#
# A pair of delay parameters is written restore/maximum, where restore is
# the number of bytes (not bits - modem and network speeds are usually
# quoted in bits) per second placed into the bucket, and maximum is the
# maximum number of bytes which can be in the bucket at any time.
#
# For example, if delay pool number 1 is a class 2 delay pool as in the
# above example, and is being used to strictly limit each host to 64kbps
# (plus overheads), with no overall limit, the line is:
#
#delay_parameters 1 -1/-1 8000/8000
#
# Note that the figure -1 is used to represent "unlimited".
#
# And, if delay pool number 2 is a class 3 delay pool as in the above
# example, and you want to limit it to a total of 256kbps (strict limit)
# with each 8-bit network permitted 64kbps (strict limit) and each
# individual host permitted 4800bps with a bucket maximum size of 64kb
# to permit a decent web page to be downloaded at a decent speed
# (if the network is not being limited due to overuse) but slow down
# large downloads more significantly:
#
#delay_parameters 2 32000/32000 8000/8000 600/64000
#
# There must be one delay_parameters line for each delay pool.
# TAG: delay_initial_bucket_level (percent, 0-100)
# The initial bucket percentage is used to determine how much is put
# in each bucket when squid starts, is reconfigured, or first notices
# a host accessing it (in class 2 and class 3, individual hosts and
# networks only have buckets associated with them once they have been
# "seen" by squid).
#
#delay_initial_bucket_level 50
</verb>
<sect1>Can I preserve my cache when upgrading from 1.1 to 2?
<P>
At the moment we do not have a script which will convert your cache
contents from the 1.1 to the Squid-2 format. If enough people ask for
one, then somebody will probably write such a script.
<P>
If you like, you can configure a new Squid-2 cache with your old
Squid-1.1 cache as a sibling. After a few days, weeks, or
however long you want to wait, shut down the old Squid cache.
If you want to force-load your new cache with the objects
from the old cache, you can try something like this:
<enum>
<item>
Install Squid-2 and configure it to have the same
amount of disk space as your Squid-1 cache, even
if there is not currently that much space free.
<item>
Configure Squid-2 with Squid-1 as a parent cache.
You might want to enable <em/never_direct/ on
the Squid-2 cache so that all of Squid-2's requests
go through Squid-1.
<item>
Enable the <ref id="purging-objects" name="PURGE method"> on Squid-1.
<item>
Set the refresh rules on Squid-1 to be very liberal so that it
does not generate IMS requests for cached objects.
<item>
Create a list of all the URLs in the Squid-1 cache. These can
be extracted from the access.log, store.log and swap logs.
<item>
For every URL in the list, request the URL from Squid-2, and then
immediately send a PURGE request to Squid-1.
<item>
Eventually Squid-2 will have all the objects, and Squid-1
will be empty.
</enum>
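<P>
Step 2 might look like the following in the Squid-2 <em/squid.conf/.
This is only a sketch: the host name <em/old-cache.example.com/ and the
port numbers 3128 and 3130 are assumptions, so substitute the values
your Squid-1 cache actually uses:
<verb>
cache_peer old-cache.example.com parent 3128 3130
acl all src 0.0.0.0/0.0.0.0     # might already be defined
never_direct allow all
</verb>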
<sect1>Customizable Error Messages
<P>
Squid-2 lets you customize your error messages. The source distribution
includes error messages in different languages. You can select the
language with the configure option:
<verb>
--enable-err-language=lang
</verb>
<P>
Furthermore, you can rewrite the error message template files if you like.
This list describes the tags which Squid will insert into the messages:
<descrip>
<tag/%B/ URL with FTP %2f hack
<tag/%c/ Squid error code
<tag/%d/ seconds elapsed since request received
<tag/%e/ errno
<tag/%E/ strerror()
<tag/%f/ FTP request line
<tag/%F/ FTP reply line
<tag/%g/ FTP server message
<tag/%h/ cache hostname
<tag/%H/ server host name
<tag/%i/ client IP address
<tag/%I/ server IP address
<tag/%L/ contents of <em/err_html_text/ config option
<tag/%M/ Request Method
<tag/%p/ URL port \#
<tag/%P/ Protocol
<tag/%R/ Full HTTP Request
<tag/%S/ squid signature from ERR_SIGNATURE
<tag/%s/ caching proxy software with version
<tag/%t/ local time
<tag/%T/ UTC
<tag/%U/ URL without password
<tag/%u/ URL without password, %2f added to path
<tag/%w/ cachemgr email address
<tag/%z/ dns server error message
</descrip>
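<P>
As a rough illustration, the text of a customized template could combine
several of these tags as shown here. The wording is invented for this
example, and the HTML markup that a real template file contains is left
out:
<verb>
The requested URL could not be retrieved: %U
The system returned: %E
Generated %T by %h (%s)
Please report problems to %w
</verb>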
<sect1>My squid.conf from version 1.1 doesn't work!
<P>
Yes, a number of configuration directives have been renamed.
Here are some of them:
<descrip>
<tag/cache_host/
This is now called <em/cache_peer/. The old term does not
really describe what you are configuring, but the new name
tells you that you are configuring a peer for your cache.
<tag/cache_host_domain/
Renamed to <em/cache_peer_domain/.
<tag/local_ip, local_domain/
The functionality provided by these directives is now implemented
as access control lists. You will use the <em/always_direct/ and
<em/never_direct/ options. The new <em/squid.conf/ file has some
examples.
<tag/cache_stoplist/
This directive also has been reimplemented with access control
lists. You will use the <em/no_cache/ option. For example:
<verb>
acl Uncachable url_regex cgi \?
no_cache deny Uncachable
</verb>
<tag/cache_swap/
This option used to specify the cache disk size. Now you
specify the disk size on each <em/cache_dir/ line.
<tag/cache_host_acl/
This option has been renamed to <em/cache_peer_access/
<bf/and/ the syntax has changed. Now this option is a
true access control list, and you must include an
<em/allow/ or <em/deny/ keyword. For example:
<verb>
acl that-AS dst_as 1241
cache_peer_access thatcache.thatdomain.net allow that-AS
cache_peer_access thatcache.thatdomain.net deny all
</verb>
This example sends requests to your peer <em/thatcache.thatdomain.net/
only for origin servers in Autonomous System Number 1241.
<tag/units/
In Squid-1.1 many of the configuration options had implied
units associated with them. For example, the <em/connect_timeout/
value may have been in seconds, but the <em/read_timeout/ value
had to be given in minutes. With Squid-2, these directives take
units after the numbers, and you will get a warning if you
leave off the units. For example, you should now write:
<verb>
connect_timeout 120 seconds
read_timeout 15 minutes
</verb>
</descrip>
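<P>
To illustrate the renaming, here is a sketch of how two Squid-1.1 lines
might be rewritten for Squid-2. The host name, domain, and ACL name are
invented for the example:
<verb>
# Squid-1.1
cache_host parent.example.com parent 3128 3130
local_domain example.com

# Squid-2
cache_peer parent.example.com parent 3128 3130
acl local-servers dstdomain example.com
always_direct allow local-servers
</verb>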
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>httpd-accelerator mode
<sect1>What is the httpd-accelerator mode?
<label id="what-is-httpd-accelerator">
<P>
Occasionally people have trouble understanding accelerators and
proxy caches, usually resulting from mixed up interpretations of
``incoming'' and ``outgoing'' data. I think in terms of requests (i.e.,
an outgoing request is from the local site out to the big bad
Internet). The data received in reply is incoming, of course.
Others think in the opposite sense of ``a request for incoming data''.
<P>
An accelerator caches incoming requests for outgoing data (i.e.,
that which you publish to the world). It takes load away from your
HTTP server and internal network. You move the server away from
port 80 (or whatever your published port is), and substitute the
accelerator, which then pulls the HTTP data from the ``real''
HTTP server (only the accelerator needs to know where the real
server is). The outside world sees no difference (apart from an
increase in speed, with luck).
<P>
Quite apart from taking the load off a site's normal web server,
accelerators can also sit outside firewalls or other network
bottlenecks and talk to HTTP servers inside, reducing traffic across
the bottleneck and simplifying the configuration. Two or more
accelerators communicating via ICP can increase the speed and
resilience of a web service to any single failure.
<P>
The Squid redirector can make one accelerator act as a single
front-end for multiple servers. If you need to move parts of your
filesystem from one server to another, or if separately administered
HTTP servers should logically appear under a single URL hierarchy,
the accelerator makes the right thing happen.
<P>
If you wish only to cache the ``rest of the world'' to improve local users'
browsing performance, then accelerator mode is irrelevant. Sites which
own and publish a URL hierarchy use an accelerator to improve other
sites' access to it. Sites wishing to improve their local users' access
to other sites' URLs use proxy caches. Many sites, like us, do both and
hence run both.
<P>
Measurements of the Squid cache and its Harvest counterpart suggest an
order of magnitude performance improvement over CERN or other widely
available caching software. This order of magnitude performance
improvement on hits suggests that the cache can serve as an httpd
accelerator, a cache configured to act as a site's primary httpd server
(on port 80), forwarding references that miss to the site's real httpd
(on port 81).
<P>
In such a configuration, the web administrator renames all
non-cachable URLs to the httpd's port (81). The cache serves
references to cachable objects, such as HTML pages and GIFs, and
the true httpd (on port 81) serves references to non-cachable
objects, such as queries and cgi-bin programs. If a site's usage
characteristics tend toward cachable objects, this configuration
can dramatically reduce the site's web workload.
<P>
Note that it is best not to run a single <em/squid/ process as
both an httpd-accelerator and a proxy cache, since these two modes
will have different working sets. You will get better performance
by running two separate caches on separate machines. However, for
compatibility with how administrators are accustomed to running
other servers that provide both proxy and Web serving capability
(e.g., CERN), Squid supports operation as both a proxy and
an accelerator if you set the <tt/httpd_accel_with_proxy/
variable to <tt/on/ inside your <em/squid.conf/
configuration file.
<sect1>How do I set it up?
<P>
First, you have to tell Squid to listen on port 80 (usually), so set the 'http_port'
option:
<verb>
http_port 80
</verb>
<P>
Next, you need to move your normal HTTP server to another port and/or
another machine. If you want to run your HTTP server on the same
machine, then it can not also use port 80 (except see the next FAQ entry
below). A common choice is port 81. Configure squid as follows:
<verb>
httpd_accel_host localhost
httpd_accel_port 81
</verb>
Alternatively, you could move the HTTP server to another machine and leave it
on port 80:
<verb>
httpd_accel_host otherhost.foo.com
httpd_accel_port 80
</verb>
<P>
You should now be able to start Squid and it will serve requests as an HTTP server.
<P>
If you are using Squid as an accelerator for a virtual host system, then you
need to specify
<verb>
httpd_accel_host virtual
</verb>
<P>
Finally, if you want Squid to also accept <em/proxy/ requests (like it used to
before you turned it into an accelerator), then you need to enable this option:
<verb>
httpd_accel_with_proxy on
</verb>
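<P>
Putting the pieces together, a same-machine setup with the real server
on port 81 might look like this sketch. Drop the last line if you
do not want proxy requests as well:
<verb>
http_port 80
httpd_accel_host localhost
httpd_accel_port 81
httpd_accel_with_proxy on
</verb>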
<sect1>When using an httpd-accelerator, the port number for redirects is wrong
<P>
Yes, this is because you probably moved your real httpd to port 81. When
your httpd issues a redirect message (e.g. 302 Moved Temporarily), it knows
it is not running on the standard port (80), so it inserts <em/:81/ in the
redirected URL. Then, when the client requests the redirected URL, it
bypasses the accelerator.
<P>
How can you fix this?
<P>
One way is to leave your httpd running on port 80, but bind the httpd
socket to a <em/specific/ interface, namely the loopback interface.
With <url url="http://www.apache.org/" name="Apache"> you can do it
like this in <em/httpd.conf/:
<verb>
Port 80
BindAddress 127.0.0.1
</verb>
Then, in your <em/squid.conf/ file, you must specify the loopback address
as the accelerator:
<verb>
httpd_accel_host 127.0.0.1
httpd_accel_port 80
</verb>
<P>
Note, you probably also need to add an <em>/etc/hosts</em> entry
of 127.0.0.1 for your server hostname. Otherwise, Squid may
get stuck in a forwarding loop.
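<P>
Such an entry could look like the line shown here, where
<em/www.example.com/ stands in for your server hostname:
<verb>
127.0.0.1       localhost www.example.com
</verb>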
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>Related Software
<sect1>Clients
<sect2>Wget
<P>
<url url="ftp://gnjilux.cc.fer.hr/pub/unix/util/wget/" name="Wget"> is a
command-line Web client. It supports recursive retrievals and
HTTP proxies.
<sect2>echoping
<P>
If you want to test your Squid cache in batch (from a cron command, for
instance), you can use the <url
url="ftp://ftp.internatif.org/pub/unix/echoping/" name="echoping"> program,
which will tell you (in plain text or via an exit code) if the cache is
up or not, and will indicate the response times.
<sect1>Logfile Analysis
<p>
Rather than maintain the same list in two places, please see the
<url url="/Scripts/" name="Logfile Analysis Scripts"> page
on the Web server.
<sect1>Configuration Tools
<sect2>3Dhierarchy.pl
<P>
Kenichi Matsui has a simple perl script which generates a 3D hierarchy map (in VRML) from
squid.conf.
<url url="ftp://ftp.nemoto.ecei.tohoku.ac.jp/pub/Net/WWW/VRML/converter/3Dhierarchy.pl" name="3Dhierarchy.pl">.
<sect1>Squid add-ons
<sect2>transproxy
<P>
<url url="http://www.transproxy.nlc.net.au/" name="transproxy">
is a program used in conjunction with the Linux Transparent Proxy
networking feature, and ipfwadm, to transparently proxy HTTP and
other requests. Transproxy is written by <url url="mailto:john@nlc.net.au" name="John Saunders">.
<sect2>Iain's redirector package
<P>
A <url url="ftp://ftp.sbs.de/pub/www/cache/redirector/redirector.tar.gz" name="redirector package"> from
<url url="mailto:iain@ecrc.de" name="Iain Lea"> to allow Intranet (restricted) or Internet
(full) access with URL deny and redirection for sites that are not deemed
acceptable for a userbase all via a single proxy port.
<sect2>Junkbusters
<P>
<url url="http://internet.junkbuster.com" name="Junkbusters"> Corp has a
copyleft privacy-enhancing, ad-blocking proxy server which you can
use in conjunction with Squid.
<sect2>Squirm
<P>
<url url="http://www.senet.com.au/squirm/" name="Squirm"> is a configurable, efficient redirector for Squid
by <url url="mailto:chris@senet.com.au" name="Chris Foote">. Features:
<itemize>
<item> Very fast
<item> Virtually no memory usage
<item> It can re-read its config files while running by sending it a HUP signal
<item> Interactive test mode for checking new configs
<item> Full regular expression matching and replacement
<item> Config files for patterns and IP addresses.
<item> If you mess up the config file, Squirm runs in Dodo Mode so your squid keeps working :-)
</itemize>
<sect2>chpasswd.cgi
<P>
<url url="mailto:orso@ineparnet.com.br" name="Pedro L Orso">
has adapted Apache's <url url="../../htpasswd/" name="htpasswd"> into a CGI program
called <url url="http://www.ineparnet.com.br/orso/index.html" name="chpasswd.cgi">.
<sect2>jesred
<P>
<url url="http://ivs.cs.uni-magdeburg.de/~elkner/webtools/jesred/" name="jesred">
by <url url="mailto:elkner@wotan.cs.Uni-Magdeburg.DE" name="Jens Elkner">.
<sect2>squidGuard
<P>
<url url="http://ftp.ost.eltele.no/pub/www/proxy/" name="squidGuard"> is
a free (GPL), flexible and efficient filter and
redirector program for squid. It lets you define multiple access
rules with different restrictions for different user groups on a squid
cache. squidGuard uses Squid's standard redirector interface.
<sect2>Central Squid Server
<P>
The <url url="http://www.senet.com.au/css/" name="Smart Neighbour">
(or 'Central Squid Server' - CSS) is a cut-down
version of Squid without HTTP or object caching functionality. The
CSS deals only with ICP messages. Instead of caching objects, the CSS
records the availability of objects in each of its neighbour caches.
Caches that have smart neighbours update each smart neighbour with the
status of their cache by sending ICP_STORE_NOTIFY/ICP_RELEASE_NOTIFY
messages upon storing/releasing an object from their cache. The CSS
maintains an up to date 'object map' recording the availability of
objects in its neighbouring caches.
<sect1>Ident Servers
<p>
For
<url url="http://info.ost.eltele.no/freeware/identd/" name="Windows NT">,
<url url="http://identd.sourceforge.net/" name="Windows 95/98">,
and
<url url="http://www2.lysator.liu.se/~pen/pidentd/" name="Unix">.
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>DISKD
<sect1>What is DISKD?
<p>
DISKD refers to some features in Squid-2.4 to improve Disk I/O performance.
The basic idea is that each <em/cache_dir/ has its own <em/diskd/ child process.
The diskd process performs all disk I/O operations (open, close, read, write, unlink)
for the cache_dir. Message queues are used to send requests and responses between
the Squid and diskd processes. Shared memory is used for chunks of data to
be read and written.
<sect1>Does it perform better?
<p>
Yes. We benchmarked Squid-2.4 with DISKD at the
<url url="http://polygraph.ircache.net/Results/bakeoff-2/" name="Second IRCache Bake-Off">.
The results are also described <url url="/Benchmarking/bakeoff-02/" name="here">.
At the bakeoff, we got 160 req/sec with diskd. Without diskd, we'd have gotten about 40 req/sec.
<sect1>What do I need to use it?
<p>
<enum>
<item>
Squid-2.4
<item>
Your operating system must support message queues.
<item>
Your operating system must support shared memory.
</enum>
<sect1>If I use DISKD, do I have to wipe out my current cache?
<p>
No. Diskd uses the same storage scheme as the standard "UFS"
type. It only changes how I/O is performed.
<sect1>How do I configure message queues?
<p>
Most Unix operating systems have message queue support
by default. One way to check is to see if you have
an <em/ipcs/ command.
<p>
However, you will likely need to increase the message
queue parameters for Squid. Message queue implementations
normally have the following parameters:
<descrip>
<tag/MSGMNB/
Maximum number of bytes in a single queue.
<tag/MSGMNI/
Maximum number of message queue identifiers.
<tag/MSGSEG/
Maximum number of message segments.
<tag/MSGSSZ/
Size of a message segment.
<tag/MSGMAX/
Maximum size of a single message.
<tag/MSGTQL/
Maximum number of messages in the whole system.
</descrip>
<p>
The messages between Squid and diskd are 32 bytes. Thus, MSGMAX
should be 32 or greater. You may want to set it to a larger
value, just to be safe.
<p>
We'll have two queues for each <em/cache_dir/ -- one in each direction.
So, MSGMNI needs to be at least two times the number of <em/cache_dir/'s.
<p>
MSGMNB and MSGTQL affect how many messages can be in the queues
at one time. I've found that 75 messages per queue is about
the limit of decent performance. Thus, MSGMNB must be
at least 75*MSGMAX, and MSGTQL must be at least 75 times
the number of <em/cache_dir/'s.
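<p>
As a worked example, take a hypothetical Squid with four <em/cache_dir/'s
and MSGMAX left at the 32-byte minimum. The rules above then give these
lower bounds:
<verb>
MSGMAX  at least 32                 # one diskd message
MSGMNI  at least 2 * 4   = 8        # two queues per cache_dir
MSGMNB  at least 75 * 32 = 2400     # 75 messages per queue
MSGTQL  at least 75 * 4  = 300      # 75 messages per cache_dir
</verb>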
<sect2>FreeBSD
<p>
Your kernel must have
<verb>
options SYSVMSG
</verb>
<p>
You can set the parameters in the kernel as follows. This is just
an example. Make sure the values are appropriate for your system:
<verb>
options MSGMNB=16384 # max # of bytes in a queue
options MSGMNI=41 # number of message queue identifiers
options MSGSEG=2049 # number of message segments
options MSGSSZ=64 # size of a message segment
options MSGTQL=512 # max messages in system
</verb>
<sect2>Digital Unix
<p>
Message queue support seems to be in the kernel
by default. Setting the options is as follows:
<verb>
options MSGMNB="8192" # max # bytes on queue
options MSGMNI="31" # # of message queue identifiers
options MSGMAX="2049" # max message size
options MSGTQL="1024" # # of system message headers
</verb>
<p>
by <url url="mailto:B.C.Phillips at massey dot ac dot nz" name="Brenden Phillips">
<p>
If you have a newer version (DU64), then you can probably use
<em/sysconfig/ instead. To see what the current IPC settings are run
<verb>
# sysconfig -q ipc
</verb>
To change them make a file like this called ipc.stanza:
<verb>
ipc:
msg-max = 2049
msg-mni = 31
msg-tql = 1024
msg-mnb = 8192
</verb>
then run
<verb>
# sysconfigdb -a -f ipc.stanza
</verb>
You have to reboot for the change to take effect.
<sect2>Linux
<p>
In my limited browsing on Linux, I didn't see any way to change
message queue parameters except to modify the include files
and build a new kernel. On my system, the file
is <em>/usr/src/linux/include/linux/msg.h</em>.
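<p>
Some newer Linux kernels appear to expose these limits through
<em/sysctl/ as well. Assuming your kernel provides the
<em/kernel.msgmnb/, <em/kernel.msgmni/ and <em/kernel.msgmax/ entries
(check for the matching files under <em>/proc/sys/kernel</em>), lines
like these in <em>/etc/sysctl.conf</em> may avoid a kernel rebuild. The
values are only examples:
<verb>
kernel.msgmnb = 16384
kernel.msgmni = 41
kernel.msgmax = 8192
</verb>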
<sect2>Solaris
<p>
Refer to <url url="http://www.sunworld.com/sunworldonline/swol-11-1997/swol-11-insidesolaris.html"
name="Demangling Message Queues"> in Sunworld Magazine.
<p>
I don't think the above article really tells you how to set the parameters.
You do it in <em>/etc/system</em> with lines like this:
<verb>
set msgsys:msginfo_msgmax=2049
set msgsys:msginfo_msgmnb=8192
set msgsys:msginfo_msgmni=31
set msgsys:msginfo_msgssz=64
set msgsys:msginfo_msgtql=1024
</verb>
<p>
Of course, you must reboot whenever you modify <em>/etc/system</em>
before changes take effect.
<sect1>How do I configure shared memory?
<p>
Shared memory uses a set of parameters similar to the ones for message
queues. The Squid DISKD implementation uses one shared memory area
for each cache_dir. Each shared memory area is about
800 kilobytes in size. You may need to modify your system's
shared memory parameters:
<p>
<descrip>
<tag/SHMSEG/
Maximum number of shared memory segments per process.
<tag/SHMMNI/
Maximum number of shared memory segments for the whole system.
<tag/SHMMAX/
Largest shared memory segment size allowed.
<tag/SHMALL/
Total amount of shared memory that can be used.
</descrip>
<p>
For Squid and DISKD, <em/SHMSEG/ and <em/SHMMNI/ must be greater than
or equal to the number of <em/cache_dir/'s that you have. <em/SHMMAX/
must be at least 800 kilobytes. <em/SHMALL/ must be at least
800 kilobytes multiplied by the number of <em/cache_dir/'s.
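<p>
As a worked example, assume four <em/cache_dir/'s and a system where
SHMALL is counted in 4096-byte pages. Both assumptions are only for
illustration:
<verb>
SHMSEG  at least 4
SHMMNI  at least 4
SHMMAX  at least 800 * 1024     = 819200 bytes
SHMALL  at least 4 * 819200     = 3276800 bytes
                 3276800 / 4096 = 800 pages
</verb>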
<sect2>FreeBSD
<p>
Your kernel must have
<verb>
options SYSVSHM
</verb>
<p>
You can set the parameters in the kernel as follows. This is just
an example. Make sure the values are appropriate for your system:
<verb>
options SHMSEG=16 # max shared mem id's per process
options SHMMNI=32 # max shared mem id's per system
options SHMMAX=2097152 # max shared memory segment size (bytes)
options SHMALL=4096 # max amount of shared memory (pages)
</verb>
<sect2>Digital Unix
<p>
Shared memory support seems to be in the kernel
by default. Setting the options is as follows:
<verb>
options SHMSEG="16" # max shared mem id's per process
options SHMMNI="32" # max shared mem id's per system
options SHMMAX="2097152" # max shared memory segment size (bytes)
options SHMALL=4096 # max amount of shared memory (pages)
</verb>
<p>
by <url url="mailto:B.C.Phillips at massey dot ac dot nz" name="Brenden Phillips">
<p>
If you have a newer version (DU64), then you can probably use
<em/sysconfig/ instead. To see what the current IPC settings are run
<verb>
# sysconfig -q ipc
</verb>
To change them make a file like this called ipc.stanza:
<verb>
ipc:
shm-seg = 16
shm-mni = 32
shm-max = 2097152
shm-all = 4096
</verb>
then run
<verb>
# sysconfigdb -a -f ipc.stanza
</verb>
You have to reboot for the change to take effect.
<sect2>Linux
<p>
In my limited browsing on Linux, I didn't see any way to change
shared memory parameters except to modify the include files
and build a new kernel. On my system, the file
is <em>/usr/src/linux/include/asm-i386/shmparam.h</em>.
<p>
Oh, it looks like you can change <em/SHMMAX/ by writing
the file <em>/proc/sys/kernel/shmmax</em>.
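<p>
For example, as root (the value is the one used in the other examples
in this section, and a setting made this way is lost at the next reboot):
<verb>
# echo 2097152 > /proc/sys/kernel/shmmax
</verb>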
<sect2>Solaris
<p>
Refer to
<url url="http://www.sunworld.com/swol-09-1997/swol-09-insidesolaris.html"
name="Shared memory uncovered">
in Sunworld Magazine.
<p>
To set the values, you can put these lines in <em>/etc/system</em>:
<verb>
set shmsys:shminfo_shmmax=2097152
set shmsys:shminfo_shmmni=32
set shmsys:shminfo_shmseg=16
</verb>
<sect1>Sometimes shared memory and message queues aren't released when Squid exits.
<p>
Yes, this is a little problem sometimes. Seems like the operating system
gets confused and doesn't always release shared memory and message
queue resources when processes exit, especially if they exit abnormally.
To fix it you can ``manually'' clear the resources with the <em/ipcs/ and <em/ipcrm/ commands.
Add this command into your <em/RunCache/ or <em/squid_start/
script:
<verb>
ipcs | grep '^[mq]' | awk '{printf "ipcrm -%s %s\n", $1, $2}' | /bin/sh
</verb>
<sect1>What are the Q1 and Q2 parameters?
<p>
In the source code, these are called <em/magic1/ and <em/magic2/.
These numbers refer to the number of outstanding requests on a message
queue. They are specified on the <em/cache_dir/ option line, after
the L1 and L2 directories:
<verb>
cache_dir diskd -1 /cache1 1024 16 256 64 72
</verb>
<p>
If there are more than Q1 messages outstanding, then the main Squid
process ``blocks'' for a little bit until the diskd process services
some of the messages and sends back some replies.
<p>
If there are more than Q2 messages outstanding, then Squid will
intentionally fail to open disk files for reading and writing.
This is a load-shedding mechanism. If your cache gets really really
busy and the disks can not keep up, Squid bypasses the disks until
the load goes down again.
<p>
Reasonable values for Q1 and Q2 are 64 and 72, respectively.
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>Authentication
<sect1>How does Proxy Authentication work in Squid?
<p>
<em>Note: The information here is current for version 2.4.</em>
<p>
Users will be authenticated if Squid is configured to use <em/proxy_auth/
ACLs (see next question).
<p>
Browsers send the user's authentication credentials in the
<em/Proxy-Authorization/ request header.
<p>
If Squid gets a request and the <em/http_access/ rule list
gets to a <em/proxy_auth/ ACL, Squid looks for the <em/Proxy-Authorization/
header.  If the header is present, Squid decodes it and extracts
a username and password.
<p>
If the header is missing, Squid returns
an HTTP reply with status 407 (Proxy Authentication Required).
The user agent (browser) receives the 407 reply and then prompts
the user to enter a name and password.  The name and password are
encoded, and sent in the <em/Proxy-Authorization/ header for subsequent
requests to the proxy.
<p>
Authentication is actually performed outside of the main Squid process.
When Squid starts, it spawns a number of authentication subprocesses.
These processes read usernames and passwords on stdin, and reply
with "OK" or "ERR" on stdout. This technique allows you to use
a number of different authentication schemes, although currently
you can only use one scheme at a time.
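<p>
The helper protocol is simple enough to sketch in a few lines of
shell.  The example below is a hypothetical toy helper, not one of the
programs shipped with Squid.  It assumes each request arrives as one line
holding a username and a password separated by a space, and it accepts
a single hard-coded account (<em/demo/ with password <em/secret/) purely
for illustration.
<verb>
# Hypothetical toy authenticator: one hard-coded account, no real
# password database.  $1 is the username, $2 is the password.
check_line() {
    case "$1:$2" in
        demo:secret) echo OK ;;
        *)           echo ERR ;;
    esac
}

# Read "username password" lines on stdin and answer each one with
# OK or ERR on stdout, one reply per request line.
auth_loop() {
    while read -r user pass; do
        check_line "$user" "$pass"
    done
}

# A standalone helper script would finish by calling auth_loop.
# To try it by hand:
#   printf 'demo secret\nlisa wrong\n' | auth_loop
</verb>
A real helper would look the name up in a password file or directory
service instead of comparing against fixed strings, but the
line-in, line-out loop would stay the same.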
<p>
The Squid source code comes with a few authentication processes.
These include:
<itemize>
<item>
LDAP: Uses the Lightweight Directory Access Protocol
<item>
NCSA: Uses an NCSA-style username and password file.
<item>
MSNT: Uses a Windows NT authentication domain.
<item>
PAM: Uses the Linux Pluggable Authentication Modules scheme.
<item>
SMB: Uses a SMB server like Windows NT or Samba.
<item>
getpwnam: Uses the old-fashioned Unix password file.
</itemize>
<p>
In order to authenticate users, you need to compile and install
one of the supplied authentication modules, one of <url url="http://www.squid-cache.org/related-software.html#auth" name="the others">,
or supply your own.
<p>
You tell Squid which authentication program to use with the
<em/authenticate_program/ option in squid.conf. You specify
the name of the program, plus any command line options if
necessary. For example:
<verb>
authenticate_program /usr/local/squid/bin/ncsa_auth /usr/local/squid/etc/passwd
</verb>
<sect1>How do I use authentication in access controls?
<p>
Make sure that your authentication program is installed
and working correctly. You can test it by hand.
<p>
Add some <em/proxy_auth/ ACL entries to your squid configuration.
For example:
<verb>
acl foo proxy_auth REQUIRED
acl all src 0/0
http_access allow foo
http_access deny all
</verb>
The REQUIRED term means that any authenticated user will match the
ACL named <em/foo/.
<p>
Squid allows you to provide fine-grained controls
by specifying individual user names. For example:
<verb>
acl foo proxy_auth REQUIRED
acl bar proxy_auth lisa sarah frank joe
acl daytime time 08:00-17:00
acl all src 0/0
http_access allow bar
http_access allow foo daytime
http_access deny all
</verb>
In this example, users named lisa, sarah, joe, and frank
are allowed to use the proxy at all times. Other users
are allowed only during daytime hours.
<sect1>Does Squid cache authentication lookups?
<p>
Yes.  Successful authentication lookups are cached for
one hour by default.  That means (in the worst case) it is possible
for someone to keep using your cache for up to an hour after they
have been removed from the authentication database.
<p>
You can control the expiration
time with the <em/authenticate_ttl/ option.
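<p>
For example, to shorten the lifetime of cached lookups to ten minutes
you might add a line like the following to squid.conf.  This is a
sketch that assumes the value is given in seconds; check the comments
for <em/authenticate_ttl/ in your own squid.conf before relying on it.
<verb>
# Assumed syntax: time-to-live in seconds (600 = ten minutes)
authenticate_ttl 600
</verb>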
<sect1>Are passwords stored in clear text or encrypted?
<p>
Squid stores cleartext passwords in its memory cache.
<p>
Squid writes cleartext usernames and passwords when talking to
the external authentication processes.  Note, however, that this
interprocess communication occurs over TCP connections bound to
the loopback interface.  Thus, it is not possible for processes on
other computers to "snoop" on the authentication traffic.
<p>
Each authentication program must select its own scheme for persistent
storage of passwords and usernames.
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<sect>Terms and Definitions
<sect1>Neighbor
<P>
In Squid, <em/neighbor/ usually means the same thing as <em/peer/.
A neighbor cache is one that you have defined with the <em/cache_host/ configuration
option. Neighbor refers to either a parent or a sibling.
<P>
In Harvest 1.4, neighbor referred to what Squid calls a sibling. That is, Harvest
had <em/parents/ and <em/neighbors/.  For backward compatibility, the term
neighbor is still accepted in some Squid configuration options.
<sect1>Regular Expression
<p>
Regular expressions are patterns that are used for matching sequences
of characters in text. For more information, see
<url url="http://jmason.org/software/sitescooper/tao_regexps.html"
name="A Tao of Regular Expressions"> and
<url url="http://www.newbie.org/gazette/xxaxx/xprmnt02.html"
name="Newbie's page">.
<verb>
$Id: FAQ.sgml,v 1.2 2004/09/09 12:36:20 cvsdist Exp $
</verb>
</article>
<!-- LocalWords: SSL MSIE Netmanage Chameleon WebSurfer unchecking remotehost
-->
<!-- LocalWords: authuser peerstatus peerhost SWAPIN SWAPOUT unparsable
-->