<!doctype linuxdoc system>
|
|
|
|
<article>
|
|
|
|
<titlepag>
|
|
<TITLE>SQUID Frequently Asked Questions</TITLE>
|
|
<author>© 2001 Duane Wessels, <tt/wessels@squid-cache.org/
|
|
<abstract>
|
|
Frequently Asked Questions (with answers!) about the Squid Internet
|
|
Object Cache software.
|
|
</abstract>
|
|
</titlepag>
|
|
|
|
<toc>
|
|
|
|
<p>
|
|
You can download the FAQ as
|
|
<url url="FAQ.ps.gz" name="compressed PostScript">, and
|
|
<url url="FAQ.txt" name="plain text">.
|
|
</p>
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>About Squid, this FAQ, and other Squid information resources
|
|
|
|
<sect1>What is Squid?
|
|
<P>
|
|
Squid is a high-performance proxy caching server for web clients,
|
|
supporting FTP, gopher, and HTTP data objects. Unlike traditional
|
|
caching software, Squid handles all requests in a single,
|
|
non-blocking, I/O-driven process.
|
|
|
|
Squid keeps
|
|
metadata and especially hot objects cached in RAM, caches
|
|
DNS lookups, supports non-blocking DNS lookups, and implements
|
|
negative caching of failed requests.
|
|
|
|
Squid supports SSL, extensive
|
|
access controls, and full request logging. By using the
|
|
lightweight Internet Cache Protocol, Squid caches can be arranged
|
|
in a hierarchy or mesh for additional bandwidth savings.
|
|
|
|
<P>
|
|
Squid consists of a main server program <em/squid/, a Domain Name System
|
|
lookup program <em/dnsserver/, some optional programs for rewriting
|
|
requests and performing authentication, and some management and client
|
|
tools. When <em/squid/ starts up, it spawns a configurable number of
|
|
<em/dnsserver/ processes, each of which can perform a single, blocking
|
|
Domain Name System (DNS) lookup. This reduces the amount of time the
|
|
cache waits for DNS lookups.
|
|
|
|
<P>
|
|
Squid is derived from the ARPA-funded
|
|
<url url="http://harvest.cs.colorado.edu/"
|
|
name="Harvest project">.
|
|
|
|
<sect1>What is Internet object caching?
|
|
<P>
|
|
Internet object caching is a way to store requested Internet objects
|
|
(i.e., data available via the HTTP, FTP, and gopher protocols) on a
|
|
system closer to the requesting site than to the source. Web browsers
|
|
can then use the local Squid cache as a proxy HTTP server, reducing
|
|
access time as well as bandwidth consumption.
|
|
|
|
<sect1>Why is it called Squid?
|
|
<P>
|
|
Harris' Lament says, ``All the good ones are taken.''
|
|
|
|
<P>
|
|
We needed to distinguish this new version from the Harvest
|
|
cache software. Squid was the code name for initial
|
|
development, and it stuck.
|
|
|
|
<sect1>What is the latest version of Squid?
|
|
<P>
|
|
Squid is updated often; please see
|
|
<url url="http://www.squid-cache.org/"
|
|
name="the Squid home page">
|
|
for the most recent versions.
|
|
|
|
<sect1>Who is responsible for Squid?
|
|
<P>
|
|
Squid is the result of efforts by numerous individuals from
|
|
the Internet community.
|
|
<url url="mailto:wessels@squid-cache.org"
|
|
name="Duane Wessels">
|
|
of the National Laboratory for Applied Network Research (funded by
|
|
the National Science Foundation) leads code development.
|
|
Please see
|
|
<url url="http://www.squid-cache.org/CONTRIBUTORS"
|
|
name="the CONTRIBUTORS file">
|
|
for a list of our excellent contributors.
|
|
|
|
<sect1>Where can I get Squid?
|
|
<P>
|
|
You can download Squid via FTP from
|
|
<url url="ftp://ftp.squid-cache.org/pub/"
|
|
name="the primary FTP site">
|
|
or one of the many worldwide
|
|
<url url="http://www.squid-cache.org/mirrors.html"
|
|
name="mirror sites">.
|
|
|
|
<P>
|
|
Many sushi bars also have Squid.
|
|
|
|
<sect1>What Operating Systems does Squid support?
|
|
<P>
|
|
The software is designed to operate on any modern Unix system, and
|
|
is known to work on at least the following platforms:
|
|
<itemize>
|
|
<item> Linux
|
|
<item> FreeBSD
|
|
<item> NetBSD
|
|
<item> BSDI
|
|
<item> OSF and Digital Unix
|
|
<item> IRIX
|
|
<item> SunOS/Solaris
|
|
<item> NeXTStep
|
|
<item> SCO Unix
|
|
<item> AIX
|
|
<item> HP-UX
|
|
<item> <ref id="building-os2" name="OS/2">
|
|
</itemize>
|
|
|
|
<P>
|
|
For more specific information, please see
|
|
<url url="http://www.squid-cache.org/platforms.html" name="platforms.html">.
|
|
If you encounter any platform-specific problems, please
|
|
let us know by sending email to
|
|
<url url="mailto:squid-bugs@squid-cache.org"
|
|
name="squid-bugs">.
|
|
|
|
<sect1>Does Squid run on Windows NT?
|
|
<label id="squid-NT">
|
|
<P>
|
|
Recent versions of Squid will <em/compile and run/ on Windows NT
|
|
with the
|
|
<url url="http://www.cygnus.com/misc/gnu-win32/"
|
|
name="GNU-Win32 package">.
|
|
|
|
<p>
|
|
<url url="http://www.logisense.com/" name="LogiSense">
|
|
has ported Squid to Windows NT and sells a supported
|
|
version. You can also download the source from
|
|
<url url="ftp://ftp.logisense.com/pub/cachexpress/" name="their FTP site">.
|
|
Thanks to LogiSense for making the code available as required by the GPL terms.
|
|
|
|
<p>
|
|
<url url="mailto: robert dot collins at itdomain dot com dot au" name="Robert Collins">
|
|
is working on a Windows NT port as well. You can find more information from him
|
|
at <url url="http://www.ideal.net.au/~collinsdial/Squid2.4.htm" name="his page">.
|
|
|
|
<p>
|
|
<url url="http://serassio.interfree.it/SquidNT.htm" name="Guido Serassio">
|
|
and <url url="http://www.phys-iasi.ro/users/romeo/squidnt.htm" name="Romeo Anghelache"> have Squid NT pages, including
|
|
binaries and patches.
|
|
|
|
<p>
|
|
|
|
|
|
<sect1>What Squid mailing lists are available?
|
|
<P>
|
|
<itemize>
|
|
<item> squid-users@squid-cache.org: general discussions about the
|
|
Squid cache software. Subscribe via
|
|
<it/squid-users-subscribe@squid-cache.org/.
|
|
|
|
Previous messages are available for browsing at
|
|
<url url="http://www.squid-cache.org/mail-archive/squid-users/"
|
|
name="the Squid Users Archive">,
|
|
and also at <url url="http://marc.theaimsgroup.com/?l=squid-users&r=1&w=2" name="theaimsgroup.com">.
|
|
|
|
<item>
|
|
squid-users-digest: digested (daily) version of
the above. Subscribe via
|
|
<it/squid-users-digest-subscribe@squid-cache.org/.
|
|
|
|
<item>
|
|
squid-announce@squid-cache.org: A receive-only list for
|
|
announcements of new versions.
|
|
Subscribe via
|
|
<it/squid-announce-subscribe@squid-cache.org/.
|
|
|
|
<item>
|
|
<it/squid-bugs@squid-cache.org/:
|
|
A closed list for sending us bug reports.
|
|
Bug reports received here are given priority over
|
|
those mentioned on squid-users.
|
|
|
|
<item>
|
|
<it/squid@squid-cache.org/:
|
|
A closed list for sending us feedback and ideas.
|
|
|
|
<item>
|
|
<it/squid-faq@squid-cache.org/:
|
|
A closed list for sending us feedback, updates, and additions to
|
|
the Squid FAQ.
|
|
</itemize>
|
|
|
|
<P>
|
|
We also have a few other mailing lists which are not strictly
|
|
Squid-related.
|
|
|
|
<itemize>
|
|
<item>
|
|
<it/cache-snmp@ircache.net/:
|
|
A public list for discussion of Web Caching and SNMP issues and developments.
|
|
Eventually we hope to put forth a standard Web Caching MIB.
|
|
|
|
<item>
|
|
<it/icp-wg@ircache.net/:
|
|
Mostly-idle mailing list for the nonexistent ICP Working Group within
|
|
the IETF. It may be resurrected some day, you never know!
|
|
|
|
</itemize>
|
|
|
|
<sect1>I can't figure out how to unsubscribe from your mailing list.
|
|
<P>
|
|
All of our mailing lists have ``-subscribe'' and ``-unsubscribe''
|
|
addresses that you must
|
|
use for subscribe and unsubscribe requests. To unsubscribe from
|
|
the squid-users list, you send a message to <em/squid-users-unsubscribe@squid-cache.org/.
|
|
|
|
<sect1>What Squid web pages are available?
|
|
<P>
|
|
Several Squid- and caching-related web pages are available:
|
|
<itemize>
|
|
<item>
|
|
<url url="http://www.squid-cache.org/" name="The Squid home page">
|
|
for information on the Squid software
|
|
|
|
<item>
|
|
<url url="http://www.ircache.net/Cache/" name="The IRCache Mesh">
|
|
gives information on our operational mesh of caches.
|
|
|
|
<item>
|
|
<url url="http://www.squid-cache.org/Doc/FAQ/" name="The Squid FAQ"> (uh, you're reading it).
|
|
|
|
<item>
|
|
<url url="http://cache.is.co.za" name="Oskar's Squid Users Guide">.
|
|
|
|
<item>
|
|
<url url="http://www.ircache.net/Cache/FAQ/" name="The Information Resource Caching FAQ">
|
|
|
|
<item>
|
|
<url url="http://www.squid-cache.org/Doc/Prog-Guide/prog-guide.html" name="Squid Programmers Guide">.
|
|
Yeah, it's extremely incomplete. I assure you this is the most recent version.
|
|
|
|
<item>
|
|
<url url="http://www.ircache.net/Cache/reading.html" name="Web Caching Reading list">
|
|
|
|
<item><url url="/Versions/1.0/Release-Notes-1.0.txt" name="Squid-1.0 Release Notes">
|
|
<item><url url="/Versions/1.1/Release-Notes-1.1.txt" name="Squid-1.1 Release Notes">
|
|
<item><url url="http://www.squid-cache.org/Doc/Hierarchy-Tutorial/" name="Tutorial on Configuring Hierarchical Squid Caches">
|
|
<item><url url="http://info.internet.isi.edu/in-notes/rfc/files/rfc2186.txt" name="RFC 2186"> ICPv2 -- Protocol
|
|
<item><url url="http://info.internet.isi.edu/in-notes/rfc/files/rfc2187.txt" name="RFC 2187"> ICPv2 -- Application
|
|
<item><url url="http://info.internet.isi.edu/in-notes/rfc/files/rfc1016.txt" name="RFC 1016">
|
|
</itemize>
|
|
|
|
<sect1>Does Squid support SSL/HTTPS/TLS?
|
|
<P>
|
|
Squid supports these encrypted protocols by ``tunnelling'' traffic between
|
|
clients and servers.
|
|
Squid can relay the encrypted bits between a client and a server.
|
|
<p>
|
|
Normally, when your browser comes across an <em/https/ URL, it
|
|
does one of two things:
|
|
<enum>
|
|
<item>The browser opens an SSL connection directly to the origin
|
|
server.
|
|
<item>The browser tunnels the request through Squid with the
|
|
<em/CONNECT/ request method.
|
|
</enum>
|
|
<p>
|
|
The <em/CONNECT/ method is a way to tunnel any kind of
|
|
connection through an HTTP proxy. The proxy doesn't
|
|
understand or interpret the contents. It just passes
|
|
bytes back and forth between the client and server.
|
|
For the gory details on tunnelling and the CONNECT
|
|
method, please see
|
|
<url url="ftp://ftp.isi.edu/in-notes/rfc2817.txt" name="RFC 2817">
|
|
and <url url="http://www.web-cache.com/Writings/Internet-Drafts/draft-luotonen-web-proxy-tunneling-01.txt"
|
|
name="Tunneling TCP based protocols through Web proxy servers"> (expired).
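<P>
As a rough sketch of what such a tunnel looks like on the wire, the
exchange below shows a browser asking a proxy to open a tunnel. The
first line is sent by the browser; the proxy answers with a status line
once its TCP connection to the origin server is open. The host name
<em/www.example.com/ and the exact status text are illustrative
assumptions, not captured output:
<verb>
CONNECT www.example.com:443 HTTP/1.0

HTTP/1.0 200 Connection established

(from here on, the proxy relays encrypted bytes in both directions)
</verb>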
|
|
<p>
|
|
Squid cannot (yet) encrypt or decrypt such connections, however.
|
|
Some folks are working on a patch, using OpenSSL, that allows Squid to do this.
|
|
|
|
|
|
<sect1>What's the legal status of Squid?
|
|
<P>
|
|
Squid is <url url="squid-copyright.txt" name="copyrighted">
|
|
by the University of California San Diego.
|
|
Squid uses some <url url="squid-credits.txt" name="code developed by others">.
|
|
|
|
<P>
|
|
Squid is
|
|
<url url="http://www.gnu.org/philosophy/free-sw.html"
|
|
name="Free Software">.
|
|
|
|
<P>
|
|
Squid is licensed under the terms of the
|
|
<url url="http://www.gnu.org/copyleft/gpl.html"
|
|
name="GNU General Public License">.
|
|
|
|
<sect1>Is Squid year-2000 compliant?
|
|
<P>
|
|
We think so. Squid uses the Unix time format for all internal time
|
|
representations. Potential problem areas are in printing and
|
|
parsing other time representations. We have made the following
|
|
fixes to address the year 2000:
|
|
<itemize>
|
|
<item>
|
|
<em/cache.log/ timestamps use 4-digit years instead of just 2 digits.
|
|
<item>
|
|
<em/parse_rfc1123()/ assumes years less than "70" are after 2000.
|
|
<item>
|
|
<em/parse_iso3307_time()/ checks all four year digits.
|
|
|
|
</itemize>
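<P>
The two-digit-year rule can be illustrated with a small shell function.
This is only a sketch of the windowing idea, not Squid's C code: the name
<em/expand_year/ is hypothetical, and mapping 70-99 to 19xx is an
assumption that follows from the rule stated above.
<verb>
# Hypothetical helper: expand a two-digit year around a pivot of 70.
expand_year() {
    yy=${1#0}                  # drop one leading zero so "08" stays decimal
    if [ "$yy" -lt 70 ]; then
        echo $((2000 + $yy))   # 00-69 are read as 2000-2069
    else
        echo $((1900 + $yy))   # 70-99 are read as 1970-1999
    fi
}

expand_year 69
expand_year 70
</verb>
With this rule, ``69'' is read as 2069 and ``70'' as 1970.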
|
|
|
|
<P>
|
|
Year-2000 fixes were applied to the following Squid versions:
|
|
<itemize>
|
|
<item>
|
|
<url url="/Versions/v2/2.1/" name="squid-2.1">:
|
|
Year parsing bug fixed for dates in the "Wed Jun 9 01:29:59 1993 GMT"
|
|
format (Richard Kettlewell).
|
|
<item>
|
|
squid-1.1.22:
|
|
Fixed likely year-2000 bug in ftpget's timestamp parsing (Henrik Nordstrom).
|
|
<item>
|
|
squid-1.1.20:
|
|
Misc fixes (Arjan de Vet).
|
|
</itemize>
|
|
|
|
<P>Patches:
|
|
<itemize>
|
|
<item>
|
|
<url url="../Y2K/patch3" name="Richard's lib/rfc1123.c patch">.
|
|
If you are still running 1.1.X, then you should apply this patch to
|
|
your source and recompile.
|
|
<item>
|
|
<url url="../Y2K/patch2" name="Henrik's src/ftpget.c patch">.
|
|
<item>
|
|
<url url="../Y2K/patch1" name="Arjan's lib/rfc1123.c patch">.
|
|
</itemize>
|
|
|
|
<p>
|
|
Squid-2.2 and earlier versions have a <url
|
|
url="http://www.squid-cache.org/Versions/v2/2.2/bugs/index.html#squid-2.2.stable5-mkhttpdlogtime-end-of-year" name="New
|
|
Year bug">. This is not strictly a Year-2000 bug; it would happen on the first day of any year.
|
|
|
|
<sect1>Can I pay someone for Squid support?
|
|
<P>
|
|
Yep. Please see the <url url="/Support/services.html"
|
|
name="commercial support page">.
|
|
|
|
|
|
<sect1>Squid FAQ contributors
|
|
<P>
|
|
The following people have made contributions to this document:
|
|
<itemize>
|
|
<item>
|
|
<url url="mailto:JLarmour@origin-at.co.uk" name="Jonathan Larmour">
|
|
<item>
|
|
<url url="mailto:cord@Wunder-Nett.org" name="Cord Beermann">
|
|
<item>
|
|
<url url="mailto:tony@nlanr.net" name="Tony Sterrett">
|
|
<item>
|
|
<url url="mailto:ghynes@compusult.nf.ca" name="Gerard Hynes">
|
|
<item>
|
|
<url url="mailto:tkatayam@pi.titech.ac.jp" name="Katayama, Takeo">
|
|
<item>
|
|
<url url="mailto:wessels@ircache.net" name="Duane Wessels">
|
|
<item>
|
|
<url url="mailto:kc@caida.org" name="K Claffy">
|
|
<item>
|
|
<url url="mailto:pauls@etext.org" name="Paul Southworth">
|
|
<item>
|
|
<url url="mailto:oskar@is.co.za" name="Oskar Pearson">
|
|
<item>
|
|
<url url="mailto:ongbh@zpoprp.zpo.dec.com" name="Ong Beng Hui">
|
|
<item>
|
|
<url url="mailto:torsten.sturm@axis.de" name="Torsten Sturm">
|
|
<item>
|
|
<url url="mailto:jrg@blodwen.demon.co.uk" name="James R Grinter">
|
|
<item>
|
|
<url url="mailto:roever@nse.simac.nl" name="Rodney van den Oever">
|
|
<item>
|
|
<url url="mailto:bertold@tohotom.vein.hu" name="Kolics Bertold">
|
|
<item>
|
|
<url url="mailto:carson@cugc.org" name="Carson Gaspar">
|
|
<item>
|
|
<url url="mailto:michael@metal.iinet.net.au" name="Michael O'Reilly">
|
|
<item>
|
|
<url url="mailto:hclsmith@tallships.istar.ca" name="Hume Smith">
|
|
<item>
|
|
<url url="mailto:RichardA@noho.co.uk" name="Richard Ayres">
|
|
<item>
|
|
<url url="mailto:John.Saunders@scitec.com.au" name="John Saunders">
|
|
<item>
|
|
<url url="mailto:miquels@cistron.nl" name="Miquel van Smoorenburg">
|
|
<item>
|
|
<url url="mailto:david@avarice.nepean.uws.edu.au" name="David J N Begley">
|
|
<item>
|
|
<url url="mailto:SarKev@topnz.ac.nz" name="Kevin Sartorelli">
|
|
<item>
|
|
<url url="mailto:doering@usf.uni-kassel.de" name="Andreas Doering">
|
|
<item>
|
|
<url url="mailto:mark@cal026031.student.utwente.nl" name="Mark Visser">
|
|
<item>
|
|
<url url="mailto:tom@interact.net.au" name="tom minchin">
|
|
<item>
|
|
<url url="mailto:voeckler@rvs.uni-hannover.de" name="Jens-S. Vöckler">
|
|
<item>
|
|
<url url="mailto:andre.albsmeier@mchp.siemens.de" name="Andre Albsmeier">
|
|
<item>
|
|
<url url="mailto:nazard@man-assoc.on.ca" name="Doug Nazar">
|
|
<item>
|
|
<url url="mailto:hno@hem.passagen.se" name="Henrik Nordstrom">
|
|
<item>
|
|
<url url="mailto:mark@rts.com.au" name="Mark Reynolds">
|
|
<item>
|
|
<url url="mailto:Arjan.deVet@adv.IAEhv.nl" name="Arjan de Vet">
|
|
<item>
|
|
<url url="mailto:peter@spinner.dialix.com.au" name="Peter Wemm">
|
|
<item>
|
|
<url url="mailto:webadm@info.cam.ac.uk" name="John Line">
|
|
<item>
|
|
<url url="mailto:ARMISTEJ@oeca.otis.com" name="Jason Armistead">
|
|
<item>
|
|
<url url="mailto:cudch@csv.warwick.ac.uk" name="Chris Tilbury">
|
|
<item>
|
|
<url url="mailto:jeff@sisna.com" name="Jeff Madison">
|
|
<item>
|
|
<url url="mailto:mbatchelor@citysearch.com" name="Mike Batchelor">
|
|
<item>
|
|
<url url="mailto:bogstad@pobox.com" name="Bill Bogstad">
|
|
<item>
|
|
<url url="mailto:radu at netsoft dot ro" name="Radu Greab">
|
|
<item>
|
|
<url url="mailto:f.j.bosscha@nhl.nl" name="F.J. Bosscha">
|
|
<item>
|
|
<url url="mailto:signal@shreve.net" name="Brian Feeny">
|
|
<item>
|
|
<url url="mailto:Support@dnet.co.uk" name="Martin Lyons">
|
|
<item>
|
|
<url url="mailto:luyer@ucs.uwa.edu.au" name="David Luyer">
|
|
<item>
|
|
<url url="mailto:chris@senet.com.au" name="Chris Foote">
|
|
<item>
|
|
<url url="mailto:elkner@wotan.cs.Uni-Magdeburg.DE" name="Jens Elkner">
|
|
</itemize>
|
|
<P>
|
|
Please send corrections, updates, and comments to:
|
|
<url url="mailto:squid-faq@squid-cache.org"
|
|
name="squid-faq@squid-cache.org">.
|
|
|
|
<sect1>About This Document
|
|
<P>
|
|
This document is copyrighted (2000) by Duane Wessels.
|
|
|
|
<P>
|
|
This document was written in SGML and converted with the
|
|
<url url="http://www.sgmltools.org/"
|
|
name="SGML-Tools package">.
|
|
|
|
<sect2>Want to contribute? Please write in SGML...
|
|
|
|
<P>
|
|
It is easier for us if you send us text which is close to "correct" SGML.
|
|
The SQUID FAQ currently uses the LINUXDOC DTD. It's probably easiest
to follow the examples in this file.
|
|
Here are the basics:
|
|
|
|
<P>
|
|
Use the <url> tag for links, instead of HTML <A HREF ...>
|
|
<verb>
|
|
<url url="http://www.squid-cache.org" name="Squid Home Page">
|
|
</verb>
|
|
|
|
<P>
|
|
Use <em> for emphasis, config options, and pathnames:
|
|
<verb>
|
|
<em>/usr/local/squid/etc/squid.conf</em>
|
|
<em/cache_peer/
|
|
</verb>
|
|
|
|
<P>
|
|
Here is how you do lists:
|
|
<verb>
|
|
<itemize>
|
|
<item>foo
|
|
<item>bar
|
|
</itemize>
|
|
</verb>
|
|
|
|
<P>
|
|
Use <verb>, just like HTML's <PRE> to show
|
|
unformatted text.
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>Getting and Compiling Squid
|
|
<label id="compiling">
|
|
|
|
<sect1>Which file do I download to get Squid?
|
|
<P>
|
|
You must download a source archive file of the form
|
|
squid-x.y.z-src.tar.gz (e.g., squid-1.1.6-src.tar.gz) from
|
|
<url url="http://www.squid-cache.org/"
|
|
name="the Squid home page">, or
|
|
<url url="ftp://www.squid-cache.org/pub/"
|
|
name="the Squid FTP site">.
|
|
Context diffs are available for upgrading to new versions.
|
|
These can be applied with the <em/patch/ program (available from
|
|
<url url="ftp://prep.ai.mit.edu/pub/gnu/"
|
|
name="the GNU FTP site">).
|
|
|
|
<sect1>How do I compile Squid?
|
|
|
|
<P>
|
|
For <bf/Squid-1.0/ and <bf/Squid-1.1/ versions, you can just
|
|
type <em/make/ from the top-level directory after unpacking
|
|
the source files. For example:
|
|
<verb>
|
|
% tar xzf squid-1.1.21-src.tar.gz
|
|
% cd squid-1.1.21
|
|
% make
|
|
</verb>
|
|
<P>
|
|
For <bf/Squid-2/ you must run the <em/configure/ script yourself
|
|
before running <em/make/:
|
|
<verb>
|
|
% tar xzf squid-2.0.RELEASE-src.tar.gz
|
|
% cd squid-2.0.RELEASE
|
|
% ./configure
|
|
% make
|
|
</verb>
|
|
|
|
<sect1>What kind of compiler do I need?
|
|
<P>
|
|
To compile Squid, you will need an ANSI C compiler. Almost all
|
|
modern Unix systems come with pre-installed compilers which work
|
|
just fine. The old <em/SunOS/ compilers do not have support for ANSI
|
|
C, and the Sun compiler for <em/Solaris/ is a product which
|
|
must be purchased separately.
|
|
|
|
<P>
|
|
If you are uncertain about your system's C compiler, the GNU C compiler is
|
|
available at
|
|
<url url="ftp://prep.ai.mit.edu/pub/gnu/"
|
|
name="the GNU FTP site">.
|
|
In addition to gcc, you may also want or need to install the <em/binutils/
|
|
package.
|
|
|
|
<sect1>What else do I need to compile Squid?
|
|
<p>
|
|
You will need <url url="http://www.perl.com/" name="Perl"> installed
|
|
on your system.
|
|
|
|
<sect1>Do you have pre-compiled binaries available?
|
|
<!-- Binaries list replicated at /binaries.html -->
|
|
|
|
<P>
|
|
The developers do not have the resources to make pre-compiled
|
|
binaries available. Instead, we invest effort into making
|
|
the source code very portable. Some people have made
|
|
binary packages available. Please see our
|
|
<url url="http://www.squid-cache.org/platforms.html" name="Platforms Page">.
|
|
|
|
<p>
|
|
The <url url="http://freeware.sgi.com/" name="SGI Freeware"> site
|
|
has pre-compiled packages for SGI IRIX.
|
|
|
|
<p>
|
|
Squid binaries for
|
|
<url url="http://www.freebsd.org/cgi/ports.cgi?query=squid-2&stype=all"
|
|
name="FreeBSD on Alpha and Intel">.
|
|
|
|
<p>
|
|
Squid binaries for
|
|
<url url="ftp://ftp.netbsd.org/pub/NetBSD/packages/pkgsrc/www/squid/README.html"
|
|
name="NetBSD on everything">
|
|
|
|
<sect1>How do I apply a patch or a diff?
|
|
<P>
|
|
You need the <tt/patch/ program. You should probably duplicate the
|
|
entire directory structure before applying the patch. For example, if
|
|
you are upgrading from squid-1.1.10 to 1.1.11, you would run
|
|
these commands:
|
|
<verb>
|
|
cd squid-1.1.10
|
|
mkdir ../squid-1.1.11
|
|
find . -depth -print | cpio -pdv ../squid-1.1.11
|
|
cd ../squid-1.1.11
|
|
patch < /tmp/diff-1.1.10-1.1.11
|
|
</verb>
|
|
After the patch has been applied, you must rebuild Squid from the
|
|
very beginning, i.e.:
|
|
<verb>
|
|
make realclean
|
|
./configure
|
|
make
|
|
make install
|
|
</verb>
|
|
Note that in later distributions (Squid-2), ``realclean'' has been changed
to ``distclean''.
|
|
|
|
<P>
|
|
If patch keeps asking for a file name, try adding ``-p0'':
|
|
<verb>
|
|
patch -p0 < filename
|
|
</verb>
|
|
|
|
<P>
|
|
If your <tt/patch/ program seems to complain or refuses to work,
|
|
you should get a more recent version, from the
|
|
<url url="ftp://ftp.gnu.ai.mit.edu/pub/gnu/"
|
|
name="GNU FTP site">, for example.
|
|
|
|
<sect1><em/configure/ options
|
|
<P>
|
|
The configure script can take numerous options. The most
|
|
useful is <tt/--prefix/, which installs Squid in a different directory.
|
|
The default installation directory is <em>/usr/local/squid/</em>. To
|
|
change the default, you could do:
|
|
<verb>
|
|
% cd squid-x.y.z
|
|
% ./configure --prefix=/some/other/directory/squid
|
|
</verb>
|
|
|
|
<P>
|
|
Type
|
|
<verb>
|
|
% ./configure --help
|
|
</verb>
|
|
to see all available options. You will need to specify some
|
|
of these options to enable or disable certain features.
|
|
Some options which are used often include:
|
|
|
|
<verb>
|
|
  --prefix=PREFIX         install architecture-independent files in PREFIX
                          [/usr/local/squid]
  --enable-dlmalloc[=LIB] Compile & use the malloc package by Doug Lea
  --enable-gnuregex       Compile GNUregex
  --enable-splaytree      Use SPLAY trees to store ACL lists
  --enable-xmalloc-debug  Do some simple malloc debugging
  --enable-xmalloc-debug-trace
                          Detailed trace of memory allocations
  --enable-xmalloc-statistics
                          Show malloc statistics in status page
  --enable-carp           Enable CARP support
  --enable-async-io       Do ASYNC disk I/O using threads
  --enable-icmp           Enable ICMP pinging
  --enable-delay-pools    Enable delay pools to limit bandwidth usage
  --enable-mem-gen-trace  Do trace of memory stuff
  --enable-useragent-log  Enable logging of User-Agent header
  --enable-kill-parent-hack
                          Kill parent on shutdown
  --enable-snmp           Enable SNMP monitoring
  --enable-time-hack      Update internal timestamp only once per second
  --enable-cachemgr-hostname[=hostname]
                          Make cachemgr.cgi default to this host
  --enable-arp-acl        Enable use of ARP ACL lists (ether address)
  --enable-htcp           Enable HTCP protocol
  --enable-forw-via-db    Enable Forw/Via database
  --enable-cache-digests  Use Cache Digests
                          see http://www.squid-cache.org/Doc/FAQ/FAQ-16.html
  --enable-err-language=lang
                          Select language for Error pages (see errors dir)
|
|
</verb>
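
<P>
For instance, a build that installs under a different prefix and turns
on SNMP monitoring and delay pools might be configured roughly as
follows. The prefix <em>/opt/squid</em> is only an example path, and you
should check <tt/--help/ for the options your version actually supports:
<verb>
% ./configure --prefix=/opt/squid --enable-snmp --enable-delay-pools
% make
% make install
</verb>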
|
|
|
|
<sect1>undefined reference to __inet_ntoa
|
|
|
|
<P>
|
|
by <url url="mailto:SarKev@topnz.ac.nz" name="Kevin Sartorelli">
|
|
and <url url="mailto:doering@usf.uni-kassel.de" name="Andreas Doering">.
|
|
|
|
<P>
|
|
You have probably installed BIND 8.x recently. There is a mismatch between
|
|
the header files and DNS library that Squid has found. There are a couple
|
|
of things you can try.
|
|
|
|
<P>
|
|
First, try adding <tt/-lbind/ to <em/XTRA_LIBS/ in <em>src/Makefile</em>.
|
|
If <tt/-lresolv/ is already there, remove it.
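
<P>
After the edit, the relevant line in <em>src/Makefile</em> might look
something like the sketch below. The contents of <em/XTRA_LIBS/ vary by
platform; the <tt/-lm/ entry here is just a placeholder for whatever
libraries configure already found on your system:
<verb>
XTRA_LIBS       = -lm -lbind
</verb>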
|
|
|
|
<P>
|
|
If that doesn't seem to work, edit your <em>arpa/inet.h</em> file and comment out the following:
|
|
|
|
<verb>
|
|
#define inet_addr __inet_addr
|
|
#define inet_aton __inet_aton
|
|
#define inet_lnaof __inet_lnaof
|
|
#define inet_makeaddr __inet_makeaddr
|
|
#define inet_neta __inet_neta
|
|
#define inet_netof __inet_netof
|
|
#define inet_network __inet_network
|
|
#define inet_net_ntop __inet_net_ntop
|
|
#define inet_net_pton __inet_net_pton
|
|
#define inet_ntoa __inet_ntoa
|
|
#define inet_pton __inet_pton
|
|
#define inet_ntop __inet_ntop
|
|
#define inet_nsap_addr __inet_nsap_addr
|
|
#define inet_nsap_ntoa __inet_nsap_ntoa
|
|
</verb>
|
|
|
|
<sect1>How can I get true DNS TTL info into Squid's IP cache?
|
|
<label id="dns-ttl-hack">
|
|
<P>
|
|
If you have source for BIND, you can modify it as indicated in the diff
|
|
below. It causes the global variable _dns_ttl_ to be set with the TTL
|
|
of the most recent lookup. Then, when you compile Squid, the configure
|
|
script will look for the _dns_ttl_ symbol in libresolv.a. If found,
|
|
dnsserver will return the TTL value for every lookup.
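
<P>
One way to check whether your resolver library exports the symbol before
running configure is to list its symbols with <em/nm/. The library path
below is an assumption; use the location where your patched BIND
installed libresolv.a:
<verb>
% nm /usr/local/lib/libresolv.a | grep _dns_ttl_
</verb>
If the patch was applied, the output should include a line naming
_dns_ttl_.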
|
|
<P>
|
|
This hack was contributed by
|
|
<url url="mailto:bne@CareNet.hu" name="Endre Balint Nagy">.
|
|
|
|
<verb>
|
|
diff -ru bind-4.9.4-orig/res/gethnamaddr.c bind-4.9.4/res/gethnamaddr.c
|
|
--- bind-4.9.4-orig/res/gethnamaddr.c Mon Aug 5 02:31:35 1996
|
|
+++ bind-4.9.4/res/gethnamaddr.c Tue Aug 27 15:33:11 1996
|
|
@@ -133,6 +133,7 @@
|
|
} align;
|
|
|
|
extern int h_errno;
|
|
+int _dns_ttl_;
|
|
|
|
#ifdef DEBUG
|
|
static void
|
|
@@ -223,6 +224,7 @@
|
|
host.h_addr_list = h_addr_ptrs;
|
|
haveanswer = 0;
|
|
had_error = 0;
|
|
+ _dns_ttl_ = -1;
|
|
while (ancount-- > 0 && cp < eom && !had_error) {
|
|
n = dn_expand(answer->buf, eom, cp, bp, buflen);
|
|
if ((n < 0) || !(*name_ok)(bp)) {
|
|
@@ -232,8 +234,11 @@
|
|
cp += n; /* name */
|
|
type = _getshort(cp);
|
|
cp += INT16SZ; /* type */
|
|
- class = _getshort(cp);
|
|
- cp += INT16SZ + INT32SZ; /* class, TTL */
|
|
+ class = _getshort(cp);
|
|
+ cp += INT16SZ; /* class */
|
|
+ if (qtype == T_A && type == T_A)
|
|
+ _dns_ttl_ = _getlong(cp);
|
|
+ cp += INT32SZ; /* TTL */
|
|
n = _getshort(cp);
|
|
cp += INT16SZ; /* len */
|
|
if (class != C_IN) {
|
|
</verb>
|
|
|
|
<P>
|
|
And here is a patch for BIND-8:
|
|
<verb>
|
|
*** src/lib/irs/dns_ho.c.orig Tue May 26 21:55:51 1998
|
|
--- src/lib/irs/dns_ho.c Tue May 26 21:59:57 1998
|
|
***************
|
|
*** 87,92 ****
|
|
--- 87,93 ----
|
|
#endif
|
|
|
|
extern int h_errno;
|
|
+ int _dns_ttl_;
|
|
|
|
/* Definitions. */
|
|
|
|
***************
|
|
*** 395,400 ****
|
|
--- 396,402 ----
|
|
pvt->host.h_addr_list = pvt->h_addr_ptrs;
|
|
haveanswer = 0;
|
|
had_error = 0;
|
|
+ _dns_ttl_ = -1;
|
|
while (ancount-- > 0 && cp < eom && !had_error) {
|
|
n = dn_expand(ansbuf, eom, cp, bp, buflen);
|
|
if ((n < 0) || !(*name_ok)(bp)) {
|
|
***************
|
|
*** 404,411 ****
|
|
cp += n; /* name */
|
|
type = ns_get16(cp);
|
|
cp += INT16SZ; /* type */
|
|
! class = ns_get16(cp);
|
|
! cp += INT16SZ + INT32SZ; /* class, TTL */
|
|
n = ns_get16(cp);
|
|
cp += INT16SZ; /* len */
|
|
if (class != C_IN) {
|
|
--- 406,416 ----
|
|
cp += n; /* name */
|
|
type = ns_get16(cp);
|
|
cp += INT16SZ; /* type */
|
|
! class = _getshort(cp);
|
|
! cp += INT16SZ; /* class */
|
|
! if (qtype == T_A && type == T_A)
|
|
! _dns_ttl_ = _getlong(cp);
|
|
! cp += INT32SZ; /* TTL */
|
|
n = ns_get16(cp);
|
|
cp += INT16SZ; /* len */
|
|
if (class != C_IN) {
|
|
</verb>
|
|
|
|
<sect1>My platform is BSD/OS or BSDI and I can't compile Squid
|
|
<label id="bsdi-compile">
|
|
<P>
|
|
<verb>
|
|
cache_cf.c: In function `parseConfigFile':
|
|
cache_cf.c:1353: yacc stack overflow before `token'
|
|
...
|
|
</verb>
|
|
|
|
<P>
|
|
You may need to upgrade your gcc installation to a more recent version.
|
|
Check your gcc version with
|
|
<verb>
|
|
gcc -v
|
|
</verb>
|
|
If it is earlier than 2.7.2, you might consider upgrading.
|
|
|
|
<P>
|
|
Alternatively, you can get pre-compiled Squid binaries for BSD/OS 2.1 at
|
|
the <url url="ftp://ftp.bsdi.com/patches/patches-2.1" name="BSD patches FTP site">,
|
|
patch <url url="ftp://ftp.bsdi.com/patches/patches-2.1/U210-019" name="U210-019">.
|
|
|
|
|
|
<sect1>Problems compiling <em/libmiscutil.a/ on Solaris
|
|
<P>
|
|
The following error occurs on Solaris systems using gcc when the Solaris C
|
|
compiler is not installed:
|
|
<verb>
|
|
/usr/bin/rm -f libmiscutil.a
|
|
/usr/bin/false r libmiscutil.a rfc1123.o rfc1738.o util.o ...
|
|
make[1]: *** [libmiscutil.a] Error 255
|
|
make[1]: Leaving directory `/tmp/squid-1.1.11/lib'
|
|
make: *** [all] Error 1
|
|
</verb>
|
|
Note on the second line the <bf>/usr/bin/false</bf>. This is supposed
|
|
to be a path to the <em/ar/ program. If <em/configure/ cannot find <em/ar/
|
|
on your system, then it substitutes <em/false/.
|
|
|
|
<P>
|
|
To fix this, you need to do one of the following:
|
|
<itemize>
|
|
<item>
|
|
Add <em>/usr/ccs/bin</em> to your PATH. This is where the <em/ar/
|
|
command should be. You need to install SUNWbtool if <em/ar/
|
|
is not there. Otherwise,
|
|
<item>
|
|
Install the <bf/binutils/ package from
|
|
<url url="ftp://prep.ai.mit.edu/pub/gnu/" name="the GNU FTP site">.
|
|
This package includes programs such as <em/ar/, <em/as/, and <em/ld/.
|
|
</itemize>
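
<P>
For the first option, a Bourne-shell session might look like the sketch
below. It assumes <em/ar/ lives in <em>/usr/ccs/bin</em> as described
above. Removing <em/config.cache/ is a precaution, in case configure
remembered the result of its earlier failed search:
<verb>
PATH=/usr/ccs/bin:$PATH
export PATH
type ar || echo "ar is still not on the PATH"
rm -f config.cache
# then re-run ./configure and make
</verb>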
|
|
|
|
<sect1>I have problems compiling Squid on Platform Foo.
|
|
<P>
|
|
Please check the
|
|
<url url="/platforms.html" name="page of platforms">
|
|
on which Squid is known to compile. Your problem might be listed
|
|
there together with a solution. If it isn't listed there, mail
|
|
us what you are trying, your Squid version, and the problems
|
|
you encounter.
|
|
|
|
<sect1>I see a lot of warnings while compiling Squid.
|
|
<P>
|
|
Warnings are usually not a big concern, and can be common with software
|
|
designed to operate on multiple platforms. If you feel like fixing
|
|
compile-time warnings, please do so and send us the patches.
|
|
|
|
|
|
<sect1>Building Squid on OS/2
|
|
<label id="building-os2">
|
|
<P>
|
|
by <url url="mailto:nazard@man-assoc.on.ca" name="Doug Nazar">
|
|
|
|
<P>
|
|
In order to compile Squid, you need to have a reasonable facsimile of a
|
|
Unix system installed. This includes <em/bash/, <em/make/, <em/sed/,
|
|
<em/emx/, various file utilities and a few more. I've set up a TVFS
|
|
drive that matches a Unix file system but this probably isn't strictly
|
|
necessary.
|
|
|
|
<P>
|
|
I made a few modifications to the pristine EMX 0.9d install.
|
|
<enum>
|
|
<item>
|
|
added defines for <em/strcasecmp()/ & <em/strncasecmp()/ to <em/string.h/
|
|
<item>
|
|
changed all occurrences of time_t to signed long instead
|
|
of unsigned long
|
|
<item>
|
|
hacked ld.exe
|
|
<enum>
|
|
<item>
|
|
to search for both xxxx.a and libxxxx.a
|
|
<item>
|
|
to produce the correct filename when using the
|
|
-Zexe option
|
|
</enum>
|
|
</enum>
|
|
|
|
<P>
|
|
You will need to run <em>scripts/convert.configure.to.os2</em> (in the
|
|
Squid source distribution) to modify
|
|
the configure script so that it can search for the various programs.
|
|
|
|
<P>
|
|
Next, you need to set a few environment variables (see EMX docs
|
|
for meaning):
|
|
<verb>
|
|
export EMXOPT="-h256 -c"
|
|
export LDFLAGS="-Zexe -Zbin -s"
|
|
</verb>
|
|
|
|
<P>
|
|
Now you are ready to configure squid:
|
|
<verb>
|
|
./configure
|
|
</verb>
|
|
<P>
|
|
Compile everything:
|
|
<verb>
|
|
make
|
|
</verb>
|
|
<P>
|
|
and finally, install:
|
|
<verb>
|
|
make install
|
|
</verb>
|
|
<P>
|
|
By default, this will install into <em>/usr/local/squid</em>. If you wish
|
|
to install somewhere else, see the <em/--prefix/ option for configure.
|
|
|
|
<P>
|
|
Now, don't forget to set EMXOPT before running squid each time. I
|
|
recommend using the -Y and -N options.
|
|
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>Installing and Running Squid
|
|
|
|
<sect1>How big of a system do I need to run Squid?
|
|
|
|
<P>
|
|
There are no hard-and-fast rules. The most important resource
|
|
for Squid is physical memory. Your processor does not need
|
|
to be ultra-fast. Your disk system will be the major bottleneck,
|
|
so fast disks are important for high-volume caches. Do not use
|
|
IDE disks if you can help it.
|
|
|
|
<P>
|
|
In late 1998, if you are buying a new machine for
|
|
a cache, I would recommend the following configuration:
|
|
<itemize>
|
|
<item>300 MHz Pentium II CPU
|
|
<item>512 MB RAM
|
|
<item>Five 9 GB UW-SCSI disks
|
|
</itemize>
|
|
Your system disk and logfile disk can probably be IDE without losing
|
|
any cache performance.
|
|
|
|
<P>
|
|
Also, see <url url="http://wwwcache.ja.net/servers/squids.html"
|
|
name="Squid Sizing for Intel Platforms"> by Martin Hamilton. This is a
|
|
very nice page summarizing system configurations people are using for
|
|
large Squid caches.
|
|
|
|
<sect1>How do I install Squid?
|
|
|
|
<P>
|
|
After <ref id="compiling" name="compiling Squid">, you can install it
|
|
with this simple command:
|
|
<verb>
|
|
% make install
|
|
</verb>
|
|
If you have enabled the
|
|
<ref id="using-icmp" name="ICMP features">
|
|
then you will also want to type
|
|
<verb>
|
|
% su
|
|
# make install-pinger
|
|
</verb>
|
|
|
|
<P>
|
|
After installing, you will want to edit and customize
|
|
the <em/squid.conf/ file. By default, this file is
|
|
located at <em>/usr/local/squid/etc/squid.conf</em>.
|
|
|
|
<P>
|
|
Also, a QUICKSTART guide has been included with the source
|
|
distribution. Please see the directory where you
|
|
unpacked the source archive.
|
|
|
|
<sect1>What does the <em/squid.conf/ file do?
|
|
<P>
|
|
The <em/squid.conf/ file defines the configuration for
|
|
<em/squid/. The configuration includes (but is not limited to) the
|
|
HTTP port number, the ICP request port number, incoming and outgoing
|
|
requests, information about firewall access, and various timeout
|
|
information.
|
|
|
|
<sect1>Do you have a <em/squid.conf/ example?
|
|
<P>
|
|
Yes, after you <tt/make install/, a sample <em/squid.conf/ file will
|
|
exist in the ``etc'' directory under the Squid installation directory.
|
|
|
|
The sample <em/squid.conf/ file contains comments explaining each
|
|
option.
|
|
<P>
|
|
|
|
<sect1>How do I start Squid?
|
|
<P>
|
|
After you've finished editing the configuration file, you can
|
|
start Squid for the first time. The procedure depends a little
|
|
bit on which version you are using.
|
|
|
|
<sect2>Squid version 2.X
|
|
<p>
|
|
First, you must create the swap directories. Do this by
|
|
running Squid with the -z option:
|
|
<verb>
|
|
% /usr/local/squid/bin/squid -z
|
|
</verb>
|
|
Once that completes, you can start Squid and try it out.
|
|
Probably the best thing to do is run it from your terminal
|
|
and watch the debugging output. Use this command:
|
|
<verb>
|
|
% /usr/local/squid/bin/squid -NCd1
|
|
</verb>
|
|
If everything is working okay, you will see the line:
|
|
<verb>
|
|
Ready to serve requests.
|
|
</verb>
|
|
If you want to run squid in the background, as a daemon process,
|
|
just leave off all options:
|
|
<verb>
|
|
% /usr/local/squid/bin/squid
|
|
</verb>
|
|
<p>
|
|
NOTE: depending on your configuration, you may need to start
|
|
squid as root.
|
|
|
|
<sect2>Squid version 1.1.X
|
|
|
|
<P>
|
|
With version 1.1.16 and later, you must first run Squid with the
|
|
<bf/-z/ option to create the cache swap directories.
|
|
<verb>
|
|
% /usr/local/squid/bin/squid -z
|
|
</verb>
|
|
Squid will exit when it finishes creating all of the directories.
|
|
Next you can start <em/RunCache/:
|
|
<verb>
|
|
% /usr/local/squid/bin/RunCache &
|
|
</verb>
|
|
|
|
<P>
|
|
For versions before 1.1.16 you should just start <em/RunCache/
|
|
immediately, instead of running <em/squid -z/ first.
|
|
|
|
<sect1>How do I start Squid automatically when the system boots?
|
|
|
|
<sect2>Squid Version 2.X
|
|
|
|
<P>
|
|
Squid-2 has a restart feature built in. This greatly simplifies
|
|
starting Squid and means that you don't need to use <em/RunCache/
|
|
or <em/inittab/. At the minimum, you only need to enter the
|
|
pathname to the Squid executable. For example:
|
|
<verb>
|
|
/usr/local/squid/bin/squid
|
|
</verb>
|
|
|
|
<P>
|
|
Squid will automatically background itself and then spawn
|
|
a child process. In your <em/syslog/ messages file, you
|
|
should see something like this:
|
|
<verb>
|
|
Sep 23 23:55:58 kitty squid[14616]: Squid Parent: child process 14617 started
|
|
</verb>
|
|
That means that process ID 14616 is the parent process which monitors the child
|
|
process (pid 14617). The child process is the one that does all of the
|
|
work. The parent process just waits for the child process to exit. If the
|
|
child process exits unexpectedly, the parent will automatically start another
|
|
child process. In that case, <em/syslog/ shows:
|
|
<verb>
|
|
Sep 23 23:56:02 kitty squid[14616]: Squid Parent: child process 14617 exited with status 1
|
|
Sep 23 23:56:05 kitty squid[14616]: Squid Parent: child process 14619 started
|
|
</verb>
|
|
|
|
<p>
|
|
If there is some problem, and Squid cannot start, the parent process will give up
|
|
after a while. Your <em/syslog/ will show:
|
|
<verb>
|
|
Sep 23 23:56:12 kitty squid[14616]: Exiting due to repeated, frequent failures
|
|
</verb>
|
|
When this happens you should check your <em/syslog/ messages and
|
|
<em/cache.log/ file for error messages.
|
|
|
|
<p>
|
|
When you look at a process (<em/ps/ command) listing, you'll see two squid processes:
|
|
<verb>
|
|
24353 ?? Ss 0:00.00 /usr/local/squid/bin/squid
|
|
24354 ?? R 0:03.39 (squid) (squid)
|
|
</verb>
|
|
The first is the parent process, and the child process is the one called ``(squid)''.
|
|
Note that if you accidentally kill the parent process, the child process will not
|
|
notice.
|
|
|
|
<p>
|
|
If you want to run Squid from your terminal and prevent it from
|
|
backgrounding and spawning a child process, use the <em/-N/ command
|
|
line option.
|
|
<verb>
|
|
/usr/local/squid/bin/squid -N
|
|
</verb>
|
|
|
|
<sect2>Squid Version 1.1.X
|
|
|
|
<sect3>From inittab
|
|
<P>
|
|
On systems which have an <em>/etc/inittab</em> file (Digital Unix,
|
|
Solaris, IRIX, HP-UX, Linux), you can add a line like this:
|
|
<verb>
|
|
sq:3:respawn:/usr/local/squid/bin/squid.sh < /dev/null >> /tmp/squid.log 2>&1
|
|
</verb>
|
|
We recommend using a <em/squid.sh/ shell script, but you could instead call
|
|
Squid directly. A sample <em/squid.sh/ script is shown below:
|
|
<verb>
|
|
#!/bin/sh
|
|
C=/usr/local/squid
|
|
PATH=/usr/bin:$C/bin
|
|
TZ=PST8PDT
|
|
export PATH TZ
|
|
|
|
notify="root"
|
|
cd $C
|
|
umask 022
|
|
sleep 10
|
|
while [ -f /tmp/nosquid ]; do
|
|
sleep 1
|
|
done
|
|
/usr/bin/tail -20 $C/logs/cache.log \
|
|
| Mail -s "Squid restart on `hostname` at `date`" $notify
|
|
exec bin/squid -CYs
|
|
</verb>
|
|
|
|
<sect3>From rc.local
|
|
<P>
|
|
On BSD-ish systems, you will need to start Squid from the ``rc'' files,
|
|
usually <em>/etc/rc.local</em>. For example:
|
|
<verb>
|
|
if [ -f /usr/local/squid/bin/RunCache ]; then
|
|
echo -n ' Squid'
|
|
(/usr/local/squid/bin/RunCache &)
|
|
fi
|
|
</verb>
|
|
|
|
<sect3>From init.d
|
|
<P>
|
|
Some people may want to use the ``init.d'' startup system.
|
|
If you start Squid (or RunCache) from an ``init.d'' script, then you
|
|
should probably use <em/nohup/, e.g.:
|
|
<verb>
|
|
nohup squid -sY $conf >> $logdir/squid.out 2>&1
|
|
</verb>
|
|
Also, you may need to add a line to trap certain signals
|
|
and prevent them from being sent to the Squid process.
|
|
Add this line at the top of your script:
|
|
<verb>
|
|
trap '' 1 2 3 18
|
|
</verb>
|
|
|
|
<sect1>How do I tell if Squid is running?
|
|
<P>
|
|
You can use the <em/client/ program:
|
|
<verb>
|
|
% client http://www.netscape.com/ > test
|
|
</verb>
|
|
<P>
|
|
There are other command-line HTTP client programs available
|
|
as well. Two that you may find useful are
|
|
<url url="ftp://gnjilux.cc.fer.hr/pub/unix/util/wget/"
|
|
name="wget">
|
|
and
|
|
<url url="ftp://ftp.internatif.org/pub/unix/echoping/"
|
|
name="echoping">.
|
|
|
|
<P>
|
|
Another way is to use Squid itself to see if it can signal a running
|
|
Squid process:
|
|
<verb>
|
|
% squid -k check
|
|
</verb>
|
|
And then check the shell's exit status variable.
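<P>
A minimal sketch of how that check might be scripted in a Bourne-style
shell. The function name <em/squid_is_running/ is an illustrative
assumption, and the default path is simply the install location used
elsewhere in this FAQ; adjust both to suit your system.

```shell
#!/bin/sh
# Hypothetical helper: succeed if "squid -k check" can signal a running
# Squid. Pass the path to your squid binary as the first argument; the
# default below is the install location assumed elsewhere in this FAQ.
squid_is_running() {
    squid_bin="${1:-/usr/local/squid/bin/squid}"
    # The exit status of "-k check" is what tells us the result,
    # so any output is discarded.
    "$squid_bin" -k check >/dev/null 2>&1
}

if squid_is_running; then
    echo "Squid is running"
else
    echo "Squid is not running (or could not be signalled)"
fi
```

A non-zero status only says that the check failed. It does not say
why, so the log files are still the place to look next.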
|
|
|
|
<P>
|
|
Also, check the log files, most importantly the <em/access.log/ and
|
|
<em/cache.log/ files.
|
|
|
|
<sect1><em/squid/ command line options
|
|
<P>
|
|
These are the command line options for <bf/Squid-2/:
|
|
<descrip>
|
|
<tag/-a/
|
|
Specify an alternate port number for incoming HTTP requests.
|
|
Useful for testing a configuration file on a non-standard port.
|
|
<tag/-d/
|
|
Debugging level for ``stderr'' messages. If you use this
|
|
option, then debugging messages up to the specified level will
|
|
also be written to stderr.
|
|
<tag/-f/
|
|
Specify an alternate <em/squid.conf/ file instead of the
|
|
pathname compiled into the executable.
|
|
<tag/-h/
|
|
Prints the usage and help message.
|
|
<tag/-k reconfigure/
|
|
Sends a <em/HUP/ signal, which causes Squid to re-read
|
|
its configuration files.
|
|
<tag/-k rotate/
|
|
Sends an <em/USR1/ signal, which causes Squid to
|
|
rotate its log files. Note, if <em/logfile_rotate/
|
|
is set to zero, Squid still closes and re-opens
|
|
all log files.
|
|
<tag/-k shutdown/
|
|
Sends a <em/TERM/ signal, which causes Squid to
|
|
wait briefly for current connections to finish and then
|
|
exit. The amount of time to wait is specified with
|
|
<em/shutdown_lifetime/.
|
|
<tag/-k interrupt/
|
|
Sends an <em/INT/ signal, which causes Squid to
|
|
shut down immediately, without waiting for
|
|
current connections.
|
|
<tag/-k kill/
|
|
Sends a <em/KILL/ signal, which causes the Squid
|
|
process to exit immediately, without closing
|
|
any connections or log files. Use this only
|
|
as a last resort.
|
|
<tag/-k debug/
|
|
Sends an <em/USR2/ signal, which causes Squid
|
|
to generate full debugging messages until the
|
|
next <em/USR2/ signal is received. Obviously
|
|
very useful for debugging problems.
|
|
<tag/-k check/
|
|
Sends a ``<em/ZERO/'' signal to the Squid process.
|
|
This simply checks whether or not the process
|
|
is actually running.
|
|
<tag/-s/
|
|
Send debugging (level 0 only) messages to syslog.
|
|
<tag/-u/
|
|
Specify an alternate port number for ICP messages.
|
|
Useful for testing a configuration file on a non-standard port.
|
|
<tag/-v/
|
|
Prints the Squid version.
|
|
<tag/-z/
|
|
Creates disk swap directories. You must use this option when
|
|
installing Squid for the first time, or when you add or
|
|
modify the <em/cache_dir/ configuration.
|
|
<tag/-D/
|
|
Do not make initial DNS tests. Normally, Squid looks up
|
|
some well-known DNS hostnames to ensure that your DNS
|
|
name resolution service is working properly.
|
|
<tag/-F/
|
|
If the <em/swap.state/ logs are clean, then the cache is
|
|
rebuilt in the ``foreground'' before any requests are
|
|
served. This will decrease the time required to rebuild
|
|
the cache, but HTTP requests will not be satisfied during
|
|
this time.
|
|
<tag/-N/
|
|
Do not automatically become a background daemon process.
|
|
<tag/-R/
|
|
Do not set the SO_REUSEADDR option on sockets.
|
|
<tag/-V/
|
|
Enable virtual host support for the httpd-accelerator mode.
|
|
This is identical to writing <em/httpd_accel_host virtual/
|
|
in the config file.
|
|
<tag/-X/
|
|
Enable full debugging while parsing the config file.
|
|
<tag/-Y/
|
|
Return ICP_OP_MISS_NOFETCH instead of ICP_OP_MISS while
|
|
the <em/swap.state/ file is being read. If your cache has
|
|
mostly child caches which use ICP, this will allow your
|
|
cache to rebuild faster.
|
|
</descrip>
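<P>
The <em/-k/ actions listed above each correspond to one signal. As a
rough sketch, the mapping can be written down as a small shell function.
The function name <em/squid_k_signal/ is hypothetical; the signal names
are copied from the list above.

```shell
#!/bin/sh
# Hypothetical helper: print the signal that "squid -k ACTION" sends,
# following the option list in this FAQ.
squid_k_signal() {
    case "$1" in
        reconfigure) echo HUP ;;
        rotate)      echo USR1 ;;
        shutdown)    echo TERM ;;
        interrupt)   echo INT ;;
        kill)        echo KILL ;;
        debug)       echo USR2 ;;
        check)       echo 0 ;;    # signal 0 only tests that the process exists
        *)           echo "unknown action: $1" >&2; return 1 ;;
    esac
}

squid_k_signal rotate    # prints USR1
```

Using <em/squid -k/ is usually more convenient than sending the signal
by hand, because Squid finds the process ID for you.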
|
|
|
|
<sect1>How do I see how Squid works?
|
|
<P>
|
|
<itemize>
|
|
<item>
|
|
Check the <em/cache.log/ file in your logs directory. It logs
|
|
interesting (and boring) things as a part of its normal operation.
|
|
<item>
|
|
Install and use the
|
|
<ref id="cachemgr-section" name="Cache Manager">.
|
|
</itemize>
|
|
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>Configuration issues
|
|
|
|
<sect1>How do I join a cache hierarchy?
|
|
<P>
|
|
To place your cache in a hierarchy, use the <tt/cache_host/
|
|
directive in <em/squid.conf/ to specify the parent and sibling
|
|
nodes.
|
|
|
|
<P>
|
|
For example, the following <em/squid.conf/ file on
|
|
<tt/childcache.example.com/ configures its cache to retrieve
|
|
data from one parent cache and two sibling caches:
|
|
|
|
<verb>
|
|
# squid.conf - On the host: childcache.example.com
|
|
#
|
|
# Format is: hostname type http_port udp_port
|
|
#
|
|
cache_host parentcache.example.com parent 3128 3130
|
|
cache_host childcache2.example.com sibling 3128 3130
|
|
cache_host childcache3.example.com sibling 3128 3130
|
|
</verb>
|
|
|
|
The <tt/cache_host_domain/ directive allows you to specify that
|
|
certain caches are siblings or parents only for certain domains:
|
|
|
|
<verb>
|
|
# squid.conf - On the host: sv.cache.nlanr.net
|
|
#
|
|
# Format is: hostname type http_port udp_port
|
|
#
|
|
|
|
cache_host electraglide.geog.unsw.edu.au parent 3128 3130
|
|
cache_host cache1.nzgate.net.nz parent 3128 3130
|
|
cache_host pb.cache.nlanr.net parent 3128 3130
|
|
cache_host it.cache.nlanr.net parent 3128 3130
|
|
cache_host sd.cache.nlanr.net parent 3128 3130
|
|
cache_host uc.cache.nlanr.net sibling 3128 3130
|
|
cache_host bo.cache.nlanr.net sibling 3128 3130
|
|
cache_host_domain electraglide.geog.unsw.edu.au .au
|
|
cache_host_domain cache1.nzgate.net.nz .au .aq .fj .nz
|
|
cache_host_domain pb.cache.nlanr.net .uk .de .fr .no .se .it
|
|
cache_host_domain it.cache.nlanr.net .uk .de .fr .no .se .it
|
|
cache_host_domain sd.cache.nlanr.net .mx .za .mu .zm
|
|
</verb>
|
|
|
|
The configuration above indicates that the cache will use
|
|
<tt/pb.cache.nlanr.net/ and <tt/it.cache.nlanr.net/
|
|
for domains uk, de, fr, no, se and it, <tt/sd.cache.nlanr.net/
|
|
for domains mx, za, mu and zm, and <tt/cache1.nzgate.net.nz/
|
|
for domains au, aq, fj, and nz. The au domain is also served by <tt/electraglide.geog.unsw.edu.au/.
|
|
|
|
<sect1>How do I join NLANR's cache hierarchy?
|
|
<P>
|
|
We have a simple set of
|
|
<url url="http://www.ircache.net/Cache/joining.html"
|
|
name="guidelines for joining">
|
|
the NLANR cache hierarchy.
|
|
|
|
<sect1>Why should I want to join NLANR's cache hierarchy?
|
|
<P>
|
|
The NLANR hierarchy can provide you with an initial source for parent or
|
|
sibling caches. Joining the NLANR global cache system will frequently
|
|
improve the performance of your caching service.
|
|
|
|
<sect1>How do I register my cache with NLANR's registration service?
|
|
<P>
|
|
Just enable these options in your <em/squid.conf/ and you'll be
|
|
registered:
|
|
<verb>
|
|
cache_announce 24
|
|
announce_to sd.cache.nlanr.net:3131
|
|
</verb>
|
|
|
|
<em/NOTE:/ announcing your cache <bf/is not/ the same thing as
|
|
joining the NLANR cache hierarchy.
|
|
You can join the NLANR cache hierarchy without registering, and
|
|
you can register without joining the NLANR cache hierarchy.
|
|
<P>
|
|
|
|
<sect1>How do I find other caches close to me and arrange parent/child/sibling relationships with them?
|
|
<P>
|
|
Visit the NLANR cache
|
|
<url url="http://www.ircache.net/Cache/Tracker/"
|
|
name="registration database">
|
|
to discover other caches near you. Keep in mind that just because
|
|
a cache is registered in the database <bf/does not/ mean they
|
|
are willing to be your parent/sibling/child. But it can't hurt to ask...
|
|
<P>
|
|
|
|
<sect1>My cache registration is not appearing in the Tracker database.
|
|
|
|
<P>
|
|
<itemize>
|
|
<item>
|
|
Your site will not be listed if your cache IP address does not have
|
|
a DNS PTR record. If we can't map the IP address back to a domain
|
|
name, it will be listed as ``Unknown.''
|
|
<item>
|
|
The registration messages are sent with UDP. We may not be receiving
|
|
your announcement message due to firewalls which block UDP, or
|
|
dropped packets due to congestion.
|
|
</itemize>
|
|
|
|
<sect1>What is the httpd-accelerator mode?
|
|
<P>
|
|
This entry has been moved to <ref id="what-is-httpd-accelerator" name="a different section">.
|
|
|
|
<sect1>How do I configure Squid to work behind a firewall?
|
|
<p>
|
|
<em>Note: The information here is current for version 2.2.</em>
|
|
|
|
<P>
|
|
If you are behind a firewall then you can't make direct connections
|
|
to the outside world, so you <bf/must/ use a
|
|
parent cache. Squid doesn't use ICP queries for a request if it's
|
|
behind a firewall or if there is only one parent.
|
|
|
|
<P>
|
|
You can use the <tt/never_direct/ access list in
|
|
<em/squid.conf/ to specify which requests must be forwarded to
|
|
your parent cache outside the firewall. For example, if Squid
|
|
can connect directly to all servers that end with <em/mydomain.com/, but
|
|
must use the parent for all others, you would write:
|
|
<verb>
|
|
acl INSIDE dstdomain mydomain.com
|
|
never_direct deny INSIDE
|
|
</verb>
|
|
Note that the outside domains will not match the <em/INSIDE/
|
|
acl. When there are no matches, the default action is
|
|
the opposite of the last action. It's as if there is
|
|
an implicit <em/never_direct allow all/ as the final rule.
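<P>
The ``opposite of the last action'' rule can be illustrated with a small
shell sketch. This is a simplified model, not Squid's actual matching
code: the function name <em/never_direct_decision/ is hypothetical, each
rule is written as an action followed by a domain suffix, and matching
is a plain suffix comparison.

```shell
#!/bin/sh
# Simplified model of access list evaluation (an illustration only).
# Usage: never_direct_decision HOST "ACTION SUFFIX" ...
# The first rule whose suffix matches HOST decides. If nothing matches,
# the result is the opposite of the last rule's action.
never_direct_decision() {
    host=$1
    shift
    last=""
    for rule in "$@"; do
        action=${rule%% *}     # text before the first space
        suffix=${rule#* }      # text after the first space
        last=$action
        case "$host" in
            *"$suffix") echo "$action"; return 0 ;;
        esac
    done
    # No rule matched. With no rules at all we assume "deny" here,
    # meaning never_direct does not apply (an assumption of this sketch).
    if [ "$last" = "deny" ]; then echo allow; else echo deny; fi
}

never_direct_decision www.mydomain.com "deny mydomain.com"   # prints deny
never_direct_decision www.example.org  "deny mydomain.com"   # prints allow
```

Here ``deny'' means the request may go direct, and ``allow'' means it
must be sent to the parent cache.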
|
|
|
|
<p>
|
|
You could also specify internal servers by IP address
|
|
<verb>
|
|
acl INSIDE_IP dst 1.2.3.4/24
|
|
never_direct deny INSIDE_IP
|
|
</verb>
|
|
Note, however, that when you use IP addresses, Squid must
|
|
perform a DNS lookup to convert URL hostnames to an
|
|
address. Your internal DNS servers may not be able to
|
|
look up external domains.
|
|
|
|
<p>
|
|
If you use <em/never_direct/ and you have multiple parent caches,
|
|
then you probably will want to mark one of them as a default
|
|
choice in case Squid can't decide which one to use. That is
|
|
done with the <em/default/ keyword on a <em/cache_peer/
|
|
line. For example:
|
|
<verb>
|
|
cache_peer xyz.mydomain.com parent 3128 0 default
|
|
</verb>
|
|
|
|
<sect1>How do I configure Squid to forward all requests to another proxy?
|
|
<p>
|
|
<em>Note: The information here is current for version 2.2.</em>
|
|
<p>
|
|
First, you need to give Squid a parent cache. Second, you need
|
|
to tell Squid it cannot connect directly to origin servers. This is done
|
|
with three configuration file lines:
|
|
<verb>
|
|
cache_peer parentcache.foo.com parent 3128 0 no-query default
|
|
acl all src 0.0.0.0/0.0.0.0
|
|
never_direct allow all
|
|
</verb>
|
|
Note, with this configuration, if the parent cache fails or becomes
|
|
unreachable, then every request will result in an error message.
|
|
|
|
<p>
|
|
In case you want to be able to use direct connections when all the
|
|
parents go down you should use a different approach:
|
|
<verb>
|
|
cache_peer parentcache.foo.com parent 3128 0 no-query
|
|
prefer_direct off
|
|
</verb>
|
|
The default behaviour of Squid in the absence of positive ICP, HTCP, etc
|
|
replies is to connect to the origin server instead of using parents.
|
|
The <em>prefer_direct off</em> directive tells Squid to try parents first.
|
|
|
|
<sect1>I have <em/dnsserver/ processes that aren't being used, should I lower the number in <em/squid.conf/?
|
|
<P>
|
|
The <em/dnsserver/ processes are used by <em/squid/ because the <tt/gethostbyname(3)/ library routines used to
|
|
convert web site names to their Internet addresses
|
|
block until the function returns (i.e., the process that calls
|
|
it has to wait for a reply). Since there is only one <em/squid/
|
|
process, everyone who uses the cache would have to wait each
|
|
time the routine was called. This is why the <em/dnsserver/ is
|
|
a separate process, so that these processes can block,
|
|
without causing blocking in <em/squid/.
|
|
|
|
<P>
|
|
It's very important that there are enough <em/dnsserver/
|
|
processes to cope with every access you will need, otherwise
|
|
<em/squid/ will stop occasionally. A good rule of thumb is to
|
|
make sure you have at least the maximum number of dnsservers
|
|
<em/squid/ has <bf/ever/ needed on your system,
|
|
and probably add two to be on the safe side. In other words, if
|
|
you have only ever seen at most three <em/dnsserver/ processes
|
|
in use, make at least five. Remember that a <em/dnsserver/ is
|
|
small and, if unused, will be swapped out.
|
|
|
|
<sect1>My <em/dnsserver/ average/median service time seems high, how can I reduce it?
|
|
|
|
<P>
|
|
First, find out if you have enough <em/dnsserver/ processes running by
|
|
looking at the Cachemanager <em/dns/ output. Ideally, you should see
|
|
that the first <em/dnsserver/ handles a lot of requests, the second one
|
|
less than the first, etc. The last <em/dnsserver/ should have serviced
|
|
relatively few requests. If there is not an obvious decreasing trend, then
|
|
you need to increase the number of <em/dns_children/ in the configuration
|
|
file. If the last <em/dnsserver/ has zero requests, then you definitely
|
|
have enough.
|
|
|
|
<P>
|
|
Another factor which affects the <em/dnsserver/ service time is the
|
|
proximity of your DNS resolver. Normally we do not recommend running
|
|
Squid and <em/named/ on the same host. Instead you should try to use a
|
|
DNS resolver (<em/named/) on a different host, but on the same LAN.
|
|
If your DNS traffic must pass through one or more routers, this could
|
|
be causing unnecessary delays.
|
|
|
|
<sect1>How can I easily change the default HTTP port?
|
|
<P>
|
|
Before you run the configure script, simply set the <em/CACHE_HTTP_PORT/
|
|
environment variable.
|
|
<verb>
|
|
setenv CACHE_HTTP_PORT 8080
|
|
./configure
|
|
make
|
|
make install
|
|
</verb>
|
|
|
|
<sect1>Is it possible to control how big each <em/cache_dir/ is?
|
|
|
|
<P>
|
|
With Squid-1.1 it is NOT possible. Each <em/cache_dir/ is assumed
|
|
to be the same size. The <em/cache_swap/ setting defines the size of
|
|
all <em/cache_dir/'s taken together. If you have N <em/cache_dir/'s
|
|
then each one will hold <em/cache_swap/ ÷ N Megabytes.
|
|
|
|
<sect1>What <em/cache_dir/ size should I use?
|
|
<p>
|
|
Most people have a disk partition dedicated to the Squid cache.
|
|
You don't want to use the entire partition size. You have to leave
|
|
some extra room. Currently, Squid is not very tolerant of running
|
|
out of disk space.
|
|
<p>
|
|
Let's say you have a 9GB disk.
|
|
Remember that disk manufacturers lie about the space available.
|
|
A so-called 9GB disk usually results in about 8.5GB of raw, usable space.
|
|
First, put a filesystem on it, and mount
|
|
it. Then check the ``available space'' with your <em/df/ program.
|
|
Note that you lose some disk space to filesystem overheads, like superblocks,
|
|
inodes, and directory entries. Also note that Unix normally keeps
|
|
10% free for itself. So with a 9GB disk, you're probably down to
|
|
about 8GB after formatting.
|
|
|
|
<p>
|
|
Next, I suggest taking off another 10%
|
|
or so for Squid overheads, and a "safe buffer." Squid normally puts
|
|
its <em/swap.state/ files in each cache directory. These grow in size
|
|
until you rotate the logs, or restart squid.
|
|
Also note that Squid performs better when there is
|
|
more free space. So if performance is important to you, then take off
|
|
even more space. Typically, for a 9GB disk, I recommend a <em/cache_dir/
|
|
setting of 6000 to 7500 Megabytes:
|
|
<verb>
|
|
cache_dir ... 7000 16 256
|
|
</verb>
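<P>
The arithmetic above can be sketched in shell. The function name
<em/suggest_cache_dir_mb/ and the default 10% reserve are assumptions
taken from the rule of thumb in this answer, not settings that Squid
itself provides.

```shell
#!/bin/sh
# Hypothetical helper: given the space available on the cache partition
# (in megabytes, as reported after formatting), subtract a reserve for
# Squid overheads and a safety buffer.
# Usage: suggest_cache_dir_mb AVAILABLE_MB [RESERVE_PERCENT]
suggest_cache_dir_mb() {
    avail_mb=$1
    reserve_pct=${2:-10}
    echo $(( $avail_mb * (100 - $reserve_pct) / 100 ))
}

suggest_cache_dir_mb 8000       # prints 7200
suggest_cache_dir_mb 8000 20    # prints 6400
```

For the 9GB disk discussed above (about 8GB available after
formatting), a 10% reserve gives 7200 Megabytes, which falls inside the
recommended 6000 to 7500 range. Use a larger reserve if performance
matters more than capacity.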
|
|
|
|
<p>
|
|
It's better to start out conservative. After the cache becomes full,
|
|
look at the disk usage. If you think there is plenty of unused space,
|
|
then increase the <em/cache_dir/ setting a little.
|
|
|
|
<p>
|
|
If you're getting ``disk full'' write errors, then you definitely need
|
|
to decrease your cache size.
|
|
|
|
<sect1>I'm adding a new <em/cache_dir/. Will I lose my cache?
|
|
<P>
|
|
With Squid-1.1, yes, you will lose your cache. This is because
|
|
version 1.1 uses a simplistic algorithm to distribute files
|
|
between cache directories.
|
|
|
|
<P>
|
|
With Squid-2, you will not lose your existing cache.
|
|
You can add and delete <em/cache_dir/'s without affecting
|
|
any of the others.
|
|
|
|
<sect1>Squid and <em/http-gw/ from the TIS toolkit.
|
|
|
|
<P>
|
|
Several people on both the <em/fwtk-users/ and the
|
|
<em/squid-users/ mailing lists have asked
|
|
about using Squid in combination with http-gw from the
|
|
<url url="http://www.tis.com/"
|
|
name="TIS toolkit">.
|
|
The most elegant way in my opinion is to run an internal Squid caching
|
|
proxyserver which handles client requests and let this server forward
|
|
its requests to the http-gw running on the firewall. Cache hits won't
|
|
need to be handled by the firewall.
|
|
|
|
<P>
|
|
In this example Squid runs on the same server as the http-gw, Squid uses
|
|
port 8000 and http-gw uses port 8080 (web). The local domain is <em/home.nl/.
|
|
|
|
<sect2>Firewall configuration:
|
|
|
|
<P>
|
|
Either run http-gw as a daemon from the <em>/etc/rc.d/rc.local</em> (Linux
|
|
Slackware):
|
|
<verb>
|
|
exec /usr/local/fwtk/http-gw -daemon 8080
|
|
</verb>
|
|
or run it from inetd like this:
|
|
<verb>
|
|
web stream tcp nowait.100 root /usr/local/fwtk/http-gw http-gw
|
|
</verb>
|
|
I increased the watermark to 100 because a lot of people run into
|
|
problems with the default value.
|
|
|
|
<P>
|
|
Make sure you have at least the following line in
|
|
<em>/usr/local/etc/netperm-table</em>:
|
|
<verb>
|
|
http-gw: hosts 127.0.0.1
|
|
</verb>
|
|
You could add the IP-address of your own workstation to this rule and
|
|
make sure the http-gw by itself works, like:
|
|
<verb>
|
|
http-gw: hosts 127.0.0.1 10.0.0.1
|
|
</verb>
|
|
|
|
<sect2>Squid configuration:
|
|
|
|
<P>
|
|
The following settings are important:
|
|
|
|
<verb>
|
|
http_port 8000
|
|
icp_port 0
|
|
|
|
cache_host localhost.home.nl parent 8080 0 default
|
|
acl HOME dstdomain .home.nl
|
|
never_direct deny HOME
|
|
</verb>
|
|
This tells Squid to use the parent for all domains other than <em/home.nl/.
|
|
Below, <em/access.log/ entries show what happens if you do a reload on the
|
|
Squid-homepage:
|
|
|
|
<verb>
|
|
872739961.631 1566 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://www.squid-cache.org/ - DEFAULT_PARENT/localhost.home.nl -
|
|
872739962.976 1266 10.0.0.21 TCP_CLIENT_REFRESH/304 88 GET http://www.nlanr.net/Images/cache_now.gif - DEFAULT_PARENT/localhost.home.nl -
|
|
872739963.007 1299 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://www.squid-cache.org/Icons/squidnow.gif - DEFAULT_PARENT/localhost.home.nl -
|
|
872739963.061 1354 10.0.0.21 TCP_CLIENT_REFRESH/304 83 GET http://www.squid-cache.org/Icons/Squidlogo2.gif - DEFAULT_PARENT/localhost.home.nl
|
|
</verb>
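<P>
When reading entries like these, it can help to count how often each
result code appears. The following is a small sketch that assumes the
log format shown above, where the fourth whitespace-separated field is
the result code followed by a slash and the HTTP status. The function
name <em/tally_results/ is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read access.log lines on standard input and print
# "COUNT CODE" for each result code, sorted by code.
tally_results() {
    awk '{ split($4, part, "/"); count[part[1]]++ }
         END { for (code in count) print count[code], code }' | sort -k2
}

# The four sample entries from this answer.
tally_results <<'EOF'
872739961.631   1566 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://www.squid-cache.org/ - DEFAULT_PARENT/localhost.home.nl -
872739962.976   1266 10.0.0.21 TCP_CLIENT_REFRESH/304 88 GET http://www.nlanr.net/Images/cache_now.gif - DEFAULT_PARENT/localhost.home.nl -
872739963.007   1299 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://www.squid-cache.org/Icons/squidnow.gif - DEFAULT_PARENT/localhost.home.nl -
872739963.061   1354 10.0.0.21 TCP_CLIENT_REFRESH/304 83 GET http://www.squid-cache.org/Icons/Squidlogo2.gif - DEFAULT_PARENT/localhost.home.nl
EOF
```

For the sample entries this reports two ERR_CLIENT_ABORT and two
TCP_CLIENT_REFRESH results.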
|
|
|
|
<P>
|
|
http-gw entries in syslog:
|
|
|
|
<verb>
|
|
Aug 28 02:46:00 memo http-gw[2052]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
|
|
Aug 28 02:46:00 memo http-gw[2052]: log host=localhost/127.0.0.1 protocol=HTTP cmd=dir dest=www.squid-cache.org path=/
|
|
Aug 28 02:46:01 memo http-gw[2052]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=1
|
|
Aug 28 02:46:01 memo http-gw[2053]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
|
|
Aug 28 02:46:01 memo http-gw[2053]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.squid-cache.org path=/Icons/Squidlogo2.gif
|
|
Aug 28 02:46:01 memo http-gw[2054]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
|
|
Aug 28 02:46:01 memo http-gw[2054]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.squid-cache.org path=/Icons/squidnow.gif
|
|
Aug 28 02:46:01 memo http-gw[2055]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
|
|
Aug 28 02:46:01 memo http-gw[2055]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.nlanr.net path=/Images/cache_now.gif
|
|
Aug 28 02:46:02 memo http-gw[2055]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=1
|
|
Aug 28 02:46:03 memo http-gw[2053]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=2
|
|
Aug 28 02:46:04 memo http-gw[2054]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=3
|
|
</verb>
|
|
|
|
|
|
<P>
|
|
To summarize:
|
|
|
|
<P>
|
|
Advantages:
|
|
<itemize>
|
|
<item>
|
|
http-gw allows you to selectively block ActiveX and Java, and its
|
|
primary design goal is security.
|
|
<item>
|
|
The firewall doesn't need to run large applications like Squid.
|
|
<item>
|
|
The internal Squid-server still gives you the benefit of caching.
|
|
</itemize>
|
|
|
|
<P>
|
|
Disadvantages:
|
|
<itemize>
|
|
<item>
|
|
The internal Squid proxyserver can't (and shouldn't) work with other
|
|
parent or neighbor caches.
|
|
<item>
|
|
Initial requests are slower because they go through http-gw; http-gw
|
|
also does reverse lookups. Run a nameserver on the firewall or use an
|
|
internal nameserver.
|
|
</itemize>
|
|
|
|
|
|
<quote>
|
|
--<url url="mailto:RvdOever@baan.nl" name="Rodney van den Oever">
|
|
</quote>
|
|
|
|
|
|
<sect1>What is ``HTTP_X_FORWARDED_FOR''? Why does squid provide it to WWW servers, and how can I stop it?
|
|
|
|
<P>
|
|
When a proxy-cache is used, a server does not see the connection
|
|
coming from the originating client. Many people like to implement
|
|
access controls based on the client address.
|
|
To accommodate these people, Squid adds its own request header
|
|
called "X-Forwarded-For" which looks like this:
|
|
<verb>
|
|
X-Forwarded-For: 128.138.243.150, unknown, 192.52.106.30
|
|
</verb>
|
|
Entries are always IP addresses, or the word <em/unknown/ if the address
|
|
could not be determined or if it has been disabled with the
|
|
<em/forwarded_for/ configuration option.
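<P>
As a rough sketch, a script on the server side could split the header
into its entries like this. The function name <em/xff_entries/ is
hypothetical. By convention each proxy appends to the list, so the first
entry is the one closest to the original client.

```shell
#!/bin/sh
# Hypothetical helper: print one X-Forwarded-For entry per line.
# Accepts either the full header line or just its value.
xff_entries() {
    printf '%s\n' "$1" |
        sed -e 's/^[Xx]-[Ff]orwarded-[Ff]or:[ ]*//' |   # drop the header name
        tr ',' '\n' |                                   # one entry per line
        sed -e 's/^ *//' -e 's/ *$//'                   # trim blanks
}

header='X-Forwarded-For: 128.138.243.150, unknown, 192.52.106.30'
xff_entries "$header"                 # prints the three entries
xff_entries "$header" | head -n 1     # prints 128.138.243.150
```

Treat the result as a hint only, because the header is supplied by the
client side and is trivial to forge.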
|
|
|
|
<P>
|
|
We must note that access controls based on this header are extremely
|
|
weak and simple to fake. Anyone may hand-enter a request with any IP
|
|
address whatsoever. This is perhaps the reason why client IP addresses
|
|
have been omitted from the HTTP/1.1 specification.
|
|
|
|
<sect1>Can Squid anonymize HTTP requests?
|
|
<p>
|
|
Yes, it can; however, the way of doing it has changed from earlier versions
|
|
of squid. As of squid-2.2 a more customisable method has been introduced.
|
|
Please follow the instructions for the version of squid that you are using.
|
|
As a default, no anonymizing is done.
|
|
|
|
<p>
|
|
If you choose to use the anonymizer you might wish to investigate the forwarded_for
|
|
option to prevent the client address being disclosed. Failure to turn off the
|
|
forwarded_for option will reduce the effectiveness of the anonymizer. Finally,
|
|
if you filter the User-Agent header, using the fake_user_agent option can
|
|
prevent some user problems as some sites require the User-Agent header.
|
|
|
|
<sect2>Squid 2.2
|
|
<p>
|
|
With the introduction of squid 2.2 the anonymizer has become more customisable.
|
|
It now allows specification of exactly which headers will be allowed to pass.
|
|
|
|
The new anonymizer uses the 'anonymize_headers' tag. It has two modes. The first is to deny all headers
|
|
and allow the specified headers. The following example will simulate the old
|
|
paranoid mode.
|
|
|
|
<verb>
|
|
anonymize_headers allow Allow Authorization Cache-Control
|
|
anonymize_headers allow Content-Encoding Content-Length
|
|
anonymize_headers allow Content-Type Date Expires Host
|
|
anonymize_headers allow If-Modified-Since Last-Modified
|
|
anonymize_headers allow Location Pragma Accept Accept-Charset
|
|
anonymize_headers allow Accept-Encoding Accept-Language
|
|
anonymize_headers allow Content-Language Mime-Version
|
|
anonymize_headers allow Retry-After Title Connection
|
|
anonymize_headers allow Proxy-Connection
|
|
</verb>
|
|
|
|
This will prevent any headers other than those listed from being passed by the
|
|
proxy.
|
|
|
|
<p>
|
|
The second mode allows all headers except those explicitly denied. The
following example replicates the old standard mode.
|
|
|
|
<verb>
|
|
anonymize_headers deny From Referer Server
|
|
anonymize_headers deny User-Agent WWW-Authenticate Link
|
|
</verb>
|
|
|
|
It allows all headers to pass unless they are listed.
|
|
|
|
<p>
|
|
You cannot mix allow and deny in a Squid configuration: it is either one
or the other!
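<P>
The two modes can be pictured with a small Python sketch. This is only a
model of the filtering idea, not Squid's implementation; the function
<em/filter_headers/ and the sample request are invented for illustration.

```python
def filter_headers(headers, mode, listed):
    """Return the headers that would be passed on.

    mode 'allow': only the listed headers pass (like the old paranoid mode).
    mode 'deny':  every header except the listed ones passes (old standard mode).
    """
    names = {name.lower() for name in listed}
    if mode == "allow":
        return {k: v for k, v in headers.items() if k.lower() in names}
    if mode == "deny":
        return {k: v for k, v in headers.items() if k.lower() not in names}
    raise ValueError("mode must be 'allow' or 'deny'")

# A made-up client request
request = {
    "Host": "www.example.com",
    "User-Agent": "Mozilla/4.0",
    "Referer": "http://a.example/",
}

denied = filter_headers(request, "deny", ["From", "Referer", "Server", "User-Agent"])
allowed = filter_headers(request, "allow", ["Host", "Accept"])
print(denied)
print(allowed)
```

In both calls only the Host header survives, but for opposite reasons: the
deny list removed the other two headers, while the allow list never admitted
them.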
|
|
|
|
<sect2>Squid 2.1 and Earlier
|
|
<P>
|
|
There are three modes: <em/none/, <em/standard/, and
|
|
<em/paranoid/. The mode is set with the <em>http_anonymizer</em>
|
|
configuration option.
|
|
<P>
|
|
With no anonymizing (the default), Squid forwards all request headers
|
|
as received from the client, to the origin server (subject to the regular
|
|
rules of HTTP).
|
|
<P>
|
|
In the <em/standard/ mode, Squid filters out the following specific request
|
|
headers:
|
|
<itemize>
|
|
<item>From:
|
|
<item>Referer:
|
|
<item>Server:
|
|
<item>User-Agent:
|
|
<item>WWW-Authenticate:
|
|
<item>Link:
|
|
</itemize>
|
|
|
|
<P>
|
|
In the <em/paranoid/ mode, Squid allows only the following specific
|
|
request headers:
|
|
<itemize>
|
|
<item>Allow:
|
|
<item>Authorization:
|
|
<item>Cache-Control:
|
|
<item>Content-Encoding:
|
|
<item>Content-Length:
|
|
<item>Content-Type:
|
|
<item>Date:
|
|
<item>Expires:
|
|
<item>Host:
|
|
<item>If-Modified-Since:
|
|
<item>Last-Modified:
|
|
<item>Location:
|
|
<item>Pragma:
|
|
<item>Accept:
|
|
<item>Accept-Charset:
|
|
<item>Accept-Encoding:
|
|
<item>Accept-Language:
|
|
<item>Content-Language:
|
|
<item>Mime-Version:
|
|
<item>Retry-After:
|
|
<item>Title:
|
|
<item>Connection:
|
|
<item>Proxy-Connection:
|
|
</itemize>
|
|
|
|
<P>
|
|
References:
|
|
<url url="http://www.iks-jena.de/mitarb/lutz/anon/web.en.html"
|
|
name="Anonymous WWW">
|
|
|
|
|
|
<sect1>Can I make Squid go direct for some sites?
|
|
<p>
|
|
Sure, just use the <em/always_direct/ access list.
|
|
<p>
|
|
For example, if you want Squid to connect directly to <em/hotmail.com/ servers,
|
|
you can use these lines in your config file:
|
|
<verb>
|
|
acl hotmail dstdomain .hotmail.com
|
|
always_direct allow hotmail
|
|
</verb>
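<P>
The leading dot in <em/.hotmail.com/ is what makes the ACL cover hosts inside
the domain. The Python sketch below models that matching rule under the
assumption that a pattern with a leading dot matches the domain itself and
any host below it; <em/dstdomain_matches/ is a made-up helper, not Squid code.

```python
def dstdomain_matches(host, pattern):
    """Model of dstdomain matching (assumed semantics, see the text above)."""
    host = host.lower().rstrip(".")
    pattern = pattern.lower()
    if pattern.startswith("."):
        # ".hotmail.com" covers "hotmail.com" and anything ending in ".hotmail.com"
        return host == pattern[1:] or host.endswith(pattern)
    return host == pattern

for name in ("www.hotmail.com", "hotmail.com", "nothotmail.com"):
    print(name, dstdomain_matches(name, ".hotmail.com"))
```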
|
|
|
|
<sect1>Can I make Squid proxy only, without caching anything?
|
|
<p>
|
|
Sure, there are a few things you can do.
|
|
<p>
|
|
You can use the <em/no_cache/ access list to make Squid never cache any response:
|
|
<verb>
|
|
acl all src 0/0
|
|
no_cache deny all
|
|
</verb>
|
|
<p>
|
|
With Squid-2.4 and later you can use the ``null'' storage module:
|
|
<verb>
|
|
cache_dir null -1 1000
|
|
</verb>
|
|
|
|
<sect1>Can I prevent users from downloading large files?
|
|
<p>
|
|
You can set the global <em/reply_body_max_size/ parameter. This option
|
|
controls the largest HTTP message body that will be sent to a cache
|
|
client for one request.
|
|
<p>
|
|
If the HTTP response coming from the server has a <tt/Content-length/
|
|
header, then Squid compares the content-length value to the
|
|
<em/reply_body_max_size/ value. If the content-length is larger,
|
|
the server connection is closed and the user receives an error
|
|
message from Squid.
|
|
<p>
|
|
Some responses don't have <tt/Content-length/
|
|
headers. In this case, Squid counts how many bytes are written
|
|
to the client. Once the limit is reached, the client's connection
|
|
is simply closed.
|
|
<p>
|
|
Note that ``creative'' user-agents will still be able to download
|
|
really large files through the cache using HTTP/1.1 range requests.
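<P>
The two cases described above (a reply with a <tt/Content-length/ header,
and a reply without one) can be sketched in Python as follows. This is a
simplified model, not Squid's code; <em/reply_allowed/ and its parameters
are hypothetical names.

```python
def reply_allowed(max_size, content_length=None, chunks=()):
    """Return (allowed, bytes_sent) for one reply under a body-size limit."""
    if content_length is not None:
        # The size is advertised, so the decision is made before sending data.
        if content_length > max_size:
            return False, 0
        return True, content_length
    sent = 0
    for chunk in chunks:
        # No advertised size: count bytes as they are written to the client.
        if sent + len(chunk) > max_size:
            return False, sent  # limit reached, connection would be closed
        sent += len(chunk)
    return True, sent

print(reply_allowed(1000, content_length=2000))
print(reply_allowed(10, chunks=[b"12345", b"67890", b"x"]))
```

The sketch also shows why the second case is less friendly to the user: part
of the body has already been delivered when the limit is hit.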
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>Communication between browsers and Squid
|
|
|
|
<P>
|
|
Most web browsers available today support proxying and are easily configured
|
|
to use a Squid server as a proxy. Some browsers support advanced features
|
|
such as lists of domains or URL patterns that shouldn't be fetched through
|
|
the proxy, or JavaScript automatic proxy configuration.
|
|
|
|
<sect1>Netscape manual configuration
|
|
<P>
|
|
Select <bf/Network Preferences/ from the
|
|
<bf/Options/ menu. On the <bf/Proxies/
|
|
page, click the radio button next to <bf/Manual Proxy
|
|
Configuration/ and then click on the <bf/View/
|
|
button. For each protocol that your Squid server supports (by default,
|
|
HTTP, FTP, and gopher) enter the Squid server's hostname or IP address
|
|
and put the HTTP port number for the Squid server (by default, 3128) in
|
|
the <bf/Port/ column. For any protocols that your Squid
|
|
does not support, leave the fields blank.
|
|
<P>
|
|
Here is a
|
|
<url url="/Doc/FAQ/navigator.jpg"
|
|
name="screen shot"> of the Netscape Navigator manual proxy
|
|
configuration screen.
|
|
<P>
|
|
|
|
<sect1>Netscape automatic configuration
|
|
<label id="netscape-pac">
|
|
<P>
|
|
Netscape Navigator's proxy configuration can be automated with
|
|
JavaScript (for Navigator versions 2.0 or higher). Select
|
|
<bf/Network Preferences/ from the <bf/Options/
|
|
menu. On the <bf/Proxies/ page, click the radio button
|
|
next to <bf/Automatic Proxy Configuration/ and then
|
|
fill in the URL for your JavaScript proxy configuration file in the
|
|
text box. The box is too small, but the text will scroll to the
|
|
right as you go.
|
|
<P>
|
|
Here is a
|
|
<url url="/Doc/FAQ/navigator-auto.jpg"
|
|
name="screen shot">
|
|
of the Netscape Navigator automatic proxy configuration screen.
|
|
|
|
You may also wish to consult Netscape's documentation for the Navigator
|
|
<url
|
|
url="http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/proxy-live.html"
|
|
name="JavaScript proxy configuration">
|
|
|
|
<P>
|
|
Here is a sample auto configuration JavaScript from Oskar Pearson:
|
|
<code>
|
|
//We (www.is.co.za) run a central cache for our customers that they
|
|
//access through a firewall - thus if they want to connect to their intranet
|
|
//system (or anything in their domain at all) they have to connect
|
|
//directly - hence all the "fiddling" to see if they are trying to connect
|
|
//to their local domain.
|
|
|
|
//Replace each occurrence of company.com with your domain name
|
|
//and if you have some kind of intranet system, make sure
|
|
//that you put its name in place of "internal" below.
|
|
|
|
//We also assume that your cache is called "cache.company.com", and
|
|
//that it runs on port 8080. Change it down at the bottom.
|
|
|
|
//(C) Oskar Pearson and the Internet Solution (http://www.is.co.za)
|
|
|
|
function FindProxyForURL(url, host)
|
|
{
|
|
//If they have only specified a hostname, go directly.
|
|
if (isPlainHostName(host))
|
|
return "DIRECT";
|
|
|
|
//These connect directly if the machine they are trying to
|
|
//connect to starts with "intranet" - ie http://intranet
|
|
//Connect directly if it is intranet.*
|
|
//If you have another machine that you want them to
|
|
//access directly, replace "internal*" with that
|
|
//machine's name
|
|
if (shExpMatch( host, "intranet*")||
|
|
shExpMatch(host, "internal*"))
|
|
return "DIRECT";
|
|
|
|
//Connect directly to our domains (NB for Important News)
|
|
if (dnsDomainIs( host,"company.com")||
|
|
//If you have another domain that you wish to connect to
|
|
//directly, put it in here
|
|
dnsDomainIs(host,"sistercompany.com"))
|
|
return "DIRECT";
|
|
|
|
//So the error message "no such host" will appear through the
|
|
//normal Netscape box - less support queries :)
|
|
if (!isResolvable(host))
|
|
return "DIRECT";
|
|
|
|
//We only cache http, ftp and gopher
|
|
if (url.substring(0, 5) == "http:" ||
|
|
url.substring(0, 4) == "ftp:"||
|
|
url.substring(0, 7) == "gopher:")
|
|
|
|
//Change the ":8080" to the port that your cache
|
|
//runs on, and "cache.company.com" to the machine that
|
|
//you run the cache on
|
|
return "PROXY cache.company.com:8080; DIRECT";
|
|
|
|
//We don't cache WAIS
|
|
if (url.substring(0, 5) == "wais:")
|
|
return "DIRECT";
|
|
|
|
else
|
|
return "DIRECT";
|
|
}
|
|
</code>
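<P>
If you want to sanity-check the decision logic of such a script without a
browser, a rough Python port can help. The sketch below mirrors the sample
above under some assumptions: <em/fnmatchcase/ stands in for
<em/shExpMatch/, a plain suffix test stands in for <em/dnsDomainIs/, and the
<em/is_resolvable/ parameter is a hypothetical hook that replaces the DNS
lookup so the function can be tested off-line.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

PROXY = "PROXY cache.company.com:8080; DIRECT"

def find_proxy_for_url(url, host, is_resolvable=lambda h: True):
    """Approximate Python port of the sample FindProxyForURL() above."""
    if "." not in host:                      # isPlainHostName(host)
        return "DIRECT"
    if fnmatchcase(host, "intranet*") or fnmatchcase(host, "internal*"):
        return "DIRECT"                      # shExpMatch(host, ...)
    if host.endswith("company.com") or host.endswith("sistercompany.com"):
        return "DIRECT"                      # dnsDomainIs(host, ...)
    if not is_resolvable(host):              # isResolvable(host)
        return "DIRECT"
    if urlsplit(url).scheme in ("http", "ftp", "gopher"):
        return PROXY
    return "DIRECT"                          # wais: and everything else

print(find_proxy_for_url("http://www.example.org/", "www.example.org"))
print(find_proxy_for_url("http://intranet/", "intranet"))
```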
|
|
|
|
<sect1>Lynx and Mosaic configuration
|
|
<P>
|
|
For Mosaic and Lynx, you can set environment variables
|
|
before starting the application. For example (assuming csh or tcsh):
|
|
<P>
|
|
<verb>
|
|
% setenv http_proxy http://mycache.example.com:3128/
|
|
% setenv gopher_proxy http://mycache.example.com:3128/
|
|
% setenv ftp_proxy http://mycache.example.com:3128/
|
|
</verb>
|
|
<P>
|
|
For Lynx you can also edit the <em/lynx.cfg/ file to configure
|
|
proxy usage. This has the added benefit of causing all Lynx users on
|
|
a system to access the proxy without making environment variable changes
|
|
for each user. For example:
|
|
<verb>
|
|
http_proxy:http://mycache.example.com:3128/
|
|
ftp_proxy:http://mycache.example.com:3128/
|
|
gopher_proxy:http://mycache.example.com:3128/
|
|
</verb>
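<P>
The same environment variable convention is honoured by other software too.
As a small illustration, Python's standard <em/urllib/ module picks these
variables up; the sketch below sets two of them for the current process only
and shows what the library sees. The host name is the same placeholder used
above.

```python
import os
import urllib.request

# Set the variables for this process only (placeholder cache host)
os.environ["http_proxy"] = "http://mycache.example.com:3128/"
os.environ["ftp_proxy"] = "http://mycache.example.com:3128/"

# getproxies_environment() scans the environment for <scheme>_proxy variables
proxies = urllib.request.getproxies_environment()
print(proxies.get("http"), proxies.get("ftp"))
```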
|
|
|
|
<sect1>Redundant Auto-Proxy-Configuration
|
|
|
|
<P>
|
|
There's one nasty side-effect to using auto-proxy scripts: every time you start
the web browser, it will try to load the auto-proxy script.
|
|
|
|
<P>
|
|
If your script isn't available either because the web server hosting the
|
|
script is down or your workstation can't reach the web server (e.g.
|
|
because you're working off-line with your notebook and just want to
|
|
read a previously saved HTML-file) you'll get different errors depending
|
|
on the browser you use.
|
|
|
|
<P>
|
|
The Netscape browser will just return an error after a timeout (after
|
|
that it tries to find the site 'www.proxy.com' if the script you use is
|
|
called 'proxy.pac').
|
|
|
|
<P>
|
|
Microsoft Internet Explorer, on the other hand, won't even start: no
window is displayed, and only after about one minute does it show a dialog asking
whether you want to continue with or without proxy configuration.
|
|
|
|
<P>
|
|
The point is that your workstations always need to locate the
|
|
proxy-script. I created some extra redundancy by hosting the script on
|
|
two web servers (actually Apache web servers on the proxy servers
|
|
themselves) and adding the following records to my primary nameserver:
|
|
<verb>
|
|
proxy CNAME proxy1
|
|
CNAME proxy2
|
|
</verb>
|
|
The clients just refer to 'http://proxy/proxy.pac'. This script looks like this:
|
|
|
|
<verb>
|
|
function FindProxyForURL(url,host)
|
|
{
|
|
// Hostname without domainname or host within our own domain?
|
|
// Try them directly:
|
|
// http://www.domain.com actually lives before the firewall, so
|
|
// make an exception:
|
|
if ((isPlainHostName(host)||dnsDomainIs( host,".domain.com")) &&
|
|
!localHostOrDomainIs(host, "www.domain.com"))
|
|
return "DIRECT";
|
|
|
|
// First try proxy1 then proxy2. One server mostly caches '.com'
|
|
// to make sure both servers are not
|
|
// caching the same data in the normal situation. The other
|
|
// server caches the other domains normally.
|
|
// If one of 'm is down the client will try the other server.
|
|
else if (shExpMatch(host, "*.com"))
|
|
return "PROXY proxy1.domain.com:8080; PROXY proxy2.domain.com:8081; DIRECT";
|
|
return "PROXY proxy2.domain.com:8081; PROXY proxy1.domain.com:8080; DIRECT";
|
|
}
|
|
</verb>
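<P>
The string returned by the script is an ordered fail-over list: the browser
tries the first entry and moves on to the next one if it is unreachable. The
following Python sketch simply splits such a string into its entries;
<em/parse_pac_result/ is a made-up helper for illustration.

```python
def parse_pac_result(result):
    """Turn 'PROXY a:1; PROXY b:2; DIRECT' into an ordered list of choices."""
    choices = []
    for entry in result.split(";"):
        parts = entry.split()
        if not parts:
            continue  # skip empty pieces, e.g. after a trailing semicolon
        kind = parts[0].upper()
        if kind == "DIRECT":
            choices.append(("DIRECT", None, None))
        else:
            host, _, port = parts[1].rpartition(":")
            choices.append((kind, host, int(port)))
    return choices

order = parse_pac_result(
    "PROXY proxy1.domain.com:8080; PROXY proxy2.domain.com:8081; DIRECT")
for choice in order:
    print(choice)
```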
|
|
|
|
<P>
|
|
I made sure every client domain has the appropriate 'proxy' entry.
|
|
The clients are automatically configured with two nameservers using
|
|
DHCP.
|
|
|
|
<quote>
|
|
--<url url="mailto:RvdOever@baan.nl"
|
|
name="Rodney van den Oever">
|
|
</quote>
|
|
|
|
|
|
<sect1>Microsoft Internet Explorer configuration
|
|
<P>
|
|
Select <bf/Options/ from the <bf/View/
|
|
menu. Click on the <bf/Connection/ tab. Tick the
|
|
<bf/Connect through Proxy Server/ option and hit the
|
|
<bf/Proxy Settings/ button. For each protocol that
|
|
your Squid server supports (by default, HTTP, FTP, and gopher)
|
|
enter the Squid server's hostname or IP address and put the HTTP
|
|
port number for the Squid server (by default, 3128) in the
|
|
<bf/Port/ column. For any protocols that your Squid
|
|
does not support, leave the fields blank.
|
|
<P>
|
|
Here is a
|
|
<url url="/Doc/FAQ/msie.jpg"
|
|
name="screen shot"> of the Internet Explorer proxy
|
|
configuration screen.
|
|
<P>
|
|
Microsoft is also starting to support Netscape-style JavaScript
|
|
automated proxy configuration. As of now, only MSIE version 3.0a
|
|
for Windows 3.1 and Windows NT 3.51 supports this feature (i.e.,
|
|
as of version 3.01 build 1225 for Windows 95 and NT 4.0, the feature
|
|
was not included).
|
|
<P>
|
|
If you have a version of MSIE that does have this feature, select
|
|
<bf/Options/ from the <bf/View/ menu.
|
|
Click on the <bf/Advanced/ tab. In the lower left-hand
|
|
corner, click on the <bf/Automatic Configuration/
|
|
button. Fill in the URL for your JavaScript file in the dialog
|
|
box it presents you. Then exit MSIE and restart it for the changes
|
|
to take effect. MSIE will reload the JavaScript file every time
|
|
it starts.
|
|
|
|
<sect1>Netmanage Internet Chameleon WebSurfer configuration
|
|
<P>
|
|
Netmanage WebSurfer supports manual proxy configuration and exclusion
|
|
lists for hosts or domains that should not be fetched via proxy
|
|
(this information is current as of WebSurfer 5.0). Select
|
|
<bf/Preferences/ from the <bf/Settings/
|
|
menu. Click on the <bf/Proxies/ tab. Select the
|
|
<bf/Use Proxy/ options for HTTP, FTP, and gopher. For
each protocol, enter the Squid server's hostname or IP address
and put the HTTP port number for the Squid server (by default,
3128) in the <bf/Port/ boxes. For any protocols that
your Squid does not support, leave the fields blank.
|
|
<P>
|
|
Take a look at this
|
|
<url url="/Doc/FAQ/netmanage.jpg"
|
|
name="screen shot">
|
|
if the instructions confused you.
|
|
<P>
|
|
On the same configuration window, you'll find a button to bring up
|
|
the exclusion list dialog box, which will let you enter some hosts
|
|
or domains that you don't want fetched via proxy. It should be
|
|
self-explanatory, but you might look at this
|
|
<url url="/Doc/FAQ/netmanage-exclusion.jpg"
|
|
name="screen shot">
|
|
just for fun anyway.
|
|
|
|
<sect1>Opera 2.12 proxy configuration
|
|
|
|
<P>
|
|
Select <em/Proxy Servers.../ from the <em/Preferences/ menu. Check each
|
|
protocol that your Squid server supports (by default, HTTP, FTP, and
|
|
Gopher) and enter the Squid server's address as hostname:port (e.g.
|
|
mycache.example.com:3128 or 123.45.67.89:3128). Click on <em/Okay/ to accept the
|
|
setup.
|
|
|
|
<P>
|
|
Notes:
|
|
<itemize>
|
|
<item>
|
|
Opera 2.12 doesn't support gopher on its own, but requires a proxy; therefore
|
|
Squid's gopher proxying can extend the utility of your Opera immensely.
|
|
<item>
|
|
Unfortunately, Opera 2.12 chokes on some HTTP requests, for example
|
|
<url url="http://spam.abuse.net/spam/"
|
|
name="abuse.net">.
|
|
At the moment I think it has something to do with cookies. If you have
|
|
trouble with a site, try disabling the HTTP proxying by unchecking
|
|
that protocol in the <em/Preferences/|<em/Proxy Servers.../ dialogue. Opera will
|
|
remember the address, so reenabling is easy.
|
|
</itemize>
|
|
|
|
<quote>
|
|
--<url url="mailto:hclsmith@tallships.istar.ca" name="Hume Smith">
|
|
</quote>
|
|
|
|
<sect1>How do I tell Squid to use a specific username for FTP urls?
|
|
|
|
<P>
|
|
Insert your username in the host part of the URL, for example:
|
|
<verb>
|
|
ftp://joecool@ftp.foo.org/
|
|
</verb>
|
|
Squid should then prompt you for your account password. Alternatively,
|
|
you can specify both your username and password in the URL itself:
|
|
<verb>
|
|
ftp://joecool:secret@ftp.foo.org/
|
|
</verb>
|
|
However, we certainly do not recommend this, as it could be very
|
|
easy for someone to see or grab your password.
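<P>
To see how the user information is carried inside such a URL, you can take
the two examples apart with Python's standard URL parser. This only
illustrates the URL syntax; it says nothing about how Squid itself parses
the URL.

```python
from urllib.parse import urlsplit

for url in ("ftp://joecool@ftp.foo.org/", "ftp://joecool:secret@ftp.foo.org/"):
    parts = urlsplit(url)
    # The password is part of the URL text, so it ends up wherever the URL does
    print(parts.hostname, parts.username, parts.password)
```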
|
|
|
|
<sect1>Configuring Browsers for WPAD
|
|
<P>
|
|
by <url url="mailto:mark@rts.com.au" name="Mark Reynolds">
|
|
<P>
|
|
You may like to start by reading the
|
|
<url url="http://www.ietf.org/internet-drafts/draft-ietf-wrec-wpad-01.txt" name="Internet-Draft">
|
|
that describes WPAD.
|
|
|
|
<P>
|
|
After reading the 8 steps below, if you don't understand any of the
|
|
terms or methods mentioned, you probably shouldn't be doing this.
|
|
Implementing wpad requires you to <bf/fully/ understand:
|
|
<enum>
|
|
<item> web server installations and modifications.
|
|
<item>squid proxy server (or others) installation etc.
|
|
<item>Domain Name System maintenance etc.
|
|
</enum>
|
|
Please don't bombard the squid list with web server or dns questions. See
|
|
your system administrator, or do some more research on those topics.
|
|
|
|
<P>
|
|
This is not a recommendation for any product or version. As far as I
|
|
know IE5 is the only browser out now implementing wpad. I think wpad
|
|
is an excellent feature that will return several hours of life per month.
|
|
Hopefully, all browser clients will implement it as well, but it will take
years for all the older browsers to fade away.
|
|
|
|
<P>
|
|
I have only focused on the domain name method, to the exclusion of the
|
|
DHCP method. I think the dns method might be easier for most people.
|
|
I don't currently, and may never, fully understand wpad and IE5, but this
|
|
method worked for me. It <bf/may/ work for you.
|
|
|
|
<P>
|
|
But if you'd rather just have a go ...
|
|
<enum>
|
|
<item>
|
|
Create a standard <ref id="netscape-pac" name="netscape auto
|
|
proxy config file">. The sample provided there is more than
|
|
adequate to get you going. No doubt all the other load balancing
|
|
and backup scripts will be fine also.
|
|
|
|
<item>
|
|
Store the resultant file in the document root directory of a
|
|
handy web server as <em/wpad.dat/ (Not <em/proxy.pac/ as you
|
|
may have previously done.)
|
|
|
|
<p>
|
|
<url url="mailto:ira at racoon.riga.lv" name="Andrei Ivanov">
|
|
notes that you should be able to use an HTTP redirect if you
|
|
want to store the wpad.dat file somewhere else. You can probably
|
|
even redirect <em/wpad.dat/ to <em/proxy.pac/:
|
|
<verb>
|
|
Redirect /wpad.dat http://racoon.riga.lv/proxy.pac
|
|
</verb>
|
|
|
|
<item>
|
|
If you do nothing more, a url like
|
|
<tt>http://www.your.domain.name/wpad.dat</tt> should bring up
|
|
the script text in your browser window.
|
|
|
|
<item>
|
|
Insert the following entry into your web server <em/mime.types/ file.
|
|
Maybe in addition to your pac file type, if you've done this before.
|
|
<verb>
|
|
application/x-ns-proxy-autoconfig dat
|
|
</verb>
|
|
Then restart your web server so that the new MIME type takes effect.
|
|
|
|
<item>
|
|
Assuming Internet Explorer 5, under <em/Tools/, <em/Internet
|
|
Options/, <em/Connections/, <em/Settings/ <bf/or/ <em/Lan
|
|
Settings/, set <bf/ONLY/ <em/Use Automatic Configuration Script/
|
|
to be the URL for where your new <em/wpad.dat/ file can be found.
|
|
i.e. <tt>http://www.your.domain.name/wpad.dat</tt>. Test that
it all works as per your script and network. There's no point
continuing until this works ...
|
|
|
|
<item>
|
|
Create/install/implement a DNS record so that
|
|
<tt>wpad.your.domain.name</tt> resolves to the host above where
|
|
you have a functioning auto config script running. You should
|
|
now be able to use <tt>http://wpad.your.domain.name/wpad.dat</tt>
|
|
as the Auto Config Script location in step 5 above.
|
|
|
|
<item>
|
|
And finally, go back to the setup screen detailed in 5 above,
|
|
and choose nothing but the <em/Automatically Detect Settings/
|
|
option, turning everything else off. Best to restart IE5, as
|
|
you normally do with any Microsoft product... And it should all
|
|
work. Did for me anyway.
|
|
|
|
<item>
|
|
One final question might be 'Which domain name does the client
|
|
(IE5) use for the wpad... lookup?' It uses the hostname from
|
|
the control panel setting. It starts the search by adding the
hostname "WPAD" to the current fully-qualified domain name. For
|
|
instance, a client in a.b.Microsoft.com would search for a WPAD
|
|
server at wpad.a.b.microsoft.com. If it could not locate one,
|
|
it would remove the bottom-most domain and try again; for
|
|
instance, it would try wpad.b.microsoft.com next. IE 5 would
|
|
stop searching when it found a WPAD server or reached the
|
|
third-level domain, wpad.microsoft.com.
|
|
|
|
|
|
</enum>
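<P>
The domain search described in the last step can be written down as a short
Python sketch. It only models the behaviour as described above (stop at the
name formed with the last two labels); <em/wpad_candidates/ is a hypothetical
helper, and real clients may behave differently.

```python
def wpad_candidates(client_domain):
    """List the WPAD host names a client would try, most specific first."""
    labels = client_domain.lower().strip(".").split(".")
    names = []
    while len(labels) >= 2:
        names.append("wpad." + ".".join(labels))
        labels = labels[1:]  # drop the bottom-most domain and try again
    return names

candidates = wpad_candidates("a.b.Microsoft.com")
for name in candidates:
    print(name)
```

A list like this can be handy for checking which DNS records you actually
need to create for a given client domain.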
|
|
|
|
<P>
|
|
Anybody using these steps to install and test, please feel free to make
|
|
notes, corrections or additions for improvements, and post back to the
|
|
squid list...
|
|
|
|
<P>
|
|
There are probably many more tricks and tips which hopefully will be
|
|
detailed here in the future. Things like <em/wpad.dat/ files being served
|
|
from the proxy server themselves, maybe with a round robin dns setup
|
|
for the WPAD host.
|
|
|
|
<sect1>IE 5.0x crops trailing slashes from FTP URL's
|
|
<p>
|
|
by <url url="mailto:reuben at reub dot net" name="Reuben Farrelly">
|
|
<p>
|
|
There was a bug in the 5.0x releases of Internet Explorer in which IE
|
|
cropped any trailing slash off an FTP URL. The URL showed up correctly in
|
|
the browser's ``Address:'' field, however squid logs show that the trailing
|
|
slash was being taken off.
|
|
<p>
|
|
An example of where this impacted Squid was a setup where Squid
would go direct for FTP directory listings but forward a request to a
parent for FTP file transfers. This was useful if your upstream proxy was
an older version of Squid or another vendor's software which displayed
directory listings with broken icons and you wanted your own local version
of Squid to generate proper FTP directory listings instead.
|
|
The workaround for this is to add a double slash to any directory listing
|
|
in which the slash was important, or else upgrade to IE 5.5. (Or use Netscape)
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>Squid Log Files
|
|
|
|
<P>
|
|
The logs are a valuable source of information about Squid workloads and
|
|
performance. The logs record not only access information, but also system
|
|
configuration errors and resource consumption (e.g., memory, disk
space). There are several log files maintained by Squid. Some have to be
explicitly activated at compile time, others can safely be deactivated
at run-time.
|
|
|
|
<P>
|
|
There are a few basic points common to all log files. The time stamps
|
|
logged into the log files are usually UTC seconds unless stated otherwise.
|
|
The initial time stamp usually contains a millisecond extension.
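<P>
A time stamp of this form can be converted into a readable UTC date with a
few lines of Python. The sketch below splits the field at the dot instead of
going through a floating point number, so the millisecond part is kept
exactly; <em/parse_log_time/ is a made-up name.

```python
from datetime import datetime, timezone

def parse_log_time(field):
    """Convert 'seconds.milliseconds' (UTC) into an aware datetime."""
    seconds, _, millis = field.partition(".")
    stamp = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    return stamp.replace(microsecond=int(millis or 0) * 1000)

print(parse_log_time("86400.500").isoformat())  # 1970-01-02T00:00:00.500000+00:00
```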
|
|
|
|
<P>
|
|
The frequent time lookups on busy caches may have a performance impact on
|
|
some systems. The compile time configuration option
|
|
<em/--enable-time-hack/ makes Squid only look up a new time in one
|
|
second intervals. The implementation uses Unix's <em/alarm()/
|
|
functionality. Note that the resolution of logged times is much coarser
|
|
afterwards, and may not suffice for some log file analysis programs.
|
|
Usually there is no need to fiddle with the timestamp hack.
|
|
|
|
<sect1><em/squid.out/
|
|
|
|
<P>
|
|
If you run your Squid from the <em/RunCache/ script, a file
|
|
<em/squid.out/ contains the Squid startup times, and also all fatal
|
|
errors, e.g. as produced by an <em/assert()/ failure. If you are not
|
|
using <em/RunCache/, you will not see such a file.
|
|
|
|
<sect1><em/cache.log/
|
|
|
|
<P>
|
|
The <em/cache.log/ file contains the debug and error messages that Squid
|
|
generates. If you start your Squid using the default <em/RunCache/ script,
|
|
or start it with the <em/-s/ command line option, a copy of certain
|
|
messages will go into your syslog facilities. It is a matter of personal
|
|
preferences to use a separate file for the squid log data.
|
|
|
|
<P>
|
|
From the area of automatic log file analysis, the <em/cache.log/ file does
|
|
not have much to offer. You will usually look into this file for automated
|
|
error reports, when programming Squid, testing new features, or searching
|
|
for reasons of a perceived misbehaviour, etc.
|
|
|
|
|
|
<sect1><em/useragent.log/
|
|
|
|
<P>
|
|
The user agent log file is only maintained, if
|
|
|
|
<enum>
|
|
<item>you configured the compile time <em/--enable-useragent-log/
|
|
option, and
|
|
<item>you pointed the <em/useragent_log/ configuration option to a
|
|
file.
|
|
</enum>
|
|
|
|
<P>
|
|
From the user agent log file you can find out about the distribution
of browsers among your clients. Using this option on a heavily loaded
production Squid might not be the best of ideas.
|
|
|
|
<sect1><em/store.log/
|
|
|
|
<P>
|
|
The <em/store.log/ file covers the objects currently kept on disk or
removed ones. As a kind of transaction log it is usually used for
debugging purposes. A definitive statement on whether an object resides on
your disks is only possible after analysing the <em/complete/ log file.
|
|
The release (deletion) of an object may be logged at a later time than the
|
|
swap out (save to disk).
|
|
|
|
<P>
|
|
The <em/store.log/ file may be of interest to log file analysis which
|
|
looks into the objects on your disks and the time they spend there, or how
|
|
many times a hot object was accessed. The latter may be covered by another
|
|
log file, too. With knowledge of the <em/cache_dir/ configuration option,
|
|
this log file allows for a URL to filename mapping without recursing your
|
|
cache disks. However, the Squid developers recommend treating
<em/store.log/ primarily as a debug file, and so should you, unless you
know what you are doing.
|
|
|
|
<P>
|
|
The print format for a store log entry (one line) consists of eleven
|
|
space-separated columns, compare with the <em/storeLog()/ function in file
|
|
<em>src/store_log.c</em>:
|
|
|
|
<verb>
|
|
"%9d.%03d %-7s %08X %4d %9d %9d %9d %s %d/%d %s %s\n"
|
|
</verb>
|
|
|
|
<descrip>
|
|
<tag/time/
|
|
<P>
|
|
The timestamp when the line was logged in UTC with a millisecond fraction.
|
|
|
|
<tag/action/
|
|
<P>
|
|
The action the object was subjected to, compare with <em>src/store_log.c</em>:
|
|
|
|
<itemize>
|
|
<item><bf/CREATE/ Seems to be unused.
|
|
<item><bf/RELEASE/ The object was removed from the cache (see also
|
|
<ref id="log-fileno" name="file number">).
|
|
<item><bf/SWAPOUT/ The object was saved to disk.
|
|
<item><bf/SWAPIN/ The object existed on disk and was read into memory.
|
|
</itemize>
|
|
|
|
<tag/file number/<label id="log-fileno">
|
|
<P>
|
|
The file number for the object storage file. Please note that the path to
|
|
this file is calculated according to your <em/cache_dir/ configuration.
|
|
|
|
<P>
|
|
A file number of <em/FFFFFFFF/ denominates "memory only" objects. Any
|
|
action code for such a file number refers to an object which existed only
|
|
in memory, not on disk. For instance, if a <em/RELEASE/ code was logged
|
|
with file number <em/FFFFFFFF/, the object existed only in memory, and was
|
|
released from memory.
|
|
|
|
<tag/status/
|
|
<P>
|
|
The HTTP reply status code.
|
|
|
|
<tag/datehdr/<label id="log-Date">
|
|
<P>
|
|
The value of the HTTP "Date: " reply header.
|
|
|
|
<tag/lastmod/<label id="log-LM">
|
|
<P>
|
|
The value of the HTTP "Last-Modified: " reply header.
|
|
|
|
<tag/expires/<label id="log-Expires">
|
|
<P>
|
|
The value of the HTTP "Expires: " reply header.
|
|
|
|
<tag/type/
|
|
<P>
|
|
The HTTP "Content-Type" major value, or "unknown" if it cannot be
|
|
determined.
|
|
|
|
<tag/sizes/
|
|
<P>
|
|
This column consists of two slash separated fields:
|
|
|
|
<enum>
|
|
<item>The advertised content length from the HTTP "Content-Length: " reply
|
|
header.
|
|
<item>The size actually read.
|
|
</enum>
|
|
|
|
<P>
|
|
If the advertised (or expected) length is missing, it will be set to
|
|
zero. If the advertised length is not zero, but not equal to the real
|
|
length, the object will be released from the cache.
|
|
|
|
<tag/method/
|
|
<P>
|
|
The request method for the object, e.g. <em/GET/.
|
|
|
|
<tag/key/<label id="log-key">
|
|
<P>
|
|
The key to the object, usually the URL.
|
|
</descrip>
|
|
|
|
<P>
|
|
The timestamps in the columns <ref id="log-Date" name="Date"> to
<ref id="log-Expires" name="Expires"> are all expressed in UTC seconds. The
|
|
actual values are parsed from the HTTP reply headers. An unparsable header
|
|
is represented by a value of -1, and a missing header is represented by a
|
|
value of -2.
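<P>
To make the column layout concrete, here is a minimal Python sketch that
splits one <em/store.log/ line into the eleven columns described above and
interprets the special values -1 and -2. The sample line is invented for
this example, and the names <em/StoreEntry/, <em/parse_store_line/ and
<em/describe_header_time/ are hypothetical.

```python
from collections import namedtuple

StoreEntry = namedtuple(
    "StoreEntry",
    "time action fileno status datehdr lastmod expires type "
    "advertised_size real_size method key")

def parse_store_line(line):
    """Split a store.log line into its eleven space-separated columns."""
    f = line.split(None, 10)               # the key is the last column
    advertised, real = f[8].split("/")     # the 'sizes' column
    return StoreEntry(
        time=float(f[0]), action=f[1], fileno=int(f[2], 16), status=int(f[3]),
        datehdr=int(f[4]), lastmod=int(f[5]), expires=int(f[6]), type=f[7],
        advertised_size=int(advertised), real_size=int(real),
        method=f[9], key=f[10].strip())

def describe_header_time(value):
    """Explain a datehdr/lastmod/expires value."""
    if value == -1:
        return "unparsable"
    if value == -2:
        return "missing"
    return "UTC seconds"

# Invented sample line, laid out like the format string above
sample = ("981173030.123 SWAPOUT 0000001F  200 981173000 981000000        -2 "
          "text/html 4512/4512 GET http://www.example.com/\n")
entry = parse_store_line(sample)
print(entry.action, entry.fileno, describe_header_time(entry.expires))
```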
|
|
|
|
<P>
|
|
The column <ref id="log-key" name="key"> usually contains just the URL of
|
|
the object. Some objects though will never become public. Thus the key is
|
|
said to include a unique integer number and the request method in addition
|
|
to the URL.
|
|
|
|
<sect1><em/hierarchy.log/
|
|
<P>
|
|
This logfile exists for Squid-1.0 only. The format is
|
|
<verb>
|
|
[date] URL peerstatus peerhost
|
|
</verb>
|
|
|
|
<sect1><em/access.log/
|
|
|
|
<P>
|
|
Most log file analysis programs are based on the entries in
|
|
<em/access.log/. Currently, there are two file formats possible for the log
|
|
file, depending on your configuration for the <em/emulate_httpd_log/
|
|
option. By default, Squid will log in its native log file format. If the
|
|
above option is enabled, Squid will log in the common log file format as
|
|
defined by the CERN web daemon.
|
|
|
|
<P>
|
|
The common log file format contains different, and less, information than
the native log file format. The native format contains more information for
the admin interested in cache evaluation.
|
|
|
|
<sect2><em/The common log file format/
|
|
|
|
<P>
|
|
The
|
|
<url url="http://www.w3.org/pub/WWW/Daemon/User/Config/Logging.html#common-logfile-format"
|
|
name="Common Logfile Format">
|
|
is used by numerous HTTP servers. This format consists of the following
|
|
seven fields:
|
|
<verb>
|
|
remotehost rfc931 authuser [date] "method URL" status bytes
|
|
</verb>
|
|
<P>
|
|
It is parsable by a variety of tools. The common format contains different
|
|
information than the native log file format. The HTTP version is logged,
|
|
which is not logged in native log file format.
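<P>
A line in this format can be picked apart with a regular expression. The
Python sketch below follows the seven fields listed above and ignores
anything that may follow the byte count; the sample line is made up, and
<em/parse_common/ is a hypothetical helper.

```python
import re

CLF = re.compile(
    r'^(?P<remotehost>\S+) (?P<rfc931>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)(?: .*)?$')

def parse_common(line):
    """Return the seven common-log fields as a dict, or None if no match."""
    m = CLF.match(line.strip())
    return m.groupdict() if m else None

# Invented sample line
sample = ('192.0.2.10 - - [03/Feb/2001:04:03:50 +0000] '
          '"GET http://www.example.com/ HTTP/1.0" 200 4512')
fields = parse_common(sample)
print(fields["remotehost"], fields["request"], fields["status"])
```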
|
|
|
|
|
|
<sect2><em/The native log file format/
|
|
|
|
<P>
|
|
The native format is different for different major versions of Squid. For
|
|
Squid-1.0 it is:
|
|
<verb>
|
|
time elapsed remotehost code/status/peerstatus bytes method URL
|
|
</verb>
|
|
|
|
<P>
|
|
For Squid-1.1, the information from the <em/hierarchy.log/ was moved
|
|
into <em/access.log/. The format is:
|
|
<verb>
|
|
time elapsed remotehost code/status bytes method URL rfc931 peerstatus/peerhost type
|
|
</verb>
|
|
|
|
<P>
|
|
For Squid-2 the columns stay the same, though the content within may change
|
|
a little.
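<P>
The column layout used since Squid-1.1 is easy to split with a script. Here
is a minimal Python sketch that follows the column names given above; the
sample line is invented, and <em/parse_native/ is a hypothetical helper
rather than part of any Squid tool.

```python
def parse_native(line):
    """Split a native access.log line into the ten columns listed above."""
    f = line.split()
    code, _, status = f[3].partition("/")
    peerstatus, _, peerhost = f[8].partition("/")
    return {
        "time": float(f[0]), "elapsed_ms": int(f[1]), "remotehost": f[2],
        "code": code, "status": int(status), "bytes": int(f[4]),
        "method": f[5], "url": f[6], "rfc931": f[7],
        "peerstatus": peerstatus, "peerhost": peerhost, "type": f[9],
    }

# Invented sample line
sample = ("981173030.123    245 192.0.2.10 TCP_MISS/200 4512 GET "
          "http://www.example.com/ - DIRECT/www.example.com text/html")
rec = parse_native(sample)
print(rec["code"], rec["status"], rec["peerstatus"], rec["type"])
```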
|
|
|
|
<P>
|
|
The native log file format logs more and different information than the
|
|
common log file format: the request duration, some timeout information,
|
|
the next upstream server address, and the content type.
|
|
|
|
There exist tools, which convert one file format into the other. Please
|
|
mind that even though the log formats share most information, both formats
|
|
contain information which is not part of the other format, and thus this
|
|
part of the information is lost when converting. Especially converting back
|
|
and forth is not possible without loss.
|
|
|
|
<em/squid2common.pl/ is a conversion utility, which converts any of the
|
|
squid log file formats into the old CERN proxy style output. There exist
|
|
tools to analyse, evaluate and graph results from that format.
|
|
|
|
|
|
<sect2><em/access.log native format in detail/
|
|
|
|
<P>
|
|
It is recommended though to use Squid's native log format due to its
|
|
greater amount of information made available for later analysis. The print
|
|
format line for native <em/access.log/ entries looks like this:
|
|
|
|
<verb>
|
|
"%9d.%03d %6d %s %s/%03d %d %s %s %s %s%s/%s %s"
|
|
</verb>
|
|
|
|
<P>
|
|
Therefore, an <em/access.log/ entry usually consists of (at least) 10
|
|
columns separated by one or more spaces:
|
|
|
|
<descrip>
|
|
<tag/time/
|
|
<P>
|
|
A Unix timestamp as UTC seconds with a millisecond resolution. You
|
|
can convert Unix timestamps into something more human readable using
|
|
this short perl script:
|
|
<verb>
|
|
#! /usr/bin/perl -p
|
|
s/^\d+\.\d+/localtime $&/e;
|
|
</verb>
|
|
|
|
<tag/duration/
|
|
<P>
|
|
The elapsed time is the number of milliseconds the transaction
occupied the cache. Its interpretation differs between TCP and UDP:
|
|
<P>
|
|
<itemize>
|
|
<item>For HTTP/1.0, this is basically the time between <em/accept()/
|
|
and <em/close()/.
|
|
<item>For persistent connections, this ought to be the time between
|
|
scheduling the reply and finishing sending it.
|
|
<item>For ICP, this is the time between scheduling a reply and actually
|
|
sending it.
|
|
</itemize>
|
|
<P>
|
|
Please note that the entries are logged <em/after/ the reply finished
|
|
being sent, <em/not/ during the lifetime of the transaction.
|
|
|
|
<tag/client address/
|
|
<P>
|
|
The IP address of the requesting client. The
<em/client_netmask/ configuration option can mask client addresses for
privacy reasons, but this makes analysis more difficult. Often it is
better to use one of the log file anonymizers.
|
|
<P>
|
|
Also, the <em/log_fqdn/ configuration option may log the fully qualified
|
|
domain name of the client instead of the dotted quad. The use of that
|
|
option is discouraged due to its performance impact.
|
|
|
|
<tag/result codes/<label id="log-resultcode">
|
|
<P>
|
|
This column is made up of two entries separated by a slash. This column
|
|
encodes the transaction result:
|
|
|
|
<enum>
|
|
<item>The cache result of the request contains information on the kind of
|
|
request, how it was satisfied, or in what way it failed. Please refer
|
|
to section <ref id="cache-result-codes" name="Squid result codes">
|
|
for valid symbolic result codes.
|
|
|
|
<P>
|
|
Several codes from older versions are no longer available, were
|
|
renamed, or split. Especially the <em/ERR_/ codes do not seem to
|
|
appear in the log file any more. Also refer to section
|
|
<ref id="cache-result-codes" name="Squid result codes"> for details
|
|
on the codes no longer available in Squid-2.
|
|
|
|
<P>
|
|
The NOVM versions and Squid-2 also rely on the Unix buffer cache, thus
you will see fewer <em/TCP_MEM_HIT/s than with Squid-1.
|
|
Basically, the NOVM feature relies on <em/read()/ to obtain an
|
|
object, but due to the kernel buffer cache, no disk activity is needed.
|
|
Only small objects (below 8KByte) are kept in Squid's part of main
|
|
memory.
|
|
|
|
<item>The status part contains the HTTP result codes with some Squid specific
|
|
extensions. Squid uses a subset of the RFC defined error codes for
|
|
HTTP. Refer to section <ref id="http-status-codes" name="status codes">
|
|
for details of the status codes recognized by a Squid-2.
|
|
</enum>
|
|
|
|
<tag/bytes/
|
|
<P>
|
|
The size is the amount of data delivered to the client. Mind that this does
|
|
not constitute the net object size, as headers are also counted. Also,
|
|
failed requests may deliver an error page, the size of which is also logged
|
|
here.
|
|
|
|
<tag/request method/
|
|
<P>
|
|
The request method to obtain an object. Please refer to section
|
|
<ref id="request-methods"> for available methods.
|
|
If you turned off <em/log_icp_queries/ in your configuration, you
will not see (and thus be unable to analyse) ICP exchanges. The <em/PURGE/
method is only available if you have an ACL for ``method purge'' enabled
in your configuration file.
|
|
|
|
<tag/URL/
|
|
<P>
|
|
This column contains the URL requested. Please note that the logged
URL may contain whitespace. The default configuration for
<em/uri_whitespace/ denies whitespace, though.
|
|
|
|
<tag/rfc931/
|
|
<P>
|
|
The eighth column may contain the ident lookups for the requesting
client. Since ident lookups have performance impact, the default
configuration turns <em/ident_lookups/ off. If turned off, or no ident
|
|
information is available, a ``-'' will be logged.
|
|
|
|
<tag/hierarchy code/
|
|
<P>
|
|
The hierarchy information consists of three items:
|
|
<P>
|
|
<enum>
|
|
<item>Any hierarchy tag may be prefixed with <em/TIMEOUT_/, if the
|
|
timeout occurs waiting for all ICP replies to return from the
|
|
neighbours. The timeout is dynamic if
<em/icp_query_timeout/ is not set; otherwise it is the time
configured there.
|
|
<item>A code that explains how the request was handled, e.g. by
|
|
forwarding it to a peer, or going straight to the source. Refer to
|
|
section <ref id="hier-codes"> for details on hierarchy codes and
|
|
removed hierarchy codes.
|
|
<item>The name of the host the object was requested from. This host may
|
|
be the origin site, a parent or any other peer. Also note that the
|
|
hostname may be numerical.
|
|
</enum>
|
|
|
|
<tag/type/
|
|
<P>
|
|
The content type of the object as seen in the HTTP reply
|
|
header. Please note that ICP exchanges usually don't have any content
|
|
type, and thus are logged ``-''. Also, some weird replies have content
|
|
types ``:'' or even empty ones.
|
|
</descrip>
|
|
|
|
<P>
|
|
There may be two more columns in the <em/access.log/, if the (debug) option
|
|
<em/log_mime_headers/ is enabled. In this case, the HTTP request headers are
|
|
logged between a ``['' and a ``]'', and the HTTP reply headers are also
|
|
logged between ``['' and ``]''. All control characters like CR and LF are
|
|
URL-escaped, but spaces are <em/not/ escaped! Parsers should watch out for
|
|
this.
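<P>
The column layout described above can be illustrated with a small parser.
This is only a sketch, not part of Squid: the names <em/AccessEntry/ and
<em/parse_access_line/ and the sample log line are made up for illustration,
and the code assumes the ten-column Squid-1.1/Squid-2 layout with no
whitespace inside the URL.

```python
from typing import NamedTuple


class AccessEntry(NamedTuple):
    """One native access.log entry (hypothetical helper type)."""
    time: float          # Unix timestamp with millisecond resolution
    elapsed_ms: int      # duration in milliseconds
    client: str          # client address
    result_code: str     # e.g. TCP_MISS
    status: int          # HTTP status code
    size: int            # bytes delivered to the client
    method: str
    url: str
    ident: str           # rfc931 column, "-" if unavailable
    hier_code: str       # e.g. DIRECT, possibly prefixed with TIMEOUT_
    peer_host: str
    content_type: str


def parse_access_line(line: str) -> AccessEntry:
    # Split on runs of whitespace into at most ten columns. Anything after
    # the tenth column (for example log_mime_headers output) stays attached
    # to it and is cut off below.
    cols = line.split(None, 9)
    if len(cols) < 10:
        raise ValueError("expected at least 10 columns")
    result_code, status = cols[3].split("/", 1)
    hier_code, peer_host = cols[8].split("/", 1)
    content_type = cols[9].split()[0]
    return AccessEntry(float(cols[0]), int(cols[1]), cols[2], result_code,
                       int(status), int(cols[4]), cols[5], cols[6], cols[7],
                       hier_code, peer_host, content_type)


# Illustrative line only; not taken from a real log.
sample = ("876543210.123    456 192.168.1.10 TCP_MISS/200 1234 GET "
          "http://www.example.com/ - DIRECT/www.example.com text/html")
entry = parse_access_line(sample)
```

A URL logged with embedded spaces would shift the columns, so a robust
parser needs extra care in that case.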
|
|
|
|
<sect1>Squid result codes
|
|
<label id="cache-result-codes">
|
|
|
|
<P>
|
|
The <bf/TCP_/ codes refer to requests on the HTTP port (usually 3128). The
|
|
<bf/UDP_/ codes refer to requests on the ICP port (usually 3130). If
|
|
ICP logging was disabled using the <em/log_icp_queries/ option, no ICP
|
|
replies will be logged.
|
|
|
|
<P>
|
|
The following result codes were taken from a Squid-2, compare with the
|
|
<em/log_tags/ struct in <em>src/access_log.c</em>:
|
|
|
|
<descrip>
|
|
|
|
<tag/TCP_HIT/
|
|
A valid copy of the requested object was in the cache.
|
|
|
|
<tag/TCP_MISS/
|
|
The requested object was not in the cache.
|
|
|
|
<tag/TCP_REFRESH_HIT/
|
|
The requested object was cached but <em/STALE/. The IMS query
|
|
for the object resulted in "304 not modified".
|
|
|
|
<tag/TCP_REF_FAIL_HIT/
|
|
The requested object was cached but <em/STALE/. The IMS query
|
|
failed and the stale object was delivered.
|
|
|
|
<tag/TCP_REFRESH_MISS/
|
|
The requested object was cached but <em/STALE/. The IMS query
|
|
returned the new content.
|
|
|
|
<tag/TCP_CLIENT_REFRESH_MISS/<label id="tcp-client-refresh-miss">
|
|
The client issued a "no-cache" pragma, or some analogous cache
|
|
control command along with the request. Thus, the cache has to
|
|
refetch the object.
|
|
|
|
<tag/TCP_IMS_HIT/<label id="tcp-ims-hit">
|
|
The client issued an IMS request for an object which was in the
|
|
cache and fresh.
|
|
|
|
<tag/TCP_SWAPFAIL_MISS/<label id="tcp-swapfail-miss">
|
|
The object was believed to be in the cache,
|
|
but could not be accessed.
|
|
|
|
<tag/TCP_NEGATIVE_HIT/
|
|
Request for a negatively cached object,
e.g. "404 not found", which the cache believes to be
inaccessible. Also refer to the explanations for
<em/negative_ttl/ in your <em/squid.conf/ file.
|
|
|
|
<tag/TCP_MEM_HIT/
|
|
A valid copy of the requested object was in the
|
|
cache <em/and/ it was in memory, thus avoiding disk accesses.
|
|
|
|
<tag/TCP_DENIED/
|
|
Access was denied for this request.
|
|
|
|
<tag/TCP_OFFLINE_HIT/
|
|
The requested object was retrieved from the
|
|
cache during offline mode. The offline mode never
|
|
validates any object, see <em/offline_mode/ in
|
|
<em/squid.conf/ file.
|
|
|
|
<tag/UDP_HIT/
|
|
A valid copy of the requested object was in the cache.
|
|
|
|
<tag/UDP_MISS/
|
|
The requested object is not in this cache.
|
|
|
|
<tag/UDP_DENIED/
|
|
Access was denied for this request.
|
|
|
|
<tag/UDP_INVALID/
|
|
An invalid request was received.
|
|
|
|
<tag/UDP_MISS_NOFETCH/<label id="udp-miss-nofetch">
|
|
During "-Y" startup, or during frequent
|
|
failures, a cache in hit only mode will return either UDP_HIT or
|
|
this code. Neighbours will thus only fetch hits.
|
|
|
|
<tag/NONE/
|
|
Seen with errors and cachemgr requests.
|
|
</descrip>
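<P>
As an illustration of how these codes might be used in analysis, the
following sketch tallies result codes and derives a TCP hit ratio. The
function name <em/tcp_hit_ratio/ is hypothetical, and counting every
<bf/TCP_/ code that ends in ``_HIT'' as a hit is an assumption made for this
example, not a definition taken from Squid.

```python
from collections import Counter


def tcp_hit_ratio(result_codes):
    """Return (per-code counts, fraction of TCP_ requests that were hits)."""
    counts = Counter(result_codes)
    # Only TCP_ codes describe HTTP requests; UDP_ codes are ICP traffic.
    tcp = {code: n for code, n in counts.items() if code.startswith("TCP_")}
    total = sum(tcp.values())
    # Assumption: any TCP_ code ending in _HIT counts as a hit.
    hits = sum(n for code, n in tcp.items() if code.endswith("_HIT"))
    return counts, (hits / total if total else 0.0)


codes = ["TCP_HIT", "TCP_MISS", "TCP_MEM_HIT", "UDP_HIT", "TCP_DENIED"]
counts, ratio = tcp_hit_ratio(codes)
```

Whether codes such as <em/TCP_NEGATIVE_HIT/ or <em/TCP_DENIED/ should count
towards the ratio depends on what you want to measure.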
|
|
|
|
<P>
|
|
The following codes are no longer available in Squid-2:
|
|
|
|
<descrip>
|
|
<tag/ERR_*/
|
|
Errors are now contained in the status code.
|
|
|
|
<tag/TCP_CLIENT_REFRESH/
|
|
See: <ref id="tcp-client-refresh-miss" name="TCP_CLIENT_REFRESH_MISS">.
|
|
|
|
<tag/TCP_SWAPFAIL/
|
|
See: <ref id="tcp-swapfail-miss" name="TCP_SWAPFAIL_MISS">.
|
|
|
|
<tag/TCP_IMS_MISS/
|
|
Deleted, <ref id="tcp-ims-hit" name="TCP_IMS_HIT"> used instead.
|
|
|
|
<tag/UDP_HIT_OBJ/
|
|
Hit objects are no longer available.
|
|
|
|
<tag/UDP_RELOADING/
|
|
See: <ref id="udp-miss-nofetch" name="UDP_MISS_NOFETCH">.
|
|
</descrip>
|
|
|
|
<sect1>HTTP status codes
|
|
<label id="http-status-codes">
|
|
<P>
|
|
These are taken from
|
|
<url url="http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2616.txt"
|
|
name="RFC 2616"> and verified for Squid. Squid-2 uses almost all
|
|
codes except 307 (Temporary Redirect), 416 (Request Range Not Satisfiable),
|
|
and 417 (Expectation Failed). Extra codes include 0 for a result code being
|
|
unavailable, and 600 to signal an invalid header, a proxy error. Also, some
|
|
definitions were added as per
|
|
<url url="http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2518.txt"
|
|
name="RFC 2518"> (WebDAV).
|
|
Yes, there are really two entries for status code
|
|
424, compare with <em/http_status/ in <em>src/enums.h</em>:
|
|
|
|
<verb>
|
|
000 Used mostly with UDP traffic.
|
|
|
|
100 Continue
|
|
101 Switching Protocols
|
|
*102 Processing
|
|
|
|
200 OK
|
|
201 Created
|
|
202 Accepted
|
|
203 Non-Authoritative Information
|
|
204 No Content
|
|
205 Reset Content
|
|
206 Partial Content
|
|
*207 Multi Status
|
|
|
|
300 Multiple Choices
|
|
301 Moved Permanently
|
|
302 Moved Temporarily
|
|
303 See Other
|
|
304 Not Modified
|
|
305 Use Proxy
|
|
[307 Temporary Redirect]
|
|
|
|
400 Bad Request
|
|
401 Unauthorized
|
|
402 Payment Required
|
|
403 Forbidden
|
|
404 Not Found
|
|
405 Method Not Allowed
|
|
406 Not Acceptable
|
|
407 Proxy Authentication Required
|
|
408 Request Timeout
|
|
409 Conflict
|
|
410 Gone
|
|
411 Length Required
|
|
412 Precondition Failed
|
|
413 Request Entity Too Large
|
|
414 Request URI Too Large
|
|
415 Unsupported Media Type
|
|
[416 Request Range Not Satisfiable]
|
|
[417 Expectation Failed]
|
|
*424 Locked
|
|
*424 Failed Dependency
|
|
*433 Unprocessable Entity
|
|
|
|
500 Internal Server Error
|
|
501 Not Implemented
|
|
502 Bad Gateway
|
|
503 Service Unavailable
|
|
504 Gateway Timeout
|
|
505 HTTP Version Not Supported
|
|
*507 Insufficient Storage
|
|
|
|
600 Squid header parsing error
|
|
</verb>
|
|
|
|
<sect1>Request methods
|
|
<label id="request-methods">
|
|
<P>
|
|
Squid recognizes several request methods as defined in
|
|
<url url="http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2616.txt"
|
|
name="RFC 2616">. Newer versions of Squid (2.2.STABLE5 and above)
|
|
also recognize
|
|
<url url="http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2518.txt"
|
|
name="RFC 2518"> ``HTTP Extensions for Distributed Authoring --
|
|
WEBDAV'' extensions.
|
|
|
|
<verb>
|
|
method    defined    cachabil.  meaning
--------- ---------- ---------- -------------------------------------------
GET       HTTP/0.9   possibly   object retrieval and simple searches.
HEAD      HTTP/1.0   possibly   metadata retrieval.
POST      HTTP/1.0   CC or Exp. submit data (to a program).
PUT       HTTP/1.1   never      upload data (e.g. to a file).
DELETE    HTTP/1.1   never      remove resource (e.g. file).
TRACE     HTTP/1.1   never      appl. layer trace of request route.
OPTIONS   HTTP/1.1   never      request available comm. options.
CONNECT   HTTP/1.1r3 never      tunnel SSL connection.

ICP_QUERY Squid      never      used for ICP based exchanges.
PURGE     Squid      never      remove object from cache.

PROPFIND  rfc2518    ?          retrieve properties of an object.
PROPPATCH rfc2518    ?          change properties of an object.
MKCOL     rfc2518    never      create a new collection.
MOVE      rfc2518    never      atomically move src to dst.
COPY      rfc2518    never      create a duplicate of src in dst.
LOCK      rfc2518    never      lock an object against modifications.
UNLOCK    rfc2518    never      unlock an object.
|
|
</verb>
|
|
|
|
|
|
|
|
<sect1>Hierarchy Codes
|
|
<label id="hier-codes">
|
|
<P>
|
|
The following hierarchy codes are used with Squid-2:
|
|
<descrip>
|
|
<tag/NONE/
|
|
For TCP HIT, TCP failures, cachemgr requests and all UDP
|
|
requests, there is no hierarchy information.
|
|
|
|
<tag/DIRECT/
|
|
The object was fetched from the origin server.
|
|
|
|
<tag/SIBLING_HIT/
|
|
The object was fetched from a sibling cache which replied with
|
|
UDP_HIT.
|
|
|
|
<tag/PARENT_HIT/
|
|
The object was requested from a parent cache which replied with
|
|
UDP_HIT.
|
|
|
|
<tag/DEFAULT_PARENT/
|
|
No ICP queries were sent. This parent was chosen because it was
|
|
marked ``default'' in the config file.
|
|
|
|
<tag/SINGLE_PARENT/
|
|
The object was requested from the only parent appropriate for the
|
|
given URL.
|
|
|
|
<tag/FIRST_UP_PARENT/
|
|
The object was fetched from the first parent in the list of
|
|
parents.
|
|
|
|
<tag/NO_PARENT_DIRECT/
|
|
The object was fetched from the origin server, because no parents
|
|
existed for the given URL.
|
|
|
|
<tag/FIRST_PARENT_MISS/
|
|
The object was fetched from the parent with the fastest (possibly
|
|
weighted) round trip time.
|
|
|
|
<tag/CLOSEST_PARENT_MISS/
|
|
This parent was chosen, because it included the lowest RTT
measurement to the origin server. See also the <em/closest-only/
peer configuration option.
|
|
|
|
<tag/CLOSEST_PARENT/
|
|
The parent selection was based on our own RTT measurements.
|
|
|
|
<tag/CLOSEST_DIRECT/
|
|
Our own RTT measurements returned a shorter time than any parent.
|
|
|
|
<tag/NO_DIRECT_FAIL/
|
|
The object could not be requested because of a firewall
|
|
configuration, see also <em/never_direct/ and related material,
|
|
and no parents were available.
|
|
|
|
<tag/SOURCE_FASTEST/
|
|
The origin site was chosen, because the source ping arrived fastest.
|
|
|
|
<tag/ROUNDROBIN_PARENT/
|
|
No ICP replies were received from any parent. The parent was
|
|
chosen, because it was marked for round robin in the config file
|
|
and had the lowest usage count.
|
|
|
|
<tag/CACHE_DIGEST_HIT/
|
|
The peer was chosen, because the cache digest predicted a
|
|
hit. This option was later replaced in order to distinguish
|
|
between parents and siblings.
|
|
|
|
<tag/CD_PARENT_HIT/
|
|
The parent was chosen, because the cache digest predicted a
|
|
hit.
|
|
|
|
<tag/CD_SIBLING_HIT/
|
|
The sibling was chosen, because the cache digest predicted a
|
|
hit.
|
|
|
|
<tag/NO_CACHE_DIGEST_DIRECT/
|
|
This output seems to be unused?
|
|
|
|
<tag/CARP/
|
|
The peer was selected by CARP.
|
|
|
|
<tag/ANY_PARENT/
|
|
part of <em>src/peer_select.c:hier_strings[]</em>.
|
|
|
|
<tag/INVALID CODE/
|
|
part of <em>src/peer_select.c:hier_strings[]</em>.
|
|
</descrip>
|
|
|
|
<P>
|
|
Almost any of these may be preceded by 'TIMEOUT_' if the two-second
|
|
(default) timeout occurs waiting for all ICP replies to arrive from
|
|
neighbors, see also the <em/icp_query_timeout/ configuration option.
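<P>
When analysing logs, the hierarchy column therefore has to be taken apart
into the optional timeout prefix, the code, and the host. A minimal sketch
follows; the function name <em/split_hierarchy/ and the sample values are
made up for illustration.

```python
def split_hierarchy(field):
    """Split 'TIMEOUT_CODE/host' into (timed_out, code, host)."""
    code, _, host = field.partition("/")
    timed_out = code.startswith("TIMEOUT_")
    if timed_out:
        # Drop the prefix so the remaining code matches the list above.
        code = code[len("TIMEOUT_"):]
    return timed_out, code, host


with_timeout = split_hierarchy("TIMEOUT_FIRST_UP_PARENT/proxy.example.net")
without_timeout = split_hierarchy("NONE/-")
```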
|
|
|
|
<P>
|
|
The following hierarchy codes were removed from Squid-2:
|
|
<verb>
|
|
code                 meaning
-------------------- -------------------------------------------------
PARENT_UDP_HIT_OBJ   hit objects are no longer available.
SIBLING_UDP_HIT_OBJ  hit objects are no longer available.
SSL_PARENT_MISS      SSL can now be handled by squid.
FIREWALL_IP_DIRECT   No special logging for hosts inside the firewall.
LOCAL_IP_DIRECT      No special logging for local networks.
|
|
</verb>
|
|
|
|
<sect1><em>cache/log</em> (Squid-1.x)
|
|
<label id="swaplog">
|
|
|
|
<P>
|
|
This file has a rather unfortunate name. It also is often called the
|
|
<em/swap log/. It is a record of every cache object written to disk.
|
|
It is read when Squid starts up to ``reload'' the cache. If you remove
|
|
this file when squid is NOT running, you will effectively wipe out your
|
|
cache contents. If you remove this file while squid IS running,
|
|
you can easily recreate it. The safest way is to simply shut down
|
|
the running process:
|
|
<verb>
|
|
% squid -k shutdown
|
|
</verb>
|
|
This will disrupt service, but at least you will have your swap log
|
|
back.
|
|
Alternatively, you can tell squid to rotate its log files. This also
|
|
causes a clean swap log to be written.
|
|
<verb>
|
|
% squid -k rotate
|
|
</verb>
|
|
|
|
<P>
|
|
For Squid-1.1, there are six fields:
|
|
<enum>
|
|
<item>
|
|
<bf/fileno/:
|
|
The swap file number holding the object data. This is mapped to a pathname on your filesystem.
|
|
|
|
<item>
|
|
<bf/timestamp/:
|
|
This is the time when the object was last verified to be current. The time is a
|
|
hexadecimal representation of Unix time.
|
|
|
|
<item>
|
|
<bf/expires/:
|
|
This is the value of the Expires header in the HTTP reply. If an Expires header
|
|
was not present, this will be -2 or fffffffe. If the Expires header was
|
|
present, but invalid (unparsable), this will be -1 or ffffffff.
|
|
|
|
<item>
|
|
<bf/lastmod/:
|
|
Value of the HTTP reply Last-Modified header. If missing it will be -2,
|
|
if invalid it will be -1.
|
|
|
|
<item>
|
|
<bf/size/:
|
|
Size of the object, including headers.
|
|
|
|
<item>
|
|
<bf/url/:
|
|
The URL naming this object.
|
|
|
|
</enum>
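<P>
The six fields can be decoded with a few lines of code. The sketch below
is hedged: the text above only states that the timestamp is hexadecimal and
that missing or invalid values appear as -2/fffffffe and -1/ffffffff. That
<bf/fileno/ and <bf/lastmod/ are also hexadecimal and that <bf/size/ is
decimal are assumptions here, and the function names and sample line are
hypothetical.

```python
def decode_time(field):
    """Decode a hex time field; fffffffe and ffffffff become -2 and -1."""
    value = int(field, 16)
    if value >= 0x80000000:
        # Interpret as a signed 32-bit number (two's complement).
        value -= 1 << 32
    return value


def parse_swaplog_line(line):
    # The URL is the last field, so split into at most six columns.
    fileno, timestamp, expires, lastmod, size, url = line.split(None, 5)
    return {
        "fileno": int(fileno, 16),      # assumption: hexadecimal
        "timestamp": decode_time(timestamp),
        "expires": decode_time(expires),
        "lastmod": decode_time(lastmod),
        "size": int(size),              # assumption: decimal
        "url": url.strip(),
    }


# Illustrative line only; not taken from a real swap log.
entry = parse_swaplog_line(
    "000000a3 33ce4a14 fffffffe ffffffff 1532 http://www.example.com/")
```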
|
|
|
|
<sect1><em>swap.state</em> (Squid-2.x)
|
|
<P>
|
|
In Squid-2, the swap log file is now called <em/swap.state/. This is
|
|
a binary file that includes MD5 checksums, and <em/StoreEntry/ fields.
|
|
Please see the <url url="../Prog-Guide/" name="Programmers Guide"> for
|
|
information on the contents and format of that file.
|
|
|
|
<p>
|
|
If you remove <em/swap.state/ while Squid is running, simply send
|
|
Squid the signal to rotate its log files:
|
|
<verb>
|
|
% squid -k rotate
|
|
</verb>
|
|
Alternatively, you can tell Squid to shut down and it will
|
|
rewrite this file before it exits.
|
|
|
|
<p>
|
|
If you remove the <em/swap.state/ while Squid is not running, you will
|
|
not lose your entire cache. In this case, Squid will scan all of
|
|
the cache directories and read each swap file to rebuild the cache.
|
|
This can take a very long time, so you'll have to be patient.
|
|
|
|
<p>
|
|
By default the <em/swap.state/ file is stored in the top-level
|
|
of each <em/cache_dir/. You can move the logs to a different
|
|
location with the <em/cache_swap_log/ option.
|
|
|
|
|
|
<sect1>Which log files can I delete safely?
|
|
<p>
|
|
You should never delete <em/access.log/, <em/store.log/,
|
|
<em/cache.log/, or <em/swap.state/ while Squid is running.
|
|
With Unix, you can delete a file when a process
|
|
has the file opened. However, the filesystem space is
|
|
not reclaimed until the process closes the file.
|
|
|
|
<p>
|
|
If you accidentally delete <em/swap.state/ while Squid is running,
|
|
you can recover it by following the instructions in the previous
question. If you delete the others while Squid is running,
you cannot recover them.
|
|
|
|
<p>
|
|
The correct way to maintain your log files is with Squid's ``rotate''
|
|
feature. You should rotate your log files at least once per day.
|
|
The current log files are closed and then renamed with numeric extensions
|
|
(.0, .1, etc). If you want to, you can write your own scripts
|
|
to archive or remove the old log files. If not, Squid will
|
|
only keep up to <em/logfile_rotate/ versions of each log file.
|
|
The logfile rotation procedure also writes a clean <em/swap.state/
|
|
file, but it does not leave numbered versions of the old files.
|
|
|
|
<P>
|
|
To rotate Squid's logs, simply use this command:
|
|
<verb>
|
|
squid -k rotate
|
|
</verb>
|
|
For example, use this cron entry to rotate the logs at midnight:
|
|
<verb>
|
|
0 0 * * * /usr/local/squid/bin/squid -k rotate
|
|
</verb>
|
|
|
|
<sect1>How can I disable Squid's log files?
|
|
|
|
<P>
|
|
To disable <em/access.log/:
|
|
<verb>
|
|
cache_access_log /dev/null
|
|
</verb>
|
|
|
|
<P>
|
|
To disable <em/store.log/:
|
|
<verb>
|
|
cache_store_log none
|
|
</verb>
|
|
|
|
<P>
|
|
It is a bad idea to disable the <em/cache.log/ because this file
|
|
contains many important status and debugging messages. However,
|
|
if you really want to, you can:
|
|
<verb>
|
|
cache_log /dev/null
|
|
</verb>
|
|
|
|
<sect1>My log files get very big!
|
|
<label id="log-large">
|
|
<P>
|
|
You need to <em/rotate/ your log files with a cron job. For example:
|
|
<verb>
|
|
0 0 * * * /usr/local/squid/bin/squid -k rotate
|
|
</verb>
|
|
|
|
<sect1>Managing log files
|
|
|
|
<P>
|
|
The preferred log file for analysis is the <em/access.log/ file in native
|
|
format. For long term evaluations, the log file should be obtained at
|
|
regular intervals. Squid offers an easy to use API for rotating log files,
|
|
in order that they may be moved (or removed) without disturbing the cache
|
|
operations in progress. The procedures were described above.
|
|
|
|
<P>
|
|
Depending on the disk space allocated for log file storage, it is
|
|
recommended to set up a cron job which rotates the log files every 24, 12,
|
|
or 8 hours. You will need to set your <em/logfile_rotate/ to a sufficiently
|
|
large number. During a time of some idleness, you can safely transfer the
|
|
log files to your analysis host in one burst.
|
|
|
|
<P>
|
|
Before transport, the log files can be compressed during off-peak time. On
|
|
the analysis host, the log files are concatenated into one file, so one file
for 24 hours is the yield. Also note that with <em/log_icp_queries/
enabled, you might have around 1 GB of uncompressed log information per day
on a busy cache. Look into your cache manager info page to make an educated
|
|
guess on the size of your log files.
|
|
|
|
<P>
|
|
The EU project <url url="http://www.desire.org/" name="DESIRE">
|
|
developed
|
|
<url url="http://www.uninett.no/prosjekt/desire/arneberg/statistics.html" name="some basic rules">
|
|
to obey when handling and processing log files:
|
|
|
|
<itemize>
|
|
<item>Respect the privacy of your clients when publishing results.
|
|
<item>Keep logs unavailable unless anonymized. Most countries have laws on
|
|
privacy protection, and some even on how long you are legally allowed to
|
|
keep certain kinds of information.
|
|
<item>Rotate and process log files at least once a day. Even if you don't
|
|
process the log files, they will grow quite large, see section
|
|
<ref id="log-large">. If you rely on processing the log files, reserve
|
|
a large enough partition solely for log files.
|
|
<item>Keep the size in mind when processing. It might take longer to
|
|
process log files than to generate them!
|
|
<item>Limit yourself to the numbers you are interested in. There is data
|
|
beyond your dreams available in your log file, some quite obvious, others
|
|
by combination of different views. Here are some examples for figures to
|
|
watch:
|
|
<itemize>
|
|
<item>The hosts using your cache.
|
|
<item>The elapsed time for HTTP requests - this is the latency the user
|
|
sees. Usually, you will want to make a distinction for HITs and MISSes
|
|
and overall times. Also, medians are preferred over averages.
|
|
<item>The requests handled per interval (e.g. second, minute or hour).
|
|
</itemize>
|
|
</itemize>
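<P>
The advice to report medians separately for HITs and MISSes could be
sketched as follows. The function name <em/median_elapsed/ is hypothetical,
and grouping codes by their ``_HIT'' or ``_MISS'' suffix is a simplification
chosen for this example; the input is assumed to be (result code, elapsed
milliseconds) pairs already extracted from <em/access.log/.

```python
from statistics import median


def median_elapsed(entries):
    """Median elapsed time in ms for TCP hits and misses."""
    groups = {"HIT": [], "MISS": []}
    for code, elapsed_ms in entries:
        if not code.startswith("TCP_"):
            continue  # skip ICP (UDP_) entries
        if code.endswith("_HIT"):
            groups["HIT"].append(elapsed_ms)
        elif code.endswith("_MISS"):
            groups["MISS"].append(elapsed_ms)
    # Leave out a group entirely if it has no samples.
    return {name: median(values) for name, values in groups.items() if values}


# Illustrative values only.
entries = [("TCP_HIT", 10), ("TCP_MEM_HIT", 2), ("TCP_HIT", 30),
           ("TCP_MISS", 400), ("TCP_REFRESH_MISS", 600),
           ("UDP_HIT", 1), ("TCP_DENIED", 5)]
medians = median_elapsed(entries)
```

A single very slow MISS shifts an average considerably but barely moves the
median, which is why the median is the more useful figure here.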
|
|
|
|
|
|
<sect1>Why do I get ERR_NO_CLIENTS_BIG_OBJ messages so often?
|
|
|
|
<P>
|
|
This message means that the requested object was in ``Delete Behind''
|
|
mode and the user aborted the transfer. An object will go into
|
|
``Delete Behind'' mode if
|
|
<itemize>
|
|
<item>It is larger than <em/maximum_object_size/
|
|
<item>It is being fetched from a neighbor which has the <em/proxy-only/ option set.
|
|
</itemize>
|
|
|
|
<sect1>What does ERR_LIFETIME_EXP mean?
|
|
|
|
<P>
|
|
This means that a timeout occurred while the object was being transferred. Most
|
|
likely the retrieval of this object was very slow (or it stalled before finishing)
|
|
and the user aborted the request. However, depending on your settings for
|
|
<em/quick_abort/, Squid may have continued to try retrieving the object.
|
|
Squid imposes a maximum amount of time on all open sockets, so after some amount
|
|
of time the stalled request was aborted and logged with an ERR_LIFETIME_EXP
|
|
message.
|
|
|
|
<sect1>Retrieving ``lost'' files from the cache
|
|
<P>
|
|
<quote><it>
|
|
I've been asked to retrieve an object which was accidentally
|
|
destroyed at the source for recovery.
|
|
So, how do I figure out where the things are so I can copy
|
|
them out and strip off the headers?
|
|
</it></quote>
|
|
<P>
|
|
The following method applies only to the Squid-1.1 versions:
|
|
<P>
|
|
Use <em>grep</em> to find the named object (URL) in the
|
|
<ref id="swaplog" name="cache/log"> file. The first field in
|
|
this file is an integer <em/file number/.
|
|
|
|
<P>
|
|
Then, find the file <em/fileno-to-pathname.pl/ from the ``scripts''
|
|
directory of the Squid source distribution. The usage is
|
|
<verb>
|
|
perl fileno-to-pathname.pl [-c squid.conf]
|
|
</verb>
|
|
File numbers are read on stdin, and pathnames are printed on
|
|
stdout.
|
|
|
|
<sect1>Can I use <em/store.log/ to figure out if a response was cachable?
|
|
<p>
|
|
Sort of. You can use <em/store.log/ to find out if a particular response
|
|
was <em>cached</em>.
|
|
<p>
|
|
Cached responses are logged with the SWAPOUT tag.
|
|
Uncached responses are logged with the RELEASE tag.
|
|
<p>
|
|
However, your
|
|
analysis must also consider that when a cached response is removed
|
|
from the cache (for example due to cache replacement) it is also
|
|
logged in <em/store.log/ with the RELEASE tag. To differentiate these
|
|
two, you can look at the filenumber (3rd) field. When an uncachable
|
|
response is released, the filenumber is FFFFFFFF (-1). Any other
|
|
filenumber indicates a cached response was released.
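<P>
The distinction described above can be expressed in a few lines. This is a
sketch under stated assumptions: it looks only at the second (tag) and third
(filenumber) columns, the function name <em/classify_store_line/ is
hypothetical, and the sample lines are shortened illustrations rather than
real <em/store.log/ output.

```python
def classify_store_line(line):
    """Classify a store.log line by its tag and filenumber columns."""
    cols = line.split()
    action, fileno = cols[1], cols[2]
    if action == "SWAPOUT":
        return "cached"
    if action == "RELEASE":
        # FFFFFFFF (-1) means the response never reached the disk cache.
        if fileno.upper() == "FFFFFFFF" or fileno == "-1":
            return "uncachable"
        return "removed from cache"
    return "other"


# Shortened, illustrative lines: timestamp, tag, filenumber.
swapout = classify_store_line("876543210.123 SWAPOUT 0000002A")
uncached = classify_store_line("876543211.456 RELEASE FFFFFFFF")
evicted = classify_store_line("876543212.789 RELEASE 0000002A")
```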
|
|
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>Operational issues
|
|
|
|
<sect1>How do I see system level Squid statistics?
|
|
<P>
|
|
The Squid distribution includes a CGI utility called <em/cachemgr.cgi/
|
|
which can be used to view squid statistics with a web browser.
|
|
This document has a section devoted to <em/cachemgr.cgi/ usage
|
|
which you should consult for more information.
|
|
|
|
<sect1>How can I find the biggest objects in my cache?
|
|
<P>
|
|
<verb>
|
|
sort -r -n -k 5,5 access.log | awk '{print $5, $7}' | head -25
|
|
</verb>
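<P>
The same idea can be written as a short script, which is handy when the
result feeds into further processing. This is a sketch only: the function
name <em/largest_transfers/ is hypothetical, the sample lines are invented,
and it assumes the native format with the byte count in column 5 and the URL
in column 7. Like the pipeline above, it reports the largest logged
transfers, so the same URL can appear more than once.

```python
import heapq


def largest_transfers(lines, n=25):
    """Return the n largest (bytes, url) pairs from native access.log lines."""
    sized = []
    for line in lines:
        cols = line.split()
        # Skip short or malformed lines instead of failing.
        if len(cols) >= 7 and cols[4].isdigit():
            sized.append((int(cols[4]), cols[6]))
    return heapq.nlargest(n, sized)


# Illustrative lines only.
lines = [
    "876543210.123 10 10.0.0.1 TCP_HIT/200 100 GET http://a/ - NONE/- text/html",
    "876543211.123 90 10.0.0.1 TCP_MISS/200 5000 GET http://b/ - DIRECT/b image/gif",
    "876543212.123 20 10.0.0.2 TCP_MISS/200 300 GET http://c/ - DIRECT/c text/html",
]
top = largest_transfers(lines, 2)
```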
|
|
|
|
<sect1>I want to restart Squid with a clean cache
|
|
<p>
|
|
<em>Note: The information here is current for version 2.2.</em>
|
|
<P>
|
|
First of all, you must stop Squid of course. You can use
|
|
the command:
|
|
<verb>
|
|
% squid -k shutdown
|
|
</verb>
|
|
|
|
<p>
|
|
The fastest way to restart with an entirely clean cache is
|
|
to overwrite the <em/swap.state/ files for each <em/cache_dir/
in your config file. Note, you cannot just remove the
|
|
<em/swap.state/ file, or truncate it to zero size. Instead,
|
|
you should put just one byte of garbage there. For example:
|
|
<verb>
|
|
% echo "" > /cache1/swap.state
|
|
</verb>
|
|
Repeat that for every <em/cache_dir/, then restart Squid.
|
|
Be sure to leave the <em/swap.state/ file with the same
|
|
owner and permissions that it had before!
|
|
|
|
<p>
|
|
Another way, which takes longer, is to have squid recreate all the
|
|
<em/cache_dir/ directories. But first you must move the existing
|
|
directories out of the way. For example, you can try this:
|
|
<verb>
|
|
% cd /cache1
|
|
% mkdir JUNK
|
|
% mv ?? swap.state* JUNK
|
|
% rm -rf JUNK &
|
|
</verb>
|
|
Repeat this for your other <em/cache_dir/'s, then tell Squid
|
|
to create new directories:
|
|
<verb>
|
|
% squid -z
|
|
</verb>
|
|
|
|
<sect1>How can I proxy/cache Real Audio?
|
|
|
|
<P>
|
|
by <url url="mailto:roever@nse.simac.nl" name="Rodney van den Oever">,
|
|
and <url url="mailto:jrg@blodwen.demon.co.uk" name="James R Grinter">
|
|
|
|
<P>
|
|
<itemize>
|
|
|
|
<item>
|
|
Point the RealPlayer at your Squid server's HTTP port (e.g. 3128).
|
|
|
|
<item>
|
|
Using the Preferences->Transport tab, select <em/Use specified transports/
|
|
and with the <em/Specified Transports/ button, select use <em/HTTP Only/.
|
|
</itemize>
|
|
|
|
The RealPlayer (and RealPlayer Plus) manual states:
|
|
<verb>
|
|
Use HTTP Only
|
|
Select this option if you are behind a firewall and cannot
|
|
receive data through TCP. All data will be streamed through
|
|
HTTP.
|
|
|
|
Note: You may not be able to receive some content if you select
|
|
this option.
|
|
</verb>
|
|
|
|
<P>
|
|
Again, from the documentation:
|
|
<verb>
|
|
RealPlayer 4.0 identifies itself to the firewall when making a
|
|
request for content to a RealServer. The following string is
|
|
attached to any URL that the Player requests using HTTP GET:
|
|
|
|
/SmpDsBhgRl
|
|
|
|
Thus, to identify an HTTP GET request from the RealPlayer, look
|
|
for:
|
|
|
|
http://[^/]+/SmpDsBhgRl
|
|
|
|
The Player can also be identified by the mime type in a POST to
|
|
the RealServer. The RealPlayer POST has the following mime
|
|
type:
|
|
|
|
"application/x-pncmd"
|
|
</verb>
|
|
|
|
Note that the first request is a POST, and the second has a '?' in the URL, so
|
|
standard Squid configurations would treat it as non-cachable. It also looks
|
|
rather ``magic.''
|
|
|
|
<P>
|
|
HTTP is an alternative delivery mechanism introduced with version 3 players,
|
|
and it allows a reasonable approximation to ``streaming'' data - that is playing
|
|
it as you receive it. For more details, see their notes on
|
|
<url url="http://www.real.com/products/encoder/realvideo/httpstream.html"
|
|
name="HTTP Pseudo-Streaming">.
|
|
|
|
<P>
|
|
It isn't available in the general case: only if someone has made the realaudio
|
|
file available via an HTTP server, or they're using a version 4 server, they've
|
|
switched it on, and you're using a version 4 client. If someone has made the
|
|
file available via their HTTP server, then it'll be cachable. Otherwise, it
|
|
won't be (as far as we can tell.)
|
|
|
|
<P>
|
|
The more common RealAudio link connects via their own <em/pnm:/ method and is
|
|
transferred using their proprietary protocol (via TCP or UDP) and not using
|
|
HTTP. It can't be cached nor proxied by Squid, and requires something such as
|
|
the simple proxy that Progressive Networks themselves have made available, if
|
|
you're in a firewall/no direct route situation. Their product does not cache
|
|
(and I don't know of any software available that does.)
|
|
|
|
<P>
|
|
Some confusion arises because there is also a configuration option to use an
|
|
HTTP proxy (such as Squid) with the Realaudio/RealVideo players. This is
|
|
because the players can fetch the ``<tt/.ram/'' file that contains the <em/pnm:/
|
|
reference for the audio/video stream. They fetch that .ram file from an HTTP
|
|
server, using HTTP.
|
|
|
|
<sect1>How can I purge an object from my cache?
|
|
<label id="purging-objects">
|
|
|
|
<P>
|
|
Squid does not allow
|
|
you to purge objects unless it is configured with access controls
|
|
in <em/squid.conf/. First you must add something like
|
|
<verb>
|
|
acl PURGE method PURGE
acl localhost src 127.0.0.1
http_access allow PURGE localhost
http_access deny PURGE
|
|
</verb>
|
|
The above only allows purge requests which come from the local host and
|
|
denies all other purge requests.
|
|
|
|
<P>
|
|
To purge an object, you can use the <em/client/ program:
|
|
<verb>
|
|
client -m PURGE http://www.miscreant.com/
|
|
</verb>
|
|
If the purge was successful, you will see a ``200 OK'' response:
|
|
<verb>
|
|
HTTP/1.0 200 OK
|
|
Date: Thu, 17 Jul 1997 16:03:32 GMT
|
|
Server: Squid/1.1.14
|
|
</verb>
|
|
If the object was not found in the cache, you will see a ``404 Not Found''
|
|
response:
|
|
<verb>
|
|
HTTP/1.0 404 Not Found
|
|
Date: Thu, 17 Jul 1997 16:03:22 GMT
|
|
Server: Squid/1.1.14
|
|
</verb>
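<P>
As a rough illustration of what the <em/client/ command sends, the
following Python sketch builds the same kind of PURGE request and reads the
status code from the reply. The proxy address 127.0.0.1:3128 and all function
names are assumptions made for this example only; adjust them to your own
setup. The request must come from the local host to pass the access controls
shown earlier.

```python
import socket


def build_purge_request(url):
    """Return the raw HTTP/1.0 request bytes asking a proxy to purge `url`."""
    return ("PURGE %s HTTP/1.0\r\nAccept: */*\r\n\r\n" % url).encode("ascii")


def status_code(reply):
    """Extract the numeric status code from the first line of a raw HTTP reply."""
    status_line = reply.split(b"\r\n", 1)[0]
    return int(status_line.split()[1])


def purge(url, proxy_host="127.0.0.1", proxy_port=3128, timeout=10.0):
    """Send a PURGE request to the proxy and return the reply's status code.

    The host and port defaults are assumptions, not values taken from Squid.
    """
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as conn:
        conn.sendall(build_purge_request(url))
        reply = b""
        # Read until the status line is complete or the proxy closes the socket.
        while b"\r\n" not in reply:
            chunk = conn.recv(4096)
            if not chunk:
                break
            reply += chunk
    return status_code(reply)
```

Calling <tt>purge("http://www.miscreant.com/")</tt> against a running cache
should return 200 if the object was purged and 404 if it was not in the cache.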
|
|
|
|
|
|
|
|
<sect1>Using ICMP to Measure the Network
|
|
<label id="using-icmp">
|
|
<P>
|
|
As of version 1.1.9, Squid is able to utilize ICMP Round-Trip-Time (RTT)
|
|
measurements to select the optimal location to forward a cache miss.
|
|
Previously, cache misses would be forwarded to the parent cache
|
|
which returned the first ICP reply message. These were logged
|
|
with FIRST_PARENT_MISS in the access.log file. Now we can
|
|
select the parent which is closest (RTT-wise) to the origin
|
|
server.
|
|
|
|
<sect2>Supporting ICMP in your Squid cache
|
|
|
|
<P>
|
|
It is more important that your parent caches enable the ICMP
|
|
features. If you are acting as a parent, then you may want
|
|
to enable ICMP on your cache. Also, if your cache makes
|
|
RTT measurements, it will fetch objects directly if your
|
|
cache is closer than any of the parents.
|
|
|
|
<P>
|
|
If you want your Squid cache to measure RTTs to origin servers,
|
|
Squid must be compiled with the USE_ICMP option. This is easily
|
|
accomplished by uncommenting "-DUSE_ICMP=1" in <em>src/Makefile</em> and/or
|
|
<em>src/Makefile.in</em>.
|
|
|
|
<P>
|
|
An external program called <em/pinger/ is responsible for sending and
|
|
receiving ICMP packets. It must run with root privileges. After
|
|
Squid has been compiled, the pinger program must be installed
|
|
separately. A special Makefile target will install <em/pinger/ with
|
|
appropriate permissions.
|
|
<verb>
|
|
% make install
|
|
% su
|
|
# make install-pinger
|
|
</verb>
|
|
There are three configuration file options for tuning the
|
|
measurement database on your cache. <em/netdb_low/ and <em/netdb_high/
|
|
specify low and high water marks for keeping the database to a
|
|
certain size (e.g. just like with the IP cache). The <em/netdb_ttl/
|
|
option specifies the minimum rate for pinging a site. If
|
|
<em/netdb_ttl/ is set to 300 seconds (5 minutes) then an ICMP packet
|
|
will not be sent to the same site more than once every five
|
|
minutes. Note that a site is only pinged when an HTTP request for
|
|
the site is received.
|
|
<P>
|
|
Another option, <em/minimum_direct_hops/, can be used to try finding
|
|
servers which are close to your cache. If the measured hop count
|
|
to the origin server is less than or equal to <em/minimum_direct_hops/,
|
|
the request will be forwarded directly to the origin server.
|
|
|
|
<sect2>Utilizing your parents' database
|
|
<P>
|
|
Your parent caches can be asked to include the RTT measurements
|
|
in their ICP replies. To do this, you must enable <em/query_icmp/
|
|
in your config file:
|
|
<verb>
|
|
query_icmp on
|
|
</verb>
|
|
This causes a flag to be set in your outgoing ICP queries.
|
|
<P>
|
|
If your parent caches return ICMP RTT measurements then
|
|
the eighth column of your access.log will have lines
|
|
similar to:
|
|
<verb>
|
|
CLOSEST_PARENT_MISS/it.cache.nlanr.net
|
|
</verb>
|
|
In this case, it means that <em/it.cache.nlanr.net/ returned
|
|
the lowest RTT to the origin server. If your cache measured
|
|
a lower RTT than any of the parents, the request will
|
|
be logged with
|
|
<verb>
|
|
CLOSEST_DIRECT/www.sample.com
|
|
</verb>
|
|
|
|
<sect2>Inspecting the database
|
|
<P>
|
|
The measurement database can be viewed from the cachemgr by
|
|
selecting "Network Probe Database." Hostnames are aggregated
|
|
into /24 networks. All measurements made are averaged over
|
|
time. Measurements are made to specific hosts, taken from
|
|
the URLs of HTTP requests. The recv and sent fields are the
|
|
number of ICMP packets received and sent, respectively. At this time they
|
|
are only informational.
|
|
<P>
|
|
A typical database entry looks something like this:
|
|
<verb>
|
|
Network recv/sent RTT Hops Hostnames
|
|
192.41.10.0 20/ 21 82.3 6.0 www.jisedu.org www.dozo.com
|
|
bo.cache.nlanr.net 42.0 7.0
|
|
uc.cache.nlanr.net 48.0 10.0
|
|
pb.cache.nlanr.net 55.0 10.0
|
|
it.cache.nlanr.net 185.0 13.0
|
|
</verb>
|
|
This means we have sent 21 pings to the 192.41.10.0 network, which
contains both www.jisedu.org and www.dozo.com, and received 20
replies. The average RTT is 82.3 milliseconds. The
|
|
next four lines show the measured values from our parent
|
|
caches. Since <em/bo.cache.nlanr.net/ has the lowest RTT,
|
|
it would be selected as the location to forward a request
|
|
for a www.jisedu.org or www.dozo.com URL.
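<P>
The selection rule described in this section can be sketched in a few lines
of Python. This is only an illustration of the rule as explained here, not
Squid's actual code; the function name and its arguments are invented for the
example, and the RTT values are the ones from the database entry above.

```python
def choose_forwarding(local_rtt, parent_rtts):
    """Pick where to send a cache miss, given RTTs (ms) to the origin server.

    local_rtt   -- RTT measured by this cache, or None if not measured
    parent_rtts -- dict mapping parent hostname to the RTT that parent reported
    Returns (hierarchy_code, host); host is None for a direct fetch.
    Returns None when no parent reported a measurement, since the rule
    sketched here does not apply in that case.
    """
    if not parent_rtts:
        return None
    best_parent = min(parent_rtts, key=parent_rtts.get)
    # Go direct only if our own measurement beats every parent.
    if local_rtt is not None and local_rtt < parent_rtts[best_parent]:
        return ("CLOSEST_DIRECT", None)
    return ("CLOSEST_PARENT_MISS", best_parent)


parents = {
    "bo.cache.nlanr.net": 42.0,
    "uc.cache.nlanr.net": 48.0,
    "pb.cache.nlanr.net": 55.0,
    "it.cache.nlanr.net": 185.0,
}
print(choose_forwarding(82.3, parents))
```

With a local RTT of 82.3 milliseconds, the parent with the 42.0 millisecond
measurement wins. If the local measurement were lower than 42.0, the request
would go direct instead.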
|
|
|
|
<sect1>Why are so few requests logged as TCP_IMS_MISS?
|
|
|
|
<P>
|
|
When Squid receives an <em/If-Modified-Since/ request, it will
|
|
not forward the request unless the object needs to be refreshed
|
|
according to the <em/refresh_pattern/ rules. If the object
does need to be refreshed, then the request will be logged as TCP_REFRESH_HIT
|
|
or TCP_REFRESH_MISS.
|
|
|
|
<P>
|
|
If the request is not forwarded, Squid replies to the IMS request
|
|
according to the object in its cache. If the modification times are the
|
|
same, then Squid returns TCP_IMS_HIT. If the modification times are
|
|
different, then Squid returns TCP_IMS_MISS. In most cases, the cached
|
|
object will not have changed, so the result is TCP_IMS_HIT. Squid will
|
|
only return TCP_IMS_MISS if some other client causes a newer version of
|
|
the object to be pulled into the cache.
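<P>
The decision described above can be summarized as a small function. This is a
simplified sketch of the explanation in this section, not Squid's actual code;
the function name, its arguments, and the "FORWARD" return value are invented
for illustration. Timestamps are plain numbers such as seconds since the epoch.

```python
def classify_ims_request(needs_refresh, cached_last_modified, if_modified_since):
    """Return how an If-Modified-Since request would be handled.

    needs_refresh        -- True if refresh_pattern says the object is stale
    cached_last_modified -- modification time of the object in the cache
    if_modified_since    -- modification time sent by the client
    """
    if needs_refresh:
        # The request goes upstream and is later logged as
        # TCP_REFRESH_HIT or TCP_REFRESH_MISS.
        return "FORWARD"
    if cached_last_modified == if_modified_since:
        return "TCP_IMS_HIT"
    # Only reached when the cache holds a different copy than the client has.
    return "TCP_IMS_MISS"
```

Because clients usually hold the same copy that the cache does, the last
branch is rarely taken, which is why TCP_IMS_MISS is uncommon in the logs.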
|
|
|
|
<sect1>How can I make Squid NOT cache some servers or URLs?
|
|
|
|
<P>
|
|
In Squid-2, you use the <em/no_cache/ option to specify uncachable
|
|
requests. For example, this makes all responses from origin servers
|
|
in the 10.0.1.0/24 network uncachable:
|
|
<verb>
|
|
acl Local dst 10.0.1.0/24
|
|
no_cache deny Local
|
|
</verb>
|
|
|
|
<p>
|
|
This example makes all URLs ending in '.html' uncachable:
|
|
<verb>
|
|
acl HTML url_regex \.html$
|
|
no_cache deny HTML
|
|
</verb>
|
|
|
|
<p>
|
|
This example makes a specific URL uncachable:
|
|
<verb>
|
|
acl XYZZY url_regex ^http://www\.i\.suck\.com/foo\.html$
|
|
no_cache deny XYZZY
|
|
</verb>
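<P>
When writing <em/url_regex/ patterns, remember that an unescaped dot matches
any character. The short Python sketch below shows the difference. Squid uses
its own regular expression library rather than Python's, so this is only an
illustration, but patterns this simple behave the same way. The example URLs
and the helper function are made up for the demonstration.

```python
import re

urls = [
    "http://www.example.com/index.html",
    "http://www.example.com/dhtml",
    "http://www.example.com/index.html?x=1",
]


def matching(pattern, candidates):
    """Return the candidates that the pattern matches anywhere in the string."""
    regex = re.compile(pattern)
    return [url for url in candidates if regex.search(url)]


loose = matching(r".html$", urls)    # the dot matches any character
strict = matching(r"\.html$", urls)  # the dot must be a literal "."

print(loose)
print(strict)
```

The unescaped pattern also matches the URL ending in "dhtml", while the
escaped one matches only URLs that really end in ".html". Neither matches the
URL with a query string, because of the trailing "$" anchor.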
|
|
|
|
<p>
|
|
This example caches nothing between 8AM and 11AM:
|
|
<verb>
|
|
acl Morning time 08:00-11:00
|
|
no_cache deny Morning
|
|
</verb>
|
|
|
|
<P>
|
|
In Squid-1.1,
|
|
whether or not an object gets cached is controlled by the
|
|
<em/cache_stoplist/ and <em/cache_stoplist_pattern/ options. So, you may add:
|
|
<verb>
|
|
cache_stoplist my.domain.com
|
|
</verb>
|
|
Specifying uncachable objects by IP address is harder. The <url url="../1.1/patches.html"
|
|
name="1.1 patch page"> includes a patch called <em/no-cache-local.patch/ which
|
|
changes the behaviour of the <em/local_ip/ and <em/local_domain/ options so
|
|
that matching requests are NOT CACHED, in addition to being fetched directly.
|
|
|
|
<sect1>How can I delete and recreate a cache directory?
|
|
|
|
<P>
|
|
Deleting an existing cache directory is not too difficult. Unfortunately,
|
|
you can't simply change squid.conf and then reconfigure. You can't
|
|
stop using a <em>cache_dir</em> while Squid is running. Also note
|
|
that Squid requires at least one <em>cache_dir</em> to run.
|
|
|
|
<enum>
|
|
<item>
|
|
Edit your <em/squid.conf/ file and comment out, or delete
|
|
the <em/cache_dir/ line for the cache directory that you want to
|
|
remove.
|
|
<item>
|
|
If you don't have any <em>cache_dir</em> lines in your squid.conf,
|
|
then Squid was using the default. You'll need to add a new
|
|
<em>cache_dir</em> line because Squid will continue to use
|
|
the default otherwise. You can add a small, temporary directory, for
|
|
example:
|
|
<verb>
|
|
/usr/local/squid/cachetmp ....
|
|
</verb>
|
|
If you add a new <em>cache_dir</em> you have to run <em>squid -z</em>
|
|
to initialize that directory.
|
|
|
|
<item>
|
|
Remember that
you cannot delete a cache directory from a running Squid process;
you cannot simply reconfigure Squid. You must
shut down Squid:
|
|
<verb>
|
|
squid -k shutdown
|
|
</verb>
|
|
<item>
|
|
Once Squid exits, you may immediately start it up again. Since you
|
|
deleted the old <em>cache_dir</em> from squid.conf, Squid won't
|
|
try to access that directory.
|
|
If you
|
|
use the RunCache script, Squid should start up again automatically.
|
|
<item>
|
|
Now Squid is no longer using the cache directory that you removed
|
|
from the config file. You can verify this by checking "Store Directory"
|
|
information with the cache manager. From the command line, type:
|
|
<verb>
|
|
client mgr:storedir
|
|
</verb>
|
|
|
|
<item>
|
|
Now that Squid is not using the cache directory, you can <em/rm -rf/ it,
|
|
format the disk, build a new filesystem, or whatever.
|
|
|
|
</enum>
|
|
|
|
<P>
|
|
The procedure to recreate the directory is similar.
|
|
<enum>
|
|
<item>
|
|
Edit <em/squid.conf/ and add a new <em/cache_dir/ line.
|
|
<item>
|
|
Initialize the new directory by running
|
|
<verb>
|
|
% squid -z
|
|
</verb>
|
|
NOTE: it is safe to run this even if Squid is already running. <em/squid -z/
|
|
will harmlessly try to create all of the subdirectories that already exist.
|
|
<item>
|
|
Reconfigure Squid
|
|
<verb>
|
|
squid -k reconfigure
|
|
</verb>
|
|
Unlike deleting, you can add new cache directories while Squid is
|
|
already running.
|
|
</enum>
|
|
|
|
<sect1>Why can't I run Squid as root?
|
|
<P>
|
|
by Dave J Woolley
|
|
<P>
|
|
If someone were to discover a buffer overrun bug in Squid and it runs as
|
|
a user other than root, they can only corrupt the files writeable to
|
|
that user, but if it runs as root, they can take over the whole machine.
|
|
This applies to all programs that don't absolutely need root status, not
|
|
just squid.
|
|
|
|
<sect1>Can you tell me a good way to upgrade Squid with minimal downtime?
|
|
<P>
|
|
Here is a technique that was described by <url url="mailto:radu@netsoft.ro"
|
|
name="Radu Greab">.
|
|
<P>
|
|
Start a second Squid server on an unused HTTP port (say 4128). This
|
|
instance of Squid probably doesn't need a large disk cache. When this
|
|
second server has finished reloading the disk store, swap the
|
|
<em/http_port/ values in the two <em/squid.conf/ files. Set the
|
|
original Squid to use port 5128, and the second one to use 3128. Next,
|
|
run ``squid -k reconfigure'' for both Squids. New requests will go to
|
|
the second Squid, now on port 3128 and the first Squid will finish
|
|
handling its current requests. After a few minutes, it should be safe
|
|
to fully shut down the first Squid and upgrade it. Later you can simply
|
|
repeat this process in reverse.
|
|
|
|
<sect1>Can Squid listen on more than one HTTP port?
|
|
<p>
|
|
<em>Note: The information here is current for version 2.3.</em>
|
|
<p>
|
|
Yes, you can specify multiple <em/http_port/ lines in your <em/squid.conf/
|
|
file. Squid attempts to bind() to each port that you specify. Sometimes
|
|
Squid may not be able to bind to a port, either because of permissions
|
|
or because the port is already in use. If Squid can bind to at least
|
|
one port, then it will continue running. If it cannot bind to
|
|
any of the ports, then Squid stops.
|
|
|
|
<p>
|
|
With version 2.3 and later you can specify IP addresses
|
|
and port numbers together (see the squid.conf comments).
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>Memory
|
|
|
|
<sect1>Why does Squid use so much memory!?
|
|
|
|
<P>
|
|
Squid uses a lot of memory for performance reasons. It takes much, much
|
|
longer to read something from disk than it does to read directly from
|
|
memory.
|
|
|
|
<P>
|
|
A small amount of metadata for each cached object is kept in memory.
|
|
This is the <em/StoreEntry/ data structure. For <em/Squid-2/ this is
|
|
56 bytes on "small" pointer architectures (Intel, Sparc, MIPS, etc) and
88 bytes on "large" pointer architectures (Alpha). In addition, there
|
|
is a 16-byte cache key (MD5 checksum) associated with each
|
|
<em/StoreEntry/. This means there are 72 or 104 bytes of metadata in
|
|
memory for every object in your cache. A cache with 1,000,000
objects therefore requires 72 MB of memory (104 MB on large-pointer
architectures) for <em/metadata only/.
|
|
In practice it requires much more than that.
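<P>
The arithmetic above is easy to repeat for your own cache. The sketch below
uses the per-object sizes quoted in this section for <em/Squid-2/; the
function and constant names are invented for the example, and the result is
only the lower bound for metadata, not the total process size.

```python
# Per-object metadata sizes quoted in this section (bytes).
STORE_ENTRY_BYTES = {"small": 56, "large": 88}
CACHE_KEY_BYTES = 16  # MD5 cache key


def metadata_bytes(num_objects, pointer_size="small"):
    """Estimate the in-memory metadata needed for num_objects cached objects."""
    per_object = STORE_ENTRY_BYTES[pointer_size] + CACHE_KEY_BYTES
    return num_objects * per_object


for arch in ("small", "large"):
    total = metadata_bytes(1000000, arch)
    print("%s pointers: %d MB" % (arch, total // 1000000))
```

Divide the number of objects in your cache into the memory you can spare to
see whether the metadata alone will fit.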
|
|
|
|
<P>
|
|
Squid-1.1 also uses a lot of memory to store in-transit objects.
|
|
This version stores incoming objects only in memory, until the transfer
|
|
is complete. At that point it decides whether or not to store the object
|
|
on disk. This means that when users download large files, your memory
|
|
usage will increase significantly. The squid.conf parameter <em/maximum_object_size/
|
|
determines how much memory an in-transit object can consume before we
|
|
mark it as uncachable. When an object is marked uncachable, there is no
|
|
need to keep all of the object in memory, so the memory is freed for
|
|
the part of the object which has already been written to the client.
|
|
In other words, lowering <em/maximum_object_size/ also lowers Squid-1.1
|
|
memory usage.
|
|
|
|
<P>
|
|
Other uses of memory by Squid include:
|
|
<itemize>
|
|
<item>
|
|
Disk buffers for reading and writing
|
|
<item>
|
|
Network I/O buffers
|
|
<item>
|
|
IP Cache contents
|
|
<item>
|
|
FQDN Cache contents
|
|
<item>
|
|
Netdb ICMP measurement database
|
|
<item>
|
|
Per-request state information, including full request and
|
|
reply headers
|
|
<item>
|
|
Miscellaneous statistics collection.
|
|
<item>
|
|
``Hot objects'' which are kept entirely in memory.
|
|
</itemize>
|
|
|
|
<sect1>How can I tell how much memory my Squid process is using?
|
|
|
|
<P>
|
|
One way is to simply look at <em/ps/ output on your system.
|
|
For BSD-ish systems, you probably want to use the <em/-u/ option
|
|
and look at the <em/VSZ/ and <em/RSS/ fields:
|
|
<verb>
|
|
wessels ˜ 236% ps -axuhm
|
|
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
|
|
squid 9631 4.6 26.4 141204 137852 ?? S 10:13PM 78:22.80 squid -NCYs
|
|
</verb>
|
|
For SYSV-ish, you probably want to use the <em/-l/ option.
|
|
When interpreting the <em/ps/ output, be sure to check your <em/ps/
|
|
manual page. It may not be obvious if the reported numbers are kbytes,
|
|
or pages (usually 4 kb).
|
|
|
|
<P>
|
|
A nicer way to check the memory usage is with a program called
|
|
<em/top/:
|
|
<verb>
|
|
last pid: 20128; load averages: 0.06, 0.12, 0.11 14:10:58
|
|
46 processes: 1 running, 45 sleeping
|
|
CPU states: % user, % nice, % system, % interrupt, % idle
|
|
Mem: 187M Active, 1884K Inact, 45M Wired, 268M Cache, 8351K Buf, 1296K Free
|
|
Swap: 1024M Total, 256K Used, 1024M Free
|
|
|
|
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
|
|
9631 squid 2 0 138M 135M select 78:45 3.93% 3.93% squid
|
|
</verb>
|
|
|
|
<P>
|
|
Finally, you can ask the Squid process to report its own memory
|
|
usage. This is available on the Cache Manager <em/info/ page.
|
|
Your output may vary depending upon your operating system and
|
|
Squid version, but it looks similar to this:
|
|
<verb>
|
|
Resource usage for squid:
|
|
Maximum Resident Size: 137892 KB
|
|
Memory usage for squid via mstats():
|
|
Total space in arena: 140144 KB
|
|
Total free: 8153 KB 6%
|
|
</verb>
|
|
|
|
<P>
|
|
If your RSS (Resident Set Size) value is much lower than your
|
|
process size, then your cache performance is most likely suffering
|
|
due to <ref id="paging" name="paging">.
|
|
|
|
|
|
<sect1>My Squid process grows without bounds.
|
|
|
|
<P>
|
|
You might just have your <em/cache_mem/ parameter set too high.
|
|
See the ``<ref id="lower-mem-usage" name="What can I do to reduce Squid's memory usage?">''
|
|
entry below.
|
|
|
|
<P>
|
|
When a process continually grows in size, without levelling off
|
|
or slowing down, it often indicates a memory leak. A memory leak
|
|
is when some chunk of memory is used, but not freed when it is
|
|
done being used.
|
|
|
|
<P>
|
|
Memory leaks are a real problem for programs (like Squid) which do all
|
|
of their processing within a single process. Historically, Squid has
|
|
had real memory leak problems. But as the software has matured, we
|
|
believe almost all of Squid's memory leaks have been eliminated, and
|
|
new ones are at least easy to identify.
|
|
|
|
<P>
|
|
Memory leaks may also be present in your system's libraries, such
|
|
as <em/libc.a/ or even <em/libmalloc.a/. If you experience the ever-growing
|
|
process size phenomenon, we suggest you first try an
|
|
<ref id="alternate-malloc" name="alternative malloc library">.
|
|
|
|
<sect1>I set <em/cache_mem/ to XX, but the process grows beyond that!
|
|
|
|
<P>
|
|
The <em/cache_mem/ parameter <bf/does NOT/ specify the maximum
|
|
size of the process. It only specifies how much memory to use
|
|
for caching ``hot'' (very popular) replies. Squid's actual memory
|
|
usage depends very strongly on your cache size (disk space) and
|
|
your incoming request load. Reducing <em/cache_mem/ will usually
|
|
also reduce the process size, but not necessarily, and there are
|
|
other ways to reduce Squid's memory usage (see below).
|
|
|
|
|
|
<sect1>How do I analyze memory usage from the cache manager output?
|
|
<label id="analyze-memory-usage">
|
|
<P>
|
|
|
|
<P>
|
|
<it>
|
|
Note: This information is specific to Squid-1.1 versions
|
|
</it>
|
|
|
|
<P>
|
|
Look at your <em/cachemgr.cgi/ <tt/Cache
|
|
Information/ page. For example:
|
|
<verb>
|
|
Memory usage for squid via mallinfo():
|
|
Total space in arena: 94687 KB
|
|
Ordinary blocks: 32019 KB 210034 blks
|
|
Small blocks: 44364 KB 569500 blks
|
|
Holding blocks: 0 KB 5695 blks
|
|
Free Small blocks: 6650 KB
|
|
Free Ordinary blocks: 11652 KB
|
|
Total in use: 76384 KB 81%
|
|
Total free: 18302 KB 19%
|
|
|
|
Meta Data:
|
|
StoreEntry 246043 x 64 bytes = 15377 KB
|
|
IPCacheEntry 971 x 88 bytes = 83 KB
|
|
Hash link 2 x 24 bytes = 0 KB
|
|
URL strings = 11422 KB
|
|
Pool MemObject structures 514 x 144 bytes = 72 KB ( 70 free)
|
|
Pool for Request structur 516 x 4380 bytes = 2207 KB ( 2121 free)
|
|
Pool for in-memory object 6200 x 4096 bytes = 24800 KB ( 22888 free)
|
|
Pool for disk I/O 242 x 8192 bytes = 1936 KB ( 1888 free)
|
|
Miscellaneous = 2600 KB
|
|
total Accounted = 58499 KB
|
|
</verb>
|
|
|
|
<P>
|
|
First note that <tt/mallinfo()/ reports 94M in ``arena.'' This
|
|
is pretty close to what <em/top/ says (97M).
|
|
|
|
<P>
|
|
Of that 94M, 81% (76M) is actually being used at the moment. The
|
|
rest has been freed, or pre-allocated by <tt/malloc(3)/
|
|
and not yet used.
|
|
|
|
<P>
|
|
Of the 76M in use, we can account for 58.5M (76%). There are some
|
|
calls to <tt/malloc(3)/ for which we can't account.
|
|
|
|
<P>
|
|
The <tt/Meta Data/ list gives the breakdown of where the
|
|
accounted memory has gone. 45% has gone to <tt/StoreEntry/
|
|
and URL strings. Another 42% has gone to buffers that hold objects
|
|
in VM while they are fetched and relayed to the clients (<tt/Pool
|
|
for in-memory object/).
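<P>
The percentages quoted above can be recomputed from the sample output. The
sketch below simply redoes that arithmetic with the numbers from the example
page; the variable and function names are invented for the illustration. The
results may differ from the quoted figures by a point, depending on whether
values are rounded or truncated.

```python
# Values in KB, copied from the sample Cache Information page above.
arena_kb = 94687
in_use_kb = 76384
accounted_kb = 58499
store_entry_kb = 15377
url_strings_kb = 11422
in_memory_pool_kb = 24800


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


in_use_pct = percent(in_use_kb, arena_kb)
accounted_pct = percent(accounted_kb, in_use_kb)
index_pct = percent(store_entry_kb + url_strings_kb, accounted_kb)
buffers_pct = percent(in_memory_pool_kb, accounted_kb)

print("in use / arena:                 %.1f%%" % in_use_pct)
print("accounted / in use:             %.1f%%" % accounted_pct)
print("StoreEntry + URLs / accounted:  %.1f%%" % index_pct)
print("in-memory objects / accounted:  %.1f%%" % buffers_pct)
```

Substituting the numbers from your own cache manager page shows quickly
whether metadata or in-transit buffers dominate your memory usage.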
|
|
|
|
<P>
|
|
The pool sizes are specified by <em/squid.conf/ parameters.
|
|
In version 1.0, these pools are somewhat broken: we keep a stack
|
|
of unused pages instead of freeing the block. In the <tt/Pool
|
|
for in-memory object/, the unused stack size is 1/2 of
|
|
<tt/cache_mem/. The <tt>Pool for disk I/O</tt> is
|
|
hardcoded at 200. For <tt/MemObject/ and <tt/Request/
|
|
it's 1/8 of your system's <tt/FD_SETSIZE/ value.
|
|
|
|
<P>
|
|
If you need to lower your process size, we recommend lowering the
|
|
max object sizes in the 'http', 'ftp' and 'gopher' config lines.
|
|
You may also want to lower <tt/cache_mem/ to suit your
|
|
needs. But if you make <tt/cache_mem/ too low, then some
|
|
objects may not get saved to disk during high-load periods. Newer
|
|
Squid versions allow you to set <tt/memory_pools off/ to
|
|
disable the free memory pools.
|
|
|
|
<sect1>The ``Total memory accounted'' value is less than the size of my Squid process.
|
|
|
|
<P>
|
|
We are not able to account for <em/all/ memory that Squid uses. This
|
|
would require excessive amounts of code to keep track of every last byte.
|
|
We do our best to account for the major uses of memory.
|
|
|
|
<P>
|
|
Also, note that the <em/malloc/ and <em/free/ functions have
|
|
their own overhead. Some additional memory is required to keep
|
|
track of which chunks are in use, and which are free. Additionally,
|
|
most operating systems do not allow processes to shrink in size.
|
|
When a process gives up memory by calling <em/free/, the total
|
|
process size does not shrink. So the process size really
|
|
represents the maximum size your Squid process has reached.
|
|
|
|
|
|
<sect1>xmalloc: Unable to allocate 4096 bytes!
|
|
<label id="malloc-death">
|
|
|
|
<P>
|
|
by <url url="mailto:hno@hem.passagen.se" name="Henrik Nordstrom">
|
|
|
|
<P>
|
|
Messages like "FATAL: xcalloc: Unable to allocate 4096 blocks of 1 bytes!"
|
|
appear when Squid can't allocate more memory, and on most operating systems
|
|
(including BSD) there are only two possible reasons:
|
|
<enum>
|
|
<item>The machine is out of swap
|
|
<item>The process' maximum data segment size has been reached
|
|
</enum>
|
|
The first case is detected using the normal swap monitoring tools
|
|
available on the platform (<em/pstat/ on SunOS, perhaps <em/pstat/ is
|
|
used on BSD as well).
|
|
<P>
|
|
To tell if it is the second case, first rule out the first case and then
|
|
monitor the size of the Squid process. If it dies at a certain size with
|
|
plenty of swap left, then the max data segment size has been reached, without
a doubt.
|
|
<P>
|
|
The data segment size can be limited by two factors:
|
|
<enum>
|
|
<item>Kernel imposed maximum, which no user can go above
|
|
<item>The size set with ulimit, which the user can control.
|
|
</enum>
|
|
<P>
|
|
When Squid starts it sets data and file ulimits to the hard level. If
|
|
you manually tune ulimit before starting Squid make sure that you set
|
|
the hard limit and not only the soft limit (the default operation of
|
|
ulimit is to only change the soft limit). Only root is allowed to raise the
hard limit; the soft limit can never exceed it.
|
|
<P>
|
|
This command prints the hard limits:
|
|
<verb>
|
|
ulimit -aH
|
|
</verb>
|
|
<P>
|
|
This command sets the data size to unlimited:
|
|
<verb>
|
|
ulimit -HSd unlimited
|
|
</verb>
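<P>
The same limits can also be inspected from a script. The following sketch uses
Python's standard <tt/resource/ module, which is available on Unix systems, to
print the soft and hard data segment limits of the current process. The helper
function names are invented for this example. Raising the soft limit to the
hard limit, as the second helper does, mirrors what is described above but is
not Squid's own code.

```python
import resource


def describe(limit):
    """Format a limit value returned by getrlimit() for display."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return "%d bytes" % limit


def raise_soft_to_hard(which=resource.RLIMIT_DATA):
    """Set the soft limit equal to the hard limit and return the new pair."""
    _, hard_limit = resource.getrlimit(which)
    resource.setrlimit(which, (hard_limit, hard_limit))
    return resource.getrlimit(which)


soft, hard = resource.getrlimit(resource.RLIMIT_DATA)
print("data segment soft limit:", describe(soft))
print("data segment hard limit:", describe(hard))
```

If the hard limit printed here is lower than the size at which your Squid
process dies, the data segment limit is the likely cause.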
|
|
|
|
|
|
<sect2>BSD/OS
|
|
<P>
|
|
by <url url="mailto:Arjan.deVet@adv.IAEhv.nl" name="Arjan de Vet">
|
|
<P>
|
|
The default kernel limit on BSD/OS for datasize is 64MB (at least on 3.0
|
|
which I'm using).
|
|
|
|
<P>
|
|
Recompile a kernel with larger datasize settings:
|
|
|
|
<verb>
|
|
maxusers 128
|
|
# Support for large inpcb hash tables, e.g. busy WEB servers.
|
|
options INET_SERVER
|
|
# support for large routing tables, e.g. gated with full Internet routing:
|
|
options "KMEMSIZE=\(16*1024*1024\)"
|
|
options "DFLDSIZ=\(128*1024*1024\)"
|
|
options "DFLSSIZ=\(8*1024*1024\)"
|
|
options "SOMAXCONN=128"
|
|
options "MAXDSIZ=\(256*1024*1024\)"
|
|
</verb>
|
|
|
|
See <em>/usr/share/doc/bsdi/config.n</em> for more info.
|
|
|
|
<P>
|
|
In /etc/login.conf I have this:
|
|
|
|
<verb>
|
|
default:\
|
|
:path=/bin /usr/bin /usr/contrib/bin:\
|
|
:datasize-cur=256M:\
|
|
:openfiles-cur=1024:\
|
|
:openfiles-max=1024:\
|
|
:maxproc-cur=1024:\
|
|
:stacksize-cur=64M:\
|
|
:radius-challenge-styles=activ,crypto,skey,snk,token:\
|
|
:tc=auth-bsdi-defaults:\
|
|
:tc=auth-ftp-bsdi-defaults:
|
|
|
|
#
|
|
# Settings used by /etc/rc and root
|
|
# This must be set properly for daemons started as root by inetd as well.
|
|
# Be sure reset these values back to system defaults in the default class!
|
|
#
|
|
daemon:\
|
|
:path=/bin /usr/bin /sbin /usr/sbin:\
|
|
:widepasswords:\
|
|
:tc=default:
|
|
# :datasize-cur=128M:\
|
|
# :openfiles-cur=256:\
|
|
# :maxproc-cur=256:\
|
|
</verb>
|
|
|
|
<P>
|
|
This should give enough space for a 256MB squid process.
|
|
|
|
<sect2>FreeBSD (2.2.X)
|
|
<P>
|
|
by Duane Wessels
|
|
<P>
|
|
The procedure is almost identical to that for BSD/OS above.
|
|
Increase the open file descriptor limit in <em>/sys/conf/param.c</em>:
|
|
<verb>
|
|
int maxfiles = 4096;
|
|
int maxfilesperproc = 1024;
|
|
</verb>
|
|
Increase the maximum and default data segment size in your kernel
|
|
config file, e.g. <em>/sys/conf/i386/CONFIG</em>:
|
|
<verb>
|
|
options "MAXDSIZ=(512*1024*1024)"
|
|
options "DFLDSIZ=(128*1024*1024)"
|
|
</verb>
|
|
We also found it necessary to increase the number of mbuf clusters:
|
|
<verb>
|
|
options "NMBCLUSTERS=10240"
|
|
</verb>
|
|
And, if you have more than 256 MB of physical memory, you probably
|
|
have to disable BOUNCE_BUFFERS (whatever that is), so comment
|
|
out this line:
|
|
<verb>
|
|
#options BOUNCE_BUFFERS #include support for DMA bounce buffers
|
|
</verb>
|
|
|
|
|
|
Also, update limits in <em>/etc/login.conf</em>:
|
|
<verb>
|
|
# Settings used by /etc/rc
|
|
#
|
|
daemon:\
|
|
:coredumpsize=infinity:\
|
|
:datasize=infinity:\
|
|
:maxproc=256:\
|
|
:maxproc-cur@:\
|
|
:memoryuse-cur=64M:\
|
|
:memorylocked-cur=64M:\
|
|
:openfiles=4096:\
|
|
:openfiles-cur@:\
|
|
:stacksize=64M:\
|
|
:tc=default:
|
|
</verb>
|
|
And don't forget to run ``cap_mkdb /etc/login.conf'' after editing that file.
|
|
|
|
|
|
<sect2>OSF, Digital Unix
|
|
<P>
|
|
by <url url="mailto:ongbh@zpoprp.zpo.dec.com" name="Ong Beng Hui">
|
|
<P>
|
|
To increase the data size for Digital UNIX, edit the file <tt>/etc/sysconfigtab</tt>
|
|
and add the entry...
|
|
<verb>
|
|
proc:
|
|
per-proc-data-size=1073741824
|
|
</verb>
|
|
Or, with csh, use the limit command, such as
|
|
<verb>
|
|
> limit datasize 1024M
|
|
</verb>
|
|
|
|
<P>
|
|
Editing <tt>/etc/sysconfigtab</tt> requires a reboot, but the limit command
|
|
doesn't.
|
|
|
|
<sect1>fork: (12) Cannot allocate memory
|
|
<P>
|
|
When Squid is reconfigured (SIGHUP) or the logs are rotated (SIGUSR1),
|
|
some of the helper processes (dnsserver) must be killed and
|
|
restarted. If your system does not have enough virtual memory,
|
|
the Squid process may not be able to fork to start the new helper
|
|
processes.
|
|
The best way to fix this is to increase your virtual memory by adding
|
|
swap space. Normally your system uses raw disk partitions for swap
|
|
space, but most operating systems also support swapping on regular
|
|
files (Digital Unix excepted). See your system manual pages for
|
|
<em/swap/, <em/swapon/, and <em/mkfile/.
|
|
|
|
<sect1>What can I do to reduce Squid's memory usage?
|
|
<label id="lower-mem-usage">
|
|
<P>
|
|
If your cache performance is suffering because of memory limitations,
|
|
you might consider buying more memory. But if that is not an option,
|
|
there are a number of things to try:
|
|
<itemize>
|
|
<item>
|
|
Try a <ref id="alternate-malloc" name="different malloc library">.
|
|
<item>
|
|
Reduce the <em/cache_mem/ parameter in the config file. This controls
|
|
how many ``hot'' objects are kept in memory. Reducing this parameter
|
|
will not significantly affect performance, but you may receive
|
|
some warnings in <em/cache.log/ if your cache is busy.
|
|
<item>
|
|
Turn <em/memory_pools/ off in the config file. This causes
|
|
Squid to give up unused memory by calling <em/free()/ instead of
|
|
holding on to the chunk for potential, future use.
|
|
<item>
|
|
Reduce the <em/cache_swap/ parameter in your config file. This will
|
|
reduce the number of objects Squid keeps. Your overall hit ratio may go down a
|
|
little, but your cache will perform significantly better.
|
|
<item>
|
|
Reduce the <em/maximum_object_size/ parameter (Squid-1.1 only).
|
|
You won't be able to
|
|
cache the larger objects, and your byte volume hit ratio may go down,
|
|
but Squid will perform better overall.
|
|
<item>
|
|
If you are using Squid-1.1.x, try the ``NOVM'' version.
|
|
</itemize>
|
|
|
|
<sect1>Using an alternate <em/malloc/ library.
|
|
<label id="alternate-malloc">
|
|
<P>
|
|
Many users have found improved performance and memory utilization when
|
|
linking Squid with an external malloc library. We recommend either
|
|
GNU malloc, or dlmalloc.
|
|
|
|
<sect2>Using GNU malloc
|
|
|
|
<P>
|
|
To make Squid use GNU malloc follow these simple steps:
|
|
|
|
<enum>
|
|
<item>Download the GNU malloc source, available from one of
|
|
<url url="http://www.gnu.org/order/ftp.html"
|
|
name="The GNU FTP Mirror sites">.
|
|
<item>Compile GNU malloc
|
|
<verb>
|
|
% gzip -dc malloc.tar.gz | tar xf -
|
|
% cd malloc
|
|
% vi Makefile # edit as needed
|
|
% make
|
|
</verb>
|
|
<item>Copy libmalloc.a to your system's library directory and be sure to
|
|
name it <em/libgnumalloc.a/.
|
|
<verb>
|
|
% su
|
|
# cp malloc.a /usr/lib/libgnumalloc.a
|
|
</verb>
|
|
<item>(Optional) Copy the GNU malloc.h to your system's include directory and
|
|
be sure to name it <em/gnumalloc.h/. This step is not required, but if
|
|
you do this, then Squid will be able to use the <em/mstats()/ function to
|
|
report memory usage statistics on the cachemgr info page.
|
|
<verb>
|
|
# cp malloc.h /usr/include/gnumalloc.h
|
|
</verb>
|
|
<item>Reconfigure and recompile Squid
|
|
<verb>
|
|
% make realclean
|
|
% ./configure ...
|
|
% make
|
|
% make install
|
|
</verb>
|
|
Note: in later distributions, 'realclean' has been changed to 'distclean'.
|
|
As the configure script runs, watch its output. You should find that
|
|
it locates libgnumalloc.a and optionally gnumalloc.h.
|
|
</enum>
|
|
|
|
<sect2>dlmalloc
|
|
|
|
<P>
|
|
<url url="http://g.oswego.edu/dl/html/malloc.html" name="dlmalloc">
|
|
has been written by <url url="mailto:dl@cs.oswego.edu"
|
|
name="Doug Lea">. According to Doug:
|
|
<quote>
|
|
This is not the fastest, most space-conserving, most portable, or
|
|
most tunable malloc ever written. However it is among the fastest
|
|
while also being among the most space-conserving, portable and tunable.
|
|
</quote>
|
|
|
|
<P>
|
|
dlmalloc is included with the <em/Squid-2/ source distribution.
|
|
To use this library, you simply give an option to the <em/configure/
|
|
script:
|
|
<verb>
|
|
% ./configure --enable-dlmalloc ...
|
|
</verb>
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>The Cache Manager
|
|
<label id="cachemgr-section">
|
|
<P>
|
|
by <url url="mailto:JLarmour@origin-at.co.uk" name="Jonathan Larmour">
|
|
|
|
<sect1>What is the cache manager?
|
|
<P>
|
|
The cache manager (<em/cachemgr.cgi/) is a CGI utility for
|
|
displaying statistics about the <em/squid/ process as it runs.
|
|
The cache manager is a convenient way to manage the cache and view
|
|
statistics without logging into the server.
|
|
|
|
<sect1>How do you set it up?
|
|
<P>
|
|
That depends on which web server you're using. Below you will
|
|
find instructions for configuring the CERN and Apache servers
|
|
to permit <em/cachemgr.cgi/ usage.
|
|
<P>
|
|
<em/EDITOR'S NOTE: readers are encouraged to submit instructions
|
|
for configuration of cachemgr.cgi on other web server platforms, such
|
|
as Netscape./
|
|
|
|
<P>
|
|
After you edit the server configuration files, you will probably
|
|
need to either restart your web server or send it a <tt/SIGHUP/ signal
|
|
to tell it to re-read its configuration files.
|
|
|
|
<P>
|
|
When you're done configuring your web server, you'll connect to
|
|
the cache manager with a web browser, using a URL such as:
|
|
<verb>
|
|
http://www.example.com/Squid/cgi-bin/cachemgr.cgi/
|
|
</verb>
|
|
|
|
<sect1>Cache manager configuration for CERN httpd 3.0
|
|
<P>
|
|
First, you should ensure that only specified workstations can access
|
|
the cache manager. That is done in your CERN <em/httpd.conf/, not in
|
|
<em/squid.conf/.
|
|
|
|
<verb>
|
|
Protection MGR-PROT {
|
|
Mask @(workstation.example.com)
|
|
}
|
|
</verb>
|
|
|
|
Wildcards and IP addresses are acceptable, and additional hosts
can be added as a comma-separated list. There are many other
ways to protect the script; your server documentation has
details.
|
|
|
|
<P>
|
|
You also need to add:
|
|
<verb>
|
|
Protect /Squid/* MGR-PROT
|
|
Exec /Squid/cgi-bin/*.cgi /usr/local/squid/bin/*.cgi
|
|
</verb>
|
|
This marks the script as executable to those in <tt/MGR-PROT/.
|
|
|
|
<sect1>Cache manager configuration for Apache
|
|
<P>
|
|
First, make sure the cgi-bin directory you're using is listed with a
|
|
<tt/ScriptAlias/ in your Apache <em/srm.conf/ file like this:
|
|
<verb>
|
|
ScriptAlias /Squid/cgi-bin/ /usr/local/squid/cgi-bin/
|
|
</verb>
|
|
It's probably a <bf/bad/ idea to <tt/ScriptAlias/
|
|
the entire <em>/usr/local/squid/bin/</em> directory where all the
|
|
Squid executables live.
|
|
<P>
|
|
Next, you should ensure that only specified workstations can access
|
|
the cache manager. That is done in your Apache <em/access.conf/,
|
|
not in <em/squid.conf/. At the bottom of <em/access.conf/
|
|
file, insert:
|
|
<verb>
|
|
<Location /Squid/cgi-bin/cachemgr.cgi>
|
|
order deny,allow
|
|
deny from all
|
|
allow from workstation.example.com
|
|
&etago;Location>
|
|
</verb>
|
|
|
|
You can have more than one allow line, and you can allow
|
|
domains or networks.
|
|
<P>
|
|
Alternatively, <em/cachemgr.cgi/ can be password-protected. You'd
|
|
add the following to <em/access.conf/:
|
|
|
|
<verb>
|
|
<Location /Squid/cgi-bin/cachemgr.cgi>
|
|
AuthUserFile /path/to/password/file
|
|
AuthGroupFile /dev/null
|
|
AuthName "User/Password Required"
|
|
AuthType Basic
|
|
require user cachemanager
|
|
&etago;Location>
|
|
</verb>
|
|
|
|
Consult the Apache documentation for information on using <em/htpasswd/
|
|
to set a password for this ``user.''
|
|
|
|
<sect1>Cache manager configuration for Roxen 2.0 and later
|
|
<p>
|
|
by Francesco ``kinkie'' Chemolli
|
|
<p>
|
|
Notice: this is <em/not/ how things would best be done
with Roxen, but this is what you need to do to adhere to the
example.
|
|
Also, knowledge of basic Roxen configuration is required.
|
|
|
|
<p>
|
|
This is what's required to start up a fresh Virtual Server, only
|
|
serving the cache manager. If you already have some Virtual Server
|
|
you wish to use to host the Cache Manager, just add a new CGI
|
|
support module to it.
|
|
|
|
<p>
|
|
Create a new virtual server, and set it to host http://www.example.com/.
|
|
Add to it at least the following modules:
|
|
<itemize>
|
|
<item>Content Types
|
|
<item>CGI scripting support
|
|
</itemize>
|
|
|
|
<p>
|
|
In the <em/CGI scripting support/ module, section <em/Settings/,
|
|
change the following settings:
|
|
<itemize>
|
|
<item>CGI-bin path: set to /Squid/cgi-bin/
|
|
<item>Handle *.cgi: set to <em/no/
|
|
<item>Run user scripts as owner: set to <em/no/
|
|
<item>Search path: set to the directory containing the cachemgr.cgi file
|
|
</itemize>
|
|
|
|
<p>
|
|
In section <em/Security/, set <em/Patterns/ to:
|
|
<verb>
|
|
allow ip=1.2.3.4
|
|
</verb>
|
|
where 1.2.3.4 is the IP address for workstation.example.com
|
|
|
|
<p>
|
|
Save the configuration, and you're done.
|
|
|
|
<sect1>Cache manager ACLs in <em/squid.conf/
|
|
<P>
|
|
The default cache manager access configuration in <em/squid.conf/ is:
|
|
|
|
<verb>
|
|
acl manager proto cache_object
|
|
acl localhost src 127.0.0.1/255.255.255.255
|
|
acl all src 0.0.0.0/0.0.0.0
|
|
</verb>
|
|
|
|
With the following rules:
|
|
|
|
<verb>
|
|
http_access deny manager !localhost
|
|
http_access allow all
|
|
</verb>
|
|
|
|
<P>
|
|
The first ACL is the most important as the cache manager program
|
|
interrogates squid using a special <tt/cache_object/ protocol.
|
|
Try it yourself by doing:
|
|
<P>
|
|
<verb>
|
|
telnet mycache.example.com 3128
|
|
GET cache_object://mycache.example.com/info HTTP/1.0
|
|
</verb>
|
|
<P>
|
|
The default ACLs say that if the request is for a
|
|
<tt/cache_object/, and it isn't the local host, then deny
|
|
access; otherwise allow access.
|
|
|
|
<P>
|
|
In fact, only allowing localhost access means that on the
|
|
initial <em/cachemgr.cgi/ form you can only specify the cache
|
|
host as <tt/localhost/. We recommend the following:
|
|
|
|
<verb>
|
|
acl manager proto cache_object
|
|
acl localhost src 127.0.0.1/255.255.255.255
|
|
acl example src 123.123.123.123/255.255.255.255
|
|
acl all src 0.0.0.0/0.0.0.0
|
|
</verb>
|
|
|
|
Where <tt/123.123.123.123/ is the IP address of your web server.
|
|
Then modify the rules like this:
|
|
|
|
<verb>
|
|
http_access allow manager localhost
|
|
http_access allow manager example
|
|
http_access deny manager
|
|
http_access allow all
|
|
</verb>
|
|
If you're using <em/miss_access/, then don't forget to also add
|
|
a <em/miss_access/ rule for the cache manager:
|
|
<verb>
|
|
miss_access allow manager
|
|
</verb>
|
|
|
|
<P>
|
|
|
|
The default ACLs assume that your web server is on the same machine
|
|
as <em/squid/. Remember that the connection from the cache
|
|
manager program to squid originates at the web server, not the
|
|
browser. So if your web server lives somewhere else, you should
|
|
make sure that the IP address of the web server that has <em/cachemgr.cgi/
|
|
installed on it is in the <tt/example/ ACL above.
|
|
|
|
<P>
|
|
Always be sure to send a <tt/SIGHUP/ signal to <em/squid/
|
|
any time you change the <em/squid.conf/ file.
|
|
|
|
<sect1>Why does it say I need a password and a URL?
|
|
<P>
|
|
If you ``drop'' the list box, and browse it, you will see that the
|
|
password is only required to shut down the cache, and the URL is
required to refresh an object (i.e., retrieve it from its original
source again). Otherwise these fields can be left blank: a password
|
|
is not required to obtain access to the informational aspects of
|
|
<em/cachemgr.cgi/.
|
|
|
|
<sect1>I want to shut down the cache remotely. What's the password?
|
|
<P>
|
|
See the <tt/cachemgr_passwd/ directive in <em/squid.conf/.
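<P>
As a rough sketch (the password <em/secret/ is a placeholder, not a
default), a <em/squid.conf/ entry that requires a password for the
<tt/shutdown/ action might look like this:
<verb>
# hypothetical password; choose your own
cachemgr_passwd secret shutdown
</verb>
The same password is then entered in the password field of the
<em/cachemgr.cgi/ form.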
|
|
|
|
<sect1>How do I make the cache host default to <em/my/ cache?
|
|
<P>
|
|
When you run <em/configure/ use the <em/--enable-cachemgr-hostname/ option:
|
|
<verb>
|
|
% ./configure --enable-cachemgr-hostname=`hostname` ...
|
|
</verb>
|
|
<p>
|
|
Note: if you do this after Squid has already been installed, you need to
|
|
make sure <em/cachemgr.cgi/ gets recompiled. For example:
|
|
<verb>
|
|
% cd src
|
|
% rm cachemgr.o cachemgr.cgi
|
|
% make cachemgr.cgi
|
|
</verb>
|
|
<p>
|
|
Then copy <em/cachemgr.cgi/ to your HTTP server's <em/cgi-bin/ directory.
|
|
|
|
<sect1>What's the difference between Squid TCP connections and Squid UDP connections?
|
|
<P>
|
|
Browsers and caches use TCP connections to retrieve web objects
|
|
from web servers or caches. UDP connections are used when another
|
|
cache using you as a sibling or parent wants to find out if you
|
|
have an object in your cache that it's looking for. The UDP
|
|
connections are ICP queries.
|
|
|
|
<sect1>It says the storage expiration will happen in 1970!
|
|
<P>
|
|
Don't worry. The default (and sensible) behavior of <em/squid/
|
|
is to expire an object when it happens to overwrite it. It doesn't
|
|
explicitly garbage collect (unless you tell it to in other ways).
|
|
|
|
<sect1>What do the Meta Data entries mean?
|
|
<P>
|
|
<descrip>
|
|
|
|
<tag/StoreEntry/
|
|
Entry describing an object in the cache.
|
|
|
|
<tag/IPCacheEntry/
|
|
An entry in the DNS cache.
|
|
|
|
<tag/Hash link/
|
|
Link in the cache hash table structure.
|
|
|
|
<tag/URL strings/
|
|
The strings of the URLs themselves that map to
|
|
an object number in the cache, allowing access to the
|
|
StoreEntry.
|
|
|
|
</descrip>
|
|
|
|
<P>
|
|
Basically just like the <tt/log/ file in your cache directory:
|
|
|
|
<enum>
|
|
<item><tt/PoolMemObject structures/:
Info about objects currently in memory
(e.g., in the process of being transferred).
<item><tt/Pool for Request structures/:
Information about each request as it happens.
<item><tt/Pool for in-memory object/:
Space for object data as it is retrieved.
|
|
</enum>
|
|
|
|
<P>
|
|
If <em/squid/ is much smaller than this field, run for cover!
|
|
Something is very wrong, and you should probably restart <em/squid/.
|
|
|
|
<sect1>In the utilization section, what is <tt/Other/?
|
|
<P>
|
|
|
|
<tt/Other/ is a default category to track objects which
|
|
don't fall into one of the defined categories.
|
|
|
|
<sect1>In the utilization section, why is the <tt>Transfer KB/sec</tt>
|
|
column always zero?
|
|
<P>
|
|
This column contains gross estimations of data transfer rates
|
|
averaged over the entire time the cache has been running. These
|
|
numbers are unreliable and mostly useless.
|
|
|
|
<sect1>In the utilization section, what is the <tt/Object Count/?
|
|
<P>
|
|
The number of objects of that type in the cache right now.
|
|
|
|
<sect1>In the utilization section, what is the <tt>Max/Current/Min KB</tt>?
|
|
<P>
|
|
These are the largest total size that all objects of this type have
grown to, their current total size, and the smallest total size they
have shrunk to.
|
|
|
|
<sect1>What is the <tt>I/O</tt> section about?
|
|
<P>
|
|
These are histograms on the number of bytes read from the network
|
|
per <tt/read(2)/ call. Somewhat useful for determining
|
|
maximum buffer sizes.
|
|
|
|
<sect1>What is the <tt/Objects/ section for?
|
|
<P>
|
|
<bf><em/Warning:/</bf> this will download to your browser
|
|
a list of every URL in the cache and statistics about it. It can
|
|
be very, very large. <bf><em/Sometimes it will be larger than
|
|
the amount of available memory in your client!/</bf> You
|
|
probably don't need this information anyway.
|
|
|
|
<sect1>What is the <tt/VM Objects/ section for?
|
|
<P>
|
|
<tt/VM Objects/ are the objects which are in Virtual Memory.
|
|
These are objects which are currently being retrieved and
|
|
those which were kept in memory for fast access (accelerator
|
|
mode).
|
|
|
|
<sect1>What does <tt/AVG RTT/ mean?
|
|
<P>
|
|
Average Round Trip Time. This is the average time between
sending an ICP ping and receiving a reply.
|
|
|
|
<sect1>In the IP cache section, what's the difference between a hit, a negative hit and a miss?
|
|
<P>
|
|
|
|
A HIT means that the hostname was found in the IP cache. A
MISS means that it wasn't found in the cache. A negative hit
means that it was found in the cache, but as a record of a failed lookup.
|
|
|
|
<sect1>What do the IP cache contents mean anyway?
|
|
<P>
|
|
|
|
The hostname is the name that was requested to be resolved.
|
|
|
|
<P>
|
|
For the <tt/Flags/ column:
|
|
|
|
<itemize>
|
|
<item><tt/C/ Means positively cached.
|
|
<item><tt/N/ Means negatively cached.
|
|
<item><tt/P/ Means the request is pending being dispatched.
|
|
<item><tt/D/ Means the request has been dispatched and we're waiting for an answer.
|
|
<item><tt/L/ Means it is a locked entry because it represents a parent or sibling.
|
|
</itemize>
|
|
|
|
The <tt/TTL/ column represents ``Time To Live'' (i.e., how long
|
|
the cache entry is valid). (May be negative if the document has
|
|
expired.)
|
|
|
|
<P>
|
|
The <tt/N/ column is the number of IP addresses associated
with the hostname.
|
|
|
|
<P>
|
|
The rest of the line lists all the IP addresses that have been associated
|
|
with that IP cache entry.
|
|
<P>
|
|
|
|
<sect1>What is the fqdncache and how is it different from the ipcache?
|
|
<P>
|
|
IPCache contains data for the Hostname to IP-Number mapping, and
|
|
FQDNCache does it the other way round. For example:
|
|
|
|
<em/IP Cache Contents:/
|
|
<verb>
|
|
Hostname Flags lstref TTL N [IP-Number]
|
|
gorn.cc.fh-lippe.de C 0 21581 1 193.16.112.73
|
|
lagrange.uni-paderborn.de C 6 21594 1 131.234.128.245
|
|
www.altavista.digital.com C 10 21299 4 204.123.2.75 ...
|
|
2/ftp.symantec.com DL 1583 -772855 0
|
|
|
|
Flags: C --> Cached
|
|
D --> Dispatched
|
|
N --> Negative Cached
|
|
L --> Locked
|
|
lstref: Time since last use
|
|
TTL: Time-To-Live until information expires
|
|
N: Count of addresses
|
|
</verb>
|
|
|
|
<P>
|
|
<em/FQDN Cache Contents:/
|
|
<verb>
|
|
IP-Number Flags TTL N Hostname
|
|
130.149.17.15 C -45570 1 andele.cs.tu-berlin.de
|
|
194.77.122.18 C -58133 1 komet.teuto.de
|
|
206.155.117.51 N -73747 0
|
|
|
|
Flags: C --> Cached
|
|
D --> Dispatched
|
|
N --> Negative Cached
|
|
L --> Locked
|
|
TTL: Time-To-Live until information expires
|
|
N: Count of names
|
|
</verb>
|
|
|
|
<sect1>What does ``Page faults with physical i/o: 4897'' mean?
|
|
<label id="paging">
|
|
|
|
<P>
|
|
This question was asked on the <em/squid-users/ mailing list, to which
|
|
there were three excellent replies.
|
|
|
|
<P>
|
|
by <url url="mailto:JLarmour@origin-at.co.uk" name="Jonathan Larmour">
|
|
|
|
<P>
|
|
You get a ``page fault'' when your OS tries to access something in memory
|
|
which is actually swapped to disk. The term ``page fault,'' while correct at
|
|
the kernel and CPU level, is a bit deceptive to a user, as there's no
|
|
actual error - this is a normal feature of operation.
|
|
|
|
<P>
|
|
Also, this doesn't necessarily mean your squid is swapping by that much.
|
|
Most operating systems also implement paging for executables, so that only
|
|
sections of the executable which are actually used are read from disk into
|
|
memory. Also, whenever squid needs more memory, the fact that the memory
|
|
was allocated will show up in the page faults.
|
|
|
|
<P>
|
|
However, if the number of faults is unusually high, and getting bigger,
|
|
this could mean that squid is swapping. Another way to verify this is using
|
|
a program called ``vmstat'' which is found on most UNIX platforms. If you run
|
|
this as ``vmstat 5'' this will update a display every 5 seconds. This can
|
|
tell you if the system as a whole is swapping a lot (see your local man
|
|
page for vmstat for more information).
|
|
|
|
<P>
|
|
It is very bad for squid to swap, as every single request will be blocked
|
|
until the requested data is swapped in. It is better to tweak the <em/cache_mem/
|
|
and/or <em/memory_pools/ setting in squid.conf, or switch to the NOVM versions
|
|
of squid, than allow this to happen.
|
|
|
|
<P>
|
|
by <url url="mailto:peter@spinner.dialix.com.au" name="Peter Wemm">
|
|
|
|
<P>
|
|
There are two different operations at work: paging and swapping. Paging
|
|
is when individual pages are shuffled (either discarded or swapped
|
|
to/from disk), while ``swapping'' <em/generally/ means the entire
|
|
process got sent to/from disk.
|
|
|
|
<P>
|
|
Needless to say, swapping a process is a pretty drastic event, and usually
|
|
only reserved for when there's a memory crunch and paging out cannot free
|
|
enough memory quickly enough. Also, there's some variation on how
|
|
swapping is implemented in OS's. Some don't do it at all or do a hybrid
|
|
of paging and swapping instead.
|
|
|
|
<P>
|
|
As you say, paging out doesn't necessarily involve disk IO, eg: text (code)
|
|
pages are read-only and can simply be discarded if they are not used (and
|
|
reloaded if/when needed). Data pages are also discarded if unmodified, and
|
|
paged out if there's been any changes. Allocated memory (malloc) is always
|
|
saved to disk since there's no executable file to recover the data from.
|
|
mmap() memory is variable. If it's backed by a file, it uses the same
|
|
rules as the data segment of a file - ie: either discarded if unmodified or
|
|
paged out.
|
|
|
|
<P>
|
|
There's also ``demand zeroing'' of pages, which causes faults as well. If you
|
|
malloc memory and it calls brk()/sbrk() to allocate new pages, the chances
|
|
are that you are allocated demand zero pages. Ie: the pages are not
|
|
``really'' attached to your process yet, but when you access them for the
|
|
first time, the page fault causes the page to be connected to the process
|
|
address space and zeroed - this saves unnecessary zeroing of pages that are
|
|
allocated but never used.
|
|
|
|
<P>
|
|
The ``page faults with physical IO'' comes from the OS via getrusage(). It's
|
|
highly OS dependent on what it means. Generally, it means that the process
|
|
accessed a page that was not present in memory (for whatever reason) and
|
|
there was disk access to fetch it. Many OS's load executables by demand
|
|
paging as well, so the act of starting squid implicitly causes page faults
|
|
with disk IO - however, many (but not all) OS's use ``read ahead'' and
|
|
``prefault'' heuristics to streamline the loading. Some OS's maintain
|
|
``intent queues'' so that pages can be selected as pageout candidates ahead
|
|
of time. When (say) squid touches a freshly allocated demand zero page and
|
|
one is needed, the OS can page out one of the candidates on the spot,
|
|
causing a 'fault with physical IO' with demand zeroing of allocated memory
|
|
which doesn't happen on many other OS's. (The other OS's generally put
|
|
the process to sleep while the pageout daemon finds a page for it).
|
|
|
|
<P>
|
|
The meaning of ``swapping'' varies. On FreeBSD for example, swapping out is
|
|
implemented as unlocking upages, kernel stack, PTD etc for aggressive
|
|
pageout with the process. The only thing left of the process in memory is
|
|
the 'struct proc'. The FreeBSD paging system is highly adaptive and can
|
|
resort to paging in a way that is equivalent to the traditional swapping
|
|
style operation (ie: entire process). FreeBSD also tries stealing pages
|
|
from active processes in order to make space for disk cache. I suspect
|
|
this is why setting 'memory_pools off' on the non-NOVM squids on FreeBSD is
|
|
reported to work better - the VM/buffer system could be competing with
|
|
squid to cache the same pages. It's a pity that squid cannot use mmap() to
|
|
do file IO on the 4K chunks in its memory pool (I can see that this is not
|
|
a simple thing to do though, but that won't stop me wishing. :-).
|
|
|
|
<P>
|
|
by <url url="mailto:webadm@info.cam.ac.uk" name="John Line">
|
|
|
|
<P>
|
|
The comments so far have been about what paging/swapping figures mean in
|
|
a ``traditional'' context, but it's worth bearing in mind that on some systems
|
|
(Sun's Solaris 2, at least), the virtual memory and filesystem handling are
|
|
unified and what a user process sees as reading or writing a file, the system
|
|
simply sees as paging something in from disk or a page being updated so it
|
|
needs to be paged out. (I suppose you could view it as similar to the operating
|
|
system memory-mapping the files behind-the-scenes.)
|
|
|
|
<P>
|
|
The effect of this is that on Solaris 2, paging figures will also include file
|
|
I/O. Or rather, the figures from vmstat certainly appear to include file I/O,
|
|
and I presume (but can't quickly test) that figures such as those quoted by
|
|
Squid will also include file I/O.
|
|
|
|
<P>
|
|
To confirm the above (which represents an impression from what I've read and
|
|
observed, rather than 100% certain facts...), using an otherwise idle Sun Ultra
|
|
1 system I just tried using cat (small, shouldn't need to page) to copy
|
|
(a) one file to another, (b) a file to /dev/null, (c) /dev/zero to a file, and
|
|
(d) /dev/zero to /dev/null (interrupting the last two with control-C after a
|
|
while!), while watching with vmstat. 300-600 page-ins or page-outs per second
|
|
when reading or writing a file (rather than a device), essentially zero in
|
|
other cases (and when not cat-ing).
|
|
|
|
<P>
|
|
So ... beware assuming that all systems are similar and that paging figures
|
|
represent *only* program code and data being shuffled to/from disk - they
|
|
may also include the work in reading/writing all those files you were
|
|
accessing...
|
|
|
|
<sect2>Ok, so what is unusually high?
|
|
|
|
<P>
|
|
You'll probably want to compare the number of page faults to the number of
|
|
HTTP requests. If this ratio is close to or exceeds 1, then
|
|
Squid is paging too much.
|
|
|
|
<sect1>What does the IGNORED field mean in the 'cache server list'?
|
|
<P>
|
|
This refers to ICP replies which Squid ignored, for one of these
|
|
reasons:
|
|
<itemize>
|
|
<item>
|
|
The URL in the reply could not be found in the cache at all.
|
|
<item>
|
|
The URL in the reply was already being fetched. Probably
|
|
this ICP reply arrived too late.
|
|
<item>
|
|
The URL in the reply did not have a MemObject associated with
|
|
it. Either the request is already finished, or the user aborted
|
|
before the ICP arrived.
|
|
<item>
|
|
The reply came from a multicast-responder, but the
|
|
<em/cache_peer_access/ configuration does not allow us to
|
|
forward this request to that
|
|
neighbor.
|
|
<item>
|
|
Source-Echo replies from known neighbors are ignored.
|
|
<item>
|
|
ICP_OP_DENIED replies are ignored after the first 100.
|
|
</itemize>
|
|
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>Access Controls
|
|
<label id="access-controls">
|
|
|
|
<sect1>Introduction
|
|
<p>
|
|
Squid's access control scheme is relatively comprehensive, and some people find it
difficult to understand. There are two different components: <em/ACL elements/,
|
|
and <em/access lists/. An access list consists of an <em/allow/ or <em/deny/ action
|
|
followed by a number of ACL elements.
|
|
|
|
<sect2>ACL elements
|
|
<p>
|
|
<em>Note: The information here is current for version 2.4.</em>
|
|
<p>
|
|
Squid knows about the following types of ACL elements:
|
|
<itemize>
|
|
<item>
|
|
<bf/src/: source (client) IP addresses
|
|
<item>
|
|
<bf/dst/: destination (server) IP addresses
|
|
<item>
|
|
<bf/myip/: the local IP address of a client's connection
|
|
<item>
|
|
<bf/srcdomain/: source (client) domain name
|
|
<item>
|
|
<bf/dstdomain/: destination (server) domain name
|
|
<item>
|
|
<bf/srcdom_regex/: source (client) regular expression pattern matching
|
|
<item>
|
|
<bf/dstdom_regex/: destination (server) regular expression pattern matching
|
|
<item>
|
|
<bf/time/: time of day, and day of week
|
|
<item>
|
|
<bf/url_regex/: URL regular expression pattern matching
|
|
<item>
|
|
<bf/urlpath_regex/: URL-path regular expression pattern matching, leaves out the protocol and hostname
|
|
<item>
|
|
<bf/port/: destination (server) port number
|
|
<item>
|
|
<bf/myport/: local port number that client connected to
|
|
<item>
|
|
<bf/proto/: transfer protocol (http, ftp, etc)
|
|
<item>
|
|
<bf/method/: HTTP request method (get, post, etc)
|
|
<item>
|
|
<bf/browser/: regular expression pattern matching on the request's user-agent header
|
|
<item>
|
|
<bf/ident/: string matching on the user's name
|
|
<item>
|
|
<bf/ident_regex/: regular expression pattern matching on the user's name
|
|
<item>
|
|
<bf/src_as/: source (client) Autonomous System number
|
|
<item>
|
|
<bf/dst_as/: destination (server) Autonomous System number
|
|
<item>
|
|
<bf/proxy_auth/: user authentication via external processes
|
|
<item>
|
|
<bf/proxy_auth_regex/: user authentication via external processes
|
|
<item>
|
|
<bf/snmp_community/: SNMP community string matching
|
|
<item>
|
|
<bf/maxconn/: a limit on the maximum number of connections from a single client IP address
|
|
<item>
|
|
<bf/req_mime_type/: regular expression pattern matching on the request content-type header
|
|
<item>
|
|
<bf/arp/: Ethernet (MAC) address matching
|
|
</itemize>
|
|
|
|
<p>
|
|
Notes:
|
|
|
|
<p>
|
|
Not all of the ACL elements can be used with all types of access lists (described below).
|
|
For example, <em/snmp_community/ is only meaningful when used with <em/snmp_access/. The
|
|
<em/src_as/ and <em/dst_as/ types are only used in <em/cache_peer_access/ access lists.
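
<p>
For instance, a minimal sketch of the <em/snmp_community/ pairing,
assuming Squid was built with SNMP support and that the community
string is <em/public/ (the ACL name <em/snmppublic/ and the community
string are illustrative only):
<verb>
acl all src 0.0.0.0/0.0.0.0
acl localhost src 127.0.0.1/255.255.255.255
# illustrative ACL name and community string
acl snmppublic snmp_community public
snmp_access allow snmppublic localhost
snmp_access deny all
</verb>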
|
|
|
|
<p>
|
|
The <em/arp/ ACL requires the special configure option --enable-arp-acl. Furthermore, the
|
|
ARP ACL code is not portable to all operating systems. It works on Linux, Solaris, and
|
|
some *BSD variants.
|
|
|
|
<p>
|
|
The SNMP ACL element and access list require the --enable-snmp configure option.
|
|
|
|
<p>
|
|
Some ACL elements can cause processing delays. For example, use of <em/srcdomain/ and <em/srcdom_regex/
requires a reverse DNS lookup on the client's IP address. This lookup adds some delay to the request.
|
|
|
|
<p>
|
|
Each ACL element is assigned a unique <em/name/. A named ACL element consists of a <em/list of values/.
|
|
When checking for a match, the multiple values use OR logic. In other words, an ACL element is <em/matched/
|
|
when any one of its values is a match.
|
|
|
|
<p>
|
|
You can't give the same name to two different types of ACL elements. It will generate a syntax error.
|
|
|
|
<p>
|
|
You can put different values for the same ACL name on different lines. Squid combines them into
|
|
one list.
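
<p>
As an illustration (the ACL name and networks here are made up), the
following two lines define a single ACL element, <em/myclients/, that
matches a client address in either network:
<verb>
# both lines add values to the same named ACL element
acl myclients src 172.16.5.0/24
acl myclients src 172.16.6.0/24
</verb>
This should behave the same as listing both networks on one line.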
|
|
|
|
<sect2>Access Lists
|
|
<p>
|
|
There are a number of different access lists:
|
|
<itemize>
|
|
<item>
|
|
<bf/http_access/: Allows HTTP clients (browsers) to access the HTTP port. This is the primary access control list.
|
|
<item>
|
|
<bf/icp_access/: Allows neighbor caches to query your cache with ICP.
|
|
<item>
|
|
<bf/miss_access/: Allows certain clients to forward cache misses through your cache.
|
|
<item>
|
|
<bf/no_cache/: Defines responses that should not be cached.
|
|
<item>
|
|
<bf/redirector_access/: Controls which requests are sent through the redirector pool.
|
|
<item>
|
|
<bf/ident_lookup_access/: Controls which requests need an Ident lookup.
|
|
<item>
|
|
<bf/always_direct/: Controls which requests should always be forwarded directly to origin servers.
|
|
<item>
|
|
<bf/never_direct/: Controls which requests should never be forwarded directly to origin servers.
|
|
<item>
|
|
<bf/snmp_access/: Controls SNMP client access to the cache.
|
|
<item>
|
|
<bf/broken_posts/: Defines requests for which squid appends an extra CRLF after POST message bodies as required by some broken origin servers.
|
|
<item>
|
|
<bf/cache_peer_access/: Controls which requests can be forwarded to a given neighbor (peer).
|
|
</itemize>
|
|
|
|
<p>
|
|
Notes:
|
|
|
|
<p>
|
|
An access list <em/rule/ consists of an <em/allow/ or <em/deny/ keyword, followed by a list of ACL element names.
|
|
|
|
<p>
|
|
An access list consists of one or more access list rules.
|
|
|
|
<p>
|
|
Access list rules are checked in the order they are written. List searching terminates as soon as one
|
|
of the rules is a match.
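
<p>
A small sketch of why order matters, using an illustrative
<em/myclients/ ACL. In this ordering the <em/allow/ rule is never
reached, because every request matches the first rule:
<verb>
acl all src 0/0
acl myclients src 172.16.5.0/24
# wrong order: "deny all" matches first, so the search stops here
http_access deny all
http_access allow myclients
</verb>
Swapping the two <em/http_access/ lines gives the intended result.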
|
|
|
|
<p>
|
|
If a rule has multiple ACL elements, it uses AND logic. In other
|
|
words, <em/all/ ACL elements of the rule must be a match in order
|
|
for the rule to be a match. This means that it is possible to
|
|
write a rule that can never be matched. For example, a port number
|
|
can never be equal to both 80 AND 8000 at the same time.
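
<p>
For example, assuming three illustrative ACL names, the first rule below
can never match, while the second form matches either port:
<verb>
acl port80 port 80
acl port8000 port 8000
# AND logic: no request has both ports, so this never matches
http_access deny port80 port8000

# OR logic within one ACL element: matches port 80 or port 8000
acl webports port 80 8000
http_access deny webports
</verb>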
|
|
|
|
<p>
|
|
If none of the rules are matched, then the default action is the
|
|
<em/opposite/ of the last rule in the list. It's a good idea to
be explicit with the default action. The best way is to use
|
|
the <em/all/ ACL. For example:
|
|
<verb>
|
|
acl all src 0/0
|
|
http_access deny all
|
|
</verb>
|
|
|
|
|
|
<sect1>How do I allow my clients to use the cache?
|
|
<p>
|
|
Define an ACL that corresponds to your client's IP addresses.
|
|
For example:
|
|
<verb>
|
|
acl myclients src 172.16.5.0/24
|
|
</verb>
|
|
Next, allow those clients in the <em/http_access/ list:
|
|
<verb>
|
|
http_access allow myclients
|
|
</verb>
|
|
|
|
<sect1>How do I configure Squid not to cache a specific server?
|
|
<p>
|
|
<verb>
|
|
acl someserver dstdomain .someserver.com
|
|
no_cache deny someserver
|
|
</verb>
|
|
|
|
|
|
<sect1>How do I implement an ACL ban list?
|
|
|
|
<P>
|
|
As an example, we will assume that you would like to prevent users from
|
|
accessing cooking recipes.
|
|
|
|
<P>
|
|
One way to implement this would be to deny access to any URLs
|
|
that contain the words ``cooking'' or ``recipe.''
|
|
You would use these configuration lines:
|
|
<verb>
|
|
acl Cooking1 url_regex cooking
|
|
acl Recipe1 url_regex recipe
|
|
http_access deny Cooking1
|
|
http_access deny Recipe1
|
|
http_access allow all
|
|
</verb>
|
|
The <em/url_regex/ means to search the entire URL for the regular
|
|
expression you specify. Note that these regular expressions are case-sensitive,
|
|
so a url containing ``Cooking'' would not be denied.
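
<p>
If a case-insensitive match is wanted, regular expression ACL types in
Squid-2 accept a <tt/-i/ option. A sketch, reusing the ACL names from
the example above:
<verb>
# -i makes the pattern match "Cooking", "COOKING", and so on
acl Cooking1 url_regex -i cooking
acl Recipe1 url_regex -i recipe
</verb>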
|
|
|
|
<P>
|
|
Another way is to deny access to specific servers which are known
|
|
to hold recipes. For example:
|
|
<verb>
|
|
acl Cooking2 dstdomain gourmet-chef.com
|
|
http_access deny Cooking2
|
|
http_access allow all
|
|
</verb>
|
|
The <em/dstdomain/ means to search the hostname in the URL for the
|
|
string ``gourmet-chef.com.''
|
|
Note that when IP addresses are used in URLs (instead of domain names),
|
|
Squid-1.1 implements relaxed access controls. If a domain name
|
|
for the IP address has been saved in Squid's ``FQDN cache,'' then
|
|
Squid can compare the destination domain against the access controls.
|
|
However, if the domain is not immediately available, Squid allows
|
|
the request and makes a lookup for the IP address so that it may
|
|
be available for future requests.
|
|
|
|
|
|
|
|
<sect1>How do I block specific users or groups from accessing my cache?
|
|
|
|
<sect2>Ident
|
|
<P>
|
|
You can use
|
|
<url url="http://info.internet.isi.edu:80/in-notes/rfc/files/rfc931.txt"
|
|
name="ident lookups">
|
|
to allow specific users access to your cache. This requires that an
|
|
<url url="ftp://ftp.lysator.liu.se/pub/ident/servers"
|
|
name="ident server">
|
|
process runs on the user's machine(s).
|
|
In your <em/squid.conf/ configuration
|
|
file you would write something like this:
|
|
<verb>
|
|
ident_lookup on
|
|
acl friends user kim lisa frank joe
|
|
http_access allow friends
|
|
http_access deny all
|
|
</verb>
|
|
|
|
<sect2>Proxy Authentication
|
|
<label id="proxy-auth-acl">
|
|
<P>
|
|
Another option is to use proxy-authentication. In this scheme, you assign
|
|
usernames and passwords to individuals. When they first use the proxy
|
|
they are asked to authenticate themselves by entering their username and
|
|
password.
|
|
|
|
<P>
|
|
In Squid v2 this authentication is handled via external processes. For
|
|
information on how to configure this, please see
|
|
<ref id="configuring-proxy-auth" name="Configuring Proxy Authentication">.
|
|
|
|
<sect1>Do you have a CGI program which lets users change their own proxy passwords?
|
|
<P>
|
|
<url url="mailto:orso@ineparnet.com.br" name="Pedro L Orso">
|
|
has adapted Apache's <em/htpasswd/ into a CGI program
|
|
called <url url="/htpasswd/chpasswd-cgi.tar.gz" name="chpasswd.cgi">.
|
|
|
|
|
|
|
|
<sect1>Is there a way to do ident lookups only for a certain host and compare the result with a userlist in squid.conf?
|
|
|
|
<P>
|
|
Sort of.
|
|
|
|
<P>
|
|
If you use a <em/user/ ACL in <em/squid.conf/, then Squid will perform
|
|
an
|
|
<url url="http://info.internet.isi.edu:80/in-notes/rfc/files/rfc931.txt"
|
|
name="ident lookup">
|
|
for every client request. In other words, Squid-1.1 will perform
|
|
ident lookups for all requests or no requests. Defining a <em/user/ ACL
|
|
enables ident lookups, regardless of the <em/ident_lookup/ setting.
|
|
|
|
<P>
|
|
However, even though ident lookups are performed for every request, Squid does
|
|
not wait for the lookup to complete unless the ACL rules require it. Consider this
|
|
configuration:
|
|
<verb>
|
|
acl host1 src 10.0.0.1
|
|
acl host2 src 10.0.0.2
|
|
acl pals user kim lisa frank joe
|
|
http_access allow host1
|
|
http_access allow host2 pals
|
|
</verb>
|
|
Requests coming from 10.0.0.1 will be allowed immediately because
|
|
there are no user requirements for that host. However, requests
|
|
from 10.0.0.2 will be allowed only after the ident lookup completes, and
|
|
if the username is in the set kim, lisa, frank, or joe.
|
|
|
|
<sect1>Common Mistakes
|
|
|
|
<sect2>And/Or logic
|
|
|
|
<P>
|
|
You've probably noticed (and been frustrated by) the fact that
|
|
you cannot combine access controls with terms like ``and'' or ``or.''
|
|
These operations are already built in to the access control scheme
|
|
in a fundamental way which you must understand.
|
|
<itemize>
|
|
<item>
|
|
<bf>All elements of an <em/acl/ entry are OR'ed together</bf>.
|
|
<item>
|
|
<bf>All elements of an <em/access/ entry are AND'ed together</bf>.
|
|
e.g. <em/http_access/ and <em/icp_access/.
|
|
</itemize>
|
|
|
|
<P>
|
|
For example, the following access control configuration will never work:
|
|
<verb>
|
|
acl ME src 10.0.0.1
|
|
acl YOU src 10.0.0.2
|
|
http_access allow ME YOU
|
|
</verb>
|
|
In order for the request to be allowed, it must match the ``ME'' acl AND the ``YOU'' acl.
|
|
This is impossible because any IP address could only match one or the other. This
|
|
should instead be rewritten as:
|
|
<verb>
|
|
acl ME src 10.0.0.1
|
|
acl YOU src 10.0.0.2
|
|
http_access allow ME
|
|
http_access allow YOU
|
|
</verb>
|
|
Or, alternatively, this would also work:
|
|
<verb>
|
|
acl US src 10.0.0.1 10.0.0.2
|
|
http_access allow US
|
|
</verb>
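<P>
The AND behaviour is useful when the ACLs test different properties of
a request. As a sketch (the names <em/OFFICE/ and <em/WORKHOURS/ and the
address range are made up for illustration), this allows a request only
when it comes from the office network <em/and/ arrives during working hours:
<verb>
# Hypothetical names and values, for illustration only
acl OFFICE src 10.0.0.0/255.255.255.0
acl WORKHOURS time MTWHF 09:00-17:00
# Both ACLs must match for the request to be allowed
http_access allow OFFICE WORKHOURS
</verb>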
|
|
|
|
<sect2>allow/deny mixups
|
|
|
|
<P>
|
|
<it>
|
|
I have read through my squid.conf numerous times, spoken to my
|
|
neighbors, read the FAQ and Squid Docs and cannot for the life of
|
|
me work out why the following will not work.
|
|
</it>
|
|
|
|
<P>
|
|
<it>
|
|
I can successfully access cachemgr.cgi from our web server machine here,
|
|
but I would like to use MRTG to monitor various aspects of our proxy.
|
|
When I try to use 'client' or GET cache_object from the machine the
|
|
proxy is running on, I always get access denied.
|
|
</it>
|
|
|
|
<verb>
|
|
acl manager proto cache_object
|
|
acl localhost src 127.0.0.1/255.255.255.255
|
|
acl server src 1.2.3.4/255.255.255.255
|
|
acl all src 0.0.0.0/0.0.0.0
|
|
acl ourhosts src 1.2.0.0/255.255.0.0
|
|
|
|
http_access deny manager !localhost !server
|
|
http_access allow ourhosts
|
|
http_access deny all
|
|
</verb>
|
|
|
|
<P>
|
|
The intent here is to allow cache manager requests from the <em/localhost/
|
|
and <em/server/ addresses, and deny all others. This policy has been
|
|
expressed here:
|
|
<verb>
|
|
http_access deny manager !localhost !server
|
|
</verb>
|
|
|
|
<P>
|
|
The problem here is that for allowable requests, this access rule is
|
|
not matched. For example, if the source IP address is <em/localhost/,
|
|
then ``!localhost'' is <em/false/ and the access rule is not matched, so
|
|
Squid continues checking the other rules. Cache manager requests from
|
|
the <em/server/ address work because <em/server/ is a subset of <em/ourhosts/
|
|
and the second access rule will match and allow the request. Also note that
|
|
this means any cache manager request from <em/ourhosts/ would be allowed.
|
|
|
|
<P>
|
|
To implement the desired policy correctly, the access rules should be
|
|
rewritten as
|
|
<verb>
|
|
http_access allow manager localhost
|
|
http_access allow manager server
|
|
http_access deny manager
|
|
http_access allow ourhosts
|
|
http_access deny all
|
|
</verb>
|
|
If you're using <em/miss_access/, then don't forget to also add
|
|
a <em/miss_access/ rule for the cache manager:
|
|
<verb>
|
|
miss_access allow manager
|
|
</verb>
|
|
|
|
<P>
|
|
You may be concerned that having five access rules instead of three
|
|
may have an impact on the cache performance. In our experience this is
|
|
not the case. Squid is able to handle a moderate amount of access control
|
|
checking without degrading overall performance. You may like to verify
|
|
that for yourself, however.
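<P>
Since the elements of an <em/acl/ line are OR'ed together, the two
<em/allow manager/ rules in the corrected configuration can probably be
folded into one. This is only a sketch; <em/managers/ is a name invented here:
<verb>
# "managers" is an illustrative name; it matches either address
acl managers src 127.0.0.1/255.255.255.255 1.2.3.4/255.255.255.255
http_access allow manager managers
http_access deny manager
http_access allow ourhosts
http_access deny all
</verb>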
|
|
|
|
<sect2>Differences between <em/src/ and <em/srcdomain/ ACL types.
|
|
|
|
<P>
|
|
For the <em/srcdomain/ ACL type, Squid does a reverse lookup
|
|
of the client's IP address and checks the result with the domains
|
|
given on the <em/acl/ line. With the <em/src/ ACL type, Squid
|
|
converts hostnames to IP addresses at startup and then only compares
|
|
the client's IP address. The <em/src/ ACL is preferred over <em/srcdomain/
|
|
because it does not require address-to-name lookups for each request.
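<P>
For illustration, here is one client group described both ways. The names,
network, and domain are assumptions, not taken from a real configuration:
<verb>
# Preferred: compared directly against the client's IP address
acl office_addr src 10.1.0.0/255.255.0.0
# Needs a reverse DNS lookup of the client address for each request
acl office_name srcdomain example.com
http_access allow office_addr
</verb>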
|
|
|
|
|
|
<sect1>I set up my access controls, but they don't work! why?
|
|
|
|
<P>
|
|
You can debug your access control configuration by setting the
|
|
<em/debug_options/ parameter in <em/squid.conf/ and
|
|
watching <em/cache.log/ as requests are made. The access control
|
|
routines correspond to debug section 28, so you might enter:
|
|
<verb>
|
|
debug_options ALL,1 28,9
|
|
</verb>
|
|
|
|
<sect1>Proxy-authentication and neighbor caches
|
|
<P>
|
|
The problem...
|
|
<quote>
|
|
<verb>
|
|
[ Parents ]
|
|
/ \
|
|
/ \
|
|
[ Proxy A ] --- [ Proxy B ]
|
|
|
|
|
|
|
|
USER
|
|
</verb>
|
|
<P>
|
|
<it>
|
|
Proxy A sends an ICP query to Proxy B about an object, Proxy B replies with an
|
|
ICP_HIT. Proxy A forwards the HTTP request to Proxy B, but
|
|
does not pass on the authentication details, therefore the HTTP GET from
|
|
Proxy A fails.
|
|
</it>
|
|
</quote>
|
|
|
|
<P>
|
|
Only ONE proxy cache in a chain is allowed to ``use'' the Proxy-Authentication
|
|
request header. Once the header is used, it must not be passed on to
|
|
other proxies.
|
|
|
|
<P>
|
|
Therefore, you must allow the neighbor caches to request from each other
|
|
without proxy authentication. This is simply accomplished by listing
|
|
the neighbor ACL's first in the list of <em/http_access/ lines. For example:
|
|
<verb>
|
|
acl proxy-A src 10.0.0.1
|
|
acl proxy-B src 10.0.0.2
|
|
acl user_passwords proxy_auth /tmp/user_passwds
|
|
|
|
http_access allow proxy-A
|
|
http_access allow proxy-B
|
|
http_access allow user_passwords
|
|
http_access deny all
|
|
</verb>
|
|
|
|
<sect1>Is there an easy way of banning all Destination addresses except one?
|
|
<P>
|
|
<verb>
|
|
acl GOOD dst 10.0.0.1
|
|
acl BAD dst 0.0.0.0/0.0.0.0
|
|
http_access allow GOOD
|
|
http_access deny BAD
|
|
</verb>
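<P>
A variant, offered as a sketch, negates the one permitted
destination instead of defining a catch-all ACL:
<verb>
acl GOOD dst 10.0.0.1
# Deny every request whose destination is not GOOD
http_access deny !GOOD
# Then allow the one remaining destination explicitly
http_access allow GOOD
</verb>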
|
|
|
|
<sect1>Does anyone have a ban list of porn sites and such?
|
|
|
|
<P>
|
|
<itemize>
|
|
<item><url url="http://web.onda.com.br/orso/" name="Pedro Lineu Orso's List">
|
|
<item><url url="http://www.hklc.com/squidblock/" name="Linux Center Hong Kong's List">
|
|
<item>
|
|
Snerpa, an ISP in Iceland operates a DNS-database of
|
|
IP-addresses of blacklisted sites containing porn, violence,
|
|
etc. which is utilized using a small perl-script redirector.
|
|
Information on this on the <url
|
|
url="http://www.snerpa.is/notendur/infilter/infilter-en.phtml"
|
|
name="INfilter"> webpage.
|
|
</itemize>
|
|
|
|
<sect1>Squid doesn't match my subdomains
|
|
<P>
|
|
There is a subtle problem with domain-name based access controls
|
|
when a single ACL element has an entry that is a subdomain of
|
|
another entry. For example, consider this list:
|
|
<verb>
|
|
acl FOO dstdomain boulder.co.us vail.co.us co.us
|
|
</verb>
|
|
<P>
|
|
In the first place, the above list is simply wrong because
|
|
the first two (<em/boulder.co.us/ and <em/vail.co.us/) are
|
|
unnecessary. Any domain name that matches one of the first two
|
|
will also match the last one (<em/co.us/). Ok, but why does this
|
|
happen?
|
|
|
|
<P>
|
|
The problem stems from the data structure used to index domain
|
|
names in an access control list. Squid uses <em/Splay trees/
|
|
for lists of domain names. As with other tree-based data structures,
|
|
the searching algorithm requires a comparison function that returns
|
|
-1, 0, or +1 for any pair of keys (domain names). This is similar
|
|
to the way that <em/strcmp()/ works.
|
|
|
|
<P>
|
|
The problem is that it is wrong to say that <em/co.us/ is greater-than,
|
|
equal-to, or less-than <em/boulder.co.us/.
|
|
|
|
<P>
|
|
For example, if you
|
|
said that <em/co.us/ is LESS than <em/fff.co.us/, then
|
|
the Splay tree searching algorithm might never discover
|
|
<em/co.us/ as a match for <em/kkk.co.us/.
|
|
|
|
<P>
|
|
Similarly, if you said that <em/co.us/ is GREATER than <em/fff.co.us/,
|
|
then the Splay tree searching algorithm might never
|
|
discover <em/co.us/ as a match for <em/bbb.co.us/.
|
|
|
|
<P>
|
|
The bottom line is that you can't have one entry that is a subdomain
|
|
of another. Squid-2.2 will warn you if it detects this condition.
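<P>
For the example list at the start of this section, a sketch of the fix is to
keep only the broadest entry:
<verb>
# co.us already covers boulder.co.us and vail.co.us
acl FOO dstdomain co.us
</verb>
In Squid-2.3 and later, where a leading dot is needed to match
subdomains, that entry would be written <em/.co.us/ instead.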
|
|
|
|
<sect1>Why does Squid deny some port numbers?
|
|
<P>
|
|
It is dangerous to allow Squid to connect to certain port numbers.
|
|
For example, it has been demonstrated that someone can use Squid
|
|
as an SMTP (email) relay. As I'm sure you know, SMTP relays are
|
|
one of the ways that spammers are able to flood our mailboxes.
|
|
To prevent mail relaying, Squid denies requests when the URL port
|
|
number is 25. Other ports should be blocked as well, as a precaution.
|
|
|
|
<P>
|
|
There are two ways to filter by port number: either allow specific
|
|
ports, or deny specific ports. By default, Squid does the first. This
|
|
is the ACL entry that comes in the default <em/squid.conf/:
|
|
<verb>
|
|
acl Safe_ports port 80 21 443 563 70 210 1025-65535
|
|
http_access deny !Safe_ports
|
|
</verb>
|
|
The above configuration denies requests when the URL port number is
|
|
not in the list. The list allows connections to the standard
|
|
ports for HTTP, FTP, Gopher, SSL, WAIS, and all non-privileged
|
|
ports.
|
|
|
|
<P>
|
|
Another approach is to deny dangerous ports. The dangerous
|
|
port list should look something like:
|
|
<verb>
|
|
acl Dangerous_ports port 7 9 19 22 23 25 53 109 110 119
|
|
http_access deny Dangerous_ports
|
|
</verb>
|
|
...and probably many others.
|
|
|
|
<P>
|
|
Please consult the <em>/etc/services</em> file on your system
|
|
for a list of known ports and protocols.
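<P>
If your users need a privileged port that is not in the default list,
one approach is to extend the allow list rather than loosen it. The
sketch below assumes port 280 is the one you need; repeating an
<em/acl/ name adds values to the existing list:
<verb>
acl Safe_ports port 80 21 443 563 70 210 1025-65535
# Assumed example: also permit port 280
acl Safe_ports port 280
http_access deny !Safe_ports
</verb>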
|
|
|
|
<sect1>Does Squid support the use of a database such as mySQL for storing the ACL list?
|
|
<p>
|
|
<em>Note: The information here is current for version 2.2.</em>
|
|
<p>
|
|
No, it does not.
|
|
|
|
<sect1>How can I allow a single address to access a specific URL?
|
|
<p>
|
|
This example allows only the <em/special_client/ to access
|
|
the <em/special_url/. Any other client that tries to access
|
|
the <em/special_url/ is denied.
|
|
<verb>
|
|
acl special_client src 10.1.2.3
|
|
acl special_url url_regex ^http://www.squid-cache.org/Doc/FAQ/$
|
|
http_access allow special_client special_url
|
|
http_access deny special_url
|
|
</verb>
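<P>
Note that an unescaped dot in a regular expression matches any
character. A slightly stricter sketch of the same ACL escapes the dots
in the hostname:
<verb>
acl special_client src 10.1.2.3
# \. matches a literal dot only
acl special_url url_regex ^http://www\.squid-cache\.org/Doc/FAQ/$
http_access allow special_client special_url
http_access deny special_url
</verb>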
|
|
|
|
<sect1>How can I allow some clients to use the cache at specific times?
|
|
<p>
|
|
Let's say you have two workstations that should only be allowed access
|
|
to the Internet during working hours (8:30 - 17:30). You can use
|
|
something like this:
|
|
<verb>
|
|
acl FOO src 10.1.2.3 10.1.2.4
|
|
acl WORKING time MTWHF 08:30-17:30
|
|
http_access allow FOO WORKING
|
|
http_access deny FOO
|
|
</verb>
|
|
|
|
<sect1>How can I allow some users to use the cache at specific times?
|
|
<p>
|
|
<verb>
|
|
acl USER1 proxy_auth Dick
|
|
acl USER2 proxy_auth Jane
|
|
acl DAY time 06:00-18:00
|
|
http_access allow USER1 DAY
|
|
http_access deny USER1
|
|
http_access allow USER2 !DAY
|
|
http_access deny USER2
|
|
</verb>
|
|
|
|
<sect1>Problems with IP ACL's that have complicated netmasks
|
|
<p>
|
|
<em>Note: The information here is current for version 2.3.</em>
|
|
<p>
|
|
The following ACL entry gives inconsistent or unexpected results:
|
|
<verb>
|
|
acl restricted src 10.0.0.128/255.0.0.128 10.85.0.0/16
|
|
</verb>
|
|
The reason is that IP access lists are stored in ``splay'' tree
|
|
data structures. These trees require the keys to be sortable.
|
|
When you use a complicated, or non-standard, netmask (255.0.0.128), it confuses
|
|
the function that compares two address/mask pairs.
|
|
<p>
|
|
The best way to fix this problem is to use separate ACL names
|
|
for each ACL value. For example, change the above to:
|
|
<verb>
|
|
acl restricted1 src 10.0.0.128/255.0.0.128
|
|
acl restricted2 src 10.85.0.0/16
|
|
</verb>
|
|
<p>
|
|
Then, of course, you'll have to rewrite your <em/http_access/
|
|
lines as well.
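<P>
For example, if the original configuration denied the single
<em/restricted/ ACL (an assumption; your rules may differ), the
rewritten lines might look like this:
<verb>
# Was: http_access deny restricted
http_access deny restricted1
http_access deny restricted2
</verb>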
|
|
|
|
<sect1>Can I set up ACL's based on MAC address rather than IP?
|
|
<p>
|
|
Yes, for some operating systems. Squid calls these ``ARP ACLs'' and
|
|
they are supported on Linux, Solaris, and probably BSD variants.
|
|
<p>
|
|
NOTE: Squid can only determine the MAC address for clients that
|
|
are on the same subnet. If the client is on a different subnet,
|
|
then Squid can not find out its MAC address.
|
|
<p>
|
|
To use ARP (MAC) access controls, you
|
|
first need to compile in the optional code. Do this with
|
|
the <em/--enable-arp-acl/ configure option:
|
|
<verb>
|
|
% ./configure --enable-arp-acl ...
|
|
% make clean
|
|
% make
|
|
</verb>
|
|
If <em>src/acl.c</em> doesn't compile, then ARP ACLs are probably not
|
|
supported on your system.
|
|
<p>
|
|
If everything compiles, then you can add some ARP ACL lines to
|
|
your <em/squid.conf/:
|
|
<verb>
|
|
acl M1 arp 01:02:03:04:05:06
|
|
acl M2 arp 11:12:13:14:15:16
|
|
http_access allow M1
|
|
http_access allow M2
|
|
http_access deny all
|
|
</verb>
|
|
|
|
<sect1>Debugging ACLs
|
|
<p>
|
|
If ACLs are giving you problems and you don't know why they
|
|
aren't working, you can use this tip to debug them.
|
|
<p>
|
|
In <em>squid.conf</em> enable debugging for section 32 at level 2.
|
|
For example:
|
|
<verb>
|
|
debug_options ALL,1 32,2
|
|
</verb>
|
|
Then restart or reconfigure Squid.
|
|
<p>
|
|
From now on, your <em/cache.log/ should contain a line for every
|
|
request that explains if it was allowed, or denied, and which
|
|
ACL was the last one that it matched.
|
|
|
|
<sect1>Can I limit the number of connections from a client?
|
|
<p>
|
|
Yes, use the <em/maxconn/ ACL type in conjunction with <em/http_access deny/.
|
|
For example:
|
|
<verb>
|
|
acl losers src 1.2.3.0/24
|
|
acl 5CONN maxconn 5
|
|
http_access deny 5CONN losers
|
|
</verb>
|
|
<p>
|
|
Given the above configuration, when a client whose source IP address
|
|
is in the 1.2.3.0/24 subnet tries to establish 6 or more connections
|
|
at once, Squid returns an error page. Unless you use the
|
|
<em/deny_info/ feature, the error message will just say ``access
|
|
denied.''
|
|
<p>
|
|
The <em/maxconn/ ACL requires the client_db feature. If you've
|
|
disabled client_db (for example with <em/client_db off/) then
|
|
<em/maxconn/ ACLs will not work.
|
|
<p>
|
|
Note, the <em/maxconn/ ACL type is kind of tricky because it
|
|
uses a greater-than comparison. The ACL is a match when the number
|
|
of established connections is <em/greater/ than the value you
|
|
specify. Because of that, you don't want to use the <em/maxconn/
|
|
ACL with <em/http_access allow/.
|
|
<p>
|
|
Also note that you could use <em/maxconn/ in conjunction with
|
|
a user type (ident, proxy_auth), rather than an IP address type.
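<P>
As a sketch of the <em/deny_info/ idea mentioned above, the following
gives over-limit clients a custom page. ERR_TOO_MANY_CONN is a
hypothetical error file that you would have to create yourself. The
<em/maxconn/ ACL is placed last on the <em/http_access/ line on the
assumption that <em/deny_info/ is looked up by the last ACL of the
rule that denied the request:
<verb>
acl losers src 1.2.3.0/24
acl 5CONN maxconn 5
# ERR_TOO_MANY_CONN is a made-up file name; create it in your errors directory
deny_info ERR_TOO_MANY_CONN 5CONN
http_access deny losers 5CONN
</verb>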
|
|
|
|
<sect1>I'm trying to deny <em/foo.com/, but it's not working.
|
|
<p>
|
|
In Squid-2.3 we changed the way that Squid matches subdomains.
|
|
There is a difference between <em/.foo.com/ and <em/foo.com/. The
|
|
first matches any domain in <em/foo.com/, while the latter matches
|
|
only ``foo.com'' exactly. So if you want to deny <em/bar.foo.com/,
|
|
you should write
|
|
<verb>
|
|
acl yuck dstdomain .foo.com
|
|
http_access deny yuck
|
|
</verb>
|
|
To be safe, you probably want to list both forms in your
|
|
access lists, for example:
|
|
<verb>
|
|
acl yuck dstdomain .foo.com foo.com
|
|
http_access deny yuck
|
|
</verb>
|
|
|
|
<sect1>I want to customize, or make my own error messages.
|
|
<p>
|
|
You can customize the existing error messages as described in
|
|
<ref id="custom-err-msgs" name="Customizable Error Messages">.
|
|
You can also create new error messages and use these in conjunction
|
|
with the <em/deny_info/ option.
|
|
<p>
|
|
For example, let's say you want your users to see a special message
|
|
when they request something that matches your pornography list.
|
|
First, create a file named ERR_NO_PORNO in the
|
|
<em>/usr/local/squid/etc/errors</em> directory. That file might
|
|
contain something like this:
|
|
<verb>
|
|
<p>
|
|
Our company policy is to deny requests to known porno sites. If you
|
|
feel you've received this message in error, please contact
|
|
the support staff (support@this.company.com, 555-1234).
|
|
</verb>
|
|
<p>
|
|
Next, set up your access controls as follows:
|
|
<verb>
|
|
acl porn url_regex "/usr/local/squid/etc/porno.txt"
|
|
deny_info ERR_NO_PORNO porn
|
|
http_access deny porn
|
|
(additional http_access lines ...)
|
|
</verb>
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>Troubleshooting
|
|
|
|
<sect1>Why am I getting ``Proxy Access Denied?''
|
|
<P>
|
|
You may need to set up the <em/http_access/ option to allow
|
|
requests from your IP addresses. Please see <ref id="access-controls"
|
|
name="the Access Controls section"> for information about that.
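<P>
A minimal sketch, assuming your clients live on 192.168.1.0/24 (replace
this with your own network), would be:
<verb>
# Assumed client network
acl mynet src 192.168.1.0/255.255.255.0
acl all src 0.0.0.0/0.0.0.0
http_access allow mynet
http_access deny all
</verb>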
|
|
<P>
|
|
If <em/squid/ is in httpd-accelerator mode, it will accept normal
|
|
HTTP requests and forward them to a HTTP server, but it will not
|
|
honor proxy requests. If you want your cache to also accept
|
|
proxy-HTTP requests then you must enable this feature:
|
|
<verb>
|
|
httpd_accel_with_proxy on
|
|
</verb>
|
|
Alternatively, you may have misconfigured one of your ACLs. Check the
|
|
<em/access.log/ and <em/squid.conf/ files for clues.
|
|
|
|
<sect1>I can't get <tt/local_domain/ to work; <em/Squid/ is caching the objects from my local servers.
|
|
|
|
<P>
|
|
The <tt/local_domain/ directive does not prevent local
|
|
objects from being cached. It prevents the use of sibling caches
|
|
when fetching local objects. If you want to prevent objects from
|
|
being cached, use the <tt/cache_stoplist/ or <tt/http_stop/
|
|
configuration options (depending on your version).
|
|
|
|
<sect1>I get <tt/Connection Refused/ when the cache tries to retrieve an object located on a sibling, even though the sibling thinks it delivered the object to my cache.
|
|
<P>
|
|
|
|
If the HTTP port number is wrong but the ICP port is correct, your
cache sends ICP queries correctly, and the ICP replies fool it
into thinking the configuration is correct. Large objects
still fail, because you don't have the correct HTTP port for the sibling
in your <em/squid.conf/ file. If your sibling changed their
<tt/http_port/, you could have this problem for some time
before noticing.
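<P>
In the neighbor line, the first number is the sibling's HTTP port and
the second is its ICP port. A sketch, using a made-up hostname and the
common default ports (Squid 2 spells the directive <em/cache_peer/):
<verb>
#          hostname            type    http-port icp-port
cache_host sibling.example.com sibling 3128      3130
</verb>
Both numbers must match the sibling's current <tt/http_port/ and ICP
port settings.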
|
|
|
|
<sect1>Running out of filedescriptors
|
|
<label id="filedescriptors">
|
|
<P>
|
|
|
|
If you see the <tt/Too many open files/ error message, you
|
|
are most likely running out of file descriptors. This may be due
|
|
to running Squid on an operating system with a low filedescriptor
|
|
limit. This limit is often configurable in the kernel or with
|
|
other system tuning tools. There are two ways to run out of file
|
|
descriptors: first, you can hit the per-process limit on file
|
|
descriptors. Second, you can hit the system limit on total file
|
|
descriptors for all processes.
|
|
|
|
<sect2>Linux
|
|
<P>
|
|
Dancer has a <url url="http://www2.simegen.com/~dancer/minihowto.html"
|
|
name="Mini-'Adding File-descriptors-to-linux for squid' HOWTO">, but
|
|
this information seems specific to the Linux 2.0.36 kernel.
|
|
|
|
<p>
|
|
Henrik has a <url url="http://squid.sourceforge.net/hno/linux-lfd.html" name="How to get many filedescriptors on Linux 2.2.X"> page.
|
|
|
|
<P>
|
|
You also might want to
|
|
have a look at
|
|
<url url="http://www.linux.org.za/oskar/patches/kernel/filehandle/"
|
|
name="filehandle patch">
|
|
by
|
|
<url url="mailto:michael@metal.iinet.net.au"
|
|
name="Michael O'Reilly">.
|
|
|
|
<P>
|
|
If your kernel version is 2.2.x or greater, you can read and write
|
|
the maximum number of file handles and/or inodes
|
|
simply by accessing the special files:
|
|
<verb>
|
|
/proc/sys/fs/file-max
|
|
/proc/sys/fs/inode-max
|
|
</verb>
|
|
So, to increase your file descriptor limit:
|
|
<verb>
|
|
echo 3072 > /proc/sys/fs/file-max
|
|
</verb>
|
|
|
|
<P>
|
|
If your kernel version is between 2.0.35 and 2.1.x (?), you can read and write
|
|
the maximum number of file handles and/or inodes
|
|
simply by accessing the special files:
|
|
<verb>
|
|
/proc/sys/kernel/file-max
|
|
/proc/sys/kernel/inode-max
|
|
</verb>
|
|
|
|
<P>
|
|
While this does increase the current number of file descriptors,
|
|
Squid's <em/configure/ script probably won't figure out the
|
|
new value unless you also update the include files, specifically
|
|
the value of <em/OPEN_MAX/ in
|
|
<em>/usr/include/linux/limits.h</em>.
|
|
|
|
<sect2>Solaris
|
|
<P>
|
|
Add the following to your <em>/etc/system</em> file to
|
|
increase your maximum file descriptors per process:
|
|
<P>
|
|
<verb>
|
|
set rlim_fd_max = 4096
|
|
</verb>
|
|
<P>
|
|
Next you should re-run the <em>configure</em> script
|
|
in the top directory so that it finds the new value.
|
|
If it does not find the new limit, then you might try
|
|
editing <em>include/autoconf.h</em> and setting
|
|
<tt/#define DEFAULT_FD_SETSIZE/ by hand. Note that
|
|
<em>include/autoconf.h</em> is created from <em>autoconf.h.in</em>
|
|
every time you run configure. Thus, if you edit it by
|
|
hand, you might lose your changes later on.
|
|
|
|
<P>
|
|
If you have a very old version of Squid (1.1.X), and you
|
|
want to use more than 1024 descriptors, then you must
|
|
edit <em>src/Makefile</em> and enable
|
|
<tt/$(USE_POLL_OPT)/. Then recompile <em/squid/.
|
|
|
|
<p>
|
|
<url url="mailto:voeckler at rvs dot uni-hannover dot de" name="Jens-S. Voeckler">
|
|
advises that you should NOT change the soft limit (<em/rlim_fd_cur/) to anything
|
|
larger than 256. It will break other programs, such as the license
|
|
manager needed for the SUN workshop compiler. Jens-S. also says that it
|
|
should be safe to raise the limit as high as 16,384.
|
|
|
|
<sect2>IRIX
|
|
<p>
|
|
For some hints, please see SGI's <url
|
|
url="http://www.sgi.com/tech/web/irix62.html" name="Tuning IRIX 6.2 for
|
|
a Web Server"> document.
|
|
|
|
<sect2>FreeBSD
|
|
<P>
|
|
by <url url="mailto:torsten.sturm@axis.de" name="Torsten Sturm">
|
|
<enum>
|
|
<item>How do I check my maximum filedescriptors?
|
|
<P>Do <tt/sysctl -a/ and look for the value of
|
|
<tt/kern.maxfilesperproc/.
|
|
<item>How do I increase them?
|
|
<verb>
|
|
sysctl -w kern.maxfiles=XXXX
|
|
sysctl -w kern.maxfilesperproc=XXXX
|
|
</verb>
|
|
<bf>Warning</bf>: You probably want <tt/maxfiles
|
|
> maxfilesperproc/ if you're going to be pushing the
|
|
limit.
|
|
<item>What is the upper limit?
|
|
<P>I don't think there is a formal upper limit inside the kernel.
|
|
All the data structures are dynamically allocated. In practice
|
|
there might be unintended metaphenomena (kernel spending too much
|
|
time searching tables, for example).
|
|
</enum>
|
|
|
|
<sect2>General BSD
|
|
<P>
|
|
For most BSD-derived systems (SunOS, 4.4BSD, OpenBSD, FreeBSD,
|
|
NetBSD, BSD/OS, 386BSD, Ultrix) you can also use the ``brute force''
|
|
method to increase these values in the kernel (requires a kernel
|
|
rebuild):
|
|
<enum>
|
|
<item>How do I check my maximum filedescriptors?
|
|
<P>Do <tt/pstat -T/ and look for the <tt/files/
|
|
value, typically expressed as the ratio of <tt>current/maximum</tt>.
|
|
<item>How do I increase them the easy way?
|
|
<P>One way is to increase the value of the <tt/maxusers/ variable
|
|
in the kernel configuration file and build a new kernel. This method
|
|
is quick and easy but also has the effect of increasing a wide variety of
|
|
other variables that you may not need or want increased.
|
|
<item>Is there a more precise method?
|
|
<P>Another way is to find the <em/param.c/ file in your kernel
|
|
build area and change the arithmetic behind the relationship between
|
|
<tt/maxusers/ and the maximum number of open files.
|
|
</enum>
|
|
Here are a few examples which should lead you in the right direction:
|
|
<enum>
|
|
<item>SunOS
|
|
<P>Change the value of <tt/nfile/ in <tt>/usr/kvm/sys/conf.common/param.c</tt> by altering this equation:
|
|
<verb>
|
|
int nfile = 16 * (NPROC + 16 + MAXUSERS) / 10 + 64;
|
|
</verb>
|
|
Where <tt/NPROC/ is defined by:
|
|
<verb>
|
|
#define NPROC (10 + 16 * MAXUSERS)
|
|
</verb>
|
|
<item>FreeBSD (from the 2.1.6 kernel)
|
|
<P>Very similar to SunOS, edit <em>/usr/src/sys/conf/param.c</em>
|
|
and alter the relationship between <tt/maxusers/ and the
|
|
<tt>maxfiles</tt> and <tt>maxfilesperproc</tt> variables:
|
|
<verb>
|
|
int maxfiles = NPROC*2;
|
|
int maxfilesperproc = NPROC*2;
|
|
</verb>
|
|
Where <tt>NPROC</tt> is defined by:
|
|
<tt>#define NPROC (20 + 16 * MAXUSERS)</tt>
|
|
The per-process limit can also be adjusted directly in the kernel
|
|
configuration file with the following directive:
|
|
<tt>options OPEN_MAX=128</tt>
|
|
<item>BSD/OS (from the 2.1 kernel)
|
|
<P>Edit <tt>/usr/src/sys/conf/param.c</tt> and adjust the
|
|
<tt>maxfiles</tt> math here:
|
|
<verb>
|
|
int maxfiles = 3 * (NPROC + MAXUSERS) + 80;
|
|
</verb>
|
|
Where <tt>NPROC</tt> is defined by:
|
|
<tt>#define NPROC (20 + 16 * MAXUSERS)</tt>
|
|
You should also set the <tt>OPEN_MAX</tt> value in your kernel
|
|
configuration file to change the per-process limit.
|
|
</enum>
|
|
|
|
<sect2>Reconfigure afterwards
|
|
<P>
|
|
<bf/NOTE:/ After you rebuild/reconfigure your kernel with more
|
|
filedescriptors, you must then recompile Squid. Squid's configure
|
|
script determines how many filedescriptors are available, so you
|
|
must make sure the configure script runs again as well. For example:
|
|
<verb> cd squid-1.1.x
|
|
make realclean
|
|
./configure --prefix=/usr/local/squid
|
|
make
|
|
</verb>
|
|
|
|
<sect1>What are these strange lines about removing objects?
|
|
<P>
|
|
For example:
|
|
<verb>
|
|
97/01/23 22:31:10| Removed 1 of 9 objects from bucket 3913
|
|
97/01/23 22:33:10| Removed 1 of 5 objects from bucket 4315
|
|
97/01/23 22:35:40| Removed 1 of 14 objects from bucket 6391
|
|
</verb>
|
|
|
|
These log entries are normal, and do not indicate that <em/squid/ has
|
|
reached <tt/cache_swap_high/.
|
|
|
|
<P>
|
|
Consult your cache information page in <em/cachemgr.cgi/ for
|
|
a line like this:
|
|
|
|
<verb>
|
|
Storage LRU Expiration Age: 364.01 days
|
|
</verb>
|
|
|
|
Objects which have not been used for that amount of time are removed as
|
|
a part of the regular maintenance. You can set an upper limit on the
|
|
<tt/LRU Expiration Age/ value with <tt/reference_age/ in the config
|
|
file.
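<P>
For instance, to cap the expiration age at one month (the value is
only an illustration, not a recommendation):
<verb>
# Illustrative value; choose a limit that suits your cache
reference_age 1 month
</verb>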
|
|
|
|
<sect1>Can I change a Windows NT FTP server to list directories in Unix format?
|
|
|
|
<P>
|
|
Why, yes you can! Select the following menus:
|
|
<itemize>
|
|
<item>Start
|
|
<item>Programs
|
|
<item>Microsoft Internet Server (Common)
|
|
<item>Internet Service Manager
|
|
</itemize>
|
|
<P>
|
|
This will bring up a box with icons for your various services. One of
|
|
them should be a little ftp ``folder.'' Double click on this.
|
|
<P>
|
|
You will then have to select the server (there should only be one).
|
|
Select that and then choose ``Properties'' from the menu and choose the
|
|
``directories'' tab along the top.
|
|
<P>
|
|
There will be an option at the bottom saying ``Directory listing style.''
|
|
Choose the ``Unix'' type, not the ``MS-DOS'' type.
|
|
<P>
|
|
<quote>
|
|
--Oskar Pearson <oskar@is.co.za>
|
|
</quote>
|
|
|
|
<sect1>Why am I getting ``Ignoring MISS from non-peer x.x.x.x?''
|
|
|
|
<P>
|
|
You are receiving ICP MISSes (via UDP) from a parent or sibling cache
|
|
whose IP address your cache does not know about. This may happen
|
|
in two situations.
|
|
|
|
<P>
|
|
<enum>
|
|
<item>
|
|
If the peer is multihomed, it is sending packets out an interface
|
|
which is not advertised in the DNS. Unfortunately, this is a
|
|
configuration problem at the peer site. You can tell them to either
|
|
add the IP address interface to their DNS, or use Squid's
|
|
'udp_outgoing_address' option to force the replies
|
|
out a specific interface. For example:
|
|
<P>
|
|
<em/on your parent squid.conf:/
|
|
<verb>
|
|
udp_outgoing_address proxy.parent.com
|
|
</verb>
|
|
<em/on your squid.conf:/
|
|
<verb>
|
|
cache_host proxy.parent.com parent 3128 3130
|
|
</verb>
|
|
|
|
<item>
|
|
You can also see this warning when sending ICP queries to
|
|
multicast addresses. For security reasons, Squid requires
|
|
your configuration to list all other caches listening on the
|
|
multicast group address. If an unknown cache listens to that address
|
|
and sends replies, your cache will log the warning message. To fix
|
|
this situation, either tell the unknown cache to stop listening
|
|
on the multicast address, or if they are legitimate, add them
|
|
to your configuration file.
|
|
</enum>
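<P>
For the multicast case, a configuration that lists every expected
responder might look like the sketch below. It assumes the Squid 2
<em/cache_peer/ syntax; the group address and hostnames are
placeholders:
<verb>
# Queries go to the multicast group (placeholder address)
cache_peer 224.9.9.9 multicast 3128 3130 ttl=64
# Replies are accepted only from peers listed as responders
cache_peer cache1.example.com sibling 3128 3130 multicast-responder
cache_peer cache2.example.com sibling 3128 3130 multicast-responder
</verb>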
|
|
|
|
<sect1>DNS lookups for domain names with underscores (_) always fail.
|
|
|
|
<P>
|
|
The standards for naming hosts
|
|
(<url url="http://ds.internic.net/rfc/rfc952.txt" name="RFC 952">,
|
|
<url url="http://ds.internic.net/rfc/rfc1101.txt" name="RFC 1101">)
|
|
do not allow underscores in domain names:
|
|
<quote>
|
|
A "name" (Net, Host, Gateway, or Domain name) is a text string up
|
|
to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus
|
|
sign (-), and period (.).
|
|
</quote>
|
|
The resolver library that ships with recent versions of BIND enforces
|
|
this restriction, returning an error for any host with underscore in
|
|
the hostname. The best solution is to complain to the hostmaster of the
|
|
offending site, and ask them to rename their host.
|
|
|
|
<p>
|
|
See also the
|
|
<url url="http://www.intac.com/~cdp/cptd-faq/section4.html#underscore"
|
|
name="comp.protocols.tcp-ip.domains FAQ">.
|
|
|
|
<P>
|
|
Some people have noticed that
|
|
<url url="http://ds.internic.net/rfc/rfc1033.txt" name="RFC 1033">
|
|
implies that underscores <bf/are/ allowed. However, this is an
|
|
<em/informational/ RFC with a poorly chosen
|
|
example, and not a <em/standard/ by any means.
|
|
|
|
<sect1>Why does Squid say: ``Illegal character in hostname; underscores are not allowed?''
|
|
|
|
<P>
|
|
See the above question. The underscore character is not
|
|
valid for hostnames.
|
|
|
|
<P>
|
|
Some DNS resolvers allow the underscore, so yes, the hostname
|
|
might work fine when you don't use Squid.
|
|
|
|
<P>
|
|
To make Squid allow underscores in hostnames, re-run the
|
|
<em>configure</em> script with this option:
|
|
<verb>
|
|
% ./configure --enable-underscores ...
|
|
</verb>
|
|
and then recompile:
|
|
<verb>
|
|
% make clean
|
|
% make
|
|
</verb>
|
|
|
|
<sect1>Why am I getting access denied from a sibling cache?
|
|
|
|
<P>
|
|
The answer to this is somewhat complicated, so please hold on.
|
|
<em/NOTE:/ most of this text is taken from
|
|
<url url="http://www.nlanr.net/%7ewessels/Papers/icp-squid.ps.gz"
|
|
name="ICP and the Squid Web Cache">.
|
|
|
|
<P>
|
|
An ICP query does not include any parent or sibling designation,
|
|
so the receiver really has no indication of how the peer
|
|
cache is configured to use it. This issue becomes important
|
|
when a cache is willing to serve cache hits to anyone, but only
|
|
handle cache misses for its paying users or customers. In other
|
|
words, whether or not to allow the request depends on if the
|
|
result is a hit or a miss. To accomplish this,
|
|
Squid acquired the <tt/miss_access/ feature
|
|
in October of 1996.
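<P>
As a sketch of such a policy (the ACL name and network are assumptions),
the following serves hits to anyone that <em/http_access/ permits but
fetches misses only for customers:
<verb>
# Hypothetical customer network
acl customers src 172.16.0.0/255.255.0.0
acl all src 0.0.0.0/0.0.0.0
miss_access allow customers
miss_access deny all
</verb>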
|
|
|
|
<P>
|
|
The necessity of ``miss access'' makes life a little bit complicated,
|
|
and not only because it was awkward to implement. Miss access
|
|
means that the ICP query reply must be an extremely accurate prediction
|
|
of the result of a subsequent HTTP request. Ascertaining
|
|
this result is actually very hard, if not impossible to
|
|
do, since the ICP request cannot convey the
|
|
full HTTP request.
|
|
Additionally, there are more types of HTTP request results than there
|
|
are for ICP. The ICP query reply will either be a hit or miss.
|
|
However, the HTTP request might result in a ``<tt/304 Not Modified/'' reply
|
|
sent from the origin server. Such a reply is not strictly a hit since the peer
|
|
needed to forward a conditional request to the source. At the same time,
|
|
it's not strictly a miss either since the local object data is still valid,
|
|
and the Not-Modified reply is quite small.
|
|
|
|
<P>
|
|
One serious problem for cache hierarchies is mismatched freshness
|
|
parameters. Consider a cache <em/C/ using ``strict''
|
|
freshness parameters so its users get maximally current data.
|
|
<em/C/ has a sibling <em/S/ with less strict freshness parameters.
|
|
When an object is requested at <em/C/, <em/C/ might
|
|
find that <em/S/ already has the object via an ICP query and
|
|
ICP HIT response. <em/C/ then retrieves the object
|
|
from <em/S/.
|
|
|
|
<P>
|
|
In an HTTP/1.0 world, <em/C/ (and <em/C/'s client)
|
|
will receive an object that was never
|
|
subject to its local freshness rules. Neither HTTP/1.0 nor ICP provides
|
|
any way to ask only for objects less than a certain age. If the
|
|
retrieved object is stale by <em/C/'s rules,
|
|
it will be removed from <em/C/'s cache, but
|
|
it will subsequently be fetched from <em/S/ so long as it
|
|
remains fresh there. This configuration miscoupling
|
|
problem is a significant deterrent to establishing
|
|
both parent and sibling relationships.
|
|
|
|
<P>
|
|
HTTP/1.1 provides numerous request headers to specify freshness
|
|
requirements, which actually introduces
|
|
a different problem for cache hierarchies: ICP
|
|
still does not include any age information, neither in query nor
|
|
reply. So <em/S/ may return an ICP HIT if its
|
|
copy of the object is fresh by its configuration
|
|
parameters, but the subsequent HTTP request may result
|
|
in a cache miss due to any
|
|
<tt/Cache-control:/ headers originated by <em/C/ or by
|
|
<em/C/'s client. Situations now emerge where the ICP reply
|
|
no longer matches the HTTP request result.
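
<P>
The mismatch can be illustrated with a deliberately simplified Python model.
This is only a sketch: the function names are made up, and freshness is
reduced to a single maximum age in seconds, which is an assumption for the
example and not how Squid actually computes freshness.
<verb>
def icp_reply(age, sibling_max_age):
    """S answers the ICP query using only its own freshness rules."""
    return "HIT" if age <= sibling_max_age else "MISS"


def http_result(age, sibling_max_age, request_max_age=None):
    """S answers HTTP using its own rules plus any requested max-age."""
    limit = sibling_max_age
    if request_max_age is not None:
        limit = min(limit, request_max_age)
    return "HIT" if age <= limit else "MISS"


# The object is 2 hours old. S treats objects as fresh for 6 hours,
# but the request from C asks for copies no older than 1 hour.
age, s_limit, wanted = 7200, 21600, 3600
print(icp_reply(age, s_limit), http_result(age, s_limit, wanted))
</verb>
With these numbers the ICP reply is a hit while the HTTP request is a miss,
which is exactly the case where <tt/miss_access/ produces ``access denied''.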
|
|
|
|
<P>
|
|
In the end, the fundamental problem is that the ICP query does not
|
|
provide enough information to accurately predict whether
|
|
the HTTP request
|
|
will be a hit or miss. In fact, the current ICP Internet Draft is very
|
|
vague on this subject. What does ICP HIT really mean? Does it mean
|
|
``I know a little about that URL and have some copy of the object?'' Or
|
|
does it mean ``I have a valid copy of that object and you are allowed to
|
|
get it from me?''
|
|
|
|
<P>
|
|
So, what can be done about this problem? We really need to change ICP
|
|
so that freshness parameters are included. Until that happens, the members
|
|
of a cache hierarchy have only two options to totally eliminate the ``access
|
|
denied'' messages from sibling caches:
|
|
<enum>
|
|
<item>Make sure all members have the same <tt/refresh_pattern/ parameters.
|
|
<item>Do not use <tt/miss_access/ at all. Promise your sibling cache
|
|
administrator that <em/your/ cache is properly configured and that you
|
|
will not abuse their generosity. The sibling cache administrator can
|
|
check his log files to make sure you are keeping your word.
|
|
</enum>
|
|
If neither of these is realistic, then the sibling relationship should not
|
|
exist.
|
|
|
|
<sect1>Cannot bind socket FD NN to *:8080 (125) Address already in use
|
|
|
|
<P>
|
|
This means that another process is already listening on port 8080
|
|
(or whatever you're using). It could mean that you have a Squid process
|
|
already running, or it could be from another program. To verify, use
|
|
the <em/netstat/ command:
|
|
<verb>
|
|
netstat -naf inet | grep LISTEN
|
|
</verb>
|
|
That will show all sockets in the LISTEN state. You might also try
|
|
<verb>
|
|
netstat -naf inet | grep 8080
|
|
</verb>
|
|
If you find that some process has bound to your port, but you're not sure
|
|
which process it is, you might be able to use the excellent
|
|
<url url="ftp://vic.cc.purdue.edu/pub/tools/unix/lsof/"
|
|
name="lsof">
|
|
program. It will show you which processes own every open file descriptor
|
|
on your system.
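
<P>
If neither tool is at hand, a quick check can also be sketched in Python.
The function name <em/port_in_use/ and the choice of the loopback address
are assumptions for this example; it only tells you that the port is taken,
not which process owns it.
<verb>
import errno
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if binding host:port fails with 'Address already in use'."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
    except OSError as err:
        if err.errno == errno.EADDRINUSE:
            return True
        raise
    finally:
        probe.close()
    return False


print(port_in_use(8080))
</verb>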
|
|
|
|
<sect1>icpDetectClientClose: ERROR xxx.xxx.xxx.xxx: (32) Broken pipe
|
|
|
|
<P>
|
|
This means that the client socket was closed by the client
|
|
before Squid was finished sending data to it. Squid detects this
|
|
by trying to <tt/read(2)/ some data from the socket. If the
|
|
<tt/read(2)/ call fails, then Squid knows the socket has been
|
|
closed. Normally the <tt/read(2)/ call returns <em/ECONNRESET: Connection reset by peer/
|
|
and these are NOT logged. Any other error messages (such as
|
|
<em/EPIPE: Broken pipe/) are logged to <em/cache.log/. See the ``intro'' of
|
|
section 2 of your Unix manual for a list of all error codes.
|
|
|
|
<sect1>icpDetectClientClose: FD 135, 255 unexpected bytes
|
|
<P>
|
|
These are caused by misbehaving Web clients attempting to use persistent
|
|
connections. Squid-1.1 does not support persistent connections.
|
|
|
|
<sect1>Does Squid work with NTLM Authentication?
|
|
|
|
<P>
|
|
<url url="/Versions/v2/2.5/" name="Version 2.5"> will
|
|
support Microsoft NTLM authentication. However, there are some
|
|
limits on our support: We cannot proxy connections to an origin
|
|
server that uses NTLM authentication, but we can act as a web
|
|
accelerator or proxy server and authenticate the client connection
|
|
using NTLM.
|
|
|
|
<p>
|
|
We support NT4, Samba, and Windows 2000 Domain Controllers. For
|
|
more information get squid 2.5 and run <em>./configure --help</em>.
|
|
|
|
<p>
|
|
Why can we not proxy NTLM even though we can use it?
|
|
Quoting from the summary at the end of the browser authentication section in
|
|
<url url="http://support.microsoft.com/support/kb/articles/Q198/1/16.ASP"
|
|
name="this article">:
|
|
<quote>
|
|
In summary, Basic authentication does not require an implicit end-to-end
|
|
state, and can therefore be used through a proxy server. Windows NT
|
|
Challenge/Response authentication requires implicit end-to-end state and
|
|
will not work through a proxy server.
|
|
</quote>
|
|
|
|
<P>
|
|
Squid transparently passes the NTLM request and response headers between
|
|
clients and servers. NTLM relies on a single end-end connection (possibly
|
|
with men-in-the-middle, but a single connection every step of the way). This
|
|
implies that for NTLM authentication to work at all with proxy caches, the
|
|
proxy would need to tightly link the client-proxy and proxy-server links, as
|
|
well as understand the state of the link at any one time. NTLM through a
|
|
CONNECT might work, but as far as we know that hasn't been implemented
|
|
by anyone, and it would prevent the pages being cached - removing the value
|
|
of the proxy.
|
|
|
|
<p>
|
|
NTLM authentication is carried entirely inside the HTTP protocol, but is
|
|
different from Basic authentication in many ways.
|
|
|
|
<enum>
|
|
<item>
|
|
It is dependent on a stateful end-to-end connection which collides with
|
|
RFC 2616 for proxy-servers to disjoin the client-proxy and proxy-server
|
|
connections.
|
|
|
|
<item>
|
|
It takes place only once per connection, not per request. Once the
|
|
connection is authenticated then all future requests on the same connection
|
|
inherit the authentication. The connection must be reestablished to set
|
|
up other authentication or re-identify the user.
|
|
</enum>
|
|
|
|
<p>
|
|
The reasons why it is not implemented in Netscape are probably:
|
|
|
|
<itemize>
|
|
<item> It is very specific to the Windows platform.
|
|
|
|
<item> It is not defined in any RFC or even internet draft.
|
|
|
|
<item> The protocol has several shortcomings, where the most apparent one is
|
|
that it cannot be proxied.
|
|
|
|
<item> There exists an open internet standard which does mostly the same but
|
|
without the shortcomings or platform dependencies: <url url="ftp://ftp.isi.edu/in-notes/rfc2617.txt" name="digest authentication">.
|
|
</itemize>
|
|
|
|
|
|
<sect1>The <em/default/ parent option isn't working!
|
|
|
|
<P>
|
|
This message was received at <em/squid-bugs/:
|
|
<quote>
|
|
<it>If you have only one parent, configured as:</it>
|
|
<verb>
|
|
cache_host xxxx parent 3128 3130 no-query default
|
|
</verb>
|
|
<it>nothing is sent to the parent; neither UDP packets, nor TCP connections.</it>
|
|
</quote>
|
|
|
|
<P>
|
|
Simply adding <em/default/ to a parent does not force all requests to be sent
|
|
to that parent. The term <em/default/ is perhaps a poor choice of words. A <em/default/
|
|
parent is only used as a <bf/last resort/. If the cache is able to make direct connections,
|
|
direct will be preferred over default. If you want to force all requests to your parent
|
|
cache(s), use the <em/never_direct/ option:
|
|
<verb>
|
|
acl all src 0.0.0.0/0.0.0.0
|
|
never_direct allow all
|
|
</verb>
|
|
|
|
<sect1>``Hot Mail'' complains about: Intrusion Logged. Access denied.
|
|
|
|
<P>
|
|
``Hot Mail'' is proxy-unfriendly and requires all requests to come from
|
|
the same IP address. You can fix this by adding to your
|
|
<em/squid.conf/:
|
|
<verb>
|
|
hierarchy_stoplist hotmail.com
|
|
</verb>
|
|
|
|
<sect1>My Squid becomes very slow after it has been running for some time.
|
|
|
|
<P>
|
|
This is most likely because Squid is using more memory than it should be
|
|
for your system. When the Squid process becomes large, it experiences a lot
|
|
of paging. This will very rapidly degrade the performance of Squid.
|
|
Memory usage is a complicated problem. There are a number
|
|
of things to consider.
|
|
|
|
<P>
|
|
First, examine the Cache Manager <em/Info/ output and look at these two lines:
|
|
<verb>
|
|
Number of HTTP requests received: 121104
|
|
Page faults with physical i/o: 16720
|
|
</verb>
|
|
Note, if your system does not have the <em/getrusage()/ function, then you will
|
|
not see the page faults line.
|
|
|
|
<P>
|
|
Divide the number of page faults by the number of HTTP requests. In this
|
|
case 16720/121104 = 0.14. Ideally this ratio should be in the 0.0 - 0.1
|
|
range. It may be acceptable to be in the 0.1 - 0.2 range. Above that,
|
|
however, and you will most likely find that Squid's performance is
|
|
unacceptably slow.
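
<P>
The arithmetic can be sketched in a few lines of Python. The function names
are made up for this example, and the thresholds are simply the rough ranges
quoted above, not values taken from Squid itself.
<verb>
def page_fault_ratio(page_faults, http_requests):
    """Return page faults per HTTP request."""
    if http_requests == 0:
        raise ValueError("no requests received yet")
    return page_faults / http_requests


def classify(ratio):
    """Map a ratio onto the rough ranges quoted in the text."""
    if ratio <= 0.1:
        return "ideal"
    if ratio <= 0.2:
        return "may be acceptable"
    return "likely too slow"


ratio = page_fault_ratio(16720, 121104)
print(round(ratio, 2), classify(ratio))
</verb>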
|
|
|
|
<P>
|
|
If the ratio is too high, you will need to make some changes to
|
|
<ref id="lower-mem-usage" name="lower the
|
|
amount of memory Squid uses">.
|
|
|
|
<sect1>WARNING: Failed to start 'dnsserver'
|
|
|
|
<P>
|
|
This could be a permission problem. Does the Squid userid have
|
|
permission to execute the <em/dnsserver/ program?
|
|
|
|
<P>
|
|
You might also try testing <em/dnsserver/ from the command line:
|
|
<verb>
|
|
> echo oceana.nlanr.net | ./dnsserver
|
|
</verb>
|
|
This should produce something like:
|
|
<verb>
|
|
$name oceana.nlanr.net
|
|
$h_name oceana.nlanr.net
|
|
$h_len 4
|
|
$ipcount 1
|
|
132.249.40.200
|
|
$aliascount 0
|
|
$ttl 82067
|
|
$end
|
|
</verb>
|
|
|
|
<sect1>Sending in Squid bug reports
|
|
<P>
|
|
Bug reports for Squid should be sent to the <url url="mailto:squid-bugs@squid-cache.org"
|
|
name="squid-bugs alias">. Any bug report must include
|
|
<itemize>
|
|
<item>The Squid version
|
|
<item>Your Operating System type and version
|
|
</itemize>
|
|
|
|
<sect2>crashes and core dumps
|
|
<label id="coredumps">
|
|
<P>
|
|
There are two conditions under which squid will exit abnormally and
|
|
generate a coredump. First, a SIGSEGV or SIGBUS signal will cause Squid
|
|
to exit and dump core. Second, many functions include consistency
|
|
checks. If one of those checks fails, Squid calls abort() to generate a
|
|
core dump.
|
|
|
|
<P>
|
|
Many people report that Squid doesn't leave a coredump anywhere. This may be
|
|
due to one of the following reasons:
|
|
<itemize>
|
|
<item>
|
|
Resource Limits. The shell has limits on the size of a coredump
|
|
file. You may need to increase the limit.
|
|
<item>
|
|
No debugging symbols.
|
|
The Squid binary must have debugging symbols in order to get
|
|
a meaningful coredump.
|
|
<item>
|
|
Threads and Linux. On Linux, threaded applications do not generate
|
|
core dumps. When you use --enable-async-io, it uses threads and
|
|
you can't get a coredump.
|
|
<item>
|
|
It did leave a coredump file, you just can't find it.
|
|
</itemize>
|
|
|
|
|
|
<p>
|
|
<bf/Resource Limits/:
|
|
These limits can usually be changed in
|
|
shell scripts. The command to change the resource limits is usually
|
|
either <em/limit/ or <em/limits/. Sometimes it is a shell-builtin function,
|
|
and sometimes it is a regular program. Also note that you can set resource
|
|
limits in the <em>/etc/login.conf</em> file on FreeBSD and maybe other BSD
|
|
systems.
|
|
|
|
<P>
|
|
To change the coredumpsize limit you might use a command like:
|
|
<verb>
|
|
limit coredumpsize unlimited
|
|
</verb>
|
|
or
|
|
<verb>
|
|
limits coredump unlimited
|
|
</verb>
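
<P>
To see which limit a process would actually inherit, the current value can
also be read programmatically. The following Python sketch uses the standard
<em/resource/ module; the function name is made up for the example, and it
reports the limit of the Python process itself, which it inherits from the
shell that started it.
<verb>
import resource


def core_dumps_allowed():
    """Return True if the soft core file size limit is not zero."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft != 0


soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
shown = "unlimited" if soft == resource.RLIM_INFINITY else soft
print("soft core limit:", shown, "- core dumps allowed:", core_dumps_allowed())
</verb>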
|
|
|
|
<p>
|
|
<bf/Debugging Symbols/:
|
|
To see if your Squid binary has debugging symbols, use this command:
|
|
<verb>
|
|
% nm /usr/local/squid/bin/squid | head
|
|
</verb>
|
|
The binary has debugging symbols if you see gobbledegook like this:
|
|
<verb>
|
|
0812abec B AS_tree_head
|
|
080a7540 D AclMatchedName
|
|
080a73fc D ActionTable
|
|
080908a4 r B_BYTES_STR
|
|
080908bc r B_GBYTES_STR
|
|
080908ac r B_KBYTES_STR
|
|
080908b4 r B_MBYTES_STR
|
|
080a7550 D Biggest_FD
|
|
08097c0c R CacheDigestHashFuncCount
|
|
08098f00 r CcAttrs
|
|
</verb>
|
|
There are no debugging symbols if you see this instead:
|
|
<verb>
|
|
/usr/local/squid/bin/squid: no symbols
|
|
</verb>
|
|
Debugging symbols may have been
|
|
removed by your <em/install/ program. If you look at the
|
|
squid binary from the source directory, then it might have
|
|
the debugging symbols.
|
|
|
|
|
|
<P>
|
|
<bf/Coredump Location/:
|
|
The core dump file will be left in one of the following locations:
|
|
<enum>
|
|
<item>The <em/coredump_dir/ directory, if you set that option.
|
|
<item>The first <em/cache_dir/ directory if you have used the
|
|
<em/cache_effective_user/ option.
|
|
<item>The current directory when Squid was started
|
|
</enum>
|
|
Recent versions of Squid report their current directory after
|
|
starting, so look there first:
|
|
<verb>
|
|
2000/03/14 00:12:36| Set Current Directory to /usr/local/squid/cache
|
|
</verb>
|
|
If you cannot find a core file, then either Squid does not have
|
|
permission to write in its current directory, or perhaps your shell
|
|
limits (csh and clones) are preventing the core file from being written.
|
|
|
|
<p>
|
|
Often you can get a coredump if you run Squid from the
|
|
command line like this:
|
|
<verb>
|
|
% limit core un
|
|
% /usr/local/squid/bin/squid -NCd1
|
|
</verb>
|
|
|
|
|
|
<P>
|
|
Once you have located the core dump file, use a debugger such as
|
|
<em/dbx/ or <em/gdb/ to generate a stack trace:
|
|
<verb>
|
|
|
|
tirana-wessels squid/src 270% gdb squid /T2/Cache/core
|
|
GDB is free software and you are welcome to distribute copies of it
|
|
under certain conditions; type "show copying" to see the conditions.
|
|
There is absolutely no warranty for GDB; type "show warranty" for details.
|
|
GDB 4.15.1 (hppa1.0-hp-hpux10.10), Copyright 1995 Free Software Foundation, Inc...
|
|
Core was generated by `squid'.
|
|
Program terminated with signal 6, Aborted.
|
|
|
|
[...]
|
|
|
|
(gdb) where
|
|
#0 0xc01277a8 in _kill ()
|
|
#1 0xc00b2944 in _raise ()
|
|
#2 0xc007bb08 in abort ()
|
|
#3 0x53f5c in __eprintf (string=0x7b037048 "", expression=0x5f <Address 0x5f out of bounds>, line=8, filename=0x6b <Address 0x6b out of bounds>)
|
|
#4 0x29828 in fd_open (fd=10918, type=3221514150, desc=0x95e4 "HTTP Request") at fd.c:71
|
|
#5 0x24f40 in comm_accept (fd=2063838200, peer=0x7b0390b0, me=0x6b) at comm.c:574
|
|
#6 0x23874 in httpAccept (sock=33, notused=0xc00467a6) at client_side.c:1691
|
|
#7 0x25510 in comm_select_incoming () at comm.c:784
|
|
#8 0x25954 in comm_select (sec=29) at comm.c:1052
|
|
#9 0x3b04c in main (argc=1073745368, argv=0x40000dd8) at main.c:671
|
|
</verb>
|
|
|
|
<P>
|
|
If possible, you might keep the coredump file around for a day or
|
|
two. It is often helpful if we can ask you to send additional
|
|
debugger output, such as the contents of some variables.
|
|
|
|
<sect1>Debugging Squid
|
|
|
|
<P>
|
|
If you believe you have found a non-fatal bug (such as incorrect HTTP
|
|
processing) please send us a section of your cache.log with debugging to
|
|
demonstrate the problem. The cache.log file can become very large, so
|
|
alternatively, you may want to copy it to an FTP or HTTP server where we
|
|
can download it.
|
|
|
|
<P>
|
|
It is very simple to
|
|
enable full debugging on a running squid process. Simply use the <em/-k debug/
|
|
command line option:
|
|
<verb>
|
|
% ./squid -k debug
|
|
</verb>
|
|
This causes every <em/debug()/ statement in the source code to write a line
|
|
in the <em/cache.log/ file.
|
|
Use the same command again to restore Squid to normal debugging.
|
|
|
|
<P>
|
|
To enable selective debugging (e.g. for one source file only), you
|
|
need to edit <em/squid.conf/ and add to the <em/debug_options/ line.
|
|
Every Squid source file is assigned a different debugging <em/section/.
|
|
The debugging section assignments can be found by looking at the top
|
|
of individual source files, or by reading the file <em>doc/debug-levels.txt</em>
|
|
(correctly renamed to <em/debug-sections.txt/ for Squid-2).
|
|
You also specify the debugging <em/level/ to control the amount of
|
|
debugging. Higher levels result in more debugging messages.
|
|
For example, to enable full debugging of Access Control functions,
|
|
you would use
|
|
<verb>
|
|
debug_options ALL,1 28,9
|
|
</verb>
|
|
Then you have to restart or reconfigure Squid.
|
|
|
|
<P>
|
|
Once you have the debugging captured to <em/cache.log/, take a look
|
|
at it yourself and see if you can make sense of the behaviour which
|
|
you see. If not, please feel free to send your debugging output
|
|
to the <em/squid-users/ or <em/squid-bugs/ lists.
|
|
|
|
<sect1>FATAL: ipcache_init: DNS name lookup tests failed
|
|
<P>
|
|
Squid normally tests your system's DNS configuration before
|
|
it starts serving requests. Squid tries to resolve some
|
|
common DNS names, as defined in the <em/dns_testnames/ configuration
|
|
directive. If Squid cannot resolve these names, it could mean:
|
|
<enum>
|
|
<item>your DNS nameserver is unreachable or not running.
|
|
<item>your <em>/etc/resolv.conf</em> file may contain incorrect information.
|
|
<item>your <em>/etc/resolv.conf</em> file may have incorrect permissions, and
|
|
may be unreadable by Squid.
|
|
</enum>
|
|
|
|
<P>
|
|
To disable this feature, use the <em/-D/ command line option.
|
|
|
|
<P>
|
|
Note, Squid does NOT use the <em/dnsservers/ to test the DNS. The
|
|
test is performed internally, before the <em/dnsservers/ start.
|
|
|
|
<sect1>FATAL: Failed to make swap directory /var/spool/cache: (13) Permission denied
|
|
<P>
|
|
Starting with version 1.1.15, we have required that you first run
|
|
<verb>
|
|
squid -z
|
|
</verb>
|
|
to create the swap directories on your filesystem. If you have set the
|
|
<em/cache_effective_user/ option, then the Squid process takes on the
|
|
given userid before making the directories. If the <em/cache_dir/
|
|
directory (e.g. /var/spool/cache) does not exist, and the Squid userid
|
|
does not have permission to create it, then you will get the ``permission
|
|
denied'' error. This can be simply fixed by manually creating the
|
|
cache directory.
|
|
<verb>
|
|
# mkdir /var/spool/cache
|
|
# chown <userid>:<groupid> /var/spool/cache
|
|
# squid -z
|
|
</verb>
|
|
|
|
<P>
|
|
Alternatively, if the directory already exists, then your operating
|
|
system may be returning ``Permission Denied'' instead of ``File Exists''
|
|
on the mkdir() system call. This
|
|
<url url="store.c-mkdir.patch" name="patch">
|
|
by
|
|
<url url="mailto:miquels@cistron.nl" name="Miquel van Smoorenburg">
|
|
should fix it.
|
|
|
|
<sect1>FATAL: Cannot open HTTP Port
|
|
<P>
|
|
Either (1) the Squid userid does not have permission to bind to the port, or
|
|
(2) some other process has bound itself to the port.
|
|
Remember that root privileges are required to open port numbers
|
|
less than 1024. If you see this message when using a high port number,
|
|
or even when starting Squid as root, then the port has already been
|
|
opened by another process.
|
|
Maybe you are running in the HTTP Accelerator mode and there is
|
|
already a HTTP server running on port 80? If you're really stuck,
|
|
install the way cool
|
|
<url url="ftp://vic.cc.purdue.edu/pub/tools/unix/lsof/"
|
|
name="lsof">
|
|
utility to show you which process has your port in use.
|
|
|
|
<sect1>FATAL: All redirectors have exited!
|
|
<P>
|
|
This is explained in the <ref id="redirectors-exit" name="Redirector section">.
|
|
|
|
<sect1>FATAL: file_map_allocate: Exceeded filemap limit
|
|
<p>
|
|
See the next question.
|
|
|
|
<sect1>FATAL: You've run out of swap file numbers.
|
|
<p>
|
|
<em>Note: The information here applies to version 2.2 and earlier.</em>
|
|
<P>
|
|
Squid keeps an in-memory bitmap of disk files that are
|
|
available for use, or are being used. The size of this
|
|
bitmap is determined at run time, based on two things:
|
|
the size of your cache, and the average (mean) cache object size.
|
|
|
|
The size of your cache is specified in squid.conf, on the
|
|
<em/cache_dir/ lines. The mean object size can also
|
|
be specified in squid.conf, with the 'store_avg_object_size'
|
|
directive. By default, Squid uses 13 Kbytes as the average size.
|
|
|
|
<P>
|
|
When allocating the bitmaps, Squid allocates this many bits:
|
|
<verb>
|
|
2 * cache_size / store_avg_object_size
|
|
</verb>
|
|
|
|
So, if you exactly specify the correct average object size,
|
|
Squid should have 50% filemap bits free when the cache is full.
|
|
You can see how many filemap bits are being used by looking
|
|
at the 'storedir' cache manager page. It looks like this:
|
|
|
|
<verb>
|
|
Store Directory #0: /usr/local/squid/cache
|
|
First level subdirectories: 4
|
|
Second level subdirectories: 4
|
|
Maximum Size: 1024000 KB
|
|
Current Size: 924837 KB
|
|
Percent Used: 90.32%
|
|
Filemap bits in use: 77308 of 157538 (49%)
|
|
Flags:
|
|
</verb>
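
<P>
As a worked example of the formula, here is a small Python sketch. The
function name is hypothetical and the use of integer division is an
assumption; with a 1024000 KB cache and the default 13 KB average object
size it gives the 157538 bits shown in the sample output above.
<verb>
def filemap_bits(cache_size_kb, avg_object_size_kb=13):
    """Bits allocated: 2 * cache_size / store_avg_object_size."""
    return 2 * cache_size_kb // avg_object_size_kb


bits = filemap_bits(1024000)
in_use = 77308
print(bits, "bits allocated;", round(100 * in_use / bits), "percent in use")
</verb>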
|
|
|
|
<P>
|
|
Now, if you see the ``You've run out of swap file numbers'' message,
|
|
then it means one of two things:
|
|
<enum>
|
|
<item>
|
|
You've found a Squid bug.
|
|
<item>
|
|
Your cache's average file size is much smaller
|
|
than the 'store_avg_object_size' value.
|
|
</enum>
|
|
|
|
To check the average file size of objects currently in your
|
|
cache, look at the cache manager 'info' page, and you will
|
|
find a line like:
|
|
<verb>
|
|
Mean Object Size: 11.96 KB
|
|
</verb>
|
|
|
|
<P>
|
|
To make the warning message go away, set 'store_avg_object_size'
|
|
to that value (or lower) and then restart Squid.
|
|
|
|
<sect1>I am using up over 95% of the filemap bits?!!
|
|
<p>
|
|
<em>Note: The information here is current for version 2.3</em>
|
|
<p>
|
|
Calm down, this is now normal. Squid now dynamically allocates
|
|
filemap bits based on the number of objects in your cache.
|
|
You won't run out of them, we promise.
|
|
|
|
|
|
<sect1>FATAL: Cannot open /usr/local/squid/logs/access.log: (13) Permission denied
|
|
<p>
|
|
In Unix, things like <em/processes/ and <em/files/ have an <em/owner/.
|
|
For Squid, the process owner and file owner should be the same. If they
|
|
are not the same, you may get messages like ``permission denied.''
|
|
<p>
|
|
To find out who owns a file, use the <em/ls -l/ command:
|
|
<verb>
|
|
% ls -l /usr/local/squid/logs/access.log
|
|
</verb>
|
|
|
|
<p>
|
|
A process is normally owned by the user who starts it. However,
|
|
Unix sometimes allows a process to change its owner. If you
|
|
specified a value for the <em/cache_effective_user/
|
|
option in <em/squid.conf/, then that will be the process owner.
|
|
The files must be owned by this same userid.
|
|
|
|
<p>
|
|
If all this is confusing, then you probably should not be
|
|
running Squid until you learn some more about Unix.
|
|
As a reference, I suggest <url url="http://www.oreilly.com/catalog/lunix4/"
|
|
name="Learning the UNIX Operating System, 4th Edition">.
|
|
|
|
<sect1>When using a username and password, I can not access some files.
|
|
|
|
<P>
|
|
<it>If I try by way of a test, to access</it>
|
|
<verb>
|
|
ftp://username:password@ftpserver/somewhere/foo.tar.gz
|
|
</verb>
|
|
<it>I get</it>
|
|
<verb>
|
|
somewhere/foo.tar.gz: Not a directory.
|
|
</verb>
|
|
|
|
<P>
|
|
Use this URL instead:
|
|
<verb>
|
|
ftp://username:password@ftpserver/%2fsomewhere/foo.tar.gz
|
|
</verb>
|
|
|
|
<sect1>pingerOpen: icmp_sock: (13) Permission denied
|
|
<P>
|
|
This means your <em/pinger/ program does not have root privileges.
|
|
You should either do this:
|
|
<verb>
|
|
% su
|
|
# make install-pinger
|
|
</verb>
|
|
or
|
|
<verb>
|
|
# chown root /usr/local/squid/bin/pinger
|
|
# chmod 4755 /usr/local/squid/bin/pinger
|
|
</verb>
|
|
|
|
<sect1>What is a forwarding loop?
|
|
<P>
|
|
A forwarding loop is when a request passes through one proxy more than
|
|
once. You can get a forwarding loop if
|
|
<itemize>
|
|
<item>a cache forwards requests to itself. This might happen with
|
|
transparent caching (or server acceleration) configurations.
|
|
<item>a pair or group of caches forward requests to each other. This can
|
|
happen when Squid uses ICP, Cache Digests, or the ICMP RTT database
|
|
to select a next-hop cache.
|
|
</itemize>
|
|
|
|
<P>
|
|
Forwarding loops are detected by examining the <em/Via/ request header.
|
|
Each cache which "touches" a request must add its hostname to the
|
|
<em/Via/ header. If a cache notices its own hostname in this header
|
|
for an incoming request, it knows there is a forwarding loop somewhere.
|
|
<p>
|
|
NOTE:
|
|
Squid may report a forwarding loop if a request goes through
|
|
two caches that have the same <em/visible_hostname/ value.
|
|
If you want to have multiple machines with the same
|
|
<em/visible_hostname/ then you must give each machine a different
|
|
<em/unique_hostname/ so that forwarding loops are correctly detected.
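
<P>
The detection idea can be sketched in a few lines of Python. This is only an
illustration, not Squid's actual code: the function name, the example
hostnames, and the simplified parsing of the <em/Via/ header (comma-separated
entries, each starting with a protocol version followed by a host) are
assumptions made for the example.
<verb>
def is_forwarding_loop(via_header, my_hostname):
    """Return True if my_hostname already appears in a Via header value."""
    for entry in via_header.split(","):
        fields = entry.split()
        # fields[0] is the protocol version, fields[1] the host or host:port
        if len(fields) >= 2:
            host = fields[1].split(":")[0]
            if host.lower() == my_hostname.lower():
                return True
    return False


via = "1.0 cache-a.example.com:3128 (Squid), 1.0 cache-b.example.com:3128 (Squid)"
print(is_forwarding_loop(via, "cache-b.example.com"))
print(is_forwarding_loop(via, "cache-c.example.com"))
</verb>
The sketch also shows why two caches sharing one hostname trigger a false
alarm: the second cache finds ``its own'' name in the header even though the
request has not been there before.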
|
|
|
|
<P>
|
|
When Squid detects a forwarding loop, it is logged to the <em/cache.log/
|
|
file with the received <em/Via/ header. From this header you can determine
|
|
which cache (the last in the list) forwarded the request to you.
|
|
|
|
<P>
|
|
One way to reduce forwarding loops is to change a <em/parent/
|
|
relationship to a <em/sibling/ relationship.
|
|
|
|
<P>
|
|
Another way is to use <em/cache_peer_access/ rules. For example:
|
|
<verb>
|
|
# Our parent caches
|
|
cache_peer A.example.com parent 3128 3130
|
|
cache_peer B.example.com parent 3128 3130
|
|
cache_peer C.example.com parent 3128 3130
|
|
|
|
# An ACL list
|
|
acl PEERS src A.example.com
|
|
acl PEERS src B.example.com
|
|
acl PEERS src C.example.com
|
|
|
|
# Prevent forwarding loops
|
|
cache_peer_access A.example.com allow !PEERS
|
|
cache_peer_access B.example.com allow !PEERS
|
|
cache_peer_access C.example.com allow !PEERS
|
|
</verb>
|
|
The above configuration instructs squid to NOT forward a request
|
|
to parents A, B, or C when a request is received from any one
|
|
of those caches.
|
|
|
|
<sect1>accept failure: (71) Protocol error
|
|
<P>
|
|
This error message is seen mostly on Solaris systems.
|
|
<url url="mailto:mtk@ny.ubs.com" name="Mark Kennedy">
|
|
gives a great explanation:
|
|
<quote>
|
|
Error 71 [EPROTO] is an obscure way of reporting that clients made it onto your
|
|
server's TCP incoming connection queue but the client tore down the
|
|
connection before the server could accept it. I.e. your server ignored
|
|
its clients for too long. We've seen this happen when we ran out of
|
|
file descriptors. I guess it could also happen if something made squid
|
|
block for a long time.
|
|
</quote>
|
|
|
|
<sect1>storeSwapInFileOpened: ... Size mismatch
|
|
<P>
|
|
<it>
|
|
Got these messages in my cache log - I guess it means that the index
|
|
contents do not match the contents on disk.
|
|
</it>
|
|
<verb>
|
|
1998/09/23 09:31:30| storeSwapInFileOpened: /var/cache/00/00/00000015: Size mismatch: 776(fstat) != 3785(object)
|
|
1998/09/23 09:31:31| storeSwapInFileOpened: /var/cache/00/00/00000017: Size mismatch: 2571(fstat) != 4159(object)
|
|
</verb>
|
|
|
|
<P>
|
|
<it>
|
|
What does Squid do in this case?
|
|
</it>
|
|
|
|
<P>
|
|
NOTE, these messages are specific to Squid-2. These happen when Squid
|
|
reads an object from disk for a cache hit. After it opens the file,
|
|
Squid checks to see if the size is what it expects it should be. If the
|
|
size doesn't match, the error is printed. In this case, Squid does not
|
|
send the wrong object to the client. It will re-fetch the object from
|
|
the source.
|
|
|
|
<sect1>Why do I get <em>fwdDispatch: Cannot retrieve 'https://www.buy.com/corp/ordertracking.asp'</em>
|
|
<P>
|
|
These messages are caused by buggy clients, mostly Netscape Navigator.
|
|
What happens is, Netscape sends an HTTPS/SSL request over a persistent HTTP connection.
|
|
Normally, when Squid gets an SSL request, it looks like this:
|
|
<verb>
|
|
CONNECT www.buy.com:443 HTTP/1.0
|
|
</verb>
|
|
Then Squid opens a TCP connection to the destination host and port, and
|
|
the <em/real/ request is sent encrypted over this connection. That's the
|
|
whole point of SSL, that all of the information must be sent encrypted.
|
|
|
|
<P>
|
|
With this client bug, however, Squid receives a request like this:
|
|
<verb>
|
|
GET https://www.buy.com/corp/ordertracking.asp HTTP/1.0
|
|
Accept: */*
|
|
User-agent: Netscape ...
|
|
...
|
|
</verb>
|
|
Now, all of the headers, and the message body have been sent, <em/unencrypted/
|
|
to Squid. There is no way for Squid to somehow turn this into an SSL request.
|
|
The only thing we can do is return the error message.
|
|
|
|
<P>
|
|
Note, this browser bug does represent a security risk because the browser
|
|
is sending sensitive information unencrypted over the network.
|
|
|
|
<sect1>Squid can't access URLs like http://3626046468/ab2/cybercards/moreinfo.html
|
|
<p>
|
|
by Dave J Woolley (DJW at bts dot co dot uk)
|
|
<p>
|
|
These are illegal URLs, generally only used by illegal sites;
|
|
typically the web site that supports a spammer and is expected to
|
|
survive a few hours longer than the spamming account.
|
|
<p>
|
|
Their intention is to:
|
|
<itemize>
|
|
<item>
|
|
confuse content filtering rules on proxies, and possibly
|
|
some browsers' idea of whether they are trusted sites on
|
|
the local intranet;
|
|
<item>
|
|
confuse whois (?);
|
|
<item>
|
|
make people think they are not IP addresses and unknown
|
|
domain names, in an attempt to stop them trying to locate
|
|
and complain to the ISP.
|
|
</itemize>
|
|
<p>
|
|
Any browser or proxy that works with them should be considered a
|
|
security risk.
|
|
<p>
|
|
<url url="http://www.ietf.org/rfc/rfc1738.txt" name="RFC 1738">
|
|
has this to say about the hostname part of a URL:
|
|
<quote>
|
|
The fully qualified domain name of a network host, or its IP
|
|
address as a set of four decimal digit groups separated by
|
|
".". Fully qualified domain names take the form as described
|
|
in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123
|
|
[5]: a sequence of domain labels separated by ".", each domain
|
|
label starting and ending with an alphanumerical character and
|
|
possibly also containing "-" characters. The rightmost domain
|
|
label will never start with a digit, though, which
|
|
syntactically distinguishes all domain names from the IP
|
|
addresses.
|
|
</quote>
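
<P>
If you are curious which address such a URL hides, the number is simply a
32-bit IPv4 address written in decimal. A minimal Python sketch, using the
standard <em/ipaddress/ module (the function name is made up for the
example):
<verb>
import ipaddress


def decode_numeric_host(host):
    """Convert a purely numeric hostname into a dotted-quad IPv4 address."""
    return str(ipaddress.IPv4Address(int(host)))


print(decode_numeric_host("3626046468"))
</verb>
For the URL in the question this prints 216.33.20.4.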
|
|
|
|
<sect1>I get a lot of ``URI has whitespace'' error messages in my cache log, what should I do?
|
|
|
|
<p>
|
|
Whitespace characters (space, tab, newline, carriage return) are
|
|
not allowed in URIs and URLs. Unfortunately, a number of Web services
generate URLs with whitespace. Of course your favorite browser silently
accommodates these bad URLs. The servers (or people) that generate
these URLs are in violation of Internet standards. The whitespace
|
|
characters should be encoded.
|
|
|
|
<P>
|
|
If you want Squid to accept URLs with whitespace, you have to
|
|
decide how to handle them. There are four choices that you
|
|
can set with the <em/uri_whitespace/ option:
|
|
<enum>
|
|
<item>
|
|
DENY:
|
|
The request is denied with an ``Invalid Request'' message.
|
|
This is the default.
|
|
<item>
|
|
ALLOW:
|
|
The request is allowed and the URL remains unchanged.
|
|
<item>
|
|
ENCODE:
|
|
The whitespace characters are encoded according to
|
|
<url url="http://www.ietf.org/rfc/rfc1738.txt"
|
|
name="RFC 1738">. This can be considered a violation
|
|
of the HTTP specification.
|
|
<item>
|
|
CHOP:
|
|
The URL is chopped at the first whitespace character
|
|
and then processed normally. This also can be considered
|
|
a violation of HTTP.
|
|
</enum>
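<p>
The effect of the four choices on a single URL can be modelled roughly
as follows. This is a Python sketch of the behaviour described above,
not Squid's own code; the function name <tt/apply_uri_whitespace/ and
the example URL are assumptions made for illustration.
<verb>
from urllib.parse import quote

WHITESPACE = " \t\r\n"


def apply_uri_whitespace(url, policy):
    """Model the four uri_whitespace choices; returns None when denied."""
    if not any(ch in WHITESPACE for ch in url):
        return url  # nothing to do for a well-formed URL
    if policy == "deny":
        return None
    if policy == "allow":
        return url
    if policy == "encode":
        # Percent-encode only the whitespace characters.
        return "".join(quote(ch) if ch in WHITESPACE else ch for ch in url)
    if policy == "chop":
        # Keep everything before the first whitespace character.
        cut = min(i for i, ch in enumerate(url) if ch in WHITESPACE)
        return url[:cut]
    raise ValueError("unknown policy: " + policy)


for policy in ("deny", "allow", "encode", "chop"):
    print(policy, apply_uri_whitespace("http://example.com/a b.html", policy))
</verb>
Note how CHOP silently requests a different object than the one the
client asked for, which is why it should be used with care.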
|
|
|
|
<sect1>commBind: Cannot bind socket FD 5 to 127.0.0.1:0: (49) Can't assign requested address
|
|
<label id="comm-bind-loopback-fail">
|
|
<p>
|
|
This likely means that your system does not have a loopback network device, or
|
|
that device is not properly configured.
|
|
All Unix systems should have a loopback network device, typically named <em/lo0/ (or <em/lo/ on Linux), and it should
|
|
be configured with the address 127.0.0.1. If not, you may get the above
|
|
error message.
|
|
To check your system, run:
|
|
<verb>
|
|
% ifconfig lo0
|
|
</verb>
|
|
The result should look something like:
|
|
<verb>
|
|
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
|
|
inet 127.0.0.1 netmask 0xff000000
|
|
</verb>
|
|
|
|
<p>
|
|
If you use FreeBSD, see <ref id="freebsd-no-lo0" name="this">.
|
|
|
|
<sect1>Unknown cache_dir type '/var/squid/cache'
|
|
<p>
|
|
The format of the <em/cache_dir/ option changed with version
|
|
2.3. It now takes a <em/type/ argument. All you need to do
|
|
is insert <tt/ufs/ in the line, like this:
|
|
<verb>
|
|
cache_dir ufs /var/squid/cache ...
|
|
</verb>
|
|
|
|
<sect1>unrecognized: 'cache_dns_program /usr/local/squid/bin/dnsserver'
|
|
<p>
|
|
As of Squid 2.3, the default is to use internal DNS lookup code.
|
|
The <em/cache_dns_program/ and <em/dns_children/ options are not
|
|
known squid.conf directives in this case. Simply comment out
|
|
these two options.
|
|
<p>
|
|
If you want to use external DNS lookups, with the <em/dnsserver/
|
|
program, then add this to your configure command:
|
|
<verb>
|
|
--disable-internal-dns
|
|
</verb>
|
|
|
|
<sect1>Is <em/dns_defnames/ broken in 2.3.STABLE1 and STABLE2?
|
|
<p>
|
|
Sort of. As of Squid 2.3, the default is to use internal DNS lookup code.
|
|
The <em/dns_defnames/ option is only used with the external <em/dnsserver/
|
|
processes. If you relied on <em/dns_defnames/ before, you have three choices:
|
|
<enum>
|
|
<item>
|
|
See if the <em/append_domain/ option will work for you instead.
|
|
<item>
|
|
Configure squid with --disable-internal-dns to use the external
|
|
dnsservers.
|
|
<item>
|
|
Enhance <em>src/dns_internal.c</em> to understand the <tt/search/
|
|
and <tt/domain/ lines from <em>/etc/resolv.conf</em>.
|
|
</enum>
|
|
|
|
<sect1>What does <em>sslReadClient: FD 14: read failure: (104) Connection reset by peer</em> mean?
|
|
<p>
|
|
``Connection reset by peer'' is an error code that Unix operating systems
|
|
sometimes return for <em/read/, <em/write/, <em/connect/, and other
|
|
system calls.
|
|
<p>
|
|
Connection reset means that the other host, the peer, sent us a RESET
|
|
packet on a TCP connection. A host sends a RESET when it receives
|
|
an unexpected packet for a nonexistent connection. For example, if
|
|
one side sends data at the same time that the other side closes
|
|
a connection, when the other side receives the data it may send
|
|
a reset back.
|
|
<p>
|
|
The fact that these messages appear in Squid's log might indicate
|
|
a problem, such as a broken origin server or parent cache. On
|
|
the other hand, they might be ``normal,'' especially since
|
|
some applications are known to force connection resets rather
|
|
than a proper close.
|
|
<p>
|
|
You probably don't need to worry about them, unless you receive
|
|
a lot of user complaints relating to SSL sites.
|
|
<p>
|
|
<url url="raj at cup dot hp dot com" name="Rick Jones"> notes that
|
|
if the server is running a Microsoft TCP stack, clients
|
|
receive RST segments whenever the listen queue overflows. In other words,
|
|
if the server is really busy, new connections receive the reset message.
|
|
This is contrary to rational behaviour, but is unlikely to change.
|
|
|
|
|
|
<sect1>What does <em>Connection refused</em> mean?
|
|
<p>
|
|
This is an error message, generated by your operating system,
|
|
in response to a <em/connect()/ system call. It happens when
|
|
there is no server at the other end listening on the port number
|
|
that we tried to connect to.
|
|
<p>
|
|
It's quite easy to generate this error on your own. Simply
|
|
telnet to a random, high numbered port:
|
|
<verb>
|
|
% telnet localhost 12345
|
|
Trying 127.0.0.1...
|
|
telnet: Unable to connect to remote host: Connection refused
|
|
</verb>
|
|
It happens because there is no server listening for connections
|
|
on port 12345.
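<p>
The same experiment can be scripted. The Python sketch below is an
illustration only: the helper names <tt/probe/ and
<tt/unused_local_port/ are made up, and the result assumes a working
loopback interface with nothing listening on the chosen port.
<verb>
import errno
import socket


def probe(host, port, timeout=2.0):
    """Try a TCP connect and return the errno name, or "OK" on success."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "OK"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc))


def unused_local_port():
    """Ask the kernel for a free port, then release it again."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


print(probe("127.0.0.1", unused_local_port()))
</verb>
On a typical Unix system this reports ECONNREFUSED, the error code
behind the ``Connection refused'' message.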
|
|
<p>
|
|
When you see this in response to a URL request, it probably means
|
|
the origin server web site is temporarily down. It may also mean
|
|
that your parent cache is down, if you have one.
|
|
|
|
<sect1>squid: ERROR: no running copy
|
|
<p>
|
|
You may get this message when you run commands like <tt/squid -krotate/.
|
|
<p>
|
|
This error message usually means that the <em/squid.pid/ file is
|
|
missing. Since the PID file is normally present when squid is running,
|
|
the absence of the PID file usually means Squid is not running.
|
|
If you accidentally delete the PID file, Squid will continue running, and
|
|
you won't be able to send it any signals.
|
|
<p>
|
|
If you accidentally removed the PID file, there are two ways to get it back.
|
|
<enum>
|
|
<item>run <tt/ps/ and find the Squid process id. You'll probably see
|
|
two processes, like this:
|
|
<verb>
|
|
bender-wessels % ps ax | grep squid
|
|
83617 ?? Ss 0:00.00 squid -s
|
|
83619 ?? S 0:00.48 (squid) -s (squid)
|
|
</verb>
|
|
You want the second process id, 83619 in this case. Create the PID file and put the
|
|
process id number there. For example:
|
|
<verb>
|
|
echo 83619 > /usr/local/squid/logs/squid.pid
|
|
</verb>
|
|
<item>
|
|
Use the above technique to find the Squid process id. Send the process a HUP
|
|
signal, which is the same as <tt/squid -kreconfigure/:
|
|
<verb>
|
|
kill -HUP 83619
|
|
</verb>
|
|
The reconfigure process creates a new PID file automatically.
|
|
</enum>
|
|
|
|
<sect1>FATAL: getgrnam failed to find groupid for effective group 'nogroup'
|
|
<p>
|
|
You are probably starting Squid as root. Squid is looking for an
unprivileged group to run as. The default is <em/nogroup/, but this
may not be defined on your system. You need to edit <em/squid.conf/
and set <em/cache_effective_group/ to the name of an unprivileged group
from <em>/etc/group</em>. There is a good chance that <em/nobody/
|
|
will work for you.
|
|
|
|
<sect1>``Unsupported Request Method and Protocol'' for <em/https/ URLs.
|
|
<p>
|
|
<em>Note: The information here is current for version 2.3.</em>
|
|
<p>
|
|
This is correct. Squid does not know what to do with an <em/https/
|
|
URL. To handle such a URL, Squid would need to speak the SSL
|
|
protocol. Unfortunately, it does not (yet).
|
|
<p>
|
|
Normally, when you type an <em/https/ URL into your browser, one of
|
|
two things happens.
|
|
<enum>
|
|
<item>The browser opens an SSL connection directly to the origin
|
|
server.
|
|
<item>The browser tunnels the request through Squid with the
|
|
<em/CONNECT/ request method.
|
|
</enum>
|
|
<p>
|
|
The <em/CONNECT/ method is a way to tunnel any kind of
|
|
connection through an HTTP proxy. The proxy doesn't
|
|
understand or interpret the contents. It just passes
|
|
bytes back and forth between the client and server.
|
|
For the gory details on tunnelling and the CONNECT
|
|
method, please see
|
|
<url url="ftp://ftp.isi.edu/in-notes/rfc2817.txt" name="RFC 2817">
|
|
and <url url="http://www.web-cache.com/Writings/Internet-Drafts/draft-luotonen-web-proxy-tunneling-01.txt"
|
|
name="Tunneling TCP based protocols through Web proxy servers"> (expired).
|
|
|
|
<sect1>Squid uses 100% CPU
|
|
<p>
|
|
There may be many causes for this.
|
|
<p>
|
|
Andrew Doroshenko reports that removing <em>/dev/null</em>, or
|
|
mounting a filesystem with the <em>nodev</em> option, can cause
|
|
Squid to use 100% of CPU. His suggested solution is to ``touch /dev/null.''
|
|
|
|
<sect1>Webmin's <em/cachemgr.cgi/ crashes the operating system
|
|
<p>
|
|
Mikael Andersson reports that clicking on Webmin's <em/cachemgr.cgi/
|
|
link creates numerous instances of <em/cachemgr.cgi/ that quickly
|
|
consume all available memory and bring the system to its knees.
|
|
<p>
|
|
Changing the path to use Squid's own <em/cachemgr.cgi/ fixes
|
|
this problem. You can change the path by logging into the
|
|
Webmin GUI and selecting <em/Servers/, then <em/Squid Proxy Cache/.
|
|
Next select <em/Module Config/. From here you'll be
|
|
able to enter the pathname to the <em/cachemgr.cgi/ that came
|
|
with Squid.
|
|
|
|
<sect1>Segment Violation at startup or upon first request
|
|
|
|
<p>
|
|
Some versions of GCC (notably 2.95.1 through 2.95.3) have bugs
|
|
with compiler optimization. These GCC bugs may cause NULL pointer
|
|
accesses in Squid, resulting in a ``FATAL: Received Segment
|
|
Violation...dying'' message and a core dump.
|
|
<p>
|
|
You can work around these GCC bugs by disabling compiler
|
|
optimization. The best way to do that is start with a clean
|
|
source tree and set the CC options specifically:
|
|
<verb>
|
|
% cd squid-x.y
|
|
% make distclean
|
|
% setenv CFLAGS '-g -Wall'
|
|
% ./configure ...
|
|
</verb>
|
|
<p>
|
|
To check that you did it right, you can search for AC_CFLAGS in
|
|
<em>src/Makefile</em>:
|
|
<verb>
|
|
% grep AC_CFLAGS src/Makefile
|
|
AC_CFLAGS = -g -Wall
|
|
</verb>
|
|
Now when you recompile, GCC won't try to optimize anything:
|
|
<verb>
|
|
% make
|
|
Making all in lib...
|
|
gcc -g -Wall -I../include -I../include -c rfc1123.c
|
|
...etc...
|
|
</verb>
|
|
<p>
|
|
NOTE: some people worry that disabling compiler optimization will
|
|
negatively impact Squid's performance. The impact should be
|
|
negligible, unless your cache is really busy and already runs
|
|
at a high CPU usage. For most people, the compiler optimization
|
|
makes little or no difference at all.
|
|
|
|
<sect1>urlParse: Illegal character in hostname 'proxy.mydomain.com:8080proxy.mydomain.com'
|
|
<p>
|
|
By Yomler of fnac.net
|
|
<p>
|
|
A combination of a bad configuration of Internet Explorer and any
application which uses the Cydoor DLLs will produce this entry in the log.
|
|
See <url url="http://www.cydoor.com/" name="cydoor.com"> for a complete list.
|
|
<p>
|
|
The bad configuration of IE is the use of an active configuration script
(proxy.pac) together with manual proxy settings that are still filled in,
whether or not they are active. IE will only use the proxy.pac. Cydoor
applications will use both and will generate the errors.
|
|
<p>
|
|
Disabling the old proxy settings in IE is not enough; you should delete
them completely and, for example, use only the proxy.pac.
|
|
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>How does Squid work?
|
|
|
|
<sect1>What are cachable objects?
|
|
<P>
|
|
An Internet Object is a file, document or response to a query for
|
|
an Internet service such as FTP, HTTP, or gopher. A client requests
|
|
an Internet object from a caching proxy; if the object
|
|
is not already cached, the proxy server fetches
|
|
the object (either from the host specified in the URL or from a
|
|
parent or sibling cache) and delivers it to the client.
|
|
|
|
<sect1>What is the ICP protocol?
|
|
<label id="what-is-icp">
|
|
<P>
|
|
ICP is a protocol used for communication among Squid caches.
ICP is defined in two Internet RFCs.
|
|
<url url="http://www.ircache.net/Cache/ICP/rfc2186.txt"
|
|
name="RFC 2186">
|
|
describes the protocol itself, while
|
|
<url url="http://www.ircache.net/Cache/ICP/rfc2187.txt"
|
|
name="RFC 2187">
|
|
describes the application of ICP to hierarchical Web caching.
|
|
|
|
<P>
|
|
ICP is primarily used within a cache hierarchy to locate specific
|
|
objects in sibling caches. If a squid cache does not have a
|
|
requested document, it sends an ICP query to its siblings, and the
|
|
siblings respond with ICP replies indicating a ``HIT'' or a ``MISS.''
|
|
The cache then uses the replies to choose from which cache to
|
|
resolve its own MISS.
|
|
|
|
<P>
|
|
ICP also supports multiplexed transmission of multiple object
|
|
streams over a single TCP connection. ICP is currently implemented
|
|
on top of UDP. Current versions of Squid also support ICP via
|
|
multicast.
|
|
|
|
<sect1>What is the <em/dnsserver/?
|
|
<P>
|
|
The <em/dnsserver/ is a process forked by <em/squid/ to
|
|
resolve IP addresses from domain names. This is necessary because
|
|
the <tt>gethostbyname(3)</tt> function blocks the calling process
|
|
until the DNS query is completed.
|
|
<P>
|
|
Squid must use non-blocking I/O at all times, so DNS lookups are
|
|
implemented external to the main process. The <em/dnsserver/
|
|
processes do not cache DNS lookups; caching is implemented inside the
|
|
<em/squid/ process.
|
|
<P>
|
|
|
|
<sect1>What is the <em/ftpget/ program for?
|
|
<P>
|
|
<em/ftpget/ exists only in Squid 1.1 and Squid 1.0 versions.
|
|
<P>
|
|
The <em/ftpget/ program is an FTP client used for retrieving
|
|
files from FTP servers. Because the FTP protocol is complicated,
|
|
it is easier to implement it separately from the main <em/squid/
|
|
code.
|
|
<P>
|
|
|
|
<sect1>FTP PUT's don't work!
|
|
<P>
|
|
FTP PUT should work with Squid-2.0 and later versions. If you
|
|
are using Squid-1.1, then you need to upgrade before PUT will work.
|
|
|
|
<sect1>What is a cache hierarchy? What are parents and siblings?
|
|
<P>
|
|
|
|
A cache hierarchy is a collection of caching proxy servers organized
|
|
in a logical parent/child and sibling arrangement so that caches
|
|
closest to Internet gateways (closest to the backbone transit
|
|
entry-points) act as parents to caches at locations farther from
|
|
the backbone. The parent caches resolve ``misses'' for their children.
|
|
In other words, when a cache requests an object from its parent,
|
|
and the parent does not have the object in its cache, the parent
|
|
fetches the object, caches it, and delivers it to the child. This
|
|
ensures that the hierarchy achieves the maximum reduction in
|
|
bandwidth utilization on the backbone transit links, helps reduce
|
|
load on Internet information servers outside the network served by
|
|
the hierarchy, and builds a rich cache on the parents so that the
|
|
other child caches in the hierarchy will obtain better ``hit'' rates
|
|
against their parents.
|
|
|
|
<P>
|
|
In addition to the parent-child relationships, squid supports the
|
|
notion of siblings: caches at the same level in the hierarchy,
|
|
provided to distribute cache server load. Each cache in the
|
|
hierarchy independently decides whether to fetch the reference from
|
|
the object's home site or from parent or sibling caches, using a
|
|
simple resolution protocol. Siblings will not fetch an object
|
|
for another sibling to resolve a cache ``miss.''
|
|
|
|
<sect1>What is the Squid cache resolution algorithm?
|
|
<P>
|
|
|
|
<itemize>
|
|
<item>Send ICP queries to all appropriate siblings
|
|
<item>Wait for all replies to arrive with a configurable timeout
|
|
(the default is two seconds).
|
|
<item>Begin fetching the object upon receipt of the first HIT reply,
|
|
or
|
|
<item>Fetch the object from the first parent which replied with MISS
|
|
(subject to weighting values), or
|
|
<item>Fetch the object from the source
|
|
</itemize>
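<p>
The selection step can be sketched in a few lines of Python. This is a
simplified model of the list above, not Squid's implementation: it
ignores weighting values and firewalls, and the function name
<tt/choose_source/ and the peer names are invented for the example.
<verb>
def choose_source(replies, parents):
    """Pick where to fetch from, given ICP replies received before timeout.

    replies: list of (peer_name, "HIT" or "MISS") in arrival order.
    parents: set of peer names that are parents (the rest are siblings).
    """
    for peer, answer in replies:
        if answer == "HIT":
            return peer  # first HIT wins, from a parent or a sibling
    for peer, answer in replies:
        if answer == "MISS" and peer in parents:
            return peer  # first parent that answered MISS
    return "DIRECT"  # nobody useful answered: go to the source


replies = [("sib1", "MISS"), ("par1", "MISS"), ("sib2", "HIT")]
print(choose_source(replies, {"par1"}))
</verb>
In this model a sibling is only ever chosen when it reports a HIT,
while a parent may also be chosen on a MISS.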
|
|
|
|
<P>
|
|
The algorithm is somewhat more complicated when firewalls
|
|
are involved.
|
|
|
|
<P>
|
|
The <tt/single_parent_bypass/ directive can be used to skip
|
|
the ICP queries if the only appropriate sibling is a parent cache
|
|
(i.e., if there's only one place you'd fetch the object from, why
|
|
bother querying?)
|
|
|
|
<sect1>What features are Squid developers currently working on?
|
|
<P>
|
|
|
|
There are several open issues for the caching project, namely
|
|
more automatic load balancing and (both configured and
|
|
dynamic) selection of parents, routing, multicast
|
|
cache-to-cache communication, and better recognition of URLs
|
|
that are not worth caching.
|
|
<P>
|
|
For our other to-do list items, please
|
|
see our ``TODO'' file in the recent source distributions.
|
|
|
|
<P>
|
|
Prospective developers should review the resources available at the
|
|
<url url="http://www.squid-cache.org/Devel/"
|
|
name="Squid developers corner">.
|
|
|
|
<sect1>Tell me more about Internet traffic workloads
|
|
<P>
|
|
|
|
Workload can be characterized as the burden a client or
|
|
group of clients imposes on a system. Understanding the
|
|
nature of workloads is important to managing system
|
|
capacity.
|
|
|
|
If you are interested in Internet traffic workloads then NLANR's
|
|
<url url="http://www.nlanr.net/NA/"
|
|
name="Network Analysis activities"> is a good place to start.
|
|
|
|
<sect1>What are the tradeoffs of caching with the NLANR cache system?
|
|
<P>
|
|
|
|
The NLANR root caches are at the NSF supercomputer centers (SCCs),
|
|
which are interconnected via NSF's high speed backbone service
|
|
(vBNS). So inter-cache communication between the NLANR root caches
|
|
does not cross the Internet.
|
|
|
|
<P>
|
|
The benefits of hierarchical caching (namely, reduced network
|
|
bandwidth consumption, reduced access latency, and improved
|
|
resiliency) come at a price. Caches higher in the hierarchy must
|
|
field the misses of their descendants. If the equilibrium hit rate
|
|
of a leaf cache is 50%, half of all leaf references have to be
|
|
resolved through a second level cache rather than directly from
|
|
the object's source. If this second level cache has most of the
|
|
documents, it is usually still a win, but if higher level caches
|
|
often don't have the document, or become overloaded, then they
|
|
could actually increase access latency, rather than reduce it.
|
|
<P>
|
|
|
|
<sect1>Where can I find out more about firewalls?
|
|
|
|
<P>
|
|
Please see the
|
|
<url url="http://lists.gnac.net/firewalls/"
|
|
name="Firewalls mailing list and FAQ">
|
|
information site.
|
|
|
|
<sect1>What is the ``Storage LRU Expiration Age?''
|
|
<P>
|
|
For example:
|
|
<verb>
|
|
Storage LRU Expiration Age: 4.31 days
|
|
</verb>
|
|
|
|
<P>
|
|
The LRU expiration age is a dynamically-calculated value. Any objects
|
|
which have not been accessed for this amount of time will be removed from
|
|
the cache to make room for new, incoming objects. Another way of looking
|
|
at this is that it would
|
|
take your cache approximately this many days to go from empty to full at
|
|
your current traffic levels.
|
|
|
|
<P>
|
|
As your cache becomes busier, the LRU age becomes lower so that more
|
|
objects will be removed to make room for the new ones. Ideally, your
|
|
cache will have an LRU age value of at least 3 days. If the
|
|
LRU age is lower than 3 days, then your cache is probably not big enough
|
|
to handle the volume of requests it receives. By adding more disk space
|
|
you could increase your cache hit ratio.
|
|
|
|
<P>
|
|
The configuration parameter <em/reference_age/ places an upper limit on
|
|
your cache's LRU expiration age.
|
|
|
|
<sect1>What is ``Failure Ratio at 1.01; Going into hit-only-mode for 5 minutes''?
|
|
<P>
|
|
Consider a pair of caches named A and B. It may be the case that A can
|
|
reach B, and vice-versa, but B has poor reachability to the rest of the
|
|
Internet.
|
|
In this case, we would like B to recognize that it has poor reachability
|
|
and somehow convey this fact to its neighbor caches.
|
|
|
|
<P>
|
|
Squid will track the ratio of failed-to-successful requests over short
|
|
time periods. A failed request is one which is logged as ERR_DNS_FAIL, ERR_CONNECT_FAIL, or ERR_READ_ERROR. When the failed-to-successful ratio exceeds 1.0,
|
|
then Squid will return ICP_MISS_NOFETCH instead of ICP_MISS to neighbors.
|
|
Note, Squid will still return ICP_HIT for cache hits.
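<p>
The decision for cache misses can be summarised in a small Python
sketch. The function name <tt/icp_miss_opcode/ and the handling of the
``no successful requests'' case are assumptions made for illustration;
only the 1.0 threshold and the opcode names come from the text above.
<verb>
def icp_miss_opcode(failed, successful):
    """Return the ICP opcode name to send to a neighbor for a cache miss."""
    # Assumption: with no successful requests at all, any failure counts
    # as an infinitely bad ratio.
    if successful == 0:
        ratio = float("inf") if failed else 0.0
    else:
        ratio = failed / successful
    return "ICP_MISS_NOFETCH" if ratio > 1.0 else "ICP_MISS"


print(icp_miss_opcode(101, 100))  # ratio 1.01, as in the log message
print(icp_miss_opcode(50, 100))   # ratio 0.50
</verb>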
|
|
|
|
<sect1>Does squid periodically re-read its configuration file?
|
|
<P>
|
|
No, you must send a HUP signal to have Squid re-read its configuration file,
|
|
including access control lists. An easy way to do this is with the <em/-k/
|
|
command line option:
|
|
<verb>
|
|
squid -k reconfigure
|
|
</verb>
|
|
|
|
<sect1>How does <em/unlinkd/ work?
|
|
<P>
|
|
<em/unlinkd/ is an external process used for unlinking unused cache files.
|
|
Performing the unlink operation in an external process opens up some
|
|
race-condition problems for Squid. If we are not careful, the following
|
|
sequence of events could occur:
|
|
<enum>
|
|
<item>
|
|
An object with swap file number <bf/S/ is removed from the cache.
|
|
<item>
|
|
We want to unlink file <bf/F/ which corresponds to swap file number <bf/S/,
|
|
so we write pathname <bf/F/ to the <em/unlinkd/ socket.
|
|
We also mark <bf/S/ as available in the filemap.
|
|
<item>
|
|
We have a new object to swap out. It is allocated to the first available
|
|
file number, which happens to be <bf/S/. Squid opens file <bf/F/ for writing.
|
|
<item>
|
|
The <em/unlinkd/ process reads the request to unlink <bf/F/ and issues the
|
|
actual unlink call.
|
|
</enum>
|
|
<P>
|
|
So, the problem is, how can we guarantee that <em/unlinkd/ will not
|
|
remove a cache file that Squid has recently allocated to a new object?
|
|
The approach we have taken is to have Squid keep a stack of unused (but
|
|
not deleted!) swap file numbers. The stack size is hard-coded at 128
|
|
entries. We only give unlink requests to <em/unlinkd/ when the unused
|
|
file number stack is full. Thus, if we ever have to start unlinking
|
|
files, we have a pool of 128 file numbers to choose from which we know
|
|
will not be removed by <em/unlinkd/.
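<p>
The idea can be modelled with a bounded stack. The Python sketch below
is a simplified illustration, not Squid's C code: the class and method
names are invented, and a plain list stands in for the <em/unlinkd/
socket.
<verb>
STACK_SIZE = 128  # the hard-coded size mentioned above


class UnusedFilenoStack:
    def __init__(self):
        self.stack = []         # released but not yet deleted file numbers
        self.unlink_queue = []  # stands in for the unlinkd socket

    def put_unused(self, fileno):
        """Called when an object with this swap file number is released."""
        if len(self.stack) < STACK_SIZE:
            self.stack.append(fileno)
        else:
            # Stack is full: only now is a request handed to unlinkd.
            self.unlink_queue.append(fileno)

    def get_unused(self):
        """Return a reusable file number, or None if the stack is empty."""
        return self.stack.pop() if self.stack else None


pool = UnusedFilenoStack()
for fileno in range(130):
    pool.put_unused(fileno)
print(len(pool.stack), pool.unlink_queue)
</verb>
A file number taken from the stack was never handed to <em/unlinkd/,
so in this model the race described above cannot occur for it.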
|
|
|
|
<P>
|
|
In terms of implementation, the only way to send unlink requests to
|
|
the <em/unlinkd/ process is via the <em/storePutUnusedFileno/ function.
|
|
|
|
<P>
|
|
Unfortunately there are times when Squid can not use the <em/unlinkd/ process
|
|
but must call <em/unlink(2)/ directly. One of these times is when the cache
|
|
swap size is over the high water mark. If we push the released file numbers
|
|
onto the unused file number stack, and the stack is not full, then no files
|
|
will be deleted, and the actual disk usage will remain unchanged. So, when
|
|
we exceed the high water mark, we must call <em/unlink(2)/ directly.
|
|
|
|
<sect1>What is an icon URL?
|
|
|
|
<P>
|
|
One of the most unpleasant things Squid must do is generate HTML
|
|
pages of Gopher and FTP directory listings. For some strange
|
|
reason, people like to have little <em/icons/ next to each
|
|
listing entry, denoting the type of object to which the
|
|
link refers (image, text file, etc.).
|
|
|
|
<P>
|
|
In Squid 1.0 and 1.1, we used internal browser icons with names
|
|
like <em/gopher-internal-image/. Unfortunately, these were
|
|
not very portable. Not all browsers had internal icons, or
|
|
even used the same names. Perhaps only Netscape and Mosaic
|
|
used these names.
|
|
|
|
<P>
|
|
For Squid 2 we include a set of icons in the source distribution.
|
|
These icon files are loaded by Squid as cached objects at runtime.
|
|
Thus, every Squid cache now has its own icons to use in Gopher and FTP
|
|
listings. Just like other objects available on the web, we refer to
|
|
the icons with
|
|
<url url="http://info.internet.isi.edu/in-notes/rfc/files/rfc1738.txt"
|
|
name="Uniform Resource Locators">, or <em/URLs/.
|
|
|
|
<sect1>Can I make my regular FTP clients use a Squid cache?
|
|
|
|
<P>
|
|
Nope, it's not possible. Squid only accepts HTTP requests. It speaks
|
|
FTP on the <em/server-side/, but <bf/not/ on the <em/client-side/.
|
|
|
|
<P>
|
|
The very cool
|
|
<url url="ftp://gnjilux.cc.fer.hr/pub/unix/util/wget/"
|
|
name="wget">
|
|
will download FTP URLs via Squid (and probably any other proxy cache).
|
|
|
|
<sect1>Why is the select loop average time so high?
|
|
<P>
|
|
|
|
<it>
|
|
Is there any way to speed up the time spent dealing with select? Cachemgr
|
|
shows:
|
|
</it>
|
|
<verb>
|
|
Select loop called: 885025 times, 714.176 ms avg
|
|
</verb>
|
|
|
|
<P>
|
|
This number is NOT how much time it takes to handle file descriptor I/O.
|
|
We simply count the number of times select was called, and divide the
|
|
total process running time by the number of select calls.
|
|
|
|
<P>
|
|
This means, on average it takes your cache .714 seconds to check all
|
|
the open file descriptors once. But this also includes time select()
|
|
spends in a wait state when there is no I/O on any file descriptors.
|
|
My relatively idle workstation cache has similar numbers:
|
|
<verb>
|
|
Select loop called: 336782 times, 715.938 ms avg
|
|
</verb>
|
|
But my busy caches have much lower times:
|
|
<verb>
|
|
Select loop called: 16940436 times, 10.427 ms avg
|
|
Select loop called: 80524058 times, 10.030 ms avg
|
|
Select loop called: 10590369 times, 8.675 ms avg
|
|
Select loop called: 84319441 times, 9.578 ms avg
|
|
</verb>
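<p>
Because the average is just running time divided by the number of
calls, the formula can be inverted to recover how long each process had
been running. The Python sketch below is a worked example of that
arithmetic; the function names are made up for illustration.
<verb>
def select_loop_average_ms(running_seconds, select_calls):
    """Average as described above: total running time / number of calls."""
    return running_seconds * 1000.0 / select_calls


def implied_running_days(select_calls, average_ms):
    """Invert the formula to recover the process running time in days."""
    return select_calls * average_ms / 1000.0 / 86400.0


print(round(implied_running_days(885025, 714.176), 1))   # the cache in the question
print(round(implied_running_days(16940436, 10.427), 1))  # a busy cache
</verb>
The cache from the question had been running for roughly a week and the
busy one for about two days, yet the busy cache made many times more
select calls. A high average therefore points to an idle cache, not to
a slow one.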
|
|
|
|
<sect1>How does Squid deal with Cookies?
|
|
|
|
<P>
|
|
The presence of <em/Cookie/ headers in <bf/requests/ does not affect whether
or not an HTTP reply can be cached. Similarly, the presence of
|
|
<em/Set-Cookie/ headers in <bf/replies/ does not affect whether
|
|
the reply can be cached.
|
|
|
|
<P>
|
|
The proper way to deal with <em/Set-Cookie/ reply headers, according
|
|
to <url url="http://info.internet.isi.edu/in-notes/rfc/files/rfc2109.txt" name="RFC 2109">
|
|
is to cache the whole object, <em/EXCEPT/ the <em/Set-Cookie/ header lines.
|
|
|
|
|
|
<P>
|
|
With Squid-1.1, we can not filter out specific HTTP headers, so
|
|
Squid-1.1 does not cache any response which contains a <em/Set-Cookie/
|
|
header.
|
|
|
|
<P>
|
|
With Squid-2, however, we can filter out specific HTTP headers. But instead
|
|
of filtering them on the receiving-side, we filter them on the sending-side.
|
|
Thus, Squid-2 does cache replies with <em/Set-Cookie/ headers, but
|
|
it filters out the <em/Set-Cookie/ header itself for cache hits.
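<p>
Sending-side filtering amounts to dropping one header when a stored
reply is served. The following Python sketch illustrates the idea only;
the function name <tt/headers_for_cache_hit/ and the sample headers are
invented and do not reflect Squid's internal data structures.
<verb>
def headers_for_cache_hit(stored_headers):
    """Return the reply headers to send on a hit, minus Set-Cookie.

    stored_headers: list of (name, value) pairs as stored with the object.
    """
    return [(name, value) for name, value in stored_headers
            if name.lower() != "set-cookie"]


stored = [
    ("Content-Type", "text/html"),
    ("Set-Cookie", "session=abc123"),
    ("Last-Modified", "Mon, 01 Jan 2001 00:00:00 GMT"),
]
print(headers_for_cache_hit(stored))
</verb>
The first client still receives its cookie with the original reply;
later clients served from the cache do not receive someone else's.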
|
|
|
|
<sect1>How does Squid decide when to refresh a cached object?
|
|
|
|
<P>
|
|
When checking the object freshness, we calculate these values:
|
|
<itemize>
|
|
<item>
|
|
<em/OBJ_DATE/ is the time when the object was given out by the
|
|
origin server. This is taken from the HTTP Date reply header.
|
|
<item>
|
|
<em/OBJ_LASTMOD/ is the time when the object was last modified,
|
|
given by the HTTP Last-Modified reply header.
|
|
<item>
|
|
<em/OBJ_AGE/ is how much the object has aged <em/since/ it was retrieved:
|
|
<verb>
|
|
OBJ_AGE = NOW - OBJ_DATE
|
|
</verb>
|
|
<item>
|
|
<em/LM_AGE/ is how old the object was <em/when/ it was retrieved:
|
|
<verb>
|
|
LM_AGE = OBJ_DATE - OBJ_LASTMOD
|
|
</verb>
|
|
<item>
|
|
<em/LM_FACTOR/ is the ratio of <em/OBJ_AGE/ to <em/LM_AGE/:
|
|
<verb>
|
|
LM_FACTOR = OBJ_AGE / LM_AGE
|
|
</verb>
|
|
<item>
|
|
<em/CLIENT_MAX_AGE/ is the (optional) maximum object age the client will
|
|
accept as taken from the HTTP/1.1 Cache-Control request header.
|
|
<item>
|
|
<em/EXPIRES/ is the (optional) expiry time from the server reply headers.
|
|
</itemize>
|
|
|
|
<P>
|
|
These values are compared with the parameters of the <em/refresh_pattern/
|
|
rules. The refresh parameters are:
|
|
<itemize>
|
|
<item>URL regular expression
|
|
<item><em/CONF_MIN/:
|
|
The time (in minutes) an object without an explicit expiry
|
|
time should be considered fresh. The recommended value is 0; any higher
values may cause dynamic applications to be erroneously cached unless the
|
|
application designer has taken the appropriate actions.
|
|
|
|
<item><em/CONF_PERCENT/:
|
|
A percentage of the object's age (time since last
modification) for which an object without an explicit expiry time will be
|
|
considered fresh.
|
|
|
|
<item><em/CONF_MAX/:
|
|
An upper limit on how long objects without an explicit
|
|
expiry time will be considered fresh.
|
|
|
|
</itemize>
|
|
|
|
<P>
|
|
The URL regular expressions are checked in the order listed until a
|
|
match is found. Then the algorithms below are applied for determining
|
|
if an object is fresh or stale.
|
|
|
|
<sect2>Squid-1.1 and Squid-1.NOVM algorithm
|
|
<P>
|
|
<verb>
|
|
if (CLIENT_MAX_AGE)
|
|
if (OBJ_AGE > CLIENT_MAX_AGE)
|
|
return STALE
|
|
if (OBJ_AGE <= CONF_MIN)
|
|
return FRESH
|
|
if (EXPIRES) {
|
|
if (EXPIRES <= NOW)
|
|
return STALE
|
|
else
|
|
return FRESH
|
|
}
|
|
if (OBJ_AGE > CONF_MAX)
|
|
return STALE
|
|
if (LM_FACTOR < CONF_PERCENT)
|
|
return FRESH
|
|
return STALE
|
|
</verb>
|
|
|
|
<P>
|
|
<url url="mailto:bertold@tohotom.vein.hu" name="Kolics Bertold">
|
|
has made an excellent
|
|
<url url="http://www.squid-cache.org/Doc/FAQ/refresh-flowchart.gif"
|
|
name="flow chart diagram"> showing this process.
|
|
|
|
<sect2>Squid-2 algorithm
|
|
|
|
<P>
|
|
For Squid-2 the refresh algorithm has been slightly modified to give the
|
|
<em/EXPIRES/ value a higher precedence, and the <em/CONF_MIN/ value
|
|
lower precedence:
|
|
<verb>
|
|
if (EXPIRES) {
|
|
if (EXPIRES <= NOW)
|
|
return STALE
|
|
else
|
|
return FRESH
|
|
}
|
|
if (CLIENT_MAX_AGE)
|
|
if (OBJ_AGE > CLIENT_MAX_AGE)
|
|
return STALE
|
|
if (OBJ_AGE > CONF_MAX)
|
|
return STALE
|
|
if (OBJ_DATE > OBJ_LASTMOD) {
|
|
if (LM_FACTOR < CONF_PERCENT)
|
|
return FRESH
|
|
else
|
|
return STALE
|
|
}
|
|
if (OBJ_AGE <= CONF_MIN)
|
|
return FRESH
|
|
return STALE
|
|
</verb>
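<p>
For experimenting with <em/refresh_pattern/ values, the pseudocode
above can be transcribed into a small Python function. This is a sketch
under stated assumptions, not Squid's source: the function name
<tt/squid2_is_fresh/ is invented, all times are in seconds, and the
default parameter values (0 minutes, 20 percent, 4320 minutes) are only
example settings.
<verb>
def squid2_is_fresh(now, obj_date, obj_lastmod=None, expires=None,
                    client_max_age=None, conf_min=0, conf_percent=20,
                    conf_max=4320):
    """Transcription of the Squid-2 pseudocode; True means FRESH.

    now, obj_date, obj_lastmod, expires and client_max_age are in
    seconds; conf_min and conf_max are in minutes and conf_percent is
    a percentage. Optional values are None when absent.
    """
    obj_age = now - obj_date
    if expires is not None:
        return expires > now
    if client_max_age is not None and obj_age > client_max_age:
        return False
    if obj_age > conf_max * 60:
        return False
    if obj_lastmod is not None and obj_date > obj_lastmod:
        lm_factor = obj_age / (obj_date - obj_lastmod)
        return lm_factor < conf_percent / 100.0
    return obj_age <= conf_min * 60


now = 1000000
# Retrieved 1 hour ago, last modified 10 hours before retrieval:
# LM_FACTOR = 3600 / 36000 = 0.10, below 20 percent.
print(squid2_is_fresh(now, now - 3600, now - 3600 - 36000))
# Retrieved 3 hours ago: LM_FACTOR = 10800 / 36000 = 0.30.
print(squid2_is_fresh(now, now - 10800, now - 10800 - 36000))
</verb>
With these example settings the first object is still fresh and the
second is stale.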
|
|
|
|
|
|
<sect1>What exactly is a <em/deferred read/?
|
|
<P>
|
|
The cachemanager I/O page lists <em/deferred reads/ for various
|
|
server-side protocols.
|
|
<P>
|
|
Sometimes reading on the server-side gets ahead of writing to the
|
|
client-side, especially if your cache is on a fast network and your
|
|
clients are connected at modem speeds. Squid-1.1 will read up to 256k
|
|
(per request) ahead before it starts to defer the server-side reads.
|
|
|
|
<sect1>Why is my cache's inbound traffic equal to the outbound traffic?
|
|
<P>
|
|
<it>
|
|
I've been monitoring
|
|
the traffic on my cache's ethernet adapter and found a behavior I can't explain:
|
|
the inbound traffic is equal to the outbound traffic. The differences are
|
|
negligible. The hit ratio reports 40%.
|
|
Shouldn't the outbound be at least 40% greater than the inbound?
|
|
</it>
|
|
<P>
|
|
by <url url="mailto:david@avarice.nepean.uws.edu.au" name="David J N Begley">
|
|
<P>
|
|
I can't account for the exact behavior you're seeing, but I can offer this
|
|
advice; whenever you start measuring raw Ethernet or IP traffic on
|
|
interfaces, you can forget about getting all the numbers to exactly match what
|
|
Squid reports as the amount of traffic it has sent/received.
|
|
|
|
<P>
|
|
Why?
|
|
|
|
<P>
|
|
Squid is an application - it counts whatever data is sent to, or received
|
|
from, the lower-level networking functions; at each successively lower layer,
|
|
additional traffic is involved (such as header overhead, retransmits and
|
|
fragmentation, unrelated broadcasts/traffic, etc.). The additional traffic is
|
|
never seen by Squid and thus isn't counted - but if you run MRTG (or any
|
|
SNMP/RMON measurement tool) against a specific interface, all this additional
|
|
traffic will "magically appear".
|
|
|
|
<P>
|
|
Also remember that an interface has no concept of upper-layer networking (so
|
|
an Ethernet interface doesn't distinguish between IP traffic that's entirely
|
|
internal to your organization, and traffic that's to/from the Internet); this
|
|
means that when you start measuring an interface, you have to be aware of
|
|
*what* you are measuring before you can start comparing numbers elsewhere.
|
|
|
|
<P>
|
|
It is possible (though by no means guaranteed) that you are seeing roughly
|
|
equivalent input/output because you're measuring an interface that both
|
|
retrieves data from the outside world (Internet), *and* serves it to end users
|
|
(internal clients). That wouldn't be the whole answer, but hopefully it gives
|
|
you a few ideas to start applying to your own circumstance.
|
|
|
|
<P>
|
|
To interpret any statistic, you have to first know what you are measuring;
|
|
for example, an interface counts inbound and outbound bytes - that's it. The
|
|
interface doesn't distinguish between inbound bytes from external Internet
|
|
sites or from internal (to the organization) clients (making requests). If
|
|
you want that, try looking at RMON2.
|
|
|
|
<P>
|
|
Also, if you're talking about a 40% hit rate in terms of object
|
|
requests/counts then there's absolutely no reason why you should expect a 40%
|
|
reduction in traffic; after all, not every request/object is going to be the
|
|
same size so you may be saving a lot in terms of requests but very little in
|
|
terms of actual traffic.
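
<P>
A small worked example may make the last point concrete. The request
sizes below are invented for illustration (they are not measurements
from any real cache); the sketch only shows that a 40% request hit
ratio can coexist with a byte hit ratio well under 1%.

```python
# Hypothetical sample of ten requests: (object size in bytes, was it a cache hit?)
SAMPLE_REQUESTS = [
    (2_000, True), (3_000, True), (1_500, True), (2_500, True),
    (900_000, False), (450_000, False), (120_000, False),
    (80_000, False), (30_000, False), (11_000, False),
]


def hit_ratios(requests):
    """Return (request hit ratio, byte hit ratio) for a list of (size, hit) pairs."""
    total_requests = len(requests)
    total_bytes = sum(size for size, _ in requests)
    hit_requests = sum(1 for _, hit in requests if hit)
    hit_bytes = sum(size for size, hit in requests if hit)
    return hit_requests / total_requests, hit_bytes / total_bytes


request_ratio, byte_ratio = hit_ratios(SAMPLE_REQUESTS)
print(f"request hit ratio: {request_ratio:.1%}")
print(f"byte hit ratio:    {byte_ratio:.2%}")
```

Here four of the ten requests are hits, but they are all small objects,
so almost all of the bytes still have to come from the network.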
|
|
|
|
<sect1>How come some objects do not get cached?
|
|
|
|
<P>
|
|
To determine whether a given object may be cached, Squid takes many
|
|
things into consideration. The current algorithm (for Squid-2)
|
|
goes something like this:
|
|
|
|
<enum>
|
|
<item>
|
|
Responses with <em/Cache-Control: Private/ are NOT cachable.
|
|
<item>
|
|
Responses with <em/Cache-Control: No-Cache/ are NOT cachable.
|
|
<item>
|
|
Responses with <em/Cache-Control: No-Store/ are NOT cachable.
|
|
<item>
|
|
Responses for requests with an <em/Authorization/ header
|
|
are cachable ONLY if the response includes <em/Cache-Control: Public/.
|
|
<item>
|
|
Responses with <em/Vary/ headers are NOT cachable because Squid
|
|
does not yet support Vary features.
|
|
<item>
|
|
The following HTTP status codes are cachable:
|
|
<itemize>
|
|
<item>200 OK
|
|
<item>203 Non-Authoritative Information
|
|
<item>300 Multiple Choices
|
|
<item>301 Moved Permanently
|
|
<item>410 Gone
|
|
</itemize>
|
|
However, if Squid receives one of these responses from a neighbor
|
|
cache, it will NOT be cached if ALL of the <em/Date/, <em/Last-Modified/,
|
|
and <em/Expires/ reply headers are missing. This prevents such objects
|
|
from bouncing back-and-forth between siblings forever.
|
|
<item>
|
|
A 302 Moved Temporarily response is cachable ONLY if the response
|
|
also includes an <em/Expires/ header.
|
|
<item>
|
|
The following HTTP status codes are ``negatively cached'' for
|
|
a short amount of time (configurable):
|
|
<itemize>
|
|
<item>204 No Content
|
|
<item>305 Use Proxy
|
|
<item>400 Bad Request
|
|
<item>403 Forbidden
|
|
<item>404 Not Found
|
|
<item>405 Method Not Allowed
|
|
<item>414 Request-URI Too Large
|
|
<item>500 Internal Server Error
|
|
<item>501 Not Implemented
|
|
<item>502 Bad Gateway
|
|
<item>503 Service Unavailable
|
|
<item>504 Gateway Time-out
|
|
</itemize>
|
|
<item>
|
|
All other HTTP status codes are NOT cachable, including:
|
|
<itemize>
|
|
<item>206 Partial Content
|
|
<item>303 See Other
|
|
<item>304 Not Modified
|
|
<item>401 Unauthorized
|
|
<item>407 Proxy Authentication Required
|
|
</itemize>
|
|
</enum>
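
<P>
The rules above can be sketched roughly as follows. This is an
illustration only, not Squid's source code: the function name
<em/classify_response/, its arguments, and the order in which the
checks run are assumptions made for the example.

```python
# Status codes taken from the lists above.
CACHABLE_STATUS = {200, 203, 300, 301, 410}
NEGATIVE_STATUS = {204, 305, 400, 403, 404, 405, 414, 500, 501, 502, 503, 504}


def classify_response(status, headers, request_has_authorization=False,
                      from_neighbor=False):
    """Return 'cachable', 'negative' or 'not cachable' (hypothetical helper)."""
    h = {name.lower(): value for name, value in headers.items()}
    # Split Cache-Control into lower-case directive names, dropping any "=value".
    directives = {
        part.strip().lower().split("=")[0]
        for part in h.get("cache-control", "").split(",")
        if part.strip()
    }
    if directives & {"private", "no-cache", "no-store"}:
        return "not cachable"
    if request_has_authorization and "public" not in directives:
        return "not cachable"
    if "vary" in h:
        return "not cachable"
    if status in CACHABLE_STATUS:
        # From a neighbor cache, at least one of these headers must be present.
        has_timestamp = any(k in h for k in ("date", "last-modified", "expires"))
        if from_neighbor and not has_timestamp:
            return "not cachable"
        return "cachable"
    if status == 302:
        return "cachable" if "expires" in h else "not cachable"
    if status in NEGATIVE_STATUS:
        return "negative"
    return "not cachable"


print(classify_response(200, {"Date": "Mon, 01 Jan 2001 00:00:00 GMT"}))
print(classify_response(404, {}))
print(classify_response(206, {}))
```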
|
|
|
|
<sect1>What does <em/keep-alive ratio/ mean?
|
|
<P>
|
|
The <em/keep-alive ratio/ shows up in the <em/server_list/
|
|
cache manager page for Squid 2.
|
|
<P>
|
|
This is a mechanism to try detecting neighbor caches which might
|
|
not be able to deal with HTTP/1.1 persistent connections. Every
|
|
time we send a <em/proxy-connection: keep-alive/ request header
|
|
to a neighbor, we count how many times the neighbor sent us
|
|
a <em/proxy-connection: keep-alive/ reply header. Thus, the
|
|
<em/keep-alive ratio/ is the ratio of these two counters.
|
|
|
|
<P>
|
|
If the ratio stays above 0.5, then we continue to assume the neighbor
|
|
properly implements persistent connections. Otherwise, we will stop
|
|
sending the keep-alive request header to that neighbor.
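
<P>
As a rough sketch of the bookkeeping described above (the class and
method names are invented for this example, and treating a neighbor
with no history as ``good'' is an assumption, not something taken from
the Squid source):

```python
class NeighborKeepAlive:
    """Tracks keep-alive requests sent to, and replies received from, one neighbor."""

    def __init__(self):
        self.sent = 0       # requests sent with proxy-connection: keep-alive
        self.received = 0   # replies that came back with proxy-connection: keep-alive

    def record(self, reply_had_keep_alive):
        self.sent += 1
        if reply_had_keep_alive:
            self.received += 1

    @property
    def ratio(self):
        # Assumption: with no data yet, give the neighbor the benefit of the doubt.
        if self.sent == 0:
            return 1.0
        return self.received / self.sent

    def use_persistent_connections(self):
        # Keep sending the keep-alive header only while the ratio is above 0.5.
        return self.ratio > 0.5


good = NeighborKeepAlive()
for had_keep_alive in (True, True, True, False):
    good.record(had_keep_alive)
print(good.ratio, good.use_persistent_connections())
```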
|
|
|
|
<sect1>How does Squid's cache replacement algorithm work?
|
|
|
|
<P>
|
|
Squid uses an LRU (least recently used) algorithm to replace old cache
|
|
objects. This means objects which have not been accessed for the
|
|
longest time are removed first. In the source code, the
|
|
StoreEntry->lastref value is updated every time an object is accessed.
|
|
|
|
<P>
|
|
Objects are not necessarily removed ``on-demand.'' Instead, a regularly
|
|
scheduled event runs to periodically remove objects. Normally this
|
|
event runs every second.
|
|
|
|
<P>
|
|
Squid keeps the cache disk usage between the low and high water marks.
|
|
By default the low mark is 90%, and the high mark is 95% of the total
|
|
configured cache size. When the disk usage is close to the low mark,
|
|
the replacement is less aggressive (fewer objects removed). When the
|
|
usage is close to the high mark, the replacement is more aggressive
|
|
(more objects removed).
|
|
|
|
<P>
|
|
When selecting objects for removal, Squid examines some number of objects
|
|
and determines which can be removed and which cannot.
|
|
A number of factors determine whether or not any given object can be
|
|
removed. If the object is currently being requested, or retrieved
|
|
from an upstream site, it will not be removed. If the object is
|
|
``negatively-cached'' it will be removed. If the object has a private
|
|
cache key, it will be removed (there would be no reason to keep it --
|
|
because the key is private, it can never be ``found'' by subsequent requests).
|
|
Finally, if the time since last access is greater than the LRU threshold,
|
|
the object is removed.
|
|
|
|
<P>
|
|
The LRU threshold value is dynamically calculated based on the current
|
|
cache size and the low and high marks.  The LRU threshold is scaled
|
|
exponentially between the high and low water marks. When the store swap
|
|
size is near the low water mark, the LRU threshold is large. When the
|
|
store swap size is near the high water mark, the LRU threshold is small.
|
|
The threshold automatically adjusts to the rate of incoming requests.
|
|
In fact, when your cache size has stabilized, the LRU threshold
|
|
represents how long it takes to fill (or fully replace) your cache at
|
|
the current request rate. Typical values for the LRU threshold are 1 to
|
|
10 days.
|
|
|
|
<P>
|
|
Back to selecting objects for removal. Obviously it is not possible to
|
|
check every object in the cache every time we need to remove some of them.
|
|
We can only check a small subset each time. The way in which
|
|
this is implemented is very different between Squid-1.1 and Squid-2.
|
|
|
|
<sect2>Squid 1.1
|
|
<P>
|
|
The Squid cache storage is implemented as a hash table with some number
|
|
of "hash buckets." Squid-1.1 scans one bucket at a time and sorts all the
|
|
objects in the bucket by their LRU age. Objects with an LRU age
|
|
over the threshold are removed. The scan rate is adjusted so that
|
|
it takes approximately 24 hours to scan the entire cache. The
|
|
store buckets are randomized so that we don't always scan the same
|
|
buckets at the same time of the day.
|
|
|
|
<P>
|
|
This algorithm has some flaws. Because we only scan one bucket,
|
|
there are going to be better candidates for removal in some of
|
|
the other 16,000 or so buckets. Also, the qsort() function
|
|
might take a non-trivial amount of CPU time, depending on how many
|
|
entries are in each bucket.
|
|
|
|
<sect2>Squid 2
|
|
<P>
|
|
For Squid-2 we eliminated the need to use qsort() by indexing
|
|
cached objects into an automatically sorted linked list. Every time
|
|
an object is accessed, it gets moved to the top of the list. Over time,
|
|
the least used objects migrate to the bottom of the list. When looking
|
|
for objects to remove, we only need to check the last 100 or so objects
|
|
in the list. Unfortunately this approach increases our memory usage
|
|
because of the need to store three additional pointers per cache object.
|
|
But for Squid-2 we're still ahead of the game because we also replaced
|
|
plain-text cache keys with MD5 hashes.
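
<P>
The idea of a list that stays sorted by access time can be sketched
like this. It is a simplified model using Python's
<em/OrderedDict/ rather than Squid's own linked list, and the class
name <em/LruIndex/ is made up for the example.

```python
from collections import OrderedDict


class LruIndex:
    """Keeps keys ordered by last access; the first item is the most recent."""

    def __init__(self):
        self._entries = OrderedDict()

    def touch(self, key, now):
        """Record an access: store the time and move the key to the top."""
        self._entries[key] = now
        self._entries.move_to_end(key, last=False)

    def removal_candidates(self, limit=100):
        """Return up to `limit` keys from the bottom, least recently used first."""
        candidates = []
        for key in reversed(self._entries):
            if len(candidates) >= limit:
                break
            candidates.append(key)
        return candidates

    def remove(self, key):
        del self._entries[key]


index = LruIndex()
index.touch("a", 1)
index.touch("b", 2)
index.touch("c", 3)
index.touch("a", 4)          # "a" moves back to the top
print(index.removal_candidates(limit=2))
```

No sorting step is needed when looking for victims: only the tail of
the list is examined.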
|
|
|
|
<sect1>What are private and public keys?
|
|
<label id="pub-priv-keys">
|
|
<P>
|
|
<em/keys/ refers to the database keys which Squid uses to index
|
|
cache objects. Every object in the cache--whether saved on disk
|
|
or currently being downloaded--has a cache key. For Squid-1.0 and
|
|
Squid-1.1 the cache key was basically the URL. Squid-2 uses
|
|
MD5 checksums for cache keys.
|
|
|
|
<P>
|
|
The Squid cache uses the notions of <em/private/ and <em/public/
|
|
cache keys. An object can start out as being private, but may later be
|
|
changed to public status. Private objects are associated with only a single
|
|
client whereas a public object may be sent to multiple clients at the
|
|
same time. In other words, public objects can be located by any cache
|
|
client.  Private objects can only be located by a single client--the one
|
|
who requested it.
|
|
|
|
<P>
|
|
Objects are changed from private to public after all of the HTTP
|
|
reply headers have been received and parsed. In some cases, the
|
|
reply headers will indicate the object should not be made public.
|
|
This happens, for example, if the <em/no-cache/ Cache-Control directive is used.
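
<P>
To illustrate the difference, here is a minimal sketch of how a public
and a private key might be derived. The exact bytes that Squid feeds
into MD5 are not described here, so the inputs used below (request
method, URL, and a per-request number for private keys) are
assumptions for the example.

```python
import hashlib


def public_key(method, url):
    """Key that any client asking for the same method and URL will compute."""
    return hashlib.md5(f"{method} {url}".encode("utf-8")).hexdigest()


def private_key(request_id, method, url):
    """Key that also depends on a per-request number, so only that request finds it."""
    return hashlib.md5(f"{request_id} {method} {url}".encode("utf-8")).hexdigest()


url = "http://www.squid-cache.org/"
print(public_key("GET", url))
print(private_key(1, "GET", url))
print(private_key(2, "GET", url))
```

Two requests for the same URL share one public key but get different
private keys, which is why a private object cannot be ``found'' by a
later request.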
|
|
|
|
<sect1>What is FORW_VIA_DB for?
|
|
<P>
|
|
We use it to collect data for <url
|
|
url="http://www.ircache.net/Cache/Plankton/" name="Plankton">.
|
|
|
|
<sect1>Does Squid send packets to port 7 (echo)? If so, why?
|
|
<P>
|
|
It may. This is an old feature from the Harvest cache software.
|
|
The cache would send an ICP ``SECHO'' message to the echo ports of
|
|
origin servers. If the SECHO message came back before any of the
|
|
other ICP replies, then it meant the origin server was probably
|
|
closer than any neighbor cache. In that case Harvest/Squid sent
|
|
the request directly to the origin server.
|
|
|
|
<P>
|
|
With more attention focused on security, many administrators filter
|
|
UDP packets to port 7. The Computer Emergency Response Team (CERT)
|
|
once issued an advisory note (<url
|
|
url="http://www.cert.org/advisories/CA-96.01.UDP_service_denial.html"
|
|
name="CA-96.01: UDP Port Denial-of-Service Attack">) that says UDP
|
|
echo and chargen services can be used for a denial of service
|
|
attack. This made admins extremely nervous about any packets
|
|
hitting port 7 on their systems, and they made complaints.
|
|
|
|
<P>
|
|
The <em/source_ping/ feature has been disabled in Squid-2.
|
|
If you're seeing packets to port 7 that are coming from a
|
|
Squid cache (remote port 3130), then it's probably a
|
|
very old version of Squid.
|
|
|
|
<sect1>What does ``WARNING: Reply from unknown nameserver [a.b.c.d]'' mean?
|
|
<P>
|
|
It means Squid sent a DNS query to one IP address, but the response
|
|
came back from a different IP address. By default Squid checks that
|
|
the addresses match. If not, Squid ignores the response.
|
|
|
|
<P>There are a number of reasons why this would happen:
|
|
<enum>
|
|
<item>
|
|
Your DNS name server just works this way, either because
|
|
it's been configured to, or because it's stupid and doesn't
|
|
know any better.
|
|
<item>
|
|
You have a weird broadcast address, like 0.0.0.0, in
|
|
your <em>/etc/resolv.conf</em> file.
|
|
<item>
|
|
Somebody is trying to send spoofed DNS responses to
|
|
your cache.
|
|
</enum>
|
|
|
|
<P>
|
|
If you recognize the IP address in the warning as one of your
|
|
name server hosts, then it's probably reason (1) or (2).
|
|
|
|
<P>
|
|
You can make these warnings stop, and allow responses from
|
|
``unknown'' name servers by setting this configuration option:
|
|
<verb>
|
|
ignore_unknown_nameservers off
|
|
</verb>
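
<P>
The check itself is simple. The following sketch shows the logic only;
the function name and arguments are hypothetical, and the addresses are
documentation examples.

```python
import ipaddress


def accept_dns_reply(reply_source, configured_nameservers,
                     ignore_unknown_nameservers=True):
    """Return True if a DNS reply from `reply_source` should be used."""
    source = ipaddress.ip_address(reply_source)
    known = {ipaddress.ip_address(ns) for ns in configured_nameservers}
    if source in known:
        return True
    # Unknown sender: accept only when the check has been switched off.
    return not ignore_unknown_nameservers


nameservers = ["192.0.2.53"]
print(accept_dns_reply("192.0.2.53", nameservers))
print(accept_dns_reply("198.51.100.7", nameservers))
print(accept_dns_reply("198.51.100.7", nameservers,
                       ignore_unknown_nameservers=False))
```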
|
|
|
|
<sect1>How does Squid distribute cache files among the available directories?
|
|
<p>
|
|
<em>Note: The information here is current for version 2.2.</em>
|
|
<p>
|
|
See <em/storeDirMapAllocate()/ in the source code.
|
|
|
|
<p>
|
|
When Squid wants to create a new disk file for storing an object, it
|
|
first selects which <em/cache_dir/ the object will go into. This is done
|
|
with the <em/storeDirSelectSwapDir()/ function. If you have <em/N/
|
|
cache directories, the function identifies the <em>3N/4</em> (75%)
|
|
of them with the most available space. These directories are
|
|
then used, in order of having the most available space. When Squid has
|
|
stored one URL to each of the
|
|
<em>3N/4</em> <em/cache_dir/'s, the process repeats and
|
|
<em/storeDirSelectSwapDir()/ finds a new set of <em>3N/4</em>
|
|
cache directories with the most available space.
|
|
|
|
<p>
|
|
Once the <em/cache_dir/ has been selected, the next step is to find
|
|
an available <em/swap file number/. This is accomplished
|
|
by checking the <em/file map/, with the <em/file_map_allocate()/
|
|
function. Essentially the swap file numbers are allocated
|
|
sequentially. For example, if the last number allocated
|
|
happens to be 1000, then the next one will be the first
|
|
number after 1000 that is not already being used.
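
<p>
Both steps can be sketched in a few lines. This is a simplified model,
not a copy of <em/storeDirSelectSwapDir()/ or
<em/file_map_allocate()/: the rounding of <em>3N/4</em> (integer
division, at least one directory) is an assumption, and wrap-around at
the end of the file map is ignored.

```python
def select_swap_dirs(free_space):
    """Return the 3N/4 cache_dirs with the most free space, most free first.

    `free_space` maps a cache_dir name to its available space.
    """
    count = max(1, (3 * len(free_space)) // 4)
    ranked = sorted(free_space, key=free_space.get, reverse=True)
    return ranked[:count]


def allocate_file_number(in_use, last_allocated):
    """Return the first number after `last_allocated` that is not in use."""
    candidate = last_allocated + 1
    while candidate in in_use:
        candidate += 1
    in_use.add(candidate)
    return candidate


print(select_swap_dirs({"/cache1": 10, "/cache2": 40, "/cache3": 30, "/cache4": 20}))
print(allocate_file_number({1001, 1002}, last_allocated=1000))
```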
|
|
|
|
<sect1>Why do I see negative byte hit ratio?
|
|
<p>
|
|
Byte hit ratio is calculated a bit differently than
|
|
Request hit ratio. Squid counts the number of bytes read
|
|
from the network on the server-side, and the number of bytes written to
|
|
the client-side. The byte hit ratio is calculated as
|
|
<verb>
|
|
(client_bytes - server_bytes) / client_bytes
|
|
</verb>
|
|
If server_bytes is greater than client_bytes, you end up
|
|
with a negative value.
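
<p>
For example, with made-up byte counts (the zero-traffic case is
handled by returning 0.0, which is a choice made for this sketch):

```python
def byte_hit_ratio(client_bytes, server_bytes):
    """(client_bytes - server_bytes) / client_bytes, as in the formula above."""
    if client_bytes == 0:
        return 0.0  # assumption for the sketch: no client traffic, no ratio
    return (client_bytes - server_bytes) / client_bytes


# 1000 bytes sent to clients, 600 read from servers: 40% of bytes were hits.
print(byte_hit_ratio(1000, 600))
# 1000 bytes sent to clients, 1250 read from servers: the ratio goes negative.
print(byte_hit_ratio(1000, 1250))
```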
|
|
|
|
<p>
|
|
The server_bytes may be greater than client_bytes for a number
|
|
of reasons, including:
|
|
<itemize>
|
|
<item>
|
|
Cache Digests and other internally generated requests.
|
|
Cache Digest messages are quite large. They are counted
|
|
in the server_bytes, but since they are consumed internally,
|
|
they do not count in client_bytes.
|
|
<item>
|
|
User-aborted requests. If your <em/quick_abort/ setting
|
|
allows it, Squid sometimes continues to fetch aborted
|
|
requests from the server-side, without sending any
|
|
data to the client-side.
|
|
<item>
|
|
Some range requests, in combination with Squid bugs, can
|
|
consume more bandwidth on the server-side than on the
|
|
client-side. In a range request, the client is asking for
|
|
only some part of the object. Squid may decide to retrieve
|
|
the whole object anyway, so that it can be used later on.
|
|
This means downloading more from the server than sending
|
|
to the client. You can affect this behavior with
|
|
the <em/range_offset_limit/ option.
|
|
</itemize>
|
|
|
|
<sect1>What does ``Disabling use of private keys'' mean?
|
|
<p>
|
|
First you need to understand the
|
|
<ref id="pub-priv-keys" name="difference between public and private
|
|
keys">.
|
|
|
|
<p>
|
|
When Squid sends ICP queries, it uses the ICP <em/reqnum/ field
|
|
to hold the private key data. In other words, when Squid gets an
|
|
ICP reply, it uses the <em/reqnum/ value to build the private cache key for
|
|
the pending object.
|
|
|
|
|
|
<p>
|
|
Some ICP implementations always set the <em/reqnum/ field to zero
|
|
when they send a reply.  Squid cannot use private cache keys with
|
|
such neighbor caches because Squid will not be able to
|
|
locate cache keys for those ICP replies. Thus, if Squid detects a neighbor
|
|
cache that sends zero reqnum's, it
|
|
disables the use of private cache keys.
|
|
|
|
<p>
|
|
Not having private cache keys has some important privacy
|
|
implications. Two users could receive one response that was
|
|
meant for only one of the users. This response could contain
|
|
personal, confidential information. You will need to disable
|
|
the ``zero reqnum'' neighbor if you want Squid to use private
|
|
cache keys.
|
|
|
|
<sect1>What is a half-closed filedescriptor?
|
|
<p>
|
|
TCP allows connections to be in a ``half-closed'' state. This
|
|
is accomplished with the <em/shutdown(2)/ system call. In Squid,
|
|
this means that a client has closed its side of the connection for
|
|
writing, but leaves it open for reading. Half-closed connections
|
|
are tricky because Squid can't tell the difference between a
|
|
half-closed connection, and a fully closed one.
|
|
<p>
|
|
If Squid tries to read a connection, and <em/read()/ returns
|
|
0, and Squid knows that the client doesn't have the whole
|
|
response yet, Squid marks the filedescriptor as half-closed.
|
|
Most likely the client has aborted the request and the connection
|
|
is really closed. However, there is a slight chance that
|
|
the client is using the <em/shutdown()/ call, and that it
|
|
can still read the response.
|
|
<p>
|
|
To disable half-closed connections, simply put this in
|
|
squid.conf:
|
|
<verb>
|
|
half_closed_clients off
|
|
</verb>
|
|
Then, Squid will always close its side of the connection
|
|
instead of marking it as half-closed.
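
<p>
The following sketch shows what a half-close looks like from both ends.
It uses a local socket pair rather than a real TCP connection to Squid
(an assumption made so the example is self-contained), but
<em/shutdown()/ behaves the same way: the ``server'' side sees a read
of zero bytes, yet the ``client'' can still receive the reply.

```python
import socket


def read_until_eof(sock):
    """Read until recv() returns an empty result (the peer stopped writing)."""
    data = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return data
        data += chunk


def half_close_demo():
    client, server = socket.socketpair()
    try:
        client.sendall(b"GET / HTTP/1.0\r\n\r\n")
        client.shutdown(socket.SHUT_WR)      # half-close: done writing, still reading

        request = read_until_eof(server)     # ends with a zero-byte read
        server.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello")
        server.close()

        reply = read_until_eof(client)       # the client still gets the response
        return request, reply
    finally:
        client.close()
        server.close()


request, reply = half_close_demo()
print(request)
print(reply)
```

From the server's point of view, the zero-byte read alone does not say
whether the client is still listening, which is exactly the ambiguity
described above.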
|
|
|
|
<sect1>What does --enable-heap-replacement do?
|
|
<p>
|
|
Squid has traditionally used an LRU replacement algorithm. As of
|
|
<url url="/Versions/v2/2.3/" name="version 2.3">, you can use
|
|
some other replacement algorithms by using the <em/--enable-heap-replacement/
|
|
configure option. Currently, the heap replacement code supports two
|
|
additional algorithms: LFUDA, and GDS.
|
|
<p>
|
|
The heap replacement code was contributed by John Dilley and others
|
|
from Hewlett-Packard. Their work is described in these papers:
|
|
<enum>
|
|
<item>
|
|
<url url="http://www.hpl.hp.com/techreports/1999/HPL-1999-69.html"
|
|
name="Enhancement and Validation of Squid's Cache Replacement Policy">
|
|
(HP Tech Report).
|
|
<item>
|
|
<url url="http://workshop.ircache.net/Papers/dilley-abstract.html"
|
|
name="Enhancement and Validation of the Squid Cache Replacement Policy">
|
|
(WCW 1999 paper).
|
|
</enum>
|
|
|
|
<sect1>Why is actual filesystem space used greater than what Squid thinks?
|
|
<p>
|
|
If you compare <em/df/ output and cachemgr <em/storedir/ output,
|
|
you will notice that actual disk usage is greater than what Squid
|
|
reports. This may be due to a number of reasons:
|
|
<itemize>
|
|
<item>
|
|
Squid doesn't keep track of the size of the <em/swap.state/
|
|
file, which normally resides on each <em/cache_dir/.
|
|
<item>
|
|
Directory entries also take up filesystem space.
|
|
<item>
|
|
Other applications might be using the same disk partition.
|
|
<item>
|
|
Your filesystem block size might be larger than what Squid
|
|
thinks. When calculating total disk usage, Squid rounds
|
|
file sizes up to a whole number of 1024 byte blocks. If
|
|
your filesystem uses larger blocks, then some "wasted" space
|
|
is not accounted for.
|
|
</itemize>
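
<p>
The effect of the block size can be seen with a little arithmetic. The
file sizes and the 4096-byte filesystem block size below are invented
for the example.

```python
def space_used(file_sizes, block_size):
    """Total space when every file is rounded up to whole blocks."""
    total = 0
    for size in file_sizes:
        blocks = -(-size // block_size)   # ceiling division
        total += blocks * block_size
    return total


sizes = [100, 1500, 4096, 5000]
squid_view = space_used(sizes, 1024)       # what Squid accounts for
filesystem_view = space_used(sizes, 4096)  # what a 4 KB-block filesystem uses
print(squid_view, filesystem_view, filesystem_view - squid_view)
```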
|
|
|
|
<sect1>How do <em/positive_dns_ttl/ and <em/negative_dns_ttl/ work?
|
|
<p>
|
|
<em/positive_dns_ttl/ is how long Squid caches a successful DNS
|
|
lookup. Similarly, <em/negative_dns_ttl/ is how long Squid caches
|
|
a failed DNS lookup.
|
|
<p>
|
|
<em/positive_dns_ttl/ is not always used. It is NOT used in the following
|
|
cases:
|
|
<itemize>
|
|
<item>Squid-2.3 and later versions with internal DNS lookups. Internal
|
|
lookups are the default for Squid-2.3 and later.
|
|
<item>If you applied the ``DNS TTL'' <ref id="dns-ttl-hack" name="patch">
|
|
for BIND.
|
|
<item>If you are using FreeBSD, then it already has the DNS TTL patch
|
|
built in.
|
|
</itemize>
|
|
|
|
<p>
|
|
Let's say you have the following settings:
|
|
<verb>
|
|
positive_dns_ttl 1 hours
|
|
negative_dns_ttl 1 minutes
|
|
</verb>
|
|
When Squid looks up a name like <em/www.squid-cache.org/, it gets back
|
|
an IP address like 204.144.128.89. The address is cached for the
|
|
next hour. That means, when Squid needs to know the address for
|
|
<em/www.squid-cache.org/ again, it uses the cached answer for the
|
|
next hour. After one hour, the cached information expires, and Squid
|
|
makes a new query for the address of <em/www.squid-cache.org/.
|
|
|
|
<p>
|
|
If you have the DNS TTL patch, or are using internal lookups, then
|
|
each hostname has its own TTL value, which was set by the domain
|
|
name administrator. You can see these values in the 'ipcache'
|
|
cache manager page. For example:
|
|
<verb>
|
|
Hostname Flags lstref TTL N
|
|
www.squid-cache.org C 73043 12784 1( 0) 204.144.128.89-OK
|
|
www.ircache.net C 73812 10891 1( 0) 192.52.106.12-OK
|
|
polygraph.ircache.net C 241768 -181261 1( 0) 192.52.106.12-OK
|
|
</verb>
|
|
The TTL field shows how many seconds until the entry expires.
|
|
Negative values mean the entry is already expired, and will be refreshed
|
|
upon next use.
|
|
|
|
<p>
|
|
The <em/negative_dns_ttl/ specifies how long to cache failed DNS lookups.
|
|
When Squid fails to resolve a hostname, you can be pretty sure that
|
|
it is a real failure, and you are not likely to get a successful
|
|
answer within a short time period. Squid retries its lookups
|
|
many times before declaring a lookup has failed.
|
|
If you like, you can set <em/negative_dns_ttl/ to zero.
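
<p>
A toy model of this caching behaviour is shown below. The class
<em/DnsCache/ and its methods are hypothetical, times are plain
seconds, and treating a remaining TTL of exactly zero as expired is an
assumption.

```python
class DnsCache:
    """Caches lookup results; failed lookups are stored with address None."""

    def __init__(self, positive_dns_ttl=3600, negative_dns_ttl=60):
        self.positive_dns_ttl = positive_dns_ttl
        self.negative_dns_ttl = negative_dns_ttl
        self._entries = {}  # hostname -> (address or None, expiry time)

    def store(self, hostname, address, now, record_ttl=None):
        if address is None:
            ttl = self.negative_dns_ttl       # failed lookup
        elif record_ttl is not None:
            ttl = record_ttl                  # per-record TTL from the DNS reply
        else:
            ttl = self.positive_dns_ttl       # fixed TTL from the configuration
        self._entries[hostname] = (address, now + ttl)

    def remaining_ttl(self, hostname, now):
        """Seconds until expiry; negative means already expired."""
        return self._entries[hostname][1] - now

    def lookup(self, hostname, now):
        """Return (cached, address); cached=False means a new query is needed."""
        if hostname not in self._entries or self.remaining_ttl(hostname, now) <= 0:
            return False, None
        return True, self._entries[hostname][0]


cache = DnsCache()
cache.store("www.squid-cache.org", "204.144.128.89", now=0)
print(cache.lookup("www.squid-cache.org", now=1800))   # still cached
print(cache.lookup("www.squid-cache.org", now=3600))   # expired, query again
```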
|
|
|
|
<sect1>What does <em>swapin MD5 mismatch</em> mean?
|
|
<p>
|
|
It means that Squid opened up a disk file to serve a cache hit, but
|
|
it found that the stored object doesn't match the user's request.
|
|
Squid stores the MD5 digest of the URL at the start of each disk file.
|
|
When the file is opened, Squid checks that the disk file MD5 matches the
|
|
MD5 of the URL requested by the user. If they don't match, the warning
|
|
is printed and Squid forwards the request to the origin server.
|
|
<p>
|
|
You do not need to worry about this warning. It means that Squid is
|
|
recovering from a corrupted cache directory.
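
<p>
The check can be pictured as follows. This is a deliberately
simplified model: here the ``disk file'' is just the 16-byte MD5 of the
URL followed by the object body, which is an assumption for the sketch
and not Squid's actual on-disk layout.

```python
import hashlib

MD5_LENGTH = 16  # size of a raw MD5 digest in bytes


def url_md5(url):
    return hashlib.md5(url.encode("utf-8")).digest()


def pack_disk_file(url, body):
    """Build a toy disk file: URL digest first, then the object data."""
    return url_md5(url) + body


def swap_in(disk_file, requested_url):
    """Return the body on a match, or None to signal 'treat this as a miss'."""
    stored_digest = disk_file[:MD5_LENGTH]
    if stored_digest != url_md5(requested_url):
        print("swapin MD5 mismatch")
        return None
    return disk_file[MD5_LENGTH:]


disk_file = pack_disk_file("http://www.squid-cache.org/", b"cached body")
print(swap_in(disk_file, "http://www.squid-cache.org/"))
print(swap_in(disk_file, "http://www.ircache.net/"))
```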
|
|
|
|
<sect1>What does <em>failed to unpack swapfile meta data</em> mean?
|
|
<p>
|
|
Each of Squid's disk cache files has a metadata section at the beginning.
|
|
This header is used to store the URL MD5, some StoreEntry data, and more.
|
|
When Squid opens a disk file for reading, it looks for the meta data
|
|
header and unpacks it.
|
|
<p>
|
|
This warning means that Squid couldn't unpack the meta data.  This is
|
|
a non-fatal bug, from which Squid can recover.  Perhaps
|
|
the meta data was just missing, or perhaps the file got corrupted.
|
|
<p>
|
|
You do not need to worry about this warning. It means that Squid is
|
|
double-checking that the disk file matches what Squid thinks should
|
|
be there, and the check failed.  Squid recovers and generates
|
|
a cache miss in this case.
|
|
|
|
<sect1>Why doesn't Squid make <em/ident/ lookups in interception mode?
|
|
<p>
|
|
It's a side-effect of the way interception proxying works.
|
|
<p>
|
|
When Squid is configured for interception proxying, the operating system
|
|
pretends that it is the origin server. That means that the "local" socket
|
|
address for intercepted TCP
|
|
connections is really the origin server's IP address. If you run
|
|
<em/netstat -n/ on your interception proxy, you'll see a lot of
|
|
foreign IP addresses in the <em/Local Address/ column.
|
|
<p>
|
|
When Squid wants to make an ident query, it creates a new TCP socket
|
|
and <em/binds/ the local endpoint to the same IP address as the
|
|
local end of the client's TCP connection. Since the local address
|
|
isn't really local (it's some far away origin server's IP address),
|
|
the <em/bind()/ system call fails. Squid handles this as a failed
|
|
ident lookup.
|
|
<p>
|
|
<it>
|
|
So why bind in that way? If you know you are transparent proxying, then why
|
|
not bind the local endpoint to the host's (intranet) IP address? Why make
|
|
the masses suffer needlessly?
|
|
</it>
|
|
<p>
|
|
Because that's just how ident works.
|
|
Please read <url url="ftp://ftp.isi.edu/in-notes/rfc931.txt" name="RFC 931">,
|
|
in particular the RESTRICTIONS section.
|
|
|
|
<sect1>dnsSubmit: queue overload, rejecting blah
|
|
<p>
|
|
This means that you are using external <em/dnsserver/ processes
|
|
for lookups, and all processes are busy, and Squid's pending queue
|
|
is full. Each <em/dnsserver/ program can only handle one request
|
|
at a time. When all <em/dnsserver/ processes are busy, Squid queues
|
|
up requests, but only to a certain point.
|
|
<p>
|
|
To alleviate this condition, you need to either (1) increase the number
|
|
of <em/dnsserver/ processes by changing the value for <em/dns_children/
|
|
in your config file, or (2) switch to using Squid's internal DNS client
|
|
code.
|
|
<p>
|
|
Note that in some versions, Squid limits <em/dns_children/ to 32. To
|
|
increase it beyond that value, you would have to edit the source code.
|
|
|
|
<sect1>What are FTP passive connections?
|
|
<p>
|
|
by Colin Campbell
|
|
<p>
|
|
FTP uses two data streams, one for passing commands around, the other for
|
|
moving data. The command channel is handled by the ftpd listening on port
|
|
21.
|
|
<p>
|
|
The data channel varies depending on whether you ask for passive ftp or
|
|
not. When you request data in a non-passive environment, your client tells
|
|
the server ``I am listening on <ip-address> <port>.'' The server then
|
|
connects FROM port 20 to the ip address and port specified by your client.
|
|
This requires your "security device" to permit any host outside from port
|
|
20 to any host inside on any port > 1023. Somewhat of a hole.
|
|
<p>
|
|
In passive mode, when you request a data transfer, the server tells the
|
|
client ``I am listening on <ip address> <port>.'' Your client then connects
|
|
to the server on that IP and port and data flows.
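
<p>
On the wire, the ``I am listening on ...'' messages are the standard
FTP <em/PORT/ command (active mode) and the <em/227/ reply to
<em/PASV/ (passive mode). Both encode the address as six decimal
numbers: four for the IP address and two for the port. The helper
names below are made up for this sketch, and the addresses are
documentation examples.

```python
import re


def parse_pasv_reply(line):
    """Extract (host, port) from a '227 Entering Passive Mode (...)' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", line)
    if match is None:
        raise ValueError("no address found in PASV reply")
    numbers = [int(n) for n in match.groups()]
    if any(n > 255 for n in numbers):
        raise ValueError("address field out of range")
    host = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]
    return host, port


def format_port_command(host, port):
    """Build the PORT command a non-passive client sends to the server."""
    return "PORT {},{},{}".format(host.replace(".", ","), port // 256, port % 256)


print(parse_pasv_reply("227 Entering Passive Mode (192,0,2,10,19,137)"))
print(format_port_command("192.0.2.20", 5001))
```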
|
|
|
|
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>Multicast
|
|
|
|
<sect1>What is Multicast?
|
|
<P>
|
|
Multicast is essentially the ability to send one IP packet to multiple
|
|
receivers. Multicast is often used for audio and video conferencing systems.
|
|
|
|
<P>
|
|
You often hear about <url url="http://www.mbone.com/" name="the Mbone"> in
|
|
reference to Multicast. The Mbone is essentially a ``virtual backbone''
|
|
which exists in the Internet itself. If you want to send and/or receive
|
|
Multicast, you need to be ``on the Mbone.''
|
|
|
|
<sect1>How do I know if I'm on the Mbone?
|
|
|
|
<P>
|
|
One way is to ask someone who manages your network. If your network manager
|
|
doesn't know, or looks at you funny, then you are very likely NOT on the Mbone.
|
|
|
|
<P>
|
|
Another way is to use the <em/mtrace/ program, which can be found
|
|
on the <url url="ftp://parcftp.xerox.com/pub/net-research/ipmulti/"
|
|
name="Xerox PARC FTP site">. Mtrace is similar to traceroute. It will
|
|
tell you about the multicast path between your site and another. For example:
|
|
<verb>
|
|
> mtrace mbone.ucar.edu
|
|
mtrace: WARNING: no multicast group specified, so no statistics printed
|
|
Mtrace from 128.117.64.29 to 192.172.226.25 via group 224.2.0.1
|
|
Querying full reverse path... * switching to hop-by-hop:
|
|
0 oceana-ether.nlanr.net (192.172.226.25)
|
|
-1 avidya-ether.nlanr.net (192.172.226.57) DVMRP thresh^ 1
|
|
-2 mbone.sdsc.edu (198.17.46.39) DVMRP thresh^ 1
|
|
-3 * nccosc-mbone.dren.net (138.18.5.224) DVMRP thresh^ 48
|
|
-4 * * FIXW-MBONE.NSN.NASA.GOV (192.203.230.243) PIM/Special thresh^ 64
|
|
-5 dec3800-2-fddi-0.SanFrancisco.mci.net (204.70.158.61) DVMRP thresh^ 64
|
|
-6 dec3800-2-fddi-0.Denver.mci.net (204.70.152.61) DVMRP thresh^ 1
|
|
-7 mbone.ucar.edu (192.52.106.7) DVMRP thresh^ 64
|
|
-8 mbone.ucar.edu (128.117.64.29)
|
|
Round trip time 196 ms; total ttl of 68 required.
|
|
</verb>
|
|
|
|
<P>
|
|
If you think you need to be on the Mbone, this is
|
|
<url url="http://www.mbone.com/mbone/how-to-join.html" name="how you can join">.
|
|
|
|
<sect1>Should I be using Multicast ICP?
|
|
|
|
<P>
|
|
Short answer: No, probably not.
|
|
|
|
<P>
|
|
Reasons why you SHOULD use Multicast:
|
|
<enum>
|
|
<item>
|
|
It reduces the number of times Squid calls <em/sendto()/ to put a UDP
|
|
packet onto the network.
|
|
<item>
|
|
It's trendy and cool to use Multicast.
|
|
</enum>
|
|
|
|
<P>
|
|
Reasons why you SHOULD NOT use Multicast:
|
|
<enum>
|
|
<item>
|
|
Multicast tunnels/configurations/infrastructure are often unstable.
|
|
You may lose multicast connectivity but still have unicast connectivity.
|
|
<item>
|
|
Multicast does not simplify your Squid configuration file. Every trusted
|
|
neighbor cache must still be specified.
|
|
<item>
|
|
Multicast does not reduce the number of ICP replies being sent around.
|
|
It does reduce the number of ICP queries sent, but not the number of replies.
|
|
<item>
|
|
Multicast exposes your cache to some privacy issues. There are no special
|
|
permissions required to join a multicast group.  Anyone may join your
|
|
group and eavesdrop on ICP query messages. However, the scope of your
|
|
multicast traffic can be controlled such that it does not exceed certain
|
|
boundaries.
|
|
</enum>
|
|
|
|
<P>
|
|
We only recommend people to use Multicast ICP over network
|
|
infrastructure which they have close control over. In other words, only
|
|
use Multicast over your local area network, or maybe your wide area
|
|
network if you are an ISP. We think it is probably a bad idea to use
|
|
Multicast ICP over congested links or commodity backbones.
|
|
|
|
<sect1>How do I configure Squid to send Multicast ICP queries?
|
|
|
|
<P>
|
|
To configure Squid to send ICP queries to a Multicast address, you
|
|
need to create another neighbour cache entry specified as <em/multicast/.
|
|
For example:
|
|
<verb>
|
|
cache_host 224.9.9.9 multicast 3128 3130 ttl=64
|
|
</verb>
|
|
224.9.9.9 is a sample multicast group address.
|
|
<em/multicast/ indicates that this
|
|
is a special type of neighbour. The HTTP-port argument (3128)
|
|
is ignored for multicast peers, but the ICP-port (3130) is
|
|
very important. The final argument, <em/ttl=64/
|
|
specifies the multicast TTL value for queries sent to this
|
|
address.
|
|
It is probably a good
|
|
idea to increment the minimum TTL by a few to provide a margin
|
|
for error and changing conditions.
|
|
|
|
<P>
|
|
You must also specify which of your neighbours will respond
|
|
to your multicast queries, since it would
|
|
be a bad idea to implicitly trust any ICP reply from an unknown
|
|
address. Note that ICP replies are sent back to <em/unicast/
|
|
addresses; they are NOT multicast, so Squid has no indication
|
|
whether a reply is from a regular query or a multicast
|
|
query. To configure your multicast group neighbours, use the
|
|
<em/cache_host/ directive and the <em/multicast-responder/
|
|
option:
|
|
<verb>
|
|
cache_host cache1 sibling 3128 3130 multicast-responder
|
|
cache_host cache2 sibling 3128 3130 multicast-responder
|
|
</verb>
|
|
Here all fields are relevant. The ICP port number (3130)
|
|
must be the same as in the <em/cache_host/ line defining the
|
|
multicast peer above. The third field must either be
|
|
<em/parent/ or <em/sibling/ to indicate how Squid should treat replies.
|
|
With the <em/multicast-responder/ flag set for a peer,
|
|
Squid will NOT send ICP queries to it directly (i.e. unicast).
|
|
|
|
<sect1>How do I know what Multicast TTL to use?
|
|
|
|
<P>
|
|
The Multicast TTL (which is specified on the <em/cache_host/ line
|
|
of your multicast group) determines how ``far'' your ICP queries
|
|
will go. In the Mbone, there is a certain TTL threshold defined
|
|
for each network interface or tunnel. A multicast packet's TTL must
|
|
be larger than the defined TTL for that packet to be forwarded across
|
|
that link. For example, the <em/mrouted/ manual page recommends:
|
|
<verb>
|
|
32 for links that separate sites within an organization.
|
|
64 for links that separate communities or organizations, and are
|
|
attached to the Internet MBONE.
|
|
128 for links that separate continents on the MBONE.
|
|
</verb>
|
|
|
|
<P>
|
|
A good way to determine the TTL you need is to run <em/mtrace/ as shown above
|
|
and look at the last line. It will show you the minimum TTL required to
|
|
reach the other host.
|
|
|
|
<P>
|
|
If you set your TTL too high, then your ICP messages may travel ``too far''
|
|
and will be subject to eavesdropping by others.
|
|
If you're only using multicast on your LAN, as we suggest, then your TTL will
|
|
be quite small, for example <em/ttl=4/.
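<P>
As a minimal sketch, a LAN-only multicast peer line in <em/squid.conf/ might
look like the following. The group address is the sample one used above, and
<em/ttl=4/ is only an illustrative value; use the minimum reported by
<em/mtrace/ plus a small margin.
<verb>
cache_host 224.9.9.9 multicast 3128 3130 ttl=4
</verb>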
|
|
|
|
<sect1>How do I configure Squid to receive and respond to Multicast ICP?
|
|
|
|
<P>
|
|
You must tell Squid to join a multicast group address with the
|
|
<em/mcast_groups/ directive. For example:
|
|
<verb>
|
|
mcast_groups 224.9.9.9
|
|
</verb>
|
|
Of course, all members of your Multicast ICP group will need to use the
|
|
exact same multicast group address.
|
|
|
|
<P>
|
|
<bf/NOTE:/ Choose a multicast group address with care! If two organizations
|
|
happen to choose the same multicast address, then they may find that their
|
|
groups ``overlap'' at some point. This will be especially true if one of the
|
|
querying caches uses a large TTL value. There are two ways to reduce the risk
|
|
of group overlap:
|
|
<enum>
|
|
<item>
|
|
Use a unique group address
|
|
<item>
|
|
Limit the scope of multicast messages with TTLs or administrative scoping.
|
|
</enum>
|
|
|
|
<P>
|
|
Using a unique address is a good idea, but not without some potential
|
|
problems. If you choose an address randomly, how do you know that
|
|
someone else will not also randomly choose the same address? NLANR
|
|
has been assigned a block of multicast addresses by the IANA for use
|
|
in situations such as this. If you would like to be assigned one
|
|
of these addresses, please <url url="mailto:nlanr-cache@nlanr.net"
|
|
name="write to us">. However, note that neither NLANR nor IANA has any
|
|
authority to prevent anyone from using an address assigned to you.
|
|
|
|
<P>
|
|
Limiting the scope of your multicast messages is probably a better
|
|
solution. They can be limited with the TTL value discussed above, or
|
|
with some newer techniques known as administratively scoped
|
|
addresses. Here you can configure well-defined boundaries for the
|
|
traffic to a specific address. The
|
|
<url url="http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2365.txt" name="Administratively Scoped IP Multicast RFC">
|
|
describes this.
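<P>
For illustration only, the sketch below uses an address from the
239.255.0.0/16 ``IPv4 Local Scope'' range defined in RFC 2365. The specific
address is an assumption, not an assigned one, and scoping only works if your
routers are configured to enforce the boundary.
<verb>
# on each responding cache
mcast_groups 239.255.9.9

# on each querying cache
cache_host 239.255.9.9 multicast 3128 3130 ttl=4
</verb>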
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>System-Dependent Weirdnesses
|
|
|
|
<sect1>Solaris
|
|
|
|
<sect2>select()
|
|
<P>
|
|
<em/select(3c)/ won't handle more than 1024 file descriptors. The
|
|
<em/configure/ script should enable <em/poll()/ by default for
|
|
Solaris. <em/poll()/ allows you to use many more file descriptors,
|
|
probably 8192 or more.
|
|
|
|
<p>
|
|
For older Squid versions you can enable <em/poll()/
|
|
manually by changing HAVE_POLL in <em>include/autoconf.h</em>, or
|
|
by adding -DUSE_POLL=1 to the DEFINES in src/Makefile.
|
|
|
|
<sect2>malloc
|
|
<P>
|
|
libmalloc.a is leaky. Squid's configure does not use -lmalloc on Solaris.
|
|
|
|
<sect2>DNS lookups and <em/nscd/
|
|
<P>
|
|
by <url url="mailto:david@avarice.nepean.uws.edu.au" name="David J N Begley">.
|
|
<P>
|
|
DNS lookups can be slow because of some mysterious thing called
|
|
<bf/nscd/. You should edit <em>/etc/nscd.conf</em> and make it say:
|
|
<verb>
|
|
enable-cache hosts no
|
|
</verb>
|
|
<P>
|
|
Apparently nscd serializes DNS queries thus slowing everything down when
|
|
an application (such as Squid) hits the resolver hard. You may notice
|
|
something similar if you run a log processor executing many DNS resolver
|
|
queries - the resolver starts to slow.. right.. down.. . . .
|
|
|
|
<p>
|
|
According to
|
|
<url url="mailto:andre at online dot ee" name="Andres Kroonmaa">,
|
|
users of Solaris 2.6 and later should NOT
|
|
completely disable the <em/nscd/ daemon. <em/nscd/ should be running and
|
|
caching passwd and group files, although it is suggested to
|
|
disable hosts caching as it may interfere with DNS lookups.
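<P>
Put together, the relevant lines of <em>/etc/nscd.conf</em> might look like
the sketch below. It assumes the stock Solaris <em/enable-cache/ syntax shown
earlier; check the file shipped with your release before editing it.
<verb>
        enable-cache    passwd  yes
        enable-cache    group   yes
        enable-cache    hosts   no
</verb>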
|
|
|
|
<p>
|
|
Several library calls rely on available free FILE descriptors
|
|
FD < 256. Systems running without nscd may fail on such calls
|
|
if the first 256 file descriptors are all in use.
|
|
|
|
<p>
|
|
Since Solaris 2.6, Sun has changed the way some system calls
work and uses the <em/nscd/ daemon to implement them. To
communicate with <em/nscd/, Solaris uses undocumented door calls.
Basically, <em/nscd/ is used to reduce the memory usage of user-space
system libraries that use the passwd and group files. Before 2.6,
Solaris cached the full passwd file in library memory on first
use, but because this was considered to use too much RAM on large
multiuser systems, Sun decided to move the implementation of
these calls out of the libraries and into a single dedicated daemon.
|
|
|
|
<sect2>DNS lookups and <em>/etc/nsswitch.conf</em>
|
|
<P>
|
|
by <url url="mailto:ARMISTEJ@oeca.otis.com" name="Jason Armistead">.
|
|
<P>
|
|
The <em>/etc/nsswitch.conf</em> file determines the order of searches
|
|
for lookups (amongst other things). You might only have it set up to
|
|
allow NIS and HOSTS files to work. You definitely want the "hosts:"
|
|
line to include the word <em/dns/, e.g.:
|
|
<verb>
|
|
hosts: nis dns [NOTFOUND=return] files
|
|
</verb>
|
|
|
|
<sect2>DNS lookups and NIS
|
|
<P>
|
|
by <url url="mailto:cudch@csv.warwick.ac.uk" name="Chris Tilbury">.
|
|
|
|
<P>
|
|
Our site cache is running on a Solaris 2.6 machine. We use NIS to distribute
|
|
authentication and local hosts information around and in common with our
|
|
multiuser systems, we run a slave NIS server on it to help the response of
|
|
NIS queries.
|
|
|
|
<P>
|
|
We were seeing very high name-ip lookup times (avg ˜2sec)
|
|
and ip->name lookup times (avg ˜8 sec), although there didn't
|
|
seem to be that much of a problem with response times for valid
|
|
sites until the cache was being placed under high load. Then,
|
|
performance went down the toilet.
|
|
|
|
<P>
|
|
After some time, and a bit of detective work, we found the problem.
|
|
On Solaris 2.6, if you have a local NIS server running (<em/ypserv/)
|
|
and you have NIS in your <em>/etc/nsswitch.conf</em> hosts entry,
|
|
then check the flags it is being started with. The 2.6 ypstart
|
|
script checks to see if there is a <em/resolv.conf/ file present
|
|
when it starts ypserv. If there is, then it starts it with the
|
|
<em/-d/ option.
|
|
|
|
<P>
|
|
This has the same effect as putting the <em/YP_INTERDOMAIN/ key in
|
|
the hosts table -- namely, that failed NIS host lookups are tried
|
|
against the DNS by the NIS server.
|
|
|
|
<P>
|
|
This is a <bf/bad thing(tm)/! If NIS itself tries to resolve names
|
|
using the DNS, then the requests are serialised through the NIS
|
|
server, creating a bottleneck (This is the same basic problem that
|
|
is seen with <em/nscd/). Thus, one failing or slow lookup can, if
|
|
you have NIS before DNS in the service switch file (which is the
|
|
most common setup), hold up every other lookup taking place.
|
|
|
|
<P>
|
|
If you're running in this kind of setup, then you will want to make
|
|
sure that
|
|
|
|
<enum>
|
|
<item>ypserv doesn't start with the <em/-d/ flag.
|
|
<item>you don't have the <em/YP_INTERDOMAIN/ key in the hosts table
|
|
(find the <em/B=-b/ line in the yp Makefile and change it to <em/B=/)
|
|
</enum>
|
|
|
|
<P>
|
|
We changed these here, and saw our average lookup times drop by up
|
|
to an order of magnitude (˜150msec for name-ip queries and
|
|
˜1.5sec for ip-name queries, the latter still so high, I
|
|
suspect, because more of these fail and timeout since they are not
|
|
made so often and the entries are frequently non-existent anyway).
|
|
|
|
<sect2>Tuning
|
|
<P>
|
|
<url url="http://www.rvs.uni-hannover.de/people/voeckler/tune/EN/tune.html"
|
|
name="Solaris 2.x - tuning your TCP/IP stack and more"> by <url
|
|
url="http://www.rvs.uni-hannover.de/people/voeckler/" name="Jens-S.
|
|
V&ouml;ckler">
|
|
|
|
<sect2>disk write error: (28) No space left on device
|
|
<P>
|
|
You might get this error even if your disk is not full, and is not out
|
|
of inodes. Check your syslog logs (/var/adm/messages, normally) for
|
|
messages like either of these:
|
|
<verb>
|
|
NOTICE: realloccg /proxy/cache: file system full
|
|
NOTICE: alloc: /proxy/cache: file system full
|
|
</verb>
|
|
|
|
<P>
|
|
In a nutshell, the UFS filesystem used by Solaris can't cope with the
|
|
workload squid presents to it very well. The filesystem will end up
|
|
becoming highly fragmented, until it reaches a point where there are
|
|
insufficient free blocks left to create files with, and only fragments
|
|
available. At this point, you'll get this error and squid will revise
|
|
its idea of how much space is actually available to it. You can do a
|
|
"fsck -n raw_device" (no need to unmount, this checks in read only mode)
|
|
to look at the fragmentation level of the filesystem. It will probably
|
|
be quite high (>15%).
|
|
|
|
<P>
|
|
Sun suggest two solutions to this problem. One costs money, the other is
|
|
free but may result in a loss of performance (although Sun do claim it
|
|
shouldn't, given the already highly random nature of squid disk access).
|
|
|
|
<P>
|
|
The first is to buy a copy of VxFS, the Veritas Filesystem. This is an
|
|
extent-based filesystem and it's capable of having online defragmentation
|
|
performed on mounted filesystems. This costs money, however (VxFS is not
|
|
very cheap!)
|
|
|
|
<P>
|
|
The second is to change certain parameters of the UFS filesystem. Unmount
|
|
your cache filesystems and use tunefs to change optimization to "space" and
|
|
to reduce the "minfree" value to 3-5% (under Solaris 2.6 and higher, very
|
|
large filesystems will almost certainly have a minfree of 2% already and you
|
|
shouldn't increase this). You should be able to get fragmentation down to
|
|
around 3% by doing this, with an accompanying increase in the amount of space
|
|
available.
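<P>
As a rough sketch, the sequence might look like this. Stop Squid first so the
filesystem can be unmounted. The mount point is the one from the log messages
above; the raw device name is hypothetical, and 3% is just one value from the
suggested range.
<verb>
# umount /proxy/cache
# tunefs -o space -m 3 /dev/rdsk/c0t1d0s6
# mount /proxy/cache
</verb>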
|
|
|
|
<P>
|
|
Thanks to <url url="mailto:cudch@csv.warwick.ac.uk" name="Chris Tilbury">.
|
|
|
|
<sect2>Solaris X86 and IPFilter
|
|
<P>
|
|
by <url url="mailto:jeff@sisna.com" name="Jeff Madison">
|
|
<P>
|
|
Important update regarding Squid running on Solaris x86. I have been
|
|
working for several months to resolve what appeared to be a memory leak in
|
|
squid when running on Solaris x86 regardless of the malloc that was used. I
|
|
have made 2 discoveries that anyone running Squid on this platform may be
|
|
interested in.
|
|
<P>
|
|
Number 1: There is not a memory leak in Squid even though after the system
|
|
runs for some amount of time, this varies depending on the load the system
|
|
is under, Top reports that there is very little memory free. True to the
|
|
claims of the Sun engineer I spoke to this statistic from Top is incorrect.
|
|
The odd thing is that you do begin to see performance suffer substantially
|
|
as time goes on and the only way to correct the situation is to reboot the
|
|
system. This leads me to discovery number 2.
|
|
<P>
|
|
Number 2: There is some type of resource problem, memory or other, with
|
|
IPFilter on Solaris x86. I have not taken the time to investigate what the
|
|
problem is because we no longer are using IPFilter. We have switched to an
|
|
Alteon ACE 180 Gigabit switch which will do the trans-proxy for you. After
|
|
moving the trans-proxy, redirection process out to the Alteon switch Squid
|
|
has run for 3 days straight under a huge load with no problem whatsoever.
|
|
We currently have 2 boxes with 40 GB of cached objects on each box. This 40
|
|
GB was accumulated in the 3 days, from this you can see what type of load
|
|
these boxes are under. Prior to this change we were never able to operate
|
|
for more than 4 hours.
|
|
<P>
|
|
Because the problem appears to be with IPFilter I would guess that you
|
|
would only run into this issue if you are trying to run Squid as a
|
|
transparent proxy using IPFilter. That makes sense. If there is anyone
|
|
with information that would indicate my findings are incorrect, I am willing
|
|
to investigate further.
|
|
|
|
<sect2>Changing the directory lookup cache size
|
|
<P>
|
|
by <url url="mailto:mbatchelor@citysearch.com" name="Mike Batchelor">
|
|
<P>
|
|
On Solaris, the kernel variable for the directory name lookup cache size is
|
|
<em>ncsize</em>. In <em>/etc/system</em>, you might want to try
|
|
<verb>
|
|
set ncsize = 8192
|
|
</verb>
|
|
or even
|
|
higher. The kernel variable <em/ufs_inode/ - which is the size of the inode
|
|
cache itself - scales with <em/ncsize/ in Solaris 2.5.1 and later. Previous
|
|
versions of Solaris required both to be adjusted independently, but now, it is
|
|
not recommended to adjust <em/ufs_inode/ directly on 2.5.1 and later.
|
|
<P>
|
|
You can set <em/ncsize/ quite high, but at some point - dependent on the
|
|
application - a too-large <em/ncsize/ will increase the latency of lookups.
|
|
<P>
|
|
Defaults are:
|
|
<verb>
|
|
Solaris 2.5.1 : (max_nprocs + 16 + maxusers) + 64
|
|
Solaris 2.6/Solaris 7 : 4 * (max_nprocs + maxusers) + 320
|
|
</verb>
|
|
|
|
<sect2>The priority_paging algorithm
|
|
<P>
|
|
by <url url="mailto:mbatchelor@citysearch.com" name="Mike Batchelor">
|
|
<P>
|
|
Another new tuneable (actually a toggle) in Solaris 2.5.1, 2.6 or Solaris 7 is
|
|
the <em/priority_paging/ algorithm. This is actually a complete rewrite of the
|
|
virtual memory system on Solaris. It will page out application data last, and
|
|
filesystem pages first, if you turn it on (set <em/priority_paging/ = 1 in
|
|
<em>/etc/system</em>). As you may know, the Solaris buffer cache grows to fill
|
|
available pages, and under the old VM system, applications could get paged out
|
|
to make way for the buffer cache, which can lead to swap thrashing and
|
|
degraded application performance. The new <em/priority_paging/ helps keep
|
|
application and shared library pages in memory, preventing the buffer cache
|
|
from paging them out, until memory gets REALLY short. Solaris 2.5.1 requires
|
|
patch 103640-25 or higher and Solaris 2.6 requires 105181-10 or higher to get
|
|
priority_paging. Solaris 7 needs no patch, but all versions have it turned
|
|
off by default.
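<P>
A minimal <em>/etc/system</em> fragment combining this toggle with the
<em/ncsize/ setting from the previous section might look like this. The
<em/ncsize/ value is the example from above, not a recommendation, and changes
to <em>/etc/system</em> take effect only after a reboot.
<verb>
set ncsize = 8192
set priority_paging = 1
</verb>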
|
|
|
|
<sect1>FreeBSD
|
|
|
|
<sect2>T/TCP bugs
|
|
<P>
|
|
We have found that with FreeBSD-2.2.2-RELEASE, there are some bugs with T/TCP. FreeBSD will
|
|
try to use T/TCP if you've enabled the ``TCP Extensions.'' To disable T/TCP,
|
|
use <em/sysinstall/ to disable TCP Extensions,
|
|
or edit <em>/etc/rc.conf</em> and set
|
|
<verb>
|
|
tcp_extensions="NO" # Allow RFC1323 & RFC1644 extensions (or NO).
|
|
</verb>
|
|
or add this to your /etc/rc files:
|
|
<verb>
|
|
sysctl -w net.inet.tcp.rfc1644=0
|
|
</verb>
|
|
|
|
<sect2>mbuf size
|
|
<P>
|
|
We noticed an odd thing with some of Squid's interprocess communication.
|
|
Often, output from the <em/dnsserver/ processes would NOT be read in
|
|
one chunk. With full debugging, it looks like this:
|
|
|
|
<verb>
|
|
1998/04/02 15:18:48| comm_select: FD 46 ready for reading
|
|
1998/04/02 15:18:48| ipcache_dnsHandleRead: Result from DNS ID 2 (100 bytes)
|
|
1998/04/02 15:18:48| ipcache_dnsHandleRead: Incomplete reply
|
|
....other processing occurs...
|
|
1998/04/02 15:18:48| comm_select: FD 46 ready for reading
|
|
1998/04/02 15:18:48| ipcache_dnsHandleRead: Result from DNS ID 2 (9 bytes)
|
|
1998/04/02 15:18:48| ipcache_parsebuffer: parsing:
|
|
$name www.karup.com
|
|
$h_name www.karup.inter.net
|
|
$h_len 4
|
|
$ipcount 2
|
|
38.15.68.128
|
|
38.15.67.128
|
|
$ttl 2348
|
|
$end
|
|
</verb>
|
|
|
|
Interestingly, it is very common to get only 100 bytes on the first
|
|
read. When two read() calls are required, this adds additional latency
|
|
to the overall request. On our caches running Digital Unix, the median
|
|
<em/dnsserver/ response time was measured at 0.01 seconds. On our
|
|
FreeBSD cache, however, the median latency was 0.10 seconds.
|
|
|
|
<P>
|
|
Here is a simple patch to fix the bug:
|
|
<verb>
|
|
===================================================================
|
|
RCS file: /home/ncvs/src/sys/kern/uipc_socket.c,v
|
|
retrieving revision 1.40
|
|
retrieving revision 1.41
|
|
diff -p -u -r1.40 -r1.41
|
|
--- src/sys/kern/uipc_socket.c 1998/05/15 20:11:30 1.40
|
|
+++ /home/ncvs/src/sys/kern/uipc_socket.c 1998/07/06 19:27:14 1.41
|
|
@@ -31,7 +31,7 @@
|
|
* SUCH DAMAGE.
|
|
*
|
|
* @(#)uipc_socket.c 8.3 (Berkeley) 4/15/94
|
|
- * $Id: FAQ.sgml,v 1.3 2004/09/09 12:36:55 cvsdist Exp $
|
|
+ * $Id: FAQ.sgml,v 1.3 2004/09/09 12:36:55 cvsdist Exp $
|
|
*/
|
|
|
|
#include <sys/param.h>
|
|
@@ -491,6 +491,7 @@ restart:
|
|
mlen = MCLBYTES;
|
|
len = min(min(mlen, resid), space);
|
|
} else {
|
|
+ atomic = 1;
|
|
nopages:
|
|
len = min(min(mlen, resid), space);
|
|
/*
|
|
</verb>
|
|
|
|
|
|
<P>
|
|
Another technique which may help, but does not fix the bug, is to
|
|
increase the kernel's mbuf size.
|
|
The default is 128 bytes. The MSIZE symbol is defined in
|
|
<em>/usr/include/machine/param.h</em>. However, to change it we added
|
|
this line to our kernel configuration file:
|
|
<verb>
|
|
options MSIZE="256"
|
|
</verb>
|
|
|
|
<sect2>Dealing with NIS
|
|
<P>
|
|
<em>/var/yp/Makefile</em> has the following section:
|
|
<verb>
|
|
# The following line encodes the YP_INTERDOMAIN key into the hosts.byname
|
|
# and hosts.byaddr maps so that ypserv(8) will do DNS lookups to resolve
|
|
# hosts not in the current domain. Commenting this line out will disable
|
|
# the DNS lookups.
|
|
B=-b
|
|
</verb>
|
|
You will want to comment out the <em/B=-b/ line so that <em/ypserv/ does not
|
|
do DNS lookups.
|
|
|
|
<sect2>FreeBSD 3.3: The lo0 (loop-back) device is not configured on startup
|
|
<label id="freebsd-no-lo0">
|
|
<p>
|
|
Squid requires the loopback interface to be up and configured. If it is not, you will
|
|
get errors such as <ref id="comm-bind-loopback-fail" name="commBind">.
|
|
<p>
|
|
From <url url="http://www.freebsd.org/releases/3.3R/errata.html" name="FreeBSD 3.3 Errata Notes">:
|
|
<p>
|
|
<quote>
|
|
Fix: Assuming that you experience this problem at all, edit <em>/etc/rc.conf</em>
|
|
and search for where the network_interfaces variable is set. In
|
|
its value, change the word <em/auto/ to <em/lo0/ since the auto keyword
|
|
doesn't bring the loop-back device up properly, for reasons yet to
|
|
be adequately determined. Since your other interface(s) will already
|
|
be set in the network_interfaces variable after initial installation,
|
|
it's reasonable to simply s/auto/lo0/ in rc.conf and move on.
|
|
</quote>
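<p>
For example, assuming a machine whose only other interface is a hypothetical
<em/fxp0/, the change would look like this:
<verb>
# before
network_interfaces="fxp0 auto"
# after
network_interfaces="fxp0 lo0"
</verb>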
|
|
<p>
|
|
Thanks to <url url="mailto:robl at lentil dot org" name="Robert Lister">.
|
|
|
|
|
|
<sect2>FreeBSD 3.x or newer: Speed up disk writes using Softupdates
|
|
<label id="freebsd-softupdates">
|
|
<p>
|
|
by <url url="mailto:andre.albsmeier@mchp.siemens.de" name="Andre Albsmeier">
|
|
|
|
<p>
|
|
FreeBSD 3.x and newer support Softupdates, a mechanism that
speeds up disk writes much as mounting UFS volumes
async does. However, Softupdates achieves performance
similar to or better than async without sacrificing safety
in the case of a system crash. For more detailed information and the
|
|
copyright terms see <em>/sys/contrib/softupdates/README</em> and
|
|
<em>/sys/ufs/ffs/README.softupdate</em>.
|
|
|
|
<p>
|
|
To build a system supporting softupdates, you have to build
|
|
a kernel with <tt>options SOFTUPDATES</tt> set (see <em/LINT/ for a commented
|
|
out example). After rebooting with the new kernel, you can enable
|
|
softupdates on a per-filesystem basis with the command:
|
|
<verb>
|
|
$ tunefs -n enable /mountpoint
|
|
</verb>
|
|
The filesystem in question MUST NOT be mounted at
|
|
this time. After that, softupdates are permanently enabled and the
|
|
filesystem can be mounted normally. To verify that the softupdates
|
|
code is running, simply issue a mount command and an output similar
|
|
to the following will appear:
|
|
<verb>
|
|
$ mount
|
|
/dev/da2a on /usr/local/squid/cache (ufs, local, noatime, soft-updates, writes: sync 70 async 225)
|
|
</verb>
|
|
|
|
<sect1>OSF1/3.2
|
|
|
|
<P>
|
|
If you compile both libgnumalloc.a and Squid with <em/cc/, the <em/mstats()/
|
|
function returns bogus values. However, if you compile libgnumalloc.a with
|
|
<em/gcc/, and Squid with <em/cc/, the values are correct.
|
|
|
|
<sect1>BSD/OS
|
|
<sect2>gcc/yacc
|
|
<P>
|
|
Some people report
|
|
<ref id="bsdi-compile" name="difficulties compiling squid on BSD/OS">.
|
|
|
|
<sect2>process priority
|
|
<P>
|
|
<it>
|
|
I've noticed that my Squid process
|
|
seems to stick at a nice value of four, and clicks back to that even
|
|
after I renice it to a higher priority. However, looking through the
|
|
Squid source, I can't find any instance of a setpriority() call, or
|
|
anything else that would seem to indicate Squid's adjusting its own
|
|
priority.
|
|
</it>
|
|
<P>
|
|
by <url url="mailto:bogstad@pobox.com" name="Bill Bogstad">
|
|
<P>
|
|
BSD Unices traditionally have auto-niced non-root processes to 4 after
|
|
they used a lot (4 minutes???) of CPU time. My guess is that it's the BSD/OS
|
|
not Squid that is doing this. I don't know offhand if there is a way to
|
|
disable this on BSD/OS.
|
|
<P>
|
|
by <url url="mailto:Arjan.deVet@adv.iae.nl" name="Arjan de Vet">
|
|
<P>
|
|
You can get around this by
|
|
starting Squid with nice-level -4 (or another negative value).
|
|
<p>
|
|
by <url url="mailto:bert_driehuis at nl dot compuware dot com" name="Bert Driehuis">
|
|
<p>
|
|
The autonice behavior is a leftover from the history of BSD as a
|
|
university OS. It penalises CPU bound jobs by nicing them after using 600
|
|
CPU seconds.
|
|
Adding
|
|
<verb>
|
|
sysctl -w kern.autonicetime=0
|
|
</verb>
|
|
to <em>/etc/rc.local</em> will disable the behavior systemwide.
|
|
|
|
|
|
|
|
<sect1>Linux
|
|
|
|
<sect2>Cannot bind socket FD 5 to 127.0.0.1:0: (49) Can't assign requested address
|
|
|
|
<P>
|
|
Try a different version of Linux. We have received many reports of this
|
|
``bug'' from people running Linux 2.0.30. The <em/bind(2)/ system
|
|
call should NEVER give this error when binding to port 0.
|
|
|
|
<sect2>FATAL: Don't run Squid as root, set 'cache_effective_user'!
|
|
<P>
|
|
Some users have reported that setting <tt/cache_effective_user/
|
|
to <tt/nobody/ under Linux does not work.
|
|
However, it appears that using any <tt/cache_effective_user/ other
|
|
than <tt/nobody/ will succeed. One solution is to create a
|
|
user account for Squid and set <tt/cache_effective_user/ to that.
|
|
Alternately you can change the UID for the <tt/nobody/ account
|
|
from 65535 to 65534.
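<P>
A sketch of the first approach, assuming you have created a dedicated
account named <em/squid/ (the name is arbitrary):
<verb>
cache_effective_user squid
</verb>
The cache directories and log files must be writable by that user.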
|
|
<P>
|
|
Another problem is that RedHat 5.0 Linux seems to have a broken
|
|
<em/setresuid()/ function. There are two ways to fix this.
|
|
Before running configure:
|
|
<verb>
|
|
% setenv ac_cv_func_setresuid no
|
|
% ./configure ...
|
|
% make clean
|
|
% make install
|
|
</verb>
|
|
Or after running configure, manually edit include/autoconf.h and
|
|
change the HAVE_SETRESUID line to:
|
|
<verb>
|
|
#define HAVE_SETRESUID 0
|
|
</verb>
|
|
|
|
<P>
|
|
Also, some users report this error is due to a NIS configuration
|
|
problem. By adding <em/compat/ to the <em/passwd/ and <em/group/
|
|
lines of <em>/etc/nsswitch.conf</em>, the problem goes away.
|
|
(<url url="mailto:acli@ada.ddns.org" name="Ambrose Li">).
|
|
|
|
<P>
|
|
<URL URL="mailto:galifrey@crown.net" name="Russ Mellon"> notes
|
|
that these problems with <em/cache_effective_user/ are fixed in
|
|
version 2.2.x of the Linux kernel.
|
|
|
|
<sect2>Large ACL lists make Squid slow
|
|
<P>
|
|
The regular expression library which comes with Linux is known
|
|
to be very slow. Some people report it entirely fails to work
|
|
after long periods of time.
|
|
|
|
<P>
|
|
To fix, use the GNUregex library included with the Squid source code.
|
|
With Squid-2, use the <em/--enable-gnuregex/ configure option.
|
|
|
|
<sect2>gethostbyname() leaks memory in RedHat 6.0 with glibc 2.1.1.
|
|
<p>
|
|
by <url url="mailto:radu at netsoft dot ro" name="Radu Greab">
|
|
<p>
|
|
The gethostbyname() function leaks memory in RedHat
|
|
6.0 with glibc 2.1.1. The quick fix is to delete nisplus service from
|
|
hosts entry in <em>/etc/nsswitch.conf</em>. In my tests dnsserver memory use
|
|
remained stable after I made the above change.
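<p>
As an illustration, the edit to <em>/etc/nsswitch.conf</em> might look like
this. The ``before'' line is an assumption about a typical default; your
file may list different services.
<verb>
# before
hosts:      files nisplus nis dns
# after
hosts:      files nis dns
</verb>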
|
|
|
|
<p>
|
|
See <url url="http://developer.redhat.com/bugzilla/show_bug.cgi?id=3919" name="RedHat bug id 3919">.
|
|
|
|
<sect2>assertion failed: StatHist.c:91: `statHistBin(H, max) == H->capacity - 1' on Alpha system.
|
|
<p>
|
|
by <url url="mailto:jraymond@gnu.org" name="Jamie Raymond">
|
|
<p>
|
|
Some early versions of Linux have a kernel bug that causes this.
|
|
All that is needed is a recent kernel that doesn't have the mentioned bug.
|
|
|
|
<sect2>tools.c:605: storage size of `rl' isn't known
|
|
<p>
|
|
This is a bug with some versions of glibc. The glibc headers
|
|
incorrectly depended on the contents of some kernel headers.
|
|
Everything broke down when the kernel folks rearranged a bit in
|
|
the kernel-specific header files.
|
|
<p>
|
|
We think this glibc bug is present in versions
|
|
2.1.1 (or 2.1.0) and earlier. There are two solutions:
|
|
<enum>
|
|
<item>
|
|
Make sure /usr/include/linux and /usr/include/asm are from the kernel
|
|
version glibc is built/configured for, not any other kernel version.
|
|
Only compiling of loadable kernel modules outside of the kernel sources
|
|
depends on having the current versions of these, and for such builds
|
|
-I/usr/src/linux/include (or wherever the new kernel headers are
|
|
located) can be used to resolve the matter.
|
|
|
|
<item>
|
|
Upgrade glibc to 2.1.2 or later. This is always a good idea anyway,
|
|
provided a prebuilt upgrade package exists for the Linux distribution
|
|
used. Note: Do not attempt to manually build and install glibc from
|
|
source unless you know exactly what you are doing, as this can easily
|
|
render the system unusable.
|
|
</enum>
|
|
|
|
<sect2>Can't connect to some sites through Squid
|
|
<p>
|
|
When using Squid, some sites may give errors such as
|
|
``(111) Connection refused'' or ``(110) Connection timed out''
|
|
although these sites work fine without going through Squid.
|
|
<p>
|
|
Some versions of Linux implement
|
|
<url url="ftp://ftp.isi.edu/in-notes/rfc2481.txt" name="Explicit
|
|
Congestion Notification"> (ECN) and this can cause
|
|
some TCP connections to fail. You can disable ECN with
|
|
the following command:
|
|
<verb>
|
|
echo 0 >/proc/sys/net/ipv4/tcp_ecn
|
|
</verb>
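<p>
The <em/echo/ command above does not survive a reboot. On distributions that
load <em>/etc/sysctl.conf</em> at boot (an assumption; not all do), the
equivalent persistent setting would be:
<verb>
net.ipv4.tcp_ecn = 0
</verb>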
|
|
<p>
|
|
See also the <url url="http://answerpointe.cctec.com/maillists/nanog/historical/0104/msg00714.html" name="thread on the NANOG mailing list">.
|
|
|
|
|
|
<sect1>HP-UX
|
|
|
|
<sect2>StatHist.c:74: failed assertion `statHistBin(H, min) == 0'
|
|
<P>
|
|
This was a very mysterious and unexplainable bug with GCC on HP-UX.
|
|
Certain functions, when specified as <em/static/, would cause
|
|
math bugs. The compiler also failed to handle implied
|
|
int-double conversions properly. These bugs should all be
|
|
handled correctly in Squid version 2.2.
|
|
|
|
<sect1>IRIX
|
|
|
|
<sect2><em/dnsserver/ always returns 255.255.255.255
|
|
<P>
|
|
There is a problem with GCC (2.8.1 at least) on
|
|
Irix 6 which causes it to always return the string 255.255.255.255 for _ANY_
|
|
address when calling inet_ntoa(). If this happens to you, compile Squid
|
|
with the native C compiler instead of GCC.
|
|
|
|
<sect1>SCO-UNIX
|
|
<P>
|
|
by <url url="mailto:f.j.bosscha@nhl.nl" name="F.J. Bosscha">
|
|
<P>
|
|
To make Squid run comfortably on SCO-UNIX, you need to do the following:
|
|
<P>
|
|
Increase the <em/NOFILES/ parameter and the <em/NUMSP/ parameter and compile squid.
Although squid reported in the cache.log file that it had 3000
file descriptors, I still had problems with messages that no more
file descriptors were available. After I also increased the NUMSP value,
the problems were gone.
|
|
|
|
<P>
|
|
One thing left is the number of tcp-connections the system can handle.
|
|
The default is 256, but I increased that as well because of the number of
|
|
clients we have.
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>Redirectors
|
|
|
|
<sect1>What is a redirector?
|
|
|
|
<P>
|
|
Squid has the ability to rewrite requested URLs. Implemented
|
|
as an external process (similar to a dnsserver), Squid can be
|
|
configured to pass every incoming URL through a <em/redirector/ process
|
|
that returns either a new URL, or a blank line to indicate no change.
|
|
|
|
<P>
|
|
The <em/redirector/ program is <bf/NOT/ a standard part of the Squid
|
|
package. However, some examples are provided below, and in the
|
|
"contrib/" directory of the source distribution. Since everyone has
|
|
different needs, it is up to the individual administrators to write
|
|
their own implementation.
|
|
|
|
<sect1>Why use a redirector?
|
|
|
|
<P>
|
|
A redirector allows the administrator to control the locations to which
|
|
users can go. Using this in conjunction with transparent proxies
|
|
allows simple but effective porn control.
|
|
|
|
<sect1>How does it work?
|
|
|
|
<P>
|
|
The redirector program must read URLs (one per line) on standard input,
|
|
and write rewritten URLs or blank lines on standard output. Note that
|
|
the redirector program can not use buffered I/O. Squid writes
|
|
additional information after the URL which a redirector can use to make
|
|
a decision. The input line consists of four fields:
|
|
<verb>
|
|
URL ip-address/fqdn ident method
|
|
</verb>
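<P>
For illustration, a request line as the redirector might see it is sketched
below. The URL and address are made up; the sketch assumes that Squid writes
``-'' where the FQDN or ident is not known.
<verb>
http://www.example.com/index.html 192.168.1.10/- - GET
</verb>
A redirector that only cares about the URL can split the line on whitespace
and look at the first field.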
|
|
|
|
|
|
<sect1>Do you have any examples?
|
|
|
|
<P>
|
|
A simple, very fast redirector called <url
|
|
url="http://www.senet.com.au/squirm/" name="SQUIRM"> is a good place to
|
|
start; it uses the regex library to allow pattern matching.
|
|
|
|
<P>
|
|
Also see <url url="http://ivs.cs.uni-magdeburg.de/%7eelkner/webtools/jesred/"
|
|
name="jesred">.
|
|
|
|
<P>
|
|
The following Perl script may also be used as a template for writing
|
|
your own redirector:
|
|
<verb>
|
|
#!/usr/local/bin/perl
|
|
$|=1;
|
|
while (<>) {
|
|
s@http://fromhost.com@http://tohost.org@;
|
|
print;
|
|
}
|
|
</verb>
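<P>
To hook such a script into Squid, point <em/redirect_program/ at it in
<em/squid.conf/. The path below is hypothetical, and the number of helper
processes is only a starting point.
<verb>
redirect_program /usr/local/squid/bin/redirect.pl
redirect_children 5
</verb>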
|
|
|
|
|
|
<sect1>Can I use the redirector to return HTTP redirect messages?
|
|
<P>
|
|
Normally, the <em/redirector/ feature is used to rewrite requested URLs.
|
|
Squid then transparently requests the new URL. However, in some situations,
|
|
it may be desirable to return an HTTP "301" or "302" redirect message
|
|
to the client. This is possible as of Squid version 1.1.19.
|
|
|
|
<P>
|
|
Simply modify your redirector program to place either "301:" or "302:"
|
|
before the new URL. For example, the following script might be used
|
|
to direct external clients to a secure Web server for internal documents:
|
|
<verb>
|
|
#!/usr/local/bin/perl
|
|
$|=1;
|
|
while (<>) {
|
|
@X = split;
|
|
$url = $X[0];
|
|
if ($url =~ /^http:\/\/internal\.foo\.com/) {
|
|
$url =~ s/^http/https/;
|
|
$url =~ s/internal/secure/;
|
|
print "302:$url\n";
|
|
} else {
|
|
print "$url\n";
|
|
}
|
|
}
|
|
</verb>
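<P>
Tracing the script by hand with a made-up request line shows the effect:
<verb>
input:   http://internal.foo.com/doc.html 10.0.0.1/- - GET
output:  302:https://secure.foo.com/doc.html
</verb>
Any URL that does not start with http://internal.foo.com is echoed back
unchanged.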
|
|
|
|
<P>
|
|
Please see sections 10.3.2 and 10.3.3 of
|
|
<url url="http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2068.txt"
|
|
name="RFC 2068">
|
|
for an explanation of the 301 and 302 HTTP reply codes.
|
|
|
|
<sect1>FATAL: All redirectors have exited!
|
|
<label id="redirectors-exit">
|
|
|
|
<P>
|
|
A redirector process must <bf/never/ exit (stop running). If you see
|
|
the ``All redirectors have exited'' message, it probably means your
|
|
redirector program has a bug. Maybe it runs out of memory or has memory
|
|
access errors. You may want to test your redirector program outside of
|
|
squid with a big input list, taken from your <em/access.log/ perhaps.
|
|
Also, check for <ref id="coredumps" name="coredump"> files from the redirector program.
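<P>
One way to build such an input list is sketched below in Python. It assumes the
native <em/access.log/ format, in which the client address, request method, URL
and ident are the 3rd, 6th, 7th and 8th whitespace-separated fields; the
function name and the sample log line are hypothetical. The FQDN is not logged,
so "-" is used in its place.

```python
def access_log_to_redirector_input(log_line: str) -> str:
    """Build one redirector input line from one native-format access.log line."""
    fields = log_line.split()
    if len(fields) < 9:
        raise ValueError(f"too few fields in access.log line: {log_line!r}")
    client, method, url, ident = fields[2], fields[5], fields[6], fields[7]
    # The redirector expects "URL ip-address/fqdn ident method".
    return f"{url} {client}/- {ident} {method}"


# Made-up log line, for illustration only.
sample = ("918273645.123 210 192.0.2.7 TCP_MISS/200 4512 GET "
          "http://www.example.com/ - DIRECT/www.example.com text/html")
print(access_log_to_redirector_input(sample))
```

The resulting lines can be saved to a file and piped into the redirector
program; the program should print one reply per line and still be running when
the input ends.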
|
|
|
|
<sect1>Redirector interface is broken re IDENT values
|
|
<p>
|
|
<it>
|
|
I added a redirector consisting of
|
|
</it>
|
|
<verb>
|
|
#! /bin/sh
|
|
/usr/bin/tee /tmp/squid.log
|
|
</verb>
|
|
<it>
|
|
and many of the redirector requests don't have a username in the
|
|
ident field.
|
|
</it>
|
|
|
|
<p>
|
|
Squid does not delay a request to wait for an ident lookup,
|
|
unless you use the ident ACLs. Thus, it is very likely that
|
|
the ident was not available at the time of calling the redirector,
|
|
but became available by the time the request is complete and
|
|
logged to access.log.
|
|
<p>
|
|
If you want to block requests waiting for ident lookup, try something
|
|
like this:
|
|
<verb>
|
|
acl foo ident REQUIRED
|
|
http_access allow foo
|
|
</verb>
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>Cache Digests
|
|
<label id="cache-digests">
|
|
|
|
<P>
|
|
<EM>Cache Digest FAQs compiled by
|
|
<URL url="mailto:ndoherty@eei.ericsson.se" name="Niall Doherty">.
|
|
</EM>
|
|
|
|
<SECT1>What is a Cache Digest?
|
|
|
|
<P>
|
|
A Cache Digest is a summary of the contents of an Internet Object Caching
|
|
Server.
|
|
It contains, in a compact (i.e. compressed) format, an indication of whether
|
|
or not particular URLs are in the cache.
|
|
|
|
<P>
|
|
A "lossy" technique is used for compression, which means that very high
|
|
compression factors can be achieved at the expense of not having 100%
|
|
correct information.
|
|
|
|
|
|
<SECT1>How and why are they used?
|
|
|
|
<P>
|
|
Cache servers periodically exchange their digests with each other.
|
|
|
|
<P>
|
|
When a request for an object (URL) is received from a client a cache
|
|
can use digests from its peers to find out which of its peers (if any)
|
|
have that object.
|
|
The cache can then request the object from the closest peer (Squid
|
|
uses the NetDB database to determine this).
|
|
|
|
<P>
|
|
Note that Squid will only make digest queries in those digests that are
|
|
<EM>enabled</EM>.
|
|
It will disable a peer's digest if and only if it cannot fetch a valid digest
|
|
for that peer.
|
|
It will enable that peer's digest again when a valid one is fetched.
|
|
|
|
<P>
|
|
The checks in the digest are very fast and they eliminate the need
|
|
for per-request queries to peers. Hence:
|
|
|
|
<P>
|
|
<ITEMIZE>
|
|
<ITEM>Per-request query latency is eliminated and client response time should be improved.
|
|
<ITEM>Network utilisation may be improved.
|
|
</ITEMIZE>
|
|
|
|
<P>
|
|
Note that the use of Cache Digests (for querying the cache contents of peers)
|
|
and the generation of a Cache Digest (for retrieval by peers) are independent.
|
|
So, it is possible for a cache to make a digest available for peers, and not
|
|
use the functionality itself and vice versa.
|
|
|
|
|
|
<SECT1>What is the theory behind Cache Digests?
|
|
|
|
<P>
|
|
Cache Digests are based on Bloom Filters - they are a method for
|
|
representing a set of keys with lookup capabilities;
|
|
where lookup means "is the key in the filter or not?".
|
|
|
|
<P>
|
|
In building a cache digest:
|
|
|
|
<P>
|
|
<ITEMIZE>
|
|
<ITEM> A vector (1-dimensional array) of m bits is allocated, with all
|
|
bits initially set to 0.
|
|
<ITEM> A number, k, of independent hash functions are chosen, h1, h2,
|
|
..., hk, with range { 1, ..., m }
|
|
(i.e. a key hashed with any of these functions gives a value between 1
|
|
and m inclusive).
|
|
<ITEM> The set of n keys to be operated on are denoted by:
|
|
A = { a1, a2, a3, ..., an }.
|
|
</ITEMIZE>
|
|
|
|
|
|
<SECT2>Adding a Key
|
|
|
|
<P>
|
|
To add a key the value of each hash function for that key is calculated.
|
|
So, if the key was denoted by <EM>a</EM>, then h1(a), h2(a), ...,
|
|
hk(a) are calculated.
|
|
|
|
<P>
|
|
The value of each hash function for that key represents an index into
|
|
the array and the corresponding bits are set to 1. So, a digest with
|
|
6 hash functions would have 6 bits to be set to 1 for each key added.
|
|
|
|
<P>
|
|
Note that the addition of a number of <EM>different</EM> keys could
|
|
cause one particular bit to be set to 1 multiple times.
|
|
|
|
|
|
<SECT2>Querying a Key
|
|
|
|
<P>
|
|
To query for the existence of a key the indices into the array are
|
|
calculated from the hash functions as above.
|
|
|
|
<P>
|
|
<ITEMIZE>
|
|
<ITEM> If any of the corresponding bits in the array are 0 then the key is
|
|
not present.
|
|
<ITEM> If all of the corresponding bits in the array are 1 then the key is
|
|
<EM>likely</EM> to be present.
|
|
</ITEMIZE>
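<P>
The add and query operations described above can be sketched in Python as
follows. This is a generic Bloom filter, not Squid's code: the k hash functions
are simulated by salting SHA-256 with the function number, which is an
assumption made only for this example, and indices run from 0 to m-1 rather
than from 1 to m.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _indices(self, key: str):
        # Stand-in for k independent hash functions h1..hk.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        """Set the bit at every index computed for the key."""
        for idx in self._indices(key):
            self.bits[idx] = 1

    def might_contain(self, key: str) -> bool:
        """False means definitely absent; True means likely present."""
        return all(self.bits[idx] for idx in self._indices(key))


bf = BloomFilter(m=1024, k=4)
bf.add("GET http://www.example.com/")
print(bf.might_contain("GET http://www.example.com/"))
```

A key that was added is always reported as present; a key that was never added
is usually, but not always, reported as absent.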
|
|
|
|
<P>
|
|
Note the term <EM>likely</EM>.
|
|
It is possible that a <EM>collision</EM> in the digest can occur, whereby
|
|
the digest incorrectly indicates a key is present.
|
|
This is the price paid for the compact representation.
|
|
While the probability of a collision can never be reduced to zero it can
|
|
be controlled.
|
|
Larger values for the ratio of the digest size to the number of entries added
|
|
lower the probability.
|
|
The number of hash functions chosen also influences the probability.
|
|
|
|
|
|
<SECT2>Deleting a Key
|
|
<P>
|
|
|
|
To delete a key, it is not possible to simply set the associated bits
|
|
to 0 since any one of those bits could have been set to 1 by the addition
|
|
of a different key!
|
|
|
|
<P>
|
|
Therefore, to support deletions a counter is required for each bit position
|
|
in the array.
|
|
The procedures to follow would be:
|
|
|
|
<P>
|
|
<ITEMIZE>
|
|
<ITEM> When adding a key, set appropriate bits to 1 and increment the
|
|
corresponding counters.
|
|
<ITEM> When deleting a key, decrement the appropriate counters (while > 0),
|
|
and if a counter reaches 0 <EM>then</EM> the corresponding bit is set to 0.
|
|
</ITEMIZE>
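<P>
The counter-based procedure can be sketched in Python as below. This is a
generic counting Bloom filter written for illustration and is not taken from
Squid, which does not support deletions. The hash functions are again simulated
by salting SHA-256 (an assumption), and a bit is treated as set whenever its
counter is greater than zero.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter variant that keeps one counter per bit position."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _indices(self, key: str):
        # Stand-in for k independent hash functions.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for idx in self._indices(key):
            self.counters[idx] += 1

    def delete(self, key: str) -> None:
        # Decrement only while the counter is above zero.
        for idx in self._indices(key):
            if self.counters[idx] > 0:
                self.counters[idx] -= 1

    def might_contain(self, key: str) -> bool:
        return all(self.counters[idx] > 0 for idx in self._indices(key))


cbf = CountingBloomFilter(m=1024, k=4)
cbf.add("a")
cbf.add("b")
cbf.delete("a")
print(cbf.might_contain("b"))
```

Deleting "a" leaves every position that "b" also uses with a non-zero counter,
which is exactly the case a plain bit array cannot handle.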
|
|
|
|
|
|
<SECT1>How is the size of the Cache Digest in Squid determined?
|
|
|
|
<P>
|
|
Upon initialisation, the <EM>capacity</EM> is set to the number
|
|
of objects that can be (are) stored in the cache.
|
|
Note that there are upper and lower limits here.
|
|
|
|
<P>
|
|
An arbitrary constant, bits_per_entry (currently set to 5), is
|
|
used to calculate the size of the array using the following formula:
|
|
|
|
<P>
|
|
<VERB>
|
|
number of bits in array = capacity * bits_per_entry + 7
|
|
</VERB>
|
|
|
|
<P>
|
|
The size of the digest, in bytes, is therefore:
|
|
|
|
<P>
|
|
<VERB>
|
|
digest size = int (number of bits in array / 8)
|
|
</VERB>
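<P>
As a worked example of the two formulas, the Python sketch below computes the
digest size for a capacity of 1228800 entries, the figure that appears in the
sample statistics later in this section. The function name is hypothetical. The
"+ 7" rounds the bit count up to a whole number of bytes.

```python
BITS_PER_ENTRY = 5  # the constant quoted in the text


def digest_size_bytes(capacity: int, bits_per_entry: int = BITS_PER_ENTRY) -> int:
    """Return the digest size in bytes for a given capacity."""
    number_of_bits = capacity * bits_per_entry + 7
    return number_of_bits // 8  # integer division, as in int(bits / 8)


print(digest_size_bytes(1228800))
```

For this capacity the result is 768000 bytes, which agrees with the
<EM>store digest: size</EM> value in the sample statistics.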
|
|
|
|
<P>
|
|
When a digest rebuild occurs, the change in the cache size (capacity)
|
|
is measured.
|
|
If the capacity has changed by a large enough amount (10%) then
|
|
the digest array is freed and its memory reallocated; otherwise the
|
|
same digest is re-used.
|
|
|
|
|
|
<SECT1>What hash functions (and how many of them) does Squid use?
|
|
|
|
<P>
|
|
The protocol design allows for a variable number of hash functions (k).
|
|
However, Squid employs a very efficient method using a fixed number - four.
|
|
|
|
<P>
|
|
Rather than computing a number of independent hash functions over a URL
|
|
Squid uses a 128-bit MD5 hash of the key (actually a combination of the URL
|
|
and the HTTP retrieval method) and then splits this into four equal
|
|
chunks.
|
|
|
|
Each chunk, modulo the digest size (m), is used as the value for one of
|
|
the hash functions - i.e. an index into the bit array.
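<P>
The chunking scheme can be sketched in Python as follows. This is an
illustration only: the way the method and URL are combined into the key and the
byte order used to turn each chunk into an integer are assumptions made for the
example, not details taken from the Squid source, and the function name is
hypothetical.

```python
import hashlib
from typing import List


def digest_indices(method: str, url: str, m: int) -> List[int]:
    """Derive four bit-array indices from one 128-bit MD5 hash."""
    # Assumption: the key is simply the method followed by the URL.
    md5 = hashlib.md5((method + url).encode()).digest()  # 16 bytes
    # Split the 16 bytes into four equal 4-byte chunks.
    chunks = [md5[i:i + 4] for i in range(0, 16, 4)]
    # Each chunk, modulo the digest size m, is one index into the bit array.
    return [int.from_bytes(chunk, "big") % m for chunk in chunks]


indices = digest_indices("GET", "http://www.example.com/", 6144000)
print(indices)
```

Only one hash computation is needed per key, and all four indices fall out of
it at the cost of a few integer operations.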
|
|
|
|
<P>
|
|
Note: As Squid retrieves objects and stores them in its cache on disk,
|
|
it adds them to the in-RAM index using a lookup key which is an MD5 hash
|
|
- the very one discussed above.
|
|
This means that the values for the Cache Digest hash functions are
|
|
already available and consequently the operations are extremely
|
|
efficient!
|
|
|
|
<P>
|
|
Obviously, modifying the code to support a variable number of hash functions
|
|
would prove a little more difficult and would most likely reduce efficiency.
|
|
|
|
|
|
<SECT1>How are objects added to the Cache Digest in Squid?
|
|
|
|
<P>
|
|
Every object referenced in the index in RAM is checked to see if
|
|
it is suitable for addition to the digest.
|
|
|
|
A number of objects are not suitable, e.g. those that are private,
|
|
not cacheable, negatively cached, etc. and are skipped immediately.
|
|
|
|
<P>
|
|
A <EM>freshness</EM> test is next made in an attempt to guess if
|
|
the object will expire soon, since if it does, it is not worthwhile
|
|
adding it to the digest.
|
|
The object is checked against the refresh patterns for staleness...
|
|
|
|
<P>
|
|
Since Squid stores references to objects in its index using the MD5 key
|
|
discussed earlier there is no URL actually available for each object -
|
|
which means that the pattern used will fall back to the default pattern, ".".
|
|
This is an unfortunate state of affairs, but little can be done about
|
|
it.
|
|
A <EM>cd_refresh_pattern</EM> option will be added to the configuration
|
|
file soon which will at least make the confusion a little clearer :-)
|
|
|
|
<P>
|
|
Note that it is best to be conservative with your refresh pattern
|
|
for the Cache Digest, i.e.
|
|
do <EM>not</EM> add objects if they might become stale soon.
|
|
This will reduce the number of False Hits.
|
|
|
|
|
|
<SECT1>Does Squid support deletions in Cache Digests? What are diffs/deltas?
|
|
|
|
<P>
|
|
Squid does not support deletions from the digest.
|
|
Because of this the digest must, periodically, be rebuilt from scratch to
|
|
erase stale bits and prevent digest pollution.
|
|
|
|
<P>
|
|
A more sophisticated option is to use <EM>diffs</EM> or <EM>deltas</EM>.
|
|
These would be created by building a new digest and comparing with the
|
|
current/old one.
|
|
They would essentially consist of aggregated deletions and additions
|
|
since the <EM>previous</EM> digest.
|
|
|
|
<P>
|
|
Since less bandwidth should be required using these it would be possible
|
|
to have more frequent updates (and hence, more accurate information).
|
|
|
|
<P>
|
|
Costs:
|
|
|
|
<P>
|
|
<ITEMIZE>
|
|
<ITEM>RAM - extra RAM needed to hold two digests while comparisons take place.
|
|
<ITEM>CPU - probably a negligible amount.
|
|
</ITEMIZE>
|
|
|
|
|
|
<SECT1>When and how often is the local digest built?
|
|
|
|
<P>
|
|
The local digest is built:
|
|
|
|
<P>
|
|
<ITEMIZE>
|
|
<ITEM> when store_rebuild completes after startup
|
|
(the cache contents have been indexed in RAM), and
|
|
<ITEM> periodically thereafter. Currently, it is rebuilt every hour
|
|
(more data and experience are required before other periods, whether
|
|
fixed or dynamically varying, can "intelligently" be chosen).
|
|
The good thing is that the local cache decides on the expiry time and
|
|
peers must obey (see later).
|
|
</ITEMIZE>
|
|
|
|
<P>
|
|
While the [new] digest is being built in RAM the old version (stored
|
|
on disk) is still valid, and will be returned to any peer requesting it.
|
|
When the digest has completed building it is then swapped out to disk,
|
|
overwriting the old version.
|
|
|
|
<P>
|
|
The rebuild is CPU intensive, but not overly so.
|
|
Since Squid is programmed using an event-handling model, the approach
|
|
taken is to split the digest building task into chunks (i.e. chunks
|
|
of entries to add) and to register each chunk as an event.
|
|
If CPU load is overly high, it is possible to extend the build period
|
|
- as long as it is finished before the next rebuild is due!
|
|
|
|
<P>
|
|
It may prove more efficient to implement the digest building as a separate
|
|
process/thread in the future...
|
|
|
|
|
|
<SECT1>How are Cache Digests transferred between peers?
|
|
|
|
<P>
|
|
Cache Digests are fetched from peers using the standard HTTP protocol
|
|
(note that a <EM>pull</EM> rather than <EM>push</EM> technique is
|
|
used).
|
|
|
|
<P>
|
|
After the first access to a peer, a <EM>peerDigestValidate</EM> event
|
|
is queued
|
|
(this event decides if it is time to fetch a new version of a digest
|
|
from a peer).
|
|
The queuing delay depends on the number of peers already queued
|
|
for validation - so that all digests from different peers are not
|
|
fetched simultaneously.
|
|
|
|
<P>
|
|
A peer answering a request for its digest will specify an expiry
|
|
time for that digest by using the HTTP <EM>Expires</EM> header.
|
|
The requesting cache thus knows when it should request a fresh
|
|
copy of that peer's digest.
|
|
|
|
<P>
|
|
Note: requesting caches use an If-Modified-Since request in case the peer
|
|
has not rebuilt its digest for some reason since the last time it was
|
|
fetched.
|
|
|
|
|
|
<SECT1>How and where are Cache Digests stored?
|
|
|
|
<P>
|
|
<SECT2>Cache Digest built locally
|
|
|
|
<P>
|
|
Since the local digest is generated purely for the benefit of its neighbours
|
|
keeping it in RAM is not strictly required.
|
|
However, it was decided to keep the local digest in RAM partly because of
|
|
the following:
|
|
|
|
<P>
|
|
<ITEMIZE>
|
|
<ITEM> Approximately the same amount of memory will be (re-)allocated on every
|
|
rebuild of the digest,
|
|
<ITEM> the memory requirements are probably quite small (when compared to other
|
|
requirements of the cache server),
|
|
<ITEM> if ongoing updates of the digest are to be supported (e.g. additions/deletions) it will be necessary to perform these operations on a digest
|
|
in RAM, and
|
|
<ITEM> if diffs/deltas are to be supported the "old" digest would have to
|
|
be swapped into RAM anyway for the comparisons.
|
|
</ITEMIZE>
|
|
|
|
<P>
|
|
When the digest is built in RAM, it is then swapped out to disk, where it is
|
|
stored as a "normal" cache item - which is how peers request it.
|
|
|
|
|
|
<SECT2>Cache Digest fetched from peer
|
|
|
|
<P>
|
|
When a query from a client arrives, <EM>fast lookups</EM> are
|
|
required to decide if a request should be made to a neighbour cache.
|
|
It is therefore required to keep all peer digests in RAM.
|
|
|
|
<P>
|
|
Peer digests are also stored on disk for the following reasons:
|
|
|
|
<P>
|
|
<ITEMIZE>
|
|
<ITEM><EM>Recovery</EM>
|
|
- If stopped and restarted, peer digests can be reused from the local
|
|
on-disk copy (they will soon be validated using an HTTP IMS request
|
|
to the appropriate peers as discussed earlier), and
|
|
<ITEM><EM>Sharing</EM>
|
|
- peer digests are stored as normal objects in the cache. This
|
|
allows them to be given to neighbour caches.
|
|
</ITEMIZE>
|
|
|
|
|
|
<SECT1>How are the Cache Digest statistics in the Cache Manager to be interpreted?
|
|
|
|
<P>
|
|
Cache Digest statistics can be seen from the Cache Manager or through the
|
|
<EM>client</EM> utility.
|
|
The following examples show how to use the <EM>client</EM> utility
|
|
to request the list of possible operations from the localhost, local
|
|
digest statistics from the localhost, refresh statistics from the
|
|
localhost and local digest statistics from another cache, respectively.
|
|
|
|
<P>
|
|
<VERB>
|
|
./client mgr:menu
|
|
|
|
./client mgr:store_digest
|
|
|
|
./client mgr:refresh
|
|
|
|
./client -h peer mgr:store_digest
|
|
</VERB>
|
|
|
|
<P>
|
|
The available statistics provide a lot of useful debugging information.
|
|
The refresh statistics include a section for Cache Digests which
|
|
explains why items were added (or not) to the digest.
|
|
|
|
<P>
|
|
The following example shows local digest statistics for a 16GB
|
|
cache in a corporate intranet environment
|
|
(may be a useful reference for the discussion below).
|
|
|
|
<P>
|
|
<VERB>
|
|
store digest: size: 768000 bytes
|
|
entries: count: 588327 capacity: 1228800 util: 48%
|
|
deletion attempts: 0
|
|
bits: per entry: 5 on: 1953311 capacity: 6144000 util: 32%
|
|
bit-seq: count: 2664350 avg.len: 2.31
|
|
added: 588327 rejected: 528703 ( 47.33 %) del-ed: 0
|
|
collisions: on add: 0.23 % on rej: 0.23 %
|
|
</VERB>
|
|
|
|
<P>
|
|
<EM>entries:capacity</EM> is a measure of how many items "are likely" to
|
|
be added to the digest.
|
|
It represents the number of items that were in the local cache at the
|
|
start of digest creation - however, upper and lower limits currently
|
|
apply.
|
|
This value is multiplied by <EM>bits: per entry</EM> (an arbitrary constant)
|
|
to give <EM>bits:capacity</EM>, which is the size of the cache digest in bits.
|
|
Dividing this by 8 will give <EM>store digest: size</EM> which is the
|
|
size in bytes.
|
|
|
|
<P>
|
|
The number of items represented in the digest is given by
|
|
<EM>entries:count</EM>.
|
|
This should be equal to <EM>added</EM> minus <EM>deletion attempts</EM>.
|
|
|
|
Since (currently) no modifications are made to the digest after the initial
|
|
build (no additions are made and deletions are not supported)
|
|
<EM>deletion attempts</EM> will always be 0 and <EM>entries:count</EM>
|
|
should simply be equal to <EM>added</EM>.
|
|
|
|
<P>
|
|
<EM>entries:util</EM> is not really a significant statistic.
|
|
At most it gives a measure of how many of the items in the store were
|
|
deemed suitable for entry into the digest compared to how many were
|
|
"prepared" for.
|
|
|
|
<P>
|
|
<EM>rejected</EM> shows how many objects were rejected.
|
|
Objects will not be added for a number of reasons, the most common being
|
|
refresh pattern settings.
|
|
Remember that (currently) the default refresh pattern will be used for
|
|
checking for entry here and also note that changing this pattern can
|
|
significantly affect the number of items added to the digest!
|
|
Too relaxed and False Hits increase, too strict and False Misses increase.
|
|
Remember also that at time of validation (on the peer) the "real" refresh
|
|
pattern will be used - so it is wise to keep the default refresh pattern
|
|
conservative.
|
|
|
|
<P>
|
|
<EM>bits: on</EM> indicates the number of bits in the digest that are set
|
|
to 1.
|
|
<EM>bits: util</EM> gives this figure as a percentage of the total number
|
|
of bits in the digest.
|
|
As we saw earlier, a figure of 50% represents the optimal trade-off.
|
|
Values too high (say > 75%) would cause a larger number of collisions,
|
|
and hence False Hits,
|
|
while lower values mean the digest is under-utilised (using unnecessary RAM).
|
|
Note that low values are normal for caches that are starting to fill up.
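<P>
As a rough cross-check of <EM>bits: util</EM>, the Python sketch below
estimates the fraction of bits that should be set after adding a given number
of entries. It uses the standard Bloom filter approximation and assumes four
uniformly distributed hash values per entry; the function name is hypothetical
and the input figures are those of the sample statistics above.

```python
import math


def expected_bit_utilisation(entries: int, bits: int, k: int = 4) -> float:
    """Approximate fraction of bits set after adding `entries` keys."""
    # Each key sets up to k bits; a given bit stays 0 with probability
    # of about exp(-k * entries / bits).
    return 1.0 - math.exp(-k * entries / bits)


util = expected_bit_utilisation(entries=588327, bits=6144000)
print(f"{util:.1%}")
```

The estimate comes out at roughly 32%, in line with the observed figure of
1953311 bits set out of 6144000.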
|
|
|
|
<P>
|
|
A bit sequence is an uninterrupted sequence of bits with the same value.
|
|
<EM>bit-seq: avg.len</EM> gives some insight into the quality of the hash
|
|
functions.
|
|
Large values indicate a problem, even if <EM>bits:util</EM> is 50%
|
|
(> 3 = suspicious, > 10 = very suspicious).
|
|
|
|
|
|
<SECT1>What are False Hits and how should they be handled?
|
|
|
|
<P>
|
|
A False Hit occurs when a cache believes a peer has an object
|
|
and asks the peer for it <EM>but</EM> the peer is not able to
|
|
satisfy the request.
|
|
|
|
<P>
|
|
Expiring or stale objects on the peer are frequent causes of False
|
|
Hits.
|
|
At the time of the query actual refresh patterns are used on the
|
|
peer and stale entries are marked for revalidation.
|
|
However, revalidation is prohibited unless the peer is behaving
|
|
as a parent, or <EM>miss_access</EM> is enabled.
|
|
Thus, clients can receive error messages instead of revalidated
|
|
objects!
|
|
|
|
<P>
|
|
The frequency of False Hits can be reduced but never eliminated
|
|
completely, therefore there must be a robust way of handling them
|
|
when they occur.
|
|
The philosophy behind the design of Squid is to use lightweight
|
|
techniques and optimise for the common case and robustly handle the
|
|
unusual case (False Hits).
|
|
|
|
<P>
|
|
Squid will soon support the HTTP <EM>only-if-cached</EM> header.
|
|
Requests for objects made to a peer will use this header and if
|
|
the objects are not available, the peer can reply appropriately
|
|
allowing Squid to recognise the situation.
|
|
The following describes what Squid is aiming towards:
|
|
|
|
<P>
|
|
<ITEMIZE>
|
|
<ITEM>Cache Digests used to obtain good estimates of where a
|
|
requested object is located in a Cache Hierarchy.
|
|
<ITEM>Persistent HTTP Connections between peers.
|
|
There will be no TCP startup overhead and both latency and
|
|
network load will be similar to ICP (i.e. fast).
|
|
<ITEM>HTTP False Hit Recognition using the <EM>only-if-cached</EM>
|
|
HTTP header - allowing fall back to another peer or, if no other
|
|
peers are available with the object, then going direct (or
|
|
<EM>through</EM> a parent if behind a firewall).
|
|
</ITEMIZE>
|
|
|
|
|
|
<SECT1>How can Cache Digest related activity be traced/debugged?
|
|
|
|
<P>
|
|
<SECT2>Enabling Cache Digests
|
|
|
|
<P>
|
|
If you wish to use Cache Digests (available in Squid version 2) you need to
|
|
add a <EM>configure</EM> option, so that the relevant code is compiled in:
|
|
|
|
<P>
|
|
<VERB>
|
|
./configure --enable-cache-digests ...
|
|
</VERB>
|
|
|
|
|
|
<SECT2>What do the access.log entries look like?
|
|
|
|
<P>
|
|
If a request is forwarded to a neighbour due to a HIT in that neighbour's
|
|
Cache Digest the hierarchy (9th) field of the access.log file for
|
|
the <EM>local cache</EM> will look like <EM>CACHE_DIGEST_HIT/neighbour</EM>.
|
|
The Log Tag (4th field) should obviously show a MISS.
|
|
|
|
<P>
|
|
On the peer cache the request should appear as a normal HTTP request
|
|
from the first cache.
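<P>
To pick such entries out of a log file, a small Python sketch like the
following could be used. It assumes the native <em/access.log/ format with
whitespace-separated fields, of which the 9th is the hierarchy field; the
function name and the sample lines are hypothetical.

```python
from typing import Optional


def digest_hit_peer(log_line: str) -> Optional[str]:
    """Return the neighbour name if the request was a Cache Digest hit."""
    fields = log_line.split()
    if len(fields) < 9:
        return None
    # The hierarchy field looks like CODE/host.
    code, _, peer = fields[8].partition("/")
    return peer if code == "CACHE_DIGEST_HIT" else None


# Made-up log lines, for illustration only.
hit_line = ("918273645.123 340 192.0.2.7 TCP_MISS/200 4512 GET "
            "http://www.example.com/ - CACHE_DIGEST_HIT/peer.example.net text/html")
direct_line = ("918273646.456 210 192.0.2.7 TCP_MISS/200 4512 GET "
               "http://www.example.org/ - DIRECT/www.example.org text/html")
print(digest_hit_peer(hit_line), digest_hit_peer(direct_line))
```

Counting the returned peer names over a whole log gives a quick picture of how
often each neighbour's digest is being used.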
|
|
|
|
|
|
<SECT2>What does a False Hit look like?
|
|
|
|
<P>
|
|
The easiest situation to analyse is when two caches (say A and B) are
|
|
involved neither of which uses the other as a parent.
|
|
In this case, a False Hit would show up as a CACHE_DIGEST_HIT on A and
|
|
<EM>NOT</EM> as a TCP_HIT on B (or vice versa).
|
|
If B does not fetch the object for A then the hierarchy field will
|
|
look like <EM>NONE/-</EM> (and A will have received an Access Denied
|
|
or Forbidden message).
|
|
This will happen if the object is not "available" on B and B does not
|
|
have <EM>miss_access</EM> enabled for A (or is not acting as a parent
|
|
for A).
|
|
|
|
|
|
<SECT2>How is the cause of a False Hit determined?
|
|
|
|
<P>
|
|
Assume A requests a URL from B and receives a False Hit:
|
|
|
|
<ITEMIZE>
|
|
<ITEM> Using the <EM>client</EM> utility <EM>PURGE</EM> the URL from A, e.g.
|
|
|
|
<P>
|
|
<VERB>
|
|
./client -m PURGE 'URL'
|
|
</VERB>
|
|
|
|
<ITEM> Using the <EM>client</EM> utility request the object from A, e.g.
|
|
|
|
<P>
|
|
<VERB>
|
|
./client 'URL'
|
|
</VERB>
|
|
|
|
</ITEMIZE>
|
|
|
|
<P>
|
|
The HTTP headers of the reply are then available.
|
|
Two header types are of particular interest:
|
|
|
|
<P>
|
|
<ITEMIZE>
|
|
<ITEM> <EM>X-Cache</EM> - this shows whether an object is available or not.
|
|
<ITEM> <EM>X-Cache-Lookup</EM> - this keeps the result of a store table lookup
|
|
<EM>before</EM> refresh causing rules are checked (i.e. it indicates if the
|
|
object is available before any validation would be attempted).
|
|
</ITEMIZE>
|
|
|
|
<P>
|
|
The X-Cache and X-Cache-Lookup headers from A should both show MISS.
|
|
|
|
<P>
|
|
If A requests the object from B (which it will if the digest lookup indicates
|
|
B has it - assuming B is the closest peer of course :-) then there will be another
|
|
set of these headers from B.
|
|
|
|
<P>
|
|
If the X-Cache header from B shows a MISS a False Hit has occurred.
|
|
This means that A thought B had an object but B tells A it does not
|
|
have it available for retrieval.
|
|
The reason why it is not available for retrieval is indicated by the
|
|
X-Cache-Lookup header. If:
|
|
|
|
<P>
|
|
<ITEMIZE>
|
|
<ITEM>
|
|
<EM>X-Cache-Lookup = MISS</EM> then either A's (version of
|
|
B's) digest is out-of-date or corrupt OR a collision occurred
|
|
in the digest (very small probability) OR B recently purged
|
|
the object.
|
|
<ITEM>
|
|
<EM>X-Cache-Lookup = HIT</EM> then B had the object, but
|
|
refresh rules (or A's max-age requirements) prevent A from
|
|
getting a HIT (validation failed).
|
|
</ITEMIZE>
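<P>
The decision described above can be written down as a small Python function.
This is only a sketch: it assumes the two header values received from B begin
with the word HIT or MISS (for example "MISS from b.example.com"), and the
function name and returned strings are hypothetical.

```python
def diagnose_false_hit(x_cache: str, x_cache_lookup: str) -> str:
    """Classify B's reply from its X-Cache and X-Cache-Lookup header values."""
    served_from_cache = x_cache.strip().upper().startswith("HIT")
    found_in_store = x_cache_lookup.strip().upper().startswith("HIT")
    if served_from_cache:
        return "true hit"
    if found_in_store:
        # B had the object, but refresh rules or A's max-age
        # requirements prevented a HIT.
        return "false hit: validation failed on B"
    # B never found the object in its store table.
    return "false hit: stale or corrupt digest, collision, or recent purge"


print(diagnose_false_hit("MISS from b.example.com", "HIT from b.example.com:3128"))
```

Feeding it the two header values from B's reply tells you which of the two
cases in the list applies.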
|
|
|
|
|
|
<SECT2>Use The Source
|
|
|
|
<P>
|
|
If there is something else you need to check you can always look at the
|
|
source code.
|
|
The main Cache Digest functionality is organised as follows:
|
|
|
|
<P>
|
|
<ITEMIZE>
|
|
<ITEM> <EM>CacheDigest.c (debug section 70)</EM> Generic Cache Digest routines
|
|
<ITEM> <EM>store_digest.c (debug section 71)</EM> Local Cache Digest routines
|
|
<ITEM> <EM>peer_digest.c (debug section 72)</EM> Peer Cache Digest routines
|
|
</ITEMIZE>
|
|
|
|
<P>
|
|
Note that in the source the term <EM>Store Digest</EM> refers to the digest
|
|
created locally.
|
|
The Cache Digest code is fairly self-explanatory (once you understand how Cache
|
|
Digests work).
|
|
|
|
|
|
<SECT1>What about ICP?
|
|
<P>
|
|
|
|
COMING SOON!
|
|
|
|
|
|
<SECT1>Is there a Cache Digest Specification?
|
|
|
|
<P>
|
|
There is now, thanks to
|
|
<URL url="mailto:martin@net.lut.ac.uk" name="Martin Hamilton"> and
|
|
<URL url="mailto:rousskov@ircache.net" name="Alex Rousskov">.
|
|
|
|
<P>
|
|
Cache Digests, as implemented in Squid 2.1.PATCH2, are described in
|
|
<URL url="/CacheDigest/cache-digest-v5.txt" name="cache-digest-v5.txt">.
|
|
|
|
You'll notice the format is similar to an Internet Draft.
|
|
We decided not to submit this document as a draft because Cache Digests
|
|
will likely undergo some important changes before we want to try to make
|
|
it a standard.
|
|
|
|
<sect1>Would it be possible to stagger the timings when cache_digests are retrieved from peers?
|
|
<p>
|
|
<em>Note: The information here is current for version 2.2.</em>
|
|
<p>
|
|
Squid already has code to spread the digest updates. The algorithm is
|
|
currently controlled by a few hard-coded constants in <em/peer_digest.c/. For
|
|
example, the <em/GlobDigestReqMinGap/ variable determines the minimum interval
|
|
between two requests for a digest. You may want to try to increase the
|
|
value of GlobDigestReqMinGap from 60 seconds to whatever you feel
|
|
comfortable with (but it should be smaller than hour/number_of_peers, of
|
|
course).
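<P>
The upper bound mentioned above is simple arithmetic, sketched here in Python.
The function names are hypothetical, and the one-hour period is the digest
rebuild interval quoted elsewhere in this FAQ.

```python
def max_digest_request_gap(number_of_peers: int, period_seconds: int = 3600) -> float:
    """Upper bound, in seconds, for the gap between two digest requests."""
    if number_of_peers < 1:
        raise ValueError("at least one peer is required")
    return period_seconds / number_of_peers


def gap_is_acceptable(gap_seconds: float, number_of_peers: int) -> bool:
    """True if every peer's digest can still be fetched within one period."""
    return gap_seconds < max_digest_request_gap(number_of_peers)


print(max_digest_request_gap(10), gap_is_acceptable(60, 10))
```

With ten peers the bound is 360 seconds, so the default of 60 seconds leaves
plenty of room, while a gap of ten minutes would not.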
|
|
|
|
<p>
|
|
Note that whatever you do, you still need to give Squid enough time and
|
|
bandwidth to fetch all the digests. Depending on your environment, that
|
|
bandwidth may be more or less than ICP would require. Upcoming digest
|
|
deltas (x10 smaller than the digests themselves) may be the only way to
|
|
solve the ``big scale'' problem.
|
|
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>Interception Caching/Proxying
|
|
<label id="trans-caching">
|
|
|
|
<P>
|
|
<it>
|
|
How can I make my users' browsers use my cache without configuring
|
|
the browsers for proxying?
|
|
</it>
|
|
|
|
First, it is <em/critical/ to read the full comments in the squid.conf
|
|
file! That is the only authoritative source for configuration
|
|
information. However, the following instructions are correct as of
|
|
this writing (July 1999.)
|
|
|
|
<P>
|
|
Getting transparent caching to work requires four distinct steps:
|
|
<enum>
|
|
<item>
|
|
|
|
<bf/Compile and run a version of Squid which accepts
|
|
connections for other addresses/. For some operating systems,
|
|
you need to have configured and built a version of Squid which
|
|
can recognize the hijacked connections and discern the
|
|
destination addresses. For Linux this seems to work
|
|
automatically. For *BSD-based systems, you probably have to
|
|
configure squid with the <em/--enable-ipf-transparent/ option.
|
|
(Do a <em/make clean/ if you previously configured without that
|
|
option, or the correct settings may not be present.)
|
|
|
|
<item>
|
|
|
|
<bf/Configure Squid to accept and process the connections/.
|
|
You have to change the Squid configuration settings to
|
|
recognize the hijacked connections and discern the destination
|
|
addresses. Here are the important settings in <em/squid.conf/:
|
|
<verb>
|
|
http_port 8080
|
|
httpd_accel_host virtual
|
|
httpd_accel_port 80
|
|
httpd_accel_with_proxy on
|
|
httpd_accel_uses_host_header on
|
|
</verb>
|
|
|
|
<item>
|
|
|
|
<bf/Get your cache server to accept the packets/. You have to
|
|
configure your cache host to accept the redirected packets -
|
|
any IP address, on port 80 - and deliver them to your cache
|
|
application. This is typically done with IP
|
|
filtering/forwarding features built into the kernel.
|
|
On Linux they call this <em/iptables/ (kernel 2.4.x),
|
|
<em/ipchains/ (2.2.x) or <em/ipfwadm/ (2.0.x).
|
|
On FreeBSD and other
|
|
*BSD systems they call it <em/ip filter/ or <em/ipnat/; on many
|
|
systems, it may require rebuilding the kernel or adding a new
|
|
loadable kernel module.
|
|
|
|
<item>
|
|
|
|
<bf/Get the packets to your cache server/. There are several
|
|
ways to do this. First, if your proxy machine is already in
|
|
the path of the packets (i.e. it is routing between your proxy
|
|
users and the Internet) then you don't have to worry about this
|
|
step. This would be true if you install Squid on a firewall
|
|
machine, or on a UNIX-based router. If the cache is not in the
|
|
natural path of the connections, then you have to divert the
|
|
packets from the normal path to your cache host using a router
|
|
or switch. You may be able to do this with a Cisco router using
|
|
their "route maps" feature, depending on your IOS version. You
|
|
might also use a so-called layer-4 switch, such as the Alteon
|
|
ACE-director or the Foundry Networks ServerIron. Finally, you
|
|
might be able to use a stand-alone router/load-balancer type
|
|
product, or routing capabilities of an access server.
|
|
|
|
</enum>
|
|
|
|
<bf/Notes/:
|
|
<itemize>
|
|
|
|
<item>The <em/http_port 8080/ in this example assumes you will redirect
|
|
incoming port 80 packets to port 8080 on your cache machine. If you
|
|
are running Squid on port 3128 (for example) you can leave it there via
|
|
<em/http_port 3128/, and redirect to that port via your IP filtering or
|
|
forwarding commands.
|
|
|
|
<item>In the <em/httpd_accel_host/ option, <em/virtual/ is the magic word!
|
|
|
|
<item>The <em/httpd_accel_with_proxy on/ is required to enable transparent
|
|
proxy mode; essentially in transparent proxy mode Squid thinks it is acting
|
|
both as an accelerator (hence accepting packets for other IPs on port 80) and
|
|
a caching proxy (hence serving files out of cache.)
|
|
|
|
<item> You <bf/must/ use <em/httpd_accel_uses_host_header on/ to get
|
|
the cache to work properly in transparent mode. This enables the cache
|
|
to index its stored objects under the true hostname, as is done in a
|
|
normal proxy, rather than under the IP address. This is especially
|
|
important if you want to use a parent cache hierarchy, or to share
|
|
cache data between transparent proxy users and non-transparent proxy
|
|
users, which you can do with Squid in this configuration.
|
|
|
|
</itemize>
|
|
|
|
<sect1>Interception caching for Solaris, SunOS, and BSD systems
|
|
<sect2>Install IP Filter
|
|
<P>
|
|
First, get and install the
|
|
<url url="http://coombs.anu.edu.au/ipfilter/"
|
|
name="IP Filter package">.
|
|
|
|
<sect2>Configure ipnat
|
|
<P>
|
|
Put these lines in <em>/etc/ipnat.rules</em>:
|
|
<verb>
|
|
# Redirect direct web traffic to local web server.
|
|
rdr de0 1.2.3.4/32 port 80 -> 1.2.3.4 port 80 tcp
|
|
|
|
# Redirect everything else to squid on port 8080
|
|
rdr de0 0.0.0.0/0 port 80 -> 1.2.3.4 port 8080 tcp
|
|
</verb>
|
|
|
|
<P>
|
|
Modify your startup scripts to enable ipnat. For example, on FreeBSD it
|
|
looks something like this:
|
|
<verb>
|
|
/sbin/modload /lkm/if_ipl.o
|
|
/sbin/ipnat -f /etc/ipnat.rules
|
|
chgrp nobody /dev/ipnat
|
|
chmod 644 /dev/ipnat
|
|
</verb>
|
|
|
|
<sect2>Configure Squid
|
|
<sect3>Squid-2
|
|
<P>
|
|
Squid-2 (after version beta25) has IP filter support built in.
|
|
Simply enable it when you run <em/configure/:
|
|
<verb>
|
|
./configure --enable-ipf-transparent
|
|
</verb>
|
|
Add these lines to your <em/squid.conf/ file:
|
|
<verb>
|
|
http_port 8080
|
|
httpd_accel_host virtual
|
|
httpd_accel_port 80
|
|
httpd_accel_with_proxy on
|
|
httpd_accel_uses_host_header on
|
|
</verb>
|
|
Note, you don't have to use port 8080, but it must match whatever you
|
|
used in the <em>/etc/ipnat.rules</em> file.
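<P>
For example, if you would rather keep Squid on port 3128, the two files
might look like the following sketch. The interface name <em/de0/ and
the address 1.2.3.4 are the same placeholders used above; substitute
your own values.
<verb>
# /etc/ipnat.rules
rdr de0 1.2.3.4/32 port 80 -> 1.2.3.4 port 80 tcp
rdr de0 0.0.0.0/0 port 80 -> 1.2.3.4 port 3128 tcp

# squid.conf
http_port 3128
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
</verb>
Only the <em/http_port/ line and the second <em/rdr/ rule change;
<em/httpd_accel_port/ stays at 80 because that is the port the clients asked for.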
|
|
<sect3>Squid-1.1
|
|
<P>
|
|
Patches for Squid-1.X are available from
|
|
<url url="http://www.fan.net.au/~q/squid/" name="Quinton Dolan's Squid page">.
|
|
Add these lines to <em/squid.conf/:
|
|
<verb>
|
|
http_port 8080
|
|
httpd_accel virtual 80
|
|
httpd_accel_with_proxy on
|
|
httpd_accel_uses_host_header on
|
|
</verb>
|
|
|
|
<P>
|
|
Thanks to <url url="mailto:q@fan.net.au" name="Quinton Dolan">.
|
|
|
|
<sect1>Interception caching with Linux 2.0 and ipfwadm
|
|
<label id="trans-linux-1">
|
|
<P>
|
|
by <url url="mailto:Rodney.van.den.Oever@tip.nl" name="Rodney van den Oever">
|
|
|
|
<P><bf/Note:/ Interception proxying does NOT work with Linux 2.0.30!
|
|
Linux 2.0.29 is known to work well. If you're using a more recent
|
|
kernel, like 2.2.X, then you should probably use an ipchains configuration,
|
|
<ref id="trans-linux-2" name="as described below">.
|
|
|
|
<P>
|
|
<bf/Warning:/ this technique has some shortcomings.
|
|
<enum>
|
|
<item><bf/This method only supports the HTTP protocol, not gopher or FTP/
|
|
<item>Since the browser wasn't set up to use a proxy server, it uses
|
|
the FTP protocol (with destination port 21) and not the required
|
|
HTTP protocol. You can't set up a redirection rule to the proxy
|
|
server since the browser is speaking the wrong protocol. A similar
|
|
problem occurs with gopher. Normally all proxy requests are
|
|
translated by the client into the HTTP protocol, but since the
|
|
client isn't aware of the redirection, this never happens.
|
|
</enum>
|
|
|
|
<P>
|
|
If you can live with the side-effects, go ahead and compile your
|
|
kernel with firewalling and redirection support. Here are the
|
|
important parameters from <em>/usr/src/linux/.config</em>:
|
|
|
|
<verb>
|
|
#
|
|
# Code maturity level options
|
|
#
|
|
CONFIG_EXPERIMENTAL=y
|
|
#
|
|
# Networking options
|
|
#
|
|
CONFIG_FIREWALL=y
|
|
# CONFIG_NET_ALIAS is not set
|
|
CONFIG_INET=y
|
|
CONFIG_IP_FORWARD=y
|
|
# CONFIG_IP_MULTICAST is not set
|
|
CONFIG_IP_FIREWALL=y
|
|
# CONFIG_IP_FIREWALL_VERBOSE is not set
|
|
CONFIG_IP_MASQUERADE=y
|
|
CONFIG_IP_TRANSPARENT_PROXY=y
|
|
CONFIG_IP_ALWAYS_DEFRAG=y
|
|
# CONFIG_IP_ACCT is not set
|
|
CONFIG_IP_ROUTER=y
|
|
</verb>
|
|
|
|
<P>
|
|
You may also need to enable <bf/IP Forwarding/. One way to do it
|
|
is to add this line to your startup scripts:
|
|
<verb>
|
|
echo 1 > /proc/sys/net/ipv4/ip_forward
|
|
</verb>
|
|
|
|
<P>
|
|
Go to the
|
|
<url url="http://www.xos.nl/linux/ipfwadm/"
|
|
name="Linux IP Firewall and Accounting"> page,
|
|
obtain the source distribution to <em/ipfwadm/ and install it.
|
|
Older versions of <em/ipfwadm/ may not work. You might need
|
|
at least version <bf/2.3.0/.
|
|
You'll use <em/ipfwadm/ to set up the redirection rules. I
|
|
added this rule to the script that runs from <em>/etc/rc.d/rc.inet1</em>
|
|
(Slackware) which sets up the interfaces at boot-time. The redirection
|
|
should be done before any other Input-accept rule. To really make
|
|
sure it worked I disabled the forwarding (masquerading) I normally
|
|
do.
|
|
<P>
|
|
<em>/etc/rc.d/rc.firewall</em>:
|
|
|
|
<verb>
|
|
#!/bin/sh
|
|
# rc.firewall Linux kernel firewalling rules
|
|
FW=/sbin/ipfwadm
|
|
|
|
# Flush rules, for testing purposes
|
|
for i in I O F # A # If we enabled accounting too
|
|
do
|
|
${FW} -$i -f
|
|
done
|
|
|
|
# Default policies:
|
|
${FW} -I -p rej # Incoming policy: reject (quick error)
|
|
${FW} -O -p acc # Output policy: accept
|
|
${FW} -F -p den # Forwarding policy: deny
|
|
|
|
# Input Rules:
|
|
|
|
# Loopback-interface (local access, eg, to local nameserver):
|
|
${FW} -I -a acc -S localhost/32 -D localhost/32
|
|
|
|
# Local Ethernet-interface:
|
|
|
|
# Redirect to Squid proxy server:
|
|
${FW} -I -a acc -P tcp -D default/0 80 -r 8080
|
|
|
|
# Accept packets from local network:
|
|
${FW} -I -a acc -P all -S localnet/8 -D default/0 -W eth0
|
|
|
|
# Only required for other types of traffic (FTP, Telnet):
|
|
|
|
# Forward localnet with masquerading (udp and tcp, no icmp!):
|
|
${FW} -F -a m -P tcp -S localnet/8 -D default/0
|
|
${FW} -F -a m -P udp -S localnet/8 -D default/0
|
|
</verb>
|
|
|
|
<P>
|
|
Here all port-80 TCP traffic from the local LAN, with any destination, gets redirected to
|
|
the local port 8080. Rules can be viewed like this:
|
|
<verb>
|
|
IP firewall input rules, default policy: reject
|
|
type prot source destination ports
|
|
acc all 127.0.0.1 127.0.0.1 n/a
|
|
acc/r tcp 10.0.0.0/8 0.0.0.0/0 * -> 80 => 8080
|
|
acc all 10.0.0.0/8 0.0.0.0/0 n/a
|
|
acc tcp 0.0.0.0/0 0.0.0.0/0 * -> *
|
|
</verb>
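<P>
The listing above is the kind of output <em/ipfwadm/ prints when asked
to list the input rules. A likely invocation, assuming the binary is
installed as <em>/sbin/ipfwadm</em> as in the script:
<verb>
# -I selects the input rules, -l lists them, -n prints numeric addresses
/sbin/ipfwadm -I -l -n
</verb>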
|
|
|
|
<P>
|
|
I did some testing on Windows 95 with both Microsoft Internet
|
|
Explorer 3.01 and Netscape Communicator pre-release and it worked
|
|
with both browsers with the proxy-settings disabled.
|
|
|
|
<P>
|
|
At one time <em/squid/ seemed to get in a loop when I pointed the
|
|
browser to the local port 80. But this could be avoided by adding a
|
|
reject rule for client to this address:
|
|
<verb>
|
|
${FW} -I -a rej -P tcp -S localnet/8 -D hostname/32 80
|
|
|
|
IP firewall input rules, default policy: reject
|
|
type prot source destination ports
|
|
acc all 127.0.0.1 127.0.0.1 n/a
|
|
rej tcp 10.0.0.0/8 10.0.0.1 * -> 80
|
|
acc/r tcp 10.0.0.0/8 0.0.0.0/0 * -> 80 => 8080
|
|
acc all 10.0.0.0/8 0.0.0.0/0 n/a
|
|
acc tcp 0.0.0.0/0 0.0.0.0/0 * -> *
|
|
</verb>
|
|
|
|
<P>
|
|
<em/NOTE on resolving names/: Instead of just
|
|
passing the URLs to the proxy server, the browser itself has to
|
|
resolve the hostnames in the URLs. Make sure the workstations are set up to query
|
|
a local nameserver, to minimize outgoing traffic.
|
|
<P>
|
|
If you're already running a nameserver at the firewall or proxy server
|
|
(which is a good idea anyway IMHO) let the workstations use this
|
|
nameserver.
|
|
|
|
<P>
|
|
Additional notes from
|
|
<url url="mailto:RichardA@noho.co.uk"
|
|
name="Richard Ayres">
|
|
|
|
<quote>
|
|
<P>
|
|
I'm using such a setup. The only issues so far have been that:
|
|
|
|
<enum>
|
|
<item>
|
|
It's fairly useless to use my service providers parent caches
|
|
(cache-?.www.demon.net) because by proxying squid only sees IP addresses,
|
|
not host names and demon aren't generally asked for IP addresses by other
|
|
users;
|
|
|
|
<item>
|
|
Linux kernel 2.0.30 is a no-no as transparent proxying is broken (I use
|
|
2.0.29);
|
|
|
|
<item>
|
|
Client browsers must do host name lookups themselves, as they don't know
|
|
they're using a proxy;
|
|
|
|
<item>
|
|
The Microsoft Network won't authorize its users through a proxy, so I
|
|
have to specifically *not* redirect those packets (my company is a MSN
|
|
content provider).
|
|
</enum>
|
|
|
|
Aside from this, I get a 30-40% hit rate on a 50MB cache for 30-40 users and
|
|
am quite pleased with the results.
|
|
</quote>
|
|
|
|
<P>
|
|
See also <url url="http://www.unxsoft.com/transproxy.html"
|
|
name="Daniel Kiracofe's page">.
|
|
|
|
<sect1>Interception caching with Linux 2.2 and ipchains
|
|
<label id="trans-linux-2">
|
|
<P>
|
|
by <url url="mailto:Support@dnet.co.uk" name="Martin Lyons">
|
|
<P>
|
|
You need to configure your kernel for ipchains.
|
|
Configuring Linux kernels is beyond the scope of
|
|
this FAQ. One way to do it is:
|
|
<verb>
|
|
# cd /usr/src/linux
|
|
# make menuconfig
|
|
</verb>
|
|
<p>
|
|
The following shows important kernel features to include:
|
|
<verb>
|
|
[*] Network firewalls
|
|
[ ] Socket Filtering
|
|
[*] Unix domain sockets
|
|
[*] TCP/IP networking
|
|
[ ] IP: multicasting
|
|
[ ] IP: advanced router
|
|
[ ] IP: kernel level autoconfiguration
|
|
[*] IP: firewalling
|
|
[ ] IP: firewall packet netlink device
|
|
[*] IP: always defragment (required for masquerading)
|
|
[*] IP: transparent proxy support
|
|
</verb>
|
|
<P>
|
|
You must include the <em>IP: always defragment</em> option; without it
you cannot use the REDIRECT target.
|
|
|
|
<P>
|
|
You can use this script as a template for your own <em/rc.firewall/
|
|
to configure ipchains:
|
|
<verb>
|
|
#!/bin/sh
|
|
# rc.firewall Linux kernel firewalling rules
|
|
# Leon Brooks (leon at brooks dot fdns dot net)
|
|
FW=/sbin/ipchains
|
|
ADD="$FW -A"
|
|
|
|
# Flush rules, for testing purposes
|
|
for i in input output forward
|
|
do
|
|
${FW} -F $i
|
|
done
|
|
|
|
# Default policies:
|
|
${FW} -P input REJECT # Incoming policy: reject (quick error)
|
|
${FW} -P output ACCEPT # Output policy: accept
|
|
${FW} -P forward DENY # Forwarding policy: deny
|
|
|
|
# Input Rules:
|
|
|
|
# Loopback-interface (local access, eg, to local nameserver):
|
|
${ADD} input -j ACCEPT -s localhost/32 -d localhost/32
|
|
|
|
# Local Ethernet-interface:
|
|
|
|
# Redirect to Squid proxy server:
|
|
${ADD} input -p tcp -d 0/0 80 -j REDIRECT 8080
|
|
|
|
# Accept packets from local network:
|
|
${ADD} input -j ACCEPT -s localnet/8 -d 0/0 -i eth0
|
|
|
|
# Only required for other types of traffic (FTP, Telnet):
|
|
|
|
# Forward localnet with masquerading (udp and tcp, no icmp!):
|
|
${ADD} forward -j MASQ -p tcp -s localnet/8 -d 0/0
|
|
${ADD} forward -j MASQ -p udp -s localnet/8 -d 0/0
|
|
</verb>
|
|
|
|
<P>
|
|
Also, <url url="mailto:andrew@careless.net" name="Andrew Shipton">
|
|
notes that with 2.0.x kernels you don't need to enable packet forwarding,
|
|
but with the 2.1.x and 2.2.x kernels using ipchains you do. Packet
|
|
forwarding is enabled with the following command:
|
|
<verb>
|
|
echo 1 > /proc/sys/net/ipv4/ip_forward
|
|
</verb>
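<P>
To check the result, you can list the input chain and look for the
REDIRECT rule, then confirm that forwarding is on. A minimal sketch,
assuming <em/ipchains/ is installed in <em>/sbin</em> as in the script above:
<verb>
# -L lists a chain, -n suppresses name lookups, -v adds packet counters
/sbin/ipchains -L input -n -v

# Prints 1 when packet forwarding is enabled
cat /proc/sys/net/ipv4/ip_forward
</verb>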
|
|
|
|
<sect1>Interception caching with Linux 2.4 and netfilter
|
|
<label id="trans-linux-3">
|
|
<P>
|
|
NOTE: this information comes from Daniel Kiracofe's
|
|
<url url="http://www.linuxdoc.org/HOWTO/mini/TransparentProxy.html"
|
|
name="Transparent Proxy with Squid mini-HOWTO">.
|
|
<p>
|
|
You may need to build a new kernel. Be sure to enable
|
|
all of these options (none of them as modules):
|
|
<itemize>
|
|
<item>Networking support
|
|
<item>Sysctl support
|
|
<item>Network packet filtering
|
|
<item>TCP/IP networking
|
|
<item>Connection tracking (Under ``IP: Netfilter Configuration'' in menuconfig)
|
|
<item>IP tables support
|
|
<item>Full NAT
|
|
<item>REDIRECT target support
|
|
<item>/proc filesystem support
|
|
</itemize>
|
|
<p>
|
|
You must say NO to ``Fast switching''
|
|
<p>
|
|
After building the kernel, install it and reboot.
|
|
<p>
|
|
You may need to enable packet forwarding (e.g. in your startup scripts):
|
|
<verb>
|
|
echo 1 > /proc/sys/net/ipv4/ip_forward
|
|
</verb>
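<p>
If your distribution applies <em>/etc/sysctl.conf</em> at boot (an
assumption; not all of them do), the same setting can be made
persistent there instead of in a startup script:
<verb>
# /etc/sysctl.conf
net.ipv4.ip_forward = 1
</verb>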
|
|
<p>
|
|
Use the <em/iptables/ command to make your kernel intercept HTTP connections
|
|
and send them to Squid:
|
|
<verb>
|
|
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 3128
|
|
</verb>
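<p>
The rule above sends intercepted connections to port 3128, so Squid
must be listening there and be configured for interception. The
following sketch reuses the <em/squid.conf/ lines shown in the other
sections of this FAQ and adds a command to check that the rule is
installed. It assumes <em/eth0/ is the interface facing your clients.
<verb>
# squid.conf
http_port 3128
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on

# shell: list the nat PREROUTING chain with packet counters
iptables -t nat -L PREROUTING -n -v
</verb>
The packet counter on the REDIRECT rule should increase as clients browse.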
|
|
|
|
|
|
<sect1>Interception caching with Cisco routers
|
|
|
|
<P>
|
|
by <url url="mailto:John.Saunders@scitec.com.au" name="John Saunders">
|
|
|
|
<P>
|
|
This works with at least IOS 11.1 and later I guess. Possibly earlier,
|
|
as I'm no CISCO expert I can't say for sure. If your router is doing
|
|
anything more complicated than shuffling packets between an ethernet
|
|
interface and either a serial port or BRI port, then you should work
|
|
through if this will work for you.
|
|
|
|
<P>
|
|
First define a route map with a name of proxy-redirect (name doesn't
|
|
matter) and specify the next hop to be the machine Squid runs on.
|
|
<verb>
|
|
!
|
|
route-map proxy-redirect permit 10
|
|
match ip address 110
|
|
set ip next-hop 203.24.133.2
|
|
!
|
|
</verb>
|
|
Define an access list to trap HTTP requests. The second line allows
|
|
the Squid host direct access so a routing loop is not formed.
|
|
By carefully writing your access list as shown below, common
|
|
cases are found quickly and this can greatly reduce the load on your
|
|
router's processor.
|
|
<verb>
|
|
!
|
|
access-list 110 deny tcp any any neq www
|
|
access-list 110 deny tcp host 203.24.133.2 any
|
|
access-list 110 permit tcp any any
|
|
!
|
|
</verb>
|
|
Apply the route map to the ethernet interface.
|
|
<verb>
|
|
!
|
|
interface Ethernet0
|
|
ip policy route-map proxy-redirect
|
|
!
|
|
</verb>
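<P>
Once the route map is applied, the router can report which interfaces
use it and how many packets have matched. These <em/show/ commands are
a sketch; their availability and output depend on your IOS release.
<verb>
show ip policy
show route-map proxy-redirect
show access-lists 110
</verb>
If the match counters stay at zero while clients are browsing, the
policy is probably applied to the wrong interface.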
|
|
|
|
<sect2>possible bugs
|
|
|
|
<P>
|
|
<url url="mailto:morgan@curtin.net" name="Bruce Morgan"> notes that
|
|
there is a Cisco bug relating to transparent proxying using IP
|
|
policy route maps, that causes NFS and other applications to break.
|
|
Apparently there are two bug reports raised in Cisco, but they are
|
|
not available for public dissemination.
|
|
|
|
<P>
|
|
The problem occurs with oversized packets carrying more than 1472 data bytes. If you try
|
|
to ping a host with more than 1472 data bytes across a Cisco interface with the
|
|
access-lists and ip policy route map, the icmp request will fail. The
|
|
packet will be fragmented, and the first fragment is checked against the
|
|
access-list and rejected - it goes the "normal path" as it is an icmp
|
|
packet - however when the second fragment is checked against the
|
|
access-list it is accepted (it isn't regarded as an icmp packet), and
|
|
goes to the action determined by the policy route map!
|
|
|
|
<P>
|
|
<url url="mailto:John.Saunders@scitec.com.au" name="John"> notes that you
|
|
may be able to get around this bug by carefully writing your access lists.
|
|
If the last/default rule is to permit then this bug
|
|
would be a problem, but if the last/default rule was to deny then
|
|
it won't be a problem. I guess fragments, other than the first,
|
|
don't have the information available to properly policy route them.
|
|
Normally TCP packets should not be fragmented, at least my network
|
|
runs an MTU of 1500 everywhere to avoid fragmentation. So this would
|
|
affect UDP and ICMP traffic only.
|
|
|
|
<P>
|
|
Basically, you will have to pick between living with the bug or better
|
|
performance. This set has better performance, but suffers from the
|
|
bug:
|
|
<verb>
|
|
access-list 110 deny tcp any any neq www
|
|
access-list 110 deny tcp host 10.1.2.3 any
|
|
access-list 110 permit tcp any any
|
|
</verb>
|
|
Conversely, this set has worse performance, but works for all protocols:
|
|
<verb>
|
|
access-list 110 deny tcp host 10.1.2.3 any
|
|
access-list 110 permit tcp any any eq www
|
|
access-list 110 deny tcp any any
|
|
</verb>
|
|
|
|
<sect1>Interception caching with LINUX 2.0.29 and CISCO IOS 11.1
|
|
|
|
<P>
|
|
Just for kicks, here's an email message posted to squid-users
|
|
on how to make transparent proxying work with a Cisco router
|
|
and Squid running on Linux.
|
|
|
|
<P>
|
|
by <url url="mailto:signal@shreve.net" name="Brian Feeny">
|
|
|
|
<P>
|
|
Here is how I have Interception proxying working for me, in an environment
|
|
where my router is a Cisco 2501 running IOS 11.1, and Squid machine is
|
|
running Linux 2.0.33.
|
|
|
|
<P>
|
|
Many thanks to the following individuals and the squid-users list for
|
|
helping me get redirection and transparent proxying working on my
|
|
Cisco/Linux box.
|
|
|
|
<itemize>
|
|
<item>Lincoln Dale
|
|
<item>Riccardo Vratogna
|
|
<item>Mark White
|
|
<item>Henrik Nordstrom
|
|
</itemize>
|
|
|
|
<P>
|
|
First, here is what I added to my Cisco, which is running IOS 11.1. In
|
|
IOS 11.1 the route-map command is "process switched" as opposed to the
|
|
faster "fast-switched" route-map which is found in IOS 11.2 and later.
|
|
You may wish to be running IOS 11.2. I am running 11.1, and have had no
|
|
problems with my current load of about 150 simultaneous connections to
|
|
squid.:
|
|
<verb>
|
|
!
|
|
interface Ethernet0
|
|
description To Office Ethernet
|
|
ip address 208.206.76.1 255.255.255.0
|
|
no ip directed-broadcast
|
|
no ip mroute-cache
|
|
ip policy route-map proxy-redir
|
|
!
|
|
access-list 110 deny tcp host 208.206.76.44 any eq www
|
|
access-list 110 permit tcp any any eq www
|
|
route-map proxy-redir permit 10
|
|
match ip address 110
|
|
set ip next-hop 208.206.76.44
|
|
</verb>
|
|
|
|
<P>
|
|
So basically from above you can see I added the "route-map" declaration,
|
|
and an access-list, and then turned the route-map on under int e0 "ip
|
|
policy route-map proxy-redir"
|
|
|
|
<P>
|
|
ok, so the Cisco is taken care of at this point. The host above:
|
|
208.206.76.44, is the ip number of my squid host.
|
|
|
|
<P>
|
|
My squid box runs Linux, so I had to do the following on it:
|
|
|
|
<P>
|
|
my kernel (2.0.33) config looks like this:
|
|
<verb>
|
|
#
|
|
# Networking options
|
|
#
|
|
CONFIG_FIREWALL=y
|
|
# CONFIG_NET_ALIAS is not set
|
|
CONFIG_INET=y
|
|
CONFIG_IP_FORWARD=y
|
|
CONFIG_IP_MULTICAST=y
|
|
CONFIG_SYN_COOKIES=y
|
|
# CONFIG_RST_COOKIES is not set
|
|
CONFIG_IP_FIREWALL=y
|
|
# CONFIG_IP_FIREWALL_VERBOSE is not set
|
|
CONFIG_IP_MASQUERADE=y
|
|
# CONFIG_IP_MASQUERADE_IPAUTOFW is not set
|
|
CONFIG_IP_MASQUERADE_ICMP=y
|
|
CONFIG_IP_TRANSPARENT_PROXY=y
|
|
CONFIG_IP_ALWAYS_DEFRAG=y
|
|
# CONFIG_IP_ACCT is not set
|
|
CONFIG_IP_ROUTER=y
|
|
</verb>
|
|
|
|
<P>
|
|
You will need Firewalling and Transparent Proxy turned on at a minimum.
|
|
|
|
<P>
|
|
Then some ipfwadm stuff:
|
|
<verb>
|
|
# Accept all on loopback
|
|
ipfwadm -I -a accept -W lo
|
|
# Accept my own IP, to prevent loops (repeat for each interface/alias)
|
|
ipfwadm -I -a accept -P tcp -D 208.206.76.44 80
|
|
# Send all traffic destined to port 80 to Squid on port 3128
|
|
ipfwadm -I -a accept -P tcp -D 0/0 80 -r 3128
|
|
</verb>
|
|
<P>
|
|
it accepts packets on port 80 (redirected from the Cisco), and redirects
|
|
them to 3128 which is the port my squid process is sitting on. I put all
|
|
this in /etc/rc.d/rc.local
|
|
|
|
<P>
|
|
I am using
|
|
<url url="/Versions/1.1/1.1.20/" name="v1.1.20 of Squid"> with
|
|
<url url="http://hem.passagen.se/hno/squid/squid-1.1.20.host_and_virtual.patch"
|
|
name="Henrik's patch">
|
|
installed. You will want to install this patch if using a setup similar
|
|
to mine.
|
|
|
|
<sect1>The cache is trying to connect to itself...
|
|
<P>
|
|
by <url url="mailto:hno@hem.passagen.se" name="Henrik Nordstrom">
|
|
<P>
|
|
I think almost everyone who has tried to build a transparent proxy
setup has been bitten by this one.
|
|
|
|
<P>
|
|
Measures you can take:
|
|
<itemize>
|
|
<item>
|
|
Deny Squid from fetching objects from itself (using ACL lists).
|
|
<item>
|
|
Apply a small patch that prevents Squid from looping infinitely
|
|
(available from <url url="http://hem.passagen.se/hno/squid/" name="Henrik's Squid Patches">)
|
|
<item>
|
|
Don't run Squid on port 80, and redirect port 80 not destined for
|
|
the local machine to Squid (redirection == ipfilter/ipfw/ipfwadm). This
|
|
avoids the most common loops.
|
|
<item>
|
|
If you are using ipfilter then you should also use transproxyd in
|
|
front of Squid. Squid does not yet know how to interface to ipfilter
|
|
(patches are welcome: squid-bugs@squid-cache.org).
|
|
</itemize>
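<P>
As a sketch of the first measure, the following <em/squid.conf/ lines
refuse any request whose destination is the cache itself. The ACL name
<em/to_self/ and the address 10.0.0.1 are placeholders; use your
cache's own address, and repeat the <em/acl/ line for each address the
machine answers on.
<verb>
acl to_self dst 10.0.0.1/255.255.255.255
http_access deny to_self
</verb>
Place the <em/http_access deny/ line before the rules that allow your clients.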
|
|
|
|
<sect1>Interception caching with FreeBSD
|
|
<label id="trans-freebsd">
|
|
<P>
|
|
by Duane Wessels
|
|
<P>
|
|
I set out yesterday to make transparent caching work with Squid and
|
|
FreeBSD. It was, uh, fun.
|
|
<P>
|
|
It was relatively easy to configure a cisco to divert port 80
|
|
packets to my FreeBSD box. Configuration goes something like this:
|
|
<verb>
|
|
access-list 110 deny tcp host 10.0.3.22 any eq www
|
|
access-list 110 permit tcp any any eq www
|
|
route-map proxy-redirect permit 10
|
|
match ip address 110
|
|
set ip next-hop 10.0.3.22
|
|
int eth2/0
|
|
ip policy route-map proxy-redirect
|
|
</verb>
|
|
Here, 10.0.3.22 is the IP address of the FreeBSD cache machine.
|
|
|
|
<P>
|
|
Once I have packets going to the FreeBSD box, I need to get the
|
|
kernel to deliver them to Squid.
|
|
I started on FreeBSD-2.2.7, and then downloaded
|
|
<url url="ftp://coombs.anu.edu.au/pub/net/ip-filter/"
|
|
name="IPFilter">. This was a dead end for me. The IPFilter distribution
|
|
includes patches to the FreeBSD kernel sources, but many of these had
|
|
conflicts. Then I noticed that the IPFilter page says
|
|
``It comes as a part of [FreeBSD-2.2 and later].'' Fair enough. Unfortunately,
|
|
you can't hijack connections with the FreeBSD-2.2.X IPFIREWALL code (<em/ipfw/), and
|
|
you can't (or at least I couldn't) do it with <em/natd/ either.
|
|
|
|
<P>
|
|
FreeBSD-3.0 has much better support for connection hijacking, so I suggest
|
|
you start with that. You need to build a kernel with the following options:
|
|
<verb>
|
|
options IPFIREWALL
|
|
options IPFIREWALL_FORWARD
|
|
</verb>
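<P>
Building the kernel on FreeBSD-3.x typically goes something like this.
The configuration name MYKERNEL is a placeholder for your own copy of
GENERIC with the two options added; check the FreeBSD Handbook for the
exact procedure on your release.
<verb>
cd /usr/src/sys/i386/conf
cp GENERIC MYKERNEL        # then add the two options lines to MYKERNEL
config MYKERNEL
cd ../../compile/MYKERNEL
make depend
make
make install
</verb>
Reboot into the new kernel before adding the <em/ipfw/ rules.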
|
|
|
|
<P>
|
|
Next, it's time to configure the IP firewall rules with <em/ipfw/.
|
|
By default, there are no "allow" rules and all packets are denied.
|
|
I added these commands to <em>/etc/rc.local</em>
|
|
just to be able to use the machine on my network:
|
|
<verb>
|
|
ipfw add 60000 allow all from any to any
|
|
</verb>
|
|
But we're still not hijacking connections. To accomplish that,
|
|
add these rules:
|
|
<verb>
|
|
ipfw add 49 allow tcp from 10.0.3.22 to any
|
|
ipfw add 50 fwd 127.0.0.1 tcp from any to any 80
|
|
</verb>
|
|
The second line (rule 50) is the one which hijacks the connection.
|
|
The first line makes sure we never hit rule 50 for traffic originated
|
|
by the local machine. This prevents forwarding loops.
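<P>
To see whether rule 50 is actually catching traffic, list the rules
with their packet counters. This sketch assumes the rule numbers used above.
<verb>
# Show all rules with packet and byte counters
ipfw show
</verb>
The counters on rule 50 should increase as clients browse, while
Squid's own outgoing requests should be counted by rule 49.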
|
|
|
|
<P>
|
|
Note that I am not changing the port number here. That is,
|
|
port 80 packets are simply diverted to Squid on port 80.
|
|
My Squid configuration is:
|
|
<verb>
|
|
http_port 80
|
|
httpd_accel_host virtual
|
|
httpd_accel_port 80
|
|
httpd_accel_with_proxy on
|
|
httpd_accel_uses_host_header on
|
|
</verb>
|
|
|
|
<P>
|
|
If you don't want Squid to listen on port 80 (because that
|
|
requires root privileges) then you can use another port.
|
|
In that case your ipfw redirect rule looks like:
|
|
<verb>
|
|
ipfw add 50 fwd 127.0.0.1,3128 tcp from any to any 80
|
|
</verb>
|
|
and the <em/squid.conf/ lines are:
|
|
<verb>
|
|
http_port 3128
|
|
httpd_accel_host virtual
|
|
httpd_accel_port 80
|
|
httpd_accel_with_proxy on
|
|
httpd_accel_uses_host_header on
|
|
</verb>
|
|
|
|
<sect1>Interception caching with ACC Tigris digital access server
|
|
<P>
|
|
by <url url="mailto:John.Saunders@scitec.com.au" name="John Saunders">
|
|
<P>
|
|
This is to do with configuring transparent proxy
|
|
for an ACC Tigris digital access server (like a CISCO 5200/5300
|
|
or an Ascend MAX 4000). I've found that doing this in the NAS
|
|
reduces traffic on the LAN and reduces processing load on the
|
|
CISCO. The Tigris has ample CPU for filtering.
|
|
|
|
<P>
|
|
Step 1 is to create filters that allow local traffic to pass.
|
|
Add as many as needed for all of your address ranges.
|
|
<verb>
|
|
ADD PROFILE IP FILTER ENTRY local1 INPUT 10.0.3.0 255.255.255.0 0.0.0.0 0.0.0.0 NORMAL
|
|
ADD PROFILE IP FILTER ENTRY local2 INPUT 10.0.4.0 255.255.255.0 0.0.0.0 0.0.0.0 NORMAL
|
|
</verb>
|
|
|
|
<P>
|
|
Step 2 is to create a filter to trap port 80 traffic.
|
|
<verb>
|
|
ADD PROFILE IP FILTER ENTRY http INPUT 0.0.0.0 0.0.0.0 0.0.0.0 0.0.0.0 = 0x6 D= 80 NORMAL
|
|
</verb>
|
|
|
|
<P>
|
|
Step 3 is to set the "APPLICATION_ID" on port 80 traffic to 80.
|
|
This causes all packets matching this filter to have ID 80
|
|
instead of the default ID of 0.
|
|
<verb>
|
|
SET PROFILE IP FILTER APPLICATION_ID http 80
|
|
</verb>
|
|
|
|
<P>
|
|
Step 4 is to create a special route that is used for
|
|
packets with "APPLICATION_ID" set to 80. The routing
|
|
engine uses the ID to select which routes to use.
|
|
<verb>
|
|
ADD IP ROUTE ENTRY 0.0.0.0 0.0.0.0 PROXY-IP 1
|
|
SET IP ROUTE APPLICATION_ID 0.0.0.0 0.0.0.0 PROXY-IP 80
|
|
</verb>
|
|
|
|
<P>
|
|
Step 5 is to bind everything to a filter ID called transproxy.
|
|
List all local filters first and the http one last.
|
|
<verb>
|
|
ADD PROFILE ENTRY transproxy local1 local2 http
|
|
</verb>
|
|
|
|
<P>
|
|
With this in place use your RADIUS server to send back the
|
|
``Framed-Filter-Id = transproxy'' key/value pair to the NAS.
|
|
|
|
<P>
|
|
You can check if the filter is being assigned to logins with
|
|
the following command:
|
|
<verb>
|
|
display profile port table
|
|
</verb>
|
|
|
|
<sect1>``Connection reset by peer'' and Cisco policy routing
|
|
<P>
|
|
<url url="mailto:fygrave at tigerteam dot net" name="Fyodor">
|
|
has tracked down the cause of unusual ``connection reset by peer'' messages
|
|
when using Cisco policy routing to hijack HTTP requests.
|
|
<P>
|
|
When the network link between router and the cache goes down for just a
|
|
moment, the packets that are supposed to be redirected are instead sent
|
|
out the default route. If this happens, a TCP ACK from the client host
|
|
may be sent to the origin server, instead of being diverted to the
|
|
cache. The origin server, upon receiving an unexpected ACK packet,
|
|
sends a TCP RESET back to the client, which aborts the client's request.
|
|
<P>
|
|
To work around this problem, you can install a static route to the
|
|
<em/null0/ interface for the cache address with a higher metric (lower
|
|
precedence), such as 250. Then, when the link goes down, packets from the client
|
|
just get dropped instead of sent out the default route. For example, if
|
|
1.2.3.4 is the IP address of your Squid cache, you may add:
|
|
<verb>
|
|
ip route 1.2.3.4 255.255.255.255 Null0 250
|
|
</verb>
|
|
This appears to cause the correct behaviour.
|
|
|
|
|
|
<sect1>WCCP - Web Cache Coordination Protocol
|
|
|
|
<p>
|
|
Contributors: <url url="mailto:glenn@ircache.net" name="Glenn Chisholm"> and
|
|
<url url="mailto:ltd@cisco.com" name="Lincoln Dale">.
|
|
|
|
<sect2>Does Squid support WCCP?
|
|
|
|
<p>
|
|
CISCO's Web Cache Coordination Protocol V1.0 is supported in squid
|
|
2.3 and later. Squid does not yet support WCCP V2.0. Now that WCCP V2 is an open protocol,
|
|
Squid may be able to support it in the future.
|
|
|
|
<sect2>Configuring your Router
|
|
|
|
<p>
|
|
There are two different methods of configuring WCCP on CISCO routers.
|
|
The first method is for routers that only support V1.0 of the
|
|
protocol. The second is for routers that support both.
|
|
|
|
<sect3>IOS Version 11.x
|
|
|
|
<P>
|
|
It is possible that later versions of IOS 11.x will support V2.0 of the
|
|
protocol. If that is the case follow the 12.x instructions. Several
|
|
people have reported that the squid implementation of WCCP does not
|
|
work with their 11.x routers. If you experience this please mail the
|
|
debug output from your router to <em/squid-bugs/.
|
|
|
|
<verb>
|
|
conf t
|
|
|
|
wccp enable
|
|
!
|
|
interface [Interface Carrying Outgoing Traffic]x/x
|
|
!
|
|
ip wccp web-cache redirect
|
|
!
|
|
CTRL Z
|
|
write mem
|
|
</verb>
|
|
|
|
<sect3> IOS Version 12.x
|
|
|
|
<P>
|
|
Some of the early versions of 12.x do not have the 'ip wccp version'
|
|
command. You will need to upgrade your IOS version to use V1.0.
|
|
|
|
<p>
|
|
You will need to be running at least IOS Software Release <em/12.0(5)T/
|
|
if you're running the 12.0 T-train. IOS Software Releases <em/12.0(3)T/
|
|
and <em/12.0(4)T/ do not have WCCPv1, but <em/12.0(5)T/ does.
|
|
|
|
|
|
<verb>
|
|
conf t
|
|
|
|
ip wccp version 1
|
|
ip wccp web-cache
|
|
!
|
|
interface [Interface Carrying Outgoing/Incoming Traffic]x/x
|
|
ip wccp web-cache redirect out|in
|
|
!
|
|
CTRL Z
|
|
write mem
|
|
</verb>
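<P>
To check from the router whether a cache has registered, IOS 12.x has
<em/show/ commands for WCCP. Treat this as a sketch; the exact output
varies between releases.
<verb>
show ip wccp
show ip wccp web-cache detail
</verb>
If no cache is listed, the router has not yet heard from Squid.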
|
|
|
|
<sect2>IOS 12.3 problems
|
|
<p>
|
|
Some people report problems with WCCP and IOS 12.3. They see
|
|
truncated or fragmented GRE packets arriving at the cache. Apparently
|
|
it works if you disable Cisco Express Forwarding for the interface:
|
|
<verb>
|
|
conf t
|
|
ip cef # some systems may need 'ip cef global'
|
|
int Ethernet0/0
|
|
no ip route-cache cef
|
|
CTRL Z
|
|
</verb>
|
|
|
|
<sect2>Configuring FreeBSD
|
|
|
|
<P>
|
|
FreeBSD first needs to be configured to receive and strip the GRE
|
|
encapsulation from the packets from the router. To do this you will
|
|
need to patch and recompile your kernel.
|
|
|
|
<P>
|
|
First, a patch needs to be applied to your kernel for GRE
|
|
support. Apply the
|
|
<url url="../../WCCP-support/FreeBSD-3.x/gre.patch" name="patch for FreeBSD-3.x kernels">
|
|
or the
|
|
<url url="../../WCCP-support/FreeBSD-4.x/gre.patch" name="patch for FreeBSD-4.x kernels">
|
|
as appropriate.
|
|
|
|
<P>
|
|
Secondly you will need to download
|
|
<url url="../../WCCP-support/FreeBSD-3.x/gre.c" name="gre.c for FreeBSD-3.x">
|
|
or
|
|
<url url="../../WCCP-support/FreeBSD-4.x/gre.c" name="gre.c for FreeBSD-4.x">
|
|
and copy it to <em>/usr/src/sys/netinet/gre.c</em>.
|
|
|
|
<P>
|
|
Finally add "options GRE" to your kernel config file and rebuild
|
|
your kernel. Note, the <em/opt_gre.h/ file is
|
|
created when you run <em/config/.
|
|
Once your kernel is installed you will need to
|
|
<ref id="trans-freebsd" name="configure FreeBSD for transparent proxying">.
|
|
|
|
<sect2>Configuring Linux 2.2
|
|
|
|
<p>
|
|
Al Blake has written a <url url="http://www.spc.int/it/TechHead/Wccp-squid.html"
|
|
name="Cookbook for setting up transparent WCCP using Squid on RedHat Linux and a cisco access server">.
|
|
|
|
<P>
|
|
There are currently two methods for supporting WCCP with Linux 2.2:
a special-purpose module, or the standard Linux GRE tunneling
driver. People have reported difficulty with the standard GRE
tunneling driver; however, it does allow GRE functionality other
than WCCP. You should choose the method that suits your environment.
|
|
|
|
<sect3>Standard Linux GRE Tunnel
|
|
|
|
<P>
|
|
Linux 2.2 kernels already support GRE, as long as the GRE module is
|
|
compiled into the kernel.
|
|
|
|
<P>
|
|
You will need to patch the <em/ip_gre.c/ code that comes with your Linux
|
|
kernel with this <url url="http://www.vsb.cz/~hal01/cache/wccp/ip_gre.patch"
|
|
name="patch"> supplied by <url url="mailto:Jan.Haluza@vsb.cz" name="Jan Haluza">.
|
|
|
|
<P>
|
|
Ensure that the GRE code is either built as static or as a module by choosing
|
|
the appropriate option in your kernel config. Then rebuild your kernel.
|
|
If it is a module you will need to:
|
|
<verb>
|
|
modprobe ip_gre
|
|
</verb>
|
|
|
|
The next step is to tell Linux to establish an IP tunnel between the router and
|
|
your host. Daniele Orlandi <!-- daniele@orlandi.com --> reports
|
|
that you have to give the gre1 interface an address, but any old
|
|
address seems to work.
|
|
<verb>
|
|
iptunnel add gre1 mode gre remote <Router-IP> local <Host-IP> dev <interface>
|
|
ifconfig gre1 127.0.0.2 up
|
|
</verb>
|
|
<Router-IP> is the IP address of your router that is intercepting the
|
|
HTTP packets. <Host-IP> is the IP address of your cache, and
|
|
<interface> is the network interface that receives those packets (probably eth0).
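<P>
As an illustration with made-up addresses, suppose the router is
10.0.0.1, the cache is 10.0.0.2, and the packets arrive on eth0. The
commands would then look like this:
<verb>
iptunnel add gre1 mode gre remote 10.0.0.1 local 10.0.0.2 dev eth0
ifconfig gre1 127.0.0.2 up

# Confirm that the tunnel interface exists and is up
ifconfig gre1
</verb>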
|
|
|
|
<sect3>Joe Cooper's Patch
|
|
<p>
|
|
Joe Cooper has a patch for Linux 2.2.18 kernel on his
|
|
<url url="http://www.swelltech.com/pengies/joe/patches/" name="Squid page">.
|
|
|
|
<sect3>WCCP Specific Module
|
|
|
|
<P>
|
|
This module is not part of the standard Linux distribution. It needs
|
|
to be compiled as a module and loaded on your system to function.
|
|
Do not attempt to build this in as a static part of your kernel.
|
|
|
|
<P>
|
|
Download the <url url="../../WCCP-support/Linux/ip_wccp.c" name="Linux WCCP module">
|
|
and compile it as you would any Linux network module.
|
|
|
|
<P>
|
|
Copy the module to <em>/lib/modules/kernel-version/ipv4/ip_wccp.o</em>. Edit
|
|
<em>/lib/modules/kernel-version/modules.dep</em> and add:
|
|
|
|
<verb>
|
|
/lib/modules/kernel-version/ipv4/ip_wccp.o:
|
|
</verb>
|
|
|
|
<P>
|
|
Finally you will need to load the module:
|
|
|
|
<verb>
|
|
modprobe ip_wccp
|
|
</verb>
|
|
|
|
<sect3>Common Steps
|
|
|
|
<P>
|
|
The machine should now be stripping the GRE encapsulation from any packets
|
|
received and requeuing them. The system will also need to be configured
|
|
for transparent proxying, either with <ref id="trans-linux-1" name="ipfwadm">
|
|
or with <ref id="trans-linux-2" name="ipchains">.
|
|
|
|
<sect2>Configuring Others
|
|
|
|
<P>
|
|
If you have managed to configure your operating system to support WCCP
|
|
with Squid
|
|
please contact us with the details so we may share them with others.
|
|
|
|
<sect1>Can someone tell me in which version of Cisco IOS WCCP was added?
|
|
<p>
|
|
IOS releases:
|
|
<itemize>
|
|
<item>11.1(19?)CA/CC or later
|
|
<item>11.2(14)P or later
|
|
<item>12.0(anything) or later
|
|
</itemize>
|
|
|
|
<sect1>What about WCCPv2?
|
|
<p>
|
|
Cisco has published WCCPv2 as an <url url="http://www.ietf.org/internet-drafts/draft-wilson-wrec-wccp-v2-00.txt"
|
|
name="Internet Draft"> (expires Jan 2001).
|
|
At this point, Squid does not support WCCPv2, but anyone
|
|
is welcome to code it up and contribute to the Squid project.
|
|
|
|
<sect1>Interception caching with Foundry L4 switches
|
|
<p>
|
|
by <url url="mailto:signal at shreve dot net" name="Brian Feeny">.
|
|
<p>
|
|
First, configure Squid for transparent caching as detailed
|
|
at the <ref id="trans-caching" name="beginning of this section">.
|
|
<p>
|
|
Next, configure
|
|
the Foundry layer 4 switch to
|
|
transparently redirect traffic to your Squid box or boxes. By default,
|
|
the Foundry
|
|
redirects to port 80 of your squid box. This can
|
|
be changed to a different port if needed, but won't be covered
|
|
here.
|
|
|
|
<p>
|
|
In addition, the switch does a "health check" of the port to make
|
|
sure your Squid is answering. If your Squid does not answer, the
switch defaults to sending traffic directly through instead of
redirecting it. When Squid comes back up, the switch begins
redirecting once again.
|
|
|
|
<p>
|
|
This example assumes you have two squid caches:
|
|
<verb>
|
|
squid1.foo.com 192.168.1.10
|
|
squid2.foo.com 192.168.1.11
|
|
</verb>
|
|
|
|
<p>
|
|
We will assume you have various workstations, customers, etc, plugged
|
|
into the switch, and that you want them to be transparently proxied.
|
|
The squid caches themselves should be plugged into the switch as well.
|
|
Only the interface that the router is connected to is important. Where you
|
|
put the squid caches or other connections does not matter.
|
|
|
|
<p>
|
|
This example assumes your router is plugged into interface <bf/17/
|
|
of the switch. If not, adjust the following commands accordingly.
|
|
|
|
<enum>
|
|
<item>
|
|
Enter configuration mode:
|
|
<verb>
|
|
telnet@ServerIron#conf t
|
|
</verb>
|
|
|
|
<item>
|
|
Configure each squid on the Foundry:
|
|
<verb>
|
|
telnet@ServerIron(config)# server cache-name squid1 192.168.1.10
|
|
telnet@ServerIron(config)# server cache-name squid2 192.168.1.11
|
|
</verb>
|
|
|
|
<item>
|
|
Add the squids to a cache-group:
|
|
<verb>
|
|
telnet@ServerIron(config)#server cache-group 1
|
|
telnet@ServerIron(config-tc-1)#cache-name squid1
|
|
telnet@ServerIron(config-tc-1)#cache-name squid2
|
|
</verb>
|
|
|
|
<item>
|
|
Create a policy for caching HTTP on a local port:
|
|
<verb>
|
|
telnet@ServerIron(config)# ip policy 1 cache tcp http local
|
|
</verb>
|
|
|
|
<item>
|
|
Enable that policy on the port connected to your router:
|
|
<verb>
|
|
telnet@ServerIron(config)#int e 17
|
|
telnet@ServerIron(config-if-17)# ip-policy 1
|
|
</verb>
|
|
|
|
</enum>
|
|
|
|
<p>
|
|
Since all outbound traffic to the Internet goes out interface
|
|
<bf/17/ (the router), and interface <bf/17/ has the caching policy applied to
|
|
it, HTTP traffic is going to be intercepted and redirected to the
|
|
caches you have configured.
|
|
|
|
<p>
|
|
The default port to redirect to can be changed. The load balancing
|
|
algorithm used can be changed (Least Used, Round Robin, etc). Ports
|
|
can be exempted from caching if needed. Access Lists can be applied
|
|
so that only certain source IP Addresses are redirected, etc. This
|
|
information was left out of this document since this was just a quick
|
|
howto that would apply for most people, not meant to be a comprehensive
|
|
manual of how to configure a Foundry switch. I can however revise this
|
|
with any information necessary if people feel it should be included.
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>SNMP
|
|
|
|
<P>
|
|
Contributors: <url url="mailto:glenn@ircache.net" name="Glenn Chisholm">.
|
|
|
|
<sect1>Does Squid support SNMP?
|
|
|
|
<P>
|
|
True SNMP support is available in Squid 2 and above. A significant change in the implementation
occurred starting with the 2.2 development code. There are therefore two sets of instructions
on how to configure SNMP in Squid; please make sure that you follow the correct one.
|
|
|
|
<sect1>Enabling SNMP in Squid
|
|
<P>
|
|
To use SNMP, it must first be enabled with the <em/configure/ script,
|
|
and Squid rebuilt. To enable it, first run the script:
|
|
<verb>
|
|
./configure --enable-snmp [ ... other configure options ]
|
|
</verb>
|
|
Next, recompile after cleaning the source tree:
|
|
<verb>
|
|
make clean
|
|
make all
|
|
make install
|
|
</verb>
|
|
Once the compile is completed and the new binary is installed the <em/squid.conf/ file
|
|
needs to be configured to allow access; the default is to deny all requests. The
|
|
instructions on how to do this have been broken into two parts, the first for all versions
|
|
of Squid from 2.2 onwards and the second for 2.1 and below.
|
|
|
|
<sect1>Configuring Squid 2.2
|
|
|
|
<P>
|
|
To configure SNMP first specify a list of communities that you would like to allow access
|
|
by using a standard acl of the form:
|
|
<verb>
|
|
acl aclname snmp_community string
|
|
</verb>
|
|
For example:
|
|
<verb>
|
|
acl snmppublic snmp_community public
|
|
acl snmpjoebloggs snmp_community joebloggs
|
|
</verb>
|
|
This creates two acl's, with two different communities, public and joebloggs. You can
|
|
name the acl's and the community strings anything that you like.
|
|
|
|
<P>
|
|
To specify the port that the agent listens on, set the "snmp_port" parameter;
it defaults to 3401. Requests that this agent cannot fulfill can be forwarded
to another agent on the port set by "forward_snmpd_port". This option defaults
to off, so it must be configured for forwarding to work. Remember that the
forwarded requests will originate from this agent, so you will need to
configure your access accordingly.
|
|
|
|
<P>
|
|
To allow access to Squid's SNMP agent, define an <em/snmp_access/ ACL with the community
|
|
strings that you previously defined.
|
|
For example:
|
|
<verb>
|
|
snmp_access allow snmppublic localhost
|
|
snmp_access deny all
|
|
</verb>
|
|
The above will allow anyone on the localhost who uses the community <em/public/ to
|
|
access the agent. It will deny all others access.
|
|
<p>
|
|
|
|
If you do not define any <em/snmp_access/ ACL's, then
|
|
SNMP access is denied by default.
|
|
|
|
<P>
|
|
Finally, Squid allows you to configure the addresses that the agent will bind to
for incoming and outgoing traffic. These default to 0.0.0.0. Changing them
will cause the agent to bind to a specific address on the host, rather than to
all addresses, which is the default.
|
|
<verb>
|
|
snmp_incoming_address 0.0.0.0
|
|
snmp_outgoing_address 0.0.0.0
|
|
</verb>
|
|
|
|
<sect1>Configuring Squid 2.1
|
|
<P>
|
|
In Squid 2.1 and earlier, the SNMP code had a number of issues with the ACL's. If you are
|
|
a frequent user of SNMP with Squid, please upgrade to 2.2 or higher.
|
|
<p>
|
|
|
|
A sort of default, working configuration is:
|
|
<verb>
|
|
snmp_port 3401
|
|
snmp_mib_path /local/squid/etc/mib.txt
|
|
|
|
snmp_agent_conf view all .1.3.6 included
|
|
snmp_agent_conf view squid .1.3.6 included
|
|
snmp_agent_conf user squid - all all public
|
|
snmp_agent_conf user all all all all squid
|
|
snmp_agent_conf community public squid squid
|
|
snmp_agent_conf community readwrite all all
|
|
</verb>
|
|
<P>
|
|
Note that for security you are advised to restrict SNMP access to your
|
|
caches. You can do this easily as follows:
|
|
<verb>
|
|
acl snmpmanagementhosts 1.2.3.4/255.255.255.255 1.2.3.0/255.255.255.0
|
|
snmp_acl public deny all !snmpmanagementhosts
|
|
snmp_acl readwrite deny all
|
|
</verb>
|
|
You must follow these instructions for 2.1 and below exactly or you are
|
|
likely to have problems. The parser has some issues which have been corrected
|
|
in 2.2.
|
|
|
|
<sect1>How can I query the Squid SNMP Agent?
|
|
|
|
<P>
|
|
You can test if your Squid supports SNMP with the <em/snmpwalk/ program
|
|
(<em/snmpwalk/ is a part of the
|
|
<url url="http://net-snmp.sourceforge.net/" name="NET-SNMP project">).
|
|
Note that you have to specify the SNMP port, which in Squid defaults to
|
|
3401.
|
|
<verb>
|
|
snmpwalk -p 3401 hostname communitystring .1.3.6.1.4.1.3495.1.1
|
|
</verb>
|
|
If it gives output like:
|
|
<verb>
|
|
enterprises.nlanr.squid.cacheSystem.cacheSysVMsize = 7970816
|
|
enterprises.nlanr.squid.cacheSystem.cacheSysStorage = 2796142
|
|
enterprises.nlanr.squid.cacheSystem.cacheUptime = Timeticks: (766299) 2:07:42.99
|
|
</verb>
|
|
then it is working ok, and you should be able to make nice statistics out of it.
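<P>
If you want to feed these values into your own scripts, the output is easy
to pick apart. The following is only a minimal Python sketch, assuming the
"name = value" output format shown above; the function name
<em/parse_snmpwalk/ is made up for this example and is not part of Squid or
NET-SNMP.

<verb>
```python
def parse_snmpwalk(text):
    """Turn 'name = value' lines into a dict keyed by the last OID component."""
    values = {}
    for line in text.splitlines():
        if " = " not in line:
            continue  # skip blank lines and anything that is not a result
        name, value = line.split(" = ", 1)
        values[name.strip().split(".")[-1]] = value.strip()
    return values


# Sample text copied from the snmpwalk output shown above.
sample = """\
enterprises.nlanr.squid.cacheSystem.cacheSysVMsize = 7970816
enterprises.nlanr.squid.cacheSystem.cacheSysStorage = 2796142
enterprises.nlanr.squid.cacheSystem.cacheUptime = Timeticks: (766299) 2:07:42.99
"""
stats = parse_snmpwalk(sample)
print(stats["cacheSysVMsize"], stats["cacheUptime"])
```
</verb>

In practice you would pass in the text captured from running <em/snmpwalk/
against your own cache, rather than the sample text.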
|
|
|
|
<P>
|
|
For an explanation of what every string (OID) does, you should
|
|
refer to the <url url="/SNMP/"
|
|
name="Squid SNMP web pages">.
|
|
|
|
<sect1>What can I use SNMP and Squid for?
|
|
<P>
|
|
There are a lot of things you can do with SNMP and Squid. It can be useful
to some extent for a longer-term overview of how your proxy is doing. It can
also be used as a problem solver. For example: how is your file descriptor
usage doing, or how much does your LRU vary over a day? These are things
you can't monitor very well normally, aside from clicking at the cachemgr
frequently. Why not let MRTG do it for you?
|
|
|
|
<sect1>How can I use SNMP with Squid?
|
|
<p>
|
|
There are a number of tools that you can use to monitor Squid via
|
|
SNMP. Many people use MRTG. Another good combination is <url
|
|
url="http://net-snmp.sourceforge.net/" name="NET-SNMP"> plus <url
|
|
url="http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/"
|
|
name="RRDTool">. You might be able to find more
|
|
information at the <url url="/SNMP/"
|
|
name="Squid SNMP web pages">.
|
|
|
|
|
|
<sect2>MRTG
|
|
<P>
|
|
Some people use <url url="http://www.mrtg.org/" name="MRTG">
|
|
to query Squid through its SNMP interface.
|
|
|
|
<P>
|
|
For instructions on using MRTG with Squid, please visit these pages:
|
|
<enum>
|
|
<item><url url="http://unary.calvin.edu/squid.html" name="Squid + MRTG graphs">
|
|
</enum>
|
|
|
|
<sect1>Where can I get more information/discussion about Squid and SNMP?
|
|
|
|
<P>
|
|
General Discussion: <url url="mailto:cache-snmp@ircache.net" name="cache-snmp@ircache.net">.
|
|
These messages are <url url="http://www.squid-cache.org/mail-archive/cache-snmp/"
|
|
name="archived">.
|
|
|
|
<P>
|
|
Subscriptions should be sent to: <url url="mailto:cache-snmp-request@ircache.net"
|
|
name="cache-snmp-request@ircache.net">.
|
|
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>Squid version 2
|
|
|
|
<sect1>What are the new features?
|
|
|
|
<P>
|
|
<itemize>
|
|
<item>HTTP/1.1 persistent connections.
|
|
<item>Lower VM usage; in-transit objects are not held fully in memory.
|
|
<item>Totally independent swap directories.
|
|
<item>Customizable error texts.
|
|
<item>FTP supported internally; no more ftpget.
|
|
<item>Asynchronous disk operations (optional, requires pthreads library).
|
|
<item>Internal icons for FTP and gopher directories.
|
|
<item>snprintf() used everywhere instead of sprintf().
|
|
<item>SNMP.
|
|
<item><url url="/urn-support.html" name="URN support">
|
|
<item>Routing requests based on AS numbers.
|
|
<item><url url="FAQ-16.html" name="Cache Digests">
|
|
<item>...and many more!
|
|
</itemize>
|
|
|
|
|
|
|
|
<sect1>How do I configure 'ssl_proxy' now?
|
|
<P>
|
|
By default, Squid connects directly to origin servers for SSL requests.
|
|
But if you must force SSL requests through a parent, first tell Squid
|
|
it can not go direct for SSL:
|
|
<verb>
|
|
acl SSL method CONNECT
|
|
never_direct allow SSL
|
|
</verb>
|
|
With this in place, Squid <em/should/ pick one of your parents to
|
|
use for SSL requests. If you want it to pick a particular parent,
|
|
you must use the <em/cache_peer_access/ configuration:
|
|
<verb>
|
|
cache_peer parent1 parent 3128 3130
|
|
cache_peer parent2 parent 3128 3130
|
|
cache_peer_access parent2 allow !SSL
|
|
</verb>
|
|
The above lines tell Squid to NOT use <em/parent2/ for SSL, so it
|
|
should always use <em/parent1/.
|
|
|
|
<sect1>Logfile rotation doesn't work with Async I/O
|
|
|
|
<P>
|
|
This is a known limitation when using Async I/O on Linux. The Linux
|
|
Threads package steals (uses internally) the SIGUSR1 signal that squid uses
|
|
to rotate logs.
|
|
|
|
<P>
|
|
In order not to disturb the threads package, SIGUSR1 use is disabled in
Squid when threads are enabled on Linux.
|
|
|
|
<sect1>Adding a new cache disk
|
|
<P>
|
|
Simply add your new <em/cache_dir/ line to <em/squid.conf/, then
|
|
run <em/squid -z/ again. Squid will create swap directories on the
|
|
new disk and leave the existing ones in place.
|
|
|
|
<sect1>Squid 2 performs badly on Linux
|
|
<P>
|
|
by <url url="mailto:hno@hem.passagen.se" name="Henrik Nordstrom">
|
|
<P>
|
|
You may have enabled Asynchronous I/O with the <em/--enable-async-io/
|
|
configure option.
|
|
Be careful when using threads on Linux. Most versions of libc5 and
|
|
early versions of glibc have problems with threaded applications. I
|
|
would not recommend <em/--enable-async-io/ on Linux unless your system
|
|
uses a recent version of glibc.
|
|
|
|
<P>
|
|
You should also know that <em/--enable-async-io/ is not optimal unless
|
|
you have a very busy cache. For low loads the cache performs slightly
|
|
better without <em/--enable-async-io/.
|
|
|
|
Try recompiling Squid without <em/--enable-async-io/. If a non-threaded
|
|
Squid performs better, then your libc probably can't handle threads
correctly. (Don't forget to run "make clean" after running configure.)
|
|
|
|
<sect1>How do I configure proxy authentication with Squid-2?
|
|
<label id="configuring-proxy-auth">
|
|
<P>
|
|
For Squid-2, the implementation and configuration have changed.
|
|
Authentication is now handled via external processes.
|
|
Arjan's <url url="http://www.iae.nl/users/devet/squid/proxy_auth/" name="proxy auth page">
|
|
describes how to set it up. Some simple instructions are given below as well.
|
|
|
|
<enum>
|
|
<item>
|
|
We assume you have configured an ACL entry with proxy_auth, for example:
|
|
<verb>
|
|
acl foo proxy_auth REQUIRED
|
|
http_access allow foo
|
|
</verb>
|
|
|
|
<item>
|
|
You will need to compile and install an external authenticator program.
|
|
Most people will want to use <em/ncsa_auth/. The source for this program
|
|
is included in the source distribution, in the <em>auth_modules/NCSA</em>
|
|
directory.
|
|
<verb>
|
|
% cd auth_modules/NCSA
|
|
% make
|
|
% make install
|
|
</verb>
|
|
You should now have an <em/ncsa_auth/ program in the same directory where
|
|
your <em/squid/ binary lives.
|
|
|
|
<item>
|
|
You may need to create a password file. If you have been using
|
|
proxy authentication before, you probably already have such a file.
|
|
You can get <url url="../../htpasswd/" name="apache's htpasswd program">
|
|
from our server. Pick a pathname for your password file. We will assume
|
|
you will want to put it in the same directory as your squid.conf.
|
|
|
|
<item>
|
|
Configure the external authenticator in <em/squid.conf/.
|
|
For <em/ncsa_auth/ you need to give the pathname to the executable and
|
|
the password file as an argument. For example:
|
|
<verb>
|
|
authenticate_program /usr/local/squid/bin/ncsa_auth /usr/local/squid/etc/passwd
|
|
</verb>
|
|
</enum>
|
|
|
|
<P>
|
|
After all that, you should be able to start up Squid. If we left something out, or
|
|
haven't been clear enough, please let us know (squid-faq@squid-cache.org).
|
|
|
|
<sect1>Why does proxy-auth reject all users with Squid-2.2?
|
|
<P>
|
|
The ACL for proxy-authentication has changed from:
|
|
<verb>
|
|
acl foo proxy_auth timeout
|
|
</verb>
|
|
to:
|
|
<verb>
|
|
acl foo proxy_auth username
|
|
</verb>
|
|
Please update your ACL appropriately - a username of <em/REQUIRED/ will permit
|
|
all valid usernames. The timeout is now specified with the configuration
|
|
option:
|
|
<verb>
|
|
authenticate_ttl timeout
|
|
</verb>
|
|
|
|
<sect1>Delay Pools
|
|
|
|
<P>
|
|
by <url url="mailto:luyer@ucs.uwa.edu.au" name="David Luyer">.
|
|
|
|
<P>
|
|
<bf>
|
|
The information here is current for version 2.2. It is strongly
|
|
recommended that you use at least Squid 2.2 if you wish to use delay pools.
|
|
</bf>
|
|
|
|
<P>
|
|
Delay pools provide a way to limit the bandwidth of certain requests
|
|
based on any list of criteria. The idea came from a Western Australian
|
|
university who wanted to restrict student traffic costs (without
|
|
affecting staff traffic, and still getting cache and local peering hits
|
|
at full speed). There was some early Squid 1.0 code by Central Network
|
|
Services at Murdoch University, which I then developed (at the University
|
|
of Western Australia) into a much more complex patch for Squid 1.0
|
|
called ``DELAY_HACK.'' I then tried to code it in a much cleaner style
|
|
and with slightly more generic options than I personally needed, and
|
|
called this ``delay pools'' in Squid 2. I almost completely recoded
|
|
this in Squid 2.2 to provide the greater flexibility requested by people
|
|
using the feature.
|
|
|
|
<P>
|
|
To enable delay pools features in Squid 2.2, you must use the
|
|
<em>--enable-delay-pools</em> configure option before compilation.
|
|
|
|
<P>
|
|
Terminology for this FAQ entry:
|
|
|
|
<descrip>
|
|
<tag/pool/
|
|
a collection of bucket groups as appropriate to a given class
|
|
|
|
<tag/bucket group/
|
|
a group of buckets within a pool, such as the per-host bucket
|
|
group, the per-network bucket group or the aggregate bucket
|
|
group (the aggregate bucket group is actually a single bucket)
|
|
|
|
<tag/bucket/
|
|
an individual delay bucket represents a traffic allocation
|
|
which is replenished at a given rate (up to a given limit) and
|
|
causes traffic to be delayed when empty
|
|
|
|
<tag/class/
|
|
the class of a delay pool determines how the delay is applied,
|
|
	ie, whether the different client IPs are treated separately or
|
|
as a group (or both)
|
|
|
|
<tag/class 1/
|
|
a class 1 delay pool contains a single unified bucket which is
|
|
used for all requests from hosts subject to the pool
|
|
|
|
<tag/class 2/
|
|
a class 2 delay pool contains one unified bucket and 255
|
|
buckets, one for each host on an 8-bit network (IPv4 class C)
|
|
|
|
<tag/class 3/
|
|
contains 255 buckets for the subnets in a 16-bit network, and
|
|
individual buckets for every host on these networks (IPv4 class
|
|
B)
|
|
|
|
</descrip>
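<P>
To make the bucket idea more concrete, here is a toy model in Python. It is
only a sketch of the behaviour described in the terminology above
(replenished at a given rate, up to a given limit, and delaying traffic when
empty); the class and method names are invented for this example and it is
not Squid's actual implementation.

<verb>
```python
class DelayBucket:
    """Toy model of one delay bucket (illustrative names, not Squid's code)."""

    def __init__(self, restore, maximum, initial_percent=50):
        self.restore = restore      # bytes added to the bucket per second
        self.maximum = maximum      # the bucket never holds more than this
        self.level = maximum * initial_percent // 100

    def tick(self, seconds=1):
        """Replenish the bucket, capped at the maximum."""
        self.level = min(self.maximum, self.level + self.restore * seconds)

    def take(self, wanted):
        """Return how many bytes may be read now; 0 means the read is deferred."""
        granted = min(wanted, self.level)
        self.level -= granted
        return granted


bucket = DelayBucket(restore=8000, maximum=16000, initial_percent=50)
first = bucket.take(10000)    # the bucket starts half full
second = bucket.take(10000)   # the bucket is now empty
bucket.tick()                 # one second passes and the bucket is replenished
third = bucket.take(10000)
print(first, second, third)
```
</verb>

A class 2 or class 3 pool can be thought of as several such buckets
(aggregate, network, individual) that must all have allowance left before
a read proceeds.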
|
|
|
|
<P>
|
|
Delay pools allow you to limit traffic for clients or client groups,
|
|
with various features:
|
|
<itemize>
|
|
<item>
|
|
can specify peer hosts which aren't affected by delay pools,
|
|
ie, local peering or other 'free' traffic (with the
|
|
<em>no-delay</em> peer option).
|
|
|
|
<item>
|
|
delay behavior is selected by ACLs (low and high priority
|
|
traffic, staff vs students or student vs authenticated student
|
|
or so on).
|
|
|
|
<item>
|
|
each group of users has a number of buckets, a bucket has an
|
|
amount coming into it in a second and a maximum amount it can
|
|
	grow to; when it reaches zero, object reads are deferred
|
|
until one of the object's clients has some traffic allowance.
|
|
|
|
<item>
|
|
any number of pools can be configured with a given class and
|
|
any set of limits within the pools can be disabled, for example
|
|
you might only want to use the aggregate and per-host bucket
|
|
groups of class 3, not the per-network one.
|
|
|
|
</itemize>
|
|
|
|
<P>
|
|
This allows options such as creating a number of class 1 delay pools
|
|
and allowing a certain amount of bandwidth to given object types (by
|
|
using URL regular expressions or similar), and many other uses I'm sure
|
|
I haven't even thought of beyond the original fair balancing of a
|
|
relatively small traffic allocation across a large number of users.
|
|
|
|
<P>
|
|
There are some limitations of delay pools:
|
|
<itemize>
|
|
<item>
|
|
delay pools are incompatible with slow aborts; quick abort
|
|
	should be set fairly low to prevent objects being retrieved at
|
|
full speed once there are no clients requesting them (as the
|
|
traffic allocation is based on the current clients, and when
|
|
there are no clients attached to the object there is no way to
|
|
determine the traffic allocation).
|
|
<item>
|
|
	delay pools only limit the actual data transferred and are not
	inclusive of overheads such as TCP overheads, ICP, DNS, ICMP
|
|
pings, etc.
|
|
<item>
|
|
it is possible for one connection or a small number of
|
|
connections to take all the bandwidth from a given bucket and
|
|
the other connections to be starved completely, which can be a
|
|
major problem if there are a number of large objects being
|
|
transferred and the parameters are set in a way that a few
|
|
large objects will cause all clients to be starved (potentially
|
|
fixed by a currently experimental patch).
|
|
</itemize>
|
|
|
|
<sect2>How can I limit Squid's total bandwidth to, say, 512 Kbps?
|
|
|
|
<P>
|
|
<verb>
|
|
acl all src 0.0.0.0/0.0.0.0 # might already be defined
|
|
delay_pools 1
|
|
delay_class 1 1
|
|
delay_access 1 allow all
|
|
delay_parameters 1 64000/64000 # 512 kbits == 64 kbytes per second
|
|
</verb>
|
|
|
|
<bf>
|
|
For an explanation of these tags please see the configuration file.
|
|
</bf>
|
|
|
|
|
|
<P>
|
|
The 1 second buffer (max = restore = 64kbytes/sec) is because a limit
|
|
is requested, and no responsiveness to a burst is requested. If you
|
|
want it to be able to respond to a burst, increase the aggregate_max to
|
|
a larger value, and traffic bursts will be handled. It is recommended
|
|
that the maximum is at least twice the restore value - if there is only
|
|
a single object being downloaded, sometimes the download rate will fall
|
|
below the requested throughput as the bucket is not empty when it comes
|
|
to be replenished.
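<P>
The conversion from a line speed in kilobits per second to the
bytes-per-second values used by <em/delay_parameters/ can be sketched in
Python as follows. The helper names are hypothetical, and the sketch assumes
1 kbit = 1000 bits, as the example above does (512 kbits == 64000 bytes per
second).

<verb>
```python
def kbits_to_bytes_per_sec(kbits):
    """Convert kilobits per second to bytes per second (1 kbit = 1000 bits)."""
    return kbits * 1000 // 8


def delay_parameter(kbits, burst_seconds=1):
    """Build a restore/maximum pair; the maximum holds burst_seconds of traffic."""
    restore = kbits_to_bytes_per_sec(kbits)
    return f"{restore}/{restore * burst_seconds}"


print(delay_parameter(512))       # strict limit, as in the example above
print(delay_parameter(512, 2))    # maximum is twice the restore value
```
</verb>

The second call follows the recommendation above that the maximum be at
least twice the restore value.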
|
|
|
|
<sect2>How to limit a single connection to 128 Kbps?
|
|
|
|
<P>
|
|
You can not limit a single HTTP request's connection speed. You
|
|
<EM>can</EM> limit individual hosts to some bandwidth rate. To limit a
|
|
specific host, define an <EM>acl</EM> for that host and use the example
|
|
above. To limit a group of hosts, then you must use a delay pool of
|
|
class 2 or 3. For example:
|
|
<verb>
|
|
acl only128kusers src 192.168.1.0/255.255.192.0
|
|
acl all src 0.0.0.0/0.0.0.0
|
|
delay_pools 1
|
|
delay_class 1 3
|
|
delay_access 1 allow only128kusers
|
|
delay_access 1 deny all
|
|
delay_parameters 1 64000/64000 -1/-1 16000/64000
|
|
|
|
</verb>
|
|
<bf>
|
|
For an explanation of these tags please see the configuration file.
|
|
</bf>
|
|
|
|
The above gives a solution where a cache is given a total of 512kbits to
|
|
operate in, and each IP address gets only 128kbits out of that pool.
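<P>
If you want to double-check what a <em/delay_parameters/ line means, a small
script can translate it back into line speeds. This is only an illustrative
Python sketch; the function names are made up for this example, and it
assumes the class 3 ordering (aggregate, network, individual) and that -1
means "unlimited".

<verb>
```python
def parse_delay_parameters(line):
    """Split a delay_parameters line into a pool number and restore/maximum pairs."""
    fields = line.split("#")[0].split()   # drop any trailing comment
    if not fields or fields[0] != "delay_parameters":
        raise ValueError("not a delay_parameters line")
    pool = int(fields[1])
    pairs = []
    for field in fields[2:]:
        restore, maximum = (int(part) for part in field.split("/"))
        pairs.append((restore, maximum))
    return pool, pairs


def describe(restore):
    """Render a restore rate in kbit/s; -1 means unlimited."""
    return "unlimited" if restore == -1 else f"{restore * 8 // 1000} kbit/s"


pool, pairs = parse_delay_parameters("delay_parameters 1 64000/64000 -1/-1 16000/64000")
names = ["aggregate", "network", "individual"]   # class 3 ordering
summary = {name: describe(restore) for name, (restore, _maximum) in zip(names, pairs)}
print(pool, summary)
```
</verb>

For the line used above, this reports 512 kbit/s for the aggregate bucket,
no limit for the network buckets, and 128 kbit/s for each individual host.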
|
|
|
|
<sect2>How do you personally use delay pools?
|
|
|
|
<P>
|
|
We have six local cache peers, all with the options 'proxy-only no-delay'
|
|
since they are fast machines connected via a fast ethernet and microwave (ATM)
|
|
network.
|
|
|
|
<P>
|
|
For our local access we use a dstdomain ACL, and for delay pool exceptions
|
|
we use a dst ACL as well since the delay pool ACL processing is done using
|
|
'fast lookups', which means (among other things) it won't wait for a DNS
|
|
lookup if it would need one.
|
|
|
|
<P>
|
|
Our proxy has two virtual interfaces, one which requires student
|
|
authentication to connect from machines where a department is not
|
|
paying for traffic, and one which uses delay pools. Also, users of the
|
|
main Unix system are allowed to choose slow or fast traffic, but must
|
|
pay for any traffic they do using the fast cache. Ident lookups are
|
|
disabled for accesses through the slow cache since they aren't needed.
|
|
Slow accesses are delayed using a class 3 delay pool to give fairness
|
|
between departments as well as between users. We recognize users of
|
|
Lynx on the main host are grouped together in one delay bucket but they
|
|
are mostly viewing text pages anyway, so this isn't considered a
|
|
serious problem. If it was we could take those hosts into a class 1
|
|
delay pool and give it a larger allocation.
|
|
|
|
<P>
|
|
I prefer using a slow restore rate and a large maximum rate to give
|
|
preference to people who are looking at web pages as their individual
|
|
bucket fills while they are reading, and those downloading large
|
|
objects are disadvantaged. This depends on which clients you believe
|
|
are more important. Also, one individual 8 bit network (a residential
|
|
college) has paid extra to get more bandwidth.
|
|
|
|
<P>
|
|
The relevant parts of my configuration file are (IP addresses, etc, all
|
|
changed):
|
|
<verb>
|
|
# ACL definitions
|
|
# Local network definitions, domains a.net, b.net
|
|
acl LOCAL-NET dstdomain a.net b.net
|
|
# Local network; nets 64 - 127. Also nearby network class A, 10.
|
|
acl LOCAL-IP dst 192.168.64.0/255.255.192.0 10.0.0.0/255.0.0.0
|
|
# Virtual i/f used for slow access
|
|
acl virtual_slowcache myip 192.168.100.13/255.255.255.255
|
|
# All permitted slow access, nets 96 - 127
|
|
acl slownets src 192.168.96.0/255.255.224.0
|
|
# Special 'fast' slow access, net 123
|
|
acl fast_slow src 192.168.123.0/255.255.255.0
|
|
# User hosts
|
|
acl my_user_hosts src 192.168.100.2/255.255.255.254
|
|
# "All" ACL
|
|
acl all src 0.0.0.0/0.0.0.0
|
|
|
|
# Don't need ident lookups for billing on (free) slow cache
|
|
ident_lookup_access allow my_user_hosts !virtual_slowcache
|
|
ident_lookup_access deny all
|
|
|
|
# Security access checks
|
|
http_access [...]
|
|
|
|
# These people get in for slow cache access
|
|
http_access allow virtual_slowcache slownets
|
|
http_access deny virtual_slowcache
|
|
|
|
# Access checks for main cache
|
|
http_access [...]
|
|
|
|
# Delay definitions (read config file for clarification)
|
|
delay_pools 2
|
|
delay_initial_bucket_level 50
|
|
|
|
delay_class 1 3
|
|
delay_access 1 allow virtual_slowcache !LOCAL-NET !LOCAL-IP !fast_slow
|
|
delay_access 1 deny all
|
|
delay_parameters 1 8192/131072 1024/65536 256/32768
|
|
|
|
delay_class 2 2
|
|
delay_access 2 allow virtual_slowcache !LOCAL-NET !LOCAL-IP fast_slow
|
|
delay_access 2 deny all
|
|
delay_parameters 2 2048/65536 512/32768
|
|
</verb>
|
|
|
|
<P>
|
|
The same code is also used by some departments using class 2 delay
|
|
pools to give them more flexibility in giving different performance to
|
|
different labs or students.
|
|
|
|
<sect2>Where else can I find out about delay pools?
|
|
|
|
<P>
|
|
This is also pretty well documented in the configuration file, with
|
|
examples. Since people seem to lose their config files, here's a copy
|
|
of the relevant section.
|
|
|
|
<verb>
|
|
# DELAY POOL PARAMETERS (all require DELAY_POOLS compilation option)
|
|
# -----------------------------------------------------------------------------
|
|
|
|
# TAG: delay_pools
|
|
# This represents the number of delay pools to be used. For example,
|
|
# if you have one class 2 delay pool and one class 3 delays pool, you
|
|
# have a total of 2 delay pools.
|
|
#
|
|
# To enable this option, you must use --enable-delay-pools with the
|
|
# configure script.
|
|
#delay_pools 0
|
|
|
|
# TAG: delay_class
|
|
# This defines the class of each delay pool. There must be exactly one
|
|
# delay_class line for each delay pool. For example, to define two
|
|
# delay pools, one of class 2 and one of class 3, the settings above
|
|
# and here would be:
|
|
#
|
|
#delay_pools 2 # 2 delay pools
|
|
#delay_class 1 2 # pool 1 is a class 2 pool
|
|
#delay_class 2 3 # pool 2 is a class 3 pool
|
|
#
|
|
# The delay pool classes are:
|
|
#
|
|
# class 1 Everything is limited by a single aggregate
|
|
# bucket.
|
|
#
|
|
# class 2 Everything is limited by a single aggregate
|
|
# bucket as well as an "individual" bucket chosen
|
|
# from bits 25 through 32 of the IP address.
|
|
#
|
|
# class 3 Everything is limited by a single aggregate
|
|
# bucket as well as a "network" bucket chosen
|
|
# from bits 17 through 24 of the IP address and a
|
|
# "individual" bucket chosen from bits 17 through
|
|
# 32 of the IP address.
|
|
#
|
|
# NOTE: If an IP address is a.b.c.d
|
|
# -> bits 25 through 32 are "d"
|
|
# -> bits 17 through 24 are "c"
|
|
# -> bits 17 through 32 are "c * 256 + d"
|
|
|
|
# TAG: delay_access
|
|
# This is used to determine which delay pool a request falls into.
|
|
# The first matched delay pool is always used, ie, if a request falls
|
|
# into delay pool number one, no more delay are checked, otherwise the
|
|
# rest are checked in order of their delay pool number until they have
|
|
# all been checked. For example, if you want some_big_clients in delay
|
|
# pool 1 and lotsa_little_clients in delay pool 2:
|
|
#
|
|
#delay_access 1 allow some_big_clients
|
|
#delay_access 1 deny all
|
|
#delay_access 2 allow lotsa_little_clients
|
|
#delay_access 2 deny all
|
|
|
|
# TAG: delay_parameters
|
|
# This defines the parameters for a delay pool. Each delay pool has
|
|
# a number of "buckets" associated with it, as explained in the
|
|
# description of delay_class. For a class 1 delay pool, the syntax is:
|
|
#
|
|
#delay_parameters pool aggregate
|
|
#
|
|
# For a class 2 delay pool:
|
|
#
|
|
#delay_parameters pool aggregate individual
|
|
#
|
|
# For a class 3 delay pool:
|
|
#
|
|
#delay_parameters pool aggregate network individual
|
|
#
|
|
# The variables here are:
|
|
#
|
|
# pool a pool number - ie, a number between 1 and the
|
|
# number specified in delay_pools as used in
|
|
# delay_class lines.
|
|
#
|
|
# aggregate the "delay parameters" for the aggregate bucket
|
|
# (class 1, 2, 3).
|
|
#
|
|
# individual the "delay parameters" for the individual
|
|
# buckets (class 2, 3).
|
|
#
|
|
# network the "delay parameters" for the network buckets
|
|
# (class 3).
|
|
#
|
|
# A pair of delay parameters is written restore/maximum, where restore is
|
|
# the number of bytes (not bits - modem and network speeds are usually
|
|
# quoted in bits) per second placed into the bucket, and maximum is the
|
|
# maximum number of bytes which can be in the bucket at any time.
|
|
#
|
|
# For example, if delay pool number 1 is a class 2 delay pool as in the
|
|
# above example, and is being used to strictly limit each host to 64kbps
|
|
# (plus overheads), with no overall limit, the line is:
|
|
#
|
|
#delay_parameters 1 -1/-1 8000/8000
|
|
#
|
|
# Note that the figure -1 is used to represent "unlimited".
|
|
#
|
|
# And, if delay pool number 2 is a class 3 delay pool as in the above
|
|
# example, and you want to limit it to a total of 256kbps (strict limit)
|
|
# with each 8-bit network permitted 64kbps (strict limit) and each
|
|
# individual host permitted 4800bps with a bucket maximum size of 64kb
|
|
# to permit a decent web page to be downloaded at a decent speed
|
|
# (if the network is not being limited due to overuse) but slow down
|
|
# large downloads more significantly:
|
|
#
|
|
#delay_parameters 2 32000/32000 8000/8000 600/64000
|
|
#
|
|
# There must be one delay_parameters line for each delay pool.
|
|
|
|
# TAG: delay_initial_bucket_level (percent, 0-100)
|
|
# The initial bucket percentage is used to determine how much is put
|
|
# in each bucket when squid starts, is reconfigured, or first notices
|
|
# a host accessing it (in class 2 and class 3, individual hosts and
|
|
# networks only have buckets associated with them once they have been
|
|
# "seen" by squid).
|
|
#
|
|
#delay_initial_bucket_level 50
|
|
</verb>
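<p>
As a worked example of the conversion (a sketch, assuming 1 kbps means
1000 bits per second), divide each bit rate by eight to get the
bytes-per-second figures used in the <em/delay_parameters/ examples above:
<verb>
  64 kbps =  64000 bits/sec / 8 =  8000 bytes/sec
 256 kbps = 256000 bits/sec / 8 = 32000 bytes/sec
4800 bps  =   4800 bits/sec / 8 =   600 bytes/sec
</verb>
With a restore rate of 600 bytes/sec and a maximum of 64000 bytes, an
empty individual bucket takes about 64000 / 600 = 107 seconds to refill.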
|
|
|
|
<sect1>Can I preserve my cache when upgrading from 1.1 to 2?
|
|
<P>
|
|
At the moment we do not have a script which will convert your cache
|
|
contents from the 1.1 to the Squid-2 format. If enough people ask for
|
|
one, then somebody will probably write such a script.
|
|
|
|
<P>
|
|
If you like, you can configure a new Squid-2 cache with your old
|
|
Squid-1.1 cache as a sibling. After a few days, weeks, or
|
|
however long you want to wait, shut down the old Squid cache.
|
|
If you want to force-load your new cache with the objects
|
|
from the old cache, you can try something like this:
|
|
<enum>
|
|
<item>
|
|
Install Squid-2 and configure it to have the same
|
|
amount of disk space as your Squid-1 cache, even
|
|
if there is not currently that much space free.
|
|
<item>
|
|
Configure Squid-2 with Squid-1 as a parent cache.
|
|
You might want to enable <em/never_direct/ on
|
|
the Squid-2 cache so that all of Squid-2's requests
|
|
go through Squid-1.
|
|
<item>
|
|
Enable the <ref id="purging-objects" name="PURGE method"> on Squid-1.
|
|
<item>
|
|
Set the refresh rules on Squid-1 to be very liberal so that it
|
|
does not generate IMS requests for cached objects.
|
|
<item>
|
|
Create a list of all the URLs in the Squid-1 cache. These can
|
|
be extracted from the access.log, store.log and swap logs.
|
|
<item>
|
|
For every URL in the list, request the URL from Squid-2, and then
|
|
immediately send a PURGE request to Squid-1.
|
|
<item>
|
|
Eventually Squid-2 will have all the objects, and Squid-1
|
|
will be empty.
|
|
</enum>
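<p>
The Squid-2 side of the procedure above might be configured along these
lines. This is only a sketch: the host name <em/old-squid.example.com/ and
the port numbers 3128 and 3130 are assumptions, so substitute the values
your Squid-1 cache actually uses:
<verb>
# Squid-2: use the old Squid-1 cache as the only parent
cache_peer old-squid.example.com parent 3128 3130

# send every request through the parent instead of going direct
acl all src 0/0
never_direct allow all
</verb>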
|
|
|
|
|
|
|
|
<sect1>Customizable Error Messages
|
|
<label id="custom-err-msgs">
|
|
<P>
|
|
Squid-2 lets you customize your error messages. The source distribution
|
|
includes error messages in different languages. You can select the
|
|
language with the configure option:
|
|
<verb>
|
|
--enable-err-language=lang
|
|
</verb>
|
|
|
|
<P>
|
|
Furthermore, you can rewrite the error message template files if you like.
|
|
This list describes the tags which Squid will insert into the messages:
|
|
<descrip>
|
|
<tag/%B/ URL with FTP %2f hack
|
|
<tag/%c/ Squid error code
|
|
<tag/%d/ seconds elapsed since request received
|
|
<tag/%e/ errno
|
|
<tag/%E/ strerror()
|
|
<tag/%f/ FTP request line
|
|
<tag/%F/ FTP reply line
|
|
<tag/%g/ FTP server message
|
|
<tag/%h/ cache hostname
|
|
<tag/%H/ server host name
|
|
<tag/%i/ client IP address
|
|
<tag/%I/ server IP address
|
|
<tag/%L/ contents of <em/err_html_text/ config option
|
|
<tag/%M/ Request Method
|
|
<tag/%p/ URL port \#
|
|
<tag/%P/ Protocol
|
|
<tag/%R/ Full HTTP Request
|
|
<tag/%S/ squid signature from ERR_SIGNATURE
|
|
<tag/%s/ caching proxy software with version
|
|
<tag/%t/ local time
|
|
<tag/%T/ UTC
|
|
<tag/%U/ URL without password
|
|
<tag/%u/ URL without password, %2f added to path
|
|
<tag/%w/ cachemgr email address
|
|
<tag/%z/ dns server error message
|
|
</descrip>
|
|
|
|
<sect1>My squid.conf from version 1.1 doesn't work!
|
|
<P>
|
|
Yes, a number of configuration directives have been renamed.
|
|
Here are some of them:
|
|
<descrip>
|
|
<tag/cache_host/
|
|
This is now called <em/cache_peer/. The old term does not
|
|
really describe what you are configuring, but the new name
|
|
tells you that you are configuring a peer for your cache.
|
|
<tag/cache_host_domain/
|
|
Renamed to <em/cache_peer_domain/.
|
|
<tag/local_ip, local_domain/
|
|
The functionality provided by these directives is now implemented
|
|
as access control lists. You will use the <em/always_direct/ and
|
|
<em/never_direct/ options. The new <em/squid.conf/ file has some
|
|
examples.
|
|
<tag/cache_stoplist/
|
|
This directive also has been reimplemented with access control
|
|
lists. You will use the <em/no_cache/ option. For example:
|
|
<verb>
|
|
acl Uncachable url_regex cgi \?
|
|
no_cache deny Uncachable
|
|
</verb>
|
|
<tag/cache_swap/
|
|
This option used to specify the cache disk size. Now you
|
|
specify the disk size on each <em/cache_dir/ line.
|
|
<tag/cache_host_acl/
|
|
This option has been renamed to <em/cache_peer_access/
|
|
<bf/and/ the syntax has changed. Now this option is a
|
|
true access control list, and you must include an
|
|
<em/allow/ or <em/deny/ keyword. For example:
|
|
<verb>
|
|
acl that-AS dst_as 1241
|
|
cache_peer_access thatcache.thatdomain.net allow that-AS
|
|
cache_peer_access thatcache.thatdomain.net deny all
|
|
</verb>
|
|
This example sends requests to your peer <em/thatcache.thatdomain.net/
|
|
only for origin servers in Autonomous System Number 1241.
|
|
<tag/units/
|
|
In Squid-1.1 many of the configuration options had implied
|
|
units associated with them. For example, the <em/connect_timeout/
|
|
value may have been in seconds, but the <em/read_timeout/ value
|
|
had to be given in minutes. With Squid-2, these directives take
|
|
units after the numbers, and you will get a warning if you
|
|
leave off the units. For example, you should now write:
|
|
<verb>
|
|
connect_timeout 120 seconds
|
|
read_timeout 15 minutes
|
|
</verb>
|
|
</descrip>
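<p>
As an illustration of the <em/local_domain/ replacement described above,
an <em/always_direct/ rule might look like this. The ACL name
<em/local-servers/ and the domain are placeholders, not values taken
from your old configuration:
<verb>
acl local-servers dstdomain my.domain.net
always_direct allow local-servers
</verb>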
|
|
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>httpd-accelerator mode
|
|
|
|
<sect1>What is the httpd-accelerator mode?
|
|
<label id="what-is-httpd-accelerator">
|
|
<P>
|
|
Occasionally people have trouble understanding accelerators and
|
|
proxy caches, usually resulting from mixed up interpretations of
``incoming'' and ``outgoing'' data. I think in terms of requests (i.e.,
|
|
an outgoing request is from the local site out to the big bad
|
|
Internet). The data received in reply is incoming, of course.
|
|
Others think in the opposite sense of ``a request for incoming data''.
|
|
|
|
<P>
|
|
An accelerator caches incoming requests for outgoing data (i.e.,
|
|
that which you publish to the world). It takes load away from your
|
|
HTTP server and internal network. You move the server away from
|
|
port 80 (or whatever your published port is), and substitute the
|
|
accelerator, which then pulls the HTTP data from the ``real''
|
|
HTTP server (only the accelerator needs to know where the real
|
|
server is). The outside world sees no difference (apart from an
|
|
increase in speed, with luck).
|
|
|
|
<P>
|
|
Quite apart from taking the load off a site's normal web server,
|
|
accelerators can also sit outside firewalls or other network
|
|
bottlenecks and talk to HTTP servers inside, reducing traffic across
|
|
the bottleneck and simplifying the configuration. Two or more
|
|
accelerators communicating via ICP can increase the speed and
|
|
resilience of a web service to any single failure.
|
|
|
|
<P>
|
|
The Squid redirector can make one accelerator act as a single
|
|
front-end for multiple servers. If you need to move parts of your
|
|
filesystem from one server to another, or if separately administered
|
|
HTTP servers should logically appear under a single URL hierarchy,
|
|
the accelerator makes the right thing happen.
|
|
|
|
<P>
|
|
If you wish only to cache the ``rest of the world'' to improve local users'
|
|
browsing performance, then accelerator mode is irrelevant. Sites which
|
|
own and publish a URL hierarchy use an accelerator to improve other
|
|
sites' access to it. Sites wishing to improve their local users' access
|
|
to other sites' URLs use proxy caches. Many sites, like us, do both and
|
|
hence run both.
|
|
|
|
<P>
|
|
Measurements of the Squid cache and its Harvest counterpart suggest an
|
|
order of magnitude performance improvement over CERN or other widely
|
|
available caching software. This order of magnitude performance
|
|
improvement on hits suggests that the cache can serve as an httpd
|
|
accelerator, a cache configured to act as a site's primary httpd server
|
|
(on port 80), forwarding references that miss to the site's real httpd
|
|
(on port 81).
|
|
|
|
<P>
|
|
In such a configuration, the web administrator renames all
|
|
non-cachable URLs to the httpd's port (81). The cache serves
|
|
references to cachable objects, such as HTML pages and GIFs, and
|
|
the true httpd (on port 81) serves references to non-cachable
|
|
objects, such as queries and cgi-bin programs. If a site's usage
|
|
characteristics tend toward cachable objects, this configuration
|
|
can dramatically reduce the site's web workload.
|
|
|
|
<P>
|
|
Note that it is best not to run a single <em/squid/ process as
|
|
both an httpd-accelerator and a proxy cache, since these two modes
|
|
will have different working sets. You will get better performance
|
|
by running two separate caches on separate machines. However, for
|
|
compatibility with how administrators are accustomed to running
|
|
other servers that provide both proxy and Web serving capability
|
|
(e.g., CERN), Squid supports operation as both a proxy and
|
|
an accelerator if you set the <tt/httpd_accel_with_proxy/
|
|
variable to <tt/on/ inside your <em/squid.conf/
|
|
configuration file.
|
|
|
|
<sect1>How do I set it up?
|
|
<P>
|
|
First, you have to tell Squid to listen on port 80 (usually), so set the 'http_port'
|
|
option:
|
|
<verb>
|
|
http_port 80
|
|
</verb>
|
|
<P>
|
|
Next, you need to move your normal HTTP server to another port and/or
|
|
another machine. If you want to run your HTTP server on the same
|
|
machine, then it can not also use port 80 (except see the next FAQ entry
|
|
below). A common choice is port 81. Configure squid as follows:
|
|
<verb>
|
|
httpd_accel_host localhost
|
|
httpd_accel_port 81
|
|
</verb>
|
|
Alternatively, you could move the HTTP server to another machine and leave it
|
|
on port 80:
|
|
<verb>
|
|
httpd_accel_host otherhost.foo.com
|
|
httpd_accel_port 80
|
|
</verb>
|
|
<P>
|
|
You should now be able to start Squid and it will serve requests as an HTTP server.
|
|
|
|
|
|
<P>
|
|
If you are using Squid as an accelerator for a virtual host system, then you
|
|
need to specify
|
|
<verb>
|
|
httpd_accel_host virtual
|
|
</verb>
|
|
|
|
|
|
<P>
|
|
Finally, if you want Squid to also accept <em/proxy/ requests (like it used to
|
|
before you turned it into an accelerator), then you need to enable this option:
|
|
<verb>
|
|
httpd_accel_with_proxy on
|
|
</verb>
|
|
|
|
<sect1>When using an httpd-accelerator, the port number for redirects is wrong
|
|
|
|
<P>
|
|
Yes, this is because you probably moved your real httpd to port 81. When
|
|
your httpd issues a redirect message (e.g. 302 Moved Temporarily), it knows
|
|
it is not running on the standard port (80), so it inserts <em/:81/ in the
|
|
redirected URL. Then, when the client requests the redirected URL, it
|
|
bypasses the accelerator.
|
|
|
|
<P>
|
|
How can you fix this?
|
|
|
|
<P>
|
|
One way is to leave your httpd running on port 80, but bind the httpd
|
|
socket to a <em/specific/ interface, namely the loopback interface.
|
|
With <url url="http://www.apache.org/" name="Apache"> you can do it
|
|
like this in <em/httpd.conf/:
|
|
<verb>
|
|
Port 80
|
|
BindAddress 127.0.0.1
|
|
</verb>
|
|
Then, in your <em/squid.conf/ file, you must specify the loopback address
|
|
as the accelerator:
|
|
<verb>
|
|
httpd_accel_host 127.0.0.1
|
|
httpd_accel_port 80
|
|
</verb>
|
|
|
|
<P>
|
|
Note, you probably also need to add an <em>/etc/hosts</em> entry
|
|
of 127.0.0.1 for your server hostname. Otherwise, Squid may
|
|
get stuck in a forwarding loop.
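<p>
For example, if your published server name were <em/www.example.com/
(a placeholder), the <em>/etc/hosts</em> entry might read:
<verb>
127.0.0.1   www.example.com
</verb>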
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>Related Software
|
|
|
|
<sect1>Clients
|
|
|
|
<sect2>Wget
|
|
<P>
|
|
<url url="ftp://gnjilux.cc.fer.hr/pub/unix/util/wget/" name="Wget"> is a
|
|
command-line Web client. It supports HTTP and FTP URLs, recursive retrievals, and
|
|
HTTP proxies.
|
|
|
|
<sect2>echoping
|
|
<P>
|
|
If you want to test your Squid cache in batch (from a cron command, for
|
|
instance), you can use the <url
|
|
url="ftp://ftp.internatif.org/pub/unix/echoping/" name="echoping"> program,
|
|
which will tell you (in plain text or via an exit code) if the cache is
|
|
up or not, and will indicate the response times.
|
|
|
|
<sect1>Logfile Analysis
|
|
|
|
<p>
|
|
Rather than maintain the same list in two places, please see the
|
|
<url url="/Scripts/" name="Logfile Analysis Scripts"> page
|
|
on the Web server.
|
|
|
|
<sect1>Configuration Tools
|
|
|
|
<sect2>3Dhierarchy.pl
|
|
<P>
|
|
Kenichi Matsui has a simple perl script which generates a 3D hierarchy map (in VRML) from
|
|
squid.conf.
|
|
<url url="ftp://ftp.nemoto.ecei.tohoku.ac.jp/pub/Net/WWW/VRML/converter/3Dhierarchy.pl" name="3Dhierarchy.pl">.
|
|
|
|
<sect1>Squid add-ons
|
|
|
|
<sect2>transproxy
|
|
<P>
|
|
<url url="http://www.transproxy.nlc.net.au/" name="transproxy">
|
|
is a program used in conjunction with the Linux Transparent Proxy
|
|
networking feature, and ipfwadm, to transparently proxy HTTP and
|
|
other requests. Transproxy is written by <url url="mailto:john@nlc.net.au" name="John Saunders">.
|
|
|
|
<sect2>Iain's redirector package
|
|
<P>
|
|
A <url url="ftp://ftp.sbs.de/pub/www/cache/redirector/redirector.tar.gz" name="redirector package"> from
|
|
<url url="mailto:iain@ecrc.de" name="Iain Lea"> to allow Intranet (restricted) or Internet
|
|
(full) access with URL deny and redirection for sites that are not deemed
|
|
acceptable for a userbase all via a single proxy port.
|
|
|
|
<sect2>Junkbusters
|
|
<P>
|
|
<url url="http://internet.junkbuster.com" name="Junkbusters"> Corp has a
|
|
copyleft privacy-enhancing, ad-blocking proxy server which you can
|
|
use in conjunction with Squid.
|
|
|
|
<sect2>Squirm
|
|
<P>
|
|
<url url="http://www.senet.com.au/squirm/" name="Squirm"> is a configurable, efficient redirector for Squid
|
|
by <url url="mailto:chris@senet.com.au" name="Chris Foote">. Features:
|
|
<itemize>
|
|
<item> Very fast
|
|
<item> Virtually no memory usage
|
|
<item> It can re-read its config files while running by sending it a HUP signal
|
|
<item> Interactive test mode for checking new configs
|
|
<item> Full regular expression matching and replacement
|
|
<item> Config files for patterns and IP addresses.
|
|
<item> If you mess up the config file, Squirm runs in Dodo Mode so your squid keeps working :-)
|
|
</itemize>
|
|
|
|
<sect2>chpasswd.cgi
|
|
<P>
|
|
<url url="mailto:orso@ineparnet.com.br" name="Pedro L Orso">
|
|
has adapted Apache's <url url="../../htpasswd/" name="htpasswd"> into a CGI program
|
|
called <url url="http://www.ineparnet.com.br/orso/index.html" name="chpasswd.cgi">.
|
|
|
|
<sect2>jesred
|
|
<P>
|
|
<url url="http://ivs.cs.uni-magdeburg.de/~elkner/webtools/jesred/" name="jesred">
|
|
by <url url="mailto:elkner@wotan.cs.Uni-Magdeburg.DE" name="Jens Elkner">.
|
|
|
|
<sect2>squidGuard
|
|
<P>
|
|
<url url="http://ftp.ost.eltele.no/pub/www/proxy/" name="squidGuard"> is
|
|
a free (GPL), flexible and efficient filter and
|
|
redirector program for squid. It lets you define multiple access
|
|
rules with different restrictions for different user groups on a squid
|
|
cache. squidGuard uses Squid's standard redirector interface.
|
|
|
|
<sect2>Central Squid Server
|
|
<P>
|
|
The <url url="http://www.senet.com.au/css/" name="Smart Neighbour">
|
|
(or 'Central Squid Server' - CSS) is a cut-down
|
|
version of Squid without HTTP or object caching functionality. The
|
|
CSS deals only with ICP messages. Instead of caching objects, the CSS
|
|
records the availability of objects in each of its neighbour caches.
|
|
Caches that have smart neighbours update each smart neighbour with the
|
|
status of their cache by sending ICP_STORE_NOTIFY/ICP_RELEASE_NOTIFY
|
|
messages upon storing/releasing an object from their cache. The CSS
|
|
maintains an up to date 'object map' recording the availability of
|
|
objects in its neighbouring caches.
|
|
|
|
<sect1>Ident Servers
|
|
<p>
|
|
For
|
|
<url url="http://info.ost.eltele.no/freeware/identd/" name="Windows NT">,
|
|
<url url="http://identd.sourceforge.net/" name="Windows 95/98">,
|
|
and
|
|
<url url="http://www2.lysator.liu.se/~pen/pidentd/" name="Unix">.
|
|
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>DISKD
|
|
|
|
<sect1>What is DISKD?
|
|
<p>
|
|
DISKD refers to some features in Squid-2.4 to improve Disk I/O
|
|
performance. The basic idea is that each <em/cache_dir/ has its
|
|
own <em/diskd/ child process. The diskd process performs all disk
|
|
I/O operations (open, close, read, write, unlink) for the cache_dir.
|
|
Message queues are used to send requests and responses between the
|
|
Squid and diskd processes. Shared memory is used for chunks of
|
|
data to be read and written.
|
|
|
|
<sect1>Does it perform better?
|
|
<p>
|
|
Yes. We benchmarked Squid-2.4 with DISKD at the <url
|
|
url="http://polygraph.ircache.net/Results/bakeoff-2/" name="Second
|
|
IRCache Bake-Off">. The results are also described <url
|
|
url="/Benchmarking/bakeoff-02/" name="here">. At the bakeoff, we
|
|
got 160 req/sec with diskd. Without diskd, we'd have gotten about
|
|
40 req/sec.
|
|
|
|
<sect1>How do I use it?
|
|
<p>
|
|
You need to run Squid version <url url="/Versions/v2/2.4" name="2.4"> or later.
|
|
Your operating system must support message queues and shared memory.
|
|
<p>
|
|
To configure Squid for DISKD, use the <em/--enable-storeio/ option:
|
|
<verb>
|
|
% ./configure --enable-storeio=diskd,ufs
|
|
</verb>
|
|
|
|
<sect1>FATAL: Unknown cache_dir type 'diskd'
|
|
<p>
|
|
You didn't put <em/diskd/ in the list of storeio modules as described
|
|
above. You need to run <em/configure/ again and recompile Squid.
|
|
|
|
<sect1>If I use DISKD, do I have to wipe out my current cache?
|
|
<p>
|
|
No. Diskd uses the same storage scheme as the standard "UFS"
|
|
type. It only changes how I/O is performed.
|
|
|
|
<sect1>How do I configure message queues?
|
|
<p>
|
|
Most Unix operating systems have message queue support
|
|
by default. One way to check is to see if you have
|
|
an <em/ipcs/ command.
|
|
|
|
<p>
|
|
However, you will likely need to increase the message
|
|
queue parameters for Squid. Message queue implementations
|
|
normally have the following parameters:
|
|
<descrip>
|
|
<tag/MSGMNB/
|
|
Maximum number of bytes in a single queue.
|
|
<tag/MSGMNI/
|
|
Maximum number of message queue identifiers.
|
|
<tag/MSGSEG/
Maximum number of message segments.
<tag/MSGSSZ/
Size of a message segment.
<tag/MSGMAX/
Maximum size of a single message.
|
|
<tag/MSGTQL/
|
|
Maximum number of messages in the whole system.
|
|
</descrip>
|
|
|
|
<p>
|
|
The messages between Squid and diskd are 32 bytes. Thus, MSGMAX
|
|
should be 32 or greater. You may want to set it to a larger
|
|
value, just to be safe.
|
|
|
|
<p>
|
|
We'll have two queues for each <em/cache_dir/ -- one in each direction.
|
|
So, MSGMNI needs to be at least two times the number of <em/cache_dir/'s.
|
|
|
|
<p>
|
|
MSGMNB and MSGTQL affect how many messages can be in the queues
|
|
at one time. I've found that 75 messages per queue is about
|
|
the limit of decent performance. Thus, MSGMNB must be
|
|
at least 75*MSGMAX, and MSGTQL must be at least 75 times
|
|
the number of <em/cache_dir/'s.
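<p>
As a worked example of the rules above, assume a cache with four
<em/cache_dir/'s and MSGMAX left at the 32-byte minimum. Both numbers
are assumptions chosen only to illustrate the arithmetic:
<verb>
MSGMAX at least 32
MSGMNI at least 2 * 4   = 8
MSGMNB at least 75 * 32 = 2400
MSGTQL at least 75 * 4  = 300
</verb>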
|
|
|
|
<sect2>FreeBSD
|
|
<p>
|
|
Your kernel must have
|
|
<verb>
|
|
options SYSVMSG
|
|
</verb>
|
|
|
|
<p>
|
|
You can set the parameters in the kernel as follows. This is just
|
|
an example. Make sure the values are appropriate for your system:
|
|
<verb>
|
|
options MSGMNB=16384 # max # of bytes in a queue
|
|
options MSGMNI=41 # number of message queue identifiers
|
|
options MSGSEG=2049 # number of message segments
|
|
options MSGSSZ=64 # size of a message segment
|
|
options MSGTQL=512 # max messages in system
|
|
</verb>
|
|
|
|
<sect2>Digital Unix
|
|
<p>
|
|
Message queue support seems to be in the kernel
|
|
by default. Setting the options is as follows:
|
|
<verb>
|
|
options MSGMNB="8192" # max # bytes on queue
|
|
options MSGMNI="31" # # of message queue identifiers
|
|
options MSGMAX="2049" # max message size
|
|
options MSGTQL="1024" # # of system message headers
|
|
</verb>
|
|
|
|
<p>
|
|
by <url url="mailto:B.C.Phillips at massey dot ac dot nz" name="Brenden Phillips">
|
|
<p>
|
|
If you have a newer version (DU64), then you can probably use
|
|
<em/sysconfig/ instead. To see what the current IPC settings are run
|
|
<verb>
|
|
# sysconfig -q ipc
|
|
</verb>
|
|
To change them make a file like this called ipc.stanza:
|
|
<verb>
|
|
ipc:
|
|
msg-max = 2049
|
|
msg-mni = 31
|
|
msg-tql = 1024
|
|
msg-mnb = 8192
|
|
</verb>
|
|
then run
|
|
<verb>
|
|
# sysconfigdb -a -f ipc.stanza
|
|
</verb>
|
|
You have to reboot for the change to take effect.
|
|
|
|
|
|
|
|
<sect2>Linux
|
|
<p>
|
|
In my limited browsing on Linux, I didn't see any way to change
|
|
message queue parameters except to modify the include files
|
|
and build a new kernel. On my system, the file
|
|
is <em>/usr/src/linux/include/linux/msg.h</em>.
|
|
|
|
<p>
|
|
Stefan Köpsell reports that if you compile sysctl support
|
|
into your kernel, then you can change the following values:
|
|
<itemize>
|
|
<item>kernel.msgmnb
|
|
<item>kernel.msgmni
|
|
<item>kernel.msgmax
|
|
</itemize>
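<p>
A sketch of how those values might be set with the <em/sysctl/ command,
run as root. The numbers are borrowed from the FreeBSD and Digital Unix
examples above and are not tuned recommendations:
<verb>
# sysctl -w kernel.msgmnb=16384
# sysctl -w kernel.msgmni=41
# sysctl -w kernel.msgmax=2049
</verb>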
|
|
|
|
<sect2>Solaris
|
|
<p>
|
|
Refer to <url url="http://www.sunworld.com/sunworldonline/swol-11-1997/swol-11-insidesolaris.html"
|
|
name="Demangling Message Queues"> in Sunworld Magazine.
|
|
|
|
<p>
|
|
I don't think the above article really tells you how to set the parameters.
|
|
You do it in <em>/etc/system</em> with lines like this:
|
|
<verb>
|
|
set msgsys:msginfo_msgmax=2049
|
|
set msgsys:msginfo_msgmnb=8192
|
|
set msgsys:msginfo_msgmni=31
|
|
set msgsys:msginfo_msgssz=64
|
|
set msgsys:msginfo_msgtql=1024
|
|
</verb>
|
|
<p>
|
|
Of course, you must reboot whenever you modify <em>/etc/system</em>
|
|
before changes take effect.
|
|
|
|
<sect1>How do I configure shared memory?
|
|
<p>
|
|
Shared memory uses a set of parameters similar to the ones for message
|
|
queues. The Squid DISKD implementation uses one shared memory area
|
|
for each cache_dir. Each shared memory area is about
|
|
800 kilobytes in size. You may need to modify your system's
|
|
shared memory parameters:
|
|
|
|
<p>
|
|
<descrip>
|
|
<tag/SHMSEG/
|
|
Maximum number of shared memory segments per process.
|
|
<tag/SHMMNI/
|
|
Maximum number of shared memory segments for the whole system.
|
|
<tag/SHMMAX/
|
|
Largest shared memory segment size allowed.
|
|
<tag/SHMALL/
|
|
Total amount of shared memory that can be used.
|
|
</descrip>
|
|
|
|
<p>
|
|
For Squid and DISKD, <em/SHMSEG/ and <em/SHMMNI/ must be greater than
|
|
or equal to the number of <em/cache_dir/'s that you have. <em/SHMMAX/
|
|
must be at least 800 kilobytes. <em/SHMALL/ must be at least
|
|
800 kilobytes multiplied by the number of <em/cache_dir/'s.
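<p>
As a worked example, assume four <em/cache_dir/'s, 1024 bytes per
kilobyte, and a system that counts SHMALL in 4096-byte pages. All three
are assumptions; check your own system's page size and units:
<verb>
SHMSEG and SHMMNI at least 4
SHMMAX at least 800 * 1024     = 819200 bytes
SHMALL at least 4 * 819200     = 3276800 bytes
                3276800 / 4096 = 800 pages
</verb>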
|
|
|
|
<sect2>FreeBSD
|
|
<p>
|
|
Your kernel must have
|
|
<verb>
|
|
options SYSVSHM
|
|
</verb>
|
|
|
|
<p>
|
|
You can set the parameters in the kernel as follows. This is just
|
|
an example. Make sure the values are appropriate for your system:
|
|
<verb>
|
|
options SHMSEG=16 # max shared mem id's per process
|
|
options SHMMNI=32 # max shared mem id's per system
|
|
options SHMMAX=2097152 # max shared memory segment size (bytes)
|
|
options SHMALL=4096 # max amount of shared memory (pages)
|
|
</verb>
|
|
|
|
<sect2>Digital Unix
|
|
<p>
|
|
Shared memory support seems to be in the kernel
|
|
by default. Setting the options is as follows:
|
|
<verb>
|
|
options SHMSEG="16" # max shared mem id's per process
|
|
options SHMMNI="32" # max shared mem id's per system
|
|
options SHMMAX="2097152" # max shared memory segment size (bytes)
|
|
options SHMALL=4096 # max amount of shared memory (pages)
|
|
</verb>
|
|
|
|
<p>
|
|
by <url url="mailto:B.C.Phillips at massey dot ac dot nz" name="Brenden Phillips">
|
|
<p>
|
|
If you have a newer version (DU64), then you can probably use
|
|
<em/sysconfig/ instead. To see what the current IPC settings are run
|
|
<verb>
|
|
# sysconfig -q ipc
|
|
</verb>
|
|
To change them make a file like this called ipc.stanza:
|
|
<verb>
|
|
ipc:
|
|
shm-seg = 16
|
|
shm-mni = 32
|
|
shm-max = 2097152
|
|
shm-all = 4096
|
|
</verb>
|
|
then run
|
|
<verb>
|
|
# sysconfigdb -a -f ipc.stanza
|
|
</verb>
|
|
You have to reboot for the change to take effect.
|
|
|
|
|
|
<sect2>Linux
|
|
<p>
|
|
In my limited browsing on Linux, I didn't see any way to change
|
|
shared memory parameters except to modify the include files
|
|
and build a new kernel. On my system, the file
|
|
is <em>/usr/src/linux/include/asm-i386/shmparam.h</em>.
|
|
|
|
<p>
|
|
Oh, it looks like you can change <em/SHMMAX/ by writing
|
|
the file <em>/proc/sys/kernel/shmmax</em>.
|
|
|
|
<p>
|
|
Stefan Köpsell reports that if you compile sysctl support
|
|
into your kernel, then you can change the following values:
|
|
<itemize>
|
|
<item>kernel.shmall
|
|
<item>kernel.shmmni
|
|
<item>kernel.shmmax
|
|
</itemize>
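<p>
For instance, to raise <em/SHMMAX/ to the 2097152 bytes used in the
other examples in this section (an illustrative value, not a tuned one),
either of the following might be run as root:
<verb>
# echo 2097152 > /proc/sys/kernel/shmmax
# sysctl -w kernel.shmmax=2097152
</verb>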
|
|
|
|
<sect2>Solaris
|
|
|
|
<p>
|
|
Refer to
|
|
<url url="http://www.sunworld.com/swol-09-1997/swol-09-insidesolaris.html"
|
|
name="Shared memory uncovered">
|
|
in Sunworld Magazine.
|
|
|
|
<p>
|
|
To set the values, you can put these lines in <em>/etc/system</em>:
|
|
<verb>
|
|
set shmsys:shminfo_shmmax=2097152
|
|
set shmsys:shminfo_shmmni=32
|
|
set shmsys:shminfo_shmseg=16
|
|
</verb>
|
|
|
|
<sect1>Sometimes shared memory and message queues aren't released when Squid exits.
|
|
<p>
|
|
Yes, this is a little problem sometimes. Seems like the operating system
|
|
gets confused and doesn't always release shared memory and message
|
|
queue resources when processes exit, especially if they exit abnormally.
|
|
To fix it you can ``manually'' clear the resources with the <em/ipcs/ and <em/ipcrm/ commands.
|
|
Add this command into your <em/RunCache/ or <em/squid_start/
|
|
script:
|
|
<verb>
|
|
ipcs | grep '^[mq]' | awk '{printf "ipcrm -%s %s\n", $1, $2}' | /bin/sh
|
|
</verb>
|
|
|
|
<sect1>What are the Q1 and Q2 parameters?
|
|
<p>
|
|
In the source code, these are called <em/magic1/ and <em/magic2/.
|
|
These numbers refer to the number of outstanding requests on a message
|
|
queue. They are specified on the <em/cache_dir/ option line, after
|
|
the L1 and L2 directories:
|
|
<verb>
|
|
cache_dir diskd /cache1 1024 16 256 Q1=72 Q2=64
|
|
</verb>
|
|
<p>
|
|
If there are more than Q1 messages outstanding, then Squid will
|
|
intentionally fail to open disk files for reading and writing.
|
|
This is a load-shedding mechanism. If your cache gets really really
|
|
busy and the disks can not keep up, Squid bypasses the disks until
|
|
the load goes down again.
|
|
<p>
|
|
If there are more than Q2 messages outstanding, then the main Squid
|
|
process ``blocks'' for a little bit until the diskd process services
|
|
some of the messages and sends back some replies.
|
|
<p>
|
|
Q1 should be larger than Q2. You want Squid to get to the
|
|
``blocking'' condition before it gets to the ``refuse to open files''
|
|
condition.
|
|
<p>
|
|
Reasonable values for Q1 and Q2 are 72 and 64, respectively.
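<p>
With those values, the behaviour described above works out roughly as
follows. This table is only a reading of the description in this
section, not output from Squid:
<verb>
outstanding messages     what Squid does
--------------------     ------------------------------------------
 0 to 64                 normal operation
65 to 72                 main process blocks until diskd catches up
73 or more               opens fail; Squid bypasses the disk
</verb>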
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>Authentication
|
|
|
|
<sect1>How does Proxy Authentication work in Squid?
|
|
<p>
|
|
<em>Note: The information here is current for version 2.4.</em>
|
|
<p>
|
|
Users will be authenticated if squid is configured to use <em/proxy_auth/
|
|
ACLs (see next question).
|
|
<p>
|
|
Browsers send the user's authentication credentials in the
|
|
<em/Proxy-Authorization/ request header.
|
|
<p>
|
|
If Squid gets a request and the <em/http_access/ rule list
|
|
gets to a <em/proxy_auth/ ACL, Squid looks for the <em/Proxy-Authorization/
|
|
header. If the header is present, Squid decodes it and extracts
|
|
a username and password.
|
|
<p>
|
|
If the header is missing, Squid returns
|
|
an HTTP reply with status 407 (Proxy Authentication Required).
|
|
The user agent (browser) receives the 407 reply and then prompts
|
|
the user to enter a name and password. The name and password are
|
|
encoded, and sent in the <em/Proxy-Authorization/ header for subsequent
|
|
requests to the proxy.
|
|
|
|
<p>
|
|
<em>NOTE</em>: The name and password are encoded using ``base64''
|
|
(See section 11.1 of <url url="ftp://ftp.isi.edu/in-notes/rfc2616.txt"
|
|
name="RFC 2616">). However, base64 is a binary-to-text encoding only,
|
|
it does NOT encrypt the information it encodes. This means that
|
|
the username and password are essentially ``cleartext'' between
|
|
the browser and the proxy. Therefore, you probably should not use
|
|
the same username and password that you would use for your account login.
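<p>
To illustrate, assume a hypothetical user <em/lisa/ with the password
<em/secret/. The browser joins the two with a colon and base64-encodes
the result, so the credentials header on a proxy request would look
like this:
<verb>
Proxy-Authorization: Basic bGlzYTpzZWNyZXQ=
</verb>
Anyone who can see the header can turn <tt/bGlzYTpzZWNyZXQ=/ back into
<tt/lisa:secret/ with any base64 decoder. No key is needed.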
|
|
|
|
<p>
|
|
Authentication is actually performed outside of the main Squid process.
|
|
When Squid starts, it spawns a number of authentication subprocesses.
|
|
These processes read usernames and passwords on stdin, and reply
|
|
with "OK" or "ERR" on stdout. This technique allows you to use
|
|
a number of different authentication schemes, although currently
|
|
you can only use one scheme at a time.
|
|
<p>
|
|
The Squid source code comes with a few authentication processes.
|
|
These include:
|
|
<itemize>
|
|
<item>
|
|
LDAP: Uses the Lightweight Directory Access Protocol
|
|
<item>
|
|
NCSA: Uses an NCSA-style username and password file.
|
|
<item>
|
|
MSNT: Uses a Windows NT authentication domain.
|
|
<item>
|
|
PAM: Uses the Linux Pluggable Authentication Modules scheme.
|
|
<item>
|
|
SMB: Uses a SMB server like Windows NT or Samba.
|
|
<item>
|
|
getpwnam: Uses the old-fashioned Unix password file.
|
|
</itemize>
|
|
|
|
<p>
|
|
In order to authenticate users, you need to compile and install
|
|
one of the supplied authentication modules, one of <url url="http://www.squid-cache.org/related-software.html#auth" name="the others">,
|
|
or supply your own.
|
|
|
|
<p>
|
|
You tell Squid which authentication program to use with the
|
|
<em/authenticate_program/ option in squid.conf. You specify
|
|
the name of the program, plus any command line options if
|
|
necessary. For example:
|
|
<verb>
|
|
authenticate_program /usr/local/squid/bin/ncsa_auth /usr/local/squid/etc/passwd
|
|
</verb>
|
|
|
|
|
|
<sect1>How do I use authentication in access controls?
|
|
<p>
|
|
Make sure that your authentication program is installed
|
|
and working correctly. You can test it by hand.
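<p>
A hand test of the <em/ncsa_auth/ helper might look like the following
sketch, where the user name and passwords are made up. You type a user
name and a password separated by a space, and the helper answers on the
next line:
<verb>
% /usr/local/squid/bin/ncsa_auth /usr/local/squid/etc/passwd
lisa rightpassword
OK
lisa wrongpassword
ERR
</verb>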
|
|
<p>
|
|
Add some <em/proxy_auth/ ACL entries to your squid configuration.
|
|
For example:
|
|
<verb>
|
|
acl foo proxy_auth REQUIRED
|
|
acl all src 0/0
|
|
http_access allow foo
|
|
http_access deny all
|
|
</verb>
|
|
The REQUIRED term means that any authenticated user will match the
|
|
ACL named <em/foo/.
|
|
<p>
|
|
Squid allows you to provide fine-grained controls
|
|
by specifying individual user names. For example:
|
|
<verb>
|
|
acl foo proxy_auth REQUIRED
|
|
acl bar proxy_auth lisa sarah frank joe
|
|
acl daytime time 08:00-17:00
|
|
acl all src 0/0
|
|
http_access allow bar
|
|
http_access allow foo daytime
|
|
http_access deny all
|
|
</verb>
|
|
In this example, users named lisa, sarah, joe, and frank
|
|
are allowed to use the proxy at all times. Other users
|
|
are allowed only during daytime hours.
|
|
|
|
<sect1>Does Squid cache authentication lookups?
|
|
<p>
|
|
Yes. Successful authentication lookups are cached for
|
|
one hour by default. That means (in the worst case) it's possible
|
|
for someone to keep using your cache up to an hour after he
|
|
has been removed from the authentication database.
|
|
<p>
|
|
You can control the expiration
|
|
time with the <em/authenticate_ttl/ option.
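<p>
For example, a site that wants removed users locked out sooner might
shorten the lifetime to ten minutes. This is only a sketch: the value
below assumes the option takes a number of seconds, which may differ
between Squid versions, so check the comments in your own
<em/squid.conf/ first:
<verb>
authenticate_ttl 600
</verb>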
|
|
|
|
|
|
<sect1>Are passwords stored in clear text or encrypted?
|
|
<p>
|
|
Squid stores cleartext passwords in its memory cache.
|
|
<p>
|
|
Squid writes cleartext usernames and passwords when talking to
|
|
the external authentication processes. Note, however, that this
|
|
interprocess communication occurs over TCP connections bound to
|
|
the loopback interface. Thus, it is not possible for processes on
|
|
other computers to "snoop" on the authentication traffic.
|
|
|
|
<p>
|
|
Each authentication program must select its own scheme for persistent
|
|
storage of passwords and usernames.
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>Terms and Definitions
|
|
|
|
<sect1>Neighbor
|
|
|
|
<P>
|
|
In Squid, <em/neighbor/ usually means the same thing as <em/peer/.
|
|
A neighbor cache is one that you have defined with the <em/cache_host/ configuration
|
|
option. Neighbor refers to either a parent or a sibling.
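<p>
As an illustration, the following lines define one parent and one
sibling; both are neighbors. The host names and port numbers are
hypothetical, and the line format (host name, type, HTTP port, ICP
port) is assumed from common usage. Newer Squid versions call this
option <em/cache_peer/.
<verb>
cache_host parent.example.com   parent  3128 3130
cache_host sibling.example.com  sibling 3128 3130
</verb>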
|
|
|
|
<P>
|
|
In Harvest 1.4, neighbor referred to what Squid calls a sibling. That is, Harvest
|
|
had <em/parents/ and <em/neighbors/. For backward compatibility, the term
|
|
neighbor is still accepted in some Squid configuration options.
|
|
|
|
<sect1>Regular Expression
|
|
<p>
|
|
Regular expressions are patterns that are used for matching sequences
|
|
of characters in text. For more information, see
|
|
<url url="http://jmason.org/software/sitescooper/tao_regexps.html"
|
|
name="A Tao of Regular Expressions"> and
|
|
<url url="http://www.newbie.org/gazette/xxaxx/xprmnt02.html"
|
|
name="Newbie's page">.
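<p>
In Squid, regular expressions appear mostly in ACL definitions. As a
small sketch (the ACL name is arbitrary), this pattern matches URLs
ending in ".jpg" or ".jpeg", ignoring case because of the -i flag:
<verb>
acl jpeg_urls url_regex -i \.jpe?g$
http_access deny jpeg_urls
</verb>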
|
|
|
|
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
|
|
|
|
<sect>Security Concerns
|
|
|
|
<sect1>Open-access proxies
|
|
<p>
|
|
Squid's default configuration file denies all client requests. It is the
|
|
administrator's responsibility to configure Squid to allow access only
|
|
to trusted hosts and/or users.
|
|
<p>
|
|
If your proxy allows access from untrusted hosts or users, you can be
|
|
sure that people will find and abuse your service. Some people
|
|
will use your proxy to make their browsing anonymous. Others will
|
|
intentionally use your proxy for transactions that may be illegal
|
|
(such as credit card fraud). A number of web sites exist simply
|
|
to provide the world with a list of open-access HTTP proxies. You
|
|
don't want to end up on this list.
|
|
<p>
|
|
Be sure to carefully design your access control scheme. You should
|
|
also check it from time to time to make sure that it works as you
|
|
expect.
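<p>
A minimal sketch of such a scheme, assuming your trusted clients all
live on the (made-up) network 192.168.1.0/24:
<verb>
acl all src 0/0
acl ournet src 192.168.1.0/24
http_access allow ournet
http_access deny all
</verb>
Requests from any other address fall through to the final deny line.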
|
|
|
|
<sect1>Mail relaying
|
|
<p>
|
|
SMTP and HTTP are rather similar in design. This, unfortunately, may
|
|
allow someone to relay an email message through your HTTP proxy. To
|
|
prevent this, you must make sure that your proxy denies HTTP requests
|
|
to port 25, the SMTP port.
|
|
<p>
|
|
Squid is configured this way by default. The default <em/squid.conf/
|
|
file lists a small number of trusted ports. See the <em/Safe_ports/
|
|
ACL in <em/squid.conf/. Your configuration file should always deny
|
|
unsafe ports early in the <em/http_access/ lists:
|
|
<verb>
|
|
http_access deny !Safe_ports
|
|
(additional http_access lines ...)
|
|
</verb>
|
|
<p>
|
|
Do NOT add port 25 to <em/Safe_ports/ (unless your goal is to end
|
|
up in the <url url="http://mail-abuse.org/rbl/" name="RBL">). You may
|
|
want to make a cron job that regularly verifies that your proxy blocks
|
|
access to port 25.
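<p>
One way to sketch such a check: have the job send a request like the
one below through the proxy, and complain if the reply is anything
other than an access-denied error (Squid normally answers denied
requests with a 403 Forbidden status). The mail host name is a
placeholder:
<verb>
GET http://mail.example.com:25/ HTTP/1.0
</verb>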
|
|
|
|
<verb>
|
|
$Id: FAQ.sgml,v 1.3 2004/09/09 12:36:55 cvsdist Exp $
|
|
</verb>
|
|
</article>
|
|
<!-- LocalWords: SSL MSIE Netmanage Chameleon WebSurfer unchecking remotehost
|
|
-->
|
|
<!-- LocalWords: authuser peerstatus peerhost SWAPIN SWAPOUT unparsable
|
|
-->
|