Re: Squid 2.x performance curiosities

From: Dancer <dancer@dont-contact.us>
Date: Tue, 27 Apr 1999 08:15:36 +1000

Hmm. Thanks for your words. The target filesystem in this case is EXT2,
and there are six cache-disks. The entire disk subsystem (including
cache) never exceeds 400-600 blocks of throughput per second (about
half in and half out, at peak). Physically, the disks spend most of
each second idle.

Whatever is causing the problem is throttling us to the tune of maybe 40
requests/second (or more). Despite not being (apparently) disk bound,
I'm tempted to try async-io and see what happens.
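
(For reference, and from memory so double-check: async-io in Squid 2.x
is a compile-time option; you rebuild with
    ./configure --enable-async-io
which hands the disk open/read/write/close calls off to a pool of
threads instead of blocking the main loop.)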

D

Andres Kroonmaa wrote:
>
> On 26 Apr 99, at 8:34, Dancer <dancer@zeor.simegen.com> wrote:
> > (When replying, make sure I get a copy. The list removed me from
> > squid-dev a while back, when I moved states and email addresses.)
>
> > Checking the access logs reveals that connection durations climbed
> > rapidly to 50,000 ms and more after the critical mark, and requests
> > are still being processed, completed and logged for a minute or two
> > after the client has ceased (check the server...still serving, yes).
>
> > Run test at 100/sec. Runs for some time. Symptoms repeat. I'm quietly
> > watching 'vmstat 5' this whole time, and watching the disk buffers
> > filling (490MB now..squid's cache_mem is set at 8MB just as a reference
>
>
> > Now:
> >
> > ASSUMPTION: Squid's lookup time for a given object does not grow
> > significantly as the number of objects increases. (It'll grow, but not
> > hugely)
>
> Agree.
>
> > GUESS: The slowdown is the operating system looking through a
> > half-gigabyte of cache-blocks to find objects, and actually managing to
> > exceed the time it would take to fetch the object from the physical
> > disk.
>
> Agree that it is more an OS issue than a squid issue.
>
> But I don't think it's the cache-block search that's bogging you
> down. Cache blocks are supposedly handled by the VM system in
> round-robin fashion, flushing/freeing dirty blocks and allocating
> new ones for buffering. It should all be fairly fast, as long as
> there is no need to wait for disk io and there are enough
> kernel-internal resources.
>
> There might be some notion of a max pageout rate that is limiting
> the disk io rate. I'm not a Linux guy, but on Solaris they
> intentionally cap the max pageout rate with a kernel variable so as
> not to saturate the disks.
>
> So, if your request rate exceeds the max pageout rate, things pile
> up until there is no free cache left and disk thrashing starts.
>
> > Requesting clue-transfer,
>
> On most UNIXes there is a notion of a file (page) cache and a
> _separate_ metadata buffer cache. The file cache is allowed to fill
> all unused RAM, but the buffer cache is used for UFS metadata io, is
> limited in maximum size, and is sometimes fixed in the kernel at
> compile time.
> This cache is where file and free-space allocation and file open,
> close and unlink happen. Usually, metadata io is not write-cached
> but synchronous. This is what limits the number of file creations
> per second to disk performance levels.
> If you have async metadata updates, then the size of this cache
> determines how long the OS can sustain delaying this disk io, after
> which things slow back down.
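
Something like this quick-and-dirty loop (my own sketch, untested,
names made up) shows what synchronous metadata io does to the create
rate; run it in a directory on one of the cache disks and compare
against a ramdisk or an async-mounted fs:

  /* bench_create.c: time N create/close/unlink cycles */
  #include <stdio.h>
  #include <stdlib.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/time.h>

  int main(int argc, char **argv)
  {
      int i, fd, n = (argc > 1) ? atoi(argv[1]) : 1000;
      char name[64];
      struct timeval t0, t1;
      double secs;

      gettimeofday(&t0, NULL);
      for (i = 0; i < n; i++) {
          sprintf(name, "bench.%d", i);
          fd = open(name, O_CREAT | O_WRONLY, 0644);
          if (fd < 0) {
              perror("open");
              return 1;
          }
          close(fd);
          unlink(name);    /* unlink is metadata io as well */
      }
      gettimeofday(&t1, NULL);
      secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
      printf("%d create/close/unlink cycles in %.2f s (%.0f/sec)\n",
             n, secs, n / secs);
      return 0;
  }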
>
> I don't think UFS itself is the direct culprit of what you're
> seeing, but indirectly it might be. It depends on how the OS handles
> multiple UFS cylinder groups and how it allocates its metadata cache
> for them.
> I'd imagine that as long as your test runs within a single CG, all
> the needed metadata might fit into the buffer cache and things might
> look much faster, but once a critical number of CGs comes into play,
> a sharp change in speed might occur. There are lots of OS tunables
> to tweak for high-end squid-cache operation.
>
> Without that tuning, a general-purpose OS is expected to service
> some 20-100 users at a time, which is not what you're after. Look at
> the inode cache, the directory name lookup cache, and the metadata
> buffer cache.
> Some parts of UFS caching are kept locked for some time after use,
> to prevent fast reuse and to increase the probability of a hit on
> them. This can lock up all the available slots and cause severe
> delays in UFS metadata operations, without any actual disk io, just
> waiting for timeouts.
> Then again, you may simply have hit a bug in your OS...
>
> Of course, all of this is rant and speculation. You'll need to look
> deeper into what is going on to know for sure.
>
> Some time ago Arjan de Vet wrote a patch for Squid that measures the
> time the OS takes to service certain system calls and reports it in
> the debug logs. Very useful for this kind of thing.
>
> You should look more closely at what happens to the system service
> times for calls like open/read/write/close/unlink, for disk and
> network separately. Then you'll have a much better hint about what
> to look at next.
>
> You might see disk-file open/close(/unlink) call times skyrocket
> when you hit the trouble scenario.
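
The wrapper idea is simple enough; roughly like this (my own
illustration, not the actual patch, names made up):

  /* timed_open.c: log any open() that takes suspiciously long */
  #include <stdio.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/time.h>

  static double elapsed_ms(struct timeval *a, struct timeval *b)
  {
      return (b->tv_sec - a->tv_sec) * 1000.0 +
             (b->tv_usec - a->tv_usec) / 1000.0;
  }

  int timed_open(const char *path, int flags, int mode)
  {
      struct timeval t0, t1;
      double ms;
      int fd;

      gettimeofday(&t0, NULL);
      fd = open(path, flags, mode);
      gettimeofday(&t1, NULL);
      ms = elapsed_ms(&t0, &t1);
      if (ms > 100.0)    /* arbitrary "suspicious" threshold */
          fprintf(stderr, "slow open(%s): %.1f ms\n", path, ms);
      return fd;
  }

  int main(int argc, char **argv)
  {
      int fd = timed_open(argc > 1 ? argv[1] : "/tmp/probe",
                          O_CREAT | O_RDWR, 0644);
      if (fd >= 0)
          close(fd);
      return 0;
  }

Same idea for read/write/close/unlink and the network-side calls,
with a counter or a small histogram instead of a bare threshold.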
>
> ----------------------------------------------------------------------
> Andres Kroonmaa mail: andre@online.ee
> Senior Network Engineer
> Organization: MicroLink Online Tel: 6308 909
> Tallinn, Sakala 19 Pho: +372 6308 909
> Estonia, EE0001 http://www.online.ee Fax: +372 6308 901
> ----------------------------------------------------------------------