On 23/06/2008, Adrian Chadd <adrian@creative.net.au> wrote:
> On Mon, Jun 23, 2008, Bradley Kite wrote:
> > Hi all,
> >
> > I have a problem that I am currently trying to diagnose. Please
> > correct me if you think this should be on the -users list instead, but
> > I think it's a bug and I need further help trying to diagnose it.
>
>
> Ruh roh!
>
>
> > Dropping the squid server out of service on the load balancer to stop
> > actual traffic, and then running "squid -k debug" produces the
> > following messages for lots of different FDs (I presume it's for all
> > FDs that squid thinks are active):
>
>
> Ok:
>
>
> > 2008/06/23 10:57:55| comm_call_handlers(): got fd=11314 read_event=19
> > write_event=18 F->read_handler=0x8070890 F->write_handler=(nil)
>
>
> Ok. hm.
>
>
> > 2008/06/23 10:57:55| comm_call_handlers(): Calling read handler on fd=11314
> > 2008/06/23 10:57:55| clientReadRequest: FD 11314: reading request...
> > 2008/06/23 10:57:55| clientReadRequest: FD 11314 closed?
>
>
> So that's from:
>
> debug(33, 5) ("clientReadRequest: FD %d closed?\n", fd);
> F->flags.socket_eof = 1;
> conn->defer.until = squid_curtime + 1;
> conn->defer.n++;
> fd_note_static(fd, "half-closed");
>
>
> > 2008/06/23 10:57:55| cbdataLock: 0x655b16c0
> > 2008/06/23 10:57:55| cbdataValid: 0x655b16c0
> > 2008/06/23 10:57:55| cbdataUnlock: 0x655b16c0
> > 2008/06/23 10:57:55| commSetSelect: FD 11314 type 1
> > 2008/06/23 10:57:55| commSetEvents(fd=11314)
>
>
> I'd modify that debug to include the conn->defer.n value, so you can see what the
> heck is going on. The defer count should grow to a fixed amount before the socket
> is properly closed.
>
> Try setting half_closed_clients to off?
>
> Ah!
>
>
> > I could set "half_closed_clients off"; however, even at the start of
> > the decline in file descriptors (ie when there are still file
> > descriptors available) there are problems browsing certain websites,
> > so I think this will just mask a symptom of the problem rather than
> > fix it.
>
>
> Like what?
>
>
> > A simple restart of squid fixes the issue, but only for a while. Our
> > support guys are having to restart squid on various devices about 5-10
> > times a day at the moment in order to try to minimise impact to our
> > customers.
> >
> > We are running squid-2.6-STABLE18 on Linux (Centos 5), using the
> > epoll() event notification interface.
> >
> > Any help will be much appreciated as I've run out of things to check.
> > Please let me know if you need any further information.
>
>
> It's hard to tell. The sockets are returning EOF (0-byte reads) and should
> be closed. The question is why that's happening.
>
Hi Adrian,
I have some debug for you now:
This server just started showing the symptoms described earlier. When
dropped into debug mode, the defer count is mostly around 6 or 7
thousand. Since each defer ticks once per second, that corresponds
roughly to the time since the problem began (or since the first request
after it began):
2008/06/24 15:20:48| clientReadRequest: FD 1944 closed? Defer Count: 5974
2008/06/24 15:20:48| clientReadRequest: FD 1942 closed? Defer Count: 6302
2008/06/24 15:20:48| clientReadRequest: FD 1941 closed? Defer Count: 6179
2008/06/24 15:20:48| clientReadRequest: FD 1940 closed? Defer Count: 7005
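For reference, the extra debug is essentially the one-line change you
suggested -- a sketch against the clientReadRequest() code quoted above,
so the exact context may differ slightly:

    /* print defer.n alongside the existing half-closed message */
    debug(33, 5) ("clientReadRequest: FD %d closed? Defer Count: %d\n",
        fd, conn->defer.n);
    F->flags.socket_eof = 1;             /* unchanged from the code */
    conn->defer.until = squid_curtime + 1;  /* quoted above */
    conn->defer.n++;
    fd_note_static(fd, "half-closed");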
On this particular occasion the problem has been caught early, so not
that many FDs are in this state.
squidclient mgr:info is below, if that provides any additional clues:
Squid Object Cache: Version 2.6.STABLE18
Start Time: Tue, 24 Jun 2008 03:02:55 GMT
Current Time: Tue, 24 Jun 2008 14:30:21 GMT
Connection information for squid:
Number of clients accessing cache: 5081
Number of HTTP requests received: 5475693
Number of ICP messages received: 0
Number of ICP messages sent: 0
Number of queued ICP replies: 0
Request failure ratio: 0.00
Average HTTP requests per minute since start: 7965.5
Average ICP messages per minute since start: 0.0
Select loop called: 81904070 times, 0.504 ms avg
Cache information for squid:
Request Hit Ratios: 5min: 0.0%, 60min: 0.0%
Byte Hit Ratios: 5min: 0.7%, 60min: 3.1%
Request Memory Hit Ratios: 5min: 0.0%, 60min: 0.0%
Request Disk Hit Ratios: 5min: 0.0%, 60min: 0.0%
Storage Swap size: 191721208 KB
Storage Mem size: 916892 KB
Mean Object Size: 23.52 KB
Requests given to unlinkd: 0
Median Service Times (seconds) 5 min 60 min:
HTTP Requests (All): 0.06286 0.06640
Cache Misses: 0.06286 0.06640
Cache Hits: 0.00000 0.00000
Near Hits: 0.00000 0.00000
Not-Modified Replies: 0.00000 0.00000
DNS Lookups: 0.00094 0.00190
ICP Queries: 0.00000 0.00000
Resource usage for squid:
UP Time: 41245.752 seconds
CPU Time: 5015.237 seconds
CPU Usage: 12.16%
CPU Usage, 5 minute avg: 1.06%
CPU Usage, 60 minute avg: 1.32%
Process Data Segment Size via sbrk(): 2095744 KB
Maximum Resident Size: 0 KB
Page faults with physical i/o: 378705
Memory usage for squid via mallinfo():
Total space in arena: 2095876 KB
Ordinary blocks: 2066961 KB 329163 blks
Small blocks: 0 KB 0 blks
Holding blocks: 60984 KB 6 blks
Free Small blocks: 0 KB
Free Ordinary blocks: 28914 KB
Total in use: -2066359 KB 101%
Total free: 28914 KB 0%
Total size: -2037444 KB
Memory accounted for:
Total accounted: 1677823 KB
memPoolAlloc calls: 768370423
memPoolFree calls: 747816806
File descriptor usage for squid:
Maximum number of file descriptors: 16384
Largest file desc currently in use: 2499
Number of file desc currently in use: 1418
Files queued for open: 0
Available number of file descriptors: 14966
Reserved number of file descriptors: 100
Store Disk files open: 0
IO loop method: epoll
Internal Data Structures:
8182835 StoreEntries
181239 StoreEntries with MemObjects
181238 Hot Object Cache Items
8151855 on-disk objects
In particular, squid thinks it has 1418 file descriptors in use, while
netstat shows only a few hundred:
[root@elc-url-sqd-13 squid]# netstat -an | grep 3128 | wc -l
287
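As a sanity check on those numbers: squid's in-use count also covers
disk files, DNS and upstream sockets, and so on, not just client
connections on port 3128, so the kernel's view via /proc is probably
the fairer comparison. A minimal sketch of such a check -- my own
illustration with a hypothetical helper name, not squid code:

    #include <dirent.h>
    #include <stdio.h>
    #include <sys/types.h>

    /* Count descriptors the kernel holds open for a process by listing
     * /proc/<pid>/fd (one entry per open descriptor).  Returns -1 if
     * the directory cannot be read. */
    static int count_open_fds(pid_t pid)
    {
        char path[64];
        struct dirent *de;
        DIR *d;
        int n = 0;

        snprintf(path, sizeof(path), "/proc/%d/fd", (int) pid);
        if ((d = opendir(path)) == NULL)
            return -1;
        while ((de = readdir(d)) != NULL)
            if (de->d_name[0] != '.')   /* skip "." and ".." */
                n++;
        closedir(d);
        return n;
    }

Called with squid's PID, this should land close to the mgr:info figure
if the 1418 count reflects descriptors that are genuinely still open.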
I am now going to set "half_closed_clients off" and see what debug
can be obtained from that.
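To make the half-closed state concrete -- an illustration only, with a
hypothetical helper name, not squid source: on a TCP socket, a read
that returns 0 means the client has sent FIN, which is exactly the EOF
you mentioned these sockets keep returning.

    #include <errno.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Returns 1 if the peer has half-closed the connection (sent FIN).
     * MSG_PEEK is used so the probe does not consume request bytes.
     * With half_closed_clients on, squid-2.6 defers such FDs once per
     * second (defer.n++); with it off, it closes them immediately. */
    static int peer_sent_eof(int fd)
    {
        char buf[1];
        ssize_t n = recv(fd, buf, sizeof(buf), MSG_PEEK);

        if (n == 0)
            return 1;   /* EOF: peer closed its write side */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;   /* non-blocking socket, nothing to read yet */
        return 0;       /* data pending (n > 0) or a hard error (n < 0) */
    }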
In the meantime, if you have any other suggestions please let me know.
-- Brad.