On 23/06/2008, Adrian Chadd <adrian@creative.net.au> wrote:
> On Mon, Jun 23, 2008, Bradley Kite wrote:
> > Hi all,
> >
> > I have a problem that I am currently trying to diagnose. Please
> > correct me if you think this should be on the -users list instead, but
> > I think it's a bug and I need further help trying to diagnose it.
>
>
> Ruh roh!
>
>
> > Dropping the squid server out of service on the load balancer to stop
> > actual traffic, and then running "squid -k debug" produces the
> > following messages for lots of different FDs (I presume it's for all
> > FDs that squid thinks are active):
>
>
> Ok:
>
>
> > 2008/06/23 10:57:55| comm_call_handlers(): got fd=11314 read_event=19
> > write_event=18 F->read_handler=0x8070890 F->write_handler=(nil)
>
>
> Ok. hm.
>
>
> > 2008/06/23 10:57:55| comm_call_handlers(): Calling read handler on fd=11314
> > 2008/06/23 10:57:55| clientReadRequest: FD 11314: reading request...
> > 2008/06/23 10:57:55| clientReadRequest: FD 11314 closed?
>
>
> So that's from:
>
> debug(33, 5) ("clientReadRequest: FD %d closed?\n", fd);
> F->flags.socket_eof = 1;
> conn->defer.until = squid_curtime + 1;
> conn->defer.n++;
> fd_note_static(fd, "half-closed");
>
>
> > 2008/06/23 10:57:55| cbdataLock: 0x655b16c0
> > 2008/06/23 10:57:55| cbdataValid: 0x655b16c0
> > 2008/06/23 10:57:55| cbdataUnlock: 0x655b16c0
> > 2008/06/23 10:57:55| commSetSelect: FD 11314 type 1
> > 2008/06/23 10:57:55| commSetEvents(fd=11314)
>
>
> I'd modify that debug to include the conn->defer.n value, so you can see what the
> heck is going on. The defer count should grow to a fixed amount before the socket
> is properly closed.
>
> Try setting half_closed_clients to off?
>
> Ah!
>
>
> > I could set "half_closed_clients off"; however, even at the start of
> > the decline in file descriptors (ie when there are still file
> > descriptors available) there are problems browsing certain websites,
> > so I think this will just mask a symptom of the problem rather than
> > fix it.
>
>
> Like what?
>
>
> > A simple restart of squid fixes the issue, but only for a while. Our
> > support guys are having to restart squid on various devices about 5-10
> > times a day at the moment in order to try to minimise impact to our
> > customers.
> >
> > We are running squid-2.6-STABLE18 on Linux (Centos 5), using the
> > epoll() event notification interface.
> >
> > Any help will be much appreciated as I've run out of things to check.
> > Please let me know if you need any further information.
>
>
> It's hard to tell. The sockets are returning EOF (0-byte reads) and should
> be closed. The question is why that's happening.
>
Hi Adrian,
I have some debug for you now:
This server just started showing the symptoms described earlier. When
dropped into debug mode, the defer count is mostly around 6 or 7
thousand. Since each defer ticks once per second, that corresponds
roughly to the time since the problem began (or since the first request
after it began):
2008/06/24 15:20:48| clientReadRequest: FD 1944 closed? Defer Count: 5974
2008/06/24 15:20:48| clientReadRequest: FD 1942 closed? Defer Count: 6302
2008/06/24 15:20:48| clientReadRequest: FD 1941 closed? Defer Count: 6179
2008/06/24 15:20:48| clientReadRequest: FD 1940 closed? Defer Count: 7005
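For reference, the extra debug is essentially the one-line change you
suggested -- a sketch against the clientReadRequest() code quoted above,
so the exact context may differ slightly:

    /* print defer.n alongside the existing half-closed message */
    debug(33, 5) ("clientReadRequest: FD %d closed? Defer Count: %d\n",
        fd, conn->defer.n);
    F->flags.socket_eof = 1;             /* unchanged from the code */
    conn->defer.until = squid_curtime + 1;  /* quoted above */
    conn->defer.n++;
    fd_note_static(fd, "half-closed");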
On this particular occasion the problem has been caught early, so not
that many FDs are in this state.
squidclient mgr:info is below, if that provides any additional clues:
Squid Object Cache: Version 2.6.STABLE18
Start Time: Tue, 24 Jun 2008 03:02:55 GMT
Current Time: Tue, 24 Jun 2008 14:30:21 GMT
Connection information for squid:
Number of clients accessing cache: 5081
Number of HTTP requests received: 5475693
Number of ICP messages received: 0
Number of ICP messages sent: 0
Number of queued ICP replies: 0
Request failure ratio: 0.00
Average HTTP requests per minute since start: 7965.5
Average ICP messages per minute since start: 0.0
Select loop called: 81904070 times, 0.504 ms avg
Cache information for squid:
Request Hit Ratios: 5min: 0.0%, 60min: 0.0%
Byte Hit Ratios: 5min: 0.7%, 60min: 3.1%
Request Memory Hit Ratios: 5min: 0.0%, 60min: 0.0%
Request Disk Hit Ratios: 5min: 0.0%, 60min: 0.0%
Storage Swap size: 191721208 KB
Storage Mem size: 916892 KB
Mean Object Size: 23.52 KB
Requests given to unlinkd: 0
Median Service Times (seconds) 5 min 60 min:
HTTP Requests (All): 0.06286 0.06640
Cache Misses: 0.06286 0.06640
Cache Hits: 0.00000 0.00000
Near Hits: 0.00000 0.00000
Not-Modified Replies: 0.00000 0.00000
DNS Lookups: 0.00094 0.00190
ICP Queries: 0.00000 0.00000
Resource usage for squid:
UP Time: 41245.752 seconds
CPU Time: 5015.237 seconds
CPU Usage: 12.16%
CPU Usage, 5 minute avg: 1.06%
CPU Usage, 60 minute avg: 1.32%
Process Data Segment Size via sbrk(): 2095744 KB
Maximum Resident Size: 0 KB
Page faults with physical i/o: 378705
Memory usage for squid via mallinfo():
Total space in arena: 2095876 KB
Ordinary blocks: 2066961 KB 329163 blks
Small blocks: 0 KB 0 blks
Holding blocks: 60984 KB 6 blks
Free Small blocks: 0 KB
Free Ordinary blocks: 28914 KB
Total in use: -2066359 KB 101%
Total free: 28914 KB 0%
Total size: -2037444 KB
Memory accounted for:
Total accounted: 1677823 KB
memPoolAlloc calls: 768370423
memPoolFree calls: 747816806
File descriptor usage for squid:
Maximum number of file descriptors: 16384
Largest file desc currently in use: 2499
Number of file desc currently in use: 1418
Files queued for open: 0
Available number of file descriptors: 14966
Reserved number of file descriptors: 100
Store Disk files open: 0
IO loop method: epoll
Internal Data Structures:
8182835 StoreEntries
181239 StoreEntries with MemObjects
181238 Hot Object Cache Items
8151855 on-disk objects
In particular, squid thinks it has 1418 file descriptors in use, while
netstat shows only a few hundred:
[root@elc-url-sqd-13 squid]# netstat -an | grep 3128 | wc -l
287
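As a sanity check on those numbers: squid's in-use count also covers
disk files, DNS and upstream sockets, and so on, not just client
connections on port 3128, so the kernel's view via /proc is probably
the fairer comparison. A minimal sketch of such a check -- my own
illustration with a hypothetical helper name, not squid code:

    #include <dirent.h>
    #include <stdio.h>
    #include <sys/types.h>

    /* Count descriptors the kernel holds open for a process by listing
     * /proc/<pid>/fd (one entry per open descriptor).  Returns -1 if
     * the directory cannot be read. */
    static int count_open_fds(pid_t pid)
    {
        char path[64];
        struct dirent *de;
        DIR *d;
        int n = 0;

        snprintf(path, sizeof(path), "/proc/%d/fd", (int) pid);
        if ((d = opendir(path)) == NULL)
            return -1;
        while ((de = readdir(d)) != NULL)
            if (de->d_name[0] != '.')   /* skip "." and ".." */
                n++;
        closedir(d);
        return n;
    }

Called with squid's PID, this should land close to the mgr:info figure
if the 1418 count reflects descriptors that are genuinely still open.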
I am now going to set "half_closed_clients off" and see what debug
can be obtained from that.
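To make the half-closed state concrete -- an illustration only, with a
hypothetical helper name, not squid source: on a TCP socket, a read
that returns 0 means the client has sent FIN, which is exactly the EOF
you mentioned these sockets keep returning.

    #include <errno.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Returns 1 if the peer has half-closed the connection (sent FIN).
     * MSG_PEEK is used so the probe does not consume request bytes.
     * With half_closed_clients on, squid-2.6 defers such FDs once per
     * second (defer.n++); with it off, it closes them immediately. */
    static int peer_sent_eof(int fd)
    {
        char buf[1];
        ssize_t n = recv(fd, buf, sizeof(buf), MSG_PEEK);

        if (n == 0)
            return 1;   /* EOF: peer closed its write side */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;   /* non-blocking socket, nothing to read yet */
        return 0;       /* data pending (n > 0) or a hard error (n < 0) */
    }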
In the meantime, if you have any other suggestions please let me know.
-- Brad.