<http://www.squid-cache.org/bugs/show_bug.cgi?id=2376>
When a peer goes down and then comes back, its round-robin counters aren't reset, causing it to get a disproportionate amount of traffic until it "catches up" with the rest of the peers in the round-robin pool.
If it was down for load-related issues, this has the effect of making it more likely that it will go down again, because it's temporarily handling the load of the entire pool.
Normally, this isn't a concern, because the number of requests by which it can get out of step is relatively small (bounded by how many requests it can be given before it is considered down -- is this 10 in all cases, or are there corner cases?). In an accelerator case where the origin has a process-based request-handling model, or where back-end processes are CPU-intensive, however, it is a concern.
It looks like the way to fix this is to call peerClearRR from neighborAlive in neighbors.c. However, that just clears one peer - it's necessary to clear *all* peers simultaneously.
Therefore, I suggest:
1) calling peerClearRR from neighborAlive
2) changing the semantics of peerClearRR to clear all neighbours at once, and changing how it's called appropriately -- roughly as sketched below.
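Something along these lines, reusing the toy peer struct from the sketch above. This is a rough illustration of the proposed semantics only; the real peerClearRR and neighborAlive take different arguments, so the signatures here are assumptions:

    /* Proposed semantics: reset the round-robin counters of *all*
     * peers in one call, so no peer is left out of step. */
    static void
    peerClearRR(struct peer *peers)
    {
        struct peer *p;
        for (p = peers; p != NULL; p = p->next)
            p->rr_count = 0;
    }

    /* When a peer is detected as alive again, reset the whole pool
     * so the revived peer starts level with everyone else instead
     * of absorbing the pool's traffic until it catches up. */
    static void
    neighborAlive(struct peer *p, struct peer *all_peers)
    {
        p->alive = 1;
        peerClearRR(all_peers);
    }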
-- Mark Nottingham
mnot_at_yahoo-inc.com