On 15/02/11 17:41, Alex Rousskov wrote:
> Hello,
>
> The attached patch contains some of the Store changes we made while
> working on adding SMP-shared support to cache_dirs. These changes were
> made while chasing and fixing a particular set of bugs introduced and
> exposed by shared Rock Store cache_dirs.
>
> I wish the changes were better isolated but many are inter-dependent and
> most were discovered/developed at the same time while trying to make the
> code work. Fully isolating all of them will not be possible and
> isolating many will take a long time.
They look like a small enough set to go in as a group by themselves
after they have been tested.
<snip>
>
> --------- change log ---------------
>
> Changes revolving around making Store work with SMP-shared max-size
> cache_dirs:
>
> * Added MemObject::expectedReplySize() and used it instead of object_sz.
>
> When deciding whether an object with a known content length can be
> swapped out, do not wait until the object is completely received and its
> size (mem_obj->object_sz) becomes known (while asking the store to
> recheck in vain with every incoming chunk). Instead, use the known
> content length, if any, to make the decision.
>
> This optimizes the common case where the complete object is eventually
> received and swapped out, preventing accumulating potentially large
> objects in RAM while waiting for the end of the response. Should not
> affect objects with unknown content length.
Hooray! Thank you.
>
> Side-effect1: probably fixes several cases of unknowingly using negative
> (unknown) mem_obj->object_sz in calculations. I added a few assertions
> to double check some of the remaining object_sz/objectLen() uses.
>
> Side-effect2: When expectedReplySize() is stored on disk as StoreEntry
> metadata, it may help to detect truncated entries when the writer
> process dies before completing the swapout.
>
Indeed.
>
> * Removed mem->swapout.memnode in favor of mem->swapout.queue_offset.
>
> The code used swapout.memnode pointer to keep track of the last page
> that was swapped out. The code was semi-buggy because it could reset the
> pointer to NULL if no new data came in before the call to doPages().
> Perhaps the code relied on the assumption that the caller will never
> doPages if there is no new data, but I am not sure that assumption was
> correct in all cases (it could be that I broke the calling code, of course).
>
> Moreover, the page pointer was kept without any protection from page
> disappearing during asynchronous swapout. There were "Evil hack time"
> comments discussing how the page might disappear.
>
> Fortunately, we already have mem->swapout.queue_offset that can be fed
> to getBlockContainingLocation to find the page that needs to be swapped
> out. There is no need to keep the page pointer around. The
> queue_offset-based math is the same so we are not adding any overheads
> by using that offset (in fact, we are removing some minor computations).
>
Excellent.
>
> * Added "close how?" parameter to storeClose() and friends.
>
> The old code would follow the same path when closing swapout activity
> for an aborted entry and when completing a perfectly healthy swapout. In
> non-shared case, that could have been OK because the abort code would
> then release the entry, removing any half-written entry from the index
> and the disk (but I am not sure that release happened fast enough in
> 100% of cases).
I think it did not. Debian has bugs open against apt doing parallel
downloads and/or fast recovery repeats producing a corrupt/truncated
object. Other fixes have reduced the problem greatly but it still pops
up on (now rare) occasions.
<snip>
>
> * Moved "can you store this entry?" code to virtual SwapDir::canStore().
>
> The old code had some of the tests in SwapDir-specific canStore()
> methods and some in storeDirSelect*() methods. This resulted in
> inconsistencies, code duplication, and extra calculation overheads.
> Making this call virtual allows individual cache_dir types to do custom
> access controls.
Interesting potentials there. Looks good.
>
> The same method is used for cache_dir load reporting (if it returns
> true). Load management needs more work, but the current code is no worse
> than the old one in this aspect, and further improvements are outside
> this change scope.
In UFSSwapDir::canStore I think load needs to be set regardless of the
shedLoad() results. If I understand it right that would help balance
things better when they are all overloaded.
>
> * Minimized from-disk StoreEntry loading/unpacking code duplication.
>
> Moved common (and often rather complex!) code from store modules into
> storeRebuildLoadEntry, storeRebuildParseEntry, and storeRebuildKeepEntry.
>
I'm a little suspicious of the removal from
StoreEntry::getSerialisedMetaData and its effect on storeSwapMetaBuild.
IIRC you had reasons for adding it earlier that involved
storeSwapMetaBuild using an invalid value in one or more cases. I hope
those cases are resolved now?
<snip>
> * When swapout initiation fails, release StoreEntry. This prevents the
> caller code from trying to swap out again and again because swap_status
> becomes SWAPOUT_NONE.
>
> TODO: Consider add SWAPOUT_ERROR, STORE_ERROR, and similar states. It
> may solve several problems where the code sees _NONE or _OK and thinks
> everything is peachy when in fact there was an error.
>
in StoreEntry::swapoutPossible the logic tests of (currentEnd >
store_maxobjsize) is broken. To reach it we already have checked
(expectedEnd > 0 and expectedEnd < store_maxobjsize) so if (expectedEnd
< currentEnd) there is a bug somewhere.
>
> * Always call StoreEntry::abort() instead of setting ENTRY_ABORTED manually.
>
? I see no changes implementing that statement.
<snip>
>
> * Added operator<< for dumping StoreEntry summary into the debugging
> log. Needs more work to report more info (and not report yet-unknown info).
? I see no changes implementing that statement.
in storeSwapOutFileClosed the assert commented "we checked that above"
needs to be dropped. We are trying to remove useless asserts remember?
The bits snipped look okay to me, though I don't know store enough yet
to find any subtle logic bugs.
Amos
-- Please be using Current Stable Squid 2.7.STABLE9 or 3.1.11 Beta testers wanted for 3.2.0.5Received on Tue Feb 15 2011 - 08:24:49 MST
This archive was generated by hypermail 2.2.0 : Tue Mar 01 2011 - 12:00:14 MST