On 19 Aug 97 at 23:00, squid-dev@nlanr.net wrote:
> From: Duane Wessels <wessels@nlanr.net>
> Its probably not necessary to compress StoreEntry->key. That should
> only be different from StoreEntry->url while a request is in progress.
>
> Initially I was going to suggest that we always leave StoreEntry->url
> compressed and change every reference of entry->url with
> DECODE(entry->url). But then things become complicated if you ever
> need to do:
>
> foo(.., DECODE(e1->url), DECODE(e2->url), ...)
do we ever need to?
> because you can't just decode into a static array (i.e. like
> inet_ntoa() does).
Some time ago I was thinking about the possibility of excluding
URL strings from RAM altogether. I'm not at all sure whether it is
possible, but I'd like to share my thoughts and hear your comments:
The URL string is needed only for logging and for finding the actual
source, and thus only while a request is being serviced. I can't find any
other use for keeping the actual URL string in RAM. Squid is request-driven,
and at other times it doesn't care very much about what is in its cache.
For URL lookup, all we need is a unique identifier that can be calculated
from any given URL. Thus we'd need an algorithm that always yields a unique
(hash) ID for any possible URL. It could be 64 bits, or whatever is
enough to make it unique in practice. This algorithm could well be
non-reversible.
Then, for logging purposes, we'd need to carry the request URL along for
the whole service time, but that is not much of a RAM eater and happens
anyway.
Upon a request we'd calculate the unique hash ID from the URL and do a
lookup. A HIT/MISS doesn't change anything in Squid's operation; there is no
need to uncompress the URL at any stage. The swaplog could also contain only
the hash ID of the URL (or both).
I don't know whether it is possible to calculate a unique ID from any URL
in such a way that no two different URLs ever yield the same ID, but I
believe that "collisions" could be made extremely rare.
ICP could use these cryptic IDs to ask peering caches for hits
(if they have negotiated to use the same algorithm), reducing ICP traffic
and remote CPU usage.
As no place on disk would contain the actual URL from which the ID was
made, it could be very difficult to change the algorithm if the need arose.
It would also be hard to detect when collisions do occur. To double-check,
I'd suggest prepending the URL to every object on disk. Then, when servicing
an object, it would be easy to strip the URL and compare it with the actual
request URL.
In addition, saving URLs with the objects gives a way to rebuild all store
data from the files spread across the disks in case the swaplog gets trashed
or corrupted.
Then the swaplog could contain both the ID and the URL to help detect errors
at startup, but I still don't think Squid needs to keep URLs in RAM while
running.
And last, this ID-calculation routine could also use a reversible
algorithm, although I think that would give a much worse compression ratio.
In conclusion, if this idea is worth anything, Squid's RAM usage could
drop from an average of 100 bytes per URL to 6-10, giving more RAM and
speeding up lookups.
I must be missing something here...?
best regards,
----------------------------------------------------------------------
Andres Kroonmaa mail: andre@online.ee
Network Manager
Organization: MicroLink Online Tel: 6308 909
Tallinn, Sakala 19 Pho: +372 6308 909
Estonia, EE0001 http://www.online.ee Fax: +372 6308 901
----------------------------------------------------------------------
Received on Tue Jul 29 2003 - 13:15:42 MDT