On 26 Jan 99, at 10:16, Alex Rousskov <rousskov@nlanr.net> wrote:
> On Tue, 26 Jan 1999, Andres Kroonmaa wrote:
>
> > If you know how to solve the problem of non-continuous free space
> > on a system that has random expiration of objects, and without
> > sacrificing some >20% of disk space for that, then it's way cool.
> > I'm all ears. I just can't think of a trivial way to do that, and
> > that's what I admit.
>
> The trivial way is a "cyclic" approach. You always write at the current
> pointer, incrementing it after write completes. You do not care if the
> objects you overwrite expire or not. You do not have any garbage
> collection. Just one huge FIFO disk-resident queue. Writes are always
> continuous. Inter-object reads for small objects are continuous.
> Intra-object reads are random (but can be optimized with Scan or whatnot).
Yes, of course, I mentioned that. But this would be too simplistic.
It is hard to believe that FIFO as a replacement policy could be very
effective at both reducing network traffic and keeping disk
utilisation high. The sliding pointer of a FIFO leaves behind whatever
it gets from the source, including fast-expiring objects. We can
consider expired objects as dead or deleted, because when (if ever)
they are referenced again, they will probably be overwritten by a
refresh.
Of 100% of the fresh objects we fetch, some part will survive a full
cycle; some part expires long before that. It is exactly the part that
survives which increases Squid's hit ratio, and we want to keep it.
The other part is waste, and we'd want to replace it first.
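
Just to make it concrete, the pure cyclic scheme as I understand it is
roughly this (a hypothetical sketch, all names invented by me, nothing
like the actual Squid store code):

  /* Hypothetical sketch of the pure cyclic (FIFO) store writer.
   * All names are invented for illustration; this is not Squid code. */
  #include <string.h>

  #define STORE_SIZE (1UL << 30)    /* one cache partition, say 1 GB */

  extern char  *store_base;         /* the partition, e.g. mmap'ed elsewhere */
  static size_t write_ptr = 0;      /* always advances, never searches */

  /* Append an object; whatever lived at these offsets is simply lost. */
  size_t
  store_append(const char *obj, size_t len)
  {
      size_t start;

      if (write_ptr + len > STORE_SIZE)
          write_ptr = 0;            /* wrap: start a new pass over the disk */
      start = write_ptr;
      memcpy(store_base + start, obj, len);   /* purely sequential write */
      write_ptr += len;
      return start;                 /* caller records (start, len) in metadata */
  }

No free-space search, no garbage collection, just remember (start, len)
per object. That is exactly why it is so fast, and also why it keeps
nothing intentionally.
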
There are basically two strong desires that compete with speed and
simplicity:
1) we want to have our disks as full as possible, 90% at least;
2) we want these disks to be full of _useful_ stuff, not waste.
So we have to look for allocation algorithms that combine both
1) an intelligent replacement policy, and
2) no speed loss at high disk utilisation.
If we just overwrite useful stuff on every pass, what's left behind?
Perhaps some 40% of waste. Making effective use of only 60% of the
available disk space seems very expensive. A performance win on the
disks can easily turn into a performance loss on the network.
This basically means that we do NOT want to blindly overwrite objects
that could be useful and long-term. And this leads us to the
unavoidable need to handle fragmented free space, by either resorting
to some randomness in free-space reuse or to some sort of free-space
compaction and data reorganisation. Both add overhead, and an
acceptable compromise has to be found between speed and simplicity on
one side and intelligence and effectiveness on the other.
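
One possible compromise (again only a sketch of the idea, invented
names, nothing decided): keep the cyclic writer, but before the
pointer reuses a region, copy still-fresh and still-referenced objects
forward to the head of the log, so that only waste actually gets
overwritten:

  /* Hypothetical sketch: cyclic writer that preserves useful objects
   * by copying them forward, LFS-style.  Invented names, not Squid. */
  #include <stddef.h>
  #include <time.h>

  struct disk_obj {
      size_t offset, len;
      time_t expires;
      int    refcount;              /* or hit count, LRU age, ... */
  };

  extern struct disk_obj *obj_at(size_t offset);  /* metadata lookup */
  extern size_t store_append(const char *obj, size_t len);
  extern char  *store_base;

  /* Reclaim the region [ptr, ptr+need) before the writer reuses it. */
  void
  reclaim(size_t ptr, size_t need, time_t now)
  {
      size_t end = ptr + need;

      while (ptr < end) {
          struct disk_obj *o = obj_at(ptr);
          size_t next;

          if (o == NULL) {          /* a hole or a dead object: skip it */
              ptr++;
              continue;
          }
          next = ptr + o->len;
          if (o->expires > now && o->refcount > 0)
              /* still useful: move it to the current head of the log */
              o->offset = store_append(store_base + ptr, o->len);
          /* otherwise it is waste and will simply be overwritten */
          ptr = next;
      }
  }

The price is extra writes for the copied objects plus per-object
metadata to decide what is worth keeping, which is exactly the kind of
overhead I mean above.
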
btw,
this cyclic idea reminded me a lot of log-structured filesystems, and
I went digging for some papers on that. Of course I realise that Squid
has much more relaxed requirements on preserving on-disk objects than
a real FS, but still, there are lots of similar problems to solve,
like batching writes, optimising reads, and plugging or reordering
holes.
Henrik, there is work done that seems extremely related to what you
are thinking about. You should read it, it may be useful for your
work.
http://now.cs.berkeley.edu/Papers2/Abstracts/sosp97.html
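
The write-batching part of such schemes, for what it's worth, boils
down to something like this (again just an invented sketch, not their
code): collect small objects into a RAM segment and push the whole
segment to disk in one large sequential write:

  /* Hypothetical sketch of LFS-style write batching: small objects
   * are collected in a RAM segment, flushed with one sequential write. */
  #include <string.h>
  #include <sys/types.h>
  #include <unistd.h>

  #define SEG_SIZE (512 * 1024)     /* e.g. 512 KB per segment */

  static char   seg_buf[SEG_SIZE];
  static size_t seg_used = 0;
  static off_t  seg_disk_off = 0;   /* where the next segment lands on disk */

  static void
  seg_flush(int fd)
  {
      if (seg_used == 0)
          return;
      (void) pwrite(fd, seg_buf, seg_used, seg_disk_off);  /* one big write */
      seg_disk_off += seg_used;
      seg_used = 0;
  }

  /* Returns the disk offset the object will occupy after the next flush. */
  off_t
  seg_append(int fd, const char *obj, size_t len)
  {
      off_t at;

      if (seg_used + len > SEG_SIZE)
          seg_flush(fd);
      at = seg_disk_off + (off_t) seg_used;
      memcpy(seg_buf + seg_used, obj, len);
      seg_used += len;
      return at;
  }
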
> A non trivial (smart) part is to spend minimal efforts in maintaining
> metadata and preserving "popular" objects. The latter may be needed to get
> decent hit ratio. IMO, before coding any smartness into Squid, one has to
> test it using simple trace-driven simulations because it is not obvious
> how many/what objects have to be specially preserved on any decent size
> FIFO queue.
right.
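
A trace-driven simulator for this would not have to be big either.
Something like the skeleton below (trace format and all names made up
by me) would already show what fraction of a plain FIFO queue gets
re-referenced before the pointer runs over it:

  /* Hypothetical skeleton of a trace-driven simulator for the cyclic
   * store.  The trace format ("url-hash object-size" per line) and all
   * names here are made up for illustration. */
  #include <stdio.h>

  #define CACHE_SIZE (1UL << 30)    /* simulated disk, 1 GB */
  #define TAB_SIZE   (1UL << 18)    /* toy open-addressed table */

  struct entry {
      unsigned long key, start, pass;
      int valid;
  };

  static struct entry tab[TAB_SIZE];

  int
  main(void)
  {
      unsigned long key, len, wptr = 0, pass = 1, hits = 0, misses = 0;

      while (scanf("%lu %lu", &key, &len) == 2) {
          struct entry *e = &tab[key & (TAB_SIZE - 1)];
          /* alive if the sliding write pointer has not yet run over it:
           * written in this pass, or last pass but still ahead of wptr */
          int alive = e->valid && e->key == key &&
                      (e->pass == pass ||
                       (e->pass + 1 == pass && e->start >= wptr));

          if (alive) {
              hits++;
          } else {
              misses++;             /* collisions simply evict; rough estimate */
              if (wptr + len > CACHE_SIZE) {    /* wrap: next pass */
                  wptr = 0;
                  pass++;
              }
              e->key = key; e->start = wptr; e->pass = pass; e->valid = 1;
              wptr += len;
          }
      }
      if (hits + misses)
          printf("hits %lu, misses %lu, hit ratio %.1f%%\n",
                 hits, misses, 100.0 * hits / (hits + misses));
      return 0;
  }
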
----------------------------------------------------------------------
Andres Kroonmaa mail: andre@online.ee
Network Manager
Organization: MicroLink Online Tel: 6308 909
Tallinn, Sakala 19 Pho: +372 6308 909
Estonia, EE0001 http://www.online.ee Fax: +372 6308 901
----------------------------------------------------------------------