On 1 Jul 2000, at 16:08, Henrik Nordstrom <hno@hem.passagen.se> wrote:
> Robert Collins wrote:
>
> > This would also allow tuning so fast linked browsers (say users
> > on a LAN) don't have prefetching, while users on high latency
> > links do..
>
> Hmm.. shouldn't it be the other way around?
>
> Users with low link latency to the cache are the users who benefit most
> from reducing the latency from the cache and up. For high latency
> clients the request overlap from multiple connections should hide most
> of this latency.
It should be the other way around, though to some extent it also depends.
> The idea behind active prefetching is to reduce the latency of the
> client, to make sure his link is fully utilized. The slower the link
> the client has, the less data needs to be prefetched to be able to
> fill the client's connection.
Yes, reduce latency for the client, but I'd view it slightly differently.
The idea behind prefetching is to avoid making the user wait for the
remote server; everything else should be viewed as optimisation to avoid
needless traffic. We don't want to simply fill the client pipe, we want
to keep it filled and hide the latency of connecting to origin servers
from the client. We can't do that if we only initiate connects upon a
client request, therefore we try to do it before the client asks.
It is important to realise that we are not talking about raw performance
numbers here, but about the subjective perception of the end user. From
that perspective we have a different goal: we shift from per-object
service to per-webpage service. Without that shift there is no point in
discussing prefetching at all; an access-log entry won't show a
performance increase for a single URL just because of prefetching.
It is generally agreed that page access speed is decent when the end
user does not have to wait more than 5 seconds to see most of the web
page. Page wait times beyond 5 sec become unacceptable; 5 sec is roughly
the upper limit after which the average human becomes annoyed ;)
We could set our goal to be servicing any web page within 5 seconds.
Once we have such a goal in place, we'll see several situations that
make us fail to achieve it. Prefetching is just a potentially efficient
way to overcome those problems and help us keep performance acceptable
more consistently.
> To make this discussion fruitful, concentrate on latency pattern
> reduction rather than prefetching. The type of prefetching discussed
> here is only one very simple way to approach latency reduction.
I disagree on the "simple" part. As several people have already pointed
out here, prefetching becomes tricky as soon as we decide that we cannot
tolerate any waste of bandwidth. That forces us to define the cases
where prefetching can show a benefit and the cases where it simply
cannot, to create an algorithm that operates between the two limits, and
to call it "intelligent prefetching." Prefetching can't solve all
latency-related problems for us in all scenarios, so the ideal algorithm
has to be self-adapting. Creating such an algorithm may prove not to be
worth the effort, agreed, but I think it's worth discussing the merits
of prefetching nevertheless.
I fully agree that we should concentrate on other means of latency
reduction as well, at the same time or even before, but prefetching is
such a different approach that it deserves a separate look.
We've already realised that a "too straightforward" implementation may
cause lots of bad side effects, waste bandwidth and create conflicts.
Therefore we should take a closer look at when it's worth using, how it
can help, and of course how to make sure it fulfils all our
requirements.
> I guess it should in most cases be enough to, in priority order:
> 1. Make sure there is an idle server connection immediately available to
> be used when the client request for an inlined object is received and the
> object is not in cache.
This measure can bring some benefits, I agree, especially making sure
that DNS is resolved, as you said previously, but it is still a
completely different approach compared to prefetching. I think we still
have to go through the prefetching idea again to better understand why
this discussion popped up in the first place.
> 2. Make sure the client makes use of pipelining where possible, to allow
> the proxy to utilize the bandwidth differences without speculating too
> much in prefetching / HTML parsing.
> 3. Tune the proxy to also make use of pipelining
I think the benefits of HTML pipelining are somewhat overestimated.
I'd say that a single pipelined TCP session over _congested_ links with
notable latency ends up slower and more prone to starvation than several
concurrent but short TCP sessions.
Pipelined TCP sessions have a major benefit on high-latency, uncongested
fast links, like SAT, but only for quite large transfers.
The reasons for these statements mainly come from how routers manage
congestion and how TCP adapts to different situations.
Besides, we shouldn't only look at TCP latency, but at all the latency
there is in the way of request-reply processing. This includes ICP
queries, parent caches, server load, perhaps remote database query
delays, etc, etc.
We should keep all of this in mind to better understand the potential
merits of prefetching.
In general, we want to be ready to start flooding the reply to the
client as soon as it asks for it, and not just _start_ looking around
_after_ the client requests it.
------------------------
I'm sorry for the following description of TCP, which most of you most
probably know very well, but I thought I'd rather write it down to avoid
confusion about what I mean.
Let's look at an "ideal" TCP session, with TCP connect, slow start, and all.
Start:
/\/\_/^\_/^^\_/^^^\_/^^^^\_/^^^^^\_/^^^^^^^\_/F\ Done.
The SYN goes out, the ACK comes back, the session is established, the
request is sent, we wait, then start receiving the reply at increasing
speed, until the final byte and session close.
TCP stacks are tuned to increase bandwidth pretty quickly so that they
work well on a fast LAN. A few bulk TCP sessions over a limited-capacity
link will make it congested if latency is low. If latency is not low,
then peak bandwidth depends on the TCP window sizes and the RTT.
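As a rough illustration of that last point (ordinary bandwidth-delay
arithmetic, not Squid code; the window and RTT numbers are made up):
the throughput ceiling of a single session is the window size divided
by the RTT, so one session with a default-sized window can fill a LAN
path but not a long-latency one.

/* Toy illustration: per-session TCP throughput ceiling = window / RTT.
 * Numbers below are example values, not measurements. */
#include <stdio.h>

static double tcp_ceiling(double window_bytes, double rtt_sec)
{
    return window_bytes / rtt_sec;
}

int main(void)
{
    double window = 16.0 * 1024;   /* a common default window: 16 KB */
    double lan_rtt = 0.002;        /* 2 ms LAN round trip            */
    double sat_rtt = 0.550;        /* ~550 ms satellite round trip   */

    printf("LAN: %.0f KB/s max per session\n",
           tcp_ceiling(window, lan_rtt) / 1024);
    printf("SAT: %.0f KB/s max per session\n",
           tcp_ceiling(window, sat_rtt) / 1024);
    return 0;
}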
During congestion, routers using weighted fair queueing (most of them)
keep track of the sessions passing the link and, when they have to drop
a packet, prefer the sessions that consume the most bandwidth (have the
most packets queued). The effect is that long-lived and streaming
sessions are throttled back, allowing slower sessions to proceed (and
increase their bandwidth). Eventually, on average, all sessions get an
approximately equal share of the capacity.
Well, that's the theory. In practice, no router weights sessions over
their full duration, only by the packets in its queue, which reflect
only a small subset of all active sessions. Therefore unfair backoffs
are not at all rare, and some long-lived TCP session can get several
drops in a row, forcing the TCP stack to back off more than intended or
needed, introducing long delays and even bringing some sessions to a
crawl.
/\/\_/^\_/^^\___/^\_/^^\____/^\___/^\__/^\_/^\_/^^\_/^^^\_/^^^\_/^^^^\_/F\ Done.
In this sample I tried to show the effect of random drops, compared to
the "ideal" session above. Instead of smoothly gaining speed, drops
force TCP to retry and restart slow-start. The initial connect packets,
and a TCP session in slow-start with a small congestion window (cwnd),
have a low probability of experiencing packet drops. As cwnd grows,
this probability grows too.
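A back-of-the-envelope way to see that (my own toy model, assuming
independent per-packet drops with probability p): with W packets in
flight, the chance of at least one drop per round trip is
1 - (1-p)^W, which climbs quickly as the window opens up.

/* Toy model only: probability of >=1 drop per RTT as cwnd grows,
 * assuming an (unrealistically) independent per-packet drop rate p. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double p = 0.01;   /* assumed 1% per-packet drop probability */
    int cwnd;

    for (cwnd = 1; cwnd <= 64; cwnd *= 2)
        printf("cwnd=%2d packets -> P(drop this RTT) = %.2f\n",
               cwnd, 1.0 - pow(1.0 - p, (double)cwnd));
    return 0;
}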
If we now consider a sequential fetch of objects, we could show it
like this:
Y=bytes |
| |
| | |
| | | |
| | | | |
_|_|_|_|_|_|_ Done.
X=time
Pic.1
That's the "ideal" case: smoothly increasing bandwidth until the final byte.
Over a congested link it'll look more like this:
|
| | | | | |
_|_|__|_|___|__|_|_|__|_|__|_|_____|___|_|_|_|_ Done.
Pic.2
The time taken increases; the amount of work stays the same.
Now if we do concurrent fetches, we are looking at several small
parallel transfers (perhaps never even leaving slow-start) occurring at
the same time, adding up into a spike of traffic:
|
|
||
|||
||||
|||||
|||||
__||||||_Done.
Pic.3
In this case the same amount of work is compressed in time, freeing the
rest of the time for others to consume.
-----------------------------
With HTML pipelining, as soon as the given TCP session _gets_ starved,
the fetching of ALL objects is delayed.
HTML pipelining is a technique to avoid many control packets by reusing
an existing TCP connection, reducing the number of open sockets.
The idea is to make data transfers more efficient and thereby "reduce"
the load on congested links. Unfortunately, during heavy congestion this
results in fetch times similar to Pic.2, while the end user, who (let's
face it) doesn't really care about congestion, wishes to see times more
like Pic.3. Besides, those "bad" control packets are so small that they
really don't contribute much to congestion, only to router CPU usage
(which was an issue a long time back, and not so much nowadays).
This technique works well for links with no congestion but with high
latency and serial fetch patterns. It works well on fat SAT links.
Several other techniques (like persistent connections, avoiding
slow-start or starting it aggressively, large TCP windows, etc.) also
eventually work best on uncongested (SAT) links.
A long time back most browsers started to use many concurrent sessions
towards the web server and the proxy. Currently, all Windows-based
browsers open up to 4 concurrent sessions and pipeline requests over
them. This is done to approach the access pattern in Pic.3. It works
well and gives a considerable speedup on average, irrespective of
whether persistent connections or HTML pipelining are supported. The
assumption is that some component objects take longer to complete than
others, and such multi-session pipelining is used to avoid getting
stuck behind some unfortunate object.
It is important to keep in mind that for any web page fetched by the
browser, every single TCP session can follow any of the scenarios
described above. If the link is not congested, then Pic.1 applies; if
the link is congested but only for a short period of time, then maybe
only one TCP session experiences the scenario in Pic.2 while the others
see the Pic.1 case. Together with the browser's pipelining over 4
concurrent TCP sessions, all of these tend to approach the case in Pic.3.
Thus, we could also see a pattern like this:
|
|
|
| |
|| ||
||| ||| |
_||||_|_|||_|__|_|_|_|_ Done.
0...2...4.5.6.....10...12 sec time
combining all three: the first session streaming at full speed until the
end, the next few sessions fetching in parallel, and one unfortunate
session taking a long time. I believe this is the typical scenario we
usually perceive as annoying. When there are lots of component objects,
things just get much worse.
Now, suppose we have defined that an acceptable total web-page fetch
time is 5 seconds. Then everything up to second 5 is OK, and we only
want to deal with the time spent from second 5 onwards. To complete the
page fetch sooner, we need to start fetching components earlier. All of
this is irrelevant if the web page has only 4 components, because then
the browser initiates all the fetches concurrently and we could only win
the little time defined by the RTT between client and proxy. It becomes
relevant when the web page has 30 or more components that all need to
be fetched, the browser can fetch only 4 at a time, and especially when
all 4 sessions get stuck on some slow objects. If there is no congestion
between the proxy and the web server and the proxy can make use of
persistent connections, it may also be irrelevant, because TCP bandwidth
grows fast and all the work gets done quickly enough. That most probably
happens quite often.
It is also irrelevant if the end user itself is behind a slow or
congested link, much slower than the origin server. We simply can't do
much about that.
There remain the cases where the end user is relatively well connected,
the origin server is behind high-latency, congested links, and the web
page has very many components. There, starting the component fetches
without waiting for the browser to ask for them lets us start sooner and
get done sooner.
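To put hypothetical numbers on that (my own arithmetic, not a
measurement): with 30 components, 4 browser sessions and, say, one
second per miss, the browser alone needs about 8 rounds of fetches,
while a proxy that prefetches over 12 parallel sessions gets the same
work done in about 3.

/* Hypothetical arithmetic: total page time when N components are
 * fetched over K concurrent sessions, each miss taking t seconds. */
#include <stdio.h>

static double page_time(int n_components, int concurrency,
                        double per_object_sec)
{
    int rounds = (n_components + concurrency - 1) / concurrency; /* ceil */
    return rounds * per_object_sec;
}

int main(void)
{
    printf("browser alone, 4 sessions : %.1f sec\n", page_time(30, 4, 1.0));
    printf("proxy prefetching over 12 : %.1f sec\n", page_time(30, 12, 1.0));
    return 0;
}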
For the full benefit of prefetching, correctly predicting whether the
end user will actually fetch the components we prefetch becomes crucial.
If we prefetch objects that are never asked for, we simply waste time
and bandwidth. If we prefetch objects in the wrong order, we fail to
help, but we might still make the user happy when he finally fetches
the components we prefetched.
It should be obvious that we are not talking about some fancy method of
prepopulating the cache with objects we hope users will ask for the next
day, but solely about accelerating the total time for a cache miss. And
not simply a single-object cache miss: we set our goal to be servicing
the full page within an acceptable time frame.
To keep a fast client from waiting, given lots of components, we'd
want to (a sketch of the first step follows this list):
 - parse HTML objects and extract a list of the components that will
   likely be needed soon.
 - initiate the peering hierarchy search for them
 - initiate DNS lookups for the URL servers.
 - open up to (client speed / server speed) concurrent connections to
   the source server to fetch components in parallel, or relax this
   based on the goal of completing the page fetch within the defined
   time (5 sec). There is no point in opening 1000 sessions just
   because the client is 1000 times faster than the remote server.
 - keep track of the bandwidth and latency to the origin server
 - keep track of the bandwidth to the client
 - keep track of the number of client sessions
 - keep track of the components of the container page and their states.
 - if we sense a bandwidth reduction to the origin server, perhaps open
   additional sessions to it to compensate with parallelism.
 - extract not only the ancient IMG SRC tags, but also determine the
   components used by Java, as this is becoming ever more popular.
 - possibly, for potential cache hits that need refreshing, initiate
   IMS checks.
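As a sketch of the first item above (purely illustrative; none of these
names exist in Squid, and real HTML parsing is far messier), here is a
trivial scanner that pulls src/href values out of a buffered HTML
object, which the prefetch logic could then feed into DNS lookups, peer
selection and a clamped number of parallel fetches:

/* Illustrative sketch only, not Squid code.  Scans a buffered HTML
 * object for src="..." / href="..." values and hands them to a
 * hypothetical prefetch starter.  Real parsing (single quotes,
 * relative URLs, <base>, upper/mixed case tags) is ignored here. */
#include <stdio.h>
#include <string.h>

static void start_prefetch(const char *url)   /* hypothetical hook */
{
    printf("would prefetch: %s\n", url);
}

static void scan_html(const char *buf)
{
    static const char *attrs[] = { "src=\"", "href=\"" };
    size_t a;

    for (a = 0; a < sizeof(attrs) / sizeof(attrs[0]); a++) {
        const char *p = buf;
        size_t alen = strlen(attrs[a]);
        while ((p = strstr(p, attrs[a])) != NULL) {
            const char *start = p + alen;
            const char *end = strchr(start, '"');
            char url[1024];
            size_t len;
            if (!end)
                break;
            len = (size_t)(end - start);
            if (len < sizeof(url)) {
                memcpy(url, start, len);
                url[len] = '\0';
                start_prefetch(url);
            }
            p = end + 1;
        }
    }
}

int main(void)
{
    scan_html("<html><img src=\"/a.gif\"><img src=\"/b.gif\">"
              "<a href=\"http://www.example.com/page2.html\">next</a></html>");
    return 0;
}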
We would potentially NOT want to actually prefetch when (a rough sketch
of such a gate follows this list):
 - the normal fetch from the origin server proceeds at 20%/sec or better
 - the end user consumes data at an overall speed of 19%/sec or worse
 - all components can be fetched at the consumer's speed within 5 secs
   (meaning, with 4 concurrent sessions towards the given origin server,
   a consuming speed of about 5%/sec/session or better)
 - the page has too few components, so prefetching gives us nothing
   to win.
 - we are near resource exhaustion (open sockets, files, CPU time)
 - the client hasn't started fetching any prefetchable components
   (e.g. image autoloading off, lynx, java override, etc.)
 - we can't reliably predict what to prefetch, or have failed to predict
   correctly in the recent past (correctness is below some threshold %).
 - opening additional sessions to the origin server seems to reduce the
   overall bandwidth to it and increase latency instead of helping.
 - we "think" the client could be another cache server; we should leave
   prefetching to be its problem. Or should we?
 - the client is IMS'ing components
 - the prefetchable objects are "large"
 - the prefetchable components "look like" dynamic content? (not GIFs)
   Although if we can reliably determine that such objects will be
   fetched, delaying their fetch until the client requests them is
   against our goal.
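A hedged sketch of what such a gate might look like (hypothetical
structure, field names and thresholds; nothing of the sort exists in
squid today), just the conditions above turned into an early-exit check:

/* Hypothetical decision gate, mirroring the conditions listed above.
 * All names, fields and thresholds are invented for illustration. */
#include <stdio.h>
#include <stdbool.h>

struct page_state {
    double origin_rate_pct_per_sec;  /* fetch progress from origin, %/sec   */
    double client_rate_pct_per_sec;  /* client consumption speed, %/sec     */
    int    n_components;             /* components found in container page  */
    int    free_fds;                 /* spare sockets/filedescriptors left  */
    double predict_accuracy;         /* recent prediction hit rate, 0..1    */
    bool   client_is_cache;          /* downstream looks like another cache */
    bool   client_fetches_inlines;   /* client actually loads inline objects*/
};

static bool should_prefetch(const struct page_state *ps)
{
    if (ps->origin_rate_pct_per_sec >= 20.0) return false; /* origin fast enough       */
    if (ps->client_rate_pct_per_sec <= 19.0) return false; /* client itself too slow   */
    if (ps->n_components < 5)         return false;        /* too few to matter        */
    if (ps->free_fds < 64)            return false;        /* near resource exhaustion */
    if (!ps->client_fetches_inlines)  return false;        /* lynx, autoload off, ...  */
    if (ps->client_is_cache)          return false;        /* leave it to that cache   */
    if (ps->predict_accuracy < 0.5)   return false;        /* our guesses are too bad  */
    return true;
}

int main(void)
{
    struct page_state ps = { 5.0, 40.0, 30, 500, 0.8, false, true };
    printf("prefetch? %s\n", should_prefetch(&ps) ? "yes" : "no");
    return 0;
}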
Thinking about when we could use prefetching to our benefit, I came up
with the tables below:
Server link quality:
latency speed congested | Optimal strategy
low high N no prepare, no prefetch
low high Y prepare, no prefetch
low low N prepare, no prefetch
low low Y prepare, prefetch some
high high N prepare, no prefetch
high high Y prepare, prefetch
high low N prepare, may prefetch
high low Y prepare, prefetch aggressively
Client quality, Server quality:
speed congested speed congested | Optimal strategy
high * high * no prepare, no prefetch
low * high * no prepare, no prefetch
low * low N prepare, no prefetch
low * low Y prepare, prefetch
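If we wanted to encode the first table directly (again purely
illustrative; the enum names are just my shorthand for its right-hand
column), it collapses into a small lookup keyed on latency, speed and
congestion:

/* Illustrative encoding of the "server link quality" table above. */
#include <stdio.h>

enum strategy { NOTHING, PREPARE, PREPARE_PREFETCH_SOME,
                PREPARE_MAY_PREFETCH, PREPARE_PREFETCH,
                PREPARE_PREFETCH_AGGRESSIVE };

static const char *strategy_name[] = {
    "no prepare, no prefetch", "prepare, no prefetch",
    "prepare, prefetch some", "prepare, may prefetch",
    "prepare, prefetch", "prepare, prefetch aggressively"
};

/* indexed as [latency_high][speed_low][congested] */
static const enum strategy server_strategy[2][2][2] = {
    { /* low latency, high speed */ { NOTHING, PREPARE },
      /* low latency, low speed  */ { PREPARE, PREPARE_PREFETCH_SOME } },
    { /* high latency, high speed */ { PREPARE, PREPARE_PREFETCH },
      /* high latency, low speed  */ { PREPARE_MAY_PREFETCH,
                                       PREPARE_PREFETCH_AGGRESSIVE } },
};

int main(void)
{
    /* example lookup: high latency, low speed, congested */
    printf("%s\n", strategy_name[server_strategy[1][1][1]]);
    return 0;
}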
It would probably be difficult for squid to decide whether a link is
congested; we can only assume so when latency is suddenly much worse
than its average. We'd need to keep additional per-server stats, e.g.
in netdb_state: besides the RTT, also the request servicing latency,
the link speed in bps, and an estimated congestion state. Those would
be interesting stats in their own right, although we could probably get
away without needing all of them.
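By that I mean something along these lines (hypothetical fields, loosely
modelled on what netdb already keeps per host; the EWMA constants and
the 2x threshold are arbitrary):

/* Hypothetical per-origin-server bookkeeping, extending the RTT that
 * netdb already keeps with the extra measures mentioned above. */
#include <stdio.h>

struct server_link_state {
    double rtt_ms;             /* smoothed round-trip time               */
    double service_latency_ms; /* request-to-first-byte, smoothed        */
    double bandwidth_bps;      /* estimated link speed to this server    */
    double base_latency_ms;    /* long-term average, for comparison      */
    int    congested;          /* 1 if latency is well above the average */
};

/* crude congestion guess: current latency much worse than its average */
static void update_latency(struct server_link_state *s, double sample_ms)
{
    s->rtt_ms          = 0.9 * s->rtt_ms + 0.1 * sample_ms;              /* fast EWMA */
    s->base_latency_ms = 0.999 * s->base_latency_ms + 0.001 * sample_ms; /* slow EWMA */
    s->congested       = s->rtt_ms > 2.0 * s->base_latency_ms;
}

int main(void)
{
    struct server_link_state s = { 200.0, 300.0, 64000.0, 200.0, 0 };
    int i;

    for (i = 0; i < 10; i++)
        update_latency(&s, 900.0);   /* a run of much slower samples */
    printf("congested: %d\n", s.congested);
    return 0;
}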
For cases where both the client and the server are slow, prefetching
could still help reduce the total page fetch time:
(^ = fetching, _ = waiting)
squid server-side: ^^^^^\_______/^^^^^\_______/^^^^^\_______
squid client-side: _____/^^^^^^^\_____/^^^^^^^\_____/^^^^^^^\.
squid server-side: ^^^^^\/^^^^^\/^^^^^\___________
squid client-side: _____/^^^^^^^\/^^^^^^^\/^^^^^^^\.
With the server-side fetches overlapped as in the lower pair, we still
have a chance to reduce the waiting time.
To wrap it up, it seems to me that prefetching can be used to reduce
end-user wait time in cases where the origin server is behind
high-latency, congested links. The main goal is to reduce the total
web-page fetch time to some acceptable value by means of many concurrent
parallel fetches. To accomplish that we need to correctly predict what
the end user's browser will fetch next, start more concurrent fetches
than the browser itself starts, and adapt the slow origin server to the
fast client.
With uncongested links to the origin server, pipelined transfers and
persistent connections may give better results than prefetching over
concurrent TCP sessions. It is also hardly useful to prefetch from
low-latency, fast servers, or in cases where the client is substantially
slower than the remote server.
In any case I think it is useful to get prepared for future requests by
making sure DNS is ready, ICP is resolved, and idle sessions to the
source or peers are available.
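The "get prepared" part alone is cheap to sketch (illustrative only; a
real proxy, squid included, would go through its own non-blocking
resolver and connection pool rather than a blocking getaddrinfo(), and
the hostnames here are just examples):

/* Illustrative only: warming up name resolution for hosts we expect
 * the client to ask for next. */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

static void prewarm_dns(const char *host)
{
    struct addrinfo hints, *res = NULL;

    memset(&hints, 0, sizeof(hints));
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, "80", &hints, &res) == 0) {
        /* the answer is now cached by the resolver; an idle connection
         * could also be opened here and parked for the expected request */
        freeaddrinfo(res);
    }
}

int main(void)
{
    const char *hosts[] = { "www.example.com", "images.example.com" };
    int i;

    for (i = 0; i < 2; i++)
        prewarm_dns(hosts[i]);
    return 0;
}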
------------------------------------
Andres Kroonmaa <andre@online.ee>
Network Development Manager
Delfi Online
Tel: 6501 731, Fax: 6501 708
Pärnu mnt. 158, Tallinn,
11317 Estonia