Re: [MERGE] Rework urlAbsolute to be a little more streamlined. from Alex Rousskov on 2008-09-01 (squid-dev)

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 01 Sep 2008 15:50:41 -0600

On Mon, 2008-09-01 at 12:42 +1000, Benno Rice wrote:
> Resubmit this patch, including changes based on comments by various people.
>
> - Mention RFC text in relation to changing the default behaviour in relation
> to unknown HTTP methods.
> - Use safe_free instead of xfree.
> - Rework urlAbsolute to use snprintf in a slightly better way. Snprintf is now
> used to construct the initial portion of the url and the rest is added on
> using POSIX string routines.

> + const char *url, *absUrl;
> +
> + if ((url = rep->header.getStr(hdr)) != NULL) {

The above is better written as

if (const char *url = rep->header.getStr(hdr)) {

It would also be better to name this URL "hdrUrl". You already have
"reqUrl" and "absUrl" and it is difficult to keep track.

> + const char *url, *absUrl;
...
> + if (absUrl != NULL) {
> + url = absUrl;
> + }
> + if (absUrl != NULL) { // if the URL was relative, it is by nature the same host
> + purgeEntriesByUrl(url);
> + } else if (sameUrlHosts(reqUrl, url)) { // prevent purging DoS, per RFC 2616 13.10, second last paragraph
> + purgeEntriesByUrl(url);
> + }
> + if (absUrl != NULL) {
> + safe_free(absUrl);
> + }

The above looks strange but since urlAbsolute is still not documented, I
may be misinterpreting this code. I would expect something like the code
below, which uses cleanup names:

  if (const char *absUrl = urlRelativeToAbsolute(req, hdrUrl)) {
        // hdrUrl was relative so absUrl points to the request host
        purgeEntriesByUrl(absUrl);
        safe_free(absUrl);
  } else
  if (sameUrlHosts(reqUrl, url)) { // prevent purging DoS; RFC 2616 13.10
        purgeEntriesByUrl(url);
  }

If you want to make it even more clear, split urlAbsolute in two:
urlIsRelative and urlMakeAbsolute, arriving at something like

  if (isRelative(hdrUrl)) {
        const char *absUrl = urlMakeAbsolute(req, hdrUrl);
        // hdrUrl was relative so absUrl points to the request host
        purgeEntriesByUrl(absUrl);
        safe_free(absUrl);
  } else
  if (sameUrlHosts(reqUrl, url)) { // prevent purging DoS: RFC 2616 13.10
        purgeEntriesByUrl(url);
  }

The above are polishing comments.

I would still insist that urlAbsolute() is documented as its
complicated, multi-purpose interface is impossible to guess by its name.
I think it returns NULL when the URL we feed is already absolute, but I
am not sure. If my understanding if the intent is correct, then
splitting urlAbsolute in two functions as shown above would be
preferred.

> + if (strchr(relUrl, ':') != NULL) {
> + return (NULL);
> + }

According to RFC 2396, Section 3.3 "Path Component", a URL path may
include a colon (':') character. This would make the above check wrong.
Did I misread the RFC? Are colons banned from URL paths?

I found this call in urlCanonical():

> if (request->protocol == PROTO_URN) {
> snprintf(urlbuf, MAX_URL, "urn:%s", request->urlpath.buf());
> } else {
> /// \todo AYJ: this could use "if..else and method == METHOD_CONNECT" easier.
> switch (request->method.id()) {
>
> case METHOD_CONNECT:
> snprintf(urlbuf, MAX_URL, "%s:%d", request->GetHost(), request->port);
> break;
>
> default:
> portbuf[0] = '\0';
>
> if (request->port != urlDefaultPort(request->protocol))
> snprintf(portbuf, 32, ":%d", request->port);
>
> snprintf(urlbuf, MAX_URL, "%s://%s%s%s%s%s",
> ProtocolStr[request->protocol],
> request->login,
> *request->login ? "@" : null_string,
> request->GetHost(),
> portbuf,
> request->urlpath.buf());
>

this code in urlCanonicalClean():

> if (request->protocol == PROTO_URN) {
> snprintf(buf, MAX_URL, "urn:%s", request->urlpath.buf());
> } else {
> /// \todo AYJ: this could use "if..else and method == METHOD_CONNECT" easier.
> switch (request->method.id()) {
>
> case METHOD_CONNECT:
> snprintf(buf, MAX_URL, "%s:%d",
> request->GetHost(),
> request->port);
> break;
>
> default:
> portbuf[0] = '\0';
>
> if (request->port != urlDefaultPort(request->protocol))
> snprintf(portbuf, 32, ":%d", request->port);
>
> loginbuf[0] = '\0';
>
> if ((int) strlen(request->login) > 0) {
> strcpy(loginbuf, request->login);
>
> if ((t = strchr(loginbuf, ':')))
> *t = '\0';
>
> strcat(loginbuf, "@");
> }
>
> snprintf(buf, MAX_URL, "%s://%s%s%s%s",
> ProtocolStr[request->protocol],
> loginbuf,
> request->GetHost(),
> portbuf,
> request->urlpath.buf());

and now there is this code in urlAbsolute (which handles the CONNECT
case earlier so the code below is guaranteed not to get a CONNECT
method):

> + if (req->protocol == PROTO_URN) {
> + snprintf(urlbuf, MAX_URL, "urn:%s", req->urlpath.buf());
> + } else {
> + if (req->port != urlDefaultPort(req->protocol)) {
> + urllen = snprintf(urlbuf, MAX_URL, "%s://%s%s%s:%d",
> + ProtocolStr[req->protocol],
> + req->login,
> + *req->login ? "@" : null_string,
> + req->GetHost(),
> + req->port
> + );
> + } else {
> + urllen = snprintf(urlbuf, MAX_URL, "%s://%s%s%s",
> + ProtocolStr[req->protocol],
> + req->login,
> + *req->login ? "@" : null_string,
> + req->GetHost()
> + );
> + }

There got to be some way to remove this code duplication (and related
inconsistencies)!

I hesitate voting against because I do not know whether that would
affect future re-submissions. I do not want to block them. Does anybody
know whether a negative vote is sticky and needs to be "cleared" by the
voter?

Thank you,

Alex.
Received on Mon Sep 01 2008 - 21:51:04 MDT

This archive was generated by hypermail 2.2.0 : Tue Sep 02 2008 - 12:00:02 MDT