Re: [MERGE] c++-refactor HttpHdrCc

From: Kinkie <gkinkie_at_gmail.com>
Date: Tue, 27 Sep 2011 13:23:22 +0200

Hi all,
  here's v5. It implements set/isSet(attribute) and makes the mask
private. I managed to obtain much more context in the diff (but I lost
the merge-bundle in the process).
I've removed the keep-diff-simple bits, this can be merged as-is from
my point of view.

Thanks

On Tue, Sep 27, 2011 at 8:23 AM, Kinkie <gkinkie_at_gmail.com> wrote:
> Hi Alex,
>
> [...]
>
>>>> On the other hand, you now deallocate and allocate a new SquidString for
>>>> every CC directive just to find that directive ID so the total cost may
>>>> be not be lower. I think we can do much better, and even if we cannot or
>>>> will not, I do not think we should commit the "optimization" part of
>>>> this change until proper performance regression testing.
>>>
>>> Point taken. I've implemented a small helper class (basically a const
>>> char*+length) to support the lookup in a very cheap way. I'd like to
>>> get your feedback on it before trying to extend its use (it's
>>> currently an in-cc-file definition).
>>
>> You are on the right track with this class. It can be useful for a lot
>> of things (we use similar classes in other projects). The new class
>> needs some polishing though:
> [...]
>> The class data is copied. Consider:
>>
>> /** A char* plus length combination. Useful for temporary storing
>>  * and quickly looking up strings.
>>  *
>>  * The pointed-to string may not be null-terminated.
>>  *
>>  * Not meant for stand-alone storage. Validity of the
>>  * pointed-to string is responsibility of the caller.
>>  **/
>
> Done with an additional explicit statement that pointed-to data is not copied.
>
>> The "may not be null-terminated" part can be removed, but it would be
>> useful to preserve it as it will allow us to use this class for many
>> other things. Please note that if you do preserve it, you would need to
>> fix comparison functions and add a stream output operator (among other,
>> minor changes).
>
> Preserved.
> The current iteration of the class is intentionally write-only (can be
> only initialized and compared, but not read, and its data members are
> private.
> This is consistent with the function it needs to perform now.
> Should I lift this limitation?
>
>>> +class strblob {
>>
>> The class name must be changed to follow Squid style and, hopefully have
>> more meaning. Consider "StringArea". You may want to move it to
>> src/base/StringArea.h.
>
> Done.
>
>>> +    const char *thePtr;
>>> +    size_t theLen;
>>
>> I would s/thePtr/theStart/ but it is your call.
>
> Done.
>
>>> +    bool operator==(strblob &s) const { return theLen==s.theLen && 0==strncmp(thePtr,s.thePtr,theLen); }
>>
>> The parameter "s" should be declared const.
>
> Done.
>
>> Value result specific before the computation putting avoid please:
>>    strcmp() == 0.
>>
>> Please add != operator, implemented using the == operator.
>
> Done.
>
>>> +    bool operator< ( const strblob &s2) const { return strncmp(thePtr,s2.thePtr,theLen) < 0; }
>>
>> This will crash if s2.theLen is smaller than this->theLen. Needs more
>> len-specific checks. Note that for your current purposes, you can
>> declare shorter strings smaller even if they are not so in the dictionary.
>
> Done.
>
>> This is one reason why such comparison may need to be moved outside the
>> object. Another reason is that we will probably want case-insensitive
>> comparison in other contexts.
>
> Can't be done without making the pointed-to string readable. Will do
> if you confirm that you think that's the best course forward.
>
>> Rename "s2" to "s" or similar.
>
> Done.
>
>>> +        const HttpHeaderCcFields *f=&CcAttrs[i];
>>
>> "f" can be a [const] reference to avoid taking the address and then
>> dereferencing it all the time:
>>
>>    const HttpHeaderCcFields &f = CcAttrs[i];
>
> Done.
>
>>> +        const strblob tmpstr(item,nlen);
>>> +        const CcNameToIdMap_t::iterator i=CcNameToIdMap.find(tmpstr);
>>
>> Consider s/tmpstr/lookup/ or some such. You can avoid this named object
>> altogether if you just create strblob when calling find(). The iterator
>> should be constant, I think. Here is a sketch:
>>
>>    const CcNameToIdMap_t::const_iterator match =
>>        CcNameToIdMap.find(strblob(item, nlen));
>
> std::map.find takes a const reference argument. With an unnamed
> temporary it wouldn't compile.
>
>>> +                cc->setMaxAge(MAX_AGE_UNSET);
>>
>> So it is set or unset? If you do not want to add an explicit clear
>> MaxAge() method, how about s/MAX_AGE_UNSET/MAX_AGE_UNKNOWN/?
>
> Sure.
>
>> Similar for S_MAXAGE_UNSET, MIN_FRESH_UNSET, MAX_STALE_ALWAYS, etc. (see
>> the blob below for the full list).
>>
>>
>>> +     static const int32_t MAX_AGE_UNSET=-1; //max-age is unset
>>> +     static const int32_t S_MAXAGE_UNSET=-1; //s-maxage is unset
>>> +     static const int32_t MAX_STALE_UNSET=-1; //max-stale is unset
>>> +     static const int32_t MAX_STALE_ALWAYS=-2; //max-stale is set to no value
>>> +     static const int32_t STALE_IF_ERROR_UNSET=-1; //stale_if_error is unset
>>> +     static const int32_t MIN_FRESH_UNSET=-1; //min_fresh is unset
>>
>> The old types were "int". You are making them more specific now with
>> "int32_t". I am not sure such a change is withing this patch scope, but
>> even if it is, perhaps you should add a HttpCcDirective typedef or
>> similar to avoid spreading int32_t throughout the code.
>
> I'm simply refactoring what's in there. They used to be initialized to
> -1, as a duplicate validity check to mask (see below for more on
> mask). even worse, they are (in trunk) tested inconsistently:
> sometimes with EBIT_TEST on mask, sometimes with checks on >=0 on the
> value.
> trunk's HttpHdrCc::parse has the clearest picture on the current conventions:
> for bool (present/not present) header values, EBIT_SET(mask) is used
> for integer header values, EBIT_SET(mask) + assigning value is used.
> In case of parse error on the value, both are cleared
> for optional-integer header values, EBIT_SET(mask) + assigning value
> is used. The special "-1" value means both "not set" and "set but not
> specified", to tell the difference the mask is used.
>
> What I've done is to define explicit constants to tell the three cases
> apart without needing to look up the map, and used getters/setters for
> those headers in order to enfoce consistency. Other headers are still
> set by looking at mask (although I'm inclined to define a generic
> get/set combo for bool values)
>
>> Can the above constants be converted into an enum?
>
> I don't think they should. But the code allows for that now.
>
>>> +        EBIT_CLR(mask, CC_MIN_FRESH);
>>> +        this->min_fresh=STALE_IF_ERROR_UNSET;
>>
>> Consistency typo.
>
> Thanks
>
>>> +public:
>> ...
>>> +    int32_t mask;
>>> +private:
>>> +    int32_t max_age;
>>> +    int32_t s_maxage;
>> ...
>>
>> Seems wrong for the mask to remain public if we hid most of the values.
>
> Discussed above.
>
>>
>>> +    int32_t mask;
>>> +private:
>>> +    int32_t max_age;
>>> +    int32_t s_maxage;
>>> +    int32_t max_stale;
>>> +    int32_t stale_if_error;
>>> +    int32_t min_fresh;
>>> +public:
>>> +    String other;
>>
>> It would be nice to document at least the mask and Other fields.
>
> Done.
>
>>> +    bool gotList;
>>
>> Please declare when first use and add const if possible.
>
> Done.
>
>>> -## acl needs wordlist. wordlist needs MemBug
>>> +## acl needs wordlist. wordlist needs MemBuf
>>
>> An unrelated change?
>
> Typo-fix. Reverted.
>
>>> -        if (cc && cc->min_fresh > 0) {
>>> +        const int32_t minFresh=cc->getMinFresh();
>>> +        if (cc && minFresh!=HttpHdrCc::MIN_FRESH_UNSET) {
>>
>> This change smells like a nil pointer dereference to me, but perhaps the
>> missing context makes it OK?
>
> No, it does not. Fixed.
>
>>> +        if (cc && minFresh!=HttpHdrCc::MIN_FRESH_UNSET) {
>> ...
>>> +             entry->mem_obj->getReply()->cache_control->getStaleIfError() != HttpHdrCc::STALE_IF_ERROR_UNSET &&
>> ...
>>> +            if (cc->getMaxAge() != HttpHdrCc::MAX_AGE_UNSET) {
>>
>> These and other similar comparisons now look even worse than they were.
>> How about adding HttpHdrCc::hasMaxAge() and such?
>
> Could be done, maybe reintroducing the mask tests?
>
> API would then tentatively be
> class HttpHdrCc {
>  //generic getters/setters, for bool header values
>  set(http_hdr_enum);
>  clear(http_hdr_enum);
>  bool get ((http_hdr_enum);
>
>  //specific getters/setters, for integer values. Some special values
> need to remain though.
>  getMinFresh()
>  setMinFresh()
>  clearMinFresh()
>
>  String other;
> }
>
>>>> And if you want to mix optimization with cleanup, here is one place
>>>> where you can save quite a few cycles for many messages by not ignoring
>>>> the return value of getList():
>>>>
>>>>>      getList(HDR_CACHE_CONTROL, &s);
>>>>>
>>>>> -    cc = httpHdrCcParseCreate(&s);
>>>>> +    cc=new HttpHdrCc();
>>>>> +    if (!cc->parseInit(s)) {
>>>>> +        delete cc;
>>>>> +        cc = NULL;
>>>>> +    }
>>
>>> Done.
>>
>>> +    bool gotList;
>>>
>>>      if (!CBIT_TEST(mask, HDR_CACHE_CONTROL))
>>>          return NULL;
>>>      PROF_start(HttpHeader_getCc);
>>>
>>> -    getList(HDR_CACHE_CONTROL, &s);
>>> -
>>> -    cc = httpHdrCcParseCreate(&s);
>>> +    gotList=getList(HDR_CACHE_CONTROL, &s);
>>> +
>>> +    cc=new HttpHdrCc();
>>> +    if (!gotList)
>>> +        return cc;
>>> +
>>> +    if (!cc->parse(s)) {
>>> +        delete cc;
>>> +        cc = NULL;
>>> +    }
>>
>>
>> Hmm.. I did not see that we already have a HDR_CACHE_CONTROL test above
>> that code. The new code now optimizes a rare case of an empty but
>> present Cache-Control header. This can only lead to bugs, with no
>> positive performance changes. We should remove this exception. Sorry.
>>
>> (One more reason to provide more than 3 lines of context during review!)
>
> Unfortunately "bzr send" doesn't allow to specify it, that I know of :(
> I've checked web and documentation, but I can't find of any
> alternative way to sanely produce the patch.
>
>>>>> -HttpHeaderFieldInfo *CcFieldsInfo = NULL;
>>>>> +/// Map an header name to its type, to expedite parsing
>>>>> +typedef std::map<String,http_hdr_cc_type> HdrCcNameToIdMap_t;
>>>>> +static HdrCcNameToIdMap_t HdrCcNameToIdMap;
>>>>
>>>> HdrCcNameToIdMap should be constant so that we do not accidentally
>>>> modify it. You can make it a pointer and use similar-to-current
>>>> initialization code or try to initialize statically.
>>>
>>> Are you sure we run such a risk? I've now changed it so that it is a
>>> static data member with const key. Do you think it could be enough?
>>
>> I was not worried about somebody modifying the key specifically. I was
>> more worried about somebody modifying the map (adding, removing, or
>> changing its elements). Thus, a constant key does not help much. This is
>> not a big deal though.
>
> Ok
>
>>>>> +class HttpHdrCc
>>>>> +{
>>>>> +
>>>>> +public:
>>>>> +    int32_t mask;
>>>>> +    int32_t max_age;
>>>>> +    int32_t s_maxage;
>>>>> +    int32_t max_stale;
>>>>> +    int32_t stale_if_error;
>>>>> +    int32_t min_fresh;
>>>>> +    String other;
>>>>> +
>>>>> +    HttpHdrCc(int32_t max_age_=-1, int32_t s_maxage_=-1,
>>>>
>>>> Please move public data member after public methods. It would be nice to
>>>> document them too, especially those that do not correspond to CC
>>>> directives. Also, check whether we really need all of the above,
>>>> especially mask, as public fields (more on that later below).
>>>
>>> Done. Documented "mask" and "other", the others should be straightforward.
>>> public mask could be avoided if we do full getters/setters, but doing
>>> that elegantly would be scope-balooning.
>>
>> Not done? And you did "balloon the scope" by adding full
>> getters/setters, removing your excuse to leave the mask exposed, right?
>
> Mask is widely used for all non-integer header values.
> See proposed API above.
>
>>>>> +    HttpHdrCc(int32_t max_age_=-1, int32_t s_maxage_=-1,
>>>>> +            int32_t max_stale_=-1, int32_t min_fresh_=-1) :
>>>>
>>>> Do we really all five of these constructors, including the one that
>>>> allows for implicit conversion from int32_t to HttpHdrCc? I do not think
>>>> so. Please leave just what we need and use "explicit" to prevent
>>>> implicit conversions as needed.
>>>
>>> Overengineering. Changed to only the default contstructor (we need it
>>> to set various fields to "-1" by default).
>>> Ok for explicit.
>>
>> An explicit keyword does not make sense for a default constructor, does
>> it? Not sure why it compiles for you, but I would recommend removing
>> that keyword because your current code does not allow implicit type
>> conversions.
>
> Not allowing implicit conversions was the objective, and it compiles.
> But I'm removing it anyways.
>
>>>>> +    /** reset the structure to a clear state.
>>>>> +     *
>>>>> +     */
>>>>> +    void clear();
>>>>
>>>> The "clear state" is an undefined concept. Consider a more specific
>>>> description such as:
>>>> /// reset to the after-default-construction state
>>>>
>>>> (and there is probably a more elegant way of saying that).
>>>
>>> How about "reset all data members to default state"? Otherwise I'll
>>> use your suggestion.
>>
>> I can leave with either one, but "reset to default state" is probably best.
>>
>> BTW, the latest patch lost documentation for this (and some other?) members.
>
> Re-added.
> Attached is iteration 4.
> Branch tree is at lp:~kinkie/squid/playground
>
> Thanks,
>
> --
>     /kinkie
>

-- 
    /kinkie

Received on Tue Sep 27 2011 - 11:23:30 MDT

This archive was generated by hypermail 2.2.0 : Tue Sep 27 2011 - 12:00:03 MDT