On 27/09/2013 5:35 a.m., Alex Rousskov wrote:
> On 09/26/2013 10:02 AM, Amos Jeffries wrote:
>
>> Last I saw on the new strict configuration issues was that Alex was
>> requesting final resolution of the squid.conf syntax for regex pattern
>> tokens in strict parse mode before it goes to 3.4.
> I did not request that the RE resolution is made before the code goes
> into v3.4. IIRC, the committed changes essentially disable RE support in
> strict mode. That should not introduce backward compatibility problems
> AFAICT.
>
> What I did consider important is that the "foo=bar" decision is made
> before the committed changes go into v3.4. Here is a quote from my
> 2013/08/28 email (I assume that is the email you refer to above):
>
>>> As I wrote earlier, the 'foo="bar and baz"' issue worries me, but I
>>> think we can discuss that after your commit. The important part is for
>>> Amos not to pull your changes into v3.4 until that discussion is over.
>
> The reason I wanted us to reach a decision is to avoid telling v3.4
> users that the syntax has changed yet again. However, very few users are
> going to start using the bad
>
> "foo=bar and baz"
>
> instead of the natural
>
> foo="bar and baz"
>
> syntax so we can probably go ahead with the pull if needed.
Aha. Okay, I misremembered. That is a better state than I was thinking.
Yes, I agree that very few (or none) are going to use the "foo= bar"
style in what will hopefully be another shortish lifecycle.
At this point I am very much in favour of keeping the foo= syntax on
grounds of it being so familiar and well published that removing it will
be a major amount of pain to a lot of people.
>
>> That objection is
>> essentially blocking 3.4.0.2 release which requires several of the other
>> fixes in the patch.
> There are other reasons to worry about that parsing change, but I do not
> think RE support is holding us hostage here. I hope the above clarifies.
> As for "other reasons", see below and my next email.
>
>
>> Christos, Do you have any unseen progress on that last remaining piece
>> of the new parser?
> Christos has made a lot of parsing improvements since the last commit.
> However, I think we need to re-evaluate our overall approach to this
> problem. I told Christos as much yesterday, but he did not have a chance
> to respond yet. I will forward my email to him here although it is a bit
> rough. Christos, if you are reading this, please feel free to comment
> here instead of responding to my private email.
>
>
>> IMO;
>> I have kind of been favouring regex( some (pattern) ) since we have
>> now added function(...) style to squid.conf for parameters(things). Note
>> that brackets can be easily counted to skip the patterns internal ( and
>> ) groupings and \( \) literals, leaving an easily identifiable
>> terminator character for regex(...)some_garbage_token .
> I share the "regex" prefix direction (see my next email) but since
> parenthesis are used extensively in REs, I do not think they are a good
> default. It would be very difficult for admins to understand correctly
> which parenthesis they need to escape inside the RE and how.
As I said, the bracket counting is very easy to do. We already have the
guarantee from regex syntax itself that all non-escaped ( and ) are
going to be paired. We just need to absorb the initial '(' from "regex(",
then do scopes++ for each ( and scopes-- for each ) until we hit a )
with scopes=0. The middle bit is guaranteed to be the pattern string, so
drop the trailing ')' and return from the regex tokenize step.
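
For illustration, the counting step described above can be sketched
roughly as follows. This is Python rather than Squid's C++, and the
function name read_regex_token is made up for the example; it is not
code from the patch. It also assumes that a backslash always escapes the
next character, and it ignores ( and ) inside [...] bracket expressions,
which a real tokenizer would have to think about.

```python
def read_regex_token(text, start=0):
    """Hypothetical helper: return (pattern, end_index) for a regex(...) token.

    end_index is the position just past the terminating ')'.
    """
    prefix = "regex("
    if not text.startswith(prefix, start):
        raise ValueError("expected 'regex(' at position %d" % start)

    i = start + len(prefix)   # the initial '(' is absorbed with the prefix
    pattern_start = i
    scopes = 0                # depth of unescaped '(' inside the pattern

    while i < len(text):
        c = text[i]
        if c == "\\":
            i += 2            # skip the escaped character, e.g. \( or \)
            continue
        if c == "(":
            scopes += 1
        elif c == ")":
            if scopes == 0:
                # Terminator found: drop the trailing ')' from the pattern.
                return text[pattern_start:i], i + 1
            scopes -= 1
        i += 1

    raise ValueError("unterminated regex( token")


pattern, end = read_regex_token("regex(^http://foo/(bar|baz)$)some_garbage_token")
print(pattern)
```

Note that the URL slashes in the sample pattern need no squid.conf-level
escaping; only the parentheses are counted.
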
>
>> I am objecting to the suggested use of // on grounds that it is too
>> easily confused with perl regex s/pattern/g syntax and if admin start
>> entering patterns from that regex language syntax into squid.conf
>> GNU-regex parser undefined problems will arise in ways hard to debug.
>> However we can always add preg(/pattern/) in future when we add support
>> for that expression type.
>>
>> What say you?
> AFAIK, the /re/ syntax is used by sed, PHP, Javascript, Ruby, and
> probably many other tools and languages. It is not specific to Perl and
> predates Perl. The /re/ syntax tells admins that they are looking at
> some RE, not that they are looking at a Perl (or any other specific
> flavor of) RE.
The only usage I've seen of it is in Perl regex and in tools like the ones
you mention above which share that syntax. Whichever came first does
not much matter; Perl is the donkey that carried that regex syntax into
my life, and into the lives of a great many other admins as well.
The main point is that Squid's GNU-based pattern syntax is notably
different in a number of edge cases (omissions mainly), which will trip
people up if they confuse the two. Let's not invite that confusion in the
new-and-improved parser.
> The only real problem with /re/ syntax as the default is that it does
> not work well with URLs, which are very common in Squid patterns. That
> is why I think a string-based "re" may be a better default for Squid.
Which means it makes escaping mandatory in one form or another.
Which is giant leap #1 down the slippery slope towards
"/http:\\\/\\\/foo\\/i broke
it\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\/?how/"
With a string-based or any other delimiter (including '/') we cannot
tell the pattern apart from the delimiter without escaping the
delimiter inside the pattern, and then any escape characters in the
pattern as well. Given your code expertise you have possibly read the
same or a similar language design document to the one I did about this
problem.
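
A small sketch of what that means in practice, in Python for brevity.
The quoting scheme here (backslash-escape the backslashes first, then
the delimiter) is just one assumed possibility, and the function name
quote_for_delimiter is invented for the example; nothing like it is
proposed in the patch.

```python
def quote_for_delimiter(pattern, delim):
    """Hypothetical scheme: wrap pattern in delim, escaping as needed.

    Backslashes are doubled first so that the escapes added for the
    delimiter stay distinguishable from the pattern's own escapes.
    """
    escaped = pattern.replace("\\", "\\\\").replace(delim, "\\" + delim)
    return delim + escaped + delim


# A plain URL pattern picks up one escape per slash.
print(quote_for_delimiter("^http://foo/", "/"))

# A pattern that already contains a regex escape grows in both ways.
print(quote_for_delimiter(r"^http://foo\.example/", "/"))
```

Every layer of quoting added on top of this repeats the doubling, which
is how the backslash runs get out of hand.
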
Using () brackets or [] brackets we get that nice pairing guarantee
from regex (in all the flavours I'm aware of) and can apply the above
mentioned algorithm without any escaping necessary at the squid.conf
level. Regex may require escaping of some ( and ) itself but that is
more easily done without any squid escapes getting in the way.
> However, it is not urgent to decide this now if my understanding about
> RE support in the committed code is correct. The concerns I will
> highlight in my next email are far more important because they affect
> strict syntax adoption and a lot of code (so if you pull the committed
> changes into v3.4 now, we may end up with three rather different code
> bases to work with: old, v3.4, and trunk).
Okay. Will wait for that before making a decision.
Amos
Received on Thu Sep 26 2013 - 18:13:30 MDT