
Re: [TSVN] regex help

From: Toby Johnson <toby_at_etjohnson.us>
Date: 2005-02-17 23:06:20 CET

Eric J. Smith wrote:

>>Capturing is not the problem here! It's the returned groups! Since
>>groups are numbered from 0 to X, with 0 being the whole match, 1
>>being the first group, and 2 being the second, this will only return the
>>first bug ID in group 1.
>>
>>
>
>Hrmm... when I run the regex with your tester, it seems to report only the
>very last issue number from each match. Is that what it's doing for you as
>well? It sounds like you're saying that it's only returning the very
>first issue number for each match.
>
>Are you sure that you are somehow not enumerating through all of the groups
>properly or something? I wish I was a C++ guy and I could test it on my end
>for you, but I am not unfortunately.
>
>
The problem is with trying to match a varying amount of stuff between
"issue" and the issue number. You, Stefan, and I have now all tried the
same thing: match the word "Issue", followed perhaps by some junk (spacing,
pound signs, or other issue numbers), and then return the issue number
at the end. The problem is that neither GRETA, nor Perl, nor most other
regex engines will hand back every issue number that way when the group is
repeated with either a greedy or non-greedy "+" or "*".
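
A quick way to see this is the sketch below. It uses Python's re module
purely as a stand-in (I have not run this against GRETA), and the log
message and pattern are made up for illustration. The group is repeated
three times, but only its last capture is kept, which is what Eric saw in
the tester:

```python
import re

# Hypothetical log message and pattern, for illustration only.
log = "Fixed issues #12, #34 and #56 in the parser"

# "issue(s)", then one or more "#<number>" items separated by
# commas, "and", or whitespace. The capturing group is repeated.
pattern = re.compile(r"[Ii]ssues?:?(?:\s*(?:,|and)?\s*#(\d+))+")

m = pattern.search(log)
whole_match = m.group(0)  # the entire "issues ..." expression
last_only = m.group(1)    # only the final repetition of the group survives

print(whole_match)
print(last_only)
```

The whole expression is matched, so nothing is wrong with the pattern as
such; the earlier numbers are simply overwritten as the group repeats.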

The problem isn't whether a group can return more than one result over
the course of a global search; it can. If we just use the regex (\d+)+,
a global search will match every number in turn and return each one
correctly. But that's too much; we don't want to match every number in
the log message, just the ones that look like they're actually referring
to an issue number.
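
To illustrate the "too much" part, here is a small sketch, again in Python
as a stand-in and with an invented log message. A global search for bare
digits picks up the issue number along with everything else:

```python
import re

# Hypothetical log message; only "12" is meant as an issue number.
log = "Issue #12: bumped timeout from 30 to 60 seconds in r4711"

# A global search for digits alone has no idea which numbers are issues.
all_numbers = re.findall(r"\d+", log)

print(all_numbers)
```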

To explain it another way, the reason the first approach doesn't work is
that any given part of the string being searched can be matched only
once. We are attempting to match the word "Issue", followed by some stuff
we don't care about, followed by some stuff we do care about, multiple
times. Regex parsers don't work that way; they find the first (leftmost)
portion of the string that matches, they return that result, and if you
specified GLOBAL, they carry on from where that match ended. You can't go
back to the same part of the string twice. That's why two passes are
necessary: one pass to pull out the interesting parts, and another pass
to pull out only the numbers from those parts.
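
A minimal sketch of the two-pass idea follows, in Python rather than
C++/GRETA. The two patterns, the function name extract_issue_ids, and the
log message are all assumptions made up for this example, not anything
TortoiseSVN ships:

```python
import re

# Pass 1 pattern: a whole "issue(s) ..." expression, numbers included.
# Pass 2 pattern: the bare numbers inside one such expression.
# Both are illustrative assumptions.
ISSUE_EXPR = re.compile(r"[Ii]ssues?:?(?:\s*(?:,|and)?\s*#?\d+)+")
BARE_NUMBER = re.compile(r"\d+")


def extract_issue_ids(log_message):
    """Return the issue numbers mentioned in issue expressions, in order."""
    ids = []
    # First pass: find each stretch of text that talks about issues.
    for expr in ISSUE_EXPR.finditer(log_message):
        # Second pass: pull every number out of that stretch only.
        ids.extend(BARE_NUMBER.findall(expr.group(0)))
    return ids


log = "Fixed issues #12, #34 and #56; raised timeout from 30 to 60 (see issue 78)"
issue_ids = extract_issue_ids(log)
print(issue_ids)
```

The second pass never sees "30" or "60" because they sit outside anything
the first pass matched, so no part of the string has to be matched twice
by the same regex.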

Received on Thu Feb 17 23:08:43 2005

This is an archived mail posted to the TortoiseSVN Dev mailing list.
