RE: Re: svn commit: rev 163 - trunk/subversion/tests/clients/cmdline/svntest

From: Bill Tutt <billtut_at_microsoft.com>
Date: 2001-09-27 20:13:20 CEST

This is more of an informational post than a "there's a problem
here" post, btw...

I don't know how many of you have read O'Reilly's Mastering Regular
Expressions book, but it really helps explain how to go about writing
performant regular expressions under the traditional NFA implementation
(Python, Perl, etc.).

Anyway...
> > > + rm = re.compile ('^(..)(.)(.+)(\d+)\s+(.+)')

This will match lines that have a '*' inside the first (.+)
group.
Oddly enough, the above regular expression will also match something
like this:
AzQ * 012345 012345 012345 96789 ASDF

The first (.+) will match:
" * 012345 012345 012345 9678"

and \d+ will match only the final 9 of 96789.

This shows how the matching process works. The greedy (.+) first
consumes everything to the end of the string. The engine then
backtracks, giving characters back one at a time from the right, until
it finds a point where the rest of the regular expression matches.

A faster regular expression would be:
"^(..)(.)([^\d]+)(\d+)\s+(.+)"
In this case ([^\d]+) stops at the first digit, so (\d+) matches
012345 directly and the engine doesn't have to backtrack.
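
A quick Python comparison of what each pattern captures on the same example line (a sketch; the ORIGINAL and FASTER names are just labels chosen here):

```python
import re

ORIGINAL = re.compile(r'^(..)(.)(.+)(\d+)\s+(.+)')
FASTER = re.compile(r'^(..)(.)([^\d]+)(\d+)\s+(.+)')
LINE = 'AzQ * 012345 012345 012345 96789 ASDF'

for name, pattern in (('original', ORIGINAL), ('faster', FASTER)):
    m = pattern.match(LINE)
    # original -> ('Az', 'Q', ' * 012345 012345 012345 9678', '9', 'ASDF')
    # faster   -> ('Az', 'Q', ' * ', '012345', '012345 012345 96789 ASDF')
    print(f'{name:8}', m.groups())
```

The two patterns are not interchangeable: group 3 can no longer contain digits, and the trailing (.+) now absorbs the remaining fields, so it is worth checking that the captures still mean what the calling code expects.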

Moral of the story:
Always see if you can avoid using .* or .+ in the middle of a regular
expression pattern. It forces the engine to backtrack, which degrades
the matching performance of your regular expression.
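
The cost is easiest to see on a line that cannot match, where a backtracking engine has to exhaust every option before giving up. The sketch below uses a made-up worst-case line (a run of digits with no whitespace after it) and times both patterns with Python's timeit. With n trailing digits, the original pattern should do work growing roughly with n squared, since every character the greedy (.+) gives back makes (\d+) retry, while the [^\d]+ version gives up after roughly n steps. Absolute timings will vary by machine and Python version.

```python
import re
import timeit

ORIGINAL = re.compile(r'^(..)(.)(.+)(\d+)\s+(.+)')
FASTER = re.compile(r'^(..)(.)([^\d]+)(\d+)\s+(.+)')

# Hypothetical worst case: digits with no whitespace after them, so
# neither pattern can match and the engine must try every split point.
BAD_LINE = 'AzQ ' + '1' * 2000

for name, pattern in (('original', ORIGINAL), ('faster', FASTER)):
    assert pattern.match(BAD_LINE) is None  # both patterns must fail here
    # timeit runs immediately, so the lambda sees the current `pattern`.
    seconds = timeit.timeit(lambda: pattern.match(BAD_LINE), number=5)
    print(f'{name:8} {seconds:.4f}s for 5 failed matches')
```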

Secondary moral of the story:
You're usually better off matching exactly what you're trying to match,
and no more.
Anything else can get you into trouble.

FYI,
Bill

Received on Sat Oct 21 14:36:42 2006

This is an archived mail posted to the Subversion Dev mailing list.