Re: XML and libsvn_wc performance

From: Daniel Berlin <dberlin_at_dberlin.org>
Date: 2003-11-12 05:43:39 CET

On Tue, 11 Nov 2003, C. Michael Pilato wrote:

> "Max Bowsher" <maxb@ukf.net> writes:
>
> > Branko Čibej wrote:
> > > Jason Voegele wrote:
> > >
> > >> I've been browsing through the issues list looking for things I
> > >> might be able to lend a hand toward, and performance of checkouts
> > >> has piqued my interest.
> > >>
> > >> Issues #1429 and #1490 indicate that XML parsing in libsvn_wc has a
> > >> fairly significant impact on performance. What seems to me an
> > >> obvious (partial) solution has not yet been mentioned, perhaps
> > >> because I am not seeing the whole picture, or the solution I'm
> > >> proposing is too radical a change for this stage of Subversion
> > >> development. What I'm getting at is why use XML for
> > >> the .svn/entries files at all? Why not replace it with a simpler
> > >> file format and a lex/yacc based parser?
> > >>
> > >>
> > > Ouch. Let's stay away from lex or yacc, we have enough dependencies as
> > > it is.
> >
> > Um... svn *already* uses bison or yacc. libsvn_subr/getdate.y
>
> If we're searching for alternative parsers, I'd suggest seeing just
> how bad our own config-file parser is, performance-wise. I can
> imagine an entries file like:

Okay, speaking with billions of years of experience using lex/yacc here,
if you are looking for performance, go elsewhere.

Lex/Yacc are good for things that aren't performance-dependent. But because
they generate table-driven parsers, they are slow. This generally gets
debated to death. On one side are those who say they must be fast because
the table stays in cache, without ever bothering to time a table-driven
parser against a non-table-driven one. On the other side are those who have
actually tried to make a fast lex/yacc parser. :P

This applies only to lex/yacc, though. There are lexer/parser generators
that don't generate table-driven parsers (re2c, for example, on the lexer
side; ANTLR on the lexer/parser side).
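
To make the distinction concrete, here is a minimal sketch of the same tiny
scanner written both ways. The token language (lowercase names and decimal
numbers separated by spaces), the state names, and the functions scan_table
and scan_direct are all made up for illustration; neither is actual lex or
re2c output. It is in Python for brevity, so it shows only the structural
difference and says nothing about relative speed in C.

```python
# Hypothetical token language: NAME = [a-z]+, NUM = [0-9]+, blanks separate.

START, IN_NAME, IN_NUM = 0, 1, 2
STOP = -1  # no transition: the current token ends here

#    letter   digit   blank
TABLE = [
    [IN_NAME, IN_NUM, START],  # START
    [IN_NAME, STOP,   STOP],   # IN_NAME
    [STOP,    IN_NUM, STOP],   # IN_NUM
]
KIND = {IN_NAME: "NAME", IN_NUM: "NUM"}


def char_class(ch):
    """Map a character to a column of TABLE."""
    if "a" <= ch <= "z":
        return 0
    if "0" <= ch <= "9":
        return 1
    if ch == " ":
        return 2
    raise ValueError(f"unexpected character {ch!r}")


def scan_table(text):
    """Table-driven: one generic loop, all behaviour lives in TABLE."""
    tokens = []
    state, begin, i = START, 0, 0
    while i < len(text):
        nxt = TABLE[state][char_class(text[i])]
        if nxt == STOP:
            tokens.append((KIND[state], text[begin:i]))
            state = START  # re-examine this character from START
            continue
        if state == START:
            begin = i  # a token (if any) starts at this character
        state = nxt
        i += 1
    if state != START:
        tokens.append((KIND[state], text[begin:]))
    return tokens


def scan_direct(text):
    """Direct-coded: the state is implicit in which loop is running."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == " ":
            i += 1
        elif "a" <= ch <= "z":
            begin = i
            while i < n and "a" <= text[i] <= "z":
                i += 1
            tokens.append(("NAME", text[begin:i]))
        elif "0" <= ch <= "9":
            begin = i
            while i < n and "0" <= text[i] <= "9":
                i += 1
            tokens.append(("NUM", text[begin:i]))
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens
```

Both functions return the same list of (kind, text) pairs for the same
input. The table-driven one pays for a classification and a table lookup on
every character inside a generic loop, while the direct-coded one is roughly
the shape of code that a generator such as re2c emits.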

If you want the best performance and maintainability, write your own
recursive descent parser.

It's generally complete bullshit that lex/yacc parsers are easier to
maintain. Try to find someone who understands the GCC C yacc parser, for
example. The GCC C++ parser was a yacc parser, but was rewritten because:

1. It wasn't easy to get good diagnostics out of it
2. It wasn't all that fast
3. It was a pain in the ass to maintain
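
As a rough illustration of the hand-written recursive descent style
recommended above, here is a minimal sketch for a made-up, line-oriented
entries format. The grammar, the Parser class, and the SAMPLE data are all
hypothetical; this is not Subversion's actual entries format or its
config-file parser. There is one method per grammar rule, and each error
message can say exactly what was expected and where.

```python
# Hypothetical grammar:
#   file    := section*
#   section := '[' NAME ']' '\n' pair*
#   pair    := KEY '=' VALUE '\n'


class ParseError(Exception):
    pass


class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        """Return the next character, or '' at end of input."""
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, ch):
        """Consume exactly ch, or fail with a precise diagnostic."""
        if self.peek() != ch:
            raise ParseError(f"expected {ch!r} at offset {self.pos}")
        self.pos += 1

    def take_until(self, stops):
        """Consume characters up to (not including) any character in stops."""
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos] not in stops:
            self.pos += 1
        return self.text[start:self.pos]

    def parse_file(self):
        # file := section*
        sections = {}
        while self.peek() == "[":
            name, pairs = self.parse_section()
            sections[name] = pairs
        if self.peek() != "":
            raise ParseError(f"unexpected {self.peek()!r} at offset {self.pos}")
        return sections

    def parse_section(self):
        # section := '[' NAME ']' '\n' pair*
        self.expect("[")
        name = self.take_until("]\n")
        self.expect("]")
        self.expect("\n")
        pairs = {}
        while self.peek() not in ("", "["):
            key, value = self.parse_pair()
            pairs[key] = value
        return name, pairs

    def parse_pair(self):
        # pair := KEY '=' VALUE '\n'
        key = self.take_until("=\n")
        self.expect("=")
        value = self.take_until("\n")
        self.expect("\n")
        return key, value


# Usage: parse a small made-up entries file into nested dicts.
SAMPLE = "[README]\nkind=file\nrevision=7\n[src]\nkind=dir\n"
entries = Parser(SAMPLE).parse_file()
```

A line without an '=' raises a ParseError that names the missing character
and its offset. That kind of targeted diagnostic is straightforward to add
in hand-written code, which bears on the first complaint in the list above.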

If you want good performance and maintainability, use a lexer/parser
generator that generates directly executable code.

If you want maintainability, write a mishmash of a lexer/parser.

If you just need it done quick, use lex/yacc.

Received on Wed Nov 12 05:44:22 2003

This is an archived mail posted to the Subversion Dev mailing list.