Re: [PROPOSAL] Drop XML from .svn/entries

From: Peter N. Lundblad <peter_at_famlundblad.se>
Date: 2006-04-24 13:09:09 CEST

brane@xbc.nu writes:
> Arlie Davis wrote:
> > I think this entire proposal to move away from XML and to an ad-hoc format
> > is a very bad one.
> >
> > I've seen many projects use XML for large datasets, and do so very
> > efficiently. Why not try to improve the existing implementation, rather
> > than inventing a new format and going through all the unnecessary pain?
> >
> I certainly agree in essence, and prophesy lots of horrible problems
> because of this change, but ...

Would you care to elaborate on the "lots of horrible problems" you
foresee? Or are you just going to wait until one of them arrives and
then tell me "I knew that would happen"? Please?

<irony>
Maybe these are the same kind of problems we've had with the custom
formats of fsfs - or the svndiff file format - or maybe the hashdump
format? Or, perhaps, with the svnserve protocol...
</irony>

> > The key to XML (or anything) is usually understanding the data flow of the
> > most important performance-sensitive operations, and then structuring your
> > implementation around it. If you are concerned about entries files that are
> > large enough that using an XML DOM is painful, then use one of the streaming
> > XML implementations, which give you a great deal of control over how the
> > data is read/written, and when.
> >
> We already use a SAX parser.

To be fair, part of the performance problem in parsing the XML was
due to us putting small lists of attributes in hash tables, just to
grab them out again. A more optimized, and more complex,
implementation could have cut that down somewhat, but we would still
have had enough overhead to justify a format change.
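
To make the hash-table point concrete, here is a minimal sketch in
Python, not Subversion's actual C code. It uses expat's flat
name/value attribute list and compares copying the attributes into a
per-element table with reading them directly. The element name, the
attribute names, and both function names are assumptions made up for
this illustration.

```python
import xml.parsers.expat

# Hypothetical sample; these attribute names are illustrative only and
# are not claimed to match the real .svn/entries schema.
SAMPLE = (
    '<wc-entries>'
    '<entry name="foo.c" kind="file" revision="42"/>'
    '<entry name="bar" kind="dir" revision="40"/>'
    '</wc-entries>'
)


def _run(xml_text, start_handler):
    """Parse xml_text with expat, reporting attributes as a flat list."""
    parser = xml.parsers.expat.ParserCreate()
    # Attributes arrive as [name0, value0, name1, value1, ...]
    parser.ordered_attributes = True
    parser.StartElementHandler = start_handler
    parser.Parse(xml_text, True)


def parse_via_hash(xml_text):
    """Copy every attribute into a table, then grab a few back out."""
    entries = []

    def start(tag, attrs):
        if tag != "entry":
            return
        table = {}
        for i in range(0, len(attrs), 2):
            table[attrs[i]] = attrs[i + 1]
        entries.append(
            (table.get("name"), table.get("kind"),
             int(table.get("revision", "0")))
        )

    _run(xml_text, start)
    return entries


def parse_direct(xml_text):
    """Scan the flat attribute list once, with no intermediate table."""
    entries = []

    def start(tag, attrs):
        if tag != "entry":
            return
        name = kind = None
        revision = 0
        for i in range(0, len(attrs), 2):
            key, value = attrs[i], attrs[i + 1]
            if key == "name":
                name = value
            elif key == "kind":
                kind = value
            elif key == "revision":
                revision = int(value)
        entries.append((name, kind, revision))

    _run(xml_text, start)
    return entries
```

Both functions return the same list of tuples; the second one simply
skips one table allocation and fill per element. Whether that saving
matters in practice depends on the parser and the allocator, so treat
this as an illustration of the shape of the overhead rather than a
measurement.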

> > Abandoning XML without truly compelling reasons is a very bad idea. And if
> > performance is so critical, then replace the XML entries file with a
> > Berkeley DB file (btree), rather than building yet another hacked-up text
> > format. At least with BDB, there's already a solid implementation.
> >
> I think we'd have already done that if BDB were reliable in combination
> with remote filesystems (e.g., NFS-mounted home directories). AFAIK
> that's not even true when using a DB_PRIVATE environment.
>

I agree that plain text files don't really cut it when it comes to
large sets of data, but then XML doesn't either, so that part is
irrelevant to this discussion.

Anyway, the change is merged to trunk now. It is still possible to
revert it, of course - that would just cause problems for those who
follow trunk, who would have to either manually downgrade their WCs or
ditch them. However, most people seem to agree that this was a good
change, and those who don't haven't provided any convincing
arguments IMO. All I've heard is talk about "unquantifiable values of
using a standard format" (and the like) and some horrible problems in
the future. I'm definitely ready to reconsider this change, but
please be more specific when you argue against it.

Thanks,
//Peter

Received on Mon Apr 24 13:09:48 2006

This is an archived mail posted to the Subversion Dev mailing list.