
[PROPOSAL] Drop XML from .svn/entries

From: Peter N. Lundblad <peter_at_famlundblad.se>
Date: 2006-04-07 00:00:21 CEST

Hi,

"Just For Fun", I experimented with replacing our current XML-based
.svn/entries syntax with a line-based, |-separated syntax. It turns
out that XML does carry some overhead, though not an extreme
amount.

To get some idea how much overhead there actually is, I tested
something that might be considered as a "typical work cycle". I used
the following one-liner:

time (svn st; (sync&); sleep 10; svn up; (sync&); sleep 10; svn st; \
(sync&); sleep 10; svn ci)

The point is to do some common operations in a row, but get the disk
flushed in between, since you typically will not do the operations
this quickly (you look at the output, do some editing, etc). Then I
subtracted the times for the sleeps and compared the wallclock times
for these operations. (As usual, I tried with the GCC tree on my
Celeron 1.7 GHz running Linux).

The result is that this sequence of operations requires 19% less
wallclock time and 35% less CPU time with the new format. Note that
I'm not an expert in performance measuring, so feel free to just call
this method bullshit or whatever:-)

On another note, the size of all the entries files in the GCC tree
decreased from 8.2 to 4.2 megabytes, which probably explains much of
the time savings. (The size is not a big deal, since where we
currently consume peoples' disk space is in the textbase files.)

So, I propose to replace our current XML entries files with some
variant of this new format. Below are three lines from an entries file
to show how it might look:

|dir|2|file:///home/pl/svntest/gcc/repo|file:///home/pl/svntest/gcc/repo||||2006-04-06T20:31:40.024395Z|2|pl|t||svn:special svn:externals svn:needs-lock||||||||||||b4f160ba-c4e8-11da-bcd8-65faf9067726|
config-ml.in|file|||||2006-04-05T21:35:10.000000Z|e09724e0a7725cc08129db9293a35df9|2006-04-05T21:20:33.744115Z|1|pl|
libgomp|dir|
...

Each entry is on a separate line and each field is terminated by a |
character. Empty fields may be omitted provided all following fields
are also empty. Absent fields are represented by empty fields (except
for the entry name, where an empty field means the thisdir entry).
Boolean values are represented by "t" (for true) or an empty field
(for false). In the future, we can add new fields at the end of each
record. There will be an escaping mechanism for control characters
(think newlines in lock comments) and the | separator.
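
To make the rules above concrete, here is a rough Python sketch of
how one such line could be read. It is only an illustration, not the
proposed parser: the function names and the field_count parameter are
made up for this example, and escaping is left out entirely because
the escaping mechanism has not been specified yet.

```python
def parse_entry_line(line, field_count):
    """Split one |-terminated record into at least field_count fields.

    Hypothetical helper: trailing fields that were omitted from the
    line come back as empty strings. Escaped '|' characters are NOT
    handled, since the escaping scheme is still undefined.
    """
    line = line.rstrip("\n")
    if not line.endswith("|"):
        raise ValueError("every field must be terminated by '|'")
    # Drop the final terminator, then split on the remaining ones.
    fields = line[:-1].split("|")
    # Pad omitted trailing fields so callers can index safely.
    fields += [""] * max(0, field_count - len(fields))
    return fields


def is_this_dir(fields):
    # An empty entry name marks the thisdir entry.
    return fields[0] == ""


def as_bool(field):
    # "t" means true; an empty field means false.
    return field == "t"
```

For example, parse_entry_line("libgomp|dir|", 5) gives the name and
kind followed by three empty strings for the omitted fields.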

Since we are not using any of the things XML provides such as
structure, the only drawback I see to making this change is a slight
degradation in readability. But hey, computers will read this file
much more often than humans, and we could easily write a
pretty-printing script for debugging purposes. Another thing that
might be considered a drawback is that we will need to maintain both
the old and new entries parsing code for compatibility. The parsing
code for this new format, however, is pretty straightforward, so I
don't think this is a problem.
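
As an idea of what a pretty-printing script for debugging might look
like, here is a small Python sketch. It is hypothetical: the field
order is not fixed yet, so fields are shown by position rather than
by name, format_entry is an invented name, and escaping is ignored.

```python
def format_entry(line):
    """Render one |-terminated record as indented, numbered fields.

    Hypothetical debugging aid: empty fields are skipped, and an empty
    entry name is shown as "(this dir)".
    """
    fields = line.rstrip("\n").split("|")
    # The terminator after the last field leaves one empty tail item.
    if fields[-1] == "":
        fields.pop()
    name = fields[0] if fields and fields[0] else "(this dir)"
    rows = [f"entry {name}"]
    for index, value in enumerate(fields[1:], start=1):
        if value:
            rows.append(f"  field {index}: {value}")
    return "\n".join(rows)
```

Calling print(format_entry(line)) for each line of an entries file
would give a readable dump, one block per entry.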

If we want this, we should do it before 1.4, since 1.4 already
brings WC format upgrades.

Opinions?

Regards,
//Peter

Received on Fri Apr 7 00:00:51 2006

This is an archived mail posted to the Subversion Dev mailing list.
