Philip Martin writes:
> "Peter N. Lundblad" <peter@famlundblad.se> writes:
>
> > Philip Martin writes:
> > > entries file. This would also seem to be an ideal opportunity to
> > > combine the format file with the entries file.
> >
> > Yeah, that sounds reasonable. What are the real benefits here?
>
> Reduced disk IO and a small reduction in working copy size. At
> present the format file is read every time the entries file is read.
> With your change more (most?) entries files will fit in a single disk
> block so getting rid of format file could halve the disk reads and
> reduce the pressure of the filesystem cache. Getting rid of the
> format file will reduce working copy disk usage by one or two percent
> in some cases.
>
OK, so I've been busy the last few days implementing this functionality,
including combining .svn/format with .svn/entries. The work is in
/branches/nonxml-entries for anyone interested.

I claim that the following sequence of operations:

  svn st
  svn up    (update with no significant changes to the tree)
  sync      (to avoid the previous operation interfering with the next)
  svn st
  svn diff
  svn ci    (with no modifications)

is 25-30% faster on the GCC tree now than before. The inclusion
of the format number in entries made the first of the operations above
significantly faster.
(According to a simple test comparing nonxml-entries with a 1.3.1
client, we're now 60% faster on the sequence of commands listed above,
and the gain would be even larger if the GCC tree made heavier use of
properties.)
There are further optimizations that could be done (for example,
currently, we open/read/close the first part of the entries file to
get the format number and then open/read/close the whole entries file
to get the entries). I'm focusing on the format change to be ready if
we include it in 1.4.
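
A minimal Python sketch of the single-pass idea mentioned above: open the
file once, take the format number from the first line, and keep the rest
for the entries parser. The layout (a decimal number on the first line,
everything after it opaque) and the name read_format_and_entries are
assumptions for illustration only, not the code on the branch.

```python
import os
import tempfile


def read_format_and_entries(path):
    """Open the file once; return (format_number, remaining_bytes).

    Assumed layout (hypothetical): the first line is a decimal format
    number, and the rest is the entries data in whatever encoding that
    format number implies.
    """
    with open(path, "rb") as f:
        first_line = f.readline()
        rest = f.read()
    return int(first_line.strip()), rest


def demo():
    # Write a small fake entries file and read it back in one pass.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "entries")
        with open(path, "wb") as f:
            f.write(b"8\nfake entry data\n")
        return read_format_and_entries(path)
```

The point is only that one open/read/close can serve both the format
check and the entries read.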
I strongly feel this branch should be merged and would like some
review if possible. I'll discuss some of the design choices I made
below.

Why not compress entries and keep it in XML?
I haven't tested this (yes, I know I promised to), so I don't know how
much it would give us performance-wise. A problem with this approach,
however, is that once we combine the format number with entries, we
would have to decompress the file just to get at the working copy
format. That means we couldn't easily change whether or how the
entries file is compressed in the future. With the version
number in plaintext on the first line, we can change the rest of the
file's format completely if we need to.
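
To illustrate why a plaintext first line keeps the door open, here is a
small Python sketch in which the format number alone decides how the
rest of the file is decoded. The format numbers 8 and 9, and the idea
that a later format would zlib-compress the body, are invented for this
example.

```python
import zlib

# Hypothetical format numbers, chosen only for this sketch.
FORMAT_PLAIN = 8
FORMAT_ZLIB = 9


def decode_entries(data):
    """Split off the plaintext format line, then decode the body."""
    header, _, body = data.partition(b"\n")
    fmt = int(header)
    if fmt == FORMAT_PLAIN:
        return fmt, body
    if fmt == FORMAT_ZLIB:
        return fmt, zlib.decompress(body)
    raise ValueError("unsupported working copy format %d" % fmt)


def encode_entries(fmt, body):
    """Inverse of decode_entries for the two formats above."""
    if fmt == FORMAT_ZLIB:
        body = zlib.compress(body)
    elif fmt != FORMAT_PLAIN:
        raise ValueError("unsupported working copy format %d" % fmt)
    return str(fmt).encode("ascii") + b"\n" + body
```

A reader never has to decompress anything to learn which format it is
looking at, and an unknown number can be rejected before the body is
touched.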
Another suggestion was to use shorter attribute/element names in the
XML file, and perhaps trim some whitespace. If we want to include the
format number in the file, we face a similar problem to the one
described above. We can't just put the format number on the first
line, because then it wouldn't be a well-formed XML file anymore. We
could put the format number in a comment or processing instruction in
the prolog (before the first element), but then we would need to
either create an XML parser just to extract the version number, or do
some custom parsing, which can get complex.
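
The well-formedness point can be checked with any XML parser. The Python
sketch below uses the standard library's ElementTree; the element name
and the processing-instruction target svn-format are made up for the
example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as an XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# A bare number before the root element is character data in the
# prolog, which XML does not allow.
bare_number = "8\n<wc-entries></wc-entries>"

# A processing instruction in the prolog is allowed, but reading the
# number back then needs an XML parser or hand-written scanning.
with_pi = "<?svn-format 8?>\n<wc-entries></wc-entries>"
```

Here is_well_formed(bare_number) is False and is_well_formed(with_pi) is
True, which is exactly the trade-off described above.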
Because of the above, and because I don't see any real benefit of
using XML in this particular case, I don't think any of the above
proposals are worth it.

Why keep the .svn/format file?
A disappointing thing with nonxml-entries is that we keep .svn/format,
at least for some minor svn releases. The reason for this is to let
old clients give the user reasonably meaningful error messages when
faced with a new working copy. It is better to say "your client is
too old" than "This is not a working copy". We need to prefer
usability to a small amount of disk space here.
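
As a rough model of what an old client does, consider the Python sketch
below. The exception names, the function check_wc_format, and the
messages are hypothetical; only the idea of reading a number from
.svn/format and comparing it with what the client supports comes from
the discussion above.

```python
import os
import tempfile


class ClientTooOld(Exception):
    """The working copy format is newer than this client supports."""


class NotWorkingCopy(Exception):
    """No readable format file was found in the admin directory."""


def check_wc_format(admin_dir, max_supported):
    """Model of an old client's check against <admin_dir>/format."""
    format_path = os.path.join(admin_dir, "format")
    try:
        with open(format_path) as f:
            wc_format = int(f.read().strip())
    except (OSError, ValueError):
        # Without the format file the client cannot tell a newer
        # working copy apart from a directory that is not one at all.
        raise NotWorkingCopy("this is not a working copy")
    if wc_format > max_supported:
        raise ClientTooOld("your client is too old for this working copy")
    return wc_format
```

If the format file disappeared, the old client would take the first
branch and report the less helpful error.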
So, any review comments or whatever are as always welcome. Unless
someone objects, I plan to merge this in a few days.
Regards,
//Peter
Received on Tue Apr 11 23:04:19 2006