Philip Martin writes:
> "Peter N. Lundblad" <peter@famlundblad.se> writes:
>
> > If we want this, we should do it before 1.4, since we already have WC
> > format upgrades.
>
> I think that is a bogus argument.
OK. Let me rephrase it like this:
I think we should either:
a) Get it into 1.4 since we already have WC format bumps
or
b) Get it into the next minor release where we have a format bump for
some other reason.
I don't think this change warrants a format bump in itself, since the
costs to the users outweigh the benefits.
> I like it, I've always had doubts about the suitability of XML for the
> entries file. This would also seem to be an ideal opportunity to
> combine the format file with the entries file.
>
Yeah, that sounds reasonable. What are the real benefits here?
Julian Foad:
> Well, I have no particular objection to changing the format if it turns out to
> be a good idea after due consideration, but let me put the other side of the
> argument. Don't be too hasty. A 19% reduction in wall-clock time is worth
> considering, but not amazing.
>
Completely agree.
> Is this format that you are playing with one that you made up?
Well, you could call this a comma-separated database export format (but
the comma is a pipe) and then I don't claim to be the inventor:-)
But, yes, the specific details I have made up.
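
To illustrate the general idea, here is a minimal Python sketch of a
pipe-separated record next to a flat XML one. The field names, their
order, and the escaping scheme are assumptions made up for this example;
they are not the details of the proposed format.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical field list; a real entries file has more fields than this.
FIELDS = ("name", "kind", "revision", "url", "checksum")

def escape(value):
    # Escape the escape character first, then the separator and newline,
    # so an escaped value never contains a raw "|" or line break.
    return (value.replace("\\", "\\\\")
                 .replace("|", "\\p")
                 .replace("\n", "\\n"))

def unescape(value):
    out = []
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            out.append({"\\": "\\", "p": "|", "n": "\n"}[next(chars)])
        else:
            out.append(ch)
    return "".join(out)

def dump_entry(entry):
    # One record per line; fields are positional, missing ones are empty.
    return "|".join(escape(entry.get(f, "")) for f in FIELDS)

def load_entry(line):
    return dict(zip(FIELDS, (unescape(v) for v in line.split("|"))))

def dump_entry_xml(entry):
    # Flat XML equivalent, for a size comparison only.
    attrs = " ".join(f"{k}={quoteattr(v)}" for k, v in entry.items())
    return f"<entry {attrs}/>"
```

The positional layout is where the size saving comes from in this
sketch: field names are not repeated in every record.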
> Introducing Yet Another File Format is a bad thing in general,
> however simple it seems to be. There is significant but
> unquantifiable value in using a standard format. If a new format
> were to replace one of the other formats that we currently use, it
> would be more reasonable, but this won't be replacing XML.
Well, for interchange and communications, I agree about the general
statement above. But for an internal data format, I don't really see
the value of a standard format, especially since we don't use most of
its features. (I'm glad the days when XML should be used for
everything are over...)
We use custom formats in many places in our code already:
- FSFS has its own formats.
- The reporter in libsvn_repos has its own ad-hoc format for the spool
file.
- We have our famous hashdump format.
- The libsvn_ra_svn protocol.
- (And of course the svndiff format)
I don't claim that this validates yet another format, but OTOH, I
don't see why the entries file would benefit more from a standard
format than any of the other formats we have (maybe except for the
reporter spool file format which doesn't have any compatibility concerns).
>
> If, as you say, the savings might be mostly due to the reduced file size, then
> try compressing the files on the fly (reading/writing through a zlib stream).
> Does that give a similar saving in wall-clock time? Presumably the CPU time
> would be increased, but is that important?
>
Since the new format reduced the CPU time by about 35%, I think the
performance gain is a combination of I/O and CPU, especially when much
of the entries files is in the OS cache, which is the case when you
perform multiple operations in a row. But I don't think this should
be very hard to try, so I'll give it a shot to get some perspective.
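
A rough sketch of that experiment in Python, using the standard zlib
module. The synthetic entries data and the helper names are assumptions
for illustration only; real numbers would have to come from profiling
the actual working copy code.

```python
import time
import zlib

def make_entries_xml(n):
    # Synthetic stand-in for an entries file; attribute names are illustrative.
    rows = [
        f'<entry name="file{i}.c" kind="file" revision="{1000 + i}" '
        f'checksum="{i:032x}"/>'
        for i in range(n)
    ]
    return ("<wc-entries>\n" + "\n".join(rows) + "\n</wc-entries>\n").encode("utf-8")

def measure(raw):
    # Returns (raw size, compressed size, seconds spent decompressing).
    packed = zlib.compress(raw)
    start = time.perf_counter()
    unpacked = zlib.decompress(packed)
    elapsed = time.perf_counter() - start
    assert unpacked == raw  # the round trip must be lossless
    return len(raw), len(packed), elapsed
```

Calling measure(make_entries_xml(500)) gives the size reduction and the
extra CPU cost of decompression, which is the trade-off in question.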
> Basically, I think this is something to be considered but not
> something to be rushed in for v1.4 or that, on the present evidence,
> warrants a new file format. By all means post a patch and let it be
> tried and profiled a bit more.
I'll post a patch so we can compare numbers. As I said before, maybe
I'm doing something wrong that favors my proposal:-) Other
OS/hardware characteristics might also be interesting.
Justin Erenkrantz:
> I wonder how the ConfigParser / INI format would do? We already use
> that in a bunch of other places and have all the parsing code in
> libsvn_subr. -- justin
That's basically a name=value format which isn't much different from
our flat XML format, so I don't think that'd give us anything. And I
really think the XML parser is more optimized than our config parser
(which is fine for our current use).
Garrett Rooney:
> I wonder how much of the gain we could get just by making the xml less
> verbose. I've seen big speed gains in previous projects by taking
> verbose xml and making the tag/attribute names smaller (i.e. on the
> order of a single character each), it's less readable than the current
> xml format, but not as incomprehensible to the naked eye as the pipe
> separated one, and if it's really just size that's giving the speed
> gain it's worth looking at.
This should also be pretty simple and an interesting data point to have.
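
A small Python sketch of that comparison. The long attribute names and
their single-character replacements are assumptions chosen for the
example, not the actual entries file schema.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical mapping from verbose names to single-character ones.
SHORT = {"entry": "e", "name": "n", "kind": "k", "revision": "r", "checksum": "c"}

def entry_xml(attrs, short=False):
    # Emit one flat XML element, optionally with shortened names.
    tag = SHORT["entry"] if short else "entry"
    parts = [
        f"{SHORT[key] if short else key}={quoteattr(value)}"
        for key, value in attrs.items()
    ]
    return f"<{tag} {' '.join(parts)}/>"

def size_ratio(attrs):
    # Size of the short form relative to the verbose form.
    return len(entry_xml(attrs, short=True)) / len(entry_xml(attrs))
```

Both forms still go through the same XML parser, so any difference in
timing would come from the size of the data rather than the parser.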
OK, it seems like I'll have some more profiling to do:-(
Thanks for the input, everyone,
//Peter
Received on Fri Apr 7 09:28:57 2006