
Re: SVN Book Method for Splitting Repos doesn't work

From: Max Bowsher <maxb_at_ukf.net>
Date: 2004-12-30 21:10:29 CET

Roland Besserer wrote:
> I would like to comment on the concept of 'human readable' though.
> Although emacs (for example) can easily handle binary files just
> dumped into the output file, including 8-bit data sure doesn't make
> the dump file human readable anymore. It also makes processing the
> dump file with text (or more accurately line) oriented tools error
> prone.
> SVN is already, in my opinion, somewhat handicapped by the fact that
> it uses a database backend

You seem to have ignored FSFS.

> and thus a binary file format that puts you
> at the mercy of the decode/repair tools specifically designed for
> it.

Under the hood, the formats really aren't much more difficult to
comprehend than CVS's. Anyone who really wants to peek under the covers is
free to do so.

> It would be nice if at least the dump file format would stick to
> an ASCII only representation that makes processing of dump files with
> 'standard' utilities easy and less error prone.
> Max Bowsler made the interesting comment that "Personally, I think
> that uuencoding (or similar) doesn't increase human-readability, it
> just wastes processing time" which I completely disagree with. Who
> cares about minute incremental decoding time or even file size in
> this age of multi-GHz processors and 100GB disks. Human readable
> is a term that should not be taken literally. To me it means that it
> is an ASCII/text based representation I can feed any tool like sed or
> awk with.

Hi, it's me again :-). "Bowsher" not "Bowsler", by the way.

This is a debate about tradeoffs:

Subversion saves processing time and file size, at the cost of putting
greater requirements on the tools used.

I happen to feel that this is the right tradeoff to make in this case.

Any small overhead can become quite magnified when dealing with gigabytes of
data, and if you want to restrict the available byte values to printable
ASCII, then the amount of space required to store arbitrary data will
increase: by roughly a third with uuencode or base64, and by up to a factor
of 3 with per-byte escaping.
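
A rough way to check the size cost of printable-ASCII encodings is to
measure a few of them. The sketch below is only an illustration: the choice
of encodings (base64, uuencode, quoted-printable) and the sample data are
assumptions made for this example, not anything Subversion does.

```python
import base64
import binascii
import quopri

# Sample "arbitrary" data: every byte value, repeated (assumed test input).
data = bytes(range(256)) * 180          # 46080 bytes

# base64 maps every 3 input bytes to 4 output characters.
b64 = base64.b64encode(data)

# uuencode works on lines of at most 45 input bytes each.
uu = b"".join(
    binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45)
)

# Quoted-printable escapes each non-ASCII byte as "=XX" (three characters),
# so data made only of high bytes is close to its worst case.
high = bytes([0xE9]) * 1000
qp = quopri.encodestring(high)

print("base64 ratio:          ", len(b64) / len(data))
print("uuencode ratio:        ", len(uu) / len(data))
print("quoted-printable ratio:", len(qp) / len(high))
```

The first two ratios come out a little above 1.3, while per-byte escaping of
high bytes costs a factor of 3 or slightly more.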

The downside, of course, is the increased restrictions on the tools:

I think expecting data processing tools to be 8-bit clean is a reasonable
demand for newly engineered systems today.

There is the further complication, of course, of dumpfile-header-like data
appearing in the middle of file content. I admit that this is a harder
problem. However, both Perl and Python are excellent tools, and the dumpfile
format has been deliberately designed to be easily parseable, offering a way
to cleanly circumvent this issue.
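
As a minimal sketch of that approach, assuming only the basic dumpfile
layout (a block of "Key: value" header lines, a blank line, then exactly
Content-length bytes of body), a Python reader can skip over file content by
byte count instead of scanning it line by line. The function name
read_records and the sample stream are hypothetical, and real dumpfiles have
more structure (property blocks, deltas) than this handles.

```python
import io

def read_records(stream):
    """Yield (headers, body) pairs from a binary dump stream."""
    while True:
        # Skip blank separator lines between records; stop at end of stream.
        line = stream.readline()
        while line == b"\n":
            line = stream.readline()
        if not line:
            return
        # Collect "Key: value" header lines up to the blank line.
        headers = {}
        while line not in (b"\n", b""):
            key, _, value = line.rstrip(b"\n").partition(b": ")
            headers[key.decode("ascii")] = value.decode("utf-8", "replace")
            line = stream.readline()
        # Read the body by length, so header-like text or 8-bit bytes
        # inside file content are never interpreted as headers.
        length = int(headers.get("Content-length", "0"))
        body = stream.read(length)
        yield headers, body

# Assumed sample: file content that contains a fake header and 8-bit bytes.
payload = b"\xff\x00\nNode-path: fake\n\n\x80\x81"
sample = (
    b"SVN-fs-dump-format-version: 2\n\n"
    b"Node-path: trunk/blob.bin\n"
    + b"Content-length: %d\n\n" % len(payload)
    + payload + b"\n\n"
    + b"Node-path: trunk/old.txt\nNode-action: delete\n\n"
)

records = list(read_records(io.BytesIO(sample)))
for headers, body in records:
    print(headers, len(body))
```

The stream must be opened in binary mode; a text-mode reader would mangle or
reject the 8-bit content.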

One particular choice of tradeoffs will never be the optimum for all cases.
The particular choice made by Subversion happens to work very nicely for the
common cases of using dumpstreams for backup and migration.


Received on Thu Dec 30 21:15:01 2004

This is an archived mail posted to the Subversion Users mailing list.
