Re: [PATCH] "svnadmin dump": An optional XML output format.

From: Greg Hudson <ghudson_at_MIT.EDU>
Date: 2002-12-21 22:10:43 CET

On Sat, 2002-12-21 at 14:43, David Summers wrote:
> 2. It is a "standardized" way of presenting data in a portable format.

Elaborating on what Garret said: you can only do a limited amount with a
data stream if you only know that it's written in XML. "A sequence of
bytes" is also a standard way of presenting data in a portable format,
but it's only helpful up to a certain point. XML is the same.

I have heard of people who like to build huge databases of essentially
random XML content. But you still can't make many interesting
assumptions about the content of the database simply knowing that it's
in XML form. Instead of a sequence of bytes with unknown meaning, you
have a tree of strings of characters with unknown meaning.

> 3. It helps describe the data in a more readable format (although the
> original format is not bad, just takes some digging around to
> understand).

XML formats are sometimes more readable than other formats because each
piece of data is tagged (sometimes twice). But our dump format also
tags each piece of data, using a header:value notation. I'll grant that
property lists aren't presented in the most readable of formats, but
solving that problem is not worth inventing a whole new format.
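
A minimal sketch of how little work header:value tagging asks of a reader, assuming a record is a run of "Name: value" lines ended by a blank line. The helper name parse_headers and the sample record are illustrative only; this is not Subversion's own parser.

```python
def parse_headers(block):
    """Parse 'Name: value' lines into a dict, stopping at the first blank line."""
    headers = {}
    for line in block.splitlines():
        if not line.strip():
            break  # a blank line ends the header section
        name, sep, value = line.partition(": ")
        if not sep:
            raise ValueError("not a header line: %r" % line)
        headers[name] = value
    return headers


# Hypothetical sample record, made up for illustration.
sample = (
    "Node-path: trunk/README\n"
    "Node-kind: file\n"
    "Node-action: add\n"
    "\n"
    "body text that the header parser ignores\n"
)
record = parse_headers(sample)
```

Each value arrives already labelled by its header name, which is the same property usually credited to XML tags.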

> 4. I think it should help to improve the importability/exportability of
> the repository data to other tools (more easily
> readable/understandable).

This is the usual argument of the "paint the world XML" people, but I
don't buy it. The hard part of importing and exporting data is not
syntax, but semantics. Syntax is just string processing, and usually
not much of it. XML lets you trade a certain amount of string
processing for tree processing, which is slightly nicer--but that's only
a small benefit. (And XML comes with costs too. It's a very
complicated standard for what it does, and XML documents are terribly
verbose for the information they express.)
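
To make the syntax-versus-semantics point concrete, here is a rough sketch that reads the same three fields once from header:value text and once from XML. The XML element names (node, path, kind, action) are invented for this example and do not come from any real schema.

```python
import xml.etree.ElementTree as ET

# The same hypothetical record in two syntaxes.
header_text = "Node-path: trunk/README\nNode-kind: file\nNode-action: add\n"
xml_text = (
    "<node><path>trunk/README</path>"
    "<kind>file</kind><action>add</action></node>"
)

# String processing: split each line once on ": ".
from_headers = dict(line.split(": ", 1) for line in header_text.splitlines())

# Tree processing: walk the children of the root element.
from_xml = {child.tag: child.text for child in ET.fromstring(xml_text)}
```

Both readers are one line long and recover the same values. Neither tells the importing tool what an "add" action means in its own model, and that is the part that takes real work.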

> 5. Someone mentioned RevML (which I'm looking for references for now)
> which should (theoretically and in the future) increase interoperability
> with other SCM/SCCS systems.

RevML (thanks for hunting down the DTD) looks like it was very narrowly
designed for exchanging data between particular revision control
systems. It has element names like "p4_info", "cvs_info",
"sourcesafe_action", and stuff. I'm sure the author did that
deliberately to ensure that RevML never loses information about a
revision, but in the long run that's just not going to cut it for either
a standard dump format or a standard patch format.

A real, meaningful standard for revision control interchange has serious
semantic problems to address. Some systems don't record information
considered central to other systems (in particular, CVS doesn't record
which file changes are part of a single commit). There are at least
three radically different ways of handling branches (Subversion and
Perforce put branches into the filesystem namespace; Clearcase and CVS
put branches into an orthogonal namespace; Bitkeeper stores each branch
in a separate repository). Meta-data like "is this file executable" are
of crucial importance to some users, but are also very OS-specific.
Saying "we don't know about those problems, but hey, we can read and
write our data in an XML format highly specific to our model" is a
meaninglessly small step in the direction of interoperability.

> 7. It looked like a fairly easy way for me to jump into learning the
> Subversion source code while providing something useful.

This is always a worthy goal. (I was ready to throw out ra_svn and
write it off as a learning experience if people didn't want it.) But we
do have to be pretty careful about the functionality we put into the
Subversion libraries and core programs. Once a feature goes into 1.0,
we're pretty much stuck with it, and we want Subversion to be exactly as
complex as it needs to be, no more.

On the other hand, we don't have to be too careful about the code we
accept into the "tools" directory. So if you wanted to write a separate
program which converts between the Subversion dump format and a format of
your choosing, I don't think there would be any objection to dumping it
into the tools directory. Of course, if a later change to Subversion
breaks your tool, you get to keep both halves. :)

Received on Sat Dec 21 22:11:28 2002

This is an archived mail posted to the Subversion Dev mailing list.