I went to the Linux Kernel Summit on Friday evening, to speak about
Subversion at one of the BOFs. Linus is interested in Subversion for a
number of reasons, but also wanted a counterpoint to BitKeeper.
Larry McVoy was there when I first arrived, but had to be back home by 7pm
(an hour away, so he should have left by 6pm), so he couldn't stay for a
scheduled 9pm BitKeeper BOF. He left somewhere around 8pm, so I'm sure he's
in trouble :-) ... but it was unfortunate because Linus wanted to see some
Ah well. Back to SVN itself. I talked about SVN, its model, the advantages,
some of the architecture, etc. I think the three basic points of feedback
that I received are:
* Use some kind of checksum on the wire and in the DB. Given that whatever
is in source control is probably quite important, it would be a Good Thing
if we hashed/checksummed the data. Even better if we periodically reviewed
the whole database for items whose checksum has mysteriously changed or
failed. The point is that bits can corrupt over the N year period that the
data resides in source control. Catching the corruption sooner rather than
later makes it easier to get back to a correct state. Imagine trying to
retrieve a checkin from four years ago, only to find that it was corrupted
three years back. You probably don't have a valid version anywhere.
I suggested that the client could compute an MD5 hash, send it over the
wire using HTTP's Content-MD5 header, Apache would verify it (built in!),
and that we'd place the hash into the database with the data. When the
data is retrieved, we can send it back over the wire with Content-MD5 and
the client can verify the hash value.
* Diff formats. Our SVNDIFF format is neato, but it can only change a
specific input. It has no context, so it does not apply to the typical use
case of mailing changes around, for application against arbitrary inputs.
(Sam TH just emailed about this use case)
People were somewhat leery of the XML form (even asking whether we stored
things that way). If we intend to use it for passing around patches, then
we'd want to consider allowing for context/unified diff formats inside the
XML (see footnote 1). They weren't so much bothered by XML, as by its
readability. I believe that we can create a readable XML-based patch, but
we do need to use other diff formats in there.
Our diff story also needs to be clarified for people; it was quite a
lengthy discussion. XML and SVNDIFFs are for transforming specific trees.
Something else (what?) is for mailing contextual patches.
Linus said he had recently seen an extension to the PATCH format which
started off with things like "mv foo.c bar.c" to be able to capture tree
types of changes. We may want to try to find that format and make it an
* Merging. People are definitely interested in some of the "genetic" merging
and being able to do merging right.
They'd also like to be able to see a patch format that can embody a
sequence of revision patches, each with their own log message. This kind
of patch would be useful for people developing code independently and then
merging changes (with history) into each other's trees. Consider it a
"Poor Man's Distributed Repository."
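To make the Content-MD5 idea from the first point concrete, here is a
minimal sketch in Python. It only shows the arithmetic: the header value is
the base64 encoding of the raw 16-byte MD5 digest of the body (per RFC 1864).
The function names content_md5 and verify_content_md5 are made up for
illustration and are not part of Subversion or Apache.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return the Content-MD5 header value for a request/response body.

    RFC 1864 defines the value as base64 of the binary MD5 digest,
    not the familiar hex string.
    """
    digest = hashlib.md5(body).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_content_md5(body: bytes, header_value: str) -> bool:
    """Recompute the hash of the received body and compare it with the
    value the sender put in the header (hypothetical helper)."""
    return content_md5(body) == header_value.strip()


if __name__ == "__main__":
    payload = b"int main(void) { return 0; }\n"
    header = content_md5(payload)
    print("Content-MD5:", header)
    # An intact body verifies; a body with one changed byte does not.
    print(verify_content_md5(payload, header))
    print(verify_content_md5(payload.replace(b"0", b"1"), header))
```

The same value could be stored next to the data in the database, so that
the check on retrieval uses exactly the hash computed by the original
client.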
That's about it. I also got an earful from Larry about not trusting anything
(see fn 2). Not even the filesystem. I don't agree with his paranoia level,
but we can ameliorate some of that using the MD5 hashes.
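As a rough illustration of the periodic re-check suggested at the BOF, the
sketch below stores an MD5 next to each item and later walks the whole store
looking for items whose data no longer matches. The dict-based store and the
names put and scrub are assumptions for the example only; a real repository
would do this against its own database.

```python
import hashlib


def put(db: dict, key: str, data: bytes) -> None:
    """Store data together with the MD5 computed at checkin time."""
    db[key] = (data, hashlib.md5(data).hexdigest())


def scrub(db: dict) -> list:
    """Recompute every hash and return the keys whose stored data no
    longer matches the hash recorded at checkin time."""
    bad = []
    for key, (data, recorded) in db.items():
        if hashlib.md5(data).hexdigest() != recorded:
            bad.append(key)
    return sorted(bad)


if __name__ == "__main__":
    db = {}
    put(db, "trunk/foo.c@1", b"original contents\n")
    put(db, "trunk/bar.c@1", b"other contents\n")

    # Simulate silent corruption: the data changes, the recorded hash
    # does not.
    data, recorded = db["trunk/foo.c@1"]
    db["trunk/foo.c@1"] = (b"0riginal contents\n", recorded)

    print(scrub(db))
```

Running such a pass on a schedule means corruption is noticed while a good
copy is still likely to exist somewhere.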
And as Eric posted here already, I also got an earful from him about
metadata not in a textual/editable format (e.g. using DB). Jim has already
responded quite well to that.
There was a short discussion about licenses, and I simply pointed out that
SVN uses the Apache License, that it is completely free, and it will always
be free. There was a question raised about whether we might possibly be
infringing on some patents, and I responded that we didn't know of any. The
person suggested that Larry/BitMover may have some patents in this area, but
someone else suggested that was a rumor, and I can find nothing on the
If you have any questions about the above stuff, want some more drill down,
etc, then (of course) just send an email.
(1) diff type selection is one of the things that changing txdelta's
direction can do. The piece of code writing out the XML data can choose
whatever format is appropriate (or selected by the user).
(2) I found a posting from Larry that appears to state his position
Greg Stein, http://www.lyra.org/
Received on Sat Oct 21 14:36:27 2006