Linux Kernel Summit

From: Greg Stein <gstein_at_lyra.org>
Date: 2001-04-02 08:49:20 CEST

Hey all,

I went to the Linux Kernel Summit on Friday evening, to speak about
Subversion at one of the BOFs. Linus is interested in Subversion for a
number of reasons, but also wanted a counterpoint to BitKeeper.

Larry McVoy was there when I first arrived, but had to be back home by 7pm
(an hour away, so he should have left by 6pm), so he couldn't stay for a
scheduled 9pm BitKeeper BOF. He left somewhere around 8pm, so I'm sure he's
in trouble :-) ... but it was unfortunate because Linus wanted to see some
blood :-)

Ah well. Back to SVN itself. I talked about SVN, its model, the advantages,
some of the architecture, etc. I think the three basic points of feedback
that I received are:

* Use some kind of checksum on the wire and in the DB. Given that whatever
  is in source control is probably quite important, it would be a Good Thing
  if we hashed/checksummed the data. Even better if we periodically reviewed
  the whole database for items whose checksum has mysteriously changed or
  failed. The point is that bits can corrupt over the N year period that the
  data resides in source control. Catching the corruption sooner rather than
  later makes it easier to get back to a correct state. Imagine if you tried
  to get a checkin from four years ago, but it had been corrupted three
  years back? You probably don't have a valid version anywhere.

  I suggested that the client could compute an MD5 hash, send it over the
  wire using HTTP's Content-MD5 header, Apache would verify it (built in!),
  and that we'd place the hash into the database with the data. When the
  data is retrieved, we can send it back over the wire with Content-MD5 and
  the client can verify the hash value.

* Diff formats. Our SVNDIFF format is neato, but it can only change a
  specific input. It has no context, so it does not apply to the typical use
  case of mailing changes around, for application against arbitrary inputs.
  (Sam TH just emailed about this use case)

  People were somewhat leery of the XML form (even asking whether we stored
  things that way). If we intend to use it for passing around patches, then
  we'd want to consider allowing for context/unified diff formats inside the
  XML (see footnote 1). They weren't so much bothered by XML, as by its
  readability. I believe that we can create a readable XML-based patch, but
  we do need to use other diff formats in there.

  Our diff story also needs to be clarified for people; it was quite a
  lengthy discussion. XML and SVNDIFFs are for transforming specific trees.
  Something else (what?) is for mailing contextual patches.

  Linus said he had recently seen an extension to the PATCH format which
  started off with things like "mv foo.c bar.c" to be able to capture tree
  types of changes. We may want to try to find that format and make it an

* Merging. People are definitely interested in some of the "genetic" merging
  and being able to do merging right.

  They'd also like to be able to see a patch format that can embody a
  sequence of revision patches, each with their own log message. This kind
  of patch would be useful for people developing code independently and then
  merging changes (with history) into each others' trees. Consider it a
  "Poor Man's Distributed Repository."
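
A minimal sketch of what such a patch series could look like, assuming
plain unified diffs and a made-up bundle layout. The names RevisionPatch,
make_series, and render, and the "=== patch i/n ===" separator, are
hypothetical and not an existing Subversion format.

```python
import difflib
from dataclasses import dataclass


@dataclass
class RevisionPatch:
    """One step in the series: a log message plus a unified diff."""
    log: str
    diff: str


def make_series(path, versions, logs):
    """Build one patch per consecutive pair of file versions.

    versions: successive full contents of the file (strings).
    logs: one log message per step, so len(logs) == len(versions) - 1.
    """
    patches = []
    for old, new, log in zip(versions, versions[1:], logs):
        diff = "".join(difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile="a/" + path,
            tofile="b/" + path,
        ))
        patches.append(RevisionPatch(log=log, diff=diff))
    return patches


def render(patches):
    """Serialize the series as one mailable text bundle (hypothetical layout)."""
    total = len(patches)
    parts = []
    for i, patch in enumerate(patches, 1):
        parts.append(
            f"=== patch {i}/{total} ===\nLog: {patch.log}\n\n{patch.diff}"
        )
    return "\n".join(parts)


if __name__ == "__main__":
    series = make_series(
        "foo.c",
        ["a\n", "a\nb\n", "a\nb\nc\n"],
        ["add b", "add c"],
    )
    print(render(series))
```

Because each step keeps its own log message, the receiver could apply the
patches one at a time and record one revision per step instead of a single
squashed change.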

That's about it. I also got an earful from Larry about not trusting anything
(see fn 2). Not even the filesystem. I don't agree with his paranoia level,
but we can ameliorate some of that using the MD5 hashes.
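
The MD5 idea above can be sketched roughly as follows. This is an
illustration only: ChecksummedStore is a hypothetical in-memory stand-in
for the database, and nothing here is actual Subversion or Apache code.

```python
import base64
import hashlib


def content_md5(data: bytes) -> str:
    """Return a Content-MD5 style value: base64 of the raw 16-byte MD5 digest."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def verify(data: bytes, header_value: str) -> bool:
    """Check that data still matches the digest that travelled with it."""
    return content_md5(data) == header_value


class ChecksummedStore:
    """Hypothetical store that keeps the digest next to the data."""

    def __init__(self):
        self._items = {}

    def put(self, key, data, header_value):
        # Reject data that was damaged on the wire before it is stored.
        if not verify(data, header_value):
            raise ValueError(f"checksum mismatch for {key!r}")
        self._items[key] = (data, header_value)

    def get(self, key):
        # Hand back the stored digest so the receiver can verify as well.
        return self._items[key]

    def scrub(self):
        """Periodic review: return keys whose data no longer matches its digest."""
        return sorted(
            key for key, (data, digest) in self._items.items()
            if not verify(data, digest)
        )


if __name__ == "__main__":
    store = ChecksummedStore()
    payload = b"int main(void) { return 0; }\n"
    store.put("r1/main.c", payload, content_md5(payload))
    data, digest = store.get("r1/main.c")
    print(verify(data, digest), store.scrub())
```

Note that the header value is the base64 encoding of the binary digest, not
the usual hex string. Running scrub() on a schedule is what catches silent
corruption early, while a good copy may still exist somewhere.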

And as Eric posted here already, I also got an earful from him about
metadata not in a textual/editable format (e.g. using DB). Jim has already
responded quite well to that.

There was a short discussion about licenses, and I simply pointed out that
SVN uses the Apache License, that it is completely free, and it will always
be free. There was a question raised about whether we might possibly be
infringing on some patents, and I responded that we didn't know of any. The
person suggested that Larry/BitMover may have some patents in this area, but
someone else suggested that was a rumor, and I can find nothing on the
Internet (Larry?).

If you have any questions about the above stuff, want some more drill down,
etc, then (of course) just send an email.


(1) diff type selection is one of the things that changing txdelta's
    direction can do. The piece of code writing out the XML data can choose
    whatever format is appropriate (or selected by the user).
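
    A rough sketch of that kind of format selection, using Python's
    difflib as a stand-in. The function make_diff and the format names
    are hypothetical; this is not the txdelta code itself.

```python
import difflib


def make_diff(old: str, new: str, path: str, fmt: str = "unified") -> str:
    """Produce a textual diff in the format the caller (or user) selected."""
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    if fmt == "unified":
        lines = difflib.unified_diff(
            old_lines, new_lines, fromfile=path, tofile=path
        )
    elif fmt == "context":
        lines = difflib.context_diff(
            old_lines, new_lines, fromfile=path, tofile=path
        )
    else:
        raise ValueError(f"unknown diff format: {fmt!r}")
    return "".join(lines)


if __name__ == "__main__":
    before = "one\ntwo\nthree\n"
    after = "one\n2\nthree\n"
    print(make_diff(before, after, "foo.c", "unified"))
    print(make_diff(before, after, "foo.c", "context"))
```

    Both formats carry surrounding context lines, which is what lets a
    patch be applied to an input that is not byte-identical to the
    original.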

(2) I found a posting from Larry that appears to state his position
    reasonably well:

Greg Stein, http://www.lyra.org/
Received on Sat Oct 21 14:36:27 2006

This is an archived mail posted to the Subversion Dev mailing list.