
Re: reducing code bloat by removing svnpatch? (except unidiff)

From: Augie Fackler <lists_at_durin42.com>
Date: Sat, 29 Aug 2009 09:54:25 -0500

On Aug 28, 2009, at 8:03 PM, Mark Phippard wrote:

> I was going to suggest the same thing for the same reasons. I tried
> to find documentation and details on this format but couldn't. Which
> is why I did not comment.

Documentation for the format is pretty sparse. Here's an overview (now
that I've written this out, it might be more like "everything you need
to know to support this format other than the specifics of how base85
works"):

Copies and renames are both handled like this:
diff --git a/{oldname} b/{newname}
copy from {oldname}
copy to {newname}
{normal unidiff data here if there was a difference, only for how new
is different from old}

(naturally, with "rename from" and "rename to" instead of "copy from"
and "copy to" if it was a rename)
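As a rough illustration, here is a minimal Python sketch that builds just
the header for a copy or rename. The name git_copy_header is made up for
this example, and it assumes paths that need no quoting.

```python
def git_copy_header(old: str, new: str, rename: bool = False) -> str:
    """Build the git-style extended header for a copied or renamed file.

    Only the header is produced; unidiff hunks describing how the new
    file differs from the old one (if it differs at all) would follow it.
    """
    verb = "rename" if rename else "copy"
    return (
        f"diff --git a/{old} b/{new}\n"
        f"{verb} from {old}\n"
        f"{verb} to {new}\n"
    )


# Hypothetical example: trunk/README was copied to trunk/README.txt.
print(git_copy_header("trunk/README", "trunk/README.txt"), end="")
```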

New files:
diff --git a/alpha b/alpha
new file mode 100644
--- /dev/null
+++ b/alpha
@@ -0,0 +1,1 @@

(A different file mode is used to represent symlinks: they come in
with mode 120000, and the contents are listed as *just* the target path
of the symlink, that is, svn's symlink format without the leading 'link '.)
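A minimal Python sketch of emitting a new-file diff like the one above,
including the symlink case. git_new_file and svn_link_target are
hypothetical names, and the sketch assumes text content with "\n" line
endings. Because a symlink target has no trailing newline, its hunk ends
with the usual unidiff "\ No newline at end of file" marker.

```python
def svn_link_target(svn_special: str) -> str:
    """Strip the 'link ' prefix svn uses when storing a symlink's target."""
    prefix = "link "
    if not svn_special.startswith(prefix):
        raise ValueError("not an svn symlink representation")
    return svn_special[len(prefix):]


def git_new_file(path: str, content: str, symlink: bool = False) -> str:
    """Render a git-style diff that adds `path` with the given text content.

    For a symlink, `content` is the link target and the mode is 120000;
    otherwise the regular, non-executable mode 100644 is used.
    """
    mode = "120000" if symlink else "100644"
    out = [f"diff --git a/{path} b/{path}", f"new file mode {mode}"]
    lines = content.split("\n") if content else []
    if lines and lines[-1] == "":
        lines.pop()  # a trailing newline does not start another line
    if lines:  # an empty new file gets no hunk at all
        out += ["--- /dev/null", f"+++ b/{path}", f"@@ -0,0 +1,{len(lines)} @@"]
        out += ["+" + line for line in lines]
        if not content.endswith("\n"):
            out.append("\\ No newline at end of file")
    return "\n".join(out) + "\n"


print(git_new_file("alpha", "hello\n"), end="")
print(git_new_file("beta", svn_link_target("link alpha"), symlink=True), end="")
```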

Deleted files:
diff --git a/README b/README
deleted file mode 100644
--- a/README
+++ /dev/null
@@ -1,77 +0,0 @@
{pile of unidiff data}
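To show how a reader might tell these cases apart, here is a sketch that
classifies one file's extended header lines (those between "diff --git"
and the first "---", "@@" or "GIT binary patch" line). parse_git_header,
HEADER_KEYS and OPERATIONS are hypothetical names, and paths are taken
verbatim, without any unquoting.

```python
# Hypothetical mapping from header-line prefixes to the field each one sets.
HEADER_KEYS = {
    "new file mode ": "new_mode",
    "deleted file mode ": "old_mode",
    "old mode ": "old_mode",
    "new mode ": "new_mode",
    "copy from ": "source",
    "copy to ": "target",
    "rename from ": "source",
    "rename to ": "target",
}

# Header lines that also tell us what kind of change this is.
OPERATIONS = {
    "new file mode ": "add",
    "deleted file mode ": "delete",
    "copy from ": "copy",
    "rename from ": "rename",
}


def parse_git_header(lines: list[str]) -> dict[str, str]:
    """Classify one file's extended header; plain edits stay 'modify'."""
    info = {"op": "modify"}
    for line in lines[1:]:  # lines[0] is the "diff --git a/... b/..." line
        if line.startswith(("--- ", "@@", "GIT binary patch")):
            break  # the header ends where the file data begins
        for prefix, key in HEADER_KEYS.items():
            if line.startswith(prefix):
                info[key] = line[len(prefix):]
                info["op"] = OPERATIONS.get(prefix, info["op"])
    return info
```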

File mode changes:
diff --git a/README b/README
old mode 100644
new mode 100755
(I think patch implementations normally ignore the old mode and just
apply the new one blindly; I believe the old mode is mostly for human
consumption or reverse-patching. Also, you don't really ever see any
modes other than these two: the first is just "not executable", the
second is executable. I've never *seen* any other modes for regular
files.)
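A sketch of applying a mode change as described above, looking only at
the new mode. apply_mode is a hypothetical helper, and the choice to add
an execute bit wherever the matching read bit is set (so 0644 becomes
0755) is an assumption of this example, not something the format
specifies.

```python
import os
import stat


def apply_mode(path: str, new_mode: str) -> None:
    """Apply a git 'new mode' value to `path`, ignoring the old mode.

    Only the two regular-file modes are accepted. For 100755, an execute
    bit is added wherever the matching read bit is set (0644 -> 0755);
    for 100644, all execute bits are cleared.
    """
    perms = stat.S_IMODE(os.stat(path).st_mode)
    if new_mode == "100755":
        os.chmod(path, perms | ((perms & 0o444) >> 2))
    elif new_mode == "100644":
        exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
        os.chmod(path, perms & ~exec_bits)
    else:
        raise ValueError(f"unexpected mode for a regular file: {new_mode}")
```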

Binaries (this is for literals, which I believe contain the new binary
in its entirety. I think there is also a delta format, which Mercurial
does not support, per a "TODO: deltas" in its code):
index {old hash}..{new hash}
GIT binary patch
literal {file length}
{base85-encoded zlib-compressed data}
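A minimal sketch of producing a literal hunk in Python. It assumes the
layout git uses: the number on the "literal" line is the uncompressed
length, the zlib output is cut into 52-byte chunks, each line starts with
a character giving that chunk's byte count ('A'-'Z' for 1-26, 'a'-'z' for
27-52) followed by the chunk in base85, and a blank line ends the hunk.
Python's base64.b85encode uses the same alphabet as git-style binary
diffs. encode_binary_literal is a hypothetical name, and the "index" line
is left out.

```python
import base64
import zlib


def encode_binary_literal(data: bytes) -> str:
    """Render a 'GIT binary patch' literal hunk carrying all of `data`."""
    compressed = zlib.compress(data)
    lines = ["GIT binary patch", f"literal {len(data)}"]
    for i in range(0, len(compressed), 52):
        chunk = compressed[i:i + 52]
        n = len(chunk)
        # The leading character is the chunk's byte count, not an end marker.
        count = chr(ord("A") + n - 1) if n <= 26 else chr(ord("a") + n - 27)
        # pad=True zero-pads the last 4-byte group, so every group is 5 chars.
        lines.append(count + base64.b85encode(chunk, pad=True).decode("ascii"))
    return "\n".join(lines) + "\n\n"  # the blank line terminates the hunk
```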

The hashes are the nullid (40 0's) if there was (or is) no text;
otherwise they are
sha1('blob %d\0%s' % (len(text), text)) (using Python syntax).
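The hash computation translates directly into Python. git_blob_id is a
hypothetical name, and the sketch uses None for a side where the file
does not exist.

```python
import hashlib
from typing import Optional

NULL_ID = "0" * 40  # stands in for a side where the file does not exist


def git_blob_id(text: Optional[bytes]) -> str:
    """Return the id used on an 'index <old>..<new>' line.

    None means the file is absent on that side; any bytes, including an
    empty file, are hashed as sha1(b'blob <len>\\0' + text).
    """
    if text is None:
        return NULL_ID
    return hashlib.sha1(b"blob %d\0" % len(text) + text).hexdigest()
```

For example, git_blob_id(b"") gives e69de29bb2d1d6434b8b29ae775ad8c2e48c5391,
the id git itself uses for an empty file.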
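The base85 data is split into lines, and the first character of each
line gives the number of decoded bytes on that line: A-Z mean 1-26
bytes, and a-z mean 27-52. Full lines carry 52 bytes, so every line but
the last starts with z; the last line is usually shorter, which is why it
starts with some other letter (e, for example, means 31 bytes). There's
more to base85 than that, and I'd gladly take some time to write down how
it works for those interested in reimplementing it. At first glance it
appears to be the part that requires the most detailed documentation.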
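Going the other direction, here is a sketch of decoding the data lines of
a literal hunk. It assumes git's convention that the leading character is
the decoded byte count for that line, so full 52-byte lines start with z
and the final, shorter line starts with another letter.
decode_binary_literal is a hypothetical name.

```python
import base64
import zlib


def decode_binary_literal(lines: list[str], size: int) -> bytes:
    """Decode the base85 lines that follow a 'literal <size>' line.

    'A'-'Z' mean 1-26 bytes and 'a'-'z' mean 27-52. Decoding stops at the
    blank line that ends the hunk (or at the end of `lines`).
    """
    compressed = bytearray()
    for line in lines:
        if not line:
            break
        c, body = line[0], line[1:]
        if "A" <= c <= "Z":
            n = ord(c) - ord("A") + 1
        elif "a" <= c <= "z":
            n = ord(c) - ord("a") + 27
        else:
            raise ValueError(f"bad length character {c!r}")
        # Each group of 5 characters encodes 4 bytes, the last one zero-padded.
        if len(body) != (n + 3) // 4 * 5:
            raise ValueError("line length does not match its byte count")
        compressed += base64.b85decode(body)[:n]
    data = zlib.decompress(bytes(compressed))
    if len(data) != size:
        raise ValueError("decoded size does not match the 'literal' line")
    return data
```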

Supporting binary deltas can obviously come later, since Mercurial
doesn't appear to implement them at all right now. Note that with binary
literals, reverse application (patch -R) does not appear to be possible,
because only the new contents are included. That's really a minor
limitation, as you're never using this on non-version-controlled files
anyway.

It should be possible to add your own hunk types to this format (once
the base format is supported) that would allow you to transmit
property modifications inline as well (other than symlink and
executable, which are already handled sanely).

For those interested in some source code (GPLv2):

I'm a big fan of nonproliferation of custom diff formats, so it'd be
really great if this could work out. I just don't have the time or
enough motivation to write the implementation for svn. I hope the
contents of this email help!

> ISTR glasser suggested this recently too.
> Good thing is that all the unidiff work done recently would be very
> relevant for this too.

Received on 2009-08-29 16:54:45 CEST

This is an archived mail posted to the Subversion Dev mailing list.
