[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [RFC] Proposal for GSoC project - extend 'svn {patch,diff}' with git unidiff format

From: Stefan Sperling <stsp_at_elego.de>
Date: Mon, 5 Apr 2010 13:41:03 +0200

On Mon, Apr 05, 2010 at 10:02:56AM +0200, Daniel Näslund wrote:
> Hi!
> I'm supposed to send this proposal to the Google Summer of Code
> machinery and let it be forwarded to the interrested mentor of the
> Subversion community, in this case Stefan. In the interrest of openess
> I'm posting it here before sending it off to Google later today. Maybe
> someone has something they'd like to add?
> ===========================================================
> Git unidiff format extension to 'svn patch' and 'svn diff'
> ===========================================================
> Contents
> ==========
> Suggested workflow
> What is the git unidiff format?
> Parsing the git headers
> Applying tree changes
> Applying mode changes
> Applying binary patches
> Applying property changes
> Suggested workflow
> ----------------------
> Here's my project proposal for GSoC 2010. The purpose of the project is
> pretty self-explanatory; make 'svn patch' and 'svn diff' able to deal
> with git unidiff extensions. I've tried to point out some of the API
> changes that are neccessary to show that I have an understanding of what
> to do. If I'd get accepted I would do things in this order:
> 1) Rev funcs to allow a use_git_format flag to be passed down to
> libsvn_diff and create git diff format patches for adds and deletes.
> Write a copule of tests to verify that we get the intended
> format.

It's acceptable to always produce the information in the diff because
patch tools which don't understand it will just ignore it as noise.
So we may not even need a flag.
> 2) Add the ability to track renames and copies in libsvn_diff. Probably
> by using some wc funcs for getting the status. My first assumption
> was that the svn_wc_diff_callbacks4_t vtable would be revved to allow
> for copied and moved scenarios once we have editor-v2. But Neels was
> talking about some bigger rewrite where the diff editor would be
> dropped. Anyway, as goes for the 'git unidiff format' work, I need
> some way to detect copies and moves. When I have detection, add the
> git headers for copies and renames and write tests to confirm the
> right behavior.

It's a pity that diff is implemented with an editor because it makes
your task much harder than necessary (you don't have all information
you need available while the diff is produced piece-by-piece).

Can you think of a way to do this in a simpler fashion?
I think this could be done simpler by checking all files which
appear in the diff for 'copyfrom' info.
Let's assume your diff contains files A, B and C:

  If you find a single relation A->B,
    if A was deleted: A was moved to B
    if A was not deleted: A was copied to B
  If you find multiple relations A->B, A->C:
    A was copied to B and to C
    maybe A was also deleted

But maybe because of the editorness the receiver (i.e. diff text
producer) doesn't know about A when dealing with B and vice-versa,
because it only gets to deal with each of them in isolated function
calls? Bah. That is really ugly.
Maybe you can fix this by tweaking the way the diff editor is driven?
Note though that the same editor is used for merge!
Or maybe you should rewrite diff this year and then do the
git-diff-extensions project next year? :)

> 3) Determine how the base85 format works and write C-tests to confirm
> the behavior. Git does it like this: [4]

There was code to do base85 stuff in the original svnpatch implementation.
It's been deleted from trunk but you can dig it out of the history if
that helps.

Note that we *cannot* use code from git because git is GPL'd.

> 4) Pass down a flag for allowing or disallowing binary diffs to
> libsvn_diff. Detect binary files and write the patches. Write tests
> to confirm the behavior.

You don't need to support binary files in the first pass if you
don't want to. I'd probably put handling of binary files as an optional
item just to keep some room for unexpected difficulties.

> 5) Allow 'svn patch' to apply git diff formats for adds and deletes.
> Write tests to confirm the behavior.
> 6) Allow 'svn patch' to apply git diff formats for moves and copies.
> 7) Allow 'svn patch' to apply git diff formats for binary patches. I
> propably need to do some thinking about what state the wc can be in
> as for obstructed, missing, replaced, unversioned, ignored nodes and
> so on.
> 8) Make libsvn_diff able to record modes. Probably we're only
> interrested in the executable bit and that one can we get from
> svn:executable. Write tests to confirm the behavior.

Yeah we only care about the X bit.
The rest is done by the user's umask.
> 9) Allow 'svn patch' to apply mode changes (if we agree that we want
> that behavior):

It would be nice to have svn path understand about the X bit.

> 10) Decide on a header for dealing with props? Do we need to stay
> compatible with git and diff? Probably, so we need a header that will be
> ignored by applications not interrested in svn:properties.

We don't need to stay compatible, just do it in a way that will cause
git and hg to ignore our property data as noise.

> 11) Decide on the header format for properties. Implement it in the diff
> code and write tests for it.
> 12) Extend the diff parser to deal with property diffs. Write tests.
> 13) Done.


> What is the git unidiff format?
> --------------------------------
> The format is thoroughly described in [1] so I'll just recapitulate the
> use cases for it:
> 1) Track copies and renames
> 2) File mode changes
> 3) Binary patches
> Creating the git headers
> -------------------------
> A couple of funcs needs to be revved to pass down the neccessary
> parameters telling libsvn_diff to create a git diff. And we need a way
> to detect copies and renames.
> subversion/libsvn_client/diff.c
> (svn_client_diff5): We need a parameter to tell the diff machinery we
> want a git diff.

Or just do it by default. No need to rev the function.

> (svn_wc_diff_callbacks4_t): We have callbacks for changed, added and
> deleted nodes but none for copied or moved nodes. Since we don't
> have editor-v2 we can't get that info from the server so git diffs
> should only be possible for wc-wc diffs at the moment. At the moment
> I'll probably check the status of the path that we get in
> file_added() and record copied-from or moved-from.
> subversion/libsvn_diff/diff_file.c
> (svn_diff_file_diff2):
> Parsing the git headers
> ------------------------
> We have examples of how the parsing should be done from the mercurial
> source code [2]. (This link was found in the notes document referred
> above. A big thank you to Augie Fackler for taking the time to write
> down all the information).
> subversion/libsvn_diff/parse-diff.c
> (parse_git_hunk_header): Create this func to be invoked before
> parse_hunk_header(). Captures oldname, newname, operation and mode.
> Applying tree changes
> -----------------------
> We already have many different scenarios to handle with nodes beeing
> obstructed, missing, ignored, unversioned and so on. If we'll track tree
> changes the number of scenarios will increase. I probably should make
> some kind of graph to map out the possible scenarios.
> subversion/libsvn_client/patch.c
> (install_patched_target): Here we're currently handling deletes, adds
> and modifications. With the git diff format we can handle copies
> and moves here too.
> Applying mode changes
> -------------------------
> Subversion does not allow file permissions to be recorded. I
> assume it's since it's hard to make those portable between windows fs
> and non-windows fs [3]. We'll have to make a decision as to whether 'svn
> patch' and 'svn diff' should be able to deal with applying permissions.
> As I see it, version control is about tracking file contents, not that
> kind of userdata but if someone has a good usecase let me hear! From
> what I understand it, mode changes are mostly used for setting the
> executable bit but we have svn:executable for that. Hrm, that of course
> can't be used yet since we can't use property diffs. :-)
> Applying binary patches
> -------------------------
> subversion/libsvn_client/patch.c
> (init_patch_target): If the content is binary it will be encoded with
> base85. A really, really small possibility but the translated stream
> might translate something encoded as a keyword.

Don't translate binary files then :)

> Applying property changes
> --------------------------
> Subversion has properties and it would be great if those could be
> included in patches. We have a diff format for properties that patch(1)
> (and hopefully the rest of the patch family) ignores, e.g. they can be
> displayed without beeing interpreted by the parser. We need a header
> format that tells the parser on what lines in the patch we have the
> properties. All the action is in:
> subversion/libsvn_client/diff.c
> (display_prop_diffs)

I like the proposal. The only thing that worries me is that we may
need to tackle the diff-is-implemented-with-the-merge-editor hack.
Note that this hack is already biting us elsewhere:

Maybe we should just bite the bullet and try to re-implement (or "fix")
diff together before or during gsoc? It certainly would not hurt to
have this problem solved before adding the git extensions.

Received on 2010-04-05 13:41:39 CEST

This is an archived mail posted to the Subversion Dev mailing list.