
[RFC] Proposal for GSoC project - extend 'svn {patch,diff}' with git unidiff format

From: Daniel Näslund <daniel_at_longitudo.com>
Date: Mon, 5 Apr 2010 10:02:56 +0200

Hi!

I'm supposed to send this proposal to the Google Summer of Code
machinery and let it be forwarded to the interested mentor in the
Subversion community, in this case Stefan. In the interest of openness
I'm posting it here before sending it off to Google later today. Maybe
someone has something they'd like to add?

===========================================================
Git unidiff format extension to 'svn patch' and 'svn diff'
===========================================================

Contents
==========
Suggested workflow
What is the git unidiff format?
Creating the git headers
Parsing the git headers
Applying tree changes
Applying mode changes
Applying binary patches
Applying property changes

Suggested workflow
----------------------
Here's my project proposal for GSoC 2010. The purpose of the project is
pretty self-explanatory: make 'svn patch' and 'svn diff' able to deal
with git unidiff extensions. I've tried to point out some of the API
changes that are necessary, to show that I have an understanding of what
to do. If I get accepted, I would do things in this order:

1) Rev funcs to allow a use_git_format flag to be passed down to
   libsvn_diff and create git diff format patches for adds and deletes.
   Write a couple of tests to verify that we get the intended
   format.

2) Add the ability to track renames and copies in libsvn_diff. Probably
   by using some wc funcs for getting the status. My first assumption
   was that the svn_wc_diff_callbacks4_t vtable would be revved to allow
   for copied and moved scenarios once we have editor-v2. But Neels was
   talking about some bigger rewrite where the diff editor would be
   dropped. Either way, for the 'git unidiff format' work I need
   some way to detect copies and moves. Once I have detection, add the
   git headers for copies and renames and write tests to confirm the
   right behavior.

3) Determine how the base85 format works and write C-tests to confirm
   the behavior. Git does it like this: [4]

4) Pass down a flag for allowing or disallowing binary diffs to
   libsvn_diff. Detect binary files and write the patches. Write tests
   to confirm the behavior.

5) Allow 'svn patch' to apply git diff formats for adds and deletes.
   Write tests to confirm the behavior.

6) Allow 'svn patch' to apply git diff formats for moves and copies.

7) Allow 'svn patch' to apply git diff formats for binary patches. I
   probably need to do some thinking about what state the wc can be in
   with regard to obstructed, missing, replaced, unversioned, ignored
   nodes and so on.

8) Make libsvn_diff able to record modes. Probably we're only
   interested in the executable bit, and that one we can get from
   svn:executable. Write tests to confirm the behavior.

9) Allow 'svn patch' to apply mode changes (if we agree that we want
   that behavior).

10) Decide on a header for dealing with props. Do we need to stay
    compatible with git and diff? Probably, so we need a header that will be
    ignored by applications not interested in svn:properties.

11) Decide on the header format for properties. Implement it in the diff
    code and write tests for it.

12) Extend the diff parser to deal with property diffs. Write tests.

13) Done.

What is the git unidiff format?
--------------------------------
The format is thoroughly described in [1] so I'll just recapitulate the
use cases for it:

1) Track copies and renames
2) File mode changes
3) Binary patches

Creating the git headers
-------------------------
A couple of funcs need to be revved to pass down the necessary
parameters telling libsvn_diff to create a git diff. And we need a way
to detect copies and renames.

subversion/libsvn_client/diff.c
  (svn_client_diff5): We need a parameter to tell the diff machinery we
    want a git diff.
  (svn_wc_diff_callbacks4_t): We have callbacks for changed, added and
    deleted nodes but none for copied or moved nodes. Since we don't
    have editor-v2 we can't get that info from the server, so git diffs
    should only be possible for wc-wc diffs at the moment. For now
    I'll probably check the status of the path that we get in
    file_added() and record copied-from or moved-from.

subversion/libsvn_diff/diff_file.c
  (svn_diff_file_diff2):
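
As an illustration only, the extended header lines that git places
before the usual ---/+++ lines can be generated roughly as below. This
is a Python sketch, not libsvn_diff code: the function name, the
operation strings and the fixed 100644 mode are assumptions made for the
example, and git itself normally also emits a "similarity index" line
for renames and copies, which is left out here.

```python
def git_diff_header(operation, path, copyfrom_path=None):
    """Return the git extended header lines for one changed path.

    operation is one of "added", "deleted", "modified", "copied" or
    "moved" (hypothetical names, chosen for this sketch).
    """
    if operation in ("copied", "moved"):
        if copyfrom_path is None:
            raise ValueError("copies and moves need a source path")
        verb = "copy" if operation == "copied" else "rename"
        return [
            "diff --git a/%s b/%s" % (copyfrom_path, path),
            "%s from %s" % (verb, copyfrom_path),
            "%s to %s" % (verb, path),
        ]

    lines = ["diff --git a/%s b/%s" % (path, path)]
    if operation == "added":
        # Assumed default mode for a regular, non-executable file.
        lines.append("new file mode 100644")
    elif operation == "deleted":
        lines.append("deleted file mode 100644")
    elif operation != "modified":
        raise ValueError("unknown operation: %r" % (operation,))
    return lines


if __name__ == "__main__":
    print("\n".join(git_diff_header("moved", "new.c", "old.c")))
```

The copied-from or moved-from path recorded in file_added() would play
the role of copyfrom_path here.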

Parsing the git headers
------------------------
We have examples of how the parsing should be done from the mercurial
source code [2]. (This link was found in the notes document referred
to above. A big thank you to Augie Fackler for taking the time to write
down all the information).

subversion/libsvn_diff/parse-diff.c
  (parse_git_hunk_header): Create this func to be invoked before
    parse_hunk_header(). Captures oldname, newname, operation and mode.
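
To make the idea concrete, here is a small Python sketch of such a
header parser. It is not the proposed C function: the name
parse_git_header, the dictionary layout and the operation strings are
assumptions for the example, and paths containing " b/" would need more
careful handling than the simple regular expression gives them.

```python
import re

_DIFF_GIT = re.compile(r"^diff --git a/(.+) b/(.+)$")


def parse_git_header(lines):
    """Collect oldname, newname, operation and mode from git header lines.

    Parsing stops at the first '--- ' or '@@ ' line, where the ordinary
    unidiff parser would take over.
    """
    info = {"oldname": None, "newname": None,
            "operation": "modified", "mode": None}
    for line in lines:
        match = _DIFF_GIT.match(line)
        if match:
            info["oldname"], info["newname"] = match.group(1), match.group(2)
        elif line.startswith("new file mode "):
            info["operation"] = "added"
            info["mode"] = line[len("new file mode "):]
        elif line.startswith("deleted file mode "):
            info["operation"] = "deleted"
            info["mode"] = line[len("deleted file mode "):]
        elif line.startswith("rename from "):
            info["operation"] = "moved"
            info["oldname"] = line[len("rename from "):]
        elif line.startswith("rename to "):
            info["newname"] = line[len("rename to "):]
        elif line.startswith("copy from "):
            info["operation"] = "copied"
            info["oldname"] = line[len("copy from "):]
        elif line.startswith("copy to "):
            info["newname"] = line[len("copy to "):]
        elif line.startswith("new mode "):
            info["mode"] = line[len("new mode "):]
        elif line.startswith(("--- ", "@@ ")):
            break
    return info


if __name__ == "__main__":
    header = ["diff --git a/old.c b/new.c",
              "rename from old.c",
              "rename to new.c"]
    print(parse_git_header(header))
```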

Applying tree changes
-----------------------
We already have many different scenarios to handle, with nodes being
obstructed, missing, ignored, unversioned and so on. If we track tree
changes the number of scenarios will increase. I probably should make
some kind of graph to map out the possible scenarios.

subversion/libsvn_client/patch.c
  (install_patched_target): Here we're currently handling deletes, adds
    and modifications. With the git diff format we can handle copies
    and moves here too.
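
One cheap way to start mapping out the scenarios is to enumerate every
combination of patch operation and target node state and mark each one
until none is left undecided. The Python sketch below does only that
bookkeeping. The "normal" state and the "undecided" placeholder are
assumptions added for the example, and the lists are not meant to be
complete.

```python
from itertools import product

# Operations a git-format patch can ask for on a target.
OPERATIONS = ("add", "delete", "modify", "copy", "move")

# Node states named in this section, plus an assumed "normal" state.
NODE_STATES = ("normal", "obstructed", "missing", "ignored", "unversioned")


def scenario_table():
    """Return a dict with one entry per (operation, node state) pair."""
    return {pair: "undecided" for pair in product(OPERATIONS, NODE_STATES)}


def undecided(table):
    """List the scenarios that still have no agreed behavior."""
    return sorted(pair for pair, outcome in table.items()
                  if outcome == "undecided")


if __name__ == "__main__":
    table = scenario_table()
    table[("add", "normal")] = "apply"  # example decision, not a proposal
    for operation, state in undecided(table):
        print("%-7s on %-11s node: undecided" % (operation, state))
```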

Applying mode changes
-------------------------
Subversion does not allow file permissions to be recorded. I
assume that's because it's hard to make those portable between windows
fs and non-windows fs [3]. We'll have to make a decision as to whether
'svn patch' and 'svn diff' should be able to deal with applying
permissions. As I see it, version control is about tracking file
contents, not that kind of userdata, but if someone has a good use case
let me hear! From what I understand, mode changes are mostly used for
setting the executable bit, but we have svn:executable for that. Hrm,
that of course can't be used yet since we can't use property diffs. :-)
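
If we do decide to honor mode changes, reducing a git "old mode" /
"new mode" pair to an svn:executable change could look roughly like the
Python sketch below. The function name and the "set" / "delete" return
values are made up for the example, and everything except the execute
bits is deliberately ignored.

```python
def executable_change(old_mode, new_mode):
    """Map a git mode change to an action on svn:executable.

    Modes are octal strings as they appear in git headers, for example
    "100644" or "100755". Returns "set", "delete" or None.
    """
    def is_executable(mode):
        # Any of the user/group/other execute bits counts.
        return bool(int(mode, 8) & 0o111)

    was, now = is_executable(old_mode), is_executable(new_mode)
    if was == now:
        return None
    return "set" if now else "delete"


if __name__ == "__main__":
    print(executable_change("100644", "100755"))
```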

Applying binary patches
-------------------------

subversion/libsvn_client/patch.c
  (init_patch_target): If the content is binary it will be encoded with
    base85. There is a really, really small possibility that the
    translated stream treats part of the encoded data as a keyword and
    translates it.
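
For experimenting with the encoding before writing the C tests,
Python's base64.b85encode is documented as producing git-style base85.
The sketch below wraps it in the per-line layout that, as far as I
understand git's binary patches, prefixes each line with a length
character ('A'-'Z' for 1-26 bytes, 'a'-'z' for 27-52 bytes). The
function names are made up for the example, and the zlib compression
that git applies to the data before encoding is left out.

```python
import base64


def encode_line(chunk):
    """Encode up to 52 bytes as one length-prefixed base85 line."""
    size = len(chunk)
    if not 1 <= size <= 52:
        raise ValueError("a line carries between 1 and 52 bytes")
    if size <= 26:
        prefix = chr(ord("A") + size - 1)
    else:
        prefix = chr(ord("a") + size - 27)
    # pad=True zero-fills the input to a multiple of 4 bytes, so every
    # group of 4 bytes becomes exactly 5 characters.
    return prefix + base64.b85encode(chunk, pad=True).decode("ascii")


def decode_line(line):
    """Decode one line produced by encode_line back into bytes."""
    prefix = line[0]
    if "A" <= prefix <= "Z":
        size = ord(prefix) - ord("A") + 1
    elif "a" <= prefix <= "z":
        size = ord(prefix) - ord("a") + 27
    else:
        raise ValueError("bad length character: %r" % prefix)
    # Drop the zero padding that was added before encoding.
    return base64.b85decode(line[1:])[:size]


if __name__ == "__main__":
    line = encode_line(b"hello")
    print(line, decode_line(line))
```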

Applying property changes
--------------------------
Subversion has properties and it would be great if those could be
included in patches. We have a diff format for properties that patch(1)
(and hopefully the rest of the patch family) ignores, i.e. they can be
displayed without being interpreted by the parser. We need a header
format that tells the parser on which lines in the patch we have the
properties. All the action is in:

subversion/libsvn_client/diff.c
  (display_prop_diffs)

cheers,
Daniel

[1] notes/svnpatch/svnpatch-git.txt
[2] http://mercurial.selenic.com/hg/hg/file/ac02b43bc08a/mercurial/patch.py#l195
[3] http://pagesperso-orange.fr/b.andre/permissions.html
[4] http://git.kernel.org/?p=git/git.git;a=blob;f=base85.c;hb=HEAD
Received on 2010-04-05 10:04:20 CEST

This is an archived mail posted to the Subversion Dev mailing list.