
Re: Extensible changeset format proposal

From: Stefan Sperling <stsp_at_elego.de>
Date: Thu, 26 Aug 2010 13:16:44 +0200

On Thu, Aug 26, 2010 at 12:57:47PM +0300, anatoly techtonik wrote:
> Hello,
>
> Don't you think it is time to design an extensible changeset format
> for exchanging information about changesets between systems?
>
> Right now I am struggling to extract full information from an uncommitted
> Subversion changeset in order to upload it for review (in the Rietveld
> project). The Rietveld code review tool was initially designed to work
> with Subversion, but so far it is still impossible to get a complete
> diff of changes from SVN that a reviewer can apply to their working copy
> and commit after review. The problem with getting a complete diff is twofold:
>
> 1. Subversion data for an uncommitted changeset is scattered and it is
> hard to say whether it is ever complete.
> 2. "svn diff" format is too limited.
>
> For the first part I can give an example of the problem I am trying to
> solve currently - 'Rietveld code review data is missing files that
> were created as a result of an "svn copy" or "svn move" operation'. If a
> text file is added with "svn add" - its contents will appear in "svn
> diff" output, but text files created as a result of "svn move" or "svn
> copy" operation will not.

In trunk, svn diff has a --show-copies-as-adds option, which causes copied
and moved files to be displayed even if they weren't modified after being
copied/moved. This will be released in 1.7.
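
As a rough sketch of how a review-upload script might use this option,
assuming an svn binary from 1.7 or newer is on the PATH. The helper names
build_diff_command and working_copy_diff are made up for illustration:

```python
import subprocess


def build_diff_command(path="."):
    """Build the argv list for an 'svn diff' that shows copies in full."""
    # --show-copies-as-adds makes copied/moved files appear as complete
    # additions, even when they were not modified after the copy.
    return ["svn", "diff", "--show-copies-as-adds", path]


def working_copy_diff(path="."):
    """Run the command and return the diff text (needs a real working copy)."""
    result = subprocess.run(
        build_diff_command(path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if svn exits non-zero
    )
    return result.stdout
```

Only working_copy_diff actually invokes svn; build_diff_command can be
checked without a working copy.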

> To get this missing information one needs to
> run "svn status", check for the presence of copied or moved files
> (marked with "A +"), check that these files are not binary, manually
> reconstruct the change chunk for them and append the missing data to the
> output of "svn diff". But even after that the reviewer still won't be able
> to exactly reproduce the changeset, because the "svn diff" format will not
> contain information about the source of a copied or moved file. And here
> comes the second part.

svn diff does show deleted files, so it also shows the delete half
of each move. With the new --git option of svn diff, you get headers
which tell you where something was copied from.
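
A minimal sketch of reading those headers on the receiving side. The sample
text is hand-written in git's extended header style ("copy from" / "copy to"),
not captured from svn, so the exact header text svn emits may differ in
detail. The function name find_copies is hypothetical:

```python
# Hand-written sample in git extended diff style (an assumption, not real
# svn output).
SAMPLE_DIFF = """\
diff --git a/old/name.c b/new/name.c
copy from old/name.c
copy to new/name.c
--- a/old/name.c
+++ b/new/name.c
@@ -1 +1 @@
-int x;
+int y;
"""


def find_copies(diff_text):
    """Return (source, destination) pairs from copy headers in a diff."""
    copies = []
    source = None
    for line in diff_text.splitlines():
        # Hunk content lines start with ' ', '+' or '-', so they can never
        # be mistaken for these headers.
        if line.startswith("copy from "):
            source = line[len("copy from "):]
        elif line.startswith("copy to ") and source is not None:
            copies.append((source, line[len("copy to "):]))
            source = None
    return copies
```

Calling find_copies(SAMPLE_DIFF) yields one pair, old/name.c to new/name.c.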

> The "svn diff" format doesn't record enough information to reproduce
> a committed changeset. For example, it doesn't have data about the source of
> copied and moved files. This is believed to be solved by the "git diff"
> format, but it won't be a panacea either, because Subversion
> changesets also contain information about properties, mime types etc.

svn diff and svn patch in trunk can show and apply property diffs,
respectively. This will be released in 1.7.

> It is also impossible to include binary files (if needed) or original
> author info (can be useful for contribulyzer), or any other information
> that a given VCS (Subversion in this case) needs to completely
> reconstruct its own changeset.

Support for binary data is on the todo list for svn diff / svn patch.
Nothing has been implemented yet.

Showing author information is interesting, though in the general case
where a diff spans multiple revisions it may not be very useful.
But note also that in Subversion trunk, svn log has a --diff option which
shows the committed diff beneath the log message (which includes author
and date information). This will also be released in Subversion 1.7.

> For code reviews, ideally, a code review system such as Rietveld should
> grab the changeset, parse it and extract relevant information for the
> reviewer (skipping or filtering non-interesting parts and giving
> warnings about unknown parts). It should also save the original or filtered
> changeset file to be imported and committed if the review is successful.
>
>
> That's why an extensible changeset format is required. It will not only
> be useful for sending changesets for review, but also for
> synchronizing changes with other VCSes. With a new changeset format a
> mirroring tool could automatically analyze incoming data to find
> Subversion-related attributes to save them into the repository directly
> and automatically save all other attributes to properties.

You realise that it's often impossible to represent data generated
by one version control tool in another version control tool?
If that was an easy problem, the company I work for would be out of
business because nobody would need our help. We're often migrating data
between version control systems, and there is always compromise involved.

Some things, like add/delete, and maybe even copy (unless you count older
systems like CVS), are virtually universal.
But renames are already represented very differently in virtually every tool.
Directories are another example -- some tools version them, some don't.
And most metadata, like EOL-style and character set of files, commit author
information, list of files touched by a changeset, etc., is represented in very
different and sometimes incompatible ways, and sometimes not at all.

There is no single data format that can really solve this problem.
Version control tools differ. In general, you cannot magically mirror every
aspect of a change made in one tool to another tool.

I'm not saying that a common changeset exchange data format would be useless.
It would certainly help if all tools had a unified way of exporting and
importing changesets. But it will always be limited to handling the lowest
common denominator, which often isn't enough. The svn diff --git format is
the best we've got so far. It's not perfect, but it's a good step forward.

> I see this format as an XML format that resembles an Atom feed, with
> a logical order of events (i.e. file removed after it was copied etc.).
> Subversion already uses XML formats internally,

Subversion uses virtually no XML internally.
It can produce some XML for presentation, but data isn't being stored
as XML inside of Subversion.

> so I logically assume
> that folks here possess the required experience and may even have some
> ready pieces to work out an initial draft of such a format.

We've added the --git option to svn diff, which produces output compatible
with Mercurial and git for some common operations (add, delete, copy).
That's a common denominator, and the format is nice because it is readable.

svn diff also has an --xml option which makes it produce XML output.
Currently that only works in --summarize mode, and only for repository to
repository diffs. You cannot use it to show changes in a working copy.
I guess if there really is a need we could extend the XML output.
But I think the --git diff format is nicer, because it contains more
information and is already usable by at least two other tools. Maybe
more tools will start to support it, now that Subversion also supports it.
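
For comparison, a small sketch of consuming the XML summary. The sample below
is hand-written to resemble "svn diff --summarize --xml" output (without the
leading XML declaration), the repository URLs are placeholders, and
summarize_changes is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Hand-written sample modelled on the summarize XML; URLs are placeholders.
SAMPLE_XML = """\
<diff>
<paths>
<path
   props="none"
   kind="file"
   item="modified">http://svn.example.com/repos/trunk/foo.c</path>
<path
   props="none"
   kind="file"
   item="added">http://svn.example.com/repos/trunk/bar.c</path>
</paths>
</diff>
"""


def summarize_changes(xml_text):
    """Return (item, kind, url) tuples for each changed path."""
    root = ET.fromstring(xml_text)
    return [
        (path.get("item"), path.get("kind"), path.text)
        for path in root.iter("path")
    ]
```

This only tells you which paths changed and how, not the content changes,
which is part of why the --git diff format carries more information.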

I hope the new features I've listed above will help you solve the problems
you're trying to solve. If you have further ideas about how they can be
improved, please share them.

Thanks,
Stefan
Received on 2010-08-26 13:17:46 CEST

This is an archived mail posted to the Subversion Dev mailing list.