Improved patch format - SoC

From: Nicolás Lichtmaier <nick_at_reloco.com.ar>
Date: 2007-03-31 22:50:07 CEST

Hi, when I learned about SoC I started looking at the patch format task.
But I didn't post anything, and that task has already been asked for.
Anyway, near one of the deadlines, I submitted a proposal. I don't want
to step on Charles Acknin's toes; it would be great if we could
cooperate, both in implementing this and in learning about Subversion
coding. Or I can do some other Subversion task, if that's undesirable.
Or even nothing =). Anyway, I'll post here some of the thoughts I had
about the issue.

This is what I have written:

== Introduction ==

The diff/patch tools have been great tools. The format they use (called
"unified diff") has become a standard, and has been integrated into many
development processes, often with tools that provide means to attach
patches to issues. In designing a new format, we should learn from what
made them successful:

 * A format which serves both for automatic patching and reviewing.
 * A format so simple that you can even edit it with a text editor.
 * A terse format. Metadata gets out of the way, so it has become the
preferred format to review changes.

An interesting use case of a patch format is Bugzilla. Bugzilla was
designed around CVS, and it helps a community coordinate its work by
attaching patches to issues. Those patches are reviewed, approved and
committed.

Just as Subversion took the best of CVS concepts and created a better
version control system, we should take this format and enhance it so
that it can serve Subversion's needs. There are three new needs:

 * The ability to describe tree modifications (renames, deletions, etc.).
 * The ability to describe modifications to properties.
 * The ability to handle modifications to binary files.

== Requirements ==

 * The format must be easily readable, and the metadata should stay out
of the way. It shouldn't be verbose. E.g.: I wouldn't put information in
the style of RFC-822 headers. The current format's terse metadata is an
example of this.

 * The patch should convey all the meaning implied in a 'svn merge'
operation that is considered reasonable. A patch should be like tearing
the merge action into two steps, i.e.: 'diff + patch = merge'.

 * As the patch format should be designed so it can be used by other
tools, all Subversion-specific references should be tagged as such. This
doesn't necessarily mean complicating the format or overengineering it.
It just means leaving some room for expansion (e.g. ignoring unknown
merge-tracking info).

 * The old format must keep working. This new feature should be
implemented with a new switch. Or... currently there's no support for
tree modifications, so perhaps if there are no tree modifications the
format would be compatible...

 * Support for binary diffs. Binary diffs would just be diffs created
with the binary diff algorithm already present in Subversion, and then
encoded in base64 to get a text representation. The algorithm name
should be stated so that it can change in the future.

 * The new format should handle properties. IMO they should be diffed as
if each property value were a file (so it is easy to see which lines
have been added to svn:ignore).

== Relationship with merge-tracking ==

As patching from an improved patch should work like a normal merge,
there are implications related to the merge-tracking functionality. Of
course, this applies only when patching within the same repository.

At first glance, it seems that just including the merge info property
would suffice. It would need to be special-cased though, to resolve the
"elision", i.e. to include the merge info when that info is being
inherited from a parent directory.

The most common case will be that patches and WCs are all from the same
repository, as patches are exchanged in a given coding community working
around a common repository. Cross-repository diff+patch case should be
studied carefuly. I would just ignore the merge info in cross-repo
pathes (perhaps a "force" switch could be used). This means that the
GUID of the repository should be included in the patch.
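
A rough sketch of that check, in Python for brevity. The function and
parameter names are invented for illustration, and reading "force" as
"keep the merge info even across repositories" is only one possible
interpretation of the switch.

```python
def merge_info_to_apply(patch_repo_id, wc_repo_id, merge_info, force=False):
    """Return the merge info to record, or None when it should be ignored.

    patch_repo_id is the repository identifier stored in the patch and
    wc_repo_id the one of the working copy being patched (both are
    hypothetical inputs for this sketch).
    """
    same_repository = patch_repo_id == wc_repo_id
    if same_repository or force:
        return merge_info
    # Cross-repository patch: the merge info refers to another
    # repository's history, so drop it.
    return None
```

The text changes of the patch would still be applied in the
cross-repository case; only the merge-tracking data is dropped.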

== Design notes ==

I don't think the new format should be 'unified diff' compatible. I see
much more value in showing the tree modifications more clearly. E.g.: A
file rename should be marked as a file rename, not as the whole file
disappearing (with '-' lines) and reappearing (with '+' lines).

Implementation idea for the above: Each diff part could have a
'copy-from' field and a 'copy-to' field. This would allow for a clear
display of files that have been modified after being copied. The pure
rename case would only be a copy-from, copy-to part.
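
As a data structure, the idea could look something like the sketch
below. The class and field names are hypothetical; only the
'copy-from' and 'copy-to' notions come from the paragraph above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiffPart:
    """One part of a patch: a content change, a copy, or both."""
    path: Optional[str] = None        # set for an in-place modification
    copy_from: Optional[str] = None   # source path of a copy
    copy_to: Optional[str] = None     # destination path of a copy
    hunks: List[str] = field(default_factory=list)  # content changes

    def is_copy(self):
        return self.copy_from is not None and self.copy_to is not None

    def is_copy_only(self):
        # A copy with no hunks: the content did not change at all.
        return self.is_copy() and not self.hunks


if __name__ == "__main__":
    moved = DiffPart(copy_from="old/name.c", copy_to="new/name.c")
    edited = DiffPart(copy_from="a.c", copy_to="b.c",
                      hunks=["@@ -1 +1 @@\n-x\n+y\n"])
    print(moved.is_copy_only(), edited.is_copy_only())
```

A part with both copy fields and some hunks shows only what changed
after the copy, instead of the whole file as removed and re-added
lines.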

To handle this new format, the idea would be to have a standalone
patching program. This standalone code could, hopefully, be adopted
later by GNU patch or other tools (I wouldn't use APR for this).

== Optional future steps ==

It could be possible to have a standalone tool to convert back and forth
between the standard "unified diff" format and the new one. This tool
would fail when asked to convert a patch that has tree modifications to
unified diff.
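
The "new format to unified diff" direction could behave roughly like
this sketch. The in-memory patch representation (a list of dicts with
'path', 'hunks' and an optional 'copy_from') and the exception name are
assumptions made for the example, since the new format itself is not
defined yet.

```python
class TreeModificationError(ValueError):
    """Raised when a patch cannot be expressed as a plain unified diff."""


def to_unified_diff(parts):
    """Convert a list of diff parts to unified diff text, or fail."""
    lines = []
    for part in parts:
        if part.get("copy_from") is not None:
            # Copies and renames have no unified diff equivalent here,
            # so refuse instead of silently losing the information.
            raise TreeModificationError(
                "tree modification on " + part["path"])
        lines.append("--- " + part["path"])
        lines.append("+++ " + part["path"])
        lines.extend(part["hunks"])
    return "\n".join(lines) + "\n"
```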

Received on Sat Mar 31 22:50:27 2007

This is an archived mail posted to the Subversion Dev mailing list.