
Re: [A few SCM lists] Diff/Comparison of file formats others than ASCII/source code?

From: Alessandro Bottoni <alessandro.bottoni_at_libero.it>
Date: 2002-08-14 09:07:53 CEST

On Wednesday 14 August 2002 02:43, Jonathan S. Shapiro wrote:
>On Thu, 2002-08-08 at 23:08, Donovan Baarda wrote:
>> Any merge/diff3 operation must be file-format aware to get it right. Text
>> is a nice common denominator that the existing unix diff3 can handle.
>> Something like an HTML/XML/RTF-aware diff3 could be relatively easily
>> implemented as a postprocessing stage to merges produced by the standard
>> diff3 because they use text as their underlying format.
>True, but not helpful. Most tools that export HTML export it without
>newlines, making line-based diff essentially useless.
>IBM has a decent DOM diff program for XML that could be readily adapted
>to HTML. I think that I'd be inclined to go that way rather than try to
>hack up diff for this purpose.

Let me point out a strange situation...

Most, if not all, of the RCS/SCM tools that I'm reviewing (wisely) delegate
diffing, merging and comparing file contents to an external program, so in
principle most of them can be adapted to any file format, from XML/HTML to
RTF to 3D CAD models and so on.
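
To make the delegation idea concrete, here is a minimal Python sketch of how an SCM tool might choose a differ by file extension and fall back to a plain line-based diff for unknown types. The `register`/`diff_file` names and the extension registry are hypothetical illustrations, not any real SCM's API; in Subversion, the comparable hook is the `--diff-cmd` option of `svn diff`.

```python
import difflib
import os
from typing import Callable, Dict, List

# Hypothetical signature for a format-aware differ: takes the old and new
# file contents and returns a list of human-readable differences.
Differ = Callable[[str, str], List[str]]


def line_diff(old: str, new: str) -> List[str]:
    """Fallback: plain line-based unified diff, as Unix diff would do."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(), lineterm=""))


# Hypothetical registry mapping lower-case file extensions to differs.
DIFFERS: Dict[str, Differ] = {}


def register(ext: str, differ: Differ) -> None:
    """Plug in a format-specific differ (e.g. one wrapping an external tool)."""
    DIFFERS[ext.lower()] = differ


def diff_file(path: str, old: str, new: str) -> List[str]:
    """Pick a differ by extension; unknown formats get the line diff."""
    ext = os.path.splitext(path)[1].lower()
    return DIFFERS.get(ext, line_diff)(old, new)
```

A format-specific differ for, say, `.rtf` files would then be plugged in with `register(".rtf", rtf_differ)`, where `rtf_differ` is a hypothetical function you would write yourself or wrap around an external program.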

Strangely enough, there are very few such "file-format-specific" Diff/Merge
tools around. This is surprising, because such tools could clearly have a huge
market. Just think of how many companies have large repositories of CAD
drawings, RTF (or, worse, MS Word) documents and HTML files (that is, web
sites). An RCS tool able to manage such file formats would be of great help to
a lot of people.

I hope that some developer on this list will consider this market niche
(even if as a commercial, rather than open source, product).

From a technological point of view, it is absolutely right that diff, merge
and comparison operations must be carried out at the level of the document
structure ("DOM", "document tree", call it what you like), not at the level of
ASCII text. However, this may not actually require a dedicated Diff/Merge tool.

Most likely, these operations can be performed by invoking and driving an
existing file-format-aware program via CORBA, DCOP, COM Automation or .NET.
As an example, the much-hated MS Word is accessible via COM Automation. You can
load one or more .DOC documents, diff/compare them and highlight the differing
text using only the functions that MS Word exposes through COM. All of this can
be done without cracking the secret MS Word file format and without having to
write a new "WordDiff" tool. MS Word may even be able to write out its own diff
file on request, given that it can already keep track of revisions.

Most likely, the same approach can be applied to RTF documents (via Ted,
KWrite or AbiWord, perhaps), HTML documents (via KHTML, Gecko or an XML parser)
and CAD drawings (via the corresponding CAD client).
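
To illustrate why structure-level comparison matters for newline-free HTML, here is a naive Python sketch that uses the standard library's `xml.etree.ElementTree` as the "XML parser" route. It assumes well-formed XHTML (real-world HTML would need a more tolerant parser), and the hypothetical `tree_diff` function simply pairs child elements by position, so it is not a true tree-edit-distance algorithm like a proper DOM diff tool; it also ignores tail text.

```python
import difflib
import xml.etree.ElementTree as ET
from typing import List, Optional


def tree_diff(a: ET.Element, b: ET.Element,
              path: Optional[str] = None) -> List[str]:
    """Naive structural diff that pairs child elements by position.

    Reports changed tags, attributes and text, each with an element path.
    Inserting an element shifts its later siblings, which a real tree-diff
    algorithm would handle better; tail text is ignored.
    """
    path = path or "/" + a.tag
    if a.tag != b.tag:
        return [f"{path}: tag {a.tag!r} -> {b.tag!r}"]
    out: List[str] = []
    if a.attrib != b.attrib:
        out.append(f"{path}: attributes {a.attrib} -> {b.attrib}")
    if (a.text or "").strip() != (b.text or "").strip():
        out.append(f"{path}: text {a.text!r} -> {b.text!r}")
    kids_a, kids_b = list(a), list(b)
    for i, (ca, cb) in enumerate(zip(kids_a, kids_b)):
        out.extend(tree_diff(ca, cb, f"{path}/{ca.tag}[{i}]"))
    for extra in kids_a[len(kids_b):]:
        out.append(f"{path}: removed <{extra.tag}>")
    for extra in kids_b[len(kids_a):]:
        out.append(f"{path}: added <{extra.tag}>")
    return out


if __name__ == "__main__":
    # Exported HTML often has no newlines, so the whole page is one "line".
    old = "<html><body><h1>Title</h1><p>Hello world</p></body></html>"
    new = "<html><body><h1>Title</h1><p>Hello there</p></body></html>"

    # Line-based diff: the entire document appears as one removed line and
    # one added line, which says nothing about what actually changed.
    for line in difflib.unified_diff([old], [new], lineterm=""):
        print(line)

    # Structure-level diff pinpoints the changed element:
    # /html/body[0]/p[1]: text 'Hello world' -> 'Hello there'
    for change in tree_diff(ET.fromstring(old), ET.fromstring(new)):
        print(change)
```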

What do you think? Am I dreaming?

PS: I did not check MSDN to see whether MS Word has any built-in diff and/or
merge feature that can be reached via COM Automation. It was just an idea.
Alessandro Bottoni

Received on Wed Aug 14 10:48:14 2002

This is an archived mail posted to the Subversion Dev mailing list.
