
Re: [arch-users] Re: [A few SCM lists] Diff/Comparison of file formats others than ASCII/source code?

From: <angles_at_aminvestments.com>
Date: 2002-08-16 23:15:18 CEST

I work in a commercial real estate (RE) office, and I review lease documents on occasion. I've
seen lawyers (or their secretaries) send a draft in MS Word, later make changes to that MS Word
file, and then use an external app that (somehow) compares the old file with the new one and
produces a Rich Text Format (.rtf) file containing what is called the "markup": text removed
from the old file is struck through (and maybe bolded), and new text is underlined and/or
italicized.
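Roughly how such a "markup" comparison can work, as a minimal Python sketch. I don't know what the lawyers' app actually does; this is an assumption for illustration. It compares the two drafts word by word with the standard library's difflib, wraps removed words in <s> (strike-through) and inserted words in <u> (underline), and emits HTML instead of RTF to keep it short. The function name redline is hypothetical.

```python
import difflib
import html


def redline(old_text: str, new_text: str) -> str:
    """Return an HTML "markup" of new_text against old_text (hypothetical helper).

    Comparison is word by word, so re-wrapped lines do not show up as changes.
    Removed words go in <s> (strike-through), inserted words in <u> (underline).
    """
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        removed = html.escape(" ".join(old_words[i1:i2]))
        added = html.escape(" ".join(new_words[j1:j2]))
        if tag == "equal":
            out.append(added)  # unchanged words, copied through
        if tag in ("delete", "replace"):
            out.append(f"<s>{removed}</s>")
        if tag in ("insert", "replace"):
            out.append(f"<u>{added}</u>")
    return " ".join(out)


if __name__ == "__main__":
    old = "Tenant shall pay rent on the first day of each month."
    new = "Tenant shall pay base rent on the fifth day of each month."
    # prints: Tenant shall pay <u>base</u> rent on the <s>first</s> <u>fifth</u> day of each month.
    print(redline(old, new))
```

A real tool would also carry over fonts and paragraph styles, which is exactly the part that needs to understand the file format.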

The .rtf file had a line at the top of the first page giving the name of the app that made the
file and the command that was used. This was an awesome tool: I could open the .rtf file on my
Linux box (at the time .doc was a problem), and also convert it to HTML for a web site.

Indeed, there is a market for such tools.

P.S. In my opinion, open source or not, the people who would use such a tool probably expect
to pay for it and to have support at the ready.

Alessandro Bottoni (alessandro.bottoni@libero.it) wrote:
>On Wednesday 14 August 2002 02:43, Jonathan S. Shapiro wrote:
>>On Thu, 2002-08-08 at 23:08, Donovan Baarda wrote:
>>> Any merge/diff3 operation must be file-format aware to get it right. Text
>>> is a nice common denominator that the existing unix diff3 can handle.
>>> Something like an HTML/XML/RTF-aware diff3 could be relatively easily
>>> implemented as a postprocessing stage to merges produced by the standard
>>> diff3 because they use text as their underlying format.
>>True, but not helpful. Most tools that export HTML export it without
>>newlines, making line-based diff essentially useless.
>>IBM has a decent DOM diff program for XML that could be readily adapted
>>to HTML. I think that I'd be inclined to go that way rather than try to
>>hack up diff for this purpose.
>Let me point out a strange situation...
>Most, if not all, of the RCS/SCM tools that I'm reviewing (wisely) delegate
>the diff/merge/comparison of file contents to an external program, so, in
>principle, most of them can be adapted to any file format, from XML/HTML
>to RTF to 3D CAD models and so on.
>Strangely enough, there are very few such "file-format-specific"
>diff/merge tools around. This is strange because such tools could clearly
>have a huge market. Just think of how many companies have large
>repositories of CAD drawings, RTF (or, worse, MS Word) documents and HTML
>files (that is, web sites). An RCS tool able to manage these file formats
>would be of great help to a lot of people.
>I hope that some developer on the list will consider this market niche
>(even as a commercial, rather than open source, product).
>From a technological point of view, it is absolutely right that the diff,
>merge and comparison operations must be carried out at the document-structure
>level ("DOM", "document tree", call it what you like), not at the ASCII text
>level. Maybe, though, this would not even require a specific diff/merge tool.
>Most likely, these operations can be performed by invoking and driving an
>existing file-format-aware program via CORBA, DCOP, COM Automation or .NET.
>As an example, the much-hated MS Word is accessible via COM Automation. You
>can load one or more .DOC documents, diff/compare them and highlight the
>differing text using only the functions MS Word exposes via COM. All these
>operations can be performed without cracking the secret MS Word file format
>and without having to write a new "WordDiff" tool. MS Word may even be able
>to write out its own diff file on request, given that it can already keep
>track of revisions.
>Most likely, the same concept can be applied to RTF documents (via Ted,
>Kwrite or AbiWord, maybe), HTML documents (via KHTML, Gecko or an XML parser)
>and CAD drawings (via the specific CAD client).
>What do you think? Am I dreaming?
>PS: I did not cross-check the MSDN to see if MS Word has any built-in diff
>and/or merge feature that can be reached via COM-Automation. It was just an
>Alessandro Bottoni
>arch-users mailing list
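To illustrate Jonathan's point about HTML exported without newlines, here is a minimal Python sketch under my own assumptions. It is not IBM's XML DOM diff, which compares actual trees. Instead it flattens the markup into a sequence of tag and text tokens with the standard library's html.parser and then diffs those tokens. A one-word edit then shows up as one changed token instead of one changed "line" holding the whole page. The names structural_tokens and structural_diff are hypothetical.

```python
import difflib
from html.parser import HTMLParser


class _Tokenizer(HTMLParser):
    """Collect start tags, end tags and non-blank text runs as flat tokens."""

    def __init__(self) -> None:
        super().__init__()
        self.tokens: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Keep attributes in source order so attribute edits are visible too;
        # valueless attributes (e.g. <input disabled>) arrive with v=None.
        attr_text = "".join(
            f' {k}="{v}"' if v is not None else f" {k}" for k, v in attrs
        )
        self.tokens.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")

    def handle_data(self, data):
        text = " ".join(data.split())  # normalise whitespace and line breaks
        if text:
            self.tokens.append(text)


def structural_tokens(markup: str) -> list[str]:
    parser = _Tokenizer()
    parser.feed(markup)
    parser.close()  # flush any trailing text
    return parser.tokens


def structural_diff(old: str, new: str) -> list[tuple[str, list[str], list[str]]]:
    """Return (opcode, old_tokens, new_tokens) for every non-equal region."""
    a, b = structural_tokens(old), structural_tokens(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    old = ("<html><body><h1>Lease</h1><p>Rent is due on the first.</p>"
           "<p>Term: 5 years.</p></body></html>")
    new = old.replace("first", "fifth")
    # A line-based diff sees one huge changed line; this reports one text token.
    print(structural_diff(old, new))
```

Flattening loses the nesting, so moved subtrees show up as a delete plus an insert. A real DOM diff is needed for those. A format-aware diff3 could sit on top of token sequences like these, which is roughly Donovan's post-processing idea.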

That's "angle" as in geometry.
Received on Sat Aug 17 00:16:15 2002

This is an archived mail posted to the Subversion Dev mailing list.
