
Re: improving subversion treatment of compressed XML/text file formats

From: Benjamin Smith-Mannschott <bsmith.occs_at_gmail.com>
Date: Fri, 24 Oct 2008 18:28:15 +0200

On Oct 24, 2008, at 16:21, David Kaplan wrote:

> Hi,
>
> On Wed, 2008-10-22 at 17:45 -0500, Ryan Schmidt wrote:
>
>> Subversion stores all files in the repository as differences against
>> previous versions. It does not differentiate between text or binary
>> files at this point. However, depending on the compression algorithm,
>> compressed files don't necessarily lend themselves to efficient
>> diffing, which can result in them taking more space in the repository
>> over time than the uncompressed versions would have.
>>
>
> I wasn't sure about this point, but my experience is that small changes
> to a document seem to produce large diffs in the compressed version,
> leading to a large repository.

I can confirm this. We'll be using an svn repository for
documentation in the future. One question that arose was which
tool/format to use for our textual documents. (I pushed for plain text
(Markdown or reStructuredText) because it diffs and merges nicely, but
the usability story just isn't there for most of those who'll actually
be writing the documentation.)
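The effect David describes, where a small edit produces a large difference once the document is compressed, can be illustrated with a short Python sketch. It is not the test described in this message: the document is made up, `changed_span` and `make_document` are hypothetical helpers, `changed_span` is only a crude delta proxy (the bytes left after stripping the shared prefix and suffix) rather than Subversion's own delta algorithm, and `zlib` stands in for the deflate compression used inside zip-based formats such as ODT and DOCX.

```python
import zlib


def changed_span(old: bytes, new: bytes) -> int:
    """Crude delta proxy: bytes of `new` left after stripping the prefix
    and suffix it shares with `old`. This is NOT Subversion's own delta
    algorithm, just a way to see how far an edit spreads."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the suffix would overlap the shared prefix.
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return len(new) - prefix - suffix


def make_document(animal: str) -> bytes:
    """Hypothetical 200-line document; only line 100 uses `animal`."""
    lines = [
        f"Paragraph {i}: the quick brown {animal if i == 100 else 'fox'} "
        "jumps over the lazy dog.\n"
        for i in range(200)
    ]
    return "".join(lines).encode("ascii")


old_raw = make_document("fox")
new_raw = make_document("cat")  # one-word edit on line 100
old_zip = zlib.compress(old_raw, 9)
new_zip = zlib.compress(new_raw, 9)

raw_delta = changed_span(old_raw, new_raw)
zip_delta = changed_span(old_zip, new_zip)
print(f"plain text: {raw_delta} of {len(new_raw)} bytes changed")
print(f"compressed: {zip_delta} of {len(new_zip)} bytes changed")
```

For the plain text, the one-word edit changes exactly three bytes; in the compressed streams a far larger share of the (much smaller) output differs. The exact compressed figures depend on the zlib build, so none are quoted here.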

I ran some tests simulating a few thousand edits and commits using a
few different formats. Traditional .doc files are pretty well behaved
with respect to repository space usage. ODF files stink because every
edit, no matter how minor, ends up storing the whole document in the
repository again. I discovered FODT (flat ODT), which merges all the
parts of a normal ODT file into a single XML file (images and other
binary content are base64-encoded). This sounds ludicrous, but it's
quite svn-friendly. Unfortunately, the flat variants of the
OpenOffice.org file formats only seem to be supported by the OO.o 2.4
included with Ubuntu. I haven't found much mention of them online, and
I haven't found them supported under Windows or Mac OS. We finally
settled on OO.o's HTML support as our "standard" format for textual
documentation, knowing that we could "upgrade" to ODT should we need
its additional features.
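Why a flat format diffs well can be sketched in Python: unpack every member of a zip container into one text stream, keep the XML part as text, base64-encode binary parts in wrapped lines, and then compare two versions line by line. This is a simplified illustration, not the real flat ODF format or the way OpenOffice.org produces it. `flatten_zip`, `make_odt_like`, and the member contents are hypothetical, with member names chosen to mimic an ODT layout (`content.xml`, `Pictures/`). The sketch also assumes the XML is spread over many lines, since a line-based diff only stays small when it is.

```python
import base64
import difflib
import io
import zipfile


def flatten_zip(data: bytes) -> str:
    """Inline every member of a zip archive into one text stream.

    Simplified illustration, not the real flat ODF schema: ``.xml``
    members are copied as text, everything else is base64-encoded.
    """
    parts = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in sorted(archive.namelist()):
            payload = archive.read(name)
            if name.endswith(".xml"):
                parts.append(f"=== {name} ===\n{payload.decode('utf-8')}\n")
            else:
                # encodebytes wraps at 76 characters, so an unchanged
                # binary part stays identical line by line.
                encoded = base64.encodebytes(payload).decode("ascii")
                parts.append(f"=== {name} (base64) ===\n{encoded}")
    return "".join(parts)


def make_odt_like(body: str) -> bytes:
    """Hypothetical stand-in for an ODT file: one XML part, one binary part."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("content.xml", body)
        # Placeholder bytes, not a real PNG image.
        archive.writestr("Pictures/logo.png", bytes(range(256)) * 4)
    return buffer.getvalue()


old_body = "\n".join(f"<p>Paragraph {i} of the manual.</p>" for i in range(50))
new_body = old_body.replace("Paragraph 25 of", "Paragraph 25 (revised) of")

flat_old = flatten_zip(make_odt_like(old_body))
flat_new = flatten_zip(make_odt_like(new_body))

# Keep only removed/added lines, skipping the "---"/"+++" file headers.
changed = [
    line
    for line in difflib.unified_diff(
        flat_old.splitlines(), flat_new.splitlines(), lineterm=""
    )
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
for line in changed:
    print(line)
```

The edit shows up as one removed and one added line, while the unchanged binary part produces identical base64 lines in both versions and therefore no diff at all.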

SVN repository space efficiency when edited often:

format          space efficiency   merge-friendliness
=============   ================   ==================
plain text      very good          very good
html            very good          good
flat ODT        good               poor [1]
msword doc      acceptable         impossible [2]
msword docx     poor               impossible [2]
ODT             poor               impossible [2]
-----------------------------------------------------
[1] This format isn't widely supported (a pity, really).
[2] SVN will not, and should not, attempt to merge these
    formats, as they are not textual. Microsoft Word and
    OpenOffice do contain features that let a user
    perform merges within the tool, independently of svn;
    it's just that they'd have to do this "by hand" for
    every merge conflict.
=====================================================

// Ben

Received on 2008-10-24 18:28:42 CEST

This is an archived mail posted to the Subversion Users mailing list.
