
Re: improving subversion treatment of compressed XML/text file formats

From: Ryan Schmidt <subversion-2008c_at_ryandesign.com>
Date: Wed, 22 Oct 2008 17:45:01 -0500

On Oct 22, 2008, at 10:03, David Kaplan wrote:

> I use subversion as my personal backup system. Though I do my share of
> coding, a lot of what I put in my subversion database are compressed XML
> files (for example, openoffice documents). Currently, svn treats these
> as binary files, leading to a ballooning svn database as there is no
> differencing on these files (correct me if I am wrong about this).

Subversion stores all files in the repository as differences against
previous versions. It does not distinguish between text and binary
files at this level. However, depending on the compression algorithm,
compressed files don't necessarily lend themselves to efficient
diffing: a small change to the uncompressed content can alter much of
the compressed output, so over time they can take up more space in the
repository than the uncompressed versions would have.
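To make this concrete, here is a minimal Python sketch. It is not Subversion's delta code (the repository uses its own svndiff format); it is a simplified greedy copy/insert encoder with a hypothetical 16-byte minimum match, applied to a made-up text file and to the same file after zlib (DEFLATE) compression. Like Subversion's deltas, it works on raw bytes and does not care whether they are text.

```python
import zlib

BLOCK = 16  # hypothetical minimum match length, in bytes


def make_delta(source: bytes, target: bytes, block: int = BLOCK) -> list:
    """Encode target as ("copy", offset, length) / ("insert", data) ops
    against source. Greedy longest match; works on any bytes."""
    # Index every block-sized window of the source by its content.
    index: dict = {}
    for j in range(len(source) - block + 1):
        index.setdefault(source[j:j + block], []).append(j)

    ops: list = []
    literal = bytearray()
    i = 0
    while i < len(target):
        best_off, best_len = 0, 0
        # Try every source position whose window equals the target window
        # and keep the one that stays equal for the longest run.
        for j in index.get(target[i:i + block], []):
            n = block
            while (i + n < len(target) and j + n < len(source)
                   and target[i + n] == source[j + n]):
                n += 1
            if n > best_len:
                best_off, best_len = j, n
        if best_len == 0:
            literal.append(target[i])  # no match: store this byte as new data
            i += 1
            continue
        if literal:
            ops.append(("insert", bytes(literal)))
            literal = bytearray()
        ops.append(("copy", best_off, best_len))
        i += best_len
    if literal:
        ops.append(("insert", bytes(literal)))
    return ops


def apply_delta(source: bytes, ops: list) -> bytes:
    """Rebuild the target from the source plus the ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += source[op[1]:op[1] + op[2]]
        else:
            out += op[1]
    return bytes(out)


def new_bytes(ops: list) -> int:
    """Bytes stored verbatim: a rough proxy for the size of the delta."""
    return sum(len(op[1]) for op in ops if op[0] == "insert")


def document(edited=()) -> bytes:
    """Made-up 500-line text; lines in `edited` get " (edited)" added."""
    return "".join(
        f"paragraph {n}{' (edited)' if n in edited else ''}: some document text\n"
        for n in range(500)).encode()


old_text = document()
new_text = document(edited=range(3, 500, 50))  # ten small edits

raw_ops = make_delta(old_text, new_text)
assert apply_delta(old_text, raw_ops) == new_text
print("plain text, new bytes:", new_bytes(raw_ops))  # 90 (ten times b" (edited)")

# The same two versions after DEFLATE compression.
old_z, new_z = zlib.compress(old_text), zlib.compress(new_text)
z_ops = make_delta(old_z, new_z)
assert apply_delta(old_z, z_ops) == new_z
print("compressed, new bytes:", new_bytes(z_ops), "of", len(new_z))
```

Compare the two printed figures. For the plain text only the ten inserted strings need to be stored; for DEFLATE output an edit typically changes the bit stream from that point onward, so a much larger share of the compressed file tends to end up stored verbatim. The exact figure depends on the data and the zlib version.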

> For a while I have been thinking that svn could do a lot better than
> that since these are trivially compressed files. This could reduce
> significantly the amount of disk space that versioning these files
> requires and improve the ability to see differences between files (e.g.,
> conflict resolution). As these file formats are popping up everywhere
> (openoffice, MS Office, ...), it might be worth integrating a third
> "type" of file into svn (along with text and binary): compressed-text.
> Someone smarter than I might even be able to do this with the current
> architecture of hooks with minimal changes to subversion itself, but a
> formal integration doesn't seem too hard.

Note that an OpenOffice.org file is not a compressed text file, but a
compressed directory of several text files.
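A minimal Python sketch of what comparing such a package involves: open both ZIP archives, decompress each member, and diff only the members whose text changed. The package contents below are made up (real ODF files also contain styles.xml, settings.xml and META-INF/manifest.xml); only the member names mimetype, content.xml and meta.xml follow the ODF layout, and the helper names are hypothetical.

```python
import difflib
import io
import zipfile


def make_package(content_xml: str) -> bytes:
    """Build a toy ODF-like ZIP in memory (made-up contents)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # ODF puts an uncompressed "mimetype" member first.
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", content_xml)
        zf.writestr("meta.xml", "<office:document-meta/>\n")
    return buf.getvalue()


def diff_packages(old: bytes, new: bytes) -> dict:
    """Map each member whose decompressed text differs to a unified diff."""
    diffs = {}
    with zipfile.ZipFile(io.BytesIO(old)) as a, zipfile.ZipFile(io.BytesIO(new)) as b:
        for name in sorted(set(a.namelist()) | set(b.namelist())):
            before = a.read(name).decode("utf-8") if name in a.namelist() else ""
            after = b.read(name).decode("utf-8") if name in b.namelist() else ""
            if before != after:
                diffs[name] = "".join(difflib.unified_diff(
                    before.splitlines(keepends=True),
                    after.splitlines(keepends=True),
                    fromfile=f"old/{name}", tofile=f"new/{name}"))
    return diffs


old_pkg = make_package("<text:p>Hello</text:p>\n<text:p>World</text:p>\n")
new_pkg = make_package("<text:p>Hello</text:p>\n<text:p>Subversion</text:p>\n")
changes = diff_packages(old_pkg, new_pkg)
print(sorted(changes))  # ['content.xml']
print(changes["content.xml"])
```

The archive as a whole is one opaque blob, but after unpacking, only one small text member actually differs, which is the part a line-based diff can show.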

> The basic idea would be that when svn adds one of these files, it adds
> the full compressed version initially, but thereafter it uncompresses
> stored and working copy versions, differences them and just stores these
> differences. The user would specify which file formats to autodetect as
> compressed text and the compression algorithm for each file type through
> configuration options and svn properties.
>
> One question would be what to do with conflicts, but I think this isn't
> a show stopper and a logical behavior can be found.
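The configuration step in this proposal might be sketched in Python as below. Everything here is hypothetical: the pattern table, the decompressor table and the function names illustrate the idea and are not Subversion settings or APIs. It handles single-stream gzip files such as .svgz (gzip-compressed SVG); an OpenOffice.org file, being a ZIP archive of several files, would need member-by-member handling instead.

```python
import difflib
import fnmatch
import gzip

# Hypothetical user configuration: which paths count as "compressed text"
# and which algorithm decompresses them. Neither table exists in Subversion.
COMPRESSED_TEXT_PATTERNS = {"*.svgz": "gzip", "*.xml.gz": "gzip"}
DECOMPRESSORS = {"gzip": gzip.decompress}


def text_for_diff(path: str, data: bytes) -> bytes:
    """Return the bytes a diff should compare: decompressed if the path
    matches a configured pattern, otherwise the data unchanged."""
    for pattern, algorithm in COMPRESSED_TEXT_PATTERNS.items():
        if fnmatch.fnmatchcase(path, pattern):
            return DECOMPRESSORS[algorithm](data)
    return data


def diff_versions(path: str, old: bytes, new: bytes) -> str:
    """Unified diff of two versions, decompressing them first if configured."""
    a = text_for_diff(path, old).decode("utf-8").splitlines(keepends=True)
    b = text_for_diff(path, new).decode("utf-8").splitlines(keepends=True)
    return "".join(difflib.unified_diff(a, b, fromfile=path, tofile=path))


old = gzip.compress(b"<svg>\n<circle r='1'/>\n</svg>\n")
new = gzip.compress(b"<svg>\n<circle r='2'/>\n</svg>\n")
out = diff_versions("drawing.svgz", old, new)
print(out)
```

Paths that match no pattern pass through untouched, so ordinary text and binary files keep their existing treatment.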

Received on 2008-10-23 00:45:35 CEST

This is an archived mail posted to the Subversion Users mailing list.
