Space saving svn enhancements ...

From: Magnus Torfason <zulutime.net_at_gmail.com>
Date: Thu, 26 Mar 2009 10:06:58 -0400

I read through the "Enhancement suggestion..." thread, and thought I
would add another potential space/bandwidth saver for some users.

The impetus is the increased prevalence of file formats that are really
nothing but zipped collections of XML files. Both Open Office and MS
Office use them. I ran a test with MS Office, using a 6MB Word file.

I added the file in four forms: in doc format; in docx format
(Microsoft's devious counterattack to the standardized zip-xml formats
used by OpenOffice); as the same collection of XML files tarred instead
of zipped (on the assumption that Subversion's diff algorithms might
work better on that file); and finally as the unzipped collection,
committed as a complete directory tree.
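
For anyone who wants to reproduce the tar and directory variants, here
is a minimal sketch of how they could be produced from a docx file. It
is not the procedure used for the numbers in this post. The helper
names explode_zip and zip_to_tar are made up for illustration, and only
the Python standard library is assumed.

```python
import io
import tarfile
import zipfile
from pathlib import Path


def explode_zip(container: Path, out_dir: Path) -> list[str]:
    """Unpack every member of a zip-based document into out_dir.

    Returns the sorted member names so the caller can see what was written.
    """
    with zipfile.ZipFile(container) as zf:
        zf.extractall(out_dir)
        return sorted(zf.namelist())


def zip_to_tar(container: Path, tar_path: Path) -> None:
    """Repack the members of a zip container into an uncompressed tar.

    Each member is stored decompressed, so a small edit to one XML file
    shows up as a small byte-level change rather than a new deflate stream.
    """
    with zipfile.ZipFile(container) as zf, tarfile.open(tar_path, "w") as tf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            data = zf.read(info)
            member = tarfile.TarInfo(name=info.filename)
            member.size = len(data)
            # mtime is left at its default of 0, so repacking the same
            # content twice gives byte-identical tar files.
            tf.addfile(member, io.BytesIO(data))
```

Calling explode_zip(Path("report.docx"), Path("report_tree")) and then
adding report_tree to the working copy would correspond to the "dir"
case; report.docx is a placeholder file name.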

I then added a few paragraphs to the document and committed the
resulting change in each of the formats. The results are as follows:

# Results:
#
# Type:    Adding:    Changing:
# doc        3954         461
# docx       3973         513
# tar        3952         880
# dir        4081          56

I am certainly quite pleased with most of the results: a change costs
(in the repo, and presumably in transit) only about 10% of the space of
the original document. I was surprised that the delta for the tarred
version did not come out smaller, but the main difference appeared when
each component file of the zip-xml file was treated individually. Then
the delta took only about 10% of the regular (docx) delta, and only
about 1% of the original file.
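
The percentages quoted above can be checked with a few lines of
arithmetic. This sketch only recomputes ratios from the four rows of
the results table; the sizes are copied as reported, and their unit is
not stated anywhere in this post.

```python
# (size when added, size of the follow-up change), as listed in the table.
results = {
    "doc": (3954, 461),
    "docx": (3973, 513),
    "tar": (3952, 880),
    "dir": (4081, 56),
}


def pct(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Cost of the change relative to the initial add, per format.
for name, (added, changed) in results.items():
    print(f"{name:5} change is {pct(changed, added)}% of the initial add")

# The directory-tree delta compared with the docx delta.
dir_vs_docx = pct(results["dir"][1], results["docx"][1])
print(f"dir delta is {dir_vs_docx}% of the docx delta")
```

This gives 11.7% for doc, 12.9% for docx, 22.3% for tar and 1.4% for
dir, with the dir delta at 10.9% of the docx delta, which matches the
rounded figures in the text.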

Do people think that there may be realistic ways to leverage the
knowledge that a particular file is actually a zipped/gzipped/tarred
collection of files to reduce the cost of versioning changes to such files?
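
To make the question a bit more concrete, here is a rough sketch of the
kind of container awareness I have in mind. It is not Subversion code
and uses no Subversion API: changed_members is a hypothetical helper
that uses only the Python standard library to report which members
differ between two versions of a zip-based document. A tool that knew
this could, in principle, compute deltas only for those members.

```python
import zipfile


def changed_members(old_path, new_path):
    """Return names of members added, removed, or modified between two zips.

    Members are compared by the CRC-32 and uncompressed size recorded in
    the zip directory, so no member data has to be decompressed.
    """
    if not (zipfile.is_zipfile(old_path) and zipfile.is_zipfile(new_path)):
        raise ValueError("both inputs must be zip containers")
    with zipfile.ZipFile(old_path) as old, zipfile.ZipFile(new_path) as new:
        old_sig = {i.filename: (i.CRC, i.file_size) for i in old.infolist()}
        new_sig = {i.filename: (i.CRC, i.file_size) for i in new.infolist()}
    names = set(old_sig) | set(new_sig)
    # A name missing on one side yields None and therefore counts as changed.
    return sorted(n for n in names if old_sig.get(n) != new_sig.get(n))
```

For a document edit like the one in my test, I would expect only a
small number of members to show up in the returned list, though I have
not verified that here.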

Best,
Magnus

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1430471
Received on 2009-03-26 15:30:25 CET

This is an archived mail posted to the Subversion Dev mailing list.
