Re: Subversion equivalent to VSS diff for binary files

From: David Weintraub <qazwart_at_gmail.com>
Date: Mon, 18 Oct 2010 09:47:32 -0400

Subversion does handle binary files without any problems. It uses the
svn:mime-type property on the file to mark it as a binary file, so it
knows not to attempt a text merge on the file.
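
To make that rule concrete, here is a minimal Python sketch of the
decision as I understand it: a file with no svn:mime-type is treated as
text, and a file whose type does not start with "text/" is treated as
binary. The function name is made up for this illustration and is not
part of any Subversion API, and the real client may have special cases
that this ignores.

```python
from typing import Optional


def treated_as_binary(mime_type: Optional[str]) -> bool:
    """Approximate how svn:mime-type marks a file as binary.

    Hypothetical helper for illustration only; assumes the simple rule
    "anything that is not text/* is binary".
    """
    if mime_type is None:
        # No svn:mime-type property set: the file is handled as text.
        return False
    # Drop parameters such as "; charset=UTF-8" before comparing.
    base = mime_type.split(";", 1)[0].strip().lower()
    return not base.startswith("text/")
```

With this sketch, treated_as_binary("application/msword") returns True
and treated_as_binary("text/plain; charset=UTF-8") returns False.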

Subversion does a good job with handling binaries. However, there is
one issue that makes storing binary files in Subversion problematic.
Subversion really doesn't have an easy way to remove individual
revisions of particular files. Normally, with text files, this isn't
an issue. Text files are stored as diffs and removing a particular
revision of a text file won't save a lot of room in the repository.
Most people don't bother removing text revisions unless the text
revision contains inappropriate or proprietary information that you
don't want kept around.

However, binary files are a bit different. Changing one line in a
source file and then recompiling it may cause a cascade of changes, so
the resulting difference between the previous revision of the binary
and the current revision is quite large. Storing a binary file as
revisions in ANY revision control system takes up a great deal of
space when compared with storing revisions of text.

At many sites, the built objects are stored under revision control,
maybe for every single build. Do this for a while, and you'll chew up
a lot of disk space. To handle this, many sites have a way of
identifying obsolete binary revisions and destroying them. I remember
several papers at Perforce conferences on this very topic. (The idea
was to remove the space-hogging file contents without destroying the
revision itself. That way, you'd still see the history, but not have
access to the file contents.)

So, the best thing to do is not to store binary files when you don't
have to. Binaries get stored for several reasons. Some are:

1. Not being sure that you can repeat your build process, so you want
to keep the binary revision "just in case". The solution to this is to
create a repeatable build process, so you don't have to store the
binaries.

2. Storing releases. A very common tactic, but revision control
systems aren't really ideal for this anyway. Most people who need to
grab the releases aren't necessarily developers, so using a revision
control system to get the release they want simply adds complexity. A
better way is to use a release repository system.

3. Storing third party artifacts. This is usually not a space issue
since it is unlikely you'll be storing a hundred revisions of a
particular third party binary. You might update a third party binary
once or twice a year. The problem with this (which is a problem
with every revision control system out there) is that you quickly lose
the true identity of the third party revision. This happens all the
time with Jar files. Is that log4j.jar revision 1.2.3 or 1.4.6? How do
I know? In the end, you'll end up with a pile of unidentifiable and
probably obsolete third party binaries.
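
One partial way to answer "How do I know?" for a Jar file is to look
inside it. A Jar is a zip archive, and many (not all) Jars record an
Implementation-Version attribute in META-INF/MANIFEST.MF. The Python
sketch below reads that attribute. The function name is hypothetical,
and a Jar that lacks the attribute simply yields None, which is
exactly the unidentifiable case described above.

```python
import zipfile
from typing import Optional


def jar_version(jar_path) -> Optional[str]:
    """Return Implementation-Version from a Jar's manifest, or None.

    Illustrative helper; assumes the version, if recorded at all, is on
    a single "Implementation-Version: x.y.z" line of the manifest.
    """
    with zipfile.ZipFile(jar_path) as jar:
        try:
            raw = jar.read("META-INF/MANIFEST.MF")
        except KeyError:
            # The archive has no manifest at all.
            return None
    manifest = raw.decode("utf-8", "replace")
    for line in manifest.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "implementation-version":
            return value.strip()
    return None
```

Calling jar_version("lib/log4j.jar") (the path is only an example)
gives the recorded version string when one exists. It does not tell
you whether that version is still the right one to be using.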

Considering that the whole purpose of revision control is identifying
what is in your software, having a wad of unknown third party binaries
isn't a great way to accomplish this task.

The true solution to this is to use a release repository system. If
you use Ant, it's quite simple to incorporate Ivy, and once done, the
developers are usually quite happy with the results. Even if you
aren't working on a Java project, you should use some sort of release
repository for this type of stuff.

Should you ever store binary files in Subversion? Of course, but only
when it is really the best way to handle the problem. If your source
code includes JPGs and GIFs, or you include a Word document in your
release, there isn't really an alternative but to store the binary
file. Space isn't an issue since these files are relatively small and
aren't updated that frequently. (Compare a few megabytes of a Word
document that you update once per month vs. storing a 1.5 gigabyte
build that you produce two to three times per day).
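
To put rough numbers on that comparison, here is a back-of-the-envelope
Python calculation. The specific figures are my assumptions, not
measurements: 3 MB for the Word document, 3 builds per day, 250 working
days per year, and the pessimistic case where every revision costs its
full size (delta storage and compression would lower both totals).

```python
def yearly_growth_mb(size_mb: int, commits_per_year: int) -> int:
    """Upper bound on yearly repository growth, in megabytes.

    Assumes each commit stores the whole file with no delta savings.
    """
    return size_mb * commits_per_year


# Assumed: a 3 MB Word document committed once per month.
doc_mb = yearly_growth_mb(3, 12)

# Assumed: a 1.5 GB (1500 MB) build committed 3 times per working day.
build_mb = yearly_growth_mb(1500, 3 * 250)

print(f"Word document: {doc_mb} MB per year")
print(f"Build output:  {build_mb} MB per year")
print(f"Ratio:         {build_mb // doc_mb}x")
```

Under these assumptions the document adds 36 MB per year, while the
builds add 1,125,000 MB, which is over a terabyte.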

So, there are two sides to the binaries-in-Subversion story:

Yes, Subversion handles binaries just as well as other revision
control systems. Some say even better. Subversion knows what files
are binary
by using the svn:mime-type property. In fact, Subversion can, unlike
many version control systems, actually distinguish between binary
types, and it is possible via third party tools to actually diff
binary files (like between two Word documents).

No, Subversion doesn't allow easy pruning of space-hogging binaries,
and therefore it can cause problems in that respect. If you're using a
revision control system, and now have a policy of removing obsolete
binaries on a regular basis, you'll have problems continuing this with
Subversion. However, if this is a problem, it's more likely due to
incorrectly using your revision control system (being unable to rebuild
older binaries in a consistent manner, or using your version control
system as a release repository). The solution would be to fix the
underlying problem rather than not to use Subversion.

-- 
David Weintraub
qazwart_at_gmail.com
Received on 2010-10-18 15:48:12 CEST

This is an archived mail posted to the Subversion Users mailing list.
