
Re: Pruning out old revisions?

From: David Weintraub <qazwart_at_gmail.com>
Date: Fri, 3 Oct 2008 11:49:18 -0400

On Oct 2, 2008, at 11:03 PM, David Weintraub wrote:
> However, if you are deleting entire obsolete projects or older
> versions of binary files you will probably save quite a bit of
> room.

On Fri, Oct 3, 2008 at 6:11 AM, Ryan Schmidt
<subversion-2008c_at_ryandesign.com> wrote:
>
> That could certainly be. However you have to weigh the cost of the extra (or
> larger) hard drive against the cost of making your repository temporarily
> unavailable while you dump, filter, and load, and having everyone check out
> new working copies and transfer over any uncommitted changes from their old
> working copies.

Unfortunately, it isn't very easy in a production environment to get a
bigger disk drive. True, you can buy a 500 GB drive for under $200 at
Best Buy, but that's not the way it works in the production world. You
may need a high-performance drive for a SAN, or several for a RAID.
Then there's getting backups set up for the larger drive, plus
administrative overhead.

The last place where I worked charged our department about $100 per
gigabyte for storage. Plus, it could take weeks to get it. The place I
worked before that wouldn't give me more than 16 gigabytes total for
an entire source repository that had over 20 years of history in it
(it was in RCS format, not Subversion), and that was only after I went
to the head of the division to complain.

You're right, though, that someone has to decide whether the space
saved is worth losing access to the source repository during the dump,
filter, and load. I'm merely laying out the option. The original poster
can write up a proposal and explain to their boss what's involved. It
may very well be that they decide to opt for more disk space. That's
definitely what I would tell my boss in this situation.

The real problem is that Subversion doesn't have a way of removing
older versions of files without taking down the entire repository for
an extended period of time. This is really a killer in many commercial
development environments. Saying "Get a bigger disk" is not a good
answer when you don't control your own systems and where your
technical service department charges extremely high prices for disk
storage.

Plus, you have situations where someone accidentally checked in
proprietary information that doesn't belong in the source repository.
Removing this information is impossible with Subversion without a
dump, filter, and load.
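
For reference, the dump, filter, and load cycle could be scripted
roughly as below. This is only a sketch: the repository paths and the
excluded path prefix are hypothetical, the helper names are made up for
illustration, and it assumes svnadmin and svndumpfilter are on the PATH
of the repository host. Note that svndumpfilter selects by path prefix;
limiting the revision range would be done with the dump itself.

```python
import subprocess


def build_commands(old_repo, new_repo, exclude_prefixes):
    """Return the command lines for one dump, filter, and load cycle."""
    return {
        "create": ["svnadmin", "create", new_repo],
        "dump": ["svnadmin", "dump", old_repo],
        "filter": ["svndumpfilter", "exclude", *exclude_prefixes],
        "load": ["svnadmin", "load", new_repo],
    }


def prune(old_repo, new_repo, exclude_prefixes):
    """Stream old_repo through svndumpfilter into a newly created new_repo."""
    cmds = build_commands(old_repo, new_repo, exclude_prefixes)
    subprocess.run(cmds["create"], check=True)

    # Chain the three tools like a shell pipeline: dump | filter | load.
    dump = subprocess.Popen(cmds["dump"], stdout=subprocess.PIPE)
    filt = subprocess.Popen(cmds["filter"], stdin=dump.stdout,
                            stdout=subprocess.PIPE)
    dump.stdout.close()  # so dump sees a broken pipe if the filter exits early
    load = subprocess.Popen(cmds["load"], stdin=filt.stdout)
    filt.stdout.close()

    # Wait for every stage and fail loudly if any of them reported an error.
    for name, proc in (("load", load), ("filter", filt), ("dump", dump)):
        if proc.wait() != 0:
            raise RuntimeError(f"{name} step failed with code {proc.returncode}")
```

Something like prune("/srv/svn/old", "/srv/svn/new", ["obsolete-project"])
would then be run while the old repository is closed to commits, which
is exactly the downtime being discussed.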

> Subversion does not distinguish between text and binary files at this level;
> it stores all files as differences. Some types of binary files can be stored
> quite efficiently this way, but others (especially some compressed formats)
> not so much.

Actually, Subversion does differentiate between binaries and text
files. It has to. Otherwise, you might end up attempting to merge
binaries. What Subversion doesn't do is treat binaries and text files
differently in storage. Subversion uses a property (svn:mime-type) to
say whether a file is binary or text.
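
As a rough illustration of that property check, the rule can be
sketched as below. This is a simplification written for this thread,
not Subversion's own code: the function name is hypothetical, and it
assumes the basic rule that a file with no svn:mime-type, or with one
starting with "text/", is handled as text and anything else as binary.

```python
def is_treated_as_binary(mime_type):
    """Approximate the text/binary decision from an svn:mime-type value.

    mime_type is the property value as a string, or None if the
    property is not set on the file.
    """
    if mime_type is None:
        # No svn:mime-type property: the file is handled as text.
        return False
    # Anything that is not a text/* type is handled as binary.
    return not mime_type.strip().startswith("text/")
```

So a file marked application/octet-stream would not be offered for a
contextual merge, while an unmarked file or a text/plain file would.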

In both Perforce and ClearCase, you could store binaries in diff
format if you chose to. The repository would work. Of course, you'd
have to make sure you don't merge binaries. But the main reason both
vendors chose to use a separate format for binary storage is space
efficiency. They store binaries in compressed format, and both claim
that binary files don't store very efficiently in diff format.

Subversion's decision to use the same format for binary and text came
out of the open source environment. Disk space is cheap, so why add
complexity to a program to save a gigabyte here or there? I actually
agree with this sentiment, but the IT departments I have to work with
do not.

--
David Weintraub
qazwart_at_gmail.com
On Fri, Oct 3, 2008 at 6:11 AM, Ryan Schmidt
<subversion-2008c_at_ryandesign.com> wrote:
>
> On Oct 2, 2008, at 11:03 PM, David Weintraub wrote:
>
>> On 10/2/08, Ryan Schmidt wrote:
>>
>>> On Oct 2, 2008, at 2:08 PM, David Weintraub wrote:
>>>
>>>> There's no EASY way to remove information in a Subversion repository.
>>>>
>>>> What you need to do is do a data dump, filter the results, and then
>>>> reload the data into a new Subversion repository. Yes, it isn't fun.
>>>>
>>>> You need to be on the repository host and use the "svnadmin dump"
>>>> command to dump the repository.
>>>>
>>>> Once you do that, you can use the svndumpfilter command to filter
>>>> out the unwanted revisions.
>>>>
>>>> Once you have a filtered dump file, you can use "svnadmin load" to
>>>> reload your data into a new repository.
>>>>
>>>> See <http://svnbook.red-bean.com/en/1.5/svn-book.html#svn.reposadmin.maint.tk.svndumpfilter>
>>>> for more information.
>>>
>>> And what you also need to realize is that since you use branches (and
>>> maybe other cheap copies within your project), removing history may
>>> make your repository larger instead of smaller, as all the cheap
>>> copies of information you're deleting have to be converted into full
>>> representations.
>>>
>>> Basically, the answer to your question is don't. Get a larger hard
>>> drive.
>>
>> You're right if you are selecting versions of text files.
>>
>> However, if you are deleting entire obsolete projects or older
>> versions of binary files you will probably save quite a bit of
>> room.
>
> That could certainly be. However you have to weigh the cost of the extra (or
> larger) hard drive against the cost of making your repository temporarily
> unavailable while you dump, filter, and load, and having everyone check out
> new working copies and transfer over any uncommitted changes from their old
> working copies.
>
>> Almost all version control systems use diff format for saving text
>> files, but this is not true for binaries nor if you're deleting whole
>> directory trees.
>
> Subversion does not distinguish between text and binary files at this level;
> it stores all files as differences. Some types of binary files can be stored
> quite efficiently this way, but others (especially some compressed formats)
> not so much.
---------------------------------------------------------------------
Received on 2008-10-03 17:49:50 CEST

This is an archived mail posted to the Subversion Users mailing list.
