
Re: SVN Dump Question

From: Justin Connell <justin.connell_at_propylon.com>
Date: Tue, 16 Feb 2010 17:39:40 +0000

Andrey Repin wrote:
> Greetings, Justin Connell!
>
>
>>> I have a really huge repo that occupies 151 GB of space on the file system.
>>> Just to give some background, a lot of content gets added to and
>>> deleted from the repo, and we are now at a revision number of over 1,500,000.
>>>
>>> My question is: would it be possible to take a dump of just a specified
>>> path within the repo? For example, if my repo is located at /path/to/repo,
>>> could I dump just /path/to/repo/specific/location/in/repo?
>>>
>
>
>> No, but you can take a complete dump and pipe it through svndumpfilter to
>> extract just the part you want.
>>
>
> To clarify, in case it's unclear: you could pipe the dump directly into the
> filter on the fly, which saves the disk space an intermediate dump file would need.
> It's not an entirely failsafe process, though, I'm afraid.
>
>
> --
> WBR,
> Andrey Repin (anrdaemon_at_freemail.ru) 16.02.2010, <20:01>
>
> Sorry for my terrible English...
Thanks Andrey,
The reason I'm asking such strange questions is that I have a very
abnormal situation on my hands. I previously took a full dump of the
repo (for the reason you implied): the repo occupied 150 GB on disk,
but the resulting dump file was only 46 GB. This was quite unexpected;
on the smaller repos I have worked on, the dump is usually larger than
the repo itself.

Just as a sanity check, this is what I was trying to accomplish:

Scenario - The repo needs to be trimmed down from 150 GB to a more
maintainable size. We have a collection of users who access the
repository as a document revision control system. Many files have been
added and deleted over the past 2 years, and all these commits have
caused astronomical growth in the physical size of the repo. My goal
is to solve this issue, preferably using Subversion best practices.
Certain locations in the repo do not need to retain their version
history, while others must retain it.

Proposed solution -

   1. Take a full dump of the repo
   2. Run svndumpfilter include on it with the paths whose version
      history must be preserved, writing the result to a single
      filtered.dump file
   3. Export the youngest (HEAD) revision of the files that do not
      require their version history to be preserved
   4. Create a new repo and load the filtered dump into it
   5. Import the content from the svn export to complete the process
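The five steps above could be scripted roughly as follows. This is a minimal Python sketch, not a tested procedure. The repository paths, the kept prefixes (`docs`, `specs`), the history-free subtree (`uploads`) and the work directory are hypothetical placeholders, and the helpers `build_plan` and `run_step` are made up for illustration. Steps 1 and 2 run as one pipe, as Andrey suggested, so the full unfiltered dump never lands on disk.

```python
# Hypothetical sketch: repo paths, path prefixes and the work directory are
# placeholders; only standard svn/svnadmin/svndumpfilter subcommands are used.
import os
import shlex
import subprocess
from contextlib import ExitStack
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class Step:
    pipeline: List[List[str]]       # one or more commands joined by pipes
    stdin: Optional[str] = None     # file fed to the first command
    stdout: Optional[str] = None    # file receiving the last command's output

    def as_shell(self) -> str:
        text = " | ".join(shlex.join(cmd) for cmd in self.pipeline)
        if self.stdin:
            text += " < " + shlex.quote(self.stdin)
        if self.stdout:
            text += " > " + shlex.quote(self.stdout)
        return text


def build_plan(old_repo: str, new_repo: str, keep_prefixes: List[str],
               flat_path: str, work_dir: str) -> List[Step]:
    """Steps 1-5; flat_path is the (hypothetical) subtree whose history is dropped."""
    dump_file = str(Path(work_dir) / "filtered.dump")
    export_dir = str(Path(work_dir) / "export")
    old_url = Path(os.path.abspath(old_repo)).as_uri()
    new_url = Path(os.path.abspath(new_repo)).as_uri()
    return [
        # Steps 1-2: stream the full dump straight into the filter,
        # so no intermediate full dump file is written.
        Step([["svnadmin", "dump", "--quiet", old_repo],
              ["svndumpfilter", "include", *keep_prefixes]], stdout=dump_file),
        # Step 3: export the youngest revision of the history-free subtree.
        Step([["svn", "export", f"{old_url}/{flat_path}", export_dir]]),
        # Step 4: create the new repository and load the filtered dump into it.
        Step([["svnadmin", "create", new_repo]]),
        Step([["svnadmin", "load", "--quiet", new_repo]], stdin=dump_file),
        # Step 5: re-add the exported files as a single new commit.
        Step([["svn", "import", export_dir, f"{new_url}/{flat_path}",
               "-m", f"Re-import {flat_path} without history"]]),
    ]


def run_step(step: Step) -> None:
    """Run one step, chaining pipeline members with OS pipes."""
    with ExitStack() as stack:
        src = stack.enter_context(open(step.stdin, "rb")) if step.stdin else None
        dst = stack.enter_context(open(step.stdout, "wb")) if step.stdout else None
        procs = []
        for i, argv in enumerate(step.pipeline):
            last = i == len(step.pipeline) - 1
            proc = subprocess.Popen(argv, stdin=src,
                                    stdout=dst if last else subprocess.PIPE)
            if procs:
                procs[-1].stdout.close()  # the new child now owns the read end
            procs.append(proc)
            src = proc.stdout
        codes = [p.wait() for p in procs]
    if any(codes):
        raise RuntimeError(f"{step.as_shell()!r} failed with exit codes {codes}")
```

You can print `step.as_shell()` for each step as a dry run and review the output before calling `run_step`. svndumpfilter also offers `--drop-empty-revs` and `--renumber-revs` if you want to discard the revisions that filtering leaves empty. One known snag is that svndumpfilter does not create the parent directories of a nested prefix such as `a/b`. Those directories may need to exist in the new repository (e.g. via `svn mkdir --parents`) before the load succeeds.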

Is this a sane approach to solving the problem? And what about the size
difference between the dump file and the original repo: am I losing
revisions? (The dump output shows every revision number being written
to the dump file, which looks correct.)

Another possibility is that unused log files are occupying disk space
(though we are not using Berkeley DB). Is this a valid assumption to
make when using the FSFS back end?
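On the log-file question: an FSFS repository has no Berkeley DB log files, but space can still be held elsewhere. One example is transactions left behind by interrupted commits under `db/transactions`. You can list these with `svnadmin lstxns` and remove them with `svnadmin rmtxns`, ideally while nobody is committing. The Python sketch below assumes the standard FSFS directory layout and uses a hypothetical `usage_by_subdir` helper. It totals file count, apparent size and allocated size per directory under `db/`. With over 1.5 million revisions stored as individual files in `db/revs` and `db/revprops` (unless the repository has been packed with `svnadmin pack`), a large gap between apparent and allocated size could indicate per-file overhead rather than content.

```python
# Sketch: assumes the standard FSFS layout (<repo>/db/revs, db/revprops,
# db/transactions, ...); it only reads file metadata and changes nothing.
import os
from collections import defaultdict
from pathlib import Path
from typing import Dict, Tuple


def usage_by_subdir(repo_path: str) -> Dict[str, Tuple[int, int, int]]:
    """Map each top-level entry under <repo>/db to (files, apparent bytes, allocated bytes)."""
    db = Path(repo_path) / "db"
    totals = defaultdict(lambda: [0, 0, 0])
    for dirpath, _dirnames, filenames in os.walk(db):
        rel = Path(dirpath).relative_to(db)
        top = rel.parts[0] if rel.parts else "(files directly in db)"
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            entry = totals[top]
            entry[0] += 1
            entry[1] += st.st_size
            # st_blocks (512-byte units) exists on POSIX; fall back to st_size elsewhere.
            entry[2] += st.st_blocks * 512 if hasattr(st, "st_blocks") else st.st_size
    return {name: tuple(vals) for name, vals in totals.items()}


def report(repo_path: str) -> None:
    """Print directories by allocated size, largest first, in GiB."""
    usage = usage_by_subdir(repo_path)
    for name, (files, apparent, allocated) in sorted(
            usage.items(), key=lambda kv: kv[1][2], reverse=True):
        print(f"{name:<24} {files:>10} files  {apparent / 2**30:8.2f} GiB apparent"
              f"  {allocated / 2**30:8.2f} GiB on disk")
```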

Thanks so much to all who have responded to this mail, and to all of you
who take the time to read these messages.

Justin
Received on 2010-02-16 18:40:20 CET

This is an archived mail posted to the Subversion Users mailing list.
