Re: svndumpfilter and svnsync?

From: Chris <devnullaccount_at_yahoo.se>
Date: Wed, 10 Oct 2018 07:04:52 +0000 (UTC)

Hi again,

I managed to get some better permissions so I don't have to do svnsync and can get by with doing incremental dumps/loads, but I'm a bit confused by the svndumpfilter + load process so any help would be appreciated.

First of all, my statement about the dump taking 2 weeks was a big fat urban legend. More like 20 minutes so that's good news.

I've trawled through bad commits of data files in our repo and added such paths to a filter file that I'm using for svndumpfilter to get a reasonable-looking dump. In most cases, the files in question existed in a single path (branch) and were no problem. But in some cases, the same files had been copied to a 2nd branch and then svndumpfilter gave me errors about missing source paths, so I added the same path on the 2nd branch to the filter expressions and tried again. After a few iterations of this process, I have a dump that should do what I want.
So I start "svnadmin load" and based on initial progress, that might take a couple of days to complete so I leave it overnight. I get back today and the load has crashed with a missing path. The error was:

svnadmin: E160013: File not found: transaction '16289-ckh', path 'branches/second/dir/datafile'

And looking up the history for that file, I see that "datafile" was added on branch "first" but the path "branches/first/dir" is already in my filter list. So why didn't svndumpfilter throw me an error on this like it did for a lot of other cases?
Since the load process is so much slower, the turnaround time for each error in that step is beyond painful, so anything I can do to ensure that this gets caught by the filter would make my life a lot easier.

The syntax I used:
svnadmin dump -q MYREPO | svndumpfilter exclude --targets filterfile > filterdump
svnadmin load -q --no-flush-to-disk --force-uuid -M 2048 --bypass-prop-validation ./NEWREPO < filterdump

(I had to use --bypass-prop-validation due to some newline issues in old log messages, similar to this one: https://groups.google.com/forum/#!topic/subversion_users/P3ohZ-hKhCA. I don't know why they have wrong newlines, but the repo works as it is now...)
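
One possible pre-check for the case above, offered as a sketch and not as something confirmed in this thread. The assumption is that svndumpfilter only complains when the copy source itself is excluded, so a copy of a parent directory (for instance branches/first copied to branches/second while branches/first/dir is excluded) could pass the filter and only fail later in "svnadmin load". The Python below walks the unfiltered dump stream and reports copies whose source overlaps an excluded path while the destination is kept. The names read_records, is_under and risky_copies are made up for this illustration, and the parser assumes the usual header / Content-length layout of the dumpfile format.

```python
def read_records(stream):
    """Yield the header dict of every record in an svn dump stream.

    `stream` is a binary file object. Record bodies are skipped using
    Content-length, so file contents that happen to look like headers
    are never misread as headers.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if line == b"\n":
            continue  # blank separator between records
        headers = {}
        while line and line != b"\n":
            key, _, value = line.rstrip(b"\n").partition(b": ")
            headers[key.decode("utf-8")] = value.decode("utf-8", "replace")
            line = stream.readline()
        length = int(headers.get("Content-length", 0))
        if length:
            stream.read(length)  # skip properties and text content
        yield headers


def is_under(path, prefix):
    """True if `path` equals `prefix` or lies somewhere below it."""
    return path == prefix or path.startswith(prefix + "/")


def risky_copies(stream, excluded):
    """Yield (revision, copy source, copy destination, excluded path).

    A copy is reported when its destination is kept by the filter but its
    source is inside an excluded path or is a parent of one.
    """
    excluded = [p.strip("/") for p in excluded if p.strip("/")]
    rev = None
    for headers in read_records(stream):
        if "Revision-number" in headers:
            rev = int(headers["Revision-number"])
            continue
        src = headers.get("Node-copyfrom-path")
        if src is None:
            continue  # not a copy
        src = src.strip("/")
        dst = headers["Node-path"].strip("/")
        if any(is_under(dst, p) for p in excluded):
            continue  # the copy itself is filtered out anyway
        for p in excluded:
            if is_under(src, p) or is_under(p, src):
                yield rev, src, dst, p
```

To try it on the real repository, the output of "svnadmin dump -q MYREPO" could be piped into a small script that calls risky_copies(sys.stdin.buffer, prefixes), with prefixes read line by line from the same filterfile. Each reported destination is a candidate for another line in the filter file, which should be much quicker to find this way than by waiting for the load to fail.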

An additional question about what Johan wrote below:
>- You can perfectly well use a 1.10 version of svnadmin or svnsync (or svnrdump, to create
>a dumpfile from a remote server) to interact with a 1.8 server / repository.

Can I even do this with "svnadmin load"? I thought that would use FSFS format 8 while 1.8 should have 6. I got that impression from my "research", but I'm probably off base.

TIA,
  Chris

--------------------------------------------
On Thu, 10/4/18, Johan Corveleyn <jcorvel_at_gmail.com> wrote:

 Subject: Re: svndumpfilter and svnsync?
 To: "Chris" <devnullaccount_at_yahoo.se>
 Cc: "Ryan Schmidt" <subversion-2018_at_ryandesign.com>, "Daniel Shahaf" <d.s_at_daniel.shahaf.name>, "Subversion" <users_at_subversion.apache.org>
 Date: Thursday, October 4, 2018, 4:26 PM
 
 On Thu, Oct 4, 2018 at 3:03 PM Chris <devnullaccount_at_yahoo.se> wrote:
>
> (apologies for the top-posting, I really need to stop using this yahoo web interface which is useless with quoting)
>
> Thanks for all the replies. I'll try out what you outlined. There are unfortunately problems outside of my control that make it worse: for company-internal policy reasons, I'm not allowed direct access to the server. I'm only able to get a copy of the repo to work with and a promise that they can replace the repo with my modified version when I'm done. This might make some of the suggestions hard to work with, but I'll see if it seems possible. Also, the server runs 1.8, and I have no authority to get it upgraded. I think I may have a chance to change the read permissions for the sync user though, so there's a ray of light somewhere in there :)
>
> W.r.t. Johan's question about the time consumption for dumping, I haven't yet been able to test it myself. I only got this as second-hand info from someone who did a dump of the repo last year, so I hope that is completely incorrect. Will try dumping as soon as I get my hands on a repo copy.
>
> Regarding why the repo is so large: my estimate from running some analysis on old revisions is that 90-95% of the data consists of beginners doing accidental commits of things that should never have been committed.
>
 
 Okay, good luck with those "operations". I wanted to add a couple more bits of info:

 - After dump+filter+load or svnsync-with-filtering (effectively creating a new repository with an alternate history compared to the original) your new repository will / should have a new UUID. This effectively invalidates all existing working copies out there (which keep track of the UUID they were a checkout from). So all users will have to checkout new working copies.

 - You can perfectly well use a 1.10 version of svnadmin or svnsync (or svnrdump, to create a dumpfile from a remote server) to interact with a 1.8 server / repository. So if using a more modern version of svnadmin or svnsync is beneficial, you should use it :).

 - A dump file can be (much) larger than the original repository itself, depending on how the dump is created. That's because the repository potentially uses deltification, compression and "representation sharing". If you use the --deltas option for 'svnadmin dump', it will be smaller, at the expense of cpu time for the deltification. Usually people will not use the --deltas option when sending it directly through a pipe (saving on the cpu cycles for deltification), but when writing it to a file you should probably use --deltas.
 
 --
 Johan
Received on 2018-10-10 09:05:16 CEST

This is an archived mail posted to the Subversion Users mailing list.
