
Re: svndumpfilter and svnsync?

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Wed, 10 Oct 2018 10:42:37 +0200

On Wed, Oct 10, 2018 at 9:16 AM Ryan Schmidt
<subversion-2018_at_ryandesign.com> wrote:
>
>
>
> On Oct 10, 2018, at 02:04, Chris wrote:
>
> > I've trawled through bad commits of data files in our repo and added such paths to a filter file that I'm using for svndumpfilter to get a reasonable-looking dump. In most cases, the files in question existed in a single path (branch) and were no problem. But in some cases, the same files had been copied to a 2nd branch and then svndumpfilter gave me errors about missing source paths, so I added the same path on the 2nd branch to the filter expressions and tried again. After a few iterations of this process, I have a dump that should do what I want.
> > So I start "svnadmin load" and based on initial progress, that might take a couple of days to complete so I leave it overnight. I get back today and the load has crashed with a missing path. The error was:
> >
> > svnadmin: E160013: File not found: transaction '16289-ckh', path 'branches/second/dir/datafile'
> >
> > And looking up the history for that file, I see that "datafile" was added on branch "first" but the path "branches/first/dir" is already in my filter list. So why didn't svndumpfilter throw me an error on this like it did for a lot of other cases?
> > Since the load process is so much slower, the turnaround time for each error in that step is beyond painful, so anything I can do to ensure that this gets caught by the filter would make my life a lot easier.
>
> I don't know the answer to that, but:

Hm, not really a clear answer here either. I don't know why
svndumpfilter did not detect these.
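
Since a failed load is so expensive, a cheap check over the filtered
dump before loading might help. Below is a rough, untested-against-your-repo
sketch, not something svndumpfilter offers: it assumes the filter file
holds one path prefix per line and relies on the 'Node-path' and
'Node-copyfrom-path' headers of the dump format. The function name
list_suspect_copies is made up. It lists every copy whose source is an
excluded path, lies below one, or is a parent of one. Whether that last
case is what bit you here is only a guess on my part.

```shell
# Hypothetical helper: list copies whose source overlaps an excluded prefix.
# usage: list_suspect_copies FILTERFILE DUMPFILE
list_suspect_copies() {
    LC_ALL=C awk '
        # First file: excluded prefixes, one per line, slashes trimmed.
        # (Assumes the filter file is not empty.)
        NR == FNR {
            gsub(/^\/+|\/+$/, "")
            if ($0 != "") excluded[$0] = 1
            next
        }
        # Second file: remember the node path, then inspect its copy source.
        /^Node-path: / { node = substr($0, 12) }
        /^Node-copyfrom-path: / {
            src = substr($0, 21)
            sub(/^\/+/, "", src)
            for (p in excluded) {
                # Same path, source below the prefix, or source above it.
                if (src == p || index(src "/", p "/") == 1 || index(p "/", src "/") == 1) {
                    print node " <- " src " (overlaps " p ")"
                    break
                }
            }
        }
    ' "$1" "$2"
}
```

With the file names from your mail that would be
'list_suspect_copies filterfile filterdump'. File contents that happen
to contain a line starting with one of those headers would show up as
false positives, so treat the output as a list of things to look at.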

However, you might also give 'svnadmin dump --exclude' a try, if you
can use version 1.10 of svnadmin.
http://subversion.apache.org/docs/release-notes/1.10.html#dump-include-exclude

This feature works similarly to 'svnsync with an authz file that
denies the excluded files'. That means that, when the source of a copy
is excluded, the copy is transformed into an add. So if you want to
completely eliminate a bad file and all its copies, it might be more
difficult to get hold of those copies: you won't get any warnings or
errors, I think (I'm not sure whether it emits a notification for such
a copy-to-add conversion). OTOH, 'svnadmin dump --exclude' supports
wildcards if you
add the --pattern option, so it might be easier to filter out all
appearances of a specific filename, as in 'svnadmin dump --pattern
--exclude /*/datafile'.
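
Spelled out as a small wrapper, that could look roughly like the sketch
below. It needs svnadmin 1.10 or later; the function name and the
arguments are placeholders, and the pattern is just the one from the
example above, so adjust it to your file names.

```shell
# Sketch: dump a repository while leaving out every path matching a glob.
# usage: dump_without_datafiles REPO OUTFILE   (names are placeholders)
dump_without_datafiles() {
    # Quote the pattern so the shell does not expand the '*' itself.
    svnadmin dump -q --pattern --exclude '/*/datafile' "$1" > "$2"
}
```

For example 'dump_without_datafiles MYREPO filterdump'. The result can
be loaded directly, without going through svndumpfilter.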

>
> > The syntax I used:
> > svnadmin dump -q MYREPO | svndumpfilter exclude --targets filterfile > filterdump
> > svnadmin load -q --no-flush-to-disk --force-uuid -M 2048 --bypass-prop-validation ./NEWREPO < filterdump
> >
> > (I had to use --bypass-prop-validation due to some newline issues in old log messages, similar to this one https://groups.google.com/forum/#!topic/subversion_users/P3ohZ-hKhCA, don't know why they have wrong newlines, but the repo works as it is now...)
>
> Instead of ignoring wrong newlines, you could fix them using svndumptool (using its eolfix-revprop command), originally at:
>
> http://svn.borg.ch/svndumptool/
>
> Newer fork at:
>
> https://github.com/jwiegley/svndumptool

Also, as of version 1.10, svnadmin finally has an option to normalize
these on-the-fly during 'load':
http://subversion.apache.org/docs/release-notes/1.10.html#normalize-props

It's a lot better to normalize these (either with the
--normalize-props option for 'svnadmin load' or by using svndumptool)
than to "bypass" them. Otherwise you'll run into this again the next
time you dump+load the repository.
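
As a sketch, this is the load command from the original mail with
--bypass-prop-validation swapped for --normalize-props (svnadmin 1.10
or later). The function name is made up and the other options are
simply carried over from the mail.

```shell
# Sketch: load a dump and normalize line endings in properties on the fly.
# usage: load_normalized REPO DUMPFILE   (names are placeholders)
load_normalized() {
    svnadmin load -q --no-flush-to-disk --force-uuid -M 2048 \
        --normalize-props "$1" < "$2"
}
```

For example 'load_normalized ./NEWREPO filterdump'.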

And another tip: put the repo-to-be-loaded-into (NEWREPO) on as fast a
storage system as possible (SSD, ramdisk if feasible, ...). If you're
satisfied with the result, run 'svnadmin pack' on that fast storage,
and only then copy it over to the final location. Depending on the
final storage that technique might save you a lot of time (especially
if you have to redo it a couple of times).
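
In shell terms the sequence could look something like this. It is only
a sketch: the function name and all three paths are hypothetical, and a
plain 'cp -a' stands in for whatever copy method suits your final
storage.

```shell
# Sketch: build the repository on fast storage, pack it, then copy it over.
# usage: load_pack_and_move DUMPFILE FAST_REPO FINAL_REPO
load_pack_and_move() {
    # Add --compatible-version here if an older server will serve the repo.
    svnadmin create "$2" || return 1
    svnadmin load -q "$2" < "$1" || return 1
    # Pack while still on the fast storage.
    svnadmin pack "$2" || return 1
    # Copy only once you are satisfied with the result.
    cp -a "$2" "$3"
}
```

If the load has to be redone, only the fast copy is thrown away and
the final location is never touched until the end.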

> > An additional question about what Johan wrote below:
> >> - You can perfectly well use a 1.10 version of svnadmin or svnsync (or svnrdump, to create
> >> a dumpfile from a remote server) to interact with a 1.8 server / repository.
> >
> > Can I even do this with "svnadmin load"? I thought that would use FSFS format 8, while 1.8 should have 6. I got that impression from my "research", but I'm probably off base.
>
> If you use a newer version of svnadmin (than the one that will be used to serve the repo) to create the new repo and load the dump file, then make sure you pass the right --compatible-version argument to svnadmin create.

Indeed. It's at 'svnadmin create' time that the FSFS version is
decided. 'svnadmin load' will just "commit" new revisions in the
repository that you first created, and it will follow / respect the
FSFS format that's already set. So it's perfectly doable to create and
load a NEWREPO with 1.10 svnadmin, which you intend to be served by a
1.8 svn server (as long as you use the --compatible-version argument
at create time). (Small note though: 1.8 is no longer supported, so if
you can, plan to do an upgrade to 1.9 or preferably 1.10 soon).
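
A minimal sketch of the create step, with a made-up function name. As
far as I know the first line of db/format in an FSFS repository holds
the filesystem format number, so printing it is a quick way to confirm
you got the 6 mentioned above rather than 8.

```shell
# Sketch: create a repository readable by an older Subversion version.
# usage: create_compatible_repo REPO VERSION   (e.g. NEWREPO 1.8)
create_compatible_repo() {
    svnadmin create --compatible-version "$2" "$1" || return 1
    # Show the FSFS format number that 'svnadmin load' will respect.
    head -n 1 "$1/db/format"
}
```

For example 'create_compatible_repo NEWREPO 1.8', followed by the
usual 'svnadmin load' into NEWREPO.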

-- 
Johan
Received on 2018-10-10 10:43:01 CEST

This is an archived mail posted to the Subversion Users mailing list.
