Re: svndumpfilter and svnsync?

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Thu, 4 Oct 2018 14:36:22 +0200

On Thu, Oct 4, 2018 at 2:33 PM Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
>
> Ryan Schmidt wrote on Thu, 04 Oct 2018 06:04 -0500:
> > On Oct 4, 2018, at 02:32, Chris wrote:
> > > I figured using svnsync to get the "cleaned repo" up to date with the changes on the "live repo", but a note in the svnsync documentation says "The only commits and revision property modifications that ever occur on that mirror repository should be those performed by the svnsync tool". Does that also include this kind of cleanup operation where I remove paths that don't exist on HEAD?
>
> Yes. The precondition for running 'svnsync' is that every revision in
> the target repository is identical to the corresponding revision in the
> source repository. "Correspondence", in this sense, simply means
> numeric equality: r5 must correspond to r5, not to r6 nor to r4.
>
> > > If I shouldn't use svnsync for this, what should I do instead?
>
> You should use svnsync and set the source repository's URL to a URL
> that has authz restrictions denying read to the large binary blobs.
>
> That's it.

Indeed, like Daniel said, you can do this with svnsync by setting up
an authz configuration on the source repository that denies the
svnsync user read access to the problematic files (see [1]).
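
A minimal sketch of that setup, under assumptions: the account name
'syncuser', the path '/trunk/bigfiles' and both URLs are placeholders,
not anything from the original setup. It only shows the shape of the
authz rules and the two svnsync steps; how the sync account
authenticates is left out.

```shell
#!/bin/sh
# Sketch only: 'syncuser', '/trunk/bigfiles' and both URLs are hypothetical.

# Print an authz fragment that hides one path from the sync account.
# An empty value after '=' means "no access" for that user on that path.
authz_fragment() {
    user=$1
    hidden_path=$2
    cat <<EOF
[/]
$user = r

[$hidden_path]
$user =
EOF
}

# Initialize the mirror and copy revisions from the restricted source URL.
# The mirror must already exist and must allow revision property changes
# (a pre-revprop-change hook that exits 0).
sync_mirror() {
    mirror_url=$1
    source_url=$2
    svnsync initialize "$mirror_url" "$source_url" &&
    svnsync synchronize "$mirror_url"
}

# Example:
#   authz_fragment syncuser /trunk/bigfiles >> /path/to/source/conf/authz
#   sync_mirror file:///path/to/mirror https://svn.example.com/repo
```

Because the filtering happens on the source side, every revision still
arrives in the mirror under its original number, which is what svnsync
requires.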

Also, I'm quite surprised that dumping your repository takes 2 weeks.
What version of svn are you using? I'm used to 'load' taking a long
time (but that has been improved a lot in 1.10 by adding a
--no-flush-to-disk option for 'svnadmin load' [2]), but 'dump'
shouldn't take that long. Perhaps the problem is that the dump file is
getting way too large. If so, consider piping 'svnadmin dump |
svndumpfilter | svnadmin load', which avoids writing the dump file to
disk at all.
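
The pipe could be wrapped up roughly like this. It is a sketch under
assumptions: the repository paths and the excluded prefix in the
example are made up, and the target repository is assumed to exist
already.

```shell
#!/bin/sh
# Sketch only: repository paths and the excluded prefix are hypothetical.

# Stream a filtered copy of SRC into DST without writing a dump file.
# DST must already exist (svnadmin create). svndumpfilter keeps emptied
# revisions unless told otherwise, so revision numbers in DST line up
# with those in SRC.
filtered_copy() {
    src_repo=$1
    dst_repo=$2
    shift 2
    svnadmin dump --quiet "$src_repo" |
        svndumpfilter exclude "$@" |
        svnadmin load --quiet "$dst_repo"
}

# Example:
#   svnadmin create /srv/svn/cleaned
#   filtered_copy /srv/svn/live /srv/svn/cleaned /trunk/bigfiles
```

One thing to watch for: svndumpfilter stops with an error when a path
you keep was copied from a path you exclude, so the list of prefixes
may need some adjusting.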

I would also suggest you read this FAQ entry [3], where I documented a
procedure (which I've used myself) to perform a dump + load, while the
source repo is still fully online. The initial dump+load can take a
long time. Then you follow up with an incremental dump+load to catch
up with commits that happened in the meantime (you can repeat this
catch-up procedure as many times as you like, so you eventually have
minimal downtime for the "final catchup").
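
As an illustration of the catch-up step (not the FAQ text itself), here
is a sketch that assumes both repositories are local, that the target
was loaded without renumbering revisions, and that the paths passed in
are your own. The function names are made up for this example.

```shell
#!/bin/sh
# Sketch only: repository paths and function names are hypothetical.

# Print the youngest revision number of a repository.
youngest() {
    svnlook youngest "$1"
}

# Load into DST every revision of SRC that DST does not have yet.
# Safe to repeat; does nothing when DST is already caught up.
catch_up() {
    src_repo=$1
    dst_repo=$2
    src_rev=$(youngest "$src_repo") || return 1
    dst_rev=$(youngest "$dst_repo") || return 1
    [ "$dst_rev" -lt "$src_rev" ] || return 0
    svnadmin dump --quiet --incremental \
        -r "$((dst_rev + 1)):$src_rev" "$src_repo" |
        svnadmin load --quiet "$dst_repo"
}

# Example:
#   catch_up /srv/svn/live /srv/svn/new
```

For the final catch-up you would first make the source repository
read-only, run this once more, and then switch users over.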

Another useful thing for you to look at is the new --include and
--exclude options for 'svnadmin dump' directly, which have been added
in svn 1.10 [4]. These work in a similar way to svnsync combined with
denying read access via authz. If you go that route, you don't need to
use svndumpfilter.
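
A sketch of that route, assuming svn 1.10 or later on the machine doing
the dump. The repository paths and the excluded prefix in the example
are placeholders.

```shell
#!/bin/sh
# Sketch only: repository paths and the excluded prefix are hypothetical.
# Requires svn 1.10 or later for 'svnadmin dump --exclude'.

# Copy SRC into DST, leaving out everything under one path prefix.
# DST must already exist (svnadmin create).
dump_excluding() {
    src_repo=$1
    dst_repo=$2
    prefix=$3
    svnadmin dump --quiet --exclude "$prefix" "$src_repo" |
        svnadmin load --quiet "$dst_repo"
}

# Example:
#   svnadmin create /srv/svn/cleaned
#   dump_excluding /srv/svn/live /srv/svn/cleaned /trunk/bigfiles
```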

[1] http://subversion.apache.org/faq.html#removal
[2] http://subversion.apache.org/docs/release-notes/1.10.html#no-flush-to-disk
[3] http://subversion.apache.org/faq.html#dumpload
[4] http://subversion.apache.org/docs/release-notes/1.10.html#dump-include-exclude

-- 
Johan

Received on 2018-10-04 14:49:23 CEST
