
Re: svndumpfilter and svnsync?

From: Chris <devnullaccount_at_yahoo.se>
Date: Thu, 4 Oct 2018 13:03:24 +0000 (UTC)

(Apologies for the top-posting. I really need to stop using this Yahoo web interface, which is useless at quoting.)

Thanks for all the replies. I'll try out what you outlined. Unfortunately, there are constraints outside my control that make this harder: for company-internal policy reasons I'm not allowed direct access to the server. I can only get a copy of the repo to work with, plus a promise that they will replace the repo with my modified version when I'm done. This might make some of the suggestions hard to apply, but I'll see whether it seems possible. Also, the server runs 1.8, and I have no authority to get it upgraded. I think I may have a chance to change the read permissions for the sync user though, so there's a ray of light somewhere in there :)

W.r.t. Johan's question about how long dumping takes: I haven't been able to test it myself yet. I only got this as second-hand info from someone who did a dump of the repo last year, so I hope it is completely incorrect. I will try dumping as soon as I get my hands on a repo copy.

Regarding why the repo is so large: my estimate from running some analysis on old revisions is that 90-95% of the data consists of beginners accidentally committing things that should never have been committed.

BR,
 Chris

--------------------------------------------
On Thu, 10/4/18, Johan Corveleyn <jcorvel_at_gmail.com> wrote:

 Subject: Re: svndumpfilter and svnsync?
 To: "Chris" <devnullaccount_at_yahoo.se>
 Cc: "Ryan Schmidt" <subversion-2018_at_ryandesign.com>, "Daniel Shahaf" <d.s_at_daniel.shahaf.name>, "Subversion" <users_at_subversion.apache.org>
 Date: Thursday, October 4, 2018, 2:36 PM
 
 On Thu, Oct 4, 2018 at 2:33 PM Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
>
> Ryan Schmidt wrote on Thu, 04 Oct 2018 06:04 -0500:
> > On Oct 4, 2018, at 02:32, Chris wrote:
> > > I figured using svnsync to get the "cleaned repo" up to date with
> > > the changes on the "live repo", but a note in the svnsync
> > > documentation says "The only commits and revision property
> > > modifications that ever occur on that mirror repository should be
> > > those performed by the svnsync tool". Does that also include this
> > > kind of cleanup operation where I remove paths that don't exist on
> > > HEAD?
>
> Yes.  The precondition for running 'svnsync' is that every revision in
> the target repository is identical to the corresponding revision in the
> source repository.  "Correspondence", in this sense, simply means
> numeric equality: r5 must correspond to r5, not to r6 nor to r4.
>
> > > If I shouldn't use svnsync for this, what should I do instead?
>
> You should use svnsync and set the source repository's URL to a URL
> that has authz restrictions denying read to the large binary blobs.
>
> That's it.
 
 Indeed, like Daniel said, you can do this with svnsync by setting up
 an authz configuration on the source repository, denying read access
 to the problematic files to the svnsync user (see [1]).
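
As a rough illustration of the authz idea, the sketch below generates the kind of path-based rules involved. It is only a sketch: the user name and the two paths are hypothetical placeholders, and it assumes the path-based authz file syntax in which an empty value after `user =` means no access.

```python
def render_authz(sync_user, denied_paths):
    """Return authz rules: read access at the root, no access to denied paths."""
    lines = ["[/]", f"{sync_user} = r", ""]
    for path in denied_paths:
        # An empty right-hand side means "no access" for this user on this path.
        lines += [f"[{path}]", f"{sync_user} =", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical user and paths, for illustration only.
    print(render_authz("svnsync-user",
                       ["/trunk/build-output", "/branches/old/huge.iso"]))
```

If one authz file serves several repositories, the section headers would need a repository prefix such as `[reponame:/path]`.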
 
 Also, I'm quite surprised that dumping your repository takes 2 weeks.
 What version of svn are you using? I'm used to 'load' taking a long
 time (but that has been improved a lot in 1.10 by adding a
 --no-flush-to-disk option for 'svnadmin load' [2]), but 'dump'
 shouldn't take that long. Perhaps the problem is that the dump file is
 getting way too large. You can also consider piping svnadmin dump |
 svndumpfilter | svnadmin load.
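
The piped variant avoids writing the dump file to disk at all. A minimal sketch of how it could be driven from a script follows; the repository locations and excluded paths are hypothetical, and the helper names `build_pipeline` and `run_pipeline` are made up for this example.

```python
import shlex
import subprocess


def build_pipeline(src_repo, dst_repo, excluded_paths):
    """Return the three argv lists for dump | filter | load."""
    return [
        ["svnadmin", "dump", "--quiet", src_repo],
        ["svndumpfilter", "exclude", *excluded_paths],
        ["svnadmin", "load", "--quiet", dst_repo],
    ]


def run_pipeline(commands):
    """Chain the commands with pipes; return their exit codes in order."""
    procs = []
    prev_stdout = None
    for i, argv in enumerate(commands):
        is_last = i == len(commands) - 1
        proc = subprocess.Popen(
            argv,
            stdin=prev_stdout,
            stdout=None if is_last else subprocess.PIPE,
        )
        if prev_stdout is not None:
            # Drop our copy of the pipe so only the child processes hold it.
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Hypothetical locations; this only prints the equivalent shell pipeline.
    cmds = build_pipeline("/srv/svn/live", "/srv/svn/cleaned", ["/trunk/blobs"])
    print(" | ".join(shlex.join(c) for c in cmds))
```

Calling `run_pipeline(cmds)` would actually execute the three tools; a non-zero entry in the returned list means that stage failed.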
 
 I would also suggest you read this FAQ entry [3], where I documented a
 procedure (which I've used myself) to perform a dump + load, while the
 source repo is still fully online. The initial dump+load can take a
 long time. Then you follow up with an incremental dump+load to catch
 up with commits that happened in the meantime (you can repeat this
 catch-up procedure as many times as you like, so you eventually have
 minimal downtime for the "final catchup").
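
The catch-up step boils down to dumping only the revisions the target does not have yet. The sketch below shows one way to compute that range; it assumes a plain (unfiltered, not renumbered) dump+load so that revision numbers match on both sides, and the function names are made up for this example.

```python
import subprocess


def youngest(repo):
    """Return the youngest revision of a local repository via svnlook."""
    result = subprocess.run(
        ["svnlook", "youngest", repo],
        check=True, capture_output=True, text=True,
    )
    return int(result.stdout.strip())


def catch_up_command(src_repo, src_youngest, dst_youngest):
    """Return the `svnadmin dump` argv for the missing revisions, or None."""
    if dst_youngest >= src_youngest:
        return None  # the target already has every source revision
    rev_range = f"{dst_youngest + 1}:{src_youngest}"
    return ["svnadmin", "dump", "--incremental", "-r", rev_range, src_repo]
```

The returned command's output would be piped into `svnadmin load` on the target, and the whole step repeated until the remaining gap is small enough for the final switch-over.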
 
 Another useful thing for you to look at is the new --include and
 --exclude options for 'svnadmin dump' itself, which were added in
 svn 1.10 [4]. These work in a similar way to svnsync combined with
 denying read access via authz. If you go that route, you don't need
 to use svndumpfilter.
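
For the 1.10 route, a sketch of assembling such a dump command is shown below. It assumes that --exclude takes one absolute repository path per occurrence and may be repeated; please verify this against `svnadmin help dump` for your version. The paths are hypothetical.

```python
def dump_with_excludes(src_repo, excluded_paths):
    """Return an `svnadmin dump` argv with one --exclude per path (svn 1.10+)."""
    argv = ["svnadmin", "dump", "--quiet"]
    for path in excluded_paths:
        # Assumption: the option is repeatable and paths start with "/".
        argv += ["--exclude", path]
    argv.append(src_repo)
    return argv


if __name__ == "__main__":
    # Hypothetical repository location and paths.
    print(dump_with_excludes("/srv/svn/live", ["/trunk/blobs", "/tags/huge"]))
```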
 
 [1] http://subversion.apache.org/faq.html#removal
 [2] http://subversion.apache.org/docs/release-notes/1.10.html#no-flush-to-disk
 [3] http://subversion.apache.org/faq.html#dumpload
 [4] http://subversion.apache.org/docs/release-notes/1.10.html#dump-include-exclude
 
 
 --
 Johan
Received on 2018-10-04 15:03:47 CEST

This is an archived mail posted to the Subversion Users mailing list.
