(Apologies for the top-posting; I really need to stop using this Yahoo web interface, which is useless for quoting.)
Thanks for all the replies. I'll try out what you outlined. Unfortunately, there are constraints outside my control that make this harder: for company-internal policy reasons I'm not allowed direct access to the server. I can only get a copy of the repo to work with, plus a promise that they will replace the repo with my modified version when I'm done. This might make some of the suggestions hard to apply, but I'll see what is possible. Also, the server runs 1.8, and I have no authority to get it upgraded. I think I may have a chance to change the read permissions for the sync user though, so there's a ray of light somewhere in there :)
W.r.t. Johan's question about how long dumping takes: I haven't yet been able to test it myself. I only have this as second-hand info from someone who dumped the repo last year, so I hope it turns out to be completely incorrect. I will try dumping as soon as I get my hands on a repo copy.
Regarding why the repo is so large: my estimate from running some analysis on old revisions is that 90-95% of the data consists of beginners accidentally committing things that should never have been allowed into the repo.
BR,
Chris
--------------------------------------------
On Thu, 10/4/18, Johan Corveleyn <jcorvel_at_gmail.com> wrote:
Subject: Re: svndumpfilter and svnsync?
To: "Chris" <devnullaccount_at_yahoo.se>
Cc: "Ryan Schmidt" <subversion-2018_at_ryandesign.com>, "Daniel Shahaf" <d.s_at_daniel.shahaf.name>, "Subversion" <users_at_subversion.apache.org>
Date: Thursday, October 4, 2018, 2:36 PM
On Thu, Oct 4, 2018 at 2:33 PM Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
>
> Ryan Schmidt wrote on Thu, 04 Oct 2018 06:04 -0500:
> > On Oct 4, 2018, at 02:32, Chris wrote:
> > > I figured using svnsync to get the "cleaned repo" up to date with
> > > the changes on the "live repo", but a note in the svnsync
> > > documentation says "The only commits and revision property
> > > modifications that ever occur on that mirror repository should be
> > > those performed by the svnsync tool". Does that also include this
> > > kind of cleanup operation where I remove paths that don't exist on
> > > HEAD?
>
> Yes. The precondition for running 'svnsync' is that every revision in
> the target repository is identical to the corresponding revision in the
> source repository. "Correspondence", in this sense, simply means
> numeric equality: r5 must correspond to r5, not to r6 nor to r4.
>
> > > If I shouldn't use svnsync for this, what should I do instead?
>
> You should use svnsync and set the source repository's URL to a URL
> that has authz restrictions denying read to the large binary blobs.
>
> That's it.
Indeed, like Daniel said, you can do this with svnsync by setting up
an authz configuration on the source repository, denying read access
to the problematic files to the svnsync user (see [1]).
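A minimal sketch of what such authz rules could look like, generated from a list of unwanted paths. The user name "svnsync-user" and the example paths are hypothetical, and the output is only a fragment meant to be merged into the source repository's existing authz file, not a complete configuration.

```python
def build_authz_fragment(sync_user, denied_paths):
    """Return authz rules that let sync_user read everything except denied_paths.

    An empty right-hand side ("user =") means "no access" in the
    path-based authorization file format.
    """
    lines = ["[/]", f"{sync_user} = r", ""]
    for path in denied_paths:
        # Normalise to a single leading slash and no trailing slash.
        section = "/" + path.strip("/")
        lines.append(f"[{section}]")
        lines.append(f"{sync_user} =")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical user name and paths, for illustration only.
    print(build_authz_fragment("svnsync-user",
                               ["/trunk/accidental-blobs", "branches/big-iso/"]))
```

The paths svnsync cannot read are simply left out of the mirror, so the mirror's revision numbers stay aligned with the source.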
Also, I'm quite surprised that dumping your repository takes 2 weeks.
What version of svn are you using? I'm used to 'load' taking a long
time (but that has been improved a lot in 1.10 by adding a
--no-flush-to-disk option for 'svnadmin load' [2]), but 'dump'
shouldn't take that long. Perhaps the problem is that the dump file is
getting way too large. You can also consider piping svnadmin dump |
svndumpfilter | svnadmin load.
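A rough sketch of that pipe driven from Python, so that no intermediate dump file is ever written to disk. The repository paths and the excluded path are hypothetical, and the target repository is assumed to have been created with 'svnadmin create' beforehand.

```python
import subprocess


def build_pipeline(src_repo, dst_repo, excluded_paths):
    """Return the three command lines for dump | filter | load."""
    dump = ["svnadmin", "dump", "--quiet", src_repo]
    filt = ["svndumpfilter", "exclude", *excluded_paths]
    load = ["svnadmin", "load", "--quiet", dst_repo]
    return [dump, filt, load]


def run_pipeline(commands):
    """Chain the commands with OS pipes and return their exit codes."""
    procs = []
    prev_stdout = None
    for index, cmd in enumerate(commands):
        is_last = index == len(commands) - 1
        proc = subprocess.Popen(
            cmd,
            stdin=prev_stdout,
            stdout=None if is_last else subprocess.PIPE,
        )
        if prev_stdout is not None:
            # Close our copy so the upstream process sees a broken pipe
            # if the downstream process exits early.
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmds = build_pipeline("/srv/svn/live-copy", "/srv/svn/cleaned",
                          ["/trunk/accidental-blobs"])
    print(run_pipeline(cmds))
```

Any non-zero exit code in the returned list means that stage failed and the loaded repository should not be trusted.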
I would also suggest you read this FAQ entry [3], where I documented a
procedure (which I've used myself) to perform a dump + load, while the
source repo is still fully online. The initial dump+load can take a
long time. Then you follow up with an incremental dump+load to catch
up with commits that happened in the meantime (you can repeat this
catch-up procedure as many times as you like, so you eventually have
minimal downtime for the "final catchup").
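A small sketch of how one catch-up round could be computed, assuming revision numbers are kept aligned between the two repositories (that is, no revisions are dropped or renumbered while filtering). The youngest revision of each repository can be obtained with 'svnlook youngest'; the function names and the repository path here are hypothetical.

```python
def catchup_range(target_youngest, source_youngest):
    """Return (low, high) revisions still missing from the target, or None."""
    if source_youngest <= target_youngest:
        return None  # Target is already up to date.
    return (target_youngest + 1, source_youngest)


def incremental_dump_command(repo_path, rev_range):
    """Build the 'svnadmin dump' command line for one catch-up round."""
    low, high = rev_range
    return ["svnadmin", "dump", "--incremental", "-r", f"{low}:{high}", repo_path]


if __name__ == "__main__":
    # Hypothetical revision numbers and path, for illustration only.
    missing = catchup_range(target_youngest=1200, source_youngest=1234)
    if missing is not None:
        print(" ".join(incremental_dump_command("/srv/svn/live-copy", missing)))
```

Repeating this until catchup_range returns None corresponds to the "final catchup" with the source taken offline.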
Another useful thing for you to look at is the new --include and
--exclude options for 'svnadmin dump' directly, which have been added
in svn 1.10 [4]. These work in a similar way to svnsync + denying via
authz. If you go that route, you don't need to use svndumpfilter.
[1] http://subversion.apache.org/faq.html#removal
[2] http://subversion.apache.org/docs/release-notes/1.10.html#no-flush-to-disk
[3] http://subversion.apache.org/faq.html#dumpload
[4] http://subversion.apache.org/docs/release-notes/1.10.html#dump-include-exclude
--
Johan
Received on 2018-10-04 15:03:47 CEST