Re: Bulk copying revprops

From: Ivan Zhakov <ivan_at_visualsvn.com>
Date: Wed, 5 Aug 2015 15:05:08 +0300

On 24 July 2015 at 22:58, Philip Martin <philip.martin_at_wandisco.com> wrote:
> [Arising from some discussion on IRC today.]
>
> I've been considering the problem of a dump/load upgrade for a
> repository with a large number of revisions. To minimise downtime the
> initial dump/load would be carried out while the original repository
> remains live. When the load finishes the new repository is already
> out-of-date so an incremental dump/load is carried out. When this
> second load finishes the original repository is taken offline and we
> want to bring the new repository online as quickly as possible. A final
> incremental dump/load is required but that only involves a small number
> of revisions and so is fast. The remaining problems are locks and
> revprops.
>
> We do not have tools to handle locks so the options are: a) drop all the
> locks, or b) copy/move the whole db/locks subdir. I'm not really
> interested in locks at present.
>
> Revprops are more of a problem. Most revprops are up-to-date but a
> small number may be out-of-date. The problem is we do not know which
> revprops are out-of-date. Is there a reliable and efficient way to
> bring the revprops up-to-date? We could attempt to disable and/or track
> revprop changes during the load but this is not reliable. Post- hooks
> are not 100% reliable and revprop changes can bypass the hooks. We
> could attempt to copy/move the whole revprops subdir, but that is not
> always possible if the repository formats are different.
>
> One general solution is to use svnsync to bulk copy the revprops:
>
> ln -sf /bin/true dst/hooks/pre-revprop-change
> svnsync initialize --allow-non-empty file:///src file:///dst
> svnsync copy-revprops file:///src file:///dst
>
> This isn't very fast, I get about 2,000 revisions a minute for
> repositories on an SSD. There are typically three revprops per
> revision and the FS/RA APIs change one at a time. Each change must run
> the mandatory pre-revprop-change hook and fsync() the repository.
> svnsync has a simple algorithm that writes every revprop for each
> revision.
>
> For a repository with a million revisions svnsync would invoke three
> million processes to run the hooks and three million fsync() calls.
> Typically, most of this work is useless because most of the revprops
> already match.
>
> I wrote a script using the Python FS bindings (see below). This avoids
> the hooks and also elides the writes when the values already match.
> Typically this just has to read and so will process several hundred
> thousand revisions a minute. This will reliably update a million
> revisions in minutes.
>
> I was thinking that perhaps we ought to provide a more accessible way to
> do this. First, modify the FS implementations to detect when a change
> is a no-op that doesn't modify a value and skip all the writing. Second,
> provide some new admin commands to dump/load revprops:
>
> svnadmin dump-revprops repo | svnadmin load-revprops repo
>
Maybe use the existing 'load' subcommand with a '--revprops-only' switch
to load revprops instead of adding a new subcommand? I.e.:
  svnadmin dump --revprops-only | svnadmin load --revprops-only
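
To make the no-op elision described in the quoted text concrete, here is a minimal sketch of the comparison step. It is not the script Philip mentions and does not use the Subversion bindings. It assumes the revprops of both repositories have already been read into plain dicts, and the function name plan_revprop_updates and the sample data are hypothetical.

```python
def plan_revprop_updates(src, dst):
    """Return the (rev, name, value) changes needed to make dst match src.

    src and dst map revision number -> {revprop name: value}.
    This is an assumed in-memory model, not the real FS API.
    A value of None means the revprop should be deleted from dst.
    """
    changes = []
    for rev in sorted(src):
        src_props = src[rev]
        dst_props = dst.get(rev, {})
        # Look at every name present on either side of this revision.
        for name in sorted(set(src_props) | set(dst_props)):
            wanted = src_props.get(name)
            # Elide the write when the value already matches.
            if dst_props.get(name) != wanted:
                changes.append((rev, name, wanted))
    return changes


# Hypothetical sample data: only r2 differs between the repositories.
src = {
    1: {"svn:author": "alice", "svn:log": "initial import"},
    2: {"svn:author": "bob", "svn:log": "fix typo"},
}
dst = {
    1: {"svn:author": "alice", "svn:log": "initial import"},
    2: {"svn:author": "bob", "svn:log": "fix tpyo", "extra": "x"},
}

changes = plan_revprop_updates(src, dst)
for rev, name, value in changes:
    print(rev, name, value)
```

With this shape only the two differing revprops on r2 would need a write; everything else is read and skipped, which is where the speed-up over rewriting every revprop comes from.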

-- 
Ivan Zhakov
Received on 2015-08-05 14:06:29 CEST

This is an archived mail posted to the Subversion Dev mailing list.
