Hi Neels,
Neels J Hofmeyr writes:
> On a side note, svnsync happens to be relatively slow. I tried to svnsync
> the ASF repos once (for huge test data). The slowness of svnsync made it
> practically unfeasible to pull off. I ended up downloading a zipped dump and
> 'svnadmin load'ing that dump. Even with a zipped dump already downloaded,
> 'unzip | svnadmin load' took a few *days* to load the 950,000+ revisions.
> (And someone rebooted that box after two days, halfway through, grr. Took
> some serious hacking to finish up without starting over.)
Yeah, we had a tough time obtaining the complete undeltified ASF dump
for testing purposes as well.
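(For anyone who wants to reproduce that load locally, it amounts to
something like the following; the dump filename and repository path
are placeholders:

    # create an empty repository and feed the decompressed dump into it
    svnadmin create /path/to/asf-copy
    gunzip -c asf-dump.gz | svnadmin load -q /path/to/asf-copy

svnadmin load reads the dumpstream from stdin, so the decompressor can
be piped straight into it.)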
> So, that experience tells me that svnsync and svnadmin dump/load aren't
> close to optimal, for example compared to a straight download of 34 gigs
> that the ASF repos is... Anything that could speed up a remote dump/load
> process would probably be good -- while I don't know any details about svnrdump.
I benchmarked it recently: it dumps 10,000 revisions of the ASF
repository in 106 seconds, which is about 94 revisions per second. It
was faster than `svnadmin` in an older benchmark, so I'll work on the
performance issues this week; I estimate it should be possible to get
it dumping at ~140 revisions/second.
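(For reference, a measurement along these lines gives numbers in that
ballpark; the revision range and URL here are illustrative rather than
the exact command I ran:

    # time dumping the first 10,000 revisions of the ASF repository
    time svnrdump dump http://svn.apache.org/repos/asf -r 1:10000 > /dev/null

10,000 revisions in 106 seconds is where the ~94 revisions/second
figure comes from; ~140 revisions/second would mean finishing the same
range in about 71 seconds.)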
@Daniel and others: I'd recommend a feature freeze. I'm currently
profiling svnrdump and working on improving the I/O profile in
particular.
> My two cents: Rephrasing everything into the dump format and back blows up
> both data size and ETA. Maybe a remote backup mechanism could even break
> loose from discrete revision boundaries during transfer/load...
I've been thinking about this too: we'll have to start attacking the
RA layer itself to make svnrdump even faster. The replay API isn't
optimized for this kind of operation.
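To make the round trip Neels describes concrete, a remote mirror built
with today's tools looks roughly like this (the paths are
placeholders):

    # mirror a remote repository via the dump format
    svnadmin create /path/to/mirror
    svnrdump dump http://svn.apache.org/repos/asf | svnadmin load -q /path/to/mirror

Every revision is driven through the replay editor, serialized into
the dump format on the client, and then parsed back out of it by
svnadmin load on the other end.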
> P.S.: If the whole ASF repos were a single Git WC, how long would that take
> to pull? (Given that Git tends to take up much more space than a Subversion
> repos, I wonder.)
The gzipped undeltified dump of the complete ASF repository comes to
about 25 GiB and it takes ~70 minutes to import it into the Git object
store using a tool which is currently under development in Git. Thanks
to David for these statistics.
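I don't know the exact invocation David used, but the import path is
along the lines of a dumpfile-to-fast-import frontend (presumably
svn-fe from Git's contrib/) feeding git fast-import:

    # filename is a placeholder; svn-fe is an assumption on my part
    git init asf.git && cd asf.git
    zcat ../asf-dump.gz | svn-fe | git fast-import

The dump is translated into a fast-import stream, and git fast-import
builds the object store from it.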
Cloning takes as long as it takes to transmit this data. After a
repack it'll probably shrink in size, but that's beside the point.
Git was never designed to handle this; each project being a separate
repository would be a fairer comparison. Even linux-2.6.git contains
just 210,887 revisions, and it already tests Git's limits.
-- Ram