Hi Neels,
Neels J Hofmeyr writes:
> On a side note, svnsync happens to be relatively slow. I tried to svnsync
> the ASF repos once (for huge test data). The slowness of svnsync made it
> practically unfeasible to pull off. I ended up downloading a zipped dump and
> 'svnadmin load'ing that dump. Even with a zipped dump already downloaded,
> 'unzip | svnadmin load' took a few *days* to load the 950,000+ revisions.
> (And someone rebooted that box after two days, halfway through, grr. Took
> some serious hacking to finish up without starting over.)
Yeah, we had a tough time obtaining the complete undeltified ASF dump
for testing purposes as well.
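(For anyone who wants to reproduce that load locally, it amounts to
something like the following; the dump filename and repository path
are placeholders:

    # create an empty repository and feed the decompressed dump into it
    svnadmin create /path/to/asf-copy
    gunzip -c asf-dump.gz | svnadmin load -q /path/to/asf-copy

svnadmin load reads the dumpstream from stdin, so the decompressor can
be piped straight into it.)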
> So, that experience tells me that svnsync and svnadmin dump/load aren't
> close to optimal, for example compared to a straight download of 34 gigs
> that the ASF repos is... Anything that could speed up a remote dump/load
> process would probably be good -- while I don't know any details about svnrdump.
I benchmarked it recently: it dumps 10,000 revisions of the ASF
repository in 106 seconds, which is about 94 revisions per second. It
was faster than `svnadmin` in an older benchmark, so I'll work on the
performance issues this week; I estimate it should be possible to get
it dumping at ~140 revisions/second.
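(For reference, a measurement along these lines gives numbers in that
ballpark; the revision range and URL here are illustrative rather than
the exact command I ran:

    # time dumping the first 10,000 revisions of the ASF repository
    time svnrdump dump http://svn.apache.org/repos/asf -r 1:10000 > /dev/null

10,000 revisions in 106 seconds is where the ~94 revisions/second
figure comes from; ~140 revisions/second would mean finishing the same
range in about 71 seconds.)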
@Daniel and others: I'd recommend a feature freeze. I'm currently
profiling svnrdump and working on improving the I/O profile in
particular.
> My two cents: Rephrasing everything into the dump format and back blows up
> both data size and ETA. Maybe a remote backup mechanism could even break
> loose from discrete revision boundaries during transfer/load...
I've been thinking about this too: we'll have to start attacking the
RA layer itself to make svnrdump even faster. The replay API isn't
optimized for this kind of operation.
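To make the round trip Neels describes concrete, a remote mirror built
with today's tools looks roughly like this (the paths are
placeholders):

    # mirror a remote repository via the dump format
    svnadmin create /path/to/mirror
    svnrdump dump http://svn.apache.org/repos/asf | svnadmin load -q /path/to/mirror

Every revision is driven through the replay editor, serialized into
the dump format on the client, and then parsed back out of it by
svnadmin load on the other end.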
> P.S.: If the whole ASF repos were a single Git WC, how long would that take
> to pull? (Given that Git tends to take up much more space than a Subversion
> repos, I wonder.)
The gzipped undeltified dump of the complete ASF repository comes to
about 25 GiB and it takes ~70 minutes to import it into the Git object
store using a tool which is currently under development in Git. Thanks
to David for these statistics.
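I don't know the exact invocation David used, but the import path is
along the lines of a dumpfile-to-fast-import frontend (presumably
svn-fe from Git's contrib/) feeding git fast-import:

    # filename is a placeholder; svn-fe is an assumption on my part
    git init asf.git && cd asf.git
    zcat ../asf-dump.gz | svn-fe | git fast-import

The dump is translated into a fast-import stream, and git fast-import
builds the object store from it.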
Cloning takes as long as it takes to transmit this data. After a
repack it'll probably shrink in size, but that's beside the point.
Git was never designed to handle this; each project being a separate
repository would be a fairer comparison. Even linux-2.6.git contains
just 210,887 revisions, and it already tests Git's limits.
-- Ram