
Re: Betr.: Re: "svnadmin load" a huge file

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Thu, 20 Jan 2011 19:09:01 +0100

On Thu, Jan 20, 2011 at 6:11 PM, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
> Victor Sudakov wrote on Thu, Jan 20, 2011 at 14:18:00 +0600:
>> Colleagues,
>>
>> I have finally completed a test cvs2svn conversion on an amd64 system.
>> The peak memory requirement of svnadmin during the conversion was
>> 9796M SIZE, 1880M RES. The resulting SVN repo size is 8.5G on disk.
>>
>> "svnadmin dump --deltas" of this new SVN repo required 6692M SIZE,
>> 2161M RES of memory at its peak.  Such memory requirements make this
>> repo completely unusable on i386 systems.
>>
>> The original CVS repo is 59M on disk with 17859 files (including those
>> in the Attic) and total 23911 revisions (in SVN terms). All files are
>> strictly text.
>>
>> Something seems to be very suboptimal either about SVN itself or about
>> the cvs2svn utility. I am especially surprised by the 8.5G size of the
>> resulting SVN repository (though the result of "svnadmin dump --deltas"
>> is 44M).
>>
>> > - Copy your CVS repository (say /myreypository to /myrepositoryconv)
>> > - In the copy move the ,v files into several subdirectories (using the
>> > operating system, not using CVS commands.)
>> > - Convert the directories one at a time and load them into svn.
>> > - Once loaded into svn you can move everything back into one folder
>> >   (using svn commands) if desired.
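>> >
>> > Roughly like this (untested; the paths and the 'part1' split are
>> > only illustrative, and going via a dumpfile is just one way to do
>> > the load):
>> >
>> >   cp -R /myrepository /myrepositoryconv
>> >   mkdir /myrepositoryconv/part1
>> >   mv /myrepositoryconv/module-a /myrepositoryconv/part1/
>> >   cvs2svn --dumpfile=/tmp/part1.dump /myrepositoryconv/part1
>> >   svn mkdir -m "Dir for part1" file:///path/to/svnrepo/part1
>> >   svnadmin load --parent-dir part1 /path/to/svnrepo < /tmp/part1.dump
>> >   # ...repeat for part2, part3, ..., then, if desired:
>> >   svn mv -m "Reorganize after conversion" \
>> >       file:///path/to/svnrepo/part1/module-a \
>> >       file:///path/to/svnrepo/module-a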
>>
>> Even if I do this, after moving everything back I will still not be
>> able to do "svnadmin dump" on an i386 system, unless perhaps I write
>> some script that iterates over revision ranges and keeps track of
>> the dumped revision numbers.
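>>
>> Something along these lines, I suppose (untested; the path and the
>> 1000-revision chunk size are made up):
>>
>>   YOUNGEST=`svnlook youngest /path/to/repo`
>>   r=0
>>   while [ $r -le $YOUNGEST ]; do
>>     upper=`expr $r + 999`
>>     [ $upper -gt $YOUNGEST ] && upper=$YOUNGEST
>>     svnadmin dump --incremental -r $r:$upper /path/to/repo >> repo.dump
>>     r=`expr $upper + 1`
>>   done
>>
>> Each chunk runs in a fresh svnadmin process, so memory use would at
>> least be bounded per chunk.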
>>
> That's not a nice result, but I think I said somewhere in this thread
> that there are known memory-usage bugs in svnadmin dump/load, which
> means the fix (as opposed to a 'workaround') for this issue is to have
> someone (possibly you, or someone you hire) look into those bugs.
>
> With a bit of luck, this will boil down to finding some place where
> allocations should be done in a scratch_pool or iterpool instead of
> some long-lived result_pool (which may simply be called 'pool'). One
> can compile with APR pool debugging enabled to get information about
> what is allocated from which pool.
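>
> For example, with a from-source build, something like this (I'm
> quoting APR's configure switch from memory, so double-check with
> ./configure --help):
>
>   cd apr-x.y.z
>   ./configure --enable-pool-debug=all
>   make && make install
>
> and then rebuild Subversion against that APR.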
>
> Paul Burba's work on the recent fixed-in-1.6.15
> DoS-via-memory-consumption CVE can serve as an example.
>
> Daniel
> (there are plenty of workarounds: svnsync, incremental dump, whatnot;
> they are discussed elsethread)
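>
> For reference, the svnsync route is roughly this (the destination
> must be a new, empty repository with a pre-revprop-change hook that
> simply exits 0):
>
>   svnadmin create /path/to/mirror
>   svnsync initialize file:///path/to/mirror file:///path/to/source
>   svnsync synchronize file:///path/to/mirror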

But that doesn't explain why the resulting repository is so large
compared to the original CVS repository. Sure, there might be
memory-usage problems in dump/load (it uses more memory than the
resulting repository occupies on disk), but I think there is more
going on.

That's why I'm guessing that the rev files (and the corresponding
in-memory structures) are so large because of the number of directory
entries in each revision. I'm not intimately familiar with how this is
all represented, or with how the rev files are structured, so I'm just
guessing ... I seem to remember something like this from another
discussion in the past.

Cheers,

-- 
Johan
Received on 2011-01-20 19:10:03 CET
