
Re: Problem with large files

From: Brandon Ehle <azverkan_at_yahoo.com>
Date: 2006-08-29 07:38:30 CEST

I resurrected the script and uploaded it here

http://subversion.kicks-ass.org/genrepos.pl

There are a couple of parameters you can specify for the type of
repository you want. I typically use the "medium" or "large" style
repositories for profiling.

If you want to thrash your machine until it crashes, try the
"everything" repository. It creates a 5GB working copy and keeps
checking in binary file changes until your machine runs out of disk space.

Also, here is one of my recent KCachegrind profiles of a large binary
checkout operation over ra_local for 1.5.0-dev.

http://subversion.kicks-ass.org/checkout.png

There have been a bunch of improvements since I last profiled this
(around version 0.28), but vdelta is still taking most of the time, with
MD5 calculation a close second.

Most of the vdelta time appears to be spent doing the two comparisons
and the branch in find_match_len(), although this is most likely
related to the cache misses caused by find_match_len().
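
For anyone not familiar with that kind of loop, here is a minimal sketch of
a match-length scan. It is an illustration only, not copied from the
Subversion source; the name match_len() and its signature are assumptions.
It shows where the two comparisons and the branch per byte come from, and
why both pointers can miss the cache when they walk distant regions of a
large buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not the Subversion implementation): count how many
   bytes starting at `match` equal the bytes starting at `from`, without
   reading `from` at or past `end`. */
static size_t
match_len(const char *match, const char *from, const char *end)
{
  const char *here = from;

  assert(from <= end);

  /* Per iteration: one bounds comparison, one byte comparison, and the
     loop branch. `match` and `here` may point far apart in the buffer,
     so each load can touch a different cache line. */
  while (here < end && *match == *here)
    {
      ++match;
      ++here;
    }

  return (size_t)(here - from);
}
```

The loop body is tiny, so on large binary files the cost is more likely to
come from memory latency than from the comparisons themselves.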

It also appears that the MD5 sums for the checked-out files are
calculated multiple times in multiple places during an ra_local checkout,
and a large portion of the time is spent doing that.
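
As a rough illustration of how a single pass could be enough, here is a
sketch of a running checksum that is updated once per chunk as the data
streams through, so the final value can be reused instead of re-reading the
file. Everything here is an assumption for illustration: the checksum_ctx
type and function names are hypothetical, and FNV-1a stands in for MD5 only
to keep the example self-contained. It is not how Subversion does it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical running-checksum state. FNV-1a is a stand-in for MD5. */
typedef struct
{
  uint32_t hash;        /* checksum of everything fed in so far */
  size_t bytes_hashed;  /* total bytes fed in, to spot duplicate passes */
} checksum_ctx;

static void
checksum_init(checksum_ctx *ctx)
{
  ctx->hash = 2166136261u;  /* FNV-1a 32-bit offset basis */
  ctx->bytes_hashed = 0;
}

/* Feed one chunk of the stream into the running checksum. */
static void
checksum_update(checksum_ctx *ctx, const unsigned char *data, size_t len)
{
  size_t i;

  for (i = 0; i < len; ++i)
    {
      ctx->hash ^= data[i];
      ctx->hash *= 16777619u;  /* FNV-1a 32-bit prime */
    }
  ctx->bytes_hashed += len;
}
```

If bytes_hashed ends up at several times the file size for one checkout,
the same content is being hashed more than once.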

Brandon Ehle wrote:
> I have a Perl script I made to profile this problem when I submitted
> this problem to the bug tracker a couple years ago.
>
> http://subversion.tigris.org/issues/show_bug.cgi?id=913
>
> It will generate you an asset repository that simulates an artist
> working on textures and generates as many revisions as you want.
>
> I'll try to dig it back up and send it to you.
>
>
> Daniel Berlin wrote:
>> On 8/28/06, Garrett Rooney <rooneg@electricjellyfish.net> wrote:
>>> On 8/28/06, Ben Collins-Sussman <sussman@red-bean.com> wrote:
>>>> I suspect the problem here isn't about working copy efficiency, it's
>>>> the fact that we delta-encode every file that gets stuffed into the
>>>> repository, even if it's something as simple as committing a file to a
>>>> local file:/// repository. That takes a lonnnnnnnng time on huge
>>>> binary files.
>>> That's why I was hoping Jeremy would hand some real world test cases
>>> off to DannyB so he could make it Go Real Fast ;-)
>>>
>> I've emailed every person who, on users@ has complained in the thread
>> about large file binary performance, and begged them to give me repos
>> and files i can reproduce with, promising to fix their speed issues.
>> I've even sent out the attached patch for testing
>>
>> I'm still waiting for an answer. :-(
>>
>> They seem to want solutions without having to test them.
>>
>> The last time someone had a significant binary performance problem
>> with large files, I sent them the attached (which disables vdelta, and
>> as such, is only really a good idea on svndiff1 using repos and
>> networks with no 1.3 clients/servers).
>> Basically, tell anyone who wants to try that they should take this
>> patch and create a new repo with a patched subversion, and dump/load
>> the old repo into the new one, and give checkouts/etc a try.
>>
>> The report from the one person who has ever tried it with large files
>> was that it sped up commit times from 45 minutes to less than 5 ;)
>>
>>
>> ------------------------------------------------------------------------
>>
>> Index: text_delta.c
>> ===================================================================
>> --- text_delta.c (revision 20792)
>> +++ text_delta.c (working copy)
>> @@ -148,7 +148,8 @@ compute_window(const char *data, apr_siz
>>    build_baton.new_data = svn_stringbuf_create("", pool);
>>
>>    if (source_len == 0)
>> -    svn_txdelta__vdelta(&build_baton, data, source_len, target_len, pool);
>> +    svn_txdelta__insert_op(&build_baton, svn_txdelta_new, 0, source_len,
>> +                           data, pool);
>>    else
>>      svn_txdelta__xdelta(&build_baton, data, source_len, target_len, pool);
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
>> For additional commands, e-mail: dev-help@subversion.tigris.org

Received on Tue Aug 29 07:39:06 2006

This is an archived mail posted to the Subversion Dev mailing list.
