Re: Problem with large files

From: listman <listman_at_burble.net>
Date: 2006-08-29 08:41:02 CEST

so vdelta and md5 are the culprits?

anything we can do in the short term to improve this?

On Aug 28, 2006, at 10:38 PM, Brandon Ehle wrote:

> I resurrected the script and uploaded it here
>
> http://subversion.kicks-ass.org/genrepos.pl
>
> There are a couple of parameters you can specify for the type of
> repository you want. I typically use the "medium" or "large" style
> repositories for profiling.
>
> If you want to trash your machine and make it crash, try the
> "everything" repository. It creates a 5GB working copy and keeps
> checking in binary file changes until your machine runs out of disk
> space.
>
>
> Also, here is one of my recent KCachegrind profiles of a large
> binary checkout operation over ra_local for 1.5.0-dev.
>
> http://subversion.kicks-ass.org/checkout.png
>
>
> There have been a bunch of improvements since I last profiled this
> (around version 0.28), but vdelta is still taking most of the time,
> with MD5 calculation a close second.
>
> Most of the vdelta time appears to be spent doing the two
> comparisons and the branch in find_match_len(), although this is
> most likely related to the cache misses caused by find_match_len().
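
For readers following along, the kind of loop described above can be sketched roughly as follows. This is an illustration in Python rather than Subversion's C; `match_len` and its arguments are hypothetical names, not the actual find_match_len() source, and the sample bytes are made up.

```python
def match_len(data: bytes, match: int, start: int, end: int) -> int:
    """Count how many bytes at data[start:end] equal the bytes beginning at data[match].

    Assumes match < start, so data[match] never runs past `end`.
    """
    here = start
    # Two comparisons per byte: the bounds check and the byte equality test.
    # The loop branch depends on both, and data[match] can sit far away from
    # data[here], which is one plausible source of cache misses.
    while here < end and data[match] == data[here]:
        match += 1
        here += 1
    return here - start


window = b"abcabcabd"
# Compare the run starting at offset 0 with the run starting at offset 3.
print(match_len(window, 0, 3, len(window)))
```

The loop does very little work per byte, so on a large binary its cost would be dominated by how often the two compared positions are already in cache.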
>
> It also appears that the MD5 sums for the checked-out files are
> calculated multiple times in multiple places during an ra_local
> checkout, and a large portion of the time is spent doing that.
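
One general way to avoid hashing the same bytes repeatedly is to compute the digest once, while the data is being streamed, and pass the result along. The sketch below shows that idea in Python with the standard `hashlib` module; `copy_with_md5` is a hypothetical helper and says nothing about how Subversion's checkout code is actually organized.

```python
import hashlib
import io


def copy_with_md5(src, dst, chunk_size=64 * 1024):
    """Copy src to dst and return the MD5 hex digest of the bytes copied."""
    digest = hashlib.md5()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)  # hash each chunk once, as it passes through
        dst.write(chunk)
    return digest.hexdigest()


source = io.BytesIO(b"\x00\x01\x02" * 100_000)  # stand-in for a large binary file
target = io.BytesIO()
checksum = copy_with_md5(source, target)
# Later stages would reuse `checksum` instead of re-reading and re-hashing.
print(checksum)
```

Whether this applies directly depends on which layers need the checksum and whether they all see the same byte stream, which the profile alone does not say.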
>
>
> Brandon Ehle wrote:
>> I have a Perl script I made to profile this problem when I submitted
>> it to the bug tracker a couple of years ago.
>> http://subversion.tigris.org/issues/show_bug.cgi?id=913
>> It will generate an asset repository that simulates an artist
>> working on textures, with as many revisions as you want.
>> I'll try to dig it back up and send it to you.
>> Daniel Berlin wrote:
>>> On 8/28/06, Garrett Rooney <rooneg@electricjellyfish.net> wrote:
>>>> On 8/28/06, Ben Collins-Sussman <sussman@red-bean.com> wrote:
>>>>> I suspect the problem here isn't about working copy efficiency, it's
>>>>> the fact that we delta-encode every file that gets stuffed into the
>>>>> repository, even if it's something as simple as committing a file to a
>>>>> local file:/// repository. That takes a lonnnnnnnng time on huge
>>>>> binary files.
>>>> That's why I was hoping Jeremy would hand some real-world test cases
>>>> off to DannyB so he could make it Go Real Fast ;-)
>>>>
>>> I've emailed every person on users@ who has complained in the thread
>>> about large binary file performance, and begged them to give me repos
>>> and files I can reproduce with, promising to fix their speed issues.
>>> I've even sent out the attached patch for testing.
>>>
>>> I'm still waiting for an answer. :-(
>>>
>>> They seem to want solutions without having to test them.
>>>
>>> The last time someone had a significant binary performance problem
>>> with large files, I sent them the attached patch (which disables vdelta
>>> and, as such, is only really a good idea on repos using svndiff1 and
>>> networks with no 1.3 clients/servers).
>>> Basically, tell anyone who wants to try that they should take this
>>> patch and create a new repo with a patched subversion, and dump/load
>>> the old repo into the new one, and give checkouts/etc a try.
>>>
>>> The report from the one person who has ever tried it with large files
>>> was that it sped up commit times from 45 minutes to less than 5 ;)
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> Index: text_delta.c
>>> ===================================================================
>>> --- text_delta.c (revision 20792)
>>> +++ text_delta.c (working copy)
>>> @@ -148,7 +148,8 @@ compute_window(const char *data, apr_siz
>>>    build_baton.new_data = svn_stringbuf_create("", pool);
>>>    if (source_len == 0)
>>> -    svn_txdelta__vdelta(&build_baton, data, source_len, target_len, pool);
>>> +    svn_txdelta__insert_op(&build_baton, svn_txdelta_new, 0, source_len,
>>> +                           data, pool);
>>>    else
>>>      svn_txdelta__xdelta(&build_baton, data, source_len, target_len, pool);
>>>
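
The general idea behind the patch, skipping the self-compression search when there is no source to delta against and emitting the window as literal data, can be sketched as follows. This is a Python illustration only: `compute_window_sketch` and the `Op` tuple are hypothetical and do not mirror the real svn_txdelta structures, and using the target length for the op is an assumption about the intent, not something taken from the diff as posted.

```python
from typing import List, Tuple

Op = Tuple[str, int, int]  # (kind, offset, length); "new" ops index into new_data


def compute_window_sketch(source: bytes, target: bytes) -> Tuple[List[Op], bytes]:
    """Return (ops, new_data) for one delta window."""
    if len(source) == 0:
        # Nothing to delta against: skip the match search entirely and emit
        # the whole target as a single literal "new data" op.
        return [("new", 0, len(target))], target
    # With a source present, the real code runs a delta algorithm (xdelta in
    # the patch); that part is deliberately not sketched here.
    raise NotImplementedError("delta against a source is not sketched here")


ops, new_data = compute_window_sketch(b"", b"\x89PNG binary payload")
print(ops, len(new_data))
```

The trade-off is that the window is stored without any self-referential compression from the delta step, which is why the message above ties the patch to svndiff1 repositories.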
>>>
>>> ------------------------------------------------------------------------
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
>>> For additional commands, e-mail: dev-help@subversion.tigris.org

Received on Tue Aug 29 08:41:59 2006

This is an archived mail posted to the Subversion Dev mailing list.
