
Re: SVN_STREAM_CHUNK_SIZE way too low?

From: <cmpilato_at_collab.net>
Date: 2002-02-08 17:10:31 CET

Daniel Berlin <dan@dberlin.org> writes:

> While I realize a delta combiner would likely fix the space issue i'm
> about to mention, i don't think it really changes the speed issue at all,
> nor does my profiler.
> I'm wondering who figured out the chunk size we currently use for streams
> (and in turn, the delta window size).

I believe Greg Hudson ran some tests to find the optimal window size
and came up with 102400. Greg?

> original CVS repo: 40 meg (two gcc directories):
> With SVN_STREAM_CHUNK_SIZE at 102400:
> Time to convert: 2 hours, 16 minutes
> Final Size: 2.3 gig (no joke)
> Reason: Lots of files > 102400 (source and Changelogs).
> Thus, almost everything is stored fulltext.

This doesn't make sense. Just because a file is larger than the delta
window shouldn't mean it gets stored as fulltext. It just means that
the whole file's data isn't available for compression heuristics.
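To illustrate the point, here is a rough sketch of how a windowed delta can still deltify a file much larger than the window. It assumes a simplified scheme that is not Subversion's actual svndiff encoding. The target and its delta source are cut into windows at the same offsets, and each target window is encoded only against the matching source window. Python's difflib stands in for the real delta algorithm. The op tuples and the function names (delta_window, windowed_delta, apply_windows) are hypothetical.

```python
import difflib
from typing import List, Tuple, Union

# Hypothetical op encoding (not svndiff):
#   ("copy", src_offset, length) -> copy bytes from the current source window
#   ("new", data)                -> insert literal bytes
Op = Union[Tuple[str, int, int], Tuple[str, bytes]]


def delta_window(source: bytes, target: bytes) -> List[Op]:
    """Encode one target window against one source window (difflib stand-in)."""
    ops: List[Op] = []
    matcher = difflib.SequenceMatcher(None, source, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif j2 > j1:  # "replace" or "insert": emit the target bytes literally
            ops.append(("new", target[j1:j2]))
        # "delete" adds nothing to the target, so it produces no op
    return ops


def windowed_delta(source: bytes, target: bytes, window: int) -> List[List[Op]]:
    """Split the target into fixed-size windows and deltify each one separately."""
    return [
        delta_window(source[start:start + window], target[start:start + window])
        for start in range(0, len(target), window)
    ]


def apply_windows(source: bytes, windows: List[List[Op]], window: int) -> bytes:
    """Rebuild the target by replaying each window's ops against its source window."""
    out = bytearray()
    for n, ops in enumerate(windows):
        src = source[n * window:(n + 1) * window]
        for op in ops:
            if op[0] == "copy":
                _, off, length = op
                out += src[off:off + length]
            else:
                out += op[1]
    return bytes(out)
```

For example, a 5000-byte file with a 7-byte change and a 1024-byte window is split into five windows. Each window is still expressed mostly as copies from the source, so nothing falls back to fulltext. The trade-off is that matching data lying in a different window is never found, which is the loss of compression opportunity mentioned above.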

> [root@danberlin db]# db_dump -p representations |grep fulltext|wc -l
> 22453
> [root@danberlin db]# db_dump -p representations |grep delta|wc
> 3270
> With SVN_STREAM_CHUNK_SIZE at 1024000:
> Time to convert: 1 hour, 10 minutes
> Final Size: 27 meg
> [root@danberlin db]# db_dump -p representations |grep fulltext|wc
> 1050
> [root@danberlin db]# db_dump -p representations |grep delta|wc
> 24673
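Tallying the quoted counts shows that both runs produced the same total number of representations, and only the fulltext/delta split changed. The sketch below assumes each grep/wc figure is a line count, one line per representation. It is only an approximation, since grep matches any line containing the word. The function name fulltext_share is hypothetical.

```python
# Representation counts as quoted above (grep fulltext / grep delta output).
runs = {
    "SVN_STREAM_CHUNK_SIZE=102400": {"fulltext": 22453, "delta": 3270},
    "SVN_STREAM_CHUNK_SIZE=1024000": {"fulltext": 1050, "delta": 24673},
}


def fulltext_share(counts):
    """Return (total representations, fraction stored as fulltext)."""
    total = counts["fulltext"] + counts["delta"]
    return total, counts["fulltext"] / total


for name, counts in runs.items():
    total, share = fulltext_share(counts)
    print(f"{name}: {total} representations, {share:.1%} fulltext")
```

This prints 25723 representations for both runs, with 87.3% fulltext at 102400 and 4.1% at 1024000.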

Are the sizes of the Berkeley DB log files factored into these values?
In my opinion, the only size values worth comparing are the sizes of
the strings table.

I'd love to re-create your test locally and do some poking around in
the results, if that's possible. Is your setup available for snarfing
online somewhere?

Received on Sat Oct 21 14:37:05 2006

This is an archived mail posted to the Subversion Dev mailing list.
