
Re: SVN_STREAM_CHUNK_SIZE way too low?

From: Daniel Berlin <dan_at_dberlin.org>
Date: 2002-02-08 17:20:32 CET

On 8 Feb 2002 cmpilato@collab.net wrote:

> Daniel Berlin <dan@dberlin.org> writes:
>
> > While I realize a delta combiner would likely fix the space issue I'm
> > about to mention, I don't think it really changes the speed issue at all,
> > nor does my profiler.
> >
> > I'm wondering who figured out the chunk size we currently use for streams
> > (and in turn, the delta window size).
>
> I believe Greg Hudson performed some tests to find the optimal
> window size, and came up with 102400. Greg?
>
> > original CVS repo: 40 meg (two gcc directories):
> >
> > With SVN_STREAM_CHUNK_SIZE at 102400:
> > Time to convert: 2 hours, 16 minutes
> > Final Size: 2.3 gig (no joke)
> >
> > Reason: lots of files are larger than 102400 bytes (sources and ChangeLogs).
> > Thus, almost everything is stored fulltext.
>
> This doesn't make sense. Just because a file is larger than the delta
> window shouldn't mean it gets stored as fulltext.

Buzz.
reps-strings.c:svn_fs__rep_deltify
...

  /* To favor time over space, we don't currently deltify files that
     are larger than the svndiff window size. This might seem
     counterintuitive, but most files are smaller than a window
     anyway, and until we write the delta combiner or something
     approaching it, the cost of retrieval for large files becomes
     simply prohibitive after about 10 or so revisions. See issue
     #531 for more details. */

Window size == SVN_STREAM_CHUNK_SIZE.

> It just means that
> the whole file's data isn't available for compression heuristics.
>
> > [root@danberlin db]# db_dump -p representations |grep fulltext|wc -l
> > 22453
> > [root@danberlin db]# db_dump -p representations |grep delta|wc -l
> > 3270
> >
> > With SVN_STREAM_CHUNK_SIZE at 1024000:
> > Time to convert: 1 hour, 10 minutes
> > Final Size: 27 meg
> > [root@danberlin db]# db_dump -p representations |grep fulltext|wc -l
> > 1050
> > [root@danberlin db]# db_dump -p representations |grep delta|wc -l
> > 24673
>
> Are the sizes of log-files factored into the values?

No.
Those numbers are only the strings table size anyway.

> In my opinion,
> the only size values worth comparing are the sizes of the strings
> table.

[root@danberlin db]# ls -l
...
-rw-r--r-- 1 root root 2354819072 Feb 8 07:45 strings

<switch to other dir>

[root@danberlin db]# ls -l
...
-rw-r--r-- 1 root root 27483123 Feb 8 08:49 strings

>
> I'd love to re-create your test locally and do some poking around in
> the results, if that's possible. Is your setup available for snarfing
> online somewhere?

Rsync the gcc-cvs repository (600 meg), then use cvs2svn.py (with patches to
make it work right) to convert egcs/gcc/cp to an svn repository.

--Dan

Received on Sat Oct 21 14:37:05 2006

This is an archived mail posted to the Subversion Dev mailing list.