SVN_STREAM_CHUNK_SIZE way too low?

From: Daniel Berlin <dan_at_dberlin.org>
Date: 2002-02-08 16:48:46 CET

While I realize a delta combiner would likely fix the space issue I'm
about to mention, I don't think it would change the speed issue at all,
and neither does my profiler.

I'm wondering who figured out the chunk size we currently use for streams
(and in turn, the delta window size).

I ask because, in coming up with log and cache statistics, I ran into the
following:

original CVS repo: 40 meg (two gcc directories):

With SVN_STREAM_CHUNK_SIZE at 102400:
Time to convert: 2 hours, 16 minutes
Final Size: 2.3 gig (no joke)

Reason: lots of files are larger than 102400 bytes (source files and
ChangeLogs), so almost everything is stored as fulltext.
[root@danberlin db]# db_dump -p representations |grep fulltext|wc -l
  22453
[root@danberlin db]# db_dump -p representations |grep delta|wc -l
   3270

With SVN_STREAM_CHUNK_SIZE at 1024000:
Time to convert: 1 hour, 10 minutes
Final Size: 27 meg
[root@danberlin db]# db_dump -p representations |grep fulltext|wc -l
   1050
[root@danberlin db]# db_dump -p representations |grep delta|wc -l
   24673

So it's not only smaller (which is expected), but also roughly twice as fast.
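
To put the two runs side by side, here is a small sketch that recomputes
the fulltext share and the speedup from the figures quoted above. The
dictionary layout and the function names are illustrative only, not
anything from the Subversion tree.

```python
# Figures copied from the two conversion runs reported above.
# Keys are the SVN_STREAM_CHUNK_SIZE values used for each run.
runs = {
    102400: {"minutes": 2 * 60 + 16, "fulltext": 22453, "delta": 3270},
    1024000: {"minutes": 1 * 60 + 10, "fulltext": 1050, "delta": 24673},
}


def fulltext_share(run):
    """Fraction of representations stored as fulltext rather than as deltas."""
    return run["fulltext"] / (run["fulltext"] + run["delta"])


def speedup(slow, fast):
    """How many times faster the 'fast' run finished than the 'slow' one."""
    return slow["minutes"] / fast["minutes"]


small, large = runs[102400], runs[1024000]

for size, run in runs.items():
    total = run["fulltext"] + run["delta"]
    print(f"{size}: {total} representations, {fulltext_share(run):.1%} fulltext")

print(f"speedup: {speedup(small, large):.2f}x")
```

Both runs come out at 25723 representations in total, so the two grep
counts are at least consistent with each other. The fulltext share drops
from about 87% to about 4%, and the larger chunk size finishes about
1.94 times faster.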

Not as much of this is I/O time as one would think: the FireWire disk it's
writing to sustains writes of more than 20 meg a second, and is using
reiserfs.

Did anyone calculate SVN_STREAM_CHUNK_SIZE, or was it randomly chosen?
While the space would be fixed by a delta combiner, I don't think the
speed would.

Removing the actual commit of the transaction, to avoid any real I/O time,
shaves 15 minutes off the first run and 14 minutes off the second, so
I/O time doesn't explain the difference.
Profiling shows all the CPU time being spent in the delta routines in
either case.
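
As a rough check on the I/O argument, the sketch below subtracts the
commit time from each run. It assumes the 15 and 14 minutes are exactly
the time the skipped commit accounted for, and the helper name is made up
for illustration.

```python
def without_commit(total_minutes, commit_minutes):
    """Return (minutes left without the commit, commit share of the total)."""
    remaining = total_minutes - commit_minutes
    return remaining, commit_minutes / total_minutes


# Totals are 2h16m and 1h10m; commit savings are 15 and 14 minutes.
small_rest, small_share = without_commit(2 * 60 + 16, 15)
large_rest, large_share = without_commit(1 * 60 + 10, 14)

print(f"102400:  {small_rest} min without commit, commit was {small_share:.0%}")
print(f"1024000: {large_rest} min without commit, commit was {large_share:.0%}")
print(f"remaining gap: {small_rest / large_rest:.2f}x")
```

Under that assumption the commit is about 11% of the first run and 20% of
the second, and the non-commit time still differs by about 2.16 times
(121 versus 56 minutes), which is what you'd expect if the delta routines
dominate rather than the disk.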

I'll play with other values of SVN_STREAM_CHUNK_SIZE, and other
repositories, to come up with more statistics. I'm just curious how the
value was determined.

--Dan

Received on Sat Oct 21 14:37:05 2006

This is an archived mail posted to the Subversion Dev mailing list.
