
Re: SVN_STREAM_CHUNK_SIZE way too low?

From: Daniel Berlin <dan_at_dberlin.org>
Date: 2002-02-08 17:37:26 CET

On 8 Feb 2002 cmpilato@collab.net wrote:

> Daniel Berlin <dan@dberlin.org> writes:
> > Buzz.
> > reps-strings.c:svn_fs__rep_deltify
> > ...
> >
> > /* To favor time over space, we don't currently deltify files that
> > are larger than the svndiff window size. This might seem
> > counterintuitive, but most files are smaller than a window
> > anyway, and until we write the delta combiner or something
> > approaching it, the cost of retrieval for large files becomes
> > simply prohibitive after about 10 or so revisions. See issue
> > #531 for more details. */
> >
> > Window size == SVN_STREAM_CHUNK_SIZE.
> *blink* *blink*. When did THAT happen?! Sorry, Daniel, that just
> wasn't our behavior the last time I was in that code, unless my brain
> is seriously failing me.

Hey, i think it's funny too, but it could be right.
I haven't tried *retrieving* a lot from the resulting db's, though i do
retrieve some random pieces just to make sure the script didn't screw
anything up.
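
For reference, the policy in the quoted comment can be sketched roughly as
below. This is a Python illustration, not the actual C code in
reps-strings.c. The names WINDOW_SIZE and should_deltify are hypothetical,
and the 100 KB value is only a placeholder, since the real value of
SVN_STREAM_CHUNK_SIZE isn't stated in this thread.

```python
# Hypothetical stand-in for SVN_STREAM_CHUNK_SIZE. The real value is not
# given in this thread, so 100 KB is only a placeholder.
WINDOW_SIZE = 100 * 1024


def should_deltify(file_size, window_size=WINDOW_SIZE):
    """Return True if a file of file_size bytes would be stored as a delta.

    Mirrors the quoted comment: files larger than one svndiff window are
    left as fulltext so that retrieval stays cheap.
    """
    return file_size <= window_size
```

With a rule like this, every file over the window size is stored whole in
every revision, which would be consistent with seeing the same file
repeated many times in the strings table.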

> Wow. So, yeah, your results make a lot more sense to me now! :-)

> > > Are the sizes of log-files factored into the values?
> >
> > No.
> > Those numbers are only the strings table size anyway.
> Xlnt.
> > > In my opinion,
> > > the only size values worth comparing are the sizes of the strings
> > > table.
> >
> > [root@danberlin db]# ls -l
> > ...
> > -rw-r--r-- 1 root root 2354819072 Feb 8 07:45 strings
> Heh heh. That just looks funny.
I thought i had read the number wrong at first, then i tried to db_dump
it, and canceled after it printed nothing for 2 minutes. "strings" on it
showed the same file repeated a bunch of times, so i checked it out in
more detail.

You should see how long it takes db_stat to give me btree stats on
it. :)

We also waste 100 meg in overflow pages. (I'm testing, on a separate
computer, whether using a hash database helps here, since the identifiers
are always unique, string-like, and generally short, and the hash database
has less metadata overhead.)

Locality is a wash for us anyway, since identifiers that are close to each
other in name have no bearing on what they actually represent, meaning a
btree just doesn't help at all. In fact, since we always insert in order,
depending on how the btree balances, we may be hurting. It's funny, too,
since we end up with the same tree all the time, for every database with
n representations in it.
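
To illustrate the in-order insertion point, here is a toy sketch in Python.
It uses a plain unbalanced binary search tree, which is NOT how Berkeley
DB's btree works (that one splits pages and stays balanced), so treat it
only as the worst case of "depending on balancing". Node, insert, height
and build are all hypothetical names made up for this example.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into an unbalanced binary search tree, iteratively."""
    if root is None:
        return Node(key)
    node = root
    while True:
        side = "left" if key < node.key else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(key))
            return root
        node = child


def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    best = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))
    return best


def build(keys):
    """Build a tree by inserting keys in the order given."""
    root = None
    for key in keys:
        root = insert(root, key)
    return root
```

Calling height(build(range(1, 101))) gives 100: with keys arriving in
sorted order the toy tree is a single chain, and its shape depends only on
how many keys there are, not on what they stand for.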

> > > I'd love to re-create your test locally and do some poking around in
> > > the results, if that's possible. Is your setup available for snarfing
> > > online somewhere?
> >
> > rsync the gcc-cvs repository (600 meg), use cvs2svn.py (with patches to
> > make it work right) to convert egcs/gcc/cp to an svn repository
> Well, now that I know that we're just not deltifying files larger
> than window_size, I'm not so interested in playing with the results.
> That's just plain *wrongness* in our code.

I know a delta combiner would help the space issue; i'm just trying to
make sure we are using the right window size for speed.
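
As a rough picture of the retrieval cost the code comment worries about,
here is a toy model in Python. It assumes a delta is just a list of
(offset, replacement bytes) edits, which is far simpler than real svndiff
data, and apply_delta and reconstruct are hypothetical names. The point is
only that, without a combiner, each hop in the chain rebuilds the whole
file.

```python
def apply_delta(base, delta):
    """Apply (offset, replacement) edits to base and return a new fulltext."""
    out = bytearray(base)
    for offset, data in delta:
        out[offset:offset + len(data)] = data
    return bytes(out)


def reconstruct(fulltext, chain):
    """Walk a delta chain; return the final text and total bytes rebuilt."""
    text = fulltext
    bytes_rebuilt = 0
    for delta in chain:
        text = apply_delta(text, delta)
        bytes_rebuilt += len(text)  # every hop materializes the whole file
    return text, bytes_rebuilt
```

In this model, a 1000-byte file behind a chain of 10 one-byte deltas costs
10000 bytes of rebuilding to retrieve, so the cost grows with file size
times chain length. A combiner that merged the 10 deltas into one would
bring that back to a single pass.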


Received on Sat Oct 21 14:37:05 2006

This is an archived mail posted to the Subversion Dev mailing list.
