Re: skels in string table

From: Blair Zajac <blair_at_orcaware.com>
Date: Wed, 11 Jun 2008 14:57:16 -0700

Ben Collins-Sussman wrote:
> On Wed, Jun 11, 2008 at 4:45 PM, Karl Fogel <kfogel_at_red-bean.com> wrote:
>> "Vyacheslav V. Zholudev" <vyacheslav.zholudev_at_gmail.com> writes:
>>> Hello!
>>>
>>> I'm digging into the BDB backend and thought that the "strings" table
>>> could contain only full texts or deltas, but I found that something
>>> like:
>>>
>>> "((ins.xml 5 6.0.7) (install_log.xml 5 4.0.5) (cp.xml 5 7.0.8))"
>>>
>>> is written to the strings table as a fulltext, which looks to me like a
>>> skel. Or is it something else? If it is a skel, how can I tell whether
>>> a string holds the full content of a file or something else?
>>> Thanks in advance!
>> Vyacheslav,
>>
>> Since directories are just lists of entries anyway, we represent them
>> using the same skel syntax used for reps metadata -- so that's how
>> skel-like data ends up in the strings table.
>>
>> Each subcell above is a directory entry's name followed by the name of
>> its node revision (the node revision name is a string, so its length
>> comes first, followed by a space; see
>> subversion/libsvn_fs_base/util/skel.h for details, but you've probably
>> already read that file :-) ).
>
> Alas, this design doesn't scale well when you have versioned
> directories with thousands of children. Every attempt to add, delete,
> or modify a child in the directory causes the *entire* list of dirents
> to be loaded into memory, then serialized back to disk again. As the
> number of children in the directory increases, it becomes O(N^2) to
> modify them.
>
> I'm dreaming of fixing this someday... perhaps by splitting up a
> directory's dirents across multiple string-keys.
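
For anyone following the quoted explanation, here is a minimal Python sketch of reading that dirent list. It handles only the two atom forms visible in the example: a bare name such as ins.xml, and a length-prefixed string such as "5 6.0.7". The function name parse_skel is hypothetical and is not the libsvn_fs_base API. Real skels count lengths in bytes; this sketch works on str for brevity and does no error handling.

```python
def parse_skel(data, pos=0):
    """Parse one skel element at data[pos]; return (value, next_pos).

    Simplified illustration only: lists become Python lists, atoms
    become str. Malformed input raises IndexError or ValueError.
    """
    while data[pos].isspace():
        pos += 1
    ch = data[pos]
    if ch == "(":
        items = []
        pos += 1
        while True:
            while data[pos].isspace():
                pos += 1
            if data[pos] == ")":
                return items, pos + 1
            item, pos = parse_skel(data, pos)
            items.append(item)
    if ch.isdigit():
        # Explicit-length atom: decimal length, one separator, then payload.
        end = pos
        while data[end].isdigit():
            end += 1
        length = int(data[pos:end])
        start = end + 1  # skip the single whitespace separator
        return data[start:start + length], start + length
    # Bare atom: runs until whitespace or a parenthesis.
    end = pos
    while end < len(data) and not data[end].isspace() and data[end] not in "()":
        end += 1
    return data[pos:end], end


raw = "((ins.xml 5 6.0.7) (install_log.xml 5 4.0.5) (cp.xml 5 7.0.8))"
parsed, _ = parse_skel(raw)
# Map each entry name to its node revision name.
entries = {name: node_rev for name, node_rev in parsed}
```

Note that the whole string has to be parsed to look up or change a single entry, which is the scaling problem Ben describes.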

Yes, if you have 20,000 entries in a single directory, then in an fsfs backend,
a single modification to one of these entries or a child ends up creating a 700
kByte revision. Ouch!
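
A rough back-of-the-envelope sketch of what those numbers imply, in Python. The assumptions are mine: kByte is taken as 1000 bytes, every entry is taken to cost the same number of bytes, and each change is taken to rewrite the full entry list with no delta compression. The helper name total_bytes_written is hypothetical.

```python
ENTRIES = 20_000
REVISION_BYTES = 700 * 1000  # assumption: 1 kByte = 1000 bytes

# Average cost of one directory entry in the rewritten list.
bytes_per_entry = REVISION_BYTES // ENTRIES


def total_bytes_written(n, per_entry=bytes_per_entry):
    """Bytes written when a directory grows to n entries one add at a time.

    The k-th add rewrites a list of k entries, so the total is
    per_entry * (1 + 2 + ... + n) = per_entry * n * (n + 1) / 2,
    which grows quadratically in n.
    """
    return per_entry * n * (n + 1) // 2


total = total_bytes_written(ENTRIES)
```

Under those assumptions each entry costs about 35 bytes, and populating the directory one entry per revision would write about 7 GB in total.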

In our svn backend, we're now md5-hashing entries into hash buckets one
directory level deep, using 30 buckets per directory.

Introducing this hashing shrank one repository from 102.12 to 10.96 Gbytes.
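
As an illustration only, here is a Python sketch of that kind of bucketing. The details are assumptions, not our exact scheme: the bucket is the md5 digest of the entry name reduced modulo 30, and bucket directories are named with two digits. The function names bucket_for and bucketed_path are hypothetical.

```python
import hashlib
import posixpath

BUCKETS = 30  # buckets per directory, as described above


def bucket_for(name, buckets=BUCKETS):
    """Return a stable bucket index in [0, buckets) for an entry name."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % buckets


def bucketed_path(parent, name, buckets=BUCKETS):
    """Place the entry one directory level below parent, inside its bucket."""
    return posixpath.join(parent, f"{bucket_for(name, buckets):02d}", name)


path = bucketed_path("/data", "ins.xml")
```

With 20,000 entries spread over 30 buckets, each bucket directory holds roughly 667 entries on average, so a change rewrites a much shorter entry list than the flat layout does.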

Blair

Received on 2008-06-11 23:57:42 CEST

This is an archived mail posted to the Subversion Dev mailing list.
