I haven't thought about it hard yet; we just need to 'split up' a
directory's entries into multiple places somehow. Perhaps via hash
buckets, as Blair has described. The ultimate feature we want is that
it should be *fast* to locate a single entry, and *fast* to append a
new entry to a directory, regardless of how many children it has. I'm
sure wikipedia has endless textbook algorithms on efficient filesystem
algorithms for this sort of thing. We simply took the naive approach
when first wrote libsvn_fs_base.
On Wed, Jun 11, 2008 at 5:31 PM, Vyacheslav V. Zholudev
<vyacheslav.zholudev_at_gmail.com> wrote:
> Ben,
>
> maybe I got you wrong, but would it be efficient if you split directory
> entries across multiple strings with the same key? Will BDB work fast if
> there are thousand of entries with the same key? How are you gonna iterate
> them in a right way? Or this question hasn't been elaborated yet?
>
> Ben Collins-Sussman wrote:
>
> On Wed, Jun 11, 2008 at 4:45 PM, Karl Fogel <kfogel_at_red-bean.com> wrote:
>
>
> "Vyacheslav V. Zholudev" <vyacheslav.zholudev_at_gmail.com> writes:
>
>
> Hello!
>
> I'm digging into BDB backend and thought that strings "table" can
> contain only either full text or deltas, but I found out that smth
> like:
>
> "((ins.xml 5 6.0.7) (install_log.xml 5 4.0.5) (cp.xml 5 7.0.8))"
>
> is written to the strings table as a fulltext, which reminds me some
> skel. Or is it smth else? If yes, how can i distinguish whether the
> full content of file is written or smth else?
> Thanks in advance!
>
>
> Vyacheslav,
>
> Since directories are just lists of entries anyway, we represent them
> using the same skel syntax used for reps metadata -- so that's how
> skel-like data ends up in the strings table.
>
> Each subcell above is a directory name followed by the name of its node
> revision (the name is a string, so the length comes first, followed by a
> space; see subversion/libsvn_fs_base/util/skel.h for details, but you've
> probably already read that file :-) ).
>
>
> Alas, this design doesn't scale well when you have versioned
> directories with thousands of children. Every attempt to add, delete,
> or modify a child in the directory causes the *entire* list of dirents
> to be loaded into memory, then serialized back to disk again. As the
> number of children in the directory increases, it becomes O(N^2) to to
> modify them.
>
> I'm dreaming of fixing this someday... perhaps by splitting up a
> directory's dirents across multiple string-keys.
>
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
For additional commands, e-mail: dev-help_at_subversion.tigris.org
Received on 2008-06-12 03:27:32 CEST