
Re: crash managing a large FSFS repository

From: <kfogel_at_collab.net>
Date: 2004-12-13 19:22:52 CET

Simon Spero <ses@unc.edu> writes:
> One approach to reducing the amount of memory needed would be to use a
> data structure that models directories, rather than complete paths.
> Each directory node should have its own lookup table; the keys can be
> just the name of the immediate child relative to this node.
> Intermediate nodes for path components that haven't been seen
> themselves should be marked as such; if the path is later explicitly
> encountered, the mark can be cleared (or vice versa).
>
> This approach requires space roughly proportional to the number of
> directories and files in the transaction, rather than total path
> length. For big, flat namespaces, this isn't much of a win, but it
> also isn't much worse; as the name space gets deeper, and closer to
> real source repositories, the win gets bigger. This approach also
> makes it faster to determine parent/child relationships.

This is how the Subversion repository itself is structured, actually.

The current interface of fetch_all_changes() is a result of the public
API it is supporting, namely, svn_fs_paths_changed(). We could
certainly make a new svn_fs_paths_changed2() that returns the
information in a different way, and adjust the internal code
accordingly (the old code would just become the obvious wrapper,
converting the tree structure to a flat hash).
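
To illustrate the "obvious wrapper" idea, here is a rough sketch of
flattening a directory tree back into full paths. Everything in it is
hypothetical (dir_node, path_list, append_path, flatten); none of these
names exist in Subversion. It also assumes a fixed-size path buffer and
a callback in place of a real hash table, and silently drops paths that
do not fit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tree node: one per path component, not per full path. */
typedef struct dir_node {
    const char *name;                    /* name relative to the parent */
    int explicit_path;                   /* 0 for purely intermediate nodes */
    const struct dir_node *first_child;
    const struct dir_node *next_sibling;
} dir_node;

/* Called once for every explicitly recorded path. */
typedef void (*path_cb)(const char *path, void *baton);

/* Example receiver: collects paths into a bounded, newline-separated list. */
typedef struct path_list {
    char text[512];
    size_t used;
} path_list;

void append_path(const char *path, void *baton)
{
    path_list *list = baton;
    size_t n = strlen(path);

    if (list->used + n + 2 > sizeof list->text)
        return;                          /* sketch only: drop on overflow */
    memcpy(list->text + list->used, path, n);
    list->used += n;
    list->text[list->used++] = '\n';
    list->text[list->used] = '\0';
}

/* Depth-first walk over NODE and its siblings.  BUF holds the parent
   path in its first LEN bytes; CAP is the total size of BUF. */
void flatten(const dir_node *node, char *buf, size_t len, size_t cap,
             path_cb cb, void *baton)
{
    assert(buf != NULL && cb != NULL);

    for (; node != NULL; node = node->next_sibling) {
        size_t n = strlen(node->name);

        if (len + 1 + n + 1 > cap)
            continue;                    /* path too long for BUF: skip subtree */
        buf[len] = '/';
        memcpy(buf + len + 1, node->name, n + 1);   /* copies the NUL too */
        if (node->explicit_path)
            cb(buf, baton);
        flatten(node->first_child, buf, len + 1 + n, cap, cb, baton);
    }
}
```

A caller would pass the root's first child with len = 0, and a real
wrapper would insert each path into the flat hash instead of appending
it to a string.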

We'd also want to write functions for accessing the tree structure,
for example:

   svn_error_t *
   svn_tree_has_path (svn_boolean_t *has_path,
                      svn_tree_t *tree,
                      const char *path);
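
As a rough illustration of what might sit behind such a function, here
is a minimal standalone sketch of the per-directory structure Simon
describes. It makes several assumptions that are not Subversion's:
the names tree_node, tree_new, tree_add_path, tree_has_path and
tree_free are hypothetical, children are kept in a linked list rather
than a per-directory lookup table, memory comes from malloc rather than
APR pools, and errors are plain return codes instead of svn_error_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node: stores only its own name, not the full path. */
typedef struct tree_node {
    char *name;                     /* name relative to the parent */
    bool explicit_path;             /* false if only seen as an intermediate */
    struct tree_node *first_child;
    struct tree_node *next_sibling;
} tree_node;

static tree_node *node_new(const char *name, size_t len)
{
    tree_node *n = calloc(1, sizeof *n);
    if (n == NULL)
        return NULL;
    n->name = malloc(len + 1);
    if (n->name == NULL) {
        free(n);
        return NULL;
    }
    memcpy(n->name, name, len);
    n->name[len] = '\0';
    return n;
}

static tree_node *find_child(const tree_node *parent,
                             const char *name, size_t len)
{
    for (tree_node *c = parent->first_child; c != NULL; c = c->next_sibling)
        if (strlen(c->name) == len && memcmp(c->name, name, len) == 0)
            return c;
    return NULL;
}

/* Follow PATH one component at a time, optionally creating nodes.
   Empty components (leading or doubled slashes) are skipped. */
static tree_node *walk(tree_node *root, const char *path, bool create)
{
    tree_node *cur = root;

    while (*path != '\0') {
        size_t len = strcspn(path, "/");
        if (len > 0) {
            tree_node *child = find_child(cur, path, len);
            if (child == NULL) {
                if (!create)
                    return NULL;
                child = node_new(path, len);
                if (child == NULL)
                    return NULL;
                child->next_sibling = cur->first_child;
                cur->first_child = child;
            }
            cur = child;
        }
        path += len;
        if (*path == '/')
            path++;
    }
    return cur;
}

tree_node *tree_new(void)
{
    return node_new("", 0);
}

/* Record PATH; returns 0 on success, -1 on allocation failure. */
int tree_add_path(tree_node *root, const char *path)
{
    assert(root != NULL && path != NULL);
    tree_node *n = walk(root, path, true);
    if (n == NULL)
        return -1;
    n->explicit_path = true;        /* clears the "intermediate only" state */
    return 0;
}

/* True only if PATH itself was added, not merely one of its descendants. */
bool tree_has_path(tree_node *root, const char *path)
{
    assert(root != NULL && path != NULL);
    tree_node *n = walk(root, path, false);
    return n != NULL && n->explicit_path;
}

void tree_free(tree_node *n)
{
    while (n != NULL) {
        tree_node *next = n->next_sibling;
        tree_free(n->first_child);
        free(n->name);
        free(n);
        n = next;
    }
}
```

With this sketch, adding "/trunk/src/main.c" creates unmarked nodes for
"trunk" and "src", so tree_has_path() reports "/trunk/src" as absent
until that path is added explicitly.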

Before we go down this road, though, we'd want to make absolutely sure
that the problem is the total path length, that is, that the
assertion

> At the moment the code uses memory roughly proportional to the total
> lengths of all paths in the transactions.

is both true and the cause of our problems.  I'm pretty sure it's
true, of course; it's the second half I'm not positive about :-).
Are you sure that path lengths are relevant to total memory usage, or
are they just lost in the noise?

Received on Mon Dec 13 19:26:08 2004

This is an archived mail posted to the Subversion Dev mailing list.
