RFC: an idea for making 'svn list' faster

From: Garrett Rooney <rooneg_at_electricjellyfish.net>
Date: 2005-10-22 00:38:13 CEST

I'm looking at ways to make 'svn list' faster. Specifically, it seems
quite odd to me that running 'svn list' on a large directory full of
tags takes far longer than viewing that directory via mod_dav_svn.

It turns out that the reason for this is simple. mod_dav_svn doesn't
do very much work when you view a directory. It just calls
svn_fs_dir_entries and prints out the results, since all you really
need to create that HTML page is the svn_fs_dirent_t for each entry:
basically just the name and the kind, so you can tell whether it's a
directory or not. When svn list tries to get the same data (in the
non-verbose case anyway; the verbose case obviously requires more
information), it does so via svn_ra_get_dir, which marshals
svn_dirent_t objects over the wire. An svn_dirent_t has more in it
than an svn_fs_dirent_t. Specifically, it holds info like the time the
item was modified, the last author, the rev it was last changed in,
its size, and so on. Calculating this information is what really
kills us on large directories. If you hack svn_ra_get_dir to not ask
for any of that information, svn list completes almost instantly on
even very large directories. I tried it with a directory that contains
800 subdirectories, and it took less time to retrieve the dirents
than it did to print them out to the screen; previously it took around
5 seconds to retrieve them.
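
To make the cost difference concrete, here is a toy model in C. It is only a sketch: the struct and function names (cheap_entry_t, full_entry_t, lookup_metadata, and so on) are hypothetical stand-ins, not the real Subversion definitions, and the "lookup" does no real work. It just counts how many per-entry metadata lookups each style of listing performs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the two entry types discussed above.
   These are NOT the real Subversion structs; types are plain C. */
typedef enum { NODE_NONE, NODE_FILE, NODE_DIR } node_kind_t;

typedef struct {
    const char *name;   /* entry name */
    node_kind_t kind;   /* file or directory */
} cheap_entry_t;

typedef struct {
    node_kind_t kind;
    long size;               /* file length in bytes */
    int has_props;           /* nonzero if the node has properties */
    long created_rev;        /* revision of the last change */
    long time;               /* time of the last change */
    const char *last_author; /* author of the last change */
} full_entry_t;

/* Number of per-entry metadata lookups performed so far. */
int lookup_count = 0;

/* Hypothetical lookup standing in for the work needed to find the
   size, last-changed rev, date, and author of a single node. */
void lookup_metadata(const cheap_entry_t *in, full_entry_t *out)
{
    assert(in != NULL && out != NULL);
    lookup_count++;
    out->kind = in->kind;
    out->size = 0;
    out->has_props = 0;
    out->created_rev = 0;
    out->time = 0;
    out->last_author = "";
}

/* Name-and-kind listing: no per-entry lookups at all. */
size_t count_dirs_cheap(const cheap_entry_t *entries, size_t n)
{
    size_t dirs = 0;
    for (size_t i = 0; i < n; i++)
        if (entries[i].kind == NODE_DIR)
            dirs++;
    return dirs;
}

/* Full listing: one metadata lookup per entry, even though the
   caller only ends up looking at the kind. */
size_t count_dirs_full(const cheap_entry_t *entries, size_t n)
{
    size_t dirs = 0;
    for (size_t i = 0; i < n; i++) {
        full_entry_t full;
        lookup_metadata(&entries[i], &full);
        if (full.kind == NODE_DIR)
            dirs++;
    }
    return dirs;
}
```

Both functions return the same answer, but the second one pays for one lookup per entry, which is the pattern that hurts on a directory with hundreds of tags.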

So what's the solution to this problem? Well, for the verbose mode of
svn list we obviously still need this info, but for the non-verbose
mode it's totally useless. So I'd like to propose a new
svn_ra_get_dir2 function, which takes a new argument that indicates
which parts of the dirent you get back from the server. It'd look
something like this:

/* Each flag is a distinct bit so the values can be OR'd together. */
#define SVN_DIRENT_FIELD_KIND 0x00001
#define SVN_DIRENT_FIELD_SIZE 0x00002
#define SVN_DIRENT_FIELD_TIME 0x00010

svn_error_t *
svn_ra_get_dir2(svn_ra_session_t *session,
                const char *path,
                svn_revnum_t revision,
                apr_hash_t **dirents,
                int dirent_fields,
                svn_revnum_t *fetched_rev,
                apr_hash_t **props,
                apr_pool_t *pool);

So for a non-verbose svn list, we'd do something like:

SVN_ERR (svn_ra_get_dir2 (session,

That's because we only really care about the kind of each entry, not
about any of the other fields. Obviously this change would have to
bubble up to the svn_client_ls level as well, but that change would be
very similar to what you see here.
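
As a rough illustration of how a fields bitmask could drive what gets filled in, here is a small self-contained C sketch. Everything in it is an assumption made for the example: the FIELD_* constants, entry_t, stored_node_t and fill_entry are hypothetical names, not part of any existing Subversion API, and -1 is just a convention chosen here for "not requested".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical field flags mirroring the idea of the proposal.
   Each one is a distinct bit so they can be OR'd together. */
enum {
    FIELD_KIND = 1 << 0,
    FIELD_SIZE = 1 << 1,
    FIELD_TIME = 1 << 2
};

typedef enum { KIND_UNKNOWN, KIND_FILE, KIND_DIR } kind_t;

/* What the caller gets back; unrequested fields hold sentinels. */
typedef struct {
    kind_t kind;   /* KIND_UNKNOWN when not requested */
    long size;     /* -1 when not requested */
    long time;     /* -1 when not requested */
} entry_t;

/* Hypothetical record of what the repository knows about a node. */
typedef struct {
    kind_t kind;
    long size;
    long time;
} stored_node_t;

/* Copy only the requested fields from NODE into OUT. */
void fill_entry(const stored_node_t *node, int fields, entry_t *out)
{
    assert(node != NULL && out != NULL);

    out->kind = KIND_UNKNOWN;
    out->size = -1;
    out->time = -1;

    if (fields & FIELD_KIND)
        out->kind = node->kind;
    if (fields & FIELD_SIZE)
        out->size = node->size;
    if (fields & FIELD_TIME)
        out->time = node->time;
}
```

A non-verbose listing would pass just FIELD_KIND, while a verbose one would OR together every flag it needs, for example FIELD_KIND | FIELD_SIZE | FIELD_TIME.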

How would this be implemented? Well, for ra_dav it's actually pretty
simple: we just conditionalize the array of props we pass in to
svn_ra_dav__get_props. If you ask for the SVN_DIRENT_FIELD_HAS_PROPS
portion we'll have to get all properties back, at least until we add
in Jean-Marc Godbout's deadprops patch from issue 2151. Still, this
can all be implemented with the current mod_dav_svn code, and it'll be
quite fast for the non-verbose case. That is enough for some uses,
like listing a large directory full of tags or branches, assuming that
you use reasonable naming conventions for tags and branches. For
ra_svn it's obviously a bit more complex, probably requiring both
server and client changes (although I haven't actually looked yet),
but it's not exactly rocket science.
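
For the ra_dav side, "conditionalizing the array of props" could look roughly like the sketch below. This is not the actual ra_dav code. The FIELD_* flags and build_prop_list are hypothetical, and the mapping from each field to a standard WebDAV property name is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical field flags, one distinct bit each. */
enum {
    FIELD_KIND = 1 << 0,
    FIELD_SIZE = 1 << 1,
    FIELD_TIME = 1 << 2
};

/* Standard WebDAV property names. Which property backs which
   dirent field is an assumption of this sketch. */
const char PROP_KIND[] = "DAV:resourcetype";
const char PROP_SIZE[] = "DAV:getcontentlength";
const char PROP_TIME[] = "DAV:creationdate";

/* Room for three names plus the terminating NULL. */
#define MAX_PROPS 4

/* Fill PROPS with the property names needed for FIELDS, terminate
   the list with NULL, and return how many names were added. */
size_t build_prop_list(int fields, const char *props[MAX_PROPS])
{
    size_t n = 0;

    assert(props != NULL);

    if (fields & FIELD_KIND)
        props[n++] = PROP_KIND;
    if (fields & FIELD_SIZE)
        props[n++] = PROP_SIZE;
    if (fields & FIELD_TIME)
        props[n++] = PROP_TIME;

    props[n] = NULL;
    return n;
}
```

With only FIELD_KIND set, the request asks the server for a single property per entry instead of the full set, which is where the savings for the non-verbose case would come from.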

Any thoughts?


Received on Sat Oct 22 00:39:07 2005

This is an archived mail posted to the Subversion Dev mailing list.