
Re: searchable revprops?

From: David Glasser <glasser_at_mit.edu>
Date: 2007-06-08 03:11:25 CEST

On 5/15/07, Ben Collins-Sussman <sussman@red-bean.com> wrote:
> On 5/15/07, C. Michael Pilato <cmpilato@collab.net> wrote:
> > That's a good observation, Ben. Let's not be guilty of rushing something
> > under-designed into the codebase, though.
> Totally agree. I meant, "here's a yummy feature someone could take
> the time to write a design spec for." :-) Whatever the design may
> be, I imagine that the implementation will be fairly easy, now that
> we've got SQL at our disposal.

I can confirm that the latter is true: last night, just to see how much
work it would be, I implemented the creation of revprop indices. As
noted above, the hardest part would be designing a flexible API for
searching the index (over RA, presumably), but as Ben said, the
functionality is not too hard to implement once it's designed.
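For illustration, here is a minimal sketch of what creating revprop indices could look like. It uses Python's sqlite3 module rather than the C code in the patch. The table name revprops and its name/value columns follow the sqlite query in this message. The revision column, the index name, and the helpers open_index and index_revprops are hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per (revision, property name).
# "revprops", "name" and "value" match the query in this message;
# the "revision" column and the index name are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS revprops (
    revision INTEGER NOT NULL,
    name     TEXT    NOT NULL,
    value    TEXT,
    PRIMARY KEY (revision, name)
);
CREATE INDEX IF NOT EXISTS revprops_name_value ON revprops (name, value);
"""


def open_index(path: str) -> sqlite3.Connection:
    """Open (or create) a hypothetical auxiliary revprop index."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def index_revprops(conn: sqlite3.Connection, revision: int,
                   props: dict[str, str]) -> None:
    """Replace every indexed revprop of `revision` with `props`.

    Takes the full property hash for the revision, even if only one
    property changed (mirroring the whole-hash API of the patch).
    """
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM revprops WHERE revision = ?", (revision,))
        conn.executemany(
            "INSERT INTO revprops (revision, name, value) VALUES (?, ?, ?)",
            [(revision, name, value) for name, value in props.items()],
        )
```

The (name, value) index suits queries that filter on a property name and then group or search by value. Which indices are actually worth having remains an open question.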

Other than figuring out how search would work, the other big question
is whether sqlite should be the canonical location of the data or just
an auxiliary index. Advantages of the former include avoiding
redundancy and, for FSFS, space efficiency: on filesystems with large
minimum file sizes, the FSFS revprops directory is very wasteful. For
example, on my OS X machine the minimum file size is 4k and most revprop
files are around 250 bytes, so my ~/.svk/local/db/revprops/ takes up
half a gig! In practice, sqlite seems to give about a 5-6x space
reduction. The main advantage of a pure index is not having to deal
with blocking for reads (the same issue I raised in another thread
about mergeinfo).
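The waste comes from rounding every small file up to the allocation unit. Here is a rough sketch of that arithmetic. It assumes every file occupies a whole number of fixed 4k blocks, although real filesystems may pack small files differently. It also uses a hypothetical repository of 100,000 revisions whose revprop files average 250 bytes.

```python
def allocated_bytes(size: int, block: int = 4096) -> int:
    """On-disk space for a `size`-byte file, assuming whole `block`-sized units."""
    return (size + block - 1) // block * block


def revprops_dir_overhead(n_revisions: int, avg_size: int = 250,
                          block: int = 4096) -> tuple[int, int]:
    """Return (logical bytes, allocated bytes) for one revprop file per revision."""
    logical = n_revisions * avg_size
    allocated = n_revisions * allocated_bytes(avg_size, block)
    return logical, allocated


logical, allocated = revprops_dir_overhead(100_000)
print(f"{logical / 2**20:.1f} MiB of data occupies {allocated / 2**20:.1f} MiB "
      f"on disk ({allocated / logical:.1f}x)")
# prints: 23.8 MiB of data occupies 390.6 MiB on disk (16.4x)
```

Under these assumptions, one-file-per-revision storage inflates the data roughly 16x. A single sqlite file avoids most of that per-file rounding, although it carries its own page and index overhead.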

I'm attaching a patch of what I did last night, though it's certainly
not ready for production:

 - It only writes to the index; there are no read APIs.
 - I only bothered to hook it into FSFS, though it should be trivial to
   hook into BDB.
 - The API for setting a revprop takes the hash of all the revprops for
   a revision, even in the "propset --revprop" code path, which sets
   only one.
 - It has the same SQLITE_BUSY issues as the mergeinfo code.
 - Much of the sqlite code is copied from mergeinfo and would be
   factored out if this were actually applied.
 - I didn't think very carefully about which indices to put on the
   table.
 - etc.

But it does work well enough for me to run svnsync on svn.collab.net
and then run queries like "select value, count(*) as c from revprops
where name = 'svn:author' group by value order by c desc" on the
sqlite db!
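To try that "commits per author" query without an svnsync'd repository, here is a self-contained sketch against an in-memory sqlite table. Only the name and value columns come from the query itself. The revision column and every row below are made up purely for illustration.

```python
import sqlite3

# In-memory stand-in for the revprop index; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revprops (revision INTEGER, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO revprops VALUES (?, ?, ?)",
    [
        (1, "svn:author", "alice"),
        (2, "svn:author", "bob"),
        (3, "svn:author", "alice"),
        (3, "svn:log", "Fix a typo."),
    ],
)

# The query from the message: number of revisions per svn:author value.
rows = conn.execute(
    "select value, count(*) as c from revprops where name = 'svn:author' "
    "group by value order by c desc"
).fetchall()
for author, count in rows:
    print(author, count)
# prints:
# alice 2
# bob 1
```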


David Glasser | glasser_at_mit.edu | http://www.davidglasser.net/


Received on Fri Jun 8 03:11:44 2007

This is an archived mail posted to the Subversion Dev mailing list.
