
Re: searchable revprops?

From: Ben Collins-Sussman <sussman_at_red-bean.com>
Date: 2007-06-08 03:58:54 CEST

This is extremely cool! I'd love to see folks run with this ... (at
least those not currently working on merge tracking)!

On 6/7/07, David Glasser <glasser@mit.edu> wrote:
> On 5/15/07, Ben Collins-Sussman <sussman@red-bean.com> wrote:
> > On 5/15/07, C. Michael Pilato <cmpilato@collab.net> wrote:
> >
> > > That's a good observation, Ben. Let's not be guilty of rushing something
> > > under-designed into the codebase, though.
> >
> > Totally agree. I meant, "here's a yummy feature someone could take
> > the time to write a design spec for." :-) Whatever the design may
> > be, I imagine that the implementation will be fairly easy, now that
> > we've got SQL at our disposal.
>
> I can confirm that the latter is true; last night just for the sake of
> seeing how much work it would be, I implemented the creation of
> revprop indices. As recognized above, the hardest part would be
> designing a flexible API for searching the index (over RA,
> presumably), but as Ben said, it's not too hard to implement the
> functionality once it's designed.
>
> Other than figuring out how search would work, the other big question
> would be whether SQLite should be used as the canonical location of
> the data or as an auxiliary index. Advantages of the former include
> avoiding redundancy and (for FSFS) space efficiency: on filesystems
> with large minimum file sizes, the FSFS revprops directory is very
> wasteful. For example, on my OS X machine, the minimum file size is 4k
> and most revprop files are around 250 bytes; my
> ~/.svk/local/db/revprops/ takes up half a gig! In practice, SQLite
> seems to give about a 5-6x space reduction. Advantages of just being an
> index include not having to deal with blocking for reads (the same
> issue I raised in another thread about mergeinfo).
>
> I'm attaching a patch of what I did last night, though of course it's
> certainly not ready for production. (It only writes to the index:
> there are no read APIs. I only bothered to hook it into FSFS, though
> it should be trivial to hook into BDB. The API for setting a revprop
> takes the hash of all the revprops for a revision, even in the code
> path from "propset --revprop", which sets only one. It has the
> same SQLITE_BUSY issues as the mergeinfo code. Much of the SQLite
> code is copied from mergeinfo and would be factored out if this were
> actually applied. I didn't think very carefully about what
> indices to put on the table, etc.) But it does work well enough for
> me to be able to run svnsync on svn.collab.net and then run queries
> like "select value, count(*) as c from revprops where name =
> 'svn:author' group by value order by c desc" on the SQLite db!
>
> --dave
>
> --
> David Glasser | glasser_at_mit.edu | http://www.davidglasser.net/
>
>
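The revprop index Dave describes can be sketched in Python with the standard sqlite3 module. This is an illustration under stated assumptions, not his patch. Only the table name revprops and the svn:author query come from his message. The column names (revision, name, value), the secondary index, and the helpers set_revprops, commits_per_author and allocated_bytes are hypothetical. set_revprops replaces every row for a revision from a dict, mirroring his note that the setter receives the whole revprop hash. allocated_bytes illustrates the FSFS overhead he mentions: a 250-byte file on a filesystem with 4096-byte blocks still occupies 4096 bytes, roughly 16 times its logical size.

```python
import math
import sqlite3


def allocated_bytes(logical_size: int, block_size: int = 4096) -> int:
    """On-disk size of a file when the filesystem rounds up to whole blocks."""
    if logical_size == 0:
        return 0  # assumption: an empty file occupies no data blocks
    return math.ceil(logical_size / block_size) * block_size


# Hypothetical schema: one row per (revision, property name).
SCHEMA = """
CREATE TABLE IF NOT EXISTS revprops (
    revision INTEGER NOT NULL,
    name     TEXT    NOT NULL,
    value    TEXT,
    PRIMARY KEY (revision, name)
);
CREATE INDEX IF NOT EXISTS revprops_name_value ON revprops (name, value);
"""


def set_revprops(conn: sqlite3.Connection, revision: int, props: dict) -> None:
    """Replace all indexed revprops of `revision` with the contents of `props`.

    Like the setter described in the post, it takes the full hash even when
    only one property changed, so it rewrites every row for the revision.
    """
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM revprops WHERE revision = ?", (revision,))
        conn.executemany(
            "INSERT INTO revprops (revision, name, value) VALUES (?, ?, ?)",
            [(revision, name, value) for name, value in props.items()],
        )


def commits_per_author(conn: sqlite3.Connection) -> list:
    """The svn:author query from the post, returning (author, count) pairs."""
    return conn.execute(
        "SELECT value, COUNT(*) AS c FROM revprops "
        "WHERE name = 'svn:author' GROUP BY value ORDER BY c DESC"
    ).fetchall()


if __name__ == "__main__":
    print(allocated_bytes(250))  # 4096: a 250-byte revprop file on 4 KiB blocks

    # timeout: seconds to wait on a locked file-backed database before raising
    # "database is locked" (SQLITE_BUSY); it has no effect on ":memory:".
    conn = sqlite3.connect(":memory:", timeout=5.0)
    conn.executescript(SCHEMA)

    # Toy data, not from any real repository.
    set_revprops(conn, 1, {"svn:author": "alice", "svn:log": "initial import"})
    set_revprops(conn, 2, {"svn:author": "bob", "svn:log": "fix typo"})
    set_revprops(conn, 3, {"svn:author": "alice", "svn:log": "add tests"})
    print(commits_per_author(conn))  # [('alice', 2), ('bob', 1)]
```

The busy timeout only makes a writer wait for a lock to clear instead of failing immediately; it narrows, but does not remove, the SQLITE_BUSY contention mentioned in the post.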

Received on Fri Jun 8 03:59:06 2007

This is an archived mail posted to the Subversion Dev mailing list.
