
Repository Indexer demo (was Re: full text search source code, and change sets)

From: Ben Collins-Sussman <sussman_at_red-bean.com>
Date: 2005-10-30 22:58:07 CET

On Thu, 27 Oct 2005, Marcus Rueckert wrote:

> > sussman mentioned on irc and i think on the mailinglist too that they did
> > an integration of svn and pylucene via a post commit hook script.

OK, I cleaned it up, and the lucene-libsvn_fs demo works. You can
check out the project here:

  http://svn.red-bean.com/repos/sussman/software/subversion/ReposIndexer

Here's the README file:

----------

This is a proof of concept: it demonstrates how one can hook up a
text-indexing engine to a Subversion repository.

Specifically, it connects the 'lupy' module
(http://divmod.org/projects/lupy), a Python port of the famous Lucene
indexer that is included in this package, with calls to the libsvn_fs
Python bindings.

---------------------------------------------------------------------
  DISCLAIMER: 'lupy' is now retired software. You should be using
  'PyLucene' instead, located at http://pylucene.osafoundation.org/.
  Rumor is that it's very easy to convert a lupy application into a
  PyLucene one via simple search and replace.
---------------------------------------------------------------------

To try this demo:

1. Make sure you have the Subversion SWIG/Python bindings installed.

    To verify this, start the Python interpreter and check that
    'import svn.fs' runs without error.

2. Create an index of a single revision (say, revision 1) by running
    the 'svn_index.py' script against some repository:

       $ ./svn_index.py /path/to/repos 1 myindex
       Indexing changed file: (1, /libsvn_delta/xml_parse.c)
        ...done.
       Indexing changed file: (1, /libsvn_delta/delta.h)
        ...done.
       Indexing changed file: (1, /libsvn_delta/path_driver.c)
        ...done.
       [...]

    This creates a directory 'myindex' containing indexed data for all
    the *.c and *.h paths changed in revision 1. Ideally, we would
    want to index files matching other patterns too, and probably
    more than a single revision!

3. Search the index for a term:

       $ ./svn_search.py "txdelta" myindex
       Found in (1, /libsvn_delta/compose_delta.c).

    Notice that each hit comes back as a (revision, path) pair.
    That's because the indexing script has declared each "key" to be
    of that form.
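
To make the (revision, path) key idea concrete, here is a minimal
sketch in plain Python. It is not the lupy API and not the code in
svn_index.py: a toy in-memory inverted index stands in for the real
indexer, the names PATTERNS, wanted, index_revision and search are
hypothetical, and the file contents are invented sample data.

```python
import fnmatch
import re
from collections import defaultdict

# Hypothetical pattern list; the demo itself only handles *.c and *.h.
PATTERNS = ("*.c", "*.h")


def wanted(path, patterns=PATTERNS):
    """Return True if the path's basename matches one of the patterns."""
    name = path.rsplit("/", 1)[-1]
    return any(fnmatch.fnmatchcase(name, p) for p in patterns)


def index_revision(index, revision, changed_files):
    """Index each matching changed file under a (revision, path) key.

    changed_files maps a repository path to that file's text.
    """
    for path, text in changed_files.items():
        if not wanted(path):
            continue
        key = (revision, path)
        for term in set(re.findall(r"\w+", text.lower())):
            index[term].add(key)


def search(index, term):
    """Return the sorted (revision, path) keys whose text contains term."""
    return sorted(index.get(term.lower(), ()))


# Invented sample data, standing in for the changed paths of revision 1.
index = defaultdict(set)
index_revision(index, 1, {
    "/libsvn_delta/compose_delta.c": "static void txdelta window compose",
    "/libsvn_delta/delta.h": "struct window declarations",
    "/README": "txdelta notes",
})

for revision, path in search(index, "txdelta"):
    print("Found in (%d, %s)." % (revision, path))
```

With this sample data the only hit for "txdelta" is
(1, /libsvn_delta/compose_delta.c); /README also contains the term
but is skipped because it matches neither pattern. Keying on the
pair rather than the path alone lets one index hold the same path
at several revisions.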

Presumably, one could develop this demo into a full-fledged
post-commit hook which indexes the changed paths of each newly created
revision, augmenting an ever-growing server-side index. One could
also then write a nice CGI script to search the index.
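
A rough sketch of what such a hook might look like follows.
Subversion runs a post-commit hook with the repository path and the
new revision number as its first two arguments; everything else here
is an assumption: the INDEXER and INDEX_DIR locations are made up,
build_command and main are hypothetical names, and the sketch
presumes that svn_index.py can add a revision to an index that
already exists, which the demo does not promise.

```python
#!/usr/bin/env python
import subprocess
import sys

# Assumed locations; adjust for a real installation.
INDEXER = "/usr/local/bin/svn_index.py"
INDEX_DIR = "/var/svn/myindex"


def build_command(repos_path, revision, indexer=INDEXER, index_dir=INDEX_DIR):
    """Build the svn_index.py command line for one revision.

    int() rejects anything that is not a revision number.
    """
    return [indexer, repos_path, str(int(revision)), index_dir]


def main(argv):
    """Index the revision named in argv (REPOS-PATH, REV)."""
    if len(argv) < 3:
        sys.stderr.write("usage: post-commit REPOS-PATH REV\n")
        return 2
    # Assumption: svn_index.py augments an existing index when re-run.
    return subprocess.call(build_command(argv[1], argv[2]))


# An installed hook script would end with: sys.exit(main(sys.argv))
```

Saved as hooks/post-commit in the repository and made executable,
this would run once per commit. Since a post-commit hook's exit
status cannot undo the commit, an indexing failure would leave the
index behind the repository rather than block anyone's work.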

Received on Sun Oct 30 22:59:05 2005

This is an archived mail posted to the Subversion Dev mailing list.
