
Re: Search subversion binary content

From: Daniel L. Rall <dlr_at_finemaltcoding.com>
Date: 2005-10-08 01:25:20 CEST

On Fri, 07 Oct 2005, Ben Collins-Sussman wrote:

> On 10/7/05, Daniel L. Rall <dlr@finemaltcoding.com> wrote:
> > On Fri, 07 Oct 2005, Malcolm Rowe wrote:
> > > ... you want a cross-history fulltext search engine that can deal with
> > > non-text content
> >
> > Given that Subversion has embraced WebDAV, this would have a high cool
> > factor hooked up to a DASL interface.
>
> I really need to upload the lucene/libsvn_fs hookup I did in python.
> It actually scans and indexes all repository history. Doesn't really
> work on binary files, though. ;-)

Yeah, but you got it -- this is exactly the type of crawler I was referring
to. Wrapping the "visit" to each piece of content in the appropriate
library (e.g. for OLE or PDF), one which can turn it into textual data,
would allow for indexing, and thus for searches of that index.
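
As a rough sketch of that per-content "visit" (not the lucene/libsvn_fs
hookup mentioned above), a crawler could dispatch on each file's MIME type
to a text extractor. Everything named here (EXTRACTORS, register, visit) is
hypothetical, and the only extractor shown is plain text; OLE or PDF
extractors would be registered the same way on top of whatever library is
chosen. Treating a missing MIME type as plain text is an assumption of the
sketch.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry: MIME type -> function turning raw bytes into text.
Extractor = Callable[[bytes], str]
EXTRACTORS: Dict[str, Extractor] = {}


def register(mime_type: str) -> Callable[[Extractor], Extractor]:
    """Decorator that registers an extractor for one MIME type."""
    def decorator(func: Extractor) -> Extractor:
        EXTRACTORS[mime_type] = func
        return func
    return decorator


@register("text/plain")
def extract_plain(data: bytes) -> str:
    # Undecodable bytes are replaced rather than aborting the crawl.
    return data.decode("utf-8", errors="replace")


def visit(path: str, mime_type: Optional[str], data: bytes) -> Optional[str]:
    """Return indexable text for one piece of content.

    Returns None when no extractor is registered for the type, so the
    caller can skip the file instead of indexing raw binary data.
    The path is accepted only so a caller can log or key on it.
    """
    # Assumption: content with no recorded MIME type is treated as text.
    extractor = EXTRACTORS.get(mime_type or "text/plain")
    if extractor is None:
        return None
    return extractor(data)
```

A crawler would call visit() once per file per revision and hand any
non-None result to the indexer.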

As indexing is generally hard work for a system, incremental indexing is
desirable. The post-commit and post-revprop-change hook scripts could set a
"dirty" flag indicating that the index needs to be updated. The indexer
itself would likely be an external process.
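
A minimal sketch of that dirty-flag handoff, assuming the flag is a plain
file in the repository directory that records which revisions need
(re)indexing. The file name and both function names are hypothetical: a
hook script would call mark_dirty() with the revision number it receives,
and the external indexer would call take_dirty() on its own schedule.

```python
import os
from typing import List

FLAG_NAME = "index-dirty"  # hypothetical file name, not a Subversion convention


def mark_dirty(repos_path: str, revision: int) -> None:
    """Called from a hook: record a revision that needs (re)indexing."""
    with open(os.path.join(repos_path, FLAG_NAME), "a") as flag:
        flag.write(f"{revision}\n")


def take_dirty(repos_path: str) -> List[int]:
    """Called by the indexer: claim and return the pending revisions."""
    flag_path = os.path.join(repos_path, FLAG_NAME)
    claimed = flag_path + ".claimed"
    try:
        # Rename first, so commits arriving during indexing start a new flag file.
        os.replace(flag_path, claimed)
    except FileNotFoundError:
        return []  # nothing is dirty
    with open(claimed) as pending:
        # Deduplicate: a revision may be listed twice (commit + revprop change).
        revisions = sorted({int(line) for line in pending if line.strip()})
    os.remove(claimed)
    return revisions
```

This keeps the hooks cheap (one append per commit) and leaves the expensive
work to the indexer. It does not guard against a hook still writing at the
moment of the rename, which a real tool would need to handle.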

Codifying this sort of tool into something which shipped with Subversion
would be most excellent.

Received on Sat Oct 8 01:26:13 2005

This is an archived mail posted to the Subversion Dev mailing list.
