Re: Document Management

From: Glenn A. Thompson <gthompson_at_cdr.net>
Date: 2002-05-03 16:26:26 CEST

Hey:

Funny you should mention this.
My primary interest in Subversion is for document management. Isn't that what
version control essentially is?
I will of course be switching to it for my code as well.

Alan Langford wrote:

> One of the guys on the XWT discussion list (see www.xwt.org, I won't rave
> about how cool this is over here) wants an open source document
> management system that's not web based (read that as "not HTML based"). He
> had the idea of using XWT to build a GUI for his own document management
> system. So of course I pointed him over here to Subversion. I figure 'tis
> better to add to this project than spend a dozen or so person-years
> reinventing it.
>
> But he's come back with a question that I haven't seen and I thought I'd
> ask it here. Clearly the ability to have decent support for binary format
> files allows the repository to store and retrieve images, pdf documents,
> etc. The question is are there facilities or hooks for doing things like
> document profiling and indexing (or how difficult is it to implement this
> functionality).
>

I'm going to be doing my indexing prior to putting documents in Subversion. The
reason is that indexing (in the OCR sense) is not a 100% foolproof process.
Most OCR packages have correction flows that can be used to resolve errors
beforehand. I think commits could be held up for a review/fix process, but
boy, that seems to put a real burden on Subversion if someone isn't
there to release them.

As for profiling, it could be done with hooks, I would think. If you haven't
checked out Oracle IFS, you might want to. It does all this and more. However,
it has issues that caused me to get involved here instead of using it. On the
plus side: it's versioned (using a locking method by default, eeeh), and it
provides boatloads of protocols/interfaces, including one that is similar to
TortoiseCVS.
On the negative side: it is a P I G pig. No problem, hardware is cheap? Well, not
so fast. Oracle wants a piece of you for every processor involved. Let's stop
the madness. Larry doesn't need another airplane.

>
> It would be nice, for example, if Subversion could trigger an indexing
> process on the type of file being checked in (documents get indexed by a
> scan of the file, images accept a keyword list). Presumably this leads to
> search-based retrieval capabilities...

Oracle does this via "Context" and custom parsers. They include an XML-based
parser with IFS, which appeared to be rather flexible. It worked very well.
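As a rough illustration of the type-based triggering described in the quoted message (documents get a full scan, images get a keyword list), a post-commit hook could route each changed file to a different indexer. This is a minimal sketch in Python, assuming the hook is invoked with the repository path and revision number as its two arguments, and that `svnlook changed` prints one change per line with the path starting at the fifth column (e.g. `A   docs/report.pdf`). The extension lists and the indexer names `full_text_scan` and `keyword_list` are hypothetical.

```python
#!/usr/bin/env python
"""Hypothetical post-commit hook: route each changed file to an indexer by type."""
import os
import subprocess
import sys

# Hypothetical routing table: which extensions get which kind of indexing.
DOCUMENT_EXTS = {".pdf", ".txt", ".doc", ".html"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".gif"}


def parse_changed(svnlook_output):
    """Parse `svnlook changed` output into (action, path) pairs.

    Assumes the status columns occupy the first four characters and the
    path starts at column 4, e.g. "A   docs/report.pdf".
    """
    changes = []
    for line in svnlook_output.splitlines():
        if len(line) > 4:
            changes.append((line[0], line[4:]))
    return changes


def plan_indexing(changes):
    """Return (path, indexer) pairs for files we know how to index."""
    plan = []
    for action, path in changes:
        if action == "D" or path.endswith("/"):
            continue  # nothing to index for deletions or directories
        ext = os.path.splitext(path)[1].lower()
        if ext in DOCUMENT_EXTS:
            plan.append((path, "full_text_scan"))
        elif ext in IMAGE_EXTS:
            plan.append((path, "keyword_list"))
    return plan


def main(repos, rev):
    out = subprocess.run(
        ["svnlook", "changed", "-r", rev, repos],
        capture_output=True, text=True, check=True,
    ).stdout
    for path, indexer in plan_indexing(parse_changed(out)):
        # A real system would queue this work instead of indexing inline,
        # so nobody waits on the indexer after the commit completes.
        print(f"{indexer}: {path}")


if __name__ == "__main__":
    # Subversion runs the hook as: post-commit REPOS-PATH REVISION
    main(sys.argv[1], sys.argv[2])
```

Running the indexing after the commit, rather than gating the commit on it, sidesteps the "someone has to release it" burden mentioned earlier; the trade-off is that a bad OCR result is already in the repository and has to be corrected by a later commit.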

I'm going to ease myself into indexing using Subversion properties (no more
than a dozen properties per document). If another "better" approach comes
along, I can relocate the metadata from properties and delete the properties.
For my purposes it will work fine. After all, the current document management
system we use is called Samba :-)
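Storing per-document metadata as properties might look something like the following. This is a minimal Python sketch that only builds `svn propset` / `svn propdel` argument lists (it does not run them) and enforces the dozen-property budget mentioned above. The `doc:` prefix and the function names are hypothetical conventions, not anything Subversion requires.

```python
"""Sketch: turn a document's metadata into `svn propset` / `svn propdel` commands."""

MAX_PROPS = 12   # the "no more than a dozen properties per document" budget
PREFIX = "doc:"  # hypothetical namespace, keeps metadata apart from svn:* properties


def propset_commands(path, metadata):
    """Build one `svn propset NAME VALUE PATH` argument list per metadata key."""
    if len(metadata) > MAX_PROPS:
        raise ValueError(
            f"{path}: {len(metadata)} properties exceeds limit of {MAX_PROPS}"
        )
    # Sorted so the generated commands are deterministic.
    return [
        ["svn", "propset", PREFIX + key, str(metadata[key]), path]
        for key in sorted(metadata)
    ]


def propdel_commands(path, keys):
    """Commands to drop the metadata once it has been relocated elsewhere."""
    return [["svn", "propdel", PREFIX + key, path] for key in sorted(keys)]
```

Each list can be handed to `subprocess.run` inside a working copy; the property changes then go into the repository with the next `svn commit`. Keeping every key under one prefix is what makes the later "relocate and delete" step straightforward, since `propdel_commands` only ever touches properties this scheme created.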

>
>
> It would be *very* nice if Subversion could be the engine for a document
> management system like this.

I'm betting on it.
I think this is another way a SQL backend becomes quite useful.

Later,

gat

>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org

Received on Fri May 3 16:22:58 2002
