Re: Using Hooks To OCR Documents

From: David Weintraub <qazwart_at_gmail.com>
Date: Mon, 6 Dec 2010 10:57:18 -0500

On Fri, Dec 3, 2010 at 10:44 AM, Jim Jenkins <jej_at_homrichberg.com> wrote:

> I’m planning to use Hooks to add OCR scanning for select documents going into
> a SVN repo. I’m not really sure where to start so I’m hoping someone here can tell
> me if it’s possible and even suggest how best to proceed.

I'm going to take a slightly different approach. Pre-commit hooks are not
what you want.

   1. A pre-commit hook should only be used if the developer has some way of
   fixing an issue. A good pre-commit hook is to make sure all files that end
   in *.sh have the property svn:eol-style set to "LF". If a developer doesn't
   set this, and the pre-commit hook fails, the developer can easily fix the
   problem and recommit the file.
   2. The user is left twiddling their thumbs while any hook runs, even a post-commit
   hook. If you have a hook that takes a few minutes to run, users will get
   impatient. They may simply not bother committing changes they should until
   they have a big horking commit which they'll do at the end of the day and
   leave.
   3. Changing committed files on a commit is very difficult. You, after
   all, don't have access to the client's workspace, so you'll have to emulate
   their checkout, so you can make your changes and do a commit. Of course that
   means that your pre-commit hook will fire off once more, so you'll have to
   have some mechanism in place letting your pre-commit hook know not to do
   whatever it was supposed to do in the first place.
   4. Also, it's a bad idea to change a commit on a user. As Ulrich
   Eckhardt pointed out, your user's client doesn't know that the files they
   just committed were changed. Besides, what if your pre-commit hook created an
   error as a side effect of that hook? I once wrote a pre-commit hook in
   ClearCase to automatically expand RCS keywords. On occasion, the pre-commit
   hook expanded a sprintf statement or something like that, and the developer
   was furious because their program worked, and I botched it up.
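
To make point 1 concrete, here is a rough sketch of that kind of check in Python. It assumes the standard hook arguments (repository path and transaction name) and uses `svnlook changed` and `svnlook propget` to inspect the pending commit. The function names are my own, and the parsing assumes the usual `svnlook changed` layout where the path starts in the fifth column. Treat it as an illustration, not a finished hook.

```python
#!/usr/bin/env python3
"""Pre-commit hook sketch: reject *.sh files lacking svn:eol-style=LF."""
import subprocess
import sys


def parse_changed(svnlook_output):
    """Return paths of added or modified files from `svnlook changed` output."""
    paths = []
    for line in svnlook_output.splitlines():
        status, path = line[:4].strip(), line[4:]
        # Skip deletions and directories (directory paths end with "/").
        if not path or status.startswith("D") or path.endswith("/"):
            continue
        paths.append(path)
    return paths


def find_violations(paths, get_eol_style):
    """Return the *.sh paths whose svn:eol-style is not exactly "LF"."""
    return [p for p in paths if p.endswith(".sh") and get_eol_style(p) != "LF"]


def svnlook(subcommand, repos, txn, *extra):
    """Run svnlook on the pending transaction; return stdout, or None on failure."""
    result = subprocess.run(
        ["svnlook", subcommand, "-t", txn, repos, *extra],
        capture_output=True, text=True,
    )
    return result.stdout if result.returncode == 0 else None


def main(argv):
    repos, txn = argv[1], argv[2]
    changed = svnlook("changed", repos, txn)
    if changed is None:
        sys.stderr.write("could not inspect the transaction\n")
        return 1

    def get_eol_style(path):
        # propget exits non-zero when the property is not set at all.
        value = svnlook("propget", repos, txn, "svn:eol-style", path)
        return value.strip() if value is not None else None

    bad = find_violations(parse_changed(changed), get_eol_style)
    for path in bad:
        sys.stderr.write(f"{path}: set svn:eol-style to LF and recommit\n")
    return 1 if bad else 0  # non-zero exit rejects the commit


# Subversion calls the hook with exactly two arguments: REPOS and TXN.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

The check is quick and the message written to stderr tells the developer exactly what to fix, which is what makes it a reasonable job for a pre-commit hook.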

I would instead think of your committed files as "source" code, and of
your OCR scans as "compiled" code.

What you probably want, although you really don't compile, is a continuous
build server that takes the committed files, and creates the needed OCR
scans of these files, and stores them where they can be referenced. The
storage area does not have to be Subversion (and in fact, I would argue that
Subversion is not your ideal storage area).

Take a look at Hudson. It's a powerful continuous build server and is very
flexible in its setup. With Hudson, you could automatically do the scans
after a commit, and then email the user if the scan failed for some reason.
It is possible to only have Hudson scan the files that were changed (since
Hudson knows which files were committed). And, it is possible to have Hudson
FTP or store the changed OCR files onto another server (or to simply keep
the scanned archive on Hudson itself).
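
As a rough idea of what the build step itself could look like, here is a Python sketch that OCRs only the changed scans and writes the results into a separate archive directory. Everything specific in it is an assumption on my part: the list of changed paths is taken as given (how the job supplies it depends on your setup), the file extensions are examples, and the OCR tool is the `tesseract` command line program (`tesseract INPUT OUTPUTBASE`), which nobody in this thread has named. The function names are hypothetical.

```python
"""Build step sketch: OCR changed scans into an archive outside Subversion."""
import subprocess
from pathlib import Path

# Assumed scan formats; adjust to whatever your documents actually are.
SCAN_EXTENSIONS = {".tif", ".tiff", ".png"}


def output_base(changed_path, archive_root):
    """Map a workspace-relative path to an extension-less path in the archive."""
    return Path(archive_root) / Path(changed_path).with_suffix("")


def run_tesseract(source, out_base):
    """Run `tesseract SOURCE OUTBASE` (writes OUTBASE.txt); True on success."""
    out_base.parent.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(["tesseract", str(source), str(out_base)])
    return result.returncode == 0


def ocr_changed(changed_paths, workspace, archive_root, ocr=run_tesseract):
    """OCR each changed scan; return the paths whose OCR failed."""
    failures = []
    for rel in changed_paths:
        # Only process files that look like scans; skip everything else.
        if Path(rel).suffix.lower() not in SCAN_EXTENSIONS:
            continue
        if not ocr(Path(workspace) / rel, output_base(rel, archive_root)):
            failures.append(rel)
    return failures
```

If `ocr_changed` returns a non-empty list, the script can exit non-zero so the build is marked as failed, and the build server's normal failure notification takes care of emailing the committer. The commit itself has long since finished, so nobody is kept waiting.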

It'll take a bit of tweaking, but so would trying this in Subversion. And,
you and your users would be much happier with this arrangement.

--
David Weintraub
qazwart_at_gmail.com

Received on 2010-12-06 16:57:56 CET

In reply to: Jim Jenkins: "Using Hooks To OCR Documents"
