
I/O filters and scripts.

From: David Soergel <lorax_at_lorax.org>
Date: 2000-06-15 10:51:35 CEST

Hi all,

One major feature that I'd very much like to see in Subversion is the
idea of input & output filters.

Shap and Zack have already mentioned a few benefits of this (6/5,
6/6), e.g. line feed canonicalization and variable expansion. I
think there is much to be gained from a more general filtering
mechanism, though.

In particular, I would like to be able to separate the syntax and
semantics of code from its aesthetic presentation. That is, I'd like
to run a prettyprinter to canonicalize formatting on every checkin,
so that purely aesthetic changes have no effect in the repository.
Also, I'd like to run a (perhaps different) prettyprinter on
checkout. I know that some people won't want to do this at all, so I
want to emphasize that I'm suggesting an _option_.
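
To make the idea concrete, here is a minimal sketch in Python of a
checkin/checkout filter pair. The names canonicalize and localize are
made up for illustration and are not part of any Subversion design;
line-ending and trailing-whitespace cleanup merely stands in for a
real prettyprinter.

```python
def canonicalize(text: str) -> str:
    """Hypothetical checkin filter: LF line endings, no trailing whitespace."""
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return "\n".join(line.rstrip() for line in lines)


def localize(text: str, newline: str = "\r\n") -> str:
    """Hypothetical checkout filter: apply the local line-ending preference."""
    return text.replace("\n", newline)


# Two working copies that differ only in aesthetics.
messy = "int x = 1;   \r\nreturn x;\r\n"
tidy = "int x = 1;\nreturn x;\n"

# Both reduce to the same canonical text, so no change would be stored.
stored = canonicalize(messy)
same = stored == canonicalize(tidy)

# On checkout, each client re-applies its own preference.
checked_out = localize(stored)
```

Because both inputs canonicalize to the same text, a purely
aesthetic edit would produce an empty delta in the repository.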

This has many benefits:

1. No need to store diffs of code where only whitespace has changed;
no need to read diffs cluttered with such changes. Presently, if I
have a badly formatted file in CVS, run a prettyprinter on it, and
check it in, then it's impossible to run meaningful diffs between
revisions before vs. after that event--every line in the file may
have changed!

2. Different developers may have different style preferences. I run
up against this a lot because my preferred Java style is apparently
anathema to most everyone else. I don't want to change my style, nor
do I want to impose it on others. Having an automatic client-side
prettyprinter lets everyone transparently edit the same code with
their own aesthetic preferences.

3. Similarly, different repositories in a hierarchy or mirroring
relationship may have different code formatting policies, conformance
with which should be automated as the code propagates around.

4. Different stylings are appropriate for different contexts. An
emailed patch should be hard-wrapped to 74 chars, and indented with
spaces; but I don't want to be constrained to that in my working copy
just for the sake of nice emails.

There are also some serious drawbacks:

1. Line numbers no longer refer to specific parts of the file. As
long as you're debugging or diffing on files that all participate in
the same convention, you're OK; but a different positioning system
would have to be used for patches/deltas, e.g. "number of
non-whitespace characters" or such. That's not good enough,
actually, because some characters are used for aesthetics and may be
introduced or removed by prettyprinters (for example, some
conventions place a * character at the beginning of every line in a
multi-line /* */ comment; others don't).

2. Diffs may become difficult in some circumstances. Obviously,
line-oriented diffs must always be done between files that have been
formatted in the same way. If I want to do a basic diff, there's no
problem, because the "clean" files in the SVN directory should be
saved in my local format as part of the checkout process. If I want
to diff against an old revision, though, I'll have to download the
old revision, format it, and then diff. Subversion appears to do all
diffs in the client anyway, so this is probably OK too. Better,
though, would be to do character-level diffs using a
format-independent position as mentioned above.
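
The "number of non-whitespace characters" positioning idea from
drawback 1 could be sketched as follows in Python. The functions
offset_of and locate are hypothetical helpers, and the sketch assumes
that formatters only add or remove whitespace, which (as noted above)
is not true of every prettyprinter.

```python
def offset_of(text: str, line: int, column: int) -> int:
    """Count the non-whitespace characters before 1-based (line, column)."""
    count = 0
    for lineno, row in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(row, start=1):
            if (lineno, col) == (line, column):
                return count
            if not ch.isspace():
                count += 1
    raise ValueError("position not found in text")


def locate(text: str, offset: int) -> tuple[int, int]:
    """Return the 1-based (line, column) of the offset-th non-whitespace char."""
    seen = 0
    for lineno, row in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(row, start=1):
            if ch.isspace():
                continue
            if seen == offset:
                return lineno, col
            seen += 1
    raise IndexError("offset past end of text")


# The same code in two formattings.
compact = "if (x) { y(); }"
expanded = "if (x)\n{\n    y();\n}"

# Translate the position of "y" from one formatting to the other.
portable = offset_of(compact, 1, 10)
position = locate(expanded, portable)
```

A formatter that inserts decorative characters, such as a leading *
on comment lines, would shift these offsets, so the scheme only works
when whitespace is the sole difference.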

While my examples naturally involve code, it's also important that
canonicalizing filters could in principle be useful for all kinds of
file formats containing all kinds of data. For instance, I might
have an svn repository of engineering drawings, where I want to enforce
the usage of metric units; in this case an input filter could
recognize and convert values given in English units. OK, perhaps
this is a silly example, but you get the point.
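
For what it's worth, such a unit-converting input filter could be as
small as the following Python sketch. The function to_metric and the
pattern it uses are assumptions for illustration; it only recognizes
plain "<number> in" and would need much more care on real drawing
formats.

```python
import re

MM_PER_INCH = 25.4  # exact by definition of the international inch

# Matches e.g. "2 in" or "1.5in"; deliberately naive.
_INCHES = re.compile(r"(\d+(?:\.\d+)?)\s*in\b")


def to_metric(text: str) -> str:
    """Hypothetical input filter: rewrite inch values as millimetres."""

    def convert(match: re.Match) -> str:
        mm = float(match.group(1)) * MM_PER_INCH
        return f"{mm:g} mm"

    return _INCHES.sub(convert, text)


converted = to_metric("width: 2 in, depth: 10 mm")
```

Values already given in millimetres pass through untouched, so the
filter can be run repeatedly without changing its own output.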

This whole idea is strongly related to the "smart merges" of section 7.2.3.

The question remains whether the filters should run on the server or
on the client. I think the answer is "whichever machine is receiving
data". That is, the server should be responsible for canonicalizing
the contents of the repository, and this should be transparent to the
client; and the client should control prettyprinting on checkout with
local preferences, with no involvement of the server.
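
The "whichever machine is receiving data" rule can be modelled in a
few lines of Python. Endpoint and both filter functions are
hypothetical names, and tab/space conversion of indentation is just a
stand-in for real canonicalization and prettyprinting.

```python
from typing import Callable

Filter = Callable[[str], str]


class Endpoint:
    """Hypothetical endpoint that filters whatever it receives."""

    def __init__(self, incoming: Filter) -> None:
        self.incoming = incoming
        self.content = ""

    def receive(self, text: str) -> None:
        self.content = self.incoming(text)


def tabs_to_spaces(text: str) -> str:
    """Repository policy in this sketch: indentation uses 4 spaces."""
    return text.expandtabs(4)


def leading_spaces_to_tabs(text: str) -> str:
    """One client's preference in this sketch: indent with tabs."""
    out = []
    for line in text.split("\n"):
        body = line.lstrip(" ")
        n = len(line) - len(body)
        out.append("\t" * (n // 4) + " " * (n % 4) + body)
    return "\n".join(out)


server = Endpoint(tabs_to_spaces)               # canonicalizes on checkin
tab_client = Endpoint(leading_spaces_to_tabs)   # prettyprints on checkout
plain_client = Endpoint(lambda text: text)      # takes the canonical form

server.receive("\tx = 1\n")           # checkin from the tab-loving client
tab_client.receive(server.content)    # checkout with local preference
plain_client.receive(server.content)  # checkout without any filter
```

Neither side needs to know the other's filter: the server only ever
sees what clients send, and each client only ever sees canonical
text.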

OK, now I'll go even further off into blue-sky speculation :) and
suggest that the filtering/scripting system should support messaging
between the server and the client, e.g. so that the client can invoke
actions on the server. For instance, if I keep my web site in SVN,
I'd like to be able to publish it by sending a message from the
client to invoke a checkout to the appropriate directory. Similarly,
when writing servlets, I might want to remotely invoke a
"checkout/compile/restart-servlet-engine" cycle, perhaps even
automatically on checkin. This is also important in the case of
hierarchical repositories, as mentioned in section 7.2.5; I'd like to
be able to say "OK, my working repository version is good now, so
check it in (check it up?) to the public repository". Such remote
script invocations would have to take account of the permissions
system, of course. We can already do this independently of svn, with
rexec/rsh/ssh, just making svn calls as necessary. Existing
solutions aren't fully cross-platform, though. There's something
appealing about having a "check up" dialog integrated in a Mac or
Java GUI svn client.
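
As a rough illustration of client-invoked server actions that respect
permissions, consider this Python sketch. Everything in it (the
ACTIONS and PERMISSIONS tables, invoke, publish_site, the user names)
is invented for the example; a real system would consult the
repository's own permissions and run real checkouts.

```python
from typing import Callable, Dict, List, Set

log: List[str] = []  # stands in for side effects on the server


def publish_site() -> str:
    """Hypothetical server-side script: check out the site to the web root."""
    log.append("checkout to web root")
    return "published"


# Hypothetical tables: which actions exist, and who may run them.
ACTIONS: Dict[str, Callable[[], str]] = {"publish": publish_site}
PERMISSIONS: Dict[str, Set[str]] = {"alice": {"publish"}}


def invoke(user: str, action: str) -> str:
    """Run a named server action on behalf of a user, if permitted."""
    if action not in ACTIONS:
        raise KeyError(f"unknown action: {action}")
    if action not in PERMISSIONS.get(user, set()):
        raise PermissionError(f"{user} may not run {action}")
    return ACTIONS[action]()


result = invoke("alice", "publish")

try:
    invoke("bob", "publish")
    denied = False
except PermissionError:
    denied = True
```

The client only names an action; what that action does, and who may
trigger it, stays under the server's control.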

To return to my suggestion that everything be XML-based, I'll further
point out that the scripting system could be very similar to, if not
directly based on, Ant.


David Soergel .oooO Oooo. "Music and Living----"
123 Forest View ( ) ( ) "The same thing," said Pooh.
Woodside, CA 94062 \ ( ) / lorax@lorax.org
(650) 303-5324 \_) (_/ http://www.lorax.org
Received on Sat Oct 21 14:36:05 2006

This is an archived mail posted to the Subversion Dev mailing list.