
Re: [subversion-dev] autoformating code

From: Jason Elliot Robbins <jrobbins_at_collab.net>
Date: 2000-05-01 03:23:25 CEST

Sorry to reply to my own message, but here are a couple more thoughts:

(1) Even though the AST-based idea is conceptually radical for people
who have always thought of source code as a stream of characters,
supporting it really need not be all that disruptive to the current
subversion design.

Instead of thinking of an AST as being PART of the vc system, just add
the concept of internal format vs. external format.

  + Simple internal formats might just compress the file in a way
appropriate for source code, images, or other file types.

  + Codecs (encoder-decoders) for converting from internal to external
formats and back would be plug-ins to the vc system, not part of it.
Any installation without codecs for a given file format should work,
just not as efficiently or conveniently as one with codecs.

  + Codecs need to run on both the client and server side, so a
scripting language or virtual machine approach would work best. Java
comes to mind as my clear choice, but safe-TCL or a subset of perl
might also be good choices. I guess we don't really need to run codecs
on the client side, but we would get better transmission times if they
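
To make the plug-in idea concrete, here is a minimal sketch in Python
(chosen only for brevity; the message itself leans toward Java). All of
the names below (IdentityCodec, ZlibCodec, CodecRegistry) are
hypothetical and not part of any subversion design. It only shows the
shape of the idea: codecs are looked up by file type, and a missing
codec falls back to storing the file unchanged.

```python
import zlib


class IdentityCodec:
    """Fallback when no codec is installed: internal form == external form."""

    def encode(self, external):
        return external

    def decode(self, internal):
        return internal


class ZlibCodec:
    """A 'simple internal format' that just compresses the bytes."""

    def encode(self, external):
        return zlib.compress(external)

    def decode(self, internal):
        return zlib.decompress(internal)


class CodecRegistry:
    """Maps file extensions to codec plug-ins (hypothetical design)."""

    def __init__(self):
        self._codecs = {}

    def register(self, extension, codec):
        self._codecs[extension.lower()] = codec

    def lookup(self, filename):
        # An installation without a codec for this file type still works,
        # just without the space savings or formatting conveniences.
        _, dot, extension = filename.rpartition(".")
        key = extension.lower() if dot else ""
        return self._codecs.get(key, IdentityCodec())


registry = CodecRegistry()
registry.register("c", ZlibCodec())

source = b"int main(void) { return 0; }\n" * 10
codec = registry.lookup("main.c")
stored = codec.encode(source)           # what the repository would keep
assert codec.decode(stored) == source   # checkout restores the external form
```

Keeping the registry outside the core means the vc system only ever
sees opaque internal bytes plus a codec lookup.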

(2) The main reason that AST-based approaches to software
development environments have failed to be widely adopted is that
they have taken an all-or-nothing approach that forces developers to
use one language (e.g., Interlisp-D), and one set of development tools
(e.g., the Interlisp-D editors, compilers, and debuggers).

Developers who use vi will never give up vi. Developers who use emacs
will never give up emacs. And even on one project, it is usually true
that we use multiple languages (or at least one main language, plus
HTML, XML, makefiles, image formats, etc).

Our approach must allow people to see the files as simple files and
use any tools on them (e.g., vi or emacs). The codecs will only come
in during the process of doing a checkout, update, commit, or diff.

The second reason AST-based approaches have failed is that
all-or-nothing approaches don't work for vendors either. The Interlisp-D
developers produced a ton of tools, but they never had the whole set
of tools needed for development and they could never have hoped to
keep up with the rest of the tool developers that were assuming plain
ascii files rather than ASTs. This problem is also solved by keeping
the concept of internal vs. external formats limited to VC

A simple scenario:

+ This assumes a simple codec that compresses by replacing multiple
spaces and tabs (outside of string constants) by a single space. This
is much simpler than doing a true AST. The decompression algorithm is
to basically run GNU indent with each developer's preferences.

1. Developer A checks in version 1 of file F formatted in a way s/he

2. The codec compresses it to basically remove all indentation and
intra-line extra spacing.

3. The file is stored in this compressed form in the repository as

4. Developer B checks out file F.

5. Developer B's codec "uncompresses" the file by running GNU indent
with some options.

6. Developer B makes some changes and does a CVS diff.

7. The codec uncompresses the stored F as per B's preferences and
compares it against the working copy of F.

8. Developer B commits changes.

9. The codec compresses F and stores version 2 as normal (including
deltas as normal).

10. When Developer A does a diff, s/he sees the differences formatted
as per his/her preferences.
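
The scenario above can be sketched in a few lines of Python. This is
only an illustration under loud assumptions: compress() handles quoted
constants on a single line and knows nothing about comments, and
reindent() is a crude brace-counting stand-in for GNU indent, not a
call to the real tool. Every function name here is hypothetical.

```python
import difflib


def compress_line(line):
    """Collapse runs of spaces/tabs outside string constants; drop indentation."""
    out = []
    quote = None  # quote character of the constant we are inside, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            out.append(ch)
            if ch == "\\" and i + 1 < len(line):
                i += 1
                out.append(line[i])  # keep the escaped character verbatim
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
            out.append(ch)
        elif ch in " \t":
            while i + 1 < len(line) and line[i + 1] in " \t":
                i += 1  # swallow the rest of this run of blanks
            out.append(" ")
        else:
            out.append(ch)
        i += 1
    return "".join(out).strip(" ")


def compress(text):
    return "\n".join(compress_line(line) for line in text.split("\n"))


def reindent(text, indent="    "):
    """Stand-in for GNU indent: indent by brace depth with the user's preference."""
    depth = 0
    lines = []
    for line in text.split("\n"):
        closes_first = line.startswith("}")
        if closes_first:
            depth = max(depth - 1, 0)
        lines.append(indent * depth + line if line else line)
        depth += line.count("{") - line.count("}") + (1 if closes_first else 0)
        depth = max(depth, 0)
    return "\n".join(lines)


# Steps 1-3: A's formatting is normalized away before storage.
a_working = 'int main(void)\n{\n        puts("a   b");\n        return   0;\n}\n'
stored_v1 = compress(a_working)

# Steps 4-5: B checks out and sees tab indentation.
b_working = reindent(stored_v1, indent="\t")

# Steps 6-7: B edits, then diffs against v1 rendered with B's preferences.
b_working = b_working.replace("return 0;", "return 1;")
b_diff = list(difflib.unified_diff(
    reindent(stored_v1, indent="\t").splitlines(),
    b_working.splitlines(), "F (v1)", "F (working copy)", lineterm=""))

# Steps 8-9: the commit stores the compressed form of B's file.
stored_v2 = compress(b_working)

# Step 10: A sees the same change with two-space indentation.
a_diff = list(difflib.unified_diff(
    reindent(stored_v1, indent="  ").splitlines(),
    reindent(stored_v2, indent="  ").splitlines(), "F (v1)", "F (v2)", lineterm=""))
```

Note that the spaces inside "a   b" survive compression, and that both
developers see a one-line change even though neither ever sees the
other's indentation.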

If a merge conflict occurs, the conflict markers are inserted in the
compressed form of the file as comments (or something else that will
not throw off the decompression part of the codec). The resulting
file with conflict markers is presented to the developer with their
preferred formatting options.

Of course, the more exciting example is when you store true ASTs in
the repository and implement a plug-in to diff in that format. Then
you could show logical differences and logical conflicts regardless of
formatting, including line breaks. You could also intelligently
summarize differences rather than simply presenting lines that differ:

The difference between v2 and v3 is that v3 has
05 new methods:
  sit, beg, rollOver, eat, and drink
01 removed method:
17 new or modified comments
01 modified instance variable:
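
A summary of this kind can be approximated with any parser that yields
an AST. The sketch below uses Python's standard ast module purely as a
stand-in (the message does not say which language or parser would be
used), and summarize() is a hypothetical helper. It compares functions
by name only, so same-named methods in different classes would collide,
and it cannot count comments because this parser discards them.

```python
import ast


def functions_by_name(source):
    """Map each function/method name to a formatting-independent AST dump."""
    tree = ast.parse(source)
    return {
        node.name: ast.dump(node)  # dump() omits line/column info by default
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def summarize(old_source, new_source):
    """Report logical differences between two versions, ignoring layout."""
    old = functions_by_name(old_source)
    new = functions_by_name(new_source)
    return {
        "new": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(
            name for name in old.keys() & new.keys() if old[name] != new[name]
        ),
    }


v2 = (
    "class Dog:\n"
    "    def bark(self):\n"
    "        return 'woof'\n"
    "    def growl(self):\n"
    "        return 'grr'\n"
    "    def fetch(self):\n"
    "        return 1\n"
)
v3 = (
    "class Dog:\n"
    "    def bark(self):\n"
    "\n"
    "        return   'woof'\n"      # reformatted only: not a logical change
    "    def sit(self):\n"
    "        pass\n"
    "    def fetch(self):\n"
    "        return 2\n"
)
report = summarize(v2, v3)
for kind in ("new", "removed", "modified"):
    print(f"{len(report[kind]):02d} {kind} methods: {', '.join(report[kind])}")
```

Here bark is reformatted between versions but is not reported, since
only the tree is compared.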

Logical diffs for XML files would be just as exciting.

Further out, with additional support in their editors (e.g., emacs),
developers could use AST concepts to limit the scope of the VC
operation: update this function, show me diffs on this block of code,
commit changes to this header comment, show me diffs in this function
and the functions that it calls.
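
As a rough sketch of one such scoped operation ("show me diffs in this
function"), the parser can supply the line range of a named function and
an ordinary text diff can then be run on just that range. Again Python's
ast module is only a stand-in for whatever AST the codec would provide,
and function_source() and diff_function() are hypothetical names.

```python
import ast
import difflib


def function_source(source, name):
    """Return the source lines of the first function called `name`, or []."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            lines = source.splitlines()
            # lineno/end_lineno are 1-based and inclusive (Python 3.8+).
            return lines[node.lineno - 1:node.end_lineno]
    return []


def diff_function(old_source, new_source, name):
    """Unified diff restricted to one function, ignoring the rest of the file."""
    return list(difflib.unified_diff(
        function_source(old_source, name),
        function_source(new_source, name),
        f"{name} (old)", f"{name} (new)", lineterm=""))


old = "def a():\n    return 1\n\ndef b():\n    return 2\n"
new = "def a():\n    return 10\n\ndef b():\n    return 2\n"

for line in diff_function(old, new, "a"):
    print(line)
```

Asking for the diff of b instead yields nothing, even though the file as
a whole has changed.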

I now return to my regularly scheduled programming...


Jason Robbins, Ph.D. Collab.Net is hiring open source developers!
Senior Software Engineer http://www.collab.net/jobs
Received on Sat Oct 21 14:36:04 2006

This is an archived mail posted to the Subversion Dev mailing list.