[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Best Practices

From: Karl Fogel <kfogel_at_newton.ch.collab.net>
Date: 2002-08-08 18:38:45 CEST

Scott Lamb <slamb@slamb.org> writes:
> Here's the new chapter. I think I did manage to keep it pretty
> Subversion-centered.

Nice work! Several specific comments below, but overall it was very
readable, and I liked the practice of giving examples both as command
transcripts and then as a plain-language outline for clarity -- having
both really helps readers.

> +@c ------------------------------------------------------
> +@node Source code formatting
> +@section Source code formatting
> +
> +Subversion works by creating a set of binary differences between two
> +revisions. It has no knowledge of syntactic vs. sematic differences ---
> +everything is just a difference of bytes.
> +
> +Given this design, it's important to avoid unnecessary syntactic
> +differences. They can cause merges to unnecessarily conflict as well as
> +drowning you in noise when viewing differences between revisions.

I realize that I may have said something misleading about this in an
earlier mail, but:

How exactly does inconsistency between different developers'
indentation styles lead to these problems? I can see how it would
lead to the eventual need for reindentation, which *in turn* leads to
the problems described above, but that's not quite the same thing.
The difference is subtle but important, I think.

It is later reindentation, not the original inconsistent formatting
itself, that leads to the unnecessary noise problem (more on
reindentation later). And so the point of persuading everyone to use
consistent formatting now is to avoid that later reindentation, not
because consistent formatting *itself* causes Subversion to work
better.

The paragraph about syntactic vs semantic is misleading. Subversion's
binary diffing has nothing to do with this problem. In fact, the
`diff' and `merge' commands aren't merely paying attention to bytes,
they're paying attention to lines, because historically a lot of
formats have used line-based syntax. The issue isn't really knowing
syntactic vs semantic differences anyway -- it's about whether the
tool knows anything at all about syntax. And diff/merge *do* know
some very primitive things about syntax, just not much... You see what
I'm getting at here.

> +In the real world, we're not always so perfect. Formatting preferences may
> +change over time, or we may just make mistakes. There are things you can do
> +to minimize the problems of reformatting.
> +
> +Here's a simple rule: never mix formatting and semantic changes in a single
> +revision. Give precise directions on duplicating formatting changes. When
> +possible, combine all formatting changes into a single revision.

A simple rule, but not quite the one that we follow :-).

Many projects use these guidelines instead (CVS does, and I'm pretty
sure that in Subversion we do in practice):

   * If you're making a sweeping reformatting/reindentation change,
     then yes, do it in an independent commit with no semantic
     changes.

   * If you're working in some area of code anyway, and you see
     inconsistent indentation in the context lines around that area,
     it's okay to fix them up too while you're there. Just point out
     in the log message that there were also formatting changes, so
     people know to ignore them when scanning the change. In fact,
     this kind of gradual, as-you-go reindentation is one of the best
     ways to get a badly indented but active code base into shape,
     while still avoiding a painful "flag day" on which everything
     gets reindented at once. (Not that we have that problem, but CVS
     does, and this is how they're solving it).

> +@subsection Ignoring whitespace differences
> +
> +When viewing differences between revisions, you may want to not see any
> +whitespace-only changes. @command{svn diff -b} will accomplish this.
> +
> +The commit emails always show whitespace-only changes.
> +@file{commit-email.pl} uses @command{svnlook diff} to get differences, which
> +doesn't support the @samp{-b} option.

"svn diff -x -b", actually. And there may be other flags to diff
worth recommending here, too, check the diff documentation.

> +@subsection Line endings
> +
> +Different platforms (Unix, Windows, MacOS) have different conventions for
> +marking the line endings of text files. Simple editors may rewrite line
> +endings, causing problems with diff and merge. This is a subset of the
> +formatting problems.
> +
> +Subversion has built-in support for normalizing line endings. To enable it,
> +set the @samp{svn:eol-style} property to ``native''. @xref{Properties}.

"normalizing" doesn't mean much by itself. I think it would be better
to describe what it actually does, and why it helps solve the problem.

> +@c ------------------------------------------------------
> +@node When you commit
> +@section When you commit
> +
> +It pays to take some time before you commit to review your changes and
> +create an appropriate log message. Subversion does not have any convenient
> +way of removing revisions (there is no equivalent to @samp{cvs admin -o}),
> +so get it right the first time.

The inability to remove committed revisions is unrelated to the
quality of the log message. If the log message is bad, Subversion
*does* offer the ability to fix it up afterwards -- so "get it right
the first time" is not at all necessary here. And while revisions
can't be removed, more tweaks can always be committed later... So even
regarding the content of the change, the "get it right the first time"
advice is a little unconvincing, at least when justified on that
basis.

See below for another basis on which to justify it.

> +You should run a @samp{svn diff} before each commit and ask yourself:
> +
> +@itemize @bullet
> +@item do these changes belong together? It's best that each revision is a
> +single logical change. It's very easy to forget that you've started another
> +change.
> +@item do I have a log entry for these changes?
> +@end itemize

Nice points. This might be a good place to mention the practice many
projects have of keeping the head of trunk "stable" (buildable,
parseable, whatever the appropriate thing is for that project). That
is, to point out that the repository is not merely a place that stores
your changes -- rather, it effectively *publishes* the newly changed
project anew every time you commit. So if you don't get something
right, you may be inconveniencing an arbitrary number of people until
someone commits a fix. That ought to give one pause :-).

> +Defining a log entry policy is also helpful --- the Subversion
> +@uref{http://svn.collab.net/repos/svn/trunk/HACKING,HACKING} document is a
> +good model. If you always embed filenames, function names, etc. then you can
> +easily search through the logs with
> +@uref{http://svn.collab.net/repos/svn/trunk/tools/client-side/search-svnlog.pl,search-svnlog.pl}.

Yeah. This advice is specific to source code people; in general,
might want to consider what advice would be appropriate for other
kinds of projects too. (I'm not saying it's a bad example, just that
there might be more to say here. The log message policies appropriate
for programming projects may not be a good fit for other types of
projects.)

> +@subsection Binary files
> +
> +Subversion does not have any way to merge or view differences of binary
> +files, so it's critical that these have verbose log messages. Since you
> +can't review your changes with @samp{svn diff} immediately before
> +committing, you might create a new file @file{changes} whenever you start
> +work and modify it as you go. @samp{svn ci -F changes} will use this file as
> +a commit message.

I don't think that the non-diffability of non-text files implies that
the log message must be _verbose_, merely that it should summarize the
change accurately. (Which is really just as true of any other kind of
change...)

And the write-the-log-message-as-you-go thing is used by lots of
people, independently of whether they changed binary files.

Hmmm. I guess I'm just saying that these particular two "best
practices" are not ones I've actually observed in the developers whose
Subversion usage I'm most familiar with (myself, Mike, and Ben). We
all keep running log messages, whether or not binary files are
involved. And we don't aim for verbosity, but for descriptiveness,
again independent of binary files or no binary files. So these
practices may not be as widespread as you assumed... :-)

Okay. Maybe one more iteration, then we can commit this material and
work on it inline?

-K

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Thu Aug 8 18:54:50 2002

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.