Here's a response that hits on just about every reason I can conceive
for not using tar wrappers or client side scripts.
Thanks to Bill Bumgarner for taking the time to give us a detailed
report from the front.
------- Forwarded Message
Date: Tue, 25 Feb 2003 08:35:34 -0500
Subject: Re: [Issue 1152] - Client side scripts would allow many requested features
From: Bill Bumgarner <firstname.lastname@example.org>
First, some background. I am one of the half dozen or so people that
played a role in the creation or maintenance of CVS wrappers support
over the last 12 years or so.
The tar wrappers feature in CVS is a bad implementation of a very bad
idea. I'm not going to go into the implementation details-- it is a
long and sordid tale involving politics and bad code (a google search
should reveal my comments on such over the past decade). In my
experience maintaining many CVS repositories across organizations large
and small, the wrappers functionality has been the single largest
source of confusion, damaged source and corrupted repositories.
With that said, let me comment on Will's specific email. All of the
examples are real -- names have been changed to protect the stupid.
On Tuesday, Feb 25, 2003, at 02:07 US/Eastern, William Uther wrote:
>> ------- Additional Comments From email@example.com 2003-02-24 19:18 PST
>> A HUGE -1 on this. I've spent the last 3 years trying to explain to
>> people why tar wrappers are evil. What you are proposing is going to
>> break every client that does not have the *exact* same prefs setup
>> that you have.
> That is not entirely correct. You need prefs set up to handle the
> scripts used in the working copy - this does not need to be identical
> to someone else's set. The other options are i) built in
> functionality, or ii) server controlled client side scripts. Server
> controlled, client side scripts are a HUGE security hole. Built in
> functionality is nice, but the ability to script something that you
> want that no one else does is also nice.
The only situation in which two clients need not be configured
identically in this context is when the configuration does not
affect the state or handling of the files. Otherwise, you end up
with spurious changes that cause integration hell and can lead to the
history/log being completely useless as it is filled with noise
generated automatically by the automatic scripts. But that limitation
means such functionality isn't very useful.
Example: A repository maintainer wishes to automatically enforce code
formatting rules for all of the C files committed. Easy enough to set
up. But, oops, user A has a slightly different configuration for the
batch formatter than user B. Every time either user modifies code
last modified by the other and commits the code, nearly every line in
the source file(s) is marked as changed and commits include every file
in the module, regardless of whether the developer intended to make a
change to that file. The history is rendered utterly useless along
with all change quantification / control procedures.
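A minimal sketch of that failure mode, assuming a toy re-indenter in place of a real batch formatter. The names reformat and changed_lines and the two indent settings are hypothetical, chosen only for illustration. It compares the change user B meant to make with the change the repository would record once B's formatter has run over code last formatted by A.

```python
import difflib


def reformat(source: str, indent: str) -> str:
    """Hypothetical stand-in for a batch formatter.

    Re-indents brace-delimited code, using `indent` once per nesting level.
    """
    out, depth = [], 0
    for raw in source.splitlines():
        line = raw.strip()
        if line.startswith("}"):
            depth = max(depth - 1, 0)
        out.append(indent * depth + line if line else "")
        if line.endswith("{"):
            depth += 1
    return "\n".join(out) + "\n"


def changed_lines(old: str, new: str) -> int:
    """Count the lines a line-based comparison reports as different."""
    a, b = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


SOURCE = """int main(void)
{
int x = 1;
if (x) {
x = 2;
}
return x;
}
"""

# User A commits with a two-space configuration.
committed_by_a = reformat(SOURCE, "  ")
# User B changes a single statement, then B's formatter (four spaces) runs.
edited_by_b = committed_by_a.replace("x = 2;", "x = 3;")
committed_by_b = reformat(edited_by_b, "    ")

intended = changed_lines(committed_by_a, edited_by_b)
recorded = changed_lines(committed_by_a, committed_by_b)
print(f"intended change: {intended} line(s); recorded change: {recorded} line(s)")
```

In this toy case the one-statement edit is recorded as a change to every indented line in the file, which is the noise described above.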
Unless the scripted functionality can be invoked in a sandbox that has
a completely controlled environment-- same user, same environment
variables, same dot files, same shell, same everything-- the scripted
behavior will not be consistent across different users or clients.
This will cause problems-- often major and generally impossible to fix
once incurred (you can remove the broken functionality, but you can
never fix the repository).
On Tuesday, Feb 25, 2003, at 02:07 US/Eastern, William Uther wrote:
>> 1. Tar wrappers. -1 Please please please don't make me go into why
>> this sucks so bad.
> If all you are trying to solve is the opaque collection issue, then
> this might not be the best way to do it. However, for interest's sake,
> do you have a pointer to why tar is so evil? I think it would be a
> reasonable, 70% solution. I'd prefer tar wrappers to nothing.
The command 'tar' is horribly overloaded. There are a half dozen or so
different versions of 'tar' floating around. Each version has
incompatibilities with the others and, in many cases, one version will
claim to have successfully expanded a tarchive when, in fact, it subtly
screwed things up (filename problems and, in some cases, file contents).
Like scripting, unless you can absolutely guarantee that the 'tar'
command returned by 'which tar' is exactly the same version on every
machine and in every account that will access the repository, the use
of 'tar' will eventually cause havoc.
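One way to approximate that guarantee is to compare the local binary against a digest the repository maintainer publishes. This is only a sketch under assumptions: file_sha256, tar_matches and require_matching_tar are hypothetical helpers, the published digest is assumed to exist, and a byte-for-byte comparison is deliberately strict (the same tar release built for another platform will not match).

```python
import hashlib
import shutil
from pathlib import Path
from typing import Optional


def file_sha256(path: str) -> str:
    """Return the SHA-256 hex digest of the file at `path`."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def tar_matches(expected_sha256: str, tar_path: Optional[str] = None) -> bool:
    """True only if the 'tar' found on PATH (or `tar_path`) has the expected digest."""
    tar_path = tar_path or shutil.which("tar")  # same lookup as 'which tar'
    if tar_path is None:
        return False
    return file_sha256(tar_path) == expected_sha256


def require_matching_tar(expected_sha256: str) -> None:
    """Raise instead of letting a mismatched 'tar' touch the working copy."""
    if not tar_matches(expected_sha256):
        raise RuntimeError("local 'tar' differs from the agreed binary")
```

Even a check like this has to be run in every account on every machine, which is the practical difficulty.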
Example: I have seen a developer break down in tears because they had
checked out a source tree on a client system to add a copyright
statement to all the source files. On commit, they thought, "Why did
the commit just write all the NIB files and documentation (RTFD), even
though I hadn't touched either?" On creating a release candidate of
the now "completed" software, it was discovered that all of the NIB
files and all of the RTFD documentation were corrupted because of a tar
versioning problem. The machine to which the developer had checked out
the source was not normally used for development-- but he was just
adding copyright statements, only needed a text editor-- and did not
have the same 'tar' as the rest of the universe. Ooops.
> (The problem with tar that immediately springs to mind is the inability
> to handle resource forks and other Mac meta-data. Of course,
> subversion already has this problem. If the collection format is your
> only problem, then the client side scripting solution gives you the
> ability to change that format as you please.)
Yes. Nasty problem, that. To deal with meta-data and resource forks
will obviously require a client side solution. However, subversion
needs to be fully aware of whatever that solution is or else you will
end up in a situation where touching the files on a client without the
same client-side solution or on a client that is non-Mac will cause
corruption of the repository.
>> 2. File transformations before committing, updating, checking out,
>> etc. -1 unless someone comes up with a stunningly elegant solution for
>> how to do it, in which case, I'm -0
I'm in the -0 camp for reasons similar to Fitz's. It is a useful
trick in a few controlled situations, but it needs to be implemented in
a fashion where the environment within which the transformations are
applied is 100% identical no matter who accesses the repository or how
they access the repository.
I have had a number of folks write with tech questions regarding
automatic transformations in CVS; most are using some kind of a 'lint'
processor to ensure code formatting conformity. Sounds like a good
idea, but proves to be more of a problem than a solution.
Specifically, automatic transformations generally assume that the thing
to be transformed is in a "known good" state. This is not always the
case. In the context of source code, it is not uncommon for one
developer to "hand off" a problem to another developer by committing a
non-conformant [sometimes flat out broken] chunk of code for the other
developer to then update and work against. When that happens, the
automatic transformation will often choke on the "in transit" code.
At best, it just means you have stupid formatting. At worst, the
commit will partially fail or the automatic transformation will wreak
havoc by deleting a chunk of code or otherwise raising holy hell.
And that all assumes that the automatic transformation works in all
>> 3. Opaque collections (see issue 707). This is the right way to deal
>> with the problem you're trying to hack around with the tar wrappers
>> thing. I'm +1 on this, and still trying to come up with the time for
>> it. :-)
> Great. Opaque collections are a wonderful idea. They only give you
> opaque collections though. They don't give you symlinks, or small
> diffs in compressed xml files (See Alessandro Polverini's recent mail
> "A simple (?) suggestion from a svn fan :)"), or any of the other
> things my proposal can. Heck, Nutti even mentions checking in /dev
> with his (similar) proposal:
I'm +1 on opaque collections, as well. And opaque collections should
be just that; totally opaque. Anything that goes in, should come out
on another client. How symlinks will be handled in that context is
beyond me-- what do you do on Windows, for example? This is the same
problem as dealing with metadata on the Mac -- for all intents and
purposes, a symlink is foreign metadata to Windows.
As far as handling small diffs in XML or other such features, this
needs to be addressed very carefully. Most likely, it should be
implemented on a plugin basis or something similar. Furthermore, a
lot of careful thought must be given as to how to deal with the
situation where client A might not have the same plugin as client B --
most likely, the "correct" answer is that client A should not be
allowed to ever write anything back to the repository that requires the
> This issue is not a specific 'opaque collection' solution. It is a
> general 'client side scripting' idea, which could be used to work up a
> limited opaque collection solution, if someone chooses to do so.
Given my experience, I am very much a -1 on the 'client side
scripting' idea. It is a seductively powerful idea that, if
implemented, will only lead to problems. If you need client side
scripting, wrap the script around the SVN client and make sure every
client uses that script.
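A rough sketch of such a wrapper, assuming the project distributes one script that every user runs instead of calling svn directly. PINNED_ENV, PASS_THROUGH and the shared configuration directory are hypothetical project policy, not anything Subversion prescribes. The only Subversion feature relied on is the client's --config-dir option.

```python
import subprocess
from typing import Dict, List, Mapping

# Hypothetical project policy: the only environment the wrapped client sees.
PINNED_ENV = {"LC_ALL": "C", "TZ": "UTC", "PATH": "/usr/bin:/bin"}
# Variables copied from the caller because the client needs them to run at all.
PASS_THROUGH = ("HOME", "USER")


def build_env(caller_env: Mapping[str, str]) -> Dict[str, str]:
    """Drop everything from the caller's environment except the allowed names."""
    env = {k: caller_env[k] for k in PASS_THROUGH if k in caller_env}
    env.update(PINNED_ENV)
    return env


def build_command(args: List[str], config_dir: str) -> List[str]:
    """Point svn at a shared configuration directory instead of per-user dot files."""
    return ["svn", "--config-dir", config_dir, *args]


def run_svn(args: List[str], config_dir: str, caller_env: Mapping[str, str]) -> int:
    """Run the client under the pinned environment and return its exit status."""
    # Any project-specific scripted step would go here, before the client runs.
    completed = subprocess.run(
        build_command(args, config_dir), env=build_env(caller_env), check=False
    )
    return completed.returncode
```

The wrapper only narrows the differences between clients. It does not remove the need for every client to actually use it.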
It helps to consider a tool like subversion in light of its nature.
For all intents and purposes, Subversion represents a multi-user,
multi-platform client via which the user can access and manipulate a
fully revisioned threaded data store that can contain tens of thousands
of files and millions of bytes of data. This is very hard to do
right. A 70%-- or even 95%-- solution is generally not good enough;
if it fails once and that failure leads to repository corruption, the
end result can be catastrophic to the point of causing a project to
fail completely or a team of developers to fall apart.
While it may be tempting to dictate that all clients must be configured
identically-- must all be unix [symlinks], must all be macs [resource
forks], must all be windows [ow]-- this limitation does not work well
in practice. One of the strengths of a good revision control system
is the ability to integrate non-developers into the production process.
The artists can use revision control to commit the graphic assets, the
documentation team can write and commit docs... everything is in one
place and everything can be revision controlled with full historical
evidence to document the evolution of the project. A good revision
control system also allows people to work remotely. All of this means
that even the smallest of projects will likely have participants
working on "other" platforms and have client systems that might be
configured quite differently.
In CVS, the fact that ~/.cvswrappers controls certain client specific
behaviors of CVS has proven to be the single greatest flaw of the CVS
wrappers implementation. It is a nightmare. Anytime that
configuration changes, every person that uses the repository must
remember to update all copies of that file across all of their
accounts. Worse, if you are working against repository A and
repository B and the two require slightly different configurations, you
have to worry about swapping the damned file before interacting with
either repository. Certainly, this is an issue that could be taken
care of with some kind of a creative per-repository client side
configuration, but that path is also rife with numerous problems.
------- End of Forwarded Message
Brian W. Fitzpatrick <fitz_at_red-bean.com> http://www.red-bean.com/fitz/
Received on Tue Feb 25 17:49:05 2003