
Re: CaSe insensetive OS not handled well

From: A.T.Hofkamp <a.t.hofkamp_at_tue.nl>
Date: 2005-08-23 13:05:48 CEST

Christopher Ness wrote:
> On Mon, 2005-08-22 at 18:30 +0200, A.T.Hofkamp wrote:
>
>>If 3 is trusted, then the only problem is to map names from 2 to names from 1
>>or 3, which can be done in the way Flex suggested in his answer to my email
>>(posted at this list 19/08/05 19:10).
>>If 3 cannot be trusted, then an svn client needs its own subroutine to
>>normalize files (otherwise there is no trusted source for new files).
>>This can take the form of a regular expression match/replace algorithm, for
>>example a configuration file like
>>
>>gnumakefile = GNUmakefile
>>*.c = *.c
>
>
> Did you really just map *.c = *.c in your above example? Because that
> would be wrong IMO. Imagine ( main.c = function.c ).

I may have been a bit too dense in my explanation here (it was late, I wanted
to go home).
Let me try again.

When I type "svn add Myfile.C", there are 2 possibilities:
A. "Myfile.C" can be trusted to be case-correct.
B. "Myfile.C" cannot be trusted to be case-correct.

A is typically when a human enters the command, B is typically when a (GUI)
program enters the command (because such a frontend will often use file system
information, which is assumed not to be case-correct (otherwise the discussion
is pointless)).

So the question is, what to do with "Myfile.C" in case B.

I believe that any project that has mixed CS and NCS machines using the
repository has some form of naming conventions.
So I think the answer is that "Myfile.C" is correct if (and only if) it
conforms to those conventions.
That implies there has to be a way for the svn client to check whether the
filename matches the conventions (given the assumption that we are in case B
which means that the case of the given name cannot be trusted), which in turn
means that svn needs a description of the conventions.

The 'stupid' way is to have the svn user provide a list of all allowed
filenames. While the list may be somewhat long, that can be done.
To improve on this, you could also supply a mapping, that is, a list of illegal
filenames in which each entry also gives the 'correct' filename.
This list is slightly longer :-)

Obviously, it is practically not feasible to demand such lists. There has to
be a shorter way to define the mapping/translation.

A smarter way can be to define generic translations like

*.c = *.c

where at the left a *case-insensitive* pattern is specified to match
"Myfile.C" against, and if it matches, the *case-sensitive* pattern at the
right is expanded in the obvious way as being the filename that matches the
conventions.
(The idea is that "Myfile.C" would match here, because the "*" at the left
matches the "Myfile" part, the "." matches the "." of the filename, and the
"c" of the left pattern matches the "C" of the filename (remember, the pattern
is case-insensitive!). At the right of the equals sign is the normalized
filename as a pattern. The "*" at the right expands to whatever the contents
of the "*" at the left was, in this case "Myfile", followed by "." and "c",
which results in the normalized filename "Myfile.c".)
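
To make the match/expand step concrete, here is a minimal sketch in Python.
It is only an illustration of the idea, not anything svn implements: the
function name apply_rule is hypothetical, and it assumes a rule is a single
"left = right" line in which each "*" at the right is filled from the
corresponding "*" at the left.

```python
import re


def apply_rule(rule, filename):
    """Return the normalized name if `filename` matches `rule`, else None.

    Hypothetical helper: `rule` is a line such as "*.c = *.c".
    """
    left, right = (part.strip() for part in rule.split("=", 1))

    # Turn the left pattern into a regex: "*" captures any run of
    # characters, everything else must match literally.
    regex = "".join("(.*)" if ch == "*" else re.escape(ch) for ch in left)

    # The left side is matched case-insensitively against the whole name.
    match = re.fullmatch(regex, filename, flags=re.IGNORECASE)
    if match is None:
        return None

    # The right side is case-sensitive: literal characters are copied,
    # each "*" is replaced by what the corresponding left "*" captured.
    captured = iter(match.groups())
    return "".join(next(captured, "") if ch == "*" else ch for ch in right)
```

With this sketch, apply_rule("*.c = *.c", "Myfile.C") gives "Myfile.c", and
a name that does not match the left pattern gives None.
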
The only thing missing now is a way to express normalization mappings for the
characters 'eaten' by the "*" at the left. One possible solution is to get rid
of the plain "*" at the right and define the following expansion patterns
instead:

*l  first convert the contents of "*" at the left to lowercase, then insert
    the string
*u  as *l, but convert to UPPERCASE
*i  as *l, but the initial character must be uppercase (ie Initial)
*a  insert the contents of "*" at the left as-is (ie no case change)

Now you can specify conventions like "all to lowercase" with the single line

* = *l

and 'keep as-is' (ie the current behavior) by

* = *a

etc.
If you also define that the first line that matches must be taken, you can
specify specific names that do not match the general rule:

makefile = Makefile
* = *a

ie normalize "makefile" with any combination of case to "Makefile", keep the
other filenames "as-is".
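
As a rough sketch of how such a rule list could be evaluated (again in Python,
and again purely illustrative): the names expand and normalize are
hypothetical, the rules are assumed to be given as a list of "left = right"
strings in configuration order, and a name that matches no rule is assumed to
be returned unchanged. One limitation of this sketch: a "*" at the right that
is directly followed by a literal l, u, i or a is always read as an expansion
pattern.

```python
import re


def expand(modifier, text):
    """Apply one of the expansion patterns *l, *u, *i, *a to `text`."""
    if modifier == "l":
        return text.lower()
    if modifier == "u":
        return text.upper()
    if modifier == "i":
        # As *l, but with the initial character in uppercase.
        return text[:1].upper() + text[1:].lower()
    return text  # "a": insert as-is, no case change


def normalize(rules, filename):
    """Return `filename` normalized by the first rule that matches."""
    for rule in rules:
        left, right = (part.strip() for part in rule.split("=", 1))

        # Left side: case-insensitive pattern, "*" captures any characters.
        regex = "".join("(.*)" if ch == "*" else re.escape(ch) for ch in left)
        match = re.fullmatch(regex, filename, flags=re.IGNORECASE)
        if match is None:
            continue  # try the next rule

        # Right side: replace each *l/*u/*i/*a with the converted capture.
        captured = iter(match.groups())
        return re.sub(
            r"\*([luia])",
            lambda tok: expand(tok.group(1), next(captured, "")),
            right,
        )

    # Assumption: names that match no rule are left alone.
    return filename
```

For example, with rules = ["makefile = Makefile", "* = *a"], the name
"MAKEFILE" is normalized to "Makefile" while "Myfile.C" is kept as-is; with
the single rule "* = *l" the name "Myfile.C" becomes "myfile.c".
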

These match/expand lines are part of the svn client configuration.

> Although I know what you are trying to suggest: ( main.c = Main.c ) that
> is not what is implied by the kleene star above.

Sorry to confuse you, I hope the above explanation clarifies what I meant.

> Not to be rude, but SVN allows you to reject mixed case commits with a
> hook script, but it will _never_ modify the transactions you create for
> commits.

I wasn't talking about commit time, what I am describing is done before that,
namely at the moment you insert a new filename into the svn WC. That is done
locally without even setting up a connection to the server.

If you implement strict enforcement of the policy, you can only have filenames
that match the policy in your svn WC (and as a result, also in your repository).
That means that everything beyond accepting a filename stays the same.

> That is the way I prefer it.
> The tool should not modify a commit, ever. No matter how trivial it may
> be. This is a more reliable implementation, but by no means robust -
> which is the motivation for hacks and fixes generated by poor
> development tools.

I don't understand the difference between reliable and robust. Could you
explain please?

I am trying to find the fundamentally sound solution to this problem. That
implies that hacks and tricks are not good enough.
In my opinion that includes
- forcing everything to lowercase (which effectively means only 1 filename
policy is supported), and
- rejecting files with commit hooks (which to me sounds like a step too late;
svn should not allow such filenames to be submitted to the repository in the
first place).
[ for people experimenting or believing in commit hooks etc, please don't feel
offended. I am speaking above about a long term solution. That doesn't mean
you don't have a short term problem which you want and need to solve.

In fact by trying to solve such short term problems and encountering
difficulties you expose weak points in the current implementation which drives
progress ]

> I prefer reliable over robust in this case because subversion is very
> important as a revision control program. It should not interpret what a
> user _might_ have meant in a commit transaction.
>
> In my humble opinion "transaction munging" leads down a long dark path
> which may end up at repository corruption or non-deterministic commits.

I agree completely, I think that repositories should be kept clear of such
practices.

> This topic is often discussed, please read the mail {user,dev} archives
> for more case insensitive discussions.

From what I have read, the topic does seem to pop up, but a fundamentally sound
solution has not yet been reached, and implementation of a solution has been
postponed a few times.
(Not that I criticize that decision; I think that solving the problem in a
generic, fundamentally sound way is a lot harder than most people think. On the
other hand, most people may not be interested in the long term general solution.)

Albert

Received on Tue Aug 23 13:08:25 2005

This is an archived mail posted to the Subversion Users mailing list.
