
Re: Umlaut problem on Mac (composed vs. decomposed UTF-8)

From: Daniel A. Steffen <das_at_users.sourceforge.net>
Date: 2007-07-21 01:42:40 CEST

On 18/07/2007, at 0:46, Erik Huelsmann wrote:

> On 7/17/07, David Glasser <glasser@mit.edu> wrote:
>> * Linux user adds a file with a :u in it, which is stored composed
>> * Mac user checks it out
>> * Mac user edits the file
>> * Mac user tries to commit; the commit request sends the name with a
>> decomposed :u
>> * Repository has no idea what file the mac user's client is
>> talking about
> Worse: the Mac client reports the versioned file as missing
> immediately after checkout *and* reports a file (which looks *exactly*
> the same to the user) as unversioned.

One cause for this behavior is the fact that the .svn/entries file
stores utf8 filenames as present in the repository rather than in the
working copy on disk.
The two need not be identical, which is exactly what happens in the
above situation: during checkout the file is created with the
repository name, but the actual on disk name ends up being different
(due to HFS filename normalization), causing a mismatch later on when
that working copy filename is compared to the repository names stored
in the entries file.
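
As a rough illustration of the mismatch (a sketch in Python, not svn code; the filename and helper names are made up for the example), the composed and decomposed spellings of the same visible name differ byte for byte and only compare equal once both are normalized. NFD is used here as an approximation of what HFS stores:

```python
import unicodedata

# Composed form (NFC): u-umlaut is the single code point U+00FC.
repo_name = "f\u00fcr.txt"
# Decomposed form (NFD): 'u' followed by combining diaeresis U+0308.
# This only approximates the HFS on-disk form, which is an assumption here.
disk_name = unicodedata.normalize("NFD", repo_name)

def same_bytes(a, b):
    """Byte-wise comparison of the UTF-8 encodings, with no normalization."""
    return a.encode("utf-8") == b.encode("utf-8")

def same_after_normalization(a, b):
    """Comparison after bringing both names to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

A plain comparison of the entries name against the directory listing behaves like same_bytes(), which is why the file shows up as both missing and unversioned.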

A possible alternative approach to fix this problem could thus be to
change the 'name' field of entries records to the working copy name,
and to store the repository name in a new field (or possibly the
existing 'url' field).

This has the advantage that there is no need for subversion to know/
implement how a given filesystem might transform filenames (i.e. no
platform or ICU dependency etc), as that info can be obtained by
creating a file with the name in question in an empty temp dir
followed by listing of dir contents.
In contrast to some of the other proposals, this would also
allow mac users of filesystems which do not have the normalizing
behavior (e.g. UFS, NFS, NTFS) to continue to be able to checkout
repositories/share working copies containing files with mixed/
conflicting normalization (important esp. in the case of working
copies on NFS shared with other unix boxes).
Existing repository content would not change; only working copies
would be affected. Composed and decomposed filenames could continue
to be added on all platforms (except from HFS, as is already the case
now). The only behavioral change would be that files with composed
chars in their repository name would become usable on HFS.
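
The probing step mentioned above (create a file with the name in an empty temp dir, then list the dir) could look roughly like the following Python sketch. The function name on_disk_name and its parent parameter are hypothetical; the temp dir would presumably have to live on the same filesystem as the working copy for the answer to be meaningful:

```python
import os
import tempfile

def on_disk_name(name, parent=None):
    """Return the name the filesystem reports for a file created as `name`.

    `parent` is the directory in which the temporary probe directory is
    created; None means the system default temp location.
    """
    with tempfile.TemporaryDirectory(dir=parent) as tmp:
        # Create an empty file with the requested name.
        with open(os.path.join(tmp, name), "w"):
            pass
        # The directory was empty before, so its only entry is our file,
        # spelled however the filesystem chose to store it.
        entries = os.listdir(tmp)
        return entries[0]
```

On a normalizing filesystem the return value would differ from the argument for composed names, and on a non-normalizing one it would come back unchanged, without the caller needing to know which case applies.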

To test this in the scenario at the top (using a mac with unmodified
svn client), I have verified that manual editing of the entries file
after the checkout allows checkin of a modified file with composed
umlaut in the repository filename:
change the 'name' field of the record in question to the on-disk
representation (i.e. decomposed umlaut) and change the 'url' field
from empty to the repository url for the file (i.e. with composed
umlaut).
While checkin works, use of the 'url' field appears to interfere with
'svn status' switch detection, so a new entries field is probably
needed to store the repository filename.
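
To make the two-field idea concrete, here is a small Python model (not the actual entries file format; Entry, repos_name and classify are names invented for this sketch). Each record carries the working copy name plus the repository name, and status-like classification is done against the working copy name only:

```python
import unicodedata
from dataclasses import dataclass

@dataclass
class Entry:
    name: str        # working copy name, as the filesystem reports it
    repos_name: str  # hypothetical new field: name as stored in the repository

def classify(entries, disk_listing):
    """Split names into (versioned, missing, unversioned) by exact match."""
    by_name = {e.name: e for e in entries}
    on_disk = set(disk_listing)
    # Versioned files map their on-disk name to the name sent to the repository.
    versioned = {n: by_name[n].repos_name for n in on_disk if n in by_name}
    missing = sorted(n for n in by_name if n not in on_disk)
    unversioned = sorted(n for n in on_disk if n not in by_name)
    return versioned, missing, unversioned

composed = "f\u00fcr.txt"
decomposed = unicodedata.normalize("NFD", composed)  # approximates the HFS form

# Current behavior: entry keyed by the repository (composed) name.
old = classify([Entry(composed, composed)], [decomposed])
# Proposed behavior: entry keyed by the on-disk name, repository name kept aside.
new = classify([Entry(decomposed, composed)], [decomposed])
```

In this model, old reports the file as missing and its on-disk twin as unversioned, while new finds the file and still knows the composed name to use when talking to the repository.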

I may well be overlooking something basic, but it seems to me that
this approach would be simpler, more robust & less restrictive than
the proposals involving normalization on some/all platforms and
potential repository changes.



** Daniel A. Steffen                   **
** <mailto:das@users.sourceforge.net>  **
Received on Sat Jul 21 01:41:56 2007

This is an archived mail posted to the Subversion Dev mailing list.
