
Re: Umlaut problem on Mac (composed vs. decomposed UTF-8)

From: Daniel A. Steffen <das_at_users.sourceforge.net>
Date: 2007-07-25 09:11:51 CEST

On 23/07/2007, at 20:20, Erik Huelsmann wrote:

> On 7/23/07, Matthias Wächter <matthias.waechter@tttech.com> wrote:
>> On 21.07.2007 01:42, Daniel A. Steffen wrote:
>
>> > A possible alternative approach to fix this problem could thus be to
>> > change the 'name' field of entries records to the working copy name,
>> > and to store the repository name in a new field (or possibly the
>> > existing 'url' field).
>> >
>> > This has the advantage that there is no need for subversion to
>> > know/implement how a given filesystem might transform filenames (i.e.
>> > no platform or ICU dependency etc), as that info can be obtained by
>> > creating a file with the name in question in an empty temp dir
>> > followed by listing of dir contents.
>
> At face value this looks like a solution, but there's one huge problem
> with it: performance. This solution will cause a huge amount of
> additional FS calls.

I don't believe that would actually be a problem. The repository name
<-> on-disk name mapping can easily be cached for names that have
been seen previously (indeed the entries file itself can serve as the
cache), so the determination of the mapping via the filesystem needs
to happen only the first time a filename is seen, e.g. at first
checkout. In that case, a new file with the name in question (plus an
extension) is already being created in .svn/tmp, so the only additional
cost in FS calls would be one directory listing and one move, at a time
when a lot of other FS activity is already taking place anyway.
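A minimal Python sketch of this probe-and-cache idea, for illustration
only: `probe_on_disk_name` and `NameMapCache` are hypothetical names, a
plain dict stands in for the entries file, and a fresh temporary
directory stands in for .svn/tmp. It is not how Subversion itself is
implemented, and it assumes the probe directory starts out empty.

```python
import os
import tempfile


def probe_on_disk_name(repos_name: str, probe_dir: str) -> str:
    """Create an empty file named repos_name in the (empty) probe_dir and
    return the name the filesystem reports when the directory is listed.
    On HFS+ an NFC name would be expected to come back decomposed; on a
    typical Linux filesystem it comes back unchanged."""
    with open(os.path.join(probe_dir, repos_name), "w"):
        pass
    listed = os.listdir(probe_dir)
    try:
        if len(listed) != 1:
            raise RuntimeError(f"probe directory was not empty: {listed!r}")
        return listed[0]
    finally:
        # Clean up using the names exactly as the filesystem reports them.
        for name in listed:
            os.remove(os.path.join(probe_dir, name))


class NameMapCache:
    """Hypothetical stand-in for the entries file acting as a cache of
    repository name -> on-disk name, so each name is probed only once."""

    def __init__(self) -> None:
        self._map = {}   # repository name -> on-disk name
        self.probes = 0  # number of filesystem probes actually performed

    def on_disk_name(self, repos_name: str) -> str:
        if repos_name not in self._map:
            with tempfile.TemporaryDirectory() as probe_dir:
                self._map[repos_name] = probe_on_disk_name(repos_name, probe_dir)
            self.probes += 1
        return self._map[repos_name]
```

Only the first lookup of a given name touches the filesystem; later
lookups (e.g. on every update) are answered from the cache.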

> There's also another problem: Network filesystems
> usually cache (for a short period of time) data written to the network
> drive. If that network changes the normal form (such as HFS+), reading
> the dir content won't help: we'll get the value from the cache.

I'd like to see some data showing that this would actually be a
problem; FWIW, I have verified that AFP on the Mac does not exhibit
this behavior, with an AFP server running on OS X as well as on Linux.

>> > I may well be overlooking something basic, but it seems to me that
>> > this approach would be simpler, more robust & less restrictive than
>> > the proposals involving normalization on some/all platforms and
>> > potential repository changes.
>
> So, what happens if a Linux and a Mac client both add the same file,
> with an A-umlaut in it and then they both commit? Currently, they'll
> both be added to the repository. The Linux client will end up with 2
> files with the same name, the Mac client will not be able to update
> anymore.

The client would have to error out at checkout when it detects that
two distinct repository names map to the same on-disk name, no
argument there.
This is not very different from the existing behavior w.r.t. names
differing only by case in a working copy on a case-insensitive FS,
and could be handled similarly by preventing commits of such
conflicting filenames via a hook script.
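A rough Python sketch of such a check, usable either at checkout or in a
hook script. The key function is an assumption: it approximates the HFS+
mapping with plain NFD plus case folding, whereas HFS+ actually uses its
own variant of decomposition, so a real check would want the exact
mapping of the target filesystem (or the probed on-disk names).
`hfs_like_key` and `find_collisions` are hypothetical names.

```python
import unicodedata
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def hfs_like_key(name: str) -> str:
    # Approximation only: case-fold, then decompose to NFD.
    return unicodedata.normalize("NFD", name.casefold())


def find_collisions(
    names: Iterable[str], key: Callable[[str], str] = hfs_like_key
) -> Dict[str, List[str]]:
    """Group names by the on-disk name they would map to and return only
    the groups containing more than one distinct repository name."""
    groups = defaultdict(set)
    for name in names:
        groups[key(name)].add(name)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}
```

A non-empty result means the directory cannot be checked out faithfully
on such a filesystem (or, in a hook, that the commit should be refused).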

It is clearly a policy decision whether the normalizing behavior of
one filesystem should be extended to all filenames handled by
Subversion; if yes, there is no way around doing normalization on all
platforms (and the ensuing dependencies); if no, my approach would
provide a low-impact way to allow the use of NFC repository names in
a working copy on HFS+.
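For reference, a short Python example of the two forms (the file name is
made up): the composed and decomposed spellings are different byte
sequences, which is why they can end up as two distinct repository
paths, yet they compare equal once both are normalized to NFC, which is
what normalizing on all platforms would amount to.

```python
import unicodedata

# "Ärger.txt" in composed form (NFC): U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
COMPOSED = "\u00c4rger.txt"
# Same name in decomposed form (NFD): "A" followed by U+0308 COMBINING DIAERESIS
DECOMPOSED = unicodedata.normalize("NFD", COMPOSED)


def same_after_nfc(a: str, b: str) -> bool:
    """True if two names are equal once both are normalized to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(COMPOSED.encode("utf-8"))     # b'\xc3\x84rger.txt'
    print(DECOMPOSED.encode("utf-8"))   # b'A\xcc\x88rger.txt'
    print(COMPOSED == DECOMPOSED)       # False: different code point sequences
    print(same_after_nfc(COMPOSED, DECOMPOSED))  # True
```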

Cheers,

Daniel

-- 
** Daniel A. Steffen                   **
** <mailto:das@users.sourceforge.net>  **
Received on Wed Jul 25 09:10:59 2007

This is an archived mail posted to the Subversion Dev mailing list.