
Re: Proposed resolution: Standardizing on UTF-8 isn't enough

From: Erik Huelsmann <ehuels_at_gmail.com>
Date: 2007-07-18 17:25:32 CEST

On 7/18/07, Mark Phippard <markphip@gmail.com> wrote:
> On 7/18/07, Erik Huelsmann <ehuels@gmail.com> wrote:
> > Proposed resolution
> > Considering the above, combined with the number of reports we have
> > received so far regarding creation of 2 files with the same name (on
> > Linux/Windows) - namely none - probably the best option is to use
> > option (1).
> > At least, that's what I was going to propose until I realized there
> > were mixed client version concerns. Now, I think the only option is to
> > go with (2).
>
> You lost me here. Great summary by the way. Anyway, I am not
> disagreeing with your conclusion, I just do not understand what made
> you decide 2 is the only option. You referenced mixed client
> concerns, but I did not understand that part either.

Yes: If all we do is convert NFD paths to NFC paths on the client
side, then the conversion will only happen on clients which have the
new behaviour. Old clients will still upload NFD paths. If we want to
handle this case, we can't depend on having NFC-normalized paths.
That means path comparison must be made normalization-agnostic
everywhere.
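
To illustrate the comparison problem, here is a minimal sketch in
Python. It is not Subversion code: the helper name paths_equal and the
example paths are hypothetical, and normalizing both sides to NFC is
just one way to make a comparison independent of the stored form.

```python
import unicodedata


def paths_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two paths regardless of NFC/NFD form."""
    # Normalize both sides to the same form before comparing, so the
    # result does not depend on which client produced the path.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The same visible name in two Unicode representations (assumed examples).
nfc_path = "trunk/caf\u00e9.txt"   # precomposed 'e with acute' (NFC)
nfd_path = "trunk/cafe\u0301.txt"  # 'e' + combining acute accent (NFD)

print(nfc_path == nfd_path)             # plain comparison of the raw strings
print(paths_equal(nfc_path, nfd_path))  # normalization-agnostic comparison
```

With a plain comparison the two strings differ, so an old client's NFD
path and a new client's NFC path would be treated as two different
files; the agnostic comparison treats them as the same name.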

> FWIW, I think a solution that is not perfect but gives people
> reasonable options could be acceptable here. #1 seems like it fits
> that description. Today, OSX users are pretty much just screwed.
> Their only option is to not use these characters.
>
> Anyway, perhaps another way to make this proposal is to create two of them:
>
> 1) Better than what we have today proposal. We'd want to describe
> the things it could not handle as best as we can so that we can agree
> this is still reasonable and worth doing.
>
> 2) The as close to perfect as we can get proposal. This one needs to
> include the effort/time involved and whatever hurt might be involved
> in terms of trying to repair existing data if needed, or problems it
> would still have.
>
> If we go the route of using ICU and trying to really resolve the
> problem, we should also talk about whether this has to be a required
> or optional dependency.
>
> > Unicode has 2 different representations, a 'defect' from which we
> > suffer when comparing pathnames. We need to decide what to do about
> > this issue in order to create a workable situation on the Mac and to
> > prevent people from committing the same file with the same name twice
> > to the repository.
>
> One other issue is that the Mac, which is all we are really trying to
> fix here, does not actually use either of these 2 representations. It
> uses a third bastardized version. I get the impression it is
> compatible with NFD for the most common use cases, but it is not
> completely the same. If we go with something like ICU, presumably we
> would want to know that it is aware of these Mac peculiarities and
> does not just simply implement the official standard.

Actually, nowhere in my proposal is there a translation from
'Subversion internal standard' to 'OS standard'. When I wrote the
proposal, I thought I'd propose 1 or 3. In those options, we don't
need to translate to the OS standard, because the only OS not
conforming to our standard would enforce its convention itself.
(Meaning that ICU doesn't need to know about the Mac convention...)

Does that help?

bye,

Erik.

Received on Wed Jul 18 17:24:42 2007

This is an archived mail posted to the Subversion Dev mailing list.