
Re: Encoding problems in subversion under Mac OS X (HFS+)

From: dLux <dlux_at_dlux.hu>
Date: 2005-12-05 23:22:04 CET

Hi Guys,

I really can't believe this can happen: why does Subversion use
Unicode filenames if it cannot handle something as common as the
default Mac OS X filesystem? I understand that OS X is a weird Unix in
many respects, but man, many people use it.

Could someone please tell me what the heck I can do about this problem:
is it easily solvable? Is there a patch I can apply? Should I just
forget about using accents in filenames? Or am I doing something wrong,
and it works fine for everyone else?


Balázs Szabó (dLux)
-- -- - - - -- -

On 2005.12.04., at 22:36, Balázs Szabó (dLux) wrote:

> Hi,
> Thank you for the explanation and the idea.
> But what can I do about it as a Subversion user? Does anyone have a
> patch or something similar for this problem?
> Thanks,
> Balázs Szabó (dLux)
> -- -- - - - -- -
> On 2005.12.03., at 18:01, Paul Koning wrote:
>>>>>>> "Balázs" == Balázs Szab <Bal> writes:
>> Balázs> Hi, I have problems using Subversion on OSX (10.4.3). I have
>> Balázs> tried a few different versions and the problem is always the
>> Balázs> same.
>> Balázs> I have checked out a repository, which I created on Linux,
>> Balázs> and it contained filenames like "statisztikák.sxc"
>> Balázs> I set up the environment before I did anything:
>> Balázs> export LC_CTYPE="hu_HU.UTF-8"
>> Balázs> The checkout worked fine, but right after the checkout, I had
>> Balázs> the following output for svn status (SVN 1.3RC4, but the
>> Balázs> results are similar with 1.2.3 as well):
>> Balázs> ? statisztikák.sxc
>> Balázs> ! statisztikák.sxc
>> Balázs> The problem may be that (as I read elsewhere) HFS+ stores
>> Balázs> filenames in decomposed form, and since "á" has different
>> Balázs> UTF-8 encodings in its composed and decomposed forms, SVN
>> Balázs> thinks the file on disk is different from the one it just
>> Balázs> checked out...
>> That sounds plausible. This problem can appear anytime you deal with
>> strings that aren't plain English text -- accents, for example.
>> There's a standard solution designed in the IETF called Stringprep
>> (it's an RFC, I don't have the number handy). Basically it involves
>> translating the string into a single "canonical" format, so that no
>> matter which choice of encoding you start with, after Stringprep there
>> is only one possible outcome. Think of it as the UTF analog of
>> case-insensitive comparison.
>> So in order to compare UTF strings, you first run the two through
>> Stringprep, and after that you compare them. That way, two strings
>> that are the same to the user will also be the same to the program,
>> and any irrelevant transformations done in storing the strings (like
>> the HFS+ one) will not confuse things.
>> paul
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
>> For additional commands, e-mail: users-help@subversion.tigris.org
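
The mismatch described in the quoted thread, and the "normalize first,
then compare" idea, can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: it uses plain Unicode
NFC normalization from the standard unicodedata module rather than
full Stringprep, the helper name same_filename is hypothetical, and it
says nothing about how Subversion itself compares names.

```python
import unicodedata

def same_filename(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# The same visible name in two Unicode spellings.
composed = "statisztik\u00e1k.sxc"     # "á" as a single code point (U+00E1)
decomposed = "statisztika\u0301k.sxc"  # "a" followed by combining acute (U+0301)

# A naive comparison treats them as different names,
# and their UTF-8 byte sequences differ as well.
print(composed == decomposed)
print(composed.encode("utf-8"))
print(decomposed.encode("utf-8"))

# After normalization the two spellings compare equal.
print(same_filename(composed, decomposed))
```

The naive comparison fails because the two strings contain different
code points, which matches the "?" and "!" pair reported by svn status
above. Normalizing both sides to one form before comparing removes the
difference.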

Received on Mon Dec 5 23:25:12 2005

This is an archived mail posted to the Subversion Users mailing list.