
Re: UTF-8 support for Unix with APR?

From: Erik Huelsmann <ehuels_at_gmail.com>
Date: Wed, 13 Feb 2008 21:56:10 +0100

On Feb 13, 2008 9:25 PM, Ryan Schmidt <subversion-2008a_at_ryandesign.com> wrote:
>
> On Feb 13, 2008, at 09:14, Erik Huelsmann wrote:
>
> >> SVN doesn't get it right either since it's ignorant of
> >> unicode
> >> normalization forms [1].
> >
> > Well, yes and no :-) Subversion depends (more so than, say, /bin/ls)
> > on a sanely configured environment (locale on disk == locale in
> > terminal, locale configured in the first place, etc). This is fine,
> > since Subversion needs to operate across different configurations and
> > even OSes (whereas /bin/ls does not).
>
> Hold up for a second... I'm havin' a little trouble...

Ok. Lemme explain.

> > locale on disk
>
> What is this? I know that on my Mac, ...

Ah, but the Mac (although that was snipped out of the quote) was
exempt from 'normal Unix behaviour', since it uses UTF-8 on disk *all
the time*. The rest of the Unix world uses the LC_ALL, LC_CTYPE or LANG
environment variables to determine what the current locale is. It then
applies that setting both to paths on the disk and to any output
sent to the terminal.
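
To make the lookup concrete, here's a rough sketch in Python. It is not
anything Subversion itself runs, and the function name effective_ctype
is made up for illustration. It assumes the usual POSIX precedence,
where LC_ALL wins over LC_CTYPE, which wins over LANG, and an unset or
empty variable is skipped:

```python
def effective_ctype(env):
    """Return the locale name that governs character encoding.

    Assumes POSIX precedence: LC_ALL overrides LC_CTYPE, which
    overrides LANG. Unset or empty variables are skipped.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    # Nothing configured at all: fall back to the default "C" locale.
    return "C"
```

Passing os.environ as env would report the setting of the current
process; a plain dict works just as well for experimenting.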

>
> > == locale in terminal,
>
> This is the locale I know about. "LANG=en_US.UTF-8" and so forth.

But, as stated above, in the rest of the Unix world, LANG= also
applies to paths read from disk. The Mac situation seems more sane,
but unfortunately isn't widespread...

> > locale configured in the first place
>
> What is this? What is "in the first place"?

The fact that you actually *have* a LANG= setting. If you don't,
you're restricted to using ASCII characters (because you're restricted
to the default "C" locale, which only supports ASCII characters).
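
As a small illustration of that restriction (a sketch only: it uses
Python's "ascii" codec as a stand-in for the "C" locale, and the helper
name fits_c_locale is hypothetical), a name with an accented character
simply has no representation:

```python
def fits_c_locale(name):
    """Return True if name can be written using ASCII characters only."""
    try:
        name.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


plain_ok = fits_c_locale("README.txt")
accented_ok = fits_c_locale("r\u00e9sum\u00e9.txt")  # "résumé.txt"
```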

> Is that when I first
> checked out a working copy? when I first made a repository? when I
> first installed Subversion? when I first installed the OS?

When you installed Windows (presumably), or when you last created
your Unix user. On the Mac, it's a system convention, so there's no
need to configure what locale to expect from the disk. For the rest of
the world, I have no idea when or how locale settings may be influenced.

> It sounded like Vincent was saying that if a working copy is created
> under one terminal locale setting, but then accessed with a different
> terminal locale setting, things don't work right.

And that's correct. With the right choice of pathnames (ones that
contain non-ASCII characters), the sequence of commands below breaks:
the second command returns a "Non-conforming UTF-8 sequence
encountered." error:

$ LANG=en_US.iso88591 svn checkout URL your-path
$ LANG=en_US.UTF-8 svn update your-path
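
Here's a minimal Python sketch of why the bytes written under the first
locale can't be read under the second. It only mimics the encoding
step, not what svn does internally; the sample name "café" and the
helper readable_as_utf8 are made up for illustration:

```python
def readable_as_utf8(raw):
    """Return True if the byte string is a valid UTF-8 sequence."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The name as it lands on disk under a latin1 locale: b'caf\xe9'.
on_disk = "caf\u00e9".encode("latin-1")

# Under a UTF-8 locale those same bytes are not a valid sequence.
update_can_read_it = readable_as_utf8(on_disk)
```

A lone 0xE9 byte announces a three-byte UTF-8 sequence that never
arrives, which is why the decode fails instead of quietly producing the
wrong character.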

Now, Subversion could remember that the path was checked out using the
latin1 setting, but essentially you're telling it you changed your
paths (and output) to UTF-8. Should it ignore that? Absolutely not!
You might be (*should* be) right, in which case Subversion would end up
with the wrong UTF-8 if it kept reading the paths as though they were
still the latin1 you checked out with...
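
To see what "the wrong UTF-8" looks like, here's a sketch of the
hypothetical case where the paths really are UTF-8 now but get read
with the remembered latin1 setting. Again, this is plain Python and
only illustrates the byte-level effect; the sample name is made up:

```python
# The path as it really is on disk after the switch to UTF-8.
utf8_on_disk = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9'

# Reading those bytes as latin1 never fails, since every byte is a
# valid latin1 character, but each UTF-8 byte becomes its own character.
misread = utf8_on_disk.decode("latin-1")

# Converting the misread name to UTF-8 for storage bakes the mistake in.
stored = misread.encode("utf-8")
```

The name now has five characters instead of four, and nothing
complained along the way, which is worse than the loud error above.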

HTH,

Erik.

Received on 2008-02-13 21:56:40 CET

This is an archived mail posted to the Subversion Users mailing list.
