
Re: "Strange" characters in file names

From: david x callaway <dxc_at_pobox.com>
Date: 2007-03-26 03:08:35 CEST

Hans Salvisberg wrote:
> Ryan Schmidt wrote:
>>>>> svn: Kurs Ern?\195?\164hrung.doc
>>>> Perhaps you have not set the LANG variable so ls and svn don't know how to properly display it. Try export LANG=de_DE.utf8 or whatever the correct value for your OS is. (The contents of the directory /usr/share/locale may tell you what the valid locales are on your system.)
>>> Yes, indeed, this helped. ls still doesn't display the file properly (I don't really care), but svn doesn't complain anymore.
>>>
>>> Can you explain (or point me to an explanation), what this does and why it's needed? I'd prefer not to set a locale (leave it at the default POSIX), because I don't want to introduce a bias towards German. This particular filename happens to be in German, but I'm sure someone will upload a file with a French name sooner or later.
>> svn: Can't convert string from native encoding to 'UTF-8':
>>
>> This does not introduce a bias towards German. It does cause error messages to be printed in German. Based on the name of the file, I assumed you would want that. If you prefer English error messages from Subversion, use en_US.utf8, or whatever it is on your OS.
>>
>> The important part is the .utf8 part, which explains to Subversion and other tools that you are using the UTF-8 character encoding. UTF-8 can handle all languages, so as long as your locale is a UTF-8 locale, you will be able to handle all filenames.
>
> How do I know I'm using the UTF-8 encoding? How do you know? Could svn know it, too?
>
> What exactly does it mean that I'm "using the UTF-8 encoding"? That the filename is UTF-8-encoded?

you can read about UTF-8 here: http://en.wikipedia.org/wiki/UTF-8

I don't know about svn, but the file system itself, on linux, doesn't
see UTF-8 as special. any byte except '/' and '\0' is legal in a file
name ("." and ".." always exist as directory entries already, so they
can't be used as filenames), and no multi-byte UTF-8 char contains a '/'
or a null byte, so all the system calls to do with files work just fine
with UTF-8 names. e.g. I have a file named "ddxЁЂЃЄЅІЇЈЉЊЋЌЎЏ" (mostly
cyrillic) and one named "größeren" (german), and the filesystem doesn't
have any problem with them. there are of course programming issues, e.g.
you can't just increment a char* that points to a UTF-8 string and have
it end up at the next char, because one char may occupy several bytes.
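
to make the byte-level point concrete, here is a minimal python sketch (not anything svn or the kernel actually runs). the helper name utf8_char_starts is made up for illustration. it finds where each char begins by skipping continuation bytes (those of the form 10xxxxxx), and it collects the bytes above 0x7F to show they can never collide with '/' (0x2F) or '\0'.

```python
def utf8_char_starts(data):
    """Return the byte offsets at which each UTF-8 character begins.

    Hypothetical helper for illustration: a byte of the form 0b10xxxxxx
    is a continuation byte, anything else starts a new character.
    """
    return [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]


name = "größeren".encode("utf-8")

# "ö" and "ß" take two bytes each, so the offsets are not consecutive.
starts = utf8_char_starts(name)

# Every byte belonging to a multi-byte character is >= 0x80, so none of
# them can be mistaken for '/' (0x2F) or the null byte (0x00).
multibyte = [b for b in name if b >= 0x80]

print(len(name), starts, [hex(b) for b in multibyte])
```

stepping by one byte at a time would land in the middle of "ö"; stepping from one entry of starts to the next lands on whole chars.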

you don't need to set the locale to anything special to get a german
filename either, e.g.
     echo > "größeren"
will create a file with that name, and on my system at least it will
show up that way, for example if you list the directory either on
the cmdline or with a file browser. my env shows "LANG=en_US.UTF-8",
i.e. my default for all the LC_* vars is US english, UTF-8, but the
german and cyrillic still display fine.
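
a rough python sketch of the same idea, assuming a linux filesystem that stores names as plain bytes, and assuming the \195 and \164 in the svn message are decimal byte values. the file is created from raw bytes with no locale involved. only the later step of turning those bytes into text depends on the encoding, and it depends on the encoding part (UTF-8 vs. plain ASCII), not on the language part.

```python
import os
import tempfile

# Bytes from the svn message, read as decimal escapes (an assumption).
raw_name = b"Kurs Ern" + bytes([195, 164]) + b"hrung.doc"

with tempfile.TemporaryDirectory() as tmp:
    # Use bytes paths so Python does no locale-based conversion itself.
    tmp_bytes = os.fsencode(tmp)
    with open(os.path.join(tmp_bytes, raw_name), "wb"):
        pass
    # With a bytes argument, listdir returns the names as raw bytes.
    stored = os.listdir(tmp_bytes)

# Interpreting the stored bytes as UTF-8 gives the intended name.
as_utf8 = stored[0].decode("utf-8")

# Interpreting them as plain ASCII (roughly what a POSIX/C locale implies)
# fails, much like the "Can't convert string" error from svn.
try:
    stored[0].decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(as_utf8, ascii_ok)
```

if the UTF-8 decode gives a sensible name, that is reasonable evidence the filename really is UTF-8, since random non-UTF-8 bytes rarely form valid sequences.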

filenames on modern windows are natively UTF-16, a variable-length
encoding built from two-byte code units (most chars take one unit, some
take two), although the apis may translate back and forth depending
upon how you compile.
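
a small python sketch of what variable length means for UTF-16 (this only illustrates the encoding, not the windows apis). the helper name utf16_units is made up for the example.

```python
def utf16_units(ch):
    """Number of 16-bit code units needed to encode one character in UTF-16.

    Hypothetical helper for illustration only.
    """
    return len(ch.encode("utf-16-le")) // 2


# U+00F6 lies in the Basic Multilingual Plane: one code unit.
bmp = utf16_units("ö")

# U+1F600 lies outside it and needs a surrogate pair: two code units.
astral = utf16_units("\U0001F600")

# All eight chars of this name are in the BMP, so 8 units = 16 bytes.
name_bytes = len("größeren".encode("utf-16-le"))

print(bmp, astral, name_bytes)
```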

dxc

>
> Instead of two question marks, ls now displays two different box drawing characters. So, even if svn doesn't complain anymore, how do I know that UTF-8 really is the correct encoding and I wouldn't risk putting something into the repository that might cause trouble?
>
>
>>> The directory with the offending file has svn:ignore set to "*". I guess I'll just ignore the entire directory instead, but I wonder why svn looks at the files at all.
>> Whether you want to ignore the files or not is your business.
>
> In this case I do want to ignore them, but I'm surprised that svn looks at them anyway and trips over the "strange" characters.
>
> Hans
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: users-help@subversion.tigris.org
>
>

Received on Mon Mar 26 03:09:05 2007

This is an archived mail posted to the Subversion Users mailing list.
