Re: Mac OS X: why LC_ALL needs to be specified (Was: problems adding files with umlauts)

From: Thomas Singer <subversion_at_smartcvs.com>
Date: 2006-07-07 11:00:32 CEST

Hi Ulrich,

You are making it too simple: you assume that the file name already _is_
plain UTF-8. My Java example works as expected:

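   import java.io.File;
   import java.io.IOException;

   public class FileTest {
     public static void main(String[] args) throws IOException {
       final File dir = new File("file-test");
       dir.mkdirs();
       // file name ending in the characters U+00FF and U+00FE
       final File file = new File(dir, "invalid\u00FF\u00FE");
       file.createNewFile();
       for (String fileName : dir.list()) {
         System.out.println(fileName);
       }
       file.delete();
     }
   }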

> The thing is that, as Wilfredo said and whose attribution you snipped,
> filenames are UTF-8 _by_ _convention_ and nothing enforces this.

As I understand it, file names are stored *in the repository* as UTF-8 (by
convention), and the Subversion client needs to convert them to that encoding
from the OS's native file name encoding. With Java this is no problem: it
does not simply treat characters as bytes, it lists the directory content
correctly (on Mac with decomposed umlauts, but that's another problem), and
hence it can convert the file name to UTF-8, or whatever encoding you want,
without LC_ALL being set. If Java can do that without LC_ALL, it should also
be technically possible from C(++).
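
To illustrate the difference between characters and raw bytes, here is a
minimal sketch. The class and method names (FileNameEncoding, toUtf8,
isValidUtf8) are made up for this example and are not part of Subversion
or of the code above. It assumes only the standard java.nio.charset API,
and it says nothing about how a particular file system stores names on
disk:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class FileNameEncoding {
  // Encodes a Java string (a sequence of characters) as UTF-8 bytes.
  static byte[] toUtf8(String name) {
    return name.getBytes(StandardCharsets.UTF_8);
  }

  // Returns true if the raw bytes are well-formed UTF-8.
  // Malformed input is reported as an error instead of being replaced.
  static boolean isValidUtf8(byte[] raw) {
    try {
      StandardCharsets.UTF_8.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(raw));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // The characters U+00FF and U+00FE take two bytes each in UTF-8.
    byte[] encoded = toUtf8("invalid\u00FF\u00FE");
    System.out.println(encoded.length);        // 7 + 2 + 2 bytes
    System.out.println(isValidUtf8(encoded));

    // The raw bytes 0xff and 0xfe are a different thing entirely.
    byte[] raw = { 'i', 'n', 'v', 'a', 'l', 'i', 'd', (byte) 0xff, (byte) 0xfe };
    System.out.println(isValidUtf8(raw));
  }
}
```

The string "invalid\u00FF\u00FE" encodes to well-formed UTF-8, whereas a
name whose last two bytes are literally 0xff and 0xfe does not decode as
UTF-8 at all, so the two examples in this thread are not testing the same
file name.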

--
Best regards,
Thomas Singer
_____________
SyntEvo GmbH
Schillerallee 2
83457 Bayerisch Gmain
Germany
www.syntevo.com

Ulrich Eckhardt wrote:
> On Friday 07 July 2006 09:02, Thomas Singer wrote:
>>> That said, it is possible to write file names containing bytes that can't
>>> decode as UTF-8.
>> I can't believe that. Could you please give a reproducible example?
> 
> C++ code:
> 
> #include <fstream>
> int main() {
>   // the bytes 0xff and 0xfe must not occur in UTF-8 text
>   // ('\xff' and '\xfe' avoid a narrowing int-to-char conversion)
>   char const filename[] = { 'i','n','v','a','l','i','d','\xff','\xfe','\0' };
>   std::ofstream out(filename);
>   out << "aha!\n";
> }
> 
> The thing is that, as Wilfredo said and whose attribution you snipped, 
> filenames are UTF-8 _by_ _convention_ and nothing enforces this.
> 
> Uli
Received on Fri Jul 7 11:02:28 2006

This is an archived mail posted to the Subversion Users mailing list.