Re: Thoughts about the transcoding of path names in SVN - it's in contradiction to CVS, SVK and Mercurial...

From: Ryan Schmidt <subversion-2008b_at_ryandesign.com>
Date: Wed, 25 Jun 2008 04:24:07 -0500

On Jun 25, 2008, at 03:32, Marko Kaening wrote:

> I just wanted to double-check one thing:
>
> It seems that SVN actually transcodes path names when the server and
> client use different code pages. Is this correct? I think it must be,
> because when I check out on my UTF-8 Linux system I get the German
> umlauts in my path names just as I committed them from my cp1252
> Windows XP clients.
>
> I LOVE this feature!!!
>
> Why? Because I abuse SVN for hosting all my media files,
> not only my source code. And because I am German, I like
> to have descriptive path names, which in my case also include
> umlauts... ;)
>
>
> I wonder, though, why SVN's developers decided to go for this
> transcoding! I wouldn't want to miss this feature, but on Mercurial's
> list it was stated clearly that such transcoding is highly error-prone
> and risky if you work in software development, since many tools like
> make rely on byte-by-byte comparisons of file names, which get tricky
> with transcoding.
>
> This is the reason why Mercurial (unfortunately for me) does not do
> any transcoding, which does not allow me to see correct path names on
> my two different systems.
>
> I mean, CVS doesn't do such a thing either! I am used to that and
> don't care; I just accepted it. But I thought that more modern systems
> would be able to get that right. And SVN does get it right! That's why
> I was surprised to see that Mercurial would be reluctant to go for it.
>
> A little sad about this, I eventually tested SVK, and look: even SVK,
> although based on SVN, DOES NOT GET IT RIGHT. Locally SVK stores an SVN
> repo in UTF-8 encoding, but on checkout it does not transcode it. At
> least it has not done so for me up to now. I still hope for a response
> from their mailing list...
>
> So, SVN seems to be the only system using transcoding...
>
> I wonder what the list's thoughts about this issue are...
>
> Comments welcome, especially on how the developers motivate their
> decision to go for this approach!
> How is SVN able to determine the client's code page in a consistent
> manner and avoid messing up the repo, given so many possible code
> pages on the client side?
>
> Regards,
> Marko
>
>
> P.S.: I never tested Bazaar, Arch or Monotone, so I have no clue
> about their behaviour.

Subversion stores all filenames in UTF-8 in the repository. It
converts between the repository's UTF-8 and the client's character
encoding by relying on the user to properly set the LANG environment
variable to a locale that makes sense for their system.
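
To illustrate the round trip described above, here is a minimal Python
sketch. It is not Subversion's actual code; the function names
to_repository and to_client and the example file name are made up for
illustration, and the client encoding is passed in explicitly rather
than being derived from the locale as a real client would do.

```python
# Sketch only: hypothetical helpers, not Subversion's implementation.
REPO_ENCODING = "utf-8"  # the repository always stores names as UTF-8


def to_repository(name_bytes: bytes, client_encoding: str) -> bytes:
    """Convert a file name from the client's encoding to UTF-8."""
    return name_bytes.decode(client_encoding).encode(REPO_ENCODING)


def to_client(repo_bytes: bytes, client_encoding: str) -> bytes:
    """Convert a UTF-8 file name from the repository to the client's encoding."""
    return repo_bytes.decode(REPO_ENCODING).encode(client_encoding)


# A name committed from a cp1252 Windows client...
windows_name = "Übersicht.txt".encode("cp1252")
stored = to_repository(windows_name, "cp1252")

# ...and checked out on a UTF-8 Linux client.
linux_name = to_client(stored, "utf-8")
```

The bytes on disk differ between the two clients, but both decode to the
same characters, which is why the umlauts look right on either system.
If the client's locale is set incorrectly, the decode step would
misinterpret the bytes, so the conversion is only as good as that
setting.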

I don't see how any software could be thought to function correctly
today if it did not take into consideration the character encoding of
its data.

In fact, Subversion's UTF-8 support is overly simplistic, which
causes nearly insurmountable issues for Mac users who must deal with
files with non-ASCII names which were originally committed from
Windows or Linux. Subversion must go further than just identifying
the character encoding; it must also normalize composed/decomposed
UTF-8 characters. See this issue:

http://subversion.tigris.org/issues/show_bug.cgi?id=2464
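
For anyone unfamiliar with the composed/decomposed problem, the
following Python sketch shows it using the standard unicodedata module.
The example name and the same_name helper are made up for illustration;
this is one possible way to compare names, not a description of how
Subversion does or should implement a fix.

```python
import unicodedata

# The same visible name in two Unicode forms.
composed = unicodedata.normalize("NFC", "f\u00fcr")    # u-umlaut as one code point
decomposed = unicodedata.normalize("NFD", "f\u00fcr")  # 'u' + combining diaeresis

# Both are valid UTF-8, yet the byte sequences differ, so a
# byte-by-byte comparison treats them as two different file names.
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")


def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Windows and Linux clients typically commit the composed form, while the
Mac file system hands back a decomposed form, so without a normalization
step such as the one in same_name the client cannot match the name on
disk to the name in the repository.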

Received on 2008-06-25 11:24:53 CEST

This is an archived mail posted to the Subversion Users mailing list.
