
Thoughts about the transcoding of path names in SVN - it's in contradiction to CVS, SVK and Mercurial...

From: Marko Kaening <mk362_at_mch.osram.de>
Date: Wed, 25 Jun 2008 10:32:54 +0200 (CEST)

Hi list,

I just wanted to double-check one thing:

It seems that SVN actually transcodes path names when the server and
client use different code pages... Is this correct? I think it must be,
because when I check out on my UTF-8 Linux machine, I get German umlauts
in my path names exactly the way I committed them from my cp1252 Windows
XP clients.

        I LOVE this feature!!!

        Why? Because I abuse SVN for hosting all my media files,
        not only my source code. And because I am German, I like
        to have descriptive path names, which in my case also
        include umlauts... ;)
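
To make clear what I mean by transcoding, here is a minimal Python sketch
of the round trip as I imagine it. This is only an illustration, not SVN's
actual code: the helper names to_repository and to_client and the sample
path are made up, and I am assuming the repository side keeps path names
in UTF-8.

```python
# Hypothetical illustration of path-name transcoding; not Subversion's code.
# Assumption: the repository keeps path names in UTF-8.

def to_repository(native_bytes: bytes, client_encoding: str) -> bytes:
    """Transcode a path from the client's code page to UTF-8."""
    return native_bytes.decode(client_encoding).encode("utf-8")


def to_client(repo_bytes: bytes, client_encoding: str) -> bytes:
    """Transcode a UTF-8 path from the repository to the client's code page."""
    return repo_bytes.decode("utf-8").encode(client_encoding)


name = "Gemälde/Übersicht.txt"  # made-up sample path with umlauts

# Bytes of the path as a cp1252 Windows client would hand them over.
committed = name.encode("cp1252")

# What the repository would keep under the UTF-8 assumption.
stored = to_repository(committed, "cp1252")

# Bytes of the path as a UTF-8 Linux client would receive them.
checked_out = to_client(stored, "utf-8")

print(committed)
print(checked_out)
print(checked_out.decode("utf-8") == name)
```

The two clients end up with different bytes on disk, but both see the
same characters, which is exactly the behaviour I observe.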

I wonder though why SVN's developers decided to go for this transcoding! I
wouldn't want to miss this feature, but on Mercurial's list it was stated
clearly that such transcoding is highly error-prone and risky if you work
in software development, since many tools like make rely on byte-by-byte
comparisons of file names, which get tricky with transcoding.

This is the reason why Mercurial (unfortunately for me) does not do any
transcoding, which means I cannot see correct path names on my two
different systems.
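
As far as I understand the byte-by-byte argument, it boils down to
something like the following small Python sketch. The file name is made
up, and the two encodings are simply the ones of my own systems:

```python
# Illustration only: one file name, two code pages, different bytes.
name = "Übung.c"  # made-up sample file name

on_windows = name.encode("cp1252")  # bytes on a cp1252 client
on_linux = name.encode("utf-8")     # bytes on a UTF-8 client

# A tool comparing raw bytes considers these two different names ...
same_bytes = on_windows == on_linux

# ... although they decode to the same characters.
same_text = on_windows.decode("cp1252") == on_linux.decode("utf-8")

print(on_windows, on_linux)
print(same_bytes, same_text)
```

So a name written into a Makefile on one system would not match the bytes
of the transcoded file name on the other one.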

I mean, CVS also doesn't do such a thing! I am used to that and don't
care - just accepted it. But I thought that more modern systems would be
able to get that right. And SVN does get it right! That's why I was
surprised to see that Mercurial would be reluctant to go for it.

A little bit sad about this, I eventually tested SVK, and see: even SVK,
although based on SVN, DOES NOT GET IT RIGHT. Locally, SVK stores an SVN
repo in UTF-8 encoding, but on checkout it does not transcode the path
names. At least it has not done so for me up to now. I still hope for a
response from their mailing list...

So, SVN seems to be the only system using transcoding...

I wonder what the list's thoughts about this issue are...

Comments welcome, especially on how the developers motivate their decision
to go for this approach!
How is SVN able to determine the client's code page in a consistent manner
and avoid messing up the repo, given so many possible code pages on the
client side?


P.S.: I never tested Bazaar, Arch or Monotone - so I have no clue about
their behaviour.

Received on 2008-06-25 10:33:21 CEST

This is an archived mail posted to the Subversion Users mailing list.
