
Re: Can't recode string

From: Patrick Smears <patrick.smears_at_ensoft.co.uk>
Date: 2004-10-08 13:21:25 CEST

On Fri, 8 Oct 2004, John Williams wrote:

> I found issues #1847 and #1946 on this subject.
> Both of these have been marked invalid, the second merely
> because it was not discussed on this list.
>
> The problem is that svn is not able to handle characters in
> the filename which do not match the current LANG character set,
> or are not valid UTF-8 characters, if LANG is not set.
>
> Issue #1847 asserts that this is not a subversion issue.
> I wish to dispute that.
>
> 1) Subversion claims to be able to handle binary data.
> But this problem shows that it cannot handle binary data
> in the filename, even when the host OS allows it.

The trouble is that filenames are not binary data - they are character data.

The difference is that character data has a different binary
representation depending on what character set is in use. So the character
'lowercase e acute' is '0xe9' in ISO-8859-1, '0x82' in DOS codepage 775,
'0xc3 0xa9' in UTF-8, and doesn't exist at all in ISO-8859-5.
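
To make those byte values concrete, here is a minimal sketch in Python. Python and its codec names ('latin-1', 'cp775', 'utf-8', 'iso8859_5') are assumptions chosen for illustration only; this is not anything subversion itself runs.

```python
# Encode U+00E9 (lowercase e acute) in each character set mentioned above.
E_ACUTE = "\u00e9"

def encode_or_none(text, charset):
    """Return the bytes for text in charset, or None if it cannot be represented."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return None

# Map each charset name to the encoded bytes (or None when there is no mapping).
encodings = {
    cs: encode_or_none(E_ACUTE, cs)
    for cs in ("latin-1", "cp775", "utf-8", "iso8859_5")
}

for cs, raw in encodings.items():
    shown = " ".join(f"0x{b:02x}" for b in raw) if raw is not None else "not representable"
    print(f"{cs}: {shown}")
```

The same character yields one byte, a different byte, two bytes, or nothing at all, depending purely on which character set is assumed.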

Internally, subversion stores characters in UTF-8, which allows for all
characters defined in the Unicode standards to be represented (which for
all practical purposes means 'every character possible'). However, since
people's terminals etc do not always accept UTF-8, subversion converts the
characters to the appropriate format before output.

This is great, because it means that a user with an ISO-8859-1 terminal,
one with a UTF-8 terminal and one with a CP775 terminal can all be
accessing the same repository, and view the filename containing a
'lowercase e acute' correctly. However, in order to be able to do this,
subversion does need to be told which character set to use. That seems
fair enough, especially as it takes this information from the standard
locale environment variables (LANG and LC_*).
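
The conversion step can be sketched roughly as follows, again in Python. The function name to_locale_bytes and the file name are hypothetical, and the character set is passed in explicitly rather than read from the environment; subversion's real implementation is not shown here and may differ.

```python
def to_locale_bytes(stored_utf8: bytes, charset: str) -> bytes:
    """Recode a UTF-8 filename into charset (hypothetical helper).

    Raises ValueError if some character has no representation in charset,
    rather than silently substituting something misleading.
    """
    name = stored_utf8.decode("utf-8")
    try:
        return name.encode(charset)
    except UnicodeEncodeError as exc:
        raise ValueError(f"Can't recode {name!r} to {charset}") from exc

# A filename as it would be stored internally (UTF-8).
stored = "caf\u00e9.txt".encode("utf-8")

# Three users with different terminals each get bytes their terminal understands.
for charset in ("latin-1", "cp775", "utf-8"):
    print(charset, to_locale_bytes(stored, charset))
```

Asking for a character set that lacks the character (for example 'iso8859_5') raises the error instead of producing output.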

> 2) If a project has filenames in more than one character set,
> I don't think it is the revision control system's place
> to disallow that. A revision control system should be able
> to accept any file, and give it back in exactly the same way.

Agreed - and (once it knows what character set you're using), subversion
allows this. The restriction it does place, however, is that in order to
view a filename, you must be using a character set that can represent
every character in that filename.

This is perhaps overly strict (in that it can make it hard to find out
what the problem is when there is a file with a troublesome name), but
it is consistent with subversion's philosophy of 'don't change the data':
it will not display the filename in a misleading way.

However, you can force the commands to always succeed by setting up your
environment variables to use UTF-8. (Characters other than the ASCII ones
won't display correctly unless your terminal is also set up for UTF-8, but
at least this allows for debugging to take place.)

> 3) This error causes "svn status" to abort, not reporting the
> status of the remaining files.

I agree that this is less than ideal, but it's not immediately obvious to
me what the 'preferred' behaviour should be - how should it display
characters that are not in your current character set? Perhaps some form
of quoting - but that could easily confuse scripts that are parsing the
output...
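
To show what I mean by quoting, and why it could confuse scripts, here is a rough Python sketch. The helper quote_for_display is hypothetical and the backslash-escape style is just one possible choice; it is not something subversion does.

```python
def quote_for_display(name: str, charset: str) -> str:
    """Return name with characters missing from charset shown as backslash escapes."""
    # 'backslashreplace' is a standard Python codec error handler.
    raw = name.encode(charset, errors="backslashreplace")
    return raw.decode(charset)

accented = "caf\u00e9.txt"   # contains a real e acute
literal = "caf\\xe9.txt"     # contains a literal backslash, 'x', 'e', '9'

print(quote_for_display(accented, "latin-1"))  # charset has the character
print(quote_for_display(accented, "ascii"))    # charset lacks it, so it is escaped
print(quote_for_display(literal, "ascii"))     # already ASCII, passes through unchanged
```

The last two calls produce identical output for two different filenames, which is exactly the ambiguity a script parsing the output would trip over unless the backslash itself were also escaped.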

At the very least, the error message should be improved to be more
helpful.

> 4) This error causes "svn add" to abort, leaving the directory
> in some sort of half-baked state, which svn revert will not revert.

That's bad... perhaps you could post a transcript of how to reproduce this
problem, and the errors it causes, so that people on the list can
investigate? Include the settings of your environment variables (LANG and
LC_*), as these will affect the way subversion operates...

Patrick

-- 
The easy way to type accents in Windows: http://www.frkeys.com/
Received on Fri Oct 8 13:22:11 2004

This is an archived mail posted to the Subversion Users mailing list.
