Why "invalid UTF-8 sequence" even on ignored files?

From: Markus Fischer <markus_at_fischer.name>
Date: Tue, 24 Feb 2009 12:26:07 +0100

Hello,

I was puzzled today when I saw this:

$ svn status
svn: Valid UTF-8 data
(hex: 31 34 32 38 5f 73 74 75 70 70)
followed by invalid UTF-8 sequence
(hex: e4 63 6b 5f)

Using

$ svn --version
svn, version 1.5.1 (r32289)
compiled Dec 31 2008, 06:38:09

I converted the hex sequences, and they turned out to be part of a file name:
"1428_stuppäck_".
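To see why svn splits the name where it does, the two hex runs from the error message can be decoded directly. 0xE4 would start a three-byte sequence in UTF-8 and must be followed by continuation bytes (0x80 to 0xBF), but the next byte is 0x63 ("c"), so the sequence is invalid. In Latin-1 (ISO-8859-1), 0xE4 is simply "ä". The following Python sketch is only an illustration; the helper name `first_invalid_utf8_offset` is hypothetical and not part of Subversion.

```python
# Byte sequences copied from the svn error message above.
valid_part = bytes.fromhex("31 34 32 38 5f 73 74 75 70 70")
invalid_part = bytes.fromhex("e4 63 6b 5f")
name = valid_part + invalid_part


def first_invalid_utf8_offset(data: bytes):
    """Return the index of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


print(valid_part.decode("utf-8"))       # 1428_stupp
print(first_invalid_utf8_offset(name))  # 10 (the 0xE4 byte)
print(name.decode("latin-1"))           # 1428_stuppäck_
```

The Latin-1 decoding is an assumption about how the file name was originally written; any single-byte encoding that maps 0xE4 to "ä" would give the same result here.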

I searched for this file name and, to my surprise, found that it was
actually on the ignore list (due to svn:ignore set on its directory),
yet it nevertheless caused this message:

markus_at_dev01:/data/legacy/depression.at$ svn status
apachewrite/mediencache/1428_stuppäck_neu.gif
I apachewrite/mediencache/1428_stuppäck_neu.gif

Is this deliberate? It is somewhat inconvenient, because it seems that
ignoring files doesn't work properly when file names contain
characters outside the current locale.

Then again, I had already been puzzled that my find command came up empty:

$ find -iname 1428\*
$

I had to unset the locale before it worked:

$ unset LC_CTYPE
$ find -iname 1428\*
./apachewrite/mediencache/1428_stuppäck_neu.gif
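One way to list such names without depending on the locale at all is to walk the tree as raw bytes and report every name that does not decode as UTF-8. The following Python sketch is an illustration, not something from the original thread; `non_utf8_names` is a hypothetical helper, and it assumes a POSIX system where file names are arbitrary byte strings.

```python
import os
import sys


def non_utf8_names(root: bytes):
    """Yield byte paths under root whose last component is not valid UTF-8."""
    # Passing a bytes root makes os.walk return raw bytes names, so nothing
    # is decoded (or silently skipped) according to the current locale.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    start = os.fsencode(sys.argv[1] if len(sys.argv) > 1 else ".")
    for path in non_utf8_names(start):
        # Print the bytes repr so undecodable bytes show up as \xNN escapes.
        print(repr(path))
```

Run from the working copy root, a file like the one above would be reported with its raw 0xE4 byte visible as `\xe4`, regardless of what LC_CTYPE is set to.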

So, all in all, this is probably an expected edge case...?

Thanks for any tips,
- Markus

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1220289

Received on 2009-02-24 12:31:48 CET
