
Re: bug report: property editor does not correctly handle UTF-8

From: Garret Wilson <garret_at_globalmentor.com>
Date: Mon, 26 May 2008 13:38:50 -0700

Stefan Küng wrote:
> Garret Wilson wrote:
>> TortoiseSVN 1.4.8 build 12137 on Windows XP SP3 does not correctly
>> handle UTF-8 when editing properties and seems to assume that all
>> property values are ASCII.
>>
>> This is broken.
>> This is very bad.
>> This is an important, fundamental data encoding issue.
>
> It's neither broken nor bad.
> Only the Subversion properties must be encoded in utf8 (the svn:
> properties, and our own ones bugtraq:, tsvn:, ...).
> All other properties are actually *binary* and therefore must not be
> encoded but treated as entered.

It's not that simple, and the story extends beyond TortoiseSVN.

As an example, let me take one property named
"http·3A·2F·2Fpurl.org·2Fdc·2Felements·2F1.1·2Ftitle". (For information
on why the property is so named, see
http://www.garretwilson.com/blog/2008/04/08/subversionpropertynamespaces.xhtml
). This property is used to store the Dublin Core property value for "title".

The value of the property was specified using the WebDAV front-end to
Subversion running on mod_dav on Apache. The property value was
specified, not as binary, but as text in the WebDAV XML request
document. While the individual XML document may be encoded using some
charset (e.g. UTF-8 or UTF-16), the text node values in the XML document
object model have no knowledge of encoding. So the value I provided to
Subversion via WebDAV was 100% text. If Subversion exposes the property
value as binary data holding the UTF-8 encoding of the text I originally
provided, that isn't really correct: a binary UTF-16 encoding would be
just as valid a representation of those Unicode values as UTF-8 is. I
didn't provide binary values---I provided a sequence of Unicode code
points. Perhaps the WebDAV->Subversion conversion is partially at fault
here.
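
To make the text-versus-bytes distinction concrete, here is a minimal
Python sketch. The sample string is made up for illustration and is not
the actual property value; the point is only that one sequence of code
points has several equally valid byte encodings, and that reading UTF-8
bytes one byte per character garbles anything outside ASCII.

```python
# Hypothetical sample value; any non-ASCII text shows the same effect.
title = "Küng's résumé"

# Two different byte-level representations of the same code points.
utf8_bytes = title.encode("utf-8")
utf16_bytes = title.encode("utf-16-le")

# The byte sequences differ...
assert utf8_bytes != utf16_bytes
# ...but both decode back to the identical text.
assert utf8_bytes.decode("utf-8") == title
assert utf16_bytes.decode("utf-16-le") == title

# Treating the UTF-8 bytes as one byte per character (Latin-1 here,
# standing in for an ASCII-only assumption) corrupts the non-ASCII text.
misread = utf8_bytes.decode("latin-1")
assert misread != title
```

Neither byte sequence is "the" value; each is just one serialization of
the text.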

On the other side, though, if TortoiseSVN is interpreting the value as
binary, why does it provide a text input for editing? If these are
really binary values, TortoiseSVN should instead provide a hex editor, no?

>
> TSVN 1.5 does however check whether you enter text or some binary data
> and encodes it in UTF8 if it recognizes text.

OK, but what about the other way around---does it recognize data stored
in UTF-8 (placed there, for example, by WebDAV commands) as text instead
of binary data?
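
One way such a check could work in the reading direction, sketched in
Python. This is an assumption for illustration only, not TortoiseSVN's
actual logic, and the function name decode_if_text is hypothetical: try
to decode the stored bytes as UTF-8 and fall back to treating them as
binary when that fails or when unexpected control characters show up.

```python
from typing import Optional


def decode_if_text(raw: bytes) -> Optional[str]:
    """Return the decoded string if raw looks like UTF-8 text, else None."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so treat the value as binary.
        return None
    # Heuristic (an assumption): control characters other than tab,
    # newline and carriage return suggest binary data.
    if any(ord(ch) < 0x20 and ch not in "\t\n\r" for ch in text):
        return None
    return text


# A value stored as UTF-8 text is recognized as text.
assert decode_if_text("Küng".encode("utf-8")) == "Küng"
# Bytes that are not valid UTF-8 are reported as binary.
assert decode_if_text(b"\xff\xfe\x00") is None
```

A heuristic like this can misclassify short binary values that happen to
be valid UTF-8, which is part of why the text/binary question matters.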

Garret

Received on 2008-05-26 22:40:34 CEST

This is an archived mail posted to the TortoiseSVN Users mailing list.