On 7/26/05 2:22 AM, "Julian Foad" <email@example.com> sneezed:
> I've finally been reviewing this patch more thoroughly, working through the
> source code and thinking about how it operates and what results it achieves.
> Unfortunately there's a problem with the concept of putting property values
> into XML: text in the XML output must be UTF-8. If a property is one that we
> recognise ("svn:*") that's fine, we just output it without conversion. If it
> is one we don't recognise (e.g. "svk:merge") then we don't know how its value
> is already encoded, so we don't know how to convert it to UTF-8, so we need to
> do something that guarantees to produce valid XML and be decodable.
> We probably wouldn't want to base-64-encode all properties except "svn:*"
> because many of them would in fact be text compatible with UTF-8. It isn't
> possible to recognise automatically whether a value is already UTF-8, but we
> could recognise whether it /looks like/ UTF-8 and leave it alone if it does.
> That might be a workable compromise.
> Also note that even some UTF-8 character values are not valid in XML - for
> example, many control characters. Therefore we need to check even the values
> that are valid UTF-8, and possibly base-64-encode them.
Is using XML 1.1 a non-option? I think it allows control characters.
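To make the check concrete, here is a rough Python sketch (not the patch's actual C code) of the "looks like UTF-8 and is legal in XML 1.0, otherwise base64" rule described above. The function names are made up for illustration. As far as I know, XML 1.1 permits most C0 control characters only as character references and still forbids NUL, so a fallback would be needed either way.

```python
import base64
from typing import Optional, Tuple


def is_xml10_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 'Char' production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def encode_prop_value(value: bytes) -> Tuple[str, Optional[str]]:
    """Return (text, encoding) for a property value (hypothetical helper).

    encoding is None when the value can be emitted as plain text, and
    "base64" when it is not valid UTF-8 or holds characters XML 1.0 forbids.
    The plain text still needs the usual escaping of '&' and '<' on output.
    """
    try:
        text = value.decode("utf-8")
    except UnicodeDecodeError:
        text = None
    if text is not None and all(is_xml10_char(ord(ch)) for ch in text):
        return text, None
    return base64.b64encode(value).decode("ascii"), "base64"
```

Note that this only tells us the bytes *look like* UTF-8; it cannot prove that the author meant them as text.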
> To encode the value, we would probably choose base-64, but consider the case
> of a textual field that is mostly ASCII but has some non-UTF-8 bytes too, or
> is entirely UTF-8 but contains some characters that are not valid in XML. To
> allow such property values to be readable we might prefer to use an encoding
> which preserves most of the text in a readable form, and just escapes the
> disallowed characters or bytes.
Didn't the [short-lived] discussion on types in properties come to the
conclusion that just because data -can- be represented in one way does not
mean any conclusions should be drawn about that data? Just because a
property is mostly ASCII doesn't mean it is a string, or a particularly
useful one when rendered as such (the first few bytes of many file formats
are ASCII, but I could pick only one or two that would actually be
meaningful to me).
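For comparison, here is roughly what a "readable" escaping could look like. This is purely an illustrative sketch in Python: the percent-style scheme and the name escape_bytes are my assumptions, nobody in this thread has proposed this exact format.

```python
def escape_bytes(value: bytes) -> str:
    """Keep printable ASCII as-is and escape every other byte as %XX.

    '%' itself is escaped too, so the result can be decoded unambiguously.
    """
    out = []
    for b in value:
        if 0x20 <= b <= 0x7E and b != 0x25:  # printable ASCII except '%'
            out.append(chr(b))
        else:
            out.append("%%%02X" % b)
    return "".join(out)
```

The output is always plain ASCII, so it is safe to put in XML after normal escaping of '&' and '<', but a consumer then has to know about one more ad hoc encoding.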
The code is currently using an 'xml:encoding="base64"' attribute (again XML
1.1?), which I think is fairly well known. Is it worth using something less
common just because it might be "more readable"?
Why is base64 so bad? Is it the speed, or the size?
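On size, the cost is easy to estimate: padded base64 emits 4 output characters for every 3 input bytes (rounded up), so roughly a third larger. A quick Python check of that arithmetic, with base64_len being a name I made up for the example:

```python
import base64


def base64_len(n: int) -> int:
    """Length in characters of the padded base64 encoding of n bytes."""
    return 4 * ((n + 2) // 3)


# Compare the formula against the real encoder for a few sizes.
for n in (0, 1, 2, 3, 100, 1024):
    assert len(base64.b64encode(bytes(n))) == base64_len(n)
```

This ignores any line breaks an encoder might insert into long values.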
Received on Tue Jul 26 04:41:31 2005