I've finally been reviewing this patch more thoroughly, working through the
source code and thinking about how it works and what output it produces.
Unfortunately, there's a problem with the idea of putting property values
into XML: text in the XML output must be UTF-8. If a property is one that we
recognise ("svn:*"), that's fine: we can output it without conversion. If it
is one we don't recognise (e.g. "svk:merge"), we don't know how its value is
encoded, so we don't know how to convert it to UTF-8. For such values we need
to do something that is guaranteed to produce valid XML and that a reader can
decode back to the original bytes.
We probably wouldn't want to base-64-encode every property except "svn:*",
because many of them will in fact be text that is compatible with UTF-8. It
isn't possible to tell automatically whether a value really is UTF-8, but we
could tell whether it /looks like/ UTF-8 (i.e. is a valid UTF-8 byte sequence)
and leave it alone if it does. That might be a workable compromise.
Also note that some characters are not allowed in XML even when they are
correctly encoded as UTF-8: for example, most control characters (everything
below U+0020 except tab, line feed and carriage return). Therefore we need to
check even the values that are valid UTF-8, and possibly base-64-encode them.
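For concreteness, here is a small Python sketch of that character check. It
follows the "Char" production of XML 1.0 (tab, LF, CR, and the ranges U+0020
to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF); the function names are just
illustrative and not anything in the patch.

```python
def is_xml_char(c: str) -> bool:
    """True if c matches the XML 1.0 'Char' production."""
    cp = ord(c)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def invalid_xml_chars(text: str) -> list:
    """Return (offset, code point) for each character XML 1.0 forbids."""
    return [(i, ord(c)) for i, c in enumerate(text) if not is_xml_char(c)]
```

So b"bell\x07".decode("utf-8") decodes without error, yet
invalid_xml_chars() on the result reports [(4, 7)]: valid UTF-8, but not
valid XML.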
Does this sound like the most appropriate algorithm?
   if ((property value is a valid UTF-8 byte sequence)
       and
       (property value consists of valid XML characters))
   then
     Just use the plain value (with XML escaping of course).
   else
     Encode the value (and then apply XML escaping).
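Here is a rough Python sketch of that algorithm, assuming the caller has the
raw property value as bytes. The name encode_prop_value and the idea of
returning an encoding label (which the XML could carry in some attribute) are
hypothetical, not part of the patch.

```python
import base64
from typing import Tuple
from xml.sax.saxutils import escape


def is_xml_char(c: str) -> bool:
    """True if c matches the XML 1.0 'Char' production."""
    cp = ord(c)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def encode_prop_value(value: bytes) -> Tuple[str, str]:
    """Return (encoding, text) for a raw property value.

    encoding is "" when the value can go into XML as-is, or "base64"
    otherwise; how a reader is told which (e.g. an attribute) is left open.
    """
    try:
        text = value.decode("utf-8")  # strict: fails on any invalid sequence
    except UnicodeDecodeError:
        text = None

    if text is not None and all(is_xml_char(c) for c in text):
        # Plain value: escape &, < and >, plus CR, which an XML parser
        # would otherwise normalise to LF when reading the document back.
        return "", escape(text, {"\r": "&#13;"})

    # Base-64 output is plain ASCII with no XML-special characters,
    # so the "XML escaping" step is a no-op for it.
    return "base64", base64.b64encode(value).decode("ascii")
```

For example, encode_prop_value(b"a < b") gives ("", "a &lt; b"), while
encode_prop_value(b"\xff\xfe") gives ("base64", "//4=").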
To encode the value, we would probably choose base-64, but consider the case of
a textual field that is mostly ASCII but has a few non-UTF-8 bytes, or is
entirely UTF-8 but contains some characters that are not valid in XML. To keep
such property values readable, we might prefer an encoding that leaves most of
the text in readable form and escapes only the disallowed characters or bytes.
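As a rough illustration only, here is a Python sketch of one such readable
encoding. The backslash/hex escape syntax and the function names are
hypothetical; the point is just that ordinary text survives unchanged while a
decoder can still recover the exact original bytes.

```python
def is_xml_char(c: str) -> bool:
    """True if c matches the XML 1.0 'Char' production."""
    cp = ord(c)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


# Hypothetical scheme: valid, XML-safe UTF-8 characters pass through,
# a literal backslash becomes "\\", and every other byte becomes "\xHH".
def escape_readable(value: bytes) -> str:
    out = []
    # surrogateescape maps each byte that is not part of a valid UTF-8
    # sequence to a lone surrogate U+DC80..U+DCFF, so nothing is lost.
    for c in value.decode("utf-8", errors="surrogateescape"):
        cp = ord(c)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append("\\x%02x" % (cp - 0xDC00))  # stray non-UTF-8 byte
        elif c == "\\":
            out.append("\\\\")  # escape the escape character itself
        elif c == "\r" or not is_xml_char(c):
            # Valid UTF-8 but unsafe in XML (a parser would turn CR into
            # LF): escape each of the character's UTF-8 bytes.
            out.extend("\\x%02x" % b for b in c.encode("utf-8"))
        else:
            out.append(c)  # readable as-is
    return "".join(out)


def unescape_readable(text: str) -> bytes:
    """Invert escape_readable(), recovering the original bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text.startswith("\\\\", i):
            out += b"\\"
            i += 2
        elif text.startswith("\\x", i):
            out.append(int(text[i + 2:i + 4], 16))
            i += 4
        elif text[i] == "\\":
            raise ValueError("bad escape at offset %d" % i)
        else:
            out += text[i].encode("utf-8")
            i += 1
    return bytes(out)
```

The escaped result would still need ordinary XML escaping of &, < and >
before output, and a reader would need some flag saying the value had been
escaped at all.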
Thoughts, anyone?
- Julian
Received on Tue Jul 26 02:23:01 2005