
Re: [PATCH] add --xml to proplist command

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: 2005-07-26 02:22:14 CEST

I've finally been reviewing this patch more thoroughly, working through the
source code and thinking about how it operates and what results it achieves.

Unfortunately there's a problem with the concept of putting property values
into XML: text in the XML output must be UTF-8. If a property is one that we
recognise ("svn:*") that's fine, we just output it without conversion. If it
is one we don't recognise (e.g. "svk:merge") then we don't know how its value
is encoded, so we cannot convert it to UTF-8. We therefore need to do
something that is guaranteed to produce valid XML and to be decodable.

We probably wouldn't want to base-64-encode all properties except "svn:*"
because many of them would in fact be text compatible with UTF-8. It isn't
possible to recognise automatically whether a value is already UTF-8, but we
could recognise whether it /looks like/ UTF-8 and leave it alone if it does.
That might be a workable compromise.

Also note that even some UTF-8 character values are not valid in XML - for
example, many control characters. Therefore we need to check even the values
that are valid UTF-8, and possibly base-64-encode them.
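
As an illustration of the check described above, here is a minimal sketch in
Python (chosen for brevity; Subversion itself is written in C). It tests text
against the "Char" production of XML 1.0. The function names are hypothetical
and are not part of the patch under review.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 'Char' production."""
    return (
        cp in (0x09, 0x0A, 0x0D)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF           # excludes the other C0 controls
        or 0xE000 <= cp <= 0xFFFD         # excludes surrogates, U+FFFE, U+FFFF
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml_safe(text: str) -> bool:
    """Return True if every character of text may appear in XML 1.0."""
    return all(is_xml_char(ord(ch)) for ch in text)
```

Under this check, a value containing a tab or newline passes, while one
containing a BEL (0x07) or NUL character does not, even though both are valid
UTF-8.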

Does this sound like the most appropriate algorithm?

   if ((property value is a valid UTF-8 byte sequence)
       and
       (property value consists of valid XML characters))
   then
     Just use the plain value (with XML escaping of course).
   else
     Encode the value (and then apply XML escaping).
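
The pseudocode above could be sketched roughly as follows, again in Python
for brevity. The function name and the idea of returning an encoding marker
alongside the text (so that a reader of the XML can tell which values were
encoded) are assumptions made for this illustration, not something the patch
defines.

```python
import base64
import re
from xml.sax.saxutils import escape

# Matches any character that is NOT allowed by the XML 1.0 'Char' production.
_XML_INVALID = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def encode_prop_value(value: bytes):
    """Return (xml_text, encoding); encoding is None or 'base64'.

    Hypothetical helper: plain text is used only when the bytes are valid
    UTF-8 and every character is valid in XML; otherwise base-64 is used.
    """
    try:
        text = value.decode("utf-8")      # strict: raises on invalid bytes
    except UnicodeDecodeError:
        text = None
    if text is not None and not _XML_INVALID.search(text):
        return escape(text), None
    encoded = base64.b64encode(value).decode("ascii")
    # Escaping is a no-op for the base-64 alphabet; kept to mirror the outline.
    return escape(encoded), "base64"
```

For example, `encode_prop_value(b"a < b")` yields the escaped plain text with
no encoding marker, whereas a value containing a NUL byte or an invalid UTF-8
sequence comes back base-64-encoded.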

To encode the value, we would probably choose base-64, but consider the case of
a textual field that is mostly ASCII but has some non-UTF-8 bytes too, or is
entirely UTF-8 but contains some characters that are not valid in XML. To
allow such property values to be readable we might prefer to use an encoding
which preserves most of the text in a readable form, and just escapes the
disallowed characters or bytes.
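
One possible shape for such a readable encoding, purely as a sketch: keep
every character that is valid in XML, and write the rest as backslash escapes.
The escape syntax here (`\xNN` for a byte that is not part of valid UTF-8,
`\uNNNN` for a character that XML disallows, `\\` for a literal backslash) is
invented for this example; no such scheme is defined by Subversion. The result
would still need ordinary XML escaping afterwards.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 'Char' production."""
    return (
        cp in (0x09, 0x0A, 0x0D)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def readable_escape(value: bytes) -> str:
    """Hypothetical readable encoding for a property value."""
    # 'surrogateescape' maps each undecodable byte 0xNN to U+DCNN, so the
    # invalid bytes can be told apart from real characters below.
    text = value.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")                    # keep the scheme decodable
        elif 0xDC80 <= cp <= 0xDCFF:
            out.append("\\x%02X" % (cp - 0xDC00)) # raw non-UTF-8 byte
        elif is_xml_char(cp):
            out.append(ch)                        # readable as-is
        else:
            out.append("\\u%04X" % cp)            # valid UTF-8, invalid in XML
    return "".join(out)
```

With this sketch a mostly-ASCII value stays legible: only the stray bytes and
control characters are replaced, at the cost of doubling any backslashes
already present in the value.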

Thoughts, anyone?

- Julian

Received on Tue Jul 26 02:23:01 2005

This is an archived mail posted to the Subversion Dev mailing list.