-----BEGIN PGP SIGNED MESSAGE-----
[Sorry that I forgot to make my original post to the list, so everybody,
read the context.]
On Monday 27 January 2003 20:16, you wrote:
Wow. Let me say up front that I appreciate the input. I have a few
questions before I'm fully convinced now that my 45 minutes of coding
were not well-spent. More below:
Peter Davis firstname.lastname@example.org writes:
* Conceptually, the name of an entry is an attribute of the entry.
That's why I think it should remain an attribute of the entry.
Fine, but hardly a technical argument.
Well, my sanity has been technically ruined by many a poorly designed XML
format. :-) (Not that this would necessarily be poorly designed.)
* Making the name CDATA essentially prohibits adding children to the
entry element in the future. Who knows what those could be, but why
eliminate that option? (Yes, technically there is no restriction on
CDATA+children together, but it's ugly as hell, and whitespace from
pretty-printing becomes a big issue.)
This is a technical argument, and one that I considered, too. It just
seemed so unlikely that we'd need to subcategorize what is essentially a
dirent that I dismissed it. Still, worth noting, and we probably
should avoid painting ourselves into a corner.
Maybe having nested entries could some day facilitate svn:externals or
having a single .svn dir for an entire wc, or maybe the prop file could be
eliminated by nesting appropriate XML in the entries themselves. Like I
said, who knows? -- nothing to argue here.
* Would newlines in filenames still need to be encoded? They're legal
under UNIX, and work fine in XML, although they ruin any
pretty-printing. But what about carriage returns in filenames? XML's
newline normalization still requires them to be escaped.
I've asked the XML parser to preserve whitespace in the CDATA, so I
get newlines as newlines.
Actually, I was talking about the CR in CRLF (or just CR by itself).
* Code-wise, it is hardly more complicated to encode tabs as a character
reference (&#9;) in addition to the characters that are already escaped.
My concern is not about special-casing tabs, but about special-casing
every character that's not preserved by parsing attribute values
(versus those not preserved when parsing CDATA). Is it a parser bug
that an attribute with a tab comes back with spaces, even though an
attribute with two contiguous spaces is returned as two contiguous
spaces? I'm not clear on what the XML spec states there (and got a
little lost reading the docs at W3.org).
No bug, see http://www.w3.org/TR/REC-xml#AVNormalize. According to the
listed algorithm, each whitespace char gets transformed into a single
space. So a tab and a space will become two spaces, two newlines become
two spaces, and so do two real spaces. It's only a bug if the attribute's
type is not CDATA according to the DTD, but since the entries file has no
DTD (and since the filename wouldn't fit into any other type), obviously
that is not the case.
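
To make the effect concrete, here is a minimal sketch in C of just the whitespace step of that normalization, as it applies to a CDATA-typed attribute. The function name is made up for illustration and is not part of any parser or of Subversion. It deliberately ignores character and entity references and the earlier line-end normalization (which turns CRLF into a single LF before this step runs).

```c
#include <stddef.h>

/* Simplified sketch of the whitespace step of XML attribute-value
   normalization for a CDATA attribute: each literal tab, LF, or CR
   is replaced by exactly one space, in place. Runs of spaces are NOT
   collapsed. References such as "&#9;" and the CRLF -> LF line-end
   normalization are intentionally not modeled here. */
static void
normalize_attr_whitespace(char *value)
{
  for (char *p = value; *p != '\0'; p++)
    {
      if (*p == '\t' || *p == '\n' || *p == '\r')
        *p = ' ';
    }
}
```

Fed a filename such as "a<TAB>b", this yields "a b", which is why a raw tab written into an attribute cannot be read back as a tab.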
Now about this special casing: is this merely because the libsvn_subr/xml.c
functions provide a way to escape CDATA but not attributes? From line 50 of
that file:
    while (q < end && *q != '&' && *q != '<' && *q != '>'
           && *q != '"' && *q != '\'')
If you ask me, there need to be two sets of functions:
(svn_)xml_escape_(*string*), and (svn_)xml_escape_attr_(*string*).
The only difference between CDATA and attribute escaping is the addition of
the four whitespace characters, #x20, #xD, #xA, and #x9, and the possible
addition of the single- and double-quote chars (by the way, why are quotes
being escaped for normal CDATA?):
    while (q < end && *q != '&' && *q != '<' && *q != '>'
           && *q != '"' && *q != '\'' && *q != '\n'
           && *q != '\r' && *q != '\t' && *q != ' ')
    // with appropriate additions to the switch()
Fixing attribute escaping, which as far as I can tell is currently
completely broken with regard to whitespace, will kill two birds.
Filenames are not the only thing that could potentially be affected by the
bug, so it needs to be fixed either way, and fixing it will eliminate the
technical need for this change. Did you decide to implement this change
because you tried to make a file with a tab in the name? I just tried it,
and it is in fact completely broken, unless there is a bug in the XML
parser that doesn't normalize tabs to spaces.
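
As a rough illustration of the attribute-escaping variant suggested above, something along these lines might do. This is only a sketch: the function name, the fixed-size output buffer, and the return convention are assumptions made for the example, not Subversion's actual API (which works on its own string types). It also leaves plain spaces alone, since a space survives attribute normalization unchanged, whereas the loop quoted earlier lists ' ' as well.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an existing Subversion function: escape SRC
   for use inside a quoted XML attribute value, writing a NUL-terminated
   result into DST (capacity DST_SIZE bytes).
   Returns 0 on success, -1 if DST is too small. */
static int
xml_escape_attr_sketch(const char *src, char *dst, size_t dst_size)
{
  size_t used = 0;

  if (dst_size == 0)
    return -1;

  for (; *src != '\0'; src++)
    {
      char one[2] = { *src, '\0' };  /* fallback: copy the char as-is */
      const char *rep;

      switch (*src)
        {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&apos;"; break;
        /* Whitespace other than a plain space has to be written as a
           character reference, or the parser turns it into a space. */
        case '\t': rep = "&#9;";   break;
        case '\n': rep = "&#10;";  break;
        case '\r': rep = "&#13;";  break;
        default:   rep = one;      break;
        }

      size_t len = strlen(rep);
      if (used + len + 1 > dst_size)  /* keep room for the final NUL */
        return -1;
      memcpy(dst + used, rep, len);
      used += len;
    }

  dst[used] = '\0';
  return 0;
}
```

With this, a name containing a tab would be written as name="a&#9;b", and a conforming parser hands back the original tab because character references are expanded rather than normalized to spaces.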
Oh yeah, while we're on the topic of entry names: would you care to
clean up the svn:this_dir hack? Perhaps if changed to a name
element, the lack of such an element could mean the current dir?
Actually, SVN_WC_ENTRY_THIS_DIR used to be set to , but programmers
became lazy and started assuming such, and not using the #define.
That was the original reason I changed the #define to svn:this_dir.
Now that Subversion actually has a large community full of watchful,
code-reviewing eyes, it's probably safe to switch this back.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
-----END PGP SIGNATURE-----
Received on Sat Oct 14 02:21:11 2006