
Re: 1.3 now blocking on path escaping in httpd operational logging

From: <kfogel_at_collab.net>
Date: 2005-09-27 17:51:00 CEST

"C. Michael Pilato" <cmpilato@collab.net> writes:
> > Here are the rules for logging within httpd (see gen_test_char.c):
> >
> > /* For logging, escape all control characters,
> > * double quotes (because they delimit the request in the log
> > * file)
> > * backslashes (because we use backslash for escaping)
> > * and 8-bit chars with the high bit set
> > */
> >
> > As I was trying to make clear in IRC yesterday, UTF-8 isn't an option.
> > httpd (via ap_escape_logitem function) will convert the high-bit values
> > via the c2x() function - which will print the hex value anyway
> > (%xx).
>
> Gotcha. And, ick. And, ugh. And, yuck.
>
> Then I must present my position reversal reversal petition, whereby I
> petition to reverse the reversal of my position. I'll follow with my
> updated position proposition, in which I propose that after reversing
> the reversal of my position, my position is now undefined.
>
> I need a physician.

Just FYI, Mike, I read that aloud to Karen and we laughed for a full
minute :-).

Justin Erenkrantz wrote:
> As Paul Querna and myself mentioned, if Subversion needs to write UTF-8
> logs, we can't rely upon httpd's logging mechanisms.

Okay. In other words, httpd essentially cannot log events about local
(server) paths in a perfectly parseable way. Furthermore, the
escaping that httpd enforces actually uses *two* different escape
codes: \ for some things, and % for others. But only one of the
sequences escapes itself; the other does not, and cannot be
distinguished from that escape sequence simply appearing in the
original string.
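
A minimal sketch of that ambiguity, for illustration only. The function name httpd_log_escape_model is hypothetical, and the rules in it are a toy model of the escaping as described in this thread (backslash before double quotes and backslashes, %xx for control and high-bit bytes). It is not taken from httpd's source, and the exact hex formatting is an assumption.

```python
def httpd_log_escape_model(raw: bytes) -> str:
    """Toy model of the log escaping described in this thread (assumption)."""
    out = []
    for b in raw:
        ch = chr(b)
        if ch in ('"', "\\"):
            # Backslash escape: the backslash itself is escaped, so this
            # sequence can always be reversed.
            out.append("\\" + ch)
        elif b < 0x20 or b >= 0x7F:
            # Hex escape: a literal '%' in the input is NOT escaped, so this
            # sequence cannot be told apart from the same text in the input.
            out.append("%%%02x" % b)
        else:
            out.append(ch)
    return "".join(out)


# A path that literally contains the four characters "%e9" ...
literal = httpd_log_escape_model(b"caf%e9")
# ... and a path that contains the single high-bit byte 0xE9.
high_bit = httpd_log_escape_model(b"caf\xe9")

print(literal, high_bit, literal == high_bit)
```

Under this model both inputs produce the same log text, so a parser cannot tell which one was logged, whereas a backslash in the input always comes out doubled.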

I need a moment to let this all sink in. I understand the security
concerns that required escaping control chars and high-bit chars, but
having two different escape sequences (and one of them unescapable!)
is a pity. Oh well. We've done similar things in Subversion, I
suppose, when we need to fall back from a recoding failure.

   -* moment passes *-

Okay, mental adjustment made. Now, despite Miha Vitorovic's mail,
pointing out this Apache error message with a file path in it:

  [Tue May 24 14:34:23 2005] [error] [client 192.168.225.98] File does not
  exist: C:/Program Files/Apache Group/Apache2/htdocs/favicon.ico

...I still think URI-escaping is the best solution available to us here.

Given the limitations of httpd logging, our only other choice would be
to come up with a non-intuitive custom solution. I mean that
literally: "non-intuitive" is not simply a synonym of "custom",
rather, our solution would have to be *both* custom and non-intuitive,
because we can't ever get an odd number of backslashes in the output.
For example:

1) We could use double backslash (\\) itself as the escape sequence.
   That is, any sequence of *two* backslashes in the log would be
   treated as one actual backslash by parsers, just to get to the
   first level of reduction, except for double quote, which would have
   to be treated specially (see example below). Highly unusual, hard
   for humans to read, and results in escape sequences of varying
   length. Yuck.

2) Use % as the escape sequence. But then you'd have to look at the
   next *two* characters, since we'd have no choice but to represent
   backslash with %\\ and double-quote with %\", as httpd is going to
   force the \ on us in those cases. Hex sequences are already %xx,
   which is also two characters, so that's okay. But when escaping a
   single quote, we'd have to add in some extra char ourselves, and
   worse, it couldn't be backslash, because httpd would double that!
   So we'd have to do %-' or %'' or something unusual like that. Yuck.

Let's call URI-escaping scheme (0), to compare it with (1) and (2)
above. Here's how this path would look

   has'quote"doublequote\backslash space

in the three schemes:

   0) has%27quote%22doublequote%5Cbackslash%20space

   1) has\\'quote\\\"doublequote\\\\backslash space

   2) has%-'quote%\"doublequote%\\backslash space

Note that in (1) and (2), we'd first have to run the path through our
*own* escaper. Then it would go through httpd's escaper before
hitting the log file. For (1), our escaper would double all
backslashes, and put a backslash in front of double quotes. For (2),
our escaper would put % before double quotes, backslashes, and would
put %- before single quotes. Parsing these is left as an exercise for
the reader, especially the clinically insane reader with lots of time
on her hands and a research lab funded by an eccentric millionaire at
her disposal.
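
The two-stage process for schemes (0) and (2) can be sketched roughly as follows. Everything here is illustrative: httpd_log_escape_model is a hypothetical stand-in for httpd's escaper that follows only the rules described in this thread, scheme2_pre_escape is a hypothetical name for "our own" escaper under scheme (2), and scheme (0) is approximated with Python's urllib.parse.quote.

```python
from urllib.parse import quote


def httpd_log_escape_model(text: str) -> str:
    """Toy model (assumption) of httpd's log escaping as described above."""
    out = []
    for b in text.encode("utf-8"):
        ch = chr(b)
        if ch in ('"', "\\"):
            out.append("\\" + ch)        # httpd forces a backslash on us
        elif b < 0x20 or b >= 0x7F:
            out.append("%%%02x" % b)     # control and high-bit bytes
        else:
            out.append(ch)
    return "".join(out)


def scheme2_pre_escape(path: str) -> str:
    """Scheme (2): % before double quotes and backslashes, %- before single quotes."""
    out = []
    for ch in path:
        if ch in ('"', "\\"):
            out.append("%" + ch)
        elif ch == "'":
            out.append("%-'")
        else:
            out.append(ch)
    return "".join(out)


path = "has'quote\"doublequote\\backslash space"

# Scheme (0): URI-escape first, then let the httpd model have its turn.
logged0 = httpd_log_escape_model(quote(path, safe="/"))
# Scheme (2): our own escaper first, then the httpd model.
logged2 = httpd_log_escape_model(scheme2_pre_escape(path))

print(logged0)
print(logged2)
```

With the sample path, the two results match lines 0) and 2) in the comparison above. Note that the scheme (0) string passes through the second stage untouched.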

I hope I don't have to go much further to convince you all how heinous
(1) and (2) would be :-). Let's go with URI-escaping, because it's
the best we can do in a difficult situation.
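
As a rough illustration of why URI-escaping holds up here: its output is plain printable ASCII with no double quotes or backslashes, so an escaper of the kind described in this thread has nothing left to change, and a log reader can decode it in one step. The helper name untouched_by_log_escaper and the sample paths are hypothetical, and leaving "/" unescaped is an assumption made for readability.

```python
from urllib.parse import quote, unquote


def untouched_by_log_escaper(text: str) -> bool:
    """True if text has no control chars, high-bit chars, '"' or backslash."""
    return all(0x20 <= ord(c) < 0x7F and c not in '"\\' for c in text)


paths = [
    "trunk/README",                              # ordinary path: stays readable
    "has'quote\"doublequote\\backslash space",   # the sample path from above
    "caf\u00e9/100%.txt",                        # non-ASCII plus a literal '%'
]

results = []
for p in paths:
    encoded = quote(p, safe="/")                 # what we would hand to httpd
    results.append((encoded, untouched_by_log_escaper(encoded), unquote(encoded) == p))

for encoded, survives, round_trips in results:
    print(encoded, survives, round_trips)
```

Because "%" is itself encoded as %25, a literal percent sign in a path stays distinguishable from an escape sequence.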

When I'd finished writing the above, I polled mail, and saw this:

Greg Hudson writes:
> I'm not sure what's so bad about URI-encoding things that aren't URIs.
>
> Inventing a whole new escaping system seems like a bad idea, especially
> since URI-encoding is perfectly readable for the 99% of pathnames and
> property names which don't contain special characters.

Yup. What he said.

Thoughts? Consensus?

-Karl

Received on Tue Sep 27 19:05:38 2005

This is an archived mail posted to the Subversion Dev mailing list.