
Re: fuzzy_escape function in libsvn_subr is not reversible

From: Erik Huelsmann <ehuels_at_gmail.com>
Date: 2007-11-08 16:11:58 CET

On 11/8/07, David Glasser <glasser@davidglasser.net> wrote:
> On 11/8/07, Jendro, Carsten (SAZ-DE) <CJendro@saz.net> wrote:
> > > -----Original Message-----
> > > From: Karl Fogel [mailto:kfogel@red-bean.com]
> > > Sent: Thursday, November 08, 2007 12:25 AM
> > > To: Jendro, Carsten (SAZ-DE)
> > > Cc: dev@subversion.tigris.org
> > > Subject: Re: fuzzy_escape function in libsvn_subr is not reversible
> > >
> > > "Jendro, Carsten (SAZ-DE)" <CJendro@saz.net> writes:
> > > > I have a problem with svnlook when I use it to print a UTF-8 log
> > > > message in the console.
> > > >
> > > > Under Windows it is not possible to set the output mode to UTF-8, so
> > > > all text output will be escaped with the fuzzy_escape function located
> > > > in libsvn_subr/utf.c
> > > >
> > > > This process is not reversible, because every char >= 128 and 0 will be
> > > > converted to a replacement in a format like "?\000". But the starting
> > > > char of the replacement, the question mark (?), should be replaced too
> > > > to make it reversible.
> > >
> > > I don't understand how this is reversible.
> > >
> > > In general, any file or stream might contain a string like "?\000"
> > > (for example, a file containing the email you just sent!).
> > >
> > > Escaping the "?" does no good, because anything might contain
> > > the escape sequence too (for example, a mail explaining the
> > > escape sequence!).
> >
> > In a mail that explains the escape sequence there may be text
> > like this:
> > "Lorem ipsum dolor ?\xxx sit amet, ?\147?\211 consectetur, adipisci
> > velit"
> > It would be escaped to
> > "Lorem ipsum dolor ?\063\xxx sit amet, ?\063\147?\063\211 consectetur,
> > adipisci velit"
> > and can be unescaped 100% by a simple function, giving back
> > "Lorem ipsum dolor ?\xxx sit amet, ?\147?\211 consectetur, adipisci
> > velit"
> > It would work fine.
> >
> > >
> > > In the absence of a well-defined character encoding,
> > > reversibility is not achievable. But that is the situation
> > > we are in, when we use fuzzy_escape.
> >
> > The '?' is <= 127 and well-defined in this situation, so I don't see
> > the problem.
>
> While I agree that your scheme would be more reversible, wouldn't it
> mean that people who are just using ASCII characters to write log
> messages on platforms without UTF8 would have their question marks
> escaped? I think people would not like that.

I'm sorry to be so blunt, but I think the output is just fine. The
name of the function indicates that it is lossy, and I think it is
extremely rare for a user to actually enter the sequence '?\'. This
means that a question mark without a trailing backslash can safely be
interpreted as a literal question mark in the current scheme.
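
To make the two schemes under discussion concrete, here is a rough
Python sketch. It is not the C code from libsvn_subr/utf.c; the
function names are made up for illustration, and it assumes the three
digits are the decimal byte value (which matches '?' becoming ?\063 in
the example quoted above) and that only byte 0 and bytes >= 128 are
escaped, as described in the original report.

```python
import re

def fuzzy_escape(data: bytes) -> str:
    """Lossy scheme as described in this thread (an assumption, not the
    real utf.c code): byte 0 and bytes >= 128 become ?\\ddd (decimal)."""
    out = []
    for b in data:
        if b == 0 or b >= 128:
            out.append("?\\%03d" % b)
        else:
            out.append(chr(b))
    return "".join(out)

def reversible_escape(data: bytes) -> str:
    """Proposed variant: additionally escape '?' itself (byte 63)."""
    out = []
    for b in data:
        if b == 0 or b >= 128 or b == ord("?"):
            out.append("?\\%03d" % b)
        else:
            out.append(chr(b))
    return "".join(out)

_ESCAPE = re.compile(r"\?\\(\d{3})")

def unescape(text: str) -> bytes:
    """Inverse of reversible_escape: every '?' starts an escape sequence.
    Only meant for well-formed output of reversible_escape."""
    out = bytearray()
    i = 0
    while i < len(text):
        m = _ESCAPE.match(text, i)
        if m:
            out.append(int(m.group(1)))  # decoded byte value
            i = m.end()
        else:
            out.append(ord(text[i]))     # plain ASCII passes through
            i += 1
    return bytes(out)

# Two different inputs collide under the lossy scheme ...
assert fuzzy_escape(b"\xc3") == fuzzy_escape(b"?\\195")
# ... but stay distinct, and round-trip, under the proposed one.
assert unescape(reversible_escape(b"?\\195")) == b"?\\195"
# The cost: a plain ASCII question mark is now escaped as well.
assert reversible_escape(b"Why?") == "Why?\\063"
```

The last assertion is the trade-off mentioned earlier in the thread:
the variant round-trips, but plain ASCII log messages no longer print
their question marks unchanged.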

It also shouldn't be too hard to get svnlook to output UTF8 (on *nix):

 $ LANG=en_US.UTF-8 svnlook <your arguments>

is all that's required.

HTH,

Erik.

Received on Thu Nov 8 16:28:57 2007

This is an archived mail posted to the Subversion Dev mailing list.