
Re: fuzzy_escape function in libsvn_subr is not reversible

From: David Glasser <glasser_at_davidglasser.net>
Date: 2007-11-08 15:46:12 CET

On 11/8/07, Jendro, Carsten (SAZ-DE) <CJendro@saz.net> wrote:
> > -----Original Message-----
> > From: Karl Fogel [mailto:kfogel@red-bean.com]
> > Sent: Thursday, November 08, 2007 12:25 AM
> > To: Jendro, Carsten (SAZ-DE)
> > Cc: dev@subversion.tigris.org
> > Subject: Re: fuzzy_escape function in libsvn_subr is not reversible
> >
> > "Jendro, Carsten (SAZ-DE)" <CJendro@saz.net> writes:
> > > I have a problem with svnlook when I use it to print a UTF-8 log
> > > message in the console.
> > >
> > > Under Windows it is not possible to set the output mode to UTF-8, so
> > > all text output will be escaped with the fuzzy_escape function located
> > > in libsvn_subr/utf.c.
> > >
> > > This process is not reversible, because every char that is 0 or >= 128
> > > will be converted to a replacement in a format like "?\000". But the
> > > starting char of the replacement, the question mark (?), should be
> > > replaced too to make it reversible.
> >
> > I don't understand how this is reversible.
> >
> > In general, any file or stream might contain a string like "?\000"
> > (for example, a file containing the email you just sent!).
> >
> > Escaping the "?" does no good, because anything might contain
> > the escape sequence too (for example, a mail explaining the
> > escape sequence!).
>
> In a mail that explains the escape sequence there may be text
> like this:
> "Lorem ipsum dolor ?\xxx sit amet, ?\147?\211 consectetur, adipisci
> velit"
> It would be escaped to
> "Lorem ipsum dolor ?\063\xxx sit amet, ?\063\147?\063\211 consectetur,
> adipisci velit"
> and can be unescaped exactly by a simple function, giving back
> "Lorem ipsum dolor ?\xxx sit amet, ?\147?\211 consectetur, adipisci
> velit"
> It would work fine.
>
> >
> > In the absence of a well-defined character encoding,
> > reversibility is not achievable. But that is the situation
> > we are in, when we use fuzzy_escape.
>
> The '?' is <= 127 and well-defined in this situation; I don't see the
> problem.

While I agree that your scheme would be more reversible, wouldn't it
mean that people who are just using ASCII characters to write log
messages on platforms without UTF8 would have their question marks
escaped? I think people would not like that.
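
To make the trade-off concrete, here is a minimal sketch of the scheme
proposed above, written in Python rather than the C of libsvn_subr. It is
not the actual fuzzy_escape code: the names escape and unescape are
hypothetical, and the set of escaped bytes (0, anything >= 128, and '?'
itself) is taken from the description in this thread, not from utf.c.

```python
import re


def escape(data: bytes) -> str:
    """Escape bytes as described in the thread, plus '?' itself.

    Assumption: 0, every byte >= 128, and '?' become "?\\NNN",
    where NNN is the three-digit decimal byte value.
    """
    out = []
    for b in data:
        if b == 0 or b >= 128 or b == ord("?"):
            out.append("?\\%03d" % b)
        else:
            out.append(chr(b))
    return "".join(out)


# After escape(), every '?' in the text starts a "?\NNN" sequence,
# so a single left-to-right pass can undo the escaping.
_ESCAPED = re.compile(r"\?\\([0-9]{3})")


def unescape(text: str) -> bytes:
    """Invert escape(); only meant for text that escape() produced."""
    restored = _ESCAPED.sub(lambda m: chr(int(m.group(1))), text)
    return restored.encode("latin-1")


if __name__ == "__main__":
    message = b"Lorem ipsum dolor ?\\xxx sit amet, ?\\147?\\211"
    print(escape(message))
    print(unescape(escape(message)) == message)
    # A plain ASCII log message is changed too:
    print(escape(b"Why?"))
```

The round trip works because no unescaped '?' survives in the output. The
last call shows the cost: a pure ASCII message such as "Why?" comes out as
"Why?\063".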

--dave

-- 
David Glasser | glasser_at_davidglasser.net | http://www.davidglasser.net/
Received on Thu Nov 8 15:48:43 2007

This is an archived mail posted to the Subversion Dev mailing list.