
RE: fuzzy_escape function in libsvn_subr is not reversible

From: Jendro, Carsten (SAZ-DE) <CJendro_at_saz.net>
Date: 2007-11-08 16:09:17 CET

> -----Original Message-----
> From: dglasser@gmail.com [mailto:dglasser@gmail.com] On
> Behalf Of David Glasser
> Sent: Thursday, November 08, 2007 3:46 PM
> To: Jendro, Carsten (SAZ-DE)
> Cc: Karl Fogel; dev@subversion.tigris.org
> Subject: Re: fuzzy_escape function in libsvn_subr is not reversible
>
> On 11/8/07, Jendro, Carsten (SAZ-DE) <CJendro@saz.net> wrote:
> > > -----Original Message-----
> > > From: Karl Fogel [mailto:kfogel@red-bean.com]
> > > Sent: Thursday, November 08, 2007 12:25 AM
> > > To: Jendro, Carsten (SAZ-DE)
> > > Cc: dev@subversion.tigris.org
> > > Subject: Re: fuzzy_escape function in libsvn_subr is not reversible
> > >
> > > "Jendro, Carsten (SAZ-DE)" <CJendro@saz.net> writes:
> > > > I have a problem with svnlook when I use it to print a UTF-8 log
> > > > message in the console.
> > > >
> > > > Under Windows it is not possible to set the output mode to UTF-8,
> > > > so all text output will be escaped with the fuzzy_escape function
> > > > located in libsvn_subr / utf.c
> > > >
> > > > This process is not reversible, because every char >= 128 and 0
> > > > will be converted to a replacement in a format like "?\000". But
> > > > the starting char of the replacement, the question mark (?), should
> > > > be replaced too to make it reversible.
> > >
> > > I don't understand how this is reversible.
> > >
> > > In general, any file or stream might contain a string like "?\000"
> > > (for example, a file containing the email you just sent!).
> > >
> > > Escaping the "?" does no good, because anything might contain the
> > > escape sequence too (for example, a mail explaining the escape
> > > sequence!).
> >
> > In a mail that explains the escape sequence there may be a text
> > like this: "Lorem ipsum dolor ?\xxx sit amet, ?\147?\211 consectetur,
> > adipisci velit"
> > It would be escaped to
> > "Lorem ipsum dolor ?\063\xxx sit amet, ?\063\147?\063\211 consectetur,
> > adipisci velit"
> > and can be unescaped 100% by a simple function, giving back
> > "Lorem ipsum dolor ?\xxx sit amet, ?\147?\211 consectetur, adipisci
> > velit"
> > It would work fine.
> >
> > >
> > > In the absence of a well-defined character encoding, reversibility
> > > is not achievable. But that is the situation we are in, when we use
> > > fuzzy_escape.
> >
> > The '?' is <= 127 and well-defined in this situation; I don't see the
> > problem.
>
> While I agree that your scheme would be more reversible,
> wouldn't it mean that people who are just using ASCII
> characters to write log messages on platforms without UTF8
> would have their question marks escaped? I think people
> would not like that.

You can avoid this by escaping a question mark only when it is followed
by a backslash and three digits.
Escaping and unescaping would then be only a little more complex.
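
A minimal sketch of this conditional scheme, written in Python rather
than in the C of libsvn_subr, to show that it round-trips. The names
escape and unescape are hypothetical and are not functions from utf.c.
It assumes an escape is "?\" followed by exactly three decimal digits
(as in "?\063" for '?', ASCII 63) and that only bytes >= 128 and 0 are
escaped, as described earlier in this thread.

```python
import re

# Hypothetical helpers for illustration; not part of libsvn_subr.
# Assumption: an escape is "?\" plus exactly three decimal digits.
_ESCAPE_TAIL = re.compile(rb"\\[0-9]{3}")

QUESTION_MARK = ord("?")


def escape(data: bytes) -> bytes:
    """Escape bytes >= 128 and 0, plus any '?' that would look like an escape."""
    out = bytearray()
    for i, byte in enumerate(data):
        if byte == 0 or byte >= 128:
            # Non-ASCII or NUL byte: emit "?\ddd" with the decimal value.
            out += b"?\\%03d" % byte
        elif byte == QUESTION_MARK and _ESCAPE_TAIL.match(data, i + 1):
            # A literal '?' followed by "\ddd" in the input would be
            # mistaken for an escape, so the '?' itself is escaped.
            out += b"?\\063"
        else:
            # Plain ASCII, including an ordinary '?', is copied unchanged.
            out.append(byte)
    return bytes(out)


def unescape(text: bytes) -> bytes:
    """Reverse escape(); raise ValueError on an escape value above 255."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == QUESTION_MARK and _ESCAPE_TAIL.match(text, i + 1):
            value = int(text[i + 2:i + 5])
            if value > 255:
                raise ValueError("escape value out of byte range: %d" % value)
            out.append(value)
            i += 5  # skip '?', '\' and the three digits
        else:
            out.append(text[i])
            i += 1
    return bytes(out)
```

With this sketch a plain ASCII log message such as "why?" is left
untouched, while the text "?\147" becomes "?\063\147" and is restored
by unescape.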

> --dave
>
> --
> David Glasser | glasser@davidglasser.net |
> http://www.davidglasser.net/
>

Received on Thu Nov 8 16:17:38 2007
