# RE: fuzzy_escape function in libsvn_subr is not reversible

From: Jendro, Carsten (SAZ-DE) <CJendro_at_saz.net>
Date: 2007-11-09 14:52:29 CET

> -----Original Message-----
> From: Erik Huelsmann [mailto:ehuels@gmail.com]
> Sent: Thursday, November 08, 2007 4:54 PM
> To: Jendro, Carsten (SAZ-DE)
> Cc: David Glasser; Karl Fogel; dev@subversion.tigris.org
> Subject: Re: fuzzy_escape function in libsvn_subr is not reversible
>
> On 11/8/07, Jendro, Carsten (SAZ-DE) <CJendro@saz.net> wrote:
> > > -----Original Message-----
> > > From: dglasser@gmail.com [mailto:dglasser@gmail.com] On Behalf Of
> > > David Glasser
> > > Sent: Thursday, November 08, 2007 3:46 PM
> > > To: Jendro, Carsten (SAZ-DE)
> > > Cc: Karl Fogel; dev@subversion.tigris.org
> > > Subject: Re: fuzzy_escape function in libsvn_subr is not reversible
> > >
> > > On 11/8/07, Jendro, Carsten (SAZ-DE) <CJendro@saz.net> wrote:
> > > > > -----Original Message-----
> > > > > From: Karl Fogel [mailto:kfogel@red-bean.com]
> > > > > Sent: Thursday, November 08, 2007 12:25 AM
> > > > > To: Jendro, Carsten (SAZ-DE)
> > > > > Cc: dev@subversion.tigris.org
> > > > > Subject: Re: fuzzy_escape function in libsvn_subr is not reversible
> > > > >
> > > > > "Jendro, Carsten (SAZ-DE)" <CJendro@saz.net> writes:
> > > > > > I have a problem with svnlook when I use it to print a
> > > > > > UTF-8 log message in the console.
> > > > > >
> > > > > > Under Windows it is not possible to set the output mode to
> > > > > > UTF-8, so all text output will be escaped with the
> > > > > > fuzzy_escape function located in libsvn_subr / utf.c
> > > > > >
> > > > > > This process is not reversible, because every char >= 128
> > > > > > and 0 will be converted to a replacement in a format like
> > > > > > "?\000". But the starting char of the replacement, the
> > > > > > question mark (?), should be replaced too to make it
> > > > > > reversible.
> > > > >
> > > > > I don't understand how this is reversible.
> > > > >
> > > > > In general, any file or stream might contain a string like
> > > > > "?\000" (for example, a file containing the email you just
> > > > > sent!).
> > > > >
> > > > > Escaping the "?" does no good, because anything might contain
> > > > > the escape sequence too (for example, a mail explaining the
> > > > > escape sequence!).
> > > >
> > > > In a mail that explains the escape sequence there may be a
> > > > text like this:
> > > > "Lorem ipsum dolor ?\xxx sit amet, ?\147?\211 consectetur,
> > > > adipisci velit"
> > > > It would be escaped to
> > > > "Lorem ipsum dolor ?\063\xxx sit amet, ?\063\147?\063\211
> > > > consectetur, adipisci velit"
> > > > and can be unescaped completely by a simple function:
> > > > "Lorem ipsum dolor ?\xxx sit amet, ?\147?\211 consectetur,
> > > > adipisci velit"
> > > > It would work fine.
> > > >
> > > > >
> > > > > In the absence of a well-defined character encoding,
> > > > > reversibility is not achievable. But that is the situation
> > > > > we are in when we use fuzzy_escape.
> > > >
> > > > The '?' is <= 127 and well-defined in this situation; I
> > > > don't see the problem.
> > >
> > > While I agree that your scheme would be more reversible,
> > > wouldn't it mean that people who are just using ASCII characters
> > > to write log messages on platforms without UTF8 would have their
> > > question marks escaped? I think people would not like that.
> >
> >
> > You can avoid this by escaping a question mark only when it is
> > followed by a backslash and three digits.
> > Escaping and unescaping will then be only a little more complex.
>
> Your algorithm only complicates decoding (which is imo a good thing).
> Trying to work around the problem and creating an environment
> where Subversion can output UTF-8 output would be even better though.

Yes, that would be the best way, but I tried it first without success.

I haven't found a way to get UTF-8 output.
Other code pages work fine on Windows, but the Windows UTF-8 code page
65001 does not work.

I think the best solution would be a command-line switch such as
-f[ile] <filename> to redirect the output to a UTF-8 encoded file, or
-utf8 to force UTF-8 output.

That would solve all of these problems, and unescaping fuzzy_escape
output would no longer be needed.
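
For reference, the reversible scheme discussed in the quoted thread
could look roughly like the Python sketch below. This is only an
illustration, not the code in utf.c: the names escape and unescape are
made up, and it assumes (as described earlier in the thread) that bytes
>= 128 and 0 become "?\DDD" with three decimal digits, and that a
literal '?' is escaped as "?\063" only when it is followed by a
backslash and three digits.

```python
import re

# A '?' followed by a backslash and three decimal digits looks like an escape.
_ESCAPE_LIKE = re.compile(rb"\?\\[0-9]{3}")


def escape(data: bytes) -> bytes:
    """Hypothetical reversible variant of the fuzzy escaping (assumption)."""
    out = bytearray()
    for i, b in enumerate(data):
        if b == 0 or b >= 128:
            # Non-ASCII or NUL byte: emit ?\DDD (decimal, zero padded).
            out += b"?\\%03d" % b
        elif b == 0x3F and _ESCAPE_LIKE.match(data, i):
            # Literal '?' that would be mistaken for an escape: emit ?\063.
            out += b"?\\063"
        else:
            out.append(b)
    return bytes(out)


def unescape(text: bytes) -> bytes:
    """Inverse of escape(): turn every ?\\DDD (DDD <= 255) back into a byte."""
    out = bytearray()
    i = 0
    while i < len(text):
        if _ESCAPE_LIKE.match(text, i) and int(text[i + 2:i + 5]) <= 255:
            out.append(int(text[i + 2:i + 5]))
            i += 5
        else:
            out.append(text[i])
            i += 1
    return bytes(out)


sample = b"dolor ?\\xxx sit, ?\\147?\\211 \xc3\xa4 why?"
assert unescape(escape(sample)) == sample
```

With this variant a plain ASCII log message such as "Why?" passes
through unchanged, which addresses David's concern, while a message
that already contains something like "?\147" still survives a round
trip.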

>
> bye,
>
> Erik.
>
