
Re: fuzzy_escape function in libsvn_subr is not reversible

From: Erik Huelsmann <ehuels_at_gmail.com>
Date: 2007-11-08 16:54:14 CET

On 11/8/07, Jendro, Carsten (SAZ-DE) <CJendro@saz.net> wrote:
> > -----Original Message-----
> > From: dglasser@gmail.com [mailto:dglasser@gmail.com] On Behalf Of David Glasser
> > Sent: Thursday, November 08, 2007 3:46 PM
> > To: Jendro, Carsten (SAZ-DE)
> > Cc: Karl Fogel; dev@subversion.tigris.org
> > Subject: Re: fuzzy_escape function in libsvn_subr is not reversible
> >
> > On 11/8/07, Jendro, Carsten (SAZ-DE) <CJendro@saz.net> wrote:
> > > > -----Original Message-----
> > > > From: Karl Fogel [mailto:kfogel@red-bean.com]
> > > > Sent: Thursday, November 08, 2007 12:25 AM
> > > > To: Jendro, Carsten (SAZ-DE)
> > > > Cc: dev@subversion.tigris.org
> > > > Subject: Re: fuzzy_escape function in libsvn_subr is not reversible
> > > >
> > > > "Jendro, Carsten (SAZ-DE)" <CJendro@saz.net> writes:
> > > > > I have a problem with svnlook when I use it to print a UTF-8 log
> > > > > message in the console.
> > > > >
> > > > > Under Windows it is not possible to set the output mode to UTF-8,
> > > > > so all text output will be escaped with the fuzzy_escape function
> > > > > located in libsvn_subr/utf.c.
> > > > >
> > > > > This process is not reversible, because every char >= 128 and 0
> > > > > will be converted to a replacement in a format like "?\000". But
> > > > > the starting char of the replacement, the question mark (?),
> > > > > should be replaced too to make it reversible.
> > > >
> > > > I don't understand how this is reversible.
> > > >
> > > > In general, any file or stream might contain a string like "?\000"
> > > > (for example, a file containing the email you just sent!).
> > > >
> > > > Escaping the "?" does no good, because anything might contain the
> > > > escape sequence too (for example, a mail explaining the escape
> > > > sequence!).
> > >
> > > In a mail that explains the escape sequence there may be a text
> > > like this:
> > > "Lorem ipsum dolor ?\xxx sit amet, ?\147?\211 consectetur,
> > > adipisci velit"
> > > It would be escaped to
> > > "Lorem ipsum dolor ?\063\xxx sit amet, ?\063\147?\063\211
> > > consectetur, adipisci velit"
> > > and can be unescaped completely by a simple function, giving back
> > > "Lorem ipsum dolor ?\xxx sit amet, ?\147?\211 consectetur, adipisci
> > > velit"
> > > It would work fine.
> > >
> > > >
> > > > In the absence of a well-defined character encoding, reversibility
> > > > is not achievable. But that is the situation we are in when we use
> > > > fuzzy_escape.
> > >
> > > The '?' is <= 127 and well-defined in this situation; I don't see the
> > > problem.
> >
> > While I agree that your scheme would be more reversible,
> > wouldn't it mean that people who are just using ASCII
> > characters to write log messages on platforms without UTF8
> > would have their question marks escaped? I think people
> > would not like that.
>
>
> You can avoid this by escaping a question mark only when it is followed
> by a backslash and three digits.
> Escaping and unescaping will then be only a little more complex.

Your algorithm only complicates decoding (which is IMO a good thing).
Trying to work around the problem and creating an environment where
Subversion can produce UTF-8 output would be even better, though.
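
For illustration, the conditional scheme quoted above (escape a question
mark only when a backslash and three digits follow it) could look roughly
like the Python sketch below. This is not Subversion code. The function
names are made up for this example, and the rule for which bytes get
escaped (0 and anything >= 128, written as "?\" plus three decimal digits)
is taken from the description in this thread, not from utf.c.

```python
import re

# "?" followed by a backslash and exactly three ASCII digits.
_ESCAPE_RE = re.compile(rb"\?\\(\d{3})")


def fuzzy_escape_reversible(data: bytes) -> bytes:
    """Hypothetical reversible variant of the escaping described above."""
    out = bytearray()
    for i, byte in enumerate(data):
        if byte == 0 or byte >= 128:
            # Same replacement format as in the thread: ?\ddd (decimal).
            out += b"?\\%03d" % byte
        elif byte == 0x3F and _ESCAPE_RE.match(data, i):
            # A literal "?" that would otherwise look like the start of an
            # escape sequence is itself escaped (63 is the code of "?").
            out += b"?\\063"
        else:
            out.append(byte)
    return bytes(out)


def fuzzy_unescape(escaped: bytes) -> bytes:
    """Inverse of fuzzy_escape_reversible for output it produced."""

    def replace(match):
        value = int(match.group(1))
        # Values above 255 cannot come from the escaper; leave them alone.
        return bytes([value]) if value <= 255 else match.group(0)

    return _ESCAPE_RE.sub(replace, escaped)
```

A question mark in ordinary ASCII text such as "Why?" passes through
unchanged, which addresses the objection about plain ASCII log messages.
Only a "?" that already looks like an escape sequence is rewritten.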

bye,

Erik.

Received on Thu Nov 8 16:57:49 2007

This is an archived mail posted to the Subversion Dev mailing list.