
Re: svn commit: r1358022 - in /subversion/trunk: LICENSE NOTICE subversion/include/svn_utf.h subversion/libsvn_subr/utf_width.c subversion/svn/file-merge.c

From: Stefan Sperling <stsp_at_elego.de>
Date: Mon, 9 Jul 2012 16:48:00 +0200

On Mon, Jul 09, 2012 at 04:04:42PM +0200, Johan Corveleyn wrote:
> On Mon, Jul 9, 2012 at 3:30 PM, Stefan Sperling <stsp_at_apache.org> wrote:
> > On Mon, Jul 09, 2012 at 02:47:25PM +0200, Bert Huijben wrote:
> >> How do you check if the file you are merging is valid utf-8?
> >
> > See the merge_chunks() function.
> >
> > We convert data to UTF-8 from the native (locale) encoding.
> > This cannot fail (every encoding can be represented in UTF-8)
> > but the result might look funny in case the file uses some other encoding
> > than the native one. But that's OK -- this conversion happens only for
> > display purposes, data in the actual file is never changed, so you can
> > still edit individual chunks in their original form.
>
> I'm a bit confused (encoding issues always confuse me). If we only
> care about the width of the string for display purposes, doesn't this
> (also) depend on the encoding used by the console / terminal? How does
> that actually work: if you have a UTF-8 encoded file, and you 'cat' it
> to a terminal with LC_ALL=iso_8859_1 ... ?

Our cmdline output routines accept UTF-8 and try to convert back to the
locale's native encoding before printing. If this conversion fails,
it falls back to svn_cmdline_cstring_from_utf8_fuzzy() which will
create some ASCII-representation of the data.

So what will happen in that case is that you'll see whatever Unicode
characters latin1 can represent as-is, while the others are converted
in a fuzzy way. This might lead to misaligned side-by-side diff output.
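
To make the fallback concrete, here is a rough Python sketch of the idea.
It is not Subversion's code, and to_native_fuzzy() is a made-up name; the
backslash escapes are Python's, and the real output of
svn_cmdline_cstring_from_utf8_fuzzy() may look different. It only shows
the shape of "convert to the native encoding, else fall back to an ASCII
representation":

```python
def to_native_fuzzy(text: str, encoding: str) -> bytes:
    """Encode text for the terminal, escaping what the encoding cannot hold.

    Hypothetical helper for illustration only, not a Subversion API.
    """
    try:
        # Normal case: every character exists in the native encoding.
        return text.encode(encoding)
    except UnicodeEncodeError:
        # Fallback: keep representable characters as-is and replace the
        # rest with an ASCII escape sequence (Python's own escape format).
        return text.encode(encoding, errors="backslashreplace")


print(to_native_fuzzy("café", "latin-1"))     # b'caf\xe9'
print(to_native_fuzzy("日本 café", "latin-1"))  # b'\\u65e5\\u672c caf\xe9'
```

The second string comes out longer than the original, which is where the
alignment trouble starts.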

However if you're trying to display unicode data on a terminal
that isn't unicode capable then such issues are the norm rather
than the exception.

In general, if your terminal can display your files, then the
side-by-side diff will also be shown properly. Else, the side-by-side
diff might look OK, or it might not, depending on how much longer
the "fuzzy" representation of the string really is.
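
As a rough illustration of how much longer the "fuzzy" form can get, the
Python sketch below compares approximate column widths before and after
escaping. display_width() and fuzzy() are names invented for this example,
the width rule is a simplification based on Python's unicodedata module
(not the code in utf_width.c), and the escape format is Python's rather
than Subversion's:

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate terminal columns: wide/fullwidth = 2, combining = 0, else 1."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks occupy no column of their own
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def fuzzy(text: str, encoding: str) -> str:
    """Return what a terminal using `encoding` would show after escaping."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


original = "日本 café"
shown = fuzzy(original, "latin-1")
print(display_width(original))  # 9
print(display_width(shown))     # 17
```

A column sized for 9 cells ends up holding 17, so the other side of the
diff gets pushed out of line. With a string that latin1 can represent in
full, both widths are equal and the layout is unaffected.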

Configure your locale properly and you won't have an issue.
Received on 2012-07-09 16:48:41 CEST

This is an archived mail posted to the Subversion Dev mailing list.