
Re: svn commit: r1358022 - in /subversion/trunk: LICENSE NOTICE subversion/include/svn_utf.h subversion/libsvn_subr/utf_width.c subversion/svn/file-merge.c

From: Stefan Sperling <stsp_at_apache.org>
Date: Mon, 9 Jul 2012 15:30:35 +0200

On Mon, Jul 09, 2012 at 02:47:25PM +0200, Bert Huijben wrote:
> How do you check if the file you are merging is valid utf-8?

See the merge_chunks() function.

We convert data to UTF-8 from the native (locale) encoding.
This cannot fail (every encoding can be represented in UTF-8)
but the result might look funny in case the file uses some other encoding
than the native one. But that's OK -- this conversion happens only for
display purposes, data in the actual file is never changed, so you can
still edit individual chunks in their original form.
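
To illustrate the display-only conversion, here is a rough sketch in Python
(not the C code in file-merge.c). The name to_display_utf8 and its
native_encoding parameter are made up for this example, and replacing
undecodable bytes is my assumption, not necessarily what merge_chunks() does.

```python
def to_display_utf8(raw: bytes, native_encoding: str) -> bytes:
    """Return a UTF-8 copy of raw for display; raw itself is never modified."""
    # Assumption: bytes that are invalid in native_encoding become U+FFFD.
    text = raw.decode(native_encoding, errors="replace")
    return text.encode("utf-8")


# File content matches the native encoding: the display copy looks right.
latin1_chunk = b"caf\xe9"  # "café" in ISO-8859-1
shown_ok = to_display_utf8(latin1_chunk, "iso-8859-1")

# File is really UTF-8 but the native encoding is ISO-8859-1: the
# conversion still succeeds, the display copy just looks funny.
utf8_chunk = b"caf\xc3\xa9"  # "café" in UTF-8
shown_funny = to_display_utf8(utf8_chunk, "iso-8859-1")
```

In both cases the original bytes are left alone, so a chunk can still be
edited in its original form.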

> I assumed that we currently just passed files to the console mostly unmodified to allow the terminal to do the hard work.

That works fine as long as you don't care about the width of the
line you're printing.

For the side-by-side display we make an effort to make it look nice.
If that doesn't work, the side-by-side display might look strange
because lines appear with varying lengths. That is the fallback mode
which assumes width=1 and one-byte-per-character for all characters.
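
A small Python sketch of why the width matters for the side-by-side
display. This only approximates what a wcwidth()-style function does, using
Python's unicodedata module. The helper names (display_width,
fallback_width, pad_to) are hypothetical and are not Subversion functions.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate the number of terminal columns text occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks take no column of their own
        # East Asian Wide/Fullwidth characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def fallback_width(raw: bytes) -> int:
    """Fallback mode: assume width=1 and one byte per character."""
    return len(raw)


def pad_to(text: str, columns: int) -> str:
    """Pad text with spaces so the right-hand pane starts at a fixed column."""
    return text + " " * max(0, columns - display_width(text))


line = "日本語"
proper = display_width(line)                 # columns actually used
guessed = fallback_width(line.encode("utf-8"))  # what the fallback assumes
```

When the two numbers differ, padding computed from the fallback value puts
the separator in the wrong column, which is the "varying lengths" effect.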
 
> I'm pretty sure that we can assume at least many (if not most) text files stored in Subversion are *not* utf-8 and will fail when tested for utf-8 validness.
>
> How does this library handle non-utf8 strings?

You mean the svn_utf_cstring_utf8_width() function? It will return
an error for invalid UTF-8.

In our usage of this API, the UTF-8 validity check is performed
on data that the merge tool has converted to UTF-8. The API must fail for
invalid UTF-8 input since it cannot convert such input to UTF-32 in
order to run mk_wcwidth() on it.
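
The fail-on-invalid-input behaviour can be sketched like this in Python. It
is not the implementation of svn_utf_cstring_utf8_width(); utf8_width and
codepoint_width are hypothetical names, and codepoint_width is only a rough
stand-in for mk_wcwidth() based on Python's unicodedata module.

```python
import unicodedata


def codepoint_width(cp: int) -> int:
    """Rough stand-in for a wcwidth()-style lookup on one code point."""
    ch = chr(cp)
    if unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def utf8_width(raw: bytes) -> int:
    """Return the display width of UTF-8 data, or raise ValueError."""
    try:
        text = raw.decode("utf-8")  # strict decoding rejects invalid UTF-8
    except UnicodeDecodeError as err:
        # Without valid UTF-8 there are no code points to measure.
        raise ValueError("invalid UTF-8 input") from err
    codepoints = [ord(ch) for ch in text]  # the UTF-32 values
    return sum(codepoint_width(cp) for cp in codepoints)
```

A caller that gets the error can then drop back to the width=1,
one-byte-per-character assumption.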

Again, this is in-memory data which we're going to display to the user
in a formatted way, so we need to know its width.
None of this has anything to do with any versioned data in files.
Received on 2012-07-09 15:31:14 CEST

This is an archived mail posted to the Subversion Dev mailing list.
