
RE: svn commit: r1358022 - in /subversion/trunk: LICENSE NOTICE subversion/include/svn_utf.h subversion/libsvn_subr/utf_width.c subversion/svn/file-merge.c

From: Bert Huijben <bert_at_qqmail.nl>
Date: Mon, 9 Jul 2012 14:47:25 +0200

> -----Original Message-----
> From: stsp_at_apache.org [mailto:stsp_at_apache.org]
> Sent: Friday, 6 July 2012 04:21
> To: commits_at_subversion.apache.org
> Subject: svn commit: r1358022 - in /subversion/trunk: LICENSE NOTICE
> subversion/include/svn_utf.h subversion/libsvn_subr/utf_width.c
> subversion/svn/file-merge.c
>
> Author: stsp
> Date: Fri Jul 6 02:20:40 2012
> New Revision: 1358022
>
> URL: http://svn.apache.org/viewvc?rev=1358022&view=rev
> Log:
> Add support for determining the display width of a unicode string, and make
> use of this when trimming lines for display by the internal file merge tool.
>
> Based on suitably licensed code by Markus Kuhn:
> http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
>
> * LICENSE, NOTICE: Add Markus Kuhn copyright notices.
>
> * subversion/include/svn_utf.h
> (svn_utf_cstring_utf8_width): Declare.
>
> * subversion/libsvn_subr/utf_width.c: New file, implementation of the above
> newly declared function. This file is based on Markus Kuhn's code, and has
> been adapted to use APR types. The svn_utf_cstring_utf8_width() function
> was written from scratch and replaces Markus' mk_wcswidth() function.
>
> * subversion/svn/file-merge.c
> (MAX_LINE_DISPLAY_LEN): Rename to ...
> (LINE_DISPLAY_WIDTH): ... this, because it is not a maximum but a fixed
> length since all lines are either trimmed or padded to this length.
> (prepare_line_for_display): Use the new character display width support
> to properly detect the width of unicode characters (tested with a bunch
> of asian texts from wikipedia). Track rename of MAX_LINE_DISPLAY_LEN.

How do you check whether the file you are merging is valid UTF-8?

I assumed that we currently pass file contents to the console mostly unmodified and let the terminal do the hard work.

I'm pretty sure that many (if not most) text files stored in Subversion are *not* UTF-8 and will fail a UTF-8 validity check.

How does this library handle non-UTF-8 strings?
(Just assuming width 1 would probably be safe for our usage, but it should never crash.)
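
To make the "assume width 1" fallback concrete, here is a minimal sketch in plain C (no APR). Every sketch_* name is hypothetical and is not part of Subversion or of Markus Kuhn's wcwidth.c. The wide-character ranges are a rough subset for illustration only, and combining and control characters are ignored. The point is just that malformed bytes are counted as one column each instead of being rejected or causing a crash.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one UTF-8 sequence from the LEN bytes at S.  On success store the
   code point in *CP and return the number of bytes consumed (1..4).
   Return 0 if the bytes at S are not a well-formed UTF-8 sequence. */
static size_t
sketch_decode_utf8(const unsigned char *s, size_t len, uint32_t *cp)
{
  size_t need, i;
  uint32_t c, min;

  if (len == 0)
    return 0;

  if (s[0] < 0x80)
    {
      *cp = s[0];
      return 1;
    }
  else if ((s[0] & 0xE0) == 0xC0)
    { need = 2; c = s[0] & 0x1F; min = 0x80; }
  else if ((s[0] & 0xF0) == 0xE0)
    { need = 3; c = s[0] & 0x0F; min = 0x800; }
  else if ((s[0] & 0xF8) == 0xF0)
    { need = 4; c = s[0] & 0x07; min = 0x10000; }
  else
    return 0; /* stray continuation byte or invalid lead byte */

  if (len < need)
    return 0; /* truncated sequence */

  for (i = 1; i < need; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0; /* not a continuation byte */
      c = (c << 6) | (s[i] & 0x3F);
    }

  /* Reject overlong forms, surrogates and values beyond U+10FFFF. */
  if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return 0;

  *cp = c;
  return need;
}

/* Rough column width of CP: 2 for a few East Asian wide ranges, else 1.
   Simplified on purpose; a real table is much more detailed. */
static int
sketch_codepoint_width(uint32_t cp)
{
  if ((cp >= 0x1100 && cp <= 0x115F)
      || (cp >= 0x2E80 && cp <= 0xA4CF && cp != 0x303F)
      || (cp >= 0xAC00 && cp <= 0xD7A3)
      || (cp >= 0xF900 && cp <= 0xFAFF)
      || (cp >= 0xFF00 && cp <= 0xFF60))
    return 2;
  return 1;
}

/* Display width of the NUL-terminated string STR.  Bytes that do not form
   valid UTF-8 are counted as one column each, so this never fails. */
size_t
sketch_display_width(const char *str)
{
  const unsigned char *p = (const unsigned char *)str;
  size_t len = strlen(str);
  size_t width = 0;

  while (len > 0)
    {
      uint32_t cp;
      size_t n = sketch_decode_utf8(p, len, &cp);

      if (n == 0)
        {
          width += 1; /* malformed byte: assume one column */
          n = 1;
        }
      else
        width += sketch_codepoint_width(cp);

      p += n;
      len -= n;
    }
  return width;
}
```

With this approach a Latin-1 line such as "caf\xE9" simply counts as four columns, which may be off from what the terminal actually renders, but the trimming and padding code keeps working on arbitrary input.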

        Bert
Received on 2012-07-09 14:48:05 CEST

This is an archived mail posted to the Subversion Dev mailing list.
