
Re: Temporary implementation of 'blame of UTF-16 file'

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Thu, 2 Aug 2012 00:38:48 +0200

On Mon, Jul 30, 2012 at 1:44 AM, 김성훈2 [duinggul] <duinggul_at_nexon.co.kr> wrote:
> Hi, I'm using Subversion everyday. :-)
> Recently, I converted native text files ( in my company's project repository
> ) to UTF-16 files.
> The problem was that Subversion library does not support 'blame of UTF-16
> file' currently.
> So I checked out Subversion's source code
> And modified 'libsvn_client\blame.c' file to support 'blame of UTF-16 file'.
> Brief idea of implementation is :
> Export current file ( in svn temp directory ) to UTF-8 file if current
> file's format is UTF-16,
> shortly before processing blame for current file.
> It's a temporary implementation rather than formal implementation,
> But I think the implementation can be used temporarily before formal
> implementation of 'blame of UTF-16 file' is made. :-)
> So I attach 'blame.c' file for reference. :-)

[ Could you send a patch against trunk, instead of the full blame.c
file? Please also take a look at
http://subversion.apache.org/docs/community-guide/ in general and at
in particular. I'm continuing below for the sake of having some
discussion around this, regardless of the details of your patch. ]

I think yours is an interesting approach, at least worth some
discussion :-). I'm not a UTF-16 user myself, and I'm definitely not
an expert in encoding matters, but I can certainly empathize with
attempts to make Subversion work nicely with UTF-16.

There seems to have been some discussion around UTF-16 support in SVN
in 2005, after this issue was filed:
http://subversion.tigris.org/issues/show_bug.cgi?id=2194 (Support
Unicode encodings other than UTF8 as plain text). The issue links to a
couple of old discussion threads.

For instance, there is this thread which highlights a couple of areas
where specific support would have to be added:
* Diff
* Merge
* Keyword expansion
* Newline conversion
* Text/binary discrimination
... any others not thought about here?

Here you are taking on "blame" (which is really just a series of
diffs). It's interesting that this can be done with so little
effort, just by performing conversion-to-UTF8 at the client layer.
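
To make the idea concrete for the discussion, here is a rough sketch of
"convert to UTF-8 first, then diff". It is Python rather than C, it is not
the attached blame.c, and it uses no Subversion API: the function names
to_utf8_for_blame and changed_lines are made up for illustration, and
detecting UTF-16 by its byte order mark is my assumption about how the
file's format could be recognized.

```python
import codecs
import difflib
from typing import List


def to_utf8_for_blame(raw: bytes) -> bytes:
    """Re-encode as UTF-8 if the content starts with a UTF-16 BOM.

    Assumption: UTF-16 is recognized by its BOM only. Content without a
    BOM is returned unchanged. (A UTF-32 LE BOM also starts with FF FE,
    so a real implementation would need a stricter check.)
    """
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order
        # and drops the BOM from the decoded text.
        return raw.decode("utf-16").encode("utf-8")
    return raw


def changed_lines(old: bytes, new: bytes) -> List[str]:
    """Lines that are new or modified in `new`, compared after conversion.

    Blame repeats this kind of comparison for each revision pair.
    """
    old_lines = to_utf8_for_blame(old).decode("utf-8").splitlines()
    new_lines = to_utf8_for_blame(new).decode("utf-8").splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines)
    result: List[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            result.extend(new_lines[j1:j2])
    return result
```

With both revisions normalized this way, the line-based comparison sees
ordinary newline-terminated text instead of UTF-16 code units, which is
presumably why so little else has to change for blame.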

I'm not sure myself if that's an appropriate solution, even for a
"temporary solution". But on the other hand, it seems there has been
zero progress on this issue since 2005, so it might be worth it to
think outside the box, and to look at some lightweight approaches that
can give UTF-16 users some improvement. If this can be done
incrementally, by adding support for specific subcommands by adding
some conversions ... why not?

Received on 2012-08-02 00:39:43 CEST

This is an archived mail posted to the Subversion Dev mailing list.