
Support for multibyte character encodings, particularly in the diff/merge code

From: Alastair Houghton <alastair_at_alastairs-place.net>
Date: 2005-06-24 15:36:00 CEST

Hi all,

I was wondering what the status was with regard to support for
multibyte encodings (and in particular those that don't look like
ASCII, such as UTF-16 or UCS-2)?

I could only find one issue reported on this subject, and that was
closed and marked invalid with an instruction to see the big red text
on the issue page.

The reason I'm asking is that I want to store some OS X .strings
files in Subversion; currently they are marked as binary and
Subversion refuses to diff them (and will probably not merge them by
itself). How difficult would it be to add the necessary code to
support such cases in the diff library?

I wouldn't mind if it didn't support cross-encoding diff/merge, as
that's complicated and users can easily do that kind of diff by using
iconv to convert both files to the same encoding first, but surely it
wouldn't be too hard to get it to support the same-encoding case?
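
To make the same-encoding case concrete, here is a minimal sketch in Python. It is not Subversion code and says nothing about how diff_file.c works. It only assumes that both revisions are known to be in one encoding (UTF-16 here), decodes them to text, and runs an ordinary line-based diff. The function name diff_same_encoding and the sample .strings content are made up for illustration.

```python
import difflib


def diff_same_encoding(old_bytes, new_bytes, encoding="utf-16",
                       old_name="old", new_name="new"):
    """Decode both inputs from the same encoding and return a unified
    diff as a list of text lines (empty if the contents are identical)."""
    # Decoding with "utf-16" consumes a leading byte-order mark if present.
    old_lines = old_bytes.decode(encoding).splitlines(keepends=True)
    new_lines = new_bytes.decode(encoding).splitlines(keepends=True)
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old_name, tofile=new_name))


if __name__ == "__main__":
    # Hypothetical .strings content, encoded as UTF-16 with a BOM.
    old = '"greeting" = "Hello";\n"farewell" = "Bye";\n'.encode("utf-16")
    new = '"greeting" = "Hello";\n"farewell" = "Goodbye";\n'.encode("utf-16")
    print("".join(diff_same_encoding(old, new)))
```

This is the same idea as running iconv -f UTF-16 -t UTF-8 on both files and then diffing the results, just done in one step.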

I took a look at the code, and clearly diff_file.c has to change;
where else in the code would need changes to make this work?

On an ancillary note, will Subversion treat files as binary if the
svn:mime-type is set to something like the following?

   text/plain; charset=UTF-16

Or are there plans to add svn:character-set or svn:encoding?

Kind regards,



Received on Fri Jun 24 18:21:27 2005

This is an archived mail posted to the Subversion Users mailing list.
