Re: SVN Blame Returns Corrupt Data

From: Stefan Sperling <stsp_at_elego.de>
Date: Fri, 11 Oct 2013 19:25:19 +0200

On Fri, Oct 11, 2013 at 09:52:31AM -0700, Ben Reser wrote:
> On 10/11/13 9:22 AM, Branko Čibej wrote:
> > You'd have to extend Subversion's file type detection to detect UTF-16.
> > See svn_io_detect_mimetype2 in line 3333 in this file:
> >
> > http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_subr/io.c?view=markup
> > Subversion currently only looks at the first 1k Bytes of a file. It may
> > be enough to check that this initial part of the file contains only
> > valid UTF-16 (BE or LE) codes.
>
> Even if all we looked for is the BOM it might be helpful enough. I suspect the
> development tools producing UTF-16 are including BOMs. Windows seems to be
> fond of including them; Notepad even puts one on UTF-8 files.
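
To make the BOM-only idea concrete, here is a rough Python sketch (not
Subversion code; the function name and the SNIFF_BYTES constant are made up
for illustration). It assumes we only look at the start of the same 1k
buffer mentioned above, and it rules out UTF-32 first because the UTF-32 LE
BOM begins with the same two bytes as the UTF-16 LE BOM.

```python
import codecs

# Assumption taken from the thread: only the first 1k bytes are inspected.
SNIFF_BYTES = 1024


def detect_utf16_bom(head: bytes):
    """Return 'utf-16-le', 'utf-16-be', or None, judging only by a leading BOM.

    Hypothetical helper for illustration; it does not validate the rest of
    the buffer, so BOM-less UTF-16 files are not detected.
    """
    head = head[:SNIFF_BYTES]
    # FF FE 00 00 is the UTF-32 LE BOM, which also starts with FF FE.
    # Treat it as "not UTF-16" here (a UTF-16 LE file whose first character
    # is U+0000 would be misread, which seems an acceptable trade-off).
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return None
    if head.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None
```

A file with a UTF-8 BOM, or with no BOM at all, yields None and would keep
going through the existing detection unchanged.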

Couldn't Subversion automatically convert UTF-16 files to UTF-8 before
processing them for diff/merge/blame, and convert output written to
the original files back to UTF-16?

That would require some work, because the existing streams, strings, and
files passed around in the code would need to be wrapped so that translation
between the internal and the external encoding is seamless.
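
As a very rough illustration of the wrapping I have in mind, here is a
Python sketch (not Subversion code; to_internal and to_external are
hypothetical names). It assumes the file carries a UTF-16 BOM, converts the
content to UTF-8 for processing, and remembers the original encoding so the
result can be written back the same way. Content without a UTF-16 BOM passes
through untouched.

```python
import codecs

# Hypothetical mapping from BOM to the codec used for the file body.
_UTF16_BOMS = (
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def to_internal(raw: bytes):
    """Return (utf8_bytes, original_encoding).

    original_encoding is None when no UTF-16 BOM is present, in which case
    the bytes are returned unchanged. Invalid UTF-16 raises
    UnicodeDecodeError rather than being silently altered.
    """
    for bom, enc in _UTF16_BOMS:
        if raw.startswith(bom):
            text = raw[len(bom):].decode(enc)
            return text.encode("utf-8"), enc
    return raw, None


def to_external(utf8: bytes, original_encoding):
    """Convert processed UTF-8 bytes back to the file's original encoding."""
    if original_encoding is None:
        return utf8
    bom = next(b for b, enc in _UTF16_BOMS if enc == original_encoding)
    return bom + utf8.decode("utf-8").encode(original_encoding)
```

The diff/merge/blame code in the middle would then only ever see UTF-8, e.g.
internal, enc = to_internal(data) before processing and
to_external(result, enc) when writing the working file.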

But I don't see why such an approach couldn't be made to work in principle.
It might even result in some spring cleaning in the code base and pave the
way for improved handling of file formats such as XML for diff and merge.

What do you think? Is it worth adding this to our project ideas page?
Received on 2013-10-11 19:26:06 CEST

This is an archived mail posted to the Subversion Users mailing list.