
Re: Enforcing UTF-8 Coding

From: B Smith-Mannschott <bsmith.occs_at_gmail.com>
Date: Wed, 17 Dec 2008 00:16:48 +0100

On Tue, Dec 16, 2008 at 10:08 PM, David Weintraub <qazwart_at_gmail.com> wrote:

> I've been given the order to make sure all of our files are encoded in
> UTF-8. There are several problems: First of all, we use Eclipse on
> Windows as our development platform, and the default in that
> application is to use the Windows code page, and setting that is up to
> the user.
>
> I could try compiling Java with the -encoding parameter. In Java 1.6,
> the compile will fail when it hits a character that isn't properly
> encoded. Actually, it fails when it finds a character whose encoding
> it doesn't understand. The encoding could be wrong, so it is
> set to the wrong character, but there's no way for the compiler to know
> that.
>
> Now comes the question of our HTML, CSS, XML, JavaScript, and XSL
> files. Since the compiler doesn't run through these, how can I ensure
> that these are properly encoded too?
>
> If I had a way of checking the encoding of files, I could write a
> pre-commit hook to fail the build if the encoding on a file is wrong,
> but how do I do this?
>

What I do in my hook script is something like this:

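def probably_utf8(file_like):
    # file_like must return bytes from read() (open files in binary mode).
    try:
        file_like.read().decode("UTF-8")
    except UnicodeDecodeError:
        # At least one byte sequence is not valid UTF-8.
        return False
    else:
        return True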

This won't catch every theoretically possible violation (a file in another
encoding can happen to be valid UTF-8 too, pure ASCII being the obvious
case), but it's more than good enough to keep everyone honest.
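
For anyone wanting to wire a check like this into a pre-commit hook, here is a
rough sketch in Python 3 of how it might look. It assumes the hook is called
with the repository path and transaction name, and it uses `svnlook changed`
and `svnlook cat` to read the files in the transaction. The extension list and
the names is_utf8, changed_paths and main are made up for illustration; this
is not the actual hook script described above.

```python
import subprocess
import sys

# Hypothetical list of extensions to check; adjust to your project.
TEXT_EXTENSIONS = (".java", ".html", ".css", ".xml", ".js", ".xsl")


def is_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def changed_paths(svnlook_output):
    """Yield added/modified file paths from `svnlook changed` output."""
    for line in svnlook_output.splitlines():
        # The first four columns hold the status flags; the path follows.
        status, path = line[:4].strip(), line[4:]
        # Skip deletions, property-only changes, and directories.
        if status.startswith(("D", "_")) or path.endswith("/"):
            continue
        if path.lower().endswith(TEXT_EXTENSIONS):
            yield path


def main(repos, txn):
    """Return 1 (reject the commit) if any checked file is not valid UTF-8."""
    out = subprocess.check_output(["svnlook", "changed", "-t", txn, repos])
    bad = []
    for path in changed_paths(out.decode("utf-8")):
        data = subprocess.check_output(
            ["svnlook", "cat", "-t", txn, repos, path])
        if not is_utf8(data):
            bad.append(path)
    if bad:
        # Whatever the hook writes to stderr is shown to the committer.
        sys.stderr.write("Not valid UTF-8:\n  " + "\n  ".join(bad) + "\n")
        return 1
    return 0


# Subversion runs the pre-commit hook with two arguments: REPOS and TXN.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

A non-zero exit status makes Subversion reject the commit. Filtering by
extension is only one way to decide which files to check; looking at the
svn:mime-type property would be another.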

// ben

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=985295

Received on 2008-12-17 00:17:47 CET

This is an archived mail posted to the Subversion Users mailing list.
