
Re: Ensuring File Encoding

From: B. Smith-Mannschott <bsmith.occs_at_gmail.com>
Date: Thu, 1 Oct 2009 21:39:01 +0200

2009/10/1 B Smith-Mannschott <bsmith.occs_at_gmail.com>:
> 2009/10/1 David Weintraub <qazwart_at_gmail.com>:
>> We are beginning to have problems with file encoding. We want to ensure all files we commit are in fact encoded in UTF-8. I would like to add this ability to my pre-commit hook, and reject any commit containing files that aren't encoded in UTF-8 (well, text files). But I am not 100% sure how to test a file's encoding.
>> How can I test to see if a file is encoded in UTF-8?
> I just do something like this. It works well enough in practice, since not all possible byte sequences are valid UTF-8.
> def looks_like_utf8(bytes):
>     """Attempt to decode bytes under the assumption that they are
>     UTF-8. Return False if this throws a UnicodeDecodeError, otherwise
>     return True."""
>     try:
>         bytes.decode("UTF-8")
>     except UnicodeDecodeError:
>         return False
>     else:
>         return True
>
> def looks_like_utf8_file(path):
>     # Read the raw bytes; the with-block closes the file promptly.
>     with open(path, "rb") as f:
>         return looks_like_utf8(f.read())

G*D D**N F***$^#&^! gmail. see attachment.
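
For what it's worth, here is a rough sketch of how the same check might be wired into a pre-commit hook. It is not a tested hook, just an illustration under some assumptions: the names changed_files, main and TEXT_SUFFIXES are made up for this example, the suffix list is a stand-in for however you decide what counts as a "text file", and it assumes svnlook is on the hook's PATH. It relies only on "svnlook changed -t TXN REPOS" and "svnlook cat -t TXN REPOS PATH".

```python
import subprocess
import sys

# Hypothetical list of suffixes treated as text; adjust to taste.
TEXT_SUFFIXES = (".txt", ".py", ".java", ".xml")


def looks_like_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def changed_files(svnlook_changed_output):
    """Extract non-deleted file paths from `svnlook changed` output.

    Each line starts with a four-character status field (for example
    "A   " or "_U  ") followed by the path; directories end with "/".
    """
    paths = []
    for line in svnlook_changed_output.splitlines():
        status, path = line[:4], line[4:]
        if status.startswith("D") or path.endswith("/"):
            continue  # nothing to check for deletions and directories
        paths.append(path)
    return paths


def main(repos, txn):
    """Return 1 (reject) if any changed text file is not valid UTF-8."""
    changed = subprocess.check_output(
        ["svnlook", "changed", "-t", txn, repos]).decode("utf-8")
    bad = []
    for path in changed_files(changed):
        if not path.endswith(TEXT_SUFFIXES):
            continue  # assumption: only these suffixes are text
        content = subprocess.check_output(
            ["svnlook", "cat", "-t", txn, repos, path])
        if not looks_like_utf8(content):
            bad.append(path)
    if bad:
        # Output on stderr is what the committing client gets to see.
        sys.stderr.write("Not valid UTF-8:\n")
        for path in bad:
            sys.stderr.write("  %s\n" % path)
        return 1
    return 0
```

Subversion calls the pre-commit hook with the repository path and the transaction name as its two arguments, so the script would end with something like sys.exit(main(sys.argv[1], sys.argv[2])). Keep in mind that plain ASCII always passes, and that some files in other encodings can happen to be valid UTF-8 byte sequences too, so this catches most mistakes rather than all of them.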



Received on 2009-10-01 21:39:53 CEST

This is an archived mail posted to the Subversion Users mailing list.