2009/10/1 B Smith-Mannschott <bsmith.occs_at_gmail.com>:
>
>
> 2009/10/1 David Weintraub <qazwart_at_gmail.com>:
>> We are beginning to have problems with file encoding. We want to ensure that all files we commit are in fact encoded in UTF-8. I would like to add this check to my pre-commit hook and reject any commit containing files that aren't encoded in UTF-8 (well, text files). But I am not 100% sure how to test a file's encoding.
>>
>> How can I test to see if a file is encoded in UTF-8?
>
> I just do something like this. It works well enough in practice, since not all possible byte sequences are valid UTF-8.
>
> def looks_like_utf8(bytes):
>     """Attempt to decode bytes under the assumption that they are
>     UTF-8. Return False if this throws a UnicodeDecodeError, otherwise
>     return True."""
>     try:
>         bytes.decode("UTF-8")
>     except UnicodeDecodeError:
>         return False
>     else:
>         return True
>
> def looks_like_utf8_file(path):
>     # Read the raw bytes in binary mode and close the file when done.
>     with open(path, "rb") as f:
>         return looks_like_utf8(f.read())
G*D D**N F***$^#&^! gmail. see attachment.
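
For use in a hook it may help to say where decoding failed, not just that it failed. Here is a rough sketch building on the same decode-and-catch idea. The names first_invalid_utf8_offset and describe_utf8_problem are made up for illustration and are not part of the original snippet or of Subversion. It relies on UnicodeDecodeError carrying a start attribute with the offset of the offending byte.

```python
def first_invalid_utf8_offset(data):
    """Return the byte offset of the first sequence that is not valid
    UTF-8, or None if all of data decodes cleanly.
    (Hypothetical helper, not from the original post.)"""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that failed to decode.
        return err.start
    return None


def describe_utf8_problem(name, data):
    """Build a one-line message for a rejected file, or return None
    if data looks like UTF-8. (Hypothetical helper.)"""
    offset = first_invalid_utf8_offset(data)
    if offset is None:
        return None
    return "%s: invalid UTF-8 at byte offset %d" % (name, offset)
```

The same caveat applies as before: this only shows that the bytes could be UTF-8. Plain ASCII always passes, and some text in other encodings can happen to decode cleanly as well.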
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2402662
Received on 2009-10-01 21:39:53 CEST