Re: Subversion binary file detection is look like broken

From: Stefan Sperling <stsp_at_elego.de>
Date: Fri, 3 Oct 2014 13:35:01 +0200

On Fri, Oct 03, 2014 at 11:26:32AM +0400, Navrotskiy Artem wrote:
> Hello,
>
> The Subversion console client tries to detect binary files with this algorithm:
>
> 1. The file is NOT BINARY if it contains only the UTF-8 BOM signature (why not
> check whether the first N bytes are valid UTF-8?);
> 2. The file is BINARY if the first 1024 bytes contain a ZERO byte (with a
> uniform distribution of bytes, the chance that no ZERO byte is
> present is (1 - 1 / 256) ^ 1024 = ~1.8%);
> 3. The file is BINARY if over 85% of the first 1024 bytes are outside
> the ranges 0x07-0x0D and 0x20-0x7F (in total, 153 of the 256 byte
> values count as "binary", ~60%).
>
> This algorithm looks broken.
>
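
For reference, the three rules quoted above can be sketched in a few lines
of Python. This only illustrates the description in this mail and is not
Subversion's actual source; the names looks_binary, is_text_byte,
SAMPLE_SIZE and BINARY_THRESHOLD are made up for the sketch, and treating
an empty file as not binary is an assumption.

```python
SAMPLE_SIZE = 1024        # number of leading bytes examined (rules 2 and 3)
BINARY_THRESHOLD = 0.85   # fraction of "binary" bytes that triggers rule 3
UTF8_BOM = b"\xef\xbb\xbf"


def is_text_byte(b: int) -> bool:
    """True for byte values in 0x07-0x0D or 0x20-0x7F."""
    return 0x07 <= b <= 0x0D or 0x20 <= b <= 0x7F


def looks_binary(data: bytes) -> bool:
    """Apply the three quoted rules to the start of a file's contents."""
    block = data[:SAMPLE_SIZE]
    if not block:
        return False              # assumption: an empty file is not binary
    if block == UTF8_BOM:
        return False              # rule 1: nothing but a UTF-8 BOM
    if 0 in block:
        return True               # rule 2: a zero byte in the sample
    binary_count = sum(1 for b in block if not is_text_byte(b))
    return binary_count / len(block) > BINARY_THRESHOLD   # rule 3


# The two figures from the quoted mail, recomputed:
p_no_zero = (1 - 1 / 256) ** SAMPLE_SIZE                           # about 0.018
binary_values = sum(1 for b in range(256) if not is_text_byte(b))  # 153
```

With these definitions, a short all-Cyrillic UTF-8 string such as "Привет"
is reported as binary, because every one of its bytes is above 0x7F.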

Can you suggest a better algorithm?
Received on 2014-10-03 13:35:33 CEST

This is an archived mail posted to the Subversion Dev mailing list.
