Re: Heuristic for detecting 'binary' data vs. 'text' data [was: FW: Generating a dump file using a powershell script]

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Tue, 22 Jun 2010 16:25:25 +0100

(I'm just changing the subject line.)
- Julian

On Tue, 2010-06-22 at 16:58 +0200, Bert Huijben wrote:
> > -----Original Message-----
> > From: Geoff Worboys [mailto:geoff_at_telesiscomputing.com.au]
> > Sent: Tuesday, 22 June 2010 16:37
> > To: users_at_subversion.apache.org
> > Subject: Generating a dump file using a powershell script
> >
>
> <snip>
>
> > Q2: When writing the code to try and identify text versus
> > binary files I decided to look at what subversion did ... but
> > now I am confused. In libsvn_subr\io.c function
> > svn_io_detect_mimetype2 a comment says:
> > going to examine the first block of data, and make sure that 85%
> > of the bytes are such that their value is in the ranges 0x07-0x0D
> > or 0x20-0x7F, and that 100% of those bytes is not 0x00.
> > but my reading of this code
> >     if (((binary_count * 1000) / amt_read) > 850)
> >       {
> >         *mimetype = generic_binary;
> >         return SVN_NO_ERROR;
> >       }
> > suggests that it is actually setting the type to binary only
> > if it finds that more than 85% are binary bytes (in earlier code
> > a file is forced to binary if any null byte is found).
> >
> > Can anyone explain this? A bug or am I missing something?
>
> Looking at the code, this looks like a bug to me. But it's not a bug
> that I would like to fix without further review, because the current code
> might work better than the intended behavior for users of different
> character sets.
>
> So it might be safer to just fix the documentation.
>
> Bert
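
To make the discrepancy concrete, here is a minimal sketch in C of the two
readings discussed above. It is not the Subversion source: the function
names (count_binary_bytes, is_binary_as_coded, is_binary_as_documented) are
made up for illustration, and treating an empty block as text is an
assumption. Only the byte ranges, the 85% threshold and the null-byte rule
come from the quoted comment and code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Subversion): count bytes outside the
   ranges 0x07-0x0D and 0x20-0x7F named in the quoted comment. */
static size_t count_binary_bytes(const unsigned char *buf, size_t len)
{
    size_t i, count = 0;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];
        if (c < 0x07 || (c > 0x0D && c < 0x20) || c > 0x7F)
            count++;
    }
    return count;
}

/* Reading 1, as the quoted test behaves: binary only if more than 85% of
   the bytes are non-text, or if any null byte is present. */
static int is_binary_as_coded(const unsigned char *buf, size_t len)
{
    if (len == 0)
        return 0;  /* assumption: an empty block counts as text */
    if (memchr(buf, 0, len) != NULL)
        return 1;
    return (count_binary_bytes(buf, len) * 1000) / len > 850;
}

/* Reading 2, as the comment describes: text only if at least 85% of the
   bytes are in the text ranges and none is null; otherwise binary. */
static int is_binary_as_documented(const unsigned char *buf, size_t len)
{
    size_t text_count;

    if (len == 0)
        return 0;  /* same assumption as above */
    if (memchr(buf, 0, len) != NULL)
        return 1;
    text_count = len - count_binary_bytes(buf, len);
    return (text_count * 1000) / len < 850;
}
```

The two readings disagree on anything between the thresholds. A block that
is half ASCII and half bytes above 0x7F (as text in a non-ASCII character
set can be) is text under reading 1 but binary under reading 2, which is
the kind of case Bert's caution about different character sets refers to.
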
Received on 2010-06-22 17:26:08 CEST
