On 2007-08-14 16:43:19 +0200, Erik Huelsmann wrote:
> On 8/14/07, Vincent Lefevre <email@example.com> wrote:
> > On 2007-08-14 16:25:10 +0200, Erik Huelsmann wrote:
> > > There's an algorithm to estimate whether files are binary or texty:
> > >
> > > Check the first 1024 bytes to be within the 0x20-0x7F and 0x07-0x0D
> > > regions. If more than 85% of the bytes fall in that region (and none
> > > were 0x00), then the file is probably texty.
> > I wonder if non-occidental users would agree with you.
> They don't have to. This is what currently defines texty and we've had
> no complaints. It's based on what diff considers texty.
This is not true (see below).
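For reference, the heuristic quoted above can be sketched roughly as follows. This is only an illustration in Python, not Subversion's actual code: the function name looks_texty, its parameters, and the choice to treat an empty sample as text are assumptions made for the sketch.

```python
def looks_texty(data: bytes, sample_size: int = 1024, threshold: float = 0.85) -> bool:
    """Guess whether data is text, following the heuristic described above.

    Hypothetical helper: the name, defaults and empty-input behaviour are
    assumptions for illustration, not taken from Subversion.
    """
    sample = data[:sample_size]
    if not sample:
        # Assumption: an empty file is treated as text.
        return True
    if 0x00 in sample:
        # Any NUL byte in the sample means "binary".
        return False
    # Count bytes in the 0x20-0x7F and 0x07-0x0D ranges.
    texty = sum(1 for b in sample if 0x20 <= b <= 0x7F or 0x07 <= b <= 0x0D)
    # "More than 85%" of the sampled bytes must fall in those ranges.
    return texty > threshold * len(sample)


ascii_text = b"hello world\n"
utf16_text = "hello".encode("utf-16-le")
utf8_cjk_text = "日本語のテキスト".encode("utf-8")
```

With this sketch, plain ASCII passes, UTF-16 text fails because of its 0x00 bytes, and UTF-8 text made only of non-ASCII characters fails the 85% test since every byte is above 0x7F.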
> > And what about UTF-16?
> There's no support for wide characters in the built-in diff routine.
> You can use external diff routines, or provide a patch to support
There have been some complaints concerning UTF-16 (but the threads also
mention the problem of UTF-8 sometimes being recognized as binary), and
there's even an open issue:
> > One can have compressed XML files with text/xml mime-type. How does
> > Subversion handle that?
> As incorrectly as the mime-type. Clearly a compressed XML file isn't
> text. More appropriate seems application/xml. Or even
No, this is wrong. For instance, see /etc/mime.types distributed in
# Note: Compression schemes like "gzip", "bzip", and "compress" are not
# actually "mime-types". They are "encodings" and hence must _not_ have
# entries in this file to map their extensions. The "mime-type" of an
# encoded file refers to the type of data that has been encoded, not the
# type of encoding.
Apache behaves the same way: the compression is declared in a separate
header (Content-Encoding). That's HTTP/1.1 (RFC 2616) after all...
> > Also, for instance, is text/rtf more textual than application/x-sh
> > as far as diff is concerned?
> Yes, because it doesn't have a text/* mime-type.
But that's wrong: doing a textual diff on sh scripts makes more sense
than doing one on RTF files. Again, there have been several complaints.
Vincent Lefèvre <vincent_at_vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)
Received on Wed Aug 15 00:47:18 2007