
Re: Ascii/binary detection.

From: <cmpilato_at_collab.net>
Date: 2001-08-01 16:39:11 CEST

Branko Čibej <brane@xbc.nu> writes:

> Got a big, fat, orange fluorescent marker here, hur hur hur!

Yeah-heh!

> Right. Note that when we were discussing this (damn, don't have that
> archive any more ...), someone pointed out that the "text/*" mime types
> actually imply CRLF line endings. But I think we can safely ignore that;
> Subversion is not an MUA.

Indeed, and I confirmed the CRLF thing yesterday while reading the RFCs.
Our plan, I think, is to encourage folks to use text/unknown for text
files whose line endings are unimportant. Whatcha think?

> (Two suggestions: a) don't mark the file as binary just because there's
> a byte with value >= 128 in it; b) if other tests aren't conclusive,
> check for extremely long lines in the file?)

Good suggestions. I've done no research into common heuristics on
this. I think one used by some text/hex editors is to see whether some
percentage of the bytes are either 0 or >= 128. 35% seems to ring a
bell, but whatever. The point is that some work needs to be done to
create the Subversion Binariness Heuristic, and your suggestions are
good ones.
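
To make the discussion concrete, here is a rough sketch of how the
ideas above might combine: count the bytes that are 0 or >= 128, call
the file binary only if they make up a large share of the content, and
fall back to a long-line check when that is inconclusive. The function
name looks_binary, the 35% threshold, and the 8192-byte line limit are
all assumptions for illustration; none of this is Subversion code.

```python
def looks_binary(data: bytes, threshold: float = 0.35,
                 max_line_len: int = 8192) -> bool:
    """Guess whether file content is binary (illustrative sketch only)."""
    if not data:
        return False  # assumption: treat an empty file as text

    # Count bytes that are NUL or have the high bit set.
    suspicious = sum(1 for b in data if b == 0 or b >= 128)

    # A single high byte is not enough; only a large share counts.
    if suspicious / len(data) >= threshold:
        return True

    # Inconclusive so far: check for an extremely long line.
    longest = max(len(line) for line in data.split(b"\n"))
    return longest > max_line_len
```

With these assumed defaults, a Latin-1 text file with a few accented
characters stays "text", while a file with no line breaks for more
than 8192 bytes is flagged as binary.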

> >2. During `svn add', svn_io_is_binary_file () is called (only on
> > files, of course). If it returns TRUE, the property
> > `svn:mime-type' is set on the file with a value of
> > `application/octet-stream'.
> >
> What do you think about following the HTTP convention here? Call the
> property svn:content-type, and encode the character set, too? Not that
> we'll do anything with that info in 1.0.

I definitely thought about that while reading RFCs, and am certainly
not opposed to it.
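
For reference, a value that follows the HTTP convention carries the
MIME type plus an optional charset parameter. A minimal sketch of
splitting such a value, assuming a hypothetical helper named
parse_content_type (not part of any Subversion API):

```python
from typing import Optional, Tuple


def parse_content_type(value: str) -> Tuple[str, Optional[str]]:
    """Split 'type/subtype; charset=X' into (mime_type, charset or None)."""
    parts = [p.strip() for p in value.split(";")]
    mime_type = parts[0].lower()  # media types are case-insensitive
    charset = None
    for param in parts[1:]:
        name, _, val = param.partition("=")
        if name.strip().lower() == "charset":
            charset = val.strip().strip('"')  # drop optional quotes
    return mime_type, charset
```

For example, parse_content_type("text/plain; charset=ISO-8859-2")
yields the pair ("text/plain", "ISO-8859-2"), and a bare
"application/octet-stream" yields a charset of None.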

> Just have 'svn add' accept --text and --binary, and possibly
> --content-type=..., --end-of-line=...

Ah...yes...

> Why not keyword substitution? Just make it off by default. If the user
> wants keyword substitution in binary files, we can always let him shoot
> himself in the foot. Besides, it can actually make sense in some kinds
> of binary formats.

Hm. While I personally agree with you, I'm sure Karl "Ambassador For
the Little User Guy" Fogel will have something to say about this. But
not until Thursday when he's back at work, so let's implement it now! :-)

> Um. I'd rather use 'none' (':', if you accept the idea outlined above),
> and make 'native' the default for text files. Oh, and we have to
> prescribe the repository's native format, so that we can send deltas
> back and forth.

I'd like to stick with 'none' being implied by the absence of the
property, but I agree that 'native' should be the default for
text files.
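
Roughly, the behavior I have in mind could be sketched as follows: no
property means the bytes are left alone, and 'native' means line
endings are normalized to whatever the platform uses. The function
name translate_eol and its parameters are assumptions made up for this
example, not an existing interface.

```python
import os
from typing import Optional


def translate_eol(data: bytes, eol_style: Optional[str],
                  native_eol: bytes = os.linesep.encode("ascii")) -> bytes:
    """Apply a line-ending style to file content (illustrative sketch)."""
    if eol_style is None:
        # Absence of the property implies 'none': do not touch the data.
        return data
    if eol_style != "native":
        raise ValueError("unsupported eol style: %r" % eol_style)

    # Normalize CRLF and lone CR to LF first, then emit the native form.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", native_eol)
```

Passing native_eol explicitly makes the sketch easy to try on any
platform, e.g. translate_eol(b"a\nb\r\n", "native", b"\r\n").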

Anyway, we have some time on this (it's not an M3 requirement). Any
other feedback (or, uh, code contributions...) is welcome.

Received on Sat Oct 21 14:36:34 2006

This is an archived mail posted to the Subversion Dev mailing list.
