
Re: AW: Compressed Pristines (Call for Vote)

From: Greg Stein <gstein_at_gmail.com>
Date: Tue, 27 Mar 2012 16:00:04 -0400

On Tue, Mar 27, 2012 at 15:29, Branko Čibej <brane_at_apache.org> wrote:
> On 27.03.2012 17:29, Markus Schaber wrote:
>...
>> That's an easy one: When the compressed file (including gzip headers etc.) is not smaller than the original file, store it uncompressed. This can be enhanced with some percentage or watermark value (e.g. we need at least a reduction of 42 bytes or 2%).
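The quoted rule can be sketched as a small predicate. This is a minimal sketch under assumptions: the function name `keep_compressed` is hypothetical, the compressed size is assumed to already include the gzip header and trailer, and requiring *both* watermarks (bytes and percent) is an assumption, since the message only offers them as examples. Passing 0 disables either check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: decide whether a compressed pristine is worth
 * keeping. Sizes are in bytes; COMPRESSED_SIZE includes gzip framing.
 * MIN_SAVED_BYTES and MIN_SAVED_PERCENT are the "watermark" values
 * (e.g. 42 and 2). Requiring both is an assumption; 0 disables one. */
static bool
keep_compressed(size_t orig_size, size_t compressed_size,
                size_t min_saved_bytes, unsigned min_saved_percent)
{
  size_t saved;

  assert(min_saved_percent <= 100);

  /* Not smaller at all: store the file uncompressed. */
  if (compressed_size >= orig_size)
    return false;

  saved = orig_size - compressed_size;

  /* Absolute watermark, e.g. "at least 42 bytes". */
  if (saved < min_saved_bytes)
    return false;

  /* Relative watermark, e.g. "at least 2%", checked in integer math:
   * saved / orig_size >= percent / 100. */
  if ((unsigned long long)saved * 100
      < (unsigned long long)orig_size * min_saved_percent)
    return false;

  return true;
}
```

Note that this predicate can only run after compression has happened, which is exactly the compress-first pattern the reply below argues against.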

Actually, I would suggest that any file with an *original* size under
8k (or some other cutoff) just goes into SQLite. Maybe we compress it
in memory and store the result (since we can't stream the compression
into a SQLite row).

What I'd like to avoid is compressing first and *then* examining the
size, especially when that operation takes place on disk.

Instead, we say: "oh, larger than 8k: compress streamily onto disk."
And conversely: "less than 8k: compress into memory, store in
SQLite."
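The size-first routing described above can be sketched as a decision made purely from the original size, before any compression runs. All names here (`choose_pristine_storage`, `PRISTINE_DEFAULT_CUTOFF`, the enum) are hypothetical, and 8k is only the example cutoff from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical default: the 8k cutoff suggested in the message. */
#define PRISTINE_DEFAULT_CUTOFF ((size_t)8 * 1024)

/* Hypothetical storage choices for a pristine text. */
enum pristine_storage {
  PRISTINE_IN_SQLITE, /* compress into memory, store the blob in a row */
  PRISTINE_ON_DISK    /* compress as a stream into a file on disk */
};

/* Route a pristine by its *original* (uncompressed) size, which is known
 * up front, so nothing is compressed just to find out where it goes. */
static enum pristine_storage
choose_pristine_storage(size_t original_size, size_t cutoff)
{
  assert(cutoff > 0);
  return original_size < cutoff ? PRISTINE_IN_SQLITE : PRISTINE_ON_DISK;
}
```

Taking the cutoff as a parameter keeps the "or some other cutoff" open for tuning without turning it into a user-visible knob.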

>...
> There are much better ways, e.g., using file(1) or mime.magic (IIRC we
> already have code for the latter), methods that actually look at the
> file contents. Most compressed file formats will have a well-known
> header, or some other discriminating mark (not file name/extension, I
> might add) that's fairly cheap to check, and is good enough to be at
> least an initial rough filter.
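As a rough illustration of the header-sniffing idea, the sketch below checks the first bytes of a file against a few well-known signatures of already-compressed formats (gzip, zip, bzip2, xz, PNG, JPEG). It is a hand-rolled stand-in for what file(1)/libmagic does, not the API of `svn_magic.h`; the table and `looks_compressed` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A signature: leading bytes that identify a compressed format. */
struct magic_sig {
  const char *name;
  const unsigned char *bytes;
  size_t len;
};

static const unsigned char GZIP_SIG[]  = { 0x1f, 0x8b };
static const unsigned char ZIP_SIG[]   = { 'P', 'K', 0x03, 0x04 };
static const unsigned char BZIP2_SIG[] = { 'B', 'Z', 'h' };
static const unsigned char XZ_SIG[]    = { 0xfd, '7', 'z', 'X', 'Z', 0x00 };
static const unsigned char PNG_SIG[]   = { 0x89, 'P', 'N', 'G',
                                           0x0d, 0x0a, 0x1a, 0x0a };
static const unsigned char JPEG_SIG[]  = { 0xff, 0xd8, 0xff };

static const struct magic_sig COMPRESSED_SIGS[] = {
  { "gzip",  GZIP_SIG,  sizeof(GZIP_SIG)  },
  { "zip",   ZIP_SIG,   sizeof(ZIP_SIG)   },
  { "bzip2", BZIP2_SIG, sizeof(BZIP2_SIG) },
  { "xz",    XZ_SIG,    sizeof(XZ_SIG)    },
  { "png",   PNG_SIG,   sizeof(PNG_SIG)   },
  { "jpeg",  JPEG_SIG,  sizeof(JPEG_SIG)  },
};

/* Return true if HEAD (the first HEAD_LEN bytes of a file) starts with
 * a known compressed-format signature. Cheap: no decompression needed. */
static bool
looks_compressed(const unsigned char *head, size_t head_len)
{
  size_t i;

  assert(head != NULL || head_len == 0);
  for (i = 0; i < sizeof(COMPRESSED_SIGS) / sizeof(COMPRESSED_SIGS[0]); i++)
    {
      const struct magic_sig *sig = &COMPRESSED_SIGS[i];
      if (head_len >= sig->len && memcmp(head, sig->bytes, sig->len) == 0)
        return true;
    }
  return false;
}
```

Only a handful of leading bytes need to be read, which keeps it in the "fairly cheap" category; a real detector would cover far more formats.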

Yup. Please see include/private/svn_magic.h. Stefan dropped that in a
while back.

I'd suggest we use it before *any* compression (whether into memory or
onto disk). We can then hard-code certain mime types that are
compressible (or not). *Maybe* provide some escape hatches via
configuration (but y'all know how I hate more user knobs).
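Hard-coding compressibility by MIME type might look roughly like the sketch below. It assumes a MIME type string has already been obtained (e.g. from libmagic, which reports `application/octet-stream` when it cannot tell); it does not use the actual `svn_magic.h` functions, and the type list and `mime_type_compressible` are hypothetical choices.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical hard-coded table: MIME types whose content is already
 * compressed, so compressing the pristine again rarely pays off. */
static const char *const INCOMPRESSIBLE_MIME_TYPES[] = {
  "application/gzip",
  "application/x-gzip",
  "application/zip",
  "application/x-bzip2",
  "application/x-xz",
  "image/png",
  "image/jpeg",
  "audio/mpeg",
  "video/mp4",
};

/* Return true if a pristine of MIME_TYPE should be compressed.
 * Anything not in the table (including application/octet-stream)
 * defaults to "compress". */
static bool
mime_type_compressible(const char *mime_type)
{
  size_t i, len;

  assert(mime_type != NULL);

  /* Ignore parameters such as "; charset=binary". */
  len = strcspn(mime_type, ";");

  for (i = 0;
       i < sizeof(INCOMPRESSIBLE_MIME_TYPES)
           / sizeof(INCOMPRESSIBLE_MIME_TYPES[0]);
       i++)
    {
      const char *t = INCOMPRESSIBLE_MIME_TYPES[i];
      if (strlen(t) == len && strncmp(mime_type, t, len) == 0)
        return false;
    }
  return true;
}
```

Defaulting unknown types to "compress" keeps the table short: it only has to list the formats known to be a waste of effort.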

Cheers,
-g
Received on 2012-03-27 22:00:37 CEST

This is an archived mail posted to the Subversion Dev mailing list.
