On 27.03.2012 17:29, Markus Schaber wrote:
> Hello, Branko,
>
>> From: Branko Čibej [mailto:brane_at_xbc.nu] On behalf of Branko Cibej
>>>> As others have pointed out, the only cases where compression and
>>>> packing may cause trouble is working copies with a lot of binary,
>>>> compressed files (.jars, various image formats, etc.) That's a
>>>> problem that needs solving regardless of the compression algorithm or
>>>> pack format.
>>> This can be handled with a not-compressed flag and storing the data as-is.
>> You're jumping a couple of steps here, the first two obviously being:
>> (a) identify which file formats should not be compressed, and (b)
>> figure out how to detect such files reasonably reliably.
> That's an easy one: when the compressed file (including gzip headers etc.) is not smaller than the original file, store it uncompressed. This can be enhanced with some percentage or watermark value (e.g., we need at least a reduction of 42 bytes or 2%).
If you're willing to waste the compression time, yes. We do this on the
server side to find out whether it's better to store a given version of
a file as a delta against a previous version, or as a fulltext. I was
never too happy about the price of backing out of a wrong assumption,
however.
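
For illustration, here's a minimal sketch of that size-check heuristic
in Python (the thresholds are just Markus's example numbers; none of
this is actual svn code):

    import zlib

    # Illustrative thresholds (Markus's example numbers, not svn constants).
    MIN_SAVED_BYTES = 42
    MIN_SAVED_RATIO = 0.02   # require at least a 2% reduction

    def maybe_compress(data):
        """Return (is_compressed, payload) for the given raw bytes."""
        packed = zlib.compress(data)
        saved = len(data) - len(packed)
        if saved >= MIN_SAVED_BYTES and saved >= MIN_SAVED_RATIO * len(data):
            return True, packed
        # Not worth it: keep the original and set the not-compressed flag.
        return False, data

Note that you always pay for the compression attempt, even when you end
up throwing the result away.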
There are much better ways, e.g., using file(1) or mime.magic (IIRC we
already have code for the latter): methods that actually look at the
file contents. Most compressed file formats have a well-known header,
or some other discriminating mark (not the file name or extension, I
might add), that's fairly cheap to check and is good enough to serve as
at least an initial rough filter.
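
A minimal sketch of such a prefilter, again in Python (the signature
list is illustrative and incomplete; a real implementation would
presumably reuse the mime.magic code):

    # Magic numbers of a few formats that are already compressed.
    ALREADY_COMPRESSED = (
        b"\x1f\x8b",           # gzip
        b"PK\x03\x04",         # zip (also .jar, .odt, .docx, ...)
        b"BZh",                # bzip2
        b"\x89PNG\r\n\x1a\n",  # PNG
        b"\xff\xd8\xff",       # JPEG
    )

    def looks_compressed(path):
        """Cheap prefilter: read a few leading bytes and match them."""
        with open(path, "rb") as f:
            head = f.read(8)
        # bytes.startswith() accepts a tuple of candidate prefixes.
        return head.startswith(ALREADY_COMPRESSED)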
(N.B., this kind of check could, at some later date, be used on the
server to optimize content storage, too.)
-- Brane