
Re: FSFS format7 and compressed XML bundles

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Sat, 2 Mar 2013 01:49:45 +0000 (GMT)

> Vincent Lefevre wrote:
>> Ben Reser wrote:
>>> It resets the compression algorithm every 1000 bytes and thus makes
>>> blocks that can be saved between revisions of the file.
>>
>> Wouldn't this work only when data are appended to the file?
>>
>> If data are inserted or deleted, this would change the block
>> boundaries. Instead of fixed-length blocks, I'd rather see
>> boundaries based on the file contents.
>
> That's true, the compression blocks are fixed.

No, that's not true.  I think the article Ben read was inaccurate.  The '--rsyncable' option doesn't reset the compression after a fixed number of bytes, but rather at every point where a rolling checksum of the last N bytes leading up to that point has a certain value.  It will resynchronize after an insertion or deletion.  The intervals between resets are irregular but deterministic.
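
To illustrate the difference, here is a minimal sketch in Python of content-defined boundaries versus fixed-length blocks. It is not the algorithm used by gzip or pigz: the checksum is a plain sum of the last WINDOW bytes, and WINDOW, MODULUS and the function name split_blocks are made-up illustrative choices. It only shows where the resets would fall, not the compression itself.

```python
import random

WINDOW = 32     # assumed window size N (illustrative, not gzip's value)
MODULUS = 256   # assumed: reset where the rolling sum is 0 mod MODULUS


def split_blocks(data: bytes, window: int = WINDOW, modulus: int = MODULUS) -> list:
    """Cut data after every byte where the sum of the last `window` bytes
    is 0 modulo `modulus`. Boundaries depend only on nearby content."""
    blocks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]   # slide the window forward
        if i + 1 >= window and rolling % modulus == 0:
            blocks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        blocks.append(data[start:])       # trailing partial block
    return blocks


def fixed_blocks(data: bytes, size: int = 1000) -> list:
    """Cut data every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(20000))
modified = original[:5000] + b"inserted text" + original[5000:]

old_blocks = split_blocks(original)
new_blocks = split_blocks(modified)
new_set = set(new_blocks)
lost = [b for b in old_blocks if b not in new_set]

fixed_shared = set(fixed_blocks(original)) & set(fixed_blocks(modified))

print("content-defined: blocks =", len(old_blocks), "changed =", len(lost))
print("fixed-length:    blocks =", len(fixed_blocks(original)),
      "unchanged =", len(fixed_shared))
```

With fixed-length blocks, every block after the insertion point is shifted and no longer matches. With content-defined boundaries, only the blocks touching the insertion change; once the window has moved past the inserted bytes, the boundaries fall at the same content positions as before.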

Here's an old but readable description and proof-of-concept: <http://svana.org/kleptog/rgzip.html>.

Here's an announcement of implementation in pigz: <http://mail.zlib.net/pipermail/pigz-announce_zlib.net/2012-January/000003.html>.  It's described in more detail in a big comment near the beginning of 'pigz.c' in the source tarball available at <http://zlib.net/pigz/>.

Philip Martin wrote:
> Julian Foad <julianfoad_at_btopenworld.com> writes:
>
>> Yes, a client-side plug-in -- either to Subversion or to OpenOffice --
>> seems to me the best practical solution.
>
> A server-side solution is difficult.  Suppose the client has some
> uncompressed content U which it compresses to C and sends to the server.
> The server can uncompress C to get U but unless the compression scheme
> has a canonical compressed form, with no other forms allowed, the server
> cannot avoid storing C because there is no guarantee that C can be
> reconstructed from U.

Yes, a server-side solution would have lots of problems including that one.  Scalability is another -- keeping the server up to date with plug-ins for all (or most) of the compressed content types that the clients are using.
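
Philip's point about non-canonical compressed forms can be seen with plain zlib in Python. This is only a sketch: the sample content and the helper name server_can_reconstruct are hypothetical, and a real document format (such as a zipped XML bundle) has more degrees of freedom than just the compression level.

```python
import zlib

# Hypothetical uncompressed content U.
U = b"Subversion stores revisions of files. " * 200

# Two clients compress the same U with different settings.
c_fast = zlib.compress(U, 1)
c_best = zlib.compress(U, 9)

# Both are valid compressed forms of the same content...
assert zlib.decompress(c_fast) == U
assert zlib.decompress(c_best) == U


def server_can_reconstruct(c: bytes, server_level: int) -> bool:
    """Hypothetical check: could a server that stores only U rebuild
    the client's exact bytes C using its own compression level?"""
    u = zlib.decompress(c)
    return zlib.compress(u, server_level) == c


# ...but they are not the same bytes, so storing U alone loses C
# unless the server happens to know and reproduce the client's settings.
print("identical streams:", c_fast == c_best)
print("rebuild c_best at level 9:", server_can_reconstruct(c_best, 9))
print("rebuild c_fast at level 9:", server_can_reconstruct(c_fast, 9))
```

Even when the levels match, the reconstruction only works if the server's compressor produces byte-for-byte the same output as the client's, which different implementations or versions are not obliged to do.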

A client-side plug-in does not have those problems, at least not to the same extent.  It does have its own problems, though, including installation, configuration and portability issues.

- Julian
Received on 2013-03-02 02:50:21 CET

This is an archived mail posted to the Subversion Dev mailing list.