
Re: FSFS format7 and compressed XML bundles

From: Ben Reser <ben_at_reser.org>
Date: Thu, 28 Feb 2013 10:45:14 -0800

On Thu, Feb 28, 2013 at 8:28 AM, Mark Phippard <markphip_at_gmail.com> wrote:
> FWIW, the Branch Readme does imply he intends to work on some things that
> might have an impact here. Specifically:
>
> TxDelta v2
> ----------
>
> Version 1 of txdelta turns out to be limited in its effectiveness for
> larger files when data gets inserted or removed. For typical office
> documents (zip files), deltification often becomes ineffective.
>
> Version 2 shall introduce the following changes:
>
> - increase the delta window from 100kB to 1MB
> - use a sliding window instead of a fixed-sized one
> - use a slightly more efficient instruction encoding
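The window-size point can be made concrete with a deliberately simplified model. This model is an assumption and not how svn's txdelta code actually pairs windows. It cuts both files into windows at fixed offsets, lets each target window copy only from the source window with the same index, and assumes `shift` bytes were inserted somewhere earlier in the file. The function name `same_window_overlap` is hypothetical.

```python
KIB = 1024

def same_window_overlap(window: int, shift: int) -> float:
    """Fraction of a target window whose bytes are still present in the
    same-index source window, for windows lying after an insertion of
    `shift` bytes.

    Simplified model (an assumption, not svn's real txdelta logic):
    windows start at fixed offsets k * window in both source and target,
    and target window k may only copy from source window k.
    """
    if shift < 0 or window <= 0:
        raise ValueError("need window > 0 and shift >= 0")
    if shift >= window:
        return 0.0  # the matching source bytes all live in earlier windows
    return (window - shift) / window

# A 64 KiB insertion, before and after the proposed window increase.
# "100kB" and "1MB" are read here as 100 KiB and 1 MiB (an assumption).
for window in (100 * KIB, 1024 * KIB):
    print(window // KIB, "KiB:", round(same_window_overlap(window, 64 * KIB), 4))
# prints "100 KiB: 0.36", then "1024 KiB: 0.9375"
```

In this toy model, a sliding window that follows the shifted content would avoid the alignment loss altogether. That is the intuition behind the second bullet.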

I think the office documents example above is a really poor one for
what he's looking at doing. An adaptively compressed file is unlikely
to be stored more efficiently just because the delta window changes.
The new instruction encoding will help ever so slightly, but not
enough to be of any consequence to the issue at hand here.
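A small experiment with Python's `zlib` can show why. `zlib` implements DEFLATE, the method ZIP archives typically use. The experiment inserts one byte near the start of a synthetic text, compresses both versions, and counts how many leading and trailing bytes each pair still shares. The vocabulary, sizes, and helper names are made up for illustration.

```python
import random
import zlib

def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def common_suffix_len(a: bytes, b: bytes) -> int:
    """Number of trailing bytes that a and b share."""
    return common_prefix_len(a[::-1], b[::-1])

# Synthetic "document": 50,000 words drawn from a small vocabulary.
rng = random.Random(42)
vocab = [b"alpha", b"beta", b"gamma", b"delta", b"epsilon", b"zeta", b"eta", b"theta"]
original = b" ".join(rng.choice(vocab) for _ in range(50_000))
edited = original[:100] + b"X" + original[100:]  # one byte inserted near the start

c_orig = zlib.compress(original, 9)
c_edit = zlib.compress(edited, 9)

shared_plain = common_prefix_len(original, edited) + common_suffix_len(original, edited)
shared_comp = common_prefix_len(c_orig, c_edit) + common_suffix_len(c_orig, c_edit)
print(f"plaintext:  {shared_plain}/{len(original)} bytes shared at the ends")
print(f"compressed: {shared_comp}/{len(c_orig)} bytes shared at the ends")
```

Matching only at the ends is a crude proxy, because a real delta algorithm also finds copies in the middle. The exact numbers also depend on the zlib build. Still, the pattern is the one described above. The plaintexts differ by a single byte, but the compressed streams diverge early. DEFLATE packs variable-length codes into a bitstream, so a change near the start typically disturbs everything after it. That leaves a larger or sliding window little to match against.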

> Large file storage
> ------------------
>
> Even most source code repositories contain large, hard to compress,
> hard to deltify binaries. Reconstructing their content becomes very I/O
> intense and it "dilutes" the data in our pack files. The latter makes
> e.g. caching, prefetching and packing less efficient.
>
> Once a representation exceeds a certain configured threshold (16M default),
> the fulltext of that item will be stored in a separate file. This will
> be marked in the representation_t by an extra flag and future reps will
> not be deltified against it. From that location, the data can be forwarded
> directly via SendFile and the fulltext caches will not be used for it.
>
> Note that by making the decision contingent upon the size of the deltified
> and packed representation, all large data that benefits from these will
> still be stored within the rev and pack files.
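The placement rule in the README can be sketched as follows. All names here are hypothetical; this is a toy in Python and not FSFS's C `representation_t`. Two further assumptions apply. First, "16M" is read as 16 MiB. Second, delta-base selection just takes the newest eligible rep, which simplifies FSFS's real skip-delta scheme.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical default mirroring the README's "16M default" (MiB assumed).
LARGE_REP_THRESHOLD = 16 * 1024 * 1024

@dataclass
class Rep:
    """Toy stand-in for a representation (hypothetical fields)."""
    revision: int
    stored_size: int        # size after deltification and packing
    external: bool = False  # the "extra flag": fulltext lives in a separate file

def place_rep(revision: int, stored_size: int,
              threshold: int = LARGE_REP_THRESHOLD) -> Rep:
    # Decide on the *deltified and packed* size, so data that deltifies or
    # compresses well stays in the rev/pack files even if its fulltext is big.
    return Rep(revision, stored_size, external=stored_size > threshold)

def pick_delta_base(candidates: List[Rep]) -> Optional[Rep]:
    # Future reps are never deltified against an externally stored fulltext.
    # Simplification: take the newest eligible rep (real FSFS uses skip-deltas).
    for rep in reversed(candidates):
        if not rep.external:
            return rep
    return None
```

Under these assumptions, a rep whose packed size is 2 MB stays inline and remains a valid delta base. A rep of 40 MB goes to its own file, and later reps skip it when choosing a base.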

This would help more than the previously mentioned changes, because
there won't be the additional overhead of our deltification, which is
made inefficient by the fact that the file is already compressed.
However, you'll still be stuck storing the full text of each revision,
which again is not what I think is the desired outcome in the end.
That only kicks in if the file is over 16MB, though, and I'm not sure
how often that applies to these files.
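To put rough numbers on the "full texts of each revision" cost, here is a back-of-the-envelope sketch. It assumes that deltification saves essentially nothing for already-compressed files, so each revision's packed size is about its full size. It also reads the "16M" threshold as 16 MiB. The function name is hypothetical.

```python
MIB = 1024 * 1024
THRESHOLD = 16 * MIB  # "16M default" from the branch README; MiB is an assumption

def external_fulltext_bytes(rev_sizes, threshold=THRESHOLD):
    """Total bytes written as separate fulltext files across a file's history,
    assuming each revision's packed size is roughly its full size."""
    return sum(size for size in rev_sizes if size > threshold)

print(external_fulltext_bytes([20 * MIB] * 50) // MIB)  # 1000
print(external_fulltext_bytes([10 * MIB] * 50) // MIB)  # 0
```

Fifty revisions of a 20 MiB bundle cost about 1000 MiB of separate fulltexts. A 10 MiB bundle never crosses the threshold and stays in the rev files, where its deltas are still nearly full size.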
Received on 2013-02-28 19:45:51 CET

This is an archived mail posted to the Subversion Dev mailing list.
