
Re: Compressed Pristines (Summary)

From: Ashod Nakashian <ashodnakashian_at_yahoo.com>
Date: Wed, 4 Apr 2012 11:38:57 -0700 (PDT)

> From: Mark Phippard <markphip_at_gmail.com>
>To: Ashod Nakashian <ashodnakashian_at_yahoo.com> 
>Cc: Daniel Shahaf <danielsh_at_elego.de>; Markus Schaber <m.schaber_at_3s-software.com>; "julianfoad_at_btopenworld.com" <julianfoad_at_btopenworld.com>; "mtherieau_at_gmail.com" <mtherieau_at_gmail.com>; Subversion Development <dev_at_subversion.apache.org> 
>Sent: Wednesday, April 4, 2012 9:23 PM
>Subject: Re: Compressed Pristines (Summary)

>On Wed, Apr 4, 2012 at 1:18 PM, Ashod Nakashian
><ashodnakashian_at_yahoo.com> wrote:
>
>> That's an easy question. The answer is that at *best* they'll do as well as in-place compression. However, in practice
>> they'll do much worse. The reason is that OS-level compression works not at the whole-file level but
>> at the block level. This is to make modifications reasonably fast (read compressed data, uncompress, modify, write
>> recompressed data). If the complete file were compressed as one unit, then changing even a single byte (neglecting that no storage
>> works at the byte level anyway) would cost time that grows at least linearly with the file size.
>
>FWIW, that is exactly my concern with your custom file format.  I do
>not see how you can achieve the benefits you expect without needing to
>repack files and I do not see how that can perform reasonably.

That's the tricky part, of course. To attack this problem we need to strike a balance between pack size and how aggressively we repack to reclaim wasted "holes". It's not difficult to find a good middle ground, because working with a few MB is reasonably fast (please see the size/speed estimates in the proposal) and wasting a full block is negligible for a file of even 1 MB. I'm oversimplifying to make a point: we don't need optimality; we need a practical approach that gives us the biggest bang for our buck. As far as that goes, I agree with the sentiment of settling for the easiest solution that gets us the farthest. It's just that we haven't yet reached consensus on what that is! :-)
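
To put rough numbers on the trade-off above, here is a minimal sketch. It is not taken from the proposal: the 4 KiB block size, the 25% repack threshold, and both function names are assumptions chosen only for illustration.

```python
# Hypothetical helpers; the block size and threshold are assumed values,
# not figures from the Compressed Pristines proposal.

BLOCK_SIZE = 4096  # assumed filesystem block size in bytes


def worst_case_block_waste(file_size, block_size=BLOCK_SIZE):
    """Fraction of a file's size lost if one entire block is wasted."""
    return block_size / file_size


def should_repack(pack_size, hole_bytes, max_hole_fraction=0.25):
    """Repack only once the holes exceed the given fraction of the pack."""
    return pack_size > 0 and hole_bytes / pack_size > max_hole_fraction


one_mib = 1024 * 1024
print(f"{worst_case_block_waste(one_mib):.2%}")        # waste for a 1 MiB file
print(should_repack(4 * one_mib, hole_bytes=2 * one_mib))
```

Under these assumptions a wasted block costs well under 1% of a 1 MiB file, so the repack threshold can be fairly lax without giving up much space.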

>
>That said, you also seem aware that the solution has to perform well
>so at worst it is just a question as to whether you want to spend the
>cycles to prove it can work and achieve all the goals.  I am skeptical
>but look forward to being wrong.
>

>The lazy part of me thinks storing files up to 32KB in SQLite and
>storing the rest as just single compressed files would give 99% of our
>users what they want and would be less likely to have issues.

I have to agree with you here. We just need to start working on something that actually works and can be verified to meet this goal. If we can do that *with decent performance*, then we have a clear winner.
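
For concreteness, a rough sketch of the simpler scheme Mark describes might look like the following. This is not Subversion code: the table name, the SHA-1 key, the `.z` suffix, and the helper names are all assumptions, and whether the small blobs should also be compressed is left open (they are stored raw here).

```python
import hashlib
import os
import sqlite3
import zlib

SMALL_FILE_LIMIT = 32 * 1024  # the 32KB cutoff suggested above


def open_store(db_path=":memory:"):
    """Open (or create) the hypothetical SQLite store for small pristines."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS small_pristine ("
        "checksum TEXT PRIMARY KEY, data BLOB NOT NULL)"
    )
    return conn


def put(conn, large_dir, content):
    """Store content and return its key (a SHA-1 hex digest, by assumption)."""
    checksum = hashlib.sha1(content).hexdigest()
    if len(content) <= SMALL_FILE_LIMIT:
        # Small files live in the database, uncompressed in this sketch.
        conn.execute(
            "INSERT OR IGNORE INTO small_pristine VALUES (?, ?)",
            (checksum, content),
        )
        conn.commit()
    else:
        # Larger files become one zlib-compressed file each.
        path = os.path.join(large_dir, checksum + ".z")
        if not os.path.exists(path):
            with open(path, "wb") as f:
                f.write(zlib.compress(content))
    return checksum


def get(conn, large_dir, checksum):
    """Fetch content by key, checking the database before the directory."""
    row = conn.execute(
        "SELECT data FROM small_pristine WHERE checksum = ?", (checksum,)
    ).fetchone()
    if row is not None:
        return bytes(row[0])
    with open(os.path.join(large_dir, checksum + ".z"), "rb") as f:
        return zlib.decompress(f.read())
```

Nothing here ever needs repacking, which is the appeal; the open question is still whether it performs well enough on real working copies.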

-Ash

>
>-- 
>Thanks
>
>Mark Phippard
>http://markphip.blogspot.com/
>
>

Received on 2012-04-04 20:39:30 CEST

This is an archived mail posted to the Subversion Dev mailing list.
