
Re: Compressed Pristines (Design Doc)

From: Branko Čibej <brane_at_e-reka.si>
Date: Mon, 26 Mar 2012 12:52:14 +0200

On 26 March 2012 11:53, Ivan Zhakov <ivan_at_visualsvn.com> wrote:
> 2012/3/25 Branko Čibej <brane_at_e-reka.si>:
>> On 22.03.2012 17:01, Branko Čibej wrote:
>>> On 22.03.2012 16:50, Daniel Shahaf wrote:
>>>> Branko Čibej wrote on Thu, Mar 22, 2012 at 16:37:24 +0100:
> [..]
>> Based on these observations, it's clear that the implementation should
>> proceed as follows:
>> Step 1: Just compress the pristine files, do not use any packing. This
>> gives a 60% decrease in disk usage in the HTTPD case, but even if the
>> decrease is only 30%, it's still worth the effort.
>> Step 2: Store small (for some definition of "small") compressed pristine
>> files in a SQLite database. In the case of HTTPD, this gives up to an
>> extra 90% savings in disk usage, but this is a very specific test case and
>> it's hard to guess what kind of gain we'd get on average.
> Makes sense to me. In that case we also benefit on performance (provided
> the SQLite blob API has acceptable performance).
> And IMHO "small" should be really small (up to 4k) to prevent wc.db
> from growing in size.

There's no requirement for putting pristines in the wc.db; it can
easily be a different database that's attached to the same connection.
More to the point, in order to make using a database worthwhile, the
size limit shouldn't be /too/ low.
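To illustrate the idea of keeping small compressed pristines in a separate database on the same connection, here is a minimal sketch using Python's sqlite3 and zlib modules. It is not Subversion's implementation: the schema name `pristine`, the table `blobs`, the helper names, and the 32k cutoff are all assumptions made for the example. Only `ATTACH DATABASE` itself is standard SQLite.

```python
import sqlite3
import zlib

# Hypothetical cutoff for "small"; the right value is exactly what is being debated.
SMALL_LIMIT = 32 * 1024


def open_wc(wc_db_path, pristine_db_path):
    """Open the working-copy database and attach a second one for pristines."""
    conn = sqlite3.connect(wc_db_path)
    # The attached database shares the connection but lives in its own file.
    conn.execute("ATTACH DATABASE ? AS pristine", (pristine_db_path,))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pristine.blobs ("
        " checksum TEXT PRIMARY KEY,"
        " data BLOB NOT NULL)"
    )
    return conn


def store_small_pristine(conn, checksum, content, limit=SMALL_LIMIT):
    """Compress content and store it as a blob if the result fits the limit.

    Returns True if stored, False if the caller should keep it as a file.
    """
    compressed = zlib.compress(content)
    if len(compressed) > limit:
        return False
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT OR REPLACE INTO pristine.blobs (checksum, data) VALUES (?, ?)",
            (checksum, compressed),
        )
    return True


def load_pristine(conn, checksum):
    """Return the decompressed pristine text, or None if it is not in the blob store."""
    row = conn.execute(
        "SELECT data FROM pristine.blobs WHERE checksum = ?", (checksum,)
    ).fetchone()
    return None if row is None else zlib.decompress(row[0])
```

With this layout wc.db itself does not grow when pristines are added, which addresses the size concern without forcing the limit down to 4k.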

With a 4k filesystem block size, files up to 4k in size will waste 50% of
their allocated space on average; 8k files will waste 25%; and so on. My
test compared using 8k and 32k limits, and just raising the limit to 32k
added more than 50% extra space savings (on top of the already huge
savings of storing up-to-8k files in blobs) with no significant
difference in insertion times. (This last makes sense, as sqlite will
flush in multiples of page sizes, so the insertion times are really
proportional to the overall amount of data written, on average. YMMV.)
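The block-slack arithmetic above can be checked with a short sketch. It assumes file sizes are uniformly distributed within each size range and that every file occupies a whole number of 4k blocks; it ignores inode overhead and any filesystem tail-packing or compression. The function names are made up for the example, and reading "8k files" as "files between 4k and 8k" is an interpretation.

```python
def allocated_size(size, block=4096):
    """Bytes actually occupied on disk: size rounded up to a whole block."""
    return -(-size // block) * block


def average_waste_fraction(lo, hi, block=4096):
    """Fraction of allocated space wasted for file sizes lo+1 .. hi, each equally likely."""
    total_alloc = 0
    total_used = 0
    for size in range(lo + 1, hi + 1):
        total_alloc += allocated_size(size, block)
        total_used += size
    return 1 - total_used / total_alloc


# Files of 1 byte up to 4k: roughly half of each block is slack.
waste_up_to_4k = average_waste_fraction(0, 4096)
# Files above 4k and up to 8k: roughly a quarter of the two blocks is slack.
waste_4k_to_8k = average_waste_fraction(4096, 8192)
```

Under these assumptions the two values come out at about 0.50 and 0.25, and the fraction keeps shrinking for larger size ranges.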

-- Brane
Received on 2012-03-26 12:52:46 CEST

This is an archived mail posted to the Subversion Dev mailing list.