
Re: Compressed Pristines (Summary)

From: Ashod Nakashian <ashodnakashian_at_yahoo.com>
Date: Sat, 31 Mar 2012 14:30:01 -0700 (PDT)

----- Original Message -----

> From: Branko Čibej <brane_at_apache.org>
> To: dev_at_subversion.apache.org
> Cc:
> Sent: Sunday, April 1, 2012 12:58 AM
> Subject: Re: Compressed Pristines (Summary)
>
> On 31.03.2012 18:16, Ashod Nakashian wrote:
>> So it's fair to say I'm ignorant about the details, but I must say
>> this: A repository, precisely like Git pack files, doesn't necessarily
>> need good support (if any) for deletion. This is a very critical issue,
>> though I can see why it might not be obvious at first.
>
> Instead of repeating the obvious, I suggest you start reading here:
>
>     http://www.sqlite.org/pragma.html#pragma_auto_vacuum
>
> It's clear that, in order to optimize the pristine store, the pristine
> files should not be stored in wc.db but in a separate database (simply
> because auto-vacuum optimization will be different for wc.db and
> pristine.db). Apart from that, I still don't see how a custom pack
> format can do better in the short term than what SQLite already does.
> 

In the short term, it probably won't do any better (especially for small WCs). In the long term, we'll have control over organization/defragmentation, compression, and virtually all other operations and behaviors. As for the PS residing in a separate db, I think there is no question about that - it should be separate from wc.db.
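
A minimal sketch of the separate-database idea, assuming Python's standard sqlite3 module. The file names wc.db and pristine.db come from the discussion above; the table layouts and the choice of INCREMENTAL mode for the pristine store are hypothetical and only illustrate that each file can carry its own auto_vacuum setting.

```python
import os
import sqlite3
import tempfile


def open_db(path, auto_vacuum):
    """Open (or create) a SQLite file with the given auto_vacuum mode.

    The pragma is issued before the first table is created; on an
    existing database it would only take effect after a full VACUUM.
    """
    conn = sqlite3.connect(path)
    conn.execute(f"PRAGMA auto_vacuum = {auto_vacuum}")
    return conn


workdir = tempfile.mkdtemp()
wc = open_db(os.path.join(workdir, "wc.db"), "NONE")
pristine = open_db(os.path.join(workdir, "pristine.db"), "INCREMENTAL")

# Hypothetical, heavily simplified schemas: metadata in one file, blobs in the other.
wc.execute("CREATE TABLE nodes (local_relpath TEXT PRIMARY KEY, checksum TEXT)")
pristine.execute("CREATE TABLE pristine (checksum TEXT PRIMARY KEY, data BLOB)")
wc.commit()
pristine.commit()

# SQLite reports the mode as an integer: 0 = NONE, 1 = FULL, 2 = INCREMENTAL.
wc_mode = wc.execute("PRAGMA auto_vacuum").fetchone()[0]
pristine_mode = pristine.execute("PRAGMA auto_vacuum").fetchone()[0]
print(wc_mode, pristine_mode)
```

Keeping the blobs in their own file means the vacuum policy for large, frequently replaced pristines does not have to match the policy for the small metadata rows.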

>
>> Git can keep deleted items until git-gc is invoked. Should we support
>> something similar, we would need to be consistent and probably support
>> arbitrary revision history, which is out of scope.
>
> I'm confused: how does revision history affect the pristine store?

If the pristine store also keeps multiple revisions, then it's a whole different set of features than what we are aiming for (at least for compressed pristines).

>
>>   SQLite (which internally uses a b-tree pointing to fixed-size pages that
>>   overflow using linked-lists) is designed for fast
>>   additions/modifications/deletions of typically tiny data (a row is
>>   reasonably assumed to be -much- less than a page in most cases)
>
> Are you quite sure about that? Certainly, the /keys/ need to be much
> smaller than a page size in order for the B+-tree implementation to be
> reasonably efficient, but I can't see how that can be the case for
> BLOBs, which are treated differently all the way from SQL semantics
> level to the C API, and aren't keys.

Yes, I'm fairly certain. There are b-tree specific pages that hold the b-tree information. See http://www.sqlite.org/fileformat2.html for more details. But again, I'd rather have hard numbers than speculate based on theory.
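
One way to get such a number is to count pages before and after storing a single large BLOB. This is only a sketch, assuming Python's standard sqlite3 module, an explicit 4096-byte page size, and a hypothetical one-table schema; it shows that a row much larger than a page spills into a chain of additional (overflow) pages.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "pristine.db")
conn = sqlite3.connect(path)

# The page size has to be chosen before the database file is first written.
conn.execute("PRAGMA page_size = 4096")
conn.execute("CREATE TABLE pristine (checksum TEXT PRIMARY KEY, data BLOB)")
conn.commit()

page_size = conn.execute("PRAGMA page_size").fetchone()[0]
pages_before = conn.execute("PRAGMA page_count").fetchone()[0]

# A 64 KiB payload: far larger than one page, so most of it cannot live
# on the b-tree leaf page and is stored on overflow pages instead.
blob = os.urandom(64 * 1024)
conn.execute("INSERT INTO pristine VALUES (?, ?)", ("hypothetical-checksum", blob))
conn.commit()

pages_after = conn.execute("PRAGMA page_count").fetchone()[0]
growth = pages_after - pages_before
print(f"page size {page_size}, file grew by {growth} pages for {len(blob)} bytes")
```

Repeating the measurement with real pristine files of various sizes would give the per-entry overhead in pages rather than an estimate from the file format description.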

>
>>   and *without* promising a compact footprint, which we dearly care about.
>
> Not all the time. It's OK to make it "compact" only during "svn
> cleanup", and if we add a "--gc" option to that (which would, I expect,
> invoke SQLite's VACUUM command), then the user will clearly understand
> that they're trading time for space.

Fair enough, provided that's a reasonable compromise that is documented and given.
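
The time-for-space trade can be sketched as follows, assuming Python's standard sqlite3 module and a hypothetical pristine table. The "--gc" option is only a proposal in this thread, so the VACUUM call below merely stands in for whatever such a cleanup step might run.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "pristine.db")

# isolation_level=None puts the connection in autocommit mode, which
# VACUUM requires (it cannot run inside an open transaction).
conn = sqlite3.connect(path, isolation_level=None)
conn.execute("PRAGMA auto_vacuum = NONE")  # freed pages go to the freelist
conn.execute("CREATE TABLE pristine (checksum TEXT PRIMARY KEY, data BLOB)")

# Store some hypothetical pristine texts, 32 KiB each.
for i in range(50):
    conn.execute(
        "INSERT INTO pristine VALUES (?, ?)",
        (f"checksum-{i}", os.urandom(32 * 1024)),
    )
size_full = os.path.getsize(path)

# Drop every pristine: the pages become free, but the file is not truncated.
conn.execute("DELETE FROM pristine")
free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]
size_after_delete = os.path.getsize(path)

# The explicit, user-requested compaction step.
conn.execute("VACUUM")
size_after_vacuum = os.path.getsize(path)

print(size_full, size_after_delete, size_after_vacuum, free_pages)
```

Between the DELETE and the VACUUM the space is reusable by later inserts but is not returned to the filesystem, which is the behavior that would need to be documented for users.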

>
>>   We will be doing the same on KBytes worth of data for each entry. This is
>>   something that we must certainly research more with actual data. However,
>>   in my mind our use-case is quite different from what SQLite is designed
>>   to do best, which is why I'm suggesting we do some benchmarking if we go
>>   with SQLite.
>>
>> Just wanted to make this clear, to be sure we're not talking at cross
>> purposes at this point.
>
> I suspect that benchmarking for its own sake is not worth the trouble at
> this point. Just go and start implementing the proposal, it'll be a lot
> easier to benchmark once the client actually uses the compressed, packed
> pristine store -- because you'll be able to use real-world datasets, not
> contrived ones.

Yes, I meant benchmarking the actual implementation to collect more data and decide how to proceed.

>
> Since we now have a set of performance tests, it might not be a bad idea
> to incorporate compressed/uncompressed pristine in the comparison charts.

Absolutely, agreed.

-Ash

>
> -- Brane
>
Received on 2012-04-01 01:17:19 CEST

This is an archived mail posted to the Subversion Dev mailing list.
