
Re: Compressed Pristines (Summary)

From: Ashod Nakashian <ashodnakashian_at_yahoo.com>
Date: Sat, 31 Mar 2012 14:30:01 -0700 (PDT)

----- Original Message -----

> From: Branko Čibej <brane_at_apache.org>
> To: dev_at_subversion.apache.org
> Cc:
> Sent: Sunday, April 1, 2012 12:58 AM
> Subject: Re: Compressed Pristines (Summary)
>
> On 31.03.2012 18:16, Ashod Nakashian wrote:
>> So it's fair to say I'm ignorant about the details, but I must say
>> this: A repository, precisely like Git pack files, doesn't necessarily need
>> good (if any) support for deletion. This is a very critical issue, though I can
>> see why it might not be obvious at first.
>
> Instead of repeating the obvious, I suggest you start reading here:
>
>     http://www.sqlite.org/pragma.html#pragma_auto_vacuum
>
> It's clear that, in order to optimize the pristine store, the pristine
> files should not be stored in wc.db but in a separate database (simply
> because auto-vacuum optimization will be different for wc.db and
> pristine.db). Apart from that, I still don't see how a custom pack
> format can do better in the short term than what SQLite already does.

In the short term, it probably won't do any better (especially for small WCs). In the long term, we'll have control over organization/defragmentation, compression, and virtually all other operations and behaviors. As for the PS residing in a separate db, I think there is no question about that: it should be separate from wc.db.
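To make the "separate db" point concrete, here is a minimal sketch of a pristine store kept in its own SQLite file with its own auto-vacuum setting. Everything in it is an assumption for illustration: the file name pristine.db, the table layout, the SHA-1 key, and the use of zlib are hypothetical and not the actual Subversion schema or API.

```python
import hashlib
import os
import sqlite3
import tempfile
import zlib


def open_pristine_db(path):
    """Open (or create) a pristine store kept separate from wc.db."""
    conn = sqlite3.connect(path)
    # auto_vacuum only takes effect on a new database if it is set before
    # the first table is created; an existing file would need a VACUUM.
    conn.execute("PRAGMA auto_vacuum = INCREMENTAL")
    # Hypothetical layout: one row per pristine, keyed by content checksum.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pristine ("
        " checksum TEXT PRIMARY KEY,"
        " size INTEGER NOT NULL,"
        " data BLOB NOT NULL)"
    )
    conn.commit()
    return conn


def put_pristine(conn, content):
    """Store compressed content and return its checksum key."""
    checksum = hashlib.sha1(content).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO pristine (checksum, size, data) VALUES (?, ?, ?)",
        (checksum, len(content), zlib.compress(content)),
    )
    conn.commit()
    return checksum


def get_pristine(conn, checksum):
    """Return the uncompressed content, or None if the key is unknown."""
    row = conn.execute(
        "SELECT data FROM pristine WHERE checksum = ?", (checksum,)
    ).fetchone()
    return None if row is None else zlib.decompress(row[0])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = open_pristine_db(os.path.join(tmp, "pristine.db"))
        key = put_pristine(db, b"int main(void) { return 0; }\n")
        print(key, get_pristine(db, key))
        db.close()
```

Because the store is a separate file, its pragmas can be tuned without touching wc.db, which is the whole point of keeping the two apart.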

>
>> Git can keep deleted items until git-gc is invoked. Should we support
>> something similar, we would need to be consistent and probably support arbitrary
>> revision history, which is out of scope.
>
> I'm confused: how does revision history affect the pristine store?

If the pristine store also keeps multiple revisions, then it's a whole different set of features than what we are aiming for (at least for compressed pristines).

>
>>   Sqlite (which internally uses a b-tree pointing to fixed-size pages that
>> overflow using linked-lists) is designed for fast
>> additions/modifications/deletions of typically tiny data (a row is reasonably
>> assumed to be -much- less than a page in most cases)
>
> Are you quite sure about that? Certainly, the /keys/ need to be much
> smaller than a page size in order for the B+-tree implementation to be
> reasonably efficient, but I can't see how that can be the case for
> BLOBs, which are treated differently all the way from SQL semantics
> level to the C API, and aren't keys.

Yes, I'm fairly certain. There are b-tree-specific pages that hold the b-tree information; see http://www.sqlite.org/fileformat2.html for more details. But again, I'd rather have hard numbers than speculate based on theory.
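As a starting point for getting those hard numbers, a small probe like the following could report how many pages SQLite actually uses for KByte-sized blobs. It is only a sketch: the table name, blob sizes, and the helper name pages_used_by_blobs are made up for illustration, and random data stands in for real pristines.

```python
import os
import sqlite3
import tempfile


def pages_used_by_blobs(blob_size, count, page_size=4096):
    """Store `count` random blobs of `blob_size` bytes; report page usage."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "probe.db"))
        # page_size must be set before the database file is first written.
        conn.execute(f"PRAGMA page_size = {int(page_size)}")
        conn.execute("CREATE TABLE blobs (id INTEGER PRIMARY KEY, data BLOB)")
        conn.executemany(
            "INSERT INTO blobs (data) VALUES (?)",
            ((os.urandom(blob_size),) for _ in range(count)),
        )
        conn.commit()
        page_count = conn.execute("PRAGMA page_count").fetchone()[0]
        actual_page_size = conn.execute("PRAGMA page_size").fetchone()[0]
        conn.close()
    payload_bytes = blob_size * count
    file_bytes = page_count * actual_page_size
    return {
        "page_size": actual_page_size,
        "page_count": page_count,
        "file_bytes": file_bytes,
        "payload_bytes": payload_bytes,
        # Ratio of on-disk size to raw payload; above 1.0 is pure overhead.
        "overhead": file_bytes / payload_bytes,
    }


if __name__ == "__main__":
    for size in (1024, 16 * 1024, 256 * 1024):
        print(size, pages_used_by_blobs(size, count=20))
```

Running it over a few blob and page sizes gives a rough picture of the space overhead, though real pristine data would be needed before drawing conclusions.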

>
>>   and *without* promising a compact footprint, which we dearly care about.
>
> Not all the time. It's OK to make it "compact" only during "svn
> cleanup", and if we add a "--gc" option to that (which would, I expect,
> invoke SQLite's VACUUM command), then the user will clearly understand
> that they're trading time for space.

Fair enough, provided that's a reasonable compromise that is documented and understood by the user.
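To illustrate the time-for-space trade, here is a minimal sketch of what SQLite's VACUUM does to the file size after deletions. It only demonstrates SQLite behavior; the table name and the helper vacuum_demo are hypothetical, and nothing here models the proposed "--gc" option itself.

```python
import os
import sqlite3
import tempfile


def vacuum_demo(db_path, blob_size=32 * 1024, count=50):
    """Return file sizes (after fill, after delete, after VACUUM) in bytes."""
    # isolation_level=None puts the connection in autocommit mode, so that
    # VACUUM is never issued inside an open transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    # With auto_vacuum off, freed pages stay in the file until VACUUM runs.
    conn.execute("PRAGMA auto_vacuum = NONE")
    conn.execute("CREATE TABLE pristine (id INTEGER PRIMARY KEY, data BLOB)")
    conn.execute("BEGIN")
    conn.executemany(
        "INSERT INTO pristine (data) VALUES (?)",
        ((os.urandom(blob_size),) for _ in range(count)),
    )
    conn.execute("COMMIT")
    size_full = os.path.getsize(db_path)

    # Drop half of the rows; their pages go onto the free list.
    conn.execute("DELETE FROM pristine WHERE id % 2 = 0")
    size_after_delete = os.path.getsize(db_path)

    # VACUUM rebuilds the database file and releases the free pages.
    conn.execute("VACUUM")
    size_after_vacuum = os.path.getsize(db_path)
    conn.close()
    return size_full, size_after_delete, size_after_vacuum


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(vacuum_demo(os.path.join(tmp, "pristine.db")))
```

The deletion alone does not shrink the file; only the explicit VACUUM does, which is why exposing it as a deliberate user action seems reasonable.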

>
>>   We will be doing the same on KBytes worth of data for each entry. This is
>> something that we must certainly research more with actual data. However, in my
>> mind our use-case is quite different from what Sqlite is designed to do best, which
>> is why I'm suggesting we do some benchmarking if we go with Sqlite.
>>
>> Just wanted to make this clear, to be sure we're not talking at cross
>> purposes at this point.
>
> I suspect that benchmarking for its own sake is not worth the trouble at
> this point. Just go and start implementing the proposal, it'll be a lot
> easier to benchmark once the client actually uses the compressed, packed
> pristine store -- because you'll be able to use real-world datasets, not
> contrived ones.

Yes, I meant benchmarking the actual implementation, to collect more data and decide how to proceed.

>
> Since we now have a set of performance tests, it might not be a bad idea
> to incorporate compressed/uncompressed pristine in the comparison charts.

Absolutely, agreed.

-Ash

>
> -- Brane
>
Received on 2012-04-01 01:17:19 CEST

This is an archived mail posted to the Subversion Dev mailing list.
