
Re: Compressed Pristines (Summary)

From: Branko Čibej <brane_at_apache.org>
Date: Sat, 31 Mar 2012 21:58:50 +0200

On 31.03.2012 18:16, Ashod Nakashian wrote:
> So it's fair to say I'm ignorant about the details, but I must say this: a repository, precisely like Git pack files, doesn't necessarily need good support for deletion (if any at all). This is a very critical issue, and I can see why it might not be obvious at first.

Instead of repeating the obvious, I suggest you start reading here:

    http://www.sqlite.org/pragma.html#pragma_auto_vacuum

It's clear that, in order to optimize the pristine store, the pristine
files should not be stored in wc.db but in a separate database (simply
because auto-vacuum optimization will be different for wc.db and
pristine.db). Apart from that, I still don't see how a custom pack
format can do better in the short term than what SQLite already does.

> Git can keep deleted items until git-gc is invoked. Should we support something similar, we would need to be consistent and probably support arbitrary revision history, which is out of scope.

I'm confused: how does revision history affect the pristine store?

> SQLite (which internally uses a B-tree pointing to fixed-size pages that overflow using linked lists) is designed for fast additions/modifications/deletions of typically tiny data (a row is reasonably assumed to be -much- smaller than a page in most cases)

Are you quite sure about that? Certainly, the /keys/ need to be much
smaller than a page size in order for the B+-tree implementation to be
reasonably efficient, but I can't see how that can be the case for
BLOBs, which are treated differently all the way from SQL semantics
level to the C API, and aren't keys.
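One way to check this empirically is to insert a single large BLOB and watch the page count: only a small prefix of an oversized record stays in the B-tree leaf, and the remainder is chained through overflow pages, so the key stays tiny while the value spans many pages. A sketch using Python's `sqlite3`; the 64 KiB payload, table name and checksum string are arbitrary, hypothetical choices.

```python
import os
import sqlite3


def pages_added_by_blob(blob_size: int, path: str = ":memory:") -> tuple:
    """Insert one BLOB of blob_size bytes and report how many pages it cost.

    Returns (page_size, pages_added). Only a prefix of a large record stays
    in the B-tree leaf page; the rest goes into a chain of overflow pages.
    """
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE pristine (checksum TEXT PRIMARY KEY, data BLOB)")
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    before = conn.execute("PRAGMA page_count").fetchone()[0]
    # The key is a short checksum string; the value is many pages long.
    conn.execute(
        "INSERT INTO pristine (checksum, data) VALUES (?, ?)",
        ("hypothetical-sha1", os.urandom(blob_size)),
    )
    after = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.close()
    return page_size, after - before


if __name__ == "__main__":
    size, added = pages_added_by_blob(64 * 1024)
    print(f"page_size={size}, pages added for one 64 KiB BLOB: {added}")
```

The B-tree itself only ever compares the short keys; the overflow chain is walked when the value is actually read.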

> and *without* promising a compact footprint, which we dearly care about.

Not all the time. It's OK to make it "compact" only during "svn
cleanup", and if we add a "--gc" option to that (which would, I expect,
invoke SQLite's VACUUM command), then the user will clearly understand
that they're trading time for space.
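The trade-off can be modelled in a few lines. In the sketch below, `cleanup` and `create_pristine_db` are hypothetical names (nothing here is an existing Subversion API): without `gc`, deleting unreferenced pristines only moves their pages onto SQLite's freelist, which is fast but leaves the file size alone; with `gc`, it also runs SQLite's `VACUUM`, which rebuilds the file and returns the space.

```python
import sqlite3


def create_pristine_db(path: str) -> sqlite3.Connection:
    """Create a pristine store with auto-vacuum explicitly disabled (assumption)."""
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit mode
    conn.execute("PRAGMA auto_vacuum = NONE")  # must precede the first CREATE TABLE
    conn.execute("CREATE TABLE IF NOT EXISTS pristine (checksum TEXT PRIMARY KEY, data BLOB)")
    return conn


def cleanup(conn: sqlite3.Connection, referenced: set, gc: bool = False) -> int:
    """Hypothetical model of 'svn cleanup' (plus a '--gc' flag) on pristine.db.

    Deletes rows whose checksum is no longer referenced and returns how many
    were removed. Freed pages stay on the freelist unless gc is True, in which
    case VACUUM rewrites the whole file (it must run outside a transaction,
    which autocommit mode guarantees here).
    """
    stored = [row[0] for row in conn.execute("SELECT checksum FROM pristine")]
    stale = [c for c in stored if c not in referenced]
    conn.executemany("DELETE FROM pristine WHERE checksum = ?", [(c,) for c in stale])
    if gc:
        conn.execute("VACUUM")
    return len(stale)


def freelist_pages(conn: sqlite3.Connection) -> int:
    return conn.execute("PRAGMA freelist_count").fetchone()[0]
```

After a plain `cleanup`, `freelist_pages` is typically non-zero and the file is as large as before; after `cleanup(..., gc=True)` the freelist is empty and the file shrinks, at the cost of copying every remaining pristine.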

> We will be doing the same on kilobytes' worth of data for each entry. This is something that we must certainly research more with actual data. However, in my mind, our
> use case is quite different from what SQLite is designed to do best, which is why I'm suggesting we do some benchmarking if we go with SQLite.
>
> I just wanted to make this clear, to be sure we're not talking at cross purposes at this point.

I suspect that benchmarking for its own sake is not worth the trouble at
this point. Just go and start implementing the proposal; it'll be a lot
easier to benchmark once the client actually uses the compressed, packed
pristine store -- because you'll be able to use real-world datasets, not
contrived ones.

Since we now have a set of performance tests, it might not be a bad idea
to include compressed vs. uncompressed pristine stores in the comparison charts.
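A rough sketch of the kind of comparison that could feed such a chart, meant to be run on real pristine file contents once they exist. `measure_store` is a hypothetical helper, not part of Subversion's performance test suite, and zlib merely stands in for whatever compressor the real store ends up using.

```python
import sqlite3
import time
import zlib


def measure_store(contents: list, compress: bool) -> tuple:
    """Store every item in a fresh in-memory pristine table, then read it back.

    Returns (stored_payload_bytes, elapsed_seconds). Raises ValueError if the
    round trip does not reproduce the input exactly.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pristine (checksum TEXT PRIMARY KEY, data BLOB)")
    start = time.perf_counter()
    for i, data in enumerate(contents):
        payload = zlib.compress(data) if compress else data
        conn.execute("INSERT INTO pristine VALUES (?, ?)", (f"item{i}", payload))
    conn.commit()
    rows = conn.execute("SELECT data FROM pristine ORDER BY rowid").fetchall()
    restored = [zlib.decompress(p) if compress else bytes(p) for (p,) in rows]
    elapsed = time.perf_counter() - start
    if restored != list(contents):
        raise ValueError("round trip failed")
    # LENGTH() on a BLOB counts bytes, so this is the total stored payload size.
    stored = conn.execute("SELECT COALESCE(SUM(LENGTH(data)), 0) FROM pristine").fetchone()[0]
    conn.close()
    return stored, elapsed
```

Calling it twice on the same list, once with `compress=False` and once with `compress=True`, gives one size/time pair per variant; only payload bytes are counted, not page overhead, so file-level sizes would still need to be measured separately.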

-- Brane
Received on 2012-04-01 01:58:59 CEST

This is an archived mail posted to the Subversion Dev mailing list.
