
Re: pristine store design

From: Greg Stein <gstein_at_gmail.com>
Date: Wed, 3 Mar 2010 13:00:09 -0500

I didn't miss any part.

You're optimizing writes, when you should worry about reads.

The DB is always open, so reading and writing to it is "cheap".

I don't care about a scheme to seek to the end of a 16GB chunk. You're
making stuff up again.

On Mar 3, 2010 9:45 AM, "Stefan Sperling" <stsp_at_elego.de> wrote:

On Wed, Mar 03, 2010 at 12:24:29PM -0500, Greg Stein wrote:
> You're talking about schemes to verify...
I didn't say that. I'd very much like svn to verify data it
reads from the pristine store, on the fly, and point out corrupted
pristines to the user.

> You're talking about splitting files for certain filesystems to help with
> size limitations, yet...
Those are side-issues.

What Neels and I are trying to get rid of is the need for
locking when writing to the pristine store.

You missed the part about not storing data in an SQLite DB when that
data will never change once written.
We need to store the MD5 of every pristine somewhere, for instance.
If we do store this data in a DB, writing to the pristine store requires
synchronising access to the DB to keep the DB in a consistent state,
on top of writing the pristine itself.

Writing the pristine itself is already lockless; writing the MD5
alongside it would mean we don't need any locking at all.
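To make the idea concrete, here is a minimal sketch in Python (not
Subversion's actual C implementation) of a lockless pristine write:
the content and its MD5 are streamed into a temp file, which is then
renamed into place under its SHA-1 name. The function name and layout
(MD5 as 16 trailing bytes) are assumptions for illustration only.

```python
import hashlib
import os
import tempfile

def write_pristine(store_dir, content):
    """Lockless pristine write (illustrative sketch).

    Stream the content plus its MD5 into a private temp file, then
    atomically rename it into place under its SHA-1 name. Concurrent
    writers of the same content race harmlessly: every racer renames
    identical bytes onto the same name.
    """
    sha1 = hashlib.sha1(content).hexdigest()
    md5 = hashlib.md5(content).digest()
    fd, tmp = tempfile.mkstemp(dir=store_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(content)
            f.write(md5)  # 16 trailing bytes: the MD5, no DB needed
        os.rename(tmp, os.path.join(store_dir, sha1))  # atomic on POSIX
    except BaseException:
        os.unlink(tmp)
        raise
    return sha1
```

No lock is taken at any point: the only shared step is the rename,
which POSIX guarantees to be atomic.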

> Putting data in the file means you have to *open* it to read the data.
We're opening and reading pristines anyway.
Reading pristines is disk i/o we cannot avoid.

The proposed scheme even minimises I/O when we need only a chunk
near the end of a file: seek across a few SHA1 checksums, read one
SHA1 checksum, then open the pristine with that checksum, instead of
seeking through an entire 16GB pristine until the right block has
been found.

Granted, reading an entire huge pristine involves opening a number of
other pristines. I'm not sure which is better.

> Again: we are centralizing in order to aggregate data and reduce I/O. Your
> idea defeats that go...
Is writing another few bytes to the file slower than writing to the file
and then opening the DB and modifying the DB, possibly waiting for another
process to unlock the DB, so we can store the MD5?

Stefan
Received on 2010-03-03 19:00:44 CET

This is an archived mail posted to the Subversion Dev mailing list.
