On Wed, Mar 03, 2010 at 12:24:29PM -0500, Greg Stein wrote:
> You're talking about schemes to verify partial reads, yet stated you
> couldn't think of any cases.
I didn't say that. I'd very much like svn to verify data it
reads from the pristine store, on the fly, and point out corrupted
pristines to the user.
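To make that concrete, here is a minimal sketch of what on-the-fly verification could look like. It is purely illustrative, not svn's actual code: the function name and the idea of passing in the expected MD5 are assumptions; in svn the checksum would come from wherever it is stored (DB row or data kept with the pristine).

```python
import hashlib

def read_pristine_verified(path, expected_md5, chunk_size=8192):
    """Stream a pristine file, verifying its MD5 on the fly.

    Hypothetical sketch: 'expected_md5' is the checksum recorded
    when the pristine was installed.  Each chunk is yielded to the
    caller while the digest accumulates; at EOF a mismatch means
    the pristine is corrupt and the user can be told so.
    """
    h = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
            yield chunk
    if h.hexdigest() != expected_md5:
        raise IOError("corrupted pristine: %s" % path)
```

The point is that verification costs no extra I/O: the data is being read anyway, and the digest is computed as a side effect.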
> You're talking about splitting files for certain filesystems to help with
> size limitations, yet the working file still has to be whole, and we have
> no/few? reports of size problems.
> IOW, you're just making up requirements and work.
Those are side-issues.
What Neels and I are trying to get rid of is the need for
locking when writing to the pristine store.
You missed the part about not storing data in an SQLite DB when that
data will never change once written.
We need to store the MD5 of every pristine somewhere, for instance.
If we do store this data in a DB, writing to the pristine store requires
synchronising access to the DB to keep it in a consistent state,
on top of writing the pristine itself.
Writing the pristine itself is already lockless, and writing
the MD5 along with it would mean we don't need any locking at all.
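A rough sketch of the lockless scheme being argued for, under assumed names and layout (the MD5 is shown here on the first line of the file; a sidecar file would work the same way). The trick is that write-to-temp-name plus rename() is atomic, so two processes installing the same pristine concurrently just race to install identical content:

```python
import hashlib
import os
import tempfile

def install_pristine(store_dir, data):
    """Install a pristine plus its MD5 with no locking.

    Hypothetical layout: the file is addressed by its SHA1, and its
    MD5 is stored in the file itself rather than in a DB.  Writing
    to a temporary name and then rename()-ing into place is atomic,
    so no lock is needed: if two processes install the same
    pristine, the last rename wins and the content is identical.
    """
    sha1 = hashlib.sha1(data).hexdigest()
    md5 = hashlib.md5(data).hexdigest()
    fd, tmp = tempfile.mkstemp(dir=store_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(md5.encode("ascii") + b"\n")  # checksum lives with the data
        f.write(data)
    final = os.path.join(store_dir, sha1)
    os.rename(tmp, final)  # atomic on POSIX within one filesystem
    return final
```

Compare that with the DB variant: the same write, followed by opening the DB, taking its lock, and updating a row, with the possibility of waiting on another process in between.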
> Putting data in the file means you have to *open* it to read the data.
We're opening and reading pristines anyway.
Reading pristines is disk i/o we cannot avoid.
The proposed scheme even minimises I/O when we need only a chunk
near the end of a file: seek across a few SHA1 checksums, read the
right one, then open the pristine with that checksum, instead of
seeking through an entire 16GB pristine until the right block is found.
Granted, reading an entire huge pristine involves opening a number of
other pristines. Not sure which is better.
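The lookup described above can be sketched like this. The layout is an assumption for illustration: a split pristine is represented by an index file listing the SHA1 of each fixed-size chunk, one per line, and each chunk is itself a pristine named by that SHA1.

```python
import os

SHA1_HEX_LEN = 40  # one hex SHA1 per chunk, newline-terminated

def chunk_for_offset(index_path, store_dir, offset, chunk_size):
    """Locate the chunk file covering 'offset' in a split pristine.

    Hypothetical layout: fixed-width index entries mean we can seek
    straight to the entry for the chunk we want, read one checksum,
    and open only that chunk -- rather than reading through the
    whole multi-gigabyte pristine to reach a block near its end.
    Returns the chunk's path and the offset within that chunk.
    """
    chunk_no = offset // chunk_size
    with open(index_path, "rb") as idx:
        idx.seek(chunk_no * (SHA1_HEX_LEN + 1))  # +1 for the newline
        sha1 = idx.read(SHA1_HEX_LEN).decode("ascii")
    return os.path.join(store_dir, sha1), offset % chunk_size
```

So reaching a block near the end of a 16GB file costs one small seek-and-read in the index plus one open, at the price of extra opens when the whole file is read sequentially.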
> Again: we are centralizing in order to aggregate data and reduce I/O. Your
> idea defeats that goal.
Is writing another few bytes to the file slower than writing the file
and then opening and modifying the DB, possibly waiting for another
process to unlock it, just so we can store the MD5?
Received on 2010-03-03 18:45:44 CET