
Re: fsfs7 structure-indexes - questions

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Mon, 1 Jun 2015 12:39:45 +0200

On Mon, May 18, 2015 at 1:20 AM, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:

> Good morning Stefan,
>
> A couple of questions on the fsfs7 indexes doc:
>
> [[[
> --- subversion/libsvn_fs_fs/structure-indexes
> +++ subversion/libsvn_fs_fs/structure-indexes
> @@ -18,6 +18,9 @@ a simple concatenation of runtime structs and as such, an implementation
> detail subject to change. A proto index basically aggregates all the
> information that must later be transformed into the final index.
>
> +### "Subject to change?" The format must be stable & documented since
> +### "begin txn, modify txn, reboot, upgrade svn, commit txn" should work.
> +
>

That's an outdated section. All index structures,
including the proto-indexes, are now well-defined
and platform-independent.

> General design concerns
> -----------------------
> @@ -346,7 +349,10 @@ For performance reasons we use a modified version:
> * combine the big endian representation of these checksums plus the
> remnant of the original stream into a 12 to 15 byte long intermediate
>
> [i0 .. iK], 12 <= K+1 <= 15
>
> +### "combine" is unclear. Combine how? Concatenate? Interleave? Add? Multiply?
> +
>

They are concatenated into a 16 to 19 byte sequence
(so the "12 to 15 byte" figure in the doc is also a
factual error).

> * FNV checksum = fnv_1a([i0 .. iK]) in big endian representation
>
> +### Why do we checksum the output of a checksum algorithm? (Bytes i0..iK are themselves FNV output)
>

Not all are FNV output, they also include up to
3 "odd" bytes from the end of the original sequence.
But the main reason is that we want a short / space
efficient checksum to keep the index size small.

A 1-in-a-billion chance of a false negative for a random
bit corruption is good enough for a quick low-level check.
It is on par with what e.g. BTRFS does.
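
To make the "concatenate, then hash again" step concrete, here is a rough
Python sketch. It is not the Subversion code: the function names are made
up for illustration, and the way input bytes are distributed over the four
partial checksums (byte i goes to stream i mod 4) is an assumption that is
not spelled out in this mail. What it does take from the discussion above
is the shape of the intermediate: four 32-bit checksums in big endian
(16 bytes) followed by up to 3 leftover bytes, which is then reduced to a
single 32-bit FNV-1a value.

```python
import struct

# Standard 32-bit FNV-1a parameters.
FNV_PRIME_32 = 0x01000193
FNV_BASIS_32 = 0x811C9DC5


def fnv1a_32(data: bytes) -> int:
    """Plain 32-bit FNV-1a over a byte string."""
    h = FNV_BASIS_32
    for b in data:
        h = ((h ^ b) * FNV_PRIME_32) & 0xFFFFFFFF
    return h


def fnv1a_32x4_sketch(data: bytes) -> int:
    """Hypothetical 4-way variant: four partial FNV-1a checksums,
    concatenated with the leftover bytes, then hashed once more."""
    full = len(data) - len(data) % 4  # bytes covered by the 4 streams
    hashes = [FNV_BASIS_32] * 4

    # Assumption: bytes are dealt round-robin to the four streams.
    for i in range(full):
        k = i % 4
        hashes[k] = ((hashes[k] ^ data[i]) * FNV_PRIME_32) & 0xFFFFFFFF

    # 4 x 4 bytes big endian, plus 0..3 "odd" trailing bytes.
    intermediate = struct.pack(">4I", *hashes) + data[full:]
    assert 16 <= len(intermediate) <= 19

    # Final short checksum that would be stored in the index.
    return fnv1a_32(intermediate)


if __name__ == "__main__":
    print(hex(fnv1a_32x4_sketch(b"some index page contents")))
```

The second hash is what keeps the stored value at 4 bytes instead of
16 to 19, which is the space argument made above.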

> ]]]
>
> Also, I suggest changing "zero to many" to "zero or more" to avoid confusion
> with the term "one-to-many" of relational databases.
>

Yeah, that should be changed as well.

Thanks for the review! r1692739 addresses the first
3 points and r1692964 the last one.

-- Stefan^2.
Received on 2015-06-01 12:40:03 CEST

This is an archived mail posted to the Subversion Dev mailing list.
