Re: FSFS successor ID design draft

From: Stefan Sperling <stsp_at_elego.de>
Date: Mon, 5 Sep 2011 11:38:11 +0200

On Sun, Sep 04, 2011 at 02:23:28AM +0300, Daniel Shahaf wrote:
> > Each node-revision map file stores a mapping of the form
> > node_rev_id => [revnum, revnum, ...]
> > The revision numbers identify those revisions in which one or more
> > successors of a given node-revision were added.
> >
> > In the first iteration of this design, the mapping is stored as lines
> > of ASCII text. Each line contains an unparsed node_revision_id, a space
> > separator (ASCII 0x20), and a revision number in ASCII decimal representation.
> > (The design may be refined later to use a more efficient storage model.)
> >
>
> Some wild ideas here:
>
> - Multiple, concatenated DELTA\n..ENDREP\n svndiff streams. That's
> appendable and compressible (a deltify process can combin adjacent
> streams, which will trigger zlib compression).
>
> (That won't work for the successors data since offsets into it are
> stored elsewhere.)

So you're saying that we should run the plaintext proposed above
through svndiff? Can you explain in more detail how this would work?
What is the base of a delta?

> - Some packing support that also implements grouping of all the map
> entries with the same lhs? Is that needed, or should the API at this

What is 'lhs'?

> level only implement 'slurp the entire map into a hash mapping ints to
> arrays of revnums'?

I would say so, with the possible retriction that only a certain range
of revisions is slurped into memory. This way, the API can optimise for
callers which are only interested in a certain revision range (e.g.
graphing tools can reconstruct the history of a file in several steps).

> (OT: those terms for the three kinds of files you have haven't sat well
> with me; from the term alone I couldn't instantly recall what the
> contents of the file were.)

What would you call them? I'm open to suggestions.

> > In exceptional cases (see below) it is possible that a given revision
> > does not actually contain any successors of the given node_rev_id.
> >
>
> Implementation question: would it make sense to have a mode for
> inquiring about several noderev's successors at the same time, in order
> to amortize the lookup operation (three open()s, several read()s of the
> successor data file)'s cost across them?

Yes, sure.

The API could take a list of node-revision IDs and return a hash table
keyed on node-revision ID, with the values being arrays of revision offsets.

> > When committing a finished transaction (in fs_fs.c:commit_body()),
> > the writer obtains a write-lock on all files within the successors
> > directory it needs to modify.
> >
>
> Clarification: do you propose that a single write lock be obtained to
> cover the entire successors/** hierarchy?
>
> Also: is the FSFS write lock (which is held by commit_body() whilst it
> runs) sufficient for this?

I would say the global write lock is sufficient.

I wasn't sure how exactly locking in FSFS works. You can read this
statement as "the writer has some kind of write lock on everything it
needs to change, possibly implied by a global write lock".

> > If rN falls within the revision range of an existing revision map file,
> > it copies ((N-2)*4) bytes of content from the revision map file into the
> > temporary file. (Note that this step discards data left behind by any
> > previous attempt at committing rN).
> >
>
> ((N-2)*4) is bogus for N=1, and it's also missing a modulo operation
> somewhere.

Math is hard!

> But the gist is clear: copy the four bytes of every
> preceding revision in the shard.

Yup.

> > If the writer fails to update all files in the successors directory,
> > it will also fail to update 'current'. In this case, readers will ignore
> > the new successor data until another commit succeeds in creating rN.
> > Once a commit of rN has succeeded, readers following node-rev-id map
> > entries updated by the failing commit might end up with an empty
> > successors list for rN. Such erroneous entries will not be cleaned
> > up automatically, but can be eliminated by re-creating the repository
> > (e.g. via a dump/load cycle).
>
> Near the start you referred to a 'compromise'. Is that the compromise?

This is what I meant, yes.

> That said, one issue. Debris in the successor data file or revision map
> file will be overwritten as soon as an attempt to commit rN succeeds;
> however, noderev map entries remain until a dump/load. How feasible
> would it be to also have a cleanup procedure for them?

I am happy to just leave this debris in the files for now.

I would guess that nobody will ever even notice this problem in practice.
The number of commits failing within the time window where succesor data
is updated will statistically be very low to begin with.
Each time it happens we lose a very small fraction of disk space. We also
suffer a teeny tiny bit of read performance loss for readers of successors
of the affected node-revision. So what...

If it ever becomes a real problem, people can dump/load.

One thing that is a little annoying is that file checksums for successor
maps in production and backup repositories might not match up. Folks
might then infer that there is some kind of repository corruption.
So we should make 'svnadmin verify' print a warning if we find debris
entries in the map. The warning would also explain that this is harmless
and does not constitute corruption.

> Such a cleanup procedure could grab the commits (and successors, if it's
> separate) write locks, attempt to read the successors data for all
> noderevs in a noderev map file, and delete the bogus entries from the
> noderev map file. (It could delete them either by atomically replacing
> the file with a new version, or by replacing the first byte of the
> revnum by some agreed non-[0-9\n] byte representing "This line of the
> noderev map file shall be ignored".
>
> (it must be a single byte because some revnums are represented in ASCII
> by a single byte)

If you really want to we can always bolt something like this on later.
Received on 2011-09-05 11:39:12 CEST

This message: [ Message body ]
Next message: Stefan Sperling: "question for FSFS gurus (was: Re: FSFS successor ID design draft)"
Previous message: Julian Foad: "[RFC] Make "svn switch" say "Switched" instead of "Updated" or "At""
In reply to: Daniel Shahaf: "Re: FSFS successor ID design draft"
Next in thread: Daniel Shahaf: "Re: FSFS successor ID design draft"
Reply: Daniel Shahaf: "Re: FSFS successor ID design draft"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]