On Mon, Sep 05, 2011 at 01:23:14PM +0300, Daniel Shahaf wrote:
> Stefan Sperling wrote on Mon, Sep 05, 2011 at 11:38:11 +0200:
> > So you're saying that we should run the plaintext proposed above
> > through svndiff? Can you explain in more detail how this would work?
> > What is the base of a delta?
> The file contains one or more DELTA\n..ENDREP\n streams:
> <svndiff stream>
> <svndiff stream>
> (On second thought, we should be storing the length of the stream
> somewhere; on the DELTA header seems a fine place:
> DELTA 512
> <512 bytes of svndiff stream>
> DELTA 37
> <37 bytes of svndiff stream>
> .) When the file is read, readers decode all the deltas and concatenate
> the resulting plaintexts. When the file is rewritten, writers
> optionally combine the first N deltas into a single delta that produces
> the combined plaintext.
> The deltas can be self-compressed (like a DELTA\n rep in the revision
> files), ie, having no base.
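To make sure I follow, here is a minimal sketch of a reader for that layout, in Python. Everything here is hypothetical: `decode_svndiff` stands in for a real svndiff decoder, and I'm assuming the `DELTA <length>\n<payload>` framing you proposed, with each stream self-compressed so it can be decoded independently:

```python
def read_combined_plaintext(data, decode_svndiff):
    """Walk a buffer of 'DELTA <length>\\n<payload>' records and return
    the concatenation of the decoded plaintexts (hypothetical sketch)."""
    plaintexts = []
    pos = 0
    while pos < len(data):
        # Parse the "DELTA <length>\n" header.
        eol = data.index(b'\n', pos)
        keyword, length = data[pos:eol].split()
        assert keyword == b'DELTA'
        length = int(length)
        pos = eol + 1
        # Slice out exactly <length> bytes of svndiff stream.
        payload = data[pos:pos + length]
        pos += length
        # Self-compressed streams have no base, so each one decodes
        # on its own; a rewriter could combine the first N of them.
        plaintexts.append(decode_svndiff(payload))
    return b''.join(plaintexts)
```

(Using an identity function for `decode_svndiff`, `b'DELTA 3\nfoo' + b'DELTA 4\nbars'` comes back as `b'foobars'`.)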
OK, I see. You're trying to save disk space, trading it for CPU time
during read/write operations. Does that make sense? Is the amount of
data really going to be big enough to be worth it?
> > What is 'lhs'?
> lhs = left-hand side
> rhs = right-hand side
> How about calling them after the RHSes of the mappings rather than
> after the fact that they are mappings?
> - noderev map file, revision map file, successors data file
> - noderev posterity file, successor offsets file, successors data file
These names are fine with me.
What would you call them on disk?
> (Is 'progeny' the more appropriate word here?
I like 'progeny' because it means 'immediate offspring'.
'Posterity' includes all descendants in all generations, and that's not
what the file is storing.
> > I am happy to just leave this debris in the files for now.
> > I would guess that nobody will ever even notice this problem in practice.
> > The number of commits failing within the time window where successor data
> > is updated will statistically be very low to begin with.
> > Each time it happens we lose a very small fraction of disk space. We also
> > suffer a teeny tiny bit of read performance loss for readers of successors
> > of the affected node-revision. So what...
> > If it ever becomes a real problem, people can dump/load.
> It'll work, but it's a kill-a-fly-with-a-fleet approach :). Dump/load
> the entire history (many MB of svndiffs) only to fix some derived
> noderev->offset map data?
Sure, it's not optimal. I doubt anyone will be bothered enough to
perform a dump/load just for this.
Received on 2011-09-05 12:53:17 CEST