On Mon, Sep 05, 2011 at 01:23:14PM +0300, Daniel Shahaf wrote:
> Stefan Sperling wrote on Mon, Sep 05, 2011 at 11:38:11 +0200:
> > So you're saying that we should run the plaintext proposed above
> > through svndiff? Can you explain in more detail how this would work?
> > What is the base of a delta?
> The file contains one or more DELTA\n..ENDREP\n streams:
> <svndiff stream>
> <svndiff stream>
> (On second thought, we should be storing the length of the stream
> somewhere; on the DELTA header seems a fine place:
> DELTA 512
> <512 bytes of svndiff stream>
> DELTA 37
> <37 bytes of svndiff stream>
> .) When the file is read, readers decode all the deltas and concatenate
> the resulting plaintexts. When the file is rewritten, writers
> optionally combine the first N deltas into a single delta that produces
> the combined plaintext.
> The deltas can be self-compressed (like a DELTA\n rep in the revision
> files), ie, having no base.
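To make sure I follow, here is a minimal sketch of a reader for that layout, in Python. Everything here is hypothetical: `decode_svndiff` stands in for a real svndiff decoder, and I'm assuming the `DELTA <length>\n<payload>` framing you proposed, with each stream self-compressed so it can be decoded independently:

```python
def read_combined_plaintext(data, decode_svndiff):
    """Walk a buffer of 'DELTA <length>\\n<payload>' records and return
    the concatenation of the decoded plaintexts (hypothetical sketch)."""
    plaintexts = []
    pos = 0
    while pos < len(data):
        # Parse the "DELTA <length>\n" header.
        eol = data.index(b'\n', pos)
        keyword, length = data[pos:eol].split()
        assert keyword == b'DELTA'
        length = int(length)
        pos = eol + 1
        # Slice out exactly <length> bytes of svndiff stream.
        payload = data[pos:pos + length]
        pos += length
        # Self-compressed streams have no base, so each one decodes
        # on its own; a rewriter could combine the first N of them.
        plaintexts.append(decode_svndiff(payload))
    return b''.join(plaintexts)
```

(Using an identity function for `decode_svndiff`, `b'DELTA 3\nfoo' + b'DELTA 4\nbars'` comes back as `b'foobars'`.)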
OK, I see. You're trying to save disk space, trading it for CPU time
during read/write operations. Does that make sense? Is the amount of
data really going to be big enough to be worth it?
> > What is 'lhs'?
> lhs = left-hand side
> rhs = right-hand side
> How about calling them after the RHSes of the mappings rather than
> after the fact that they are mappings?
> - noderev map file, revision map file, successors data file
> - noderev posterity file, successor offsets file, successors data file
These names are fine with me.
What would you call them on disk?
> (Is 'progeny' the more appropriate word here?
I like 'progeny' because it means 'immediate offspring'.
'Posterity' includes all descendants in all generations, and that's not
what the file is storing.
> > I am happy to just leave this debris in the files for now.
> > I would guess that nobody will ever even notice this problem in practice.
> > The number of commits failing within the time window where successor data
> > is updated will statistically be very low to begin with.
> > Each time it happens we lose a very small fraction of disk space. We also
> > suffer a teeny tiny bit of read performance loss for readers of successors
> > of the affected node-revision. So what...
> > If it ever becomes a real problem, people can dump/load.
> It'll work, but it's a kill-a-fly-with-a-fleet approach :). Dump/load
> the entire history (many MB of svndiffs) only to fix some derived
> noderev->offset map data?
Sure, it's not optimal. I doubt anyone will be bothered enough to
perform a dump/load just for this.
Received on 2011-09-05 12:53:17 CEST