[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

FSFS "rev cache" operation [was: obliterate in trunk]

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Tue, 13 Oct 2009 18:06:56 +0100

Daniel,

OK, I think I get it now. The "rep cache" is a cache of rep locations,
keyed by rep checksum. When we want to add a new rep, we look in the
cache to see if we have a copy of it stored already. If a location is
not found in the cache, we might nonetheless already have a copy of the
rep and we could search the rev files exhaustively, but we prefer not to
spend the time doing so, so we just store (another copy of) the new rep
in full.

You raise the issue of deleting stale references from the cache, and
ensuring stale entries do not end up being used in new committed revs.
We can invalidate such entries in the cache synchronously with
obliteration, but you point out that the reference may have already been
read from the cache and stored in a pending transaction before we
invalidate it. Therefore it seems we will need to validate all such
references in a new revision txn at commit finalization time. It might
be acceptable to abort a commit if it contains any stale reference, and
the user could re-try. The more correct action would be to ensure we
have a full copy of the rep available until we convert it into a
reference at finalization time.

A way to describe this is that the conceptual "reference count" of a rep
needs to include references in pending transactions: if any pending
transaction refers to a given rep, then we can't (yet) delete that rep.
When we abort a transaction, then we can look through its references and
delete any reps for which it was the only reference. That sounds like it
would be inefficient to implement in the obvious way but I'm sure we can
find a good equivalent.

- Julian

Daniel Shahaf wrote:
> Representations are stored in the rev files. The DB only stores the
> coordinates (offsets into rev files) of representations, keyed by the
> sha1 of the file generated by the representation.
>
> Commits that want to use a representation write out its coordinates in
> full in the revision file they create. (In particular, they *do not*
> refer to the DB; this is why the latter can be removed at any point.)
>
> For example, in a Greek tree (r1) with 'svn ps foo bar iota' (r2) and
> '/bin/cp iota iota2; svn add iota2' (r3):
>
> % sha1sum < wc1/trunk/iota
> 2c0aa9014a0cd07f01795a333d82485ef6d083e2 -
>
> % sqlite3 r/db/rep-cache.db "select * from rep_cache where hash = '2c0aa9014a0cd07f01795a333d82485ef6d083e2';"
> 2c0aa9014a0cd07f01795a333d82485ef6d083e2|1|547|37|25
>
> ### [1]
> % grep -ar 2c0aa9014a0cd07f01795a333d82485ef6d083e2 r/db/revs
> r/db/revs/0.pack/pack:text: 1 547 37 25 2d18c5e57e84c5b8a5e9a6e13fa394dc 2c0aa9014a0cd07f01795a333d82485ef6d083e2 0-0/_16
> r/db/revs/0.pack/pack:text: 1 547 37 25 2d18c5e57e84c5b8a5e9a6e13fa394dc 2c0aa9014a0cd07f01795a333d82485ef6d083e2 0-0/_16
> r/db/revs/0.pack/pack:text: 1 547 37 25 2d18c5e57e84c5b8a5e9a6e13fa394dc 2c0aa9014a0cd07f01795a333d82485ef6d083e2 2-2/_3
>
> ### [2]
> # that's the representation
> % xxd -s 347 -l 50 r/db/revs/0/1
> 0000223: 4445 4c54 410a 5356 4e01 0000 1902 1a01 DELTA.SVN.......
> 0000233: 9919 5468 6973 2069 7320 7468 6520 6669 ..This is the fi
> 0000243: 6c65 2027 696f 7461 272e 0a45 4e44 5245 le 'iota'..ENDRE
> 0000253: 500a P.
>
> Daniel
> (The docs I got this info from are the 'Revision file format' section of
> the FSFS 'structure' file.)
>
>
> [1] There is a pack file because I build with PACK_AFTER_EVERY_COMMIT and
> with SVN_FS_FS_DEFAULT_MAX_FILES_PER_DIR=4. A normal build would
> see matches in revs/0/{1,2,3}.
>
> [2] 50 == 37 + strlen("DELTA\n") + strlen("ENDREP\n")
>
>
> > > Actually, after r39897, it's possible that DB rows referencing the
> > > revision-being-obliterated[1] will be added at an arbitrary time after
> > > that revision has been committed: it's (theoretically, depending on
> > > the order SQLite hands out write locks) possible that the sequence
> > >
> > > # make a few large (multi-file) commits
> > > # commit rBO
> > > # obliterate rBO
> > > # wait 3 seconds
> > >
> > > will result in rep-cache rows referring the obliterated version of rBO.
> > >
> > >
> > > Not sure how to solve that. We want to ensure that the obliterate doesn't
> > > get the SQLite write lock before "its" commit gets the same lock. [2]
> > >
> > >
> > > Possible damage? If the DB rows relating to the original rBO are added
> > > after its obliteration its complete, then reps created in the future may
> > > rely on these rows and avoid writing themselves out explicitly --- even
> > > though the reps may no longer be in the rBO rev file.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407219
Received on 2009-10-13 23:31:15 CEST

This is an archived mail posted to the Subversion Dev mailing list.