Julian Foad wrote on Tue, 13 Oct 2009 at 18:06 +0100:
> Daniel,
>
> OK, I think I get it now. The "rep cache" is a cache of rep locations,
> keyed by rep checksum. When we want to add a new rep, we look in the
> cache to see if we have a copy of it stored already. If a location is
> not found in the cache, we might nonetheless already have a copy of the
> rep and we could search the rev files exhaustively, but we prefer not to
> spend the time doing so, so we just store (another copy of) the new rep
> in full.
>
Yes.
> You raise the issue of deleting stale references from the cache, and
> ensuring stale entries do not end up being used in new committed revs.
> We can invalidate such entries in the cache synchronously with
> obliteration, but you point out that the reference may have already been
> read from the cache and stored in a pending transaction before we
> invalidate it.
Good point, this is possible, but it isn't what I said. The scenario I
described below (search for 'rBO') is a race condition around the sqlite
write lock that (theoretically) may cause cache entries for a given
revision to be first inserted into the DB only after the obliterate of
that revision has finished.
(I'm thinking of the case where the 'obliterate' gets the sqlite write
lock before the commit-it-obliterates gets the sqlite write lock.)
> Therefore it seems we will need to validate all such
> references in a new revision txn at commit finalization time. It might
> be acceptable to abort a commit if it contains any stale reference, and
> the user could re-try. The more correct action would be to ensure we
> have a full copy of the rep available until we convert it into a
> reference at finalization time.
>
IOW, when we take the write lock to finalize a commit, at that point we
should make sure that all the reps we refer to are still alive and well.
Unrelated question: Do we keep a history of obliterations? (I see it
may be undesired for users, but it might make the implementation
easier...)
> A way to describe this is that the conceptual "reference count" of a rep
> needs to include references in pending transactions: if any pending
> transaction refers to a given rep, then we can't (yet) delete that rep.
> When we abort a transaction, then we can look through its references and
> delete any reps for which it was the only reference. That sounds like it
> would be inefficient to implement in the obvious way but I'm sure we can
> find a good equivalent.
>
IOW: if we have other references for the rep, we should obliterate the
file without physically deleting the rep?
> - Julian
>
Daniel
(a bit cloudy)
>
> Daniel Shahaf wrote:
> > Representations are stored in the rev files. The DB only stores the
> > coordinates (offsets into rev files) of representations, keyed by the
> > sha1 of the file generated by the representation.
> >
> > Commits that want to use a representation write out its coordinates in
> > full in the revision file they create. (In particular, they *do not*
> > refer to the DB; this is why the latter can be removed at any point.)
> >
> > For example, in a Greek tree (r1) with 'svn ps foo bar iota' (r2) and
> > '/bin/cp iota iota2; svn add iota2' (r3):
> >
> > % sha1sum < wc1/trunk/iota
> > 2c0aa9014a0cd07f01795a333d82485ef6d083e2 -
> >
> > % sqlite3 r/db/rep-cache.db "select * from rep_cache where hash = '2c0aa9014a0cd07f01795a333d82485ef6d083e2';"
> > 2c0aa9014a0cd07f01795a333d82485ef6d083e2|1|547|37|25
> >
> > ### [1]
> > % grep -ar 2c0aa9014a0cd07f01795a333d82485ef6d083e2 r/db/revs
> > r/db/revs/0.pack/pack:text: 1 547 37 25 2d18c5e57e84c5b8a5e9a6e13fa394dc 2c0aa9014a0cd07f01795a333d82485ef6d083e2 0-0/_16
> > r/db/revs/0.pack/pack:text: 1 547 37 25 2d18c5e57e84c5b8a5e9a6e13fa394dc 2c0aa9014a0cd07f01795a333d82485ef6d083e2 0-0/_16
> > r/db/revs/0.pack/pack:text: 1 547 37 25 2d18c5e57e84c5b8a5e9a6e13fa394dc 2c0aa9014a0cd07f01795a333d82485ef6d083e2 2-2/_3
> >
> > ### [2]
> > # that's the representation
> > % xxd -s 347 -l 50 r/db/revs/0/1
> > 0000223: 4445 4c54 410a 5356 4e01 0000 1902 1a01 DELTA.SVN.......
> > 0000233: 9919 5468 6973 2069 7320 7468 6520 6669 ..This is the fi
> > 0000243: 6c65 2027 696f 7461 272e 0a45 4e44 5245 le 'iota'..ENDRE
> > 0000253: 500a P.
> >
> > Daniel
> > (The docs I got this info from are the 'Revision file format' section of
> > the FSFS 'structure' file.)
> >
> >
> > [1] There is a pack file because I build with PACK_AFTER_EVERY_COMMIT and
> > with SVN_FS_FS_DEFAULT_MAX_FILES_PER_DIR=4. A normal build would
> > see matches in revs/0/{1,2,3}.
> >
> > [2] 50 == 37 + strlen("DELTA\n") + strlen("ENDREP\n")
> >
> >
> > > > Actually, after r39897, it's possible that DB rows referencing the
> > > > revision-being-obliterated[1] will be added at an arbitrary time after
> > > > that revision has been committed: it's (theoretically, depending on
> > > > the order SQLite hands out write locks) possible that the sequence
> > > >
> > > > # make a few large (multi-file) commits
> > > > # commit rBO
> > > > # obliterate rBO
> > > > # wait 3 seconds
> > > >
> > > > will result in rep-cache rows referring the obliterated version of rBO.
> > > >
> > > >
> > > > Not sure how to solve that. We want to ensure that the obliterate doesn't
> > > > get the SQLite write lock before "its" commit gets the same lock. [2]
> > > >
> > > >
> > > > Possible damage? If the DB rows relating to the original rBO are added
> > > > after its obliteration its complete, then reps created in the future may
> > > > rely on these rows and avoid writing themselves out explicitly --- even
> > > > though the reps may no longer be in the rBO rev file.
>
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407219
>
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407282
Received on 2009-10-13 22:57:53 CEST