
Re: FSFS "rev cache" operation [was: obliterate in trunk]

From: David Glasser <glasser_at_davidglasser.net>
Date: Tue, 13 Oct 2009 10:50:28 -0700

Well, the actual problem here is that the rep cache doesn't have a ref count.

So let's say you want to obliterate the node /foo/bar@1234. Its text
may be in the rep r1234/9876. If you actually want to remove that
data from the backend (and not just mask it from clients), you need to
know if any other nodes (which may not be related to this node via
ancestry relations) use that rep. Since the DB doesn't even have a
refcount, you can't even know if it's safe to wipe the rep text (let
alone where the other uses are).

As Greg said, you can basically get around this by doing a bunch of
slow repository walks to try to find other places that use the same
rep. (Of course half the reason that we want an obliterate feature is
that people find dump/load to be too slow :) )
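
To make the "slow walk" concrete, here is a rough sketch of what it amounts
to. It assumes the node-rev "text:" line layout shown in Daniel's grep output
quoted below (revision, offset, size, expanded size, md5, then optionally
sha1 and a uniquifier). The names count_rep_uses and safe_to_wipe are made up
for illustration and are not Subversion APIs. It also ignores "props:" lines
and reps that use the target as a delta base, so treat it as a lower bound on
the real work, not as a safe check.

```python
import re
from collections import defaultdict

# Assumed layout of a node-rev "text:" line, taken from the grep output
# quoted later in this thread:
#   text: <rev> <offset> <size> <expanded-size> <md5> [<sha1> <uniquifier>]
TEXT_LINE = re.compile(
    rb"^text: (\d+) (\d+) (\d+) (\d+) ([0-9a-f]{32})(?: ([0-9a-f]{40}) (\S+))?$"
)

def count_rep_uses(rev_file_blobs):
    """Scan the raw bytes of every rev (or pack) file and count how many
    node-revs point at each (revision, offset) representation."""
    uses = defaultdict(int)
    for blob in rev_file_blobs:
        for line in blob.splitlines():
            m = TEXT_LINE.match(line)
            if m:
                uses[(int(m.group(1)), int(m.group(2)))] += 1
    return dict(uses)

def safe_to_wipe(uses, rev, offset, obliterated_refs=1):
    """Hypothetical helper: the rep text may go only if the node-revs being
    obliterated account for every reference the walk found."""
    return uses.get((rev, offset), 0) <= obliterated_refs
```

The cost is the point: every rev file has to be read in full, because nothing
maps a rep back to the node-revs that use it.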

Note that Hyrum's original implementation had ref counts, but since we
were combining two different attempts at ACID-style semantics (svn
FSFS and sqlite) in an unsound way, there were many ways that the ref
count could become incorrect. We removed it rather than try to make
it perfect.

--dave

On Tue, Oct 13, 2009 at 10:06 AM, Julian Foad
<julianfoad_at_btopenworld.com> wrote:
> Daniel,
>
> OK, I think I get it now. The "rep cache" is a cache of rep locations,
> keyed by rep checksum. When we want to add a new rep, we look in the
> cache to see if we have a copy of it stored already. If a location is
> not found in the cache, we might nonetheless already have a copy of the
> rep and we could search the rev files exhaustively, but we prefer not to
> spend the time doing so, so we just store (another copy of) the new rep
> in full.
>
> You raise the issue of deleting stale references from the cache, and
> ensuring stale entries do not end up being used in new committed revs.
> We can invalidate such entries in the cache synchronously with
> obliteration, but you point out that the reference may have already been
> read from the cache and stored in a pending transaction before we
> invalidate it. Therefore it seems we will need to validate all such
> references in a new revision txn at commit finalization time. It might
> be acceptable to abort a commit if it contains any stale reference, and
> the user could re-try. The more correct action would be to ensure we
> have a full copy of the rep available until we convert it into a
> reference at finalization time.
>
> A way to describe this is that the conceptual "reference count" of a rep
> needs to include references in pending transactions: if any pending
> transaction refers to a given rep, then we can't (yet) delete that rep.
> When we abort a transaction, then we can look through its references and
> delete any reps for which it was the only reference. That sounds like it
> would be inefficient to implement in the obvious way but I'm sure we can
> find a good equivalent.
>
> - Julian
>
>
> Daniel Shahaf wrote:
>> Representations are stored in the rev files.  The DB only stores the
>> coordinates (offsets into rev files) of representations, keyed by the
>> sha1 of the file generated by the representation.
>>
>> Commits that want to use a representation write out its coordinates in
>> full in the revision file they create.  (In particular, they *do not*
>> refer to the DB; this is why the latter can be removed at any point.)
>>
>> For example, in a Greek tree (r1) with 'svn ps foo bar iota' (r2) and
>> '/bin/cp iota iota2; svn add iota2' (r3):
>>
>>     % sha1sum < wc1/trunk/iota
>>     2c0aa9014a0cd07f01795a333d82485ef6d083e2  -
>>
>>     % sqlite3 r/db/rep-cache.db "select * from rep_cache where hash = '2c0aa9014a0cd07f01795a333d82485ef6d083e2';"
>>     2c0aa9014a0cd07f01795a333d82485ef6d083e2|1|547|37|25
>>
>>     ### [1]
>>     % grep -ar 2c0aa9014a0cd07f01795a333d82485ef6d083e2 r/db/revs
>>     r/db/revs/0.pack/pack:text: 1 547 37 25 2d18c5e57e84c5b8a5e9a6e13fa394dc 2c0aa9014a0cd07f01795a333d82485ef6d083e2 0-0/_16
>>     r/db/revs/0.pack/pack:text: 1 547 37 25 2d18c5e57e84c5b8a5e9a6e13fa394dc 2c0aa9014a0cd07f01795a333d82485ef6d083e2 0-0/_16
>>     r/db/revs/0.pack/pack:text: 1 547 37 25 2d18c5e57e84c5b8a5e9a6e13fa394dc 2c0aa9014a0cd07f01795a333d82485ef6d083e2 2-2/_3
>>
>>     ### [2]
>>     # that's the representation
>>     % xxd -s 547 -l 50 r/db/revs/0/1
>>     0000223: 4445 4c54 410a 5356 4e01 0000 1902 1a01  DELTA.SVN.......
>>     0000233: 9919 5468 6973 2069 7320 7468 6520 6669  ..This is the fi
>>     0000243: 6c65 2027 696f 7461 272e 0a45 4e44 5245  le 'iota'..ENDRE
>>     0000253: 500a                                     P.
>>
>> Daniel
>> (The docs I got this info from are the 'Revision file format' section of
>> the FSFS 'structure' file.)
>>
>>
>> [1] There is a pack file because I build with PACK_AFTER_EVERY_COMMIT and
>>     with SVN_FS_FS_DEFAULT_MAX_FILES_PER_DIR=4.  A normal build would
>>     see matches in revs/0/{1,2,3}.
>>
>> [2] 50 == 37 + strlen("DELTA\n") + strlen("ENDREP\n")
>>
>>
>> > > Actually, after r39897, it's possible that DB rows referencing the
>> > > revision-being-obliterated[1] will be added at an arbitrary time after
>> > > that revision has been committed: it's (theoretically, depending on
>> > > the order SQLite hands out write locks) possible that the sequence
>> > >
>> > >     # make a few large (multi-file) commits
>> > >     # commit rBO
>> > >     # obliterate rBO
>> > >     # wait 3 seconds
>> > >
>> > > will result in rep-cache rows referring to the obliterated version of rBO.
>> > >
>> > >
>> > > Not sure how to solve that.  We want to ensure that the obliterate doesn't
>> > > get the SQLite write lock before "its" commit gets the same lock. [2]
>> > >
>> > >
>> > > Possible damage?  If the DB rows relating to the original rBO are added
>> > > after its obliteration is complete, then reps created in the future may
>> > > rely on these rows and avoid writing themselves out explicitly --- even
>> > > though the reps may no longer be in the rBO rev file.
>
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407219
>

-- 
glasser_at_davidglasser.net | langtonlabs.org | flickr.com/photos/glasser/
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407239
Received on 2009-10-14 00:04:33 CEST

This is an archived mail posted to the Subversion Dev mailing list.
