Hi,
C. Michael Pilato wrote:
> C. Michael Pilato wrote:
>> C. Michael Pilato wrote:
>>> My guess is that the problem is actually caused by the handling of r2524,
>>> but maybe not seen until r2554 tries to get deltified (or perhaps
>>> re-deltified). But it's just a guess.
>> Yep. If you load up to r2524, you'll find that you can no longer do an
>> incremental dump of r2524 without hitting the loop. And it isn't always the
>> log.c loop that you get stuck in; other files are using shared reps, too,
>> such as libsvn_wc/adm_files.h.
>
> [Is it bad form to have a conversation with yourself in public?]
>
> So, I loaded up to r2524 (as before), and then tried to dump the whole
> repository. I was surprised to find that the dump now hung as early as
> r2024! It was hanging on one of the files reverted in r2524 (adm_files.h),
> which tells me that despite our best attempts thus far, revisions aren't
> time-safe.
>
> So I got to thinking about the situation, and I must pose the following
> question: Are we doing anything in the post-commit deltification to prevent
> the deltification of nodes that picked up previously used rep-keys?
>
> If not (and I suspect we aren't), I'm concerned about the following possible
> scenario.
>
> Let's look at the theoretical representations and their storage models for a
> single file with 7 revisions:
>
> REV  REP-KEY  REP-SUMMARY
> ---  -------  -----------
> 1    A        fulltext
> 2    B        delta-A
> 3    C        delta-A
> 4    D        delta-C
> 5    E        delta-A
> 6    F        delta-E
> 7    G        delta-E
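The REP-SUMMARY column in the table above is consistent with a simple skip-delta rule. A minimal Python sketch that reproduces it, assuming revisions are counted from 1 and that, with n = rev - 1, a revision is stored as a delta against revision (n & (n - 1)) + 1, i.e. n with its lowest set bit cleared. Whether this matches notes/skip-deltas in every detail is an assumption, and the function names are hypothetical.

```python
from typing import Dict, Optional


def skip_delta_base(rev: int) -> Optional[int]:
    """Return the revision that `rev` is stored as a delta against, or None for a fulltext.

    Assumption: revisions are numbered from 1; with n = rev - 1, the base
    revision is (n & (n - 1)) + 1.
    """
    n = rev - 1
    if n == 0:
        return None  # the first revision is stored as a fulltext
    return (n & (n - 1)) + 1  # n & (n - 1) clears the lowest set bit of n


def rep_summary(rev: int, rep_keys: Dict[int, str]) -> str:
    """Render the REP-SUMMARY column for one revision."""
    base = skip_delta_base(rev)
    return "fulltext" if base is None else "delta-" + rep_keys[base]


if __name__ == "__main__":
    keys = {rev: "ABCDEFG"[rev - 1] for rev in range(1, 8)}
    for rev in range(1, 8):
        print(rev, keys[rev], rep_summary(rev, keys))
```

Running it prints the seven rows of the table: r2, r3 and r5 land on A, r4 on C, and r6 and r7 on E.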
>
> Note that our delta chain includes skip-deltas, based on the skip-delta
> algorithm described in notes/skip-deltas and employed by the backends. Now,
> what if, instead of creating new contents in r7, we had reverted the file
> to the way it looked in r5? At commit time, we have this:
>
> REV  REP-KEY  REP-SUMMARY
> ---  -------  -----------
> 1    A        fulltext
> 2    B        delta-A
> 3    C        delta-A
> 4    D        delta-C
> 5    E        delta-A
> 6    F        delta-E
> 7    E        fulltext (not yet deltified)
>
If I understood correctly, r7 and r5 are made to share rep keys *before*
deltification kicks in, right? We create a fulltext for r7, notice that it has
the same checksum as r5's rep, and then drop the fulltext and point r7 at rep E
instead.

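A toy model of that commit-time sequence: each commit checksums the new contents and, on a match, points the revision at the existing rep-key instead of keeping a new fulltext. RepStore and its fields are hypothetical (this is not the BDB table layout), and MD5 is an arbitrary choice of checksum for the sketch.

```python
import hashlib
from typing import Dict


class RepStore:
    """Hypothetical toy model of rep sharing by checksum (not the real BDB tables)."""

    def __init__(self) -> None:
        self.reps: Dict[str, bytes] = {}       # rep-key -> stored fulltext
        self.by_checksum: Dict[str, str] = {}  # checksum -> rep-key
        self.node_rep: Dict[int, str] = {}     # revision -> rep-key it uses
        self._next_key = 0

    def commit(self, rev: int, contents: bytes) -> str:
        checksum = hashlib.md5(contents).hexdigest()
        key = self.by_checksum.get(checksum)
        if key is None:
            # New contents: keep the fulltext under a fresh rep-key (A, B, C, ...).
            key = chr(ord("A") + self._next_key)
            self._next_key += 1
            self.reps[key] = contents
            self.by_checksum[checksum] = key
        # Matching contents: no new fulltext is kept; the revision reuses the old key.
        self.node_rep[rev] = key
        return key


if __name__ == "__main__":
    store = RepStore()
    history = [b"v1", b"v2", b"v3", b"v4", b"v5", b"v6", b"v5"]  # r7 reverts to r5
    for rev, text in enumerate(history, start=1):
        store.commit(rev, text)
    print(store.node_rep[7], store.node_rep[5])  # both use rep-key E
```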
> Now post-commit deltification comes in. It sees (just as it would have had
> we *not* reverted the file's contents) that "the representation for r7
> should be deltified against rep E". But r7's rep *is* E. So we get:
>
> REV  REP-KEY  REP-SUMMARY
> ---  -------  -----------
> 1    A        fulltext
> 2    B        delta-A
> 3    C        delta-A
> 4    D        delta-C
> 5    E        delta-E   <-- note that this changed, too
> 6    F        delta-E
> 7    E        delta-E
>
> Not only is r7 now stuck in a loop, but r5 is, too. This would explain why
> after loading r2524 with your branch code, an earlier revision (r2024) is
> the first to hang.
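The loop can be reproduced in a few lines. This sketch models each rep as pointing at its delta base (None for a fulltext), uses the rep keys from the tables above, applies a deltification step with no checks, and then walks delta chains with a cycle check instead of looping forever. naive_deltify and find_cycle are hypothetical helpers, not Subversion functions.

```python
from typing import Dict, List, Optional

# rep-key -> delta base rep-key (None means the rep is a fulltext)
Reps = Dict[str, Optional[str]]


def naive_deltify(reps: Reps, node_rep: Dict[int, str], rev: int, base_rev: int) -> None:
    """Hypothetical post-commit step: store rev's rep as a delta against base_rev's rep, unchecked."""
    reps[node_rep[rev]] = node_rep[base_rev]


def find_cycle(reps: Reps, key: str) -> Optional[List[str]]:
    """Follow delta bases from `key`; return the looping path, or None if a fulltext is reached."""
    path: List[str] = []
    current: Optional[str] = key
    while current is not None:
        if current in path:
            return path + [current]
        path.append(current)
        current = reps[current]
    return None


if __name__ == "__main__":
    reps: Reps = {"A": None, "B": "A", "C": "A", "D": "C", "E": "A", "F": "E"}
    node_rep = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "E"}  # r7 shares E
    naive_deltify(reps, node_rep, rev=7, base_rev=5)  # r7's skip-delta base is r5
    for rev in (4, 5, 6, 7):
        print(rev, find_cycle(reps, node_rep[rev]))
```

r5 and r7 both report the path E -> E, and r6 (stored as a delta against E) is caught as well, while r4 still resolves to a fulltext.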

It seems that svn_fs_base__dag_deltify() already guards against deltifying a
rep against itself, so there is probably more than one rep in the cycle of
death: not rep E as a delta against rep E, but something like rep E as a delta
against rep D, which is in turn a delta against rep E.
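A self-check of that kind stops E-against-E but not a longer cycle. A minimal sketch of the difference: self_check_ok models only the guard described above (it is not the actual svn_fs_base__dag_deltify() code), while chain_check_ok, a hypothetical alternative, walks the proposed base's delta chain and refuses if it already passes through the target rep.

```python
from typing import Dict, Optional, Set

# rep-key -> delta base rep-key (None means the rep is a fulltext)
Reps = Dict[str, Optional[str]]


def self_check_ok(target: str, base: str) -> bool:
    """Guard that only refuses to deltify a rep against itself."""
    return target != base


def chain_check_ok(reps: Reps, target: str, base: str) -> bool:
    """Refuse if `base`'s delta chain already leads back to `target`."""
    seen: Set[str] = set()
    current: Optional[str] = base
    while current is not None and current not in seen:  # `seen` also stops on pre-existing cycles
        if current == target:
            return False
        seen.add(current)
        current = reps[current]
    return True


if __name__ == "__main__":
    reps: Reps = {"A": None, "E": "A", "D": "E"}  # D is already a delta against E
    print(self_check_ok("E", "D"))         # True: E -> D -> E slips through
    print(chain_check_ok(reps, "E", "D"))  # False: D's chain reaches E
```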
>
> Of course, one of us might want to verify that this is, in fact, what's
> happening in this situation. If so, the solution could be "as easy as" not
> deltifying any nodes that make use of shared reps. How do you do that?
> Well, maybe we change the storage model for the reps-checksums tables to:
>
> REP-KEY ::= (CHECKSUM FIRST-NODE-REVISION-ID-TO-USE-IT)
>
> If you're thinking about deltifying a node-revision-ID, and it ain't the
> first node-revision-id to use a given rep-key, you know not to do the
> deltification.
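One way to read the proposed REP-KEY ::= (CHECKSUM FIRST-NODE-REVISION-ID-TO-USE-IT) entry, sketched as a dictionary keyed by rep-key. record_use, may_deltify and the node-revision-id strings are hypothetical placeholders, not the real table layout or ID format.

```python
from typing import Dict, NamedTuple


class RepEntry(NamedTuple):
    checksum: str
    first_node_rev_id: str  # the first node-revision that used this rep-key


def record_use(table: Dict[str, RepEntry], rep_key: str, checksum: str, node_rev_id: str) -> None:
    """Remember only the first node-revision to use each rep-key."""
    table.setdefault(rep_key, RepEntry(checksum, node_rev_id))


def may_deltify(table: Dict[str, RepEntry], rep_key: str, node_rev_id: str) -> bool:
    """Deltify only when this node-revision is the rep-key's first user."""
    return table[rep_key].first_node_rev_id == node_rev_id


if __name__ == "__main__":
    table: Dict[str, RepEntry] = {}
    record_use(table, "E", "checksum-of-r5", "node-rev-r5")  # r5 creates rep E
    record_use(table, "E", "checksum-of-r5", "node-rev-r7")  # r7 reuses rep E
    print(may_deltify(table, "E", "node-rev-r5"))  # True
    print(may_deltify(table, "E", "node-rev-r7"))  # False: skip r7's deltification
```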
>
Can't we add a boolean flag to node-revision-id, set to TRUE if the node-rev-id
reused a rep?
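For comparison, a sketch of the flag approach. Here the flag is modeled as a field on a node-revision record and set at commit time, so deciding whether to deltify needs no lookup in the checksum table. NodeRevision, reused_rep and commit_node are hypothetical names, not the real node-revision skel.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class NodeRevision:
    """Hypothetical node-revision record (the real skel has more fields)."""
    node_rev_id: str
    rep_key: str
    reused_rep: bool = False  # True when this node-revision picked up an existing rep-key


def commit_node(by_checksum: Dict[str, str], node_rev_id: str, checksum: str, new_key: str) -> NodeRevision:
    """Reuse an existing rep-key on a checksum match and flag it; otherwise register new_key."""
    existing = by_checksum.get(checksum)
    if existing is not None:
        return NodeRevision(node_rev_id, existing, reused_rep=True)
    by_checksum[checksum] = new_key
    return NodeRevision(node_rev_id, new_key)


def may_deltify(node: NodeRevision) -> bool:
    """Post-commit deltification skips any node-revision that reused a rep."""
    return not node.reused_rep


if __name__ == "__main__":
    by_checksum: Dict[str, str] = {}
    r5 = commit_node(by_checksum, "node-rev-r5", "checksum-of-r5", "E")
    r7 = commit_node(by_checksum, "node-rev-r7", "checksum-of-r5", "G")  # revert to r5
    print(r7.rep_key, may_deltify(r5), may_deltify(r7))  # E True False
```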
--
Vlad
Received on 2008-09-19 00:16:16 CEST