C. Michael Pilato wrote:
> C. Michael Pilato wrote:
>> My guess is that the problem is actually caused by the handling of r2524,
>> but maybe not seen until r2554 tries to get deltified (or perhaps
>> re-deltified). But it's just a guess.
>
> Yep. If you load up to r2524, you'll find that you can't now do an
> incremental dump of r2524 without hitting the loop. And it isn't always the
> log.c loop that you get stuck in. Other files are using shared reps, too,
> such as libsvn_wc/adm_files.h.
[Is it bad form to have a conversation with yourself in public?]
So, I loaded up to r2524 (as before), and then tried to dump the whole
repository. I was surprised to find that the dump hung now as early as
r2024! It was hanging on one of the files reverted in r2524 (adm_files.h),
which tells me that despite our best attempts thus far, revisions aren't
time-safe.
So I got to thinking about the situation, and I must pose the following
question: Are we doing anything in the post-commit deltification to prevent
the deltification of nodes that picked up previously used rep-keys?
If not (and I suspect we aren't), I'm concerned about the following possible
scenario.
Let's look at the theoretical representations and their storage models for a
single file with 7 revisions:
REV  REP-KEY  REP-SUMMARY
---  -------  -----------
1    A        fulltext
2    B        delta-A
3    C        delta-A
4    D        delta-C
5    E        delta-A
6    F        delta-E
7    G        delta-E
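
The bases in the REP-SUMMARY column are consistent with one simple rule: take a revision's zero-based predecessor count and clear its lowest set bit. What follows is a minimal Python sketch of that rule, not Subversion code. The names `skip_delta_base`, `rep_summary`, and `REP_KEYS` are hypothetical, and the choice of this particular rule is an assumption inferred from the table itself.

```python
# Rep-keys for r1..r7 of the single file in the table above.
REP_KEYS = "ABCDEFG"


def skip_delta_base(count):
    """Return the predecessor count to deltify against, or None for fulltext.

    Assumption: the base is the zero-based predecessor count with its
    lowest set bit cleared (count 0 has no base and stays fulltext).
    """
    if count == 0:
        return None
    return count & (count - 1)


def rep_summary(rev):
    """Build the REP-SUMMARY string for a 1-based revision number."""
    base = skip_delta_base(rev - 1)
    return "fulltext" if base is None else "delta-" + REP_KEYS[base]


for rev in range(1, 8):
    print(rev, REP_KEYS[rev - 1], rep_summary(rev))
```

Under that assumption the loop prints the same seven rows as the table, including r7 being deltified against rep E.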
Note that our delta chain includes skip-deltas, based on the skip-delta
algorithm described in notes/skip-deltas and employed by the backends. Now,
what if, instead of creating a new set of contents in r7, we had reverted
our file to the way it looked in r5? At commit time, we would have this:
REV  REP-KEY  REP-SUMMARY
---  -------  -----------
1    A        fulltext
2    B        delta-A
3    C        delta-A
4    D        delta-C
5    E        delta-A
6    F        delta-E
7    E        fulltext (not yet deltified)
Now post-commit deltification comes in. It sees (just as it would have had
we *not* reverted the file's contents) that "the representation for r7
should be deltified against rep E". But r7's rep *is* E. So we get:
REV  REP-KEY  REP-SUMMARY
---  -------  -----------
1    A        fulltext
2    B        delta-A
3    C        delta-A
4    D        delta-C
5    E        delta-E   <-- note that this changed, too
6    F        delta-E
7    E        delta-E
Not only is r7 now stuck in a loop, but r5 is, too. This would explain why
after loading r2524 with your branch code, an earlier revision (r2024) is
the first to hang.
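
The scenario can be modelled in a few lines. This is only a toy sketch, assuming a rep is nothing more than a pointer to its base rep-key. The dictionaries and the `deltify` and `has_loop` helpers are hypothetical and do not correspond to real Subversion functions.

```python
# Hypothetical model: rep-key -> base rep-key (None means fulltext).
reps = {"A": None, "B": "A", "C": "A", "D": "C", "E": "A", "F": "E"}

# Revision -> rep-key. r7 reuses rep E because its contents match r5.
rev_to_rep = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "E"}


def deltify(rev, base_rev):
    """Rewrite rev's rep as a delta against base_rev's rep, with no guard."""
    reps[rev_to_rep[rev]] = rev_to_rep[base_rev]


def has_loop(rev):
    """Follow the delta chain for rev and report whether it revisits a rep."""
    seen = set()
    key = rev_to_rep[rev]
    while key is not None:
        if key in seen:
            return True
        seen.add(key)
        key = reps[key]
    return False


looping_before = [r for r in rev_to_rep if has_loop(r)]
deltify(7, 5)  # post-commit deltification of r7 against r5's rep
looping_after = [r for r in rev_to_rep if has_loop(r)]
print(looping_before, looping_after)
```

In this model no chain loops before the deltification. Afterwards every revision whose chain passes through rep E loops, which includes r6 as well as r5 and r7, since r6 is a delta against E.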
Of course, one of us might want to verify that this is, in fact, what's
happening in this situation. If so, the solution could be "as easy as" not
deltifying any nodes that make use of shared reps. How do you do that?
Well, maybe we change the storage model for the reps-checksums tables to:
REP-KEY ::= (CHECKSUM FIRST-NODE-REVISION-ID-TO-USE-IT)
If you're thinking about deltifying a node-revision-ID, and it ain't the
first node-revision-id to use a given rep-key, you know not to do the
deltification.
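
As a rough sketch of that check, assuming the table is keyed by rep-key and stores the checksum together with the first node-revision-ID that used the rep: the names `reps_checksums`, `note_rep_use`, and `may_deltify`, and the ID and checksum strings, are all hypothetical placeholders rather than real Subversion identifiers.

```python
# Hypothetical table: rep_key -> (checksum, first_noderev_id).
reps_checksums = {}


def note_rep_use(rep_key, checksum, noderev_id):
    """Record a use of rep_key, keeping only the first node-revision-ID."""
    reps_checksums.setdefault(rep_key, (checksum, noderev_id))


def may_deltify(rep_key, noderev_id):
    """Allow deltification only for the first node-revision to use rep_key."""
    _checksum, first_user = reps_checksums[rep_key]
    return noderev_id == first_user


# r5 creates rep E; r7 later picks up the same rep-key via rep sharing.
note_rep_use("E", "checksum-of-r5-contents", "noderev-r5")
note_rep_use("E", "checksum-of-r5-contents", "noderev-r7")

print(may_deltify("E", "noderev-r5"))
print(may_deltify("E", "noderev-r7"))
```

With this guard the post-commit pass would skip r7's node-revision, so rep E would keep its existing delta against A.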
--
C. Michael Pilato <cmpilato_at_collab.net>
CollabNet <> www.collab.net <> Distributed Development On Demand
Received on 2008-09-18 22:18:51 CEST