Philip and I had an interesting conversation with some users this
evening, and I'm just archiving my brain dump here.

These users have a large repository with a large number of branches in
the /branches directory (~35k). We described the well-known
phenomenon in which directories aren't deltified on commit, and thus
cause the repository to have very large revisions, even when the
actual content changes are fairly small. This is due to bubble-up:
each commit under /branches has to rewrite the entire entry list of
the /branches directory.

Philip recalled a time several years ago when he enabled directory
deltification, but the performance was awful, so we never released
it. In our discussion, we mentioned that directory deltification may
perform better now, especially in light of the imminent merge of
the diff-bytes-optimizations branch. In the case of a bubble-up
directory modification, the prefix and suffix matching would simplify
the problem space, leaving a very small diff.
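
To make the prefix/suffix idea concrete, here is a minimal sketch in
Python. It is not Subversion code: the "name id" line format, the
entry names, and the helper names are all made up for illustration,
and the real on-disk directory representation may look quite
different. It only shows how little is left to diff once the common
prefix and suffix of two sorted listings are stripped.

```python
def trim_common_affixes(old: bytes, new: bytes):
    """Strip the common prefix and suffix; return what is left to diff."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Never let the suffix overlap the prefix already matched.
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return prefix, suffix, old[prefix:len(old) - suffix], new[prefix:len(new) - suffix]


def serialize(entries):
    # Hypothetical format: one "name id" line per entry, sorted by name.
    return "".join(f"{name} {node_id}\n"
                   for name, node_id in sorted(entries.items())).encode()


# ~35k branches, then a bubble-up change touching a single entry.
entries = {f"branch-{i:05d}": f"r{i}" for i in range(35000)}
old = serialize(entries)
entries["branch-17000"] = "r99999"
new = serialize(entries)

prefix, suffix, old_mid, new_mid = trim_common_affixes(old, new)
print(len(old), prefix, suffix, old_mid, new_mid)
```

With a stable ordering, only the few bytes of the changed entry
survive the trimming, no matter how many entries the directory has.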

The theory breaks down if directory entry lists are stored in a hash
and serialized in an unordered manner. That would negate any benefit
prefix scanning could provide (and could be what caused the horrific
delta performance in the first place).
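
The ordering concern can be sketched the same way. Again, this is an
illustration in Python and not Subversion code: the line format is
hypothetical, and two seeded shuffles stand in for a hash whose
iteration order happens to differ between the two serializations,
which is an assumption I have not checked against the code.

```python
import random


def changed_span(old: bytes, new: bytes) -> int:
    """Bytes of `new` left after stripping the common prefix and suffix."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return len(new) - prefix - suffix


def dump(entries, order):
    # Hypothetical format: one "name id" line per entry, in the given order.
    return "".join(f"{name} {entries[name]}\n" for name in order).encode()


old_entries = {f"branch-{i:04d}": f"r{i}" for i in range(1000)}
new_entries = dict(old_entries)
new_entries["branch-0500"] = "r9999"   # the single bubble-up change
names = sorted(old_entries)

# Same sorted order on both sides: only the changed id remains.
sorted_span = changed_span(dump(old_entries, names), dump(new_entries, names))

# Different, arbitrary order on each side (stand-in for hash order).
old_order, new_order = names[:], names[:]
random.Random(1).shuffle(old_order)
random.Random(2).shuffle(new_order)
unordered_span = changed_span(dump(old_entries, old_order),
                              dump(new_entries, new_order))

print(sorted_span, unordered_span)
```

If the order really is unstable, the trimming strips almost nothing
and the delta code has to chew through nearly the whole listing.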

Anyway, that was the kernel of our discussion. I haven't dug around
in the code to determine how much of it is true, but if anybody
wants something to do, this might be interesting.

Received on 2011-02-01 05:29:56 CET