
mem bug progress: got the cause, fix planned

From: <kfogel_at_collab.net>
Date: 2001-08-24 01:26:20 CEST

A quick status report on the showstopper "cannot allocate memory" bug:
we're pretty sure we've got the cause, and a fix is known.

The fix is a fairly deep reworking of how up-to-date checks are done
during commits, and may take a few days to implement. Fortunately,
the benefit will be not merely that this bug goes away, but that
commits become *much* more efficient than we ever thought they would
be.

The long story:

(Apologies if this explanation needs more editing; I want to send this
asap so people know what's going on.)

Greg Stein and Ben and I have been investigating a bug whereby the fs
runs out of memory when committing from a working copy that is far
behind HEAD. This is the situation produced by the `mass-commit'
script, which you may recall was posted here recently. The
`mass-commit' script imports an entire Subversion source tree into a
newly-minted repository, thus creating revision 1, and then checks it
out into a working copy. It then runs through a cycle of 1000
commits, randomly modifying a few files at various places in the wc
tree and committing those files (by name) each time. It's only doing
content changes, by the way, no property mods.

It never updates that working copy, so all the directories remain at
revision 1, and the files end up at all sorts of revisions, from 1 to
around 500 (on my box, it's commit 563 that finally runs out of
memory, but Your Mileage May Vary, of course). This happens over
*both* ra_local and ra_dav.
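The mixed-revision state the script produces can be sketched with a
small simulation (hypothetical Python, not the actual `mass-commit'
script; the file and commit counts are made up for illustration):

```python
import random

def simulate_mass_commit(num_files=20, num_commits=100,
                         files_per_commit=3, seed=42):
    """Simulate the working-copy state mass-commit produces.

    Every path starts at revision 1 (fresh checkout of an import).
    Each commit touches a few random files, bumping only *their*
    recorded revisions; the working copy is never updated, so the
    directories stay at revision 1 forever.
    """
    rng = random.Random(seed)
    file_revs = {f"file{i}": 1 for i in range(num_files)}
    dir_rev = 1       # never updated: directories stay at revision 1
    head = 1          # the repository head, bumped by each commit
    for _ in range(num_commits):
        head += 1
        for f in rng.sample(sorted(file_revs), files_per_commit):
            file_revs[f] = head   # committed files jump to the new head
    return dir_rev, head, file_revs

dir_rev, head, file_revs = simulate_mass_commit()
```

The end state is exactly the troublesome one: directories at revision
1, files scattered across revisions up to the head.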

We first tried the following experiment:

     Scenario 1: Try to commit those files from a working copy that
                 had never been updated, i.e., as the script does it.

     Scenario 2: Try to commit those files from a working copy that
                 was just checked out from the repository, so the
                 whole working copy is at HEAD.

As you might expect, Scenario 1 fails with the out-of-memory error as
it had always done, yet Scenario 2 succeeds instantly. Hmmm.

Instrumentation revealed that the memory runs out after all the files'
base revisions have been sent to the repository, but before the txn
has been brought to a mergeable state. That is, the txn is
constructed, and all the files being committed have been replaced
using calls to svn_fs_link(), so that the txn now reflects an
appropriate base state against which svndiff deltas can now be applied
to reflect the working copy changes. But the changes haven't actually
been applied yet, so every node revision in the txn still reflects a
state that actually exists in a committed revision somewhere. And
merge() hasn't been called yet, either, of course. With pool
debugging -- thanks to Mike Pilato for the quick lesson -- we saw
incredible numbers of pools (like 50,000) being created, cleared, and
destroyed mostly in the svndiff'ing and undeltification code, all
before txn_body_merge() was ever called.

Remember this state; we'll be coming back to it. :-)

Also, before we go on, note that this problem occurs with both
ra_local and ra_dav, even though those two build their txns opposite
ways: ra_dav builds it based on the head revision, whereas ra_local
builds it based on the directory revision at the root of the change in
the working copy. We thought for a long time that, since our problem
reproduced best with out-of-date working copies, it must have
something to do with the revision on which you base the txn, but
noooo...

The problem has to do with undeltification inefficiency compounded
with the way file basetexts are obtained for svndiff applications.
The former is a known problem which we are planning to address after
M3, and luckily, it won't be necessary to solve it to make commits
work today. It's the latter half that's the real issue. Here's an
illustration of the problem:

   1. You have a mixed-revision working copy, like the one produced by
      mass-commit. All directories are at rev 1, files are at various
      revisions.

   2. You make many commits. No problem, though things seem to be
      getting a bit slow...

   3. You commit a change to, among other things, the file
      `subversion/libsvn_fs/fs.c', so it ends up with revision 245.

   4. You do many other commits. None of them touch `fs.c', but many
      of the commits do result in its parent, grandparent, or
      great-grandparent directory being "bubbled up" and receiving a
      new revision number in the repository.

   5. You try to commit another change to `fs.c'. Now the head is at
      revision 558, and while `fs.c' has not changed since 245, its
      parents have changed many times...

At this point the repository needs to check if your fs.c is
up-to-date, and if it is, the fs needs to retrieve your revision of
that file so the incoming svndiff can be applied to it.

Actually, that's the problem. In order to even do the check, the fs
*thinks* it needs revision 245 of the file, so it can compare that
node id with the one in the head. But since the various parent
directories have been changed a lot since then, fetching the old
entry's node ID involves a *lot* of undeltification, which costs way
too much right now, and frankly will never be truly cheap. Whenever
we can avoid it, we should. At the moment, there's so much of it
going on and it's so expensive that we actually run out of memory.

There's another way to do things, fortunately. We have a magic "back
door" to get what we need, without *ever* fetching that old directory
listing. Every node remembers what revision it was created
(committed) in. So we have these pieces of data:

   1. The file's revision number on the client side.

   2. A node revision for it in the repository head, cheaply
      obtainable because the head is fulltext, or still very near
      fulltext by the time you get a handle on it anyway.

   3. The revision number in which the head revision of the file was
      created, via point (2).

Thus we can use a new commit algorithm on the server side, whose big
advantage is that it avoids undeltifying numerous parent directories
just to discover an old node-rev-id:

   1. First of all, ra_local should base the transaction on the
      youngest revision Y, as ra_dav does, not on the revision of the
      working copy parent. This makes for cheaper merges and some
      more convenient code paths, although it doesn't directly solve
      this bug or anything.

   2. For each committed target TGT at revision N that we receive:

         - Get the node-rev-id of TGT at revision Y. This is a
           fulltext retrieval, therefore cheap.
          
         - Look inside this node-rev-id; it will tell you what
           revision it was committed in. Call that revision L.

         - if (N < L)

              Then TGT at revision N is obviously out-of-date, because
              somebody changed it in revision L. Signal a conflict
              and bail early. Note that you never looked at the node
              ids themselves.

           else if (L <= N <= Y)

              Everything is fine; TGT at revision N is up-to-date,
              because we know that nobody has changed the node-rev-id
              between revisions L and Y. Drop Y's idea of TGT's
              node-rev-id into the transaction, and await a
              text-delta.

           else if (N > Y)

              A very rare situation, though possible if you really
              work at it (we know a scenario, but it's one that will
              "never" happen). Anyway, just bounce back with an
              out-of-date error, or else re-base the txn on the new
              youngest revision and redo the changes.
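The three-way check above can be sketched as follows (a sketch only;
the function name and string results are illustrative, not the actual
libsvn_fs interface):

```python
def check_up_to_date(n, y, l):
    """Karl's proposed check: compare revision numbers only, never
    undeltify old directory listings to dig up an old node-rev-id.

    n -- revision of the target in the client's working copy
    y -- youngest revision Y that the transaction is based on
    l -- revision in which the head's node-rev of the target was
         created (read straight off the head node-rev, a cheap
         fulltext lookup)
    """
    if n < l:
        # Somebody changed the target in revision L, which is newer
        # than the client's base: signal a conflict and bail early.
        return "out-of-date"
    elif l <= n <= y:
        # Nobody changed the node between L and Y, so the client's
        # base is identical to the head's: drop Y's node-rev-id into
        # the txn and await a text-delta.
        return "up-to-date"
    else:  # n > y: the rare case
        # Bounce with an out-of-date error, or re-base the txn on the
        # new youngest revision and redo the changes.
        return "rebase-or-error"
```

Using the fs.c example above: a client base of 245 against a head of
558 whose node was created in 245 passes cleanly, while a client base
older than 245 conflicts, all without touching any parent directory.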

That's the basic idea, though I've hand-waved on a few details --
handling for added and deleted files, for example, plus you can see
that some of the checks can be a bit fancier, and by remembering the
node-rev we can do a predecessor/successor check as well, blah blah
blah. Ben and I will be sitting down tomorrow morning and figuring
out exactly what we'll need to change.

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

This is an archived mail posted to the Subversion Dev mailing list.
