The Four Elements of Righteousness

From: Greg Stein <gstein_at_lyra.org>
Date: 2002-07-18 03:09:55 CEST

[ yah... the subject has little relevance, but "I made you look!" :-) ]

This email is to recap a phone conversation that Karl and I had about the
wcprops and ra_dav and the WC. The motivation of the call was, obviously,
that there is a bit of disagreement in how to solve the transaction problem
for wcprops.

The original problem:

    1) we perform a commit
    2) ra_dav gets a version resource URL from the server for the newly
       committed item and stores it via the RA propset callback
    3) RA completes the commit editor drive, returns to the WC
    4) the WC calls process_committed() to wrap up the commit (move text
       bases, update entries, etc)

  If a crash/signal/ctrl-C occurs *after* step 2, but before step 4
  completes, then you have a WC that has lost its internal consistency. The
  recorded version resource URL does not map to the values stored in the
  entries file.
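A toy model makes the window concrete. This is hypothetical, not Subversion code: the WC is reduced to a dict holding the entry's revision and a cached version resource URL, the URL scheme is made up, and an exception stands in for a crash between step 2 and step 4.

```python
class Crash(Exception):
    """Stands in for a crash, signal, or ctrl-C."""


def vr_url(rev):
    # Made-up URL scheme; a real version resource URL is chosen by the server.
    return f"/vr/{rev}/foo.c"


def commit(wc, new_rev, crash_after_step2=False):
    # Step 2: ra_dav stores the new version resource URL as a wcprop.
    wc["wcprop_vr_url"] = vr_url(new_rev)
    if crash_after_step2:
        raise Crash()
    # Step 4: process_committed() brings the entry up to date.
    wc["entry_revision"] = new_rev


def consistent(wc):
    # The cached URL must describe the revision recorded in the entry.
    return wc["wcprop_vr_url"] == vr_url(wc["entry_revision"])


wc = {"entry_revision": 5, "wcprop_vr_url": vr_url(5)}
try:
    commit(wc, 6, crash_after_step2=True)
except Crash:
    pass
print(consistent(wc))  # False: the URL names r6, the entry still says r5
```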

Naive solution #1:

    * add "loggy" behavior to the storage of the wcprop

  This doesn't work because if the crash occurs, then during a "cleanup" we
  will run the log and be right back in the same spot.
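In the same kind of toy model (hypothetical dict layout, made-up URL scheme), logging only the wcprop write means that replaying the log during cleanup faithfully reproduces the half-finished state.

```python
def run_log(wc, log):
    # Replay each logged (key, value) write, then discard the log.
    for key, value in log:
        wc[key] = value
    log.clear()


wc = {"entry_revision": 5, "wcprop_vr_url": "/vr/5/foo.c"}
log = [("wcprop_vr_url", "/vr/6/foo.c")]  # step 2, now "loggy"
# ... crash here, before process_committed() logs the entry change ...
run_log(wc, log)  # what "cleanup" would do
print(wc)  # {'entry_revision': 5, 'wcprop_vr_url': '/vr/6/foo.c'}
```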

Naive solution #2:

    * don't run logs during cleanup

  This probably isn't an answer because we might *need* to run whatever is
  in the log to return the WC to a consistent state (unrelated to any of the
  wcprops stuff).

Big Brains solution #3:

    * add various consistency checks to what we store as a wcprop

  This is what Karl has done with ra_dav, but it really papers over the
  underlying problem that our WC is not properly transacted. Instead, this
  double checking is about *detecting* a WC in an inconsistent state. It
  doesn't *solve* the inconsistency problem.
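Roughly, the checking approach looks like the sketch below. It only illustrates the idea and is not Karl's actual ra_dav code; the dict layout and URL scheme are assumptions. The check returns None on a mismatch, but the stale wcprop stays where it was.

```python
def get_vr_url(wc, path):
    """Return the cached URL for path, or None if it disagrees with the entry."""
    url = wc["wcprops"].get(path)
    rev = wc["entries"][path]
    if url is None or not url.startswith(f"/vr/{rev}/"):
        return None  # inconsistency detected; caller must ask the server again
    return url


wc = {"entries": {"foo.c": 5}, "wcprops": {"foo.c": "/vr/6/foo.c"}}
print(get_vr_url(wc, "foo.c"))  # None
print(wc["wcprops"]["foo.c"])   # /vr/6/foo.c  (the bad value is still stored)
```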

The Four Elements solution:

  [ you knew the number Four was involved somewhere, didn't you? :-) ]

  I believe that we have simply not stepped back to realize what needs to be
  transacted and made "loggy" in our system. In short, there are four items
  that have an implicit need to remain synchronized:

    1) the file's entry in .svn/entries
    2) the file's contents
    3) the file's properties
    4) the file's WC properties

  These four items need to be changed as a single unit. If you change any of
  them *WITHOUT* changing the others (or specifically knowing that it
  doesn't need to change), then you violate the internal integrity of the
  working copy.

Thus, any solution needs to look at these four items as a unit and
transact them accordingly.

----------------------------------------------
Implementation Considerations

The client and working copy libraries are operating from a pretty simple
standpoint:

  a) we are going to call RA and give it a callback to modify properties
  b) since RA can modify props, then we need to consider transaction(s)
  c) we will then make our own changes, completing the transaction(s)

In other words, there isn't anything real sneaky going on here. This isn't
really about changing wcprops, but simply that the client library is giving
RA a way to change one of the four states. Thus, the client lib had better
ensure that it happens Properly(tm).

Karl pointed out that our logs are written as atomic units of work, where
processing each unit leaves the WC intact. We read in the whole log,
append 10 items to it (one "unit" of work), write the result to a temp file,
and move it over the top of the old log file. But steps (a) and (c) occur a
"long ways" from each other, so the integrity of the WC cannot be guaranteed,
since the "real" unit occurs in a couple of pieces.
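For reference, the read/append/write-temp/rename step can be sketched like this. It is a minimal illustration: the one-item-per-line log format is an assumption, not the WC's actual log format. os.replace() is atomic on POSIX when both paths are on the same filesystem.

```python
import os
import tempfile


def append_unit(log_path, items):
    """Append one unit of work to the log: read, extend, write temp, rename."""
    try:
        with open(log_path, encoding="utf-8") as f:
            existing = f.read()
    except FileNotFoundError:
        existing = ""
    new_text = existing + "".join(item + "\n" for item in items)
    # The temp file must live in the same directory so the rename is atomic.
    dir_name = os.path.dirname(os.path.abspath(log_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, prefix=".log-tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(new_text)
            f.flush()
            os.fsync(f.fileno())
        # Readers see either the old log or the new one, never a mix.
        # (Full crash durability would also fsync the directory; omitted.)
        os.replace(tmp_path, log_path)
    except BaseException:
        os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as d:
    log_path = os.path.join(d, "log")
    append_unit(log_path, ["unit-1 item-a", "unit-1 item-b"])
    append_unit(log_path, ["unit-2 item-a"])
    with open(log_path, encoding="utf-8") as f:
        contents = f.read()
    leftovers = sorted(os.listdir(d))
```

Each call is atomic on its own, but nothing ties a call made during step (a) to one made during step (c), which is exactly the gap described above.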

I suggested tagging log items with a sequence ID and not performing any of
them until you see a "close ID" in the log. If a crash occurs, then a "close
13" won't be present, so all the ID==13 items would be skipped. This style might
be possible, but it would take a bit of scanning to find valid transaction
sets and then to process them.
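A single-pass version of that scanning might look like the following. The log syntax ("<id> <item>" lines and "close <id>" markers) is invented for the example. Items are buffered per ID and released only when their close marker appears; whatever is still buffered at the end came from an interrupted transaction and is dropped.

```python
def complete_items(lines):
    """Return items whose transaction ID has a matching "close <id>" line."""
    pending = {}   # id -> items seen so far for that transaction
    runnable = []  # items from closed transactions, in close order
    for line in lines:
        word, _, rest = line.partition(" ")
        if word == "close":
            runnable.extend(pending.pop(rest, []))
        else:
            pending.setdefault(word, []).append(rest)
    # Anything left in `pending` belongs to an interrupted transaction.
    return runnable


log_lines = [
    "12 set-wcprop foo.c /vr/6/foo.c",
    "13 set-wcprop bar.c /vr/6/bar.c",
    "12 modify-entry foo.c 6",
    "close 12",
]
print(complete_items(log_lines))
# ['set-wcprop foo.c /vr/6/foo.c', 'modify-entry foo.c 6']
```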

Instead, Karl suggested that we write individual log files. When a log for a
specific item is "complete", then it gets appended to the master log file
for running later on. These logs could go in, say:

.svn/tmp/file-logs/FILENAME
.svn/tmp/dir-log

Each file would simply follow the standard log format.

The notion of "complete" is defined by process_committed. It knows that
(possibly) some prop changes occurred earlier, but it definitely knows when
a file has been completely processed. Thus, we have a good marker for
knowing the termination of all transactions. The beginning is simply the
first time the RA setprop callback is used for a particular path.
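A rough sketch of the per-item scheme, assuming the .svn/tmp/file-logs/FILENAME layout above with adm_dir standing in for .svn. The class, its method names, and the log-line syntax are hypothetical stand-ins for the RA setprop callback and process_committed(); they are not real Subversion APIs.

```python
import os
import tempfile


class FileLogs:
    """Hypothetical per-item logs in <adm>/tmp/file-logs/, plus a master log."""

    def __init__(self, adm_dir):
        self.dir = os.path.join(adm_dir, "tmp", "file-logs")
        self.master = os.path.join(adm_dir, "log")
        os.makedirs(self.dir, exist_ok=True)

    def _append(self, path, text):
        with open(path, "a", encoding="utf-8") as f:
            f.write(text)

    def setprop_callback(self, name, prop, value):
        # The first call for a path implicitly begins that path's transaction.
        path = os.path.join(self.dir, name)
        self._append(path, f"set-wcprop {name} {prop} {value}\n")

    def process_committed(self, name, new_rev):
        # Completing the item: add its final changes, hand the whole log
        # to the master log, then remove the per-item file.
        path = os.path.join(self.dir, name)
        self._append(path, f"modify-entry {name} revision {new_rev}\n")
        with open(path, encoding="utf-8") as f:
            unit = f.read()
        self._append(self.master, unit)
        os.remove(path)

    def cleanup(self):
        # A per-item log still present was never completed: discard it,
        # so none of its changes (including the wcprop) are applied.
        for name in os.listdir(self.dir):
            os.remove(os.path.join(self.dir, name))


with tempfile.TemporaryDirectory() as adm:
    logs = FileLogs(adm)
    logs.setprop_callback("foo.c", "vr-url", "/vr/6/foo.c")
    logs.setprop_callback("bar.c", "vr-url", "/vr/6/bar.c")
    logs.process_committed("foo.c", 6)
    # Simulate a crash before bar.c completes, followed by cleanup.
    logs.cleanup()
    with open(os.path.join(adm, "log"), encoding="utf-8") as f:
        master_text = f.read()
    leftover = os.listdir(logs.dir)
```

For brevity the append to the master log is a plain append. A real version would want that append to be atomic too (write a temp file, then rename), and would need to cope with a crash between the append and the removal of the per-item log.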

----------------------------------------------
Some Refinements

* There is no need for a "master" log file.

  Our units of work occur entirely on a per-file or per-dir basis, except
  for altering directory info *after* its children have been updated. For
  example, we need to ensure all children are added/deleted before changing
  the revision associated with a directory.

  Thus, each time a file-log is completed, it can be run immediately.

  A dir-log cannot be run if any file-logs exist. Once the file logs are
  completed, then the dir-log may be run.
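The ordering rule can be modeled in memory, under the simplifying assumption that a set of not-yet-completed file names stands in for the file-logs present on disk. The scheduler class and its methods are hypothetical.

```python
class LogScheduler:
    """Hypothetical model of the file-log / dir-log ordering rule."""

    def __init__(self, run):
        self.run = run              # callback that applies one completed log
        self.pending_files = set()  # stands in for file-logs existing on disk
        self.dir_log = None

    def begin_file(self, name):
        self.pending_files.add(name)

    def complete_file(self, name, log):
        # A completed file-log is a self-contained unit: run it right away.
        self.run(log)
        self.pending_files.discard(name)
        self._maybe_run_dir_log()

    def complete_dir(self, log):
        self.dir_log = log
        self._maybe_run_dir_log()

    def _maybe_run_dir_log(self):
        # The dir-log (e.g. bumping the directory's revision) must wait
        # until no file-logs remain.
        if self.dir_log is not None and not self.pending_files:
            log, self.dir_log = self.dir_log, None
            self.run(log)


ran = []
sched = LogScheduler(ran.append)
sched.begin_file("a.c")
sched.begin_file("b.c")
sched.complete_file("a.c", "log for a.c")
sched.complete_dir("dir-log")  # held back: b.c is still pending
sched.complete_file("b.c", "log for b.c")
print(ran)  # ['log for a.c', 'log for b.c', 'dir-log']
```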

  The directory structure shown above, using the .svn/tmp/ subdir, is not
  needed. We can simply move the logs directly up to .svn and omit the
  .svn/log file.

* Maybe eliminate the wcprops separation

  Initially, we decided to put the wcprops into their own files so that the
  props file would be just user-defined properties. This is somewhat
  artificial. Since we use the same API to set normal, entry, and WC props,
  we may want to just go ahead and keep them in the same file.

  This would reduce the number of inodes used by SVN, reduce the overall I/O
  because of fewer files to open/read/write, and reduce the number of items to
  transact (the Three Elements now :-)

  The WC's property functions would need to filter wcprops before returning
  them, but this seems very minor relative to the I/O to get the darned
  things in the first place (they could even be filtered at read time, based
  on whether the caller is interested or not).
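Filtering at read time could be as simple as the sketch below. It assumes wcprops are recognizable by a reserved name prefix ("svn:wc:" here) and uses a toy one-property-per-line name=value format; both are assumptions for illustration only.

```python
WC_PREFIX = "svn:wc:"  # assumed marker for WC-only properties


def parse_props(lines, want_wcprops=False):
    """Parse a combined props file, keeping only the kind the caller wants."""
    props = {}
    for line in lines:
        name, sep, value = line.rstrip("\n").partition("=")
        if not sep:
            continue  # skip malformed lines in this toy format
        # Filter while reading, so unwanted props never reach the caller.
        if name.startswith(WC_PREFIX) == want_wcprops:
            props[name] = value
    return props


combined = [
    "svn:eol-style=native\n",
    "svn:wc:vr-url=/vr/6/foo.c\n",
    "color=blue\n",
]
user_props = parse_props(combined)
wc_props = parse_props(combined, want_wcprops=True)
print(user_props)  # {'svn:eol-style': 'native', 'color': 'blue'}
print(wc_props)    # {'svn:wc:vr-url': '/vr/6/foo.c'}
```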

* Maybe keep a single log file and use IDs

  Not sure on this one. Maybe there is a simple two-pass scheme to read the
  log. Collect IDs, then reread to process the complete ones. This scheme
  would reduce the number of files needed, but I'm not sure of any other
  benefits or costs.
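One reading of the two-pass idea, using an invented syntax of "<id> <item>" lines plus "close <id>" markers: the first pass only collects closed IDs, and the second rereads the log and yields items from those IDs in their original order. read_log is a hypothetical callable that reopens the log on each call.

```python
def two_pass(read_log):
    """Yield items from closed transactions, in original log order."""
    # Pass 1: collect every ID that has a "close <id>" marker.
    closed = set()
    for line in read_log():
        word, _, rest = line.partition(" ")
        if word == "close":
            closed.add(rest)
    # Pass 2: reread the log, keeping only items whose ID was closed.
    for line in read_log():
        word, _, rest = line.partition(" ")
        if word != "close" and word in closed:
            yield rest


log_lines = [
    "7 set-wcprop foo.c /vr/6/foo.c",
    "8 set-wcprop bar.c /vr/6/bar.c",
    "7 modify-entry foo.c 6",
    "close 7",
]
runnable = list(two_pass(lambda: iter(log_lines)))
print(runnable)  # ['set-wcprop foo.c /vr/6/foo.c', 'modify-entry foo.c 6']
```

Compared with buffering items until their close marker arrives, this keeps only the set of IDs in memory and preserves log order, at the cost of reading the log twice.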

I think that is about it for now. Mostly, this email can simply serve as an
impetus for conversation to validate the basic "transact *all four* pieces
of WC state" concept.

Cheers,
-g

p.s. no, this won't go into alpha; in fact, I would want to wait for the WC
admin lock work to complete, and build off that

-- 
Greg Stein, http://www.lyra.org/

Received on Thu Jul 18 03:08:05 2002
