[ yah... the subject has little relevance, but "I made you look!" :-) ]
This email is to recap a phone conversation that Karl and I had about the
wcprops and ra_dav and the WC. The motivation of the call was, obviously,
that there is a bit of disagreement in how to solve the transaction problem
for wcprops.
The original problem:
1) we perform a commit
2) ra_dav gets a version resource URL from the server for the newly
committed item and stores it via the RA propset callback
3) RA completes the commit editor drive, returns to the WC
4) the WC calls process_committed() to wrap up the commit (move text
bases, update entries, etc)
If a crash/signal/ctrl-C occurs *after* step 2, but before step 4
completes, then you have a WC that has lost its internal consistency. The
recorded version resource URL does not map to the values stored in the
entries file.
Naive solution #1:
* add "loggy" behavior to the storage of the wcprop
This doesn't work because if the crash occurs, then during a "cleanup" we
will run the log and be right back in the same spot.
Naive solution #2:
* don't run logs during cleanup
This probably isn't an answer because we might *need* to run whatever is
in the log to return the WC to a consistent state (unrelated to any of the
wcprops stuff).
Big Brains solution #3:
* add various consistency checks to what we store as a wcprop
This is what Karl has done with ra_dav, but it really papers over the
underlying problem that our WC is not properly transacted. Instead, this
double checking is about *detecting* a WC in an inconsistent state. It
doesn't *solve* the inconsistency problem.
The Four Elements solution:
[ you knew the number Four was involved somewhere, didn't you? :-) ]
I believe that we have simply not stepped back to realize what needs to be
transacted and made "loggy" in our system. In short, there are four items
that have an implicit need to remain synchronized:
1) the file's entry in .svn/entries
2) the file's contents
3) the file's properties
4) the file's WC properties
These four items need to be changed as a single unit. If you change any of
them *WITHOUT* changing the others (or specifically knowing that it
doesn't need to change), then you violate the internal integrity of the
working copy.
Thus, any solution needs to look at these four items as a unit, and
transact them accordingly.
----------------------------------------------
Implementation Considerations
The client and working copy libraries are operating from a pretty simple
standpoint:
a) we are going to call RA and give it a callback to modify properties
b) since RA can modify props, then we need to consider transaction(s)
c) we will then make our own changes, completing the transaction(s)
In other words, there isn't anything real sneaky going on here. This isn't
really about changing wcprops, but simply that the client library is giving
RA a way to change one of the four states. Thus, the client lib had better
ensure that it happens Properly(tm).
Karl pointed out that our logs are written as atomic units of work, where
processing each unit leaves the WC intact. We read in the whole log,
append 10 items to it (one "unit" of work), write the result to a temp file,
and move it over the top of the old log file. But steps (a) and (c) occur a
"long ways" from each other, so the integrity of the WC cannot be guaranteed
since the "real" unit occurs in a couple pieces.
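To make the "read, append, write to temp, move over the top" step concrete, here is a rough sketch in Python. It is not the actual WC code; the function name append_unit and the one-item-per-line format are assumptions made purely for illustration.

```python
import os
import tempfile


def append_unit(log_path, items):
    """Append one unit of work to the log by rewriting the whole file.

    Hypothetical helper: `items` is a list of strings, one log item each.
    """
    # Read the existing log, if there is one.
    try:
        with open(log_path, "r", encoding="utf-8") as f:
            existing = f.read()
    except FileNotFoundError:
        existing = ""

    # Write old contents plus the new unit to a temp file in the same
    # directory, so the final rename stays on one filesystem.
    dir_name = os.path.dirname(os.path.abspath(log_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, prefix="log.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(existing)
            for item in items:
                tmp.write(item + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())
        # Move the temp file over the old log. A reader sees either the
        # old log or the new one, never a half-written unit.
        os.replace(tmp_path, log_path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Note that this only makes a single append atomic. It does nothing for a unit whose pieces are written at two different times, which is exactly the (a)/(c) gap described above.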
I suggested tagging log items with a sequence ID and not performing any
until you see a "close ID" in the log. If a crash occurs, then a "close 13"
won't be present, so all the ID==13 items would be skipped. This style might
be possible, but it would take a bit of scanning to find valid transaction
sets and then to process them.
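A minimal sketch of that scanning, assuming a made-up in-memory log format of (ID, payload) pairs where the payload "close" ends a transaction. None of these names come from the real log code.

```python
CLOSE = "close"  # hypothetical marker payload, e.g. the "close 13" item


def complete_items(entries):
    """Return the items whose transaction ID has a matching close marker.

    `entries` is a list of (txn_id, payload) pairs in log order.
    Items belonging to an ID that was never closed are skipped.
    """
    # Pass 1: collect the IDs that were closed.
    closed = {txn_id for txn_id, payload in entries if payload == CLOSE}
    # Pass 2: keep, in log order, only the items of closed transactions.
    return [
        (txn_id, payload)
        for txn_id, payload in entries
        if txn_id in closed and payload != CLOSE
    ]
```

For example, with items for IDs 12 and 13 interleaved and only a "close" for 12, everything tagged 13 is dropped and the ID 12 items come back in their original order.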
Instead, Karl suggested that we write individual log files. When a log for a
specific item is "complete", then it gets appended to the master log file
for running later on. These logs could go in, say:
.svn/tmp/file-logs/FILENAME
.svn/tmp/dir-log
Each file would simply follow the standard log format.
The notion of "complete" is defined by process_committed. It knows that
(possibly) some prop changes occurred earlier, but it definitely knows when
a file has been completely processed. Thus, we have a good marker for
knowing the termination of all transactions. The beginning is simply the
first time the RA setprop callback is used for a particular path.
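Roughly, the lifecycle could look like the sketch below. The class ItemLogs, its method names, and the item strings are all invented for illustration; only the .svn/tmp/file-logs/FILENAME location comes from the layout suggested above. The append to the master log is deliberately simplistic here and is not crash-safe; a real version would need the temp-file-and-rename treatment.

```python
import os


class ItemLogs:
    """Per-item logs that are folded into a master log once complete."""

    def __init__(self, admin_dir):
        # admin_dir stands in for the .svn directory.
        self.pending_dir = os.path.join(admin_dir, "tmp", "file-logs")
        self.master = os.path.join(admin_dir, "log")
        os.makedirs(self.pending_dir, exist_ok=True)

    def _path(self, name):
        return os.path.join(self.pending_dir, name)

    def record(self, name, item):
        # The first record for a name implicitly begins its transaction
        # (e.g. the first RA setprop callback for that path).
        with open(self._path(name), "a", encoding="utf-8") as f:
            f.write(item + "\n")

    def complete(self, name, final_items=()):
        # Called from the process_committed stage: add the WC's own
        # changes, then hand the finished log over to the master log.
        for item in final_items:
            self.record(name, item)
        path = self._path(name)
        if not os.path.exists(path):
            return  # nothing was ever recorded for this item
        with open(path, "r", encoding="utf-8") as f:
            body = f.read()
        with open(self.master, "a", encoding="utf-8") as m:
            m.write(body)
        os.remove(path)
```

If a crash happens before complete() is called for an item, its log never reaches the master log, so cleanup has nothing half-finished to replay for it.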
----------------------------------------------
Some Refinements
* There is no need for a "master" log file.
Our units of work occur entirely on a per-file or per-dir basis, except
for altering directory info *after* its children have been updated. For
example, we need to ensure all children are added/deleted before changing
the revision associated with a directory.
Thus, each time a file-log is completed, it can be run immediately.
A dir-log cannot be run if any file-logs exist. Once the file logs are
completed, then the dir-log may be run.
The directory structure shown above, using the .svn/tmp/ subdir is not
needed. We can simply move the logs directly up to .svn and omit the
.svn/log file.
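The ordering rule is simple enough to state as a check. This is only a sketch: it assumes a log root (which could be .svn/tmp or .svn itself) holding a "file-logs" subdirectory and a "dir-log" file, and that a file-log is removed once it has been run. Both function names are hypothetical.

```python
import os


def pending_file_logs(log_root):
    """Names of file-logs that have not been run (and removed) yet."""
    file_logs_dir = os.path.join(log_root, "file-logs")
    if not os.path.isdir(file_logs_dir):
        return []
    return sorted(os.listdir(file_logs_dir))


def dir_log_may_run(log_root):
    """True when a dir-log exists and no file-logs remain under log_root."""
    has_dir_log = os.path.isfile(os.path.join(log_root, "dir-log"))
    return has_dir_log and not pending_file_logs(log_root)
```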
* Maybe eliminate the wcprops separation
Initially, we decided to put the wcprops into their own files so that the
props file would be just user-defined properties. This is somewhat
artificial. Since we use the same API to set normal, entry, and WC props,
we may want to just go ahead and keep them in the same file.
This would reduce the number of inodes used by SVN, reduce the overall I/O
because of fewer files to open/read/write, and reduce the number of items to
transact (the Three Elements now :-)
The WC's property functions would need to filter wcprops before returning
them, but this seems very minor relative to the I/O to get the darned
things in the first place (they could even be filtered at read time, based
on whether the caller is interested or not).
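The filtering itself would be tiny. A sketch, assuming wcprops can be recognized by a name prefix (the "svn:wc:" value below is an assumption for this example, as are the function name and the dict representation of a props file):

```python
WC_PREFIX = "svn:wc:"  # assumed prefix marking WC properties


def read_props(all_props, include_wcprops=False):
    """Return props from a combined file, hiding wcprops unless asked.

    `all_props` is a dict of property name -> value, standing in for the
    contents of a single merged props file.
    """
    if include_wcprops:
        return dict(all_props)
    return {
        name: value
        for name, value in all_props.items()
        if not name.startswith(WC_PREFIX)
    }
```

A caller like ra_dav would pass include_wcprops=True; everything user-facing would take the default and never see the WC's bookkeeping.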
* Maybe keep a single log file and use IDs
Not sure on this one. Maybe there is a simple two-pass scheme to read the
log. Collect IDs, then reread to process the complete ones. This scheme
would reduce the number of files needed, but I'm not sure of any other
benefits or costs.
I think that is about it for now. Mostly, this email can simply serve as an
impetus for conversation to validate the basic "transact *all four* pieces
of WC state" concept.
Cheers,
-g
p.s. no, this won't go into alpha; in fact, I would want to wait for the WC
admin lock work to complete, and build off that
--
Greg Stein, http://www.lyra.org/
Received on Thu Jul 18 03:08:05 2002