Re: Pristine store - using it

From: Greg Stein <gstein_at_gmail.com>
Date: Mon, 8 Mar 2010 19:45:33 -0500

On Mon, Mar 8, 2010 at 13:22, Julian Foad <julian.foad_at_wandisco.com> wrote:
>...
> I see two Wrong Ways to do this:
>
> (1) Pass the path of the temporary file (through whatever contorted code
> and data flows exist) along to install_committed_file().

This one is somewhat reasonable.

> (2) Ensure that the temporary file has a derived name, so that the same
> name can be derived again within (). The name could be based on the
> working file path/name/version (as it is in WC-1), or could be based on
> the file's SHA-1 checksum (like it will be when properly in the pristine
> store) if that is available.

This is what we do now, and it totally sucks. We should never "guess"
at filenames or other such items. There should be clear dataflow.

> ... and one Right Way:
>
> (3) As soon as all the content is written to the temp file, move it
> fully into the pristine store, named by its checksum. Then later, in
> install_committed_file(), make that pristine text become "this node's
> base" by writing its checksum into the node's entry in the DB.

Yes. This was where I hoped for us to go, which is why I brought up
the bits about checksum in process_committed_leaf. There is still a
problem around the checksum dataflow, but I do believe that is the
best way to do this.
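
A minimal sketch of the two steps in (3), to make the dataflow
concrete. It is an illustration only, not the libsvn_wc code: the
"node" table, its columns, and the helper names install_pristine() and
set_base_checksum() are assumptions made up for this example.

```python
import hashlib
import os
import sqlite3
import tempfile


def install_pristine(store_dir, temp_path):
    """Move a fully written temp file into the store, named by its SHA-1."""
    sha1 = hashlib.sha1()
    with open(temp_path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            sha1.update(chunk)
    checksum = sha1.hexdigest()
    # os.replace is atomic when source and target share a filesystem.
    os.replace(temp_path, os.path.join(store_dir, checksum))
    return checksum


def set_base_checksum(db, local_relpath, checksum):
    """Later step: make the already-installed pristine this node's base."""
    with db:
        db.execute(
            "UPDATE node SET checksum = ? WHERE local_relpath = ?",
            (checksum, local_relpath),
        )


def demo():
    store = tempfile.mkdtemp()
    db = sqlite3.connect(":memory:")
    # Hypothetical one-table schema, just enough to hold a base checksum.
    db.execute("CREATE TABLE node (local_relpath TEXT PRIMARY KEY, checksum TEXT)")
    db.execute("INSERT INTO node VALUES ('foo.c', NULL)")
    fd, tmp = tempfile.mkstemp(dir=store)
    with os.fdopen(fd, "wb") as f:
        f.write(b"new text\n")
    checksum = install_pristine(store, tmp)   # nothing refers to it yet
    set_base_checksum(db, "foo.c", checksum)  # now the node does
    return store, db, checksum
```

Between the two calls in demo() the pristine exists on disk but is
unreferenced, which is exactly the window discussed further down.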

> Method (3) is right because the new pristine store can contain both the
> old and the new text base simultaneously. We can install a new pristine

Yes and no. More below.

> text that nothing yet refers to, which can be done regardless of whether
> and when the (update) operation completes, and then later make something
> refer to it, thus simplifying the loggy requirements.
>
> However, that does beg the question of how and when we will delete the
> old text from the store.

... or keep the in-flight one from being deleted. Imagine a second
process running during the commit, performing a GC, seeing an unref'd
pristine, and wiping it out.

> If we are going to delete it by garbage collection, we must either ensure
> that the temporary pristine is known to be "referenced" before it is
> actually recorded as the base of any node, or otherwise ensure that no
> "garbage collection" can happen while the WC is in such an intermediate
> state.

I believe we can do this pretty simply, by recording a work queue item
(to do *whatever*). The wc_db code will not allow the use of a
database if there are outstanding work queue items.
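
Roughly, the guard could look like the sketch below. The table layout,
the exception, and the function names are hypothetical; the only point
is that any outstanding work queue row blocks a second process (a GC
pass here) from touching the database.

```python
import sqlite3


class WorkQueueNotEmpty(Exception):
    """Raised when the database still has unfinished work items."""


def open_db():
    db = sqlite3.connect(":memory:")
    # Hypothetical minimal work queue table.
    db.execute("CREATE TABLE work_queue (id INTEGER PRIMARY KEY, work TEXT NOT NULL)")
    return db


def require_clean(db):
    """Refuse to proceed while any work item is outstanding."""
    (pending,) = db.execute("SELECT COUNT(*) FROM work_queue").fetchone()
    if pending:
        raise WorkQueueNotEmpty(f"{pending} work item(s) outstanding")


def gc_pristines(db):
    require_clean(db)  # an in-flight commit keeps GC out entirely
    return "gc ran"
```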

> Alternatively, if we are going to delete the old one at the time when
> the new one "replaces" it, then we have to "unreference" the old one at
> that time, but that's OK as we will have the old SHA1 checksum available
> up until that time.

Sure.

But I think we're always going to have some form of garbage
collection. At a minimum, "svn cleanup" will perform pristine GC. A
"referenced" pristine has a checksum sitting in one of the columns of
the schema (there are about five). If we delete the old (assuming it
is unref'd), then that tends to imply the new one is not (yet)
referenced and is subject to a second process' GC logic.
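
To illustrate what "referenced" means for GC, here is a small sketch.
The (table, column) pairs are placeholders standing in for the
checksum columns mentioned above, not the actual schema.

```python
import sqlite3

# Placeholder (table, column) pairs standing in for the reference columns.
REFERENCE_COLUMNS = [
    ("base_node", "checksum"),
    ("working_node", "checksum"),
    ("actual_node", "left_checksum"),
]


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pristine (checksum TEXT PRIMARY KEY)")
    for table in sorted({t for t, _ in REFERENCE_COLUMNS}):
        cols = ", ".join(f"{c} TEXT" for t, c in REFERENCE_COLUMNS if t == table)
        db.execute(f"CREATE TABLE {table} (local_relpath TEXT, {cols})")
    return db


def unreferenced(db):
    """Return pristine checksums that no reference column mentions."""
    referenced = set()
    for table, column in REFERENCE_COLUMNS:
        rows = db.execute(
            f"SELECT {column} FROM {table} WHERE {column} IS NOT NULL"
        )
        referenced.update(row[0] for row in rows)
    stored = {row[0] for row in db.execute("SELECT checksum FROM pristine")}
    return stored - referenced
```

A pristine installed ahead of the commit shows up in unreferenced()
until some column names it, so a GC run by another process in that
window would consider it garbage.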

>...

One thing around this commit process: the work queue item is
"backwards". We have a work item which does a bunch of stuff and
*inside* calls svn_wc__db_global_commit() on each item to perform the
database work. This is due mostly to converting an old loggy item.

The Proper way is to have code call db_global_commit() itself (rather
than queueing a work item which calls that function). When calling
commit, one or more work items should be passed as argument in order
to complete the on-disk operations. This would include (at a minimum)
installing a new, translated copy of the pristine into the working
copy. *Somewhere* in this process is also the installation of the
pristine, which involves both on-disk and in-db operations which need
to be coordinated.

global_commit() takes the new checksum. Maybe it can queue two work items:

1) perform the on-disk installation of the pristine
2) perform the translated-install into the working copy

Now... this does imply that the new pristine is not installed, but
residing at some temp location. It also means that the new PRISTINE
row will exist, a WORK_QUEUE row will exist, and the pristine file
will not be "in place" (but after completing that work item, it will
be).
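
A sketch of that shape, under made-up assumptions: the tables, the work
item names ("install-pristine", "install-working-file"), and the
handler signatures are all hypothetical. The database rows and both
work items are written in one transaction, and the on-disk steps
happen afterwards when the queue is run.

```python
import json
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE node (local_relpath TEXT PRIMARY KEY, checksum TEXT);
        CREATE TABLE pristine (checksum TEXT PRIMARY KEY);
        CREATE TABLE work_queue (
            id INTEGER PRIMARY KEY AUTOINCREMENT, work TEXT NOT NULL);
    """)
    return db


def global_commit(db, local_relpath, new_checksum, temp_path):
    """Record the commit and queue the on-disk steps atomically."""
    work_items = [
        ["install-pristine", temp_path, new_checksum],
        ["install-working-file", local_relpath, new_checksum],
    ]
    with db:  # one transaction: every row lands, or none does
        db.execute("INSERT OR IGNORE INTO pristine VALUES (?)", (new_checksum,))
        db.execute(
            "INSERT OR REPLACE INTO node VALUES (?, ?)",
            (local_relpath, new_checksum),
        )
        db.executemany(
            "INSERT INTO work_queue (work) VALUES (?)",
            [(json.dumps(item),) for item in work_items],
        )


def run_work_queue(db, handlers):
    """Run items oldest first; each is removed only after it succeeds."""
    while True:
        row = db.execute(
            "SELECT id, work FROM work_queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return
        item_id, work = row
        op, *args = json.loads(work)
        handlers[op](*args)  # should be idempotent: may re-run after a crash
        with db:
            db.execute("DELETE FROM work_queue WHERE id = ?", (item_id,))
```

After global_commit() returns, the PRISTINE and WORK_QUEUE rows exist
while the file is still at temp_path; run_work_queue() closes the gap.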

I'm not sure how to best install a pristine *before* commit
finalization and ensure it won't be tossed. We have some checksums in
ACTUAL which are used to record sources for merge conflicts (with
corresponding instructions in ACTUAL_NODE.conflict). Those checksums
should be null if there is no conflict. I don't see how we can record
a "to-be-committed-checksum" because I don't know when we'd clear
that. What if the commit is interrupted, and the user changes the
text? When do we say "that commit checksum is now outdated"? Wiping
them all at the start of commit would definitely create dangling
pristines, and it wouldn't allow for simultaneous, disjoint commits
from within the same working copy. Maybe we could harvest
committables, store intended checksums into *just* those commit
targets, then run the commit logic.
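
One possible shape for that last idea, purely as a sketch: the
"commit_checksum" column and both function names are invented for
illustration, and the open question of when to clear a stale value
after an interrupted commit is left unanswered here.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    # commit_checksum is a hypothetical column, not part of any real schema.
    db.execute(
        "CREATE TABLE node (local_relpath TEXT PRIMARY KEY,"
        " checksum TEXT, commit_checksum TEXT)"
    )
    return db


def mark_commit_targets(db, intended):
    """Store intended checksums on just the harvested commit targets.

    `intended` maps local_relpath -> checksum of the text to be committed.
    Rows for other paths are untouched, so a disjoint commit running in
    the same working copy keeps its own markers.
    """
    with db:
        db.executemany(
            "UPDATE node SET commit_checksum = ? WHERE local_relpath = ?",
            [(checksum, path) for path, checksum in intended.items()],
        )


def finish_commit(db, paths):
    """Promote the intended checksum to the base and clear the marker."""
    with db:
        db.executemany(
            "UPDATE node SET checksum = commit_checksum, commit_checksum = NULL"
            " WHERE local_relpath = ?",
            [(path,) for path in paths],
        )
```

A GC pass would then need to treat commit_checksum as one more
reference column.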

Something like that. Thoughts?

Cheers,
-g
Received on 2010-03-09 01:46:11 CET
