Re: wc-to-wc copies and wc-ng

From: Greg Stein <gstein_at_gmail.com>
Date: Thu, 3 Jun 2010 16:14:49 -0400

In more detail this time...

On Tue, May 25, 2010 at 13:11, Philip Martin <philip.martin_at_wandisco.com> wrote:
> The wc-ng approach is a database transaction that modifies the
> database and adds one or more workqueue items to modify the working
> files and directories. At least initially, there is likely to be one
> transaction per versioned node in the source. I suppose when we

Yup.

> centralise it might be possible to copy the whole tree in a single
> transaction, but I'm not sure what the workqueue would look like.

Sure. I think you'd simply have (say) 1000 work items in there.
Shouldn't be a problem at all.

> I'm not sure how/when unversioned items get handled. Do we create
> workqueue items for them when adding the parent directory? Does each
> one get its own transaction/workqueue? Do we copy them without
> transactions?

I would suggest each unversioned item gets some kind of "copy file" or
"make subdir" work item (we don't have these, so you can construct
some simple work items; see OP_FILE_REMOVE for similar).

In the future, these would all go into the single wq at the wcroot.
For now, you could put these into the wq at the destination root. I
don't see any problems running file operations in subdirs of a
versioned dir. Of course, you need to call wq_run() at some point, but
that's going to be the case for all workqueues that you add items
into.

Naturally, if you're doing a per-node copy today, any work items
associated with that will go into the SDB associated with the sqlite
transaction. Strictly speaking, you could also modify a second SDB
(think "parent stub"). We don't have good transaction semantics across
the two SDBs, but it isn't a big deal since it is transitional. The
work items can go into either SDB, as long as wq_run() is called
against a path that implies the appropriate SDB.

Note that you can "transact" all these unversioned copies by including
them into the work items associated with versioned nodes. For example,
given node DIR, the work items could include "mkdir DIR. cp SRC/alpha
DIR/alpha. cp SRC/beta DIR/beta." where {alpha, beta} are two
unversioned nodes. Thus, the unversioned items become part of your
overall transaction (the copy is not "complete" until DIR completes
and all the work items, including unversioned copies, are completed).
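To make that concrete, here is a rough sketch of how the work-item list for DIR might be assembled. It is illustrative only: the tuple format, the "mkdir"/"cp" op names, and build_work_items() are made up for this example and are not existing workqueue opcodes.

```python
import os

def build_work_items(src_dir, dst_dir, unversioned_names):
    """Return the ordered work items for copying one versioned directory.

    Hypothetical format: ("mkdir", path) or ("cp", src, dst). The mkdir
    comes first so that the copies into the new directory can succeed.
    """
    items = [("mkdir", dst_dir)]
    for name in unversioned_names:
        items.append(("cp",
                      os.path.join(src_dir, name),
                      os.path.join(dst_dir, name)))
    return items

# The example from the text: DIR with unversioned children alpha and beta.
items = build_work_items("SRC", "DIR", ["alpha", "beta"])
```

Since the whole list is queued together with DIR's own metadata change, the unversioned copies ride along with the versioned node rather than needing a transaction of their own.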

> One of the problems before centralisation is that a copied directory
> cannot be fully added to the database until the directory has been
> created in the filesystem. We can add the directory to the database
> in the parent and we can put workqueue items there as well, but a
> directory also needs to be added to its own database.
>
> I suppose we could create the new directory before executing the
> transaction, but that seems to be completely at odds with the
> transaction/workqueue approach. It's probably better to have the
> transaction just add the directory in the parent database and have
> workqueue items that create the new directory and modify the new
> database.

As a transitional approach, sure. Once we hit single-db, then I
imagine this logic will need to be reviewed regardless.

I might suggest using a sqlite txn to insert the parent stub and a
work item. When the work item executes, it does a "mkdir" for the
subdir, creates the .svn area, and populates the wc.db with the subdir
metadata. In the future, this work item devolves to a simple "mkdir",
and the txn inserts full data rather than a stub.
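A minimal sketch of that shape, using Python's sqlite3 module rather than the real wc.db code. The schema, the JSON-encoded work item, and the names open_db(), add_subdir() and run_queue() are all assumptions for illustration; the real work item would also create the .svn area and populate the subdir's wc.db, which is left out here.

```python
import json
import os
import sqlite3
import tempfile

def open_db(path=":memory:"):
    # Hypothetical schema: a node table with a "stub" flag, plus a queue.
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS nodes "
               "(path TEXT PRIMARY KEY, kind TEXT, stub INTEGER)")
    db.execute("CREATE TABLE IF NOT EXISTS work_queue "
               "(id INTEGER PRIMARY KEY AUTOINCREMENT, work TEXT)")
    return db

def add_subdir(db, relpath):
    # One transaction: the parent stub and its work item commit together,
    # or neither is recorded.
    with db:
        db.execute("INSERT INTO nodes VALUES (?, 'dir', 1)", (relpath,))
        db.execute("INSERT INTO work_queue (work) VALUES (?)",
                   (json.dumps(["mkdir", relpath]),))

def run_queue(db, root):
    # Execute items in insertion order; remove each one only after it ran.
    rows = db.execute("SELECT id, work FROM work_queue ORDER BY id").fetchall()
    for item_id, work in rows:
        op, relpath = json.loads(work)
        if op == "mkdir":
            # exist_ok makes a re-run after an interruption harmless.
            os.makedirs(os.path.join(root, relpath), exist_ok=True)
        with db:
            db.execute("DELETE FROM work_queue WHERE id = ?", (item_id,))

root = tempfile.mkdtemp()
db = open_db()
add_subdir(db, "sub")
run_queue(db, root)
```

The point of the sketch is the ordering: nothing touches the filesystem until the txn has committed, and the work item stays queued until its filesystem step has finished.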

> Is the current svn_wc__db_op_copy_file interface the correct one for
> wc-to-wc copies? It's sensible for a repo-to-wc copy where all the
> source information has to be inserted into the database, but for a
> wc-to-wc copy that information is in the database already. Perhaps we
> should just pass the source path and copy all the information with a
> database query?

Answered before. And I believe that I saw a commit from julianf
allowing non-copies to use op_copy_file(). I think we should change
the callers to use op_add_file() instead.

> Perhaps we should continue to do a dumb copy as the first step even
> with wc-ng? That would create a new completely unversioned tree and
> then one or more transactions could modify the database(s). At the
> moment there is no way to restart an interrupted copy, so although the
> transaction/workqueue approach means that the nodes will show up as
> incomplete it's not clear what the user could do.

I'd prefer to see a "proper" set of transactions and workqueue items.

I believe the "ideal" solution that we should eventually reach is a
single sqlite transaction that copies 1000 nodes and has 1000 work
queue items to finalize the filesystem portion. Then the copy succeeds
or fails atomically(*) since if the sqlite txn succeeds, then all the
work items must also succeed for the working copy to be usable again.
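As a sketch of the database half of that idea (not Subversion code; the two-table schema and copy_tree() are assumptions made for this example), a single sqlite transaction can record all 1000 nodes and their work items, and a failure partway through leaves nothing behind:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical, much-simplified schema.
db.execute("CREATE TABLE nodes (path TEXT PRIMARY KEY, copied_from TEXT)")
db.execute("CREATE TABLE work_queue "
           "(id INTEGER PRIMARY KEY, op TEXT, src TEXT, dst TEXT)")

def copy_tree(db, pairs):
    # A single transaction covers every node and every work item.
    # Any exception inside the block rolls the whole copy back.
    with db:
        for src, dst in pairs:
            db.execute("INSERT INTO nodes (path, copied_from) VALUES (?, ?)",
                       (dst, src))
            db.execute("INSERT INTO work_queue (op, src, dst) "
                       "VALUES ('cp', ?, ?)", (src, dst))

pairs = [(f"SRC/f{i}", f"DST/f{i}") for i in range(1000)]

# Failure case: a duplicate destination violates the primary key on the
# very last insert, so none of the 1000 earlier rows survive.
try:
    copy_tree(db, pairs + [pairs[0]])
except sqlite3.IntegrityError:
    pass
rows_after_failure = db.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]

# Success case: all nodes and all work items appear together.
copy_tree(db, pairs)
node_count = db.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
item_count = db.execute("SELECT COUNT(*) FROM work_queue").fetchone()[0]
```

This only shows the metadata side. The filesystem side is still whatever the queued items do when the wq is run.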

(*) and if the work items cannot succeed, then we have problems. I
could imagine the process is killed during the wq run, some source
files are torched, and then the destination is cleaned up (ie. wq is
run again). Without the source files, we'd have problems
completing the work items. Thus, an argument exists for stashing away
copies of all (modified or unversioned) sources into temp files.
Non-modified are just pulled from the pristine store.
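One possible shape for the stashing idea, sketched with hypothetical helpers stash_source() and install(): the source is copied to a temp file before the work item is queued, and the work item then refers only to the stash. The sketch assumes the stash directory is on the same filesystem as the destination, so that the final rename is atomic.

```python
import os
import shutil
import tempfile

def stash_source(src_path, stash_dir):
    """Copy a modified/unversioned source to a temp file; return its path."""
    fd, stashed = tempfile.mkstemp(dir=stash_dir)
    os.close(fd)
    shutil.copy2(src_path, stashed)
    return stashed

def install(stashed, dst_path):
    """Work-item body: move the stash into place. Safe to run again."""
    if os.path.exists(stashed):
        os.replace(stashed, dst_path)
    elif not os.path.exists(dst_path):
        # Neither the stash nor the result exists: cannot complete.
        raise FileNotFoundError(dst_path)
    # Otherwise an earlier run already installed the file; nothing to do.

work = tempfile.mkdtemp()
src = os.path.join(work, "alpha")
dst = os.path.join(work, "alpha-copy")
with open(src, "w") as f:
    f.write("local mods")

stashed = stash_source(src, work)
os.remove(src)          # the source is "torched" before the wq runs
install(stashed, dst)   # the work item still completes from the stash
install(stashed, dst)   # a second run (cleanup) is a no-op
with open(dst) as f:
    copied = f.read()
```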

Cheers,
-g
Received on 2010-06-03 22:15:27 CEST
