Re: getting nodes by ID

From: Jim Blandy <jimb_at_zwingli.cygnus.com>
Date: 2001-03-13 05:46:22 CET

(Is there a better term than DAV-ID? I'm happy to use the real DAV
terminology, but I don't know what it is.)

Greg Stein <gstein@lyra.org> writes:
> > Would it be possible to change DAV to remove condition 2)?
>
> Well, it is a reasonable assumption that the user will typically be fetching
> the latest. IOW, heuristics will typically make this fast.

I think we're miscommunicating. What I meant is:

At the moment, there's a requirement that a DAV-ID alone is sufficient
to efficiently retrieve a node: a request can just take a DAV-ID and
hand you back its contents, quickly.

What if we changed that rule so that the DAV-ID alone was sufficient
to check identity --- "Okay, this file you've asked for is the same as
one I've already got" --- but you need additional information to
actually get the file, if the identities don't match. Then we could
use node revision ID's as DAV-ID's --- if two node revision ID's are
identical, then you've got a cache hit --- and supply a (REV,PATH)
pair for retrieval. So you haven't lost condition 1.1 ("if their
contents are equal(*), their DAV-IDs are equal."), since you're only
comparing node revision ID's.
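A minimal Python sketch of the proposed split, assuming a client-side cache keyed by node revision ID and a server call that fetches contents by (REV, PATH). The class ClientCache, the fetch callback and the ID strings are hypothetical, not Subversion APIs:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

Fetch = Callable[[int, str], bytes]  # hypothetical retrieval by (REV, PATH)

@dataclass
class ClientCache:
    """Hypothetical client cache keyed by node revision ID (used as DAV-ID)."""
    entries: Dict[str, bytes] = field(default_factory=dict)
    fetches: int = 0

    def get(self, node_rev_id: str, rev: int, path: str, fetch: Fetch) -> bytes:
        # Identity check: the same node revision ID means the same contents.
        if node_rev_id in self.entries:
            return self.entries[node_rev_id]
        # Miss: retrieve by (REV, PATH), a location the server can authorize.
        self.fetches += 1
        data = fetch(rev, path)
        self.entries[node_rev_id] = data
        return data
```

Two requests for the same node revision, reached through different (REV, PATH) pairs, then cost a single fetch; only a miss needs a location the server can check.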

> > Also, it seems to me like the problems we're having in implementing
> > this aren't really unique to Subversion. Can you show us a system
> > where DAV-ID's are easy to implement, with all the desirable
> > properties?
>
> A table mapping UUIDs to (REV, PATH) pairs.

So, each time I commit a new node revision as part of some
transaction, I place a flag or property on that node revision giving
the REV and PATH in which it first appears. Easy enough. From then
on, DAV can use that (REV, PATH) as the DAV-ID, even when accessing
revisions in which that path no longer exists, or no longer refers to
that node revision. And unlike a node revision ID, the filesystem can
check authorization properly given a (REV,PATH) pair. So that meets
all our requirements, I think.
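A toy model of recording, at commit time, the (REV, PATH) in which each node revision first appears, and reusing it as the DAV-ID. The Repository class and its dict-based trees are assumptions made for illustration only:

```python
from typing import Dict, Tuple

class Repository:
    """Toy model (hypothetical): node revisions remember where they first appeared."""

    def __init__(self) -> None:
        # node revision ID -> (rev, path) of its first appearance
        self.created_at: Dict[str, Tuple[int, str]] = {}
        # rev -> {path: node revision ID}
        self.trees: Dict[int, Dict[str, str]] = {}

    def commit(self, rev: int, tree: Dict[str, str]) -> None:
        self.trees[rev] = dict(tree)
        for path, node_id in tree.items():
            # Only the first appearance is recorded; later revisions reuse it.
            self.created_at.setdefault(node_id, (rev, path))

    def dav_id(self, rev: int, path: str) -> Tuple[int, str]:
        # The DAV-ID stays the creation (rev, path), even after a rename.
        return self.created_at[self.trees[rev][path]]

repo = Repository()
repo.commit(1, {"/a/f": "n1"})
repo.commit(2, {"/b/f": "n1"})  # /a renamed to /b, file itself unchanged
```

After the rename, asking for /b/f in revision 2 still yields the DAV-ID (1, "/a/f").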

The problem here is that if someone sets an ACL on some old revision,
making it inaccessible, then suddenly all the nodes in new revisions
whose DAV-ID's happen to refer to the now-restricted revision become
inaccessible.
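To make the ACL problem concrete, here is a sketch under an assumed ACL model in which a set of (REV, directory) pairs is denied to the user. The function names are hypothetical. A server that sees only the DAV-ID can check nothing but the creation location:

```python
from typing import Set, Tuple

# Assumed ACL model: (rev, directory) pairs the user may not read.
Denied = Set[Tuple[int, str]]

def readable(denied: Denied, rev: int, path: str) -> bool:
    """True unless a denied directory in the same revision covers PATH."""
    for r, d in denied:
        prefix = d.rstrip("/") + "/"
        if r == rev and (path == d or path.startswith(prefix)):
            return False
    return True

def can_fetch_by_dav_id(denied: Denied, dav_id: Tuple[int, str]) -> bool:
    """The server sees only the DAV-ID, i.e. the creation (rev, path)."""
    rev, path = dav_id
    return readable(denied, rev, path)

# Someone later restricts /a in old revision 1.
denied: Denied = {(1, "/a")}
print(readable(denied, 2, "/b/f"))               # True: the file as the user sees it
print(can_fetch_by_dav_id(denied, (1, "/a/f")))  # False: the same file via its DAV-ID
```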

> If you have a versioning system which stores a new file for each change,
> then you'd just use the internal pathname to that file.

I think this has the same problems as using node revision ID's. Let
me work the analogy through and see if it applies:

There are two possibilities: either different revisions of a tree
share or don't share these per-change files. I'm not assuming
Subversion's global revision number model here. I just mean, select a
tree by date or by tag or by branch or whatever: do the same internal
pathnames appear in different trees?

They must, or else you've lost requirement 1.1, which is the whole
point of the game.

So if they do share them, and we support directory renaming and
deletion, then given a particular internal pathname, it must have more
than one "parent" path --- I mean a parent in the version controlled
tree, not a parent in the internal tree structure. And the more trees
in which it appears, the more parents it may have.

So when I present a DAV server with one of these DAV-ID's, how can it
determine whether I am actually authorized to access it? It doesn't
know which path of parents to check.
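The same difficulty can be sketched for a store that keeps one internal file per change. Here two trees (the labels and the store/0042 name are invented) share one internal file under different directories, and a Unix-style traversal check gives different answers depending on which parent path the server picks:

```python
from typing import Dict, Set, Tuple

# Hypothetical layout: each tree maps a version-controlled path to the
# internal file name that stores that version of the file.
trees: Dict[str, Dict[str, str]] = {
    "tag-1.0": {"/secret/plan.txt": "store/0042"},
    "trunk": {"/public/plan.txt": "store/0042"},  # directory renamed, file unchanged
}

def parents_of(internal: str) -> Set[Tuple[str, str]]:
    """Every (tree, path) at which an internal file appears."""
    return {(tree, path)
            for tree, paths in trees.items()
            for path, name in paths.items() if name == internal}

def may_traverse(path: str, denied_dirs: Set[str]) -> bool:
    """Unix-style 'execute' check: no ancestor directory may be denied."""
    parts = path.strip("/").split("/")[:-1]
    ancestors = ["/" + "/".join(parts[:i + 1]) for i in range(len(parts))]
    return not any(d in denied_dirs for d in ancestors)

# Given only "store/0042", which location should the server check?
for tree, path in sorted(parents_of("store/0042")):
    print(tree, path, may_traverse(path, {"/secret"}))
```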

> If you had a database of files, with a file per row (nodes!), then
> you could use the node id. (we're throwing acls into the mix, which
> bungles up this approach, but I bet that is about the data modelling
> rather than a statement about the validity of the requirement)

Well, but that's exactly my question: I can't figure out how to
implement DAV-ID's without throwing away some of the ACL behavior
which we had thought was reasonable. If you think we should revise
that part of the ACL idea, so that one can access nodes even if one
doesn't have "execute" access (or whatever we call it) to its parents,
then that's a point to discuss. But I think folks will be surprised
to learn where that requirement comes from.

> > For example, what if I'm implementing DAV on CVS --- what
> > kind of DAV-ID would you use there? You could use (PATH, RCS-NUMBER),
> > but only because CVS doesn't really allow you to delete directories,
> > or support renaming.
>
> That is exactly what I'd use because it works very well. "but only" is moot.
> It's the CVS design, and we can take advantage of it.

What I meant was, "this works, but it doesn't help us, because we
don't have that simplification." My whole point is that DAV-ID's
don't seem to mix (as far as I can tell) with ACL's on parents that
restrict access to children (as, say, Unix execute permissions on
directories do), when a child can appear in multiple parents.

> > I guess I'm beginning to wonder if this isn't a
> > half-baked idea in general, not just a problematic case for
> > Subversion.
>
> It is emphatically NOT half-baked.

Relax --- I'm just trying to figure out how to implement what you
want. I honestly don't see the solution.

> It should not be hard to compute OLDEST-REV for any given node. Just record
> it in the node when it is created. The obvious problem is that we only have
> TXN when we're creating the node, not REV. This would imply the need for
> retaining a TXN -> REV mapping; to compute OLDEST-REV, you'd take the node's
> TXN and pass it thru the mapping to get a REV. The mappings could sit in the
> transactions table (possibly by storing a skel such as ("committed" REV)),
> or a fourth table could be used.

As I said above, I'd be happy to record this info in the node revision
when it's created. That's equivalent to your suggestion, I think.
The problem is that now the user needs to be authorized to traverse
both of two distinct paths to reach a node --- both the path in the
revision he's actually working in, and the path in OLDEST-REV. We
only want the former to matter.
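For reference, the TXN -> REV mapping suggested above might look roughly like this, with a dict standing in for the transactions table and the ("committed" REV) skel modeled as a tuple. Every name here is hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class Node:
    node_id: str
    txn: str  # only the transaction is known when the node is created

class Filesystem:
    """Toy sketch (hypothetical names) of a TXN -> REV mapping."""

    def __init__(self) -> None:
        # Stands in for the transactions table; committed entries hold the REV.
        self.transactions: Dict[str, Tuple[str, Optional[int]]] = {}
        self.youngest = 0

    def begin_txn(self, txn: str) -> None:
        self.transactions[txn] = ("uncommitted", None)

    def commit_txn(self, txn: str) -> int:
        self.youngest += 1
        self.transactions[txn] = ("committed", self.youngest)
        return self.youngest

    def oldest_rev(self, node: Node) -> Optional[int]:
        # Map the node's creating TXN through the table to get OLDEST-REV.
        state, rev = self.transactions[node.txn]
        return rev if state == "committed" else None
```

This yields OLDEST-REV cheaply, but authorizing against it still means checking a second path besides the one the user is working in.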

> Let's take a different approach here. We don't need to answer the question
> of "what is the oldest revision this node appears in?", but we need to
> answer "what are the ACLs that apply to this node?" We posited this in (4),
> but kept trying to answer the former question.
>
> How about this: when a node is created, we record the ROOT-ID (of the txn)
> in the node. To determine the ACLs that apply to any given node (and PATH!),
> we bounce up to the node's ROOT-ID, then traverse the PATH for ACLs.
>
> Storing the ROOT-ID should allow us to use (ID, PATH) as the DAV-ID, which
> meets every condition that we've specified.

Yep. Same problem as (OLDEST-REV, PATH), though. (Equivalent, I think.)
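A sketch of the ROOT-ID lookup, assuming each node stores the ROOT-ID of the transaction that created it and that ACLs hang off directory nodes. DirNode and acls_along are hypothetical names:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class DirNode:
    # Hypothetical node: child name -> node, plus an optional ACL label.
    children: Dict[str, "DirNode"] = field(default_factory=dict)
    acl: Optional[str] = None
    root_id: Optional[str] = None  # ROOT-ID of the txn that created this node

def acls_along(roots: Dict[str, DirNode], root_id: str, path: str) -> List[str]:
    """Walk PATH down from the recorded root, collecting ACLs on the way."""
    node = roots[root_id]
    found = [node.acl] if node.acl else []
    for name in [p for p in path.split("/") if p]:
        node = node.children[name]
        if node.acl:
            found.append(node.acl)
    return found
```

Because the walk starts from the tree that created the node rather than the tree the user is browsing, it inherits the drawback of (OLDEST-REV, PATH).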

I didn't really understand your Answer #2 and Answer #3. But you
raise a critical point: we don't really know how ACL's are going to
work --- frankly, I think it might be a challenge to get useful
semantics, so we may see some severe hair --- and since the exact
problem here is the interaction between authorization and DAV-ID's, we
need to understand how ACL's are going to work before we can tackle
this. Just recognizing that there's an interesting interaction there
is a good thing.

Pending the addition of ACL's to the system, I don't see any reason
not to use node revision ID's as DAV ID's.
Received on Sat Oct 21 14:36:25 2006

In reply to: Greg Stein: "Re: getting nodes by ID"