Jim Blandy <firstname.lastname@example.org> writes:
> DAV wants to be able to retrieve nodes by DAV-ID. This DAV-ID doesn't
> have to be a Subversion FS node revision ID --- it can be any
> arbitrary string of bytes such that:
> 1) if two nodes' DAV-ID's are equal, their contents are equal, and
> 2) it's reasonably fast to retrieve a node given a DAV-ID.
Requirement (1) is not only useful for DAV, it is useful for users as
well -- it's essentially the per-file "change number" we've talked a
bit about in the past.
In CVS, the revision numbers naturally have this property, and it's
really handy. Two people can quickly establish that they have the
same file contents just by comparing revision numbers (assuming no
SVN revision numbers are different, of course. I could have revision
12 of foo.c, you could have revision 363, and yet the file might be
the same in those two revisions. If each file tracked which commits
affected it, or stored a number indicating how many times it had been
committed, then people would have this information as conveniently in
Subversion as in CVS.
I'm not sure yet how this affects Jim's points below, I just thought
it worth mentioning early.
> The idea, I think, is that when you ask for the contents of a
> directory, you'd get back a list of (name, DAV-ID) pairs, and then you
> could retrieve the contents you were interested in by DAV-ID. A
> caching proxy would use DAV-ID's to recognize text it had seen before.
> So we should add the following property:
> 3) As often as possible, if a node's contents have not changed, its
> DAV-ID shouldn't change.
> The first idea was that we'd simply use filesystem node revision ID's
> as DAV-ID's. That certainly meets the requirements above. But that
> won't work, because the filesystem can't tell whether you are
> authorized to access a file without knowing the path through which you
> reached it --- if you don't have permission to read some parent
> directory, you shouldn't be able to read its contents, no matter what
> the permissions are on them. So we need to add:
> 4) Given a DAV-ID, there must be a quick way to determine whether the user
> is authorized to access the node.
> The next suggestion was to use a pair (PATH, NODE-REV-ID). But PATH
> is useless without a revision or transaction whose root we can start
> from. The database isn't indexed to quickly answer questions like "in
> what revisions does PATH refer to NODE-REV_ID?" (I think such an
> index would completely duplicate the directory structure, unless
> someone thinks of something clever.)
> Another possibility would be to use a pair (REV, PATH), where REV is
> the oldest revision in which PATH refers to the node we want. This
> satisfies all four properties, but there's no quick way to choose the
> right REV, without trying them all. Since the "oldest" part is only
> necessary to satisfy condition 3), we can relax it a bit to "the
> oldest revision we can find". But this is still a mess. We're doing
> an unbounded number of path traversals in the database. So we need to
> 5) It should be quick to compute a suitable DAV-ID for each node as we
> traverse the filesystem.
> Would it be possible to change DAV to remove condition 2)? That is,
> can we use DAV-ID's to notice cache hits, but supply extra information
> to get the real thing when we miss? That way, we could use node
> revision ID's for DAV-ID's, but also supply a (ROOT, PATH) pair that
> could be looked up easily.
> Also, it seems to me like the problems we're having in implementing
> this aren't really unique to Subversion. Can you show us a system
> where DAV-ID's are easy to implement, with all the desireable
> properties? For example, what if I'm implementing DAV on CVS --- what
> kind of DAV-ID would you use there? You could use (PATH, RCS-NUMBER),
> but only because CVS doesn't really allow you to delete directories,
> or support renaming. I guess I'm beginning to wonder if this isn't a
> half-baked idea in general, not just a problematic case for
Received on Sat Oct 21 14:36:25 2006