> I did manage to
> write code that gave up a sort-of-random 64-bit number if you let it
> run for an entire minute. Obviously no good for something that has to
> happen on every checkin.
Oh yes. I forgot in my earlier note to mention that there is quite a good
random number generator associated with the ssh code, and if you only have
to generate one number every time you create a project or a branch (as
opposed to on every checkin), its runtime is probably acceptable.
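To make the cost argument concrete, here is a minimal sketch of drawing one
64-bit identifier per project or branch. It uses Python's `secrets` module
(the operating system's cryptographic generator) as a stand-in for the ssh
generator, which is an assumption on my part; `new_branch_id` is a
hypothetical name, not anything in DCMS.

```python
import secrets

def new_branch_id() -> int:
    """Return a 64-bit identifier drawn from the OS cryptographic RNG."""
    return secrets.randbits(64)

# Called once per project/branch creation, never per checkin.
branch_id = new_branch_id()
print(f"{branch_id:016x}")
```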
> There are some cases they don't handle: zero-length files,
> objects which are not files (directories, symlinks), and deltas that
> change only metadata. I'd be interested to know how you mean to
> handle these.
The DCMS repository doesn't contain directories or symlinks and (IMO) should
not. Symlinks are not portable and directories don't exist in the logic of
the repository, only in the logic of the workspace. Where information about
directory/permission ownership is needed, the repository contains a
*description* of the directory to be created -- its name, permissions, etc.
Ditto for symlinks. This description does NOT contain any information about
the *contents* of the directory. It therefore does not change as files are
added to or removed from the directory, so the hash works just
fine. That is, the description holds the metadata for the directory, not the
In DCMS, directory contents are implicit. There is a list of pairs of the
where the workspace name is a path (including directories) relative to the
root directory in the workspace. When an entity is checked out, the
directory path is implicitly created. When all entities in a directory are
removed, the directory is presumed gone unless there is a directory entity
entry for it, which is the exceptional case.
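The implicit-directory rule can be sketched roughly as follows. I am
assuming here that a configuration is a list of (entity name, workspace
path) pairs; the function name `implied_directories` and the sample entity
names are made up for illustration.

```python
from pathlib import PurePosixPath

def implied_directories(config, explicit_dirs=()):
    """Return the set of directories a workspace needs for a configuration.

    config: iterable of (entity_name, workspace_path) pairs (assumed shape).
    explicit_dirs: paths that have their own directory entity entry.
    """
    dirs = set(explicit_dirs)
    for _entity, path in config:
        # Every ancestor of a checked-out file exists implicitly.
        for parent in PurePosixPath(path).parents:
            if str(parent) != ".":
                dirs.add(str(parent))
    return dirs

config = [("e1", "src/lib/foo.c"), ("e2", "src/main.c"), ("e3", "README")]
print(sorted(implied_directories(config)))       # ['src', 'src/lib']

# Removing the only entity under src/lib makes that directory go away ...
print(sorted(implied_directories(config[1:])))   # ['src']

# ... unless a directory entity entry keeps it (the exceptional case).
print(sorted(implied_directories(config[1:], ["src/lib"])))  # ['src', 'src/lib']
```

Nothing in the stored data changes when a file is added to or removed from
a directory; the directory set is recomputed from the paths.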
The meta-observation is that mostly the repository should not need to know
about the directories. The directories are merely a side effect of the
workspace layout, and storing them explicitly makes rename much harder to
> There may be two files in a tree with identical content but different
> histories. You need to distinguish "this version" from "this file".
DCMS has no versions of files. It only has versions of branches. A version
of a branch is a configuration plus a checkin message. The configuration is
a list of entities. If two entities in different branches chance to have
identical content they will indeed have identical names and you will store
them only once. This is a feature, not a bug.
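A toy illustration of that sharing, assuming an entity's name is simply the
SHA-1 of its content (the `EntityStore` class and the dictionary layout of a
branch version are hypothetical, not the DCMS storage format):

```python
import hashlib

class EntityStore:
    """Toy content-addressed store: an entity's name is the SHA-1 of its bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, content: bytes) -> str:
        name = hashlib.sha1(content).hexdigest()
        # Identical content maps to the same name, so it is stored only once.
        self._blobs.setdefault(name, content)
        return name

    def get(self, name: str) -> bytes:
        return self._blobs[name]

    def __len__(self) -> int:
        return len(self._blobs)

store = EntityStore()

# A branch version is a configuration (list of entity names) plus a message.
trunk_v1 = {"message": "initial import",
            "config": [store.put(b"int main(void) { return 0; }\n")]}
fix_v1 = {"message": "branch for bugfix",
          "config": [store.put(b"int main(void) { return 0; }\n")]}

print(trunk_v1["config"] == fix_v1["config"])  # True
print(len(store))                              # 1
```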
It follows from this design that there exists no unique identifier that
names "the set of all versions of foo.c". That is, the history of a file is
derived from the histories of the containing branches.
Every entity instance knows its predecessors, its content, and the name of
the project/branch/version in which it was first created.
SHA-1 has a perfectly well defined output for a zero-length file. I see no
reason why zero-length files are a special case. All zero-length files have
identical content. It hardly matters which copy of that content you get.
Likewise, all identical files have the same SHA-name, and it hardly matters
which one you get. I'm probably missing something obvious.
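For what it's worth, the zero-length case is easy to check with any SHA-1
implementation; here with Python's `hashlib`:

```python
import hashlib

# SHA-1 of the empty byte string: every zero-length file gets this name.
empty_name = hashlib.sha1(b"").hexdigest()
print(empty_name)  # da39a3ee5e6b4b0d3255bfef95601890afd80709
```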
Received on Sat Oct 21 14:36:05 2006