> ... I will say that you seem to be using a stricter
> definition of 'text' than me. When I say 'text' I mean simply 7-bit
> no control characters. In particular, it would be perfectly fine to
> on Unix-style line endings...
This relates to somebody else's comments on possibly using MIME types, so
let me expand on my view of this in case it's helpful down the road.
I see a CM system as having a number of layers. At the bottom, it manages a
bunch of objects purely in terms of bits. Its job is to get a consistent
set of bit-bundles to the next layer up. At this layer, I feel there should
be no interpretation of content, and consequently no typing. This is why I
want a (conceptually) binary-only repository. My one exception to the "no
interpretation" is that the repository at this layer is free to use
compressed representations as a matter of internal storage convenience. Such
compression should not (in principle) be externally visible -- not least
because making it visible screws up delta handling.
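As a minimal sketch of that bottom layer (not DCMS code; the class name BitStore and its put/get methods are hypothetical, and zlib stands in for whatever compression a real repository might choose), a store can compress internally while callers only ever see the exact bytes they handed in:

```python
import zlib


class BitStore:
    """Hypothetical bottom layer: stores opaque bit-bundles, no typing."""

    def __init__(self):
        # Internal representation only; callers never see compressed bytes.
        self._blobs = {}

    def put(self, key: str, data: bytes) -> None:
        # Compression is a storage convenience, invisible from outside.
        self._blobs[key] = zlib.compress(data)

    def get(self, key: str) -> bytes:
        # Always hand back exactly the bits that were stored.
        return zlib.decompress(self._blobs[key])
```

Because get() returns the original bits, anything layered on top (deltas included) can work against uncompressed content and never needs to know how the store keeps it.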
At the next conceptual layer up there is some input and output filtering
that needs to happen. This handles things like newline canonicalization and
variable expansion. To my mind, the execution of such filters is the job of
the user agent. The repository server shouldn't have anything to do with it.
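To illustrate the filtering layer, here is a rough sketch of newline canonicalization done entirely on the client side. The function names to_repository and to_working_copy are made up for this example, and the choice of LF as the canonical form is an assumption, not something the repository would enforce:

```python
import os


def to_repository(data: bytes) -> bytes:
    """Input filter: canonicalize CRLF to LF before handing bits to the server."""
    return data.replace(b"\r\n", b"\n")


def to_working_copy(data: bytes, newline: bytes = os.linesep.encode()) -> bytes:
    """Output filter: expand canonical LF to the requested newline convention."""
    canonical = data.replace(b"\r\n", b"\n")
    return canonical.replace(b"\n", newline)
```

The server never runs either function; it only receives and returns whatever bytes the user agent produces.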
At the level of MIME types, we run into a couple of problems. First, there
isn't a good way to reliably extract a MIME type from the content or fsName of an
object. We can use heuristics, but ultimately this is the kind of thing that
wants to be stored as a per-object attribute somewhere. I think the idea of
being able to record MIME types and use them to drive things like
visualizers is cool, but I think it's none of the repository's business.
This attribution is purely a convention between the user agents. The
repository's role is merely to store the attribute(s) along with the object.
After a while, I concluded that the "type" of an object at the repository
level consists of the set of trigger behaviors that the object exhibits in
response to configuration actions (e.g. checkout, commit). This is different
from the type of the content, though the two will in practice be closely
related. The previously mentioned filters are an example of such triggers.
But consider this (real) example: I have one source tree that contains an
autorun.inf file for a CD. This file is (to all appearances) a text file,
but it must be preserved in MS newline format, even when used on a UNIX
system. This doesn't change its MIME type at all.
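A small sketch may make the distinction concrete. Everything here is hypothetical (the attribute keys, the trigger names, and the checkout function are invented for illustration and are not DCMS interfaces): two objects carry the same MIME type attribute but different checkout triggers.

```python
def to_lf(data: bytes) -> bytes:
    """Trigger: present the content with UNIX (LF) newlines."""
    return data.replace(b"\r\n", b"\n")


def to_crlf(data: bytes) -> bytes:
    """Trigger: present the content with MS (CRLF) newlines."""
    return to_lf(data).replace(b"\n", b"\r\n")


# Hypothetical registry of trigger behaviors known to the user agent.
TRIGGERS = {"newline-lf": to_lf, "newline-crlf": to_crlf}

# Hypothetical per-object attributes; the repository only stores these.
attributes = {
    "README": {"mime-type": "text/plain", "on-checkout": ["newline-lf"]},
    "autorun.inf": {"mime-type": "text/plain", "on-checkout": ["newline-crlf"]},
}


def checkout(name: str, stored: bytes) -> bytes:
    """Run the object's checkout triggers, in order, over the stored bits."""
    data = stored
    for trigger in attributes[name]["on-checkout"]:
        data = TRIGGERS[trigger](data)
    return data
```

Here both files are text/plain, yet autorun.inf comes out with CRLF line endings on every platform because its trigger set, not its content type, says so.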
So I basically think that the notion of "type" is quite complicated. One
goal for DCMS is to have a platform-independent scripting language that lets
triggers be written which behave the same on all platforms.
Sorry if this seems slightly disjointed -- my attention is divided at the
> I'm glad to hear you agree with me that strong checksums are necessary.
> settle for CRC or adler32, but SHA sure would be nice.
I'm not using checksums. I'm using cryptographic hashes. The difference is
significant. CRCs and adler32 checksums provide a good means of integrity checking,
but they collide too frequently to serve as *names*. In the DCMS repository,
the "repository name" of an object *is* its SHA-1 hash. One effect of this
is to greatly simplify transaction and replication handling.
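As a rough illustration of why hash naming helps (a sketch only; ContentStore and replicate are hypothetical names, and this is an in-memory toy rather than the DCMS design), storing an object under its SHA-1 makes writes idempotent and reduces replication to copying the names the other side lacks:

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: an object's name is its SHA-1 hash."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        name = hashlib.sha1(data).hexdigest()
        # Same content always yields the same name, so a repeated put is a no-op.
        self._objects.setdefault(name, data)
        return name

    def get(self, name: str) -> bytes:
        return self._objects[name]

    def names(self) -> set:
        return set(self._objects)


def replicate(src: ContentStore, dst: ContentStore) -> set:
    """Copy every object dst is missing; returns the set of names copied."""
    missing = src.names() - dst.names()
    for name in missing:
        dst.put(src.get(name))
    return missing
```

Since a name can never refer to two different contents (barring a hash collision), there is nothing to merge or lock: an object is either present under its name or it is not.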
> Incidentally, where can I find more information about your DCMS?
There is a mailing list: email@example.com. At the moment there is no
design document, though I am in the process of drafting one. Version 0.1 is
still in progress.
Received on Sat Oct 21 14:36:05 2006