On Thu, Jun 15, 2000 at 08:29:27AM -0400, Jonathan S. Shapiro wrote:
> > I think some planning is needed
> > now, e.g. to account for a distributed file id/version numbering
> > system. The hierarchical model requires either a) a way of doing
> > distributed unique version numbers, or b) associating multiple
> > version numbers with each file (e.g. the local version # vs. the
> > upstream version #).
> Yes it needs planning. No the problem isn't version numbering. It's unique
> entity names.
> I emphasize this because by focusing on numbering we are led astray from
> some other solutions, among them cryptographic hashes. The sequence numbers
> are purely a human convenience.
When I first saw you suggest cryptographic hashes I didn't like the
idea. Having thought about it some more, I now think it's an
excellent approach. BitKeeper tried to solve the same problem using
"keys" that were the concatenation of a bunch of stuff - we kept
finding clash conditions and having to add more data to the keys.
When I left, keys were ~80 bytes per delta, still growing, and still
not provably unique.
Anything involving random numbers is not a portable choice, because
very few OSes give you a true random number generator - and dinking
with the current time and the PID is not good enough. I did manage to
write code that produced a sort-of-random 64-bit number if you let it
run for an entire minute. Obviously no good for something that has to
happen on every checkin.
Crypto hashes pretty well squelch the problem, and aren't too
expensive to compute. If you do 'em right, they'll double as
data-integrity checksums for your storage format and/or wire transfer
protocol. There are some cases they don't handle: zero-length files,
objects which are not files (directories, symlinks), and deltas that
change only metadata. I'd be interested to know how you mean to
handle those.
There may be two files in a tree with identical content but different
histories. You need to distinguish "this version" from "this file".
The way BK did that was by having a virtual delta 0, which never
contained any text. Its key was the permanent unique identifier for
the file. (This of course forces you to confront the problem of
unique IDs for zero length files head-on.)
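To make the "this version" vs. "this file" split concrete, here is a
rough sketch in Python. None of it is BK or svn code: the names
object_id, verify and file_identity, the choice of SHA-256, and the
layout of the creation record are all assumptions made up for
illustration. Hashing a kind tag and a length along with the bytes is
one way to give directories, symlinks and empty files an ID; the same
ID doubles as an integrity check.

```python
import hashlib

def object_id(kind: str, payload: bytes) -> str:
    """Name an object by hashing its kind tag, length, and bytes.

    The kind tag (hypothetical: "file", "dir", "file-root", ...) keeps an
    empty file and an empty directory from sharing one ID.
    """
    header = f"{kind} {len(payload)}".encode("utf-8") + b"\0"
    return hashlib.sha256(header + payload).hexdigest()

def verify(kind: str, payload: bytes, expected_id: str) -> bool:
    """Recompute the ID; a mismatch means the stored or sent bytes changed."""
    return object_id(kind, payload) == expected_id

def file_identity(creation_record: bytes) -> str:
    """Permanent ID for a file, in the spirit of a text-free delta 0.

    It hashes a creation record instead of file content, so it is only
    as unique as that record is (an assumption, not a guarantee).
    """
    return object_id("file-root", creation_record)

# Two files with identical text share a version ID but not a file ID.
same_text = b"hello\n"
v1 = object_id("file", same_text)
v2 = object_id("file", same_text)
f1 = file_identity(b"alice@host1|src/a.txt|2000-06-15T08:29:27")
f2 = file_identity(b"bob@host2|doc/b.txt|2000-06-15T09:00:00")
```

Note that this only moves the hard part: the file ID is unique exactly
when the creation record is, which is the same trouble the
concatenated keys ran into. Two zero-length files still get the same
version ID; only the file ID tells them apart.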
> > If the svn client finds both deleted files and added
> > files, it could try to match them up to see if they're actually just
> > renamings.
> Deletes and adds are fairly easy to spot, and the tool definitely needs to
> be able to find them. Whether it should do so on commit is a question of how
> much handholding you want. I don't have an opinion.
> Locating an fs rename combined with some other change starts to get pretty
> damned hairy, though.
We used to punt to the user at that point; we had a Tk GUI thing that
displayed all the deletes and additions and asked the user to tell
it which were really renames. It would show you diffs and could pop
up the graphical file-history viewer, too.
That was more for ease in importing a series of tarballs than a normal
operation mode, though.
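For the easy case, a pure rename with no other change, matching
deletes against adds by content hash is enough. The sketch below is
Python and purely illustrative: match_renames and content_hash are
made-up names, the inputs are assumed to be plain dicts of path to
bytes, and SHA-256 is an arbitrary choice. It is not how svn or BK
does it.

```python
import hashlib
from collections import defaultdict

def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def match_renames(deleted, added):
    """Pair deleted paths with added paths that have identical content.

    deleted, added: dicts mapping path -> file bytes (assumed inputs).
    Returns (renames, unmatched_deleted, unmatched_added).
    """
    # Index the added files by the hash of their content.
    by_hash = defaultdict(list)
    for path in sorted(added):
        by_hash[content_hash(added[path])].append(path)

    renames = []
    unmatched_deleted = []
    for old in sorted(deleted):
        candidates = by_hash.get(content_hash(deleted[old]))
        if candidates:
            # Exact content match: treat as a rename and consume the add.
            renames.append((old, candidates.pop(0)))
        else:
            unmatched_deleted.append(old)

    # Whatever adds were not consumed are left for a human to sort out.
    unmatched_added = sorted(p for paths in by_hash.values() for p in paths)
    return renames, unmatched_deleted, unmatched_added

deleted = {"a.c": b"int x;\n", "b.c": b"old text\n"}
added = {"src/a.c": b"int x;\n", "c.c": b"new text\n"}
renames, left_deleted, left_added = match_renames(deleted, added)
```

A rename combined with an edit changes the hash, so it falls through
to the unmatched lists. That is the hairy case, and those leftovers
are what would get handed to the user.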
Received on Sat Oct 21 14:36:05 2006