Re: Making fsfs generate unique transaction names

From: C. Michael Pilato <cmpilato_at_collab.net>
Date: 2007-06-18 20:29:07 CEST

Blair Zajac wrote:
>> We have no compelling reason to change BDB at all, so I'm -1 on doing so.
>>
>> As transaction names wind up being stored as part of the node-revision-id
>> triplet, I wouldn't want to start seeing nodes like
>> '4uv.3e.22006a41-a1fb-0310-bd7e-c51a55d5d677'. That BDB transaction IDs are
>> sequential base-36 numbers has, on more than one occasion, assisted me in
>> debugging a BDB-backed repository.
>
> How is having a short name any different than a long name in debugging?

The key here isn't so much the size of the names, but the fact that with a
simple UUID you lose sequence (which is really helpful). And if you have
sequence, you don't need anything nearly as obnoxious as a UUID.

> I've seen the long transaction names in the transaction fsfs files, but
> you're saying they also end up in revisions?

Now, remember, I'm only talking about BDB here. In BDB, node-revision-IDs
never change -- they begin life as NODE-ID.COPY-ID.TXN-ID (where TXN-ID is
the name of the transaction), and they stay that way even after the
transaction is promoted to revision-hood. I don't know if FSFS repositories
have that same property or not.
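To make the triplet layout concrete, here is a minimal sketch of splitting a node-revision-ID string into its three parts. The class and function names are hypothetical, chosen for illustration only; this is not Subversion's API, and it assumes only the dotted NODE-ID.COPY-ID.TXN-ID form described above.

```python
from typing import NamedTuple


class NodeRevId(NamedTuple):
    """Hypothetical container for the three dotted components."""
    node_id: str
    copy_id: str
    txn_id: str


def parse_node_rev_id(text: str) -> NodeRevId:
    """Split 'NODE-ID.COPY-ID.TXN-ID' into its components.

    Assumes none of the components contains a '.' itself.
    """
    parts = text.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a NODE-ID.COPY-ID.TXN-ID triplet: {text!r}")
    return NodeRevId(*parts)


# A short sequential txn name versus a UUID-style one in the same slot.
short_id = parse_node_rev_id("4uv.3e.2a")
long_id = parse_node_rev_id("4uv.3e.22006a41-a1fb-0310-bd7e-c51a55d5d677")
print(short_id.txn_id, len(long_id.txn_id))
```

Because the TXN-ID component stays in the ID after commit (in BDB), whatever naming scheme is chosen for transactions shows up in every such key.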

> If so, would you rather see uuids or the new transaction names I put in
> 1.5 "<hostname>-<pid>-<time>-<uniquifier>". Personally, I like seeing a
> hostname and pid in the transaction name for my project, it'll make it
> easier to debug my new RPC framework.

With all due respect to your new RPC framework, do we really want to replicate
redundant data like hostnames across millions of database keys simply
because it makes your error messages prettier? I could see it if we had a SQL
backend with shared storage across multiple hosts or something, but...

I think it's fine to target a sane level of re-use avoidance -- dump and
load don't happen often, and I don't really see the issue there anyway,
since not only is the transaction name reused, but all the transaction's
data has been dropped, too. What would be scary is if the txn name could be
recycled post-dump-and-load while there was still a transaction in the system
with that name.

As for what I'd like to see in 1.5, just teach FSFS to not re-use
transaction names and be done with it. My vote is to just keep using
monotonically increasing base-36 numbers the same way the BDB backend does,
but if the transaction names don't persist once the transaction is promoted
to a revision, I can't make myself care too much. But I question efforts to
encode unnecessary system information into long-lived, persistent revision
data such as would be the case in BDB, where node-revision-ids appear in at
least three different places already: used as keys to the 'nodes' table,
stored as predecessor-ids in the node-revision skels, stored as data in the
'changes' table.
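As a rough illustration of "monotonically increasing base-36 numbers", the sketch below derives the next transaction name from the current one. It is not the BDB backend's actual code; the helper names are hypothetical, and it assumes names are plain lowercase base-36 integers with no padding.

```python
import string

# Digits 0-9 followed by a-z give the 36 symbols of base 36.
DIGITS = string.digits + string.ascii_lowercase


def to_base36(n: int) -> str:
    """Render a non-negative integer as a lowercase base-36 string."""
    if n < 0:
        raise ValueError("transaction counters are non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        n, rem = divmod(n, 36)
        out.append(DIGITS[rem])
    return "".join(reversed(out))


def next_txn_name(current: str) -> str:
    """Return the name that follows `current` in the sequence."""
    return to_base36(int(current, 36) + 1)


name = "x"
for _ in range(4):
    name = next_txn_name(name)
    print(name)
```

Names produced this way stay short and can be ordered by their integer value (`int(name, 36)`), which is the "sequence" property a UUID gives up. Avoiding re-use then reduces to persisting the last counter value somewhere that survives transaction cleanup.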

-- 
C. Michael Pilato <cmpilato@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

Received on Mon Jun 18 20:29:08 2007

In reply to: Blair Zajac: "Re: Making fsfs generate unique transaction names"
Next in thread: Malcolm Rowe: "Re: Making fsfs generate unique transaction names"
Reply: Malcolm Rowe: "Re: Making fsfs generate unique transaction names"
Reply: Blair Zajac: "Re: Making fsfs generate unique transaction names"