
Re: Making fsfs generate unique transaction names

From: Blair Zajac <blair_at_orcaware.com>
Date: 2007-06-18 21:31:50 CEST

C. Michael Pilato wrote:
> Blair Zajac wrote:
>>> We have no compelling reason to change BDB at all, so I'm -1 on doing so.
>>>
>>> As transaction names wind up being stored as part of the node-revision-id
>>> triplet, I wouldn't want to start seeing nodes like
>>> '4uv.3e.22006a41-a1fb-0310-bd7e-c51a55d5d677'. That BDB transaction IDs are
>>> sequential base-36 numbers has, on more than one occasion, assisted me in
>>> debugging a BDB-backed repository.
>> How is having a short name any different than a long name in debugging?
>
> The key here isn't so much the size of the names, but the fact that with a
> simple UUID you lose sequence (which is really helpful). And if you have
> sequence, you don't need anything nearly as obnoxious as a UUID.
>
>> I've seen the long transaction names in the transaction fsfs files, but
>> you're saying they also end up in revisions?
>
> Now, remember, I'm only talking about BDB here. In BDB, node-revision-IDs
> never change -- they begin life as NODE-ID.COPY-ID.TXN-ID (where TXN-ID is
> the name of the transaction), and they stay that way even after the
> transaction is promoted to revision-hood. I don't know if FSFS repositories
> have that same property or not.

Looks like the transaction name does end up in the final revision file:

id: 0.0.r3/765
type: dir
pred: 0.0.r2/739
count: 3
text: 3 670 82 82 1ae6cb7ff646ca32938af3d180baccae
cpath: /
copyroot: 0 /

_0.0.tmy.hostname.com-26378-1182190468960383-1 add false true /2

765 891

>> If so, would you rather see uuids or the new transaction names I put in
>> 1.5 "<hostname>-<pid>-<time>-<uniquifier>". Personally, I like seeing a
>> hostname and pid in the transaction name for my project, it'll make it
>> easier to debug my new RPC framework.
>
> With all due respect to your new RPC network, do we really want to replicate
> redundant data like hostnames across millions of database keys simply
> because it makes your error messages prettier? I could see it if we had a SQL
> backend with shared storage across multiple hosts or something, but...

That's why I asked if transaction names end up in the final revision. I'm not
concerned about that info ending up in the revision.
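For reference, here is a minimal Python sketch of how a name in the "<hostname>-<pid>-<time>-<uniquifier>" form quoted above could be assembled. It is not the FSFS code. The function name make_txn_name, the per-process counter, and the reading of the time field as microseconds since the epoch (guessed from the sample name in the revision file above) are all assumptions made for illustration.

```python
import itertools
import os
import socket
import time

# Hypothetical per-process counter used as the uniquifier field.
_uniquifier = itertools.count(1)


def make_txn_name(hostname=None, pid=None, now_usec=None):
    """Build a name shaped like <hostname>-<pid>-<time>-<uniquifier>.

    All three arguments default to values taken from the running
    process. The time field is assumed to be microseconds since the
    epoch; the original message does not state the unit.
    """
    if hostname is None:
        hostname = socket.gethostname()
    if pid is None:
        pid = os.getpid()
    if now_usec is None:
        now_usec = int(time.time() * 1_000_000)
    return f"{hostname}-{pid}-{now_usec}-{next(_uniquifier)}"
```

Calling make_txn_name("my.hostname.com", 26378, 1182190468960383) gives a name with the same shape as the one in the revision file above; the last field depends on how many names the process has already handed out.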

> I think it's fine to target a sane level of re-use avoidance -- dump and
> load don't happen often, and I don't really see the issue there anyway,
> since not only is the transaction name reused, but all the transaction's
> data has been dropped, too. What would be scary is if the txn name could be
> recycled post-dump-and-load but there still be a transaction in the system
> with that name.

Agreed.

> As for what I'd like to see in 1.5, just teach FSFS to not re-use
> transaction names and be done with it. My vote is to just keep using
> monotonically increasing base-36 numbers the same way the BDB backend does,
> but if the transaction names don't persist once the transaction is promoted
> to a revision, I can't make myself care too much. But I question efforts to
> encode unnecessary system information into long-lived, persistent revision
> data such as would be the case in BDB, where node-revision-ids appear in at
> least three different places already: used as keys to the 'nodes' table,
> stored as predecessor-ids in the node-revision skels, stored as data in the
> 'changes' table.

Well, there was pushback on using the new sqlite backend as a place to store the
base-36 number, similar to how BDB does it, hence the current transaction name.
Otherwise, where do you suggest keeping the counter for transaction names?

So the choices are:

1) uuid
    - non-sequential
    - long length
2) use the currently coded hostname-pid-time
    + sequential
    - long length
3) use base-36 digits and some way of atomically incrementing it
    + sequential
    + short
    - need to write code to atomically increment the counter
      a) use sqlite to store the current value
      b) use yet another file that is atomically updated and write the code that
         does this

I prefer 2) to 1), and would prefer 3a) to all the rest, if people don't mind
putting this in sqlite.
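To make option 3b) concrete, here is a rough Python sketch of a base-36 counter kept in a separate file and incremented under an exclusive lock. It is only an illustration under stated assumptions, not a proposal for the actual FSFS code: the names to_base36 and next_txn_name and the counter file layout are hypothetical, the locking uses fcntl.flock and so is POSIX-only, and the in-place rewrite is not crash-safe (a real implementation would more likely write a temporary file and rename it).

```python
import fcntl  # POSIX-only advisory locking; an assumption of this sketch
import os

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n):
    """Encode a non-negative integer as a lowercase base-36 string."""
    if n < 0:
        raise ValueError("counter must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        n, rem = divmod(n, 36)
        out.append(DIGITS[rem])
    return "".join(reversed(out))


def next_txn_name(counter_path):
    """Return the current counter value in base 36 and store value + 1.

    The read-increment-write cycle runs while holding an exclusive
    lock on the counter file, so concurrent callers get distinct names.
    """
    fd = os.open(counter_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        raw = os.read(fd, 64).decode("ascii").strip()
        current = int(raw, 36) if raw else 0
        # Overwrite the file with the next value before releasing the lock.
        os.lseek(fd, 0, os.SEEK_SET)
        os.ftruncate(fd, 0)
        os.write(fd, to_base36(current + 1).encode("ascii"))
        os.fsync(fd)
        return to_base36(current)
    finally:
        os.close(fd)  # closing the descriptor also releases the lock
```

Successive calls on a fresh counter file yield "0", "1", "2", and so on, which keeps the names short and sequential at the cost of one more file to maintain.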

Blair

Received on Mon Jun 18 21:31:59 2007

This is an archived mail posted to the Subversion Dev mailing list.
