
Re: Maintaining NodeID sanity

From: Kirby C. Bohling <kbohling_at_birddog.com>
Date: 2002-05-12 09:57:09 CEST

Karl Fogel wrote:
> Greg Stein <gstein@lyra.org> writes:

<snip>

>>Work out the math. If you manage to *sustain* one per second, you've got 136
>>years worth (unsigned 32-bit number). You'll definitely burst at times, but
>>a sustained rate of one per second, year after year, is absolutely amazing.
>>Running out is simply not going to happen. If there is any possible scenario
>>where we believe that 4 billion transactions will actually occur, then hell:
>>make it an apr_uint64_t.
>>
>
> This is not "amazing" at all. Not only humans use repositories; other
> programs will start using them too. And Moore's Law still held last
> time I checked. Why not make *sure* the problem is solved, once and
> for all? It's trivial for us to do -- I know, because I did it for
> txns already and it was no effort -- it doesn't affect performance in
> any meaningful way, and (for me at least, don't know about others)
> improves maintainability by not implying ordering when there is no
> ordering going on, and by making IDs a bit more readable.
>
> When the designers of IPv6 considered network address allocation
> (which was already supposed to be more than enough at 32 bits, years
> ago when IPv4 was designed), they jumped all the way to 128 bits,
> wisely skipping 64. They realized that the *rate* of consumption can
> increase unexpectedly -- since it already happened once, as everyone
> and their toaster started wanting an IP address. (Nevertheless,
> within our lifetimes I expect to see debates about how to handle the
> impending exhaustion of 128 bits.)
>

        128 bits is only limiting if there is a single global pool of
them. Subversion will have many repositories in the world, each with
its own ID space, so I think the IPv6 analogy is flawed.

        Personally, I'd make the ID a completely opaque type whose
representation depends on the backend used to store it. Define the
operations it must support explicitly, and carry the opaque type
around with you everywhere you go. Dispatch the operations through a
vtable the type points to internally (as is done with the fs/editor/wc
and everything else in Subversion). For BDB 4.0.14, a 48-bit number
will do the job (2^48 addressable bytes is 256TB; if the average node
takes 64K on disk, a 32-bit type would do fine). Anything larger is
completely unnecessary for BDB 4.0.14. If BDB 5.0 supports a bigger
size, write a new opaque type; that should take a couple hours of
work, a drop in the bucket compared to everything else that has to
happen to make that work.
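
        Here's a rough sketch of what I mean; every name below is
invented for illustration, not a real Subversion type:

typedef struct node_id_t node_id_t;

/* The operations each backend must implement for its IDs. */
typedef struct node_id_vtable_t
{
  int (*compare)(const node_id_t *a, const node_id_t *b);
  node_id_t *(*copy)(const node_id_t *id);

  /* Serialize to a known format (base36, say) for dumps and for
     converting between backends. */
  const char *(*to_string)(const node_id_t *id);
} node_id_vtable_t;

struct node_id_t
{
  const node_id_vtable_t *vtable;  /* dispatch, fs/editor/wc style */
  void *data;                      /* backend-private representation */
};

/* Callers never look inside an ID; they just dispatch. */
int node_id_compare(const node_id_t *a, const node_id_t *b)
{
  return a->vtable->compare(a, b);
}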

        Make the ability to convert to/from a known format, like a
base36 or base10 const char *, a requirement for converting from one
backend to another and for doing a dump. The only serious problem then
is that converting one repository to another might be slow because of
the required conversion.
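
        For instance, if a backend's native ID happens to fit in a
64-bit integer, the known-format conversion could be as simple as the
following (purely illustrative; the helper names are mine):

#include <string.h>

static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";

/* 13 base36 digits cover 2^64, so buf needs at least 14 bytes. */
static void id_to_base36(unsigned long long id, char *buf)
{
  char tmp[14];
  int i = 0, j = 0;

  do
    {
      tmp[i++] = digits[id % 36];
      id /= 36;
    }
  while (id > 0);

  while (i > 0)   /* digits emerged least-significant first */
    buf[j++] = tmp[--i];
  buf[j] = '\0';
}

static unsigned long long id_from_base36(const char *s)
{
  unsigned long long id = 0;

  for (; *s; s++)
    id = id * 36 + (unsigned long long)(strchr(digits, *s) - digits);
  return id;
}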

        That should be an uncommon operation, however, since related
repositories will likely use the same backend. You can also optimize
the common case significantly with a lookup table of hand-tuned a->b
conversions, and hard-code a->a to be a nop. Using a type that is
extremely efficient for the computer would probably make up for the
function call needed to realize it is a nop. That only becomes an
issue when you have cross-repository functionality, which I don't
think exists just yet (though I could be wrong).
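
        A sketch of that lookup table (the backend names and
signatures are invented for illustration):

typedef enum { BACKEND_BDB, BACKEND_FLATFILE, NUM_BACKENDS } backend_kind;

typedef void (*convert_fn)(const void *src_id, void *dst_id);

static void convert_identity(const void *src, void *dst)
{
  /* a->a: nothing to translate. */
}

static void convert_bdb_to_flatfile(const void *src, void *dst)
{
  /* hand-tuned translation goes here */
}

static void convert_flatfile_to_bdb(const void *src, void *dst)
{
  /* hand-tuned translation goes here */
}

/* conversions[from][to]; the diagonal is the nop. */
static const convert_fn conversions[NUM_BACKENDS][NUM_BACKENDS] = {
  { convert_identity,        convert_bdb_to_flatfile },
  { convert_flatfile_to_bdb, convert_identity        }
};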

        This lets each backend store the number efficiently, in the
most native and least limited format its storage mechanism allows.
Every backend you'll ever find will be written by somebody who thinks
like Greg Stein. Well, every backend worth using, that is *grin*.
        
        Using a type with no limitations on it makes the problem
unbounded. That really bothers database people, because unbounded key
sizes mean no performance guarantees, which leads to low sales
volumes, which means not eating. It is simply a fact that all
databases bound their sizes because of the speed it buys them, and
that speed is what makes them sell like hot cakes.

        I can make an argument that a 64-bit number is plenty. 64 bits
is so incredibly huge that you could check in the program counter
after every instruction on my machine for ~250 years and not run out
of revisions. So even when you can check in a revision in a single
CPU cycle, you have about 250 years before you're in serious trouble.
Besides, the largest database I know of (Oracle8i) only supports 2^80
rows, which is pretty stinking large, and Oracle is in the business of
solving problems for the largest companies and governments in the
world; they probably have a good handle on how large is necessary. If
you support 128 bits now, you'll be ahead of the single largest
database vendor on the planet. (Okay, IBM might be a larger vendor,
and DB2 might have a larger key.) Either way, 64 bits is enough, and
if it is truly bothersome to have it be limited, you can make the type
specific to the storage mechanism, which would obviously be good
enough.
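
        To sanity-check that arithmetic (the ~2.3 GHz clock rate is my
assumption for what "my machine" runs at):

#include <stdio.h>

int main(void)
{
  double revisions = 18446744073709551616.0;  /* 2^64 */
  double per_second = 2.3e9;     /* one per cycle at ~2.3 GHz (assumed) */
  double seconds_per_year = 365.25 * 24 * 3600;

  /* Prints roughly 254 years, which is where ~250 comes from. */
  printf("%.0f years\n", revisions / (per_second * seconds_per_year));
  return 0;
}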

                Kirby

        

> +1 on base36 with `const char *'; -1 on integers whatever the
> marshalling.
>
> I believe we've both presented our complete arguments for and against.
> If you have something new and constructive to add, please do! FUD
> like "way beyond insane" and "completely non-standard in all respects"
> doesn't count :-).
>
> -K
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org