
RE: repository GUIDs

From: Bill Tutt <rassilon_at_lyra.org>
Date: 2002-12-12 08:03:36 CET

> From: Karl Fogel [mailto:kfogel@newton.ch.collab.net]
> "Bill Tutt" <rassilon@lyra.org> writes:
> > The new table is very simple. All it needs at the moment are two
> > columns:
> > RepositoryID and GUID. RepositoryID is just another one of our fun
> > monotonically increasing ID fields, and the GUID column is the
> > repository GUID. The reason for structuring the data this way is
> > that eventually we'll want to widen at least the NodeRevision primary
> > key by RepositoryID. We don't want to widen the NodeRevision PK by the
> > repository GUID, mainly because GUIDs cluster so poorly on indices.
> > No need to waste valuable page &/or index space.
> What's the purpose of RepositoryID (as separate from GUID)?

GUIDs are usually cryptographic-strength random data. Such binary blobs
don't generally index effectively: because new keys arrive in random
order, the normal B+-tree index experiences index fragmentation. Index
fragmentation means that fewer nodes will reside on a BDB index page,
and consequently each I/O will be less efficient.

Additionally, if we use an integer as opposed to a GUID in an expanded
NodeRevision primary key we can save substantial amounts of disk space.
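
As a rough back-of-the-envelope sketch of that saving: the snippet below
assumes a 16-byte GUID and a 4-byte integer RepositoryID. The integer
width and the row count are assumptions made for illustration, and
key_bytes_saved is a hypothetical helper, not anything in Subversion.

```python
def key_bytes_saved(rows: int, guid_bytes: int = 16, int_bytes: int = 4) -> int:
    """Bytes saved across `rows` keys by storing an integer ID instead of a GUID.

    Both widths are illustrative assumptions: 16 bytes is the size of a
    GUID, and 4 bytes is a guess at the width of RepositoryID.
    """
    return rows * (guid_bytes - int_bytes)


if __name__ == "__main__":
    # Hypothetical table with one million NodeRevision rows.
    print(key_bytes_saved(1_000_000), "bytes saved in the primary key alone")
```

Any secondary index that repeats the primary key would see the same
per-entry saving again.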

If GUIDs were still generated using the Ethernet MAC address approach,
you could work around this fragmentation slightly by reordering the bits
in the GUID so that the various time parts of the GUID appear in simple
increasing order.
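
A minimal sketch of that reordering, using Python's standard uuid module
as a stand-in for whatever GUID generator would actually be used. The
function name time_ordered_key and the exact output layout (timestamp,
then clock sequence, then node) are assumptions chosen for illustration.
It applies only to time + MAC (version 1) GUIDs, since random GUIDs
carry no timestamp to reorder.

```python
import uuid


def time_ordered_key(u: uuid.UUID) -> bytes:
    """Return a 16-byte key for a version-1 UUID with the timestamp leading.

    A version-1 UUID stores the low 32 bits of its timestamp first, so the
    raw bytes do not sort by creation time. Here the full 60-bit timestamp
    is written big-endian at the front, so byte-wise comparison of keys
    follows timestamp order.
    """
    if u.version != 1:
        raise ValueError("only version-1 (time + MAC) UUIDs carry a timestamp")
    timestamp = u.time.to_bytes(8, "big")    # 60-bit count of 100 ns ticks
    clock_seq = u.clock_seq.to_bytes(2, "big")  # 14-bit clock sequence
    node = u.node.to_bytes(6, "big")         # 48-bit node (MAC address)
    return timestamp + clock_seq + node


if __name__ == "__main__":
    first = uuid.uuid1()
    print(first, time_ordered_key(first).hex())
```

Keys built this way arrive in roughly increasing order, so inserts land
near the right-hand edge of the index rather than on random pages.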


Received on Thu Dec 12 08:05:28 2002

This is an archived mail posted to the Subversion Dev mailing list.