[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Proposal: Improved locking implementation for fsfs

From: Brian W. Fitzpatrick <fitz_at_collab.net>
Date: 2004-12-20 23:36:55 CET

The current tree-based lock-storage mechanism in fsfs doesn't account
for variations in underlying filesystems's filename encoding support.

Peter Lundblad and I have been discussing several different ways of
reimplementing the fsfs lock storage mechanism, and below is a
proposed implementation (and a variant of it) to address these
shortcomings, based on email, irc, and face-to-face conversations with
lundblad, mbk, ghudson, sussman, and kfogel. I'm hoping that we can
come to consensus on one of these implementations, and I'll start
writing it after the start of the New Year (unless, of course, someone
beats me to it :).

This proposal sticks with the current mechanism of using the
filesystem as a tree to store the locks themselves, and the
lock-tokens (which merely contain a pointer to the lock) are all
stored in one directory, keyed on their token/uuid.

The Proposal: All The World's A Hash.

Summary: Hash each path component of a lock path and store locks in
the filesystem under the hashed name.

Details:

The path to a lock is broken into its components, and each path
component is hashed using the first 16 characters from the MD5 hash of
the component name. We'll only use the first 16 characters to avoid
issues with systems whose PATH_MAX is only 256 bytes (16 char
component names + 1 char path separators gives us a maximum of 15 path
components on such systems. For example:

foo/bar/baz.c becomes

foo -> d3b07384d113edec
bar -> c157a79031e1c40f
baz.c -> 72f8c9bac968893a

Lock storage path:

repos/db/locks/d3b07384d113edec/c157a79031e1c40f/72f8c9bac968893a

Lockfiles can contain multiple lock representations, so collisions
will not be an issue (think of the end file as a hash bucket).

Variation A: The Hash/ASCII Experiment

Summary: Force each path component to some 127bit ASCII representation
and store locks the filesystem under a lowercase "ASCII-ized" name.
Use a hash for names that can't be efficiently "ASCII-ized".

Details:

The path to a lock is broken into its components, and each path
component is translated into its lowercase ASCII equivalent (Using the
4 character hex code) for non-ASCII characters. If the resulting
"ASCII-ized" string is both a)longer than 16 characters and b) greater
than 1.5x times the length of the original string, we fall back onto
the 16 byte MD5 hash as described in the main proposal.

Example 1:
  foo/BAR/baz.c becomes

  foo -> foo
  BAR -> bar
  baz.c -> baz.c

  Lock storage path:

  repos/db/locks/foo/bar/baz.c

Example 2:
  foo/bör/baz.c becomes

  foo -> foo
  bör -> bar
  baz.c -> baz.c

  Lock storage path:

  repos/db/locks/foo/b00F6r/baz.c

Example 3:
  /foo/bar/böööö.c becomes

  foo -> foo
  bar -> bar
  böööö.c -> b00f600f600f600f6.c, which is longer than 16 chars and
                                  1.5x the length of the original path
                                   component, so the hash is used instead:
             edcb45bd435bfc99

  Lock storage path:

  repos/db/locks/foo/bar/edcb45bd435bfc99

Thotz? Comments?

-Fitz

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Tue Dec 21 00:15:50 2004

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.