cmpilato_at_apache.org wrote:
> Author: cmpilato
> Date: Tue Jun 22 15:40:00 2010
> New Revision: 956921
>
> URL: http://svn.apache.org/viewvc?rev=956921&view=rev
> Log:
> Correct the FSFS structure documentation for lock storage, which
> doesn't appear to match the implementation.
>
> If a repository has a locked file /A/D/G/rho, there will be a
> serialized hash file for that path (as an MD5 digest,
> ".../db/2d9/2d9ce8aaac06331d75dae9dad43473bd", in this example), and
> that digest file will be directly referenced from the digest files for
> /A/D/G, /A/D, /A and /. The documentation implies that the digest
> file for /A/D/G/rho will only be referenced by a digest file for
> /A/D/G (which is then referenced by the digest file for /A/D, which
> itself is referenced by the digest for /A, etc.)

By the way, I think this was an accident in the implementation. A reading
of the code leads you to believe that the original intent was to essentially
mirror the FS path structure. I think a single mistake (the failure to
update a stringbuf_t with a new value on every iteration) resulted in the
behavior we have today.

I can't decide if this is a happy accident or a bug we should address. It
actually seems to make some of the common queries much faster than they
would otherwise be, but at the potential cost of increased disk usage and
memory consumption. I mean, in a ginormous repository with 10,000 locked
files, there's a serialized hash file (or maybe quite a few of them) with
thousands of entries in it. That makes finding those thousands of entries
really fast, but only after you've parsed the file and loaded that
thousands-of-entries-having hash into memory.
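
To make the difference concrete, here is a minimal Python sketch of the two
behaviors. It is not the FSFS C code: record_lock, build_index and the
update_child flag are hypothetical names, the index is an in-memory dict
rather than on-disk digest files, and it stores plain paths where the real
files would store MD5 digests. The 10 x 10 x 100 repository layout is made up
purely to get 10,000 locked files.

```python
def record_lock(index, path, update_child):
    """Register a locked path with every ancestor directory.

    index maps a directory path to the set of children that its
    (hypothetical) digest file would list.
    """
    parts = path.strip("/").split("/")
    child = path
    # Walk from the immediate parent up to the root "/".
    for depth in range(len(parts) - 1, -1, -1):
        parent = "/" + "/".join(parts[:depth])
        index.setdefault(parent, set()).add(child)
        if update_child:
            # Apparent original intent: each level references only the
            # level directly below it, mirroring the FS path structure.
            child = parent
        # Without this update, every ancestor references the locked
        # path itself, which is the behavior described in the log message.


def build_index(paths, update_child):
    index = {}
    for path in paths:
        record_lock(index, path, update_child)
    return index


# One locked file, as in the log message.
flat = build_index(["/A/D/G/rho"], update_child=False)
nested = build_index(["/A/D/G/rho"], update_child=True)

# A made-up repository with 10,000 locked files.
many = [
    f"/proj{p}/dir{d}/file{f}"
    for p in range(10)
    for d in range(10)
    for f in range(100)
]
flat_many = build_index(many, update_child=False)
nested_many = build_index(many, update_child=True)

# Number of entries the root directory's index would hold in each scheme.
print(len(flat_many["/"]), len(nested_many["/"]))  # prints "10000 10"
```

In the flat variant the root entry lists every locked path, so finding all
locks under / is a single lookup, but that one entry grows with the total
number of locks. In the nested variant each entry stays small, at the price
of walking one level per path component.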
--
C. Michael Pilato <cmpilato_at_collab.net>
CollabNet <> www.collab.net <> Distributed Development On Demand
Received on 2010-06-22 18:04:02 CEST