C. Michael Pilato wrote:
> cmpilato_at_apache.org wrote:
>> Author: cmpilato
>> Date: Tue Jun 22 15:40:00 2010
>> New Revision: 956921
>>
>> URL: http://svn.apache.org/viewvc?rev=956921&view=rev
>> Log:
>> Correct the FSFS structure documentation for lock storage, which
>> doesn't appear to match the implementation.
>>
>> If a repository has a locked file /A/D/G/rho, there will be a
>> serialized hash file for that path (as an MD5 digest,
>> ".../db/2d9/2d9ce8aaac06331d75dae9dad43473bd", in this example), and
>> that digest file will be directly referenced from the digest files for
>> /A/D/G, /A/D, /A and /. The documentation implies that the digest
>> file for /A/D/G/rho will only be referenced by a digest file for
>> /A/D/G (which is then referenced by the digest file for /A/D, which
>> itself is referenced by the digest for /A, etc.)
>
> By the way, I think this was an accident in the implementation. A reading
> of the code leads you to believe that the original intent was to essentially
> mirror the FS path structure. I think a single mistake (the failure to
> update a stringbuf_t with a new value on every iteration) resulted in the
> behavior we have today.
>
> I can't decide if this is a happy accident or a bug we should address. It
> actually seems to make some of the common queries much faster than they
> would otherwise be, but at the potential cost of disk usage and memory
> consumption. I mean, in a ginormous repository with 10,000 locked files,
> there's a serialized hash file (or maybe right many of them) with thousands
> of entries in it. Makes finding those thousands of entries really fast,
> after you've parsed the file and loaded that thousands-of-entries-having
> hash into memory.
Ooh... ouch. Yes, this is a bug. The lock setting code goes about things
one way, and the lock deleting code expects another. I think it turns out
okay because the lock system gracefully ignores missing lock files, but I
further predict that it means that the serialized hash file for, say, the
root of the repository, will always contain pointers to serialized hash
files (which may or may not actually exist) for each and every file ever
locked inside that repository. For a repository of a million files, that
could eventually mean a million records -- perhaps all of which are dead --
in that file. Forever.
I'll dig around some to try to confirm this, and get an issue filed.
--
C. Michael Pilato <cmpilato_at_collab.net>
CollabNet <> www.collab.net <> Distributed Development On Demand
Received on 2010-06-22 18:40:38 CEST