2010/4/14 Philipp Marek <philipp.marek_at_emerion.com>:
> Hello Jan!
>
> On Tuesday, 13 April 2010, Jan Horák wrote:
>> On 12.4.2010 8:02, Philipp Marek wrote:
>> > Sorry for the delay; but reading the thread "Severe performance issues
>> > with large directories" I just remembered that the backend has a little
>> > bit of a problem with big directories - storage overhead.
>> >
>> > Do you see any way to split directories into a series of blocks (as is
>> > done for files), so that changing only a few entries can use pointers
>> > to the unmodified blocks of the old directory?
>> >
>> > I'm not proposing a real delta design - that was too slow, IIRC.
>> > Just re-use of directory blocks; that shouldn't cause any performance
>> > issues.
>> >
>> >
>> > Is there some way to do that? Perhaps multiple "." entries in a
>> > directory, which just point to other parts?
>> I'm not sure we're thinking of the same issue, but I was thinking about a
>> kind of hash table. With a well-chosen table size it could presumably give
>> good results.
> Sorry, I didn't make myself clear.
>
> I couldn't find the issue I'm talking about in the issue tracker; but the
> problem is that the backends (FSFS, BDB) don't store directories deltified
> (for performance reasons), so modifying an entry in or below a big
> directory requires rewriting the whole directory - which can mean several
> megabytes.
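> (A purely illustrative back-of-the-envelope figure: a directory of 100,000
> entries at roughly 50 bytes of serialized data per entry is about 5 MB that
> must be rewritten for every single change to any one entry.)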
>
>
> So I'd suggest changing the directory storage.
> * Either use a new table, with fields like parent, name (or path),
>   valid-from-revision and valid-before-revision; then changing an entry
>   means only updating the valid-before of the old record and inserting
>   a new one (see the first sketch after this list).
> * Or, if you want to store directories in the same way as file data (as
>   FSFS and BDB do now), I'd suggest limiting such blocks of directory data
>   to a few KB, and defining an indirect block that records which blocks
>   are used.
>   A new version of the directory could then reference all the unchanged
>   blocks of the older revision (see the second sketch after this list).
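>
> For concreteness, a minimal sketch of what the first variant might look
> like, assuming a generic SQL dialect; every table and column name here is
> hypothetical, not part of any existing Subversion schema:
>
>   -- One row per directory entry per lifetime span.
>   CREATE TABLE dirent (
>     parent_id    INTEGER NOT NULL, -- node id of the containing directory
>     name         TEXT    NOT NULL, -- entry name within that directory
>     child_id     INTEGER NOT NULL, -- node id the entry points to
>     valid_from   INTEGER NOT NULL, -- first revision the row is valid in
>     valid_before INTEGER           -- first revision it is invalid in,
>                                    -- NULL while still valid in HEAD
>   );
>
>   -- Changing one entry touches two rows instead of the whole directory:
>   UPDATE dirent SET valid_before = 1235
>    WHERE parent_id = 42 AND name = 'foo.c' AND valid_before IS NULL;
>   INSERT INTO dirent (parent_id, name, child_id, valid_from, valid_before)
>   VALUES (42, 'foo.c', 9876, 1235, NULL);
>
>   -- Reading the directory as it was in revision 1234:
>   SELECT name, child_id FROM dirent
>    WHERE parent_id = 42
>      AND valid_from <= 1234
>      AND (valid_before IS NULL OR valid_before > 1234);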
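>
> And a similarly hypothetical sketch of the second variant: small immutable
> blocks of serialized entries, plus an indirect table that lists which
> blocks make up a directory at a given revision (again, all names are made
> up for illustration):
>
>   -- Immutable blocks of serialized directory entries, each a few KB.
>   CREATE TABLE dir_block (
>     block_id INTEGER PRIMARY KEY,
>     data     BLOB NOT NULL        -- serialized entries, never modified
>   );
>
>   -- Ordered list of blocks forming one version of one directory.
>   CREATE TABLE dir_indirect (
>     dir_id   INTEGER NOT NULL,    -- node id of the directory
>     revision INTEGER NOT NULL,    -- revision this version was created in
>     ordinal  INTEGER NOT NULL,    -- position of the block in the listing
>     block_id INTEGER NOT NULL REFERENCES dir_block(block_id),
>     PRIMARY KEY (dir_id, revision, ordinal)
>   );
>
>   -- A change writes one new block plus a new indirect list that re-uses
>   -- all unchanged block_ids of the previous revision, so the rewrite
>   -- cost is bounded by the block size, not by the directory size.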
Or (3): go ahead and store megabytes for each directory, just like the
other backends. And leave the solution of this problem to a future
iteration of the SQL-based backend.
Really... optimizing before you even get started is not advisable. Get
something done. THEN examine and iterate. There could be numerous
other problems inherent in a SQL backend that would obviate any such
"solution" proposed today.
Also, the "SQL backend" concept has been started several times before,
and abandoned. I don't want to see it get abandoned AGAIN because the
initial "solutions" make it overly complicated before it can even
begin.
Cheers,
-g