Hello Jan!
On Tuesday, 13 April 2010, Jan Horák wrote:
> On 12.4.2010 8:02, Philipp Marek wrote:
> > Sorry for the delay; but reading the thread "Severe performance issues
> > with large directories" I just remembered that the backend has a little
> > bit of a problem with big directories - storage overhead.
> >
> > Do you see any way to split directories into a series of blocks (as is
> > done for files), and, when only a few of the files change, to use
> > pointers to the unmodified blocks of the old directory?
> >
> > I'm not proposing a real delta design - that was too slow, IIRC.
> > Just re-use of directory blocks; that shouldn't bring any performance
> > issues.
> >
> >
> > Is there some way to do that? Perhaps multiple "." entries in a
> > directory, which just point to other parts?
> I'm not sure whether we're thinking of the same issue, but I was thinking
> about a kind of hash table. With a well-chosen table size it could bring
> good results, supposedly.
Sorry, I didn't make myself clear.
I didn't find the issue I'm talking about in the issue tracker; but the
problem is that the backends (FSFS, BDB) don't store directories deltified
(for performance reasons), so modifying an entry in or below a big
directory means re-writing the whole directory - and for big directories
that can be several megabytes.
So I'd suggest changing the directory storage:
* Either use a new table, with fields like parent, name (or path),
  valid-from-revision, and valid-before-revision, or something like that;
  then changing an entry means only updating valid-before of the old
  record and inserting a new one (see the first sketch below).
* Or, if you want to keep storing directories the same way as file data
  (as FSFS and BDB do now), I'd suggest limiting such blocks of directory
  data to a few KB, and defining an indirect block that tells which blocks
  are used (second sketch below).
  A new directory revision could then reference all the unchanged blocks
  of the older one.
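
To make the first variant a bit more concrete, here's a rough sketch in
Python (purely illustrative; the names and data structures are made up,
none of this is actual FSFS or BDB code):

    # One record per directory entry, with a validity range in revisions.
    # Changing an entry never rewrites the whole directory; it only closes
    # the old record and inserts a new one.

    STILL_VALID = 2**31          # stands for "no valid-before yet"

    class EntryRecord:
        def __init__(self, parent, name, target, valid_from):
            self.parent = parent              # directory the entry belongs to
            self.name = name                  # entry name within that directory
            self.target = target              # node the entry points to
            self.valid_from = valid_from      # first revision the record is valid in
            self.valid_before = STILL_VALID   # first revision it is no longer valid in

    table = []

    def change_entry(parent, name, new_target, revision):
        # Close the currently valid record for (parent, name), if any ...
        for rec in table:
            if (rec.parent == parent and rec.name == name
                    and rec.valid_before == STILL_VALID):
                rec.valid_before = revision
        # ... and insert the new one.  All other entries of the directory
        # stay untouched, however big the directory is.
        table.append(EntryRecord(parent, name, new_target, revision))

    def read_directory(parent, revision):
        # A directory listing at some revision is just a range query.
        return {rec.name: rec.target for rec in table
                if rec.parent == parent
                and rec.valid_from <= revision < rec.valid_before}

The point is that change_entry() touches exactly two records, so the cost
of a change no longer depends on how many entries the directory has.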
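
And the second, indirect-block variant, again only as a sketch (the block
size and the bookkeeping are arbitrary choices for the example):

    # Directory data is cut into blocks of limited size; an indirect block
    # is just the list of block ids.  A new directory revision re-uses every
    # block that didn't change and writes only the block(s) that did.

    import itertools

    ENTRIES_PER_BLOCK = 100          # stands in for the "few KB" size limit
    block_store = {}                 # block id -> list of (name, target) pairs
    block_ids = itertools.count()

    def store_block(chunk):
        block_id = next(block_ids)
        block_store[block_id] = chunk
        return block_id

    def write_directory(entries):
        # Initial write: split the sorted entry list into blocks and return
        # the indirect block (the list of block ids).
        items = sorted(entries.items())
        chunks = [items[i:i + ENTRIES_PER_BLOCK]
                  for i in range(0, len(items), ENTRIES_PER_BLOCK)]
        return [store_block(c) for c in chunks]

    def change_entry(indirect_block, name, new_target):
        # New revision of the directory: copy the indirect block, but
        # rewrite only the one block that contains the changed entry.
        new_indirect = []
        for block_id in indirect_block:
            chunk = block_store[block_id]
            if any(n == name for n, _ in chunk):
                chunk = [(n, new_target if n == name else t) for n, t in chunk]
                block_id = store_block(chunk)   # only this block is written anew
            new_indirect.append(block_id)       # the others are just referenced
        return new_indirect

Here a change writes one new data block plus a new (small) indirect block;
all the other blocks of the old revision are simply referenced again.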
I hope that this explains it a bit better.
Regards,
Phil