Hello Jan!
On Tuesday, 13 April 2010, Jan Horák wrote:
> On 12.4.2010 8:02, Philipp Marek wrote:
> > Sorry for the delay; but reading the thread "Severe performance issues
> > with large directories" I just remembered that the backend has a little
> > bit of a problem with big directories - storage overhead.
> >
> > Do you see any way to split directories into a series of blocks (as is
> > done for files), and, when only a few of the files change, to use
> > pointers to the unmodified blocks of the old directory?
> >
> > I'm not proposing a real delta design - that was too slow, IIRC.
> > Just re-use of directory blocks; that shouldn't bring any performance
> > issues.
> >
> >
> > Is there some way to do that? Perhaps multiple "." entries in a
> > directory, which just point to other parts?
> I'm not sure whether we're thinking of the same issue, but I was thinking
> about a kind of hash table. With a well-chosen table size it could bring
> good results, supposedly.
Sorry, I didn't make myself clear.
I didn't find the issue I'm talking about in the issue tracker; but the
problem is that the backends (FSFS, BDB) don't store directories deltified
(for performance reasons), so modifying an entry in or below a big
directory means re-writing the whole directory - and for big directories
that can be several megabytes.
So I'd suggest changing the directory storage:
* Either use a new table, with fields like parent, name (or path),
  valid-from-revision, and valid-before-revision, or something like that;
  then changing an entry means only updating valid-before of the old
  record and inserting a new one (see the first sketch below).
* Or, if you want to keep storing directories the same way as file data
  (as FSFS and BDB do now), I'd suggest limiting such blocks of directory
  data to a few KB, and defining an indirect block that tells which blocks
  are used (second sketch below).
  A new directory revision could then reference all the unchanged blocks
  of the older one.
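
To make the first variant a bit more concrete, here's a rough sketch in
Python (purely illustrative; the names and data structures are made up,
none of this is actual FSFS or BDB code):

    # One record per directory entry, with a validity range in revisions.
    # Changing an entry never rewrites the whole directory; it only closes
    # the old record and inserts a new one.

    STILL_VALID = 2**31          # stands for "no valid-before yet"

    class EntryRecord:
        def __init__(self, parent, name, target, valid_from):
            self.parent = parent              # directory the entry belongs to
            self.name = name                  # entry name within that directory
            self.target = target              # node the entry points to
            self.valid_from = valid_from      # first revision the record is valid in
            self.valid_before = STILL_VALID   # first revision it is no longer valid in

    table = []

    def change_entry(parent, name, new_target, revision):
        # Close the currently valid record for (parent, name), if any ...
        for rec in table:
            if (rec.parent == parent and rec.name == name
                    and rec.valid_before == STILL_VALID):
                rec.valid_before = revision
        # ... and insert the new one.  All other entries of the directory
        # stay untouched, however big the directory is.
        table.append(EntryRecord(parent, name, new_target, revision))

    def read_directory(parent, revision):
        # A directory listing at some revision is just a range query.
        return {rec.name: rec.target for rec in table
                if rec.parent == parent
                and rec.valid_from <= revision < rec.valid_before}

The point is that change_entry() touches exactly two records, so the cost
of a change no longer depends on how many entries the directory has.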
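
And the second, indirect-block variant, again only as a sketch (the block
size and the bookkeeping are arbitrary choices for the example):

    # Directory data is cut into blocks of limited size; an indirect block
    # is just the list of block ids.  A new directory revision re-uses every
    # block that didn't change and writes only the block(s) that did.

    import itertools

    ENTRIES_PER_BLOCK = 100          # stands in for the "few KB" size limit
    block_store = {}                 # block id -> list of (name, target) pairs
    block_ids = itertools.count()

    def store_block(chunk):
        block_id = next(block_ids)
        block_store[block_id] = chunk
        return block_id

    def write_directory(entries):
        # Initial write: split the sorted entry list into blocks and return
        # the indirect block (the list of block ids).
        items = sorted(entries.items())
        chunks = [items[i:i + ENTRIES_PER_BLOCK]
                  for i in range(0, len(items), ENTRIES_PER_BLOCK)]
        return [store_block(c) for c in chunks]

    def change_entry(indirect_block, name, new_target):
        # New revision of the directory: copy the indirect block, but
        # rewrite only the one block that contains the changed entry.
        new_indirect = []
        for block_id in indirect_block:
            chunk = block_store[block_id]
            if any(n == name for n, _ in chunk):
                chunk = [(n, new_target if n == name else t) for n, t in chunk]
                block_id = store_block(chunk)   # only this block is written anew
            new_indirect.append(block_id)       # the others are just referenced
        return new_indirect

Here a change writes one new data block plus a new (small) indirect block;
all the other blocks of the old revision are simply referenced again.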
I hope that this explains it a bit better.
Regards,
Phil