Re: Changes in SQL backend design

From: Jan Horák <horak.honza_at_gmail.com>
Date: Thu, 15 Apr 2010 23:58:50 +0200

Hi,

On 14.4.2010 18:08, Greg Stein wrote:
> 2010/4/14 Philipp Marek<philipp.marek_at_emerion.com>:
>
>> Hello Jan!
>>
>> On Tuesday, 13 April 2010, Jan Horák wrote:
>>
>>> On 12.4.2010 8:02, Philipp Marek wrote:
>>>
>>>> Sorry for the delay; but reading the thread "Severe performance issues
>>>> with large directories" I just remembered that the backend has a little
>>>> bit of a problem with big directories - storage overhead.
>>>>
>>>> Do you see any way to split directories into a series of blocks (like
>>>> files are done), and when changing only a few of the files using pointers
>>>> to the unmodified blocks of the old directory?
>>>>
>>>> I don't propose a real delta design - that was too slow, IIRC.
>>>> Just re-use of directory blocks; that shouldn't bring any performance
>>>> issues.
>>>>
>>>>
>>>> Is there some way to do that? Perhaps multiple "." entries in a
>>>> directory, which just point to other parts?
>>>>
>>> I'm not sure we are thinking of the same issue, but I was thinking about
>>> a kind of hash table. With a well-chosen table size it could presumably
>>> give good results.
>>>
>> Sorry, I didn't make myself clear.
>>
>> I didn't find the issue I'm talking about in the issue tracker; but the
>> problem is that the backends (FSFS, BDB) don't store directories deltified
>> (for performance reasons), and so modifying an entry in or below a big
>> directory has to re-write the whole directory - and that means several
>> megabytes, for big directories.
>>
>>
>> So I'd suggest changing the directory storage.
>> * Either use a new table, with fields like parent, name (or path),
>> valid-from-revision, valid-before-revision or something like that;
>> then changing an entry means only updating valid-before of the
>> old record, and inserting a new one.
>> * Or, if you want to store directories in the same way as file data (like
>> now in FSFS and BDB), I'd suggest to limit such blocks of directory data
>> to a few KB, but to define an indirect-block that tells which blocks are
>> used.
>> A new entry could then reference all the unchanged blocks of the older
>> revision.
>>
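A minimal sketch of the first option above (a separate table of directory entries with validity ranges), assuming SQLite as the store. The table name `dir_entries`, the column `node_id`, and the helpers `set_entry` and `list_dir` are hypothetical and chosen only for illustration; nothing here reflects an actual backend schema.

```python
import sqlite3

# Hypothetical schema: one row per version of a directory entry.
# valid_before stays NULL while the row is the current version.
SCHEMA = """
CREATE TABLE dir_entries (
    parent       TEXT    NOT NULL,
    name         TEXT    NOT NULL,
    node_id      TEXT    NOT NULL,
    valid_from   INTEGER NOT NULL,
    valid_before INTEGER
);
"""


def set_entry(db, parent, name, node_id, rev):
    """Make parent/name point at node_id starting with revision rev."""
    # Close the currently valid row for this entry, if there is one ...
    db.execute(
        "UPDATE dir_entries SET valid_before = ? "
        "WHERE parent = ? AND name = ? AND valid_before IS NULL",
        (rev, parent, name),
    )
    # ... and insert the new version. Other entries are not rewritten.
    db.execute(
        "INSERT INTO dir_entries "
        "(parent, name, node_id, valid_from, valid_before) "
        "VALUES (?, ?, ?, ?, NULL)",
        (parent, name, node_id, rev),
    )


def list_dir(db, parent, rev):
    """Return {name: node_id} for the directory as of revision rev."""
    rows = db.execute(
        "SELECT name, node_id FROM dir_entries "
        "WHERE parent = ? AND valid_from <= ? "
        "AND (valid_before IS NULL OR valid_before > ?)",
        (parent, rev, rev),
    )
    return dict(rows)


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
set_entry(db, "/trunk", "a.c", "node-1", 1)
set_entry(db, "/trunk", "b.c", "node-2", 1)
set_entry(db, "/trunk", "a.c", "node-3", 2)  # touches only the a.c rows
print(list_dir(db, "/trunk", 1))
print(list_dir(db, "/trunk", 2))
```

In this sketch, changing one entry costs one UPDATE and one INSERT regardless of directory size; deleting an entry would be the UPDATE alone. An index on (parent, name) would probably be needed for large directories, which is not shown here.
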
First, sorry for the delay in answering; I am just too busy at the moment.

On the one hand I like this solution; I find it clear and useful. On the
other hand I agree with Greg, and I would be glad if a working prototype
of the SQL backend became real in the coming weeks or months. So I would
rather not complicate the present design, and keep this idea for a
future extension.

That brings me to another point: I would like to start implementing a
prototype soon, so it would be great if a development branch could be
created for that purpose. Or is there somebody I should ask for that
directly?

Regards,

Jan

> Or (3): go ahead and store megabytes for each directory, just like the
> other backends. And leave the solution of this problem to a future
> iteration of the SQL-based backend.
>
> Really... optimizing before you even get started is not advisable. Get
> something done. THEN examine and iterate. There could be numerous
> other problems inherent in a SQL backend that would obviate any such
> "solution" proposed today.
>
> Also, the "SQL backend" concept has been started several times before,
> and abandoned. I don't want to see it get abandoned AGAIN because the
> initial "solutions" make it overly complicated before it can even
> begin.
>
> Cheers,
> -g
>
Received on 2010-04-15 23:59:20 CEST
