
Re: Changes in SQL backend design

From: Jan Horák <horak.honza_at_gmail.com>
Date: Thu, 15 Apr 2010 23:58:50 +0200


On 14.4.2010 18:08, Greg Stein wrote:
> 2010/4/14 Philipp Marek<philipp.marek_at_emerion.com>:
>> Hello Jan!
>> On Tuesday, 13 April 2010, Jan Horák wrote:
>>> On 12.4.2010 8:02, Philipp Marek wrote:
>>>> Sorry for the delay; but reading the thread "Severe performance issues
>>>> with large directories" I just remembered that the backend has a little
>>>> bit of a problem with big directories - storage overhead.
>>>> Do you see any way to split directories into a series of blocks (like
>>>> files are done), and when changing only a few of the files using pointers
>>>> to the unmodified blocks of the old directory?
>>>> I don't propose a real delta design - that was too slow, IIRC.
>>>> Just re-use of directory blocks; that shouldn't bring any performance
>>>> issues.
>>>> Is there some way to do that? Perhaps multiple "." entries in a
>>>> directory, which just point to other parts?
>>> I'm not sure we are thinking of the same issue, but I was thinking
>>> about a kind of hash table. With a well-chosen table size it could
>>> presumably give good results.
>> Sorry, I didn't make myself clear.
>> I didn't find the issue I'm talking about in the issue tracker; but the
>> problem is that the backends (FSFS, BDB) don't store directories deltified
>> (for performance reasons), and so modifying an entry in or below a big
>> directory has to re-write the whole directory - and that means several
>> megabytes, for big directories.
>> So I'd suggest to change the directory storage.
>> * Either use a new table, with fields like parent, name (or path),
>> valid-from-revision, valid-before-revision or something like that;
>> then changing an entry means only updating valid-before of the
>> old record, and inserting a new one.
>> * Or, if you want to store directories in the same way as file data (like
>> now in FSFS and BDB), I'd suggest to limit such blocks of directory data
>> to a few KB, but to define an indirect-block that tells which blocks are
>> used.
>> A new entry could then reference all the unchanged blocks of the older
>> revision.
First, sorry for my delay in answering; I'm just too busy at the moment.

On the one hand, I like this solution; I find it clear and useful. On the
other hand, I agree with Greg, and I would be glad to see a working
prototype of the SQL backend become real in the coming weeks or months.
So I would rather not complicate the present design, and would keep this
idea for a future extension.
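
To make the first variant (the validity-range table) concrete, here is a
minimal sketch using Python's sqlite3 module. It is only an illustration
under assumptions: the table name, the column names and the helper
functions are hypothetical and are not part of any existing or planned
backend schema.

```python
import sqlite3

# Hypothetical schema; all table and column names are illustrative only.
SCHEMA = """
CREATE TABLE dir_entries (
    parent       TEXT    NOT NULL,  -- path of the containing directory
    name         TEXT    NOT NULL,  -- entry name inside that directory
    node_id      TEXT    NOT NULL,  -- whatever identifies the node content
    valid_from   INTEGER NOT NULL,  -- first revision that sees this row
    valid_before INTEGER            -- first revision that no longer sees it
                                    -- (NULL means "still current")
);
CREATE INDEX dir_entries_lookup ON dir_entries (parent, name, valid_from);
"""


def open_store():
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def set_entry(conn, parent, name, node_id, rev):
    """Make parent/name point at node_id starting with revision rev."""
    with conn:  # one transaction: close the old row, insert the new one
        conn.execute(
            "UPDATE dir_entries SET valid_before = ? "
            "WHERE parent = ? AND name = ? AND valid_before IS NULL",
            (rev, parent, name),
        )
        conn.execute(
            "INSERT INTO dir_entries "
            "(parent, name, node_id, valid_from, valid_before) "
            "VALUES (?, ?, ?, ?, NULL)",
            (parent, name, node_id, rev),
        )


def list_dir(conn, parent, rev):
    """Return {name: node_id} for the directory as of revision rev."""
    rows = conn.execute(
        "SELECT name, node_id FROM dir_entries "
        "WHERE parent = ? AND valid_from <= ? "
        "AND (valid_before IS NULL OR valid_before > ?) "
        "ORDER BY name",
        (parent, rev, rev),
    )
    return dict(rows.fetchall())
```

In this sketch, changing one entry touches two rows (one update, one
insert) no matter how many entries the directory has, which is the point
of the proposal; whether listing a big directory stays fast enough would
still have to be measured.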

That brings me to another point: I would like to start implementing a
prototype soon, so it would be great if a development branch could be
created for that purpose. Or is there anybody to ask for that?



> Or (3): go ahead and store megabytes for each directory, just like the
> other backends. And leave the solution of this problem to a future
> iteration of the SQL-based backend.
> Really... optimizing before you even get started is not advisable. Get
> something done. THEN examine and iterate. There could be numerous
> other problems inherent in a SQL backend that would obviate any such
> "solution" proposed today.
> Also, the "SQL backend" concept has been started several times before,
> and abandoned. I don't want to see it get abandoned AGAIN because the
> initial "solutions" make it overly complicated before it can even
> begin.
> Cheers,
> -g
Received on 2010-04-15 23:59:20 CEST

This is an archived mail posted to the Subversion Dev mailing list.