
Re: Repeated SQL queries when doing 'svn st'

From: Stefan Fuhrmann <stefanfuhrmann_at_alice-dsl.de>
Date: Wed, 08 Sep 2010 12:25:08 +0200

Philip Martin wrote:
> Branko Čibej <brane_at_xbc.nu> writes:
>
>
>> On 06.09.2010 12:16, Philip Martin wrote:
>>
>>> To use a per-directory query strategy we would probably have to cache
>>> data in memory, although not to the same extent as in 1.6. We should
>>> probably avoid having Subversion make status callbacks into the
>>> application while a query is in progress, so we would accumulate all
>>> the row data and complete the query before making any callbacks. Some
>>> sort of private svn_wc__db_node_t to hold the results of the select
>>> would probably be sufficient.
>>>
>> I wonder if per-directory is really necessary; I guess I'm worrying
>> about the case where the WC tree has lots of directories with few files.
>> Do we not have the whole tree in a single SQLite DB now? Depending on
>> the schema, it might be possible to load the status information from the
>> database in one single query.
>>
>
> Yes, per-tree would probably work but I expect most WCs have more
> files than directories so the gains over per-dir would be small. One
> big advantage of doing status per-tree is that it gives a proper
> snapshot, the tree cannot be modified during the status walk. I'm not
> pushing per-dir as the final solution, my point is that per-node
> SQLite queries are not going to be fast enough.

There are actually two or three reasons why status should
run queries at directory granularity:

* directories usually resemble files in that opening them is
  expensive relative to reading their content
* the operation can be canceled in a timely manner (this may
  or may not be a problem with one huge SQL query result)
* maybe: queries for a specific folder may be simpler / faster
  than for sub-trees (depends on schema)
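To make the third point concrete: if the table stores each node's parent path, the per-directory query is a plain equality lookup on an index, while the sub-tree query needs a range trick on the path key. A minimal Python/sqlite3 sketch; the `nodes` table, its columns and both function names are assumptions for illustration, not the real wc.db schema.

```python
import sqlite3

# Hypothetical, heavily simplified node table; NOT the real wc.db schema.
SCHEMA = """
CREATE TABLE nodes (
    local_relpath  TEXT PRIMARY KEY,  -- path relative to the WC root
    parent_relpath TEXT,              -- NULL only for the root ('')
    kind           TEXT,              -- 'file' or 'dir'
    recorded_size  INTEGER
);
CREATE INDEX nodes_by_parent ON nodes (parent_relpath, local_relpath);
"""


def dir_children(conn: sqlite3.Connection, dir_relpath: str) -> list:
    """Immediate children of one directory: an equality lookup on the index."""
    return conn.execute(
        "SELECT local_relpath, kind, recorded_size FROM nodes"
        " WHERE parent_relpath = ? ORDER BY local_relpath",
        (dir_relpath,),
    ).fetchall()


def subtree(conn: sqlite3.Connection, dir_relpath: str) -> list:
    """Everything below a non-root directory, as a range scan on the key.
    '0' is the character right after '/', so the range between 'dir/' and
    'dir0' (both exclusive) covers exactly the paths starting with 'dir/'."""
    return conn.execute(
        "SELECT local_relpath, kind, recorded_size FROM nodes"
        " WHERE local_relpath > ? AND local_relpath < ? ORDER BY local_relpath",
        (dir_relpath + "/", dir_relpath + "0"),
    ).fetchall()
```

In this toy schema both are single index scans, so which one is faster really does depend on the schema and its indexes. The per-directory form is simply the easier one to get right: the sub-tree form must make sure that asking for `a` does not also pick up `a.txt`, which is why its range starts at `a/`.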

Also, I don't think there is a need to cache query results.
Instead, the algorithm should be modified to look like this:

dir_status:

    // get all relevant info; each array sorted by name
    stat_recorded = sql_query("BASE + recorded change info of dir entries");
    stat_actual = read_dir();
    prop_changes = sql_query("find prop changes in dir");

    // "align" / "merge" arrays and send results to client
    foreach name (in sorted order) found in any of the three arrays do
       recorded = has(stat_recorded, name) ? stat_recorded[name] : NULL;
       actual = has(stat_actual, name) ? stat_actual[name] : NULL;
       changed_props = has(prop_changes, name) ? prop_changes[name] : NULL;

       // compare file content if necessary
       if (recorded && actual && needs_content_check(recorded, actual))
          actual = check_content(name);

       send_node_status(recorded, actual, changed_props);

Only two SQL queries (give or take) per directory.
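Because all three arrays are sorted by name, the "align" / "merge" step is an ordinary merge walk over sorted sequences. A minimal Python sketch, assuming each input is a list of (name, info) pairs already sorted with the same ordering; `align_entries`, `dir_status` and the callbacks are hypothetical names, not Subversion APIs, and `send_node_status` is given the name as an extra first argument.

```python
from typing import Any, Callable, Iterator, List, Optional, Tuple

Entry = Tuple[str, Any]  # (name, info); each input list is sorted by name


def align_entries(recorded: List[Entry], actual: List[Entry],
                  prop_changes: List[Entry]
                  ) -> Iterator[Tuple[str, Optional[Any], Optional[Any], Optional[Any]]]:
    """Walk three name-sorted lists in lockstep (a merge join), yielding
    (name, recorded, actual, changed_props) once per distinct name.
    A side with no entry for that name yields None, like NULL above."""
    sources = (recorded, actual, prop_changes)
    pos = [0, 0, 0]
    while True:
        heads = [src[i][0] for src, i in zip(sources, pos) if i < len(src)]
        if not heads:
            return
        name = min(heads)  # smallest name not yet consumed on any side
        row = []
        for k, src in enumerate(sources):
            if pos[k] < len(src) and src[pos[k]][0] == name:
                row.append(src[pos[k]][1])
                pos[k] += 1
            else:
                row.append(None)
        yield (name, row[0], row[1], row[2])


def dir_status(recorded, actual, prop_changes,
               needs_content_check: Callable[[Any, Any], bool],
               check_content: Callable[[str], Any],
               send_node_status: Callable[..., None]) -> None:
    """Status for one directory, given its two query results and listing."""
    for name, rec, act, props in align_entries(recorded, actual, prop_changes):
        # Only read file content when the cheap comparison is inconclusive.
        if rec is not None and act is not None and needs_content_check(rec, act):
            act = check_content(name)
        send_node_status(name, rec, act, props)
```

The walk is linear in the number of entries and needs nothing beyond the three arrays the directory already produced, so no cache is involved. It does rely on every input using the same ordering: the directory listing has to be sorted explicitly, since readdir order is arbitrary, and it must match the SQL `ORDER BY` (SQLite's default BINARY collation compares bytes, which for UTF-8 agrees with Python's code-point order for `str`).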

-- Stefan^2.
Received on 2010-09-08 12:25:52 CEST

This is an archived mail posted to the Subversion Dev mailing list.