Philip Martin wrote:
> Branko Čibej <brane_at_xbc.nu> writes:
>> On 06.09.2010 12:16, Philip Martin wrote:
>>> To use a per-directory query strategy we would probably have to cache
>>> data in memory, although not to the same extent as in 1.6. We should
>>> probably avoid having Subversion make status callbacks into the
>>> application while a query is in progress, so we would accumulate all
>>> the row data and complete the query before making any callbacks. Some
>>> sort of private svn_wc__db_node_t to hold the results of the select
>>> would probably be sufficient.
>> I wonder if per-directory is really necessary; I guess I'm worrying
>> about the case where the WC tree has lots of directories with few files.
>> Do we not have the whole tree in a single SQLite DB now? Depending on
>> the schema, it might be possible to load the status information from the
>> database in one single query.
> Yes, per-tree would probably work but I expect most WCs have more
> files than directories so the gains over per-dir would be small. One
> big advantage of doing status per-tree is that it gives a proper
> snapshot, the tree cannot be modified during the status walk. I'm not
> pushing per-dir as the final solution, my point is that per-node
> SQLite queries are not going to be fast enough.
There are actually two or three reasons why status should
run its queries at directory granularity:
* directories usually resemble files in that opening them is
  expensive relative to reading their content
* the operation can be canceled in a timely manner (may or may
  not be an issue with huge SQL query results)
* maybe: queries for a specific folder may be simpler / faster
  than for sub-trees (depends on schema)
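To illustrate the granularity and cancellation points, here is a minimal
Python sketch, not Subversion code. The `nodes` table, its columns, and the
`is_cancelled` callback are hypothetical stand-ins; the real wc.db schema
differs. It runs one indexed query per directory and checks for
cancellation between directories, so at most one directory's rows are
held at a time.

```python
import sqlite3

def status_rows_per_directory(conn, dirs, is_cancelled):
    """Yield (parent, rows) one directory at a time.

    Cancellation is checked between directory queries, so a cancel
    request waits for at most one per-directory query to finish.
    """
    for parent in dirs:
        if is_cancelled():
            return
        rows = conn.execute(
            "SELECT name, checksum FROM nodes "
            "WHERE parent_relpath = ? ORDER BY name",
            (parent,),
        ).fetchall()
        yield parent, rows

# Hypothetical single-table schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (parent_relpath TEXT, name TEXT, checksum TEXT)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [("", "README", "c1"), ("", "src", None),
     ("src", "b.c", "c3"), ("src", "a.c", "c2")],
)
conn.execute("CREATE INDEX nodes_parent ON nodes (parent_relpath, name)")

# Simulate a cancel request arriving after the first directory is processed.
seen = []
for parent, rows in status_rows_per_directory(
        conn, ["", "src"], lambda: len(seen) >= 1):
    seen.append((parent, [r[0] for r in rows]))
```

With an index on (parent_relpath, name), each per-directory query is a
simple range scan that already returns rows sorted by name.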
Also, I don't think there is a need to cache query results.
Instead, the algorithm should be modified to look like this:
    // get all relevant info; each array sorted by name
    stat_recorded = sql_query("BASE + recorded change info of dir entries")
    stat_actual = read_dir()
    prop_changes = sql_query("find prop changes in dir")

    // "align" / "merge" arrays and send results to client
    foreach name do
        recorded = has(stat_recorded, name) ? stat_recorded[name] : NULL;
        actual = has(stat_actual, name) ? stat_actual[name] : NULL;
        changed_props = has(prop_changes, name) ? prop_changes[name] : NULL;

        // compare file content if necessary
        if (recorded && actual && needs_content_check(recorded, actual))
            actual = check_content(name)

        send_node_status(recorded, actual, changed_props)
Only two SQL queries (give or take) per directory.
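For concreteness, the align/merge step above could look roughly like the
following Python sketch. This is an illustration under assumptions, not
the actual implementation: it uses name-keyed dicts instead of sorted
arrays, returns a list instead of calling send_node_status(), and the
`mtime` field and both callbacks are hypothetical placeholders.

```python
def merge_directory_status(recorded, actual, prop_changes,
                           needs_content_check, check_content):
    """Align three name-keyed mappings; return one tuple per entry name.

    A missing side is reported as None, mirroring the NULLs in the
    pseudocode above.
    """
    results = []
    for name in sorted(set(recorded) | set(actual) | set(prop_changes)):
        rec = recorded.get(name)
        act = actual.get(name)
        props = prop_changes.get(name)
        # Compare file content only when both sides exist and the cheap
        # metadata comparison is inconclusive.
        if rec is not None and act is not None and needs_content_check(rec, act):
            act = check_content(name)
        results.append((name, rec, act, props))
    return results

# Made-up data: one entry on both sides, one missing on disk, one unversioned.
recorded = {"a.c": {"mtime": 1}, "gone.c": {"mtime": 1}}
actual = {"a.c": {"mtime": 2}, "new.c": {"mtime": 3}}
prop_changes = {"a.c": ["svn:eol-style"]}

statuses = merge_directory_status(
    recorded, actual, prop_changes,
    needs_content_check=lambda rec, act: rec["mtime"] != act["mtime"],
    check_content=lambda name: {"mtime": 2, "modified": True},
)
```

Here "gone.c" comes back with no actual entry and "new.c" with no
recorded entry, and only "a.c" triggers a content check.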
Received on 2010-09-08 12:25:52 CEST