(see my other mail)
from 40 minutes to 3 seconds...
yay, \o/
On 06/25/12 14:57, Mark Phippard wrote:
> Could you try building trunk? I believe these issues have been resolved or at least improved. It would be good to see how much.
>
> Sent from my iPhone
>
> On Jun 25, 2012, at 8:36 AM, Attila Nagy <bra_at_fsn.hu> wrote:
>
>> Hi,
>>
>> (my humble attempts to make some improvement can be read at the end)
>>
>> I suffer from the slowness of svn rm since the upgrade to 1.7 from 1.6, but I couldn't find the time to profile it until now.
>> My setup is: FreeBSD 9-STABLE/amd64 with zfs, eight fast cores, SAN, 32 GiB of RAM.
>> Versions:
>> svn, version 1.7.5 (r1336830)
>> sqlite: 3.7.12.1 2012-05-22 02:45:53 6d326d44fd1d626aae0e8456e5fa2049f1ce0789
>>
>> The working copy currently consists of 1992210 entries (files and directories), the wc.db database is 1.5 GiB (which can easily fit into memory, and pre-heated, so sqlite won't have to read a bit from the disks).
>>
>> Deleting a single file and committing the change take 25.43 seconds combined:
>> # time svn rm test > /tmp/svn-rm
>> 4.825u 6.640s 0:11.47 99.9% 223+2544k 4+95io 0pf+0w
>> # time svn commit -m 't' test > /tmp/svn-commit
>> 5.380u 6.167s 0:13.96 82.6% 223+2544k 6+143io 0pf+0w
>>
>> I've modified subversion/libsvn_subr/sqlite.c in svn_sqlite__step() to measure time for each SQL operations and print it along with the original statement:
>> svn_error_t *
>> svn_sqlite__step(svn_boolean_t *got_row, svn_sqlite__stmt_t *stmt)
>> {
>> clock_t start = clock();
>> long i;
>> int sqlite_result = sqlite3_step(stmt->s3stmt);
>> printf ( "%f, %s\n", ( (double)clock() - start ) / CLOCKS_PER_SEC,sqlite3_sql(stmt->s3stmt) );
>>
>> Here are the numerically sorted top lines from svn-rm and svn-commit (I've included the last zero-time operations, so every more than zero lines can be observed):
>> rm:
>> 0.000000, SELECT root, uuid FROM repository WHERE id = ?1
>> 0.007812, DELETE FROM actual_node WHERE wc_id = ?1 AND (?2 = '' OR local_relpath = ?2 OR ((local_relpath) > (?2) || '/' AND (local_relpath) < (?2) || '0') ) AND (changelist IS NULL OR NOT EXISTS (SELECT 1 FROM nodes_current c WHERE c.wc_id = ?1 AND c.local_relpath = actual_node.local_relpath AND c.kind = 'file'))
>> 0.007812, UPDATE actual_node SET properties = NULL, text_mod = NULL, tree_conflict_data = NULL, conflict_old = NULL, conflict_new = NULL, conflict_working = NULL, prop_reject = NULL, older_checksum = NULL, left_checksum = NULL, right_checksum = NULL WHERE wc_id = ?1 AND (?2 = '' OR local_relpath = ?2 OR ((local_relpath) > (?2) || '/' AND (local_relpath) < (?2) || '0') )
>> 1.882812, DELETE FROM nodes WHERE wc_id = ?1 AND (?2 = '' OR local_relpath = ?2 OR ((local_relpath) > (?2) || '/' AND (local_relpath) < (?2) || '0') ) AND op_depth >= ?3
>> 9.539062, SELECT local_relpath FROM nodes WHERE wc_id = ?1 AND (?2 = '' OR local_relpath = ?2 OR ((local_relpath) > (?2) || '/' AND (local_relpath) < (?2) || '0') ) AND op_depth = 0 AND presence = 'absent' LIMIT 1
>> commit:
>> 0.000000, SELECT root, uuid FROM repository WHERE id = ?1
>> 1.171875, SELECT nodes.repos_id, nodes.repos_path, lock_token FROM nodes LEFT JOIN lock ON nodes.repos_id = lock.repos_id AND nodes.repos_path = lock.repos_relpath WHERE wc_id = ?1 AND op_depth = 0 AND (?2 = '' OR local_relpath = ?2 OR ((local_relpath) > (?2) || '/' AND (local_relpath) < (?2) || '0') )
>> 2.007812, DELETE FROM nodes WHERE wc_id = ?1 AND (?2 = '' OR local_relpath = ?2 OR ((local_relpath) > (?2) || '/' AND (local_relpath) < (?2) || '0') ) AND op_depth >= ?3
>> 8.320312, SELECT nodes.repos_id, nodes.repos_path, lock_token FROM nodes LEFT JOIN lock ON nodes.repos_id = lock.repos_id AND nodes.repos_path = lock.repos_relpath WHERE wc_id = ?1 AND op_depth = 0 AND (?2 = '' OR local_relpath = ?2 OR ((local_relpath) > (?2) || '/' AND (local_relpath) < (?2) || '0') )
>>
>> It can be seen that in case of delete, two statements, in case of commit, three statements take most of the time.
>>
>> For the curious, I've repeated the above test with 100 files to see how multiple files affect this (in my application I delete 1000s of files regularly in one svn rm "transaction", this often takes days, literally).
>>
>> So deleting 100 files takes 1353.219 seconds (so it has some benefits compared to deleting the files one by one):
>> # time svn rm test* > /tmp/svn-rm
>> 508.826u 692.233s 20:01.47 99.9% 223+2544k 7+8720io 0pf+0w
>> # time svn commit -m 't' > /tmp/svn-commit
>> 523.866u 660.986s 19:47.65 99.7% 223+2544k 6+6574io 0pf+0w
>>
>> and still printing statements, which has a bigger aggregated execution time than 0:
>> rm:
>> 0.000000, SELECT root, uuid FROM repository WHERE id = ?1
>> 0.007812, DELETE FROM wc_lock WHERE wc_id = ?1 AND (?2 = '' OR local_dir_relpath = ?2 OR ((local_dir_relpath) > (?2) || '/' AND (local_dir_relpath) < (?2) || '0') ) AND NOT EXISTS (SELECT 1 FROM nodes WHERE nodes.wc_id = ?1 AND nodes.local_relpath = wc_lock.local_dir_relpath)
>> 0.007812, SELECT 1 FROM nodes WHERE wc_id = ?1 AND local_relpath = ?2 LIMIT 1
>> 0.007812, SELECT id, work FROM work_queue ORDER BY id LIMIT 1
>> 0.007812, SELECT local_dir_relpath FROM wc_lock WHERE wc_id = ?1 AND local_dir_relpath LIKE ?2 ESCAPE '#'
>> 0.007812, SELECT prop_reject, changelist, conflict_old, conflict_new, conflict_working, tree_conflict_data, properties FROM actual_node WHERE wc_id = ?1 AND local_relpath = ?2
>> 0.015624, SELECT local_relpath, def_local_relpath FROM externals WHERE wc_id = ?1 AND (?2 = '' OR def_local_relpath = ?2 OR ((def_local_relpath) > (?2) || '/' AND (def_local_relpath) < (?2) || '0') )
>> 0.015624, SELECT op_depth, nodes.repos_id, nodes.repos_path, presence, kind, revision, checksum, translated_size, changed_revision, changed_date, changed_author, depth, symlink_target, last_mod_time, properties, lock_token, lock_owner, lock_comment, lock_date FROM nodes LEFT OUTER JOIN lock ON nodes.repos_id = lock.repos_id AND nodes.repos_path = lock.repos_relpath WHERE wc_id = ?1 AND local_relpath = ?2 ORDER BY op_depth DESC
>> 0.015624, SELECT presence, kind, def_local_relpath, repos_id, def_repos_relpath, def_operational_revision, def_revision, presence FROM externals WHERE wc_id = ?1 AND local_relpath = ?2 LIMIT 1
>> 0.023436, INSERT INTO delete_list(local_relpath) SELECT local_relpath FROM nodes n WHERE wc_id = ?1 AND (local_relpath = ?2 OR ((local_relpath) > (?2) || '/' AND (local_relpath) < (?2) || '0') ) AND op_depth >= ?3 AND presence NOT IN ('base-deleted', 'not-present', 'excluded', 'absent') AND op_depth = (SELECT MAX(op_depth) FROM nodes s WHERE s.wc_id = n.wc_id AND s.local_relpath = n.local_relpath)
>> 0.031248, SELECT local_dir_relpath, locked_levels FROM wc_lock WHERE wc_id = ?1 AND ((local_dir_relpath <= ?2 AND local_dir_relpath >= ?3) OR local_dir_relpath = '') ORDER BY local_dir_relpath DESC
>> 0.03906, INSERT INTO nodes ( wc_id, local_relpath, op_depth, parent_relpath, presence, kind) SELECT wc_id, local_relpath, ?4 , parent_relpath, 'base-deleted', kind FROM nodes WHERE wc_id = ?1 AND (local_relpath = ?2 OR ((local_relpath) > (?2) || '/' AND (local_relpath) < (?2) || '0') ) AND op_depth = ?3 AND presence NOT IN ('base-deleted', 'not-present', 'excluded', 'absent')
>> 0.03906, INSERT INTO wc_lock (wc_id, local_dir_relpath, locked_levels) VALUES (?1, ?2, ?3)
>> 0.046872, SELECT op_depth, repos_id, repos_path, presence, kind, revision, checksum, translated_size, changed_revision, changed_date, changed_author, depth, symlink_target, last_mod_time, properties FROM nodes WHERE wc_id = ?1 AND local_relpath = ?2 ORDER BY op_depth DESC
>> 0.062496, DELETE FROM wc_lock WHERE wc_id = ?1 AND local_dir_relpath = ?2
>> 0.398413, UPDATE actual_node SET properties = NULL, text_mod = NULL, tree_conflict_data = NULL, conflict_old = NULL, conflict_new = NULL, conflict_working = NULL, prop_reject = NULL, older_checksum = NULL, left_checksum = NULL, right_checksum = NULL WHERE wc_id = ?1 AND (?2 = '' OR local_relpath = ?2 OR ((local_relpath) > (?2) || '/' AND (local_relpath) < (?2) || '0') )
>> 1.031218, DELETE FROM actual_node WHERE wc_id = ?1 AND (?2 = '' OR local_relpath = ?2 OR ((local_relpath) > (?2) || '/' AND (local_relpath) < (?2) || '0') ) AND (changelist IS NULL OR NOT EXISTS (SELECT 1 FROM nodes_current c WHERE c.wc_id = ?1 AND c.local_relpath = actual_node.local_relpath AND c.kind = 'file'))
>> 205.515629, DELETE FROM nodes WHERE wc_id = ?1 AND (?2 = '' OR local_relpath = ?2 OR ((local_relpath) > (?2) || '/' AND (local_relpath) < (?2) || '0') ) AND op_depth >= ?3
>> 993.296875, SELECT local_relpath FROM nodes WHERE wc_id = ?1 AND (?2 = '' OR local_relpath = ?2 OR ((local_relpath) > (?2) || '/' AND (local_relpath) < (?2) || '0') ) AND op_depth = 0 AND presence = 'absent' LIMIT 1
>> commit:
>> 0.000000, SELECT root, uuid FROM repository WHERE id = ?1
>> 0.007812, DELETE FROM nodes WHERE wc_id = ?1 AND local_relpath = ?2 AND op_depth = (SELECT MIN(op_depth) FROM nodes WHERE wc_id = ?1 AND local_relpath = ?2 AND op_depth > 0) AND presence = 'base-deleted'
>> 0.007812, DELETE FROM wc_lock WHERE wc_id = ?1 AND local_dir_relpath = ?2
>> 0.015624, INSERT OR REPLACE INTO nodes ( wc_id, local_relpath, op_depth, parent_relpath, repos_id, repos_path, revision, presence, depth, kind, changed_revision, changed_date, changed_author, checksum, properties, translated_size, last_mod_time, dav_cache, symlink_target, file_external) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12, ?13, ?14, ?15, ?16, ?17, ?18, ?19, ?20)
>> 0.015624, SELECT prop_reject, changelist, conflict_old, conflict_new, conflict_working, tree_conflict_data, properties FROM actual_node WHERE wc_id = ?1 AND local_relpath = ?2
>> 0.015624, SELECT repos_id, repos_path, presence, kind, revision, checksum, translated_size, changed_revision, changed_date, changed_author, depth, symlink_target, last_mod_time, properties, file_external IS NOT NULL FROM nodes WHERE wc_id = ?1 AND local_relpath = ?2 AND op_depth = 0
>> 0.023436, SELECT op_depth, repos_id, repos_path, presence, kind, revision, checksum, translated_size, changed_revision, changed_date, changed_author, depth, symlink_target, last_mod_time, properties FROM nodes WHERE wc_id = ?1 AND local_relpath = ?2 ORDER BY op_depth DESC
>> 0.031248, SELECT op_depth, nodes.repos_id, nodes.repos_path, presence, kind, revision, checksum, translated_size, changed_revision, changed_date, changed_author, depth, symlink_target, last_mod_time, properties, lock_token, lock_owner, lock_comment, lock_date FROM nodes LEFT OUTER JOIN lock ON nodes.repos_id = lock.repos_id AND nodes.repos_path = lock.repos_relpath WHERE wc_id = ?1 AND local_relpath = ?2 ORDER BY op_depth DESC
>> 0.539034, DELETE FROM actual_node WHERE wc_id = ?1 AND (?2 = '' OR local_relpath = ?2 OR ((local_relpath) > (?2) || '/' AND (local_relpath) < (?2) || '0') )
>> 205.703121, DELETE FROM nodes WHERE wc_id = ?1 AND (?2 = '' OR local_relpath = ?2 OR ((local_relpath) > (?2) || '/' AND (local_relpath) < (?2) || '0') ) AND op_depth >= ?3
>> 978.265612, SELECT nodes.repos_id, nodes.repos_path, lock_token FROM nodes LEFT JOIN lock ON nodes.repos_id = lock.repos_id AND nodes.repos_path = lock.repos_relpath WHERE wc_id = ?1 AND op_depth = 0 AND (?2 = '' OR local_relpath = ?2 OR ((local_relpath) > (?2) || '/' AND (local_relpath) < (?2) || '0') )
>>
>> Looking at these, it seems some indexes are missing.
>> Particularly:
>> nodes has an index of: CREATE INDEX I_NODES_PARENT ON NODES (wc_id, parent_relpath, op_depth);
>> but the long running statement query the local_relpath instead of the parent one.
>> Placing the following index on it makes svn rm run in 3.43 seconds instead of 11.47 and commit in 4.29 seconds instead of 13.96 (deleting only one file):
>> sqlite> CREATE INDEX I_NODES_LOCAL ON NODES (wc_id, local_relpath, op_depth);
>> wc.db's size was increased to 1.6GiB (not much for the speedup).
>> There is another similar query, where presence is used, so I've placed another one:
>> sqlite> CREATE INDEX I_NODES_LOCAL_PRESENCE ON NODES (wc_id, local_relpath, op_depth, presence);
>> wc.db increased to 1.7 GiB
>>
>> With these, removing 100 files took 4:46.21 seconds (was 20:01.47) and committing 6:00.25 (was 19:47.65). Still sucks, but much better than before.
>> Any ideas about how could it be made even faster without ditching sqlite? Deleting 100 files and committing the change shouldn't take 40 (10, with the above indexes) minutes, even with 2 million files... I guess.
Received on 2012-06-25 16:36:16 CEST