[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Is sqlite fast enough?

From: Philip Martin <philip.martin_at_wandisco.com>
Date: Thu, 18 Feb 2010 18:21:55 +0000

How fast are sqlite queries be compared to the old-wc entry caching?
Operations like update benefit from faster locking, but in old-wc
status does no locking at all. Also the multiple entries files give
the data a tree-like organisation that is not modelled quite so well
by an SQL table. Is sqlite going to be at least as fast as old-wc at
running status when the OS cache is hot? I've been vaguely assumming
that it would be OK but this week I felt it was time to investigate.
What would we do in 3 months time if wc-ng turns out to be an order of
magnitude slower than old-wc?

I need a big working copy to make status take a significant amount of
time, so I used a shell script[1] to create about 100,000 files in
about 900 directories. Then I wrote a python script[2] to create and
populate a single sqlite database with rows representing a similar set
of files and directories. Finally I wrote a simple C client[3] to
query the sqlite database in manner that vaguely approximates status.

Initially result were not encouraging. On my machine 'svn status'
using 1.6 takes about 1.1 seconds and the sqlite client took nearly 60
seconds to query the database, and most of that was in sqlite3_step.
That sort of slowdown would be disastrous. After some thought I
realised I needed another index:

 create unique index i_local on base_node (wc_id, local_relpath)

With that added the sqlite client completes the queries in about 0.35
of a second, comfortably faster than the 1.6 status. Please note that
the sqlite client is only reading the base_node table, it's not doing
stat() on any working files or reading working_node or actual_node.
Also the base_node table in my test has fewer columns than wc-ng will
use. As a final test I patched[4] my 1.6 build to remove most of the
querying of the working files. With this change 'svn status' goes
down to about 0.55 of a second, still slower than the sqlite client.

My major concern that sqlite might be an order of magnitude slower
than old-wc is gone. Obviously the locking operations are likely to
be faster and the read-only operations look good too, although the
margin is not that great.

[1] script to create a large working copy

#!/bin/sh

svnadmin=svnadmin
svn=svn
url=file://`pwd`/repo

rm -rf repo wc empty
$svnadmin create repo
$svn mkdir -mm $url/trunk
touch empty
for i in `seq 0 99` ; do $svn import -mm empty $url/trunk/foo$i; done
rm empty
for i in `seq 0 29` ; do $svn cp -mm -r101 $url/trunk $url/trunk/zag$i; done
for i in `seq 0 29` ; do $svn cp -mm -r131 $url/trunk $url/trunk/zig$i; done
$svn co $url/trunk wc

[2] script to create a sqlite database

#!/usr/bin/python

import os, sqlite3

try: os.remove('wcx.db')
except: pass

c = sqlite3.connect('wcx.db')
c.execute("""create table repository (
               id integer primary key autoincrement,
               root text unique not null,
               uuid text not null)""")
c.execute("""create index i_uuid on repository (uuid)""")
c.execute("""create index i_root on repository (root)""")
c.execute("""create table wcroot (
               id integer primary key autoincrement,
               local_abspath text unique)""")
c.execute("""create unique index i_local_abspath on wcroot (local_abspath)""")
c.execute("""create table base_node (
               wc_id integer not null references wcroot (id),
               local_relpath text not null,
               repos_id integer references repository (id),
               repos_relpath text,
               parent_relpath text,
               presence text not null,
               kind text not null)""")
c.execute("""create unique index i_local on base_node (wc_id, local_relpath)""")
c.execute("""create index i_parent on base_node (wc_id, parent_relpath)""")
c.execute("""insert into repository (root, uuid) values (
               "http://example.com/repo",
               "f738be9e-409d-481f-b246-1fb6a969aba2")""")
c.execute("""insert into wcroot(local_abspath) values ("/wc")""")

c.execute("""insert into base_node (
               wc_id,
               local_relpath,
               repos_id,
               repos_relpath,
               parent_relpath,
               presence,
               kind)
             values (
               1,
               "",
               1,
               "trunk",
               NULL,
               "normal",
               "dir")""")

for i in range(100):
    c.execute("""insert into base_node (
                   wc_id,
                   local_relpath,
                   repos_id,
                   repos_relpath,
                   parent_relpath,
                   presence,
                   kind)
                 values (
                   1,
                   "foo"""+str(i)+"""",
                   1,
                   "trunk/foo"""+str(i)+"""",
                   "",
                   "normal",
                   "file")""")
    if i >= 30:
        continue;
    c.execute("""insert into base_node (
                   wc_id,
                   local_relpath,
                   repos_id,
                   repos_relpath,
                   parent_relpath,
                   presence,
                   kind)
                 values (
                   1,
                   "zag"""+str(i)+"""",
                   1,
                   "trunk/zag"""+str(i)+"""",
                   "",
                   "normal",
                   "dir")""")
    c.execute("""insert into base_node (
                   wc_id,
                   local_relpath,
                   repos_id,
                   repos_relpath,
                   parent_relpath,
                   presence,
                   kind)
                 values (
                   1,
                   "zig"""+str(i)+"""",
                   1,
                   "trunk/zig"""+str(i)+"""",
                   "",
                   "normal",
                   "dir")""")

    for j in range(100):
        c.execute("""insert into base_node (
                       wc_id,
                       local_relpath,
                       repos_id,
                       repos_relpath,
                       parent_relpath,
                       presence,
                       kind)
                     values (
                       1,
                       "zag"""+str(i)+"/foo"+str(j)+"""",
                       1,
                       "trunk/zag"""+str(i)+"/foo"+str(j)+"""",
                       "zag"""+str(i)+"""",
                       "normal",
                       "file")""")
        c.execute("""insert into base_node (
                       wc_id,
                       local_relpath,
                       repos_id,
                       repos_relpath,
                       parent_relpath,
                       presence,
                       kind)
                     values (
                       1,
                       "zig"""+str(i)+"/foo"+str(j)+"""",
                       1,
                       "trunk/zig"""+str(i)+"/foo"+str(j)+"""",
                       "zig"""+str(i)+"""",
                       "normal",
                       "file")""")
        if j >= 30:
            continue
        c.execute("""insert into base_node (
                       wc_id,
                       local_relpath,
                       repos_id,
                       repos_relpath,
                       parent_relpath,
                       presence,
                       kind)
                     values (
                       1,
                       "zig"""+str(i)+"/zag"+str(j)+"""",
                       1,
                       "trunk/zig"""+str(i)+"/zag"+str(j)+"""",
                       "zig"""+str(i)+"""",
                       "normal",
                       "dir")""")
        for k in range(100):
            c.execute("""insert into base_node (
                           wc_id,
                           local_relpath,
                           repos_id,
                           repos_relpath,
                           parent_relpath,
                           presence,
                           kind)
                         values (
                           1,
                           "zig"""+str(i)+"/zag"+str(j)+"/foo"+str(k)+"""",
                           1,
                           "trunk/zig"""+str(i)+"/zag"+str(j)+"/foo"+str(k)+"""",
                           "zig"""+str(i)+"/zag"+str(j)+"""",
                           "normal",
                           "file")""")

c.commit()

[3] client to query the database

#include "svn_pools.h"
#include "svn_sqlite.h"
#include <stdio.h>

static svn_error_t *
status_query(svn_sqlite__db_t *sdb,
             const char *local_relpath,
             svn_boolean_t display,
             apr_pool_t *pool)
{
  svn_sqlite__stmt_t *stmt;
  svn_boolean_t have_row;
  const char *kind;
  apr_pool_t *subpool;
  apr_array_header_t *subdirs = apr_array_make(pool, 10, sizeof(const char *));
  int i;

  SVN_ERR(svn_sqlite__get_statement(&stmt, sdb, 0));
  SVN_ERR(svn_sqlite__bindf(stmt, "is", 1, local_relpath));
  SVN_ERR(svn_sqlite__step(&have_row, stmt));
  if (!have_row)
    {
      SVN_ERR(svn_sqlite__reset(stmt));
      return SVN_NO_ERROR;
    }
  kind = svn_sqlite__column_text(stmt, 0, pool);
  if (display)
    printf("%s %s\n", local_relpath, kind);
  SVN_ERR(svn_sqlite__reset(stmt));
  
  if (!strcmp(kind, "dir"))
    {
      SVN_ERR(svn_sqlite__get_statement(&stmt, sdb, 1));
      SVN_ERR(svn_sqlite__bindf(stmt, "is", 1, local_relpath));
      SVN_ERR(svn_sqlite__step(&have_row, stmt));
      while (have_row)
        {
          const char *child_relpath = svn_sqlite__column_text(stmt, 0, NULL);
          kind = svn_sqlite__column_text(stmt, 1, NULL);
          if (!strcmp(kind, "dir"))
            APR_ARRAY_PUSH(subdirs, const char *)
              = apr_pstrdup(pool, child_relpath);
          else if (display)
            printf("%s %s\n", child_relpath, kind);
          SVN_ERR(svn_sqlite__step(&have_row, stmt));
        }
      SVN_ERR(svn_sqlite__reset(stmt));
    }

  subpool = svn_pool_create(pool);
  for (i = 0; i < subdirs->nelts; ++i)
    {
      svn_pool_clear(subpool);
      SVN_ERR(status_query(sdb, APR_ARRAY_IDX(subdirs, i, const char*), display,
                           subpool));
    }
  svn_pool_destroy(subpool);

  return SVN_NO_ERROR;
}

int main()
{
  apr_pool_t *pool;
  svn_sqlite__db_t *sdb;
  const char * const statements[] = {
    "select kind from base_node" \
    " where wc_id = ?1 and local_relpath = ?2;",
    "select local_relpath, kind from base_node" \
    " where wc_id = ?1 and parent_relpath = ?2;",
    NULL
  };

  apr_initialize();
  pool = svn_pool_create(NULL);
  SVN_INT_ERR(svn_sqlite__open(&sdb, "wcx.db", svn_sqlite__mode_rwcreate,
                               statements, 0, NULL, pool, pool));
  SVN_INT_ERR(status_query(sdb, "", FALSE, pool));

  return EXIT_SUCCESS;
}

[4] patch for 1.6

Index: subversion/libsvn_wc/status.c
===================================================================
--- subversion/libsvn_wc/status.c (revision 905665)
+++ subversion/libsvn_wc/status.c (working copy)
@@ -272,6 +272,7 @@
 
   svn_lock_t *repos_lock = NULL;
 
+#if 0
   /* Check for a repository lock. */
   if (repos_locks)
     {
@@ -302,6 +303,7 @@
   SVN_ERR(svn_wc__get_tree_conflict(&tree_conflict, path, adm_access, pool));
 
   if (! entry)
+#endif
     {
       /* return a fairly blank structure. */
       stat = apr_pcalloc(pool, sizeof(*stat));
@@ -316,25 +318,7 @@
       stat->tree_conflict = tree_conflict;
       stat->file_external = FALSE;
 
- /* If this path has no entry, but IS present on disk, it's
- unversioned. If this file is being explicitly ignored (due
- to matching an ignore-pattern), the text_status is set to
- svn_wc_status_ignored. Otherwise the text_status is set to
- svn_wc_status_unversioned. */
- if (path_kind != svn_node_none)
- {
- if (is_ignored)
- stat->text_status = svn_wc_status_ignored;
- else
- stat->text_status = svn_wc_status_unversioned;
- }
-
- /* If this path has no entry, is NOT present on disk, and IS a
- tree conflict victim, count it as missing. */
- if ((path_kind == svn_node_none) && tree_conflict)
- stat->text_status = svn_wc_status_missing;
-
- stat->repos_lock = repos_lock;
+ stat->repos_lock = NULL;
       stat->url = NULL;
       stat->ood_last_cmt_rev = SVN_INVALID_REVNUM;
       stat->ood_last_cmt_date = 0;

-- 
Philip
Received on 2010-02-18 19:22:39 CET

This is an archived mail posted to the Subversion Dev mailing list.