[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Do we need to keep mergeinfo in the mergeinfo database?

From: David Glasser <glasser_at_davidglasser.net>
Date: 2007-11-28 23:25:09 CET

So I noticed (http://svn.haxx.se/dev/archive-2007-11/1168.shtml) that
we can currently make the mergeinfo db be out of sync with the
svn:mergeinfo properties. What this means is that the indexing code
needs to be fixed. Now, the indexing code is pretty complicated; it
has all sorts of "parse and unparse mergeinfo in and out of the table"
stuff. We certainly can fix it. But this set me off to figuring out
what we're actually using the mergeinfo code for.


Currently (and yes, this is ignoring the issue-2897 branch, though
whether or not that branch is doing the right thing is still under
debate) mergeinfo-sqlite-index.c only contains three SELECT
statements (not counting an optimization done during commit). There
are also only two APIs that the FS modules use to read from the DB:
svn_fs_mergeinfo__get_mergeinfo and

    SELECT mergedfrom, mergedrevstart, mergedrevend, inheritable FROM
    mergeinfo WHERE mergedto = ? AND revision = ? ORDER BY mergedfrom,

I.e., "get the mergeinfo on MERGEDTO@REVISION". Semantically
equivalent to just doing svn_fs_node_prop(&value, rev_root(REVISION),
MERGEDTO, "svn:mergeinfo"). (And in fact one of the two APIs
converts it back into a string anyway.) Note that the code guarantees
that REVISION is a rev where mergeinfo changed on the path.

    SELECT MAX(revision) FROM mergeinfo_changed WHERE path = ? AND
    revision <= ?;

Ie, "find the last revision before REVISION where mergeinfo changed on

    SELECT MAX(revision), path FROM mergeinfo_changed WHERE path LIKE ?
    AND revision <= ? GROUP BY path;

(PATH is always "some/path/%" here.)
Ie, "find the paths under PATH which have mergeinfo at REVISION (or
have ever had
mergeinfo before REVISION, I think?) and return them with the last
changed revision".


Note a couple things here:

* The first SELECT statement is completely redundant with

* The first SELECT statement is the only thing that uses any column
  other than path name (mergedto/path) and revision.

* The only reason for the second SELECT statement is to make sure to
  pass the right revision to the first SELECT statement. In fact, if
  we just looked on the DB itself, the second statement wouldn't be
  necessary at all.

* The third SELECT statement is actually doing something interesting:
  it's doing a prefix-match. That is, it's doing the equivalent of a
  recursive tree-walk in the FS, without actually needing to walk the
  tree. (And as of r28077, this actually gets to use an index.)


Let's move the mergeinfo out of the mergeinfo database. Let's just
make the mergeinfo database have one table:

CREATE TABLE mergeinfo
  ( path TEXT NOT NULL,
    revision INTEGER NOT NULL,
    action INTEGER NOT NULL );

where action is an enum representing 'added', 'modified', or

svn_fs_mergeinfo__get_mergeinfo should not touch the database at all.

svn_fs_mergeinfo__get_mergeinfo_for_tree will use the database to
figure out which paths under P actually have mergeinfo, and use the FS
to find out the actual mergeinfo data.

(One side effect of this is that there will be a lot less work done to
update the tables, which means less work done with the FSFS write lock
held, which is good. My guess is that this change, if feasible, will
improve the efficiency of commits. I have no idea what the
performance impact on read ops will be; it'll keep the index table
smaller, but require more FS access.)

Yes, I realize it's a bad time to redesign the DB *yet again*. But I
do believe that these changes can be made without touching code in
libsvn_client: this is pure backend information. And the current
state of the index is broken anyway, and this version would be easier
to keep up to date. (Plus in 1.6 we can extend it to track properties
other than svn:mergeinfo; I know the inability to do an efficient
recursive search for nodes with properties is a common inhibitor on
widespread use of custom props...)


David Glasser | glasser_at_davidglasser.net | http://www.davidglasser.net/
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Wed Nov 28 23:25:35 2007

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.