Re: Do we need to keep mergeinfo in the mergeinfo database?

From: David Glasser <glasser_at_davidglasser.net>
Date: 2007-11-29 17:44:00 CET

On Nov 29, 2007 5:07 AM, Mark Phippard <markphip@gmail.com> wrote:
>
> On Nov 29, 2007 12:47 AM, David Glasser <glasser@davidglasser.net> wrote:
> >
> > On Nov 28, 2007 2:25 PM, David Glasser <glasser@davidglasser.net> wrote:
> > > So I noticed (http://svn.haxx.se/dev/archive-2007-11/1168.shtml) that
> > > we can currently make the mergeinfo db be out of sync with the
> > > svn:mergeinfo properties. What this means is that the indexing code
> > > needs to be fixed. Now, the indexing code is pretty complicated; it
> > > has all sorts of "parse and unparse mergeinfo in and out of the table"
> > > stuff. We certainly can fix it. But this set me off to figuring out
> > > what we're actually using the mergeinfo code for.
> > >
> > >
> > > CURRENT USE
> > > ===========
> > >
> > > Currently (and yes, this is ignoring the issue-2897 branch, though
> > > whether or not that branch is doing the right thing is still under
> > > debate) mergeinfo-sqlite-index.c only contains three SELECT
> > > statements (not counting an optimization done during commit). There
> > > are also only two APIs that the FS modules use to read from the DB:
> > > svn_fs_mergeinfo__get_mergeinfo and
> > > svn_fs_mergeinfo__get_mergeinfo_for_tree.
> > >
> > > SELECT mergedfrom, mergedrevstart, mergedrevend, inheritable FROM
> > > mergeinfo WHERE mergedto = ? AND revision = ? ORDER BY mergedfrom,
> > > mergedrevstart;
> > >
> > > I.e., "get the mergeinfo on MERGEDTO@REVISION". Semantically
> > > equivalent to just doing svn_fs_node_prop(&value, rev_root(REVISION),
> > > MERGEDTO, "svn:mergeinfo"). (And in fact one of the two APIs
> > > converts it back into a string anyway.) Note that the code guarantees
> > > that REVISION is a rev where mergeinfo changed on the path.
> > >
> > > SELECT MAX(revision) FROM mergeinfo_changed WHERE path = ? AND
> > > revision <= ?;
> > >
> > > Ie, "find the last revision before REVISION where mergeinfo changed on
> > > PATH".
> > >
> > > SELECT MAX(revision), path FROM mergeinfo_changed WHERE path LIKE ?
> > > AND revision <= ? GROUP BY path;
> > >
> > > (PATH is always "some/path/%" here.)
> > > Ie, "find the paths under PATH which have mergeinfo at REVISION (or
> > > have ever had
> > > mergeinfo before REVISION, I think?) and return them with the last
> > > changed revision".
> > >
> > >
> > >
> > > OBSERVATIONS
> > > ============
> > >
> > > Note a couple things here:
> > >
> > > * The first SELECT statement is completely redundant with
> > > svn_fs_node_prop.
> > >
> > > * The first SELECT statement is the only thing that uses any column
> > > other than path name (mergedto/path) and revision.
> > >
> > > * The only reason for the second SELECT statement is to make sure to
> > > pass the right revision to the first SELECT statement. In fact, if
> > > we just looked on the DB itself, the second statement wouldn't be
> > > necessary at all.
> > >
> > > * The third SELECT statement is actually doing something interesting:
> > > it's doing a prefix-match. That is, it's doing the equivalent of a
> > > recursive tree-walk in the FS, without actually needing to walk the
> > > tree. (And as of r28077, this actually gets to use an index.)
> > >
> > >
> > >
> > > PROPOSAL
> > > ========
> > >
> > > Let's move the mergeinfo out of the mergeinfo database. Let's just
> > > make the mergeinfo database have one table:
> > >
> > > CREATE TABLE mergeinfo
> > > ( path TEXT NOT NULL,
> > > revision INTEGER NOT NULL,
> > > action INTEGER NOT NULL );
> > >
> > > where action is an enum representing 'added', 'modified', or
> > > 'deleted'.
> > >
> > > svn_fs_mergeinfo__get_mergeinfo should not touch the database at all.
> > >
> > > svn_fs_mergeinfo__get_mergeinfo_for_tree will use the database to
> > > figure out which paths under P actually have mergeinfo, and use the FS
> > > to find out the actual mergeinfo data.
> > >
> > > (One side effect of this is that there will be a lot less work done to
> > > update the tables, which means less work done with the FSFS write lock
> > > held, which is good. My guess is that this change, if feasible, will
> > > improve the efficiency of commits. I have no idea what the
> > > performance impact on read ops will be; it'll keep the index table
> > > smaller, but require more FS access.)
> >
> > Actually, if you accept that the only operation we need to speed up is
> > "find all nodes with mergeinfo under path P", then even this
> > single-table solution is overkill.
> >
> > I think we can ditch sqlite entirely and still keep that operation efficient.
> >
> > Just add a new field to the node-rev hash: "number of nodes with
> > mergeinfo below me (including myself)". This number is pretty much
> > trivial to keep accurate, and then we only need to descend down paths
> > with it >0 in a recursive FS crawl.
>
> Is there some way for you to figure out the new information Kamesh
> needs from SQLite so we can see if that can also be obtained without
> having the database?

I'm not sure.

Kamesh, can you describe the sorts of queries that you need to do your
feature? Are they deeper than "get svn:mergeinfo for PATH@REV" and
"find children with mergeinfo"?

--dave

-- 
David Glasser | glasser_at_davidglasser.net | http://www.davidglasser.net/
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Received on Thu Nov 29 17:44:45 2007

This message: [ Message body ]
Next message: Michael Sinz: "Re: Do we need to keep mergeinfo in the mergeinfo database?"
Previous message: C. Michael Pilato: "Re: 'svn mergeinfo' and renamed branches."
In reply to: Mark Phippard: "Re: Do we need to keep mergeinfo in the mergeinfo database?"
Next in thread: Michael Sinz: "Re: Do we need to keep mergeinfo in the mergeinfo database?"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]