Re: Do we need to keep mergeinfo in the mergeinfo database?

From: Mark Phippard <markphip_at_gmail.com>
Date: 2007-11-29 14:07:27 CET

On Nov 29, 2007 12:47 AM, David Glasser <glasser@davidglasser.net> wrote:
>
> On Nov 28, 2007 2:25 PM, David Glasser <glasser@davidglasser.net> wrote:
> > So I noticed (http://svn.haxx.se/dev/archive-2007-11/1168.shtml) that
> > we can currently make the mergeinfo db be out of sync with the
> > svn:mergeinfo properties. What this means is that the indexing code
> > needs to be fixed. Now, the indexing code is pretty complicated; it
> > has all sorts of "parse and unparse mergeinfo in and out of the table"
> > stuff. We certainly can fix it. But this set me off to figuring out
> > what we're actually using the mergeinfo code for.
> >
> >
> > CURRENT USE
> > ===========
> >
> > Currently (and yes, this is ignoring the issue-2897 branch, though
> > whether or not that branch is doing the right thing is still under
> > debate) mergeinfo-sqlite-index.c only contains three SELECT
> > statements (not counting an optimization done during commit). There
> > are also only two APIs that the FS modules use to read from the DB:
> > svn_fs_mergeinfo__get_mergeinfo and
> > svn_fs_mergeinfo__get_mergeinfo_for_tree.
> >
> > SELECT mergedfrom, mergedrevstart, mergedrevend, inheritable FROM
> > mergeinfo WHERE mergedto = ? AND revision = ? ORDER BY mergedfrom,
> > mergedrevstart;
> >
> > I.e., "get the mergeinfo on MERGEDTO@REVISION". Semantically
> > equivalent to just doing svn_fs_node_prop(&value, rev_root(REVISION),
> > MERGEDTO, "svn:mergeinfo"). (And in fact one of the two APIs
> > converts it back into a string anyway.) Note that the code guarantees
> > that REVISION is a rev where mergeinfo changed on the path.
> >
> > SELECT MAX(revision) FROM mergeinfo_changed WHERE path = ? AND
> > revision <= ?;
> >
> > Ie, "find the last revision before REVISION where mergeinfo changed on
> > PATH".
> >
> > SELECT MAX(revision), path FROM mergeinfo_changed WHERE path LIKE ?
> > AND revision <= ? GROUP BY path;
> >
> > (PATH is always "some/path/%" here.)
> > Ie, "find the paths under PATH which have mergeinfo at REVISION (or
> > have ever had
> > mergeinfo before REVISION, I think?) and return them with the last
> > changed revision".
> >
> >
> >
> > OBSERVATIONS
> > ============
> >
> > Note a couple things here:
> >
> > * The first SELECT statement is completely redundant with
> > svn_fs_node_prop.
> >
> > * The first SELECT statement is the only thing that uses any column
> > other than path name (mergedto/path) and revision.
> >
> > * The only reason for the second SELECT statement is to make sure to
> > pass the right revision to the first SELECT statement. In fact, if
> > we just looked on the DB itself, the second statement wouldn't be
> > necessary at all.
> >
> > * The third SELECT statement is actually doing something interesting:
> > it's doing a prefix-match. That is, it's doing the equivalent of a
> > recursive tree-walk in the FS, without actually needing to walk the
> > tree. (And as of r28077, this actually gets to use an index.)
> >
> >
> >
> > PROPOSAL
> > ========
> >
> > Let's move the mergeinfo out of the mergeinfo database. Let's just
> > make the mergeinfo database have one table:
> >
> > CREATE TABLE mergeinfo
> > ( path TEXT NOT NULL,
> > revision INTEGER NOT NULL,
> > action INTEGER NOT NULL );
> >
> > where action is an enum representing 'added', 'modified', or
> > 'deleted'.
> >
> > svn_fs_mergeinfo__get_mergeinfo should not touch the database at all.
> >
> > svn_fs_mergeinfo__get_mergeinfo_for_tree will use the database to
> > figure out which paths under P actually have mergeinfo, and use the FS
> > to find out the actual mergeinfo data.
> >
> > (One side effect of this is that there will be a lot less work done to
> > update the tables, which means less work done with the FSFS write lock
> > held, which is good. My guess is that this change, if feasible, will
> > improve the efficiency of commits. I have no idea what the
> > performance impact on read ops will be; it'll keep the index table
> > smaller, but require more FS access.)
>
> Actually, if you accept that the only operation we need to speed up is
> "find all nodes with mergeinfo under path P", then even this
> single-table solution is overkill.
>
> I think we can ditch sqlite entirely and still keep that operation efficient.
>
> Just add a new field to the node-rev hash: "number of nodes with
> mergeinfo below me (including myself)". This number is pretty much
> trivial to keep accurate, and then we only need to descend down paths
> with it >0 in a recursive FS crawl.

Is there some way for you to figure out the new information Kamesh
needs from SQLite so we can see if that can also be obtained without
having the database?

-- 
Thanks
Mark Phippard
http://markphip.blogspot.com/
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Received on Thu Nov 29 14:07:38 2007

This message: [ Message body ]
Next message: Paul Burba: "RE: svn switch creates local mods"
Previous message: C. Michael Pilato: "Re: svn switch creates local mods"
In reply to: David Glasser: "Re: Do we need to keep mergeinfo in the mergeinfo database?"
Next in thread: David Glasser: "Re: Do we need to keep mergeinfo in the mergeinfo database?"
Reply: David Glasser: "Re: Do we need to keep mergeinfo in the mergeinfo database?"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]