Firstly, I just want to thank Dave for taking a look at this. I've been
waiting for a while for some more eyes to go over the "log -g" code, and
I'm glad he was able to take the time. I'd rather release a good
feature later than a half-baked one earlier any day.
David Glasser wrote:
> I have been experimenting with an alternate backend for svn:mergeinfo
> data over the last day. I have come to the conclusion that by
> delaying one feature to 1.6 (which was originally proposed as a 1.6
> feature anyway), we can vastly simplify the svn:mergeinfo backend,
> remove some pretty difficult bugs, and get a satisfactory 1.5 released
> relatively soon.
> Specifically, my experiments have taught me a few things:
> * Almost all of merge-tracking on trunk now requires no index at all;
> most of the queries are just trying to look up svn:mergeinfo on a
> specific path at a specific revision, which is exactly what the FS
> itself does.
> * ... except that the FS itself handles things like "copying a node
> copies everything below it" and "deleting a node deletes everything
> below it"; our current sqlite code does not handle that, making it
> easy to corrupt the index. And implementing that for the sqlite
> code would be tantamount to a complete Subversion-DAG-FS model
> implemented for our index.
> * The only command that requires a more sophisticated query against
> the index is "svn log -g", which essentially needs to do the query
> "at revision R, what are all the paths under P that have mergeinfo?"
> * I have a completely working implementation for FSFS that keeps
> enough metadata in the DAG itself to answer that question
> efficiently. So we really don't need to use sqlite for that.
> It would probably require a db format bump, but that's not too big a
> deal (and wouldn't really need a dump/load; I can give more details
> if you want). Hopefully the BDB implementation wouldn't be hard
Having this metadata would be very useful. The entire reason we use
the mergeinfo index to look up child paths is that we wanted to avoid
a recursive walk through the directory structure for *every single
revision* of a given path. If we need to eliminate the mergeinfo index
and/or we find a better way to capture this data, I'll be the first one
in line to use it.
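
To illustrate why that kind of metadata helps, here is a rough Python
sketch of the query "at revision R, what are all the paths under P that
have mergeinfo?" against a toy in-memory tree for a single revision.
This is not the FSFS implementation Dave describes; the Node class, the
mergeinfo_count field, and both function names are hypothetical, made
up only to show how a per-node count lets the walk skip subtrees that
cannot contain any mergeinfo.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MERGEINFO = "svn:mergeinfo"


@dataclass
class Node:
    """Toy stand-in for one node of a revision tree (hypothetical)."""
    props: Dict[str, str] = field(default_factory=dict)
    children: Dict[str, "Node"] = field(default_factory=dict)
    # Assumed cached metadata: how many nodes in this subtree
    # (including this one) carry an svn:mergeinfo property.
    mergeinfo_count: int = 0


def annotate(node: Node) -> int:
    """Fill in mergeinfo_count bottom-up and return it."""
    count = 1 if MERGEINFO in node.props else 0
    for child in node.children.values():
        count += annotate(child)
    node.mergeinfo_count = count
    return count


def find_mergeinfo_paths(node: Node, path: str, visited: List[str]) -> List[str]:
    """Return paths at or under `path` that have mergeinfo.

    Subtrees whose cached count is zero are skipped entirely;
    `visited` records which nodes were actually opened.
    """
    visited.append(path)
    found = [path] if MERGEINFO in node.props else []
    for name in sorted(node.children):
        child = node.children[name]
        if child.mergeinfo_count == 0:
            continue  # nothing below here has mergeinfo: prune
        child_path = path.rstrip("/") + "/" + name
        found.extend(find_mergeinfo_paths(child, child_path, visited))
    return found


# A small example tree; the mergeinfo values are illustrative only.
trunk = Node(
    props={MERGEINFO: "/branches/a:5-9"},
    children={
        "lib": Node(children={
            "foo.c": Node(props={MERGEINFO: "/branches/b:12"}),
            "bar.c": Node(),
        }),
        "doc": Node(children={
            "README": Node(),
            "api": Node(children={"index.html": Node()}),
        }),
        "tests": Node(),
    },
)

annotate(trunk)
visited: List[str] = []
found = find_mergeinfo_paths(trunk, "/trunk", visited)
print(found)
print(visited)
```

In this toy tree the walk opens only /trunk, /trunk/lib and
/trunk/lib/foo.c, and never descends into doc or tests. Without the
count it would have to open every node, and "log -g" would pay that
cost once per revision.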
> * My implementation does a little more error-checking than the sqlite
> implementation; specifically, the sqlite implementation didn't care
> if you asked for mergeinfo about paths that don't exist, whereas
> mine does (though it could suppress that error, of course). That
> extra checking is already showing me a bunch of bugs all throughout
> the client code, and especially in "log -g", where they're passing
> in the wrong paths.
> * Kamesh's issue-2897 branch would require more sophisticated queries.
> (And in fact I think that those queries might enable "log -g" to do
> its job better.) But it's controversial whether or not we should
> try to get issue 2897 in for 1.5; it's a big, big problem with no
> simple answer. In addition, this would require us to fix the
> serious bugs in the sqlite index mentioned above.
> I would like to propose the following:
> * We do not attempt to solve Issue 2897 for 1.5. It is probably
> possible to solve it, but it will take a lot of work, have lots of
> subtleties, etc.
> * We disable "log -g" for 1.5. "log -g" was originally proposed as a
> 1.6 command; it only switched to 1.5 because Hyrum finished his
> implementation. (And there are a lot of good things about log -g; I
> certainly have respect for Hyrum's work, and expect that 1.6 could
> contain a fixed version of it.) This would allow us to ignore the
> "log -g" bugs for now and focus on bugs more central to merge
> tracking as opposed to just this one auditing feature.
I'll be honest: "log -g" is kinda my baby (Summer of Code and all that).
It's a baby with warts, a few extra legs, and a deep burlesque voice,
but still my baby. :) However, I have absolutely *no reservation* about
pulling it (I'll even do the grunt work), if that means a better, sooner
1.5. There may be some issues with "log -g" which can't get resolved
until 2.0, and I'm fine with that.
I haven't looked at the bugs Dave mentions, so I don't know what kind of
effort would be involved with getting them fixed. "blame -g" does not
use the index beyond getting the mergeinfo for PATH@REV, so it should be
relatively safe from any mucking we do with the index. "blame -g"
probably has bugs of its own, though. :)
> * Because we no longer need it, we remove the sqlite mergeinfo index
> from 1.5. This reduces a huge amount of code complexity in the FS
> backends, and lets us not worry about fixing the bugs in keeping the
> indices up to date. Because we don't need it, we don't use my
> extra-metadata-in-DAG thing either.
> When working on 1.6, we can solve #2897 and fix "log -g" with much
> more leisure to get it right. If fixing them requires retrying the
> sqlite index again, or my metadata idea, then so be it: we can add
> that code back in in 1.6 (it's all in version control) and make it
> work for those needs then.
> But I think we can make 1.5 much more solid and less complex by simply
> deferring #2897 and "log -g" to 1.6. 1.5 will still have a superset
> of svnmerge.py's features.
> (I don't mean to disrespect the hard work done on the sqlite backend,
> "log -g", or issue-2897 here. I just think that these are difficult
> problems to solve, and that making a release that doesn't try to solve
> them and fixing them with more leisure is better than trying to do
> everything at once and being full of bugs.)
I'd like to echo the concern Karl brought up elsethread. If we do yank
a couple of features, we need to be very open about what 1.5 is and
isn't. We've been touting 1.5 for a long time, and lots of folks have been
looking forward to it. If we start yanking features, we need to be
*very careful* about managing expectations, and helping people know
what's in 1.5, and what will be coming later.
Received on Fri Nov 30 03:04:24 2007