Firstly, I just want to thank Dave for taking a look at this. I've been
waiting for a while for some more eyes to go over the "log -g" code, and
I'm glad he was able to take the time. I'd rather release a good
feature later than a half-baked one earlier any day.

David Glasser wrote:
> I have been experimenting with an alternate backend for svn:mergeinfo
> data over the last day. I have come to the conclusion that by
> delaying one feature to 1.6 (which was originally proposed as a 1.6
> feature anyway), we can vastly simplify the svn:mergeinfo backend,
> remove some pretty difficult bugs, and get a satisfactory 1.5 released
> relatively soon.
>
> Specifically, my experiments have taught me a few things:
>
> * Almost all of merge-tracking on trunk now requires no index at all;
> most of the queries are just trying to look up svn:mergeinfo on a
> specific path at a specific revision, which is exactly what the FS
> itself does.
>
> * ... except that the FS itself handles things like "copying a node
> copies everything below it" and "deleting a node deletes everything
> below it"; our current sqlite code does not handle that, making it
> easy to corrupt the index. And implementing that for the sqlite
> code would be tantamount to a complete Subversion-DAG-FS model
> implemented for our index.
>
> * The only command that requires a more sophisticated query against
> the index is "svn log -g", which essentially needs to do the query
> "at revision R, what are all the paths under P that have mergeinfo?"
>
> * I have a completely working implementation for FSFS that keeps
> enough metadata in the DAG itself to answer that question
> efficiently. So we really don't need to use sqlite for that.
> It would probably require a db format bump, but that's not too big a
> deal (and wouldn't really need a dump/load; I can give more details
> if you want). Hopefully the BDB implementation wouldn't be hard
> either.

Having this metadata would be very useful. The entire reason we are
using the mergeinfo index to look up child paths is that we wanted to
avoid doing a recursive walk through the directory structure for
*every single revision* of a given path. If we need to eliminate the
mergeinfo index and/or we find a better way to capture this data, I'll
be the first one in line to use it.
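To make the trade-off concrete, here is a minimal sketch in Python, not
Subversion code. It assumes each node carries a count of how many nodes in
its subtree have mergeinfo, so the "which paths under P have mergeinfo?"
query can skip any subtree whose count is zero instead of walking all of it.
The Node class, the mergeinfo_count field, and both helper functions are
hypothetical names for illustration; they are not the FSFS format or
Dave's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for one file or directory at one revision."""
    mergeinfo: Optional[str] = None
    children: Dict[str, "Node"] = field(default_factory=dict)
    # Assumed extra metadata: how many nodes in this subtree
    # (this node included) carry mergeinfo.
    mergeinfo_count: int = 0


def annotate(node: Node) -> int:
    """Fill in mergeinfo_count bottom-up and return the subtree total."""
    count = 1 if node.mergeinfo is not None else 0
    for child in node.children.values():
        count += annotate(child)
    node.mergeinfo_count = count
    return count


def paths_with_mergeinfo(node: Node, path: str, visited: List[str]) -> List[str]:
    """Return paths at or under `path` that have mergeinfo.

    Subtrees whose count is zero are never entered; `visited` records
    every node that was actually looked at.
    """
    visited.append(path)
    found = [path] if node.mergeinfo is not None else []
    for name in sorted(node.children):
        child = node.children[name]
        if child.mergeinfo_count > 0:
            found += paths_with_mergeinfo(child, path + "/" + name, visited)
    return found


# A made-up tree with seven nodes, two of which have mergeinfo.
trunk = Node(
    mergeinfo="/branches/a:3-5",
    children={
        "lib": Node(children={
            "foo.c": Node(mergeinfo="/branches/b:7"),
            "bar.c": Node(),
        }),
        "doc": Node(children={"a.txt": Node(), "b.txt": Node()}),
    },
)
annotate(trunk)
```

Calling paths_with_mergeinfo(trunk, "/trunk", visited) on this tree looks
at three of the seven nodes; the whole doc subtree and bar.c are skipped.
A real backend would have to keep the counts current on every commit,
copy, and delete, which this sketch sidesteps by recomputing them from
scratch.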

> * My implementation does a little more error-checking than the sqlite
> implementation; specifically, the sqlite implementation didn't care
> if you asked for mergeinfo about paths that don't exist, whereas
> mine does (though it could suppress that error, of course). That
> extra checking is already showing me a bunch of bugs all throughout
> the client code, and especially in "log -g", where they're passing
> in the wrong paths.
>
> * Kamesh's issue-2897 branch would require more sophisticated queries.
> (And in fact I think that those queries might enable "log -g" to do
> its job better.) But it's controversial whether or not we should
> try to get issue 2897 in for 1.5; it's a big, big problem with no
> simple answer. In addition, this would require us to fix the
> serious bugs in the sqlite index mentioned above.
>
> I would like to propose the following:
>
> * We do not attempt to solve Issue 2897 for 1.5. It is probably
> possible to solve it, but it will take a lot of work, have lots of
> subtleties, etc.
>
> * We disable "log -g" for 1.5. "log -g" was originally proposed as a
> 1.6 command; it only switched to 1.5 because Hyrum finished his
> implementation. (And there are a lot of good things about log -g; I
> certainly have respect for Hyrum's work, and expect that 1.6 could
> contain a fixed version of it.) This would allow us to ignore the
> "log -g" bugs for now and focus on bugs more central to merge
> tracking as opposed to just this one auditing feature.

I'll be honest: "log -g" is kinda my baby (Summer of Code and all that).
It's a baby with warts, a few extra legs, and a deep burlesque voice,
but still my baby. :) However, I have absolutely *no reservation* about
pulling it (I'll even do the grunt work), if that means a better, sooner
1.5. There may be some issues with "log -g" which can't get resolved
until 2.0, and I'm fine with that.

I haven't looked at the bugs Dave mentions, so I don't know what kind of
effort would be involved with getting them fixed. "blame -g" does not
use the index beyond getting the mergeinfo for PATH@REV, so it should be
relatively safe from any mucking we do with the index. "blame -g"
probably has bugs of its own, though. :)

> * Because we no longer need it, we remove the sqlite mergeinfo index
> from 1.5. This reduces a huge amount of code complexity in the FS
> backends, and lets us not worry about fixing the bugs in keeping the
> indices up to date. Because we don't need it, we don't use my
> extra-metadata-in-DAG thing either.
>
> When working on 1.6, we can solve #2897 and fix "log -g" with much
> more leisure to get it right. If fixing them requires retrying the
> sqlite index again, or my metadata idea, then so be it: we can add
> that code back in in 1.6 (it's all in version control) and make it
> work for those needs then.
>
> But I think we can make 1.5 much more solid and less complex by simply
> deferring #2897 and "log -g" to 1.6. 1.5 will still have a superset
> of svnmerge.py's features.
>
> (I don't mean to disrespect the hard work done on the sqlite backend,
> "log -g", or issue-2897 here. I just think that these are difficult
> problems to solve, and that making a release that doesn't try to solve
> them and fixing them with more leisure is better than trying to do
> everything at once and being full of bugs.)

I'd like to echo the concern Karl brought up elsethread. If we do yank
a couple of features, we need to be very open about what 1.5 is and
isn't. We've been touting 1.5 for a long time, and lots of folks have been
looking forward to it. If we start yanking features, we need to be
*very careful* about managing expectations, and helping people know
what's in 1.5, and what will be coming later.

-Hyrum
Received on Fri Nov 30 03:04:24 2007