[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: RFC: Revision indexes for 1.1

From: Branko Čibej <brane_at_xbc.nu>
Date: 2004-04-25 18:53:14 CEST

Greg Hudson wrote:

>On Thu, 2004-04-22 at 20:33, Branko Čibej wrote:
>
>
>>The API problem is that svn_repos_dated_revision looks for _one_
>>revision using _one_ date. So if your revisions aren't ordered by date,
>>you can't just use this function to find the first and last revision in
>>a date range and rely on the result. Instead, we need a new function
>>that accepts a start and end date, and returns a list of revision ranges
>>that fall within those two dates.
>>
>>
>
>I'm not sure what you could do with that information, though. If you've
>got mis-ordered dates such that "{2004-10-10}:{2004-10-11}" results in
>revs 4, 5, 800, and 6, in that order, what does "svn diff -r
>{2004-10-10}:{2004-10-11}" do?
>
>
You seem to be forgetting that we also filter by path, not only by date.
The client needs an intersection of the set of revisions in which a
subtree changed, and the set of dates belonging to a range.

There are two use cases for non-ordered revisions:

    * cvs2svn (or any other conversion) can delay branch and tag
      creation, and optimize them better
    * merging repositories or importing (via svnadmin load) some
      project's history

In both cases the you'd typically ise date-range queries on a path where
locally the revisions are ordered, and therefore the intersection of
changes and dates is a continuous range of revisions.

>I remain convinced that enforcing date order is the only sane path to
>follow.
>
>
If we keep that restriction, there's no way optimize cvs2svn, which
means that people who start with a converted repository will keep
complaining about the size blowup.

>> The implementation of such a function
>>is trrivial and involves only a cursor traverse of the revision index,
>>not touching revision props at all.
>>
>>
>
>I've gotten the impression that cursor walks create locking issues in
>the BDB implementation.
>
I can't believe BDB needs more than two lock object to do a linear
cursor walk, unless you do the walk in a transaction. And there's no
need to do that, it being a read-only operation.

> And it's also possible to imagine a repository
>getting big enough that a cursor walk of a table containing N revisions
>is too expensive, if getting a revision by date is a common operation.
>
>
You obviously don't walk the whole table; you start with the smallest
matching index and stop when you've passed the largest one.

>It seems like the best implementation would be a btree database where we
>could perform a binary search to find the closest match, and then walk
>forward or backward by one key as necessary, except BDB doesn't seem to
>have a "get the closest match in a btree database" operation, or a "walk
>forward or backward by one key from a specified starting point in a
>btree database" operation. Obviously, such a thing wouldn't dovetail
>with the revision indexes plan, if we could implement it at all.
>
>
Huh? DBcursor->c_get with DB_SET_RANGE, then DB_NEXT as long as
necessary (or DB_NEXT_DUP, depending on what you're doing), will do
exactly that, and will work marvelously with the proposed revision indexes.

Not to mention that it takes a single SQL query. But it might be a bit
hard to do in libsvn_fs_fs, I imagine. :-)

-- Brane

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sun Apr 25 19:24:06 2004

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.