Re: RFC: Revision indexes for 1.1

From: Branko Čibej <brane_at_xbc.nu>
Date: 2004-04-25 19:53:29 CEST

Greg Hudson wrote:

>On Sun, 2004-04-25 at 12:53, Branko Čibej wrote:
>
>
>>>I'm not sure what you could do with that information, though. If you've
>>>got mis-ordered dates such that "{2004-10-10}:{2004-10-11}" results in
>>>revs 4, 5, 800, and 6, in that order, what does "svn diff -r
>>>{2004-10-10}:{2004-10-11}" do?
>>>
>>>
>>You seem to be forgetting that we also filter by path, not only by date.
>>The client needs an intersection of the set of revisions in which a
>>subtree changed, and the set of dates belonging to a range.
>>
>>
>So, what happens if I run that command on the root of the repository?
>Your heuristic analysis doesn't seem very convincing to me; you seem to
>be saying "eh, people probably won't run into the hard cases very
>often."
>
>
I forgot to type part of the answer, regarding your example.

First, note that svn diff does _not_ operate on a range of revisions; it
operates on two specific revisions. The cmdline syntax is the same, of
course, but there's a subtle difference in semantics.

Now, if a date range operation returns more than one revision range,
then obviously svn diff can't use it and will error out. Or we could
decide that in this case the dates are definitive, and diff could use
the two closest revisions that match the dates. But diff isn't the only
operation that uses ranges; branch and merge do, too, and they can work
just fine with a list of revisions. They don't now, but they can.
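
A minimal sketch of the multiple-range case, in Python rather than
Subversion code. The function names and the sample data (including which
revisions touched the subtree) are made up for illustration. It selects
the revisions whose dates fall in a range, groups them into runs of
consecutive revision numbers, and then intersects them with the set of
revisions in which a subtree changed.

```python
from datetime import date


def revisions_in_date_range(rev_dates, start, end):
    """Return the sorted revision numbers whose date lies in [start, end]."""
    return sorted(rev for rev, d in rev_dates.items() if start <= d <= end)


def group_into_ranges(revs):
    """Collapse sorted revision numbers into maximal runs of consecutive numbers."""
    ranges = []
    for rev in revs:
        if ranges and rev == ranges[-1][1] + 1:
            # Extend the current run.
            ranges[-1] = (ranges[-1][0], rev)
        else:
            # Start a new run.
            ranges.append((rev, rev))
    return ranges


# Hypothetical repository: revision 800 carries a date that falls between
# the dates of revisions 5 and 6, e.g. after an "svnadmin load".
rev_dates = {
    3: date(2004, 10, 9),
    4: date(2004, 10, 10),
    5: date(2004, 10, 10),
    6: date(2004, 10, 11),
    7: date(2004, 10, 12),
    800: date(2004, 10, 10),
}

matched = revisions_in_date_range(rev_dates, date(2004, 10, 10), date(2004, 10, 11))
ranges = group_into_ranges(matched)

# Hypothetical set of revisions in which the subtree of interest changed.
changed_in_subtree = {5, 6, 7}
relevant = [rev for rev in matched if rev in changed_in_subtree]
```

With this data the date query alone yields two ranges, (4, 6) and
(800, 800), so a two-revision operation such as diff would have to error
out or pick the closest matches. After the path filter only revisions 5
and 6 remain, which is a single range again.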

Regarding the "not so often": I don't propose to drop _all_ revision
ordering. Obviously it's a good idea to try to keep them ordered, and
revisions arising from normal commits would remain so. That means that
in most cases, a date-range search would return a single revision range.

When merging subtrees of repositories (e.g., for some sort of repo
replication), revisions would also remain ordered in a particular subtree.

So in the final analysis, yes, people won't run into the hard cases very
often, and when they do, it'll be because they're trying to diff or
merge unrelated things.

>>>I remain convinced that enforcing date order is the only sane path to
>>>follow.
>>>
>>>
>>If we keep that restriction, there's no way to optimize cvs2svn, which
>>means that people who start with a converted repository will keep
>>complaining about the size blowup.
>>
>>
>I have a hard time believing this, but I'm at a bit of a disadvantage
>since I don't follow cvs2svn development. Perhaps you can spell out
>where the conflict lies.
>
>
I may have overstated that; it's probably not impossible, but very hard
because, to create optimal branches and tags from CVS, you have to
globally optimize the sequence of copies, which means you use up
enormous amounts of either and/or memory.

>>>I've gotten the impression that cursor walks create locking issues in
>>>the BDB implementation.
>>>
>>>
>>I can't believe BDB needs more than two lock objects to do a linear
>>cursor walk, unless you do the walk in a transaction. And there's no
>>need to do that, it being a read-only operation.
>>
>>
>But there might be write operations mucking with the table at the same
>time, and they need to do so in a transaction.
>
>
So what? That just means that two identical date queries don't
necessarily return the same range of revisions, but I don't see that as
a problem.

Again, remember that revisions will not be out of order unless you
specifically fiddle with svnadmin load (or equivalent) to make them so,
or change svn:date. You can't do that by accident, and I'm inclined to
assume that the repos administrator knows what she's doing.

>>> And it's also possible to imagine a repository
>>>getting big enough that a cursor walk of a table containing N revisions
>>>is too expensive, if getting a revision by date is a common operation.
>>>
>>>
>
>
>
>>You obviously don't walk the whole table; you start with the smallest
>>matching index and stop when you've passed the largest one.
>>
>>
>
>You've lost me, a bit. Were you proposing that the revision indices
>would all be btree tables?
>
>
I was proposing that there be one table for all indexed revision props.
Of course it has to be a btree table, how else can you use it as an
index and get any performance benefit? Well, really.
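
A rough sketch of how such a table could behave, using a sorted Python
list as a stand-in for a btree. This is not BDB or Subversion code: the
RevpropIndex class, its key layout of (propname, value, revision), and
the sample data are all assumptions made for illustration. The lookup
positions itself at the smallest key that is not less than the lower
bound and stops as soon as it passes the upper bound, so it never walks
the whole table.

```python
import bisect


class RevpropIndex:
    """Toy stand-in for one btree table holding all indexed revision props.

    Keys are (propname, value, revision) tuples kept in sorted order.
    """

    def __init__(self):
        self._keys = []

    def add(self, propname, value, rev):
        # Insert the key while keeping the list sorted.
        bisect.insort(self._keys, (propname, value, rev))

    def range(self, propname, low, high):
        """Yield (value, rev) for every key with low <= value <= high."""
        # Jump to the smallest key >= (propname, low); a real btree cursor
        # would do the equivalent positioning without scanning.
        i = bisect.bisect_left(self._keys, (propname, low))
        while i < len(self._keys):
            name, value, rev = self._keys[i]
            if name != propname or value > high:
                break  # Past the largest matching key: stop walking.
            yield value, rev
            i += 1


index = RevpropIndex()
# Hypothetical data; ISO date strings sort in chronological order.
for rev, day in [(3, "2004-10-09"), (4, "2004-10-10"), (5, "2004-10-10"),
                 (6, "2004-10-11"), (7, "2004-10-12"), (800, "2004-10-10")]:
    index.add("svn:date", day, rev)
index.add("svn:author", "brane", 5)

hits = list(index.range("svn:date", "2004-10-10", "2004-10-11"))
revs_in_key_order = [rev for _, rev in hits]
```

Here revs_in_key_order comes out as 4, 5, 800, 6, the mis-ordered case
from the example earlier in the thread, and the svn:author entry in the
same table is never touched by the date query.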

>>>except BDB doesn't seem to
>>>have a "get the closest match in a btree database" operation
>>>
>>>
>>Huh? DBcursor->c_get with DB_SET_RANGE
>>
>>
>Ah, good to know.
>
>
>>Not to mention that it takes a single SQL query. But it might be a bit
>>hard to do in libsvn_fs_fs, I imagine. :-)
>>
>>
>It's true, your revision index feature is difficult (though I think not
>impossible) to implement within a libsvn_fs_fs design, and since I
>continue to think that it's of minimal value in general, I'm not very
>fond of it.
>
>
I can't agree that it is of minimal value. The fact that you can't do
efficient searches of revprops is a big limitation. Right now the only
fast index is the revision number, and I see this as a usability
misfeature because it makes CM tracking and reporting so much harder. It
may not be a big deal for your average student project, but it's fairly
major if you want to use SVN to implement any serious quality management
process.

-- Brane

Received on Sun Apr 25 19:55:46 2004
