Re: RFC: Revision indexes for 1.1

From: Branko Čibej <brane_at_xbc.nu>
Date: 2004-04-28 00:20:52 CEST

Greg Hudson wrote:

>On Sun, 2004-04-25 at 13:53, Branko Čibej wrote:
>
>
>>So in the final analysis, yes, people won't run into the hard cases very
>>often, and when they do, it'll be because they're trying to diff or
>>merge unrelated things.
>>
>>
>
>I don't agree. It seems like a reasonable question to ask, "What
>changed in this repository between January and February of 2002?" and if
>we've given people the rope to have screwed that up in November of 2003
>by inserting mis-ordered revisions, we've done the user a disservice.
>
>We have a responsibility to define a semantic model which is simple and
>well-defined, not one that we think will just happen to work most of the
>time.
>
>
Yes, it is a pretty problem. The trouble is, of course, that we've not
really analysed our model enough. We definitely need _something_ that
maintains time order, but it doesn't necessarily have to be revisions --
after all, revisions are only aliases for transactions, and _those_ I
definitely agree must be ordered in time. The sad bit is that
transactions don't have an immutable date attached. That's something for
2.0 to solve. :-)
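
A minimal sketch of the failure mode Greg describes, assuming revision dates are stored as ISO date strings and that a date query is answered by a binary walk over revision numbers. The function names and the sample dates are hypothetical, not anything in the Subversion code base.

```python
from bisect import bisect_right

def youngest_rev_at(dates, when):
    """Binary walk: highest revision number whose date is <= when.

    dates[i] is the date of revision i. The answer is only meaningful
    if dates is sorted in non-decreasing order.
    """
    return bisect_right(dates, when) - 1

def revs_in_range(dates, start, end):
    """Linear scan: every revision whose date falls in [start, end)."""
    return [r for r, d in enumerate(dates) if start <= d < end]

# Hypothetical history; ISO date strings compare chronologically.
ordered = ["2002-01-05", "2002-01-20", "2002-02-10", "2002-03-01"]

# Same history plus r4, committed later but carrying an older date.
misordered = ordered + ["2002-01-25"]

# The binary walk still stops at r1 and never sees r4 ...
walk_result = youngest_rev_at(misordered, "2002-02-01")

# ... while a full scan shows that r4 belongs to January as well.
scan_result = revs_in_range(misordered, "2002-01-01", "2002-02-01")
```

With the mis-ordered revision in place, the walk and the scan disagree about what happened in January, which is the kind of ill-defined answer the ordering constraint is meant to rule out.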

>>>>If we keep that restriction, there's no way to optimize cvs2svn, which
>>>>means that people who start with a converted repository will keep
>>>>complaining about the size blowup.
>>>>
>>>>
>
>
>
>>I may have overstated that; it's probably not impossible, but very hard
>>because, to create optimal branches and tags from CVS, you have to
>>globally optimize the sequence of copies, which means you use up
>>enormous amounts of either and/or memory.
>>
>>
>
>(Either what and/or memory?)
>
>
(time. duh)

>I don't think we should be making deep semantic compromises in svn for
>the sake of efficiency gains in cvs2svn, and I strongly suspect the
>branch optimization problem isn't insurmountable.
>
>
Yup. I was convinced that cvs2svn was affected, but Karl says otherwise.
I'm a bit confused, but don't have time to go into cvs2svn details, so
I'll just believe what I hear.

>>>>>I've gotten the impression that cursor walks create locking issues in
>>>>>the BDB implementation.
>>>>>
>>>>>
>>>>I can't believe BDB needs more than two lock objects to do a linear
>>>>cursor walk, unless you do the walk in a transaction. And there's no
>>>>need to do that, it being a read-only operation.
>>>>
>>>>
>>>But there might be write operations mucking with the table at the same
>>>time, and they need to do so in a transaction.
>>>
>>>
>>So what? That just means that two identical date queries don't
>>necessarily return the same range of revisions, but I don't see that as
>>a problem.
>>
>>
>
>Context, context. "So what" meaning "perhaps we'll have locking
>problems." I don't really understand what leads to BDB locking
>problems; I'm just relying on a statement from CMike that a cursor walk
>of the revisions table during a read-only operation has created locking
>issues in the past.
>
>
I don't believe we ever did even a read-only cursor walk outside of a
trail, therefore outside of a transaction (until lately, perhaps).

>>>You've lost me, a bit. Were you proposing that the revision indices
>>>would all be btree tables?
>>>
>>>
>
>
>
>>I was proposing that there be one table for all indexed revision props.
>>Of course it has to be a btree table, how else can you use it as an
>>index and get any performance benefit? Well, really.
>>
>>
>
>You could use a hash table; the only reason to use a btree table would
>be for this date thing.
>
>
Now this is going to the level of detail I haven't really thought about.
I can see how a hash table could be better for certain classes of keys,
but a btree is more predictable given that we don't know what kind of
keys we'll have.
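
To make the hash-versus-btree trade-off concrete, here is a rough sketch that assumes a single index table whose keys are (propname, value, rev) tuples kept in sorted order. A sorted Python list stands in for the btree; the class, its methods, and the "label" property name are all hypothetical illustrations, not a proposed schema.

```python
from bisect import bisect_left, insort

class RevpropIndex:
    """Toy stand-in for one ordered table indexing several revprops."""

    def __init__(self):
        self._keys = []  # sorted list of (propname, value, rev)

    def add(self, propname, value, rev):
        insort(self._keys, (propname, value, rev))

    def lookup(self, propname, value):
        """Exact match: a hash table could answer this just as well."""
        lo = bisect_left(self._keys, (propname, value))
        revs = []
        for name, val, rev in self._keys[lo:]:
            if (name, val) != (propname, value):
                break
            revs.append(rev)
        return revs

    def range(self, propname, low, high):
        """Revisions with low <= value < high: this needs ordered keys."""
        lo = bisect_left(self._keys, (propname, low))
        hi = bisect_left(self._keys, (propname, high))
        return [rev for _, _, rev in self._keys[lo:hi]]

idx = RevpropIndex()
idx.add("svn:date", "2002-01-05", 1)
idx.add("svn:date", "2002-01-20", 2)
idx.add("svn:date", "2002-02-10", 3)
idx.add("label", "release-1.0", 2)    # hypothetical label property
idx.add("svn:date", "2002-01-25", 4)  # older date on a newer revision

january = idx.range("svn:date", "2002-01-01", "2002-02-01")
labelled = idx.lookup("label", "release-1.0")
```

The exact-match lookup is all a label needs, so a hash table would do for that case; the date range query is the one that depends on key ordering, and because the index sorts by value rather than by revision number it still finds r4.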

>>>It's true, your revision index feature is difficult (though I think not
>>>impossible) to implement within a libsvn_fs_fs design, and since I
>>>continue to think that it's of minimal value in general, I'm not very
>>>fond of it.
>>>
>>>
>>I can't agree that it is of minimal value. The fact that you can't do
>>efficient searches of revprops is a big limitation. Right now the only
>>fast index is the revision number, and I see this as a usability
>>misfeature because it makes CM tracking and reporting so much harder. It
>>may not be a big deal for your average student project, but it's fairly
>>major if you want to use SVN to implement any serious quality management
>>process.
>>
>>
>
>I feel like we have a fundamental conflict here. Subversion was
>originally conceived of as a version control tool,
>
(raises hand) I thought 1.0 was conceived as a CVS replacement? That
doesn't impose constraints for the future...

> and with its current
>feature set we can have an implementation of it which is flexible and
>low-overhead. If we want to turn it into Clearcase, we'll lose that
>ability, because it will become too unwieldy to index the repository in
>all the desired ways without an Oracle-caliber database.
>
Oh, I can absolutely agree that we don't want to change Subversion into
ClearCase. If nothing else, we're light years ahead of CC in terms of
the fundamental CM model, and of course we don't want to take a step
back in that respect.

I know that's not what you meant by your comment. :-)

>Moreover, our
>learning curve will suffer as our command set grows to encompass a set
>of features most people will never need.
>
>
I don't buy the idea that a user can only effectively use a tool if she
can learn the UI by heart. And right now most people have trouble
understanding tagging in SVN; the labeling set of commands (which is
only one use of revision indexes) would make their life -- and ours! --
simpler.

>Of course, we could implement an SQL back end and let people build
>layered products on top of Subversion which take advantage of whatever
>indexing an SQL database can provide. But that's different from
>providing core Subversion features aimed at implementing a full-fledged
>CM system.
>
>
Yes, it is. And note that I'm not proposing, e.g., integrating change
management or document flow control into the core. Rather, I'm proposing
a generic feature that will _support_ the implementation of such tools
regardless of what kind of back-end you happen to use. Labels and date
indexes are just a spinoff.

I quite understand that you see SVN in a different context than I do; I
don't expect you're being paid for designing large-scale SCM solutions.
But even if you look at only the open-source community aspect, there are
projects out there that could immediately start using what I propose.
GCC comes to mind, for example (although they'll probably wait for arch
to mature... dunno), and there are many others of similar eminence and size.

Anyway, I think I can take a step back and agree to keep the revision
ordering constraint in 1.x, at least. I'd still like to use an index for
dates rather than doing the binary walk explicitly (what nonsense -- BDB
already does that for us), and I still think labels would be useful,
although they're really a feature on top of revision indexes -- but they
seem to be the killer app in the short term, heh.

-- Brane

Received on Wed Apr 28 00:23:03 2004
