
Re: Finding other descendants - efficiently

From: C. Michael Pilato <cmpilato_at_collab.net>
Date: 2007-04-03 17:06:56 CEST

Michael Sinz wrote:
> The question of finding the copy sources (see "[RFC] Identifying
> copy/move sources" email thread) rekindled a "need" to do the inverse
> - finding where a file/revision was branched/copied to.
> That is, it would be great to find where a specific revision of a file
> was copied to such that I could, for example, track down all branches
> that may need a specific fix or other "genealogy" type attributes from
> the trunk or source file. (One that I was hoping to provide is a way
> to find the "branches" that could be merged back to this part of
> trunk, but that is being solved in our development process and naming
> conventions)

This is a very common request from CollabNet's Subversion-using customers,
and a key part of the auditability aspect of version control.

> Right now, this process is hard for a few reasons:
> 1) Using normal interfaces you can only really find the source of a
> copy, not the destination(s).

"normal interfaces"?

> 2) Even if you get the whole repository log (including the path
> details which is needed to find the copies) it does not always
> indicate the file but just the directory. This complicates things,
> especially when the tag/branch came from a mixed revision working
> copy.

Complicates, yes, but I think the data integrity is still intact. What's
*not* there that really, really ought to be, though, is a way to distinguish
files from directories in the log changed-paths list. :-(
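To illustrate the mixed-revision case: when a tag is made from a mixed-revision working copy, the commit records a copy of the top directory plus separate copy entries for children that were at other revisions. The Python sketch below resolves a file's copy source from such a set of entries by taking the most specific (longest) matching destination. The CopyEntry model and the function names are hypothetical. It assumes the copy entries for the tagging revision have already been extracted from a verbose log.

```python
from typing import List, NamedTuple, Optional, Tuple


class CopyEntry(NamedTuple):
    """One changed-path entry that carries copyfrom info (hypothetical model)."""
    dest: str     # path created by the copy, e.g. "/tags/1.0"
    src: str      # copyfrom path, e.g. "/trunk"
    src_rev: int  # copyfrom revision


def is_ancestor_or_self(parent: str, path: str) -> bool:
    """True if `path` equals `parent` or lies somewhere beneath it."""
    return path == parent or path.startswith(parent.rstrip("/") + "/")


def resolve_copy_source(
    path: str, entries: List[CopyEntry]
) -> Optional[Tuple[str, int]]:
    """Return (source_path, source_rev) for `path`, or None if it was not copied.

    The longest matching destination wins, so a file copied individually from
    another revision overrides the copy of its parent directory.
    """
    matches = [e for e in entries if is_ancestor_or_self(e.dest, path)]
    if not matches:
        return None
    best = max(matches, key=lambda e: len(e.dest))
    suffix = path[len(best.dest):]  # part of `path` below the copied node
    return best.src + suffix, best.src_rev
```

With entries for "/tags/1.0" (from /trunk@100) and "/tags/1.0/src/main.c" (from /trunk/src/main.c@97), the path /tags/1.0/README resolves through the directory copy to /trunk/README@100. By contrast, /tags/1.0/src/main.c resolves to its own entry, /trunk/src/main.c@97.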

> 3) Even if we can solve the #2 complexity, there is the fact that
> getting the whole log from a large repository with many revisions is
> very, very costly.

Yup. (Though softened if you implement a client-side cache of
already-parsed revisions.)
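Here is a minimal sketch of such a cache in Python. Committed revisions never change, so a revision's changed-paths list can be cached indefinitely, and each later query only needs to fetch revisions newer than the youngest one already cached. The `fetch_range` callable and the `RevisionCache` class are hypothetical stand-ins for whatever actually talks to the server. Revision properties such as log messages can be edited after the fact, which is why this sketch caches only changed paths.

```python
from typing import Callable, Dict, List

# Hypothetical type: the changed paths recorded for a single revision.
ChangedPaths = List[str]


class RevisionCache:
    """Client-side cache of already-parsed log entries, keyed by revision.

    `fetch_range(lo, hi)` is an assumed callable returning parsed entries for
    the inclusive revision range lo..hi; it is not a real Subversion API.
    """

    def __init__(self, fetch_range: Callable[[int, int], Dict[int, ChangedPaths]]) -> None:
        self._fetch_range = fetch_range
        self._entries: Dict[int, ChangedPaths] = {}
        # Highest revision cached contiguously from r1 upward.
        self._youngest_cached = 0

    def get(self, head: int) -> Dict[int, ChangedPaths]:
        """Return entries for r1..head, fetching only revisions not seen before."""
        if head > self._youngest_cached:
            self._entries.update(self._fetch_range(self._youngest_cached + 1, head))
            self._youngest_cached = head
        return {rev: paths for rev, paths in self._entries.items() if rev <= head}
```

A first query for HEAD pays for the whole history once. After that, each query costs only the commits made since the previous one.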

> Now, I understand that this may not be easily done. In fact, given
> how the FS works, there is little in the way of any forward linkages
> (hey, it is a DAG, so why would there be). Also, the fact that past
> revisions do not get modified is a major win for a number of reasons,
> so where would such information be stored. (I have thought about
> doing this in a post-commit hook and storing some path/revision texts
> within a revprop but...)
> Does anyone have a better idea of what could be done? Maybe the
> performance and size of the log issue will get greatly reduced by the
> --only-copies capability. That may make generating the genealogy data
> a bit easier for larger systems. (Maybe that is really the only
> reasonable answer at this time?)

The algorithm for answering the question, "Where live all the files whose
history includes PATH@REV?" involves finding all copies made from PATH
between REV and HEAD, plus those made from any of PATH's parent directories
between those same revisions, plus copies of those copies, copies of the
copies of those copies, etc., until all successors have been checked. I
think you might get to rule out soft-copies (since there should be some real
copy that covers that case elsewhere in the crawl). And having successor
links in the FS DAG (just like we have predecessor links) would certainly
help a lot. Being able to look at a node and say, "Where are your
descendants?" and get back a list that includes the next version in that line
of history plus all the nodes made as copies of the one in question helps.
But I think you're still looking at a sort of brute force crawl.
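Here is a rough Python sketch of that crawl. It assumes the copy records (source path and revision, destination path and revision) have already been pulled out of a verbose log. The `Copy` type and `find_descendants` function are hypothetical. Deletions, replacements, soft-copies, and copies of the repository root are ignored for brevity.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Copy(NamedTuple):
    """A copy recorded in the log (hypothetical model): src@src_rev -> dest@dest_rev."""
    src: str
    src_rev: int
    dest: str
    dest_rev: int


def parents_and_self(path: str) -> List[str]:
    """'/trunk/src/a.c' -> ['/trunk/src/a.c', '/trunk/src', '/trunk'] (root excluded)."""
    parts = path.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(len(parts), 0, -1)]


def find_descendants(path: str, rev: int, copies: Iterable[Copy]) -> Set[Tuple[str, int]]:
    """Brute-force crawl for every (path, rev) whose history includes path@rev.

    Assumes a line of history stays alive from its start revision onward,
    i.e. nothing is deleted or replaced along the way.
    """
    # Invert the copy records into "successor links": source path -> copies made from it.
    by_source: Dict[str, List[Copy]] = defaultdict(list)
    for c in copies:
        by_source[c.src].append(c)

    found: Set[Tuple[str, int]] = set()
    queue = deque([(path, rev)])
    while queue:
        cur_path, cur_rev = queue.popleft()
        # Copies of the node itself or of any parent directory, made from a
        # revision at or after the one where this line of history starts.
        for ancestor in parents_and_self(cur_path):
            for c in by_source.get(ancestor, []):
                if c.src_rev < cur_rev:
                    continue
                new_path = c.dest + cur_path[len(ancestor):]
                hit = (new_path, c.dest_rev)
                if hit not in found:
                    found.add(hit)
                    queue.append(hit)  # then look for copies of this copy, and so on
    return found
```

The `by_source` index plays the role of the successor links mentioned above. Even with it, the loop is still a brute-force crawl over every copy reachable from PATH@REV; the index just makes each lookup cheaper.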

C. Michael Pilato <cmpilato@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

Received on Tue Apr 3 17:07:11 2007

This is an archived mail posted to the Subversion Dev mailing list.
