
taking part of a repository: beyond svndumpfilter

From: Jay Berkenbilt <ejb_at_ql.org>
Date: 2005-10-12 06:22:54 CEST

I'd like to extract from my subversion repository the entire history
of some files, specified by their paths in the latest revision of the
repository, following them as they've moved around through various
layouts, branches, and so forth. It seems to me that if I use
svndumpfilter include /some/path, then any transaction to a file that
was at or below /some/path *at the time of the transaction* will be
included, whereas what I want is to include historical information
only about files at a certain location in the HEAD revision, even if
they lived in other places at other times. My repository is too big and complicated
to even think about creating a set of include paths that would cover
all these variations. This is a 1.3 GB repository, converted from CVS
six months ago, that includes 10 years of history on 300 software
products, many of which are tiny utilities customized to perform a
specific task for a specific client. There are almost 27,000
revisions in the repository. Some general-purpose tools in it would
be useful on their own, and I want to extract them with their history
intact. Before I write my own code to do this, I
thought I'd post here to see if anyone has any suggestions of
something I may have overlooked or something that may already exist.
(I did check the links section of subversion.tigris.org before
posting.)
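To make the difference concrete, here is a minimal Python sketch of the filtering rule described above: a change is kept only if the path it touched, as of that revision, falls under an include prefix. The `Change` record, `under`, `prefix_filter`, and the example paths are hypothetical; this models the behavior as described in this message and is not svndumpfilter's actual code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Change:
    """One node change in one revision (a simplified, hypothetical model)."""
    rev: int
    path: str
    action: str                          # "add", "change", "delete", "replace"
    copyfrom_path: Optional[str] = None
    copyfrom_rev: Optional[int] = None


def under(path: str, prefix: str) -> bool:
    """True if `path` is `prefix` itself or lies somewhere below it."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def prefix_filter(changes: Iterable[Change], includes: List[str]) -> List[Change]:
    """Keep a change only if its path *at that revision* is under an include prefix."""
    return [c for c in changes if any(under(c.path, p) for p in includes)]


# A file that starts life in an application and is later copied into a library.
example_changes = [
    Change(1, "/apps/report/util.c", "add"),
    Change(2, "/apps/report/util.c", "change"),
    Change(3, "/lib/util.c", "add", "/apps/report/util.c", 2),
    Change(4, "/lib/util.c", "change"),
]

print([c.rev for c in prefix_filter(example_changes, ["/lib"])])
# prints [3, 4]
```

Only revisions 3 and 4 survive: the file's earlier history under /apps is dropped, and the copy in r3 now points at a source that is not in the filtered output.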

Here's an example of why what I want to do is complicated. I have a
library that I want to extract so that it can be released as an open
source project, and I want to preserve its history. A particular
source file in the library may have originated as a source file in some
specific application before we realized that it should be library
code. That source file may have had various changes on various side
branches. So the history of that library file may actually intersect
with some path in the repository that I don't want to include.
Quoting from the subversion book's information on svndumpfilter:

   Also, copied paths can give you some trouble. Subversion supports
   copy operations in the repository, where a new path is created by
   copying some already existing path. It is possible that at some
   point in the lifetime of your repository, you might have copied a
   file or directory from some location that svndumpfilter is
   excluding, to a location that it is including. In order to make the
   dump data self-sufficient, svndumpfilter needs to still show the
   addition of the new path—including the contents of any files
   created by the copy—and not represent that addition as a copy from
   a source that won't exist in your filtered dump data stream. But
   because the Subversion repository dump format only shows what was
   changed in each revision, the contents of the copy source might not
   be readily available. If you suspect that you have any copies of
   this sort in your repository, you might want to rethink your set of
   included/excluded paths.
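Copies of this sort can at least be detected mechanically before filtering. The following minimal sketch, assuming a version 2 dump stream, reads only the header block of each record (skipping property and text bodies by `Content-length`) and reports included nodes whose `Node-copyfrom-path` lies outside the include set. The function names `read_records`, `is_included`, and `find_unsafe_copies`, the hand-written sample stream, and the simple prefix test are illustrative assumptions; the sketch detects the problem the book describes but does not repair it.

```python
import io
from typing import BinaryIO, Dict, Iterator, List, Tuple


def read_records(stream: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield each record's header block as a dict, skipping record bodies."""
    while True:
        line = stream.readline()
        if not line:
            return                       # end of stream
        if line == b"\n":
            continue                     # blank lines separate records
        headers: Dict[str, str] = {}
        while line and line != b"\n":    # headers end at the first blank line
            key, _, value = line.decode("utf-8").rstrip("\n").partition(": ")
            headers[key] = value
            line = stream.readline()
        # Content-length covers property and text bodies together; skip them.
        stream.read(int(headers.get("Content-length", "0")))
        yield headers


def is_included(path: str, includes: List[str]) -> bool:
    """Prefix test; dump paths carry no leading slash, so strip it from both sides."""
    path = path.strip("/")
    for prefix in includes:
        prefix = prefix.strip("/")
        if path == prefix or path.startswith(prefix + "/"):
            return True
    return False


def find_unsafe_copies(stream: BinaryIO, includes: List[str]) -> List[Tuple[int, str, str]]:
    """Return (rev, path, copyfrom_path) for included nodes copied from excluded paths."""
    unsafe = []
    rev = -1
    for headers in read_records(stream):
        if "Revision-number" in headers:
            rev = int(headers["Revision-number"])
        elif "Node-path" in headers:
            src = headers.get("Node-copyfrom-path")
            if (src is not None and is_included(headers["Node-path"], includes)
                    and not is_included(src, includes)):
                unsafe.append((rev, headers["Node-path"], src))
    return unsafe


# A tiny hand-written dump: r1 adds a file under apps/, r2 copies it into lib/.
SAMPLE = (
    b"SVN-fs-dump-format-version: 2\n\n"
    b"Revision-number: 1\nProp-content-length: 10\nContent-length: 10\n\nPROPS-END\n\n"
    b"Node-path: apps/report/util.c\nNode-kind: file\nNode-action: add\n"
    b"Text-content-length: 6\nContent-length: 6\n\nhello\n\n"
    b"Revision-number: 2\nProp-content-length: 10\nContent-length: 10\n\nPROPS-END\n\n"
    b"Node-path: lib/util.c\nNode-kind: file\nNode-action: add\n"
    b"Node-copyfrom-rev: 1\nNode-copyfrom-path: apps/report/util.c\n\n"
)

print(find_unsafe_copies(io.BytesIO(SAMPLE), ["lib"]))
# prints [(2, 'lib/util.c', 'apps/report/util.c')]
```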

The basic functionality I'm thinking about implementing is to give a
program a list of paths that I care about, where these paths are all
defined with respect to the latest revision. Then I'd have to work
backwards through the repository to figure out which nodes in which
revisions contribute to each file's history, and just keep those parts
of those revisions. I could do this in two passes by processing the
dump file, but I'm struck by the realization that this feels like
reinventing the very way in which subversion stores revision
information about nodes. Still, I can't imagine that I'm the only one
who's ever wanted to do this. I'm not really looking for help on how
to implement this -- I have a pretty clear idea of how to implement it
from the dump file. I'm really looking for tips as to whether
something may already exist or whether someone else has already solved
this problem on a large scale.
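The backwards walk described above can be sketched in a few dozen lines of Python, assuming the node changes have already been collected from the dump in a first pass (for instance from the `Node-path`, `Node-action`, `Node-copyfrom-path`, and `Node-copyfrom-rev` headers). The `Change` record, `trace_history`, and the example repository are hypothetical; deletes are ignored and only file-level nodes are kept, so this illustrates the idea rather than serving as a complete tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Change:
    """One node change in one revision (a simplified, hypothetical model)."""
    rev: int
    path: str
    action: str                          # "add", "change", "delete", "replace"
    copyfrom_path: Optional[str] = None
    copyfrom_rev: Optional[int] = None


def is_self_or_ancestor(a: str, p: str) -> bool:
    """True if path `a` is `p` itself or one of its parent directories."""
    return p == a or p.startswith(a.rstrip("/") + "/")


def _track(updated: Dict[str, int], path: str, upto: int) -> None:
    """Record a name to follow; if two histories converge on it, keep the later cutoff."""
    updated[path] = max(updated.get(path, 0), upto)


def trace_history(changes: Iterable[Change], head_paths: List[str],
                  head_rev: int) -> List[Tuple[int, str]]:
    """Walk revisions backwards from head_rev, following each HEAD path through
    copies (including copies of a parent directory, such as branch creation),
    and return the (rev, path) node changes that belong to its history."""
    by_rev: Dict[int, List[Change]] = defaultdict(list)
    for c in changes:
        by_rev[c.rev].append(c)

    # Tracked name -> newest revision at which that name is part of the history.
    wanted: Dict[str, int] = {p: head_rev for p in head_paths}
    keep: Set[Tuple[int, str]] = set()

    for rev in range(head_rev, 0, -1):
        updated: Dict[str, int] = {}
        for path, upto in wanted.items():
            origin: Optional[Change] = None
            if rev <= upto:
                for c in by_rev.get(rev, []):
                    if c.path == path and c.action != "delete":
                        keep.add((rev, path))
                    # The deepest add/replace at or above `path` is where this name began.
                    if (c.action in ("add", "replace") and is_self_or_ancestor(c.path, path)
                            and (origin is None or len(c.path) > len(origin.path))):
                        origin = c
            if origin is None:
                _track(updated, path, upto)    # same name further back in time
            elif origin.copyfrom_path is not None:
                # Translate to the name it had in the copy source, at the source revision.
                src = origin.copyfrom_path + path[len(origin.path):]
                _track(updated, src, origin.copyfrom_rev)
            # A plain add with no copy source: this file's history starts here.
        wanted = updated

    return sorted(keep)


# The library scenario: util.c starts in an application, gets a change on a side
# branch, and is later copied into a library directory.
history = [
    Change(1, "/trunk", "add"),
    Change(1, "/trunk/app", "add"),
    Change(1, "/trunk/app/util.c", "add"),
    Change(2, "/trunk/app/util.c", "change"),
    Change(3, "/branches/fix", "add", "/trunk", 2),
    Change(4, "/branches/fix/app/util.c", "change"),
    Change(5, "/trunk/app/util.c", "change"),
    Change(6, "/trunk/lib", "add"),
    Change(6, "/trunk/lib/util.c", "add", "/trunk/app/util.c", 5),
    Change(7, "/trunk/lib/util.c", "change"),
]

print(trace_history(history, ["/trunk/lib/util.c"], 7))
# prints [(1, '/trunk/app/util.c'), (2, '/trunk/app/util.c'), (5, '/trunk/app/util.c'), (6, '/trunk/lib/util.c'), (7, '/trunk/lib/util.c')]
```

Starting instead from `/branches/fix/app/util.c` yields revisions 1, 2, and 4: the trace crosses the directory copy in r3 and stops following trunk at r2, so the later trunk change in r5 is left out. A loadable filtered dump would still need the parent directory records, and would have to turn any copy whose source was dropped into a plain add with full contents; the sketch leaves both out.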

-- 
Jay Berkenbilt <ejb@ql.org>
Received on Wed Oct 12 06:24:34 2005

This is an archived mail posted to the Subversion Users mailing list.
