Re: keeping legacy repo in sync with refactored repo

From: Holger Stratmann <tigris_at_finch.de>
Date: Tue, 03 Mar 2009 01:54:16 +0100

Tyler wrote:
> Guten Tag Holger,

Guten Abend Tyler ;-)
(well, it's the middle of the night for me...)

I don't want to "hijack" the thread or turn it into a dialog (which
sometimes happens with long responses), so I'll post this now and then
give everybody else a chance to comment ;-)
I'll follow the thread, though (so go ahead and answer my questions...)
and might post a little more a little later ;-)

So here we go:

> There will be ongoing development (commits) in both repositories.
>
> We are doing this because one of our products will do another release
> from the legacy repository. At the same time, several other products are
> migrating to a brand new structure and a brand new build system.

Yes.
I still *really* don't see why you need a new repository for that.

The big question here (IMHO) is:
What does a new repository do for you that a branch wouldn't?

>
> The product that will live on in the legacy tree is big and complex and
> has an old, crufty build system. One of the reasons for going to the
> refactored repo is to break that product up into smaller pieces and to
> replace the crufty build system with something modern.
>
Yes, that explains the need for the refactoring (which I never
questioned in the first place...), but it still doesn't explain the need
for a new repository.
>
> I recognize that this is a recipe for disaster, so changes to some
> products will be made only in the legacy repo while changes to other
> products will be made only in the refactored repo. This will be
> enforced via education/policy or (when that doesn't work) by pre-commit
> hooks.
>
Good idea.
A bit simpler than pre-commit hooks: access control. You can just set
certain directories to "read-only". Really simple with Subversion's
path-based authorization (the authz file).
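A minimal sketch of what such read-only rules could look like, written as a small Python helper that renders an authz file. It assumes (hypothetically) a single repository in which every product lives at <line>/<product> on both lines of development (trunk and branches/legacy), and a group called developers with made-up members; your real paths, group and user names will differ.

```python
# Hypothetical mapping: product -> line of development where it is maintained.
MAINTAINED_ON = {
    "productA": "trunk",            # maintained in the refactored structure
    "productB": "branches/legacy",  # maintained in the legacy structure
}
LINES = ["trunk", "branches/legacy"]


def render_authz(maintained_on, lines, group="developers", members=("alice", "bob")):
    """Render authz rules: everyone may read everything; the group may write
    a product's directory only on the line where that product is maintained."""
    out = [
        "[groups]",
        f"{group} = {', '.join(members)}",
        "",
        "[/]",
        "* = r",  # read access for everyone, write access for nobody by default
        "",
    ]
    for line in lines:
        for product, home in sorted(maintained_on.items()):
            access = "rw" if home == line else "r"
            # The most specific matching path section wins, so this
            # overrides the repository-wide default above.
            out += [f"[/{line}/{product}]", f"@{group} = {access}", ""]
    return "\n".join(out)


if __name__ == "__main__":
    print(render_authz(MAINTAINED_ON, LINES))
```

Note that this only grants write access to the product directories themselves; anything shared (build scripts, etc.) would need its own section.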

> I definitely prefer retaining all the history, but it might be
> acceptable to have a cut -- e.g. All revisions before
> $ARBITRARY_TIMESTAMP are in this repository; if you need something
> older, look in the old repository.

That would work, but it still wouldn't be nearly as nice.
It would "break" stuff like blame for quite a while (until the switch is
"so long ago" that nobody cares any more - which in my experience takes
2-5 years, especially for "blame" *g* - for a very old product, it might
take longer...).

>
> What would be best is to have it happen continuously. The way I see it,
> every commit in the legacy repo triggers a post-commit hook that dumps
> the committed revision, massages it, and sends it to the refactored repo
> for import.
I don't want to belabor the point (well, maybe I do *g*), but you could
do exactly the same thing with just one repository and two branches,
couldn't you?
You'd even have quite a few advantages with one repository, like merge
tracking (see below), or the ability to merge at all...
> Conflicts or other abnormal conditions would require manual
> intervention.
>
Hmm, yes. And what happens during the time until the conflict is
resolved (manually)?
Nobody else is allowed to commit? How would you ensure that?
Alternatively, your "automatic commits" would need to be stopped until
the conflict is resolved and then restarted. What if another conflict
happens then? That could be a while after the (other) developer
committed and he might've gone home already.
With your system, it would be almost impossible to "switch" the order of
the synchronization (i.e. copy revision 1235 before copying revision
1234), even if they affect completely different sets of files.

>
> We will be releasing new versions of products from each repo for the
> next year or so, so we need to be able to have both "views" on the
> source code simultaneously.
>
> It might be possible to handle this with branches (as you suggest below)
> but I feel like that would turn into a tangled mess very quickly.

It might, but IMHO, it would be (much!) less messy and less tangled than
two repositories.
Here's my challenge for you: Construct a scenario that's easy with two
repositories, but hard with two branches ;-)))
In return, I'll give you a handful of scenarios that are easier with
branches ;-)

> Would
> merges from the "legacy branch" to the "refactored branch" work
> correctly if one of those branches has a bunch of files moved to
> different paths?
All in all: yes!
It wouldn't be "fully automatic" of course. You'd probably (? - I don't
know how 1.6 behaves - supposedly, it has a few improvements in this
area) still have to tell Subversion which file and directory to merge to
which new location, BUT that would still be soooooo much easier and less
error prone than two different repositories.
Subversion would do most of the work for you, would automatically
preserve changes you made in the new structure and would automatically
warn you if things go wrong (merge conflicts) instead of just silently
overwriting and destroying changes - which might happen with your "repo
to repo copy script".
All the cases that would be easy with a script (trivial merges or no
merges at all) would be even easier with "semi-automatic-semi-scripted"
merges.

Ok, you wouldn't be able to simply do (from a working copy of
refactored/trunk, with $REPO standing for your repository URL):
svn merge $REPO/legacy/trunk .

BUT this should work:
svn merge $REPO/legacy/trunk/root/dir1 product/src
svn merge $REPO/legacy/trunk/root/dir2 product/doc
svn merge $REPO/legacy/trunk/root/dir3 product/lib
svn merge $REPO/legacy/trunk/root/funnyfile.txt product/misc/funnythings/file1.txt

You would need a "translation map" (as far as I understand the current
merge capabilities of Subversion and the consequences of the lack of
true renames... maybe you want to ask a separate question about this if
this is the path you're going).
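A minimal sketch of such a translation-map script in Python, using the example paths from above. The repository URL is a hypothetical placeholder, the function names are made up, and it assumes an svn 1.5+ client on the PATH, where "svn merge SOURCE_URL TARGET_PATH" with no revision range merges every revision that merge tracking says is still eligible.

```python
import subprocess

# Hypothetical repository URL; replace with your own.
REPO_URL = "http://svn.example.com/repos"

# Translation map: old location (relative to the repository root)
# -> new location (relative to a working copy of refactored/trunk).
TRANSLATION_MAP = {
    "legacy/trunk/root/dir1": "product/src",
    "legacy/trunk/root/dir2": "product/doc",
    "legacy/trunk/root/dir3": "product/lib",
    "legacy/trunk/root/funnyfile.txt": "product/misc/funnythings/file1.txt",
}


def merge_commands(repo_url, translation_map, dry_run=False):
    """Build one 'svn merge SOURCE_URL TARGET_PATH' command per map entry."""
    cmds = []
    for old_path, new_path in translation_map.items():
        cmd = ["svn", "merge"]
        if dry_run:
            cmd.append("--dry-run")  # only report what would change
        cmd += [f"{repo_url.rstrip('/')}/{old_path}", new_path]
        cmds.append(cmd)
    return cmds


def run_merges(wc_dir, repo_url=REPO_URL, translation_map=TRANSLATION_MAP, dry_run=True):
    """Run the merges inside the working copy at wc_dir; stop at the first failure."""
    for cmd in merge_commands(repo_url, translation_map, dry_run):
        subprocess.run(cmd, cwd=wc_dir, check=True)


if __name__ == "__main__":
    for cmd in merge_commands(REPO_URL, TRANSLATION_MAP):
        print(" ".join(cmd))
```

run_merges defaults to a dry run; call it with dry_run=False for real, then review conflicts and commit the working copy. Because the merged revisions are recorded in svn:mergeinfo, running it again only picks up what's new.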
BUT Subversion would
a) do the "merge" for you even if there were changes in the target...
b) track the changes for you and tell you which changes were already
merged. You wouldn't even have to merge the revisions one by one and (!)
you wouldn't have to remember which revisions are already merged. You
could just run the same script over and over...
Let's say a developer made changes that are "non-trivial" to merge. S/He
could just perform the merge (assuming they know what they're doing) and
wouldn't even (necessarily) have to tell you, because Subversion would
record the merge and your script would automatically know that this
particular revision has already been merged manually.
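To see what is still left to merge, a script can ask Subversion's merge tracking directly. A minimal sketch, assuming Python 3.7+ and an svn 1.5+ client on the PATH: "svn mergeinfo --show-revs eligible SOURCE TARGET" lists the revisions of SOURCE not yet merged into TARGET, one "rNNN" per line (a trailing "*", which some versions print for non-inheritable ranges, is tolerated here). The function names are hypothetical.

```python
import re
import subprocess


def parse_revisions(mergeinfo_output):
    """Turn 'svn mergeinfo --show-revs ...' output into a list of ints."""
    revs = []
    for line in mergeinfo_output.splitlines():
        m = re.fullmatch(r"\s*r(\d+)\*?\s*", line)
        if m:
            revs.append(int(m.group(1)))
    return revs


def eligible_revisions(source_url, target_path):
    """Revisions of source_url that have not yet been merged into target_path."""
    out = subprocess.run(
        ["svn", "mergeinfo", "--show-revs", "eligible", source_url, target_path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_revisions(out)
```

A revision a developer already merged by hand simply drops out of that list, so the script never merges it twice.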

Don't forget to put your script into the repository and keep it current.
Developers can use it to merge their own changes any time they feel like
it... (or when you tell them to :-))

> Would a developer be able to follow and understand the
> history after a few of these merges?
>

Yes, because the "normal history" ("svn log" in the new repository
branch/location) would show the merges as new revisions just as it would
for your synchronization script.
Added bonus: IF somebody is curious, they could still see more info
about the merges.

Example:
If you "copy the changes from one repository to the other", who would be
the "author" of the changes? It would probably be either you or some
"system user", right?
With a branch, the history would look the same, BUT with a special
switch (which I'd have to look up), it would show the "real" author of
the change... This would only be "on special request".
(ah yes, here it is: it's -g, a.k.a. --use-merge-history, for both
svn log and svn blame:
http://svnbook.red-bean.com/en/1.5/svn.branchmerge.advanced.html#svn.branchmerge.advanced.logblame)

If you decide to merge/sync every revision separately, you could work
around this by knowing everybody's password... (not so nice either...
people might lose faith in the logs after your sync made a mistake or
two...).

>> 2) why you want to keep them in sync for a while instead of switching
>> "Sunday, 12:00:00 PST" (or whichever timezone you choose ;-)).
>
> Products will be released from both repositories simultaneously.
>
Uhh... so?
* What do you mean: ONE product will be released from both repositories
simultaneously? Or one product from the old one and one product from the
new one?

* Where is the problem? Your build system can build from a branch,
right? Of course you wouldn't *delete* the old structure...

>> Personally, I think it would be better to have a "big bang" change for
>> each project (not necessarily the whole repo) AND use "svn move" to do
>> the refactoring.
>> You'd keep your entire history, which would be a big plus.
>
> The projects are, of course, tightly-coupled (In fact, I have been known
> to refer to their relationship as 'incestuous' :)) so that moving each
> project independently is not easy.
>
Oh wow.
In this case I have a follow-up question:
Let's say you copy stuff to the new repository and move everything to
its new location: Will the source code compile (what type of products
are we talking about?) without changes??
If you need to make any changes to the code to "make it
work/compile/validate/...", your "two-repository-approach" is doomed...
(IMHO *g*). How would you preserve those changes when copying changes
from the old repository?

You say: "changes to some products will be made only in the legacy repo
while changes to other products will be made only in the refactored repo"

So you copy A+B+C over to the new repo. A uses B. B uses C.
A+C is maintained in the new repository, B in the old one.
Now somebody makes changes to B in the old repository. What do you do? If you "copy the changes over", it will not work any more because the references to C are broken (once again)...

I don't know how many products you have, but in theory, you'd have to build a dependency graph and be VERY careful about the order of migration... if they are incestuous, there might be no way to solve this...
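Finding that order is a classic topological sort. A minimal sketch (Kahn's algorithm, nothing Subversion provides; the function name and data are hypothetical), assuming you migrate a product only after everything it uses: it returns a migration order plus the products stuck in a dependency cycle, which are exactly the "incestuous" ones that cannot be ordered.

```python
from collections import deque


def migration_order(deps):
    """deps maps each product to the set of products it uses.
    Returns (order, stuck): order lists products so that each one comes after
    everything it uses; stuck lists products caught in a dependency cycle."""
    # Collect every product, including ones only mentioned as dependencies.
    nodes = set(deps)
    for used in deps.values():
        nodes |= set(used)
    # remaining[p] = number of products p uses that are not migrated yet.
    remaining = {p: len(set(deps.get(p, ()))) for p in nodes}
    users = {p: [] for p in nodes}
    for p, used in deps.items():
        for u in set(used):
            users[u].append(p)
    ready = deque(sorted(p for p in nodes if remaining[p] == 0))
    order = []
    while ready:
        p = ready.popleft()
        order.append(p)
        for q in sorted(users[p]):
            remaining[q] -= 1
            if remaining[q] == 0:
                ready.append(q)
    stuck = sorted(nodes - set(order))
    return order, stuck
```

With the A/B/C example above, migration_order({"A": {"B"}, "B": {"C"}}) gives (["C", "B", "A"], []); if B also used A, A and B would end up in the stuck list.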

>> P.S.: Of course you would make a tag or a branch before you start the
>> refactoring: If people still need access to the old structure, they can
>> have it. They can even commit changes to the old structure and merge
>> them to the new structure "in a somewhat deterministic fashion"... (I'm
>> not sure how well Subversion handles this type of merge (to files which
>> have moved) and if it would be considered a "tree conflict"? In any
>> case: Merge tracking could help you keep track of merges...)
>
> If anyone has experience with this, I'd love to hear it. I'll set up a
> test repo and play with it when I have time but if someone knows
> something already that would be helpful.
>
> Thanks for your questions and advice, Holger. Hopefully this makes
> things a little clearer and y'all can give me more suggestions!
The more I think about this, the more it looks like "just another
release branch" - with radical changes in the trunk, but so what...
You're still building (and delivering) your 1.x versions from the branch
while refactoring and building the 2.x versions in the trunk...
Still, most of the procedures for doing this are established and while
it won't be easy, it's just a question of using the tools and
configuring (!) everything correctly rather than developing a complex
solution from scratch... In my mind, the scope (amount of work involved)
of this project just decreased by 85% :-D

Hope that helps :-D

Holger
>
> Thanks,
> tyler
>
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1259056

Received on 2009-03-03 01:55:34 CET
