
Re: Distributed Subversion

From: Nathan Nobbe <quickshiftin_at_gmail.com>
Date: Mon, 8 Jun 2009 12:10:30 -0600

On Mon, Jun 8, 2009 at 9:04 AM, David Weintraub <qazwart_at_gmail.com> wrote:

> On Mon, Jun 8, 2009 at 4:49 AM, Marko Käning <mk362_at_mch.osram.de> wrote:
> > what are your users running as an operating system? in this situation, it
> > sounds like a true dvcs would excel. and actually, if your users aren't on
> > windows, i'd wholeheartedly recommend git-svn.
> >
> > I'd also recommend to go for a real distributed VCS!
>
> Sooner or later, even distributed repositories have to send information
> back over the network to the original repository. And, if there are
> network issues, then you'll have the same issues all over again.

i think you'll find yourself hitting the network much less often than you do
w/ svn, where most other operations need it too. let's take a quick look,

SVN:
 diff, log, merge, co, branch

all require network IO (a plain working-copy diff against BASE aside).. and i
can do all of these offline w/ dvcs. even where i work, on a high performance
intranet, repository introspection and diff'ing are way, *way* faster w/
git-svn. over the WAN the savings will be substantial to say the least.
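
to make the comparison above a bit more concrete, here is a rough sketch in python. it is not something svn or git ship with: the table of which operations touch the network is a simplification (it assumes svn keeps only pristine BASE copies locally, and that a git-svn clone holds the full history), and the names NETWORK_OPS and network_hits are made up for illustration.

```python
# Which everyday operations need a round trip to the server.
# Simplified assumption: svn keeps only pristine BASE copies locally,
# while a git-svn clone keeps the full history locally.
NETWORK_OPS = {
    "svn": {"checkout", "update", "commit", "log", "merge", "branch",
            "diff-revisions"},
    "git-svn": {"clone", "rebase", "dcommit"},
}


def network_hits(tool, session):
    """Count how many operations in a work session touch the network."""
    return sum(1 for op in session if op in NETWORK_OPS[tool])


# A hypothetical afternoon of work, expressed as generic operation names.
svn_session = ["update", "log", "diff-revisions", "branch", "merge", "commit"]
git_session = ["rebase", "log", "diff-revisions", "branch", "merge", "dcommit"]

print(network_hits("svn", svn_session))      # 6
print(network_hits("git-svn", git_session))  # 2
```

the point is only the ratio: w/ git-svn the network is needed when syncing (git svn rebase, git svn dcommit), not for every look at the history.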

> There are some real problems with distributed repository systems that most
> people tend to brush off. They feel if Git is good enough for Linux, it's
> good enough for them, but they don't understand the issues that Git is
> trying to solve:
>
> The big issue for Linux is how do you let tens of thousands of people make
> contributions without having to maintain tens of thousands of developers in
> a database. A distributed repository handles this well. I could checkout
> from Linus Torvalds's version of the Linux repository, make my changes, but I
> doubt Linus will accept them.

scalability is only one problem dvcs attempts to address.. it just happens
that it's turning out better than conventional centralized systems on several
levels. one amazing aspect, for example, is that it scales beautifully and
naturally, and yet it's also way more convenient for small personal projects.
it's so much easier to cd into a new directory and quickly ordain
it as being versioned than it is to roll out a new repo in any of a number
of centralized systems, including svn. in fact, as you'll see once you start
working w/ dvcs tools, limitations in conventional tools encourage bad
practices, like putting several projects into a single repo, which is common
in centralized systems.

> Instead, what I need to do is find someone down the line who is working on
> Linux and knows me. That person will give me access to their copy of the
> repository. They'll accept my changes, and then submit them to their
> trustee. That trustee will submit them down the line until the changes are
> filtered to Linus who decides exactly what will and won't go into Linux.

it should be noted that the flow(s) used to manage the linux kernel are not
the same flow(s) required in general by a dvcs tool. one of the benefits of
dvcs's flexibility is that organizations / teams / individual users can really
tune custom flows.. dare i say a detriment potentially also comes from
this flexibility, in that getting started may (quite likely) mean a steeper
learning curve.. especially for maintainers trying to establish something
that works well for the team they're on. it's like anything tho, once you get
into the swing of it, the flows you design will get better and better ;D

> Even with a distributed repository, sooner or later, you have to get the
> changes from the field back to the main repository, and if the network is
> slow, getting those changes back and forth will be slow. You could simply
> delay getting those changes from your remote locations, and only synchronize
> your repositories on a weekly basis, but then you run the risk of having
> merge problems.

this is true; however, see above for the overall advantages of dvcs, esp. in
this case. working offline greatly increases performance (that's just one
benefit).

> The big question is to figure out exactly what is slowing things down. Is
> it your Subversion repository? Is it your network? Is it the VPN? Or, maybe
> it is Subversion itself...
>
> On a checkout, not only does Subversion copy each and every file over the
> network, but it also makes a "base" duplicate on your drive. Subversion was
> designed to work comfortably in a disconnected mode. Except for "checkout",
> "update", and "commit", most developers can easily do their work without
> being connected to their repository. You don't need to tell the repository
> that you are editing a particular file. You don't need to tell the
> repository you want to see what changes you've made.
>
> The problem is that virus protection software can slow things down to a
> crawl. Imagine an antivirus software that scans each file created for a
> virus. If I checkout a dozen files, I'm not just scanning those files, but
> the dozen "bases" too, and the property files and the property bases inside
> the .svn directories. If your VPN is scanning for viruses too, you'll find
> each packet in Subversion is being sniffed and that will slow things down
> too.
>
> You could try Perforce, which is commercial software that costs $800 per
> license. Perforce has several tricks that make it a bit faster than
> Subversion. It doesn't create .svn or CVS directories to track files which
> makes it a bit faster on checkouts. It also has "proxy" repositories that
> allow you to put a copy of the repository in a remote location, and the
> proxy and main repository get sync'd in the background. I've found this
> great for this type of situation.
>
> But be warned: Perforce has a few quirks. It is not easy to work
> disconnected from the server. (You can, but it involves a few tricks to work
> offline, and you have to synchronize your working copy to the server once
> you reattach.) Perforce setup can be quite picky. (Perforce practically
> insists that server storage must be completely local to the server,
> something many shops can't do.) Plus, you have to maintain a license file
> and pay $800 per user. Have a dozen or so developers, and you're talking
> about roughly ten thousand dollars.
>
> So, do some testing. Find out how long it takes to send a file back and
> forth over your network. If your network is slow, distributed repositories
> won't necessarily make things any better. You'll have to find out what is
> slowing down the network. If it's your VPN, you might want to try a third
> party source repository like SourceForge or GForge which both use Subversion
> as their revision control system. If it is the antivirus system that's
> slowing Subversion down, try configuring your antivirus system not to scan
> your Subversion working directory. Most people find this almost doubles
> Subversion's speed.
>
> Like with any development issue, you have to know what the cause is before
> you can get a cure.

but yes, determining the bottleneck would probably be a good idea anyway.
lots of places use svn over a WAN and it doesn't seem to be a problem.
another primary issue could be the size of the codebase you have stored in
this svn instance. is it 30,000 lines of code? 4 million? do you have
lots of binary files in the repo? etc. i was hosting a small codebase of
about 20,000 lines last year over a cable modem and a consultant from india
was able to use it w/o complaining. and i know there were (network
performance) issues, b/c he said access to mysql was simply not feasible..
so i feel like svn was doing reasonably well in that case over the WAN.
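
if you want numbers rather than impressions, a tiny timing harness helps. the sketch below is only an illustration: time_command is a made-up helper name, and the command it times is a placeholder that runs anywhere; swap in whichever svn operation feels slow, pointed at your own repository.

```python
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Run cmd several times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


# Placeholder command so the sketch runs anywhere. Replace it with the
# operation under suspicion, e.g. ["svn", "log", "-l", "20", your_repo_url],
# where your_repo_url is whatever URL your team actually uses.
cmd = [sys.executable, "-c", "pass"]
runs = time_command(cmd)
print("min %.3fs  max %.3fs" % (min(runs), max(runs)))
```

running the same command from the office LAN, over the VPN, and w/ the antivirus scanner excluded from the working copy should show which of those is the real bottleneck.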

-nathan
btw
not trying to be argumentative, David, but now that i'm learning more about
dvcs, i'm compelled to offer the things i do know.. i've been using svn for
years and have been a long time proponent. it's just that the landscape has
been changing, and so far i like the potential these new tools have put
forth. it will still be some time before they mature and attain the 3rd party
integration, extensibility, and mass adoption svn has.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2360401

Received on 2009-06-08 20:11:35 CEST

This is an archived mail posted to the Subversion Users mailing list.
