
RE: Perforce/Subversion Timing Statistics #2

From: Joshua Jensen <jjensen_at_workspacewhiz.com>
Date: 2002-05-26 02:08:03 CEST

> From: Ben Collins [mailto:bcollins@debian.org]
> On Sat, May 25, 2002 at 12:28:59PM -0600, Joshua Jensen wrote:
> > This test handles 10x as many files in the repository. It is an
> > example of how the numbers scale with growing lists of files.
>
> One advantage (or disadvantage, depending on your
> perspective) is that Perforce retains all working copy
> metadata on the server end. So the server "knows" where the
> client sits. If you edit a file, you tell the server. So
> everything can be deduced without any transfer of data from
> the client to the server.

When it comes to performance, this is a great way to do it. In fact, a
combination of server metadata and client metadata would make the
operation even faster.

In a Perforce environment, you don't even need to be connected to the
server to edit a file. You can just as easily make the file writable
and run a script the next time you connect to the server to check out
the appropriate files. I do it all the time. This is a fast operation,
too, as it only looks at writable files.
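The offline workflow described above might look roughly like the following. This is a minimal Python sketch under stated assumptions, not the script I actually use: it assumes the workspace keeps Perforce's default behavior of leaving synced files read-only until they are opened, so any file with its owner-write bit set is treated as edited offline. The names `find_writable_files` and `open_for_edit` are hypothetical; the only real command it relies on is `p4 edit`.

```python
import os
import stat
import subprocess


def find_writable_files(root):
    """Return sorted paths under root whose owner-write bit is set.

    Assumption: synced files are read-only until opened (Perforce's
    default), so a writable file is one that was edited offline.
    """
    writable = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks; check only the files themselves
            if os.stat(path).st_mode & stat.S_IWUSR:
                writable.append(path)
    return sorted(writable)


def open_for_edit(paths, dry_run=True):
    """Build the `p4 edit` command for paths; run it only if dry_run is False."""
    if not paths:
        return None
    cmd = ["p4", "edit", *paths]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `open_for_edit(find_writable_files("."), dry_run=False)` from the workspace root would open everything writable in one go. The cost is a single directory walk, which is why this stays fast. Brand-new files also show up as writable but aren't in the depot, so `p4 edit` would reject them; a real script would have to handle those separately (e.g. with `p4 add`).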

> This is good for obvious performance gains. Not to mention
> that the server can gain more statistics and information on
> the clients (for administration purposes, this is a win).
> However, it is extremely costly for the client to validate
> itself. If you need to compare your working copy against what
> the server thinks you have, you are looking at a good bit of
> time and cpu consumption.

I work in an environment where we have thousands of clients accessing
gigabytes and gigabytes of data via the appropriate source control
servers daily. A given client very rarely needs to do a full
validation (and, in my opinion, it should only be done when an
administrator is present). I question how much time keeping an extra
client-side copy of every server-side file will actually save. I
have gigabytes of data from source control on my machine. Space might
be cheap, but going through bureaucratic mumbo jumbo to get hard
drive upgrades is time-consuming in itself.

That being said, I like the idea of having the option in this case.
Being able to run svn status or svn revert while disconnected from the
server (something Perforce CAN'T do) is very cool. But for those who
don't have the hard drive space to handle it, it would be nice to be
able to turn client-side storage on or off (post 1.0, of course).

> So, IMO, your tests are really missing the whole picture.
> Sure, Perforce can win on key command sequences. However, in
> the end, the overall performance of Subversion is likely
> better, taking into account the less costly maintenance of a
> working copy. For large repositories, this is critical.

No, I don't believe my tests are missing the whole picture. I have
proven that:

* Adding files is significantly faster in Perforce than in Subversion.
* Retrieving new files is significantly faster from Perforce than from
Subversion.
* Branching files is significantly faster in Perforce than in
Subversion. Branching happens far less frequently, but in Subversion it
still takes a long, long time.
* Merging files across branches is way faster in Perforce. I've been
benchmarking this, and for a small number of files the merge and commit
take just a few seconds in Perforce. The merge itself in Subversion is
very quick, but EVERY time I commit, it takes OVER 30 seconds (that's
when I quit looking at the clock and just hope it will finish).
* From an administration standpoint, Perforce uses far less server disk
space.

So what performance in Subversion are you talking about? My most used
operations are retrieving updates to files, editing files, and merging
files across branches. Oh, and before commits, I tend to revert
unchanged files so they aren't checked in with everything else. Even
so, because Perforce knows exactly which files it needs to diff (and
compares timestamps first), this is still way faster than Subversion
working out what needs to be checked in, except when every file in the
repository is checked out.
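As a rough illustration of why "only diff the files you know about, timestamps first" is cheap, here is a minimal Python sketch. It is not Perforce's or Subversion's actual algorithm; `Recorded`, `record`, and `changed_files` are hypothetical names, and the sketch assumes the client remembers each opened file's modification time, size, and SHA-1 at the moment it was opened.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class Recorded:
    """Hypothetical state captured when a file was opened for edit."""
    mtime_ns: int
    size: int
    sha1: str


def file_sha1(path):
    """Hash a file's contents in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def record(path):
    """Capture mtime, size, and content hash for an opened file."""
    st = os.stat(path)
    return Recorded(st.st_mtime_ns, st.st_size, file_sha1(path))


def changed_files(opened):
    """Return the opened paths whose contents actually differ.

    Only files in `opened` are examined; a file whose mtime and size
    still match is assumed unchanged without reading it.
    """
    changed = []
    for path, rec in sorted(opened.items()):
        st = os.stat(path)
        if st.st_mtime_ns == rec.mtime_ns and st.st_size == rec.size:
            continue  # cheap check passed: skip the content comparison
        if file_sha1(path) != rec.sha1:
            changed.append(path)
    return changed
```

Only files whose timestamp or size moved are read at all, and a file that was merely touched is still reported as unchanged, so the work scales with the number of opened files rather than with the whole working copy.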

Do others use different operations more commonly than retrieving files
and editing files? If Subversion is aimed at those other operations,
then by all means, I've got it wrong.

But before making a claim that the overall performance of Subversion is
likely better, back it up with hard statistics. I was SO EXCITED about
Subversion. I was super excited when I could jump in, compile it, and
run it. But all my excitement waned when I actually started the time
benchmarks. Even worse, I did some benchmarks with CVS (and how I hate
that software) earlier today (at the request of a CVS user), and CVS was
WAY faster than Subversion, too.

Again, I realize Subversion is pre-alpha. I am merely bringing this to
the attention of the primary developers of Subversion. You want to be
the next CVS, but imagine telling a CVS user: "Sure, we're better. CVS
might be able to do a cvs import of the data in Josh's test case in 18
seconds, but don't mind that (or the fact that Subversion took 4.5
MINUTES). We have atomic commits! CVS doesn't!"

For what it's worth, I haven't given up on Subversion. Apart from the
improvements being made to CVSNT, which might sway me once they work,
it is the only free alternative to Perforce with the potential to be as
powerful. I'm now poking through the code trying to understand
where the bottlenecks are. I have high hopes for the product, and I
will continue to spend the necessary time investigating it.

Thanks,
Joshua Jensen

Received on Sun May 26 02:09:50 2002

This is an archived mail posted to the Subversion Dev mailing list.