Re: Eliminating the text-base penalty

From: Gareth McCaughan <Gareth.McCaughan_at_pobox.com>
Date: 2002-09-30 00:14:40 CEST

(I sent this to Jon but not to the list. Apologies to Jon, who'll be
getting it a second time if he's subscribed.)

On Sunday 29 September 2002 5:38 pm, Jon Watte wrote:
> > Hmm. On my box, creating a tree with about 36k nodes takes
> > approximately 8.5 seconds; crawling it and statting every node
> > takes approximately 4 seconds. If it really takes 10 minutes to
> > traverse a tree containing 20k files looking to see what's
> > changed, then I suspect the problem is with the implementation,
> > not with the principle.
>
> Those are the numbers I've seen on Windows with CVS. Of course, CVS
> does more than just stat each file. However, SVN has a similar
> model.

Indeed. (And all tests so far seem to show that on operations
where Subversion doesn't have asymptotically better performance
than CVS, it's worse by a substantial constant factor. Alas.)

> > Further, most checkins don't need to check the entire tree for
> > changed files, nor even a substantial fraction of the entire
> > tree. Usually you're working in a directory somewhere near the
> > leaves, with (if you're unlucky) 1000 files under it. No?
>
> But my change involves some database schema, which lives in one
> place; some sources and headers, which live elsewhere; some xslt,
> some images, some perl scripts and a new regression test. So, no,
> my change isn't contained in a single directory. A problem that
> happened a lot with CVS was that users would try to submit in "the
> main directory" and forget some other files they've edited.
> "Educating the users" isn't sufficient because these are clever,
> well-meaning programmers already, whose only fault is that they're
> human.

I guess. Do you often have to make such scattered changes
on a really large project? (I conjecture, with no evidence, that
it will be necessary less often with Subversion than with CVS,
because with Subversion it's less painful to reorganize. But,
of course, on a large project with many developers it's always
painful to reorganize.)

> > is being done wrong in Subversion. And if you often have to
> > do a checkin for which Subversion needs to look at 20k files,
> > then I think something's wrong in the organization of your
> > project. I am open to correction on both issues.
>
> I've found that any project that's big enough will run into these
> kinds of issues. It's just a fact of scale.

I am in no position to argue; I've mostly worked on projects
much smaller than 20k files. Out of curiosity, how many projects
that large have you worked on, and how different were they?

> > My experiment was admittedly a very simple-minded one,
>
> Did you run it on Windows? Were the files large enough to look like
> real source files?

I didn't run it on Windows, and the files were of length 0. Making
the files longer would obviously have slowed down creation, but
that wasn't really the focus of the test. It would probably have
made the statting somewhat slower by reducing locality, but I bet
it wouldn't have been more than (say) a factor of 2 slower.
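A rough sketch of this kind of experiment, in Python, for anyone who wants to repeat it (on Windows, say). It is a hypothetical reconstruction, not the original test script: the tree shape (the fanout, depth and files_per_dir parameters) is an assumption chosen to give about 34k nodes, and the files are zero-length, as described above. Timings will depend heavily on the OS, the filesystem and whether the metadata cache is warm.

```python
import os
import tempfile
import time


def build_tree(root, fanout, depth, files_per_dir):
    """Create empty files and subdirectories under root; return nodes created.

    Tree shape is a hypothetical choice, not taken from the original test.
    """
    count = 0
    for i in range(files_per_dir):
        # Zero-length files, as in the experiment described in the post.
        open(os.path.join(root, f"f{i}"), "w").close()
        count += 1
    if depth > 0:
        for d in range(fanout):
            sub = os.path.join(root, f"d{d}")
            os.mkdir(sub)
            count += 1 + build_tree(sub, fanout, depth - 1, files_per_dir)
    return count


def stat_tree(root):
    """Walk the tree and lstat every entry below root; return entries statted."""
    n = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # lstat so that symlinks (if any) are not followed.
            os.lstat(os.path.join(dirpath, name))
            n += 1
    return n


def run(fanout=10, depth=3, files_per_dir=30):
    """Build a tree in a temp dir, then crawl it; return counts and timings."""
    with tempfile.TemporaryDirectory() as root:
        t0 = time.perf_counter()
        created = build_tree(root, fanout, depth, files_per_dir)
        t1 = time.perf_counter()
        statted = stat_tree(root)
        t2 = time.perf_counter()
    return created, statted, t1 - t0, t2 - t1


if __name__ == "__main__":
    created, statted, build_s, stat_s = run()
    print(f"created {created} nodes in {build_s:.2f}s; "
          f"statted {statted} nodes in {stat_s:.2f}s")
```

With the default parameters this creates 34440 nodes. The second timing is the one comparable to the "crawling it and statting every node" figure quoted above; since the crawl immediately follows creation, it measures a warm cache.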

> > This was on a local filesystem on a FreeBSD box.
>
> FreeBSD: Hardly the worst case scenario :-)

I concur. Maybe the FreeBSD Project should adopt that as
their slogan. :-)

--
g

Received on Mon Sep 30 00:15:17 2002

Maybe in reply to: Greg Hudson: "Eliminating the text-base penalty"