
Re: Eliminating the text-base penalty

From: Gareth McCaughan <Gareth.McCaughan_at_pobox.com>
Date: 2002-09-29 12:29:43 CEST

Jon Watte wrote:

> + The problem we're wanting to solve is that it takes 10 minutes to
> crawl through 20,000 files, trying to figure out which ones have
> changed when you want to commit (or similar).

Hmm. On my box, creating a tree with about 36k nodes takes
approximately 8.5 seconds; crawling it and statting every node
takes approximately 4 seconds. If it really takes 10 minutes to
traverse a tree containing 20k files looking to see what's changed,
then I suspect the problem is with the implementation, not with
the principle.

Further, most checkins don't need to check the entire tree for
changed files, nor even a substantial fraction of the entire tree.
Usually you're working in a directory somewhere near the leaves,
with (if you're unlucky) 1000 files under it. No?

I don't have any objection to the proposal, but I'm puzzled
by the apparent need for it. If it takes 10 minutes to find
which files, out of 20k, have changed, then I think something
is being done wrong in Subversion. And if you often have to
do a checkin for which Subversion needs to look at 20k files,
then I think something's wrong in the organization of your
project. I am open to correction on both issues.


My experiment was admittedly a very simple-minded one,
and represents a best case in a few ways. I created a
tree of directories and empty files as follows:

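    import os

    def build(dir, n_files, n_dirs, depth):
      # Create the directory itself, then n_files empty files inside it.
      os.makedirs(dir, exist_ok=True)
      for i in range(n_files):
        open(os.path.join(dir, str(i)), "w").close()
      depth -= 1
      if depth <= 0: return
      # Recurse into n_dirs subdirectories, numbered after the files.
      for j in range(n_files, n_files+n_dirs):
        build(os.path.join(dir, str(j)), n_files, n_dirs, depth)

    build("foo", 10, 3, 8)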

and I crawled it doing a stat on each node as follows:

    time find foo -ls > /dev/null

This was on a local filesystem on a FreeBSD box.
Hardware: Athlon/1GHz, 256MB RAM. FreeBSD's filesystem
is quite fast, and the hardware -- though nowhere near
today's bleeding edge -- is quite decent. But, still,
NFS on a 300MHz Ultrasparc (say) surely can't be
more than (say) 25 times slower, which would be
less than 2 minutes.
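
For anyone without find to hand, roughly the same crawl can be
sketched in Python. This is only an approximation of what
"find foo -ls" does: it calls os.lstat on every node and counts
them. The names crawl and timed_crawl are made up for this
illustration, and the timings will of course depend on the
filesystem and on what is already cached.

```python
import os
import time

def crawl(root):
    """Walk root, lstat every node, and return how many nodes were seen."""
    os.lstat(root)
    count = 1  # the root directory itself
    for dirpath, dirnames, filenames in os.walk(root):
        # Stat each entry once, when it is listed by its parent directory.
        for name in dirnames + filenames:
            os.lstat(os.path.join(dirpath, name))
            count += 1
    return count

def timed_crawl(root):
    """Return (node count, elapsed seconds) for one crawl of root."""
    start = time.perf_counter()
    n = crawl(root)
    return n, time.perf_counter() - start
```

Calling timed_crawl("foo") on the tree built above gives both the
node count and the wall-clock time for one pass, which is enough
to compare a local disk against NFS.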

If statting 2*20k files takes 10 minutes then you're statting
about 70 files a second. That's very, very, very slow.
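
The arithmetic behind these figures, spelled out as a small
Python sketch. The helper names node_count and stats_per_second
are invented for illustration; the inputs are the numbers quoted
in this message (a roughly 4 second crawl, 2*20k stats in
10 minutes, a guessed factor of 25 for slow hardware over NFS).
node_count assumes depth is at least 1.

```python
def node_count(n_files, n_dirs, depth):
    """Files plus directories created by build(dir, n_files, n_dirs, depth)."""
    # One directory at the top, n_dirs below it, n_dirs**2 below those, ...
    dirs = sum(n_dirs ** level for level in range(depth))
    # Every directory holds n_files files, and counts as a node itself.
    return dirs * (n_files + 1)

def stats_per_second(n_stats, seconds):
    """Average stat rate for n_stats calls taking the given time."""
    return n_stats / seconds

nodes = node_count(10, 3, 8)                           # tree from build("foo", 10, 3, 8)
measured_rate = stats_per_second(nodes, 4)             # local crawl, about 4 seconds
reported_rate = stats_per_second(2 * 20000, 10 * 60)   # 2*20k stats in 10 minutes
slow_crawl_seconds = 4 * 25                            # same crawl if 25 times slower

print(nodes, measured_rate, round(reported_rate), slow_crawl_seconds)
```

So the test tree has 36080 nodes, the local crawl manages on the
order of 9000 stats a second, and the reported 10 minutes works
out to about 67 a second.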

Received on Sun Sep 29 12:30:19 2002

This is an archived mail posted to the Subversion Dev mailing list.
