Jon Watte wrote:
> + The problem we're wanting to solve is that it takes 10 minutes to
> crawl through 20,000 files, trying to figure out which ones have
> changed when you want to commit (or similar).
Hmm. On my box, creating a tree with about 36k nodes takes
approximately 8.5 seconds; crawling it and statting every node
takes approximately 4 seconds. If it really takes 10 minutes to
traverse a tree containing 20k files looking to see what's changed,
then I suspect the problem is with the implementation, not with
the principle.
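
To make "looking to see what's changed" concrete, here is a minimal sketch of a stat-only change check. It is an illustration under assumptions, not Subversion's actual implementation: the names `snapshot` and `changed_files` are hypothetical, and it treats a file as changed when its (mtime, size) pair differs from a previously recorded one.

```python
import os

def snapshot(root):
    """Map every file path under root to its (mtime_ns, size) pair."""
    snap = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # one stat call per file, no file contents read
            snap[path] = (st.st_mtime_ns, st.st_size)
    return snap

def changed_files(old, new):
    """Return paths that were added, removed, or whose (mtime, size) differ."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
```

The cost of such a check is one stat per file plus a dictionary lookup, which is why the raw crawl-and-stat time is a reasonable lower bound to compare against.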
Further, most checkins don't need to check the entire tree for
changed files, nor even a substantial fraction of the entire tree.
Usually you're working in a directory somewhere near the leaves,
with (if you're unlucky) 1000 files under it. No?
I don't have any objection to the proposal, but I'm puzzled
by the apparent need for it. If it takes 10 minutes to find
which files, out of 20k, have changed, then I think something
is being done wrong in Subversion. And if you often have to
do a checkin for which Subversion needs to look at 20k files,
then I think something's wrong in the organization of your
project. I am open to correction on both issues.
*
My experiment was admittedly a very simple-minded one,
and represents a best case in a few ways. I created a
tree of directories and empty files as follows:

    import os

    def build(dir, n_files, n_dirs, depth):
        # Create dir with n_files empty files named 0..n_files-1.
        os.mkdir(dir)
        for i in range(n_files):
            open(os.path.join(dir, str(i)), "w").close()
        # Recurse into n_dirs subdirectories until depth runs out.
        depth -= 1
        if depth <= 0: return
        for j in range(n_files, n_files+n_dirs):
            build(os.path.join(dir, str(j)), n_files, n_dirs, depth)

    build("foo", 10, 3, 8)

and I crawled it doing a stat on each node as follows:

    time find foo -ls > /dev/null

This was on a local filesystem on a FreeBSD box.
Hardware: Athlon/1GHz, 256MB. FreeBSD's filesystem
is quite fast, and the hardware -- though nowhere near
today's bleeding edge -- is quite decent. But, still,
NFS on a 300MHz Ultrasparc (say) surely can't be
more than (say) 25 times slower, which would be
less than 2 minutes.
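
For anyone wanting to repeat the measurement without `find`, the following is a rough Python equivalent of the crawl, offered as a sketch rather than the method used for the numbers above. The function name `crawl_and_stat` is made up for illustration; it does an `lstat` on the root and on every node beneath it and reports how many nodes it touched and how long that took.

```python
import os
import time

def crawl_and_stat(root):
    """lstat root and every node under it; return (node_count, elapsed_seconds)."""
    start = time.perf_counter()
    os.lstat(root)
    count = 1  # the root itself, which `find root` also lists
    for dirpath, dirnames, filenames in os.walk(root):
        # Each directory shows up once in its parent's dirnames,
        # so every non-root node is statted exactly once.
        for name in dirnames + filenames:
            os.lstat(os.path.join(dirpath, name))
            count += 1
    return count, time.perf_counter() - start
```

Calling `crawl_and_stat("foo")` on the tree built above should touch the same set of nodes as the `find` command. Timings will differ by machine, filesystem and cache state, and Python adds its own overhead on top of the stat calls.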
If statting 2*20k files takes 10 minutes then you're statting
about 70 files a second. That's very, very, very slow.
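
The arithmetic behind these figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the helper names are hypothetical, and the inputs are the numbers quoted in this message (two stats per file, 20k files, 10 minutes, and the build("foo", 10, 3, 8) tree).

```python
def stat_rate(n_files, stats_per_file, seconds):
    """Stat calls per second needed to cover n_files in the given time."""
    return n_files * stats_per_file / seconds

def tree_nodes(n_files, n_dirs, depth):
    """Total nodes (directories plus files) created by build(dir, n_files, n_dirs, depth)."""
    # One directory at the top level, n_dirs below it, n_dirs**2 below that, and so on.
    dirs = sum(n_dirs ** level for level in range(depth))
    # Every directory holds n_files files, so each contributes n_files + 1 nodes.
    return dirs * (n_files + 1)

print(round(stat_rate(20_000, 2, 10 * 60), 1))  # 66.7
print(tree_nodes(10, 3, 8))                     # 36080
```

So 10 minutes for 2*20k stats works out to roughly 67 per second, while the test tree has 36,080 nodes, which matches the "about 36k" figure.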
--
g
Received on Sun Sep 29 12:30:19 2002