
Re: Node origins cache rewrite

From: David Glasser <glasser_at_davidglasser.net>
Date: Thu, 24 Jan 2008 19:04:13 -0800

On Jan 24, 2008 6:54 PM, Mark Phippard <markphip_at_gmail.com> wrote:
> I see David has rewritten this to no longer use SQLite. Yay!
>
> That being said, I do still have some reservations. Keep in mind that
> CollabNet uses BDB repositories, so I am just speaking from what we
> have heard in the past from users.

Right, BDB is irrelevant here (it uses its own tables).

> How many nodes will a large repository have? We have heard from users
> with working copies with thousands of folders and tens and hundreds of
> thousands of files. If this represents their trunk, and they have many
> branches with modifications, how many nodes can they expect?
>
> As I said previously, just 100,000 nodes x 4 KB block size is 400 MB of
> disk space used. Don't we think users might complain about the
> increase? Even if the repository is already 4 GB, I am sure they
> would still notice the increase.

My assumption is that the cache will be strictly smaller than a single
checkout of *one branch* of (every project in) the repository, and if
that doesn't fit on your server, then something's wrong already.
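
For a rough sense of the numbers in the estimate quoted above, here is a
minimal sketch in Python. It assumes the worst case behind that estimate:
one small cache file per node, each occupying a full filesystem block. The
function name and the 4096-byte default are illustrative assumptions, not
anything taken from the Subversion source.

```python
def estimate_cache_bytes(num_nodes, block_size_bytes=4096):
    """Worst-case disk usage, assuming one file per node and that every
    file occupies exactly one filesystem block (an assumption, not a
    measurement of any real repository)."""
    return num_nodes * block_size_bytes


if __name__ == "__main__":
    # Node counts here are hypothetical inputs, chosen only for illustration.
    for nodes in (10_000, 100_000, 1_000_000):
        megabytes = estimate_cache_bytes(nodes) / 1_000_000
        print(f"{nodes:>9} nodes -> about {megabytes:.0f} MB")
```

With 100,000 nodes this gives roughly 400 MB, matching the quoted figure.
The real number depends on the filesystem's block size and on how many
nodes a repository actually has.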

(As a completely separate issue, perhaps the structure should be
sharded; I'm fine with people trying that.)
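
To illustrate what sharding could mean here, a minimal sketch, assuming a
purely hypothetical layout in which each node's cache file is placed in a
subdirectory named after a short hash prefix of its node ID. The function
name, the directory name, and the two-character prefix are all assumptions
for illustration; this is not the layout Subversion uses.

```python
import hashlib
import os


def sharded_cache_path(cache_dir, node_id, prefix_len=2):
    """Return a path for node_id under a hash-prefix shard directory.

    Hypothetical scheme: hashing the ID spreads files evenly across at
    most 16**prefix_len subdirectories, so no single directory has to
    hold every cache file.
    """
    digest = hashlib.md5(node_id.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest[:prefix_len], node_id)
```

The trade-off is one extra directory level per lookup in exchange for
smaller directories. It does not change the per-file block overhead.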

> Does the Python script to generate the cache still work? I wonder if
> we could modify it or otherwise make it available for people to run on
> some repositories to get an idea of the number of nodes in their
> repository. It would be interesting to see how many nodes are in the
> ASF repository. Perhaps we could run it on some of our large
> repositories at CollabNet as well.

I expect it ought to still work, yes. Though hmm, I thought you said
your repositories are BDB?

> That being said, I suppose we should only do this if there is some
> number of nodes at which we would want to consider changing
> this.
>
> When we came up with this design, how many nodes were we thinking
> might typically exist? What is it optimized for?

Optimized for "not requiring a prereq for one lousy little cache".

--dave

-- 
David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/
Received on 2008-01-25 04:04:25 CET

This is an archived mail posted to the Subversion Dev mailing list.
