
Re: SVN scalability problem as number of tags grows

From: Greg Stein <gstein_at_gmail.com>
Date: Sat, 21 Feb 2009 18:44:07 +0100

Hi John,

It really doesn't have anything to do with tags/ per se, but simply
that you're creating an ever-larger directory. The name:node mapping
for the directory's contents grows with every entry you add, and each
commit that changes the directory stores that whole mapping again.

You'd see the exact same problem if you created 90000 entries in
/trunk/some/path/down/deep/in/the/hierarchy/.

Sharding is the appropriate solution. You could shard by date,
initial letters of the tag, or the hash of the tag (as you suggested).
Just settle on one, and you should be fine.
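A minimal sketch of the first two schemes, written as shell functions that only compute where a tag would go. The function names, the `tags2` default base (borrowed from John's workaround so the new layout cannot collide with existing `tags/` entries), and the two-character and per-month granularity are illustrative assumptions, not a Subversion convention.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Subversion) that place a tag in a
# sharded directory instead of directly under tags/. BASE defaults to
# "tags2" so the new layout cannot collide with existing tags/ entries.
BASE=${BASE:-tags2}

# Shard by the first two characters of the tag name.
tag_path_by_letters() {
    first=$(printf '%s' "$1" | cut -c1)
    second=$(printf '%s' "$1" | cut -c2)
    # One-character names get "_" as their second level.
    printf '%s/%s/%s/%s\n' "$BASE" "$first" "${second:-_}" "$1"
}

# Shard by creation month, given as YYYY-MM (default: current month).
tag_path_by_date() {
    month=${2:-$(date +%Y-%m)}
    printf '%s/%s/%s/%s\n' "$BASE" "${month%-*}" "${month#*-}" "$1"
}

tag_path_by_letters release-1.0        # prints tags2/r/e/release-1.0
tag_path_by_date release-1.0 2009-02   # prints tags2/2009/02/release-1.0
```

A tag would then be created with something like `svn copy --parents REPO/trunk REPO/$(tag_path_by_date NAME) -m '...'`, where `REPO` stands for the repository root URL; `--parents` needs a 1.5 or later client. The letter scheme only spreads tags well if names do not share a common prefix (names following a hypothetical `release-*` convention would all land under `r/e/`), while the date scheme keeps each leaf as large as one month's worth of tags.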

Another solution would be to delete obsolete tags. Note that they will
always be there in history, just not in HEAD. You could also rotate
tags into an archival tag directory. For example, each month you could
move the previous month's tags into /tags/archive/2008-12/,
/tags/archive/2009-01/, and so on. Or even /archived-tags/... for that matter.
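The monthly rotation can be sketched as a small shell function. This assumes a Subversion 1.5 or later client (for `svn mkdir --parents`); `archive_tags` and the `SVN` override are hypothetical, and setting `SVN="echo svn"` prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: move tags into a monthly archive directory such as
# REPO/tags/archive/2009-01/. Assumes a Subversion 1.5+ client.
# archive_tags and the SVN variable are hypothetical; run with
# SVN="echo svn" to print the commands without executing them.
SVN=${SVN:-svn}

archive_tags() {
    repo=$1    # repository root URL, e.g. file:///path/to/repo
    month=$2   # archive bucket name, e.g. 2009-01
    shift 2
    dest="$repo/tags/archive/$month"
    # Create the bucket only if `svn info` cannot find it yet.
    if ! $SVN info "$dest" >/dev/null 2>&1; then
        $SVN mkdir --parents "$dest" -m "Create tag archive $month"
    fi
    # One commit per tag; the old paths stay reachable in history.
    for tag in "$@"; do
        $SVN move "$repo/tags/$tag" "$dest/$tag" -m "Archive tag $tag"
    done
}
```

For example, `archive_tags file:///home/john/testsvn/test_repo 2009-01 tag1 tag2` would move two tags out of `tags/`. A 1.5 or later client also accepts several sources in a single `svn move`, which would archive a whole batch in one commit instead of one commit per tag.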

Lots of possibilities. I think the right answer will depend on your
workflow. Main point: creating directories with 90k entries *is* going
to consume more time and space.

Cheers,
-g

On Sat, Feb 21, 2009 at 18:23, John Coiner <john.coiner_at_amd.com> wrote:
> Hi SVN developers,
>
> I support SVN for a few hundred co-workers. We have been using SVN
> heavily for about two years, generating about 60000 commits, 90000 tags,
> and 3000 branches in one repository.
>
> We have recently discovered a scalability problem. If you follow the
> usual "trunk/tags/branches" structure, the size required to store each
> new tag grows in proportion to the number of tags previously created.
>
> This can be demonstrated with a few commands in a brand-new repository:
>
> svnadmin create test_repo
> svn list file:///home/john/testsvn/test_repo
> svn mkdir file:///home/john/testsvn/test_repo/trunk -m ''
> svn mkdir file:///home/john/testsvn/test_repo/tags -m ''
> svn copy file:///home/john/testsvn/test_repo/trunk \
>     file:///home/john/testsvn/test_repo/tags/tag1 -m ''
> svn copy file:///home/john/testsvn/test_repo/trunk \
>     file:///home/john/testsvn/test_repo/tags/tag2 -m ''
> svn copy file:///home/john/testsvn/test_repo/trunk \
>     file:///home/john/testsvn/test_repo/tags/tag3 -m ''
> svn copy file:///home/john/testsvn/test_repo/trunk \
>     file:///home/john/testsvn/test_repo/tags/tag4 -m ''
> svn copy file:///home/john/testsvn/test_repo/trunk \
>     file:///home/john/testsvn/test_repo/tags/tag5 -m ''
> svn copy file:///home/john/testsvn/test_repo/trunk \
>     file:///home/john/testsvn/test_repo/tags/tag6 -m ''
> svn copy file:///home/john/testsvn/test_repo/trunk \
>     file:///home/john/testsvn/test_repo/tags/tag7 -m ''
> svn copy file:///home/john/testsvn/test_repo/trunk \
>     file:///home/john/testsvn/test_repo/tags/tag8 -m ''
> svn copy file:///home/john/testsvn/test_repo/trunk \
>     file:///home/john/testsvn/test_repo/tags/tag9 -m ''
>
> In the FSFS back end, each new file under revs/ is larger than the
> previous one, by roughly 31 to 41 bytes per tag. In the output of 'ls'
> below, revs 3 through 11 correspond to the creation of the tag1
> through tag9 directories:
>
> john_at_pitfall:~/testsvn/test_repo/db/revs/0$ ls -latr
> total 56
> -rw-r--r-- 1 john john 115 2009-02-21 11:19 0
> drwxr-sr-x 3 john john 4096 2009-02-21 11:19 ..
> -rw-r--r-- 1 john john 277 2009-02-21 11:19 1
> -rw-r--r-- 1 john john 305 2009-02-21 11:19 2
> -rw-r--r-- 1 john john 531 2009-02-21 11:19 3
> -rw-r--r-- 1 john john 564 2009-02-21 11:19 4
> -rw-r--r-- 1 john john 595 2009-02-21 11:19 5
> -rw-r--r-- 1 john john 628 2009-02-21 11:19 6
> -rw-r--r-- 1 john john 659 2009-02-21 11:19 7
> -rw-r--r-- 1 john john 690 2009-02-21 11:20 8
> -rw-r--r-- 1 john john 721 2009-02-21 11:20 9
> -rw-r--r-- 1 john john 762 2009-02-21 11:20 10
> -rw-r--r-- 1 john john 800 2009-02-21 11:20 11
> drwxr-sr-x 2 john john 4096 2009-02-21 11:20 .
>
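The listing can be turned into per-tag growth with a short shell and awk sketch. `rev_growth` is a hypothetical helper, the sizes are copied from the listing, and the final line is only a rough extrapolation under the per-entry assumption stated in its comment.

```shell
#!/bin/sh
# Hypothetical helper: print how many bytes each revision file grew
# compared with the one before it.
rev_growth() {
    printf '%s\n' "$@" | awk 'NR > 1 { print $1 - prev } { prev = $1 }'
}

# Sizes of revs 3..11 (tag1..tag9) copied from the listing above.
rev_growth 531 564 595 628 659 690 721 762 800
# prints 33 31 33 31 31 31 41 38, one value per line

# Back-of-the-envelope: if every existing entry costs about 31 bytes
# (the smallest step measured, with short names like "tagN"), a listing
# of 90000 entries is roughly this many bytes per new tag:
echo $((90000 * 31))   # prints 2790000
```

The steady step is consistent with the whole tags/ listing being written out again in every commit; at 90000 entries the estimate comes to about 2.8 MB per new tag, the same order of magnitude as the megabytes reported next.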
> After creating 90000 tags, each new tag consumes megabytes of space in
> the repository. Each new tag also takes a few seconds to apply, up from
> milliseconds when we first began. We expected more graceful scaling,
> based in part on our experience in other situations where SVN scales
> well, for example committing a million additions to the same file.
>
> Our big installation is running on Linux, SVN 1.4.4, and FSFS. The
> problem also exists in SVN 1.5.1.
>
> Is this a known issue? Are there plans to make this more scalable? I
> searched the issues database and did not find anything that looked like
> a duplicate. Should I file a new issue?
>
> Do you have any recommendations for a workaround?
>
> One workaround that we are evaluating is to shard the branches and tags
> over a large number of directories. So rather than create
> "tags/TAG_NAME", we may begin to create "tags2/1/b/5/e/TAG_NAME". The
> "1/b/5/e" is the first four hex digits of the md5 hash of "TAG_NAME". We
> chose "tags2" as the base directory to avoid colliding with existing
> entries under "tags/" that happen to be named after a hex digit.
>
> This scales better. Applying N sharded tags requires O(N) space in
> total, and each tag takes O(1) time to apply.
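John's hash-based layout can be sketched as a shell function. `sharded_tag_path` is a hypothetical name, it assumes GNU coreutils `md5sum`, and it hashes the tag name without a trailing newline (piping `echo NAME` instead would hash a different string and yield different digits).

```shell
#!/bin/sh
# Sketch of the sharded tag layout tags2/h1/h2/h3/h4/TAG_NAME, where
# h1..h4 are the first four hex digits of md5(TAG_NAME).
# Assumes GNU coreutils md5sum; the function name is hypothetical.
sharded_tag_path() {
    name=$1
    # Hash the bare name (no trailing newline) and keep 4 hex digits.
    hex=$(printf '%s' "$name" | md5sum | cut -c1-4)
    # Insert a slash after every digit: "1b5e" -> "1/b/5/e/".
    shard=$(printf '%s' "$hex" | sed 's|.|&/|g')
    printf 'tags2/%s%s\n' "$shard" "$name"
}

sharded_tag_path hello   # prints tags2/5/d/4/1/hello
```

A tag could then be created with `svn copy --parents REPO/trunk REPO/$(sharded_tag_path TAG_NAME) -m '...'` (1.5 or later client; `REPO` stands for the repository root URL). With four hex levels there are 16^4 = 65536 leaf directories, so at 90000 tags each leaf holds about 1.4 entries on average. A new tag changes tags2/ and three intermediate directories of at most 16 entries each, plus one small leaf, which is why the per-tag cost stays roughly constant at this scale.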
>
> One possible resolution of this issue is a documentation-only change. If
> the SVN book described the scalability issue and recommended a sharded
> tags and branches structure, it would help future "enterprise" adopters
> (and other crazy people who create way too many tags :)
>
> Please let me know if you need any more information about this problem.
> Cheers,
>
> John
>
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1204071
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1204276
Received on 2009-02-21 18:44:36 CET

This is an archived mail posted to the Subversion Dev mailing list.
