
SVN scalability problem as number of tags grows

From: John Coiner <john.coiner_at_amd.com>
Date: Sat, 21 Feb 2009 12:23:12 -0500

Hi SVN developers,

I support SVN for a few hundred co-workers. We have been using SVN
heavily for about two years, generating about 60000 commits, 90000 tags,
and 3000 branches in one repository.

We have recently discovered a scalability problem. If you follow the
usual "trunk/tags/branches" structure, the size required to store each
new tag grows in proportion to the number of tags previously created.

This can be demonstrated in just a few commands, in a brand new repository:

svnadmin create test_repo
svn list file:///home/john/testsvn/test_repo
svn mkdir file:///home/john/testsvn/test_repo/trunk -m ''
svn mkdir file:///home/john/testsvn/test_repo/tags -m ''
svn copy file:///home/john/testsvn/test_repo/trunk \
    file:///home/john/testsvn/test_repo/tags/tag1 -m ''
svn copy file:///home/john/testsvn/test_repo/trunk \
    file:///home/john/testsvn/test_repo/tags/tag2 -m ''
svn copy file:///home/john/testsvn/test_repo/trunk \
    file:///home/john/testsvn/test_repo/tags/tag3 -m ''
svn copy file:///home/john/testsvn/test_repo/trunk \
    file:///home/john/testsvn/test_repo/tags/tag4 -m ''
svn copy file:///home/john/testsvn/test_repo/trunk \
    file:///home/john/testsvn/test_repo/tags/tag5 -m ''
svn copy file:///home/john/testsvn/test_repo/trunk \
    file:///home/john/testsvn/test_repo/tags/tag6 -m ''
svn copy file:///home/john/testsvn/test_repo/trunk \
    file:///home/john/testsvn/test_repo/tags/tag7 -m ''
svn copy file:///home/john/testsvn/test_repo/trunk \
    file:///home/john/testsvn/test_repo/tags/tag8 -m ''
svn copy file:///home/john/testsvn/test_repo/trunk \
    file:///home/john/testsvn/test_repo/tags/tag9 -m ''

In the FSFS back end, each new file under revs/ is larger than the
previous one. In the output of 'ls' below, revs 3 through 11 correspond
to the creation of the tag1 through tag9 directories:

john@pitfall:~/testsvn/test_repo/db/revs/0$ ls -latr
total 56
-rw-r--r-- 1 john john 115 2009-02-21 11:19 0
drwxr-sr-x 3 john john 4096 2009-02-21 11:19 ..
-rw-r--r-- 1 john john 277 2009-02-21 11:19 1
-rw-r--r-- 1 john john 305 2009-02-21 11:19 2
-rw-r--r-- 1 john john 531 2009-02-21 11:19 3
-rw-r--r-- 1 john john 564 2009-02-21 11:19 4
-rw-r--r-- 1 john john 595 2009-02-21 11:19 5
-rw-r--r-- 1 john john 628 2009-02-21 11:19 6
-rw-r--r-- 1 john john 659 2009-02-21 11:19 7
-rw-r--r-- 1 john john 690 2009-02-21 11:20 8
-rw-r--r-- 1 john john 721 2009-02-21 11:20 9
-rw-r--r-- 1 john john 762 2009-02-21 11:20 10
-rw-r--r-- 1 john john 800 2009-02-21 11:20 11
drwxr-sr-x 2 john john 4096 2009-02-21 11:20 .
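
The growth is easier to see as the difference between consecutive
revision files. The following is only a small sketch: the sizes are
copied from the listing above, and rev_deltas is a helper name made up
for illustration, not part of Subversion.

```shell
# rev_deltas is a hypothetical helper, not a Subversion command.
# It prints the difference between each consecutive pair of sizes.
rev_deltas() {
    prev=""
    for size in "$@"; do
        if [ -n "$prev" ]; then
            echo "$((size - prev))"
        fi
        prev=$size
    done
}

# Sizes in bytes of revs 3 through 11, copied from the 'ls' output above.
rev_deltas 531 564 595 628 659 690 721 762 800
```

This prints 33, 31, 33, 31, 31, 31, 41 and 38, one per line: every
additional tag makes the next tagging revision roughly 31 to 41 bytes
larger in this small test.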

After creating 90000 tags, each new tag consumes megabytes of space in
the repository. Each new tag also takes a few seconds to create, up from
milliseconds when we first began. We expected more graceful scaling,
based in part on our experience in other situations where SVN scales
well, for example committing a million additions to the same file.
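
As a rough consistency check, the small experiment can be extrapolated
to our repository. This sketch assumes the size of a tagging revision
grows linearly with the number of existing tags, at about 33 bytes per
tag (for example 564 - 531 in the listing above). Real tag names are
longer than "tag1", so the real per-tag cost is probably higher.
estimate_rev_bytes is a hypothetical helper name.

```shell
# estimate_rev_bytes is a hypothetical helper, not a Subversion command.
# Assumption: revision size grows linearly with the number of existing tags.
estimate_rev_bytes() {
    bytes_per_tag=$1
    existing_tags=$2
    echo "$((bytes_per_tag * existing_tags))"
}

# About 33 bytes per existing tag, 90000 existing tags.
estimate_rev_bytes 33 90000   # prints 2970000
```

That is about 3 MB per new tag, which is in line with the megabytes we
observe.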

Our big installation is running on Linux, SVN 1.4.4, and FSFS. The
problem also exists in SVN 1.5.1.

Is this a known issue? Are there plans to make this more scalable? I
searched the issues database and did not find anything that looked like
a duplicate. Should I file a new issue?

Do you have any recommendations for a workaround?

One workaround that we are evaluating is to shard the branches and tags
over a large number of directories. So rather than create
"tags/TAG_NAME", we may begin to create "tags2/1/b/5/e/TAG_NAME". The
"1/b/5/e" is the first four hex digits of the md5 hash of "TAG_NAME". We
chose "tags2" as the base directory to avoid colliding with existing
entries under "tags/" that happen to be named after a hex digit.
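
The path computation could be sketched as follows. This is only an
illustration of the scheme described above, not a tested production
script: shard_path is a made-up helper name, and it assumes md5sum from
GNU coreutils is available.

```shell
# shard_path is a hypothetical helper, not a Subversion command.
# It prints the sharded location for a tag name, following the
# "tags2/<4 hex digits of md5, one per directory>/TAG_NAME" scheme.
shard_path() {
    tag_name=$1
    # First four hex digits of the MD5 hash of the name (no trailing newline).
    digits=$(printf '%s' "$tag_name" | md5sum | cut -c1-4)
    # Turn e.g. "5d41" into "5/d/4/1/".
    prefix=$(printf '%s' "$digits" | sed 's|.|&/|g')
    echo "tags2/${prefix}${tag_name}"
}

shard_path hello   # prints tags2/5/d/4/1/hello

# Possible use (commented out; --parents needs SVN 1.5 or later):
# repo=file:///home/john/testsvn/test_repo
# svn copy --parents "$repo/trunk" "$repo/$(shard_path TAG_NAME)" -m ''
```

On SVN 1.4 the intermediate directories would have to be created with
separate 'svn mkdir' commands before the copy.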

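This scales better. Creating N sharded tags requires O(N) space in
total, and each tag takes O(1) time to create. In the flat layout each
new tag costs space proportional to the number of existing tags, so N
tags cost O(N^2) space in total.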

One possible resolution of this issue is a documentation-only change. If
the SVN book described the scalability issue and recommended a sharded
tags and branches structure, it would help future "enterprise" adopters
(and other crazy people who create way too many tags :)

Please let me know if you need any more information about this problem.


Received on 2009-02-21 18:29:21 CET

This is an archived mail posted to the Subversion Dev mailing list.
