The cost of fulltexts on branches

From: Tobias Ringström <tobias_at_ringstrom.mine.nu>
Date: 2004-06-29 12:18:38 CEST

I've converted the gcc/gcc directory of the gcc CVS repository using
cvs2svn.py. That part of the repository is 1.2 GiB, has 19934 active and
deleted files, 404014 CVS revisions, 911 tags, 82 branches. 1308 files
are bigger than 100 kiB, and 134 files are bigger than 1 MiB. The
dumpfile is 37.4 GiB, and the resulting Subversion repository is 5.6 GiB
and has 54330 revisions. A lot of that size comes from inefficient
copies made by cvs2svn.py, but the size of the fulltexts does not, and
their size is substantial.

Using code from Max Bowsher, I've written a tool to analyze the size of
the fulltexts in the repository and where they are used. The tool only
counts unique reps, so there is no double counting. (In other words, if
a file is copied, all copies will refer to the same rep, but it will
only be counted once by the tool.) It is only when a change is committed
to a file that a new unique fulltext is created. cvs2svn.py does not
generate unnecessary commits on branches, so those fulltexts would be
there even if the gcc team had used Subversion from the start.
They have nothing to do with cvs2svn. I've attached the tool so you can
play with it and verify its correctness.
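
To make the counting rule concrete, here is a minimal sketch in Python.
It is not the attached tool: the function name, the (name, rep_key, size)
tuples and the sample data are all hypothetical, and it assumes that each
node can be reduced to the line of development it lives on, a key that
identifies its rep, and the fulltext size in bytes.

```python
from collections import defaultdict


def tally_unique_fulltexts(nodes):
    """Count each rep once, charging it to the first place it is seen.

    nodes: iterable of (name, rep_key, size_in_bytes) tuples, where name
    is the trunk, tag or branch the node lives on. (Hypothetical input
    format, chosen for illustration only.)
    """
    seen = set()
    totals = defaultdict(lambda: [0, 0])  # name -> [fulltexts, bytes]
    for name, rep_key, size in nodes:
        if rep_key in seen:
            continue  # shared rep: already charged to an earlier name
        seen.add(rep_key)
        totals[name][0] += 1
        totals[name][1] += size
    return {name: tuple(v) for name, v in totals.items()}


# Made-up sample: a tag and a branch share rep-a, so it is counted once.
sample = [
    ("tags/T1", "rep-a", 100),
    ("branches/B1", "rep-a", 100),
    ("branches/B1", "rep-b", 250),
    ("trunk", "rep-c", 40),
]
result = tally_unique_fulltexts(sample)
```

With this rule, which name gets charged for a shared rep depends only on
the order in which the nodes are visited.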

At the end of this email is a list of the size of the fulltexts for all
tags and branches. Tags and branches without fulltexts are omitted. The
amount of fulltexts used by tags is very small as expected since they
are simple copies. The reason three of them show up in the list below at
all is because they share their reps with branches, and they happen to
be counted on the tag by the tool, and the reps on the branches are
considered duplicates. It would be more fair to consider the tag reps as
duplicates, but it's not a big deal.
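
If you want to do your own arithmetic on the list, something like the
following sketch should do. It assumes every data line has the form
"NAME has N fulltexts => B bytes" and every section header is an
upper-case word between runs of "=" characters, as in the list at the
end of this email; parse_report and the two regular expressions are
names made up for this example.

```python
import re

# A section header: one upper-case word between runs of "=".
HEADER = re.compile(r"^=+ (?P<section>[A-Z]+) =+$")
# A data line: "<name> has <count> fulltexts => <bytes> bytes".
ROW = re.compile(
    r"^(?P<name>.+?) has (?P<count>\d+) fulltexts => (?P<bytes>\d+) bytes$"
)


def parse_report(text):
    """Return a list of (section, name, fulltexts, bytes) tuples."""
    rows = []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            section = header.group("section")
            continue
        row = ROW.match(line)
        if row:
            rows.append((section, row.group("name"),
                         int(row.group("count")), int(row.group("bytes"))))
    return rows


# Two sections copied from the list at the end of this email.
report = """
==================================== TOTAL =====================================
       The whole repository has 124859 fulltexts => 2743927756 bytes

==================================== TRUNK =====================================
                          trunk has 10131 fulltexts => 121880685 bytes
"""
rows = parse_report(report)
by_section = {section: size for section, _, _, size in rows}
trunk_share = by_section["TRUNK"] / by_section["TOTAL"]
```

Here trunk_share is the fraction of all fulltext bytes that live on
trunk; feeding in the whole list gives the same kind of figure per
section.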

Many branches have had a long life, and changes have been merged
repeatedly from trunk. The effect of such merges is that a lot of files
on the branches are changed, i.e. new fulltexts are created. I think
that is a common pattern, and it will make the repository grow quite a bit.

I hope this info will be useful to someone. I've started to dump and
load the repository into fsfs, but it's going to take a while. The dump
alone took over seven hours (on a very fast machine).

/Tobias

==================================== TOTAL =====================================
                   The whole repository has 124859 fulltexts => 2743927756 bytes

==================================== TRUNK =====================================
                                  trunk has 10131 fulltexts => 121880685 bytes

===================================== TAGS =====================================
                 before_gc_merge_990902 has 526 fulltexts => 5015769 bytes
                 before_gc_merge_990327 has 128 fulltexts => 938495 bytes
                 before_gc_merge_981008 has 1 fulltexts => 1581 bytes

=================================== BRANCHES ===================================
               tree-ssa-20020619-branch has 12244 fulltexts => 117274085 bytes
               objc-improvements-branch has 7194 fulltexts => 111811095 bytes
                  cxx-reflection-branch has 10687 fulltexts => 110264699 bytes
                    new-regalloc-branch has 11203 fulltexts => 101095883 bytes
                          libada-branch has 4690 fulltexts => 90234196 bytes
                  compile-server-branch has 5008 fulltexts => 88395574 bytes
                         csl-arm-branch has 4243 fulltexts => 87164997 bytes
                             lno-branch has 3276 fulltexts => 87080250 bytes
                             pch-branch has 4279 fulltexts => 86701946 bytes
                   ast-optimizer-branch has 3925 fulltexts => 85883790 bytes
                          rtlopt-branch has 3424 fulltexts => 84681982 bytes
                             dfa-branch has 3059 fulltexts => 77900526 bytes
                  tree-profiling-branch has 2169 fulltexts => 77107797 bytes
                       apple-ppc-branch has 2669 fulltexts => 76082593 bytes
                             cfg-branch has 3124 fulltexts => 75810131 bytes
                     cp-parser-branch-2 has 2997 fulltexts => 74125528 bytes
                mips-3_4-rewrite-branch has 2646 fulltexts => 73598736 bytes
      gcc-3_4-basic-improvements-branch has 2414 fulltexts => 71274095 bytes
                   itanium-sched-branch has 1961 fulltexts => 67211455 bytes
                      hammer-3_3-branch has 1402 fulltexts => 53918125 bytes
                   gcj-abi-2-dev-branch has 1498 fulltexts => 52274155 bytes
                     toplevel-bootstrap has 1477 fulltexts => 49479746 bytes
                         gcc-3_4-branch has 831 fulltexts => 47305014 bytes
                         gcc-3_3-branch has 1007 fulltexts => 45305784 bytes
                       cp-parser-branch has 1171 fulltexts => 44290366 bytes
                bounded-pointers-branch has 1722 fulltexts => 43242221 bytes
                     gcc-3_3-rhl-branch has 765 fulltexts => 41973962 bytes
        merged-arm-thumb-backend-branch has 1193 fulltexts => 39188981 bytes
                         gcc-3_0-branch has 1090 fulltexts => 38404867 bytes
                    tree-ssa-cfg-branch has 901 fulltexts => 37199672 bytes
                    gcc-3_3-e500-branch has 462 fulltexts => 33988740 bytes
                    gcc-3_2-rhl8-branch has 771 fulltexts => 33555940 bytes
                         egcs_gc_branch has 1516 fulltexts => 33065514 bytes
                     gcc-3_4-rhl-branch has 228 fulltexts => 30297935 bytes
                        new_ia32_branch has 667 fulltexts => 29593068 bytes
                                   gcc3 has 636 fulltexts => 27696520 bytes
                         gomp-01-branch has 205 fulltexts => 26837214 bytes
                         gcc-3_1-branch has 1559 fulltexts => 26675923 bytes
                        condexec-branch has 355 fulltexts => 25250286 bytes
                         gcc-3_2-branch has 432 fulltexts => 23779114 bytes
                        gcc-2_95-branch has 240 fulltexts => 23248814 bytes
                         ffixinc-branch has 780 fulltexts => 22375609 bytes
          cygwin-mingw-gcc-3_2_1-branch has 205 fulltexts => 17903496 bytes
                        egcs_1_1_branch has 201 fulltexts => 17634048 bytes
                       egcs_1_00_branch has 216 fulltexts => 16216779 bytes
                      sh-elf-3_5-branch has 217 fulltexts => 15220964 bytes
                             cygming332 has 133 fulltexts => 14857238 bytes
                     subreg-byte-branch has 69 fulltexts => 9723873 bytes
                        pchmerge-branch has 90 fulltexts => 8639561 bytes
            cygwin-mingw-gcc-3_1-branch has 141 fulltexts => 7262523 bytes
                      bnw-simple-branch has 51 fulltexts => 5962480 bytes
                      g77_0_0_21_970811 has 81 fulltexts => 5862770 bytes
                 cygwin-mingw-v2-branch has 51 fulltexts => 4038427 bytes
                   gnu-win32-b20-branch has 26 fulltexts => 3622267 bytes
                        csl-hpux-branch has 13 fulltexts => 3112421 bytes
                    gcc-2_95_2_1-branch has 11 fulltexts => 2531583 bytes
                  tree-serialize-branch has 24 fulltexts => 2218210 bytes
                         fixincl-branch has 29 fulltexts => 1409856 bytes
                          newppc-branch has 32 fulltexts => 1345208 bytes
                             cygming331 has 104 fulltexts => 1088998 bytes
                         new-abi-branch has 4 fulltexts => 973877 bytes
                    meissner-ppc-branch has 3 fulltexts => 925675 bytes
                           stree-branch has 9 fulltexts => 865143 bytes
                         g77-0_6-branch has 12 fulltexts => 758875 bytes
            cygwin-mingw-gcc-3_2-branch has 67 fulltexts => 455913 bytes
                            no_bogosity has 18 fulltexts => 427912 bytes
                          x86-64-branch has 101 fulltexts => 354550 bytes
               gcc-3_2-rhl8-branchpoint has 40 fulltexts => 28353 bytes
                       egcs_ss_19980502 has 3 fulltexts => 1676 bytes
                         libobjc-branch has 1 fulltexts => 817 bytes
             gcc-3_5-integration-branch has 1 fulltexts => 805 bytes


Received on Tue Jun 29 12:20:08 2004

This is an archived mail posted to the Subversion Dev mailing list.
