I've converted the gcc/gcc directory of the gcc CVS repository using
cvs2svn.py. That part of the repository is 1.2 GiB, has 19934 active and
deleted files, 404014 CVS revisions, 911 tags, 82 branches. 1308 files
are bigger than 100 kiB, and 134 files are bigger than 1 MiB. The
dumpfile is 37.4 GiB, and the resulting Subversion repository is 5.6 GiB
and has 54330 revisions. A lot of that size comes from inefficient
copies made by cvs2svn.py, but the size of the fulltexts does not, and
their size is substantial.

Using code from Max Bowsher, I've written a tool to analyze the size of
the fulltexts in the repository and where they are used. The tool only
counts unique reps, so there is no double counting. (In other words, if
a file is copied, all copies will refer to the same rep, but it will
only be counted once by the tool.) It is only when a change is committed
to a file that a new unique fulltext is created. cvs2svn.py does not
generate unnecessary commits on branches, so those fulltexts would be
there even if the gcc team had used Subversion from the start.
They have nothing to do with cvs2svn. I've attached the tool so you can
play with it and verify its correctness.
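
To make the counting rule concrete, here is a minimal sketch in Python. It
is not the attached tool: the (location, rep_key, size) tuples and the
function name count_unique_fulltexts are assumptions made up for
illustration, and a real tool would have to read the rep keys from the
repository itself.

```python
from collections import defaultdict


def count_unique_fulltexts(nodes):
    """Count each unique rep once, attributed to the first location seen.

    `nodes` is an iterable of hypothetical (location, rep_key, size)
    tuples; this is an illustrative input format, not the repository's
    on-disk layout.
    """
    seen = set()
    totals = defaultdict(lambda: [0, 0])  # location -> [count, bytes]
    for location, rep_key, size in nodes:
        if rep_key in seen:
            continue  # a copy sharing an already counted rep adds nothing
        seen.add(rep_key)
        totals[location][0] += 1
        totals[location][1] += size
    return {loc: tuple(v) for loc, v in totals.items()}


nodes = [
    ("trunk", "rep-1", 1000),
    ("tags/release", "rep-1", 1000),   # plain copy, shares rep-1
    ("branches/dev", "rep-1", 1000),   # unchanged file on the branch
    ("branches/dev", "rep-2", 1200),   # change committed on the branch
]
result = count_unique_fulltexts(nodes)
```

In this sketch a shared rep is charged to whichever location is visited
first, so the visiting order decides where it shows up in the totals.
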
At the end of this email is a list of the size of the fulltexts for all
tags and branches. Tags and branches without fulltexts are omitted. The
amount of fulltexts used by tags is very small as expected since they
are simple copies. The reason three of them show up in the list below at
all is because they share their reps with branches, and they happen to
be counted on the tag by the tool, and the reps on the branches are
considered duplicates. It would be more fair to consider the tag reps as
duplicates, but it's not a big deal.
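
If you want to aggregate the list yourself, a small parser along these
lines should do. It is a sketch that assumes the line format shown at the
end of this email ("NAME has N fulltexts => M bytes" under "=== SECTION ==="
headers); the function names parse_report and totals_by_section are made up
here, and the sample uses only a few lines copied from the list.

```python
import re

# Matches e.g. "trunk has 10131 fulltexts => 121880685 bytes"
ENTRY_RE = re.compile(r"^(.+?) has (\d+) fulltexts => (\d+) bytes$")
# Matches e.g. "======== TRUNK ========"
HEADER_RE = re.compile(r"^=+ (\w+) =+$")


def parse_report(text):
    """Return a list of (section, name, fulltext_count, byte_size) rows."""
    section = None
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            section = header.group(1)
            continue
        entry = ENTRY_RE.match(line)
        if entry:
            rows.append((section, entry.group(1),
                         int(entry.group(2)), int(entry.group(3))))
    return rows


def totals_by_section(rows):
    """Sum fulltext counts and byte sizes per section."""
    totals = {}
    for section, _name, count, size in rows:
        c, s = totals.get(section, (0, 0))
        totals[section] = (c + count, s + size)
    return totals


sample = """\
==================================== TRUNK =====================================
trunk has 10131 fulltexts => 121880685 bytes
===================================== TAGS =====================================
before_gc_merge_981008 has 1 fulltexts => 1581 bytes
=================================== BRANCHES ===================================
libobjc-branch has 1 fulltexts => 817 bytes
gcc-3_5-integration-branch has 1 fulltexts => 805 bytes
"""
totals = totals_by_section(parse_report(sample))
```

Feeding it the full list instead of the sample gives per-section totals
that can be compared against the TOTAL line.
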
Many branches have had a long life, and changes have been merged
repeatedly from trunk. The effect of such merges is that a lot of files
on the branches are changed, i.e. new fulltexts are created. I think
that is a common pattern, and it will make the repository grow quite a bit.

I hope this info will be useful to someone. I've started to dump and
load the repository into fsfs, but it's going to take a while. The dump
alone took over seven hours (on a very fast machine).
/Tobias
==================================== TOTAL =====================================
The whole repository has 124859 fulltexts => 2743927756 bytes
==================================== TRUNK =====================================
trunk has 10131 fulltexts => 121880685 bytes
===================================== TAGS =====================================
before_gc_merge_990902 has 526 fulltexts => 5015769 bytes
before_gc_merge_990327 has 128 fulltexts => 938495 bytes
before_gc_merge_981008 has 1 fulltexts => 1581 bytes
=================================== BRANCHES ===================================
tree-ssa-20020619-branch has 12244 fulltexts => 117274085 bytes
objc-improvements-branch has 7194 fulltexts => 111811095 bytes
cxx-reflection-branch has 10687 fulltexts => 110264699 bytes
new-regalloc-branch has 11203 fulltexts => 101095883 bytes
libada-branch has 4690 fulltexts => 90234196 bytes
compile-server-branch has 5008 fulltexts => 88395574 bytes
csl-arm-branch has 4243 fulltexts => 87164997 bytes
lno-branch has 3276 fulltexts => 87080250 bytes
pch-branch has 4279 fulltexts => 86701946 bytes
ast-optimizer-branch has 3925 fulltexts => 85883790 bytes
rtlopt-branch has 3424 fulltexts => 84681982 bytes
dfa-branch has 3059 fulltexts => 77900526 bytes
tree-profiling-branch has 2169 fulltexts => 77107797 bytes
apple-ppc-branch has 2669 fulltexts => 76082593 bytes
cfg-branch has 3124 fulltexts => 75810131 bytes
cp-parser-branch-2 has 2997 fulltexts => 74125528 bytes
mips-3_4-rewrite-branch has 2646 fulltexts => 73598736 bytes
gcc-3_4-basic-improvements-branch has 2414 fulltexts => 71274095 bytes
itanium-sched-branch has 1961 fulltexts => 67211455 bytes
hammer-3_3-branch has 1402 fulltexts => 53918125 bytes
gcj-abi-2-dev-branch has 1498 fulltexts => 52274155 bytes
toplevel-bootstrap has 1477 fulltexts => 49479746 bytes
gcc-3_4-branch has 831 fulltexts => 47305014 bytes
gcc-3_3-branch has 1007 fulltexts => 45305784 bytes
cp-parser-branch has 1171 fulltexts => 44290366 bytes
bounded-pointers-branch has 1722 fulltexts => 43242221 bytes
gcc-3_3-rhl-branch has 765 fulltexts => 41973962 bytes
merged-arm-thumb-backend-branch has 1193 fulltexts => 39188981 bytes
gcc-3_0-branch has 1090 fulltexts => 38404867 bytes
tree-ssa-cfg-branch has 901 fulltexts => 37199672 bytes
gcc-3_3-e500-branch has 462 fulltexts => 33988740 bytes
gcc-3_2-rhl8-branch has 771 fulltexts => 33555940 bytes
egcs_gc_branch has 1516 fulltexts => 33065514 bytes
gcc-3_4-rhl-branch has 228 fulltexts => 30297935 bytes
new_ia32_branch has 667 fulltexts => 29593068 bytes
gcc3 has 636 fulltexts => 27696520 bytes
gomp-01-branch has 205 fulltexts => 26837214 bytes
gcc-3_1-branch has 1559 fulltexts => 26675923 bytes
condexec-branch has 355 fulltexts => 25250286 bytes
gcc-3_2-branch has 432 fulltexts => 23779114 bytes
gcc-2_95-branch has 240 fulltexts => 23248814 bytes
ffixinc-branch has 780 fulltexts => 22375609 bytes
cygwin-mingw-gcc-3_2_1-branch has 205 fulltexts => 17903496 bytes
egcs_1_1_branch has 201 fulltexts => 17634048 bytes
egcs_1_00_branch has 216 fulltexts => 16216779 bytes
sh-elf-3_5-branch has 217 fulltexts => 15220964 bytes
cygming332 has 133 fulltexts => 14857238 bytes
subreg-byte-branch has 69 fulltexts => 9723873 bytes
pchmerge-branch has 90 fulltexts => 8639561 bytes
cygwin-mingw-gcc-3_1-branch has 141 fulltexts => 7262523 bytes
bnw-simple-branch has 51 fulltexts => 5962480 bytes
g77_0_0_21_970811 has 81 fulltexts => 5862770 bytes
cygwin-mingw-v2-branch has 51 fulltexts => 4038427 bytes
gnu-win32-b20-branch has 26 fulltexts => 3622267 bytes
csl-hpux-branch has 13 fulltexts => 3112421 bytes
gcc-2_95_2_1-branch has 11 fulltexts => 2531583 bytes
tree-serialize-branch has 24 fulltexts => 2218210 bytes
fixincl-branch has 29 fulltexts => 1409856 bytes
newppc-branch has 32 fulltexts => 1345208 bytes
cygming331 has 104 fulltexts => 1088998 bytes
new-abi-branch has 4 fulltexts => 973877 bytes
meissner-ppc-branch has 3 fulltexts => 925675 bytes
stree-branch has 9 fulltexts => 865143 bytes
g77-0_6-branch has 12 fulltexts => 758875 bytes
cygwin-mingw-gcc-3_2-branch has 67 fulltexts => 455913 bytes
no_bogosity has 18 fulltexts => 427912 bytes
x86-64-branch has 101 fulltexts => 354550 bytes
gcc-3_2-rhl8-branchpoint has 40 fulltexts => 28353 bytes
egcs_ss_19980502 has 3 fulltexts => 1676 bytes
libobjc-branch has 1 fulltexts => 817 bytes
gcc-3_5-integration-branch has 1 fulltexts => 805 bytes
Received on Tue Jun 29 12:20:08 2004