Re: [bug report] SVN design issue: checkouts horribly space-inefficient

From: <kfogel_at_collab.net>
Date: 2005-06-23 17:31:20 CEST

Marc Mutz <marc@klaralvdalens-datakonsult.se> writes:
> I just noticed while checking out some KDE SVN modules that my inodes were
> quickly depleted. After lots of searching I found that the culprit is not my
> huge maildir archive but the .svn directory, whose props and prop-base
> directories each contain one file for every object (file or dir) stored in
> the repository. On a typical ext2/3 filesystem, this wastes almost 98% of
> the space:
>
> $ cd kdelibs
> $ du -sh .svn/props
> 96K .svn/props
> $ wc -c .svn/props/* | grep total
> 1714 total
>
> This means that, absent tail-packing optimizations in the filesystem (which
> ext2/3 doesn't have), the space efficiency is 1.75% (1714/96K).
>
> Same for prop-base.
>
> Now, the amount of data this represents is
> $ du -sch .svn/text-base/* | grep total
> 272K total
> $ wc -c .svn/text-base/*|grep total
> 224236 total
> -> 80.5% efficiency.
>
> Now, the bad thing is that props and prop-base are identical:
> $ for i in $(cd .svn/props/; echo *.svn-work ); do
>     diff -u .svn/props/$i .svn/prop-base/${i/-work/-base}
>   done
>
> So a lot of inodes could be spared by simply hard-linking the files in those
> dirs. That brings the efficiency up to 3.5%. The same thing could (optionally)
> be done for text-base, on the assumption that most editors break the link
> when saving.
>
> That would halve the overall space inefficiency.
>
> As it is now, svn uses about 4x more inodes for the same checkout as cvs. I
> expected 50% both in space and inodes, due to the offline-diff capability.
>
> In ten years of using Unix, I've never been _near_ the inode limit, though
> I've often been permanently in 90%+ disk-full mode.
>
> I hope you can apply these simple "optimizations" in the next release. They
> typically speed up diffs, too; see the difference between
>   cp -ra old new-1
>   cp -la old new-2
>   time diff -ur old new-1
>   time diff -ur old new-2

Totally agree with your analysis of the problem. The solutions are
not so simple, however.

One can't count on editors breaking hard links, and anyway the text
base is often not the same as the working file, when keyword expansion
or end-of-line translation is happening.
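
To make the first point concrete, here is a small sketch of the two
common ways a program saves a file, run against a hard-linked pair. The
file names (base1, work1, and so on) are made up for the demo; nothing
here touches a real working copy.

```shell
#!/bin/sh
# Sketch: how two save strategies interact with a hard link.
# All file names are hypothetical and live in a throwaway temp directory.
set -eu
tmp=$(mktemp -d)

# Case 1: the "editor" rewrites the file in place.
printf 'base\n' > "$tmp/base1"
ln "$tmp/base1" "$tmp/work1"          # work1 and base1 share one inode
printf 'edited\n' > "$tmp/work1"      # truncates and rewrites that shared inode
inplace_base=$(cat "$tmp/base1")      # the "pristine" copy changed too

# Case 2: the "editor" writes a temp file and renames it over the original.
printf 'base\n' > "$tmp/base2"
ln "$tmp/base2" "$tmp/work2"
printf 'edited\n' > "$tmp/work2.tmp"
mv "$tmp/work2.tmp" "$tmp/work2"      # the name work2 now points at a new inode
rename_base=$(cat "$tmp/base2")       # the pristine copy is untouched

echo "in-place save: base now contains '$inplace_base'"
echo "rename save:   base still contains '$rename_base'"
rm -rf "$tmp"
```

Only the second strategy leaves the base intact, and which strategy a
given editor uses is outside Subversion's control.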

But we could maybe use hard-links for the properties, since only
Subversion itself edits those. Another solution would just be to put
all the properties (working and base) in a single file. Or, put the
properties and text base into one file per working file, with the text
base coming last in that file.
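
As a rough illustration of the hard-link idea for properties (not
something Subversion does, and not a recommendation to run it against a
live working copy), the sketch below replaces each props file that is
byte-identical to its prop-base counterpart with a hard link. The
function name link_identical_props and the file contents are invented
for the demo; the props/prop-base layout and the .svn-work/.svn-base
suffixes are the ones shown in the quoted mail.

```shell
#!/bin/sh
# Sketch only: hard-link working props to identical base props.
# link_identical_props is a hypothetical helper, not part of Subversion.
set -eu

inode_of() { ls -i "$1" | awk '{print $1}'; }

link_identical_props() {
    admin_dir=$1
    linked=0
    for work in "$admin_dir"/props/*.svn-work; do
        [ -f "$work" ] || continue
        name=$(basename "$work" .svn-work)
        base="$admin_dir/prop-base/$name.svn-base"
        [ -f "$base" ] || continue
        # Skip pairs that already share an inode.
        [ "$(inode_of "$work")" != "$(inode_of "$base")" ] || continue
        if cmp -s "$work" "$base"; then
            ln -f "$base" "$work"     # replace the copy with a link to the base
            linked=$((linked + 1))
        fi
    done
    echo "$linked"
}

# Demo on a fake admin directory with placeholder contents.
demo=$(mktemp -d)
mkdir -p "$demo/.svn/props" "$demo/.svn/prop-base"
printf 'same\n'     > "$demo/.svn/props/a.svn-work"
printf 'same\n'     > "$demo/.svn/prop-base/a.svn-base"
printf 'modified\n' > "$demo/.svn/props/b.svn-work"
printf 'pristine\n' > "$demo/.svn/prop-base/b.svn-base"

linked_count=$(link_identical_props "$demo/.svn")
a_work_inode=$(inode_of "$demo/.svn/props/a.svn-work")
a_base_inode=$(inode_of "$demo/.svn/prop-base/a.svn-base")
b_work_inode=$(inode_of "$demo/.svn/props/b.svn-work")
b_base_inode=$(inode_of "$demo/.svn/prop-base/b.svn-base")
echo "pairs linked: $linked_count"
rm -rf "$demo"
```

For this to be safe, whatever later modifies a working property would
have to write a new file and rename it into place rather than edit the
linked file directly.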

I would love to see a new issue filed about this, linking to this
thread. Would you be willing to file that? Your mail with inode
numbers is exactly the sort of quantitative evidence we needed to see.
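
For anyone who wants to attach comparable numbers to such an issue, here
is one possible way to collect them. The helper names dir_space_report
and space_efficiency are invented for this sketch, and the allocated
size that du reports will vary with the filesystem and its block size.

```shell
#!/bin/sh
# Sketch: compare bytes stored with disk space allocated for a directory.
# dir_space_report and space_efficiency are hypothetical helper names.
set -eu

# Prints: <file count> <apparent bytes> <allocated KiB>
dir_space_report() {
    dir=$1
    entries=$(find "$dir" -type f | wc -l | tr -d ' ')
    apparent=$(find "$dir" -type f -exec cat {} + | wc -c | tr -d ' ')
    allocated_kib=$(du -sk "$dir" | awk '{print $1}')
    echo "$entries $apparent $allocated_kib"
}

# Prints apparent bytes as a percentage of allocated space.
space_efficiency() {
    awk -v a="$1" -v k="$2" \
        'BEGIN { if (k > 0) printf "%.2f\n", 100 * a / (k * 1024); else print "n/a" }'
}

# Demo on a throwaway directory holding three 10-byte files.
demo=$(mktemp -d)
for n in 1 2 3; do
    printf '0123456789' > "$demo/file$n"
done

set -- $(dir_space_report "$demo")
demo_entries=$1
demo_apparent=$2
demo_allocated_kib=$3
echo "files: $demo_entries, bytes: $demo_apparent, allocated: ${demo_allocated_kib}K"
echo "efficiency: $(space_efficiency "$demo_apparent" "$demo_allocated_kib")%"
rm -rf "$demo"
```

Pointing dir_space_report at .svn/props, .svn/prop-base and
.svn/text-base in turn would give figures of the same kind as those in
the quoted mail.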

Received on Thu Jun 23 18:42:42 2005

This is an archived mail posted to the Subversion Users mailing list.
