
Re: [bug report] SVN design issue: checkouts horribly space-inefficient

From: Russ Brown <pickscrape_at_gmail.com>
Date: 2005-06-23 19:16:22 CEST

kfogel@collab.net wrote:
> Marc Mutz <marc@klaralvdalens-datakonsult.se> writes:
>
>>I just noticed while checking out some KDE SVN modules that my inodes were
>>quickly depleted. After lots of searching I found that the culprit is not my
>>huge maildir archive but the .svn directory, whose props and prop-base
>>directories each contain one file per object (file or dir) stored in
>>the repository. On a typical ext2/3 filesystem, this wastes
>>almost 98% of the space:
>>
>>$ cd kdelibs
>>$ du -sh .svn/props
>>96K .svn/props
>>$ wc -c .svn/props/* | grep total
>>1714 total
>>
>>This means that, absent tail-end optimizations in the filesystem (which
>>ext2/3 doesn't have), the space efficiency is 1.75% (1714/96K).
>>
>>Same for prop-base.
>>
>>Now, the amount of data this represents is
>>$ du -sch .svn/text-base/* | grep total
>>272K total
>>$ wc -c .svn/text-base/*|grep total
>>224236 total
>>-> 80.5% efficiency.
>>
>>Now, the bad thing is that props and prop-base are identical:
>>$ for i in $(cd .svn/props/; echo *.svn-work ); do
>> diff -u .svn/props/$i .svn/prop-base/${i/-work/-base}
>>done
>>
>>So a lot of inodes could be spared by simply hard-linking the files in those
>>dirs. That brings the efficiency up to 3.5%. The same thing could (optionally) be done
>>for text-base, on the assumption that most editors break that link during
>>saving.
>>
>>That would halve the overall space inefficiency.
>>
>>As it is now, svn uses about 4x more inodes for the same checkout as cvs. I
>>expected 50% both in space and inodes, due to the offline-diff capability.
>>
>>In ten years of using Unix, I've never been _near_ the inode limit, though
>>I've often been permanently in 90%+ disk-full mode.
>>
>>I hope you can apply these simple "optimizations" in the next release. They
>>typically speed up diffs, too; see the difference between
>> cp -ra old new-1
>> cp -la old new-2
>> time diff -ur old new-1
>> time diff -ur old new-2
>
>
> Totally agree with your analysis of the problem. The solutions are
> not so simple, however.
>
> One can't count on editors breaking hard links, and anyway the text
> base is often not the same as the working file, when keyword expansion
> or end-of-line translation is happening.
>
> But we could maybe use hard-links for the properties, since only
> Subversion itself edits those. Another solution would just be to put
> all the properties (working and base) in a single file. Or, put the
> properties and text base into one file per working file, with the text
> base coming last in that file.
>
> I would love to see a new issue filed about this, linking to this
> thread. Would you be willing to file that? Your mail with inode
> numbers is exactly the sort of quantitative evidence we needed to see.
>
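
[Illustrative sketch of the hard-linking idea for property files discussed
above. It is not how Subversion works or a proposed patch. The function name
link_identical_props is hypothetical, and the layout (props/NAME.svn-work
next to prop-base/NAME.svn-base) is assumed from the commands quoted in this
thread. It replaces a working property file with a hard link to its base
copy only when the two are byte-identical, and prints how many pairs it
linked.]

```shell
#!/bin/sh
# Hypothetical helper, not part of Subversion.
# Assumed layout (taken from the thread):
#   <admin>/props/NAME.svn-work  and  <admin>/prop-base/NAME.svn-base
link_identical_props() {
    admin=$1                                 # path to a .svn directory
    linked=0
    for work in "$admin"/props/*.svn-work; do
        [ -f "$work" ] || continue           # glob matched nothing
        name=${work##*/}                     # strip the directory part
        base="$admin/prop-base/${name%-work}-base"
        [ -f "$base" ] || continue           # no base copy to share with
        [ "$work" -ef "$base" ] && continue  # already the same inode
        if cmp -s "$work" "$base"; then
            # Identical contents: make the working file a link to the base.
            ln -f "$base" "$work" && linked=$((linked + 1))
        fi
    done
    echo "$linked"
}
```

[Usage would be something like "link_identical_props kdelibs/.svn". Once two
names share an inode, anything that rewrites the working file in place also
changes the base copy, so whatever modifies a property would have to write a
new file and rename it over the old one instead.]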

I don't know how svk does it, but I have the entire repository mirrored
on my machine (including all branches) and the trunk checked out, and it
occupies a combined total of about 700MB, which compares very well with
the svn trunk checkout of 600MB. Plus, the svk trunk checkout doesn't
have any extra directories. It's effectively a clean copy of the trunk.

I immediately start to save disk space once I check out one or more
branches.

I suspect that it works by not bothering with the pristine files at
all: since the repository is mirrored locally, there's no network bandwidth
to worry about when accessing the files directly from the mirrored repository.

-- 
Russ.
Received on Thu Jun 23 19:24:32 2005

This is an archived mail posted to the Subversion Users mailing list.