
Re: Running a repository out of RAM?

From: Ryan Schmidt <subversion-2007b_at_ryandesign.com>
Date: 2007-06-18 01:04:52 CEST

On Jun 17, 2007, at 17:03, Troy Curtis Jr wrote:

> On 6/17/07, Carsten Koch wrote:
>
>> FSFS is a zillion tiny files, so what you can do is
>>
>> * use FSFS for your repository.
>>
>> * set vfs_cache_pressure to 0, so inodes will stay in RAM.
>>
>> * maybe preload the inodes to RAM by running a
>> find /path-to-repository > /dev/null
>> at boot time.
>>
>> If you still have lots of available RAM after that,
>> you could preload even the data to RAM by running a
>> find /path-to-repository -type f | xargs cat > /dev/null
>
> Thanks for the suggestions, but ultimately they are basically the same
> that I have seen. The only issue with this is that it doesn't really
> tell the kernel to keep particular inodes cached in RAM, and so they
> will get replaced if you do enough other disk I/O.
>
> This was the approach that I first investigated, but I would have to
> do this constantly to make sure the repo data stayed in RAM.
> Oh well.

I think you're right that this isn't necessarily a Subversion
question, but a persistent, reliable RAM disk question -- something
where reads are as fast as RAM, and writes always go to the hard
disk right away (or very soon) for safety. That kind of software is
rather OS-specific; I could for example look into some Mac OS X RAM
disk software, but wouldn't know where to start for Linux or Windows.
I've also heard of hybrid hard disks that are now or will soon be
available which combine traditional hard disk platters with several
gigs of flash memory, such that performance approaches that of a RAM
disk while ensuring no loss of data. However, you said you want to
use your existing abundant system RAM, so let's proceed with that.

I know you said you're using a BDB repository, but my suggestions
below probably require a FSFS repository.

If you can't find any reliable persistent RAM disk software, and can
only find the regular it's-gone-if-you-don't-manually-rescue-it kind,
maybe you can make a reliable RAM disk yourself. Again, I don't know
about Windows or Linux, but on Mac OS X (since 10.4) the kernel
immediately knows when a file is added anywhere, and can run scripts
you specify when files are added to specific places. So, conceivably,
one could write a set of scripts. One, the init script, would create
the RAM disk and copy the repository to it from the hard drive; you
would run this at server startup. Another, the update script, would
ensure that, whenever any new files appear on the RAM disk, they get
copied to the hard drive. A third script could compare the contents
of the RAM disk with the hard disk to ensure that the update script
is doing its job; you could run this every so often until you're sure
the system is solid.
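
As a rough illustration of the update and compare scripts, here is a
minimal Python sketch. The function names and the two directory
arguments are made up for the example, and how the update function
gets triggered (a file-system watcher, a cron job, whatever your OS
offers) is left open. It only copies files that are new on the RAM
disk, as described above.

```python
import filecmp
import os
import shutil


def sync_new_files(ram_root, disk_root):
    """Copy files that exist under ram_root but not yet under disk_root."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(ram_root):
        rel_dir = os.path.relpath(dirpath, ram_root)
        target_dir = os.path.normpath(os.path.join(disk_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            if not os.path.exists(dst):
                shutil.copy2(src, dst)  # copy contents and timestamps
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)


def find_mismatches(ram_root, disk_root):
    """Return relative paths that are missing on disk or differ in content."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(ram_root):
        rel_dir = os.path.relpath(dirpath, ram_root)
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            src = os.path.join(ram_root, rel)
            dst = os.path.join(disk_root, rel)
            # Byte-for-byte comparison, not just size and mtime.
            if not os.path.isfile(dst) or not filecmp.cmp(src, dst, shallow=False):
                bad.append(rel)
    return sorted(bad)
```

Since the update function ignores files that already exist on the hard
disk, anything that is modified in place on the RAM disk would show up
in the comparison and would need separate handling.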

Or you could certainly handle copying to the hard drive in the
post-commit (and post-revprop-change) hook, too. That would take care of
the revisions, anyway. If you go this way, you may want to leave the
hook scripts, config and other directories on the hard disk and not
move them to the RAM disk at all.
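
A hook along these lines might look like the following Python sketch.
It assumes a plain FSFS layout in which each revision is stored as
db/revs/N with its properties in db/revprops/N and the youngest
revision recorded in db/current; layouts that place these files in
subdirectories would need the paths adjusted. DISK_REPOS and the
function name are placeholders I made up. Subversion passes the
repository path and the revision number as the first two arguments to
both hooks, so the same script could serve for post-revprop-change.

```python
#!/usr/bin/env python3
import os
import shutil
import sys

# Assumption: location of the durable copy of the repository.
DISK_REPOS = "/var/svn/repos-on-disk"


def persist_revision(ram_repos, disk_repos, rev):
    """Copy the files belonging to revision `rev` to the hard disk copy."""
    copied = []
    # db/current goes last so the disk copy never names a revision
    # whose files have not been copied yet.
    for rel in (os.path.join("db", "revs", str(rev)),
                os.path.join("db", "revprops", str(rev)),
                os.path.join("db", "current")):
        src = os.path.join(ram_repos, rel)
        if not os.path.isfile(src):
            continue
        dst = os.path.join(disk_repos, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel)
    return copied


if __name__ == "__main__" and len(sys.argv) >= 3:
    # argv[1] is the repository path, argv[2] the revision number.
    persist_revision(sys.argv[1], DISK_REPOS, int(sys.argv[2]))
```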

Presumably you would also want to tell your OS not to cache files
from the RAM disk a second time in the disk cache... I don't know how
the OS would handle that by default, or how to configure that kind of
thing.

As proposed so far, your RAM usage will grow as your repository size
grows, and will never decrease. That sounds like a dangerous
situation to set up, since your RAM, though spacious, is still
finite. So I would advise taking the scripts a step further to limit
the amount of data that would end up on the RAM disk. You could write
the init script so that only the most recent, say, 1000 revisions get
copied to the RAM disk, and for the rest of them, just make symlinks
to the real files on the hard disk. [1] For the update script, as new
revisions get added, remove old revisions from the RAM disk and
replace them with symlinks to their hard disk counterparts. This way,
active development on current revisions is fast, old revisions are
still available at regular disk speed, and you keep your RAM usage
from growing unbounded.
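
To make the symlink idea concrete, here is a small Python sketch. It
assumes the revision files sit directly in one directory and are
named by revision number; the function name and arguments are
invented for the example. It never removes a file from the RAM disk
unless the copy on the hard disk is already there.

```python
import os


def link_old_revisions(ram_revs, disk_revs, keep):
    """Replace all but the `keep` newest revision files with symlinks."""
    revs = sorted(int(n) for n in os.listdir(ram_revs) if n.isdigit())
    old = revs[:-keep] if keep > 0 else revs
    replaced = []
    for rev in old:
        ram_file = os.path.join(ram_revs, str(rev))
        if os.path.islink(ram_file):
            continue  # already points at the hard disk
        disk_file = os.path.join(disk_revs, str(rev))
        if not os.path.isfile(disk_file):
            continue  # no durable copy yet, so leave the RAM copy alone
        os.remove(ram_file)
        os.symlink(disk_file, ram_file)
        replaced.append(rev)
    return replaced
```

Calling link_old_revisions(ram_revs, disk_revs, 1000) after each
commit would keep roughly the newest 1000 revisions resident.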

Perhaps this introduces a race condition where someone tries to
access a revision at exactly the moment that it has been deleted, before
the symlink has been recreated? If so, maybe it's safer to defer
these replace-with-symlink operations until some non-peak usage time,
during which you might even want to disable all access to the
repository.
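
Another way around the race, which is not what I described above but
may be worth considering, is to never delete the file at all: create
the symlink under a temporary name and rename it over the original.
On POSIX file systems a rename within the same directory replaces the
target atomically, so a reader sees either the old file or the new
symlink. A sketch in Python, with an invented function name and
temporary suffix:

```python
import os


def replace_with_symlink(ram_file, disk_file):
    """Swap ram_file for a symlink to disk_file with no gap in between."""
    tmp = ram_file + ".tmp-link"  # hypothetical temporary name
    if os.path.lexists(tmp):
        os.remove(tmp)  # clean up after an interrupted earlier run
    os.symlink(disk_file, tmp)
    # os.replace renames the link itself over the existing file.
    os.replace(tmp, ram_file)
```

I have not tried this against a live repository, so treat it as an
idea to test rather than a known-good recipe.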

A better variation, though slightly harder to implement, would be to
limit the revisions on the RAM disk by the size they occupy, not just
by the quantity of revisions. In fact it's probably necessary to do
it this way, since your RAM disk will almost certainly be of a fixed
size. All the RAM you dedicate to the RAM disk is used up all the
time whether or not it holds data, so you'll want to fill it with as
many revisions as possible, while leaving enough room for the largest
new revision you expect to have committed.
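
The size-based selection could be sketched like this in Python. The
names are invented, sizes are in bytes, and the revision files are
assumed to sit directly in one directory named by revision number.
The headroom argument is the space reserved for the largest commit
you expect.

```python
import os


def revision_sizes(revs_dir):
    """Map revision number to file size for every revision file in revs_dir."""
    return {int(n): os.path.getsize(os.path.join(revs_dir, n))
            for n in os.listdir(revs_dir) if n.isdigit()}


def choose_resident_revisions(sizes, capacity, headroom):
    """Pick the newest revisions that fit in capacity minus headroom."""
    budget = capacity - headroom
    chosen = []
    used = 0
    for rev in sorted(sizes, reverse=True):  # newest first
        if used + sizes[rev] > budget:
            break  # stop here so the resident set stays contiguous
        chosen.append(rev)
        used += sizes[rev]
    return sorted(chosen)
```

Stopping at the first revision that doesn't fit keeps a contiguous
run of the newest revisions in RAM. Skipping it and trying older,
smaller ones would pack the disk more tightly, at the cost of a less
predictable set of fast revisions.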

[1] This strategy has been recommended before as a means of growing a
repository when you have run out of disk space without needing to
move the entire repository to a single larger disk or array -- just
symlink some of the old revisions off to another disk. Um... I was
sure this was described either in the book or in the FAQ but I can't
find it now.

Received on Mon Jun 18 01:05:51 2007

This is an archived mail posted to the Subversion Users mailing list.