Re: Running a repository out of RAM?

From: Ryan Schmidt <subversion-2007b_at_ryandesign.com>
Date: 2007-06-20 03:06:06 CEST

On Jun 19, 2007, at 17:57, Troy Curtis Jr wrote:

> On 6/17/07, Ryan Schmidt wrote:
>
>> If you can't find any reliable persistent RAM disk software, and can
>> only find the regular it's-gone-if-you-don't-manually-rescue-it kind,
>> maybe you can make a reliable RAM disk yourself. Again, I don't know
>> about Windows or Linux, but on Mac OS X (since 10.4) the kernel
>> immediately knows when a file is added anywhere, and can run scripts
>> you specify when files are added to specific places. So, conceivably,
>> one could write a set of scripts. One, the init script, would create
>> the RAM disk and copy the repository to it from the hard drive; you
>> would run this at server startup. Another, the update script, would
>> ensure that, whenever any new files appear on the RAM disk, they get
>> copied to the hard drive. A third script could compare the contents
>> of the RAM disk with the hard disk to ensure that the update script
>> is doing its job; you could run this every so often until you're sure
>> the system is solid.
>>
>> Or you could certainly handle copying to the hard drive in the post-
>> commit (and post-revprop-change) hook, too. That would take care of
>> the revisions, anyway. If you go this way, you may want to leave the
>> hook scripts, config and other directories on the hard disk and not
>> move them to the RAM disk at all.
>>
>> Presumably you would also want to tell your OS not to cache files
>> from the RAM disk a second time in the disk cache. I don't know how
>> the OS handles that by default, or how to configure that kind of
>> thing.
>>
>> As proposed so far, your RAM usage will grow as your repository size
>> grows, and will never decrease. That sounds like a dangerous
>> situation to set up, since your RAM, though spacious, is still
>> finite. So I would advise taking the scripts a step further to limit
>> the amount of data that would end up on the RAM disk. You could write
>> the init script so that only the most recent, say, 1000 revisions get
>> copied to the RAM disk, and for the rest of them, just make symlinks
>> to the real files on the hard disk. [1] For the update script, as new
>> revisions get added, remove old revisions from the RAM disk and
>> replace them with symlinks to their hard disk counterparts. This way,
>> active development on current revisions is fast, old revisions are
>> still available at regular disk speed, and you keep your RAM usage
>> from growing unbounded.
>>
>> Perhaps this introduces a race condition where someone tries to
>> access a revision at exactly the moment that it has been deleted, before
>> the symlink has been recreated? If so, maybe it's safer to defer
>> these replace-with-symlink operations until some non-peak usage time,
>> during which you might even want to disable all access to the
>> repository.
>>
>> A better variation, though slightly harder to implement, would be to
>> limit the revisions on the RAM disk by the size they occupy, not just
>> by the quantity of revisions. In fact it's probably necessary to do
>> it this way, since your RAM disk will almost certainly be of a fixed
>> size. All the RAM you're willing to dedicate to the RAM disk will
>> then be in use all the time, so you'll want to fill it with as many
>> revisions as possible while leaving enough room for the largest new
>> revision you expect to have committed.
>
> A very interesting and thoroughly explained idea! It seems like it
> could be made to work but I don't think the slight speed improvement
> would justify such a complicated and potentially unreliable setup on a
> production repository! Especially since it seems like several of my
> recent scripts have been behaving in strange and unexpected ways :)
> However, it is a very interesting idea nonetheless.

I thought the same thing as I was writing it down. It is a bit
complicated and potentially error-prone. If you do decide to pursue
it, of course you'd want to extensively stress-test it before
deploying it. Write (or find) a script that does lots of commits and
checkouts, and run it on several machines at once, and make sure no
weird errors occur. You could also use this as a means of comparing
performance. For example, run such a script against a normal disk
repo, and then try again with the proposed RAM disk repo, and see how
the performance compares.
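
As a rough illustration of that kind of stress test, here is a minimal
Python sketch. It is not an existing tool: the names time_operation and
summarize are made up for this example, and the operation you pass in
(a commit, a checkout, whatever you want to hammer on) is up to you. It
runs the operation many times from several threads, collects any
failures, and reports a median time so two repositories can be compared.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def time_operation(operation, runs=20, workers=4):
    """Run `operation` `runs` times across `workers` threads.

    Returns (durations, errors): wall-clock seconds for each call that
    succeeded, and the exceptions raised by the calls that failed.
    """
    def timed_call(_):
        start = time.perf_counter()
        operation()
        return time.perf_counter() - start

    durations, errors = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(timed_call, i) for i in range(runs)]
        for future in futures:
            try:
                durations.append(future.result())
            except Exception as exc:  # any failure is worth investigating
                errors.append(exc)
    return durations, errors


def summarize(label, durations, errors):
    """Format a one-line summary for one repository under test."""
    median = statistics.median(durations) if durations else float("nan")
    return (f"{label}: median {median:.4f}s over {len(durations)} ok, "
            f"{len(errors)} failed")
```

You would call time_operation once with an operation that talks to the
normal disk repo and once with one that talks to the RAM disk repo, for
example a function that runs "svn checkout" through subprocess.run with
check=True so that a failing command raises, and then compare the two
summaries. Running the same script on several machines at once covers
the concurrent-access case.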

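The size-limited variation quoted above, and the symlink swap it depends
on, could be sketched roughly as follows. This is only an illustration
under assumptions: the function names are invented, the caller is
assumed to supply a mapping from revision number to file size, symlinks
are treated as taking no space, and the swap relies on POSIX rename
semantics. Whether Subversion is happy with revision files being
swapped underneath it is exactly what the stress test would have to
show.

```python
import os


def choose_resident_revisions(sizes, capacity, reserve):
    """Pick the newest revisions that fit on the RAM disk.

    sizes:    dict mapping revision number -> size in bytes of its file
    capacity: total bytes available on the RAM disk
    reserve:  bytes kept free for the largest commit you expect
    Returns the set of revision numbers to keep in RAM; every other
    revision would be a symlink to its copy on the hard disk.
    """
    budget = capacity - reserve
    resident = set()
    for rev in sorted(sizes, reverse=True):  # newest first
        if sizes[rev] > budget:
            break  # stop here so the resident set stays one recent range
        resident.add(rev)
        budget -= sizes[rev]
    return resident


def replace_with_symlink(ram_path, disk_path):
    """Swap the file at ram_path for a symlink to disk_path.

    The link is created under a temporary name and then renamed over
    the original, so a reader finds either the old file or the new
    link, never a missing file.
    """
    tmp_path = ram_path + ".tmp-link"  # hypothetical temporary name
    os.symlink(disk_path, tmp_path)
    os.replace(tmp_path, ram_path)  # rename is atomic on POSIX systems
```

Creating the link first and renaming it into place is one way to avoid
the delete-then-recreate window mentioned in the quoted text, though
deferring the swaps to a quiet period is still the more cautious
choice.
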
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org
Received on Wed Jun 20 03:06:17 2007
