Re: Running a repository out of RAM?

From: Troy Curtis Jr <troycurtisjr_at_gmail.com>
Date: 2007-06-20 00:57:58 CEST

On 6/17/07, Ryan Schmidt <subversion-2007b@ryandesign.com> wrote:
> On Jun 17, 2007, at 17:03, Troy Curtis Jr wrote:
>
> > On 6/17/07, Carsten Koch wrote:
> >
> >> FSFS is a zillion tiny files, so what you can do is
> >>
> >> * use FSFS for your repository.
> >>
> >> * set vfs_cache_pressure to 0, so inodes will stay in RAM.
> >>
> >> * maybe preload the inodes to RAM by running a
> >> find /path-to-repository > /dev/null
> >> at boot time.
> >>
> >> If you still have lots of available RAM after that,
> >> you could preload even the data to RAM by running a
> >> find /path-to-repository -type f | xargs cat > /dev/null
> >
> > Thanks for the suggestions, but ultimately they are basically the same
> > ones I have already seen. The only issue with this is that it doesn't really
> > tell the kernel to keep particular inodes cached in RAM, and so they
> > will get replaced if you do enough other disk I/O.
> >
> > This was the approach that I first investigated, but I would
> > constantly have to do this to make sure the repo data stayed in RAM.
> > Oh well.
>
> I think you're right, that this isn't necessarily a Subversion
> question, but a persistent reliable RAM disk question -- something
> where reads were as fast as RAM, and writes always went to the hard
> disk right away (or very soon) for security. That kind of software is
> rather OS-specific; I could for example look into some Mac OS X RAM
> disk software, but wouldn't know where to start for Linux or Windows.
> I've also heard of hybrid hard disks that are now or will soon be
> available which combine traditional hard disk platters with several
> gigs of flash RAM, such that performance approaches that of a RAM
> disk while ensuring no loss of data. However, you said you want to
> use your existing abundant system RAM, so let's proceed with that.
>
> I know you said you're using a BDB repository, but my suggestions
> below probably require a FSFS repository.
>
> If you can't find any reliable persistent RAM disk software, and can
> only find the regular it's-gone-if-you-don't-manually-rescue-it kind,
> maybe you can make a reliable RAM disk yourself. Again, I don't know
> about Windows or Linux, but on Mac OS X (since 10.4) the kernel
> immediately knows when a file is added anywhere, and can run scripts
> you specify when files are added to specific places. So, conceivably,
> one could write a set of scripts. One, the init script, would create
> the RAM disk and copy the repository to it from the hard drive; you
> would run this at server startup. Another, the update script, would
> ensure that, whenever any new files appear on the RAM disk, they get
> copied to the hard drive. A third script could compare the contents
> of the RAM disk with the hard disk to ensure that the update script
> is doing its job; you could run this every so often until you're sure
> the system is solid.
>
> Or you could certainly handle copying to the hard drive in the post-
> commit (and post-revprop-change) hook, too. That would take care of
> the revisions, anyway. If you go this way, you may want to leave the
> hook scripts, config and other directories on the hard disk and not
> move them to the RAM disk at all.
>
> Presumably you would also want to inform your OS not to again cache
> any files from the RAM disk in the disk cache... Don't know how the
> OS would handle that by default... Also don't know how to configure
> that kind of thing.
>
> As proposed so far, your RAM usage will grow as your repository size
> grows, and will never decrease. That sounds like a dangerous
> situation to set up, since your RAM, though spacious, is still
> finite. So I would advise taking the scripts a step farther to limit
> the amount of data that would end up on the RAM disk. You could write
> the init script so that only the most recent, say, 1000 revisions get
> copied to the RAM disk, and for the rest of them, just make symlinks
> to the real files on the hard disk. [1] For the update script, as new
> revisions get added, remove old revisions from the RAM disk and
> replace them with symlinks to their hard disk counterparts. This way,
> active development on current revisions is fast, old revisions are
> still available at regular disk speed, and you keep your RAM usage
> from growing unbounded.
>
> Perhaps this introduces a race condition where someone tries to
> access a revision at exactly the moment that it has been deleted, before
> the symlink has been recreated? If so, maybe it's safer to defer
> these replace-with-symlink operations until some non-peak usage time,
> during which you might even want to disable all access to the
> repository.
>
> A better variation, though slightly harder to implement, would be to
> limit the revisions on the RAM disk by the size they occupy, not just
> by the quantity of revisions. In fact it's probably necessary to do
> it this way, since your RAM disk will almost certainly be of a fixed
> size. So all the RAM you're willing to dedicate to the RAM disk will
> be used up all the time, so you'll want to maximize its use, so
> you'll want to fill it up with as many revisions as possible, while
> leaving enough room for the largest new revision you expect to have
> committed.
>
>
> [1] This strategy has been recommended before as a means of growing a
> repository when you have run out of disk space without needing to
> move the entire repository to a single larger disk or array -- just
> symlink some of the old revisions off to another disk. Um... I was
> sure this was described either in the book or in the FAQ but I can't
> find it now.
>
>
>

A very interesting and thoroughly explained idea! It seems like it
could be made to work, but I don't think the slight speed improvement
would justify such a complicated and potentially unreliable setup on a
production repository! Especially since several of my recent scripts
have been behaving in strange and unexpected ways :)
Still, it is a very interesting idea.
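
For anyone who wants to experiment with it anyway, here is a rough
sketch of the size-limited selection described above: walk the
revision files from newest to oldest and stop once a byte budget is
used up. It assumes revision files named by number in a single
directory, as under db/revs in an FSFS repository. The function name
select_recent_revs and its two arguments are made up for illustration,
and this only picks the revisions; the copying and symlinking are not
shown.

```shell
#!/bin/sh
# Hypothetical helper: print the newest revision numbers whose files
# fit, in total, within a byte budget.
#   $1 = directory holding one file per revision (assumed flat layout)
#   $2 = budget in bytes (e.g. RAM disk size minus some headroom)
select_recent_revs() {
    revs_dir=$1
    budget=$2
    used=0
    # Consider only purely numeric file names, highest revision first.
    for rev in $(ls "$revs_dir" | grep -E '^[0-9]+$' | sort -rn); do
        size=$(wc -c < "$revs_dir/$rev")
        # Stop at the first revision that would exceed the budget, so
        # the selected set stays a contiguous run of recent revisions.
        if [ $((used + size)) -gt "$budget" ]; then
            break
        fi
        used=$((used + size))
        echo "$rev"
    done
}
```

The budget passed in would be the RAM disk size minus room for the
largest new revision you expect to have committed. Everything not
printed would stay on the hard disk behind a symlink.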

It does seem like FSFS is a little more flexible in a lot of ways.
For instance, with FSFS I could conceivably do a periodic "cat * >
/dev/null" in the directory containing the revisions to constantly
keep them in the native Linux disk cache. Apparently even a read-only
operation like this is not really safe to do on a BDB repo (or at
least a copy is not safe).
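
A minimal sketch of that periodic read, assuming an FSFS repository
whose revision files sit under db/revs. The function name warm_cache
and the paths in the usage note are placeholders. It uses find rather
than "cat *" so it keeps working when the directory holds too many
files for one command line. As noted earlier in the thread, this only
re-reads the files; it does not pin them in the cache, so the kernel
can still evict them under enough other disk I/O.

```shell
#!/bin/sh
# Hypothetical helper: read every regular file under a directory once
# and discard the data, so the contents pass through the disk cache.
#   $1 = directory to read (e.g. the repository's db/revs)
warm_cache() {
    dir=$1
    # "-exec cat {} +" batches many files per cat invocation.
    find "$dir" -type f -exec cat {} + > /dev/null
}
```

Saved as a script that ends with a call such as
warm_cache /path-to-repository/db/revs, it could be run from cron
every few minutes; how often depends on how much other I/O the
machine sees.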

Thanks a lot!

-- 
"Beware of spyware. If you can, use the Firefox browser." - USA Today
Download now at http://getfirefox.com
Registered Linux User #354814 ( http://counter.li.org/)

Received on Wed Jun 20 00:58:12 2007
