
Re: DFS alternative for linux

From: Les Mikesell <lesmikesell_at_gmail.com>
Date: 2006-10-05 05:37:43 CEST

On Wed, 2006-10-04 at 20:10, Ruslan Sivak wrote:
> >
> > Since you know when the update needs to be done, do it with
> > rsync over ssh to as many other places as necessary, or ssh
> > an 'svn update' command. It is a good idea to wrap these
> > operations in control scripts from the start, so that things
> > like adding servers, or dropping them out of a load balancer
> > during the update, can be added if/when needed, and the users
> > just run the same script to make a change.
> >
> >
> rsync is still kind of slow on large data sizes. DFS is super slow
> when you have a lot of data to sync over, but once the data is
> there, if you update 1 file out of 50000 files, it will sync almost
> instantly. Rsync will have to check all 50000 files.

Rsync can do that scan very quickly if you have enough RAM to
keep the directory entries for those files in cache - or if
you can restrict the run to a smaller directory tree containing
the changed files.
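
For example, a control script along the lines I suggested might
look like this - the hostnames and paths are invented, so
substitute your own:

    #!/bin/sh
    # push.sh - sync one changed subtree to every web server.
    # Restricting the run to the subtree keeps the file scan small.
    SUBDIR=${1:?usage: push.sh <subdir under /var/www/htdocs>}
    for host in web1 web2 web3; do
        rsync -az --delete -e ssh \
            "/var/www/htdocs/$SUBDIR/" \
            "$host:/var/www/htdocs/$SUBDIR/"
    done

Run it as './push.sh images' after an upload and rsync only has
to scan that one directory tree instead of all 50000 files.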

> svn update won't work for things like people uploading images to
> the webserver. The images get uploaded into the working copy, and
> eventually I go through and check them into the repo. So the only
> thing that might work here would be rsync, but like I mentioned
> before, that would be pretty slow.

Again, I wouldn't want this update to happen in production
without tracking it through the repository, so you know
what changed and have the ability to back it out. However,
if that's what you want, perhaps you can at least restrict
it to a subset of the directories, which would speed up
an rsync script.
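
If you do track the files in the repository, the same kind of
script works with 'svn update' instead - again, the hostnames and
working copy path are just examples:

    #!/bin/sh
    # deploy.sh - bring every server's working copy up to date.
    # 'svn update' only touches files that actually changed, and a
    # reverse merge in the repository can back a change out later.
    WC=/var/www/htdocs
    for host in web1 web2 web3; do
        ssh "$host" "svn update $WC"
    done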

> The best solution would be some sort of filesystem that detects changes
> to the filesystem and sends out updates to the other cluster members.
> I'm sure there is a filesystem like that out there, I just haven't found
> it.

There is a cluster filesystem called GFS which is supposed to work
like that. I've always been happy with the way rsync works, though,
at least on unix-like systems. Files are updated under new
temporary names, then renamed to replace the originals. The
rename is an atomic operation, and unix filesystem semantics
allow programs that had opened the previous copy to continue
to access its data, while any subsequent opens get the new copy.
Programs never have to deal with partially modified copies.
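
The same trick is easy to see in two lines of shell - mv within
one filesystem is an atomic rename(), so anything still reading
the old file keeps the old data (paths invented for the example):

    # Write the new content under a temporary name in the same
    # directory, then swap it into place atomically.
    cp new/logo.png /var/www/htdocs/.logo.png.tmp.$$
    mv /var/www/htdocs/.logo.png.tmp.$$ /var/www/htdocs/logo.png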

> One alternative would be to somehow mount the repository as a
> folder, and then have Apache serve files out of that folder. When
> people upload something, it can be written straight into the repo,
> basically with WebDAV. My fear is that this would be kind of slow.
> Is it possible to mount the repo in linux as a folder?

If the machines are all on the same LAN, you could just NFS-mount
the one working copy onto all of the machines and use the repository
as a backup. My server farms are distributed, and rsync works
nicely there - network issues don't cause any problems with the
servers accessing their own files, and an incomplete transfer
has no immediate effect and gets cleaned up on the
next attempt.
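
If you try the NFS route, it's an ordinary mount on each web
server - the server name and export path here are made up:

    # Mount the shared working copy read-only on the web servers;
    # only the machine taking the uploads needs it read-write.
    mount -t nfs -o ro,hard,intr \
        fileserver:/export/htdocs /var/www/htdocs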

-- 
  Les Mikesell
  lesmikesell@gmail.com
