On Wednesday, 4 July 2007, Talden wrote:
> > > A tar based approach will seek the length of the tar to evaluate
> > > differences to sync.
> >
> > It won't seek, it will just push that to the clients to apply.
> > And linear reading is in the >100 MByte/sec range, e.g. for a cheap stripe
> > set of two hard disks.
>
> Assuming of course that only one client pulls at a time. If not then
> you are depending upon disk cache to avoid seeking. A large RAM
> investment might be coming your way.

The difference is still between having a few (self-synchronizing) points
being read in a single file vs. having many files being read in random order.
See here:

Tar-file ===================================================================
                     ^        ^
                  Client1     |
                           Client2

These two clients will synchronize until they are both being sent identical
data: Client2 has to fetch from disk, while Client1 can use already cached
data and is thus faster, so it catches up.
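
To illustrate the catching-up effect, here is a toy simulation. It is only a
sketch under assumptions that are not from any measurement: the cache is
unlimited, a cached block is read four times faster than an uncached one,
and the function name and parameters are made up for this example.

```python
def simulate(starts, ticks, cached_speed=4):
    """Toy model of clients streaming one tar file through a shared cache.

    Assumptions (hypothetical, for illustration only):
    - every block behind the leading client is already cached,
    - a client reads up to `cached_speed` cached blocks per tick,
    - reading one uncached block from disk uses up the rest of the tick.
    Returns the list of client positions after each tick.
    """
    pos = list(starts)
    cached = set(range(max(starts)))
    history = []
    for _tick in range(ticks):
        for i in range(len(pos)):
            for _read in range(cached_speed):
                block = pos[i]
                pos[i] += 1
                if block not in cached:
                    cached.add(block)  # fetched from disk, now cached
                    break              # disk read ends this client's tick
        history.append(tuple(pos))
    return history


if __name__ == "__main__":
    # Client1 starts at block 0, Client2 is already at block 100.
    history = simulate([0, 100], ticks=50)
    for tick in (0, 16, 32, 49):
        client1, client2 = history[tick]
        print(f"tick {tick + 1:2d}: Client1={client1:3d} "
              f"Client2={client2:3d} gap={client2 - client1}")
```

In this model the gap shrinks every tick until Client1 reaches the uncached
region, after which both clients move through the file together.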

Tar-file ===================================================================
            ^        ^               ^                    ^
         Client1     |           Client3&4            Client5-8
                  Client2

After a while there'll be (at most) a few points in the file being read; using
the anticipatory I/O scheduler and/or read-ahead it's easy to seek only, say,
3 or 4 times a second while keeping the bandwidth at maximum speed.

Compare that with 8 clients reading physically discontiguous regions from the
hard disk, with additional CPU load ...
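
As a back-of-the-envelope comparison, the sketch below estimates how much
linear bandwidth is left once the time lost to seeking is subtracted. The
10 ms per seek and the 80 seeks per second for the many-files case are
assumptions picked for illustration; only the 100 MByte/sec figure and the
"3 or 4 seeks a second" come from this thread.

```python
def effective_throughput(seeks_per_sec, seek_ms=10.0, linear_mb_s=100.0):
    """Estimate MB/s left after seeking (toy model, not a measurement).

    seek_ms is an assumed average seek time; linear_mb_s is the linear
    read rate quoted earlier in the thread.
    """
    seeking_fraction = min(1.0, seeks_per_sec * seek_ms / 1000.0)
    return linear_mb_s * (1.0 - seeking_fraction)


if __name__ == "__main__":
    # A few read points in one tar file: about 4 seeks per second.
    print(f"few read points: {effective_throughput(4):.0f} MB/s")
    # Hypothetical: 8 clients each causing 10 seeks per second.
    print(f"many small files: {effective_throughput(80):.0f} MB/s")
```

With these assumed numbers the tar case keeps nearly all of the linear
bandwidth, while the scattered case loses most of it to head movement.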

Sure, that data *should* be cached after the first reading ... but there's a
lot more to cache: indexes from the repository, inodes for the revision
files, and just the seeking around for "Client3&4" above takes most of the
hard disk bandwidth.

Regards,
Phil
Received on Wed Jul 4 10:48:13 2007