On Saturday 04 June 2005 13:12, Prateek Srivastava wrote:
> My machine is new and well configured.
> > Is your HD or memory starting to show its age?
I have seen problems like this many times on new machines running
load-intensive software. It is common practice to build machines from very
cheap components - why bother with ECC-RAM, good high-datarate cables, and
server-grade harddisks if 95% of these machines will run Windows anyway and
users are used to blaming problems on the OS or on their own inability?
Tips for diagnosis:
*let a memory tester run overnight - if it finds something, you have VERY
bad memory chips; if it doesn't, you unfortunately still can't be sure that
the memory is good (if you run SuSE: it is on the install DVD, just boot
into memtest instead of into Linux)
*check that all disk cables are properly connected and don't have sharp
bends or even loose connectors
*have you switched on ATA-133 or other high-speed modes? (Linux: hdparm)
Switch them off and try again.
*often you get indications of problems if you run dmesg (Linux-specific
though) - if you get lines like "hdb: lost interrupt" it means the
communication between your mainboard and the disk is bad.
*does the same problem occur on a different system? (if you have a real
server - not just a PC posing as one - use the server for this test) If
yes: it is either a problem with the OS (faulty library) or coincidence
(test on a third system) or really a problem with SVN. If no: you've got a
problem with the hardware.
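
For the dmesg check above, a minimal Python sketch of how one could filter
the kernel log for suspicious lines. The only pattern taken from this mail
is "lost interrupt"; the names find_disk_warnings and read_dmesg are made up
for illustration, and on some systems dmesg needs root.

```python
import subprocess

# Lower-case substrings to look for. Only "lost interrupt" comes from the
# mail above; anything you add here is your own assumption.
SUSPICIOUS = ("lost interrupt",)


def find_disk_warnings(log_text, needles=SUSPICIOUS):
    """Return the log lines that contain any of the given substrings."""
    hits = []
    for line in log_text.splitlines():
        lowered = line.lower()  # compare case-insensitively
        if any(needle in lowered for needle in needles):
            hits.append(line.strip())
    return hits


def read_dmesg():
    """Run dmesg and return its output as text (Linux only)."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a Linux box you would call find_disk_warnings(read_dmesg()) and look at
whatever comes back. An empty list does not prove that the hardware is fine,
it only means that none of the listed messages showed up.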
Tips for the next system:
*never buy pre-configured systems from big end-consumer retailers - these
are Cheap[tm]. I personally use very small computer shops, preferably those
that regularly deliver small servers to business customers and do the
support for those - these guys have the most experience with how to build a
good system that is still in your budget
*use workstation or server boards - ok they are expensive, but they have the
plus of requiring and supporting ECC-RAM:
*use ECC-RAM - DRAM cells are extremely susceptible to bit flips (the
probability of a flip goes up exponentially with temperature and with the
number of bits in the system). ECC-RAM is twice as expensive because it a)
uses good chips (which passed ALL tests instead of only the barely
necessary ones) and b) is able to fix almost all bit-flip situations (I've
NEVER had problems with ECC, but constantly have them on normal cheap
chips)
*while we are at it: use server processors - they are also twice as
expensive because they had to endure twice as many tests and have a margin
twice as large (a processor that would be sold as a 2.5GHz consumer
part would be sold as a 2.0GHz server part)
*use cables that actually support high-data-rates and use server-grade
harddisks (the lower ranges of server-disks are not much more expensive
than consumer disks), don't use spin-down or disk suspend, since
server-disks are not optimised for that mode
*have your supplier/retailer build in powerful fans - there are fans that
are both powerful and silent today (I'll never install fans myself again
after I ruined two mainboards with underpowered CPU-fans).
Ok, this all sounds terribly paranoid and expensive. Actually, it is both.
It comes from a lot of bad experience with "consumer grade" systems
and some great experiences with "server grade" workstations. You pay a lot
for that, but at least I can be sure that if something goes wrong now it is
the software.
Even if you don't want to go to these extremes: use one of those small shops
that do daily support for small business people and talk to them. They know
which kind of consumer-systems are often returned for repair and which
aren't... ;-)
Konrad
Received on Sat Jun 4 16:19:23 2005