Statistically speaking, electronic equipment tends to fail either towards the beginning of its life cycle or towards the end.
So when buying hardware to build a server, how do we know if we have a component or device that will fail early or towards the end of its life cycle?
We don’t know. So we stress the hell out of it before entrusting it with any data to see if it will fail (Fester takes a similar approach with underwear). If it doesn’t, it is probably (statistically speaking) going to give good service. This is basically hardware validation.
The areas that usually get stress tested are the processor, memory and the HDDs in the server, although technically you can stress test anything in a computer if you have the relevant tool (Fester can be found stress testing his head with a hammer when he forgets his medication).
Stress testing usually takes the form of running a piece of software on the server that intensely and repeatedly tests (and therefore stresses) a particular component or device in the server (e.g. memory, processor, etc.). The generic term for software of this type is “burn-in” software.
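To make the "intensely and repeatedly tests" idea concrete, here is a toy sketch of what a burn-in loop does: write a known pattern, read it back, count anything that changed, and repeat. The function names (fill, count_mismatches, burn_in), the buffer size and the choice of patterns are all made up for illustration. This is not how any particular burn-in tool works internally, and a script like this is no substitute for proper burn-in software, which runs outside the operating system and can reach far more of the memory.

```python
# Toy illustration of a burn-in loop: write a pattern, read it back, repeat.
# All names and defaults here are hypothetical, chosen only for this sketch.

def fill(buf: bytearray, pattern: int) -> None:
    """Overwrite every byte of buf with the given pattern (0-255)."""
    buf[:] = bytes([pattern]) * len(buf)


def count_mismatches(buf: bytearray, pattern: int) -> int:
    """Return how many bytes in buf no longer hold the pattern."""
    return len(buf) - buf.count(pattern)


def burn_in(size_bytes: int = 1 << 20, passes: int = 3,
            patterns=(0x00, 0xFF, 0xAA, 0x55)) -> int:
    """Run several write/verify passes and return the total mismatch count."""
    buf = bytearray(size_bytes)
    errors = 0
    for _ in range(passes):
        for pattern in patterns:
            fill(buf, pattern)
            errors += count_mismatches(buf, pattern)
    return errors


if __name__ == "__main__":
    print("mismatched bytes:", burn_in())
```

The alternating patterns (all zeros, all ones, 10101010, 01010101) are there so that every bit gets flipped both ways over the course of a pass. Any non-zero total means a byte came back different from what was written.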
You can connect a monitor and keyboard to the server or use IPMI to administer and observe the tests.
Fester puts the server in its final location at this point (in my case the living room) and monitors it through IPMI. This is because, when monitoring temperatures during the validation tests, I want to see how hot the server gets at the ambient temperature of its final location. This gives me a truer picture of how hot the server could get (mine is next to a radiator, not the smartest choice, but there were no other options without upsetting the psychopath).
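If you would rather log temperatures than stare at the IPMI web page, a small script can pick the readings out of text sensor output. The sketch below assumes the pipe-separated layout printed by "ipmitool sdr type Temperature"; the sensor names and values in SAMPLE are invented, the 40 degree limit is an arbitrary example, and your board will almost certainly name its sensors differently, so treat this as a starting point rather than a finished monitor.

```python
import re

# Made-up sample in the pipe-separated layout assumed for
# "ipmitool sdr type Temperature"; real sensor names vary by motherboard.
SAMPLE = """\
CPU Temp         | 01h | ok  |  3.1 | 45 degrees C
System Temp      | 0Bh | ok  |  7.1 | 33 degrees C
Peripheral Temp  | 0Ch | ok  |  7.2 | 38 degrees C
P1-DIMMA1 Temp   | B0h | ns  | 32.64 | No Reading
"""


def parse_temperatures(text: str) -> dict:
    """Map sensor name to degrees C, skipping sensors with no reading."""
    readings = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 5:
            continue  # not a sensor line in the assumed layout
        match = re.fullmatch(r"(-?\d+(?:\.\d+)?) degrees C", fields[4])
        if match:
            readings[fields[0]] = float(match.group(1))
    return readings


def over_limit(readings: dict, limit_c: float) -> list:
    """Return the names of sensors at or above limit_c, sorted by name."""
    return sorted(name for name, temp in readings.items() if temp >= limit_c)


if __name__ == "__main__":
    temps = parse_temperatures(SAMPLE)
    print(temps)
    print("at or above 40 C:", over_limit(temps, 40.0))
```

In real use the text would come from the sensor command itself rather than from SAMPLE, captured at intervals while the burn-in tests run, so you can see how the radiator (or whatever else shares the room) affects the peak temperatures.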
This section discusses using discrete software tools to check out the CPU, memory, and hard drives on your system. Most of these tools, or equivalents, are also available in a single package on the Ultimate Boot CD. The structure of that download means that it can't be written directly to a USB stick, but it can be burned to a CD or mounted via IPMI.