Data center stress testing lets you detect a potential problem before it shows up as errors, faults, or outages of key systems. But the value of the tests depends directly on how well they are carried out.
How do you organize the process to get the most out of it?
Methodology: the foundation of stress testing
An individual methodology must be developed for each specific data center. It will be needed every time the tests are run, and the original document can be adjusted over time.
The methodology solves several important problems.
- First, it guarantees that the tests are safe.
You cannot simply take an arbitrary system out of service for testing. The sequence and procedure of the checks must be worked out in advance so that the tests do not affect the operation of the data center as a whole. In addition, the methodology should include plans for rapid recovery in the event of a failure. This is especially important for data centers that have been in operation for a long time and have not been tested for a long time.
- Second, it ensures that the tests are effective.
You could take the simplest approach. For example, for a UPS: take it out of service, switch it to battery operation, measure the runtime, record it, and switch the UPS back on. This is a very simple technique in which only the battery runtime is recorded. Is that enough? In fact, it only shows that everything is fine at the moment.
The main goal of stress testing is to make sure that everything will remain fine until the next test, a year from now. In the same UPS example, you need to use special instruments, connect them at different points under different conditions, and watch how the UPS behaves.
All stages of this process must be spelled out in the methodology for each system. This is the only way to carry out full testing and detailed diagnostics that reveal potential problem areas.
A complete and competent methodology is therefore almost half the success. Preparing it will take considerable effort from the data center operations department. Bringing in experts who specialize in stress testing makes it possible to solve this task faster and more efficiently. Our team is ready to help develop the methodology.
Once the methodology is written and agreed upon, it remains to decide whether to run the tests yourself, following the plan, or to entrust them to specialists with experience in this field.
Conducting stress tests yourself
What will the operations department need to carry out stress tests on their own? In addition to the methodology, a set of equipment is needed that will simulate various operating conditions, increasing the load to maximum values:
- a load bank (1 MW and above) for connection to the UPS and the generator,
- fan heaters (heat guns) to test the cooling system,
- hardware and software systems with diagnostic functions for evaluating the results,
as well as other devices, depending on the configuration of the data center.
Buying such equipment makes sense for owners of a large data center or of several data centers that are tested frequently. If the facility is not very large and tests are required no more than once a year, renting is more cost-effective.
You can rent stress-testing equipment from our company. We provide not only the units themselves, in a convenient mobile design, but also all the necessary accessories: connection cables, circuit breakers, and so on.
In addition, our engineers can provide support to whatever extent you require. For example, if the operations department consists only of IT specialists, our electrician, who has the appropriate permits and data center experience, can come on site for the tests.
If required, we can take over the entire testing process, providing both the equipment and a team of specialists.
Bringing in testing specialists
A delicate question: why hire outside specialists if you have your own maintenance team?
The difference lies in specialized qualifications and experience, which play an important role both in carrying out the tests and in the quality of the results.
A competent IT specialist with experience in data center maintenance can take a training course, acquire a certain amount of knowledge, and carry out the necessary operations. But someone who does this work once a year cannot master all the subtleties or build up real experience.
Stress testing involves using different equipment and measuring many parameters. The data produced by the hardware and software systems has to be compared and analyzed.
A simple analogy is car diagnostics. A car enthusiast who understands how the engine or chassis is built can look for a fault by following the manual. In the end he will find it, after spending a lot of time reading the documentation and going through the entire system. A mechanic at a service center, by contrast, knows where to look as soon as he hears the "symptoms".
Likewise, a specialist who runs stress tests every day will look deeper and notice more. He will pay attention to the checkpoints that are critical, more prone to wear, or have a greater effect on the result.
Technology is getting smarter – diagnostic machines produce a lot of data. But the equipment cannot test itself: the final level of analytics and understanding of the situation still lies with the engineer. Two specialists can look at the same log file. One will see individual deviations, and the other will see a sequence of signals that indicate a specific problem. Only experience gives this level of understanding.
Another nuance concerns the choice of remedy. A specialist who rarely encounters a given problem usually stops at replacing an individual part. An experienced engineer knows that replacing a single part is sometimes pointless: the problem is broader, and only replacing the whole assembly will prevent future breakdowns.
Here is a common situation. The engineer replaces the UPS battery without replacing the worn-out capacitors in the DC link. The ripple persists and wears out the new battery, cutting its service life from five years to two.
Another case is replacing only the output filter capacitors when they have been causing the chokes to overheat. That is not enough: the chokes will still fail.
And the simplest example: only some of the batteries in an array are replaced, in the mistaken belief that this will extend its service life.
These are just three examples from our practice. An experienced technician who has dealt with many different situations knows hundreds of such details. This knowledge cannot be obtained from manuals.
How do we conduct stress tests?
Testing takes 1 to 4 hours on average for each type of equipment. Allowing for the time needed to connect the diagnostic systems, we check 2-4 devices per day. The total duration of the tests depends on the scale of the data center and ranges from 1 day to a week.
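As a rough planning aid, the throughput figure above can be turned into a duration estimate. The Python sketch below is only an illustration: the function name estimate_test_days and its parameters are hypothetical, and the only input taken from the text is the rate of 2-4 devices per day.

```python
import math


def estimate_test_days(device_count, min_per_day=2, max_per_day=4):
    """Return (best_case_days, worst_case_days) for a test campaign.

    Assumes the 2-4 devices per day stated above; the real schedule
    depends on the data center and on the agreed methodology.
    """
    if device_count < 0:
        raise ValueError("device_count must not be negative")
    if not 0 < min_per_day <= max_per_day:
        raise ValueError("expected 0 < min_per_day <= max_per_day")
    best_case = math.ceil(device_count / max_per_day)
    worst_case = math.ceil(device_count / min_per_day)
    return best_case, worst_case


if __name__ == "__main__":
    best, worst = estimate_test_days(10)
    print(f"10 devices: between {best} and {worst} working days")
```

For a site with 10 devices to check, this gives a range of 3 to 5 working days, which is consistent with the "1 day to a week" figure.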
Based on the test results, the customer receives a detailed report. It lists all the readings that were taken and includes oscilloscope traces showing how the equipment behaves in particular modes. For each device it states whether it is operating normally or whether deviations from the norm were found.
The engineer gives written recommendations on how to eliminate the problem areas and which components to replace in order to bring the system to optimal condition. For example, if we find that a battery has lost capacity under load, we will tell you that it needs to be replaced within the next year.
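To illustrate how a recommendation of this kind might be derived from a discharge test, here is a minimal Python sketch. It rests on two assumptions made purely for this illustration: that remaining capacity can be approximated as the ratio of the measured battery runtime to the rated runtime at the same load, and that replacement is planned once capacity drops below 80% of the rated value. The function names and the threshold parameter are hypothetical.

```python
def battery_capacity_percent(measured_minutes, rated_minutes):
    """Approximate remaining capacity as measured / rated runtime, in percent.

    Simplification: assumes both runtimes refer to the same load and
    comparable temperature.
    """
    if rated_minutes <= 0:
        raise ValueError("rated_minutes must be positive")
    if measured_minutes < 0:
        raise ValueError("measured_minutes must not be negative")
    return 100.0 * measured_minutes / rated_minutes


def recommendation(capacity_percent, replace_below=80.0):
    """Return a short recommendation; the 80% threshold is an assumption."""
    if capacity_percent < replace_below:
        return "plan replacement"
    return "ok"


if __name__ == "__main__":
    capacity = battery_capacity_percent(measured_minutes=11, rated_minutes=15)
    print(f"{capacity:.1f}% of rated capacity: {recommendation(capacity)}")
```

A real assessment also draws on the waveforms and on readings taken under different load conditions, so a single ratio like this is only a starting point.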
Which approach to testing should you choose? The decision is always individual and rests with the owner of the data center and the operations team. We advise you to remember the main thing: high-quality stress tests are necessary for the stable operation of the data center, and trying to save money on them can cost far more in the end. Our team is ready to help to whatever extent you need: develop the methodology, provide the equipment, or take over the entire process and be responsible for the result.