Document “Single Points of Failure” in the Data Center DRP
One of the main tasks in developing and maintaining a data center disaster recovery plan is to make sure all the bases are covered. This generally includes data backup, UPS systems, fire alarms, restoration procedures, build procedures, etc.
Those are all good things to have in place and working as expected. Frequently, however, other weaknesses go unnoticed until a disaster occurs and the recovery does not work as planned. We refer to these as SPFs, or Single Points of Failure. Simply stated, an SPF is a single component or dependency that has no backup or alternate, so its failure directly or indirectly causes the recovery to fail. Not a good thing.
Just what are some of these? To get the thought process going, let me mention three that are frequently observed during risk assessments in data centers:
- Example: All telecommunications come into the facility through a central point in the data center. From there, services (voice, internet access, high-speed lines, etc.) are directed out to the end user community. Now consider the worst-case scenario: the data center is completely destroyed by fire. How do you reconnect end users to the alternate site? Do you have a functional alternate path, or do you have an SPF?
- Example: All critical equipment in the data center should be supported by the UPS system. That allows critical business functions to continue in the event of a power outage. By the way, power loss remains a leading cause of outages. As new equipment is added, is it placed on the proper circuits? Does that validation step take place on a regular basis? Do you have UPS support for all critical equipment, or do you have an SPF?
- Example: The data center requires air conditioning to function properly. Does your data center have ample reserve capacity, as in two AC units, each capable of cooling the room on its own? Or, if you have only one AC unit, do you really have an SPF? Open doors and fans are not a good solution.
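The three checks above are the kind of thing that can be run against a simple inventory on a regular basis. The sketch below is only an illustration under stated assumptions: the record fields, function names, sample equipment, and kW figures are all hypothetical, and a real assessment would draw on your own asset and circuit records.

```python
from dataclasses import dataclass


@dataclass
class Equipment:
    # Hypothetical inventory record; field names are assumptions.
    name: str
    critical: bool
    on_ups: bool  # True if fed from a UPS-backed circuit


def single_path_services(entry_paths):
    """Return services that reach the facility over fewer than two distinct paths."""
    return sorted(s for s, paths in entry_paths.items() if len(set(paths)) < 2)


def ups_gaps(inventory):
    """Return names of critical equipment that is not on a UPS-backed circuit."""
    return [e.name for e in inventory if e.critical and not e.on_ups]


def cooling_survives_one_failure(unit_capacities_kw, heat_load_kw):
    """True if the remaining AC units still cover the heat load
    after the largest single unit is taken out of service."""
    if not unit_capacities_kw:
        return False
    remaining = sum(unit_capacities_kw) - max(unit_capacities_kw)
    return remaining >= heat_load_kw


# Illustrative data only.
entry_paths = {
    "voice": ["main-demarc"],
    "internet": ["main-demarc", "alt-demarc"],
    "leased-line": ["main-demarc", "main-demarc"],  # two circuits, same path
}
inventory = [
    Equipment("core-switch", critical=True, on_ups=True),
    Equipment("new-db-server", critical=True, on_ups=False),
    Equipment("lobby-display", critical=False, on_ups=False),
]

print(single_path_services(entry_paths))                # ['leased-line', 'voice']
print(ups_gaps(inventory))                              # ['new-db-server']
print(cooling_survives_one_failure([40.0], 30.0))       # False
print(cooling_survives_one_failure([40.0, 40.0], 30.0)) # True
```

Note that two circuits through the same entry point still count as a single path here, which matches the worst-case scenario of losing the whole data center.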
These are just three examples of situations that could indeed cause an outage in the data center. All three can be addressed ahead of time, or ignored. When the data center goes down, it costs the corporation money and time. Prudent data center disaster recovery planning always looks for, documents, and lets management know about any SPF situations.
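One way to handle the "documents, and lets management know" step is a small SPF register that can be printed as a summary. This is a minimal sketch, not a prescribed format: the fields, the 1 to 3 severity scale, and the sample entries are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SpfEntry:
    # Hypothetical register fields.
    item: str
    impact: str
    mitigation: str
    severity: int  # assumed scale: 1 = low, 3 = high


def management_report(entries):
    """Build a plain-text summary, highest severity first."""
    lines = ["SPF register (highest severity first)"]
    for e in sorted(entries, key=lambda e: e.severity, reverse=True):
        lines.append(f"[{e.severity}] {e.item}: {e.impact} | proposed: {e.mitigation}")
    return "\n".join(lines)


# Illustrative entries only.
register = [
    SpfEntry("Single AC unit", "room overheats if unit fails", "add second unit", 2),
    SpfEntry("Single telecom entry", "users cannot reach alternate site",
             "provision alternate path", 3),
]

print(management_report(register))
```

Sorting by severity puts the items that most need a funding or scheduling decision at the top of the page.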
======================================
Jan Persson is the author of the GO.RECOVER-Data Center Disaster Recovery Template – a powerful yet easy-to-use tool for under $100.




