Root Cause Analysis for Supporting Fault Identification
In order to support today’s fast-paced business operations, data centers must minimize downtime. This white paper discusses the need to detect system disruptions quickly and explores the challenges of diagnosing system failures.
Discover root cause analysis tools that address the following scenarios:
- Fibre Channel switch failures
- Inaccessible virtual servers
- Cable disconnections
No data center can afford downtime given the relentless pace of today’s always-available business operations. But for many midsized and large businesses, ensuring that mission-critical systems perform efficiently and reliably amid the complexity of multivendor, multiplatform infrastructures can become an insurmountable task without the right tools. IT administrators need all the help they can get to quickly find and diagnose system disruptions or failures.
The faster the problem is found, the faster it can be solved.
But getting to the root cause of a failure is often a problem in itself. To clear a fault in the data center, the administrator must first identify the device where the fault originated. That is difficult when the device is connected to other devices and its failure has triggered a chain reaction of alert messages from them.
Also, the administrator may not have sufficient knowledge of the system configuration or related device dependencies. As a result, fault identification becomes time-consuming and the disruption lasts longer, which can affect production or data availability.
Event alerts are defined and analyzed through monitoring software. While some enterprise-class software uses intelligent algorithms and rules-based inference technology to guide administrators to the fault node, midmarket products usually offer separate tools for checking disruption events and dependencies. So while alerts from multiple devices may notify the administrator of a problem, they do not reveal why or where the fault actually occurred.
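One general way to turn a burst of alerts into a pointer to the fault node is to correlate the alerts against the device dependency map. The minimal Python sketch below illustrates that idea under stated assumptions. It is not the method used by any particular product. It assumes the dependencies are already known and stored as a simple map from each device to the devices it relies on. The device names and the functions `upstream` and `root_cause_candidates` are hypothetical. The sketch treats an alerting device as a root-cause candidate only if nothing it depends on, directly or indirectly, is also alerting. That rule collapses a chain reaction of downstream alerts to its likely origin.

```python
from collections import deque

# Hypothetical topology: each device maps to the devices it depends on.
DEPS: dict[str, list[str]] = {
    "vm-web": ["esx-host-1"],
    "vm-db": ["esx-host-1"],
    "esx-host-1": ["fc-switch-a"],
    "storage-array": ["fc-switch-a"],
    "esx-host-2": ["fc-switch-b"],
}


def upstream(deps: dict[str, list[str]], node: str) -> set[str]:
    """Return every device that `node` depends on, directly or transitively."""
    seen: set[str] = set()
    queue = deque(deps.get(node, []))
    while queue:
        dev = queue.popleft()
        if dev not in seen:  # the visited set also guards against cycles
            seen.add(dev)
            queue.extend(deps.get(dev, []))
    return seen


def root_cause_candidates(deps: dict[str, list[str]], alerting: set[str]) -> list[str]:
    """Keep only alerting devices with no alerting device upstream of them.

    Alerts from downstream devices are treated as side effects of an
    upstream failure and suppressed. If alerting devices form a dependency
    cycle, they suppress each other, so a real tool would need extra rules.
    """
    return sorted(dev for dev in alerting if not (upstream(deps, dev) & alerting))


if __name__ == "__main__":
    # A failed Fibre Channel switch makes every device behind it alert too.
    storm = {"vm-web", "vm-db", "esx-host-1", "fc-switch-a", "storage-array"}
    print(root_cause_candidates(DEPS, storm))  # ['fc-switch-a']
```

With five alerts in the example storm, only `fc-switch-a` remains, because every other alerting device sits downstream of it. Two unrelated faults (for instance, `vm-web` and `esx-host-2`) would both be reported, since neither has an alerting device upstream.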
Root Cause Analysis provides administrators with a quick path to fault identification. Ideal for administrators who want to reduce mean time to repair (MTTR), Root Cause Analysis uses patent-pending technology to lead the user directly to the node causing the fault, without requiring in-depth knowledge of separate environments or time-consuming investigation.
See Hitachi IT Operations Analyzer: Root Cause Analysis for Supporting Fault Identification, by Yutaka Kudo and Saurabh (Manu) Batra.
=======================================================
The 3rd edition of the Root Cause Analysis Handbook: A Guide to Efficient and Effective Incident Investigation, by ABS Consulting – Lee N. Vanden Heuvel, Donald K. Lorenzo, Randal L. Montgomery, Walter E. Hanson, and James R. Rooney, is the definitive book on Root Cause Analysis.
Includes:
- A 17 inch by 22 inch pull-out Root Cause Map
- CD-ROM
- RCA/incident investigation software
- Dedicated Web Resources (registration required)!
Only $129.00.


