The Basics of Root Cause Analysis
Root cause analysis (RCA) is a process designed for use in investigating and categorizing the root causes of incidents with safety, health, environmental, quality, reliability and production impacts.
Simply stated, RCA is a tool designed to help identify not only what and how an incident occurred, but also why it happened. Only when investigators are able to determine why an incident or failure occurred will they be able to specify workable corrective measures that prevent future incidents of the type observed.
Understanding why an incident occurred is the key to developing effective recommendations. Imagine an occurrence during which an operator was instructed to close valve A; instead, the operator closed valve B. A typical investigation would probably conclude that operator error was the cause.
While this is an accurate description of what happened and how it happened, if the analysts stop here, they have not probed deeply enough to understand the reasons for the mistake. Therefore, they do not know what to do to prevent it from occurring again.
In the case of the operator who closed the wrong valve, we are likely to see recommendations such as “retrain the operator on the procedure,” “remind all operators to be alert when manipulating valves,” or “emphasize to all personnel that careful attention to the job should be maintained at all times.” Such recommendations do little to prevent future occurrences.
Generally, mistakes do not just happen but can be traced to some well-defined causes. In the case of the valving error, we might ask, “Was the procedure confusing? Were the valves clearly labeled? Was the arrangement of the valves unusual? Was the operator familiar with this particular task?”
The answers to these and other questions will help determine why the error took place and what the organization can do to prevent recurrence. In the case of the valve error, example recommendations might include revising the procedure or performing procedure validation to ensure references to valves match the valve labels found in the field.
Identifying root causes is the key to preventing similar recurrences. An added benefit of an effective RCA is that, over time, the root causes identified across the population of occurrences can be used to target major opportunities for improvement.
If, for example, a significant number of analyses point to procurement inadequacies, then resources can be focused on improvement of this management system. Trending of root causes allows development of systematic improvements and assessment of the impact of corrective programs.
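As an illustration of trending, the sketch below tallies how many analyses point to each management-system area. It is a minimal sketch, assuming each completed analysis is stored as a record listing the areas its root causes fell under; the record layout, incident IDs, and area names (`procurement`, `procedures`, `training`) are hypothetical.

```python
from collections import Counter

# Hypothetical analysis records. The management-system area names are
# illustrative labels, not categories taken from any published taxonomy.
analyses = [
    {"id": "INC-001", "root_cause_areas": ["procurement", "procedures"]},
    {"id": "INC-002", "root_cause_areas": ["procurement"]},
    {"id": "INC-003", "root_cause_areas": ["training", "procurement", "procurement"]},
    {"id": "INC-004", "root_cause_areas": ["procedures"]},
]


def trend_root_causes(records):
    """Count how many analyses point to each management-system area."""
    counts = Counter()
    for record in records:
        # A set counts each area once per analysis, even when several causal
        # factors in the same investigation trace back to that area.
        counts.update(set(record["root_cause_areas"]))
    return counts


trend = trend_root_causes(analyses)
for area, n in trend.most_common():
    print(f"{area}: {n} of {len(analyses)} analyses")
```

Counting each area once per analysis keeps one large investigation from dominating the trend; counting every occurrence instead would be an equally valid choice if the goal is to weight by number of causal factors.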
Although there is substantial debate on the definition of root cause, we use the following:
- Root causes are specific underlying causes.
- Root causes are those that can reasonably be identified.
- Root causes are those management has control to fix.
- Root causes are those for which effective recommendations for preventing recurrences can be generated.
Root causes are underlying causes. The investigator’s goal should be to identify specific underlying causes. The more specific the investigator can be about why an incident occurred, the easier it will be to implement recommendations that will prevent recurrence.
Root causes are those that can reasonably be identified. Occurrence investigations must be cost-effective. It is not practical to keep valuable manpower occupied indefinitely searching for the root causes of occurrences. Structured RCA helps analysts get the most out of the time they have invested in the investigation.
Root causes are those over which management has control. Analysts should avoid using general cause classifications such as operator error, equipment failure or external factor. Such causes are not specific enough to allow management to make effective changes. Management needs to know exactly why a failure occurred before action can be taken to prevent recurrence. We must also identify a root cause that management can influence. Identifying “severe weather” as the root cause of parts not being delivered on time to customers is not appropriate. Severe weather is not controlled by management.
Root causes are those for which effective recommendations can be generated. Recommendations should directly address the root causes identified during the investigation. If the analysts arrive at vague recommendations such as, “Improve adherence to written policies and procedures,” then they probably have not found a basic and specific enough cause and need to expend more effort in the analysis process.
The Four Major Root Cause Analysis Steps
RCA is a four-step process involving the following:
- Data collection
- Identifying causal factors
- Root cause identification
- Recommendation generation and implementation
Step one—data collection. The first step in the analysis is to gather data. Without complete information and an understanding of the incident, the causal factors and root causes associated with the incident cannot be identified. The majority of time spent analyzing an incident is spent in gathering data.
Step two—identifying causal factors. A number of techniques can be used in this step. Timelines, cause and effect trees, and causal factor charting can all be used to identify causal factors. Selection of the appropriate method is a tradeoff between the level of rigor and the time invested into the analysis. For example, timelines can be developed very quickly, but they are also the least rigorous of the three methods. Causal factor charting is the most rigorous approach, but also takes the greatest amount of time to develop.
Regardless of the technique selected, the analysts should begin using the technique from the outset of the investigation. The analysis tool provides a structure for investigators to organize and analyze the information gathered during the investigation and identify gaps and deficiencies in knowledge as the investigation progresses. The analysis technique should drive the data collection process by identifying data needs.
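To show how even the simplest technique, a timeline, can drive data collection, the sketch below orders the events whose times are known and turns every missing time or unconfirmed source into an explicit data need. It is a minimal sketch under assumed names: the `Event` class, `build_timeline` function, and the events themselves (including the alarm) are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Event:
    description: str
    time: Optional[datetime]  # None until the time has been established
    source: Optional[str]     # e.g. an interview or log; None if unconfirmed


def build_timeline(events: List[Event]) -> Tuple[List[Event], List[str]]:
    """Order timed events chronologically and list the open data needs."""
    timeline = sorted((e for e in events if e.time is not None), key=lambda e: e.time)
    data_needs = []
    for e in events:
        if e.time is None:
            data_needs.append(f"Establish time of: {e.description}")
        if e.source is None:
            data_needs.append(f"Confirm source for: {e.description}")
    return timeline, data_needs


# Hypothetical events for the valve example.
events = [
    Event("Operator closes valve B", datetime(2024, 5, 1, 8, 15), "operator interview"),
    Event("Operator instructed to close valve A", datetime(2024, 5, 1, 8, 0), "shift log"),
    Event("High-pressure alarm sounds", None, "control system log"),
    Event("Supervisor notified", datetime(2024, 5, 1, 8, 40), None),
]

timeline, data_needs = build_timeline(events)
for e in timeline:
    print(e.time.strftime("%H:%M"), e.description)
for need in data_needs:
    print("DATA NEED:", need)
```

Events without a time stay off the timeline until the gap is closed, which is the point: the structure itself tells the investigators what to go and find.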
Data collection continues until the investigators can identify the causal factors associated with the incident. Causal factors are performance gaps for front-line personnel and equipment. Performance gaps are differences between the desired performance and the actual performance of the personnel or equipment involved. For example, the operator closing valve B instead of valve A would be a performance gap. In this case, the desired performance is closing valve A and the actual performance was closing valve B. As another example, a pump is operating at 100 gallons per minute (GPM) when it needs to operate at 80 GPM. In this case, the desired performance is 80 GPM and the actual performance is 100 GPM.
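The definition of a performance gap can be expressed as a simple comparison of desired and actual performance. The sketch below models both examples from the text, the operator who was told to close valve A but closed valve B, and the pump running at 100 GPM instead of 80 GPM. The `PerformanceGap` class and its optional `tolerance` parameter are assumptions for illustration; in practice the acceptable deviation would come from the process requirements.

```python
from dataclasses import dataclass
from typing import Union

Value = Union[str, float]


@dataclass
class PerformanceGap:
    """Desired versus actual performance for one person or piece of equipment."""
    subject: str
    desired: Value
    actual: Value

    def exists(self, tolerance: float = 0.0) -> bool:
        # Numeric performance: a gap exists when the deviation exceeds the tolerance.
        if isinstance(self.desired, (int, float)) and isinstance(self.actual, (int, float)):
            return abs(self.actual - self.desired) > tolerance
        # Non-numeric performance (e.g. which valve was closed): any mismatch is a gap.
        return self.desired != self.actual


valve = PerformanceGap("operator", desired="close valve A", actual="close valve B")
pump = PerformanceGap("pump flow rate (GPM)", desired=80, actual=100)

for gap in (valve, pump):
    if gap.exists():
        print(f"Causal factor: {gap.subject}: desired {gap.desired!r}, actual {gap.actual!r}")
```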
It is important to identify the causal factors at this stage of the analysis. If a causal factor is missed here, then the analyst will not consider it during the subsequent stages of the analysis.
In many traditional analyses, the most visible causal factor is given all the attention. Often, once the first causal factor is identified, the team declares victory and moves on to the next stage of the analysis. Rarely, however, is there just one causal factor; incidents are usually the result of a combination of contributors. Complex systems tend to fail in complex ways. When only one obvious causal factor is addressed, the list of recommendations will likely not be complete. Consequently, the occurrence may repeat itself because the organization did not learn all that it could from the incident.
Step three—root cause identification. After all the causal factors have been identified, the investigators begin root cause identification. This step involves the use of a decision diagram called the Root Cause Map™ (this can be downloaded from ABS Consulting’s web site) to identify the underlying reason or reasons for each causal factor. The Map structures the reasoning process of the investigators by helping them answer questions about why particular causal factors exist or occurred. The identification of root causes helps the investigator determine the reasons the incident occurred so the problems surrounding the occurrence can be addressed. This step is rarely performed unless a formalized root cause analysis program is in place.
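The general idea of a decision diagram, answering a sequence of questions until a root cause is reached, can be sketched as a small tree walk. This sketch does NOT reproduce the Root Cause Map; every question and root-cause label below is invented for the valve example, and the `Question`, `Leaf`, and `walk` names are hypothetical. A real analysis may also follow several paths for one causal factor, whereas this sketch follows one.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    root_cause: str


@dataclass
class Question:
    text: str
    yes: "Node"
    no: "Node"


Node = Union[Question, Leaf]


def walk(node: Node, answers: Dict[str, bool]) -> Tuple[List[Tuple[str, bool]], str]:
    """Follow yes/no answers from the top of the diagram to a root cause."""
    path = []
    while isinstance(node, Question):
        answer = answers[node.text]  # KeyError flags a question not yet answered
        path.append((node.text, answer))
        node = node.yes if answer else node.no
    return path, node.root_cause


# Hypothetical questions and root-cause labels for the valve example.
Q_USED = "Was a written procedure used for the task?"
Q_MATCH = "Did the procedure's valve references match the field labels?"
Q_REQUIRED = "Was a procedure required for the task?"

diagram = Question(
    Q_USED,
    yes=Question(
        Q_MATCH,
        yes=Leaf("Procedure content not implicated; examine other branches"),
        no=Leaf("Procedures: valve references do not match field labels"),
    ),
    no=Question(
        Q_REQUIRED,
        yes=Leaf("Procedures: required procedure not used"),
        no=Leaf("Procedures: no procedure exists for the task"),
    ),
)

path, root_cause = walk(diagram, {Q_USED: True, Q_MATCH: False})
for question, answer in path:
    print(f"{question} -> {'yes' if answer else 'no'}")
print("Root cause:", root_cause)
```

Recording the path, not just the final label, matters because the summary table described later asks for the path through the map alongside each root cause.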
Step four—recommendation generation and implementation. The next step is the generation of recommendations. Following identification of the root causes for each causal factor, achievable recommendations for preventing their recurrence are then generated. In order to ensure that recommendations are addressed for all the causes identified, the following four levels of recommendations must be addressed.
- Address the causal factor
- Prevent recurrence of this specific causal factor
- Address the generic implications of the causal factor and associated root causes
- Address the root causes associated with the causal factor
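One way to make sure all four levels are covered is to tag each draft recommendation with the level it addresses and report any level still missing for each causal factor. The sketch below assumes that tagging scheme; the short level labels, the `coverage_gaps` function, and the draft recommendations are hypothetical.

```python
from typing import Dict, List, Tuple

# Short labels for the four recommendation levels in the list above.
LEVELS = (
    "causal factor",
    "specific recurrence",
    "generic implications",
    "root causes",
)


def coverage_gaps(recs_by_factor: Dict[str, List[Tuple[str, str]]]) -> Dict[str, List[str]]:
    """Return, for each causal factor, the levels with no recommendation yet."""
    gaps = {}
    for factor, recs in recs_by_factor.items():
        covered = {level for level, _text in recs}
        missing = [level for level in LEVELS if level not in covered]
        if missing:
            gaps[factor] = missing
    return gaps


# Hypothetical draft recommendations for the valve example.
draft = {
    "Operator closed valve B instead of valve A": [
        ("causal factor", "Reopen valve B and close valve A; verify system status"),
        ("root causes", "Revise the procedure so valve references match field labels"),
    ],
}

print(coverage_gaps(draft))
```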
The root cause analyst is often not responsible for the implementation of recommendations generated by the analysis. However, if the recommendations are not implemented, the effort expended in performing the analysis is wasted. In addition, the incidents that triggered the analysis should be expected to recur.
Organizations need to ensure that recommendations are tracked to completion. Implementation of recommendations needs to be tracked so that recommendations that are not implemented in a timely manner can be identified and actions can be taken to address the causes of the delays.
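Tracking recommendations to completion can be as simple as recording a due date and a completion date for each one and periodically listing the open items that are past due. The sketch below assumes that minimal record; the `Recommendation` fields, owners, and dates are hypothetical, and a real tracking system would usually add status history and escalation.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Recommendation:
    rec_id: str
    owner: str
    due: date
    completed: Optional[date] = None  # None while the recommendation is open


def overdue(recs: List[Recommendation], today: date) -> List[Recommendation]:
    """Open recommendations whose due date has already passed."""
    return [r for r in recs if r.completed is None and r.due < today]


# Hypothetical tracking log.
log = [
    Recommendation("R-1", "Operations", date(2024, 3, 1), completed=date(2024, 2, 20)),
    Recommendation("R-2", "Engineering", date(2024, 4, 1)),
    Recommendation("R-3", "Training", date(2024, 6, 30)),
]

TODAY = date(2024, 5, 1)
for r in overdue(log, TODAY):
    days_late = (TODAY - r.due).days
    print(f"{r.rec_id} ({r.owner}) is {days_late} days overdue")
```

The overdue list is the starting point for the follow-up the text calls for: finding out why each item slipped and addressing the cause of the delay.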
Presentation of Results
The appropriate formatting of the analysis results can greatly enhance the ability of others to perform an effective review. A three-column root cause summary table should be used to make it easy to see the connection between the causal factors, the associated root causes, and the recommendations developed by the team. In the first column, list the causal factor along with enough background information for the reader to understand why this causal factor needs to be addressed. The second column shows the root causes, including the path or paths through the Root Cause Map associated with the causal factor. The third column presents recommendations to address the causal factor and each of the root causes identified. This three-column format helps the investigator ensure that root causes and recommendations are developed for each causal factor and root cause. It also assists report reviewers because it makes the connection between causes and recommendations very clear. Like all requests for changes to the equipment or processes used at the facility, recommendations from a root cause analysis should be reviewed as part of the organization’s management of change process.
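As a rough illustration, the three-column summary can be generated from structured findings, with a check that no causal factor is left without a root cause or a recommendation. This is a minimal sketch that writes CSV using Python's standard `csv` module; the `summary_table` function, the finding layout, and the map-path text are hypothetical and do not come from the Root Cause Map.

```python
import csv
import io
from typing import Dict, List

HEADER = ["Causal factor", "Root causes (Root Cause Map path)", "Recommendations"]


def summary_table(findings: List[Dict]) -> str:
    """Render causal factors, root causes and recommendations as 3-column CSV."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(HEADER)
    for f in findings:
        # Every causal factor needs at least one root cause and one recommendation.
        if not f["root_causes"] or not f["recommendations"]:
            raise ValueError(f"Incomplete row for causal factor: {f['causal_factor']}")
        writer.writerow([
            f["causal_factor"],
            "; ".join(f["root_causes"]),
            "; ".join(f["recommendations"]),
        ])
    return buffer.getvalue()


# Hypothetical findings; the map path text is invented for illustration.
findings = [
    {
        "causal_factor": "Operator closed valve B instead of valve A while following the procedure.",
        "root_causes": ["Procedures > content > valve references do not match field labels"],
        "recommendations": [
            "Revise the procedure to use the valve labels found in the field",
            "Validate other procedures for the same label mismatch",
        ],
    },
]

print(summary_table(findings))
```

Raising an error on an incomplete row mirrors the purpose of the format: a causal factor with no root cause or no recommendation should be caught before the report goes out, not by the reviewer.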
Common Root Cause Analysis Program Problems
Many organizations have a root cause analysis process. However, these processes often fail to live up to the organization’s expectations. Many of these problems are not due to the root cause analysis techniques used but to more fundamental issues within the organization. A few examples of common RCA program issues:
- Focus on blaming individuals. Many organizations use a very simple “RCA” process – figure out who made the mistake and tell them not to do that anymore. Most personnel in our organizations don’t choose to do harm to the organization. In most cases, they’re doing the best they can to get the job done.
The behavior of the individuals in the organization is strongly influenced by the management systems that are in place. For example, if we make it very difficult for personnel to access procedures, they will tend to perform tasks without them. If we make it very difficult to make changes to facility drawings, it shouldn’t be surprising that personnel don’t submit drawing changes for minor discrepancies. Your organization has set up its training, procedure, supervision, design, communication, procurement, hazard analysis, and maintenance programs to encourage specific personnel behaviors and discourage others. If personnel are not behaving consistently with our expectations, we should be looking at the systems used to drive that behavior – the organizational management systems.
- Root cause analysis is only used to investigate the large losses. Some organizations only use the root cause analysis process when a major loss has occurred. They think the time invested in performing an RCA for a minor loss is wasted. However, the performance gaps and the underlying organizational management systems that generate the major losses can often be identified and resolved by investigating smaller losses. This can prevent the large losses before they occur.
- Failure to focus on long-term performance. Root cause analysis programs are focused on improvements to long-term performance. It is sometimes difficult to get personnel to think long term when they are faced with multiple crises that must be dealt with today. As a result, many organizations fail to allocate adequate time to root cause analyses and to learning from experience in general. They are too busy “doing” to consider whether what they are “doing” is the best way to get the job done. To avoid dealing with the same problems over and over, an organization must allocate some time to solving tomorrow’s problems today.
- Rewarding those that create crises and punishing those that avoid them. This is a corollary to the previous bullet. In many organizations, the heroes of the company are those that come in during the crisis and “save” the company. Those in the organization that plan ahead, learn from experience, and avoid the crises to begin with are seen by management as run-of-the-mill employees. They never have crises to solve, and therefore are never “put to the test.” Our personnel see that those who excel in crises are rewarded, even if they created the crisis themselves. Therefore, if you want to advance in the organization, you shouldn’t put too high a value on planning ahead and learning from experience, because crisis avoidance could hamper your career.
Regardless of the RCA technique used, the organization must address these common RCA program problems in order to encourage personnel to value learning from experience.
References
- Ferry, Ted S., Modern Accident Investigation and Analysis, 2nd edition, John Wiley and Sons, 1988.
- Center for Chemical Process Safety, Guidelines for Investigating Chemical Process Incidents, 2nd edition, American Institute of Chemical Engineers, 2003.
- ABS Consulting Inc., Root Cause Analysis Handbook: A Guide to Effective and Efficient Incident Investigation, Rothstein Associates Inc., 2008.
- Reason, James, Managing the Risks of Organizational Accidents, Ashgate, 1997.
- Petroski, Henry, Success Through Failure: The Paradox of Design, Princeton University Press, 2006.
About the Author
ABS Consulting, an ABS Group Company, has been providing root cause analysis (RCA) services, training and software to our clients for over 15 years. We help organizations set up RCA programs and provide on-site incident investigation services, RCA training, and RCA software. Through our experience with using, teaching, and managing RCA programs, ABS Consulting has developed an RCA/incident investigation process that really works. Over 5,000 individuals and companies are using the methods detailed in this handbook.
In addition to our RCA-related work, ABS Consulting’s 1,400+ consultants provide consulting, training, and software in over 70 risk-assessment and -management topics on six continents and our training services division has trained over 50,000 personnel.
Visit us at www.absconsulting.com or call us at 865-966-5232.
The 3rd edition of the Root Cause Analysis Handbook: A Guide to Efficient and Effective Incident Investigation is now available.
By ABS Consulting – Lee N. Vanden Heuvel, Donald K. Lorenzo, Randal L. Montgomery, Walter E. Hanson, and James R. Rooney
The handbook includes:
- A 17-inch by 22-inch pull-out Root Cause Map
- A CD-ROM
- Dedicated web resources (registration required)