Root Cause Analysis – Chronic Events: Panning For Gold
Robert J. Latino, Reliability Center, Inc.
When we look at the widely used and misunderstood tool of root cause analysis (RCA), we should reflect on how it is interpreted in our own environments. Think about it: when is RCA typically requested and applied in your environment? Based on my experience, it is requested and applied when:
- Someone is injured
- There is catastrophic damage
- There is an environmental incident
- There is a “near miss”
- There is public scrutiny over an issue at the site
- There is a quality issue that a customer is complaining about
What do all of these issues have in common? They are high-visibility events that require immediate action at the request of those in authority. Usually in these circumstances, resources, time, and money are not an issue because of the level of management that is requesting that the analyses be done. While root cause analysis needs to be, and will be, done under these circumstances, it is not the optimal use of such a disciplined methodology.
Consider, in terms of a modified version of failure modes and effects analysis (FMEA), a one-time fire at a facility that results in $500,000 worth of damage. Costs such as these are unanticipated and not part of the budget, yet we almost always find the cash to recover. The accountants typically will use creative techniques to soften the blow, such as amortizing the cost of the event over a 20-year period. The resulting impact would be viewed as $25,000/yr ($500,000 / 20 yr), which is much more acceptable. Using the modified FMEA format, such a line item might look like the top item in the accompanying section “Comparative Impact of Failure Events”.
Now consider a chronic event such as conveyor belts that trip in a mining operation. Taken individually, each trip may take 15 minutes to reset. Those 15 minutes require the attention of a person, which at a typical standard rate ($40/hr with benefits included) results in a labor cost per event of $10 (0.25 hr x $40/hr labor rate).
Because the event simply requires a person to find and reset the tripped conveyor system, generally no additional parts costs are involved. However, the 15-min delay causes a production loss upstream in the processing area, which is valued at $5000/hr. Fifteen minutes is therefore worth $1250/occurrence (0.25 hr x $5000/hr production loss). So each 15-min occurrence is now worth $1260 ($10 labor + $1250 lost production). Still considered a relatively low impact, right?
Now consider that on this particular conveying system we experience 40 such stoppages a week, or 2080 for the year (40/week x 52 weeks). Now we are looking at an annual impact to the bottom line of $2,620,800 ($1260/occurrence x 2080 occurrences).
The chronic event is approximately 100 times more costly than the fire as the accountants view it ($2,620,800/yr versus $25,000/yr), and each year it costs roughly five times the fire's full $500,000. Yet which event gets the most attention: the one-time fire or the continual tripping of a conveyor system? We all know the answer; the fire gets the attention because it is highly visible and requires urgent response. The chronic event has been accepted as a cost of doing business and is considered part of the job. Herein lies the problem. Chronic events are never aggregated on an annual basis. They are typically viewed on their individual impacts.
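The arithmetic behind this comparison can be reproduced in a few lines. The following is only a minimal sketch in Python; the function name `annual_impact` and its parameters are illustrative assumptions, not part of the modified FMEA format itself, and the figures are the ones quoted in the example above.

```python
def annual_impact(occurrences_per_year, labor_hours, labor_rate,
                  downtime_hours, production_loss_rate, parts_cost=0.0):
    """Annualized dollar impact of one recurring event (hypothetical helper)."""
    # Cost of a single occurrence: labor + lost production + parts.
    per_event = (labor_hours * labor_rate
                 + downtime_hours * production_loss_rate
                 + parts_cost)
    return per_event * occurrences_per_year


# Chronic event: conveyor trips, 40 per week, 15 min (0.25 hr) each.
conveyor = annual_impact(
    occurrences_per_year=40 * 52,   # 2080 stoppages per year
    labor_hours=0.25,
    labor_rate=40.0,                # $/hr, benefits included
    downtime_hours=0.25,
    production_loss_rate=5000.0,    # $/hr of lost production
)

# Sporadic event: $500,000 fire amortized over 20 years.
fire_amortized = 500_000 / 20

ratio = conveyor / fire_amortized

print(f"Conveyor trips: ${conveyor:,.0f}/yr")
print(f"Fire (amortized): ${fire_amortized:,.0f}/yr")
print(f"Ratio: {ratio:.1f}x")
```

Applying the same calculation to every recurring event at a site, rather than to one example, is what turns individual nuisances into a ranked list of annual losses.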
Consider if we were to apply this modified FMEA format to an operation, a process, or a facility. We would seek out these hidden “nuggets” and determine their annual impact in dollars. This would tell us what the “carrot” was, and whether or not each event was worth conducting a formal RCA on. Experience shows, in line with the Pareto Principle, that when such a list is aggregated, 20 percent or less of the events identified account for 80 percent or more of the dollars lost. This is a good technique to provide focus for a disciplined RCA effort.
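As a rough illustration of that Pareto cut, the sketch below ranks a list of events by annual loss and keeps the smallest set that covers 80 percent of the total. Only the conveyor and amortized fire figures come from the example above; every other event name and dollar amount is a hypothetical placeholder, and the function `pareto_cut` is an assumed name used here for illustration.

```python
def pareto_cut(annual_losses, threshold=0.80):
    """Return the top events that together reach `threshold` of total loss."""
    total = sum(annual_losses.values())
    # Rank events from largest to smallest annual loss.
    ranked = sorted(annual_losses.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for name, loss in ranked:
        if running >= threshold * total:
            break
        selected.append(name)
        running += loss
    return selected, running / total


# Annual losses in dollars. All values except the first two are HYPOTHETICAL.
annual_losses = {
    "conveyor_trips": 2_620_800,      # from the example above
    "fire_amortized": 25_000,         # from the example above
    "off_spec_product": 400_000,      # hypothetical
    "crane_delays": 60_000,           # hypothetical
    "wrong_parts_delivered": 45_000,  # hypothetical
    "late_deliveries": 30_000,        # hypothetical
}

selected, share = pareto_cut(annual_losses)
print(f"{len(selected)} of {len(annual_losses)} events "
      f"cover {share:.0%} of annual losses: {selected}")
```

With these placeholder numbers a single event dominates the list; with real site data the cut will look different, but the ranking step is the same.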
So where does the data come from to populate this type of spreadsheet? There are numerous means by which such lists can be developed, but how confident are we in the data? Think about where such information resides today: our ERP system, RCM system, CMMS, etc. How many of us really believe that such systems accurately reflect the field activity, especially when it comes to the recording of every chronic event?
It has been my experience that when a chronic event occurs, from the perception of the person tasked to fix the undesirable event, it takes more time to input the information into the recording system than it does to fix the problem. Usually the information system carries a negative connotation and is deemed too cumbersome, so we just fix the problem and go on our way. After all, that is what we are pressured to do: fix it and get production going again.
While we can get some information from such on-line systems, we must recognize that they are not all-inclusive at this time. Only the people closest to the work will truly have the knowledge of the most chronic events. It is in their heads, not on paper!
Typically most information systems are labeled and advertised as asset management systems. So failures that affect the asset are typically what are recorded. However, what may not be recorded are events that produce off-spec product where no mechanical failure occurs, time delays as a result of a crane not showing up on time during a shutdown, time delays due to the wrong parts delivered to the site, or late deliveries to customers.
How do such asset management systems handle these events? Where is it recorded that such occurrences are undesirable and how are proactive recommendations from RCAs processed in a timely fashion?
If we conclude in our RCA that procedures are obsolete, specifications are incorrect, or that people were not trained properly to perform a task, how are these situations handled in the asset management system? These questions are food for thought when we consider how well our current environment supports the task of root cause analysis.
We can be the greatest failure analysts on the planet, but if we are working on the wrong events and our environment does not support the proactive activity, then we are likely to become frustrated ourselves and fall into the paradigm that “if management does not care, then why should I?” Once this attitude sets in, complacency with a reactive culture becomes the norm and overall profitability suffers.
What we need to do today is make management aware, through education, that our cultures live with these chronic events, which typically end up costing 100 times more than the occasional sporadic event. Unfortunately, the sporadic events get all the attention. When our cultures are enlightened, we will begin to enjoy the fruits of our efforts in the form of return on investment (ROI) figures as high as 7000-8000 percent. Then the believers will come.