Root Cause Analysis - Chronic Events: Panning
Robert J. Latino, Reliability
When we look at the widely used and misunderstood tool of
root cause analysis (RCA), we should reflect on its interpretation
in our own environments. Think about it: when is RCA typically
requested and applied in your environment? Based on my experience,
it is requested and applied when:
- Someone is injured
- There is catastrophic damage
- There is an environmental incident
- There is a "near miss"
- There is public scrutiny over an issue at the site
- There is a quality issue that a customer is complaining about
What do all of these issues have in common? They are high
visibility events that require immediate action at the request
of authority. Usually in these circumstances, resources, time,
and money are not an issue because of the level of management
that is requesting the analyses be done. While Root Cause Analysis
needs to be, and will be, done under these circumstances, it
is not the optimal use of such a disciplined methodology.
Utilizing a modified version of failure modes and effects
analysis (FMEA), consider a one-time fire at a facility that
results in $500,000 worth of damages. Costs such as these are
unanticipated and not part of the budget, yet we almost always
find the cash to recover. The accountants typically will use
creative techniques to soften the blow such as amortizing the
cost of the event over a 20-year period. The resulting impact
would be viewed as $25,000/yr which is much more acceptable.
Using the modified FMEA format, such a line item might look
like the top item in the accompanying section "Comparative
Impact of Failure Events".
Now consider a chronic event such as conveyor belts that
trip in a mining operation. Individually, each trip may
take 15 minutes to reset. This 15-min period requires the
attention of a person, which at a typical standard rate ($40/hr
with benefits included) results in a cost per event of $10
(0.25 hr x $40/hr labor rate).
Because the event simply requires a person to find and reset
the tripped conveyor system, generally no additional parts
costs are involved. However, the 15-min delay causes a production
loss upstream in the processing area, which equates to $5000/hr.
Fifteen minutes now is worth $1250/occurrence (0.25 hr x $5000/hr
production loss). So each 15-min occurrence is now worth $1260
($10 labor + $1250 lost production). Still considered a relatively
low impact, right?
Now consider that on this particular conveying system we experience
40 such stoppages a week or 2080 for the year. Now we are looking
at an annual impact to the bottom line of $2,620,800 ($1260/occurrence
x 2080 occurrences).
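The arithmetic above can be restated as a short sketch. The function and parameter names are illustrative assumptions, not part of any particular FMEA tool; the figures are the ones quoted in the text.

```python
# Illustrative sketch: annualized cost of a chronic event.
# Function and parameter names are hypothetical; figures come from the text.

def cost_per_occurrence(downtime_hr, labor_rate, production_loss_rate, parts_cost=0.0):
    """Cost of one occurrence: labor + lost production + parts."""
    labor = downtime_hr * labor_rate
    lost_production = downtime_hr * production_loss_rate
    return labor + lost_production + parts_cost

def annual_cost(per_occurrence, occurrences_per_week, weeks_per_year=52):
    """Aggregate a per-occurrence cost over a full year."""
    return per_occurrence * occurrences_per_week * weeks_per_year

# Conveyor trip: 15 min (0.25 hr), $40/hr labor, $5000/hr lost production, no parts.
per_event = cost_per_occurrence(0.25, 40.0, 5000.0)
conveyor_annual = annual_cost(per_event, 40)

print(per_event)        # 1260.0
print(conveyor_annual)  # 2620800.0
```

Keeping the per-occurrence and annual steps separate mirrors the point of the example: the first number looks harmless, and only the second reveals the real impact.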
The chronic event is approximately 100 times more costly,
yet which event gets the most attention - the one-time fire
or the continual tripping of a conveyor system? We all know
the answer; the fire gets the attention because it is highly
visible and requires urgent response. The chronic event has
been accepted as a cost of doing business and is considered
part of the job. Herein lies the problem. Chronic events are
never aggregated on an annual basis. They are typically viewed
on their individual impacts.
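As a rough check on the "approximately 100 times" figure, the comparison can be sketched as follows. Variable names are illustrative; the 20-year amortization is the accounting treatment described earlier, and the comparison against the un-amortized fire cost is an extra data point derived from the same numbers.

```python
# Illustrative comparison of the sporadic and chronic events from the text.
fire_total = 500_000.0       # one-time fire damage
amortization_years = 20      # accounting treatment described in the text
fire_annual = fire_total / amortization_years

conveyor_annual = 1260.0 * 2080   # $1260/occurrence x 2080 occurrences/yr

ratio_vs_amortized = conveyor_annual / fire_annual
ratio_vs_full_cost = conveyor_annual / fire_total

print(f"{fire_annual:,.0f}")         # 25,000
print(f"{ratio_vs_amortized:.1f}")   # 104.8
print(f"{ratio_vs_full_cost:.1f}")   # 5.2
```

Even without amortizing the fire, one year of conveyor trips costs roughly five times the entire fire.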
Consider if we were to apply this modified FMEA format to
an operation, a process, or a facility. We would seek out these
hidden "nuggets" and determine their annual impact in dollars.
This would tell us what the "carrot" was, and whether or not
they were worth conducting a formal RCA on. Experience shows,
through the Pareto Principle, that when such a list is aggregated,
the 20 percent or less of the events identified account for
80 percent or more of the dollars lost. This is a good technique
to provide focus for a disciplined RCA effort.
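A minimal sketch of this Pareto-style ranking is shown below. The function name and the 80 percent threshold parameter are assumptions for illustration. The first two entries use the annual figures from this article; the entries labeled "hypothetical" are invented placeholders, not real data.

```python
# Illustrative Pareto ranking of events by annual loss.
# pareto_focus is a hypothetical helper, not part of any RCA software.

def pareto_focus(annual_losses, threshold=0.80):
    """Return the highest-cost events that together reach `threshold` of total loss."""
    total = sum(annual_losses.values())
    ranked = sorted(annual_losses.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for name, loss in ranked:
        if running >= threshold * total:
            break  # the events already selected cover the target share
        selected.append(name)
        running += loss
    return selected

events = {
    "conveyor trips": 2_620_800,          # annual figure from the text
    "one-time fire (amortized)": 25_000,  # annual figure from the text
    "hypothetical event A": 150_000,      # placeholder value, not real data
    "hypothetical event B": 60_000,       # placeholder value, not real data
    "hypothetical event C": 30_000,       # placeholder value, not real data
}

focus = pareto_focus(events)
print(focus)  # ['conveyor trips']
```

With these inputs, one event out of five carries more than 80 percent of the annual loss, which is the kind of focus the modified FMEA list is meant to provide.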
So where does the data come from to populate this type of
spreadsheet? There are numerous means by which such lists can
be developed, but how confident are we in the data? Think about
where such information can reside today: our
ERP system, RCM system, CMMS, etc. How many of us really believe
that such systems accurately reflect the field activity, especially
when it comes to the recording of every chronic event?
It has been my experience that when a chronic event occurs,
from the perception of the person tasked to fix the undesirable
event, it takes more time to input the information into the
recording system than it does to fix the problem. Usually a
negative connotation of the information system is involved
and it is deemed too cumbersome, so we will just fix the problem
and be on our way. After all, that is what we are pressured
to do - fix it and get production going again.
While we can get some information from such on-line monitoring
systems, we must recognize that they are not all-inclusive
at this time. Only the people closest to the work will truly
have the knowledge of the most chronic events. It is in their
heads, not on paper!
Typically most information systems are labeled and advertised
as asset management systems. So failures that affect the asset
are typically what are recorded. However, what may not be recorded
are events that produce off-spec product where no mechanical
failure occurs, time delays as a result of a crane not showing
up on time during a shutdown, time delays due to the wrong
parts delivered to the site, or late deliveries to customers.
How do such asset management systems handle these events?
Where is it recorded that such occurrences are undesirable
and how are proactive recommendations from RCAs processed in
a timely fashion?
If we conclude in our RCA that procedures are obsolete, specifications
are incorrect, or that people were not trained properly to
perform a task, how are these situations handled in the asset
management system? These questions are food for thought when
we consider how well our current environment supports the task
of root cause analysis.
We can be the greatest failure analysts on the planet, but
if we are working on the wrong events and our environment does
not support the proactive activity, then we are likely to become
frustrated ourselves and fall into the paradigm that "if management
does not care, then why should I?" Once this attitude sets
in, complacency with a reactive culture is the norm and overall
What we need to do today is make management aware, through
education, that our cultures live with these chronic
events that typically end up costing 100 times more than the
occasional sporadic event. Unfortunately, the sporadic events
get all the attention. When our cultures are enlightened, we
will begin to enjoy the fruits of our efforts in the form of
return on investment (ROI) figures as high as 7000-8000 percent.
Then the believers will come.