Getting to the root

Many tools can be used to improve plant performance; one such methodology is Root Cause Analysis. But, as Mark Venables discovers, it is not as widely used as many believe.

Root Cause Analysis (RCA) is a method of problem solving used for identifying the root causes of faults or problems. “It is a powerful improvement tool and PEME uses this technique to prevent recurring equipment and process failures,” Wayne Pheasey, associate director at PEME, explains. “If you do not tackle the root causes of failures and incidents they will continue to occur with a negative impact on productivity or safety.”

A robust RCA approach is an essential element of any effective reliability or maintenance improvement programme. Often, equipment will fail, but the true root cause lies in other factors such as human actions or errors. That calls for a different solution from just replacing the failed component, which is the typical action that a maintenance engineer will take. The problem with dealing only with the symptoms of a failure is that the failure is likely to recur.

“It is the ‘behaviour’ of maintainers in considering lower-level root causes and dealing with them that is desirable and provides real value-add benefits,” Pheasey adds. “The problem with many RCA solutions, however, is that they are complex, data hungry and time consuming, and they are often the tool used by quality personnel to address client concerns rather than engineering failures. For this reason, these more formal ‘Full’ RCA tools are often not used at the front line by maintenance engineers.” To address this, PEME has introduced two RCA approaches (see sidebar – Two approaches to RCA).

Improvement journey

“Anyone who wants to go on an improvement journey, really, needs to be using root cause analysis,” Paul Deighton, global segment manager pulp and paper at SKF UK, explains. “That's both operations and maintenance teams. You don't have to have a failure to conduct root cause analysis. You could be looking at a near miss or a quality issue, or anything, really.


“It's about understanding what the problem is, containing and analysing that problem, defining the root cause, and then defining and implementing an action plan that's going to eliminate that root cause. Then comes the most important step, which most people forget: the validation stage, checking at some point in the future whether that action plan has eliminated the root cause or whether the cause is still there. If it is, you didn't find the root cause; you found a false cause.”


In that sense the process is no different from a continuous improvement loop: identify that you have a problem, identify what that problem is, analyse it, define the root cause, define the action plan, and then check that the action plan is actually working. But, as Deighton points out, sometimes it is the minutiae of RCA that are the hurdle. “Sometimes people find that hard,” he admits. “If organisations have not been using it before, it can be hard to implement an effective root cause programme.


“Often, it requires a fundamental shift in the attitude and mind-set of the workers, the people who are involved in the process, especially in established organisations. You can get stages where people think that they know what the problem is, and they don't need to go through any kind of analysis process, because they've worked here for so many years and they know what the problem is.


“Sometimes, when you challenge people to actually sit down and investigate a failure in a structured manner, that can actually cause problems. Organisations sometimes use root cause in the wrong way, to find people to blame, and that's not what root cause is about.

“It's a very powerful tool that every maintenance organisation should be using routinely. SKF have an assessment process where we assess companies against best practice models. We have a couple of questions in there around the use of root cause analysis, and our results show that less than half of companies routinely use root cause analysis to investigate problems.”

But, as Deighton alluded to, you don't have to have a failure for RCA to be an effective improvement tool. “Sometimes Root Cause Analysis is called Root Cause Failure Analysis,” he continues. “If it is Root Cause Failure Analysis, then yes, the failure is the thing you're investigating. People also talk about failure analysis. Failure analysis isn't root cause; failure analysis is analysing a failure. So a machine has failed; why did it fail? It is not necessarily part of the failure investigation.


“Failure investigation is what happened, when did it happen, and what was the impact of that happening? Very often, those are the key elements that people focus on in failure investigation. Root cause failure investigation tries to determine the root cause of the failure, but RCA could be used for a quality issue on a production line. It could be used for a health and safety incident or a near miss.”

In reality, most organisations that use root cause failure analysis use a relatively simple tool such as ‘Five Whys’, which is just asking why five times. “It's sometimes described by the aficionados as quick and dirty, and that's what it does; it just simply asks why, why, why, why, why,” Deighton says. “It can be limiting though, because you are almost relying on your own knowledge. So every time you ask why, you're looking at your own knowledge or the knowledge of the team that's there. So if you ask the fourth why, for example, and everyone's knowledge has run out, the process can stall at that stage.”
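The stall Deighton describes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `five_whys`, the `KNOWN_CAUSES` table standing in for the team's knowledge, and the bearing example itself. It is not a tool described by anyone quoted here.

```python
# Hypothetical stand-in for what the team in the room knows:
# each effect maps to the one cause the team can name for it.
KNOWN_CAUSES = {
    "bearing failed": "bearing ran without enough lubricant",
    "bearing ran without enough lubricant": "lubrication round was skipped",
    "lubrication round was skipped": "no standard says when to lubricate",
}


def five_whys(problem, known_causes, max_whys=5):
    """Ask 'why?' up to max_whys times.

    Returns the chain of answers and a flag that is True when the
    team's knowledge ran out before max_whys questions were answered.
    """
    chain = [problem]
    for _ in range(max_whys):
        cause = known_causes.get(chain[-1])
        if cause is None:
            # Nobody can answer this 'why': the process stalls here.
            return chain, True
        chain.append(cause)
    return chain, False


chain, stalled = five_whys("bearing failed", KNOWN_CAUSES)
for depth, step in enumerate(chain):
    print("  " * depth + step)
print("stalled:", stalled)
```

With this made-up knowledge table, three whys are answered and the fourth finds no entry, so the chain stops there. The last answer is only as deep as the team's knowledge, which is the limitation Deighton points to.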

Wrong root

But not everyone is a big fan of RCA. Dennis McCarthy of DAK Consulting, a manufacturing improvement consultancy, is one expert who feels that the value of RCA is often overstated. “The thing about RCA is that the desired outcome is to prevent the problem from occurring again. It is very easy to get caught up in chasing the root cause when, in all likelihood, there will be multiple causes. For example, did the plane crash because the runway was wet, the pilot was tired or the tyres were worn? There is no single cause.

“In seeking out a countermeasure, consider the fire service advice on preventing fires. A fire needs fuel, heat and oxygen. Remove one of these and a fire will be prevented. The equivalent for manufacturing organisations is the three generic countermeasures. To prevent a failure, you need three things: a standard (when this happens do that); a formal best practice for the activity (start up, steady state, close down) that is easy to do right, difficult to do wrong and simple to learn; and process control cause/effect limits.

“When all three are in place, 90 per cent of problems do not occur. To put it another way, the absence of these three things accounts for the generic root causes of nine out of ten problems.

“In addition, RCA tends to be focussed on failures, but in today’s challenging environment we need to optimise equipment by increasing the time between interventions and reducing defects. Advanced RCA processes consider this also.”

McCarthy believes that RCA is one of those tools that is often quoted or referenced but that very few people actually use, because it is not that useful. “There are lots of methods out there, but the reality is that most of the reasons why things don’t work the way they should are down to finger trouble. That’s often down to there not being a sort of standard way of doing things.

“I’ve worked in a lot of companies where they all talk about it, but nobody really uses it. When things go wrong, they are fixed. Even in the better companies, people tend to leave it and then go on to the next thing. For example, if a bearing has failed, it’s due to lack of lubrication or it was worn and it wasn’t replaced when it should have been.

“Take the latter example: it was worn and nobody noticed, so it failed without warning. The problem is that there wasn’t a standard for recognising when it was showing signs of failure, such as getting a bit noisy. If you replace the bearing, the problem is going to happen again without warning at some point in the future, because you haven’t actually defined what it is you need to look for. The problem is the fact that there isn’t a standard to say when the bearing should have been replaced.”
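McCarthy's idea of a standard as "when this happens do that" can be written down as a simple rule table. The sketch below is an illustration only: the class name `Standard`, the sign names and the limit values are all invented for the example and do not come from McCarthy or from any published bearing guideline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Standard:
    sign: str     # what to look for ("when this happens")
    limit: float  # reading above which the sign counts as present
    action: str   # what to do about it ("do that")


# Hypothetical limits, chosen only to make the example concrete.
BEARING_STANDARDS = [
    Standard("noise_db", 80.0, "schedule bearing replacement"),
    Standard("temperature_c", 90.0, "check lubrication"),
]


def actions_due(readings, standards):
    """Return the actions whose warning sign is present in the readings."""
    due = []
    for standard in standards:
        value = readings.get(standard.sign)
        if value is not None and value > standard.limit:
            due.append(standard.action)
    return due


# A bearing that is "getting a bit noisy" but is not running hot.
print(actions_due({"noise_db": 85.0, "temperature_c": 70.0}, BEARING_STANDARDS))
```

The point of the sketch is that the warning sign and the response are defined before the failure, so a noisy bearing triggers a planned replacement rather than failing without warning.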