It will probably come as little surprise that something called failure modes and effects analysis (FMEA) evolved at the National Aeronautics and Space Administration, an environment where the interest in preventing failures is extremely high. FMEA was later popularized by the automobile industry and in recent years has become more widespread among Six Sigma practitioners. It is now seen routinely even in transactional functions.

### What Is FMEA?

FMEA is a system for analyzing the design of a product or service system to identify potential failures, then taking steps to counteract or at least minimize the risks from those failures.

The FMEA process begins by identifying “failure modes,” the ways in which a product, service or process could fail. A project team examines every element of a service, starting from the inputs and working through to the output delivered to the customer. At each step, the team asks “what could go wrong here?”

Here are a few simple examples of failure modes related to the process of providing hot coffee at a truck stop:

• One of the inputs to that process is a “clean coffee pot.” What could go wrong? Perhaps the water in the dishwasher is not hot enough, so the coffee pot is not really clean.
• The first step in the process is to fill the brewing machine with water. What could go wrong? Perhaps the water is not the right temperature or the staff puts in too much or too little.
• An output from the process is a hot cup of coffee delivered to the customer. What could go wrong? The coffee could get too cool before it is delivered.

Of course, not all failures are the same. Being served a cup of coffee that is just hot water is much worse than being served a cup that is just a bit too cool. A key element of FMEA is analyzing three characteristics of failures:

1. How severe they are
2. How often they occur
3. How likely it is that they will be noticed when they occur

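Typically, the project team scores each failure mode on a scale of 1 to 10 or 1 to 5 in each of these three areas, then calculates a Risk Priority Number (RPN). The detection score runs in the opposite direction from what the name might suggest: the harder a failure is to notice, the higher its score.

RPN = (severity) x (frequency of occurrence) x (detection score)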

The idea is to focus improvement efforts on the failures that have the biggest impact on customers. The highest-scoring failure modes are those that happen often, that are severe when they happen, and/or that are unlikely to be detected. Failures that are difficult to detect are more likely to get through to customers.
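The RPN calculation and ranking can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the function name `rpn`, the `scale_max` parameter, and the scores assigned to the three coffee failure modes are all assumptions made for the example, and the detection score is assumed to be higher when a failure is harder to notice.

```python
def rpn(severity: int, occurrence: int, detection: int, scale_max: int = 10) -> int:
    """Return the Risk Priority Number for one failure mode.

    Each score must lie in 1..scale_max. The detection score is assumed
    to be higher when the failure is harder to detect.
    """
    scores = (("severity", severity), ("occurrence", occurrence), ("detection", detection))
    for name, score in scores:
        if not 1 <= score <= scale_max:
            raise ValueError(f"{name} score {score} is outside 1..{scale_max}")
    return severity * occurrence * detection


# Hypothetical (severity, occurrence, detection) scores, invented for illustration.
failure_modes = {
    "coffee pot not clean": (7, 3, 6),
    "wrong water temperature or amount": (8, 4, 4),
    "coffee cools before delivery": (4, 5, 2),
}

# Rank failure modes from highest to lowest RPN.
ranked = sorted(failure_modes, key=lambda mode: rpn(*failure_modes[mode]), reverse=True)
for mode in ranked:
    print(f"{rpn(*failure_modes[mode]):4d}  {mode}")
```

With these made-up scores, a moderately severe failure that is hard to detect (the unclean pot) ranks almost as high as the most severe one, which is the effect the multiplication is meant to produce.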

The team then completes the FMEA for the highest-scoring failure modes and for any that get the highest severity scores, even if they do not score that high overall. A business wants to make sure any possible disaster is prevented, even if it is unlikely to occur. “Completing the analysis” means looking at the potential causes for the failure mode, identifying ways to detect the problem, developing recommended actions, and assigning responsibility for monitoring the process and taking action when warranted.
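The selection rule described above (the top failure modes by RPN, plus any with a very high severity score) could be expressed roughly as follows. The `top_n` and `severity_cutoff` values are hypothetical thresholds a team would choose for itself, and the sample failure modes, including the scalding spill, are invented for illustration.

```python
from typing import NamedTuple


class FailureMode(NamedTuple):
    name: str
    severity: int
    rpn: int


def select_for_full_analysis(modes, top_n=2, severity_cutoff=9):
    """Pick the top_n failure modes by RPN, plus any others whose
    severity alone is at or above severity_cutoff."""
    by_rpn = sorted(modes, key=lambda m: m.rpn, reverse=True)
    selected = by_rpn[:top_n]
    for mode in by_rpn[top_n:]:
        if mode.severity >= severity_cutoff:
            selected.append(mode)
    return selected


# Hypothetical failure modes; the RPN values are assumed, not measured.
modes = [
    FailureMode("wrong amount of water", severity=8, rpn=128),
    FailureMode("pot not clean", severity=7, rpn=126),
    FailureMode("coffee too cool", severity=4, rpn=40),
    FailureMode("scalding spill on customer", severity=10, rpn=30),
]

chosen = [m.name for m in select_for_full_analysis(modes)]
print(chosen)
```

Here the spill is selected despite its low RPN because its severity is at the top of the scale, while the cool coffee is left for later.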

In the truck stop scenario, for example, here is a complete set of FMEA data for just one failure mode:

• Process step: Fill coffee pot with water
• Potential failure mode: Wrong amount of water
• Effect of the failure: Coffee too strong or too weak
• Severity score: 8
• Potential cause: Faded level marks on pot
• Frequency score: 4
• Current method of controlling the failure: Visual inspection
• Detection score: 4
• RPN = 8 x 4 x 4 = 128
• Recommended action: Replace coffee pot
• Person responsible: Mel
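A record like this maps naturally onto a small data structure. The sketch below is one possible representation, with the field names chosen for this example rather than taken from any FMEA standard; the values come from the truck stop record above.

```python
from dataclasses import dataclass


@dataclass
class FmeaRecord:
    """One row of an FMEA worksheet (field names are illustrative)."""
    process_step: str
    failure_mode: str
    effect: str
    severity: int
    cause: str
    occurrence: int
    current_control: str
    detection: int
    recommended_action: str
    owner: str

    @property
    def rpn(self) -> int:
        # RPN = severity x frequency of occurrence x detection score
        return self.severity * self.occurrence * self.detection


record = FmeaRecord(
    process_step="Fill coffee pot with water",
    failure_mode="Wrong amount of water",
    effect="Coffee too strong or too weak",
    severity=8,
    cause="Faded level marks on pot",
    occurrence=4,
    current_control="Visual inspection",
    detection=4,
    recommended_action="Replace coffee pot",
    owner="Mel",
)

print(record.rpn)  # 128
```

Deriving the RPN from the three scores, rather than storing it as a separate field, keeps it from going stale when a team rescores a failure mode after taking action.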

### FMEA Case Study

A project was done in a transaction processing department to reduce process cycle time and reduce defects. In the Analyze phase of DMAIC (Define, Measure, Analyze, Improve, Control), the team identified several root causes, including duplication of efforts, excessive hand-offs and significant non-value-added work activities (such as filling out forms for other departments). In the Improve phase the team identified changes it could make that would remove non-value-added work and reduce hand-offs, and it developed ideas for several job aids to reduce defects. Before the team launched the pilot of these solutions, the Black Belt decided it would be worthwhile to have the team complete an FMEA on the redesigned procedures.

During the FMEA, the team realized that one of the steps it wanted to remove from the process, which it had defined as non-value-added, was actually an important input to the finance department downstream in the process. (The finance department played a support role in making sure certain controls were in place to limit the risk the company faced from processing the transactions.) Implementing the new process as designed by the team would have seriously disrupted the finance group, and the company could have been exposed to significant risk. By exposing this potential problem before launch, the team was able to adjust its solution so that the information was still provided to the finance department, though in a more streamlined manner, and still meet the original project goals.

### Practical Uses for FMEA

The case study shows that FMEA is a good idea whenever changes to the workplace are planned. More generally, it can be used at several points in a DMAIC or Design for Six Sigma project:

• At the beginning of a project, FMEA can help a team better scope the opportunity by defining the types of failures and narrowing the focus to a specific type of problem.
• In the Improve phase, FMEA can help uncover potential problems (especially unintended consequences) with suggested solutions, thereby allowing timely adjustments.
• In the Control phase, FMEA helps identify what measures need to be in place to make sure that failures will not happen in the future.

Project teams trying FMEA for the first time are advised to keep it simple. Think of it as structured brainstorming: a technique to get teams thinking about potential failures they have not thought of before. Bring together people from different work areas and disciplines. FMEA works best in a team environment with cross-functional representation. Subject matter expertise is critical.