
© ESB Professional/Shutterstock.com
Key Points
- FMEA is a great fit for analyzing any IT projects you might have in progress.
- When combined with your organization’s standard operating procedures, FMEA gives you a powerful means of assessing and mitigating risk.
- This approach shines in complex technology infrastructure, where isolating and remediating problems is paramount.
Are you using FMEA for your IT projects? There is no shortage of viable frameworks for risk management in enterprise technology applications. Ask a dozen technicians what they use in their shops, and you’ll likely get a dozen different answers. I’ve understood this since my time in the tech industry, as each deployment of a technology stack has a different means of handling and controlling risk.
However, if you’re looking for something that works a little more out of the box, FMEA is a great fit for assessing risk and detecting problems early on. Drawing on my own technical experience, I’ll detail how you can use the mechanisms behind FMEA to make the most of your IT projects, as well as how to quickly remediate the issues that do arise.
Why FMEA Works

FMEA, or Failure Mode and Effects Analysis, is a means of assessing risk in a Six Sigma project, but it is also a great way of handling risk mitigation for any project. It works by assigning numerical ratings to potential failures and determining which of them pose the greatest risk. When properly implemented, FMEA isn’t necessarily a means of avoiding failure, but rather a way of understanding the potential pain points.
FMEA is only part of the equation when it comes to assessing and managing risk, however. When it comes to risk mitigation in a sector like technology, time is frequently a costly resource to consume. As such, you’ll want to leverage this technique alongside other proven measures.
FMEA works because it uses data to logically deduce where the pain points are along the process workflow in a Six Sigma project. The same sort of logical thinking applies to just about any industry you can imagine, making for a flexible and effective means of managing risk.
Understanding How FMEA Functions
At the core of FMEA’s functionality is the RPN, or risk priority number. This is a numeric value derived from three main criteria, which can be assigned as you need across various components in a workflow. Severity, the first criterion, looks at how serious the effects of a component going offline are. These are your big impact items, as it were. Occurrence is the frequency of failure, whether it’s something like calibration slipping or catastrophic component destruction.
The final criterion is detection, which looks at how likely the controls you have in place are to detect and report a failure. When looking at these criteria, things might seem a bit murky if you’re new to FMEA. However, the thing to understand is that the more impact an item has, the higher its severity rating. The same applies to occurrence and detection: the more often a failure occurs, and the more likely it is to go undetected, the higher those ratings climb.
Once you have these ratings, you can determine your RPN. Your RPN is calculated simply by multiplying your severity, occurrence, and detection scores together. Each item that is mission-critical to a workflow gets an RPN. You can then focus your efforts on the highest-scoring items first and work your way down.
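To make the arithmetic concrete, here is a minimal sketch of the RPN calculation in Python. The component names and ratings are made up for illustration, and the 1 to 10 rating scale is an assumption on my part (it is a common convention, but your organization may use a different one).

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str
    severity: int    # assumed scale: 1 (negligible) to 10 (catastrophic)
    occurrence: int  # assumed scale: 1 (rare) to 10 (very frequent)
    detection: int   # assumed scale: 1 (almost always caught) to 10 (likely to go unnoticed)

    @property
    def rpn(self) -> int:
        # RPN is the product of the three ratings.
        return self.severity * self.occurrence * self.detection


# Hypothetical ratings, for illustration only.
failure_modes = [
    FailureMode("VoIP desk phone", severity=2, occurrence=4, detection=2),
    FailureMode("Central database server", severity=9, occurrence=3, detection=4),
    FailureMode("Office network switch", severity=7, occurrence=2, detection=3),
]

# Highest RPN first, so the riskiest items get attention first.
ranked = sorted(failure_modes, key=lambda fm: fm.rpn, reverse=True)

for fm in ranked:
    print(f"{fm.component}: RPN {fm.rpn}")
```

With these example numbers, the database server lands at the top of the list, which matches the idea of working from the highest-scoring items down.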
FMEA for Risk Mitigation in IT Projects

Now that we’ve done a base-level introduction to the concepts behind FMEA, how does this apply to IT projects? Any technological process is going to have multiple failure points. If I had the time and moxie, I could regale you with horror stories from the frontlines of the early deployment days of Windows Server 2012. Another time, another place, as it were.
That said, FMEA is a great means of taking stock of your technological inventory and determining potential failure points. Further, you can start calculating your RPNs for various components to determine which elements are going to need immediate attention in the event of a systems failure or compromise.
We often talk about the importance of data across the board when it comes to things like Six Sigma projects, but the technology solutions that store that very data are integral to regular operations. As such, understanding and developing strategies around failure points is going to be your best bet for maintaining business continuity in the long term.
Resource Assessment
Consider the various machines, accessories, and hardware at your disposal in any modern office space. Each desk likely has a computer, which receives network transmissions from a structured cabling deployment leading to a server room. Every single one of those elements is a failure point to consider. Cabling getting damaged runs the risk of downtime for workstations.
Workstations have various internal components like memory modules and storage which can fail, depending on age and other circumstances. Servers are just as fickle as workstations, and when those go offline, you feel the squeeze.
With this in mind, now is a key opportunity to sit down and take stock of every piece of technology that is vital for your operations. This is a common practice for any seasoned tech professional, and documentation will be vital for assessing the risk factors for these components.
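One way to document an inventory like this is as a simple dependency map, which lets you ask what else goes down when a given piece fails. The sketch below is only an illustration: the asset names and the `affected_by` helper are hypothetical, and a real inventory would likely live in an asset management tool rather than a dictionary.

```python
from collections import deque

# Hypothetical inventory: each asset maps to the assets it depends on.
depends_on = {
    "workstation-01": ["cable-run-A"],
    "workstation-02": ["cable-run-A"],
    "workstation-03": ["cable-run-B"],
    "cable-run-A": ["switch-01"],
    "cable-run-B": ["switch-01"],
    "file-server": ["switch-01"],
    "switch-01": [],
}


def affected_by(failed: str) -> set[str]:
    """Return every asset that directly or indirectly depends on `failed`."""
    # Invert the map so we can walk from a failed asset to its dependents.
    dependents: dict[str, list[str]] = {}
    for asset, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(asset)

    seen: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


print(sorted(affected_by("cable-run-A")))
print(sorted(affected_by("switch-01")))
```

In this made-up layout, a damaged cable run takes out two workstations, while losing the switch takes out everything connected to it. That difference is exactly what feeds into a severity rating.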
Isolating the Causes of Risks
In tech, we often look less at quick fixes and more at isolation and remediation. Lasting solutions should be enacted, documented, and observed for any further concerns. Determining failure points can be done in a few different ways while you’re preparing to calculate your RPN.
As with any IT project, certain components in an enterprise environment are going to have a bigger impact than others. A central database server is going to be more vital to maintain than something like a VoIP phone, for example.
Now, every piece of technology is a potential failure point when considering the integrity of operations, but some are more important than others. This is a great time to assign a severity rating to each piece of equipment in your inventory before moving on to the next two criteria.
Understanding and Reducing the Effect of Failure
What are the most critical parts of your setup? At this point, hopefully, you’ve identified what is key to operational success and business continuity. You can now ascertain how often these failure points actually fail, which ideally shouldn’t be frequently. That said, IT projects and Murphy’s Law go together like peanut butter and jelly.
My next piece of advice isn’t necessarily reliant upon FMEA, but it is a good idea nonetheless. When you’ve found these pain points, take the time to look over the logs, minidumps, and so forth. Computers, servers, and other pieces of network equipment have a wonderful way of documenting when they fail.
With these data points, you can at least see if it is an issue of misconfiguration, user error, or something more pressing to consider.
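As a rough illustration of turning logs into an occurrence rating, the sketch below counts error entries per host. Everything here is an assumption: the log format is a simplified stand-in (real Windows event logs or syslog entries look different), and the thresholds in `occurrence_rating` are placeholders you would tune to your own environment.

```python
from collections import Counter

# Hypothetical, simplified log lines: "<date> <host> <level> <message>"
log_lines = [
    "2024-03-01 db-server ERROR disk I/O timeout",
    "2024-03-02 ws-014 WARNING low disk space",
    "2024-03-04 db-server ERROR disk I/O timeout",
    "2024-03-09 ws-014 ERROR memory parity check failed",
    "2024-03-11 db-server ERROR service restart after crash",
]


def count_errors(lines: list[str]) -> Counter:
    """Count ERROR-level entries per host, skipping malformed lines."""
    counts: Counter = Counter()
    for line in lines:
        parts = line.split(maxsplit=3)
        if len(parts) < 4:
            continue  # not in the expected format
        _date, host, level, _message = parts
        if level == "ERROR":
            counts[host] += 1
    return counts


def occurrence_rating(error_count: int) -> int:
    """Map an error count to an occurrence rating (placeholder thresholds)."""
    if error_count == 0:
        return 1
    if error_count <= 1:
        return 3
    if error_count <= 3:
        return 6
    return 9


errors = count_errors(log_lines)
for host, count in errors.most_common():
    print(f"{host}: {count} errors, occurrence rating {occurrence_rating(count)}")
```

The counts alone won’t tell you whether the cause is misconfiguration, user error, or failing hardware, so reading the messages themselves still matters.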
Detecting Risk Early On

Monitoring systems for IT projects are fairly robust. At the very worst, you can always monitor network traffic to determine whether a failure is occurring. However, for more severe issues like ransomware or malware, abnormal network transmissions aren’t a given.
As such, it becomes important for any IT project to have non-technical staff who have at least a passing familiarity with reporting incidents as they occur. Technical staff are going to recognize and detect points of failure, but ultimately, detection is going to come down to a combination of monitoring and employee training.
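For the monitoring half of that combination, a very basic check is to compare current traffic against a known-good baseline. The sketch below is a simplified illustration with invented numbers; the `is_abnormal` helper and the three-standard-deviation tolerance are my own assumptions, and a real deployment would lean on a dedicated monitoring product. As noted above, it also won’t catch threats that produce normal-looking traffic.

```python
from statistics import mean, stdev

# Hypothetical baseline: megabytes transferred per minute on a normal day.
baseline_mb_per_min = [120, 130, 125, 118, 122]


def is_abnormal(sample: float, baseline: list[float], tolerance: float = 3.0) -> bool:
    """Flag a sample more than `tolerance` standard deviations from the baseline mean."""
    center = mean(baseline)
    spread = stdev(baseline)
    return abs(sample - center) > tolerance * spread


# A typical reading, a sudden spike, and a link that has gone silent.
for sample in (127, 400, 0):
    status = "investigate" if is_abnormal(sample, baseline_mb_per_min) else "normal"
    print(f"{sample} MB/min: {status}")
```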
Combining Strategies with Trusted Methods
FMEA isn’t going to remediate the failure of any component in IT projects. That’ll come down to whatever operational procedures your organization has developed. If you don’t have a standard operating procedure yet, I’ll give a general framework.
At this point, you should have RPNs determined for all the elements in your inventory. This is where the rubber meets the road, for lack of a better expression. Combining FMEA with your other IT SOPs makes for a powerful combination. Lower RPN items can readily be handled by junior staff.
Higher RPN elements are where the more seasoned and technically skilled members of your tech department will want to focus. This isn’t dissimilar to how most tech departments already operate, though your exact approach is likely to differ based on departmental policies.
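A split like this can be sketched as a simple routing rule. The RPN values and the cutoff of 100 below are hypothetical; where you draw the line is a departmental policy decision, not something FMEA dictates.

```python
# Hypothetical RPN values, for illustration only.
rpn_by_component = {
    "VoIP desk phone": 16,
    "Central database server": 108,
    "Office network switch": 42,
}

# Assumed cutoff; tune this to your own department's policies.
SENIOR_THRESHOLD = 100


def assign_tier(rpn: int, threshold: int = SENIOR_THRESHOLD) -> str:
    """Route high-RPN items to senior staff and the rest to junior staff."""
    return "senior" if rpn >= threshold else "junior"


# Work the list from the highest RPN down.
work_queue = sorted(rpn_by_component.items(), key=lambda item: item[1], reverse=True)
assignments = {component: assign_tier(rpn) for component, rpn in work_queue}

for component, tier in assignments.items():
    print(f"{component}: {tier} staff")
```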
Remediation and Recovery
This final step doesn’t have a foundation necessarily in FMEA, and that’s alright. IT projects and operations aren’t concerned with solely mitigating risk, but rather remediating issues with documented solutions. Remediation takes a few different forms, depending on the issue, and what piece of equipment it is affecting.
I will say that developing a few standard practices centered around good-quality backups is crucial. You never know when a hard drive is going to fail. You also never know if Mike from accounting is going to click a sketchy PDF file in an email despite numerous warnings and training classes not to do so.
When the issue itself is remediated, document it. This gives an accurate post-mortem for your tech personnel as to what has happened, the severity of said incident, and the steps taken to rectify it.
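If you want that documentation to be consistent, a small structured record helps. This is only a sketch: the `PostMortem` fields are my own suggestion based on the three items above (what happened, how severe it was, and what was done), and the incident shown is invented.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PostMortem:
    component: str
    summary: str   # what happened
    severity: int  # the FMEA severity rating for the incident
    steps_taken: list[str] = field(default_factory=list)  # how it was rectified


# Invented incident, for illustration only.
record = PostMortem(
    component="workstation-014",
    summary="Primary hard drive failed; machine would not boot.",
    severity=6,
    steps_taken=[
        "Replaced the failed drive",
        "Restored from the most recent backup",
        "Verified user files with the employee",
    ],
)

# Serialize to JSON so the record can be stored or attached to a ticket.
report = json.dumps(asdict(record), indent=2)
print(report)
```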
Conclusion
FMEA is a logical fit for IT projects. Any IT department is already walking a knife’s edge when it comes to stability and reliability, given how fickle most enterprise computers can be at the worst of times. Taking the time to assess the risk factors associated with every piece of inventory in your equipment catalog is going to save headaches in the long run.
The image featured at the top of this post is ©ESB Professional/Shutterstock.com.