
Traditionally, Root Cause Analysis in process improvement has been a physical, human-centric exercise. It involves stakeholders standing around a whiteboard, drawing Ishikawa diagrams, and asking the Five Whys until a human or mechanical point of failure is found. However, the nature of work has shifted as more industries embrace automated workflows. Processes are no longer just a string of human actions. They’ve been replaced by intricate workflows powered by software stacks, algorithmic decision-making, and headless processes that run with no human intervention.

As such, when an automated workflow fails, the root cause is rarely as simple as a missed step in the work. More often there is a deeper underlying issue of conflicting logic, API timeouts, or mismatched data between multiple systems. To maintain efficiency, our approach to Root Cause Analysis must change. We can’t just investigate human mistakes; we also need to learn how to audit the system logic and data flow of any process.

The Anatomy of an Automated Defect


For automated workflows, a defect isn’t necessarily a broken product; it is often the result of bad data or process stagnation. Since automation executes at speeds people can’t match, a single root cause can generate thousands of defects in seconds. This results in automated waste.

Traditional root cause analysis often looks for a specific point of failure. In automated workflows, we need to look for the point of divergence for any process. This is the point where automated logic encounters scenarios it wasn’t designed to handle, leading to outputs that are technically successful, at least according to the code, but add zero value for your customers.

Moving Beyond the Five Whys to Five Logics

The Five Whys are a staple of Lean, but in automated workflows, the Why can often be vague. If your automated billing system fails, asking Why? might lead to something like a server timeout or lost packets. Knowing that doesn’t prevent future occurrences. You need to shift your line of thinking to use the Five Logics for any automated process, which are as follows:

  1. Input Logic: Was the data entering the automation sanitized?
  2. Trigger Logic: Did the process start under the correct conditions?
  3. Transformation Logic: Did the code or algorithm change the data in a way that introduced errors?
  4. Integration Logic: Did the handshake between interconnected systems fail during transfer?
  5. Output Logic: Did the final result meet the Definition of Done for the next step in the value stream?

By shifting from asking Why to mapping out the logic of our automated workflows, we can start pinpointing exactly which part of the sequence needs a Poka-Yoke, or error-proofing, adjustment.
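One way to make the Five Logics concrete is to treat each one as a check that runs in order against a record of a single automated run. The following is a minimal sketch under stated assumptions, not a definitive implementation: the `RunRecord` fields, the `invoice_due` trigger name, and the billing-style checks are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunRecord:
    """One execution of a hypothetical billing automation (all fields are assumptions)."""
    raw_input: dict         # data as it entered the automation
    triggered_by: str       # event that started the run
    transformed: dict       # data after the automation's own processing
    downstream_status: int  # status code returned by the receiving system
    output: dict            # what was handed to the next step


def input_logic(r: RunRecord) -> bool:
    # Was the incoming data sanitized? Here: amount must be numeric.
    return isinstance(r.raw_input.get("amount"), (int, float))


def trigger_logic(r: RunRecord) -> bool:
    # Did the process start under the correct condition?
    return r.triggered_by == "invoice_due"


def transformation_logic(r: RunRecord) -> bool:
    # Did the conversion to cents preserve the original amount?
    return r.transformed.get("amount_cents") == round(r.raw_input["amount"] * 100)


def integration_logic(r: RunRecord) -> bool:
    # Did the hand-off to the next system succeed?
    return 200 <= r.downstream_status < 300


def output_logic(r: RunRecord) -> bool:
    # Does the result meet the (assumed) Definition of Done for the next step?
    return {"invoice_id", "amount_cents"} <= r.output.keys()


FIVE_LOGICS = [
    ("Input Logic", input_logic),
    ("Trigger Logic", trigger_logic),
    ("Transformation Logic", transformation_logic),
    ("Integration Logic", integration_logic),
    ("Output Logic", output_logic),
]


def first_failed_logic(record: RunRecord) -> Optional[str]:
    """Return the name of the first logic check that fails, or None if all pass."""
    for name, check in FIVE_LOGICS:
        if not check(record):
            return name
    return None
```

Running the checks in order matters: a failure at the input stage usually makes the later checks meaningless, so the first failing logic is the best candidate for an error-proofing adjustment.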

Digital Gemba Walks

In Lean, Gemba is where your value is created. Gemba walks are often conducted on factory floors, seeing the work as it is being done by front-line employees. Naturally, this doesn’t hold up for automated workflows, and it requires a bit of an adjustment to properly conduct a digital Gemba. You’re often going to start by looking at the log files generated by your automated processes.

Any digital Gemba walk starts with a technical review of logs, following the flow of the process from start to finish. Effective root cause analysis needs leaders who understand the flow of the logic. This includes noticing latency spikes, the digital equivalent of waiting, as they occur. It also includes spotting soft failures: error codes and retries that indicate the system is self-correcting, but slowing down overall cycle time in the process.

Finally, check whether the automation is performing redundant loops or unnecessary steps that stem from legacy logic that was never pruned. Understanding these items is crucial for conducting an effective digital Gemba for any automated workflow and helps to identify where waste is being generated in your processes.
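The three things a digital Gemba looks for in the logs (latency spikes, soft failures, and redundant steps) can be sketched as a small log review script. This is only an illustration: the `key=value` log format, the field names `step`, `status`, `latency_ms`, and `retries`, and the 500 ms threshold are assumptions, and real log formats will differ.

```python
from collections import Counter


def parse_line(line: str) -> dict:
    """Parse one hypothetical log line: '<timestamp> key=value key=value ...'."""
    timestamp, *pairs = line.split()
    fields = dict(pair.split("=", 1) for pair in pairs)
    fields["timestamp"] = timestamp
    return fields


def digital_gemba(lines, latency_threshold_ms=500):
    """Summarize one run's log into the three kinds of waste described above."""
    entries = [parse_line(line) for line in lines if line.strip()]
    step_counts = Counter(entry["step"] for entry in entries)
    return {
        # Waiting: steps slower than the assumed threshold.
        "latency_spikes": [
            e["step"] for e in entries if int(e["latency_ms"]) > latency_threshold_ms
        ],
        # Soft failures: the step succeeded, but only after retrying.
        "soft_failures": [
            e["step"] for e in entries if e["status"] == "ok" and int(e["retries"]) > 0
        ],
        # Redundant work: the same step ran more than once in a single run.
        "repeated_steps": sorted(s for s, n in step_counts.items() if n > 1),
    }


SAMPLE_LOG = [
    "2024-05-01T10:00:00 step=fetch_invoice status=ok latency_ms=120 retries=0",
    "2024-05-01T10:00:01 step=lookup_customer status=ok latency_ms=900 retries=2",
    "2024-05-01T10:00:03 step=format_pdf status=ok latency_ms=80 retries=0",
    "2024-05-01T10:00:04 step=format_pdf status=ok latency_ms=85 retries=0",
]

findings = digital_gemba(SAMPLE_LOG)
```

In this made-up sample, every step reports `status=ok`, yet the review would still surface a slow, retrying customer lookup and a PDF step that ran twice.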

Visualizing the Invisible Value Stream

Automated workflows are generally invisible, at least in the sense that when waste is produced it is next to impossible to see immediately. To perform a successful root cause analysis, you need to shine some light on the workflow through Dynamic Value Stream Mapping.

Unlike a static Value Stream Map, a dynamic map makes use of real-time data to show the flow of work through various automated stages. When a defect occurs, the map highlights exactly where the bottleneck is. This allows for more precision when conducting a root cause analysis, as teams focus only on specific areas that are underperforming, rather than conducting audits of an entire end-to-end system.
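The data behind such a map can be as simple as a list of timestamps for when each work item entered and left each stage. The sketch below computes the average time per stage and names the slowest one. The event tuple layout, the stage names, and the use of plain seconds are assumptions for illustration; a live map would recompute these figures as new events arrive rather than over a fixed list.

```python
from collections import defaultdict
from statistics import mean


def stage_cycle_times(events):
    """Average time spent in each stage.

    Each event is a tuple: (item_id, stage, entered_at_s, exited_at_s).
    """
    durations = defaultdict(list)
    for _item_id, stage, entered_at, exited_at in events:
        durations[stage].append(exited_at - entered_at)
    return {stage: mean(times) for stage, times in durations.items()}


def bottleneck(events):
    """Return the stage with the highest average cycle time."""
    averages = stage_cycle_times(events)
    return max(averages, key=averages.get)


# Hypothetical events for two invoices moving through three automated stages.
EVENTS = [
    ("inv-1", "intake", 0, 2),
    ("inv-1", "enrichment", 2, 30),
    ("inv-1", "delivery", 30, 33),
    ("inv-2", "intake", 1, 3),
    ("inv-2", "enrichment", 3, 41),
    ("inv-2", "delivery", 41, 43),
]

slowest_stage = bottleneck(EVENTS)
```

With these sample numbers the enrichment stage dominates the cycle time, so that is where the analysis would focus first.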

Visualizing the flow also helps identify areas suffering from over-automation. Sometimes the root cause of a slow process is simply due to something being automated that should likely have been eliminated entirely. In Lean, there is no greater waste than automating a process that adds zero value.

Error-Proofing the Algorithm

Poka-Yoke in physical processes might be something like a guard rail or jig that prevents a part from being placed in the wrong way. For automated workflows, Poka-Yoke takes the form of validation gates.

Root cause analysis often shows that systems fail because they assumed the data they received was correct. Error-proofing your automated workflows means building automated checks at every hand-off point. If the input doesn’t match the required format, the process stops. If a response takes longer than a specified limit, often measured in milliseconds, the system triggers an alert rather than continuing with potentially stale data.
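The two gates described above, a format check on the input and a time limit on a response, might look something like the following. This is a sketch under assumptions: the `INV-` identifier format, the `amount_cents` field, and the `GateError` exception are hypothetical. Note also that `timed_gate` measures the call after it returns; it rejects a slow response but does not cancel a call that is still running.

```python
import re
import time


class GateError(Exception):
    """Raised when a validation gate stops the workflow."""


# Hypothetical required format for an invoice identifier.
INVOICE_ID = re.compile(r"INV-\d{6}")


def input_gate(record: dict) -> dict:
    """Stop the process if the input does not match the required format."""
    invoice_id = record.get("invoice_id")
    if not INVOICE_ID.fullmatch(str(invoice_id)):
        raise GateError(f"invoice_id has unexpected format: {invoice_id!r}")
    amount = record.get("amount_cents")
    if not isinstance(amount, int) or amount < 0:
        raise GateError("amount_cents must be a non-negative integer")
    return record


def timed_gate(fetch, max_ms: float):
    """Call fetch() and reject the result if it took longer than max_ms."""
    start = time.monotonic()
    result = fetch()
    elapsed_ms = (time.monotonic() - start) * 1000
    if elapsed_ms > max_ms:
        # Treat a slow answer as potentially stale instead of passing it on.
        raise GateError(f"response took {elapsed_ms:.0f} ms, limit is {max_ms} ms")
    return result
```

A caller would wrap each hand-off in these gates and route any `GateError` to an alert, so a bad record halts at the gate instead of multiplying downstream.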

By implementing digital guardrails, you make sure that the root cause of a problem is caught when it occurs. This leads to less automated waste and keeps defects from flowing downstream as your process churns along.

The Human-in-the-Loop Audit

Root cause analysis for automated workflows is inevitably going to lead back to a person. Automation only does what it’s told, and if the automation produces waste, the root cause is often the result of a misunderstanding of the design of the workflow.

This means you need to investigate feedback loops between the people using the workflow’s output and the people who built the automation. You might find that the requirements for the output have changed, but the automation was never updated to reflect them. Alternatively, you might find the problem is a lack of clear definition when it comes to standardization for the outputs of the automation and how the next step in your workflow interacts with it.

Often, what looks like a system error is really a human workaround that broke the operational logic. Governance for this requires defining the standard for each output just as rigorously and clearly as the automation itself.
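One lightweight way to define an output standard rigorously is to write down what the next step expects and compare the automation's actual output against it. The sketch below assumes a hypothetical contract expressed as field names mapped to Python types; the field names and the `contract_gaps` helper are illustrative only.

```python
def contract_gaps(output: dict, required: dict) -> list:
    """List the ways an output record falls short of the consumer's contract.

    `required` maps each expected field name to the Python type it must have.
    """
    gaps = []
    for field, expected_type in required.items():
        if field not in output:
            gaps.append(f"missing field: {field}")
        elif not isinstance(output[field], expected_type):
            gaps.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(output[field]).__name__}"
            )
    return gaps


# Hypothetical contract agreed with the team that consumes the output.
CONSUMER_CONTRACT = {"invoice_id": str, "amount_cents": int, "currency": str}

# Output from an automation built before "currency" became a requirement.
legacy_output = {"invoice_id": "INV-000123", "amount_cents": "1999"}

gaps = contract_gaps(legacy_output, CONSUMER_CONTRACT)
```

A non-empty result points to a requirement that changed without the automation being updated, which is a conversation between the builders and the consumers rather than a code defect alone.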

Utilizing AI for Predictive RCA

At the time of this writing, automation isn’t governed solely by human hands: artificial intelligence is also being used for predictive root cause analysis. For the best results, we often feed historical process data into bespoke machine learning models to identify patterns that signal a potential failure.

If your data shows that a slight increase in latency produces a data defect, your AI model can alert the team before the defect occurs. This shifts the culture from fixing defects to preventing them entirely, which is ultimately the goal for any Lean organization.
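As a stand-in for a trained model, the idea of alerting on a latency pattern before a defect occurs can be illustrated with a simple statistical baseline. The sketch below is not machine learning; it flags a reading that sits more than `k` standard deviations above recent history. The function name, the default `k` of 3, and the sample numbers are all assumptions.

```python
from statistics import mean, stdev


def latency_alert(history_ms, latest_ms, k=3.0):
    """Return True if the latest latency is unusually high versus recent history.

    Uses mean + k * standard deviation as the threshold. This is a simple
    baseline, not a trained model.
    """
    if len(history_ms) < 2:
        # Not enough data to estimate spread; stay quiet rather than guess.
        return False
    threshold = mean(history_ms) + k * stdev(history_ms)
    return latest_ms > threshold


# Hypothetical recent latencies in milliseconds.
recent = [100, 102, 98, 101, 99]

normal_reading = latency_alert(recent, 103)
elevated_reading = latency_alert(recent, 110)
```

A real predictive setup would learn which latency patterns actually precede data defects; a fixed threshold like this one is only a starting point for deciding when to alert the team.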

Conclusion

Root cause analysis in the age of automated workflows requires a blend of traditional Lean principles and a fair degree of tech literacy. We aren’t abandoning the core principles of Lean, as putting people first, eliminating waste, and continuously improving processes still take place, but the environment has changed.

Organizations that are able to master root cause analysis in the digital age gain a significant competitive advantage. They can scale processes without scaling their waste. System errors are no longer a hitch in a process, but an opportunity to conduct a Kaizen to refine the logic and move closer to a state of perfection. That said, it does take a degree of investment from leadership, as being able to read logs, identify points of failure, and dynamically map workflows requires some technical savvy.
