Six Sigma for Network Operations: Reducing Vulnerabilities and Enhancing Uptime

Key Points

Identify critical-to-quality metrics early in your Six Sigma implementation.
Keep a clear scope and endpoint in mind for any projects.
Audit regularly to maintain fixes and guarantee business continuity for the future.

Applying Six Sigma to network operations might seem unusual at first glance, but its core tenets closely resemble established best practices in networking. That said, you can take things further.

While Six Sigma is often an organizational shift from the top down, it rarely touches things like IT infrastructure. Rather than sticking with completely separate approaches, however, it is certainly worth taking a closer look at some of the ways Six Sigma can completely transform your network operations. We’ll go over some of the adaptations you need to make, along with where you can integrate teachings to maximize uptime, keep your users happy, and maintain business continuity in worst-case scenarios.

Understanding CTQ Metrics for Networking

Before establishing any sort of Six Sigma project for your network operations, it is important to gather the right sort of metrics ahead of time. Key metrics to look at for future analysis and improvements can include:

Network uptime
Latency
Mean Time to Repair
Mean Time Between Failures
Incident resolution time
Network provisioning time
Security incident response time

These are core metrics for any network administrator, and a key area to look at as you start to integrate Six Sigma teachings and tools into your IT operations. At the start, you’ll simply be gathering these numbers up. However, establishing these metrics early on gives you areas of focus for later improvement and process overhauls.
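As a rough illustration of how a few of these metrics might be derived from raw incident records, here is a minimal Python sketch. The incident timestamps, the 30-day observation window, and the `summarize` helper are all hypothetical, and MTBF is taken here as total uptime divided by the number of failures, which is one common convention rather than the only one.

```python
from datetime import datetime, timedelta

# Hypothetical outage records: (start, end) of each incident in the window.
incidents = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 10, 0)),
    (datetime(2024, 1, 12, 14, 0), datetime(2024, 1, 12, 14, 30)),
    (datetime(2024, 1, 25, 2, 0), datetime(2024, 1, 25, 3, 30)),
]
window_start = datetime(2024, 1, 1)
window_end = datetime(2024, 1, 31)


def summarize(incidents, window_start, window_end):
    """Return availability (%), MTTR (hours), and MTBF (hours) for the window."""
    total = window_end - window_start
    downtime = sum((end - start for start, end in incidents), timedelta())
    uptime = total - downtime
    count = len(incidents)
    one_hour = timedelta(hours=1)
    return {
        "availability_pct": 100 * (uptime / total),
        # With no incidents, repair and failure intervals are undefined.
        "mttr_hours": (downtime / count) / one_hour if count else None,
        "mtbf_hours": (uptime / count) / one_hour if count else None,
    }


metrics = summarize(incidents, window_start, window_end)
for name, value in metrics.items():
    print(name, value)
```

In practice these records would come from your monitoring or ticketing system rather than a hard-coded list.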

Collect Data on Current Network Performance

Before you start any sort of improvements, you need to establish a baseline. With the metrics gathered in the previous step, you should have a solid birds-eye view of what is happening on your network, along with how quickly your staff respond to incidents as they arise. Don’t fret about the meaning behind the numbers now. Simply do your due diligence and gather up data points where you can.

Ideally, you’ll return to establish a new baseline after each year of improvements. Early on, though, you want a solid data set to draw on as you go into the next few steps.
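A baseline can be as simple as a handful of summary statistics per metric. The sketch below assumes a small, made-up list of latency samples in milliseconds and uses a nearest-rank 95th percentile; the sample values and the `baseline_summary` name are illustrative only.

```python
import math
from statistics import mean, median

# Hypothetical latency samples in milliseconds, e.g. from periodic pings.
latency_ms = [21, 23, 22, 25, 24, 80, 22, 23, 21, 24]


def baseline_summary(samples):
    """Summarize a metric so later measurements have something to compare against."""
    ordered = sorted(samples)
    # Nearest-rank percentile: the smallest value with at least 95% of samples at or below it.
    p95_index = math.ceil(0.95 * len(ordered)) - 1
    return {
        "mean_ms": mean(ordered),
        "median_ms": median(ordered),
        "p95_ms": ordered[p95_index],
        "max_ms": ordered[-1],
    }


baseline = baseline_summary(latency_ms)
print(baseline)
```

Note how the single 80 ms spike pulls the mean well above the median. Keeping both in the baseline makes that kind of outlier visible later on.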

Quantify Issues

Issues are going to be present in any network operation. That’s just part of life. While we set off with our best foot forward, we can often fall victim to antiquated practices, inefficiencies, and waste. I’ve been in plenty of network closets where the cable management is a nightmare and the time to respond to incidents is less than ideal. These serve as great starting points for any improvement process, however.

Quantify these issues as best you can. Incident resolution time is a hard number, as are the other metrics established in previous steps, like latency, mean time between failures, and so forth.

From here, you want to identify the most pressing problems at the root of your woes. If you’re experiencing frequent outages for hours at a time, that is a clear problem that needs focus. Other areas of focus might be simple things like dropped connections, failed DNS resolution, and the usual pain points that come with running a computer network.
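One common way to single out the most pressing problems is a Pareto tally of incidents by category. The sketch below is a minimal version, assuming a hypothetical list of ticket categories exported from a ticketing system; the category labels and the `pareto` helper are made up for illustration.

```python
from collections import Counter

# Hypothetical incident categories, one entry per ticket.
tickets = [
    "dns", "outage", "dropped", "dns", "outage", "outage",
    "dns", "outage", "dropped", "outage", "config",
]


def pareto(categories):
    """Return (category, count, cumulative percent) rows, most frequent first."""
    counts = Counter(categories)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for category, count in counts.most_common():
        cumulative += count
        rows.append((category, count, round(100 * cumulative / total, 1)))
    return rows


pareto_rows = pareto(tickets)
for category, count, cumulative_pct in pareto_rows:
    print(f"{category:10} {count:3} {cumulative_pct:6}%")
```

With this sample data, outages and DNS failures together account for roughly 73% of tickets, which would make them the natural candidates for a tightly scoped pilot project.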

After establishing your core focus, you’ll want to start homing in on the scope of the project. Taking on too much in a pilot project is going to lead to failure. A clearly defined scope with a specified end goal is going to yield far better results in the long run.

Identify the Root Cause of Network Woes

There are plenty of areas where things can go wrong, but identifying the underlying issues causing those problems is a trickier prospect. Now, you could root around in the dark, run traces, and stare at the logs for hours on end. I’ve been there, as I’m sure many of you have. However, that isn’t going to serve your project.

Instead, you’ll want to engage in a bit of root-cause analysis. Thankfully, there are some extremely helpful tools for getting to the bottom of a problem with Six Sigma.

Visualize The Problem

Returning to those tools, there is more than one way to get to the bottom of any network problem in Six Sigma. Process mapping can give a comprehensive and holistic view of the systems you have in place, giving a far greater view than simple network topology maps and the like.

You’ll want to look at how you’re provisioning bandwidth, as that’s a single process, along with other vital processes in network operations like incident resolution. When you’ve mapped out the whole of your systems and their respective processes, this is where you start to apply a bit of data-driven thinking.

Bottlenecks are going to be present in any system. You might also have redundancy, unnecessary handoffs, and other non-value-added activities that are hindering the overall performance of your network operations.
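Once a process is mapped, attaching an average duration to each step makes the bottlenecks and non-value-added time easy to spot. The following sketch assumes a hypothetical incident resolution process; the step names, durations, and value-added flags are invented for illustration.

```python
# Hypothetical process map for incident resolution:
# (step name, average minutes, whether the step adds value)
steps = [
    ("Ticket logged", 5, True),
    ("Waiting in queue", 90, False),
    ("Triage", 15, True),
    ("Handoff to second team", 45, False),
    ("Diagnosis and fix", 60, True),
    ("Verification", 10, True),
]


def analyze(steps):
    """Find the slowest step and the share of time spent on non-value-added work."""
    total = sum(minutes for _, minutes, _ in steps)
    waste = sum(minutes for _, minutes, adds_value in steps if not adds_value)
    slowest = max(steps, key=lambda step: step[1])
    return {
        "total_minutes": total,
        "non_value_added_pct": round(100 * waste / total, 1),
        "bottleneck": slowest[0],
    }


analysis = analyze(steps)
print(analysis)
```

In this made-up example, queueing and handoffs consume more time than the actual fix, which is exactly the kind of finding a process map is meant to surface.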

FMEA

While not wholly necessary just yet, you might want to start looking at approaches like FMEA for handling issues. When you’re after proactive problem solving before the rubber hits the road, Failure Mode and Effects Analysis can be a lifesaver.

This approach reduces rework, freeing up personnel to handle the work needed to keep things rolling. It also provides a wealth of documentation of the risk and defect reduction remediations taken. It is a time-consuming process, so it isn’t an instant fix. However, if you’re looking to integrate Six Sigma into your network operations from the start, this can be a wonderful means of doing so.

If you’re designing network processes, take the time to address risk and defects ahead of time. It can be a boon for any network operations. Do be warned that it takes considerable time and practice to develop an effective game plan with FMEA at the heart of it, however.
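FMEA typically ranks failure modes by a risk priority number (RPN), the product of severity, occurrence, and detection scores, each usually rated from 1 to 10. Here is a minimal sketch of that ranking; the failure modes and their scores are hypothetical and would normally come out of a team workshop.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible impact) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (almost always caught) to 10 (rarely caught)

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes and scores for a network change process.
modes = [
    FailureMode("Core switch power supply failure", 9, 3, 4),
    FailureMode("Expired certificate on VPN gateway", 7, 4, 6),
    FailureMode("Misapplied firewall rule during change", 8, 5, 5),
]

# Highest RPN first: these are the failure modes to address before rollout.
ranked = sorted(modes, key=lambda mode: mode.rpn, reverse=True)
for mode in ranked:
    print(f"{mode.rpn:4}  {mode.name}")
```

The scores are subjective, so the ranking is best treated as a way to prioritize discussion rather than a precise measurement of risk.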

Developing Fixes

Now, you’ve gathered up the data, you’ve rooted out the cause of your issues, so what comes next? This is where you start to implement fixes and remediations for whatever shortcomings your network operations are facing. There are a few different methods we’re going to cover, some more drastic than others.

Some of these are optional methods of approaching an overhaul, and can be done as needed, given budget and time constraints.

Process Improvement and Redesign

The most fundamental changes you can make are going to be a complete redesign and improvement of your current network processes. An example would be a ticketing system for support requests from personnel. Multiple hand-offs with no resolution are only going to breed resentment towards the IT department.

The obvious answer here is to streamline things. A support ticket should be entered into the system and receive proper escalation to a technician who knows how to remediate the problem. This can take up a fair amount of time, but you’re saving a bundle in both resources and manpower when properly implemented.

This is just one example of a process improvement. Other ways you could approach it would be scheduled maintenance in off-hours to minimize downtime, regular preventive measures to preserve uptime, and streamlining network change management to provision the right credentials and access for new machines on the network.

Technology Upgrades

While drastic, upgrades can take much of the burden off your technicians. Software and hardware alike are areas of focus, with robust switches, routers, and workstations needed to maintain business continuity. We can’t necessarily help working with legacy systems in the workplace, but that doesn’t mean your network closet has to stay in the dark ages.

If there is room in the budget, this is a fantastic means of addressing performance issues. Older tech doesn’t perform to modern standards, and if you’re running a router that’s over a decade old, that’s an area of concern. Likewise, this also applies to the software and operating systems in use by workstations around the business campus.

While older systems mean you don’t have to address skill gaps, they also present inherent security risks alongside reduced performance. If necessary, you might want to build a strong business case for a complete overhaul of the IT department.

Standardization

In my time in tech, there was no shortage of hacky workarounds and solutions to address textbook issues. Standardization of configurations, procedures, and processes means you have a uniform means of working. Any changes that need to be made to the network will have proven best practices in place. Standardization can be a nightmare to implement, especially in a mixed-hardware office environment.

That said, it is worth its weight in gold if you’re looking to bypass the headaches that come with maintaining business continuity and network operations. Non-standard practices are a point of failure, and reducing the inherent risk is only going to improve your network processes as time goes on.

Training

Tech professionals should be used to ongoing training. After all, you’re sitting for certifications regularly to make sure you’re keeping current with network standards, practices, and technology. Taking the time now to get everyone up to speed on future network processes makes sure your team is on the same page.

Sustaining Improvements

With fixes in place, it’s time to steady the ship. There are a few ways of maintaining and controlling the improvements made. Ideally, you’ll be implementing all of these into your regular network operations.

Monitoring

Regular monitoring is just part of the job when running any sort of computer network. While taking a look at the logs and traffic, you’ll also want to keep an eye on the critical-to-quality metrics established in the first section. For resolution and incident response times, these numbers should be trending downwards if the right fixes have been implemented.
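A quick way to check whether a metric is actually trending downwards is to fit a least-squares line through recent measurements and look at the sign of the slope. The sketch below assumes hypothetical weekly average resolution times in minutes; `trend_slope` is an illustrative helper, not part of any monitoring tool.

```python
# Hypothetical weekly average incident resolution times, in minutes.
weekly_resolution_minutes = [62, 58, 55, 57, 50, 47]


def trend_slope(values):
    """Least-squares slope of values against their index (change per period)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements to estimate a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


slope = trend_slope(weekly_resolution_minutes)
direction = "improving" if slope < 0 else "not improving"
print(f"{slope:.2f} minutes per week ({direction})")
```

A negative slope only suggests improvement. With so few data points, it is worth confirming the trend over a longer period before declaring a fix successful.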

Control Charts

The use of statistical process control, or SPC, is a key function in maintaining the course in any Six Sigma project. Establishing control charts with clearly defined upper and lower limits will give a fairly decent indication of how your network is performing and whether the fixes have stuck.
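For a metric recorded one value at a time, an individuals (I) chart is a common choice. Its limits are conventionally set at the mean plus or minus 2.66 times the average moving range. The sketch below applies that convention to hypothetical daily resolution times; the data and function names are assumptions for illustration.

```python
from statistics import mean

# Hypothetical daily mean incident resolution times in minutes (baseline period).
baseline_minutes = [42, 38, 45, 40, 44, 39, 41, 43, 37, 41]


def control_limits(samples):
    """Return (lower limit, center line, upper limit) for an individuals chart."""
    center = mean(samples)
    moving_ranges = [abs(b - a) for a, b in zip(samples, samples[1:])]
    average_moving_range = mean(moving_ranges)
    # 2.66 is 3 / d2, where d2 = 1.128 for moving ranges of two points.
    spread = 2.66 * average_moving_range
    # Resolution time cannot be negative, so the lower limit is floored at zero.
    return max(center - spread, 0.0), center, center + spread


def out_of_control(samples, lower, upper):
    """Return (index, value) pairs that fall outside the control limits."""
    return [(i, x) for i, x in enumerate(samples) if x < lower or x > upper]


lower, center, upper = control_limits(baseline_minutes)
new_minutes = [40, 44, 58, 39]  # hypothetical measurements after the fix
flagged = out_of_control(new_minutes, lower, upper)
print(f"LCL={lower:.1f} center={center:.1f} UCL={upper:.1f} flagged={flagged}")
```

Points outside the limits are a prompt to investigate a special cause, not proof that the fix has failed.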

Documentation

Documentation goes without saying, and ideally, you’ve been documenting all changes made thus far. You’ll also want to drum up new documentation for freshly implemented processes, standards, and other practices. When onboarding new employees, this will be a vital resource.

Audits

At the bare minimum, you’ll want to conduct a network audit once a year. Ideally, you should be doing so twice a year. Given the scope and span of any audit, however, you can get away with an annual check for the time being. An audit helps verify that fixes have stuck and can reveal whether processes are drifting out of control.

Other Useful Tools and Concepts

In the mood for a little more learning? You might want to take a closer look at the hidden costs of neglecting process improvement. We’ve spent most of today talking about the importance of it, but you’re seriously neglecting your organization’s health if you aren’t regularly working toward continuous improvement.

Additionally, you might want to consider learning about fast-track implementation of Lean Six Sigma during a merger or acquisition. Any newly combined entity is on shaky ground as you consider the disparate cultures and practices at play, but you can pool efforts to create a far more effective workforce.

Conclusion

Six Sigma is an ideal fit for network operations. When you consider that we’re constantly striving toward maximum uptime, rapid problem resolution, and so forth, it just makes sense to start implementing the tools and principles behind the methodology. With any luck, you’ll see your network operations reach new heights thanks to the steps outlined today.
