Six Sigma Software Metrics, Part 1

Six Sigma brings sharp focus to customer and business requirements and the defects connected with the failure to satisfy them. While the relevance of that view is clear enough to software professionals, their introduction to Six Sigma often gets stopped short by questions about how the notions of yield, sigma level, or defects per million opportunities (DPMO) fit their world. As those are the toughest concepts to map to software, this article suggests it is better to end the Six Sigma metrics discussion there, once a foundation in other defect metrics has provided suitable measurement systems and perspective.

This is the first in a series of three articles that reviews the range of software measures that support Six Sigma software goals and results. Table 1 illustrates how those goals “start simple” with the fundamental need to reduce released defects and progress to more sophisticated goals, each of which builds on prior work and understanding. Defect density metrics like DPMO appear at the end of the table as a culminating goal for software metrics. Rather than starting by wrestling with questions like “What’s an opportunity for a defect?”, many software businesses would be better off working through those goals in sequence, recovering significant dollars with DMAIC (Define, Measure, Analyze, Improve, Control) projects as they go. By the time the last few goals are tackled, the measurement system and the understanding of defects frame the DPMO question in a way that is much easier to grasp.

“Maturity” In The Use Of Defect Metrics

When Six Sigma originated at Motorola, the sharp focus on defects created an important shift from the common manufacturing yield metric¹, which tracks defective units, to the more detailed view of counting defects within units. For software, this part of Six Sigma is easy (no shift at all), as defects within units (bugs) have always been a natural measure of quality. While all software organizations find and fix bugs, there is huge variation in the quality of data gathering, measurement (converting the raw data into measures), and use of that data.

Table 1 shows that the progression in defect metrics refinement and richness of data use can be seen as a series of goals, each of which calls for some disciplined processes and enables particular uses of the data. We will see that there is a lot to do before the question about DPMO even begins to make sense. Further, we’ll recognize that a software organization can be fully successful with Six Sigma without even implementing DPMO metrics.

Goal 1: Reduce Released Defects

This fundamental Six Sigma goal is one that all software work deals with in one way or another. Tracking defect containment requires, at a minimum, a test process and a defect tally system. When combined with post-release defect tallies, this basic data supports the primary defect containment metric: total containment effectiveness (TCE). The computation is simple.

Total Containment Effectiveness (TCE) Formula

Where:
Errors = Potential defects that are found during the phase that created them
Defects = Errors that escape to a subsequent development or delivery phase

A higher TCE is better. Although TCE data is easy to collect and the metric is easy to compute, many software organizations cannot document their performance on this basic metric. Benchmarking shows a range of TCE in the industry from about 65 percent to 98 percent, with many organizations somewhere in the 75 percent to 85 percent range. In some software businesses where there are particularly strong penalties for escaped defects (high-end data storage systems, for example), the TCE can approach and reach 100 percent.
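The formula graphic is not reproduced in this text, so the following is only a minimal sketch of the TCE computation. It assumes a common formulation consistent with the definitions above: everything found before release (errors plus pre-release defects) divided by everything found before and after release. The function name and the sample tallies are hypothetical.

```python
def total_containment_effectiveness(errors, pre_release_defects, post_release_defects):
    """Return TCE as a percentage.

    Assumed formulation (not taken from the article's formula graphic):
    (errors + pre-release defects) / (errors + pre-release + post-release defects).
    """
    found_before_release = errors + pre_release_defects
    total_found = found_before_release + post_release_defects
    if total_found == 0:
        raise ValueError("no errors or defects recorded")
    return 100.0 * found_before_release / total_found


# Hypothetical tallies for a single release.
tce = total_containment_effectiveness(
    errors=120, pre_release_defects=50, post_release_defects=30
)
print(f"TCE = {tce:.1f}%")  # prints: TCE = 85.0%
```

With these made-up tallies, 170 of 200 recorded items were caught before release, which lands at the upper end of the 75 to 85 percent range cited above.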

Figure 1: Total Containment Effectiveness for Software Projects of Different Size and “Schedule Compression”
From Patterns of Software System Failure and Success by Capers Jones, pp. 31-32.

If the same defects that are counted to compute TCE are classified and grouped according to type and, even better, with respect to where they were inserted into the process, some rudimentary defect causal analysis can be done.

Figure 2: Origins of Defects Found During Test

In Figure 2 we see that, for defects found in test, most were inserted during coding. Building a measurement system for classifying these defects by type would be the next step in uncovering the key defect root causes. That, of course, would help us to reduce or remove root causes. In concert with root cause (defect insertion) reduction we may want to step up efforts to reduce the escape of defects from upstream phases to test. That highlights the next goal.
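A breakdown like the one in Figure 2 can be produced from a simple defect log once each defect carries the phase in which it was inserted. The sketch below assumes a hypothetical record layout (defect ID, origin phase) and made-up data; it is not the data behind Figure 2.

```python
from collections import Counter

# Hypothetical records for defects found during test: (defect ID, phase where inserted).
defects_found_in_test = [
    ("D-101", "coding"), ("D-102", "design"), ("D-103", "coding"),
    ("D-104", "requirements"), ("D-105", "coding"), ("D-106", "design"),
    ("D-107", "coding"), ("D-108", "coding"),
]


def origin_breakdown(records):
    """Return (phase, count, percent) rows ordered from most to fewest defects."""
    counts = Counter(phase for _, phase in records)
    total = sum(counts.values())
    return [(phase, n, 100.0 * n / total) for phase, n in counts.most_common()]


for phase, n, pct in origin_breakdown(defects_found_in_test):
    print(f"{phase:<14}{n:>3}{pct:>7.1f}%")
```

Sorting by count gives the Pareto ordering directly, so the largest source of escaped defects appears first.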

Goal 2: Find and Fix Defects Closer to Their Origin

This requires that, in concert with the creation of important work products (not just code, of course, but documents like requirements, specifications, user documentation), we integrate activities that are effective in finding the errors that could become downstream defects. Formal work-product inspections, based on Fagan’s seminal work at IBM³, have been shown to be the most efficient and effective way to find potential defects and develop useful data about them.

With inspections and perhaps supplementary defect-finding activity in place, we could track defect insertion rates at each key development or delivery stage. Figure 3 shows where defects attributed to Design were found. Note that 35 percent were found “in phase” (during Design), with lesser proportions found by test, during coding, and by the customer (post-release).

Figure 3: Pareto Chart For Design Defects

Data like this enables the computation of two related containment metrics: phase containment effectiveness (PCE) and defect containment effectiveness (DCE).

Phase Containment Effectiveness (PCE)

Defect Containment Effectiveness (DCE)

PCE tracks the ability of each phase to find defects before they escape that phase. DCE tracks the ability of each phase to find defects passed to it by upstream phases. Each of these measures provides insight into the strengths and weaknesses of the error and defect detection processes in each phase. Tracking these numbers together with defect type classifications can reveal patterns that shed light on the causes of the types of defects that are found. As a system like this matures, with data over time and across multiple projects, an understanding of the defect “time dynamics” can begin to come into view, setting the stage for the next goal. (The use of a defect containment scorecard to track TCE, DCE, PCE and defect repair costs is described in this related article: Is Software Inspection Value Added?)
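The PCE and DCE formula graphics are not reproduced in this text, so the sketch below works only from the verbal descriptions above and should be read as one plausible interpretation. It assumes PCE is the share of items originating in a phase that were caught in that same phase, and DCE is the share of upstream defects reaching a phase that were caught there. The phase list, the origin-by-found matrix, and all counts are hypothetical, apart from the 35 percent in-phase figure for Design noted with Figure 3.

```python
PHASES = ["requirements", "design", "coding", "test", "post-release"]

# Hypothetical counts: found[origin_phase][phase_where_found].
found = {
    "requirements": {"requirements": 20, "design": 10, "coding": 5, "test": 10, "post-release": 5},
    "design": {"design": 35, "coding": 20, "test": 30, "post-release": 15},
    "coding": {"coding": 60, "test": 50, "post-release": 10},
}


def pce(found, phase):
    """Percent of items originating in `phase` that were found in that same phase."""
    row = found[phase]
    return 100.0 * row.get(phase, 0) / sum(row.values())


def dce(found, phases, phase):
    """Percent of upstream defects reaching `phase` that were found there.

    Returns None when no upstream defects reached the phase.
    """
    idx = phases.index(phase)
    upstream = phases[:idx]
    caught = sum(found.get(o, {}).get(phase, 0) for o in upstream)
    reached = sum(found.get(o, {}).get(p, 0) for o in upstream for p in phases[idx:])
    return 100.0 * caught / reached if reached else None


for phase in found:
    print(f"PCE({phase}) = {pce(found, phase):.1f}%")
print(f"DCE(test) = {dce(found, PHASES, 'test'):.1f}%")
```

Keeping the counts in an origin-by-found matrix means the same data also yields the per-origin Pareto view shown in Figure 3.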

Table 1: Software Organization Goals Versus Processes And Metrics

| Six Sigma Goal | Required Processes | Enabled Processes | Metrics |
| --- | --- | --- | --- |
| 1. Reduce Released Defects | Unit/integration/system test | Pre-release vs. post-release defect tallies | Total containment effectiveness (TCE) |
| Focus defect fix/removal work | Operational definitions for classifying defects; problem-solving knowledge base | Basic causal analysis | Defect stratification by type; Poisson modeling for defect clustering |
| | (Optional) Define the “unit” for work-products delivered | Yield assessments for work-product units delivered; yield predictions for work-product units planned | Total defects per unit (TDU); rolled throughput yield (RTY) |
| 2. Find and Fix Defects Closer to Their Origin | Upstream defect detection process (inspections) | Defect insertion rates and defect find rates for phases or other segments of work breakdown structure (WBS) | Phase containment effectiveness (PCE) and defect containment effectiveness (DCE) |
| Gather data necessary for process monitoring and improvement | Defect sourcing process | Improved causal analysis | Defects per unit (DPU) for phases or other segments of WBS, contributions to TDU |
| 3. Predict Defect Find and Fix Rates During Development and After Release | Rayleigh or other time-dynamic modeling of defect insertion and repair; defect estimating model, calibrated to the history and current state in our process | Given data on defects found during upstream development or WBS stages, predict defect find rates downstream; predicted total defects for an upcoming project of specified size and complexity | Best-fit Rayleigh model; predictive Rayleigh model |
| 4. Compare Implementations Within the Company | Company’s choice of appropriate normalizing factors (LOC, FP², etc.) to convert defect counts into meaningful defect densities | Defect density comparing groups, sites, code-bases, etc. within the company | Defects per function point; defects per KLOC |
| 5. Benchmark Implementations Across Companies | Define opportunities for counting defects in a way that is consistent within the company and any companies being benchmarked | Defect density comparing performance with other companies (and, if applicable, other industries) | DPMO; sigma level; Cpk; z-score |

Looking Ahead

In part 2 we’ll look at Goal 3: Predict defect find and fix rates during development and after release. Building on the foundation provided by containment measures, we will explore defect arrival rate prediction models and tests for their adequacy. In part 3 we’ll cover Goals 4 and 5, related to defect density measures, including the elusive defects per million opportunities (DPMO).


Footnotes and References

1 Yield = good units / units started. First time yield (FTY) computes this quantity before any rework.
2 FP = function points, which measure software work-product size based on functionality and complexity. While this system is independent of implementation language, there are conversion factors that map function point counts to “lines of code” in cases where code is the work-product being sized. As an order-of-magnitude rule of thumb, 1 FP converts to about 100 lines of C code.
3 M. E. Fagan, “Design and Code Inspections to Reduce Errors in Program Development,” IBM Systems Journal, vol. 15, no. 3, 1976.