A variety of analyses can be done during the Analyze phase of a Six Sigma DMAIC (Define, Measure, Analyze, Improve, Control) software project with data from Fagan-style inspections, and these analyses suggest possible implications for Improve activities. The analyses used here are based on a real situation; the conclusions drawn are valid in that situation but are not necessarily applicable to other organizations. Because the analyses use various subsets of the data, they are not directly comparable to one another.

Measure Phase Data

Table 1 provides a portion of the inspections data collected during the Measure phase. This data was used in the analyses that follow.

Table 1: Inspections Data

| Work Product Type | Appraised Size | Number of Participants | Code Was Tested | Rework Hours | Total Appraisal Hours | Major Defects | Major Defects Per Hour | Major Defects Per Size | Language | Defect Severity | Defect Type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4GL | 73 | 4 | True | 2.0 | 11.5 | 3 | 0.261 | 41.096 | 4GL | Minor | Documentation |
| 4GL | 67 | 4 | True | 0.65 | 11.2 | 1 | 0.089 | 14.925 | 4GL | Major | Checking |
| 4GL | 116 | 4 | True | 2.0 | 13.5 | 7 | 0.519 | 60.345 | 4GL | Minor | Data |
| 4GL | 122 | 4 | True | 6.23 | 7.3 | 4 | 0.548 | 32.787 | 4GL | Major | Interfaces |
| 4GL | 172 | 4 | True | 0.18 | 9.62 | 1 | 0.104 | 5.814 | 4GL | Minor | Interfaces |
| 4GL | 225 | 5 | True | 0.96 | 13.2 | 2 | 0.152 | 8.889 | 4GL | Minor | Documentation |
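
The derived columns in Table 1 follow directly from the raw counts: Major Defects Per Hour is major defects divided by total appraisal hours, and the Major Defects Per Size values are consistent with major defects per 1,000 units of appraised size. Below is a minimal pandas sketch of these calculations, assuming the records have been exported to a hypothetical file named inspections.csv with the column names shown in Table 1.

```python
import pandas as pd

# Measure-phase inspection records exported from Table 1.
# "inspections.csv" is a hypothetical file name; column names match Table 1.
df = pd.read_csv("inspections.csv")

# Major Defects Per Hour = major defects found / total appraisal hours.
df["Major Defects Per Hour"] = df["Major Defects"] / df["Total Appraisal Hours"]

# Major Defects Per Size: the Table 1 values are consistent with
# major defects per 1,000 units of appraised size.
df["Major Defects Per Size"] = 1000 * df["Major Defects"] / df["Appraised Size"]

print(df[["Work Product Type", "Major Defects Per Hour",
          "Major Defects Per Size"]].round(3))
```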

Defect-Type Distributions

The objective of this analysis is to determine which, if any, defect types are represented in significantly different proportions by language and, within language, by project type. Minitab's Basic Statistics > 2 Proportions test was used to develop Tables 2 and 3. Entries in the tables with p-values less than 0.05 indicate a statistically significant difference between the populations compared.
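
The same comparisons can be reproduced outside Minitab with a two-proportion z-test. The sketch below is a minimal example, assuming the statsmodels package is available; it reproduces the Function/Algorithm row of Table 2 (106 of 498 Java defects versus 574 of 1,342 4GL defects).

```python
from statsmodels.stats.proportion import proportions_ztest

# Function/Algorithm defects from Table 2: Java 106 of 498, 4GL 574 of 1,342.
counts = [106, 574]    # defects of this type found in each language
totals = [498, 1342]   # total defects recorded for each language

# Two-sided two-proportion z-test; the p-value should be comparable to the
# Minitab 2 Proportions output reported in Table 2.
z_stat, p_value = proportions_ztest(counts, totals)
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")   # p is effectively 0.000
```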

Table 2: Java Versus 4GL

| Defect Type | Java (n = 498) | 4GL (n = 1,342) | p-Value |
|---|---|---|---|
| Function/Algorithm | 106 (21%) | 574 (43%) | 0.000 |
| Data/Relationships | 56 (11%) | 184 (14%) | 0.159 |
| Documentation | 19 (4%) | 169 (13%) | 0.000 |
| Object/Program Structure | 88 (18%) | 139 (10%) | 0.000 |
| Performance/Scalability | 82 (16%) | 66 (5%) | 0.000 |
| Standards | 56 (11%) | 51 (4%) | 0.000 |
| Checking | 52 (10%) | 112 (8%) | 0.180 |
| Interfaces | 30 (6%) | 38 (3%) | 0.006 |

Table 3: Transactional Applications Versus User Interfaces

| Defect Type | Transactional (n = 436) | User Interface (n = 225) | p-Value |
|---|---|---|---|
| Function/Algorithm | 191 (44%) | 71 (32%) | 0.002 |
| Data/Relationships | 33 (8%) | 14 (6%) | 0.511 |
| Documentation | 82 (19%) | 37 (16%) | 0.446 |
| Object/Program Structure | 50 (11%) | 51 (23%) | 0.000 |
| Performance/Scalability | 13 (3%) | 25 (11%) | 0.000 |
| Checking | 34 (8%) | 8 (4%) | 0.017 |
| Interfaces | 19 (4%) | 10 (4%) | 0.959 |

Inspection of Tested Versus Untested Code

Inspections of untested code find twice as many defects as inspections of tested code, and the difference is statistically significant (see the Mann-Whitney results below). Note, however, that the cost to find and fix a defect by inspection was less than half the cost of doing so in test, even when the code had already been tested prior to inspection.

Figure 1: Summary for Code Defects Count TRUE
Figure 2: Summary for Code Defects Count FALSE

Mann-Whitney Test and Confidence Interval: Code Defects Count FALSE, Code Defects Count TRUE (Minitab printout)
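
For readers without Minitab, an equivalent Mann-Whitney comparison can be run with scipy. The sketch below is a minimal example; the defect-count arrays are illustrative placeholders, not the project data summarized in Figures 1 and 2.

```python
from scipy.stats import mannwhitneyu

# Major defects found per inspection, split by whether the code had already been
# tested. These values are illustrative placeholders only; substitute the actual
# per-inspection defect counts from the Measure-phase data.
defects_untested = [9, 6, 8, 5, 7, 10, 6]   # Code Was Tested = False
defects_tested = [4, 2, 5, 1, 3, 2, 4]      # Code Was Tested = True

# Two-sided test of whether the two defect-count distributions differ.
u_stat, p_value = mannwhitneyu(defects_untested, defects_tested,
                               alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```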

Number of Inspectors

Figure 3: Scatterplot of Major Defects/Hour Versus Number of Participants
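
Figure 3 can be recreated directly from the Measure-phase data; a minimal matplotlib sketch, assuming the DataFrame df built in the earlier Table 1 example, is shown below.

```python
import matplotlib.pyplot as plt

# Inspection yield (major defects found per appraisal hour) versus the number
# of inspection participants, as in Figure 3.
plt.scatter(df["Number of Participants"], df["Major Defects Per Hour"])
plt.xlabel("Number of Participants")
plt.ylabel("Major Defects per Hour")
plt.title("Major Defects/Hour Versus Number of Participants")
plt.show()
```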

Implications for the Improve Phase

  • Number of inspectors: In this situation, the conventional wisdom based on Fagan's data is clearly not valid: four inspectors are not more cost effective than three. Hence, reduce the number of participants per inspection to three and use the resulting savings to evaluate alternative allocations of that effort to:
    • More inspections (the data collected so far does not indicate that a sufficient percentage of code has been inspected to reach diminishing returns); or
    • More detailed design (see next item).
  • 4GL: Explore methods to prevent function/algorithm defects. Discussion with the development team suggests that insufficient detail in design and requirements documents may be the most significant root cause. Conduct a pilot effort to evaluate the cost/benefit of additional design effort.
  • Java: Because defect types are more evenly distributed than in 4GL code, a broadly based educational effort may be more effective than a focus on particular defect types. Examination of a cross section of defects suggests they predominantly originate in code and are not related to design or requirements.
  • Focus on untested code: If, as in most cases, the effort allocated to inspections is severely limited, then priority should be given to untested code. However, a pilot program to test the cost-benefit ratio of allocating a significantly higher percentage of total effort to inspections is clearly indicated: most projects in this sample allocated 5 percent to 10 percent of total construction effort to inspections, while in most instances 30 percent to 40 percent of total development effort is allocated to testing.

Other Analyses

Many other analyses can be performed when data is available. One high-potential area is examining the differential effectiveness of different appraisal methods (design and code inspections; unit, system, and acceptance testing) in terms of the types of defects most efficiently found by each method.

About the Author