Implications of Analyses of Software Inspections Data

A variety of analyses can be performed during the Analyze phase of a Six Sigma DMAIC (Define, Measure, Analyze, Improve, Control) software project using data from Fagan-style inspections. These analyses suggest possible actions for the Improve phase. The analyses shown here are based on a real situation; the conclusions drawn are valid in that situation but are not necessarily applicable to other organizations. Because each analysis uses a different subset of the data, the results are not directly comparable to one another.

Measure Phase Data

Table 1 provides a portion of the inspections data collected during the Measure phase. This data was used in the analyses that follow.

Table 1: Inspections Data
Work Product Type	4GL	4GL	4GL	4GL	4GL	4GL
Appraised Size	73	67	116	122	172	225
Number of Participants	4	4	4	4	4	5
Code Was Tested	True	True	True	True	True	True
Rework Hours	2.0	0.65	2.0	6.23	0.18	0.96
Total Appraisal Hours	11.5	11.2	13.5	7.3	9.62	13.2
Major Defects	3	1	7	4	1	2
Major Defects Per Hour	0.261	0.089	0.519	0.548	0.104	0.152
Major Defects Per Size (per 1,000 units of appraised size)	41.096	14.925	60.345	32.787	5.814	8.889
Language	4GL	4GL	4GL	4GL	4GL	4GL
Defect Severity	Minor	Major	Minor	Major	Minor	Minor
Defect Type	Documentation	Checking	Data	Interfaces	Interfaces	Documentation
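The two derived rows of Table 1 can be recomputed from the raw rows. The sketch below assumes, based on the table's arithmetic, that Major Defects Per Hour is Major Defects divided by Total Appraisal Hours and that Major Defects Per Size is expressed per 1,000 units of appraised size (the table does not state the size unit). The `Inspection` class and its property names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Inspection:
    """One column of Table 1 (hypothetical container; only the fields used here)."""
    appraised_size: float
    total_appraisal_hours: float
    major_defects: int

    @property
    def major_defects_per_hour(self) -> float:
        # Assumed definition: major defects per hour of total appraisal effort.
        return self.major_defects / self.total_appraisal_hours

    @property
    def major_defects_per_size(self) -> float:
        # Assumed definition: major defects per 1,000 units of appraised size.
        return 1000 * self.major_defects / self.appraised_size


# The six inspections from Table 1: (appraised size, total appraisal hours, major defects).
TABLE_1 = [
    Inspection(73, 11.5, 3),
    Inspection(67, 11.2, 1),
    Inspection(116, 13.5, 7),
    Inspection(122, 7.3, 4),
    Inspection(172, 9.62, 1),
    Inspection(225, 13.2, 2),
]

per_hour = [round(i.major_defects_per_hour, 3) for i in TABLE_1]
per_size = [round(i.major_defects_per_size, 3) for i in TABLE_1]

if __name__ == "__main__":
    print(per_hour)
    print(per_size)
```

Both lists match the published Table 1 rows to three decimal places, which supports the assumed definitions.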

Defect-Type Distributions

The objective of this analysis is to determine which defect types, if any, occur in significantly different proportions by language and, within a language, by project type. Tables 2 and 3 were produced with Minitab's Stat > Basic Statistics > 2 Proportions. Entries with p-values below 0.05 differ significantly (at the 5 percent level) between the two populations compared.

Table 2: Java Versus 4GL
Defect Type	Java Count (n = 498)	Java %	4GL Count (n = 1,342)	4GL %	p-Value
Function/Algorithm	106	21%	574	43%	0.000
Data/Relationships	56	11%	184	14%	0.159
Documentation	19	4%	169	13%	0.000
Object/Program Structure	88	18%	139	10%	0.000
Performance/Scalability	82	16%	66	5%	0.000
Standards	56	11%	51	4%	0.000
Checking	52	10%	112	8%	0.180
Interfaces	30	6%	38	3%	0.006

Table 3: Transactional Applications Versus User Interfaces
Defect Type	Transactional Count (n = 436)	Transactional %	User Interface Count (n = 225)	User Interface %	p-Value
Function/Algorithm	191	44%	71	32%	0.002
Data/Relationships	33	8%	14	6%	0.511
Documentation	82	19%	37	16%	0.446
Object/Program Structure	50	11%	51	23%	0.000
Performance/Scalability	13	3%	25	11%	0.000
Checking	34	8%	8	4%	0.017
Interfaces	19	4%	10	4%	0.959
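The p-values in Tables 2 and 3 come from a two-proportion test. A minimal sketch of that test, assuming the normal approximation with an unpooled standard error (each sample's own proportion), is shown below; the assumption that this matches the Minitab settings used here is ours, and the function name is hypothetical. It is applied to the Table 3 counts.

```python
import math


def two_proportion_p_value(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value for H0: p1 == p2 (normal approximation, unpooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    if se == 0:
        # Both proportions are 0 or 1; the test statistic is undefined.
        return 1.0 if p1 == p2 else 0.0
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))


# Table 3: (defect type, transactional count, UI count, published p-value).
TRANS_N, UI_N = 436, 225
TABLE_3 = [
    ("Function/Algorithm", 191, 71, 0.002),
    ("Data/Relationships", 33, 14, 0.511),
    ("Documentation", 82, 37, 0.446),
    ("Object/Program Structure", 50, 51, 0.000),
    ("Performance/Scalability", 13, 25, 0.000),
    ("Checking", 34, 8, 0.017),
    ("Interfaces", 19, 10, 0.959),
]

results = {
    name: two_proportion_p_value(t, TRANS_N, u, UI_N)
    for name, t, u, _ in TABLE_3
}

if __name__ == "__main__":
    for name, t, u, published in TABLE_3:
        print(f"{name:<26} computed={results[name]:.3f} published={published:.3f}")
```

For the Table 3 rows the computed values agree with the published ones to within rounding; other Minitab options (such as a pooled estimate) can give slightly different values.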

Inspection of Tested Versus Untested Code

Inspections of untested code find twice as many defects as inspections of tested code, and the difference is statistically significant. Even so, inspecting tested code still paid off: the cost to find and fix a defect by inspection was less than half the cost of doing so by testing, even when the code had already been tested before the inspection.

Figure 1: Summary for Code Defects Count TRUE

Figure 2: Summary for Code Defects Count FALSE

Mann-Whitney Test and Confidence Interval: Code Defects Count FALSE versus Code Defects Count TRUE (Minitab printout)
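The Minitab printout itself is not reproduced here. As a rough sketch of what a Mann-Whitney comparison of defect counts computes, the function below ranks the pooled counts (tied values share their average rank), computes U for the first sample, and derives a two-sided p-value from the normal approximation with tie and continuity corrections. It also returns the median of all pairwise differences (the Hodges-Lehmann estimate), the kind of point estimate Minitab reports for the median difference. Minitab reports the rank sum W rather than U; for the first sample, U = W - n1(n1 + 1)/2. The function names and the sample counts are hypothetical placeholders, not the study's data.

```python
import math
from statistics import median


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position in the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Return (U for x, two-sided p-value, Hodges-Lehmann estimate of x - y)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = average_ranks(pooled)
    w = sum(ranks[:n1])  # rank sum of the first sample (Minitab's W)
    u = w - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        p = 1.0  # every observation identical: no evidence of a shift
    else:
        # Continuity correction of 0.5, clipped so the statistic is never negative.
        z = max(abs(u - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
        p = math.erfc(z / math.sqrt(2))
    hl = median([a - b for a in x for b in y])
    return u, p, hl


# Hypothetical defect counts per inspection (NOT the study's data).
untested = [6, 9, 4, 12, 7, 10, 5, 8]
tested = [3, 5, 2, 4, 6, 1, 3, 4]

if __name__ == "__main__":
    print(mann_whitney(untested, tested))
```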

Number of Inspectors

Figure 3: Scatterplot of Major Defects/Hour Versus Number of Participants
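Figure 3 itself is not reproduced. One way to summarize the relationship it plots is to pool major defects and appraisal hours for each team size and take the ratio, as sketched below. Dividing totals, rather than averaging per-inspection rates, keeps a few short inspections from dominating the result; that choice and the function name are assumptions for illustration. The input is only the six Table 1 inspections, none of which had three participants, so the output shows the calculation and says nothing about the three-versus-four comparison.

```python
from collections import defaultdict

# (number of participants, total appraisal hours, major defects) for the six
# Table 1 inspections: a small portion of the data, used only to show mechanics.
TABLE_1 = [
    (4, 11.5, 3),
    (4, 11.2, 1),
    (4, 13.5, 7),
    (4, 7.3, 4),
    (4, 9.62, 1),
    (5, 13.2, 2),
]


def pooled_rate_by_team_size(rows):
    """Major defects per appraisal hour for each team size: total defects / total hours."""
    hours = defaultdict(float)
    defects = defaultdict(int)
    for participants, appraisal_hours, major_defects in rows:
        hours[participants] += appraisal_hours
        defects[participants] += major_defects
    return {size: defects[size] / hours[size] for size in sorted(hours)}


rates = pooled_rate_by_team_size(TABLE_1)

if __name__ == "__main__":
    for size, rate in rates.items():
        print(f"{size} participants: {rate:.3f} major defects per appraisal hour")
```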

Implications for the Improve Phase

Number of inspectors: In this situation, the conventional wisdom based on Fagan’s data is clearly not valid: four inspectors are not more cost-effective than three. Hence, reduce the number of participants per inspection to three and use the resulting savings to evaluate reallocating that effort to:
- More inspections (data collected so far does not indicate that a sufficient percentage of code has been inspected to reach diminishing returns); or
- More detailed design (see next item).
4GL: Explore methods to prevent function/algorithm defects, which make up 43 percent of 4GL defects (Table 2). Discussion with the development team suggests that insufficient detail in design and requirements documents may be the most significant root cause. Conduct a pilot effort to evaluate the cost/benefit of additional design effort.
Java: Because Java defect types are more evenly distributed than 4GL defect types, a broadly based educational effort may be more effective than a focus on particular defect types. Examination of a cross section of defects suggests that they originate predominantly in code and are not related to design or requirements.
Focus on untested code: If, as in most cases, the effort allocated to inspections is severely limited, give priority to untested code. However, because inspection found and fixed defects at less than half the cost of testing, a pilot program to test the cost-benefit of allocating a significantly higher percentage of total effort to inspections is clearly indicated. Most projects in this sample allocated 5 percent to 10 percent of total construction effort to inspections, whereas in most instances 30 percent to 40 percent of total development effort is allocated to testing.
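The Java item above rests on the observation that Java defects are spread more evenly across defect types than 4GL defects (Table 2). One way to put a number on that, not used in the original analysis and swapped in here only for illustration, is normalized Shannon entropy: 1.0 means the counts are spread equally across all types, and values near 0 mean a single type dominates. The function name is hypothetical. The counts are the Table 2 rows, and proportions are taken over the listed rows only, since those counts do not sum exactly to n.

```python
import math


def normalized_entropy(counts):
    """Shannon entropy of the count distribution divided by its maximum, log(k)."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    h = -sum(p * math.log(p) for p in proportions)
    return h / math.log(len(counts))


# Table 2 counts, in row order from Function/Algorithm to Interfaces.
JAVA = [106, 56, 19, 88, 82, 56, 52, 30]
FOURGL = [574, 184, 169, 139, 66, 51, 112, 38]

java_evenness = normalized_entropy(JAVA)
fourgl_evenness = normalized_entropy(FOURGL)

if __name__ == "__main__":
    print(f"Java: {java_evenness:.2f}  4GL: {fourgl_evenness:.2f}")
```

With the Table 2 counts, Java scores roughly 0.95 and 4GL roughly 0.83, consistent with the claim that Java defect types are more evenly distributed.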

Other Analyses

Many other analyses can be performed when the data is available. One high-potential area is the differential effectiveness of appraisal methods (design and code inspections; unit, system and acceptance testing), that is, which types of defects each method finds most efficiently.