Inspections are one of the most common methods of review performed in software teams. The goal of code inspection is to identify software faults early in the software development lifecycle. Teams are faced with the challenge, however, of determining whether those inspections are effective. One way to quantify this is by predicting the total number of faults that can be found in an inspection. A prediction model can be applied to evaluate an inspection process and refine the achieved quality level.
Inspection fault density (the number of operational faults detected divided by lines of code inspected) is used by development teams to evaluate the effectiveness of code inspections. Teams are required to reinspect code wherever an inspection does not meet the inspection fault density guideline – which can potentially be a waste of resources. Alternatively, a block of code under inspection can be passed without trying to find (hidden, but existing) faults once the inspection fault density guideline is satisfied. Hence, there is a need for a fault prediction model based on various factors associated with software development and inspection. This article describes the development of such a model.
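As a minimal sketch of the metric defined above, the density and a guideline check could be computed as follows. The function names, the per-KLOC scaling and the threshold value are illustrative assumptions; the article does not state the actual guideline.

```python
def inspection_fault_density(operational_faults: int, lines_inspected: int) -> float:
    """Operational faults detected per inspected line of code."""
    if lines_inspected <= 0:
        raise ValueError("lines_inspected must be positive")
    return operational_faults / lines_inspected


def meets_guideline(operational_faults: int, lines_inspected: int,
                    guideline_per_kloc: float) -> bool:
    """True when the detected density reaches the guideline (faults per 1,000 lines).

    The threshold is a hypothetical parameter, not a value from the article.
    """
    density_per_kloc = 1000 * inspection_fault_density(operational_faults, lines_inspected)
    return density_per_kloc >= guideline_per_kloc


# Hypothetical inspection: 3 faults found in 400 lines, guideline of 5 faults per KLOC.
print(meets_guideline(3, 400, guideline_per_kloc=5.0))
```

A fixed threshold like this treats every code block the same way, which is the limitation the prediction model is meant to address.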
The number of faults detected during code inspection is characterized by the Poisson distribution, which models counts of events having a constant mean arrival rate over a specified interval. An inspection model based on the Poisson distribution assumes that the mean rate of fault introduction and fault detection is constant over a specified length of code block. In practice, however, there are several potential problems with this assumption:
All of these problems are addressed by the statistical procedure called generalized linear modeling (GLM). Similar to regression, GLM models a response variable that depends on at least one continuous predictor and multiple categorical variables. This is a generalization of the Poisson model where the rate of detection also depends on the other factors present in the model.
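In standard notation, the two models can be written as below. The symbols are ours rather than the article's: Y is the number of faults detected in an inspection, μ its mean, x_{ij} the value of predictor j for inspection i, and β_j the corresponding coefficient. The log link is the one named in the caption of Table 2.

```latex
% Plain Poisson model: one constant mean for every inspection
\[
P(Y = y) = \frac{\mu^{y} e^{-\mu}}{y!}, \qquad y = 0, 1, 2, \dots
\]

% Poisson GLM with log link: the mean varies with the predictors
\[
\log \mu_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}
\quad\Longleftrightarrow\quad
\mu_i = \exp\!\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}\right)
\]
```

With every β_j other than β_0 set to zero, the second form reduces to the first, which is the sense in which the GLM generalizes the constant-rate model.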
To develop a model that predicts the expected number of faults to be found in an inspection, potential factors that influence fault detection need to be identified. Based on historical knowledge, the following eight potential factors were identified for further investigation:
Hypothesis testing was carried out to determine the significance levels of the aforementioned factors. The results are shown in Table 1.
Table 1: Hypothesis Testing Results for Potential Factors

| Factor | Statistical hypothesis | p-value | Conclusion |
|---|---|---|---|
| File type (.h and .c files) | H_{0}: p_{.h} = p_{.c}; H_{1}: p_{.h} ≠ p_{.c} | 0.045 | Statistically significant |
| Type of work (new and port) | H_{0}: p_{new} = p_{port}; H_{1}: p_{new} ≠ p_{port} | 0.00 | Statistically significant |
| Programming language (C and C++) | H_{0}: p_{C} = p_{C++}; H_{1}: p_{C} ≠ p_{C++} | 0.312 | Fail to reject null hypothesis |
| Development teams (A and B) | H_{0}: p_{A} = p_{B}; H_{1}: p_{A} ≠ p_{B} | 0.368 | Fail to reject null hypothesis |
| Platform (C and D) | H_{0}: p_{C} = p_{D}; H_{1}: p_{C} ≠ p_{D} | 0.620 | Fail to reject null hypothesis |

Where p_{*} is the inspection fault density for the specified criterion.
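The article does not say which test produced the p-values in Table 1. One common way to compare two fault densities, sketched here as an assumption rather than as the authors' method, is a two-sided z-test for equal Poisson rates using the normal approximation. The function name and the sample figures are hypothetical.

```python
import math


def compare_fault_densities(faults_a: int, loc_a: int,
                            faults_b: int, loc_b: int) -> tuple[float, float]:
    """Two-sided z-test of H0: both groups share one fault density.

    Treats each fault count as Poisson with mean (density * lines inspected)
    and uses the pooled density to estimate the variance under H0.
    Returns (z statistic, p-value).
    """
    if loc_a <= 0 or loc_b <= 0:
        raise ValueError("inspected sizes must be positive")
    pooled = (faults_a + faults_b) / (loc_a + loc_b)
    if pooled == 0:
        return 0.0, 1.0  # no faults anywhere: no evidence of a difference
    std_error = math.sqrt(pooled * (1 / loc_a + 1 / loc_b))
    z = (faults_a / loc_a - faults_b / loc_b) / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical data: 30 faults in 1,000 lines of .c code vs 10 in 1,000 lines of .h code.
z, p = compare_fault_densities(30, 1000, 10, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The normal approximation is reasonable only when both fault counts are not too small; an exact conditional test would be safer for sparse data.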
Based on the hypothesis testing results, it was concluded that two factors – file type and type of work – have a significant effect on the faults detected in an inspection. Next, a GLM was developed using the number of operational faults as the response variable. The results of the GLM are shown in Table 2.
Table 2: Parameter Estimates for GLM Model (Log Link Function)

| Parameter | Estimate | Standard error | 95% Wald CI lower limit | 95% Wald CI upper limit | Wald chi-square | df | Significance |
|---|---|---|---|---|---|---|---|
| (Intercept) | -0.108 | 0.5458 | -1.178 | 0.962 | 0.039 | 1 | 0.843 |
| [file = .c] | -0.656 | 0.4532 | -1.544 | 0.233 | 2.093 | 1 | 0.048 |
| [file = .c and .h] | -0.46 | 0.4195 | -1.282 | 0.362 | 1.202 | 1 | 0.023 |
| [file = .h] | 0 | | | | | | |
| [work = new] | 0.579 | 0.2607 | 0.068 | 1.09 | 4.928 | 1 | 0.026 |
| [work = port] | 0 | | | | | | |
| Inspected size | 0.003 | 0.0009 | 0.001 | 0.005 | 10.342 | 1 | 0.001 |
| Inspected size squared | -8.58E-07 | 4.37E-07 | -1.72E-06 | -1.31E-09 | 3.853 | 1 | 0.05 |
| Average development branches | 0.03 | 0.0248 | -0.019 | 0.079 | 1.441 | 1 | 0.03 |
| Average panic branches | 0.288 | 0.4396 | -0.573 | 1.15 | 0.43 | 1 | 0.012 |
R^{2} adjusted = 60.03% 
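The interval and test columns of Table 2 follow from the estimate and its standard error by the usual Wald formulas. The sketch below reproduces them for the [work = new] row; the function name is ours, and the formulas are the standard Wald ones rather than anything stated in the article.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def wald_summary(estimate: float, std_error: float) -> tuple[float, float, float, float]:
    """Return (lower limit, upper limit, Wald chi-square, p-value) for one parameter."""
    lower = estimate - Z_95 * std_error
    upper = estimate + Z_95 * std_error
    chi_square = (estimate / std_error) ** 2
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return lower, upper, chi_square, p_value


# [work = new] row of Table 2: estimate 0.579, standard error 0.2607.
lower, upper, chi_square, p_value = wald_summary(0.579, 0.2607)
print(f"{lower:.3f} {upper:.3f} {chi_square:.3f} {p_value:.3f}")
```

Small differences from the printed chi-square value come from the rounding of the estimate and standard error in the table. Applied to the other rows, the same arithmetic reproduces the interval limits and chi-square values, but not every printed significance value, so those are best treated with some care.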
The results of the statistical analyses shown in Table 2 were used to generate prediction charts with 95 percent confidence limits. The upper and lower limits were calculated from the 95 percent Wald confidence intervals of the parameter estimates shown in Table 2. Since the inspected file type, work type, average number of development branches and average number of panic branches are significant predictors of faults (from Table 2), separate prediction charts were generated for combinations of these factors, and two of them are shown in Figures 1 and 2.
A similar inspection fault prediction chart may be plotted for different combinations of predictor variables based on the requirements of software development.
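As a rough sketch of how the points on such a chart could be computed, the expected number of faults under a log link is the exponential of the linear predictor built from the Table 2 estimates. Several assumptions apply: the signs used below are those implied by each row's confidence limits (estimate ± 1.96 × standard error), inspected size is taken to be in lines of code, and the function name and example inputs are hypothetical.

```python
import math

# Estimates from Table 2. Signs are those implied by the reported Wald limits
# (an assumption); .h and port are the reference levels with coefficient 0.
COEF = {
    "intercept": -0.108,
    "file": {".c": -0.656, ".c and .h": -0.46, ".h": 0.0},
    "work": {"new": 0.579, "port": 0.0},
    "size": 0.003,
    "size_sq": -8.58e-07,
    "dev_branches": 0.03,
    "panic_branches": 0.288,
}


def expected_faults(file_type: str, work_type: str, size: float,
                    dev_branches: float, panic_branches: float) -> float:
    """Expected operational faults for one inspection under the log-link model."""
    linear_predictor = (
        COEF["intercept"]
        + COEF["file"][file_type]
        + COEF["work"][work_type]
        + COEF["size"] * size
        + COEF["size_sq"] * size ** 2
        + COEF["dev_branches"] * dev_branches
        + COEF["panic_branches"] * panic_branches
    )
    return math.exp(linear_predictor)


# Hypothetical combination: new .c code, 2 development branches, 0 panic branches.
for size in (250, 500, 1000, 1500):
    print(size, round(expected_faults(".c", "new", size, 2, 0), 2))
```

Because the model is multiplicative on the original scale, switching from port to new work multiplies the expected fault count by exp(0.579), roughly 1.8, whatever the other inputs are.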
Model validation was carried out on inspections from an ongoing project. Under the existing guidelines for inspection fault density, 28 inspections had been judged free of defects. The parameters of these 28 inspections were fed into the GLM model, and seven inspections were selected for reinspection. Development teams were able to find additional operational faults in four of these seven inspections. In addition, the model correctly identified 16 inspections as bug-free and saved approximately 200 staff hours, which was a significant improvement.
This model predicts the expected number of faults based on the detection ability of teams, so its output is still a guideline. This guideline can be converted to goals by including the testing defects, which are simply faults that were not detected in inspection. The model explains around 60 percent of the overall variation in operational faults. The explained variation can be increased by adding more predictor variables, such as inspection rate, preparation rate, team size and the experience level of the inspection team.


Comments
Excellent article and very well written.
Thanks for sharing, Jeff
Thanks for your comments Jeff.
Regards,
Shiv
First of all it is a very good article.
My observation is that you consider a constant mean arrival rate over a specified interval for your model, but that arrival rate may change depending on the type of work, i.e., new feature or port feature.
Secondly, you could also mention the R-squared value for your GLM model; by comparing R-squared and adjusted R-squared, one can check model parsimony with respect to the predictor variables.
Very good article.
Thanks for your comments, BG. Yes, the assumption of constant fault introduction is not practically feasible, so a GLM is used instead of homogeneous Poisson process modeling. The basic assumption here is that the mean arrival rate of faults is not constant and varies depending on the length of inspected code, the experience level of inspectors, the work type (new or port), etc.
Regarding your second comment: yes, you are right. Providing the R-squared value is important for checking whether the model is overfitted. In this case the R-squared value was close to the adjusted R-squared: R-squared was 63% and adjusted R-squared was 60.03%.
Again, thanks a lot for your feedback and suggestions.
Regards,
Shib
Thank you for this article. I haven’t seen much written about human errors in regards to using the Poisson distribution. It was good to see how you looked into potential variables influencing predicted error density.
—Russell
Thanks for your feedback.
Regards,
Shiv