One of the many challenges in building a business case for software process improvement is the relative lack of credible measurement data. Without the data, a company cannot build the business case; without the business case, it cannot launch the improvement project that would produce the data. It is the classic chicken-and-egg dilemma. But there is a solution.

An example case study (actually a composite of several similar situations) illustrates some of these challenges and how to overcome them when creating a realistic, defensible evaluation of potential and actual benefits. As the example shows, this is often a multistage process that unfolds over months or even years. One key to success is to candidly acknowledge what is and is not known at each point in time. All of the numbers here have been changed to protect the innocent (and the guilty), but the overall story and lessons are faithful to the real projects.

Beginning with the Situation

A 100-person software development team is responsible for a major software product containing a total of about 4,000,000 statements. The product has been built in a series of releases, with each release typically adding 800,000 to 1,200,000 statements. The development cycle for each release is approximately one year (including design, coding and all testing prior to release). The development team is responsible for all support and defect repair during the first year after release. Hence, the team is concurrently responsible for maintenance of the previous release and development of the next. After the first year, support and defect repair are handled by a separate maintenance organization.

In order to build a business case that will lead to approval for a pilot improvement project, available baseline data on the most recent release is collected. This data, together with industry data when local data does not exist, is the basis for the initial business case.

The Initial Business Case

The basic premise of the initial business case is that introducing formal peer reviews, initially applied to code only, will reduce the cost to find and fix defects relative to the find-and-fix costs of existing test practices. In addition, the new process is expected to deliver a higher-quality product, as measured by total containment effectiveness (TCE). TCE is the number of defects discovered before release divided by the sum of defects discovered before release and defects discovered in the first year after release, expressed as a percentage.
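The TCE definition reduces to a single ratio. The short Python sketch below (the function name is illustrative, not part of the case study) checks it against the baseline counts that appear in Table 1.

```python
def total_containment_effectiveness(pre_release, post_release_year1):
    """Return Year 1 TCE as a percentage.

    pre_release: defects found before release (inspections plus all test phases)
    post_release_year1: defects found in the first year after release
    """
    total = pre_release + post_release_year1
    if total <= 0:
        raise ValueError("total defect count must be positive")
    return 100.0 * pre_release / total


# Baseline counts from Table 1: 4,188 (integration) + 5,927 (system/acceptance)
# found before release, and 6,395 found by customers in Year 1.
baseline_tce = total_containment_effectiveness(4_188 + 5_927, 6_395)
print(f"Baseline Year 1 TCE: {baseline_tce:.1f}%")  # Baseline Year 1 TCE: 61.3%
```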

What needs to be known, both “as-is” (baseline) and “to-be”:

  • The number of defects found in each phase (code, integration test, system/acceptance test and the first year after release)
  • The effort required to find and fix a defect in each phase
  • Average labor rate for the team

What is known:

  • The number of customer calls related to the prior release. These included requests for consulting assistance, various types of questions, as well as reports of actual defects.
  • The number of internally reported defects. It was widely recognized that many defects were not reported.
  • The approximate start and end dates of each phase (although there was some overlap), and the percent of total effort devoted to maintenance of the prior release and to development of the current release during those periods. From this information, the approximate find-and-fix time for each phase can be calculated: 12 hours per defect during integration test, 18 hours during system/acceptance test, and 42 hours for defects delivered to customers.
  • The average labor rate
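The find-and-fix estimate in the list above can be sketched as the labor hours spent on a phase divided by the defects found in it. The function and every input in the example are hypothetical, chosen only to show the arithmetic; they are not the case study's data. Because find and fix time could not be separated, the result is a combined figure.

```python
def hours_per_defect(team_size, hours_per_week, phase_weeks, share_on_phase, defects_found):
    """Approximate combined find-and-fix hours per defect for one test phase.

    share_on_phase: fraction of the team's effort during the phase that went to
    testing the current release (i.e., excluding maintenance of the prior release).
    """
    if defects_found <= 0:
        raise ValueError("defects_found must be positive")
    phase_hours = team_size * hours_per_week * phase_weeks * share_on_phase
    return phase_hours / defects_found


# Hypothetical inputs for illustration only (not the case study's data):
# 100 people, 40 hours/week, a 10-week phase, 60% of effort on this phase, 2,000 defects.
print(round(hours_per_defect(100, 40, 10, 0.6, 2_000), 1))  # 12.0
```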

What is not known…and what is assumed:

  • It was recognized that the actual number of defects was less than the number of customer calls, but there was no way to determine the number of true defects. Hence, the number of calls was used because it was the best information available. In any event, this was not expected to have a major impact on the business case, because any distortion was assumed to be uniform across phases. During the pilot deployment, the defect/non-defect ratio would be measured and the appropriate adjustment retroactively applied.
  • While an approximate estimate of total labor devoted to test phases is known, there was no way to distinguish “find” time from “fix” time, so the total of the two was estimated, with an intent to measure the distinction during the pilot process.
  • The internal defect count was known to be under-reported. Therefore, the code management system was checked to approximate the defect count from the versions created during testing; this count turned out to be about 50 percent greater than the number of defects recorded during testing. It was not believed to be completely accurate either, but it was closer to the actual figure.
  • Since inspections had not been done in the previous release, the defect removal effectiveness rate was not known. Industry experience indicated that inspections can typically remove around 60 percent of the defects present in work products inspected. Industry data also shows that in many instances defect “clustering” occurs (e.g., perhaps 60 percent of all defects are found in 20 percent of the work products); hence, selecting the right items to inspect was critical to success. Since as a practical matter it would not be possible to inspect everything, it was decided to inspect about 20 percent of the total code. In the ideal case, where the high defect 20 percent was selected, this could theoretically lead to removal of 36 percent of the total defects by inspection (i.e., 60 percent of 60 percent). That was deemed unlikely, so it was decided to base the business case on 18 percent of total defects removed by inspections.
  • Cost to find and fix defects by inspection also was not known. Based on industry experience, it was decided to use four hours as the initial estimate, to be confirmed or changed by experience during the pilot.
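The inspection assumptions above chain together as simple products. This sketch uses the industry figures quoted in the list; the case study simply chose 18 percent, which is half of the 36 percent ideal, so the 0.5 factor below is just one way of expressing that choice rather than a documented rule.

```python
INSPECTION_EFFECTIVENESS = 0.60  # industry: share of defects removed from inspected items
CLUSTER_SHARE = 0.60             # industry: share of all defects found in ~20% of work products
PLANNING_DISCOUNT = 0.5          # the 18% chosen is half of the ideal 36%
TOTAL_DEFECTS = 16_510           # baseline total (Table 1)
HOURS_PER_INSPECTION_DEFECT = 4  # initial estimate, to be confirmed by the pilot

ideal_removal = INSPECTION_EFFECTIVENESS * CLUSTER_SHARE   # if the right 20% is inspected
planned_removal = ideal_removal * PLANNING_DISCOUNT
found = TOTAL_DEFECTS * planned_removal

print(f"ideal {ideal_removal:.0%}, planned {planned_removal:.0%}")  # ideal 36%, planned 18%
print(f"{found:,.0f} defects, {found * HOURS_PER_INSPECTION_DEFECT:,.0f} F&F hours")
# 2,972 defects, 11,887 F&F hours
```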

These data and assumptions lead to the initial business case outlined in Table 1.

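Table 1: Initial Business Case

Labor rate: $80.00 per hour. F&F = find and fix.

| Phase | % Found | As-Is Defects | As-Is F&F Hours | As-Is Dollars | To-Be Defects | To-Be F&F Hours | To-Be Dollars | F&F Hours per Defect |
|---|---|---|---|---|---|---|---|---|
| Total | | 16,510 | | | 16,510 | | | |
| Code Inspections | 18.0% | 0 | | | 2,972 | 11,887 | $950,976 | 4 |
| Integration Test | 25.4% | 4,188 | 70,615 | $5,649,224 | 3,434 | 41,213 | $3,297,056 | 12 |
| System/Acceptance Test | 48.1% | 5,927 | 106,680 | $8,534,400 | 4,860 | 87,478 | $6,998,208 | 18 |
| Customer (One Year) | 38.7% | 6,395 | 268,590 | $21,487,200 | 5,244 | 220,244 | $17,619,504 | 42 |
| Year 1 TCE | | 61.3% | | | 68.2% | | | |
| Totals | | | 445,885 | $35,670,824 | | 360,822 | $28,865,744 | |
| Savings | | | | | | | $6,805,080 (19.1%) | |

Note: For the test phases, % Found is the percentage of the defects entering that phase that are found there; for Code Inspections, it is the percentage of all defects; for Customer (One Year), it is the baseline share of all defects.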
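The to-be column of Table 1 follows a simple containment model: each phase finds a fixed fraction of the defects that reach it, and whatever survives system/acceptance test reaches customers. The sketch below assumes that model, with the test-phase fractions derived from the baseline counts; the function name is illustrative, and small differences from the table come from rounding.

```python
def containment_cascade(total_defects, phases, field_hours_per_defect):
    """Run defects through successive removal phases; survivors reach customers.

    phases: list of (name, fraction of entering defects found, F&F hours per defect).
    Returns a list of (name, defects found, F&F hours), ending with the customer row.
    """
    rows = []
    remaining = total_defects
    for name, fraction_found, hours_each in phases:
        found = remaining * fraction_found
        rows.append((name, found, found * hours_each))
        remaining -= found
    rows.append(("Customer (One Year)", remaining, remaining * field_hours_per_defect))
    return rows


TOTAL = 16_510
LABOR_RATE = 80.0
# Test-phase fractions derived from the Table 1 baseline counts.
INTEGRATION = 4_188 / TOTAL          # ~25.4% of defects entering integration test
SYSTEM = 5_927 / (TOTAL - 4_188)     # ~48.1% of defects entering system test

to_be = containment_cascade(TOTAL, [
    ("Code Inspections", 0.18, 4),
    ("Integration Test", INTEGRATION, 12),
    ("System/Acceptance Test", SYSTEM, 18),
], field_hours_per_defect=42)

for name, defects, hours in to_be:
    print(f"{name:<24}{defects:>8,.0f}{hours:>10,.0f}  ${hours * LABOR_RATE:>12,.0f}")

escaped = to_be[-1][1]
print(f"Year 1 TCE: {100 * (1 - escaped / TOTAL):.1f}%")  # Year 1 TCE: 68.2%
```

Because the test-phase fractions are unchanged, every downstream to-be count is simply the baseline count multiplied by (1 - 0.18) = 0.82; for example, 6,395 × 0.82 ≈ 5,244 delivered defects.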

Results of the Pilot Project

Based on the initial business case, the improvement initiative was approved and the DMAIC (Define, Measure, Analyze, Improve, Control) roadmap was followed. Two component development teams, each with about 10 people, were selected for the pilot to demonstrate “proof of concept.” Each team inspected roughly 20 percent of the code they developed, performing a total of about 100 inspections. In addition to inspections, the pilot teams used improved tools and processes for defect tracking and time accounting throughout the development and testing cycle. Hence at the end of the one-year pilot phase, there was much more accurate data on defect containment rates in each phase, as well as accurate data on find and fix costs.

Data from the pilots was used to revisit the initial business case. The more accurate containment rates derived from the pilot were retroactively applied to the baseline. Because the testing processes had not changed and the application and team composition were the same, this approach gives a more accurate comparison of as-is and to-be. The relationship between the number of calls and the number of actual defects was also investigated: roughly 60 percent of calls were actually defects.

The pilot results were then scaled up to the size of the original business case. Because the pilot represented about 20 percent of the total, results were multiplied by five. Note that at this point there were no post-release results, so it had to be assumed that the total number of statements developed and the number of defects would remain unchanged from the baseline.
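The two adjustments described above are plain scaling factors. In this sketch the constant and function names are hypothetical; the 60 percent ratio and the factor of five come from the text. As in the case study, the ratio is applied to the whole baseline defect count, consistent with the earlier assumption that any distortion is uniform across phases.

```python
CALLS_THAT_ARE_DEFECTS = 0.60  # measured during the pilot
PILOT_SCALE_FACTOR = 5         # the pilot covered about 20% of the product


def scale_pilot_count(pilot_count):
    """Scale a count observed in the pilot up to the whole product."""
    return pilot_count * PILOT_SCALE_FACTOR


# The 60% ratio applied to the Table 1 baseline total.
adjusted_total = 16_510 * CALLS_THAT_ARE_DEFECTS
print(f"Adjusted total defects: {adjusted_total:,.0f}")  # 9,906, as in Tables 2 and 3

# Hypothetical pilot count, for illustration only.
print(f"{scale_pilot_count(500):,}")  # 2,500
```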

The pilot results were somewhat better than the initial business case: inspections found a higher percentage of defects than forecast (25 percent versus 18 percent). This figure rests on the assumption that the total number of statements and the number of defects “inserted” remain the same as in the previous release. Since no other process changes were made during this time and the staff is the same, this was a reasonable assumption; it will be checked against actual results at the end of Year 2.

The defect find-and-fix costs differ somewhat from those in the initial business case, but this does not distort the comparison, since the same rates are applied to both the as-is and to-be cases.

The scenario presented in Table 2 assumes that the primary goal is to reduce cost: test effort is reduced significantly, while the delivered product has a somewhat higher TCE.

Table 2: Revised Business Case, Assuming Goal of Cost Reduction

Labor rate: $80.00 per hour. F&F = find and fix.

| Phase | % Found | As-Is Defects | As-Is F&F Hours | As-Is Dollars | To-Be Defects | To-Be F&F Hours | To-Be Dollars | F&F Hours per Defect |
|---|---|---|---|---|---|---|---|---|
| Total | | 9,906 | | | 9,906 | | | |
| Code Inspections | 25.0% | 0 | | | 2,477 | 9,906 | $792,480 | 4 |
| Integration Test | 25.4% | 2,513 | 37,343 | $2,987,454 | 1,885 | 18,848 | $1,507,800 | 10 |
| System/Acceptance Test | 48.1% | 3,556 | 90,678 | $7,254,240 | 2,667 | 68,009 | $5,440,680 | 25.5 |
| Customer (One Year) | 38.7% | 3,837 | 161,154 | $12,892,320 | 2,878 | 120,866 | $9,669,240 | 42 |
| Year 1 TCE | | 61.3% | | | 70.9% | | | |
| Totals | | | 289,175 | $23,134,014 | | 217,628 | $17,410,200 | |
| Savings | | | | | | | $5,723,814 (24.7%) | |
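A scenario table like this can be cross-checked by summing the F&F hours per phase, converting them at the labor rate, and comparing the two cases. The sketch below uses the hours from Table 2; the function name is illustrative, and its dollar totals differ slightly from the table because the table was computed from unrounded hours.

```python
LABOR_RATE = 80.0  # dollars per hour, from the table header

# F&F hours per phase from Table 2, in order:
# code inspections, integration test, system/acceptance test, customer (one year).
as_is_hours = [0, 37_343, 90_678, 161_154]
to_be_hours = [9_906, 18_848, 68_009, 120_866]


def scenario_savings(as_is, to_be, rate=LABOR_RATE):
    """Return (as-is cost, to-be cost, savings, savings as a percent of as-is cost)."""
    as_is_cost = sum(as_is) * rate
    to_be_cost = sum(to_be) * rate
    savings = as_is_cost - to_be_cost
    return as_is_cost, to_be_cost, savings, 100 * savings / as_is_cost


as_is_cost, to_be_cost, savings, pct = scenario_savings(as_is_hours, to_be_hours)
print(f"${as_is_cost:,.0f} -> ${to_be_cost:,.0f}: save ${savings:,.0f} ({pct:.1f}%)")
# $23,134,000 -> $17,410,320: save $5,723,680 (24.7%)
```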

Alternatively, management might prefer to hold test effort constant, deliver higher quality, and realize cost savings in post-release maintenance. If benefits are harvested this way, testing must be assumed to be somewhat less effective, because inspections remove defects before they “enter” testing, leaving fewer to be found. The business case in Table 3 assumes that the hours devoted to testing are unchanged from the baseline but that testing is 10 percent less effective (i.e., each test phase finds 10 percent fewer defects than in the baseline). Savings are only slightly less than in Table 2, but delivered quality is much higher as measured by TCE: 80.1 percent rather than 70.9 percent. About half as many defects are delivered as in the baseline: 1,967 rather than 3,837.

Table 3: Revised Business Case, Assuming Goal of Higher Quality

Labor rate: $80.00 per hour. F&F = find and fix.

| Phase | % Found | As-Is Defects | As-Is F&F Hours | As-Is Dollars | To-Be Defects | To-Be F&F Hours | To-Be Dollars | F&F Hours per Defect |
|---|---|---|---|---|---|---|---|---|
| Total | | 9,906 | | | 9,906 | | | |
| Code Inspections | 25.0% | 0 | | | 2,477 | 9,906 | $792,480 | 4 |
| Integration Test | 25.4% | 2,513 | 37,343 | $2,987,454 | 2,262 | 37,343 | $2,987,454 | 10 |
| System/Acceptance Test | 48.1% | 3,556 | 90,678 | $7,254,240 | 3,200 | 90,678 | $7,254,240 | 25.5 |
| Customer (One Year) | 38.7% | 3,837 | 161,154 | $12,892,320 | 1,967 | 82,631 | $6,610,464 | 42 |
| Year 1 TCE | | 61.3% | | | 80.1% | | | |
| Totals | | | 289,175 | $23,134,014 | | 220,558 | $17,644,638 | |
| Savings | | | | | | | $5,489,376 (23.7%) | |
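The Table 3 scenario replaces the fixed test-phase fractions with a different assumption: each test phase finds 10 percent fewer defects than in the baseline, and the remainder is delivered to customers. A minimal sketch of that arithmetic, using the Table 3 inputs (the variable names are illustrative):

```python
TOTAL = 9_906                   # adjusted total defects (Tables 2 and 3)
INSPECTION_YIELD = 0.25         # share of all defects found by inspections in the pilot
EFFECTIVENESS_LOSS = 0.10       # Table 3 assumption: each test phase finds 10% fewer defects
CUSTOMER_HOURS_PER_DEFECT = 42

# Defects found in each test phase in the baseline (Table 3, as-is column).
baseline_found = {"Integration Test": 2_513, "System/Acceptance Test": 3_556}

inspected = TOTAL * INSPECTION_YIELD
tested = {name: n * (1 - EFFECTIVENESS_LOSS) for name, n in baseline_found.items()}
delivered = TOTAL - inspected - sum(tested.values())  # defects that reach customers
tce = 100 * (TOTAL - delivered) / TOTAL

print(f"Delivered defects: {delivered:,.0f}")                               # 1,967
print(f"Year 1 TCE: {tce:.1f}%")                                            # 80.1%
print(f"Customer F&F hours: {delivered * CUSTOMER_HOURS_PER_DEFECT:,.0f}")  # 82,631
```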

Conclusion: Next Steps and Take-Aways

Based on the results so far, management has agreed to apply inspections to the complete product during the next release development cycle. They have also agreed to have the complete team use the new data collection tools and processes so that in the future accurate data will be available for the entire life cycle, including customer use. That data can be used to prepare much more accurate business cases for future improvement proposals, such as improvements to the test processes.

After another year, data will be available to confirm or revise the estimates of total defects and of the cost to find and fix defects delivered to customers. The business cases can then be revisited and restated using the results available at that time.

This example case study offers a number of takeaways:

  • No one ever knows everything they would like to know at the start.
  • Nothing can be found out if you do not start.
  • Make the best assumptions, use available data and get better as you go.
About the Author