This is one of two articles by David Wetzel that explore the value of developing a data management plan as the initial step in the Measure phase of the Six Sigma DMAIC methodology. The other article is “Data Management Plans Can Reduce Project Cycle Time.”
Understanding data and defining process parameters (input) or product characteristics (output) is the beginning of data-based decision-making, the very heart of Six Sigma. A data management plan (DMP), which outlines the use of data, can be employed as a key document in the Measure phase of DMAIC. Because of its rigor, its identification of process validation (Control) and its stratification factors (Analysis), a DMP can be one of the pivotal planning documents for a Six Sigma project.
In order to select the appropriate measurement system analysis (MSA) method, project teams should ask three questions related to data collection and validation:
1. What is the data source or location? Answers to this question help to clearly identify the point at which the raw data is collected. The most common mistake made in answering this question is not to identify where reports come from. Examples of raw data collection include: on tags, in log books, entered into databases, scribbled on surveys, interpreted from phone conversations or automatically tallied by machinery.
2. Who is the data collector? The answer to this question is typically a front line employee: operator, clerk, waiter or other. If the data is scanned or automatically tallied by machinery or computer, then a simple entry of the method employed is adequate.
3. What is the sampling plan? This question is often confused with reporting. The question is meant to apply to raw data. How often is data collected? Examples include: continuously, once per minute, each setup, each shift, each customer contact, every fifth call.
The answers to these three sampling questions determine how quickly the team will be able to validate improvements and verify sustainability of its process after changes have been made. Combined with family of measure (FoM), the answers can be used to create a handy reference sheet.
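As a rough illustration of such a reference sheet, the three answers and the family of measure can be kept together as one record per metric. This is only a sketch: the class name, field names and example values are hypothetical and are not taken from any standard DMP template.

```python
from dataclasses import dataclass


@dataclass
class DmpEntry:
    """One row of a hypothetical DMP reference sheet."""
    metric: str             # the project metric being tracked
    family_of_measure: str  # e.g. counts, time, measures
    data_source: str        # where the raw data is first recorded
    data_collector: str     # who (or what method) records it
    sampling_plan: str      # how often raw data is collected

    def as_row(self) -> str:
        # Join the fields into one line suitable for a printed sheet.
        return " | ".join([
            self.metric,
            self.family_of_measure,
            self.data_source,
            self.data_collector,
            self.sampling_plan,
        ])


# Illustrative values only.
entry = DmpEntry(
    metric="Scrap rate",
    family_of_measure="Counts",
    data_source="Scrap tags at press 4",
    data_collector="Press operator",
    sampling_plan="Each shift",
)
print(entry.as_row())
```

Keeping the source, collector and sampling plan on the same line as the metric makes it easier to see at a glance which measurement system analysis each metric will need.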
Every project is a comparative experiment. Every team compares “after-data” to “before-data” to validate process improvements. The comparison can take on many forms and involve many different statistics. The before-data usually consists of one of three levels – no data, occasional data or lots of data. If the data does not exist, this is a potential project killer or could at least extend project cycle time significantly because the team will have to define, collect and validate its own data. If the data is collected occasionally (less often than weekly), the team will have to think about more frequent data collection or face an extended project cycle time. If there is lots of data, collected frequently (at least weekly) for a long period of time, a trend analysis should be conducted to see if the team can consider the data “historical.” If the data is historical – perhaps several quarters’ worth – then the team can validate its process improvements faster with less effort (smaller sample size requirements for the comparative experiment).
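The three levels of before-data can be checked with a few lines of code. The sketch below assumes the team has a list of dates on which raw data was collected; the function name is hypothetical, and reading "at least weekly" as an average gap of seven days or less is an assumption made for illustration. It does not test how long the collection period is, which still has to be judged separately.

```python
from datetime import date, timedelta


def classify_before_data(collection_dates):
    """Return 'none', 'occasional' or 'frequent' for a list of dates."""
    if not collection_dates:
        return "none"
    if len(collection_dates) < 2:
        # A single observation cannot show a collection frequency.
        return "occasional"
    ordered = sorted(collection_dates)
    gaps = [(later - earlier).days
            for earlier, later in zip(ordered, ordered[1:])]
    average_gap = sum(gaps) / len(gaps)
    # Assumption: "at least weekly" means an average gap of 7 days or less.
    return "frequent" if average_gap <= 7 else "occasional"


# Illustrative dates only.
start = date(2024, 1, 1)
weekly = [start + timedelta(days=7 * i) for i in range(12)]
monthly = [start + timedelta(days=30 * i) for i in range(6)]
print(classify_before_data([]))
print(classify_before_data(weekly))
print(classify_before_data(monthly))
```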
Regardless, historical data must be collected and displayed to document conditions before any process improvement. Summary statistics must be calculated and assumptions tested in order to determine the correct comparative experiments to be carried out to validate process improvements.
However, people simply do not like looking at numbers – even summaries of numbers (means, medians, ranges, standard deviations). Most people are more easily able to understand graphical representations of data. This has two benefits. First, a team can visualize variability and skewness better with bar charts and histograms. Second, a team can visualize historical performance over time better with a run, individual or median chart.
Armed with summary statistics, graphical displays of central tendency, variability, skewness, graphical displays of time, and a little knowledge of research methods (control groups and randomness), the project team can choose appropriate comparative experiments to validate future process improvements. The team should consider four points to help them lay out a process and validation strategy:
Visualizing means is quite natural for most team members but looking at variability, skewness and outliers for the first time can be new and insightful. Is the team satisfied with the variability in its main project metric? Do the team members like where it is centered? What summary statistics are appropriate to describe the main project metric? Most teams by habit default to the mean; however, this has two problems – it does not include variability and it is not the right summary statistic for skewed data sets, which should be described by the median and percentiles.
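A small example shows how far the mean and median can drift apart in a skewed data set. The cycle times below are invented for illustration, and the skewness figure uses Pearson's second coefficient, 3 × (mean − median) / standard deviation, chosen here only because it is simple to compute; it is not a formal normality check.

```python
import statistics


def summarize(values):
    """Return basic summary statistics for one project metric."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    # Pearson's second skewness coefficient (a rough indicator only).
    skew = 3 * (mean - median) / stdev if stdev else 0.0
    return {
        "mean": mean,
        "median": median,
        "stdev": stdev,
        "q1": q1,
        "q3": q3,
        "skew": skew,
    }


# Hypothetical cycle times in days, with one very long order.
cycle_times = [2, 3, 3, 4, 4, 5, 5, 6, 30]
summary = summarize(cycle_times)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Here a single 30-day order pulls the mean well above the median of 4 days, so a team reporting only the mean would overstate the typical cycle time.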
Many poor and costly organizational decisions have been made based on the wrong central tendency statistic, e.g., using the mean when the median was more appropriate and insightful. It is for these reasons that it is imperative teams learn how to employ histograms to graphically display central tendency and variability and to conduct and interpret normality checks. The references shown in the table below can be used to help teams choose the appropriate graphical method to display their historical or before-data.
Family of Measure
Attribute MSA (Counts, Pass/Fail, Disposition, Type)
or Paired t-Test of Counts
Paired t-Test of Counts
Attribute MSA (Counts or Categories),
Paired t-Test of Days, Hours
Traditional (ANOVA or Range Method)
Time Study or Paired t-Test of Time
Mixtures of Above – Usually Paired t-Test of Two Sample Proportion Test Using Normal Approximation
Graphical displays of data over time – run charts in particular – are familiar to most project teams. The challenge is to have the team upgrade these simple charts which have inherent limitations (management by means and no decision limits) to more useful and insightful alternatives, such as individual control charts, median control charts or more advanced control charts. Every main project metric should be at least displayed on an individual chart or median chart since both can handle counts and measures.
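The decision limits that a run chart lacks can be sketched for an individual chart as follows. The calculation uses the common individuals and moving range (XmR) convention of placing limits at the mean plus or minus 2.66 times the average moving range; the function names and sample values are hypothetical.

```python
def individuals_chart_limits(values):
    """Return (lower limit, center line, upper limit) for an individual chart."""
    if len(values) < 2:
        raise ValueError("at least two observations are required")
    # Moving range: absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.66 is the standard XmR constant (3 / 1.128).
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar


def points_outside_limits(values):
    """Return the positions of points beyond either decision limit."""
    lower, _, upper = individuals_chart_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical daily defect counts; the last value is a spike.
stable = [10, 12, 11, 13, 12, 11, 10, 12]
with_spike = stable + [30]
print(individuals_chart_limits(stable))
print(points_outside_limits(stable))
print(points_outside_limits(with_spike))
```

A point outside the limits is a signal worth investigating, which is exactly the decision rule a plain run chart does not provide.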
Another consideration is desirability. What is the main purpose of the process improvement team? There are essentially four ways to improve a process. Most teams, especially first-time teams, are interested in shifting a mean or median (e.g., scrap is too high, customer satisfaction is too low or credit card fraud is too high). All teams should strive to reduce variability, but for a few teams it is the main goal of the project (e.g., on average “delivery times to commit” are 0 but the company experiences +/- 50 days of variability). A third potential purpose of a process improvement is to stabilize the process. Besides being a project goal, all projects must both check for reductions in variability and prove stability after process improvements have been made to ensure sustainability. A fourth way to improve a process is to make it more capable, which was the early history and original purpose of Six Sigma.
All four of these methods employ comparative experiments to validate process improvements. One payoff of the DMP is that the team identifies its comparative experiment plan. Which comparative experiments will the team employ to validate its process improvements? For most teams this is a simple two-sample t-test to show a shift in means and an F-test to show whether a reduction in variation was achieved. However, it can quickly become more complicated if assumptions are violated, control groups are not employed and nonparametric statistics are required. A project team needs useful decision trees, additional software skills and a capable mentor (Black Belt, statistician or Master Black Belt) to help guide the team to create a valid process improvement validation strategy using comparative experiments, due diligence and good research methods.
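The two test statistics mentioned above can be computed by hand to show what the comparison involves. This sketch uses Welch's form of the two-sample t statistic, which does not assume equal variances, and the ratio of sample variances for the F-test. The data are invented, and the code stops at the statistics themselves: judging significance still requires critical values or p-values from a table or a statistics package.

```python
import math
import statistics


def welch_t_statistic(before, after):
    """Welch's two-sample t statistic for a shift in means."""
    mean_before = statistics.mean(before)
    mean_after = statistics.mean(after)
    var_before = statistics.variance(before)  # sample variance
    var_after = statistics.variance(after)
    standard_error = math.sqrt(var_before / len(before)
                               + var_after / len(after))
    return (mean_before - mean_after) / standard_error


def variance_ratio(before, after):
    """F statistic: before-variance divided by after-variance."""
    return statistics.variance(before) / statistics.variance(after)


# Hypothetical scrap counts per shift, before and after a change.
before = [12, 15, 11, 14, 13, 16, 12, 15]
after = [10, 11, 10, 12, 11, 10, 11, 11]
print(f"t statistic: {welch_t_statistic(before, after):.3f}")
print(f"F statistic: {variance_ratio(before, after):.3f}")
```

A large positive t suggests the mean dropped after the change, and an F ratio well above 1 suggests the variation shrank, but both need to be compared against the appropriate reference distribution before the team claims an improvement.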
This section of the DMP only includes one question. What is the team’s plan for slicing and dicing its historical or before-data? The purpose of this question is to look at the data in as many different ways as possible. Often, stratifying the data yields insights into possible root causes and sources of variation. The tool of choice for this task is analysis of variance (ANOVA) to determine if there are differences between shifts, suppliers, customers, machines, methods, plants, providers, procedures and on and on. Again, until the team is comfortable with conducting and interpreting ANOVAs, it may require the help of a capable mentor. This effort is one of the three foundational ways to conduct the Analysis phase and pivotal to project acceleration and success.
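To make the stratification idea concrete, the following sketch computes the one-way ANOVA F statistic for a metric split by one factor, here shift. The function name and data are hypothetical, and the code returns only the F statistic; it does not compute a p-value or check the ANOVA assumptions of normality and equal variances.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a dict of label -> values."""
    all_values = [v for values in groups.values() for v in values]
    n_total = len(all_values)
    n_groups = len(groups)
    grand_mean = sum(all_values) / n_total

    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of points around their own group mean
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    if ms_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    return ms_between / ms_within


# Hypothetical defect counts stratified by shift.
by_shift = {
    "day": [10, 11, 12],
    "evening": [11, 12, 13],
    "night": [15, 16, 17],
}
print(f"F statistic: {one_way_anova_f(by_shift):.1f}")
```

A large F statistic indicates that the differences between shifts are large relative to the variation within each shift, which points the team toward the night shift as a place to look for root causes.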
In addition to the uses described here, there are many other ways to use data management plans, and to customize them for different companies and cultures. Some of these uses include:
Outside of the Six Sigma framework, DMPs have been used as auditing tools of suppliers and as a way for leadership teams to identify and manage internal and external measures for partners, suppliers and customers.
The data management plan can be deployed as a key document for the planning and conducting of the Measure phase of DMAIC. Because of its rigor, its identification of process validation (control), and the stratification factors (analysis), it can be one of the pivotal planning documents for the entire project.