Measurement system analysis (MSA) determines whether the measurement system is adequate and confirms that significant error is not introduced to the true value of a process characteristic. MSA is one of the most misunderstood and underused concepts in Six Sigma. This article highlights two of the common mistakes made during the study and explains how to avoid them.
Mathematically, total variance is equivalent to the sum of true variance and the measurement system error. Ideally, measurement system error would be zero but, practically speaking, this is rarely the case because of factors such as worn and uncalibrated gauges, inconsistency of an appraiser, and different knowledge levels among the appraisers. In other words, the observed variation should arise almost entirely from differences among the parts being measured. It is important to keep the measurement system error as low as possible.
Variance (total) = variance (true) + variance (measurement error)
To consider a measurement system as adequate, there are set rules based on the data type being used. For continuous data, 1) gage R&R has to be within 10 percent (10 percent to 30 percent allowed if the process is not critical) of the total study variation, and 2) the number of distinct categories has to be greater than four. (For discrete data where attribute agreement analysis is used, kappa value has to be at least 0.7 for nominal and ordinal data, and Kendall’s correlation coefficient [with a known standard] has to be at least 0.9 for ordinal data.)
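The two continuous-data rules above can be sketched in a few lines of Python. This is only an illustration: the variance components are made-up numbers, the function names are hypothetical, and the number of distinct categories is computed with the commonly used form √2 × σ(parts) / σ(gage R&R), truncated to a whole number, which is an assumption about the convention rather than something the study itself prescribes.

```python
import math

def gage_rr_summary(var_parts, var_grr):
    """Return (%GRR of total study variation, number of distinct categories)."""
    sd_parts = math.sqrt(var_parts)
    sd_grr = math.sqrt(var_grr)
    # Total variance is the sum of part variance and measurement error variance.
    sd_total = math.sqrt(var_parts + var_grr)
    pct_grr = 100.0 * sd_grr / sd_total
    # Assumed convention: truncate to a whole number of categories.
    ndc = int(math.sqrt(2) * sd_parts / sd_grr)
    return pct_grr, ndc

def is_adequate(pct_grr, ndc):
    """Apply the article's rules: %GRR within 10 and more than four categories."""
    return pct_grr <= 10.0 and ndc > 4

# Hypothetical variance components, for illustration only.
pct_grr, ndc = gage_rr_summary(var_parts=9.0, var_grr=0.09)
print(f"%GRR = {pct_grr:.2f}, ndc = {ndc}, adequate = {is_adequate(pct_grr, ndc)}")
```

Note that the percentage is a ratio of standard deviations, not of variances, so a measurement variance that is 1 percent of the part variance still yields a gage R&R of roughly 10 percent of study variation.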
The process of conducting an MSA study for continuous and discrete data is similar. Take 10 to 20 samples for a study, provide them to two or three appraisers for the first trial, and then rerun the study for a second trial. The main difference is that for continuous data the appraisers use a gauge to measure the part. For discrete data, however, the decision on whether the transaction is defective is left to the knowledge of the appraisers.
One common challenge faced in an MSA study of discrete data is regarding the two trials. How can the bias be removed when appraisers are given the same samples for the two trials through an email? When provided the same sample twice at the same time, the appraisers will surely provide the experimenter the same results for Trials 1 and 2; thus, no repeatability issues will be detected when the study is done in this manner. Additionally, if the two appraisers are aware of the study being run, then the reproducibility component results will be biased. The following example highlights such a mistake being made during an MSA study.
Example: Compliance Project in Banking
A project leader at a financial institution was asked to do an MSA study to confirm that the measurement system was adequate. He ran the study for a week, put 10 samples in a spreadsheet and sent them to the two appraisers. The study was completed and the data was shared with the Black Belt (BB). The BB completed the study in a statistical analysis program and found that there was no issue in repeatability. There were, however, some mismatches between the two appraisers. Curious, the BB asked the project leader how the study was conducted.
The project leader explained that he documented 10 samples in a spreadsheet and sent them to the two appraisers through separate emails. For the second trial, the project leader again sent the same 10 samples in a spreadsheet via email. The BB told the project leader that while the separate emails ensured that neither appraiser knew a second individual was assessing the same samples, there was a repeatability bias involved in the process, because each appraiser could recognize the samples from the first trial. The BB suggested that the project leader instead follow a revised, documented procedure to ensure that there would be no repeatability or reproducibility bias involved in the study.
The project leader took 10 new samples and provided them to the appraisers, who were subject matter experts (SMEs), following the new documented method. This time there were differences within appraisers, but the kappa value was within the permissible limit. By using this process, the repeatability bias was removed and the true measurement system error was determined.
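To make the kappa check concrete, here is a minimal sketch that compares one appraiser's two trials. It uses Cohen's kappa, a common two-rating form of the statistic; statistical packages may report a different variant, so the number will not necessarily match what the BB's software produced. The ratings below are invented for illustration ("D" for defective, "G" for good), and the function name is hypothetical.

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equally long lists of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of samples rated identically.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement by chance, from each list's category proportions.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both lists use a single identical category
    return (observed - expected) / (1.0 - expected)

# Hypothetical ratings from one appraiser on the same 10 samples.
trial_1 = ["D", "D", "G", "G", "G", "D", "G", "G", "D", "G"]
trial_2 = ["D", "D", "G", "G", "G", "D", "G", "G", "G", "G"]

kappa = cohen_kappa(trial_1, trial_2)
print(f"within-appraiser kappa = {kappa:.2f}, acceptable = {kappa >= 0.7}")
```

The same function can be applied to two different appraisers' ratings of the same trial to look at reproducibility. In this invented data the appraiser disagrees with himself on one sample out of 10, yet kappa still clears the 0.7 threshold.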
Another common challenge is frequently observed when an MSA study is done for a set of continuous data. How should a sample be selected when the manufacturing process happens on a number of machines that results in varying product sizes? Can that influence the MSA study?
Example: Multiple Machines in Manufacturing
A supervisor was conducting an MSA study for the thickness parameter of a grinding wheel. She had parts produced from different presses, which came in sizes varying from 5 mm to 200 mm in thickness (categorized into large, medium and small thickness wheels). The supervisor thought that one study of 10 samples done with two appraisers would be good enough for the study.
She met with the Six Sigma expert in the organization and asked if she was using the right approach to conduct the study. The Six Sigma expert asked her how she would ensure that no measurement error was introduced (taking linearity into consideration). The expert recommended that the supervisor ensure that the gauge is linear across the entire range of measurements (varying range of thicknesses).
The supervisor then took another set of 10 samples each for the small, medium and large thickness wheels to check the linearity of the gauges (the gage R&R). This way the supervisor ensured that both accuracy- and precision-related measurement errors were correctly addressed during the study.
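One common way to look at linearity is to regress the gauge's bias (average measured value minus a known reference value) on the reference value: a slope close to zero suggests the bias does not change across the range. The sketch below assumes that approach; the reference thicknesses and measured averages are invented, and the function name is hypothetical.

```python
def linearity_fit(reference_values, measured_means):
    """Least-squares fit of bias = intercept + slope * reference."""
    biases = [m - r for r, m in zip(reference_values, measured_means)]
    n = len(reference_values)
    mean_ref = sum(reference_values) / n
    mean_bias = sum(biases) / n
    sxx = sum((r - mean_ref) ** 2 for r in reference_values)
    sxy = sum((r - mean_ref) * (b - mean_bias)
              for r, b in zip(reference_values, biases))
    slope = sxy / sxx
    intercept = mean_bias - slope * mean_ref
    return slope, intercept

# Hypothetical reference thicknesses (mm) spanning the 5 mm to 200 mm range,
# with the average of repeated gauge readings at each reference.
references = [5.0, 50.0, 100.0, 150.0, 200.0]
measured = [5.01, 50.05, 100.10, 150.15, 200.20]

slope, intercept = linearity_fit(references, measured)
print(f"bias slope = {slope:.5f} mm per mm, intercept = {intercept:.4f} mm")
```

In this made-up data the bias grows by about 0.001 mm per millimetre of thickness, which would be a sign that the gauge reads progressively high on thicker wheels. Whether a slope of that size matters depends on the tolerance of the part being measured.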
While conducting MSA studies, be aware of their practical challenges and how to remove them so as to avoid measurement errors.


Comments
Hello,
Regarding the Multiple Machines in Manufacturing MSA with different sizes of grinding wheels: what we don’t know is why the measurement study was being performed. Normally the purpose of an MSA is to ensure the product meets an external/internal requirement for a customer (in spec/out of spec, in control/out of control, etc.). In this case, it almost appears as if the purpose is to tell the difference between a 5 mm wheel and a 200 mm wheel. The Six Sigma expert did well to help her understand that the samples needed to be in smaller buckets. However, to lump the grinding wheels into buckets with ranges of 70 mm each is still a nonstarter. A metric Stanley tape measure will provide all the discrimination needed to tell the difference between a 5 mm, 10 mm, 15 mm, etc., up to the ten samples in the study. I have normally seen this type of MSA when a Green Belt/Black Belt was trying to check off a box as part of their certification.
Your continuous example may be misleading. If you pick 3 different “SKU’s” and test the measurement system across such a broad range, the % error of the MSA relative to the process will be mistakenly thought to be small. You want to evaluate a measurement system for a product line and compare how the measurement is for variation compared to the process variation and specs for that 1 product.
There’s nothing wrong with stating you’d want to check the linearity across the entire range, but you’d get misled on the MSA variation, based on my understanding of what was presented. If there’s a huge difference in product characteristics on the same measurement device, one could easily say you would need to do an MSA at various points across the spectrum (e.g., the 5–200 mm thickness is too large a product variation; I’m assuming no single product spec is 5–200 mm but a much smaller range).
Good topic.
I like the article. Here are some suggestions to enhance its understandability.
1. You state at the beginning (and provide a formula) “Mathematically, total variance is equivalent to the sum of true variance and the measurement system error.” You should explain that since we are measuring several parts, that the equation becomes variance(total) = variance(parts) + variance(measurement error). That is, the “true variance” in MSA is the variance of the parts. If you only measure one part that does not change during the measurement period, then there is only measurement error if there is any variance.
2. You state that one criterion for a good measurement system is that “the number of distinct categories has to be greater than four,” when doing continuous MSA. However, your solution for your example of multiple machines with “sizes varying from 5 mm to 200 mm in thickness” is not appropriate for this criterion. (Note you also mention these thicknesses are “categorized into large, medium and small thickness wheels” suggesting that this should be a discrete MSA and not continuous. Clarify this for the readers.)
3. You should never knowingly use parts of multiple sizes (even if the tolerances are the same) to determine the number of distinct categories. The formula is: number of distinct categories = √2 × σ_parts / σ_measurement. Thus, the greater the variation of the parts, the more likely this quantity will be four or more. By merely choosing parts that are very different (as in your example where they range from 5 to 200 mm), you will get more than four. Yet you have no idea what the smallest difference is that your measurement system can detect. Or worse, you believe it is acceptable for distinguishing parts you need to recognize as different when it isn’t capable of that. Your recommendation of checking for linearity is good but not applicable to determining the number of distinct categories or, more informatively, the smallest difference your measurement system can detect. It is a myth that
4. Fortunately, there is a solution and it doesn’t require measuring multiple parts. You select the smallest difference Δ you want to recognize with your measurement system. Then determine the standard error of your measurement system by measuring only one part (two if you need to check that the variance is constant across the linear range) multiple times. Then if σ_measurement ≤ Δ√2, you can distinguish between parts that differ by Δ or more. I have explained this, along with some myths surrounding measurement system analyses, in one of my books.
5. You need to separate the criteria for evaluating a measurement system into two areas: a) to distinguish between good parts and bad parts relative to specifications and b) to distinguish between parts, regardless of whether they meet specs. Percent of tolerance addresses the first but not the second, while number of distinct categories addresses the second but not the first.
Hi Rohin,
I just loved this article! As a former high school teacher, I know well the study mistakes students make – esp. studying what you already know.
Every time I mentioned this mistake to a student and/or parent it was like an epiphany. Oh, and telling students about this study mistake needs to be done individually. Students who make this mistake don’t hear the suggestion when told to the entire class.
I always suggested that students list what they know (yes, handwrite the list!), put a check by it (give credit to yourself which gives confidence), and then move on!