Problem with Normality Tests for Cpk/Ppk and Solution
 This topic has 2 replies, 3 voices, and was last updated 2 years, 5 months ago by David007.


February 22, 2020 at 1:04 pm #246296
PharmaUser1, Participant (@PharmaUser1)
Hello community,
In real-world situations, outliers or rounded data often cause normality tests to fail even when the histogram looks sufficiently normal. In addition, when the sample size is fairly large, normality tests tend to fail nearly every time.
Questions:
 Do you know of literature I could cite showing that normality tests are overly pessimistic for large sample sizes?
 What are the alternatives? For example, you could check data over a long interval in the past to provide evidence that the histogram has always looked normal.
 This topic was modified 2 years, 5 months ago by PharmaUser1.
February 22, 2020 at 3:44 pm #246301
Robert Butler, Participant (@rbutler)
As you noted, given a big enough sample (or a small one, for that matter), even from a package that generates random numbers from an underlying normal distribution, there is an excellent chance you will fail one or more of the statistical tests. That is the problem: the tests are extremely sensitive to any deviation from perfect normality (too pointed a peak, tails that are too heavy, a couple of data points just beyond 3 standard deviations, etc.). This is why you should always plot your data on a normal probability plot (histograms are OK, but they depend on binning and can easily mislead) and look at how the data behave relative to the perfect reference line.
Once you have the plot, inspect it visually using what is often referred to as “the fat pencil” test: if you can cover the vast majority of the points with a fat pencil placed on top of the reference line (and, yes, it is a judgement call), it is reasonable to assume the data are acceptably normal and can be used for calculating things like the Cpk.
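For anyone who wants the coordinates of such a plot without a statistics package, here is a minimal sketch using only the Python standard library. It is an illustration, not a prescribed method: the function names, the Blom plotting positions, the seed, and the simulated samples are all assumptions made for the example. The correlation it prints is only a rough numerical companion to the visual check (closer to 1 means the points hug a straight line more tightly), not a replacement for looking at the plot.

```python
import math
import random
from statistics import NormalDist, mean


def normal_plot_points(data):
    """Return (theoretical z, ordered value) pairs for a normal probability plot."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()
    # Blom plotting positions (an assumed convention; others are in use too).
    z = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(z, ordered))


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def straightness(data):
    """Correlation between theoretical quantiles and the ordered data."""
    z, x = zip(*normal_plot_points(data))
    return pearson(z, x)


rng = random.Random(1)  # fixed seed so the illustration is repeatable
normal_sample = [rng.gauss(50.0, 2.0) for _ in range(100)]
skewed_sample = [rng.expovariate(1.0) for _ in range(100)]

print(f"normal sample:      r = {straightness(normal_sample):.4f}")
print(f"exponential sample: r = {straightness(skewed_sample):.4f}")
```

Plotting the pairs from `normal_plot_points` with any charting tool gives the picture to lay the pencil on.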
I would recommend you calibrate your eyeballs by plotting different-sized samples from a random number generator with an underlying normal distribution on normal probability paper to get some idea of just how odd this kind of data can look (you should also run a suite of tests on the generated data to see how often the data fail one or more of them).
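The "run a suite of tests on generated data" exercise can be sketched roughly as follows, again with only the Python standard library. Everything here is an assumption for illustration: the two hand-written tests (Anderson-Darling and Jarque-Bera) stand in for whatever suite your software offers, the sample sizes, seed, and repetition count are arbitrary, and the 0.752 cut-off is the commonly tabulated 5% value for the adjusted Anderson-Darling statistic when mean and standard deviation are estimated, so check it against your own reference. The Jarque-Bera p-value uses the asymptotic chi-square approximation, which is known to be rough for small samples.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def anderson_darling_rejects(data):
    """Anderson-Darling normality test at the 5% level (mean and sd estimated)."""
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    # Clamp the fitted CDF away from 0 and 1 so the logarithms stay finite.
    cdf = [min(max(fitted.cdf(x), 1e-12), 1 - 1e-12) for x in sorted(data)]
    s = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
            for i in range(n))
    a2 = -n - s / n
    a2_adjusted = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    return a2_adjusted > 0.752  # assumed 5% critical value, see note above


def jarque_bera_rejects(data):
    """Jarque-Bera skewness-kurtosis test at the 5% level (asymptotic p-value)."""
    n = len(data)
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)  # chi-square survival function with 2 df
    return p_value < 0.05


def failure_rates(n, reps, rng):
    """Fraction of truly normal samples rejected by each test and by either."""
    ad = jb = either = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        a = anderson_darling_rejects(sample)
        j = jarque_bera_rejects(sample)
        ad += a
        jb += j
        either += (a or j)
    return ad / reps, jb / reps, either / reps


rng = random.Random(2020)  # fixed seed so the illustration is repeatable
results = {n: failure_rates(n, 200, rng) for n in (20, 100, 500)}
for n, (ad, jb, either) in results.items():
    print(f"n={n:4d}  AD fails {ad:.1%}  JB fails {jb:.1%}  at least one fails {either:.1%}")
```

The "at least one fails" column is the number to compare against the nominal 5% of a single test; adding more tests to the suite can only push it up.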
I can’t provide a citation for the hypersensitivity of the various tests, but to get some idea of just how odd samples from a normal population can look I’d recommend borrowing a copy of Fitting Equations to Data by Daniel and Wood (mine is the 2nd edition) and looking at Appendix 3A, Cumulative Distribution Plots of Random Normal Deviates.
I would also recommend you look at normal probability plots of data from distributions such as the bimodal, exponential, log normal, and any other underlying distribution that might be of interest so that you will have a visual understanding of what normal plots of data from these distributions look like.
February 26, 2020 at 10:40 pm #246379
David007, Participant (@David007)
For the rounded data problem, an omnibus skewness-kurtosis test such as Doornik-Hansen (or Jarque-Bera) is useful.
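A rough way to see why a moment-based test copes with rounding is to compare skewness, kurtosis, and the Jarque-Bera statistic before and after rounding the same simulated sample. The sketch below is only an illustration under stated assumptions: the sample (2000 normal values, mean 10, sd 1), the rounding step of 0.5, the seed, and the function name are made up for the example, the p-value is the asymptotic chi-square one, and Doornik-Hansen is not implemented here.

```python
import math
import random
from statistics import mean


def skew_kurt_jb(data):
    """Return (skewness, kurtosis, Jarque-Bera statistic, asymptotic p-value)."""
    n = len(data)
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return skew, kurt, jb, math.exp(-jb / 2)  # chi-square survival, 2 df


rng = random.Random(7)  # fixed seed so the illustration is repeatable
raw = [rng.gauss(10.0, 1.0) for _ in range(2000)]
# Round to the nearest 0.5, i.e. a resolution of half a standard deviation.
rounded = [round(x * 2) / 2 for x in raw]

print(f"distinct values: raw {len(set(raw))}, rounded {len(set(rounded))}")
for label, sample in (("raw", raw), ("rounded", rounded)):
    skew, kurt, jb, p = skew_kurt_jb(sample)
    print(f"{label:8s} skew={skew:+.3f}  kurt={kurt:.3f}  JB={jb:.2f}  p={p:.3f}")
```

The rounded sample collapses to a handful of distinct values, which creates the heavy ties that rounding is blamed for above, while the skewness and kurtosis that feed the omnibus test shift only slightly.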