Non-Normal Data and Process Capability
 This topic has 26 replies, 15 voices, and was last updated 14 years, 3 months ago by Severino.


April 23, 2008 at 7:47 pm #49933
jrajkowski (Participant, @jrajkowski)
I have several data sets obtained from destructive testing for a process qualification. These qualifications are customer driven, and the customer demanded that we use a sample size of 60 for variable data. I tried to use Minitab to fit a distribution to the sample data so a capability analysis can be performed, but no distribution tested shows a p-value > 0.01. Cpk or Ppk is what the customer wants to use as the acceptance criterion for their qualification. What can I do?
April 23, 2008 at 7:55 pm #171420
Robert Butler (Participant, @rbutler)
For non-normal data, take your data and plot it on normal probability paper and identify the 0.135 and 99.865 percentile values (Z = ±3). The difference between these two values is the span for producing the middle 99.73% of the process output. This is the equivalent 6-sigma spread. Use this value for computing the equivalent process capability.
For more details try Measuring Process Capability by Bothe. Chapter 8 is titled "Measuring Capability for Non-Normal Variable Data."

April 23, 2008 at 8:07 pm #171421
Since you are using Minitab, did you also try the Johnson and Box-Cox transformations? Nothing against Messrs. Butler and Bothe, but the location of 6 sigma under the curve from that approach is not far off from just using Minitab's Normal Capability Analysis. Not horrible, but if I were your customer, I wouldn't accept it.
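A minimal sketch of the percentile-span method Robert Butler describes above, assuming Python with numpy; the sample data and specification limits here are made up, and with only 60 points the extreme percentiles would normally be read off a fitted probability plot rather than interpolated directly from the raw data:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.0, sigma=0.4, size=60)  # hypothetical non-normal sample
lsl, usl = 0.3, 3.5                                  # hypothetical specification limits

# The 0.135th and 99.865th percentiles bound the middle 99.73% of the output
p_low, p_med, p_high = np.percentile(data, [0.135, 50, 99.865])
span = p_high - p_low                                # equivalent "6-sigma" spread

pp_eq = (usl - lsl) / span                           # equivalent Pp
# Bothe-style equivalent Ppk uses the median and the one-sided half-spans
ppk_eq = min((usl - p_med) / (p_high - p_med),
             (p_med - lsl) / (p_med - p_low))
print(f"equivalent Pp = {pp_eq:.2f}, equivalent Ppk = {ppk_eq:.2f}")
```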
April 23, 2008 at 10:21 pm #171423
Why don't you transform your data?
April 23, 2008 at 10:40 pm #171425
Severino (Participant, @Jsev607)
Throw your data into a histogram and a normal probability plot. If you see that the data for the most part follow the normal distribution but you have some issues in the tails, you probably have outliers in your data. You should look for any special causes that could have created those outliers, correct them, and re-perform your study.
Alternatively, if you find your data seems to follow a pretty smooth distribution, then there should be some transformation or alternative distribution that suits your data. Sometimes correcting these issues is as simple as taking the natural log or square root of your data.
Finally, if you are not assured that your process is in control, there is no point in doing a capability study; your values will be worthless. Put your data in a control chart and see if everything falls within the control limits. Take a look at your measurement system and ensure it has good R&R (which might be difficult since you are doing destructive testing). Correct any issues you find, then try your study again.

April 24, 2008 at 1:48 pm #171444
jrajkowski (Participant, @jrajkowski)
I tried the Johnson and Box-Cox transformations, but Minitab showed that even after transformation the p-value is still too low. The transformations I tried did not result in normal data.
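For reference, the kind of transformation check Minitab performs can also be sketched outside Minitab; this is a rough sketch, assuming Python with scipy and made-up data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.weibull(1.5, size=60) * 10 + 0.5           # hypothetical, strictly positive data

transformed, lam = stats.boxcox(data)                 # Box-Cox with maximum-likelihood lambda
ad = stats.anderson(transformed, dist='norm')         # Anderson-Darling normality check
print(f"lambda = {lam:.2f}, A-D statistic = {ad.statistic:.3f}, "
      f"5% critical value = {ad.critical_values[2]:.3f}")
# If the A-D statistic still exceeds the critical value, the transform has not
# produced approximately normal data, which is the situation described above.
```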
April 24, 2008 at 2:08 pm #171446
jrajkowski,
This is not nearly as exotic as transforms or even fitting a distribution, and you won't get a p-value, but you have given no indication of what the data look like.
Plot the stuff and let us know.
If it is multimodal, you will never get there. Correct whatever has made it so.
Plot the data.
If there are large areas without any values and/or all the values are stacked at a few points (chunky data), you will never get there.
Plot the data.
Or maybe it is uniform, i.e. all values have equal numbers of observations. You won’t get there.
Plot the data!
If it is only 60 values you should be able to print the data for all to see.

April 24, 2008 at 2:38 pm #171447
benjammin0341 (Participant, @benjammin0341)
Rather than transforming the data (transformation has both benefits and consequences), I would first recommend identifying which probability distribution best fits the data. You can do this in Minitab with Reliability and Survival; look for the best fit. Once you have identified it, you can proceed with non-normal capability analysis. The default distribution for non-normal capability analysis is Weibull, but if there is a significant difference in fit between the Weibull and whatever you have identified as best fitting the data, you can change it there. You can also either use your multiple subgroups or lump the data together into one large sample; my recommendation would be to use your subgroups, since that is how you collected the data.
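A rough sketch of the same fit comparison and non-normal capability estimate outside Minitab, assuming Python with scipy; the data and specification limits are hypothetical, and the p-values are optimistic because the parameters are estimated from the same data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = rng.weibull(2.0, size=60) * 5.0               # hypothetical sample
lsl, usl = 1.0, 12.0                                 # hypothetical specification limits

# Compare a few candidate distributions with a rough goodness-of-fit check
candidates = {"weibull": stats.weibull_min, "lognormal": stats.lognorm,
              "gamma": stats.gamma}
for name, dist in candidates.items():
    params = dist.fit(data, floc=0)                  # fix location at 0 for simplicity
    ks = stats.kstest(data, dist.cdf, args=params)
    print(f"{name:10s} KS p-value = {ks.pvalue:.3f}")

# Capability from the fitted Weibull: estimated fraction outside the specs
shape, loc, scale = stats.weibull_min.fit(data, floc=0)
frac_out = (stats.weibull_min.cdf(lsl, shape, loc, scale)
            + stats.weibull_min.sf(usl, shape, loc, scale))
print(f"estimated fraction nonconforming = {frac_out:.4%}")
```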
April 24, 2008 at 7:16 pm #171458
jrajkowski (Participant, @jrajkowski)
I have plotted the data, and over several different lot samples I see some of the things you described. There are a few outliers, some of the data is chunky, and some appears to be bimodal. This is most likely an issue with our test method.
Thanks all for your advice!

April 28, 2008 at 4:09 pm #171513
Forrest W. Breyfogle III (Member, @ForrestBreyfogle)
The point that Jsev607 was leading to is very important: "if you are not assured that your process is in control there is no point in doing a capability study. Your values will be worthless. Put your data in a control chart and see if everything falls within control limits." Also, perhaps something shifted over time and you are now calculating process capability from two different processes.
The data also need to be randomly collected over a longer period of time for you to really know whether your process will be stable over the long haul, and so that the sample represents what the customer will later experience relative to process capability indices (as opposed to only the first parts off a production process).
Also, if for some reason you do not have the production sequencing of your parts for input into Minitab, a Cp and Cpk calculation is meaningless; you will have to use Pp and Ppk. People often do not appreciate the importance of this point. To prove it, generate a random set of data in Minitab and calculate the Cp and Cpk values. Then rank this same data and recalculate Cp and Cpk. Notice how the Cp and Cpk values improved?
I did this exercise for a random set of data that had a mean of 0 and a standard deviation of 1, with specification limits of -1 and 1. The first time the Cp value was 0.39, while the second time the Cp was 2.38.
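The same exercise can be reproduced outside Minitab. Here is a minimal sketch, assuming Python with numpy and a within-subgroup sigma estimated from the average moving range (MR-bar divided by 1.128, the d2 constant for n = 2), which matches the MR-based calculation described above:

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(0.0, 1.0, size=100)       # random data: mean 0, standard deviation 1
lsl, usl = -1.0, 1.0                        # specification limits of -1 and 1

def cp_from_moving_range(x, lsl, usl):
    mr_bar = np.mean(np.abs(np.diff(x)))    # average moving range of adjacent values
    sigma_within = mr_bar / 1.128           # d2 constant for subgroups of size 2
    return (usl - lsl) / (6.0 * sigma_within)

print("Cp, original order:", round(cp_from_moving_range(data, lsl, usl), 2))
print("Cp, sorted data:   ", round(cp_from_moving_range(np.sort(data), lsl, usl), 2))
# Sorting makes adjacent values nearly equal, so MR-bar collapses and the
# "within" sigma, and hence Cp, is artificially inflated.
```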
Forrest Breyfogle

April 28, 2008 at 5:53 pm #171520
Forrest,
Your example is intentionally misleading. Shame on you, you should know better.
Will ordering the data make the subgroup range or moving range artificially small? Oh my god yes.
Will anyone ever do that? No.
What's your point?

April 28, 2008 at 6:15 pm #171521
Forrest W. Breyfogle III (Member, @ForrestBreyfogle)
My mistake; I did not know that everybody in this forum was aware that the standard deviation term in the Cp and Cpk calculation for a column of data in Minitab is calculated from the MR between adjacent data rows.*
Hence, if someone's 60 measurements were not collected sequentially out of the production facility and entered in that production sequence, they would probably be entering the data in an arbitrary order. My point was that if they entered the same data in a different order, they would most likely get different Cp and Cpk values (for the same set of 60 samples).
*In the conversations I have had, most practitioners are not aware of this calculation procedure for Cp and Cpk.
Forrest Breyfogle

April 28, 2008 at 6:38 pm #171522
Wow, maybe we should teach them. These are not difficult concepts, and Minitab's help menus and StatGuide explain it clearly.
The SPC book from AIAG also explains this clearly with equations given.
Do we have a bad metric, or sloppy teaching and lazy practitioners?

April 28, 2008 at 8:07 pm #171523
Chris Seider (Participant, @cseider)
Impractically inexperienced MBBs, or deployment models that emphasize lean and minimize the depth of statistical analysis yet still call it all Lean Six Sigma.

April 29, 2008 at 4:13 pm #171546
I'm with you. In my experience, most users are NOT aware of Minitab's standard deviation estimation using MR. Most, however, run across it when they check the math on the UCL and LCL in its control charts.
FYI, if you really are the Forrest Breyfogle of "Implementing Six Sigma", my hat is off to you! I still recommend (for 9 years now) that text to Six Sigma students who want a reference on the "deep science" topics.

April 29, 2008 at 7:09 pm #171557
Forrest W. Breyfogle III (Member, @ForrestBreyfogle)
Bob,
Glad you like my book, “Implementing Six Sigma” and are suggesting it to others.
Forrest Breyfogle

April 29, 2008 at 8:09 pm #171558
Bobrandon (Participant, @Bobrandon)
You are welcome. I enjoy promoting products here more than you.
April 29, 2008 at 8:28 pm #171559
Heebeegeebee BB (Participant, @HeebeegeebeeBB)
Exactly spot on!
I agree 100%.

April 30, 2008 at 1:34 am #171563
Engine Boy (Participant, @EngineBoy)
I believe that many experts have learned a lot from your book.
It is to Six Sigma what Juran's book is to Quality.
Just my opinion.

April 30, 2008 at 1:37 am #171564
Bobrandon (Participant, @Bobrandon)
I think Marlon or Dog Sxxt suited you better.

April 30, 2008 at 2:15 am #171565
Engine Boy (Participant, @EngineBoy)
A typical silly answer.
Who are those guys?
What is wrong with them?
I suggest that "Silly Guy" would suit you as a new screen name.

April 30, 2008 at 4:26 am #171567
As a matter of information, anyone who has found Forrest Breyfogle's earlier work useful might also be interested in his latest books on "Integrated Enterprise Excellence," which build well beyond the content of "Implementing Six Sigma".
April 30, 2008 at 1:48 pm #171582
Forrest W. Breyfogle III (Member, @ForrestBreyfogle)
Wow, to be mentioned in the same sentence as Juran, what an honor! Thank you!
I think it could be beneficial to some if I elaborate on Bob's earlier comment: "I'm with you. In my experience, most users are NOT aware of Minitab's stdev estimation using MR" (for Cp and Cpk calculations from an individuals chart input format).
I would like to make the following point, since it is my experience that people often do not appreciate the impact that subgrouping frequency can have both on how an individuals control chart looks (i.e., whether the process appears to be in control or not) and on the calculated Cp and Cpk values.
I will talk about a manufacturing situation but the same applies to transactional processes as well; e.g., hold time in a call center.
Consider a process that makes widgets, where a key output response varies depending upon the current state of a lot of things; e.g., raw material lot-to-lot variation that changes daily (and has a large impact on the process response, which the company may not know about), people, time of day, shifts, and ambient temperature.
Let's consider Practitioner 1, who chose an hourly subgrouping when creating an individuals control chart because they wanted to react to out-of-control conditions. They will then calculate the process capability indices Cp and Cpk, since the customer requires that this information be reported.
For this hourly sampling plan, the moving range (MR) between adjacent samples will be small relative to long-term process variability, since people do not change during most hours, the same batch of material is run throughout the day, temperature does not change much in one hour, etc. If someone were to only conduct a Cp and Cpk analysis (and report this to their customer), the numbers could look quite good, since the standard deviation would appear low relative to a long-term assessment of process variability (because the MR value was low and is in the denominator of the equation). If the practitioner were to run an individuals control chart (as they should), they would probably see a lot of out-of-control signals to react to (because the upper and lower control limits of the chart are also a function of the small calculated MR value).
Consider now Practitioner 2, who chooses a daily subgrouping. For this practitioner, the MR value will appear much larger, since raw material lot-to-lot variation and the other variables will occur between subgroups. The customer-reported Cp and Cpk values will not appear nearly as good as for Practitioner 1, because the MR value for this subgrouping will appear larger. In addition, an individuals control chart will not have as many out-of-control signals as Practitioner 1's did, because the MR value includes the variability from the other factors; i.e., time of day, raw material lot-to-lot variation, etc.
Practitioner 1 and Practitioner 2 can get a VERY different picture of the same process relative to Cp and Cpk, along with the assessment of the process's in-control state.
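A small simulation can illustrate the effect. This is only a sketch, assuming Python with numpy and made-up magnitudes for the lot-to-lot and within-day variation:

```python
import numpy as np

rng = np.random.default_rng(5)
days, per_day = 30, 8                                # 30 days, 8 hourly readings per day
lot_effect = rng.normal(0.0, 2.0, size=days)         # daily lot-to-lot variation
noise = rng.normal(0.0, 0.5, size=(days, per_day))   # short-term within-day variation
hourly = (lot_effect[:, None] + noise).ravel()       # hourly subgrouping: every reading
daily = hourly.reshape(days, per_day)[:, 0]          # daily subgrouping: one reading per day

def sigma_from_mr(x):
    return np.mean(np.abs(np.diff(x))) / 1.128       # sigma estimate from MR-bar (d2 for n = 2)

print("sigma from hourly subgrouping:", round(sigma_from_mr(hourly), 2))
print("sigma from daily subgrouping: ", round(sigma_from_mr(daily), 2))
# The hourly estimate reflects mostly within-day noise, so control limits are tight
# and Cp/Cpk look good; the daily estimate also captures the lot-to-lot swings,
# giving wider limits and less flattering indices.
```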
The question is: which is the best approach? This depends upon how you view the world. To assess how you view the world, ask yourself these questions: (1) Do you believe that control charts and process capability statements should give signals and be consistent with how you "view the world"? If the answer is yes, the next question is: (2) Do you believe that typical variation from raw material lots, normal time-of-day variation, and the others mentioned above should be considered a source of common-cause or special-cause variability?
If you vote "common" for question #2, then you and I have the same belief system. I use the term 30,000-foot-level to describe the control charting and process capability/performance assessment of the Practitioner 2 situation (although I suggest using only long-term PPM rates rather than the Cp, Cpk, Pp, and Ppk process capability indices, because these indices are hard to understand; if a customer asks for them, give them the indices, but internally I suggest using long-term PPM).
Forrest Breyfogle

May 1, 2008 at 4:14 am #171604
Severino (Participant, @Jsev607)
If you are doing a "daily subgrouping," why would you use an MR chart?

May 1, 2008 at 5:25 am #171606
Forrest W. Breyfogle III (Member, @ForrestBreyfogle)
You asked: "If you are doing a 'daily subgrouping,' why would you use an MR chart?"
Guess I was not clear or I don’t understand your question. I will try to clarify.
I prefer to focus on using an individuals control chart; however, one could use an XmR chart.
I did make reference to the fact that the individuals control chart's control limits are a function of the MR between adjacent subgroups; i.e., x-bar ± 2.66(MR-bar). If there is a daily subgrouping (as opposed to hourly), the MR would be larger, causing the control limits to be wider, for the previously described illustration.
With 30,000-foot-level control charting you first need infrequent subgrouping and sampling, so that the normal input process variability occurs between subgroups. This procedure would be appropriate for those whose belief system is that they do not want to react to output changes caused by the normal variability of these process inputs (x's) as though they were special-cause events.
With 30,000-foot-level reporting we would have an individuals control chart as described above, paired with a process capability statement for each stage of stability (i.e., each in-control region). If a process is in control, I prefer to say that the process is "predictable" (which is easier to understand).
The next obvious question is what is predicted. If the data are continuous, one could then do a probability plot of the raw data, where the specification limits on the probability plot can be used to determine the percentage nonconformance. Note that for the latest region of stability we can treat the data from this region as a random sample of the future (assuming no process changes occur).
I prefer to have both the control chart (left side) and probability plot (right side) on one page with a box under the two graphs stating: Process is predictable with ____ nonconformance (easier to understand than Cp, Cpk, Pp, and Ppk).
This is a format that I think most people can easily understand. You could also make a statement about cost implications from the nonconformance rate.
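As a sketch of how such a statement could be computed, assuming Python with scipy, a normal fit to the latest stable region, and hypothetical specification limits:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
stable_region = rng.normal(10.0, 0.8, size=60)   # data from the latest in-control region
lsl, usl = 8.0, 12.5                             # hypothetical specification limits

mu, sigma = stats.norm.fit(stable_region)
frac_nc = stats.norm.cdf(lsl, mu, sigma) + stats.norm.sf(usl, mu, sigma)
print(f"Process is predictable with approximately {frac_nc * 1e6:.0f} PPM nonconformance")
```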
I should also note that if the data are not normally distributed, a suitable transformation can be applied.
Hope this helps.
Forrest Breyfogle
May 4, 2008 at 1:13 pm #171734
Prabhakar G. (Participant, @Prabhakar.G.)
Dear Rajkowski,
I can understand your problem in fitting the data. I feel that you can take the odd ones out, fit a normal distribution, and qualify the process by calculating Ppk and Cpk. Many times this kind of random distribution happens in destructive testing results, and we can always accept the trade-off of eliminating the abnormal readings from the population array.
Cheers
Prabhakar.G.
Manager, Quality
Ashok Leyland, Unit 2
Hosur.

May 5, 2008 at 10:38 am #171740
Severino (Participant, @Jsev607)
You can't justify removing data just because the OP is performing destructive testing. The "trade-off" is that the Cpk and Ppk estimates will be completely inaccurate. If there is a problem with the destructive measurement system, then it needs to be improved and the study repeated.