# Non-Normal data and Process Capability

Six Sigma – iSixSigma Forums Old Forums General Non-Normal data and Process Capability

Viewing 27 posts - 1 through 27 (of 27 total)
• #49933

jrajkowski
Participant

I have several data sets from destructive testing for a process qualification.  These qualifications are customer driven, and the customer demanded that we use a sample size of 60 for variable data.  I tried to use Minitab to fit a distribution to the sample data so a capability analysis could be performed, but no distribution tested shows a p-value > 0.01.  Cpk or Ppk is what the customer wants to use as the acceptance criterion for their qualification.  What can I do?

#171420

Robert Butler
Participant

For non-normal data, plot your data on normal probability paper and identify the 0.135 and 99.865 percentile values (Z = ±3).  The difference between these two values is the span containing the middle 99.73% of the process output; this is the equivalent 6-sigma spread. Use this value for computing the equivalent process capability.
For more details, try Measuring Process Capability by Bothe.  Chapter 8 is titled “Measuring Capability for Non-Normal Variable Data.”
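Bothe's percentile approach can be sketched in plain Python: estimate the 0.135, 50, and 99.865 percentiles and substitute them for x-bar ± 3 sigma in the capability formulas.  This is a minimal stdlib-only illustration; the lognormal data, the spec limits, and the large simulated sample (used so the extreme percentiles can be read empirically rather than off probability paper) are all assumptions for demonstration.

```python
import random

def percentile(sorted_data, p):
    """Linear-interpolated empirical percentile, p in [0, 100]."""
    k = (len(sorted_data) - 1) * p / 100.0
    lo, hi = int(k), min(int(k) + 1, len(sorted_data) - 1)
    frac = k - lo
    return sorted_data[lo] * (1 - frac) + sorted_data[hi] * frac

def equivalent_capability(data, lsl, usl):
    """Bothe-style equivalent Pp/Ppk for non-normal data: the 0.135 and
    99.865 percentiles stand in for the +/-3 sigma points."""
    s = sorted(data)
    p_lo, p_med, p_hi = (percentile(s, q) for q in (0.135, 50.0, 99.865))
    pp = (usl - lsl) / (p_hi - p_lo)               # equivalent Pp
    ppk = min((usl - p_med) / (p_hi - p_med),      # equivalent Ppk
              (p_med - lsl) / (p_med - p_lo))
    return pp, ppk

# Hypothetical right-skewed (lognormal) process with made-up spec limits
random.seed(1)
sample = [random.lognormvariate(0.0, 0.25) for _ in range(100_000)]
pp, ppk = equivalent_capability(sample, lsl=0.4, usl=2.5)
```

With only 60 destructive-test samples, the extreme percentiles would come from the fitted line on the probability plot rather than raw data; the large sample above is purely to make the empirical estimates stable.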

#171421

melvin
Participant

Since you are using Minitab, did you also try the Johnson and Box-Cox transformations?  Nothing against Messrs. Butler and Bothe, but the location of 6s under the curve is not far off from just using Minitab’s Normal Capability Analysis.  Not horrible, but if I were your customer, I wouldn’t accept it.

#171423

Tony Bo
Member

Why don’t you transform your data?

#171425

Severino
Participant

Throw your data into a histogram and a normal probability plot.  If the data for the most part follows the normal distribution but you have some issues in the tails, you probably have outliers.  Look for any special causes that could have created those outliers, correct them, and re-run your study.
Alternatively, if your data seems to follow a fairly smooth distribution, then there should be some transformation or alternative distribution that suits it.  Sometimes correcting these issues is as simple as taking the natural log or square root of your data.
Finally, if you are not assured that your process is in control, there is no point in doing a capability study; the values will be worthless.  Put your data in a control chart and see if everything falls within the control limits.  Take a look at your measurement system and ensure it has a good R&R (which might be difficult since you are doing a destructive test).  Correct any issues you find, then try your study again.
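As a sketch of the "simple transformation" idea above, here is a stdlib-only Python example showing that taking the natural log of right-skewed (here, simulated lognormal) data pulls the sample skewness back toward zero.  The simulated data and the moment-based skewness helper are illustrative assumptions, not part of any Minitab procedure.

```python
import math
import random
import statistics

def skewness(xs):
    """Sample skewness: third standardized moment (illustrative helper)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

random.seed(2)
raw = [random.lognormvariate(0.0, 0.6) for _ in range(60)]  # right-skewed
logged = [math.log(x) for x in raw]                         # natural-log transform

# The log of lognormal data is normal, so the transformed sample
# should end up far less skewed than the raw one.
```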

#171444

jrajkowski
Participant

I tried the Johnson and Box-Cox transformations, but Minitab showed that even after transformation the p-value is still too low.  The transformations I tried did not result in normal data.

#171446

DaveS
Participant

jrajkowski,
This is not nearly as exotic as transforms or even fitting a distribution, and you won’t get a p-value from it, but you have given no indication of what the data looks like.
Plot the stuff and let us know.
If it is multimodal, you will never get there. Correct whatever has made it so.
Plot the data.
If there are large areas without any values and/or all the values are stacked at a few points (chunky data), you will never get there.
Plot the data.
Or maybe it is uniform, i.e., all values have equal numbers of observations. You won’t get there.
Plot the data!
If it is only 60 values, you should be able to print the data for all to see.

#171447

benjammin0341
Participant

Rather than transforming the data (transformation has both benefits and consequences), I would first recommend identifying which probability distribution best fits the data.  You can do this in Minitab under Reliability and Survival; look for the best fit.  Once you have identified it, you can proceed with non-normal capability analysis.  The default distribution for non-normal capability analysis is Weibull, but if there is a significant difference in fit between the Weibull and whatever you have identified as best fitting the data, you can change it there.  You can also either use your multiple subgroups or lump the data together into one large sample.  My recommendation would be to use your subgroups, since that is how you collected the data.

#171458

jrajkowski
Participant

I have plotted the data and over several different lot samples I see some of the things you described.  There are a few outliers, some of the data is chunky, and some appears to be bimodal.  This is most likely an issue with our test method.

#171513

Forrest W. Breyfogle III
Member

The point that Jsev607 was leading to is very important: “if you are not assured that your process is in control there is no point in doing a capability study.  Your values will be worthless.  Put your data in a control chart and see if everything falls within control limits.”  — also perhaps something shifted over time and now you are calculating process capability from two different processes.
The data also needs to be collected randomly over a longer period of time for you to really know whether your process will be stable over the long haul, and whether the sample represents something the customer will later experience relative to process capability indices (as opposed to just the first parts of a production process).
Also, if for some reason you do not have the production sequencing of your parts for input into Minitab, a Cp and Cpk calculation is meaningless; you will have to use Pp and Ppk.  People often do not appreciate the importance of this point.  To prove it, generate a random set of data in Minitab and calculate the Cp and Cpk values.  Then rank this same data and recalculate Cp and Cpk.  Notice how the Cp and Cpk values improved?
I did this exercise for a random set of data that had a mean of 0 and a standard deviation of 1 with a specification limit of -1 and 1.  The first time the Cp value was 0.39, while the second time the Cp was 2.38.
Forrest Breyfogle
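Forrest's sort-the-data experiment can be reproduced outside Minitab.  The sketch below (stdlib Python, illustrative only) estimates within-subgroup sigma as the average moving range divided by d2 = 1.128, the same short-term estimate an individuals-chart capability analysis uses; sorting the identical 60 values shrinks the moving ranges and inflates Cp.  The exact 0.39 and 2.38 figures above depend on the particular random draw, so do not expect to match them.

```python
import random
import statistics

def cp_from_moving_range(data, lsl, usl):
    """Cp with the short-term sigma estimate used by individuals-chart
    capability analysis: sigma = MR-bar / d2, where d2 = 1.128."""
    mr_bar = statistics.fmean(abs(b - a) for a, b in zip(data, data[1:]))
    return (usl - lsl) / (6 * mr_bar / 1.128)

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(60)]  # mean 0, sd 1

cp_run_order = cp_from_moving_range(data, -1.0, 1.0)       # arbitrary order
cp_ranked = cp_from_moving_range(sorted(data), -1.0, 1.0)  # same data, sorted
# cp_ranked is far larger: sorting makes every moving range tiny.
```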

#171520

Mikel
Member

Forrest,
Your example is intentionally misleading. Shame on you, you should know better.
Will ordering the data make the subgroup range or moving range artificially small? Oh my god yes.
Will anyone ever do that? No.

#171521

Forrest W. Breyfogle III
Member

My mistake; I did not know that everybody in this forum was aware that the standard deviation term in the Cp and Cpk calculation for a column of data in Minitab is calculated from the moving range (MR) between adjacent data rows.*
Hence, if someone had 60 measurements that were not collected sequentially out of the production facility and entered in that production sequence, then they would probably be entering the data in an arbitrary order.  My point was that if they entered the same data in a different order, they would most likely get different Cp and Cpk values (for the same set of 60 samples).
*In conversations I have had, most practitioners are not aware of this calculation procedure for Cp and Cpk.
Forrest Breyfogle

#171522

Mikel
Member

Wow, maybe we should teach them. These are not difficult concepts, and Minitab’s help menus and StatGuide explain them clearly.
The SPC book from AIAG also explains this clearly, with the equations given.
Do we have a bad metric, or sloppy teaching and lazy practitioners?

#171523

Chris Seider
Participant

Impractically inexperienced MBBs, or deployment models that emphasize lean and minimize the depth of statistical analysis, yet still call it all lean six sigma.

#171546

melvin
Participant

I’m with you.  In my experience, most users are NOT aware of Minitab’s standard deviation estimation using MR.  Most, however, run across it when they check the math on the UCL and LCL in its control charts…
FYI, if you really are the Forrest Breyfogle of “Implementing Six Sigma”, my hat is off to you!  I have been recommending that text (for 9 years now) to Six Sigma students who want a reference on the “deep science” topics…

#171557

Forrest W. Breyfogle III
Member

Bob,
Glad you like my book, “Implementing Six Sigma” and are suggesting it to others.
Forrest Breyfogle

#171558

Bobrandon
Participant

You are welcome. I enjoy promoting products here more than you.

#171559

Heebeegeebee BB
Participant

Exactly spot on!
I agree 100%

#171563

Engine Boy
Participant

I believe that many experts have learned a lot from your book.
It is to Six Sigma what Juran’s book is to Quality.
Just my opinion

#171564

Bobrandon
Participant

I think Marlon or Dog Sxxt suited you better.

#171565

Engine Boy
Participant

Who are those guys?
What is wrong with them?
I suggest that “Silly Guy” should suit you as a new screen name

#171567

frebo3
Participant

As a matter of information, anyone who has found Forrest Breyfogle’s earlier work useful might also be interested in his latest books on “Integrated Enterprise Excellence” which build well beyond the content of “Implementing Six Sigma”.

#171582

Forrest W. Breyfogle III
Member

Wow, to be mentioned in the same sentence as Juran, what an honor! Thank you!

I think it could be beneficial to some if I elaborate on Bob’s earlier comment: “I’m with you.  In my experience, most users are NOT aware of Minitab’s stdev estimation using MR (for Cp and Cpk calculations from an individuals chart input format).”

I would like to make the following point, since in my experience people often do not appreciate the impact subgrouping frequency can have both on how an individuals control chart looks (i.e., whether the process appears in control or not) and on the calculated Cp and Cpk values.

I will talk about a manufacturing situation but the same applies to transactional processes as well; e.g., hold time in a call center.

Consider that a process makes widgets, and a key output response for this widget varies depending upon the current state of a lot of things: e.g., raw material that changes lot to lot daily (and has a large impact on the process response, which the company may not know about), people, time of day, shifts, and ambient temperature.

Let’s consider Practitioner 1, who chose hourly subgrouping when creating an individuals control chart, because they wanted to react to out-of-control conditions.  They will then calculate the process capability indices Cp and Cpk; the customer requires that this information be reported to them.

For this hourly sampling plan, the moving range (MR) between adjacent samples will be small relative to the long-term process variability, since people do not change during most hours, the same batch of material is run throughout the day, the temperature does not change much in one hour, etc.  If someone were to conduct only a Cp and Cpk analysis (and report it to their customer), the numbers could look quite good, because the standard deviation estimate would be low relative to a long-term assessment (the MR calculation was low and is in the denominator of the equation). If the practitioner were to run an individuals control chart (as they should), they would probably see a lot of out-of-control signals to react to, because the upper and lower control limits of the control chart are also a function of the small calculated MR value.

Consider now Practitioner 2, who chooses daily subgrouping.  For this practitioner, the MR will appear much larger, since raw material lot-to-lot changes and the other variables occur between subgroups.  The customer-reported Cp and Cpk values will not appear nearly as good as Practitioner 1’s, because the MR for this subgrouping is larger.  In addition, the individuals control chart will not have the many out-of-control signals that Practitioner 1’s did, because the MR includes the variability from the other factors (time of day, raw material lot-to-lot, etc.).

Practitioner 1 and Practitioner 2 can get a VERY different picture of the same process relative to Cp and Cpk, along with an assessment of the process in-control state.
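The two-practitioner scenario can be simulated in a short stdlib-Python sketch.  The process model (a large daily lot-to-lot shift plus small within-day noise), the sample sizes, and the d2 = 1.128 constant are illustrative assumptions; the point is only that hourly sampling yields a much smaller MR-based sigma than daily sampling of the same process.

```python
import random
import statistics

def sigma_from_mr(series):
    """Short-term sigma estimate: average moving range / d2 (d2 = 1.128)."""
    mr_bar = statistics.fmean(abs(b - a) for a, b in zip(series, series[1:]))
    return mr_bar / 1.128

random.seed(3)
days = 30
lot = [random.gauss(0.0, 2.0) for _ in range(days)]  # large lot-to-lot effect

# Practitioner 1: eight samples per day; adjacent readings share a lot
hourly = [lot[d] + random.gauss(0.0, 0.5) for d in range(days) for _ in range(8)]

# Practitioner 2: one sample per day; every moving range crosses a lot change
daily = [lot[d] + random.gauss(0.0, 0.5) for d in range(days)]

# Hourly MRs mostly miss the lot changes, so sigma (and hence Cp/Cpk)
# looks optimistically good; daily MRs absorb the lot-to-lot variation.
```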

The question is: which is the best approach?  This depends upon how you view the world.  To find out, ask yourself: (1) Do you believe that control charts and process capability statements should give signals consistent with how you “view the world”?  If the answer is yes, the next question is: (2) Do you believe that typical variation from raw material lots, normal time-of-day variation, and the other factors mentioned above should be considered a source of common-cause or special-cause variability?

If you vote common for question #2, then you and I share the same belief system.  I use the term 30,000-foot-level to describe the control charting and process capability/performance assessment of the Practitioner 2 situation (although I suggest using only long-term PPM rates rather than the Cp, Cpk, Pp, and Ppk indices, because these indices are hard to understand; if a customer asks for them, provide them, but internally I suggest long-term PPM).

Forrest Breyfogle

#171604

Severino
Participant

If you are doing a “daily subgrouping” why would you use a MR chart?

#171606

Forrest W. Breyfogle III
Member

You asked: “If you are doing a “daily subgrouping” why would you use a MR chart?”

Guess I was not clear, or I don’t understand your question.  I will try to clarify.

I prefer to focus on using an individuals control chart; however, one could use an XmR chart.

I noted that the individuals control chart’s control limits are a function of the MR between adjacent subgroups; i.e., x-bar ± 2.66(MR-bar).  If there is daily subgrouping (as opposed to hourly), the MR will be larger, causing the control limits to be wider, for the previously described illustration.
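The x-bar ± 2.66(MR-bar) limit formula is easy to sketch; the constant 2.66 is 3/d2 with d2 = 1.128 for moving ranges of size 2.  The six data values below are made up for illustration.

```python
import statistics

def individuals_limits(data):
    """Individuals (X) chart: center line and limits x-bar +/- 2.66 * MR-bar
    (2.66 = 3 / d2, with d2 = 1.128 for moving ranges of size 2)."""
    x_bar = statistics.fmean(data)
    mr_bar = statistics.fmean(abs(b - a) for a, b in zip(data, data[1:]))
    return x_bar - 2.66 * mr_bar, x_bar, x_bar + 2.66 * mr_bar

lcl, center, ucl = individuals_limits([10.1, 9.8, 10.4, 10.0, 9.7, 10.2])
```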

With 30,000-foot-level control charting you first need infrequent subgrouping and sampling, so that the normal input process variability occurs between subgroups. This procedure is appropriate for those who hold the belief system that they don’t want to react to output changes caused by the normal variability of these x process inputs as though they were special-cause events.

With 30,000-foot-level reporting we would have an individuals control chart as described above paired with a process capability statement(s) for stages of stability (i.e., in control regions).  If a process is in control, I prefer to say that the process is “predictable” (i.e., easier to understand).

The next obvious question is: what is predicted?  If the data are continuous, one can do a probability plot of the raw data, where the specification limits on the plot determine the percentage non-conformance.  Note that for the latest region of stability we can assume the data from this region is a random sample of the future (assuming no process changes occur).
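At its simplest, the non-conformance estimate from the latest stable region is just a count of points beyond the specs (a probability-plot fit would smooth the tails, but the idea is the same).  The stable-region data and spec limits below are fabricated for illustration.

```python
import random

# Hypothetical data from the latest region of stability, treated as a
# random sample of the future (assuming no process changes occur)
random.seed(4)
stable = [random.gauss(5.0, 0.8) for _ in range(200)]

lsl, usl = 3.0, 7.0  # made-up specification limits
nonconforming = sum(1 for x in stable if x < lsl or x > usl) / len(stable)
ppm = nonconforming * 1_000_000  # long-term PPM estimate
```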

I prefer to have both the control chart (left side) and probability plot (right side) on one page with a box under the two graphs stating:  Process is predictable with ____ non-conformance (easier to understand than Cp, Cpk, Pp, and Ppk).

This is a format that I think most people can easily understand.  You could also make a statement about cost implications from the non-conformance rate.

I should also note that if the data are not normally distributed, an appropriate transformation can be applied.

Hope this helps.

Forrest Breyfogle

#171734

Prabhakar.G.
Participant

Dear Rajkowski,
I can understand your problem in fitting the data. I feel that you can take the odd men out and fit the remainder to a normal distribution to qualify for calculating Ppk and Cpk. Many times this kind of random distribution happens in destructive testing results, and we can always accept a trade-off in eliminating the abnormal readings from the population array.
Cheers

Prabhakar.G.
Manager-Quality
Ashokleyland, Unit-2
Hosur.

#171740

Severino
Participant

You can’t justify removing data just based on the fact that the OP is performing destructive testing.  The “trade-off” is that the Cpk and Ppk estimates will be completely inaccurate.  If there is a problem with the destructive measurement system then it needs to be improved and the study repeated.

