iSixSigma

Non-Normal data and Process Capability


  • #49933

    jrajkowski
    Participant

    I have several data sets obtained from destructive testing for a process qualification.  These qualifications are customer driven, and it was demanded that we use 60 as our sample size for variable data.  I tried to use Minitab to fit a distribution to the sample data so a capability analysis could be performed, but none of the distributions tested shows a p-value > 0.01.  Cpk or Ppk is what the customer wants to use as the acceptance criterion for their qualification.  What can I do?

    0
    #171420

    Robert Butler
    Participant

    For non-normal data, take your data and plot it on normal probability paper and identify the 0.135 and 99.865 percentile values (Z = ±3).  The difference between these two values is the span producing the middle 99.73% of the process output.  This is the equivalent 6-sigma spread.  Use this value for computing the equivalent process capability.
    For more details, try Measuring Process Capability by Bothe.  Chapter 8 is titled “Measuring Capability for Non-Normal Variable Data.”
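    For anyone who wants to try the percentile-span approach outside Minitab, here is a minimal Python/NumPy sketch.  The file name and spec limits are made up for illustration, and note that with only 60 points the extreme percentiles are better read from a probability plot or fitted distribution than from the raw sample.
        import numpy as np

        data = np.loadtxt("pull_strength.csv")   # hypothetical 60 destructive-test results
        lsl, usl = 10.0, 30.0                    # hypothetical spec limits

        # The 0.135 and 99.865 percentiles bound the middle 99.73% of the output,
        # i.e. the non-normal equivalent of the 6-sigma spread.
        p_low, p_high = np.percentile(data, [0.135, 99.865])
        span = p_high - p_low

        pp_eq = (usl - lsl) / span               # equivalent Pp
        median = np.median(data)
        ppk_eq = min((usl - median) / (p_high - median),
                     (median - lsl) / (median - p_low))
        print(f"Equivalent Pp = {pp_eq:.2f}, equivalent Ppk = {ppk_eq:.2f}")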

    0
    #171421

    melvin
    Participant

    Since you are using Minitab, did you also try the Johnson and Box-Cox transformations?  Nothing against Messrs. Butler and Bothe, but the location of 6 sigma under the curve is not far off from just using Minitab’s Normal Capability Analysis …  Not horrible, but if I were your customer, I wouldn’t accept it…

    0
    #171423

    Tony Bo
    Member

    Why don’t you transform your data…?

    0
    #171425

    Severino
    Participant

    Throw your data into a histogram and a normal probability plot.  If you see that the data for the most part follow the normal distribution but you have some issues in the tails, you probably have outliers in your data.  You should look to see if there are any special causes that could have created those outliers, correct them, and re-run your study.
    Alternatively, if you find your data follow a fairly smooth distribution, then there should be some transformation or alternative distribution that suits your data.  Sometimes correcting these issues is as simple as taking the natural log or square root of your data.
    Finally, if you are not assured that your process is in control, there is no point in doing a capability study; your values will be worthless.  Put your data in a control chart and see if everything falls within the control limits.  Take a look at your measurement system and ensure it has good R&R (which might be difficult since you are doing a destructive test).  Correct any issues you find, then try your study again.
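    For what it’s worth, a quick way to check the simple transforms mentioned above outside Minitab is a sketch like the following; the file name and the use of the Shapiro-Wilk test are assumptions for illustration.
        import numpy as np
        from scipy import stats

        data = np.loadtxt("pull_strength.csv")      # hypothetical raw measurements (all > 0)

        for name, transform in [("raw", lambda x: x),
                                ("log", np.log),
                                ("sqrt", np.sqrt)]:
            y = transform(data)
            w, p = stats.shapiro(y)                 # Shapiro-Wilk normality test
            print(f"{name:4s}  W = {w:.3f}  p = {p:.3f}")
        # A p-value above ~0.05 suggests the transformed data are consistent with normality.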

    0
    #171444

    jrajkowski
    Participant

    I tried the Johnson and Box-Cox transformations, but Minitab showed that even when the data are transformed, the p-value of the transformed data is too low.  The transformations I tried did not result in normal data.

    0
    #171446

    DaveS
    Participant

    jrajkowski,
    This is not nearly as exotic as transforms or even fitting a distribution, and you won’t get a p-value, but you have given no indication of what the data look like.
    Plot the stuff and let us know.
    If it is multimodal, you will never get there. Correct whatever has made it so.
    Plot the data.
    If there are large areas without any values and/or all the values are stacked at a few points (chunky data), you will never get there.
    Plot the data.
    Or maybe it is uniform, i.e. all values have equal numbers of observations. You won’t get there.
    Plot the data!
    If it is only 60 values you should be able to print the data for all to see.

    0
    #171447

    benjammin0341
    Participant

    Rather than transforming the data (data transformation has both benefits and consequences), I would first recommend identifying which probability distribution best fits the data.  You can do this in Minitab with Reliability and Survival; look for the best fit.  Once you have identified the best fit, you can proceed with non-normal capability analysis.  The default distribution for non-normal capability analysis is Weibull, but if there is a significant difference in fit between the Weibull and whatever you have identified as best fitting the data, you can change it there.  You can also either use your multiple subgroups or lump the data together into one large sample for the non-normal capability analysis.  My recommendation would be to use your subgroups, since this is how you collected your data.
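    If you want to sanity-check a best-fit choice outside Minitab, here is a rough scipy sketch of the same idea.  The candidate list, file name, and spec limits are made-up illustration values, and the goodness-of-fit p-values are optimistic because the parameters are estimated from the same data.
        import numpy as np
        from scipy import stats

        data = np.loadtxt("pull_strength.csv")
        lsl, usl = 10.0, 30.0

        candidates = {"weibull": stats.weibull_min,
                      "lognormal": stats.lognorm,
                      "gamma": stats.gamma}

        for name, dist in candidates.items():
            params = dist.fit(data, floc=0)                  # location fixed at 0 (an assumption)
            frozen = dist(*params)
            ks_stat, p = stats.kstest(data, frozen.cdf)      # crude goodness-of-fit check
            # Percentile-based (non-normal) capability from the fitted distribution
            p_low, p_med, p_high = frozen.ppf([0.00135, 0.5, 0.99865])
            ppk_eq = min((usl - p_med) / (p_high - p_med),
                         (p_med - lsl) / (p_med - p_low))
            print(f"{name:9s}  KS p = {p:.3f}  equivalent Ppk = {ppk_eq:.2f}")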

    0
    #171458

    jrajkowski
    Participant

    I have plotted the data and over several different lot samples I see some of the things you described.  There are a few outliers, some of the data is chunky, and some appears to be bimodal.  This is most likely an issue with our test method.
    Thanks all for your advice!

    0
    #171513

    Forrest W. Breyfogle III
    Member

    The point that Jsev607 was leading to is very important: “if you are not assured that your process is in control there is no point in doing a capability study.  Your values will be worthless.  Put your data in a control chart and see if everything falls within control limits.”  Also, perhaps something shifted over time and you are now calculating process capability from two different processes.
    The data also need to be collected randomly over a longer period of time for you to really know whether your process will be stable over the long haul, and whether the sample represents something the customer will later experience relative to process capability indices (as opposed to just the first parts of a production process).
    Also, if for some reason you do not have the production sequencing of your parts for input into Minitab, a Cp and Cpk calculation is meaningless; you will have to use Pp and Ppk.  People often do not appreciate the importance of this point.  To prove it, generate a random set of data in Minitab and calculate the Cp and Cpk values.  Then rank (sort) the same data and recalculate Cp and Cpk.  Notice how the Cp and Cpk values improved?
    I did this exercise for a random set of data that had a mean of 0 and a standard deviation of 1 with a specification limit of -1 and 1.  The first time the Cp value was 0.39, while the second time the Cp was 2.38.
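    For anyone who wants to reproduce the effect outside Minitab, here is a minimal NumPy sketch of the same exercise, using the -1 and 1 specification limits stated above; the seed and sample size are arbitrary, so the exact numbers will differ from mine.
        import numpy as np

        rng = np.random.default_rng(1)
        x = rng.normal(0, 1, 100)                # random data, mean 0, standard deviation 1
        lsl, usl = -1.0, 1.0

        def cp_moving_range(values):
            # Minitab-style "within" sigma for individuals data: MR-bar / d2, with d2 = 1.128
            sigma_within = np.mean(np.abs(np.diff(values))) / 1.128
            return (usl - lsl) / (6 * sigma_within)

        print("Cp, original order:", round(cp_moving_range(x), 2))
        print("Cp, sorted data:   ", round(cp_moving_range(np.sort(x)), 2))
        # Sorting makes the moving ranges tiny, so the "within" sigma collapses and Cp inflates.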
    Forrest Breyfogle

    0
    #171520

    Mikel
    Member

    Forrest,
    Your example is intentionally misleading. Shame on you, you should know better.
    Will ordering the data make the subgroup range or moving range artificially small? Oh my god yes.
    Will anyone ever do that? No.
    What’s your point?

    0
    #171521

    Forrest W. Breyfogle III
    Member

    My mistake; I did not know that everybody in this forum was aware that the standard deviation term in the Cp and Cpk calculation for a column of data in Minitab is calculated from the MR between adjacent data rows.*
    Hence, if someone had 60 measurements that were not collected sequentially out of the production facility and entered in that production sequence, they would probably be entering the data in an arbitrary order.  My point was that if they entered the same data in a different order, they would most likely get different Cp and Cpk values (for the same set of 60 samples).
    *In the conversations I have had with practitioners, most are not aware of this calculation procedure for Cp and Cpk.
    Forrest Breyfogle

    0
    #171522

    Mikel
    Member

    Wow, maybe we should teach them. These are not difficult concepts and Minitab’s help menus and Stat guides explain it clearly.
    The SPC book from AIAG also explains this clearly with equations given.
    Do we have a bad metric, or sloppy teaching and lazy practitioners?

    0
    #171523

    Chris Seider
    Participant

    Impractically inexperienced MBBs, or deployment models that emphasize lean and minimize the depth of statistical analysis while still calling it all lean six sigma.
     

    0
    #171546

    melvin
    Participant

    I’m with you.  In my experience, most users are NOT aware of Minitab’s stdev estimation using MR.  Most, however, run across it when they check the math on the UCL and LCL in its control charts…
    FYI, if you really are the Forrest Breyfogle of “Implementing Six Sigma”, my hat is off to you!  I still recommend (and have for 9 years now) that text to Six Sigma students who want a reference on the “deep science” topics…

    0
    #171557

    Forrest W. Breyfogle III
    Member

    Bob,
    Glad you like my book, “Implementing Six Sigma” and are suggesting it to others.
    Forrest Breyfogle

    0
    #171558

    Bobrandon
    Participant

    You are welcome. I enjoy promoting products here more than you.

    0
    #171559

    Heebeegeebee BB
    Participant

    Exactly spot on!
    I agree 100%

    0
    #171563

    Engine Boy
    Participant

    I believe that many experts have learned a lot from your book.
    It is to Six Sigma what Juran’s book is to Quality.
    Just my opinion  

    0
    #171564

    Bobrandon
    Participant

    I think Marlon or Dog Sxxt suited you better.

    0
    #171565

    Engine Boy
    Participant

    A typical silly answer
    Who are those guys?
    What is wrong with them?
    I suggest that “Silly Guy” should suit you as a new screen name 

    0
    #171567

    frebo3
    Participant

    As a matter of information, anyone who has found Forrest Breyfogle’s earlier work useful might also be interested in his latest books on “Integrated Enterprise Excellence”, which build on and extend well beyond the content of “Implementing Six Sigma”.

    0
    #171582

    Forrest W. Breyfogle III
    Member

    Wow, to be mentioned in the same sentence as Juran, what an honor! Thank you!
     
    I think it could be beneficial to some if I elaborate on Bob’s earlier comment: “I’m with you.  In my experience, most users are NOT aware of Minitab’s stdev estimation using MR (for Cp and Cpk calculations from an individuals chart input format).”
     
    I would like to make the following point, since it is my experience that people often do not appreciate the impact subgrouping frequency can have both on how an individuals control chart looks (i.e., whether the process appears in control or not) and on the calculated Cp and Cpk values.
     
    I will talk about a manufacturing situation but the same applies to transactional processes as well; e.g., hold time in a call center. 
     
    Consider a process that makes widgets, where a key output response for the widget varies depending upon the current state of a lot of things; e.g., raw material lot-to-lot variation that changes daily (and has a large impact on the process response, which the company may not know about), people, time of day, shifts, and ambient temperature.
     
    Let’s consider Practitioner 1, who chose an hourly subgrouping when creating an individuals control chart because they wanted to react to out-of-control conditions.  They will then calculate the process capability indices Cp and Cpk, since the customer requires that this information be reported to them.
     
    For this hourly sampling plan, the moving range (MR) between adjacent samples will be small relative to the long-term process variability, since people do not change during most hours, the same batch of material is run throughout the day, temperature does not change much in one hour, etc.  If someone were to only conduct a Cp and Cpk analysis (and report this to their customer), the numbers could look quite good, since the standard deviation would appear low relative to a long-term process variability assessment (because the MR value was low and is in the denominator of the equation).  If the practitioner were to run an individuals control chart (as they should), they would probably see a lot of out-of-control signals to react to (because the upper and lower control limits of the chart are also a function of the small calculated MR value).
     
    Consider now Practitioner 2, who chooses a daily subgrouping.  For this practitioner, the MR value will appear much larger, since raw material lot-to-lot changes and the other variables occur between subgroups.  The customer-reported Cp and Cpk values will not appear nearly as good as for Practitioner 1, because the MR value for this subgrouping will appear larger.  In addition, an individuals control chart will not show as many out-of-control signals as Practitioner 1’s did, because the MR value includes the variability from the other factors; i.e., time of day, raw material lot-to-lot, etc.
     
    Practitioner 1 and Practitioner 2 can get a VERY different picture of the same process relative to Cp and Cpk, along with an assessment of the process in-control state.
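    If it helps, here is a rough simulation of the Practitioner 1 versus Practitioner 2 comparison; all of the process numbers are made-up illustration values.
        import numpy as np

        rng = np.random.default_rng(7)
        days, per_day = 30, 8                        # 30 days, 8 hourly samples per day
        lots = rng.normal(0, 2, days)                # daily raw-material lot-to-lot shifts
        hourly = np.array([rng.normal(10 + lot, 0.5, per_day) for lot in lots])

        series_by_plan = {"hourly sampling": hourly.ravel(),   # Practitioner 1
                          "daily sampling ": hourly[:, 0]}     # Practitioner 2

        for plan, series in series_by_plan.items():
            mr_bar = np.mean(np.abs(np.diff(series)))          # average moving range
            half_width = 2.66 * mr_bar                         # individuals-chart limit half-width
            print(f"{plan}: MR-bar = {mr_bar:.2f}, "
                  f"I-chart limits = {series.mean():.2f} +/- {half_width:.2f}")
        # The hourly plan gives tight limits (many apparent out-of-control signals when the lot
        # changes); the daily plan absorbs lot-to-lot variation into its common-cause limits.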
     
    The question is: which is the best approach?  This depends upon how you view the world.  To assess how you view the world, ask yourself: (1) Do you believe that control charts and process capability statements should give signals consistent with how you “view the world”?  If the answer is yes, the next question is: (2) Do you believe that typical variation from raw material lots, normal time-of-day variation, and the others mentioned above should be considered a source of common-cause or special-cause variability?
     
    If you vote common cause for question #2, then you and I have the same belief system.  I use the term 30,000-foot-level to describe the control charting and process capability/performance assessment of the Practitioner 2 situation (although I suggest using only long-term PPM rates rather than the Cp, Cpk, Pp, and Ppk process capability indices, because these indices are hard to understand; if a customer asks for them, give them the indices, but internally I suggest using long-term PPM).
     
    Forrest Breyfogle

    0
    #171604

    Severino
    Participant

    If you are doing a “daily subgrouping”, why would you use an MR chart?

    0
    #171606

    Forrest W. Breyfogle III
    Member

    You asked: “If you are doing a “daily subgrouping” why would you use a MR chart?”
     
    Guess I was not clear or I don’t understand your question.  I will try to clarify.
     
    I prefer to focus on using an individuals control chart; however, one could use an XmR chart. 
     
    I did make reference to the fact that the individuals control chart’s control limits are a function of the MR between adjacent subgroups; i.e., x-bar ± 2.66(MR-bar).  If there is a daily subgrouping (as opposed to hourly), the MR would be larger, causing the control limits to be wider for the previously described illustration.
     
    With 30,000-foot-level control charting you first need infrequent subgrouping and sampling so that the normal input-process variability occurs between subgroups.  This procedure is appropriate for those whose belief system is that they do not want to react to output changes caused by the normal variability of these x-process inputs as though they were special-cause events.
     
    With 30,000-foot-level reporting we would have an individuals control chart as described above paired with a process capability statement(s) for stages of stability (i.e., in control regions).  If a process is in control, I prefer to say that the process is “predictable” (i.e., easier to understand). 
     
    The next obvious question is what is predicted.  If the data are continuous data, one could then do a probability plot of the raw data where the specification limits on the probability plot could be used to determine the percentage non-conformance.  Note, for the latest region of stability we can assume that the data from this region is a random sample of the future (assuming no process changes occur in the future).
     
    I prefer to have both the control chart (left side) and probability plot (right side) on one page with a box under the two graphs stating:  Process is predictable with ____ non-conformance (easier to understand than Cp, Cpk, Pp, and Ppk). 
     
    This is a format that I think most people can easily understand.  You could also make a statement about cost implications from the non-conformance rate.
     
    I should also note that if the data are not normally distributed, an appropriate transformation can be applied.
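    As a concrete sketch of that reporting format (an individuals-chart stability check plus a predicted non-conformance rate from a fitted distribution), here is one way it might look in Python; the file name, spec limits, and lognormal choice are all assumptions for illustration.
        import numpy as np
        from scipy import stats

        data = np.loadtxt("pull_strength.csv")
        lsl, usl = 10.0, 30.0

        # Individuals control chart limits: x-bar +/- 2.66 * MR-bar
        mr_bar = np.mean(np.abs(np.diff(data)))
        center = data.mean()
        lcl, ucl = center - 2.66 * mr_bar, center + 2.66 * mr_bar
        predictable = bool(np.all((data > lcl) & (data < ucl)))

        # If the latest region is stable ("predictable"), estimate non-conformance from a
        # fitted distribution rather than assuming normality.
        shape, loc, scale = stats.lognorm.fit(data, floc=0)
        frozen = stats.lognorm(shape, loc, scale)
        ppm = (frozen.cdf(lsl) + frozen.sf(usl)) * 1e6

        print(f"Process predictable: {predictable}; estimated non-conformance ~ {ppm:.0f} PPM")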
     
    Hope this helps.
     
    Forrest Breyfogle
     

    0
    #171734

    Prabhakar.G.
    Participant

    Dear Rajkowski,
    I can understand your problem in fitting the data. I feel that you can take the odd men out and fit the rest to a normal distribution to qualify for calculating Ppk and Cpk. Many times this kind of random distribution happens in destructive testing results, and we can always accept the trade-off of eliminating the abnormal readings from the population array.
    Cheers
     
    Prabhakar.G.
    Manager-Quality
    Ashokleyland, Unit-2
    Hosur.  

    0
    #171740

    Severino
    Participant

    You can’t justify removing data just based on the fact that the OP is performing destructive testing.  The “trade-off” is that the Cpk and Ppk estimates will be completely inaccurate.  If there is a problem with the destructive measurement system then it needs to be improved and the study repeated. 

    0