iSixSigma

Non-normal Capability or Johnson Transformation


#55663

    KKiru
    Participant

    Dear All,

    I have a question. This has already been discussed in a few threads on iSixSigma, but the conclusion wasn’t clear to me, so I want to start this discussion. I am working on a project where the data are non-normal. I started with a non-normal capability analysis using a Weibull distribution. I have also tried transforming the data to normal using the Johnson transformation; the transformation was successful, with a p-value of 0.63.

    I also found a difference in process capability: 0.80 using the Weibull distribution and 0.99 using the Johnson transformation.

    I have also read up on the Johnson transformation and learned: “This transformation is very powerful, and it can be used with data that include zero and negative values, but it is more complicated and it only provides overall capability statistics. Use this when the Box-Cox transformation does not find a suitable transformation”.

    My question is: which one should I choose, and why? Should I expect any major impact from using either one? Kindly clarify.

    Regards
    KK
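
    A minimal sketch of the Weibull-based (percentile) capability calculation described above, assuming numpy/scipy, simulated data, and hypothetical spec limits; Minitab's exact algorithm may differ in detail.

    ```python
    # Sketch only: Weibull-based non-normal capability via the percentile method.
    # Data and spec limits are hypothetical stand-ins, not the poster's values.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    data = rng.weibull(1.5, size=200) * 10.0   # stand-in for the project data
    LSL, USL = 0.5, 30.0                       # hypothetical spec limits

    # Fit a 2-parameter Weibull (location fixed at 0) and read off the
    # 0.135 %, 50 % and 99.865 % quantiles of the fitted distribution.
    shape, loc, scale = stats.weibull_min.fit(data, floc=0)
    q_lo, q_mid, q_hi = stats.weibull_min.ppf(
        [0.00135, 0.5, 0.99865], shape, loc=loc, scale=scale)

    Pp = (USL - LSL) / (q_hi - q_lo)
    Ppk = min((USL - q_mid) / (q_hi - q_mid), (q_mid - LSL) / (q_mid - q_lo))
    print(f"Pp = {Pp:.2f}, Ppk = {Ppk:.2f}")
    ```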

    #201054

    Robert Butler
    Participant

    Neither – get a copy of Measuring Process Capability by Bothe and read Chapter 8, “Measuring Capability for Non-Normal Variable Data”. According to other posts on this forum, Minitab offers the Bothe method as one of the options in its capability calculation section.

    #201069

    KKiru
    Participant

    Dear Robert,

    I will go through the chapter today in detail. Thank you so much.

    #201097

    Hajo Schmidt
    Participant

    Hi KKiru,

    If you have a sufficient amount of data (approx. > 2000 points) you can calculate the capability directly from the data, without any distribution or transformation, using the quantile method: calculate the 0.135% and 99.865% percentile values in Minitab.
    With fewer data you have to find an appropriate fit for the data (especially at the borders, i.e. the tails, of the distribution). Use the following Minitab function to find the best fit:
    Stat > Quality Tools > Individual Distribution Identification
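
    A minimal sketch of the quantile method described above, assuming numpy, simulated data, and hypothetical spec limits: the 0.135%, 50% and 99.865% percentiles are taken directly from the data, with no distribution fit or transformation.

    ```python
    # Sketch only: empirical quantile method (needs a large sample, e.g. > 2000 points).
    # Data and spec limits are hypothetical.
    import numpy as np

    rng = np.random.default_rng(2)
    data = rng.lognormal(mean=1.0, sigma=0.4, size=5000)  # stand-in process data
    LSL, USL = 0.8, 8.0                                   # hypothetical spec limits

    # Empirical percentiles take the place of mu +/- 3*sigma.
    q_lo, q_mid, q_hi = np.percentile(data, [0.135, 50.0, 99.865])

    Pp = (USL - LSL) / (q_hi - q_lo)
    Ppk = min((USL - q_mid) / (q_hi - q_mid), (q_mid - LSL) / (q_mid - q_lo))
    print(f"Pp = {Pp:.2f}, Ppk = {Ppk:.2f}")
    ```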

    #201098

    Robert Butler
    Participant

    Thanks, @Hajo – that is the part of the Minitab program which corresponds to the method referenced in the book by Bothe. The only thing I find odd is the requirement for >2000 data points. I’ve used this method dozens of times over the years and I’ve never had the option of asking for >2000 data points. The cost per sample for most of the processes I’ve worked on has been such that the sample size has been between 15 and 30 and the results I’ve had using this method have been quite good.

    I just pulled Bothe’s book off of my shelf and did a quick scan of Chapter 8 – I may have missed something but I don’t see anything in the chapter which discusses a required minimum sample size.

    Often what I’ve had to do when working with small samples is print out the line as plotted on normal probability paper and extrapolate to find the 0.135% and 99.865% points. To do this I use a trusty manual device known as a French curve and a pencil – old-fashioned, but it does an admirable job.
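
    A rough programmatic analogue of the graphical extrapolation described above, assuming numpy/scipy and a hypothetical small sample: fit a smooth curve through the points of the normal probability plot and read it off at z ≈ ±3 (the 0.135% and 99.865% points). This is only a stand-in for the hand-drawn French-curve line, not Bothe's exact procedure.

    ```python
    # Sketch only: extrapolate the 0.135% / 99.865% points from a small sample
    # by fitting a smooth curve to the normal probability plot, in the spirit of
    # (but not identical to) drawing the line by hand with a French curve.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    sample = np.sort(rng.gamma(shape=3.0, scale=2.0, size=25))  # hypothetical small sample

    # Plotting positions (median ranks) and their normal scores.
    n = len(sample)
    p = (np.arange(1, n + 1) - 0.3) / (n + 0.4)
    z = stats.norm.ppf(p)

    # Fit a low-order polynomial x = f(z) through the plotted points and
    # evaluate it at the z-values for the 0.135% and 99.865% percentiles.
    coeffs = np.polyfit(z, sample, deg=2)
    z_tails = stats.norm.ppf([0.00135, 0.99865])                # about -3 and +3
    x_lo, x_hi = np.polyval(coeffs, z_tails)
    print(f"extrapolated 0.135% point: {x_lo:.2f}, 99.865% point: {x_hi:.2f}")
    ```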

    #201099

    MBBinWI
    Participant

    @rbutler – ah, the French curve: like the slide rule, something anyone under 40 will have to search Bing to find out what it is. ;-) Along with reverse Polish notation calculators.

    #201100

    Hajo Schmidt
    Participant

    @rbutler In fact you won’t find “>2000 data points” in the literature. It is my own advice, because otherwise the number of observations in the tails becomes very low (0.00135 * 2000 = 2.7) and the percentile value will be very sensitive to rare events.
    If you have fewer data points you have to extrapolate, either by finding a suitable fit or by the graphical method you mention. By the way, in my training I often add the 0.135% and 99.865% percentile values on the y-axis of the probability plot in Minitab. This makes the percentile values visible to my colleagues.
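
    A small simulation, assuming numpy and an arbitrary lognormal process, illustrating the tail-sensitivity point above: even with n = 2000, the empirical 0.135% percentile varies noticeably from sample to sample.

    ```python
    # Sketch only: variability of the empirical 0.135% percentile at n = 2000.
    import numpy as np

    rng = np.random.default_rng(4)
    true_q = np.exp(1.0 + 0.4 * (-3.0))   # approx. true 0.135% point of lognormal(1, 0.4), z ~ -3

    estimates = [np.percentile(rng.lognormal(1.0, 0.4, size=2000), 0.135)
                 for _ in range(1000)]
    print(f"approx. true 0.135% point: {true_q:.3f}")
    print(f"simulated spread (95%):    {np.percentile(estimates, 2.5):.3f} "
          f"to {np.percentile(estimates, 97.5):.3f}")
    ```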

    #201101

    Robert Butler
    Participant

    @Hajo, thanks for the clarification on the sample size. I agree that with 15-30 samples there will be few data points and that you will have to extrapolate – hence the manual curve fitting with a French curve. As for rare events – I’m not so sure. If you are running a capability analysis, one of the assumptions is that the process is in control, so I don’t see how such a thing would be much of a problem.

    I do think you are being overly conservative with respect to the sample size. Indeed, your choice of sample size is such that I really can’t think of anything I’ve ever worked on where that requirement could have been met. What I do know is that the results I got from the analysis of the small samples were more than adequate and provided the information we needed to press on with the work.

    #201104

    Hajo Schmidt
    Participant

    @rbutler, thanks for your statement. I myself work in the German automotive industry, where at least 125 data points are required for a capability analysis. In many cases we have much larger sample sizes.
    I ran a simulation with normally distributed random data, sample size 30 (see the enclosed picture). I am surprised that you can accept such a wide 95% CI. The VDA standard (VDA = German Association of the Automotive Industry) allows smaller sample sizes for short-term capability, but then the target value has to be higher (e.g. 1.77 for a sample size of 30).
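
    A minimal sketch in the same spirit as the simulation described above (the enclosed picture is not reproduced here), assuming numpy and arbitrary spec limits: estimate Cp repeatedly from samples of size 30 drawn from a normal process and look at the spread of the estimates.

    ```python
    # Sketch only: spread of Cp estimates from n = 30 normal samples.
    # True process: mean 0, sigma 1; hypothetical spec limits at +/- 4 sigma,
    # so the true Cp is 8 / 6 = 1.33.
    import numpy as np

    rng = np.random.default_rng(5)
    LSL, USL, n = -4.0, 4.0, 30

    cp_hats = [(USL - LSL) / (6.0 * rng.normal(0.0, 1.0, size=n).std(ddof=1))
               for _ in range(10000)]
    lo, hi = np.percentile(cp_hats, [2.5, 97.5])
    print(f"true Cp = 1.33, 95% of estimates fall between {lo:.2f} and {hi:.2f}")
    ```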

    #201105

    Robert Butler
    Participant

    @Hajo it’s not a matter of acceptance so much as it is a matter of physically not being able to generate large sample sizes and still remain in business. I’ve worked in specialty chemicals, plastics, aerospace, and medical and in each of these cases the cost of large sample sizes is prohibitive.

    For example, in the chemical industry the samples were those from a 10,000 gallon reactor vessel. While you can take repeated measures on material coming from such a reactor you have to remember that the smallest unit of independence is the individual reactor. Waiting to make a decision until you have chewed up 300,000 gallons of raw material simply isn’t possible.

    It is true there are things you can do with repeated measures, and you can also employ time-series methods to test for sample independence within a given reactor, but at the end of the day you are still going to have to make do with a small sample of independent events and use those to generate estimates of the various statistics of the process.
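
    A minimal sketch of one such independence check, assuming statsmodels and hypothetical repeated within-reactor measurements: a Ljung-Box test for autocorrelation in the time-ordered measurements.

    ```python
    # Sketch only: test repeated within-reactor measurements for autocorrelation.
    # A small p-value suggests the repeated measures are not independent, so the
    # reactor batch, not the individual measurement, is the unit of independence.
    import numpy as np
    from statsmodels.stats.diagnostic import acorr_ljungbox

    rng = np.random.default_rng(6)
    # Hypothetical repeated measures drifting slowly within one reactor batch.
    t = np.arange(30)
    measurements = 100.0 + 0.05 * t + rng.normal(0.0, 0.5, size=30)

    result = acorr_ljungbox(measurements, lags=[5], return_df=True)
    print(result[["lb_stat", "lb_pvalue"]])
    ```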
