iSixSigma

Non-normal data

Viewing 30 posts - 1 through 30 (of 30 total)
  • #53814

    Buckle
    Participant

    Afternoon all,

    Looking for some help regarding non-normal data. I have data from before and after a new machine install and want to see whether the process is more capable now than it was previously. I was not sure which capability analysis to run, so I ran them all. I ran a non-normal capability analysis in Minitab (Weibull), which gave me one set of outputs, and then also ran it with a Box-Cox transformation. The Ppk values from the Box-Cox analysis were massively different from the Weibull ones. In fact they showed the opposite (i.e., Weibull showed that one process was better, and Box-Cox showed the other way around).

    Could anybody help me please?

    Thanks
    Mike

    0
    #191515

    MBBinWI
    Participant

    Mike: Understand that when you transform data, you must also transform the specification limits used to evaluate the transformed data.
    If you use a Box-Cox transform in Minitab, you should get an indicator as to the similar distribution being transformed to. Weibull is a very flexible distribution and can take on characteristics of many other distributions.
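
A minimal sketch of the point about transforming the specification limits, assuming the simple power form W = Y^lambda (natural log when lambda is 0) that Minitab's "W=Y…" dialog label hints at. The function names, the sample, and the limits are hypothetical and chosen only for illustration.

```python
import numpy as np

def power_transform(y, lam):
    """Simple power form of Box-Cox: W = Y**lam, or ln(Y) when lam == 0 (Y > 0)."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else y ** lam

def ppk_transformed(data, lsl, usl, lam):
    """Ppk on the transformed scale, with the spec limits transformed the same way."""
    w = power_transform(data, lam)
    # A negative lambda reverses the ordering, so sort the transformed limits.
    w_lo, w_hi = sorted((float(power_transform(lsl, lam)),
                         float(power_transform(usl, lam))))
    mean, sd = w.mean(), w.std(ddof=1)
    return min((w_hi - mean) / (3 * sd), (mean - w_lo) / (3 * sd))

# Hypothetical right-skewed sample and spec limits (not the poster's data).
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=-2.0, sigma=0.5, size=200)
print(ppk_transformed(sample, lsl=0.02, usl=0.8, lam=0.0))
```

With a negative lambda such as -0.5, the transformed LSL ends up above the transformed USL, which is why the sketch sorts them. Comparing transformed data against untransformed limits would give a meaningless Ppk.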

    0
    #191529

    Kharoo
    Participant

    If you are testing the impact, you could use a nonparametric test: Mann-Whitney.
    Note that it should be used only if the two samples (pre and post) follow the same distribution and have the same variance.

    Or else, if you have to check capability, go for the multiple variables (nonnormal) capability analysis,
    which will take care of the spec transformation as well.

    When using nonnormal capability, make sure you select the right distribution. It can easily be identified by running Stat > Quality Tools > Individual Distribution Identification.

    Hope this helps.

    Ashky
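
For readers working outside Minitab, here is a hedged sketch of the Mann-Whitney comparison using SciPy's `mannwhitneyu`. The helper name and the before/after samples are hypothetical; the samples are generated with the same shape and spread, as the post requires.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def shift_p_value(before, after):
    """Two-sided Mann-Whitney U test p-value for a shift between two samples."""
    result = mannwhitneyu(before, after, alternative="two-sided")
    return float(result.pvalue)

# Hypothetical before/after samples: same shape and spread, shifted location.
rng = np.random.default_rng(1)
before = rng.lognormal(mean=0.0, sigma=0.6, size=60)
after = rng.lognormal(mean=-0.3, sigma=0.6, size=60)
print(f"p = {shift_p_value(before, after):.4f}")
```

A small p-value only suggests that the two samples differ in location. It says nothing about how either one sits relative to the spec limits, so it does not replace a capability analysis.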

    0
    #191533

    Buckle
    Participant

    Thanks,

    I'm still a little confused as to the process I should follow for non-normal data capability analysis.

    Could anyone talk me through the steps I should take to understand which method to use? Does anyone have a decision tree?

    I'm very new to non-normal data.

    Thanks for help in advance
    Mike

    0
    #191534

    Mikel
    Member

    Mike,

    This ain’t rocket science. Look at histograms of the untransformed data and you will know which of the two is the correct analysis.

    When you understand that, you can figure out what was wrong with your method. I believe MBBinWI is leading you in the right direction: your specs probably were not transformed when you did the Box-Cox.

    0
    #191535

    Robert Butler
    Participant

    Assuming you have checked that the non-normality isn’t due to something like bimodality, sample truncation, or a few extreme data points (plotting and looking at the data using histograms, box plots, and normal probability plots are the usual ways to check for these things), and that you really are looking at data whose usual pattern is non-normal (for example, any kind of measurement with a natural lower or upper bound where you are working close to that bound), then the simplest thing to do is plot the data on a normal probability plot, print out the plot, and use a simple manual curve fit of the data to identify where the extrapolated fitted curve crosses the 0.135 and 99.865 percentiles (you can use the calibrated eyeball or a plain old French curve; most graphic arts supply houses have these things).

    Subtracting one of these two values from the other gives an estimate of the 6 sigma spread. Dividing the difference between your upper and lower spec limits by that estimated 6 sigma spread gives an estimate of the capability.

    The above should provide you with a reasonable answer to your question. The step-by-step details as well as the justification for this procedure can be found in Bothe’s book Measuring Process Capability – Chapter 8 – Measuring Capability for Non-Normal Variable Data. The book is very readable and, if you are going to have to deal with non-normal capability issues I would strongly recommend you either purchase a copy of the book or get a copy through inter-library loan and commit Chapter 8 to memory.
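
The arithmetic in the procedure above can be sketched as follows. The first function takes the two crossings read off the probability plot. The second is a stand-in, not the manual curve fit described in the post: it uses plain empirical percentiles, which cannot extrapolate beyond the sample. All names and numbers here are hypothetical.

```python
import numpy as np

def capability_from_crossings(x_low, x_high, lsl, usl):
    """Tolerance divided by the estimated 6-sigma spread (x_high - x_low)."""
    return (usl - lsl) / (x_high - x_low)

def empirical_crossings(data):
    """Empirical 0.135 and 99.865 percentiles (no extrapolation past the sample)."""
    x_low, x_high = np.percentile(data, [0.135, 99.865])
    return float(x_low), float(x_high)

# Hypothetical crossings and spec limits, for illustration only.
print(capability_from_crossings(2.0, 8.0, lsl=1.0, usl=10.0))  # 9 / 6 = 1.5
```

With only a few hundred points the empirical percentiles sit almost at the sample minimum and maximum, which is one reason the post recommends fitting and extrapolating a curve instead.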

    0
    #191536

    Kharoo
    Participant

    Minitab

    Step 1: Identify the distribution. It is essential to choose the correct distribution when conducting a capability analysis. You can use Individual Distribution Identification to select the distribution that best fits your data before conducting the capability analysis.

    Go to Stat > Quality Tools > Individual Distribution Identification.

    Input the data and specify the subgroup size. If no subgroups were formed, enter 1 and click OK.

    Step 2: Check the Minitab results in the Session window. You will find several distributions. Check the p-values and identify the distribution with the largest p-value (it should be more than 0.05).

    Step 3.1: If the Box-Cox p-value is more than 0.05:
    (1) Go to Stat > Quality Tools > Capability Analysis > Normal.
    (2) Click Box-Cox and tick "Box-Cox power transformation (W=Y…….)".

    ELSE

    Step 3.2: Go to Stat > Quality Tools > Capability Analysis > Nonnormal.
    Input the data and select the distribution that fits your data. You will find this option on the same screen.

    NOTE: If the Johnson transformation p-value is more than 0.05, simply click Johnson transformation instead of selecting a distribution.
    ELSE select the distribution with the largest p-value (as checked in Step 2).

    Step 4: Input your lower and upper specs and click OK.

    Minitab will transform both the data and the specs.

    Ashky
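
Since a decision tree was asked for, the steps above can be sketched as a small function. This is only a restatement of the posted steps, not Minitab logic. The function name, the returned labels, and the "no adequate fit" outcome are assumptions added for illustration, and the p-values in the example are hypothetical stand-ins (0.004 standing in for a reported "<0.005").

```python
def choose_capability_method(p_values, alpha=0.05):
    """Pick an analysis from a {fit name: Anderson-Darling p-value} mapping.

    Fits reported without a p-value should simply be left out of the mapping.
    """
    if p_values.get("Box-Cox Transformation", 0.0) > alpha:
        return ("Normal capability with Box-Cox", "Box-Cox Transformation")
    if p_values.get("Johnson Transformation", 0.0) > alpha:
        return ("Nonnormal capability with Johnson transformation",
                "Johnson Transformation")
    # Otherwise look for the distribution with the largest p-value.
    transforms = ("Box-Cox Transformation", "Johnson Transformation")
    candidates = {k: v for k, v in p_values.items() if k not in transforms}
    if candidates:
        best = max(candidates, key=candidates.get)
        if candidates[best] > alpha:
            return ("Nonnormal capability", best)
    # Assumption: nothing above alpha means no fit should be trusted as is.
    return ("No adequate fit", None)

# Hypothetical p-values for illustration.
example = {"Normal": 0.004, "Weibull": 0.009, "Johnson Transformation": 0.479}
print(choose_capability_method(example))
```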

    0
    #191537

    Buckle
    Participant

    Thanks guys,

    This is really helpful. I have just run the analysis as suggested, but I still don't get any decent p-value. The Weibull comes out at 0.10 even though the Box-Cox looks better.

    I have run this analysis on a few data sets and the Weibull always seems to be 0.10?

    Any more help is massively appreciated.

    Thanks
    Mike

    0
    #191538

    Kharoo
    Participant

    Hi Mike,

    Could you run the Individual Distribution Identification test and share the p-values for all distributions from the Minitab session window?

    Ashky

    0
    #191539

    Buckle
    Participant

    This is the analysis from the session window

    Goodness of Fit Test

    Distribution                     AD        P
    Normal                       35.628   <0.005
    Exponential                  13.129   <0.003
    Weibull                      12.689   <0.010
    Box-Cox Transformation        2.116   <0.005

    ML Estimates of Distribution Parameters

    Distribution               Location     Shape     Scale Threshold
    Normal*                     0.19160             0.40180
    Exponential                                     0.19160
    Weibull                               1.02389   0.19429
    Box-Cox Transformation*     2.87034             0.81335

    * Scale: Adjusted ML estimate

    0
    #191540

    Kharoo
    Participant

    Mike,

    Re-run the test, selecting the option “Use all distributions and transformations”,

    and share the results.

    Ashky

    0
    #191541

    Buckle
    Participant

    Distribution ID Plot for Oxygen %

    Descriptive Statistics

    N    N*  Mean    StDev     Median  Minimum  Maximum  Skewness  Kurtosis
    175  0   0.1916  0.401803  0.12    0.05     5.2      11.3099   140.627

    Box-Cox transformation: Lambda = -0.5

    Goodness of Fit Test

    Distribution                     AD        P    LRT P
    Normal                       35.628   <0.005
    Box-Cox Transformation        2.116   <0.005
    Lognormal                     4.235   <0.005
    3-Parameter Lognormal         1.082        *    0.000
    Exponential                  13.129   <0.003
    2-Parameter Exponential       7.156   <0.010    0.000
    Weibull                      12.689   <0.010
    3-Parameter Weibull           4.012   <0.005    0.000
    Smallest Extreme Value       55.646   <0.010
    Largest Extreme Value         9.909   <0.010
    Gamma                         9.941   <0.005
    3-Parameter Gamma             5.086        *    0.000
    Logistic                     11.900   <0.005
    Loglogistic                   3.613   <0.005
    3-Parameter Loglogistic       1.198        *    0.000

    ML Estimates of Distribution Parameters

    Distribution               Location     Shape     Scale Threshold
    Normal*                     0.19160             0.40180
    Box-Cox Transformation*     2.87034             0.81335
    Lognormal*                 -2.01040             0.67541
    3-Parameter Lognormal      -2.64071             1.07838   0.04715
    Exponential                                     0.19160
    2-Parameter Exponential                         0.14241   0.04919
    Weibull                               1.02389   0.19429
    3-Parameter Weibull                   0.79877   0.11871   0.04950
    Smallest Extreme Value      0.51875             1.23950
    Largest Extreme Value       0.12033             0.09325
    Gamma                                 1.54195   0.12426
    3-Parameter Gamma                     0.78275   0.18154   0.04950
    Logistic                    0.14578             0.08252
    Loglogistic                -2.08316             0.36781
    3-Parameter Loglogistic    -2.70149             0.63939   0.04893

    * Scale: Adjusted ML estimate

    0
    #191542

    Robert Butler
    Participant

    Any chance you could just post the numbers you are using – no units necessary just the data?

    0
    #191543

    Buckle
    Participant

    Here's the data set I'm using. Thanks.

    34.7956
    26.8407
    27.5982
    28.2627
    28.0241
    28.5170
    26.7090
    27.6946
    26.5507
    26.6109
    27.1106
    27.1987
    25.5562
    26.9535
    25.9623
    25.8414
    26.3387
    27.1976
    25.7848
    25.6298
    25.2313
    26.0340
    26.0751
    26.1579
    28.9148
    29.6561
    29.6696
    29.4520
    29.8276
    29.8614
    27.4307
    28.0247
    27.9960
    27.9969
    27.9465
    28.0591
    26.0738
    26.9244
    27.3275
    27.7118
    27.9261
    27.9071
    27.4087
    27.8203
    28.1117
    28.2959
    28.2170
    28.2986
    27.6946
    28.0826
    28.2375
    28.2478
    28.3426
    28.6024
    27.2496
    27.7920
    28.0731
    28.4528
    28.5585
    28.7453
    27.0332
    27.5707
    27.8537
    28.1381
    28.5959
    28.5170
    26.7016
    27.0112
    27.1330
    27.1691
    27.3697
    27.5656
    32.6204
    29.2555
    29.8183
    29.3440
    29.8260
    30.7933
    29.2004
    27.9739
    27.8901
    28.1965
    27.9616
    28.3761
    25.6178
    25.9589
    26.2786
    26.6628
    26.7916
    27.0770
    25.9280
    26.3653
    26.3974
    26.5317
    26.8214
    27.1037
    26.9260
    26.9369
    26.9410
    27.4516
    27.4663
    27.6769
    26.6496
    26.9369
    27.0602
    27.3973
    27.7303
    27.9285
    26.3913
    26.6135
    26.9135
    27.1531
    27.2643
    27.5955
    28.1394
    28.2081
    29.1518
    28.6612
    28.4477
    28.5848
    25.0587
    26.2044
    25.4356
    25.1662
    25.1546
    25.8400
    26.1900
    26.0915
    26.5535
    26.9541
    26.9261
    27.0047
    26.0933
    26.7576
    27.3581
    27.8301
    27.9156
    28.7666
    25.6002
    26.2510
    26.6499
    27.4160
    27.7058
    28.1594
    31.4738
    31.0888
    32.0138
    29.7196
    31.7220
    33.0185
    31.7841
    31.6263
    32.1091
    31.3272
    30.6816
    30.9591
    27.9471
    27.4519
    27.7643
    28.4875
    27.8002
    28.3449
    28.2695
    27.9110
    28.7224
    28.9212
    28.2042
    28.4878
    37.2891
    36.4794
    36.5538
    34.8407
    37.0576
    36.8199
    29.3376
    27.8850
    26.7118
    28.8572
    27.0282
    26.8887
    27.3125
    26.5359
    26.4336
    26.6632
    26.3718
    26.7741
    26.5888
    25.9281
    25.9868
    26.1036
    26.7571
    26.4790
    26.3027
    26.8911
    27.1491
    27.3750
    27.7376
    27.6690
    31.0505
    30.1861
    30.5296
    30.2761
    30.4650
    30.2560
    27.8038
    27.1651
    27.3153
    27.7210
    27.7498
    27.7579
    26.7461
    26.6210
    26.8018
    26.8091
    27.3648
    27.1790
    25.5695
    26.2970
    27.1850
    27.2957
    27.1416
    28.3137
    31.8318
    33.8098
    33.8048
    31.7869
    31.3226
    33.3913
    29.0797
    26.2804
    28.1706
    28.2725
    27.7638
    28.1400
    29.1175
    29.6619
    30.6951
    28.7511
    28.9438
    29.3498
    28.5866
    29.7306
    31.2539
    30.0896
    33.6808
    32.8666
    26.1217
    26.6297
    26.7408
    27.0675
    26.7338
    27.1886
    25.6845
    26.0578
    26.4023
    26.5925
    26.9061
    27.1112
    26.8118
    26.8760
    26.7835
    26.7711
    26.7167
    27.4371
    26.7007
    27.7044
    27.0059
    27.2474
    27.4930
    27.0218
    36.8939
    34.4951
    35.0471
    36.2435
    34.1718
    35.0347
    26.3611
    26.6210
    26.9345
    27.1359
    27.4240
    27.6216
    25.4229
    26.2527
    26.3901
    26.7237
    26.7666
    26.9844
    25.6695
    25.9475
    26.2830
    26.5131
    26.8022
    26.8726
    25.6699
    25.3544
    26.2830
    26.1016
    26.0110
    26.7766
    26.0742
    26.4514
    26.7077
    26.9841
    27.0983
    27.3415
    24.4919
    24.2229
    24.3698
    25.2690
    25.2958
    25.5700
    24.2427
    24.8514
    25.2029
    25.5368
    25.0419
    25.5963
    26.5402
    26.3338
    26.3020
    26.6214
    26.9099
    26.3791
    23.6948
    23.7258
    23.7252
    23.2964
    24.1735
    23.8892
    25.5572
    26.1148
    26.5335
    26.8512
    27.1508
    27.2778
    25.5469
    26.3975
    26.2776
    26.1952
    25.6502
    25.6393
    26.2231
    26.1593
    26.1949
    26.7796
    26.7510
    26.7322
    27.3824
    27.4510
    27.8248
    27.9959
    28.0076
    27.4788
    24.7737
    26.0821
    26.6679
    26.9115
    26.8756
    27.0165
    25.1133
    26.0836
    26.2061
    26.7339
    26.9681
    27.0353
    26.8872
    26.6035
    26.9758
    26.7511
    27.3441
    26.9257
    26.6280
    26.5230
    26.4236
    27.0349
    27.0189
    27.2362
    27.6946
    27.0698
    26.8508
    26.7322
    26.6823
    27.0009
    30.8092
    30.1542
    29.6811
    28.8638
    29.6141
    29.6047
    27.3978
    27.2643
    27.7867
    27.4393
    27.8973
    27.6674
    29.5683
    29.2834
    28.9750
    29.3176
    29.3082
    29.5079
    25.6699
    25.9878
    26.3601
    26.6649
    26.9844
    27.1390
    30.9969
    31.2682
    30.3084
    31.6985
    30.3726
    30.7860
    28.4472
    28.2375
    28.0390
    27.8368
    28.2416
    28.6905
    26.6076
    26.5888
    26.6732
    27.0251
    26.9844
    27.0420
    26.1023
    25.7791
    25.8738
    25.7900
    25.8699
    26.5662
    26.5283
    26.8287
    26.8424
    26.4742
    26.7511
    26.8474
    25.2658
    25.7601
    26.4801
    26.3354
    26.2701
    26.3636
    27.1786
    27.7303
    27.0659
    27.1946
    27.2856
    27.5092
    26.8449
    27.3690
    26.4078
    26.7404
    27.0059
    26.8944
    25.3596
    25.8513
    26.1941
    26.4474
    26.8860
    27.1946
    28.1601
    28.0247
    28.2682
    28.1226
    28.1525
    27.8758
    29.3656
    28.4575
    28.7647
    28.9819
    28.8436
    29.0577
    35.2271
    34.3236
    34.6877
    35.5343
    37.1490
    34.6552
    27.1224
    27.7240
    27.7775
    27.6616
    27.5416
    27.4472
    33.7149
    39.5269
    38.6333
    38.7396
    35.8785
    34.0548

    0
    #191544

    Kharoo
    Participant

    Check the Minitab results.

    You could use the Johnson transformation, as its p-value is more than 0.05.

    Step 1: Go to Stat > Quality Tools > Capability Analysis > Nonnormal
    and click the “Johnson transformation” button.

    Step 2: Input the data and specification limits, click OK, and you will get the output.

    I have attached the Minitab nonnormal process capability result, assuming specs of LSL 26 and USL 28.
    Check the attachment.

    Ashky

    Goodness of Fit Test

    Distribution                     AD        P    LRT P
    Normal                       30.411   <0.005
    Box-Cox Transformation        5.528   <0.005
    Lognormal                    23.988   <0.005
    3-Parameter Lognormal         9.423        *    0.000
    Exponential                 194.741   <0.003
    2-Parameter Exponential      76.712   <0.010    0.000
    Weibull                      50.231   <0.010
    3-Parameter Weibull          21.746   <0.005    0.000
    Smallest Extreme Value       58.701   <0.010
    Largest Extreme Value         8.483   <0.010
    Gamma                        26.043   <0.005
    3-Parameter Gamma            13.128        *    0.000
    Logistic                     16.000   <0.005
    Loglogistic                  12.750   <0.005
    3-Parameter Loglogistic       4.293        *    0.000
    Johnson Transformation        0.347    0.479

    0
    #191545

    Kharoo
    Participant

    Due to some technical issues, I am not able to attach the image file.

    Mike, I hope you have the results.

    Ashky

    0
    #191546

    Buckle
    Participant

    Thanks!! A great lesson.

    I didn't see the Johnson p-value; I hadn't heard of that one before.

    Thanks again.
    Mike

    0
    #191552

    Buckle
    Participant

    I now have some more data (all part of the same thing) and the best distribution is the Weibull (although with only a 0.010 p-value).

    When I do a capability analysis on this, the Ppk is great and shows no defects. However, I know that there are many defects. I guess I need to transform the LCL and UCL? How can I do this?

    Any help would be massively appreciated.

    Thanks
    Mike

    0
    #191555

    Robert Butler
    Participant

    I’m missing something here. If I understand the comments made by others concerning Minitab output, it looks like P < .05 means that the fit is not correct. If that is the case, then P = .01 for Weibull would mean this isn't the correct fit.

    I went ahead and plotted the data you provided – a quick eyeball says the curve crosses .135 and 99.865 at 23.6 and 40.2. If you take the difference of these two numbers and divide the tolerance by this difference how close is it to the capability estimate you got when you ran the Johnson transform (by the way – which Johnson family does Minitab use Su, Sl, or Sb)?

    0
    #191556

    Buckle
    Participant

    Hi.

    The p-value we're looking for in normality checks is P > (greater than) 0.005, and the Weibull is the best fit at 0.010, so it is higher than that but not great. The data set posted above, which you analysed, is different from the one I'm analysing now. This one has 4000 data points, so I can't publish it here. I have tried to attach the file….

    This is the output I get in the session window from the new data set. None are great but Weibull is the best…… I'm now stuck as to what to do. Any help is massively appreciated.

    Goodness of Fit Test

    Distribution                     AD        P    LRT P
    Normal                     2883.764   <0.005
    Box-Cox Transformation      110.589   <0.005
    Lognormal                   162.833   <0.005
    3-Parameter Lognormal       148.337        *    0.000
    Exponential                 963.921   <0.003
    2-Parameter Exponential     658.669   <0.010    0.000
    Weibull                     968.306   <0.010
    3-Parameter Weibull         708.063   <0.005    0.000
    Smallest Extreme Value     4422.141   <0.010
    Largest Extreme Value       652.429   <0.010
    Gamma                       765.221   <0.005
    3-Parameter Gamma           623.345        *    0.000
    Logistic                    926.114   <0.005
    Loglogistic                  57.526   <0.005
    3-Parameter Loglogistic      73.337        *    0.000

    0
    #191557

    Robert Butler
    Participant

    I don’t follow the logic concerning the p-values so I’ll just chalk that up to not knowing how Minitab presents its output. In the meantime – did you try plotting the data on normal probability paper?

    If you look at the curve for the data listed above, then a plot of either the raw data or the log-transformed data gives about the same crossing points for the percentages mentioned before. I’d recommend trying this with the data set you listed as well as with the expanded data and seeing what you get. If the crossings are about the same, then the method outlined in Bothe’s book would be the approach I would recommend.

    0
    #191564

    MBBinWI
    Participant

    Robert Butler wrote:

    The above should provide you with a reasonable answer to your question. The step-by-step details as well as the justification for this procedure can be found in Bothe’s book Measuring Process Capability – Chapter 8 – Measuring Capability for Non-Normal Variable Data. The book is very readable and, if you are going to have to deal with non-normal capability issues I would strongly recommend you either purchase a copy of the book or get a copy through inter-library loan and commit Chapter 8 to memory.

    Hey, Robert – excellent reference. And believe it or not, Davis is a neighbor of mine (at least we share the same zip code). Want an autographed copy?

    0
    #191565

    MBBinWI
    Participant

    Robert Butler wrote:

    I don’t follow the logic concerning the p-values so I’ll just chalk that up to not knowing how Minitab presents its output. In the meantime – did you try plotting the data on normal probability paper?

    Wow! It’s the rare day that I can teach (or at least communicate) something to RB. Here you go (direct from Minitab help):

    Anderson-Darling (AD) statistic

    Measures how well the data follow a particular distribution. Smaller Anderson-Darling values indicate that the distribution fits the data better. Use the Anderson-Darling statistic to compare the fit of several distributions to see which one is best or to test whether a sample of data comes from a population with a specified distribution.

    If the p-value (when available) for the Anderson-Darling test is lower than the chosen significance level (usually 0.05 or 0.10), conclude that the data do not follow the specified distribution. Minitab does not always display a p-value for the Anderson-Darling test because it does not mathematically exist for certain cases.

    If you are trying to determine which distribution the data follow and you have multiple Anderson-Darling statistics, compare them. The distribution with the smallest Anderson-Darling statistic has the closest fit to the data. If distributions have similar Anderson-Darling statistics, choose one based on practical knowledge.
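
To make the "smallest Anderson-Darling statistic" comparison concrete, here is a hedged sketch that computes the textbook statistic, A^2 = -n - (1/n) * sum((2i - 1) * [ln F(x_i) + ln(1 - F(x_(n+1-i)))]), for a few maximum-likelihood fits from SciPy. The helper names and the sample are hypothetical, and the values are not guaranteed to match Minitab's, which may apply its own adjustments.

```python
import numpy as np
from scipy import stats

def anderson_darling(data, cdf):
    """Anderson-Darling statistic A^2 of `data` against a fully specified CDF."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    # Clip to avoid log(0) for points far out in the tails.
    f = np.clip(cdf(x), 1e-12, 1.0 - 1e-12)
    i = np.arange(1, n + 1)
    return float(-n - np.sum((2 * i - 1) * (np.log(f) + np.log(1.0 - f[::-1]))) / n)

def rank_fits(data):
    """Fit a few candidate distributions by maximum likelihood and sort by A^2."""
    fits = {
        "Normal": stats.norm(*stats.norm.fit(data)),
        "Lognormal": stats.lognorm(*stats.lognorm.fit(data, floc=0)),
        "Weibull": stats.weibull_min(*stats.weibull_min.fit(data, floc=0)),
    }
    scores = {name: anderson_darling(data, dist.cdf) for name, dist in fits.items()}
    return dict(sorted(scores.items(), key=lambda kv: kv[1]))

# Hypothetical right-skewed sample, not data from this thread.
rng = np.random.default_rng(2)
sample = rng.lognormal(mean=-2.0, sigma=0.7, size=300)
for name, a2 in rank_fits(sample).items():
    print(f"{name:10s} AD = {a2:.3f}")
```

The ranking only says which candidate is closest. As the help text above notes, a small p-value can still mean that even the closest candidate does not fit.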

    0
    #194652

    Hello. I have a similar case. My data set (85) clusters around the same five values. It does not fit any distribution. I cannot change my measuring instruments, and I have to demonstrate capability for a tolerance of 6 +/- 0,5 (obviously this process is capable). What do you recommend I do?
    6,03
    6,01
    6,02
    6,03
    6,01
    6,05
    6,02
    6,03
    6,03
    6,02
    6,03
    6,01
    6,02
    6,02
    6,02
    6,02
    6,02
    6,01
    6,02
    6,01
    6,04
    6,04
    6,01
    6,03
    6,03
    6,01
    6,02
    6,05
    6,04
    6,04
    6,02
    6,01
    6,01
    6,02
    6,02
    6,02
    6,02
    6,04
    6,02
    6,04
    6,01
    6,03
    6,02
    6,05
    6,02
    6,03
    6,02
    6,05
    6,05
    6,03
    6,03
    6,03
    6,03
    6,05
    6,05
    6,03
    6,02
    6,03
    6,03
    6,03
    6,03
    6,05
    6,03
    6,04
    6,05
    6,06
    6,01
    6,02
    6,02
    6,03
    6,02
    6,04
    6,06
    6,04
    6,03
    6,01
    6,03
    6,04
    6,03
    6,02
    6,02
    6,02

    0
    #194654

    Chris Seider
    Participant

    You could increase precision or just use the % estimated outside spec and convert that to a process capability index. Have you done a precision study at a minimum? Is the device accurate? If it was off by .02, it would have bad process capability.
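
One common convention for converting an estimated fraction outside spec into a capability index is to find the standard normal Z that leaves that fraction in the tail and divide by 3. The sketch below assumes that convention; the function name is hypothetical, and the estimated fraction itself has to come from somewhere else (it must be strictly between 0 and 1).

```python
from scipy.stats import norm

def capability_from_fraction_out(fraction_out):
    """Ppk-style index equivalent to a total estimated fraction out of spec."""
    if not 0.0 < fraction_out < 1.0:
        raise ValueError("fraction_out must be strictly between 0 and 1")
    z_bench = norm.isf(fraction_out)  # Z with `fraction_out` in the upper tail
    return float(z_bench / 3.0)

# Hypothetical estimate: 0.135 % of parts outside spec.
print(capability_from_fraction_out(0.00135))
```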

    0
    #194663

    johnb
    Guest

    Buckle,

    1. Did you plan and oversee the collection effort for both data sets?
    2. How is this capability comparison going to be used (assuming it's resolved)?

    0
    #194701

    Joel Smith
    Participant

    @mikebuckle – Assuming the data you posted are in time order, then the reason you cannot find a good distribution fit is that they are not in control. Try making an I-MR Chart of your data and it should be obvious.

    When data are not in control, then your data are not all coming from the same or even similar distributions. Throw a bunch of wildly varying distributions together and you get a mess that does not match a known distribution.

    More importantly, if your data are not in control then doing capability analysis does not have much value to you.
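
A minimal sketch of the I-MR check suggested above, using the standard individuals-chart constants for a moving range of two (2.66 = 3/d2 with d2 = 1.128, and D4 = 3.267). The function names and readings are hypothetical, and only the "point beyond the control limits" rule is checked.

```python
import numpy as np

def imr_limits(x):
    """Individuals and moving-range chart limits (moving range of span 2)."""
    x = np.asarray(x, dtype=float)
    mr_bar = np.abs(np.diff(x)).mean()
    center = x.mean()
    return {
        "center": center,
        "lcl": center - 2.66 * mr_bar,  # 2.66 = 3 / d2, d2 = 1.128 for n = 2
        "ucl": center + 2.66 * mr_bar,
        "mr_ucl": 3.267 * mr_bar,       # D4 = 3.267 for n = 2
    }

def points_outside_limits(x):
    """Indices of individual values beyond the I-chart control limits."""
    x = np.asarray(x, dtype=float)
    lim = imr_limits(x)
    return [int(i) for i in np.flatnonzero((x < lim["lcl"]) | (x > lim["ucl"]))]

# Hypothetical time-ordered readings with one obvious excursion.
readings = [10, 12, 10, 12, 10, 12, 30]
print(imr_limits(readings))
print(points_outside_limits(readings))
```

The data must be in time order for this to mean anything, which is the assumption stated in the post.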

    0
    #194702

    Joel Smith
    Participant

    Adriana – Your data also show a slight shift around point 48 but nothing compared to the other dataset shown.

    In any event, I’m not sure distribution fit is all that important here…you don’t have any data that is even remotely in the ballpark of your specs, and barring a process shift or special cause you will never ever see a part out of spec. Assigning a specific sigma level, Cpk, etc. is a pointless exercise. Make a histogram of the data and plot the specs as reference lines.

    0
    #197716

    Juan Balderrama
    Guest

    Hello,

    Right now I am working on an equipment validation, and when identifying the distribution of the data, the 3-parameter Weibull is the one with a p-value > .05.

    But I need to transform the data to fit the 3-parameter Weibull distribution. How can I transform the limits to the same distribution?

    I need the transformed limits as well, please help me…

    Thanks for your help…

    0
    #197718

    Chris Seider
    Participant

    Don’t make yourself sweat so hard. If using Minitab, use capability analysis non-normal. Pick the distribution you want and then it will transform the data into a best fit curve and the original data is still shown.
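
When a distribution is fitted directly rather than the data being transformed, the spec limits stay in their original units. One widely used percentile convention, sketched below, compares the limits with the fitted 0.135th, 50th, and 99.865th percentiles. Whether this matches a particular Minitab output depends on the options chosen, and the Weibull parameters, limits, and function name here are hypothetical.

```python
from scipy.stats import norm, weibull_min

def percentile_ppk(dist, lsl, usl):
    """Ppk from fitted percentiles; the spec limits are used untransformed."""
    x_low, x_med, x_high = dist.ppf([0.00135, 0.5, 0.99865])
    ppl = (x_med - lsl) / (x_med - x_low)
    ppu = (usl - x_med) / (x_high - x_med)
    return float(min(ppl, ppu))

# Hypothetical 3-parameter Weibull: shape c, threshold loc, scale.
fitted = weibull_min(c=1.5, loc=5.0, scale=2.0)
print(percentile_ppk(fitted, lsl=4.0, usl=15.0))

# Sanity check: for a normal distribution this reduces to the usual Ppk.
print(percentile_ppk(norm(loc=10.0, scale=1.0), lsl=7.0, usl=13.0))
```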

    0