Non-normal data
- This topic has 29 replies, 7 voices, and was last updated 7 years, 6 months ago by
Chris Seider.
May 31, 2011 at 1:43 pm #53814
Buckle (@mikebuckle):
Afternoon all,
I’m looking for some help with non-normal data. I have data from before and after a new machine install and want to see whether the process is more capable now than it was previously. I wasn’t sure which capability analysis to run, so I ran them all. I ran a non-normal capability analysis in Minitab (Weibull), which gave me one set of outputs, and then also ran it with a Box-Cox transformation. The Ppk values from the Box-Cox run were massively different from those from the Weibull run. In fact, they showed the opposite (i.e., Weibull showed that one process was better, and Box-Cox showed the other way around).
Could anybody help me, please?
Thanks
Mike
June 1, 2011 at 3:38 am #191515
MBBinWI (@MBBinWI):
Mike: Understand that when you transform data, you must also transform the specification limits used to evaluate the transformed data.
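To make the point about the limits concrete, here is a minimal Python sketch (not Minitab’s implementation). It assumes NumPy and SciPy are available; the function name `boxcox_ppk` and the sample numbers in the usage lines are made up for illustration. SciPy uses the form (y^λ − 1)/λ, which keeps the transformed LSL below the transformed USL for any λ, whereas a plain power W = Y^λ with a negative λ reverses the order of the limits.

```python
import numpy as np
from scipy import special, stats


def boxcox_ppk(data, lsl, usl):
    """Ppk on the Box-Cox scale, with data and limits transformed by the same lambda."""
    data = np.asarray(data, dtype=float)  # Box-Cox needs strictly positive values
    transformed, lam = stats.boxcox(data)  # lambda estimated by maximum likelihood
    # Transform the specification limits with the SAME lambda as the data.
    lsl_t = special.boxcox(lsl, lam)
    usl_t = special.boxcox(usl, lam)
    mean = transformed.mean()
    sd = transformed.std(ddof=1)  # overall standard deviation
    ppk = min((usl_t - mean) / (3 * sd), (mean - lsl_t) / (3 * sd))
    return {"lambda": lam, "lsl_t": lsl_t, "usl_t": usl_t, "ppk": ppk}


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sample = rng.lognormal(mean=3.3, sigma=0.1, size=200)  # made-up skewed data
    print(boxcox_ppk(sample, lsl=20.0, usl=40.0))
```

Evaluating transformed data against limits left in the original units is one plausible way to end up with Ppk values that contradict a Weibull-based analysis.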
If you use a Box-Cox transform in Minitab, you should get an indication of the similar distribution that the data are being transformed to. Weibull is a very flexible distribution and can take on the characteristics of many other distributions.
June 3, 2011 at 10:05 am #191529
Kharoo (@ashkharoo):
If you are testing the impact, you could use a nonparametric test: Mann-Whitney.
Note that it should be used only if the two samples (pre and post) follow the same distribution and have the same variance.
Otherwise, if you have to check capability, go for the multiple-variables capability analysis (nonnormal), which will take care of the spec transformation as well.
When using nonnormal capability analysis, make sure you select the right distribution. It can be easily identified by running Stat > Quality Tools > Individual Distribution Identification.
Hope this helps.
Ashky
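For readers working outside Minitab, the Mann-Whitney comparison can be sketched in Python with `scipy.stats.mannwhitneyu`. This is only an illustration: the helper name `compare_before_after`, the 0.05 threshold, and the generated data are assumptions, not anything posted in this thread.

```python
import numpy as np
from scipy import stats


def compare_before_after(before, after, alpha=0.05):
    """Two-sided Mann-Whitney U test for a shift between two independent samples."""
    result = stats.mannwhitneyu(before, after, alternative="two-sided")
    return {
        "u": float(result.statistic),
        "p": float(result.pvalue),
        "differs": bool(result.pvalue < alpha),  # True if a shift is detected
    }


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    # Made-up skewed data with the same shape and spread, shifted by 1 unit.
    before = rng.gamma(shape=2.0, scale=1.0, size=80) + 1.0
    after = rng.gamma(shape=2.0, scale=1.0, size=80)
    print(compare_before_after(before, after))
```

The test says whether the two samples are shifted relative to each other; it does not by itself give a capability index.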
June 3, 2011 at 1:14 pm #191533
Buckle (@mikebuckle):
Thanks,
I’m still a little confused about the process I should follow for non-normal capability analysis.
Could anyone talk me through the steps I should take to decide which method to use? Does anyone have a decision tree?
I’m very new to non-normal data.
Thanks in advance for the help.
Mike
June 3, 2011 at 1:52 pm #191534
Mike,
This ain’t rocket science. Look at histograms of the untransformed data and you will know which of the two is the correct analysis.
When you understand that, you can figure out what was wrong with your method. I believe MBBinWI is leading you in the right direction: your specs probably were not transformed when you did the Box-Cox.
June 3, 2011 at 2:04 pm #191535
Robert Butler (@rbutler):
Assuming you have checked to make sure that the non-normality isn’t due to something like bimodality, sample truncation, or a few extreme data points (plotting and looking at the data using histograms, box plots, and normal probability plots are the usual ways to check for these), and that you really are looking at data whose usual pattern is non-normal (for example, any kind of measurement where there is a natural lower or upper bound and you are working close to that bound), then the simplest thing to do is plot the data on a normal probability plot, print out the plot, and use a simple manual curve fit of the data to identify where the extrapolated fitted curve crosses the 0.135 and 99.865 percentiles (you can use the calibrated eyeball or a plain old French curve; most graphic arts supply houses have these things).
Subtracting the two values you get from your plotting efforts from one another will provide an estimate of the 6 sigma spread, and taking the difference between your lower and upper spec limits and dividing it by the estimated 6 sigma spread will give you an estimate of the capability.
The above should provide you with a reasonable answer to your question. The step-by-step details as well as the justification for this procedure can be found in Bothe’s book Measuring Process Capability – Chapter 8 – Measuring Capability for Non-Normal Variable Data. The book is very readable and, if you are going to have to deal with non-normal capability issues I would strongly recommend you either purchase a copy of the book or get a copy through inter-library loan and commit Chapter 8 to memory.
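As a rough numerical stand-in for the hand-drawn curve, the same ratio can be sketched in Python. Note the swap: this uses empirical percentiles from `numpy.percentile` rather than an extrapolated curve fit on probability paper, so with small samples the 0.135 and 99.865 percentiles collapse toward the minimum and maximum and the spread is understated. The function name is hypothetical.

```python
import numpy as np


def percentile_capability(data, lsl, usl):
    """Tolerance divided by the spread between the 0.135 and 99.865 percentiles."""
    data = np.asarray(data, dtype=float)
    # Empirical percentiles; a substitute for reading crossings off a fitted curve.
    low, high = np.percentile(data, [0.135, 99.865])
    spread = high - low  # plays the role of the 6 sigma spread
    return (usl - lsl) / spread


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    sample = rng.lognormal(mean=0.0, sigma=0.4, size=5000)  # made-up skewed data
    print(percentile_capability(sample, lsl=0.2, usl=4.0))
```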
June 3, 2011 at 2:34 pm #191536
Kharoo (@ashkharoo):
Minitab
Step 1: Identify the distribution. It is essential to choose the correct distribution when conducting a capability analysis. You can use Individual Distribution Identification to select the distribution that best fits your data prior to conducting a capability analysis.
Go to Stat > Quality Tools > Individual Distribution Identification.
Input the data and specify the subgroup size. If subgroups were not formed, enter 1 and click OK.
Step 2: Check the Minitab results in the Session window. You will find several distributions. Check the p-values and identify the distribution with the largest p-value (it should be more than 0.05).
Step 3.1: (1) If the Box-Cox p-value is more than 0.05, go to Stat > Quality Tools > Capability Analysis > Normal.
Step 3.1: (2) Click Box-Cox and tick “Box-Cox power transformation (W=Y…….)”.
ELSE
Step 3.2: Go to Stat > Quality Tools > Capability Analysis > Nonnormal.
Input the data and select the distribution that fits your data. You will find this option on the same screen.
NOTE: IF the Johnson transformation p-value is more than 0.05, simply click Johnson transformation instead of selecting a distribution.
ELSE select the distribution with the largest p-value (as checked in Step 2).
Step 4: Input your lower and upper specs and click OK.
Minitab will transform both the data and the specs.
Ashky
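The distribution-identification step above can be approximated outside Minitab. The sketch below is an assumption-laden illustration: it fits a handful of SciPy distributions by maximum likelihood (with the threshold pinned at zero for the positive-only families) and ranks them by a hand-computed Anderson-Darling statistic. It does not compute p-values, and its numbers will not necessarily match Minitab’s. The names `anderson_darling`, `CANDIDATES`, and `rank_distributions` are made up.

```python
import numpy as np
from scipy import stats


def anderson_darling(data, dist):
    """Anderson-Darling statistic of `data` against a frozen scipy distribution."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    cdf = np.clip(dist.cdf(x), 1e-12, 1 - 1e-12)  # guard against log(0)
    i = np.arange(1, n + 1)
    return -n - np.sum((2 * i - 1) * (np.log(cdf) + np.log(1 - cdf[::-1]))) / n


CANDIDATES = {
    "normal": stats.norm,
    "lognormal": stats.lognorm,
    "weibull": stats.weibull_min,
    "gamma": stats.gamma,
}


def rank_distributions(data):
    """Return (AD, name) pairs sorted so the smallest AD (closest fit) comes first."""
    data = np.asarray(data, dtype=float)
    results = []
    for name, family in CANDIDATES.items():
        # Two-parameter fits: fix the threshold (loc) at 0 for positive-only families.
        kwargs = {} if name == "normal" else {"floc": 0}
        params = family.fit(data, **kwargs)
        results.append((float(anderson_darling(data, family(*params))), name))
    return sorted(results)


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    sample = rng.lognormal(mean=0.0, sigma=0.8, size=500)  # made-up data
    for ad, name in rank_distributions(sample):
        print(f"{name:10s} AD = {ad:.3f}")
```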
June 3, 2011 at 3:06 pm #191537
Buckle (@mikebuckle):
Thanks guys,
This is really helpful. I have just run the analysis as suggested, but I still don’t get any decent p-value. The Weibull comes out at 0.10 even though the Box-Cox looks better.
I have run this analysis on a few data sets and the Weibull always seems to be 0.10?
Any more help is massively appreciated.
Thanks
Mike
June 3, 2011 at 3:10 pm #191538
Kharoo (@ashkharoo):
Hi Mike,
Could you run the “Individual Distribution Identification” analysis and share the p-values for all distributions from the Minitab Session window?
Ashky
June 3, 2011 at 3:12 pm #191539
Buckle (@mikebuckle):
This is the analysis from the Session window:
Goodness of Fit Test
Distribution                     AD       P
Normal                       35.628  <0.005
Exponential                  13.129  <0.003
Weibull                      12.689  <0.010
Box-Cox Transformation        2.116  <0.005
ML Estimates of Distribution Parameters
Distribution               Location     Shape     Scale Threshold
Normal*                     0.19160             0.40180
Exponential                                     0.19160
Weibull                               1.02389   0.19429
Box-Cox Transformation*     2.87034             0.81335
* Scale: Adjusted ML estimate
June 3, 2011 at 3:15 pm #191540
Kharoo (@ashkharoo):
Mike,
Re-run the test by selecting the “Use all distributions and transformations” option
and share the results.
Ashky
June 3, 2011 at 3:18 pm #191541
Buckle (@mikebuckle):
Distribution ID Plot for Oxygen %
Descriptive Statistics
N    N*  Mean    StDev     Median  Minimum  Maximum  Skewness  Kurtosis
175  0   0.1916  0.401803  0.12    0.05     5.2      11.3099   140.627
Box-Cox transformation: Lambda = -0.5
Goodness of Fit Test
Distribution                     AD       P   LRT P
Normal                       35.628  <0.005
Box-Cox Transformation        2.116  <0.005
Lognormal                     4.235  <0.005
3-Parameter Lognormal         1.082       *   0.000
Exponential                  13.129  <0.003
2-Parameter Exponential       7.156  <0.010   0.000
Weibull                      12.689  <0.010
3-Parameter Weibull           4.012  <0.005   0.000
Smallest Extreme Value       55.646  <0.010
Largest Extreme Value         9.909  <0.010
Gamma                         9.941  <0.005
3-Parameter Gamma             5.086       *   0.000
Logistic                     11.900  <0.005
Loglogistic                   3.613  <0.005
3-Parameter Loglogistic       1.198       *   0.000
ML Estimates of Distribution Parameters
Distribution               Location     Shape     Scale Threshold
Normal*                     0.19160             0.40180
Box-Cox Transformation*     2.87034             0.81335
Lognormal*                 -2.01040             0.67541
3-Parameter Lognormal      -2.64071             1.07838   0.04715
Exponential                                     0.19160
2-Parameter Exponential                         0.14241   0.04919
Weibull                               1.02389   0.19429
3-Parameter Weibull                   0.79877   0.11871   0.04950
Smallest Extreme Value      0.51875             1.23950
Largest Extreme Value       0.12033             0.09325
Gamma                                 1.54195   0.12426
3-Parameter Gamma                     0.78275   0.18154   0.04950
Logistic                    0.14578             0.08252
Loglogistic                -2.08316             0.36781
3-Parameter Loglogistic    -2.70149             0.63939   0.04893
* Scale: Adjusted ML estimate
June 3, 2011 at 3:27 pm #191542
Robert Butler (@rbutler):
Any chance you could just post the numbers you are using? No units necessary, just the data.
June 3, 2011 at 3:37 pm #191543
Buckle (@mikebuckle):
Here’s the data set I’m using. Thanks.
34.7956
26.8407
27.5982
28.2627
28.0241
28.5170
26.7090
27.6946
26.5507
26.6109
27.1106
27.1987
25.5562
26.9535
25.9623
25.8414
26.3387
27.1976
25.7848
25.6298
25.2313
26.0340
26.0751
26.1579
28.9148
29.6561
29.6696
29.4520
29.8276
29.8614
27.4307
28.0247
27.9960
27.9969
27.9465
28.0591
26.0738
26.9244
27.3275
27.7118
27.9261
27.9071
27.4087
27.8203
28.1117
28.2959
28.2170
28.2986
27.6946
28.0826
28.2375
28.2478
28.3426
28.6024
27.2496
27.7920
28.0731
28.4528
28.5585
28.7453
27.0332
27.5707
27.8537
28.1381
28.5959
28.5170
26.7016
27.0112
27.1330
27.1691
27.3697
27.5656
32.6204
29.2555
29.8183
29.3440
29.8260
30.7933
29.2004
27.9739
27.8901
28.1965
27.9616
28.3761
25.6178
25.9589
26.2786
26.6628
26.7916
27.0770
25.9280
26.3653
26.3974
26.5317
26.8214
27.1037
26.9260
26.9369
26.9410
27.4516
27.4663
27.6769
26.6496
26.9369
27.0602
27.3973
27.7303
27.9285
26.3913
26.6135
26.9135
27.1531
27.2643
27.5955
28.1394
28.2081
29.1518
28.6612
28.4477
28.5848
25.0587
26.2044
25.4356
25.1662
25.1546
25.8400
26.1900
26.0915
26.5535
26.9541
26.9261
27.0047
26.0933
26.7576
27.3581
27.8301
27.9156
28.7666
25.6002
26.2510
26.6499
27.4160
27.7058
28.1594
31.4738
31.0888
32.0138
29.7196
31.7220
33.0185
31.7841
31.6263
32.1091
31.3272
30.6816
30.9591
27.9471
27.4519
27.7643
28.4875
27.8002
28.3449
28.2695
27.9110
28.7224
28.9212
28.2042
28.4878
37.2891
36.4794
36.5538
34.8407
37.0576
36.8199
29.3376
27.8850
26.7118
28.8572
27.0282
26.8887
27.3125
26.5359
26.4336
26.6632
26.3718
26.7741
26.5888
25.9281
25.9868
26.1036
26.7571
26.4790
26.3027
26.8911
27.1491
27.3750
27.7376
27.6690
31.0505
30.1861
30.5296
30.2761
30.4650
30.2560
27.8038
27.1651
27.3153
27.7210
27.7498
27.7579
26.7461
26.6210
26.8018
26.8091
27.3648
27.1790
25.5695
26.2970
27.1850
27.2957
27.1416
28.3137
31.8318
33.8098
33.8048
31.7869
31.3226
33.3913
29.0797
26.2804
28.1706
28.2725
27.7638
28.1400
29.1175
29.6619
30.6951
28.7511
28.9438
29.3498
28.5866
29.7306
31.2539
30.0896
33.6808
32.8666
26.1217
26.6297
26.7408
27.0675
26.7338
27.1886
25.6845
26.0578
26.4023
26.5925
26.9061
27.1112
26.8118
26.8760
26.7835
26.7711
26.7167
27.4371
26.7007
27.7044
27.0059
27.2474
27.4930
27.0218
36.8939
34.4951
35.0471
36.2435
34.1718
35.0347
26.3611
26.6210
26.9345
27.1359
27.4240
27.6216
25.4229
26.2527
26.3901
26.7237
26.7666
26.9844
25.6695
25.9475
26.2830
26.5131
26.8022
26.8726
25.6699
25.3544
26.2830
26.1016
26.0110
26.7766
26.0742
26.4514
26.7077
26.9841
27.0983
27.3415
24.4919
24.2229
24.3698
25.2690
25.2958
25.5700
24.2427
24.8514
25.2029
25.5368
25.0419
25.5963
26.5402
26.3338
26.3020
26.6214
26.9099
26.3791
23.6948
23.7258
23.7252
23.2964
24.1735
23.8892
25.5572
26.1148
26.5335
26.8512
27.1508
27.2778
25.5469
26.3975
26.2776
26.1952
25.6502
25.6393
26.2231
26.1593
26.1949
26.7796
26.7510
26.7322
27.3824
27.4510
27.8248
27.9959
28.0076
27.4788
24.7737
26.0821
26.6679
26.9115
26.8756
27.0165
25.1133
26.0836
26.2061
26.7339
26.9681
27.0353
26.8872
26.6035
26.9758
26.7511
27.3441
26.9257
26.6280
26.5230
26.4236
27.0349
27.0189
27.2362
27.6946
27.0698
26.8508
26.7322
26.6823
27.0009
30.8092
30.1542
29.6811
28.8638
29.6141
29.6047
27.3978
27.2643
27.7867
27.4393
27.8973
27.6674
29.5683
29.2834
28.9750
29.3176
29.3082
29.5079
25.6699
25.9878
26.3601
26.6649
26.9844
27.1390
30.9969
31.2682
30.3084
31.6985
30.3726
30.7860
28.4472
28.2375
28.0390
27.8368
28.2416
28.6905
26.6076
26.5888
26.6732
27.0251
26.9844
27.0420
26.1023
25.7791
25.8738
25.7900
25.8699
26.5662
26.5283
26.8287
26.8424
26.4742
26.7511
26.8474
25.2658
25.7601
26.4801
26.3354
26.2701
26.3636
27.1786
27.7303
27.0659
27.1946
27.2856
27.5092
26.8449
27.3690
26.4078
26.7404
27.0059
26.8944
25.3596
25.8513
26.1941
26.4474
26.8860
27.1946
28.1601
28.0247
28.2682
28.1226
28.1525
27.8758
29.3656
28.4575
28.7647
28.9819
28.8436
29.0577
35.2271
34.3236
34.6877
35.5343
37.1490
34.6552
27.1224
27.7240
27.7775
27.6616
27.5416
27.4472
33.7149
39.5269
38.6333
38.7396
35.8785
34.0548
June 3, 2011 at 3:47 pm #191544
Kharoo (@ashkharoo):
Check the Minitab results.
You could use the Johnson transformation, as its p-value is more than 0.05.
Step 1: Go to Stat > Quality Tools > Capability Analysis > Nonnormal
and click the “Johnson transformation” button.
Step 2: Input the data and the specification limits and click OK, and you will get the output.
I have attached the Minitab nonnormal process capability result, assuming specs of LSL 26 and USL 28.
Check the attachment.
Ashky
Goodness of Fit Test
Distribution                     AD       P   LRT P
Normal                       30.411  <0.005
Box-Cox Transformation        5.528  <0.005
Lognormal                    23.988  <0.005
3-Parameter Lognormal         9.423       *   0.000
Exponential                 194.741  <0.003
2-Parameter Exponential      76.712  <0.010   0.000
Weibull                      50.231  <0.010
3-Parameter Weibull          21.746  <0.005   0.000
Smallest Extreme Value       58.701  <0.010
Largest Extreme Value         8.483  <0.010
Gamma                        26.043  <0.005
3-Parameter Gamma            13.128       *   0.000
Logistic                     16.000  <0.005
Loglogistic                  12.750  <0.005
3-Parameter Loglogistic       4.293       *   0.000
Johnson Transformation        0.347   0.479
June 3, 2011 at 4:03 pm #191545
Kharoo (@ashkharoo):
Due to some technical issues, I am not able to attach the image file.
Mike, I hope you have the results.
Ashky
June 3, 2011 at 4:04 pm #191546
Buckle (@mikebuckle):
Thanks!! A great lesson.
I didn’t see the Johnson p-value; I had not heard of that one before.
Thanks again.
Mike
June 6, 2011 at 9:35 am #191552
Buckle (@mikebuckle):
I now have some more data (all part of the same thing), and the best distribution is the Weibull (although only a 0.010 p-value).
When I do a capability analysis on this, the Ppk is great and shows no defects. However, I know that there are many defects. I guess I need to transform the LCL and UCL? How can I do this?
Any help would be massively appreciated.
Thanks
Mike
June 6, 2011 at 12:51 pm #191555
Robert Butler (@rbutler):
I’m missing something here. If I understand the comments made by others concerning Minitab output, it looks like P < .05 means that the fit is not correct. If that is the case, then P = .01 for Weibull would mean this isn’t the correct fit.
I went ahead and plotted the data you provided. A quick eyeball says the curve crosses the 0.135 and 99.865 percentiles at 23.6 and 40.2. If you take the difference of these two numbers and divide the tolerance by this difference, how close is it to the capability estimate you got when you ran the Johnson transform? (By the way, which Johnson family does Minitab use: Su, Sl, or Sb?)
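Plugging in numbers gives a feel for the size of that estimate. The crossings are the eyeballed values from this post; the spec limits of 26 and 28 are only the values assumed earlier in the thread for the attached example, not limits confirmed by Mike.

```python
lsl, usl = 26.0, 28.0        # assumed specs from the earlier example, not confirmed
x_low, x_high = 23.6, 40.2   # eyeballed crossings at the 0.135 and 99.865 percentiles

spread = x_high - x_low              # estimated 6 sigma spread
capability = (usl - lsl) / spread    # tolerance divided by the spread
print(f"spread = {spread:.1f}, capability = {capability:.2f}")
# prints: spread = 16.6, capability = 0.12
```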
June 6, 2011 at 2:34 pm #191556
Buckle (@mikebuckle):
Hi.
The p-value we’re looking for in normality checks is P > 0.005, and the Weibull is the best fit at 0.010, so it is higher than that but not great. The data set you analysed, which was posted above, is different from the one I’m analysing now. This one has 4000 data points, so I can’t publish it here. I have tried to attach the file.
This is the output I get in the Session window from the new data set. None are great, but Weibull is the best. I’m now stuck as to what to do. Any help is massively appreciated.
Goodness of Fit Test
Distribution                     AD       P   LRT P
Normal                     2883.764  <0.005
Box-Cox Transformation      110.589  <0.005
Lognormal                   162.833  <0.005
3-Parameter Lognormal       148.337       *   0.000
Exponential                 963.921  <0.003
2-Parameter Exponential     658.669  <0.010   0.000
Weibull                     968.306  <0.010
3-Parameter Weibull         708.063  <0.005   0.000
Smallest Extreme Value     4422.141  <0.010
Largest Extreme Value       652.429  <0.010
Gamma                       765.221  <0.005
3-Parameter Gamma           623.345       *   0.000
Logistic                    926.114  <0.005
Loglogistic                  57.526  <0.005
3-Parameter Loglogistic      73.337       *   0.000
June 6, 2011 at 2:56 pm #191557
Robert Butler (@rbutler):
I don’t follow the logic concerning the p-values, so I’ll just chalk that up to not knowing how Minitab presents its output. In the meantime, did you try plotting the data on normal probability paper?
If you look at the curve for the data listed above, a plot of either the raw data or the log-transformed data gives about the same crossing points for the percentiles mentioned before. I’d recommend trying this with the data set you listed as well as with the expanded data and seeing what you get. If the crossings are about the same, then the method outlined in Bothe’s book would be the approach I would recommend.
June 10, 2011 at 2:29 am #191564
MBBinWI (@MBBinWI):
Robert Butler wrote:
The above should provide you with a reasonable answer to your question. The step-by-step details as well as the justification for this procedure can be found in Bothe’s book Measuring Process Capability – Chapter 8 – Measuring Capability for Non-Normal Variable Data. The book is very readable and, if you are going to have to deal with non-normal capability issues I would strongly recommend you either purchase a copy of the book or get a copy through inter-library loan and commit Chapter 8 to memory.
Hey, Robert – excellent reference. And believe it or not, Davis is a neighbor of mine (at least we share the same zip code). Want an autographed copy?
June 10, 2011 at 2:47 am #191565
MBBinWI (@MBBinWI):
Robert Butler wrote:
I don’t follow the logic concerning the p-values so I’ll just chalk that up to not knowing how Minitab presents its output. In the meantime – did you try plotting the data on normal probability paper?
Wow! It’s the rare day that I can teach (or at least communicate) something to RB. Here you go (direct from Minitab help):
Anderson-Darling (AD) statistic
Measures how well the data follow a particular distribution. Smaller Anderson-Darling values indicate that the distribution fits the data better. Use the Anderson-Darling statistic to compare the fit of several distributions to see which one is best or to test whether a sample of data comes from a population with a specified distribution.
If the p-value (when available) for the Anderson-Darling test is lower than the chosen significance level (usually 0.05 or 0.10), conclude that the data do not follow the specified distribution. Minitab does not always display a p-value for the Anderson-Darling test because it does not mathematically exist for certain cases.
If you are trying to determine which distribution the data follow and you have multiple Anderson-Darling statistics, compare them. The distribution with the smallest Anderson-Darling statistic has the closest fit to the data. If distributions have similar Anderson-Darling statistics, choose one based on practical knowledge.
January 30, 2013 at 10:38 am #194652
Hello. I have a similar case. My data set (85 values) clusters around the same five values. It does not fit any distribution. I cannot change my measuring instruments, and I have to demonstrate capability for a tolerance of 6 +/- 0,5 (obviously this process is capable). What do you recommend I do?
6,03
6,01
6,02
6,03
6,01
6,05
6,02
6,03
6,03
6,02
6,03
6,01
6,02
6,02
6,02
6,02
6,02
6,01
6,02
6,01
6,04
6,04
6,01
6,03
6,03
6,01
6,02
6,05
6,04
6,04
6,02
6,01
6,01
6,02
6,02
6,02
6,02
6,04
6,02
6,04
6,01
6,03
6,02
6,05
6,02
6,03
6,02
6,05
6,05
6,03
6,03
6,03
6,03
6,05
6,05
6,03
6,02
6,03
6,03
6,03
6,03
6,05
6,03
6,04
6,05
6,06
6,01
6,02
6,02
6,03
6,02
6,04
6,06
6,04
6,03
6,01
6,03
6,04
6,03
6,02
6,02
6,02
January 30, 2013 at 1:32 pm #194654
Chris Seider (@cseider):
You could increase the precision, or just use the estimated % outside spec and convert that to a process capability index. Have you done a precision study at a minimum? Is the device accurate? If it was off by .02, it would have bad process capability.
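One way to turn an estimated fraction outside spec into an index is sketched below. The convention used, an equivalent Ppk equal to the standard normal Z-value of the worse tail divided by 3, is an assumption for illustration and may not be exactly the conversion meant here; the function name is hypothetical. An observed fraction of zero cannot be converted, so the fraction has to come from an estimate.

```python
from scipy import stats


def ppk_from_fraction_out(p_upper, p_lower=0.0):
    """Equivalent Ppk from the estimated fractions beyond USL and LSL."""
    p_worst = max(p_upper, p_lower)  # the worse tail drives Ppk
    if not 0.0 < p_worst < 1.0:
        raise ValueError("need an estimated out-of-spec fraction strictly between 0 and 1")
    z = stats.norm.isf(p_worst)  # Z-value with p_worst in the upper tail
    return float(z / 3.0)


if __name__ == "__main__":
    # 0.135 % beyond one limit corresponds to roughly 3 sigma, so Ppk is about 1.
    print(round(ppk_from_fraction_out(0.00135), 3))
```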
January 31, 2013 at 2:00 pm #194663
Buckle,
1. Did you plan and oversee the collection effort for both data sets?
2. How is this capability comparison going to be used (assuming it’s resolved)?
February 7, 2013 at 2:09 pm #194701
Joel Smith (@joelsmith):
@mikebuckle – Assuming the data you posted are in time order, then the reason you cannot find a good distribution fit is that they are not in control. Try making an I-MR Chart of your data and it should be obvious.
When data are not in control, then your data are not all coming from the same or even similar distributions. Throw a bunch of wildly varying distributions together and you get a mess that does not match a known distribution.
More importantly, if your data are not in control then doing capability analysis does not have much value to you.
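For anyone without Minitab to hand, the individuals-chart part of that check can be sketched in Python. The constants 2.66 and 3.267 are the usual I-MR factors for a moving range of two points; the function names and the short made-up series are illustrative only, and the sketch applies just the basic beyond-the-limits rule.

```python
import numpy as np


def imr_limits(values):
    """Center line and control limits for an individuals and moving-range chart."""
    x = np.asarray(values, dtype=float)
    mr_bar = np.abs(np.diff(x)).mean()  # average moving range of consecutive points
    center = x.mean()
    return {
        "center": center,
        "ucl": center + 2.66 * mr_bar,
        "lcl": center - 2.66 * mr_bar,
        "mr_bar": mr_bar,
        "mr_ucl": 3.267 * mr_bar,
    }


def out_of_control_points(values):
    """Indices of individual values beyond the I-chart control limits."""
    x = np.asarray(values, dtype=float)
    lim = imr_limits(x)
    return np.flatnonzero((x > lim["ucl"]) | (x < lim["lcl"]))


if __name__ == "__main__":
    series = [10, 10.1, 9.9, 10, 10.1, 9.9, 10, 15, 10, 10.1, 9.9, 10]  # made-up
    print(imr_limits(series))
    print(out_of_control_points(series))
```

The data must be in time order for the moving ranges, and therefore the limits, to mean anything.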
February 7, 2013 at 2:15 pm #194702
Joel Smith (@joelsmith):
Adriana – Your data also show a slight shift around point 48 but nothing compared to the other dataset shown.
In any event, I’m not sure distribution fit is all that important here…you don’t have any data that is even remotely in the ballpark of your specs, and barring a process shift or special cause you will never ever see a part out of spec. Assigning a specific sigma level, Cpk, etc. is a pointless exercise. Make a histogram of the data and plot the specs as reference lines.
January 9, 2015 at 11:52 am #197716
Hello,
Right now I am working on an equipment validation, and when identifying the distribution of the data, the 3-parameter Weibull is the one with a p-value > .05.
But I need to transform the data to fit the 3-parameter Weibull distribution. How can I transform the limits to the same distribution?
I need the transformed limits as well; please help me.
Thanks for your help.
January 9, 2015 at 2:13 pm #197718
Chris Seider (@cseider):
Don’t make yourself sweat so hard. If using Minitab, use the nonnormal capability analysis. Pick the distribution you want, and it will fit a best-fit curve of that distribution to the data, with the original data still shown.
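One common way to get capability from a fitted non-normal distribution without transforming the data or the limits is the percentile method: compare the distances from the median to each spec limit with the distances from the median to the 0.135 and 99.865 percentiles of the fitted curve. The Python sketch below illustrates that idea with SciPy. It is not a reproduction of Minitab’s calculation, whose result depends on its settings, and the names `percentile_ppk` and `fit_weibull3` as well as the example parameters are made up.

```python
from scipy import stats


def percentile_ppk(dist, lsl, usl):
    """Ppk from the 0.135th, 50th and 99.865th percentiles of a fitted distribution."""
    p_low, median, p_high = dist.ppf([0.00135, 0.5, 0.99865])
    ppu = (usl - median) / (p_high - median)  # upper side
    ppl = (median - lsl) / (median - p_low)   # lower side
    return float(min(ppu, ppl))


def fit_weibull3(data):
    """Fit a 3-parameter Weibull; scipy's `loc` plays the role of the threshold."""
    shape, loc, scale = stats.weibull_min.fit(data)
    return stats.weibull_min(shape, loc=loc, scale=scale)


if __name__ == "__main__":
    # Made-up parameters standing in for a distribution fitted to real data.
    fitted = stats.weibull_min(2.0, loc=5.0, scale=3.0)
    print(percentile_ppk(fitted, lsl=0.0, usl=20.0))
```

Because the limits stay in the original units, there is nothing to transform by hand.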