Nonnormal Capability or Johnson Transformation
 This topic has 9 replies, 4 voices, and was last updated 5 years, 6 months ago by Robert Butler.


March 20, 2017 at 10:29 pm #55663
Dear All,
I have a question. This has already been discussed in a few threads on iSixSigma, but the conclusion wasn’t clear to me, so I want to start this discussion. I am working on a project where the data are nonnormal. I started with a nonnormal capability analysis using a Weibull distribution. I have also tried transforming the data to normality using the Johnson transformation; the data were successfully transformed, with a p-value of 0.63.
I also found a difference in process capability: 0.80 using Weibull and 0.99 using the Johnson transformation.
I have also read about the Johnson transformation and learned: “This transformation is very powerful, and it can be used with data that include zero and negative values, but it is more complicated and it only provides overall capability statistics. Use this when the Box-Cox transformation does not find a suitable transformation.”
My question is: which one should I choose, and why? Will the choice have any major impact? Kindly clarify.
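For what it’s worth, the Weibull side of that comparison can be reproduced outside Minitab by fitting the distribution and applying the 0.135%/99.865% percentile method. The sketch below uses made-up data and hypothetical spec limits, not the original project data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = stats.weibull_min.rvs(c=1.5, scale=2.0, size=200, random_state=rng)  # stand-in data
LSL, USL = 0.05, 8.0  # hypothetical spec limits

# Fit a 2-parameter Weibull (location fixed at 0) and take the ISO percentiles.
c, loc, scale = stats.weibull_min.fit(data, floc=0)
x_lo = stats.weibull_min.ppf(0.00135, c, loc, scale)
x_hi = stats.weibull_min.ppf(0.99865, c, loc, scale)
med = stats.weibull_min.ppf(0.5, c, loc, scale)

# Percentile-based overall capability (Pp and Ppk analogues).
Pp = (USL - LSL) / (x_hi - x_lo)
Ppk = min((USL - med) / (x_hi - med), (med - LSL) / (med - x_lo))
print(f"Pp = {Pp:.2f}, Ppk = {Ppk:.2f}")
```

The same percentile construction works for any fitted distribution, so Weibull- and Johnson-based figures can be compared on an equal footing.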
Regards
KK

March 21, 2017 at 5:24 am #201054
Robert Butler
Neither – get a copy of Measuring Process Capability by Bothe and read Chapter 8 – Measuring Capability for Nonnormal Variable Data. According to other posts to this forum, Minitab has the Bothe method as one of the options in its capability calculation section.
March 21, 2017 at 11:23 pm #201069
Dear Robert,
I will go through the chapter today in detail. Thank you so much.
March 29, 2017 at 5:52 am #201097
Hajo Schmidt
Hi KKiru,
if you have a sufficient amount of data (approx. > 2000 points) you can calculate the capability directly from the data, without any distribution or transformation, using the “quantile” method and calculating the 0.135% and 99.865% percentile values in Minitab.
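With that much data the quantile method needs no distribution at all; the empirical percentiles are used directly. A minimal sketch, using simulated skewed data and hypothetical spec limits:

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.gamma(shape=4.0, scale=1.0, size=5000)  # >2000 points, skewed stand-in
LSL, USL = 0.5, 12.0  # hypothetical spec limits

# Empirical 0.135% / 99.865% percentiles take the place of mean +/- 3 sigma.
x_lo, med, x_hi = np.percentile(data, [0.135, 50, 99.865])
Pp = (USL - LSL) / (x_hi - x_lo)
Ppk = min((USL - med) / (x_hi - med), (med - LSL) / (med - x_lo))
print(f"Pp = {Pp:.2f}, Ppk = {Ppk:.2f}")
```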
In case of fewer data you have to find an appropriate fit to the data (especially at the “borders” of the distribution). Use the following Minitab function to find the best fit:
Stat > Quality Tools > Individual Distribution Identification

March 29, 2017 at 6:44 am #201098
Robert Butler
Thanks, @Hajo – that is the part of the Minitab program which corresponds to the method referenced in the book by Bothe. The only thing I find odd is the requirement for >2000 data points. I’ve used this method dozens of times over the years and I’ve never had the option of asking for >2000 data points. The cost per sample for most of the processes I’ve worked on has been such that the sample size has been between 15 and 30, and the results I’ve had using this method have been quite good.
I just pulled Bothe’s book off of my shelf and did a quick scan of Chapter 8 – I may have missed something but I don’t see anything in the chapter which discusses a required minimum sample size.
Often what I’ve had to do when working with small samples is print out the line as plotted on normal probability paper and extrapolate to find the 0.135% and 99.865% points. To do this I use a trusty manual device known as a French curve and a pencil – old fashioned, but it does an admirable job.
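A numerical analogue of that graphical extrapolation is to fit the straight line of the normal probability plot and read it off at z = ±3. A sketch with simulated data (it assumes the plotted points really do follow a line over the observed range):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sample = rng.normal(loc=100.0, scale=5.0, size=25)  # small sample, as discussed

# probplot returns the ordered data vs. normal quantiles plus a least-squares line.
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")

# Extrapolate the fitted line to z = -3 and z = +3, i.e. the
# 0.135% and 99.865% points of the assumed normal distribution.
x_lo = intercept - 3.0 * slope
x_hi = intercept + 3.0 * slope
print(f"extrapolated 0.135% point: {x_lo:.1f}, 99.865% point: {x_hi:.1f}")
```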
March 29, 2017 at 7:16 am #201099
MBBinWI
@rbutler – ah, the French curve, like the slide rule, something that anyone under 40 will have to search Bing to find out what it is. ;) Along with reverse Polish notation calculators.
March 29, 2017 at 7:17 am #201100
Hajo Schmidt
@rbutler In fact you won’t find “>2000 data points” in the literature. It is my own advice, because otherwise the number of data points at the borders becomes very low (0.00135 × 2000 = 2.7) and the percentile value will be very sensitive to rare events.
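That sensitivity is easy to demonstrate with simulated data: draw many samples and watch how much the empirical 99.865% percentile scatters at n = 2000 compared with a much larger n (illustrative only, standard normal data):

```python
import numpy as np

rng = np.random.default_rng(0)

def tail_spread(n, reps=500):
    """Std. dev. of the empirical 99.865% percentile over many samples of size n."""
    est = [np.percentile(rng.normal(size=n), 99.865) for _ in range(reps)]
    return np.std(est)

s_small, s_large = tail_spread(2000), tail_spread(50_000)
print(f"spread at n=2000: {s_small:.3f}, at n=50000: {s_large:.3f}")
```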
In fact, if you have fewer data points you have to extrapolate, either by finding a suitable fit or by the graphical method mentioned above. By the way, in my training I often add the percentile values 0.135% and 99.865% to the y-axis of the probability plot in Minitab. This makes the percentile values visible to my colleagues.

March 29, 2017 at 8:19 am #201101
Robert Butler
@Hajo, thanks for the clarification on the sample size. I agree that with 15 to 30 samples there will be few data points and that you will have to extrapolate – hence the manual curve fitting using a French curve. As for rare events – I’m not so sure. If you are running a capability analysis, one of the assumptions is that the process is in control, so I don’t see how such a thing would be much of a problem.
I do think you are being overly conservative with respect to the sample size. Indeed, your choice of sample size is such that I really can’t think of anything I’ve ever worked on where that requirement could have been met. What I do know is that the results I got from the analysis of the small samples were more than adequate and provided the information we needed to press on with the work.
March 30, 2017 at 1:44 am #201104
Hajo Schmidt
@rbutler, thanks for your statement. I myself work in the German automotive industry, where at least 125 data points are required for a capability analysis. In many cases we have much higher sample sizes.
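For normally distributed, in-control data, the width of the 95% confidence interval for Cp at a given n follows directly from the standard chi-square interval; the point estimate below is hypothetical:

```python
import numpy as np
from scipy import stats

n, cp_hat, alpha = 30, 1.33, 0.05  # hypothetical point estimate at small n

# Two-sided chi-square confidence interval for Cp.
lo = cp_hat * np.sqrt(stats.chi2.ppf(alpha / 2, n - 1) / (n - 1))
hi = cp_hat * np.sqrt(stats.chi2.ppf(1 - alpha / 2, n - 1) / (n - 1))
print(f"95% CI for Cp at n={n}: [{lo:.2f}, {hi:.2f}]")
```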
I made a simulation with normally distributed random data, sample size 30. See pic enclosed. I am surprised that you can accept such a wide 95% CI. The VDA norm (German Association of the Automotive Industry) allows smaller sample sizes for short-term capability, but then the target value has to be higher (e.g. 1.77 for sample size 30).

March 30, 2017 at 5:21 am #201105
Robert Butler
@Hajo it’s not a matter of acceptance so much as it is a matter of physically not being able to generate large sample sizes and still remain in business. I’ve worked in specialty chemicals, plastics, aerospace, and medical, and in each of these cases the cost of large sample sizes is prohibitive.
For example, in the chemical industry the samples were those from a 10,000-gallon reactor vessel. While you can take repeated measures on material coming from such a reactor, you have to remember that the smallest unit of independence is the individual reactor. Waiting to make a decision until you have chewed up 300,000 gallons of raw material simply isn’t possible.
It is true there are things you can do with repeated measures, and you can also employ time-series methods to test for sample independence within a given reactor, but at the end of the day you are still going to have to make do with a small sample of independent events, and you will have to use those to generate estimates of the various statistics of the process.