# Non-normal Data Transformation


- This topic has 12 replies, 8 voices, and was last updated 11 years, 2 months ago by Severino.

**Mike A Jr** (Participant, @Mike-A-Jr) — October 1, 2008 at 8:52 pm #51052

Hi all,

I am updating a capability study on a measurement, and the p-value on a normality test is < .005. The measurement is crimp height on a terminal with a spec of .250" +/- .004". The data should be normal, but the sample size and lack of discrimination (only four different measurements over the 30 samples: .2495, .2500, .2505 and .2510) cause it to fail the normality test. I'm sure if I had the ability to measure the parts at .00001" precision it would test as normal data.

How do I transform this data so the capability study can be performed?

Thanks,

-Mike A.

October 1, 2008 at 9:14 pm #176351

You don't! It is meaningless, since you basically have discrete data. Forget trying to do this as continuous data and report out any capability in a discrete format. At this point all you have is pass/fail data.
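Mike's suspicion that the underlying process is normal but the gauge hides it is easy to illustrate with a quick simulation. This is only a sketch: the distribution parameters and the 0.0005" resolution are assumptions for illustration, not measurements from this thread.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical crimp heights: a truly normal process centered on the
# 0.250" target (the scale is an assumed value for illustration).
true_heights = rng.normal(loc=0.2500, scale=0.0004, size=30)

# The same parts as read by a gauge that resolves only 0.0005".
readings = np.round(true_heights / 0.0005) * 0.0005

print("distinct true values:   ", len(np.unique(true_heights)))
print("distinct gauge readings:", len(np.unique(readings)))
print("Shapiro-Wilk p, true:   ", stats.shapiro(true_heights).pvalue)
print("Shapiro-Wilk p, rounded:", stats.shapiro(readings).pvalue)
```

Rounding collapses thirty distinct values into a handful of categories, and the normality test reacts to the resulting ties even though the underlying process is perfectly normal.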

October 2, 2008 at 12:11 am #176355

**Shereen Mosallam** (Member, @Shereen-Mosallam)

Your problem is clearly MSA. You need to do an MSA study, as the data points you mentioned are out of spec. So before you conclude you have a problem, you need to validate your MSA with better-precision equipment, as your process range of variation is much wider than the specs.

Shereen Mosallam, MBB

October 2, 2008 at 12:26 pm #176365

**Mike A Jr** (Participant, @Mike-A-Jr)

Thanks for the input.

An MSA was already completed. The GRR results were acceptable: 13.62% (% tolerance) with NDC = 24.

I believe it is a sample size issue more than anything. I bet with 100 samples it would pass the normality test.

October 2, 2008 at 12:55 pm #176367

How much ya willing to bet? Stan will probably want some of this action. If you only have 4 categories of measurements, it is discrete no matter how many you collect. Showing a high p-value will be a trick even if you are able to get it. Either get some resolution or forget about trying to use continuous tools. You are a tool in search of an application, which is an incorrect approach.

October 2, 2008 at 1:05 pm #176368

“You are a tool in search of an application”… I get it… I will so be using that one in the future… very nice!

October 2, 2008 at 1:18 pm #176370

Hmm Darth,

maybe he is thinking about “taking a large dataset and then subgrouping/stratifying it into groups such that the means of the groups will approximate a normal distribution” (just an assumption, of course).

This trick will work but it is just that: a trick to present data that satisfies a normal distribution.

I agree that if there are so few different possible values the data should be treated as noncontinuous.

But the range of the data is 0.2510 − 0.2495 = 0.0015, while the customer(?) spec is +/- 0.004, a total width of 0.008. So the data range fits more than five times inside the specs. With normally distributed data this would mean a Cp of roughly 4. So where is the problem that should be solved?
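The arithmetic above can be sketched out as a sanity check. The range-to-sigma conversion below (range ≈ 4 sigma for a sample of about 30) is an assumption layered on top of the observed spread, not something stated in the thread.

```python
# Spec: 0.250" +/- 0.004"
usl, lsl = 0.254, 0.246

# Observed spread of the 30 readings.
data_range = 0.2510 - 0.2495

# Quick sigma estimate: for n around 30, the range of a normal
# sample is roughly 4 sigma (a rule-of-thumb assumption).
sigma_est = data_range / 4

cp = (usl - lsl) / (6 * sigma_est)
print(f"rough Cp = {cp:.2f}")
```

With these assumptions the process comfortably clears the usual Cp targets, which supports the point that the real problem here is gauge resolution, not capability.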

Mike A Jr: I recommend that you plot the data you have in a time-series chart and include the target and specs. This way you can see how your data behaves (1) over time and (2) compared to the customer's wishes. And there you have your capability study.

Good luck!

October 2, 2008 at 1:57 pm #176372

**Robert Butler** (Participant, @rbutler)

As Darth has noted, if you are offering bets you will get a lot of takers, and you will wind up paying out a lot of money for the reasons already cited.

I think you are laboring under the mistaken belief that if you collect enough individual samples you will eventually get a normal distribution. This is a common misinterpretation of the Central Limit Theorem – which does not apply to distributions of individual measurements but to distributions of averages of measurements.
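This distinction is easy to demonstrate: no matter how many individual readings you collect from a four-value gauge, the individuals stay four-valued, while subgroup averages spread into many more values and pile up around the grand mean. The probabilities below are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Individual readings drawn from the four observed gauge values
# (the probabilities are made up for illustration).
values = [0.2495, 0.2500, 0.2505, 0.2510]
individuals = rng.choice(values, size=10_000, p=[0.15, 0.50, 0.25, 0.10])

# Collecting more individuals never adds measurement categories...
print("distinct individual readings:", len(np.unique(individuals)))

# ...but averages of subgroups of 5 take many more values, as the
# Central Limit Theorem describes.
means = individuals.reshape(-1, 5).mean(axis=1).round(4)
print("distinct subgroup means:", len(np.unique(means)))
```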

Now, if you really want to cheat and fool the tests for normality, you would want to go in the opposite direction: smaller numbers of cherry-picked samples. For example, if you run 20 tests and get a spread of the 4 numbers you have, you could cherry-pick 8 of them so that your choices look like the following:

.2495

.2500 .2500 .2500 .2500

.2505 .2505

.2510

This distribution will pass all of the normality tests:

- Shapiro-Wilk: P = .32
- Kolmogorov-Smirnov D: P = .054
- Cramer-von Mises: P = .091
- Anderson-Darling: P = .138

Obviously the above is creative science (cheating) at its worst. However, if you want to improve your understanding of the issues, then rather than betting I'd recommend you take the 4 values you have, set up different choices of the 4 numbers as above, and play with these data sets in whatever package you have to see what happens.
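For what it's worth, the quoted p-values can be reproduced approximately in any stats package. Here is a sketch using SciPy; the exact numbers will differ somewhat from whatever package produced the figures above, but the conclusion (the test passes) should hold.

```python
from scipy import stats

# The eight cherry-picked readings from the example above.
picked = [0.2495,
          0.2500, 0.2500, 0.2500, 0.2500,
          0.2505, 0.2505,
          0.2510]

w, p = stats.shapiro(picked)
print(f"Shapiro-Wilk W = {w:.3f}, p = {p:.2f}")
```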

October 2, 2008 at 2:44 pm #176374

**Mike A Jr** (Participant, @Mike-A-Jr)

Thanks for the replies… I think.

I just got a reply from one of our Master Black Belts, and I'll go with his recommendations.

If the approximation for standard deviation seems accurate (as indicated by the fit of the curve to the histogram), I would probably use the data as-is to establish capability. If there is a big difference between the two, the overall (Pp, Ppk) values are probably better, as they represent the “average” variation from the mean (less likely to be influenced by discrimination issues), while the within (Cp, Cpk) values use the average moving range, which is more influenced by discrimination issues.
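The within-versus-overall distinction in that recommendation can be sketched numerically. The eight readings below are hypothetical values at the gauge's 0.0005" resolution; the spec limits are the ones from the original post.

```python
import numpy as np

# Hypothetical readings at the gauge's 0.0005" resolution.
x = np.array([0.2495, 0.2500, 0.2500, 0.2505, 0.2500,
              0.2510, 0.2500, 0.2505])

usl, lsl = 0.254, 0.246   # spec: 0.250" +/- 0.004"

# Overall (long-term) sigma: plain sample standard deviation -> Pp.
sigma_overall = x.std(ddof=1)

# Within (short-term) sigma: average moving range / d2, with
# d2 = 1.128 for moving ranges of size 2 -> Cp.  Ties in
# low-resolution data distort this estimate, which is the point
# the Master Black Belt is making.
mr_bar = np.abs(np.diff(x)).mean()
sigma_within = mr_bar / 1.128

pp = (usl - lsl) / (6 * sigma_overall)
cp = (usl - lsl) / (6 * sigma_within)
print(f"Pp = {pp:.2f}  (overall sigma = {sigma_overall:.6f})")
print(f"Cp = {cp:.2f}  (within sigma  = {sigma_within:.6f})")
```

With these made-up readings the two sigma estimates disagree noticeably, so the two capability figures disagree too, illustrating why the choice between them matters when discrimination is poor.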

October 2, 2008 at 2:49 pm #176376

**Mike A Jr** (Participant, @Mike-A-Jr)

Thanks for the info.

As for betting… No thanks, I’m not enough of a stats geek to bet on stuff like that. Football, however, is a different story.

October 2, 2008 at 8:35 pm #176381

**KennySky** (Participant, @KennySky)

A discrete process capability won't work for you, since you would have 0 defects. The nature of your data is continuous even though you don't have enough precision to measure it adequately.

If you don’t mind my asking, what is your kurtosis with the values you do have?

October 2, 2008 at 9:59 pm #176384

**Shereen Mosallam** (Member, @Shereen-Mosallam)

I am not very sure how you have this high a number of distinct categories with such low resolution. The equation for NDC is approximately 1.41 × (s_part / s_R&R).

So in your MSA study, you might have taken a wide range of parts, which masked the R&R problem and falsely gave a high NDC.

Sample size is an issue, I agree. Your data can't be transformed, as it is too discrete as well. So you need more data, and if all your measurements remain too discrete, then you have two options:

1. Get a better measuring tool (which I believe you should do, as you have data points out of tolerance).

2. Deal with it as go/no-go, which is not so good in your case, as you have out-of-tolerance data points.

Good luck!

Shereen A. Mosallam
[email protected]
http://www.symbios-consulting.com
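The NDC formula quoted above takes only a couple of lines to check. The standard deviations below are hypothetical values chosen to show how a wide part-to-part spread can inflate NDC even when the gauge itself is mediocre, which is the masking effect Shereen suspects.

```python
def ndc(sd_part: float, sd_rr: float) -> int:
    """Number of distinct categories: 1.41 * (s_part / s_R&R),
    truncated to an integer per the usual AIAG convention."""
    return int(1.41 * sd_part / sd_rr)

# A wide part spread relative to gauge error gives a big NDC...
print(ndc(sd_part=0.00170, sd_rr=0.00010))

# ...while a narrow spread with the same gauge gives a tiny one.
print(ndc(sd_part=0.00025, sd_rr=0.00010))
```

So a study run on deliberately varied parts can report NDC = 24 even when the gauge cannot separate parts drawn from normal production.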

October 3, 2008 at 2:18 am #176389

**Severino** (Participant, @Jsev607)

Just out of curiosity, what instrument are you using to measure the crimp height?


The forum ‘General’ is closed to new topics and replies.