# Non-normal distribution


• #49923

gman
Participant

I am looking for some confirmation here on whether to consider normal or non-normal distribution in the following scenario:
When the part/process has a LSL of -0.12 and a USL of 0.12 and I know for a fact that all of my data are going to be skewed towards the upper or lower boundary and not distributed evenly over the nominal value, would this be considered a non-normal distribution?
To clarify – this is a min-max measurement we perform on a radial dimension. Our customer wants us to report the minimum “centre distance” and the maximum “centre distance” of a gear in separate columns. We treat them as separate “processes” because we are taking two different measurements on the same feature.
So basically, can anybody help me clarify the difference between a “normal” distribution and a “non-normal” distribution, and when to use either? I find Minitab a little fuzzy on this.
Thanks,
G-man

#171387

Ron
Member

I’ll try.

When using Shewhart control charts, normality is a non-issue.
When doing statistical inference such as ANOVA or a 2-sample t-test, normality is a requirement.
Use the median rather than the mean when describing central tendency for skewed data.
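
To illustrate the point about the median, here is a minimal sketch using only Python's standard library. The readings are made-up, right-skewed values chosen for illustration; they are not data from this thread.

```python
import statistics

# Hypothetical right-skewed measurements (illustrative values only).
data = [0.01, 0.02, 0.02, 0.03, 0.03, 0.03, 0.04, 0.05, 0.09, 0.11]

mean = statistics.mean(data)
median = statistics.median(data)

# The long right tail pulls the mean above the median.
print(f"mean   = {mean:.3f}")
print(f"median = {median:.3f}")
```

With a long tail on one side, the mean is dragged toward the tail while the median stays near the bulk of the readings.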

#171388

Ryan
Member

Gman,
If data are normally distributed, then a graph of the data should resemble a bell-shaped curve, with more weight in the center and the tails tapering off to zero.
Create a histogram and see if it approximates that shape. If you see very obvious departures (e.g. two distinct modes, highly skewed), then you can probably stop there. You could also conduct a statistical inferential test on normality (e.g. Shapiro-Wilk, Kolmogorov-Smirnov) and/or look at skewness/kurtosis values. I’m not fond of inferential tests for this situation b/c your conclusion will be impacted by your sample size. I prefer to look at skewness (lack of symmetry) and kurtosis (heaviness in the tails).
There’s a lot more that could be said on this topic, but I’ll stop here. I hope this response helps to answer your question.
Ryan
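
As a rough sketch of the skewness and kurtosis check described above, the following uses the plain moment-based definitions with Python's standard library. The function names and both samples are hypothetical, and statistical packages often apply small-sample corrections, so the values they report may differ somewhat from these.

```python
import statistics

def skewness(xs):
    """Moment-based skewness: about 0 for a symmetric sample."""
    n = len(xs)
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)  # population standard deviation
    return sum(((x - m) / s) ** 3 for x in xs) / n

def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    n = len(xs)
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 4 for x in xs) / n - 3.0

# Hypothetical samples (illustrative values only).
symmetric = [-2, -1, -1, 0, 0, 0, 1, 1, 2]
skewed = [0, 0, 0, 1, 1, 2, 3, 5, 9]

for name, sample in (("symmetric", symmetric), ("skewed", skewed)):
    print(f"{name:9s} skew = {skewness(sample):6.3f}  "
          f"excess kurtosis = {excess_kurtosis(sample):6.3f}")
```

A skewness well away from zero indicates a lack of symmetry; positive values mean the longer tail is on the high side.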

#171389

DaveS
Participant

Gman,
You said,
“When the part/process has a LSL of -0.12 and a USL of 0.12 and I know for a fact that all of my data are going to be skewed towards the upper or lower boundary and not distributed evenly over the nominal value, would this be considered a non-normal distribution?”
How do you know that it is skewed? Have you run data already and tested it for normality? Theoretically, the sampling distribution of the max/min is normal to the sample size value.
It almost sounds like you are assuming that the max will cluster toward the upper spec boundary and min the lower? The location of the data has nothing to do with the shape of the data.
What do you want to do with the data? If it is a capability study, then you do need to determine the shape if you want a more exact estimate of your possible ppm defective.
Despite what others may have posted, ANOVA does not require normality of the data inputs, only of the residuals.
Most other statistical techniques (t-test, control charting, etc.) are robust to departures from the normality assumption.
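
To make the distinction between raw data and residuals concrete, here is a small sketch for a one-way layout, where each residual is the observation minus the mean of its own group. The group labels and values are hypothetical, and `one_way_residuals` is a helper name invented for this example.

```python
import statistics

# Hypothetical measurements for three levels of one factor (illustrative only).
groups = {
    "A": [10.1, 9.8, 10.3, 10.0],
    "B": [12.2, 11.9, 12.4, 12.1],
    "C": [14.0, 13.7, 14.2, 13.9],
}

def one_way_residuals(groups):
    """Residual = observation minus the mean of its own group."""
    residuals = []
    for values in groups.values():
        group_mean = statistics.fmean(values)
        residuals.extend(v - group_mean for v in values)
    return residuals

raw = [v for values in groups.values() for v in values]
res = one_way_residuals(groups)

# The pooled raw data are spread across three separate clusters,
# while the residuals sit in one tight group around zero.
print("raw range      :", min(raw), "to", max(raw))
print("residual range :", round(min(res), 3), "to", round(max(res), 3))
```

The pooled raw values would look multi-modal on a histogram simply because the group means differ; it is the residuals that should be checked against the normal model.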

#171390

gman
Participant

DaveS,
I am assuming that the max will cluster towards the upper spec boundary and the min the lower and this is for a capability study. This has been the trend for as long as we have been checking these parts. It has been a bit tricky for me to understand as my statistical knowledge is still quite basic.
Currently we divide it into a Cpk for minimum and a Cpk for maximum. Since both values have to fall between the USL and LSL, they always tend to cluster towards the top and bottom respectively.
I am currently inputting data for a short-run capability run to report to our customer and I want to be sure I am doing this right. When I am creating a histogram I have the option of creating one based on normal distribution or non-normal distribution. One will produce a better Cpk than the other, but I don’t want to be cheating to make my parts look better than they are.

#171392

Ryan
Member

Gman,
DaveS is absolutely correct about the normality assumption.
1. The normality assumption for Regression/ANOVA is based on normality of the residuals, not the raw data. Having said that, if the residuals depart *substantially* from normality, this can become an issue! More importantly, though, you want your sample sizes between the levels of your factor to be roughly equal, and it’s important that the variances between the levels of the factor not be *significantly* different from each other.
2. If you want to know if your data or residuals approximate a normal distribution, then follow the steps I mentioned before. Another common option would be to look at a Q-Q plot.
Ryan
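
For readers who want to see what goes into a Q-Q plot, the sketch below computes the plotted pairs with Python's standard library. The sample is hypothetical, `normal_qq_points` is a name made up for this example, and the plotting positions (i - 0.5) / n are only one common convention; a given package may use a different one.

```python
from statistics import NormalDist, fmean, stdev

def normal_qq_points(xs):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    n = len(xs)
    ordered = sorted(xs)
    # Normal model fitted with the sample mean and sample standard deviation.
    fitted = NormalDist(fmean(xs), stdev(xs))
    # Plotting positions (i - 0.5) / n: an assumed convention, others exist.
    return [(fitted.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

# Hypothetical readings (illustrative values only).
sample = [0.02, 0.03, 0.03, 0.04, 0.05, 0.05, 0.06, 0.07, 0.09, 0.12]
points = normal_qq_points(sample)

for theoretical, observed in points:
    print(f"{theoretical:8.4f}  {observed:6.2f}")
```

If the data follow the normal model, the pairs fall close to the line where the observed value equals the theoretical quantile; systematic curvature at one end suggests skew.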

#171393

Robert Sutter
Member

Use a normal probability plot in Minitab to determine if the distribution is normal. Also, it may be helpful to determine what question you want to answer. You may or may not need to be concerned whether the distribution is normal.

#171407

DaveS
Participant

Gman,
Your statement that you are a novice to statistics explains much.
In doing capability studies, we have a choice of using a normal model for the data or a non-normal. Normal/nonnormal applies to the shape of the data, not to where it is in relation to the specifications.
I always teach that there are NO normal distributions in nature. The data are what they are.
We use the normal mathematical model to approximate the reality in the data. So when we say that the data follow a normal model, what we really mean is that the data are distributed in a fashion that the normal model mathematically describes best. So we can use the mathematical methods for the normal.
When the data do not line up with the normal model, we choose another model so that we can use the mathematical methods for that model. To choose the best model (and it seems you have Minitab available), I would do this (others have suggested similar):
· You really ought to have at least 100 points to do this kind of evaluation.
· Use Graph > Histogram to do the histogram of the data you have for maximum.
· Examine the histogram. Is it unimodal (one peak)? If not, you must correct this; it is probably a data collection error. Is it symmetric? If so, and it has a basic bell shape, then the normal model will probably work. Use Graph > Probability Plot to confirm that normal is an adequate model. It is better if it can be modeled as normal, as the mathematics are much more tractable.
· If it cannot be modeled as normal, look at the tails of the distribution. Is it skewed to one side or the other? If so, either a Weibull or 3-parameter Weibull will likely work. If it is reasonably symmetric, but fails the normal probability test, then a lognormal model may fit.
· To find the best model, use Stat > Quality Tools > Individual Distribution Identification. Read the help carefully, especially the example, to find out how to use it. Try lognormal, Weibull and their 3-parameter alternatives. Select the best fit. If none fit, you can look at some of the more exotic alternatives, but usually for data of this type normal, lognormal or Weibull should suffice.
· Under Stat > Quality Tools > Capability Analysis, choose either Normal or Non-normal (and the best-fit distribution you found) and run the test. If the data model selected above is normal, you will get Cp and Pp metrics. I would key on the ppm defective.
· If you run non-normal, you will only get a Ppk value. I always key on the ppm defective and not the capability numbers. Depending on the amount of non-normality (skew) and its direction, the predicted ppm defective may be better or worse than with the normal model.
Repeat for your other minimum data.
There are other rules and guidelines for actually conducting capability studies that I assume you are aware of and have followed.
In review, the relationship of the distribution to the specification limits has NO bearing on the selection of normal or nonnormal. The selection IS based on the best mathematical model fit to the data as observed.
Hope this helps.
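
As a rough illustration of the normal-model case only, the sketch below computes Ppk and the predicted ppm outside each limit using Python's standard library. The spec limits come from the original question; the readings are hypothetical, `normal_capability` is a name invented here, and this is not a reproduction of Minitab's output. If a non-normal model fits better, the tail areas would have to come from that model instead.

```python
from statistics import NormalDist, fmean, stdev

LSL, USL = -0.12, 0.12  # spec limits from the original question

def normal_capability(xs, lsl, usl):
    """Ppk and predicted ppm outside each limit, assuming a normal model."""
    mu = fmean(xs)
    sigma = stdev(xs)  # overall sample standard deviation
    model = NormalDist(mu, sigma)
    ppk = min((usl - mu) / (3 * sigma), (mu - lsl) / (3 * sigma))
    ppm_below = model.cdf(lsl) * 1e6
    ppm_above = (1 - model.cdf(usl)) * 1e6
    return ppk, ppm_below, ppm_above

# Hypothetical "maximum centre distance" readings clustered toward the USL.
# Far fewer than the 100 points recommended above; for illustration only.
max_readings = [0.06, 0.07, 0.08, 0.08, 0.09, 0.09, 0.09, 0.10, 0.10, 0.11]

ppk, ppm_below, ppm_above = normal_capability(max_readings, LSL, USL)
print(f"Ppk           = {ppk:.2f}")
print(f"ppm below LSL = {ppm_below:.1f}")
print(f"ppm above USL = {ppm_above:.1f}")
```

Because these readings sit close to the upper limit, almost all of the predicted defectives fall above the USL. That is a matter of location and spread relative to the specification; it says nothing by itself about whether the normal model is the right shape.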

#171408

gman
Participant

DaveS,

That helps a lot. All the other suggestions have been helpful too. I will try these suggestions out and will post if I have any further questions.

Gman

#171424

Severino
Participant

One thing that confused me about your question is your customer’s request to have the min data and max data in two separate columns. Your post indicates that these are measurements of the same feature. My question would be: are they measurements of the same exact part, or are you taking a subgroup of, say, 5 parts and reporting the min and max for the subgroup?
If you are reporting one Cpk for the “min” measurements and a separate Cpk for the “max” measurements of the exact same feature, you are penalizing your process for no good reason. Before you go to great pains to perform transformations of data and such, you may want to clarify exactly what you are doing, because we may be able to help you get a much truer picture of what is going on in your process.
