# Reasonably Normal


Viewing 6 posts - 1 through 6 (of 6 total)
#30768

Participant

In Six Sigma Green Belt training, I have seen a couple of times, when learning about the t-test and ANOVA, a requirement that data be “reasonably normal.” Why is that, and what is the threshold for reasonably normal data? Is that any set of data whose normality test gives a p-value greater than .01? .02? .03? Can anyone help me with these questions?

#80607

Ashman
Member

The great thing about statistics is that what matters to you may not matter to someone else, so what I think is important may have little to do with what you think is important. Think back to what a p-value is: the probability, assuming the null hypothesis is true, of seeing data at least as extreme as what you observed. You compare it to alpha, which is the probability that you will reject the null hypothesis when it is true. Some people like to think of alpha as risk to themselves. So how much risk are you willing to live with? alpha=0.01? alpha=0.05? Most people say alpha=0.05 is a good risk. If it is a process that doesn’t really have an effect on the customer, or the test has really high statistical power (i.e., a small beta, or Type II error), then maybe you only need a p-value less than 0.1. It is really up to you and how much risk you can live with.
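To make the role of alpha concrete, here is a minimal sketch in Python. The p-value of 0.03 is made up for illustration, and `decide` is a hypothetical helper name, not something from this thread.

```python
def decide(p_value, alpha):
    """Return the test decision for a given p-value and risk level alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

p_value = 0.03  # made-up p-value, for illustration only

# The same p-value judged against three different risk levels.
decisions = {alpha: decide(p_value, alpha) for alpha in (0.01, 0.05, 0.10)}

for alpha, decision in decisions.items():
    print(f"alpha={alpha:.2f}: {decision}")
```

The same result is significant at alpha=0.05 and alpha=0.10 but not at alpha=0.01, which is why the risk level should be chosen before looking at the data.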

#80608

Robert Butler
Participant

The requirement for “reasonably normal” data is driven by the construct of the tests. Both the t-test and ANOVA assume that the data you are testing consist of independent samples from normal populations. This is the reason there is such a focus on understanding the underlying distribution of the data before continuing with an investigation. If you wish to use either of the above tests and your data prove to be significantly non-normal, you will have to investigate ways to transform the data so that they are “reasonably normal,” study tests that are equivalent to the t-test and ANOVA for non-normal data, or both.
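As a rough illustration of checking normality and then trying a transformation, here is a minimal sketch in Python using SciPy's Shapiro-Wilk test. The choice of Shapiro-Wilk, the log transform, the 0.05 cutoff, the helper name `normality_report`, and the simulated data are all assumptions made for this example; none of them come from the thread.

```python
import numpy as np
from scipy import stats

def normality_report(values, alpha=0.05):
    """Run Shapiro-Wilk on the raw data and, if possible, on log(data).

    alpha is a hypothetical cutoff chosen for illustration.
    """
    values = np.asarray(values, dtype=float)
    _, raw_p = stats.shapiro(values)
    report = {"raw_p": float(raw_p), "raw_looks_normal": bool(raw_p > alpha)}
    if np.all(values > 0):  # the log transform is only defined for positive data
        _, log_p = stats.shapiro(np.log(values))
        report["log_p"] = float(log_p)
        report["log_looks_normal"] = bool(log_p > alpha)
    return report

# Made-up, strongly right-skewed data (log-normal), for illustration only.
rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=200)

report = normality_report(skewed)
print(report)
```

A small p-value here means the data look inconsistent with a normal distribution. A large one only means the test found no evidence against normality, so a histogram or normal probability plot is still worth a look.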

#80620

Erik L
Participant

John,
When dealing with hypothesis testing, the concerns that you should have (in order of precedence) are independence, homogeneity of variance, and then, last on the chain, normality. Data can be quite ‘non-normal’ and still work when we look at the data via something like the t-test. Normality is not a super-critical assumption for tests using the t distribution. If you have concerns, my usual guidance to Belts is to perform both the parametric test and the appropriate non-parametric equivalent, and look for confirmation in the result. If the two differ, it could be that the data are truly non-normal enough to affect the result, or you are dancing on the edge of the p-value that you’ve arbitrarily picked as the decision point for significance of effect. Remember not to throw away common sense when you review a statistical result.
Regards,
Erik
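The advice above, running the parametric test alongside its non-parametric equivalent and checking whether they agree, could be sketched in Python as follows. Pairing a two-sample t-test with the Mann-Whitney U test is one common choice and is an assumption here, as are the Welch form of the t-test, the 0.05 cutoff, the helper name `compare_tests`, and the simulated data.

```python
import numpy as np
from scipy import stats

def compare_tests(a, b, alpha=0.05):
    """Run a two-sample t-test and a Mann-Whitney U test on the same data."""
    t_res = stats.ttest_ind(a, b, equal_var=False)             # parametric (Welch form)
    mw_res = stats.mannwhitneyu(a, b, alternative="two-sided")  # non-parametric
    t_sig = bool(t_res.pvalue < alpha)
    mw_sig = bool(mw_res.pvalue < alpha)
    return {
        "t_p": float(t_res.pvalue),
        "mw_p": float(mw_res.pvalue),
        "agree": t_sig == mw_sig,  # do both tests reach the same decision?
    }

# Made-up samples with a clear shift in location, for illustration only.
rng = np.random.default_rng(1)
before = rng.normal(loc=10.0, scale=1.0, size=30)
after = rng.normal(loc=12.0, scale=1.0, size=30)

result = compare_tests(before, after)
print(result)
```

When `agree` is false, that is the cue to look harder at the shape of the data and at how close both p-values sit to the chosen cutoff.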

#80622

Mike Carnell
Participant

Erik,
Even though normality isn’t a big issue, you should know whether the data are normal when you run the homogeneity of variance test, and the outcome of that test determines the formula used in the t-test. It is more of a logical flow.
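One way this logical flow might look in Python is sketched below. It assumes Levene's test with median centering (a variant that is less sensitive to non-normality than the classical F-test) as the homogeneity check, and SciPy's `equal_var` flag to switch between the pooled and Welch formulas. The 0.05 cutoff, the helper name `t_test_after_variance_check`, and the data are made up for illustration.

```python
import numpy as np
from scipy import stats

def t_test_after_variance_check(a, b, alpha=0.05):
    """Pick the t-test formula from the result of a variance test."""
    lev = stats.levene(a, b, center="median")
    equal_var = bool(lev.pvalue >= alpha)  # no evidence of unequal variances
    t_res = stats.ttest_ind(a, b, equal_var=equal_var)
    return {
        "levene_p": float(lev.pvalue),
        "formula": "pooled" if equal_var else "Welch",
        "t_p": float(t_res.pvalue),
    }

# Made-up samples with the same mean but very different spread.
rng = np.random.default_rng(2)
tight = rng.normal(loc=50.0, scale=1.0, size=40)
wide = rng.normal(loc=50.0, scale=5.0, size=40)

outcome = t_test_after_variance_check(tight, wide)
print(outcome)
```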

#80686

Erik L
Participant

Mike,
I agree with the logic of approaching the scenario, but there needs to be greater than a three-fold difference in the variances of two data sets before it triggers an effect on the result of a hypothesis test. I don’t put much validity behind the tests of equality of variance. Box commented that, “…to make the preliminary test on variances is rather like putting to sea in a rowing boat to find out whether conditions are sufficiently calm for an ocean liner to leave port.”
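One way to probe how much unequal variances matter is a small simulation: draw both samples from populations with the same mean, so the null hypothesis is true, and count how often the pooled t-test rejects anyway. The sketch below assumes equal sample sizes of 15, normal data, 4000 simulated experiments, and alpha = 0.05. All of these, along with the helper name `false_positive_rate`, are choices made for illustration, and the picture can change considerably when the sample sizes differ.

```python
import numpy as np
from scipy import stats

def false_positive_rate(var_ratio, n=15, n_sims=4000, alpha=0.05, seed=0):
    """Estimate how often the pooled t-test rejects a true null hypothesis.

    Both groups have mean 0; the second group's variance is var_ratio
    times the first group's variance.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, 1.0, size=(n_sims, n))
    b = rng.normal(0.0, np.sqrt(var_ratio), size=(n_sims, n))
    # One pooled-variance t-test per simulated experiment (one per row).
    p_values = stats.ttest_ind(a, b, axis=1, equal_var=True).pvalue
    return float(np.mean(p_values < alpha))

rates = {ratio: false_positive_rate(ratio) for ratio in (1.0, 3.0, 10.0)}
for ratio, rate in rates.items():
    print(f"variance ratio {ratio:>4}: false positive rate {rate:.3f}")
```

If the pooled test is robust at a given variance ratio, the estimated rate should stay close to the nominal alpha.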

