# 2 Sample t test


Viewing 15 posts - 1 through 15 (of 15 total)
• #45506

Rekha
Participant

I want to test whether there is a significant difference between the means of two samples. The basic assumption of the t-test is that the data are normal, but here one sample is normal and the other is not. What test can be used to check for a difference despite this deviation from normality? Could someone help me out?
Regards,
Rekha

#148894

Eric Maass
Participant

Rekha,
One possibility is the Mann-Whitney test. It is somewhat similar to the Student’s t-test, but looks for a significant difference in medians rather than means. Hence, it is considered a nonparametric test that does not have the same normality assumption as the Student’s t-test.
Incidentally, the Student’s t-test is considered fairly “robust” to the normality assumption.
Best regards,
Eric
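For anyone who wants to see what the Mann-Whitney test actually computes, here is a minimal pure-Python sketch. It ranks the pooled data, computes the U statistic for the first sample, and gets a two-sided p-value from the normal approximation with a tie correction and no continuity correction. The function name and the sample values are made up for illustration, and the approximation is rough for very small samples, so treat it as a sketch rather than a replacement for a statistics package.

```python
import math

def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value via normal approximation)."""
    # Pool the samples, remembering which group each value came from.
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(combined)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Variance of U with the standard correction for ties.
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u1, p

# Hypothetical data: one roughly symmetric sample, one right-skewed sample.
sample_a = [9.8, 10.1, 10.3, 9.9, 10.0, 10.2]
sample_b = [10.4, 10.9, 11.8, 10.6, 14.2, 10.5]
u, p = mann_whitney_u(sample_a, sample_b)
print(f"U = {u}, approximate two-sided p = {p:.4f}")
```

The test works on ranks only, which is why an extreme value such as 14.2 has no more influence than any other value that happens to be the largest.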

#148896

Darth
Participant

Eric, while I agree with both your comments, I think we need to reinforce that Median tests are not without assumptions themselves.

#148925

The Force
Member

If one data set is normal and the other is not, you can either transform the non-normal data to normal and then apply the t-test (after checking whether the variances are equal), or use a non-parametric test for two samples, such as the Mann-Whitney test.

#148928

Darth
Participant

Let me see if I understand your suggestion.  You have two sets of data of which one is normal and the other is non-normal.  You just suggested that he transform the non-normal set and then do the t-test.  Wouldn’t that put the two data sets totally at odds since the scale of measurement is now totally different?  Better to stick with your second recommendation and do the non-parametric.

#148950

Rekha
Participant

What Darth says is true. If we convert non-normal data to normal, the transformation will take a form such as Y = X^2, Y = sqrt(X), or Y = log(X), so it is not an apples-to-apples comparison. Thanks for all your suggestions. So I can use Mann-Whitney.

#148951

Eric Maass
Participant

Darth,
Thanks – and you are right, of course.
Have a pleasant holiday season!
Best regards,
Eric

#149074

The Force
Member

For both the t-test and the F-test: if the variances are known to be equal, the test is fairly insensitive to non-normality, provided the data set found to be non-normal is not highly skewed or has a large sample size.

#149923

Sea
Participant

Whilst I agree that a Mann Whitney test should be used in this case, I have one question.  Did you expect one set of data to be normal and the other non-normal?
I often have trainee BBs and GBs asking similar questions, but when we look at the data we find one of two things. Either the non-normal data set has too few samples, and it becomes normal once more samples are collected, or there are really two data sets combined in one, and if these are split out then we have normal data as well.
Basically don’t jump to the conclusion that you need to use non-parametric analysis until you have validated and understood your data.

#149944

Jonathon Andell
Participant

I agree with everything you are saying. In addition, let’s remember to plot the data in a control chart of some sort. The difference between the two distributions may be due to a special cause.
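As one way to do that check, here is a minimal sketch of an individuals (I) chart in plain Python. It assumes the data are in time order and uses the usual limits of the mean plus or minus 2.66 times the average moving range. The function names and the numbers are hypothetical, and it only applies the single rule of a point falling outside the limits.

```python
def individuals_chart_limits(values):
    """Return (LCL, center line, UCL) for an individuals chart."""
    center = sum(values) / len(values)
    # Moving ranges between consecutive observations (subgroup size 2).
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.66 is 3 / d2, with d2 = 1.128 for moving ranges of size 2.
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

def out_of_control_points(values):
    """Return (index, value) pairs that fall outside the control limits."""
    lcl, _, ucl = individuals_chart_limits(values)
    return [(i, v) for i, v in enumerate(values) if v < lcl or v > ucl]

# Hypothetical time-ordered measurements with one suspicious point at the end.
measurements = [10, 11, 10, 11, 10, 11, 10, 20]
print(individuals_chart_limits(measurements))
print(out_of_control_points(measurements))
```

If a flagged point traces back to a special cause, that is worth sorting out before comparing the two samples at all.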

#149960

howe
Participant

Assuming the lack of special causes has been verified, you have several options.
1. Use the 2-sample t-Test if n > 25 for both data sets.
2. Transform BOTH data sets using the same lambda value, then do the t-Test.
3. Trim the data using a statistically valid method prior to the test.
Out of habit I always run both a 2-sample t-Test and a Mann-Whitney test to see if they agree.  When sample size is large enough, they almost always provide the same outcome.
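Option 2 can be sketched in a few lines of plain Python: apply the same Box-Cox lambda to both samples, then compute a two-sample t statistic on the transformed values. The sketch below uses Welch's unequal-variance version of the t statistic, which is a choice made here and not something specified above. The lambda of 0 (a log transform) and the data are purely illustrative assumptions, and the transform requires strictly positive values.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform with a fixed lambda; values must be strictly positive."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v**lam - 1) / lam for v in values]

def welch_t(x, y):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    v1 = sum((a - m1) ** 2 for a in x) / (n1 - 1)  # sample variances
    v2 = sum((b - m2) ** 2 for b in y) / (n2 - 1)
    se2 = v1 / n1 + v2 / n2
    t = (m1 - m2) / math.sqrt(se2)
    df = se2**2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Hypothetical positive-valued samples; the SAME lambda is applied to both.
lam = 0  # assumed for illustration; in practice it would be estimated from the data
sample_a = [2.1, 2.4, 1.9, 2.2, 2.0, 2.3]
sample_b = [2.6, 3.1, 2.8, 5.9, 2.7, 3.4]
t, df = welch_t(box_cox(sample_a, lam), box_cox(sample_b, lam))
print(f"t = {t:.3f} on about {df:.1f} degrees of freedom")
```

Because both samples go through the same transformation, they stay on a common scale, which addresses the apples-to-apples concern raised earlier in the thread. The conclusion then applies to the transformed scale.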

#149965

Dave L
Participant

I’m very curious that you have two different distributions?  I’d say you don’t even need any statistical test because you know you have a difference already.
To me, a t-test or any other stat test is a tool to help you understand whether values from two sample distributions of data are actually coming from two different population distributions or if they are coming from the same population distribution and to give a probability that the difference is real.  If the distribution changed on you then you have your answer already.

#149993

Robert Butler
Participant

As a point of clarification I don’t know the source of the following recommendation:
“Use the 2-sample t-Test if n > 25 for both data sets.”
but I do know it is incorrect.  If there was any validity to this statement then there would have been no reason for the Guinness Brewery to have demanded that Gosset publish his result under the pseudonym “Student” and there would be no reason for the standard t tables (which can be found in the back of any elementary book on statistics) to start at n=2.
The t-test’s value as a “secret weapon” was due to the fact that it permitted one to reach valid conclusions about a process with sample sizes far less than 25 per population. It is valued today for much the same reason.
I can’t speak for others who post here but I do know that in all of the places where I’ve worked I’ve never had the luxury of 25 independent samples per population.  The norm has been 2-5 samples per population. Every now and then I’ve had a sample range of 6-10 and once in a great while I’ve been able to exceed this but even then the maximum samples per population was less than 15.
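To make the small-sample point concrete, here is a minimal sketch of the classic pooled-variance two-sample t-test with three observations per group, compared against two-sided 5% critical values copied from a standard t table. The function name and the data are hypothetical, and the pooled form assumes equal variances in the two populations.

```python
import math

def pooled_t(x, y):
    """Return (t statistic, degrees of freedom) for the pooled two-sample t-test."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((a - m1) ** 2 for a in x)  # sums of squared deviations
    ss2 = sum((b - m2) ** 2 for b in y)
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

# Two-sided 5% critical values from a standard t table, keyed by degrees of freedom.
T_CRIT_05 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447, 7: 2.365, 8: 2.306}

# Hypothetical data: only three observations per population.
group_a = [10.1, 10.4, 9.9]
group_b = [11.2, 11.5, 11.0]
t, df = pooled_t(group_a, group_b)
verdict = "significant" if abs(t) > T_CRIT_05[df] else "not significant"
print(f"t = {t:.3f}, df = {df}, {verdict} at the 5% level")
```

The price of a small sample shows up in the critical value: with 4 degrees of freedom the statistic has to exceed 2.776, well above the 1.96 that applies to very large samples.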

#150003

Dave L
Participant

I agree.  The power in statistical testing is the ability to get away with measuring a small fraction of the population and extrapolating to the entire population from that sample.  I too rarely have the ability to measure more than n=3 in a controlled experiment, and am then expected to make big decisions.  If I had n>25 I’d just plot histograms and line them up.  I wouldn’t need a p value to let me guess about the distributions, I’d already have it.
In this case whatever is causing a change in the distribution to non-normal tells me that there’s a lot of variation and something is causing a shift toward a bounded value.  The average may not change much but the bias toward the bounded value is shifting the distribution.

#150345

melvin
Participant

As part of understanding the data set prior to the t-test:
1.  Is the process stable (in control)?  If not, is it appropriate to make the comparison?
2.  The t-test uses the t-distribution to model a data set as the foundation for a statistical test.  If the data are not normal, we need to take a graphical look at the data and understand why not.  There are many conditions where we conclude non-normality, but the practical difference from normality is so small that the t-test is still appropriate.  Look at the data.  For example, did we have such a large sample size that the detected difference is practically meaningless?  I often talk people through using a fat pencil test for detecting non-normality.
Also, is the data multimodal – if so, we should understand that first, …
Bob
