# 2 Sample t test


- This topic has 14 replies, 10 voices, and was last updated 13 years, 7 months ago by melvin.

- December 13, 2006 at 6:11 am #45506
I want to test whether there is a significant difference between the means of two samples. A basic assumption of the t-test is that the data are normal, but here one sample is normal and the other is not. What test can be used to see the difference in this case? Could someone help me out?

Regards,

Rekha

December 13, 2006 at 1:06 pm #148894

Eric Maass (Participant, @poetengineer)

Rekha,

One possibility is the Mann-Whitney test. It is somewhat similar to the Student’s t-test, but looks for a significant difference in medians rather than means. Hence, it is considered a nonparametric test that does not have the same normality assumption as the Student’s t-test.
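For readers who want to see what the Mann-Whitney test actually computes, here is a minimal sketch in plain Python. It assumes the normal approximation for the p-value with no tie or continuity correction, so it is only rough for very small or heavily tied samples. The function name `mann_whitney_u` and the sample values are made up for illustration; they are not from the thread.

```python
import math
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U for sample x, approximate two-sided p-value).

    Uses the normal approximation without tie or continuity correction,
    so the p-value is only a rough guide for small samples.
    """
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering each value's original position.
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * (n1 + n2)
    start = 0
    while start < len(pooled):
        # Find the run of tied values and give each the average rank.
        end = start
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[pooled[k][1]] = avg_rank
        start = end + 1
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, p


# Hypothetical data: one roughly symmetric sample, one right-skewed sample.
normal_like = [9.8, 10.1, 10.4, 9.9, 10.2, 10.0, 9.7, 10.3]
skewed = [10.6, 10.9, 11.4, 12.8, 10.7, 15.2, 11.1, 19.5]
u, p = mann_whitney_u(normal_like, skewed)
print(f"U = {u}, approximate p = {p:.4f}")
```

The test works on ranks, which is why it does not need the normality assumption. A statistics package will usually offer exact p-values and tie corrections that this sketch leaves out.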

Incidentally, the Student’s t-test is considered fairly “robust” to the normality assumption.

Best regards,
Eric

December 13, 2006 at 1:44 pm #148896

Eric, while I agree with both your comments, I think we need to reinforce that median tests are not without assumptions themselves.

December 13, 2006 at 7:51 pm #148925

The Force (Member, @The-Force)

If one data set is normal and the other is not, you can transform the non-normal data to normal and then apply the t-test (after checking whether the variances are equal), or you can use a nonparametric test for two samples, which is Mann-Whitney.

December 13, 2006 at 9:38 pm #148928

Let me see if I understand your suggestion. You have two sets of data, of which one is normal and the other is non-normal. You just suggested that he transform the non-normal set and then do the t-test. Wouldn’t that put the two data sets totally at odds, since the scale of measurement is now totally different? Better to stick with your second recommendation and do the nonparametric test.

December 14, 2006 at 2:52 am #148950

What Darth is saying is true. If we convert non-normal data to normal, the transformation will take a form such as Y = X^2, Y = sqrt(X), or Y = log(X), so it is not an apples-to-apples comparison. Thanks for all your suggestions. So I can use Mann-Whitney.

December 14, 2006 at 3:00 am #148951

Eric Maass (Participant, @poetengineer)

Darth,

Thanks – and you are right, of course.

Have a pleasant holiday season!

Best regards,
Eric

December 15, 2006 at 2:18 pm #149074

The Force (Member, @The-Force)

For both the t-test and the F-test: if the variances are known to be equal, the t or F test will be insensitive to the non-normality, provided that the data set found to be non-normal is not highly skewed or has a large sample size.

Or use a nonparametric test instead.

January 4, 2007 at 11:57 am #149923

Whilst I agree that a Mann-Whitney test should be used in this case, I have one question. Did you expect one set of data to be normal and the other non-normal?

I often have trainee BBs and GBs asking similar questions, but when we look at the data we find one of two things: either the non-normal data set has too few samples and becomes normal when more samples are collected, or there are really two data sets combined in one, and once these are split out we have normal data as well.

Basically, don’t jump to the conclusion that you need to use nonparametric analysis until you have validated and understood your data.

January 4, 2007 at 4:38 pm #149944

Jonathon Andell (Participant, @Jonathon-Andell)

I agree with everything you are saying. In addition, let’s remember to plot the data in a control chart of some sort. The difference between the two distributions may be due to a special cause.
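As one way to act on the control chart suggestion, the sketch below computes limits for an individuals (X) chart, which is an assumption on my part since the post only says "a control chart of some sort". It uses the usual constant 2.66 (that is, 3 / 1.128) times the average moving range. The function names and the data are hypothetical.

```python
def individuals_chart_limits(values):
    """Return (center, lower, upper) for an individuals (X) control chart.

    Limits are mean +/- 2.66 * average moving range, where 2.66 = 3 / d2
    and d2 = 1.128 for moving ranges of two consecutive points.
    """
    if len(values) < 2:
        raise ValueError("need at least two points")
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return center, center - 2.66 * mr_bar, center + 2.66 * mr_bar


def out_of_control_points(values):
    """Indices of points that fall outside the chart limits."""
    _, lower, upper = individuals_chart_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical measurements in time order, with one suspicious spike.
readings = [10, 11, 10, 11, 10, 11, 10, 20, 10, 11]
print(individuals_chart_limits(readings))
print(out_of_control_points(readings))
```

A flagged point is only a prompt to go and look for a special cause; it says nothing by itself about which test to use afterwards.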

January 4, 2007 at 7:30 pm #149960

Assuming the lack of special causes has been verified, you have several options.

1. Use the 2-sample t-Test if n > 25 for both data sets.

2. Transform BOTH data sets using the same lambda value, then do the t-Test.

3. Trim the data using a statistically valid method prior to the test.
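Option 2 can be sketched as follows. The mention of a lambda value suggests a Box-Cox transform, so that is what this assumes; the poster does not name the transform. The point of the sketch is only that one lambda is applied to both samples so they stay on the same scale. How to choose lambda is not covered here, and the function names are hypothetical.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive data for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # lambda = 0 is defined as the natural log.
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def transform_both(sample_a, sample_b, lam):
    """Apply the SAME lambda to both samples so the scales match."""
    return box_cox(sample_a, lam), box_cox(sample_b, lam)


# Hypothetical data; lam=0.5 is an arbitrary illustrative choice.
a_t, b_t = transform_both([4.0, 9.0, 16.0], [1.0, 25.0, 36.0], lam=0.5)
print(a_t, b_t)
```

Any t-test run afterwards compares means on the transformed scale, so conclusions need to be stated on that scale or carefully translated back.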

Out of habit I always run both a 2-sample t-Test and a Mann-Whitney test to see if they agree. When sample size is large enough, they almost always provide the same outcome.

January 4, 2007 at 9:12 pm #149965

I’m very curious that you have two different distributions. I’d say you don’t even need any statistical test, because you know you have a difference already.

To me, a t-test or any other statistical test is a tool to help you understand whether values from two sample distributions actually come from two different population distributions or from the same one, and to give a probability that the difference is real. If the distribution changed on you, then you have your answer already.

January 5, 2007 at 1:51 pm #149993

Robert Butler (Participant, @rbutler)

As a point of clarification, I don’t know the source of the following recommendation:

“Use the 2-sample t-Test if n > 25 for both data sets.”

but I do know it is incorrect. If there was any validity to this statement then there would have been no reason for the Guinness Brewery to have demanded that Gosset publish his result under the pseudonym “Student” and there would be no reason for the standard t tables (which can be found in the back of any elementary book on statistics) to start at n=2.

The t-test’s value as a “secret weapon” was due to the fact that it permitted one to reach valid conclusions about a process with sample sizes far less than 25 per population. It is valued today for much the same reason.

I can’t speak for others who post here but I do know that in all of the places where I’ve worked I’ve never had the luxury of 25 independent samples per population. The norm has been 2-5 samples per population. Every now and then I’ve had a sample range of 6-10 and once in a great while I’ve been able to exceed this but even then the maximum samples per population was less than 15.
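To illustrate that the t-test is defined for very small samples, here is a rough sketch of a pooled-variance two-sample t-test in plain Python. It assumes equal variances, and it gets the p-value by numerically integrating the t density with Simpson's rule rather than reading a table. The function names and data are hypothetical.

```python
import math
from statistics import mean, variance


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t via Simpson integration.

    steps must be even; 2000 is an arbitrary choice for this sketch.
    """
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1 - 2 * area)


def pooled_t_test(x, y):
    """Return (t, df, two-sided p) assuming equal variances."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df, t_two_sided_p(t, df)


# Hypothetical data with only three observations per group.
t_stat, dof, p_val = pooled_t_test([1, 2, 3], [4, 5, 6])
print(f"t = {t_stat:.3f}, df = {dof}, p = {p_val:.4f}")
```

With so few observations the heavy tails of the t distribution matter a great deal, which is exactly what the small-df rows of a t table encode.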

January 5, 2007 at 3:23 pm #150003

I agree. The power of statistical testing is the ability to get away with measuring a small fraction of the population and extrapolating to the entire population from that sample. I too rarely have the ability to measure more than n=3 in a controlled experiment, and am then expected to make big decisions. If I had n>25 I’d just plot histograms and line them up. I wouldn’t need a p-value to let me guess about the distributions; I’d already have them.

In this case, whatever is causing a change in the distribution to non-normal tells me that there’s a lot of variation and something is causing a shift toward a bounded value. The average may not change much, but the bias toward the bounded value is shifting the distribution.

January 11, 2007 at 11:33 pm #150345

As part of understanding the data set prior to the t-test:

1. Is the process stable (in control)? If not, is it appropriate to make the comparison?

2. The t-test uses the t-distribution to model a data set as the foundation for a statistical test. If the data are not normal, take a graphical look at the data and understand why not. There are many conditions where we conclude non-normality, but the practical difference from normality is so small that the t-test is still appropriate. Look at the data. For example, did we have such a large sample size that the departure detected is practically meaningless? I often talk people through using a fat pencil test for detecting non-normality.

Also, is the data multimodal – if so, we should understand that first, …

Bob
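The fat pencil test is a visual check on a normal probability plot: if a fat pencil laid along the points covers them all, the data are close enough to normal. A rough numerical stand-in, sketched below, is the correlation between the sorted data and the matching normal quantiles. The plotting positions (i - 0.5) / n are one common convention among several, there is no agreed cutoff for "close enough", and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def normal_plot_points(values):
    """Pair each theoretical standard-normal quantile with a sorted value."""
    n = len(values)
    ordered = sorted(values)
    # Plotting positions (i - 0.5) / n; other conventions exist.
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


def plot_correlation(values):
    """Pearson correlation of the probability-plot points.

    Values near 1 mean the plot is close to a straight line.
    Fails with a division by zero if all values are identical.
    """
    pts = normal_plot_points(values)
    qs = [q for q, _ in pts]
    vs = [v for _, v in pts]
    mq, mv = sum(qs) / len(qs), sum(vs) / len(vs)
    s_qv = sum((q - mq) * (v - mv) for q, v in pts)
    s_qq = sum((q - mq) ** 2 for q in qs)
    s_vv = sum((v - mv) ** 2 for v in vs)
    return s_qv / math.sqrt(s_qq * s_vv)


# Hypothetical data: nine similar readings and one extreme value.
print(plot_correlation([1, 1, 1, 1, 1, 1, 1, 1, 1, 100]))
```

This does not replace actually plotting the data, which also reveals multimodality and outliers that a single correlation number can hide.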
