2 Sample t test
 This topic has 14 replies, 10 voices, and was last updated 14 years, 8 months ago by melvin.

AuthorPosts

December 13, 2006 at 6:11 am #45506
I want to test whether there is a significant difference between the means of two samples. The basic assumption of the t-test is that the data are normal, but here one sample is normal and the other is not. What test can I use to check for a difference despite this departure from normality? Could someone help me out?
Regards,
Rekha
December 13, 2006 at 1:06 pm #148894
Eric Maass, Participant, @poetengineer
Rekha,
One possibility is the Mann-Whitney test. It is somewhat similar to the Student’s t-test, but it looks for a significant difference in medians rather than means. It is a nonparametric test, so it does not carry the same normality assumption as the Student’s t-test.
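To make the Mann-Whitney idea concrete, here is a minimal standard-library Python sketch; the function name `mann_whitney_u` is hypothetical. It computes U straight from its definition (the number of pairs in which a value from the first sample exceeds a value from the second, with ties counted as one half) and then uses the large-sample normal approximation for a two-sided p-value. The approximation skips the tie correction, so treat it as rough for very small or heavily tied samples; a statistics package (Minitab, or `scipy.stats.mannwhitneyu` in Python) is the better tool in practice. Strictly, the test asks whether one sample tends to produce larger values than the other; reading it as a comparison of medians assumes the two distributions have a similar shape.

```python
import math

def mann_whitney_u(x, y):
    """Return (U, z, two-sided p) for samples x and y.

    U counts pairs (xi, yj) with xi > yj; tied pairs count 1/2.
    The p-value is a normal approximation without tie correction.
    """
    n1, n2 = len(x), len(y)
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    mean_u = n1 * n2 / 2.0                            # expected U if no difference
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)  # its standard deviation
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))            # two-sided normal tail area
    return u, z, p
```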
Incidentally, the Student’s t-test is considered fairly “robust” to the normality assumption.
Best regards,
Eric
December 13, 2006 at 1:44 pm #148896
Eric, while I agree with both of your comments, I think we need to stress that median tests are not without assumptions of their own.
December 13, 2006 at 7:51 pm #148925
The Force, Member, @TheForce
If one data set is normal and the other is not, you need to transform the non-normal data to normal and then apply the t-test (after checking whether the variances are equal), OR you can use a nonparametric test for 2 samples, which is Mann-Whitney.
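The “check whether the variances are equal” step decides which form of the t statistic to use. The sketch below (standard-library Python; the function name is hypothetical) returns the ratio of the larger to the smaller sample variance, to be compared against an F table with n - 1 degrees of freedom for each sample, plus both the pooled (equal-variance) t statistic and the Welch (unequal-variance) t statistic with its Welch-Satterthwaite degrees of freedom. It stops at the statistics and leaves the table lookup to you. Keep in mind that the F-test for equal variances is itself fairly sensitive to non-normality.

```python
import math
from statistics import mean, variance

def compare_variances_and_t(x, y):
    """Variance-ratio check plus pooled and Welch t statistics (no p-values).

    Assumes each sample has at least two values and nonzero variance.
    """
    n1, n2 = len(x), len(y)
    diff = mean(x) - mean(y)
    v1, v2 = variance(x), variance(y)      # sample variances, n - 1 denominator
    f_ratio = max(v1, v2) / min(v1, v2)    # larger over smaller, so always >= 1

    # Pooled t: appropriate when the variances can be treated as equal
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))

    # Welch t: no equal-variance assumption; Welch-Satterthwaite df
    a, b = v1 / n1, v2 / n2
    t_welch = diff / math.sqrt(a + b)
    df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

    return {"f_ratio": f_ratio, "t_pooled": t_pooled, "df_pooled": n1 + n2 - 2,
            "t_welch": t_welch, "df_welch": df_welch}
```

When the two variances are equal the two statistics coincide, so the choice only matters when the ratio is large.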
December 13, 2006 at 9:38 pm #148928
Let me see if I understand your suggestion. You have two sets of data, one of which is normal and the other non-normal. You just suggested that Rekha transform the non-normal set and then do the t-test. Wouldn’t that put the two data sets totally at odds, since the scale of measurement is now totally different? Better to stick with your second recommendation and do the nonparametric test.
December 14, 2006 at 2:52 am #148950
What Darth says is true. If we convert non-normal data to normal, the transformation will take a form such as Y = X^2, Y = sqrt(X), or Y = log(X), so it is not an apples-to-apples comparison. Thanks for all your suggestions. So I can use Mann-Whitney.
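A small numeric illustration of the scale problem, assuming a log transform (base 10 is an arbitrary choice here, and the three values are made up purely to keep the arithmetic easy). The mean on the log scale, mapped back to the original units, is the geometric mean rather than the arithmetic mean, so testing a log-transformed sample against an untransformed one compares log units with raw units. The helper name is hypothetical.

```python
import math
from statistics import mean

def log_scale_mean_in_original_units(values):
    """Average on the log10 scale, mapped back to original units.

    The back-transformed value is the geometric mean, which is never
    larger than the arithmetic mean of the raw (positive) data.
    """
    logs = [math.log10(v) for v in values]  # requires strictly positive values
    return 10 ** mean(logs)

sample = [1.0, 10.0, 100.0]                      # hypothetical values
print(mean(sample))                              # 37.0 (arithmetic mean, raw units)
print(mean([math.log10(v) for v in sample]))     # 1.0 (log units)
print(log_scale_mean_in_original_units(sample))  # 10.0 (geometric mean, raw units)
```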
December 14, 2006 at 3:00 am #148951
Eric Maass, Participant, @poetengineer
Darth,
Thanks – and you are right, of course.
Have a pleasant holiday season!
Best regards,
Eric
December 15, 2006 at 2:18 pm #149074
The Force, Member, @TheForce
For both the t-test and the F-test: if the variances are known to be equal, the t-test or F-test will be insensitive to the non-normality, provided that the data set found to be non-normal is either not highly skewed or has a large sample size.
OR use a nonparametric test instead.
January 4, 2007 at 11:57 am #149923
Whilst I agree that a Mann-Whitney test should be used in this case, I have one question: did you expect one set of data to be normal and the other non-normal?
I often have trainee Black Belts and Green Belts asking similar questions, but when we look at the data we find either that the non-normal data set has too few samples and becomes normal once more samples are collected, or that there are really two data sets combined in one, and when these are split out the data are normal as well.
Basically, don’t jump to the conclusion that you need to use nonparametric analysis until you have validated and understood your data.
January 4, 2007 at 4:38 pm #149944
Jonathon Andell, Participant, @JonathonAndell
I agree with everything you are saying. In addition, let’s remember to plot the data in a control chart of some sort. The difference between the two distributions may be due to a special cause.
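One common choice for plotting individual measurements in time order is an individuals (I) chart. The sketch below (standard-library Python; function names hypothetical) uses the usual limits: center line at the mean, limits at the mean plus or minus 2.66 times the average moving range, where 2.66 = 3 / d2 and d2 = 1.128 for moving ranges of two consecutive points. It only flags points beyond the limits; the other run rules used to detect special causes are not implemented, and it should be run on each data set separately.

```python
from statistics import mean

def individuals_chart_limits(data):
    """Return (lcl, center, ucl) for an individuals (I) chart.

    Limits are center +/- 2.66 * average moving range, where
    2.66 = 3 / d2 and d2 = 1.128 for moving ranges of two points.
    """
    if len(data) < 2:
        raise ValueError("need at least two observations in time order")
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    mr_bar = mean(moving_ranges)
    center = mean(data)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

def points_beyond_limits(data):
    """Indices of points outside the limits (basic out-of-limits rule only)."""
    lcl, _, ucl = individuals_chart_limits(data)
    return [i for i, value in enumerate(data) if value < lcl or value > ucl]
```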
January 4, 2007 at 7:30 pm #149960
Assuming the lack of special causes has been verified, you have several options.
1. Use the 2-sample t-test if n > 25 for both data sets.
2. Transform BOTH data sets using the same lambda value, then do the t-test.
3. Trim the data using a statistically valid method prior to the test.
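Option 2 usually refers to the Box-Cox transformation, Y = (X^lambda - 1) / lambda, with lambda = 0 meaning the natural log. The minimal sketch below (standard-library Python; function names hypothetical) applies one given lambda to both samples so they stay on a common scale. How lambda is chosen is left open here as an assumption; software such as Minitab’s Box-Cox tool or `scipy.stats.boxcox` can estimate one.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform with a fixed lambda; data must be strictly positive."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox needs strictly positive data")
    if lam == 0:
        return [math.log(v) for v in values]  # lambda = 0 is the natural log
    return [(v ** lam - 1.0) / lam for v in values]

def transform_both(x, y, lam):
    """Apply the SAME lambda to both samples so they share one scale."""
    return box_cox(x, lam), box_cox(y, lam)
```

The transformed pair can then go into an ordinary 2-sample t-test; conclusions apply to the transformed scale.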
Out of habit, I always run both a 2-sample t-test and a Mann-Whitney test to see if they agree. When the sample size is large enough, they almost always give the same outcome.
January 4, 2007 at 9:12 pm #149965
I’m very curious: you have two different distributions? I’d say you don’t even need a statistical test, because you already know you have a difference.
To me, a t-test or any other statistical test is a tool to help you understand whether values from two sample distributions are actually coming from two different population distributions or from the same population distribution, and to give a probability that the difference is real. If the distribution changed on you, then you already have your answer.
January 5, 2007 at 1:51 pm #149993
Robert Butler, Participant, @rbutler
As a point of clarification, I don’t know the source of the following recommendation:
“Use the 2-sample t-test if n > 25 for both data sets.”
but I do know it is incorrect. If there were any validity to this statement, there would have been no reason for the Guinness Brewery to demand that Gosset publish his result under the pseudonym “Student”, and there would be no reason for the standard t tables (which can be found in the back of any elementary book on statistics) to start at n=2.
The t-test’s value as a “secret weapon” was due to the fact that it permitted one to reach valid conclusions about a process with sample sizes far less than 25 per population. It is valued today for much the same reason.
I can’t speak for others who post here, but I do know that in all of the places where I’ve worked I’ve never had the luxury of 25 independent samples per population. The norm has been 2-5 samples per population. Every now and then I’ve had a sample range of 6-10, and once in a great while I’ve been able to exceed this, but even then the maximum number of samples per population was less than 15.
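The small-sample point rests on the p-value coming from Student’s t distribution rather than a normal approximation. As an illustration, here is a minimal standard-library Python sketch of a pooled two-sample t-test (equal variances and roughly normal data assumed; function names hypothetical). The two-sided p-value uses the standard identity P(|T| >= |t|) = I_x(df/2, 1/2) with x = df / (df + t^2), where I is the regularized incomplete beta function, evaluated here with a modified Lentz continued fraction. In practice `scipy.stats.ttest_ind` does the same job.

```python
import math
from statistics import mean, variance

TINY = 1e-300  # guards against division by zero in the continued fraction

def _clamp(v):
    return TINY if abs(v) < TINY else v

def _beta_cf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / _clamp(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))              # even step
        d = 1.0 / _clamp(1.0 + aa * d)
        c = _clamp(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))     # odd step
        d = 1.0 / _clamp(1.0 + aa * d)
        c = _clamp(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b  # symmetry, faster convergence

def t_two_sided_p(t, df):
    """P(|T| >= |t|) for Student's t distribution with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))

def pooled_t_test(x, y):
    """Equal-variance two-sample t-test: returns (t, df, two-sided p)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df, t_two_sided_p(t, df)
```

For example, `pooled_t_test([1, 2, 3], [4, 5, 6])` gives t of about -3.67 on 4 degrees of freedom and a two-sided p of about 0.021, with only three values per group.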
January 5, 2007 at 3:23 pm #150003
I agree. The power in statistical testing is the ability to get away with measuring a small fraction of the population and extrapolating from that sample to the entire population. I too rarely have the ability to measure more than n=3 in a controlled experiment, and then I am expected to make big decisions. If I had n > 25, I’d just plot histograms and line them up. I wouldn’t need a p-value to let me guess about the distributions; I’d already have them.
In this case, whatever is changing the distribution to non-normal tells me that there’s a lot of variation and that something is causing a shift toward a bounded value. The average may not change much, but the bias toward the bounded value is shifting the distribution.
January 11, 2007 at 11:33 pm #150345
As part of understanding the data set prior to the t-test:
1. Is the process stable (in control)? If not, is it appropriate to make the comparison?
2. The t-test uses the t-distribution to model a data set as the foundation for a statistical test. If the data are not normal, take a graphical look at the data and understand why not. There are many situations where we conclude non-normality, but the practical departure from normality is so small that the t-test is still appropriate. Look at the data: for example, was the sample size so large that a statistically detectable departure is practically meaningless? I often talk people through using a “fat pencil” test on a normal probability plot when judging non-normality.
Also, are the data multimodal? If so, we should understand that first, …
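The fat pencil test is a visual check: plot the data on a normal probability plot and lay a fat pencil along the points; if the pencil covers them, the departure from normality is usually too small to matter. The sketch below (standard-library Python 3.8+ for `statistics.NormalDist`; function names hypothetical) produces the plot coordinates and a straight-line correlation as a numeric companion to that judgment. Blom’s plotting positions, (i - 3/8) / (n + 1/4), are an assumption (packages differ), and no cut-off value for the correlation is implied.

```python
from statistics import NormalDist, mean

def normal_plot_points(data):
    """Pair each sorted observation with a standard normal quantile.

    Plotting positions use Blom's formula (i - 3/8) / (n + 1/4).
    """
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    qs = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(qs, xs))

def plot_correlation(points):
    """Pearson correlation of the plot; near 1 means nearly a straight line.

    Assumes the data are not all identical (otherwise it divides by zero).
    """
    qs = [q for q, _ in points]
    xs = [x for _, x in points]
    mq, mx = mean(qs), mean(xs)
    sqx = sum((q - mq) * (x - mx) for q, x in zip(qs, xs))
    sqq = sum((q - mq) ** 2 for q in qs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sqx / (sqq * sxx) ** 0.5
```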
Bob
The forum ‘General’ is closed to new topics and replies.