The two-sample t-test is one of the most commonly used hypothesis tests in Six Sigma work. It is applied to determine whether the difference between the averages of two groups is statistically significant or is instead due to random chance. It helps to answer questions like whether the average success rate is higher after implementing a new sales tool than before or whether the test results of patients who received a drug are better than the test results of those who received a placebo.
Here is an example starting with the absolute basics of the two-sample t-test. The question being answered is whether there is a significant (or only random) difference in the average cycle time to deliver a pizza from Pizza Company A vs. Pizza Company B. These are the data collected from a sample of deliveries of Company A and Company B.
Table 1: Pizza Company A Versus Pizza Company B Sample Deliveries

| A | B |
|---|---|
| 20.4 | 20.2 |
| 24.2 | 16.9 |
| 15.4 | 18.5 |
| 21.4 | 17.3 |
| 20.2 | 20.5 |
| 18.5 | |
| 21.5 | |

To perform this test, both samples must be normally distributed.
Since both samples have a p-value above 0.05 (or 5 percent), the hypothesis of normality cannot be rejected, and both samples can be treated as normally distributed. Normality is tested here with the Anderson-Darling test, for which the null hypothesis is “Data are normally distributed” and the alternative hypothesis is “Data are not normally distributed.”
Using the two-sample t-test, statistics software generates the output in Table 2.
Table 2: Two-Sample T-Test and Confidence Interval for A Sample and B Sample

| Sample | N | Mean | Standard Deviation | SE Mean |
|---|---|---|---|---|
| A Sample | 7 | 20.23 | 2.74 | 1.0 |
| B Sample | 5 | 18.68 | 1.64 | 0.73 |

Difference = mu (A Sample) - mu (B Sample)
Estimate for difference: 1.54857
95% CI for difference: (-1.53393, 4.63107)
T-test of difference = 0 (vs. not =): T-value = 1.12, P-value = 0.289, DF = 10
Both use pooled StDev = 2.3627
Since the p-value is 0.289, i.e., greater than 0.05 (or 5 percent), the null hypothesis cannot be rejected: the data do not show a significant difference between the means. If the population means were equal, a difference at least this large would still be observed in about 28.9 percent of such samples.
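The key figures in Table 2 can be checked against the raw data in Table 1. The following is a minimal Python sketch using only the standard library; the variable names are illustrative, and the pooled (equal-variance) calculation is assumed because the software output reports a pooled standard deviation.

```python
import math
import statistics

# Delivery times from Table 1
a = [20.4, 24.2, 15.4, 21.4, 20.2, 18.5, 21.5]
b = [20.2, 16.9, 18.5, 17.3, 20.5]

n_a, n_b = len(a), len(b)
mean_a, mean_b = statistics.mean(a), statistics.mean(b)
sd_a, sd_b = statistics.stdev(a), statistics.stdev(b)  # sample SDs (divisor n - 1)

# Pooled standard deviation; assumes both populations share one variance
df = n_a + n_b - 2
sp = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df)

# Standard error of the difference and the resulting t-value
se_diff = sp * math.sqrt(1 / n_a + 1 / n_b)
t_value = (mean_a - mean_b) / se_diff

print(f"means: {mean_a:.2f} vs {mean_b:.2f}, pooled SD: {sp:.4f}")
print(f"t = {t_value:.2f} with {df} degrees of freedom")
```

The printed values should agree with the means, pooled StDev, T-value and DF reported in Table 2.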
If the two-sample t-test is being used as a tool for practical Six Sigma use, that is enough to know. The rest of the article, however, discusses understanding the two-sample t-test, which is easy to use but not so easy to understand.
Actually, if one subtracts the means of two samples, in most cases, there will be a difference. So the real question is not whether the sample means are the same or different. The correct question is whether the population means are the same (i.e., are the two samples coming from the same or different populations?).
Hence, $\bar{x}_1 - \bar{x}_2$ will most often be unequal to zero.
However, if the population means are the same, $\mu_1 - \mu_2$ will equal zero. The trouble is, only two samples exist. The question that must be answered is whether $\mu_1 - \mu_2$ is zero or not.
The first step is to understand how the one-sample t-test works. Knowing this helps to answer questions like in the following example: A supplier of a part to a large organization claims that the mean weight of this part is 90 grams. The organization took a small sample of 20 parts and found that the mean weight is 84 grams and the standard deviation is 11 grams. Could this sample originate from a population with a mean of 90 grams?
The organization wants to test this at a significance level of 0.05, i.e., it is willing to take only a 5 percent risk of being wrong when it says the sample is not from the population. Therefore:
Null Hypothesis (H0): “True population mean weight is 90 grams”
Alternative Hypothesis (Ha): “True population mean weight is not 90 grams”
Alpha is 0.05
Logically, the farther away the observed or measured sample mean is from the hypothesized mean, the lower the probability (i.e., the p-value) of obtaining such a sample mean if the null hypothesis is true. However, what is far enough? In this example, the difference between the sample mean and the hypothesized population mean is 6. Is that difference big enough to reject H0? In order to answer the question, the sample mean needs to be standardized and the so-called t-statistic or t-value needs to be calculated with this formula:
$$t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}}$$
In this formula, $\bar{x}$ is the sample mean, $\mu_0$ is the hypothesized population mean and $s_{\bar{x}}$ is the standard error of the mean (SE mean). Because the population standard deviation is not known, we have to estimate the SE mean. It can be estimated by the following equation:
$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
where $s$ is the sample standard deviation and $n$ is the sample size.
In our example, $s_{\bar{x}}$ is:
$$s_{\bar{x}} = \frac{11}{\sqrt{20}} = 2.46$$
Next we obtain the t-value for this sample mean:
$$t = \frac{84 - 90}{2.46} = -2.44$$
Finally, this t-value must be compared with the critical value of t. The critical t-value marks the threshold that, if it is exceeded, leads to the conclusion that the difference between the observed sample mean and the hypothesized population mean is large enough to reject H0. The critical t-value is the value that would be exceeded in absolute terms with a probability of only 5 percent if H0 were true. From the t-distribution tables, one can find that, for 19 degrees of freedom (n - 1), the critical value of t is ±2.093.
Since the obtained t-value of -2.44 is smaller than the critical value of -2.093 (its absolute value exceeds 2.093), the null hypothesis must be rejected (i.e., the sample is unlikely to come from the hypothesized population) and the supplier’s claims must be questioned.
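The arithmetic of the supplier example can be retraced in a few lines. This is a minimal Python sketch, not a general-purpose routine: the variable names are illustrative, and the critical value 2.093 is simply taken from the t-table lookup quoted in the text rather than computed.

```python
import math

# Summary figures from the supplier example
n = 20          # sample size
x_bar = 84.0    # sample mean weight (grams)
s = 11.0        # sample standard deviation (grams)
mu_0 = 90.0     # mean weight claimed by the supplier (grams)

# Estimated standard error of the mean and the t-value
se_mean = s / math.sqrt(n)
t_value = (x_bar - mu_0) / se_mean

# Two-sided critical value for alpha = 0.05 and n - 1 = 19 degrees of freedom,
# as quoted from the t-distribution tables
t_critical = 2.093
reject_h0 = abs(t_value) > t_critical

print(f"SE mean = {se_mean:.2f}, t = {t_value:.2f}, reject H0: {reject_h0}")
```

Comparing the absolute t-value with the positive critical value is equivalent to checking whether t falls outside the interval from -2.093 to +2.093.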
In the two-sample t-test, two sample means are compared to discover whether they come from the same population (meaning there is no difference between the two population means). Now, because the question is whether two populations are actually one and the same, the first step is to obtain the SE mean from the sampling distribution of the difference between two sample means. Again, since the standard deviations of the two populations are unknown, the standard error of the difference between the two sample means must be estimated.
In the one-sample t-test, the SE mean was computed as such:
$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
Hence:
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
However, this is only appropriate when samples are large (both greater than 30). Where samples are smaller, use the following method:
$$s_{\bar{x}_1 - \bar{x}_2} = S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad S_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}$$
Sp is a pooled estimate of the common population standard deviation. Hence, this method assumes that the variances are equal for both populations. If that cannot be assumed, the method cannot be used. (Statistical software can handle unequal variances in the two-sample t-test module, but the actual calculations are complex and beyond the scope of this article.)
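The large-sample and pooled estimates of the standard error can be compared numerically. The sketch below is a hedged illustration in Python: the function names are hypothetical, and the inputs are the rounded summary statistics of the pizza samples from Table 2.

```python
import math

def se_large_sample(s1, n1, s2, n2):
    """Standard error of the difference using each sample's own variance."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def se_pooled(s1, n1, s2, n2):
    """Standard error of the difference using the pooled standard deviation Sp."""
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return sp * math.sqrt(1 / n1 + 1 / n2)

# Rounded summary statistics from Table 2 (A Sample, B Sample)
se_a = se_large_sample(2.74, 7, 1.64, 5)
se_b = se_pooled(2.74, 7, 1.64, 5)

print(f"large-sample SE = {se_a:.3f}, pooled SE = {se_b:.3f}")
```

With samples this small the two estimates differ noticeably, which is why the choice between them matters for small samples.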
The two-sample t-test is illustrated with this example:
Table 3: Illustration of Two-Sample T-Test

| Sample | N | Mean | Standard Deviation |
|---|---|---|---|
| A Sample | 12 | 92 | 15 |
| B Sample | 15 | 84 | 19 |

H0 is: “The population means are the same, i.e., $\mu_1 - \mu_2 = 0$”
Ha is: “The population means are not the same, i.e., $\mu_1 - \mu_2 \neq 0$”
Alpha is to be set at 0.05.
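In the two-sample t-test, the t-statistic is obtained by subtracting the hypothesized difference between the population means (which is zero under the null hypothesis) from the difference between the two sample means and dividing the result by the estimated standard error:
$$S_p = \sqrt{\frac{(12 - 1)\,15^2 + (15 - 1)\,19^2}{12 + 15 - 2}} = 17.35$$
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{92 - 84}{17.35 \sqrt{\frac{1}{12} + \frac{1}{15}}} = \frac{8}{6.72} = 1.19$$
Looking up t-tables (using spreadsheet software, such as Excel’s TINV function, is easiest), one finds that for 25 degrees of freedom ($n_1 + n_2 - 2$) the critical value of t is 2.06. Again, this means that if the standardized difference between the two sample means (and that is exactly what the t-value indicates) is larger than 2.06, it can be concluded that there is a significant difference between the population means.
Here, 1.19 is less than 2.06; thus, the null hypothesis that $\mu_1 - \mu_2 = 0$ cannot be rejected.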
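When only summary statistics are available, as in Table 3, the same test can be carried out without the raw data. The following Python sketch makes that explicit; the function name `pooled_t_from_summary` is hypothetical, and the critical value 2.06 is taken from the table lookup described in the text.

```python
import math

def pooled_t_from_summary(mean1, s1, n1, mean2, s2, n2):
    """Return (t, degrees of freedom, pooled SD) for an equal-variance t-test."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    se_diff = sp * math.sqrt(1 / n1 + 1 / n2)
    # The hypothesized difference under H0 is zero
    t = ((mean1 - mean2) - 0.0) / se_diff
    return t, df, sp

# Summary statistics from Table 3
t_value, df, sp = pooled_t_from_summary(92, 15, 12, 84, 19, 15)

t_critical = 2.06  # two-sided, alpha = 0.05, 25 degrees of freedom (from the text)
reject_h0 = abs(t_value) > t_critical

print(f"Sp = {sp:.4f}, t = {t_value:.2f}, DF = {df}, reject H0: {reject_h0}")
```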
Below is the output from statistical software using the same data:
Table 4: Two-Sample T-Test and Confidence Interval for A Sample and B Sample

| Sample | N | Mean | Standard Deviation | SE Mean |
|---|---|---|---|---|
| 1 Sample | 12 | 92.0 | 15.0 | 4.3 |
| 2 Sample | 15 | 84.0 | 19.0 | 4.9 |

Difference = mu (1) - mu (2)
Estimate for difference: 8.00000
95% CI for difference: (-5.84249, 21.84249)
T-test of difference = 0 (vs. not =): T-value = 1.19, P-value = 0.245, DF = 25
Both use pooled StDev = 17.3540
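The confidence interval and p-value in this output can be approximated without statistical software. The sketch below is one possible way to do it in plain Python and rests on stated assumptions: the critical value 2.0595 is the standard t-table entry for 25 degrees of freedom (an assumed input, not computed here), and the p-value is obtained by numerically integrating the t-density with Simpson's rule rather than by whatever method the software uses.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and +|t| (Simpson's rule)."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

# Summary statistics from Table 3
n1, mean1, s1 = 12, 92.0, 15.0
n2, mean2, s2 = 15, 84.0, 19.0

df = n1 + n2 - 2
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
se_diff = sp * math.sqrt(1 / n1 + 1 / n2)
diff = mean1 - mean2

t_value = diff / se_diff
p_value = two_sided_p(t_value, df)

t_critical = 2.0595  # assumed table value: two-sided 5 percent, 25 degrees of freedom
ci_low = diff - t_critical * se_diff
ci_high = diff + t_critical * se_diff

print(f"t = {t_value:.2f}, p = {p_value:.3f}")
print(f"95% CI for difference: ({ci_low:.2f}, {ci_high:.2f})")
```

Because the interval includes zero, it leads to the same conclusion as the p-value: a difference of zero between the population means cannot be ruled out at the 5 percent level.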


Comments
I have a situation where I worked on a project to reduce the number of shipping errors. Before we started the project we had 80 errors for 754 loads. After we made improvements, we had 62 errors for 1800 loads. How do I show the difference we made?
After observing your data:
i) 80 errors for 754 loads: in this case an error occurs about every ten loads on average.
ii) 62 errors for 1800 loads: in this situation an error occurs about every 29 loads on average, so I can conclude that there is a lot of difference after the improvement.
Please can you explain to me the difference between the one-sample t and the two-sample t? I seem to be confused between these two. Thanks.
Hello,
First, I am saddened that nobody answered this question. I am not a statistician but I would like to provide some input.
“One-sample” t-test is used to compare one “sample” (think of one column and multiple rows in Excel) to a known value. To do this, you must compare the calculated mean value of the column to the known value and use the “One-sample” equation.
“Two-sample” t-test is used to compare two “samples” (think of this as two columns and multiple rows in Excel) to each other. The question is whether they come from the same population.
When thinking of “population” versus “sample”, the sample comes from the “population”. That is the test anyhow.
To play with Excel, create two columns with multiple rows of identical values. In this ideal situation, the means will be equal and the null hypothesis of mean1 - mean2 = 0 will be satisfied. Use the Excel function TTEST and evaluate the two columns (two-sample t-test). You will get “1” as a probability. In other words, you will have a 100% chance of being wrong if you say the means are different. Now change two rows so that the columns have different values and do TTEST again. What is the probability?
What is the main criterion for determining which two-sample t-test to use?
A one-sample t-test is used to test whether or not a sample mean is significantly different from a hypothetical or known population mean. Think of an allegation like, “in this town, the average age at death is 50”.
On the other hand, a two sample t test is used to compare two means from two different populations. Think of an allegation “the average age at death in NY is significantly less than that in Johannesburg.”
How do I test whether the two groups have a common variance?
Is there any free software to run the 2 sample t tests?
http://www.r-project.org/
Yes, it's called R.
the command is:
t.test(sample1,sample2)
How do we interpret the 95% CI for the difference?
You may use the Two-Proportions Test instead of the Two-Sample T-Test to verify the difference.
Thanks, the article was very informative and helpful.
Hey, I have a couple of questions about which methods should I use to:
1) Determine if the gender distribution is random.
2) Examine if women are more satisfied with the measurement scale
than men.
The p-value is incorrect; the p-value is .25081.
Some potentially misleading language here. In tests like these, you NEVER EVER compute the probability that the null hypothesis is true. The p-value is the probability that, if the null hypothesis WERE true, you would observe data as extreme as what you actually see. In fact, when you think about it, the null hypothesis is never true; that’s why the “phrase of art” in this is that we either “reject the null” or “fail to reject” the null.
Figure 1: Probability Plot of Sample A and Sample B is an excellent example of what I need for my graduate program. Any chance you can share with me how to construct this or direct me to a program that will assist?
Thanks so much
This doesn’t make any sense at all. I need someone to explain this in a way as if they were trying to explain this to a three year old, because that’s about the intelligence that I have.
I just wish there was a simplified table to look at and from there link it to deeper meaning and understanding. It needs to be broken down a bit more, I think. I am not exactly stupid, but each of us has a different way of learning or understanding, so: a simplified table, leading to step one and so on. I have been trying to find this for weeks now… I am studying stats presently and all the jargon makes it hard going.
Hi.
Thank you for the information on the use of two-sample t-tests. Could you kindly guide me on how to report results from a two-sample t-test with pooled standard deviation?
It's really very helpful content. Thanks for this.
Can someone help me with this problem? I am really struggling. please email me edsheer5@gmail.com. Thanks
The Quality Assurance department at a call center wants to compare the call work times on incoming calls for two operators. Specifically, they want to determine whether the employees differ in their call time variation and call time means.
Data collection: For each operator, 30 incoming calls are timed. The data are transformed to satisfy the normality assumption.
Instructions
1. Compare the variation for the two operators using a 2-variances test. Can you conclude that one operator has more consistent call work times than the other?
2. Compare the mean call work times for the two operators using a 2-sample t-test.
3. Check the normality assumption for each operator.
Data set: CCTimes.MPJ
Variable & Description
Date/Time: Date and time on the incoming call
HTime: Call work times in seconds
Operator: Operator (James or Laura)
LnHTime: Natural log of handling time
Note: HTime data are not normal. The natural log transformation has been stored in LnHTime. Use this column for your analysis.
Hello, after reading the first paragraph, I think some clarification is needed. The very first example you give for the application of a Two-Sample T-test is to compare “the average success rates before and after a new sales tool is implemented”.
My understanding was that this would be an example of a Paired T-test, which would determine if a significant difference existed between the SAME sample at two different points in time (i.e. after the training event). I’ve always thought the regular Two-Sample T-test required 2 INDEPENDENT sample pools, not two sets of measurements from the same sample (e.g. the same sales workers before and after the implementation of your tool). Please advise, thank you.