# Null Hypothesis and two tailed t-test


- This topic has 16 replies, 8 voices, and was last updated 10 years, 4 months ago by Mikel.

- May 7, 2010 at 7:33 pm #53445
Looking for some guidance on hypothesis testing, particularly how the null hypothesis is stated. I conducted an improvement project aimed at reducing lead times. The hypothesis was that improving the flow of materials would reduce lead times from a mean of 90 days to 30 days. A pilot was run, and the sample data showed a mean of 33.3 days based on 17 observations (this was all that time and money would allow). The test below was set up to reject or fail to reject the hypothesis. I am confident that the math is correct, but are the null and alternative hypotheses stated correctly?

Null Hypothesis (H0): mean = 30

Alternative Hypothesis (Ha): mean does not equal 30

Significance Level (alpha) = 0.05

Number of Samples (n) = 17

Degrees of Freedom (df) = n-1 = 17-1 = 16

Sample Mean (Xbar) = 33.3

Sample Standard Deviation (s) = 8.0

Critical Values of the t-Distribution (two-tailed, df = 16): ±2.120, so H0 is not rejected when -2.120 ≤ t ≤ 2.120

Test Statistic (t): 1.701

Since the test statistic 1.701 lies between the critical values of the t-distribution (-2.120 and 2.120), the researchers fail to reject the null hypothesis.
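The arithmetic behind the test statistic can be checked with a few lines of Python. This is only a sketch from the summary statistics in the post; the function name `one_sample_t` is made up for illustration, and the critical value 2.120 is copied from the post rather than computed.

```python
import math

def one_sample_t(xbar, mu0, s, n):
    """Return the one-sample t statistic and its degrees of freedom."""
    standard_error = s / math.sqrt(n)
    return (xbar - mu0) / standard_error, n - 1

# Summary statistics from the post: mean 33.3, target 30, s = 8.0, n = 17.
t_stat, df = one_sample_t(xbar=33.3, mu0=30.0, s=8.0, n=17)

# Two-tailed critical value for alpha = 0.05 and df = 16, as quoted in the post.
T_CRIT = 2.120
reject_h0 = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

With these inputs the statistic rounds to 1.701, matching the value in the post, and H0 is not rejected.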

May 9, 2010 at 1:05 am #190126

Your definitions are correct for a 1-sample t test. However, is that the test you want to run? Don’t you want to run a test comparing your baseline data and variation with your pilot data and variation to see if you’ve made a significant impact? That is, is the 30-day time point the most important part of the study (you have to hit 30 days as close as possible), or is maximizing the time saved the most important part of the study (minimize the original 90-day timeline as much as possible)? You might do a 2-sample t test, where H0 = the mean of (the population represented by the average of) the first data set = the mean of (the population represented by the average of) the second data set vs Ha = the population means of the two data sets are not the same.

It feels to me like that’s the more powerful test for you, along with a phased control chart to graphically show the process improvement.
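For readers who want to see what a two-sample comparison would look like from summary statistics, here is a rough sketch. It uses Welch's unequal-variance form, which is my choice and not something the thread specifies. The thread gives only the baseline mean (90 days), so `BASELINE_S` and `BASELINE_N` are hypothetical placeholders and the printed result means nothing until real baseline values replace them.

```python
import math

def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard errors of each mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Pilot values (33.3, 8.0, 17) and the baseline mean (90) come from the thread.
# BASELINE_S and BASELINE_N are HYPOTHETICAL: the thread does not report them.
BASELINE_S, BASELINE_N = 20.0, 20

t_stat, df = welch_t(90.0, BASELINE_S, BASELINE_N, 33.3, 8.0, 17)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

The statistic would then be compared against a t critical value at the computed degrees of freedom.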

May 10, 2010 at 9:15 am #190129

Luciano Osorio (Participant, @Luciano-Osorio)

I completely agree with Delenn… The 2-sample t test is more appropriate to verify the change… But I’m afraid that 17 is a small sample size, unless the variances (before and after) are also small…

May 10, 2010 at 1:07 pm #190130

Robert Butler (Participant, @rbutler)

Let’s do a recap.

1. You had an old process with an average lead time of 90 days.

2. You made some changes in the belief that these changes would reduce the average lead time to 30 days.

3. After making the changes you took 17 samples and computed the usual sample statistics and found that the sample exhibits an average lead time of 33.3 days with a standard deviation of 8 days.

The question, as I understand your post, is this: Is there a statistically significant difference between the sample mean of my revised process and my expressed target of 30 days?

If the above correctly summarizes your efforts and your question, then the method of choice would be a one-sample t-test. The t-value is 1.701 and, as noted, it is not greater than the two-sided critical value of 2.12. Therefore you do not have sufficient evidence to reject the null, which is to say you do not have sufficient evidence to declare that 33.3 is significantly different from 30.

If the sample is representative and if it meets the criteria of a properly drawn sample (random, independent, etc.) then it would be reasonable to say that you have done what you set out to do.

As far as the size of the sample is concerned, in the absence of any other information, I’d say 17 is more than adequate – I can tell you that your sample is at least 33% larger than the largest sample I’ve ever been allowed when the issues of time and money have been the drivers. If you are concerned about “just” having 17 samples please remember that the whole point of a t-test is to use small samples to reach meaningful conclusions about whatever it is that you are measuring.

From time to time someone will offer up some “rule of thumb” with respect to minimum sample sizes for an “acceptable” t-test. These “rules of thumb” numbers seem to sort of have a lower limit of 15 and sort of trail off around either 30 or 45. They are without merit. Check the t-table of any statistics book and you will see that they start with n=2 and nowhere on any of those tables will you find caveats or warnings concerning 2 or any other sample size number less than 15.

May 10, 2010 at 4:13 pm #190131

I totally agree with Robert, as usual.

I would add another tool to your analysis: the control chart. Hypothesis tests never were intended to differentiate whether or not the data come from a “stable” process. If special cause variation is present, the visual display can contain more useful process information than a hypothesis test would. Putting it another way: special cause variation can utterly change whether or not a null hypothesis is rejected.

As a rule of thumb, I try to avoid running hypothesis tests (or distribution identification procedures for that matter) until I have a reasonably stable process. It isn’t always feasible to insist, but I definitely prefer to do so.

May 10, 2010 at 4:17 pm #190132

Delenn-

Thank you for your input. For this improvement, however, we were focused on reducing lead times to 30 days and not reducing variation. As part of my conclusion and recommendations for future improvement, I have recommended that lead time variation be evaluated. The factors affecting variation were noted during the study, but were not a part of the improvement efforts for this study.

Thanks again for your insight!

May 10, 2010 at 4:25 pm #190134

I have been told that cycle time is one metric where mean and variation are correlated. My experience bears this out. In other words, if you reduce one, the other is sure to come down, too. That’s not a bad thing.

I just wish I knew whether this correlation has been proven to be “real.”

May 10, 2010 at 4:29 pm #190135

Thank you Robert and Jandell!

Robert, your summary of the hypothesis and project are exactly right. Thanks for your insight and this confirms my understanding of t-tests in general.

Jandell, your comments regarding the control chart are well taken. I will keep this in mind during future improvements, particularly as we look to reduce lead time variation moving forward.

-Toby

May 16, 2010 at 3:26 pm #190155

Just wondering if a one-tailed t test would be more appropriate? Are we only concerned if the mean is greater than 30 in a statistically significant way?

May 17, 2010 at 11:51 am #190156

Robert Butler (Participant, @rbutler)

The statement in the original post was “The hypothesis was that improving the flow of materials would reduce lead times from a mean of 90 days to 30 days.” A one-tailed test is appropriate for situations where direction matters. Nowhere in the original post were words which would imply a focus on direction (i.e. phrases such as “greater than 30 days” or “less than 30 days”). Rather the issue, as implied in the original post, is that of equivalence – i.e. either greater than or less than 30 days but not significantly different from 30 days. The two-tailed test is the test to use in such instances.
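To see how much the choice of tails matters for the numbers in the original post, the sketch below computes both p-values using only the standard library. A hand-rolled Simpson's-rule integral of the t density stands in for a statistics package; the helper names `t_pdf` and `t_upper_tail` are illustrative, not from any library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the probability mass lies above zero; subtract the area on [0, t].
    return 0.5 - total * h / 3

# Summary statistics from the original post.
t_stat = (33.3 - 30.0) / (8.0 / math.sqrt(17))
df = 16

p_one_tailed = t_upper_tail(t_stat, df)  # Ha: mean > 30
p_two_tailed = 2 * p_one_tailed          # Ha: mean != 30

print(f"one-tailed p = {p_one_tailed:.3f}, two-tailed p = {p_two_tailed:.3f}")
```

For this data both p-values come out above 0.05, with the one-tailed value only slightly above it, so the choice of tails changes how close the call is more than it changes the decision at alpha = 0.05.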

May 17, 2010 at 1:02 pm #190157

Robert,

Don’t you think it would be dumb to reject the null if the average had actually dropped to 15 days?

May 17, 2010 at 1:29 pm #190159

Robert Butler (Participant, @rbutler)

Not at all. The stated objective was to test to see if the changes had resulted in an actual shift in process time from 90 to 30 days. If the changes had resulted in 15 days then all the rejection is telling you is that you aren’t anywhere near 30 days.

If a time shorter than 30 days is indeed better, then if I were writing the final report I’d note that not only had we reduced the time from 90 days but that the changes had given us results that were even better than what we had initially thought might be possible.

May 17, 2010 at 1:49 pm #190160

Statistical precision triumphs over common sense!!

The benefit would be shown in process capability or $. The real objective would have been “reduce lead time to 30 days or less” and is definitely one tailed. Your results are discussed with a statistically unsophisticated crowd; the last thing we need is confusion due to wordsmanship. Our language is confusing enough as it is.

We have a hypothesis which we reject if we do not get what we want. So simple.

May 17, 2010 at 3:10 pm #190162

Robert Butler (Participant, @rbutler)

I think it is more a matter of common sense triumphing over unasked for targets.

The original post said “The hypothesis was that improving the flow of materials would reduce lead times from a mean of 90 days to 30 days.” Note the lack of emphasis on direction. The original post also noted they had achieved 33 days. For the data provided the two-tailed test demonstrated that 33 days was not significantly different from 30. Therefore mission accomplished.

If we ignore his stated objective and decide we know better and run a one-tailed test where **our** hypothesis is that the new process needs to be less than or equal to 30 days the test will fail and we will wind up telling him his efforts failed and that he needs to do more work. In other words we have a hypothesis which we reject if we don’t get what we want. Simple, yes. Unfortunately, what we decided he needed isn’t what he wanted.

If indeed he got 15 days instead of 33 and the proposed target was 30 I could imagine someone asking about process variation and wanting to know if the analysis supported the claim that 15 was real and perhaps repeatable. To that end I’d report the average and the 95% confidence interval around 15. If there were any residual questions about the difference between 15 and 30, one glance at the CI would answer them: if 30 was outside of those limits, no one, in my experience, would ask anything more.
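The confidence-interval check described above can be sketched with the actual pilot numbers from the original post (the thread gives no standard deviation for the hypothetical 15-day result). The function name `t_confidence_interval` is illustrative, and the critical value 2.120 is the one quoted in the original post, not computed here.

```python
import math

def t_confidence_interval(xbar, s, n, t_crit):
    """Two-sided confidence interval for a mean from summary statistics."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Pilot numbers from the original post; 2.120 is the two-tailed critical
# value for alpha = 0.05 and df = 16 quoted there.
low, high = t_confidence_interval(xbar=33.3, s=8.0, n=17, t_crit=2.120)
target = 30.0

print(f"95% CI: ({low:.1f}, {high:.1f}); target inside: {low <= target <= high}")
```

The interval runs from roughly 29.2 to 37.4 days and contains the 30-day target, which is the same conclusion the two-tailed test reaches.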

May 18, 2010 at 4:28 am #190164

So Robert, the real question here is what was not told. There had to have been some assumptions of how close to 30 and why. It’s the basis for sample size and also the basis to know if achieving 33 is close enough to declare victory.

You are wanting to be rigid about what was stated, but neither of us has a clue if the accomplishment is good. I have had more cases where the need was truly to achieve a certain number or less – to achieve a demand increase, to bring costs in line to keep a facility open, …. In fact with time, I have never seen the case where greater reduction was not good. If one tail was right, it has cost implications.

A better lesson than rigidity would be to get the questioner to complete their question.

May 18, 2010 at 12:47 pm #190165

Robert Butler (Participant, @rbutler)

I suppose we could view the problem as that of what was not told, however, when I read a post such as the one that started this thread I’m willing to assume that what was asked was indeed what was wanted and that the stated objective is the result of careful thought.

Consider what was said:

“The hypothesis was that improving the flow of materials would reduce lead times from a mean of 90 days to 30 days.”

“A pilot was run and the sample data observed showed a mean of 33.3 days based on 17 observations.”

“This was all that time and money would allow.”

While not everything has been spelled out in mind-numbing detail, it has been my experience that statements such as these strongly imply the investigator has done his/her homework with respect to problem definition, examination of prior data, and all of the other things one normally does in the phases of DMAIC.

Under these circumstances I will provide the best answer I can to address the question as asked since the presentation leaves me with the impression that the individual knows what they want.

This, of course, still leaves the question you asked – what happens if his efforts gave an average of 15 instead of 33?

If we assume that the choice of 30 days is as inferred above – the end result of a careful weighing of the evidence – then in my experience the following is also true:

1. 30 days is the hoped for target.

2. The process variables we have chosen are the end result of careful problem definition.

3. We think changing these variables will result in a reduction in time.

4. We hope that the result will be 30 days but we recognize this may not happen.

So, we run the experiments and we get 15 days!

When things like this have happened to me (and they have on numerous occasions) the sequence of events goes something like this:

1. Surprise and amazement – followed by a flurry of activity by everyone involved.

a. I rush back to my cube and run preliminary plots of the data and do some rudimentary analysis to make sure that the results are representative and are not just the result of some questionable outlier.

b. The engineers start e-mailing me asking about my graphs and analysis and, if I say the initial results look fine, they, in turn, start holding meetings to identify the quickest way to make the necessary changes.

c. The managers are e-mailing everyone wanting to know when they can go to the big guy and announce the improvement.

d. If the changes are something that can be implemented at the line level – second trick will have already told third trick about the 15 days and the second and third trick foremen will have called me for confirmation. If I answer in the affirmative they will pull their people into a meeting and tell everyone what changes they want to make NOW.

e. Assuming the 15 days was real there will be a meeting the following day and

1. I’ll present my graphs and my analysis and I’ll comment on the certainty of the 15 days.

2. The engineers will have their long term plans drawn up and initial steps will already have been taken.

3. First trick foreman will show up with the results of the preliminary changes made the night before – if all went well the changes will have already become a part of line practice.

4. The managers will take my graphs, a sampling of slides from various power point presentations and that afternoon they will have a meeting with the big guy to announce their latest success.

At no time during all of that effort will anyone bother to ask me to run a one or a two tailed t-test to examine significant differences between 30 days and 15 days and at no time during that effort will I bother to do so. As I mentioned previously – I’ll check the 15-day result very carefully. If everything is fine I’ll report confidence bounds around 15 and if it isn’t then the result of 15 will be of no interest and we will look at other things.

All of the above, of course, assumes that shorter is indeed better. This might not be the case. A time of 15 days might be great for my part of the process but what about the rest of the system?

Maybe taking advantage of 15 days on my part of the system would require extensive/expensive overhaul of other parts of the process that would not be cost effective.

Maybe there simply isn’t any demand/need for that kind of time.

One real life example for the second point – we built a deicing fluid for aircraft. The spec called for the fluid to be effective on the wing for some period of time – I’ve forgotten the exact number. We found that we could double and even triple that time by making a few additional changes to the composition with no real change in the cost- no one was interested – at that time regulations were such that if a plane was on the ground past the specified time it would not be allowed to fly without first returning to the gate for checks and fuel.

May 19, 2010 at 2:45 pm #190168

Wow Robert, you’ve got way too much time on your hands.

Most time reduction efforts I’ve been involved with exceed their target and people just take the money and run.

The 30 or less is implied; you are making way more of this than it is.
