iSixSigma

Null Hypothesis and two tailed t-test


Viewing 17 posts - 1 through 17 (of 17 total)
  • #53445

    Toby
    Member

Looking for some guidance on hypothesis testing, particularly how the null hypothesis is stated. I conducted an improvement project aimed at reducing lead times. The hypothesis was that improving the flow of materials would reduce lead times from a mean of 90 days to 30 days. A pilot was run, and the observed sample data showed a mean of 33.3 days based on 17 observations (this was all that time and $ would allow). The test below was set up to reject or fail to reject the hypothesis. I am confident that the math is correct, but are the null and alternative hypotheses stated correctly?

    Null Hypothesis (H0): mean = 30
    Alternative Hypothesis (Ha): mean does not equal 30
    Significance Level (alpha) = 0.05
    Number of Samples (n) = 17
    Degrees of Freedom (df) = n-1 = 17-1 = 16
    Sample Mean (Xbar) = 33.3
    Sample Standard Deviation (s) = 8.0
    Critical Values of the t-Distribution: ±2.120 (reject H0 if t is less than -2.120 or greater than 2.120)
    Test Statistic (t): 1.701

    Since the test statistic 1.701 lies between the critical values of the t-distribution (-2.120 and 2.120), we fail to reject the null hypothesis.
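For reference, a minimal sketch of the test above using only the Python standard library. The critical value 2.120 (two-tailed, alpha = 0.05, df = 16) is taken from a t-table rather than computed:

```python
from math import sqrt

xbar, mu0 = 33.3, 30.0   # sample mean, hypothesized mean
s, n = 8.0, 17           # sample standard deviation, sample size

t = (xbar - mu0) / (s / sqrt(n))   # 3.3 / 1.940... = 1.701
t_crit = 2.120                     # from a t-table: two-tailed, alpha = 0.05, df = 16

print(f"t = {t:.3f}")                                           # t = 1.701
print("reject H0" if abs(t) > t_crit else "fail to reject H0")  # fail to reject H0
```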

    #190126

    Leader
    Participant

    Your definitions are correct for a 1-sample t test. However, is that the test you want to run? Don’t you want to run a test comparing your baseline data and variation with your pilot data and variation to see if you’ve made a significant impact? That is, is the 30-day time point the most important part of the study (you have to hit 30 days as closely as possible), or is maximizing the time saved the most important part (minimize the original 90-day timeline as much as possible)? You might do a 2-sample t test, where Ho is that the mean of the population represented by the first data set equals the mean of the population represented by the second, vs Ha that the two population means are not the same.

    It feels to me like that’s the more powerful test for you, along with a phased control chart to graphically show the process improvement.
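A minimal sketch of the suggested 2-sample t test (Welch's unequal-variance form), using only the Python standard library. Both data lists are hypothetical placeholders, not the poster's data:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical lead times (days); substitute real baseline and pilot observations.
baseline = [92, 88, 95, 85, 90, 91, 89]
pilot    = [33, 30, 36, 31, 35, 34, 32]

m1, m2 = mean(baseline), mean(pilot)
v1, v2 = stdev(baseline) ** 2, stdev(pilot) ** 2
n1, n2 = len(baseline), len(pilot)

# Welch's t statistic: no assumption that the two variances are equal.
t = (m1 - m2) / sqrt(v1 / n1 + v2 / n2)
print(f"t = {t:.2f}")  # a large |t| indicates the two population means differ
```

Compare |t| against the critical value for the Welch–Satterthwaite degrees of freedom; for real data, a statistics package will report the p-value directly.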

    #190129

    Luciano Osorio
    Participant

    I completely agree with Delenn… The 2-sample t test is better suited to verify the change… But I’m afraid that 17 is a small sample size, unless the variances (before and after) are also very small…

    #190130

    Robert Butler
    Participant

    Let’s do a recap.

    1. You had an old process with an average lead time of 90 days.
    2. You made some changes in the belief that these changes would reduce the average lead time to 30 days.
    3. After making the changes, you took 17 samples, computed the usual summary statistics, and found that the sample exhibits an average lead time of 33.3 days with a standard deviation of 8 days.

    The question, as I understand your post, is this: Is there a statistically significant difference between my sample mean of my revised process and my expressed target of 30 days?

    If the above correctly summarizes your efforts and your question, then the method of choice would be a one-sample t-test. The t-value is 1.701 and, as noted, it is not greater than the two-sided critical value of 2.12; therefore you do not have sufficient evidence to reject the null, which is to say you do not have sufficient evidence to declare that 33.3 is significantly different from 30.

    If the sample is representative and if it meets the criteria of a properly drawn sample (random, independent, etc.) then it would be reasonable to say that you have done what you set out to do.

    As far as the size of the sample is concerned, in the absence of any other information, I’d say 17 is more than adequate – I can tell you that your sample is at least 33% larger than the largest sample I’ve ever been allowed when the issues of time and money have been the drivers. If you are concerned about “just” having 17 samples please remember that the whole point of a t-test is to use small samples to reach meaningful conclusions about whatever it is that you are measuring.

    From time to time someone will offer up some “rule of thumb” with respect to minimum sample sizes for an “acceptable” t-test. These “rules of thumb” numbers seem to sort of have a lower limit of 15 and sort of trail off around either 30 or 45. They are without merit. Check the t-table of any statistics book and you will see that they start with n=2 and nowhere on any of those tables will you find caveats or warnings concerning 2 or any other sample size number less than 15.

    #190131

    Andell
    Participant

    I totally agree with Robert, as usual.

    I would add another tool to your analysis: the control chart. Hypothesis tests were never intended to determine whether or not the data come from a “stable” process. If special cause variation is present, the visual display can contain more useful process information than a hypothesis test would. Putting it another way: special cause variation can utterly change whether or not a null hypothesis is rejected.

    As a rule of thumb, I try to avoid running hypothesis tests (or distribution identification procedures for that matter) until I have a reasonably stable process. It isn’t always feasible to insist, but I definitely prefer to do so.
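A minimal sketch of the kind of stability check described above: an individuals (I) chart, with limits built from the average moving range. The data list is a hypothetical placeholder, not the poster's data:

```python
from statistics import mean

# Hypothetical pilot lead times (days); substitute real observations in time order.
data = [33, 30, 36, 31, 35, 34, 32, 29, 37, 33]

# Moving ranges between consecutive points.
mr = [abs(b - a) for a, b in zip(data, data[1:])]

center = mean(data)
ucl = center + 2.66 * mean(mr)   # 2.66 = 3 / d2 for subgroups of size 2
lcl = center - 2.66 * mean(mr)

# A point outside the limits signals special cause variation.
stable = all(lcl <= x <= ucl for x in data)
print(f"CL={center:.1f}  UCL={ucl:.1f}  LCL={lcl:.1f}  stable={stable}")
```

Plotting the points against these limits (and checking run rules) is the usual next step; the calculation alone only catches points beyond the limits.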

    #190132

    Toby
    Member

    Delenn-

    Thank you for your input. For this improvement, however, we were focused on reducing lead times to 30 days and not reducing variation. As part of my conclusion and recommendations for future improvement, I have recommended that lead time variation be evaluated. The factors affecting variation were noted during the study, but were not a part of the improvement efforts for this study.

    Thanks again for your insight!

    #190134

    Andell
    Participant

    I have been told that cycle time is one metric where mean and variation are correlated. My experience bears this out. In other words, if you reduce one, the other is sure to come down, too. That’s not a bad thing.

    I just wish I knew whether this correlation has been proven to be “real.”

    #190135

    Toby
    Member

    Thank you Robert and Andell!

    Robert, your summary of the hypothesis and project are exactly right. Thanks for your insight and this confirms my understanding of t-tests in general.

    Andell, your comments regarding the control chart are well taken. I will keep this in mind during future improvements, particularly as we look to reduce lead time variation moving forward.

    -Toby

    #190155

    cxg174
    Participant

    Just wondering if a one-tailed t test would be more appropriate? Are we only concerned if the mean is greater than 30 in a statistically significant way?

    #190156

    Robert Butler
    Participant

    The statement in the original post was “The hypothesis was that improving the flow of materials would reduce lead times from a mean of 90 days to 30 days.” A one-tailed test is appropriate for situations where direction matters. Nowhere in the original post were words which would imply a focus on direction (i.e. phrases such as “greater than 30 days” or “less than 30 days”). Rather the issue, as implied in the original post, is that of equivalence – i.e. either greater than or less than 30 days but not significantly different from 30 days. The two tailed test is the test to use in such instances.
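As a side note, with the poster's own numbers the choice happens to be moot: even the one-tailed test (Ha: mean > 30) fails to reject at alpha = 0.05, since the one-tailed critical value for df = 16 is about 1.746. A quick sketch:

```python
from math import sqrt

# Same summary statistics as the original post.
t = (33.3 - 30.0) / (8.0 / sqrt(17))   # 1.701, as computed in the thread

t_crit_one_tailed = 1.746              # from a t-table: one-tailed, alpha = 0.05, df = 16

print(f"t = {t:.3f}")                                               # t = 1.701
print("reject H0" if t > t_crit_one_tailed else "fail to reject H0")  # fail to reject H0
```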

    #190157

    Cone
    Participant

    Robert,

    Don’t you think it would be dumb to reject the null if the average had actually dropped to 15 days?

    #190159

    Robert Butler
    Participant

    Not at all. The stated objective was to test to see if the changes had resulted in an actual shift in process time from 90 to 30 days. If the changes had resulted in 15 days then all the rejection is telling you is that you aren’t anywhere near 30 days.

    If a time shorter than 30 days is indeed better, then if I were writing the final report I’d note that not only had we reduced the time from 90 days, but that the changes had given us results that were even better than what we had initially thought might be possible.

    #190160

    Cone
    Participant

    Statistical precision triumphs over common sense!!

    The benefit would be shown in process capability or $. The real objective would have been “reduce lead time to 30 days or less,” which is definitely one-tailed. Your results are discussed with a statistically unsophisticated crowd; the last thing we need is confusion due to wordsmanship. Our language is confusing enough as it is.

    We have a hypothesis which we reject if we do not get what we want. So simple.

    #190162

    Robert Butler
    Participant

    I think it is more a matter of common sense triumphing over unasked-for targets.

    The original post said “The hypothesis was that improving the flow of materials would reduce lead times from a mean of 90 days to 30 days.” Note the lack of emphasis on direction. The original post also noted they had achieved 33 days. For the data provided, the two-tailed test demonstrated that 33 days was not significantly different from 30. Therefore, mission accomplished.

    If we ignore his stated objective and decide we know better and run a one tailed test where our hypothesis is that the new process needs to be less than or equal to 30 days the test will fail and we will wind up telling him his efforts failed and that he needs to do more work. In other words we have a hypothesis which we reject if we don’t get what we want. Simple, yes. Unfortunately, what we decided he needed isn’t what he wanted.

    If indeed he got 15 days instead of 33 and the proposed target was 30, I could imagine someone asking about process variation and wanting to know if the analysis supported the claim that 15 was real and perhaps repeatable. To that end, I’d report the average and the 95% confidence interval around 15. If there were any residual questions about the difference between 15 and 30, one glance at the CI would answer them by showing whether 30 fell outside those limits. No one, in my experience, would ask anything more.
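A minimal sketch of that confidence interval. The mean of 15 is the hypothetical result under discussion; s = 8.0 and n = 17 are simply reused from the original post, which may not hold for a different outcome:

```python
from math import sqrt

xbar = 15.0              # hypothetical new sample mean
s, n = 8.0, 17           # assumed: same spread and sample size as the original post
t_crit = 2.120           # from a t-table: two-tailed, alpha = 0.05, df = 16

half = t_crit * s / sqrt(n)
lo, hi = xbar - half, xbar + half

print(f"95% CI: ({lo:.1f}, {hi:.1f})")                        # 95% CI: (10.9, 19.1)
print("30 outside CI" if not (lo <= 30 <= hi) else "30 inside CI")  # 30 outside CI
```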

    #190164

    Cone
    Participant

    So Robert, the real question here is what was not told. There had to have been some assumptions about how close to 30 and why. It’s the basis for the sample size and also the basis for knowing whether achieving 33 is close enough to declare victory.

    You want to be rigid about what was stated, but neither of us has a clue whether the accomplishment is good. I have had more cases where the need was truly to achieve a certain number or less – to meet a demand increase, to bring costs in line to keep a facility open, …. In fact, with time, I have never seen a case where a greater reduction was not good. If one-tailed was right, it has cost implications.

    A better lesson than rigidity would be to get the questioner to complete their question.

    #190165

    Robert Butler
    Participant

    I suppose we could view the problem as that of what was not told, however, when I read a post such as the one that started this thread I’m willing to assume that what was asked was indeed what was wanted and that the stated objective is the result of careful thought.

    Consider what was said:

    ” The hypothesis was that improving the flow of materials would reduce lead times from a mean of 90 days to 30 days.”

    “A pilot was run and the sample data observed showed a mean of 33.3 days based on 17 observations.”

    “This was all that time and money would allow”

    While not everything has been spelled out in mind-numbing detail, it has been my experience that statements such as these strongly imply the investigator has done his/her homework with respect to problem definition, examination of prior data, and all of the other things one normally does in the phases of DMAIC.

    Under these circumstances I will provide the best answer I can to address the question as asked since the presentation leaves me with the impression that the individual knows what they want.

    This, of course, still leaves the question you asked – what happens if his efforts gave an average of 15 instead of 33?

    If we assume that the choice of 30 days is as inferred above – the end result of a careful weighing of the evidence – then in my experience the following is also true:

    1. 30 days is the hoped for target.
    2. The process variables we have chosen are the end result of careful problem definition.
    3. We think changing these variables will result in a reduction in time.
    4. We hope that the result will be 30 days but we recognize this may not happen.

    So, we run the experiments and we get 15 days!

    When things like this have happened to me (and they have on numerous occasions) the sequence of events goes something like this:

    1. Surprise and amazement – followed by a flurry of activity by everyone involved.

    a. I rush back to my cube and run preliminary plots of the data and do some rudimentary analysis to make sure that the results are representative and are not just the result of some questionable outlier.

    b. The engineers start e-mailing me asking about my graphs and analysis and, if I say the initial results look fine, they, in turn, start holding meetings to identify the quickest way to make the necessary changes.

    c. The managers are e-mailing everyone wanting to know when they can go to the big guy and announce the improvement.

    d. If the changes are something that can be implemented at the line level – second trick will have already told third trick about the 15 days and the second and third trick foremen will have called me for confirmation. If I answer in the affirmative they will pull their people into a meeting and tell everyone what changes they want to make NOW.

    e. Assuming the 15 days was real, there will be a meeting the following day and

    1. I’ll present my graphs and my analysis and I’ll comment on the certainty of the 15 days.

    2. The engineers will have their long term plans drawn up and initial steps will already have been taken.

    3. First trick foreman will show up with the results of the preliminary changes made the night before – if all went well the changes will have already become a part of line practice.

    4. The managers will take my graphs, a sampling of slides from various power point presentations and that afternoon they will have a meeting with the big guy to announce their latest success.

    At no time during all of that effort will anyone bother to ask me to run a one- or two-tailed t-test to examine significant differences between 30 days and 15 days, and at no time during that effort will I bother to do so. As I mentioned previously, I’ll check the 15-day result very carefully. If everything is fine I’ll report confidence bounds around 15, and if it isn’t, then the result of 15 will be of no interest and we will look at other things.

    All of the above, of course, assumes that shorter is indeed better. This might not be the case. A time of 15 days might be great for my part of the process but what about the rest of the system?

    Maybe taking advantage of 15 days on my part of the system would require extensive/expensive overhaul of other parts of the process that would not be cost effective.

    Maybe there simply isn’t any demand/need for that kind of time.

    One real-life example for the second point: we built a deicing fluid for aircraft. The spec called for the fluid to be effective on the wing for some period of time (I’ve forgotten the exact number). We found that we could double and even triple that time by making a few additional changes to the composition, with no real change in cost. No one was interested: at that time, regulations were such that if a plane was on the ground past the specified time, it would not be allowed to fly without first returning to the gate for checks and fuel.

    #190168

    Mikel
    Member

    Wow Robert, you’ve got way too much time on your hands.

    Most time reduction efforts I’ve been involved with exceed their target and people just take the money and run.

    The 30 or less is implied; you are making way more of this than it is.

