Rejected! The Ugly Truth About Hypothesis Testing

You’ve got data. You’ve got a hypothesis. You’ve got Minitab… but you don’t have statistical significance at a p-value equal to or less than 0.05. You must have missed the critical Xs in the Define phase, right? It’s time to go figure out what variables you missed and collect more expensive, time-consuming and team-irritating data, right?

Wrong!

Hold on just a second. What are you trying to do here? Chances are you have made some sort of hypothesis about whether something takes longer, costs more, or has more defects than something else. You are hoping to reject the null hypothesis that says the variable of interest has no effect. If only you could do that, you would have found a critical X and could then take steps to improve your process.

Alas, you did not have a significant finding… Is that because there really is not a relationship between your variables? Or is it because you rushed your project? It can be tempting to hurry things along to get to the Improve phase, but that can backfire on you. Not taking the time to consider the foundations of hypothesis testing creates rework for you and your team.

Hypothesis testing is the art of determining whether one group of things is different from another. The null hypothesis always states that the variables under study do not make a difference, and the burden is on the data to show otherwise. That means rejecting the null hypothesis is asserting that your variables are associated with a real difference in your outcome measure. This is probably the point of your Six Sigma project.

Rejecting the Null Hypothesis Is About More Than Just the P-value

Most people use a p-value equal to or less than 0.05 as the criterion for rejecting the null. However, the ability to correctly reject your null hypothesis is about more than just your p-value. Traditional but arbitrary, the 0.05 line in the sand means accepting a five percent risk of rejecting the null hypothesis when it is actually true, that is, of asserting a difference that does not exist. That could mean sending an innocent person to jail, firing a line worker for a problem he did not cause, or spending scarce company resources to fix a problem that did not exist. This little problem is called a Type I error, and most people live in fear of it. Therefore, strict adherence to the 0.05 rule has been commonly accepted among most practitioners and consumers of research. However, this practice does not guarantee you will not draw the wrong conclusion. Remember that even when no real difference exists, there is still a five percent chance your test will tell you there is one.

To make matters even worse, strictly insisting on a p-value equal to or less than 0.05 might make you miss your critical X even though you had it right there in front of you. This could mean that the guilty person goes free, your cycle times are still too long, or your defect rate remains too high despite all of your DMAIC (Define, Measure, Analyze, Improve, Control) efforts. This other little problem, failing to detect a difference that really exists, is called a Type II error and is often ignored to the detriment of quality projects everywhere. If you cannot find your X, you cannot control it.
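Both error types can be seen in a small simulation. This is a minimal sketch under illustrative assumptions: normally distributed data with a known standard deviation of 1, and a two-sided two-sample z-test at alpha = 0.05 (a simplification of the t-test you would usually run in Minitab). The group size of 20 and the half-standard-deviation shift are made-up values, and the function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean

# Illustrative assumptions: known common SD and a two-sided two-sample z-test.
SIGMA = 1.0
ALPHA = 0.05
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96


def rejects_null(n: int, true_shift: float, rng: random.Random) -> bool:
    """Simulate one study with n units per group; report whether H0 is rejected."""
    a = [rng.gauss(0.0, SIGMA) for _ in range(n)]
    b = [rng.gauss(true_shift, SIGMA) for _ in range(n)]
    std_error = SIGMA * (2 / n) ** 0.5
    z = (fmean(b) - fmean(a)) / std_error
    return abs(z) > Z_CRIT


def rejection_rate(n: int, true_shift: float, trials: int = 10_000, seed: int = 1) -> float:
    """Fraction of simulated studies that reject H0."""
    rng = random.Random(seed)
    return sum(rejects_null(n, true_shift, rng) for _ in range(trials)) / trials


# No real difference: every rejection is a Type I error.
type_i_rate = rejection_rate(n=20, true_shift=0.0)
# A real shift of half a standard deviation: every non-rejection is a Type II error.
type_ii_rate = 1 - rejection_rate(n=20, true_shift=0.5)

print(f"Type I error rate:  {type_i_rate:.3f}")
print(f"Type II error rate: {type_ii_rate:.3f}")
```

With these settings the Type I rate lands close to the chosen 0.05, while the Type II rate is roughly 0.65: a real half-standard-deviation difference is missed about two times in three.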

Getting Out of Type I and II Tough Spots

Both types of errors put the Six Sigma practitioner in a tough spot. You do not want to waste money by fixing unbroken systems, and you do not want to waste money by endlessly chasing the foxfire of your process’ problems. Both are expensive and nonproductive outcomes. But you can sail safely between the rock and the hard place. Always keep one eye out for making a Type I error while paying obsessive attention to minimizing the risk of a Type II error. After all, once you have found something, you can then decide if you are truly convinced by the evidence at hand.

A quick, almost cheating, way to dodge a Type II error and find a “significant” result is to increase your alpha (the risk of a Type I error, that is, of rejecting a null hypothesis that is actually true) to something a little higher than 0.05. However, you should proceed with extreme caution because doing this is risky business, and you should have other evidence that you are making a rational decision. Listen to your team and check the literature to find out what similar projects have found. Then think twice. After all, you are increasing your chances of making exactly the kind of false assertion the 0.05 rule was meant to guard against.
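A worked version of this trade-off, assuming a two-sided two-sample z-test with known common standard deviation σ, true difference δ, and n units per group, using the usual normal approximation for power (1 − β). The numbers δ/σ = 0.5 and n = 20 are illustrative only:

```latex
\begin{aligned}
&\text{Reject } H_0 \text{ when } |Z| > z_{1-\alpha/2},
  \qquad z_{0.975} \approx 1.960 \ (\alpha = 0.05),
  \quad z_{0.95} \approx 1.645 \ (\alpha = 0.10) \\
&1 - \beta \approx \Phi\!\left(\frac{\delta}{\sigma}\sqrt{\frac{n}{2}} - z_{1-\alpha/2}\right) \\
&\frac{\delta}{\sigma} = 0.5,\ n = 20:\quad
  \Phi(1.581 - 1.960) = \Phi(-0.379) \approx 0.35
  \quad\text{vs.}\quad
  \Phi(1.581 - 1.645) = \Phi(-0.064) \approx 0.47
\end{aligned}
```

Relaxing alpha from 0.05 to 0.10 raises the chance of catching this shift from about 35% to about 47%, but it also doubles the risk of a Type I error from 5% to 10%.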

Luckily, there are safer ways to proceed. Increasing your power is the best way to avoid those sneaky Type II errors. Statistical power is the probability of detecting a difference that truly exists, which lets you reject your null hypothesis when you should; it equals one minus the Type II error rate. You can calculate the power of your study, but more practically, power rises or falls with the way you collect and analyze your data. Listed below are some tricks for increasing your power to discover which Xs are troubling your process.
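If you want to see how power responds to your study design, here is a minimal sketch of the calculation. It assumes normally distributed data with a known common standard deviation and a two-sided two-sample z-test; `power_two_sample` is a hypothetical helper written for this example, not a Minitab or library function.

```python
from statistics import NormalDist


def power_two_sample(delta: float, sigma: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample z-test.

    delta: true difference in means you hope to detect
    sigma: common standard deviation of the outcome
    n_per_group: number of observations in each group
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = (delta / sigma) * (n_per_group / 2) ** 0.5
    # Probability the test statistic falls beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


for n in (10, 20, 40, 80):
    print(n, round(power_two_sample(delta=0.5, sigma=1.0, n_per_group=n), 2))
```

For a half-standard-deviation difference, power climbs from roughly 0.2 with 10 units per group to close to 0.9 with 80, which is why the tips that follow focus on how you collect data.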

Tricks for Increasing Your Power

Sample Size – Before you collect the data, determine what sample size you will need to detect the difference you are looking for, then do your best to get all of it. Having an appropriate sample size is the best way to increase your power as it helps you pull your signal out of the noise. No matter how well you execute your study, you will always have noise.
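Here is a minimal sketch of the up-front sample size calculation, assuming a two-sided two-sample comparison of means with a known common standard deviation and the standard normal-approximation formula n = 2(z₁₋α/₂ + z₁₋β)²σ²/δ² per group. The function name is hypothetical; tools that use the t distribution, such as Minitab's power and sample size routines, will typically report a slightly larger number.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Units needed in each group to detect a true difference `delta` (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Detecting half a standard deviation vs. a full standard deviation.
print(sample_size_per_group(delta=0.5, sigma=1.0))
print(sample_size_per_group(delta=1.0, sigma=1.0))
```

Halving the difference you want to detect roughly quadruples the sample you need (63 vs. 16 per group here), so it pays to decide on the smallest difference that matters before collecting data.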

Continuous Versus Attribute Data – When appropriate, design your data collection to give you continuous data. A continuous measurement carries more information than a pass/fail or category label, and that extra information is what helps you detect differences during hypothesis testing. You can always recode continuous data into attribute data if needed, but you can’t go the other way if you didn’t start with continuous data. Another plus: Continuous data usually requires a smaller sample size for statistical analysis. Some projects will only allow you to obtain attribute data because of the nature of your variables, and that is perfectly fine. The most important thing is to gather the data that will best answer your questions.
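To make the sample size advantage concrete, here is a minimal sketch comparing the same underlying improvement measured two ways. Assumptions, all for illustration: a normally distributed outcome, a true shift of half a standard deviation, alpha = 0.05 (two-sided), 80% power, and a hypothetical recoding that labels each unit "high" if it lands above the original process mean. Both formulas are normal approximations, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal
# Same illustrative targets for both designs: two-sided alpha = 0.05, power = 0.80.
Z_TERM = (Z.inv_cdf(1 - 0.05 / 2) + Z.inv_cdf(0.80)) ** 2


def n_continuous(shift_in_sd: float) -> int:
    """Per-group n to detect a mean shift (in standard deviations) with continuous data."""
    return math.ceil(2 * Z_TERM / shift_in_sd ** 2)


def n_attribute(p1: float, p2: float) -> int:
    """Per-group n to detect a change in a proportion (simple normal approximation)."""
    return math.ceil(Z_TERM * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2)


shift = 0.5
p_before = 1 - Z.cdf(0.0)    # share above the cut-off before the shift: 0.5
p_after = 1 - Z.cdf(-shift)  # share above the cut-off after a +0.5 SD shift: about 0.69

print("continuous:", n_continuous(shift))
print("attribute: ", n_attribute(p_before, p_after))
```

Under these assumptions the continuous design needs 63 units per group and the recoded pass/fail version needs 100, roughly 60% more data for the same real-world change.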

Measurement System Analysis (MSA) – Always pay attention to gauge error. If there is too much variation in the system you use to gather data, you will not be able to discern what is causing variation in your process. It is a little like Where’s Waldo – you know he is there but you cannot find him easily because he is hidden in the midst of many other distracting pictures on the page. Doing everything you can to reduce variation in your measurements will go a long way in helping you avoid a Type II error.
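Gauge error matters because independent sources of variation add together in variance, so measurement noise inflates the spread your test has to see through. Here is a minimal sketch, assuming gauge error is independent of the process and expressed as a standard deviation, and relying on the fact that the required sample size in the usual normal-approximation formula scales with variance. The function names are hypothetical.

```python
import math


def observed_sd(process_sd: float, gauge_sd: float) -> float:
    """Spread you actually observe: variances add, so combine the SDs in quadrature."""
    return math.hypot(process_sd, gauge_sd)


def sample_size_inflation(process_sd: float, gauge_sd: float) -> float:
    """Factor by which gauge error multiplies the sample you need (n scales with variance)."""
    return (observed_sd(process_sd, gauge_sd) / process_sd) ** 2


for gauge_sd in (0.0, 0.3, 0.5, 1.0):
    print(gauge_sd, round(sample_size_inflation(1.0, gauge_sd), 2))
```

A gauge whose standard deviation is half the process standard deviation means you need about 25% more data to keep the same power; a gauge as noisy as the process doubles it.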

Statistical Tests – Choose the appropriate analysis for your situation and remember that one size does not fit all. If you violate the assumptions of these tests, you are dulling your ability to detect differences between your variables. Make sure you get what you think you are asking for.
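One common illustration of matching the test to the data is a before/after study on the same units. Treating the two sets of measurements as independent groups ignores the pairing, and large unit-to-unit differences then drown out the change you care about. This is a minimal sketch on simulated, hypothetical data (50 units, unit-to-unit SD of 10, measurement noise SD of 1, true improvement of 2), computing only the t statistics with the standard library.

```python
import random
from statistics import fmean, stdev

rng = random.Random(7)
n = 50

# Hypothetical before/after study on the same 50 units: units differ a lot from
# one another (SD 10), measurement noise is small (SD 1), true improvement is 2.
unit_level = [rng.gauss(0, 10) for _ in range(n)]
before = [u + rng.gauss(0, 1) for u in unit_level]
after = [u + 2 + rng.gauss(0, 1) for u in unit_level]


def two_sample_t(a: list[float], b: list[float]) -> float:
    """Unequal-variance two-sample t statistic; treats the groups as independent."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return (fmean(b) - fmean(a)) / se


def paired_t(a: list[float], b: list[float]) -> float:
    """Paired t statistic: analyzes the per-unit differences."""
    diffs = [y - x for x, y in zip(a, b)]
    return fmean(diffs) / (stdev(diffs) / len(diffs) ** 0.5)


t_unpaired = two_sample_t(before, after)
t_paired = paired_t(before, after)
# Compare each with roughly 2, the two-sided 5% critical value at these sample sizes.
print(f"two-sample t: {t_unpaired:.2f}")
print(f"paired t:     {t_paired:.2f}")
```

With data like these the unpaired statistic typically sits near 1, well short of significance, while the paired statistic typically lands near 10, because the paired analysis removes the unit-to-unit variation before testing.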

Conclusion: Finding the Right Answer Sooner

By increasing the power of your statistical analyses, and thereby reducing the risk of Type II errors, you can find the right answer sooner and with less rework. Real life is not perfect, and most of the time you will not be able to get great measurements or exactly the amount of data you need, but getting as close as you can will make all the difference. X only marks the critical spot if you are using the right map and are willing to stop and ask for directions.

About the Author

Amy Diane Short