P-Values and Their Meaning
This topic contains 2 replies, has 2 voices, and was last updated by Jim Frost 1 week, 6 days ago.
So I was asked about P values and really needed to apply myself to understand this topic in detail. I have put together a learning module, but I’m not sure I’m describing it correctly, and I was hoping to get some input from this community on ways to improve or correct it. What I’m after is any new way you could share to make this topic simpler. Please see below for what I put together and reply with corrections or additions to make it more useful. If you find it useful, please use it to help someone else out! Here’s the essay:
What are the numbers being reported by Minitab and what do they mean to me?
A common output of a statistical test is called a P value. A P value is also known as the “calculated probability.” But the calculated probability of what, you ask?
It is the probability that you will find the observed or more extreme results when the null hypothesis of a test is true. We will go into greater detail on this later.
What’s a null hypothesis? It is a statement of how something is expected to operate normally without any intervention or change. The null hypothesis is written as Ho:. The effect whose influence you are trying to determine is described by the alternative hypothesis, written as Ha:. The alternative hypothesis is the statement that describes the change you expect to see by controlling one or several variables during a test. Any change from the null being true or the alternative being true will be described numerically by the P value. Next, we will look at what different reported P values mean for your hypothesis test.
In a hypothesis test, we are testing to see if the variable(s) we chose in our experiment have an effect on results being reported from that test. We are asking: “Does a change in variable A cause our process to behave differently than how we expect it to perform normally (normally meaning without a change to variable A in this example)?” Our P value generated by the results of the test with the change to variable A will tell us if a change has occurred based on this change to variable A.
For example, if our P value reported from the data in our test to see the effect of a change to variable A comes back as a low value (low means a P value less than or equal to 0.05), then we know that our change to variable A had a significant impact on the values being reported – we can say that the change to A made a difference.
A way to remember if the change is significant is through this saying: If P is low, reject the null. If confusion ever sets in, I find that reminding myself of this saying helps to straighten things out for me. Remember that the threshold for “being low” is usually set to 0.05 or less.
Now, if the P value reported from your test data was returned as a high value (i.e., above 0.05), then we would accept the null hypothesis as the truth. That is, we can say that the alternative hypothesis was not true and that our change to our variable(s) did not have an effect on our reported results. The P value being reported will tell us “how true” the null hypothesis is.
So a final way to describe the P value reported from a hypothesis test is the amount (usually converted to a percentage) that our variable setting in our hypothesis test did not cause a difference in the reported values.
Let’s take this ‘did not’ cause a difference perspective a bit further. If we had a P value reported at 0.813, we can say that we have an 81.3% probability that our variable settings did not cause a difference in the reported values. Conversely, if we have a P value reported as 0.026, we can say that we have a 2.6% probability that our variable settings did not cause a difference in the reported values compared to the original values. Stated another way, this low score for our null hypothesis allows us to say that since there is only a 2.6% chance our null hypothesis is true, there is a 97.4% chance that the alternate hypothesis is true.
Normally, a confidence level of 95% is used to make the decision about the null hypothesis. This 95% confidence is stated as a P value of 0.05 and is used as the threshold for significance. Anything reported as a P value < 0.05 allows us to reject the null and accept the alternate hypothesis as true instead. Stated another way, we have low confidence that our variable settings did not cause a difference in how the system performed. That last sentence is confusing. A better way to say it would be that we have high confidence that our settings did indeed cause a change in the system performance. Remember, when P is low, reject the null.
To bring this all together, we can say two things about what P values give us.
1. A P value will allow us to accept or reject the null hypothesis using the 0.05 threshold. Make this determination first.
2. A P value tells us how likely it is that our results are different from what is normally expected to occur – or – that we are some percentage sure that the variable(s) did not cause a difference in reported values.
Let’s look at some final examples.
Say we have a chemical process that produces a liquid that is used to cool cutting tools. The experimenter wants to know if making a change to Component A of the cutting tool fluid mixture will allow for increased performance of the fluid.
We would state our null hypothesis (Ho) and alternate hypothesis (Ha) as such:
Ho: No change in performance of fluid
Ha: Change in performance of fluid
So we would use the current fluid mixture before the test to set a baseline for how the fluid performs. We would then produce a new mixture with Component A changed to what we believe our better performing level would be for that component, and then use the liquid in that manufacturing process to gather data on how it performs.
If our returned P value was 0.045 for the data using the new Component A level, we would conclude that:
1. Since P is low (below 0.05), we can reject the null and state that changing Component A does have a significant effect on the fluid’s performance, and,
2. that there is only a 4.5% chance that the increase to the fluid’s performance was due to random sampling error – or – that we are 95.5% confident (100% – 4.5%) that changing Component A has an effect on the fluid’s performance.
@bigsby – your discussion tends to focus on cause/effect sorts of statistical tests, but p-values are not limited to those. Checking for normality is one example: an Anderson-Darling test involves no cause/effect relationship at all. The p-value is the probability of obtaining a test statistic that is at least as extreme as the calculated value if the null hypothesis is true. For the normality test, the hypotheses are H0: data follow a normal distribution vs. Ha: data do not follow a normal distribution. Typically, we’d be looking for a p-value less than the threshold value, but in the case of examining for normality, the null hypothesis is that the data are normal, and only if the p-value is below our threshold (typically 0.05 or 0.10) do we conclude that they are not.
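To make the "at least as extreme, if the null hypothesis is true" definition concrete, here is a minimal sketch with an exact calculation. The coin-flip setup (60 heads in 100 flips, H0: the coin is fair) and the function name are assumptions chosen for illustration; they are not from the discussion above and have nothing to do with Minitab's output.

```python
from math import comb


def two_sided_binomial_p(heads: int, flips: int) -> float:
    """Exact two-sided p-value for H0: the coin is fair (P(heads) = 0.5)."""
    expected = flips / 2
    observed_distance = abs(heads - expected)
    # Count every outcome that is at least as far from the expected
    # number of heads as the outcome we actually observed.
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_distance
    )
    # Under H0 each of the 2**flips sequences is equally likely.
    return extreme_outcomes / 2 ** flips


# Hypothetical data: 60 heads in 100 flips.
p_value = two_sided_binomial_p(heads=60, flips=100)
print(f"p-value for 60 heads in 100 flips: {p_value:.4f}")
```

The result is a statement about the data given the null (how often a fair coin would land this far from 50 heads), not a statement about how likely the null itself is.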
Yes, it’s a convoluted subject that is hard to explain in a clear manner. The problem is that p-values don’t tell us what we really want to know, so they lend themselves to misinterpretation.
One correction to what you’ve written: you never accept the null hypothesis. You only fail to reject it. If you have a small sample size, very noisy data, or a small effect, your p-value can be high. However, this doesn’t indicate that the null is true. It means only that your sample doesn’t provide sufficient evidence to reject the null.
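A small simulation can show why a high p-value is not evidence that the null is true. This is a rough sketch under stated assumptions: normally distributed data with a known standard deviation of 1, a real shift of 0.3 standard deviations between the groups (so the null is false in every run), and a two-sample z-test. The function names, sample sizes, and shift are all hypothetical choices for illustration.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample_a, sample_b, sigma=1.0):
    """Two-sided p-value for H0: equal means, assuming a known common sigma."""
    standard_error = sigma * (1 / len(sample_a) + 1 / len(sample_b)) ** 0.5
    z = (fmean(sample_b) - fmean(sample_a)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def share_not_rejected(n_per_group, true_shift, trials=1000, alpha=0.05, seed=1):
    """Fraction of simulated experiments with p > alpha even though H0 is false."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        baseline = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        changed = [rng.gauss(true_shift, 1.0) for _ in range(n_per_group)]
        if z_test_p_value(baseline, changed) > alpha:
            misses += 1
    return misses / trials


# Same real effect in both cases; only the sample size differs.
small = share_not_rejected(n_per_group=10, true_shift=0.3)
large = share_not_rejected(n_per_group=200, true_shift=0.3)
print(f"n=10 per group:  {small:.0%} of runs fail to reject")
print(f"n=200 per group: {large:.0%} of runs fail to reject")
```

With the small samples, most runs return p > 0.05 although the effect is real in every one of them, which is why "fail to reject" is the safer wording.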
Finally, your description of the “did not cause” interpretation is incorrect. This misinterpretation produces the illusion of substantially more evidence against the null than is warranted. So, it’s a big problem.
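One way to see the gap is to simulate many experiments where we know which ones have a true null, and then look only at those that returned a p-value near 0.045. The sketch below rests on assumptions that are mine, not part of this thread: half of all experiments have a true null, the real effects are 0.5 standard deviations, the data are normal with known sigma of 1, and each group has 10 observations. Different assumptions give different numbers, so treat the output as illustrative only.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample_a, sample_b, sigma=1.0):
    """Two-sided p-value for H0: equal means, assuming a known common sigma."""
    standard_error = sigma * (1 / len(sample_a) + 1 / len(sample_b)) ** 0.5
    z = (fmean(sample_b) - fmean(sample_a)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(7)
N_PER_GROUP = 10   # assumed sample size per group
TRUE_SHIFT = 0.5   # assumed effect size when the null is false

in_band = 0            # experiments with 0.04 <= p <= 0.05
in_band_null_true = 0  # ...of which the null was actually true

for _ in range(40_000):
    null_is_true = rng.random() < 0.5  # assumed 50/50 mix of true and false nulls
    shift = 0.0 if null_is_true else TRUE_SHIFT
    baseline = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
    changed = [rng.gauss(shift, 1.0) for _ in range(N_PER_GROUP)]
    p = z_test_p_value(baseline, changed)
    if 0.04 <= p <= 0.05:
        in_band += 1
        if null_is_true:
            in_band_null_true += 1

share_null_true = in_band_null_true / in_band
print(f"Experiments with p between 0.04 and 0.05: {in_band}")
print(f"Share of those where the null was true: {share_null_true:.0%}")
```

Under these assumptions the share of true nulls among results with p near 0.045 comes out far above 4.5%, which is the sense in which the "did not cause" reading overstates the evidence against the null.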
© Copyright iSixSigma 2000-2017. Any reproduction or other use of content without the express written consent of iSixSigma is prohibited.