This topic contains 14 replies, has 5 voices, and was last updated by MBBinWI 1 month, 3 weeks ago.
So I was asked about P values and really needed to apply myself to understand this topic in detail. I have put together a learning module, but I’m not sure if I’m describing it correctly and was hoping to get input from this community on ways to improve or correct it. Any new ways you could share to help make this topic simpler is what I’m after. Please see below for what I put together and reply with corrections or additions to make it more useful. If you find it useful, please use it to help someone else out! Here’s the essay:
What are the numbers being reported by Minitab and what do they mean to me?
A common output of a statistical test is called a P value. A P value is also known as the “calculated probability.” But the calculated probability of what, you ask?
It is the probability that you will find the observed or more extreme results when the null hypothesis of a test is true. We will go into greater detail on this later.
What’s a null hypothesis? It is a statement of how something is expected to operate normally, without any intervention or change. The null hypothesis is written as Ho:. The statement that the effect you are studying does have an influence is known as the alternative hypothesis. It is written as Ha:. The alternative hypothesis, then, describes the change you expect to see by controlling one or several variables during a test. The degree to which the data are consistent with the null being true will be described numerically by the P value. We will look next at what different reported P values mean for your hypothesis test.
In a hypothesis test, we are testing to see if the variable(s) we chose in our experiment have an effect on results being reported from that test. We are asking: “Does a change in variable A cause our process to behave differently than how we expect it to perform normally (normally meaning without a change to variable A in this example)?” Our P value generated by the results of the test with the change to variable A will tell us if a change has occurred based on this change to variable A.
For example, if the P value computed from the data in our test of a change to variable A comes back low (meaning less than or equal to 0.05), then we conclude that the change to variable A had a statistically significant impact on the values being reported – we can say that the change to A made a difference.
A way to remember if the change is significant is through this saying: If P is low, reject the null. If confusion ever sets in, I find that reminding myself of this saying helps to straighten things out for me. Remember that the threshold for “being low” is usually set to 0.05 or less.
Now, if the P value computed from your test data comes back high (e.g. > 0.05), then we would accept the null hypothesis as the truth. That is, we can say that the alternative hypothesis was not supported and that our change to our variable(s) did not have a detectable effect on our reported results. The P value being reported will tell us “how true” the null hypothesis is.
So a final way to describe the P value reported from a hypothesis test is the amount (usually converted to a percentage) that our variable setting in our hypothesis test did not cause a difference in the reported values.
Let’s take this ‘did not’ cause a difference perspective a bit further. If we had a P value reported at 0.813, we can say that we have an 81.3% probability that our variable settings did not cause a difference in the reported values. Conversely, if we have a P value reported as 0.026, we can say that we have a 2.6% probability that our variable settings did not cause a difference in the reported values compared to the original values. Stated another way, this low score for our null hypothesis allows us to say that since there is only a 2.6% chance our null hypothesis is true, there is a 97.4% chance that the alternate hypothesis is true.
Normally, a confidence level of 95% is used to make the decision about the null hypothesis. This 95% confidence is stated as a P value of 0.05 and is used as the threshold for significance. Anything reported as a P value < 0.05 allows us to reject the null and accept the alternate hypothesis as true instead. Stated another way, we have low confidence that our variable settings did not cause a difference in how the system performed. That last sentence is confusing. A better way to say it would be that we have high confidence that our settings did indeed cause a change in the system performance. Remember, when P is low, reject the null.
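The decision rule described above can be sketched in a few lines of Python (the 0.05 threshold is the conventional one from the essay; the example p-values are illustrative, not from any particular Minitab output):

```python
# Sketch of the "if P is low, reject the null" decision rule.
# ALPHA = 0.05 corresponds to the conventional 95% confidence level.
ALPHA = 0.05

def decide(p_value, alpha=ALPHA):
    """Map a p-value to the hypothesis-test decision."""
    if p_value < alpha:
        return "reject the null hypothesis"
    # Statisticians say "fail to reject" rather than "accept" the null.
    return "fail to reject the null hypothesis"

print(decide(0.026))   # low p -> reject the null hypothesis
print(decide(0.813))   # high p -> fail to reject the null hypothesis
```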
To bring this all together, we can say two things about what P values give us.
1. A P value will allow us to accept or reject the null hypothesis using the 0.05 threshold. Make this determination first.
2. A P value tells us how likely it is that our results differ from what is normally expected to occur – or – that we are some percentage sure that the variable(s) did not cause a difference in reported values.
Let’s look at some final examples.
Say we have a chemical process that produces a liquid that is used to cool cutting tools. The experimenter wants to know if making a change to Component A of the cutting tool fluid mixture will allow for increased performance of the fluid.
We would state our null hypothesis (Ho) and alternate hypothesis (Ha) as such:
Ho: No change in performance of fluid
Ha: Change in performance of fluid
So we would use the current fluid mixture before the test to set a baseline for how the fluid performs. We would then produce a new mixture with Component A changed to what we believe our better performing level would be for that component, and then use the liquid in that manufacturing process to gather data on how it performs.
If our returned P value was 0.045 for the data using the new Component A level, we would conclude that:
1. Since P is low (below 0.05), we can reject the null and state that changing Component A does have a significant effect on the fluid’s performance, and,
2. that there is only a 4.5% chance that the increase in the fluid’s performance was due to random sampling error – or – that we are 95.5% confident (100% – 4.5%) that changing Component A has an effect on the fluid’s performance.
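As a sketch of how such a test could be run outside Minitab, here is a two-sample t-test in Python using scipy. The performance measurements are invented for illustration, so the resulting p-value will not be exactly 0.045:

```python
from scipy import stats

# Hypothetical performance scores: baseline fluid vs. new Component A level.
baseline = [101.2, 99.8, 100.5, 100.9, 99.5, 100.1]
new_mix = [102.3, 101.9, 103.0, 102.1, 101.5, 102.8]

# Two-sample t-test of Ho: no change in performance vs. Ha: change in performance.
t_stat, p_value = stats.ttest_ind(new_mix, baseline)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

if p_value < 0.05:
    print("Reject the null: Component A appears to affect performance.")
else:
    print("Fail to reject the null: no significant effect detected.")
```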
@bigsby – your discussion tends to focus on cause/effect sorts of statistics tests, but p-values are not limited to such. For example, checking for normality. There is no cause/effect relationship check with an Anderson-Darling check for example. The p-value is the probability of obtaining a test statistic that is at least as extreme as the calculated value if the null hypothesis is true. For the normality test, the hypotheses are, H0: data follow a normal distribution vs. Ha: data do not follow a normal distribution. Typically, we’d be looking for a p value less than the threshold value, but in the case of examining for normality, the null hypothesis is that the data is normal, and only if the p-value is below our threshold (typically 0.05 or 0.10) do we conclude that it is not.
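As an illustration of the normality-test case just described, here is a sketch using `scipy.stats.anderson` on synthetic data (the sample parameters are assumptions for the demo). Note that scipy reports the Anderson-Darling statistic with critical values at fixed significance levels rather than a single p-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=2.0, size=200)  # synthetic, truly normal data

# Anderson-Darling test of Ho: data follow a normal distribution.
result = stats.anderson(sample, dist='norm')

# scipy returns critical values paired with significance levels
# (15%, 10%, 5%, 2.5%, 1%); reject normality when the statistic
# exceeds the critical value at your chosen level.
crit_5pct = result.critical_values[list(result.significance_level).index(5.0)]
if result.statistic > crit_5pct:
    print("Reject normality at the 5% level.")
else:
    print("Fail to reject normality at the 5% level.")
```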
Yes, it’s a convoluted subject that is hard to explain in a clear manner. The problem is that p-values don’t tell us what we really want to know, so they lend themselves to misinterpretation.
One correction to what you’ve written. You never accept the null hypothesis. You only fail to reject it. If you have a small sample size, very noisy data, or a small effect, your p-value can be high. However, this doesn’t indicate that the null is true. Your sample simply doesn’t provide sufficient evidence to reject the null.
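This "fail to reject" point can be demonstrated with a quick simulation (a sketch under assumed parameters: a real half-standard-deviation effect, samples of only 5 each). Even though the null is false by construction, most of these underpowered tests return a high p-value:

```python
import random
from scipy import stats

random.seed(42)
N_REPS, N = 200, 5  # many tiny experiments
high_p_count = 0

for _ in range(N_REPS):
    a = [random.gauss(0.0, 1.0) for _ in range(N)]
    b = [random.gauss(0.5, 1.0) for _ in range(N)]  # a real effect exists
    _, p = stats.ttest_ind(a, b)
    if p >= 0.05:
        high_p_count += 1

share = high_p_count / N_REPS
# With such small samples the test is badly underpowered, so the large
# majority of runs fail to reject even though the null is false.
print(f"Failed to reject in {share:.0%} of runs despite a real effect")
```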
Finally, your description of the “did not cause” interpretation is incorrect. This misinterpretation produces the illusion of substantially more evidence against the null than is warranted. So, it’s a big problem.
Another observation is that the p-value threshold can be changed; for example, the aerospace, medical, and automotive industries sometimes use a threshold of 0.01.
And if you are doing multivariable regression, you can use a threshold of 0.1 or 0.15 to get a good analysis.
Fos,
Interesting comments, but I’m not sure they’re fully accurate. You’ll see plenty of medical journal studies where significance is found with p-values greater than 0.01.
@bigsby I am not sure where I am coming in on this subject with regard to other people posting. Possibly close to @MBBinWI.
If you interpret stats as cause and effect you are always at risk. Statistics cannot determine that for any certainty. Stats looks at numbers and relationships between numbers. I can generate a lot of sets of random numbers in Minitab and test them for normality, variance and central tendency. The response from any of those tests cannot reflect any cause and effect. What you are getting with a p value is the probability of a relationship. The fact of the relationship is something you need to figure out.
Just my opinion.
Thanks Mike. I think you just made it all come together for me. P values are the probability of a relationship existing, but whether or not that relationship really exists is what will need to be determined. Sound about right?
Cris, I saw a clinical laboratory analysis project where, in some situations, they used 0.01 to do more critical analyses and minimize the probability of error; car manufacturers use this for some critical parts and equipment.
The intention of my comment is to open minds to the fact that a p-value threshold of 0.05 (5%) is not always necessary; in some cases you can use other values. For checking normality, 0.05 is usual, but for other analyses you can use other values.
…and for those situations where you didn’t quite make it to < .05, should you find yourself in need of a phrase to describe what happened, we have the following choices (all of these quotes are from the published literature). :-)
428 approximately significant (p=0.053)
429 at the limits of significance (p=0.053)
430 at the very edge of significance (p=0.053)
431 barely missed the commonly acceptable significance level (p<0.053)
432 borderline level of statistical significance (p=0.053)
433 just above the margin of significance (p=0.053)
434 just lacked significance (p=0.053)
435 just shy of significance (p=0.053)
436 on the verge of significance (p=0.053)
437 slightly outside the statistical significance level (p=0.053)
438 (barely) not statistically significant (p=0.052)
439 a clear tendency to significance (p=0.052)
440 a marginal trend toward significance (p=0.052)
441 a possible trend toward significance (p=0.052)
442 approaching prognostic significance (p=0.052)
443 did not quite achieve the conventional levels of significance (p=0.052)
444 just skirting the boundary of significance (p=0.052)
445 narrowly avoided significance (p=0.052)
446 nearly borderline significance (p=0.052)
447 not exactly significant (p=0.052)
448 on the brink of significance (p=0.052)
449 a strong tendency towards statistical significance (p=0.051)
450 barely missed statistical significance (p=0.051)
451 borderline conventional significance (p=0.051)
452 effectively significant (p=0.051)
453 fell just short of the traditional definition of statistical significance (p=0.051)
454 just about significant (p=0.051)
455 medium level of significance (p=0.051)
456 narrowly missed significance (p=0.051)
457 nearing significance (p<0.051)
458 on the margin of significance (p=0.051)
459 only just failed to meet statistical significance (p=0.051)
etc.
@rbutler how long did it take to write that post? ;) nicely done.
To clarify to the original poster, no criticisms meant but statements that sound absolute can always use context.
@cseider – not long at all – just a copy/paste of part of a list of hundreds of these quotes. They are rank ordered in terms of the p-value starting with a p-value of .0999 and going all the way down to .05000. As you can see these are numbers 428-459. My favorite in this group is 456 it sounds like the results of whatever study it came from almost suffered some horrible fate.
@rbutler – very interesting. When I teach (taught, as I hope not to have to teach this again, being back in the practitioner phase) about the p-value, I always emphasize that statistics is shades of grey, but when selecting a p-value threshold, you are choosing between black and white. You must be willing to set a level of significance such that a p-value a shade under it means you reject the null, and a shade over it means you do not. You cannot equivocate. Probabilities are not absolute. But p-value decisions must be.
I find your compilation of those who are unable to discern this reality very funny.
@MBBinWI and @cseider – just to be clear – I didn’t do this. Some statistician with either a lot of time on his/her hands or perhaps one who got tired of reading phrases aimed at attempting to weasel out of the agreed upon significance level, compiled the list. It made its rounds in the stat community several years ago and it is a very long list.
Every now and then when I have some researcher whose initial report prose is headed in this direction I copy paste a section of the list pertaining to whatever p-value they are trying to explain away and send it along with a reminder that a choice was made with respect to significance before they started the work and they are not allowed to change their mind after the results have been compiled.
P.S. I too, find it funny.
@rbutler – still very funny. And I stand by my statement that selecting a p-value sets a line in the sand (one that is absolute, not like our previous president).
© Copyright iSixSigma 2000-2017.