Significance misleading?
 This topic has 17 replies, 7 voices, and was last updated 11 years, 9 months ago by Severino.


October 12, 2009 at 6:17 pm #52767
I have a question regarding whether it is accurate to say that data should not be interpreted simply because a result is deemed not significant. The paragraph below is copied from "Linear Regression: Making Sense of a Six Sigma Tool". I am using it as an example to illustrate my confusion.
"In this case, the p-value is 0.134. If alpha is set at 0.05, then one would have to reject this regression line as having a valid fit because the p-value is greater than 0.05. This means that the model is not significant. The R-Sq value, though looking quite good, is of no value and should not be interpreted. Those who did this regression will need to collect more data, redo the regression and then see whether the p-value is now significant before they interpret the R-Sq value."
My question relates to the statement "The R-Sq value, though looking quite good, is of no value and should not be interpreted." This seems inaccurate. That is, couldn't I conclude that there is a 13.4% chance that the regression line does not represent the data? Couldn't I conclude that it is significant if I arbitrarily selected alpha = 0.14? If so, I feel that the statement of whether something is significant is misleading. This is not a yes-or-no, go/no-go condition. The p-value simply shows you the risk of concluding there is a difference when there really isn't one.

Rather than drawing an arbitrary line in the sand (alpha risk) and ignoring the 13.4%, is it not accurate to conclude that there is an 86.6% chance that the regression line does explain the data? If so, there may be no need to collect more data as long as I am OK with a 13.4% chance of being wrong.
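(For concreteness, a minimal sketch, not from the thread, of where the two numbers under discussion come from, assuming Python with statsmodels; the data is made up purely for illustration.)

```python
# A minimal sketch of where a regression p-value and R-Sq come from.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=15)
y = 0.5 * x + rng.normal(0, 3.0, size=15)          # weak signal, small sample

model = sm.OLS(y, sm.add_constant(x)).fit()
print(f"slope p-value: {model.pvalues[1]:.3f}")    # compared against the chosen alpha
print(f"R-squared:     {model.rsquared:.3f}")      # the article: interpret only if significant
```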
Any input/clarification would be much appreciated.
Thank you.

October 12, 2009 at 6:35 pm #186039
You are correct. I personally set my p-values = .5
October 12, 2009 at 6:56 pm #186041
Jered Horn (@HornJM)
Please explain how you "set" your p-values?
October 12, 2009 at 7:19 pm #186042
I think Stan got a little bit excited. You are correct in that you don't set p-values; they are the calculated alpha risk determined by the data and the test. Possibly Stan meant to say that he sets his acceptable risk at .5 and then compares the actual risk (the p-value) against the risk he is willing to tolerate for a Type I error. I agree that the default of .05 we hear about is too strict and often prevents one from seeing something when it truly exists. Of course, good management of your beta risk and power needs to be considered as well.
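(A minimal sketch, not from the thread, of the distinction being drawn here: alpha is a rate you choose, and under a true null the p-values calculated from the data fall below it at exactly that rate. Python with scipy is assumed; the data is simulated.)

```python
# Under a true null, p-values are uniform on [0, 1], so the fraction falling
# below whatever alpha you set IS your Type I error rate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
rejections = 0
trials = 5000
for _ in range(trials):
    a = rng.normal(0.0, 1.0, 30)   # two samples drawn from the SAME population,
    b = rng.normal(0.0, 1.0, 30)   # so any "significant" result is a false alarm
    if stats.ttest_ind(a, b).pvalue < alpha:
        rejections += 1
print(rejections / trials)         # close to 0.05, the alpha that was set
```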
October 12, 2009 at 7:21 pm #186043
P-values are not set, they are calculated. I did not say I set them. Alpha values are set based on whatever level of risk of being wrong in your conclusions you can live with.

October 12, 2009 at 8:16 pm #186044
Easy, big guy… the references to "setting p-values" were directed towards Stan's comment.
October 13, 2009 at 2:22 am #186046
Severino (@Jsev607)
The nice part about statistics is that you generally do not need to rely on numbers alone. Why don't you go ahead and plot that bad boy and see what it looks like? The p-value isn't even worth the number of decimal places it occupies on its own.
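(In that spirit, a minimal sketch, not from the thread, of plotting the data with its fitted line, assuming Python with numpy and matplotlib; `x` and `y` stand in for whatever data was regressed.)

```python
# Eyeball the scatter and the fitted line before trusting any one statistic.
import numpy as np
import matplotlib.pyplot as plt

def plot_fit(x, y):
    slope, intercept = np.polyfit(x, y, deg=1)   # least-squares line
    xs = np.linspace(x.min(), x.max(), 100)
    plt.scatter(x, y, label="data")
    plt.plot(xs, slope * xs + intercept,
             label=f"fit: y = {slope:.2f}x + {intercept:.2f}")
    plt.legend()
    plt.show()
```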
October 13, 2009 at 11:17 am #186057
Thank you for your input. Can you expand on what you mean when you say beta risk and power need to be considered as well? Can you provide a reference/example that shows how they should be considered when assessing whether a regression model accurately explains the relationship between X and Y?

October 13, 2009 at 12:01 pm #186059
Hi Mike,
I think you can pick any alpha level you think you can live with, on the condition that you do it BEFORE you calculate your p. So you might say that, based on the weight of the problem, you are prepared to accept a maximal risk of 10% of being wrong; THEN you calculate the p-value, and you stay with H0 if p > 0.1. Doing it the other way round is called, IIRC, "data snooping" and is generally a quite dishonorable practice. It is equivalent to saying "my regression line is so nice that I will accept a 15% risk, just to be able to keep it :)))". Concerning the R-squared: if you decided to stay with H0 (that is, you say that you see no evidence of a connection between X and Y), it would be nonsensical to say that the strength of a connection you do not believe exists is, say, 83%.
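(A minimal sketch, not from the thread, of the fix-alpha-first rule described above; the alpha and p-value are the illustrative numbers from this discussion.)

```python
# Alpha is chosen from the problem context BEFORE the data is seen; the
# p-value is then calculated from the data and compared against that alpha.
ALPHA = 0.10   # fixed up front: "I accept at most a 10% risk of a false positive"

def decide(p_value: float, alpha: float = ALPHA) -> str:
    if p_value < alpha:
        return "reject H0: the fit is significant, go ahead and interpret R-Sq"
    return "stay with H0: no evidence of a connection, do not interpret R-Sq"

print(decide(0.134))   # the article's example stays with H0 even at alpha = 0.10
```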
October 13, 2009 at 12:43 pm #186061
Hello Sandor, I appreciate your input.
Your response exemplifies my confusion. Why does it matter when I decide what maximal risk I can live with? Does this change the value of the calculated alpha risk (the p-value)? I do not wish to do anything "dishonorable", but I also do not want to say a relationship does not exist at all just because I calculate a p-value that is higher than my subjective, predefined alpha level.

For example, say I plan to conduct a regression study. I say to myself, before collecting any data, that I can live with a 5% chance of drawing the wrong conclusion. I then run the experiment, collect the data and calculate the p-value, which comes out to 0.051. Why should I rerun the experiment or collect more data until the p-value gets below 0.05? The benefit of avoiding dishonorable data snooping does not seem to outweigh the added cost/time of getting new/additional data.

It seems to me that selecting the maximum amount of alpha risk we can live with is very subjective. The very fact that the standard alpha level is 0.05 implies the subjectivity of this risk. Why would the standard amount of risk we can live with be the same for everyone? As much as rejecting or accepting the null hypothesis makes it black and white, the actual risk (the p-value) is not 0 or 1; it varies between 0 and 1.
Again, I value your input, and further clarification would be appreciated. :)

October 13, 2009 at 1:05 pm #186062
Just my approach, for which I find nothing written:
When I start to process data, I first try to determine what the physical process is, because the fundamentals/physics behind that process should reveal what the "real"/accepted variables (x's) are. In those cases I am essentially banking my reputation as a process improvement person on the work of others who are more knowledgeable than I am; in essence I am taking the approach that I am very unlikely to be wrong about the regression equation. Now, if the p-values are > .05 or the R-squared value is not > .9 to .95, then I look for an additional variable in the data (like shift-to-shift differences, measurement accuracy, variables that are not well controlled, etc.) and improve on them.

When I can find little on the process fundamentals/physics and have to branch out on my own (quite often), I will accept p-values of up to around 0.2 and R-squared values of over 0.8. If the process involves a lot of people-determined x's, then I accept p-values of around 0.5 (in a lot of biomed and social services work they are doing quite well if they get R-squared > 0.5).

The R-squared and p-values I accept are largely determined by my guide: if I am wrong about the x's, then the predictive capability is very poor. Poor predictive values mean that I will lose face, i.e., the number of times I'm called upon to solve problems will drop. To advance the processes here I need to have a high batting average.
October 13, 2009 at 2:06 pm #186067
Hi Mike,
This is indeed a difficult question, and IMHO there are several distinct aspects. The idea of fixing the alpha level before the measurement is devised to avoid the "data snooping" fallacy. Ideally, one should be able to say what risk levels are acceptable in a project. So if the p-value is "much" above the alpha level, then the case should be clear, and changing the alpha level to achieve a significant result could rightly be frowned upon. :) On the other hand, I think that the p-value is in the end an estimate, so it will necessarily have a confidence interval. This means that the values we see are no better than, say, point estimates of a mean, and it also means that minute differences like 0.051 instead of 0.05 play no role at all. In the end, I think the clean way of addressing this problem would be to go ahead and reject the null if the p is close to the alpha, though bigger. (How close is "close" is a different question, though.) If the p-value is definitely greater than the alpha level (alpha being 0.05 and p = 0.07, for instance) AND it makes sense for the project, I would renegotiate the alpha level and take another sample, with the new alpha level again fixed in advance. The data snooping would be to use the same data set to renegotiate the alpha level AND to prove that at that alpha level the connection is significant. If you can do it with independent samples, it would be OK, IMHO.
Regards,
Sandor
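(A rough sketch, not from the thread and just one way to visualize the "a p-value is itself an estimate" point above: bootstrap-resampling the data and refitting shows how widely the calculated p-value swings from sample to sample. Python with numpy/statsmodels is assumed; the data is made up.)

```python
# How much does the calculated p-value bounce around under resampling?
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 20)
y = 0.4 * x + rng.normal(0, 3.0, 20)              # illustrative data

point_p = sm.OLS(y, sm.add_constant(x)).fit().pvalues[1]
boot_ps = []
for _ in range(1000):
    idx = rng.integers(0, len(x), len(x))         # resample (x, y) pairs with replacement
    fit = sm.OLS(y[idx], sm.add_constant(x[idx])).fit()
    boot_ps.append(fit.pvalues[1])
print(point_p, np.percentile(boot_ps, [2.5, 97.5]))  # a wide spread around the point p
```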
October 13, 2009 at 3:00 pm #186069
Mike, I agree; there is nothing wrong with waiting to decide what you can live with until after the p-value is calculated. There is nothing sacred about your acceptable level of alpha risk. This is a practical selection, not a statistical one. In reality, what's the real difference between accepting at 5% or 7% or 10%? Of course, if the p-value turns out to be .80 and in your mind you were looking for something around 10-15%, then changing your alpha to .80 would be silly.

As for beta and power: if I fail to reject the null, it can be for one of two reasons. Either there is no difference/change, or I don't have the power to see it. Before getting too wrapped up in a high p-value and failing to reject the null, you should double-check power, which is defined as 1 minus your beta risk. In reality, you should be selecting sample sizes with a sufficient and acceptable level of power before you run your tests. The solution to low power is a greater sample size.
October 13, 2009 at 3:23 pm #186071
Thank you, Darth. In Minitab, how do I determine an adequate sample size for a regression equation, assuming a given beta risk?

October 13, 2009 at 3:23 pm #186072
Hi Darth, I'm afraid that would be a mistake. It's like shooting first and drawing the target around the holes afterwards. The point is that the acceptable risk level must be determined by the project environment, the height of the stakes in the project, the psychology of the Belt, whatever, BUT not by the data set. There is nothing that speaks against fixing the risk level before the p calculation, as all the factors playing into it must be known beforehand. So the only reason for fixing the alpha after the p gets calculated can be to adapt the risk level to the measurement. That cannot be a healthy policy over the long term, IMHO.
Regards, Sandor
October 13, 2009 at 8:16 pm #186086
Unfortunately, Minitab doesn't do power and sample size calculations for regression or nonparametrics. I downloaded this program and it seems pretty good for doing all kinds of study calculations. Take a look: http://www.studysize.com/download.htm
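(For those without such a tool, a minimal sketch, not from the thread, of a power/sample-size calculation for the regression overall F-test, using Cohen's noncentral-F formulation in Python with scipy; the effect size below is an assumed value.)

```python
# Power of the regression overall F-test, following Cohen (1988):
# noncentrality lambda = f2 * (u + v + 1), where f2 = R^2 / (1 - R^2) is the
# effect size, u = number of predictors, v = n - u - 1 residual df.
from scipy import stats

f2 = 0.15        # assumed "medium" effect size (about R^2 = 0.13)
u = 1            # simple linear regression: one predictor
alpha = 0.05     # chosen Type I risk

for n in (10, 20, 40, 60, 80, 100):
    v = n - u - 1
    lam = f2 * (u + v + 1)                        # noncentrality parameter
    f_crit = stats.f.ppf(1 - alpha, u, v)         # rejection cutoff under H0
    power = 1 - stats.ncf.cdf(f_crit, u, v, lam)  # beta risk = 1 - power
    print(f"n = {n:3d}   power = {power:.3f}")    # pick n with power >= 0.8, say
```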
October 14, 2009 at 2:54 am #186091
Severino (@Jsev607)
The alpha risk is his.
The forum ‘General’ is closed to new topics and replies.