correlation and regression
- This topic has 10 replies, 4 voices, and was last updated 11 years, 5 months ago by
jody.
- August 24, 2009 at 2:08 pm #24120
Hello
Could someone please explain how it is possible to have a very strong correlation between two variables but no regression at all?
- September 3, 2009 at 3:23 pm #57760
Thothathiri (Member, @Thothathiri)
If the correlation value is +1, there is a perfect positive linear relationship; if the value is -1, there is a perfect negative linear relationship. A correlation of 0 indicates no linear relationship: the data are either random or related in some non-linear way.
It's important to note that correlation does not establish a cause-and-effect relationship.
Regression is used to estimate the equation that relates the two variables.
To illustrate with an example:
Ice cream sales are positively correlated with shark attacks on swimmers.
It would be a mistake to assume that ice cream sales cause sharks to attack swimmers.
The explanation for this correlation is that during warm weather both ice cream sales and the number of swimmers increase.
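The common-cause explanation can be illustrated with a small simulation. This is only a sketch with invented numbers: the names `temperature`, `ice_cream_sales` and `swimmers`, and all coefficients, are assumptions made for illustration, with the number of swimmers standing in for exposure to shark attacks. Neither simulated series depends on the other, yet they come out strongly correlated because both follow temperature.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical daily temperatures in deg C (invented data).
temperature = [rng.uniform(10, 35) for _ in range(500)]

# Both quantities depend on temperature plus independent noise;
# neither one is computed from the other.
ice_cream_sales = [20 * t + rng.gauss(0, 50) for t in temperature]
swimmers = [8 * t + rng.gauss(0, 30) for t in temperature]

r = pearson_r(ice_cream_sales, swimmers)
print(f"r(ice cream sales, swimmers) = {r:.2f}")
```

The correlation is real, but it says nothing about one variable causing the other.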
Hope this clarifies it for you.
Thothathiri
- September 4, 2009 at 8:24 am #57761
Which tool are you using to test the regression?
Pointer: remember that correlation does not necessarily mean linear correlation; the relationship may be non-linear as well.
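A minimal sketch of that pointer, using invented data: here y is completely determined by x (y = x squared), but because the relationship is symmetric and non-linear, the Pearson coefficient, which only measures linear association, comes out as zero.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented example: x runs symmetrically around zero, y = x^2.
x = list(range(-5, 6))
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(f"Pearson r for y = x^2 on a symmetric range: {r:.3f}")
```

A scatterplot would show the parabola immediately, which is why a plot is worth making before trusting any single coefficient.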
- September 4, 2009 at 10:38 am #57762
Dear Thothathiri,
Thank you very much for your explanation. It was clear and simple.
Bye
- September 4, 2009 at 10:42 am #57763
I'm using the regression test in Minitab.
Now I know that my variables are correlated but there is no cause and effect linear relationship between them. I must find some other significant variables in my process.
Thanks
- September 22, 2009 at 11:53 am #57768
Hi jody,
How do you know they are correlated?
If from data, then the same data should give you a good regression relation (this is mathematically true).
If from expertise, then find out why the data do not correspond with what you know ‘has to be true’. Be aware that the way you collect data can kill any theoretical relation.
Check the data against your expertise by making a PICTURE (scatterplots)!
Remove all data that is ‘wrong’, but only if you know why it is wrong!
Can you trust the data to be correct (gage R&R, ...)?
Check that the data cover the whole range of possible values (zooming in or out too much, or taking a specific subsample, can give totally different conclusions).
If the relation is not linear, you have to think about how to redefine your parameters (Box-Cox, maybe).
If you have 2 X’s, use a 3D scatterplot; with more than 2 X’s it gets complicated.
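The point about covering the whole range of possible values can be sketched with a simulation. All numbers here are invented, and the names `r_full` and `r_narrow` are just illustrative: the same underlying linear relation gives a high correlation over the full range of X, and a much weaker one when only a narrow slice of X is sampled.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Invented process: Y depends linearly on X, plus measurement noise.
x = [rng.uniform(0, 100) for _ in range(2000)]
y = [0.5 * xi + rng.gauss(0, 5) for xi in x]

# Correlation over the whole range of X.
r_full = pearson_r(x, y)

# Correlation when only a narrow band of X values was collected.
narrow = [(xi, yi) for xi, yi in zip(x, y) if 45 <= xi <= 55]
r_narrow = pearson_r([p[0] for p in narrow], [p[1] for p in narrow])

print(f"full range:   r = {r_full:.2f} (n = {len(x)})")
print(f"narrow range: r = {r_narrow:.2f} (n = {len(narrow)})")
```

The relation has not changed; only the sampling window has, which is why the data collection plan deserves a check before the statistics do.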
Remi
- September 22, 2009 at 1:06 pm #57769
Hello Remi,
I performed a correlation test with Minitab; that's why I think I have a correlation. According to the p-value, the Pearson coefficient and the scatterplot, it is a positive linear correlation.
Unfortunately there's no regression: the two variables are just correlated, but no cause-and-effect relationship exists.
Jody
- September 22, 2009 at 1:37 pm #57770
Hmm, Jody,
If you mean that the correlation disagrees with the regression analysis, I think you must have made a mistake somewhere.
If one has 2 columns of data in Minitab and performs Correlation (Stat => Basic Statistics => Correlation) and Regression (Stat => Regression => Regression), one will get the same p-value (because essentially the same H0 is tested). Also, the square of the correlation value (r*r) will be equal to R-sq from the regression.
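Both claims can be checked numerically for simple (one X) regression. The sketch below uses a small invented data set and hand-written helpers (`pearson_r` and `simple_regression` are illustrative names, not Minitab functions). It shows that r*r equals R-sq and that the t statistic for the slope equals r*sqrt(n-2)/sqrt(1-r*r); since the two tests share the same t statistic and the same n-2 degrees of freedom, they must give the same p-value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simple_regression(xs, ys):
    """Least-squares fit y = b0 + b1*x; returns (b0, b1, r_squared, t_slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # residual SS
    sst = sum((y - mean_y) ** 2 for y in ys)                     # total SS
    r_squared = 1 - sse / sst
    s = math.sqrt(sse / (n - 2))      # residual standard deviation
    se_b1 = s / math.sqrt(sxx)        # standard error of the slope
    return b0, b1, r_squared, b1 / se_b1


# Small invented data set, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 3.2, 4.8, 4.1, 6.3, 5.9, 7.4]

r = pearson_r(x, y)
b0, b1, r_squared, t_slope = simple_regression(x, y)
t_from_r = r * math.sqrt(len(x) - 2) / math.sqrt(1 - r ** 2)

print(f"r*r = {r ** 2:.6f}, R-sq = {r_squared:.6f}")
print(f"t (slope) = {t_slope:.6f}, t (from r) = {t_from_r:.6f}")
```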
It is possible that you know from expertise that there is no (known physical) reason for a cause-and-effect relation and that the correlation analysis shows otherwise, but that is something ELSE. Apparently the data show a relation anyway. Now you ‘only’ have to find out why: sometimes it’s your data collection plan, sometimes it happened because of special causes, and sometimes it only looks that way (often because your sample is too small). And sometimes you get a new insight: the theoretical expertise was wrong in this situation (most often because one of the assumptions was wrong).
How many data points do you have (2 data points always show a perfect correlation)? What are the Minitab outputs?
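The remark about 2 data points is easy to verify: a straight line always passes exactly through two points, so r is +1 or -1 whatever the values are (as long as the two X values differ and the two Y values differ). A minimal sketch with arbitrary invented points:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Two arbitrary points each; the actual values do not matter.
r_down = pearson_r([1, 2], [5, 3])         # Y falls as X rises
r_up = pearson_r([10.0, 40.0], [0.2, 0.9])  # Y rises as X rises

print(f"two points, falling: r = {r_down:.3f}")
print(f"two points, rising:  r = {r_up:.3f}")
```

With so few points the coefficient carries no information, which is why the sample size matters before interpreting r.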
Remi
- September 22, 2009 at 2:03 pm #57771
Dear Remi,
Here are the results shown in the Session window of Minitab.
I examined only two parameters: viscosity of the compound and scrap %, where scrap is the Y and viscosity is the X. I collected these two variables on 62 samples.
Thanks for your comments.
Correlations: viscosity; scrap
Pearson correlation of viscosity and scrap = 0.367
P-Value = 0.005

Regression Analysis: scrap versus viscosity
The regression equation is
scrap = - 0.949 + 0.0164 viscosity

Predictor       Coef   SE Coef      T      P
Constant     -0.9492    0.3379  -2.81  0.007
viscosity   0.016381  0.005540   2.96  0.005

S = 0.100141   R-Sq = 13.5%   R-Sq(adj) = 12.0%

Analysis of Variance
Source          DF       SS       MS     F      P
Regression       1  0.08769  0.08769  8.74  0.005
Residual Error  56  0.56158  0.01003
Total           57  0.64927

Unusual Observations
Obs  viscosity   scrap     Fit  SE Fit  Residual  St Resid
 18       63.4  0.7000  0.0894  0.0189    0.6106     6.21R
R denotes an observation with a large standardized residual.
- September 22, 2009 at 2:22 pm #57772
Hi Jody,
Ah, now things are clearer.
The number of data points appears to be large enough. In the scatterplot you should see a point (#18) far away from the regression line. The analysis may improve when this point is removed from the data, but this may only be done if you can find out why this point behaves differently from the other points.
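As a rough check on observation 18, its fit and residual can be recomputed from the reported coefficients. This sketch assumes the Unusual Observations row reads viscosity = 63.4 and scrap = 0.7000. It uses the standard least-squares identities SE Fit = S*sqrt(h) and St Resid = residual / (S*sqrt(1 - h)), where h is the leverage of the point. The variable names are made up for illustration.

```python
import math

# Values copied from the Minitab session output.
b0 = -0.9492          # Constant
b1 = 0.016381         # viscosity coefficient
s = 0.100141          # S, residual standard deviation

# Observation 18 (column assignment is an assumption, see lead-in).
viscosity_18 = 63.4
scrap_18 = 0.7000
se_fit_18 = 0.0189

# Fitted value and raw residual from the regression equation.
fit = b0 + b1 * viscosity_18
residual = scrap_18 - fit

# Leverage recovered from SE Fit = S * sqrt(h), then the
# standardized residual = residual / (S * sqrt(1 - h)).
leverage = (se_fit_18 / s) ** 2
std_resid = residual / (s * math.sqrt(1 - leverage))

print(f"Fit = {fit:.4f}, Residual = {residual:.4f}, St Resid = {std_resid:.2f}")
```

The recomputed values agree with the reported row to rounding, so point 18 sits roughly six residual standard deviations above the line.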
Both the correlation and the regression analysis say that there is a statistically significant relation between Y and X (p-value = 0.005).
But r is only 0.367, so the relation is not strong. In industry, -0.5 < r < 0.5 is often considered useless. The regression agrees, because it says R-sq = 13.5%. So this X is NOT THE ONLY X that influences the Y (assuming the X influences the Y at all). This X explains only 13.5% of the variation in Y; 86.5% is still unexplained.
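The arithmetic behind these statements can be reproduced from the numbers in the session output. A minimal sketch follows. The variable names are illustrative, and small differences come from the rounding of the printed values.

```python
# Values copied from the Minitab session output.
r = 0.367                 # Pearson correlation
ss_regression = 0.08769   # Regression SS
ss_error = 0.56158        # Residual Error SS
ss_total = 0.64927        # Total SS
df_error = 56             # Residual Error DF

# R-sq is the share of the total variation explained by the regression.
r_sq_anova = ss_regression / ss_total

# For a single X, R-sq also equals the squared correlation.
r_sq_from_r = r ** 2

# Share of the variation in Y that this X does not explain.
unexplained = 1 - r_sq_anova

# F statistic: regression mean square over residual mean square (1 DF for X).
f_stat = (ss_regression / 1) / (ss_error / df_error)

print(f"R-sq from ANOVA = {r_sq_anova:.1%}")
print(f"r*r             = {r_sq_from_r:.1%}")
print(f"unexplained     = {unexplained:.1%}")
print(f"F               = {f_stat:.2f}")
```

So a small p-value and a low R-sq are not in conflict: the first says the slope is unlikely to be zero, the second says how little of the scrap variation viscosity accounts for.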
Next: check whether R-sq goes up when not using all 62 points (screen the data visually: do they form several subgroups that should be analysed separately?). Find out what other X could possibly be missing (this is why DMAIC6 thinks about X’s first and analyses patterns later).
Once you have such an X, check whether you have measurements for it and perform the regression again with the new X. If you’re unlucky, the X you started with shows up in a Pareto as #1 and you have a lot of other X’s with even smaller R-sq. Note: if you have more than one X, analyse them all together, because interactions can play a role.
So you did nothing wrong (as far as I can tell); you just happen to have an X with a weak influence on Y.
Good luck, Remi
- September 22, 2009 at 2:33 pm #57773
Dear Remi,
Thanks for your time and your explanation. I’ll treasure it.
Have a nice evening
Jody