iSixSigma

correlation and regression


Viewing 11 posts - 1 through 11 (of 11 total)
  • #24120

    jody
    Participant

    Hello
    Could someone please explain how it is possible to have a very strong correlation between two variables but no regression at all?

    #57760

    Thothathiri
    Member

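    If the correlation value is +1, there is a perfect positive linear relationship; if the value is -1, there is a perfect negative linear relationship. A correlation of 0 means there is no linear relationship: the points are scattered randomly, or the relationship is not linear.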
     
    It’s important to note that correlation does not establish a cause-and-effect relationship.
    Regression is used to estimate the equation that relates the two variables; like correlation, it does not prove cause and effect by itself.
     
    To illustrate with an example:
    Ice cream sales are positively correlated with shark attacks on swimmers.
    It would be a mistake to assume that ice cream sales cause sharks to attack swimmers.

    The explanation for this correlation is that during warm weather both ice cream sales and the number of swimmers increase.

    Hope this clarifies it.
    Thothathiri
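The ice cream and shark attack example can be reproduced with a small simulation. This is only a sketch on synthetic data: the temperature series, the coefficients and the sine/cosine "noise" terms are assumptions chosen for illustration, not real sales or attack figures. Both series are driven by temperature, so they correlate strongly with each other, but the correlation largely disappears once the temperature effect is removed from both.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def residuals(xs, ys):
    """Residuals of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

# Synthetic data (assumption): 60 days, temperature drives both series.
# The sin/cos terms are deterministic stand-ins for unrelated noise.
days = range(60)
temperature = [15 + 0.25 * d for d in days]
ice_cream = [10 * t + 8 * math.sin(1.7 * d) for d, t in zip(days, temperature)]
shark_attacks = [0.2 * t + 0.5 * math.cos(2.9 * d) for d, t in zip(days, temperature)]

# Raw correlation between the two series.
r_raw = pearson_r(ice_cream, shark_attacks)

# Correlation of what is left after removing the temperature effect from both.
r_partial = pearson_r(residuals(temperature, ice_cream),
                      residuals(temperature, shark_attacks))

print(f"raw r = {r_raw:.2f}, r after removing temperature = {r_partial:.2f}")
```

The second number is a partial correlation: it asks whether ice cream sales and shark attacks still move together on days with the same temperature.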

    #57761

    hitesh
    Participant

    Which tool are you using to test the regression?
    A pointer: remember that correlation does not necessarily mean linear correlation. It may be a non-linear relationship as well.
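A related point, sketched here with made-up numbers: the Pearson coefficient only measures the linear part of a relationship. In the hypothetical data below, y is completely determined by x (y = x squared), yet the Pearson r is zero because the relationship is symmetric around x = 0 and has no linear trend.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfect but non-linear (quadratic) relationship.
x = list(range(-5, 6))
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(f"Pearson r for y = x^2 on a symmetric range: {r:.3f}")
```

A scatterplot would reveal the curve immediately, which is why plotting the data before trusting r is a good habit.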
     

    #57762

    jody
    Participant

    Dear Thothathiri,
    Thank you very much for your explanation. It was clear and simple.
    bye

    #57763

    jody
    Participant

    I’m using the regression test in Minitab.
    Now I know that my variables are correlated but that there is no linear cause-and-effect relationship between them. I must find some other significant variables in my process.
    Thanks
     
     

    #57768

    Remi
    Participant

    Hi jody,
    How do you know they are correlated? If from the data, then the same data should also give you a good Regression relation (this is mathematically true). If from expertise, then find out why the data do not correspond with what you know ‘has to be true’. Be aware that the way you collect data can kill any theoretical relation.
    Check the data against your expertise by making a PICTURE (scatterplots)! Remove all data that is ‘wrong’, but only if you know why! Can you trust the data to be correct (Gage R&R, …)? Check that the data cover the whole range of possible values (too much zooming in or out, or taking a specific subsample, can give totally different conclusions).
    If the relation is not linear, then you have to think about how to redefine your parameters (a Box-Cox transformation, maybe). If there are 2 X’s, use a 3D scatterplot; if >2 X’s, it gets complicated.
    Remi
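The remark about zooming in can be illustrated with a short sketch. The data are synthetic (an assumption, not process data): y rises with x, plus a deterministic sine term standing in for measurement noise so the result is reproducible. Over the full range the linear trend dominates; in a narrow window of x the same noise swamps the trend and the correlation nearly vanishes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic data (assumption): linear trend plus deterministic pseudo-noise.
x_full = list(range(100))
y_full = [x + 20 * math.sin(1.7 * x) for x in x_full]

# The same data, "zoomed in" to a narrow window of x values.
window = [(x, y) for x, y in zip(x_full, y_full) if 45 <= x <= 54]
x_sub = [x for x, _ in window]
y_sub = [y for _, y in window]

r_full = pearson_r(x_full, y_full)
r_sub = pearson_r(x_sub, y_sub)

print(f"r over the full range:   {r_full:.2f}")
print(f"r in the window 45..54:  {r_sub:.2f}")
```

The underlying relation is identical in both cases; only the range covered by the sample changes.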
     

    #57769

    jody
    Participant

    Hello Remi,
    I performed a correlation test with Minitab; that’s why I think I have a correlation. According to the p-value, the Pearson coefficient and the scatterplot, it is a positive linear correlation.
    Unfortunately there’s no regression: the two variables are just correlated, but no cause-and-effect relationship exists.
    Jody
     

    #57770

    Remi
    Participant

    Hmm, Jody;
    If you mean that the Correlation disagrees with the Regression analysis, I think you must have made a mistake somewhere.
    If one has 2 columns of data in Minitab and performs Correlation (Stat => BasStat => Corr) and Regression (Stat => Regr => Regr), one will get the same p-value (because essentially the same H0 analysis is done). Also, the square of the Correlation value (r*r) will be equal to R-sq from the Regression.
    It is possible that you know from expertise that there is no (known physical) reason for a cause-and-effect relation while the Correlation analysis shows otherwise, but that is something ELSE. Apparently the data show a relation anyway. Now you ‘only’ have to find out why (sometimes it’s your data collection plan; sometimes it happened because of special purposes; and sometimes it only looks that way, often because your sample is too small). And sometimes you get a new insight: the theoretical expertise was wrong in this situation (most often because one of the assumptions was wrong).
    How many data points do you have (2 data points always show a strong correlation)? What are the Minitab outputs?
    Remi
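The two claims above (r*r equals R-sq, and both analyses test the same hypothesis) can be checked numerically for a simple regression with one X. The sketch below uses a small hypothetical data set and plain Python rather than Minitab. It compares the t statistic of the regression slope with the t statistic of the correlation test, r * sqrt((n - 2) / (1 - r*r)); since both have n - 2 degrees of freedom, equal t values imply equal p-values.

```python
import math

def simple_regression(xs, ys):
    """Least-squares fit ys = b0 + b1 * xs.

    Returns Pearson r, R-sq, the slope t statistic and the correlation t statistic.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

    r = sxy / math.sqrt(sxx * syy)            # Pearson correlation
    b1 = sxy / sxx                            # regression slope
    b0 = my - b1 * mx                         # regression intercept
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    r_sq = 1 - sse / syy                      # R-sq as a regression reports it
    se_b1 = math.sqrt(sse / (n - 2) / sxx)    # standard error of the slope
    t_slope = b1 / se_b1                      # t statistic for H0: slope = 0
    t_corr = r * math.sqrt((n - 2) / (1 - r * r))  # t statistic for H0: rho = 0
    return r, r_sq, t_slope, t_corr

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 1.9, 3.6, 3.2, 5.1, 4.4, 6.3, 5.8]

r, r_sq, t_slope, t_corr = simple_regression(x, y)
print(f"r = {r:.4f}, r*r = {r * r:.4f}, R-sq = {r_sq:.4f}")
print(f"t (slope) = {t_slope:.4f}, t (correlation) = {t_corr:.4f}")
```

This identity holds only for a regression with a single predictor; with several X’s, R-sq no longer equals the square of any one pairwise correlation.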

    #57771

    jody
    Participant

    Dear Remi
    Here are the results shown in the Session window of Minitab.
    I examined only two parameters: viscosity of the compound and scrap %, where scrap is the Y and viscosity is the X. I measured these two variables on 62 samples.
    Thanks for your comments.
     
    Correlations: viscosity; scrap
    Pearson correlation of viscosity and scrap = 0.367
    P-Value = 0.005

    Regression Analysis: scrap versus viscosity
    The regression equation is
    scrap = - 0.949 + 0.0164 viscosity
    Predictor       Coef   SE Coef      T      P
    Constant     -0.9492    0.3379  -2.81  0.007
    viscosity   0.016381  0.005540   2.96  0.005

    S = 0.100141   R-Sq = 13.5%   R-Sq(adj) = 12.0%

    Analysis of Variance
    Source          DF       SS       MS     F      P
    Regression       1  0.08769  0.08769  8.74  0.005
    Residual Error  56  0.56158  0.01003
    Total           57  0.64927

    Unusual Observations
    Obs  viscosity   scrap     Fit  SE Fit  Residual  St Resid
     18       63.4  0.7000  0.0894  0.0189    0.6106     6.21R
    R denotes an observation with a large standardized residual.

    #57772

    Remi
    Participant

    Hi Jody,
    Ah, now things are clearer.
    The number of data points appears to be large enough. In the scatterplot you should see a point (#18) far away from the regression line. The analysis may improve when this point is removed from the data, but this may only be done if you can find out why this point behaves differently from the other points.
    Both the Correlation and the Regression analysis say that there is a statistically significant relation between Y and X (p-value = 0.005).
    But r = 0.367 only, so the relation is not strong. In industry, -0.5 < r < 0.5 is often considered useless. The Regression agrees, because it says R-sq = 13.5%. So this X is NOT THE ONLY X that influences the Y (assumption: the X influences the Y at all). This X explains only 13.5% of the variation in Y, and 86.5% is still missing.
    Next: check whether R-sq goes up when not using all 62 points (screen the data visually: do they form several subgroups that should be analysed separately?). Find out what other X could possibly be missing (this is why DMAIC6 is thinking about X’s first and analysing patterns later).
    If you have such an X: check whether you have measurements for it and perform the Regression again with the new X. If you’re unlucky, the X you started with shows up in a Pareto as #1 and you have a lot of other X’s with even smaller R-sq. Note: if you have >1 X, analyse them all together, because interactions can play a role.
    So you did nothing wrong (as far as I can tell); you just happen to have an X with a weak influence on Y.
    Good luck, Remi
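The figures in the posted Minitab output can be cross-checked with a few lines of arithmetic. The numbers below are copied from that output; the variable names are mine. One thing the check brings out: Total DF = 57 means the regression used 58 observations, fewer than the 62 samples mentioned. Why is not stated in the thread (rows with missing values would be one possible explanation, but that is a guess).

```python
# Figures copied from the Minitab session output in the thread.
r = 0.367                                  # Pearson correlation
intercept, coef = -0.9492, 0.016381        # Constant and viscosity coefficients
se_coef = 0.005540                         # SE Coef for viscosity
ss_regression, ss_error, ss_total = 0.08769, 0.56158, 0.64927
df_regression, df_error, df_total = 1, 56, 57

r_sq_from_r = r ** 2                       # square of the correlation
r_sq_from_ss = ss_regression / ss_total    # share of variation explained
t_slope = coef / se_coef                   # T for viscosity
f_stat = (ss_regression / df_regression) / (ss_error / df_error)  # F = MSR / MSE
n_used = df_total + 1                      # observations used in the fit

# Observation 18: viscosity 63.4, observed scrap 0.7000.
fit_obs18 = intercept + coef * 63.4
residual_obs18 = 0.7000 - fit_obs18

print(f"r*r = {r_sq_from_r:.3f}, SS ratio = {r_sq_from_ss:.3f}  (reported R-Sq: 13.5%)")
print(f"T = {t_slope:.2f} (reported 2.96), F = {f_stat:.2f} (reported 8.74)")
print(f"observations used = {n_used}")
print(f"obs 18: fit = {fit_obs18:.4f}, residual = {residual_obs18:.4f}")
```

Everything agrees with the output to rounding, which supports the reading above: the correlation and the regression tell the same story, a statistically significant but weak relation.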
     

    #57773

    jody
    Participant

    Dear Remi
    Thanks for your time and your explanation. I’ll treasure it.
    Have a nice evening
    Jody


The forum ‘Europe’ is closed to new topics and replies.