iSixSigma

Correlation between attribute x and continuous y


Viewing 11 posts - 1 through 11 (of 11 total)
  • Author
    Posts
  • #52058

    howe
    Participant

    I am trying to see if there is a correlation between attribute x data and continuous y data.  I only have a sample size of 34.  I initially used the one way ANOVA tool.  What is the best tool to use to find out if there is a correlation between my data sets?

    0
    #182502

    gg1980
    Participant

    As a guideline, for continuous y data and attribute x data, first use an
    F test to check for equal variances, then use one of the following to test
    for differences of centre/mean:
    a) If your data are normally distributed, use one-way ANOVA
    for two or more samples with one factor.
    b) If your data are non-normal, use Mood's median test to check whether
    your factor has a correlation or not.

    0
    #182506

    Ken Feldman
    Participant

    GG, you may have answered a bit too quickly. The poster said they wanted to test for correlation between x and y. Your suggestions allow for exploring a statistical difference between y values for differing x categories. The poster will have to confirm whether they are looking for difference or correlation. They are different concepts with different tools.

    0
    #182510

    Craig
    Participant

    The null hypothesis for one way ANOVA is that all MEANS are equal. (The alternate is that at least one mean is different).
    I hope everyone realizes this. Equal variances is a condition that must hold for the F ratio to be valid (and it can itself be tested with an F ratio). A typical approach is the Bartlett test.
    I set up a quick JMP table and set the X as nominal and then set it as ordinal. (Both fall under the non-continuous category.) In each case, JMP used one-way ANOVA for the analysis. If your X was continuous, regression would be the obvious choice!
     

    0
    #182512

    Edwin D. Huff
    Participant

    Another technique, depending on the number of coded attribute categories, ideally collapsed into two (1 = yes, 0 = no), could be logistic regression, where the dependent attribute categories are regressed on the independent continuous variable to show likely predictive associations (odds ratios) between the continuous variable and the attribute category.

    0
    #182513

    Remi
    Participant

    Hi Mike,
    Best tool depends on what you (really) want (whywhywhywhywhy).
    Start with a graph (dotplot or boxplot, with grouping). This will give you, for each value of the categorical X, a plot of what the Y-values look like.
    ANOVA will not tell you about correlation but about a difference in Mu: can we show that (at least) two of those plots are located differently from each other (low p-value = yes, high p-value = no)? The outcome of ANOVA is ‘only’ valid if the residuals are normally distributed and the sigmas are equal (enough): look at the residuals and perform an equal-variance test. If either condition fails, replace ANOVA with Kruskal-Wallis (a test on medians).
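To make the "difference in Mu" idea concrete, here is a minimal sketch of the one-way ANOVA F ratio computed by hand. The data and the function name `one_way_anova_f` are made up for illustration. Turning F into a p-value needs the F distribution, which a statistics package would supply, so that step is left out.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F ratio, df_between, df_within) for a list of samples."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of the residuals around each group mean.
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical Y values for three levels of X (made-up numbers).
samples = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [8.0, 9.0, 10.0],
}
f_ratio, df_b, df_w = one_way_anova_f(list(samples.values()))
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")  # F(2, 6) = 12.00
```

A large F means the group means differ by more than the within-group scatter would explain. It says nothing yet about whether the normality and equal-variance conditions hold.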
    Remi
     
     

    0
    #182518

    MBBinWI
    Participant

    Technically you cannot perform a correlation between a discrete X and continuous Y.  That said, I am inferring that you are really looking to see whether Y changes when X changes.  This is a different question.  How many levels of X do you have?  If two, use a 2-sample t-test; if more, use ANOVA.  These will help you evaluate whether X is affecting Y with some statistical level of confidence.
    Whether 34 is a sufficient sample size will depend on the difference you are looking to observe, the underlying variation of the system, and the level of risk you are willing to accept (alpha and beta).  Since you post nothing about any of these parameters, it is impossible to advise on sample size.
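As a rough illustration of how those parameters interact, the sketch below uses the common normal-approximation formula for a two-sided, two-group comparison of means, n per group = 2((z_alpha/2 + z_beta) * sigma / delta)^2. The function name `n_per_group` and the input values are assumptions for illustration, not numbers from this thread. A t-based calculation would give slightly larger answers.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, beta=0.20):
    """Approximate sample size per group (normal approximation, two-sided test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the alpha risk
    z_beta = z.inv_cdf(1 - beta)        # critical value for the beta risk
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Detecting a shift of one sigma versus half a sigma (hypothetical inputs).
print(n_per_group(delta=1.0, sigma=1.0))  # 16
print(n_per_group(delta=0.5, sigma=1.0))  # 63
```

Under these assumptions, halving the difference you want to detect roughly quadruples the sample needed, which is why 34 may or may not be enough.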

    0
    #182520

    Robert Butler
    Participant

      The statement “Technically you cannot perform a correlation between a discrete X and continuous Y” is in error.
      If the X is count data, standard methods apply.
      If the X is binary, or ordinal or nominal with only two categories, you code the values -1 and 1 and run the regression in the usual manner.
      If the X is ordinal or nominal and there are more than two categories, you build dummy variables for each X and run the regression on those.
    p. 241 of Draper and Smith, Applied Regression Analysis, 2nd edition, has the details.
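A minimal sketch of the two-category case, using made-up data: the attribute is coded -1 and 1, then an ordinary least-squares line and the Pearson correlation are computed. The helper name `simple_regression` and the labels are hypothetical.

```python
from math import sqrt
from statistics import mean

def simple_regression(x, y):
    """Least-squares slope, intercept, and Pearson r for paired lists x and y."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar, sxy / sqrt(sxx * syy)

# Hypothetical data: a two-level attribute X and a continuous Y.
labels = ["no", "no", "no", "yes", "yes", "yes"]
y = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
x = [1 if label == "yes" else -1 for label in labels]  # code the attribute as -1 / 1

slope, intercept, r = simple_regression(x, y)
print(slope, intercept, round(r, 3))
```

With balanced -1/1 coding, the intercept is the average of the two group means and the slope is half the difference between them, so the fitted line passes through both group means.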

    0
    #182524

    Ken Feldman
    Participant

    As is common, everyone is spinning their wheels and offering brilliant advice before understanding what the poster is trying to do. That is why I suggested that he first clarify whether he is looking for differences or correlation. Then we can offer him the appropriate advice for his research question. Until then, we are all engaging in non-value-added activity and only entertaining ourselves.

    0
    #182525

    howe
    Participant

    As stated in my original post, I am trying to determine correlation. 

    0
    #182528

    Ken Feldman
    Participant

    OK, then correlation and regression it is. Therefore, if your Xs are discrete and your Ys are continuous you must enter the world of Indicator Variables. There are some pretty rigorous assumptions underlying the tool so you might want to do a little research. The example I often use is answer time as the Y and type of channel as the X. Channel being a category of some characteristic and of course time being continuous. We therefore are testing whether there is a correlation between type of channel and answer time. From that you can move to a prediction equation. Now you have correlation and regression rather than seeking difference between channels. Does that answer your question?
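The channel example can be sketched as follows, with made-up answer times. For a single categorical X, a least-squares fit on indicator variables reproduces the group means, so this sketch reads the coefficients off the means instead of solving the regression equations. The function name `indicator_fit`, the channel names, and the choice of "phone" as the reference level are all assumptions for illustration.

```python
from statistics import mean

def indicator_fit(levels, y, reference):
    """Fit Y on indicator variables for one categorical X; return (coefficients, R squared)."""
    groups = {}
    for level, value in zip(levels, y):
        groups.setdefault(level, []).append(value)
    means = {level: mean(values) for level, values in groups.items()}
    # Intercept is the reference-level mean; each indicator coefficient is an offset from it.
    coef = {"intercept": means[reference]}
    for level in means:
        if level != reference:
            coef[level] = means[level] - means[reference]
    grand = mean(y)
    ss_total = sum((v - grand) ** 2 for v in y)
    ss_resid = sum((v - means[lv]) ** 2 for lv, v in zip(levels, y))
    return coef, 1 - ss_resid / ss_total

# Hypothetical answer times (seconds) by channel; the numbers are made up.
channel = ["phone", "phone", "phone", "email", "email", "email", "chat", "chat", "chat"]
answer_time = [30.0, 32.0, 34.0, 50.0, 52.0, 54.0, 20.0, 22.0, 24.0]

coef, r_squared = indicator_fit(channel, answer_time, reference="phone")
print(coef)                 # prediction equation: intercept plus one offset per indicator
print(round(r_squared, 3))  # share of the variation in answer time explained by channel
```

The prediction equation here is answer time = intercept + (email offset if email) + (chat offset if chat). The square root of R squared plays the role of the multiple correlation between channel and answer time.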

    0
