Correlation between attribute x and continuous y
 This topic has 10 replies, 8 voices, and was last updated 13 years, 2 months ago by Ken Feldman.


March 18, 2009 at 9:17 pm #52058
I am trying to see if there is a correlation between attribute x data and continuous y data. I only have a sample size of 34. I initially used the one way ANOVA tool. What is the best tool to use to find out if there is a correlation between my data sets?
March 18, 2009 at 10:06 pm #182502
As a guideline, for continuous y data and attribute x data, use the F test to check for equal variances, then use one of the following to test for differences of centre/mean:
a) If your data are normally distributed, use one-way ANOVA for two or more samples with one factor.
b) If your data are non-normal, use Mood's median test to check whether your factor has a correlation or not.
March 19, 2009 at 12:14 am #182506
Ken Feldman, Participant (@Darth)
GG, you may have answered a bit too quickly. The poster said they wanted to test for correlation between x and y. Your suggestions allow for exploring a statistical difference between y values for differing x categories. The poster will have to confirm whether they are looking for difference or correlation. They are different concepts with different tools.
March 19, 2009 at 11:19 am #182510
The null hypothesis for one-way ANOVA is that all MEANS are equal. (The alternative is that at least one mean is different.)
I hope everyone realizes this. Equal variances is a condition that must hold for the F ratio to be valid (and it is also tested with an F ratio). A typical approach is the Bartlett test.
I set up a quick JMP table and set the X as nominal and then as ordinal. (Both fall under the noncontinuous category.) In each case, JMP used one-way ANOVA for the analysis. If your X were continuous, regression would be the obvious choice!
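To make the "all means are equal" hypothesis concrete, here is a minimal sketch of the one-way ANOVA F ratio computed by hand in Python. The function name `one_way_anova_f` and the data are made up for illustration; a statistics package would normally also supply the p-value from the F distribution.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples,
    one sample per category of the attribute X."""
    k = len(groups)                      # number of X categories
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = mean([v for g in groups for v in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical Y values for three categories of X.
samples = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_ratio, df_b, df_w = one_way_anova_f(samples)
print(f_ratio, df_b, df_w)  # F is approximately 13 on (2, 6) degrees of freedom
```

A large F says the group means differ by more than the within-group scatter would explain. It says nothing by itself about the strength of association.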
March 19, 2009 at 12:22 pm #182512
Edwin D. Huff, Participant (@EdwinD.Huff)
Another technique, depending on the number of coded attribute categories (ideally collapsed into two: 1 = yes, 0 = no), could be logistic regression, where the binary attribute is treated as the dependent variable and regressed on the continuous variable as the predictor, to show likely predictive associations (odds ratios) between the continuous variable and the attribute category.
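A rough sketch of that idea in Python, assuming a two-level attribute coded 0/1 and a single continuous predictor. Note that this reverses the roles in the original question: the attribute becomes the response. The function `fit_logistic`, the Newton-Raphson fitting loop, and the data are illustrative assumptions, not taken from any post in this thread.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.
    x is the continuous variable, y the attribute coded 0 or 1."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # entries of the information matrix X'WX
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X'WX) * delta = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: the classes overlap, so the fit converges.
measurement = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
attribute = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(measurement, attribute)
odds_ratio = math.exp(b1)  # multiplicative change in odds per unit of x
print(b0, b1, odds_ratio)
```

If the two classes were perfectly separated by the measurement, the maximum-likelihood estimate would not exist and this loop would not settle, so the sketch is only suitable for overlapping data.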
March 19, 2009 at 12:50 pm #182513
Hi Mike,
The best tool depends on what you (really) want (whywhywhywhywhy).
Start with a graph (dotplot or boxplot, with grouping). This will show you, for each value of the categorical X, what the Y values look like.
ANOVA will not tell you about correlation but about difference in mu: can we prove that (at least) two of those plots are located differently from each other (low p-value = yes, high p-value = no)? The outcome of ANOVA is ‘only’ valid if the residuals are normally distributed and the sigmas are equal (enough): look at the residuals and perform an equal-variance test. If either of the two is not true, replace ANOVA with Kruskal-Wallis (test on medians).
Remi
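For the Kruskal-Wallis fallback, the test statistic only needs the ranks of the pooled Y values. Below is a minimal Python sketch of the H statistic. The function name and data are hypothetical, and the usual tie correction is deliberately left out to keep it short.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (without tie correction) for a list of
    samples, one sample per category of X."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # Give every distinct value the average of the ranks it occupies,
    # so tied observations share one rank.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    # Sum over groups of (rank sum)^2 / group size.
    term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * term - 3.0 * (n + 1)

# Hypothetical Y values for three categories of X.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h)  # approximately 7.2
```

H is compared against a chi-square distribution with (number of groups - 1) degrees of freedom. With small groups that is only an approximation.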
March 19, 2009 at 1:35 pm #182518
MBBinWI, Participant (@MBBinWI)
Technically you cannot perform a correlation between a discrete X and continuous Y. That said, I am inferring that you are really looking to see if by changing X, Y also changes. This is a different question. How many levels of X do you have? If two, use a 2-sample t-test; if more, use ANOVA. These will help you evaluate whether X is affecting Y with some statistical level of confidence.
Whether 34 is a sufficient sample size will depend on the difference you are looking to observe, the underlying variation of the system, and the level of risk you are willing to accept (alpha and beta). Since you posted nothing about any of these parameters, it is impossible to advise on sample size.
March 19, 2009 at 2:01 pm #182520
Robert Butler, Participant (@rbutler)
The statement “Technically you cannot perform a correlation between a discrete X and continuous Y” is in error.
If the X is count data, standard methods apply.
If the X is binary, or ordinal or nominal with only two categories, you code the values -1 and 1 and run the regression in the usual manner.
If the X is ordinal or nominal and there are more than two categories, you build dummy variables for each X and run the regression on those.
pp. 241 of Draper and Smith, Applied Regression Analysis, 2nd edition, has the details.
March 19, 2009 at 4:13 pm #182524
Ken Feldman, Participant (@Darth)
As is common, everyone is spinning their wheels and offering brilliant advice before understanding what the poster is trying to do. That is why I suggested that he first clarify whether he is looking for differences or correlation. Then we can offer him the appropriate advice for his research question. Until then, we are all engaging in non-valued activity and only entertaining ourselves.
March 19, 2009 at 4:30 pm #182525
As stated in my original post, I am trying to determine correlation.
March 19, 2009 at 5:36 pm #182528
Ken Feldman, Participant (@Darth)
OK, then correlation and regression it is. Therefore, if your Xs are discrete and your Ys are continuous, you must enter the world of indicator variables. There are some pretty rigorous assumptions underlying the tool, so you might want to do a little research. The example I often use is answer time as the Y and type of channel as the X. Channel is a category of some characteristic, and time is of course continuous. We are therefore testing whether there is a correlation between type of channel and answer time. From that you can move to a prediction equation. Now you have correlation and regression rather than seeking a difference between channels. Does that answer your question?
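The channel example can be sketched in Python as follows. With a single categorical X, the least-squares fit on 0/1 indicator variables has a closed form: the intercept is the mean of the baseline level and each coefficient is that level's mean minus the baseline mean. The function `indicator_regression`, the channel names, and the answer times are all made up for illustration.

```python
from statistics import mean

def indicator_regression(categories, y):
    """Least-squares fit of y on 0/1 indicator variables for one categorical X.
    Returns (intercept, coefficients by level, R squared). The first level in
    sorted order is the baseline and gets no indicator of its own."""
    levels = sorted(set(categories))
    level_mean = {
        lv: mean([v for c, v in zip(categories, y) if c == lv]) for lv in levels
    }
    intercept = level_mean[levels[0]]
    coefficients = {lv: level_mean[lv] - intercept for lv in levels[1:]}
    # Fitted value for each observation is simply its level mean.
    fitted = [level_mean[c] for c in categories]
    y_bar = mean(y)
    ss_total = sum((v - y_bar) ** 2 for v in y)
    ss_residual = sum((v - f) ** 2 for v, f in zip(y, fitted))
    r_squared = 1.0 - ss_residual / ss_total
    return intercept, coefficients, r_squared

# Hypothetical answer times (minutes) by channel.
channel = ["chat"] * 3 + ["email"] * 3 + ["phone"] * 3
answer_time = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0, 2.0, 3.0, 4.0]
intercept, coefficients, r_squared = indicator_regression(channel, answer_time)
# Prediction equation: time = intercept + coef["email"] * is_email + coef["phone"] * is_phone
print(intercept, coefficients, r_squared, r_squared ** 0.5)
```

Here the square root of R squared is the multiple correlation between channel and answer time, and the intercept plus coefficients give the prediction equation. The regression assumptions mentioned above (such as normal residuals and equal variances) still need to be checked separately.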