Minitab correlation question
 This topic has 11 replies, 10 voices, and was last updated 14 years, 10 months ago by annon.


January 23, 2008 at 8:25 pm #49179
I have a data set where I would like to find out whether two of the columns of data are correlated. The issue I have is that one column is “time” from 0.00 to 2360.00 and the other is made up only of 1’s and 0’s. I think (please correct me if I am mistaken) that the typical correlation method in Minitab (Pearson) would need to find a linear relationship between the two to tell me whether or not there is a correlation, and I don’t think that a linear correlation is possible with only a 1 or 0 as the option for the second variable column.
So, what I want to find out is whether there is any correlation between time (lower/higher) and the resulting occurrence of 1’s or 0’s. I hope that makes a bit of sense. I appreciate any input you may have!

January 23, 2008 at 9:55 pm #167641
Sixcent,
Try binary logistic regression. It’ll tell you if the probability of 1’s goes up or down with increasing time, and Minitab will also print out several measures of goodness of fit; you’ll just have to read about them. I’m not aware of anything quite so neat as a correlation coefficient for this kind of situation.
Good luck and let us know if this meets your needs.
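For readers without Minitab at hand, here is a minimal sketch of what a binary logistic regression fit does, written in plain Python. Everything in it is an assumption for illustration: the data are made up (not the original poster's), time is rescaled to thousands to keep the arithmetic stable, and `fit_logistic` and `predicted_probability` are hypothetical helper names, not Minitab functions. A positive slope means the probability of a 1 rises with time.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the gradient (g) and information matrix (h)
        # of the log-likelihood at the current coefficients.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system h * delta = g.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1

# Made-up illustrative data: a time column and a 0/1 column.
times = [100, 300, 500, 700, 900, 1100, 1300, 1500, 1700, 1900, 2100, 2300]
outcome = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]

# Rescale time to thousands so exp() stays well-behaved.
x = [t / 1000.0 for t in times]
b0, b1 = fit_logistic(x, outcome)

def predicted_probability(t):
    """Fitted probability of a 1 at raw time t."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * t / 1000.0)))

print("slope per 1000 time units:", round(b1, 3))
print("P(1) at time 200: ", round(predicted_probability(200), 3))
print("P(1) at time 2200:", round(predicted_probability(2200), 3))
```

The sketch stops at the coefficients. The significance test on the slope and the goodness-of-fit measures mentioned above are what a statistics package adds on top.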
BC

January 24, 2008 at 3:28 am #167646
HF Chris
Have you looked at a simple histogram of your data? Are ranges grouped, or does the frequency of 1’s and 0’s appear to be random? You really need to look at your descriptive analysis first. I would look into chi-square and expected values.
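One way to read the chi-square suggestion is to bin time into ranges and compare the observed counts of 0’s and 1’s in each range with the counts expected if time made no difference. The sketch below assumes made-up data and an arbitrary split at time 1200; `chi_square_statistic` is a hypothetical helper name.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and expected counts for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(len(table))
        for j in range(len(table[0]))
    )
    return stat, expected

# Made-up illustrative data: a time column and a 0/1 column.
times = [100, 300, 500, 700, 900, 1100, 1300, 1500, 1700, 1900, 2100, 2300]
outcome = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]

cutoff = 1200  # assumed split point between "low" and "high" time

# Rows: low time, high time. Columns: count of 0's, count of 1's.
table = [[0, 0], [0, 0]]
for t, o in zip(times, outcome):
    table[0 if t < cutoff else 1][o] += 1

stat, expected = chi_square_statistic(table)
print("observed:", table)
print("expected:", expected)
print("chi-square:", round(stat, 3))
```

With a 2x2 table the statistic has 1 degree of freedom, so it is compared with 3.841 at the 0.05 level. The expected counts in this toy example are small, which is exactly the kind of thing the expected values are worth checking for.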

January 24, 2008 at 6:02 am #167650
Chris Seider
Have you tried good ol’ graphical analysis, like a dot plot, to begin with?

January 31, 2008 at 8:57 am #168009
rcweiss
You are correct. Correlation is most meaningful for data sets that have continuous inputs and continuous outputs.
Based on your question, you have one continuous variable, and one discrete (or binary) variable. You did not state explicitly which variable was the input, and which was the output.
First, let’s assume that your input is binary (0 or 1), and your output is continuous (time from 0 to 2360). The most useful tests you can perform are a 2-sample t-test and a 2-variances test. These will tell you two very important things:
1. The 2-sample t-test will tell you if there is a statistically significant difference between the average time when the input was 0 and the average time when the input was 1.
2. The 2-variances test will tell you if there is a statistically significant difference between the variations (when the input was 0 or 1).
I agree with the previous writer that you should also do some simple dot plots, etc., for visualization. They will help you understand the results of the statistical tests more clearly.
*Note: For both statistical tests, you should evaluate the normality of the subgroups, and the sample size.
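The two comparisons above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the data are made up, the t statistic is the unequal-variance (Welch) form rather than whatever options a given package defaults to, the variance comparison is shown as a simple ratio, and `welch_t` and `variance_ratio` are hypothetical helper names. Turning either statistic into a p-value needs t and F tables or a statistics package.

```python
import math
import statistics

def welch_t(a, b):
    """Two-sample t statistic (unequal variances) and its approximate df."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    se2_a, se2_b = var_a / len(a), var_b / len(b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df

def variance_ratio(a, b):
    """Ratio of sample variances, the statistic behind an F test."""
    return statistics.variance(a) / statistics.variance(b)

# Made-up illustrative times, split by the value of the binary column.
times_0 = [100.0, 300.0, 500.0, 900.0, 1100.0, 1700.0]
times_1 = [700.0, 1300.0, 1500.0, 1900.0, 2100.0, 2300.0]

t_stat, df = welch_t(times_1, times_0)
ratio = variance_ratio(times_1, times_0)
print("mean time when 0:", round(statistics.mean(times_0), 1))
print("mean time when 1:", round(statistics.mean(times_1), 1))
print("t statistic:", round(t_stat, 3), "with df about", round(df, 1))
print("variance ratio:", round(ratio, 3))
```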
Next, let’s assume that your variables are reversed (i.e. the input is continuous, and the output is discrete). Then the question becomes much more difficult: does the probability of obtaining an output of 1 change as the input increases/decreases? If this is the way your data is set up, then you will need a much larger data set, and the analysis will not be as straightforward.
Regards, rc

January 31, 2008 at 11:09 am #168013
Most likely, you will need to use a nonparametric analysis rather than the t- or z-tests. Time-based outcomes typically do not follow a normal distribution.
Definitely try the other authors’ recommendations and put your data into dot plots, histograms, etc. and ask yourself, “Self, can I visually detect a pattern in my data?” This will likely be all the analysis you need to make a practical decision.
Good luck!
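The post does not name a specific nonparametric test. The usual rank-based counterpart to the 2-sample t-test is the Mann-Whitney test, and the sketch below assumes that is what is meant. The data are made up, `mann_whitney_u` and `normal_approx_z` are hypothetical helper names, and the z value uses the plain normal approximation with no tie or continuity correction, so it is rough for samples this small.

```python
import math

def mann_whitney_u(a, b):
    """U for sample a: number of (a_i, b_j) pairs with a_i > b_j, ties count 0.5."""
    u = 0.0
    for ai in a:
        for bj in b:
            if ai > bj:
                u += 1.0
            elif ai == bj:
                u += 0.5
    return u

def normal_approx_z(u, n_a, n_b):
    """Large-sample z score for U (no tie or continuity correction)."""
    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    return (u - mean_u) / sd_u

# Made-up illustrative times, split by the value of the binary column.
times_0 = [100, 300, 500, 900, 1100, 1700]
times_1 = [700, 1300, 1500, 1900, 2100, 2300]

u_1 = mann_whitney_u(times_1, times_0)
u_0 = mann_whitney_u(times_0, times_1)
z = normal_approx_z(u_1, len(times_1), len(times_0))
print("U for group 1:", u_1, "out of", len(times_1) * len(times_0), "pairs")
print("approximate z:", round(z, 3))
```

If time had nothing to do with the 0/1 outcome, U would sit near half the number of pairs. The further it is from that midpoint, the stronger the evidence that one group tends to have larger times.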

January 31, 2008 at 9:29 pm #168054
I would suggest the 2-sample t-test is still meaningful whether the discrete data is an input or an output. The question is the same: is there a significant difference in times between group 0 and group 1? If we renamed 0 & 1 to red & blue, it would be clearer that these are two discrete groups of data, and you’re just comparing them to see whether the 2 distributions are different. You would word your hypothesis to reflect which is the x and which the Y. The null, for example, would state that there is no difference between times for 0 and 1 (if 0/1 are the x), or that 0 times are not different than 1 times (if 0/1 are the Y). Make sense?

January 31, 2008 at 9:47 pm #168056
Can you use any form of parametric hypothesis testing (e.g., paired t, 2-sample t/z, ANOVA, etc.) when dealing with discrete data? I thought this required continuous data formats...

January 31, 2008 at 9:54 pm #168057
Robert Butler
annon – the post cited below may be of some help.
https://www.isixsigma.com/forum/showmessage.asp?messageID=133162

January 31, 2008 at 10:47 pm #168061
We’re still treating his 0/1 as an input x for the purpose of performing the t-test. He’ll need to do some process definition to determine which is x and which is Y, but I suspect his discrete variable is the input once he looks at it closely. If not, maybe he should be asking a few more questions like “If these are from the same process and some become 1 and some 0, what determines it?” If it’s random, both groups should be the same statistically; if not (reject the null), he needs to investigate why they’re different. Is there automation with logic, or is it a human factor using some criteria, or both?

January 31, 2008 at 11:23 pm #168062
George Chynoweth
the procedure you want to use is a point-biserial correlation – it is mathematically equivalent to the pearson product-moment correlation when one variable is continuous and the other is dichotomous (0,1). if it’s not available in Minitab, the formula (quite straightforward) can be found in Wikipedia, along with a nice explanation.
re: graphical/visual inspection. do a scatter plot of all of your group 0 data, then superimpose a scatter plot of your group 1 data. you’ll get a sense of how linear the data are (or aren’t).
hth,
george

January 31, 2008 at 11:29 pm #168063
Thanks Rome & Robert... gotta go get my floaties on... I’ll let you guys swim in the deep end. Thanks again!
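As a footnote to George's point-biserial suggestion, the equivalence with the Pearson correlation is easy to check numerically. The sketch below uses made-up data and two hypothetical helpers, `pearson_r` and `point_biserial`. The second applies the standard textbook formula, r = (M1 - M0) / s_n * sqrt(n1 * n0 / n^2), where M1 and M0 are the group means of the continuous variable and s_n is its standard deviation computed with n (not n - 1) in the denominator.

```python
import math

def pearson_r(x, y):
    """Ordinary Pearson product-moment correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(x, group):
    """Point-biserial correlation between continuous x and a 0/1 group column."""
    x1 = [a for a, g in zip(x, group) if g == 1]
    x0 = [a for a, g in zip(x, group) if g == 0]
    n1, n0, n = len(x1), len(x0), len(x)
    mean_all = sum(x) / n
    # Standard deviation of x with n in the denominator.
    s_n = math.sqrt(sum((a - mean_all) ** 2 for a in x) / n)
    return (sum(x1) / n1 - sum(x0) / n0) / s_n * math.sqrt(n1 * n0 / n ** 2)

# Made-up illustrative data: a time column and a 0/1 column.
times = [100, 300, 500, 700, 900, 1100, 1300, 1500, 1700, 1900, 2100, 2300]
outcome = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]

r_pearson = pearson_r(times, outcome)
r_pb = point_biserial(times, outcome)
print("Pearson r:       ", round(r_pearson, 4))
print("point-biserial r:", round(r_pb, 4))
```

The practical upshot is that running an ordinary Pearson correlation on the time column and the 0/1 column already gives the point-biserial value.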