iSixSigma

Minitab correlation question


Viewing 12 posts - 1 through 12 (of 12 total)
    #49179

    sixcent
    Member

    I have a data set where I would like to find out if two of the columns of data are correlated. The issue I have is that one column is “time” from 0.00 to 2360.00 and the other is made up only of 1s and 0s. I think (please correct me if I am mistaken) that the typical correlation method in Minitab (Pearson) needs a linear relationship between the two to tell me whether or not there is a correlation, and I don’t think a linear correlation is possible when the second column can only be 1 or 0.
    So, what I want to find out is whether there is any correlation between time (lower/higher) and the resulting occurrence of 1s or 0s. I hope that makes sense; I appreciate any input you may have!

    #167641

    BC
    Participant

    Sixcent,
    Try binary logistic regression.  It’ll tell you if the probability of 1’s goes up or down with increasing time, and Minitab will also print out several measures of goodness of fit; you’ll just have to read about them.  I’m not aware of anything quite so neat as a correlation coefficient for this kind of situation.
    Good luck and let us know if this meets your needs.
    BC
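
BC's suggestion can be tried outside Minitab as well. Below is a minimal, hand-rolled sketch of a one-predictor binary logistic regression, P(1) = 1 / (1 + exp(-(b0 + b1 * time))), fitted by maximum likelihood with Newton-Raphson. The function names and the sample data are made up for illustration, and unlike Minitab's routine it does not report standard errors, p-values or goodness-of-fit measures. A positive b1 means the probability of a 1 rises as time increases.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(times, outcomes, iterations=50):
    """Maximum-likelihood fit of P(y = 1) = sigmoid(b0 + b1 * t) by Newton-Raphson.

    Assumes outcomes are 0/1 and that the 0s and 1s overlap in time; with
    perfectly separated data the estimates diverge.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for t, y in zip(times, outcomes):
            p = sigmoid(b0 + b1 * t)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * t
            h00 += w
            h01 += w * t
            h11 += w * t * t
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 @ gradient, solved as a 2x2 system
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Made-up data for illustration: time value and a 0/1 outcome
times = [100, 300, 500, 700, 900, 1100, 1300, 1500, 1700, 1900, 2100, 2300]
outcomes = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1]
b0, b1 = fit_logistic(times, outcomes)
print(f"b0 = {b0:.4f}, b1 = {b1:.6f}")
```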

    #167646

    HF Chris
    Participant

    Have you looked at a simple histogram of your data? Are the 1’s and 0’s grouped in certain time ranges, or does their frequency appear to be random? You really need to look at your descriptive analysis first. I would look into a chi-square test and expected values.
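
One way to read HF Chris's chi-square suggestion is to cut time into ranges and test whether the mix of 0s and 1s is independent of the range. The Python sketch below does that with a Pearson chi-square statistic; the 600-unit bin width, the function name and the data are assumptions for illustration only. The statistic still has to be compared with a chi-square table (or software) at the returned degrees of freedom, and the toy counts here are far too small: the usual rule of thumb asks for expected counts of roughly 5 or more per cell.

```python
def chi_square_by_time_bin(times, outcomes, bin_width=600.0):
    """Pearson chi-square test of independence between time bin and a 0/1 outcome.

    Returns (statistic, degrees_of_freedom, observed_rows), where each row is
    [count of 0s, count of 1s] for one occupied time bin, in increasing time order.
    """
    counts = {}
    for t, y in zip(times, outcomes):
        row = counts.setdefault(int(t // bin_width), [0, 0])
        row[y] += 1
    rows = [counts[k] for k in sorted(counts)]
    n = sum(sum(r) for r in rows)
    col_totals = [sum(r[j] for r in rows) for j in (0, 1)]
    if 0 in col_totals or len(rows) < 2:
        raise ValueError("need both outcomes and at least two occupied time bins")
    stat = 0.0
    for r in rows:
        row_total = sum(r)
        for j in (0, 1):
            expected = row_total * col_totals[j] / n  # count expected under independence
            stat += (r[j] - expected) ** 2 / expected
    dof = (len(rows) - 1) * (2 - 1)
    return stat, dof, rows


# Made-up data for illustration
times = [100, 300, 500, 700, 900, 1100, 1300, 1500, 1700, 1900, 2100, 2300]
outcomes = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1]
stat, dof, rows = chi_square_by_time_bin(times, outcomes)
print(rows, f"chi-square = {stat:.3f}, df = {dof}")
```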

    #167650

    Chris Seider
    Participant

    Have you tried good ol’ graphical analysis, like a dot plot, to begin with?

    #168009

    rcweiss
    Participant

    You are correct.  Correlation is most meaningful for data sets that have continuous inputs and continuous outputs.
    Based on your question, you have one continuous variable, and one discrete (or binary) variable.  You did not state explicitly which variable was the input, and which was the output.
    First, let’s assume that your input is binary (0 or 1) and your output is continuous (time from 0 to 2360). The most useful tests you can perform are a 2-sample t-test and a 2-variances test. These will tell you two very important things:
    1. The 2-sample t-test will tell you if there is a statistically significant difference between the average time when the input was 0 and the average time when the input was 1.
    2. The 2-variances test will tell you if there is a statistically significant difference between the variation in time when the input was 0 and the variation when the input was 1.
    I agree with the previous writer that you should also do some simple dot plots, etc., for visualization. They will help you understand the results of the statistical tests more clearly. *Note: For both statistical tests, you should check the normality of each subgroup and the sample size.
    Next, let’s assume that your variables are reversed (i.e. the input is continuous, and the output is discrete).  Then the question becomes much more difficult: does the probability of obtaining an output of 1 change as the input increases/decreases?  If this is the way your data is set up, then you will need a much larger data set, and the analysis will not be as straightforward.
    Regards, rc
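
rcweiss's two tests boil down to a few summary statistics. The sketch below computes the Welch (unequal-variance) 2-sample t statistic with its approximate degrees of freedom, and the classical F ratio of the two variances, which like the t-test leans on the normality check in the note above. Function names and data are hypothetical. Python's standard library has no t or F distribution, so p-values would come from tables, Minitab, or a package such as SciPy (`scipy.stats.ttest_ind(..., equal_var=False)` gives the Welch test).

```python
import math
from statistics import mean, variance


def two_sample_summary(times, groups):
    """Welch two-sample t statistic, its approximate df, and the F ratio of variances.

    `groups` holds the 0/1 input; `times` holds the continuous output.
    """
    g0 = [t for t, g in zip(times, groups) if g == 0]
    g1 = [t for t, g in zip(times, groups) if g == 1]
    n0, n1 = len(g0), len(g1)
    v0, v1 = variance(g0), variance(g1)  # sample variances, n - 1 denominator
    a0, a1 = v0 / n0, v1 / n1
    t_stat = (mean(g1) - mean(g0)) / math.sqrt(a0 + a1)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (a0 + a1) ** 2 / (a0 ** 2 / (n0 - 1) + a1 ** 2 / (n1 - 1))
    f_ratio = v1 / v0  # classical F statistic with (n1 - 1, n0 - 1) df
    return t_stat, df, f_ratio


# Made-up data for illustration
times = [100, 300, 500, 700, 900, 1100, 1300, 1500, 1700, 1900, 2100, 2300]
groups = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1]
t_stat, df, f_ratio = two_sample_summary(times, groups)
print(f"t = {t_stat:.3f}, df = {df:.2f}, F = {f_ratio:.3f}")
```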

    #168013

    Fontanilla
    Participant

    Most likely, you will need to use a non-parametric analysis rather than the t- or z-tests.  Time-based outcomes typically do not follow a normal distribution.
    Definitely try the other authors’ recommendations and put your data into dot plots, histograms, etc. and ask yourself, “Self, can I visually detect a pattern in my data?”  This will likely be all the analysis you need to make a practical decision.
    Good luck!
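
Fontanilla doesn't name a specific non-parametric test; one common choice for comparing times between two groups is the Mann-Whitney U test, sketched below as an assumption rather than the poster's method. U counts the pairs in which a group 1 time exceeds a group 0 time (ties count one half), and the two-sided p-value uses the normal approximation without a tie correction, which is rough for samples as small as this made-up example; exact tables or software are better there.

```python
import math
from statistics import NormalDist


def mann_whitney(x, y):
    """Mann-Whitney U for sample x versus sample y, with a normal-approximation p-value.

    U counts pairs (xi, yj) with xi > yj, counting ties as one half.
    The two-sided p-value omits the tie correction to the variance.
    """
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n, m = len(x), len(y)
    mu = n * m / 2
    sigma = math.sqrt(n * m * (n + m + 1) / 12)
    z = (u - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, z, p


# Made-up times for the two groups
times_1 = [700, 1300, 1700, 1900, 2100, 2300]
times_0 = [100, 300, 500, 900, 1100, 1500]
u, z, p = mann_whitney(times_1, times_0)
print(f"U = {u}, z = {z:.3f}, p = {p:.4f}")
```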

    #168054

    Bob Rome
    Participant

    I would suggest the 2-sample t-test is still meaningful whether the discrete data is an input or an output. The question is the same: is there a significant difference in times between group 0 and group 1? If we renamed 0 & 1 to red & blue, it would be clearer that these are two discrete groups of data, and you’re just comparing to see whether these 2 distributions are different. You would word your hypothesis to reflect which is the X and which is the Y. The null, for example, would state that there is no difference between the times for 0 and 1 (if 0/1 is the X), or that 0 times are not different from 1 times (if 0/1 is the Y). Make sense?
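
Bob Rome's point that the question is the same whichever column is the X can be checked numerically. The pooled (equal-variance) 2-sample t statistic equals the usual t statistic for testing Pearson's r = 0 with the 0/1 column coded as numbers, r × sqrt((n − 2) / (1 − r²)), and r itself is symmetric in its two columns. The sketch below (hypothetical names, made-up data) shows both; note the identity holds for the pooled t, not Welch's unequal-variance version.

```python
import math
from statistics import mean, variance


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pooled_t(times, groups):
    """Equal-variance (pooled) two-sample t statistic, group 1 minus group 0."""
    g0 = [t for t, g in zip(times, groups) if g == 0]
    g1 = [t for t, g in zip(times, groups) if g == 1]
    n0, n1 = len(g0), len(g1)
    sp2 = ((n0 - 1) * variance(g0) + (n1 - 1) * variance(g1)) / (n0 + n1 - 2)
    return (mean(g1) - mean(g0)) / math.sqrt(sp2 * (1 / n0 + 1 / n1))


def t_from_r(r, n):
    """Test statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Made-up data for illustration
times = [100, 300, 500, 700, 900, 1100, 1300, 1500, 1700, 1900, 2100, 2300]
groups = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1]
r = pearson_r(groups, times)
print(f"pooled t = {pooled_t(times, groups):.4f}, t from r = {t_from_r(r, len(times)):.4f}")
```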

    #168056

    annon
    Participant

    Can you use any form of parametric hypothesis testing (e.g., paired t, 2-sample t/z, ANOVA, etc.) when dealing with discrete data? I thought these required continuous data……

    0
    #168057

    Robert Butler
    Participant

    annon – the post cited below may be of some help.
    https://www.isixsigma.com/forum/showmessage.asp?messageID=133162

    #168061

    Bob Rome
    Participant

    We’re still treating his 0/1 as an input X for the purpose of performing the t-test. He’ll need to do some process definition to determine which is X and which is Y, but I suspect his discrete variable is the input once he looks at it closely. If not, maybe he should be asking a few more questions, like “If these are from the same process and some become 1 and some 0, what determines it?” If it’s random, both groups should be the same statistically; if not (reject the null), he needs to investigate why they’re different. Is there automation with logic, or is it a human factor using some criteria, or both?

    #168062

    George Chynoweth
    Participant

    The procedure you want to use is a point-biserial correlation. It is mathematically equivalent to the Pearson product-moment correlation when one variable is continuous and the other is dichotomous (0,1). If it’s not available in Minitab, the formula (quite straightforward) can be found in Wikipedia, along with a nice explanation.
    Re: graphical/visual inspection. Do a scatter plot of all of your group 0 data, then superimpose a scatter plot of your group 1 data. You’ll get a sense of how linear the data are (or aren’t).
    hth,
    george
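
A common textbook form of the point-biserial formula George refers to is r_pb = (M1 − M0) / s_n × sqrt(n1 × n0 / n²), where M1 and M0 are the mean times in each group and s_n is the standard deviation of all the times with an n (not n − 1) denominator. The minimal Python check below (hypothetical function names, made-up data) confirms it matches an ordinary Pearson correlation on the 0/1 column, so running the usual Pearson correlation with the 0/1 column as numbers should give the point-biserial value. A positive result means the 1s tend to occur at later times.

```python
import math
from statistics import mean, pstdev


def point_biserial(times, groups):
    """r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n**2), s_n = population SD of all times."""
    g0 = [t for t, g in zip(times, groups) if g == 0]
    g1 = [t for t, g in zip(times, groups) if g == 1]
    n0, n1, n = len(g0), len(g1), len(times)
    s_n = pstdev(times)  # divides by n, not n - 1
    return (mean(g1) - mean(g0)) / s_n * math.sqrt(n1 * n0 / n ** 2)


def pearson_r(xs, ys):
    """Ordinary Pearson correlation, used here only to check the shortcut formula."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data for illustration
times = [100, 300, 500, 700, 900, 1100, 1300, 1500, 1700, 1900, 2100, 2300]
groups = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1]
print(f"r_pb = {point_biserial(times, groups):.4f}, Pearson r = {pearson_r(groups, times):.4f}")
```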

    #168063

    annon
    Participant

    Thanks, Rome & Robert… gotta go get my floaties on. I’ll let you guys swim in the deep end. Thanks again!


The forum ‘General’ is closed to new topics and replies.