# Binary Correlation

- This topic has 7 replies, 2 voices, and was last updated 1 year, 8 months ago by Kristen Hill.

July 29, 2018 at 9:05 am #56053

I have data on attendance at classes and I need to find out whether there is a relationship between attendance at the most recent class and attendance at previous classes. I have coded 1 to mean attended and 0 to mean did not attend.

| Name | Class 1 | Class 2 | Class 3 |
|------|---------|---------|---------|
| A | 1 | 1 | 1 |
| B | 0 | 1 | 0 |

I have tried correlating class 3 with classes 1 and 2 and also adding classes 1 and 2 together to see the number of previous classes attended and correlating this with class 3.

All the correlations are low and I’m wondering if correlation is the wrong method as I’m working with mainly binary data. What do you think?

July 29, 2018 at 9:32 am #202862

Robert Butler (Participant, @rbutler)

Based on your description of the problem I think your best bet would be to run a logistic regression with attendance at class 3 being the Y variable and attendance at classes 1 and 2 being the two X variables.

The output would be in the form of odds ratios: one for the association between attendance at class 1 and attendance at class 3, and similarly one for class 2. One problem you are going to have with this approach is the question of independence of attendance at class 1 relative to attendance at class 2. If the two are not independent enough you will get huge odds ratios with very wide Wald confidence limits and the analysis won't be worth much.

One thing you need to remember with binary data: to see significance you need a lot more data than you would need for an analysis of continuous data, so even if the measures exhibit sufficient independence you may not have enough data to detect a significant difference.
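As a rough illustration of the logistic regression described above, here is a minimal Python sketch that fits class 3 attendance on class 1 and class 2 attendance by Newton-Raphson and reports odds ratios with 95% Wald limits. The counts are made up for illustration (they are not the poster's data), and the helper names `solve_linear` and `fit_logistic` are hypothetical. In practice a statistics package would do this fit for you.

```python
import math

# Hypothetical counts for 100 students, NOT the poster's data:
# (class1, class2) -> (attended class 3, did not attend class 3)
counts = {
    (0, 0): (5, 15),
    (0, 1): (10, 10),
    (1, 0): (8, 12),
    (1, 1): (30, 10),
}

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_logistic(counts, iterations=25):
    """Fit logit(p) = b0 + b1*class1 + b2*class2 on grouped counts."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        grad = [0.0] * 3
        info = [[0.0] * 3 for _ in range(3)]
        for (c1, c2), (yes, no) in counts.items():
            x = (1.0, float(c1), float(c2))
            n = yes + no
            eta = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            for i in range(3):
                grad[i] += (yes - n * p) * x[i]          # score vector
                for j in range(3):
                    info[i][j] += n * p * (1 - p) * x[i] * x[j]  # information
        step = solve_linear(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    # Wald standard errors: square roots of the diagonal of the inverse
    # information matrix (taken from the final iteration).
    se = []
    for k in range(3):
        unit = [1.0 if i == k else 0.0 for i in range(3)]
        se.append(math.sqrt(solve_linear(info, unit)[k]))
    return beta, se

beta, se = fit_logistic(counts)
for name, b, s in zip(("class 1", "class 2"), beta[1:], se[1:]):
    low, high = math.exp(b - 1.96 * s), math.exp(b + 1.96 * s)
    print(f"{name}: odds ratio {math.exp(b):.2f}, 95% Wald limits ({low:.2f}, {high:.2f})")
```

If attendance at class 1 and class 2 were almost perfectly related, some of the four cells above would be nearly empty, and the Wald limits printed here would become very wide.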

July 29, 2018 at 9:37 am #202863

Thank you for replying so quickly. I have data for around 100 students which should help.

I will try a logistic regression. Do you think doing a correlation would be meaningless?

July 29, 2018 at 9:56 am #202864

Robert Butler (Participant, @rbutler)

I'm not really sure what you mean by correlation. Since you are online, post the data and I'll take a look at it: just 3 columns, one for each class yes/no (1/0), and an indication as to which class is which.

July 29, 2018 at 2:29 pm #202865

Sorry I only just saw this message.

The data is how you described it: a column for each class containing 0s and 1s. I'm wondering if correlating class 3 with class 2 (a standard Pearson or Spearman correlation) would give any meaningful information about whether there's a relationship between attendance at classes 2 and 3.

July 29, 2018 at 3:44 pm #202866

Robert Butler (Participant, @rbutler)

Not really. One of the key assumptions of the Pearson Correlation Coefficient is normality (or approximate normality) of the variables, and binary data is not normal.

If it was just a case of looking at class attendance in 2 vs class attendance in 3 you could run a 2×2 chi-square analysis which would give you a measure of association. However, given your problem description I still think logistic regression would be the better bet. It would tell you the odds of attending class 3 given attendance (or lack thereof) in class 2 and it would also tell you whether or not the odds ratio was significant.
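For the single-predictor case (class 2 against class 3), the odds ratio that a logistic regression reports can be read straight off the 2×2 table, so a short sketch may help. The counts below are hypothetical, not the poster's data. With one binary X, the fitted odds ratio equals the cross-product ratio of the table, and the Wald standard error of its log is the square root of the summed reciprocal cell counts.

```python
import math

# Hypothetical 2x2 counts (not the poster's data).
a = 40  # attended class 2 and class 3
b = 20  # attended class 2, missed class 3
c = 13  # missed class 2, attended class 3
d = 27  # missed both

odds_ratio = (a * d) / (b * c)
log_or = math.log(odds_ratio)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# Wald test and 95% Wald limits on the odds-ratio scale.
z = log_or / se_log_or
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
ci_low = math.exp(log_or - 1.96 * se_log_or)
ci_high = math.exp(log_or + 1.96 * se_log_or)

print(f"odds ratio {odds_ratio:.2f}, 95% Wald limits ({ci_low:.2f}, {ci_high:.2f}), p = {p_value:.4f}")
```

If the Wald limits exclude 1, the odds ratio is significant at roughly the 5% level.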

July 30, 2018 at 1:07 pm #202875

OK that makes sense, thank you

July 31, 2018 at 6:12 am #202877

Kristen Hill (Participant, @misskristen)

You can run a chi-square very quickly to find a relationship, but a logistic regression is correct as well. The chi-square can give you a very quick and easy view and can be done in Excel.
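The quick chi-square check can also be sketched in a few lines of Python. This uses hypothetical counts (not the poster's data) and no continuity correction. As a side note, the Pearson coefficient computed on two 0/1 columns is the phi coefficient, and the uncorrected chi-square statistic equals n times phi squared, so the sketch computes both to show the link.

```python
import math

# Hypothetical 2x2 counts (not the poster's data).
a, b, c, d = 40, 20, 13, 27   # rows: class 2 yes/no, columns: class 3 yes/no
n = a + b + c + d
row1, row2 = a + b, c + d
col1, col2 = a + c, b + d

# Phi coefficient and uncorrected chi-square (1 degree of freedom).
phi = (a * d - b * c) / math.sqrt(row1 * row2 * col1 * col2)
chi2 = n * phi ** 2
p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df

# Expand the table back into 0/1 columns and compute Pearson's r directly.
class2 = [1] * (a + b) + [0] * (c + d)
class3 = [1] * a + [0] * b + [1] * c + [0] * d

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson(class2, class3)
print(f"chi-square {chi2:.3f}, p = {p_value:.4f}, phi {phi:.3f}, Pearson r {r:.3f}")
```

A phi value that looks small can still correspond to a significant chi-square when there are enough students, which may be relevant to the low correlations mentioned in the opening post.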
