
This topic contains 7 replies, has 3 voices, and was last updated by Kristen Hill 2 weeks, 5 days ago.



Annie: I have data on class attendance and I need to find out whether there is a relationship between attendance at the most recent class and attendance at previous classes. I have coded 1 for attended and 0 for did not attend.

Name, Class 1, Class 2, Class 3
A, 1, 1, 1
B, 0, 1, 0

I have tried correlating class 3 with classes 1 and 2, and also adding classes 1 and 2 together to get the number of previous classes attended and correlating this with class 3.

All the correlations are low and I’m wondering if correlation is the wrong method as I’m working with mainly binary data. What do you think?

Based on your description of the problem, I think your best bet would be to run a logistic regression with attendance at class 3 as the Y variable and attendance at classes 1 and 2 as the two X variables.

The output would be in the form of odds ratios: you would have an odds ratio relating attendance at class 1 to attendance at class 3, and similarly for class 2. One problem you are going to have with this approach is the question of how independent attendance at class 1 is of attendance at class 2. If the two are not independent enough, you will get huge odds ratios with very wide Wald confidence limits and the analysis won’t be worth much.

One thing you need to remember with binary data: to see significance you need a lot more data than you would for an analysis of continuous data, so even if the measures exhibit sufficient independence you may not have enough data to detect a significant association.
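To make the suggestion concrete, here is a minimal sketch of that logistic regression in plain Python. The counts are made up for illustration (they are not Annie's data), and the helper names `solve`, `score_and_info` and `fit_logistic` are hypothetical. A statistics package would normally do this for you; the sketch only shows where the odds ratios and Wald limits come from.

```python
import math

# Hypothetical counts for 100 students (made up, not the poster's data):
# (class 1, class 2) -> (attended class 3, did not attend class 3)
counts = {(1, 1): (30, 10), (1, 0): (10, 10), (0, 1): (12, 8), (0, 0): (5, 15)}

X, y = [], []
for (c1, c2), (yes, no) in counts.items():
    X += [[c1, c2]] * (yes + no)
    y += [1] * yes + [0] * no


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [u - f * v for u, v in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def score_and_info(rows, y, beta):
    """Gradient of the log-likelihood and the information matrix at beta."""
    k = len(beta)
    score = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for r, yi in zip(rows, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, r))))
        for i in range(k):
            score[i] += (yi - p) * r[i]
            for j in range(k):
                info[i][j] += p * (1.0 - p) * r[i] * r[j]
    return score, info


def fit_logistic(X, y, iters=25):
    """Newton-Raphson fit with an intercept; returns (coefficients, standard errors)."""
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        score, info = score_and_info(rows, y, beta)
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    # Standard errors: square roots of the diagonal of the inverse information matrix
    _, info = score_and_info(rows, y, beta)
    se = []
    for i in range(k):
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        se.append(math.sqrt(solve(info, unit)[i]))
    return beta, se


beta, se = fit_logistic(X, y)
for name, b, s in zip(["class 1", "class 2"], beta[1:], se[1:]):
    lo, hi = math.exp(b - 1.96 * s), math.exp(b + 1.96 * s)
    print(f"{name}: odds ratio {math.exp(b):.2f}, 95% Wald CI ({lo:.2f}, {hi:.2f})")
```

If attendance at class 1 and class 2 were nearly identical, the information matrix would be close to singular and the standard errors (and hence the Wald limits) would blow up, which is the independence problem described above.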

Annie: Thank you for replying so quickly. I have data for around 100 students which should help.

I will try a logistic regression. Do you think doing a correlation would be meaningless?

I’m not really sure what you mean by correlation. Since you are online, post the data and I’ll take a look at it: just 3 columns, one for each class, yes/no (1/0), and an indication as to which class is which.

Annie: Sorry, I only just saw this message.

The data is how you described it: a column for each class containing 0s and 1s. I’m wondering if correlating class 3 with class 2 (standard Pearson/Spearman) would give any meaningful information about whether there’s a relationship between attendance at points 2 and 3.

Not really. The usual significance tests for the Pearson correlation coefficient assume normality (or approximate normality) of the variables, and binary data is not normal.

If it were just a case of looking at attendance at class 2 vs. attendance at class 3, you could run a 2×2 chi-square analysis, which would give you a measure of association. However, given your problem description I still think logistic regression would be the better bet. It would tell you the odds of attending class 3 given attendance (or lack thereof) at class 2, and it would also tell you whether or not the odds ratio was significant.
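For the 2×2 case, a rough sketch of the chi-square calculation looks like this. The counts are hypothetical (not Annie's data), the function name `chi_square_2x2` is made up for illustration, and no continuity correction is applied. It also reports phi, a measure of association derived from the chi-square statistic.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Magnitude of the phi coefficient, a 0-to-1 measure of association
    phi = math.sqrt(chi2 / n)
    return chi2, p_value, phi


# Hypothetical counts for 100 students:
#                      attended class 3   missed class 3
# attended class 2            42                18
# missed class 2              15                25
chi2, p_value, phi = chi_square_2x2(42, 18, 15, 25)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}, phi = {phi:.2f}")
```

With only one degree of freedom the p-value can be computed from `math.erfc`, so nothing beyond the standard library is needed here.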

Annie: OK, that makes sense, thank you.

You can run a chi-square very quickly to check for a relationship, but a logistic regression is correct as well. The chi-square gives you a quick and easy view and can be done in Excel.
