Confounding

You’ve created your measurement plan and conducted your experiments, but something doesn’t seem quite right with the cause-and-effect relationship. Is there something hiding in the data that needs to be investigated? Let’s explore a possible culprit: confounding.

Overview: What is confounding?

In everyday language, confounding typically refers to something that is mixed up or causes confusion. In Lean and Six Sigma, we often perform experiments and measure variables to understand the relationship between cause and effect. A confounding variable is a third variable that is related to both the experiment’s independent and dependent variables, and so distorts the apparent relationship between cause and effect. For a variable to be a confounder, two conditions need to be true:

It must be correlated with the independent variable. This relationship may be causal, but it does not have to be.
It must be causally related to the dependent variable.

Confounding makes it difficult to separate the true effect of the independent variable from the effect of the confounding variable, which can lead to confusion or incorrect conclusions about causation.
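The two conditions above can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not data from any real experiment: the names `pearson` and `simulate_confounded`, the coefficients, and the sample size are all assumptions chosen for illustration. The variable `z` plays the confounder, `x` the independent variable and `y` the dependent variable. By construction `x` has no effect on `y`, yet the two still appear correlated because both depend on `z`.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_confounded(n=5000, seed=1):
    """Synthetic data (assumed model): z drives both x and y; x has NO effect on y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]        # the confounder
    x = [zi + rng.gauss(0, 1) for zi in z]         # condition 1: x is correlated with z
    y = [2 * zi + rng.gauss(0, 1) for zi in z]     # condition 2: z causally drives y
    return x, y, z


if __name__ == "__main__":
    x, y, z = simulate_confounded()
    # x never appears in the formula for y, yet the correlation is far from zero.
    print(f"corr(x, y) = {pearson(x, y):.2f}")
```

Under this assumed model the correlation between `x` and `y` is substantial, even though changing `x` would do nothing to `y`. Reading that correlation as cause and effect is exactly the incorrect conclusion confounding can produce.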

We can mitigate the potential impact of confounding through careful experiment design, data collection planning and data analysis. One of the most widely used techniques is to include a degree of randomization in the experiment design. When experiment runs are randomized, potentially confounding variables are likely to be evenly distributed across the experimental conditions, so their influence on the comparison is minimized. If randomization is not practical, the data set can be restricted, for example to cases that share the same value of the suspected confounder. However, restriction can limit your ability to capture all causes and effects.
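To see why randomization helps, the following sketch compares two ways of assigning a treatment in a synthetic experiment. Everything here is an assumption made for illustration: the function name `estimate_effect`, the true effect of 1.0, and the way the confounder `z` influences both the assignment and the outcome. In the non-randomized case, units with a high `z` are more likely to be treated. In the randomized case, a coin flip decides.

```python
import random


def mean(values):
    return sum(values) / len(values)


def estimate_effect(randomize, n=20000, seed=7, true_effect=1.0):
    """Difference in mean outcome between treated and control units.

    Synthetic model (assumed): outcome = true_effect * treated + 2 * z + noise,
    where z is a confounder that also influences non-randomized assignment.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                        # confounder
        if randomize:
            t = rng.random() < 0.5                 # assignment ignores z
        else:
            t = z + rng.gauss(0, 1) > 0            # high z -> more likely treated
        y = true_effect * t + 2.0 * z + rng.gauss(0, 1)
        (treated if t else control).append(y)
    return mean(treated) - mean(control)


if __name__ == "__main__":
    print(f"non-randomized estimate: {estimate_effect(randomize=False):.2f}")
    print(f"randomized estimate:     {estimate_effect(randomize=True):.2f}")
```

With these assumed settings the non-randomized estimate is inflated well above the true effect, because the treated group also has higher values of `z`. The randomized estimate lands close to the true effect, because `z` is balanced between the two groups on average.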

An industry example of confounding

A medical practice was required to perform a number of audits on its patients, one of which was to understand the population risk of coronary heart disease and its potential contributing causes. The practice’s head of quality decided to construct a patient questionnaire and analyze the data to draw conclusions on cause, effect and population risk. The practice had a patient cohort of over 10,000 people, so total sample size was not an issue even with relatively low questionnaire return rates. All patients over 18 years old were included in the sample, and results were reviewed by age and gender group. The questionnaire included a number of lifestyle questions covering typical caffeine and alcohol intake and nicotine usage.

The initial data suggested a strong link between the level of caffeine intake and the risk of coronary heart disease. This surprised the practice’s head of quality, as there was clinical evidence that moderate caffeine intake was beneficial for coronary heart disease prevention. Deeper analysis revealed a confounding variable: nicotine use was higher among a significant proportion of moderate to high caffeine drinkers, and the likely cause of coronary heart disease in this population was nicotine usage, not caffeine.

A further questionnaire was developed, and a smaller study was conducted on patients with moderate to high caffeine intake and no nicotine usage to test this hypothesis. The analysis of this questionnaire concluded that there was no causal link between caffeine intake and the risk of coronary heart disease.
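The logic of the follow-up study can be sketched with a simulation. This is not the practice’s data. The patient records are synthetic, and every probability, field name and function name below is an assumption chosen to mimic the pattern described: nicotine use raises both the chance of high caffeine intake and the chance of coronary heart disease (CHD), while caffeine itself has no effect. For simplicity, the sketch compares high and low caffeine intake among non-users of nicotine, which is one possible way to apply restriction.

```python
import random


def simulate_patients(n=50000, seed=11):
    """Synthetic patient records; all probabilities are illustrative assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        nicotine = rng.random() < 0.30
        # Nicotine users are assumed more likely to have high caffeine intake.
        high_caffeine = rng.random() < (0.70 if nicotine else 0.30)
        # CHD risk depends on nicotine only; caffeine plays no part.
        chd = rng.random() < (0.20 if nicotine else 0.05)
        rows.append({"nicotine": nicotine, "high_caffeine": high_caffeine, "chd": chd})
    return rows


def risk(rows):
    """Proportion of rows with CHD."""
    return sum(r["chd"] for r in rows) / len(rows)


def risk_difference(rows):
    """CHD risk in the high-caffeine group minus risk in the low-caffeine group."""
    high = [r for r in rows if r["high_caffeine"]]
    low = [r for r in rows if not r["high_caffeine"]]
    return risk(high) - risk(low)


if __name__ == "__main__":
    patients = simulate_patients()
    non_users = [r for r in patients if not r["nicotine"]]   # restriction step
    print(f"all patients:       {risk_difference(patients):+.3f}")
    print(f"nicotine-free only: {risk_difference(non_users):+.3f}")
```

Across all synthetic patients, high caffeine intake appears to carry extra CHD risk. Once the comparison is restricted to patients who do not use nicotine, the difference shrinks to roughly zero. The trade-off is that the restricted sample says nothing about nicotine users.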

Frequently Asked Questions (FAQ) about confounding

1. What exactly is confounding?

Confounding occurs when a third variable, often unmeasured or overlooked, influences both the cause and the effect, so that its influence cannot easily be separated from that of the independent variable.

2. Is confounding a problem?

Yes. It has the potential to alter the conclusions you draw about the cause-and-effect relationships in your data, because it makes it difficult to separate the true effect of the suspected cause from the effect of the confounding variable.

3. How can I fix confounding?

Depending on your experiment and data collection plan, you may not be able to eliminate confounding, but you can mitigate its impact by randomizing your sampling or, where that is not practical, by restricting the scope of your study.