When trying to understand what factors or variables can affect your process outcomes, you seek to identify all your independent variables. You will use these independent or predictor variables (X) to help you understand how they interact and impact your dependent or response variable (Y). But sometimes you miss an X variable and don’t include it in your analysis. This lurking variable can distort your conclusions. Let’s learn some more about lurking variables.

Overview: What is a lurking variable?

A lurking variable is an explanatory variable that is not included in your analysis but can affect how you interpret the relationship between your other variables. A lurking variable can hide a true relationship between variables, or it can suggest a relationship that does not really exist. Essentially, lurking variables can make the results of your analysis confusing and misleading.

For example, if you were seeking to understand what has caused an increase in the number of shark attacks at the beaches in Florida, two explanatory variables you might have data for are the level of ice cream sales and the month of the year. Interestingly, your data shows that as ice cream sales go up, so do shark attacks. The number of attacks also rises and falls with the month of the year.

But do you really believe that increasing ice cream sales (X) causes the number of shark attacks (Y) to go up? There may be correlation, but is there causation? Possibly there is a lurking variable, one you did not include in your data collection and analysis, that would better explain the cause of shark attacks.

Does the weather sound like it might be a lurking variable? Would warm weather bring more people to the beach making for more potential targets for the sharks?
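To make the shark example concrete, here is a minimal simulation sketch. All of the numbers are invented for illustration: temperature is assumed to drive both ice cream sales and shark attacks, and neither one affects the other. The helper names (pearson, residuals) are hypothetical and written with the standard library only.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Assumed data-generating process (made up): temperature is the lurking variable.
temperature = [rng.uniform(15, 35) for _ in range(500)]
ice_cream_sales = [20 * t + rng.gauss(0, 60) for t in temperature]
shark_attacks = [0.3 * t + rng.gauss(0, 1.5) for t in temperature]

# Correlation when temperature is ignored.
raw_r = pearson(ice_cream_sales, shark_attacks)

# Correlation after removing the part of each variable explained by temperature.
adjusted_r = pearson(residuals(ice_cream_sales, temperature),
                     residuals(shark_attacks, temperature))

print(f"ignoring temperature:      r = {raw_r:.2f}")
print(f"adjusting for temperature: r = {adjusted_r:.2f}")
```

In this simulated data, ice cream sales and shark attacks look strongly related until temperature is accounted for, at which point the apparent relationship largely disappears.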


An industry example of a lurking variable 

A fleet manager was interested in predicting fuel economy for his fleet of trucks. He designed an experiment to study the impact of speed, tire pressure and fuel octane on the trucks’ fuel economy in miles per gallon.

His experimental design scheduled all of the low tire pressure runs for the morning and all of the high tire pressure runs for the afternoon. Fortunately, his Six Sigma Black Belt (BB) pointed out that a lurking variable could distort the study’s conclusions.

The BB pointed out that the low tire pressure runs were being done in the morning, when the air and pavement were cool, while the high tire pressure runs were being done in the afternoon, when both were much hotter. Any effect of temperature on fuel economy would therefore be impossible to separate from the effect of tire pressure. The solution was to randomize the order of the runs so that temperature differences were spread across both tire pressure settings, or to add pavement temperature as an additional explanatory variable.
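Randomizing run order can be sketched in a few lines. The factor levels below (55 and 65 mph, octane 87 and 91) are assumptions made up for illustration, since the example does not give them, and morning_low_share is a hypothetical helper that reports how many of the first half of the day's runs use low tire pressure.

```python
import itertools
import random

# Hypothetical two-level settings for each factor in the study.
speeds = [55, 65]            # mph (assumed)
pressures = ["low", "high"]  # tire pressure
octanes = [87, 91]           # fuel octane (assumed)

# Full factorial design: every combination of the three factors (8 runs).
design = list(itertools.product(speeds, pressures, octanes))

# The original plan: all low-pressure runs first (morning), high-pressure last.
confounded = sorted(design, key=lambda run: run[1] != "low")

# The randomized plan: same runs, shuffled order.
rng = random.Random(42)  # seeded only so the sketch is repeatable
randomized = design[:]
rng.shuffle(randomized)

def morning_low_share(order):
    """Fraction of the first half of the runs that use low tire pressure."""
    morning = order[: len(order) // 2]
    return sum(1 for run in morning if run[1] == "low") / len(morning)

print("confounded plan, low-pressure share of morning runs:",
      morning_low_share(confounded))
print("randomized plan, low-pressure share of morning runs:",
      morning_low_share(randomized))
for position, run in enumerate(randomized, start=1):
    print(position, run)
```

In the confounded plan every morning run uses low pressure, so time of day and tire pressure cannot be told apart. Shuffling does not remove the temperature effect; it only makes it unlikely to line up with any one factor. A single shuffle of eight runs can still come out unbalanced by chance, which is one reason recording pavement temperature as well is a reasonable safeguard.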

Frequently Asked Questions (FAQ) about a lurking variable

What is a simple way to offset a lurking variable if I do a designed experiment? 

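Randomizing the order of your runs is the simplest safeguard. It does not remove the lurking variable, but it spreads its influence across all of your factor settings so that it is less likely to be mistaken for the effect of one of your explanatory variables.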

What happens if I don’t account for a lurking variable in my analysis? 

If your lurking variable remains hidden, your analysis can easily become misleading and erroneous because you are not identifying all the important explanatory variables which may have a relationship with your response variable.

Is there a statistical test which can tell me whether I have a lurking variable? 

There is no direct statistical test for discovering lurking variables in your experiment. However, your residual plots can hint at one. If the residuals show a trend, for example when plotted against run order or time, a variable that was left out of your study may be influencing your response variable.
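As a rough sketch of the residual check, the simulation below fits fuel economy on speed alone and then looks for a trend in the residuals against run order. The data are invented: pavement temperature is assumed to rise steadily through the day and to shift miles per gallon slightly, and the coefficients are arbitrary. The helpers pearson and fit_line are hypothetical and use only the standard library.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def fit_line(xs, ys):
    """Least-squares intercept and slope for ys = intercept + slope * xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

rng = random.Random(1)  # fixed seed so the sketch is repeatable
n_runs = 40
run_order = list(range(n_runs))

# Speed is balanced and randomly ordered across the day.
speed = [55, 65] * (n_runs // 2)
rng.shuffle(speed)

# Assumed lurking variable: pavement temperature climbs with each run.
pavement_temp = [15 + 0.5 * i for i in run_order]

# Assumed (made-up) response: mpg depends on speed AND pavement temperature.
mpg = [12 - 0.08 * (s - 55) + 0.05 * t + rng.gauss(0, 0.2)
       for s, t in zip(speed, pavement_temp)]

# The analyst's model only includes speed.
intercept, slope = fit_line(speed, mpg)
resid = [y - (intercept + slope * s) for s, y in zip(speed, mpg)]

# A residual trend against run order hints that something was left out.
trend = pearson(run_order, resid)
print(f"correlation of residuals with run order: {trend:.2f}")
```

A correlation near zero would be expected if the model captured everything systematic. A clear trend like the one in this simulated data does not identify the missing variable, but it does suggest looking for something that changed over the course of the runs.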
