1) To mention that the distribution of the regression coefficients is normal (he used that knowledge).

2) To show us with an example, rather than just telling us, that regression coefficient estimates may have the wrong sign.
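A minimal sketch of point 2, in Python with hypothetical numbers. The true model is assumed to be y = x1 + x2 + noise, where x2 is nearly collinear with x1. The fixed noise vector is illustrative, not random (it happens to line up with the direction x2 - x1), and that is enough to push the least-squares estimate for x2 below zero even though x2 is strongly positively correlated with y.

```python
import math

def centered(v):
    """Subtract the mean from every element."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def ols_two_predictors(x1, x2, y):
    """Least-squares slopes (b1, b2) for y = b0 + b1*x1 + b2*x2.

    Solves the 2x2 normal equations on mean-centered data; the intercept
    b0 drops out after centering.
    """
    c1, c2, cy = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    s1y, s2y = dot(c1, cy), dot(c2, cy)
    det = s11 * s22 - s12 ** 2  # close to 0 when x1 and x2 are nearly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    ca, cb = centered(a), centered(b)
    return dot(ca, cb) / math.sqrt(dot(ca, ca) * dot(cb, cb))

# Hypothetical data: x2 is x1 plus a small wobble, so the two are nearly collinear.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
noise = [-0.3, 0.3, -0.6, 0.6, -0.3, 0.3]  # illustrative, not random
# Assumed true model: both coefficients are +1.
y = [a + b + e for a, b, e in zip(x1, x2, noise)]

b1, b2 = ols_two_predictors(x1, x2, y)
print(f"b1 = {b1:.2f}, b2 = {b2:.2f}")
print(f"corr(x2, y) = {pearson(x2, y):.3f}")
```

With these numbers the fit gives b1 ≈ 4 and b2 ≈ -2, while the simple correlation of x2 with y is about 0.99. The small determinant of the centered normal equations is what lets a modest disturbance flip the sign.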

Regression is a correlation model, not a causal model. There are two uses of correlation models: (A) as a precursor to finding causal factors and (B) simply to find predictors (noncausal factors).

If these two cases are not distinguished, readers may think correlation is causation. Regression is not meant to show causation; that is what controlled studies are for.
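A small simulation can make the distinction concrete. This is a hypothetical Python sketch, not data from any real process: a hidden factor z drives both x and y, and y does not depend on x at all. In the historical (observational) data x and y are strongly correlated; when x is instead set by design, independently of z, as in a controlled study, the correlation disappears.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 2000

def outcome(z):
    """Hypothetical mechanism: y depends only on the hidden factor z, never on x."""
    return 2.0 * z + rng.gauss(0, 0.5)

# Observational data: x merely tracks z.
z_obs = [rng.gauss(0, 1) for _ in range(n)]
x_obs = [z + rng.gauss(0, 0.5) for z in z_obs]
y_obs = [outcome(z) for z in z_obs]

# Controlled study: x is assigned at two levels, independently of z.
z_doe = [rng.gauss(0, 1) for _ in range(n)]
x_doe = [rng.choice([-1.0, 1.0]) for _ in range(n)]
y_doe = [outcome(z) for z in z_doe]

r_obs = pearson(x_obs, y_obs)
r_doe = pearson(x_doe, y_doe)
print(f"observational r = {r_obs:.2f}, controlled r = {r_doe:.2f}")
```

Under these assumptions the observational correlation lands near its theoretical value of about 0.87, and the controlled one near zero: x predicts y in the history, but changing x does nothing to y.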

Case (A): Correlation models as a precursor to finding root causes.

A typical approach to determining root causes and their optimal settings consists of four steps:

1. Identify plausible factors (based on scientific laws, R&D history, and subject matter expertise). These are the Xs.

2. Collect historical data on these factors and on the variable they are supposed to affect. This is the Y.

3. Determine the X factors which are most highly correlated with the Y variable, e.g., through various types of regression or hypothesis testing (since all statistical tests between variables are tests of association).

4. Do controlled studies (DOEs) on the correlated factors to determine which are actually causally related to Y and what their optimal levels are.
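Step 3 on its own can be sketched in a few lines of Python. The factor names and numbers below are hypothetical, and Pearson correlation stands in for "various types of regression or hypothesis testing". The output is only a shortlist for the DOEs in step 4; a high |r| nominates a factor, it does not show that the factor causes Y.

```python
import math

def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

def rank_factors(factors, y):
    """Step 3: order candidate Xs by the strength |r| of their association with Y."""
    scores = [(name, pearson(xs, y)) for name, xs in factors.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical historical data (step 2): three candidate Xs and one Y.
factors = {
    "pressure": [5, 3, 6, 2, 4],
    "temperature": [100, 110, 120, 130, 140],
    "line_speed": [9, 6, 8, 4, 5],
}
y = [10, 12, 15, 17, 20]

for name, r in rank_factors(factors, y):
    print(f"{name:12s} r = {r:+.2f}")
```

Here the ranking comes out temperature, then line_speed, then pressure. Ranking by absolute value keeps strongly negative associations on the shortlist as well.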

Case (B): Regression and other correlation models as just prediction models.

These models are useful for forecasting, where we cannot or should not control the factors. In other words, we do not control the Xs to get the Y value we want. We only monitor the Xs, predict the Y value, and have action plans for various values. For example, we cannot cause customer demand to be what we want. Instead, we create correlation models (not causal models) using predictors (not root causes) to predict demand. Based on what the model predicts, we adjust our resources, schedules, and budgets, increase the sales force and marketing, etc.
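A minimal Case (B) sketch in Python, assuming a single leading indicator that we only monitor and never set. The indicator, the demand history, the capacity figure, and the action-plan thresholds are all hypothetical. A straight-line least-squares fit turns the indicator reading into a demand forecast, and the forecast selects a pre-agreed action plan.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    b = sxy / sxx
    return my - b * mx, b

def action_plan(predicted_demand, capacity):
    """Hypothetical action plans keyed to the forecast, not to any X we control."""
    if predicted_demand > 1.1 * capacity:
        return "increase resources, sales force and marketing"
    if predicted_demand < 0.8 * capacity:
        return "reduce overtime and reallocate staff"
    return "keep current schedule and budget"

# Hypothetical history: a leading indicator we only monitor, and observed demand.
indicator = [1, 2, 3, 4, 5]
demand = [12, 14, 16, 18, 20]

a, b = fit_line(indicator, demand)
forecast = a + b * 6  # next period's indicator reading is assumed to be 6
print(forecast, action_plan(forecast, capacity=19))
```

Nothing in this sketch tries to change the indicator to move demand; the model is used only to decide how to respond, which is the point of Case (B).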

The author gives the following advice: "To avoid model misspecification, first ask: Is there any functional relationship between the variables under consideration?" This advice applies if you are looking for causal factors, but not to prediction/forecasting models.

So the first question is why you are using a correlation model: case A or case B?

Robert Ballard

MBB – Global Productivity Solutions