Statistical modeling is used in process improvement to predict an output based on the critical inputs that affect it. You can think of the model as a crystal ball; in that comparison, the fitted values are the predictions the crystal ball makes. Even if a particular combination of input values has never been observed or recorded, those values can be plugged into the model to calculate fitted values and estimate what would happen in real life under those conditions.
Overview: What are fitted values?
Most Six Sigma projects use the DMAIC methodology to distill a large number of inputs (Xs) into the “critical few” that are proven to exert significant influence on the output (Y) of interest. The relationship between those inputs and the output is described by a mathematical formula, determined with statistical techniques such as regression and ANOVA.
This formula is often written as Y=f(X) as shorthand for the equation relating the inputs to the outputs.
Once the Y=f(X) relationship has been established and validated, it can be used to predict the value the output should center around at specific levels of the inputs. That predicted value is known as the fitted value. It is called “fitted” because the statistical methods used to create the Y=f(X) equation “fit” the equation to the observed data as closely as possible (in ordinary least squares regression, by minimizing the sum of the squared differences between the observed and fitted values), much as a tailor “fits” clothing to match the shape of the individual.
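To make the idea of “fitting” concrete, here is a minimal sketch of ordinary least squares for a single input, written with only the Python standard library. The data values and the helper names fit_line and fitted are hypothetical, chosen for illustration; real projects would typically rely on statistical software, and models with several inputs need matrix methods rather than these two simple formulas.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx              # slope that minimizes the sum of squared residuals
    b0 = y_bar - b1 * x_bar     # the least-squares line passes through (x_bar, y_bar)
    return b0, b1

def fitted(b0, b1, x):
    """Fitted value: the number the equation returns for input level x."""
    return b0 + b1 * x

# Hypothetical data: input X and output Y (illustrative numbers only)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
print("equation: Y =", round(b0, 3), "+", round(b1, 3), "* X")
print("fitted values:", [round(fitted(b0, b1, x), 2) for x in xs])
```

The fitted values land close to, but not exactly on, the observed outputs, which is the “as closely as possible” part of fitting.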
4 benefits and drawbacks of fitted values
While the definition of fitted values may seem very mathematical, or even abstract, the idea behind them has significant practical importance. In fact, it is the fitted values that enable us to optimize our processes in the Improve phase of a project! However, they can also mislead us in ways we should be aware of.
1. Prediction of the output
A valuable benefit of fitted values is that they enable us to predict an output even when we’ve never observed the specific levels of the inputs. Often, the optimal levels of those inputs have never been observed, but by using observed data and designed experiments, we can still calculate fitted values and identify a better process.
2. Understanding sources of variation
Another benefit of fitted values is that they allow us to explore how variation in one or more of the inputs translates into variation in the output of interest. This shows how much reducing variation in upstream factors can improve the downstream output.
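As a rough illustration, the sketch below feeds two sets of input levels, one loosely controlled and one tightly controlled, through the same fitted equation and compares the spread of the resulting fitted values. The equation Y = 0.05 + 1.99 * X and all input values are hypothetical. For a straight-line model, the spread of the fitted output is exactly |slope| times the spread of the input; note that this captures only the variation the model explains, not the random noise every real process adds on top.

```python
from statistics import pstdev

# Hypothetical fitted equation Y = B0 + B1 * X (illustrative coefficients only)
B0, B1 = 0.05, 1.99

def fitted(x):
    """Fitted value of the output at input level x."""
    return B0 + B1 * x

# Hypothetical upstream input levels under loose vs. tight control
loose_x = [2.0, 4.0, 3.0, 1.5, 4.5]
tight_x = [2.9, 3.1, 3.0, 2.95, 3.05]

for label, xs in (("loose", loose_x), ("tight", tight_x)):
    ys = [fitted(x) for x in xs]
    # Spread of fitted Y equals |B1| * spread of X for a straight-line model
    print(label, "input sd:", round(pstdev(xs), 3), "fitted output sd:", round(pstdev(ys), 3))
```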
3. Correlation vs. causation
One drawback is that fitted values from models built on observational data may not predict reality well, often because the inputs are merely correlated with the output rather than actually driving changes in it. This is why validating the prediction is important.
4. Extrapolation
Another drawback is that if the input levels of interest lie outside the region covered by the data used to build the model, the fitted values may be wildly off; this is typically referred to as extrapolation.
Why are fitted values important to understand?
Determining a Y=f(X) relationship is not about showing off statistical skills or complicating the improvement process; rather, the goal is to identify the combination of input levels that drives the best performance of the output. The fitted value is the link between the technical and the practical.
Fitted values allow us to translate the technical Y=f(X) relationship into practical knowledge. Rather than putting a lengthy, complicated mathematical equation in front of a team or leader, we can translate that equation into a simple prediction.
Understanding observation and expectation
Because random variation is inherent in every process, the data we observe in a single instance may not represent what we should expect to observe in the long term. Fitted values draw on the entire data set rather than a single observation to predict where repeated observations will center.
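The simplest case is a one-way ANOVA model with a single categorical input, where the fitted value for every observation in a group is that group's mean. The sketch below, using hypothetical yields under three hypothetical machine settings, contrasts a single run with the fitted value that pools all runs at the same setting.

```python
from statistics import mean

# Hypothetical yields observed under three machine settings (illustrative data)
observations = {
    "A": [10.2, 9.8, 10.5, 9.9],
    "B": [11.4, 12.0, 11.7],
    "C": [9.1, 8.7, 9.4, 9.0],
}

# In a one-way ANOVA model, the fitted value for each setting is the mean of
# all observations at that setting, not any single run.
fitted_values = {setting: mean(ys) for setting, ys in observations.items()}

for setting, ys in observations.items():
    print(setting, "first run:", ys[0], "fitted value:", round(fitted_values[setting], 3))
```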
An industry example of fitted values
In manufacturing, determining appropriate centerlines is critical to consistent operation. However, centerlines are often determined by observation alone, and how each one interacts with the others is poorly understood. Because decisions about appropriate centerlines are driven by experience with only a few settings, operators and mechanics on different shifts might use completely different settings.
By collecting process performance data at different centerline settings, either during routine operation or through a designed experiment, we can establish a Y=f(X) relationship and, from it, calculate fitted values for different centerline combinations. We can then predict the best combination to use and confirm it through a pilot.
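Here is a sketch of the designed-experiment route, assuming a hypothetical 2x2 full factorial on two centerlines coded as -1 (low) and +1 (high), with made-up responses. Because the coded design is orthogonal, each least-squares coefficient reduces to the sum of response times column, divided by the number of runs. The fitted equation is then evaluated across a grid inside the tested region, including settings that were never run, to suggest a combination worth piloting.

```python
from itertools import product

# Hypothetical 2x2 full factorial on two centerlines, in coded units
# (-1 = low setting, +1 = high setting); responses are illustrative only.
runs = {
    (-1, -1): 52.0,
    (+1, -1): 60.0,
    (-1, +1): 55.0,
    (+1, +1): 71.0,
}

n = len(runs)
# Orthogonal coded design: each coefficient = sum(response * column) / n
b0 = sum(runs.values()) / n
b1 = sum(y * x1 for (x1, x2), y in runs.items()) / n
b2 = sum(y * x2 for (x1, x2), y in runs.items()) / n
b12 = sum(y * x1 * x2 for (x1, x2), y in runs.items()) / n

def fitted(x1, x2):
    """Fitted value from Y = b0 + b1*X1 + b2*X2 + b12*X1*X2 (coded units)."""
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2

# Stay inside the tested region (-1 to +1) to avoid extrapolation
grid = [i / 2 for i in range(-2, 3)]  # -1.0, -0.5, 0.0, 0.5, 1.0
best = max(product(grid, grid), key=lambda s: fitted(*s))
print("coefficients:", b0, b1, b2, b12)
print("fitted value at center (never run):", fitted(0, 0))
print("best coded settings:", best, "fitted value:", fitted(*best))
```

With four runs and four terms, this model reproduces the four observations exactly; a real experiment would usually add replicates or center points so that error and curvature can be assessed before the pilot.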
3 best practices when thinking about fitted values
While fitted values are very helpful in predicting the output, the potential pitfalls require practitioners to follow certain guidelines.
1. Avoid extrapolation
Extrapolation means making predictions for input values outside the range of values observed in the dataset used to build the model. For example, if you model gas mileage versus speed using a dataset containing speeds from 35 to 75 miles per hour, it would be inappropriate to use the resulting equation to predict mileage at 120 miles per hour.
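One simple safeguard is to refuse to compute a fitted value when the requested input falls outside the observed range. The sketch below assumes a hypothetical helper, check_in_range, and hypothetical speeds spanning the 35 to 75 mph range from the gas mileage example. With several inputs, a per-input range check like this can still miss combinations that lie outside the region the data actually covered.

```python
def check_in_range(x_new, observed_xs):
    """Raise ValueError if x_new lies outside the range of inputs used to build the model."""
    lo, hi = min(observed_xs), max(observed_xs)
    if not lo <= x_new <= hi:
        raise ValueError(
            f"{x_new} is outside the observed range [{lo}, {hi}]; "
            "a fitted value here would be an extrapolation"
        )
    return x_new

# Hypothetical speeds (mph) in the dataset used to build the mileage model
speeds = [35, 45, 55, 65, 75]

check_in_range(60, speeds)           # inside the observed range: OK to predict
try:
    check_in_range(120, speeds)      # far outside the observed range
except ValueError as err:
    print(err)
```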
2. Verify results
Before communicating project success or calculating financial benefits, a simple and effective safeguard against issues such as faulty data and non-causal relationships is to run the process at the settings of interest and verify that the observed output is close to the fitted value.
3. Use confidence and prediction intervals
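Statistical software packages report not only the fitted values but also confidence and prediction intervals around them. A confidence interval describes the uncertainty in the average output expected at those input levels, while the wider prediction interval describes the range expected for a single new observation; sharing these intervals helps others understand how much uncertainty surrounds the fitted value.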
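For a straight-line model, both intervals can be computed by hand. In the minimal sketch below, the data are hypothetical and the function name intervals is illustrative. The two-sided t critical value is passed in rather than computed, since the Python standard library has no t distribution; it could come from a t table or, if SciPy is available, scipy.stats.t.ppf(0.975, n - 2). A confirmation run that lands outside the prediction interval would be a signal to revisit the model.

```python
import math
from statistics import mean

def intervals(xs, ys, x0, t_crit):
    """Return the fitted value at x0, a confidence interval for the mean response,
    and a prediction interval for one new observation, for y = b0 + b1 * x.

    t_crit: two-sided Student's t critical value with len(xs) - 2 degrees of freedom.
    """
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))  # residual standard error
    y_hat = b0 + b1 * x0
    h = 1 / n + (x0 - x_bar) ** 2 / sxx    # grows as x0 moves away from x_bar
    ci_half = t_crit * s * math.sqrt(h)      # uncertainty in the mean response
    pi_half = t_crit * s * math.sqrt(1 + h)  # adds the scatter of a single new value
    return y_hat, (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)

# Hypothetical data (illustrative only)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
# 3.182 is the two-sided 95% t critical value for 3 degrees of freedom
y_hat, ci, pi = intervals(xs, ys, 3.5, t_crit=3.182)
print("fitted:", round(y_hat, 3))
print("95% confidence interval:", tuple(round(v, 3) for v in ci))
print("95% prediction interval:", tuple(round(v, 3) for v in pi))
```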
Frequently Asked Questions (FAQ) about fitted values
1. How are fitted values calculated?
The Y=f(X) relationship is literally a math formula. Although software commonly presents output in tables and not explicitly as an equation, behind the scenes, that is exactly what has been determined. The fitted value is simply the number this equation returns when specific values for the inputs are plugged into the equation.
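As a sketch of the plug-in step, suppose software reported a coefficient table for a hypothetical two-input linear model, Y = 12.0 + 0.8 * Temperature - 1.5 * Pressure. The input names, coefficients, and the helper fitted_value are all illustrative assumptions, not output from any particular package.

```python
# Hypothetical coefficient table for a two-input linear model (illustrative only):
# Y = 12.0 + 0.8 * Temperature - 1.5 * Pressure
coefficients = {"Intercept": 12.0, "Temperature": 0.8, "Pressure": -1.5}

def fitted_value(coefs, inputs):
    """Plug the input levels into the linear equation and return the fitted value."""
    return coefs["Intercept"] + sum(coefs[name] * level for name, level in inputs.items())

# 12.0 + 0.8 * 50.0 - 1.5 * 4.0
print(fitted_value(coefficients, {"Temperature": 50.0, "Pressure": 4.0}))
```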
2. Why don’t my fitted values match my data?
Even when fitted values are calculated at exactly the input levels that were observed, they typically will not match the observed outputs. Because random variation is inherent in virtually all processes, the observed value cannot be thought of as the “correct” value, and a mismatch does not mean that either the observed value or the fitted value is wrong.
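The gap between an observed value and its fitted value is called the residual (observed minus fitted). The sketch below fits a least-squares line to hypothetical data and prints each residual; with an intercept in the model, least-squares residuals sum to essentially zero, so the misses above and below the line balance out rather than indicating an error.

```python
from statistics import mean

# Hypothetical data (illustrative only)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Least-squares straight line y = b0 + b1 * x
x_bar, y_bar = mean(xs), mean(ys)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

# Residual = observed - fitted; nonzero residuals are expected, not mistakes
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
for x, y, r in zip(xs, ys, residuals):
    print("X:", x, "observed:", y, "fitted:", round(y - r, 2), "residual:", round(r, 2))

# With an intercept in the model, the residuals sum to (numerically) zero
print("sum of residuals:", round(sum(residuals), 10))
```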
Fortune tellers and fitted values
If the Y=f(X) formula is a fortune teller looking into a crystal ball, then the fitted values are the images the fortune teller sees. But rather than being based on dubious sources like tea leaves, they are based on real data and are often quite accurate; in fact, once we’ve validated their accuracy, we are willing to change our processes based on them.