Three reasons that R-Sq (adj) should be used instead:

1. takes into consideration the number of data points used in the regression model;

2. takes into consideration the number of terms in the regression equation;

3. is more conservative than R-Sq.

However, it is a good practice to compare R-Sq and R-Sq (adj) to be sure they are close in value, as a quick cross-check.

]]>Residual analysis is critical to successfully applying the tool.

]]>It cancels the effect of having both positive and negative values

It magnifies (penalizes) the larger errors.”

You haven’t explained why it is necessary to penalize larger errors to find a line that “fits” or explains the data? You could penalize larger errors in many ways: multiply only large errors by a factor, use any even power, e.g., fourth and not just second power (square), use only errors that are greater than a certain percentages. In addition, by “penalizing” larger errors, those values will be substantially more influential in determining the line. The result is that a single point can make the least squares line be substantially misleading for most points.

The better explanation for squaring is that it provides additional and tractable statistical benefits: hypothesis testing and confidence intervals using unbiased estimates of the variances. However, this does not mean that the least squares line is the most useful one.

If your purpose is to find a line that contains either the averages or proportion of individual values within a predefined limit, other ways of determining that line are better. First, determine why you want to fit a line to data and then determine what method(s) will be better. You might choose the absolute deviation approach—or other methods found in the literature.

“Therefore a much more important indicator of the validity of the model is – as always – the p-value.” The p-value is probably the least important indicator of model validity.

Again, depending on the purpose, other criteria that evaluate the extent the purpose is met are better indicators. No one (I hope) has as a purpose to fitting a line to data that the p value be significant. Rather most often the purpose is to predict. Hence, the accuracy and frequency (probability) of correct predictions are better indicators for the prediction purpose.

Check Minitab for definition of influential points. You will see that one type is a point far from a fitted line in a vertical direction (Y). This influence is exaggerated using least squares. The other type is a point far from the others in a horizontal (X) direction. This will increase R-square and lead to mistakenly significant p-values.

]]>1.5 divided by 6 does not equal 0.75, it equals 0.25.

]]>