I have a few queries about the interpretation of certain terms in Minitab related to Regression (GLM) and ANOVA. There are also a few statistical concepts which I encountered in my research, and I am taking the liberty of asking about them as well.
1) What is the difference between ANOVA in Regression and ANOVA in general as displayed in Minitab? In general how should one interpret ANOVA in regression?
2) When we look at ANOVA output in Minitab we also see an R-Sq value. Where does this come from?
3) I understand what residuals are, but why should one do a normality test on residuals? Also, I have read that after a regression the residuals should be plotted against the predicted values. Why?
Except for the specifics of Minitab (which I don’t know, so I can’t comment on them), you are asking some very large, broad-brush questions whose answers are beyond the scope of a forum of this type.
I’ll offer the following – Chapter 9 Applied Regression Analysis 2nd Edition – Draper and Smith will give you a very readable explanation of the issues you raised in questions #1 and #2. Pages 8-24 and Chapter 3 of the same book will give you a detailed explanation of #3.
The short, relatively non-informative answers to your questions are as follows:
For #1 and #2:
ANOVA and regression are two sides of the same coin. From p. 423: “One frequently used method of analysis on data is the analysis of variance technique. It is usually treated as being something foreign to and quite different from general regression, and some workers are unaware that any “fixed-effect” (sometimes called Model I) analysis of variance situation can be handled by a general regression routine, if the model is correctly identified and if precautions are taken to achieve independent normal equations… One thought behind this comment is the fact that the question “What model are you considering?” is often met with “I am not considering one – I am using ANOVA.” The realization that a model exists for all ANOVA situations, and that it and it alone is the basis for the construction of an ANOVA table, might be aided by knowing that ANOVA is practically equivalent to a regression analysis.” So where does R-Sq come from? It is a basic check of how much of the variation is explained by the model behind the ANOVA table: the sum of squares explained by the model divided by the total sum of squares.
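To make the "two sides of the same coin" point concrete, here is a minimal sketch in plain Python (not Minitab output, and not taken from Draper and Smith). It assumes two groups of invented measurements, runs a one-way ANOVA by hand, then fits a regression on a 0/1 indicator for group membership. The data values and all variable names are made up for illustration.

```python
# Two-group one-way ANOVA versus regression on a 0/1 indicator variable.
# The data values are invented for illustration only.
group_a = [4.1, 5.0, 4.6, 5.3, 4.8]
group_b = [6.2, 5.9, 6.8, 6.4, 7.0]

def mean(values):
    return sum(values) / len(values)

y = group_a + group_b
n = len(y)
grand_mean = mean(y)

# One-way ANOVA: between-group and within-group sums of squares.
groups = (group_a, group_b)
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
f_anova = (ss_between / 1) / (ss_within / (n - 2))

# Regression y = b0 + b1 * x, with x = 0 for group A and x = 1 for group B.
x = [0.0] * len(group_a) + [1.0] * len(group_b)
x_bar = mean(x)
sxy = sum((xi - x_bar) * (yi - grand_mean) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = grand_mean - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# ANOVA table for the regression: total = regression + residual.
ss_total = sum((yi - grand_mean) ** 2 for yi in y)
ss_regression = sum((fi - grand_mean) ** 2 for fi in fitted)
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
f_regression = (ss_regression / 1) / (ss_residual / (n - 2))

# R-Sq is the share of the total sum of squares explained by the model.
r_squared = ss_regression / ss_total

print(f"ANOVA:      SS between    = {ss_between:.4f}  F = {f_anova:.4f}")
print(f"Regression: SS regression = {ss_regression:.4f}  F = {f_regression:.4f}")
print(f"R-Sq = {r_squared:.4f}")
```

The between-group sum of squares and the regression sum of squares come out the same, as do the two F ratios, because the fitted values of the indicator regression are just the group means. R-Sq is then read straight off the same table.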
Regression is just the simple act of algebraically fitting a line/surface through a cloud of points and the equations for doing this can be found in any basic book on regression. Regression ANALYSIS, on the other hand, involves assessing the fit of the surface and the correctness of the terms in the regression. This assessment requires the following assumptions:
1. The errors are normally distributed
2. They have a mean of zero
3. They are statistically independent
4. They have a constant variance
The assumption of normality of the residuals is needed to validate the use of the t and F distributions as exact distributions for the construction of test statistics (for the significance of the model terms) and confidence intervals. Thus you need to check the distribution of the residuals to confirm the adequacy of the regression effort. (A word of caution – you REALLY want to plot the residuals and not just slavishly count on normality tests. The reason for this is that it is very easy to fail one or more of these tests. An acceptable plot of the residuals – histogram, normal probability plot, etc. – will tell you whether or not this failure matters.)
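For readers who want to see what goes into a normal probability plot, here is a rough sketch using only the Python standard library. The residual values are invented, the (i - 0.5) / n plotting position is just one common choice, and the helper names are hypothetical. The correlation at the end is only a numeric summary of how straight the plot is; it is not a substitute for looking at the plot, and no cut-off value is implied.

```python
from statistics import NormalDist

# Hypothetical residuals from some fitted model (invented values).
residuals = [-1.2, 0.4, 0.9, -0.3, 0.1, -0.7, 1.5, -0.5, 0.2, -0.4]

def normal_plot_points(values):
    """Pair each sorted value with the normal quantile expected at its rank."""
    n = len(values)
    ordered = sorted(values)
    # (i - 0.5) / n is one common plotting position; others exist.
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))

def correlation(pairs):
    """Pearson correlation between the two coordinates of the points."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in pairs)
    sxx = sum((a - x_bar) ** 2 for a in xs)
    syy = sum((b - y_bar) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5

points = normal_plot_points(residuals)
for quantile, value in points:
    print(f"normal quantile = {quantile:6.3f}   residual = {value:5.2f}")

# Values close to 1 mean the points lie near a straight line.
straightness = correlation(points)
print(f"correlation of plot points = {straightness:.3f}")
```

Plotting the pairs in `points` (quantile on one axis, residual on the other) gives the normal probability plot; roughly normal residuals fall close to a straight line.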
As for plotting residuals against predicted values – that is all part and parcel of the act of Regression ANALYSIS. The investigator examines these plots for patterns. Some of the more common ones are linear trends, curvilinear trends, V mask shapes, and residual banding. Each of these patterns provides understanding with respect to the adequacy of the model fit and the need for things such as term addition, variable transformation, etc. Again, Chapter 3 of the referenced book will have the details.
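As a small illustration of one such pattern, the sketch below fits a straight line to invented data that actually follow a curve, then lists the residuals against the fitted values. The data and the helper name `fit_line` are assumptions made up for this example; it uses plain Python and prints a text table in place of a plot.

```python
# Invented data with curvature: y grows as x squared.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [xi ** 2 for xi in x]

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Text version of a residuals-versus-fitted plot.
for f, r in zip(fitted, residuals):
    sign = "+" if r > 1e-9 else "-" if r < -1e-9 else "0"
    print(f"fitted = {f:7.2f}   residual = {r:6.2f}   {sign}")
```

The residuals are positive at both ends and negative in the middle. That systematic arc is a curvilinear trend, which suggests the straight-line model is missing something, for example a squared term or a transformation of a variable.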
If you can’t get this book through inter-library loan, another good reference that addresses the regression parts of your question is Regression Analysis by Example by Chatterjee and Price.
Thank you Robert. I will try and procure the books and go through the relevant chapters.
Quite the detailed question. Nice to take the time for that–@rbutler
@rbutler This is two forum posts I’ve read in five minutes in which you’ve casually mentioned not using Minitab. If you’re trying to drop subtle hints to get my attention, don’t be shy…just let me know you want to be in the Minitab family!
Minitab is fine Joel – no problems and I used it a long time ago. For the work I do I use SAS and R. The only reason for mentioning the lack of knowledge of the current Minitab program is because there are program differences and how you pose your problem to a program varies.