Residuals Properties in Linear and NonLinear Regression Models
 This topic has 5 replies, 2 voices, and was last updated 1 year, 7 months ago by Robert Butler.


April 23, 2021 at 8:33 am #253478
and.pisano (Participant, @and.pisano)
Hi all,
I know that for linear regression (simple and multiple) we assume:
Homoscedasticity: The variance of residual is the same for any value of X.
Independence: Observations are independent of each other.
Normality: For any fixed value of X, Y is normally distributed.
Normality of residuals tells us if the regression model is strong.
I wonder if this condition is valid even for nonlinear regression and in general if the properties that I mentioned before are assumptions for nonlinear regression.
Thanks
April 23, 2021 at 1:03 pm #253488
Robert Butler (Participant, @rbutler)
I’m afraid most of what you have stated is wrong.
My reference is Applied Regression Analysis 2nd Edition – Draper and Smith
1. There are no restrictions on the distributions for either the X or the Y. The question of normality (or approximate normality) is one that is restricted to just the residuals.
The variance of the residuals is what it is, and there are no caveats concerning that variance as a function of the X’s as far as a requirement for regression is concerned.
The key points can be found on pages 22 and 23 of the cited reference. The short quoted version is this:
“[with regard to regression] Up to this point we have made no assumptions at all that involve probability distributions. A number of specified algebraic calculations have been made and that is all. We now make the basic assumptions that for a model of Y = fn(Xi) + e (I can’t write epsilon using this platform so I’m using “e” in its place)
1. e is a random variable with mean zero and variance sigma**2.
2. e(i) and e(j) are uncorrelated such that cov(e(i),e(j)) = 0
3. e(i) is a normally distributed random variable, with mean 0 and variance sigma**2”
There are no other assumptions/requirements.
I don’t know what you mean by “Normality of residuals tells us if the regression model is strong.”
If the residuals, plotted on normal probability paper, are not “acceptably” normal (that is, they fail the fat pencil test), or if a histogram of the residuals looks bimodal or log normal, or if a plot of the residuals against the predicted values shows significant linear or curvilinear trends or < or > shapes, then the residuals are telling you there are still things you need to address before accepting the model. Chapter 3 of the same reference covers most of this territory (there are other shapes for plots of residuals against predicted, but those mentioned are the ones most often encountered).
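As a rough illustration of the residual checks described above, here is a minimal Python sketch using only the standard library. The simulated data, the seed, and the helper names (`fit_line`, `corr`, `normal_plot_corr`) are assumptions made for this example and do not come from Draper and Smith. The normal plot correlation is a numeric stand-in for judging straightness on normal probability paper, and the correlation of absolute residuals with predicted values is one crude way to look for < or > shapes.

```python
import random
from statistics import NormalDist, mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def corr(a, b):
    """Pearson correlation between two equal-length sequences."""
    abar, bbar = mean(a), mean(b)
    num = sum((p - abar) * (q - bbar) for p, q in zip(a, b))
    den = (sum((p - abar) ** 2 for p in a) * sum((q - bbar) ** 2 for q in b)) ** 0.5
    return num / den

def normal_plot_corr(resid):
    """Correlation between the sorted residuals and standard normal quantiles.

    Values close to 1 correspond to a nearly straight normal probability plot.
    """
    n = len(resid)
    quantiles = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return corr(sorted(resid), quantiles)

# Hypothetical data: a straight line plus normal noise (assumed for illustration).
random.seed(1)
x = [i / 10 for i in range(1, 101)]
y = [2.0 + 0.5 * xi + random.gauss(0, 0.3) for xi in x]

intercept, slope = fit_line(x, y)
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Straightness of the normal probability plot of the residuals.
print("normal plot correlation:", normal_plot_corr(residuals))
# Funnel check: does the size of the residuals change with the predicted value?
print("|residual| vs predicted:", corr([abs(r) for r in residuals], predicted))
```

Numbers like these are no substitute for looking at the plots themselves; the sketch only shows what each check measures.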
As for nonlinear – that is models that are nonlinear in the parameters (not models that just happen to have higher orders of the X’s – these are still linear regression models) – the same rules apply.
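The distinction between "nonlinear in the parameters" and "higher orders of the X’s" can be checked numerically. In the sketch below, a model counts as linear in its parameters when its partial derivatives with respect to those parameters do not depend on the parameter values. The two example models, the test points, and the function names are hypothetical choices made for illustration.

```python
import math

def quadratic(x, b):
    """b0 + b1*x + b2*x^2: higher order in X, but linear in the parameters."""
    return b[0] + b[1] * x + b[2] * x ** 2

def exponential(x, b):
    """b0 * exp(b1*x): nonlinear in the parameter b1."""
    return b[0] * math.exp(b[1] * x)

def gradient(model, x, b, h=1e-6):
    """Central-difference partial derivatives of the model w.r.t. each parameter."""
    grads = []
    for j in range(len(b)):
        up = list(b)
        up[j] += h
        down = list(b)
        down[j] -= h
        grads.append((model(x, up) - model(x, down)) / (2 * h))
    return grads

def is_linear_in_parameters(model, x_values, b_first, b_second, tol=1e-4):
    """True if the parameter gradients agree at two different parameter vectors."""
    for x in x_values:
        g1 = gradient(model, x, b_first)
        g2 = gradient(model, x, b_second)
        if any(abs(p - q) > tol for p, q in zip(g1, g2)):
            return False
    return True

xs = [0.5, 1.0, 2.0]
print("quadratic linear in parameters:",
      is_linear_in_parameters(quadratic, xs, [1.0, 2.0, 3.0], [4.0, -1.0, 0.5]))
print("exponential linear in parameters:",
      is_linear_in_parameters(exponential, xs, [1.0, 0.5], [2.0, 1.0]))
```

Comparing gradients at only two parameter vectors is a spot check, not a proof, but it is enough to separate these two cases.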
The need for acceptable normality in the residuals is because the t and the F tests are the means used to check for term significance.
April 23, 2021 at 1:38 pm #253489
Robert Butler (Participant, @rbutler)
It occurred to me your phrase “Homoscedasticity: The variance of residual is the same for any value of X.” could be interpreted as a short verbal summary of the paragraph I wrote concerning what to look for when running the residual analysis. If this is the case then the statement is true but I think it is far too brief and could easily mislead people with respect to what one should do when assessing residuals.
April 25, 2021 at 2:46 pm #253522
and.pisano (Participant, @and.pisano)
Thank you for your answer.
I reported what I got from some websites specializing in Six Sigma applications (for example https://sphweb.bumc.bu.edu/otlt/MPHModules/BS/R/R5_CorrelationRegression/R5_CorrelationRegression4.html, https://opexresources.com/analysisresidualsexplained/), as I didn’t find one specific source to rely on.
I was interested in possible differences between linear and nonlinear regression regarding residuals.
April 25, 2021 at 4:37 pm #253523
Robert Butler (Participant, @rbutler)
Well, the best I can tell you is what I said in my first post – most of what you quoted is wrong.
Specifically – to the points made on the first site:
“There are four assumptions associated with a linear regression model:”
“Linearity: The relationship between X and the mean of Y is linear. ”
Not true – see the reference I gave in the first post.
“Homoscedasticity: The variance of residual is the same for any value of X.”
True – if you have a fit that accounts for all of the special cause variation – not true otherwise.
Most importantly it is not an assumption – it is a result of an adequate fit to the data.
“Independence: Observations are independent of each other.”
True.
“Normality: For any fixed value of X, Y is normally distributed.”
Not true – see the reference I gave in the first post.
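The point that a well-behaved residual pattern is a result of an adequate fit, rather than an assumption, can be illustrated with a small simulation. The sketch below is an assumption-laden example, not anything taken from the cited reference: the data come from a hypothetical quadratic relationship with constant-variance noise, and the helper names are made up for this example. A straight-line fit leaves a strong curvilinear pattern in the residuals, and adding the missing squared term removes it.

```python
import random
from statistics import mean

def line_residuals(x, y):
    """Residuals from an ordinary least squares fit of y = b0 + b1*x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def corr(a, b):
    """Pearson correlation between two equal-length sequences."""
    abar, bbar = mean(a), mean(b)
    num = sum((p - abar) * (q - bbar) for p, q in zip(a, b))
    den = (sum((p - abar) ** 2 for p in a) * sum((q - bbar) ** 2 for q in b)) ** 0.5
    return num / den

# Hypothetical data: quadratic signal plus noise with constant variance.
random.seed(7)
x = [i / 10 for i in range(-50, 51)]
y = [1.0 + 0.8 * xi + 0.6 * xi ** 2 + random.gauss(0, 0.5) for xi in x]

# Inadequate model: straight line only.
resid_line = line_residuals(x, y)

# Adequate model: add the x^2 term. Regressing the line residuals on the part
# of x^2 not explained by the line gives the same residuals as a full
# least squares fit of y on 1, x, and x^2.
x_squared = [xi ** 2 for xi in x]
x_squared_resid = line_residuals(x, x_squared)
coef_x2 = (sum(p * q for p, q in zip(x_squared_resid, resid_line))
           / sum(p * p for p in x_squared_resid))
resid_quad = [r - coef_x2 * s for r, s in zip(resid_line, x_squared_resid)]

# Curvature check: correlation of the residuals with (x - mean(x))^2.
xbar = mean(x)
curvature = [(xi - xbar) ** 2 for xi in x]
print("line fit, residual curvature:", corr(resid_line, curvature))
print("quadratic fit, residual curvature:", corr(resid_quad, curvature))
print("estimated x^2 coefficient:", coef_x2)
```

The noise variance is the same in both fits; only the adequacy of the model changes, and that alone decides whether the residuals show a pattern against X.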
April 25, 2021 at 5:55 pm #253524
Robert Butler (Participant, @rbutler)
As for the second link the following commentary from their site
“Consider the two regression models, and their residuals plots, shown here:
The (lower) plots show the residuals for each model (the residuals are the errors between the regression lines and the actual data points). It can be seen that:
1) The residuals for the ‘good’ regression model are Normally distributed, and random.
2) The residuals for the ‘bad’ regression model are nonNormal, and have a distinct, nonrandom pattern.
Using this knowledge, the validity of a regression model can be assessed by looking at its residuals.”
isn’t wrong but it is a very poor and misleading “explanation” of the whys and wherefores of residual analysis.
For starters the choice of the words, in quotes, “good” and “bad” is terrible. The issue isn’t one of “good” or “bad” nor is it one of “validity” – it is one of adequate fit to the data.
The greatest failing of the text on that site is it doesn’t tell you anything about what nonrandom residual patterns tell you about the shortcomings of your regression effort, nor does it explain how to use those patterns to further analyze your data to resolve those shortcomings.