iSixSigma

Residuals Properties in Linear and Non-Linear Regression Models


  • #253478

    and.pisano
    Participant

    Hi all,
    I know that for linear regression (simple and multiple) we assume:
    Homoscedasticity: The variance of residual is the same for any value of X.
    Independence: Observations are independent of each other.
    Normality: For any fixed value of X, Y is normally distributed.
    Normality of residuals tells us if the regression model is strong.
    I wonder whether this condition also holds for non-linear regression and, more generally, whether the properties I listed above are also assumptions for non-linear regression.
    Thanks

    #253488

    Robert Butler
    Participant

    I’m afraid most of what you have stated is wrong.

    My reference is Applied Regression Analysis 2nd Edition – Draper and Smith

    1. There are no restrictions on the distributions for either the X or the Y. The question of normality (or approximate normality) is one that is restricted to just the residuals.

    The variance of the residuals is what it is, and there are no caveats concerning that variance as a function of the X’s as far as a requirement for regression is concerned.

    The key points can be found on pages 22 and 23 of the cited reference. The short quoted version is this:

    “[with regard to regression] Up to this point we have made no assumptions at all that involve probability distributions. A number of specified algebraic calculations have been made and that is all. We now make the basic assumptions that for a model of Y = fn(Xi +e) (I can’t write epsilon using this platform so I’m using “e” in its place)

    1. e is a random variable with mean zero and variance sigma**2.
    2. e(i) and e(j) are uncorrelated such that cov(e(i),e(j)) = 0
    3. e(i) is a normally distributed random variable, with mean 0 and variance sigma**2”

    There are no other assumptions/requirements.
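
    Since epsilon is hard to type here, the three quoted assumptions can be restated in standard notation. This is only a sketch: it assumes the straight-line form of the model, and the symbols beta_0, beta_1 and n are my own labels rather than part of the quotation.

```latex
\begin{align*}
Y_i &= \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, \dots, n \\
E(\varepsilon_i) &= 0, \qquad \operatorname{Var}(\varepsilon_i) = \sigma^2 \\
\operatorname{Cov}(\varepsilon_i, \varepsilon_j) &= 0 \qquad (i \neq j) \\
\varepsilon_i &\sim N(0, \sigma^2)
\end{align*}
```

    Every statement is about the error term; none of them puts a distribution on X or on Y as a whole.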

    I don’t know what you mean by “Normality of residuals tells us if the regression model is strong.”

    When plotted on normal probability paper, if the residuals are not “acceptably” normal (that is, they do not pass the fat pencil test), or if a histogram of the residuals looks bimodal or log-normal, or if a plot of the residuals against the predicted values shows significant linear or curvilinear trends or < or > shapes, then the residuals are telling you there are still things you need to address before accepting a model for a test. Chapter 3 of the same reference covers most of this territory (there are other shapes for plots of residuals against predicted, but those mentioned are the ones most often encountered).
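
    For anyone who wants to try the normal probability plot without the paper, here is a minimal sketch in plain Python. The data are simulated, the function names are hypothetical, and the plotting positions (i - 0.5)/n are just one common convention. The correlation it prints is a rough numeric stand-in for how straight the plotted points are; it does not replace looking at the plot.

```python
import random
from statistics import NormalDist, mean

def fit_straight_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    abar, bbar = mean(a), mean(b)
    num = sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))
    den = (sum((ai - abar) ** 2 for ai in a)
           * sum((bi - bbar) ** 2 for bi in b)) ** 0.5
    return num / den

def normal_plot_points(residuals):
    """Return (theoretical normal quantiles, ordered residuals) for plotting."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return quantiles, ordered

random.seed(1)  # fixed seed so the simulated data are reproducible
x = [i / 2 for i in range(40)]
y = [3.0 + 2.0 * xi + random.gauss(0.0, 1.0) for xi in x]  # simulated, not real data

b0, b1 = fit_straight_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
quantiles, ordered = normal_plot_points(residuals)

# Points close to a straight line give a correlation close to 1
straightness = pearson(quantiles, ordered)
print(f"normal-plot correlation: {straightness:.3f}")
```

    Plotting `ordered` against `quantiles` gives the picture you would otherwise draw on probability paper.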

    As for non-linear – that is models that are non-linear in the parameters (not models that just happen to have higher orders of the X’s – these are still linear regression models) – the same rules apply.
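
    To make the distinction concrete, here are two example models of my own choosing (they are not taken from the reference). The first has a squared X term but is still linear in the parameters; the second is non-linear in the parameters.

```latex
\begin{align*}
\text{linear in the parameters:} \quad & Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon \\
\text{non-linear in the parameters:} \quad & Y = \beta_0 \, e^{\beta_1 X} + \varepsilon
\end{align*}
```

    A quick check is to differentiate the model with respect to each parameter. In the first model none of the derivatives (1, X, X^2) involves a parameter; in the second, the derivative with respect to beta_0 is e^{beta_1 X}, which still depends on beta_1.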

    Acceptable normality in the residuals is needed because the t and the F tests are the means used to check for term significance.
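
    As a rough illustration of the t test for a single term, the sketch below computes the t statistic for the slope of a straight-line fit. The data are invented for the example, the variable names are my own, and 2.306 is the tabulated two-sided 5% critical value of Student's t with 8 degrees of freedom.

```python
from statistics import mean

# Hypothetical data, invented purely for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

n = len(x)
xbar, ybar = mean(x), mean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Least-squares estimates for y = b0 + b1 * x
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# Residual variance estimate with n - 2 degrees of freedom
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
df = n - 2
s2 = sum(r ** 2 for r in residuals) / df

# Standard error of the slope and its t statistic
se_b1 = (s2 / sxx) ** 0.5
t_stat = b1 / se_b1

T_CRIT = 2.306  # two-sided 5% critical value of Student's t, 8 degrees of freedom
significant = abs(t_stat) > T_CRIT
print(f"b1 = {b1:.4f}, t = {t_stat:.1f}, significant at 5%: {significant}")
```

    Comparing t_stat with a t table is only justified to the extent that the residuals are acceptably normal, which is the point of checking them.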

    #253489

    Robert Butler
    Participant

    It occurred to me your phrase “Homoscedasticity: The variance of residual is the same for any value of X.” could be interpreted as a short verbal summary of the paragraph I wrote concerning what to look for when running the residual analysis. If this is the case then the statement is true but I think it is far too brief and could easily mislead people with respect to what one should do when assessing residuals.

    #253522

    and.pisano
    Participant

    Thank you for your answer.
    I reported what I found on some websites that cover Six Sigma applications (for example https://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/R/R5_Correlation-Regression/R5_Correlation-Regression4.html, https://opexresources.com/analysis-residuals-explained/), as I didn’t find one specific source to rely on.

    I was interested in possible differences between linear and non-linear regression regarding residuals.

    #253523

    Robert Butler
    Participant

    Well, the best I can tell you is what I said in my first post – most of what you quoted is wrong.

    Specifically – to the points made on the first site:

    “There are four assumptions associated with a linear regression model:”

    “Linearity: The relationship between X and the mean of Y is linear. ”

    Not true – see the reference I gave in the first post.

    “Homoscedasticity: The variance of residual is the same for any value of X.”

    True – if you have a fit that accounts for all of the special cause variation – not true otherwise.
    Most importantly, it is not an assumption; it is a result of an adequate fit to the data.

    “Independence: Observations are independent of each other.”

    True.

    “Normality: For any fixed value of X, Y is normally distributed.”

    Not true – see the reference I gave in the first post.

    #253524

    Robert Butler
    Participant

    As for the second link the following commentary from their site

    “Consider the two regression models, and their residuals plots, shown here:

    The (lower) plots show the residuals for each model (the residuals are the errors between the regression lines and the actual data points). It can be seen that:

    1) The residuals for the ‘good’ regression model are Normally distributed, and random.
    2) The residuals for the ‘bad’ regression model are non-Normal, and have a distinct, non-random pattern.

    Using this knowledge, the validity of a regression model can be assessed by looking at its residuals.”

    isn’t wrong but it is a very poor and misleading “explanation” of the whys and wherefores of residual analysis.

    For starters the choice of the words, in quotes, “good” and “bad” is terrible. The issue isn’t one of “good” or “bad” nor is it one of “validity” – it is one of adequate fit to the data.

    The greatest failing of the text on that site is it doesn’t tell you anything about what non-random residual patterns tell you about the shortcomings of your regression effort, nor does it explain how to use those patterns to further analyze your data to resolve those shortcomings.
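
    As one example of using a residual pattern, the sketch below fits a straight line to simulated data that actually contain an X-squared term, then adds that term and fits again. Everything here is an assumption made for illustration: the data are simulated, the function names are hypothetical, and the "curvature score" (the correlation between the residuals and the squared, centered predicted values) is only a crude numeric summary of the bowl shape you would see in a plot of residuals against predicted.

```python
import random
from statistics import mean

def least_squares(columns, y):
    """Solve the normal equations (X'X) b = X'y by Gaussian elimination."""
    k = len(columns)
    # Augmented matrix [X'X | X'y]
    m = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(a * yi for a, yi in zip(columns[i], y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    beta = [0.0] * k
    for r in reversed(range(k)):
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, k))
        beta[r] = (m[r][k] - tail) / m[r][r]
    return beta

def residuals_and_fitted(columns, y):
    """Fit by least squares and return (residuals, fitted values)."""
    beta = least_squares(columns, y)
    fitted = [sum(b * col[i] for b, col in zip(beta, columns))
              for i in range(len(y))]
    return [yi - fi for yi, fi in zip(y, fitted)], fitted

def curvature_score(residuals, fitted):
    """Correlation between residuals and squared, centered fitted values."""
    fbar = mean(fitted)
    sq = [(f - fbar) ** 2 for f in fitted]
    sbar, rbar = mean(sq), mean(residuals)
    num = sum((s - sbar) * (r - rbar) for s, r in zip(sq, residuals))
    den = (sum((s - sbar) ** 2 for s in sq)
           * sum((r - rbar) ** 2 for r in residuals)) ** 0.5
    return num / den

random.seed(7)  # fixed seed so the simulated data are reproducible
x = [i / 4 for i in range(41)]  # 0 to 10
y = [2.0 + 1.5 * xi + 0.4 * xi ** 2 + random.gauss(0.0, 1.0) for xi in x]
ones = [1.0] * len(x)

# Straight-line fit: the missing X^2 term shows up as a bowl-shaped pattern
res_line, fit_line = residuals_and_fitted([ones, x], y)
# Same data with the X^2 term added to the model
res_quad, fit_quad = residuals_and_fitted([ones, x, [xi ** 2 for xi in x]], y)

score_line = curvature_score(res_line, fit_line)
score_quad = curvature_score(res_quad, fit_quad)
print(f"curvature score, straight line: {score_line:.2f}")
print(f"curvature score, with X^2 term: {score_quad:.2f}")
```

    A large score for the straight-line fit says the residuals still carry a curvilinear trend, so a higher-order term is worth trying; once the term is in the model the score should drop toward zero.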
