April 28, 2010 at 8:27 pm
Kevin Clay
Do any of the Six Sigma gurus or statisticians have a fun and educational example of why we use residual analysis to determine model adequacy in regression? I want to use this in my Green Belt class. I am having a difficult time getting this particular topic across to my students.
Regards,



Robert Butler
I’d recommend just making up a simple example and naming the X and Y after variables your audience knows.
You might want to try a couple of different versions:
1. Set up about 20 points so that the relationship between X and Y is curvilinear but the simple linear fit of Y to X is still significant. Run the simple linear model, show everyone the summary statistics, and then show a plot of the residuals against the predicted values and highlight the curve. Refit with a squared term and let everyone see what happens to the summary statistics as well as the residual plot.
2. Build another data set that consists of 19 closely grouped points and one extreme point. Run a simple linear regression and note the significance of the slope, the high R², etc., and then show the residual plot. The single extreme will stand out like a sore thumb. Rerun the regression with the single extreme removed and show everyone how all of the summary stats collapse.
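A minimal sketch of version 2, assuming NumPy is available. The data values and every name in the snippet are made up for illustration and do not come from this thread. It fits a straight line with and without the extreme point and also prints the extreme point's leverage, a standard diagnostic that the post above does not mention.

```python
import numpy as np

# Made-up illustration data (not from the thread): 19 clustered points
# with no trend, plus one extreme point far from the cluster.
i = np.arange(19)
x_cluster = 1.0 + 0.1 * i               # x runs from 1.0 to 2.8
y_cluster = 2.0 + 0.2 * (-1.0) ** i     # y alternates between 2.2 and 1.8
x_all = np.append(x_cluster, 10.0)      # add the single extreme point
y_all = np.append(y_cluster, 8.0)


def simple_fit(x, y):
    """Least-squares line; returns slope, intercept, R^2, residuals."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return slope, intercept, r2, resid


slope_with, _, r2_with, resid_with = simple_fit(x_all, y_all)
slope_without, _, r2_without, _ = simple_fit(x_cluster, y_cluster)

# Leverage (hat value) for simple regression: 1/n + (x - mean)^2 / Sxx.
sxx = np.sum((x_all - x_all.mean()) ** 2)
leverage = 1.0 / len(x_all) + (x_all - x_all.mean()) ** 2 / sxx

print(f"with extreme:    slope={slope_with:.3f}  R^2={r2_with:.3f}")
print(f"without extreme: slope={slope_without:.3f}  R^2={r2_without:.3f}")
print(f"leverage of the extreme point: {leverage[-1]:.3f}")
```

Plotting `resid_with` against the fitted values should show the extreme point sitting by itself at the far end of the fitted axis, which is the picture to put in front of the class.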
April 28, 2010 at 9:54 pm
Kevin Clay
Do you know of any data already generated that I can use? Maybe in the Minitab Sample Data folder?
Regards,



Dr Shaik
Residual analysis is one of the most important validations of a fitted regression equation. After running the process, take the actual values and their estimated values, compute the differences, plot them, and see whether they are consistent or not.
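In symbols, the differences described above are the residuals. The notation here is the usual textbook one and is an assumption, since the post gives no formula: y_i is the actual value, ŷ_i the value estimated by the fitted equation, and b_0, b_1 the fitted intercept and slope of a simple linear model.

```latex
e_i = y_i - \hat{y}_i, \qquad \hat{y}_i = b_0 + b_1 x_i, \qquad i = 1, \dots, n
```

If the model is adequate, a plot of e_i against ŷ_i should look like a patternless band around zero.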
April 29, 2010 at 1:10 pm
Robert Butler
I’m not trying to be offensive, but I find your reluctance to take the time to pull out a simple piece of graph paper and spend a few minutes making up the two data sets I recommended very disturbing.
You say you are teaching a Green Belt class and you made reference to Minitab – this would suggest you have at least a Black Belt and some abilities with respect to using Minitab. Even if the assumption of a BB is incorrect the idea that you are teaching a class and cannot do this does not reflect well on your abilities either as a practitioner or as a teacher.
Below are two data sets which will do what I mentioned – it took me all of 5 minutes to make them up and check them via my computer package. As a homework assignment – tell me what you find interesting about the residuals of the first data set for both the linear and the curvilinear models. Hint – it provides further ammo with respect to the need to look at the actual residual plots.
x1 y1 – linear/curvilinear problem
1 1
1 1.1
1 1.2
3 1.1
3 1.2
3 1.3
5 1.4
5 1.5
5 1.45
7 1.2
7 1.3
7 1.35
8 1.3
8 1.35

x1 y1 – influential data point problem
1 1.1
1 2
1 1.5
2 2
2 1.7
2 1.2
3 1.3
3 1.6
3 1.8
7 4.2
7 4.2
7 4
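For anyone without Minitab to hand, the two data sets above can be checked with a short script. This is only a sketch, assuming NumPy is available. The helper name `poly_fit` is made up for illustration, and treating the three points at x = 7 as the "extreme" in the second data set is an assumption about what the post intends.

```python
import numpy as np

# Data set 1 (linear/curvilinear problem), copied from the post.
x1 = np.array([1, 1, 1, 3, 3, 3, 5, 5, 5, 7, 7, 7, 8, 8], dtype=float)
y1 = np.array([1, 1.1, 1.2, 1.1, 1.2, 1.3, 1.4, 1.5, 1.45,
               1.2, 1.3, 1.35, 1.3, 1.35])

# Data set 2 (influential data point problem), copied from the post.
x2 = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 7, 7, 7], dtype=float)
y2 = np.array([1.1, 2, 1.5, 2, 1.7, 1.2, 1.3, 1.6, 1.8, 4.2, 4.2, 4])


def poly_fit(x, y, degree):
    """Least-squares polynomial fit; returns coefficients, residuals, R^2."""
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return coeffs, resid, r2


# Data set 1: straight line versus a model with a squared term.
_, resid_lin, r2_lin = poly_fit(x1, y1, 1)
_, resid_quad, r2_quad = poly_fit(x1, y1, 2)
print(f"data set 1: R^2 linear={r2_lin:.3f}  quadratic={r2_quad:.3f}")
for level in np.unique(x1):
    at_level = x1 == level
    print(f"  x={level:.0f}: mean residual"
          f" linear={resid_lin[at_level].mean():+.3f}"
          f" quadratic={resid_quad[at_level].mean():+.3f}")

# Data set 2: refit after dropping the points at x = 7 (assumed extreme).
keep = x2 < 7
coef_full, _, r2_full = poly_fit(x2, y2, 1)
coef_red, _, r2_red = poly_fit(x2[keep], y2[keep], 1)
print(f"data set 2: all points   slope={coef_full[0]:.3f}  R^2={r2_full:.3f}")
print(f"data set 2: without x=7  slope={coef_red[0]:.3f}  R^2={r2_red:.3f}")
```

The per-level mean residuals for data set 1 are a text stand-in for the residual plot; plotting `resid_lin` and `resid_quad` against the fitted values is still the better way to look at them.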
Kevin Clay
Robert, thank you for your input and comments. I think my requirements were misinterpreted because I responded so quickly. I was looking for an extensive data set with a story behind it (like a file in Minitab’s sample data sets), something that is not generic (although I appreciate your work). I am presently using a generic data set much like the one you created.
Regards,

Kevin
Jonathon Andell
Think of residual analysis as a way to extract the maximum information possible from the data. If any of the readily available residual plots shows a nonrandom pattern, then something in the process is likely to be causing that pattern. Often the nature of the pattern gives strong “hints” as to what the cause is, and that leads to increased understanding of what makes the process tick. The cause may be as simple as a sampling hiccup, or residual analysis could lead to rather significant understanding of a lurking X variable.
As to the “adequacy” of the model: while there is a statistical component to a model’s adequacy, there also is a practical component. If the prediction equation significantly improves your ability to predict and/or control the process outcomes, the practical aspect may outweigh the statistical.
 
