# Regression Analysis


• #39794

VoteForPedro
Member

How does one present a regression model when reporting out? My understanding is that it does not show causation, simply correlation, and thus you cannot say “Changing X ____ amount will result in a ____ change in Y”. Is this understanding accurate? If so, how then should the results be interpreted? Thanks for your time.

#121910

Issa
Participant

No, VoteForPedro.
The main purpose of regression analysis is not just to determine the level of correlation between quantities. It is instead to determine how the dependent variable Y reacts to variations in the independent variable X, for example through a fitted line Y = aX + b.

#121912

Roshan
Member

You can use the R square value to determine the strength of the independent variable (X). R square is calculated by dividing the TOTAL EXPLAINED VARIATION by the TOTAL VARIATION.
Total explained variation is the variation in Y that is explained by X: the sum of squared deviations of the fitted values (the predictor line) from the mean of Y. Total variation is the sum of squared deviations of the observed Y values from the mean line.
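The ratio described above can be sketched in a few lines of plain Python. The data values and the function names (`fit_line`, `r_squared`) are invented for illustration and are not from the thread; the sketch assumes a straight-line model with a single X.

```python
# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Least-squares slope a and intercept b for y = a*x + b."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    a = sxy / sxx
    return a, y_bar - a * x_bar


def r_squared(x, y):
    """Explained variation divided by total variation."""
    a, b = fit_line(x, y)
    y_bar = mean(y)
    fitted = [a * xi + b for xi in x]
    # Variation of the fitted line around the mean of Y.
    explained = sum((fi - y_bar) ** 2 for fi in fitted)
    # Variation of the observed data around the mean of Y.
    total = sum((yi - y_bar) ** 2 for yi in y)
    return explained / total


print(r_squared(x, y))
```

A value close to 1 says only that the line tracks this particular data set closely; later posts in the thread argue that this single number should not be used on its own to judge a model.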

#121914

Anonymous
Guest

VfP,
You are quite right when you say that a regression correlation does not prove cause and effect. This is why most semiconductor facilities do not change processes on the basis of a single trial, or even a number of trials (DOEs), but include other cautionary measures, such as a planned ramp.
Andy

#121915

Whitehurst
Participant

Hi Pedro,
As far as I understand regression analysis, Rsq or Rsq(adj) can be interpreted in two ways: 1) how much of the variation in Y is explained by the particular X, and 2) how well the data fit the particular model. Even if your model fits the data well, the assumptions that the residuals are normally distributed and that the mean of the errors is zero still have to be met. In multiple regression you also have to take care of multicollinearity; there the VIF (variance inflation factor) is important, and it should be less than 5 or 10.
I am not a statistical expert but this is how I understand it. Wait for the Masters (Stan, Darth, Robert and BTDT) to reply.
Regards
joe
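As a rough illustration of the VIF rule of thumb mentioned above, here is a minimal sketch for the special case of exactly two predictors, where the VIF of either one reduces to 1 / (1 - r^2), with r the correlation between the two X's. The data and the function names are hypothetical; with more than two predictors each VIF comes from regressing that X on all of the others.

```python
def pearson_r(u, v):
    """Pearson correlation between two equal-length lists."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictor columns, invented for illustration.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2_related = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8]    # nearly a copy of x1
x2_unrelated = [2.0, 5.0, 1.0, 6.0, 3.0, 4.0]  # only weakly related to x1

print(vif_two_predictors(x1, x2_related))    # far above the 5-10 guideline
print(vif_two_predictors(x1, x2_unrelated))  # close to 1
```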

#121919

ROSS
Member

You are right! You must determine the logical relationship between the “X”s and “Y”. For example, when the sun rises, the rooster begins to crow. We cannot say the rooster’s crow is the reason for the sunrise, but we can say the sunrise leads to the rooster’s crow. Also make sure that the “X”s are independent.

#121920

Anonymous
Guest

It’s interesting to note the difference between a ‘statistical’ approach and a ‘physical’ one. I’m sure many posters appreciate that most relationships can be represented by a polynomial; the real question is what physical law is operating in the background. In the past, this has led some to believe that physical laws are simple and comprise few variables.
In my opinion, a simple mathematical model often provides real insight as to the cause of variation, especially when used with sensitivity analysis (partial differentiation). One way to generate a simple model is to use dimensional analysis.
Many consider the ‘physical’ approach inherently more reliable than statistical regression, but I would argue that we should use the best of both.
Cheers,
Andy

#121924

Robert Butler
Participant

Assuming we’ve done all of the correct things with respect to developing the regression equation then I do the following when reporting on the final model.
1. Since I will have built the model using normalized X’s I will rank order the X’s by their coefficients. This ranking will automatically show the variables in their order of statistical importance and perhaps in their order of physical importance too.
2. Using this order as a talking point I will then review each term in the model with the people responsible for the process and for each variable I will ask if:
a. The presence of the variable makes physical sense.
b. The sign of the coefficient associated with the variable makes physical sense.
3. If the terms in the model pass this test then I will ask for optimum settings for the Y’s.
4. I will use the model to identify possible combinations of X’s which, when plugged into the model, will predict Y values equal to or similar to the “optimum Y’s”. (If there is more than one Y I will provide graphs of the equations to illustrate possible conflicts between settings for optimums for different Y’s.)
5. I will review these X settings with the people involved to make sure we aren’t asking for combinations of X’s that are either impossible to attain or that, based on their experience, might result in damaging the equipment.
6. If these combinations of X’s pass this test I will then propose running these combinations of X’s as confirming runs for the model.  After the process has been run with these new X settings we will measure the resultant Y’s to see if they fall within their respective prediction intervals.
7. If the confirmation runs are successful we will then view the model as being of some utility. At this point we will begin to entertain the possibility that the model is actually telling us that over the ranges of the X’s we have studied “Changing X ____ amount will result in a ____ change in Y” and discussion will move towards utilization of the model for process control.
What I don’t do is bother with reporting R2 and I never use R2 for purposes of estimating model adequacy or value.  R2 is a single statistic, it is easily manipulated, and it does not provide any real understanding about the critical issues of model fit.
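The confirmation check in step 6 can be sketched for the simplest case of one X and a straight-line model. This is only an illustration under stated assumptions: the data, the confirmation value and the function name `prediction_interval` are hypothetical, and the t value is supplied by the caller from a t table rather than computed. A multi-X model needs the matrix form of the same interval.

```python
import math


def prediction_interval(x, y, x_new, t_crit):
    """Prediction interval for one new observation at x_new (single-X line fit).

    t_crit is the two-sided t value for n - 2 degrees of freedom,
    looked up by the caller (an assumption of this sketch).
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(residual_ss / (n - 2))  # residual standard deviation
    centre = intercept + slope * x_new
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    return centre - half_width, centre + half_width


# Hypothetical model-building data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

T_CRIT = 3.182  # 95% two-sided t value for 3 degrees of freedom, from a t table
low, high = prediction_interval(x, y, 3.5, T_CRIT)

confirmation_y = 7.1  # hypothetical result of a confirming run at X = 3.5
print(low, high, low <= confirmation_y <= high)
```

A confirming run that lands inside the interval supports the model over the range studied; one that lands outside is a reason to revisit the model before using it for control.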

#121926

Darth
Participant

Robert, I didn’t read into the original post that he was referring to DOE.  With respect to your comment below:
1. Since I will have built the model using normalized X’s I will rank order the X’s by their coefficients. This ranking will automatically show the variables in their order of statistical importance and perhaps in their order of physical importance too.
Does the above refer to doing a DOE with coded values or does it apply to the more typical simple regression?  It is my understanding that most regression is done with uncoded values and the coded approach is done with a DOE.  Please clarify.  Thanks.

#121927

Robert Butler
Participant

The comment refers to all regression. I always normalize using the actual X levels regardless of the source.  If you run a regression on the ideal matrix of a DOE and your values don’t actually meet that criterion (say the low value (-1) is supposed to be 50 whatsits but on some of the runs it was only 45 whatsits), your model will be in error if you have coded both 50 and 45 as -1.  The usual normalization procedure is
Adj1 = (Max X – Min X)/2
Adj2 = (Max X + Min X)/2
and
Normalized X = (X – Adj2)/Adj1
for each independent X in the study.
If you don’t do this and you attempt to run regression diagnostics on the un-normalized X matrix you can get incorrect answers with respect to VIF estimates, and eigenvalues.  This, in turn, can lead you astray with respect to identification of X variables that can be used in the regression effort.
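A minimal sketch of this scaling, assuming the scaled value is (X - Adj2)/Adj1 so that the observed minimum maps to -1 and the observed maximum to +1. The run values and function names are hypothetical. The second half illustrates one way unscaled X's can distort collinearity diagnostics: a raw X is almost perfectly correlated with its own square, while the scaled X is not.

```python
def scale_to_unit_range(values):
    """Center and scale actual X levels onto the -1 to +1 range."""
    adj1 = (max(values) - min(values)) / 2  # half-range
    adj2 = (max(values) + min(values)) / 2  # midpoint
    return [(v - adj2) / adj1 for v in values]


def pearson_r(u, v):
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


# Hypothetical runs: the low level was meant to be 50 but some runs hit 45.
levels = [50.0, 45.0, 150.0, 150.0, 50.0, 150.0]
coded = scale_to_unit_range(levels)
print(coded)  # 45 maps to -1, 50 to about -0.905: they are not coded alike

# Correlation between a linear term and its square, before and after scaling.
x = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
xs = scale_to_unit_range(x)
r_raw = pearson_r(x, [v * v for v in x])
r_scaled = pearson_r(xs, [v * v for v in xs])
print(r_raw, r_scaled)
```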

#121936

Darth
Participant

Thanks for the lesson.  I always learn a lot about regression from your posts.

#121940

Sigmordial
Member

Nice post.  Interesting that you perform the remedial even before the diagnostic for multicollinearity.
One question though — you mention eigenvalues.  Are you also performing principal component regression in addition to the normalization of the predictors?

#121941

VoteForPedro
Member

Outstanding response. Thank you.  At the risk of pushing my luck:
You speak of normalizing your predictors prior to modeling. Can you elaborate on how and why this is done?

#121944

BTDT
Participant

VoteForPedro:
There are two points to make about ‘scope of inference’ and ‘causation’.
The first point is ‘scope of inference’.
It is common for someone to conduct an analysis using historical data. The most important thing about data is whether it has been gathered randomly or not. It is obvious that a bias can be built into data by the method in which it was gathered. If you are analyzing customer satisfaction from survey forms, for example, you will find that customers usually reply when they are very satisfied or very dissatisfied. The responses are usually bimodal. It would be a mistake to say that your customers are grouped into two ‘camps’. It is correct to say the self-selected customers responding to the survey are grouped into two ‘camps’.
When I conduct the Gauge R&R section of a project, I know that the stakeholders will usually object to the results of the analysis if they think the data is biased in some way. This is a logical consequence of having a biased sample. I try to make sure we have a good, random sample of all products, service centers, times of year, experience of salespeople, and so forth BEFORE we conduct any analysis. The statistical reason for obtaining a random sample is that you can draw conclusions about the greater population based on your sample. All statistical tests are based on this assumption.
Historical data is often called ‘happenstance’ data. The problem is that there is always much more to the dataset than is recorded. There is a real possibility that there are ‘lurking variables’ in the data. A well known example is the mathematical relationship between crime rate, as measured by the number of break-and-enter crimes, and ice cream sales. The lurking variable is temperature. When it is hot, people leave their windows and doors open, making it easier to break into a home. When it is hot, people also tend to buy more ice cream.
The second point is ‘causation’.
Once you have satisfied a group of stakeholders and statisticians that the data has been gathered randomly, the issue of causation can still be a problem. The above example might show that weather has an effect on both break-and-enter crime rate and ice cream sales, but only because it is logically defensible. The time lag between weather and the other two factors helps in establishing the causation in a statistical sense. This is why economic models concentrate on ‘leading indicators’.
The only way to prove causation is to deliberately and randomly manipulate one variable to observe the response in the second. This is sometimes difficult, and takes place during a DOE. In the ‘ice-cream’ example, it would be impossible to deliberately manipulate the weather.
When you run a regression analysis and find a correlation between two variables, you can only make a statement for the range of data you have. This can sound like you are making as many caveats as a good lawyer, but it is the only way you can really make a true statement. This is a consequence of the ‘scope of inference’ and ‘causation’.
If you cannot prove causation, you have to use the phrase ‘associated with’. For example, you can truthfully state that ice cream sales ‘are associated with’ a high break-and-enter rate. If the data was for the year 2001 in Chicago, you can only state that ‘For the year 2001, in Chicago, high ice cream sales are associated with a high break-and-enter crime rate.’
Cheers, BTDT
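The lurking-variable effect in the ice cream example can be mimicked with simulated data. Everything in this sketch is invented: the coefficients, the noise levels and the variable names are assumptions chosen only so that temperature drives both quantities while neither affects the other.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def pearson_r(u, v):
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def residuals(y, x):
    """What is left of y after removing a straight-line fit on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
        (a - x_bar) ** 2 for a in x
    )
    return [b - (y_bar + slope * (a - x_bar)) for a, b in zip(x, y)]


n = 5000
temperature = [random.gauss(0, 1) for _ in range(n)]
# Both series depend on temperature only, never on each other.
ice_cream = [2.0 * t + random.gauss(0, 1) for t in temperature]
break_ins = [1.5 * t + random.gauss(0, 1) for t in temperature]

raw_r = pearson_r(ice_cream, break_ins)
adjusted_r = pearson_r(
    residuals(ice_cream, temperature), residuals(break_ins, temperature)
)
print(raw_r)       # strong association
print(adjusted_r)  # close to zero once temperature is accounted for
```

With happenstance data the lurking variable is usually not recorded, so this adjustment is not available; that is the reason for the ‘associated with’ wording.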

#121953

Dr. Scott
Participant

Pedro,
A pleasure to respond to you again. You have received some fairly esoteric advice in reply to your simple and straightforward questions.
How does one present a regression model when reporting out?
Present the slope (relationship) between the two variables. And the amount of variation in Y explained by the X(s), i.e. R-square adjusted.
My understanding is it does not show causation, simply correlation, and thus you can not say “Changing X ____ amount will result in a ____ change in Y”.  Is this understanding accurate?
First, it isn’t the “correlation” except in layman’s terms. It is actually correlation (r) squared (hence R-squared). Second, unless you have a reasonable expectation that the X occurs before the Y and no other variable could have a stronger causation then yes, it is only a relationship that might or might not be causal. DOE is the best way to determine causation with more certainty. For example, storks really don’t deliver babies, though there is a strong relationship between stork sightings (previously observed X) and birthrate (latter observed Y) in some parts of the world. For more information on the true cause of births please ask your spouse, mother, or father (just a bit of humor).
If so, how then should the results be interpreted?  Thanks for your time.
Simply put. If the stats suggest so, then there is a relationship between the variables (or not). That is really all you can say. If the X is controllable, then try a DOE.
Hope this helps,
Dr. Scott

#121959

VoteForPedro
Member

#121961

VoteForPedro
Member

Per usual, your insight is greatly appreciated.  Thanks mate.

#122010

Anonymous
Guest

Dr. Scott,
You stated that:
While I agree that a DOE is better than a regression, surely the results of a DOE (fitting an orthogonal polynomial to a finite set of experimental data) have nothing to say about causality?
By the way, r, the square root of r-squared in a simple regression, is the Pearson product-moment correlation coefficient, named after Karl Pearson. So I see nothing wrong in referring to it as a ‘correlation.’ :-)
Regards,
Andy

#122012

Whitehurst
Participant

“Since I will have built the model using normalized X’s”
Robert, can you explain this to me please? As far as I know, it really doesn’t matter whether your X’s or Y are normal in regression; what matters is that your residuals should be normal for the final model.
I have always learnt from your posts, so if you can explain this to me it will be of great help.
Thank you once again.
Joe

#122018

A.S.
Participant

As per my understanding,
Regression analysis is a basic step to identify correlation within a given range of data. The validity of a regression equation is limited to that range of data / samples. I agree with Andy’s view; I also feel there is no mistake in calling it correlation.
After identifying the correlated factors we have to do the DOE to optimise those factors (Xs). I am unable to understand how DOE can replace regression at the initial stage itself, i.e. when we don’t know which X is correlated with Y.
Normalization of X before regression modelling was suggested by Robert. In some cases it may not help.
When I did a regression analysis of consumables cost vs. production volume, I did not find any correlation and my adjusted r-square was also low. During further diagnosis, it was found that some fixed assets were accounted as consumables (a data entry mistake). After eliminating those special causes I got the correlation. As in control chart analysis, we have to identify the special / assignable causes. In this type of situation, normalization will mislead.
It’s really an interesting set of posts on regression. Thanks.

#122020

BTDT
Participant

Anbu:
I’m glad you persisted in the analysis. Sometimes you can learn a lot from a single point, and sometimes a single point can dramatically alter the results.
Try the following as an example. Do a regression analysis on the four points (0,0), (0,1), (1,0) and (1,1). The correlation (=SQRT(r-sq)) is zero. Add the single point (10,10) to the other four and re-run the regression. R-sq is now 97.3% with a p-value of 0.002.
Cheers, BTDT
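The five-point example is easy to reproduce. The sketch below computes only R-sq for a straight-line fit in plain Python; the p-value quoted above is not recomputed here, and the function name is just a label for this illustration.

```python
def r_squared(points):
    """R-sq of a straight-line fit, i.e. the squared Pearson correlation."""
    n = len(points)
    x_bar = sum(p[0] for p in points) / n
    y_bar = sum(p[1] for p in points) / n
    sxy = sum((px - x_bar) * (py - y_bar) for px, py in points)
    sxx = sum((px - x_bar) ** 2 for px, _ in points)
    syy = sum((py - y_bar) ** 2 for _, py in points)
    return sxy * sxy / (sxx * syy)


square = [(0, 0), (0, 1), (1, 0), (1, 1)]
with_far_point = square + [(10, 10)]

print(r_squared(square))                    # 0.0
print(round(r_squared(with_far_point), 3))  # 0.973
```

One distant point supplies nearly all of the variation in both X and Y, so the fit is effectively a line drawn between that point and the cluster.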

#122022

A.S.
Participant

Thanks BTDT,
I take it that normalization would not help in your example either.

#122023

Robert Butler
Participant

Joe,
What we have here is a failure to communicate! :-)  In the old days, centering and scaling your X responses to a -1 to 1 range (or to a 0 to 1 range in the case of mixture designs) was referred to as “normalizing” the X’s.  It has nothing to do with the distribution of the X’s.  Unfortunately, (or maybe fortunately) I’ve been around for a bit and so sometimes I goof and use ancient terms which can be a source of confusion. As you noted – in a regression, normality in the distribution sense, only matters when doing an analysis of the residuals.
Pedro, Darth, Sigmordial
(I’m going to have to refer to the concept of centering and scaling again and again in what follows so please forgive me but I’ll use the old term of “normalize” to keep the verbiage to a manageable level)
There are a number of reasons for “normalizing” the X’s and running the regression with these as opposed to the actual X values.  As I mentioned previously, you can get quite different estimates of collinearity if you run without normalized X’s, that is, you can get VIF, eigenvalue, and condition index estimates that will declare collinearity a problem when in fact it isn’t.
The eigenvalues I’m referring to are collinearity diagnostics. While I’ve never thought of them in terms of principal component analysis, a review of some of my books last night suggests the eigenvalues of the two are one and the same.  Since I’m still not 100% sure about this I’ll just say that the eigenvalues I’m referring to help estimate linear dependencies or lack thereof among the X’s.  Condition indices are just the ratios of the various eigenvalues.  pp. 104-107 of Regression Diagnostics by Belsley, Kuh, and Welsch has a very good discussion of the uses of these measurements.  Their real value comes from the fact that, unlike a simple correlation matrix which only looks at one-to-one correlations, they take into account the issues of many-to-many.  I don’t know if Minitab generates condition indices but I do know that SAS and Statistica do.
The other reason for wanting to normalize the X’s is to give all of the X variables a level playing field.  If you have X’s with gross differences in proportions (i.e. X1 ranges from .001 to .005 and X2 ranges from 10 to 100) you run the risk of roundoff error in the regression.  Even with double precision arithmetic this can be a problem and can lead to different results on different programs.  What this means is that you run the risk of overlooking significant variables and if you run the analysis with different programs you risk getting results that don’t agree. pp. 257 to 266 of Draper and Smith 2nd Edition Applied Regression Analysis has a good discussion of the issues.
As I mentioned before, to me, the additional benefit of normalized X’s is the ability to rank order the X’s by their coefficients and identify those variables having the biggest impact on the results of the regression equation.
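The ranking point can be illustrated with the two X ranges mentioned above (.001 to .005 and 10 to 100). The response values are generated from a made-up equation, and the helper names (`solve`, `ols`, `scale`) are hypothetical; the small least-squares solver is only there to keep the sketch free of external libraries.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def scale(values):
    """Center and scale onto -1 to +1 using the observed min and max."""
    half_range = (max(values) - min(values)) / 2
    midpoint = (max(values) + min(values)) / 2
    return [(v - midpoint) / half_range for v in values]


# Hypothetical runs and a made-up response equation.
x1 = [0.001, 0.005, 0.001, 0.005, 0.003]
x2 = [10.0, 10.0, 100.0, 100.0, 55.0]
y = [50 + 2000 * a + 0.2 * b for a, b in zip(x1, x2)]

raw = ols([x1, x2], y)                  # coefficients in original units
scaled = ols([scale(x1), scale(x2)], y)  # coefficients per half-range of each X

print(raw[1:])     # X1 looks dominant only because of its units
print(scaled[1:])  # X2 moves Y more across the range actually studied
```

On the raw scale the X1 coefficient is thousands of times larger than the X2 coefficient, yet over the ranges studied X2 changes Y by more than twice as much; the scaled coefficients show that directly.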

#122029

Anonymous
Guest

Robert,
Correct me if I’m wrong but eigenvalues are just vectors of scalars .. For example, the process of Gram-Schmidt orthogonalisation would deliver eigenvalues.
This appears to stand in stark contrast to a ‘principal component’, which is a vector along an axis of maximum variation, and not necessarily orthogonal, and therefore not a scalar, or an eigenvalue.
It’s been some time since I practiced and my understanding is often in question. But I do agree with the practice of standardization, for all the reasons you mention, and others. The only problem is that the procedure I refer to as standardization is sometimes called normalization, and there doesn’t appear to be a clear distinction between the two. Do you divide by s as well as ‘normalising’ by subtracting the mean?
Andy

#122034

BTDT
Participant

Andy:
Normalizing = data reduction?
The determination of the range and mean response for a particular individual is what was done in this study. These parameters were used to massage the raw data such that each respondent used the full range on the 5 point scale and the average for each person was the same. The subsequent analysis was clustering.
https://www.isixsigma.com/forum/showmessage.asp?messageID=73284
BTDT

#122035

Whitehurst
Participant

Robert, that was a great explanation and I have learnt a lot from each of your posts.
Thank you once again.
Thank you, Sir.

#122036

Anonymous
Guest

Thanks BTDT .. so in the case of a regression on a continuous variable, one would normalize the data, by subtracting the Xi-bar from each Xi. Is that correct?
Cheers,
Andy

#122037

Robert Butler
Participant

I don’t have a good answer to your question concerning the eigenvalues.  As I mentioned in the last post the verbiage surrounding the description of the two appears to be similar and suggests they may be equivalent but I’m not 100% sure of this – it would appear I have a new homework assignment for the weekend!
You are correct, the standardization method you mention (subtract the mean and divide by the standard deviation) is also called normalization.  When I was learning standardization methods both this method and the one I gave in a previous post were called normalization and later on the terminology seemed to have changed to calling them methods of centering and scaling.
Subtracting the mean and dividing by the standard deviation will not, in general, give you a -1 to 1 range.  If memory serves me right, the arguments surrounding the two had to do with values generated when computing interaction terms.  With the -1 to 1 range the main effects and interactions would all have the same range whereas those centered and scaled by subtracting the mean and dividing by the standard deviation would not.
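The difference between the two scalings shows up quickly on a small, deliberately lopsided data set. The values and function names below are hypothetical; the z-score uses the sample standard deviation (n - 1 divisor), which is an assumption of this sketch.

```python
def range_scale(values):
    """Center on the midpoint and divide by the half-range: result spans -1 to +1."""
    half_range = (max(values) - min(values)) / 2
    midpoint = (max(values) + min(values)) / 2
    return [(v - midpoint) / half_range for v in values]


def z_score(values):
    """Subtract the mean and divide by the sample standard deviation."""
    n = len(values)
    m = sum(values) / n
    s = (sum((v - m) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - m) / s for v in values]


# Hypothetical X columns; x1 has one run far from the rest.
x1 = [50.0, 60.0, 70.0, 80.0, 150.0]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]

for name, transform in (("range", range_scale), ("z-score", z_score)):
    a, b = transform(x1), transform(x2)
    interaction = [p * q for p, q in zip(a, b)]
    print(name, "main effect range:", min(a), max(a))
    print(name, "interaction range:", min(interaction), max(interaction))
```

With range scaling both the main effects and their product stay inside -1 to +1; with z-scores the limits depend on the shape of each column.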

#122041

Anonymous
Guest

I’m sure you have better things to do over the weekend!!! It really isn’t urgent; I’m just curious.
I mentioned it because it came up when we had to replicate the Liver Disease calculations in Taguchi’s book MTGS.
We wanted to compare it with PLS and PCR. MTGS obviously uses the Gram-Schmidt method, but we really had to go through hoops to reproduce the calculations. It was only after several days that we realized the data set in the book was incomplete!!! And I had to write to the co-author to obtain the rest of the data.

#122044

BTDT
Participant

Andy:
I can see no benefit in massaging data before running a regression if the only purpose is to have a set of continuous response data centered. Adding a constant to Y or multiplying Y by a constant will only change the value of the coefficients and have no effect on R-sq, p-values, etc. More aggressive transformation is a different story, and let’s leave that alone for now.
For the example I gave, each individual would have an average response to the questions; call it the person’s mood. Each individual would also use a different range of the scale; call this discrimination. The problem was that these effects caused a lot of noise while doing the clustering of customers into market niches.
If you used the whole scale of 0-10 and I used only the values from 3-8, then we would have to ‘inflate’ my range of responses with respect to yours because I am more wishy-washy than you. A decent way to do this would be to divide our responses by our average standard deviation. Our spread of responses would now be on the same scale.
If your average was now 6.5 on responses and mine was now 4.7, we would have to center our responses because you are in a good mood, while I am grumpy. We would subtract 1.5 from all your responses and add 0.3 to all of mine. Our average responses would now be on the same scale and centered.
If I was using this data in a regression model, then I could always include the individual’s mood and discrimination values as refinable parameters in the model. If our survey was 35 questions with 200 respondents, then the data reduction (call it normalization if you wish) would add 400 parameters to the model. We did the data reduction outside any regression because we were using the data for cluster analysis.
It worked pretty well for treating our survey data. Send me email if you want to keep going with this.
DOEStuff(at)hotmail(dot)com
Cheers, BTDT
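The claim that adding a constant to Y or multiplying Y by a constant leaves R-sq unchanged can be checked directly. The data, the multiplier 2.5 and the offset 4.0 are arbitrary values chosen for this sketch, and `fit` is a hypothetical helper name.

```python
def fit(x, y):
    """Return (slope, r_squared) for a straight-line least-squares fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sxx, sxy * sxy / (sxx * syy)


# Hypothetical data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.0, 5.0, 4.0, 8.0, 7.0, 10.0]
y_rescaled = [2.5 * v - 4.0 for v in y]  # multiply by a constant, then shift

slope, r2 = fit(x, y)
slope_rescaled, r2_rescaled = fit(x, y_rescaled)

print(slope, slope_rescaled)  # the coefficient changes by the multiplier
print(r2, r2_rescaled)        # R-sq is the same
```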

#122047

Anonymous
Guest

Thanks .. but I’ll leave it there until I have an application – thanks for the offer.
Andy

#122056

AT
Participant

Please help me to clarify this. It is my understanding that regression analysis can be used to predict values that lie within the range between the two extreme values (the high and low ends of the data set), as long as those data were used in generating the equation and R square (adjusted) is higher than 80%. Typically, regression analysis should not be used as a forecasting equation for values outside the data range. Am I correct on this? A master black belt tells me otherwise. Any help?
Thanks,
Regards.
Ramesh

#122058

Robert Butler
Participant

A regression equation is a description of the correlations observed within a given block of data.  Consequently it is best used for predicting responses over that region (interpolation).  You can use the regression equation for extrapolation (prediction outside the ranges of the variables used for the initial construction) but you are making the very large assumption that the region outside of the one you examined exhibits the exact same trends as those you saw within.
If you like the look of the extrapolated values and you would like to run your process in that region you should either set up a new design for the region of interest or augment the existing design to permit an expansion of the data into the region of interest.  After augmentation you will need to re-run your analysis to generate new predictive equations.  What you should not do is attempt to run your process based on nothing more than an extrapolation.
As for the R2 = 80% rule – forget it – you don’t evaluate the utility of a regression equation with one statistic.  For further details see the earlier discussions on this thread and the discussions associated with the thread listed below.
https://www.isixsigma.com/forum/showmessage.asp?messageID=43683
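The extrapolation warning above can be illustrated with a made-up process whose true response is curved. The quadratic "true" relationship, the X values and the function names are all assumptions of this sketch; the point is only that a line fitted over a narrow region can look acceptable there and still fail badly outside it.

```python
def fit_line(x, y):
    """Least-squares slope and intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
        (a - x_bar) ** 2 for a in x
    )
    return slope, y_bar - slope * x_bar


def true_response(x):
    """Hypothetical process behaviour, unknown to the analyst."""
    return x * x


# Data were only collected for X between 0 and 2.
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [true_response(v) for v in x]
slope, intercept = fit_line(x, y)


def predict(v):
    return slope * v + intercept


interpolation_error = abs(predict(1.25) - true_response(1.25))  # inside the range
extrapolation_error = abs(predict(6.0) - true_response(6.0))    # far outside it

print(interpolation_error, extrapolation_error)
```

Running new experiments in the region of interest, as described above, is what reveals whether the trend seen inside the studied range continues outside it.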

#122060

BTDT
Participant

Robert Butler made the same point in an earlier post; I just found it.
https://www.isixsigma.com/forum/showmessage.asp?messageID=43683
BTDT

#122061

BTDT
Participant

Well, that post went in the wrong place.
BTDT
