Non normal data
 This topic has 9 replies, 6 voices, and was last updated 14 years, 9 months ago by Vallejo.

October 30, 2007 at 5:11 pm #48547
Carlos Palma (Participant, @CarlosPalma)
My question is very simple: is the approach I used with my non-normal data valid? I first applied a Box-Cox transformation to make the data normal, then ran a regression test, and I am using the regression coefficients obtained from that test.
Thank you!

October 30, 2007 at 5:30 pm #164146
Carlos,
If you transformed the data before running the regression simply because you saw non-normality, then I would say that the fitted equation (and its coefficients) is suspect. We typically don't transform the x and/or y unless the analysis of the residuals and the other associated diagnostics give warning signals. I would suggest putting the data into the analysis in an unadulterated fashion and assessing from there.
Another item to ponder is the usability of the transformed equation that you currently have. Its terms, and specifically the coefficients, were created by an artificial manipulation of the data. The process that the model will be used against does not use the transformation, so you need to ensure that those coefficients are back-transformed to match the data. It would be similar to taking a process that uses Celsius, converting the data to Kelvin, and then developing your models and sensitivities. You couldn't simply cut and paste the answer back into a process that uses data on a different scale. You'd be trying to mix apples and oranges.
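To make the point about scales concrete, here is a minimal Python sketch. It assumes the usual one-parameter Box-Cox form, (y^lambda - 1)/lambda for lambda not equal to 0 and ln(y) for lambda equal to 0. The function names, the lambda value, and the numbers are hypothetical and are not taken from Carlos's analysis.

```python
import math

def boxcox(y, lam):
    """One-parameter Box-Cox transform of a positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def inv_boxcox(z, lam):
    """Map a value on the transformed scale back to the original units."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

# Hypothetical numbers for illustration only.
lam = 0.5                     # assumed lambda; a real one is estimated from data
y_hours = 9.0                 # a response in original units
z = boxcox(y_hours, lam)      # the value a transformed-scale model works with
back = inv_boxcox(z, lam)     # the same value returned to original units

# A one-unit step on the transformed scale is not a fixed step in original units.
step_low = inv_boxcox(z + 1.0, lam) - inv_boxcox(z, lam)
step_high = inv_boxcox(z + 2.0, lam) - inv_boxcox(z + 1.0, lam)

print(z, back, step_low, step_high)
```

Because the two steps differ, a slope fitted on the transformed scale cannot be read directly as "units of Y per unit of X". For a nonlinear transform it is generally the predictions, passed through the inverse, that map back cleanly to the original scale.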
Hope that helps.
Regards,
Erik

October 30, 2007 at 7:11 pm #164151
Robert Butler (Participant, @rbutler)
The problem is that your question isn’t simple or clear. As near as I can tell you are saying the following:
1. You have some data that you have determined, by methods unknown, to be nonnormal.
2. You chose, for reasons unknown, to use the Box-Cox transform to transform the data to normal.
3. You chose to run a regression on this data (presumably this data was response data – the Y) against some unspecified X’s.
4. You are calling the act of regression a test – it isn’t. It is just a series of specified algebraic calculations made using the data.
5. You have the coefficients from this regression and you seem to want to use them for something – what this may be isn’t clear.
If, as Erik has been trying to elicit from you, your question is the following:
Do I need to transform my Y values before running a regression? Then the answer is no you do not.
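The post does not spell out the reasoning, but the usual justification is that the normality assumption in least-squares regression applies to the residuals, not to the raw Y values. The Python sketch below illustrates this with simulated data (not Carlos's data): the predictor is skewed, so Y is skewed too, yet the residuals are roughly symmetric. The `skewness` helper and all the numbers are illustrative assumptions.

```python
import random
import statistics

def skewness(values):
    """Sample skewness (third standardized moment); near 0 for symmetric data."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

random.seed(0)  # fixed seed so the illustration is repeatable

# Simulated data: a right-skewed predictor plus normally distributed noise.
n = 2000
x = [random.expovariate(1.0) for _ in range(n)]
y = [2.0 + 3.0 * xi + random.gauss(0.0, 1.0) for xi in x]

# Ordinary least squares for one predictor, written out by hand.
x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print("skewness of Y:        ", round(skewness(y), 2))
print("skewness of residuals:", round(skewness(residuals), 2))
```

In this simulation a normality test on Y alone would fail, while the untransformed regression is perfectly adequate. That is why the residuals are the thing to check first.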
If you are running a regression and trying to use the coefficients to estimate the impact of your X’s, and your question is how to use the coefficients to determine the significance of the relationship between the X’s and the Y’s, then the answer is that you need to look at the F statistics associated with the parameter estimates.
If it is something else then you will need to expand your explanation of your problem in order for anyone to offer a meaningful comment.

October 31, 2007 at 12:39 pm #164176
Carlos Palma (Participant, @CarlosPalma)
This is my model:
Predictor           Coef     SE Coef      T      P
Constant            0,5323   0,1201     4,43  0,000
Horas InicioFin     0,63305  0,02398   26,40  0,000
Horas InicioRecep   1,64047  0,09574   17,13  0,000
Horas EntregaFin    0,23132  0,02861    8,09  0,000

S = 0,198745   R-Sq = 90,9%   R-Sq(adj) = 90,7%

Analysis of Variance
Source           DF      SS      MS       F      P
Regression        3  54,348  18,116  458,63  0,000
Residual Error  137   5,411   0,039
Total           140  59,759
The model above is the one with the Box-Cox transformation.
This is the model without the transformation, using the non-normal data:
Predictor                    Coef      SE Coef        T      P
Constant                     0,1525    0,1345      1,13  0,259
Horas Inicio Rep.Recepción   0,997422  0,007228  137,99  0,000
Horas InicioFin Rep.         0,994358  0,004428  224,57  0,000
Horas EntregaFin Rep.        1,01085   0,00908   111,31  0,000

S = 0,951435   R-Sq = 99,9%   R-Sq(adj) = 99,9%

Analysis of Variance
Source           DF      SS     MS         F      P
Regression        3  101497  33832  37374,50  0,000
Residual Error  137     124      1
Total           140  101621

Source                       DF  Seq SS
Horas Inicio Rep.Recepción    1   33535
Horas InicioFin Rep.          1   56747
Horas EntregaFin Rep.         1   11215
And my question: which one is correct?

October 31, 2007 at 3:51 pm #164187
GB said it best (Participant, @GBsaiditbest)
All models are wrong, some are useful.
So both and neither are “correct”.
Neither is the real world, but…
One is “correct” in transformed space, the other in original unit space.

October 31, 2007 at 3:56 pm #164188
New ATI (Participant, @NewATI)
Is it possible to explain the Box-Cox transformation in just a few simple sentences?
Many thanks.

October 31, 2007 at 4:02 pm #164192
Carlos Palma (Participant, @CarlosPalma)
But what I mean is: do you think the Box-Cox transformation is correct? Could the conclusion be that the regression coefficient with the most influence is 1.64? Does that mean we have to work on this X?
This is my question: can I trust this analysis?

October 31, 2007 at 5:32 pm #164198
Robert Butler (Participant, @rbutler)
Thanks for the model posting – it clarifies the issue and the question. The answer is: both, neither, or maybe one and not the other. The reason this is the answer is that you can’t use what you have provided to answer the question.
You have run a regression with (I assume) the Y’s either transformed or not. You have two regression equations which, with their summary statistics, are telling you that all of the terms you have included are significant. There is the curious fact that the constant in the second model isn’t significant and I don’t quite understand why that should be so. Other than that there are some differences with respect to MSE and R-Sq, but given the fact that the data has been transformed this isn’t too surprising.
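For readers unfamiliar with how the posted summary statistics fit together, the short Python sketch below recomputes them from the first (Box-Cox) model's table. The only assumptions are the standard definitions: T is the coefficient divided by its standard error, F is the regression mean square over the error mean square, S is the square root of the error mean square, and R-Sq is the regression sum of squares over the total. The values are copied from the post with decimal commas rewritten as points.

```python
import math

# Values copied from the first (Box-Cox) model posted in the thread.
coef = {"Constant": 0.5323, "Horas InicioFin": 0.63305,
        "Horas InicioRecep": 1.64047, "Horas EntregaFin": 0.23132}
se = {"Constant": 0.1201, "Horas InicioFin": 0.02398,
      "Horas InicioRecep": 0.09574, "Horas EntregaFin": 0.02861}
ss_regression, df_regression = 54.348, 3
ss_error, df_error = 5.411, 137
ss_total = 59.759

# T for each term is the coefficient divided by its standard error.
t_stats = {name: coef[name] / se[name] for name in coef}

# Overall F is the regression mean square over the error mean square.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_overall = ms_regression / ms_error

# S is the square root of the error mean square; R-Sq is SS regression / SS total.
s = math.sqrt(ms_error)
r_sq = ss_regression / ss_total

for name, t in t_stats.items():
    print(f"{name:18s} T = {t:6.2f}")
print(f"F = {f_overall:.1f}, S = {s:.4f}, R-Sq = {100 * r_sq:.1f}%")
```

The recomputed values agree with the posted output up to rounding of the printed inputs. Note that none of these numbers says anything about whether the model's assumptions hold, which is the gap a residual analysis fills.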
In order to determine which model may be the better fit (that is, the “correct” model) you are going to have to run a regression analysis – this means you are going to have to run a residual analysis. This analysis will give you information about adequacy of fit, influential points that may be adversely impacting your regression, identify the impact of unknown variables or terms not included in your model, etc. You will need to get a good book on regression analysis and read the chapter(s) on this subject.
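As a rough idea of what such a residual analysis computes, here is a minimal Python sketch for a one-predictor regression. The data are hypothetical and chosen so that the last point is unusual. The quantities shown (leverage, internally studentized residuals, Cook's distance) are standard diagnostics, but this is only an illustration and not a substitute for the textbook treatment.

```python
import math

# Hypothetical data: the last point is far from the others in x and falls
# off the line that the other points follow.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 15.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 20.0]

n, p = len(x), 2                      # p = number of fitted parameters
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
mse = sum(e ** 2 for e in residuals) / (n - p)

# Leverage: how far each x sits from the centre of the x data.
leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
# Internally studentized residuals: each residual scaled by its standard error.
studentized = [e / math.sqrt(mse * (1.0 - h)) for e, h in zip(residuals, leverage)]
# Cook's distance: combines residual size and leverage into one influence measure.
cooks_d = [r ** 2 * h / (p * (1.0 - h)) for r, h in zip(studentized, leverage)]

for i in range(n):
    print(f"x={x[i]:5.1f} resid={residuals[i]:7.3f} "
          f"lev={leverage[i]:.3f} stud={studentized[i]:7.3f} cook={cooks_d[i]:.3f}")
```

In practice these values are usually examined graphically as well: residuals against fitted values and against each X to look for curvature or changing spread, and a normal probability plot of the residuals to check the normality assumption.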
A good place to start would be Applied Regression Analysis, 2nd Edition, by Draper and Smith.

November 2, 2007 at 10:11 am #164287
Carlos Palma (Participant, @CarlosPalma)
The Box-Cox transformation is a Six Sigma tool, used only when you have non-normal data. With it you can transform the data to normal, and in that way you can use other Six Sigma tools that require normality of the data.
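For anyone wondering how the transformation is actually chosen, the Python sketch below shows one common approach: pick the lambda that maximizes the Box-Cox profile log-likelihood over a grid. This is an assumption about the general method and not a description of what any particular statistics package does internally. The data are simulated, and the function names are hypothetical.

```python
import math
import random
import statistics

def boxcox(y, lam):
    """One-parameter Box-Cox transform of a positive value y."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def profile_loglik(data, lam):
    """Box-Cox profile log-likelihood (up to an additive constant)."""
    n = len(data)
    z = [boxcox(v, lam) for v in data]
    return (-(n / 2.0) * math.log(statistics.pvariance(z))
            + (lam - 1.0) * sum(math.log(v) for v in data))

random.seed(1)  # fixed seed so the illustration is repeatable
# Simulated right-skewed (lognormal) data, for which lambda near 0 is expected.
data = [random.lognormvariate(0.0, 0.75) for _ in range(500)]

# Coarse grid search from -2 to 2 in steps of 0.1.
grid = [round(-2.0 + 0.1 * k, 1) for k in range(41)]
best_lam = max(grid, key=lambda lam: profile_loglik(data, lam))
print("lambda with the highest profile log-likelihood:", best_lam)
```

A lambda near 0 corresponds to a log transform, near 0.5 to a square root, and near 1 to no transformation at all. The transform requires strictly positive data.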
November 9, 2007 at 2:37 pm #164587
Hello, I would like to understand exactly what the residuals in a regression analysis are for: what information can they give me? What factors do I have to study?
Thanks for your help.