# Non-normal data

#48547

Carlos Palma
Participant

My question is very simple: is the approach I have used with my non-normal data valid? I first applied a Box-Cox transformation to make the data normal, then ran a regression, and I am using the regression coefficients obtained from it.
Thank you!!

#164146

Erik L
Participant

Carlos,

If the transformations were done to the data before you conducted the regression, simply because you saw non-normality, then I would say that the transform equation obtained (and its associated coefficients) is suspect. We typically don't transform the x and/or y unless there are warning signals from the analysis of the residuals and other associated diagnostics. I would suggest putting the data into the analysis unadulterated and assessing from there.

Another item to ponder is the usability of the transform equation that you currently have. Those factors, and specifically the coefficients, were created through an artificial manipulation of the data. The process that the model is going to be used against does not use the transformation, so you need to ensure that those coefficients are back-transformed to match the data. It would be similar to taking a process that uses Celsius, converting that to Kelvin, and then developing your models and sensitivities. You couldn't simply cut and paste the answer back into a process that uses data on a different scale. You'd be trying to mix apples and oranges.
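
Erik's point about back-transforming can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are made up, the Box-Cox parameter `lam = 0.5` is picked by hand rather than estimated, and the helper names (`boxcox`, `inv_boxcox`, `fit_line`) are hypothetical and not taken from any package mentioned in this thread.

```python
import math

def boxcox(y, lam):
    """Box-Cox transform of a positive value y for a given lambda."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def inv_boxcox(z, lam):
    """Inverse of boxcox: map a transformed value back to original units."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x (one predictor)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up data (assumption): y is exactly linear in x after Box-Cox with lam = 0.5.
lam = 0.5
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [inv_boxcox(1.0 + 2.0 * xi, lam) for xi in x]

# Fit on the transformed scale: these coefficients are in transformed units.
z = [boxcox(yi, lam) for yi in y]
b0, b1 = fit_line(x, z)

# A prediction must be back-transformed before it is compared with process data.
x_new = 3.0
z_hat = b0 + b1 * x_new          # transformed units
y_hat = inv_boxcox(z_hat, lam)   # original units
print(b0, b1, z_hat, y_hat)
```

Here `b1` describes how the transformed response changes per unit of `x`. It is not a change in original units, which is why the coefficients cannot simply be pasted back into the process.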

Hope that helps.

Regards,
Erik

#164151

Robert Butler
Participant

The problem is that your question isn’t simple or clear.  As near as I can tell you are saying the following:
1. You have some data that you have determined, by methods unknown, to be non-normal.
2. You chose, for reasons unknown, to use the Box-Cox transform to transform the data to normal.
3. You chose to run a regression on this data (presumably this data was response data – the Y) against some unspecified X’s.
4. You are calling the act of regression a test – it isn’t. It is just a series of specified algebraic calculations made using the data.
5. You have the coefficients from this regression and you seem to want to use them for something – what this may be isn’t clear.
If, as Erik has been trying to elicit from you, your question is "Do I need to transform my Y values before running a regression?", then the answer is no, you do not.
If you are running a regression and you are trying to use the coefficients to give you some estimate of the impact of your X’s and your question is how to use the coefficients to determine significance of the relationship between the X’s and the Y’s then the answer is you need to look at the F statistics associated with the parameter estimates.
If it is something else then you will need to expand your explanation of your problem in order for anyone to offer a meaningful comment.

#164176

Carlos Palma
Participant

This is my model with the Box-Cox transformation:

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | -0,5323 | 0,1201 | -4,43 | 0,000 |
| Horas Inicio-Fin | 0,63305 | 0,02398 | 26,40 | 0,000 |
| Horas Inicio-Recep | 1,64047 | 0,09574 | 17,13 | 0,000 |
| Horas Entrega-Fin | 0,23132 | 0,02861 | 8,09 | 0,000 |

S = 0,198745   R-Sq = 90,9%   R-Sq(adj) = 90,7%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 3 | 54,348 | 18,116 | 458,63 | 0,000 |
| Residual Error | 137 | 5,411 | 0,039 | | |
| Total | 140 | 59,759 | | | |

The model without transformation, with the non-normal data:

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 0,1525 | 0,1345 | 1,13 | 0,259 |
| Horas Inicio Rep.-Recepción | 0,997422 | 0,007228 | 137,99 | 0,000 |
| Horas Inicio-Fin Rep. | 0,994358 | 0,004428 | 224,57 | 0,000 |
| Horas Entrega-Fin Rep. | 1,01085 | 0,00908 | 111,31 | 0,000 |

S = 0,951435   R-Sq = 99,9%   R-Sq(adj) = 99,9%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 3 | 101497 | 33832 | 37374,50 | 0,000 |
| Residual Error | 137 | 124 | 1 | | |
| Total | 140 | 101621 | | | |

| Source | DF | Seq SS |
|---|---|---|
| Horas Inicio Rep.-Recepción | 1 | 33535 |
| Horas Inicio-Fin Rep. | 1 | 56747 |
| Horas Entrega-Fin Rep. | 1 | 11215 |

And my question: which one is correct?
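
As a cross-check, the summary figures of the Box-Cox model posted above can be reproduced from the numbers in the post. The sketch below uses only values copied from that output, rewritten with decimal points; the variable names are illustrative.

```python
# Figures copied from the Box-Cox model output above (decimal commas -> points).
coefs = {
    "Constant": (-0.5323, 0.1201),
    "Horas Inicio-Fin": (0.63305, 0.02398),
    "Horas Inicio-Recep": (1.64047, 0.09574),
    "Horas Entrega-Fin": (0.23132, 0.02861),
}
ss_regression, df_regression = 54.348, 3
ss_residual, df_residual = 5.411, 137
ss_total = 59.759

# T is the coefficient divided by its standard error.
t_values = {name: coef / se for name, (coef, se) in coefs.items()}

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# Overall F compares the two mean squares; R-Sq is the explained share of SS.
f_value = ms_regression / ms_residual
r_sq = ss_regression / ss_total
# S is the square root of the residual mean square.
s = ms_residual ** 0.5

for name, t in t_values.items():
    print(f"{name}: T = {t:.2f}")
print(f"F = {f_value:.1f}, R-Sq = {100 * r_sq:.1f}%, S = {s:.4f}")
```

The recomputed values agree with the printed ones to within small differences, presumably from rounding in the printed table. None of this says which model is the better one; it only confirms that the output is internally consistent.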

#164187

GB said it best
Participant

All models are wrong, some are useful.
So both and neither are “correct”.
Neither are the real world, but….
One is “correct” in transformed space, the other in original unit space.

#164188

New ATI
Participant

Is it possible to explain the Box-Cox transformation in just a few simple sentences?
Many Thanks

#164192

Carlos Palma
Participant

But what I mean is: do you think the Box-Cox transformation is correct? Could the conclusion be that the regression coefficient with the most influence is 1.64? Does that mean we have to work on this X?
This is my question: can I trust this analysis?

#164198

Robert Butler
Participant

Thanks for the model posting – it clarifies the issue and the question.  The answer is – both, neither, or maybe one and not the other.  The reason this is the answer is that you can’t use what you have provided to answer the question.
You have run a regression with (I assume) the Y’s either transformed or not. You have two regression equations which, with their summary statistics, are telling you that all of the terms you have included are significant. There is the curious fact that the constant in the second model isn’t significant and I don’t quite understand why that should be so. Other than that there are some differences with respect to MSE and R² but given the fact that the data has been transformed this isn’t too surprising.
In order to determine which model may be the better fit (that is, the “correct” model) you are going to have to run a regression analysis – this means you are going to have to run a residual analysis.  This analysis will give you information about adequacy of fit, influential points that may be adversely impacting your regression, identify the impact of unknown variables or terms not included in your model, etc.  You will need to get a good book on regression analysis and read the chapter(s) on this subject.
A good place to start would be Applied Regression Analysis 2nd Edition – Draper and Smith.
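
To make the idea of a residual analysis concrete, here is a minimal Python sketch. It rests on several assumptions: the data are made up, the model has a single predictor for brevity (the posted models have three), and the function name `residual_diagnostics` is hypothetical. It computes residuals, leverage, standardized residuals and Cook's distance, which are standard quantities for spotting poor fit and influential points.

```python
import math

def residual_diagnostics(x, y):
    """Fit y = b0 + b1*x by least squares and return per-point diagnostics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar

    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    p = 2  # number of fitted parameters (intercept and slope)
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - p))

    rows = []
    for xi, e in zip(x, residuals):
        leverage = 1.0 / n + (xi - x_bar) ** 2 / sxx
        # Internally studentized residual: residual scaled by its own std. error.
        std_resid = e / (s * math.sqrt(1.0 - leverage))
        # Cook's distance combines residual size and leverage.
        cooks_d = std_resid ** 2 * leverage / (p * (1.0 - leverage))
        rows.append((e, leverage, std_resid, cooks_d))
    return b0, b1, rows

# Made-up data (assumption): a straight line with small noise, last point shifted by +8.
x = [float(i) for i in range(1, 11)]
noise = [0.1, -0.1] * 5
y = [1.0 + 2.0 * xi + ni for xi, ni in zip(x, noise)]
y[-1] += 8.0

b0, b1, rows = residual_diagnostics(x, y)
for i, (e, h, r, d) in enumerate(rows):
    flag = "  <- check this point" if d > 1.0 or abs(r) > 2.0 else ""
    print(f"{i}: resid={e:7.3f} leverage={h:.3f} std_resid={r:6.2f} cook={d:.3f}{flag}")
```

The thresholds used for the flag (Cook's distance above 1, standardized residual beyond ±2) are common rules of thumb, not hard limits. In practice you would also plot the residuals against the fitted values and against each X to look for curvature or changing spread.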

#164287

Carlos Palma
Participant

The Box-Cox transformation is a Six Sigma tool, used only when you have non-normal data. With it you can transform the data to normal, and that way you can use other Six Sigma tools that require normality of the data.
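
For reference, the Box-Cox transform of a positive value y is (y^λ − 1)/λ when λ ≠ 0 and ln(y) when λ = 0, and λ is usually chosen to maximise a profile log-likelihood. The following is a minimal Python sketch of that idea, assuming made-up data and a simple grid search. The function names are hypothetical, and it is not meant to reproduce what any particular statistics package does internally.

```python
import math
from statistics import NormalDist

def boxcox(y, lam):
    """Box-Cox transform of a list of positive values."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def boxcox_loglik(y, lam):
    """Profile log-likelihood of lambda (up to an additive constant)."""
    n = len(y)
    z = boxcox(y, lam)
    z_bar = sum(z) / n
    var = sum((v - z_bar) ** 2 for v in z) / n  # maximum-likelihood variance
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in y)

def best_lambda(y, grid):
    """Pick the grid value of lambda with the highest log-likelihood."""
    return max(grid, key=lambda lam: boxcox_loglik(y, lam))

# Made-up data (assumption): exponentiated normal quantiles, i.e. right-skewed
# values whose logarithm is symmetric, so a lambda near 0 is expected.
n = 51
quantiles = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
y = [math.exp(q) for q in quantiles]

grid = [k / 10 for k in range(-20, 21)]  # -2.0, -1.9, ..., 2.0
lam_hat = best_lambda(y, grid)
print("chosen lambda:", lam_hat)
```

A chosen λ close to 0 corresponds to a log transform, λ = 0.5 to a square root, and λ = 1 to leaving the data essentially unchanged.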

#164587

Vallejo
Participant

Hello, I would like to understand exactly what the residuals in a regression analysis are for. What information can they give me? What factors do I have to study?