Non-normal data

iSixSigma Forums, Old Forums, General

Viewing 10 posts - 1 through 10 (of 10 total)

    Carlos Palma

    My question is very simple: is the approach I have used with my non-normal data valid? First I used a Box-Cox transformation to make the data normal, then I ran a regression, and now I am using the values of the regression coefficients obtained from it. Is that correct?
    Thank you!!


    Erik L

    If the transformations were done to the data before you conducted the regression, simply because you saw non-normality, then I would say that the transform equation obtained (and its associated coefficients) is suspect. We typically don't transform the x and/or y unless there are warning signals from the analysis of the residuals and other associated diagnostics. I would suggest putting the data into the analysis in an unadulterated fashion and assessing from there.
    Another item to ponder is the usability of the transform equation that you currently have. Those factors, and specifically the coefficients, were created by an artificial manipulation of the data. The process the model is going to be used against does not use the transformation, so you need to ensure that those coefficients are back-transformed to match the data. It would be similar to taking a process that runs in Celsius, converting the data to Kelvin, and then developing your models and sensitivities. You couldn't simply cut and paste the answer back into a process that uses data on a different scale. You'd be trying to mix apples and oranges.
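    To illustrate the back-transformation point, here is a minimal sketch assuming a Box-Cox lambda that is already known (lambda = 0, i.e. a log transform) and a hypothetical, noise-free data set. The helper names box_cox, inv_box_cox and fit_ols are made up for this example. The fitted coefficients describe log(y), so a prediction has to be mapped back through the inverse transform before it means anything in the process's own units.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform of strictly positive y for a fixed lambda."""
    y = np.asarray(y, dtype=float)
    if abs(lam) < 1e-12:
        return np.log(y)
    return (y**lam - 1.0) / lam

def inv_box_cox(z, lam):
    """Inverse Box-Cox: map values from transformed space back to original units."""
    z = np.asarray(z, dtype=float)
    if abs(lam) < 1e-12:
        return np.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

def fit_ols(X, y):
    """Ordinary least squares with an intercept column; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Hypothetical data: log(y) is exactly linear in x, so lambda = 0 linearizes it.
x = np.linspace(1.0, 5.0, 20)
y = np.exp(0.5 + 0.3 * x)

lam = 0.0
coef_t = fit_ols(x, box_cox(y, lam))       # coefficients live in transformed (log) units
pred_t = coef_t[0] + coef_t[1] * 4.0       # prediction at x = 4, still in log units
pred_original = inv_box_cox(pred_t, lam)   # back-transformed to the original y units

print(coef_t)          # approximately [0.5, 0.3]
print(pred_original)   # approximately exp(1.7), about 5.474
# In original units the effect of x is not a constant 0.3: here dy/dx = 0.3 * y.
```

    With real, noisy data the back-transformed prediction from a log-scale fit estimates something closer to the median of y than its mean, which is one more reason to check what the transformed model actually means before using it.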
    Hope that helps.


    Robert Butler

    The problem is that your question isn't simple or clear. As near as I can tell, you are saying the following:
    1. You have some data that you have determined, by methods unknown, to be non-normal.
    2. You chose, for reasons unknown, to use the Box-Cox transform to transform the data to normal.
    3. You chose to run a regression on this data (presumably this data was response data – the Y) against some unspecified X’s.
    4. You are calling the act of regression a test – it isn’t. It is just a series of specified algebraic calculations made using the data.
    5. You have the coefficients from this regression and you seem to want to use them for something – what this may be isn’t clear.
    If, as Erik has been trying to elicit from you, your question is "Do I need to transform my Y values before running a regression?", then the answer is no, you do not.
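    The reason a non-normal Y does not by itself call for a transform is that the usual regression assumption concerns the errors, which you estimate with the residuals, not the raw Y. A minimal sketch with hypothetical data: X is strongly right-skewed (exponential quantiles), so Y inherits that skew, yet the model is linear with normal errors and the residuals come out roughly symmetric. The skewness helper and all data values are assumptions made for this illustration.

```python
import statistics
import numpy as np

def skewness(v):
    """Sample skewness g1 = m3 / m2**1.5 (population-moment version)."""
    v = np.asarray(v, dtype=float)
    d = v - v.mean()
    return float(np.mean(d**3) / np.mean(d**2) ** 1.5)

n = 60
p = (np.arange(n) + 0.5) / n
# Hypothetical predictor with a strongly right-skewed distribution (exponential quantiles).
x = -np.log(1.0 - p)
# Hypothetical normal errors: standard-normal quantiles placed in a seeded random order.
z = np.array([statistics.NormalDist().inv_cdf(pi) for pi in p])
noise = 0.5 * np.random.default_rng(0).permutation(z)

y = 2.0 + 3.0 * x + noise    # the model itself is linear with normal errors

A = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ coef

print(f"skewness of Y:         {skewness(y):.2f}")      # clearly positive: Y inherits the skew of X
print(f"skewness of residuals: {skewness(resid):.2f}")  # near zero: nothing here calls for a transform
```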
    If you are running a regression and trying to use the coefficients to estimate the impact of your X's, and your question is how to determine the significance of the relationship between the X's and the Y, then the answer is that you need to look at the F statistics (or, equivalently for a single term, the t statistics) associated with the parameter estimates.
    If it is something else, then you will need to expand your explanation of the problem before anyone can offer a meaningful comment.


    Carlos Palma

    This is my model:
    Predictor              Coef     SE Coef      T       P
    Constant            -0.5323     0.1201    -4.43   0.000
    Horas Inicio-Fin     0.63305    0.02398   26.40   0.000
    Horas Inicio-Recep   1.64047    0.09574   17.13   0.000
    Horas Entrega-Fin    0.23132    0.02861    8.09   0.000
    S = 0.198745   R-Sq = 90.9%   R-Sq(adj) = 90.7%
    Analysis of Variance
    Source           DF       SS       MS        F       P
    Regression        3   54.348   18.116   458.63   0.000
    Residual Error  137    5.411    0.039
    Total           140   59.759
    That is my model with the Box-Cox transformation.
    This is the model without the transformation, using the non-normal data:
    Predictor                       Coef      SE Coef       T       P
    Constant                      0.1525     0.1345      1.13   0.259
    Horas Inicio Rep.-Recepción   0.997422   0.007228  137.99   0.000
    Horas Inicio-Fin Rep.         0.994358   0.004428  224.57   0.000
    Horas Entrega-Fin Rep.        1.01085    0.00908   111.31   0.000
    S = 0.951435   R-Sq = 99.9%   R-Sq(adj) = 99.9%
    Analysis of Variance
    Source           DF       SS      MS          F       P
    Regression        3   101497   33832   37374.50   0.000
    Residual Error  137      124       1
    Total           140   101621
    Source                        DF   Seq SS
    Horas Inicio Rep.-Recepción    1    33535
    Horas Inicio-Fin Rep.          1    56747
    Horas Entrega-Fin Rep.         1    11215
    And my question: which one is correct?
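    The T column in these tables is simply each coefficient divided by its standard error, and for a single term the partial F statistic (1 and 137 DF here) is that t value squared. The sketch below recomputes them from the Box-Cox model's posted numbers. It is a plain arithmetic check, not a re-fit of the data; note that the size of a coefficient alone (such as 1.64) depends on the units of its X and, here, on the transformed scale of Y, so it is not by itself a measure of influence.

```python
# Coefficients and standard errors copied from the Box-Cox model table above.
terms = [
    ("Constant",           -0.5323,  0.1201),
    ("Horas Inicio-Fin",    0.63305, 0.02398),
    ("Horas Inicio-Recep",  1.64047, 0.09574),
    ("Horas Entrega-Fin",   0.23132, 0.02861),
]

t_values = {}
for name, coef, se in terms:
    t = coef / se          # the T column: estimate divided by its standard error
    t_values[name] = t
    # For a single term, the partial F statistic (1 and 137 DF) equals t squared.
    print(f"{name:20s} coef={coef:8.5f}  t={t:7.2f}  partial F={t * t:8.2f}")
```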


    GB said it best

    All models are wrong, some are useful.
    So both and neither are “correct”.
    Neither is the real world, but...
    One is “correct” in transformed space, the other in original unit space.


    New ATI

    Is it possible to explain the Box-Cox transformation in just a few simple sentences?
    Many Thanks


    Carlos Palma

    But then, do you think the Box-Cox transformation is correct? Could the conclusion be that the regression coefficient with the most influence is 1.64, and that this means we have to work on this X?
    That is my question: can I trust this analysis?


    Robert Butler

      Thanks for the model posting – it clarifies the issue and the question. The answer is: both, neither, or maybe one and not the other. The reason is that what you have provided is not enough to answer the question.
      You have run two regressions, with (I assume) the Y's transformed in one and untransformed in the other. You have two regression equations which, with their summary statistics, are telling you that all of the terms you have included are significant. There is the curious fact that the constant in the second model isn't significant, and I don't quite understand why that should be so. Other than that, there are some differences with respect to MSE and R², but given that the data has been transformed this isn't too surprising.
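    Why the MSE and R² differences are expected: both statistics are computed on whatever Y was fitted, so S = sqrt(MSE) is about 0.199 in Box-Cox units for one model and about 0.951 hours for the other, and the two cannot be compared directly. A minimal sketch that recomputes the summary lines from the posted ANOVA tables (the helper name anova_summary is made up for this example; results differ slightly in the last digits from the printed output because the posted SS values are rounded):

```python
import math

def anova_summary(ss_reg, ss_res, df_reg, df_res):
    """Recompute Minitab-style summary statistics from an ANOVA table."""
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res              # MSE, in squared units of the fitted response
    ss_tot = ss_reg + ss_res
    df_tot = df_reg + df_res
    return {
        "F": ms_reg / ms_res,
        "S": math.sqrt(ms_res),           # residual standard deviation, in units of the response
        "R-Sq": ss_reg / ss_tot,
        "R-Sq(adj)": 1.0 - ms_res / (ss_tot / df_tot),
    }

# SS and DF values copied from the two ANOVA tables in the thread.
boxcox_model = anova_summary(54.348, 5.411, 3, 137)    # response in Box-Cox units
original_model = anova_summary(101497, 124, 3, 137)    # response in hours

for name, stats in [("Box-Cox", boxcox_model), ("original", original_model)]:
    print(name, {k: round(v, 4) for k, v in stats.items()})
```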
      In order to determine which model may be the better fit (that is, the “correct” model), you are going to have to run a proper regression analysis, and that means running a residual analysis. This analysis will give you information about the adequacy of the fit, identify influential points that may be adversely affecting your regression, reveal the impact of unknown variables or terms not included in your model, etc. You will need to get a good book on regression analysis and read the chapter(s) on this subject.
      A good place to start would be Draper and Smith, Applied Regression Analysis, 2nd edition.
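    A residual analysis usually starts from a few per-observation quantities: the residuals themselves, the leverage of each point (the diagonal of the hat matrix), standardized residuals, and Cook's distance, which combines the two to flag influential points. A minimal numpy sketch, assuming a single hypothetical X and a deliberately injected gross error at index 15; the function name residual_diagnostics is made up for this example.

```python
import numpy as np

def residual_diagnostics(X, y):
    """OLS fit plus basic residual diagnostics (X given without the intercept column)."""
    A = np.column_stack([np.ones(len(y)), X])
    n, p = A.shape
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    resid = y - fitted
    # Leverage: diagonal of the hat matrix H = A (A'A)^-1 A'.
    H = A @ np.linalg.inv(A.T @ A) @ A.T
    h = np.diag(H)
    s2 = resid @ resid / (n - p)                  # MSE
    r = resid / np.sqrt(s2 * (1.0 - h))           # internally studentized (standardized) residuals
    cooks = r**2 * h / (p * (1.0 - h))            # Cook's distance
    return fitted, resid, h, r, cooks

# Hypothetical data: a straight line with a small smooth wiggle and one gross error.
x = np.linspace(0.0, 10.0, 21)
y = 1.0 + 2.0 * x + 0.2 * np.sin(3.0 * x)
y[15] += 5.0

fitted, resid, h, r, cooks = residual_diagnostics(x, y)
worst = int(np.argmax(cooks))
print("most influential observation:", worst)
print("standardized residual:", round(float(r[worst]), 2), " Cook's D:", round(float(cooks[worst]), 2))
```

    In practice these numbers are read alongside plots of residuals against fitted values and against each X, plus a normal probability plot of the residuals. Commonly cited rules of thumb (for example, standardized residuals beyond about 2 or 3, or leverage above about 2p/n) are starting points for a closer look, not automatic grounds for deleting data.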


    Carlos Palma

    The Box-Cox transformation is a Six Sigma tool used only when you have non-normal data: you transform the data toward normality with Box-Cox, and in this way you can use the other Six Sigma tools that require normally distributed data.
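    In a few sentences: for strictly positive data, Box-Cox replaces y with (y^λ - 1)/λ, or with ln(y) when λ = 0, and picks λ as the value that maximizes a normal log-likelihood that includes a Jacobian term for the change of scale. A minimal sketch of that maximum-likelihood search over a grid, assuming hypothetical lognormal-shaped data (for which λ should come out near 0, i.e. a log transform). Statistical packages may search differently or round λ to a convenient value such as 0 or 0.5, so treat this as an illustration rather than any package's exact procedure.

```python
import statistics
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform of strictly positive y for a fixed lambda."""
    y = np.asarray(y, dtype=float)
    if abs(lam) < 1e-12:
        return np.log(y)
    return (y**lam - 1.0) / lam

def box_cox_loglik(y, lam):
    """Profile log-likelihood of lambda under normality (additive constants dropped)."""
    y = np.asarray(y, dtype=float)
    z = box_cox(y, lam)
    n = len(y)
    sigma2 = np.mean((z - z.mean()) ** 2)           # MLE variance of the transformed data
    # The (lam - 1) * sum(log y) term is the Jacobian of the transformation.
    return -0.5 * n * np.log(sigma2) + (lam - 1.0) * np.sum(np.log(y))

def best_lambda(y, grid=np.linspace(-2.0, 2.0, 401)):
    """Grid search for the lambda that maximizes the profile log-likelihood."""
    ll = [box_cox_loglik(y, lam) for lam in grid]
    return float(grid[int(np.argmax(ll))])

# Hypothetical right-skewed data: exp of evenly spaced standard-normal quantiles.
n = 50
p = (np.arange(n) + 0.5) / n
z = np.array([statistics.NormalDist().inv_cdf(pi) for pi in p])
y = np.exp(1.0 + 0.5 * z)

lam = best_lambda(y)
print("estimated lambda:", round(lam, 2))          # close to 0, suggesting a log transform
y_transformed = box_cox(y, lam)
```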



    Hello, I would like to know more about the residuals in regression analysis. What information can they provide, and what factors do I have to study?
    Thanks for your help

