Box Cox transformation




    Good day,
    I have transformed some data with a Box-Cox transformation, but lambda is not between -5 and 5. Can I use the transformed data to calculate the Cp, Cpk, Pp and Ppk indices? Thanks.



    Does your transformed data pass a normality test? If not, don’t use it.

    There are many causes of non-normality. Most have to do with a poorly controlled process, and most are not suitable for transformation. Understand the underlying reason for the non-normality instead and fix the issue.
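    The normality check asked for above can be sketched in a few lines. This is only an illustration using the Anderson-Darling statistic with the mean and standard deviation estimated from the data; the function names, the minimum sample size of 8, and the approximate 5 % critical value of 0.752 for the adjusted statistic are assumptions of this sketch, not something stated in the thread, and a statistics package will report a proper p-value.

```python
import math
from statistics import NormalDist, mean, stdev

# Approximate 5 % critical value for the adjusted statistic (assumed here).
AD_CRITICAL_5PCT = 0.752


def anderson_darling_normal(data):
    """Adjusted Anderson-Darling statistic for normality.

    The mean and standard deviation are estimated from the data.
    """
    x = sorted(data)
    n = len(x)
    if n < 8:
        # The small-sample adjustment below is usually quoted for n >= 8.
        raise ValueError("need at least 8 observations")
    m, s = mean(x), stdev(x)
    std_normal = NormalDist()
    # Normal CDF at each standardized, sorted observation.
    cdf = [std_normal.cdf((v - m) / s) for v in x]
    total = 0.0
    for i in range(n):
        # Pair the i-th smallest observation with the i-th largest one.
        total += (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
    a2 = -n - total / n
    # Adjustment for estimated parameters.
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)


def looks_normal(data):
    """True if normality is not rejected at roughly the 5 % level."""
    return anderson_darling_normal(data) < AD_CRITICAL_5PCT
```

    Run it on the transformed values, not the raw ones: if `looks_normal` returns False, the transformation has not done its job.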



    Good Morning:

    Data can take many different “shapes”; that is why we have distributions. Typically, the first thing we test for is normality, and there are three common tests for it, i.e. Anderson-Darling, Ryan-Joiner and Kolmogorov-Smirnov. If the tests fail, we then check for stratification (split or layered data). If that is not the reason for the non-normality, the next option is to find out which distribution the data come from (e.g. exponential: compound interest, acceleration, etc.). Failing that, the logical sequence suggests trying the Box-Cox transformation. Lambda does go from -5 to 5, with each value representing a different transformation: every data value is raised to the power lambda, except for lambda = 0, which corresponds to the natural (Napierian) logarithm. Some software lets you set lambda yourself, so you could also try different lambda values on your data set and test each result for normality.
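    To make the lambda search concrete, here is a minimal sketch that scans lambda over -5 to 5 and keeps the value with the highest profile log-likelihood. It uses the textbook form (x^lambda - 1) / lambda, which differs from the plain power x^lambda only by a shift and a scale, so normality is unaffected. The function names, the grid step of 0.1 and the likelihood criterion are assumptions for illustration; your software may choose lambda by a different rule.

```python
import math


def box_cox(data, lam):
    """Box-Cox transform of strictly positive data for a given lambda."""
    if any(v <= 0 for v in data):
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lam) < 1e-12:
        # lambda = 0 is the natural logarithm.
        return [math.log(v) for v in data]
    return [(v ** lam - 1.0) / lam for v in data]


def profile_log_likelihood(data, lam):
    """Normal log-likelihood of the transformed data, up to a constant."""
    n = len(data)
    y = box_cox(data, lam)
    y_mean = sum(y) / n
    # Maximum-likelihood variance (divides by n); fails if all values are equal.
    var = sum((v - y_mean) ** 2 for v in y) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in data)


def best_lambda(data, low=-5.0, high=5.0, steps=100):
    """Grid search for the lambda with the highest profile log-likelihood."""
    grid = [low + (high - low) * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda lam: profile_log_likelihood(data, lam))
```

    If the best value sits right at -5 or 5, the search has hit the edge of the grid, which is a hint that a power transformation may not suit the data.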

    The final choice is the Johnson transformation. Either way, you can in most cases determine process capability. Some transformed data sets do not fit well, but we can cross that bridge later. Minitab is a great tool to assist you.
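    One detail that is easy to miss when computing capability on transformed data: the specification limits have to go through the same transformation as the measurements. A rough sketch, assuming a two-sided specification, a lambda already chosen, and the overall standard deviation (so it gives Pp and Ppk; Cp and Cpk would need a within-subgroup sigma instead). The function names are made up for this example.

```python
import math
from statistics import mean, stdev


def box_cox_value(x, lam):
    """Box-Cox transform of one strictly positive value."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    return math.log(x) if abs(lam) < 1e-12 else (x ** lam - 1.0) / lam


def transformed_pp_ppk(data, lsl, usl, lam):
    """Pp and Ppk computed on the Box-Cox scale."""
    y = [box_cox_value(v, lam) for v in data]
    # The limits are transformed with the same lambda as the data.
    y_lsl = box_cox_value(lsl, lam)
    y_usl = box_cox_value(usl, lam)
    m, s = mean(y), stdev(y)  # overall (long-term) standard deviation
    pp = (y_usl - y_lsl) / (6.0 * s)
    ppk = min(y_usl - m, m - y_lsl) / (3.0 * s)
    return pp, ppk
```

    This form of the transform is increasing for every lambda, so the lower limit stays below the upper limit after transforming.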




    I would agree with Stan and avoid the complexity of Mr Woodley’s suggestion for the moment. Simply plot the data and (assuming you have already established its stability) view the distribution’s shape relative to the process / data type.

    Maybe you are working with cycle times, pricing data, etc., where it isn’t reasonable to expect normality. You can do everything you want to do (e.g. baseline your capacity) without transformation.
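    As one possible reading of "without transformation", a baseline can be as simple as counting what actually falls outside the specification. The post does not say which method it has in mind, so treat this as an assumption; the function name and the returned keys are invented for the example.

```python
def observed_performance(data, lsl=None, usl=None):
    """Count observations outside the limits and express them as PPM."""
    n = len(data)
    if n == 0:
        raise ValueError("no data")
    # A missing limit (None) simply contributes no defects.
    below = sum(1 for v in data if lsl is not None and v < lsl)
    above = sum(1 for v in data if usl is not None and v > usl)
    return {
        "below_lsl": below,
        "above_usl": above,
        "ppm_total": 1_000_000 * (below + above) / n,
    }
```

    This makes no distributional assumption at all, but it needs a reasonably large and stable sample before the PPM figure means much.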

    Simpler is better, from both an analytics and a communications perspective. Analytically, you are adding complexity where it isn’t needed, and from a communications standpoint, if your stakeholders’ eyes haven’t glazed over after you explained “sigma”, they certainly will once you throw Box-Cox or Johnson transformations at them.

    In truth, making the complicated simpler is our real job, but that is just my “two cents”. Good luck.
