    I am working on a project to improve our process for generating ID cards. Over the past three years we have recorded, by date, the number of ID cards generated in error (n=40).

    The collected data was tested and found to be non-normal. Our next step was to divide the data into about 4 subgroups of about 10 observations each and take the mean of each subgroup. This data set was found to be non-normal as well.

    Next I took the subgroup means and applied a Box-Cox transformation, and the transformed data is now normal. I used this approach because my software does not offer a Weibull distribution.

    The lambda used was 0.5.
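
    For reference, a Box-Cox transformation with lambda = 0.5 is a shifted and scaled square root: y = (x^0.5 - 1) / 0.5. Below is a minimal sketch of that formula, assuming strictly positive inputs (Box-Cox is undefined for zeros, which count data can contain). The function name box_cox and the sample values are hypothetical and are not the data described above.

```python
import math


def box_cox(x: float, lam: float) -> float:
    """Box-Cox transform of one strictly positive value."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        # The limiting case of the formula as lambda approaches 0
        return math.log(x)
    return (x ** lam - 1.0) / lam


# Hypothetical subgroup means, for illustration only
subgroup_means = [1.0, 4.0, 9.0, 16.0]
transformed = [box_cox(m, 0.5) for m in subgroup_means]
print(transformed)
```

    With only about 4 subgroup means, a normality test has very little power, so passing it after the transformation says little either way.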

    After all of this data transformation, can the results be trusted as accurate? Should some other method be used to transform the data?

    Thanks for any help on this topic.



    Hi gsx,

    Why did you do this? I.e., why do you want to make the data ‘Normal’?
    It doesn’t help you solve the problem.
    Count data will not, in general, be normally distributed.

    And no: I would not trust any data manipulated in this way.





    I also would not trust anyone who taught you that all this data manipulation was the right thing to do.

    What kind of doofus would teach you this?

    Generate a Pareto chart of the causes of the incorrect IDs and do something about them. Done.

    Who gives a rat’s axx about it being normal? The only thing you care about is that the causes are addressed and the incidence of incorrect IDs is dramatically reduced.
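
    A rough sketch of the Pareto idea, assuming each incorrect ID has been logged with a cause. The cause labels and counts below are hypothetical placeholders, not real data from this project, and pareto_table is an illustrative helper name.

```python
from collections import Counter

# Hypothetical error log: one entry per incorrect ID card (placeholder data)
error_causes = (
    ["wrong photo"] * 5
    + ["misspelled name"] * 3
    + ["wrong expiry date"] * 2
)


def pareto_table(causes):
    """Return (cause, count, cumulative percent) rows, most frequent first."""
    counts = Counter(causes)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for cause, n in counts.most_common():
        cumulative += n
        rows.append((cause, n, 100.0 * cumulative / total))
    return rows


for cause, n, cum_pct in pareto_table(error_causes):
    print(f"{cause:20s} {n:3d} {cum_pct:6.1f}%")
```

    The causes at the top of the table are the ones to attack first.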



    Dear gsx,

    I think it would be more convenient to use the median to estimate the population parameter.

    The median is used in non-parametric methods, where we don’t know the distribution of the data.

    The median is also more robust than the mean when the data is non-normal.

    Example:

    Class A vs. class B

    (scale of 1 to 100)
    Scores of students in class A: 10, 10, 10, 10, 100
    mean: 28, median: 10

    Scores of students in class B: 10, 20, 40, 20, 40
    mean: 26, median: 20

    If we use the mean, we conclude that class A scored higher than class B (28 vs. 26), but if we use the median, class B scored higher than class A (20 vs. 10). The single score of 100 pulls the mean of class A up, while the median is unaffected by it.
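
    The comparison above can be checked with a short sketch using Python's standard statistics module. The variable names class_a and class_b are just illustrative labels for the two score lists given in the example.

```python
from statistics import mean, median

# Scores from the example above
class_a = [10, 10, 10, 10, 100]
class_b = [10, 20, 40, 20, 40]

for name, scores in (("A", class_a), ("B", class_b)):
    # The mean is pulled by extreme values; the median is not
    print(f"class {name}: mean={mean(scores)}, median={median(scores)}")
```

    The two summaries rank the classes in opposite order, which is the point of the example: one extreme value is enough to move the mean.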

