iSixSigma

Non-normal Data Transformation


Viewing 13 posts - 1 through 13 (of 13 total)
  • #51052

    Mike A Jr
    Participant

    Hi all,
    I am updating a capability study on a measurement and the p-value on a normality test is < .005.  The measurement is crimp height on a terminal with a spec of .250" +/- .004".  The data should be normal, but the sample size and lack of discrimination (only four different measurements over the 30 samples: .2495, .2500, .2505 and .2510) cause it to fail the normality test.  I'm sure if I could measure the parts at .00001" precision it would test as normal data.
    How do I transform this data so the capability study can be performed?
    Thanks,
    -Mike A.
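
    [Editor's note: a quick sketch, with hypothetical process parameters, of the effect Mike describes: a truly normal process, measured with a gauge that only resolves .0005", collapses into a handful of distinct values and can fail a normality test on the rounded readings alone.]

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    # Assume a normal crimp-height process centered near the nominal .250"
    # (hypothetical mean and sigma, chosen only for illustration)
    raw = rng.normal(loc=0.2502, scale=0.0004, size=30)

    # Round to the gauge's .0005" resolution, as in the data described above
    resolution = 0.0005
    measured = np.round(raw / resolution) * resolution

    print("distinct measured values:", np.unique(measured))
    print("Shapiro-Wilk p, raw:     ", stats.shapiro(raw).pvalue)
    print("Shapiro-Wilk p, measured:", stats.shapiro(measured).pvalue)
    ```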

    0
    #176351

    Darth
    Participant

    You don’t!!!!!  It is meaningless since you basically have discrete data.  Forget trying to do this as continuous data and report out any capability in a discrete format.  At this point all you have is pass/fail data.

    0
    #176355

    Shereen Mosallam
    Member

    Your problem is clearly MSA. You need to do an MSA study, as the data points you mentioned are out of spec. So before you conclude you have a problem, you need to validate your MSA with better-precision equipment, as your process range of variation is much wider than the specs. Shereen Mosallam, MBB

    0
    #176365

    Mike A Jr
    Participant

    Thanks for the input.
    An MSA was already completed.  The GRR results were acceptable.  13.62% (%tol) with NDC = 24.
    I believe it is a sample size issue more than anything.  I bet with 100 samples it would pass the normality test.
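
    [Editor's note: for readers unfamiliar with the %tol figure quoted above, here is a minimal sketch of how it is typically computed from a Gauge R&R study. The sigma value is hypothetical, not Mike's actual study data.]

    ```python
    # %Tolerance: the 6-sigma spread of the measurement system
    # as a fraction of the tolerance band
    sigma_grr = 0.00018        # hypothetical measurement-system (R&R) std dev, inches
    usl, lsl  = 0.254, 0.246   # .250" +/- .004"

    pct_tol = 100.0 * (6.0 * sigma_grr) / (usl - lsl)
    print(f"%Tol = {pct_tol:.2f}%")
    ```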

    0
    #176367

    Darth
    Participant

    How much ya willing to bet?  Stan will probably want some of this action.  If you only have 4 categories of measurements, it is discrete no matter how many you collect.  Showing a high p value will be a trick even if you are able to get it.  Either get some resolution or forget about trying to use continuous tools.  You are a tool in search of an application which is an incorrect approach.

    0
    #176368

    annon
    Participant

    “you are a tool in search of an application”….I get it….I will so be using that one in the future….very nice!

    0
    #176370

    Remi
    Participant

    Hmm Darth,
    he is maybe thinking about “taking a large dataset and then subgrouping/stratifying it into groups such that the means of the groups will approximate a normal distribution” (just an assumption of course).
    This trick will work, but it is just that: a trick to present data that satisfies a normal distribution.
    I agree that if there are so few different possible values the data should be treated as noncontinuous.
    But the range of the data is 0.2510 − 0.2495 = 0.0015 (up to about 0.0020 allowing ±0.00025 for rounding off), while the customer(?) spec is +/- 0.004, a total width of 0.008. So the data range fits about 4x inside the specs. With normally distributed data this would mean a Cp of about 4. So where is the problem that should be solved?
    Mike A Jr: I recommend that you plot the data you have in a time series chart and include the target and specs. This way you can see how your data behaves 1: in time and 2: compared to the customer's requirement. And there you have your capability study.
    Good luck
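
    [Editor's note: Remi's back-of-the-envelope arithmetic can be sketched as follows, treating the observed range (plus rounding error) as a rough stand-in for the process spread, which is an optimistic simplification.]

    ```python
    # Rough capability arithmetic from the values quoted in the thread
    usl, lsl = 0.254, 0.246          # .250" +/- .004"
    observed = [0.2495, 0.2500, 0.2505, 0.2510]

    data_range = max(observed) - min(observed)   # observed spread
    worst_case = data_range + 0.0005             # allow +/- .00025" rounding error
    ratio = (usl - lsl) / worst_case             # how many times the spread fits in the specs

    print(f"range = {data_range:.4f}, spec width = {usl - lsl:.3f}, fits ~{ratio:.1f}x")
    ```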

    0
    #176372

    Robert Butler
    Participant

      As Darth has noted, if you are offering bets you will get a lot of takers and you will wind up paying out a lot of money for the reasons already cited. 
      I think you are laboring under the mistaken belief that if you collect enough individual samples you will eventually get a normal distribution.  This is a common misinterpretation of the Central Limit Theorem – which does not apply to distributions of individual measurements but to distributions of averages of measurements.
      Now, if you really want to cheat and fool the tests for normality you would want to go in the opposite direction – smaller numbers of cherry picked samples.  For example, if you run 20 tests and you get a spread of the 4 numbers you have you could cherry pick 8 of them so that your choices looked like the following:
    .2495
    .2500 .2500 .2500 .2500
    .2505 .2505
    .2510
    This distribution will pass all of the normality tests.
    Shapiro-Wilk            P = .32
    Kolmogorov-Smirnov D    P = .054
    Cramer-von Mises        P = .091
    Anderson-Darling        P = .138
      Obviously the above is creative science (cheating) at its worst.  However, if you want to improve your understanding of the issues then rather than betting I’d recommend you take the 4 values you have, set up different choices of the 4 numbers as above and play with these data sets in whatever package you have and see what happens.
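
    [Editor's note: Robert's cherry-picked set can be checked in any stats package; here is a sketch using scipy. Exact p-values may differ slightly from the figures he quotes depending on the package's implementation.]

    ```python
    from scipy import stats

    # The cherry-picked 8-point sample from the post above
    cherry = [0.2495,
              0.2500, 0.2500, 0.2500, 0.2500,
              0.2505, 0.2505,
              0.2510]

    w, p = stats.shapiro(cherry)
    print(f"Shapiro-Wilk: W = {w:.3f}, p = {p:.3f}")

    # Anderson-Darling: scipy reports a statistic plus critical values
    # rather than a p-value for the normal case
    ad = stats.anderson(cherry, dist="norm")
    print("Anderson-Darling statistic:", round(ad.statistic, 3))
    ```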

    0
    #176374

    Mike A Jr
    Participant

    Thanks for the replies… I think.
    I just got a reply from one of our master blackbelts and I’ll go with his recommendations.
    If the approximation for the standard deviation seems accurate (as indicated by the fit of the curve to the histogram), I would probably use the data as is to establish capability. If there is a big difference between the two, the overall (Pp, Ppk) values are probably better, as they represent the “average” variation from the mean and are less likely to be influenced by discrimination issues; the within (Cp, Cpk) values use the average moving range, which is more influenced by discrimination issues.
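
    [Editor's note: the distinction the master black belt draws can be sketched numerically. The 30-point sample below is hypothetical, with counts chosen only for illustration, not Mike's actual data; note how the within-sigma estimate rests entirely on the average moving range, which resolution limits distort.]

    ```python
    import numpy as np

    rng = np.random.default_rng(7)

    # Hypothetical rounded crimp-height sample, shuffled into a random
    # run order so the moving range reflects point-to-point variation
    data = rng.permutation(
        np.array([0.2495]*4 + [0.2500]*12 + [0.2505]*10 + [0.2510]*4))
    usl, lsl = 0.254, 0.246

    # Overall sigma (all variation) -> Pp/Ppk
    sigma_overall = data.std(ddof=1)

    # Within sigma from the average moving range (d2 = 1.128 for n = 2) -> Cp/Cpk
    mr_bar = np.mean(np.abs(np.diff(data)))
    sigma_within = mr_bar / 1.128

    mean = data.mean()
    ppk = min(usl - mean, mean - lsl) / (3 * sigma_overall)
    cpk = min(usl - mean, mean - lsl) / (3 * sigma_within)
    print(f"Ppk = {ppk:.2f}  Cpk = {cpk:.2f}")
    ```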

    0
    #176376

    Mike A Jr
    Participant

    Thanks for the info.
    As for betting… No thanks, I’m not enough of a stats geek to bet on stuff like that.  Football, however, is a different story.

    0
    #176381

    KennySky
    Participant

    A discrete process capability won’t work for you since you would have 0 defects. The nature of your data is continuous even though you don’t have enough precision to measure it adequately.
     If you don’t mind my asking, what is your kurtosis with the values you do have?

    0
    #176384

    Shereen Mosallam
    Member

    I am not very sure how you can have this high a number of distinct categories and low resolution. The equation for NDC is approximately 1.41 * (s_part / s_R&R),
    so in your MSA study you might have taken a wide range of parts, which masked the R&R problem and falsely gave a high NDC.
    Sample size is an issue, I agree. Your data can't be transformed, as it is too discrete as well. So you need more data, and if all your measurements remain too discrete then you have two solutions:
    1. Get a better measuring tool (which I believe you should do, as you have data points out of tolerance).
    2. Deal with it as go/no-go, which is not so good in your case as you have out-of-tolerance data points. Good luck.
    Shereen A. Mosallam, [email protected], http://www.symbios-consulting.com
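
    [Editor's note: Shereen's point about part spread can be sketched with the NDC equation she quotes. The sigma values below are hypothetical: the gauge sigma is held fixed, and only the part-to-part spread sampled into the study changes.]

    ```python
    def ndc(sigma_part, sigma_rr):
        """AIAG number of distinct categories: 1.41 * (s_part / s_R&R)."""
        return int(1.41 * sigma_part / sigma_rr)

    sigma_rr = 0.0002   # gauge (R&R) std dev, unchanged between scenarios

    # The same gauge looks far more capable when the parts in the
    # study span an artificially wide range
    print("typical part spread:      NDC =", ndc(0.0004, sigma_rr))
    print("artificially wide spread: NDC =", ndc(0.00345, sigma_rr))
    ```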

    0
    #176389

    Severino
    Participant

    Just out of curiosity, what instrument are you using to measure the crimp height?

    0

The forum ‘General’ is closed to new topics and replies.