T-Test for Attribute Data


Viewing 16 posts - 1 through 16 (of 16 total)


    I have inner diameters measured using gage pins for some tubing before and after the annealing process. I want to investigate whether the annealing process changes the inner diameter of the tubing to a statistically significant degree. May I know how to conduct a t-test on attribute data?



    You will have to measure your sample on a continuous scale, because pins do not give a fine enough measure to prove statistically that there is a difference. When we carry out t-tests we are looking closely at the process to see if there is a difference. With pins, if your pin does not go into the internal diameter after annealing, or a larger pin goes in where it could not before, you are already proving a difference and do not need statistics to prove it, as it is staring you in the face. If the same pin goes in before and after, you should conclude that your measuring system is not fine enough and opt for a continuous measuring system.
    For internal diameters there is a host of gauges available, from Bowers micrometers to air gauging. It is important at that stage to complete a gauge R&R. Once your system has proved capable I recommend a paired t-test (“the test of differences”). You would have to measure the same parts throughout this process; if you do not have that accountability, then use a two-sample t-test.
    Good luck. This will send you down the right road of statistically proving a difference!
    All the best



    Since you have attribute data think of performing a chi-square test between number of rejects before and after heat treat.  Caution: you need at least 5 rejects in each sample for the test to be valid.
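For anyone wanting to try the chi-square suggestion, here is a minimal sketch in plain Python. The reject/accept counts are made up for illustration (the thread gives none), and the function name `chi_square_2x2` is my own. It treats the before and after samples as independent and uses the Pearson statistic without a continuity correction. Note that the usual rule of thumb about "at least 5" is normally stated in terms of the expected count in each cell.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if reject rate were the same before and after.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows are before/after annealing, columns are rejects/accepts.
counts = [[12, 88], [25, 75]]
chi2, p_value = chi_square_2x2(counts)
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
```

A small p-value would suggest the reject rate changed after heat treat. If the same tubes are gauged before and after, the samples are paired rather than independent, and this sketch would not be the right tool.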



    I believe that if you are using a measurement that it is really variable data and a T-test would be acceptable.


    RR Kunes

    Why not conduct a chi-square analysis? Chi-square uses attribute data. In either case, all you will get is a statistical verification of whether they have changed or not.
    Since you have attribute data use the correct tool and you’re there!!
    Hope that helps.


    RR Kunes

    One correction to my earlier post. To be correct, you have discrete data, not attribute data. Attribute would be “is it good or is it bad”; discrete is how many are good versus how many are bad.
    Sorry to split hairs but in this business you need to be accurate.


    Robert Butler

      If I understand your problem correctly, you have a series of pins of given diameters and you test for pin fit to the inner diameter before and after annealing.  If this is the case, you have paired samples with a measurement system that can be viewed as a discrete scale with a limited range of values.  In this case the way to test the mean difference between the two groups is to analyze the data using a t test that includes a correction for continuity.
      If you have, say, 15 tubes measured before and after annealing, you would take the differences between the measurements and sum them.  The null hypothesis between non-annealed and annealed is that the signs on the differences are equally likely to be + or -.
      Another way to check for significant differences with this kind of data would be Fisher’s randomization test. The methods for setting up your data and analyzing it using either of the above techniques can be found on pp. 146 of the Seventh Edition of Statistical Methods by Snedecor and Cochran.



    I think I need to split some more hairs here…
    Actually, there are two types of attribute (discrete) data: (1) binomial (a.k.a. yes/no) and (2) Poisson (a.k.a. count).  To make it simple, think of binomial data as a situation where you know the number of “yes”s and the number of “no”s.  With Poisson data, on the other hand, you know the number of “yes”s but not the number of “no”s.
    Hope this helps.


    RR Kunes

    Sorry, but you are incorrect!
    There is only one type of attribute data, and it is good or bad, yes or no, with NO quantification, therefore useless.
    There are two types of variables data.



    I agree variables data is much more useful than attribute data, but I’ve seen applications of attribute data in transactional areas where it is quite useful.



    Mr. Kunes:
    My understanding is that there are indeed two types of attribute data: count (Poisson) and yes/no (binomial).  It is imperative that you distinguish which type of attribute data you have so as to choose the right control chart.  For instance, a p chart is the correct chart if your data is yes/no data, whereas a c chart or u chart is appropriate for count data.  Smithsigma’s assessment of determining “whether or not you know the number of no’s if you know the number of yes’s” is correct.
    I have never heard of the “two” types of variables data.  Could you please enlighten me on this?



    I am with Bob on this one.  Attribute data can be either binomial or ordinal.  I have never heard of two different types of variable data.


    Erik L

    I think that the confusion here lies in the distinction between continuous and discrete.  The label revolves around how much information can be generated.  Discrete variables that can take on a large number of values are often treated as quasi-continuous responses, hence the idea brought forward of two types of continuous data.
    When we’re dealing with variables, the classification of data is nominal, ordinal, and interval.  When the levels or ‘categorical’ labels we place on our variables do not have a natural order, they are considered nominal variables.  When the ‘categorical’ labels we put on variables have a spectrum of meaning, then we can classify them as ordinal.  Interval variables have numerical distances between two different values.



    Robert Butler:
    I am a moron on t-tests, in fact. Can you please tell me in more detail what you mean by “If you have say 15 tubes measured before and after annealing you would take the differences between the measurements and sum them.  The null hypothesis between non-annealed and annealed is that the signs on the differences are equally likely to be + or -.”?
    I have the data ready now, but I don’t know how to proceed. Can you help? Thanks!



    Hi woey
    Let’s keep this whole thing simple (my head is still spinning after reading all the messages in this thread.)
    I assume you have the diameters of 15 tubes before and after annealing. Now my questions are:
    a. Do you have, by any means, data for each tube (before and after) so that you know what each tube’s diameter was before annealing and after annealing?
    b. Do you have a measuring instrument which can give you at least some reading (e.g. 15.4, 10.7, etc.)? Does your data look something like this?
    Tube              Before annealing             After Annealing
    1                         15.2                               14.1
    2                         10.5                                 9.8
    and so on…
    If the answer to both questions is yes, then you can use a paired t-test (please don’t get confused between attribute and variable data). Now let’s see how you can do a paired t-test. I assume you know how to do a t-test (if not, then please post a reply and I will explain it to you).
    Arrange the readings in the manner shown above and calculate the difference between the before and after readings for each tube (you need to have that information for each tube to do a paired t-test!). Now it’s simple: just do a t-test on these differences with the test mean equal to zero.
    Conceptually you want to check whether the reduction in tube diameter is different from zero, so you need to do a t-test on the reduction (i.e. the difference between the before-annealing and after-annealing readings for each tube).
    Hope I haven’t added further to your woes… well, my head is still spinning.
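The paired t-test steps described in this post can be sketched in a few lines of Python. This is only an illustration under assumptions: the first two tubes are the readings quoted in the post, the other three are invented to make a runnable example, and the function name `paired_t_statistic` is my own. It computes the t statistic and degrees of freedom; the p-value still has to come from a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for a paired t-test against a mean difference of zero."""
    if len(before) != len(after):
        raise ValueError("each tube needs one before and one after reading")
    # Step 1: per-tube difference (before minus after, i.e. the reduction).
    differences = [b - a for b, a in zip(before, after)]
    n = len(differences)
    # Step 2: one-sample t statistic on the differences, test mean = 0.
    standard_error = stdev(differences) / sqrt(n)
    t = mean(differences) / standard_error
    return t, n - 1

# Tubes 1 and 2 are from the post; tubes 3 to 5 are hypothetical.
before = [15.2, 10.5, 12.3, 14.0, 11.8]
after = [14.1, 9.8, 11.9, 13.2, 11.1]
t, df = paired_t_statistic(before, after)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Compare the absolute value of t with the tabulated critical value for the same degrees of freedom (for example 2.776 for 4 degrees of freedom at the two-sided 5% level).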


    Robert Butler

    We have paired data from two treatments, A (before) and B (after).  The null hypothesis is that nothing happened.  We measure the tube diameter using a set of fixed pins and determine measurements in one of a couple of ways.
    1. specific pin fits tube #1 before but does or does not after treatment
    2. specific pin fits tube #1 before, doesn’t fit after but either a smaller or a larger pin will now fit
    In the first instance we have three possibilities, and the differences between before and after will be either 0 (same pin fit), 1 (larger pin fit), or -1 (smaller pin fit).
    In the second case we will measure the diameters by checking first to see if the same pin will fit before and after giving a 0 difference.  If the same pin doesn’t fit we will find the closest larger or smaller pin that does fit and take the difference between the original diameter measurement and the new, closest fit. 
      If there has been no effect the null hypothesis that we are checking is that the median of the differences is zero.
    Take the differences and assign a value of Z as follows:
         Z = 1 if the difference is greater than zero
        Z = 0 if the difference is less than zero.
      The original distribution is assumed continuous, so the distribution of the differences will also be continuous. Since the differences are independent, the Z values are also independent, so we have a binomial situation of making n independent trials in which the probability that Z = 1 is 1/2 on each trial.
      The probability of a tie (i.e. a difference of 0) is assumed to be zero.  Since this won’t hold in practice, those differences which are 0 are excluded from the analysis: they get no Z value, and the sample size for the test is reduced by the number of zero differences in the data set.
    For small sample sizes, the one-sided probability of getting m or more positive differences when the median of the differences is 0 is given by
    (1/2)**n * Sum (n|x), where the sum over x runs from 0 to n-m,
    n = number of nonzero differences, m = number of positive differences, and (n|x) = n!/(x!(n-x)!) is the binomial coefficient.
    For larger sets we can use the normal approximation to the binomial, which is
    U(1-p) = (m - .5 - n*.5)/sqrt(n*.5*(1-.5))
    Using the normal lookup table you can find the value of 1-p for this U, and p, the probability under a zero median, then follows directly.
    We have 16 differences between paired samples, as follows:
    .3, -1.7, 6.3, 1.6, 3.7, -1.8, 2.8, .6, 5.8, 4.5, -1.4, 1.9, 1.7, 2.4, 2.3, 6.8
    Therefore Z = 1 for 13 of these differences and Z = 0 for three of them.
    First way = (1/2)**16 * Sum from x = 0 to 16-13 of (16|x)
                     = (1/65,536)*(16!/(0!16!) + 16!/(1!15!) + 16!/(2!14!) + 16!/(3!13!)) = 697/65,536 = .01064
    Second way: U(1-p) = (13 - .5 - 16*.5)/sqrt(16*.5*(1-.5)) = 2.25
    thus 1-p = .9878 and p = .0122
    Depending on the book this is called the sign test, discrete scales, or the randomization test.  Doing a bean count of my reference books, sign test seems to be the most common label.
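The sign test calculation in this post can be reproduced with a short Python sketch. The function name `sign_test` and its return layout are my own choices, not anything from the thread; the data are the 16 differences quoted above. As in the post, the formulas assume that positive differences are the majority (m greater than n/2), so this gives the one-sided upper-tail probability.

```python
from math import comb, erf, sqrt

def sign_test(differences):
    """One-sided sign test: exact binomial p-value and normal approximation."""
    nonzero = [d for d in differences if d != 0]  # ties are dropped
    n = len(nonzero)
    if n == 0:
        raise ValueError("all differences are zero; the test is undefined")
    m = sum(1 for d in nonzero if d > 0)  # number of positive differences
    # Exact: (1/2)**n * sum of C(n, x) for x = 0 .. n-m
    exact_p = sum(comb(n, x) for x in range(n - m + 1)) / 2 ** n
    # Normal approximation with the 0.5 continuity correction
    u = (m - 0.5 - 0.5 * n) / sqrt(n * 0.5 * (1 - 0.5))
    approx_p = 1 - 0.5 * (1 + erf(u / sqrt(2)))  # 1 minus the standard normal CDF at u
    return n, m, exact_p, u, approx_p

diffs = [.3, -1.7, 6.3, 1.6, 3.7, -1.8, 2.8, .6, 5.8, 4.5,
         -1.4, 1.9, 1.7, 2.4, 2.3, 6.8]
n, m, exact_p, u, approx_p = sign_test(diffs)
print(f"n = {n}, m = {m}, exact p = {exact_p:.5f}, U = {u:.2f}, approx p = {approx_p:.4f}")
```

With these differences the sketch should agree with the hand calculation: an exact probability of about .01064 and a normal-approximation probability of about .0122.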
