Is this process data statistically Significant?


Viewing 9 posts - 1 through 9 (of 9 total)
  • Author
  • #34107

    Paul K

    Dear All,
    I am working on a project and believe I have made a breakthrough.
    The project relates to a setter that supports our product through the process. The layout of the setters is 16 columns across; a setter remains in its original column throughout all its cycles of the process. However, the setter can be in any position in its column, which has 20 rows. Here is the data I have collected:

    Column / # of cracked setters in column
    1 / 33
    2 / 28
    3 / 40
    4 / 29
    5 / 27
    6 / 39
    7 / 38
    8 / 54
    9 / 33
    10 / 21
    11 / 38
    12 / 21
    13 / 39
    14 / 35
    15 / 31
    16 / 24
    In total there are ~45000 setters in the system equally spread across all 16 columns.
    I have found a difference in the support stands for the setters in columns 2,5,7,10,12 and 15 compared to the other columns.
    Am I on to something?
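As an editorial illustration of the question being asked here, the column-to-column variation in these counts can be checked with a Chi-square test in Python with SciPy. The exposure of roughly 1000 observed setters per column is an assumption taken from the 50 kiln car cross-sections mentioned later in the thread; this sketch is not part of the original post.

```python
from scipy.stats import chi2_contingency

# Cracked-setter counts per column, as posted (columns 1..16).
cracked = [33, 28, 40, 29, 27, 39, 38, 54, 33, 21, 38, 21, 39, 35, 31, 24]
n_per_column = 1000  # assumed exposure: ~50 cross-sections x 20 rows per column

# 2 x 16 contingency table: cracked vs. not cracked in each column.
table = [cracked, [n_per_column - c for c in cracked]]
stat, p, dof, _ = chi2_contingency(table)
print(f"chi-square = {stat:.1f}, df = {dof}, p = {p:.4f}")
```

With these counts the test comes out significant, which as the later replies show is driven largely by column 8.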



    Dear Paul,
    I am not sure that I completely understand your process description, the data you have and the question you are asking of these data.

    There are 45000 setters in the process, as you mention, distributed evenly over the columns. That means each column has about 2812 setters? Do you have a more precise count of the total number of setters per column?
    And of all the setters in a column, 20 are active in the process (the 20 rows you mention?). Are these 20 different processes? Or are these 20 positions with the same process?
    I get from your description that you are testing for a significant difference in the number of defective setters between the columns.

    What do you mean by ‘I have found a difference in the support stands for the setters in columns 2,5,7,10,12 and 15 compared to the other columns’?
    What test did you use to determine the significant difference you mention? Did you do a one-by-one comparison of the columns, or a Chi-square test?
    I think I can help you better when you give some more details. I am familiar with the sort of statistical test you are doing, but I know that without sufficient understanding it is easy to give you a wrong answer.


    Paul K

    Dear Arend,
    Many thanks for your reply, I will try to clarify the situation.
    The setters are loaded with product and stacked onto a kiln car for transportation through a firing process. The kiln car is 16 setters wide, 20 setters high and 4 setters deep (1280 setters per kiln car). There are 35 kiln cars in the system. When the setters are stacked and unstacked between kiln cycles they can be reloaded in any of the 20 rows high or 4 setters deep. The only constant is their column which cannot change. Also to further randomise the process the setters do not return to the same kiln car for each cycle of the kiln.
    The setters crack and fail at a significant percentage rate of the population per year. My project is to find the root cause of the cracking.
    The data in my first post is the number of cracked setters in each column. In the columns I have identified (2,5,7,10,12,15) the setters are in contact with the kiln car in a different way to the other columns.
    My question is: does the data from column 10 (possibly a BOB column according to Shainin’s terminology) differ enough from column 8 (a WOW column)?
    The data was collected from about 50 different kiln car cross-sections (looking at the 16 columns and 20 rows).
    Thanks in advance,
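The single-column comparison Paul asks about (column 10 versus column 8) could be run as a Fisher exact test on two proportions. This editorial sketch assumes roughly 1000 observed setters per column, per the 50 cross-sections mentioned above; note the caution in the next reply about comparing only the two extreme columns.

```python
from scipy.stats import fisher_exact

n = 1000            # assumed observed setters per column (50 cross-sections x 20 rows)
col8_cracked = 54   # the worst-looking column (WoW candidate)
col10_cracked = 21  # the best-looking column (BoB candidate)

# 2 x 2 table: cracked vs. not cracked for the two columns.
table = [[col8_cracked, n - col8_cracked],
         [col10_cracked, n - col10_cracked]]
odds_ratio, p = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.2g}")
```

Because the two columns were picked precisely for being the extremes, a small p-value here overstates the evidence; that selection bias is the point made in the reply below.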



    Dear Paul,
    It took a while before I really figured it out, using all the information you gave. I’ll discuss in detail what I think the data are really saying.
    You have made an observation that there is a difference in the contact between setters and kiln car, so you distinguish two groups of columns:
    group A: kiln car columns 1, 3, 4, 6, 8, 9, 11, 13, 14, 16
    group B: kiln car columns 2, 5, 7, 10, 12, 15
    For each column you observed 1000 setters and counted the defects, and you want to know whether these two groups are significantly different. For this kind of testing there are a few very useful techniques: difference testing for two proportions, and Chi-square testing, which is difference testing for many proportions. I have applied both in the analysis. For your work I think it is very useful to study these two techniques, or at least to know how to run them in Minitab or another program.
    I am not in favour of the BoB versus WoW approach in the way you propose to do it, for reasons I will explain first. Testing the difference between these two groups is best done by comparing the total defect rates:
    1) Compared to picking one column from each group, it increases the number of observations and thus reduces the confidence interval width in your test. Said differently: the test becomes more sensitive for detecting differences.
    2) By comparing the best column of what you assume is the good group (BoB) with the worst column of what you assume is the bad group (WoW), you bias the conclusion very strongly. Working this way, you run a very big risk of concluding that there is a difference when in fact there is none.
    If you want to use the BoB versus WoW approach, I would first use statistical testing to confirm that there really is a best and a worst group. If this is confirmed, BoB versus WoW can be useful for collecting signals (candidate X’s) which you would then investigate further.
    Some analysis of the data you gave: in the ‘A’ group there are 364 defects in 10,000 setters, while in the ‘B’ group there are 166 defects in 6000 setters. Doing a test on two proportions, you’ll find that the difference is significant (p-value 0.046). However, closer inspection of the data shows that this difference comes almost entirely from column 8. If the analysis is repeated without the data of this column, the significance disappears entirely (the p-value increases to 0.223), while within its own group column 8 is likely to be an outlier (p-value 0.064 in a Chi-square test). Also, if the data of all columns except column 8 are put together, a Chi-square test shows no significant difference between any of these columns (p-value 0.145).
    In conclusion I would say that column 8 is indeed a Worst-of-Worst column, if you want to use the expression, but there is no Best-of-Best column in your data. In your improvement project you will get some improvement by fixing column 8, but in the bigger picture that will only help you so much. Where it could help is that it might identify some critical issues for cracked setters that could be useful for the overall improvement that seems needed. And that is what you really wanted, isn’t it?
    I hope this is useful for you, and that you are successful in your further search for the important factors!
    Arend
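As an editorial illustration, the two-proportion comparison described above can be sketched in Python. This uses a pooled normal approximation; the exact p-values quoted from Minitab in the thread may differ from what this approximation produces, depending on the test variant and the exact per-column counts used.

```python
from math import sqrt
from scipy.stats import norm

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test (two-sided, normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * norm.sf(abs(z))

# Group A (other columns): 364 defects in 10,000; group B (different stands): 166 in 6000.
z_all, p_all = two_proportion_z(364, 10_000, 166, 6_000)
# Same comparison with column 8 (54 defects in 1000 setters) removed from group A.
z_no8, p_no8 = two_proportion_z(364 - 54, 9_000, 166, 6_000)
print(f"all columns:      z = {z_all:.2f}, p = {p_all:.4f}")
print(f"without column 8: z = {z_no8:.2f}, p = {p_no8:.4f}")
```

The qualitative point survives any choice of test variant: removing column 8 shrinks the group difference, showing that this one column dominates the A-versus-B comparison.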



    Before I give my shot at this project, I would like to make sure I understand the system.
    Your kiln car has a capacity of 1280 setters: 16 columns wide, 20 rows high, and 4 deep. You can insert a 16-column holder into any of 80 slots (20 rows times 4 deep). The 16 column positions are fixed because the setters are laid on a common holder.
    This is how I understand your kiln car.
    Let U = a holder with a capacity of 16 setters:
    A  U U U U U U U U U U U U U U U U U U U U  = 20 rows x 16
    B  U U U U U U U U U U U U U U U U U U U U  = 20 rows x 16
    C  U U U U U U U U U U U U U U U U U U U U  =  20 rows x 16
    D  U U U U U U U U U U U U U U U U U U U U  = 20 rows x 16
         1  2  3  4  . . . . . . . . . . . . . . . .  20
    If this is the case, I would suggest that you collect data by “U”: number them from A1 to A20, B1 to B20, C1 to C20, and D1 to D20, and number the kiln cars from 1 to 50.
    If you have already collected data for 50 kiln cars, we can analyze “row to row” (1 to 20), “layer to layer” (A, B, C, and D), “U to U” (A1 to D20), and “kiln car to kiln car” (1 to 35) with two-proportion and Chi-square tests.



    Dear AberF,
    Actually Paul also explained that between different process runs the setters are taken out of the car and put back in the same column, but not in the same row or depth as before. So only the column is constant, and the setters change position within their own column every production run. Obviously I would have taken this into account had it been otherwise. As you can see in my message, the Chi-square test has already been done per column and shows that only column 8 differs from the rest.
    Kind regards,


    Kevin S. Van Horn

    I have to disagree with the conclusions reached by others who have commented
    on this data set, even though their analyses appear technically correct — as
    far as they go. The problem is, they’re answering the wrong question.

    Assuming I understand the problem correctly, the practical question of
    interest here is whether or not the type of support stand makes a significant
    difference in the proportion of cracked setters. By “significant” I mean “of
    substantial magnitude”, as opposed to the classical statistical notion of
    rejecting a null hypothesis. This suggests a regression problem. The
    predictors are the column number (1 to 16) and the support stand type (1 for
    columns 2, 5, 7, 10, 12, and 15, 0 for all the others.)

    To cut to the chase, my analysis shows that, with high confidence (> 97.5%),
    the use of support stand type 1 does in fact reduce the proportion of cracked
    setters. The magnitude of the effect is, however, quite uncertain — a 95%
    confidence interval ranges from a 6.1% to a 38.9% relative reduction.

    I used the following logistic regression model:

    y[i] ~ Binomial(theta[i], n)
    logit(theta[i]) = alpha1[i] + alpha2 * x[i]


    y[i] == number of cracked setters in column i
    n == 2800
    theta[i] == (long-term) proportion of cracked setters for column i
    logit(p) == log(p / (1-p))
    x[i] == support stand type (0 or 1)
    alpha1[i] == regression coefficient for column i
    alpha2 == regression coefficient for support stand type

    If I were to use classical statistics, I would be stumped at this point, as
    the above model has an identifiability problem with the given data. That is,
    for every column i we have data for either x[i] = 0 or x[i] = 1, but not both.
    This means, for example, that if one subtracts a number delta from alpha2 and
    adds the same delta to all of the alpha1[i] for which x[i] = 1, the sampling
    probabilities theta[i] do not change at all.
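This non-identifiability is easy to demonstrate numerically. In the sketch below (an editorial illustration with hypothetical coefficient values), shifting a constant delta from alpha2 into the alpha1[i] of the x[i] = 1 columns leaves every theta[i] unchanged:

```python
from math import exp

def inv_logit(a):
    """Inverse of logit(p) = log(p / (1 - p))."""
    return 1.0 / (1.0 + exp(-a))

# Support stand type per column: 1 for columns 2, 5, 7, 10, 12, 15; else 0.
x = [1 if col in (2, 5, 7, 10, 12, 15) else 0 for col in range(1, 17)]
alpha1 = [-4.6] * 16  # hypothetical column coefficients
alpha2 = -0.3         # hypothetical support-stand coefficient
delta = 0.7           # arbitrary shift

theta = [inv_logit(a1 + alpha2 * xi) for a1, xi in zip(alpha1, x)]

# Subtract delta from alpha2 and add it to alpha1[i] wherever x[i] = 1:
alpha1_shifted = [a1 + delta * xi for a1, xi in zip(alpha1, x)]
alpha2_shifted = alpha2 - delta
theta_shifted = [inv_logit(a1 + alpha2_shifted * xi)
                 for a1, xi in zip(alpha1_shifted, x)]

# The sampling probabilities are identical, so the likelihood cannot
# distinguish the two parameter settings.
assert all(abs(t - s) < 1e-12 for t, s in zip(theta, theta_shifted))
```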

    Fortunately, my specialty is Bayesian statistics, and use of a Bayesian
    hierarchical model easily overcomes this identifiability problem. My
    impression is that Bayesian methods are not commonly used among Six Sigma
    practitioners, so I’ll digress for a moment to explain how Bayesian statistics
    differs from classical statistics.

    As used in Bayesian statistics, probabilities measure the degree of
    plausibility / credibility / certainty of propositions; they are not
    long-run relative frequencies. Probability distributions represent states of
    information. The theoretical justification for this view is Cox’s Theorem,
    which addresses the question of how one may construct a logic for reasoning
    about degrees of certainty. Cox postulated certain qualitative properties one
    might reasonably desire of such a logic, and showed that any logical system
    having those properties is equivalent to one that manipulates degrees of
    uncertainty according to the laws of probability theory. (See my tutorial
    review of Cox’s theorem for details.)
    Here are the steps in a Bayesian solution to Paul’s problem:
    1. Define a joint probability distribution over all of the variables of
    interest. This includes not just “data” variables, but also model parameters.
    It includes the sampling probabilities of the observables given the
    unobservable parameters, and also encodes one’s information about the
    parameters before seeing the experimental data; thus it is
    called the prior distribution.

    In particular, consider our logistic regression model:

    y[i] ~ Binomial(theta[i], n)
    logit(theta[i]) = alpha1[i] + alpha2 * x[i]

    To complete the model, I specified a distribution for alpha2 and the
    parameters alpha1[i], representing the vague prior information I had about
    these parameters:

    alpha2 ~ Normal(mean = 0, variance = 10)
    alpha1[i] ~ Normal(mean = mu, variance = 1/tau)

    A variance of 10 (standard deviation of 3.2) for alpha2 may not seem like a
    lot, but it corresponds to a very wide variation in the effect of using a type
    1 instead of type 0 support stand. For some specific figures, suppose that
    alpha1[i] is -4.6, so that theta[i] is 1% when a support stand of type 0 is
    used. The variance of 10 for alpha2 gives a one-sigma range for theta[i],
    when a type-1 support stand is used, going from 0.043% to 19% — a ratio of
    450 from the upper to lower end of the range, that is, considerable
    uncertainty as to the magnitude of the support stand effect. This is why I
    say that the distribution I have specified for alpha2 represents only vague
    prior information about the support stand effect.

    Now consider the distribution for alpha1[i]. This is the key to resolving the
    identifiability problem between alpha2 and the set of alpha1[i] for which x[i]
    = 1. Any information that restricts the plausible values of the alpha1[i]
    parameters — making some range of values substantially more probable than
    others — thereby also better defines alpha2. In particular, the larger
    tau is — that is, the closer together the values alpha1[i] cluster — the
    better alpha2 and the alpha1[i] are resolved.

    Not knowing where or how closely the parameters alpha1[i] cluster, I did not
    choose particular values for mu and tau, but instead defined vague
    distributions over these hyperparameters also:

    mu ~ Normal(mean = -4.6, variance = 10)
    tau ~ Gamma(alpha = 0.1631691, beta = 1.198267e-4)

    logit^(-1)(-4.6) = 1.0%, logit^(-1)(-4.6 – sqrt(10)) = 0.043%, and
    logit^(-1)(-4.6 + sqrt(10)) = 19%, hence the distribution for mu is suitably
    vague. The distribution for mu is centered at logit(0.01) under the assumption
    that cracking is an unusual condition.
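The figures quoted here can be reproduced directly from the inverse logit, logit^(-1)(a) = 1 / (1 + e^(-a)); this small check is an editorial addition:

```python
from math import exp, sqrt

def inv_logit(a):
    """Inverse of logit(p) = log(p / (1 - p))."""
    return 1.0 / (1.0 + exp(-a))

s = sqrt(10)  # one prior standard deviation of alpha2 (variance 10)
print(f"{inv_logit(-4.6):.4f}")      # about 0.010, i.e. 1.0%
print(f"{inv_logit(-4.6 - s):.5f}")  # about 0.00043, i.e. 0.043%
print(f"{inv_logit(-4.6 + s):.3f}")  # about 0.19, i.e. 19%
```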

    Prior distributions for precision (inverse variance) parameters are often
    specified as Gamma distributions. I chose alpha and beta to again represent
    only vague information as to how closely the parameters alpha1[i] cluster, as follows:

    – sigma.low = logit(0.0102) – logit(0.01)
    – sigma.high = logit(0.5) – logit(0.01)
    – tau.low = 1/sigma.high^2
    – tau.high = 1/sigma.low^2
    – alpha and beta were chosen so that
    P(tau < tau.low | alpha, beta) = P(tau > tau.high | alpha, beta) = 0.15
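Assuming the intended construction is that alpha and beta place probability 0.15 in each tail, i.e. P(tau < tau.low) = P(tau > tau.high) = 0.15, the choice can be verified numerically with SciPy (an editorial check; note SciPy parametrizes the Gamma by shape and scale = 1/rate):

```python
from math import log
from scipy.stats import gamma

def logit(p):
    return log(p / (1 - p))

sigma_low = logit(0.0102) - logit(0.01)
sigma_high = logit(0.5) - logit(0.01)
tau_low = 1 / sigma_high**2
tau_high = 1 / sigma_low**2

a, b = 0.1631691, 1.198267e-4  # shape alpha and rate beta from the post
dist = gamma(a, scale=1 / b)   # SciPy uses scale = 1/rate

print(f"P(tau < tau.low)  = {dist.cdf(tau_low):.3f}")   # should be near 0.15
print(f"P(tau > tau.high) = {dist.sf(tau_high):.3f}")   # should be near 0.15
```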

    I used the free WinBUGS program (Bayesian inference Using Gibbs Sampling) to
    do my analysis; here is how I defined the above model for that program:

    model {
      for (i in 1:16) {
        y[i] ~ dbin(theta[i], n)
        logit(theta[i]) <- alpha1[i] + alpha2 * x[i]
        alpha1[i] ~ dnorm(mu, tau)
      }
      alpha2 ~ dnorm(0.0, 0.1)    # WinBUGS uses precision: 0.1 = 1/variance 10
      mu ~ dnorm(-4.6, 0.1)
      tau ~ dgamma(0.1631691, 1.198267E-4)
    }

    Paul K

    Dear Kevin,
    Many thanks for your interesting (and detailed) answer to my question.
    The level of statistics you have used is beyond my knowledge and I will attempt to digest some of this information.
    I am planning to return to the plant in a couple of weeks to “test the theory”.
    Best Regards,


    Kevin S. Van Horn

    If you have any further questions, I’ll be happy to answer them as best I can. You can reach me at k v a n h o r n A T k s v a n h o r n D O T c o m.

