iSixSigma

When Common Cause Data Is Unavailable

  • #53447

    Andell
    Participant

    I am on record as preferring to use data from a statistically stable process whenever I want to estimate process capability, identify the underlying distribution, or conduct a hypothesis test. However, I have had occasions where a stable data set simply is not available for various reasons.

    I am interested in knowing what other practitioners do to handle such situations.

    #190136

    Robert Butler
    Participant

    If you have the data in time order, then one way to generate a guesstimate of stable variation is to take successive differences of the values (x1 – x2, x2 – x3, x3 – x4, etc.), ignoring the sign. Compute the average of these differences, multiply the resulting average by 3.27, discard all differences greater than this value, and then repeat the process until no more differences are discarded. Take the average of the reduced set of differences and divide it by 1.128 to generate the estimated standard deviation.
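
    For anyone who wants to try this in software, here is a rough Python sketch of the screening procedure (numpy and the function name screened_sigma are just illustrative choices; treat it as a sketch rather than a polished tool):

    import numpy as np

    def screened_sigma(x, d4=3.27, d2=1.128):
        """Estimate ordinary (common cause) variation from time-ordered data
        by iteratively screening the absolute successive differences."""
        x = np.asarray(x, dtype=float)
        mr = np.abs(np.diff(x))        # successive differences, sign ignored
        while True:
            limit = d4 * mr.mean()     # 3.27 times the average difference
            keep = mr <= limit
            if keep.all():             # stop once nothing else is discarded
                break
            mr = mr[keep]              # drop the differences above the limit
        return mr.mean() / d2          # average of the survivors divided by 1.128

    # Example: data 2, 3, 2, 4, 5 gives deltas 1, 1, 2, 1; none exceed
    # 1.25 x 3.27 = 4.09, so the estimate is 1.25 / 1.128, about 1.11.
    print(screened_sigma([2, 3, 2, 4, 5]))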

    #190137

    Andell
    Participant

    Let me see if I can re-state that accurately.

    We compute the moving ranges as in an I-MR chart, and discard those moving ranges that would lie outside the control limits? Then we use the remaining moving ranges to estimate the standard deviation using d2?

    That’s intriguing! I’m going to revisit some old data sets and see what that approach gives me.

    Obviously, this approach would not be such a good solution when time series data are highly autocorrelated, but I envision it could work when the problem is spikes and/or shifts.

    Any thoughts on how to handle unstable process data for developing a distribution ID?

    Thanks for a great idea!

    #190143

    Robert Butler
    Participant

    That’s correct – it removes spikes and shifts. If you want to play around with paper and scissors: generate a time series plot of the data, run the exercise, and identify the differences that were discarded. Then take your scissors, chop the graph up to cut out all of the discarded differences, and line the remaining pieces up in time order but with a common mean. The pieces of paper will shift relative to one another, but what you will have is a great graphic illustrating what you did mathematically. Back when I used to teach this I would do it in front of the class (the graphs were pre-plotted on overhead slides) – it never failed to get the point across.
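
    If you would rather do the cutting digitally than with scissors, a rough sketch of the same idea (my own illustration in Python and matplotlib, not the original overhead graphic) is to split the series wherever a difference was discarded, shift each piece to a common mean, and plot the re-aligned pieces against the original:

    import numpy as np
    import matplotlib.pyplot as plt

    def scissors_plot(x, discarded_idx):
        """x: time-ordered data; discarded_idx: positions i where the
        difference |x[i+1] - x[i]| was discarded by the screening step."""
        x = np.asarray(x, dtype=float)
        cuts = sorted(i + 1 for i in discarded_idx)   # cut after each flagged jump
        pieces = np.split(x, cuts)
        target = x.mean()                             # common mean for all pieces
        aligned = np.concatenate([p - p.mean() + target for p in pieces])
        fig, ax = plt.subplots()
        ax.plot(x, marker="o", label="original series")
        ax.plot(aligned, marker="o", label="pieces re-aligned to a common mean")
        ax.legend()
        plt.show()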

    I’ll have to think a bit more about autocorrelated data, but in the meantime – what do you mean by “distribution ID”?

    #190147

    Nassif Tadros BB
    Participant

    Simply put, if the data are not available, then look for additional or other variables that can lead to a better set of common cause data. In my experience, it has always been important to identify the variables properly first.
    Best regards

    #190148

    Andell
    Participant

    Wouldn’t it be feasible to manipulate the data in Excel to achieve results similar to the paper-and-scissors option?

    As for distribution identification here is what I am thinking. I am sure some folks have seen me rant on this already, so if it’s old news I apologize in advance.

    There are a number of statistical distributions in the world. For continuous data we have the normal, log-normal, Weibull, exponential, extreme value, and many others. For discrete data we have the binomial, Poisson, hypergeometric, and others.

    As James King wrote in his book “Probability Charts for Decision Making,” there are natural phenomena that give rise to various distributions. Thus, if we use, for instance, probability plots in Minitab to examine data sets, we can speculate as to which distributions might describe the data from our process.

    In a perfect world that knowledge, in turn, could lead to insights into what natural phenomena might be influencing our process. Even without such inferences, we can use Minitab’s “Capability Non-Normal” utility to improve the accuracy with which we estimate process capability.

    However, and getting back to the original thread of this discussion: if the underlying data are predominantly special cause, it is hard to be confident in a distribution ID. The reason is that the special causes can “distort” the data so that they resemble a distribution that doesn’t truly represent the process.
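
    For data that are reasonably stable, a rough scipy-based sketch of that distribution ID step (a stand-in for Minitab’s utility; the candidate list and the data set below are just placeholders) might look like this:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    data = rng.lognormal(mean=1.0, sigma=0.4, size=200)   # placeholder data set

    candidates = {
        "normal": stats.norm,
        "log-normal": stats.lognorm,
        "Weibull": stats.weibull_min,
        "exponential": stats.expon,
    }

    for name, dist in candidates.items():
        params = dist.fit(data)                           # maximum-likelihood fit
        ks = stats.kstest(data, dist.cdf, args=params)    # rough goodness-of-fit
        # The p-value is optimistic because the parameters were estimated from
        # the same data, so use the statistic to rank candidates, not as a test.
        print(f"{name:12s}  KS statistic = {ks.statistic:.3f}")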

    I hope this is useful.

    #190360

    Lanning
    Member

    Looks like the constant 3.27 is the constant D4 from SPC charting for a subgroup size of 2, and 1.128 is the constant d2. Is that correct?

    #190361

    Robert Butler
    Participant

    Yes

    #190362

    Lanning
    Member

    OK, there is something that I’m doing wrong here, or so it seems.
    I have a set of time-ordered data comprising 62 deltas (x2-x1, …). The actual data are not very normal, with a significant tail toward the lower values, but there is no limiting value even remotely close to my values (such as a constraint that values below x are not possible).

    Initially my UCL for the range was 3.07 units. I applied the suggested method, and on the 12th iteration I finally got no points flagged for rejection. By that point my data set had only 36 data points remaining, and the UCL for the range had dropped to 0.7 units.

    My observations are: 1) The method takes a heavy toll on the number of data points. Is that to be expected? Is the heavy toll influenced by the shape of the original distribution? 2) Perhaps there is a factor to consider that I’m not aware of. 3) Perhaps my approach of taking deltas between two successive original data points is in error (i.e., if one has original data of 2, 3, 2, 4, 5, and therefore the deltas are 1, 1, 2, 1, and the 2 is flagged for removal, I retained the deltas as 1, 1, 1).

    I respect the inputs you have posted in the past, so my assumption here is that I’m doing something wrong. Suggestions??

    #190364

    Robert Butler
    Participant

    The problem is that you are not following the methods outlined in my first post to this thread.

    You said, “Perhaps my approach of taking deltas between two successive original data points is in error (i.e., if one has original data of 2, 3, 2, 4, 5, and therefore the deltas are 1, 1, 2, 1, and the 2 is flagged for removal, I retained the deltas as 1, 1, 1).”

    If we follow the method of the first post, then the deltas are, as you noted, 1, 1, 2, 1.

    The average of these deltas is 5/4 = 1.25, and 1.25 x 3.27 = 4.09. None of the deltas is greater than 4.09, so the estimate of the ordinary variation is 1.25/1.128 = 1.11.

    #190366

    Lanning
    Member

    The post on the deltas was intended to confirm the correct deltas for the 2nd iteration (i.e., if one has original data of 2, 3, 2, 4, 5, and therefore the deltas are 1, 1, 2, 1, and the 2 is flagged for removal, I retained the deltas as 1, 1, 1, as opposed to recomputing the deltas from the data 2, 3, 2, 5 and getting revised deltas of 1, 1, 3).

    I believe I did use the approach you suggested with my data (see the attached file) … so if someone can find the error I’m making before I do, let me know what it is …
    Call me at 402-873-8404 x4009 if you would rather talk than continue in this thread.

    #190367

    Robert Butler
    Participant

    OK, I see your point. The 2 was removed because it fell outside the limits based on the entire group of deltas. You then recomputed a new average of the deltas with the 2 excluded and repeated the exercise. At the end of the effort you had gone from 62 to 36 deltas, and you used the average of those 36 to estimate the ordinary variation.

    If the above is an accurate description of what you did, then everything is fine. As for the issue of dropping 26 of the deltas – I don’t see that there is one. All it suggests is that you have a large number of changes in the process that are due to special cause and are not random. Under such circumstances a large number of drops would be expected. If the process had less special cause, you would most likely have fewer deletions.
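
    To get a feel for how strongly the number of discarded differences depends on special cause, here is a rough simulation sketch (my own illustration; the shift sizes and seed are arbitrary) that runs the same screening loop on a stable series and on a copy of it with step shifts added:

    import numpy as np

    rng = np.random.default_rng(1)

    def count_discards(x, d4=3.27):
        """Run the iterative screening and return how many differences were dropped."""
        mr = np.abs(np.diff(np.asarray(x, dtype=float)))
        n0 = len(mr)
        while True:
            keep = mr <= d4 * mr.mean()
            if keep.all():
                return n0 - len(mr)
            mr = mr[keep]

    stable = rng.normal(10, 1, 63)                                   # 62 differences, common cause only
    shifted = stable + np.repeat([0, 4, -3, 5], [16, 16, 16, 15])    # same data with step shifts added
    print("stable:", count_discards(stable), "shifted:", count_discards(shifted))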

    I can’t see anything attached to your post, but what I would recommend is what I suggested in the initial post to John: plot the data, number the differences, cross out all of the differences that were deleted, and then cut and paste the pieces of paper to make everything line up. This visual display should help you see what was deleted and why.

    #190374

    Lanning
    Member

    Thanks for the kind reply. I had hoped there were far fewer instances of special cause variation in the data and that I was simply making a mistake somewhere. I now need to “roll up my sleeves” … Thanks again.

    As for the missing file, I’ve emailed to find out what happened to it. However, I think my questions are resolved.
