
Confidence Intervals for Cycle Time Distribution


  • #51107

    P S
    Participant

    Hello all, I am working with cycle time data, and when I chart the distribution it is skewed right. I have a couple of questions:
    1. What distribution do most people use to model a cycle time distribution?
    2. Since the cycle time distribution is skewed right (not normally distributed), how can I calculate confidence intervals for it without transforming the data? The reason for not transforming the data is that I do not want to lose information in the transformation.
    Thanks much,
    PS

    0
    #176652

    Ken Feldman
    Participant

    It should not be a surprise that your time-based data are skewed. Because of the natural boundary at zero, such data tend to come out that way. If you have access to Minitab, you can use the Individual Distribution Identification function to see which distribution fits best. It might be exponential, lognormal, or one of the other skewed distributions. There are standard formulas you can Google to compute the CI for any of those distributions.
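    A rough Python sketch of the same idea, not a reproduction of Minitab's routine: the simulated data, the candidate list, and the use of a Kolmogorov-Smirnov statistic to rank fits are all illustrative assumptions.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    cycle_times = rng.lognormal(mean=1.5, sigma=0.6, size=200)  # stand-in cycle time data

    candidates = {
        "lognormal": stats.lognorm,
        "exponential": stats.expon,
        "weibull": stats.weibull_min,
        "gamma": stats.gamma,
    }

    for name, dist in candidates.items():
        params = dist.fit(cycle_times, floc=0)  # pin the location at the zero boundary
        # KS statistic computed against the fitted parameters; smaller = closer fit
        ks = stats.kstest(cycle_times, dist.cdf, args=params)
        print(f"{name:12s} KS statistic = {ks.statistic:.3f}")
    ```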

    0
    #176654

    Jedi Master
    Participant

    Darth,
    I’ve read many of your posts…are you an MBB?

    0
    #176657

    Nik
    Participant

    I agree with Darth. Cycle time data is naturally non-normal because you can’t have cycle times less than zero. Minitab can check your data against 20+ distributions (Stat > Quality Tools > Individual Distribution Identification; I might be slightly off, I’m doing this from memory). If the data fit one of those distributions, you can likely find a confidence interval calculation or some equivalent. Minitab also draws confidence or prediction bands and reports Anderson-Darling fit values.
    Why do you need a confidence interval? Are you trying to establish a baseline?

    0
    #176660

    Ken Feldman
    Participant

    JM, why do you ask?

    0
    #176662

    Adam L Bowden
    Participant

    You might want to identify the “long cycle time” items and determine
    whether these are special cause, then remove them from your data set;
    the remaining data may then be normal.
    Adam

    0
    #176663

    P S
    Participant

    Thanks for the reply Darth.
    I have tried to do exactly what you recommended. A follow-up question to that idea: what if the distribution does not fit any of the distributions in Minitab well? Is there an alternative course of action?

    0
    #176664

    P S
    Participant

    Nik,
    The reason I am looking to establish a confidence interval is that I want to compare “pre-improvement” and “post-improvement” sets of data and be able to say that the post-improvement numbers are a statistical improvement over the pre-improvement numbers.
    FYI: I would use mean cycle time as my primary metric.
    Thanks,
    PS

    0
    #176665

    Robert Butler
    Participant

    Your biggest concern with something like cycle time is, as Darth noted, the natural barrier at time 0. Thus, if you build confidence intervals using the raw data, you run the risk of a negative lower confidence bound. The easiest way to deal with this is to take the log of the data, run the analysis on the logs, identify the confidence interval, and then back-transform the interval to the actual units. Your confidence bounds will be asymmetrical, but they will be physically meaningful. There are many processes where asymmetrical C.I.’s are a fact of life.
    I wouldn’t stop with just a simple check for a mean shift. I would recommend you also look for a change in the variation about the mean. You could find that your improvement did nothing to the mean but significantly reduced the variance. A finding like that could easily prove more valuable than the discovery of a simple shift in the mean.
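    A minimal Python sketch of that log/back-transform idea, using simulated stand-in data. The interval comes out asymmetric and strictly positive; note that exp of the log-scale mean is the geometric mean rather than the arithmetic mean, which is part of what later posts in this thread pick apart.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    cycle_times = rng.lognormal(mean=1.5, sigma=0.6, size=200)  # stand-in cycle time data

    logs = np.log(cycle_times)
    n, m, s = len(logs), logs.mean(), logs.std(ddof=1)
    t = stats.t.ppf(0.975, df=n - 1)
    half_width = t * s / np.sqrt(n)          # CI half-width on the log scale

    lower, upper = np.exp(m - half_width), np.exp(m + half_width)  # back-transform
    print(f"95% interval in original units: ({lower:.2f}, {upper:.2f})")
    ```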

    0
    #176666

    Nik
    Participant

    I’d recommend keeping it a bit simpler. Use “cycle time” as the metric (instead of mean cycle time); then you can look for shifts in means or medians using hypothesis testing. See https://www.isixsigma.com/forum/showmessage.asp?messageID=8723
    With only two subgroups (before and after), this will likely lead you to a Mann-Whitney test for comparing population medians (which are a better estimate of central tendency for non-normal data). The other option, especially when collecting large sample sizes is difficult, is to use control charts. The eight different tests can help flag changes in process performance faster than hypothesis testing alone.
    By the way, the median test incorporates median confidence intervals, which I believe answers your original question.
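    A short sketch of that before/after comparison using SciPy's Mann-Whitney U test; the pre and post arrays here are simulated stand-ins, and the one-sided alternative is an illustrative choice.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    pre = rng.lognormal(mean=1.6, sigma=0.5, size=80)   # pre-improvement cycle times (stand-in)
    post = rng.lognormal(mean=1.4, sigma=0.5, size=80)  # post-improvement cycle times (stand-in)

    # One-sided test: is the pre-improvement distribution shifted higher than post?
    u, p = stats.mannwhitneyu(pre, post, alternative="greater")
    print(f"Mann-Whitney U = {u:.1f}, p = {p:.4f}")
    print(f"medians: pre = {np.median(pre):.2f}, post = {np.median(post):.2f}")
    ```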

    0
    #176667

    Jedi Master
    Participant

    You always give great answers with regard to stats. I was just wondering whether you were an MBB or a statistician. Either way, our firm in So Cal is looking for a good MBB. I hold BB certification from a well-respected firm.

    0
    #176668

    Craig
    Participant

    Seems like the median is the best option if you truly have a skewed distribution.
    Are your time study values showing a boundary at zero, or are you studying events that are substantially above zero? Don’t assume that just because you are dealing with time, the natural boundary will affect you!

    0
    #176669

    Mikel
    Member

    1) Exponential.
    2) Confidence interval formulas for the exponential are available.
    I saw the later post about using mean cycle time; I agree with that, but also use the COV (coefficient of variation). An improved process should have a COV less than (preferred) or equal to that of the starting process. Go look at the assumptions of the exponential distribution and queueing theory to understand this better.
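    A sketch of the standard chi-square based confidence interval for the mean of exponential data, plus the COV check mentioned above; the data here are simulated, and the exponential assumption is exactly that, an assumption.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    cycle_times = rng.exponential(scale=5.0, size=100)  # stand-in cycle time data

    n, total, alpha = len(cycle_times), cycle_times.sum(), 0.05
    # For exponential data with mean mu, 2*total/mu follows a chi-square with 2n df.
    lower = 2 * total / stats.chi2.ppf(1 - alpha / 2, df=2 * n)
    upper = 2 * total / stats.chi2.ppf(alpha / 2, df=2 * n)
    cov = cycle_times.std(ddof=1) / cycle_times.mean()  # close to 1 for an exponential process

    print(f"mean = {cycle_times.mean():.2f}, 95% CI = ({lower:.2f}, {upper:.2f}), COV = {cov:.2f}")
    ```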

    0
    #176670

    Mikel
    Member

    Darth is a Deming master and is slumming by working in Six Sigma. I am
    sure he would give better guidance than 99.99% of the MBBs out
    there.

    0
    #176671

    mand
    Member

    The classic example illustrating the nonsense of Six Sigma and its use of enumerative tools for analytical problems. Time-based measurements are almost always skewed. Forget all the rubbish about normal distributions. Shewhart showed that you don’t need them!
    Has no one ever heard of Shewhart charts?

    0
    #176672

    Ken Feldman
    Participant

    Thanks Mom, but you shouldn’t be posting under Stan’s name. For JM’s benefit, I am an MBB but an engineer by training, not a statistician. If you wish to discuss your opportunity further, catch me offline at [email protected]

    0
    #176673

    Ken Feldman
    Participant

    Nik, to summarize:
    1. Explore your data and be sure that the skewness is real and not due to some anomalies.
    2. Transforming is the very last resort and, in this case, is not necessary.
    3. If you are seeking to test two sets of data for a change, then a nonparametric test is adequate, but keep in mind that it is not assumption free, just distribution free.
    4. Not sure how a control chart is relevant, although the assumption of normality is often dealt with by the robustness of the control chart.
    If you wish to share your data offline, I will be happy to take a look and comment. Catch me at [email protected]

    0
    #176675

    Severino
    Participant

    I’m fairly certain that you cannot simply perform the inverse transformation on your confidence intervals (this is mentioned very specifically in Juran). I believe this is part of what Wheeler referred to when he mentioned the “muddy waters” of transforming data.

    0
    #176727

    Chris Seider
    Participant

    See the link below for a summary of distributions:
    http://www.itl.nist.gov/div898/handbook/eda/section3/eda366.htm
    I often use Weibull distributions for my cycle time distributions in projects. To show a proper statistical shift, the median or standard deviation should be monitored. However, many financial reps aren’t comfortable with medians, so you will have to track both means and medians.
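    A small sketch of fitting a Weibull to cycle-time data and reading off the median, using simulated stand-in data; the location is pinned at zero to respect the boundary.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    cycle_times = 6.0 * rng.weibull(1.8, size=150)  # stand-in cycle time data

    shape, loc, scale = stats.weibull_min.fit(cycle_times, floc=0)  # pin location at zero
    fitted = stats.weibull_min(shape, loc=loc, scale=scale)
    print(f"shape = {shape:.2f}, scale = {scale:.2f}")
    print(f"fitted median = {fitted.median():.2f}, sample median = {np.median(cycle_times):.2f}")
    ```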

    0
    #176728

    Chris Seider
    Participant

    Sam,
    Kind of a vitriolic posting…
    I wish you would recognize that Shewhart charts were originally used mainly on averages of subgroups, which is why the statement that SPC charts can be used for all types of distributions is only loosely correct. Quoting Juran’s Quality Handbook: “an individual chart is much more sensitive to a lack of normality of the underlying distribution than is the Xbar chart.”
    It is uncommon for cycle times to be plotted with Xbar charts.
    I stand by the statement that the Six Sigma methodology drives bottom-line business results more uniformly than other techniques.

    0
    #176778

    SixSigmaGuy
    Participant

    >>you can use the ID Distribution function to see what distribution best fits.<<  You can’t use a distribution just because the data in your sample happen to closely match its shape, or other syntactical characteristics. For example, I can make Weibull, Poisson, chi-square, etc., distributions all look the same based just on the parameters I choose. To use a certain distribution to model the data represented in your sample, you need to consider the semantic characteristics. For example, a Weibull distribution is used to model failure rate data. A Poisson distribution is used to model event frequency data. None of these distributions was meant to model cycle time data and, as far as I know, there isn’t one that was. This might be a case where a new one needs to be invented.
    If you wanted to use an existing distribution to model cycle time data based on what Minitab returns in its ID function, you would need to square your confidence. E.g., if Minitab says your sample matches a Weibull distribution, it’s only saying that you can’t reject, with 95% confidence, the null hypothesis that your sample is from a Weibull distribution. So you’ve already got an error of 5% just based on your choice of distribution. Now, if you go ahead and calculate a 95% confidence interval around the mean based on a Weibull distribution, your confidence is no longer 95%, but 95% squared, or 90.25%.

    0
    #176779

    SixSigmaGuy
    Participant

    >>The easiest way to deal with this is to take the log of the data , run the analysis on the log, identify the confidence interval and then back transform these intervals to the actual units.<<
    This would work if not for the margin of error (e.g., 1.96 * SD/SQRT(n) for the 95% confidence interval) that’s applied when calculating the confidence interval. If you calculate the confidence interval around the mean of the log-transformed data and then convert that interval back to its anti-logs (i.e., the original units), the interval usually won’t include the mean of the sample, much less the mean of the population.
    This is easily shown by generating some random normal data, calculating the CIs around the mean, exponentiating the data to make it lognormal, transforming the CIs to their exponentiated equivalents, calculating the mean of the exponentiated data, and noting where that mean falls relative to the transformed CIs.
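    A sketch of exactly that check, with simulated data; the arithmetic mean of the exponentiated values is pulled upward by the skew and typically lands above the back-transformed interval.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    x = rng.normal(loc=2.0, scale=1.0, size=500)  # random normal data
    n, m, s = len(x), x.mean(), x.std(ddof=1)
    t = stats.t.ppf(0.975, df=n - 1)
    ci = (m - t * s / np.sqrt(n), m + t * s / np.sqrt(n))  # CI around the mean of x

    y = np.exp(x)                              # skewed ("lognormal") version of the data
    ci_back = (np.exp(ci[0]), np.exp(ci[1]))   # CI transformed to the skewed scale

    print(f"CI on the normal scale:     ({ci[0]:.2f}, {ci[1]:.2f})")
    print(f"Back-transformed CI:        ({ci_back[0]:.2f}, {ci_back[1]:.2f})")
    print(f"Mean of exponentiated data: {y.mean():.2f}")  # usually falls above the interval
    ```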

    0
    #176780

    SixSigmaGuy
    Participant

    Exactly!  See my other reply I just submitted.

    0
    #176781

    SixSigmaGuy
    Participant

    These methods are fine if you are able to reject H0. But the problem is that they often aren’t powerful enough to reject mathematically even when the difference is practically significant.

    0
    #176782

    SixSigmaGuy
    Participant

    I don’t get your point. How are Shewhart charts going to show that your improvements have significantly reduced cycle time? That’s the main issue of this thread. Shewhart charts are antiquated tools for distinguishing “random” or “controlled” variation from “special cause” or “uncontrolled” variation. I.e., they tell you whether the variation of your process is in control and thus not worth improving. I say “antiquated” because they are based on simple counting rules that were easy to apply before we had computers. Shewhart developed his charts in 1931.

    0
    #176783

    SixSigmaGuy
    Participant

    Can you clarify whether you are looking for a “location” confidence interval, such as one around the mean? If you are looking for confidence intervals around the variation, then the problem is different. I’m assuming you are trying to show that your improvement significantly reduced the cycle time, which would mean you are dealing with a location problem.
    You do NOT lose information with a log transformation the way you do when you apply the central limit theorem. You can transform data back and forth using the log transformation and your data will be exactly the same (subject to minor round-off errors from the computer’s processor).
    Unfortunately, although you do not lose information in the transformation, you do lose information when you calculate the confidence intervals. Thus, avoiding transformations is a good thing to do, but not for the reason you cited.

    0
    #176786

    Nik
    Participant

    It all depends on your goal, which relates to what practical conclusions you want to draw. If you want to say the central tendency of the process is different, then you use hypothesis tests. If you want to identify root causes and see whether a change is beginning to have a significant effect (or whether the fluctuations are just noise), then use control (Shewhart) charts. Each tool helps you draw practical conclusions; it is up to you to interpret what the tool is telling you correctly.
    Now, Shewhart’s methods may appear simplistic and the tests rather crude, but nothing has replaced them for identifying in-process changes, and although 1931 predates computers, Laplace’s derivation of the central limit theorem was 1778, and as modern statisticians are realizing, “there is nothing normal about the normal distribution.” (https://www.isixsigma.com/library/content/c020121a.asp)
    As you study the history of the development of all of the tools in our Six Sigma toolbox, you repeatedly find individuals who could not determine what they needed to know from the methods that existed at the time: Fisher, “Student”, Kruskal, Wallis, etc. That spirit of invention and exploration should continue to drive us today, not the following of some roadmap, clicking some buttons in a computer program, or thinking a method “too old” to use.

    0
    #176810

    SixSigmaGuy
    Participant

    >>Now, Shewhart’s methods may appear simplistic and the tests rather crude, but nothing has replaced them for identifying in-process changes<<
    Not sure why you say nothing has replaced them; a simple power spectral density (PSD) analysis of the data after putting it through a Fourier transform will tell you very clearly whether there is special cause variation in the data or not. That’s basically all that Shewhart did. His rules all represent different frequencies in the data.
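    A sketch of that kind of PSD calculation on simulated process data; treating a dominant low-frequency peak as a special-cause signal is the poster's reading, not a standard prescription.

    ```python
    import numpy as np
    from scipy import signal

    rng = np.random.default_rng(6)
    n = 512
    drift = 0.5 * np.sin(2 * np.pi * 0.05 * np.arange(n))  # injected low-frequency pattern
    x = rng.normal(size=n) + drift                          # "process" data: noise plus drift

    freqs, power = signal.periodogram(x)    # power spectral density estimate
    peak = freqs[np.argmax(power[1:]) + 1]  # skip the zero-frequency bin
    print(f"Dominant frequency: {peak:.3f} (the injected pattern sits near 0.05)")
    ```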

    0
    #176812

    Ken Feldman
    Participant

    What rules of Shewhart’s are you referring to? The only rule I recall is the one that deals with being beyond the control limits.

    0
    #176813

    newbie
    Participant

    Dr R,
    Would not a basic 2-sample t-test (regardless of skew) and a variables control chart (staged to show before and after) provide the data requested by the original poster?

    0
    #176815

    SixSigmaGuy
    Participant

    There are actually several sets of rules that have been published. One of the more common is as follows:

    Your process is out of control if one or more of the following is true:

    One data point outside the 3-sigma limit,

    Two out of three consecutive data points outside the 2-sigma limit,

    Four out of five consecutive data points outside the 1-sigma limit,

    Eight consecutive data points on one side of the center line,

    Six consecutive points showing an increase or decrease,

    14 consecutive points that oscillate up and down, and/or

    15 consecutive points within the 1-sigma limit.

    Relying on simple control limits alone can often trigger false alarms (the rules can too, but are less likely to). Say, for example, that you are using 3-sigma control limits; we expect roughly 0.27% of points to fall outside 3 sigma through random variation alone. Thus, you might say a process is out of control even when it isn’t. That is not a big deal when you only have a few points, but in today’s world, with computers, etc., we often rely on thousands of data points to determine whether a process is in control or not. Worse, though, is that you might claim the process is in control when it has a very strong signal (e.g., 15 consecutive points inside the 1-sigma limit) indicating it is out of control, even though none of the points are outside the control limits.
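    For illustration, a sketch of coding two of the rules above against a series of points, given a known center line and sigma; as the later posts note, the exact rule definitions vary by source, so treat these as one plausible reading.

    ```python
    import numpy as np

    def beyond_3_sigma(x, center, sigma):
        """Flag any point more than 3 sigma from the center line."""
        return np.abs(x - center) > 3 * sigma

    def eight_on_one_side(x, center):
        """Flag the last point of any run of 8 consecutive points on one side of the center line."""
        side = np.sign(x - center)
        flags = np.zeros(len(x), dtype=bool)
        for i in range(7, len(x)):
            window = side[i - 7 : i + 1]
            flags[i] = bool(np.all(window == window[0]) and window[0] != 0)
        return flags

    rng = np.random.default_rng(7)
    data = rng.normal(loc=10.0, scale=1.0, size=50)
    data[30:] += 1.5  # inject a shift in the second half

    print("3-sigma violations at:", np.where(beyond_3_sigma(data, 10.0, 1.0))[0])
    print("8-in-a-row signals at:", np.where(eight_on_one_side(data, 10.0))[0])
    ```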

    0
    #176817

    Ken Feldman
    Participant

    Sorry Guy, but my recollection is that all those rules were developed much later by some engineers at Western Electric, hence the name Western Electric Rules.  Shewhart did not develop those and believed that the simple signal of exceeding the control limits was sufficient.  Is my recollection correct?

    0
    #176818

    Mikel
    Member

    You are correct but you have the advantage of being an old guy like me.

    0
    #176820

    SixSigmaGuy
    Participant

    No, those are not the Western Electric (WECO) rules, although some references may call them that. As I mentioned, several sets of rules (e.g., Wheeler’s, Nelson’s, etc.) have been published, and the Western Electric rules are one of them. For example, I don’t believe the WECO rules include the one that says “15 consecutive points within 1 sigma.” The main issue, though, is that if you rely only on the single rule of being outside the control limits, you will miss the strong low-frequency signals that also indicate special variation; thus, you will be inclined to say that your process is in control when, in fact, it is not. Putting your data through a Fourier transform will show all the frequencies in your data, and a PSD will identify the strong signals.

    0
    #176821

    Robert Butler
    Participant

    I’ve been away for a while and haven’t had a chance to drop by the forum. I noticed the thread concerning the confidence interval for cycle time was still active, and in reading the posts in reverse order I noted the comment concerning the error of just logging the data. What was worse was discovering that I was the source of the recommendation.
    All I can say is that, as written, it is indeed wrong, and I can only plead a bad case of rented fingers and typing too fast. The only hope I have is that no one tried it, and if they did, I would hope they would have noticed that the confidence intervals weren’t asymmetrical and that therefore something wasn’t right.
    What should have been said is the following:
    1. Log the data.
    2. Find the mean of the logged data (Mlog) and the variance (Vlog) and standard deviation (Slog) of the logged data.
    3. For the lower and upper confidence limits compute exp(Mlog - t(1-alpha/2)*Slog) and exp(Mlog + t(1-alpha/2)*Slog).
    4. For the estimate of the mean compute exp(Mlog + Vlog/2).
    5. If N is “large” (i.e., greater than 10) and if Slog is less than the square root of 2, this approximation should give a maximum error of around 10%.
    I don’t happen to have the actual reference in front of me, but according to my notes a reference for this is Continuous Univariate Distributions by Johnson and Kotz.
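    A direct transcription of those five steps into Python, using simulated stand-in data; as the follow-up posts point out, step 3 as written may not be right, so treat this strictly as a rendering of the post rather than a vetted formula.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(8)
    cycle_times = rng.lognormal(mean=1.5, sigma=0.6, size=60)  # stand-in cycle time data

    logs = np.log(cycle_times)                 # step 1
    n = len(logs)
    m_log = logs.mean()                        # step 2
    v_log = logs.var(ddof=1)
    s_log = logs.std(ddof=1)

    t = stats.t.ppf(0.975, df=n - 1)           # step 3 (alpha = 0.05)
    lower = np.exp(m_log - t * s_log)
    upper = np.exp(m_log + t * s_log)

    mean_est = np.exp(m_log + v_log / 2)       # step 4
    print(f"interval = ({lower:.2f}, {upper:.2f}), mean estimate = {mean_est:.2f}")
    ```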

    0
    #176822

    Ken Feldman
    Participant

    In fact, on page 172 of the Statistical Quality Control Handbook, they reference a pattern called “stratification.” It is described as “…a stratification pattern appears to hug the centerline with few deviations or excursions at any distance from the centerline.” I guess that is sort of like a Western Electric rule. I agree that Shewhart was a bit simplistic in relying only on a point being in or out, but given the context in which he developed the control chart, it was probably appropriate. The concern with being oversensitive to patterns is that you start to see “bunnies in the clouds.” Selecting all the WE tests in Minitab when doing a control chart may overwhelm you with signals, which might be more information than you need.

    0
    #176826

    SixSigmaGuy
    Participant

    That’s why I say people should not use control charts or the rules to determine whether the process is in control. Use a Fourier transform instead and analyze the PSD.

    0
    #176829

    SixSigmaGuy
    Participant

    Aren’t you assuming that the logged data are normal?
    I ran a simulation using your algorithm. Mlog +/- t(1-alpha/2)*Slog gave me what appeared to be a good confidence interval around the mean of the logged data, but exp(Mlog +/- t(1-alpha/2)*Slog) gave me limits that were both on the same side of the mean. I.e., the mean of the original data did not fall within the confidence interval. Am I missing something?
    Thanks for the pointer to the Continuous Univariate Distributions reference; it looks like something I need in my library. It sure is expensive :-(

    0
    #176922

    Robert Butler
    Participant

    SSGuy,  sorry for the delay – for whatever reason I can’t find my copy of Johnson and Kotz so I can’t double check my notes and the comments I posted earlier.  There is something not right and I’ll have to check into it and try to get back to this thread later.

    0
    #176924

    Ken Feldman
    Participant

    Hey Robert, put away the books for a while and jump in and tell us how to save the world. That is much more important right now than statistical formulas.

    0