Confidence Intervals for Cycle Time Distribution
 This topic has 39 replies, 13 voices, and was last updated 13 years, 6 months ago by Ken Feldman.


October 11, 2008 at 12:38 am #51107
Hello All, I am working with cycle time data, and when I chart the distribution it is skewed right. I have a couple of questions:
1. What distribution do most people use to model a cycle time distribution?
2. Since the cycle time distribution is skewed right (not normally distributed), how can I calculate confidence intervals for it without transforming the data? The reason for not transforming is that I do not want to lose information in the transformation.
Thanks much,
PS

October 11, 2008 at 3:23 pm #176652
Ken Feldman
It should not have been a surprise that your time-based data is skewed. Because of the natural boundary at zero, it has a tendency to do that. If you have access to Minitab, you can use the Individual Distribution Identification function to see which distribution fits best. It might be exponential, lognormal, or one of the other skewed distributions. There are standard formulas, easy to find with a Google search, for computing the CI for any of these distributions.
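For readers without Minitab, the same distribution-screening step can be sketched in Python with scipy. The cycle time data below are simulated stand-ins, and the candidate list and use of the Kolmogorov-Smirnov statistic are illustrative choices (Minitab ranks fits with Anderson-Darling):

```python
import numpy as np
from scipy import stats

# Simulated stand-in for real cycle time data: skewed, bounded below at zero
rng = np.random.default_rng(7)
cycle_times = rng.lognormal(mean=1.0, sigma=0.6, size=200)

# Fit a few candidate skewed distributions (location pinned at zero) and
# compare them by the Kolmogorov-Smirnov statistic: smaller = closer fit.
for name in ("lognorm", "expon", "gamma", "weibull_min"):
    dist = getattr(stats, name)
    params = dist.fit(cycle_times, floc=0)
    ks = stats.kstest(cycle_times, name, args=params)
    print(f"{name:12s} KS = {ks.statistic:.3f}")
```

Note that goodness-of-fit p-values are optimistic when the parameters are estimated from the same sample, so treat the ranking as a screen, not a verdict.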
October 11, 2008 at 9:36 pm #176654
Jedi Master
Darth,
I've read many of your posts… are you an MBB?

October 12, 2008 at 1:08 am #176657
I agree with Darth. Cycle time data are naturally nonnormal because you can't have cycle times less than zero. Minitab can check your data against 20+ distributions (Stat > Quality Tools > Individual Distribution Identification; the menu path might be slightly off, I'm doing it from memory). If the data fit one of those distributions, you can likely find a corresponding confidence interval calculation. Minitab draws confidence or prediction bands and computes Anderson-Darling fit values.
Why do you need a confidence interval? Are you trying to establish a baseline?

October 12, 2008 at 12:54 pm #176660
Ken Feldman
JM, why do you ask?

October 12, 2008 at 3:32 pm #176662
Adam L Bowden
You might want to identify the "long cycle time" items and determine whether they are special cause, then remove them from your data set; the remaining data may be normal.
Adam

October 12, 2008 at 6:01 pm #176663
Thanks for the reply, Darth.
I have tried to do exactly what you recommended. A follow-up question: what if the data do not fit any of the distributions in Minitab well? Is there an alternative course of action?

October 12, 2008 at 6:12 pm #176664
Nik,
The reason I am looking to establish a confidence interval is that I want to compare pre-improvement and post-improvement sets of data and be able to say that the post-improvement numbers are a statistical improvement over the pre-improvement numbers.
FYI: I would use the “Mean Cycle Time” as my primary metric.
Thanks,
PS

October 12, 2008 at 6:35 pm #176665
Robert Butler
Your biggest concern with something like cycle time is, as Darth noted, the natural barrier at time 0. If you build confidence intervals from the raw data, you run the risk of a negative lower confidence bound. The easiest way to deal with this is to take the log of the data, run the analysis on the logs, identify the confidence interval, and then back-transform the interval to the original units. Your confidence bounds will be asymmetrical, but they will be physically meaningful. There are many processes where asymmetrical CIs are a fact of life.
I wouldn't stop with just a simple check for a mean shift. I recommend you also look for a change in the variation about the mean. You could find that your improvement did nothing to the mean but significantly reduced the variance. A finding like that could easily prove more valuable than a simple shift in the mean.

October 12, 2008 at 6:40 pm #176666
I'd recommend keeping it a bit simpler. Use "cycle time" as the metric (instead of mean cycle time); then you can look for shifts in means or medians using hypothesis testing. See https://www.isixsigma.com/forum/showmessage.asp?messageID=8723
With only two subgroups (before and after), this will likely lead you to a Mann-Whitney test for comparing population medians (which are a better estimate of central tendency for nonnormal data). The other option, especially when collecting large sample sizes is difficult, is to use control charts. The eight tests for special causes can help flag changes in process performance faster than hypothesis testing alone.
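A before/after Mann-Whitney comparison of this kind can be sketched with scipy; the two samples here are simulated, and the lognormal shapes and sample sizes are arbitrary choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
before = rng.lognormal(mean=1.0, sigma=0.5, size=60)  # pre-improvement cycle times
after = rng.lognormal(mean=0.7, sigma=0.5, size=60)   # post-improvement cycle times

# One-sided test: are "before" cycle times stochastically larger than "after"?
result = stats.mannwhitneyu(before, after, alternative="greater")
print(f"U = {result.statistic:.0f}, p = {result.pvalue:.4f}")
```

A small p-value supports a real downward shift in the median.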
By the way, the median test incorporates confidence intervals for the median, which I believe answers your original question.

October 12, 2008 at 6:59 pm #176667
Jedi Master
You always give great answers in regard to stats. I was just wondering whether you are an MBB or a statistician. Either way, our firm in So Cal is looking for a good MBB. I hold BB certification from a well-respected firm.

October 12, 2008 at 7:55 pm #176668
Seems like use of the median is the best option if you truly have a skewed distribution.
Are your time-study values showing a boundary at zero, or are you studying events that are substantially above zero? Don't assume that just because you are dealing with time, the natural boundary will affect you!

October 12, 2008 at 10:01 pm #176669
1) Exponential.
2) Confidence interval formulas for the exponential distribution are available.
I saw the later post about using mean cycle time; I agree with that, but also use the COV: an improved process should have a COV less than (preferred) or equal to that of the starting process. Look at the assumptions of the exponential distribution and of queueing theory to understand this better.

October 12, 2008 at 10:03 pm #176670
Darth is a Deming master who is slumming working in Six Sigma. I am sure he would give better guidance than 99.99% of the MBBs out there.

October 12, 2008 at 11:40 pm #176671
The classic example illustrating the nonsense of Six Sigma and its use of enumerative tools for analytic problems: time-based measurements are almost always skewed. Forget all the rubbish about normal distributions. Shewhart showed that you don't need them!
Has no one ever heard of Shewhart charts?

October 13, 2008 at 1:21 am #176672
Ken Feldman
Thanks Mom, but you shouldn't be posting under Stan's name. For JM's benefit, I am an MBB, but an engineer by training, not a statistician. If you wish to discuss your opportunity further, catch me offline at [email protected]
October 13, 2008 at 1:26 am #176673
Ken Feldman
Nik, to summarize:
1. Explore your data and be sure the skewness is real and not due to some anomaly.
2. Transforming is the very last resort, and in this case it is not necessary.
3. If you are seeking to test two sets of data for a change, a nonparametric test is adequate, but keep in mind that it is not assumption-free, just distribution-free.
4. I'm not sure how a control chart is relevant, although the assumption of normality is often dealt with by the robustness of the control chart.
If you wish to share your data offline, I will be happy to take a look and comment. Catch me at [email protected]

October 13, 2008 at 3:46 am #176675
Severino
I'm fairly certain that you cannot simply perform the inverse transformation on your confidence intervals (mentioned very specifically in Juran). I believe this is part of what Wheeler referred to when he mentioned the "muddy waters" of transforming data.
October 15, 2008 at 8:06 am #176727
Chris Seider
See the link below for a summary of distributions:
http://www.itl.nist.gov/div898/handbook/eda/section3/eda366.htm
I often use Weibull distributions for cycle time in projects. To show a proper statistical shift, the median or standard deviation should be monitored. However, many financial reps aren't comfortable with medians, so you will have to track both means and medians.

October 15, 2008 at 8:29 am #176728
Chris Seider
Sam,
Kind of a vitriolic posting by yourself…
I wish you would recognize that Shewhart charts were originally used on averages of subgroups, which is why the statement that SPC charts can be used for all types of distributions is only loosely correct. Quoting Juran's Quality Handbook: "an individual chart is much more sensitive to a lack of normality of the underlying distribution than is the Xbar chart."
It is uncommon for cycle times to be plotted with Xbar charts.
I stand by the claim that the Six Sigma methodology drives bottom-line business results more uniformly than other techniques.

October 16, 2008 at 10:12 pm #176778
SixSigmaGuy
>>you can use the ID Distribution function to see what distribution best fits.<<
You can't use a distribution just because the data in your sample happen to closely match its shape or other syntactic characteristics. For example, I can make Weibull, Poisson, chi-square, etc. distributions all look the same based just on the parameters I choose. To use a particular distribution to model the data represented by your sample, you need to consider the semantic characteristics. For example, a Weibull distribution is used to model failure-rate data; a Poisson distribution is used to model event-frequency data. None of these distributions was meant to model cycle time data and, as far as I know, there isn't one that was. This might be a case where a new one needs to be invented.
If you wanted to use an existing distribution to model cycle time data based on what Minitab returns in its ID function, you would need to square your confidence. E.g., if Minitab says your sample matches a Weibull distribution, it's only saying that you can't reject, with 95% confidence, the null hypothesis that your sample is from a Weibull distribution. So you've already got a 5% error just from your choice of distribution. Now, if you go ahead and calculate a 95% confidence interval around the mean based on a Weibull distribution, your confidence is no longer 95% but 95% squared, or 90.25%.

October 17, 2008 at 12:32 am #176779
SixSigmaGuy
>>The easiest way to deal with this is to take the log of the data, run the analysis on the log, identify the confidence interval and then back transform these intervals to the actual units.<<
This would work if not for the margin of error (e.g., 1.96 * SD/SQRT(n) for the 95% confidence interval) that's applied when calculating the confidence interval. If you calculate the confidence interval around the mean of the log-transformed data and then convert those limits back with antilogs (i.e., to the original units), the interval usually won't include the mean of the sample, much less the mean of the population.
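A quick simulated sketch of this failure mode: the back-transformed interval brackets the geometric mean (the lognormal median, about exp(mean of the logs)), so the arithmetic mean of the raw data falls outside it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.lognormal(mean=0.0, sigma=1.0, size=n)  # skewed, strictly positive data

logs = np.log(x)
half = 1.96 * logs.std(ddof=1) / np.sqrt(n)  # margin of error on the log scale
lo, hi = np.exp(logs.mean() - half), np.exp(logs.mean() + half)

# The naive back-transformed "CI" sits around exp(mean of logs) ~ the median,
# while the arithmetic mean of lognormal data is higher by a factor exp(sigma^2/2).
print(f"back-transformed CI = ({lo:.2f}, {hi:.2f}); sample mean = {x.mean():.2f}")
```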
This is easily demonstrated by generating some random normal data, calculating the CI around its mean, exponentiating the data (making it lognormal) and the CI limits, calculating the mean of the exponentiated data, and noting where that mean falls relative to the transformed CI.

October 17, 2008 at 12:34 am #176780
SixSigmaGuy
Exactly! See my other reply I just submitted.

October 17, 2008 at 12:39 am #176781
SixSigmaGuy
These methods are fine if you are able to reject H0. But the problem is that they often aren't robust enough to reject mathematically, even though the difference is significant.

October 17, 2008 at 12:53 am #176782
SixSigmaGuy
I don't get your point. How are Shewhart charts going to show that your improvements have significantly reduced cycle time? That's the main issue of this thread. Shewhart charts are antiquated tools for distinguishing "random" or "controlled" variation from "special cause" or "uncontrolled" variation. I.e., they tell you whether the variation of your process is in control and, thus, not worth improving. I say "antiquated" because they are based on simple counting rules that were easy to apply before we had computers. Shewhart developed his charts in 1931.
October 17, 2008 at 1:08 am #176783
SixSigmaGuy
Can you clarify whether you are looking for a "location" confidence interval, such as one for the mean? If you are looking for a confidence interval around the variation, the problem is different. I'm assuming you are trying to show that your improvement significantly reduced cycle time, which would mean you are dealing with a location problem.
You do NOT lose information with a log transformation the way you do if you apply the central limit theorem. You can transform data back and forth using the log transformation, and your data will be exactly the same (subject to minor round-off errors in the computer's floating-point arithmetic).
Unfortunately, although you do not lose information in the transformation, you do lose information when you calculate the confidence intervals. Thus avoiding transformations is a good thing to do, but not for the reason you cited.

October 17, 2008 at 3:16 am #176786
It all depends on your goal, which relates to the practical conclusions you want to draw. If you want to say the central tendency of the process is different, use hypothesis tests. If you want to identify root causes and see whether a change is beginning to have a significant effect (or the fluctuations are just noise), use control (Shewhart) charts. Each tool helps you draw practical conclusions; it is up to you to interpret correctly what the tool is telling you.
Now, Shewhart's methods may appear simplistic and the tests rather crude, but nothing has replaced them for identifying in-process changes. And although 1931 predates computers, Laplace's derivation of the central limit theorem was 1778, and as modern statisticians are realizing, "there is nothing normal about the normal distribution." (https://www.isixsigma.com/library/content/c020121a.asp)
As you study the history of the development of the tools in our Six Sigma toolbox, you repeatedly find individuals who could not determine what they needed to know from the methods that then existed: Fisher, "Student" of the t-test, Kruskal, Wallis, etc. That spirit of invention and exploration should continue to drive us today, not the following of some roadmap, clicking buttons in a computer program, or dismissing a method as "too old" to use.
October 17, 2008 at 8:59 pm #176810
SixSigmaGuy
>>Now, Shewhart's methods may appear simplistic and the tests rather crude, but nothing has replaced them for identifying in-process changes<<
Not sure why you say nothing has replaced them; a simple power spectral density (PSD) analysis of the data after putting it through a Fourier transform will tell you very clearly whether there is special cause variation in the data or not. That's basically all that Shewhart did; his rules all represent different frequencies in the data.

October 18, 2008 at 2:00 am #176812
Ken Feldman
Which rules of Shewhart's are you referring to? The only rule I recall is the one that deals with being beyond the control limits.
October 18, 2008 at 2:21 am #176813
Dr. R,
Would not a basic two-sample t-test (regardless of skew) and a variables control chart (staged to show before and after) provide the data requested by the original poster?

October 18, 2008 at 5:43 am #176815
SixSigmaGuy
There are actually several sets of rules that have been published. One of the more common is as follows.
Your process is out of control if one or more of the following is true:
One data point outside the 3-sigma limit,
Two out of three consecutive data points outside the 2-sigma limit,
Four out of five consecutive data points outside the 1-sigma limit,
Eight consecutive data points on one side of the center line,
Six consecutive points showing an increase or decrease,
14 consecutive points that oscillate up and down, and/or
15 consecutive points within the 1-sigma limit.
Relying on the control limits alone can often trigger false alarms (the rules can too, but are less likely to). Say, for example, you are using 3-sigma control limits; we expect about 0.27% of points to fall outside 3 sigma through random variation alone. Thus you might say a process is out of control even when it isn't. Not a big deal when you only have a few points, but in today's world, with computers, we often rely on thousands of data points to determine whether a process is in control. Worse, though, you might claim the process is in control when it has a very strong signal (e.g., 15 consecutive points inside the 1-sigma limit) indicating it is out of control, even though none of the points are outside the control limits.
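Two of the rules above are easy to mechanize; here is a minimal sketch (the helper name and the choice of which rules to implement are ours, not from any one published standard):

```python
import numpy as np

def control_signals(x, center, sigma):
    """Flag indices violating two of the rules: a point beyond the 3-sigma
    limits, and eight or more consecutive points on one side of the center line."""
    z = (np.asarray(x, dtype=float) - center) / sigma
    beyond_3sigma = [i for i, v in enumerate(z) if abs(v) > 3]
    one_side = []
    run = 0
    for i, v in enumerate(z):
        run = run + 1 if i > 0 and np.sign(v) == np.sign(z[i - 1]) and v != 0 else 1
        if run >= 8:
            one_side.append(i)
    return beyond_3sigma, one_side

# Nine points just above the center line, then one far outside the limits
flags = control_signals([0.5] * 9 + [3.5], center=0.0, sigma=1.0)
print(flags)  # ([9], [7, 8, 9])
```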
October 18, 2008 at 1:25 pm #176817
Ken Feldman
Sorry, Guy, but my recollection is that all those rules were developed much later by some engineers at Western Electric, hence the name Western Electric rules. Shewhart did not develop them and believed the simple signal of exceeding the control limits was sufficient. Is my recollection correct?
October 18, 2008 at 1:45 pm #176818
You are correct, but you have the advantage of being an old guy like me.

October 18, 2008 at 8:48 pm #176820
SixSigmaGuy
No, those are not the Western Electric (WECO) rules, although some references may call them that. As I mentioned, several sets of rules (e.g., Wheeler's, Nelson's) have been published, and the Western Electric rules are one of them. For example, I don't believe the WECO rules include the one that says "15 consecutive points within 1 sigma." The main issue, though, is that if you rely only on the single rule of being outside the control limits, you will miss the strong low-frequency signals that also indicate special-cause variation; thus you will be inclined to say that your process is in control when, in fact, it is not. Putting your data through a Fourier transform will show all the frequencies in your data, and a PSD will identify the strong signals.
October 18, 2008 at 8:55 pm #176821
Robert Butler
I've been away for a while and haven't had a chance to drop by the forum. I noticed the thread concerning the confidence interval for cycle time was still active, and in reading the posts in reverse order I noted the comment concerning the error of just logging the data. What was worse was discovering that I was the source of the recommendation.
All I can say is that, as written, it is indeed wrong, and I can only plead a bad case of rented fingers and typing too fast. The only hope I have is that no one tried it, and if anyone did, I would hope they noticed the confidence intervals weren't asymmetrical and that therefore something wasn't right.
What should have been said is the following:
1. Log the data.
2. Find the mean of the logged data (Mlog) and the variance (Vlog) and standard deviation (Slog) of the logged data.
3. For the lower and upper confidence limits compute exp(Mlog - t(1-alpha/2)*Slog) and exp(Mlog + t(1-alpha/2)*Slog).
4. For the estimate of the mean compute exp(Mlog + Vlog/2).
5. If N is large (i.e., greater than 10) and Slog is less than the square root of 2, this approximation should give a maximum error of around 10%.
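For comparison, a standard published alternative is Cox's approximation, which also works from Mlog and Vlog but centers the interval on exp(Mlog + Vlog/2) with a standard error that shrinks with n. A sketch, not a transcription of the steps above:

```python
import numpy as np
from scipy import stats

def lognormal_mean_ci(x, alpha=0.05):
    """Cox-method CI for the arithmetic mean of lognormal data."""
    logs = np.log(np.asarray(x, dtype=float))
    n = len(logs)
    m = logs.mean()       # Mlog
    v = logs.var(ddof=1)  # Vlog
    log_mean = m + v / 2  # log of the point estimate exp(Mlog + Vlog/2)
    se = np.sqrt(v / n + v ** 2 / (2 * (n - 1)))
    t = stats.t.ppf(1 - alpha / 2, n - 1)
    return np.exp(log_mean), np.exp(log_mean - t * se), np.exp(log_mean + t * se)

rng = np.random.default_rng(3)
mean_est, lo, hi = lognormal_mean_ci(rng.lognormal(0.0, 0.5, size=200))
print(f"mean ~ {mean_est:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```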
I don't happen to have the actual reference in front of me, but according to my notes a reference for this is Continuous Univariate Distributions, Johnson and Kotz.

October 18, 2008 at 9:35 pm #176822
Ken Feldman
In fact, on page 172 of the Statistical Quality Control Handbook, they reference a pattern called "stratification." It is described as "…a stratification pattern appears to hug the centerline with few deviations or excursions at any distance from the centerline." Guess that is sort of like a Western Electric rule. I agree that Shewhart was a bit simplistic in relying only on a point being in or out, but given the context in which he developed the control chart, it was probably appropriate. The concern with being oversensitive to patterns is that you start to see "bunnies in the clouds." Selecting all the WE tests in Minitab when doing a control chart may overwhelm you with signals, which might be more information than you need.
October 19, 2008 at 1:05 am #176826
SixSigmaGuy
That's why I say people should not use the control charts or rules to determine whether the process is in control. Use the Fourier transform and analyze the PSD instead.
October 19, 2008 at 6:21 pm #176829
SixSigmaGuy
Aren't you assuming that the logged data are normal?
I ran a simulation using your algorithm. Mlog +/- t(1-alpha/2)*Slog gave me what appeared to be a good confidence interval around the mean of the logged data, but exp(Mlog +/- t(1-alpha/2)*Slog) gave me limits that were both on the same side of the mean. I.e., the mean of the original data did not fall within the confidence interval. Am I missing something?
Thanks for the pointer to the Continuous Univariate Distributions reference; looks like something I need in my library. Sure is expensive. :(

October 21, 2008 at 9:10 pm #176922
Robert Butler
SSGuy, sorry for the delay. For whatever reason I can't find my copy of Johnson and Kotz, so I can't double-check my notes and the comments I posted earlier. There is something not right, and I'll have to check into it and try to get back to this thread later.
October 21, 2008 at 9:13 pm #176924
Ken Feldman
Hey Robert, put away the books for a while and jump in and tell us how to save the world. Much more important right now than statistical formulas.