# Dealing with zeroes in non-normal cycle time data


Viewing 8 posts - 1 through 8 (of 8 total)
• #32739

Pipkin
Participant

I’m measuring cycle times for a proposal lead time reduction project.  We have historical data for 5 dates: Date a quote is requested, date the customer (supposedly) needs it, the date we complete the proposal, the date it was submitted, and the date (if/when) an order is released for the proposal (when a PO is received).
My confusion is about how to define cycle time measurements when both dates fall on the same day. It seems trivial enough, but is the cycle time zero days, or should it be treated as one day (we have not measured time of day, so the resolution is one day at best)? While zero days makes practical sense, it doesn’t make mathematical sense. I do know that Minitab does not like data with zeroes (or negative numbers) when trying to transform data, be it a log transform or trying to find “some lambda” using the Box-Cox transformation. Being time-bound data, it is naturally not normally distributed, so out go ANOVA and the Individuals control chart (there are no subgroups to make use of the central limit theorem). The distribution likely follows a Weibull more than any other, but that doesn’t like data <= 0 either. So what to do? Fudge these same-day cycle times by adding a “1”? Since a lot of the data are on the order of a few days, this would seem to corrupt any measurements.
How have any of you handled cycle time data in this regard?
Regards,
Jack

#87797

Johnny Barret
Participant

Jack,
Your question immediately brought George Box’s quote to mind: “All models are wrong, but some models are useful.” This appears to me to be a measurement capability issue. I am jumping in here without knowing where you are in the DMAIC process, but I suggest you frame a reasonable model that works to the order of magnitude of your improvement goal. I agree that cycle time is something that can only approach zero, and I have not tried to model it otherwise. (Forget the case where really efficient manufacturing cycles let me collect revenue before I have to pay for the parts used to make the goods.)
If you use an “estimate” for the cycle time events that fall into the one-day-or-less category, you could fit several values for this synthesized situation and demonstrate how much of an “error” effect this kind of assumption has on your process measurement (e.g., 0.1 day vs. 0.75 day vs. 1.0 or 1.5 days for these indeterminate short cycle measurements).
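A minimal sketch of that sensitivity check, assuming made-up cycle times in whole days where 0 marks a same-day turnaround. The data and the helper names `substitute_same_day` and `sensitivity` are illustrative only and do not come from the thread.

```python
from statistics import mean, median

# Hypothetical cycle times in whole days; 0 marks a same-day turnaround.
cycle_days = [0, 0, 1, 2, 2, 3, 4, 5, 7, 12, 30]

def substitute_same_day(data, stand_in):
    """Replace every same-day (0) value with an assumed stand-in duration."""
    return [stand_in if d == 0 else d for d in data]

def sensitivity(data, stand_ins=(0.1, 0.75, 1.0, 1.5)):
    """Return {stand_in: (mean, median)} for each assumed same-day duration."""
    results = {}
    for s in stand_ins:
        adjusted = substitute_same_day(data, s)
        results[s] = (mean(adjusted), median(adjusted))
    return results

for s, (avg, med) in sensitivity(cycle_days).items():
    print(f"same-day = {s:>4} d   mean = {avg:.2f}   median = {med:.2f}")
```

If the summary statistic barely moves across the stand-in values relative to the size of the improvement goal, the choice of stand-in is probably not worth agonizing over.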
Frame this judgement with the current process performance vs. the improvement goal you have in mind. If we are at 7 cycle days (by whatever central tendency statistic you set as appropriate) and we need to improve to 3 or less, then having a data measurement that is discontinuous near 1 is likely not important. However, if completing this cycle in less than a day is your improvement goal, then the measurement capability is fundamentally flawed. That situation requires improving the data collection process so it can identify “hours” of cycle time instead of days. My classic rule of thumb is that measurement error should be at most one fourth of the range from the target to the failure limit.
Good hunting. After the measurement fits the need, we can have all sorts of fun discussing how, when, and if a Weibull distribution is the right model for the data.

#87801

clb1
Participant

If you are interested in examining the data for some kind of fit and you want to check the possibilities of a Box-Cox transform (of which the log is a special case), then the easiest thing to do is to shift your data by adding 1 to every response. This moves you away from zero while preserving the integrity of the data with respect to distribution shape and the relation of the various measurements to one another.
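A minimal sketch of the shift-then-transform idea, assuming made-up cycle times. The `box_cox` helper is a hand-written version of the standard Box-Cox formula, not Minitab's routine, and the lambda values are picked by hand for illustration rather than estimated from the data.

```python
import math
from statistics import mean

def box_cox(x, lam):
    """Box-Cox transform of one strictly positive value x for a given lambda."""
    if x <= 0:
        raise ValueError("Box-Cox needs strictly positive data")
    if lam == 0:
        return math.log(x)          # the log transform is the lambda = 0 case
    return (x ** lam - 1) / lam

# Hypothetical cycle times in whole days; 0 marks a same-day turnaround.
cycle_days = [0, 0, 1, 2, 2, 3, 4, 5, 7, 12, 30]

# Adding 1 to every response moves the data off zero; the differences between
# any two observations are unchanged by the shift.
shifted = [d + 1 for d in cycle_days]

logged = [box_cox(v, 0) for v in shifted]       # log(x + 1)
rooted = [box_cox(v, 0.5) for v in shifted]     # square-root-like transform

# A mean computed on the log scale has to be back-transformed AND un-shifted
# before it is reported in days.
back_transformed_mean = math.exp(mean(logged)) - 1
print(f"back-transformed mean: {back_transformed_mean:.2f} days")
```

The transformed values do depend on the constant that was added, so it is worth stating the shift alongside any results.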

#87931

gg
Participant

Check out Donald J. Wheeler’s books on SPC. He really is a user / practitioner of the Shewhart method. Start with “Understanding Variation: The Key to Managing Chaos”, and consider his demonstration of using XmR charts for rates data. This book should be read by every board of directors for evaluating their monthly reports!
His newer book, “Making Sense of Data”, which uses service-industry-type examples, is also a good read.
SPC Press, Inc.  Knoxville, Tennessee.
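For readers who have not met the XmR (individuals and moving range) chart, a minimal sketch of how its limits are usually computed, using the conventional scaling constants 2.66 and 3.268. The function name `xmr_limits` and the sample values are illustrative assumptions, not taken from Wheeler's books.

```python
from statistics import mean

def xmr_limits(values):
    """Natural process limits for an XmR chart from individual values in time order."""
    if len(values) < 2:
        raise ValueError("need at least two values to form a moving range")
    # Moving range: absolute difference between each pair of consecutive values.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    centre = mean(values)
    mr_bar = mean(moving_ranges)
    return {
        "centre": centre,
        "lower_x": centre - 2.66 * mr_bar,   # lower limit for individual values
        "upper_x": centre + 2.66 * mr_bar,   # upper limit for individual values
        "mr_bar": mr_bar,
        "upper_mr": 3.268 * mr_bar,          # upper limit for the moving ranges
    }

# Hypothetical cycle times (days) in the order the proposals were completed.
limits = xmr_limits([2, 4, 3, 5, 1])
for name, value in limits.items():
    print(f"{name:>8}: {value:.3f}")
```

With cycle times the computed lower limit can come out negative, in which case zero is the only meaningful lower bound.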

#87944

Ron
Member

First forget about Normality !!!! In cycle time reduction projects it should not be a concern! Normality should only be considered when you are doing some form of hypothesis testing and even then it may not be a factor….
Typically observing between 7 and 10 cycles will be sufficient to calculate the average and standard deviation of that process. Drill down on the variation. Cycle time reduction projects typically fall into the lean toolkit rather than the six sigma toolkit which is specifically designed for variation reduction.
In my experience huge gains can be accomplished using simple value stream mapping and spaghetti charts.  Go the lean route first then come back for the variation reduction.

#87977

Pipkin
Participant

I appreciate the responses to date. I still have mixed feelings about what can be gained from non-normal data, other than a visual depiction of its mean/median and spread. So maybe I cannot transform my cycle time data to normalize it, or at least make it more normal for a swag with ANOVA or the like. A histogram shows the one-sided tendency (time-bound) towards zero cycle time, with some proposals taking unusually long to complete. But business types still tend to want to put a number, an average that is, on how long it takes to crank out quotes (cycle time). Isn’t an average somewhat useless when almost all of the histogram is one-sided? So maybe we state what the median number of days is, but what practical sense does this have? My instincts tell me that an average only has merit when the data are somewhat normally distributed. If so, so much for trying to report average cycle times? And when’s the last time you saw a monthly report that said what the median days to complete a task was, or what the “mode” was? So what should you report when your cycle time data are time-bounded / non-normal?

#87996

Robert Butler
Participant

Based on your last post, it would appear that your single biggest problem is that you have a small army of “business types” who believe that it is possible to characterize a distribution with a single summary statistic (the average). The mean, median, mode, standard deviation, skewness, etc. are not only numbers, they are descriptors, and the thing they are attempting to describe is a distribution. Most statisticians will tell you that you need to know at least the mean, median, mode, standard deviation, and sample size before you can begin to make any kind of judgements about the population of interest. For non-normal data I would recommend reporting the median.
If you look around you will find reports that quote the median and not the mean. The saving grace with the median is that you are guaranteed to have about half of the data at or below the stated value and about half at or above it. In those instances where the “business types” don’t seem to understand the importance of properly describing a distribution, I have found that a simple table of summary statistics (with their favorite one heading the list) accompanied by a carefully explained box plot goes a long way towards changing minds and providing insight.
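A minimal sketch of such a summary table, assuming made-up cycle times. The quartiles stand in for the numbers a box plot would draw, the `summarize` helper is illustrative, and the mean is listed first only because it is the business favorite.

```python
from statistics import mean, median, multimode, quantiles, stdev

def summarize(data):
    """Summary statistics for a report, plus the quartiles a box plot is built on."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "mean": mean(data),
        "median": median(data),
        "mode(s)": multimode(data),   # a list, since ties are possible
        "std dev": stdev(data),       # sample standard deviation
        "n": len(data),
        "min": min(data),
        "Q1": q1,
        "Q3": q3,
        "max": max(data),
    }

# Hypothetical cycle times in whole days, skewed to the right.
cycle_days = [0, 0, 1, 2, 2, 3, 4, 5, 7, 12, 30]

summary = summarize(cycle_days)
for name, value in summary.items():
    print(f"{name:>8}: {value}")
```

For this made-up data the mean is 6 days while the median is 3, which is the kind of gap that makes the table worth showing next to the single average.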

#88151

Jonathon
Participant

If your cycle can be completed the same day it is started, then you need to stop measuring days and start measuring hours, or minutes, or seconds. Stop trying to beat bad numbers senseless with charts and transforms, and start getting the right data.
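A minimal sketch of what finer resolution buys, assuming the request and submission are logged as full timestamps rather than dates. The timestamps and the `cycle_hours` helper are hypothetical, and the result is elapsed calendar time, not business hours.

```python
from datetime import datetime

def cycle_hours(start, end):
    """Elapsed time between two timestamps, in hours."""
    return (end - start).total_seconds() / 3600

# Hypothetical same-day proposal: requested in the morning, submitted that afternoon.
requested = datetime(2024, 3, 4, 9, 15)
submitted = datetime(2024, 3, 4, 16, 45)

# At day resolution this proposal is indistinguishable from an instant turnaround.
days_at_date_resolution = (submitted.date() - requested.date()).days
hours = cycle_hours(requested, submitted)

print(f"date resolution: {days_at_date_resolution} days")
print(f"timestamp resolution: {hours} hours")
```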
