iSixSigma

Calculating Standard Deviation with 10 Points


Viewing 16 posts - 1 through 16 (of 16 total)
  • #47028

    Harish S
    Participant

    Hi All,
    In an attempt to derive organizational baselines, I have computed the effort and schedule variances for 10 projects from the previous year. However, I have two issues.
    1. I am computing the standard deviation from 10 data points. What is the confidence level of an SD derived from only 10 data points?
    2. The SD is more than 100% of the mean. To arrive at the USL and LSL, is it appropriate to use a 1 sigma level?
    Thanks for your suggestions in advance.
    Harish S

    #156219

    DaveS
    Participant

    Harish,
    Your first question is not phrased in statistical terms, so I’ll reply in kind: in layman’s terms, your confidence will be low.
    If you want a statistical answer, construct a CI for the SD. With ten points it will be quite wide, which amounts to the same answer.
    If by the second question you mean, is it appropriate to use a 1 sigma level (as in nearest spec limit 3 SD from the mean), then that might be a good starting point, and it has nothing whatsoever to do with the relationship of the mean to the SD.
    If you mean (as Marko posted yesterday) that the limits are 1 SD from the mean, then that would never be appropriate.

    #156222

    Harish S
    Participant

    Thanks Dave,
    Is there any rule of thumb for which sigma level to use when setting the USL and LSL in case the SD is 100%, 80%, 50%, 25% or 10% more than the mean?
    Here is where I am coming from: I am setting up org baselines with a high SD in one area and a relatively moderate SD in another.
    Also, limits computed at 3 sigma will look exaggerated.
    Thanks in advance.
    Harish S
     

    #156223

    Cone
    Participant

    Point 1 – The confidence interval for a standard deviation is computed using a chi-squared distribution. Go compute yours.
    Point 2 – Unless you can have a schedule variance of less than 0, what you are seeing is a skewed distribution – common and expected when measuring time. Go understand what this right-skewed distribution means. As far as specs, they have to do with the customer’s expectations, not how big or small your standard deviation is.
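To make the chi-squared point concrete, here is a minimal sketch of that confidence interval using only the Python standard library. It assumes the underlying data are roughly normal, which skewed schedule data may well violate, so treat the result as a rough guide. The function names are illustrative, and the chi-squared quantile is found by simple bisection rather than a statistics package.

```python
import math


def chi2_cdf(x, df):
    """Chi-squared CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_quantile(p, df):
    """Invert the CDF by bisection; accurate enough for this purpose."""
    lo, hi = 0.0, df + 20.0 * math.sqrt(2.0 * df) + 20.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def sd_confidence_interval(s, n, confidence=0.95):
    """CI for sigma given sample SD s from n points (normality assumed)."""
    alpha = 1.0 - confidence
    df = n - 1
    lower = s * math.sqrt(df / chi2_quantile(1.0 - alpha / 2.0, df))
    upper = s * math.sqrt(df / chi2_quantile(alpha / 2.0, df))
    return lower, upper


# With s = 1 the bounds read directly as multiples of the sample SD.
low, high = sd_confidence_interval(s=1.0, n=10)
print(f"95% CI for sigma with n=10: {low:.2f}*s to {high:.2f}*s")
```

With 10 points the 95% interval runs from roughly 0.69 to 1.83 times the computed SD, which is the "quite wide" interval mentioned earlier in the thread.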

    #156230

    Waskita
    Participant

    Harish,
    I agree with Dave that with 10 data points, your confidence level will be very low. The conclusions drawn could therefore be misleading, since you don’t have enough data.
    Below are some points that could be useful in practice for your project:

    A high SD for 10 data points is self-explanatory and very much expected. So before constructing your baseline, I suggest you first understand your data. Is there any outlier clearly visible on a run chart, even without adding control limits to the chart?
    If yes, I strongly believe it was due to a special cause, and your job is to find out what that cause could be, as it will hint at some “low-hanging fruit” solutions that you could easily get.
    My suggestion is to remove the outliers from the data points no matter what the cause is, because they will give a wrong picture of your baseline.
    Once that is done properly, you can simply use the average of the remaining data points to derive a baseline. Control limit built from SD will also be more reliable despite the minimum number of data points available, since the outlier has been taken out.
    When judging whether your improvement actions make a significant difference against the baseline, I personally suggest still using 3 SD for the control limits (remember: using baseline data, even if you did nothing, >99% of future results will fluctuate within a +/- 3 sigma band).
    Hope it helps …

    #156232

    Cone
    Participant

    Nonsense.
    1) High standard deviation for 10 points is expected? Just the opposite is true. Go simulate 100 samples of 10 from a known distribution and tell me how many times the number came out unexpectedly high.
    2) Is it special cause or skewed data? Time data is expected to be skewed, resulting in a high standard deviation.

    #156233

    Key Question
    Participant

    The key question is: were the 10 projects sampled randomly, or do they constitute the total number of projects completed for the year? If they constitute the total number of projects completed, (1) there is no need to calculate a confidence interval, and (2) the calculation of the standard deviation needs to be adjusted for populations, i.e. divide by n rather than n – 1. If the projects were not sampled randomly, a major assumption of the confidence interval will be violated, making it useless. If they were sampled randomly, the confidence interval will be large to the point of being useless for predictions. Either way, any statistical calculations based on confidence intervals seem to yield little useful information.
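The n versus n – 1 distinction can be seen directly with Python's standard `statistics` module. This is only an illustration: the ten schedule-variance figures below are made up, not Harish's data.

```python
import math
import statistics

# Hypothetical schedule variances (%) for 10 projects; illustrative values only.
variances = [5, 12, -3, 40, 8, 15, 0, 22, 60, 10]

sample_sd = statistics.stdev(variances)       # divides by n - 1 (sample estimate)
population_sd = statistics.pstdev(variances)  # divides by n (whole population)

print(f"sample SD (n - 1):  {sample_sd:.2f}")
print(f"population SD (n):  {population_sd:.2f}")

# The two differ only by the factor sqrt((n - 1) / n), about 0.95 for n = 10.
n = len(variances)
print(f"ratio: {population_sd / sample_sd:.4f} vs {math.sqrt((n - 1) / n):.4f}")
```

For 10 points the population figure is about 5% smaller than the sample figure, so the choice changes the number but not the overall picture.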

    #156235

    Other question
    Participant

    “Control limit built from SD will also be more reliable despite the minimum number of data points available, since the outlier has been taken out.”
    How will you calculate control limits for an individuals chart based on a standard deviation? Also, 10 data points are too few for a control chart robust enough to let you determine special cause. If this is a variables chart, your number of groups is even further reduced.
    If you have 10 projects this year and 10 projects next year, and you compare the time to completion next year, there is no need for estimation of either standard deviation, confidence interval or control limits… the data are what they are.
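On the individuals chart question raised in this post: the conventional approach does not use the overall standard deviation at all, but derives limits from the average moving range, scaled by the constant 2.66 (3 divided by d2 = 1.128 for ranges of two points). A minimal sketch follows; the function name and data are illustrative, and with only a handful of points the limits remain very rough.

```python
import statistics


def individuals_chart_limits(values):
    """Return (LCL, center line, UCL) for an individuals (XmR) chart."""
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.fmean(moving_ranges)
    center = statistics.fmean(values)
    # 2.66 = 3 / d2, where d2 = 1.128 for moving ranges of size 2.
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar


# Illustrative values only.
lcl, center, ucl = individuals_chart_limits([10, 12, 11, 13, 12])
print(f"LCL={lcl:.2f}, center={center:.2f}, UCL={ucl:.2f}")
```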

    #156242

    Waskita
    Participant

    Good discussion, but too bad about the choice of words.
    I believe Gary is a master in Six Sigma, but it is best not to close your mind to any possibility.
    Nonsense? Are you sure?
    Simply from the SD formula, SD = (Σ(x - xbar)^2 / (n - 1))^0.5, we can see straight away that the lower the n (number of samples), the smaller the denominator, and the bigger the SD value derived. Simple math!
    But of course the final SD also depends on how widely spread the data are (x - xbar).
    I just took real data on hotel room rates for a certain period of the month (n = 1517) and randomly picked 10 data points. Well, sorry to say, but the SD derived from the 10 data points is bigger.
     
     
     

    #156248

    Cone
    Participant

    Sorry for your mistake. Go simulate the result 10,000 times and you will see that the calculated number is less than the known value over half the time (57% of the time, last time I did it).
    Your assumption about the formula is interesting but doesn’t work out.
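The simulation described here is easy to reproduce. The sketch below assumes a normal population with a known sigma of 1, because the thread does not say which distribution was used; the function name, seed and defaults are illustrative. A skewed population would give a somewhat different fraction.

```python
import random
import statistics


def fraction_below_true_sd(n=10, trials=10_000, sigma=1.0, seed=0):
    """Fraction of samples of size n whose sample SD falls below the true sigma."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    below = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        if statistics.stdev(sample) < sigma:
            below += 1
    return below / trials


frac = fraction_below_true_sd()
print(f"Sample SD was below the true SD in {frac:.1%} of trials")
```

For normal data the fraction comes out a little above one half, in line with the figure quoted in this post: a sample of 10 is somewhat more likely to understate the true SD than to overstate it.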

    #156290

    Waskita
    Participant

    Glad that you implicitly admit your mistake, Gary ;)
    The word “nonsense” leaves a strong impression that the chance of a higher SD is almost zero when n is low.
    43% of the time is huge, man … though it should be even higher if you do proper data collection.
    Again, good discussion.
    Thanks for the education, and keep on learning, buddy!

    #156306

    43
    Participant

    Quick question: in one post Gary said that the stdev is calculated based on a chi-square distribution. Did Gary take random samples from a chi-square distribution and so come to his odd conclusions?
    I would still like to know from the original poster whether the 10 projects were randomly sampled, and from what population. Thanks!

    #156332

    Harish S
    Participant

    Dear All,
    The discussion raged on while I was away for the weekend.
    Nevertheless, thanks to all for participating.
    Coming to the point: the data received from the 10 projects are the absolute data, and no random sampling was done.
    What I gather from the discussion is: wait for 30 projects or so before making a baseline that has statistical relevance.
    In that case I will have to wait for 3 years, which doesn’t make sense.
    Thanks in advance for your inputs.
    Harish S

    #156335

    Waskita
    Participant

    No Harish …
    You’ve misunderstood the messages from all who have contributed to this topic.
    No one said, implicitly or explicitly, to wait longer just to get a baseline. They were just trying to offer their technical expertise on how to get a proper baseline in order to arrive at a proper conclusion.
    What’s past is past. If you only have 10 data points to reflect current performance, then so be it! Don’t wait any longer to start improving. Over time, if the improvement actions make a significant difference, the data will speak for itself.
    Cheers,

    #156343

    question
    Participant

    Harish,
    You are missing the point. You can use your ten data points to compare any future project against this baseline. You just don’t have to use inferential procedures because you are dealing with the population data. Track the data on a run chart. This way you avoid the pitfalls associated with control charting. You still get information about trending, clustering etc. Also, you can transform any new project into a z-value and compare it against the other 10 data points (beware of outliers though). Or, you simply compare the data against the range … or, or, or. Most importantly, understand the variation in your time to project completion. One way to do that is to debrief each project and track the key factors that impacted the length of the project. Qualitative analysis is equally informative.
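As a rough sketch of the z-value and range comparisons suggested here, the snippet below treats the ten baseline projects as the whole population, so it uses the population SD. The baseline figures and function names are made up for illustration.

```python
import statistics

# Hypothetical baseline: schedule variance (%) of last year's 10 projects.
baseline = [5, 12, -3, 40, 8, 15, 0, 22, 60, 10]


def z_value(new_value, baseline):
    """Distance of a new project from the baseline mean, in population SDs."""
    mean = statistics.fmean(baseline)
    sd = statistics.pstdev(baseline)  # population SD: the 10 projects are all there is
    return (new_value - mean) / sd


def within_range(new_value, baseline):
    """True if the new project falls inside the min-max range of the baseline."""
    return min(baseline) <= new_value <= max(baseline)


new_project = 30  # hypothetical new observation
print(f"z = {z_value(new_project, baseline):.2f}, "
      f"within baseline range: {within_range(new_project, baseline)}")
```

As the post warns, one extreme baseline project inflates the SD and widens the range, so both comparisons should be read alongside the run chart.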

    #156766

    Ale
    Participant

    Sorry for taking a step back:
    Is it correct to use SD and average to measure your baseline and define your target?
    From what I see, it could be better in your case to deal with span.
    Measure your baseline span, define your target span, measure your improvement.
    Ciao.

