Calculating Standard Deviation with 10 Points
 This topic has 15 replies, 9 voices, and was last updated 15 years, 4 months ago by Ale.


May 18, 2007 at 9:20 am #47028
Harish S (Participant)
Hi All,
In an attempt to derive organizational baselines, I have computed the effort and schedule variances for 10 projects from the previous year. However, I have two issues.
1. I am computing the standard deviation from 10 data points. What is the confidence level of an SD arrived at from only 10 data points?
2. The SD is more than 100% of the mean, so is the use of a 1 sigma level appropriate to arrive at the USL and LSL?
Thanks for your suggestions in advance.
Harish S

May 18, 2007 at 12:30 pm #156219
Harish,
Your first question is not phrased in statistical terms, so I'll reply in kind and say that, in layman's terms, your confidence will be low.
If you want a statistical answer, construct a confidence interval for the SD. With ten points it will be quite wide, which is the statistical equivalent of the answer above.
If by the second question you mean "is it appropriate to use a 1 sigma level" (as in the nearest spec limit being 3 SD from the mean), then that might be a good starting point, and it has nothing whatsoever to do with the relationship of the mean to the SD.
If you mean (as Marko posted yesterday) that the limits are 1 SD from the mean, then that would never be appropriate.

May 18, 2007 at 1:38 pm #156222
Harish S (Participant)
Thanks Dave,
Is there any rule of thumb for which sigma levels to use for the USL & LSL when the SD is 100%, 80%, 50%, 25%, or 10% of the mean?
To explain where I am coming from: I am setting up organizational baselines with a high SD in one area and a relatively moderate SD in another.
Also, computing with 3 sigma levels will look exaggerated.
Thanks in Advance
Harish S
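On the confidence question from the first post: the interval Dave mentions can be computed directly from a chi-square distribution. A sketch in Python with SciPy, using hypothetical numbers (it assumes the data are roughly normal):

```python
from math import sqrt
from scipy.stats import chi2

def sd_confidence_interval(sample_sd, n, confidence=0.95):
    # Two-sided CI for the population SD, assuming roughly normal data:
    # (n - 1) * s^2 / sigma^2 follows a chi-square distribution with n - 1 df.
    df = n - 1
    alpha = 1 - confidence
    lower = sqrt(df * sample_sd**2 / chi2.ppf(1 - alpha / 2, df))
    upper = sqrt(df * sample_sd**2 / chi2.ppf(alpha / 2, df))
    return lower, upper

# Hypothetical example: a sample SD of 20 computed from n = 10 projects
lower, upper = sd_confidence_interval(20, 10)
print(f"95% CI for sigma: ({lower:.1f}, {upper:.1f})")
```

With these hypothetical numbers the interval runs from roughly 14 to 37, a factor of almost three, which is the "low confidence" Dave describes in statistical form.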
May 18, 2007 at 1:41 pm #156223
Point 1 – The confidence interval for a standard deviation is computed using a chi-squared distribution. Go compute yours.
Point 2 – Unless you can have a schedule variance of less than 0, what you are seeing is a skewed distribution – common and expected when measuring time. Go understand what this right-skewed distribution means. As for specs, they have to do with the customer's expectations, not with how big or small your standard deviation is.

May 18, 2007 at 3:11 pm #156230
Waskita (Participant)
Harish,
I agree with Dave that with 10 data points your confidence level will be very low. Thus the conclusions drawn could be misleading, since you don't have enough data.
Below are some points that could be practically useful for your project:
- A high SD for 10 data points is self-explanatory and very much expected. Thus, before constructing your baseline, I suggest you first understand your data. Is there any outlier clearly visible from the run chart, even without adding control limits to the chart?
- If yes, I strongly believe it was due to a special cause, and your job is to find out what the cause could possibly be, as it will hint at some "low hanging fruit" solutions that you could easily get.
- My suggestion is to remove the outliers from the data points no matter what the cause is, because they will give a wrong understanding of your baseline.
- Once that is done properly, you can simply use the average of the remaining data points to derive a baseline. A control limit built from the SD will also be more reliable, despite the minimal number of data points available, since the outliers have been taken out.
- When interpreting whether your improvement actions make a significant difference against the baseline, I personally suggest still using 3 SD for the control limits (remember: using baseline data, even if you did nothing, >99% of future results will fluctuate within a +/- 3 sigma bandwidth).
Hope it helps…

May 18, 2007 at 3:22 pm #156232
Nonsense.
1) A high standard deviation for 10 points is expected? Just the opposite is true. Go simulate 100 samples of 10 from a known distribution and tell me how many times the number came out unexpectedly high.
2) Is it special cause or skewed data? Time data is expected to be skewed, resulting in a high standard deviation.

May 18, 2007 at 3:26 pm #156233
Key Question (Participant)
The key question is: were the 10 projects sampled randomly, or do they constitute the total number of projects completed for the year? If they constitute the total number of projects completed, then (1) there is no need to calculate a confidence interval, and (2) the calculation of the standard deviation needs to be adjusted for a population, i.e. divide by n rather than n – 1. If the projects were not sampled randomly, a major assumption of the confidence interval is violated, making it useless. If they were sampled randomly, the confidence interval will be wide to the point of being useless for prediction. Either way, any statistical calculations based on confidence intervals seem to yield little useful information.
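The n versus n − 1 distinction above is easy to check numerically. A sketch with hypothetical project figures, using the two forms Python's statistics module provides:

```python
import math
import statistics

# Hypothetical effort-variance figures for the 10 projects
projects = [5, 7, 8, 12, 15, 18, 22, 30, 41, 60]

# Treating the 10 projects as the entire population: divide by n
population_sd = statistics.pstdev(projects)

# Treating them as a sample from an ongoing process: divide by n - 1
sample_sd = statistics.stdev(projects)

# For n > 1 the sample SD is always the larger of the two,
# by exactly a factor of sqrt(n / (n - 1)).
print(population_sd, sample_sd)
```

With only 10 points the two differ by a factor of sqrt(10/9), about 5%, so the choice matters more here than it would with a large sample.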
May 18, 2007 at 3:37 pm #156235
Other Question (Participant)
"Control limit built from SD will also be more reliable despite the minimum number of data points available, since the outlier has been taken out."
How will you calculate control limits for an individuals chart based on a standard deviation? Also, 10 data points is a very small number for a control chart robust enough to let you determine special causes. If this is a variables chart, your number of subgroups is reduced even further.
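For reference, the conventional individuals (I) chart does not use the sample SD at all: its limits come from the average moving range, scaled by the standard constant 2.66 (3 divided by d2 = 1.128 for a moving range of 2). A sketch with hypothetical project data:

```python
def individuals_chart_limits(data):
    # Individuals (I) chart limits from the average moving range,
    # using the standard constant 2.66 (= 3 / d2, with d2 = 1.128 for n = 2).
    center = sum(data) / len(data)
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

# Hypothetical schedule-variance figures for 10 projects
lcl, cl, ucl = individuals_chart_limits([5, 7, 8, 12, 15, 18, 22, 30, 41, 60])
```

Because the moving range is inflated by skewed or outlier-heavy data just as the SD is, 10 points remain a weak basis for the limits either way.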
If you have 10 projects this year and 10 projects next year, and you compare the times to completion next year, there is no need to estimate a standard deviation, confidence interval, or control limits… the data are what they are.

May 18, 2007 at 4:29 pm #156242
Waskita (Participant)
Good discussion, but too bad… bad choice of words.
I believe Gary is a master of Six Sigma, but it is best not to close our minds to any possibility.
Nonsense? Are you sure? …
Simply from the SD formula, SD = (Σ(x − x̄)² / (n − 1))^0.5, we can see straight away that the lower the n (number of samples), the smaller the denominator, and the bigger the resulting SD value. Simple math!
But of course it also depends on how widely the data are spread (x − x̄), which will affect the final SD result.
I just took real data on hotel room rates for a certain period (n = 1517) and randomly picked 10 data points of the room-rate values. Well… sorry to say, but the SD derived from the 10 data points is bigger.
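Gary's earlier suggestion to simulate samples of 10 from a known distribution is easy to run. A sketch assuming normally distributed data with a known SD of 1:

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible
TRUE_SD = 1.0
TRIALS = 10_000

# Count how often the sample SD of n = 10 normal observations
# comes out BELOW the true value of 1.0.
below = sum(
    statistics.stdev([random.gauss(0.0, TRUE_SD) for _ in range(10)]) < TRUE_SD
    for _ in range(TRIALS)
)
print(below / TRIALS)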
May 18, 2007 at 4:47 pm #156248
Sorry for your mistake – go simulate the result 10,000 times and you will see that the calculated number is less than the known value over half the time (57% of the time the last time I did it).
Your assumption about the formula is interesting but doesn't work out.

May 19, 2007 at 4:13 am #156290
Waskita (Participant)
Glad that you implicitly admit your mistake, Gary ;)
Because the word "nonsense" leaves a strong impression that the possibility of a higher SD is almost zero when n is low.
43% of the time is huge, man… though it should be even higher with proper data collection.
Again, good discussion.
Thanks for the education, and keep on learning, buddy!

May 19, 2007 at 4:46 pm #156306
Quick question: in one post Gary said that the confidence interval for the standard deviation is calculated using a chi-square distribution. Did Gary take random samples from a chi-square distribution and come to his odd conclusions?
I would still like to know from the original poster whether the 10 projects were randomly sampled, and from what population. Thanks!

May 21, 2007 at 7:47 am #156332
Harish S (Participant)
Dear All,
The discussion has raged on while I was away for the weekend.
Nevertheless, thanks to all for participating.
Coming to the point – the data from the 10 projects are the complete data; no random sampling was done.
What I gather from the discussion is: wait for 30 projects or so before making a baseline that has statistical relevance.
In that case I would have to wait 3 years – which doesn't make sense.
Thanks in advance for your inputs.
Harish S

May 21, 2007 at 10:05 am #156335
Waskita (Participant)
No, Harish…
You've misunderstood the messages from everyone who has contributed to this topic.
No one said, implicitly or explicitly, to wait longer just to get a baseline. They were simply offering their technical expertise on how to build a proper baseline in order to arrive at a proper conclusion.
What's past is past… If you only have 10 data points to reflect current performance, then so be it! Don't wait any longer to start improving. Over time, if the improvement actions make a significant difference, the data will speak for themselves.
Cheers,

May 21, 2007 at 1:21 pm #156343
Question (Participant)
Harish,
You are missing the point. You can use your ten data points to compare any future project against this baseline. You just don't have to use inferential procedures, because you are dealing with population data. Track the data on a run chart; this way you avoid the pitfalls associated with control charting while still getting information about trending, clustering, etc. Also, you can transform any new project into a z-value and compare it against the other 10 data points (beware of outliers, though). Or you can simply compare the data against the range… or, or, or. Most importantly, understand the variation in your time to project completion. One way to do that is to debrief each project and track the key factors that affected its length. Qualitative analysis is equally informative.

May 31, 2007 at 12:04 pm #156766
Sorry for taking a step back:
Is it correct to use the SD and average to measure your baseline and define your target?
From what I see, it could be better, in your case, to deal with span.
Measure your baseline span, define your target span, measure your improvement.
Ciao.
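The two comparison ideas suggested above – z-values against the 10-point baseline, and a span-based view – can both be sketched with hypothetical numbers; span is taken here as the simple max-minus-min range:

```python
import statistics

# Hypothetical schedule-variance figures for the 10 baseline projects,
# treated as the full population (so divide by n).
baseline = [5, 7, 8, 12, 15, 18, 22, 30, 41, 60]
mean = statistics.mean(baseline)
sd = statistics.pstdev(baseline)

def z_value(new_project):
    # How many baseline SDs the new project sits from the baseline mean
    return (new_project - mean) / sd

# Span view: the spread between the best and worst baseline projects
# (some organizations instead use a percentile span, e.g. 95th minus 5th)
span = max(baseline) - min(baseline)

print(z_value(35), span)
```

The z-value gives a unit-free way to judge each new project against the baseline, while tracking the span directly sidesteps the debate above about how trustworthy an SD from 10 points really is.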