Confidence Level Question
- This topic has 20 replies, 10 voices, and was last updated 16 years, 12 months ago by
Holly.
August 5, 2005 at 2:18 am #40240
peteandsheila (Participant, @peteandsheila)
I am trying to ensure that a sampling of call types is truly representative of the population distribution. What information do I need to determine the sample size needed?
August 5, 2005 at 5:32 am #124336
Try this website. Scroll down to the bottom of the page and you will have the answer to your question.
http://www.sixsigmafirst.com/sampling.htm
Akram-Al Qiram.
August 5, 2005 at 12:53 pm #124342
Ken Feldman (Participant, @Darth)
You did not specify the type of data being collected. Discrete and continuous data use different calculations for computing sample size. In either case, at a generic level, three things are needed to calculate the sample size:
1. Estimated variation of the population
2. Desired level of confidence
3. Desired level of precision
A fourth variable might be population size, if it is small. In that case a correction factor can be applied to account for the limited population.
August 5, 2005 at 2:11 pm #124353
Kris Brazeal (Participant, @Kris-Brazeal)
To Darth’s point, there are some differences between discrete and continuous data, but here is a little more information on population sampling that you may find useful.
Precision required in estimate? This measures how accurate you want the estimate to be (e.g., if we are measuring loan processing time, we may want our estimate to be within ±1 day).
Amount of variation in characteristic? This measures the current variation of the set of data (e.g., we know from past measurements that the variation in the processing time, as measured by standard deviation, is 6 days).
Confidence level? This measures how confident we want to be that the estimate falls within the specified accuracy (precision). It is most often set at 95% confidence, which appears as a constant of 2 in the sampling formula.
Sample size? The sample size is calculated from the following formula: n = (2s/∆)², where n = sample size, 2 = constant based on confidence, s = standard deviation, and ∆ = degree of precision required. In our example, the required sample size is 144:
n = (2×6/1)² = 144
August 5, 2005 at 2:14 pm #124354
Kris,
Those are really some bold(ed) assertions.
Vinny
August 5, 2005 at 2:24 pm #124355
Kris Brazeal (Participant, @Kris-Brazeal)
Yeah… not sure what is going on with the font there.
Kris
August 5, 2005 at 3:03 pm #124358
Ken Feldman (Participant, @Darth)
Not to be picky… but the z value for 95% is 1.96. Why would you possibly recommend 2? Fine, in the old days when we had to calculate by hand, slide rule or abacus, it made things easier. But today, why not do it correctly? Secondly, everybody seems to forget that this formula holds for large populations. Always keep that in the back of your mind in case the population is not large. Finally, your formula, or some version of it, holds for continuous data. Discrete data is a totally different calculation. There, that wasn’t too picky and certainly not insulting…
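To put numbers on the points above, here is a minimal Python sketch of the continuous-data formula n = (z·s/∆)², applied to the loan-processing example quoted earlier in the thread (s = 6 days, ∆ = 1 day). The function names, the population size of 500, and the particular finite population correction n0 / (1 + (n0 − 1)/N) are illustrative assumptions, not something the posters specified.

```python
import math
from statistics import NormalDist


def z_for_confidence(confidence: float) -> float:
    """Two-sided z value for a confidence level, about 1.96 for 95%."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_continuous(s, delta, confidence=0.95, population=None):
    """Sample size for estimating a mean to within +/- delta.

    s          : estimated standard deviation
    delta      : desired precision, in the same units as s
    population : optional population size (assumed correction, see lead-in)
    """
    z = z_for_confidence(confidence)
    n0 = (z * s / delta) ** 2  # large-population formula
    if population is not None:
        # Finite population correction for small populations.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


# Loan-processing example from the thread: s = 6 days, delta = 1 day.
n_exact_z = sample_size_continuous(6, 1)                  # uses z of about 1.96
n_rounded_z = math.ceil((2 * 6 / 1) ** 2)                 # uses the rounded constant 2
n_small_pop = sample_size_continuous(6, 1, population=500)  # hypothetical N = 500
print(n_exact_z, n_rounded_z, n_small_pop)
```

With these inputs the exact z gives 139 rather than 144, and the hypothetical population of 500 brings the requirement down to 109, so rounding z is a small effect compared with ignoring a small population.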
August 5, 2005 at 3:11 pm #124361
Kris Brazeal (Participant, @Kris-Brazeal)
Good points, Darth, and not too picky nor insulting at all. I agree that it is 1.96; I just chose to round it up, as I don’t think it makes that much of a statistical difference.
I did forget to mention that the formula was for continuous data, however, so good catch.
Kris Brazeal
August 5, 2005 at 8:37 pm #124390
Tierradentro (Participant, @john)
peteandsheila,
As you have referenced calls, it sounds like you are in a Service Industry rather than manufacturing. I am as well.
Could you identify the type of business you are in?
I am working for the most part in a call center environment in the transportation industry and would like to swap lies with you.
John
August 5, 2005 at 10:34 pm #124394
VoteForPedro (Member, @VoteForPedro)
When determining precision in a sample size calculation for continuous data, are there any mathematical rules of thumb? I would think one would view this relative to the std dev in the process… any thoughts? Thanks.
August 6, 2005 at 3:26 am #124398
peteandsheila (Participant, @peteandsheila)
I am using discrete data – categories of calls.
Is the standard deviation used to calculate the sample size the standard deviation of the distribution of calls categorized?
I work at a bank and have much call center experience, so John, yes, we could possibly swap lies. One struggle is finding normal data. Our goal is continuous growth, so our data is very rarely normal.
August 6, 2005 at 1:17 pm #124408
Ken Feldman (Participant, @Darth)
First of all, forget about normal data if you are talking about almost anything in a call center. Time, which is the most common metric, will be non-normal because of the natural boundary of zero on one side. There is no problem dealing with it, but statistics based on a normal distribution will likely not apply. Furthermore, categorized or discrete data will also not be normal, since you are measuring either percentages or counts, both of which again have a natural boundary at zero (and at 100% for percentages).
The “s.d.” for the discrete sample size calculation is totally different from that for the continuous one. Since you are looking at the proportion of calls falling into categories, it is the s.d. of a proportion (binomial) that is relevant. The following formula is what is used. A value of .5 is used for p if you have no idea of the true proportion. This gives you the largest, worst-case sample size.
n = (z/delta)² × p(1-p)
August 6, 2005 at 1:20 pm #124409
Ken Feldman (Participant, @Darth)
Forgot to add:
z = 1.96 if using 95%, delta is the precision you desire, often +-5% and p is .5 if unknown or whatever your value is if known.
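As a rough illustration of the proportion formula with these values, a minimal Python sketch follows. The function name is hypothetical, and the second call assumes a category known to make up about 10% of calls, which is an invented figure used only to show the effect of a known p.

```python
import math
from statistics import NormalDist


def sample_size_proportion(delta, p=0.5, confidence=0.95):
    """Sample size for estimating a proportion to within +/- delta.

    Implements n = (z/delta)^2 * p * (1 - p); p = 0.5 is the worst case.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil((z / delta) ** 2 * p * (1 - p))


# Worst case: p unknown, precision of +/- 5 percentage points.
n_worst_case = sample_size_proportion(delta=0.05)
# Hypothetical known proportion of 10% for one call category.
n_known_p = sample_size_proportion(delta=0.05, p=0.10)
print(n_worst_case, n_known_p)
```

The worst case works out to 385 calls, while the assumed 10% category needs 139, which is why p = .5 is the safe default when nothing is known about the true proportion.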
I also have some bank/call center experience if you want to dive deeper offline.
August 8, 2005 at 12:43 pm #124485
Tierradentro (Participant, @john)
I work with a customer support call center, in a company that is just beginning (2 yrs) to venture down the Six Sigma path; Lean has been around a while in our operations group.
Ops applications are much more similar to the typical training in Lean/Six Sigma, so we on the Service side are attempting to make the transition to understand how the different techniques and tools work in a service environment.
Having access to people who have dealt with the transition in the past will be very helpful.
John
August 15, 2005 at 7:33 am #125060
Abul Faisal (Participant, @Abul-Faisal)
First, how confident do you want to be that the sample you take will capture the true population defective rate? That is usually 95% but may be 99% if it is really critical.
Second, you need to determine how precise you want your estimate to be. When you make a statement about the population defective rate you will say that based on your sample you are 95% confident that the defective rate is X% plus or minus some percentage. That percentage is your precision. It may be 1%, 5% or anything you want.
Thirdly the percentage defective that you use in your calculations would be some estimate based on historical data or 50% if you know nothing about prior performance.
Keep in mind that the more confident you want to be, the more precise you want to be and the worse your population variation is, the larger the sample size required. In all cases, discrete sample sizes will be quite large.
This was the advice given by Ken when I asked for help on the same subject; I hope it will be useful to you too.
August 15, 2005 at 8:12 am #125062
Abul,
Thanks for your reference to my earlier response. To ensure my guidance is accurately conveyed to this thread, I would like to make a few adjustments to your suggestions.
Requirements for Statistical Sampling Determination: You need the reference variation from the control group or the best estimate of such, you need the desired confidence for the claim, you need to know what practical difference you want to detect, you also need to know the desired power of the test, you will need to know whether the test is two- or one-sided, and lastly you want to select samples from the population that are representative of that population.
Larger samples are warranted as the desired power and confidence increase for fixed variation and practical difference, or, for fixed variation, power and confidence, as the difference to detect decreases.
Some example sample sizes using continuous measures:
Given – One-Sided Test of Unpaired Means
Power = 90%
Confidence = 95%
Variation = 1 SD
Difference to Detect    N1 & N2
-------------------------------
0.5                     86
1.0                     22
1.5                     11
2.0                     7
Given – One-Sided Test of Paired Means
Power = 90%
Confidence = 95%
Variation = 1 SD
Difference to Detect    N1 & N2
-------------------------------
0.5                     37
1.0                     11
1.5                     6
2.0                     4
Given – Two-Sided Test of Unpaired Means
Power = 90%
Confidence = 95%
Variation = 1 SD
Difference to Detect    N1 & N2
-------------------------------
0.5                     70
1.0                     19
1.5                     9
2.0                     6
Given – Two-Sided Test of Unpaired Means
Power = 90%
Confidence = 95%
Variation = 3 SD
Difference to Detect    N1 & N2
-------------------------------
0.5                     758
1.0                     191
1.5                     86
2.0                     49
Given – Two-Sided Test of Unpaired Means
Power = 80%
Confidence = 95%
Variation = 3 SD
Difference to Detect    N1 & N2
-------------------------------
0.5                     567
1.0                     143
1.5                     64
2.0                     37
Typical values for confidence and power for conservative studies are: Power = 90% and Confidence = 95%.
Typical values for confidence and power for exploratory studies are: Power >= 80% and Confidence >= 90%.
Ken
August 16, 2005 at 1:13 am #125116
All,
I am more interested in how to take the samples. Let’s say we need to sample 200 from a population of 2000: should the 200 just be randomly selected from the 2000, or taken as a regular sample size per day, or as a fixed size from each operator? This is more complicated than the calculation of sample size, because it requires knowledge of the process. Just a reminder to all that data can lead us the right way and the wrong way as well. I don’t see much discussion about sampling technique.
Awaiting your help.
Thanks a lot.
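The three options in the question (200 drawn at random from all 2000, a fixed number per day, or a fixed number per operator) can be sketched in Python as below. Everything about the data is hypothetical: the record layout, the 10 days, the 5 operators, and the fixed seed are assumptions made only so the example runs. It shows the mechanics of each scheme, not which one suits a given process.

```python
import random
from collections import defaultdict


def simple_random_sample(records, n, seed=0):
    """Draw n records at random, without replacement, from the whole population."""
    rng = random.Random(seed)
    return rng.sample(records, n)


def stratified_sample(records, key, per_stratum, seed=0):
    """Draw a fixed number of records from each group defined by `key`."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[record[key]].append(record)
    sample = []
    for group in strata.values():
        k = min(per_stratum, len(group))  # never ask for more than a group holds
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical population: 2000 records over 10 days and 5 operators.
population = [
    {"id": i, "day": i % 10, "operator": (i // 10) % 5}
    for i in range(2000)
]

random_200 = simple_random_sample(population, 200)           # 200 from all 2000
by_day = stratified_sample(population, "day", 20)            # 20 per day
by_operator = stratified_sample(population, "operator", 40)  # 40 per operator
print(len(random_200), len(by_day), len(by_operator))
```

All three give 200 records, but only the stratified versions guarantee that every day or every operator is represented equally, which matters when the question being asked is about differences between days or operators.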
August 16, 2005 at 3:07 am #125119
Thanks Holly,
I’ve posted many responses over the past few weeks on the subject of sample selection, or what some call the “sampling frame.” In his day, Deming was especially insistent on ensuring the sampling frame properly matched the problem or question at hand.
As you suggest, selecting a representative sample is good insurance against biased results. While selecting the right number of samples is key to ensuring statistical validity, so is selecting a representative sample.
Rather than provide answers in this posting, I would like to see how others on the forum achieve this critical element that links sample selection to process understanding.
What methods do you use to ensure the samples you select are representative of the system at large? What problems have you encountered in past work where improper sampling (not including improper size) led you to misunderstand the operation of a process?
Regards,
Ken
August 16, 2005 at 3:38 am #125120
Ken,
Maybe you can help me out, but now you prefer to keep silent. Why?
We are now tracking down the causes of a defect whose rate is much higher than in the last production run; we have just restarted production after a few days’ shut-down. Our line runs continuously, 24 hours a day, with 4 brigades and 3 shifts per day. Our process is very complicated and long, and we are trying to find out which area holds the root cause. We can trace the defect back to process time, brigade, shift, and machine. This is a very complex situation, so our sampling strategy must be linked to our analysis of this defect; otherwise the sampling will not show the answer. We are still working on it.
Our process is too complex, and it is not easy for me to make everybody understand it.
I hope you can give me some clues toward a breakthrough.
Thanks a lot. This is an urgent case in practice and I need your input now.
August 16, 2005 at 5:59 am #125121
Hi Holly,
I wasn’t sure from your last note if you wanted to have a discussion, or you had a specific problem. I’m trying hard not to give the others all of the answers, and allow them to participate.
It’s difficult for me to help you with such limited data and information. The best I can do is give you some suggestions and ideas. No guarantee doing this from the US. I suppose you are in Asia, possibly China? If so, your English is very good!
Some Questions:
1. Did the problem occur before the shut-down?
2. If it occurred after the shut-down, get the maintenance logs and find out what was changed, repaired, or corrected, during the shut-down.
3. If problem occurred before the shut-down, then go to #4.
4. If problem occurred before the shut-down, or the maintenance investigation did not yield anything interesting, then collect equal samples by line and/or machine, however your production system is configured.
5. Assess the number of defects per unit, or the number of defective units, by line or machine, and try to identify which lines or machines make the greatest contribution.
6. Narrow your focus to the selected machines within the key cause line that have the greatest chance of producing the defect.
7. Determine through observation whether the likely cause of the defect is due to mistakes in setting up and operating the machine, or due to the machine drifting during operation, or the settings for the machine are wrong.
8. If mistakes in set-up and operation are the cause of problem, then try to redesign the set-up/operation process so the operator cannot produce the mistake. (Mistake Proofing)
9. If machine operation settings are wrong, run small tests to find the better settings. Reset equipment and track operation via a control chart.
10. If machine is drifting during operation, then have maintenance repair the machine.
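Steps 4 and 5 amount to a Pareto-style ranking of defects by source. A minimal Python sketch follows; the record layout, machine names, and defect counts are invented for illustration, and the comparison is only fair if equal numbers of units were inspected per machine, as step 4 asks.

```python
from collections import Counter


def pareto_by(records, key):
    """Rank the values of `key` by total defects.

    Returns a list of (value, defects, cumulative_share) tuples,
    largest contributor first.
    """
    counts = Counter()
    for record in records:
        counts[record[key]] += record["defects"]
    total = sum(counts.values())
    rows, running = [], 0
    for value, defects in counts.most_common():
        running += defects
        share = running / total if total else 0.0
        rows.append((value, defects, share))
    return rows


# Hypothetical inspection results, equal sample size per machine and shift.
inspections = [
    {"machine": "M1", "shift": 1, "defects": 2},
    {"machine": "M2", "shift": 1, "defects": 9},
    {"machine": "M3", "shift": 2, "defects": 1},
    {"machine": "M1", "shift": 2, "defects": 3},
    {"machine": "M2", "shift": 3, "defects": 7},
    {"machine": "M3", "shift": 3, "defects": 2},
]

by_machine = pareto_by(inspections, "machine")
for machine, defects, cumulative in by_machine:
    print(f"{machine}: {defects} defects, cumulative share {cumulative:.0%}")
```

The same function can be called with "shift" (or a brigade field, if one were recorded) to see whether the defect concentrates along a different dimension.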
Just a few ideas. I will be heading to bed in 30 minutes, but available until then if you have additional questions.
Ken
August 16, 2005 at 9:02 am #125123
Ken,
Thanks a lot for your immediate reply. Good guess about me: yes, I am from China. I am curious what characteristic of my messages suggests I am from China. My strange English?
I understand your ideas well; that is the effort we made in the first place. But the root cause is still out there, and we are now tracing all the possible clues and collecting all kinds of data. This is a long story.
If you really want to challenge your professional knowledge, please reach me at [email protected] and I will tell you more about the problem and the complicated process.
Good Night!
Holly