# Confidence Level Question


Viewing 21 posts - 1 through 21 (of 21 total)
• #40240

peteandsheila
Participant

I am trying to ensure that a sampling of call types is truly representative of the population distribution.  What information do I need to determine the sample size needed?

#124336

Akram
Participant

Try this website. Scroll down to the bottom of the page and you will find the answer to your question.
http://www.sixsigmafirst.com/sampling.htm
Akram-Al Qiram.

#124342

Ken Feldman
Participant

You did not specify the type of data being collected.  Discrete and continuous data will utilize different calculations for computing sample size.  In either case, at a generic level, three things are needed to calculate the sample size:
1.  Estimated variation of the population
2.  Desired level of confidence
3.  Desired level of precision
A fourth variable might be population size if it is small.  That way we can account for a correction factor to take into consideration the limited population.

#124353

Kris Brazeal
Participant

To Darth’s point, there are some differences between discrete and continuous data but here is a little more information on population sampling that you may find useful.
Precision required in estimate?  This measures how accurate you want the estimate to be (ex. if we are measuring loan processing time, we may want our estimate to be within ± 1 day).
Amount of variation in characteristic?  This measures the current variation of the set of data (ex. we know from past measurements that the variation in the processing time, as measured by standard deviation, is 6 days).
Confidence level? This measures the confidence that we want the estimate to be in the specified accuracy (precision).  This is most often represented at a 95% confidence and is a constant number of 2 in the sampling formula.
Sample Size? The sample size is calculated from the following formula: $n = (2s/\Delta)^2$, where n = sample size, 2 = constant based on confidence, s = standard deviation, and ∆ = degree of precision required.

In our example, the required sample size is 144:
$n = (2 \times 6 / 1)^2 = 144$
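As a quick check of the arithmetic above, here is a minimal Python sketch of the same formula. The function name `sample_size_continuous` and its parameter names are illustrative only; the constant 2 is the rounded value used in this post and is left as an adjustable argument.

```python
import math

def sample_size_continuous(s, delta, z=2.0):
    """Sample size for estimating a mean: n = (z * s / delta)^2, rounded up.

    s     -- estimated standard deviation of the characteristic
    delta -- required precision (half-width of the estimate), same units as s
    z     -- confidence constant (2 is the rounded 95% value used in the post)
    """
    return math.ceil((z * s / delta) ** 2)

# Loan processing example: s = 6 days, precision = +/- 1 day
print(sample_size_continuous(s=6, delta=1))  # 144
```

Halving the precision to ± 0.5 day quadruples the required sample, since delta enters the formula squared.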

#124354

Dayton
Member

Kris,

Those are really some bold(ed) assertions.
Vinny

#124355

Kris Brazeal
Participant

Yeah…not sure what is going on with the font there.
Kris

#124358

Ken Feldman
Participant

Not to be picky…..but the z value for 95% is 1.96.  Why would you possibly recommend 2?  Fine, in the old days when we had to calculate by hand, slide rule or abacus it made it easier.  But today, why not do it correctly?  Secondly, everybody seems to forget that this formula holds for large populations.  Always keep that in the back of the mind just in case the population is not large.  Finally, your formula….or some version of it holds for continuous data.  Discrete data is a totally different calculation.  There that wasn’t too picky and certainly not insulting……
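To make the two points above concrete, the sketch below computes z from the confidence level instead of rounding it to 2, and then applies a finite population correction. The correction shown, n = n0 / (1 + (n0 - 1) / N), is one commonly used form and is an assumption here, since the post does not say which correction is meant. The function names and the population size of 500 are illustrative.

```python
import math
from statistics import NormalDist

def z_two_sided(confidence):
    """z value for a two-sided interval, e.g. about 1.96 for 95%."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_continuous(s, delta, confidence=0.95):
    """Unrounded n0 = (z * s / delta)^2, valid for a large population."""
    return (z_two_sided(confidence) * s / delta) ** 2

def finite_population_correction(n0, population):
    """Shrink n0 when the population is small (assumed form of the correction)."""
    return n0 / (1 + (n0 - 1) / population)

n0 = sample_size_continuous(s=6, delta=1)
print(math.ceil(n0))  # 139, versus 144 when z is rounded up to 2

# Hypothetical small population of 500 items
print(math.ceil(finite_population_correction(n0, population=500)))
```

For a very large population the correction leaves n0 essentially unchanged, which is why it is usually ignored.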

#124361

Kris Brazeal
Participant

Good points Darth and not too picky nor insulting at all.  I agree that it is 1.96, just chose to round it up as I don’t think it makes that much of a statistical difference.
I did forget to mention that the formula was for continuous data, however, so good catch.
Kris Brazeal

#124390

Participant

peteandsheila,
As you have referenced calls, it sounds like you are in a Service Industry rather than manufacturing. I am as well.
Could you identify the type of business you are in?
I am working for the most part in a call center environment in the transportation industry and would like to swap lies with you.
John

#124394

VoteForPedro
Member

When determining precision in sample size calculation for continuous data, are there any mathematical rules of thumb?  I would think one would view this relative to the std dev in the process…any thoughts?  Thanks.

#124398

peteandsheila
Participant

I am using discrete data – categories of calls.
Is the standard deviation used to calculate the sample size the standard deviation of the distribution of calls categorized?
I work at a bank and have much call center experience, so John, yes, we could possibly swap lies.  One struggle is finding normal data.  Our goal is continuous growth, so our data is very rarely normal.

#124408

Ken Feldman
Participant

First of all, forget about normal data if you are talking about most anything in a call center.  Time, which is the most common metric will be non normal because of the natural boundary of zero on one side.  No problem dealing with it but the statistics based on a normal distribution will likely not apply.  Furthermore, data dealing with categorized or discrete data will also not be normal since you are either measuring percentage or counts, both of which again, have a natural boundary of zero or 100% if percentages.
The “s.d.” for the discrete sample size calculations is totally different than for the continuous.  Since you are doing proportion of calls into categories, it is the s.d of a proportion (binomial) that is relevant.  The following formula is what is used.  A value of .5 is used for the p if you have no idea of the true proportion.  This gives you the largest, worst case sample size.
$n = (z/\Delta)^2 \cdot p(1-p)$

#124409

Ken Feldman
Participant

z = 1.96 if using 95%, delta is the precision you desire, often +-5% and p is .5 if unknown or whatever your value is if known.
I also have some bank/call center experience if you want to dive deeper offline.
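The proportion formula with these values can be sketched in Python as follows. The function name `sample_size_proportion` is illustrative, and the second call uses a made-up known proportion of 10% only to show how much the worst-case p = 0.5 inflates the sample.

```python
import math
from statistics import NormalDist

def sample_size_proportion(delta, p=0.5, confidence=0.95):
    """Sample size for estimating a proportion: n = (z / delta)^2 * p * (1 - p).

    delta -- desired precision as a fraction (0.05 means +/- 5%)
    p     -- expected proportion; 0.5 is the worst case when nothing is known
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z / delta) ** 2 * p * (1 - p))

# 95% confidence, +/- 5% precision, unknown proportion
print(sample_size_proportion(delta=0.05))  # 385

# Same precision, but a call category believed to be about 10% of volume
print(sample_size_proportion(delta=0.05, p=0.10))
```

Like the continuous formula, this assumes a large population; it gives the sample needed to estimate one category's proportion to the stated precision.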

#124485

Participant

I work with a customer support call center, in a company that is just beginning (2 Yrs) to venture down the Six Sigma path; Lean has been around a while in our operations group.
Ops applications are much more similar to the typical training in Lean/Six Sigma, so we on the Service side are attempting to make the transition and understand how the different techniques and tools work in a service environment.
Having access to people who have dealt with the transition in the past will be very helpful.
John

#125060

Abul Faisal
Participant

First, how confident do you want to be that the sample you take will capture the true population defective rate?  That is usually 95% but may be 99% if it is really critical.
Second, you need to determine how precise you want your estimate to be.  When you make a statement about the population defective rate you will say that based on your sample you are 95% confident that the defective rate is X% plus or minus some percentage.  That percentage is your precision.  It may be 1%, 5% or anything you want.
Thirdly the percentage defective that you use in your calculations would be some estimate based on historical data or 50% if you know nothing about prior performance.
Keep in mind that the more confident you want to be, the more precise you want to be and the worse your population variation is, the larger the sample size required.  In all cases, discrete sample sizes will be quite large.
This was the advice given by Ken when I asked for help on the same subject; I hope it will be useful to you too.

#125062

“Ken”
Participant

Abul,
Thanks for your reference to my earlier response. To ensure my guidance is accurately conveyed to this thread, I would like to make a few adjustments to your suggestions.
Requirements for Statistical Sampling Determination: You need the reference variation from the control group or the best estimate of such, the desired confidence for the claim, the practical difference you want to detect, the desired power of the test, and whether the test is one- or two-sided. Lastly, you want to select samples from the population that are representative of that population.
Larger samples are warranted as the desired power and confidence increase for fixed variation and practical difference, or as the difference to detect decreases for fixed variation, power, and confidence.
Some example sample sizes using continuous measures:

Given – One-Sided Test of Unpaired Means
Power = 90%
Confidence = 95%
Variation = 1 SD
Difference to Detect    N1 & N2
--------------------------------------
0.5    86
1.0    22
1.5    11
2.0    7

Given – One-Sided Test of Paired Means
Power = 90%
Confidence = 95%
Variation = 1 SD
Difference to Detect    N1 & N2
--------------------------------------
0.5    37
1.0    11
1.5    6
2.0    4

Given – Two-Sided Test of Unpaired Means
Power = 90%
Confidence = 95%
Variation = 1 SD
Difference to Detect    N1 & N2
--------------------------------------
0.5    70
1.0    19
1.5    9
2.0    6

Given – Two-Sided Test of Unpaired Means
Power = 90%
Confidence = 95%
Variation = 3 SD
Difference to Detect    N1 & N2
--------------------------------------
0.5    758
1.0    191
1.5    86
2.0    49

Given – Two-Sided Test of Unpaired Means
Power = 80%
Confidence = 95%
Variation = 3 SD
Difference to Detect    N1 & N2
--------------------------------------
0.5    567
1.0    143
1.5    64
2.0    37

Typical values for confidence and power for conservative studies are: Power = 90%, and Confidence = 95%.
Typical values for confidence and power for exploratory studies are: Power >= 80%, and Confidence >= 90%.
Ken
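For readers who want to explore how power, confidence, variation, and the difference to detect trade off, here is a rough Python sketch using the standard normal-approximation formula for comparing two unpaired means, n per group = 2 ((z_alpha + z_beta) · SD / difference)^2. This is not necessarily the method used to produce the tables above; a t-based calculation typically gives slightly larger values, so the output should not be expected to match the tables exactly. The function name and arguments are illustrative.

```python
import math
from statistics import NormalDist

def n_per_group_unpaired(diff, sd=1.0, power=0.90, confidence=0.95, two_sided=True):
    """Approximate sample size per group for a test of two unpaired means.

    Normal approximation only; small-sample (t-based) methods give
    somewhat larger numbers.
    """
    alpha = 1 - confidence
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)   # confidence term
    z_beta = NormalDist().inv_cdf(power)       # power term
    return math.ceil(2 * ((z_alpha + z_beta) * sd / diff) ** 2)

# Sample size grows quickly as the difference to detect shrinks
for diff in (0.5, 1.0, 1.5, 2.0):
    print(diff, n_per_group_unpaired(diff))

# Tripling the standard deviation multiplies the requirement by about nine
print(n_per_group_unpaired(0.5, sd=3.0))
```

The loop shows the pattern described in the post: for fixed variation, power, and confidence, the required sample rises sharply as the difference to detect gets smaller.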

#125116

Holly
Participant

All,
I am more interested in how to take the samples. Let’s say we need to sample 200 from a population of 2000: will the 200 just be selected at random from the 2000, taken as a regular sample size per day, or taken as a fixed size from each operator? This is more complicated than calculating the sample size, because it requires knowledge of the process. Just a reminder to all that data can lead us the right way and the wrong way as well. I don’t see much discussion of this sampling technique.
Thanks a lot.
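Two of the options raised above, a simple random draw and a fixed share from each operator, can be sketched in Python as follows. The record layout, the ten operators, and the function names are hypothetical and only serve to illustrate the difference between the two selection schemes.

```python
import random
from collections import defaultdict

def simple_random_sample(records, n, seed=0):
    """Draw n records at random, ignoring day, shift, or operator."""
    rng = random.Random(seed)
    return rng.sample(records, n)

def stratified_sample(records, key, n, seed=0):
    """Draw from each stratum in proportion to its share of the population.

    Rounding can make the total differ slightly from n when strata are uneven.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for group in strata.values():
        k = round(n * len(group) / len(records))
        sample.extend(rng.sample(group, min(k, len(group))))
    return sample

# Hypothetical population: 2000 records spread evenly over 10 operators
records = [{"id": i, "operator": f"op{i % 10}"} for i in range(2000)]

srs = simple_random_sample(records, 200)
strat = stratified_sample(records, key=lambda r: r["operator"], n=200)
print(len(srs), len(strat))
```

The simple random sample may over- or under-represent an operator by chance, while the stratified version guarantees each operator's share. Which one fits depends on what is known about the process.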

#125119

“Ken”
Participant

Thanks Holly,
I’ve posted many responses over the past few weeks on the subject of sample selection, or what some call the “sampling frame.”  In his day, Deming was especially insistent on ensuring the sampling frame properly matched the problem or question at hand.
As you suggest, selecting a representative sample is good insurance against biased results.  While selecting the right number of samples is key to ensuring statistical validity, so is selecting a representative sample.
Rather than provide answers in this posting, I would like to see how others on the forum achieve this critical element that links sample selection to process understanding.
What methods do you use to ensure the samples you select are representative of the system at large?  What problems have you encountered in past work where improper sampling (not counting improper sample size) led you to misunderstand how a process operates?
Regards,
Ken

#125120

Holly
Participant

Ken,
Maybe you can help me out but now you prefer to keep silent. Why?
Now we are tracking causes for a defect that is extremely high compared with the last production run; we just restarted production after a few days’ shutdown. Our line runs continuously, 24 hours a day: 4 brigades and 3 shifts per day. Our process is very complicated and long, and we are now trying to find out which area holds the root cause. We can track the defect back to process time, brigade, shift, and machine. This is a very complex situation, so our sampling strategy must link to our analysis of this defect; otherwise the sampling will not show the answer. We are still working on it.
Our process is too complex and it is not easy for me to make everybody understand.
Hope you can give me some breaking clues.
Thanks a lot. This is an urgent case in practice and i need your inputs now.

#125121

“Ken”
Participant

Hi Holly,
I wasn’t sure from your last note if you wanted to have a discussion, or you had a specific problem.  I’m trying hard not to give the others all of the answers, and allow them to participate.
It’s difficult for me to help you with such limited data and information.  The best I can do is give you some suggestions and ideas.  No guarantee doing this from the US.  I suppose you are in Asia, possibly China?  If so, your English is very good!
Some Questions:
1.  Did the problem occur before the shut-down?
2. If it occurred after the shut-down, get the maintenance logs and find out what was changed, repaired, or corrected, during the shut-down.
3. If problem occurred before the shut-down, then go to #4.
4. If problem occurred before the shut-down, or the maintenance investigation did not yield anything interesting, then collect equal samples by line and/or machine, however your production system is configured.
5. Assess the number of defects per unit, or the number of defective units by line or machine–try to identify which lines or machine provide the greatest contribution.
6. Narrow your focus to the selected machines within the key cause line that have the greatest chance of producing the defect.
7. Determine through observation whether the likely cause of the defect is due to mistakes in setting up and operating the machine, or due to the machine drifting during operation, or the settings for the machine are wrong.
8. If mistakes in set-up and operation are the cause of problem, then try to redesign the set-up/operation process so the operator cannot produce the mistake. (Mistake Proofing)
9. If machine operation settings are wrong, run small tests to find the better settings.  Reset equipment and track operation via a control chart.
10.  If machine is drifting during operation, then have maintenance repair the machine.
Just a few ideas.  I will be heading to bed in 30 minutes, but available until then if you have additional questions.
Ken
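Steps 4 through 6 above amount to collecting equal samples per machine, computing the defective rate for each, and ranking the machines by contribution. A minimal Python sketch follows; the machine names and inspection counts are entirely hypothetical, and the function name is illustrative.

```python
from collections import Counter

def defect_rates(inspections):
    """Rank machines by fraction of defective units, worst first.

    inspections -- iterable of (machine, is_defective) pairs
    """
    totals, defects = Counter(), Counter()
    for machine, is_defective in inspections:
        totals[machine] += 1
        defects[machine] += int(is_defective)
    rates = {m: defects[m] / totals[m] for m in totals}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

# Hypothetical data: 50 units inspected per machine (equal samples)
inspections = (
    [("A", True)] * 2 + [("A", False)] * 48
    + [("B", True)] * 9 + [("B", False)] * 41
    + [("C", True)] * 3 + [("C", False)] * 47
)

for machine, rate in defect_rates(inspections):
    print(machine, f"{rate:.0%}")
```

The same grouping can be repeated by shift, brigade, or process time to see which factor separates the defect rates most clearly.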

#125123

Holly
Participant

Ken,
Thanks a lot for your immediate reply. Good guess about me, yes, i am from China. I am curious about what character inherent my mails claiming i am from China. My strange English?
I understand your ideas well; that is the effort we made in the first place. But the root cause is still out there, and we keep tracing all the possible clues and collecting all kinds of data. This is a long story.
If you really want to challenge your professional knowledge, please reach me at [email protected] and i will tell you more about the problem and the complicated process.
Good Night!
Holly

