Message: 26712
Posted by: Hemanth
Posted on: Tuesday, 29th April 2003
Hi all,
I used to wonder what people meant when they said it's difficult to teach something you already know in a classroom. Now I know what they meant.
I am going to conduct a session on confidence intervals during the second week of GB training. I am not sure how to put forward the concepts of confidence interval and degree of confidence to someone who is not comfortable with math.
Any ideas on this front would be very helpful.
Thanks in advance,
Hemanth
Message: 26744
Posted by: Gabriel
Posted on: Tuesday, 29th April 2003
You don't need the math to put forward the concepts.
The confidence interval is the range where you expect something to be. By saying "expect" you leave open the possibility of being wrong. The degree of confidence measures the probability that the expectation is true.
The degree of confidence is linked with the width of the confidence interval. It's easy to be very confident that something will be within a very wide range, and vice versa. Also, the amount of information (typically related to the sample size) has an influence on the degree of confidence and the width of the confidence interval. With more information you will be more confident that "the thing" will be within a given interval. Also, with more information, and keeping a given degree of confidence, you can narrow the interval.
Then you can finish with an example:
In a given city a survey is made. The question is: "Do you prefer Coke or Pepsi?" 60% answer Coke, and 40% answer Pepsi. So an estimate is that, in this city, 60% prefer Coke. Does it mean that 60% of the population in this city prefer Coke? Not unless the survey had been answered by the whole population. However, you can be somewhat "confident" that the actual proportion of people choosing Coke will be within some interval around the 60% found in the sample. How confident? How wide is the interval?
If the survey is based on a sample of 100 persons, you can be 90% confident that the actual proportion of Coke will be between 52% and 68%. Also, you can be 99% confident that the actual proportion will be between 48% and 72% (for the same sample size, more confidence, wider interval).
If the survey had been based on a sample of 1000 persons instead of 100, you could be 90% confident that the actual proportion is between 57.5% and 62.5% (compare with 52% and 68% for the same confidence with a sample of 100: larger sample, narrower interval for the same degree of confidence). And you could be 99.99998% (let's say 100%?) confident that the actual proportion will be between 52% and 68% (compare with a degree of confidence of 90% for the same interval with a sample of 100: larger sample, better degree of confidence for the same interval).
And all that without a single formula. Tell me if you find this useful.
Message: 26747
Posted by: Teo
Posted on: Tuesday, 29th April 2003
Very nice post
Message: 26768
Posted by: Hemanth
Posted on: Tuesday, 29th April 2003
Hi Gabriel,
This is very helpful; it really makes explaining look simpler. I will definitely incorporate your explanation, and will probably give them a note on confidence interval and degree of confidence based on it. That way they can refer to it.
Many thanks,
Hemanth
Message: 26840
Posted by: Dave Bastow
Posted on: Thursday, 1st May 2003
Gabriel,
In the example you gave, would you explain where the "52% and 68%" figures come from please.
"If the survey is based on a sample of 100 persons, you can be 90% confident that the actual proportion of Coke will be between 52% and 68%. Also, you can be 99% confident that the actual proportion will be between 48% and 72% (for the same sample size, more confidence, wider interval)."
Message: 26876
Posted by: George
Posted on: Thursday, 1st May 2003
An excellent explanation. But a couple of other things come into play here, and as a teacher, one needs to be prepared for further questions. The confidence level is a measure of power (the degree to which I'm confident about these results). But how precise do we want to be (the degree to which the sample results would actually reflect the population results)? This is the +/- number reported in Gallup polls (e.g., 56% (+/- 3%)). Power and precision have an inverse relationship for a given sample size: more power means less precision, and vice versa. The end result is a not-very-intuitive statement like, "I'm 95% confident that 63% (+/- 3%) of the population favors Coke over Pepsi", or "I'm 90% confident that 63% (+/- 2%) of the population favors Coke over Pepsi" (for a given sample size).
The next question to anticipate (already asked in this thread) is "how do we know this (where do these numbers come from)?" The numbers are based on the theoretical sampling distribution of the mean. If we randomly sample 100 people, how representative will the mean of the sample be compared to the population mean? There will always be sampling error involved, and random sampling ensures that this error will have an approximately normal distribution. So if we randomly sampled 100 people 100 times and plotted a frequency distribution, we would have a normal distribution of these sample means: e.g., 95% of these means would fall within 1.96 standard deviations of the true population mean. If we increase the sample size to 200 and do the exercise again, we will increase both power and precision (but there will always be that internal trade-off between the two) because there will be less sampling error (the opinion of 200 people will be more representative of the population than that of 100 people).
As Gabriel said, you don't need math - but you do need to have a feel for distributions. You can do this by running a couple of "Monte Carlo simulations" on your own: generate a 10 by 10 matrix of random numbers (say between 1 and 10) and take the mean of each row. Plot these means in a frequency distribution. This is the equivalent of randomly sampling 10 measures from a population of 100, 10 times. Like magic, you will see a normal distribution starting to form - but not one that you'd expect. Your raw data ranges from 1 to 10, but your sample mean data will have a much smaller range, something like 4.5 to 5.5. Do it 2 or 3 times and you'll be able to take your distribution to a table of t-distributions and calculate power and precision for yourself. I've done this in classes I've taught: generate a 10 by 10 matrix on a piece of paper for each student. Have them calculate the means for each row (a couple of minutes' time). Then start plotting their results on the blackboard for everyone to see. The results will amaze your students - very powerful stuff. This is also a good way to introduce the concepts of standard deviation and variance without the math.
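The 10 by 10 exercise can also be run on a computer. What follows is only a minimal Python sketch of it, assuming whole numbers from 1 to 10 drawn with equal probability; the function name `row_means` and the seed are made up for illustration. For such numbers the long-run average is 5.5, so the row means should bunch around that value and spread far less than the raw data.

```python
import random
import statistics

def row_means(rows=10, cols=10, low=1, high=10, seed=None):
    """Fill a rows x cols grid with random integers and return (grid, row means)."""
    rng = random.Random(seed)  # seed only makes the run repeatable
    grid = [[rng.randint(low, high) for _ in range(cols)] for _ in range(rows)]
    means = [statistics.mean(row) for row in grid]
    return grid, means

grid, means = row_means(seed=2003)  # hypothetical seed
raw = [value for row in grid for value in row]

print("raw data range :", min(raw), "to", max(raw))
print("row means range:", min(means), "to", max(means))

# Crude text histogram of the row means, one bin per whole number.
for bin_start in range(1, 10):
    count = sum(bin_start <= m < bin_start + 1 for m in means)
    print(f"{bin_start}-{bin_start + 1}: {'#' * count}")
```

Running it several times without a seed, or pooling the row means of a whole class, fills in the bell shape more clearly than a single sheet of 10 means can.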
Message: 26896
Posted by: Ulises
Posted on: Thursday, 1st May 2003
I agree!!! It couldn't be clearer. Congratulations!!
Ulises
Message: 26919
Posted by: Gabriel
Posted on: Friday, 2nd May 2003
All the numbers are from the binomial distribution. With sample sizes of 100 and 1000, the normal approximation to the binomial distribution can also be used, with mean n*p and standard deviation sqrt(n*p*(1-p)) for the number of occurrences in the sample, where p is the proportion in the population and n is the sample size (equivalently, mean p and standard deviation sqrt(p*(1-p)/n) for the sample proportion).
Then, if you want the 90% confidence interval, you search for the value of p that leaves 5% of the area to the left (the lower confidence limit) and the value of p that leaves 5% of the area to the right (the upper confidence limit).
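One way to read the search described above is as an exact, equal-tailed binomial interval: find the p at which the observed count sits right at the 5% tail on each side. The Python sketch below follows that reading. It is an assumption about the procedure, not a copy of the original spreadsheet; the hand-written `binom_cdf` stands in for a spreadsheet binomial function, and all function names are made up for illustration.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve_increasing(f, target, lo=0.0, hi=1.0, iterations=40):
    """Bisection for f(p) = target, where f increases with p on [lo, hi]."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_interval(x, n, confidence):
    """Equal-tailed interval for a proportion, given x occurrences in n trials."""
    tail = (1 - confidence) / 2
    # Lower limit: the p at which seeing x or more has probability `tail`.
    lower = solve_increasing(lambda p: 1 - binom_cdf(x - 1, n, p), tail)
    # Upper limit: the p at which seeing x or fewer has probability `tail`.
    upper = solve_increasing(lambda p: 1 - binom_cdf(x, n, p), 1 - tail)
    return lower, upper

for n, x in ((100, 60), (1000, 600)):
    for confidence in (0.90, 0.99):
        lower, upper = exact_interval(x, n, confidence)
        print(f"n={n}, {confidence:.0%}: {lower:.1%} to {upper:.1%}")
```

The limits printed for n=100 should land close to the rounded 52% to 68% and 48% to 72% quoted in the example, with small differences depending on exactly how the tails are handled.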
Message: 32208
Posted by: philares Lin
Posted on: Monday, 1st September 2003
Dear Gabriel,
Applying the formula p+/-1.644sqrt(p(1-p)/n) at the 90% confidence level, I am not sure whether it is suitable for the calculation on the data you showed on April 29, 2003. I wonder whether I am missing something from what you were discussing. Could you let me know whether the above-mentioned formula is suitable for your case? Thanks in advance.
Rgds, Philares
Message: 32210
Posted by: Gabriel
Posted on: Monday, 1st September 2003
Already answered here:
http://www.isixsigma.com/forum/showmessage.asp?messageID=26919
I didn't use p+/-1.644sqrt(p(1-p)/n). I used the BINOMDIST() in Excel, which uses the formula for the binomial distribution.
However, with sample sizes of 100 and 1000 and a min(p;1-p)=0.4 you have 40 and 400 occurrences in the sample. In such a case, the normal distribution is a very good approximation to the binomial, and can be used safely.
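For comparison, here is a minimal Python sketch of the normal-approximation formula p +/- z*sqrt(p(1-p)/n) discussed in this exchange. It uses the standard library's `NormalDist` to look up z instead of a printed table; the function names are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def normal_interval(p_hat, n, confidence):
    """Interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for 90%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def confidence_for_margin(p_hat, n, margin):
    """Degree of confidence attached to the interval p_hat +/- margin."""
    z = margin / sqrt(p_hat * (1 - p_hat) / n)
    return 2 * NormalDist().cdf(z) - 1

for n in (100, 1000):
    for confidence in (0.90, 0.99):
        low, high = normal_interval(0.60, n, confidence)
        print(f"n={n}, {confidence:.0%}: {low:.1%} to {high:.1%}")

# Same 52% to 68% interval, but with a sample of 1000 instead of 100.
print(f"{confidence_for_margin(0.60, 1000, 0.08):.7%}")
```

With p = 0.6 the approximation gives roughly 52% to 68% for n=100 and 57.5% to 62.5% for n=1000 at 90% confidence, in line with the binomial figures, and a confidence very close to 100% for the 52% to 68% interval at n=1000.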
Message: 32304
Posted by: philares
Posted on: Wednesday, 3rd September 2003
Tks! I got it.
Message: 34190
Posted by: Connie
Posted on: Sunday, 12th October 2003
If you think these explanations really explain a confidence interval you are misinformed. I have read all the posts and am no clearer to understanding confidence intervals than ever. The language is English, but it's an English only you people understand.
Message: 34226
Posted by: Gabriel
Posted on: Monday, 13th October 2003
Connie,
1) I accept your opinion. Any suggestion about how to improve would be greatly appreciated, together with your criticism.
2) It was just a base for face-to-face training. The trainer must tune the approach in real time by evaluating the response of the trainees. In the classroom you would be invited to stop the teacher and ask whenever you fail to understand something (or the trainer fails to make you understand).
3) The intended reader of the post was the trainer, who understood CIs but wanted to know how to approach an explanation without the formulae associated with them. It was not a "Confidence Intervals For Dummies" approach (not that I am calling you a dummy, just using an analogy to the "XXX for Dummies" series of books that aim to explain complex things in simple language, accessible to just about everybody).
Message: 34234
Posted by: Connie
Posted on: Monday, 13th October 2003
Gabriel,
That's exactly what I need, lol, an explanation of confidence intervals for dummies. I am taking a biostatistics course and though I got an A in statistics, for some reason I cannot get a clear understanding of why a point interval of 97% does not make the null hypothesis true if the 95% confidence interval is between 0.4 and 1.6.
Any help would be appreciated.
Connie
Message: 34236
Posted by: Gabriel
Posted on: Monday, 13th October 2003
Ok, I will try to help, but I don't understand your question:
97% is a point estimate of what?
What is the null hypothesis that "was not made true", and what do you mean by "make the null hypothesis true"? And what was the alternative hypothesis?
(0.4; 1.6) is the confidence interval of what? (It does not look compatible with a point estimate of 97%.)
Could you give a more detailed explanation of the problem, to put things in context?
(By the way, I never got an A in statistics, but I'll see what I can do anyway) :)
Message: 45811
Posted by: V.RAJENDRAN
Posted on: Monday, 10th May 2004
Gabriel,
In the example you gave, would you explain where the "52% and 68%" figures come from please.
"If the survey is based on a sample of 100 persons, you can be 90% confident that the actual proportion of Coke will be between 52% and 68%. Also, you can be 99% confident that the actual proportion will be between 48% and 72% (for the same sample size, more confidence, wider interval)."
Message: 45817
Posted by: sandeep koul
Posted on: Monday, 10th May 2004
Dear V.Rajendran
The question is how "wide" you have to cast your "net" to be sure of capturing the true population parameter. If my estimate of defects is 10%, I might also say that my 95% confidence interval is plus or minus 2%, meaning that the odds are 95 out of 100 that the true population parameter is somewhere between 8% and 12%.
In short, it expresses the confidence we have, during the initial stages of a project or experiment, in the values we expect to see once the project or experiment is complete.
Regards
Sandeep Koul
Message: 56049
Posted by: Joe T
Posted on: Thursday, 30th September 2004
This is a great post - but I have a follow up question as it relates to the importance of the actual survey answers to determining the confidence levels.
Why would the sample percentage as to the response matter in determining the confidence interval?
I.e., why does it change if the sample selects A 50% of the time versus if the sample selects A 90% of the time?
I understand the math (mostly) but need a common-sense response to explain it.
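A rough way to see the effect is through the p(1-p) term in the p +/- z*sqrt(p(1-p)/n) formula quoted earlier in the thread. When answers are split 50/50, repeated samples can swing either way, so the estimate is at its most variable. When nearly everyone gives the same answer, repeated samples can hardly differ. The Python sketch below only tabulates that formula; the function name and the choice of n=100 at 90% confidence are illustrative assumptions.

```python
from math import sqrt

def margin(p_hat, n, z=1.645):
    """Half-width of the normal-approximation interval (z = 1.645 for 90%)."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

for p_hat in (0.5, 0.6, 0.9, 0.99):
    print(f"sample says {p_hat:.0%}: +/- {margin(p_hat, 100):.1%}")
```

The half-width is largest at 50% and shrinks as the sample percentage moves toward 0% or 100%. Very close to those extremes the normal approximation itself becomes unreliable, so the last row is only indicative.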
Message: 81194
Posted by: ron storteboom
Posted on: Wednesday, 12th October 2005
I perform validation testing of new product designs. We are currently developing a fuel sender unit for use in heavy trucks. One of our customers sent us a specification that states the following: "Sender shall be subjected to 4,000,000 sweep cycles at +23°C and simulate 1.2 million miles with 95% Reliability (B10 Life) and 95% Confidence".
I am supposed to figure out the required sample size. Could you please help? I would like a rather detailed explanation with entries that one would make into Excel to calculate this. Or, a software package recommendation.
thx,
ron
Message: 81242
Posted by: GomezAdams
Posted on: Thursday, 13th October 2005
Using the binomial, the number of units needed would be 58.
You would need to run these 58 units without failure for 4 million sweep cycles each (which simulates 1.2 million miles each).
If you were to allow 1 failure, you would need 99 units. 2 failures, 124 units.....
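A minimal Python sketch of the binomial reasoning behind such figures, assuming the usual pass/fail demonstration logic: find the smallest number of units for which, if the true reliability were only 95%, the chance of seeing no more than the allowed number of failures is at most 5%. The function names are made up for illustration, and this is not taken from any particular standard or software package.

```python
from math import comb

def prob_at_most(failures, n, reliability):
    """P(no more than `failures` units fail out of n) at the given reliability."""
    q = 1 - reliability  # probability that a single unit fails
    return sum(comb(n, k) * q**k * reliability**(n - k)
               for k in range(failures + 1))

def units_needed(reliability, confidence, failures=0):
    """Smallest n with P(at most `failures` failures) <= 1 - confidence."""
    n = failures + 1
    while prob_at_most(failures, n, reliability) > 1 - confidence:
        n += 1
    return n

for allowed in (0, 1, 2):
    print(allowed, "failures allowed:", units_needed(0.95, 0.95, allowed), "units")
```

Under this strict criterion the loop gives 59, 93 and 124 units for 0, 1 and 2 allowed failures. The 124 agrees with the figure above, while 59 and 93 differ somewhat from the 58 and 99 quoted, so it is worth checking which convention a given table uses. If the "B10 Life" wording in the specification is taken to mean 90% reliability instead, change the first argument accordingly.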
In the auto industry we have tried to get away from these types of archaic test requirements. What do they tell you? If you completed the test and all passed, it simply means that all have passed. You do not know by how much.
If your competition is a concern, how do you know that you are not providing an overdesigned product? What is the cost associated with this?
More and more, testing is going to failure.
Propose the following:
1) Budget for 58 units.
2) Test 10 units, at least 3 of which should be tested to failure (for variability assessment); 5 would be better. Plot the Weibull. Assess the distribution, view the plot with bounds and determine whether or not the requirements have been met.
If the Weibull conveys that the requirements have been met, you are a hero! You just saved 48 units at prototype costs.
If not, re-design and re-test 10 more units...
The test-to-failure scenario allows you to perform a degradation analysis. You can quantify how much you pass by and can estimate more accurately the reliability over the "useful life".
I would recommend obtaining the following:
From AIAG - The Reliability Methods guideline (THE-7). It has many useful examples with respect to sample planning, life data analysis, degradation analysis, and many other processes and tools. It is approx. $50.00.
From Reliasoft - Weibull ++7. Multidistribution fit, degradation analysis, simulation, test planning. It is approx. $750.00 and worth every penny.
Message: 84139
Posted by: moshrik
Posted on: Wednesday, 30th November 2005
Hello,
That really was a wonderful explanation, but I want to ask you something. I am a medical graduate, and in medical statistics they told us that if the confidence intervals of multiple samples overlap, this means that the samples are the same and do not differ in their measurements (in their means, for example). So if we take three samples for checking blood pressure and their confidence intervals are the same, that means the blood pressure is the same in the three samples. Can you please explain what overlap in confidence intervals means? Thanks again.
Message: 85381
Posted by: Tie
Posted on: Wednesday, 21st December 2005
Gabriel
How can I get your figures from Minitab? I mean, if I know that the sample size is 100, the confidence level is 0.90 and the mean is 60, how do I get the confidence interval?
Message: 88383
Posted by: AndyC36
Posted on: Tuesday, 14th February 2006
Hi Ron, sounds like I am in a very similar boat.
I have been presented with a table which shows Reliability Level, Confidence Level, Number of Units (Sample) and Failures allowed.
e.g 97.5% reliability at 95% confidence is provided by 120 units tested with zero failures.
What I am looking for is the equation from which I can verify the sample size. Put simply I have a problem with believing things without being able to test them for myself!
Did you get sufficient help to allow you to do this? I would be very grateful if you can share with me. Tx Andy.
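For the zero-failure rows of such a table, one commonly used relation is R^n <= 1 - C: if the true reliability were only R, the chance that all n units pass must be no more than 1 - C. Solving for n gives n >= ln(1 - C) / ln(R). The Python sketch below applies that relation. It is an assumption that the table in question was built this way, and the function names are made up for illustration.

```python
from math import ceil, log

def zero_failure_units(reliability, confidence):
    """Smallest n with reliability**n <= 1 - confidence (no failures allowed)."""
    return ceil(log(1 - confidence) / log(reliability))

def demonstrated_confidence(reliability, units):
    """Confidence demonstrated when `units` units all pass."""
    return 1 - reliability ** units

print(zero_failure_units(0.975, 0.95))
print(f"{demonstrated_confidence(0.975, 120):.1%}")
```

For 97.5% reliability at 95% confidence this gives 119 units, so the 120 in the table sits just above it, possibly because the table rounds up. Rows that allow one or more failures need the full binomial sum instead of this shortcut.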
Message: 88400
Posted by: ron storteboom
Posted on: Tuesday, 14th February 2006
Try looking at SAE Technical Paper Series 2005-01-1776. I think it gives a pretty detailed step-by-step.
thx,
ron
Message: 88853
Posted by: Suresh Babu
Posted on: Wednesday, 22nd February 2006
Hi all,
This discussion thread seems like a good place to post my queries.
I am doing rainfall-runoff modeling for my PhD work.
I have proposed a new model to predict runoff, and aim to compare it with observed runoff. In this regard, I need to find the confidence interval for the observed runoff data and verify whether my estimated runoff falls within it. Does this sound right, or am I doing something wrong?
If I am on the right track, then I have heard that I first have to find which frequency distribution the observed runoff data follows before estimating the confidence interval, and that the confidence interval can only be worked out from that distribution. Is that so? How can I find the frequency distribution of the observed runoff?
I need your detailed step-by-step procedure to do so. I also need to know whether there are any better ways of comparing the computed and observed runoff of a model.
Thanks in advance.
Suresh babu.
Message: 89623
Posted by: Jonh
Posted on: Monday, 6th March 2006
Hi,
Can you show me the calculations for this analysis?
Thanks
Message: 94209
Posted by: Seebeck
Posted on: Wednesday, 24th May 2006
This is an extremely helpful thread. Just thought I'd let y'all know that.
Message: 96545
Posted by: Ed
Posted on: Tuesday, 4th July 2006
The basic concept everyone needs to grasp is the normal distribution. This says that if your sample is large enough, your data will form a bell-shape around the mean (average) point that is equally distributed on both sides.
The standard tables that are widely available give you information regarding a standard normal distribution (with mean 0 and standard deviation 1 I think.) This means you have to transform this standard shape to fit your data. Another popular approximation is a t-table although this approximates to the normal distribution when samples are large.
To obtain a confidence interval, the level of confidence is decided (usually 95%). This means that 5% or 0.05 will fall outside of this value. This is divided by 2, because 2.5% will lie above the highest value and 2.5% will lie below the lowest value (it's symmetrical).
The z-value is then obtained through the z-table using the tolerance of 0.025. This can be obtained from http://davidmlane.com/hyperstat/z_table.html with an easy calculator.
This value is then applied to the following calculation to get your upper and lower limits of confidence:
Mean +/- [‘z-value’ x (standard deviation / square root of number of samples)]
So for a sample of 100 that has standard deviation of 5 and mean of 3, with confidence at 95% or z-value of 1.96, this becomes:
3 +/- [1.96 x (5/10)] = 3 +/- 0.98
So there is a 95% confidence that the actual population mean falls between 2.02 (which is 3-0.98) and 3.98.
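The same arithmetic as a short Python sketch, with the z-value looked up by the standard library's `NormalDist` instead of a printed z-table. The function name `mean_interval` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def mean_interval(mean, sd, n, confidence=0.95):
    """Mean +/- z * (standard deviation / square root of number of samples)."""
    tail = (1 - confidence) / 2            # 0.025 for 95% confidence
    z = NormalDist().inv_cdf(1 - tail)     # about 1.96
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin

low, high = mean_interval(mean=3, sd=5, n=100)
print(f"{low:.2f} to {high:.2f}")
```

Changing `confidence` or `n` shows directly how the limits widen or narrow.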
Hope I haven’t put anyone to sleep.
Ed
Message: 96557
Posted by: Darth
Posted on: Tuesday, 4th July 2006
Can confidence intervals be calculated for distributions other than the normal distribution? Do large sample sizes form this normal distribution? That is, if I take a single sample of 50, will the individual data points form a normal distribution as you seem to indicate? If I now take one sample of size 150 will that now form a normal distribution? Does the t distribution approximate a normal distribution when samples are large as you state or small? I'm confused.
Message: 96751
Posted by: Ed
Posted on: Friday, 7th July 2006
The t-distribution is mainly for when you have a small sample size (you measure something 8 or 9 times). I'm not sure what the exact number is, but if you have a large enough sample, you will get a normal distribution. This is only true for something that is truly random and based around a mean (average). For example, if it takes the average McDonald's worker 1 minute to make a burger and you record the times of 50 workers making a burger, the results will have a normal distribution with a mean of 1 minute (the top of the bell would be at the 1 minute mark on the bottom axis and the curve would fall away on both sides from there). The standard deviation is a measure of what shape the bell actually is (skinny/wide, tall/short...).
Message: 96772
Posted by: Hans
Posted on: Sunday, 9th July 2006
Suresh,
If I understand your research proposal correctly, you are not dealing with a question that requires "confidence intervals" to test your hypothesis that an observed distribution fits a theoretical distribution. What you need is a simple goodness-of-fit statistic, for example via a chi-square or Kolmogorov-Smirnov test, that tests the hypothesis that the observed and theoretical distributions differ (alternative hypothesis) or don't differ from each other (null hypothesis). If you cannot reject the null hypothesis, you assume that the distributions don't differ and publish your findings. Historically, the chi-square goodness-of-fit statistic precedes the idea of the confidence interval by about 30 years. It was developed by Karl Pearson to classify different types of distributions at the turn of the century. The nicest program that I have found to run such a test is NCSS (Number Cruncher Statistical System), where you simply plug in the theoretical frequencies and the observed frequencies and the chi-square statistic with degrees of freedom is automatically calculated. The procedure is "hidden" under "descriptive statistics", "multinomial tests". I hope this helps.
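For readers without such a program, the chi-square goodness-of-fit statistic itself is simple to compute by hand: sum (observed - expected)^2 / expected over the classes. The Python sketch below does just that. The frequencies are invented purely for illustration and are not runoff data, and the degrees of freedom assume no distribution parameters were estimated from the data.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of classes")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up frequencies in four classes, for illustration only.
observed = [18, 22, 30, 30]
expected = [25, 25, 25, 25]

statistic = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1  # assumes no parameters were fitted
critical_5pct_df3 = 7.815               # tabulated 5% critical value for 3 degrees of freedom

print(f"chi-square = {statistic:.2f} with {degrees_of_freedom} degrees of freedom")
print("reject null" if statistic > critical_5pct_df3 else "cannot reject null")
```

For a different number of classes, the critical value has to be looked up again in a chi-square table for the matching degrees of freedom.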
Message: 96799
Posted by: Suresh Babu
Posted on: Monday, 10th July 2006
Dear Hans,
Thank you very much for your kind reply.
I have already done the tests based on KS1 and chi-square, as you noted in your reply, and I am now proceeding with writing up my research work.
Once again, my thanks to you, sir. If you provide your contact info, I can discuss these statistics further; I plan to write a research paper on this with your guidance. I am at the stage of my thesis submission and may possibly finish it within two months.
Looking forward,
suresh babu.
Message: 96814
Posted by: Hans
Posted on: Monday, 10th July 2006
Send me an e-mail to morphologi
aol.com. I will see what I can do for you.
Message: 102507
Posted by: newtoo
Posted on: Tuesday, 10th October 2006
Hi, I've been going through this thread and got very good insights into explaining confidence intervals. In calculating sample size (Estimating Means), what is the best and simplest way to explain the "Specified Precision of the Estimate" (+/- D)? Say at a 95% confidence level and with a currently specified std dev.? Thanks.
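One hedged way to picture D is as the "+/-" half-width in the Mean +/- z x (standard deviation / square root of n) calculation shown earlier in this thread, turned around: fix the half-width you can tolerate and solve for n, giving n = (z x standard deviation / D)^2. The Python sketch below assumes that reading; the function name is made up for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sd, d, confidence=0.95):
    """Smallest n for which z * sd / sqrt(n) is no larger than d."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sd / d) ** 2)

# Standard deviation 5: how many samples for +/- 0.98, and for +/- 0.5?
print(sample_size_for_mean(sd=5, d=0.98))
print(sample_size_for_mean(sd=5, d=0.5))
```

Because n grows with the square of 1/D, asking for half the "+/-" costs about four times the sample.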
Message: 121444
Posted by: sarikatyagi
Posted on: Friday, 6th July 2007
Ron, it's probably too late for your purpose, but I have worked with the most intuitive Six Sigma analysis tool - 'SPCXL'. It's economical and wonderful for basic Six Sigma analysis (sample size, plotting graphs, confidence intervals and much more). You can get a trial version free for about a couple of weeks. It's worth exploring for anyone who doesn't need to get into the detailed statistics and formulas behind such calculations.
Thanks,
Sarika Tyagi
Message: 121445
Posted by: sarikatyagi
Posted on: Friday, 6th July 2007
Gabriel, this is an amazing explanation: exactly how I would teach folks and how I learnt it in my Black Belt training!
In short,
1) A confidence interval can be computed if you are analyzing a random sample from your population
2) Confidence Interval is always a range which is impacted by following factors
a) Sample size: The larger the sample size, the smaller will be the Confidence Interval range
b) Confidence level: The lower the confidence level (90% or 95% etc), the smaller will be the confidence Interval
Message: 121446
Posted by: sarikatyagi
Posted on: Friday, 6th July 2007
Cont...
c) Standard Deviation: The lower my sample's standard deviation, the tighter (smaller) will be my Confidence Interval.
Thanks,
Sarika
Message: 121454
Posted by: anu
Posted on: Friday, 6th July 2007
Hmmm
very bookish.......
Message: 121456
Posted by: sarikatyagi
Posted on: Friday, 6th July 2007
Yep, that's true, but it will still help you in gauging your trade-offs (whether you want to go with a smaller sample size or a higher confidence level) when you are actually calculating the confidence interval.
Message: 121478
Posted by: hacl
Posted on: Saturday, 7th July 2007
The best way I have seen for demonstrating a confidence interval is with the dynamic illustrations using JMP. You basically tell JMP some information like the true population mean and the sigma, as well as sample sizes, alpha level, etc. It then draws 100 separate samples from this known distribution, computes the confidence interval for each of the 100 samples and generates a plot as it goes along. You will see that approximately 95 times out of 100, the confidence interval will encompass the true mean that you specified.
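The same kind of demonstration can be roughed out without JMP. The Python sketch below is not the JMP script and makes its own assumptions: samples are drawn from a normal distribution with a known sigma, the interval uses z, and the default mean, sigma and sample size are arbitrary illustrative values. It also keeps every raw sample so they can be inspected afterwards.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def simulate_intervals(true_mean=50.0, sigma=10.0, n=25, trials=100,
                       confidence=0.95, seed=None):
    """Draw `trials` samples and return (sample, low, high, covered) for each."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)  # known-sigma (z) interval half-width
    results = []
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        centre = mean(sample)
        low, high = centre - margin, centre + margin
        results.append((sample, low, high, low <= true_mean <= high))
    return results

results = simulate_intervals(seed=2007)  # hypothetical seed
covered = sum(r[3] for r in results)
print(f"{covered} of {len(results)} intervals contain the true mean")
```

The count varies from run to run around 95 out of 100, which is the point of the demonstration: the 95% describes the procedure over many samples, not any single interval.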
http://www.jmp.com/support/downloads/jmp_scripting_library/education/index.shtml
Message: 121480
Posted by: JMP question
Posted on: Saturday, 7th July 2007
hacl,
Does JMP have an option to store the 100 samples so that one could compare individual samples against the 100 samples? In the past, I have done this in Minitab by hand, and as you can imagine, that is quite a tedious task. Thanks!
Message: 121481
Posted by: hacl
Posted on: Saturday, 7th July 2007
When I run the JMP script, I don't see an option for saving a data table for the 100 samples. This is easily done, though not by me!!
If you search on "Predictum", you will locate a resource who could enhance this script where it could generate a data table in addition to the dynamic illustration.
I am not quite sure what you mean by individual samples against the 100 samples though. Are you just indicating that you want to see the raw data for all 100 samples?
Message: 121482
Posted by: JMP question
Posted on: Saturday, 7th July 2007
Thanks for the response. Yes, what I would like to see are the individual 100 samples stored in columns so that I could run additional statistics on the individual samples and compare samples to samples and samples to the population.
Message: 121485
Posted by: Bower Chiel
Posted on: Sunday, 8th July 2007
I sometimes use the following demo in training sessions. I generate and print a random sample for each member of the group from, say, the normal distribution with mean 53 and standard deviation 4 using Minitab. For fun I bring a sealed envelope into the room with the mean written on a slip of paper enclosed. I then ask each member of the group to compute both 50% and 95% confidence intervals for the mean of the population sampled. (If Z is to be used then I reveal the population standard deviation.) The sealed envelope is then opened and we check how many of each set of intervals actually capture the 53. With a group of 20 one would expect only one of the 95% C.I.s to fail to capture 53, but you may well find no fails with such a group. Although I've only ever seen one reference to 50% confidence intervals in practice, they do provide a useful demonstration in this context - and finding the appropriate value of z or t to use in the computations is good reinforcement for understanding.
Bower Chiel
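A small Python sketch of the numbers behind this demo, covering the z case only: the z-values for 50% and 95% intervals, the expected number of intervals in a group of 20 that miss the true mean, and the chance that none of them miss. The function name is made up for illustration.

```python
from statistics import NormalDist

def z_for(confidence):
    """Two-sided z-value for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

group = 20
for confidence in (0.50, 0.95):
    expected_misses = group * (1 - confidence)
    p_no_misses = confidence ** group  # every interval captures the mean
    print(f"{confidence:.0%}: z = {z_for(confidence):.3f}, "
          f"expected misses = {expected_misses:.0f}, "
          f"P(no misses) = {p_no_misses:.3f}")
```

For the 95% intervals the chance of no misses at all in a group of 20 is about 36%, which is why finding no fails is not surprising, while about half of the 50% intervals should miss.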
Message: 148688
Posted by: Miguel
Posted on: Thursday, 6th November 2008
Hello Gabriel, could you explain to me the difference and the relationship between standard deviation and standard error?
I ask you because I have been reading your explanations in other threads, and you gave very clear and understandable explanations. I am a dummy in math.
Regards,
Miguel