iSixSigma

Sample size…Why 30?


This topic contains 102 replies, has 86 voices, and was last updated by  Karim 9 years, 9 months ago.

  • #29909

    DT
    Participant

We recently concluded a GB training, and the question came up of why a sample size of 30 is suitable and where it came from. (What if my population was less than this, or the testing was destructive in nature?) I'm lost, as there is conflicting data. One reference says determining the sample size depends on (1) the level of confidence, (2) the margin of error tolerated, and (3) the variability in the population studied. Another says
n = (z·s/E)². These do not take into consideration the population, do they? If I were faced with a transactional example and had 100,000 accounts (population) that were in default and wanted to sample them to determine how many had credit scores less than 615, what sample size would be reflective of the population without offsetting costs by time spent gathering data?
When I pose this question (thinking 30 may not reflect my population) I remain unsatisfied. I'm told that it's due to large samples related to the Central Limit Theorem, or that a typical run chart that is in control is stable after 30 data points and that's why 30 is used. I can't find this in any reference materials… can anyone help?? Please provide an example if needed. I need layman's terms, as I am not too statistically inclined.
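For readers who want to try the formula DT quotes, here is a minimal Python sketch of the standard proportion version, with a finite population correction as one common way to account for a population of known size such as the 100,000 accounts (values below are illustrative):

```python
import math

def sample_size_proportion(p_guess, margin, confidence, population=None):
    # z critical values for common two-sided confidence levels
    z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}[confidence]
    n = (z ** 2) * p_guess * (1 - p_guess) / margin ** 2
    if population is not None:
        n = n / (1 + (n - 1) / population)  # finite population correction
    return math.ceil(n)

# 100,000 defaulted accounts, worst-case p = 0.5, +/-3% margin, 95% confidence
print(sample_size_proportion(0.5, 0.03, 0.95, population=100_000))  # ~1056
```

Note how the answer depends on the margin of error and confidence chosen, not on any universal "30".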

    0
    #77364

    Sambuddha
    Member

    DT,
You ask a very good question. There are various responses based on the situation, the tool you are using, and the type of data.
One reference says determining the sample size depends on (1) the level of confidence (2) margin of error tolerated, and (3) variability in population studied.
The above reference is right in general. Formulas for sample size calculation vary depending on the test you are going to conduct.
The parameters/issues you need to address are:

    Type of test e.g. 2 sample T, Z, ANOVA etc
    Standard Deviation (variability) of the process
    Delta that is significant in distinguishing 2 or more effects
    Alpha (level of significance of the test)
    Power of the test (1-beta). Beta is the probability of type-II error
Number of levels (ANOVA), in case you know how many levels/effects you are aiming to distinguish.
    Sample size
The interesting part is that Minitab (assuming you would use it) allows you to vary any 2 parameters from delta, power and sample size for any given number of levels. Try Stat > Power and Sample Size > ANOVA, or the tool you want to use.
That lets you know the error (or lack thereof, since you are measuring power) associated with each sample size and delta for any given setting. So you could make a trade-off study and see where your sweet spot lies. In cases where testing involves capital & consumables, this is a great tool. In your case you have the data, so it is not resource intensive that way. Still, this is better than using 30 blindly.
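For those without Minitab, a rough equivalent of that trade-off study can be sketched in Python with statsmodels (assumed installed); a two-sample t test is used here purely as an illustration:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# power at a fixed standardized difference (Cohen's d) for several n
for n in (10, 20, 30, 50, 100):
    pw = analysis.power(effect_size=0.5, nobs1=n, alpha=0.05)
    print(f"n per group = {n:3d}  power = {pw:.2f}")

# or solve the other way round: n needed for 80% power at delta = 0.5 sigma
print(analysis.solve_power(effect_size=0.5, power=0.8, alpha=0.05))  # ~64 per group
```

Sweeping delta and n this way reproduces the "sweet spot" analysis described above.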
There is a reason 30 is widely used. It is a result of simulation studies involving the Central Limit Theorem. If you are interested in the "history" or the reason for the prevalence of 30 samples as a guideline, I could give you a few pointers.
    I have a project that is similar in tool usage. There are quite a few neat things one could do with power and sample size studies.
    Good luck,
    Sambuddha 
     
     
     

    0
    #77365

    DT
    Participant

I would be VERY interested in the pointers for the history or relevance of the 30 samples… if it's easier to email…
    darrell_tomlin@bcgroup.com

    0
    #77366

    Sambuddha
    Member

    DT,
    Check your email. I have sent some information.
    Hope that helps.
    Best,
    Sambuddha

    0
    #77368

    Hrishi
    Participant

    Sambuddha:
    Hi. I am very new to this forum.
Can you send me the information that you sent to DT on why a sample size of 30 is required? I too am curious.
    Thanks.

    0
    #77369

    Sambuddha
    Member

    Hrishi,
No problem. Post your email address. Either DT or I could email you.
The reason I cannot post it here is that it is a scanned picture attachment, and it is perhaps easier to email.
    Best and welcome to this community,
    Sam

    0
    #77376

    Gabriel
    Participant

    Sambuddha
    You can attach it here and share it with all the forum. It would be great!
    Just click on the clip here at the right, where you read “Post/attach document”. It will lead you to send an email to iSixSigma with the attachment and they will post the attachment here!
    Thanks for sharing!

    0
    #77377

    DD
    Participant

    Sambuddha
Yes, as Gabriel says, you can post it on this site. I am curious too.
    Thanks for sharing
    DD

    0
    #77379

    Sambuddha
    Member

    Gabriel, DD
I thought of posting it here. Attaching was a small hassle. But it looks like I have a bigger problem. It is a scanned picture of some graphs. And mea culpa, I cannot find the reference I took it from. I am buried amidst a bunch of books and I can't find it.
I have no problem sharing it with you all individually through email. But I am afraid that if I post it in a public manner without credits, I might be in trouble for copyright violation, for public distribution of intellectual property.
    The good news is that the following website illustrates the same thing.
http://www.statisticalengineering.com/central_limit_theorem.htm
Public domain is great, isn't it? Hopefully that will satisfy your curiosity. The number 30 came as a result of simple sampling simulations from different parent populations (uniform, normal, exponential, triangular): by the time the sample sizes reached 30-32, the distribution of the means started looking normal. That is the reason for the rule of thumb.
I haven't seen any theoretical explanation yet for that, i.e. what is so special about 30 from an analytical point of view. I shall let you all know if I come across anything to that effect.
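For anyone who wants to reproduce that simulation rather than take the scanned graphs on faith, here is a small sketch (the parent distribution and seed are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
for n in (2, 5, 10, 30):
    # 10,000 samples of size n from a heavily skewed parent (exponential)
    means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    # skewness of the sample means shrinks toward 0 (the normal value) as n grows
    print(f"n = {n:2d}  skew of sample means = {stats.skew(means):+.2f}")
```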
    Hope it still helps.
    Best,
    Sambuddha

    0
    #77399

    aush
    Participant

Sambuddha,
Can I also share in the info on the sample size of 30? I would appreciate you emailing me the same at
    piyush_a@hotmail.com
     

    0
    #77410

    Rajanga Sivakumar
    Participant

Mr. Sambuddha,
Could you share the sample size 30 information with me too? Thanks.
    email to rajanga@sify.com
    Rajanga
     

    0
    #77412

    Ted
    Member

Making the assumption that, even with the ability of most software to sort and count the numbers, you still wish to sample your population of 100,000 records, there are two questions you have to ask: how many do I take, and what risk can I assume of making the wrong assumption from the statistic?
A number of answers here address why 30 samples are needed to approximate a normal distribution, allowing for estimation based on the probabilities of the normal curve. However, once the mean and std dev have been estimated, and the cumulative probability found up to and including the critical limit you set, the second question comes into play: specifically, how sensitive are you to making an error in assigning that proportion to your population?
As an example, say 6% of your population was expected to fall below your cut-off; how sensitive are you to the possibility that the true proportion isn't 7% or 8% or 10%, etc.? You would need to calculate the beta risk of assuming the proportion at 6% given your original sample size and the statistics you calculated. If you utilize Minitab (or other software perhaps) you can adjust the minimum sample size you need to take for the risk you choose. Under the power and sample size tab – 1 proportion test, you can enter both the calculated proportion (as a percentage) and the critical proportion, along with the level of risk (beta), and it will calculate the number of samples you need to take. Go back, resample to that level, run the calculation again to find the proportion defective (credit levels below your cut-off), and rerun the beta again with the new numbers. The process is iterative until you are satisfied with the number of samples vs the risk you are willing to assume. Therefore you might start out with a sample of 30, find that the beta risk is too high and have to take 400 samples, do so and recalculate and find that you actually need 435, etc. Others here might have a better way to adjust for risk and sample size without all the iterations, but that's the only way I've found to consistently do it.
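As a non-iterative companion to that Minitab procedure, the one-proportion sample size can also be sketched in closed form with the usual normal approximation (the 6% vs 8% figures below just reuse the illustration above):

```python
import math
from scipy.stats import norm

def n_one_proportion(p0, p1, alpha=0.05, power=0.80):
    # normal-approximation sample size for testing p0 against p1
    z_a = norm.ppf(1 - alpha / 2)   # two-sided significance
    z_b = norm.ppf(power)           # power = 1 - beta
    num = z_a * math.sqrt(p0 * (1 - p0)) + z_b * math.sqrt(p1 * (1 - p1))
    return math.ceil((num / (p1 - p0)) ** 2)

# expected 6% below the cut-off; how many samples to detect a true 8%?
print(n_one_proportion(0.06, 0.08))  # ~1200, far more than 30
```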
My other question for you, however, is what do you plan on using the data for? Be careful if the intent is to show that you get higher numbers of defaults with credit scores below a certain number, using those accounts in default as your population for the hypothesis. Your choice of frame for the population would be wrong in that kind of test.
    hope that helps.

    0
    #77415

    Picklyk
    Participant

    Hi Sambuddha,
    Could you also share the sample size of 30 info with me as well?  Please email morwickj@aol.com.  Thanks!
    Jay
     

    0
    #77425

    zhou
    Participant

    Dear Sambuddha,
I was only reading about this discussion topic from the newsletter link today; could you also send me the 30 sample size information in a separate email as well? Appreciate it.
    Best regards,
    Lawrence.

    0
    #77427

    Glenn Gooding
    Participant

    Sam,
     
Along with a great many of our colleagues, I would be interested and grateful if you could let me have a copy of the information about the rationale of the sample size of 30.
    My e-mail address is: –
    glenn.gooding@bespak.co.uk
    regards
    Glenn
     

    0
    #77429

    Ja
    Participant

    Dear Sambuddha,
Could you also send me the 30 sample size information in a separate email as well? Appreciate it. jamackall@statestreet.com
    Best regards,
    JA.

    0
    #77431

    Nicholas L. Squeglia
    Participant

In layman's terms, if you were to prepare a graph on the basis of attribute data, plotting sample size vs confidence, you would see that there is quite a difference from, for example, 2 to 30. This is not the optimum, but more of a minimum. 50 would perhaps be a better choice, and is what Dorian Shainin used in his "lot plot" many years ago. The slope of the curve increases after 30/50, but at a much lower rate. The central limit theorem is somewhat different, and relies on taking averages of data to show a normal, Gaussian, distribution for control chart purposes although the underlying data is non-normal.
Nicholas L. Squeglia, author, Zero Acceptance Number (c=0) Sampling Plans
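A quick numerical sketch of the curve described above (95% confidence, worst-case p = 0.5, Wald approximation; the steep gain from 2 to 30 and the flattening after 30/50 show up directly in the interval half-width):

```python
from math import sqrt

p = 0.5  # worst case for a proportion
for n in (2, 5, 10, 30, 50, 100, 200):
    half_width = 1.96 * sqrt(p * (1 - p) / n)  # 95% Wald half-width
    print(f"n = {n:3d}  +/- {half_width:.3f}")
```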

    0
    #77432

    CBetts
    Participant

    Sambuddha,
     
    Could you please forward the Sample size information to me as well.  cedric.betts@scotts.com   Thank you.
    Cedric

    0
    #77442

    Janet Hunter
    Participant

I believe you are correct about the Central Limit Theorem, at least as I recall from my statistics classes a few years ago. You may want to contact the local college and speak to one of the professors in the mathematics department for further direction or confirmation.
     

    0
    #77444

    Mike Carnell
    Participant

    DT,
I have not read the entire string, so if some of this is redundant I apologize. Sam gave a good answer when he said it was different for different situations.
Assuming that everything works off of 30 is incorrect.
Frequently you will see variable control charts listed as a sample size of 25-30. They typically are speaking of a sample size of 25-30 groups of 5. That makes it 125-150 actual samples. It is the subgrouping giving you a distribution of averages (Central Limit Theorem) that makes it work.
When you are doing hypothesis testing and using ANOVA, the sensitivity of the test is extremely dependent on sample size. I was doing site support and found a guy who could not understand why his 2 sample t test was showing significance. He was sure it should not. His sample size was >400. The test was sensitive to less than a 0.1 sigma shift.
There are sample size implications with virtually every tool. 30 is not a catch-all, particularly if you are working with attribute data.
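A rough check on that anecdote, using the usual normal approximation: the smallest standardized shift a two-sample test will flag at alpha = 0.05 with 80% power, for various group sizes (a sketch; at lower power even smaller shifts will reach significance):

```python
from scipy.stats import norm

def detectable_shift(n, alpha=0.05, power=0.80):
    # smallest shift (in sigma units) a two-sample test detects, normal approx.
    return (norm.ppf(1 - alpha / 2) + norm.ppf(power)) * (2 / n) ** 0.5

for n in (30, 100, 400, 1600):
    print(f"n per group = {n:4d}  detectable shift ~ {detectable_shift(n):.2f} sigma")
```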
    Good luck.

    0
    #77447

    Antero
    Participant

    Sam:
     
    Can also send me your e-mail response to the question dealing with a sample size of 30?

    0
    #77460

    Ron
    Member

Don't confuse population sampling with process sampling; these are two very different animals. You need population variation, power, etc. when considering population sampling. When process sampling, the purpose of taking 30 samples is to establish these issues with reasonable certainty and develop control limits. These limits remain constant unless significant changes are made to the process. It is common in SS training that the true essence of the mathematics behind these issues is lost. Process sampling also assumes a process that is in statistical control. If it is not, stop, fix the process, then proceed.

    0
    #77514

    Bahram
    Participant

    Dear Sambuddha,
    Could I trouble you to send me the information also?
    bahram.khyltash@mkg.com
    Thanks
    Bahram

    0
    #77515

    Dewayne
    Participant

    Sambuddha,
    I, too, would appreciate your sharing/sending the information on the sample size of 30. Thanks.
    dburns@verityinst.com

    0
    #77631

    NB
    Participant

I have also not read all the previous replies to this subject, but from my experience in statistics, the reason why 30 or 31 has always been the magical number is that the Student's t distribution approaches the normal z distribution at 30 samples. Hope this helps.
    -NB
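That observation is easy to check numerically (scipy assumed); the t critical value converges on the z critical value as the degrees of freedom grow, and by df around 30 the gap is only a few percent:

```python
from scipy.stats import t, norm

z = norm.ppf(0.975)  # two-sided 95% critical value, 1.960
for df in (5, 10, 30, 100):
    print(f"df = {df:3d}  t = {t.ppf(0.975, df):.3f}  (z = {z:.3f})")
```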

    0
    #77649

    H.Kirchhausen
    Participant

Hi all,
It would be great if you could also send me the information or an example of the magic sample size of 30!
Thanks in advance.
    send it please to
    saolim@gmx.de
     
     
     

    0
    #77657

    sw
    Member

    Hello Sambuddha,
    Appreciate if you can email the 30sample size info to me, too.
    My email add is: swtan1@hotmail.com

    0
    #77711

    Ged Bryant
    Participant

Want to test the magic number 30? Take any group of 30 people; it's a good party trick. Bet anyone present that two or more of the group will have the same birthday, month and date. It works about 70% of the time.

    0
    #77712

    O’Connell
    Participant

I did that game in a training class once. It worked!
But how does it work? I'd love to know, so I can sound intelligent next time I do it :).
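The trick works because the probability that all n birthdays are distinct shrinks fast as n grows; with 30 people the chance of a match is about 71%, and it takes roughly 50 people to pass 97%. A sketch, ignoring leap years:

```python
def p_shared_birthday(n):
    # 1 - P(all n birthdays distinct), 365 equally likely days assumed
    p_distinct = 1.0
    for k in range(n):
        p_distinct *= (365 - k) / 365
    return 1 - p_distinct

for n in (23, 30, 50):
    print(f"n = {n}  P(shared birthday) = {p_shared_birthday(n):.3f}")
```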

    0
    #78161

    julio jaime
    Participant

    Sambuddha:
    Hi. I am very new to this forum.
Can you send me the information on why a sample size of 30 is required? I too am curious.

     
    Thks.

    0
    #88659

    Allen Jacque
    Participant

    I am interested in receiving the articles identified and originated from Sambuddah that address the sample size of 30 issue.
    My email address is ajacque@bkadvice.com

    0
    #88752

    Mark Chockalingam
    Participant

    There are several web references on the Central Limit theorem and 30 that are interesting.  When the sample size approaches 30, we don’t have to worry about the distribution of the population since it can be safely assumed to be normal for inference purposes.  Here are some references:
    http://www.mathwizz.com/statistics/help/help4.htm
    http://www.statisticalengineering.com/central_limit_theorem.htm
    Here is a little more technical article on normal distributions and central limit theorem.
    http://www.itl.nist.gov/div898/handbook/index.htm

    0
    #88755

    Thanachai
    Member

Mr. Sambuddha,
May you please send me the pointer on the sample size of 30? I'm very curious to know.
Thanachai S
thanachs@samarts.com

    0
    #88837

    Statistician
    Member

    Mr. Sambuddha,
    I am a statistician by profession and as far as I know, sample size is determined by margin of error allowed, the estimate for the population variance, the risk factors (level of confidence, power as a function of the OCC, etc.), and most importantly, the assumed distribution of the population (or the estimable function) in study.
    In my own experience, the magic number 30 is being used to approximate the normal distribution using the Central Limit Theorem, as used in regression analysis, factor analysis, etc., but not in sample size determination. 
I am also curious about this article. Would you be kind enough to send me a copy, too? Also, would you happen to know/recommend six sigma training centers in the Philippines?
    Thanks,
    Beryl
    CRomero@doleasia.com
     

    0
    #90196

    Vicki
    Member

    Hi Sambuddha,
    You have been inundated with requests for this information on n=30, I am also a statistician and would really appreciate this information!  Thanks.
     

    0
    #90200

    PH
    Participant

I would also like to receive the information/pointers. Can you email them to me at the address below?
    herrerap@sybrondental.com
    Thanks!
    PH

    0
    #90282

    Sinnicks
    Participant

I have had the same question arise in the past. Mark L. Crossley wrote a good article titled "Size Matters: How Good Is Your Cpk Really?", located at http://www.qualitydigest.com/may00/html/lastword.html, that seems to address your question quite well. When Mr. Crossley's equations are rearranged, you can look at plots of Cpk vs sample size, with various lines of constant Cpk and specific confidence intervals. For example, after generating the curves, one is able to directly determine the sample size required for a desired Cpk of 2.0 with 90% confidence. In playing with the equations, it was interesting to note the confidence level obtained for a 2.0 Cpk using a common sample size of 30.
    I hope that helps.
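As a rough stand-in for Mr. Crossley's curves, Bissell's commonly cited normal approximation for a Cpk confidence interval gives the flavor (this approximation is an assumption on my part, not the article's own derivation):

```python
from scipy.stats import norm

def cpk_interval(cpk_hat, n, confidence=0.90):
    # Bissell-style approximate standard error for an estimated Cpk
    z = norm.ppf(0.5 + confidence / 2)
    se = (1 / (9 * n) + cpk_hat ** 2 / (2 * (n - 1))) ** 0.5
    return cpk_hat - z * se, cpk_hat + z * se

lo, hi = cpk_interval(2.0, 30, 0.90)  # how sure is Cpk = 2.0 from n = 30?
print(f"90% CI for Cpk with n = 30: ({lo:.2f}, {hi:.2f})")  # roughly (1.56, 2.44)
```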

    0
    #90663

    mcintosh
    Participant

I would also like to receive the information/pointers about the sample size of 30. My email address is feteris@hotmail.com
    thanks!
    Tom

    0
    #90933

    Stella
    Member

Hi Sam, add me to the distribution list please! This has been going for quite a long time, hasn't it? Thank you in advance!
    huang.zhaohui@zte.com.cn

    0
    #93129

    Haim
    Participant

Dear Sambuddha,
    I too am interested in the “Why 30?” discussion.  Please e-mail me at:
    haim@thepalace.org
    Thank you,
    Haim

    0
    #93132

    Rocky Firth
    Member

I would also like to see the information. I can post it to a web location for others as well.

    0
    #93145

    Mikel
    Member

    I don’t believe 30 came from simulations involving the CLT. Please post some backup to this assertion. Sounds like Dr. Mikel’s proof of the 1.5 shift.
    By the way, what advice do you give on choosing sample size when you are interested in reducing sigma instead of moving the mean?

    0
    #101346

    singh
    Member

    Please let me know why 30
     
    SK

    0
    #104412

    SATTHISH KUMAR
    Member

Dear Sambuddha,
Thank you for your reply to that query. Now I am interested to know how variables like slope, linearity, bias, and uncertainty relate to instrument repeatability and reproducibility.
Can you send the same to my mail id sathish_801980@yhoo.com
Expecting your reply
     
    REGARDS
     
    R.L.SATTHISH KUMAR
     

    0
    #105409

    Simon Wei
    Member

I would be VERY interested in the pointers for the history or relevance of the 30 samples

    0
    #107505

    Sankar
    Participant

    Pls email me the info for “why sample = 30”

    0
    #108297

    Surya Gade
    Member

    Sambuddha:

    Hi. I am very new to this forum.
Can you please send me the pointers or the links that you sent to DT on why a sample size of 30 is required? That's the same question I have had for a long time.
    Thanks.
    Surya

    0
    #108298

    Surya Gade
    Member

Sambuddha:
    I forgot to mention my e-mail in my previous message…please e-mail to the following address…lamarcardinal@yahoo.com
    Thank you for sharing.
    Surya

    0
    #108736

    Mark Chockalingam
    Participant

    Surya,
There are several web references on the Central Limit Theorem and the sample size of 30 that are interesting.  When the sample size approaches 30, we don't have to worry about the distribution of the population, since it can be safely assumed to be normal for inference purposes.
Remember, for interval estimation the standard error is computed from the sampling distribution of the mean.  When the sample size approaches 30, the sampling distribution approaches normality.  Here are some references:
    http://www.mathwizz.com/statistics/help/help4.htm
    http://www.statisticalengineering.com/central_limit_theorem.htm
    Here is a little more technical article on normal distributions and central limit theorem.
    http://www.itl.nist.gov/div898/handbook/index.htm
    Mark Chockalingam

    0
    #108739

    Robert Butler
    Participant

     
    Mark,
The central limit theorem applies to the mean, not to individuals – 30 samples from a log normal distribution will not suddenly become normal.  The distribution of 30 averages of data from a log normal distribution, however, will be.  To this end, the first citation you mentioned (and as quoted below) is in error.  The second and third citations, however, are correct.  (Note: some of the text from your citations doesn't copy over to the forum page, so I had to rewrite the equation in the first citation.) I also took the liberty of highlighting in order to emphasize the focus on distributions of means and not individuals.
    #1 The Central Limit Theorem says that if you have a random sample and the sample size is large enough (usually bigger than 30), then
    Z = (sample avg – pop avg)/(s/sqrt(n))
    where Z is the standard Normal distribution with m = 0 and s = 1. This comes in really handy when you haven’t a clue what the distribution is or it is a distribution you’re not used to working with like, for instance, the Gamma distribution.
     
    #2 The distribution of an average tends to be Normal, even when the distribution from which the average is computed is decidedly non-Normal.
    Thus, the Central Limit theorem is the foundation for many statistical procedures, including Quality Control Charts, because the distribution of the phenomenon under study does not have to be Normal because it’s average will be
     
    #3 The central limit theorem basically states that as the sample size (N) becomes large, the following occur:

    The sampling distribution of the mean becomes approximately normal regardless of the distribution of the original variable.
The sampling distribution of the mean is centered at the population mean, μ, of the original variable. In addition, the standard deviation of the sampling distribution of the mean approaches σ/√N.

    0
    #108749

    Mark Chockalingam
    Participant

    Rob,
Thanks for copying and pasting from the source.  However, I humbly submit that it is inappropriate to quote the content without acknowledging the source.  I agree it reads easier on one page, but still it is important to insert the name of the source.
Now as to your point on the error, I don't see it.  Maybe it is semantics.  The CLT is a statement on the sampling distribution of the mean, NOT on the sample or the original population itself.  When the sample size approaches 30, the sampling distribution approaches normality regardless of the original distribution.
Now for interval estimation, the big leap that is made in practice is to assume that the sample standard deviation is a sufficient estimate of the population standard deviation.  Is this what you disagree with when the original citation in #1 gives the formula for the standard normal deviate?
    Good discussion.
    thanks,
    Mark

    0
    #109016

    John C
    Participant

    Hi Sambuddha,
    Kindly mail me the same at  john.chandra@wipro.com
    Thanks,
    John C

    0
    #111071

    Paul C
    Participant

Grateful if you could send on the background to the sample size of 30.
     
    Thanks ..
     
    PC

    0
    #111097

    SemiMike
    Member

Those are great URLs for any beginner to study.
    One might use some “rules of thumb” based on practical experience, as well as the more rigorous statistical methods.
For example, if the DATA is to be from a MECHANICAL process for making discrete parts, then one should first try sampling the FAMILIES of possible variation, using sample size 2 for each family, per Shainin's recommendations.  2 sites on each of 2 parts, repeated every hour for 2 shifts perhaps, then graphed.  Once it is clear which family of variation is the main problem, then SPC sampling (subgroups measured over time) can be used IF the problem is temporal.  But if the problem is variation WITHIN the parts, then perhaps closer stratification of data is needed, or measuring more sites per part, or comparing that variation for all similar machines, or looking at tool wear trends of this "within-part" spread over time.  Means and ranges both can drift with tool wear.  Sampling for mean data involves the famous sample size of 30 (or 15) for OOC determinations. Sampling for changes in variance requires much larger samples.  So a wandering mean is not the same as a wandering variance.  Think 1000 parts.
RULE OF THUMB for SPC chart startup is given as 15 to 30, but if the process is non-stationary (drifts, has a wandering mean and unstable variance, for example) then other methods are needed.  Box and Luceno's book (Amazon.com) talks at great length about modern issues with process monitoring and adjustment methods. Assessing non-subgrouped data is another issue.
RULE OF THUMB:  Individual charts are less sensitive, less powerful (they give more false alarms for each rule added, for example) than X-bar charts.
RULE OF THUMB:  Subgroup size of 2 to 4 is common and usually adequate.  But for diagnostic reasons, many engineers use subgroups of 10 to 100 sites per part or parts per subgroup.  ASQ had a paper recently on the effect of large subgroup sizes.  In general, it invalidates the method used by most people to calculate control limits, as many of the subgroup sites or parts are CORRELATED and so the control limits would be wrong.  What is good for diagnostics is often not good for control, given various control models.  With automated gages, more data is cheap.  But how you use it depends on whether the data is really INDEPENDENT and RANDOMLY SAMPLED and IDENTICALLY DISTRIBUTED.  Most important is that INDEPENDENCE.  And if the data is AUTO-CORRELATED, it's also messy (wandering mean, showing predictability instead of randomness).
Then there is the data that comes from CHEMICAL processes, such as continuous refining.  Read Svante Wold's or Dr. John MacGregor's books on PCA/PLS multivariate methods for sensor-based data, which is a HUGE stream of data.  Rule of thumb:  Get help.
Central Limit Theorem:  Only for subgrouped data!!!
Shewhart charts:  Only for stationary processes where samples are independent!!!  (I am not a statistician, and those guys are still arguing about these issues.  See the Journal of Quality Technology, Woodall's papers, for example.)
    Don’t forget;  NIST online handbook.  http://www.itl.nist.gov/div898/handbook/

    0
    #113427

    Chelle
    Participant

Hello Mr Sambuddha,
Can you also send me the article on the 30 sample size that you sent to DT? I am also interested to know why 30. My email add is roadcrash@hotmail.com
    Thanks,
    Rechel

    0
    #113430

    Kevin Alderson
    Participant

Reference sample size 30: it is a reasonable amount to measure/analyse.
There is approximately a 10% margin of error between a sample of 30 and a sample of 500.
Of course it would be better to do 500 for accuracy, but you must take into account the cost between 30 and 500, depending on what you are measuring. Remember the std dev (10%) margin and you should be fine.

    0
    #113457

    quality_ab
    Participant

Could somebody please email the link to me at quality_ab@yahoo.com
    Thanks,
    AB

    0
    #113868

    Glo
    Participant

Hi Mr. Sambuddha,
Could you also share the information about the sample size of 30 with me? I'm very much interested. Kindly email it to glo_blue10@yahoo.com
    Thanks,
    –Glo

    0
    #113877

    DrSeuss
    Participant

    DT, let me try to answer this from a practical experience approach,
    I have also asked this question and have never receive a definitive academic answer.  Here is what I have seen from analyzing real process data.  Take a continuous process and produces data that is normally distributed (near normal is also good enough) and collect your data using rational subgroups approach. Using the Minitab Six Sigma Process report to calculate short term and long process capability. Look at report #4 or #5, it shows both the Sigma ST & Sigma Lt on a graph.  Notice how their values stabilize toward a value at the number of subgroup increase.  You will notice a flating of the curves at about 10 subgroups, then around 20-25 subgroups the curves are almost horizontal.  By the time you reach 30 subgroups the sigmas have stabilized and adding anymore subgroups will only change the sigmas in the 4th or larger decimal places.  If you are an Excel wizard, you can demonstrate this very easily also.  The idea is that after about 30 subgroups (30 data points) the variance of the data typically stabilizes.   

    0
    #118555

    Leon
    Participant

    Dear Sambuddha
    could you  please send the reference to me, thanks very much
    mzzhang@vip.sina.com
    Leon

    0
    #120057

    vee
    Member

    Dear Sambuddha – –
    I’m very interested in your projects.
    Please send me mail.
    Thank you very much.
    Sincerely,
    vee

    0
    #120058

    vee
    Member

    from vee
    my email  wsu_vee@yahoo.com
thanks again

    0
    #123533

    DEEPAK JAIN
    Participant

Sambuddha:
Please also email me; I have been looking for the answer for the last few weeks.
     
    please reply me on email deepak69748@rediffmail.com
     
    D.JAIN
    9811564123

    0
    #123564

    Manav
    Participant

    Sambuddha/DT,
    Please send me the information on sample size. I know this msg is 3 years too late, but would appreciate your or anyone’s help in getting this info to me.
    Thanks
    manavbhalla1@yahoo.co.in

    0
    #127028

    Ropp
    Participant

    When the population size is greater than 100, the normality condition is met when the sample size is greater than 30.  Increase sample size depending on the process being studied and the variability of the data produced.  30 is not a “magic” number applicable to all data sets and processes.

    0
    #127036

    Darth
    Participant

    Dave, might I suggest that you check the dates on any post that you respond to.  This is a really old one.  OK, Nick, how did I do????

    0
    #131786

    Rhex
    Member

    hi sambuddha,
     
I know that the forum thread has been going on for quite some time now and am not sure if you will receive this message, but I'm requesting and hoping that you will be able to send the reference materials to me as well.
     
    Here’s my email add: rhexryan@yahoo.com

    0
    #132163

    Kulanan
    Participant

Dear Sambuddha,
I am looking for the answer about sample size. Please kindly send me the information on why sample size = 30, because I need this information for my report; if you have more information, please reply to me. Thank you very much.
Best regards,
Kulanan
mprang@gmail.com

    0
    #138875

    sue
    Member

    Hi Sambuddha,
I'm keen to know why 30 samples too. Can you send me a copy?
    Email: cockatoos2000@hotmail.com
    sue
     

    0
    #138882

    Heebeegeebee BB
    Participant

    Sue,
    This is a FOUR YEAR OLD thread.

    0
    #138883

    Darth
    Participant

    Heck, that trumps my measly 18 month one earlier this week. 

    0
    #138884

    Mike Carnell
    Participant

    Heebeegeebee,
…and unfortunately we have not seen Sambuddah post on here for a couple of years.
    Regards

    0
    #138892

    Heebeegeebee BB
    Participant

    Yeah,
    Whatever happened to Sambuddah???

    0
    #139587

    Mahesh Kumar S
    Participant

    Sambuddha:
    Hi. I am very new to this forum.
Can you send me the information that you sent to DT on why a sample size of 30 is required? I too am curious.
    mahesh.kumar.sridhar@accenture.com
    Thanks.

    0
    #139614

    Heebeegeebee BB
    Participant

    Mahesh,
Sambuddah's last post under that nom de plume was in 2002.
It is unlikely that you will get a rise out of a 4-year-old thread.
We are still tied at 4 years, folks!

    0
    #142636

    Tan Li Ren
    Member

    Dear Sambuddha,
Could you also send me the 30 sample size information as well? Appreciate it. tanliren@hotmail.com
    Best regards, Li Ren

    0
    #143031

    Edmond
    Participant

    Dear Mr Sambuddha,
    I work in the research field and have immense interest in knowing more about the sample size of 30, would you also send me articles and reference materials on this topic by e-mail at:
    edmondhhfung@sinaman.com
    Many thanks for your sharing.
    Best regards,
    Edmond
     

    0
    #143032

    Anonymous

DT,
I first came across the n = 30 rule of thumb during a lecture by Dorian Shainin (1983). Dorian was brought to Scotland by someone called Ted Williams, who was instrumental in bringing Dorian to Motorola in Phoenix some years before.
According to Dorian, if you plot the error of the estimate of sigma as a function of n, the curve becomes asymptotic at around n = 30, where sigma can be estimated with 95% confidence. As Stan has previously pointed out, Dorian always used 95% confidence.
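A sketch of what such a curve can look like, using chi-square confidence bounds on sigma (an assumption about how that plot was built): the 95% interval narrows quickly at first and flattens out near n = 30:

```python
from scipy.stats import chi2

for n in (5, 10, 20, 30, 50, 100):
    # 95% chi-square confidence bounds on sigma, as multiples of the sample s
    lo = ((n - 1) / chi2.ppf(0.975, n - 1)) ** 0.5
    hi = ((n - 1) / chi2.ppf(0.025, n - 1)) ** 0.5
    print(f"n = {n:3d}  sigma in [{lo:.2f}s, {hi:.2f}s]")
```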
As Mike Carnell has also noted, typical X-bar and R charts use 30 subgroups of n = 3 or n = 5, which is a sample size of 90 or 150 – a far cry from n = 30.
Another issue lost on many is the use of multiple subgroups, which provide a pessimistic estimate of sigma, since both the data and the subgroup mean vary in small subgroups; so the entropy of multiple subgroups is larger than that of a single subgroup.
    No one in their right mind would estimate process capability based on a single subgroup of n = 30.
    Regards,
    Andy
     

    0
    #143034

    Hans
    Participant

    DT,
    Avoid all of the complications of interpretations of interpretations and opinions of interpretations and interpretations of opinions of interpretations and review Gosset’s 1908 article in Biometrika: “On the probable error of the mean”. From there you can make your own informed judgement about how other statisticians incorporated and adapted his work into theirs. What is it that they say in lean: Go see for yourself :-). Regards.

    0
    #147999

    Nitesh
    Participant

    Hi Sambuddha,
    Please email me the information on the theory and history behind sample size being 30
    niteshrungta@gmail.com
    Thanks,
    Nitesh

    0
    #148033

    Ashman
    Member

    It is very simple.  Harry picked 30 because it gave 1.5 in his 2003 attempt.  Other numbers give anything between 0 and 50+ for his “correction”.

    0
    #151969

    Shon Stewart
    Member

    Please forward me information about the history on a sample size of 30 as a rule of thumb.  Your help will be very appreciated.
     
     

    0
    #151970

    Confusion about 2 papers
    Participant

n = 25 has a truly statistical justification. At n = 25 the Law of Large Numbers will start to show a pronounced symmetric/normal distribution of the sample means around the population mean. This normal distribution becomes more pronounced as n is increased.

n = 30 comes from a quote from Student (Gosset) in a 1908 paper, "On the probable error of a Correlation," in Biometrika. In this paper he reviews the error associated with drawing two independent samples from an infinitely large population and their correlation (not the individual errors of each sample relative to the sample mean and the population mean!). The text reviews different corrections to the correlation coefficient given various forms of the joint distribution. In a few sentences, Student says that at n = 30 (which is his own experience) the correction factors don't make a big difference. Later, Fisher showed that the sample size for a correlation needs to be determined based on a z-transformation of the correlation. So, Student's argument is only interesting historically. Also, Student wrote his introduction of the t-test in Biometrika during the same year (his prior article). Historically, the n = 30 discussed in his correlation paper has been confused with the t-test paper, which only introduced the t-statistic up to sample size 10.

In sum, n = 30 is a rule of thumb that accidentally works. But ironically, the n = 30 for sampling from a population was confused with the n = 30 observation from correlations.

    0
    #155817

    Pramod Thomas John
    Participant

    Dear Sambudda,
Could you please mail this information to me (pointers for choosing a sample size of 30)? I recently had an interview where this question was asked and I drew a blank.
    Thank you in advance.
    Cheers
    Pramod

    0
    #160732

    Phillip
    Participant

Hi, Sam or anyone who has gotten his information about why the minimum sample size is 30: can you please forward it to me?
    Thanks,
    Phillip
    pcdwang@hotmail.com
     

    0
    #160733

    Trev
    Member
    #161859

    aparna
    Participant

    hi sambuddha,
Can you email me that presentation as well, at aparnatt@rediffmail.com?

    0
    #162271

    Robin
    Member

My background is in mathematical statistics, though several years past. I tried to read through this thread, which appears to be quite complicated, but the actual question does not seem to be answered. I am also new to Six Sigma, but I did a Google search on 30/sample size and this appeared to have a good discussion, so I will rephrase the original question and give my opinion. That question is:
1.  Is 30 some magic number that can be used as an adequate sample size for "most" purposes?
I recognize that in use there are almost always assumptions about the underlying distributions and parameters, but the calculation of power and sample size is well worked out. I can understand the assumption that the distribution of the mean for a sample size of 30 should "look fairly normal for most distributions", but the power/sample size strongly depends on the underlying variance as well as other variables that the field does not seem to have defined. I suppose that we can assume that with a sample size of 30, we have a sampling distribution that is normal with the population mean as its mean and the population variance divided by 30 as its variance. We assume that the allowable power is .8 (why not .9 or .95?), and that the allowable difference between the true mean and estimated mean is x% of the population standard deviation (again arbitrary); then with all these assumptions and the right x%, perhaps a sample size of 30 might arise as a reasonable sample size. However, we are usually more concerned with the absolute error between the sample mean and population mean, which would completely negate the possibility that there is any unique N that could satisfy an adequate sample size, since population variances have no bounds that I know of. If, however, there is some consensus that we are making all these assumptions, it should be spelled out.
Reading several of the comments, I contend that the number 30 is just some number that has nestled into the literature without any true mathematical/statistical verification. It is small enough to be practical, but it is an arbitrary number without true mathematical significance. What bothers me is that if we are talking about Six Sigma, accepting 30 as a magic number for sample size, rather than using the standard known statistical procedures to estimate the proper sample size, is anathema to the underlying concept of precision which I assume Six Sigma represents.
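That point is easy to put in numbers: for a fixed absolute precision, the required n grows with the variance, so no single n can be adequate in general. A sketch using the n = (z·s/E)² formula quoted at the top of the thread:

```python
from scipy.stats import norm

def n_for_mean(s, E, confidence=0.95):
    # n = (z * s / E)^2 to estimate a mean to within +/- E
    z = norm.ppf(0.5 + confidence / 2)
    return (z * s / E) ** 2

for s in (1, 2, 5, 10):  # same +/- 1 unit precision, growing sigma
    print(f"s = {s:2d}  n ~ {n_for_mean(s, 1.0):.0f}")
```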

    0
    #162275

    Historiography
    Participant

    I posted this response earlier. It is based on a review of Fisher’s early work.
Overall, rules of thumb were heavily introduced into statistics when it became commercialized and therefore entered the engineering field. The rules of thumb regarding estimation of the parameter are only one example where classical statisticians gave up and gave way to the more pragmatically oriented statisticians. The history of the magical numbers 22, 25 and 30 is replicated below. But other rules of thumb emerged to make the science more usable.
n = 22 was proposed by Fisher in Statistical Methods, p. 44, when he reviewed the impact of the standard deviation being exceeded once in every three trials. Twice the standard deviation is exceeded about once in 22 trials: "The value for which P = .05, or 1 in 20, is 1.96 or nearly 2; it is convenient to take this point as a limit in judging whether a deviation is to be considered significant or not. Deviations exceeding twice the standard deviation are thus formally regarded as significant. Using this criterion we should be led to follow up a false indication only once in 22 trials, even if the statistics were the only guide available. Small effects will still escape notice if the data are insufficiently numerous to bring them out, but no lowering of the standard of significance would meet this difficulty."
n = 25 has a truly statistical justification. At n = 25 the Law of Large Numbers will start to show a pronounced symmetric/normal distribution of the sample means around the population mean. This normal distribution becomes more pronounced as n is increased.

n = 30 comes from a quote from Student (Gosset) in a 1908 paper, "On the probable error of a Correlation," in Biometrika. In this paper he reviews the error associated with drawing two independent samples from an infinitely large population and their correlation (not the individual errors of each sample relative to the sample mean and the population mean!). The text reviews different corrections to the correlation coefficient given various forms of the joint distribution. In a few sentences, Student says that at n = 30 (which is his own experience) the correction factors don't make a big difference. Later, Fisher showed that the sample size for a correlation needs to be determined based on a z-transformation of the correlation. So, Student's argument is only interesting historically. Also, Student wrote his introduction of the t-test in Biometrika during the same year (his prior article). Historically, the n = 30 discussed in his correlation paper has been confused with the t-test paper, which only introduced the t-statistic up to sample size 10.

In sum, n = 30 is a rule of thumb that accidentally works. But ironically, the n = 30 for sampling from a population was confused with the n = 30 observation from correlations.

So, to your point, yes there are historical reasons, but the true reason is the need for statistics to establish itself as a useful field. Now, rules of thumb have taken over critical thinking about statistics. Six Sigma accelerated this movement.

    0
    #162276

    Grasshopper
    Participant

Aren't you clever… oh yes you are… now reread your post and update it with some additional builds to support your argument.
    Grasshopper

    0
    #162277

    Statistician
    Member

You're making progress; you can actually read now. Great accomplishment!

    0
    #162728

    Robin
    Member

So does the number 30 have significance in the use of a sample for an arbitrary population, or is it just a number that "seems" to work because no one has actually tested it?

    0
    #162729

    Quainoo
    Member

    Hello,
I would be interested to have a better understanding of how this sample size issue relates to SPC charts.

Usually the sample size of an SPC chart is 5, but my understanding is that the sample size should be determined according to the 'normality' of the underlying distribution.
If the underlying distribution is 'absolutely not normal', the sample size required might be around 30, and if the underlying data is normal, there is no need to use samples and individual data can be used.
Am I correct?
    Thanks
     
    Vincent
     

    0
    #171381

    Danny Carballo
    Participant

Can you also email this attachment?
     
    Thanks in advance.

    0
    #171403

    Tiffany Lian
    Member

    Hi, Sambuddha:
I am new to this forum, and very curious: why 30? Could you please also send me the info? Thank you very much.
    txlian@hotmail.com
    Tiffany Lian
     
     
     

    0
    #171996

    Devie
    Participant

Hey Sambuddha,
Again, I'm a newbie here. Could you please send me the info about the sample size of 30… please… thank you so much. Please send it to me at devie_cynthia@yahoo.com, as I will need it for my final paper. Thanks again!

    0
    #172011

    J
    Member

    Hi Sambuddha,
I'm new to this forum and was interested to know more about the sample size of 30; could you send me this project when you get a chance?
    Sid

    0
    #172012

    J
    Member

    Hi sambuddha,
    forgot to write my email id, quadrisyed@hotmail.com
    appreciate your help!
    Thanks
    Syed

    0
    #172048

    J
    Member

If anyone in this group has this information, please do send it to me…
    Thanks,
    quadrisyed@hotmail.com

    0
    #172134

    BelowTheBelt Certified
    Participant

Because the standard error of the mean improves as the sample size increases to 30.
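In numbers (taking s = 1): the standard error of the mean, s/√n, falls quickly at first and only slowly past n ≈ 30, which is the diminishing-returns picture behind this one-liner:

```python
for n in (5, 10, 30, 100, 1000):
    print(f"n = {n:4d}  SEM = {1 / n ** 0.5:.3f}")  # s = 1
```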

    0

The forum ‘General’ is closed to new topics and replies.