Data points.
Six Sigma – iSixSigma › Forums › Old Forums › General › Data points.
 This topic has 11 replies, 9 voices, and was last updated 19 years, 7 months ago by Bob Johnson.


April 26, 2003 at 12:47 am #32086
How many data points (5, 10, 20 or 30?) are needed to have statistical significance?
April 26, 2003 at 11:45 am #85270
marklamfu, Participant
The number of data points needed depends on the purpose of the study. To study process capability, 30 data points is the minimum, and 100 or more are generally preferable. For an X-bar control chart, you can take subgroups of 5 data points to plot the chart. For a variables sampling plan (e.g. MIL-STD-414), you take 5, 10, 15, 20… samples, depending on the lot size and AQL, to decide whether to accept or reject the lot.
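One way to see why 30 is only a floor for capability studies is to put a confidence bound on the capability index itself. The sketch below uses Bissell's normal approximation for a one-sided lower confidence bound on Cpk. It assumes normally distributed data from a stable process, and the estimated Cpk of 1.33 is a hypothetical value chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def cpk_lower_bound(cpk_hat: float, n: int, confidence: float = 0.95) -> float:
    """Approximate one-sided lower confidence bound for Cpk.

    Uses Bissell's normal approximation; assumes normal data from a stable process.
    """
    if n < 2:
        raise ValueError("need at least 2 observations")
    z = NormalDist().inv_cdf(confidence)  # one-sided z value
    se = sqrt(1 / (9 * n) + cpk_hat ** 2 / (2 * (n - 1)))
    return cpk_hat - z * se


cpk_hat = 1.33  # hypothetical point estimate
for n in (30, 100):
    print(f"n={n}: 95% lower bound for Cpk ~ {cpk_lower_bound(cpk_hat, n):.2f}")
```

Under these assumptions the 95% lower bound is roughly 1.03 at n = 30 and roughly 1.17 at n = 100, so the same point estimate says much less about the process when it rests on only 30 readings.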
April 26, 2003 at 11:55 pm #85274
Where does the minimum of 30 data points for calculating process capability come from? Need help!
May 1, 2003 at 5:07 am #85415
Hariprasad, Participant
The number 30 comes from the Central Limit Theorem (CLT), which says that as the sample size grows, the distribution of sample means approaches a normal distribution, whatever the population distribution. A sample size of 30 is a common rule of thumb for when that approximation is good enough; it is not a sharp threshold, and strongly skewed populations may need more.
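A quick simulation makes the rule of thumb concrete. The sketch below draws samples from a strongly skewed exponential population and measures the skewness of the resulting sample means. For this population the theoretical value is 2/sqrt(n), so the means become steadily more symmetric as n grows but are still noticeably skewed at n = 30. The exponential population, replicate count, and seed are arbitrary illustrative choices.

```python
import random
from math import sqrt
from statistics import fmean


def skewness(xs):
    """Sample skewness (third standardized moment, population formula)."""
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m3 = fmean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5


def skew_of_sample_means(n, reps=10_000, seed=1):
    """Skewness of the means of `reps` samples of size n from an Exponential(1) population."""
    rng = random.Random(seed)
    means = [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]
    return skewness(means)


for n in (5, 30):
    print(f"n={n}: simulated skewness {skew_of_sample_means(n):.2f}, "
          f"theory {2 / sqrt(n):.2f}")
```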
May 1, 2003 at 5:37 am #85416
Mike Nellis, Participant
Hello Suman,
The error around your mean (the standard error) equals the standard deviation divided by the square root of the sample size: standard error = sigma/sqrt(n). Quadrupling the sample size therefore halves the width of your confidence interval. So if there is a magic minimum sample size, it is determined by how much uncertainty (how wide a confidence interval) about your mean you are willing to accept. Don’t forget one of the most important aspects of sampling from populations: collecting the data randomly. In practice, I tend to take as large a sample as my budget and time allow. In my experience, this is more often than not more than enough. It is better to be approximately right than precisely wrong.
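The two relationships in the post above can be sketched directly. This assumes a known sigma and a normal-based (z) interval; the sigma of 2 and the target half-width of 0.5 are hypothetical values, and the function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_two_sided(confidence: float) -> float:
    """z value leaving (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def ci_half_width(sigma, n, confidence=0.95):
    """Half-width of a z confidence interval for a mean: z * sigma / sqrt(n)."""
    return z_two_sided(confidence) * sigma / sqrt(n)


def n_for_half_width(sigma, half_width, confidence=0.95):
    """Smallest n whose half-width is at most `half_width` (known sigma)."""
    return ceil((z_two_sided(confidence) * sigma / half_width) ** 2)


sigma = 2.0  # hypothetical known standard deviation
for n in (25, 100):
    print(f"n={n}: half-width {ci_half_width(sigma, n):.3f}")
print("n for half-width 0.5:", n_for_half_width(sigma, 0.5))
```

Going from n = 25 to n = 100 cuts the half-width from about 0.78 to about 0.39, and a target half-width of 0.5 with sigma = 2 needs 62 observations.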
Hope this helps,
Mike
May 1, 2003 at 11:40 am #85430
Bob Johnson, Participant
Hi Suman!
My experience is that the sample size decision is technically driven by several considerations: the data type (attribute or variable), the variance of the underlying population (its value, and whether it is known or unknown), the required power, and the test being performed. For example, suppose you are running a 2-sample t test with a power of .95 (alpha = 0.5) and a known historical standard deviation of 2. If we want to detect a difference of 0.5 between the two sample means, we will need 417 samples to meet the power requirement. If we only need to detect a difference of 1.0, we will need 105 samples.
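These figures can be reproduced approximately with the usual normal-approximation formula for a two-sided, two-sample test with equal group sizes, n = 2(z(1 - alpha/2) + z(power))^2 sigma^2 / delta^2, plus a commonly cited correction term, z(1 - alpha/2)^2 / 4, that approximates the extra samples a t test needs over a z test. The sketch assumes alpha = 0.05, which appears to be what was intended, and it reports samples per group. Exact software results iterate on the noncentral t distribution, so treat this as an approximation rather than Minitab's method.

```python
from math import ceil
from statistics import NormalDist


def two_sample_n_per_group(delta, sigma, alpha=0.05, power=0.95):
    """Approximate per-group sample size for a two-sided, two-sample t test.

    Normal-approximation formula plus a small-sample correction term;
    assumes equal group sizes and a known (historical) sigma.
    """
    inv = NormalDist().inv_cdf
    z_alpha = inv(1 - alpha / 2)  # two-sided critical value
    z_beta = inv(power)           # power = 1 - beta
    n_normal = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return ceil(n_normal + z_alpha ** 2 / 4)  # correction for using t instead of z


for delta in (0.5, 1.0):
    print(f"difference {delta}: {two_sample_n_per_group(delta, sigma=2.0)} per group")
```

Under these assumptions the function returns 417 for a difference of 0.5 and 105 for a difference of 1.0. Halving the difference to be detected roughly quadruples the sample size, because the standard error shrinks only with the square root of n.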
As a general rule (after all has been said and done), I follow the same considerations as Mike in his post. Usually I have to limit my samples for economic reasons, and I end up either calculating the difference my test will “see” (understanding that any smaller difference will be subject to Type II error) or going in with a target difference and finding the resulting power based on the data characteristics. If the power is too low, I use this to justify more samples or scrap the test.
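Both approaches described above can be sketched with the same normal approximation (known sigma, two-sided test, equal group sizes). The budget of 50 samples per group and sigma = 2 are hypothetical, and the power function ignores the tiny probability of rejecting in the wrong tail.

```python
from math import sqrt
from statistics import NormalDist

_N = NormalDist()


def power_two_sample(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided, two-sample test with n per group (known sigma)."""
    se = sigma * sqrt(2 / n)  # standard error of the difference in means
    z_alpha = _N.inv_cdf(1 - alpha / 2)
    # The probability of rejecting in the opposite tail is ignored (negligible here).
    return _N.cdf(delta / se - z_alpha)


def detectable_difference(sigma, n, alpha=0.05, power=0.95):
    """Smallest difference detectable with the given power and n per group."""
    z_sum = _N.inv_cdf(1 - alpha / 2) + _N.inv_cdf(power)
    return z_sum * sigma * sqrt(2 / n)


n, sigma = 50, 2.0  # hypothetical budget and historical sigma
print(f"detectable difference: {detectable_difference(sigma, n):.2f}")
print(f"power for a difference of 0.5: {power_two_sample(0.5, sigma, n):.2f}")
```

With 50 per group, the test can detect a difference of about 1.44 with 95% power, while a difference of 0.5 would be detected only about a quarter of the time.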
Hope this helps!
Bob Johnson
May 1, 2003 at 1:18 pm #85437
Bob –
You wrote “For example, suppose you are running a 2 sample t test with a power of .95 (alpha = 0.5)”
This seems to imply that power is 1 - alpha. The rest of your post shows that you understand the concepts pretty well.
You probably meant a power of 0.95 (beta = 0.05, alpha = 0.05)? This could be a little confusing for those new to these things. Many people I have encountered seem to believe that power is simply 1 - alpha, when in fact power is 1 - beta.
Alpha could be 0.05 with power at 0.95, as is customary, or most other values that you might choose.
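A short calculation shows that power is not the complement of alpha. Using a normal approximation for a two-sided, two-sample test with hypothetical values (difference 1.0, sigma 2, 50 samples per group), power comes from the difference, sigma, sample size and alpha together:

```python
from math import sqrt
from statistics import NormalDist

_N = NormalDist()


def power_two_sample(delta, sigma, n, alpha):
    """Approximate power of a two-sided, two-sample test with n per group (known sigma)."""
    se = sigma * sqrt(2 / n)  # standard error of the difference in means
    return _N.cdf(delta / se - _N.inv_cdf(1 - alpha / 2))


for alpha in (0.05, 0.01):
    p = power_two_sample(delta=1.0, sigma=2.0, n=50, alpha=alpha)
    print(f"alpha={alpha}: power {p:.2f} (1 - alpha would be {1 - alpha:.2f})")
```

Here power is about 0.71 at alpha = 0.05 and about 0.47 at alpha = 0.01, nowhere near 0.95 or 0.99. Changing alpha does shift power at a fixed sample size, but reaching a target power is done mainly through the sample size.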
May 1, 2003 at 1:34 pm #85440
In Wheeler’s Advanced Topics in Statistical Process Control, he quotes Shewhart as follows:
“… It appears reasonable, therefore, that the criterion [control limits] may be used even when we have only two subsamples of size not less than four.” (Wheeler quotes this from p. 315 of Shewhart’s Economic Control of Quality of Manufactured Product.)
Wheeler goes on to say, “So when limited amounts of data are available, go ahead and calculate control limits, and then, if and when additional data becomes available, recalculate the limits.”
I have heard the expression “more data is better, but less will do.” I am of the opinion that if you have only 10 or 20 samples, you can still calculate the control limits. You should recognize, though, that the limits carry more error with fewer samples than with 30 or more.
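Shewhart's point can be illustrated with an X-bar chart built from only two subgroups of four. The sketch below uses the tabulated constant A2 = 0.729, which applies to subgroups of size 4 only, and hypothetical measurements. Following Wheeler's advice, the same function is simply rerun as more subgroups arrive.

```python
from statistics import fmean

A2_N4 = 0.729  # tabulated X-bar chart constant for subgroups of size 4


def xbar_limits(subgroups):
    """X-bar chart (lower limit, center line, upper limit) from subgroups of size 4."""
    if len(subgroups) < 2 or any(len(s) != 4 for s in subgroups):
        raise ValueError("need at least two subgroups of exactly 4 readings")
    grand_mean = fmean(fmean(s) for s in subgroups)
    r_bar = fmean(max(s) - min(s) for s in subgroups)  # average subgroup range
    return grand_mean - A2_N4 * r_bar, grand_mean, grand_mean + A2_N4 * r_bar


# Hypothetical readings: start with two subgroups, recalculate as data arrive.
data = [[10.1, 9.8, 10.3, 10.0], [9.9, 10.2, 10.4, 9.7]]
print("2 subgroups:", [round(v, 3) for v in xbar_limits(data)])
data.append([10.0, 10.1, 9.9, 10.2])
print("3 subgroups:", [round(v, 3) for v in xbar_limits(data)])
```

With the first two subgroups the limits are about 9.61 and 10.49 around a center line of 10.05; each added subgroup revises them.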
Matt
http://www.pqsystems.com
May 1, 2003 at 1:41 pm #85443
Bob Johnson, Participant
Dave,
Exactly right! Sorry about the error….
Best Regards,
Bob
May 1, 2003 at 3:55 pm #85452
Isn’t there also a consideration of what type of statistical calculation you will be performing? You asked “how many data points (5, 10, 20 or 30)”: are you referring to control chart data points or to hypothesis testing (F test, t test, etc.)? The rule of thumb I use is that for control charts I need at least 20 data points (these can be 20 individual data points or 20 subgroups of ‘x’ data points averaged within each subgroup). For hypothesis testing I use at least 40 data points (typically individual data points) as a rule of thumb.
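The 20-point rule of thumb for control charts can be sketched for an individuals (X-mR) chart, where the limits are the mean plus or minus 2.66 times the average moving range (2.66 is about 3/d2, with d2 = 1.128 for ranges of two consecutive points). The 20-point minimum is the poster's rule of thumb, enforced here through a hypothetical `min_points` parameter, and the readings are made up.

```python
from statistics import fmean


def individuals_limits(xs, min_points=20):
    """Individuals (X-mR) chart limits: mean +/- 2.66 * average moving range."""
    if len(xs) < min_points:
        raise ValueError(f"need at least {min_points} points, got {len(xs)}")
    moving_ranges = [abs(b - a) for a, b in zip(xs, xs[1:])]
    center = fmean(xs)
    spread = 2.66 * fmean(moving_ranges)  # 2.66 ~ 3 / d2, with d2 = 1.128 for n = 2
    return center - spread, center, center + spread


readings = [10.0, 11.0] * 10  # 20 hypothetical readings
print(individuals_limits(readings))
```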
May 3, 2003 at 4:05 pm #85522
Hi Bob,
“My experience is that the sample size decision is technically driven by several considerations. These are data type (attribute or variable), variance of the underlying population (value, known/unknown), the necessary power as well as the applicable test that I am performing. For example, suppose you are running a 2 sample t test with a power of .95 (alpha = 0.5) and a known historical standard deviation of 2. If we want to detect a difference between the two tested sample means of 0.5 then we will need 417 samples to meet the power requirement. if we only need to detect a difference between the two tested sample means of 1.0, then we will need 105 samples.”
Could you explain the math behind arriving at a sample size of 417/105 based on the size of the difference between the two samples?
Thanks.
May 4, 2003 at 11:50 am #85527
Bob Johnson, Participant
Hi Subbu!
I’ll have to research the underlying equations for you, but I use Minitab (V13.32) to do the calculations for me.
Best Regards
Bob Johnson