
Sample size for normal distribution prediction


#33326

    Ranjan
    Participant

    I’d like to know the minimum sample size needed to conclude that a particular sample was drawn from a normally distributed population.

    #90596

    Vivek
    Member

    To assess normality, the sample size should be selected so that all common-cause variation is included but no special causes are present. Thus it depends on the process.
    However, personally I would suggest more than 100 data points so that we are closer to the Z distribution.
    vivek

    #90602

    Yashwant M Joshi
    Member

    As a thumb rule, the sample size should be 10%. However, this 10% has to be carefully selected to sufficiently cover the different subpopulations of the total population. For example, in a manufacturing process we should take a 10% sample from each shift, each process, each vendor supply, and even different seasons (since climatic conditions do change drastically), i.e. the sample should represent all the variable parameters. In fact, we need to stratify the data by each variable parameter and then test the theories for each population sample.
    Similarly, in the case of marketing, the data need to be stratified by region, salesman, etc., and then a 10% sample taken for testing of theories. We can also use a chi-square test to verify the representativeness of the sample.
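    A rough sketch of what this stratified sampling and chi-square check could look like in Python is below (the column names, data, and 10% fraction are only illustrative assumptions, not part of any particular process):

        import pandas as pd
        from scipy.stats import chisquare

        # Hypothetical production data; 'shift' and 'vendor' are illustrative strata.
        data = pd.DataFrame({
            "shift":  ["A", "A", "B", "B", "C", "C"] * 50,
            "vendor": ["V1", "V2"] * 150,
            "measurement": pd.Series(range(300)) / 10.0,
        })

        # Draw a 10% sample from every shift/vendor combination so that each
        # subpopulation is represented.
        stratified_sample = (
            data.groupby(["shift", "vendor"], group_keys=False)
                .sample(frac=0.10, random_state=1)
        )

        # Chi-square check of representativeness: compare the shift counts in the
        # sample with the counts a perfect 10% draw would give.
        observed = stratified_sample["shift"].value_counts().sort_index()
        expected = data["shift"].value_counts().sort_index() * 0.10
        print(chisquare(f_obs=observed, f_exp=expected))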

    #90619

    Ranjan
    Participant

    Vivek,
    I disagree on the need for a minimum number of data points. While in theory you would ideally want a large sample size, in practice collecting that much data can be prohibitively expensive. I think you need to make a judgement call based on your understanding of the variation. If you are looking at sources of variation, it is easy to subgroup according to known sources of variation. But unless you can cover up to, say, 80% of that special-cause variation, there is little sense in collecting 100 points to begin with.

    #90647

    marklamfu
    Participant

    If the data are from a normal distribution, I think the minimum sample size is 30; more than 50 is preferred.

    #90657

    Robert Butler
    Participant

    As phrased, your question does not give enough detail to permit a specific answer.  As written, the answer to your question is 2.  I realize that this sounds like I’m trying to be “cute” but in fact you can attempt an estimate with only 2.  Obviously, depending on what you are trying to do, such an estimate may either be adequate or of no use whatsoever. 
      If you are confronted with the task of testing the assumption of normality of a population you can use the “eyeball” approach to a graphical analysis of your data. You can also use quantitative tests such as the Anderson-Darling, the Chi-Square, or the W test to check for assumptions of data normality. 
      If you plot your data on normal probability paper you can examine the plot to determine how well it approximates a straight line. Many computer packages will do this as well as run one of the above tests on a data set of any size to give you a sense of normality. 
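      For anyone doing this in software rather than on probability paper, here is a minimal sketch of the plot and one of the quantitative tests (Python with SciPy and matplotlib assumed; the simulated data are purely illustrative):

        import numpy as np
        import matplotlib.pyplot as plt
        from scipy import stats

        # Illustrative data: 50 points drawn from a normal distribution.
        rng = np.random.default_rng(42)
        x = rng.normal(loc=10.0, scale=2.0, size=50)

        # Normal probability plot -- the software equivalent of probability paper.
        # The closer the points fall to a straight line, the better the normal fit.
        stats.probplot(x, dist="norm", plot=plt)
        plt.title("Normal probability plot")
        plt.show()

        # Anderson-Darling test: the statistic is compared against the critical
        # values at the listed significance levels.
        result = stats.anderson(x, dist="norm")
        print("A-D statistic:", result.statistic)
        print("critical values:", result.critical_values)
        print("significance levels (%):", result.significance_level)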
      If you have access to a book of graphical tables such as the Rand Corporation plots, you can visually compare your plot against the examples to get a sense of how well your data approximate a normal.  In describing such plots we have the following from Fitting Equations to Data by Daniel and Wood: “Our sample sizes (from a random normal distribution) range from 8 to 384. As might be expected, samples of 8 tell us almost nothing about normality, whereas samples of 384 seem very stable except for their few lowest and highest points. Sets of 16 show shocking wobbles; sets of 32 are visibly better behaved; sets of 64 nearly always appear straight in their central regions but fluctuate at their ends.”
      The above quote highlights the key problem associated with using quantitative tests without also examining a graphical representation of your data. Some tests are more sensitive to variation in the ends of the probability plot and others are sensitive to variation in the central region.  Thus, even random data taken from a known normal distribution could fail a normality test if the test and the data were mismatched.
      The Chi-Square test does not do well with small data sets and there are additional issues surrounding the arbitrary arrangement of data into cells.
      The W test (Shapiro-Wilk) is a very good test for normality when you have a data set with less than 50 points. If you don’t have this option available to you in your statistical software you can consult Hahn and Shapiro – Statistical Models in Engineering pp. 294-297 of the first edition.  You will also need to use Table IX (in that book) to complete the calculations.  In the past, I have used a combination of graphical plots and the W test to investigate the normal properties of data sets with as few as 10 data points.
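      As an illustration of that small-sample case, here is a sketch of the W test on ten made-up points (scipy.stats.shapiro assumed available in your statistical software):

        import numpy as np
        from scipy import stats

        # Ten made-up measurements, about the smallest data set mentioned above.
        x = np.array([9.8, 10.1, 10.4, 9.6, 10.0, 10.2, 9.9, 10.3, 9.7, 10.5])

        w_stat, p_value = stats.shapiro(x)
        print(f"W = {w_stat:.3f}, p = {p_value:.3f}")
        # A small p-value would call the normality assumption into question; a large
        # one only says the data are consistent with a normal, not that it is proven.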
      As a final point it should be remembered that statistical tests provide objective methods for testing whether or not an assumed distribution provides an adequate description of the observed data.  They never allow one to prove that the assumed distribution is the correct one. 

