By definition, a sample of size n is random if every possible sample of size n has the same probability of being selected. If the sample is not random, a bias is introduced, which causes a statistical sampling or testing error by systematically favoring some outcomes over others. It is the responsibility of the quality professional to ensure that samples are random, unbiased and representative of the population.

Let’s examine three examples from manufacturing, transactional and e-business settings that require sampling to ensure process capability:

  1. Parts on a manufacturing conveyor line going from one station to the next need to be examined to ensure they are within tolerance.
  2. Statements being stuffed into envelopes and sealed by an automatic machine need to be checked to verify that the envelopes are completely sealed.
  3. Users visiting your Internet site and clicking through your product catalog should be polled about their online experience.

In each of these three cases we would like to select a random sample from a daily population of 1000: the parts produced, the envelopes sealed or the users visiting the site. Let’s assume a 95 percent confidence level and a 15 percent margin of error. The sample size needed to represent the population is then 41. In each case, there would be significant bias if we simply selected the first 41 of the 1000 for that day. That would be convenience sampling, and the ‘early birds’ of each process may not represent the population very well. Nor can we pick the parts, envelopes or users that we think are appropriate, as this judgment sampling would let our own preferences systematically favor some outcomes over others.
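The article does not say which formula produces the figure of 41. One common approach that reproduces it is Cochran's sample size formula for a proportion with a finite population correction. The sketch below assumes that approach and the most conservative proportion of 50 percent; the function name and parameters are illustrative only.

```python
import math
from statistics import NormalDist


def required_sample_size(population, margin_of_error, confidence=0.95, proportion=0.5):
    """Sample size for estimating a proportion in a finite population.

    Assumption: Cochran's formula with a finite population correction,
    using proportion=0.5 as the worst (most conservative) case.
    """
    # Two-sided z-score for the confidence level (about 1.96 for 95 percent).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2
    # Shrink it to account for the finite population.
    n = n0 / (1 + (n0 - 1) / population)
    # Round up, since a fraction of a unit cannot be sampled.
    return math.ceil(n)


print(required_sample_size(population=1000, margin_of_error=0.15))  # 41
```

A tighter margin of error raises the requirement quickly: the margin enters the formula squared, so halving it roughly quadruples the uncorrected sample size.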

How do we decide which parts, envelopes and users to select for our sampling? With a population size of 1000, we could randomly select 41 numbers between 1 and 1000. Where could we get the numbers? They could be generated by a computer program such as Minitab or Microsoft Excel. For instance, in Excel you would use the following cell formula to derive the first random number of the 41 needed:


=RANDBETWEEN(bottom,top)
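where bottom is the smallest integer RANDBETWEEN will return (in this case 1) and top is the largest integer RANDBETWEEN will return (in this case 1000). Because each call is independent, the same number can come up more than once; replace any duplicates until you have 41 distinct numbers. If this function is not available, you may need to install the Analysis ToolPak by selecting it through the Add-Ins command on the Tools menu.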
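For readers working outside Excel, the same selection can be sketched in Python. This version samples without replacement, so the 41 numbers are guaranteed to be distinct. The function name and the seed value are illustrative assumptions, not part of the original method.

```python
import random


def draw_sample_ids(population_size, sample_size, seed=None):
    """Return sample_size distinct integers between 1 and population_size."""
    # A seeded generator makes the draw reproducible for audit purposes.
    rng = random.Random(seed)
    # random.sample draws without replacement, so no number repeats.
    chosen = rng.sample(range(1, population_size + 1), sample_size)
    # Sorting lets items be pulled in the order they come off the line.
    return sorted(chosen)


ids = draw_sample_ids(population_size=1000, sample_size=41, seed=2024)
print(ids)
```

Each number identifies the position of a part, envelope or visitor in the day's sequence; for example, a value of 17 means the 17th item of the day is inspected.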

One final note on the sample: parts and envelopes have no choice but to be sampled if you select them. Users visiting your site, on the other hand, can always close the window if they prefer not to take your survey. Count your sample size as the number of users you randomly selected minus the number who decline to provide feedback, and select additional users at random if that leaves you short of the 41 you need. That’s it! You now have an unbiased method for obtaining a random sample.
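The survey arithmetic can be sketched as follows. The refusal count and the 50 percent response rate are hypothetical figures chosen only for illustration; the article does not give any response data.

```python
import math


def effective_sample_size(selected, refusals):
    """Users randomly selected minus those who declined the survey."""
    return selected - refusals


def invitations_needed(target_responses, expected_response_rate):
    """Number of users to select so that enough are expected to respond."""
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must be in (0, 1]")
    return math.ceil(target_responses / expected_response_rate)


# Hypothetical: 41 users selected, 9 close the window without answering.
print(effective_sample_size(selected=41, refusals=9))  # 32

# Hypothetical: if about half of invited users respond, invite 82 to get 41.
print(invitations_needed(target_responses=41, expected_response_rate=0.5))  # 82
```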

About the Author