# Probability


This topic contains 5 replies, has 5 voices, and was last updated by BillR2 12 years, 11 months ago.

- October 11, 2006 at 1:26 pm #44857
Hello All,

This is a tough one, at least for me. Perhaps someone here has the knowledge. I have 600 employees in multiple locations (25). There is a different number of employees at each location, but I have that data. I want to take a random sample of 50 employees using a random number generator. The question I was asked is: what is the probability of getting x (let's say 5) employees from one location in the sample? The concern is, I guess, skewing the sample with too many people from one location. I know you can select randomly from each location to avoid that, but that is not what leadership wants to do. They want to treat the entire population as one and then determine the probability of getting x from one location. Does anyone know how to calculate that?

Thanks.

- October 11, 2006 at 1:57 pm #144529
Bill,

The probability will depend on the distribution of employees at different locations.

The probability of finding ‘x’ employees from each location in a sample size of 50 = 25x/50 = 0.50x.

Since you are considering a single population with employees from all locations, you have to weight each location according to the number of employees there.

Probability of finding ‘x’ employees from one location in the sample = weighting factor × 0.50x

Hope this works. Just try it out!

- October 11, 2006 at 2:17 pm #144533
Management would only be asking the question if they had a concern about differences between the locations. If the locations were assumed to be totally homogeneous, then who cares how many are taken from each site? Since that is a concern, yet they want to consider the entire company as the population, you possibly need to consider stratified random sampling. This is accomplished by first determining the overall required sample size and then randomly selecting from each site in proportion to the size of the site. For example, say you had two sites, one with 100 people and one with 200 people, and you calculate that you need a sample size of 75. You would randomly pick 25 from site one and 50 from site two. That way you have proportional representation from each site.
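The proportional allocation described above can be sketched in a few lines of Python. This is only an illustration: the function name `proportional_allocation` and the rounding rule (largest remainder) are assumptions, not something the post specifies.

```python
def proportional_allocation(site_sizes, total_sample):
    """Split total_sample across sites in proportion to each site's head count.

    Rounding uses the largest-remainder rule (an assumption made here so the
    per-site counts always add up to total_sample).
    """
    population = sum(site_sizes.values())
    exact = {site: total_sample * n / population for site, n in site_sizes.items()}
    alloc = {site: int(quota) for site, quota in exact.items()}
    leftover = total_sample - sum(alloc.values())
    # Give the remaining slots to the sites with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for site in by_fraction[:leftover]:
        alloc[site] += 1
    return alloc


# The two-site example from the post: 100 and 200 people, sample of 75.
print(proportional_allocation({"site one": 100, "site two": 200}, 75))
# {'site one': 25, 'site two': 50}
```

With 25 real locations, the dictionary would hold the actual head count of each site and `total_sample` would be 50.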

- October 11, 2006 at 5:32 pm #144551
I agree with you. I still need to answer their question and I don’t think that the math given before works out.

- October 11, 2006 at 6:12 pm #144553
Good question! Wondering if some of our stat friends can help out. My thought is that something could be done with the hypergeometric probability distribution, using the probability that an event occurs out of the 600 total based on the size of each site. You would find the probability of getting exactly 5 from each site out of 50 based on that site's probability, and then combine them to get a total. I’m not sure this would work; I would need some more time.
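For reference, the standard hypergeometric probability for a single site can be written as follows. The symbols are introduced here for illustration: $N = 600$ is the total head count, $n = 50$ the sample size, $K$ the number of employees at the site in question, and $X$ the number of sampled employees who come from that site.

$$
P(X = x) = \frac{\binom{K}{x}\,\binom{N-K}{\,n-x\,}}{\binom{N}{n}}
$$

This gives the chance for one site at a time. The counts for different sites are not independent, so the per-site values cannot simply be added or multiplied to get a company-wide figure.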

Help, Stat Friends!

- October 11, 2006 at 8:20 pm #144568
First I would ask management why they want to do it this way. It may be that they believe systematic random sampling from each location will not give them a prediction for the total population. Sampling is one of the keys to getting an unbiased estimate of the population. Taking some samples from each stratum of the population is valid.

One possible concern of management could arise if the survey is planned by intranet where they may not be able to control selections by location. This is not always a good survey technique. However, including a location question on the survey would catch that problem.

The probability of getting 5 from one location can be calculated. You have to know the number at each location to be more accurate. Assuming the locations are not all the same size, five people from a large location would not necessarily be bad.

The traditional way to do this is with the combinations formula nCr: for each location, r is 5 and n is the number of employees at that location; multiply by the combinations for the other locations and divide by the total possible combinations, where n = 600. Since this is just one possible outcome, you have to do the others as well. This is tedious even using Excel.
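As a rough sketch of that calculation for a single location, the count of ways to pick exactly x people from the site, times the ways to pick the rest of the sample from everyone else, is divided by the total number of possible samples. The function name `prob_exactly` and the site size of 24 (the average of 600 employees over 25 locations) are assumptions for illustration; the real head counts would go in their place.

```python
from math import comb


def prob_exactly(x, site_size, population=600, sample=50):
    """Chance that exactly x of the sampled employees come from one site.

    Ways to choose x from the site, times ways to choose the remaining
    sample from all other employees, over all possible samples.
    """
    ways_from_site = comb(site_size, x)
    ways_from_rest = comb(population - site_size, sample - x)
    return ways_from_site * ways_from_rest / comb(population, sample)


# Hypothetical site of 24 employees; replace with the real head count.
print(prob_exactly(5, 24))
```

Python's integers are exact, so the very large combination counts do not overflow the way they can in a spreadsheet.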

Using the complement would get the probability of five or more from any one site. Again you would have to do this for each site.
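A minimal sketch of the complement approach for one site at a time: subtract the chance of 0 through 4 from 1. The function name `prob_at_least` and the three site sizes are made up for illustration, since the actual head counts are not given in the thread.

```python
from math import comb


def prob_at_least(x, site_size, population=600, sample=50):
    """Chance that x or more sampled employees come from one site (via the complement)."""
    total = comb(population, sample)
    # Number of samples containing fewer than x people from the site.
    below = sum(
        comb(site_size, k) * comb(population - site_size, sample - k)
        for k in range(x)
    )
    return 1 - below / total


# Hypothetical head counts; substitute the real figure for each location.
hypothetical_sites = {"large site": 60, "average site": 24, "small site": 10}
for name, size in hypothetical_sites.items():
    print(name, prob_at_least(5, size))
```

These are per-site values. Because the site counts in one sample depend on each other, the sum of the per-site values is only an upper bound on the chance that at least one site somewhere contributes five or more.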

Using the Monte Carlo method might be a better way to estimate the probability.
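One way a Monte Carlo estimate could look, as a sketch only: draw many random samples of 50 from a roster of 600 and count how often any single location supplies five or more people. The function name `simulate`, the trial count, and the site sizes (5 sites of 40 and 20 sites of 20) are hypothetical stand-ins for the real data.

```python
import random
from collections import Counter


def simulate(site_sizes, sample=50, threshold=5, trials=10_000, seed=0):
    """Estimate the chance that at least one site contributes `threshold` or more people."""
    rng = random.Random(seed)
    # One roster entry per employee, labelled with that employee's site.
    roster = [site for site, n in site_sizes.items() for _ in range(n)]
    hits = 0
    for _ in range(trials):
        counts = Counter(rng.sample(roster, sample))  # draw without replacement
        if max(counts.values()) >= threshold:
            hits += 1
    return hits / trials


# Hypothetical head counts: 5 sites of 40 and 20 sites of 20 (600 in total).
sizes = {f"site {i + 1}": (40 if i < 5 else 20) for i in range(25)}
print(simulate(sizes))
```

The result is an estimate, so it will vary slightly with the seed and become steadier as `trials` increases.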

The total number of surveys returned or accepted would also change the probability. You seldom get back more than a small percentage of surveys you send out.
