January 27, 2007 at 9:16 am #45960
I have 30 people available at one site to take 30 data points each, over 2-3 weeks, on a process that is performed by several hundred people at multiple sites.
The process is then going to be repeated with the same people and number of data points, using improved tooling, and the results analysed with a 2-sample t-test.
How do I express my sample size statistically in relation to the much larger population across the multiple sites?
Any other tips you could give would be much appreciated.
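For readers planning the same analysis, here is a minimal sketch of the 2-sample t statistic using only the Python standard library. It assumes the unequal-variance (Welch) form, which is a reasonable default when the old and new tooling are expected to have very different spreads. The function name `welch_t` is hypothetical. Turning t into a p-value needs the t distribution, for example a t table or, if SciPy is available, `scipy.stats.ttest_ind(before, after, equal_var=False)`.

```python
import math
import statistics


def welch_t(before, after):
    """Return Welch's 2-sample t statistic and its Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(before), len(after)
    m1, m2 = statistics.mean(before), statistics.mean(after)
    # statistics.variance uses the n - 1 (sample) denominator
    se1_sq = statistics.variance(before) / n1
    se2_sq = statistics.variance(after) / n2
    t = (m1 - m2) / math.sqrt(se1_sq + se2_sq)
    df = (se1_sq + se2_sq) ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))
    return t, df
```

Usage would be along the lines of `t, df = welch_t(old_times, new_times)`, where both lists are hypothetical names for the measured process times.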
January 27, 2007 at 12:01 pm #151143
Hi DD,
in principle (and fortunately) there is no direct link between the size of the population and the sample size.
You take the population into account through the estimate of its standard deviation and by designing a sampling plan that yields a representative sample.
In your case I’d worry about the representativeness of the sampling: you must be sure that the results of your tests can be applied to all the locations even though you only test at one. If you can convince yourself (and the stakeholders) that this is true, then fine, go ahead. But if you see reasons why the results from this one site cannot be applied to the others, you might want to change the test strategy and test at more sites that better cover the whole population.
This is not entirely black and white, though. Maybe you can pick a big enough subset of locations that are very similar to the one where you test, and apply the results only to those in the first step.
I hope this helps a bit!
Regards
Sandor
January 27, 2007 at 1:40 pm #151146
Thank you Sandor,
I think I get what you’re saying. Because I have 30 people each taking 30 data points (in this case, the time to do the process), that one site should be well represented, assuming there are no special causes such as time of year or shift. In this case it is safe to assume there aren’t any.
Similarly, I don’t see reasons why this site should differ greatly from the others. I could perhaps check this by sending a set of screening questions to the other sites, as justification for not including them in the test.
At the end of the day I asked the stakeholders if they wanted one site only or multiple sites; their response was ‘one would be easier’.
I just wondered about the ‘margin of error’ rule of thumb, where a survey of 50 could have a 14% error. I’m not sure that applies in this case, though; perhaps you could help confirm this?
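The ‘14% for a survey of 50’ figure matches the usual worst-case margin of error for a proportion (p = 0.5) at 95% confidence, often shortened to 1/sqrt(n). A minimal sketch follows, assuming simple random sampling and a yes/no survey question. Note that it describes estimating a proportion, not detecting a difference in mean process time. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(n, confidence=0.95, p=0.5):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * math.sqrt(p * (1 - p) / n)


print(round(margin_of_error(50), 3))  # 0.139, i.e. about 14%
print(round(1 / math.sqrt(50), 3))    # 0.141, the quick 1/sqrt(n) rule of thumb
```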
Thanks
January 27, 2007 at 3:36 pm #151151
Hi DD,
you want to plan a 2-sample t-test, so I guess you’re measuring a continuous variable and you want to detect whether a change you apply to the process makes a difference or not.
The first thing to do is decide the minimum difference you want your test to detect. (E.g. if you’re measuring cycle times before and after a change, you might want to be able to detect a difference of at least 30 minutes, meaning that the test will probably tell you there is no difference IF the real difference is smaller than 30 minutes.)
Then you need to decide the power of the test: the probability that the test will detect a difference IF the difference is at least 30 minutes. The standard value for power is 80%. I’m assuming you’d want to stick with the standard significance level (alpha) of 5%, i.e. 95% confidence.
Given the difference you want to detect and the power, you’ll have to estimate the standard deviation of the measured values; with these three you can get the sample size. If you have no statistical toolkit like Minitab, just send this data to me and I’ll gladly do the calculation for you.
A remark on the stakeholders’ answer: “easier to do it this way” sounds very much like one of the foremost causes of bias in testing, so-called convenience sampling.
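The recipe above (minimum difference to detect, power, significance level, estimated standard deviation) is commonly written as the normal-approximation formula below for the number of measurements per group in a two-sided 2-sample test. Here Δ is the minimum difference, σ the standard deviation, α the significance level (e.g. 0.05) and 1 − β the power (e.g. 0.8). Exact t-based calculations, as done by statistical packages, give slightly larger n for small samples.

```latex
n \approx \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}\,\sigma^{2}}{\Delta^{2}}
```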
My guess is that you will get a much smaller sample size than the 900 you plan now. Maybe you can trade this off by doing the test at several locations with the smaller number?
Regards
Sandor
January 27, 2007 at 7:28 pm #151155
The random assignment of a treatment is the core idea behind experimentation. This is how you estimate the “effect” of the treatment.
By contrast, in a survey sampling exercise you estimate a population parameter from the sample. The “representativeness” of the sample is ensured by randomly sampling the units. Random sampling lets you estimate the error due to sampling within certain bounds.
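To make the distinction concrete, here is a minimal Python sketch. In an experiment, the randomness lies in who gets the treatment; in a survey, it lies in which units are measured. The operator IDs, the 50/50 split and the fixed seeds are illustrative assumptions. They are not part of the study discussed here, where the same operators use both tooling versions.

```python
import random


def random_assignment(units, seed=None):
    """Experiment: randomly split units into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def random_sample(population, k, seed=None):
    """Survey: draw k units from the population without replacement."""
    return random.Random(seed).sample(list(population), k)


operators = [f"op{i:02d}" for i in range(1, 31)]  # hypothetical operator IDs
new_tooling, old_tooling = random_assignment(operators, seed=1)
print(len(new_tooling), len(old_tooling))  # 15 15
```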
Thus, the assessment of sample size via effect size, power and standard deviation (as outlined by Sandor) needs to be done within the context of the experimental design. Unless you have a solid experimental design, your findings will be vulnerable to all of the factors that threaten the validity of observational studies, i.e. you cannot unambiguously establish a causal link between treatment and effect.
If you cannot assign the treatment randomly, then you have to think through the threats to validity that arise because you are dealing with a “quasi-experimental design”. There is literature on quasi-experimental designs by Campbell and Stanley that you can buy on Amazon or elsewhere. (You basically deal with 10 different threats to validity.)
Think Experiment, Not Survey.
January 27, 2007 at 8:16 pm #151156
Thanks to both of you.
The average time is 10 mins, the quickest 5 mins and the worst case 40 mins; the new tooling should enable the process to be done consistently in less than 1 minute.
Please help with the estimation of the standard deviation if you can.
Regarding random sampling and quasi-experimental design: the process is done 2-3 times a day, so I was planning on getting 10 people at each of 3 different sites (as per Sandor’s post) to take the data over 3 weeks (to give the 30 data points per person), then repeat with the new tooling.
I don’t think this qualifies as random, but the problem is that I am time-limited and need to complete the study within 6 weeks.
So maybe I need to explore quasi-experimental design in more detail, as suggested, unless you can advise further.
Many thanks
January 27, 2007 at 8:51 pm #151157
Given the time constraints and the cost of conducting your research, you’ll probably end up with what is called a “pretest-posttest design” (do a Google search on this term). I would review the key threats to the validity of your findings, but I suspect the main questions you will have to answer when you give your presentation are as follows:
How did you make sure that the operators didn’t work more effectively because they knew they were being measured (called reactivity to measurement, or the testing effect, in the literature)? The so-called Hawthorne effect will always be with us. But if the issue comes up in a presentation, remind the person who asks that when the data from the Hawthorne studies were reanalyzed in the 1970s, researchers found that the effect was much smaller than originally reported in the 1930s.
Are there other changes at the three sites that may account for the change in performance (the technical term is “history”)? This can be “tested out” with baseline measurements that extend beyond a single measurement, so that you can show a trend. Thus, you may conduct your measurements at several intervals before and after the intervention, rather than taking a one-time shot before and after. Also, having three groups will help you test whether the effect was stronger at one site than at another, which may also give you some valuable information about site differences.
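Here is a minimal sketch of the site-by-site check described above. It assumes several baseline (pre) and follow-up (post) measurements per site, each stored in time order; the data layout and function names are hypothetical. A clearly non-zero `pre_trend` suggests the process was already changing before the new tooling arrived (“history”). Very different `change` values across sites point to site differences.

```python
from statistics import mean


def slope(values):
    """Least-squares slope of values against their measurement index 0, 1, 2, ..."""
    if len(values) < 2:
        raise ValueError("need at least two measurements to estimate a trend")
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def summarize_sites(data):
    """data maps site -> (pre_measurements, post_measurements), each in time order."""
    summary = {}
    for site, (pre, post) in data.items():
        summary[site] = {
            "pre_mean": mean(pre),
            "post_mean": mean(post),
            "change": mean(post) - mean(pre),
            "pre_trend": slope(pre),  # a clear baseline trend hints at "history"
        }
    return summary
```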
It is good that you are investigating these questions prior to conducting the research. Normally, we are faced with post hoc questions and explanations. I think that you should be ready to go. Let us know about the outcome. It would be interesting to hear what happened.
January 27, 2007 at 9:20 pm #151158
What is the standard deviation that you calculate from your existing data? If all you have is the data you described above, you could use a rule of thumb: take the range (40 - 1 = 39) and divide it by 6, giving 39/6 = 6.5. With this “estimated” standard deviation of 6.5 minutes and an expected mean difference of 9 minutes (10 - 1 = 9), your difference in standard units is 9/6.5 ≈ 1.38, i.e. the means differ by roughly 1.4 times the standard deviation, which is a large effect size. As a result, you will need only about 10 samples per site to detect a difference of this magnitude (assuming an alpha level of .05 … I looked this up in a table and checked it against Minitab. Maybe Sandor can verify this).
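The arithmetic above can be reproduced in a few lines of Python. This is a sketch that approximates, rather than reproduces, an exact t-based calculation. It assumes 80% power (the standard value mentioned earlier in the thread) and a two-sided alpha of 0.05. It uses the normal-approximation sample-size formula plus a commonly used small-sample adjustment (adding z²/4), which lands on the same "about 10 per group" figure. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def range_rule_sd(largest, smallest):
    """Rough standard deviation estimate: range / 6 (about +/-3 SD for normal data)."""
    return (largest - smallest) / 6


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided 2-sample test,
    plus the z**2 / 4 adjustment often used to approximate the t-based answer."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2 + z_alpha ** 2 / 4
    return math.ceil(n)


sd = range_rule_sd(40, 1)  # 6.5 minutes
d = (10 - 1) / sd          # about 1.38 standard deviations
print(sd, round(d, 2), n_per_group(d))  # 6.5 1.38 10
```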
Even with a much larger standard deviation, which is very unlikely given the range of your data and the location of your average, your problem will be that with 900 measurements you are almost certain to find a statistically significant difference from a p-value point of view. (There is a “point of no return” where you have so much power that it becomes very improbable not to find a difference. After all, the t-test is a small-sample statistic: Student first published it with t-values for samples of 2 to 10.) The question will then be: did you reach the difference of 9 minutes? This will be seen from the absolute difference between the measurements. I hope this helps.
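Here is a rough illustration of that "point of no return". It assumes a two-sided alpha of 0.05 and uses the normal approximation, which slightly overstates power for very small samples; the function name is hypothetical. With the effect size of about 1.4 estimated above, power is already close to 1 at a few dozen measurements per group. At 900 per group, a statistically significant result is all but guaranteed, which is why the practical question is whether the observed difference reaches 9 minutes.

```python
import math
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided 2-sample test
    (ignores the negligible chance of rejecting in the wrong direction)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(effect_size * math.sqrt(n_per_group / 2) - z_alpha)


for n in (5, 10, 30, 900):
    print(n, round(approx_power(9 / 6.5, n), 4))
```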
January 27, 2007 at 9:46 pm #151160
Research Design,
thanks for your last post, it definitely helps. You have also helped in another way:
I looked up pretest/post design and also the Hawthorne effect.
Since they are doing the process 2-3 times a day, and the process can be simulated in a ‘classroom’ setting, I could get their supervisor to time them doing the process 2-3 times at the start of each shift (in the classroom), then leave them to collect the actual data for the shift.
I think this would be quite an effective control for when people ask me that ‘difficult question’.
Great stuff!
and thanks again
DD
January 27, 2007 at 10:50 pm #151165
It’s a pleasure. It’s also nice to see someone take this stuff and run with it … there are no ready-made answers for all of the potential ways to handle these issues.
