Sample Size, Error Rate, Minitab
Message: 58072 Posted by: Robin Sarac Posted on: Thursday, 28th October 2004
Good Day,
Like many people, I'm struggling with determining if my sample size is representative of my population.
I've put some ideas together as far as sample size for a project I'm working on. I would appreciate it if you all could confirm that I've got it right, or point me in the right direction if I've got it all wrong. :)
I am measuring error rates on shipments from our distribution centre to our branches (all transfers for a given month would be the population). We have one branch that represents about 14.5% of our total volume (for September, total of 5901 transfers)...this would be the sample.
If I plug my numbers into MINITAB (2 Sample t, Power and Sample Size)...
Sample Size = 5901, Power Value = 0.95
MINITAB tells me that the difference I would be able to see would be about 0.06 between 2 means.
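[Editor's sketch] For readers who want to see roughly where a figure like this comes from, here is a minimal approximation of a 2-sample power calculation. It is not Minitab's actual routine: it uses the normal approximation, and the standard deviation of 1 and the two-sided alpha of 0.05 are assumptions, since the post does not say what was entered. Under those assumptions the detectable difference comes out at about 0.066, the same order as the value quoted above.

```python
from math import sqrt
from statistics import NormalDist


def detectable_difference(n_per_group, power=0.95, alpha=0.05, sigma=1.0):
    """Approximate smallest difference in means a two-sided 2-sample
    test can detect (normal approximation; sigma and alpha are assumed)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return (z_alpha + z_power) * sigma * sqrt(2 / n_per_group)


delta = detectable_difference(5901)
print(f"Detectable difference: {delta:.3f}")
```

Note that this number only describes the sensitivity of a comparison between two group means. It says nothing about whether one branch is representative of the others.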
Does this have anything to do with representation? I don't want to compare the sample to the population, I just want to feel comfortable that measuring error rates at this branch would be representative of error rates at all branches.
Does this mean that this one branch is representative of all branches if I'm satisfied with being able to see a difference of 0.06 between the branch and all branches? Or does it simply mean that I would see that difference if I did compare the two.
Would I be better off to randomly sample transfers as they leave the distribution centre, regardless of branch? It doesn't give me as convenient a way to tie errors at shipment to errors at receipt but if it would give me a better idea of error rate, I would work with it.
I would appreciate your guidance as it relates to sample size in this case.
Message: 58100 Posted by: Bob J Posted on: Friday, 29th October 2004
Robin,
A couple of thoughts for you....
The power/sample size calculation that you used is specific to the 2-sample t which means that for that sample size and that power you will be able to see a difference in the average value between the two groups of 0.06. It doesn't seem from your write up that this is going to have much significance for your problem...
Since the area you are most concerned about is whether the error rate at the branch is representative of all branches, you might want to start with the data from the distribution center. Gather error rate and volume data for each of the branches and use a Chi-Squared Contingency Table to test the hypothesis that the error rate is the same across the sampled branches. As for sample size here, one of the limitations of the contingency table is that you have to have at least 5 errors from each branch, so you will need to ensure that your samples from each branch are large enough to yield at least 5 errors. The tool does not require that you have an equal number of samples from each branch as long as you yield at least 5 errors...
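[Editor's sketch] A contingency-table test of this kind can be sketched in plain Python as below. The branch names and counts are invented for illustration only, and the helper names (`chi2_sf`, `chi_square_test`) are hypothetical. The function also reports the smallest expected cell count, because the "at least 5" rule of thumb is usually stated in terms of expected counts per cell rather than observed errors.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution,
    via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total and n < 10000:
        term *= z / (a + n)
        total += term
        n += 1
    p_lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - p_lower)


def chi_square_test(table):
    """table: one row per branch, columns are counts (e.g. errors, OK)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df), min(min(row) for row in expected)


# Hypothetical counts: (transfers with errors, transfers without errors)
branches = {"Branch A": [30, 2970], "Branch B": [22, 1978], "Branch C": [41, 2459]}
stat, df, p_value, min_expected = chi_square_test(list(branches.values()))
print(f"chi2={stat:.2f}, df={df}, p={p_value:.4f}, min expected={min_expected:.1f}")
```

A small p-value would suggest that the error rates differ between branches, in which case a single branch would be a doubtful proxy for the rest.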
When you do this study you should also evaluate the types of errors you find... This is important because one of the issues you need to consider is whether your data associated with the target branch is stratified or has a significantly different mix of errors than the other branches. A contingency table will help you evaluate this but may require a larger sample to yield the necessary 5 for each cell...
You also need to consider how the error rate changes over time. I assume that you are using a control chart (P chart) for this...
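[Editor's sketch] For reference, the usual 3-sigma limits of a P chart can be computed as below. This is a minimal sketch assuming one subgroup per period (for example, transfers shipped per week); the counts are invented and the function names are hypothetical.

```python
from math import sqrt


def p_chart_limits(errors, sizes):
    """Return the overall proportion and per-subgroup (LCL, UCL) limits."""
    p_bar = sum(errors) / sum(sizes)
    limits = []
    for n in sizes:
        sigma = sqrt(p_bar * (1 - p_bar) / n)
        # Clamp the limits to the valid range for a proportion
        limits.append((max(0.0, p_bar - 3 * sigma), min(1.0, p_bar + 3 * sigma)))
    return p_bar, limits


def out_of_control(errors, sizes):
    """Indices of subgroups whose error proportion falls outside the limits."""
    _, limits = p_chart_limits(errors, sizes)
    return [
        i
        for i, (e, n, (lcl, ucl)) in enumerate(zip(errors, sizes, limits))
        if not lcl <= e / n <= ucl
    ]


# Hypothetical weekly data: erroneous transfers and total transfers
weekly_errors = [5, 5, 5, 5, 40]
weekly_sizes = [1000, 1000, 1000, 1000, 1000]
print(out_of_control(weekly_errors, weekly_sizes))
```

This sketch only checks points against the control limits. Run rules (trends, shifts) that a full control chart package applies are left out.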
Hope this helps...
Best Regards,
Bob J
Message: 58137 Posted by: Robin Sarac Posted on: Friday, 29th October 2004
Bob, thank you for replying with so much great information.
I guess I was hoping that the 2-sample t dialog would give me some indication of representation, but now that you've commented on it further, I see how it's really irrelevant for what I'm trying to do. :)
I'm trying to avoid too much sampling as the process owner isn't very receptive to it and it really does impact productivity fairly significantly in this case. Certainly, there is a reward, but this is one case where I would consider sampling "expensive".
If I understand what you're telling me, before I can use a single branch to represent all branches I must determine if their error rates are about the same statistically. If they are, I can then choose a representative sample size...does that sound right?
Where does the number 5 come from? Is it just a generally accepted number for that tool? Is it something MINITAB insists on having?
I'm definitely going to track the type of errors as well as error rate. If we are not where we think we are in terms of error rate, we will have to know where to concentrate our improvement efforts.
And yes, I had put a P chart in my measurement plan to give me some indication of whether or not the process is in control. At least I got one part of it right! :) An average error rate is one thing, but unless it's fairly consistent the people we're dealing with here won't go for the changes we're proposing.
Thanks again for your help. I really appreciate the advice of someone who obviously knows what they're doing. :)
Message: 58140 Posted by: Robin Sarac Posted on: Friday, 29th October 2004
What about ANOVA? Would this work here? From what I can remember, ANOVA is a statistical method of comparing several sample means. Also, if I turned on the Tukey comparisons could I not get a clear picture of which branches might differ from each other statistically if any of them are different?
Message: 58141 Posted by: Bob J Posted on: Friday, 29th October 2004
Robin,
No problem at all... Glad to help...;-)
You are definitely on the right path.... Your logic for determining whether you can use a single site as a proxy for all branches is right. If the defect rates are not significantly different, the defect mix is the same and the P chart shows that the process is stable, then you are on (reasonably) firm ground... If you have a breakdown in any of these, you should consider breaking your project into several smaller projects: improve your most significant site or most significant problem first and then (after that project is completed) move on to the next most significant, etc.
I'm not sure why the number of 5 (in some texts it's as low as 3) is required but it's one of those rules that is necessary to validate the use of the tool...
Good luck with the project!
Best Regards,
Bob J
Message: 58142 Posted by: Bob J Posted on: Friday, 29th October 2004
Robin,
ANOVA would be a good choice if we had a number of groups of samples where we have a variable data characteristic we are trying to evaluate. ANOVA will compare the mean of the representative sample sets and tell us if one or more of the means is significantly different from the others.
If you are evaluating a defect rate, the assumption is that you are tracking a defect count and not a variable characteristic. If I am wrong on this then let me know and I'll take another pass at it...
Best Regards,
Bob J
Message: 58143 Posted by: Robin Sarac Posted on: Friday, 29th October 2004
I certainly can't say whether you're right or wrong, but this is what Stat Guide tells me. It seems to fit the bill nicely as far as comparing the mean error rate between branches. Do you disagree?
A one-way ANOVA can be used to tell you if there are statistically significant differences among the level means. The null hypothesis for the test is that all population means (level means) are the same. The alternative hypothesis is that one or more population means differ from the others.
In addition to helping you evaluate whether all the level means are the same, MINITAB also provides output to help you determine which level means are different when differences exist.
© All Rights Reserved. 2000 Minitab, Inc.
Message: 58144 Posted by: sqe Posted on: Friday, 29th October 2004
I believe the 5 comes from the AIAG manuals for defect counts. You must have at least 5 defects in each sample. So if you have a low defect rate, you'll need more samples. For example, if your ppm is 10,000 (or defect rate) you would need 5/(10,000/1,000,000)= 500 samples. But if your ppm is 500, you'd need 5/(500/1,000,000)= 10,000!!!
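[Editor's sketch] The arithmetic above is simply the minimum defect count divided by the defect rate. A minimal version, assuming the rate is given as a whole-number ppm (the function name is hypothetical):

```python
def min_sample_size(ppm, min_defects=5):
    """Smallest sample size whose expected defect count reaches min_defects.

    ppm is the defect rate in parts per million (integer).
    Integer ceiling division avoids floating-point rounding surprises.
    """
    if ppm <= 0:
        raise ValueError("ppm must be positive")
    return -((-min_defects * 1_000_000) // ppm)


for rate in (10_000, 500):
    print(f"{rate} ppm -> {min_sample_size(rate)} samples")
```

Keep in mind this gives the sample size at which 5 defects are expected on average. Any particular sample of that size can still contain fewer than 5.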
Attributes are much harder to sample, mostly because of economy.
Message: 58159 Posted by: Bob J Posted on: Friday, 29th October 2004
Robin,
I disagree....
ANOVA is used for variable data... It is similar in that it provides a means to hypothesis test group averages (same as the contingency table) but does not effectively work for attribute data. If you are counting defects and you want to contrast the average defect rates over multiple sample groups (branch offices), it is the wrong tool...
Best Regards,
Bob J
Message: 58162 Posted by: Robin Sarac Posted on: Friday, 29th October 2004
I see...the old continuous versus discrete issue again! :)
I will do more research and I'm certain I will come to the same conclusion you have. Thank you again for your valuable input.
Message: 58163 Posted by: Bob J Posted on: Friday, 29th October 2004
Robin,
You are certainly welcome.... Have a good weekend!
Best Regards,
Bob J