This topic contains 9 replies, has 1 voice, and was last updated by Robert J. Chen 10 years, 4 months ago.
I’ve read the definition of “pooled standard deviation”, but I’m trying to understand it. In layman’s terms that is.
I have strips which I measure 5 samples from (see the variance across the strip). I measure about one strip a day. How can I figure out my variance from strip to strip? Would I use the averages of strips, then take the variance of the strip averages? Would I use the pooled StDev of the strip StDev (variance)?
Have you tried an X-bar and R chart? It sounds most appropriate for what I think you’re trying to do. –jeremy
Pooled StDev is used when performing a hypothesis test, more specifically the t-test to determine differences in samples. If you want to see if there are significant differences between strips, then you can do a t-test.
It does sound more like you are trying to determine whether your process is in control more than it seems like you want to perform hypothesis tests. The X-bar/R chart would be good for that.
Howell,
Both of the responses above are good. The pooled estimate of variation is typically employed with inferential statistics, specifically the t-test for comparison of two means. Basically, to use the pooled estimate you need a firm belief that the two estimates of variation are similar in nature. If they show a large discrepancy, or a large difference in the degrees of freedom (D.O.F.) of each estimate, then it wouldn’t be wise to lump them together as a unified estimate.
Reading your scenario, I agree that the Xbar-R method would be the best way to answer the question you have posed. Treat the 5 measurements you take across each strip as a subgroup, and collect these subgroups across multiple days. The range section of the control chart tells you the within-subgroup variability. If the ranges of the n=5 subgroups look random and stay within the control limits, then after about 20 subgroups you should have a fairly good estimate of the within standard deviation through Rbar/d2. This becomes your estimate of within-strip variation.
Strip-to-strip variation is displayed graphically in the averages section of the control chart. The limits on that chart are built from the within-strip variation, so if the averages chart is random and in control, the strip averages are moving no more than within-strip variation alone would predict, and within-strip variation is the dominant source. In that case, start asking what is going on within the subgroup. If the averages chart is out of control, the converse holds: something beyond within-strip variation is shifting the strips, so start asking which variables or noise factors are changing, or are not accounted for, from day to day.
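To make the Rbar/d2 step concrete, here is a minimal Python sketch. The strip readings are made-up numbers for illustration only, the function name is hypothetical, and the d2 constants are the usual tabulated SPC values (2.326 for subgroups of 5). It assumes every subgroup has the same size.

```python
# d2 bias-correction constants for subgroup sizes 2..6 (standard SPC table values)
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534}

def sigma_within_from_ranges(subgroups):
    """Estimate within-subgroup sigma as Rbar / d2 (equal subgroup sizes assumed)."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("all subgroups must have the same size")
    ranges = [max(g) - min(g) for g in subgroups]   # one range per strip
    r_bar = sum(ranges) / len(ranges)               # average range, Rbar
    return r_bar / D2[n]

# Hypothetical strips, five readings each (illustrative data, not from the thread)
strips = [
    [10.1, 10.4, 9.8, 10.0, 10.2],
    [10.3, 10.6, 10.1, 10.2, 10.5],
    [9.9, 10.2, 9.7, 10.0, 10.1],
]
sigma_w = sigma_within_from_ranges(strips)
print(f"within-strip sigma estimate: {sigma_w:.4f}")
```

Three strips are far too few for a stable estimate; as noted above, something like 20 in-control subgroups is the usual guideline.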
A recommendation could be to look at this from a Multi-Vari perspective. Think about your process and the various modalities with which variation can creep in. It sounds like you feel there is a location effect (within the strip) and you are interested in day-to-day variation. What would additionally be of interest? Sequential (or the next reading of 5 for the next strip off of the process)? It seems like only one reading per day could be missing some pretty large signals from the process. Do you have shifts within a day, month, quarter??? Do you know their relative impact to the variation of the process? Do you have multiple lines (locations) producing the material? A multi-vari chart would graphically display up to 4 levels of variation within a process and give you insight into where the largest contributors lie. Good luck with your analysis.
Regards,
Erik
Please provide the formula for pooled standard deviation and how you use it to interpret the ANOVA. Thanks,
The formula for pooled standard deviation is:
s = sqrt[((n1-1)s1^2 + (n2-1)s2^2)/(n1+n2-2)]
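As a quick check of the two-sample formula, here is a small Python sketch using only the standard library. The function name and the sample values are illustrative assumptions, not something from the thread.

```python
import math
import statistics

def pooled_sd(sample1, sample2):
    """Two-sample pooled standard deviation, weighting each variance by n - 1."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1)   # sample variance s1^2
    v2 = statistics.variance(sample2)   # sample variance s2^2
    return math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))

a = [1.0, 2.0, 3.0, 4.0, 5.0]   # s1^2 = 2.5
b = [2.0, 4.0, 6.0]             # s2^2 = 4.0
sp = pooled_sd(a, b)            # sqrt((4*2.5 + 2*4.0) / 6) = sqrt(3)
print(f"pooled SD: {sp:.4f}")
```

Note that the sample means never enter the calculation: the pooled value only averages the within-sample spread.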
As requested, please explain how you use this equation to evaluate an ANOVA? Thanks,
None of these responses has really answered your question.
First, a look at where the pooled variance should and should not be used (the answer to your question comes after that):
The key to your answer has to do with the means of each day’s measurements. Do you think they are all equal (the distribution does not shift from day to day), or do you think they could be different (the distribution could wander slightly from day to day, which is likely the case)?
If the mean is constant (no shift), then you can just calculate the standard deviation using all the data piled together (after four days you would calculate the standard deviation of 20 values).
If the mean is not constant (chance of shift), then you need to calculate the pooled variance using the individual variances. If you have four days’ worth of data (four variances), calculate the pooled variance as follows: first calculate s^2*(n-1) for each day, sum up those values, and then divide by the sum of the (n-1)’s.
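The difference between the two cases can be seen in a short Python sketch. The data are invented so that the daily mean clearly shifts, and the function name is hypothetical; it is only meant to illustrate the recipe described above.

```python
import statistics

def pooled_variance(groups):
    """Sum of s^2 * (n - 1) over groups, divided by the sum of the (n - 1)'s."""
    numerator = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    denominator = sum(len(g) - 1 for g in groups)
    return numerator / denominator

# Made-up data: each day has the same spread, but the mean shifts from day to day
days = [
    [1.0, 2.0, 3.0],   # mean 2, variance 1
    [4.0, 5.0, 6.0],   # mean 5, variance 1
    [7.0, 8.0, 9.0],   # mean 8, variance 1
]

pooled = pooled_variance(days)                       # within-day variance only
all_values = [x for g in days for x in g]
piled = statistics.variance(all_values)              # within-day plus day-to-day

print(f"pooled variance: {pooled}, piled-together variance: {piled}")
```

With these numbers the pooled variance is 1.0 while the piled-together variance is 7.5, because the latter also absorbs the shift in the daily means. The pooled variance is also the within-group (error) mean square of a one-way ANOVA on the same data.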
Now, it doesn’t sound like this is really what you want. It sounds like you want to separate out the within strip variation from the day-to-day variation (which could also be called the strip-to-strip variation, but there could be other day-to-day effects added in too). To do that you’ll need to do a variance component analysis. Do you have Minitab? If so, you can run the Fully Nested ANOVA. You should enter your data so you have a column that identifies the day, and another column with the measurement (there should be five rows of values for each day).
Start the Fully Nested ANOVA tool. Put the data column name into the Responses field and the Day column name into the Factors field. Click OK.
Find the Variance Components table in the session window. The first row will give your day-to-day (also called strip-to-strip) variance, % of total, and standard deviation. Your within-strip variation is listed as Error.
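For anyone without Minitab, the same idea can be sketched by hand in Python. This is the usual method-of-moments calculation for a balanced one-factor layout (day as the only factor, equal readings per day). It is an illustrative sketch with made-up data and a hypothetical function name, not a claim about how Minitab computes its table; in particular, truncating a negative estimate to zero is a choice made here.

```python
import statistics

def variance_components(groups):
    """Balanced one-way random-effects estimates: (between-day, within-day) variance."""
    k = len(groups)          # number of days (strips)
    n = len(groups[0])       # readings per day
    if any(len(g) != n for g in groups):
        raise ValueError("balanced data required: same number of readings per day")
    means = [statistics.mean(g) for g in groups]
    grand = statistics.mean(means)
    # Between-day mean square: n * variance of the day means
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    # Within-day mean square: the pooled variance (plain average when balanced)
    ms_within = sum(statistics.variance(g) for g in groups) / k
    # E[MS_between] = sigma_within^2 + n * sigma_between^2, so solve for sigma_between^2
    var_between = max((ms_between - ms_within) / n, 0.0)
    return var_between, ms_within

days = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]
var_day, var_within = variance_components(days)
total = var_day + var_within
print(f"day-to-day: {var_day:.3f} ({100 * var_day / total:.1f}% of total)")
print(f"within (Error): {var_within:.3f} ({100 * var_within / total:.1f}% of total)")
```

The "Error" component here is the same pooled within-day variance discussed earlier, which is the link between the pooled estimate and the ANOVA table.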
Most other stats packages will have similar capability.
I have to agree with you, as far as “answering my question”. Actually, I had stumbled onto the Fully Nested ANOVA, and you are right, it is exactly what I wanted. Unfortunately, our samples do not cover all the levels where variation can occur (but that’s another problem, and one that’s easier to fix).
thanks,
Howie
m = (m1*n1 + m2*n2)/(n1+n2)
s = sqrt[((n1-1)*s1^2 + n1*(m-m1)^2 + (n2-1)*s2^2 + n2*(m-m2)^2)/(n1+n2-1)]
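The combined mean m and combined standard deviation s in the last post can be checked numerically. The Python sketch below (hypothetical function name, made-up data) combines the summary statistics of two groups and compares the result with the mean and standard deviation of all the raw values piled together.

```python
import math
import statistics

def combine(n1, m1, s1, n2, m2, s2):
    """Combined mean and SD of two groups from their sizes, means and SDs."""
    m = (m1 * n1 + m2 * n2) / (n1 + n2)
    ss = ((n1 - 1) * s1 ** 2 + n1 * (m - m1) ** 2
          + (n2 - 1) * s2 ** 2 + n2 * (m - m2) ** 2)   # total sum of squares about m
    return m, math.sqrt(ss / (n1 + n2 - 1))

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0, 7.0]
m, s = combine(len(a), statistics.mean(a), statistics.stdev(a),
               len(b), statistics.mean(b), statistics.stdev(b))

# Same quantities computed directly from the piled-together data
m_direct = statistics.mean(a + b)
s_direct = statistics.stdev(a + b)
print(m, m_direct, s, s_direct)
```

Unlike the pooled standard deviation, this value includes the difference between the group means, so it matches the "all data piled together" calculation rather than the within-group spread.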