Importance of Degrees of Freedom in Randomized Design

Six Sigma – iSixSigma Forums Old Forums General Importance of Degrees of Freedom in Randomized Design

Viewing 6 posts - 1 through 6 (of 6 total)


    What is the importance of degrees of freedom in randomized design (in plain English)?


    Robert Butler

     I’m not trying to belittle you or your interest in asking a question but, as written, your question is meaningless. If you can think of a way to rephrase it, perhaps I or someone else on this forum may be able to offer an answer.



    I don’t take it personally. I have no idea what I am talking about. This is a question from a doctorate level quantitative analysis class. I can’t seem to find an answer. I am no statistics expert. I am working on a doctoral degree in educational leadership. (Just trying to gain a little new knowledge).


    Eric Maass

    Well, the question is not very clear, but I’ll at least try to give you a start. I’m not sure if I can give it in “plain English”, but I’ll give it a try…. “Degrees of freedom” is a rather abstract term that refers to the number of independent values. For example, if one value depends on another value, then the number of independent values is reduced by 1.
    I may be wrong, but I’ll assume that “randomized design” refers to a designed experiment in which the runs are performed in a random order. A designed experiment is a structured approach in which you vary several factors and measure the results on the response or output you are interested in. The degrees of freedom are important in a designed experiment because we use degrees of freedom in our calculations to determine which factors are important. We do this by first calculating an overall mean for the response. We then calculate the square of the difference between each value of the response in each run of the designed experiment and the overall mean.
    Then we separate the sum of squares – the sum of the squared differences between the measured values and the overall mean – into a sum of squares for the differences due to the factors and a sum of squares that is not due to the factors.
    We then divide these sums of squares by their degrees of freedom to give us “Mean Square” values. The Mean Squares are similar to the variance due to the factors and the variance not due to the factors. We can then do a statistical test to see if the Mean Square or Variance due to the factors is significantly more than what you would expect from the Mean Square or Variance not due to the factors.
    So, the degrees of freedom are important in a randomized design because they allow us to perform the statistical tests that determine which factors are significant in terms of their effects on the response.
    Best regards, Eric
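[Editor's note: Eric's sum-of-squares walk-through can be sketched numerically. The data below are made up purely for illustration – a single factor at two levels with four runs per level – but the partition and degrees-of-freedom bookkeeping follow his steps exactly.]

```python
# Sketch of the sum-of-squares partition described above, using
# made-up data for one factor at two levels (4 runs per level).
low  = [10.0, 11.0, 9.0, 10.0]   # hypothetical responses at the low setting
high = [13.0, 12.0, 14.0, 13.0]  # hypothetical responses at the high setting
all_runs = low + high

grand_mean = sum(all_runs) / len(all_runs)

# Total sum of squares: squared differences from the overall mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_runs)

# Sum of squares due to the factor: how far each level mean sits
# from the overall mean, weighted by the number of runs at that level.
ss_factor = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in (low, high))

# Whatever is left over is not due to the factor (the "error").
ss_error = ss_total - ss_factor

# Degrees of freedom: 2 levels - 1 for the factor;
# 8 runs - 2 estimated level means for the error.
df_factor = 2 - 1
df_error = len(all_runs) - 2

# Mean Squares: sums of squares divided by their degrees of freedom.
ms_factor = ss_factor / df_factor
ms_error = ss_error / df_error

f_ratio = ms_factor / ms_error   # a large F suggests the factor matters
print(ss_factor, ss_error, df_factor, df_error, round(f_ratio, 2))
```

The degrees of freedom enter in the last step: without dividing each sum of squares by its own df, the two variance estimates would not be comparable.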



    Ann House
    There are two components in randomized experiments: the actual design and the statistical analysis of the experiment. Fisher (1922) introduced the term “degrees of freedom” in the context of the chi-square test, but then extended it to the Analysis of Variance procedure. ANOVA was developed as a computationally simplified form of regression analysis. Thus, some textbooks will emphasize analysis of variance, others regression analysis, and others the general linear model approach, which subsumes these two procedures. I am not sure which of the procedures your profs and school prefer.
    Anyway, practically every application of probability theory to statistical analysis is related to degrees of freedom rather than the total number of values in the data under analysis. The term “degrees of freedom” refers to the number of values in a data set free to vary when restrictions are placed on the data. One degree of freedom is lost with each additional restriction.
    Example: You have four numbers, and all four numbers have to add to 15. Here you lose one degree of freedom because of the restriction that the four numbers must add up to 15. Only three numbers have the freedom to vary; the fourth number is fixed because the sum of all four numbers must equal 15.
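[Editor's note: the four-numbers example can be checked in a couple of lines; the three freely chosen values below are arbitrary.]

```python
# Four numbers constrained to add to 15: choose any three freely,
# and the fourth is then completely determined by the constraint.
total = 15
free = [2.0, 4.0, 6.0]        # three values chosen freely (3 degrees of freedom)
fourth = total - sum(free)    # forced: no freedom left for this one
numbers = free + [fourth]
print(fourth, sum(numbers))   # → 3.0 15.0
```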
    In statistics, the general rule for computing the degrees of freedom for any sum of squares is: number of independent observations – number of population estimates.
    Thus, if you have four numbers and the mean is fixed at 2.5, you have 3 degrees of freedom. If you have a second set of data and there are four numbers and the mean is fixed at 3.5, you also have 3 degrees of freedom.
    This information shows up in a simple (independent-sample) t-test as follows: You compare the means of two groups. Thus you estimate two parameters, in this case the two population means; each sample mean is an estimate of its population mean.
    Given the example above you will have 8 observations (2 groups with four numbers) and you estimate two parameters, i.e. the mean of group 1 and the mean of group 2. Given the general rule for calculating the degrees of freedom you get 8 observations – 2 parameter estimates = 6 degrees of freedom.
    The calculation of the number of degrees of freedom becomes more complex with the complexity of the experimental design. In essence, you lose one degree of freedom for every parameter that you estimate. In your class you will learn how to calculate the degrees of freedom given the total number of observations and the total number of parameters that you will estimate. The t-test is probably the simplest of all analyses with regard to the calculation of degrees of freedom. (For an ANOVA using the same data above, the calculation of the degrees of freedom is more complex than for the t-test, but the two are functionally equivalent.)
    The calculation of the correct number of degrees of freedom is important because the number of degrees of freedom determines the critical value at which you accept or reject a hypothesis. This is a mechanic that will become routine with time.
    I hope this helps. Good luck with your program.  
    The example below shows you a simple t-test output.
    Two-sample T for x1 vs x2
        N      Mean     StDev   SE Mean
    x1  4      2.50      1.29      0.65
    x2  4      3.50      1.29      0.65
    Difference = mu x1 – mu x2
    Estimate for difference:  -1.000
    95% CI for difference: (-3.234, 1.234)
    T-Test of difference = 0 (vs not =): T-Value = -1.10  P-Value = 0.315  DF = 6
    The actual numbers used to come up with this table are as follows:
    x1     x2
    1      2
    2      3
    3      4
    4      5
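[Editor's note: assuming SciPy is available, the output above can be reproduced from those eight numbers; the pooled-variance two-sample t-test is SciPy's default for `ttest_ind`.]

```python
# Reproducing the two-sample t-test above (assumes SciPy is installed).
from scipy import stats

x1 = [1, 2, 3, 4]
x2 = [2, 3, 4, 5]

result = stats.ttest_ind(x1, x2)   # pooled-variance two-sample t-test
df = len(x1) + len(x2) - 2         # 8 observations - 2 estimated means = 6

print(round(result.statistic, 2), round(result.pvalue, 3), df)
# matches the table above: T-Value = -1.10, P-Value = 0.315, DF = 6
```

Note that the p-value is looked up from a t distribution with exactly those 6 degrees of freedom, which is why getting the df count right matters.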


    Erik L

    Dear Poster,
    In any statistical analysis we are attempting to develop insight into a key ratio. The simplest way that I can state this is that we are looking for a signal-to-noise ratio. The signal is typically the ‘thing’ that we’re interested in gaining more insight into. We say that something is “significant”, “critical”, or “influential” when its signal rises above the hurdle set by expected variation (or noise).
    DOE is typically a preferred methodology for establishing “causality” between inputs (or X’s) and responses (or Y’s). Why? Because we control the factors of interest and mitigate the impact of background/noise/lurking variables which are not of primary interest in the experiment. Through our experimental design we hope to maximize the possible signals that could be given by the factors, and at the same time to develop as accurate an estimate as possible of the variation in the design space. The randomization of a design is one of the most powerful tools that an experimenter has in his hip pocket to increase the believability of the results obtained in an experiment. Why? The idea is that through randomization we have, in effect, peanut-butter-spread unaccounted variation/noise/lurking variables across all of the active factors in the experiment, and will not allow this to unduly bias (or load up on) any one specific factor. The best title I have ever seen given to randomization is that it is one of three “experimenters’ insurance plans.” With randomization we feel much more comfortable stating “…this factor caused the impact on the response (either through average effects or through variability effects).” There’s much more to the mechanics of DOE that is important to understand for analyzing this ‘signal/noise’ concept. We’ve just brushed the surface on the numerator.
    What about the denominator, or the noise component? The noise component is, for a DOE, an estimate of the variation within the design space. We utilize ANOVA techniques to partition the variation into two major sources. DOFs give us some insight into the believability of the estimates for our statistics. Remember there are two primary purposes for stats. First, they are agreed-upon ways to describe data. Second, they allow us to make decisions with a known amount of risk. I like to think of DOF as my account balance in my statistical piggy bank. If I want to know something about my data, it’s going to cost 1 DOF to gain that insight. After I lose a DOF for the overall average of the response within the DOE, and one for each of the “effects” of the factors, whatever is left over is tossed into estimating my ‘noise.’ The better my estimate of the noise, the more believable are my results when I contrast my signal against that estimate of the noise. Practically speaking, more data is typically better, so, all things considered, seeing 2-3 times the DOFs in the within sources of error is a warm and fuzzy thing: we believe we have as true an estimate of the noise as is possible. There are multiple ways of getting these mysterious DOFs. Replication is the truest. There are also multiple center points (as a quasi-estimate of the variation within the design space).
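[Editor's note: the “piggy bank” accounting above can be sketched for a hypothetical design; a replicated 2^3 full factorial is assumed here purely for illustration.]

```python
# Degrees-of-freedom budget for a hypothetical DOE: 3 two-level
# factors, full factorial, replicated twice (16 runs total).
runs = 2 ** 3 * 2          # 16 runs in the experiment
df_total = runs - 1        # one DOF "spent" on the overall average

# One DOF per main effect (3), per two-factor interaction (3),
# and for the three-factor interaction (1) - each effect costs 1 DOF
# because every factor has only two levels.
df_effects = 3 + 3 + 1

df_error = df_total - df_effects   # whatever is left estimates the noise
print(df_total, df_effects, df_error)
```

Without the replication, the same design would have 8 - 1 - 7 = 0 DOFs left for error, i.e. no estimate of the noise at all – which is why replication is called the truest source of error DOFs.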
    Well, I hope this has helped to pull back the veil a little. Best of luck.
    Erik L


The forum ‘General’ is closed to new topics and replies.