
DOE Sample Size

    #29775

    Marty
    Participant

    Hello All, we are planning our first DOE as a result of a Six Sigma project.  The team has narrowed down the significant factors, and we will conduct a 2-level, 6-factor DOE.  This will consist of 32 runs with a resolution of V.  Our question is: “How many samples do we collect for each run?”  Will one sample for each run be sufficient?  Would we be better served to take an average of 10, 20, or 30 samples for each run?  I have pushed for one sample from each run and believe this to be correct.  However, the team is somewhat uneasy about making a decision based on only one sample.  Any help would be appreciated.
    Thanks,
    Marty

    #76848

    James A
    Participant

    Marty,
    I remember my DOE trainer saying that averaging data loses you good information, and the more you average, the more you lose.  I think it was good advice.
    I’d stick to your guns.
    Just my tuppence worth.
    Regards
    James A.

    #76849

    Ropp
    Participant

    Marty –
    It’s a question of power.  Do you have MINITAB?  If not, find someone with good training in power calculations and run this by them.  If you have MINITAB, go into the power and sample size calculator and put in your design.  It won’t even calculate a power for this design directly, because you have no degrees of freedom for error.  However, if you enter two center points – which you may or may not run, but which will allow the power calculations – you will find you have only a 17% chance of seeing a one-sigma effect with only one replicate.  In other words, you will not see a one-sigma effect.  Your team will be discouraged and think that they have chosen the wrong factors, and your credibility will be shot.  Worse, you may miss a powerful effect.  Indeed, to achieve 90% power you will require an effect that is 7.4 times larger than the process standard deviation.  Is that the minimum effect you want to detect?
    Now go back into the calculator and put down just two replicates.  Your power is 97% for finding a one-sigma effect.  You have 90% power to find an effect that is 0.83 times your standard deviation.  Is the measurement of this characteristic so costly or difficult that two reps is out of the question?  Always, always do power calculations and decide from them the economic sample size to find your required information.
    I believe the instructor James A. is quoting is either being taken out of context or he is an incompetent hack.  Can someone explain how averaging “loses” good information?  In fairness to him, I will believe he is being quoted out of context and had some reason or situation to say such an outlandish thing.  Perhaps he meant that outliers would not be detected.  However, normal residual analysis will cover that.  Replication and repetition are the way to achieve statistical validity in the experiment.
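    A rough sketch of this power calculation in Python, using scipy’s noncentral t distribution – this is not Minitab’s routine, and the error degrees of freedom below are assumptions inferred from the numbers quoted above:

```python
# Sketch of the power calculation for one effect in a 2-level factorial.
# Not Minitab's exact routine; the df choices below are assumptions.
from scipy import stats

def effect_power(delta_sigma, n_runs, df_error, alpha=0.05):
    """Two-sided power to detect an effect of delta_sigma process
    standard deviations.  An effect is the mean of the high-level runs
    minus the mean of the low-level runs, so its standard error is
    2*sigma/sqrt(n_runs); sigma cancels in the noncentrality parameter."""
    ncp = delta_sigma * n_runs ** 0.5 / 2.0
    t_crit = stats.t.ppf(1 - alpha / 2, df_error)
    return (1 - stats.nct.cdf(t_crit, df_error, ncp)
            + stats.nct.cdf(-t_crit, df_error, ncp))

# One replicate (32 runs): two center points leave only 1 error df,
# so power for a one-sigma effect is dismal (about the 17% quoted).
print(effect_power(1.0, 32, df_error=1))

# Two replicates (64 runs): 22 model terms leave 42 error df.
print(effect_power(1.0, 64, df_error=42))    # about 97%
print(effect_power(0.83, 64, df_error=42))   # about 90%
```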

    #76850

    Opey
    Participant

    Marty,
    I assume this is supposed to be a screening experiment (if I don’t assume this, I can’t focus too well on the question), and that your project involves finding sources of process variation.
    As such, the purpose of this DOE is to identify the critical Xs that are causing variation in your process.  You said that your team “has narrowed down the significant factors.”  Again, in order to focus, I must assume that this narrowing down was legitimate, based on good knowledge (not opinions or hunches) that none of the excluded factors were critical Xs.
    All this being given, here’s my simple answer: if you can’t see any main effect signals with a 16-run Res IV, there aren’t any there.  A corollary: if you need 32 runs (16 runs being insufficient) to see main effect signals, then the signals aren’t that big.  (This depends upon your varying the 6 factors enough to overcome experimental error.)  If, however, you vary the factors at least as much as they vary naturally in the process and still see no signal, your measurement system is probably overwhelming the signal – go to work on that.
    Taking 10, 20, or 30 samples per run will beat down variation from the measurement process.  If your measurement process is still a big source of variation, you need to take a step back and work on that before trying to find significant process variables.  Many an SS project has succeeded by improving only the measurement process.
    Opey

    #76852

    Robert Butler
    Participant

     
    If it is not too difficult to take multiple samples for each experimental condition, it is worth the effort, if for no other reason than team comfort.  If you take the time to do this, then you should do the following:
    1. Label each sample to indicate time order.
    2. Choose the first sample from each group of samples and perform the planned set of measurements.
    3. Keep the other samples in reserve.
    4. If any of the measured results for any particular experiment are “surprising,” pull the additional samples and measure them for confirmation.  If the additional samples confirm the initial measurement, put them aside and keep your original measurement.  If the duplicates (note these are NOT replicate measurements, because they constitute multiple samples from the same experimental run) do not confirm the initial results, you will have to investigate to determine which measurement is correct.
    5. Run your analysis with a single measurement for each independent experimental run from your DOE.
      I wouldn’t recommend averaging anything.  You can hide the world behind an average and never see it.  You also do not want to include all of your duplicate measurements in your analysis.  The reason for this is that your software will interpret these duplicates as genuine replicates and you will wind up with an error estimate based on duplicate, not replicate, variability.  Duplicate variability will be much smaller than replicate variability and the end result will be an analysis that indicates significant terms where none really exist.
      If questions concerning such things as trending over time should arise you can take advantage of your stored samples and do such things as analyze the last sample in each run and then rerun your DOE analysis to see if the model terms change or if there is a significant shift in the coefficients of the original model.
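    To see why feeding duplicates into the software as replicates is dangerous, here is a small simulation (my own illustration, with hypothetical numbers): a factor with no true effect, large run-to-run variation, and small measurement variation.

```python
# Simulation of the warning above: treating duplicate measurements from
# the same run as independent replicates understates the error and can
# make a null factor test "significant".  Values are illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_runs, n_dups = 8, 10
run_sd, meas_sd = 1.0, 0.1        # run-to-run noise >> measurement noise

A = np.repeat([-1, 1], n_runs // 2)          # factor with NO true effect
runs = rng.normal(0.0, run_sd, n_runs)       # true run-to-run variation
dups = runs[:, None] + rng.normal(0.0, meas_sd, (n_runs, n_dups))

# Correct analysis: one value per run, error reflects run-to-run noise.
_, p_run = stats.ttest_ind(dups[A == 1, 0], dups[A == -1, 0])

# Wrong analysis: all 80 duplicates pooled as if independent runs; the
# test now thinks it has ten times the information it really does.
_, p_dup = stats.ttest_ind(dups[A == 1].ravel(), dups[A == -1].ravel())

print(f"per-run p = {p_run:.3f}, pooled-duplicate p = {p_dup:.3g}")
# The pooled p-value is typically far smaller, flagging a spurious effect.
```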

    #76854

    TomF
    Member

    Hi Marty,
    The philosophy regarding DOEs that I, personally, have bought into is: get the information you need at minimal cost.  Remember that DOEs are expensive.  You want to get results – even if the result is that there are no significant factors.
    This philosophy would suggest that you run each trial once and not replicate.  Otherwise, you could run a full factorial and bypass the sophisticated analysis.
    In the analysis, you are finding the difference in the average of each level, balanced across all the other factors.  In other words, you are comparing the averages for each level of a factor for a significant difference against the variation of the process.  For each trial where Factor A is set at the high level, there is a corresponding trial, with all the other factors the same, where Factor A is set at the low level – as the sketch below illustrates.
    The fact that experiments are balanced and compare averages is what gives DOEs their strength.
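    As a toy illustration of that balance (made-up responses for a 2^3 full factorial), each effect is just the difference between the averages of the two halves of the runs:

```python
# Toy 2^3 full factorial (hypothetical responses) showing that each
# main effect is the average at the high level minus the average at
# the low level, with the other factors balanced across both halves.
import numpy as np

X = np.array([[a, b, c] for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)])
y = np.array([3.1, 2.9, 3.4, 3.2, 5.0, 5.2, 5.5, 5.6])  # made-up data

for j, name in enumerate("ABC"):
    effect = y[X[:, j] == 1].mean() - y[X[:, j] == -1].mean()
    print(f"effect of {name}: {effect:+.2f}")
```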
    Some additional thoughts: I highly recommend doing a gage study and SPC on the process BEFORE starting your DOE.  As stated in a post above, many problems can be traced back to measurement variation.  Since DOEs are very sensitive, measurement variation can wreak havoc on your results.  Likewise, an unstable process (one out of statistical control) could potentially give you misleading results.  DOEs are most effective when you have reduced the variation of both the process under study and the measurement process.
    Lastly, I have found it beneficial to run smaller experiments before a large screening experiment, especially if you are working in a production setting.  I know that this may seem to contradict the purpose of a screening experiment, but I have found that even under the best of plans some unanticipated variation creeps into the experiment, and decisions will need to be made in the middle of the experiment.  You will probably learn as much from observing the experiment as from the analysis.
    Based upon the observations you have made during a “pre-screening” experiment, you can run more effective follow-up experiments.
    I hope that this helps you.
    Tom F

    #76861

    Marty
    Participant

    OOPS! Newbie me posted this in the main forum first.  Here it goes again.
    Thanks to all of you for your responses.
    Dave – I do have Minitab and have used the Power and Sample Size feature for ANOVAs and t-tests.  I wasn’t sure how, or if, I could use it for a DOE.
    Opey – I would say that this is a screening experiment.  We are looking for those X’s that are causing the most influence and variation in the process.  These X’s came from smaller screening trials and past experience on similar products.  I do believe that we need to vary one of the factors more and will propose this to the team.  In the order of the DMAIC cycle, we have already done Gage R&R’s and I’m very satisfied with the measurement system.
    Robert – Good suggestions.  This would help to satisfy both schools of thought on the team.  If I’m interpreting your response correctly, I should push for replicates as opposed to duplicates at each run.  I did not know that the variation of duplicates is less than that of replicates, and I’m not sure I understand why yet.  I’ll look into it further.
    TomF – We have done some preliminary screening experiments and Gage R&R’s.  I think that you’re suggesting no replicates (sorry if I read this the wrong way).  I thought that a replicate would only help to strengthen the results and make the picture clearer.
    Thanks again for your responses.  I’ll take some of your suggestions to our team meeting.  You’re all right that a DOE can be expensive and time consuming.  I don’t want to waste either and come up empty handed.
    Marty

    #76864

    TomF
    Member

    Hi Marty,
    I apologize if I didn’t express myself clearly. 
    I believe that in most typical fractional factorial DOEs, replication does not add value to the experimental results.  This is because the ANOVA already calculates and compares averages.
    It sounds like you have done your homework.  I hope your experiment is successful.
    Tom F

    #76872

    James A.
    Participant

    Dave,
    Ouch!  Am I pleased I didn’t mention the poor guy’s name.  The point he was trying to make was that if you have your feet in the freezer and your head in the oven, then on average you could say you’re comfortable.
    In real life this is obviously not the case.  As a subsequent poster said, “You can hide the world behind an average, and lose it” (or something like that).
    If you want to use averages in a DOE, then fine, but the results you get may not always get you where you want to be.
    That is the point I was trying to make. Sorry if I upset your sensibilities.
    Regards
    James A.

    #76878

    Robert Butler
    Participant

    I was re-reading all of the posts to this thread last night, and while each post is excellent advice, I think that all of us are at risk of misleading Marty, because several of us (myself included) have used the same term to mean different things.  This becomes apparent when I re-read Marty’s thank-you to all of us.
    Given the complexity of the discussion, I would first echo Dave’s advice to another poster on a similar topic – get a copy of Box, Hunter, and Hunter’s book Statistics for Experimenters.
    I would like to address what I think is a key mis-communication among all of us (if I am in error in my understanding of the previous posts, please accept my apologies in advance).
    Replication vs. Duplication
    Central to the discussion was the issue of experimental replication.  A genuine replicate of an experimental design point requires the experimenter to COMPLETELY rerun the experimental condition.  This means that you have to start all over and run the experiment again.  Thus, if you are going to replicate an entire design, you will have to run double the number of experiments.  While, as Dave noted, this will drastically increase your power, it can also be very costly.
      The compromise that is often used is to run either a replicated center point (assuming that it is possible to build a center point in the design) or to replicate one or two of the design points in the design.  While you will not be able to detect as small a difference as you may wish, you will still find that you are able to find significant effects if they are indeed present.
      A duplicate is a repeat measure on the same experimental condition.  For example, if I am measuring output viscosity of a process and for a single experimental condition I take repeated measurements on the viscosity of that condition every minute for 15 minutes I am taking a duplicate measurement. Multiple grab samples from the output of a machine for a given experimental condition also constitutes duplicate measurements. If I try to treat the results of these duplicate measurements as replicates what I will do is substitute analytical variance for run-to-run variance.  In general, analytical variance is much smaller than run-to-run and the computer program will use the analytical variance to determine the significance of the effects.  The end result will be that a number of effects will test significant when they really aren’t.
      It is possible to use duplicate measurements in your analysis.  The field is called repeat measures analysis and you will need the services of a highly trained statistician in order to have any hope of doing it.
    If you can get the Box, Hunter, Hunter book, check section 10.6 – calculation of standard errors for effects using replicated runs – for further discussion of the difference between duplicate and replicate.  You might also want to read section 10.8, which discusses ways of getting an estimate of error if no replication can be performed.
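    The arithmetic behind that section is simple enough to sketch (my own illustration with hypothetical numbers, not the book’s code): pool the within-condition variances from genuinely replicated runs, weighted by their degrees of freedom, to get the standard error of an effect.

```python
# Pooling pure-error variance from genuinely replicated runs, in the
# spirit of BHH section 10.6.  Data and run count are hypothetical.
import numpy as np

# Each entry is a complete, independent rerun of one design point.
replicated_runs = {
    "point 1": [10.2, 10.8],
    "point 5": [14.1, 13.3],
}

dfs = [len(v) - 1 for v in replicated_runs.values()]
variances = [np.var(v, ddof=1) for v in replicated_runs.values()]
s2 = sum(d * v for d, v in zip(dfs, variances)) / sum(dfs)  # pooled s^2

# In a 2-level design with N runs, an effect is a difference of two
# means of N/2 runs each, so Var(effect) = 4*s^2/N.
N = 16
se_effect = (4 * s2 / N) ** 0.5
print(f"pooled s^2 = {s2:.3f}, SE(effect) = {se_effect:.3f}")
```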

    #76888

    Marty
    Participant

    Robert, yes, that did clear up some things.  Even in the books I’ve been referencing, the duplicate/replicate issue can be confusing.  I thought that the samples collected at each level would show within-run variation and not run-to-run variation.
    I’m handling this experiment at our plant for another BB.  It is his project, but he’s located at one of our Michigan plants, and most of the manufacturing takes place at our Ohio plant.  We did have a meeting yesterday and were able to eliminate 2 more factors based on the input of the design engineer.  The experiment is reduced to a 4-factor, 2-level experiment.  We did discuss runs, replicates, and sample size for each condition.  I suggested a full factorial (16 runs), 1 replicate for power, and 1 sample at each condition, for a total of 32 runs.  The BB in charge would like to run a fractional factorial with 8 runs, 1 replicate, and 25 samples at each condition, for a total of 400 samples.  I asked him why he wanted so many samples and to clarify the purpose of the experiment.  The purpose is to optimize the process and identify which variables have the greatest impact and are causing the most variation.  I don’t think that his experiment design is going to give him what he’s looking for.  There will be some confounding of the two-way interactions with the fractional design, and I don’t know how we can see the variation if we are averaging the samples.  Would it be valid to use the sample std. dev. of each 25-piece sample as an output?  I’m no statistician, but this seems wrong somehow.  I could be way off, but the design doesn’t look right to me for his purpose and won’t give valid results we can use.
    I will look for the book you both mentioned.  I’ve seen it referenced in other posts.  I have been looking through Implementing Six Sigma (F. Breyfogle) and Basic Statistics (Kiemele, Schmidt, & Berdine).  The Basic Stats book does have a section and table for “sample sizes for estimating variance models with a 95% chance of finding a variance shift factor.”  When k=4 and there are 16 runs, they suggest a sample size of 3 at each condition.

    #76893

    Robert Butler
    Participant

    The design your BB is proposing – a half fraction of a 2^4, one complete replicate of the entire fractional factorial, and 25 samples per condition, for a total of 16 runs and 400 samples – would permit an assessment of the effects of process changes on the within-run variability, and an assessment of the impact of factors on process variability too.  The design you are proposing will permit an assessment of the impact of process factors on process variability.
    Given what you have written, it sounds like your BB is confusing within-run and between-run variability.  If within-run variability is indeed of concern, then as long as you understand that you will have to compute within-run variation and run-to-run variation for each experimental condition, and model the two types of variation independently, you should have no problem.  I’ve built and analyzed a number of designs over the years that focused on the issue of variables impacting process variability, but I’ve never had to look at within-experiment variation.
    For assessing variables impacting process variability, the approach that I have used is to take the resultant design, add one of the design points to that design (for a design replicate), and then replicate the entire design, including the replicate point.  Thus for each design point you will have a two-point estimate of the variability associated with that particular experimental condition, and you will have a two-point estimate for the variability of the replicate point as well.
    If you run a stepwise regression against these computed variabilities, you can develop a model describing the process variability as a function of process variables.  You can also use the same data to identify those variables impacting the process mean by running your analysis in the usual manner.
    Since, with this approach, you only have a two-point estimate for the variation at each design point, you should focus on big hitters first and worry about interactions later.  Both of your designs will give only two-point estimates of the process variability associated with each design point.  A possible compromise between you and your BB would be to take your full factorial and select those experimental conditions corresponding to the half fraction.  Randomize this half fraction and run it and its full replicate first.  You will have to include a 9th data point from the fraction for purposes of replication of the process variation.  Analyze the data from this and then make a decision as to whether or not you want to continue with the other half of the experiment.
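    A rough sketch of that variability analysis (hypothetical data, with a plain OLS fit standing in for the stepwise regression):

```python
# Modeling process variability from a fully replicated design, along
# the lines described above.  Data are hypothetical; ordinary least
# squares stands in for a stepwise routine.
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Coded 2^2 design, each point run twice (independent replicates).
df = pd.DataFrame({
    "A":      [-1, -1,  1,  1],
    "B":      [-1,  1, -1,  1],
    "y_rep1": [10.1, 12.3, 9.8, 15.0],
    "y_rep2": [10.9, 11.1, 10.2, 13.2],
})

# Two-point estimate of variability at each design point; the log keeps
# the fitted variability positive.
df["log_s"] = np.log(df[["y_rep1", "y_rep2"]].std(axis=1, ddof=1))

# Main effects only – two-point variance estimates are too noisy to
# support interactions ("big hitters first").
X = sm.add_constant(df[["A", "B"]])
print(sm.OLS(df["log_s"], X).fit().params)

# The mean model uses the same runs in the usual way:
y_mean = df[["y_rep1", "y_rep2"]].mean(axis=1)
print(sm.OLS(y_mean, X).fit().params)
```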
