iSixSigma

2k or CCD

#55575

    Brendans14a
    Participant

    Hi all,

    I currently run 2k DoEs fairly often, usually with 3 or 4 factors.

    I always use centerpoints in the designs so I can evaluate for curvature (if required).

    This gives rough estimates of the parameter levels needed to yield a target response.

    This works well when a confirmatory sample run at the optimum parameters is carried out.

    I have been looking at CCDs (central composite designs) to replace the 2k DoEs. From my research I don’t see why not. The plus side with a CCD is the option of axial points, or any other variation around the design I see fit.
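
    For anyone who wants to see the two layouts side by side, here is a minimal sketch in Python (numpy/itertools, coded units; the factor count and number of center points are illustrative assumptions, not anything from the posts above) of the runs a 2k-plus-center-points design uses and the extra axial runs a CCD would add:

    ```python
    from itertools import product
    import numpy as np

    k = 3                                   # number of factors (illustrative)
    corners = np.array(list(product([-1, 1], repeat=k)))   # the 2^k factorial runs
    centers = np.zeros((4, k))              # e.g. 4 center-point replicates (assumption)
    alpha = (2 ** k) ** 0.25                # rotatable axial distance, ~1.68 for k = 3
    axial = np.vstack([sign * alpha * np.eye(k)[i]
                       for i in range(k) for sign in (-1, 1)])  # the CCD "star" runs

    print("2^k runs + centers:", len(corners) + len(centers))   # 12
    print("extra axial runs a CCD adds:", len(axial))           # 6
    ```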

    My question is this:
    Has anyone a good knowledge of 2k versus CCD?
    What are the pros and cons of each?

    Many thanks.

    Brendan.

    #200582

    MBBinWI
    Participant

    @Brendans14a – I assume by “2k” you mean a two level full or fractional factorial design. By adding center-points you can evaluate for curvature. You can even use this information to create a surface plot. The advantage of a CCD is that the “star points” extend outside the region of the corner points, which provides better understanding over the entire design space than a Box-Behnken type design. Generally speaking, if you have good information that the optimal solution is in the middle of the design space, then use a Box-Behnken design. If you don’t know or suspect that the optimal solution might be closer to the outside of the design space, then use a CCD.
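
    To put rough numbers on that geometry, here is a small sketch (Python/numpy, coded units, 3 factors assumed, rotatable axial distance assumed for the CCD) showing that Box-Behnken runs never leave the -1/+1 range on any factor, while CCD star points reach beyond it on their axis:

    ```python
    from itertools import combinations, product
    import numpy as np

    k = 3
    # Box-Behnken runs: +/-1 on each pair of factors, 0 on the rest (centers omitted here).
    bb = []
    for i, j in combinations(range(k), 2):
        for a, b in product([-1, 1], repeat=2):
            row = [0] * k
            row[i], row[j] = a, b
            bb.append(row)
    bb = np.array(bb)

    alpha = (2 ** k) ** 0.25                 # rotatable CCD axial distance
    print("Box-Behnken reach on any one factor:", bb.min(), "to", bb.max())      # -1 to 1
    print("CCD star-point reach on its axis:", round(-alpha, 3), "to", round(alpha, 3))
    ```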

    Good luck.

    #200586

    Chris Seider
    Participant

    I would advise against doing a CCD immediately. Confirm the cause-and-effect relationships from the experimental design first; you can always augment with CCD points later if the factors turn out to be statistically significant, etc.

    #200597

    MBBinWI
    Participant

    @cseider – I assumed that if the number of factors was already down to 3 or 4 that a screening design had already been run to weed out inconsequential factors.

    #200598

    Chris Seider
    Participant

    @MBBinWi

    Hey my friend and colleague….I wasn’t considering your post when I gave my advice. :)
    You give good advice, sir.

    #200599

    Sergey
    Participant

    Building on what has already been said…
    If you run DOEs on a regular basis, you might anticipate curvature during planning based on your experience. If it is highly expected, you may go directly to response surface designs. Box-Behnken can save you some trials (for 3 factors, 15 trials versus 20 in a CCD). The big assumption here is that the optimal settings sit inside the cube for Box-Behnken. This design also eliminates the most extreme points (which are usually not feasible, like +1, +1, +1). I believe it works well, but I haven’t come across a good investigation of CCD vs. B-B.

    Any ideas/links on that?
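
    For reference, a quick sketch of where those run counts come from (Python; the center-point counts are the usual textbook defaults and are an assumption here):

    ```python
    k = 3

    bbd_edge_runs = 4 * (k * (k - 1) // 2)   # +/-1 on every pair of factors -> 12 runs
    bbd_runs = bbd_edge_runs + 3             # plus 3 center points -> 15

    ccd_runs = 2 ** k + 2 * k + 6            # 8 corners + 6 axial + 6 centers -> 20

    print("Box-Behnken, 3 factors:", bbd_runs)   # 15
    print("CCD, 3 factors:", ccd_runs)           # 20
    ```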

    #200602

    Robert Butler
    Participant

    I’m not aware of any experimental design that assumes the optimal settings are inside the hyper-geometric region of the X matrix. The usual procedure (at least my usual procedure) is that one builds the design, runs the analysis and the resulting equations will point in the direction of optimum settings. Sometimes these settings are inside the hyper-cube but more often the equations point to a region external to the initial design region. Once the direction has been identified I will build a second (smaller) design with the variable settings determined by the equations from the initial design.

    #200605

    Brendans14a
    Participant

    Folks,

    Many thanks for your advice; it is all taken on board.
    My typical DoE approach is as follows:
    1) Run a screening DoE to investigate main effects and identify potential significant factors.
    2) Run a 2k full factorial DoE on significant factors from 1) with center-points to investigate main and interaction effects.
    3) If curvature is detected, augment the 2k to a CCD (not face-centered points, as the process is not restricted by limits), or use a BBD if I suspect the optimum lies between the factor highs and lows, to investigate quadratic effects.
    4) Check R^2 and ensure it is at least 80%. This is simply my opinion, as the nature of the studies I carry out should be predictable; hence, high R^2 values should go without saying. Does anyone have any other opinions on this hypothesis? Do any of you rely on R^2 for models?
    5) If no curvature is observed, determine the optimum factor levels from the 2k DoE and run a confirmatory study at the 95% CL & 95% R (i.e. approx. 60 samples for a binomial distribution with c = 0) to prove the factor levels are correct (see the sketch after this list).
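
    A quick check of the sample size in 5), assuming the standard zero-failure "success run" formula n = ln(1 - C)/ln(R) is what is meant (Python):

    ```python
    import math

    confidence = 0.95   # 95% confidence level
    reliability = 0.95  # 95% reliability to demonstrate

    n = math.log(1 - confidence) / math.log(reliability)
    print(math.ceil(n))   # 59 samples, i.e. roughly the 60 quoted in step 5
    ```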

    On another note, which some might have opinions on:
    What I like about response surfaces is the contour plots. To me, the center of a (large enough) plateau on a contour plot indicates an ideal place to pick the factor set-points, since this is most likely to reduce variation in the response of interest. Would that be a fair statement? This is my own personal takeaway from contour plots.

    Cheers folks.

    #200606

    Robert Butler
    Participant

    If you are looking for a region of minimum variance you will want to run a Box-Meyers analysis. This takes the results from your design and analyzes the output in terms of variance minimization instead of mean maximization. When you have the results of this analysis you can look at the equations describing mean shifts and the equations describing variance shifts and identify those settings that will give you the best trade off between variance minimization and means maximization.

    #200607

    Brendans14a
    Participant

    @RobertButler I really like this!

    I must investigate. Any good references?

    #200608

    Robert Butler
    Participant

    Here you go

    If variance as a response from a DOE is your concern then the most efficient approach to the problem is the Box-Meyers method.
    1. Use a resolution 4 or higher design to avoid confounding interaction and dispersion effects.
    [Note: you can do this with a Res III design but you are now in the realm of gambling. If Res III is all you can get go ahead and run this analysis but keep in mind that you are confounding interaction and dispersion effects]

    2. Fit the best model to the data (that is, Y = f(X1, X2, X3, etc.))
    3. Compute the residuals
    4. For each level of each factor, compute the standard deviation of the residuals.
    5. Compute the difference between the two standard deviations for each factor (that is, between the standard deviation for all of the + levels and the standard deviation for all of the – levels)
    6. Rank the differences in #5
    7. Use a Pareto chart, a normal probability plot, or the ratio of the log of the variance of the + to the variance of the – for each factor to determine the important dispersion effects.
    For purposes of control
    1. For factors with important dispersion effects, determine the settings for least variation, and set location-effect factors to optimize the response average.
    2. If a factor for minimizing variance is in conflict with a factor for optimizing location use a trade-off study to determine which setting is most crucial to product quality.
    From: Understanding Industrial Designed Experiments, 4th Edition, sections 5-28 to 5-32, Schmidt and Launsby, 1997
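
    A minimal sketch of steps 2 through 7 in code (Python/numpy; the design matrix and response below are toy placeholders you would swap for your own data, and the log variance ratio is used as the step-7 summary):

    ```python
    from itertools import product
    import numpy as np

    X = np.array(list(product([-1.0, 1.0], repeat=3)))   # 2^3 coded design (toy)
    rng = np.random.default_rng(0)
    noise_sd = np.where(X[:, 1] > 0, 0.1, 1.0)            # factor 2 drives spread (toy)
    y = 5 + 2 * X[:, 0] + rng.normal(0.0, noise_sd, size=len(X))

    # Steps 2-3: fit a main-effects location model and take the residuals.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef

    # Steps 4-7: spread of residuals at the + and - level of each factor,
    # summarized as a log variance ratio; large absolute values flag dispersion effects.
    for j in range(X.shape[1]):
        s_plus = resid[X[:, j] > 0].std(ddof=1)
        s_minus = resid[X[:, j] < 0].std(ddof=1)
        print(f"factor {j + 1}: ln(s+^2/s-^2) = {np.log(s_plus**2 / s_minus**2):+.2f}")
    ```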

    #200609

    Sergey
    Participant

    @Brendans14a

    Some comments regarding your guidance list
    On 3): it is important to note that a CCD can be built by augmenting the 2k, but a BBD will require completely new trials. For your overall scheme, a BBD therefore seems less practical.

    On 4): you mention only R-Sqr. The target for it depends on the industry and may vary from 50 to 100%. The more important thing is not to rely only on R-Sqr! The significance of the model, the factors, and their interactions can be seen from the p-values in the ANOVA table, and lack of fit, pure error, and total error are important for model adequacy. Use residual analysis (graphical checks may be enough: a normal probability plot, a histogram, and run charts of residuals vs. fits and vs. order), looking for points more than two standard deviations from the fit. Based on all of these, decide whether your model is OK.
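
    A minimal sketch of those graphical residual checks (Python with numpy/scipy/matplotlib; the fitted values and residuals below are simulated placeholders, not data from this thread):

    ```python
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    rng = np.random.default_rng(1)
    fits = rng.uniform(10, 20, size=30)      # placeholder fitted values
    resid = rng.normal(0, 1, size=30)        # placeholder residuals

    fig, axes = plt.subplots(2, 2, figsize=(8, 6))
    stats.probplot(resid, plot=axes[0, 0])   # normal probability plot
    axes[0, 1].hist(resid, bins=8)           # histogram
    axes[1, 0].scatter(fits, resid)          # residuals vs. fitted values
    axes[1, 1].plot(resid, marker="o")       # residuals vs. run order
    for ax, title in zip(axes.flat, ["Normal plot", "Histogram", "Vs fits", "Vs order"]):
        ax.set_title(title)

    # Flag points more than two standard deviations from the fit, as suggested above.
    print("flagged runs:", np.where(np.abs(resid) > 2 * resid.std(ddof=1))[0])
    plt.tight_layout()
    plt.show()
    ```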

    Hope it helps.

    #200616

    Brendans14a
    Participant

    @rButler:

    This is great information, and I can only assume it took you some time to acquire this knowledge. It is not a completely foreign language to me, but before I practice I have decided to purchase the book you mentioned.
    I like to delve in and be 100% sure of what I do; therefore, reading this reference for myself is the way to go.


    @ssobolev:

    That was an obvious mistake on my part. How could I possibly augment a 2k with a BBD? The two have different base designs and are not comparable. Thanks for pointing it out; that could have been embarrassing.

    On to the R^2: that is a fair point. I am guessing it is possible to have an R^2 value lower than 50% but a p-value for the lack-of-fit statistic greater than the significance level (i.e. fail to reject the null hypothesis), meaning the model adequately fits the data gathered, but the variation in the factors making up the model is not severe (high or low) enough to yield a high R^2.
    I may be raving a bit here.

    B.

    #200618

    Robert Butler
    Participant

    Check interlibrary loan first before deciding you have to make a purchase. Essentially, what I’ve done is copy the how-to instructions from the book.

    #200623

    MBBinWI
    Participant

    @rbutler – I doubt that I’m teaching you anything new, but typically the Box-Behnken type DOE is used when the optimal solution is thought to be within the design space, and not close to the corner points, as it only uses edge midpoints and center points. The central composite design, on the other hand, uses “star” points that lie beyond the faces of the cube, thus providing some information on the design space at and beyond the corner-point factor levels. While any design can be extrapolated beyond the corners, with less certainty the further away you get, a CCD extends the experimental space and therefore expands the area of interpolation.

    #200624

    Robert Butler
    Participant

    Some additional thoughts on the following points:

    4) Check R^2 and ensure it is at least 80%. This is simply my opinion, as the nature of the studies I carry out should be predictable; hence, high R^2 values should go without saying. Does anyone have any other opinions on this hypothesis? Do any of you rely on R^2 for models?
    5) If no curvature is observed, determine the optimum factor levels from the 2k DoE and run a confirmatory study at the 95% CL & 95% R (i.e. approx. 60 samples for a binomial distribution with c = 0) to prove the factor levels are correct.

    Short version for 4): R^2 by itself, no; opinion by itself, no; other opinions, yes; relying on it by itself, no.
    Short version for 5): you need to do more in 4) before trying anything in 5).

    Regarding 4): you have missed a critical part of the practice of regression analysis: graphical analysis of the various relations between the individual X’s and the Y, as well as evaluation of the residuals. You can’t properly judge the value of a regression if you don’t run a thorough residual analysis.

    For example, you can have a fantastic R2 which, once you check the residual pattern, proves to be of no value; the reverse is also true. I don’t know of any statistician who would pass judgment on a regression just by looking at R2. For proof, try running a simple regression on the following x,y data pairs: (1,.15), (1.1,.2), (2,.22), (1.4,.3), (1.1,.31), (1.6, .17), (1.3, .2), (2.1, .11), (1.8, .14), (8, .9).

    With all 10 points you will get R2 = .85 and p < .0001. Now run it again without the last x,y pair: poof! R2 = .17. A check of the residuals in the first case would indicate that the whole regression was dependent on that 10th x,y pair. Essentially, the entire data set consists of two points: a fuzzy one in the region of X from 1 to 2 and a single data point at X = 8.
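
    A quick way to reproduce that comparison, using only the x,y pairs listed above (Python with scipy.stats.linregress):

    ```python
    import numpy as np
    from scipy import stats

    x = np.array([1, 1.1, 2, 1.4, 1.1, 1.6, 1.3, 2.1, 1.8, 8])
    y = np.array([.15, .2, .22, .3, .31, .17, .2, .11, .14, .9])

    full = stats.linregress(x, y)            # all 10 points, leverage point included
    trimmed = stats.linregress(x[:-1], y[:-1])   # drop the (8, .9) pair

    print(f"all 10 points:        R2 = {full.rvalue ** 2:.2f}")
    print(f"without the (8, .9):  R2 = {trimmed.rvalue ** 2:.2f}")
    ```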

    As for a high R2 going without saying: again, not true, even if we ignore cases with influential data points. The fact of the matter is that many things you have to measure and examine with regression methods are naturally noisy, and there isn’t much you can do about it. As before, you have to plot the data and look at your regression in light of your plots.

    Many years ago I ran an analysis of water contaminants. The R2 was 4%; however, the trend line was in the direction of increasing contamination over time and the slope of the regression line was significantly different from 0. What the analysis showed was that if the pollution level of the contaminant (cyanide) continued to increase at the rate identified by my regression, the water would be rendered useless for consumption. The analysis was ignored, and about 3 years later the level of contamination reached the point where the water was not fit for human or animal consumption.

    #200629

    Chris Seider
    Participant

    Remember that one reason for a lower R2 might be a poorly performing measurement device.

    #200663

    MBBinWI
    Participant

    @cseider – and another can be uncontrolled noise factors.
