I recently ran a customized factorial design with 4 factors, 2 levels, and 6 replicates. Due to material shortages, we had to halve the number of replicates for certain factors and also remove some of the 2-way interactions, so we ended up with a total of 48 runs.
As a result, the SE Coef differs for the factors affected by the customization: it is larger for them than for the other factors. What is the best way to calculate the SE Coef for those factors in this case?
Also, could anyone advise further on SE Coef, with reference to the Minitab link below?
As noted in the Minitab link you provided, the number of degrees of freedom for the computation of the SE is a function of the number of center points and corner points. If these counts vary for the predictor terms of interest, then the degrees of freedom for the estimates of the SE of the various coefficients will vary as well. This, in turn, will result in different estimates of SE. The best way to compute the SE is the way it is done in Minitab.
I’m not trying to do Monday morning quarterbacking, but the description of your research effort gives the impression of very poor planning and execution. 4 factors at 2 levels would be a total of 16 experiments. A full replication of that full factorial design would have been 32 experiments. Calling for six full replicates is extreme overkill and suggests a complete lack of understanding of the point of designed experimentation.
If you still have the data and if you actually ran all of the points of the design the first time through then I would recommend you re-run the analysis on just those points and see what you see. Next, add just a couple of replicates of individual points and see what happens. Unless you have a very odd situation, the difference between what you see with the above approach and what you saw with all of your data should be minimal to non-existent (we are assuming here that you are running a complete regression analysis and not just plugging numbers into a computer and generating output).
Thank you for the reply.
The reason for having 6 replicates for 4 factors at 2 levels is the large variability we expect in the experimental error. With only 2 replicates we would have obtained completely different output: the testing of the DUT is highly subject to the outside environment and to other components in the same product, so 2 replicates may not represent the real level of noise, and we might have ended up seeing high P-values for most of the factors. The purpose is really to maintain a high level of POWER (>80%) to ensure we capture the significant factors.
With respect to the best way of estimating SE Coef, I know MINITAB is the best (easiest) way, but could you advise how this can be done manually for a customized FFD? I can send you the raw data and would appreciate your help with it. My understanding is that fewer degrees of freedom for a given factor-level combination will reduce its experimental POWER, so we are more likely to accept the NULL.
In my reduced experiment with 48 runs in total, MINITAB shows an ‘n’ of 32 for the factors with 24 runs at each level (‘+’ / ‘-’), whereas the reduced factor(s) with 36 runs at ‘+’ and 12 at ‘-’ have a lower ‘n’ of 24. That makes sense to me, but how is this calculated from the MSE?
Do you want to have the data?
The problem here appears to be a misapprehension of the concept of power. Power does not guarantee an alpha level, it has nothing to do with ensuring the capture of significant factors and it has nothing to do with being more likely to accept a null. If you run an experiment and if you get an effect that meets your criteria of significance (say alpha < .05), then that effect is going to test out with an alpha < .05 regardless of the power.
Power is a measure of the confidence that, should you completely re-run your experiment, you will find the same effect to be significant at an alpha close to the level of significance you found the first time around. The philosophy of experimental design focuses on the identification of significant effects (alpha level) and beta (or power = 1-beta) is left to fend for itself.
Operating in this manner guarantees minimum effort for maximum return. Should the initial effort find one or more effects to be significant, the usual procedure is to run a post-hoc power analysis to get a sense of the odds of repeated success with respect to finding significance on the re-run and then, if the statistically significant effect is deemed to be of value, to run a series of confirmation experiments. To this end I can only repeat what I said in the first post: try running the analysis with just the results from the first run and all or part of the first replicate, and see if the results differ appreciably from the results when using all of the design points.
As for computing SE of the coefficients there is, as far as I know, just one way to do it and that method is independent of data structure. A description of how to do this manually for any particular block of data is beyond the scope of a forum of this type. Books that discuss regression in detail will have this information. One place to start would be Applied Regression Analysis 2nd Edition Draper and Smith pp. 24-28 and, the matrix equivalent pp. 82-83.
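The matrix route referred to above boils down to one formula: each coefficient's standard error is SE(b_j) = sqrt(MSE × [(X'X)^-1]_jj), with MSE = SSE / (n - p). The sketch below applies ordinary least squares to a hypothetical 48-run layout (the factor names A and B, the response model and the seed are all made up for illustration) in which A is balanced (24 runs at each level) and B is unbalanced (36 at '+', 12 at '-'), with A balanced within each level of B. Because the diagonal of (X'X)^-1 depends only on the design, B's SE comes out sqrt(48/36) ≈ 1.155 times A's regardless of the MSE. That ratio happens to be consistent with 0,5281 / 0,4573 in the first table posted later in the thread, but this is an observation, not a reconstruction of the actual Minitab model.

```python
import numpy as np

def coef_standard_errors(X, y):
    """OLS coefficients and their standard errors.

    SE(b_j) = sqrt(MSE * [(X'X)^-1]_jj), where MSE = SSE / (n - p).
    """
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y
    resid = y - X @ b
    mse = resid @ resid / (n - p)
    se = np.sqrt(mse * np.diag(xtx_inv))
    return b, se, xtx_inv

# Hypothetical 48-run layout: B unbalanced (36 runs at +1, 12 at -1),
# A balanced overall (24/24) and balanced within each level of B.
B = np.array([1] * 36 + [-1] * 12, dtype=float)
A = np.array([1, -1] * 18 + [1, -1] * 6, dtype=float)
X = np.column_stack([np.ones(48), A, B])

rng = np.random.default_rng(1)
y = 10 + 1.0 * A + 2.0 * B + rng.normal(0, 3, size=48)  # simulated response, illustration only

b, se, xtx_inv = coef_standard_errors(X, y)
multiplier = np.sqrt(np.diag(xtx_inv))  # SE / sqrt(MSE): depends only on the design
print("SE multipliers (const, A, B):", multiplier)
print("ratio B/A:", multiplier[2] / multiplier[1])  # sqrt(48/36) ≈ 1.155
```

The same function works for any design matrix, including one with interaction columns or missing runs; only the diagonal of (X'X)^-1 changes, which is why customized designs end up with different SEs for different terms.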
Some thoughts on the data you do have:
If we assume that you ran your regression analysis properly and that you did all of the things you are supposed to do with regards to residual analysis (plots, LOF, etc.) then there are still some things that might be worth checking.
1. If a system is noisy and there isn’t much you can do about the noise, then you should focus your investigation on main effects, because as the noise level increases the first things to get lost in the noise are the effects of interactions.
2. In your case, the description of the noise suggests the following might be worth considering:
Lay out a plot matrix with the levels of X3 across the columns and the levels of X2 down the rows, where each cell is a plot of Y vs. X1 for that combination of X2 and X3. Then repeat the entire matrix for each level of X4.
And then plot, using different symbols, the values for each of your replicate runs. What you are looking for is between-replicate shifts in the data, in other words, systematic shifts as you go from replicate to replicate. If you find a situation where one or more replicates exhibit a significant shift away from the rest of the data set, eliminate these runs and repeat the analysis. If you have the interesting case where there appears to be a relatively constant shift over time, then put in a dummy identifier for time for the replicates in the proper time order and run the analysis with this factor included.
If you don’t see complete systematic shifts from replicate to replicate, look at the plots to see if there are certain combinations whose combined plots give a visual appearance of variation well in excess of the rest of the design. If this occurs, delete all of the data points with that particular combination and re-run the analysis. Obviously you will not be able to test all of the interactions when you do this, but you may see an improvement in your results.
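As a numerical complement to those plots, the sketch below fits a hypothetical 2^3 factorial run in 3 replicates, first with the factor terms only and then with the time-ordered replicate identifier added as dummy columns, along the lines suggested above. The data are simulated with a deliberate +5 shift in the last replicate, so the per-replicate mean residuals and the drop in MSE show what such a shift looks like. The factor layout, effect sizes, shift and seed are all made up for illustration.

```python
import itertools
import numpy as np

# Hypothetical 2^3 full factorial (X1, X2, X3 coded -1/+1) run in 3 replicates.
base = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)
n_rep = 3
design = np.vstack([base] * n_rep)
replicate = np.repeat(np.arange(n_rep), len(base))  # replicate index 0, 1, 2 in time order

rng = np.random.default_rng(7)
# Simulated response: replicate 2 carries a systematic +5 shift (illustration only).
y = (50 + 3 * design[:, 0] - 2 * design[:, 1]
     + rng.normal(0, 1, len(design)) + 5 * (replicate == 2))

def fit_mse(X, y):
    """Least-squares fit; returns residuals and MSE = SSE / (n - p)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return resid, resid @ resid / (len(y) - X.shape[1])

# Model 1: factor terms only; any replicate shift ends up in the residuals.
X_main = np.column_stack([np.ones(len(y)), design])
resid, mse_main = fit_mse(X_main, y)
mean_resid_by_rep = np.array([resid[replicate == r].mean() for r in range(n_rep)])

# Model 2: same terms plus dummy columns for replicates 1 and 2 (replicate 0 is the baseline).
dummies = np.column_stack([(replicate == r).astype(float) for r in range(1, n_rep)])
_, mse_block = fit_mse(np.column_stack([X_main, dummies]), y)

print("mean residual per replicate:", mean_resid_by_rep.round(2))
print("MSE without / with replicate term:", round(mse_main, 2), round(mse_block, 2))
```

A replicate whose mean residual stands well apart from the others, together with a clear drop in MSE once the replicate term is included, is the numerical counterpart of the systematic shift you would see in the plots.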
3. You said the system was noisy. If this applies to the X’s as well, then you will need to make sure the program is using the actual X values and not the idealized X values from the design matrix. If you just plug in the ideal matrix of values rather than the actual levels at which the experiments were run, the machine will assume the ideal values are real and you will mask the actual variation of the experimental effort. This, in turn, can cause problems with term significance.
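A small simulation of the point about actual versus idealized X values, assuming (hypothetically) that the achieved settings drift around the nominal ±1 with SD 0.3 and that the response follows the achieved setting. Fitting against the nominal values pushes that drift into the residuals, which inflates the MSE and therefore the standard error of every coefficient. All numbers and the seed are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(11)
nominal = np.repeat([-1.0, 1.0], 10)                    # design-matrix settings
actual = nominal + rng.normal(0, 0.3, nominal.size)     # hypothetical drift in achieved settings
y = 5 + 4 * actual + rng.normal(0, 0.2, nominal.size)   # response follows the ACTUAL setting

def slope_and_mse(x, y):
    """Straight-line OLS fit; returns the slope and MSE = SSE / (n - 2)."""
    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return b[1], resid @ resid / (len(y) - 2)

slope_nom, mse_nom = slope_and_mse(nominal, y)
slope_act, mse_act = slope_and_mse(actual, y)
print(f"nominal X: slope {slope_nom:.2f}, MSE {mse_nom:.3f}")
print(f"actual X:  slope {slope_act:.2f}, MSE {mse_act:.3f}")
```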
I re-tested my FFD after reducing the replicates from 6 down to 2, randomly removing some of the runs and keeping the rest for the re-test. Below is the MINITAB effect analysis for both the initial test (1st table) and the re-test (2nd table). From a statistical perspective, I would agree with your comments that the re-test results seem to make more sense: main factors C2 and C4 both turned out to be statistically significant, and these are exactly the factors I considered to be practically significant (I used the absolute effects and compared them to our engineering specifications). The problem with the initial test is that, due to the extremely high POWER (>99%), it basically considered all factors to be significant (high discrimination power, i.e., able to detect smaller differences in effects), whereas the re-test has a POWER of about 83%, so it seems to have appropriately distinguished the significant factors from the insignificant ones.
What are your thoughts on deciding the most appropriate number of replicates? Typically it would be 2, but how far should we go? Should it be based on the POWER? As we add replicates the POWER goes up, and if it becomes too high, it will over-identify factors as significant, as we experienced in my FFD.
With respect to minimum effort for maximum results: our product line is batch-based, and even if we were to produce 1 piece we would need to produce 1 batch anyway, so from a material perspective there is little to no difference; only the testing and analysis take a bit more time.
Initial test:
Term        Effect     Coef  SE Coef        T      P
Constant             -99,29   0,4573  -217,09  0,000
C1            2,21     1,10   0,5281     2,09  0,044
C2            5,56     2,78   0,4573     6,08  0,000
C3            3,64     1,82   0,4573     3,98  0,000
C4          -13,65    -6,82   0,4573   -14,92  0,000
C1*C3        -6,84    -3,42   0,5281    -6,48  0,000
C1*C4        -0,86    -0,43   0,5281    -0,81  0,422
C2*C3        -3,13    -1,56   0,4573    -3,42  0,002
C2*C4         0,24     0,12   0,4573     0,26  0,797
C3*C4        -0,59    -0,29   0,4573    -0,64  0,525
C1*C3*C4      2,56     1,28   0,5281     2,42  0,021
C2*C3*C4     -2,10    -1,05   0,4573    -2,30  0,027
Re-test (replicates reduced from 6 to 2):
Term        Effect     Coef  SE Coef        T      P
Constant             -98,88   0,6818  -145,03  0,000
C1           -0,76    -0,38   0,6818    -0,56  0,586
C2           -4,79    -2,39   0,6818    -3,51  0,004
C3           -2,28    -1,14   0,6818    -1,67  0,121
C4           13,70     6,85   0,6818    10,05  0,000
C1*C3        -6,06    -3,03   0,6818    -4,45  0,001
C1*C4        -1,81    -0,91   0,6818    -1,33  0,208
C2*C3        -1,56    -0,78   0,6818    -1,15  0,274
C2*C4         0,71     0,36   0,6818     0,52  0,611
C3*C4        -0,32    -0,16   0,6818    -0,24  0,816
C1*C3*C4     -3,61    -1,81   0,6818    -2,65  0,021
C2*C3*C4      1,94     0,97   0,6818     1,42  0,181
If I understand your post correctly, you ran a single batch, took multiple samples from that batch, and chose to call those multiple samples replicates. If this is correct, then the fact that everything is significant isn’t surprising, because you don’t have replicates; you have repeated measures, and these are a very different beast.
With repeated measures the program is interpreting the variability between the repeated measures as a measure of the actual process variability. Repeated-measure variability is much smaller than the actual variability of the process. Since the machine can’t tell the difference, it is using repeated-measure variation for testing rather than the actual experimental variation, and the results will be exactly as described.
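To make the mechanism concrete, here is a small simulation with hypothetical numbers: batch-to-batch SD 3, measurement SD 0.5, 6 samples per level, and a factor with no real effect. With true replicates every run gets its own batch, so the pooled error SD reflects process variation. With repeated measures each level gets a single batch, so the pooled SD reflects only measurement variation, and the chance difference between the two batches is judged against that much smaller yardstick. It is a sketch of the argument above, not a model of any particular process.

```python
import numpy as np

rng = np.random.default_rng(3)
batch_sd, meas_sd, k = 3.0, 0.5, 6   # hypothetical noise levels and samples per level
true_effect = 0.0                    # the factor has NO real effect in this simulation

def batch_offsets(n):
    """Random offset per batch (batch-to-batch, i.e. process, variation)."""
    return rng.normal(0, batch_sd, n)

def pooled_t(lo, hi):
    """Two-sample pooled t statistic for equal group sizes; also returns the pooled SD."""
    diff = hi.mean() - lo.mean()
    sp = np.sqrt((lo.var(ddof=1) + hi.var(ddof=1)) / 2)
    return diff / (sp * np.sqrt(2 / len(lo))), sp

# True replicates: a fresh batch for every run, one measurement each.
lo_rep = batch_offsets(k) + rng.normal(0, meas_sd, k)
hi_rep = true_effect + batch_offsets(k) + rng.normal(0, meas_sd, k)

# Repeated measures: ONE batch per level, k measurements taken from it.
lo_rm = batch_offsets(1) + rng.normal(0, meas_sd, k)
hi_rm = true_effect + batch_offsets(1) + rng.normal(0, meas_sd, k)

t_rep, sd_rep = pooled_t(lo_rep, hi_rep)
t_rm, sd_rm = pooled_t(lo_rm, hi_rm)
print(f"replicates:        error SD {sd_rep:.2f}, t = {t_rep:.2f}")
print(f"repeated measures: error SD {sd_rm:.2f}, t = {t_rm:.2f}")
```

Re-running with other seeds shows the repeated-measures t statistic is usually far larger in magnitude even though the true effect is zero, because the error term no longer contains the batch-to-batch variation.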
Again, as I mentioned in the earlier post – this is not about power and the assumption that too many samples will result in too much power which will, in turn, make everything significant is in error.
Before saying anything more – could you either confirm the above with respect to sample selection or, if this isn’t the case, provide a more detailed description of the process of making/selecting replicate samples?
If your replicates really are repeated measures then you will have to analyze the data in a completely different fashion.
The replicates are not simply repeats of the samples. The samples are taken from each panel produced at a standard size, but the DOE response is measured on the complete higher-level assembly of which the samples are a part, so the replicates basically measure the variability of the entire assembly process, including many other noise factors, such as assembly and other materials, that influence the response.
In any hypothesis test, an increase in the sample size ‘n’ will increase the likelihood of rejecting the ‘Null’ when a real effect exists, and therefore increases the POWER. This has been demonstrated in many of the DOEs I have run, including these customized FFDs: the tables I posted last time clearly show a difference in P-values for the same effects when ‘n’ (the number of replicates in the 2-way ANOVA) changes.
And still, do you have any advice on deciding the most appropriate number of replicates for a given DOE? I now tend to think of using the estimated POWER to determine the number, but that requires some historical data on variation and, of course, the minimum difference you intend the DOE to be able to detect.
I’m now planning another DOE with 2 factors at 2 levels, and I’m trying to determine the number of replicates required, based on an estimated StdDev of 3 (from historical data) and a minimum difference of 4 that I want the DOE to be able to detect. I used MINITAB to calculate this with a target POWER of 0,8. The result is that I will need about 10 replicates for the DOE to generate statistical output that can distinguish significant factors from insignificant ones.
Do you think this number of replicates is reasonable, since it seems to contradict your previous statement that too many replicates would be overkill for the experiment? I admit that my previous DOE had excessive replicates, since after randomly reducing them the statistical output (P-values) better reflected our engineering conclusion.
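For a quick cross-check of a power-based replicate count, here is a sketch using a large-sample normal approximation. This is an assumption: a proper calculation (Minitab's included) uses the t distribution with the error degrees of freedom of the chosen model, so the two need not agree exactly. The sketch treats the effect as the difference between the mean response at the high and low levels, so with N = 4r runs for r replicates of a 2^2 design its standard error is 2σ/√N. The helper names and the n_corners parameter are hypothetical.

```python
from statistics import NormalDist

def approx_power(replicates, sigma, delta, n_corners=4, alpha=0.05):
    """Normal-approximation power for one effect in a replicated 2-level factorial.

    effect = mean(high) - mean(low); Var(effect) = 4 * sigma^2 / N, N = n_corners * replicates.
    Uses z instead of t and ignores the opposite rejection tail, so it slightly
    overstates power when the error degrees of freedom are small.
    """
    n_runs = n_corners * replicates
    se_effect = 2 * sigma / n_runs ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(delta / se_effect - z_crit)

def replicates_needed(sigma, delta, target=0.8, n_corners=4, max_reps=100):
    """Smallest number of replicates whose approximate power reaches the target."""
    for r in range(1, max_reps + 1):
        if approx_power(r, sigma, delta, n_corners) >= target:
            return r
    raise ValueError("target power not reached within max_reps")

r = replicates_needed(sigma=3, delta=4)   # inputs from the post: StdDev 3, difference 4
print(r, round(approx_power(r, 3, 4), 3))
```

Each replicate of a 2^2 design puts two runs at each level of every factor, so r replicates give 2r observations per level. Before comparing this sketch's answer with the MINITAB figure, or with a two-sample calculation, it is worth confirming which of those quantities each one reports as the sample size.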
My understanding of this is that we can run multiple replicates (>2 or even up to 10 or more) but it all depends on other parameters (StdDev, minimum difference, desired power level) which vary from one DOE to another.