# DOE problem, very basic question

Six Sigma – iSixSigma Forums › Old Forums › General › DOE problem, very basic question

Viewing 21 posts - 1 through 21 (of 21 total)
#41486

Puzzled
Participant

Dear all,
4 replicates for each run, 33 runs.
For 3 of the runs, one data point is not on the normal line.
Four points aren't much to define an outlier.
Should I delete the points that fall off the normal line, or should I keep them?
Best regards

#130100

Deep
Participant

Puzzled:
You said: "4 replicates for each run, 33 runs."

What kind of design is this? If you have 4 replicates, then the total number of experiments should be even, because multiplying anything by an even number gives an even number.
Regarding the normality question: ANOVA does not require the data points themselves to be normal, only the residuals. Please search this forum for ANOVA assumptions, read those threads, and you will get a better understanding.
Deep
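Deep's distinction (test the residuals, not the raw data) can be sketched in Python. The data below are simulated and the run structure is hypothetical; scipy's Shapiro–Wilk test stands in for whatever normality check your software offers:

```python
# Check normality of the model residuals, not of the raw response values.
# Hypothetical data: 33 runs x 4 replicates, as in the design described above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_runs, n_reps = 33, 4
y = rng.normal(loc=0.77, scale=0.08, size=n_runs * n_reps)
groups = np.repeat(np.arange(n_runs), n_reps)

# One-way ANOVA residuals: each observation minus its run mean
run_means = np.array([y[groups == g].mean() for g in range(n_runs)])
residuals = y - run_means[groups]

stat, p = stats.shapiro(residuals)
print(f"Shapiro-Wilk p-value on residuals: {p:.3f}")
```

The raw y values could differ wildly from run to run and still yield perfectly normal residuals, which is why the residuals are the thing to test.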

#130102

Puzzled
Participant

Thanks Deep,
4 replicates for each run, 33 runs.
4 * 33 = 132 experiments.
Sorry, I may have used the wrong words.
OK on the assumption that the residuals should be normally distributed.
Should I then delete the experiments whose residuals are not normally distributed and re-run the analysis?
Thanks

#130103

Jered Horn
Participant

Can you give more details? 33 runs is not the result of a "typical" design. Why 4 replicates? I'm not sure why people are so keen on throwing out data. If you have a significant error component in your results, you most likely don't have the right factors in your experiment. Throwing out data to reduce error is not good practice. This is Six Sigma, not accounting… we're supposed to be analyzing data, not manipulating it.

#130105

Puzzled
Participant

13 factors in four blocks.
2 levels.
Center points: yes.
A standard fractional factorial would be 32 runs + 1 center point -> 33 runs.
4 replicates (I am an even-number freak).
Does this sound so odd?

#130106

Mikel
Member

Kind of makes you sorry you asked for help, doesn't it?
Your description is not very clear. Did you do a normality test of the residuals, and if so, could you tell us what it said? Minitab should also have given you unusual observations associated with the points that are troubling you. What is the information associated with those?

#130108

Jered Horn
Participant

No, that's not really odd. It seems like a lot of factors, and 4 replicates is more than I've ever used. If 132 runs while manipulating 13 factors doesn't intimidate you, then more power to you.
I still wouldn't recommend throwing out data, especially when (what did you say?) 3 or 4 of your runs exhibited this "outlier" phenomenon.
However, if you do throw out those data points, and your analysis shows significant factors with a small error component to the variation, I'd go all out with your verification run(s). Make sure those outliers don't crop up again.

#130110

Puzzled
Participant

Stan, I didn't mean to be rude, if that is what you mean.
Apologies.
This is what I get when I am not deleting any point (experiment):

Is the graph attached? I can't see it.
As you can see (hopefully), there are points that are outliers, and in fact Minitab displays these results:
Factorial Fit: 492-620 versus Block, Atenolol, Bezafibrate, …
Estimated Effects and Coefficients for 492-620 (coded units)
Term Effect Coef SE Coef T P
Constant 0.77279 0.006808 113.52 0.000
Block 1 0.16291 0.010108 16.12 0.000
Block 2 -0.02385 0.010022 -2.38 0.019
Block 3 0.03979 0.010022 3.97 0.000
Atenolol 0.00598 0.00299 0.006808 0.44 0.661
Bezafibrate 0.02254 0.01127 0.006808 1.66 0.100
Carbamazepine 0.00746 0.00373 0.006808 0.55 0.585
Ciprofloxacin -0.02692 -0.01346 0.006808 -1.98 0.050
cyclophosphamide -0.00942 -0.00471 0.006808 -0.69 0.490
furosemide 0.01464 0.00732 0.006808 1.08 0.284
hydrochlorothiazide -0.00973 -0.00487 0.006808 -0.71 0.476
ibuprofen -0.01286 -0.00643 0.006808 -0.94 0.346
lincomycin -0.02714 -0.01357 0.006808 -1.99 0.048
ofloxacin -0.01714 -0.00857 0.006808 -1.26 0.210
ranitidine 0.00536 0.00268 0.006808 0.39 0.694
salbutamol -0.00308 -0.00154 0.006808 -0.23 0.821
sulfamethoxazole -0.00621 -0.00310 0.006808 -0.46 0.649
Ct Pt -0.00279 0.012994 -0.21 0.830

S = 0.0766785 R-Sq = 74.81% R-Sq(adj) = 72.08%

Analysis of Variance for 492-620 (coded units)
Source DF Seq SS Adj SS Adj MS F P
Blocks 3 2.64563 2.63996 0.879988 149.67 0.000
Main Effects 13 0.09577 0.09576 0.007366 1.25 0.248
Curvature 1 0.00027 0.00027 0.000271 0.05 0.830
Residual Error 157 0.92310 0.92310 0.005880
Lack of Fit 18 0.07241 0.07241 0.004023 0.66 0.847
Pure Error 139 0.85068 0.85068 0.006120
Total 174 3.66477

Unusual Observations for 492-620
Obs StdOrder 492-620 Fit SE Fit Residual St Resid
2 2 0.54000 0.93291 0.01499 -0.39291 -5.22R
89 89 1.03000 0.84084 0.02728 0.18916 2.64R
160 160 1.23000 0.67818 0.02727 0.55182 7.70R
169 169 0.49000 0.67818 0.02727 -0.18818 -2.63R
R denotes an observation with a large standardized residual.
Now, the question is: should I delete the unusual (large-residual) observations and re-run the analysis?
Thanks

#130112

Ironhead793
Participant

Puzzled,
A more statistically valid approach to screening your data would be to construct an Xbar and R chart of the experimental data. An out-of-control point on the range chart would indicate an unusual observation within a specific run. Provided no out-of-control points are identified, your data is what it is. If out-of-control points are identified, you have two options. Review your experimental run log, searching for an additional X that has the ability to significantly move your process, and include that X in further experimentation. In the event that a probable root cause cannot be identified, replace the suspect value with the average of the three remaining data points and rerun the analysis.
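A rough Python sketch of this screening procedure, with made-up numbers; D4 = 2.282 is the standard R-chart constant for subgroups of four, and the "suspect" replicate is taken to be the one farthest from the subgroup median:

```python
import numpy as np

# Made-up responses: 3 runs x 4 replicates; run 1 contains a suspect value
data = np.array([
    [0.76, 0.78, 0.77, 0.75],
    [0.54, 0.93, 0.92, 0.94],
    [0.80, 0.79, 0.81, 0.78],
])

ranges = data.max(axis=1) - data.min(axis=1)
ucl_r = 2.282 * ranges.mean()        # R-chart upper control limit, D4 * Rbar

for i, r in enumerate(ranges):
    if r > ucl_r:                    # out-of-control range -> unusual observation
        row = data[i]
        j = int(np.argmax(np.abs(row - np.median(row))))
        row[j] = np.delete(row, j).mean()   # replace with mean of the other three
        print(f"run {i}: replaced replicate {j}")
```

With these numbers only run 1 is flagged and its first replicate is replaced by 0.93; in practice you would first look for a root cause in the run log, as described above.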

#130114

Jered Horn
Participant

No, the graph(s) didn’t show up for me either.
Go ahead and try dropping those "unusual observation" points and rerun the analysis. Doing that can't really hurt. You just have to be careful how you interpret the results. Post them here when you do that… maybe we'll be able to help.
Minitab may not react very well to deleting data from the results.  I don’t recall trying this myself.  If you have to change the design, it won’t be pretty.  Maybe there’s an easy way to accomplish this that I’m not aware of.

#130115

Deep
Participant

Puzzled:

I have some questions here. Why are you worried about normality now? I am assuming that you wanted to delete the data with high residuals because the residuals are not normal. What is the aim of this study?
From your analysis it is clear that you haven't reduced your model at all.
Why are you checking normality and other assumptions before reducing the model?
If you are new to DOE, please read through the different DOE posts here, and also try to get a good book.
Deep.

#130116

Jered Horn
Participant

Puzzled,
"Replace the suspect value with the average of the three remaining data points and rerun the analysis…" – from Ironhead793
That's the Minitab method you'd have to use if you delete data points from your results. The rest of what he said is the right approach as well.
What does it mean to you if your blocks are as significant as your current analysis says? For me, that's usually bad news, because they are most likely factors I can't control with current technology.

#130118

Puzzled
Participant

Thanks all ,
here are the results for the reduced model (three factors instead of 13):
Factorial Fit: 492-620 versus Block, Bezafibrate, Ciprofloxacin, …
Estimated Effects and Coefficients for 492-620 (coded units)
Term Effect Coef SE Coef T P
Constant 0.77277 0.006715 115.09 0.000
Block 1 0.16286 0.009966 16.34 0.000
Block 2 -0.02383 0.009887 -2.41 0.017
Block 3 0.03980 0.009887 4.03 0.000
Bezafibrate 0.02259 0.01129 0.006715 1.68 0.094
Ciprofloxacin -0.02696 -0.01348 0.006715 -2.01 0.046
lincomycin -0.02710 -0.01355 0.006715 -2.02 0.045
Ct Pt -0.00277 0.012819 -0.22 0.829

S = 0.0756565 R-Sq = 73.92% R-Sq(adj) = 72.82%

Analysis of Variance for 492-620 (coded units)
Source DF Seq SS Adj SS Adj MS F P
Blocks 3 2.64563 2.64150 0.880499 153.83 0.000
Main Effects 3 0.06298 0.06296 0.020987 3.67 0.014
Curvature 1 0.00027 0.00027 0.000267 0.05 0.829
Residual Error 167 0.95589 0.95589 0.005724
Lack of Fit 12 0.05715 0.05715 0.004762 0.82 0.628
Pure Error 155 0.89874 0.89874 0.005798
Total 174 3.66477

Unusual Observations for 492-620
Obs StdOrder 492-620 Fit SE Fit Residual St Resid
2 2 0.54000 0.93286 0.01478 -0.39286 -5.29R
89 89 1.03000 0.82831 0.01664 0.20169 2.73R
160 160 1.23000 0.63226 0.01663 0.59774 8.10R
R denotes an observation with a large standardized residual.
I still have three cases where the residual is very big.
Shall I keep these cases in the analysis?
Yes, I am new to DOE, and the reason I am concerned about the normality of my data is that in these experiments it is not difficult to get outliers.
Rgds

#130119

Puzzled
Participant

HornJM,
when experimenting with living things (cells) it is very important to keep the variation coming from natural cycles under control.
So if you expose cells to drugs, it is not uncommon to get one answer today and a rather different one tomorrow.
My experiment aims to see the effects while keeping the big variation under control (mathematically, I hope).

#130127

Robert Butler
Participant

Thanks for providing the table with the reduced model.  Based on that it appears you are concerned about three data points with high residuals.  With the results from 132 experiments the fact that 3 of the results exceed the 95% limits shouldn’t, in itself, be a cause for alarm.
The bigger question is – are these points influential – i.e. are they the tails wagging the dog.  To answer this you will need to look at your residual plots and see what they are doing to them.  If they are not skewing the plots in any way they probably don’t matter.  If you are still concerned about their effect – re-run the analysis with all of them coded as missing and see what happens to the terms in the model and their levels of significance.
One comment: I'd recommend re-running your analysis with data from just a single replicate. Take a look at the final model, and then do it all over again with just some of the runs from any one of the other replicates, and then with all of the data from another replicate. Take a look at how your model terms are behaving. If you don't see any big changes in the final model as you work through this exercise, it would suggest you don't need to run as many replicates as you have.
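The "code them as missing and refit" comparison can be sketched as follows (simulated data, plain least squares via numpy; the 3-standardized-residual cutoff is an arbitrary choice for this sketch, not a recommendation):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.choice([-1.0, 1.0], size=(132, 3))          # coded factor levels
y = 0.77 + 0.02 * X[:, 0] + rng.normal(scale=0.05, size=132)
y[1] += 0.5                                          # inject one gross outlier

A = np.column_stack([np.ones(len(y)), X])            # design matrix with intercept

def fit(mask):
    coef, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
    return coef

full = fit(np.ones(len(y), dtype=bool))
std_resid = (y - A @ full) / (y - A @ full).std()
keep = np.abs(std_resid) < 3                         # code large residuals as missing
trimmed = fit(keep)

print("all points:", np.round(full, 3))
print("outliers dropped:", np.round(trimmed, 3))
```

If the two coefficient sets barely differ, the flagged points are not influential; if terms change sign or size noticeably, the tails are wagging the dog.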

#130136

Puzzled
Participant

Thank you very much Robert,
I am pretty sure that 4 replicates might seem far too many, but please remember that I am working with cell cultures, and it is quite common to get contaminations, low adhesion, unusual behaviours and so on.
Replicates are, in this case, very, very cheap.
Thanks to all those who contributed.

#130137

Robert Butler
Participant

I agree, replicates in that case are cheap. However, there is something else you might have to consider. Since you are working with cell cultures is there any chance that your measurements are count data?  For example, are you taking cultured cells and staining them with a reagent that changes color if the desired chemical output is present and then counting the number of sites per area that have changed color?  If you are, then you probably don’t want to run a standard OLS regression for model development.
In the biostatistical world for data of this type, we usually use regression methods that allow us to specify the underlying distribution of the Y’s.  For count data I usually do negative binomial regression.  If your program has this capability you might want to run your data through it as well and see if there are any big differences between the final models from the two methods.
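For readers whose software lacks this capability, the idea can be sketched from scratch: a bare-bones negative binomial (NB2) fit on simulated counts, with the dispersion held fixed for simplicity (real packages such as statsmodels or SAS also estimate the dispersion):

```python
import numpy as np
from scipy import optimize, special

rng = np.random.default_rng(1)
x = rng.choice([-1.0, 1.0], size=(200, 2))    # coded factor levels
y = rng.poisson(np.exp(1.5 + 0.6 * x[:, 0]))  # simulated counts; factor 1 matters

X = np.column_stack([np.ones(len(y)), x])
alpha = 0.5                                   # dispersion, held fixed for the sketch

def negloglik(beta):
    mu = np.exp(X @ beta)
    r = 1.0 / alpha
    # Negative binomial (NB2) log-likelihood
    ll = (special.gammaln(y + r) - special.gammaln(r) - special.gammaln(y + 1)
          + r * np.log(r / (r + mu)) + y * np.log(mu / (r + mu)))
    return -ll.sum()

res = optimize.minimize(negloglik, x0=np.zeros(3), method="BFGS")
print("coefficients (log scale):", np.round(res.x, 2))
```

Comparing these coefficients with the OLS fit, as suggested above, shows whether the choice of error distribution changes the conclusions.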

#130164

Puzzled
Participant

Robert,
I am not really staining cells; I feed the cells with a chemical that is converted by the cells' oxidative system into a colored substance.
The absorption at a given wavelength is proportional to the number of living cells. My Y is continuous and is not a count. What do you think? Am I on the right track?
All the best

#130165

Bill Craig
Participant

Puzzled,
For a basic question, you sure received lots of inputs! My suggestion would be to use another set of model adequacy checks to peel the onion further. Aside from the normal probability plot of residuals, you can look at the residuals versus run order, and versus the factors. Try to explain why the outliers happened. There is some valuable information to be had here!
Best of luck….

#130184

Robert Butler
Participant

In your case it sounds like you are OK with an OLS regression. However, I would recommend you really look at your residual plots: not just the usual residual vs. predicted, but also residual by independent variable and, if you know the time order (particularly the time order of the replicates), residual by time. I would also want to make sure that the time duration for cell growth from replicate to replicate was "about" the same, and I would check this by plotting the residuals for the replicates in order of their time duration.
All of this plotting will help you identify influential points (if there are any) and it will also tell you if there is an underlying time component which you may have to take into account when building your final model.
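One of those checks, residuals against run order, can be sketched as follows (simulated residuals with a deliberate drift injected; scipy's linregress stands in for eyeballing the plot):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
order = np.arange(132)                         # time order of the 132 experiments
residuals = rng.normal(scale=0.05, size=132) + 0.001 * order   # injected drift

slope, intercept, rvalue, pvalue, stderr = stats.linregress(order, residuals)
print(f"slope = {slope:.5f}, p = {pvalue:.4f}")
```

A significant slope means there is a time component, for example culture age or reagent drift, that the final model should account for.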

