iSixSigma

DOE problem, very basic question


Viewing 21 posts - 1 through 21 (of 21 total)
  • Author
    Posts
  • #41486

    Puzzled
    Participant

    Dear all,
    I have 4 replicates for each run, and 33 runs.
    For 3 of the runs, one data point is not on the normal line.
    Four points aren’t much to define an outlier.
    Should I delete the points that fall off the normal line, or should I keep them?
    Best regards

    0
    #130100

    Deep
    Participant

    Puzzled:
    You said >> 4 replicates for each run, 33 runs.
     
    What kind of design is this? If you have 4 replicates, the total run count should be an even number, because multiplying anything by an even number gives an even number.
    Regarding the normality question, ANOVA does not require the raw data points to be normal; it requires the residuals to be. Please search this forum for ANOVA assumptions, read those threads, and you will get a better understanding.
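    To make that concrete, here is a minimal pure-Python sketch (with made-up replicate values, since the actual data weren’t posted) of what “residuals” means in a replicated design: each residual is an observation minus its run mean, and it is these values, not the raw responses, that should look normal.

```python
# Illustrative only: made-up replicate values, since the actual data
# were not posted.  In a replicated design, the normality assumption
# applies to the residuals (observation minus its run mean), not to
# the raw responses.
runs = {
    "run 1": [0.54, 0.56, 0.55, 0.57],
    "run 2": [0.93, 0.91, 0.94, 0.92],
}
residuals = []
for name, reps in runs.items():
    run_mean = sum(reps) / len(reps)
    residuals.extend(round(y - run_mean, 6) for y in reps)
print(residuals)  # these, not the raw y values, go into the normality check
```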
    Deep

    0
    #130102

    Puzzled
    Participant

    Thanks Deep,
    4 replicates for each run, 33 runs:
    4 * 33 = 132 experiments.
    Sorry, I might have used the wrong words.
    OK on the assumption that the residuals should be normally distributed.
    Should I then delete the experiments whose residuals are not normally distributed and re-run the analysis?
    Thanks

    0
    #130103

    Jered Horn
    Participant

    Can you give more details?  33 runs is not the result of a “typical” design.  Why 4 replicates?  Not sure why people are so keen on throwing out data.  If you have a significant error component in your results, you most likely don’t have the right factors in your experiment.  Throwing out data to reduce error is not a good practice.  This is Six Sigma, not accounting…we’re supposed to be analyzing data, not manipulating it.

    0
    #130105

    Puzzled
    Participant

    13 factors in four blocks.
    2 levels.
    Center points: yes.
    A standard fractional factorial would be 32 + 1 (center point) -> 33 runs.
    4 replicates (I am an even-number freak).
    Does this sound so odd?
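    For reference, a 13-factor, 32-run, two-level design is a 2^(13-8) fractional factorial: five base factors give the 32 runs, and the other eight factors are assigned to interaction columns. A rough sketch of that construction (the generator pairs below are illustrative only, not necessarily the ones Minitab chose):

```python
from itertools import product

# Sketch of how 13 two-level factors fit into 32 runs: a 2^(13-8)
# fractional factorial.  Five base factors give 2**5 = 32 runs; the
# other eight factors are assigned to interaction columns.  The
# generator pairs below are illustrative, not necessarily the ones
# Minitab used for this design.
base_runs = list(product([-1, 1], repeat=5))  # 32 base runs
generators = [(0, 1), (0, 2), (0, 3), (0, 4),
              (1, 2), (1, 3), (1, 4), (2, 3)]
design = [list(row) + [row[i] * row[j] for i, j in generators]
          for row in base_runs]
print(len(design), "runs x", len(design[0]), "factors")  # 32 x 13
```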

    0
    #130106

    Mikel
    Member

    Kind of makes you sorry you asked for help, doesn’t it?
    Your description is not very clear. Did you do a normality test of the residuals, and if so, could you tell us what it said? Minitab should also have given you unusual observations associated with the points that are troubling you. What is the information associated with those?

    0
    #130108

    Jered Horn
    Participant

    No, that’s not really odd.  It seems like a lot of factors, and 4 replicates is more than I’ve ever used.  If 132 runs while manipulating 13 factors doesn’t intimidate you, then more power to you.
    I still wouldn’t recommend throwing out data, especially when (what did you say?) 3 or 4 of your runs exhibited this “outlier” phenomenon.
    However, if you do throw out those data points, and your analysis shows significant factors with a small error component to the variation, I’d go all out with your verification run(s).  Make sure those outliers don’t crop up again.

    0
    #130110

    Puzzled
    Participant

    Stan, I didn’t mean to be rude, if that is what you mean.
    Apologies.
    This is what I get when I am not deleting any point (experiment):
    Is the graph attached? I can’t see it.
    As you can see (hopefully), there are points that are outliers, and, as a matter of fact, Minitab displays these results:
    Factorial Fit: 492-620 versus Block, Atenolol, Bezafibrate, …
    Estimated Effects and Coefficients for 492-620 (coded units)
    Term Effect Coef SE Coef T P
    Constant 0.77279 0.006808 113.52 0.000
    Block 1 0.16291 0.010108 16.12 0.000
    Block 2 -0.02385 0.010022 -2.38 0.019
    Block 3 0.03979 0.010022 3.97 0.000
    Atenolol 0.00598 0.00299 0.006808 0.44 0.661
    Bezafibrate 0.02254 0.01127 0.006808 1.66 0.100
    Carbamazepine 0.00746 0.00373 0.006808 0.55 0.585
    Ciprofloxacin -0.02692 -0.01346 0.006808 -1.98 0.050
    cyclophosphamide -0.00942 -0.00471 0.006808 -0.69 0.490
    furosemide 0.01464 0.00732 0.006808 1.08 0.284
    hydrochlorothiazide -0.00973 -0.00487 0.006808 -0.71 0.476
    ibuprofen -0.01286 -0.00643 0.006808 -0.94 0.346
    lincomycin -0.02714 -0.01357 0.006808 -1.99 0.048
    ofloxacin -0.01714 -0.00857 0.006808 -1.26 0.210
    ranitidine 0.00536 0.00268 0.006808 0.39 0.694
    salbutamol -0.00308 -0.00154 0.006808 -0.23 0.821
    sulfamethoxazole -0.00621 -0.00310 0.006808 -0.46 0.649
    Ct Pt -0.00279 0.012994 -0.21 0.830
     
    S = 0.0766785 R-Sq = 74.81% R-Sq(adj) = 72.08%
     
    Analysis of Variance for 492-620 (coded units)
    Source DF Seq SS Adj SS Adj MS F P
    Blocks 3 2.64563 2.63996 0.879988 149.67 0.000
    Main Effects 13 0.09577 0.09576 0.007366 1.25 0.248
    Curvature 1 0.00027 0.00027 0.000271 0.05 0.830
    Residual Error 157 0.92310 0.92310 0.005880
    Lack of Fit 18 0.07241 0.07241 0.004023 0.66 0.847
    Pure Error 139 0.85068 0.85068 0.006120
    Total 174 3.66477
     
    Unusual Observations for 492-620
    Obs StdOrder 492-620 Fit SE Fit Residual St Resid
    2 2 0.54000 0.93291 0.01499 -0.39291 -5.22R
    89 89 1.03000 0.84084 0.02728 0.18916 2.64R
    160 160 1.23000 0.67818 0.02727 0.55182 7.70R
    169 169 0.49000 0.67818 0.02727 -0.18818 -2.63R
    R denotes an observation with a large standardized residual.
    Now, the question is: should I delete the unusual (large-residual) observations and re-run the analysis?
    Thanks
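    As a side note, the St Resid column in the table above can be reproduced by hand: Minitab’s standardized residual works out to residual / sqrt(S² − SE Fit²), flagged with an “R” when its magnitude exceeds 2. A small sketch using the values from the table:

```python
import math

# Reproducing Minitab's St Resid column from the table above.
# The standardized residual is residual / sqrt(S^2 - SE_fit^2),
# and Minitab flags it with "R" when its magnitude exceeds 2.
S = 0.0766785  # root MSE from the full-model fit
unusual = [
    (2,   -0.39291, 0.01499),
    (89,   0.18916, 0.02728),
    (160,  0.55182, 0.02727),
    (169, -0.18818, 0.02727),
]
for obs, residual, se_fit in unusual:
    st_resid = residual / math.sqrt(S**2 - se_fit**2)
    flag = "R" if abs(st_resid) > 2 else ""
    print(f"Obs {obs}: St Resid {st_resid:.2f}{flag}")
```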

    0
    #130112

    Ironhead793
    Participant

    Puzzled,
    A more statistically valid approach to screening your data would be to construct an Xbar and R chart of the experimental data.  An out-of-control point on the range chart would indicate an unusual observation within a specific run.  Provided no out-of-control points are identified, your data is what it is.  If out-of-control points are identified, you have two options.  Review your experimental run log, searching for an additional X that has the ability to significantly move your process, and include that X in further experimentation.  In the event that a probable root cause cannot be identified, replace the suspect value with the average of the three remaining data points and rerun the analysis.
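    A minimal sketch of that R-chart screen, assuming subgroups of size 4 and made-up replicate data (D4 = 2.282 is the standard range-chart constant for n = 4):

```python
# Hedged sketch of the Xbar/R screen described above, with made-up
# replicate sets.  D4 = 2.282 is the standard range-chart constant
# for subgroups of size 4; a range above UCL_R = D4 * R-bar flags an
# unusual observation inside that run.
D4 = 2.282
runs = [
    [0.54, 0.56, 0.55, 0.57],
    [0.93, 0.91, 0.94, 0.92],
    [0.68, 0.67, 1.23, 0.66],  # third replicate looks suspect
]
ranges = [max(r) - min(r) for r in runs]
r_bar = sum(ranges) / len(ranges)
ucl_r = D4 * r_bar
for i, rng in enumerate(ranges, start=1):
    status = "OUT OF CONTROL" if rng > ucl_r else "ok"
    print(f"run {i}: R = {rng:.2f} (UCL = {ucl_r:.2f}) {status}")
```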

    0
    #130114

    Jered Horn
    Participant

    No, the graph(s) didn’t show up for me either.
    Go ahead and try dropping those “unusual observation” points, and rerun the analysis.  Doing that can’t really hurt; you just have to be careful how you interpret the results.  Post them here when you do that…maybe we’ll be able to help.
    Minitab may not react very well to deleting data points from the results.  I don’t recall trying this myself.  If you have to change the design, it won’t be pretty.  Maybe there’s an easy way to accomplish this that I’m not aware of.

    0
    #130115

    Deep
    Participant

    Puzzled:
     
    I have some questions here. Why are you worried about normality now? I am assuming that you wanted to delete the data with high residuals because the residuals are not normal. What is the aim of this study?
    From your analysis it is clear that you haven’t reduced your model at all.
    Why are you checking normality and other assumptions before reducing the model?
    If you are new to DOE, please read through the different DOE posts here, and also try to get a good book.
    Deep.

    0
    #130116

    Jered Horn
    Participant

    Puzzled,
    “Replace the suspect value with the average of the three remaining data points and rerun the analysis” …from Ironhead793.
    That’s the method you’d have to use in Minitab if you delete data points from your results.  The rest of what he said is the right approach as well.
    What does it mean to you if your blocks are as significant as your current analysis says?  For me, that’s usually bad news, because they most likely represent factors I can’t control with current technology.

    0
    #130118

    Puzzled
    Participant

    Thanks all,
    here follow the results for the reduced model (three factors instead of 13):
    Factorial Fit: 492-620 versus Block, Bezafibrate, Ciprofloxacin, …
    Estimated Effects and Coefficients for 492-620 (coded units)
    Term Effect Coef SE Coef T P
    Constant 0.77277 0.006715 115.09 0.000
    Block 1 0.16286 0.009966 16.34 0.000
    Block 2 -0.02383 0.009887 -2.41 0.017
    Block 3 0.03980 0.009887 4.03 0.000
    Bezafibrate 0.02259 0.01129 0.006715 1.68 0.094
    Ciprofloxacin -0.02696 -0.01348 0.006715 -2.01 0.046
    lincomycin -0.02710 -0.01355 0.006715 -2.02 0.045
    Ct Pt -0.00277 0.012819 -0.22 0.829
     
    S = 0.0756565 R-Sq = 73.92% R-Sq(adj) = 72.82%
     
    Analysis of Variance for 492-620 (coded units)
    Source DF Seq SS Adj SS Adj MS F P
    Blocks 3 2.64563 2.64150 0.880499 153.83 0.000
    Main Effects 3 0.06298 0.06296 0.020987 3.67 0.014
    Curvature 1 0.00027 0.00027 0.000267 0.05 0.829
    Residual Error 167 0.95589 0.95589 0.005724
    Lack of Fit 12 0.05715 0.05715 0.004762 0.82 0.628
    Pure Error 155 0.89874 0.89874 0.005798
    Total 174 3.66477
     
    Unusual Observations for 492-620
    Obs StdOrder 492-620 Fit SE Fit Residual St Resid
    2 2 0.54000 0.93286 0.01478 -0.39286 -5.29R
    89 89 1.03000 0.82831 0.01664 0.20169 2.73R
    160 160 1.23000 0.63226 0.01663 0.59774 8.10R
    R denotes an observation with a large standardized residual.
    I still have three cases where the residual is very big.
    Should I keep these cases in the analysis?
    Yes, I am new to DOE, and the reason I am concerned about the normality of my data is that in these experiments it is not difficult to get outliers.
    Rgds

    0
    #130119

    Puzzled
    Participant

    HornJM,
    when experimenting with living things (cells), it is very important to keep under control the variation coming from natural cycles.
    So if you expose cells to drugs, it is not uncommon to get one answer today and a rather different one tomorrow.
    My experiment aims to see the effects while keeping the big variation under control (mathematically, I hope).

    0
    #130127

    Robert Butler
    Participant

      Thanks for providing the table with the reduced model.  Based on that it appears you are concerned about three data points with high residuals.  With the results from 132 experiments the fact that 3 of the results exceed the 95% limits shouldn’t, in itself, be a cause for alarm.
     The bigger question is – are these points influential – i.e. are they the tails wagging the dog.  To answer this you will need to look at your residual plots and see what they are doing to them.  If they are not skewing the plots in any way they probably don’t matter.  If you are still concerned about their effect – re-run the analysis with all of them coded as missing and see what happens to the terms in the model and their levels of significance.
      One comment: I’d recommend re-running your analysis with data from just a single replicate.  Take a look at the final model, then do it all over again with just some of the runs from any one of the other replicates, and then with all of the data from another replicate.  Take a look at how your model terms are behaving.  If you don’t see any big changes in the final model as you work through this exercise, it would suggest you don’t need to run as many replicates as you have.

    0
    #130136

    Puzzled
    Participant

    Thank you very much, Robert,
    I’ll follow the advice given.
    I am pretty sure 4 replicates might seem far too many, but please remember that I am working with cell cultures, and it is quite common to get contaminations, low adhesion, unusual behaviours, and so on.
    Replicates are, in this case, very, very cheap.
    Thanks to all those who contributed.

    0
    #130137

    Robert Butler
    Participant

      I agree, replicates in that case are cheap. However, there is something else you might have to consider. Since you are working with cell cultures is there any chance that your measurements are count data?  For example, are you taking cultured cells and staining them with a reagent that changes color if the desired chemical output is present and then counting the number of sites per area that have changed color?  If you are, then you probably don’t want to run a standard OLS regression for model development. 
      In the biostatistical world for data of this type, we usually use regression methods that allow us to specify the underlying distribution of the Y’s.  For count data I usually do negative binomial regression.  If your program has this capability you might want to run your data through it as well and see if there are any big differences between the final models from the two methods.
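      A quick way to see whether counts would even need a negative binomial rather than a Poisson-type model is to check for overdispersion, i.e. a sample variance well above the mean. A sketch with made-up counts:

```python
from statistics import mean, variance

# Quick overdispersion check with made-up site counts per area.
# For Poisson-like counts the variance is about equal to the mean;
# a dispersion ratio well above 1 points toward a negative binomial
# model instead.
counts = [12, 7, 30, 5, 18, 41, 9, 22]
m = mean(counts)
v = variance(counts)
print(f"mean = {m:.1f}, variance = {v:.1f}, dispersion = {v / m:.1f}")
```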

    0
    #130164

    Puzzled
    Participant

    Robert,
    I am not really staining cells; I feed the cells a chemical that is converted by the cells’ oxidative system into a colored substance.
    The absorption at a given wavelength is proportional to the number of living cells. My Y is continuous and is not a count. What do you think? Am I on the right track?
    All the best

    0
    #130165

    Bill Craig
    Participant

    Puzzled,
    For a basic question, you sure received lots of inputs!  My suggestion would be to use another set of model adequacy checks to peel the onion further. Aside from the normal probability plot of residuals, you can look at the residuals versus run order, and versus the factors. Try to explain why the outliers happened. There is some valuable information to be had here!
    Best of luck….

    0
    #130184

    Robert Butler
    Participant

      In your case it sounds like you are OK with an OLS regression.  However, I would recommend you really look at your residual plots – not just the usual residual vs. predicted, but residual by independent variable and, if you know the time order (particularly the time order of the replicates), residual by time.  I would also want to make sure that the time duration for cell growth from replicate to replicate was “about” the same, and I would check this by plotting the residuals for the replicates in order of their time duration.
      All of this plotting will help you identify influential points (if there are any), and it will also tell you if there is an underlying time component which you may have to take into account when building your final model.
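      One simple version of that replicate-order check can be done without plots at all: compare each replicate’s mean residual (made-up numbers below) and look for one sitting well away from zero:

```python
# Made-up residuals grouped by replicate, illustrating the check
# suggested above: a replicate whose mean residual sits well away
# from zero hints at a time/batch effect worth modeling.
residuals_by_replicate = {
    1: [0.01, -0.02, 0.00, 0.01],
    2: [0.02, 0.01, 0.03, 0.02],
    3: [-0.05, -0.04, -0.06, -0.05],  # drifts low: possible batch effect
    4: [0.01, 0.02, 0.01, 0.02],
}
for rep, res in residuals_by_replicate.items():
    print(f"replicate {rep}: mean residual {sum(res) / len(res):+.3f}")
```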

    0
