2×3 DOE with missing data
- This topic has 19 replies, 11 voices, and was last updated 18 years, 4 months ago by
Williams.
March 28, 2004 at 5:45 pm #35057
I recently conducted a 3 factor 2 level DOE with eight runs. I have three responses to evaluate. I have all the data for two of the responses. Unfortunately I only have five data points for the third response, which is another story. Is there a way to handle this missing data or is it a lost cause? Does anyone know of a book to reference? Any help appreciated.
Thanks,
Marty
March 29, 2004 at 6:55 am #97495
Maybe you can use a confidence interval to estimate the other three points.
You also need to add some center points to make sure your DOE is correct.
March 29, 2004 at 7:20 am #97496
Marty,
The best way is to repeat this DOE, if you can. After all, it is only 8 runs.
An even better way is to review this experiment and refine it, rather than just repeating it. There is still something you can learn from performing this DOE.
There are ways to treat missing data; of course you will lose some information. I can have a look at your results, if you want. Please send me your data at [email protected], preferably in CSV, XLS or similar.
Regards, iwb
April 12, 2004 at 5:36 am #98336
Hi Marty,
Obviously, losing data points means losing information, but there are various techniques to handle missing data. You could refer to Design and Analysis of Experiments by Douglas C. Montgomery, 5th ed. The techniques for handling missing data are explained very well there. If I can be of any help, feel free to email me ([email protected]). I am trained to deal with messy or missing data.
Shilpa
April 13, 2004 at 7:24 am #98399
Can you somehow assume that there is a correlation between the responses? (Perhaps you can verify this using the available data.)
If yes (and if it is assumed to hold for the missing combinations, too), then you may use matrix projection: see how the other two responses change as you move to a point where the third response is missing, and adjust the third response accordingly.
April 13, 2004 at 4:33 pm #98436
If you have SAS/STAT, you can use Type III sums of squares to deal with missing data.
How many replications do you have in each run?
I don’t have any book that tells exactly how to deal with it. You may also try looking for help on the internet. If your company has statistical software, you can ask the vendor how to deal with it. They should be able to tell you, I think.
April 13, 2004 at 5:16 pm #98448
Shilpa, I don’t have a copy myself but I think I know where I can get my hands on one. I’ll check it out. I can email the results to you. If you have time, I’d be interested in seeing how you would deal with it. We are planning another DOE. Even though I am missing some of the data, it was pretty evident that we need to take another look at it.
Thanks,
Marty
April 13, 2004 at 5:22 pm #98449
Robert Butler
Marty, just post the matrix in Yates order on this site and tell us which of the runs are missing. Evaluating a matrix of that size can be done quickly and I’d be happy to do it for you.
April 13, 2004 at 5:42 pm #98452
Cindy, I have Minitab rel 14.11. I’ve looked through the stat guide and help menu but I don’t see anything for missing data. I am going to take a look at Design and Analysis of Experiments by Douglas C. Montgomery, 5th ed., which Shilpa recommended. This was a non-replicated design. Even with only eight runs it took five hours to complete. Also, to measure the output, the sample has to be destroyed. We have decided to conduct another DOE, which someone recommended to me earlier. It’s just easier said than done.
Thanks for the reply,
Marty
April 13, 2004 at 5:57 pm #98455
Robert, here are the results (Hope they show up okay). The inputs are three separate settings on a straightening machine. The output is residual stress measured at different locations down the length of the sample. The goal is to minimize the residual stress. Obviously we don’t want the operator to create the stress that we found in location “z”, which they may be doing since each operator runs the machine differently. We’ve theorized in the past on what creates the residual stress. This is the first statistical study to validate the theories. We were quite shocked by the results in area “z.”
Thanks,
Marty
Inputs: A, B, C. Outputs: x, y, z.
Run Order    A    B    C       x      y      z
1           -1    1   -1     2.2    4.0   39.2
2            1   -1    1     1.0    3.5    1.7
3            1    1   -1    10.6    4.6   (missing)
4           -1   -1   -1     1.8    0.2    1.7
5            1   -1   -1     2.3    2.4   (missing)
6            1    1    1    11.2    8.2   28.5
7           -1    1    1     6.2    2.0   (missing)
8           -1   -1    1    -1.9    2.0    1.2
April 13, 2004 at 6:11 pm #98456
Hi Marty,
I have seen a way to approximate a bad or missing data point from a DOE, but with 3 missing points I don’t see any way out. You could use the five points you have to learn a little bit more about the influence the factors have on your Y and use that to design your second DOE.
You could take what you have and do a fractional design (4 runs, 3 factors) to learn more, but I don’t know if that is possible with your data since I have not seen it.
Well, good luck.
Pitzou
April 13, 2004 at 6:36 pm #98460
I just read your two replies.
(1) As far as I know from using version 8.2 in school, Minitab does not deal with missing data.
(2) The scope of your study is to reduce the residual stress at z, and you suspect that the operator is a cause.
(3) In that case, you have to bring another factor, the operator, into your study as well. One question: did you have the same operator for all 8 runs? If so, you should have replications to confirm that the operator has nothing to do with the residual stress at z.
(4) If you used different operators for these 8 runs, that noise adds a lot of variation and reduces the precision of the experiment. It is very hard to find the true story without an analysis of variation.
(5) Page 403 of Douglas C. Montgomery’s book (5th edition) may help you. I didn’t read the details.
(6) Have you studied whether x, y and z are independent or not?
(7) I don’t quite know the details, so my suggestions or questions might be very stupid; if they are wrong, just ignore them.
Cindy
April 13, 2004 at 6:45 pm #98462
Robert Butler
For the z response, C is confounded with A. The reduced matrix loses all information concerning C, AC and BC. You can use the results of your matrix to investigate the effects of A, B and AB on the z response. You can build and test a full model (A, B, C, AB, AC, BC) for the other two responses.
If you look at the reduced matrix (5 runs, columns A, B, C) you can see that C differs from A only in the last experiment.
-1 1 -1
1 -1 1
-1 -1 -1
1 1 1
-1 -1 1
A real quick check of the data (you should confirm all of this with your package) found nothing correlating with y (p cutoff of .15), found B and AB for x (p = .06 and .11), and found B for z (p = .02). A very quick check of the residuals indicates a non-normal distribution of the residuals for x, so you may want to consider a data transform of some kind.
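A minimal sketch of the confounding check described above, using only the five runs that have a z value (run order 1, 2, 4, 6 and 8 in Marty's table). The variable names are illustrative, and the Pearson correlation is just one assumed way to quantify how closely C tracks A; it is not necessarily the check Robert ran.

```python
from math import sqrt

# Runs with a z measurement: run order -> (A, B, C), taken from Marty's table.
reduced = {
    1: (-1, 1, -1),
    2: (1, -1, 1),
    4: (-1, -1, -1),
    6: (1, 1, 1),
    8: (-1, -1, 1),
}


def pearson(u, v):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / sqrt(var_u * var_v)


a_col = [settings[0] for settings in reduced.values()]
c_col = [settings[2] for settings in reduced.values()]

# Runs in which the C setting is not identical to the A setting.
differing_runs = [run for run, (a, _, c) in reduced.items() if a != c]
corr_ac = pearson(a_col, c_col)

print("runs where C differs from A:", differing_runs)
print("correlation between A and C:", round(corr_ac, 3))
```

Only one run separates C from A here, so anything said about C for the z response would rest on a single pair of observations.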
April 13, 2004 at 6:48 pm #98463
Marty,
Is this for real or an academic test?
You have two full 2 factor factorials, one in AB and one in BC. Simple examination of the data for those with respect to z response shows such a huge effect for B that it is almost unbelievable.
You should dump it into Minitab or a similar package to check for interactions, but it appears that there are none. The effects for x and y are not quite as dramatically evident from examination alone, but there appears to be a B effect on x as well, with an interaction with A. At first glance, y looks like noise from these factors.
You need to do the full analysis to quantify.
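The point about the two embedded factorials can be checked with a short sketch. It assumes the five runs with a z value from Marty's table and treats a pair of factors as a full two-level factorial when all four sign combinations occur. The names are illustrative, and the difference of means is only a rough look at the B effect, not the full analysis.

```python
from itertools import combinations, product

# (A, B, C, z) for the five runs that have a z measurement.
runs = [
    (-1, 1, -1, 39.2),
    (1, -1, 1, 1.7),
    (-1, -1, -1, 1.7),
    (1, 1, 1, 28.5),
    (-1, -1, 1, 1.2),
]
factors = {"A": 0, "B": 1, "C": 2}
all_combos = set(product((-1, 1), repeat=2))

# A factor pair is a full 2x2 factorial if every sign combination is present.
full_factorial = {}
for f1, f2 in combinations(factors, 2):
    seen = {(r[factors[f1]], r[factors[f2]]) for r in runs}
    full_factorial[f1 + f2] = seen == all_combos

# Rough B effect on z: mean z at B = +1 minus mean z at B = -1.
z_high = [r[3] for r in runs if r[1] == 1]
z_low = [r[3] for r in runs if r[1] == -1]
b_effect_z = sum(z_high) / len(z_high) - sum(z_low) / len(z_low)

print(full_factorial)
print("mean z at B=+1 minus mean z at B=-1:", round(b_effect_z, 2))
```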
April 13, 2004 at 8:13 pm #98468
Cindy, yes, the operator was the same for all 8 runs. They do influence the residual stress, without knowing it, because we cannot tell them what might make the stress worse. The outputs x, y, and z all come from the same sample but at different locations down the length of the part. For example, they may be located at 3 ft, 9 ft and 12 ft. Basically it’s the front, middle, and end of the same part. The stress is not the same throughout the length of the part. If there are three controls to make a straight part but one or two, or some interaction of those, have a big influence on the stress, we’d like to know that. It could enable us to create a straight part while minimizing the residual stress. We’d like to take the operator out of the equation, if that’s even possible.
Nothing stupid in your thoughts and suggestions. I really appreciate the help and input.
Thanks,
Marty
April 14, 2004 at 12:23 pm #98499
I don’t know what happens in this case specifically, but in general you can always compute a transfer function. The difference will be in the aliasing structure.
April 14, 2004 at 9:56 pm #98538
I think you can start with an interaction analysis. I saw some patterns just from briefly looking at the data and drawing a graph. If some conclusions are obvious, you can simplify your next experiment. For example, setting aside the z values because some data are missing, let’s look at A, B and the x values. We can see that B = -1 with any A setting gives a better result. Then go back and look at C and the y values to see whether C matters or not. From the y values, we see that this combination is good for y as well. Finally, I checked the z values: all of those at B = -1 are good, regardless of the missing values. So I think B = 1 is the major contributor to increased residual stress.
Then, in the next experiment, I would just try to confirm this conclusion, which will reduce your experiment cost.
Sometimes, in real work, the textbook approach doesn’t work and I have to use my own tricks. Let me know if my thoughts are different from yours.
Cindy
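The level-by-level comparison described above can be put in numbers. The sketch below assumes the eight runs from Marty's table and uses the usual two-level effect estimate, the mean response at the high setting minus the mean at the low setting. The function and variable names are illustrative, and no significance test is attempted.

```python
# (A, B, C, x, y) for all eight runs, in run order, from Marty's table.
runs = [
    (-1, 1, -1, 2.2, 4.0),
    (1, -1, 1, 1.0, 3.5),
    (1, 1, -1, 10.6, 4.6),
    (-1, -1, -1, 1.8, 0.2),
    (1, -1, -1, 2.3, 2.4),
    (1, 1, 1, 11.2, 8.2),
    (-1, 1, 1, 6.2, 2.0),
    (-1, -1, 1, -1.9, 2.0),
]


def effect(signs, response):
    """Mean response where the sign is +1 minus mean response where it is -1."""
    high = [r for s, r in zip(signs, response) if s == 1]
    low = [r for s, r in zip(signs, response) if s == -1]
    return sum(high) / len(high) - sum(low) / len(low)


A = [r[0] for r in runs]
B = [r[1] for r in runs]
C = [r[2] for r in runs]

# Main-effect columns plus two-factor interaction columns (elementwise products).
contrasts = {
    "A": A,
    "B": B,
    "C": C,
    "AB": [a * b for a, b in zip(A, B)],
    "AC": [a * c for a, c in zip(A, C)],
    "BC": [b * c for b, c in zip(B, C)],
}
responses = {"x": [r[3] for r in runs], "y": [r[4] for r in runs]}

effects = {
    name: {term: effect(signs, values) for term, signs in contrasts.items()}
    for name, values in responses.items()
}

for name, terms in effects.items():
    for term, value in terms.items():
        print(f"{name}: effect of {term} = {value:+.3f}")
```

For x, the B estimate comes out as the largest of the six terms, which is in line with the observation that B = -1 gives the lower x values. Because the design is unreplicated, these are point estimates only.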
April 18, 2004 at 8:24 pm #98789
Sure, I will be glad to apply some techniques that I know and cross-check whether they lead to a similar conclusion as the second round of experimentation.
Good Luck with the second round of experiments.
Shilpa
April 20, 2004 at 6:50 pm #98887
Williams
Marty,
Thanks for sharing your data. In any case, it seems that you would want B to be at the low setting. If A is at the low setting, then B can be at either high or low setting. If C is at the high setting, then A can be at either the high or low setting. If A is at the high setting, then C must be at the high setting. If C is at the low setting, then A must be at the low setting. I believe this is not so much a case of missing data as it is trying to defy physics. In other words, the combinations for which you have no data are likely impossible. Thus, it is highly likely that you will re-run this experiment and get the same results. It
April 20, 2004 at 6:58 pm #98888
Williams
It seems I sent my response before it was totally finished, but it essentially included all my thoughts about the experiment. It is likely that you will be forced to choose A and C settings due to production/cycle time requirements. If so, you may need to get more involved in the use of more exotic DOE analysis such as CCD, Response Surface, etc. If you can hold B constant at the low setting, then it may be easier than you think.