How Minitab deals with missing data
 This topic has 5 replies, 5 voices, and was last updated 15 years, 6 months ago by Ozarski.


January 29, 2007 at 2:24 pm #45971
mmoore828 (Participant, @mmoore828)

My Master's thesis includes a 32-run, 2-level, 6-factor DoE to examine the effects of the plunge process on friction stir welds. One of the pieces of information that we’re really looking at is the temperature profiles within the tool and on the weld plate. 2 of the 7 channels have a complete list of results for all of the runs. The rest do not. A couple of the channels are missing up to 4 results. The missing data is totally at random. I plugged all the values I know into Minitab and found it can still analyze the DoE and give me some answers. Does anyone know how it does this? Can you point me to some sources so I can read up about this? I’m hoping I can figure out if Minitab’s method is legit for my purposes and also be able to explain what’s going on in my defense so I look smart!

Thanks,
Matt

January 30, 2007 at 5:23 am #151215
Eric Maass (Participant, @poetengineer)

Hello Matt,
Well, first off – I’d recommend that you check out whether your local library or your university library has copies of any of a series of books by George A. Milliken and Dallas E. Johnson that start with the title, “Analysis of Messy Data…”
Since your design is probably a 2^(6-1) for 32 runs, it probably has a resolution of V or VI if there were no missing data. If each cell had replicates, and no set of replicates were lost, you probably have no real loss of information after all. Even if some data is lost, it is possible that you might be okay…

If you entered the design and entered asterisks for the missing data, then Minitab will analyze it using the multiple linear regression tool that Minitab uses to analyze DOE.

DOE – full and fractional factorial – is “orthogonal” to allow you to separate the effects of each factor. With runs missing, that orthogonality can be lost, so I would suggest that you check whether any of your main effects are correlated… you can do this by simply doing Stat/Basic Statistics/Correlation and selecting all of the main effect factors. You might have one or more main effects highly correlated with others – so that those main effects are confounded.

You could then generate all of the two-factor interactions using the calculator in Minitab, and then check the correlations between main effects and two-way interactions and between two-way interactions and other two-way interactions.
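For anyone who wants to try the correlation check outside Minitab, here is a rough Python sketch (standard library only). It is not what Minitab does internally. It assumes the design is the 32-run half fraction with generator F = ABCDE, which the thread does not confirm, and the set of missing run numbers is purely hypothetical. It reports the most strongly correlated pair among the main-effect and two-factor interaction columns, first for the full design and then with four runs removed.

```python
from itertools import combinations, product
from math import prod, sqrt

def half_fraction():
    """32-run 2^(6-1) design; the sixth factor uses the ASSUMED generator F = ABCDE."""
    return [list(base) + [prod(base)] for base in product((-1, 1), repeat=5)]

def pearson(x, y):
    """Plain Pearson correlation between two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def model_columns(runs):
    """Six main-effect columns plus all 15 two-factor interaction columns."""
    names = "ABCDEF"
    cols = {names[j]: [r[j] for r in runs] for j in range(6)}
    for i, j in combinations(range(6), 2):
        cols[names[i] + names[j]] = [r[i] * r[j] for r in runs]
    return cols

def worst_correlation(runs):
    """Return (|r|, name1, name2) for the most strongly correlated pair of columns."""
    cols = model_columns(runs)
    return max((abs(pearson(cols[a], cols[b])), a, b)
               for a, b in combinations(cols, 2))

full = half_fraction()
missing = {3, 10, 17, 28}  # hypothetical indices of the runs with no result
kept = [r for i, r in enumerate(full) if i not in missing]

print("full design :", worst_correlation(full))
print("4 runs lost :", worst_correlation(kept))
```

With the complete design every pair of columns is uncorrelated. Once runs are dropped, the size of the largest correlation gives a first feel for how much confounding the missing data has introduced.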
Again, please check out one of the books on Analysis of Messy Data. It will give you more ideas, and also help you prepare for your defense of a thesis involving analysis of some “messy data”.
Good luck!

January 30, 2007 at 7:45 am #151221
Idiotin math. (Participant, @Idiotinmath.)

Congratulations, Eric, on your excellent, comprehensive reply. A real expert answer, not like those few pretenders who just post comments without any value.

January 30, 2007 at 2:16 pm #151233
Eric Maass (Participant, @poetengineer)

Why, thank you, “idiotin math”! It’s great to hear nice, generous comments and feedback like this!

Incidentally – I hope that you are just joking with the “idiotin math” name… I know several people who convinced themselves that they were not good at math – but, after spending a little time with a much better instructor or a good tutor, they discovered that they were much better at math than they ever thought they could be. Math can be made easy to understand, and fun… or it can be made to seem daunting and downright impossible through the convoluted efforts of a poor instructor or an instructor whose real motives are to impress and overwhelm rather than to teach.

Best of luck! And – thanks again!

January 30, 2007 at 2:33 pm #151240
Robert Butler (Participant, @rbutler)

A couple of questions:
You said that “2 of the 7 channels have a complete list of results for all of the runs.” Does this mean that you ran a full half rep of a 2^6 for each channel?
If so, are you running a separate analysis for each channel or (assuming a full half rep for each channel) are you trying to combine all of the data and include channel as another variable?
Eric has given you good advice. The Messy Data books are very good and based on what you have said I’d recommend starting with Volume I which focuses on data from designed experiments.
When data goes missing in a design, usually what goes missing along with it is information about one of the interactions. It’s been my experience that you have to lose far more than 4 design points before you will lose information on a main effect.
If you want to get an empirical sense of how much of a loss is too much, take some time and set up a design matrix for a half rep of a 2^6 and make your Y value equal the count (i.e. results for experiment #1 = 1, experiment #2 = 2, etc). Then just randomly delete 4, 5, 6, 7 etc. experiments and run VIFs (which I understand Minitab can do), as well as simple plots and correlations of your various main effects, and see how many experiments you have to lose before you impact your main effects.
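The deletion experiment described above can be roughed out in Python with nothing but the standard library. This is only a sketch under assumptions: the half rep is built with the generator F = ABCDE (not confirmed in the thread), the seed is arbitrary, and the VIFs are taken as the diagonal of the inverse of the correlation matrix of the six main-effect columns. VIFs depend only on the design columns, so the placeholder Y values are not needed here.

```python
import random
from itertools import product
from math import prod

def half_fraction():
    """32-run 2^(6-1) design; the sixth factor uses the ASSUMED generator F = ABCDE."""
    return [list(b) + [prod(b)] for b in product((-1, 1), repeat=5)]

def correlation_matrix(runs):
    """Correlation matrix of the design columns over the runs that remain."""
    n, k = len(runs), len(runs[0])
    means = [sum(r[j] for r in runs) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in runs]
    cov = [[sum(row[a] * row[b] for row in centered) for b in range(k)]
           for a in range(k)]
    return [[cov[a][b] / (cov[a][a] * cov[b][b]) ** 0.5 for b in range(k)]
            for a in range(k)]

def invert(m):
    """Gauss-Jordan inversion with partial pivoting; raises if m is singular."""
    k = len(m)
    aug = [list(row) + [float(i == j) for j in range(k)]
           for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular: some effects are completely confounded")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vifs(runs):
    """VIF of each main effect = diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(runs))
    return [inv[j][j] for j in range(len(inv))]

rng = random.Random(2007)  # arbitrary seed so the deletions are repeatable
full = half_fraction()
for n_lost in (0, 4, 5, 6, 7):
    lost = set(rng.sample(range(len(full)), n_lost))
    kept = [r for i, r in enumerate(full) if i not in lost]
    print(n_lost, "runs lost:", [round(v, 3) for v in vifs(kept)])
```

With no runs lost every VIF is 1. Rerunning with different seeds and larger losses shows how quickly (or slowly) the main-effect VIFs drift upward.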
Even better than this would be to rummage around your university and find someone who has SAS and whose package includes proc reg. Under this proc there are two commands – vif and collin – which will really allow you to look at the X matrix and assess the impact of losing experiments.
The collin command allows you to run an eigenvector analysis on the X matrix and, in addition to the eigenvector matrix, it also generates condition indices. These two allow you to assess the issues of correlations of many against many as opposed to one-to-one.
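For reference, the usual textbook definitions of the two diagnostics are given below. These are the standard formulas rather than anything quoted from the SAS documentation: R_j^2 is the R-squared from regressing the j-th column of X on all the other columns, and the lambdas are the eigenvalues of the scaled X'X matrix.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\eta_k = \sqrt{\frac{\lambda_{\max}}{\lambda_k}}
```

In an orthogonal design every R_j^2 is 0, so every VIF is 1 and every condition index is 1. Both grow as the remaining runs make one column more nearly a linear combination of the others.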
If you can find someone with SAS the commands you would need to run would be the following:
First the data.
data mmoore1;
  input v1 v2 v3 v4 v5 v6;
  yval = _n_;     /* response = run number (1, 2, 3, ...) */
  v12 = v1*v2;
  v13 = v1*v3;    /* these are the expressions for all of your 2-way interactions */
  /* etc. */
  lines;
1 1 1 1 1 1
….etc (your design matrix)
;
run;

proc reg data = mmoore1;
  model yval = v1 v2 v3 v4 v5 v6 v12 v13 ….. / vif collin;
run;
quit;
First run this with all data present so you can see what the VIFs and the eigenvector matrix look like, and then start dropping design points from the matrix and watch what happens to your diagnostics.

January 30, 2007 at 4:08 pm #151248

You should give our technical support department a call. They will be able to give you some answers. The number is 8142312682 in the US. If you are outside of the US, check http://www.minitab.com for the number to call.