Effect of few discrete Xs on DOE
 This topic has 10 replies, 3 voices, and was last updated 13 years, 5 months ago by Sinnicks.


August 21, 2008 at 1:09 pm #50792
Hello,
We have run a DOE in which all five Xs are attribute variables, with only two or three possible levels each. The outcome of the test (Y) is a discrete numeric figure; in this experiment the results were integers between 0 and 8.
When we analyse the results using GLM we do identify Xs with P < 0.05. However, R-Sq(adj) is below 40 %. The residuals are not normal and some pattern exists, but we cannot identify the reason for it. No correlation among the Xs exists, as far as we can tell.
Of course it is possible that we have missed some X. But is it possible that the poor fit is caused only by the discrete Xs and Y, with so few possible levels, and that we can still trust our P values? Or, if the method is not applicable, can you recommend another analysis method?
August 21, 2008 at 1:44 pm #175069
A few questions…with the possible outputs being between 0 and 8, did your observed outputs end up utilizing the full range?
A few things to try…
1. Ordinal logistic regression – your output basically sounds ordinal.
2. Rank transformation on the data, then reanalyze the transformed Y using GLM.
Missing an X is always a possibility, but you may just have a lot of noise in the system.
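A minimal sketch of suggestion 2, assuming the usual midrank convention for ties (Minitab does this internally; the data here is made up for illustration):

```python
# Hypothetical sketch: rank-transform a discrete response before refitting,
# using midranks (tied values get the average of the ranks they span).
def rank_transform(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

y = [0, 0, 1, 3, 0, 8, 2, 1]          # made-up defect counts
print(rank_transform(y))               # -> [2.0, 2.0, 4.5, 7.0, 2.0, 8.0, 6.0, 4.5]
```

The three zeros share ranks 1–3 and all receive 2.0; the transformed column then goes into the GLM in place of the raw Y.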
August 21, 2008 at 2:13 pm #175072
Robert Butler
A couple of questions:
You say your X’s are attribute – what kind, ordinal or nominal? If nominal, and if any nominal variable has more than two levels, you will have to re-express those X’s in terms of dummy variables and rerun the analysis.
You also said the residuals are not normal and “some pattern exists”. Does the pattern look like strata or a series of sloping lines (slope of 1, cutting across the residual value of 0)? If so, it is because Y is constrained to a series of 9 values, which should result in 9 lines. Plots of this type are a function of the constraints on Y, and there is nothing to worry about.
August 21, 2008 at 5:42 pm #175083
My X’s are nominal. They consist of raw material type (A, B), treatment of samples (A, B, C), etc. My Y is the number of observed defects; thus, 8 is not an upper limit for Y.
What does “re-express those X’s in terms of dummy variables” actually mean?
Unfortunately I no longer have my files or Minitab with me today, but I will look at the residuals again tomorrow and also try the suggestions from this forum. Thank you guys so far. If you can give more advice based on my answers, that would be great.
August 21, 2008 at 9:23 pm #175108
Robert Butler
In your first post you said “The outcome of the test (Y) is numeric discrete figure, possible results being integers between 0 and 8,” which is why I offered the comment concerning a possible residual pattern. If Y is no longer constrained but is limited to integer values then, depending on how your defect count is distributed in the sample, you may still see some banding, but it won’t be the sharp strata one would see with the limits described in your first post.
You state in your second post that the X’s are nominal. When you say “treatment of samples (A, B, C) etc.” does the “etc.” imply you have situations with more than three nominal levels, or is it a case of either two or three? Regardless, if all of your X’s are nominal and you have more than just a few of them, your design has to be quite large.
Dummy variables: if you have more than two levels in a nominal variable and you want to use this variable in a regression analysis, you will have to do the following. Assume you have 3 levels (A, B, C); then:
If nominal level = A, then dummy1 = 1, dummy2 = 0
If nominal level = B, then dummy1 = 0, dummy2 = 1
If nominal level = C, then dummy1 = 0, dummy2 = 0
and your model will be of the form Y = fn(dummy1, dummy2).
If you try to just code A, B, and C as -1, 0, 1 or 1, 2, 3 and run the regression against these values, the machine will treat the levels of the nominal variable as though they were actually interval, and you have a very good chance of developing regression models of no value.
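The coding scheme above can be sketched in a few lines of Python (the factor column is made up; level C serves as the reference level):

```python
# Sketch of the dummy (indicator) coding described above: a 3-level nominal
# factor becomes two 0/1 columns, with level C as the reference (0, 0).
def dummy_code(level):
    """Map a nominal level A/B/C to the pair (dummy1, dummy2)."""
    return {"A": (1, 0), "B": (0, 1), "C": (0, 0)}[level]

factor = ["A", "C", "B", "A"]                 # made-up factor column
rows = [dummy_code(x) for x in factor]
print(rows)  # -> [(1, 0), (0, 0), (0, 1), (1, 0)]
```

Note that a k-level nominal factor always needs k − 1 dummy columns; the all-zero row is what identifies the reference level, so no information is lost.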
For further information on this issue check Applied Regression Analysis, 2nd edition – Draper and Smith – p. 241, “The Use of Dummy Variables in Multiple Regression”.
August 22, 2008 at 7:18 am #175120
Sorry for my initial inaccurate description. As you noticed, I meant that in this experiment the results happened to be between 0 and 8, but 8 is by no means the highest possible value. The distribution of Y is Poisson-like (most Y values lie at 0, few at 8).
One of the Xs has three levels; the other Xs have two levels. In the DOE we had 74 rows. Would you consider this not enough?
Thank you both for the advice concerning dummy variables and logistic regression. I tried them both, but in both cases R-Sq(adj) was still about 40 %.
The residuals-versus-fits plot shows, to some extent, a megaphone pattern. I transformed Y by square root, but running the analysis again didn’t really change the result much.
Anything else you might suggest?
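(As an aside, here is a small simulation, with made-up data rather than the experiment above, of why the square-root transform is a standard thing to try on Poisson-like counts: the raw variance grows with the mean, while the variance of sqrt(Y) stays in a much narrower band.)

```python
# Illustrative sketch: for Poisson counts, Var(Y) equals the mean, but
# Var(sqrt(Y)) stays roughly stable (approaching 1/4 for larger means).
import math
import random

def poisson(lam, rng):
    # Knuth's multiplicative algorithm for one Poisson draw
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

rng = random.Random(0)
for lam in (2, 8):
    ys = [poisson(lam, rng) for _ in range(5000)]
    raw = sample_var(ys)
    stabilized = sample_var([math.sqrt(y) for y in ys])
    print(lam, round(raw, 2), round(stabilized, 2))
```

The raw variances come out near 2 and 8 (tracking the means), while the transformed variances sit close together; if the transform does not tame the megaphone pattern, as reported above, that points at noise or a missing X rather than pure count-data behaviour.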
August 22, 2008 at 12:24 pm #175121
Robert Butler
If one variable had 3 levels and the rest had 2, and if all of your variables were nominal, then the total number of experiments in the basic design would have to be 3 × 2^N. The closest you can get to 74 with this structure is 3 × 2^4 = 48 or 3 × 2^5 = 96, which would correspond to either a 5- or 6-variable design study.
With the levels given, and with the constraint that all of the X’s are nominal, I don’t know of any way to get a basic 74-point design. This odd combination raises questions about the actual independence of the variables you did study.
As for what you have developed (assuming everything is OK as far as variable independence is concerned, and that you did all of the usual things you should do when analyzing the data), the results of your final model would suggest that either you have a noisy process or there are one or more critical variables in the process that were not included in your study.
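The run-count arithmetic above can be checked with a one-liner sketch (assuming, as described, one 3-level factor plus N 2-level factors, all nominal):

```python
# Full factorial size for one 3-level factor and n 2-level factors: 3 * 2**n.
def full_factorial_runs(n_two_level):
    return 3 * 2 ** n_two_level

print([full_factorial_runs(n) for n in range(3, 6)])  # -> [24, 48, 96]
```

No value of n reproduces 74, which is the basis for the concern about how the 74-row design was constructed.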
August 22, 2008 at 12:33 pm #175123
Thank you for pointing this out. We’ll go back into the DOE and see what was wrong.
August 28, 2008 at 8:47 am #175263
Unfortunately our DOE was not done in a scientific way, due to lack of experience (DOE is a somewhat new tool for us). After creating a full factorial design, it was reduced in size; otherwise the number of test runs would have been far too large. But this was done by simply omitting rows and trying to keep balance (by eye).
I guess an optimized design would have been the right scientific solution here, but what do you think (we have no experience with that)?
However, even done in this wrong way, with R-Sq(adj) about 40 % we still identified two factors with P-values as low as 0.002. Can we trust that at least these factors are significant?
August 28, 2008 at 12:21 pm #175268
Robert Butler
With every variable a nominal variable, an optimized design would have been of no value. When you have nothing but nominal variables you are stuck with having to run all of the combinations.
While this amounts to 20/20 hindsight, it is something to think about should you run another design. Often your first impression will be that one or more variables have to be nominal, but when you dig into the reasons for the interest in a particular variable you can find some aspect of the variable that is continuous and that is important to your outcome. If you then recast the variable in terms of the continuous component, you can bring all of the firepower of design fractionation to bear.
For example, let’s say I had an additive I knew was important to making my product, and all I knew at first was that I had 4 different suppliers, each with a different additive. The initial thought would be that this constituted 4 nominal levels; however, upon investigation I discovered that the issue concerning the importance of those additives was the inherent viscosity (I.V.) of the product. Suddenly I don’t care about who made it; all I care about is the I.V., and for the design I will select the product on that basis.
There are times, of course, when this doesn’t happen, and perhaps yours is one of them, but it would be worth your while to think about the properties of the variables you did use in the design, to see if something like what I described above could have been possible.
As for what you have – you could certainly use the model to predict an optimum setting and see whether there is agreement between what the model predicts and what you get. The issue is that you will have to keep in mind the prediction error around the prediction – if your actual value falls anywhere inside that region, you will have to state that there was agreement between the two, and for a model with as much noise as yours apparently has, this may be an agreement of little worth.
If you wanted to check for confounding in the X matrix of the points you did run, you could run a VIF check. What you really need is the ability to compute eigenvalues and condition indices but, as far as I know, most programs won’t do this. Another possibility would be to run a multivariate regression of all of the other X’s on the two that did test as significant, to see if there are significant correlations between the two that were significant and those that weren’t. If there are no significant correlations, you would have additional evidence to support the idea that the two significant variables are clear of the other variables of interest and that their correlation with the Y is meaningful.
August 28, 2008 at 12:31 pm #175270
Robert, thank you very much for your thorough explanation! It has been indeed helpful and I appreciate it.
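A minimal sketch of the VIF check suggested in the thread, reduced to the illustrative two-predictor case where R² is just the squared correlation (a real check would regress each X on all the others; the 0/1 columns below are made up):

```python
# Sketch of the VIF idea: for predictor x1 regressed on a single other
# predictor x2, R^2 = corr(x1, x2)^2 and VIF = 1 / (1 - R^2).
# VIF near 1 means x1 is clear of x2; large VIF signals confounding.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def vif_two_predictors(x1, x2):
    r2 = pearson_r(x1, x2) ** 2
    return 1.0 / (1.0 - r2)

x1 = [0, 0, 1, 1, 0, 1, 0, 1]        # made-up 2-level factor columns
x2 = [0, 1, 0, 1, 0, 1, 1, 0]        # balanced against x1
print(round(vif_two_predictors(x1, x2), 3))  # -> 1.0 (orthogonal columns)
```

A design reduced "by eye", as described earlier in the thread, is exactly the situation where these values can drift well above 1 even though each column looks roughly balanced on its own.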
The forum ‘General’ is closed to new topics and replies.