# Effect of few discrete Xs on DOE


• #50792

Sinnicks
Participant

Hello
We have run a DOE test where all five Xs are attribute variables, with only 2-3 possible values for each. The outcome of the test (Y) is numeric discrete figure, possible results being integers between 0 and 8.
When we analyse the results using GLM we do identify Xs with P<0.05. However, R-Sq(Adj) is below 40 %. Residuals are not normal and some pattern exists but we cannot identify the reason for it. No covariance exists as far as we can tell.
Of course it is possible that we have missed some X. But is it possible that the poor fit is caused only by the discrete Xs and Y having so few possible levels, and that we can still trust our P values? Or, if the method is not applicable, can you recommend another analysis method?

#175069

Evil
Participant

A few questions…with the possible outputs being between 0 and 8, did your observed outputs end up utilizing the full range ?
A few things to try…
1. Ordinal Logistic Regression – your output basically sounds ordinal
2. Rank transformation – rank-transform the data, then re-analyze the transformed Y using GLM.
Missing an x is always a possibility but you may just have a lot of noise in the system.
e.
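A minimal sketch of the rank transformation from item 2, in plain Python. The helper name `average_ranks` and the sample counts are made up for illustration; tied values get the mean of the ranks they span, which matters here because a defect count has many ties at 0.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


# Hypothetical defect counts, not data from the thread.
defect_counts = [0, 0, 3, 1, 0, 8, 1]
ranked_y = average_ranks(defect_counts)
print(ranked_y)  # [2.0, 2.0, 6.0, 4.5, 2.0, 7.0, 4.5]
```

The ranked values would then replace Y in the same GLM as before.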

#175072

Robert Butler
Participant

A couple of questions
You say your X’s are attribute – what kind, ordinal or nominal? If nominal, and if the nominal variables have more than two levels, you will have to re-express them X’s in terms of dummy variables and re-run the analysis.
You also said the residuals are not normal and “some pattern exists”. Does the pattern look like strata or a series of sloping lines (slope of -1, cutting across the residual value of 0)? If it does, that is because Y is constrained to a series of 9 values, which should result in 9 lines. Plots of this type are a function of the constraints on Y and there is nothing to worry about.

#175083

Sinnicks
Participant

My X’s are nominal. They consist of raw material type (A, B), treatment of samples (A, B, C) etc. My Y is the number of observed defects. Thus, 8 is not an upper limit for Y.
What does “re-express them X’s in terms of dummy variables” actually mean?
Unfortunately I do not have my files or Minitab with me today anymore, but I will look at the residuals again tomorrow and also try the suggestions from this forum. Thank you guys so far. If you can give more advice based on my answers that would be great.

#175108

Robert Butler
Participant

In your first post you said “The outcome of the test (Y) is numeric discrete figure, possible results being integers between 0 and 8.” which is why I offered the comment concerning a possible residual pattern. If Y is no longer constrained but is limited to integer values then, depending on how your defect count is distributed in the sample, you may still have some sense of banding, but it won’t be the sharp strata one would see with the limits described in your first post.
You state in your second post that the X’s are nominal.  When you say “treatment of samples (A, B, C) etc.”  does the “etc.” imply you have situations where you have more than three nominal levels or is it a case of either two or three? Regardless, if all of your X’s are nominal and if you have more than just a few of them, your design has to be quite large.
Dummy variables: If you have more than two levels in a nominal variable and you want to use this variable in a regression analysis you will have to do the following. Assume you have 3 levels (A, B, C), then
If nominal level = A then do dummy1 = 1, dummy2 = 0
If nominal level = B then do dummy1 = 0, dummy2 = 1
If nominal level = C then do dummy1 = 0, dummy2 = 0
and your model will be of the form Y = fn(dummy1, dummy2)
If you try to just code A,B, and C as -1,0,1 or 1,2,3 and run the regression against these values the machine will treat the levels of the nominal variable as though they were actually interval and you have a very good chance of developing regression models of no value.
For further information on this issue check Applied Regression Analysis 2nd edition – Draper and Smith – pp.241 “The Use of Dummy Variables in Multiple Regression”.
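The coding above can be sketched in a few lines of Python. The function name `dummy_code` and the sample `treatment` list are made up for illustration; level C is taken as the reference level so that the output matches the table above (A gives 1,0; B gives 0,1; C gives 0,0).

```python
def dummy_code(levels, reference):
    """Map each nominal level to 0/1 indicators, one per non-reference level."""
    non_reference = [lv for lv in sorted(set(levels)) if lv != reference]
    rows = [[1 if lv == name else 0 for name in non_reference] for lv in levels]
    return non_reference, rows


# Hypothetical sample of a three-level nominal factor.
treatment = ["A", "B", "C", "A", "C"]
columns, dummies = dummy_code(treatment, reference="C")

print(columns)  # ['A', 'B']  -> dummy1 flags A, dummy2 flags B
print(dummies)  # [[1, 0], [0, 1], [0, 0], [1, 0], [0, 0]]
```

A factor with k levels always needs k - 1 dummy columns; which level is the reference only changes how the coefficients are read, not the fit.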

#175120

Sinnicks
Participant

Sorry for my initial inaccurate description. As you noticed, I meant that in this experiment the results happened to be between 0 and 8, but 8 is by no means the highest possible value. The distribution of Y is Poisson-like (most Y values lie at 0, few at 8).
One of the Xs has three levels, the other Xs have two levels. In the DOE we had 74 rows. Would you consider this as not enough?
Thank you both for the advice concerning dummy variables and logistic regression. I tried them both but in both cases R-Sq(Adj) was still about 40 %.
The plot of residuals vs. fits shows a megaphone pattern to some extent. I transformed Y by square root, but running the analysis again didn’t really change the result much.
Anything else you might suggest?

#175121

Robert Butler
Participant

If one variable had 3 levels and the rest had 2, and if all of your variables were nominal, then the total number of experiments in the basic design would have to be divisible by 3 × 2^N. The closest you can get to 74 with this structure is 3 × 2^4 = 48 or 3 × 2^5 = 96, which would correspond to either a 5 or 6 variable design study.
With the levels given and with the constraint that all of the X’s are nominal I don’t know of any way to get a basic 74 point design. This odd combination raises questions about the actual independence of the variables you did study.
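The run-count arithmetic can be checked by enumerating the level combinations. This is only an illustrative sketch; the function name `full_factorial_size` is made up. It also shows that 74 is not a multiple of 3, so the three-level factor cannot have equal run counts at each of its levels.

```python
from itertools import product


def full_factorial_size(level_counts):
    """Count the runs in a full factorial by enumerating every level combination."""
    return sum(1 for _ in product(*(range(n) for n in level_counts)))


size_5_vars = full_factorial_size([3, 2, 2, 2, 2])     # one 3-level, four 2-level factors
size_6_vars = full_factorial_size([3, 2, 2, 2, 2, 2])  # one 3-level, five 2-level factors
observed_runs = 74
balanced_on_three_level = observed_runs % 3 == 0

print(size_5_vars, size_6_vars, balanced_on_three_level)  # 48 96 False
```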
As for what you have developed (assuming everything is ok as far as variable independence is concerned and that you did all of the usual things you should do when analyzing the data) the results of your final model would suggest either you have a noisy process or there are one or more critical variables in the process that were not included in your study.

#175123

Sinnicks
Participant

Thank you for pointing this out. We’ll go back into the DOE and see what was wrong.

#175263

Sinnicks
Participant

Unfortunately our DOE was not done in a scientific way, due to lack of experience (DOE is a somewhat new tool for us). After creating the full factorial design it was reduced in size, otherwise the number of test runs would have been far too large. But this was done by simply omitting rows and trying to keep balance (by eye).
I guess an optimized design would have been the right scientific solution here, but what do you think (we have no experience with that)?
However, even done in this wrong way, with R-Sq(adj) about 40 %, we still identified two factors with P-values as low as 0.002. Can we trust that at least these factors are significant?

#175268

Robert Butler
Participant

With every variable a nominal variable, an optimized design would have been of no value. When you have nothing but nominal values you are stuck with having to run all of the combinations.
While this amounts to 20-20 hindsight it is something to think about should you try to run another design.  Often, your first impression will be that one or more variables have to be nominal but when you dig into the reasons for the interest in a particular variable you can find some aspect of the variable that is continuous and which is important to your outcome.  If you then re-cast the variable in terms of the continuous component you can bring all of the firepower of design fractionation to bear.
For example, let’s say I had an additive I knew was important to making my product and all I knew at first was that I had 4 different suppliers each with a different additive.  The initial thought would be that this constituted 4 nominal levels, however, upon investigation I discovered that the issue concerning the importance of those additives was the inherent viscosity of the product.  Suddenly, I don’t care about who made it all I care about is the I.V. and for the design I will select the product on that basis.
There are times, of course, when this doesn’t happen and perhaps yours is one of them but it would be worth your while to think about the properties of the variables you did use in the design to see if something like I described above could have been possible.
As for what you have – you could certainly use the model you have to predict an optimum setting and see if there is agreement between what the model predicts and what you get.  The issue here is that you will have to keep in mind the prediction error around the prediction – if your actual value falls anywhere inside that region you will have to state that there was agreement between the two and for a model with as much noise as you apparently have, this may be an agreement of little worth.
If you wanted to check for confounding in the X matrix of the points you did run you could run a VIF check.  What you really need is the ability to compute eigenvalues and condition indices but, as far as I know, most programs won’t do this.  Another possibility would be to run a multivariate regression of all of the other X’s on the two that did test as significant to see if there are significant correlations between the two that were significant and those that weren’t.  If there are no significant correlations you would have additional evidence to support the idea that the two significant variables are clear of other variables of interest and that their correlation with the Y is meaningful.
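As a rough illustration of the VIF check, here is a sketch in plain Python. The helper names `solve` and `vif` and the toy 0/1 columns are made up, not the design from this thread. Each VIF is 1 / (1 - R²) from regressing one X column on the others. A balanced 2×2 factorial gives a VIF of 1, while dropping a single row already pushes it above 1. For a nominal factor with more than two levels the columns would be its dummy variables, and the sketch assumes the other columns are not perfectly collinear among themselves.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def vif(columns, j):
    """VIF of column j: 1 / (1 - R^2) from regressing it on the other columns."""
    y = columns[j]
    others = [col for k, col in enumerate(columns) if k != j]
    n = len(y)
    design = [[1.0] + [col[i] for col in others] for i in range(n)]  # intercept first
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * y[i] for i, row in enumerate(design)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bk * xk for bk, xk in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    # 1 / (1 - R^2) equals ss_tot / ss_res; a perfect fit means complete confounding.
    return float("inf") if ss_res == 0 else ss_tot / ss_res


# Toy data: two 0/1-coded two-level factors.
balanced = [[0, 0, 1, 1], [0, 1, 0, 1]]  # full 2x2 factorial
trimmed = [[0, 0, 1], [0, 1, 0]]         # same design with the last run omitted

print(vif(balanced, 0))
print(vif(trimmed, 0))
```

In the trimmed case the two columns have a correlation of -0.5, so the VIF works out to 1 / (1 - 0.25), about 1.33.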

#175270

Sinnicks
Participant

Robert, thank you very much for your thorough explanation! It has indeed been helpful and I appreciate it.
