Appropriate ANOVA?
- This topic has 11 replies, 4 voices, and was last updated 11 years, 10 months ago by
BritW.
- April 9, 2009 at 1:13 am #25756
I'm trying to work out the size of the impact of a single x on my Y.
I regressed the two and came up with the following printout:
Am I right in thinking that my x (complexity) is correlated to my Y (ALOS) to the tune of about 30% or am I completely out to lunch?
- April 9, 2009 at 11:25 am #62363
The annotated session window output isn't showing up properly. I’m going to see if I can attach it.
Lorax
- April 10, 2009 at 4:45 pm #62364
Lorax: I can help but your session window results are still not available.
- April 10, 2009 at 4:47 pm #62365
Your thoughts are very much appreciated.
I’m trying to assess how much impact Patient Clinical Complexity has on Length of Stay (LOS).
I messed around with the data and found that a quadratic regression equation fitted best. (Stat>Regression>Fitted Line plot)
My R squared adjusted number is 30.8%, which seems to be saying that Complexity is a factor in LOS but is most likely not the only one (which stands to reason).
The next thing I was doing was looking at the SS column of numbers on the ANOVA table and concluding from comparing the values for the “Complexity” and for “Error” that Complexity is responsible for determining about 30% of the LOS for a patient.
The following is the Session Window:
Polynomial Regression Analysis: Acutelos versus Unweighted Complexity Score
The regression equation is
Acutelos = 4.138 + 0.2462 Unweighted Complexity Score + 0.2968 Unweighted Complexity Score**2
S = 7.10199 R-Sqr = 30.8 R-Sqr(adj)= 30.8
Analysis of Variance
Source        DF      SS       MS       F      P
Regression     2   50444  25221.8  500.05  0.000
Error       2245  113234     50.4
Total       2247  163677
What was calculated was that this Complexity measure is contributing 50444 to the variation
Other stuff is contributing 113234
This works out to be:
Complexity causes about 30% of the variation in ALOS
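The arithmetic behind that "about 30%" can be checked directly from the ANOVA table. The following is only a sketch that re-derives R-squared and the F statistic from the SS and DF columns quoted above; the variable names are illustrative and not part of the Minitab output.

```python
# Values copied from the ANOVA table in the session window above.
ss_regression = 50444.0
ss_error = 113234.0
ss_total = 163677.0
df_regression = 2
df_error = 2245

# R-squared: share of the total variation in ALOS accounted for by the fit.
r_squared = ss_regression / ss_total

# Mean squares and the overall F statistic for the regression.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_statistic = ms_regression / ms_error

print(f"R-squared   = {r_squared:.1%}")
print(f"MS error    = {ms_error:.1f}")
print(f"F statistic = {f_statistic:.2f}")
```

The ratio SS(Regression) / SS(Total) is the 30.8% reported as R-Sqr. Small differences from the printed F value come from rounding in the table.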
I'm at adamjlennoxhotmail.com if I can give you more details.
- April 10, 2009 at 4:48 pm #62366
Robert Butler (@rbutler)
If we just focus on your output and assume you did all of the things you should have done before running the regression, then one way to view the regression relationship is as you have done – approximately 30% of the observed variation in LOS can be explained by PCC.
Now for the inquisition:
1. Before running the regression did you plot your data?
2. If you plotted the data, does the plot give the visual impression that a quadratic would indeed be the best fit?
3. Assuming the answers to #1 and #2 are yes – what does the residual pattern look like? (residuals plotted against predicted and against values of PCC) – patterns? clusters? Anything that gives the impression of something other than a shotgun blast?
If the answer to #1 or #2 is “no” then you really ought to go back and start over.
If everything is ok then you might have the beginnings of a model construction effort that might bear some fruit. Given the vagaries of patient populations, 30% for a single factor isn’t too bad.
- April 10, 2009 at 5:25 pm #62367
OK. First, since I don’t know how much you know about regression analysis, excuse me if I start too elementary :-) Second, it looks like you’re using Minitab, which I have, too. If you’ll email me your Minitab file it’ll be a lot easier working with you – and I don’t mind talking with you over the phone, which would be even better.
Just because a predictor variable (an “X” – also called an explanatory variable, an independent variable) accounts for a relatively large proportion of the variation in a response variable ( the “Y”) doesn’t necessarily establish cause and effect between the two – it only means for sure they are correlated (i.e., they tend to “move together”).
In order to justify keeping an X in a regression model, it should account for a statistically significant proportion of the total variation in Y. This will be shown by the p-value of the t statistic associated with the X’s estimated coefficient. Typically you want the p-value to be less than 0.05 (on a scale of 0 to 1). This means you’ve chosen an alpha value of 0.05, which is the probability of making a Type I Error (concluding the true value of the coefficient of an X is significantly different from zero, when it really isn’t). Since you didn’t include all the output, I don’t have the X and X**2 coefficients’ t statistics and p-values. But often, while the linear term’s (X) coefficient is significant, the quadratic term’s (X**2) is not. So check that, and if the p-value of the estimated coefficient of the X**2 term is greater than 0.05 it should not be included in the model.
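To make the coefficient check concrete, here is a rough sketch that fits a quadratic by ordinary least squares and reports a t statistic and p-value for each term. Several things in it are assumptions and not from the thread: the data are synthetic (the coefficients loosely echo the fitted equation, but the x range and noise level are invented), the helper names are hypothetical, and the p-values use a normal approximation to the t distribution, which is only reasonable for large samples such as the roughly 2,000 rows here. Minitab reports the exact values in the coefficients table.

```python
import random
from statistics import NormalDist


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def quadratic_fit(xs, ys):
    """Fit y = b0 + b1*x + b2*x**2; return coefficients, t stats, p-values."""
    rows = [[1.0, x, x * x] for x in xs]
    k, n = 3, len(xs)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    sse = sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, y in zip(rows, ys))
    mse = sse / (n - k)
    # Diagonal of the inverse of X'X, one unit vector at a time.
    diag = [solve(xtx, [1.0 if j == i else 0.0 for j in range(k)])[i]
            for i in range(k)]
    std_errors = [(mse * d) ** 0.5 for d in diag]
    t_stats = [b / s for b, s in zip(beta, std_errors)]
    # Two-sided p-values, normal approximation (assumes large n).
    p_values = [2 * (1 - NormalDist().cdf(abs(t))) for t in t_stats]
    return beta, t_stats, p_values


# Synthetic data only: invented x range and noise, not Lorax's patients.
random.seed(1)
xs = [random.uniform(0, 8) for _ in range(2000)]
ys = [4.0 + 0.25 * x + 0.3 * x * x + random.gauss(0, 7) for x in xs]

beta, t_stats, p_values = quadratic_fit(xs, ys)
for name, b, t, p in zip(["const", "x", "x**2"], beta, t_stats, p_values):
    print(f"{name:6s} coef={b:8.4f}  t={t:7.2f}  p={p:.4f}")
```

If the p-value on the x**2 row were above the chosen alpha, that would be the signal to drop the quadratic term and refit.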
R-squared (R**2) is the coefficient of multiple determination (the square of the multiple correlation coefficient, R) and is the proportion of the total variation of Y about its mean accounted for by all the terms you’ve included in the current model – 30.8% in your case. Because R**2 never decreases when you add another term to a model (just due to the math), even if its coefficient is *not* significant, R**2 adjusted attempts to compensate for this by “penalizing” R**2 as the number of terms in the model increases.
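The size of that penalty can be sketched with the standard adjusted R-squared formula. The function name below is illustrative; the sample size of 2248 is taken from the Total DF of 2247 in the session window, and the small-sample cases are made up purely to show the penalty at work.

```python
def adjusted_r_squared(r_squared, n_obs, n_terms):
    """Adjusted R-squared for a model with n_terms predictors plus a constant."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_terms - 1)


r2 = 0.3082  # 50444 / 163677 from the ANOVA table

# Lorax's case: 2248 observations, 2 terms (x and x**2). The penalty is tiny.
print(f"n=2248, 2 terms: {adjusted_r_squared(r2, 2248, 2):.3f}")

# Hypothetical small samples with the same R-squared: the penalty now bites.
print(f"n=20,   2 terms: {adjusted_r_squared(r2, 20, 2):.3f}")
print(f"n=20,   6 terms: {adjusted_r_squared(r2, 20, 6):.3f}")
```

With more than two thousand rows and only two terms, R-Sqr and R-Sqr(adj) agree to one decimal place, which matches the printout.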
Yes, with almost 70% of the variation in Y unaccounted for, other factors not (yet) in the model are associated with Y. “Error” sum of squares (SS) in this case is just the (as yet) unaccounted for variation of Y about its mean (that almost 70%). I’m not sure what is meant by (included in) “Patient Complexity of Care” – sounds similar to acuity to me. If your Complexity metric is estimated from a combination of other factors, I’d try including them directly in the regression instead of Complexity. You could also attempt to distinguish differences in LOS by including a dummy variable (surrogate) for case type (initial diagnosis), age, gender, race or ethnicity, etc. – any factor that might contribute to LOS. You can use Stepwise Regression (Forward Selection or Backward Elimination) to let the statistical significance of the estimated coefficients “automatically” (sort of) build a model.
OK – I’ll leave it there for now and wait for feedback from you as to whether this is too elementary, helpful, or overwhelming :-)
Wayne
*** Wayne G. Fischer, BS, MS, PhD ***
Certified Manager of Quality & Organizational Excellence
Certified Quality Engineer / Certified Quality Auditor
Perioperative Quality Analyst
U of TX MD Anderson Cancer Center
Houston, TX 77030
office = 713.794.4340 / cell = 281.360.7584
- April 11, 2009 at 5:49 pm #62368
Mornin Robert,
Thanks for this.
Yes I plotted the data first
A quadratic seems to be the best fit. I tried a few other lines, and they all either had a poor R-squared (adj) or gave only a very small increase in it for making the equation more complex, so I stuck with the quadratic.
The residual pattern looked kinda odd. The variation in residuals got bigger as the LOS got bigger. I’m surmising that this is because there is more opportunity to extend a patient’s stay in hospital, the longer they are here.
It’s good to hear that 30% is a lot to get for a single factor. It’s a rough and ready measure of the patient’s clinical complexity and does not take into account interactions between things that are wrong (e.g. anemia interacting with diabetes…)
I really appreciate input on this.
Lorax
- April 11, 2009 at 6:33 pm #62369
Robert Butler (@rbutler)
So, it sounds like your mean LOS and the variance are coupled – increase mean LOS and increase variance. You might try logging the LOS and plotting/regressing that against PCC. If the residual pattern changes from a funnel shape to more like a shotgun blast then this would suggest a model on log LOS might be the better choice and the act of logging has decoupled the mean and the variance.
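A small simulation can illustrate what "decoupling the mean and the variance" looks like. This is a sketch on invented data, not Lorax's: it assumes LOS has multiplicative (log-normal) noise, and the two groups, their log-means, and the function name are all hypothetical.

```python
import math
import random
from statistics import mean, stdev

random.seed(0)


def simulate_los(log_mean, n=2000, log_sd=0.5):
    """Draw synthetic LOS values whose logarithm is normally distributed."""
    return [math.exp(random.gauss(log_mean, log_sd)) for _ in range(n)]


# Two invented patient groups: low complexity and high complexity.
low = simulate_los(1.0)
high = simulate_los(2.5)

# On the raw scale the spread grows with the mean (the funnel shape).
raw_ratio = stdev(high) / stdev(low)

# On the log scale the spread is about the same in both groups.
log_ratio = stdev([math.log(v) for v in high]) / stdev([math.log(v) for v in low])

print(f"mean LOS: low={mean(low):.1f}, high={mean(high):.1f}")
print(f"SD ratio (high/low), raw LOS: {raw_ratio:.2f}")
print(f"SD ratio (high/low), log LOS: {log_ratio:.2f}")
```

A raw-scale ratio well above 1 alongside a log-scale ratio near 1 is the numeric counterpart of the funnel turning into a shotgun blast. Whether real LOS data behave this way has to be checked on the residual plots themselves.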
- April 13, 2009 at 1:29 pm #62370
Robert, wondering:
Given that there are so many factors that contribute to LOS other than the acuteness of the patient (many of which can be controlled/influenced by nurse/doc/other interactions), why would you not have suggested a multiple regression with other measurable variables? Since there is roughly 70% of variation left in the current model, just wondering – I had suggested that and since I’m positive you are more statistically savvy than I am, I was hoping you could help me understand. Appreciate the feedback….
- April 13, 2009 at 6:49 pm #62373
Robert Butler (@rbutler)
The point of my original post and the short follow up was to answer Lorax's questions concerning the interpretation of the relationship between his PCC measure and LOS which he had observed. I didn't offer any more because I was making the posts during pauses in what turned out to be a rather hectic two days.
As for a multivariable study: certainly. I would hope that, armed with his initial success and a realization that there was still 70% of the variation to be explained, Lorax would plan on doing exactly this.
- April 13, 2009 at 7:01 pm #62374
Thanks – I just wanted to make sure I wasn’t providing unsound advice.
The forum ‘Healthcare’ is closed to new topics and replies.