# Appropriate ANOVA?


• #25756

Lorax
Participant

Im trying to work out the size of the impact of a single x to my Y.
I regressed the two and came up with the following printout:

Am I right in thinking that my x (complexity) is correlated to my Y (ALOS) to the tune of about 30% or am I completely out to lunch?

#62363

Lorax
Participant

The annotated session window output isn’t showing up properly. I’m going to see if I can attach it.
Lorax

#62364

waynergf
Member

Lorax: I can help but your session window results are still not available.

#62365

Lorax
Participant

Your thoughts are very much appreciated.
I’m trying to assess how much impact Patient Clinical Complexity has on Length of Stay (LOS).
I messed around with the data and found that a quadratic regression equation fitted best. (Stat>Regression>Fitted Line plot)
My R squared adjusted number is 30.8%, which seems to be saying that Complexity is a factor in LOS but is most likely not the only one (which stands to reason).
The next thing I was doing was looking at the SS column of numbers on the ANOVA table and concluding from comparing the values for the “Complexity” and for “Error” that Complexity is responsible for determining about 30% of the LOS for a patient.
The following is the Session Window:
Polynomial Regression Analysis: Acutelos versus Unweighted Complexity Score
The regression equation is
Acutelos = 4.138 + 0.2462 Unweighted Complexity Score  + 0.2968 Unweighted Complexity Score**2

S = 7.10199     R-Sqr = 30.8   R-Sqr(adj)= 30.8

Analysis of Variance
Source        DF      SS       MS       F      P
Regression     2   50444  25221.8  500.05  0.000
Error       2245  113234     50.4
Total       2247  163677

What was calculated was that this Complexity measure is contributing 50444 to the variation
Other stuff is contributing 113234
This works out to be:
Complexity causes about 30% of the variation in ALOS
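
The arithmetic behind that 30% can be checked directly from the ANOVA table. The following is a minimal sketch that uses only the SS and DF values printed in the session window; the variable names are illustrative. Note that the ratio describes variation explained by the fitted quadratic model, which is not by itself evidence of cause.

```python
import math

# Values copied from the ANOVA table in the session window above.
ss_regression = 50444.0
ss_error = 113234.0
ss_total = 163677.0
df_regression = 2
df_error = 2245

# Share of the total variation in LOS accounted for by the fitted model.
r_squared = ss_regression / ss_total

# Mean squares, F statistic, and S (the residual standard deviation).
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_stat = ms_regression / ms_error
s = math.sqrt(ms_error)

print(f"R-squared = {r_squared:.1%}")
print(f"F = {f_stat:.2f}, S = {s:.5f}")
```

The results agree with the printed R-Sqr, F, and S up to the rounding of the SS column.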

Im at adamjlennoxhotmail.com if I can give you more details

#62366

Robert Butler
Participant

If we just focus on your output and assume you did all of the things you should have done before running the regression, then one way to view the regression relationship is as you have done – approximately 30% of the observed variation in LOS can be explained by PCC.
Now for the inquisition:
1.  Before running the regression did you plot your data?
2.  If you plotted the data, does the plot give the visual impression that a quadratic would indeed be the best fit?
3.  Assuming the answers to #1 and #2 are yes – what does the residual pattern look like? (residuals plotted against predicted and against values of PCC)  – patterns? clusters? Anything that gives the impression of something other than a shotgun blast?
If the answer to #1 or #2 is “no” then you really ought to go back and start over.
If everything is ok then you might have the beginnings of a model construction effort that might bear some fruit.  Given the vagaries of patient populations 30% for a single factor isn’t too bad.

#62367

waynergf
Member

OK.  First, since I don’t know how much you know about regression analysis, excuse me if I start too elementary   :-)  Second, it looks like you’re using Minitab, which I have, too.  If you’ll email me your Minitab file it’ll be a lot easier working with you – and I don’t mind talking with you over the phone, which would be even better.

Just because a predictor variable (an “X” – also called an explanatory variable, an independent variable) accounts for a relatively large proportion of the variation in a response variable ( the “Y”) doesn’t necessarily establish cause and effect between the two – it only means for sure they are correlated (i.e., they tend to “move together”).

In order to justify keeping an X in a regression model, it should account for a statistically significant proportion of the total variation in Y.  This will be shown by the p-value of the t statistic associated with the X’s estimated coefficient.  Typically you want the p-value to be less than 0.05 (on a scale of 0 to 1).  This means you’ve chosen an alpha value of 0.05, which is the probability of making a Type I Error (concluding the true value of the coefficient of an X is significantly different from zero, when it really isn’t).  Since you didn’t include all the output, I don’t have the X and X**2 coefficients’ t statistics and p-values.  But often, while the linear term’s (X) coefficient is significant, the quadratic term’s (X**2) is not.  So check that, and if the p-value of the estimated coefficient of the X**2 term is greater than 0.05 it should not be included in the model.
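
As a rough illustration of that check, here is a small sketch of turning a coefficient and its standard error into a t statistic and a two-sided p-value. The coefficient and standard error below are made-up numbers, not values from Lorax's output (which did not include them), and the p-value uses the normal approximation to the t distribution, which is an assumption that is reasonable only because the error degrees of freedom here (2245) are large.

```python
import math

def two_sided_p_value(t_stat: float) -> float:
    """Two-sided p-value from the normal approximation to the t distribution.

    Only a reasonable shortcut when the error degrees of freedom are large.
    """
    return math.erfc(abs(t_stat) / math.sqrt(2.0))

# Hypothetical values for illustration only (not from the posted output).
coef = 0.30      # estimated coefficient of the X**2 term
std_err = 0.10   # standard error of that coefficient

t_stat = coef / std_err
p_value = two_sided_p_value(t_stat)

alpha = 0.05
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("keep the term" if p_value < alpha else "consider dropping the term")
```

Minitab reports the exact t-based p-value in the coefficients table, so in practice you would read it from there rather than compute it by hand.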

R-squared (R**2) is the coefficient of multiple determination (the square of the multiple correlation coefficient R) and is the proportion of the total variation of Y about its mean accounted for by all the terms you’ve included in the current model – 30.8% in your case.  Because R**2 will never decrease when you add another term to a model (just due to the math), even if its coefficient is *not* significant, R**2 adjusted attempts to compensate for this by “penalizing” R**2 as the number of terms in the model increases.
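
To make the penalty concrete, here is a short sketch that computes R**2 and R**2 adjusted from the SS and DF values in the posted ANOVA table, using the standard degrees-of-freedom form of the adjustment. The variable names are illustrative.

```python
# Values copied from the posted ANOVA table.
ss_error = 113234.0
ss_total = 163677.0
n_obs = 2248    # Total DF (2247) + 1
n_terms = 2     # X and X**2

# Plain R**2: share of the variation about the mean that the model accounts for.
r_squared = 1.0 - ss_error / ss_total

# Adjusted R**2: compare mean squares instead of sums of squares,
# so each extra term costs one error degree of freedom.
df_error = n_obs - n_terms - 1
df_total = n_obs - 1
r_squared_adj = 1.0 - (ss_error / df_error) / (ss_total / df_total)

print(f"R-squared       = {r_squared:.1%}")
print(f"R-squared (adj) = {r_squared_adj:.1%}")
```

With more than two thousand observations and only two terms the penalty is tiny, which is why both figures print as 30.8% in the session window.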

Yes, with almost 70% of the variation in Y unaccounted for, other factors not (yet) in the model are associated with Y.  “Error” sum of squares (SS) in this case is just the (as yet) unaccounted for variation of Y about its mean (that almost 70%).  I’m not sure what is meant by (included in) “Patient Complexity of Care” – sounds similar to acuity to me.  If your Complexity metric is estimated from a combination of other factors, I’d try including them directly in the regression instead of Complexity.  You could also attempt to distinguish differences in LOS by including a dummy variable (surrogate) for case type (initial diagnosis), age, gender, race or ethnicity, etc. – any factor that might contribute to LOS.  You can use Stepwise Regression (Forward Selection or Backward Elimination) to let the statistical significance of the estimated coefficients “automatically” (sort of) build a model.

OK – I’ll leave it there for now and wait for feedback from you as to whether this is too elementary, helpful, or overwhelming   :-)

Wayne

*** Wayne G. Fischer, BS, MS, PhD ***
Certified Manager of Quality & Organizational Excellence
Certified Quality Engineer / Certified Quality Auditor
Perioperative Quality Analyst
U of TX MD Anderson Cancer Center
Houston, TX  77030
office = 713.794.4340 / cell = 281.360.7584

#62368

Lorax
Participant

Mornin Robert,
Thanks for this.
Yes I plotted the data first
A quadratic seems to be the best fit. I tried a few other lines and they all either had a poor r2 (adj) or gave only a very small increase in it for a more complex equation, so I stuck with the quadratic.
The residual pattern looked kinda odd. The variation in residuals got bigger as the LOS got bigger. I’m surmising that this is because there is more opportunity to extend a patient’s stay in hospital, the longer they are here.
It’s good to hear that 30% is a lot to get for a single factor. It’s a rough and ready measure of the patient’s clinical complexity and does not take into account interactions between things that are wrong (e.g. anemia interacting with diabetes…)
I really appreciate input on this.
Lorax

#62369

Robert Butler
Participant

So, it sounds like your mean LOS and the variance are coupled – increase mean LOS and increase variance.  You might try logging the LOS and plotting/regressing that against PCC. If the residual pattern changes from a funnel shape to more like a shotgun blast then this would suggest a model on log LOS might be the better choice and the act of logging has decoupled the mean and the variance.
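
A rough sketch of that check on simulated data follows. Nothing here comes from Lorax's data set: the generating model (multiplicative noise on LOS), its parameters, the straight-line fit, and the helper names are all assumptions chosen only to show a funnel that disappears after logging. The "spread ratio" compares the root-mean-square residual in the upper half of the complexity scores with that in the lower half; a value near 1 corresponds to the shotgun-blast pattern.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def spread_ratio(xs, ys):
    """RMS residual for the upper half of x divided by that for the lower half."""
    a, b = fit_line(xs, ys)
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    cut = sorted(xs)[len(xs) // 2]  # median split on x
    low = [r for x, r in zip(xs, resid) if x < cut]
    high = [r for x, r in zip(xs, resid) if x >= cut]

    def rms(values):
        return math.sqrt(sum(v * v for v in values) / len(values))

    return rms(high) / rms(low)

# Simulated (hypothetical) data: the noise multiplies LOS, so spread grows with the mean.
rng = random.Random(0)
complexity = [rng.uniform(0.0, 5.0) for _ in range(2000)]
los = [math.exp(1.0 + 0.4 * x + rng.gauss(0.0, 0.5)) for x in complexity]
log_los = [math.log(v) for v in los]

ratio_raw = spread_ratio(complexity, los)      # funnel: well above 1
ratio_log = spread_ratio(complexity, log_los)  # roughly 1 after logging

print(f"spread ratio, raw LOS: {ratio_raw:.2f}")
print(f"spread ratio, log LOS: {ratio_log:.2f}")
```

A single ratio is no substitute for looking at the residual plots themselves; it only puts a number on what the plot should already show.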

#62370

BritW
Participant

Robert, wondering:
Given that there are so many factors that contribute to LOS other than the acuteness of the patient (many of which can be controlled/influenced by nurse/doc/other interactions), why would you not have suggested a multiple regression with other measurable variables?  Since there is roughly 70% of variation left in the current model, just wondering – I had suggested that and since I’m positive you are more statistically savvy than I am, I was hoping you could help me understand.  Appreciate the feedback….

#62373

Robert Butler
Participant

The point of my original post and the short follow-up was to answer Lorax’s questions concerning the interpretation of the relationship he had observed between his PCC measure and LOS.  I didn’t offer any more because I was making the posts during pauses in what turned out to be a rather hectic two days.

As for a multivariable study: certainly. I would hope that, armed with his initial success and a realization that there was still 70% of the variation to be explained, Lorax would plan on doing exactly this.

#62374

BritW
Participant

Thanks – I just wanted to make sure I wasn’t providing unsound advice.
