iSixSigma

Tests for Confounding Issues with Discrete Data


Viewing 7 posts - 1 through 7 (of 7 total)
  • #24598

    sue
    Member

    My project is on cycle time (a continuous y), and my x’s are all discrete. Through Mood’s Median and Levene’s tests, I’ve identified 7 x’s that appear to be statistically significant (p ≈ 0). Are there any tests I can run that will show whether any one x is more significant to the y than the others? Are there any tests that identify confounding among the x’s?

    #58998

    GLM
    Participant

    The more traditional approach is to run a General Linear Model with cycle time as the response and the x variables as either fixed or random (I assume they are random … Minitab may force you to input them as “model” rather than “random” … there is a difference, but I wouldn’t worry too much about it). The other approach is CHAID (Chi-squared Automatic Interaction Detection), which makes fewer assumptions and will give you optimal splits and combinations of your x variables in relation to cycle time.
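    To make the first suggestion concrete, here is a minimal pure-Python sketch of the underlying idea: for each discrete x, compare group means of the continuous y with a one-way ANOVA F-statistic, so the x’s can be ranked by how strongly they separate cycle time. The data and factor names below are invented for illustration; a real analysis would use the GLM routine in Minitab (or similar) on your own data.

    ```python
    # Hypothetical sketch: ranking discrete x's against a continuous y
    # (cycle time) by a one-way ANOVA F-statistic for each factor.
    # All data below are invented for illustration.
    from collections import defaultdict

    def anova_f(y, levels):
        """One-way ANOVA F-statistic of y grouped by the factor levels."""
        groups = defaultdict(list)
        for yi, li in zip(y, levels):
            groups[li].append(yi)
        n, k = len(y), len(groups)
        grand = sum(y) / n
        # between-group and within-group sums of squares
        ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2
                         for g in groups.values())
        ss_within = sum((yi - sum(g) / len(g)) ** 2
                        for g in groups.values() for yi in g)
        return (ss_between / (k - 1)) / (ss_within / (n - k))

    # Toy cycle-time data with two hypothetical discrete factors.
    cycle_time = [5.1, 4.9, 7.2, 7.5, 5.0, 7.1, 5.2, 7.4]
    region     = ["A", "A", "B", "B", "A", "B", "A", "B"]
    product    = ["X", "Y", "X", "Y", "Y", "X", "X", "Y"]

    for name, x in [("region", region), ("product", product)]:
        print(name, round(anova_f(cycle_time, x), 2))
    ```

    A much larger F for one factor than another suggests it explains more of the variation in y, though the GLM handles all factors jointly and is the better tool once the candidates are narrowed down.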

    #58999

    sue
    Member

    GLM only works when the y data is normal, correct?  Or can both of these approaches be used with non-normal data?

    #59000

    GLM
    Participant

    Sue,
    One of the assumptions is that the residuals are “multinomially normally distributed”. Violations of normality do not invalidate the regression equation; to what degree non-normality has an impact will depend on the goals of your analysis. For example, a bimodal distribution of your residuals may indicate that you omitted an important variable. If the residual plot shows a funnel shape, this may indicate that your prediction is more accurate in some ranges of y than in others. There are several criteria for determining the “goodness” of the equation. Non-normality of the residuals is one concern (most statisticians glance at it, determine whether it approximately holds, and move on). Explained variance and the width of confidence or prediction intervals are others.

    #59001

    GLM
    Participant

    Just for the record: drop the “multinomial” in the previous post; you are running a straightforward GLM.
    One more point about your concerns regarding normality. If you wanted to be 100% correct, you’d run a formal test for normality. However, Minitab does not do that for you; you’d have to correlate the residuals with their expected values under normality, which is time consuming if you calculate it by hand.
    Also, the four main things to review with regard to the residuals are: randomness of the residuals (runs test), constancy of variance (there are formal tests, but usually you look at the plot and see what’s going on), tests for outliers, and tests for normality (see above). There are remedial measures that you may have to look up in a regression textbook (they all have a section on this issue).
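    Two of the residual checks above can be sketched in plain Python: a runs count for randomness, and the normal-probability-plot correlation (sorted residuals against their expected values under normality). The residuals below are invented, and the Blom approximation used for the normal scores is one common choice, not the only one.

    ```python
    # Hedged sketch of two residual diagnostics: a runs count and the
    # normality correlation. Residual values are invented for illustration.
    from statistics import NormalDist, mean

    def runs_count(residuals):
        """Count runs of residuals above/below zero (many runs = no pattern)."""
        signs = [r >= 0 for r in residuals]
        return 1 + sum(a != b for a, b in zip(signs, signs[1:]))

    def normality_correlation(residuals):
        """Correlate sorted residuals with Blom-approximated normal scores."""
        n = len(residuals)
        srt = sorted(residuals)
        nd = NormalDist()
        expected = [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
        mx, my = mean(srt), mean(expected)
        cov = sum((a - mx) * (b - my) for a, b in zip(srt, expected))
        sx = sum((a - mx) ** 2 for a in srt) ** 0.5
        sy = sum((b - my) ** 2 for b in expected) ** 0.5
        return cov / (sx * sy)

    resid = [0.4, -0.2, 0.1, -0.5, 0.3, -0.1, 0.2, -0.3, 0.5, -0.4]
    print(runs_count(resid))
    print(round(normality_correlation(resid), 3))
    ```

    A normality correlation close to 1 suggests the residuals are approximately normal; published tables (e.g. in regression textbooks) give the critical values for a formal decision.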

    #59002

    sue
    Member

    Thank you. I will try both approaches. Is CHAID different from the chi-square test in Minitab? I could not find a reference to CHAID in the version of Minitab I have …

    #59003

    GLM
    Participant

    Sue,
    CHAID is a data mining tool, and yes, it is very different from the chi-square module in Minitab. CHAID creates optimal splits in your data that tell you which combinations of variables and categories interact with each other to optimally predict your response/outcome. The nice thing about the tool is that you can include potential cost or savings calculations, and the output is extremely easy to understand for non-statisticians. Managers love the charts because they are simple to read and don’t require knowledge of the mechanics behind the calculations. Up until the 1980s we used to do this by hand (of course, I wasn’t born yet :-). There is a stand-alone package called Answer Tree, offered by SPSS (see http://www.spss.com for details on cost and training materials).
    The tool is useful for exploring large data sets with many variables and cases. In a second step you can then develop a more precise model based on classical inferential techniques. It is especially useful in the financial services industry; BankOne, for example, uses it to mine its operations.
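    The core splitting idea behind tree tools like CHAID can be sketched in a few lines: for one discrete x, search the two-way groupings of its categories for the one that most reduces the variance of the continuous response. This toy version uses variance reduction rather than CHAID’s actual chi-square/F merging heuristics, and the data and names are invented.

    ```python
    # Toy illustration of the category-splitting idea behind tree tools
    # such as CHAID. Real CHAID merges categories with chi-square/F tests;
    # this sketch scores splits by variance (SSE) reduction instead.
    from itertools import combinations

    def sse(values):
        """Sum of squared deviations from the mean."""
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    def best_split(y, x):
        """Find the two-way grouping of x's categories that best splits y."""
        cats = sorted(set(x))
        best = None
        for r in range(1, len(cats)):
            for left in combinations(cats, r):
                y_left = [yi for yi, xi in zip(y, x) if xi in left]
                y_right = [yi for yi, xi in zip(y, x) if xi not in left]
                if not y_left or not y_right:
                    continue
                # how much the split reduces total variation in y
                score = sse(y) - (sse(y_left) + sse(y_right))
                if best is None or score > best[0]:
                    best = (score, set(left))
        return best

    # Invented cycle-time data with one hypothetical discrete factor.
    cycle_time = [5.0, 5.2, 9.1, 9.3, 5.1, 9.2]
    queue      = ["fast", "fast", "slow", "slow", "fast", "slow"]
    print(best_split(cycle_time, queue))
    ```

    A tree tool applies this kind of search recursively over all the x’s, which is why the resulting charts are so easy for non-statisticians to read: each branch is just “these categories go one way, those go the other.”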


The forum ‘Finance’ is closed to new topics and replies.