Non Parametric – Which Factor Has Greatest Influence



    Chris Richardson

    I am currently working on a problem where the results are expressed as % scrap, so they are not normally distributed. So far we have 9 potential hypotheses that have shown statistically significant differences between the groups when tested with Mood’s median test.
    How can I understand which hypothesis has the greatest influence, and whether there are interdependencies between each of the factors?


    Robert Butler

    When you say “we have 9 potential hypotheses that have shown statistically significant differences between the groups”, what exactly do you mean?

    a) It sounds like you are saying you have 9 possible variables which, in a univariate setting, have exhibited some kind of significant correlation with percent scrap.

    b) If this is the issue, and if by “interdependencies between each of the factors” you mean that they are not perfectly orthogonal, then there are various ways to attack your problem.

    One possibility:

    Given that a) and b) above are true, you would first want to test the X matrix (your factor matrix) to see if the variables of interest are sufficiently independent of one another to permit inclusion in a multivariable regression. To this end you would want to look at VIFs (variance inflation factors) and eigenvalues/condition indices. If the factors exhibit acceptable independence then you could run a standard regression analysis (backward elimination and stepwise) to check for variable significance.
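The VIF check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the data are made up, the helper names (`correlation_matrix`, `invert`, `vifs`) are hypothetical, and it relies on the standard identity that the VIF of the j-th variable equals the j-th diagonal element of the inverse correlation matrix of the X's. Condition indices are not covered here; they would need an eigenvalue routine.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix: perfect collinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vifs(columns):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

# Made-up data (an assumption for illustration): x2 nearly duplicates x1,
# while x3 is only loosely related to either.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.2, 7.9]
x3 = [3, 1, 4, 1, 5, 9, 2, 6]

result = vifs([x1, x2, x3])
for name, v in zip(["x1", "x2", "x3"], result):
    print(f"VIF({name}) = {v:.1f}")
```

A commonly quoted rule of thumb treats a VIF above roughly 10 as a sign that a variable is too entangled with the others to be included alongside them; in this made-up data x1 and x2 would be flagged and x3 would not.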

    If, on the other hand a) and/or b) are not true then if you could provide additional details perhaps I or someone else could offer some additional thoughts.


    Chris Richardson

    Hi Robert,

    Thanks for your reply. I have a % scrap value and 9 potential root causes, all of which are attribute-type values. The % scrap values are not normally distributed, so I have carried out Mood’s median tests on each of the attributes, most of which return statistically significant differences between the groups. I’d like to understand which of the 9 attributes have the greatest impact on the % scrap figures, to identify where to start the improvements. I’d also like to understand if there are any interdependencies between the 9 parameters.
    I am unable to carry out any regression analysis as the 9 factors are attributes. If the factors were variable then I could use the R² value to understand which of the factors has the greatest influence on the % scrap.
    As you can probably tell, this stats is fairly new to me. I completed a Six Sigma course 8 years ago and have not used my training until recently.
    I appreciate any help you can provide.


    Robert Butler

    It would appear that what you are dealing with is an understanding of statistical methods at the intro course boilerplate level. It’s ok as far as it goes but your problem has pushed way past whatever you learned there.

    Additional Information:

    1. When discussing regression, non-normality is not an issue as far as the Y’s or the X’s are concerned. In regression, the only place normality enters the discussion is when talking about the residuals.

    2. In regression X variables may be continuous, ordinal, or nominal (attribute).

    The way to proceed is as follows:

    1. Take your attribute measures and code them using dummy variables.
    Example: I have an attribute, value, with three levels: poor, acceptable, excellent. In order to include value in the regression we make up two dummy variables, dumbvalue1 and dumbvalue2,
    which take on the following levels:

    poor: dumbvalue1 = 0, dumbvalue2 = 0
    acceptable: dumbvalue1 = 1, dumbvalue2 = 0
    excellent: dumbvalue1 = 0, dumbvalue2 = 1

    and the model we fit is response = fn(dumbvalue1, dumbvalue2)
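The coding scheme above can be sketched as a small helper. The function name `dummy_code` and the sample observations are assumptions made for illustration; the first level listed is treated as the reference level (all zeros), matching the "poor" row in the example.

```python
def dummy_code(values, levels):
    """Code a categorical attribute as len(levels) - 1 indicator columns.

    The first entry of `levels` is the reference level and is coded as all zeros.
    """
    others = levels[1:]
    rows = []
    for v in values:
        if v not in levels:
            raise ValueError(f"unknown level: {v!r}")
        rows.append([1 if v == level else 0 for level in others])
    return rows

# Hypothetical observations of the "value" attribute.
levels = ["poor", "acceptable", "excellent"]
observed = ["poor", "excellent", "acceptable", "poor"]

coded = dummy_code(observed, levels)
for label, (dumbvalue1, dumbvalue2) in zip(observed, coded):
    print(f"{label}: dumbvalue1 = {dumbvalue1}, dumbvalue2 = {dumbvalue2}")
```

An attribute with k levels needs only k - 1 dummy variables, because the reference level is already identified by all of them being zero.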

    If you code your attributes in this manner what you will then need to do is assess the X matrix of the dummy variables in the manner outlined in my first post. Once you know what you can and cannot include in the multivariable analysis you can run the regression analysis as mentioned before.

    With dummy variables you cannot look for curvilinear terms (dumbvalue1*dumbvalue1), since the square of a 0/1 variable is just the variable itself, but you can look for interactions between the dummy value terms for DIFFERENT attributes in the model. Thus if you had dummy values for “value” and you had another variable “boss happiness” you could look at the interactions of the dummy values for these two (assuming the test of the X matrix indicated the interactions were acceptably independent of the other terms in the multivariable study).
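A minimal sketch of both points, using made-up coded data: squaring a 0/1 dummy reproduces the same column, so it adds nothing to the model, while products of dummies from different attributes give genuinely new interaction columns. The helper name `interaction_columns` and the one-dummy coding of "boss happiness" are assumptions for illustration.

```python
from itertools import product

def interaction_columns(dummies_a, dummies_b):
    """Row-wise products of every dummy of attribute A with every dummy of attribute B."""
    return [[a * b for a, b in product(row_a, row_b)]
            for row_a, row_b in zip(dummies_a, dummies_b)]

# Hypothetical coded data: two dummies for "value" (dumbvalue1, dumbvalue2)
# and a single dummy for a two-level "boss happiness" attribute.
value_dummies = [[0, 0], [1, 0], [0, 1], [0, 1]]
boss_dummies = [[1], [1], [0], [1]]

# Squared terms are identical to the original columns, so they carry no new information.
squares = [[d * d for d in row] for row in value_dummies]
print(squares == value_dummies)

# Interaction terms between the two different attributes.
interactions = interaction_columns(value_dummies, boss_dummies)
for row in interactions:
    print(row)
```

The interaction columns would then go through the same X matrix checks as the main-effect dummies before being offered to the regression.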

    As for using R² to understand which of the factors might have the greatest influence on % scrap – don’t. Run a regression analysis (as opposed to just letting the machine run a regression) and assess the final results by examining the residuals and looking at prediction error.
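As a rough sketch of what examining residuals and prediction error can look like, the example below fits a least-squares model on one dummy-coded attribute, computes the residuals, and estimates prediction error by leave-one-out refitting. The scrap figures are made up, the helper names are hypothetical, and leave-one-out is only one assumed way of measuring prediction error; it is not something the discussion above prescribes.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, row):
    """Predicted response for one row of dummy values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

# Made-up data: (dumbvalue1, dumbvalue2) per observation and the % scrap response.
rows = [[0, 0], [0, 0], [0, 0], [1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
scrap = [8.0, 9.5, 7.5, 5.0, 6.5, 5.5, 2.0, 3.5, 2.5]

coefs = fit_ols(rows, scrap)
residuals = [yi - predict(coefs, r) for r, yi in zip(rows, scrap)]

# Leave-one-out: refit without observation i, then predict observation i.
loo_errors = []
for i in range(len(scrap)):
    c = fit_ols(rows[:i] + rows[i + 1:], scrap[:i] + scrap[i + 1:])
    loo_errors.append(scrap[i] - predict(c, rows[i]))
loo_rmse = (sum(e * e for e in loo_errors) / len(loo_errors)) ** 0.5

print("coefficients:", [round(c, 3) for c in coefs])
print("residuals:", [round(e, 3) for e in residuals])
print("leave-one-out RMSE:", round(loo_rmse, 3))
```

In practice the residuals would be plotted against the fitted values and each factor to look for patterns, and a leave-one-out error much larger than the in-sample residual spread would suggest the model is overfitting.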

    Before trying any of this I’d recommend a little review of regression methods. Get a copy of Regression Analysis by Example by Chatterjee and Price through inter-library loan and read the chapters on Qualitative Variables as Regressors (this is all about dummy variables and their construction), and Analysis of Collinear Data (this is about assessing the X matrix). Given that you have been away from the stats for 8 years I’d also recommend reviewing the chapters on Simple Linear Regression and Detection and Correction of Model Violations: Simple Linear Regression.

    The book is well written and, because the examples include the data, you can put the data on a spreadsheet and run the analysis while following the discussion in the text.
