
ANOVA for count data


Viewing 17 posts - 1 through 17 (of 17 total)
    #47315

    Fonseca
    Participant

    I need to perform a two-way ANOVA, but my response data are counts (number of defects). Even if the ANOVA assumptions are satisfied, is there any risk in using it with count data? Could you tell me if there is a better statistical procedure to use in this case?
    Thank you,
    Marcelo

    #157685

    Snow
    Participant

    Use ANOVA with a continuous response variable Y and more than two levels of a discrete X. I don't believe ANOVA is correct here; I believe you are looking for something more in line with a chi-square test. Good luck.
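    For illustration, a minimal sketch of that chi-square test of independence in Python (the defect counts below are invented, and the 3x3 layout is only an assumption about the two factors):

        # Chi-square test of independence on a two-way table of defect counts
        import numpy as np
        from scipy.stats import chi2_contingency

        # Rows: levels of factor A; columns: levels of factor B (made-up counts)
        counts = np.array([
            [12,  7, 20],
            [ 5, 14,  9],
            [18,  6, 11],
        ])

        chi2, p, dof, expected = chi2_contingency(counts)
        print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")

    Note that this tests the overall association between the two factors; it does not separate main effects from an interaction the way ANOVA does.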

    #157688

    Fonseca
    Participant

    Thanks.
    In fact, I need to study possible interactions, which I cannot do with a chi-square test. I have more than two levels for each of the two factors.

    #157694

    Contingency tables
    Participant

    Do some research on categorical data analysis, in particular three-way contingency tables (to start with). The Cochran-Mantel-Haenszel test (for 2x2xk tables) may work for you. Otherwise, look into logit models. Finally, loglinear models for contingency tables are also an option. Two-factor interaction terms are fine to analyze with these tools; with three-factor interactions it becomes complicated.
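    As a sketch of the loglinear route (with invented counts for a 3x3 table), the model can be fit as a Poisson GLM in Python's statsmodels, and the two-factor interaction tested with a likelihood-ratio test:

        import pandas as pd
        import statsmodels.api as sm
        import statsmodels.formula.api as smf
        from scipy.stats import chi2

        # One row per cell of a 3x3 contingency table (counts are invented)
        df = pd.DataFrame({
            "A":     ["a1"] * 3 + ["a2"] * 3 + ["a3"] * 3,
            "B":     ["b1", "b2", "b3"] * 3,
            "count": [12, 7, 20, 5, 14, 9, 18, 6, 11],
        })

        # Saturated model (main effects + interaction) vs. main effects only
        full    = smf.glm("count ~ C(A) * C(B)", data=df, family=sm.families.Poisson()).fit()
        reduced = smf.glm("count ~ C(A) + C(B)", data=df, family=sm.families.Poisson()).fit()

        # Likelihood-ratio test of the A:B interaction
        lr = 2 * (full.llf - reduced.llf)
        df_diff = reduced.df_resid - full.df_resid
        print(f"LR = {lr:.2f}, df = {df_diff}, p = {chi2.sf(lr, df_diff):.4f}")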

    #157696

    Fonseca
    Participant

    Thank you. This is a completely new procedure to me. I am not sure whether I can find a statistical package that supports it, but I will certainly try.

    #157701

    Craig
    Participant

    I have seen ANOVA used on Likert data (a rating scale from 1 to 7, for example). I once ran a four-factor DOE where the response was the number of solder balls on the surface of a PCB (near a through-hole connector). The DOE software used was Design Expert. All model adequacy checks were satisfied in terms of residuals analysis, etc.
    My humble opinion is that it is safe to use ANOVA for your data as long as you satisfy the conditions, such as normally distributed residuals.

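    A minimal sketch of that approach in Python (invented defect counts, two replicates per cell so the interaction is testable): fit the two-way ANOVA, then check the residuals before trusting the p-values.

        import pandas as pd
        import statsmodels.api as sm
        import statsmodels.formula.api as smf
        from scipy.stats import shapiro

        # 3x3 design with 2 replicates per cell (made-up counts)
        df = pd.DataFrame({
            "A": ["a1"] * 6 + ["a2"] * 6 + ["a3"] * 6,
            "B": ["b1", "b1", "b2", "b2", "b3", "b3"] * 3,
            "defects": [12, 10, 7, 9, 20, 18, 5, 6, 14, 12, 9, 8,
                        18, 16, 6, 7, 11, 13],
        })

        fit = smf.ols("defects ~ C(A) * C(B)", data=df).fit()
        print(sm.stats.anova_lm(fit, typ=2))   # ANOVA table with p-values

        # Model adequacy: test residuals for normality (plots help too)
        print(shapiro(fit.resid))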
    #157706

    Contingency table
    Participant

    Marcelo,
    It would be helpful to see the first 5 or 6 rows of your data, including the columns. Also, with two-way ANOVA you may run into problems if you have an unbalanced design: you'll get an error message in Minitab if you use that option and the design is not balanced.
    HACL,
    Likert-scale data have an underlying scale; in this respect they are not count data. You can assume that the underlying scale is ordinal or interval; there are arguments for both. Also, a true Likert scale is more than just the 1-through-7 response categories: Likert scales need to be validated and require scaling procedures. Those procedures, however, are never discussed, so we are now taught (even in PhD-level methodology classes) that the response options 1 through 7 are what makes a Likert scale a “scale.” Not a big deal, but at least it should be mentioned.

    #157707

    Contingency table
    Participant

    Marcelo,
    One more comment: if you run your data as an ANOVA, make sure to also consider whether you are dealing with fixed or random factors. This can make a difference in the critical F-value and the subsequent p-value calculation.
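    In the classical two-way mixed ANOVA, the F-test for the fixed factor uses the interaction mean square (not the error mean square) as its denominator, which is why the p-value changes. A modern equivalent, sketched here with invented data and factor B assumed random, is a mixed model:

        import pandas as pd
        import statsmodels.formula.api as smf

        df = pd.DataFrame({
            "A": ["a1", "a2", "a3"] * 6,
            "B": ["b1"] * 6 + ["b2"] * 6 + ["b3"] * 6,
            "defects": [12, 5, 18, 10, 6, 16, 7, 14, 6, 9, 12, 7,
                        20, 9, 11, 18, 8, 13],
        })

        # A fixed, B random (random intercept for each level of B)
        mixed = smf.mixedlm("defects ~ C(A)", data=df, groups=df["B"]).fit()
        print(mixed.summary())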

    #157713

    Fake Gary Alert
    Participant

    Please guide me on how to calculate the p-value. Thanks.
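    The p-value in an ANOVA table is the upper-tail area of the F distribution at the observed F statistic. A one-line check in Python (the F value and degrees of freedom below are made up):

        from scipy.stats import f

        F_obs, df_num, df_den = 4.26, 2, 15      # example values
        p_value = f.sf(F_obs, df_num, df_den)    # survival function = 1 - CDF
        print(f"p = {p_value:.4f}")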

    #157714

    Fake Gary Alert
    Participant

    What do you suggest: calculating the figures in the ANOVA table by hand for better understanding, or just using Minitab?
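    Working the figures by hand once is worth it. A minimal one-way sketch (invented data) that builds the sums of squares and F statistic, then checks them against scipy:

        import numpy as np
        from scipy.stats import f_oneway

        groups = [np.array([12, 10, 9]), np.array([5, 6, 8]), np.array([18, 16, 17])]

        grand = np.concatenate(groups).mean()
        ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
        ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
        df_b = len(groups) - 1
        df_w = sum(len(g) for g in groups) - len(groups)
        F = (ss_between / df_b) / (ss_within / df_w)

        print(F)
        print(f_oneway(*groups))   # F statistic and p-value should match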

    #157734

    Fonseca
    Participant

    Thank you, “Contingency”.
    I will use one of my factors as a random factor.

    #157747

    Craig
    Participant

    Contingency Table,
    Thanks for the inputs. I suppose I was illustrating examples of using non-continuous data (counts, Likert scales, etc.).
    Your point about Likert scales sounds interesting. If you could, elaborate on it using this example:
    I want to do a DOE where “discoloration” is my response. If I classify the levels of discoloration from 1 to 7, where 1 is perfect and 7 is worst case, is this ordinal? If I can safely do ANOVA with this response, can I also do ANOVA on count data, which might also be classified as ordinal? (The more defects, the worse it is.)
    Thanks,
    HACL

    #157755

    Contingency table
    Participant

    HACL,
    In a nutshell, with Likert-scale-type data you sample two things: response items out of a universe of items characterizing your response domain, and responses from respondents. Example: satisfaction with customer service. You sample out of the satisfaction domain by asking about rep knowledge, courtesy, access, etc. These responses should add up to a scale that measures “satisfaction with customer service.” That is the true scale you are targeting; the responses 1 through 7 are just response options for one item that is part of the scale. As a result, it is not the 1-through-7 response options but the responses on the combined scale that are ordinal or interval. There are various procedures, including Coombs's methods and item response theory, to establish the scale; based on these you determine what scale level you have and how you form the scale (summation of responses, averaged responses, etc.).
    In your practical case, you will assume that the one item represents the domain adequately and that the responses 1 through 7 are equidistant, or at least rank-ordered. As you can imagine, measuring one item to represent a whole response domain is risky business, but that is what practitioners do. Nevertheless, Likert scaling is one of the most misunderstood ideas in Six Sigma, because it involves psychometric measurement theory and differs from the classical operational definitions and subsequent Gage R&R study, which derive from a whole measurement theory (operationism) of their own. I hope this helps.

    #157759

    Contingency table P.S.
    Participant

    Likert used the internal-consistency method to determine whether items truly represented a scale. He then used the summation of the scale items (summated ratings) to come up with the overall scale. The subsequent statistical analysis was performed on the summated response of the survey scale, which could then be converted to z-scores. When the responses are normally distributed and can be converted into z-scores, the conditions of an interval scale are satisfied and parametric techniques like ANOVA can be applied. Minitab 15.0 finally included internal-consistency analysis in its statistics menu (the method was developed back in the 1920s … go figure :-).
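    For illustration, internal consistency is usually reported today as Cronbach's alpha, which is simple to compute by hand; the 5-respondent, 4-item responses below are invented:

        import numpy as np

        # Rows = respondents, columns = items of one scale (1-7 responses)
        items = np.array([
            [5, 4, 5, 4],
            [3, 3, 2, 3],
            [4, 5, 4, 4],
            [2, 2, 3, 2],
            [5, 5, 4, 5],
        ])

        k = items.shape[1]
        item_var_sum = items.var(axis=0, ddof=1).sum()
        total_var = items.sum(axis=1).var(ddof=1)
        alpha = k / (k - 1) * (1 - item_var_sum / total_var)
        print(f"Cronbach's alpha = {alpha:.3f}")

        # Subsequent analysis then runs on the summated ratings:
        summated = items.sum(axis=1)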

    #157763

    Craig
    Participant

    Contingency Table,
    Very informative post once again. I agree that the Likert concept is not fully understood, as evidenced by my posts! I googled this topic and found a paper that describes the dimensionality of scales. Uni-dimensional scales are things like height, weight, etc.; the article also speaks of multidimensional models like intelligence (math and verbal, for example). It seems that fields of study such as psychology and sociology would use REAL Likert scales.
    Although this multidimensional scaling might work well for surveys, intelligence models, etc., I haven't seen much application of it in manufacturing. I typically treat responses independently and use the DOE software to optimize them simultaneously. (Each response might have its own desired state, such as maximize, minimize, or hit a target.) If I have a process with three key outputs, like average etch rate, etch uniformity across the wafer, and staining, it seems less effective to co-mingle these into one response when each has its own desired state.
    In this case I would use three separate responses, two of which are continuous and one of which is a uni-dimensional scale.
    HACL

    #157765

    Contingency table
    Participant

    HACL,
    You are correct. In your manufacturing case you don't have an underlying dimension (such as intelligence, i.e., a hypothetical construct) but a tangible object. Here the operational definitions apply (your three items/“dimensions”), and you need to ensure that whatever you measure meets the gage criteria (reproducibility/repeatability). The internal-consistency measures are the survey-research equivalent of reproducibility: by collecting responses to, say, 10 similar questions, you “test out” whether the respondent is consistent in the responses or, in engineering terms, reproduces the response. The dimensionality of the test (math/verbal, etc.) is established via factor analysis. As a result, a survey should first be factor-analyzed, and the factor dimensions should then be subjected to an internal-consistency analysis. The summated scores of the factors (you could even use the summated factor scores) are what go into a GLM as the response variable. The factor analysis also tells you whether the underlying construct (verbal/math intelligence, etc.) exists; here it can become tautological, because the dimension is defined by the factor. There are additional tools, such as validity matrices, that determine to what degree your definitions are circular. Key point: when it comes to survey research, get a PhD involved. The Black Belt education, as good as it is, is inadequate for those purposes AND is based on a different measurement model. Good luck!
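    A small sketch of that pipeline (factor-analyze, inspect loadings, carry scores forward), using scikit-learn and a simulated survey in which four items load on two latent dimensions:

        import numpy as np
        from sklearn.decomposition import FactorAnalysis

        rng = np.random.default_rng(1)
        n = 200
        verbal = rng.normal(size=n)     # latent dimension 1
        math = rng.normal(size=n)       # latent dimension 2
        X = np.column_stack([
            verbal + rng.normal(0, 0.5, n),   # items meant to load on factor 1
            verbal + rng.normal(0, 0.5, n),
            math + rng.normal(0, 0.5, n),     # items meant to load on factor 2
            math + rng.normal(0, 0.5, n),
        ])

        fa = FactorAnalysis(n_components=2).fit(X)
        print(fa.components_)      # loadings: which items group together
        scores = fa.transform(X)   # factor scores, usable as a GLM response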

    #157771

    Craig
    Participant

    Thanks for the insight. Very much appreciated!

