# ANOVA for count data

Six Sigma – iSixSigma › Forums › Old Forums › General › ANOVA for count data

- This topic has 16 replies, 7 voices, and was last updated 12 years, 8 months ago by Craig.

- June 19, 2007 at 7:40 pm #47315

Fonseca (@Marcelo)

I need to perform a two-way ANOVA, but my response data are counts (number of defects). Even if the ANOVA assumptions are satisfied, is there any risk in using it with count data? Could you tell me whether there is a better statistical procedure to use in this case?

Thank you,

Marcelo

June 19, 2007 at 7:51 pm #157685

Use ANOVA with a continuous response variable Y and more than two levels for a discrete X. I don't believe ANOVA is correct here; I believe you are looking for something more in line with a chi-square test. Good luck.
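The chi-square test suggested here can be computed from first principles. A minimal Python sketch; the defect table below is invented for illustration (rows and columns standing in for two hypothetical factor levels), since the thread shows no data:

```python
# Pearson chi-square test of independence on an r x c table of counts.
# Expected cell counts come from the row/column margins under independence.

def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical defect counts: rows = machine, columns = defect type.
counts = [[12, 5, 8],
          [7, 9, 4],
          [3, 6, 11]]
stat, df = chi_square_independence(counts)
print(f"chi-square = {stat:.2f} on {df} df")
```

Note that this only tests independence of the two classifications; as the next posts point out, it does not model interactions the way a factorial analysis does.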

June 19, 2007 at 8:26 pm #157688

Fonseca (@Marcelo)

Thanks.

In fact, I need to study possible interactions, which I cannot do with a chi-square test. I have more than two levels for each of the two factors.

June 19, 2007 at 9:21 pm #157694

Contingency tables

Do some research on categorical data analysis, in particular three-way contingency tables (to start with). The Cochran-Mantel-Haenszel test (for 2×2×k tables) may work for you. Otherwise, look at logit models. Finally, loglinear models for contingency tables are also an option. Two-factor interaction terms are fine to analyze with these tools; with three-factor interactions it becomes complicated.
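The Cochran-Mantel-Haenszel statistic mentioned above is small enough to compute by hand for a 2×2×k table. A sketch with made-up strata; compare the result against the chi-square(1df) critical value of 3.84:

```python
# Cochran-Mantel-Haenszel chi-square statistic (1 df, with continuity
# correction) for k strata of 2x2 tables. Each stratum is ([a, b], [c, d]).

def cmh_statistic(strata):
    """CMH chi-square across strata; large values reject no association."""
    sum_a = sum_e = sum_v = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        sum_a += a
        sum_e += (a + b) * (a + c) / n                       # E[a] under H0
        sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    diff = max(abs(sum_a - sum_e) - 0.5, 0.0)                # continuity correction
    return diff * diff / sum_v

# Two hypothetical strata showing a strong, consistent association:
strata = [([30, 10], [10, 30]),
          ([30, 10], [10, 30])]
print(f"CMH = {cmh_statistic(strata):.2f}")  # compare to 3.84 (chi-square, 1 df, alpha = 0.05)
```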

June 19, 2007 at 9:40 pm #157696

Fonseca (@Marcelo)

Thank you. This is a completely new procedure to me. I am not sure I can find a statistical package that supports it, but I will certainly try.

June 19, 2007 at 11:40 pm #157701

I have seen ANOVA used on Likert data (a rating scale from 1 to 7, for example). I once ran a four-factor DOE where the response was the number of solder balls on the surface of a PCB (near a through-hole connector). The DOE software used was Design-Expert. All model adequacy checks were satisfied in terms of residuals analysis, etc.

My humble opinion is that it is safe to use ANOVA for your data as long as you satisfy the conditions of normally distributed residuals, etc.
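The balanced two-way ANOVA being discussed can be computed directly from the textbook sums-of-squares formulas. A minimal sketch with invented defect counts (a 2×3 design with two replicates per cell); all effects are treated as fixed and tested against the error mean square:

```python
# Balanced two-way ANOVA with interaction, from the standard
# sums-of-squares decomposition. The defect counts are made up;
# the point is the mechanics the posters are discussing.

def two_way_anova(y):
    """y[i][j] is a list of n replicate responses for cell (i, j)."""
    a, b, n = len(y), len(y[0]), len(y[0][0])
    grand = sum(x for row in y for cell in row for x in cell) / (a * b * n)
    mean_a = [sum(x for cell in y[i] for x in cell) / (b * n) for i in range(a)]
    mean_b = [sum(x for i in range(a) for x in y[i][j]) / (a * n) for j in range(b)]
    mean_ab = [[sum(y[i][j]) / n for j in range(b)] for i in range(a)]

    ss_a = b * n * sum((m - grand) ** 2 for m in mean_a)
    ss_b = a * n * sum((m - grand) ** 2 for m in mean_b)
    ss_ab = n * sum((mean_ab[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_e = sum((x - mean_ab[i][j]) ** 2
               for i in range(a) for j in range(b) for x in y[i][j])

    ms_e = ss_e / (a * b * (n - 1))   # error mean square (fixed-effects tests)
    return {"F_A": (ss_a / (a - 1)) / ms_e,
            "F_B": (ss_b / (b - 1)) / ms_e,
            "F_AB": (ss_ab / ((a - 1) * (b - 1))) / ms_e}

# 2 x 3 design, 2 replicate defect counts per cell (hypothetical):
y = [[[3, 5], [8, 6], [12, 10]],
     [[4, 2], [7, 9], [15, 13]]]
print(two_way_anova(y))
```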

June 20, 2007 at 3:16 am #157706

Contingency table

Marcelo,

It would be helpful to see the first 5 or 6 rows of your data, including the columns. Also, with two-way ANOVA you may run into problems if you have an unbalanced design: Minitab's Two-Way ANOVA option will give an error message if the design is not balanced.

HACL,

Likert scale data has an underlying scale; in this respect it is not count data. You can assume the underlying scale is ordinal or interval; there are arguments for both. Also, a true Likert scale is more than just the 1-through-7 response categories: Likert scales need to be validated and require scaling procedures. These are, however, rarely discussed, so we are now taught (even in PhD-level methodology classes) that the response options 1 through 7 are what makes a Likert scale a "scale". Not a big deal, but it should at least be mentioned.

June 20, 2007 at 3:24 am #157707

Contingency table

Marcelo,

One more comment: if you run your data as an ANOVA, also take into consideration whether you are dealing with fixed or random factors. This may make a difference in your critical F-value and the subsequent p-value calculation.

June 20, 2007 at 6:32 am #157713

Fake Gary Alert

Please guide me on how to calculate the p-value. Thanks.

June 20, 2007 at 6:34 am #157714

Fake Gary Alert

Do you suggest calculating the figures in the ANOVA table by hand for better understanding, or just using Minitab?

June 20, 2007 at 2:18 pm #157734

Fonseca (@Marcelo)

Thank you, "Contingency".

I will use one of my factors as a random factor.
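In the balanced mixed-model case, what a random factor changes is the denominator of the F-ratio for the fixed factor: it is tested against the interaction mean square rather than the error mean square. A toy illustration with made-up mean squares:

```python
# How the F-ratio for fixed factor A changes when factor B is random
# (balanced mixed model). The mean squares below are hypothetical.
ms_a, ms_ab, ms_e = 48.0, 12.0, 4.0

f_both_fixed = ms_a / ms_e    # both factors fixed: test A against error
f_b_random = ms_a / ms_ab     # B random: test A against the A x B mean square

print(f"F(A), B fixed:  {f_both_fixed}")
print(f"F(A), B random: {f_b_random}")
```

With these numbers the same factor looks far less significant under the mixed model (F drops from 12 to 4), which is exactly why the fixed/random distinction matters for the p-value.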

June 20, 2007 at 7:00 pm #157747

Contingency Table,

Thanks for the input. I suppose I was illustrating examples of using non-continuous data (counts, Likert scales, etc.).

Your point about the Likert scales sounds interesting. If you could, elaborate on it using this example:

I want to do a DOE where "discoloration" is my response. If I classify the levels of discoloration from 1 to 7, where 1 is perfect and 7 is worst case, is this ordinal? If I can safely do ANOVA with this response, can I also do ANOVA on count data, which might likewise be classified as ordinal (the more defects, the worse it is)?

Thanks,

HACL

June 20, 2007 at 9:08 pm #157755

Contingency table

HACL,

In a nutshell, with "Likert scale type" data you sample two things: response items out of a universe of items characterizing your response domain, and responses from respondents. Example: satisfaction with customer service. You sample from the domain of satisfaction by asking about rep knowledge, courtesy, access, etc. These responses should add up to a scale that measures "satisfaction with customer service". This is the true scale that you target. The responses 1 through 7 are just response options for an item that is part of the scale. As a result, it is not the 1-through-7 response options but the responses on the combined scale that are ordinal or interval. There are various procedures, including Coombs's methods, item response theory, etc., to establish the scale. Based on these methods you determine what scale level you have and how you construct the scale (summation of responses, average of responses, etc.).

In your practical case, you will assume that the one item represents the domain adequately and that the responses 1 through 7 are equidistant, or at least rank ordered. As you can imagine, measuring one item to represent a whole response domain is risky business, but that is what practitioners do. Nevertheless, Likert scaling is one of the most misunderstood ideas in Six Sigma, because it involves psychometric measurement theory and differs from classical operational definitions and the subsequent Gage R&R study, which derive from a whole measurement theory (operationism) in itself. I hope this helps.

June 20, 2007 at 9:33 pm #157759

Contingency table (P.S.)

Likert used the internal consistency method to determine whether items truly represented a scale. He then used the summation of the scale items (summated ratings) to come up with the overall scale. The subsequent statistical analysis was performed on the summated response of the survey scale, which could then be converted into z-scores. When the responses are normally distributed and can be converted into z-scores, the conditions of an interval scale are satisfied and parametric techniques like ANOVA can be applied. Minitab 15.0 finally included internal consistency analysis in its statistics menu (the method goes back to the 1920s ... go figure :-).
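Internal consistency is commonly summarized by Cronbach's alpha, which needs only the item variances and the variance of the summated ratings. A pure-Python sketch with invented Likert responses (rows are respondents, columns are items):

```python
# Cronbach's alpha for a set of Likert items. Alpha near 1 means the
# items respond consistently and plausibly measure one underlying scale.

def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(rows):
    """rows[r][i] = response of respondent r to item i."""
    k = len(rows[0])                       # number of items
    items = list(zip(*rows))               # column-wise item scores
    totals = [sum(r) for r in rows]        # summated rating per respondent
    return k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))

# Hypothetical responses: 5 respondents, 4 items on a 1-5 scale.
responses = [[5, 4, 5, 4],
             [3, 3, 2, 3],
             [4, 4, 4, 5],
             [2, 1, 2, 2],
             [5, 5, 4, 4]]
print(f"alpha = {cronbach_alpha(responses):.3f}")
```

Because these invented respondents answer all four items similarly, alpha comes out high; shuffling any one column independently would drive it down.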

June 20, 2007 at 10:24 pm #157763

Contingency Table,

Very informative post once again. I agree that the Likert concept is not fully understood, as evidenced by my posts! I googled this topic and found a paper that describes the dimensionality of scales. Unidimensional scales are things like height, weight, etc. The article also speaks of multidimensional models like intelligence (math and verbal, as an example). It seems like fields of study such as psychology and sociology would use REAL Likert scales.

Although this multidimensional scaling might work well for surveys, intelligence models, etc., I haven't seen much application in manufacturing. I typically treat responses independently and use the DOE software to optimize them simultaneously. (Each response might have its own desired state, such as maximize, minimize, or hit a target.) If I have a process with three key outputs, like average etch rate, etch uniformity across the wafer, and staining, it seems less effective to co-mingle these into one response when each has its own desired state.

In this case I would use three separate responses, two of which are continuous and one of which is a unidimensional scale.

HACL
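The simultaneous optimization described here is commonly done in DOE packages with desirability functions: each response is mapped onto [0, 1] according to its own goal, and the composite is their geometric mean. A minimal sketch; the responses, limits, and targets below are all made up:

```python
# Desirability functions for multi-response optimization: each response
# gets its own goal (maximize, minimize, or hit a target), and the
# composite desirability is the geometric mean of the individual ones.

def d_maximize(y, low, high):
    """1 at/above high, 0 at/below low, linear in between."""
    return min(max((y - low) / (high - low), 0.0), 1.0)

def d_minimize(y, low, high):
    """1 at/below low, 0 at/above high, linear in between."""
    return min(max((high - y) / (high - low), 0.0), 1.0)

def d_target(y, low, target, high):
    """1 at the target, falling linearly to 0 at either limit."""
    if y <= low or y >= high:
        return 0.0
    return (y - low) / (target - low) if y <= target else (high - y) / (high - target)

def composite(ds):
    """Geometric mean; any single zero desirability zeroes the whole run."""
    prod = 1.0
    for d in ds:
        prod *= d
    return prod ** (1.0 / len(ds))

# Hypothetical run: etch rate (maximize), non-uniformity (minimize),
# film thickness (hit a target of 100):
ds = [d_maximize(52.0, 40.0, 60.0),
      d_minimize(3.0, 1.0, 5.0),
      d_target(101.0, 95.0, 100.0, 105.0)]
print(f"composite desirability = {composite(ds):.3f}")
```

The geometric mean enforces exactly the behavior HACL wants from separate responses: a run that fails badly on any one output scores poorly overall, no matter how good the other outputs are.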

June 21, 2007 at 12:07 am #157765

Contingency table

HACL,

You are correct. In your manufacturing case you don't have an underlying dimension (such as intelligence, i.e., a hypothetical construct) but a tangible object. In this case operational definitions apply (your three items/"dimensions"), and you need to ensure that whatever you measure meets the gage criteria (reproducibility/repeatability). The internal consistency measures are the survey-research equivalent of reproducibility: by having consistent responses to, say, 10 similar questions, you "test out" whether the respondent is consistent in the responses or, in engineering terms, reproduces the response. The dimensionality of the test (math/verbal, etc.) is established via a factor analysis. As a result, a survey should first be factor analyzed, and the factor dimensions should then be subjected to an internal consistency analysis. The summated scores of the factors (you could even use the summated factor scores) are what goes into a GLM as the response variable. The factor analysis also tells you whether the underlying construct (verbal/math intelligence, etc.) exists. Here it may become tautological, because the dimension is defined by the factor; there are additional tools, such as validity matrices, that determine to what degree your definitions are circular. Key point: when it comes to survey research, get a PhD involved; the Black Belt education, as good as it is, is inadequate for those purposes AND is based on a different measurement model. Good luck!

June 21, 2007 at 4:41 am #157771

Thanks for the insight. Very much appreciated!


The forum ‘General’ is closed to new topics and replies.