# Help with Survey Data


Viewing 12 posts - 1 through 12 (of 12 total)
#54777

Ken
Guest

I need some help.

We send surveys out to customers. The number of surveys sent to a particular population is based on product type. For larger product types, we get a greater number of phone calls so they get a greater number of transaction-based surveys.

Example:

Product A = 2941 surveys sent; 544 responses received
Product B = 118 surveys sent; 10 responses received

The surveys use a 5-point scale. So, my problem is this: if Product B gets a few low scores, its overall average is affected much more than Product A's would be, because Product A has far more responses.

I usually just use the total population from all products (I would combine populations from Product A and Product B; likewise, I would combine the responses).

But when my boss wants me to break them out by product to compare, how do I do that fairly for the product with the smaller population? Is there some weighting I can do?

Any suggestions?

Thanks!

#197088

Chris Seider
Participant

I’m not a big fan of quoting averages of 1-5 scales, but you could, provided you also state a confidence interval. The confidence interval will be much wider for Product B.
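A short sketch of that point, using made-up Likert responses and a plain normal-approximation interval (a t-multiplier would be more exact at n = 10, but the contrast in widths is the same):

```python
import math
from statistics import mean, stdev

def mean_ci(scores, z=1.96):
    """Normal-approximation 95% confidence interval for the mean."""
    m = mean(scores)
    half = z * stdev(scores) / math.sqrt(len(scores))
    return m - half, m + half

# Hypothetical 1-5 responses: same spread, very different sample sizes
base = [4, 5, 3, 4, 4, 5, 2, 4, 3, 5]
product_a = base * 54 + [4, 5, 3, 4]   # 544 responses
product_b = base                       # 10 responses

lo_a, hi_a = mean_ci(product_a)
lo_b, hi_b = mean_ci(product_b)
print(f"Product A 95% CI width: {hi_a - lo_a:.2f}")
print(f"Product B 95% CI width: {hi_b - lo_b:.2f}")
```

With the same spread of scores, the width scales as 1/sqrt(n), so Product B's interval comes out roughly seven times wider here.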

#197089

Ken
Guest

Chris:

Thanks for the response. So, am I to infer that there isn’t really a good way to compare averages here? Rather, I am better off showing and explaining confidence intervals?

I am also not a big fan of surveys like these, but this one was not my call so to speak.

I am also not a big fan of having to explain to my boss each month why Product B has a lower mean satisfaction score than Product A, the explanation being that the population is smaller, as is the response rate, so a few low satisfaction scores have a bigger impact.

The response is always: well, how can I tell which product is most satisfied? Oy.

#197091

Mike Carnell
Participant

It is already weighted in the standard error of the mean calculation. Why don’t you figure out how to convey that concept to your boss?

In terms of his question, ask him what difference it will make. If people are more satisfied with A but they are really dissatisfied with both, does that mean A is good?
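A minimal illustration of the "already weighted" point, with made-up scores: the sample size n sits in the denominator of the standard error, so a small product automatically carries more uncertainty.

```python
import math
from statistics import stdev

def sem(scores):
    """Standard error of the mean: s / sqrt(n); n is built into the formula."""
    return stdev(scores) / math.sqrt(len(scores))

# Same spread of 1-5 scores, very different sample sizes
small = [4, 5, 3, 4, 4, 5, 2, 4, 3, 5]   # n = 10 (like Product B)
large = small * 54                        # n = 540 (like Product A)
print(f"SEM at n = 10:  {sem(small):.3f}")
print(f"SEM at n = 540: {sem(large):.3f}")
```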

#197092

Robert Butler
Participant

There are a number of issues here. Likert scale data is ordinal, not interval, which means you have to take some care when running analyses on data of this type. The fact that you are just looking at averages would suggest this hasn’t been done.

1. For multiple product comparison using averages you will want to use ANOVA and you will need to do the following:

a. Don’t get cute with empty precision – remember the precision of your final result can be no more precise than the precision of your least precise measure. To be completely true to this you would have to limit your reports to averages rounded up or down to the nearest whole number. The usual practice is to permit reporting of the averages to the tenths place and to ignore the rest.

b. Increase the cut point for the level of significant differences from .05 to .01. I don’t have the reference handy for this one (today is Sunday) but the short version is that this practice offsets the “granularity” of the Likert Scale.

c. Since you are comparing multiple products with ANOVA you will have to make adjustments to the significance level to take this into account.
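The steps above can be sketched with a stdlib-only F statistic (the scores below are hypothetical; looking up the p-value against an F distribution, and the adjustment in point (c), are noted in comments):

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA: between-group MS over within-group MS."""
    grand = mean(x for g in groups for x in g)
    k = len(groups)                          # number of products
    n = sum(len(g) for g in groups)          # total responses
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical 1-5 satisfaction scores for two products
a = [4, 5, 3, 4, 5]
b = [3, 4, 2, 3, 4]
f = one_way_anova_f([a, b])
print(f"F = {f:.2f}")

# Compare F against the critical value of an F(k-1, n-k) distribution at the
# chosen cut point. Per point (c), with m pairwise follow-up comparisons, a
# simple Bonferroni adjustment tests each pair at alpha / m (e.g. 0.01 / 3).
```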

Another interesting item:

I’m assuming the example you gave is real and that the “mailed” surveys are actually requests to fill in an on-line survey. If this is the case the proportions are curious, because on-line surveys usually have response rates in the 20-30% region whereas physically mailed surveys have response rates in the 10-15% range. Your response rates of 8% and 18% are both at the low end of the electronic response-rate range and, if representative, might suggest a problem with the way you are approaching the request for feedback.

#197093

Robert Butler
Participant

Arrrrrgh! I hit “send” too soon. If you run the ANOVA in the correct fashion, what you will very likely see is a large number of products with numerically but not statistically different averages, so the answer to your boss may well be, in many instances, that there are no significant changes/differences in product satisfaction and there is no such thing as a single product with which customers are “most satisfied”.

Yet another thought: Are these the same customers and products? If so, then in addition to having Likert data you also have repeated measures data and your approach to the analysis of this data will have to take this into account.

#197095

Damon
Guest

10 responses is too small a sample size to be reliable. Thirty is the commonly cited minimum (by appeal to the central limit theorem) and 50 would be even better. If your response rate for Product B (10/118, about 8.5%) holds steady, you would need to send out roughly 354 to 590 surveys to reach those sample sizes.
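A quick check of that arithmetic, working directly from the 10-out-of-118 rate quoted in the original post:

```python
# Surveys needed to expect a target number of responses at Product B's rate
response_rate = 10 / 118                        # about 8.5%
needed = {target: round(target / response_rate) for target in (30, 50)}
print(needed)
```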

#197100

Robert Butler
Participant

No, 10 samples is not too small a sample size to be “reliable”. As for the central limit theorem – it applies to distributions of averages – not to distributions of individual samples.
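That distinction can be shown with a small simulation (the population proportions below are made up): even when individual 1-5 scores are skewed, the means of repeated size-10 samples cluster tightly around the population mean.

```python
import random
from statistics import mean, stdev

random.seed(0)
# A skewed "population" of 1-5 scores (hypothetical proportions)
population = [1] * 5 + [2] * 10 + [3] * 20 + [4] * 40 + [5] * 25

# The CLT describes the distribution of sample MEANS, not individual scores:
# draw many size-10 samples and look at how their means are distributed
sample_means = [mean(random.choices(population, k=10)) for _ in range(5000)]
print(f"mean of sample means:   {mean(sample_means):.2f}")
print(f"spread of sample means: {stdev(sample_means):.2f}")
```

The population mean here is 3.7 and the population standard deviation is 1.1, so the spread of the sample means comes out near 1.1 / sqrt(10) ≈ 0.35, exactly the standard-error behavior discussed above.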

#197104

Chris Seider
Participant

@rbutler Nice job on keeping them “straight”.

I long for the days when BBs would actually know what confidence intervals are, where a p-value comes from, and why 30 is a good rule of thumb but not an absolute rule for sampling.

I’m sure MBB’s would be best served if they were exposed to my old colleague’s great assertion of Bloom’s taxonomy.

Not every MBB has the same skills but …

#197121

Damon
Guest

10 samples for Product B could be reliable if they were obtained randomly and were thus truly representative of the underlying population, but in this case they are not, since you are relying on customers to respond to a survey. With such a small sample size, you run a huge risk of non-response bias (non-responders differ from responders on characteristics of interest). An effort should be made to increase response rates and sample size first to mitigate this risk, particularly if you are looking to make comparisons to Product A. Otherwise, your findings could be seriously flawed.

#197122

Nik
Participant

Between the 1-5 scale and the small number of samples, you have to ask yourself, what do you really want to know? Are you trying to see if there is a statistical difference between the answer demographics of Product A and B? Or if Product B is better/worse than Product A?

If you just want to compare the distributions, use a contingency table. It will show you the expected values (if there were no difference) and where the largest variation lies (through the individual chi-square contributions). With only 10 samples, though, you’ll likely get a sample-size warning because expected values need to be greater than 5. With a larger sample, you could compare the two products.
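A stdlib-only sketch of that table, with illustrative (made-up) counts of each score for the two products; it flags the small-expected-count problem Nik mentions:

```python
def chi_square_table(observed):
    """Chi-square statistic and expected counts for an r x c contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    expected = [[rt * ct / total for ct in col_totals] for rt in row_totals]
    stat = sum((o - e) ** 2 / e
               for orow, erow in zip(observed, expected)
               for o, e in zip(orow, erow))
    return stat, expected

# Hypothetical counts of scores 1-5 (row 1: Product A, row 2: Product B)
observed = [[20, 60, 134, 180, 150],   # 544 responses
            [1, 2, 2, 3, 2]]           # 10 responses
stat, expected = chi_square_table(observed)
low_cells = sum(1 for row in expected for e in row if e < 5)
print(f"chi-square = {stat:.2f}; cells with expected count < 5: {low_cells}")
```

With only 10 Product B responses, every expected count in that row falls below 5, which is why the test is not trustworthy until the sample grows.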

If you just wanted to know if Product B was better or worse than A, you could use a “top shelf” technique where you convert the ordinal data to binary data. For example, any score of 4 or above could be converted to “good” (or 1), and any score of 3 or below to “bad” (or 0); you decide where the “top shelf” cutoff sits depending on the conclusion you need. Then you can run a proportion test.

For example,
Product A had 330 out of 544 favorable responses (scores of 4 or 5)
Product B had 7 out of 10 favorable responses

Now you can run a two proportions test or simply calculate the proportion for each value with corresponding confidence intervals.
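Using the counts in the example above, a pooled two-proportion z statistic can be computed with the standard library (note that with only 10 Product B responses the normal approximation is rough; an exact test would be more defensible):

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Two-proportion z-test statistic using a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Favorable ("top shelf") counts from the example: A = 330/544, B = 7/10
z = two_proportion_z(330, 544, 7, 10)
print(f"z = {z:.2f}")   # |z| < 1.96 -> no significant difference at the 5% level
```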

#197124

Robert Butler
Participant

Reliability and bias are two different things but, for the purposes of this discussion, I’ll accept the idea that they are supposed to be one and the same. Given this, the idea that gathering more samples in the same manner will somehow overcome non-response and/or self-selection bias is incorrect (for example, recall the 1936 Literary Digest debacle: they had over 2 million samples, the sampling plan suffered from both non-response and self-selection bias, and the results from the 2 million plus would have been no different than samples orders of magnitude smaller). Both of these things are a fact of life for surveys run/constructed in the manner described by the OP and need to be remembered when looking at the results of any analysis one might run on the resultant data.
