Help with Survey Data
 This topic has 11 replies, 4 voices, and was last updated 8 years, 4 months ago by Robert Butler.


June 27, 2014 at 11:15 am #54777
Ken (Guest, @BookGuy306):
I need some help.
We send surveys out to customers. The number of surveys sent to a particular population is based on product type. For larger product types, we get a greater number of phone calls, so they get a greater number of transaction-based surveys.
Example:
Product A = 2941 surveys sent; 544 responses received
Product B = 118 surveys sent; 10 responses received
The surveys use a 5-point scale. So, my problem is this: if Product B gets a few low scores, its overall average is impacted more than Product A's would be, because Product A has a larger population.
I usually just use the total population from all products (I would combine populations from Product A and Product B; likewise, I would combine the responses).
But, when my boss wants me to break them out by product to compare, how do I do that and be fair to the product with the lower population? Is there some weighting I can do?
Any suggestions?
Thanks!
June 27, 2014 at 11:26 am #197088
Chris Seider (@cseider):
I'm not a big fan of quoting averages of 1–5 scales, but you could, and state the confidence level. The confidence interval will be much wider for Product B.
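To illustrate the point, here is a minimal sketch (with randomly generated, made-up scores; assumes numpy and scipy are available) showing how the confidence interval around a mean satisfaction score widens for the smaller sample:

```python
# Sketch of how the confidence interval around a mean widens as the number
# of responses shrinks. All scores below are fabricated 1-5 responses.
import numpy as np
from scipy import stats

def mean_ci(scores, confidence=0.95):
    """Return the mean and a t-based confidence interval for the scores."""
    scores = np.asarray(scores, dtype=float)
    mean = scores.mean()
    sem = stats.sem(scores)  # standard error of the mean
    half_width = sem * stats.t.ppf((1 + confidence) / 2, len(scores) - 1)
    return mean, mean - half_width, mean + half_width

# Hypothetical responses: Product A has many, Product B has few.
rng = np.random.default_rng(0)
product_a = rng.integers(1, 6, size=544)
product_b = rng.integers(1, 6, size=10)

for name, scores in [("A", product_a), ("B", product_b)]:
    mean, lo, hi = mean_ci(scores)
    print(f"Product {name}: mean={mean:.1f}, 95% CI=({lo:.1f}, {hi:.1f})")
```

Even with identical underlying score spreads, Product B's interval comes out several times wider, which is the honest way to present its average alongside Product A's.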
June 27, 2014 at 11:34 am #197089
Ken (Guest, @BookGuy306):
Chris:
Thanks for the response. So, am I to infer that there isn't really a good way to compare averages here? Rather, am I better off showing and explaining confidence intervals?
I am also not a big fan of surveys like these, but this one was not my call so to speak.
I am also not a big fan of having to explain to my boss each month that Product B has a lower mean satisfaction score than Product A because its population is smaller, as is its response rate, so a few low satisfaction scores have a bigger impact.
The response is always: well, how can I tell which product is most satisfied? Oy.
June 28, 2014 at 10:06 pm #197091
Mike Carnell (@MikeCarnell):
It is already weighted in the standard error of the mean calculation. Why don't you figure out how to convey that concept to your boss?
In terms of his question, ask him what difference it will make. If people are more satisfied with A but they are really dissatisfied with both, does that mean A is good?
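Mike's point about the standard error of the mean can be sketched in a few lines (the score spread of 1.2 is a made-up assumption):

```python
# Minimal sketch: the standard error of the mean is s / sqrt(n), so the
# smaller sample is automatically "weighted" with more uncertainty.
# The spread (s = 1.2) is an assumed value for illustration only.
import math

def sem(std_dev, n):
    """Standard error of the mean for a sample of size n."""
    return std_dev / math.sqrt(n)

sem_a = sem(1.2, 544)   # Product A: 544 responses
sem_b = sem(1.2, 10)    # Product B: 10 responses
print(f"SEM A = {sem_a:.3f}, SEM B = {sem_b:.3f}")  # B's is ~7x larger
```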
June 29, 2014 at 6:40 am #197092
Robert Butler (@rbutler):
There are a number of issues here. Likert scale data is ordinal, not interval, which means you have to take some care when running an analysis with data of this type. The fact that you are just looking at averages suggests this hasn't been done.
1. For multiple product comparison using averages you will want to use ANOVA and you will need to do the following:
a. Don’t get cute with empty precision – remember, your final result can be no more precise than your least precise measure. To be completely true to this you would have to limit your reports to averages rounded up or down to the nearest whole number. The usual practice is to permit reporting of the averages to the tenths place and to ignore the rest.
b. Increase the cut point for the level of significant differences from .05 to .01. I don’t have the reference handy for this one (today is Sunday) but the short version is that this practice offsets the “granularity” of the Likert Scale.
c. Since you are comparing multiple products with ANOVA you will have to make adjustments to the significance level to take this into account.
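A rough sketch of points 1–c (the scores and the third product are invented, and scipy is assumed): a one-way ANOVA at the stricter .01 level, with a Bonferroni adjustment shown as one common way to handle the multiple-comparison issue:

```python
# Sketch: one-way ANOVA across products at the stricter alpha = .01, with a
# Bonferroni adjustment for pairwise follow-up tests. All scores below are
# invented 1-5 Likert responses for illustration.
from scipy import stats

product_a = [5, 4, 4, 3, 5, 4, 2, 5, 4, 3, 4, 5]
product_b = [3, 2, 4, 3, 2, 3, 4, 2, 3, 3]
product_c = [4, 4, 5, 3, 4, 5, 4, 4, 3, 5]

f_stat, p_value = stats.f_oneway(product_a, product_b, product_c)

alpha = 0.01                         # stricter cut for Likert granularity
n_pairwise = 3                       # A-B, A-C, B-C follow-up comparisons
alpha_per_test = alpha / n_pairwise  # Bonferroni-adjusted per-test level

print(f"F = {f_stat:.2f}, p = {p_value:.4f}, "
      f"significant at .01: {p_value < alpha}")
```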
Another interesting item:
I’m assuming the example you gave is real and that the “mailed” surveys are actually requests to fill in an online survey. If this is the case the proportions are curious, because online surveys usually have response rates in the 20–30% region whereas physically mailed surveys have response rates in the 10–15% range. Your response rates of 8% and 18% are both on the low end of the electronic response rate and, if representative, might suggest a problem with the way you are approaching the request for feedback.
June 29, 2014 at 6:52 am #197093
Robert Butler (@rbutler):
Arrrrrgh! I hit “send” too soon. If you run the ANOVA in the correct fashion, what you will very likely see is a large number of products with numerically but not statistically different averages. So the answer to your boss may well be, in many instances, that there are no significant changes/differences in product satisfaction and there is no such thing as a single product with which customers are “most satisfied”.
Yet another thought: Are these the same customers and products? If so, then in addition to having Likert data you also have repeated measures data and your approach to the analysis of this data will have to take this into account.
June 30, 2014 at 7:24 am #197095
10 responses is too small a sample size to be reliable. 30 is considered the minimum according to the central limit theorem, and 50 would be even better. If your response rate (8.5%) for Product B holds steady, you would need to send out between 375 and 588 surveys to get the sample sizes referenced above.
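The arithmetic can be checked directly (the results differ slightly from the 375–588 quoted, since that range appears to use rounded rates):

```python
# Quick check: surveys to send to hit a target number of responses at
# Product B's observed rate (10 responses from 118 surveys sent).
response_rate = 10 / 118
for target in (30, 50):
    needed = round(target / response_rate)
    print(f"{target} responses -> send roughly {needed} surveys")
```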
June 30, 2014 at 3:01 pm #197100
Robert Butler (@rbutler):
No, 10 samples is not too small a sample size to be “reliable”. As for the central limit theorem – it applies to distributions of averages, not to distributions of individual samples.
July 1, 2014 at 6:34 am #197104
Chris Seider (@cseider):
@rbutler Nice job on keeping them “straight”.
I long for the days when BBs would actually know what confidence intervals are, where a p-value comes from, and why 30 is a good rule of thumb but not an absolute rule for sampling.
I’m sure MBB’s would be best served if they were exposed to my old colleague’s great assertion of Bloom’s taxonomy.
Not every MBB has the same skills but …
July 7, 2014 at 8:28 am #197121
10 samples for Product B could be reliable if they were obtained randomly and were thus truly representative of the underlying population, but in this case they are not, since you are relying on customers to respond to a survey. With such a small sample size, you run a huge risk of non-response bias (non-responders differ from responders on characteristics of interest). An effort should be made to increase response rates and sample size first to mitigate this risk, particularly if you are looking to make comparisons to Product A. Otherwise, your findings could be seriously flawed.
July 7, 2014 at 5:02 pm #197122
Between the 1–5 scale and the small number of samples, you have to ask yourself: what do you really want to know? Are you trying to see if there is a statistical difference between the answer distributions of Product A and B? Or whether Product B is better/worse than Product A?
If you just want to compare the distributions, use a contingency table. It’ll show you the expected values (if there were no difference) and show you where the largest variation in values lies (through the individual chi-square statistics). With only 10 samples, though, you’ll likely get a sample-size error because expected values need to be greater than 5. But with a larger sample, you could compare the two products.
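A hedged sketch of the contingency-table approach (the counts are invented, with Product B scaled up to a hypothetical larger sample; assumes numpy and scipy):

```python
# Sketch: chi-square test of independence on score counts per product.
# Counts are fabricated; Product B is shown at a hypothetical larger n.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: products; columns: counts of scores 1, 2, 3, 4, 5.
observed = np.array([
    [20, 45, 110, 210, 159],   # Product A (sums to 544)
    [10, 20,  60, 120,  90],   # Product B, after hypothetical extra sampling
])

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
print("expected counts if no difference:\n", expected.round(1))
```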
If you just wanted to know whether Product B was better or worse than A, you could use a “top shelf” technique where you convert the ordinal data to binomial data. For example, any score of 4 or above could be converted to “good” (or 1), and any score of 3 or below to “bad” (or 0) (you decide where to put the “top shelf” depending on the conclusion you need). Then you could run a proportions test.
For example:
Product A had 330 out of 544 favorable responses (scores of 4 or 5)
Product B had 7 out of 10 favorable responses
Now you can run a two-proportions test or simply calculate the proportion for each product with corresponding confidence intervals.
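Using the counts quoted above, the two-proportions test can be sketched with nothing but the standard library (the pooled z-test shown is one standard formulation; your stats package may use a slightly different one):

```python
# Sketch: pooled two-proportion z-test on the favorable counts quoted above
# (330/544 for Product A, 7/10 for Product B).
import math

def two_proportion_ztest(k1, n1, k2, n2):
    """Pooled two-proportion z-test; returns z and a two-sided p-value."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

z, p = two_proportion_ztest(330, 544, 7, 10)
print(f"A: {330/544:.0%} favorable, B: {7/10:.0%} favorable")
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these counts the difference is nowhere near significant, which illustrates Robert's earlier point: a numerically different average (or proportion) is not automatically a statistically different one.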
July 8, 2014 at 5:40 pm #197124
Robert Butler (@rbutler):
Reliability and bias are two different things but, for the purposes of this discussion, I’ll accept the idea that they are supposed to be one and the same. Given this – the idea that gathering more samples in the same manner will somehow overcome non-response and/or self-selection bias is incorrect (for example, recall the 1936 Literary Digest debacle: they had over 2 million samples, the sampling plan suffered from both non-response and self-selection bias, and the results of the 2 million plus would have been no different from samples orders of magnitude smaller). Both of these things are a fact of life for surveys run/constructed in the manner described by the OP and need to be remembered when looking at the results of any analysis one might run with the resultant data.