Pete

As you suspect, there are a number of issues with this survey approach. Using the same sample size for regions with differing transaction totals is fine in itself, but there are several other concerns with the survey system you have described:

1. Sample size. With only 22 samples from each region, there is enormous uncertainty about the population percentage of customers giving a "5" rating. For example, if 19 out of the 22 surveys in a region rate a "5" (86%), the true population percentage of "5" ratings for that region could be anywhere between 65% and 97%. So even though your sample percentage is 86%, you can't claim that the population percentage for that region is also 86%; it could be anywhere within that 65-97% confidence interval.

2. Only counting "5" ratings as successes. A lot of organizations do this: the customers giving a "5" are considered "promoters" who are likely to recommend your company to others, so companies focus only on increasing promoters and treat a "4" rating (or any other rating) as a failure. However, when evaluating regions, your company is treating a region where 10 out of 22 surveys rated a "5" and the rest rated a "1" the same as a region where 10 out of 22 surveys rated a "5" and the rest rated a "4". So on top of the statistical uncertainty, there is also a loss of information about the non-"5" raters, which can make regions seem similar that are, in fact, very different.

3. Response rate. If the response rate is low, the surveys may have significant non-response bias, meaning that the survey respondents are not necessarily representative of the population. So my question would be: how many customers were called in order to get 22 responses? If your company called, say, 25 people and got 22 responses, the non-response bias would be minimal. However, if your company had to call hundreds of people to get 22 responses, the non-response bias could be significant.

4. Population representation. If there are diverse subgroups of customers within each region, then the company must use a stratified random sampling approach to ensure that the 22 samples fairly represent the population of that region. That means sampling from each subgroup in proportion to its share of the region's population. If this is not done, subgroups can be over-represented or under-represented in the survey results, which means the results do NOT fairly represent the population of the region.
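To make point 4 concrete, here is a minimal sketch of proportional allocation for a stratified sample. The subgroup names and counts are hypothetical; the largest-remainder method is used so the per-subgroup counts add up exactly to the 22 samples:

```python
def allocate(subgroups, n):
    """Split a total sample of n across subgroups in proportion to their size,
    using the largest-remainder method so the counts sum exactly to n."""
    total = sum(subgroups.values())
    quotas = {k: n * v / total for k, v in subgroups.items()}
    alloc = {k: int(q) for k, q in quotas.items()}  # floor each quota
    leftover = n - sum(alloc.values())
    # hand out the remaining slots to the largest fractional remainders
    for k in sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)[:leftover]:
        alloc[k] += 1
    return alloc

# hypothetical customer counts for one region
region = {"retail": 600, "wholesale": 300, "government": 100}
print(allocate(region, 22))  # {'retail': 13, 'wholesale': 7, 'government': 2}
```

You would then sample randomly within each subgroup up to its allocated count.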

Based on all of these issues (particularly the small sample size), I suspect that your regions are being rewarded or punished primarily due to chance, with little connection to their "true" performance with customers. Only very large differences (outside the confidence intervals) should be considered significant; the rest should be understood as sampling fluctuation. Most companies don't understand this and inflict all kinds of bad things on their staff as a result. W. Edwards Deming ran his famous "Red Bead Experiment" to address exactly this type of management misuse of data. It's far better to do no customer surveys at all than to randomly reward or punish people because the results weren't properly understood and interpreted.
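To put some arithmetic behind the sample-size point in item 1, here is a quick sketch using the Wilson score interval. It is an approximation, so its endpoints come out slightly tighter than the exact 65-97% figures quoted above, but the picture is the same:

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - margin, center + margin

# 19 "5" ratings out of 22 surveys, as in the example above
lo, hi = wilson_interval(19, 22)
print(f"{lo:.0%} to {hi:.0%}")  # 67% to 95%
```

Even with 86% of the sample giving a "5", the plausible range for the true population proportion is enormous at n = 22.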

There is a lot of information out there about constructing surveys, but I don't know off the top of my head of any studies comparing surveys administered by direct points of contact versus a neutral third party. There has been plenty of research showing that people don't like to let down folks with whom they have a relationship, and "leading questions" that imply a desired response do, in fact, increase the percentage of desired responses. My guess would be the same as yours: having familiar points of contact administer the survey would result in higher scores than having a neutral third party administer it. But I don't have a reference with data to verify that assumption.

Can you advise where I might be able to find more information about this? I am trying to figure out if surveys given to customers by their direct points of contact within a company might be biased versus surveys sent from a neutral generic company email. Intuitively I would say yes. But I wondered if I was right and if there is any research on this. Can you shed light on this?

Thanks.

For those reasons, personally I would keep things much simpler and analyze the data using measures of association designed to find relationships between ordinal, discrete data. For example, running a Pearson correlation between your questions will determine which questions show statistically significant correlations (p-values) and will give the strength of each relationship (Pearson's r). This is very straightforward in Minitab (Stat > Basic Statistics > Correlation), although I imagine JMP or any other statistical package has this analysis built in as well.
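If you don't have Minitab handy, the correlation idea can be sketched in a few lines of Python. The ratings below are hypothetical; the t statistic gives a rough significance check against a standard t-table critical value:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length rating lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

# hypothetical 1-5 ratings for two survey questions from the same 10 respondents
q1 = [1, 2, 3, 4, 5, 3, 4, 5, 2, 1]
q2 = [2, 2, 3, 5, 5, 3, 4, 4, 2, 1]
r = pearson_r(q1, q2)
t = r * math.sqrt((len(q1) - 2) / (1 - r**2))  # t statistic for testing r != 0
# here r is about 0.92 and t about 6.9, well above the two-sided 5%
# critical value of roughly 2.31 for 8 degrees of freedom
print(round(r, 2), round(t, 2))
```

A statistical package will also give you the exact p-value, but for a quick screen of which question pairs move together, r and the t cutoff are enough.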

As always, keep in mind that survey data is usually very noisy and also potentially biased (particularly if you have a low response rate). So even if you find the "perfect" analysis, your conclusions could be inaccurate. If you are trying to find the drivers of customer satisfaction, then rather than looking for correlations between different questions and the "overall satisfaction" rating, I would recommend asking your customers to rate the importance of your different performance aspects. In other words, ask them to rate the importance of, say, responsiveness, quality of service, reliability of product, ease of use, etc. This at least gives you a direct measure of what you are looking for instead of relying on correlations or regressions to find drivers of overall satisfaction. Again, this will be subject to the usual noise and biases, so you'll need a high response rate to feel confident about the conclusions.

I think the bottom line is that no matter how "sophisticated" the analysis, you are typically dealing with noisy, biased data from surveys. That needs to be at the forefront of everyone's mind as you try to draw conclusions. Hopefully you'll be able to draw some useful, general information about what makes people happy or unhappy, but you need to be careful not to be misled. So if you have any other "hard" customer data, such as complaint data, warranty data, attrition data, etc., analyze it thoroughly so that you are not relying strictly on surveys for information about what your customers are looking for.

Hope that helps…

Thank you.

Some very interesting points, and I also recommend that readers have a look at the 2nd and 3rd chapters of Darrell Huff's book How to Lie with Statistics.

Thanks again!

At a sample size of 15, your 2 out of 15 top-box responses give a sample proportion of 13%, but the confidence interval for the "true" population proportion runs from 2% to 40%. So there is huge uncertainty there due to the sample size of only 15 (although it seems that even at the high end of that interval you are still well under the 89.5% goal). Of course, this all assumes that those 15 responses actually represent the population (the 300 people you serviced), which leads us to the second issue.

The second issue is, to me, the bigger problem. At a response rate of 5%, there is a high likelihood of non-response bias. Only 15 out of 300 people were motivated to answer the survey, so it's unlikely that those responses are representative of the entire 300. If the motivated 15 had a higher proportion of unsatisfied folks than the "silent majority" of non-responders (responders often have a negative bias compared to non-responders), then you wind up being penalized because of this bias.

Unless response rates are high (> 80%) and statistical uncertainty is taken into account, survey results can be very misleading and can lead to bad decisions, unfair evaluations, and all kinds of other nasty things. It is much, much better to evaluate people based on the concrete things that we know drive customer satisfaction and loyalty: fast responses, short problem resolution times, high quality of service (as defined by specific actions), etc. These attributes CAN be measured accurately, and improving them WILL make customers happier (although this increased happiness may well be missed by a small-sample and/or low-response-rate survey).

I think sometimes it’s easy for leadership at a company to throw out surveys as a means of evaluation without really thinking about what they’re doing (and how the misleading results are hurting the employees). This is unfortunate, but also very, very common.

Hope that helps…

To take it a step further, the survey uses a top-box approach, with a goal of 89.5% of respondents rating their overall experience as a 5. So if 2 respondents out of 15 do not award a 5, the performance goal isn't met. Any thoughts on the merit of a score given by 2 respondents out of a population of 300?

Thanks for the comment. My responses to your questions are below:

1. If your sample size is large (>30), you can use the formula stated in the article. If your sample size is smaller than that, you should use the t-distribution formula for calculating the 95% confidence interval of the mean. I just tried to type it in here, but without a math font it's pretty much unreadable; you can find it on Google quite easily if needed. These formulas assume that the data is more or less normally distributed. If your data is instead highly skewed (which is often the case for survey data), it's better to use median scores instead of means. In that case, use the confidence interval formula for medians (which you can also find via Google).

2. Be sure that your performance targets are outside the confidence intervals of your baseline data. This is important: if your targets fall within the confidence intervals, you can hit or miss them based purely on chance. I've seen many cases where the maximum value on the scale (e.g., 10 on a 1-10 scale) falls within the confidence intervals of the baseline data, which indicates that the sample size is too small to distinguish any improvement in customer satisfaction.

3. If you have hit your target, run a 2-sample t-test on the "before" and "after" data to determine whether the improvement is statistically significant. If you are using median scores instead of means, run a Mood's median test instead. A p-value less than 0.05 indicates that you can be more than 95% confident that the target was reached due to a real improvement in scores and not a statistical fluctuation of the data.
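Steps 1 and 3 can be sketched in plain Python, using hypothetical 1-10 scores and t critical values taken from a standard t table (for medians you would substitute the median interval and the Mood's test mentioned above):

```python
import math
from statistics import mean, stdev

# Two-sided 5% critical values from a standard t table (df: value).
T_CRIT = {9: 2.262, 18: 2.101}

def mean_ci95(data):
    """95% confidence interval for the mean using the t distribution."""
    n = len(data)
    se = stdev(data) / math.sqrt(n)       # standard error of the mean
    margin = T_CRIT[n - 1] * se
    return mean(data) - margin, mean(data) + margin

def two_sample_t(before, after):
    """Pooled two-sample t statistic; a value above the critical value for
    df = n1 + n2 - 2 indicates a statistically significant difference."""
    n1, n2 = len(before), len(after)
    sp2 = ((n1 - 1) * stdev(before) ** 2 + (n2 - 1) * stdev(after) ** 2) / (n1 + n2 - 2)
    return (mean(after) - mean(before)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# hypothetical baseline: mean 7.6, so any target inside the CI is just noise
baseline = [7, 8, 6, 9, 7, 8, 10, 6, 7, 8]
print(mean_ci95(baseline))          # about (6.70, 8.50) around the 7.6 mean

# hypothetical before/after scores for step 3
before = [5, 6, 5, 7, 6, 5, 6, 7, 5, 6]
after = [8, 9, 8, 7, 9, 8, 9, 8, 7, 8]
print(two_sample_t(before, after))  # about 6.7, well above the 2.101 cutoff
```

Any statistical package will also give the exact p-value for the t statistic; the table lookup above just shows the mechanics.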

Hope that helps–let me know if you have any additional questions…

— Rob

1. How would you measure the margin of error of a customer satisfaction metric that uses a mean score (e.g., 7.6 on a scale of 1 to 10)? Do you apply the same formula?

2. Setting targets is always tricky. How would you set a target for a customer satisfaction metric? I assume you can't just multiply your baseline by 3% or 10% (not very scientific...)? Instead, and from your article, would you set a target that is 'significant' to reach?

3. And finally, how do you know when your metric is beating your 'targets' with a confidence level of 95%? (Do you have to factor in two margins of error, one for the target and one for the actuals?)

Thank you
