Why 5 Star or Bust?

You may have noticed a trend in customer feedback surveys: they really want 5 stars. 4 stars is garbage. Or, on a 10-point scale, they really want that 9 or 10. I even saw a sign at CVS, taped to the pharmacy counter, that had a smiley face next to the 10 (and maybe the 9) and frowny faces on all the other scores. A frowny face for an 8! That's a solid B, above average.

So what is causing this trend? The short answer: Net Promoter Score. Some marketing/consulting/probably-full-of-it company decided the average wasn't good enough for them. Neither was the median or the mode. They wanted an entirely new metric. In this metric, on a 0-10 scale, your 10s and 9s are +1. Your 8s and 7s are neutral, worth zero points. 6 and below are -1. Then you average your new series of -1s, 0s, and 1s to get an NPS.
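Here is a minimal sketch of that calculation in Python. The function names `to_nps_value` and `nps` are my own hypothetical helpers; the cutoffs (9-10 is +1, 7-8 is 0, 0-6 is -1) are the ones described above. Averaging the -1/0/+1 values gives the same number as the share of promoters minus the share of detractors, which is how NPS is often quoted.

```python
def to_nps_value(score):
    """Map a 0-10 survey score to its NPS bucket value."""
    if score >= 9:
        return 1    # promoter (9 or 10)
    if score >= 7:
        return 0    # passive (7 or 8)
    return -1       # detractor (0-6)


def nps(scores):
    """Average of the bucket values = share of promoters minus share of detractors."""
    values = [to_nps_value(s) for s in scores]
    return sum(values) / len(values)


scores = [10, 9, 8, 7, 6]
promoters = sum(s >= 9 for s in scores) / len(scores)
detractors = sum(s <= 6 for s in scores) / len(scores)
print(nps(scores), promoters - detractors)
```

Both printed numbers are the same thing computed two ways.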

The concept isn't entirely bad. The question is often phrased as "On a scale of 0-10, how likely would you be to recommend us to your friends/family?" The idea is your promoters (10s and 9s) will go sing your praises and recruit their friends as customers. 8s and 7s do nothing. 6s and below are your detractors, and will probably tell people not to shop with you.

I will try to be nice; I am sure there are some practical applications for this Frankenstein calculation. But let me detail why I hate it logically, mathematically, and morally.


1. Logically

Why they didn't just change the responses to "likely, not likely, meh, would not, and definitely would not" is beyond me. I mean, clearly we should impose an artificial value system on numbers instead of asking directly for what we want.

The flaw here is that most customers think an 8/10 is pretty good. Maybe I'm harsh, but even a 6 isn't horrible. I would rank a decent amount of fast food as 6/10 and I continue to eat it. I have given 4 star reviews to good items I would recommend to friends. 5 stars is perfect and hard to accomplish.

So if the customer is giving you feedback that they interpret as meaning one thing, and you interpret to mean a different thing, how is this effective?

2. Mathematically

This is the most frustrating part, because it is clearly what the users of this metric do not understand. There are three main areas to address:

  • Distribution. An NPS of 0 could mean that you have half 7s and half 8s (averaging 7.5). Or you could have half 9s and half 0s (averaging 4.5) and still have an NPS of 0. Or you could have half 6s and half 7s (averaging 6.5) and an NPS of -50%. I'll show more fleshed-out examples later, but I think this conveys the idea that an NPS tells you nothing about your mean, variance, mode, or median.
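  • Sample size. The law of large numbers tells us that as your sample grows, your average approaches the true mean, and the standard error shrinks in proportion to 1/√n. Given your sample, the standard deviation also tells you how confident you can be in your average. Strictly speaking, NPS is an average too (of -1s, 0s, and 1s), so the same math applies to it, but bucketing throws away most of the information in each answer. For every distribution below, that leaves NPS noisier relative to its scale, so it takes more responses to pin down.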
  • Accuracy. With a small sample, you may never be able to land on the true NPS. Take a trivial example: 3 people, of whom you only sample 2. The scores are 6, 6, and 10, so your true NPS is -1/3. Depending on which two you sample, you get either -1 (both 6s) or 0 (a 6 and the 10); -1/3 is impossible. While this may seem trivial, imagine surveying a team of 10 people (yes, I have seen supposedly smart people actually look at a sample this small). Say 8 fill out the survey, and the team is half 6s and half 10s. Your true score is 0, but depending on which 2 people skip the survey, your score could be -25%, 0, or +25% (five 6s and three 10s, four of each, or three 6s and five 10s). In contrast, your averages would be 7.5, 8, or 8.5.
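The examples above are small enough to brute-force. This sketch (the `nps` helper is my own, hypothetical) reproduces the three pairs from the Distribution bullet, then uses `itertools.combinations` to enumerate every possible 2-of-3 sample and every way 8 of the 10 team members could respond, collecting the NPS and average each one would report.

```python
from itertools import combinations
from statistics import mean


def nps(scores):
    """Share of promoters (9-10) minus share of detractors (0-6)."""
    promoters = sum(s >= 9 for s in scores)
    detractors = sum(s <= 6 for s in scores)
    return (promoters - detractors) / len(scores)


# Distribution bullet: the first two pairs share an NPS of 0, the third is -50%
for pair in ([7, 8], [9, 0], [6, 7]):
    print(pair, 'NPS', nps(pair), 'average', mean(pair))

# Accuracy bullet, part 1: scores 6, 6, 10 (true NPS -1/3), sampling 2 of the 3
print(sorted({nps(pair) for pair in combinations([6, 6, 10], 2)}))

# Accuracy bullet, part 2: a team of five 6s and five 10s, 8 of whom respond
team = [6] * 5 + [10] * 5
outcomes = {(nps(r), mean(r)) for r in combinations(team, 8)}
for team_nps, team_avg in sorted(outcomes):
    print(f'NPS {team_nps:+.0%}  average {team_avg}')
```

Only three outcomes are possible for the team, and the true NPS of 0 is just one of them.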


So that was long-winded and didn't have pictures. Let's change that. Here are 9 sample probability distributions: three shapes, each in versions averaging 5, 7, and 9. I picked:

  1. Uniform - equal probability of each response. However, I modified this and made it a linear distribution so I could achieve different averages. Forgive me for continuing to call it uniform.
  2. Normal - bell curve, Gaussian, that nice symmetrical one everyone loves. Again I had to shift the average, so in the code I approximate it with a binomial distribution over 0-10, which is skewed for 2 of the scenarios (the averages of 7 and 9).
  3. Bivariate (strictly speaking, bimodal) - two normal curves smooshed on top of each other. This represents two distinct populations within the sample. You could think of this as your optimists who give everything a high score vs your pessimists who always rate low. Or it could be your average employees vs your management (if management's performance is tied to the metric, they might be incentivized to skew it up).

Since we built the distributions, we know the true mean and NPS*. Let's run samples and see how long it takes to get to these values. On the graphs, blue is the average and red is NPS; the average is read off the left axis and NPS off the right axis.

*Averages are 5, 5, 5, 7, 7, 7, 9, 9, 9. NPSs are -0.45, -0.82, -0.60, -0.04, -0.20, -0.05, 0.70, 0.72, 0.63
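The footnote values can be checked directly from the distributions, no sampling needed. This sketch rebuilds the nine distributions the same way the code at the end of the post does (a linear "uniform", binomial stand-ins for the normal curves, and 50/50 mixtures of two binomials for the bivariate case), here using `math.comb`, then computes each true average and true NPS as a probability-weighted sum. The helper names `binomial` and `mixture` are my own.

```python
from math import comb

scores = list(range(11))                  # possible survey answers, 0-10
nps_value = [-1] * 7 + [0, 0] + [1, 1]    # NPS bucket value for each score


def binomial(p, n=10):
    """Binomial(n, p) weights over 0..n, standing in for a (possibly skewed) normal curve."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in scores]


def mixture(a, b):
    """50/50 mix of two distributions: the two-population ("bivariate") case."""
    return [(x + y) / 2 for x, y in zip(a, b)]


dists = {
    'Uniform - Average 5': [1 / 11] * 11,
    'Normal - Average 5': binomial(.5),
    'Bivariate - Average 5': mixture(binomial(.3), binomial(.7)),
    'Uniform - Average 7': [k * 2 / 110 for k in scores],
    'Normal - Average 7': binomial(.7),
    'Bivariate - Average 7': mixture(binomial(.5), binomial(.9)),
    'Uniform - Average 9': [0] * 7 + [.1, .2, .3, .4],
    'Normal - Average 9': binomial(.9),
    'Bivariate - Average 9': mixture(binomial(.8), binomial(1.0)),
}

truth = {}
for name, w in dists.items():
    true_mean = sum(s * p for s, p in zip(scores, w))
    true_nps = sum(v * p for v, p in zip(nps_value, w))
    truth[name] = (true_mean, true_nps)
    print(f'{name:24s} mean {true_mean:.2f}  NPS {true_nps:+.2f}')
```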

Interesting. Notice how the mean approaches the true value faster and with a bit less volatility. Also, note the scale: the average axis spans the true mean ±2 (40% of the 0-10 scale) and the NPS axis spans the true NPS ±40 percentage points (again 40% of its -100% to +100% scale).
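Why the red line wobbles more can be worked out without simulating. Each response adds one score to the average and one -1/0/+1 value to NPS, so after n responses the standard error of each is its per-response standard deviation divided by √n. To compare them fairly, this sketch assumes, as the charts do, that "share of the scale" is the right yardstick: it divides the average's error by 10 points and the NPS error by 2 (the width from -100% to +100%). It uses the Normal - Average 7 distribution, built as a binomial the same way as in the code at the end of the post; `mean_and_sd` is a hypothetical helper.

```python
from math import comb, sqrt

scores = list(range(11))
nps_value = [-1] * 7 + [0, 0] + [1, 1]
weights = [comb(10, k) * 0.7**k * 0.3**(10 - k) for k in scores]  # "Normal - Average 7"


def mean_and_sd(values, probs):
    """Mean and standard deviation of a discrete distribution."""
    m = sum(v * p for v, p in zip(values, probs))
    var = sum((v - m) ** 2 * p for v, p in zip(values, probs))
    return m, sqrt(var)


avg_mean, avg_sd = mean_and_sd(scores, weights)
nps_mean, nps_sd = mean_and_sd(nps_value, weights)

for n in (25, 100, 400):
    avg_se = avg_sd / sqrt(n) / 10   # standard error as a share of the 0-10 scale
    nps_se = nps_sd / sqrt(n) / 2    # standard error as a share of the -100%..+100% scale
    print(f'n={n:4d}  average SE {avg_se:.1%} of scale   NPS SE {nps_se:.1%} of scale')
```

For this distribution the NPS error is a bit more than twice as large relative to its scale, and quadrupling the sample only halves either one. This is not guaranteed for every possible population: if everyone answered between 0 and 6, NPS would sit at -100% and never move.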

Let's consider the middle case, Normal - Average 7. After sampling about 50 people, the mean levels out close to the true value; from then on it stays between ~6.7 and 7. However, if you stopped sampling at 50, your NPS would be ~-25%. At 100 it would be ~-32%. The actual value is -20%. Neither sample gave a very good estimate of the NPS, and doubling the sample actually made the answer less accurate.
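The single-run chart boils down to two running means computed from the same draws. Here is a vectorized sketch with NumPy's random `Generator` (the code at the end of the post uses Python's `random.choices` instead); `running_average_and_nps` is a hypothetical name, and the seed is arbitrary, so your exact path will differ from the chart.

```python
from math import comb

import numpy as np


def running_average_and_nps(weights, n=250, seed=0):
    """Draw n survey responses and return the running average and running NPS after each one."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(weights, dtype=float)
    draws = rng.choice(11, size=n, p=probs / probs.sum())        # scores 0-10
    nps_vals = np.where(draws >= 9, 1, np.where(draws <= 6, -1, 0))
    counts = np.arange(1, n + 1)
    return np.cumsum(draws) / counts, np.cumsum(nps_vals) / counts


# "Normal - Average 7": binomial(10, 0.7) weights
weights = [comb(10, k) * 0.7**k * 0.3**(10 - k) for k in range(11)]
avg_path, nps_path = running_average_and_nps(weights)
for stop in (50, 100, 250):
    print(f'after {stop:3d} responses: average {avg_path[stop - 1]:.2f}, '
          f'NPS {nps_path[stop - 1]:+.0%}')
```

Re-running with different seeds is a quick way to see how much the stopping point alone moves the NPS.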


But this is just one simulation, so let's run it 100 times for each sample size. On the graph below, the x axis is the number of people sampled in each simulation. The lines show the middle 90% of results (5th to 95th percentile), the middle 50% (the interquartile range), and the average. In other words, from top to bottom the lines are roughly the 95th percentile, the 75th percentile, the average as a solid line, the median as a light dashed line, the 25th percentile, and the 5th percentile. Blue is the average and red is NPS.

So, looking at the middle graph, after running 100 simulations of sampling 25 people, 90% of the simulations had averages between 6.5 and 7.5 (the 5th and 95th percentiles). NPS, however, landed anywhere from -50% to -10% in that same 90% band. Comparing the lines, you can see the average approaches the true value a lot faster and more consistently.
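The percentile bands come from repeating the whole survey many times and reading percentiles off the final numbers. A vectorized sketch of that idea, assuming 100 repetitions per sample size as in the post; `percentile_bands` is a hypothetical helper, and because it is random, your exact numbers will not match the chart.

```python
from math import comb

import numpy as np


def percentile_bands(weights, sample_size, iterations=100, seed=0,
                     pcts=(5, 25, 50, 75, 95)):
    """Run `iterations` surveys of `sample_size` people; return percentiles of the final average and NPS."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(weights, dtype=float)
    draws = rng.choice(11, size=(iterations, sample_size), p=probs / probs.sum())
    nps_vals = np.where(draws >= 9, 1, np.where(draws <= 6, -1, 0))
    averages = draws.mean(axis=1)        # one final average per simulated survey
    nps_scores = nps_vals.mean(axis=1)   # one final NPS per simulated survey
    return np.percentile(averages, pcts), np.percentile(nps_scores, pcts)


weights = [comb(10, k) * 0.7**k * 0.3**(10 - k) for k in range(11)]  # "Normal - Average 7"
for n in (25, 50, 100, 250, 500, 1000):
    avg_band, nps_band = percentile_bands(weights, n)
    print(f'n={n:4d}  average 5-95%: {avg_band[0]:.2f} to {avg_band[-1]:.2f}   '
          f'NPS 5-95%: {nps_band[0]:+.0%} to {nps_band[-1]:+.0%}')
```

Drawing all surveys as one 2-D array keeps this fast even at 1000 respondents per survey.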

Why? Because NPS is not a stable metric. This is the same reason we don't use the mode when analyzing survey results over time. Consider a population that is half 5s and half 9s. The average is 7. Suppose you sample an odd number of people. The mode just depends on which you happen to get more of: it has a 50/50 chance of being a 5 or a 9. The change in the metric reflects a change in the sample, not a change in the population. The same is true of NPS. The true NPS is 0 (the 5s are detractors and the 9s are promoters), but with an odd sample you will never get an NPS of 0. Whether it lands above or below 0 depends only on the sample, and how far it lands from 0 is pure sampling noise (it can be no closer than 1/n for a sample of n people).
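The 5s-and-9s example can be enumerated exactly. For an odd sample of n people, the only thing that matters is how many 9s (k) you draw; each respondent is a 5 or a 9 with equal chance, so k follows a binomial distribution. This sketch uses n = 5, picked arbitrarily, and lists every possible k with its probability, the resulting NPS, and the mode.

```python
from math import comb

n = 5  # odd sample size, chosen for illustration
rows = []
for k in range(n + 1):                  # k = number of 9s drawn; the rest are 5s
    prob = comb(n, k) / 2**n            # chance of drawing exactly k nines
    nps = (k - (n - k)) / n             # 9s are promoters, 5s are detractors
    mode = 9 if k > n - k else 5        # with odd n there is never a tie
    rows.append((k, prob, nps, mode))
    print(f'{k} nines  p={prob:.3f}  NPS {nps:+.0%}  mode {mode}')
```

Every row misses the true NPS of 0, and the mode flips between 5 and 9 depending only on which side happened to win the draw.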

3. Morally

So, as I detailed, the math of this metric is questionable at times. You can get unexpected and random results. Your score may go up or down for no reason, and that's why management likes it. It moves. It does something. The average can only move so much between months (assuming nothing changes). But not NPS! A change in sample can lead to big swings. Consider how management would respond to three scenarios:

  • Big increase. Whoa! We did awesome and I am a great department head and have increased customer service through my initiatives. Give me more money!
  • Big decrease. Wow! We need to change. Let's reorganize the company and move key players to important positions and get rid of those we don't need. I am super important now because you need me to fix this. When, due to volatility, the metric inevitably swings back up, I will claim my actions fixed it and it will be really hard to disprove that!
  • No meaningful change. Eh, cool? I guess that's good, but it is not showing me adding value and it is not presenting a crisis for me to fix and show off...


Final Thoughts on Net Promoter Score

Don't use it. It's stupid. I know people say "it's supposed to be responsive." I concede that it is responsive, but not always to a real signal. They might as well say "it's supposed to be volatile and a bit random."

If your company forces you to adopt it, let them know how unstable the metric is. And congratulations, you have realized your company is susceptible to whatever management trends are going around. Your company probably went through Lean and Six Sigma at some point, and those were slowly forgotten. Your management team wants to do better but is unclear how they actually add value, so they will keep adopting things and waving their hands around. Ask yourself: is it working, or are they just reshuffling the deck?

Also, when filling out surveys, don't play their game. If you really like the company, know that an 8 could break their heart. Or if you had a mediocre customer service call, a 6 could count against them despite being above average.

But for the love of Euclid, let's not turn this into a Black Mirror episode where it's 5 stars or bust.



Code


```python
# imports and initial conditions
import os
from math import factorial
from random import choices

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Survey scores to return, and the NPS value (-1, 0, +1) each score maps to
pop = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
nps_pop = [-1, -1, -1, -1, -1, -1, -1, 0, 0, 1, 1]
titles = ['Uniform - Average 5', 'Normal - Average 5', 'Bivariate - Average 5',
          'Uniform - Average 7', 'Normal - Average 7', 'Bivariate - Average 7',
          'Uniform - Average 9', 'Normal - Average 9', 'Bivariate - Average 9']
cols = ['Sample', 'Average', '1%', '5%', '10%', '25%', '50%', '75%', '90%', '95%', '99%']
pcts = [1, 5, 10, 25, 50, 75, 90, 95, 99]   # percentiles matching cols[2:]
bg = (239/255, 238/255, 236/255)            # background color for every chart
out_dir = '.'                               # folder for the saved PNGs; change as needed

# "Uniform" distributions (linear weights for the average-7 and average-9 versions)
wgtU5 = [1/11] * 11
wgtU7 = [x * 2/110 for x in pop]
wgtU9 = [0, 0, 0, 0, 0, 0, 0, .1, .2, .3, .4]


# Binomial (stand-in for normal) distributions over 0-10; the mean is 10 * p
def binom_weights(p, n=len(pop) - 1):
    return [factorial(n) / (factorial(n - x) * factorial(x)) * p**x * (1 - p)**(n - x) for x in pop]


wgtN3, wgtN5, wgtN7, wgtN8, wgtN9, wgtN10 = [binom_weights(p) for p in (.3, .5, .7, .8, .9, 1)]

# "Bivariate" (bimodal) distributions: 50/50 mixtures of two binomials
wgtB5 = [(x + y)/2 for x, y in zip(wgtN3, wgtN7)]
wgtB7 = [(x + y)/2 for x, y in zip(wgtN5, wgtN9)]
wgtB9 = [(x + y)/2 for x, y in zip(wgtN8, wgtN10)]

dist_list = [wgtU5, wgtN5, wgtB5, wgtU7, wgtN7, wgtB7, wgtU9, wgtN9, wgtB9]

# True average and true NPS of each distribution (same order as titles)
avg_total_val = [5, 5, 5, 7, 7, 7, 9, 9, 9]
nps_total_val = [np.sum([v * w for v, w in zip(nps_pop, dist)]) for dist in dist_list]


# Simulate one survey and track the running average and running NPS after each response
def sim_sample_time(sample=50, population=pop, dist=wgtN5):
    results = choices(population, dist, k=sample)
    nps = [1 if y >= 9 else (-1 if y <= 6 else 0) for y in results]
    counts = np.arange(1, sample + 1)
    return np.cumsum(results) / counts, np.cumsum(nps) / counts


# Repeat the survey many times and summarize the final average and NPS by percentile
def sim_mult_sample(iterations=100, sample=50, population=pop, dist=wgtN5):
    iters_res, iters_nps = [], []
    for _ in range(iterations):
        results_avg, nps_avg = sim_sample_time(sample=sample, population=population, dist=dist)
        iters_res.append(results_avg[-1])
        iters_nps.append(nps_avg[-1])
    summary_res = [sample, np.average(iters_res)] + [np.percentile(iters_res, q) for q in pcts]
    summary_nps = [sample, np.average(iters_nps)] + [np.percentile(iters_nps, q) for q in pcts]
    return summary_res, summary_nps


def percent_axis(ax):
    """Show y tick labels as percentages."""
    ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y)))


# Plot each distribution
fig = plt.figure(figsize=(12, 12), dpi=160, facecolor=bg, edgecolor='k')
for num, dist in enumerate(dist_list, start=1):
    ax = fig.add_subplot(3, 3, num)
    ax.set_facecolor(bg)
    ax.bar(pop, dist, align='center')
    ax.set_title(titles[num - 1])
    ax.set_xticks(pop)
    ax.set_ylim([0, .6])
    percent_axis(ax)
fig.tight_layout()
plt.savefig(os.path.join(out_dir, 'nps_dist.png'), bbox_inches='tight', facecolor=bg)

# Plot the running average (blue, left axis) and NPS (red, right axis) over 250 responses
fig = plt.figure(figsize=(12, 12), dpi=160, facecolor=bg, edgecolor='k')
for num, dist in enumerate(dist_list, start=1):
    results_avg, nps_avg = sim_sample_time(sample=250, population=pop, dist=dist)
    ax1 = fig.add_subplot(3, 3, num)
    ax2 = ax1.twinx()
    ax1.set_facecolor(bg)
    ax1.plot(results_avg, color='b')
    ax2.plot(nps_avg, color='r')
    ax1.set_title(titles[num - 1])
    ax1.set_ylim([avg_total_val[num - 1] - 2, avg_total_val[num - 1] + 2])
    ax2.set_ylim([nps_total_val[num - 1] - .4, nps_total_val[num - 1] + .4])
    percent_axis(ax2)
fig.tight_layout()
plt.savefig(os.path.join(out_dir, 'nps_overtime.png'), bbox_inches='tight', facecolor=bg)

# Plot percentile bands across 100 simulated surveys at each sample size
sample_list = [25, 50, 100, 250, 500, 1000]
pos_list = list(range(len(sample_list)))
# (column, color, line style): solid average, dashed 5%/95%, dotted 25%/75%, light dashed median
avg_lines = [('Average', 'b', '-'), ('5%', 'mediumblue', '--'), ('25%', 'lightblue', ':'),
             ('50%', 'lightblue', '--'), ('75%', 'lightblue', ':'), ('95%', 'mediumblue', '--')]
nps_lines = [('Average', 'r', '-'), ('5%', 'firebrick', '--'), ('25%', 'lightcoral', ':'),
             ('50%', 'lightcoral', '--'), ('75%', 'lightcoral', ':'), ('95%', 'firebrick', '--')]
fig = plt.figure(figsize=(12, 12), dpi=160, facecolor=bg, edgecolor='k')
for num, dist in enumerate(dist_list, start=1):
    data_res, data_nps = [], []
    for a in sample_list:
        t1, t2 = sim_mult_sample(sample=a, dist=dist)
        data_res.append(t1)
        data_nps.append(t2)
    dfr = pd.DataFrame(data_res, columns=cols)
    dfn = pd.DataFrame(data_nps, columns=cols)
    ax1 = fig.add_subplot(3, 3, num)
    ax2 = ax1.twinx()
    ax1.set_facecolor(bg)
    for col, color, style in avg_lines:
        ax1.plot(dfr[col], color=color, linestyle=style)
    for col, color, style in nps_lines:
        ax2.plot(dfn[col], color=color, linestyle=style)
    ax1.set_title(titles[num - 1])
    ax1.set_xticks(pos_list)            # set tick positions before their labels
    ax1.set_xticklabels(sample_list)
    ax1.set_ylim([avg_total_val[num - 1] - 2, avg_total_val[num - 1] + 2])
    ax2.set_ylim([nps_total_val[num - 1] - .4, nps_total_val[num - 1] + .4])
    percent_axis(ax2)
fig.tight_layout()
plt.savefig(os.path.join(out_dir, 'nps_ranges.png'), bbox_inches='tight', facecolor=bg)
```