You may have noticed a trend in customer feedback surveys - they really want 5 stars. Four stars is garbage. Or on a 10-point scale, they really want that 9 or 10. I even saw a sign at CVS, taped to the pharmacy counter, that had a smiley face next to the 10 (and maybe the 9) and frowny faces next to all the other scores. A frowny face for an 8! That's a solid B, above average.
So what is causing this trend? The short answer: Net Promoter Score. Some marketing/consulting/probably-full-of-it company decided the average wasn't good enough for them. Neither was the median or the mode. They wanted to make an entirely new metric. In this metric, on a 0-10 scale, your 10s and 9s are worth +1. Your 8s and 7s are neutral, worth zero points. 6 and below are worth -1. Then you average your new series of -1s, 0s, and 1s to get an NPS.
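In code, the whole remapping is a few lines (a minimal sketch; the nps helper and the toy scores below are my own, not anything official):

import numpy as np

#remap 0-10 scores to -1/0/+1 and average; equivalent to (% promoters) - (% detractors)
def nps(scores):
    remapped = [1 if s >= 9 else (-1 if s <= 6 else 0) for s in scores]
    return np.mean(remapped)

print(nps([10, 9, 8, 7, 6, 3]))   #2 promoters, 2 passives, 2 detractors -> 0.0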
The concept isn't entirely bad. The question is often phrased as "On a scale of 0-10, how likely would you be to recommend us to your friends/family?" The idea is your promoters (10s and 9s) will go sing your praises and recruit their friends as customers. 8s and 7s do nothing. 6s and below are your detractors, and will probably tell people not to shop with you.
I will try to be nice; I am sure there are some practical applications for this Frankenstein calculation. But let me detail why I hate it logically, mathematically, and morally.
Why they didn't just change the responses to "likely, not likely, meh, would not, and definitely would not" is beyond me. I mean, clearly we should impose an artificial value system on the numbers instead of asking directly for what we want.
The flaw here is that most customers think an 8/10 is pretty good. Maybe I'm harsh, but even a 6 isn't horrible. I would rank a decent amount of fast food as 6/10, and I continue to eat it. I have given 4-star reviews to good items I would recommend to friends. 5 stars means perfect, and perfect is hard to accomplish.
So if the customer is giving you feedback that they interpret as meaning one thing, and you interpret it to mean a different thing, how is this effective?
This is the most frustrating part, because it is clearly what the users of this metric do not understand. There are three main areas to address:
So that was long-winded and didn't have pictures. Let's change that. Here are 9 sample probability distributions: three shapes (uniform, normal, and bimodal - the graphs label the bimodal ones "Bivariate"), each tuned to an average of 5, 7, and 9.
Since we made the distributions, we know the true mean and NPS*. Let's draw samples and see how long it takes to converge to these values. On the graphs, blue is the average and red is NPS. Averages read off the left axis and NPS off the right axis.
*Averages are 5, 5, 5, 7, 7, 7, 9, 9, 9. NPSs are -0.45, -0.82, -0.60, -0.04, -0.20, -0.05, 0.70, 0.72, 0.63
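As a sanity check on the first of those NPS values: under the flat uniform distribution, every score 0-10 has probability 1/11, so the arithmetic is short enough to do by hand (or in two lines):

#uniform case: scores 0-10 equally likely
promoters = 2/11                 #9s and 10s
detractors = 7/11                #0 through 6
print(promoters - detractors)    #-0.4545..., which rounds to the -0.45 above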
Interesting. Notice how the mean approaches the true value faster and with a bit less volatility. Also, note the scales: averages are plotted from the true mean ±2 (40% of the 0-10 scale), and NPSs from the true NPS ±40 points (again, 40% of the -100% to 100% scale).
Let's consider the middle case, the normal distribution with average 7. After sampling about 50 people, the mean starts to level out close to the true value; from that point on, your mean is always between ~6.7 and 7. However, if you stopped sampling at 50, your NPS is ~-25%. At 100, it's ~-32%. The true value is supposed to be -20%. Neither sample gave us a good representation of the NPS, and doubling our sample actually made the answer less accurate.
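Part of why the NPS line thrashes around: each respondent moves NPS in full ±1 steps, while the same respondent can only nudge the average. A rough back-of-the-envelope (my own toy numbers, not taken from the simulation):

n = 50
#one customer slides from an 8 (passive, 0) to a 6 (detractor, -1)
avg_shift = (8 - 6) / n        #0.04 points on the 0-10 scale, 0.4% of its range
nps_shift = abs(-1 - 0) / n    #0.02, i.e. 2 NPS points, 1% of the -100% to 100% range
print(avg_shift, nps_shift)

Relative to each scale, that one customer moves NPS 2.5 times as far as the average.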
But this is just one simulation. Let's run it many times: 100 simulations for each sample size. On the graph below, the x-axis is the number of people sampled. The lines show the 90% confidence interval, the interquartile range, and the average (in other words, from top to bottom: the 95th percentile, 75th percentile, average as a solid line, median as a dashed line, 25th percentile, and 5th percentile). Blue is average and red is NPS.
So, looking at the middle graph, after running 100 simulations of sampling 25 people, 90% of simulations had averages between 6.5 and 7.5 (the 5th and 95th percentile). However, NPS fell between -10% and -50% 90% of the time. Comparing the lines, you can see the average approaches the true value a lot faster and more consistently.
Why? Because it is not a stable metric. This is the same reason we don't use the mode when analyzing survey results over time. Consider a population that is all 5s and 9s, half and half. The average is 7. Now sample an odd number of people. The mode will simply be whichever score you happened to draw more of - a 50/50 chance of being a 5 or a 9. The change in the metric reflects a change in sampling, not a change in the population. The same is true of NPS. The true NPS is 0 (the 9s are +1 promoters, the 5s are -1 detractors, in equal measure). But sampling an odd number of people, you can never get an NPS of 0. Whether it lands above or below 0 depends only on the sample, and how far above or below depends only on how many people you sampled.
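Here's that coin flip in miniature (a quick toy snippet, separate from the full code at the end):

from random import choices
import numpy as np

#half the population gives 5s (detractors, -1), half gives 9s (promoters, +1); true NPS = 0
for n in (11, 101, 1001):
    sample = choices([-1, 1], [0.5, 0.5], k=n)
    print(n, np.mean(sample))   #never exactly 0 when n is odd; the sign is a coin flip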
So, as I detailed, the math of this metric is questionable at times. You can get unexpected and random results. Your score may be up or down for no reason - and that's exactly why management likes it. It moves. It does something. The average can only move so much between months (assuming nothing changes), but not NPS! A change in sample can lead to big swings. Consider how management would respond to the three scenarios:
Don't use it. It's stupid. I know people say "it's supposed to be responsive." I concede that it is responsive - but not always to a true signal. They might as well say "it's supposed to be volatile and a bit random."
If your company forces you to adopt it, let them know how unstable the metric is. And congratulations: you have realized your company is susceptible to whatever management trend is going around. Your company probably went through Lean and Six Sigma at some point, and those were slowly forgotten. Your management team wants to do better but is unclear on how they actually add value, so they will keep adopting things and waving their hands around. Ask yourself: is it working, or are they just reshuffling the deck?
Also, when filling out surveys, don't play their game. If you really like the company, know that an 8 could break their heart. Or if you had a mediocre customer service call, a 6 could count against them despite being above average.
But for the love of Euclid, let's not turn this into a Black Mirror episode where it's a 5-star-or-bust system.
#imports and initial conditions
import pandas as pd
import numpy as np
from random import choices
from math import factorial
import os
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
from matplotlib.ticker import FuncFormatter
#Survey scores to return
pop = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
nps_pop = [-1, -1, -1, -1, -1, -1, -1, 0, 0, 1, 1]
#titles = ['Uniform - Average 5','Uniform - Average 7','Uniform - Average 9',
# 'Normal - Average 5','Normal - Average 7','Normal - Average 9',
# 'Bivariate - Average 5','Bivariate - Average 7','Bivariate - Average 9']
titles = ['Uniform - Average 5','Normal - Average 5','Bivariate - Average 5',
'Uniform - Average 7','Normal - Average 7','Bivariate - Average 7',
'Uniform - Average 9','Normal - Average 9','Bivariate - Average 9']
cols = ['Sample','Average','1%','5%','10%','25%','50%','75%','90%','95%','99%']
#'Uniform' family (only the average-5 case is truly flat; the 7 and 9 cases are skewed)
wgtU5 = [1/11] * 11
wgtU7 = [x * 2/110 for x in pop]
wgtU9 = [0, 0, 0, 0, 0, 0, 0, .1, .2, .3, .4]
#Binomial ("Normal") distributions: P(X=x) = C(n,x) * p^x * (1-p)^(n-x)
n = len(pop) - 1
def binom_pmf(p):
    return [factorial(n)/(factorial(n-x)*factorial(x))*(p**x)*((1-p)**(n-x)) for x in pop]
wgtN3 = binom_pmf(.3)
wgtN5 = binom_pmf(.5)
wgtN7 = binom_pmf(.7)
wgtN8 = binom_pmf(.8)
wgtN9 = binom_pmf(.9)
wgtN10 = binom_pmf(1)
#Bimodal distributions (equal mixtures of two binomials; the plot titles call them 'Bivariate')
wgtB5 = [(x + y)/2 for x, y in zip(wgtN3, wgtN7)]
wgtB7 = [(x + y)/2 for x, y in zip(wgtN5, wgtN9)]
wgtB9 = [(x + y)/2 for x, y in zip(wgtN8, wgtN10)]
#dist_list = [wgtU5,wgtU7,wgtU9,wgtN5,wgtN7,wgtN9,wgtB5,wgtB7,wgtB9]
dist_list = [wgtU5,wgtN5,wgtB5,wgtU7,wgtN7,wgtB7,wgtU9,wgtN9,wgtB9]
nps_total_val = []
avg_total_val = [5, 5, 5, 7, 7, 7, 9, 9, 9]
#true NPS of each distribution = expected value of the -1/0/+1 remapping
for dist in dist_list:
    nps_total_val.append(np.sum([v * w for v, w in zip(nps_pop, dist)]))
#simulate one survey, tracking the running average and running NPS after each response
def sim_sample_time(sample=50, population=pop, dist=wgtN5):
    results = []
    results_avg = []
    nps = []
    nps_avg = []
    for _ in range(sample):
        y = choices(population, dist)[0]
        if y >= 9:
            z = 1        #promoter
        elif y <= 6:
            z = -1       #detractor
        else:
            z = 0        #passive
        results.append(y)
        results_avg.append(np.average(results))
        nps.append(z)
        nps_avg.append(np.average(nps))
    return results_avg, nps_avg
#run many simulated surveys at one sample size and summarize the final scores by percentile
def sim_mult_sample(iterations=100, sample=50, population=pop, dist=wgtN5):
    iters_res = []
    iters_nps = []
    for _ in range(iterations):
        results_avg, nps_avg = sim_sample_time(sample=sample, population=population, dist=dist)
        iters_res.append(results_avg[-1])
        iters_nps.append(nps_avg[-1])
    pcts = [1, 5, 10, 25, 50, 75, 90, 95, 99]
    summary_res = [sample, np.average(iters_res)] + [np.percentile(iters_res, q) for q in pcts]
    summary_nps = [sample, np.average(iters_nps)] + [np.percentile(iters_nps, q) for q in pcts]
    return summary_res, summary_nps
#plot the nine distributions
fig = plt.figure(num=None, figsize=(12, 12), dpi=160, facecolor='w', edgecolor='k')
fig.tight_layout()
num = 1
for dist in dist_list:
    ax = fig.add_subplot(3, 3, num)
    ax.set_facecolor((239/255, 238/255, 236/255))
    ax.bar(pop, dist, align='center')
    ax.set_title(titles[num-1])
    ax.set_xticks(pop)          #set tick positions before labels so the labels stick
    ax.set_xticklabels(pop)
    ax.set_ylim([0, .6])
    ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y)))
    num = num + 1
fig.patch.set_facecolor((239/255, 238/255, 236/255))
os.chdir('/Users/erikolson/Desktop/Writing Projects/Data Sci Cat Blog/Blog Graphs/')
plt.savefig('nps_dist.png', bbox_inches='tight', facecolor=(239/255, 238/255, 236/255))
#plot average and NPS as sample size increases
fig = plt.figure(num=None, figsize=(12, 12), dpi=160, facecolor='w', edgecolor='k')
fig.tight_layout()
num = 1
for dist in dist_list:
    results_avg, nps_avg = sim_sample_time(sample=250, population=pop, dist=dist)
    ax1 = fig.add_subplot(3, 3, num)
    ax2 = ax1.twinx()
    ax1.set_facecolor((239/255, 238/255, 236/255))
    ax1.plot(results_avg, color='b')
    ax2.plot(nps_avg, color='r')
    ax1.set_title(titles[num-1])
    #windows are mean ±2 (40% of the 0-10 scale) and NPS ±0.4 (40% of the -1 to 1 scale)
    ax1.set_ylim([avg_total_val[num-1]-2, avg_total_val[num-1]+2])
    ax2.set_ylim([nps_total_val[num-1]-.4, nps_total_val[num-1]+.4])
    ax2.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y)))
    num = num + 1
fig.tight_layout()
fig.patch.set_facecolor((239/255, 238/255, 236/255))
os.chdir('/Users/erikolson/Desktop/Writing Projects/Data Sci Cat Blog/Blog Graphs/')
plt.savefig('nps_overtime.png', bbox_inches='tight', facecolor=(239/255, 238/255, 236/255))
#plot confidence intervals
fig = plt.figure(num=None, figsize=(12, 12), dpi=160, facecolor='w', edgecolor='k')
fig.tight_layout()
num = 1
sample_list = [25, 50, 100, 250, 500, 1000]  #,5000,10000]
pos_list = list(range(len(sample_list)))
for dist in dist_list:
    data_res = []
    data_nps = []
    for a in sample_list:
        t1, t2 = sim_mult_sample(sample=a, dist=dist)
        data_res.append(t1)
        data_nps.append(t2)
    dfr = pd.DataFrame(data_res, columns=cols)
    dfn = pd.DataFrame(data_nps, columns=cols)
    ax1 = fig.add_subplot(3, 3, num)
    ax2 = ax1.twinx()
    ax1.set_facecolor((239/255, 238/255, 236/255))
    #averages in blue: solid mean, dark dashed 5th/95th, light dotted 25th/75th, light dashed median
    ax1.plot(dfr['Average'], color='b')
    ax1.plot(dfr['5%'], color='mediumblue', linestyle='--')
    ax1.plot(dfr['25%'], color='lightblue', linestyle=':')
    ax1.plot(dfr['50%'], color='lightblue', linestyle='--')
    ax1.plot(dfr['75%'], color='lightblue', linestyle=':')
    ax1.plot(dfr['95%'], color='mediumblue', linestyle='--')
    #NPS in red, same scheme
    ax2.plot(dfn['Average'], color='r')
    ax2.plot(dfn['5%'], color='firebrick', linestyle='--')
    ax2.plot(dfn['25%'], color='lightcoral', linestyle=':')
    ax2.plot(dfn['50%'], color='lightcoral', linestyle='--')
    ax2.plot(dfn['75%'], color='lightcoral', linestyle=':')
    ax2.plot(dfn['95%'], color='firebrick', linestyle='--')
    ax1.set_title(titles[num-1])
    ax1.set_xticks(pos_list)            #positions first, then labels
    ax1.set_xticklabels(sample_list)
    ax1.set_ylim([avg_total_val[num-1]-2, avg_total_val[num-1]+2])
    ax2.set_ylim([nps_total_val[num-1]-.4, nps_total_val[num-1]+.4])
    ax2.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y)))
    num = num + 1
fig.tight_layout()
fig.patch.set_facecolor((239/255, 238/255, 236/255))
os.chdir('/Users/erikolson/Desktop/Writing Projects/Data Sci Cat Blog/Blog Graphs/')
plt.savefig('nps_ranges.png', bbox_inches='tight', facecolor=(239/255, 238/255, 236/255))