Sensitivity vs. Specificity in Logistic Regression

Logistic regression is a statistical technique with wide application in business. It is one of the most commonly used techniques, especially for building marketing strategies.

One business example is identifying the best set of customers to target in a promotional activity. Suppose a telecom company wants to run a new promotion but has a limited budget, enough to market it to, let's say, 10,000 customers. Given its whole customer population, the company is interested in those customers who are most likely to respond to the offer. This is where results from a logistic regression model come in handy. Not only does the model help in studying the attributes that drive customers to respond to promotions, it also helps identify the set of customers most likely to respond.
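As a rough sketch of the targeting step, assume a fitted model has already produced a response probability for each customer. Picking the customers to contact then amounts to sorting by that probability and keeping as many as the budget allows. The customer IDs, the scores, and the function name select_top_customers are made up for illustration and do not come from any real model.

```python
def select_top_customers(scores, budget):
    """Return the IDs of the `budget` customers with the highest scores.

    `scores` maps a customer ID to a predicted response probability.
    The values used below are hypothetical; in practice they would come
    from the fitted logistic regression model.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:budget]


# Illustrative scores only, not real model output.
example_scores = {
    "C001": 0.82,
    "C002": 0.15,
    "C003": 0.47,
    "C004": 0.91,
    "C005": 0.33,
}

# With a budget of 2 contacts, keep the two highest-scoring customers.
top_customers = select_top_customers(example_scores, budget=2)
print(top_customers)
```

In the telecom example the budget would be 10,000 rather than 2, but the logic is the same.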

In a binary setup, the dependent (target) variable in a logistic regression is whether a customer responds or does not respond, and the model estimates the probability of a response. We have to evaluate these probabilities against a real set of data: in short, what actually happened vs. what the model is telling us. Based on the fit of the model, we can then use its results for predictions and identify the best set of customers.

There are a number of methods for evaluating whether a logistic model is a good model. One such way is to look at sensitivity and specificity. In theory, this is how the two terms are defined:

Sensitivity and specificity are statistical measures of the performance of a binary classification test, also known in statistics as a classification function:

Sensitivity (also called the true positive rate, or the recall in some fields) measures the proportion of actual positives which are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition), and is complementary to the false negative rate. Sensitivity = true positives / (true positives + false negatives)

Specificity (also called the true negative rate) measures the proportion of actual negatives which are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition), and is complementary to the false positive rate. Specificity = true negatives / (true negatives + false positives)
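To make the two formulas concrete, here is a minimal sketch that counts true and false positives and negatives from a list of actual outcomes and a list of predicted outcomes, then applies the definitions above. The ten labels are invented for illustration (1 = responded, 0 = did not respond), and the function names are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1  # responder correctly flagged
        elif a == 0 and p == 1:
            fp += 1  # non-responder wrongly flagged
        elif a == 0 and p == 0:
            tn += 1  # non-responder correctly left out
        else:
            fn += 1  # responder the model missed
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    # true positives / (true positives + false negatives)
    return tp / (tp + fn)


def specificity(tn, fp):
    # true negatives / (true negatives + false positives)
    return tn / (tn + fp)


# Made-up historical outcomes and model predictions.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(actual, predicted)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Here the model finds 3 of the 4 actual responders (sensitivity 0.75) and correctly leaves out 5 of the 6 non-responders (specificity about 0.833).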

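For any test, there is usually a trade-off between the two measures. For example, lowering the probability cut-off above which a customer is classified as a responder typically raises sensitivity and lowers specificity.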
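The trade-off can be seen by turning predicted probabilities into yes/no predictions at different cut-offs. The sketch below assumes a customer is classified as a responder when the predicted probability is at least the cut-off; the probabilities, the three cut-off values, and the function name rates_at_threshold are all invented for illustration.

```python
def rates_at_threshold(actual, probabilities, threshold):
    """Return (sensitivity, specificity) when a customer is classified
    as a responder if the predicted probability is >= threshold."""
    tp = fp = tn = fn = 0
    for a, prob in zip(actual, probabilities):
        predicted = 1 if prob >= threshold else 0
        if a == 1 and predicted == 1:
            tp += 1
        elif a == 1 and predicted == 0:
            fn += 1
        elif a == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Made-up outcomes (1 = responded) and model probabilities.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
probabilities = [0.9, 0.7, 0.6, 0.3, 0.65, 0.4, 0.2, 0.1]

results = {}
for threshold in (0.25, 0.5, 0.75):
    results[threshold] = rates_at_threshold(actual, probabilities, threshold)
    sens, spec = results[threshold]
    print(f"cut-off={threshold}: sensitivity={sens}, specificity={spec}")
```

On this toy data, raising the cut-off from 0.25 to 0.75 moves sensitivity from 1.0 down to 0.25 while specificity rises from 0.5 to 1.0.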

Now how do we apply this in business terms? The assumption here is that you are running this test on historical data, where the actual outcomes are known. If a model is good, you want it to predict a response when in reality the customer has responded. The proportion of actual responders that the model identifies correctly is its sensitivity. It is also possible that a customer responded but the model predicted no response (a false negative), or vice versa (a false positive). The concept is similar to that of Type I and Type II errors in statistics: a false positive corresponds to a Type I error and a false negative to a Type II error. Similarly, if the customer has not responded, you want the model to reflect that as well (this is specificity), so that you can make actionable business decisions based on the data.

One way to check whether your model is a good model is to see whether it has both high sensitivity and high specificity. There is a trade-off between the two, and the right balance depends on the business situation.

But in general, among the measures used for a logistic model, the higher the sensitivity, the better.


