Logistic Regression in Python – An Introduction
Logistic Regression in Python – An Introduction
Logistic regression is a statistical method for classifying objects. This chapter introduces logistic regression with some examples.
Classification
To understand logistic regression, you should know what classification means. Let’s consider the following examples to better understand this:
- A doctor classifies a tumor as malignant or benign.
- A bank transaction can be fraudulent or genuine.
Humans have been performing such tasks for many years – albeit error-prone. The question is, can we train machines to perform these tasks for us with better accuracy?
An example of machine classification is your email client, which classifies each incoming email as “spam” or “not spam” with a high degree of accuracy. The statistical technique of logistic regression has been successfully applied to email clients. In this case, we’ve trained our machine to solve a classification problem.
Logistic regression is just one type of machine learning used to solve binary classification problems like this. There are other machine learning techniques that have been developed and used in practice to solve other types of problems.
If you’ve noticed, in all of the examples above, the predicted outcome has only two values—yes or no. We call these classes—so we say that our classifier separates objects into two classes. In technical terms, we can say that the outcome or target variable is dichotomous in nature.
There are also classification problems where the output may be divided into more than two categories. For example, imagine you’re given a basket full of fruit and asked to separate the different types of fruit. Now, the basket might contain oranges, apples, mangoes, and so on. Therefore, when you separate the fruit, you’re dividing them into more than two categories. This is a multivariate classification problem.