Machine Learning: how to calculate ROC and AUC in Python?


Datura

Member

November 19, 2020 at 9:51 am

In machine learning, the confusion matrix and its four counts (TP, FN, TN, FP), the ROC curve, and the AUC are important metrics for evaluating how well a classification model separates the classes and how well it predicts. In Python, the scikit-learn package provides functions to compute all of them.
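As a quick illustration of how TP/FN/TN/FP relate to the two rates plotted on a ROC curve, here is a minimal sketch. The labels and predictions are made-up toy values (my assumption, not taken from the article), and the variable names are only for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels and hard predictions, made up for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1, 0, 1, 1, 0])

# For binary labels {0, 1}, ravel() yields the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

true_positive_rate = tp / (tp + fn)   # sensitivity / recall
false_positive_rate = fp / (fp + tn)  # 1 - specificity

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"TPR={true_positive_rate:.2f} FPR={false_positive_rate:.2f}")
```

A ROC curve is simply this (FPR, TPR) pair recomputed at many different decision thresholds.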

I recommend this reference. It is simple, straightforward, and easy to follow, which makes it a very good article for beginners:

https://stackabuse.com/understanding-roc-curves-with-python

I have copied only the Python code here. You can compute these metrics easily by following the steps below.

Step 1: Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# roc curve and auc score
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score

Step 2: Define a Python function to plot the ROC curve.
def plot_roc_curve(fpr, tpr):
    plt.plot(fpr, tpr, color='orange', label='ROC')
    plt.plot([0, 1], [0, 1], color='darkblue', linestyle='--')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver Operating Characteristic (ROC) Curve')
    plt.legend()
    plt.show()

Step 3: Generate sample data.
data_X, class_label = make_classification(n_samples=1000, n_classes=2, weights=[0.5, 0.5], random_state=1)

Step 4: Split the data into train and test sub-datasets.
trainX, testX, trainy, testy = train_test_split(data_X, class_label, test_size=0.3, random_state=1)

Step 5: Fit a model on the train data.
model = RandomForestClassifier()
model.fit(trainX, trainy)

Step 6: Predict probabilities for the test data.
probs = model.predict_proba(testX)

Step 7: Keep the probabilities of the positive class only.
probs = probs[:, 1]
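If it is unclear why column 1 is the positive class, the sketch below may help. It assumes a small synthetic dataset and a seeded forest (both my own choices, not from the article) and shows that the columns of predict_proba follow the order of the fitted model's classes_ attribute.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic dataset and seeded model, chosen only for illustration.
X, y = make_classification(n_samples=200, n_classes=2, random_state=1)
clf = RandomForestClassifier(n_estimators=20, random_state=1).fit(X, y)

proba = clf.predict_proba(X[:5])
print(clf.classes_)       # column order used by predict_proba
print(proba.shape)        # one row per sample, one column per class
print(proba.sum(axis=1))  # each row sums to 1

# Look up the positive-class column by label instead of hard-coding the index.
pos_col = list(clf.classes_).index(1)
pos_probs = proba[:, pos_col]
```

With labels 0 and 1 the positive class ends up in column 1, which is why the step above slices with [:, 1].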

Step 8: Compute the AUC Score.
auc = roc_auc_score(testy, probs)
print('AUC: %.2f' % auc)
Output (the exact value can vary between runs because the random forest is not seeded): AUC: 0.95
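One way to read the AUC number: it equals the probability that a randomly chosen positive sample gets a higher score than a randomly chosen negative one. Here is a minimal sketch that checks this against roc_auc_score, using made-up toy scores (an assumption for illustration, not data from the article).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical toy labels and scores, made up for illustration only.
y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

pos = scores[y_true == 1]
neg = scores[y_true == 0]

# Compare every positive score with every negative score; ties count as half.
diff = pos[:, None] - neg[None, :]
pairwise_auc = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

print(pairwise_auc)
print(roc_auc_score(y_true, scores))
```

Here 8 of the 9 positive/negative pairs are ranked correctly, so both values come out to 8/9, about 0.89. The pairwise version is quadratic in the number of samples, so it is only meant for understanding, not for real datasets.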

Step 9: Get the ROC Curve.
fpr, tpr, thresholds = roc_curve(testy, probs)
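To see what roc_curve actually returns, here is a small sketch on a four-sample toy example (the values are my own assumption). It prints one (FPR, TPR) point per threshold and then integrates the curve with sklearn.metrics.auc, which uses the trapezoidal rule.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Hypothetical toy labels and scores, made up for illustration only.
toy_y = np.array([0, 0, 1, 1])
toy_scores = np.array([0.1, 0.4, 0.35, 0.8])

toy_fpr, toy_tpr, toy_thresholds = roc_curve(toy_y, toy_scores)

# Each threshold t gives one point: predict positive when score >= t.
# The first threshold is an artificial value above every score, so the
# curve starts at (0, 0); its exact value depends on the scikit-learn version.
for f, t, thr in zip(toy_fpr, toy_tpr, toy_thresholds):
    print(f"threshold={thr:.2f}  FPR={f:.2f}  TPR={t:.2f}")

# Area under the piecewise-linear curve (trapezoidal rule).
area = auc(toy_fpr, toy_tpr)
print(area)
```

For binary labels this area should match what roc_auc_score reports for the same labels and scores.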

Step 10: Plot the ROC curve using the function defined in Step 2.
plot_roc_curve(fpr, tpr)

Summary:

The ROC curve and its AUC are among the most commonly used metrics for evaluating the performance of machine learning classifiers, particularly when the datasets are imbalanced. Both can be computed, and the curve plotted, with the code above.

This discussion was modified 4 years, 7 months ago by Datura.