
# Machine Learning: how to calculate ROC and AUC in Python?

### Datura

November 19, 2020 at 9:51 am

In machine learning, the confusion matrix (the TP/FN/TN/FP counts), the ROC curve and the AUC are important metrics for evaluating the discriminative power and predictive performance of a classification model. In Python, we can compute them with modeling packages such as scikit-learn.
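To make the terms concrete, here is a minimal pure-Python sketch (not taken from the referenced article) that counts TP/FP/TN/FN at a single decision threshold and derives the true positive rate and false positive rate, the two quantities an ROC curve plots. The labels, the scores, the 0.5 threshold and the helper names `confusion_counts` and `rates` are all hypothetical, and it assumes a score at or above the threshold is predicted as positive.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    """Return (tp, fp, tn, fn), predicting class 1 when score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1          # positive correctly flagged
        elif predicted == 1 and label == 0:
            fp += 1          # negative wrongly flagged
        elif predicted == 0 and label == 0:
            tn += 1          # negative correctly left unflagged
        else:
            fn += 1          # positive missed
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """True positive rate (recall) and false positive rate."""
    tpr = tp / (tp + fn)     # share of actual positives that were flagged
    fpr = fp / (fp + tn)     # share of actual negatives that were flagged
    return tpr, fpr


# Hypothetical labels and predicted probabilities of the positive class.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.6, 0.9, 0.2, 0.55]

tp, fp, tn, fn = confusion_counts(y_true, scores, threshold=0.5)
tpr, fpr = rates(tp, fp, tn, fn)
print(tp, fp, tn, fn)   # 3 1 3 1
print(tpr, fpr)         # 0.75 0.25
```

Each threshold gives one (FPR, TPR) pair; the ROC curve is what you get by repeating this for every possible threshold.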

I recommend the following reference. It is simple, straightforward and easy to follow, which makes it a very good article for beginners:

https://stackabuse.com/understanding-roc-curves-with-python

I have copied only the Python code here. You can compute the metrics easily by running it.

```python
# Step 1: Import libraries
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, roc_auc_score


# Step 2: Define a Python function to plot the ROC curve
def plot_roc_curve(fpr, tpr):
    plt.plot(fpr, tpr, color='orange', label='ROC')
    # Diagonal reference line: a classifier that guesses at random
    plt.plot([0, 1], [0, 1], color='darkblue', linestyle='--')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver Operating Characteristic (ROC) Curve')
    plt.legend()
    plt.show()


# Step 3: Generate sample data with two balanced classes.
# weights are class proportions; if they sum to more than 1, scikit-learn
# may return more than n_samples samples, so [0.5, 0.5] is used here.
data_X, class_label = make_classification(n_samples=1000, n_classes=2,
                                          weights=[0.5, 0.5], random_state=1)

# Step 4: Split the data into train and test subsets
trainX, testX, trainy, testy = train_test_split(data_X, class_label,
                                                test_size=0.3, random_state=1)

# Step 5: Fit a model on the training data (seeded for reproducibility)
model = RandomForestClassifier(random_state=1)
model.fit(trainX, trainy)

# Step 6: Predict class probabilities for the test data
probs = model.predict_proba(testX)

# Step 7: Keep the probabilities of the positive class only
probs = probs[:, 1]

# Step 8: Compute the AUC score
auc = roc_auc_score(testy, probs)
print('AUC: %.2f' % auc)
# The original post reports AUC: 0.95; the exact value depends on the data and seeds.

# Step 9: Compute the ROC curve
fpr, tpr, thresholds = roc_curve(testy, probs)

# Step 10: Plot the ROC curve using the function defined above
plot_roc_curve(fpr, tpr)
```
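As a rough illustration of what `roc_curve` and `roc_auc_score` compute, the sketch below sweeps the threshold over every distinct score, records one (FPR, TPR) point per threshold, and integrates the resulting curve with the trapezoidal rule. It is a simplified pure-Python version, not scikit-learn's implementation (for example, `roc_curve` also drops some intermediate points by default via `drop_intermediate=True`, and returns the thresholds); the helper names `roc_points` and `trapezoid_auc` and the toy data are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists, one point per distinct score used as a threshold."""
    pos = sum(y_true)              # number of positives (labels are 0/1)
    neg = len(y_true) - pos        # number of negatives
    fpr, tpr = [0.0], [0.0]        # threshold above every score: nothing predicted positive
    tp = fp = 0
    # Highest scores first: lowering the threshold past a score turns
    # every sample with that score into a predicted positive.
    pairs = sorted(zip(scores, y_true), reverse=True)
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Handle all samples tied at this score in one step.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def trapezoid_auc(x, y):
    """Area under the piecewise-linear curve through the points (x[k], y[k])."""
    return sum((x[k] - x[k - 1]) * (y[k] + y[k - 1]) / 2 for k in range(1, len(x)))


# Hypothetical labels and positive-class probabilities.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.6, 0.9, 0.2, 0.55]

fpr, tpr = roc_points(y_true, scores)
print(fpr)                       # [0.0, 0.0, 0.0, 0.25, 0.25, 0.5, 0.5, 0.75, 1.0]
print(tpr)                       # [0.0, 0.25, 0.5, 0.5, 0.75, 0.75, 1.0, 1.0, 1.0]
print(trapezoid_auc(fpr, tpr))   # 0.8125
```

Lowering the threshold can only add predicted positives, so both rates rise monotonically from (0, 0) to (1, 1). Processing tied scores together makes them form a single diagonal step, which the trapezoidal rule credits with half its area.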

Summary:

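The ROC curve and the area under it (AUC) are among the most commonly used tools for evaluating the performance of binary classifiers, particularly when the dataset is imbalanced: the true positive rate and the false positive rate are each computed within a single class, so they do not depend on the class proportions. With the code above, we can compute the AUC and plot the ROC curve easily.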
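One way to see why the class proportions do not matter: the AUC equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The sketch below (hypothetical helper `rank_auc`, toy data) computes the AUC that way, then repeats every negative ten times to make the data 10:1 imbalanced. Because the score distribution within each class is unchanged, the AUC stays the same.

```python
from itertools import product


def rank_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as 1/2. Compares every positive/negative pair, so it is
    quadratic and only meant for small illustrative inputs.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and positive-class probabilities (4 positives, 4 negatives).
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.6, 0.9, 0.2, 0.55]
print(rank_auc(y_true, scores))          # 0.8125

# Repeat each negative ten times in total: 40 negatives vs 4 positives.
neg_scores = [s for s, y in zip(scores, y_true) if y == 0]
imb_true = y_true + [0] * (9 * len(neg_scores))
imb_scores = scores + neg_scores * 9
print(rank_auc(imb_true, imb_scores))    # 0.8125
```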
