
  • Machine Learning: how to calculate ROC and AUC in Python?

     Datura updated 3 years, 10 months ago 1 Member · 1 Post
  • Datura

    Member
    November 19, 2020 at 9:51 am

    In machine learning, the confusion matrix (with its TP, FN, TN and FP counts), the ROC curve and the AUC are important metrics for evaluating how well a classification model separates the classes and how well it predicts. In Python, scikit-learn provides functions to compute them.
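
    As a small illustration of the four counts, here is a minimal sketch using scikit-learn's confusion_matrix. The labels and predictions are made up for this example and are not part of the referenced article.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and hard predictions, made up for illustration.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# For binary labels {0, 1}, ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

tpr = tp / (tp + fn)  # true positive rate (sensitivity, recall)
fpr = fp / (fp + tn)  # false positive rate (1 - specificity)

print(f"TP={tp} FN={fn} TN={tn} FP={fp}")
print(f"TPR={tpr:.2f} FPR={fpr:.2f}")
```

    The TPR and FPR computed here are the two quantities that the ROC curve plots against each other as the decision threshold changes.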

    I recommend the reference below. It is simple, straightforward and easy to follow, which makes it a very good article for beginners:

    https://stackabuse.com/understanding-roc-curves-with-python

    I have copied only the Python code here. You can compute these metrics by following the steps below.

    Step 1: Import libraries
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    # roc curve and auc score
    from sklearn.datasets import make_classification
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_curve
    from sklearn.metrics import roc_auc_score

    Step 2: Define a Python function to plot the ROC curve.
    def plot_roc_curve(fpr, tpr):
        # Plot the model's ROC curve together with the diagonal of a random classifier.
        plt.plot(fpr, tpr, color='orange', label='ROC')
        plt.plot([0, 1], [0, 1], color='darkblue', linestyle='--')
        plt.xlabel('False Positive Rate')
        plt.ylabel('True Positive Rate')
        plt.title('Receiver Operating Characteristic (ROC) Curve')
        plt.legend()
        plt.show()

    Step 3: Generate sample data.
    # weights gives the class proportions; [0.5, 0.5] requests two balanced classes.
    data_X, class_label = make_classification(n_samples=1000, n_classes=2, weights=[0.5, 0.5], random_state=1)

    Step 4: Split the data into train and test sub-datasets.
    trainX, testX, trainy, testy = train_test_split(data_X, class_label, test_size=0.3, random_state=1)

    Step 5: Fit a model on the train data.
    model = RandomForestClassifier()
    model.fit(trainX, trainy)

    Step 6: Predict probabilities for the test data.
    probs = model.predict_proba(testX)

    Step 7: Keep the probabilities of the positive class only.
    probs = probs[:, 1]
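
    Column 1 is the positive class here because predict_proba returns one column per class, in the order given by the fitted model's classes_ attribute. The sketch below checks that ordering on a small synthetic dataset. The sample size, n_estimators and seeds are arbitrary choices for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic dataset; sizes and seeds are arbitrary choices for this sketch.
X, y = make_classification(n_samples=200, n_classes=2, random_state=1)
clf = RandomForestClassifier(n_estimators=50, random_state=1).fit(X, y)

proba = clf.predict_proba(X[:5])       # shape (5, 2): one column per class
pos_col = list(clf.classes_).index(1)  # column that belongs to label 1
pos_proba = proba[:, pos_col]          # probabilities of the positive class

print(clf.classes_)       # class labels in column order
print(proba.sum(axis=1))  # each row sums to 1
```

    Looking the column up through classes_ instead of hard-coding the index is a little more robust when the labels are not simply 0 and 1.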

    Step 8: Compute the AUC Score.
    auc = roc_auc_score(testy, probs)
    print('AUC: %.2f' % auc)

    Output (the exact value can vary slightly between runs, since the random forest is not seeded):
    AUC: 0.95
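
    One way to read this number: the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below checks that interpretation against roc_auc_score. The labels and scores are made up for this example.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical labels and scores, made up for illustration.
y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

pos = scores[y_true == 1]
neg = scores[y_true == 0]

# Compare every positive score with every negative score; ties count as half.
diff = pos[:, None] - neg[None, :]
pairwise_auc = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

print(pairwise_auc, roc_auc_score(y_true, scores))
```

    Both values agree: 8 of the 9 positive/negative pairs are ranked correctly.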

    Step 9: Get the ROC Curve.
    fpr, tpr, thresholds = roc_curve(testy, probs)
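
    To see what the three returned arrays contain, here is a minimal sketch on four made-up samples (not the data from the steps above). Each position in the arrays is one operating point of the classifier.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Hypothetical labels and scores, made up for illustration.
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr, thresholds = roc_curve(y_true, scores)

# Each row is one operating point: predict positive when score >= threshold.
# The first threshold is a sentinel above every score, giving the point (0, 0).
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")

# Trapezoidal area under the (fpr, tpr) points.
area = auc(fpr, tpr)
print(f"area={area:.2f}")
```

    The area computed from these points with auc(fpr, tpr) is the same quantity that roc_auc_score returns directly from the labels and scores.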

    Step 10: Plot the ROC curve using the function defined in Step 2.
    plot_roc_curve(fpr, tpr)

    Summary:

    The ROC curve and its AUC are among the most commonly used metrics for evaluating the performance of machine learning classifiers, particularly when the dataset is imbalanced. The code above computes the AUC and plots the ROC curve.

    • This discussion was modified 3 years, 10 months ago by  Datura.
