Illustrating Classification
Today’s example will build on material in the “Content” tab.
Some (F23) Business
I won’t be here the Thursday after Thanksgiving
We will meet the Tuesday before Thanksgiving
Predicting Defaults
Today, we will continue to use the ISLR data on defaults:
library(ISLR)
library(tibble)
Default <- ISLR::Default
In our first breakout:

- Clean the data so that the default column is a binary indicator for default.
- Build a logistic model to predict default using any combination of variables and interactions in the data. For now, just use your best judgment for choosing the variables and interactions – your lab for this week will have you formally tune a logistic model to optimize results.
- Use a Bayes classifier cutoff of 0.50 to generate your classifier output. Do you need to alter the cutoff?
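One way these steps might look in R. The variables and interaction in the model formula below (balance, income, balance:student) are illustrative placeholders, not the recommended specification – substitute your own choices:

```r
library(ISLR)

# Start from the raw data
Default <- ISLR::Default

# 1. Recode default as a 0/1 indicator (1 = defaulted)
Default$default <- ifelse(Default$default == "Yes", 1, 0)

# 2. Fit a logistic model; these predictors are an illustrative choice
fit <- glm(default ~ balance + income + balance:student,
           data = Default, family = binomial)

# 3. Classify using a Bayes classifier cutoff of 0.50
#    on the predicted probabilities
p_hat <- predict(fit, type = "response")
y_hat <- ifelse(p_hat >= 0.50, 1, 0)

# Confusion matrix: rows = truth, columns = prediction
table(truth = Default$default, predicted = y_hat)
```

Because defaults are rare in this data, the 0.50 cutoff tends to predict very few defaults, which is worth keeping in mind when you answer the cutoff question above.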
Back in class, let’s look at how we did. What variables were most useful in explaining default?
In our second breakout, we will create a ROC curve manually. To do this:

- Take your model from the first breakout, and using a loop (or sapply), step through a large number of possible cutoffs for classification ranging from 0 to 1.
- For each cutoff, generate a confusion matrix with accuracy, sensitivity, and specificity.
- Combine the cutoff with the sensitivity and specificity results and make a ROC plot. Use ggplot for your plot and map the color aesthetic to the cutoff value.
- Calculate the AUC (the area under the curve). This is a little tricky but can be done with your data.
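The steps above can be sketched as follows. The model here (balance and income as predictors) is a placeholder for whatever you built in the first breakout, and the trapezoid rule is one reasonable way to approximate the AUC from the cutoff grid:

```r
library(ISLR)
library(ggplot2)

# Cleaned data and a placeholder logistic model from the first breakout
Default <- ISLR::Default
y <- ifelse(Default$default == "Yes", 1, 0)
fit <- glm(y ~ balance + income, data = Default, family = binomial)
p_hat <- predict(fit, type = "response")

# Step through many cutoffs; for each, compute accuracy,
# sensitivity (true-positive rate), and specificity (true-negative rate)
cutoffs <- seq(0, 1, by = 0.01)
roc_df <- do.call(rbind, lapply(cutoffs, function(cut) {
  y_hat <- as.numeric(p_hat >= cut)
  data.frame(
    cutoff      = cut,
    accuracy    = mean(y_hat == y),
    sensitivity = sum(y_hat == 1 & y == 1) / sum(y == 1),
    specificity = sum(y_hat == 0 & y == 0) / sum(y == 0)
  )
}))

# ROC plot: 1 - specificity (FPR) vs sensitivity (TPR),
# with the color aesthetic mapped to the cutoff
ggplot(roc_df, aes(x = 1 - specificity, y = sensitivity, color = cutoff)) +
  geom_line() +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  labs(x = "1 - Specificity", y = "Sensitivity")

# AUC via the trapezoid rule: sort points by the x-coordinate,
# then sum the trapezoid areas between adjacent points
ord <- order(1 - roc_df$specificity, roc_df$sensitivity)
x  <- (1 - roc_df$specificity)[ord]
tp <- roc_df$sensitivity[ord]
auc <- sum(diff(x) * (head(tp, -1) + tail(tp, -1)) / 2)
auc
```

An AUC near 0.5 would mean the classifier is no better than chance, while values near 1 indicate strong separation between defaulters and non-defaulters.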