To continue on this first path, it’s logical to proceed with hyperparameter tuning on the three algorithms previously mentioned in part 1.
Here the Random Forest Classifier (R.F.C) pulls ahead with 77% accuracy, while the other two stay around 75%. Where there were three on this road, there is now one. The next step was to evaluate the R.F.C with some more nuanced classification metrics.
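As a rough sketch of what a tuning step like this could look like with scikit-learn: the synthetic data, the parameter grid, and the 3-fold cross-validation below are all stand-ins chosen for illustration, not the settings behind the numbers above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the credit data (assumption: the real features are not shown here).
X, y = make_classification(n_samples=400, n_features=10, n_informative=5,
                           weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Hypothetical search grid; the values actually tuned are not given in the text.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 6, None],
    "min_samples_leaf": [1, 5],
}

# Try every combination with cross-validation, then refit the best one on all training rows.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)  # accuracy of the refit best model
print(best_params, round(test_accuracy, 3))
```

The same pattern would apply to the other two algorithms by swapping in a different estimator and grid.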
My model has 94% accuracy on the training set, but the bulk of its mistakes are false negatives, i.e., predicting people with bad credit to have good credit. Consequently, my recall for the positive class (bad credit) is low at 81%. As an overall metric, the A.U.C score is 90%, which is good. I’ll use this as my guide as I move forward since it best embodies my overall goal. The A.U.C score is born of a comparison between the true positive and false positive rates, and reducing the number of bad credit risks projected as having good credit is intrinsic to the former.
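To make the link between A.U.C and the two rates concrete, here is a minimal pure-Python sketch. It builds the ROC curve by sweeping a threshold over the predicted probabilities and takes the area under it with the trapezoid rule. The labels and scores are toy values invented for the example (1 stands for bad credit), and the function names are my own.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per distinct threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoid rule over consecutive ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data: 1 = bad credit (positive class), 0 = good credit.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = area_under_curve(roc_points(y_true, scores))
print(round(auc, 4))
```

Every false negative removed pushes a point on this curve upward, which is why a higher true positive rate at the same false positive rate raises the area.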
To find the best way to raise the true positive rate of my model, a look at thresholds was in order. Most classification algorithms produce probabilities and use 0.5 as the threshold for making a classification. My goal was to look at alternate thresholds and see how these affected the model’s results. There is an increased chance of overfitting the training data with this procedure, but it’s a chance worth taking.
Across a hundred different thresholds from 0.1 to 0.9, my highest A.U.C score of almost 96% came with a threshold of 0.3667. To elaborate, this means that if the model returns a probability greater than or equal to 36.67%, it predicts that the person will have bad credit.
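A threshold search of this kind can be sketched in a few lines of plain Python. Two assumptions are baked in. First, the grid is 100 evenly spaced values from 0.1 to 0.9; such a grid does contain 0.1 + 33 × 0.8/99 ≈ 0.36667, which matches the threshold reported above, but the exact grid used is not stated. Second, each threshold is scored by the A.U.C of its hard 0/1 predictions, which works out to the mean of the true positive and true negative rates. The labels and probabilities are toy values.

```python
def hard_prediction_auc(y_true, y_pred):
    """A.U.C of 0/1 predictions: area under (0,0) -> (FPR, TPR) -> (1,1),
    which simplifies to (TPR + TNR) / 2."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return (tp / positives + tn / negatives) / 2


def classify(probs, threshold):
    """Flag bad credit (1) when the predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


# Assumed grid: 100 evenly spaced thresholds from 0.1 to 0.9.
thresholds = [0.1 + i * 0.8 / 99 for i in range(100)]

# Toy data: 1 = bad credit. In practice probs would come from model.predict_proba.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.8, 0.6, 0.45, 0.35, 0.5, 0.3, 0.25, 0.2, 0.15, 0.05]

best_threshold, best_score = None, -1.0
for t in thresholds:
    score = hard_prediction_auc(y_true, classify(probs, t))
    if score > best_score:  # keep the first threshold that reaches the top score
        best_threshold, best_score = t, score

default_score = hard_prediction_auc(y_true, classify(probs, 0.5))
print(round(best_threshold, 4), round(best_score, 4), round(default_score, 4))
```

Because the threshold is chosen on the same data the model was trained on, the winning score is optimistic by construction, so the check against held-out data matters.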
Using this threshold gives me a better-looking confusion matrix. I’m making more false positives (predicting those who will have good credit to have bad credit), but my false negatives are down to 5/226.
All this improvement means little if the model doesn’t perform well with unseen data. Unfortunately, the model with the custom threshold only had 72% accuracy on the testing data, while the default threshold produced an accuracy of close to 78%. However, the custom threshold did better at reducing the number of false negatives: 28/74 compared to 44/74.
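Those false negative counts translate directly into recall for the bad-credit class. The short calculation below assumes that the denominators (226 and 74) are the numbers of people who actually have bad credit in the training and testing sets; the helper name is my own.

```python
def recall_from_false_negatives(false_negatives, actual_positives):
    """Recall = correctly flagged positives / all actual positives."""
    return (actual_positives - false_negatives) / actual_positives


# Assumption: 226 and 74 count the actual bad-credit cases in each split.
train_custom = recall_from_false_negatives(5, 226)   # custom threshold, training data
test_custom = recall_from_false_negatives(28, 74)    # custom threshold, testing data
test_default = recall_from_false_negatives(44, 74)   # default 0.5 threshold, testing data

for label, value in [("train, custom", train_custom),
                     ("test, custom", test_custom),
                     ("test, default", test_default)]:
    print(f"{label}: {value:.1%}")
```

Under that reading, recall on the testing data is about 62.2% with the custom threshold against about 40.5% with the default, while the 97.8% on the training data shows how much of the gain did not carry over.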
Obviously there is much more room for improvement here, but another piece of information to extract from the model is feature importance. In general, the random forest model is low on the explainability scale, but looking at how much each feature contributes to its decisions is a way to soften its intrinsic black-box nature. The five most important features in the model’s decisions were:
- Amount of credit requested
- The credit duration (in months)
- Age
- Having no money in one’s checking account
- Having no checking account
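A ranking like the one above can be read from a fitted scikit-learn random forest through its `feature_importances_` attribute. The sketch below runs on synthetic data, and the column names are hypothetical labels that merely echo the list, so the order it prints says nothing about the real credit data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical column names, attached to synthetic data for illustration only.
feature_names = [
    "credit_amount",
    "duration_months",
    "age",
    "checking_balance_zero",
    "no_checking_account",
]

X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=1, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances: one non-negative value per feature, summing to 1.
ranked = sorted(zip(feature_names, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)

for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

These scores describe how much each feature helped the trees split the training data, not the direction of its effect, which is part of why the model remains hard to explain.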
Even with this glimpse into the model’s inner workings, a more interpretable model would probably be preferable in such a setting for its greater descriptive power. Beyond ensuring that those with a high risk of having bad credit aren’t accepted, it all comes down to the particulars of the environment the problem is being solved in.
©ODSC 2016