{"id":15655,"date":"2021-11-11T14:00:00","date_gmt":"2021-11-11T22:00:00","guid":{"rendered":"https:\/\/devwww.3cloudsolutions.com\/post\/machine-learning-in-loan-risk-analysis-3\/"},"modified":"2024-01-08T10:42:40","modified_gmt":"2024-01-08T18:42:40","slug":"machine-learning-in-loan-risk-analysis","status":"publish","type":"post","link":"https:\/\/3cloudsolutions.com\/resources\/machine-learning-in-loan-risk-analysis\/","title":{"rendered":"Machine Learning in Loan Risk Analysis"},"content":{"rendered":"<p><span style=\"background-color: transparent;\" data-contrast=\"auto\">While Finance may be a complex and multifaceted industry, the core goal of any financial institution is straightforward: to detect and mitigate risks while maximizing profit. This objective is easy to summarize, but it is certainly no small feat to attain!<br \/>\n<\/span><\/p>\n<p><span style=\"background-color: transparent;\" data-contrast=\"auto\"><img decoding=\"async\" style=\"width: 1000px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/10\/iStock-1189050206-1.jpg\" alt=\"iStock-1189050206-1\" width=\"1000\" \/><\/span><\/p>\n<p><span style=\"background-color: transparent;\" data-contrast=\"auto\"><br \/>\nThe ever-expanding role of technology in the Finance sector poses several risks to consumers, which directly affect an organization\u2019s reputation. Furthermore, as the reach of innovative technology expands, so too does market size. According to IBIS, market size (as measured by revenue) is slated to increase by 4.4% in 2021, driven by higher per capita disposable income. Unfortunately, this growth comes with greater risk: more clients with more perceived disposable income can mean higher loan risk. 
Essentially, financial institutions face a dual challenge: they must mitigate risks to their customers while also mitigating the risks their customers pose to them. <\/span><span style=\"background-color: transparent;\" data-ccp-props=\"{\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">To combat these challenges, many financial institutions are turning to <span style=\"color: #007cba;\"><a style=\"color: #007cba;\" href=\"https:\/\/3cloudsolutions.com\/resources\/machine-learning-models-what-you-need-to-know\/\" rel=\"noopener\">Machine Learning<\/a><\/span>. Machine Learning is a branch of Artificial Intelligence (AI) in which computer algorithms improve and \u201clearn\u201d automatically from training data. The resulting models can then be applied to new data to make predictions or decisions relevant to the business. <\/span><\/p>\n<p><span data-contrast=\"auto\">In <a href=\"\/blog\/detecting-financial-fraud-with-machine-learning\" rel=\"noopener\">our last financial risk mitigation post<\/a>, we demonstrated how Machine Learning can reduce risks to customers by using classification methods to detect fraudulent charges. <\/span><span data-contrast=\"auto\">In this post, we&#8217;ll show how financial companies can protect themselves from potentially risky customers by applying Machine Learning methods to loan applications. 
To demonstrate the benefits of Machine Learning in loan risk analysis, we&#8217;ll walk through the process of building a simple Gradient-Boosted Tree (GBT) model in <a href=\"\/blog\/microsoft-and-databricks-top-5-modern-data-platform-features-part-1\" rel=\"noopener\">Azure Databricks<\/a>.<\/span><\/p>\n<h3>Preparing the Data<\/h3>\n<p><span lang=\"EN-US\" data-contrast=\"auto\">This notebook uses <a href=\"https:\/\/www.kaggle.com\/wordsforthewise\/lending-club\" rel=\"noopener\">a public dataset from Lending Club<\/a> consisting of 2,260,701 funded loans from 2012 through 2017. Each loan includes information provided by the applicant, as well as the current loan status and latest payment information, as shown in the Databricks screenshot below:<\/span><\/p>\n<p><img decoding=\"async\" style=\"width: 650px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/10\/image-png-Nov-10-2021-08-14-29-38-PM.png\" width=\"650\" \/><\/p>\n<p><span lang=\"EN-US\" data-contrast=\"auto\"><br \/>\nOnce the data is loaded into the notebook, we can perform data cleansing (e.g., handling null values and converting columns to appropriate types) and feature engineering (creating new, potentially insightful variables derived from existing ones). 
Since the objective is to predict which loans are most likely to default, we need to create a target label.<\/span><\/p>\n<p>This label column, named <em><span lang=\"EN-US\" style=\"font-weight: bold;\" data-contrast=\"auto\">bad_loan<\/span><\/em><span lang=\"EN-US\" data-contrast=\"auto\">, encodes paid-off loans as 0 and defaulted loans as 1. We also create a <\/span><em><span lang=\"EN-US\" style=\"font-weight: bold;\" data-contrast=\"auto\">net<\/span><\/em><span lang=\"EN-US\" data-contrast=\"auto\"> amount column (<\/span><em><span lang=\"EN-US\" style=\"font-weight: bold;\" data-contrast=\"auto\">total_payments<\/span><\/em><span lang=\"EN-US\" data-contrast=\"auto\"> \u2013 <\/span><em><span lang=\"EN-US\" style=\"font-weight: bold;\" data-contrast=\"auto\">loan_amount<\/span><\/em><span lang=\"EN-US\" data-contrast=\"auto\">), which will be useful for evaluating the solution\u2019s overall business value. For this model, we analyze only loans classified as closed.<\/span><span data-ccp-props=\"{\">\u00a0<\/span><\/p>\n<h3>Exploring the Data<\/h3>\n<p><span style=\"color: #58595b; font-size: 17px;\"><span lang=\"EN-US\" data-contrast=\"auto\">With the preprocessing phase complete, we can move on to data exploration to better understand the data\u2019s distributions and relationships. 
As expected, the map below shows higher loan counts in the more populous states.<\/span>\u00a0<\/span><\/p>\n<p><img decoding=\"async\" style=\"width: 650px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/10\/image-png-Nov-10-2021-08-27-48-69-PM.png\" width=\"650\" \/><\/p>\n<p><span style=\"color: #58595b; font-size: 17px;\"><span lang=\"EN-US\" data-contrast=\"auto\"><br \/>\nWe can also see that bad loans account for 20% of our cases; however, their average loan amount is nearly identical to that of paid-off loans.<\/span><span data-ccp-props=\"{\">\u00a0<\/span><\/span><\/p>\n<p><img decoding=\"async\" style=\"width: 650px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/10\/image-png-Nov-10-2021-08-28-21-48-PM.png\" width=\"650\" \/><\/p>\n<p><span style=\"color: #58595b; font-size: 17px;\"><span data-ccp-props=\"{\"><span lang=\"EN-US\" data-contrast=\"auto\"><br \/>\nExamining the distributions of individual variables, as well as possible correlations between them, also benefits the modeling process. Although GBT predictions are robust to multicollinearity, removing highly linearly correlated variables could improve statistical inferences such as feature importance. Below is a scatter plot of several of the numerical independent variables. Annual income appears more right-skewed than the other variables, and no significant relationships between the variables are apparent. <\/span>\u00a0<\/span><\/span><\/p>\n<p><img decoding=\"async\" style=\"width: 500px; margin-left: auto; margin-right: auto; display: block;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/10\/image-png-Nov-10-2021-08-29-12-04-PM.png\" width=\"500\" \/><\/p>\n<h3>Building, Training, and Testing the Model<\/h3>\n<p><span lang=\"EN-US\" data-contrast=\"auto\">With this knowledge of our dataset, we are now ready to build, train, and test a model. 
We start by identifying our independent variables (the features) and our dependent variable (the <em>bad_loan<\/em> label), then split the data into training and validation sets. Holding out a validation set lets us see how well the model performs on data it has not yet encountered.<\/span><\/p>\n<p><span data-ccp-props=\"{\"> <img decoding=\"async\" style=\"width: 500px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/10\/image-png-Nov-10-2021-08-33-34-81-PM.png\" width=\"500\" \/><br \/>\n<img decoding=\"async\" style=\"width: 500px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/10\/image-png-Nov-10-2021-08-33-48-99-PM.png\" width=\"500\" \/><br \/>\n<\/span><\/p>\n<p><span data-ccp-props=\"{\"><span lang=\"EN-US\" data-contrast=\"auto\"><br \/>\nNext, we assemble the feature columns into a single input vector and define a GBT classifier with hardcoded parameters chosen based on the data size. Using a pipeline, we then fit these stages on the training set. <\/span>\u00a0<\/span><\/p>\n<p><img decoding=\"async\" style=\"width: 600px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/10\/image-png-Nov-10-2021-08-34-09-08-PM.png\" width=\"600\" \/><\/p>\n<p><span data-ccp-props=\"{\"><span lang=\"EN-US\" data-contrast=\"auto\"><br \/>\nWe can evaluate model performance with the Area Under the ROC Curve (AUC) score, where 100% indicates a perfect model and 50% is no better than random guessing. On both the training and validation sets, the model achieves an AUC of approximately 70%. That is reasonable performance, considering that, for simplicity\u2019s sake, we tested only one parameter set. 
In standard practice, we would use a parameter grid search with cross-validation to find the optimal parameters.<\/span>\u00a0<\/span><\/p>\n<p><img decoding=\"async\" style=\"width: 350px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/10\/image-png-Nov-10-2021-08-35-46-51-PM.png\" width=\"350\" \/><\/p>\n<p><span data-ccp-props=\"{\"><span lang=\"EN-US\" data-contrast=\"auto\"><br \/>\nIn addition to accurate predictions, knowing which features contribute most to the model can help drive business decisions. In our model, the loan term and the state are the most important features.<\/span>\u00a0<\/span><\/p>\n<p><img decoding=\"async\" style=\"width: 650px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/10\/image-png-Nov-10-2021-08-37-07-05-PM.png\" width=\"650\" \/><\/p>\n<p><span data-ccp-props=\"{\"><span lang=\"EN-US\" data-contrast=\"auto\"><br \/>\nFinally, we can evaluate whether the main objective, reducing the company\u2019s loan risk, was achieved. By combining the confusion matrix with the <em>net<\/em> column we created earlier, we can estimate the model\u2019s monetary value. In this case, the model saved approximately $22 million by correctly labeling bad loans. Even after accounting for approximately $5 million lost to false positives (good loans incorrectly labeled as bad), the overall net value of the model is about $17 million.<\/span>\u00a0<\/span><\/p>\n<p><img decoding=\"async\" style=\"width: 500px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/10\/image-png-Nov-10-2021-08-41-43-52-PM.png\" width=\"500\" \/><\/p>\n<p><span data-ccp-props=\"{\"><span lang=\"EN-US\" data-contrast=\"auto\"><br \/>\nOverall, Machine Learning is an excellent resource for navigating the challenging financial sector. 
It can not only help companies better serve their clients but also keep them from investing in risky scenarios.<\/span>\u00a0<\/span><\/p>\n<h4>More Information<\/h4>\n<p>3Cloud offers a variety of <a href=\"\/resources\/\" target=\"_blank\" rel=\"noopener\">resources<\/a> to help you learn how you can leverage Machine Learning in your sector. Please <a href=\"\/get-started\/\" target=\"_blank\" rel=\"noopener\">contact us<\/a> directly to see how we can help you explore your modern data analytics options and accelerate your business value.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>While Finance may be a complex and multifaceted industry, the core goal of any financial institution is straightforward: to detect and mitigate risks while maximizing profit. This objective is easy to summarize, but it is certainly no small feat to attain!<\/p>\n","protected":false},"author":21,"featured_media":12358,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[395,260],"tags":[308,309,304],"class_list":["post-15655","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science-ai","category-data-ai","tag-financial-services","tag-modern-ai-ml","tag-modern-data-platform","topics-blog","industries-financial-services"],"acf":[],"_links":{"self":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15655","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/comments?post=15655"}],"version-history":[{"count":0,"href":"https:\/\/3cloudsolutions.com\/wp-jso
n\/wp\/v2\/posts\/15655\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media\/12358"}],"wp:attachment":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media?parent=15655"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/categories?post=15655"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/tags?post=15655"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}