Data Science and Machine Learning in banking? Myth or reality?

Classification techniques are an essential part of Machine Learning and Data Mining applications.

It is often estimated that around 70% of problems in Data Science are classification problems.

And we bankers love classifications, don’t we? Will the borrower be able to repay the loan or not? How can we classify potential customers, and retain the existing ones based on their satisfaction? Can we detect fraud and irregularities in transaction patterns?

Data Science has some very fast, effective and fun techniques for answering those questions.

Logistic Regression

A very popular classification technique to predict binary outcomes (only two possible outcomes: 1/0, Yes/No, True/False) is called Logistic Regression.

You can also think of logistic regression as a close relative of linear regression. Linear regression gives us a real number as an outcome; logistic regression uses the log of the odds as the dependent variable, and the logistic (sigmoid) function then maps that number to a value between 0 and 1. We can interpret the result as a probability!
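To make the link between log of odds and probability concrete, here is a minimal Python sketch. The intercept and weight are hypothetical values picked purely for illustration, not coefficients from any real model, and the function names are our own.

```python
import math

def sigmoid(z):
    """Map a real-valued log-odds score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Inverse of sigmoid: the log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients, chosen for illustration only.
intercept = -1.0
weight = 0.5

def predict_probability(x):
    # The linear part gives the log of the odds;
    # the sigmoid turns it into a probability.
    return sigmoid(intercept + weight * x)
```

The linear part can be any real number, but whatever it is, the sigmoid squeezes it into the range between 0 and 1, which is why the output can be read as a probability.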

In other words, logistic regression predicts the probability of occurrence of an event by fitting data to a logistic function using a corresponding set of independent variables. It is a supervised learning technique for modelling and predicting categorical variables.

Some familiar examples of logistic regression:
• Marketing: Prediction of a customer's propensity to purchase a product, Yes/No
• Engineering: Prediction of the success of a given process, Yes/No
• Weather forecast: Will it rain, Yes/No

In banking, there are many questions that we can answer with logistic regression, such as:
• Will the client accept a pre-approved loan offer?
• Will the client repay the loan early?
• Will the client accept a higher price?
• Will the client be able to repay the whole loan?
• Will the client miss the next installment?
• Is the observed transaction fraudulent?
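
As a rough sketch of how one of these questions could be approached, take "will the client miss the next installment?". The example below fits a one-variable logistic regression with plain gradient descent. Everything in it is an assumption made for illustration: the data is an invented toy sample, the single predictor (a debt-to-income ratio) is hypothetical, and a real credit model would use many more variables and a tested library rather than hand-written code.

```python
import math

def sigmoid(z):
    """Map a log-odds score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=1.0, epochs=5000):
    """Fit an intercept and a slope by gradient descent on the average log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            # Difference between predicted probability and actual outcome.
            error = sigmoid(b0 + b1 * x) - y
            grad0 += error
            grad1 += error * x
        b0 -= learning_rate * grad0 / n
        b1 -= learning_rate * grad1 / n
    return b0, b1

# Invented toy data (NOT real client data):
# hypothetical debt-to-income ratio, and whether the next
# installment was missed (1) or paid (0).
debt_to_income = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
missed = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(debt_to_income, missed)

def predict(x):
    """Estimated probability of a missed installment for ratio x."""
    return sigmoid(b0 + b1 * x)
```

Calling predict(0.8) gives a higher estimated probability of a missed installment than predict(0.1), because in this toy sample the missed payments cluster at the higher ratios.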

Therefore, speaking the language of logistic regression, if we have three variables: X1 – profession, X2 – the subject of prediction and X3 – the model used for prediction, then for the input (banker; credit scoring model; logistic regression) the outcome would most certainly be: happy, with a probability of 100%.