In a recent post, we discussed how artificial intelligence is assisting fintech in gaining a greater toehold in the financial services sector. In this post, we review a similar type of technology – predictive analytics – and its increasing use in debt collection.

As long ago as 2013, Forbes magazine reported that organizations using data analytics were five to six percent more profitable than their competitors.

First, let’s be clear about what data analytics consists of. At its most basic, it’s the science of analyzing raw data in order to draw conclusions about that information. Many of the techniques have been automated into algorithms that process raw data to reveal trends and metrics, which are then used to optimize business processes.

As of late 2017, the Federal Reserve Bank of New York reported over $600 billion of U.S. household debt was delinquent, with $400 billion of that in delinquency for more than 90 days. This is not only a consumer problem, but also a problem for companies owning the debt, as delinquent payments significantly cut into revenue and increase the cost of credit for everyone.

Traditionally, some parts of debt collection, such as letters and emails, have been automated, while others still require human collectors and phone or in-person outreach. While an experienced debt collector can decipher a consumer’s problems and determine the best course of action to obtain repayment, the large number of open cases that collection agencies typically handle makes such case-by-case handling inefficient.

As an example of the use of data analytics in debt collection, we turn to Oracle, the multi-national computer technology company.

Data scientists from the company developed a machine-learning model based on a dataset of 80,000 consumers of a single insurance company, covering 2014 to 2016. They examined basic information about each case and a log of interactions between consumer and collector.

The team developed an algorithm to directly predict the eventual collection outcome from any point in time. This allowed them to estimate the value of calling a consumer at any time by calculating the difference in expected eventual repayment with and without calling the consumer.
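To make the "value of calling" idea concrete, here is a minimal sketch in Python. It is not the Oracle team's model: the case fields (`amount_due`, `days_in_collection`, `calls_made`) and the `toy_predict_repayment` function are hypothetical stand-ins we made up for illustration, and a real system would plug in a trained model instead.

```python
import math


def toy_predict_repayment(state):
    """Hypothetical stand-in for a trained model.

    Returns a made-up probability of eventual repayment for a case state.
    The coefficients are invented for illustration only.
    """
    score = 0.5 - 0.04 * state["days_in_collection"] + 0.6 * state["calls_made"]
    return 1.0 / (1.0 + math.exp(-score))


def value_of_calling(state, predict_repayment):
    """Expected eventual repayment with one more call, minus without it."""
    # Copy the state so the caller's record is not modified.
    state_with_call = {**state, "calls_made": state["calls_made"] + 1}
    expected_without = predict_repayment(state) * state["amount_due"]
    expected_with = predict_repayment(state_with_call) * state["amount_due"]
    return expected_with - expected_without


if __name__ == "__main__":
    case = {"amount_due": 1000.0, "days_in_collection": 25, "calls_made": 1}
    print(round(value_of_calling(case, toy_predict_repayment), 2))
```

The key design point is that the same predictor is evaluated twice on the same case, once as it is and once as it would look after a call, so the difference isolates the estimated effect of the call itself.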

The figure below shows that consumers’ likelihood of repayment can be predicted through machine learning. It plots the Receiver Operating Characteristic (ROC) curve of the eventual repayment predictions for consumers who are 25 days into the collection process.


While the prediction performance isn’t great, it still shows that eventual repayment is predictable, and demonstrates the potential in this approach.
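For readers unfamiliar with ROC curves, the sketch below shows how one can be computed from predicted repayment scores and observed outcomes, along with the area under the curve (AUC), where 0.5 corresponds to random guessing and 1.0 to perfect ranking. The function names and the sample data are our own illustrative assumptions, not figures from the study.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    labels: 1 if the consumer eventually repaid, 0 otherwise.
    scores: predicted repayment likelihood for each consumer.
    Assumes at least one repaid and one unpaid case.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        # Consume all cases sharing the same score as one threshold step.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                true_pos += 1
            else:
                false_pos += 1
            j += 1
        points.append((false_pos / negatives, true_pos / positives))
        i = j
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


if __name__ == "__main__":
    # Made-up outcomes and scores for eight hypothetical consumers.
    repaid = [1, 0, 1, 1, 0, 0, 1, 0]
    predicted = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
    print(auc(roc_points(repaid, predicted)))
```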

The Oracle group then estimated the value of calling the consumer by predicting the change in repayment likelihood with and without making an additional phone call from their current state. The optimal decision is then to call the consumers for whom the value of calling is highest, up to the number of calls that collector capacity allows.
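The capacity-limited decision described above can be sketched as a simple ranking. This is a minimal illustration under our own assumptions: the case IDs and call values are invented, and `capacity` stands for however many calls the collectors can make in a day.

```python
def select_calls(call_values, capacity):
    """Pick the cases to call today.

    call_values: dict mapping a case ID to its estimated value of calling.
    capacity: maximum number of calls collectors can make today.
    Returns case IDs ordered from highest to lowest value of calling.
    """
    ranked = sorted(call_values.items(), key=lambda item: item[1], reverse=True)
    return [case_id for case_id, _ in ranked[:capacity]]


if __name__ == "__main__":
    # Hypothetical estimated values of calling, in euros.
    estimates = {"case-a": 12.5, "case-b": 48.0, "case-c": 30.1, "case-d": 3.2}
    print(select_calls(estimates, capacity=2))
```

Because the ranking uses the estimated gain from a call rather than the raw repayment likelihood, a consumer who is very likely to repay anyway can rank below one for whom a call is expected to make a real difference.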

The next step was to live-test the system with the industry partner that initially provided the data, a Dutch collection agency that handles over 250,000 collection cases annually, totaling €120 million of principal.

The team conducted a live experiment that randomly assigned newly arriving consumers to two groups. The first group, called Incumbent Policy (IP), was the control group with 466 cases and followed the existing collection policy.

The second group, named GBDT Optimized Collection Policy (GOCP), handled 455 cases with the data-analytics-driven policy. At the start of each day, all of the IP cases meeting the rules specified by the incumbent policy and the top 20% of the GOCP cases, as ranked by the model, were put into a central pool of outstanding cases for collectors to call that day. The collectors were unaware of the experiment and performed their duties without knowing that the cases came from two different groups. Finally, the cases were tracked for a minimum of 60 days, and a number of performance indicators were calculated at the end of the experiment, as reflected in the table below.
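The daily pooling step can be sketched as follows. This is only an illustration of the mechanism described above: the incumbent policy's actual rules are not given in the study summary, so the `ip_rule` used in the example (call after three or more days without contact) is a hypothetical placeholder, as are the field names. Rounding the 20% cutoff to the nearest whole case is also our assumption.

```python
import random


def build_daily_pool(ip_cases, gocp_cases, ip_rule, gocp_value,
                     top_fraction=0.2, rng=None):
    """Combine both groups' cases into one anonymous pool of case IDs.

    ip_cases / gocp_cases: lists of case dicts, each with an "id" key.
    ip_rule: function returning True if the incumbent policy says to call.
    gocp_value: function returning the model's estimated value of calling.
    """
    rng = rng or random.Random()
    ip_selected = [case for case in ip_cases if ip_rule(case)]
    ranked = sorted(gocp_cases, key=gocp_value, reverse=True)
    n_top = round(top_fraction * len(ranked))  # rounding is an assumption
    pool = [case["id"] for case in ip_selected + ranked[:n_top]]
    # Shuffle so collectors cannot tell which group a case came from.
    rng.shuffle(pool)
    return pool


if __name__ == "__main__":
    ip = [{"id": f"ip{i}", "days_since_contact": i} for i in range(5)]
    gocp = [{"id": f"g{i}", "value": float(i)} for i in range(10)]
    pool = build_daily_pool(
        ip, gocp,
        ip_rule=lambda c: c["days_since_contact"] >= 3,  # hypothetical rule
        gocp_value=lambda c: c["value"],
        rng=random.Random(0),
    )
    print(sorted(pool))
```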

Clearly, the data-analytics-driven policy outperformed the incumbent policy and recovered the debt in less time.

In our next post, we’ll dig more into uses of data analytics in the finance industry.