CCR2 Information


Big data, little data, and data science


Having the right information and analysis is crucial, but you also need to ensure that it is actionable


Murray Bailey, Editor, Credit Scoring: Principles and Practicalities (fourth edition), murray.bailey@windsorcme.com


Artificial intelligence (AI), which in the scoring world we used to call ‘new technology’, has been around for decades but made little impact on credit scoring until the late 2000s. The change was precipitated by the explosion of so-called Big Data, open source, and cloud computing. The amount of data produced grows exponentially, with a predicted 150 billion networked sensors by the end of this decade, although predictions are becoming harder to make as satellite systems are rapidly being developed to provide a global wireless internet service, increasing access and speeds.

Back in 2006, I designed the application system for a new payday lender. The pilot launch (under another brand name) resulted in a staggering 100,000 applications within a three-month trial. From this huge dataset we were able to build initial models quickly, since the outcome of a loan was typically known in less than 45 days. We built a traditional logistic regression scorecard but, within a wide Grey Zone, also used a neural network and a KVM model. I had wanted a Nearest Neighbour solution but we did not have the computing power within the production environment to support it. The three models provided the ‘bidding system’ that I had envisaged and it worked exceptionally well.

With the aid of a data-science team, after a number of iterations, we let the system update itself, although there was a tendency for it to lead to a lower acceptance rate if reject inference was not considered. This is the classic issue of AI. A model built on biased data will result in an exacerbation of those biases.
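As an illustration only, the grey-zone ‘bidding system’ described above might be sketched along the following lines. The article does not say how the three bids were combined, so the majority vote, the grey-zone bounds, the field names and the stand-in bidder functions are all assumptions made for the example, not a description of the lender's system.

```python
from typing import Callable, Dict, Sequence

# Hypothetical grey-zone bounds on the scorecard scale; the article gives none.
GREY_LOW = 480
GREY_HIGH = 560

Applicant = Dict[str, float]
Bidder = Callable[[Applicant], bool]  # True means "this model bids accept"


def decide(score: float, applicant: Applicant, bidders: Sequence[Bidder]) -> str:
    """Use the scorecard alone outside the grey zone; let the models bid inside it."""
    if score >= GREY_HIGH:
        return "accept"
    if score < GREY_LOW:
        return "decline"
    bids = [bidder(applicant) for bidder in bidders]
    # Assumed combination rule: a strict majority of accept bids wins.
    return "accept" if 2 * sum(bids) > len(bids) else "decline"


# Stand-in bidders, each a placeholder for a fitted model's accept/decline call.
def scorecard_bid(a: Applicant) -> bool:
    return a["scorecard_odds"] >= 1.0


def neural_net_bid(a: Applicant) -> bool:
    return a["nn_probability_good"] >= 0.5


def third_model_bid(a: Applicant) -> bool:
    return a["third_model_margin"] >= 0.0


BIDDERS = [scorecard_bid, neural_net_bid, third_model_bid]

# Illustrative applicant in the grey zone: two of the three models bid accept.
applicant = {"scorecard_odds": 0.9, "nn_probability_good": 0.7, "third_model_margin": 0.2}
print(decide(520, applicant, BIDDERS))  # prints "accept"
```

Keeping the scorecard in sole charge outside the grey zone means that clear-cut cases remain easy to explain, while the extra models are consulted only where the scorecard is least certain.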


It will, furthermore, favour characteristics influenced by that bias in its decision making.

The other area to be aware of was the grouping and interpretation of new data. As well as thousands of pieces of data from the credit bureau (both provided and calculated), the lender accessed social-media data and also used interactive information captured during the application. In the good old days of paper application forms, fraudulent applications could be spotted by ‘floating points’. These were dots on the form where the fraudster had rested their pen as they perhaps checked details and completed the form carefully. I was fascinated to find that positioning of the mouse on a screen was similarly predictive of performance. Other powerful new variables included the movement of the sliders for the amount and period applied for, and the time spent on each section.

From this experience, there is no doubt that Big Data can provide a wealth of predictive information, whether the source be an employer (for salary finance), a student body (for student loans), or the internet generally, social media in particular. However, the mistake I have seen is to assume that this Big Data will be predictive. Most organisations do not have the luxury afforded to large payday lenders: they do not know the outcomes that would be associated with these new variables.

To obtain a benefit from increased discrimination, we need to see an improvement in either risk or acceptance. Most organisations that I have discussed Big Data with see the opportunity to increase acceptance: to approve people who are viewed as having ‘thin files’ or as being wrongly penalised by old credit records. The problem with this is that there is no outcome data to support the hypothesis. As a result, many companies have built ‘expert models’ or applied models built on other outcomes.
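To make the trade-off between risk and acceptance concrete, here is a minimal sketch with ten made-up applicants whose outcomes are known. The scores, outcomes and cut-offs are invented purely for illustration; the point is that the comparison can only be made at all when outcome data exists.

```python
from typing import Sequence, Tuple


def accept_and_bad_rate(
    scores: Sequence[float], bads: Sequence[int], cutoff: float
) -> Tuple[float, float]:
    """Return (acceptance rate, bad rate among accepted) for a score cut-off."""
    accepted = [bad for score, bad in zip(scores, bads) if score >= cutoff]
    accept_rate = len(accepted) / len(scores)
    bad_rate = sum(accepted) / len(accepted) if accepted else 0.0
    return accept_rate, bad_rate


# Made-up outcomes for ten applicants: 1 = went bad, 0 = repaid.
bads = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

# An existing score that ranks the bads imperfectly ...
old_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# ... and a hypothetical, more discriminating score that puts all bads at the bottom.
new_scores = [1, 4, 2, 5, 6, 3, 7, 8, 9, 10]

# Same cut-off, same acceptance rate: the better score lowers the bad rate.
print(accept_and_bad_rate(old_scores, bads, cutoff=5))
print(accept_and_bad_rate(new_scores, bads, cutoff=5))

# Alternatively, lower the cut-off: acceptance rises with no increase in bad rate.
print(accept_and_bad_rate(new_scores, bads, cutoff=4))
```

Without the `bads` column none of these figures can be computed, which is why a belief that new data will raise acceptance safely remains a hypothesis until outcomes are observed.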


Data science

AI has spawned a new breed of scorecard developers (if you are one, I hope you do not mind the classification), who understand the mathematics behind the algorithms and the statistics but, more importantly, understand the failings. However, time and time again I come across statisticians calling themselves ‘data scientists’ with neither a background in AI nor experience as a traditional scorecard developer. Such amateurs can damage a business if allowed to build and deploy inadequate and error-ridden models.

Building an AI solution requires a deep appreciation of machine learning architecture, which means that true data scientists are

www.CCRMagazine.com


March 2020

