CCR2 Information


likely to have a PhD-level qualification. With such skills they can appreciate the broader issues and address error, bias, and potential problems with the system. However, they must be wary of:


- Aiming for increased automation over genuine benefits to the company.
- Failing to deliver a practical solution.
- Failing to produce a solution that is explainable.


Automation Machine learning produces exciting results, and I have worked with third parties selling solutions with a view to total automation: model build, learn, and rebuild. This is attractive, but the user should remember to consider the issues of error and bias. If you turn on such a system without addressing those issues – or without providing a mechanism to spot when they arise or become significant – you are likely to be jeopardising your future success. With a short-term loan, the outcome and learning process is quick. However, traditional longer-term loans and revolving products take years to mature, and irreparable damage could be done by AI before the mistake is identified.
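One possible sketch of such a mechanism is a population-stability check: before an automated rebuild is allowed to proceed, compare the score (or variable) distribution of recent applicants against the development sample. The function and thresholds below are illustrative assumptions, not something described in the article.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between development-sample and live scores."""
    # Decile cut-points from the development sample, widened to cover any value
    cuts = np.percentile(expected, np.linspace(0, 100, bins + 1))
    cuts[0], cuts[-1] = -np.inf, np.inf
    e = np.histogram(expected, cuts)[0] / len(expected)
    a = np.histogram(actual, cuts)[0] / len(actual)
    # Avoid log(0) for empty bins
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))
```

A common rule of thumb treats a PSI below 0.1 as stable, 0.1–0.25 as worth monitoring, and above 0.25 as a reason to pause automated rebuilds and investigate before the damage described above is done.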


Practicality The ability to implement a model has always been key. I once worked with an international modelling team who were great statisticians but failed to have a single solution implemented in the two years they operated. The main issue was that they did not consider the operating environments: modern tools were not available for deployment, so when they built neural networks, the models could not be used in legacy production systems.

The modern issues tend to be slightly different. Firstly, the data scientist should check the availability of the data. It may well be in the modelling sample, but is it available in the live environment? An example of this is student data captured by the compiling organisation, which then provided it, anonymised, to the data scientist. The lender, however, may never have asked that question of applicants, may not be permitted to ask it, or may be unable to validate the response if it can. Data protection has added another level to this area of concern. When one lender first started, it obtained detailed Facebook data without asking for consent; initially, it did not need to. It was not long after this became public that access was removed.
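The availability check above can be made mechanical before deployment: compare the variables the model was built on against the fields the live application feed actually supplies. The field names below are invented for illustration.

```python
# Hypothetical pre-deployment check: every variable used in the model build
# must be present in the live application feed. Field names are invented.
def undeployable_variables(model_variables, live_feed_fields):
    """Return the model inputs that the live environment cannot supply."""
    return sorted(set(model_variables) - set(live_feed_fields))

model_variables = {"age", "income", "months_at_address", "student_status"}
live_feed_fields = {"age", "income", "months_at_address"}

missing = undeployable_variables(model_variables, live_feed_fields)
if missing:
    print(f"Not deployable until the live feed supplies: {missing}")
    # prints: Not deployable until the live feed supplies: ['student_status']
```

A check like this would have caught the anonymised student-data problem at build time rather than at go-live.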


March 2020


Explainable There are two reasons why models should make sense. Firstly, nonsense variables can creep into a model. The problem is that we only ever have a sample of the whole population, and samples contain errors (statistics is all about errors and understanding them). As a result, spurious things can happen, and variables can appear predictive simply because of bias in the sample. So knowing and sanity-checking the variables and their contribution to a decision is key. Some machine-learning solutions convert a complex algorithm into something that a human can understand. Regulators around the world are already grappling with the problem of how to ensure that ‘scorecards’ are fair and non-discriminatory. This is the second reason why the model should be explainable. If you cannot demonstrate to the regulator that your decision system does not discriminate – and does not use any data that is not allowed – then the business is likely to be fined. It might not happen in your country yet, but it is coming, and fast.


Little data The desire to build AI solutions is strong among both data scientists and investors. The latter want real discrimination over their rivals and know that data can provide this. However, this desire can lead to over-complication and, as a result, worse performance for the company. I have always been a champion of the KISS (keep it simple, stupid) principle, and it definitely applies to scorecard development. FICO used to recommend a minimum of 1,500 goods, bads, and rejects for a model build. Logistic regression is favoured by the




traditional scorecard developer because of the anticipated exponential increase in risk as quality deteriorates, and it provides a good fit when dummy variables are used. Regression also gives coefficients describing the relationship between each independent variable and the dependent (good or bad) variable, and we know that the larger the sample (principally, the number of bads), the smaller the spread of those estimates (the errors). Building a regression model on small samples therefore runs the risk of identifying and overstating these coefficients. The result is over-fitting: a model that works well on the sample but will be suboptimal on future samples.

This problem is exacerbated by machine-learning approaches, because it is not just about the final model but about ‘binning’ the data as well: grouping attributes sensibly to provide robustness. If this is not scrutinised, it can result in super-errors – where attributes have been combined because of spurious outcomes, thereby boosting the error. All of which means that AI models need very large datasets (possibly tens of thousands of bads) and intense scrutiny of the variables and variable construction.
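The sample-size point can be illustrated with a quick simulation. The 5% bad rate and the sample sizes below are assumptions for illustration only: the spread of an estimated bad rate shrinks roughly as 1/√n, so on a small sample a single attribute bin can easily look far better (or worse) than it really is.

```python
import numpy as np

rng = np.random.default_rng(42)
TRUE_BAD_RATE = 0.05  # assumed true bad rate for one attribute bin

def estimate_spread(sample_size, trials=2000):
    """Standard deviation of the estimated bad rate across repeated samples."""
    estimates = rng.binomial(sample_size, TRUE_BAD_RATE, size=trials) / sample_size
    return estimates.std()

for n in (100, 1_500, 15_000):
    print(f"n={n:>6}: spread of estimated bad rate ~ {estimate_spread(n):.4f}")
```

Theory gives √(p(1−p)/n): roughly 0.022 at n = 100 but only about 0.002 at n = 15,000, which is why small samples invite the overstated coefficients described above.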


Conclusion If the data just is not there, do not fool yourself into thinking you can use a more advanced technique than is justified. And of course, do not go looking for additional data (principally bads) from samples that are not representative: too young (a misclassification issue), too old (a representativeness issue), or from a different product. A simple model that is understood will nearly always outperform a complex model that is not. Learning should be a continuous process: by starting simple, you will learn from the model so that the next iteration is better, and the next better still. CCR2



