Maximizing Machine Learning Model Performance and Shelf-Life

By: Justin McDonald


This article discusses critical factors that influence how well machine learning (ML) risk models perform and how long that performance lasts, with a focus on data quality, post-transaction outcome data management, and supervision of machine learning features.


As discussed in The Fraud Practice’s latest white paper, merchants relying on rules-based risk management strategies differed significantly from those using a machine learning (ML) model-based architecture in their ability to pivot risk strategies and maintain strong fraud prevention performance as both legitimate customer and fraudster activity ramped up in 2020. That said, not all model-based risk management strategies are the same, and several factors influenced whether organizations adapted and maintained strong fraud prevention practices or struggled to keep up.


Custom modeling and ML model-based fraud scoring are likely to make adjusting to new buyer and fraudster behavior easier, but many factors contribute to maintaining an adaptive and responsive risk management strategy. It all starts with data: not just the breadth of data, but also its quality, and maintaining that quality requires post-transaction operational tasks.



Learning from Mistakes


Risk models are trained on actual data: recent historical data when the models are first set up, followed by ongoing learning as the organization continues to screen and process transactions. This means order data must be properly defined and labeled, including updating outcomes when there are missed fraud events as well as sales insults. The latter is more of a challenge.
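To make that relabeling step concrete, here is a minimal sketch of how post-transaction outcomes could update training labels. The table layout and column names (transaction_id, label, outcome) are hypothetical assumptions for illustration, not a prescribed pipeline.

```python
import pandas as pd

# Hypothetical training table: each screened order with its presumed label
# at decision time (approved orders presumed legitimate, declines presumed fraud).
transactions = pd.DataFrame({
    "transaction_id": ["t1", "t2", "t3", "t4"],
    "label": ["legitimate", "legitimate", "fraud", "fraud"],
})

# Post-transaction outcomes arriving later: a chargeback reveals missed fraud
# on an approved order; a customer-service validation reveals a sales insult.
outcomes = pd.DataFrame({
    "transaction_id": ["t2", "t3"],
    "outcome": ["chargeback", "cs_validated"],
})

# Relabel so the next training run learns from both kinds of mistakes.
corrections = {"chargeback": "fraud", "cs_validated": "legitimate"}
merged = transactions.merge(outcomes, on="transaction_id", how="left")
merged["label"] = merged["outcome"].map(corrections).fillna(merged["label"])
training_data = merged.drop(columns="outcome")
print(training_data)
```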


We can’t learn from our mistakes until we first recognize that we have made them. This is easy with missed fraud: the merchant gets a chargeback. It is much harder with sales insults, or false positives. Many legitimate customers turned away on suspicion of fraud will never return, so not all sales insults can be recognized. Other times, consumers will try again or call customer service and validate the legitimacy of their order attempt. Even then, organizations are significantly less likely to retrain or adjust their models, either because they are not well equipped to recognize sales insults or because they lack the data feedback loop to feed this information back to their models for retraining and improvement.
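One way to close that loop, sketched below with hypothetical names (record_outcome, label_store), is to treat customer-service validations and successful retries as sales-insult signals and write them back to the same label store the models retrain from. This is an assumed event-driven design, not the approach of any particular vendor.

```python
from datetime import datetime, timezone

# Hypothetical in-memory label store keyed by transaction ID; in production
# this would be the feature/label store the models retrain from.
label_store: dict[str, dict] = {}

def record_outcome(transaction_id: str, outcome: str) -> None:
    """Write a post-transaction outcome back to the label store.

    Chargebacks mark missed fraud; customer-service validations and
    successful retries mark sales insults (false positives).
    """
    label = {
        "chargeback": "fraud",
        "cs_validated": "legitimate",
        "successful_retry": "legitimate",
    }.get(outcome)
    if label is None:
        return  # Ignore outcome types we don't relabel on.
    label_store[transaction_id] = {
        "label": label,
        "source": outcome,
        "updated_at": datetime.now(timezone.utc).isoformat(),
    }

# A declined order validated via a customer service call becomes a
# labeled sales insult available at the next retraining run.
record_outcome("t3", "cs_validated")
print(label_store["t3"])
```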


Models will adapt and iterate over time via machine learning, ideally supervised learning. The challenge is that models are well positioned to learn from missed fraud but typically struggle to learn from sales insults, which over time can produce models that keep getting better at preventing fraud while their performance at minimizing false positives deteriorates.
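As an illustration of that retraining cycle, the sketch below runs a supervised pass over corrected labels. The features, the logistic regression choice, and the use of class weighting are assumptions for demonstration, not a recommendation from the article.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative features (order amount, shipping/billing mismatch flag) and
# relabeled outcomes from the corrected store (1 = fraud, 0 = legitimate).
X = [[120.0, 1], [15.5, 0], [980.0, 1], [42.0, 0], [610.0, 1], [22.0, 0]]
y = [1, 0, 1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=0
)

# If the feedback loop only delivers chargeback labels, fraud keeps getting
# reinforced while uncorrected sales insults quietly bias the model toward
# declining; class weighting is one simple counterbalance during retraining.
model = LogisticRegression(class_weight="balanced")
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```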


“ML models are as effective as the data that feeds them. Trust and safety teams need to keep in mind that the performance of the models they’re training and maintaining has everything to do with the breadth and complexity of the information they ingest—and that’s on the humans. Analysts who prioritize reducing friction, lowering false positives, preventing chargebacks, and doing it all in real time, are building the capacity for their teams and ML to recognize more fraudulent signals, more frequently, and with greater accuracy and speed.” - Kevin Lee, VP of Trust & Safety at Sift


Supervision