
Maximizing Machine Learning Model Performance and Shelf-Life

By: Justin McDonald

This article discusses critical factors that influence how well machine learning (ML) risk models perform and how long that performance lasts, with a focus on data quality, post-transaction outcome data management, and supervision of machine learning features.

As discussed in The Fraud Practice’s latest white paper, merchants relying on rules-based risk management strategies and merchants using an ML model-based architecture differed significantly in how well they could pivot their risk strategies to maintain strong fraud prevention performance as both good-customer and fraudster activity ramped up in 2020. Still, not all model-based risk management strategies are the same, and several factors determined whether an organization adjusted and maintained strong fraud prevention practices or struggled to adapt.

Custom modeling and ML model-based fraud scoring are likely to make it easier to adjust to new buyer and fraudster behavior, but many factors go into maintaining an adaptive and responsive risk management strategy. It all starts with data: not just its breadth, but also its quality, and keeping data quality high requires post-transaction operational work.

Learning from Mistakes

Risk models are trained on real transaction data: recent historical data when the models are first set up, then new data as the organization continues to screen and process transactions. This means order data must be properly defined and labeled, and outcomes must be updated both for missed fraud events and for sales insults (legitimate orders that were declined). The latter is more of a challenge.
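
To make "updating outcomes" concrete, here is a minimal sketch assuming each order record stores the screening decision plus a training label that post-transaction events can correct. The `Order`, `Outcome`, and `apply_outcome` names are hypothetical, and presuming declines are fraud until proven otherwise is a simplifying assumption (some teams treat declined orders as unlabeled instead).

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPROVED = "approved"
    DECLINED = "declined"


class Label(Enum):
    LEGIT = "legit"
    FRAUD = "fraud"


class Outcome(Enum):
    CHARGEBACK = "chargeback"            # approved order later disputed as fraud
    CONFIRMED_LEGIT = "confirmed_legit"  # declined order later verified as genuine


@dataclass
class Order:
    order_id: str
    decision: Decision
    label: Label  # training target: starts provisional, corrected by outcomes


def provisional_label(decision: Decision) -> Label:
    # Simplifying assumption: approvals are presumed good and declines presumed
    # fraud until a post-transaction outcome says otherwise.
    return Label.LEGIT if decision is Decision.APPROVED else Label.FRAUD


def apply_outcome(order: Order, outcome: Outcome) -> str:
    """Correct an order's training label and report which error was found."""
    if outcome is Outcome.CHARGEBACK and order.decision is Decision.APPROVED:
        order.label = Label.FRAUD
        return "missed_fraud"   # false negative
    if outcome is Outcome.CONFIRMED_LEGIT and order.decision is Decision.DECLINED:
        order.label = Label.LEGIT
        return "sales_insult"   # false positive
    return "no_change"


# Usage: a declined order that customer service later confirms as legitimate.
order = Order("A1", Decision.DECLINED, provisional_label(Decision.DECLINED))
result = apply_outcome(order, Outcome.CONFIRMED_LEGIT)
print(result, order.label.value)  # sales_insult legit
```

The asymmetry the article describes shows up directly in this sketch: the chargeback branch is triggered automatically by the payment network, while the sales-insult branch only runs if someone detects and records the `CONFIRMED_LEGIT` outcome.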

We can’t learn from our mistakes until we recognize that we have made them. With missed fraud this is easy: the merchant receives a chargeback. Sales insults, or false positives, are much harder to recognize. Many legitimate customers turned away on suspicion of fraud will not return, so not every sales insult can be identified. Other times, consumers will try again or call customer service and validate the legitimacy of their order attempt. Even in these cases, organizations are significantly less likely to retrain or adjust their models than they are after missed fraud, either because they are not well equipped to recognize sales insults or because they lack a data feedback loop to pass this information back to their models for retraining and improvement.
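
As a minimal sketch of such a feedback loop, the function below turns the two recovery signals mentioned above (a successful retry, or a customer service verification) into candidate sales-insult relabels. The `Attempt` record, the 7-day retry window, and the 90-day maturity window are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Attempt:
    order_id: str
    customer_id: str
    timestamp: datetime
    declined: bool
    charged_back: bool = False


def likely_sales_insults(
    attempts: list[Attempt],
    cs_verified_orders: set[str],
    now: datetime,
    retry_window: timedelta = timedelta(days=7),
    maturity: timedelta = timedelta(days=90),
) -> list[Attempt]:
    """Return declined attempts that later evidence suggests were legitimate."""
    flagged = []
    for denied in (a for a in attempts if a.declined):
        # Signal 1: customer service validated this specific order attempt.
        if denied.order_id in cs_verified_orders:
            flagged.append(denied)
            continue
        # Signal 2: the same customer retried soon after and was approved, and
        # that approval has aged past the chargeback window without a dispute.
        for retry in attempts:
            if (
                retry.customer_id == denied.customer_id
                and not retry.declined
                and denied.timestamp < retry.timestamp <= denied.timestamp + retry_window
                and now - retry.timestamp >= maturity
                and not retry.charged_back
            ):
                flagged.append(denied)
                break
    return flagged


t0 = datetime(2021, 1, 1)
history = [
    Attempt("o1", "c1", t0, declined=True),
    Attempt("o2", "c1", t0 + timedelta(hours=1), declined=False),
    Attempt("o3", "c2", t0, declined=True),
]
print([a.order_id for a in likely_sales_insults(history, {"o3"}, now=t0 + timedelta(days=120))])
# ['o1', 'o3']
```

The maturity check is there because chargebacks arrive weeks after the transaction; treating an approved retry as good before that window closes risks relabeling a fraudster's second attempt as a sales insult.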

Models adapt and iterate over time through machine learning, ideally supervised learning. The challenge is that models are well positioned to learn from missed fraud but typically struggle to learn from sales insults. Over time, this can produce models that keep getting better at preventing fraud while their performance at minimizing false positives deteriorates.

ML models are as effective as the data that feeds them. Trust and safety teams need to keep in mind that the performance of the models they’re training and maintaining has everything to do with the breadth and complexity of the information they ingest—and that’s on the humans. Analysts who prioritize reducing friction, lowering false positives, preventing chargebacks, and doing it all in real time, are building the capacity for their teams and ML to recognize more fraudulent signals, more frequently, and with greater accuracy and speed. - Kevin Lee, VP of Trust & Safety at Sift 


Machine learning model-based risk strategies should never be “set it and forget it.” Fraud is always evolving, and models eventually face some level of decay. Machine learning and artificial intelligence are great for extending a model’s shelf life, but they should be supplemented with human supervision. The personnel supervising model performance should have general experience in digital channel payment fraud, as well as specific experience with the fraud trends and patterns in your industry and your business. That expertise should be vertical-specific, such as experience with digital goods, and should also cover the channels that matter most to the organization, such as omni-channel or mobile.
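
The article does not prescribe a way to detect decay. One common, vendor-neutral option a supervising analyst might use is the population stability index (PSI), which compares the distribution of recent model scores against a baseline period. This is a swapped-in technique, and the 10-bin layout and the assumption that scores fall in [0, 1] are illustrative choices.

```python
import math
from typing import Sequence


def population_stability_index(
    baseline: Sequence[float],
    recent: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between two samples of model scores, assuming scores lie in [0, 1]."""
    if not baseline or not recent:
        raise ValueError("both score samples must be non-empty")

    def bin_shares(scores: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for s in scores:
            # Equal-width bins; clamp so 1.0 (or stray values) land in an edge bin.
            counts[min(max(int(s * bins), 0), bins - 1)] += 1
        # Floor empty bins at eps so the log term below stays finite.
        return [max(c / len(scores), eps) for c in counts]

    b, r = bin_shares(baseline), bin_shares(recent)
    return sum((ri - bi) * math.log(ri / bi) for bi, ri in zip(b, r))


last_quarter = [i / 100 for i in range(100)]
this_month = [min(0.99, s + 0.3) for s in last_quarter]  # scores drifted upward
print(round(population_stability_index(last_quarter, last_quarter), 3))  # 0.0
print(population_stability_index(last_quarter, this_month) > 0.25)  # True
```

A commonly cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, though thresholds should be tuned to the business. Score drift can flag a change before chargebacks arrive, but it says nothing about whether the model's decisions are right, so it complements rather than replaces outcome-based review by experienced analysts.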

The most effective modeling systems mesh machine learning and human intervention and rely on bi-directional feedback. Machine learning and AI will identify data interaction points a person may never think of, and do so with incredible speed, but risk management experts must still perform due diligence to ensure that each such feature or data interaction point is relevant and will provide uplift without creating problems. Again, this is particularly important for sales insults, where data for training models to reduce false positives is scarce relative to data on false negatives, or missed fraud.
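
A minimal sketch of that due diligence, assuming the team can score the same labeled holdout set with and without a candidate feature: the check accepts the feature only if its catch rate does not drop and its false-positive rate (sales insults) does not rise beyond a tolerance. The function names, the fixed 0.5 decline threshold, and the zero-tolerance default are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Rates:
    catch_rate: float           # share of fraudulent orders declined
    false_positive_rate: float  # share of legitimate orders declined (sales insults)


def rates_at_threshold(scores: Sequence[float], is_fraud: Sequence[bool], threshold: float) -> Rates:
    declined = [s >= threshold for s in scores]
    n_fraud = sum(is_fraud)
    n_good = len(is_fraud) - n_fraud
    caught = sum(d and f for d, f in zip(declined, is_fraud))
    insults = sum(d and not f for d, f in zip(declined, is_fraud))
    return Rates(
        catch_rate=caught / n_fraud if n_fraud else 0.0,
        false_positive_rate=insults / n_good if n_good else 0.0,
    )


def accept_feature(
    baseline_scores: Sequence[float],
    candidate_scores: Sequence[float],
    is_fraud: Sequence[bool],
    threshold: float = 0.5,
    max_fpr_increase: float = 0.0,
) -> bool:
    """Accept only if catch rate holds and false positives do not rise past tolerance."""
    base = rates_at_threshold(baseline_scores, is_fraud, threshold)
    cand = rates_at_threshold(candidate_scores, is_fraud, threshold)
    return (
        cand.catch_rate >= base.catch_rate
        and cand.false_positive_rate <= base.false_positive_rate + max_fpr_increase
    )


labels = [True, True, False, False]
without_feature = [0.9, 0.4, 0.2, 0.6]
with_feature = [0.9, 0.7, 0.2, 0.3]
print(accept_feature(without_feature, with_feature, labels))  # True
```

Comparing at a single threshold keeps the check simple; a fuller review would compare across several thresholds. Because confirmed sales insults are scarce, the false-positive estimate on a holdout set can be noisy, which is exactly where an expert's judgment matters.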

Conversely, humans can identify patterns and risk signals based on real-world experience, and refine them iteratively in response to fraud events and patterns that change over time. These human-defined features are typically the foundation for building models, but modeling analytics should be used to validate that they are performing as intended, while machine learning determines how to tweak and improve them.
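
One simple way to validate that a human-defined feature still performs as planned is to measure how often it fires and how much more fraud-dense the orders it flags are than the overall population (lift). This is a minimal sketch; the `rule_report` helper and the billing/shipping country-mismatch rule are hypothetical examples, not features the article prescribes.

```python
from typing import Any, Callable, Mapping, Sequence

Order = Mapping[str, Any]


def rule_report(
    orders: Sequence[Order],
    is_fraud: Sequence[bool],
    rule: Callable[[Order], bool],
) -> dict[str, float]:
    """Coverage, precision, and lift of a human-defined rule on labeled orders."""
    if not orders or len(orders) != len(is_fraud):
        raise ValueError("need a non-empty sample with one label per order")
    fired = [rule(o) for o in orders]
    n_fired = sum(fired)
    base_rate = sum(is_fraud) / len(is_fraud)
    fraud_when_fired = sum(r and f for r, f in zip(fired, is_fraud))
    precision = fraud_when_fired / n_fired if n_fired else 0.0
    return {
        "coverage": n_fired / len(orders),  # share of orders the rule flags
        "precision": precision,             # fraud rate among flagged orders
        "lift": precision / base_rate if base_rate else float("nan"),
    }


def country_mismatch(o: Order) -> bool:
    # Hypothetical rule: billing and shipping countries differ.
    return o["billing_country"] != o["shipping_country"]


sample = [
    {"billing_country": "US", "shipping_country": "US"},
    {"billing_country": "US", "shipping_country": "CA"},
    {"billing_country": "US", "shipping_country": "GB"},
    {"billing_country": "GB", "shipping_country": "GB"},
]
print(rule_report(sample, [False, True, False, False], country_mismatch))
# {'coverage': 0.5, 'precision': 0.5, 'lift': 2.0}
```

A lift near or below 1, or one that falls from period to period, suggests the rule is no longer performing as planned and is a candidate for review or re-weighting.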


Sift is the leader in Digital Trust & Safety, empowering digital disruptors to Fortune 500 companies to unlock new revenue without risk. Sift dynamically prevents fraud and abuse through industry-leading technology and expertise, an unrivaled global data network of 70 billion events per month, and a commitment to long-term customer partnerships. Global brands such as Twitter, AirBnB, and Twilio rely on Sift to gain a competitive advantage in their markets. Visit us at and follow us on Twitter @GetSift.

