
Three Core Considerations for Evaluating Model-Based Fraud Scoring Providers

By: Justin McDonald

A recent white paper from the Fraud Practice emphasized the significance of having a risk management strategy that can evolve with changing fraud and consumer trends as well as support omni-channel or Unified Commerce. While model-based solutions typically fare better than rule-based systems when it comes to maintaining effective risk management in the face of dynamic fraud trends, not all model-based solutions are created equal.

Whether they are in the market to move from legacy fraud prevention systems to a model-based strategy or looking to assess a current model-based solution provider, organizations should consider the following factors when evaluating the capabilities and strengths of prospective or incumbent model-based fraud scoring providers.

What Point Tools and Technology Tools are Included?

Risk scoring models guide very important decisions: do we accept, reject or escalate a given transaction? That recommendation is based on the presence of thousands of individual signals. For any given transaction, some signals will be very meaningful, some somewhat meaningful and some nearly meaningless in terms of how much they influence the recommended decision. The more signals that are cultivated, the more likely it is that some of them will be meaningful, but the quality of risk signals is important as well. It is important to understand the multitude of signals a model-based fraud scoring provider can identify and use to support risk management decisions, and to compare prospective vendors’ relative strengths in both the quantity and quality of risk signals available to their models. One of the appealing aspects of using a model-based fraud scoring provider is that many point tools, technology tools and signals are all included. Organizations that develop and manage their own custom models have to rely on a multitude of third party providers for signals, such as device fingerprinting and consumer identity data, which organizations typically cannot build in-house.
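As a minimal sketch of the accept, reject or escalate decision described above, the snippet below maps a risk score to one of the three outcomes. The 0 to 100 score scale, the threshold values and the names `DecisionThresholds` and `decide` are assumptions made for illustration; they do not describe any particular provider's product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionThresholds:
    # Hypothetical cut-offs on an assumed 0-100 risk score;
    # in practice these would be tuned per merchant.
    review_at: float = 60.0
    reject_at: float = 85.0


def decide(risk_score: float,
           thresholds: DecisionThresholds = DecisionThresholds()) -> str:
    """Map a risk score to one of the three outcomes named in the text."""
    if risk_score >= thresholds.reject_at:
        return "reject"
    if risk_score >= thresholds.review_at:
        return "escalate"
    return "accept"
```

Moving `review_at` up or down changes how many orders fall into the escalate band and therefore how much work reaches manual review.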

One of the values of model-based fraud scoring is that these various point tools and technologies are included with the service, which simplifies third party vendor relationships and reduces their cost for the merchant. Not all model-based fraud scoring providers are created equal in this regard, as each has its own vendor partnerships and homegrown tools or technology that feed its models and affect their accuracy.

In general, the more signals the better, but quality of the signal has to be addressed as well. Two great examples for considering the quality of a risk signal are device identification and data sharing components.

Device identification needs to continue to recognize end users even as they intentionally try to morph their device and appear to be someone new. This return user recognition may include the use of behavioral characteristics as well. Device ID is a very strong anchor point for velocity checks, which can in turn be a very strong signal for catching morphing fraud schemes. However, a modeling feature or risk signal around the device fingerprint is only as strong as the ability to continually recognize the return device or end user. When one provider has a stronger or more reliable and accurate underlying device or behavioral identification technology, all modeling features utilizing device ID are stronger as well.
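To illustrate how a device ID can anchor a velocity check, here is a minimal sketch that counts recent events per device within a sliding time window. The class name `DeviceVelocity`, the window length and the assumption that events arrive in time order are all hypothetical; a real provider's implementation is not described in this article.

```python
from collections import defaultdict, deque


class DeviceVelocity:
    """Count recent events per device ID within a sliding time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        # device_id -> timestamps of recent events, oldest first
        self._events = defaultdict(deque)

    def record(self, device_id: str, timestamp: float) -> int:
        """Record an event and return the device's event count in the window.

        Assumes timestamps are supplied in non-decreasing order.
        """
        events = self._events[device_id]
        events.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while events and events[0] <= timestamp - self.window_seconds:
            events.popleft()
        return len(events)
```

The sketch also shows why recognition quality matters: if a fraudster's morphed device is assigned a new `device_id`, the count restarts at one and the velocity signal is lost.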

Data sharing modeling features are also a consideration, which should include cross-merchant velocities beyond just shared negative lists. Data sharing assets are another strength of model-based fraud scoring providers in general. Merchants can benefit from the fact that a nefarious end user has had irregular or high volume activity across a third party provider’s network, stopping that user even though it is the first time the merchant has seen this bad actor directly. Different model-based fraud scoring providers will have varying strengths when it comes to data sharing based on two primary factors: breadth of data and the number of modeling features leveraging data sharing assets. Breadth of data refers to the network effect or the fact that the more clients a vendor has participating in the data sharing pool, the more likely all merchants are to see a valuable high risk signal. The number of modeling features utilizing data assets includes the number of shared velocity of use, shared velocity of change and shared negative list features that are leveraged within the fraud scoring model.
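The sketch below shows one way the three kinds of shared features named above could be derived from a pooled data set. It assumes that velocity of use means how often an identifier appears and that velocity of change means how many distinct card tokens have been paired with it; those definitions, the class `SharedSignalPool` and its field names are illustrative assumptions and not taken from any provider.

```python
from collections import defaultdict


class SharedSignalPool:
    """Toy cross-merchant data pool keyed by an identifier such as a device ID."""

    def __init__(self) -> None:
        self._uses = defaultdict(int)       # identifier -> total order attempts
        self._merchants = defaultdict(set)  # identifier -> merchants that saw it
        self._cards = defaultdict(set)      # identifier -> distinct card tokens
        self._negative_list = set()         # identifiers tied to confirmed fraud

    def observe(self, merchant: str, identifier: str, card_token: str) -> None:
        """Record one order attempt reported by a participating merchant."""
        self._uses[identifier] += 1
        self._merchants[identifier].add(merchant)
        self._cards[identifier].add(card_token)

    def report_fraud(self, identifier: str) -> None:
        """Add an identifier to the shared negative list."""
        self._negative_list.add(identifier)

    def signals(self, identifier: str) -> dict:
        """Return the shared features for an identifier (zeros if never seen)."""
        return {
            "velocity_of_use": self._uses.get(identifier, 0),
            "merchants_seen": len(self._merchants.get(identifier, ())),
            "velocity_of_change": len(self._cards.get(identifier, ())),
            "on_negative_list": identifier in self._negative_list,
        }
```

A merchant seeing an identifier for the first time can still query `signals` and receive counts built from other merchants' traffic, which is the network effect the breadth-of-data factor refers to.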

Assessing Their Level of Expertise

Expertise is relevant at two levels: overall and vertical-specific. Overall expertise concerns the data scientists’ knowledge and ability to create effective models for risk management, but it is more closely tied to general data science know-how and the advanced understanding and application of statistics. Vertical-specific expertise is more closely tied to applying models to risk management and, more specifically, to understanding the differences between industries and vertical markets. There are countless nuances and considerations that cause effective risk management strategies to vary quite a bit across industries or verticals.

One way to begin to assess a prospective vendor’s general expertise is to start with simple questions that may have complicated answers. One such example is seeking to understand the underlying statistical modeling approach. If a model-based fraud scoring provider is utilizing random forest models, statistical pattern recognition or neural networks, that is a good sign that there is overall modeling expertise within the organization. There are several other types of modeling approaches that signify this overall level of expertise and this is not meant to be an exhaustive list. The idea is that these modeling approaches (as well as others not mentioned) had to be designed by, and are likely still maintained by, data scientists with a deep level of knowledge and understanding around building high-performing predictive models.

Depending on who you are speaking with, you may not be able to glean much detail beyond the fact that a provider built their solution on the use of random forest models, but they should be able to get you in contact with someone who can speak to the advantages of this approach for machine learning in more detail.

On the other side of the spectrum, if a prospective vendor says their model-based risk scoring approach is built on a multiple linear regression model, ordinary least squares (OLS) model or single root decision tree, a very high level of general statistical and predictive modeling expertise is possible but less likely.

Vertical-specific expertise is related to understanding the relevant variables and what different variables or signals mean in the context of other signals as it pertains to a specific industry or vertical market. Money remittance is very different from mobile ordering from quick service restaurants (QSRs), which is very different from the risk profile for electronics retailers and different from the risk patterns and profiles an apparel retailer will see. Vertical-specific expertise often comes from payment and eCommerce risk professionals whereas general expertise will come from data scientists and Ph.D. statisticians.

Vertical expertise is critical for developing different models for different industries as well as supervising machine learning to maintain model performance over time. One way to gauge vertical-specific expertise is to begin by asking if there is a base model specific to your industry or if all clients begin with the same model. From there, dive deeper by asking what differentiates the model that will be utilized by your organization relative to ones used by other types of organizations.

Next, see if you can gain any insights by looking at case studies or public client lists, looking for organizations in your vertical market or a related area. The nature of the fraud and risk management industry means many of these client relationships will not be public, but you can ask a prospective vendor how many clients they have that are within or adjacent to your vertical market. Don’t limit this analysis to the number of merchants; consider the total volume these merchants are likely to represent. The client list and collective volume in your industry are not only indicators of vertical-specific expertise, but also shed light on the relevant breadth of data with respect to data sharing risk signals.

Traditional fraud prevention platforms aren’t capable of proactively learning from large data sets or manual analyst feedback. Every user gets subjected to the same level of scrutiny, usually using a pretty narrow range of criteria that’s collected over a short period of time. These systems are always behind on data, so the people using them are regularly too late to stop fraud attacks, and have to clean up the consequences instead. Worse, the inflexibility of rules-based platforms makes scoring inaccurate, and fraudsters know how to reverse engineer the security measures of static systems—which usually leads businesses to add more friction, rupturing the user experience and driving customer insult rates through the roof. Model-based fraud scoring systems ingest real-time, real-world data and adapt alongside it, giving risk teams an opportunity to proactively protect the business and its customers.
Kevin Lee, VP of Trust & Safety at Sift 

How Can You Use the Platform?

Model-based fraud scoring providers should offer a platform with a graphical user interface (GUI), but these platforms can vary greatly in terms of capability and usability. Capability refers to the variety of functions you are able to perform while using the platform, and usability refers to the ease with which a non-technical user can navigate and utilize the platform to carry out these functions.

First think about what aspects of the platform are important to your organization, then seek to determine whether the solution provider supports these features and how likely it is that the personnel utilizing these features will be able to use them effectively. Here are some of the ways merchants may look to use the model-based fraud scoring platform:

Does the platform provide transparency? Are you able to see what specific signals led to the risk score of a given order attempt? This is important for orders labeled as high risk, when the merchant wants to validate that the model is not causing too many false positives. This is also important for missed fraud labeled as low risk, so the merchant can begin performing analysis intended to improve model performance.

Does the platform facilitate efficient manual reviews? If there is a fraud score range that results in an escalate or review outcome, can order risk review agents access an interface that presents and prioritizes the risk signals to aid in their manual review? Usability is critical here, as review agents will benefit from a clean presentation of the most pertinent risk signals, and additional features like link analysis and “drill-down” tools can increase the speed and quality of manual review.

Does the platform support post-transaction management to maintain or improve model performance? Consider any steps you may need to take to close the feedback loop once final transaction outcomes are determined. There may be an automated mechanism that reports all missed fraud such that those orders can be analyzed to determine what risk signals or patterns may have been overlooked or underweighted. False positives are less likely to be automatically recognized and an organization should take the steps to report this feedback to their model or risk scoring provider. This will help retrain the models and goes a long way in maximizing model performance and shelf-life.
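As a rough sketch of closing the feedback loop, the snippet below sorts final outcomes into missed fraud and false positives so they can be reported back for analysis and retraining. The `Outcome` record, its fields and the `feedback_report` function are hypothetical. In particular, the `was_fraud` label for a rejected order is assumed to come from some manual follow-up, since false positives are rarely recognized automatically.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Outcome:
    order_id: str
    decision: str    # "accept", "reject" or "escalate"
    was_fraud: bool  # final label once the outcome is known (assumed available)


def feedback_report(outcomes: Iterable[Outcome]) -> dict:
    """Collect the order IDs the model got wrong, grouped by error type."""
    missed_fraud = []
    false_positives = []
    for outcome in outcomes:
        if outcome.decision == "accept" and outcome.was_fraud:
            # Fraud that was scored as low risk and let through.
            missed_fraud.append(outcome.order_id)
        elif outcome.decision == "reject" and not outcome.was_fraud:
            # A legitimate order that was turned away.
            false_positives.append(outcome.order_id)
    return {"missed_fraud": missed_fraud, "false_positives": false_positives}
```

Both lists are worth sending back: missed fraud points to signals that were overlooked or underweighted, while false positives point to signals that were overweighted.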


Sift is the leader in Digital Trust & Safety, empowering digital disruptors to Fortune 500 companies to unlock new revenue without risk. Sift dynamically prevents fraud and abuse through industry-leading technology and expertise, an unrivaled global data network of 70 billion events per month, and a commitment to long-term customer partnerships. Global brands such as Twitter, AirBnB, and Twilio rely on Sift to gain a competitive advantage in their markets. Visit us at and follow us on Twitter @GetSift.

