Edge Evaluation

Once you have a trained model, you use it to generate edges (links) between entities (nodes). These edges will have a Match Weight and corresponding Probability.
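The relationship between the two scores can be sketched in a few lines of plain Python. This assumes the match weight is the base-2 logarithm of the match odds; the function names are illustrative and are not part of Splink's API.

```python
import math


def match_weight_to_probability(match_weight: float) -> float:
    """Convert a log2-odds match weight into a match probability."""
    # odds = 2 ** weight, probability = odds / (1 + odds);
    # written in this form so large positive weights do not overflow.
    return 1.0 / (1.0 + 2.0 ** -match_weight)


def probability_to_match_weight(probability: float) -> float:
    """Inverse conversion; undefined for probabilities of exactly 0 or 1."""
    return math.log2(probability / (1.0 - probability))


# A weight of 0 means even odds, i.e. a probability of 0.5.
even_odds = match_weight_to_probability(0.0)
# Each extra unit of weight doubles the odds in favour of a match.
weight_for_90_percent = probability_to_match_weight(0.9)
```

Match weights are often easier to compare than probabilities close to 0 or 1, because a fixed step in weight always corresponds to the same multiplicative change in odds.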

There are several strategies for checking whether the links created in your pipeline perform as you expect.

Consider the Edge Metrics

Edge Metrics measure how links perform at an overall level.

First, consider how you would like your model to perform. What is important for your use case? Do you want to ensure that you capture all possible matches (i.e. high recall)? Or do you want to minimise the number of incorrectly predicted matches (i.e. high precision)? Perhaps a combination of both?

For a summary of all the edge metrics available in Splink, check out the Edge Metrics guide.


To produce Edge Metrics you will require a "ground truth" to compare your linkage results against (which can be obtained through Clerical Labelling).
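As a rough illustration of what comparing against a ground truth involves, the sketch below computes precision and recall for a set of labelled pairs at one threshold. It is plain Python rather than Splink's own tooling, and the `ScoredPair` type and the sample values are invented for the example.

```python
from typing import NamedTuple, Optional


class ScoredPair(NamedTuple):
    match_probability: float  # score produced by the model for this pair
    is_true_match: bool       # clerical label for the same pair


def edge_metrics(pairs: list[ScoredPair], threshold: float) -> dict[str, Optional[float]]:
    """Count true/false positives and false negatives at a threshold."""
    tp = sum(1 for p in pairs if p.match_probability >= threshold and p.is_true_match)
    fp = sum(1 for p in pairs if p.match_probability >= threshold and not p.is_true_match)
    fn = sum(1 for p in pairs if p.match_probability < threshold and p.is_true_match)
    # Precision and recall are undefined when their denominators are zero.
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


# Hypothetical labelled pairs, made up purely for illustration.
labelled_pairs = [
    ScoredPair(0.99, True),
    ScoredPair(0.95, True),
    ScoredPair(0.90, False),
    ScoredPair(0.60, True),
    ScoredPair(0.20, False),
    ScoredPair(0.05, False),
]

metrics = edge_metrics(labelled_pairs, threshold=0.8)
# Here tp=2, fp=1 and fn=1, so precision and recall are both 2/3.
```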

Spot Checking pairs of records

Spot Checking real examples of record pairs helps build confidence in linkage results. It is an effective way to build intuition for how the model works in practice and allows you to interrogate edge cases.

Results of individual record pairs can be examined with the Waterfall Chart.

Choosing which pairs of records to spot check can be done by either:

As you are checking real examples, you will often come across cases that have not been accounted for by your model which you believe signify a match (e.g. a fuzzy match for names). We recommend using this feedback loop to help iterate and improve the definition of your model.

Choosing a Threshold

Threshold selection is a key decision point within a linkage pipeline. One of the major benefits of probabilistic linkage versus a deterministic (i.e. rules-based) approach is the ability to choose the amount of evidence required for two records to be considered a match (i.e. a threshold).

When you have decided on the metrics that are important for your use case, you can use the Threshold Selection Tool to get a first estimate for what your threshold should be.


The Threshold Selection Tool requires labelled data to act as a "ground truth" to compare your linkage results against.

Once you have an initial threshold, you can use the Comparison Viewer Dashboard to look at records on either side of your threshold to check whether the threshold makes intuitive sense.
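If you want to pull out the borderline pairs yourself, one simple approach is to take the few pairs scoring closest to the threshold on each side. The sketch below is plain Python with a hypothetical `(pair_id, match_probability)` layout and invented scores; it is not how the Comparison Viewer Dashboard selects its examples.

```python
def pairs_around_threshold(scored_pairs, threshold, n_each_side=3):
    """Return the pairs nearest the threshold, below and above it.

    scored_pairs is a list of (pair_id, match_probability) tuples.
    """
    # Pairs just below the threshold: highest-scoring rejected pairs first.
    below = sorted(
        (p for p in scored_pairs if p[1] < threshold),
        key=lambda p: p[1],
        reverse=True,
    )
    # Pairs just above the threshold: lowest-scoring accepted pairs first.
    above = sorted(
        (p for p in scored_pairs if p[1] >= threshold),
        key=lambda p: p[1],
    )
    return below[:n_each_side], above[:n_each_side]


# Hypothetical scores, made up for illustration.
scored_pairs = [
    ("p1", 0.99),
    ("p2", 0.91),
    ("p3", 0.86),
    ("p4", 0.79),
    ("p5", 0.74),
    ("p6", 0.10),
]

below, above = pairs_around_threshold(scored_pairs, threshold=0.8, n_each_side=2)
# below -> [("p4", 0.79), ("p5", 0.74)]
# above -> [("p3", 0.86), ("p2", 0.91)]
```

If the pairs just above the threshold look like clear non-matches, or those just below look like obvious matches, that is a sign the threshold needs moving.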

From here, we recommend an iterative process of adjusting your threshold based on your spot checking, then looking at the impact this has on your overall edge metrics. Other tools that can be useful during this iterative process include the Precision-Recall Chart, the ROC Chart, and spot checking where the model has gone wrong.
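To make the trade-off concrete, the sketch below sweeps a handful of candidate thresholds over labelled pairs and reports precision and recall at each one, which is the information a precision-recall chart plots. It is plain Python with invented data and hypothetical function names, not Splink's implementation.

```python
def precision_recall_by_threshold(labelled, thresholds):
    """Return (threshold, precision, recall) for each candidate threshold.

    labelled is a list of (match_probability, is_true_match) tuples.
    """
    total_true = sum(1 for _, is_match in labelled if is_match)
    rows = []
    for t in thresholds:
        predicted = [is_match for prob, is_match in labelled if prob >= t]
        tp = sum(predicted)
        # Leave a metric as None when its denominator is zero.
        precision = tp / len(predicted) if predicted else None
        recall = tp / total_true if total_true else None
        rows.append((t, precision, recall))
    return rows


def lowest_threshold_meeting_precision(rows, min_precision):
    """Pick the lowest threshold whose precision reaches the target."""
    candidates = [t for t, precision, _ in rows
                  if precision is not None and precision >= min_precision]
    return min(candidates) if candidates else None


# Hypothetical labelled pairs, made up for illustration.
labelled = [(0.99, True), (0.95, True), (0.90, False),
            (0.60, True), (0.20, False), (0.05, False)]

rows = precision_recall_by_threshold(labelled, [0.1, 0.5, 0.8, 0.92])
chosen = lowest_threshold_meeting_precision(rows, min_precision=0.75)
# chosen -> 0.5 (precision 0.75, recall 1.0 on this toy data)
```

Recall can only fall as the threshold rises, so the lowest threshold that meets a precision target also gives the best recall among the qualifying thresholds. Precision is not guaranteed to rise steadily: in this toy data it dips at 0.8 before recovering at 0.92.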

In Summary

Evaluating the edges (links) of a linkage model depends on your use case. Defining what "good" looks like is a key step, which then allows you to choose a relevant metric (or metrics) for measuring success.

Your chosen metric should help give an initial estimate for a linkage threshold; you can then use spot checking to help settle on a final threshold.

In general, the links between pairs of records are not the final output of a linkage pipeline. Most use cases use these links to group records together into clusters. In this instance, evaluating the links themselves is not sufficient; you have to evaluate the resulting clusters as well.
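To see why edge quality alone does not settle cluster quality, the sketch below groups records by treating every edge at or above a threshold as a connection and taking connected components. This is one common clustering approach, written here in plain Python with a small union-find; the record identifiers and scores are invented for the example.

```python
def cluster_edges(edges, threshold):
    """Group records into connected components of above-threshold edges.

    edges is a list of (left_id, right_id, match_probability) tuples.
    """
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk to the root,
        # shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for left, right, prob in edges:
        root_left, root_right = find(left), find(right)
        # Only edges that clear the threshold merge two groups.
        if prob >= threshold and root_left != root_right:
            parent[root_right] = root_left

    clusters = {}
    for node in parent:
        clusters.setdefault(find(node), set()).add(node)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical edges, made up for illustration.
edges = [("a", "b", 0.99), ("b", "c", 0.95), ("c", "d", 0.30), ("e", "f", 0.97)]

clusters = cluster_edges(edges, threshold=0.9)
# clusters -> [["a", "b", "c"], ["d"], ["e", "f"]]
```

Records "a" and "c" end up in the same cluster even though no edge links them directly, so a single incorrect edge can merge two groups that should stay apart.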