Blog

January 29, 2026
in Production Splink pipelines
15 min read

Running Splink in Production at the Ministry of Justice

We have published plenty on record linkage theory, Splink's capabilities, and how to build a model. What we have not covered in depth is the engineering side of running linkage as a repeatable data product at scale. Splink gives you the statistical machinery, but it is intentionally unopinionated about how you productionise it.

Getting Splink to run once is usually straightforward. Getting it to run every week, across multiple datasets, while keeping outputs auditable and recoverable is the harder part.

This post sets out how we do that at the Ministry of Justice, how we keep the pipeline modular rather than fragile, and how we catch issues early enough to recover safely.

December 2, 2024
in Bias
5 min read

Bias in Data Linking, continued

This post is the second in our series dedicated to Bias in Data Linking. Here we wrap up work completed during the six-month Alan Turing Institute internship on 'Bias in Data Linking', and share some final thoughts.

August 19, 2024
in Bias
6 min read

Bias in Data Linking

In March 2024, the Splink team launched a six-month 'Bias in Data Linking' internship with the Alan Turing Institute. This installment of the Splink Blog introduces the internship and its goals, and provides an update on what has happened so far.

July 24, 2024
in Feature Updates
4 min read

Splink 4.0.0 released

We're pleased to release Splink 4, which is more scalable and easier to use than Splink 3.

For the uninitiated, Splink is a free and open source library for record linkage and deduplication at scale, capable of deduplicating more than 100 million records. It is widely used and has been downloaded over 8 million times.

Version 4 is recommended for all new users. For existing users, there has been no change to the statistical methodology. Versions 3 and 4 will give the same results, so there's no urgency to upgrade existing pipelines.

April 2, 2024
in Feature Updates
5 min read

Splink 3 updates, and Splink 4 development announcement - April 2024

This post describes significant updates to Splink since our previous post and details of development work taking place on the forthcoming release of Splink 4.

January 23, 2024
in Ethics
5 min read

Ethics in Data Linking

Welcome to the next installment of the Splink Blog where we’re talking about Data Ethics!

Why should we care about ethics?

Splink was developed in-house at the UK Government’s Ministry of Justice. As data scientists in government, we are accountable to the public and have a duty to maintain public trust. This includes upholding high standards of data ethics in our work.

December 6, 2023
in Feature Updates
4 min read

Splink Updates - December 2023

Welcome to the second installment of the Splink Blog!

Here are some of the highlights from the second half of 2023, and a taste of what is in store for 2024!

July 27, 2023
in Feature Updates
6 min read

Splink Updates - July 2023

Welcome to the Splink Blog!

It's hard to keep up to date with all of the new features being added to Splink, so we have launched this blog to share a roundup of the latest developments every few months.

So, without further ado, here are some of the highlights from the first half of 2023!