Blog

Bias in Data Linking

In March 2024, the Splink team launched a 6-month 'Bias in Data Linking' internship with the Alan Turing Institute. This installment of the Splink Blog introduces the internship and its goals, and provides an update on what has happened so far.

We're pleased to release Splink 4, which is more scalable and easier to use than Splink 3.

For the uninitiated, Splink is a free and open source library for record linkage and deduplication at scale. It is capable of deduplicating 100 million+ records, is widely used, and has been downloaded over 8 million times.

Version 4 is recommended for all new users. For existing users, there has been no change to the statistical methodology. Versions 3 and 4 will give the same results, so there's no urgency to upgrade existing pipelines.

Ethics in Data Linking

Welcome to the next installment of the Splink Blog, where we’re talking about Data Ethics!

❓ Why should we care about ethics?

Splink was developed in-house at the UK Government’s Ministry of Justice. As data scientists in government, we are accountable to the public and have a duty to maintain public trust. This includes upholding high standards of data ethics in our work.