This blog is the second in our series dedicated to Bias in Data Linking. Here we wrap up work completed during the six-month Alan Turing Institute internship on 'Bias in Data Linking', and share some final thoughts.
In March 2024, the Splink team launched a 6-month 'Bias in Data Linking' internship with the Alan Turing Institute. This installment of the Splink Blog introduces the internship and its goals, and provides an update on what's happened so far.
We're pleased to release Splink 4, which is more scalable and easier to use than Splink 3.
For the uninitiated, Splink is a free and open source library for record linkage and deduplication at scale, capable of deduplicating more than 100 million records. It is widely used and has been downloaded over 8 million times.
Version 4 is recommended to all new users. For existing users, there has been no change to the statistical methodology. Versions 3 and 4 will give the same results, so there's no urgency to upgrade existing pipelines.
This post describes significant updates to Splink since our previous post and gives details of the development work taking place on the forthcoming release of Splink 4.
Splink was developed in-house at the UK Government’s Ministry of Justice. As data scientists in government, we are accountable to the public and have a duty to maintain public trust. This includes upholding high standards of data ethics in our work.
It's hard to keep up to date with all of the new features being added to Splink, so we have launched this blog to share a roundup of the latest developments every few months.
So, without further ado, here are some of the highlights from the first half of 2023!