Introductory tutorial
This is the introduction to a seven-part tutorial that demonstrates how to de-duplicate a small dataset using simple settings.
The aim of the tutorial is to demonstrate core Splink functionality succinctly, rather than to document all configuration options comprehensively.
The seven parts are:
Throughout the tutorial, we use the duckdb backend, which is the recommended option for smaller datasets of up to around 1 million records on a normal laptop.
You can find these tutorial notebooks in the docs/demos/tutorials/ folder of the splink repo, or click the Colab links to run in your browser.
End-to-end demos
After working through the tutorial, you may find it useful to look at the example notebooks, which show various Splink use cases from start to finish.
Interactive Introduction to Record Linkage Theory
If you'd like to learn more about record linkage theory, an interactive introduction is available here.