Introductory tutorial

This is the introduction to a seven-part tutorial that demonstrates how to de-duplicate a small dataset using simple settings.

The aim of the tutorial is to demonstrate core Splink functionality succinctly, rather than to document all configuration options comprehensively.

The seven parts are:

Throughout the tutorial, we use the duckdb backend, which is the recommended option for smaller datasets of up to around 1 million records on a normal laptop.

You can find these tutorial notebooks in the docs/demos/tutorials/ folder of the splink repo, or click the Colab links to run in your browser.

End-to-end demos

After working through the tutorial, you may find it useful to look at the example notebooks, which show Splink applied to a range of use cases from start to finish.

Interactive Introduction to Record Linkage Theory

If you'd like to learn more about record linkage theory, an interactive introduction is available here.

LLM prompts

If you're using an LLM to suggest Splink code, see here for suggested prompts and context.