
Introductory tutorial

This is the introduction to a five-part tutorial that demonstrates how to de-duplicate a small dataset using simple settings.

The aim of the tutorial is to demonstrate core Splink functionality succinctly, rather than to comprehensively document all configuration options.

The five parts are:

Throughout the tutorial, we use the DuckDB backend, which is the recommended option for smaller datasets of up to around 1 million records on a typical laptop.

You can find these tutorial notebooks in the splink_demos repo, and you can run them live in your web browser by clicking the following link:

Binder