Topic Guides

This section contains in-depth guides on a variety of topics and concepts within Splink, as well as data linking more generally. These are intended to provide an extra layer of detail on top of the Splink tutorial and examples.

The topic guides are broken up into the following categories:

  1. Record Linkage Theory - for an introduction to data linkage from a theoretical perspective, and to help build some intuition around the parameters being estimated in Splink models.
  2. Linkage Models in Splink - for an introduction to the building blocks of a Splink model, including the supported SQL backends and how to define a model with a Splink Settings dictionary.
  3. Data Preparation - for guidance on preparing your data for linkage, including feature engineering to help improve Splink models.
  4. Blocking - for an introduction to Blocking Rules and their purpose within record linkage, including how blocking rules are used in different contexts within Splink.
  5. Comparing Records - for guidance on defining Comparisons within a Splink model, including how record comparisons are structured within Comparisons, how to utilise string comparators for fuzzy matching, and how to deal with skewed data using Term Frequency Adjustments.
  6. Model Training - for guidance on the methods for training a Splink model, and how to choose between them for specific use cases. (Coming soon)
  7. Clustering - for guidance on how records are clustered together. (Coming soon)
  8. Evaluation - for guidance on how to evaluate Splink models, links and clusters (including Clerical Labelling).
  9. Performance - for guidance on how to make Splink models run more efficiently.