Splink
Clerical Labelling
This page is under construction - check back soon!