Splink
Settings Editor