Splink
Interactive Settings Editor