Contributing to Splink

We encourage contributions from all users. Whether you are adding a feature, fixing a bug, or correcting typos in our documentation, we are extremely appreciative of the input of external contributors. Splink would not be as good without it!

Thank you for your interest in contributing to Splink! If this is your first time working with Splink, check our Contributors Guide.

When making changes to Splink, there are a number of common operations that developers need to perform. The guides below lay out some of these common operations and provide scripts to automate them. These include:

Splink is quite a large, complex codebase. The guides in this section lay out some of the key structures and key areas within the Splink codebase. These include:

  • Understanding and Debugging Splink - demonstrates several ways of understanding how Splink code is running under the hood. This includes Splink's debug mode and logging.
  • Transpilation using SQLGlot - demonstrates how Splink translates SQL in order to be compatible with multiple SQL engines using the SQLGlot package.
  • Performance and caching - demonstrates how pipelining and caching are used to make Splink run more efficiently.
  • Comparison and Comparison Level Libraries - demonstrates how Comparison Library and Comparison Level Library functions are structured within Splink, including how to add new functions and edit existing functions.
  • Charts - demonstrates how charts are built in Splink, including how to add new charts and edit existing charts.
  • User-Defined Functions - demonstrates how User-Defined Functions (UDFs) are used to provide functionality within Splink that is not native to a given SQL backend.
  • Settings Validation - summarises how to use and expand the existing settings schema and validation functions.
  • Managing Splink's Dependencies - provides guidelines for managing our core dependencies and describes our strategy for phasing out Python versions that have reached their end-of-life.
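
As a rough illustration of the logging approach mentioned under Understanding and Debugging Splink, the sketch below raises the verbosity of a named logger using only Python's standard `logging` module. It assumes that Splink emits its log records under a logger named `splink`; the helper name `enable_verbose_logging` is hypothetical and not part of Splink. Refer to the debugging guide for the levels and options Splink actually supports.

```python
import logging

# Assumption: the library emits log records under a logger named "splink".
SPLINK_LOGGER_NAME = "splink"


def enable_verbose_logging(level: int = logging.DEBUG) -> logging.Logger:
    """Lower the named logger's threshold and make sure it has a stream handler.

    Hypothetical helper for illustration; it is not part of Splink.
    """
    logger = logging.getLogger(SPLINK_LOGGER_NAME)
    logger.setLevel(level)

    # Avoid stacking duplicate handlers if the helper is called more than once.
    if not any(isinstance(h, logging.StreamHandler) for h in logger.handlers):
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(name)s %(levelname)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = enable_verbose_logging()
```

Calling the helper once before running a linkage job should surface any messages the library logs at `DEBUG` level or above; passing `logging.INFO` instead gives a quieter trace.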