
Documentation for the blocking_rule_library

CustomRule(blocking_rule, sql_dialect=None, salting_partitions=None, arrays_to_explode=None)

Bases: BlockingRuleCreator

Represents a custom blocking rule using a user-defined SQL condition. To refer to the left-hand side and the right-hand side of the pairwise record comparison, use l and r respectively, e.g. l.first_name = r.first_name and len(l.first_name) < 2.
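To make the role of the `l` and `r` aliases concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module to run a blocking condition as the join condition of a self-join over a toy table. This is an illustration of the idea only, not the SQL Splink generates. The table layout, the `candidate_pairs` helper and the `l.unique_id < r.unique_id` de-duplication filter are assumptions of this sketch. SQLite spells the string-length function `length`, so it is used here in place of the `len` shown above; function names differ between SQL dialects.

```python
import sqlite3

# Illustrative sketch only: a condition written in terms of the aliases `l`
# and `r` acts as the join condition of a self-join that generates candidate
# record pairs. This is NOT Splink's internal SQL.


def candidate_pairs(records, blocking_condition):
    """Return (left_id, right_id) pairs that satisfy `blocking_condition`.

    `records` is a list of (unique_id, first_name, postcode) tuples.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE df (unique_id INTEGER, first_name TEXT, postcode TEXT)")
    con.executemany("INSERT INTO df VALUES (?, ?, ?)", records)
    # `l.unique_id < r.unique_id` keeps each unordered pair once and drops
    # self-comparisons (an assumption of this sketch).
    sql = f"""
        SELECT l.unique_id, r.unique_id
        FROM df AS l
        INNER JOIN df AS r
          ON ({blocking_condition}) AND l.unique_id < r.unique_id
        ORDER BY l.unique_id, r.unique_id
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


records = [
    (1, "Amy", "AB1 2CD"),
    (2, "Amy", "AB1 2CD"),
    (3, "Bob", "AB1 2CD"),
    (4, "Al", "ZZ9 9ZZ"),
]

print(candidate_pairs(records, "l.postcode = r.postcode"))
# [(1, 2), (1, 3), (2, 3)]
print(candidate_pairs(records, "l.first_name = r.first_name AND length(l.first_name) < 4"))
# [(1, 2)]
```

Only pairs that satisfy the condition are ever generated, which is why a blocking rule limits the number of comparisons that have to be scored.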

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `blocking_rule` | `str` | A SQL condition string representing the custom blocking rule. | required |
| `sql_dialect` | `str` | The SQL dialect of the provided blocking rule. If specified, Splink will attempt to translate the rule to the appropriate dialect. | `None` |
| `salting_partitions` | `int` | The number of partitions to use for salting. If provided, enables salting for this blocking rule. | `None` |
| `arrays_to_explode` | `list[str]` | A list of array column names to explode before applying the blocking rule. | `None` |
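The idea behind `salting_partitions` can be shown with a small concept sketch: the work for one blocking rule is split into several partitions, and taking the union of the partitions gives back the same set of pairs. The sketch assumes that each record carries a salt value in `[0, N)` and that partition `i` keeps only pairs whose left-hand record has salt `i`. The `salt` and `city_pairs` helpers are hypothetical, and the deterministic `id % N` salt is a stand-in chosen for reproducibility; this is not Splink's implementation.

```python
from itertools import combinations

N_PARTITIONS = 3  # plays the role of `salting_partitions`

records = [
    {"id": i, "city": c}
    for i, c in enumerate(["Leeds", "Leeds", "York", "Leeds", "York", "Hull"])
]


def salt(record_id, n_partitions):
    # Hypothetical deterministic salt; a real implementation could use a random value.
    return record_id % n_partitions


def city_pairs(records, partition=None, n_partitions=1):
    """Pairs (l_id, r_id), l_id < r_id, satisfying l.city = r.city.

    If `partition` is given, only pairs whose LEFT record has that salt are kept,
    mimicking one salted sub-rule.
    """
    pairs = set()
    ordered = sorted(records, key=lambda rec: rec["id"])
    for l, r in combinations(ordered, 2):
        if l["city"] != r["city"]:
            continue
        if partition is not None and salt(l["id"], n_partitions) != partition:
            continue
        pairs.add((l["id"], r["id"]))
    return pairs


unsalted = city_pairs(records)
salted = [city_pairs(records, partition=i, n_partitions=N_PARTITIONS) for i in range(N_PARTITIONS)]
combined = set().union(*salted)

print(sorted(unsalted))             # [(0, 1), (0, 3), (1, 3), (2, 4)]
print([sorted(p) for p in salted])  # [[(0, 1), (0, 3)], [(1, 3)], [(2, 4)]]
print(combined == unsalted)         # True
```

Because every pair has exactly one left-hand record, and therefore exactly one salt, the partitions are disjoint and together reproduce the unsalted result. Only the distribution of work changes.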

Examples:

```python
from splink.blocking_rule_library import CustomRule

# Simple custom rule
rule_1 = CustomRule("l.postcode = r.postcode")

# Custom rule with dialect translation
rule_2 = CustomRule(
    "SUBSTR(l.surname, 1, 3) = SUBSTR(r.surname, 1, 3)",
    sql_dialect="sqlite",
)

# Custom rule with salting
rule_3 = CustomRule(
    "l.city = r.city",
    salting_partitions=10,
)
```

block_on(*col_names_or_exprs, salting_partitions=None, arrays_to_explode=None)

Generates blocking rules of equality conditions based on the columns or SQL expressions specified.

When multiple columns or SQL snippets are provided, the function generates a compound blocking rule, connecting individual match conditions with "AND" clauses.
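The way several columns are combined into one compound equality rule can be sketched as simple string building. The `equality_rule` helper below is hypothetical. It only handles bare column names and does not parse SQL expressions such as `substr(surname,1,2)`, and the exact SQL text Splink produces (quoting, parentheses) may differ.

```python
def equality_rule(*col_names: str) -> str:
    """Hypothetical helper: build an AND-joined equi-join condition.

    Each column becomes `l.<col> = r.<col>`; the conditions are joined with AND.
    Only bare column names are supported.
    """
    if not col_names:
        raise ValueError("at least one column name is required")
    return " AND ".join(f"l.{c} = r.{c}" for c in col_names)


print(equality_rule("first_name"))
# l.first_name = r.first_name
print(equality_rule("first_name", "surname"))
# l.first_name = r.first_name AND l.surname = r.surname
```

A pair of records passes the compound rule only if it agrees on every listed column.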

Further information on equi-join conditions can be found here

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `col_names_or_exprs` | `Union[str, ColumnExpression]` | One or more column names or SQL expressions to block on. | `()` |
| `salting_partitions` | `int`, optional | The number of salting partitions to use. If provided, salting is added to the blocking rule. More information on salting can be found within the docs. | `None` |
| `arrays_to_explode` | `List[str]`, optional | List of array columns to explode before applying the blocking rule. | `None` |
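Conceptually, `arrays_to_explode` lets an array column take part in an equality rule: each record is emitted once per array element, the rule is applied to the exploded value, and duplicate pairs are removed. The pure-Python sketch below shows that idea on toy data. The `exploded_pairs` helper and the record layout are hypothetical, and this is not how Splink implements explosion in SQL.

```python
from collections import defaultdict
from itertools import combinations


def exploded_pairs(records, array_col):
    """Candidate pairs (l_id, r_id) sharing at least one element of `array_col`.

    Mimics 'explode, then block on equality of the exploded value'.
    """
    ids_by_element = defaultdict(set)
    for rec in records:
        for element in rec[array_col]:  # the "explode" step: one row per element
            ids_by_element[element].add(rec["id"])

    pairs = set()
    for ids in ids_by_element.values():  # equality on the exploded value
        for l_id, r_id in combinations(sorted(ids), 2):
            # The set removes pairs found via more than one shared element.
            pairs.add((l_id, r_id))
    return pairs


records = [
    {"id": 1, "postcodes": ["AB1", "CD2"]},
    {"id": 2, "postcodes": ["CD2", "EF3"]},
    {"id": 3, "postcodes": ["EF3"]},
    {"id": 4, "postcodes": ["AB1", "CD2"]},
]

print(sorted(exploded_pairs(records, "postcodes")))
# [(1, 2), (1, 4), (2, 3), (2, 4)]
```

Records 1 and 4 share two postcodes but appear as a single pair, which is why de-duplication is needed after exploding.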

Examples:

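```python
from splink import block_on

# Block on exact agreement of first_name
br_1 = block_on("first_name")

# Compound rule: the first two characters of surname AND the full surname must match
br_2 = block_on("substr(surname,1,2)", "surname")
```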