Documentation for the blocking_rule_library

CustomRule(blocking_rule, sql_dialect=None, salting_partitions=None, arrays_to_explode=None)

Bases: BlockingRuleCreator

Represents a custom blocking rule defined by a user-supplied SQL condition. Use l and r to refer to the left-hand and right-hand records of the pairwise comparison, e.g. l.first_name = r.first_name and len(l.first_name) < 2.

Parameters:

blocking_rule (str, required): A SQL condition string representing the custom blocking rule.

sql_dialect (str, default None): The SQL dialect of the provided blocking rule. If specified, Splink will attempt to translate the rule to the appropriate dialect.

salting_partitions (int, default None): The number of partitions to use for salting. If provided, enables salting for this blocking rule.

arrays_to_explode (list[str], default None): A list of array column names to explode before applying the blocking rule.

Examples:

from splink.blocking_rule_library import CustomRule

# Simple custom rule
rule_1 = CustomRule("l.postcode = r.postcode")

# Custom rule with dialect translation
rule_2 = CustomRule(
    "SUBSTR(l.surname, 1, 3) = SUBSTR(r.surname, 1, 3)",
    sql_dialect="sqlite"
)

# Custom rule with salting
rule_3 = CustomRule(
    "l.city = r.city",
    salting_partitions=10
)
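To illustrate what salting_partitions does conceptually, the sketch below splits one blocking condition into several narrower conditions, each covering a slice of the left-hand records. It is a simplified illustration in plain Python, not Splink's internal implementation: the helper name sketch_salted_conditions and the salt column are hypothetical, and the SQL that Splink actually generates may differ.

```python
def sketch_salted_conditions(blocking_rule: str, salting_partitions: int) -> list[str]:
    """Split one blocking condition into `salting_partitions` narrower conditions.

    Assumption: each left-hand record carries a hypothetical column `salt`
    holding a uniform random number in [0, 1). The name is illustrative only.
    """
    if salting_partitions < 1:
        raise ValueError("salting_partitions must be at least 1")

    conditions = []
    for partition in range(salting_partitions):
        # floor(salt * n) maps each record to exactly one bucket in 0..n-1,
        # so every record pair satisfying the original rule matches one condition.
        salt_filter = f"floor(l.salt * {salting_partitions}) = {partition}"
        conditions.append(f"({blocking_rule}) AND {salt_filter}")
    return conditions


conditions = sketch_salted_conditions("l.city = r.city", 3)
for condition in conditions:
    print(condition)
```

Taken together, the narrower conditions cover the same record pairs as the original rule, but each one produces a smaller join, which is useful when a few blocking values (such as a very common city) would otherwise dominate the workload.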

block_on(*col_names_or_exprs, salting_partitions=None, arrays_to_explode=None)

Generates blocking rules of equality conditions based on the columns or SQL expressions specified.

When multiple columns or SQL snippets are provided, the function generates a compound blocking rule, connecting individual match conditions with "AND" clauses.

Further information on equi-join conditions can be found in the Splink documentation.
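The way equality conditions are combined with "AND" can be sketched in plain Python. The helper below is hypothetical and handles bare column names only, not SQL expressions; the SQL that Splink actually generates may differ in quoting and formatting.

```python
def sketch_block_on_condition(*col_names: str) -> str:
    """Build an AND-joined equality condition for plain column names.

    Hypothetical helper for illustration; not part of Splink.
    """
    if not col_names:
        raise ValueError("at least one column name is required")

    # One equality per column, comparing the left (l) and right (r) record.
    equalities = [f"l.{col} = r.{col}" for col in col_names]
    return " AND ".join(equalities)


condition = sketch_block_on_condition("first_name", "surname")
print(condition)  # l.first_name = r.first_name AND l.surname = r.surname
```

Because every clause is an equality between the two sides, the compound condition remains an equi-join, which SQL engines can typically execute far more efficiently than an arbitrary predicate.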

Parameters:

col_names_or_exprs (Union[str, ColumnExpression], default ()): One or more input column names or SQL expressions to create blocks on.

salting_partitions (int, optional, default None): The number of partitions to use for salting. If provided, enables salting for this blocking rule. More information on salting can be found within the docs.

arrays_to_explode (List[str], optional, default None): A list of array column names to explode before applying the blocking rule.

Examples:

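from splink import block_on

# Block on exact equality of first_name
br_1 = block_on("first_name")

# Compound rule: the first two characters of surname match AND surname matches
br_2 = block_on("substr(surname,1,2)", "surname")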
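The effect of arrays_to_explode can be illustrated without Splink. The sketch below, in plain Python, explodes a hypothetical postcodes array column into one row per element and then pairs records whose exploded values are equal. The data, column name, and explode helper are assumptions made for illustration, not Splink's implementation.

```python
from itertools import product


def explode(records, array_col):
    """Return one copy of each record per element of its array column."""
    exploded = []
    for record in records:
        for value in record[array_col]:
            exploded.append({**record, array_col: value})
    return exploded


# Hypothetical input: each record holds an array of postcodes.
records = [
    {"id": 1, "postcodes": ["AB1", "CD2"]},
    {"id": 2, "postcodes": ["CD2"]},
    {"id": 3, "postcodes": ["EF3"]},
]

rows = explode(records, "postcodes")

# Pair records whose exploded values are equal (l.postcodes = r.postcodes),
# keeping each pair once and dropping duplicates with a set.
candidate_pairs = sorted({
    (left["id"], right["id"])
    for left, right in product(rows, rows)
    if left["id"] < right["id"] and left["postcodes"] == right["postcodes"]
})
print(candidate_pairs)  # [(1, 2)]
```

Records 1 and 2 are paired because their arrays share the element "CD2", even though the arrays themselves are not equal.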