Documentation for the blocking_rule_library
CustomRule(blocking_rule, sql_dialect=None, salting_partitions=None, arrays_to_explode=None)
Bases: BlockingRuleCreator
Represents a custom blocking rule using a user-defined SQL condition. To refer to the left hand side and the right hand side of the pairwise record comparison, use `l` and `r` respectively, e.g. `l.first_name = r.first_name and len(l.first_name) < 2`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
blocking_rule | str | A SQL condition string representing the custom blocking rule. | required |
sql_dialect | str | The SQL dialect of the provided blocking rule. If specified, Splink will attempt to translate the rule to the appropriate dialect. | None |
salting_partitions | int | The number of partitions to use for salting. If provided, enables salting for this blocking rule. | None |
arrays_to_explode | list[str] | A list of array column names to explode before applying the blocking rule. | None |
Examples:
from splink.blocking_rule_library import CustomRule
# Simple custom rule
rule_1 = CustomRule("l.postcode = r.postcode")
# Custom rule with dialect translation
rule_2 = CustomRule(
    "SUBSTR(l.surname, 1, 3) = SUBSTR(r.surname, 1, 3)",
    sql_dialect="sqlite"
)
# Custom rule with salting
rule_3 = CustomRule(
    "l.city = r.city",
    salting_partitions=10
)
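To make the `l` / `r` convention and the idea behind salting more concrete, the following is a minimal sketch using only Python's standard `sqlite3` module. It does not use Splink and is not a description of Splink's internals: the `people` table, the `salt` column, the `candidate_pairs` helper and the bucketing expression are all assumptions made for illustration. It shows a blocking condition being applied as a self-join, and that splitting the same condition into several salted sub-rules yields the same set of candidate pairs.

```python
import sqlite3

# Hypothetical input table; `salt` stands in for a per-record random number in [0, 1).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (unique_id INTEGER, city TEXT, salt REAL)")
rows = [
    (1, "Leeds", 0.05),
    (2, "Leeds", 0.35),
    (3, "Leeds", 0.65),
    (4, "York", 0.20),
    (5, "York", 0.90),
    (6, "Bath", 0.50),
]
conn.executemany("INSERT INTO people VALUES (?, ?, ?)", rows)


def candidate_pairs(condition):
    """Self-join the table, keeping pairs that satisfy the blocking condition.

    `l` and `r` are the aliases for the left and right record of each pair.
    """
    sql = f"""
        SELECT l.unique_id, r.unique_id
        FROM people AS l
        JOIN people AS r
          ON {condition}
        WHERE l.unique_id < r.unique_id
    """
    return set(conn.execute(sql).fetchall())


blocking_rule = "l.city = r.city"
unsalted = candidate_pairs(blocking_rule)

# Salting (illustrative): split one rule into n sub-rules, each restricted to
# the left-hand records whose salt falls into one bucket, then union the results.
n_partitions = 3
salted = set()
for k in range(n_partitions):
    sub_rule = f"{blocking_rule} AND CAST(l.salt * {n_partitions} AS INTEGER) = {k}"
    salted |= candidate_pairs(sub_rule)

print(sorted(unsalted))
print(salted == unsalted)
```

Each sub-rule covers a disjoint slice of the left-hand records, so the union of the sub-joins reproduces the unsalted result while giving an engine several smaller joins to distribute.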
block_on(*col_names_or_exprs, salting_partitions=None, arrays_to_explode=None)
Generates blocking rules of equality conditions based on the columns or SQL expressions specified.
When multiple columns or SQL snippets are provided, the function generates a compound blocking rule, connecting individual match conditions with "AND" clauses.
Further information on equi-join conditions can be found in the Splink documentation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col_names_or_exprs | Union[str, ColumnExpression] | A list of input columns or SQL conditions you wish to create blocks on. | () |
salting_partitions | (optional, int) | The number of salting partitions. If provided, salting is added to the blocking rule. More information on salting can be found within the docs. | None |
arrays_to_explode | (optional, List[str]) | List of arrays to explode before applying the blocking rule. | None |
Examples:
from splink import block_on
br_1 = block_on("first_name")
br_2 = block_on("substr(surname,1,2)", "surname")
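The compound rule described above can be pictured as one equality condition per input, joined with "AND". The sketch below is a plain-Python illustration of that idea only. The helpers `equality_condition` and `compound_rule` are hypothetical and are not part of Splink, the double-quoting of column names is an assumption, and the sketch handles plain column names only, not SQL expressions such as `substr(surname,1,2)`.

```python
def equality_condition(col_name):
    """Build an equality condition between the left (l) and right (r) record."""
    return f'l."{col_name}" = r."{col_name}"'


def compound_rule(*col_names):
    """Join one equality condition per column with AND (hypothetical helper)."""
    if not col_names:
        raise ValueError("at least one column name is required")
    return " AND ".join(equality_condition(c) for c in col_names)


rule = compound_rule("first_name", "surname")
print(rule)
# l."first_name" = r."first_name" AND l."surname" = r."surname"
```

A record pair is only generated when every condition in the compound rule holds, so adding columns makes a rule stricter and reduces the number of candidate pairs.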