Documentation for blocking_rules_library
The blocking_rules_library contains a series of pre-made blocking rules for use in constructing blocking rule strategies and EM training blocks, as described in this topic guide.
These conform to a more performant standard that is outlined in detail here.
The detailed API for each of these is outlined below.
Blocking Rule APIs
The block_on function generates blocking rules that facilitate efficient equi-joins based on the columns or SQL statements specified in the col_names argument. When multiple columns or SQL snippets are provided, the function generates a compound blocking rule, connecting individual match conditions with "AND" clauses.
This function is designed for scenarios where you aim to achieve efficient yet straightforward blocking conditions based on one or more columns or SQL snippets.
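To make the compound condition concrete, here is a minimal sketch of how a list of columns or SQL snippets could be turned into a single equi-join condition joined with "AND". It is an illustration only and is not Splink's implementation: the helpers `qualify` and `equi_join_sql`, the `l`/`r` table aliases, and the explicit `columns` argument are all assumptions made for this example.

```python
import re


def qualify(expr, alias, columns):
    """Prefix every known column name in `expr` with a table alias.

    Hypothetical helper for illustration; it uses a simple whole-word
    regex rather than a real SQL parser.
    """
    pattern = r"\b(" + "|".join(re.escape(c) for c in columns) + r")\b"
    return re.sub(pattern, lambda m: f"{alias}.{m.group(1)}", expr)


def equi_join_sql(col_names, columns):
    """Build one compound equi-join condition from columns or SQL snippets.

    Hypothetical helper for illustration; not part of Splink's API.
    """
    # Accept a single string as shorthand for a one-element list
    if isinstance(col_names, str):
        col_names = [col_names]
    # Each entry becomes "left expression = right expression"
    parts = [
        f"{qualify(expr, 'l', columns)} = {qualify(expr, 'r', columns)}"
        for expr in col_names
    ]
    # Individual match conditions are connected with AND
    return " AND ".join(parts)


known_columns = ["first_name", "surname"]
print(equi_join_sql("first_name", known_columns))
print(equi_join_sql(["substr(surname,1,2)", "surname"], known_columns))
```

Because every condition is an equality between the same expression on both sides of the join, a SQL engine can typically evaluate it with a hash join instead of comparing every pair of records.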
For more information on the intended use cases of block_on, please see the following discussion.
Further information on equi-join conditions can be found here.
This function acts as a shorthand alias for the brl.and_ syntax:
import splink.duckdb.blocking_rule_library as brl
brl.and_(brl.exact_match_rule, brl.exact_match_rule, ...)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col_names | list[str] | A list of input columns or SQL conditions you wish to create blocks on. | required |
salting_partitions | (optional, int) | Whether to add salting to the blocking rule. More information on salting can be found within the docs. Salting is only valid for Spark. | 1 |
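As a rough illustration of what the salting_partitions argument controls, the sketch below splits one blocking condition into several smaller ones, each restricted to a slice of a random "salt" value. This is a sketch under stated assumptions, not Splink's implementation: the function `salted_rules`, the column name `salt`, and the assumption that each left-hand record carries a uniform random value in [0, 1) are all hypothetical.

```python
def salted_rules(blocking_rule_sql, salting_partitions=1):
    """Split a blocking condition into `salting_partitions` narrower ones.

    Hypothetical helper for illustration; assumes a column `salt` that
    holds a uniform random value in [0, 1) for every left-hand record.
    """
    if salting_partitions < 1:
        raise ValueError("salting_partitions must be at least 1")
    # A single partition means no salting: the rule is left unchanged
    if salting_partitions == 1:
        return [blocking_rule_sql]
    # floor(salt * n) maps each record to exactly one bucket in 0..n-1,
    # so the salted rules together cover the same pairs as the original
    return [
        f"{blocking_rule_sql} AND floor(l.salt * {salting_partitions}) = {k}"
        for k in range(salting_partitions)
    ]


for rule in salted_rules("l.surname = r.surname", salting_partitions=3):
    print(rule)
```

Each salted rule produces a smaller join, which can help spread heavily skewed blocks (for example, a very common surname) across more workers.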
Examples:
from splink.duckdb.blocking_rule_library import block_on
block_on("first_name") # check for exact matches on first name
sql = "substr(surname,1,2)"
block_on([sql, "surname"])
from splink.spark.blocking_rule_library import block_on
block_on("first_name") # check for exact matches on first name
sql = "substr(surname,1,2)"
block_on([sql, "surname"], salting_partitions=1)
from splink.athena.blocking_rule_library import block_on
block_on("first_name") # check for exact matches on first name
sql = "substr(surname,1,2)"
block_on([sql, "surname"])
from splink.sqlite.blocking_rule_library import block_on
block_on("first_name") # check for exact matches on first name
sql = "substr(surname,1,2)"
block_on([sql, "surname"])
from splink.postgres.blocking_rule_library import block_on
block_on("first_name") # check for exact matches on first name
sql = "substr(surname,1,2)"
block_on([sql, "surname"])
Source code in splink/blocking_rules_library.py
Represents an exact match blocking rule.
DEPRECATED: exact_match_rule is deprecated. Please use block_on instead, which acts as a wrapper with additional functionality.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col_name | str | Input column name, or a string representing a SQL statement you'd like to match on. For example, | required |
salting_partitions | (optional, int) | Whether to add salting to the blocking rule. More information on salting can be found within the docs. Salting is currently only valid for Spark. | None |
Source code in splink/blocking_rules_library.py