API reference

AddressMatcher

`AddressMatcher`

Primary entry point for address matching.

For canonical addresses, accepts either a raw DuckDBPyRelation (cleaned on the fly) or a str / Path pointing to a folder created by prepare_canonical_folder. Messy addresses can be a DuckDB relation, a list of AddressRecord objects, or a list of dicts.

Stages default to [ExactMatchStage(), SplinkStage()]. Pass your own list to customise matching behaviour; the existing stage dataclasses (ExactMatchStage, UniqueTrigramStage, SplinkStage) already expose the configuration options you are likely to need.

Parameters:

Name	Type	Description	Default
`canonical_addresses`	`Union[DuckDBPyRelation, str, Path]`	Canonical dataset to match against. Can be a `DuckDBPyRelation` or a path to a prepared canonical folder.	required
`canonical_address_filter`	`str \| None`	Optional DuckDB SQL expression used to filter canonical addresses after load (for prepared folders) or directly on the provided canonical relation.	`None`
`addresses_to_match`	`Union[DuckDBPyRelation, list[AddressRecord], list[dict]]`	Messy addresses to resolve. Can be a `DuckDBPyRelation`, a list of `AddressRecord`, or a list of dicts with `address_concat`, `postcode`, and `unique_id` fields.	required
`con`	`DuckDBPyConnection`	DuckDB connection to use for all operations.	required
`stages`	`Optional[list[MatchingStage]]`	Optional list of `MatchingStage` instances defining the matching pipeline. Defaults to exact match followed by Splink.	`None`
`debug_options`	`Optional[DebugOptions]`	Optional `DebugOptions` to control debug output and logging.	`None`

Examples:

Simple matching:

import duckdb
from uk_address_matcher import AddressMatcher

con = duckdb.connect()
canonical = con.read_parquet("./canonical.parquet")
messy = con.read_parquet("./messy.parquet")

matcher = AddressMatcher(
    canonical_addresses=canonical,
    addresses_to_match=messy,
    con=con,
)
result = matcher.match()

Custom stages:

from uk_address_matcher import (
    AddressMatcher, ExactMatchStage, UniqueTrigramStage, SplinkStage,
)

matcher = AddressMatcher(
    canonical_addresses=canonical,
    addresses_to_match=messy,
    con=con,
    stages=[
        ExactMatchStage(),
        UniqueTrigramStage(),
        SplinkStage(),
    ],
)
result = matcher.match()

Pre-prepared canonical data:

matcher = AddressMatcher(
    canonical_addresses="./prepared_addressbase",
    addresses_to_match=messy,
    con=con,
)
result = matcher.match()
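
List-of-dicts input (a sketch):

The parameter table says `addresses_to_match` may be a list of dicts with `address_concat`, `postcode`, and `unique_id` fields. The snippet below only illustrates that shape and checks it before handing the list over. `missing_keys` is a hypothetical helper, not part of the library, and the sample addresses and ID format are made up for illustration.

```python
# Required keys, as listed in the parameter table for `addresses_to_match`.
REQUIRED_KEYS = {"unique_id", "address_concat", "postcode"}


def missing_keys(records: list[dict]) -> dict[int, set[str]]:
    """Hypothetical helper: map record index -> required keys it lacks."""
    problems = {}
    for index, record in enumerate(records):
        missing = REQUIRED_KEYS - set(record)
        if missing:
            problems[index] = missing
    return problems


# Illustrative records only; the ID type used here (str) is an assumption.
messy_records = [
    {
        "unique_id": "m1",
        "address_concat": "10 DOWNING STREET LONDON",
        "postcode": "SW1A 2AA",
    },
    {
        "unique_id": "m2",
        "address_concat": "FLAT 2 221B BAKER STREET LONDON",
        # "postcode" deliberately left out to show the check firing
    },
]

print(missing_keys(messy_records))  # {1: {'postcode'}}
```

Once every record carries the three fields, the list can be passed as `addresses_to_match` in the same way as the relation in the examples above.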

`__init__(canonical_addresses, addresses_to_match, *, canonical_address_filter=None, con, stages=None, debug_options=None)`

`match()`

Runs the full matching pipeline.

Each stage is executed in sequence. Earlier stages consume easy matches; later stages handle the remainder.

Returns:

Type	Description
`MatchResult`	A `MatchResult` wrapper around the final DuckDB relation, including `unique_id`, `resolved_canonical_id`, `match_reason`, and any additional columns produced by the stages.
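
The sequential behaviour described for `match()` (earlier stages take the easy matches, later stages only see what is left) can be sketched in plain Python. This is a toy model under stated assumptions, not the library's implementation: `ToyExactStage`, `ToyTokenSetStage`, `run_pipeline`, and the `"unmatched"` label are all hypothetical, and the real stages operate on DuckDB relations rather than dicts.

```python
from dataclasses import dataclass


def tokens(address: str) -> frozenset:
    """Upper-case, drop commas, and split into an order-insensitive token set."""
    return frozenset(address.upper().replace(",", " ").split())


@dataclass
class ToyExactStage:
    """Hypothetical stage: match when the upper-cased strings are identical."""
    name: str = "exact"

    def run(self, messy: dict, canonical: dict) -> dict:
        lookup = {addr.upper(): cid for cid, addr in canonical.items()}
        return {uid: lookup[a.upper()] for uid, a in messy.items() if a.upper() in lookup}


@dataclass
class ToyTokenSetStage:
    """Hypothetical fallback stage: match when the token sets are identical."""
    name: str = "token_set"

    def run(self, messy: dict, canonical: dict) -> dict:
        lookup = {tokens(addr): cid for cid, addr in canonical.items()}
        return {uid: lookup[tokens(a)] for uid, a in messy.items() if tokens(a) in lookup}


def run_pipeline(messy: dict, canonical: dict, stages: list) -> dict:
    remaining = dict(messy)  # records no stage has resolved yet
    results = {}
    for stage in stages:
        # Each stage only sees the records earlier stages left behind.
        for uid, cid in stage.run(remaining, canonical).items():
            results[uid] = {"resolved_canonical_id": cid, "match_reason": stage.name}
            del remaining[uid]
    for uid in remaining:
        results[uid] = {"resolved_canonical_id": None, "match_reason": "unmatched"}
    return results


canonical = {"c1": "10 DOWNING STREET LONDON", "c2": "FLAT 2 221B BAKER STREET LONDON"}
messy = {
    "m1": "10 downing street london",
    "m2": "221B Baker Street, Flat 2, London",
    "m3": "1 Nowhere Road",
}
results = run_pipeline(messy, canonical, [ToyExactStage(), ToyTokenSetStage()])
for uid, row in results.items():
    print(uid, row)
```

Ordering cheap, high-precision stages first means the more expensive stage runs on fewer records, which is the motivation the description above gives for the sequence.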

`available_stages()` `classmethod`

All registered MatchingStage subclasses.

Delegates to MatchingStage.available_stages() which walks the subclass tree dynamically, so newly added stages are picked up automatically without maintaining a hard-coded list.
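
Walking a subclass tree dynamically is usually done with Python's built-in `__subclasses__()`. The sketch below shows the general technique on a stand-in hierarchy; `ToyStage` and its subclasses are hypothetical, and the library's actual traversal order and return type may differ.

```python
class ToyStage:
    """Hypothetical base class standing in for MatchingStage."""

    @classmethod
    def available_stages(cls) -> list:
        found = []
        pending = list(cls.__subclasses__())  # direct subclasses first
        while pending:
            sub = pending.pop(0)
            if sub not in found:
                found.append(sub)
                # Queue grandchildren so indirect subclasses are found too.
                pending.extend(sub.__subclasses__())
        return found


class ToyExact(ToyStage):
    pass


class ToyFuzzy(ToyStage):
    pass


class ToyStricterFuzzy(ToyFuzzy):
    pass


names = [stage.__name__ for stage in ToyStage.available_stages()]
print(names)  # ['ToyExact', 'ToyFuzzy', 'ToyStricterFuzzy']
```

Because the list is computed at call time, a stage class defined later (for example in user code) appears in the result as soon as it has been imported.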

Results

MatchResult

`MatchResult` `dataclass`

Wraps match output with connection-scoped inspection helpers.

Access the underlying DuckDB relation via .matches().

Key methods

- `match_metrics` - match-reason breakdown with counts and percentages.
- `match_reasons` - distinct match-reason values.
- `_splink_predictions` - raw Splink predictions table (requires `SplinkStage`).

`matches(*, all_columns=False)`

The underlying DuckDB relation containing match results.

Parameters:

Name	Type	Description	Default
`all_columns`	`bool`	When True, return every column. By default only the key result columns are returned.	`False`

`match_metrics(*, order='descending')`

Match-reason breakdown with counts and percentages.
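
To make "counts and percentages" concrete, here is a rough pure-Python equivalent of such a breakdown. It is a sketch only: `reason_breakdown` is a hypothetical function, the match-reason labels are invented, and the assumption that `order` sorts by count is not confirmed by the text above.

```python
from collections import Counter


def reason_breakdown(match_reasons: list, order: str = "descending") -> list:
    """Hypothetical helper: count each reason and express it as a percentage."""
    counts = Counter(match_reasons)
    total = sum(counts.values())
    rows = [
        {"match_reason": reason, "count": n, "percentage": round(100 * n / total, 1)}
        for reason, n in counts.items()
    ]
    # Assumption: `order` controls sorting by count.
    rows.sort(key=lambda row: row["count"], reverse=(order == "descending"))
    return rows


# Invented labels: 6 exact matches, 3 Splink matches, 1 unmatched record.
reasons = ["exact"] * 6 + ["splink"] * 3 + ["unmatched"]
rows = reason_breakdown(reasons)
for row in rows:
    print(row)
```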
