Methods in Linker.visualisations¶

Visualisations to help you understand and diagnose your linkage model. Accessed via linker.visualisations.

Most of the visualisations return an altair.Chart object, meaning they can be saved and manipulated using Altair.

For example:

altair_chart = linker.visualisations.match_weights_chart()

# Save to various formats
altair_chart.save("mychart.png")
altair_chart.save("mychart.html")
altair_chart.save("mychart.svg")
altair_chart.save("mychart.json")

# Get chart spec as dict
altair_chart.to_dict()
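
Because the chart spec is a plain dictionary, it can also be stored and reloaded with the standard library alone. The following is a minimal sketch: `save_chart_spec`, `load_chart_spec` and `example_spec` are hypothetical names, and the small dictionary stands in for the real output of `altair_chart.to_dict()` so the snippet runs without a trained linker.

```python
import json
import tempfile
from pathlib import Path


def save_chart_spec(spec: dict, path: Path) -> None:
    """Write a chart spec dictionary to disk as indented JSON."""
    path.write_text(json.dumps(spec, indent=2), encoding="utf-8")


def load_chart_spec(path: Path) -> dict:
    """Read a chart spec dictionary back from a JSON file."""
    return json.loads(path.read_text(encoding="utf-8"))


# Hypothetical stand-in for `altair_chart.to_dict()`; a real spec is much larger.
example_spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "comparison_name", "type": "nominal"},
        "y": {"field": "match_weight", "type": "quantitative"},
    },
    "data": {"values": [{"comparison_name": "first_name", "match_weight": 3.5}]},
}

with tempfile.TemporaryDirectory() as tmp:
    spec_path = Path(tmp) / "mychart_spec.json"
    save_chart_spec(example_spec, spec_path)
    round_tripped = load_chart_spec(spec_path)
```

With a real linker, the dictionary passed to `save_chart_spec` would come from `altair_chart.to_dict()` or from calling a chart method with `as_dict=True`.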

To save the chart as a self-contained html file with all scripts inlined so it can be viewed offline:

from splink.internals.charts import save_offline_chart
c = linker.visualisations.match_weights_chart()
save_offline_chart(c.to_dict(), "test_chart.html")

View the resulting html file in Jupyter (or just load it in your browser):

from IPython.display import IFrame
IFrame(src="./test_chart.html", width=1000, height=500)

match_weights_chart(as_dict=False) ¶

Display a chart of the (partial) match weights of the linkage model

Parameters:

Name Type Description Default
as_dict bool

If True, return the chart as a dictionary.

False

Examples:

altair_chart = linker.visualisations.match_weights_chart()
altair_chart.save("mychart.png")
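
To help read the chart, recall that a partial match weight is the base-2 logarithm of the ratio of a comparison level's m and u probabilities. The sketch below illustrates that relationship only; `partial_match_weight` is a hypothetical helper and the m and u values are made up, not taken from a trained model.

```python
import math


def partial_match_weight(m: float, u: float) -> float:
    """Return log2(m / u), the partial match weight of a comparison level."""
    if not (0 < m <= 1 and 0 < u <= 1):
        raise ValueError("m and u must be probabilities in the interval (0, 1]")
    return math.log2(m / u)


# Hypothetical level that is far more common among matches than non-matches:
# it pushes the overall match weight up.
weight_for_agreement = partial_match_weight(m=0.5, u=0.125)

# Hypothetical level that is more common among non-matches: it pushes the
# overall match weight down.
weight_for_disagreement = partial_match_weight(m=0.125, u=0.5)
```

Bars to the right of zero in the chart correspond to levels where m exceeds u, and bars to the left to levels where u exceeds m.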

m_u_parameters_chart(as_dict=False) ¶

Display a chart of the m and u parameters of the linkage model

Parameters:

Name Type Description Default
as_dict bool

If True, return the chart as a dictionary.

False

Examples:

altair_chart = linker.visualisations.m_u_parameters_chart()
altair_chart.save("mychart.png")

Returns:

Name Type Description
altair_chart ChartReturnType

An altair chart

match_weights_histogram(df_predict, target_bins=30, width=600, height=250, as_dict=False) ¶

Generate a histogram that shows the distribution of match weights in df_predict

Parameters:

Name Type Description Default
df_predict SplinkDataFrame

Output of linker.inference.predict()

required
target_bins int

Target number of bins in histogram. Defaults to 30.

30
width int

Width of output. Defaults to 600.

600
height int

Height of output chart. Defaults to 250.

250
as_dict bool

If True, return the chart as a dictionary.

False

Examples:

df_predict = linker.inference.predict(threshold_match_weight=-2)
linker.visualisations.match_weights_histogram(df_predict)
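
When choosing a value for threshold_match_weight, it can help to translate match weights into match probabilities. A match weight is the base-2 logarithm of a Bayes factor, so the probability is bayes_factor / (1 + bayes_factor). The helper below is a hypothetical illustration of that conversion and is not part of Splink's API.

```python
def match_weight_to_probability(match_weight: float) -> float:
    """Convert a match weight (log2 Bayes factor) into a match probability."""
    if match_weight >= 0:
        # Equivalent form that avoids overflow for large positive weights.
        return 1.0 / (1.0 + 2.0 ** (-match_weight))
    bayes_factor = 2.0 ** match_weight
    return bayes_factor / (1.0 + bayes_factor)


# The threshold used in the example above, expressed as a probability.
probability_at_threshold = match_weight_to_probability(-2)

# A match weight of zero means the evidence is evenly balanced.
probability_at_zero = match_weight_to_probability(0)
```

Under this conversion, a threshold of -2 corresponds to a match probability of roughly 0.2, and a match weight of 0 to a probability of 0.5.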

parameter_estimate_comparisons_chart(include_m=True, include_u=False, as_dict=False) ¶

Show a chart that shows how parameter estimates have differed across the different estimation methods you have used.

For example, if you have run two EM estimation sessions, blocking on different variables, and both result in parameter estimates for first_name, this chart will enable easy comparison of the different estimates

Parameters:

Name Type Description Default
include_m bool

Show different estimates of m values. Defaults to True.

True
include_u bool

Show different estimates of u values. Defaults to False.

False
as_dict bool

If True, return the chart as a dictionary.

False

Examples:

from splink import block_on

linker.training.estimate_parameters_using_expectation_maximisation(
    blocking_rule=block_on("first_name"),
)

linker.training.estimate_parameters_using_expectation_maximisation(
    blocking_rule=block_on("surname"),
)

linker.visualisations.parameter_estimate_comparisons_chart()

Returns:

Name Type Description
altair_chart ChartReturnType

An Altair chart

tf_adjustment_chart(output_column_name, n_most_freq=10, n_least_freq=10, vals_to_include=None, as_dict=False) ¶

Display a chart showing the impact of term frequency adjustments on a specific comparison level. Each value

Parameters:

Name Type Description Default
output_column_name str

Name of an output column for which term frequency adjustment has been applied.

required
n_most_freq int

Number of most frequent values to show. If this or n_least_freq is set to None, all values will be shown. Defaults to 10.

10
n_least_freq int

Number of least frequent values to show. If this or n_most_freq is set to None, all values will be shown. Defaults to 10.

10
vals_to_include list

Specific values for which to show term frequency adjustments. Defaults to None.

None
as_dict bool

If True, return the chart as a dictionary.

False

Examples:

linker.visualisations.tf_adjustment_chart("first_name")

Returns:

Name Type Description
altair_chart ChartReturnType

An Altair chart

waterfall_chart(records, filter_nulls=True, remove_sensitive_data=False, as_dict=False) ¶

Visualise how the final match weight is computed for the provided pairwise record comparisons.

Records must be provided as a list of dictionaries. This would usually be obtained from df.as_record_dict(limit=n) where df is a SplinkDataFrame.

Examples:

df = linker.inference.predict(threshold_match_weight=2)
records = df.as_record_dict(limit=10)
linker.visualisations.waterfall_chart(records)

Parameters:

Name Type Description Default
records List[dict]

Usually obtained from df.as_record_dict(limit=n) where df is a SplinkDataFrame.

required
filter_nulls bool

If True, null comparisons, which have no effect on the final match weight, are filtered out of the visualisation. Defaults to True.

True
remove_sensitive_data bool

When True, the waterfall chart will contain match weights only, and all of the (potentially sensitive) data from the input tables will be removed prior to the chart being created.

False
as_dict bool

If True, return the chart as a dictionary.

False

Returns:

Name Type Description
altair_chart ChartReturnType

An Altair chart
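
The arithmetic behind a waterfall chart can be sketched as a running sum: start from the prior match weight, then add the partial match weight of each comparison in turn until the final match weight is reached. All names and numbers below are hypothetical and chosen only to illustrate the calculation; they are not read from a Splink record.

```python
from itertools import accumulate

# Hypothetical prior match weight and per-comparison partial match weights.
prior_match_weight = -6.0
partial_match_weights = {
    "first_name": 4.0,
    "surname": 5.5,
    "dob": 3.0,
    "city": -1.5,
}

# Each entry is the cumulative match weight after one more bar of the
# waterfall; the first entry is the prior on its own.
running_totals = list(
    accumulate(partial_match_weights.values(), initial=prior_match_weight)
)

# The last running total is the final match weight for the record pair.
final_match_weight = running_totals[-1]

for label, total in zip(["prior", *partial_match_weights], running_totals):
    print(f"{label:>10}: {total:+.1f}")
```

A comparison with a null value contributes a partial match weight of zero, which is why such comparisons leave the final match weight unchanged.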

comparison_viewer_dashboard(df_predict, out_path, overwrite=False, num_example_rows=2, return_html_as_string=False) ¶

Generate an interactive html visualization of the linker's predictions and save to out_path. For more information see this video

Parameters:

Name Type Description Default
df_predict SplinkDataFrame

The outputs of linker.inference.predict()

required
out_path str

The path (including filename) to save the html file to.

required
overwrite bool

Overwrite the html file if it already exists? Defaults to False.

False
num_example_rows int

Number of example rows per comparison vector. Defaults to 2.

2
return_html_as_string bool

If True, return the html as a string

False

Examples:

df_predictions = linker.inference.predict()
linker.visualisations.comparison_viewer_dashboard(
    df_predictions, "scv.html", True, 2
)

Optionally, in Jupyter, you can display the results inline. Otherwise you can just load the html file in your browser.

from IPython.display import IFrame
IFrame(src="./scv.html", width="100%", height=1200)
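
The positional call above can be easier to read with keyword arguments. The wrapper below is a hypothetical convenience function, not part of Splink: it assumes `linker` is a trained linker and `df_predict` is the output of linker.inference.predict(), creates the output directory if needed, and uses only the parameters documented for comparison_viewer_dashboard.

```python
from pathlib import Path


def write_comparison_viewer(linker, df_predict, out_dir, filename="scv.html"):
    """Save the comparison viewer dashboard under out_dir and return its path."""
    out_path = Path(out_dir) / filename
    # Make sure the target directory exists before the dashboard is written.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    linker.visualisations.comparison_viewer_dashboard(
        df_predict,
        str(out_path),  # out_path is documented as a str
        overwrite=True,  # replace any dashboard left over from an earlier run
        num_example_rows=2,
    )
    return out_path
```

A call such as `write_comparison_viewer(linker, df_predictions, "reports")` would then save the dashboard as reports/scv.html.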

cluster_studio_dashboard(df_predict, df_clustered, out_path, sampling_method='random', sample_size=10, cluster_ids=None, cluster_names=None, overwrite=False, return_html_as_string=False, _df_cluster_metrics=None) ¶

Generate an interactive html visualization of the predicted clusters and save to out_path.

Parameters:

Name Type Description Default
df_predict SplinkDataFrame

The outputs of linker.inference.predict()

required
df_clustered SplinkDataFrame

The outputs of linker.clustering.cluster_pairwise_predictions_at_threshold()

required
out_path str

The path (including filename) to save the html file to.

required
sampling_method str

random, by_cluster_size or lowest_density_clusters. Defaults to random.

'random'
sample_size int

Number of clusters to show in the dashboard. Defaults to 10.

10
cluster_ids list

The IDs of the clusters that will be displayed in the dashboard. If provided, ignore the sampling_method and sample_size arguments. Defaults to None.

None
overwrite bool

Overwrite the html file if it already exists? Defaults to False.

False
cluster_names list

If provided, the dashboard will display these names in the selection box. Only works in conjunction with cluster_ids. Defaults to None.

None
return_html_as_string bool

If True, return the html as a string

False

Examples:

df_p = linker.inference.predict()
df_c = linker.clustering.cluster_pairwise_predictions_at_threshold(
    df_p, 0.5
)

# out_path is the third positional argument; pass cluster IDs by keyword
linker.visualisations.cluster_studio_dashboard(
    df_p, df_c, "cluster_studio.html", cluster_ids=[0, 4, 7]
)

Optionally, in Jupyter, you can display the results inline. Otherwise you can just load the html file in your browser.

from IPython.display import IFrame
IFrame(src="./cluster_studio.html", width="100%", height=1200)