Methods in Linker.visualisations

Visualisations to help you understand and diagnose your linkage model. Accessed via linker.visualisations.

Most of the visualisations return an altair.Chart object, meaning they can be saved and manipulated using Altair.

For example:

altair_chart = linker.visualisations.match_weights_chart()

# Save to various formats
altair_chart.save("mychart.png")
altair_chart.save("mychart.html")
altair_chart.save("mychart.svg")
altair_chart.save("mychart.json")

# Get chart spec as dict
altair_chart.to_dict()

To save the chart as a self-contained html file with all scripts inlined so it can be viewed offline:

from splink.internals.charts import save_offline_chart
c = linker.visualisations.match_weights_chart()
save_offline_chart(c.to_dict(), "test_chart.html")

View the resulting html file in Jupyter (or just load it in your browser):

from IPython.display import IFrame
IFrame(src="./test_chart.html", width=1000, height=500)

`match_weights_chart(as_dict=False)`

Display a chart of the (partial) match weights of the linkage model

Parameters:

Name	Type	Description	Default
`as_dict`	`bool`	If True, return the chart as a dictionary.	`False`

Examples:

altair_chart = linker.visualisations.match_weights_chart()
altair_chart.save("mychart.png")
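
As background for reading this chart: a partial match weight is the base-2 logarithm of the Bayes factor m/u for a comparison level. The following is a minimal sketch of that arithmetic only. The m and u values are made up, and `partial_match_weight` is an illustrative helper, not part of Splink.

```python
import math


def partial_match_weight(m: float, u: float) -> float:
    """Return log2(m / u), the partial match weight of one comparison level."""
    if m <= 0 or u <= 0:
        raise ValueError("m and u must be positive probabilities")
    return math.log2(m / u)


# Made-up probabilities for two levels of a single comparison
exact_match = partial_match_weight(m=0.8, u=0.01)  # m >> u: evidence for a match
all_other = partial_match_weight(m=0.2, u=0.99)    # m << u: evidence against a match

print(round(exact_match, 2))
print(round(all_other, 2))
```

Positive weights correspond to levels observed more often among matches than non-matches, and negative weights to the reverse.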

`m_u_parameters_chart(as_dict=False)`

Display a chart of the m and u parameters of the linkage model

Parameters:

Name	Type	Description	Default
`as_dict`	`bool`	If True, return the chart as a dictionary.	`False`

Examples:

altair_chart = linker.visualisations.m_u_parameters_chart()
altair_chart.save("mychart.png")

Returns:

Name	Type	Description
`altair_chart`	`ChartReturnType`	An Altair chart

`match_weights_histogram(df_predict, target_bins=30, width=600, height=250, as_dict=False)`

Generate a histogram that shows the distribution of match weights in df_predict

Parameters:

Name	Type	Description	Default
`df_predict`	`SplinkDataFrame`	Output of `linker.inference.predict()`	required
`target_bins`	`int`	Target number of bins in histogram. Defaults to 30.	`30`
`width`	`int`	Width of output. Defaults to 600.	`600`
`height`	`int`	Height of output chart. Defaults to 250.	`250`
`as_dict`	`bool`	If True, return the chart as a dictionary.	`False`

Examples:

df_predict = linker.inference.predict(threshold_match_weight=-2)
linker.visualisations.match_weights_histogram(df_predict)
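
To illustrate what a histogram of match weights summarises, here is a rough sketch that counts weights in equal-width bins. It is not how Splink computes its bins internally (the parameter is only a target number of bins); `bin_match_weights` and the sample weights are hypothetical.

```python
def bin_match_weights(weights, target_bins=30):
    """Count weights in `target_bins` equal-width bins spanning min..max."""
    if not weights:
        return []
    lo, hi = min(weights), max(weights)
    if lo == hi:
        # All values identical: a single zero-width bin holds everything
        return [(lo, hi, len(weights))]
    width = (hi - lo) / target_bins
    counts = [0] * target_bins
    for w in weights:
        # Clamp so that the maximum value falls in the last bin
        index = min(int((w - lo) / width), target_bins - 1)
        counts[index] += 1
    return [
        (lo + i * width, lo + (i + 1) * width, counts[i])
        for i in range(target_bins)
    ]


# Made-up match weights standing in for a column of prediction output
example_weights = [-5.0, -4.0, 0.0, 0.5, 9.9, 10.0]
for lower, upper, count in bin_match_weights(example_weights, target_bins=3):
    print(f"[{lower:6.2f}, {upper:6.2f}): {count}")
```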

`parameter_estimate_comparisons_chart(include_m=True, include_u=False, as_dict=False)`

Show a chart of how parameter estimates differ across the estimation methods you have used.

For example, if you have run two EM estimation sessions, blocking on different variables, and both result in parameter estimates for first_name, this chart makes it easy to compare the different estimates.

Parameters:

Name	Type	Description	Default
`include_m`	`bool`	Show different estimates of m values. Defaults to True.	`True`
`include_u`	`bool`	Show different estimates of u values. Defaults to False.	`False`
`as_dict`	`bool`	If True, return the chart as a dictionary.	`False`

Examples:

from splink import block_on

linker.training.estimate_parameters_using_expectation_maximisation(
    blocking_rule=block_on("first_name"),
)

linker.training.estimate_parameters_using_expectation_maximisation(
    blocking_rule=block_on("surname"),
)

linker.visualisations.parameter_estimate_comparisons_chart()

Returns:

Name	Type	Description
`altair_chart`	`ChartReturnType`	An Altair chart
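
The kind of comparison this chart supports can be sketched in plain Python: group the estimates by comparison level and look at how far they disagree. The record layout, the session labels and the m values below are all made up for illustration; Splink does not expose estimates in this exact shape.

```python
from collections import defaultdict

# Hypothetical m estimates from two EM sessions (illustrative values only)
estimates = [
    {"session": "blocked on surname", "comparison": "first_name",
     "level": "exact match", "m": 0.82},
    {"session": "blocked on dob", "comparison": "first_name",
     "level": "exact match", "m": 0.79},
    {"session": "blocked on dob", "comparison": "surname",
     "level": "exact match", "m": 0.90},
]


def spread_by_level(rows, key="m"):
    """Return max - min of the estimates for each (comparison, level) pair."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["comparison"], row["level"])].append(row[key])
    return {pair: max(values) - min(values) for pair, values in grouped.items()}


for pair, spread in spread_by_level(estimates).items():
    print(pair, round(spread, 3))
```

A large spread for a level suggests the sessions disagree about that parameter and that it deserves a closer look.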

`tf_adjustment_chart(output_column_name, n_most_freq=10, n_least_freq=10, vals_to_include=None, as_dict=False)`

Display a chart showing the impact of term frequency adjustments on a specific comparison level.

Parameters:

Name	Type	Description	Default
`output_column_name`	`str`	Name of an output column for which term frequency adjustment has been applied.	required
`n_most_freq`	`int`	Number of most frequent values to show. If this or `n_least_freq` is set to None, all values will be shown. Defaults to 10.	`10`
`n_least_freq`	`int`	Number of least frequent values to show. If this or `n_most_freq` is set to None, all values will be shown. Defaults to 10.	`10`
`vals_to_include`	`list`	Specific values for which to show term frequency adjustments. Defaults to None.	`None`
`as_dict`	`bool`	If True, return the chart as a dictionary.	`False`

Examples:

linker.visualisations.tf_adjustment_chart("first_name")

Returns:

Name	Type	Description
`altair_chart`	`ChartReturnType`	An Altair chart
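
As a rough guide to what the chart plots, the sketch below assumes the usual formulation in which a term frequency adjustment replaces the average u probability of an exact match with the relative frequency of the specific value, giving a weight adjustment of log2(u / term frequency). That formulation is an assumption not stated on this page, and the helper name and numbers are made up.

```python
import math


def tf_adjustment_weight(u_average: float, term_frequency: float) -> float:
    """Assumed adjustment to the match weight: log2(u_average / term_frequency)."""
    if u_average <= 0 or term_frequency <= 0:
        raise ValueError("probabilities must be positive")
    return math.log2(u_average / term_frequency)


u_first_name = 0.01  # made-up average u probability for an exact match

# A common value is weaker evidence than average, so the adjustment is negative
common_name = tf_adjustment_weight(u_first_name, term_frequency=0.05)
# A rare value is stronger evidence than average, so the adjustment is positive
rare_name = tf_adjustment_weight(u_first_name, term_frequency=0.0001)

print(round(common_name, 2), round(rare_name, 2))
```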

`waterfall_chart(records, filter_nulls=True, remove_sensitive_data=False, as_dict=False)`

Visualise how the final match weight is computed for the provided pairwise record comparisons.

Records must be provided as a list of dictionaries. This would usually be obtained from df.as_record_dict(limit=n) where df is a SplinkDataFrame.

Examples:

df = linker.inference.predict(threshold_match_weight=2)
records = df.as_record_dict(limit=10)
linker.visualisations.waterfall_chart(records)

Parameters:

Name	Type	Description	Default
`records`	`List[dict]`	Usually obtained from `df.as_record_dict(limit=n)` where `df` is a SplinkDataFrame.	required
`filter_nulls`	`bool`	Whether to filter null comparisons, which have no effect on the final match weight, out of the visualisation. Defaults to True.	`True`
`remove_sensitive_data`	`bool`	When True, the waterfall chart will contain match weights only, and all of the (potentially sensitive) data from the input tables will be removed prior to the chart being created.	`False`
`as_dict`	`bool`	If True, return the chart as a dictionary.	`False`

Returns:

Name	Type	Description
`altair_chart`	`ChartReturnType`	An Altair chart
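
The arithmetic behind a waterfall can be sketched as follows: start from the prior match weight, add each partial match weight in turn, then convert the final weight w to a probability with 2^w / (1 + 2^w). The helper names and the weights are hypothetical and only illustrate the calculation, not Splink's internals.

```python
def waterfall_steps(prior_match_weight, partial_weights):
    """Return (label, contribution, running_total) for each bar of the waterfall."""
    running = prior_match_weight
    steps = [("prior", prior_match_weight, running)]
    for name, weight in partial_weights.items():
        running += weight
        steps.append((name, weight, running))
    return steps


def match_probability(match_weight):
    """Convert a match weight (log2 odds) to a probability."""
    odds = 2 ** match_weight
    return odds / (1 + odds)


# Made-up weights for one pairwise record comparison
steps = waterfall_steps(-10.0, {"first_name": 6.0, "surname": 7.0, "dob": -1.0})
final_weight = steps[-1][2]

for label, contribution, total in steps:
    print(f"{label:<12}{contribution:>7.1f}{total:>7.1f}")
print(match_probability(final_weight))
```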

`comparison_viewer_dashboard(df_predict, out_path, overwrite=False, num_example_rows=2, return_html_as_string=False)`

Generate an interactive html visualization of the linker's predictions and save to out_path. For more information see this video

Parameters:

Name	Type	Description	Default
`df_predict`	`SplinkDataFrame`	The outputs of `linker.inference.predict()`	required
`out_path`	`str`	The path (including filename) to save the html file to.	required
`overwrite`	`bool`	Overwrite the html file if it already exists? Defaults to False.	`False`
`num_example_rows`	`int`	Number of example rows per comparison vector. Defaults to 2.	`2`
`return_html_as_string`	`bool`	If True, return the html as a string	`False`

Examples:

df_predictions = linker.inference.predict()
linker.visualisations.comparison_viewer_dashboard(
    df_predictions, "scv.html", overwrite=True, num_example_rows=2
)

Optionally, in Jupyter, you can display the results inline. Otherwise, you can just load the html file in your browser.

from IPython.display import IFrame
IFrame(src="./scv.html", width="100%", height=1200)

`cluster_studio_dashboard(df_predict, df_clustered, out_path, sampling_method='random', sample_size=10, cluster_ids=None, cluster_names=None, overwrite=False, return_html_as_string=False, _df_cluster_metrics=None)`

Generate an interactive html visualization of the predicted cluster and save to out_path.

Parameters:

Name	Type	Description	Default
`df_predict`	`SplinkDataFrame`	The outputs of `linker.inference.predict()`	required
`df_clustered`	`SplinkDataFrame`	The outputs of `linker.clustering.cluster_pairwise_predictions_at_threshold()`	required
`out_path`	`str`	The path (including filename) to save the html file to.	required
`sampling_method`	`str`	`random`, `by_cluster_size` or `lowest_density_clusters_by_size`. Defaults to `random`.	`'random'`
`sample_size`	`int`	Number of clusters to show in the dashboard. Defaults to 10.	`10`
`cluster_ids`	`list`	The IDs of the clusters that will be displayed in the dashboard. If provided, ignore the `sampling_method` and `sample_size` arguments. Defaults to None.	`None`
`overwrite`	`bool`	Overwrite the html file if it already exists? Defaults to False.	`False`
`cluster_names`	`list`	If provided, the dashboard will display these names in the selection box. Only works in conjunction with `cluster_ids`. Defaults to None.	`None`
`return_html_as_string`	`bool`	If True, return the html as a string	`False`

Examples:

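df_p = linker.inference.predict()
df_c = linker.clustering.cluster_pairwise_predictions_at_threshold(
    df_p, 0.5
)

# out_path is the third positional argument; cluster IDs are passed by keyword
linker.visualisations.cluster_studio_dashboard(
    df_p, df_c, "cluster_studio.html", cluster_ids=[0, 4, 7]
)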

Optionally, in Jupyter, you can display the results inline. Otherwise, you can just load the html file in your browser.

from IPython.display import IFrame
IFrame(src="./cluster_studio.html", width="100%", height=1200)
