Methods in Linker.visualisations
Visualisations to help you understand and diagnose your linkage model.
Accessed via linker.visualisations.
Most of the visualisations return an altair.Chart object, meaning they can be saved and manipulated using Altair.
For example:
altair_chart = linker.visualisations.match_weights_chart()
# Save to various formats
altair_chart.save("mychart.png")
altair_chart.save("mychart.html")
altair_chart.save("mychart.svg")
altair_chart.save("mychart.json")
# Get chart spec as dict
altair_chart.to_dict()
To save the chart as a self-contained html file with all scripts inlined so it can be viewed offline:
from splink.internals.charts import save_offline_chart
c = linker.visualisations.match_weights_chart()
save_offline_chart(c.to_dict(), "test_chart.html")
View the resulting html file in Jupyter (or just load it in your browser):
from IPython.display import IFrame
IFrame(src="./test_chart.html", width=1000, height=500)
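Because `to_dict()` returns a plain dictionary, the chart spec can also be stored and reloaded with the standard library alone. The following is a minimal sketch: `chart_spec` is a hypothetical stand-in for the dictionary a chart method would return, and `save_chart_spec` and `load_chart_spec` are illustrative helpers, not part of Splink.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for the dictionary returned by `altair_chart.to_dict()`
# or by a chart method called with `as_dict=True`.
chart_spec = {
    "mark": "bar",
    "encoding": {"x": {"field": "comparison_name", "type": "nominal"}},
}


def save_chart_spec(spec: dict, path: Path) -> None:
    """Write a chart spec dictionary to disk as pretty-printed JSON."""
    path.write_text(json.dumps(spec, indent=2), encoding="utf-8")


def load_chart_spec(path: Path) -> dict:
    """Read a chart spec dictionary back from a JSON file."""
    return json.loads(path.read_text(encoding="utf-8"))


# Round-trip the spec through a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    spec_path = Path(tmp_dir) / "mychart_spec.json"
    save_chart_spec(chart_spec, spec_path)
    reloaded_spec = load_chart_spec(spec_path)
```

Keeping the spec as JSON can be convenient when the chart needs to be versioned or rendered later by another tool.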
match_weights_chart(as_dict=False)
Display a chart of the (partial) match weights of the linkage model
Parameters:
Name | Type | Description | Default |
---|---|---|---|
as_dict | bool | If True, return the chart as a dictionary. | False |
Examples:
altair_chart = linker.visualisations.match_weights_chart()
altair_chart.save("mychart.png")
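To make the quantity on this chart concrete, the sketch below computes partial match weights by hand. It assumes the usual definition of a partial match weight as the base-2 logarithm of the Bayes factor m / u for a comparison level. The m and u values and the level names are hypothetical, and `partial_match_weight` is an illustrative helper rather than a Splink function.

```python
import math


def partial_match_weight(m: float, u: float) -> float:
    """Return log2 of the Bayes factor m / u for one comparison level."""
    return math.log2(m / u)


# Hypothetical (m, u) values for two levels of a first_name comparison.
levels = {
    "exact_match": (0.8, 0.01),
    "all_other_comparisons": (0.2, 0.99),
}

weights = {name: partial_match_weight(m, u) for name, (m, u) in levels.items()}

for name, weight in weights.items():
    print(f"{name}: {weight:+.2f}")
```

A level where m exceeds u gets a positive weight (evidence for a match), and a level where u exceeds m gets a negative weight (evidence against).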
m_u_parameters_chart(as_dict=False)
Display a chart of the m and u parameters of the linkage model
Parameters:
Name | Type | Description | Default |
---|---|---|---|
as_dict | bool | If True, return the chart as a dictionary. | False |
Examples:
altair_chart = linker.visualisations.m_u_parameters_chart()
altair_chart.save("mychart.png")
Returns:
Name | Type | Description |
---|---|---|
altair_chart | ChartReturnType | An Altair chart |
match_weights_histogram(df_predict, target_bins=30, width=600, height=250, as_dict=False)
Generate a histogram that shows the distribution of match weights in df_predict.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df_predict | SplinkDataFrame | Output of | required |
target_bins | int | Target number of bins in histogram. Defaults to 30. | 30 |
width | int | Width of output. Defaults to 600. | 600 |
height | int | Height of output chart. Defaults to 250. | 250 |
as_dict | bool | If True, return the chart as a dictionary. | False |
Examples:
df_predict = linker.inference.predict(threshold_match_weight=-2)
linker.visualisations.match_weights_histogram(df_predict)
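As a rough illustration of what the histogram summarises, the sketch below counts match weights in equal-width bins. This is not Splink's binning code (the name `target_bins` suggests the real bin count is only approximate). `bin_match_weights` is a hypothetical helper, and `example_weights` is made-up data standing in for the match weights held in df_predict.

```python
def bin_match_weights(match_weights, target_bins=30):
    """Count match weights in `target_bins` equal-width bins.

    Returns a list of (bin_start, count) tuples, one per bin.
    """
    lowest, highest = min(match_weights), max(match_weights)
    if lowest == highest:
        # All values identical: a single bin holds everything.
        return [(lowest, len(match_weights))]
    bin_width = (highest - lowest) / target_bins
    counts = [0] * target_bins
    for weight in match_weights:
        # The maximum value would fall just outside the last bin, so clamp it.
        index = min(int((weight - lowest) / bin_width), target_bins - 1)
        counts[index] += 1
    return [(lowest + i * bin_width, count) for i, count in enumerate(counts)]


# Hypothetical match weights for five pairwise comparisons.
example_weights = [-2.0, -1.5, 0.0, 3.0, 6.0]
histogram = bin_match_weights(example_weights, target_bins=4)
```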
parameter_estimate_comparisons_chart(include_m=True, include_u=False, as_dict=False)
Display a chart showing how parameter estimates differ across the estimation methods you have used.
For example, if you have run two EM estimation sessions, blocking on different variables, and both result in parameter estimates for first_name, this chart makes it easy to compare the two sets of estimates.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
include_m | bool | Show different estimates of m values. Defaults to True. | True |
include_u | bool | Show different estimates of u values. Defaults to False. | False |
as_dict | bool | If True, return the chart as a dictionary. | False |
Examples:
from splink import block_on

# Run two EM sessions with different blocking rules, then compare estimates
linker.training.estimate_parameters_using_expectation_maximisation(
    blocking_rule=block_on("first_name"),
)
linker.training.estimate_parameters_using_expectation_maximisation(
    blocking_rule=block_on("surname"),
)
linker.visualisations.parameter_estimate_comparisons_chart()
Returns:
Name | Type | Description |
---|---|---|
altair_chart | ChartReturnType | An Altair chart |
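The comparison this chart supports can be sketched numerically. The example below uses hypothetical m estimates from two sessions and reports how far apart they are for each comparison. The session labels, the values, and the 0.1 tolerance are all illustrative assumptions, and `estimate_spread` is not a Splink function.

```python
from collections import defaultdict

# Hypothetical m estimates for the exact-match level of two comparisons,
# each estimated in two separate EM sessions.
estimates = [
    {"comparison": "first_name", "session": "blocked on surname", "m": 0.82},
    {"comparison": "first_name", "session": "blocked on dob", "m": 0.78},
    {"comparison": "city", "session": "blocked on surname", "m": 0.70},
    {"comparison": "city", "session": "blocked on dob", "m": 0.55},
]


def estimate_spread(rows):
    """Return max(m) - min(m) across sessions for each comparison."""
    by_comparison = defaultdict(list)
    for row in rows:
        by_comparison[row["comparison"]].append(row["m"])
    return {name: max(vals) - min(vals) for name, vals in by_comparison.items()}


spreads = estimate_spread(estimates)

# Flag comparisons whose estimates disagree by more than an arbitrary tolerance.
unstable = [name for name, spread in spreads.items() if spread > 0.1]
```

A large spread suggests the estimate depends heavily on the blocking rule used, which is the kind of pattern the chart is meant to surface visually.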
tf_adjustment_chart(output_column_name, n_most_freq=10, n_least_freq=10, vals_to_include=None, as_dict=False)
Display a chart showing the impact of term frequency adjustments on a specific comparison level. Each value
Parameters:
Name | Type | Description | Default |
---|---|---|---|
output_column_name | str | Name of an output column for which term frequency adjustment has been applied. | required |
n_most_freq | int | Number of most frequent values to show. If this or | 10 |
n_least_freq | int | Number of least frequent values to show. If this or | 10 |
vals_to_include | list | Specific values for which to show term frequency adjustments. Defaults to None. | None |
as_dict | bool | If True, return the chart as a dictionary. | False |
Examples:
linker.visualisations.tf_adjustment_chart("first_name")
Returns:
Name | Type | Description |
---|---|---|
altair_chart | ChartReturnType | An Altair chart |
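The idea behind a term frequency adjustment can be sketched with a simplified model. The sketch assumes the adjustment for a value is log2(u_average / tf), where tf is the relative frequency of that value and u_average is the chance that two randomly chosen records agree on the column. Splink's actual calculation may differ in detail. The names list and both helper functions are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(values):
    """Return the relative frequency of each distinct value."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def tf_adjustment_weight(u_average: float, tf: float) -> float:
    """Simplified adjustment: positive for rare values, negative for common ones."""
    return math.log2(u_average / tf)


# Hypothetical first_name column: one very common name, one rare name.
names = ["john"] * 6 + ["mary"] * 3 + ["zebedee"]
tf = term_frequencies(names)

# Assumption: the average u probability is approximated by the chance that
# two random records share a value, i.e. the sum of squared frequencies.
u_average = sum(freq**2 for freq in tf.values())

adjustments = {name: tf_adjustment_weight(u_average, freq) for name, freq in tf.items()}
```

Under this model a match on a rare value such as "zebedee" gains weight, while a match on a common value such as "john" loses weight.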
waterfall_chart(records, filter_nulls=True, remove_sensitive_data=False, as_dict=False)
Visualise how the final match weight is computed for the provided pairwise record comparisons.
Records must be provided as a list of dictionaries. This would usually be obtained from df.as_record_dict(limit=n) where df is a SplinkDataFrame.
Examples:
df = linker.inference.predict(threshold_match_weight=2)
records = df.as_record_dict(limit=10)
linker.visualisations.waterfall_chart(records)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
records | List[dict] | Usually obtained from df.as_record_dict(limit=n) where df is a SplinkDataFrame. | required |
filter_nulls | bool | Whether to filter out null comparisons, which have no effect on the final match weight. Defaults to True. | True |
remove_sensitive_data | bool | When True, the waterfall chart will contain match weights only, and all of the (potentially sensitive) data from the input tables will be removed prior to the chart being created. | False |
as_dict | bool | If True, return the chart as a dictionary. | False |
Returns:
Name | Type | Description |
---|---|---|
altair_chart | ChartReturnType | An Altair chart |
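The arithmetic a waterfall chart displays can be sketched as a running total: start from a prior match weight, add each partial match weight in turn, then convert the final weight to a probability with 2^w / (1 + 2^w). The weights and comparison names below are hypothetical, and both functions are illustrative helpers rather than Splink APIs.

```python
def waterfall_steps(prior_match_weight, partial_weights):
    """Return (label, contribution, running_total) for each bar of the waterfall."""
    running_total = prior_match_weight
    steps = [("prior", prior_match_weight, running_total)]
    for label, weight in partial_weights:
        running_total += weight
        steps.append((label, weight, running_total))
    return steps


def match_probability(match_weight: float) -> float:
    """Convert a match weight (log2 odds) to a probability."""
    odds = 2**match_weight
    return odds / (1 + odds)


# Hypothetical partial match weights for one pairwise record comparison.
steps = waterfall_steps(
    prior_match_weight=-6.0,
    partial_weights=[("first_name", 5.0), ("surname", 4.0), ("dob", -1.0)],
)
final_match_weight = steps[-1][2]
final_probability = match_probability(final_match_weight)
```

A comparison with a partial match weight of zero leaves the running total unchanged, which is why null comparisons have no effect on the final match weight.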
comparison_viewer_dashboard(df_predict, out_path, overwrite=False, num_example_rows=2, return_html_as_string=False)
Generate an interactive html visualization of the linker's predictions and save to out_path. For more information see this video.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df_predict | SplinkDataFrame | The outputs of | required |
out_path | str | The path (including filename) to save the html file to. | required |
overwrite | bool | Overwrite the html file if it already exists? Defaults to False. | False |
num_example_rows | int | Number of example rows per comparison vector. Defaults to 2. | 2 |
return_html_as_string | bool | If True, return the html as a string | False |
Examples:
df_predictions = linker.inference.predict()
linker.visualisations.comparison_viewer_dashboard(
    df_predictions, "scv.html", overwrite=True, num_example_rows=2
)
Optionally, in Jupyter, you can display the results inline. Otherwise, you can just load the html file in your browser.
from IPython.display import IFrame
IFrame(src="./scv.html", width="100%", height=1200)
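To illustrate what "example rows per comparison vector" means, the sketch below groups prediction records by their comparison vector and keeps at most `num_example_rows` from each group. The `gamma_` column names and the records are hypothetical. The helper keeps the first rows it encounters, whereas the dashboard may select its examples differently.

```python
from collections import defaultdict


def example_rows_per_vector(records, gamma_columns, num_example_rows=2):
    """Keep up to `num_example_rows` records for each distinct comparison vector."""
    examples = defaultdict(list)
    for record in records:
        # The comparison vector is the tuple of comparison-level values.
        vector = tuple(record[column] for column in gamma_columns)
        if len(examples[vector]) < num_example_rows:
            examples[vector].append(record)
    return dict(examples)


# Hypothetical prediction records with two comparison columns.
records = [
    {"pair": "a-b", "gamma_first_name": 1, "gamma_surname": 1},
    {"pair": "a-c", "gamma_first_name": 1, "gamma_surname": 1},
    {"pair": "a-d", "gamma_first_name": 1, "gamma_surname": 1},
    {"pair": "b-c", "gamma_first_name": 0, "gamma_surname": 1},
]

examples = example_rows_per_vector(
    records, ["gamma_first_name", "gamma_surname"], num_example_rows=2
)
```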
cluster_studio_dashboard(df_predict, df_clustered, out_path, sampling_method='random', sample_size=10, cluster_ids=None, cluster_names=None, overwrite=False, return_html_as_string=False, _df_cluster_metrics=None)
Generate an interactive html visualization of the predicted cluster and save to out_path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df_predict | SplinkDataFrame | The outputs of | required |
df_clustered | SplinkDataFrame | The outputs of | required |
out_path | str | The path (including filename) to save the html file to. | required |
sampling_method | str | | 'random' |
sample_size | int | Number of clusters to show in the dashboard. Defaults to 10. | 10 |
cluster_ids | list | The IDs of the clusters that will be displayed in the dashboard. If provided, ignore the | None |
overwrite | bool | Overwrite the html file if it already exists? Defaults to False. | False |
cluster_names | list | If provided, the dashboard will display these names in the selection box. Only works in conjunction with | None |
return_html_as_string | bool | If True, return the html as a string | False |
Examples:
df_p = linker.inference.predict()
df_c = linker.clustering.cluster_pairwise_predictions_at_threshold(
    df_p, 0.5
)
linker.visualisations.cluster_studio_dashboard(
    df_p, df_c, "cluster_studio.html", cluster_ids=[0, 4, 7]
)
Optionally, in Jupyter, you can display the results inline. Otherwise, you can just load the html file in your browser.
from IPython.display import IFrame
IFrame(src="./cluster_studio.html", width="100%", height=1200)
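The interaction between sample_size and cluster_ids can be sketched as follows. The sketch assumes that explicitly supplied cluster IDs take precedence over sampling, which is one reading of the parameter descriptions above rather than confirmed behaviour. `choose_cluster_ids` and its `seed` argument are hypothetical and only mimic the 'random' sampling method.

```python
import random


def choose_cluster_ids(all_cluster_ids, sample_size=10, cluster_ids=None, seed=None):
    """Pick which clusters to display.

    Assumption: explicit `cluster_ids` win; otherwise sample at random
    without replacement from the distinct cluster IDs.
    """
    if cluster_ids is not None:
        return list(cluster_ids)
    distinct_ids = sorted(set(all_cluster_ids))
    rng = random.Random(seed)
    # Never ask for more clusters than exist.
    return rng.sample(distinct_ids, min(sample_size, len(distinct_ids)))


# Hypothetical cluster IDs, one per clustered record.
record_cluster_ids = [i % 25 for i in range(200)]

sampled = choose_cluster_ids(record_cluster_ids, sample_size=10, seed=0)
explicit = choose_cluster_ids(record_cluster_ids, cluster_ids=[0, 4, 7])
```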