Displaying databases, tables and columns using tables function¶
In this tutorial we are going to use tables
to explore databases, tables, and columns stored in S3.
import pandas as pd
import awswrangler as wr
import datetime
import pydbtools as pydb
The tables
function calls the aws wrangler function wrangler.catalog.tables
which has the following paramaters.
Parameters:
- limit (int, optional) – Max number of tables to be returned. If none is provided, the default value is 100.
- catalog_id (str, optional) – The ID of the Data Catalog from which to retrieve Databases. If none is provided, the AWS account ID is used by default.
- database (str, optional) – Database name.
- transaction_id (str, optional) – The ID of the transaction (i.e. used with GOVERNED tables).
- search_text (str, optional) – Select only tables with the given string in table’s properties.
- name_contains (str, optional) – Select by a specific string on table name
- name_prefix (str, optional) – Select by a specific prefix on table name
- name_suffix (str, optional) – Select by a specific suffix on table name
- boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session will be used if boto3_session receive None.
Returns: Pandas Dataframe filled by formatted infos.
https://aws-sdk-pandas.readthedocs.io/en/stable/stubs/awswrangler.catalog.tables.html
If called on it's own, tables()
will return the top 100 tables as a dataframe without any filters.
pydb.tables()
Database | Table | Description | TableType | Columns | Partitions | |
---|---|---|---|---|---|---|
0 | accom_and_employ_qa_monitoring | sl014_table | EXTERNAL_TABLE | period, prison_nomis, prison_nd, nps_or_crc, n... | ||
1 | accom_and_employ_qa_monitoring | sl016_table | EXTERNAL_TABLE | period, prison_nomis, prison_nd, nps_or_crc, n... | ||
2 | accom_and_employ_qa_monitoring | test_remmc1020 | EXTERNAL_TABLE | month, year | ||
3 | address_base_test | address_base_json | EXTERNAL_TABLE | changetype, startdate, lastupdatedate, entryda... | snapshot, tile | |
4 | address_base_test | address_base_parquet | EXTERNAL_TABLE | changetype, startdate, lastupdatedate, entryda... | snapshot, tile | |
... | ... | ... | ... | ... | ... | ... |
95 | alpha_ccs_temp | e2e_time_xhibit_backup20220721 | EXTERNAL_TABLE | defendant_on_case_id, case_id, year, quarter, ... | ||
96 | alpha_ccs_temp | e2e_time_xhibit_backup20220818 | EXTERNAL_TABLE | defendant_on_case_id, case_id, year, quarter, ... | ||
97 | alpha_ccs_temp | e2e_time_xhibit_backup20220927 | EXTERNAL_TABLE | defendant_on_case_id, case_id, year, quarter, ... | ||
98 | alpha_ccs_temp | e2e_time_xhibit_backup20221022 | EXTERNAL_TABLE | defendant_on_case_id, case_id, year, quarter, ... | ||
99 | alpha_ccs_temp | e2e_time_xhibit_backup20221022_2 | EXTERNAL_TABLE | defendant_on_case_id, case_id, year, quarter, ... |
100 rows × 6 columns
Most of the time, we are interested in the tables of a specific database. This can be specified using the database argument. If the database has a large number of tables, the limit will also need to be adjusted. If limit=None
, there is no limit. The other arguments which can be used are described above.
pydb.tables(database="alpha_ccs_temp", limit=None)
Database | Table | Description | TableType | Columns | Partitions | |
---|---|---|---|---|---|---|
0 | alpha_ccs_temp | ccs_disposals_receipts | EXTERNAL_TABLE | disposal_year, disposal_quarter, disposal_mont... | ||
1 | alpha_ccs_temp | ccs_disposals_receipts_post_sept21 | EXTERNAL_TABLE | disposal_year, disposal_quarter, disposal_mont... | ||
2 | alpha_ccs_temp | ccs_disposals_receipts_pre_sept21 | EXTERNAL_TABLE | disposal_year, disposal_quarter, disposal_mont... | ||
3 | alpha_ccs_temp | ccs_disposals_receipts_sdp_v2 | EXTERNAL_TABLE | disposal_year, disposal_quarter, disposal_mont... | ||
4 | alpha_ccs_temp | ccs_disposals_receipts_sdp_v3 | EXTERNAL_TABLE | disposal_year, disposal_quarter, disposal_mont... | ||
... | ... | ... | ... | ... | ... | ... |
111 | alpha_ccs_temp | opt_sitting_days_septemberv4 | EXTERNAL_TABLE | crn_id, crn_name, lcjb_area_id, lcjb_area_name... | ||
112 | alpha_ccs_temp | os_definitions | EXTERNAL_TABLE | variable, definition | ||
113 | alpha_ccs_temp | test | EXTERNAL_TABLE | defendant_on_case_id, case_id, year, quarter, ... | ||
114 | alpha_ccs_temp | test_sdp_v1_mags_sjp_time_cp_20221020 | EXTERNAL_TABLE | sdp_v1_flag, case_id, defendant_id, offence_de... | ||
115 | alpha_ccs_temp | timeliness | EXTERNAL_TABLE | year, quarter, age, n, valid_n, valid_n_and_cc... |
116 rows × 6 columns