SQLMesh Evaluation

Centre for Excellence

May 2024

Demo

image

Must Haves

Tool Athena Adapter Unit Testing Model Generation
DBT ✅ ⭐ 🔧
SQLMesh ⏳ ⭐⭐ 🔧
  • Athena Adapter: SQLMesh has trino adapter, and we could potentially build it ourselves

  • Model Generation: Neither has in-built support for model generation so we would need to build a custom solution

  • Unit Testing: Both (>=dbt.1.8) have support for unit testing of models.
image

Should Haves

Tool Airflow Integration Incremental Models Monitoring
DBT 🔧 ⭐ ✅
SQLMesh SQL ⭐⭐ 💵
  • Airflow Integration: It's possible to kick-off a dbt build from Airflow, whereas with SQLMesh you can create a DAG per model.
  • Monitoring: You can integrate dbt with elementary and track model-level metrics.
  • Incremental Models: Both have support for incremental models but SQLMesh uses a more powerful interval-based approach see data load patterns for more details.
image

Nice to Haves

Tool Column Lineage Python Models Preview
DBT ❌ ⭐ ❌
SQLMesh ✅ ⭐⭐ ✅
  • Column Lineage: SQLMesh is able to track column lineage for SQL models but not python models

  • Python Models: dbt-athena supports Athena Spark python models, whilst SQLMesh supports both spark and local python models.

  • Preview: SQLMesh is able to preview changes before deploying them.

image

Conclusion

  • Liaise with AP team and ? domain team elementary on deploying elementary

  • Work with observability platform to visualise the stats

  • Work with HMCTS team to agree naming approach

  • Pause on SQLMesh for now

  • Pause on implementing incremental models for curation

image

but SQLMesh has the concept of a load class where you could dynamically generate N models and have them be recognized by SQLMesh as if they were defined in your file system. See https://tobiko-data.slack.com/archives/C044BRE5W4S/p1714598530943139 for more details.

SQLMesh uses python-based macros so you can use pytest for unit testing macros.

Wouldn't work with python models which run where SQLMesh is installed, instead of K* cluster

SQLMesh enterprise has native support for monitoring, otherwise you use Airflow UI to monitor individual models.