SQL can be difficult for others to read and QA if the SQL is nested. Common Table Expressions (CTEs) are a simple way to make your SQL code easier to read, much like a story. This is sometimes called literate SQL or literate programming.
Please also review this clear tutorial on essentialsql.com. It describes CTEs visually by highlighting key parts of the SQL code when using CTEs.
Also, try modules 10 to 12 of the Oracle Subqueries tutorial.
Below are two demonstrations of CTEs.
If you are not an R user (or don’t plan to be) you can safely ignore this hidden block of code. It puts demonstration Texas Housing Sales data into a temporary database. Here are all the rows of that table.
To demonstrate CTEs we first create a temporary in-memory database using the RSQLite package. We then write the demonstration Texas Housing Sales data from ggplot into that temproary database.
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
DBI::dbWriteTable(conn = con,
name = "txhousing",
value = ggplot2::txhousing,
overwrite = TRUE)
texas_df <- DBI::dbReadTable(con, "txhousing")
Looking just at the SQL in green text below, note how we have:
Created a Common Table Expression (CTE) using the word WITH...
. Then named this CTE recent_years
. The SQL for the CTE is in brackets. It sums up housing sales in each city and in each year from 2002 onwards.
We then query that CTE (called recent_years
) directly after it by selecting three Texan cities and sorting the result.
query <- DBI::dbSendQuery(con,
"
WITH recent_years
AS (SELECT city,
year,
Sum(sales) AS sales
FROM txhousing
WHERE year >= 2002
GROUP BY city,
year)
SELECT *
FROM recent_years
WHERE city IN ( 'Abilene', 'Arlington', 'Kerrville' )
ORDER BY city,
year;
")
Let’s look at another example of a CTE on the same data table.
We first find the three cities with the biggest percentage change in median house prices in a CTE we have call top3_change
.
The CTE top3_change
is then joined (with an inner join) to the original txhousing sales table. Using an inner join will select only the cities in both tables, reducing the data to the top 3 citites.
query <- DBI::dbSendQuery(con,
"
WITH top3_change AS
(
SELECT city,
(Max(median) - Min(median))/ Min(median) AS change
FROM txhousing
GROUP BY city
ORDER BY change DESC limit 3)
SELECT a.*,
b.change
FROM txhousing AS a
INNER JOIN top3_change AS b
ON a.city = b.city;
")
Here is the result of that SQL query.