Why is literate SQL important?

SQL can be difficult for others to read and QA if the SQL is nested. Common Table Expressions (CTEs) are a simple way to make your SQL code easier to read, much like a story. This is sometimes called literate SQL or literate programming.

  1. Please also review this clear tutorial on essentialsql.com. It describes CTEs visually by highlighting key parts of the SQL code when using CTEs.

  2. Also, try modules 10 to 12 of the Oracle Subqueries tutorial.

Below are two demonstrations of CTEs.

Example CTE

If you are not an R user (or don’t plan to be) you can safely ignore this hidden block of code. It puts demonstration Texas Housing Sales data into a temporary database. Here are all the rows of that table.

R Users only - database set up

    To demonstrate CTEs we first create a temporary in-memory database using the RSQLite package. We then write the demonstration Texas Housing Sales data from ggplot into that temproary database.

    con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
    
    DBI::dbWriteTable(conn = con, 
                      name = "txhousing", 
                      value = ggplot2::txhousing,
                      overwrite = TRUE)
    
    texas_df <- DBI::dbReadTable(con, "txhousing")


Looking just at the SQL in green text below, note how we have:

  1. Created a Common Table Expression (CTE) using the word WITH.... Then named this CTE recent_years. The SQL for the CTE is in brackets. It sums up housing sales in each city and in each year from 2002 onwards.

  2. We then query that CTE (called recent_years) directly after it by selecting three Texan cities and sorting the result.

query <- DBI::dbSendQuery(con, 
                        "
WITH recent_years 
     AS (SELECT city, 
                year, 
                Sum(sales) AS sales 
         FROM   txhousing 
         WHERE  year >= 2002 
         GROUP  BY city, 
                   year) 
SELECT * 
FROM   recent_years 
WHERE  city IN ( 'Abilene', 'Arlington', 'Kerrville' ) 
ORDER  BY city, 
          year;                         
                        ")

Another CTE example

Let’s look at another example of a CTE on the same data table.

  1. We first find the three cities with the biggest percentage change in median house prices in a CTE we have call top3_change.

  2. The CTE top3_change is then joined (with an inner join) to the original txhousing sales table. Using an inner join will select only the cities in both tables, reducing the data to the top 3 citites.

query <- DBI::dbSendQuery(con, 
                        "
WITH top3_change AS 
( 
         SELECT   city, 
                  (Max(median) - Min(median))/ Min(median) AS change 
         FROM     txhousing 
         GROUP BY city 
         ORDER BY change DESC limit 3) 
SELECT     a.*, 
           b.change 
FROM       txhousing   AS a 
INNER JOIN top3_change AS b 
ON         a.city = b.city;

                        ")

Here is the result of that SQL query.