Subquery in SQL

A subquery is what the name suggests: a query within another query. The true inception of SQL. Subqueries are often used when you need to process data in several steps. For the majority of subqueries you’ll see in actual practice, the inner query will execute first and pass its result to the outer query it's nested in.

Subqueries are usually contrasted with Common Table Expressions (CTEs) as they have similar use cases. Unlike CTEs, which are usually separate SELECT statements within a query, subqueries are usually SELECT statements nested within a JOIN, FROM, or WHERE statement in a query.

To be honest, we rarely write subqueries here at dbt Labs since we prefer to use CTEs. We find that CTEs, in general, support better query readability, organization, and debugging. However, subqueries are a foundational concept in SQL and still widely used. We hope you can use this glossary to better understand how to use subqueries and how they differ from CTEs.

Subquery syntax

While there are technically several types of subqueries, the general syntax to build them is the same. A subquery usually consists of the following:

Enclosing parentheses
A name
An actual SELECT statement
A main query it is nested in via a FROM, WHERE, or JOIN clause

Let’s take this to an example, using the sample jaffle_shop dataset.

select customer_id, count(order_id) as cnt_orders
  from (
        select * from {{ ref('orders') }}
       ) all_orders
group by 1

Given the elements of subqueries laid out in the beginning, let’s break down this example into its respective parts.

Subquery elements	Example
Enclosing parentheses	✅
Subquery name	`all_orders`
`SELECT` statement	`select * from {{ ref('orders') }}`
Main query it is nested in	`select customer_id, count(order_id) as cnt_orders from all_orders group by 1`

When this query is actually executed, it will start by running the innermost query first. In this case, it would run select * from {{ ref('orders') }} first. Then, it would pass those results to the outer query, which is where you grab the count of orders by customer_id.

If you want to learn more about what a `ref` is, check out our documentation on it.

This is a relatively straightforward example, but should hopefully show you that subqueries start off like most other queries. As you nest more subqueries together, that’s when you unearth the power of subqueries, but also when you start to notice some readability tradeoffs. If you are using subqueries regularly, you'll want to leverage indenting and strong naming conventions for your subqueries to clearly distinguish code functionality.

Types of subqueries

In your day-to-day, you won’t normally formalize the names of the different types of subqueries you can write, but when someone uses the term “correlated subquery” at a data conference, you'll want to know what that means!

Nested subqueries

Nested subqueries are subqueries like the one you saw in the first example: a subquery where the inner query is executed first (and once) and passes its result to the main query. The majority of subqueries you will see in the real world are likely to be a nested subquery. These are most useful when you need to process data in multiple steps.

Debugging subqueries tip

It’s important to note that since the inner query is executed first in a nested subquery, the inner query must be able to execute by itself. If it’s unable to successfully run independently, it cannot pass results to the outer query.

Correlated subqueries

A correlated subquery is a nested subquery’s counterpart. If nested subqueries execute the inner query first and pass their result to the outer query, correlated subqueries execute the outer query first and pass their result to their inner query. For correlated subqueries, it’s useful to think about how the code is actually executed.

In a correlated subquery, the outer query will execute row-by-row. For each row, that result from the outer query will be passed to the inner query. Compare this to nested queries: in a nested query, the inner query is executed first and only once before being passed to the outer query.

These types of subqueries are most useful when you need to conduct analysis on a row-level.

Scalar and non-scalar subqueries

Scalar subqueries are queries that only return a single value. More specifically, this means if you execute a scalar subquery, it would return one column value of one specific row. Non-scalar subqueries, however, can return single or multiple rows and may contain multiple columns.

You may want to use a scalar subquery if you’re interested in passing only a single-row value into an outer query. This type of subquery can be useful when you’re trying to remove or update a specific row’s value using a Data Manipulation Language (DML) statement.

Subquery examples

You may often see subqueries in joins and DML statements. The following sections contain examples for each scenario.

Subquery in a join

In this example, you want to get the lifetime value per customer using your raw_orders and raw_payments table. Let’s take a look at how you can do that with a subquery in a join:

select
 
    orders.user_id, 
    sum(payments.amount) as lifetime_value
 
from {{ ref('raw_orders') }} as orders
left join (

    select

        order_id,
        amount

    from {{ ref('raw_payments') }}

) all_payments
on orders.id = payments.order_id
group by 1

Similar to what you saw in the first example, let’s break down the elements of this query.

Subquery elements	Example
Enclosing parentheses	✅
Subquery name	`all_payments`
`SELECT` statement	`select order_id, amount from {{ ref('raw_payments') }}`
Main query it is nested in	`select orders.user_id, sum(payments.amount) as lifetime_value from {{ ref('raw_orders') }} as orders...`

In this example, the all_payments subquery will execute first. you use the data from this query to join on the raw_orders table to calculate lifetime value per user. Unlike the first example, the subquery happens in the join statement. Subqueries can happen in JOIN, FROM, and WHERE clauses.

Subquery in a DML command

You may also see subqueries used in DML commands. As a jogger, DML commands are a series of SQL statements that you can write to access and manipulate row-level data in database objects. Oftentimes, you’ll want to use a query result in a qualifying WHERE clause to only delete, update, or manipulate certain rows of data.

In the following example, you'll attempt to update the status of certain orders based on the payment method used in the raw_payments table.

UPDATE raw_orders
set status = 'returned'
where order_id in (
select order_id 
from raw_payments 
where payment_method  = 'bank_transfer')

Subquery vs CTE

A subquery is a nested query that can oftentimes be used in place of a CTE. Subqueries have different syntax than CTEs, but often have similar use cases. The content won’t go too deep into CTEs here, but it’ll highlight some of the main differences between CTEs and subqueries below.

CTE	Subquery
Typically more readable since CTEs can be used to give structure to your query	Typically less readable, especially if there are many nested queries
Reusable in the same query	Must declare the subquery everytime it is used in a query
Allows for recursiveness	Does not allow for recursiveness
CTEs must have unique CTE_EXPRESSION_NAMES when used in a query	Subqueries don’t always have to be explicitly named
CTEs cannot be used in a `WHERE` clause	Subqueries can be used in a `WHERE` clause

Subquery vs CTE example

The following example demonstrates the similarities and differences between subqueries and CTEs. Using the first subquery example, you can compare how you would perform that query using subquery or a CTE:

Subquery example
CTE example

select customer_id, count(order_id) as cnt_orders
  from (

        select * from {{ ref('orders') }}

       ) all_orders
group by 1

with all_orders as (

select * from {{ ref('orders') }}

),
aggregate_orders as (

	select

		customer_id,
		count(order_id) as cnt_orders

	from all_orders
	group by 1

)
select * from aggregate_orders

While the code for the query involving CTEs may be longer in lines, it also allows us to explicitly define code functionality using the CTE name. Unlike the subquery example that executes its inner query and then the outer query, the query using CTEs executes moving down the code.

Again, choosing to use CTEs over subqueries is a personal choice. It may help to write out the same code functionality in a subquery and with CTEs and see what is more understandable to you.

Data warehouse support for subqueries

Subqueries are likely to be supported across most, if not all, modern data warehouses. Please use this table to see more information about using subqueries in your specific data warehouse.

Data warehouse	Supports subqueries?
Snowflake	✅
Amazon Redshift	✅
Google BigQuery	✅
Databricks	✅
Postgres	✅

Conclusion

I’m going to be honest, I was hesitant to start writing the glossary page for SQL subqueries. As someone who has been using CTEs almost exclusively in their data career, I was intimidated by this concept. However, I am excited to say: Subqueries are not as scary as I expected them to be!

At their core, subqueries are nested queries within a main query. They are often implemented in FROM, WHERE, and JOIN clauses and are used to write code that builds on itself. Despite the fact that subqueries are SQL like any other query, it is important to note that subqueries can struggle in their readability, structure, and debugging process due to their nested nature. Because of these downsides, we recommend leveraging CTEs over subqueries whenever possible.

I have not been made a subquery convert, but I’m walking away from this a little less intimidated by subqueries and I hope you are too.

Subquery syntax​

Types of subqueries​

Nested subqueries​

Correlated subqueries​

Scalar and non-scalar subqueries​

Subquery examples​

Subquery in a join​

Subquery in a DML command​

Subquery vs CTE​

Subquery vs CTE example​

Data warehouse support for subqueries​

Conclusion​

Further reading​