Skip to main content

Subquery in SQL

A subquery is what the name suggests: a query within another query. _The true inception of SQL_. Subqueries are often used when you need to process data in several steps. For the majority of subqueries you’ll see in actual practice, the inner query will execute first and pass its result to the outer query it's nested in.

Subqueries are usually contrasted with Common Table Expressions (CTEs) as they have similar use cases. Unlike CTEs, which are usually separate SELECT statements within a query, subqueries are usually SELECT statements nested within a JOIN, FROM, or WHERE statement in a query.

To be honest, we rarely write subqueries here at dbt Labs since we prefer to use CTEs. We find that CTEs, in general, support better query readability, organization, and debugging. However, subqueries are a foundational concept in SQL and still widely used. We hope you can use this glossary to better understand how to use subqueries and how they differ from CTEs.

Subquery syntax

While there are technically several types of subqueries, the general syntax to build them is the same. A subquery usually consists of the following:

  • Enclosing parentheses
  • A name
  • An actual SELECT statement
  • A main query it is nested in via a FROM, WHERE, or JOIN clause

Let’s take this to an example, using the sample jaffle_shop dataset.

select customer_id, count(order_id) as cnt_orders
from (
select * from {{ ref('orders') }}
) all_orders
group by 1

Given the elements of subqueries laid out in the beginning, let’s break down this example into its respective parts.

Subquery elementsExample
Enclosing parentheses
Subquery nameall_orders
SELECT statementselect * from {{ ref('orders') }}
Main query it is nested inselect customer_id, count(order_id) as cnt_orders from all_orders group by 1

When this query is actually executed, it will start by running the innermost query first. In this case, it would run select * from {{ ref('orders') }} first. Then, it would pass those results to the outer query, which is where you grab the count of orders by customer_id.

If you want to learn more about what a `ref` is, check out our documentation on it.

This is a relatively straightforward example, but should hopefully show you that subqueries start off like most other queries. As you nest more subqueries together, that’s when you unearth the power of subqueries, but also when you start to notice some readability tradeoffs. If you are using subqueries regularly, you'll want to leverage indenting and strong naming conventions for your subqueries to clearly distinguish code functionality.

Types of subqueries

In your day-to-day, you won’t normally formalize the names of the different types of subqueries you can write, but when someone uses the term “correlated subquery” at a data conference, you'll want to know what that means!

Nested subqueries

Nested subqueries are subqueries like the one you saw in the first example: a subquery where the inner query is executed first (and once) and passes its result to the main query. The majority of subqueries you will see in the real world are likely to be a nested subquery. These are most useful when you need to process data in multiple steps.

Debugging subqueries tip

It’s important to note that since the inner query is executed first in a nested subquery, the inner query must be able to execute by itself. If it’s unable to successfully run independently, it cannot pass results to the outer query.

Correlated subqueries

A correlated subquery is a nested subquery’s counterpart. If nested subqueries execute the inner query first and pass their result to the outer query, correlated subqueries execute the outer query first and pass their result to their inner query. For correlated subqueries, it’s useful to think about how the code is actually executed.

In a correlated subquery, the outer query will execute row-by-row. For each row, that result from the outer query will be passed to the inner query. Compare this to nested queries: in a nested query, the inner query is executed first and only once before being passed to the outer query.

These types of subqueries are most useful when you need to conduct analysis on a row-level.

Scalar and non-scalar subqueries

Scalar subqueries are queries that only return a single value. More specifically, this means if you execute a scalar subquery, it would return one column value of one specific row. Non-scalar subqueries, however, can return single or multiple rows and may contain multiple columns.

You may want to use a scalar subquery if you’re interested in passing only a single-row value into an outer query. This type of subquery can be useful when you’re trying to remove or update a specific row’s value using a Data Manipulation Language (DML) statement.

Subquery examples

You may often see subqueries in joins and DML statements. The following sections contain examples for each scenario.

Subquery in a join

In this example, you want to get the lifetime value per customer using your raw_orders and raw_payments table. Let’s take a look at how you can do that with a subquery in a join:

select

orders.user_id,
sum(payments.amount) as lifetime_value

from {{ ref('raw_orders') }} as orders
left join (

select

order_id,
amount

from {{ ref('raw_payments') }}

) all_payments
on orders.id = payments.order_id
group by 1

Similar to what you saw in the first example, let’s break down the elements of this query.

Subquery elementsExample
Enclosing parentheses
Subquery nameall_payments
SELECT statementselect order_id, amount from {{ ref('raw_payments') }}
Main query it is nested inselect orders.user_id, sum(payments.amount) as lifetime_value from {{ ref('raw_orders') }} as orders...

In this example, the all_payments subquery will execute first. you use the data from this query to join on the raw_orders table to calculate lifetime value per user. Unlike the first example, the subquery happens in the join statement. Subqueries can happen in JOIN, FROM, and WHERE clauses.

Subquery in a DML command

You may also see subqueries used in DML commands. As a jogger, DML commands are a series of SQL statements that you can write to access and manipulate row-level data in database objects. Oftentimes, you’ll want to use a query result in a qualifying WHERE clause to only delete, update, or manipulate certain rows of data.

In the following example, you'll attempt to update the status of certain orders based on the payment method used in the raw_payments table.

UPDATE raw_orders
set status = 'returned'
where order_id in (
select order_id
from raw_payments
where payment_method = 'bank_transfer')

Subquery vs CTE

A subquery is a nested query that can oftentimes be used in place of a CTE. Subqueries have different syntax than CTEs, but often have similar use cases. The content won’t go too deep into CTEs here, but it’ll highlight some of the main differences between CTEs and subqueries below.

CTESubquery
Typically more readable since CTEs can be used to give structure to your queryTypically less readable, especially if there are many nested queries
Reusable in the same queryMust declare the subquery everytime it is used in a query
Allows for recursivenessDoes not allow for recursiveness
CTEs must have unique CTE_EXPRESSION_NAMES when used in a querySubqueries don’t always have to be explicitly named
CTEs cannot be used in a WHERE clauseSubqueries can be used in a WHERE clause

Subquery vs CTE example

The following example demonstrates the similarities and differences between subqueries and CTEs. Using the first subquery example, you can compare how you would perform that query using subquery or a CTE:

select customer_id, count(order_id) as cnt_orders
from (

select * from {{ ref('orders') }}

) all_orders
group by 1

While the code for the query involving CTEs may be longer in lines, it also allows us to explicitly define code functionality using the CTE name. Unlike the subquery example that executes its inner query and then the outer query, the query using CTEs executes moving down the code.

Again, choosing to use CTEs over subqueries is a personal choice. It may help to write out the same code functionality in a subquery and with CTEs and see what is more understandable to you.

Data warehouse support for subqueries

Subqueries are likely to be supported across most, if not all, modern data warehouses. Please use this table to see more information about using subqueries in your specific data warehouse.

Data warehouseSupports subqueries?
Snowflake
Amazon Redshift
Google BigQuery
Databricks
Postgres

Conclusion

I’m going to be honest, I was hesitant to start writing the glossary page for SQL subqueries. As someone who has been using CTEs almost exclusively in their data career, I was intimidated by this concept. However, I am excited to say: Subqueries are not as scary as I expected them to be!

At their core, subqueries are nested queries within a main query. They are often implemented in FROM, WHERE, and JOIN clauses and are used to write code that builds on itself. Despite the fact that subqueries are SQL like any other query, it is important to note that subqueries can struggle in their readability, structure, and debugging process due to their nested nature. Because of these downsides, we recommend leveraging CTEs over subqueries whenever possible.

I have not been made a subquery convert, but I’m walking away from this a little less intimidated by subqueries and I hope you are too.

Further reading

Please check out some of our favorite readings related to subqueries!

0