How to write subqueries in SQL and why you shouldn’t
Structured Query Language (SQL) is the bedrock of relational database management and is widely used to manipulate and retrieve data. One of its most flexible and powerful features is the subquery—a query nested inside another SQL query. Subqueries allow for modular, expressive, and readable statements that solve complex problems. However, with that power comes responsibility: when used improperly, subqueries can lead to performance bottlenecks, unexpected results, and maintenance challenges.
This article provides a comprehensive look into how to write SQL subqueries effectively, the various types of subqueries, and the performance implications you should be aware of when using them.
Learn SQL online
What is a SQL Subquery?
A subquery—also known as an inner query or nested query—is a complete SQL statement embedded within another SQL query (the outer query). Subqueries are typically used to:
Compute intermediary results for filtering
Populate dynamic values
Construct derived tables
Replace multi-step procedural logic
They are enclosed in parentheses and can appear in various parts of the main SQL statement, including the SELECT
, FROM
, WHERE
, and HAVING
clauses.
Types of Subqueries
Subqueries come in multiple flavors, and understanding their types is essential for writing optimized SQL:
Scalar Subquery
Returns exactly one value—one row and one column. If more than one row is returned, most RDBMS will throw an error.
SELECT employee_id,
(SELECT department_name
FROM departments
WHERE departments.department_id = employees.department_id) AS dept_name
FROM employees;
Scalar subqueries are often used in the SELECT
list or in the WHERE
clause for comparison purposes.
Column Subquery (Single-Column, Multi-Row)
Returns a single column with potentially multiple rows. Often used with IN
, ANY
, ALL
, or set operations.
SELECT product_name, price
FROM products
WHERE category_id IN (
SELECT category_id
FROM categories
WHERE category_name LIKE '%Electronics%'
);
Row Subquery
Returns a single row with multiple columns. Used for row-wise comparisons.
SELECT employee_id, employee_name
FROM employees
WHERE (department_id, job_id) = (
SELECT department_id, job_id
FROM employees
WHERE employee_id = 101
);
Table Subquery (Derived Table)
Returns a full result set (multiple rows and columns). Used in the FROM
clause to define temporary, inline views.
SELECT dept_id, COUNT(*) AS high_earners
FROM (
SELECT department_id AS dept_id
FROM employees
WHERE salary > 100000
) AS high_paid_employees
GROUP BY dept_id;
Writing Subqueries in Practice
Subqueries in the SELECT Clause
Useful for value derivation or aggregations per row.
SELECT employee_id,
(SELECT MAX(salary) FROM employees) AS max_salary
FROM employees;
Subqueries in the WHERE Clause
Ideal for filtering based on complex conditions.
SELECT first_name, last_name
FROM employees
WHERE department_id = (
SELECT department_id
FROM departments
WHERE department_name = 'Engineering'
);
Subqueries in the FROM Clause
Effective for modularizing large queries
SELECT temp.department_id, AVG(temp.salary) AS avg_salary
FROM (
SELECT department_id, salary
FROM employees
WHERE job_id = 'DEV'
) AS temp
GROUP BY temp.department_id;
Performance Considerations
While subqueries are a powerful abstraction tool, they are not always the most efficient choice. Their performance impact depends on how they are written, where they are placed, and how the query optimizer interprets them.
Repeated Execution of Correlated Subqueries
Correlated subqueries refer to columns in the outer query. These are re-evaluated for every row processed in the outer query, which can lead to significant performance degradation.
Example:
SELECT employee_id, first_name
FROM employees e
WHERE salary > (
SELECT AVG(salary)
FROM employees
WHERE department_id = e.department_id
);
In the example above, the subquery must be executed once for every employee in the employees
table—potentially thousands of times. This is highly inefficient on large datasets.
Alternative (using JOIN):
WITH dept_avg AS (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
)
SELECT e.employee_id, e.first_name
FROM employees e
JOIN dept_avg d ON e.department_id = d.department_id
WHERE e.salary > d.avg_salary;
sing a JOIN
or a CTE
can reduce execution from O(n²) to O(n), making it dramatically more efficient.
Lack of Index Utilization
Subqueries, particularly correlated ones, can interfere with the database’s ability to utilize indexes. This is because the optimizer may not always be able to push down predicates into the subquery, limiting the ability to prune large portions of data early.
Materialization of Derived Tables
When using subqueries in the FROM
clause, many databases materialize the derived table as a temporary structure. If the subquery returns a large dataset, this materialization can consume a lot of memory or disk I/O, degrading performance.
Best Practices and Alternatives
To ensure efficient use of subqueries, consider the following recommendations:
✅ Prefer Joins Over Correlated Subqueries
If a correlated subquery can be rewritten as a join, do it. Joins are typically better optimized by the SQL engine and take advantage of indexes and execution plans more effectively.
✅ Use CTEs for Clarity and Reusability
Common Table Expressions (WITH
clause) allow subqueries to be modular and reusable.
WITH high_paid AS (
SELECT employee_id, department_id
FROM employees
WHERE salary > 100000
)
SELECT d.department_name, COUNT(*) AS high_paid_count
FROM high_paid h
JOIN departments d ON h.department_id = d.department_id
GROUP BY d.department_name;
✅ Replace IN
with EXISTS
for Large Subsets
If the subquery returns a large number of rows, replacing IN
with EXISTS
can improve performance.
-- Using IN
SELECT first_name
FROM employees
WHERE department_id IN (
SELECT department_id
FROM departments
WHERE location_id = 1700
);
-- Using EXISTS
SELECT first_name
FROM employees e
WHERE EXISTS (
SELECT 1
FROM departments d
WHERE d.location_id = 1700 AND d.department_id = e.department_id
);
✅ Analyze Execution Plans
Always use the EXPLAIN
or EXPLAIN ANALYZE
feature in your RDBMS to understand how the database plans to execute your subqueries. This provides visibility into whether indexes are used, if the subquery is being executed multiple times, or if a full table scan is occurring.
Conclusion
SQL subqueries are indispensable tools for solving many complex data manipulation tasks. They promote modular thinking, encapsulation of logic, and expressive querying. However, like all powerful tools, they must be used judiciously. Misuse—especially in the form of correlated subqueries and poorly written derived tables—can lead to serious performance issues, particularly as data volumes grow.
By understanding the different types of subqueries, their appropriate use cases, and performance trade-offs, you can write SQL that is not only correct but also efficient and scalable.
Please see the other SQL video learning articles below as part of this learning SQL online course video.
If you would like an instructor lead SQL course that goes into more detail with worked examples then we also provide a video course to learn SQL online hosted on UDEMY.