How to write subqueries in SQL and why you shouldn’t

How to write subqueries in SQL and why you shouldn’t

Structured Query Language (SQL) is the bedrock of relational database management and is widely used to manipulate and retrieve data. One of its most flexible and powerful features is the subquery—a query nested inside another SQL query. Subqueries allow for modular, expressive, and readable statements that solve complex problems. However, with that power comes responsibility: when used improperly, subqueries can lead to performance bottlenecks, unexpected results, and maintenance challenges.

This article provides a comprehensive look into how to write SQL subqueries effectively, the various types of subqueries, and the performance implications you should be aware of when using them.

Learn SQL online

For our instructor lead, more in depth look at learning to write a SQL SELECT, SORT BY and WHERE clauses why not check out our video course hosted on UDEMY or enjoy the free SQL course below.

What is a SQL Subquery?

A subquery—also known as an inner query or nested query—is a complete SQL statement embedded within another SQL query (the outer query). Subqueries are typically used to:

  • Compute intermediary results for filtering

  • Populate dynamic values

  • Construct derived tables

  • Replace multi-step procedural logic

They are enclosed in parentheses and can appear in various parts of the main SQL statement, including the SELECT, FROM, WHERE, and HAVING clauses.

Types of Subqueries

Subqueries come in multiple flavors, and understanding their types is essential for writing optimized SQL:

Scalar Subquery

Returns exactly one value—one row and one column. If more than one row is returned, most RDBMS will throw an error.

				
					SELECT employee_id, 
       (SELECT department_name 
        FROM departments 
        WHERE departments.department_id = employees.department_id) AS dept_name
FROM employees;
				
			

Scalar subqueries are often used in the SELECT list or in the WHERE clause for comparison purposes.

Column Subquery (Single-Column, Multi-Row)

Returns a single column with potentially multiple rows. Often used with IN, ANY, ALL, or set operations.

				
					SELECT product_name, price
FROM products
WHERE category_id IN (
    SELECT category_id
    FROM categories
    WHERE category_name LIKE '%Electronics%'
);
				
			

Row Subquery

Returns a single row with multiple columns. Used for row-wise comparisons.

				
					SELECT employee_id, employee_name
FROM employees
WHERE (department_id, job_id) = (
    SELECT department_id, job_id
    FROM employees
    WHERE employee_id = 101
);
				
			

Table Subquery (Derived Table)

Returns a full result set (multiple rows and columns). Used in the FROM clause to define temporary, inline views.

				
					SELECT dept_id, COUNT(*) AS high_earners
FROM (
    SELECT department_id AS dept_id
    FROM employees
    WHERE salary > 100000
) AS high_paid_employees
GROUP BY dept_id;
				
			

Writing Subqueries in Practice

Subqueries in the SELECT Clause

Useful for value derivation or aggregations per row.

				
					SELECT employee_id,
       (SELECT MAX(salary) FROM employees) AS max_salary
FROM employees;
				
			

Subqueries in the WHERE Clause

Ideal for filtering based on complex conditions.

				
					SELECT first_name, last_name
FROM employees
WHERE department_id = (
    SELECT department_id
    FROM departments
    WHERE department_name = 'Engineering'
);
				
			

Subqueries in the FROM Clause

Effective for modularizing large queries

				
					SELECT temp.department_id, AVG(temp.salary) AS avg_salary
FROM (
    SELECT department_id, salary
    FROM employees
    WHERE job_id = 'DEV'
) AS temp
GROUP BY temp.department_id;
				
			

Performance Considerations

While subqueries are a powerful abstraction tool, they are not always the most efficient choice. Their performance impact depends on how they are written, where they are placed, and how the query optimizer interprets them.

Repeated Execution of Correlated Subqueries

Correlated subqueries refer to columns in the outer query. These are re-evaluated for every row processed in the outer query, which can lead to significant performance degradation.

Example:

				
					SELECT employee_id, first_name
FROM employees e
WHERE salary > (
    SELECT AVG(salary)
    FROM employees
    WHERE department_id = e.department_id
);
				
			

In the example above, the subquery must be executed once for every employee in the employees table—potentially thousands of times. This is highly inefficient on large datasets.

Alternative (using JOIN):

				
					WITH dept_avg AS (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
)
SELECT e.employee_id, e.first_name
FROM employees e
JOIN dept_avg d ON e.department_id = d.department_id
WHERE e.salary > d.avg_salary;
				
			

sing a JOIN or a CTE can reduce execution from O(n²) to O(n), making it dramatically more efficient.

Lack of Index Utilization

Subqueries, particularly correlated ones, can interfere with the database’s ability to utilize indexes. This is because the optimizer may not always be able to push down predicates into the subquery, limiting the ability to prune large portions of data early.

Materialization of Derived Tables

When using subqueries in the FROM clause, many databases materialize the derived table as a temporary structure. If the subquery returns a large dataset, this materialization can consume a lot of memory or disk I/O, degrading performance.

Best Practices and Alternatives

To ensure efficient use of subqueries, consider the following recommendations:

✅ Prefer Joins Over Correlated Subqueries

If a correlated subquery can be rewritten as a join, do it. Joins are typically better optimized by the SQL engine and take advantage of indexes and execution plans more effectively.

✅ Use CTEs for Clarity and Reusability

Common Table Expressions (WITH clause) allow subqueries to be modular and reusable.

				
					WITH high_paid AS (
    SELECT employee_id, department_id
    FROM employees
    WHERE salary > 100000
)
SELECT d.department_name, COUNT(*) AS high_paid_count
FROM high_paid h
JOIN departments d ON h.department_id = d.department_id
GROUP BY d.department_name;
				
			

✅ Replace IN with EXISTS for Large Subsets

If the subquery returns a large number of rows, replacing IN with EXISTS can improve performance.

				
					-- Using IN
SELECT first_name
FROM employees
WHERE department_id IN (
    SELECT department_id
    FROM departments
    WHERE location_id = 1700
);

-- Using EXISTS
SELECT first_name
FROM employees e
WHERE EXISTS (
    SELECT 1
    FROM departments d
    WHERE d.location_id = 1700 AND d.department_id = e.department_id
);
				
			

✅ Analyze Execution Plans

Always use the EXPLAIN or EXPLAIN ANALYZE feature in your RDBMS to understand how the database plans to execute your subqueries. This provides visibility into whether indexes are used, if the subquery is being executed multiple times, or if a full table scan is occurring.

Conclusion

SQL subqueries are indispensable tools for solving many complex data manipulation tasks. They promote modular thinking, encapsulation of logic, and expressive querying. However, like all powerful tools, they must be used judiciously. Misuse—especially in the form of correlated subqueries and poorly written derived tables—can lead to serious performance issues, particularly as data volumes grow.

By understanding the different types of subqueries, their appropriate use cases, and performance trade-offs, you can write SQL that is not only correct but also efficient and scalable.

Please see the other SQL video learning articles below as part of this learning SQL online course video. 

If you would like an instructor lead SQL course that goes into more detail with worked examples then we also provide a video course to learn SQL online hosted on UDEMY. 

Learning SQL programming with SQL server
Learn SQL Online