In this lesson, we will see how SQL subqueries can be incorporated within queries.
Side note: We are going to mention the possible uses of JOINS. So, if you are not familiar with the subject, you can check out the following set of tutorials, beginning with the INNER JOIN.
As their name suggests, subqueries are queries embedded within a query. They are also called inner queries or nested queries, and they are part of another query, called an outer query. Alternative names for these SQL features are inner select and outer select, respectively.
Subqueries can be applied in many ways. Nevertheless, the main idea is the same.
Side note: You can see how to load the databases we are using, in the linked tutorial.
A Case in Point
Most often, SQL subqueries are employed in the WHERE clause of a SELECT statement. Let’s see what will happen after we run the following code:
SELECT
*
FROM
dept_manager;
We retrieved 24 records, which means 24 people currently work in the enterprise as managers. If our database is in perfect condition, these same 24 people must be registered in the “employees” table, where we can find different information about them.
Why We Need SQL Subqueries
So, assume you are interested in obtaining the names of all employees working within the enterprise as managers. Keep in mind that names can be found only in the “employees” table, not in the “dept_manager” table.
There are a few ways you could obtain the desired result. We will try one involving the use of SQL subqueries, as it will allow us to extract the required information from the first table, so long as the same employees have been registered in the “dept_manager” table.
The Syntax
We select the first and last name from the “employees” table for the same employee numbers that can be found in the “dept_manager” table. The last condition can be set through the WHERE clause and the IN operator. It will enlist all employee numbers recorded in the “dept_manager” table. In programming language, it will look like this:
SELECT
e.first_name, e.last_name
FROM
employees e
WHERE e.emp_no IN (SELECT
dm.emp_no
FROM
dept_manager dm);
After we execute this query, we will be able to verify that the data about all 24 managers has been stored in the “employees” table as well.
Analysis
Let’s analyse this example. First, the outer query is the entire code, starting from SELECT first name, last name, and so on, until the IN operator.
The subquery, or the inner query, that is nested within the outer query, is what is underlined in the picture below.
Important: A subquery should always be placed within parentheses.
In this case, we should make no exception to this rule as well.
The Process
From a processing point of view, the SQL engine starts by running the inner query. Then it uses its returned output, which is intermediate, to execute the outer query.
In general, a subquery may return a single value (a scalar), a single row, a single column, or an entire table. In our case, it delivers the entire “emp_no” column from the “dept_manager” table.
Actually, to prove this is the case, we could run the subquery as a stand-alone query.
So, we will simply write:
SELECT
dm.emp_no
FROM
dept_manager dm;
As you can see in the picture above, we obtained the “emp_no” column the way we should have.
Multiple Nested SQL Subqueries
Although we will not dive into tasks that are too complex, please remember you can have a lot more than one subquery in your outer query. It is possible to nest inner queries within other inner queries, thus allowing your analysis to achieve significant depth. In that case, the SQL engine would execute the innermost query first, and then each subsequent query, until it runs the outermost query last.
What Does the EXISTS Operator Do?
There is another logical operator we need to introduce, as it is related strictly to using SQL subqueries.
The EXISTS operator checks whether certain row values are found within a subquery. This check is conducted row by row. It returns a Boolean value. If a row value of a subquery exists, the operator returns TRUE and the corresponding record of the outer query is extracted. Otherwise, EXISTS returns FALSE, and no row value from the outer query is extracted.
An Example
Try writing the following code:
SELECT
e.first_name, e.last_name
FROM
employees e
WHERE EXISTS( SELECT
*
FROM
dept_manager dm
WHERE
dm.emp_no = e.emp_no);
It will deliver all first and last names of the people in the “employees” table who are also found in the “dept_manager” table. As a matter of fact, we’ll create a whole table, not just a column, as we did with the IN operator.
After running the query, the output contains 24 rows, as shown in the picture below.
EXISTS vs IN
Obviously, the EXISTS and the IN operators resemble each other when applied to SQL subqueries. Here is the substantial difference between the two.
It is noticeable, particularly when working on large web, mobile, or business applications, where visualizing data quickly is crucial.
EXISTS tests row values for existence, while IN searches among values. Hence, EXISTS is quicker in retrieving large amounts of data, while IN is faster with smaller datasets.
ORDER BY in Nested Queries
Let me make a short side note about using ORDER BY in the context of nested queries. In MySQL, you can order the result list by “emp_no” and insert the ORDER BY clause within the subquery.
The code should look like this:
SELECT
e.first_name, e.last_name
FROM
employees e
WHERE
EXISTS( SELECT
*
FROM
dept_manager dm
WHERE
dm.emp_no = e.emp_no
ORDER BY emp_no);
Of course, the query will run properly – the outer query will simply follow the order of the inner query.
The Better Way to Do It
However, it is more professional to apply ORDER BY in the outer query.
Just like this:
SELECT
e.first_name, e.last_name
FROM
employees e
WHERE
EXISTS( SELECT
*
FROM
dept_manager dm
WHERE
dm.emp_no = e.emp_no)
ORDER BY emp_no;
So, please try to get used to leaving this clause in the outer select. It is more acceptable logically to sort the final version of your dataset and not versions preceding the final one.
The Alternative
Be aware that some, though not all, nested queries can be rewritten using JOINs, which are more efficient in general. This is true particularly for inner queries using the WHERE clause. Although sometimes the same answers can be obtained with a join, nested queries are considered an essential tool, and that’s why you must learn them.
What Are the Benefits of SQL Subqueries?
On certain occasions, subqueries can be bad for performance concerns, but here’s why people would still need to use them.
First, they allow for better structuring of the outer query. Thus, each inner query can be thought of in isolation, and organizing the extraction process can be improved. Hence the name of SQL – Structured Query Language!
Second, in some situations, the use of subqueries is much more intuitive compared to the use of complex joins and unions.
Finally, many users prefer subqueries simply because they offer enhanced code readability.
What’s Next?
So, if you didn’t know, there is a tool in SQL that allows you to join a table with itself. Feel free to embark on the journey where we will be discussing the SELF JOIN.
***
Eager to hone your SQL skills? Check out our SQL course.
Next Tutorial: Use the SELF JOIN in SQL