If you are already familiar with the HAVING clause, you may find yourself in a situation and not know whether to use WHERE or HAVING. You may even be asking yourself – if we can use WHERE, why does the HAVING clause even exist?
The HAVING clause is frequently implemented with GROUP BY because it refines the output from records that do not satisfy a certain condition.
Internalizing the corresponding syntax will help us explain the difference between the two keywords, so let’s begin there.
HAVING needs to be inserted between the GROUP BY and ORDER BY clauses. In addition, HAVING is like WHERE but applied to the GROUP BY block.
The difference can be better understood with an example.
Side note: We will be working with the ‘employees’ database, so if you haven’t downloaded it, make sure to check out our tutorial explaining it.
Providing a Case in Point
First, on some occasions, an identical result could be obtained by implementing the same condition, either with the WHERE or HAVING clause.
For instance, should we select all employees hired after the 1st of January 2000, the retrieved table will be the same, whether we are using WHERE or HAVING!
If we are using WHERE, the code will be:
SELECT * FROM employees WHERE hire_date >= ‘2000-01-01’;
And if we decide to use the HAVING clause, we can simply replace WHERE:
SELECT * FROM employees HAVING hire_date >= ‘2000-01-01’;
The results are the same, as shown in the picture below.
The Main Advantage of the HAVING Clause over the WHERE Clause
Well, the main distinction between the two clauses is that HAVING can be applied for subsets of aggregated groups, while in the WHERE block, this is forbidden. In simpler words, after HAVING, we can have a condition with an aggregate function, while WHERE cannot use aggregate functions within its conditions.
Assume we want to extract a list with all first names that appear more than 250 times in the “employees” table.
If we try to set this condition in the WHERE clause, Workbench wouldn’t indicate there is a mistake in our code, because this is the correct syntax.
We will be shown an error message when we try to execute the query. And it will be a very eloquent one: invalid use of group function.
Fixing the Error
We can change the keyword to HAVING and add the line of code in the right place, just after the GROUP BY statement.
SELECT first_name, COUNT(first_name) as names_count FROM employees GROUP BY first_name HAVING COUNT(first_name) > 250 ORDER BY first_name;
Now, let’s re-run the query.
We retrieved 193 names that can be encountered more than 250 times in the “employees” table.
WHERE or HAVING?
So, when is the right time to use WHERE and when should you use HAVING?
It is not so hard to decide. Let’s go over the problem we just solved again, “Extract all first names that appear more than 250 times in the “employees” table”. You must first spot the phrase “250 times”. It leads to counting the number of times something appears in the data table. And counting numbers is executed through the COUNT() function, isn’t it?
COUNT() is an aggregate function.
You must use HAVING, not WHERE.
It is as simple as that. The same logic must be applied anytime an aggregate function is required for the solution of your task.
Summary of WHERE and HAVING
It is important to decide whether to use WHERE or HAVING in certain situations
WHERE allows us to set conditions that refer to subsets of individual rows. These conditions are applied before re-organizing the output into groups.
Once the rows that satisfy the WHERE conditions are chosen, they progress in the data retrieval process and can be grouped by distinct values recorded in a certain field or fields.
It is not until this moment, when the output can be further improved, or filtered, with a condition specified in the HAVING clause.
Finally, you could sort the records of the final list through the ORDER BY clause.
Solving a Task
To reinforce your understanding of the data retrieval process, let’s see an example containing both a WHERE and a HAVING condition.
The task is to extract a list of all names that are encountered less than 200 times. Let the data refer to people hired after the 1st of January 1999 only.
Let’s create the query, step-by-step.
Apparently, we must select the first names and the number of times a first name is encountered, renaming the second selection as “names_count”.
The second thing to do is designate the table we will retrieve data from – “employees”.
The code will look like this:
select first_name, count(first_name) as names_count from employees;
What to Use
Should we only use WHERE or HAVING, or both keywords, while setting our conditions?
Well, there are two conditions to satisfy to solve our problem.
One is that the names must be encountered less than 200 times. “200 times” immediately means we must use COUNT(), which will count the number of times a certain first name appears in the “employees” data table.
COUNT() is an aggregate function and, as we said earlier, so it must go with HAVING.
The other condition to satisfy is general: “All the rows extracted must be of people who were hired after the 1st of January 1999.” This condition refers to all individual rows in the “employees” table. No specific aggregate function must be applied. Therefore, this condition must go with the WHERE clause.
Important: Between the WHERE and the HAVING blocks, we shouldn’t forget to insert the GROUP BY segment. We must group by “first_name”, not by some other field, since our task requires us to aggregate our output by the number of times a certain first name is encountered.
Actually, let’s order the output by the “first_name” in descending order.
The Final Solution
The whole query can be written in the following way:
SELECT first_name, COUNT(first_name) AS names_count FROM employees WHERE hire_date > ‘1999-01-01’ GROUP BY first_name HAVING COUNT(first_name) < 200 ORDER BY first_name DESC;
The query worked and solved our problem – we have a list with the number of all distinct names of people hired after the 1st of January 1999.
A Few Observations
The function “COUNT(first_name)” has been applied twice – once in the SELECT statement and once in the HAVING block.
On the screen, the two phrases seem distant from one another. However, in terms of logic, they are close. Normally, you would like the aggregate function you have attached to a specific condition to be seen in your output, too. So, it makes sense to see the same phrase twice.
On the other hand, “first_name” appears in the query three times. Once, because you group by this field. A second time because grouping by a field requires stating the column name in the SELECT block. And the third time, at the bottom of the query, because we order the data by the same selection.
Looking out for the Order
It might seem you are using “first_name” and COUNT() too often within the query. Well, this is fine. This is not always going to be the case, but it is okay if it happens. Therefore, you must master the query structure well; otherwise, you will easily fall into traps, inducing you to make mistakes while coding.
Another Peculiarity of the HAVING Keyword
In addition, as an exercise, you could try to place the “hire_date” condition in the HAVING clause, instead of in the WHERE clause, but you’ll get an error.
You saw HAVING can contain such a condition as “hire_date” greater than the 1st of January 2000.” In another query, it worked properly with an aggregate function:
Important: Both situations are feasible for MySQL, but you cannot have both an aggregated and a non-aggregated condition in the HAVING clause! Although this seems to be a minor detail, sometimes, it will be important for you to know this. Don’t mix the conditions in the HAVING block. This is a pitfall you should be aware of, and that’s why we shared it with you!
When to Use the HAVING Clause
To sum up this tutorial, remember the simple rule:
If you need to work with aggregate functions, use GROUP BY and HAVING. And if you need to apply general conditions, use WHERE.
You will agree it seems there are too many clauses and tricks to remember about the SELECT statement, right? SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, and their corresponding features.
Don’t worry – as you read more tutorials and practice in the exercises attached to the lessons, you will start using these statements of the data manipulation language freely.
So, it may come to you as no surprise that there are other powerful statements in SQL. Feel free to dive into the unknown world of the LIMIT statement.
Next Tutorial: The LIMIT Statement