Deciding When to Use WHERE or HAVING

SQL Tutorials 10 min read
where or having

Deciding When to Use WHERE or HAVING

10 min read
Blog / SQL Tutorials / Deciding When to Use WHERE or HAVING

If you are already familiar with the HAVING clause, you may find yourself in a situation and not know whether to use WHERE or HAVING. You may even be asking yourself – if we can use WHERE, why does the HAVING clause even exist?

The HAVING clause is frequently implemented with GROUP BY because it refines the output from records that do not satisfy a certain condition.

Internalizing the corresponding syntax will help us explain the difference between the two keywords, so let’s begin there.

HAVING needs to be inserted between the GROUP BY and ORDER BY clauses. In addition, HAVING is like WHERE but applied to the GROUP BY block.

Insert having between group by and order by

The difference can be better understood with an example.

Side note: We will be working with the ‘employees’ database, so if you haven’t downloaded it, make sure to check out our tutorial explaining it.

compete-data-science-program

Providing a Case in Point

 

First, on some occasions, an identical result could be obtained by implementing the same condition, either with the WHERE or HAVING clause.

For instance, should we select all employees hired after the 1st of January 2000, the retrieved table will be the same, whether we are using WHERE or HAVING!

If we are using WHERE, the code will be:

SELECT

               *

FROM

               employees

WHERE

               hire_date >= ‘2000-01-01’;

And if we decide to use the HAVING clause, we can simply replace WHERE:

SELECT

               *

FROM

               employees

HAVING

               hire_date >= ‘2000-01-01’;

The results are the same, as shown in the picture below.

The results are the same

 

The Main Advantage of the HAVING Clause over the WHERE Clause

 

Well, the main distinction between the two clauses is that HAVING can be applied for subsets of aggregated groups, while in the WHERE block, this is forbidden. In simpler words, after HAVING, we can have a condition with an aggregate function, while WHERE cannot use aggregate functions within its conditions.

 

An Example

Assume we want to extract a list with all first names that appear more than 250 times in the “employees” table.

If we try to set this condition in the WHERE clause, Workbench wouldn’t indicate there is a mistake in our code, because this is the correct syntax.

The where clause is the correct syntax

We will be shown an error message when we try to execute the query. And it will be a very eloquent one: invalid use of group function.

Error 1111 - invalid use of group function

 

Fixing the Error

We can change the keyword to HAVING and add the line of code in the right place, just after the GROUP BY statement.

SELECT

               first_name, COUNT(first_name) as names_count

FROM

               employees

GROUP BY first_name

HAVING COUNT(first_name) > 250

ORDER BY first_name;

Now, let’s re-run the query.

193 names were returned

We retrieved 193 names that can be encountered more than 250 times in the “employees” table.

 

WHERE or HAVING?

 

So, when is the right time to use WHERE and when should you use HAVING?

It is not so hard to decide. Let’s go over the problem we just solved again, “Extract all first names that appear more than 250 times in the “employees” table”. You must first spot the phrase “250 times”. It leads to counting the number of times something appears in the data table. And counting numbers is executed through the COUNT() function, isn’t it?

COUNT() is an aggregate function.

You must use HAVING, not WHERE.

count, an aggregate function, having

It is as simple as that. The same logic must be applied anytime an aggregate function is required for the solution of your task.

 

Summary of WHERE and HAVING

It is important to decide whether to use WHERE or HAVING in certain situations

WHERE allows us to set conditions that refer to subsets of individual rows. These conditions are applied before re-organizing the output into groups.

Once the rows that satisfy the WHERE conditions are chosen, they progress in the data retrieval process and can be grouped by distinct values recorded in a certain field or fields.

reorganize the output into groups

It is not until this moment, when the output can be further improved, or filtered, with a condition specified in the HAVING clause.

Finally, you could sort the records of the final list through the ORDER BY clause.

group by where then having

 

Solving a Task

To reinforce your understanding of the data retrieval process, let’s see an example containing both a WHERE and a HAVING condition.

The task is to extract a list of all names that are encountered less than 200 times. Let the data refer to people hired after the 1st of January 1999 only.

Let’s create the query, step-by-step.

Apparently, we must select the first names and the number of times a first name is encountered, renaming the second selection as “names_count”.

The second thing to do is designate the table we will retrieve data from – “employees”.

The code will look like this:

select first_name, count(first_name) as names_count

from employees;

 

What to Use

where? having?

Should we only use WHERE or HAVING, or both keywords, while setting our conditions?

Well, there are two conditions to satisfy to solve our problem.

 

First Condition

One is that the names must be encountered less than 200 times. “200 times” immediately means we must use COUNT(), which will count the number of times a certain first name appears in the “employees” data table.

COUNT() is an aggregate function and, as we said earlier, so it must go with HAVING.

count must go with having

 

Second Condition

The other condition to satisfy is general: “All the rows extracted must be of people who were hired after the 1st of January 1999.” This condition refers to all individual rows in the “employees” table. No specific aggregate function must be applied. Therefore, this condition must go with the WHERE clause.

no aggregate then with where

Important: Between the WHERE and the HAVING blocks, we shouldn’t forget to insert the GROUP BY segment. We must group by “first_name”, not by some other field, since our task requires us to aggregate our output by the number of times a certain first name is encountered.

Actually, let’s order the output by the “first_name” in descending order.

 

The Final Solution

The whole query can be written in the following way:

SELECT

               first_name, COUNT(first_name) AS names_count

FROM

               employees

WHERE

               hire_date > ‘1999-01-01’

GROUP BY first_name

HAVING COUNT(first_name) < 200

ORDER BY first_name DESC;

a list with the number of all distinct names of people

The query worked and solved our problem – we have a list with the number of all distinct names of people hired after the 1st of January 1999.

 

A Few Observations

The function “COUNT(first_name)” has been applied twice – once in the SELECT statement and once in the HAVING block.

 

Multiple Use

On the screen, the two phrases seem distant from one another. However, in terms of logic, they are close. Normally, you would like the aggregate function you have attached to a specific condition to be seen in your output, too. So, it makes sense to see the same phrase twice.

It's normal to see the phrase twice

On the other hand, “first_name” appears in the query three times. Once, because you group by this field. A second time because grouping by a field requires stating the column name in the SELECT block. And the third time, at the bottom of the query, because we order the data by the same selection.

first name is in the query 3 times

 

Looking out for the Order

It might seem you are using “first_name” and COUNT() too often within the query. Well, this is fine. This is not always going to be the case, but it is okay if it happens. Therefore, you must master the query structure well; otherwise, you will easily fall into traps, inducing you to make mistakes while coding.

first name appears 5 times

 

Another Peculiarity of the HAVING Keyword

In addition, as an exercise, you could try to place the “hire_date” condition in the HAVING clause, instead of in the WHERE clause, but you’ll get an error.

hire dat in the having clause causes an error

You saw HAVING can contain such a condition as “hire_date” greater than the 1st of January 2000.” In another query, it worked properly with an aggregate function:

it worked with an aggregate function

Important: Both situations are feasible for MySQL, but you cannot have both an aggregated and a non-aggregated condition in the HAVING clause! Although this seems to be a minor detail, sometimes, it will be important for you to know this. Don’t mix the conditions in the HAVING block. This is a pitfall you should be aware of, and that’s why we shared it with you!

don't mix conditions in having

 

When to Use the HAVING Clause

 

To sum up this tutorial, remember the simple rule:

If you need to work with aggregate functions, use GROUP BY and HAVING. And if you need to apply general conditions, use WHERE.

You will agree it seems there are too many clauses and tricks to remember about the SELECT statement, right? SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, and their corresponding features.

Don’t worry – as you read more tutorials and practice in the exercises attached to the lessons, you will start using these statements of the data manipulation language freely.

So, it may come to you as no surprise that there are other powerful statements in SQL. Feel free to dive into the unknown world of the LIMIT statement.

Next Tutorial: The LIMIT Statement

Leave a Reply

Your email address will not be published.

A Free Data Science Career Guide?

Check your email shortly!