How does a LEFT JOIN in SQL work?
If you have any idea what JOINS are and you are familiar with the INNER JOIN type, learning how to use the LEFT JOIN in SQL should be a walk in the park. Moreover, we will be handling duplicate records, so make sure you know a thing or two about it.
Side note: If you don’t have the ‘employees’ database, you can download it here.
A Venn Diagram
The Venn diagram you see in the picture below, allows you to visualize how a LEFT JOIN in SQL works.
Its output allows us to see all records from the table on the left side of the JOIN, including all matching rows of the two tables. That’s why, compared to the INNER JOIN, the result set, coloured in red, includes the rest of the area of the left table.
In SQL terms, this translates to retrieving all matching values of the two tables plus all values from the left table that do not match any values from the right table.
To clarify the concept, let’s look at an application.
An Application of the LEFT JOIN in SQL
So, if you’ve read the tutorial on the duplicate records, there are a few things we want to delete. First, let’s remove the duplicates from the two tables. We can achieve that by writing the following code:
DELETE FROM dept_manager_dup WHERE emp_no = ‘110228’; DELETE FROM departments_dup WHERE dept_no = ‘d009’;
Duplicate values inflate our output with repeating records and do not contribute to the results we obtain meaningfully.
Technically, after deleting the rows in which we see the respective employee and department numbers, MySQL will delete all initial records and all their duplicates.
How to Re-add the Initial Records?
In order to add them back, try coding:
INSERT INTO dept_manager_dup VALUES (‘110228’, ‘d003’, ‘1992-03-21’, ‘9999-01-01’); INSERT INTO departments_dup VALUES (‘d009’, ‘Customer Service’);
After running the query, everything looks good.
How to Implement the LEFT JOIN in SQL?
We can write similar code to the one we would use with the INNER JOIN. However, the JOIN syntax will be subject to a slight change. Instead of relating the two tables through the keyword JOIN, or INNER JOIN, we must type LEFT JOIN. This is what the code would look like:
SELECT m.dept_no, m.emp_no, d.dept_name FROM dept_manager_dup m LEFT JOIN departments_dup d ON m.dept_no = d.dept_no GROUP BY m.emp_no ORDER BY m.dept_no;
After executing, the query returned 26 rows, as shown in the picture below!
The Extra Rows
6 rows more than the 20 rows we obtained in the example about INNER JOINS. Which are these 6 rows that make the difference?
To start our analysis, please, refer to the Venn diagram. These 6 records represent the area filled with blue. This is the part of the left set where no values match with the right table.
All 6 records of interest are in the top part of the output, as you can see in the picture below.
The First Rows
To begin with, all 4 records that appear in the “dept_manager_dup” table, but not in the “departments_dup” table, have been retrieved. They refer to employees with numbers 9999 0-4, 5, 6, and 7.
Why were these 4 rows retrieved? Because they were registered in the table on the left, but not in the table on the right. There are no department number values that could match these records with the table on the right. Hence, they are an example of non-matching values from the left table.
The Last Rows
There are two more records we must discuss – the ones of department 2 from the left table, “dept_manager_dup”. They appear in the LEFT JOIN output. There are no such records in the “departments_dup” table.
There we had information for all departments except department 2. The two records appear in the final left-join list, but there are no corresponding values for the “dept_name” field that can come from the table on the right.
The Order Matters
Basically, this is a proof that, differently from an INNER JOIN, when working with a LEFT JOIN in SQL, the order in which you join tables matters. Having the “dept_manager_dup” table, M, or the “departments_dup” table, D, on the left, can change results completely.
How to Change the Order?
Let’s invert the order of the tables in the query. We will place the “departments_dup” table on the left and the “dept_manager_dup” table on the right.
Let’s see what will happen, after we write the following code:
SELECT m.dept_no, m.emp_no, d.dept_name FROM departments_dup d LEFT JOIN dept_manager_dup m ON m.dept_no = d.dept_no ORDER BY m.dept_no;
A different output! This was due to swapping the direction of the two tables. So, now we will have all matching values from the two tables plus the department numbers from the “departments_dup” table that match no value from the “dept_manager_dup” table.
Therefore, we can expect to see records for departments 1, 10, and 11.
But why don’t we see any of them?
Correcting the Mistake We Made
The answer lies in the way we’ve chosen the first selection in this query. After we inverted the order of the tables and executed the query, the output changed, but that is not so obvious. To re-organize it properly, please, retrieve the department number data from the “departments_dup” table, not from the “dept_manager_dup” table.
The correct way to do that is to retrieve the first selection from the first table you have set in the join syntax.
What We Have Swapped
The picture is the opposite of the one the analysis just conducted. Referring to the Venn diagram, imagine that, with this query, we swapped the places of the two tables. That’s why, in the result section, we have only one null department number value, and we have data about department 1 that has no information about an employee number.
The latter is a consequence of the fact that we have no data about department number 1 in the “dept_manager_dup” table.
And because the “departments_dup” table contains no information about the second department, there will be no record of it in the result set either.
The Rest of the Departments
Finally, as you scroll down, you’ll see information about all departments from 3 to 9 with corresponding employee numbers and department names. At the end, as you probably expected, departments 10 and 11 are quoted, with no information in the remaining two fields.
Some Syntactical Peculiarities
Why are INNER JOINS called that? Because the values in the output are common to the two tables, and the result set is in the inner part of the Venn diagram; it is the area where both circles overlap.
So, OUTER JOINS must exist, too! The LEFT JOIN in SQL falls in this category since in the output obtained you have data from the outer part of the Venn diagram too.
My point here is, in this case, you could also use the keywords LEFT OUTER JOIN, and you will extract an identical result.
Therefore, if we re-run the query, adding the keyword OUTER in the code, the outcome will not differ from the one we obtained earlier.
Side note: The LEFT JOIN and LEFT OUTER JOIN are two interchangeable phrases. From a professional perspective, there is no reason for typing the keyword OUTER here, because if you are using a LEFT JOIN in SQL, it will always be an OUTER type of JOIN. This is like omitting the word INNER from INNER JOIN. Therefore, as long as it is clear what type of join will be applied, professionals prefer shorter phrases.
How to Obtain Records Only from the Left Table?
There is one final interesting detail I’d like to add. LEFT JOINS can deliver a list with all records from the left table that do not match any rows from the right table. Put differently, the idea is to obtain all records that appear only in the left outer blue part of the Venn diagram in this picture.
Which are these records, in our case?
Important: Let’s get the code back to its original state:
SELECT m.dept_no, m.emp_no, d.dept_name FROM dept_manager_dup m LEFT JOIN departments_dup d ON m.dept_no = d.dept_no ORDER BY m.dept_no;
Looking at the output obtained with the first LEFT JOIN, these are the top six rows of the data list. It is shown in the picture below.
They have null values in the “dept_name” field. And how could we get to these rows programmatically?
Using the WHERE Clause
The trick is to include a WHERE clause, whose condition will be all “dept_name” values to be null.
So, we can add the following code, right before the ORDER BY statement:
WHERE dept_name IS NULL
Through a LEFT JOIN, we will retrieve this bit of the table, whose column information comes from the right table and is null.
We managed to obtain a 6-row result, as we wanted.
What is Different about the LEFT JOIN?
In conclusion, when we use a LEFT JOIN in SQL, we are still joining data. However, it is more of an OUTER JOIN. What this means, is that some of the data belongs to only one of the two tables. Particularly, to the left one. Contrary to the INNER JOIN, changing the order of the tables in our code will make a huge difference to the output.
So, I suppose you are thinking that if there is a LEFT JOIN in SQL, there must be a RIGHT JOIN, too. Well, you are absolutely right. Feel free to jump into the tutorial where we will be discussing it.
Eager to hone your SQL skills? Learn how to put theory into practice with our hands-on tutorials!
Next Tutorial: SQL RIGHT JOIN