How does an INNER JOIN in SQL work?
If you want to learn how to create an inner join in SQL, you should have an idea what joins are in the first place. Furthermore, there are two tables that you should set up for this tutorial.
To illustrate the functionality of the inner join in SQL, we suggest you refer to the following graphics.
Shown in the picture below is a Venn diagram – a mathematical tool representing all possible logical relations between a finite collection of sets.
The “dept_manager_dup” table is represented as the set on the left, while the “departments_dup” table is represented by the set, or the circle, on the right.
Which will be the related column here? We know that we can join tables with columns of the same type and meaning. Therefore, the related column of these two tables will be the dept_no.
The area that belongs to both circles, which is filled with red, represents all records belonging to both the “dept_manager_dup” and the “departments_dup” tables. This area is also called a result set.
The inner join in SQL can help us extract this result set. With this type of join, we can match records of two or more tables. The same values appearing in the “dept_no” column in both tables are the matching values, also called matching records.
The values that do not match will not appear in our final output. Logically, they are called non-matching values, or non-matching records.
A Case in Point
An example from the “employees” database can clarify the concept.
The Department Manager Duplicate Table
Let’s check the current state of the table on the left side of our graph, the “dept_manager_dup” table, by writing the following code:
SELECT * FROM dept_manager_dup ORDER BY dept_no;
Side note: We have ordered the output by “dept_no”, so it is easier to see which records contain data values in all fields and which don’t.
However, four records contain no information about “dept_no” and “to_date”. That’s why the values in that column appear as NULL. In addition, we lack data about department number 1.
Then, we have data about the departments from 2 to 9, though.
The Departments Duplicate Table
Let’s carry out the same analysis on the other table – the “departments_dup” table. We can write the same code, as a while ago, with a small change:
SELECT * FROM departments_dup ORDER BY dept_no;
As you can see in the picture above, “Public Relations” contains no department number for some reason; therefore, we see “NULL” in the first column of the first row. Furthermore, we have data about the number and name of all departments from 1 to 11 with these exceptions:
- There is nothing about department number 2 – neither a number nor a name.
- Departments 10 and 11 do not have corresponding names.
What is the Syntax of an Inner Join in SQL?
To do this, we’ll need the join syntax, which is very intuitive. It serves to show columns from the different tables and specify which the related column will be.
In a SELECT statement, write all columns you wish to see in the result. It is important to designate the tables in which the columns belong because the data is not in a single table this time.
That’s why, besides typing the keyword FROM and the name of the first table, you should proceed to write JOIN and the name of the second table.
Relating the Tables
To this moment, the syntax allows us to specify the fields we would like to see in the result and the tables we are matching.
But we haven’t specified how we would relate these tables.
Let’s go ahead and type ON then quote the columns that relate the two tables. You can see how it is done in the picture below.
You are specifying where SQL will extract the matching values from.
The code you see in the picture above is the main body of the syntax structure about writing joins.
How to Use Aliases with Joins?
Before we proceed with our example, let me share a fundamental coding practice that professionals use in the joins syntax – aliases. More precisely, I am talking about aliases of the tables’ names.
This means table_1 can be renamed to t1 and table_2 – to t2. When used for assigning table names, the aliases are usually added right after the original table name, without using the keyword “AS”. Then, instead of typing the entire tables’ names in the SELECT block, we can use t1 and t2, respectively.
Now, let’s see how an inner join in SQL can be implemented with an example.
How to Implement an Inner JOIN in SQL?
Let’s apply a join syntax to the “dept_manager_dup” and “departments_dup” tables. By doing this, we will obtain a list of records containing information about the department number, the employee number, and the department name related to every manager.
We can use the letters M and D as aliases for the two tables, respectively. Knowing their content, we can specify the fields we would like to see right after the SELECT keyword. Let’s extract the department and employee numbers from the “dept_manager_dup” table and the department name from the “departments_dup” table.
We can achieve that by writing the following code:
select m.dept_no, m.emp_no, d.dept_name
Joining the Tables
Then, we must join the two tables.
Important: While doing that, please don’t forget to add their aliases!
So, we must type:
select m.dept_no, m.emp_no, d.dept_name from dept_manager_dup m inner join departments_dup d
The Columns Relating the Tables
To conclude the join syntax, we must specify the columns relating the two tables. Let’s do that by typing :
select m.dept_no, m.emp_no, d.dept_name from dept_manager_dup m inner join departments_dup d on m.dept_no = d.dept_on;
These are the “dept_no” field from the “dept_manager_dup” table and the “dept_no” field from the “departments_dup” table.
Side note: The relating columns of the two tables need not be named in the same way. However, it helps avoid confusion and is more convenient if they are. That’s why, in our example, both fields carry the same name.
Sorting the Output
Before running this interesting query, let’s order the results according to their “dept_no”. This will improve the output’s representation. Since we obtain this column from the “dept_manager_dup” table, we will sort the information in the ORDER BY clause by the “dept_no” from the M table.
The 20 rows of information that we obtained after executing the query are the result from the following code:
SELECT m.dept_no, d.dept_name FROM dept_manager_dup m INNER JOIN departments_dup d ON m.dept_no = d.dept_no ORDER BY m.dept_no;
The Retrieved Data
We retrieved data about the departments from 3 to 9 only. There is no data about departments 1, 2, 10, or 11 or about the ones labelled as null values.
Why is this happening?
Important: Inner joins only extract records where the values in the related columns match. Null values, or values appearing in just one of the two tables and not appearing in the other, are not displayed.
The “dept_manager_dup” has a few null-valued department numbers and has no information about departments 1, 10, and 11. Therefore, inner joins do not present null values; the first four rows do not appear in the inner join we executed.
We have 26 records in this table altogether, and 26 – 4 = 22. Hence, there must be two records in the inner join we do not see because it only displays 20 rows!
Re-checking the Content of the “departments_dup” Table
It has a null-valued department number, too. We don’t see it in the inner join output.
And why didn’t SQL match the null values of the first table with this null value here? Because it would make no sense. Only non-null matching values are in play.
Moving further, department number 2 doesn’t participate in any record. Do we have it in the “dept_manager_dup” table?
Yes, we do. There are two records with it in there.
Moreover, these were the two records we did not see in the final output. There were no values from the “departments_dup” table to be matched with them in the inner join.
What Else is Dropped from the Outcome
Finally, the “departments_dup” table has 2 more records. They are departments 10 and 11, but as we mentioned, they have no values to be matched with from the “dept_manager_dup” table.
I hope this analysis makes it clearer and explains why we say an inner join is represented by the common area of the two circles in the Venn diagram. This area is namely the set of the matching values of both tables, which are the only ones displayed in the inner-join output.
And what if such matching values did not exist? Simply, the result set will be empty. There will be no link between the two tables.
Understanding How to Use Inner Joins
In conclusion, if you need to relate tables via a column they have in common, an inner join in SQL will do the job. If there is missing data, it must be a null value or it isn’t the same in both tables. Therefore, we used a Venn Diagram to make it easier to understand. So, if you are eager to learn several details about the syntax of joins in SQL, feel free to dive into the tutorial explaining this.
Eager to hone your SQL skills? Learn how to put theory into practice with our hands-on tutorials!
Next Tutorial: Joins Syntax in SQL