In order to become an advanced data scientist, it is crucial to understand how to manipulate data and use operators in SQL. We will be working with databases, so being familiar with them is essential. Let’s take it to the next level! It’s time to start manipulating and retrieving data in MySQL!
What We Will Need
For the purposes of this tutorial, we’ll be using a huge dataset that can be found on GitHub. It is frequently used by programmers who want to turn it up a notch and dig deeper into SQL. Therefore, this dataset is the best possible choice for laying the foundations for working with MySQL and relational databases.
We will take advantage of its complexity by solving several challenging tasks. These tasks will be a wonderful exercise that will turn you into a proficient SQL user.
The ‘Employees’ Database
We have re-organized the database script in an SQL file ('employees.sql') that you can download and run.
Important: Don't forget the location you are storing it in!
Once this process is ready, open the script file from Workbench as you would with other SQL files.
As you scroll down, you’ll see the code that will create your entire database.
Running the ‘Employees’ Database
Run the file and wait for a while.
Once the execution phase is over, we can refresh the data in our schemas section. The “Employees” database will appear on our list.
This is the database we’ll be using until the end of the tutorial, so let’s roll up our sleeves and get into some serious coding!
Data Manipulation
It is time to study the data manipulation tools in SQL. We will go through all statements of the data manipulation language, and we apply them in practice.
The SELECT Statement
The statement we will use almost all the time is SELECT. It is one of the most important statements in MySQL and SQL.
SELECT, as its name suggests, allows us to extract a fraction of the entire data set. It allows us to retrieve data from tables. Some people also say that this statement is used to “query data from a database”. SELECT queries maybe 4 words:
But they may not fit in a 30-inch screen.
If SQL is the principal language you are using at work, there is no chance you will spend a day without applying the SELECT statement.
Syntax
The basic SELECT statement syntax is the following:
SELECT
column 1, column 2, …
FROM
table_name;
Practically, the FROM keyword is mandatory. It tells us which table we will be retrieving data from. When extracting information, SELECT goes with FROM. We are always selecting something from a certain table or another type of SQL object.
Exploring the ‘Employees’ Database
Now, when we expand the “Tables” section of the ‘Employees’ database, there are six tables in this database. It is shown in the picture below:
Then, we can expand the ‘Employees’ table, for instance, and see the “Columns” section comprises six columns.
Assume you are interested in exploring the names of all employees. How can you do that? With the SELECT statement.
Abiding by the abovementioned syntax:
SELECT
first_name, last_name
FROM
employees;
This should be the query that will return an output of the entire first-name and last-name columns from the “Employees” table. Let’s write it down and run it to see if it works.
By scrolling down in the “Output” section, we can see that all the records are retrieved.
The Beautify Option
We can use the Beautify option offered in Workbench. It is an excellent formatting tool, especially for educational purposes, so let’s take advantage of it. It can only organize the code better by separating it into a few rows for easier reading.
The Star Symbol in SQL
There is a very important thing we must mention here. Say you are interested in the information contained in the ‘Employees’ table. There is a substitute for enlisting all its column names between SELECT and FROM. In SQL, the star symbol, *, technically defined as a wildcard character, means “all”, or “everything”.
Therefore, “SELECT * FROM Employees” should be thought of as “SELECT all columns FROM the ‘Employees table’. Let’s run this query to verify the result.
We have the entire multi-row data from this table shown in the picture above. For each record, we see an employee’s number, birthdate, first and last name, gender, and the date they started with the company.
The WHERE Clause
Another technique to learn is adding a WHERE clause to the SELECT statement. This will allow us to set a condition upon which we will specify what part of the data we want to retrieve from the database. This is much easier than it sounds. Let’s start by checking the relevant syntax.
Syntax
In the body of the query, after the SELECT… FROM… structure which we’ve already seen, we must add the WHERE keyword and then specify a condition.
This condition can be, for example, a verification that certain information is available, a check, or a mathematical expression. In the tutorials to come, we will examine all possible types of conditions that can be used after the WHERE clause.
Using the WHERE Clause
Let’s look at the “Employees” table again.
You have data on thousands of employees. Assume you know that there are people in this huge list who are named Denis.
To extract their records from the “Employees” table, we can type:
SELECT
*
FROM
employees
WHERE
first name = ‘Denis’;
Important: Don’t forget to add single or double quotes to the string Denis! Otherwise, your query won’t work!
In SQL, the equality sign isn’t only used for mathematical expressions.
In this line of code, it means the selection of data will be based on the condition that the individual’s name is “Denis”.
Let’s run this query to see what happens.
Two hundred and thirty-two rows were affected, which means we have retrieved 232 instances of the “Employees” table. These are people whose first name is “Denis”.
Operators in SQL
You should know that the technical term for the equality sign is ‘equal operator’. In SQL, there are many other linking keywords and symbols, i.e. operators in SQL that you can use with the WHERE clause.
Examples include AND, OR, IN, NOT IN, LIKE, NOT LIKE, BETWEEN… AND…, EXISTS, NOT EXISTS, IS NULL, IS NOT NULL, and more.
Some of them are straightforward, while others, not so much. Now, we will concentrate on the most frequently encountered of the operators in SQL.
The AND Operator
Let’s start with AND. This is one of the operators in SQL that allows you to logically combine two statements in the condition code block. For instance, you may wish to retrieve data about all employees whose first name is Denis, and who are male. The latter would mean their gender is ‘M’.
The setting of such a condition stems from the fact that not all people who carry the name “Denis” are of the same gender. So, we might want to obtain a list containing men who are called ‘Denis’ only. The AND operator in SQL allows us to narrow the output we would like to extract from our data.
How to Use it
Technically, this operator must be placed right after the first condition, “first name = Denis”. It must be succeeded by the second condition we just talked about.
So, we must type:
SELECT
*
FROM
employees
WHERE
first_name = ‘Denis’ and gender = ‘M’;
140 rows affected. Great – this means not all 232 people named “Denis” were men! Just 140 of them.
Therefore, the output is a list of the records of all men named “Denis”.
The OR Operator
Perhaps you think AND and OR are the most straightforward operators in SQL to apply.
This isn’t exactly true. To be sure of what you are asking the computer to do, you should know how SQL treats these reserved words.
Going Over the Previous Operators in SQL
Let’s quickly revise some of the queries we’ve run so far.
When we executed SELECT with the condition that an individual’s first name must be “Denis”, we retrieved 232 rows.
Then we ran the same query, adding a condition to the WHERE clause – we were looking for men only. We extracted 140 rows of data.
Working with the AND Operator
To achieve this, we used the AND operator. This means that AND binds SQL to meet both conditions enlisted in the WHERE clause simultaneously. We selected the data satisfying both condition 1 AND condition 2.
Thus, the list obtained through the last query contains people who meet both criteria: their first name is ‘Denis’ AND their gender is male. That’s why only 140 rows (a number smaller than the initial 232) were extracted. The difference is represented by the number of women (92) named “Denis”.
Working with the OR Operator in SQL
Now let’s see how OR works.
We can execute the same query, changing the conditions in the WHERE clause to
first name = “Denis” OR first name = “Elvis”;
How it Works
Perhaps you think it is logical to obtain two outputs – one containing a full list of individuals named “Denis” and another one containing those named “Elvis”.
However, that’s not how SQL understands things. We will obtain a single output – a list of all individuals whose name is either “Denis” or “Elvis”. This list will contain a combination of the records of the two outputs you were thinking of, as you can see in the picture below.
Moreover, the first names will be scrambled – they are ordered differently, following the employee number.
This list contains 478 rows of information – 232 of all people named “Denis” plus, obviously, the 246 people named “Elvis”.
Using AND instead of OR
To prove to you that this is the way SQL understands the AND and OR operators, let’s run the same query using AND instead of OR. Notice how both conditions are set in the same column – “first name”.
We execute this query, and Workbench returns nothing.
As shown in the picture below, SQL was unable to find a person named both “Denis” and “Elvis” at the same time!
Therefore, we can conclude that we should either use AND with the conditions set on different columns, or we should use OR with conditions set on the same column.
The Logical Order
Now, we will introduce you to the logical order with which you must comply when you use both operators in SQL in the same WHERE block.
Imagine you wanted to extract a list of all people whose last name is Denis, regardless of their gender. We know we don’t have missing data in the gender column, but let’s assume you would like to specifically indicate that you wish to retrieve data about male and female individuals with this family name.
The Problem
Hence, after
SELECT * FROM employees WHERE
you can type
last name = ‘Denis’
to show the last name of interest. Then use the keyword AND. Finally, extract the information for either of the genders
last name = ‘Denis’ gender = ‘M’ OR gender = ‘F’;
Let’s run the query to see if we obtain the desired output.
Not at all!
More than 120,000 people named Denis? Sounds impossible!
Something must be wrong with this query…
The Logical Operator Precedence
The solution to this problem lies in the logical operator precedence. This is an SQL rule stating that in the execution of the query, the operator AND is applied first, while the operator OR is applied second.
In other words, regardless of the order in which you use these operators, SQL will always start by reading the conditions around the AND operator. Only then will it comply with the conditions around the OR operator. Always!
Having said that, because of the way conditions and operators in SQL are structured in our code, it turns out we have extracted all male individuals with the last name “Denis”, and all female individuals in the data table, regardless of their family name.
The Solution
To circumvent this issue, parentheses can help us re-structure the logic of the operation. To retrieve a list of people with the last name “Denis”, independent of their gender, you must place parentheses around the conditions about gender. This way, the computer will consider the first condition, the one about the last name, simultaneously with the second condition, which means the individual named “Denis” could be either a male or a female.
Let’s execute this query to verify that what we say is true.
The Importance of the Operator Precedence
Hence, remember that the logical operator precedence is an important phenomenon when enlisting conditions in the WHERE clause. It is crucial to know how it works if you want to create scripts that work the right way.
In addition, always try to double-check the results you obtain! As you saw, the first query leads us to an output, too. Nevertheless, we did not accept that output without verifying that the retrieved data corresponded to the question we were trying to answer. This was the right thing to do because, apparently, it was not the result we were looking for!
Retrieving Data in SQL
You already know how to retrieve information from an SQL table. You can use SELECT with the star symbol to obtain the entire data from the table of interest, or you can use SELECT with the column names to designate which ones from this table you would like to see displayed. You've also become familiar with the WHERE clause, allowing you to set conditions upon which you could specify what part of the data you want to extract. Moreover, we introduced a few operators in SQL, including AND, OR and the logical operator precedence. If you keep in mind that AND is with bigger priority than OR, you shouldn’t have any worries.
With what you’ve learned from this tutorial, you can easily retrieve certain data from various tables. If you can’t wait to learn how to sort the output in SQL, which is a crucial skill for any data analyst out there, don’t hesitate before jumping into our next tutorial.
***
Eager to hone your SQL skills? Enroll in our SQL course.
Next Tutorial: How To Use The ORDER BY Clause