Databases vs Spreadsheets

SQL Tutorials 13 min read
why databases
Blog / SQL Tutorials / Databases vs Spreadsheets

In this tutorial, we will focus on databases vs spreadsheets. It is not just Excel users who will benefit from this article. Those of you who do not use Excel regularly will learn the advantages and the disadvantages of using databases or spreadsheets.

If you are familiar with spreadsheets, you might be wondering why databases are even used. Some of you might be users and still imagine an Excel spreadsheet when we talk about tabular data. Please, don’t do that! Data tables, databases, and Excel spreadsheets are different things!

data tables do not equal databases do not equal excel spreadsheets

Spreadsheets

Let’s start with a definition.

A spreadsheet is an electronic ledger, i.e., an electronic version of paper accounting worksheets. It was created to facilitate people who needed to store their accounting information in tabular form digitally.

An electronic ledger is an electronic version of paper accounting worksheets databases vs spreadsheets

So, it is possible to create tables in a spreadsheet. This is why some people believe spreadsheets and databases are interchangeable, while, in reality, they aren’t.

data science training

Common Things Between Databases and Spreadsheets

There are similarities between the two.

  1. Both can contain a large amount of tabular data.
  2. Both can use existing data to make calculations.
  3. Neither spreadsheets nor databases are typically used by a single person, rather many users will work with the data.both contain large amounts of data and make calculations and are use my many users, databases vs spreadsheets

What Are the Differences Between Databases and Spreadsheets?

The differences between the two forms of data storage lie in the way these three characteristics are implemented.

The Data They Contain

Data in Spreadsheets

Imagine a spreadsheet. Every cell is treated as a unique entity. It can store any type of information – a date, an integer value, a string name, etc.

Not only can we have diverse types of values in various cells, but we can also apply a specific format to these cells.

We can format information in a cell, databases vs spreadsheets

Data in Databases

This is not the case with databases. They contain only raw data. Each cell is a container of a single data value. It is the smallest piece of information there is. You must pre-set the type of data contained in a certain field. This feature prevents inadvertent mistakes. For example, in a field containing date values, should you try to insert a string, the software will show an error and you will have the chance to correct yourself.

you can't enter a string into a data field, databases vs spreadsheets

This won’t happen in Excel – if you insert a string in the column with date values, you won’t obtain an error message and Excel will store the string value.

Excel will store the string, databases vs spreadsheets

Storing Data

In a spreadsheet, you can store data in a cell, while in a database, you store it in a record of a table, meaning you must count the records in a table to express how long the data table is, not the number of the cells. And that is it – you cannot pick a font colour or size. All you care about is the information being stored; you don’t care about formatting. Our main goal is to save the numbers.

No formatting in data tables, databases vs spreadsheets

Calculations in Cells

Another substantial difference is that, in a spreadsheet, different cells can contain calculations, such as functions and formulas. This means, if you want to combine two integers, the result will be stored in another cell. In a database, all calculations and operations are based on the existing data and are done after its retrieval. There is a specific feature, called “views”, similar to the tables, in which you can do a calculation.

Calculations are done after data retireval in relational data bases. Different cells can contain calculations in Excel, databases vs spreadsheets

These objects also contain columns that can be normal like the ones in the tables or could contain a certain type of calculation. There is no way you can mistake a record of data with a calculation.

Data Integrity

The database features mentioned so far improve data integrity – you can’t store different types of data in the same field, and it is unlikely someone will mistake a data value for an outcome of a calculation, especially in large data sets. Data integrity is why databases are more appropriate when you want to avoid making mistakes.

In relational databases a record of data does not equal a calculation, databases vs spreadsheets

Creating Relations

Naturally, you might think a spreadsheet can contain multiple worksheets, so one can create tables in the worksheets, and then use the worksheets to create relations between the tables. Why are databases used then? Well, in a spreadsheet, such relations will be logically limited.

Excel spreadsheets are logically limited, databases vs spreadsheets

Instead of setting up spreadsheets or worksheets, one can set up relations between the tables. This will boost the performance of operations and increase the speed at which you manipulate your dataset.

Relational databases increase speed, databases vs spreadsheets

What Are the Cons of Using Spreadsheets?

Albeit powerful for many circumstances, spreadsheets do have limitations.

Limitations

Excel is incapable of handling over 1 million rows of data. This immediately induces us to look for a solution. Usually, the fix is to use databases, where having 2, 5, or 10 million records is not a problem.

Spreadsheets are limited to 1 million rows, databases vs spreadsheets

Multi-User Property

Referring to the multi-user property, spreadsheets are lacking. Essentially, every person must update their own spreadsheet with new data. For instance, if there is a new purchase to register or a last name in a table below to correct, every user must make these changes manually.

Everyone must update their own version of a spreadsheet

You would justifiably think Google Docs and the latest versions of Office solve this issue, but they do so only partially. In Google Docs, you might have trouble finding out who changed or deleted information incorrectly, which often leads to a cumbersome situation where people have a hard time organizing their tasks.

Things like google docs can still leave things confusing, databases vs spreadsheets

What are the Pros of Using Databases vs Spreadsheets?

Controlling Changes

As opposed to that, the first reason why databases are useful is that they provide a stable structure, controlling access permissions and user restrictions. One person can make a change which will be visible to everybody instantly. This feature increases efficiency and data consistency when using databases.

Relational databases provide a stable structure, databases vs spreadsheets

Increased Efficiency

Considering data integrity and data consistency, using databases eliminates duplicate information, which is another way to save space and increase efficiency. Look at the table in the picture below.

Databases eliminate duplicates

The Problem with Spreadsheets

When using databases, you know a certain first and last name corresponds to a unique email address. However, if you are using a spreadsheet flooded with data and you know John McKinley has changed his email, you may change the email address once and accidentally miss updating the same address in another record. This may lead to inadvertent mistakes.

When information is changed it avoid inadvertant mistakes, databases vs spreadsheets

The Solution

This can be avoided by using a relational database – an accredited user only needs to access the table and change John McKinley’s email address there. Just once. Not only will this operation save time, but it will also anticipate inconsistencies. This is yet another reason why databases are more useful when your data can be stored in logically related tables.

Information is changed across databases, databases vs spreadsheets

Database Terminology

Now, that you know why databases are better than spreadsheets, let’s focus on some database terminology. This is a necessary step that will help you when you start coding in SQL.

What’s the Process of Creating a Database?

Let’s go through the entire process of creating a database. Assume our database containing customer sales data has not been set up yet.

How to create a database?

So, imagine you are the shop owner and you realize your goods have been selling quite well recently, and you have more than a million rows of data.

A shopkeep with a million rows of data, databases vs spreadsheets

What do you need, then? A database! But you know nothing about databases. Who do you call, an SQL specialist? No. You need a database designer.

A shopkeeper needs a database designer

Database Designers

She will decide how to organize the data into tables and how to set up the relations between these tables. This step is crucial.

The database design is essential

If the database design is not perfect from the beginning, your system will be difficult to work with and won’t facilitate your business needs; you will have to start over again. Considering the time, data, and money involved in the process, you want to avoid going back to square 1.

A bad database design wastes time and money

Plotting the Database System

What do database designers actually do? They plot the entire database system on a canvas using a visualization tool. There are two main ways to do that.

Entity-Relationship Diagram

One is drawing an Entity-Relationship diagram, an ER diagram for short. As shown in the picture below, the different figures represent different data entities and the specific relationships we have between entities.

An ER diagram describes how the databases will relate

The connections between tables are indicated by lines. This way of representing databases is powerful and professional, but it is complicated. We will not focus in-depth on ER diagrams in these tutorials, but you should know they exist and refer to the process of database design.

Relational Schema

Another form of representation of a database is the relational schema. This is an existing idea of how the database must be organized. It is useful when you are certain of the structure and organization of the database you would like to create.

Let’s see what a relational schema would look like.

A relational schema lists how a database should be organised

It represents a table in the shape of a rectangle. The name of the table is at the top of the rectangle. The column names are listed below.

Column names

All relational schemas in a database form the database schema. You can also see lines indicating how tables are related in the database.

A database schema indicated how databases are organised through lines

Creating the Database

Once the database design process is done, the next step would be to create the database. Up to this moment, it has been ideas, planning, abstract thinking, and design. At this stage, it would be correct to say SQL can be used to set up the database physically, as opposed to contriving it abstractly.

SQL sets up the database physically

Data Manipulation

It is only then that you can enjoy the advantages of data manipulation. It will allow you to use your dataset to extract business insights that aim to improve the performance and efficiency of the business you are working for. This process is interesting, and, practically, the main part of the tutorials will be related to that.

Database manipultion allows you to use your dataset to extract business insights

Important: Well thought-out and carefully designed databases are crucial prerequisites for data manipulation. If we have done good work with these steps, we can write effective queries and navigate a database rather quickly.

The three steps - Design, creation and manipulation

Database Management

You will often hear the term database management. It comprises all the steps a business undertakes to design, create, and manipulate its databases successfully.

These three steps are database managment

Database Administration

Finally, database administration is the most frequently encountered job of all. A database administrator provides daily care and maintenance of a database. Her scope of responsibilities is narrower regarding the ones carried out by a database manager, but she is still indispensable for the database department of a company.

Database administration is the maintenance of databases

Why Databases?

So, what we discussed in this tutorial highlights why databases are a better environment for storing and keeping track of data when working with multiple dimensions and large amounts of data. Spreadsheets have their advantages as well – they are an excellent tool that allows us to carry out extensive analysis. But for the easy retrieval and updating of data, efficiency, data consistency, data integrity, speed, and security, relational databases are definitely the structure to opt for. They can store lots of raw data and are excellent when separating the data from the way it is displayed for analysis.

The pros of a relational database

As you saw, it would be a good idea to stop trying to visualize data tables in the form of spreadsheets. They are different.

database tables to not equal spreadsheets

Moreover, this tutorial allowed us to learn certain terms we will be using often when dealing with databases. If you are eager to learn more, make sure you jump into the next tutorial where we will explain how meaningful a relational schema can be for you to understand the functioning of a certain database.

***

Eager to hone your SQL skills? Learn how to put theory into practice with our hands-on tutorials!

 

Next Tutorial: SQL Primary Key

Earn your Data Science Degree

Expert instructions, unmatched support and a verified certificate upon completion!

Leave a Reply

Your email address will not be published.

×
LAST CHANCE
Ready to Learn Data Science?
50% OFF