What Is a Data Warehouse?

Iliya Valchanov, 2 May 2023

Data warehousing is one of the hottest topics both in business and in data science. But if you’re new to the field, you’re probably wondering what a data warehouse is, why we need it, and how it works. Don’t worry because, in this article, you’ll find the answers to all these questions.

First, let’s start with a definition: the meaning of the phrase ‘single source of truth’.

What Is the Single Source of Truth?

In information systems theory, the ‘single source of truth’ is the practice of gathering and structuring the best-quality data in one place.

Here's a very simple example.

You have surely worked on a file and ended up creating many different versions of it.

How do you name such a file?

Well, once you are done, you often place the word ‘final’ at the end of the name. This results in a bunch of files whose names end in:

  • ‘final’
  • ‘final, final’
  • ‘final, final, final’

Or my favorite:

  • ‘really final’… ‘final’

If this is you, you are not alone. It seems that even corporations never know where the most recent or most appropriate file is.

But what if you knew that there was one single place where you could always find the authoritative version of the information?

That would be quite helpful, wouldn’t it?

Well, a data warehouse exists to fill that need.

So, what is a data warehouse exactly?

It is the place where companies store their valuable data assets, including customer data, sales data, employee data, and so on.

In short, a data warehouse is the de facto ‘single source of data truth’ for an organization. It is created and used primarily for reporting and analysis.

There are several defining features of a data warehouse.

It is:

  • subject-oriented
  • integrated
  • time-variant
  • nonvolatile
  • summarized

Let’s quickly go through these, one by one.

Subject-oriented means that the information in a data warehouse revolves around some subject.

Therefore, it does not contain all company data ever, but only the subject matters of interest. For instance, data on your competitors need not appear in a data warehouse. Your own sales data, however, will most certainly be there.

Integrated corresponds to the file-naming example from the beginning of this article.

Each database, or each team, or even each person has their own preferences when it comes to naming conventions. That is why common standards are developed to make sure that the data warehouse picks the best quality data from everywhere. This relates to ‘master data governance’, but that is a topic for another time.
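
To make the idea of common standards concrete, here is a minimal Python sketch. The two source systems (`crm`, `billing`), their field names, and the `FIELD_MAP` standard are all invented for illustration and do not come from any particular warehouse product.

```python
# Hypothetical rows from two source systems that name the same fields differently.
crm_rows = [{"CustID": 1, "FullName": "Ann Lee"}]
billing_rows = [{"customer_id": 2, "name": "Bob Ray"}]

# Assumed warehouse standard: map each source's field names to common names.
FIELD_MAP = {
    "crm": {"CustID": "customer_id", "FullName": "customer_name"},
    "billing": {"customer_id": "customer_id", "name": "customer_name"},
}


def integrate(source, rows):
    """Rename each row's fields to the warehouse's standard names."""
    mapping = FIELD_MAP[source]
    return [{mapping[key]: value for key, value in row.items()} for row in rows]


warehouse_customers = integrate("crm", crm_rows) + integrate("billing", billing_rows)
print(warehouse_customers)
```

After integration, every row uses the same field names regardless of where it came from, which is what lets the warehouse be queried as one consistent source.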

Time-variant relates to the fact that a data warehouse contains historical data, too.

As said before, we mainly use a data warehouse for analysis and reporting, which implies we need to know what happened 5 or 10 years ago.
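
One common way to keep history is to store each value together with the date it took effect, so that past states can be looked up later. The sketch below assumes a made-up `price_history` table and a hypothetical helper `price_as_of`. It illustrates the idea and is not a prescribed design.

```python
from datetime import date

# Hypothetical price history: each row records a value and the date it took effect.
price_history = [
    {"product": "widget", "price": 9.0, "valid_from": date(2015, 1, 1)},
    {"product": "widget", "price": 12.0, "valid_from": date(2020, 1, 1)},
]


def price_as_of(product, when):
    """Return the price in effect on the given date, or None if none existed yet."""
    candidates = [
        row for row in price_history
        if row["product"] == product and row["valid_from"] <= when
    ]
    if not candidates:
        return None
    # The most recent row that had already taken effect is the one that applies.
    return max(candidates, key=lambda row: row["valid_from"])["price"]


print(price_as_of("widget", date(2018, 6, 1)))
print(price_as_of("widget", date(2023, 5, 2)))
```

Because old rows are kept rather than overwritten, the same table answers questions about both today and several years ago.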

Nonvolatile implies that the data only flows into the data warehouse, as is.

Once there, it cannot be changed or deleted.
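
As a rough illustration of this property, the sketch below models a table that can only be appended to and read. The class `AppendOnlyTable` and its methods are hypothetical names chosen for this example. Real warehouses enforce the rule through their own loading processes and permissions.

```python
class AppendOnlyTable:
    """Hypothetical in-memory table that allows inserts and reads only."""

    def __init__(self):
        self._rows = []

    def append(self, row):
        # Store a copy so later changes to the caller's dict do not leak in.
        self._rows.append(dict(row))

    def rows(self):
        # Return copies so readers cannot modify the stored data.
        return [dict(row) for row in self._rows]


sales = AppendOnlyTable()
sales.append({"order_id": 1, "amount": 100.0})
# A correction is recorded as a new row rather than an edit of the old one.
sales.append({"order_id": 1, "amount": 95.0, "note": "corrected amount"})

snapshot = sales.rows()
snapshot[0]["amount"] = 0.0  # Changing the copy has no effect on the table.
print(sales.rows()[0]["amount"])
```

The class deliberately offers no update or delete method, so the only way to change what the table says is to add another row.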

Summarized once again touches upon the fact that the data is used for data analytics.

Often it is aggregated or segmented in some ways, in order to facilitate analysis and reporting.
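
Aggregation of this kind can be sketched in a few lines of Python. The `sales` rows and the `summarize` helper below are invented for illustration. In practice this step is usually done with SQL or a dedicated tool.

```python
from collections import defaultdict

# Hypothetical detailed sales rows as they might arrive from a source system.
sales = [
    {"region": "North", "year": 2022, "amount": 100.0},
    {"region": "North", "year": 2022, "amount": 50.0},
    {"region": "South", "year": 2022, "amount": 70.0},
    {"region": "North", "year": 2023, "amount": 30.0},
]


def summarize(rows):
    """Total the sales amount for each (region, year) segment."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["region"], row["year"])] += row["amount"]
    return dict(totals)


summary = summarize(sales)
print(summary)
```

Each key in `summary` is one segment, so a report can read totals per region and year without scanning every individual sale.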

So, that’s what a data warehouse is – a very well structured and nonvolatile, ‘de facto’, single source of truth for a company. You can learn more about it in our Data Literacy course.

Iliya Valchanov

Co-founder of 365 Data Science

Iliya is a finance graduate with a strong quantitative background who chose the exciting path of a startup entrepreneur. He demonstrated a formidable affinity for numbers during his childhood, winning more than 90 national and international awards and competitions through the years. Iliya started teaching at university, helping other students learn statistics and econometrics. Inspired by his first happy students, he co-founded 365 Data Science to continue spreading knowledge. He authored several of the program’s online courses in mathematics, statistics, machine learning, and deep learning.