Data Preprocessing with NumPy

This course is designed to show you how to work with one of Python’s fundamental packages: NumPy. You will learn what a “package” is and see how to install, upgrade and import it. By the time you finish the course, you’ll be comfortable with NumPy’s ndarray class, how to slice and reduce the dimensions of its instances, as well as how to quickly refer to the documentation. Furthermore, you’ll be ready to take advantage of NumPy’s various built-in functions and methods, which we’ll use to generate random and non-random data, import and export data to and from Python, find statistical values for a dataset, and clean and preprocess ndarrays.


Section 1

Intro to NumPy

This introductory section presents the NumPy package and its applications. You’ll learn how to install and upgrade NumPy, then get a quick look at its most important asset: “arrays”. We’ll also go over how to use the documentation, an extremely useful resource for our work later on in the course.

The NumPy Package
Installing/Upgrading NumPy
What is an array?
The NumPy Documentation
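
As a small taste of this section, here is a minimal sketch of importing NumPy, creating an array and peeking at the documentation from inside Python. It assumes NumPy is already installed (for example with pip); the variable names are purely illustrative.

```python
# Typical install/upgrade commands, run in a terminal rather than in Python:
#   pip install numpy
#   pip install --upgrade numpy
import numpy as np

# Check which version of the package is installed.
print(np.__version__)

# Build a 2-dimensional array from a nested list.
values = np.array([[1, 2, 3], [4, 5, 6]])
print(type(values).__name__)      # ndarray
print(values.ndim, values.shape)  # 2 (2, 3)

# Every NumPy function carries its documentation with it.
# Here we read only the first line of the docstring for np.mean.
summary = np.mean.__doc__.strip().splitlines()[0]
print(summary)
```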

Section 2

Why NumPy?

This section follows NumPy’s role in the development of Python and takes a closer look at ndarrays. We discuss what makes them so useful and compare them to another similar-looking data structure: Python lists.

History of NumPy
ndarrays
Arrays vs Lists
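
To make the arrays-versus-lists comparison concrete, the sketch below applies the same two operators to a Python list and to an ndarray. The names are illustrative and not taken from the course materials.

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array([1, 2, 3])

# Multiplying a list repeats it; multiplying an array scales each element.
doubled_list = py_list * 2   # [1, 2, 3, 1, 2, 3]
doubled_arr = arr * 2        # array([2, 4, 6])

# "+" concatenates lists but adds arrays element by element.
list_sum = py_list + py_list  # [1, 2, 3, 1, 2, 3]
arr_sum = arr + arr           # array([2, 4, 6])

print(doubled_list, doubled_arr)
print(list_sum, arr_sum)
```

The two structures look alike when printed, but only the array treats arithmetic as a numerical operation on its elements.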

Section 3

NumPy Fundamentals

Here, we focus on the basic NumPy syntax. You’ll learn about “indexing” and the different ways of assigning values to an array. This section also explains the elementwise properties of arrays, as we go over the different types of data we can store in them. In addition, we’ll take a look at some of the most important characteristics and properties of NumPy functions.

Indexing
Assigning values
Elementwise Properties
Types of data supported by NumPy
Characteristics of NumPy Functions Part 1
Characteristics of NumPy Functions Part 2
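
The ideas in this section can be sketched in a few lines: indexing, assigning values, elementwise arithmetic and data types. This is an illustrative example with made-up values, not the course's own exercise.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Indexing: rows first, then columns; negative indices count from the end.
corner = grid[1, 2]     # 6
last = grid[-1, -1]     # 6
first_row = grid[0].copy()  # copy, so later assignments do not affect it

# Assigning values: to a single element, or to a whole row at once.
changed = grid.copy()
changed[0, 0] = 10      # one element
changed[1] = 0          # the scalar is written to every element of row 1

# Elementwise properties: the operation is applied to each element.
shifted = changed + 1   # [[11, 3, 4], [1, 1, 1]]

# Types of data: every array has a single dtype, which can be converted.
as_float = grid.astype(np.float64)
print(grid.dtype, as_float.dtype)
```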

Section 4

Working with arrays

This section explores the concept of slicing and how its many variations can be applied to ndarrays. You’ll grasp what “dimensions” are when it comes to arrays and learn how to reduce them with the “squeeze” function and method.

Slicing
Stepwise Slicing
Conditional Slicing
Dimensions and the Squeeze Function
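
The slicing variants named in the lessons above, along with squeezing, might look like this in practice. The array and variable names are assumptions chosen for illustration.

```python
import numpy as np

# A 3x4 array holding the numbers 0 to 11.
matrix = np.arange(12).reshape(3, 4)

# Basic slicing: the first two rows and first two columns.
top_left = matrix[:2, :2]          # [[0, 1], [4, 5]]

# Stepwise slicing: every second column.
every_other_col = matrix[:, ::2]   # [[0, 2], [4, 6], [8, 10]]

# Conditional slicing: keep only the elements that satisfy a condition.
evens = matrix[matrix % 2 == 0]    # [0, 2, 4, 6, 8, 10]

# Slicing keeps the dimensions, even when only one element is left.
single = matrix[1:2, 2:3]          # shape (1, 1)

# Squeezing removes the dimensions of length 1.
squeezed = np.squeeze(single)      # function form, shape ()
also_squeezed = single.squeeze()   # method form, same result

print(single.shape, squeezed.shape)
```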

Section 5

Generating Data

This part of the course explains how to generate arrays of random and non-random data. We begin by creating “empty” arrays, as well as basic arrays of 1s and 0s, before moving on to random generators. Then, we introduce NumPy’s capabilities of generating pseudo-random data pulled from a probability distribution. The section concludes with the applications of generating pseudo-random data.

np.empty, np.zeros, np.ones, np.full
"_like" functions
Generating a Sequence of Numbers (np.arange)
Random Generators and Seeds
np.integers(), np.random(), np.choice()
Probability Distributions
Applications
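
Below is a rough sketch of the generators this section covers. It assumes that the integers, random and choice functions in the lesson title refer to methods of a generator created with np.random.default_rng; the seed value 365 and all variable names are arbitrary choices for the example.

```python
import numpy as np

# Non-random data.
scratch = np.empty((2, 3))           # uninitialized: contents are arbitrary
zeros = np.zeros((2, 3))
ones = np.ones((2, 3), dtype=int)
sevens = np.full((2, 3), 7)
zeros_like = np.zeros_like(sevens)   # copies the shape and dtype of sevens
steps = np.arange(0, 10, 2)          # [0, 2, 4, 6, 8]

# Pseudo-random data: a seeded generator gives reproducible results.
rng = np.random.default_rng(seed=365)
ints = rng.integers(0, 10, size=5)           # integers from 0 to 9
floats = rng.random(size=5)                  # floats in [0, 1)
picks = rng.choice(["a", "b", "c"], size=4)  # sampled with replacement

# Data pulled from a probability distribution (here: a normal distribution).
normal = rng.normal(loc=0.0, scale=1.0, size=1000)

# Re-creating the generator with the same seed repeats the same numbers.
again = np.random.default_rng(seed=365).integers(0, 10, size=5)
print(ints, again)
```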

Section 6

Importing and Saving Data

In this section of the course, we focus on importing and exporting (saving) data with the NumPy package. We discuss the differences between “np.loadtxt()” and “np.genfromtxt()” and their applications. We’ll examine NumPy’s capabilities to partially clean datasets as we import them. Later in the section, you’ll learn why you need to import a file into a specific datatype and how choosing the incorrect one can affect your results. We continue with the topic of saving ndarrays to external files, where you’ll discover what .npy and .npz files are and when (and how) to export arrays in those formats. Finally, we provide you with a more conventional approach and showcase how to save arrays as text files.

np.loadtxt() vs np.genfromtxt()
String vs Object vs Numbers
Simple Cleaning when Importing
np.save()
np.savez()
np.savetxt()
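
A minimal sketch of the import and export round trip described above. The tiny in-memory dataset, the file names and the temporary folder are all assumptions made so the example runs on its own; the course itself may work with different files.

```python
import io
import os
import tempfile

import numpy as np

# A small CSV with a header row and one missing value (hypothetical data).
csv_text = "id,amount\n1,10.5\n2,\n3,7.25\n"

# np.genfromtxt tolerates the missing field and stores it as NaN.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

# np.loadtxt is the stricter sibling; it is used here on complete data only.
clean = np.loadtxt(io.StringIO("1,10.5\n3,7.25\n"), delimiter=",")

with tempfile.TemporaryDirectory() as folder:
    # .npy stores a single array in NumPy's binary format.
    npy_path = os.path.join(folder, "data.npy")
    np.save(npy_path, data)
    restored = np.load(npy_path)

    # .npz bundles several named arrays into one file.
    npz_path = os.path.join(folder, "bundle.npz")
    np.savez(npz_path, raw=data, ids=data[:, 0])
    with np.load(npz_path) as bundle:
        ids = bundle["ids"]

    # Plain text is the more conventional, human-readable option.
    txt_path = os.path.join(folder, "data.csv")
    np.savetxt(txt_path, data, delimiter=",")
    reloaded = np.genfromtxt(txt_path, delimiter=",")

print(data)
```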

Section 7

Statistics

This section revolves around NumPy’s capabilities to compute important characteristics or statistics from an array. These include minimum and maximum values, various forms of averages, covariances and correlations, as well as histograms. In addition, you’ll learn about the NaN-equivalent functions and how to use them.

Using NumPy functions - np.mean()
Min & Max values (min, amin, minimum and equivalent max, amax, maximum)
Statistical Order Functions (np.ptp, np.percentile, np.quantile)
Averages and Variances (mean, median, average, var, std etc.)
Correlation and Covariance
Histograms in NumPy Part 1
Histograms in NumPy Part 2
NaN-Equivalent Functions
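
Several of the statistics listed above can be tried on a tiny array. The values below are invented for illustration, so the results are easy to verify by hand.

```python
import numpy as np

scores = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

# Averages: over the whole array, or along one axis.
overall_mean = np.mean(scores)           # 3.5
column_means = np.mean(scores, axis=0)   # [2.5, 3.5, 4.5]
median = np.median(scores)               # 3.5

# np.min reduces an array; np.minimum compares two arrays element by element.
smallest = np.min(scores)                         # 1.0
pairwise_min = np.minimum(scores[0], scores[1])   # [1.0, 2.0, 3.0]

# Statistical order functions.
value_range = np.ptp(scores)             # max - min = 5.0
q75 = np.percentile(scores, 75)          # 4.75

# Correlation between the two rows (each row is treated as a variable).
corr = np.corrcoef(scores)

# Histogram: the counts per bin and the bin edges.
hist, edges = np.histogram(scores, bins=3)

# NaN-equivalent functions ignore missing values instead of returning NaN.
with_gap = np.array([1.0, np.nan, 3.0])
plain = np.mean(with_gap)       # nan
ignoring = np.nanmean(with_gap)  # 2.0

print(overall_mean, value_range, q75, hist)
```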

Section 8

Preprocessing

In this part of the NumPy course, we explore ways to clean and preprocess data in NumPy. You’ll understand how to find and fill missing values, reshape an array, delete excess data as well as sort, shuffle and cast ndarrays. The section also explains what argument functions are and why they are so useful, and introduces ways to combine arrays by stacking and concatenating them. Finally, you’ll discover how to extract the unique values of an array and why this can be important for your analysis.

Checking for Missing Values
Substituting Missing Values
Reshaping
Removing Values
Sorting Data
Argument Functions - Argument Sort
Argument Functions - Argument Where
Shuffling Data
Assigning DataTypes
Stripping Data
Stacking Data - stack, dstack, vstack, hstack
Concatenate
Unique
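
The preprocessing steps in this section can be strung together roughly as follows. This is a sketch on invented data: filling gaps with the overall mean is only one possible strategy, and using np.char.strip for the "stripping" step is an assumption about what that lesson covers.

```python
import numpy as np

raw = np.array([[3.0, np.nan, 1.0],
                [2.0, 5.0, np.nan]])

# Checking for and substituting missing values.
missing = np.isnan(raw)
n_missing = missing.sum()                          # 2
filled = np.where(missing, np.nanmean(raw), raw)   # gaps become 2.75

# Reshaping and removing values (here: dropping column 1).
reshaped = filled.reshape(3, 2)
trimmed = np.delete(filled, 1, axis=1)             # [[3, 1], [2, 2.75]]

# Sorting, and the argument functions that return positions, not values.
sorted_rows = np.sort(filled, axis=1)
order = np.argsort(filled[0])                      # [2, 1, 0]
positions = np.argwhere(missing)                   # [[0, 1], [1, 2]]

# Shuffling rows in place with a seeded generator.
rng = np.random.default_rng(seed=0)
shuffled = filled.copy()
rng.shuffle(shuffled)

# Casting to another data type (floats are truncated toward zero).
as_int = filled.astype(np.int32)

# Stripping unwanted characters from text data.
codes = np.array(["id_1", "id_2"])
stripped = np.char.strip(codes, "id_")             # ['1', '2']

# Stacking and concatenating.
stacked = np.vstack((filled[0], filled[1]))        # shape (2, 3)
joined = np.concatenate((filled[0], filled[1]))    # shape (6,)

# Unique values and how often each one occurs.
unique_values, counts = np.unique(joined, return_counts=True)
print(unique_values, counts)
```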
MODULE 2

Programming for Data Science

This course is part of Module 2 of the 365 Data Science Program. The complete training consists of four modules, each building upon your knowledge from the previous one. In contrast to the introductory nature of Module 1, Module 2 is designed to tackle all aspects of programming for data science. You will learn how to work with relational databases and SQL, as well as how to code in Python and R. By the end of this Module, you will have a versatile programming skill set.
