# Data Preprocessing with NumPy

This course shows you how to work with one of Python’s fundamental packages: NumPy. You will learn what a “package” is and how to install, upgrade, and import one. By the end of the course, you’ll be comfortable with NumPy’s ndarray class, with slicing its instances and reducing their dimensions, and with quickly looking things up in the documentation. You’ll also be ready to use NumPy’s built-in functions and methods to generate random and non-random data, import and export data to and from Python, compute statistics for a dataset, and clean and preprocess ndarrays.

## Intro to NumPy

This introductory section presents the NumPy package and its applications. You’ll learn how to install and upgrade NumPy, before a quick introduction to its most important asset: the array. We’ll also go over how to use the documentation, an extremely useful resource for our work later in the course.

Lessons:

- The NumPy Package
- Installing/Upgrading NumPy
- What is an array?
- The NumPy Documentation
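As a rough illustration of the topics in this section, the sketch below checks the installed version and builds a first array. The sample values and variable names are assumptions made for the example, not material taken from the lessons.

```python
import numpy as np

# Installing or upgrading happens outside Python, typically from a terminal:
#   pip install numpy
#   pip install --upgrade numpy

# Confirm which version is currently importable.
installed_version = np.__version__
print(installed_version)

# An array is a grid of values that all share one data type.
example_array = np.array([1, 2, 3])  # hypothetical sample values
print(type(example_array))           # the ndarray class
print(example_array.shape)           # (3,)

# The documentation is also reachable from the interpreter, for example
# by calling help(np.mean) in an interactive session.
```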

## Why NumPy?

This section follows NumPy’s role in the development of Python and takes a closer look at ndarrays. We discuss what makes them so useful and compare them to a similar-looking data structure: Python lists.

Lessons:

- History of NumPy
- ndarrays
- Arrays vs Lists
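One way to see the difference between arrays and lists is to apply the same operators to both. This is a minimal sketch with made-up values; the variable names are hypothetical.

```python
import numpy as np

prices_list = [1.0, 2.0, 3.0]          # hypothetical sample values
prices_array = np.array(prices_list)

# "+" concatenates lists but adds arrays element by element.
list_sum = prices_list + prices_list       # [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
array_sum = prices_array + prices_array    # array([2., 4., 6.])

# "*" repeats a list but scales an array.
list_twice = prices_list * 2               # six elements
array_twice = prices_array * 2             # array([2., 4., 6.])

# A list can mix types freely; an array stores a single dtype.
mixed_list = [1, 2.5, "three"]
print(prices_array.dtype)                  # float64
```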

## NumPy Fundamentals

Here, we focus on basic NumPy syntax. You’ll learn about “indexing” and the different ways of assigning values to an array. This section also explains the elementwise properties of arrays and goes over the types of data we can store in them. In addition, we’ll take a look at some of the most important characteristics and properties of NumPy functions.

Lessons:

- Indexing
- Assigning values
- Elementwise Properties
- Types of data supported by NumPy
- Characteristics of NumPy Functions Part 1
- Characteristics of NumPy Functions Part 2
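The ideas in this section can be sketched in a few lines. The array contents and names below are illustrative assumptions, not examples from the lessons.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Indexing: row first, then column; negative indices count from the end.
top_right = grid[0, 2]       # 3
last_value = grid[-1, -1]    # 6

# Assigning values: to a single cell, then to a whole row.
grid[0, 0] = 10
grid[1] = 0                  # the scalar is broadcast across the second row
# grid is now [[10, 2, 3], [0, 0, 0]]

# Elementwise properties: arithmetic applies to every element.
doubled = grid * 2                      # [[20, 4, 6], [0, 0, 0]]
shifted = grid + np.array([1, 1, 1])    # [[11, 3, 4], [1, 1, 1]]

# Types of data: the dtype can be chosen when the array is created.
as_float = np.array([1, 2, 3], dtype=np.float64)
as_text = np.array([1, 2, 3], dtype=str)   # elements become '1', '2', '3'
```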

## Working with arrays

This section explores the concept of slicing and how its many variations can be applied to ndarrays. You’ll grasp what “dimensions” are when it comes to arrays and learn how the “squeeze” function and method reduce them.

Lessons:

- Slicing
- Stepwise Slicing
- Conditional Slicing
- Dimensions and the Squeeze Function
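The slicing variations named in this section might look like the following. This is a sketch on a small made-up matrix; the variable names are hypothetical.

```python
import numpy as np

matrix = np.arange(1, 13).reshape(3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]

# Basic slicing: start:stop, with stop excluded.
first_two_rows = matrix[:2]
middle_columns = matrix[:, 1:3]

# Stepwise slicing: start:stop:step.
every_other = matrix[::2, ::2]       # [[1, 3], [9, 11]]

# Conditional slicing: a boolean mask selects matching elements as a 1-D array.
evens = matrix[matrix % 2 == 0]      # [2, 4, 6, 8, 10, 12]

# Dimensions: a slice keeps the axis, an index drops it.
row_slice = matrix[0:1]              # shape (1, 4)
row_index = matrix[0]                # shape (4,)

# Squeeze removes axes of length 1, as a function or as a method.
squeezed = np.squeeze(row_slice)     # shape (4,)
also_squeezed = row_slice.squeeze()  # shape (4,)
```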

## Generating Data

This part of the course explains how to generate arrays of random and non-random data. We begin by creating “empty” arrays, as well as basic arrays of 1s and 0s, before moving on to random generators. Then, we introduce NumPy’s ability to generate pseudo-random data drawn from a probability distribution. The section concludes with applications of pseudo-random data.

Lessons:

- np.empty, np.zeros, np.ones, np.full
- "_like" functions
- Generating a Sequence of Numbers (np.arange)
- Random Generators and Seeds
- np.integers(), np.random(), np.choice()
- Probability Distributions
- Applications
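The sketch below touches each generator mentioned in this section. It assumes that the integers, random, and choice lessons refer to methods of the generator object returned by `np.random.default_rng`; the seed value and variable names are arbitrary choices made for the example.

```python
import numpy as np

# Non-random data with a fixed shape.
zeros = np.zeros((2, 3))
ones = np.ones((2, 3))
sevens = np.full((2, 3), 7)
uninitialised = np.empty((2, 3))   # contents are arbitrary until assigned

# "_like" functions copy the shape and dtype of an existing array.
template = np.array([[1, 2], [3, 4]])
zeros_like_template = np.zeros_like(template)

# A sequence of numbers: start, stop (excluded), step.
sequence = np.arange(0, 10, 2)     # [0, 2, 4, 6, 8]

# A seeded generator makes pseudo-random results reproducible.
rng = np.random.default_rng(seed=365)    # the seed value is arbitrary
dice = rng.integers(1, 7, size=5)        # integers from 1 to 6; the upper bound is excluded
fractions = rng.random(size=3)           # floats in [0, 1)
picks = rng.choice([10, 20, 30], size=4)

# Draws from a probability distribution, here a standard normal.
normal_draws = rng.normal(loc=0.0, scale=1.0, size=1000)

# Re-creating the generator with the same seed repeats the same draws.
repeat_dice = np.random.default_rng(seed=365).integers(1, 7, size=5)
```

Seeding matters mostly for reproducibility: two runs with the same seed and the same sequence of calls produce the same arrays.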

## Importing and Saving Data

In this section of the course, we focus on importing and exporting (saving) data with the NumPy package. We discuss the differences between “np.loadtxt()” and “np.genfromtxt()” and their applications. We’ll examine NumPy’s ability to partially clean datasets as we import them. Later in the section, you’ll learn why you need to import a file as a specific datatype and how choosing the wrong one can affect your results. We continue with saving ndarrays to external files, where you’ll discover what .npy and .npz files are and when (and how) to export arrays in those formats. Finally, we show a more conventional approach: saving arrays as text files.

Lessons:

- np.loadtxt() vs np.genfromtxt()
- String vs Object vs Numbers
- Simple Cleaning when Importing
- np.save()
- np.savez()
- np.savetxt()
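To make the import and export functions concrete, here is a small sketch. The CSV text, the file names, and the variable names are all hypothetical; an in-memory string stands in for a file on disk, and the output files are written to a temporary folder.

```python
import io
import os
import tempfile

import numpy as np

# Hypothetical CSV text; the second row has a missing amount.
raw_text = "id,amount\n1,10.5\n2,\n3,7.25\n"

# np.genfromtxt tolerates the gap and reads it as nan.
data = np.genfromtxt(io.StringIO(raw_text), delimiter=",", skip_header=1)

# np.loadtxt is stricter: the empty field makes it raise a ValueError.
try:
    np.loadtxt(io.StringIO(raw_text), delimiter=",", skiprows=1)
    loadtxt_failed = False
except ValueError:
    loadtxt_failed = True

# Simple cleaning while importing: replace missing entries with a chosen value.
filled = np.genfromtxt(io.StringIO(raw_text), delimiter=",",
                       skip_header=1, filling_values=0)

with tempfile.TemporaryDirectory() as folder:
    npy_path = os.path.join(folder, "data.npy")
    npz_path = os.path.join(folder, "bundle.npz")
    csv_path = os.path.join(folder, "data.csv")

    np.save(npy_path, filled)                    # one array, binary .npy
    np.savez(npz_path, raw=data, clean=filled)   # several named arrays, .npz
    np.savetxt(csv_path, filled, delimiter=",", fmt="%.2f")  # plain text

    reloaded = np.load(npy_path)
    with np.load(npz_path) as bundle:
        bundle_names = sorted(bundle.files)      # names given in np.savez
    text_roundtrip = np.loadtxt(csv_path, delimiter=",")
```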

## Statistics

This section revolves around NumPy’s ability to compute important characteristics, or statistics, of an array. These include minimum and maximum values, various forms of averages, covariances, and correlations, as well as histograms. You’ll also learn about the NaN-equivalent functions and how to use them.

Lessons:

- Using NumPy functions - np.mean()
- Min & Max values (min, amin, minimum and equivalent max, amax, maximum)
- Statistical Order Functions (np.ptp, np.percentile, np.quantile)
- Averages and Variances (mean, median, average, var, std etc.)
- Correlation and Covariance
- Histograms in NumPy Part 1
- Histograms in NumPy Part 2
- NaN-Equivalent Functions
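A short sketch of the statistical functions listed in this section, applied to a small made-up array. The data and variable names are assumptions for illustration only.

```python
import numpy as np

scores = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

# Means over the whole array, per column, and per row.
overall_mean = np.mean(scores)             # 3.5
column_means = np.mean(scores, axis=0)     # [2.5, 3.5, 4.5]
row_means = scores.mean(axis=1)            # [2.0, 5.0]

# np.min reduces one array; np.minimum compares two arrays elementwise.
smallest = np.min(scores)                            # 1.0
pairwise_min = np.minimum(scores[0], scores[1])      # [1.0, 2.0, 3.0]

# Order statistics.
value_range = np.ptp(scores)               # max minus min: 5.0
median = np.percentile(scores, 50)         # 3.5
same_median = np.quantile(scores, 0.5)     # 3.5

# Averages and spread.
weighted = np.average(scores[0], weights=[1, 1, 2])  # 2.25
variance = np.var(scores)
std_dev = np.std(scores)

# Correlation and covariance between the two rows (both return 2x2 matrices).
correlation = np.corrcoef(scores[0], scores[1])
covariance = np.cov(scores[0], scores[1])

# Histogram: counts per bin plus the bin edges.
counts, bin_edges = np.histogram(scores, bins=3)     # counts: [2, 2, 2]

# NaN-equivalent functions skip missing values instead of returning nan.
with_gap = np.array([1.0, np.nan, 3.0])
plain_mean = np.mean(with_gap)             # nan
nan_mean = np.nanmean(with_gap)            # 2.0
```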

## Preprocessing

In this part of the NumPy course, we explore ways to clean and preprocess data in NumPy. You’ll understand how to find and fill missing values, reshape an array, delete excess data, and sort, shuffle, and cast ndarrays. The section also explains what argument functions are and why they are so useful, and introduces ways to combine arrays by stacking and concatenating them. Finally, you’ll discover how to extract the unique values of an array and why this can be important for your analysis.

Lessons:

- Checking for Missing Values
- Substituting Missing Values
- Reshaping
- Removing Values
- Sorting Data
- Argument Functions - Argument Sort
- Argument Functions - Argument Where
- Shuffling Data
- Assigning DataTypes
- Stripping Data
- Stacking Data - stack, dstack, vstack, hstack
- Concatenate
- Unique
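The preprocessing steps described in this section could be sketched as follows. The data, the choice of 0 as a fill value, the seed, and the assumption that stripping refers to `np.char.strip` are all illustrative choices rather than details from the lessons.

```python
import numpy as np

readings = np.array([[3.0, np.nan, 1.0],
                     [2.0, 5.0, np.nan]])

# Checking for and substituting missing values.
missing_mask = np.isnan(readings)
missing_count = missing_mask.sum()                 # 2
filled = np.where(missing_mask, 0.0, readings)     # [[3, 0, 1], [2, 5, 0]]

# Reshaping and removing values.
reshaped = filled.reshape(3, 2)
without_first_column = np.delete(filled, 0, axis=1)   # [[0, 1], [5, 0]]

# Sorting, and the argument functions that return positions instead of values.
sorted_rows = np.sort(filled, axis=1)              # [[0, 1, 3], [0, 2, 5]]
order = np.argsort(filled[0])                      # [1, 2, 0]
zero_positions = np.argwhere(filled == 0)          # [[0, 1], [1, 2]]

# Shuffling rows in place with a seeded generator.
rng = np.random.default_rng(seed=365)              # the seed value is arbitrary
shuffled = filled.copy()
rng.shuffle(shuffled)

# Casting to another datatype and stripping characters from text.
as_int = filled.astype(np.int32)
labels = np.char.strip(np.array(["id_1", "id_2"]), "id_")   # ['1', '2']

# Stacking and concatenating.
stacked_vertical = np.vstack((filled[0], filled[1]))         # shape (2, 3)
stacked_horizontal = np.hstack((filled[0], filled[1]))       # shape (6,)
stacked_new_axis = np.stack((filled[0], filled[1]), axis=1)  # shape (3, 2)
joined = np.concatenate((filled, filled), axis=0)            # shape (4, 3)

# Unique values and how often each one appears.
unique_values, value_counts = np.unique(filled, return_counts=True)
```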
## Programming for Data Science

This course is part of Module 2 of the 365 Data Science Program. The complete training consists of four modules, each building upon your knowledge from the previous one. In contrast to the introductory nature of Module 1, Module 2 is designed to tackle all aspects of programming for data science. You will learn how to work with relational databases and SQL, as well as how to code in Python and R. By the end of this module, you will have a versatile programming skill set.
