Data Cleaning and Preprocessing with pandas

An introduction to the fundamentals of pandas, the quintessential Python data analysis library, and its core data structures: the Series and DataFrame objects.

Hours: 2
Lessons: 27
Quizzes: 0
Assignments: 3

Course description

pandas is one of today’s most successful data analysis libraries. A favorite of many, it offers versatile functionality for manipulating many types of data: numeric, text, Boolean, and more. That is one of the features that make pandas the go-to choice for analysts, especially during the data cleaning and preprocessing stages. Technically, pandas is built on top of NumPy and relies on NumPy’s computational power and array capabilities. But what makes pandas truly great is how easy it makes working with data, allowing you to focus almost entirely on your analytic task. In this course, you will learn how to work with this powerful Python library and its core data structures: the pandas Series and DataFrame.
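
As a rough illustration of the points above, the following sketch builds a small DataFrame that mixes numeric, text, and Boolean columns and then hands one column to NumPy. The column names and values are hypothetical sample data, not material from the course.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mixing numeric, text, and Boolean columns.
df = pd.DataFrame({
    "price": [9.99, 14.50, 3.25],
    "product": ["pen", "notebook", "eraser"],
    "in_stock": [True, False, True],
})

# Each column keeps its own data type.
print(df.dtypes)

# to_numpy() exposes a column as a NumPy array,
# so NumPy functions can be applied to it directly.
prices = df["price"].to_numpy()
total = np.sum(prices)
print(type(prices), total)
```

Because a column can be converted to a NumPy array this way, NumPy routines can be used alongside pandas methods in the same analysis.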

2. Data Cleaning and Data Preprocessing

By common estimates, only about 20% of the work of a data analytics or data science team goes to statistical analysis, building visualizations, or developing predictive models. The bulk of the time is consumed by collecting, cleaning, and preprocessing data. That is why this section consists of a single lecture that clarifies what the data cleaning and data preprocessing stages mean and how they differ.

3. pandas Series

Here, we will introduce you to one of the two core data structures of pandas: the Series object. You will also discover several common Series methods and learn how to apply them.
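
To give a feel for what this looks like in practice, here is a minimal sketch of creating a Series and calling a few common methods on it. The data and the choice of methods are illustrative assumptions; the course may cover a different set.

```python
import pandas as pd

# Hypothetical data: temperatures labeled by weekday.
temps = pd.Series(
    [21.5, 19.0, 23.5, 19.0],
    index=["Mon", "Tue", "Wed", "Thu"],
    name="temp_c",
)

mean_temp = temps.mean()          # arithmetic mean of the values
n_unique = temps.nunique()        # number of distinct values
warmest_day = temps.idxmax()      # index label of the largest value
sorted_temps = temps.sort_values(ascending=False)  # values sorted high to low

print(mean_temp, n_unique, warmest_day)
print(sorted_temps)
```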

4. pandas DataFrames

This section focuses on the other fundamental object in pandas: the DataFrame. The DataFrame is widely regarded as the most important structure in the library. Here, we will review its characteristics and comment on several popular related methods. In addition, we will show you how to use various techniques for selecting data in a DataFrame.
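
The sketch below shows a few common ways of selecting data from a DataFrame: by column name, by label with `.loc`, by position with `.iloc`, and with a Boolean mask. The table contents are hypothetical, and the selection techniques shown are an assumption about what such a section typically covers.

```python
import pandas as pd

# Hypothetical sample table with a custom row index.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cleo"],
        "age": [34, 27, 41],
        "city": ["Lisbon", "Oslo", "Rome"],
    },
    index=["a", "b", "c"],
)

ages = df["age"]                  # one column, returned as a Series
subset = df[["name", "city"]]     # list of columns, returned as a DataFrame
ben_city = df.loc["b", "city"]    # label-based selection: row "b", column "city"
first_age = df.iloc[0, 1]         # position-based selection: first row, second column
over_30 = df[df["age"] > 30]      # Boolean mask keeps rows where the condition holds

print(ben_city, first_age)
print(over_30)
```

Note that `.loc` works with index and column labels, while `.iloc` works with integer positions, so the two can return different rows when the index is not a simple 0-based range.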