Data Cleaning and Preprocessing with pandas

pandas is one of the most successful data analysis libraries available today. A favorite of many, it offers versatile functionality for manipulating many types of data: numeric, text, Boolean, and more. That versatility is one of the reasons pandas is the go-to choice for analysts, especially during the data cleaning and preprocessing stages. Technically, pandas is built on top of NumPy and relies on it for fast numerical computation. But what makes pandas truly great is how easy it makes working with data, allowing you to focus almost entirely on your analytical task. In this course, you will learn how to work with this powerful Python library and its core data structures: the pandas Series and the DataFrame.
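As a minimal sketch of the two core structures mentioned above (the item names and prices are hypothetical), the snippet below builds a Series and a DataFrame holding numeric, text and Boolean data, and uses `.to_numpy()` to show that a numeric Series can hand its values back as a NumPy array.

```python
import pandas as pd

# A Series: a one-dimensional array of values with an index of labels
prices = pd.Series([1.5, 2.0, 3.25], index=["apple", "bread", "milk"])

# A DataFrame: a two-dimensional table; each column can hold a different type
df = pd.DataFrame({
    "item": ["apple", "bread", "milk"],   # text
    "price": [1.5, 2.0, 3.25],            # numeric
    "in_stock": [True, False, True],      # Boolean
})

# Selecting one column of a DataFrame gives back a Series
price_column = df["price"]

# .to_numpy() exposes the values as a NumPy array
values = prices.to_numpy()
print(type(values))   # <class 'numpy.ndarray'>
```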

Section 1

pandas - Basics

In this section, you will develop a basic understanding of the pandas library and practice with fundamental programming tools such as methods, parameters, arguments, attributes, and index values. You will also learn how to work with the pandas Series and DataFrame objects. Finally, we will present the pandas documentation and show you how to navigate it.

Introduction to the pandas Library
Installing and Running pandas
Introduction to pandas Series
Working with Attributes in Python
Using an Index in pandas
Label-based vs Position-based Indexing
More on Working with Indices in Python
Using Methods in Python - Part I
Using Methods in Python - Part II
Parameters vs Arguments
The pandas Documentation
Introduction to pandas DataFrames
Creating DataFrames from Scratch - Part I
Creating DataFrames from Scratch - Part II
Additional Notes on Using DataFrames
pandas Basics - Conclusion (ARTICLE)
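The following minimal sketch, using hypothetical daily sales figures, illustrates several of the tools this section names: an attribute (accessed without parentheses), a method called with a parameter and an argument, and the difference between label-based (`.loc[]`) and position-based (`.iloc[]`) indexing.

```python
import pandas as pd

# Hypothetical data: sales for three days, labeled by day name
sales = pd.Series([120, 95, 140], index=["Mon", "Tue", "Wed"])

# Attributes are accessed without parentheses
print(sales.size)   # 3
print(sales.index)  # the index labels

# Methods are called with parentheses; here `ascending` is the parameter
# and False is the argument passed to it
sorted_sales = sales.sort_values(ascending=False)

# Label-based indexing uses the index label...
print(sales.loc["Tue"])   # 95
# ...while position-based indexing uses the integer position (0-based)
print(sales.iloc[1])      # 95
```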

Section 2

Data Cleaning and Data Preprocessing

Only about 20% of the work of a data analytics or data science team goes into statistical analysis, visualization, or predictive modeling. The bulk of the time is consumed by collecting, cleaning, and preprocessing data. That is why this section consists of a single lecture that clarifies what the data cleaning and data preprocessing stages are and how they differ.

Data Cleaning and Data Preprocessing
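One common way to draw the line between the two stages (an assumption here, not necessarily the exact definition used in the lecture) is that cleaning fixes problems in the raw data, while preprocessing transforms already-clean data into a form suited to analysis or modeling. The sketch below uses a hypothetical customer table to show both.

```python
import pandas as pd

# Hypothetical raw data with typical problems: a duplicate row,
# a missing name, ages stored as text, and a missing age
raw = pd.DataFrame({
    "customer": ["Ann", "Ben", "Ben", "Cara", None],
    "age": ["34", "28", "28", None, "45"],
    "plan": ["basic", "pro", "pro", "basic", "pro"],
})

# --- Cleaning: fix errors and inconsistencies in the raw data ---
clean = (
    raw.drop_duplicates()                                       # drop the repeated "Ben" row
       .dropna(subset=["customer"])                             # drop rows without a customer
       .assign(age=lambda d: pd.to_numeric(d["age"]))           # text -> numbers
       .assign(age=lambda d: d["age"].fillna(d["age"].median()))  # fill the missing age
)

# --- Preprocessing: reshape the clean data for analysis or modeling ---
prepared = pd.get_dummies(clean, columns=["plan"])  # one-hot encode the plan category
prepared["age_scaled"] = (
    (prepared["age"] - prepared["age"].mean()) / prepared["age"].std()
)
```

Here the missing age is filled with the median of the remaining ages; that choice is only illustrative, and the right strategy depends on the data.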

Section 3

pandas Series

Here, we will introduce you to working with one of the two core data structures of pandas – the pandas Series object. You will also discover several common methods and learn how to apply them to a pandas Series.

.unique(), .nunique()
Converting Series into Arrays
.sort_values()
Attribute and Method Chaining
.sort_index()
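Here is a minimal sketch of the Series methods listed above, applied to a hypothetical Series of order categories with a deliberately unsorted integer index.

```python
import pandas as pd

# Hypothetical data: product category of six orders, with a shuffled index
categories = pd.Series(
    ["books", "toys", "books", "games", "toys", "books"],
    index=[5, 3, 0, 4, 1, 2],
)

# .unique() returns the distinct values in order of first appearance
distinct = categories.unique()

# .nunique() counts the distinct values
n_distinct = categories.nunique()   # 3

# Converting the Series into a NumPy array
arr = categories.to_numpy()

# .sort_values() sorts by the data; .sort_index() sorts by the index labels
by_value = categories.sort_values()
by_index = categories.sort_index()

# Method chaining: each call returns a new object that the next call acts on
first_three = categories.sort_index().head(3).to_numpy()

# Attribute chaining: reach an attribute of the object returned by a method
index_is_sorted = categories.sort_index().index.is_monotonic_increasing  # True
```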

Section 4

pandas DataFrames

This section focuses on the other fundamental object in pandas: the DataFrame. The DataFrame is the most important structure in this library. Here, we will review its characteristics and comment on several popular related attributes and methods. In addition, we will show you various techniques for selecting data from a DataFrame, including indexing with .loc[] and .iloc[].

A Revision to pandas DataFrames
Common Attributes for Working with DataFrames
Data Selection in pandas DataFrames
Data Selection - Indexing Data with .iloc[]
Data Selection - Indexing Data with .loc[]
A Few Comments on Using .loc[] and .iloc[]
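The sketch below, built on a hypothetical employee table, contrasts common DataFrame attributes with the main selection techniques. Note the difference in slicing: `.iloc[]` excludes the end position, while `.loc[]` includes the end label.

```python
import pandas as pd

# Hypothetical data: employees indexed by ID
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cleo", "Dev"],
        "dept": ["Sales", "IT", "IT", "HR"],
        "salary": [52000, 61000, 58000, 49000],
    },
    index=[101, 102, 103, 104],
)

# Common attributes (no parentheses)
print(df.shape)          # (4, 3)
print(list(df.columns))  # ['name', 'dept', 'salary']

# Column selection with square brackets
salaries = df["salary"]           # one column -> Series
subset = df[["name", "salary"]]   # list of columns -> DataFrame

# .iloc[]: position-based; the end of a slice is excluded
first_two = df.iloc[0:2]          # rows at positions 0 and 1
cell_by_position = df.iloc[2, 0]  # row position 2, column position 0

# .loc[]: label-based; the end of a slice is included
ids_101_to_103 = df.loc[101:103]      # labels 101, 102 and 103
cell_by_label = df.loc[103, "name"]

# .loc[] also accepts a Boolean mask for the rows
it_staff = df.loc[df["dept"] == "IT", ["name", "salary"]]
```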

MODULE 2

Programming for Data Science

This course is part of Module 2 of the 365 Data Science Program. The complete training consists of four modules, each building upon your knowledge from the previous one. In contrast to the introductory nature of Module 1, Module 2 is designed to tackle all aspects of programming for data science. You will learn how to work with relational databases and SQL, as well as how to code in Python and R. By the end of this module, you will have a versatile programming skill set.
