Data Preprocessing with NumPy
This course shows you how to work with one of Python’s fundamental packages – NumPy. You will learn what a “package” is and how to install, upgrade and import it. By the time you finish the course, you’ll be comfortable with NumPy’s ndarray class, including how to slice its instances and reduce their dimensions, and you’ll know how to look things up quickly in the documentation. You’ll also be ready to use NumPy’s built-in functions and methods to generate random and non-random data, import data into Python and export it, compute statistics for a dataset, and clean and preprocess ndarrays.
Intro to NumPy
This introductory section presents the NumPy package and its applications. You’ll learn how to install and upgrade NumPy before moving on to its most important asset – “arrays”. We’ll also go over how to use the documentation, which will be extremely useful for our work later in the course.
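To give a feel for these first steps, here is a minimal Python sketch (not material from the course itself). It assumes NumPy is installed with pip; conda users would run “conda install numpy” instead. It imports the package under its conventional alias, checks the installed version, creates an ndarray, and reads a function’s summary from its built-in documentation.

```python
# Install or upgrade from a terminal (not from inside Python):
#   pip install numpy
#   pip install --upgrade numpy
import inspect

import numpy as np  # "np" is the conventional alias

print(np.__version__)             # the installed NumPy version

arr = np.array([1, 2, 3])         # NumPy's central object, the ndarray
print(type(arr).__name__)         # ndarray

# NumPy's docstrings are the source of its online API reference;
# help(np.mean) shows the full entry, here we print only the summary line.
print(inspect.getdoc(np.mean).splitlines()[0])
```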
This section follows NumPy’s role in the development of Python and takes a closer look at ndarrays. We discuss what makes them so useful and compare them to a similar-looking data structure – Python lists.
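A quick way to see the difference between the two structures is to apply the same operators to each. The sketch below is illustrative only: “+” and “*” concatenate and repeat lists, but act element by element on ndarrays, and an ndarray holds a single data type for all its elements.

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array(py_list)

# "+" concatenates two lists ...
print(py_list + py_list)          # [1, 2, 3, 1, 2, 3]
# ... but adds two arrays element by element
print(arr + arr)                  # [2 4 6]

# "*" with an integer repeats a list, but multiplies every array element
print(py_list * 2)                # [1, 2, 3, 1, 2, 3]
print(arr * 2)                    # [2 4 6]

# An ndarray has one dtype: mixing ints and floats upcasts everything to float
print(np.array([1, 2.5]).dtype)   # float64
```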
Here, we focus on basic NumPy syntax. You’ll learn about “indexing” and the different ways of assigning values to an array. This section also explains how arrays behave elementwise and covers the different types of data we can store in them. In addition, we’ll look at some of the most important characteristics and properties of NumPy functions.
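The basics from this section fit into a few lines. This is a small illustrative sketch, not the course’s own example: it indexes a 2-D array, assigns a single value, a whole row and a whole column, applies an elementwise operation, and sets the data type explicitly.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix[0])        # first row: [1 2 3]
print(matrix[1, 2])     # row 1, column 2: 6
print(matrix[1][2])     # chained indexing gives the same element: 6

# Assigning values: one element, a whole row, a whole column
matrix[0, 0] = 9
matrix[1] = 0           # a scalar is broadcast across row 1
matrix[:, 2] = [7, 8]   # replace column 2
print(matrix)
# [[9 2 7]
#  [0 0 8]]

# Elementwise: the operation is applied to every element
print(matrix + 1)

# One dtype per array, which can be chosen when the array is created
floats = np.array([1, 2, 3], dtype=np.float32)
text = np.array([1, 2, 3], dtype=str)
print(floats.dtype, text.dtype)   # float32 <U1
```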
Working with arrays
This section explores the concept of slicing and how its many variations can be applied to ndarrays. You’ll learn what “dimensions” mean for arrays and how the “reduce” function and method work.
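The following sketch illustrates slicing and dimensions. It assumes that “reduce” here refers to NumPy’s ufunc reduce (for example “np.add.reduce()”), and it uses “np.squeeze()” as one way to drop length-1 dimensions; the course may cover other variants.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(a[1:, :2])      # rows 1-2, columns 0-1: [[4 5] [8 9]]
print(a[::2, ::-1])   # every other row, columns reversed: [[3 2 1 0] [11 10 9 8]]
print(a[-1])          # last row: [ 8  9 10 11]

# Integer indexing drops a dimension, slicing keeps it
print(a.ndim, a[0].ndim, a[0:1].ndim)   # 2 1 2
print(a[0].shape, a[0:1].shape)         # (4,) (1, 4)

# np.squeeze removes dimensions of length 1
print(np.squeeze(a[0:1]).shape)         # (4,)

# ufunc reduce: apply the operation repeatedly along an axis, removing that axis
print(np.add.reduce(a, axis=0))         # column sums: [12 15 18 21]
print(np.add.reduce(a, axis=1))         # row sums: [ 6 22 38]
```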
This part of the course explains how to generate arrays of random and non-random data. We begin by creating “empty” arrays, as well as basic arrays of 1s and 0s, before moving on to random generators. Then, we introduce NumPy’s capabilities for generating pseudo-random data drawn from a probability distribution. The section concludes with practical applications of pseudo-random data.
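As a rough illustration, the sketch below creates non-random arrays and then draws pseudo-random data. It assumes the “np.random.default_rng()” generator interface; the seed 365 is an arbitrary choice made here to show that the same seed reproduces the same numbers.

```python
import numpy as np

# Non-random arrays
empty = np.empty((2, 3))            # uninitialised: contents are arbitrary
zeros = np.zeros((2, 3))
ones = np.ones((2, 3), dtype=int)
full = np.full((2, 3), 7)
like = np.zeros_like(ones)          # same shape and dtype as `ones`
seq = np.arange(0, 10, 2)           # [0 2 4 6 8]
grid = np.linspace(0, 1, 5)         # 5 evenly spaced values from 0 to 1

# Pseudo-random data from a seeded generator (seed chosen arbitrarily)
rng = np.random.default_rng(seed=365)
ints = rng.integers(low=0, high=10, size=(2, 3))   # integers in [0, 10)
uniform = rng.random(size=3)                       # floats in [0, 1)
normal = rng.normal(loc=0, scale=1, size=1000)     # standard normal samples
binom = rng.binomial(n=10, p=0.5, size=5)          # binomial counts

# A new generator with the same seed reproduces the same sequence
rng_again = np.random.default_rng(seed=365)
print(np.array_equal(rng_again.integers(0, 10, size=(2, 3)), ints))  # True
```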
Importing and Saving Data
In this section of the course, we focus on importing and exporting (saving) data with the NumPy package. We discuss the differences between “np.loadtxt()” and “np.genfromtxt()” and their applications. We’ll examine NumPy’s capabilities to partially clean datasets as we import them. Later in the section, you’ll learn why it matters which data type you import a file as, and how choosing the wrong one can affect your results. We continue with saving ndarrays to external files, where you’ll discover what .npy and .npz files are and when (and how) to export arrays in those formats. Finally, we show a more conventional approach: saving arrays as text files.
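Here is a hedged, self-contained sketch of that workflow. The file names and the small CSV are hypothetical, written to a temporary folder so the example runs anywhere. It shows “np.loadtxt()” rejecting a missing field, “np.genfromtxt()” filling it during import, and the .npy, .npz and text formats.

```python
import os
import tempfile

import numpy as np

# Hypothetical CSV in a temporary folder; row 2 has no price
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "example.csv")
with open(csv_path, "w") as f:
    f.write("id,price,qty\n1,2.5,10\n2,,20\n3,4.0,30\n")

# np.loadtxt expects every field to be present, so the empty price is an error
try:
    np.loadtxt(csv_path, delimiter=",", skiprows=1)
except ValueError as err:
    print("loadtxt failed:", err)

# np.genfromtxt can replace missing fields while importing
data = np.genfromtxt(csv_path, delimiter=",", skip_header=1, filling_values=0)
print(data)  # 3x3 float array; the missing price became 0.0

# .npy: a single array in NumPy's binary format
npy_path = os.path.join(tmp_dir, "data.npy")
np.save(npy_path, data)
loaded = np.load(npy_path)

# .npz: several named arrays in one archive
npz_path = os.path.join(tmp_dir, "bundle.npz")
np.savez(npz_path, prices=data[:, 1], qty=data[:, 2])
with np.load(npz_path) as bundle:
    print(bundle.files)          # ['prices', 'qty']
    prices = bundle["prices"]

# Plain text: readable by any tool, but larger and slower than .npy
txt_path = os.path.join(tmp_dir, "data.txt")
np.savetxt(txt_path, data, delimiter=",", fmt="%.2f", header="id,price,qty")
# savetxt writes the header as "# id,price,qty", which loadtxt skips as a comment
from_text = np.loadtxt(txt_path, delimiter=",")
```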
This section revolves around NumPy’s capabilities to compute important characteristics or statistics of an array. These include minimum and maximum values, various forms of averages, covariances, correlations and histograms. You’ll also learn about the NaN-equivalent functions, such as “np.nanmean()”, and how to use them.
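The statistics mentioned above map onto single function calls. This illustrative sketch uses a small made-up 3x3 array so the results are easy to check by hand, and ends by contrasting “np.mean()” with its NaN-equivalent on data that contains a missing value.

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

print(np.min(x), np.max(x))        # 1.0 9.0
print(np.max(x, axis=0))           # column maxima: [7. 8. 9.]
print(np.ptp(x, axis=1))           # range (max - min) per row: [2. 2. 2.]
print(np.mean(x), np.median(x))    # 5.0 5.0
print(np.average(x[0], weights=[3, 1, 1]))  # weighted: (3*1 + 2 + 3) / 5 = 1.6
print(np.var(x), np.std(x))

# Covariance and correlation, treating each row as one variable
print(np.cov(x))                   # every entry is 1.0 here
print(np.corrcoef(x))              # all (approximately) 1.0: the rows rise together

# Histogram: counts per bin plus the bin edges
counts, edges = np.histogram(x, bins=3)
print(counts)                      # [3 3 3]

# NaN-equivalent functions ignore missing values
y = np.array([1.0, np.nan, 3.0])
print(np.mean(y))      # nan
print(np.nanmean(y))   # 2.0
print(np.nanmax(y))    # 3.0
```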
In this part of the NumPy course, we explore ways to clean and preprocess data in NumPy. You’ll learn how to find and fill missing values, reshape an array, delete excess data, and sort, shuffle and cast ndarrays. The section also explains what argument functions are and why they are so useful, and introduces ways to combine arrays by stacking and concatenating them. Finally, you’ll discover how to extract the unique values of an array and why this can be important for your analysis.
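To tie these steps together, here is a minimal sketch on a made-up array with two missing values. Filling with column means is just one possible strategy chosen for illustration, and the seed 0 is arbitrary.

```python
import numpy as np

data = np.array([[3.0, np.nan, 1.0],
                 [2.0, 5.0, np.nan],
                 [3.0, 4.0, 6.0]])

# Find and fill missing values (here: with each column's mean)
print(np.isnan(data).sum())                 # 2 missing values
col_means = np.nanmean(data, axis=0)        # [2.667 4.5 3.5]
filled = np.where(np.isnan(data), col_means, data)

# Reshape and delete
flat = filled.reshape(-1)                   # 1-D array of 9 elements
trimmed = np.delete(filled, 0, axis=1)      # drop the first column

# Sort, shuffle, cast
print(np.sort(filled[:, 0]))                # [2. 3. 3.]
rng = np.random.default_rng(seed=0)
shuffled = rng.permutation(filled)          # rows in random order (a copy)
as_int = filled.astype(int)                 # cast; decimals are truncated

# Argument functions return indices rather than values
order = np.argsort(filled[:, 0])            # indices that would sort column 0
sorted_rows = filled[order]                 # reorder whole rows by column 0
print(np.argmax(filled[:, 2]))              # 2: row of the largest value in column 2

# Concatenating joins along an existing axis; stacking adds a new one
extra = np.array([[0.0, 0.0, 0.0]])
taller = np.concatenate([filled, extra], axis=0)   # shape (4, 3)
pair = np.stack([filled[:, 0], filled[:, 1]])      # shape (2, 3)

# Unique values and how often each occurs
values, counts = np.unique(filled[:, 0], return_counts=True)
print(values, counts)                       # [2. 3.] [1 2]
```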
Programming for Data Science
This course is part of Module 2 of the 365 Data Science Program. The complete training consists of four modules, each building upon your knowledge from the previous one. In contrast to the introductory nature of Module 1, Module 2 is designed to tackle all aspects of programming for data science. You will learn how to work with relational databases and SQL, as well as how to code in Python and R. By the end of this Module, you will have a versatile programming skill set.
Why Choose the 365 Data Science Program?
Real-life projects and data. Solve them on your own computer, as you would in the office.
Our expert instructors are happy to help. Post a question and get a personal answer from one of our instructors.
Earn a verifiable certificate after each completed course. Celebrate your successes and share your progress with your professional network!
Trust the other 500,000 students
The course is in-depth and is delivered at a steady pace with eye-catching visuals. The instructors go through all the basics really well. They try not to oversimplify the material, and you get a good sense of how deep Data Science is in the course. Great job!!!
This course is amazing! After watching the videos carefully and doing all the exercises, I am even capable of having discussions with Master’s students majoring in Machine Learning! High-standard course with reasonable pricing.
Very clear and in-depth explanation of data science and how all the interrelated concepts apply in a real-life business environment. Absolutely great for beginners! Best data science course I have come across so far!
I would highly recommend the course to any beginner who wants to venture into the world of Data Science. The concepts are very well explained and there is an emphasis on practical application which really helps create a better understanding of the concepts.