The Ultimate NumPy Tutorial (With Code!)

Kay Jan Wong, 10 Jun 2022, 10 min read

Data scientists deal with data all the time, usually in the format of lists, dictionaries, or tables. The process can be complex, involving preprocessing, queries, and modifications as part of data wrangling. As an aspiring data analyst or machine learning engineer, you’re probably thinking that these operations can be quite time-consuming, but thankfully you’ll have a helping hand. Instead of writing sorting or reversing algorithms yourself, you can let the Python NumPy package handle them efficiently for you. The library also offers high-level mathematical functions for linear algebra, matrices, and arrays. Because of its computational speed and rich functionality, NumPy is a go-to choice for many professionals and a good starting point for anyone looking to break into data science.

In this tutorial, I’ll show you how to install NumPy, go through its basic uses, like how to create an array, and finish off with some more advanced techniques such as performing queries and data manipulation.

What Is NumPy in Python?

NumPy (short for Numerical Python) is one of the most popular Python libraries and is used in many other popular packages, such as pandas, SciPy, and Matplotlib. Because array operations typically run much faster than the equivalent loops over Python lists, it improves computational performance across the workflow, from simple mathematical calculations to data manipulation for data science operations.

How to Install NumPy in Python?

Step 1: Install NumPy

You can install NumPy by using the Python tools conda or pip:

conda install numpy
pip install numpy

Simply run either command and, voilà, you’re ready to go!

Step 2: Import NumPy in Python

To use NumPy in Python, simply import it using the command:

import numpy as np

In Python, the library usually appears with the shortened np by convention.

Why Is NumPy Used in Python?

As I’ve previously mentioned, NumPy has many functionalities that make it a good fit for data scientists to use in their daily tasks. Perhaps what the Python library is most known for is its use of multidimensional arrays and their high computational speed.
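To make the speed claim concrete, here is a minimal sketch that squares the same numbers once with a plain Python list and once with a NumPy array. The variable names and the array size are illustrative choices, and the measured timings will differ from machine to machine.

```python
import timeit

import numpy as np

n = 100_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)  # explicit 64-bit integers to avoid overflow

# Square every element: a Python-level loop for the list,
# one vectorized expression for the array
list_time = timeit.timeit(lambda: [x * x for x in py_list], number=5)
array_time = timeit.timeit(lambda: np_array * np_array, number=5)

print(f"list:  {list_time:.4f} s")
print(f"array: {array_time:.4f} s")

# Both approaches produce the same values
squares_list = [x * x for x in py_list]
squares_array = np_array * np_array
print(squares_array[:5])
```

On most setups the array version should come out noticeably faster, because the loop runs in compiled code rather than in the Python interpreter.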

How to Create a NumPy Array in Python?

A NumPy array is a type of data structure that stores, well, data. While arrays look similar to Python lists in code, they are optimized for numerical work, resulting in faster computation and easier manipulation of numerical data.

One-dimensional NumPy Array

To create a basic array, you simply wrap an np.array() command around a Python list:

>>> import numpy as np
>>> a = np.array([1, 2, 3, 4, 5])

In some cases, it is useful to generate an array automatically rather than hardcoding values as in the example above. For instance, np.empty allocates an array without initializing it, so it holds whatever values happen to be in memory; it is fast, but you should fill it before use rather than treat its contents as random noise. Other examples are arrays full of 0s or 1s, the latter of which I can multiply by another number to create an array filled with any value I want.

If I want to initialize a NumPy array with different numbers, not just 0s and 1s, I can fill it with evenly spaced values over a range or with a range of running numbers. The possibilities are endless!

>>> np.empty(shape=2) # uninitialized; the values shown are arbitrary
array([4.24399158e-314, 8.48798317e-314])
>>> np.zeros(shape=3) # filled with zeros
array([0., 0., 0.])
>>> np.ones(shape=3) # filled with ones
array([1., 1., 1.])
>>> np.linspace(start=0, stop=10, num=5) # evenly spaced over a range
array([ 0. ,  2.5,  5. ,  7.5, 10. ])
>>> np.arange(start=1, stop=5, step=0.5) # range of running numbers
array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])

In the code examples above, I have provided the argument names for clarity, but it works just the same without them: np.zeros(shape=3) is the same as np.zeros(3).
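For reproducible random noise, NumPy’s random generator can be used explicitly. The sketch below also shows two equivalent ways to build a constant array. The seed, the noise scale, and the variable names are illustrative assumptions, not values from this tutorial.

```python
import numpy as np

# An array holding the same value everywhere: multiply ones, or use np.full
sevens_a = np.ones(4) * 7
sevens_b = np.full(4, 7.0)

# Small random noise drawn from a normal distribution
# (the seed is fixed only so that the run is repeatable)
rng = np.random.default_rng(seed=0)
noise = rng.normal(loc=0.0, scale=0.01, size=4)

data = np.array([1.0, 2.0, 3.0, 4.0])
noisy_data = data + noise

print(sevens_a)
print(noisy_data)
```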

N-dimensional NumPy Array

Until now, we dealt with a one-dimensional array, otherwise known as a vector. NumPy, however, can handle two-dimensional matrices, three-dimensional tensors, and more. Indeed, higher dimensional arrays have more layers onto which we can perform more complex mathematical operations. For simplicity, you can call them n-dimensional or as they’re displayed in the Python library – ndarray.

To create one, you can wrap an np.array() command around a Python list so that it becomes a nested list instead:

>>> np.array([[1, 2, 3], [4, 5, 6]])
array([[1, 2, 3],
       [4, 5, 6]])

You can also create an n-dimensional ndarray by reshaping a one-dimensional array or by stacking multiple one-dimensional arrays horizontally or vertically. Reshaping is useful when your data arrives in the wrong shape or dimension. Stacking helps when different operations each return part of the data: by stacking the partial arrays horizontally or vertically, you can reassemble all of the data into one array.

>>> a = np.zeros(6)
>>> b = np.ones(6)
>>> a.reshape((2, 3))
array([[0., 0., 0.],
       [0., 0., 0.]])
>>> np.hstack((a, b)) # stacking horizontally
array([0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1.])
>>> np.vstack((a, b)) # stacking vertically
array([[0., 0., 0., 0., 0., 0.],
       [1., 1., 1., 1., 1., 1.]])
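As a small sketch of the reassembly idea, suppose two separate operations each returned half of the data. The array names below are hypothetical.

```python
import numpy as np

# Two partial results, e.g. returned by separate operations
first_half = np.array([1, 2, 3])
second_half = np.array([4, 5, 6])

# Reassemble them into one vector, then into a 2 x 3 matrix
combined = np.hstack((first_half, second_half))
matrix = combined.reshape((2, -1))  # -1 lets NumPy infer the number of columns

print(matrix)
print(matrix.shape)  # (2, 3)
```

Passing -1 for one dimension is convenient when only the number of rows (or columns) is known in advance.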

Moreover, you can generate n-dimensional arrays automatically without hardcoding with np.zeros and np.ones. The difference is that the shape argument is now a tuple describing the shape of the array. By convention, it should be (rows, columns), i.e. (height, width), when creating a two-dimensional array:

>>> np.zeros(shape=(2, 3))
array([[0., 0., 0.],
       [0., 0., 0.]])
>>> np.ones(shape=(3, 2))
array([[1., 1.],
       [1., 1.],
       [1., 1.]])

Other useful n-dimension arrays include the identity matrix or one filled with 0s or 1s that takes the shape of the input array. Of course, you can do the latter manually, but why not skip a step by writing np.zeros_like and np.ones_like to perform the same operation:

>>> np.eye(N=3) # identity matrix
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
>>> a = np.ones((2, 3)) # matrix filled with 1
>>> np.zeros_like(a) # matrix filled with 0, taking shape of input matrix a
array([[0., 0., 0.],
       [0., 0., 0.]])
>>> np.ones_like(a) # matrix filled with 1, taking shape of input matrix a
array([[1., 1., 1.],
       [1., 1., 1.]])

How to Use NumPy in Python?

After learning how to create basic and n-dimensional arrays, we can take it up a notch by modifying the values. For starters, we can perform mathematical operations in NumPy for data wrangling and summary statistics, as well as sort and flip the array.

Mathematical Operations

With NumPy, you can execute all the math basics such as:

  • Addition
  • Subtraction
  • Multiplication
  • Division

There are 2 ways you can do these: with a single number (a scalar) or with another array. When you use a single number, it is applied to all values in the array, which is known as broadcasting. Meanwhile, if you’re working with 2 arrays of the same shape, the mathematical operation is done element by element.

>>> a = np.arange(1, 5)
>>> a
array([1, 2, 3, 4])
>>> a + 1 # addition of array with integer
array([2, 3, 4, 5])
>>> a + a # addition of array with another array
array([2, 4, 6, 8])
>>> a * 2 # multiplication of array with integer
array([2, 4, 6, 8])
>>> a * a # multiplication of array with another array
array([ 1,  4,  9, 16])
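Broadcasting is not limited to single numbers. As a minimal sketch with made-up values, a one-dimensional array can be combined with each row of a two-dimensional array, as long as the shapes line up:

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])    # shape (3,)

# The row is broadcast across both rows of the table
shifted = table + row
print(shifted)
# [[11 22 33]
#  [14 25 36]]

# Shapes that cannot be aligned raise a ValueError
try:
    table + np.array([1, 2])
except ValueError as error:
    print("incompatible shapes:", error)
```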

Other mathematical operations include taking the minimum, maximum, standard deviation, covariance, cumulative summation, and cumulative product of an array:

>>> a = np.arange(1, 10)
>>> np.min(a) # minimum
1
>>> np.max(a) # maximum
9
>>> np.std(a) # standard deviation
2.581988897471611
>>> np.cov(a) # covariance
array(7.5)
>>> np.cumsum(a) # cumulative sum
array([1, 3, 6, 10, 15, 21, 28, 36, 45], dtype=int32)
>>> np.cumprod(a) # cumulative product
array([1, 2, 6, 24, 120, 720, 5040, 40320, 362880], dtype=int32)

By getting such a statistical summary of the data, you get a better idea of how the values are distributed within your workspace.
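Note that the two spread measures above follow different conventions: by default np.std divides by N, while np.cov divides by N - 1. That is why squaring the standard deviation shown above does not give 7.5. A short sketch of how the ddof argument reconciles them:

```python
import numpy as np

a = np.arange(1, 10)

population_var = np.var(a)       # divides by N (ddof=0, the default)
sample_var = np.var(a, ddof=1)   # divides by N - 1
sample_std = np.std(a, ddof=1)   # square root of the sample variance

print(population_var)  # roughly 6.67, the square of np.std(a)
print(sample_var)      # 7.5, the same value np.cov(a) reports
print(sample_std)
```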

Another feature of the Python library is that it allows even more complex calculations such as:

  • Dot product (summation of pair-wise multiplication)
  • Cross product
  • Trigonometry operations

These are useful if you’d like to perform computational geometry, such as to measure the angle between 2 vectors:

>>> a = np.arange(1, 3) # array([1, 2])
>>> b = np.arange(4, 6) # array([4, 5])
>>> np.dot(a, b)
14
>>> a = np.arange(1, 4) # array([1, 2, 3])
>>> b = np.arange(5, 7) # array([5, 6]); z is taken as 0 (2-element vectors are deprecated in NumPy 2.0)
>>> np.cross(a, b)
array([-18,  15,  -4])

>>> unit_vector1 = [0, 1]
>>> unit_vector2 = [1, 0]
>>> dot_product = np.dot(unit_vector1, unit_vector2) # dot product
>>> angle = np.arccos(dot_product) # inverse cosine
>>> angle # angle in radians
1.5707963267948966
>>> angle / np.pi * 180 # angle in degrees
90.0
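The example above works because both inputs are unit vectors. For vectors of arbitrary length, the dot product has to be divided by the product of the two norms before taking the inverse cosine. Here is a minimal sketch; angle_between is an illustrative helper, not a NumPy function.

```python
import numpy as np


def angle_between(v1, v2):
    """Return the angle in degrees between two non-zero vectors."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    cosine = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Guard against tiny floating-point overshoots outside [-1, 1]
    cosine = np.clip(cosine, -1.0, 1.0)
    return np.degrees(np.arccos(cosine))


print(angle_between([0, 3], [5, 0]))  # perpendicular vectors: 90 degrees
print(angle_between([1, 1], [2, 0]))  # close to 45 degrees
```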

Last but not least, we can take the sum of the whole array, or by column and row respectively. Say you want to add all the values in an array. Or you have a table where each column shows how an item changed over time – summing them all up can help derive the total change in value:

>>> a = np.vstack((np.arange(1, 4), np.arange(1, 4)))
>>> a
array([[1, 2, 3],
       [1, 2, 3]])
>>> np.sum(a)
12
>>> np.sum(a, axis=0) # sum by column
array([2, 4, 6])
>>> np.sum(a, axis=1) # sum by row
array([6, 6])

As a rule, axis=0 runs down the rows, so summing along it collapses the rows and leaves one total per column; axis=1 runs across the columns and leaves one total per row.
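To tie the axis argument back to the table scenario, here is a sketch with a hypothetical table in which each row is a time step and each column is an item:

```python
import numpy as np

# Hypothetical data: rows are time steps, columns are items
changes = np.array([[1, -2, 3],
                    [4, 0, -1]])

total_per_item = changes.sum(axis=0)  # collapses the rows: one total per column
total_per_step = changes.sum(axis=1)  # collapses the columns: one total per row

print(total_per_item)  # totals 5, -2 and 2
print(total_per_step)  # totals 2 and 3
```

A handy way to remember it: the axis you pass is the one that disappears from the result’s shape.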

Sorting and Flipping an Array

Common array manipulations include sorting or reversing the order of elements. I find this operation useful in data science interviews that aim to test your data wrangling abilities.

Note that you can apply these NumPy operations to a list and still obtain an array. In addition, it is worth mentioning that these operations do not modify the original list or array; instead, they return a new array:

>>> a = [3, 5, 3, 8, 4]
>>> np.sort(a)
array([3, 3, 4, 5, 8])
>>> np.flip(a)
array([4, 8, 3, 5, 3])
>>> a
[3, 5, 3, 8, 4]
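A few related variations come up often: sorting in descending order, getting the indices that would sort an array, and sorting in place. The sketch below uses the same values as above.

```python
import numpy as np

a = np.array([3, 5, 3, 8, 4])

descending = np.flip(np.sort(a))  # sort ascending, then reverse
order = np.argsort(a)             # indices that would sort the array
sorted_via_order = a[order]       # same values as np.sort(a)

print(descending)                 # values 8, 5, 4, 3, 3
print(sorted_via_order)

a.sort()                          # the method form sorts the array in place
print(a)
```

Unlike np.sort(a), the a.sort() method returns nothing and changes the array itself.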

What Are NumPy’s Advanced Functions in Python?

As you might remember, I mentioned we’ll also be looking at some more complex techniques you can use NumPy for. These include:

  • Queries
  • Data manipulation

While more advanced, obtaining these skills is definitely worthwhile if you’re an aspiring data scientist looking for an entry-level job as you can further impress potential employers with your resume.

Querying with NumPy

Suppose you want to retrieve some characteristic or another piece of information such as:

  • The number of values
  • The type of values
  • The dimension and shape of an array

You can achieve this by querying them directly in NumPy:

>>> a
array([[0., 0., 0.],
       [0., 0., 0.]])
>>> a.size # number of values
6
>>> a.dtype # type of values
dtype('float64')
>>> a.ndim # number of dimensions
2
>>> a.shape # shape of array
(2, 3)

Another purpose for queries is to return the portion of an array that satisfies a logical condition, i.e., a condition that evaluates to True or False for each element. Use this functionality to check whether any value in your array falls outside the intended range of accepted values:

>>> a = np.arange(1, 10)
>>> a
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a > 5 # logical condition
array([False, False, False, False, False,  True,  True,  True,  True])
>>> a[a > 5]
array([6, 7, 8, 9])
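For a range check with both a lower and an upper bound, conditions can be combined. A minimal sketch, assuming an accepted range of 3 to 7 chosen purely for illustration:

```python
import numpy as np

a = np.arange(1, 10)

# Combine conditions with & (and) or | (or); each condition needs parentheses
in_range = (a >= 3) & (a <= 7)
print(a[in_range])       # values 3 through 7

# Check whether any value falls outside the accepted range
has_violation = np.any(~in_range)
violations = a[~in_range]
print(has_violation)     # True, since 1, 2, 8 and 9 are outside
print(violations)
```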

Data Manipulation with NumPy

NumPy arrays are mutable, meaning that the values inside can be changed. You can also build a modified copy of an array based on a logical condition, using the format:

np.where(logical condition, value if true, value if false)

Remember how I mentioned that you can check whether a value has violated the intended range through queries? Well, with np.where you can replace those values with other numbers. For example, the following code keeps the original values where they are greater than 5 and puts 0s elsewhere:

>>> a
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.where(a > 5, a, 0)
array([0, 0, 0, 0, 0, 6, 7, 8, 9])
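Keep in mind that np.where returns a new array and leaves its input unchanged. To change the values of the array itself, a boolean mask can be used on the left-hand side of an assignment. The sketch below shows both, plus np.clip as another way to force values into a range; the bounds 3 and 7 are arbitrary.

```python
import numpy as np

a = np.arange(1, 10)

replaced = np.where(a > 5, a, 0)  # new array; a itself is untouched
clipped = np.clip(a, 3, 7)        # new array with values limited to [3, 7]
print(a)                          # still 1 through 9

a[a <= 5] = 0                     # boolean-mask assignment modifies a in place
print(a)
```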

Previously, we reshaped our array using the reshape command. If we use it again here, NumPy returns a new array object with the shape we defined and leaves the shape of the original variable unchanged. But if we want to modify the array directly, we can use the resize command instead:

>>> a = np.arange(1, 10)
>>> a.reshape((3, 3))
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> a # reshape does not change the array a
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a.resize((3, 3)) # does not return anything, changes array directly
>>> a
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Important note: resize changes the array in place, so the original shape is not kept. If you’d like to return your array to its original shape, you will have to reshape or resize it again.
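One more detail worth knowing: although reshape leaves the shape of the original variable alone, the array it returns usually shares the same underlying data. A minimal sketch of what that means in practice:

```python
import numpy as np

a = np.arange(1, 10)
b = a.reshape((3, 3))          # usually a view onto the same data, not a copy

print(np.shares_memory(a, b))  # True here
b[0, 0] = 100                  # writing through the view also changes a
print(a[0])                    # 100

flat_again = b.reshape(-1)               # back to one dimension
independent = a.reshape((3, 3)).copy()   # explicit copy when independence is needed
```

Calling .copy() is the simplest way to make sure later edits do not leak back into the original array.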

To learn more commands and functionalities, read the NumPy source code and NumPy documentation.

Q&A

What is NumPy in Python?
NumPy is one of the most popular open-source libraries in Python. With a multitude of functionalities that help with the data science workflow, you can preprocess, manipulate, extract, and sort data. At its core sits the array object, otherwise called ndarray. Essentially, you can create and work with one-dimensional arrays, stack them together to form multidimensional ones, and manipulate their shape, size, order, and more. Additionally, users can perform mathematical operations ranging from basics like addition and subtraction to more complex techniques in the realm of linear algebra and matrix processing. Compared with plain Python lists and loops, NumPy is very fast, which improves the performance of the whole workflow. You can perform lengthier operations in 1-2 lines of code, as the library is rich in commands and well documented. It also integrates with other commonly used packages like pandas, SciPy, and Matplotlib.

Why Do We Use NumPy in Python?
We use NumPy in Python due to its very high computational speed and its support for multidimensional objects, otherwise called arrays. An array is similar to a Python list, but it is a faster data structure for storing large amounts of numerical information, and it eases the process of numerical data manipulation. NumPy allows for calculations between arrays and other arrays or single values such as integers. Its code closely resembles standard mathematical notation, which makes operations easier to read and interpret and helps reduce both processing time and the number of bugs. Integrating the NumPy library in Python allows users to apply advanced mathematical techniques in just a few lines of code, such as performing linear algebra, processing matrices, and applying complex queries and manipulations to the arrays at hand. The package is also useful for data wrangling, as it offers many statistical functions for preprocessing and supports different data types.

Is NumPy Easy to Learn?
Yes, NumPy is easy to learn. Its vectorized syntax lets you express operations in short, clean lines that closely resemble standard mathematical notation, which significantly reduces the time you need to spend writing code. Because so much of the looping is handled for you, NumPy tends to take less time to get used to than other libraries such as pandas. On top of that, the package is computationally fast, which leaves more time for experimenting.

Ultimate NumPy Tutorial: Next Steps

Mastering NumPy in Python is absolutely critical if you are set on a career in data science. Not only will it simplify your data science workflow, but it will be a valuable asset during your job hunt.

However, learning to program can be a lengthy, confusing process before you become fully proficient, especially without a background in computer science. For a smoother learning journey, it’s a good idea to find good online resources to support you at every step. Luckily, you’ve come to the right place!

The 365 Data Science Program offers self-paced courses led by renowned industry experts. Starting from the very basics all the way to advanced specialization, you will learn by doing with a myriad of practical exercises and real-world business cases. If you want to see how the training works, start with a selection of free lessons by signing up below.

Kay Jan Wong

Data Scientist

Kay Jan is a Data Scientist and writer with a Bachelor’s degree in Engineering and Master’s in Computing. Her experience includes working for some of the world’s leading financial service and business advisory providers. In 2018, she started her e-learning journey to continue improving her professional skills and knowledge. Kay finds fulfilment in giving back to the data science community through teaching and writing.
