Last answered: 15 Apr 2025
Posted on: 14 Apr 2025


How Is an RGB Image Represented in NumPy?

In the final lecture for this course, the instructor says that an RGB image is a 3 x 400 x 400 tensor, that is, one 400x400 matrix for each of the R, G, and B channels. But I'm not sure that's how NumPy expresses an RGB image. After some research on my own, it seems NumPy would represent a 400 x 400 RGB image as an array of shape (400, 400, 3).
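
For example, this is the kind of check I tried (assuming Pillow is installed; the filename is just a placeholder):

    import numpy as np
    from PIL import Image

    # Load an image and inspect the shape of its NumPy representation
    img = np.array(Image.open("photo.png"))  # placeholder filename
    print(img.shape)  # prints (400, 400, 3) for a 400x400 RGB image
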

Is that right or wrong?

Thanks,

Justin
2 answers (0 marked as helpful)
Instructor
Posted on: 15 Apr 2025

Hi Justin,

Ultimately, it is up to the user to decide how they want to represent an image.
But yes, images more often have the shape (height, width, channels). In most cases there are 3 channels, one for each of the R, G, and B values of a regular pixel.
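
As a quick illustration (just a minimal sketch with zero-filled pixel values):

    import numpy as np

    # A single 400x400 RGB image in the common (height, width, channels) layout
    image = np.zeros((400, 400, 3), dtype=np.uint8)
    print(image.shape)  # (400, 400, 3)
    print(image[0, 0])  # the R, G, B values of the top-left pixel: [0 0 0]
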

One reason for this convention is that in Machine Learning we are not working with just a single image, but with a whole dataset of images. So, let's say you have 1000 images of 400x400 pixels. The NumPy representation of that dataset would have the shape (1000, 400, 400, 3), where the first dimension represents the number of samples in our dataset.
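
For example (again just a sketch with placeholder zero-filled images):

    import numpy as np

    # Stack 1000 individual (400, 400, 3) images into one dataset array
    images = [np.zeros((400, 400, 3), dtype=np.uint8) for _ in range(1000)]
    dataset = np.stack(images)
    print(dataset.shape)  # (1000, 400, 400, 3)
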

At the end of the day, both representations can work, and each provides a helpful mental model.
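
And if you ever need the channels-first layout from the lecture, converting between the two is a one-liner:

    import numpy as np

    hwc = np.zeros((400, 400, 3), dtype=np.uint8)  # (height, width, channels)
    chw = np.moveaxis(hwc, -1, 0)                  # (channels, height, width)
    print(chw.shape)  # (3, 400, 400)
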

Hope this helps!

Best,
Nikola, 365 Team
Posted on: 15 Apr 2025

Thanks Nikola!

So it seems you are saying both ways are valid, but the implementation depends on user preference and the problem we are trying to solve. Is that right? And it seems that, more often than not, the generally accepted best practice is (height, width, channels) for a single image. If the image is RGB, it will have 3 channels; if it is RGBA, it will have 4.
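
e.g., I'd expect something like this for the RGBA case (assuming Pillow and a PNG with an alpha channel; the filename is just a placeholder):

    import numpy as np
    from PIL import Image

    # Force the RGBA mode so the alpha channel is included
    rgba = np.array(Image.open("logo.png").convert("RGBA"))
    print(rgba.shape)  # (height, width, 4): R, G, B, and alpha
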

Thanks again!

-Justin
