How is an RGB Image Represented in Numpy?
In the final lecture for this course, the instructor says that an RGB image is a 3 x 400 x 400 tensor, i.e., one 400x400 matrix for each of the R, G, B channels. But I'm not sure that's how NumPy expresses an RGB image, right? After some research on my own, it seems NumPy would represent a 400 x 400 pixel RGB image as an array of shape (400, 400, 3).
Is that right or wrong?
Thanks,
Justin
2 answers (0 marked as helpful)
Hi Justin,
Ultimately, it is up to the user to decide how they want to represent an image.
But yes, images more often have the shape (height, width, rgb_channels). In most cases there are 3 channels, since a regular pixel has 3 color components.
One reason for that is that in Machine Learning we are not working with just a single image, but with a whole dataset of images. So, let's say you have 1000 400x400 images. The NumPy representation of that dataset would have the shape (1000, 400, 400, 3), where the first dimension represents the number of samples in our dataset.
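To make the shapes concrete, here is a minimal NumPy sketch of the layouts discussed above (the arrays are just zeros, for illustration only):

```python
import numpy as np

# A single 400x400 RGB image in the common "channels-last" layout.
image = np.zeros((400, 400, 3), dtype=np.uint8)

# A dataset of 1000 such images: the first axis indexes the samples.
dataset = np.zeros((1000, 400, 400, 3), dtype=np.uint8)

# Converting to the "channels-first" layout from the lecture, (3, 400, 400),
# is a cheap axis move (a view), not a copy of the pixel data.
channels_first = np.moveaxis(image, -1, 0)

print(image.shape)           # (400, 400, 3)
print(dataset.shape)         # (1000, 400, 400, 3)
print(channels_first.shape)  # (3, 400, 400)
```

So the two conventions hold the same data; `np.moveaxis` (or `np.transpose`) switches between them when a library expects the other layout.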
At the end of the day, both representations can work and provide a helpful construct.
Hope this helps!
Best,
Nikola, 365 Team
Thanks Nikola!
So it seems you are saying both ways are valid, but the implementation depends on user preference and the problem we are trying to solve. Is that right? And it seems that, more often than not, the generally accepted best practice is (height, width, channels) for a single image. If the image is RGB, it will have 3 channels; if it is RGBA, it will have 4.
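A quick sketch of that last point about channel counts (zero-filled arrays, purely for illustration):

```python
import numpy as np

rgb = np.zeros((400, 400, 3), dtype=np.uint8)   # channels: R, G, B
rgba = np.zeros((400, 400, 4), dtype=np.uint8)  # channels: R, G, B, alpha

# Dropping the alpha channel recovers a plain RGB image.
no_alpha = rgba[..., :3]

print(rgb.shape)       # (400, 400, 3)
print(rgba.shape)      # (400, 400, 4)
print(no_alpha.shape)  # (400, 400, 3)
```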
Thanks again!
-Justin