Posted on: 23 Apr 2024


Question about applying a convolutional network to a color image (3D)

I have a question about what happens when a convolutional layer is applied to a color image. To simplify, say I have a color image (3 RGB channels) that is 1 pixel wide and 1 pixel high, so its dimensions are 1x1x3. I apply a convolutional layer with a 1x1 kernel and a single filter. This keeps the spatial dimensions of the output unchanged and lets me focus on how the depth goes from 3 to the number of filters in the convolutional layer, in this case 1.

So I apply the layer's only filter to each channel of the original image separately and obtain one transformed pixel per channel. Using some arbitrary values, and ignoring scaling or anything else:

1x1R Transformed --> 6

1x1G Transformed --> 4

1x1B Transformed --> 3

My question is: since the output of the convolutional layer only needs a depth equal to the number of filters, what happens to these three values? Are the channels summed? Is the average taken? Is some other operation performed?
I attach a drawing in case it helps. My question is how you get from the [pixel 1x1 x kernel] to the pink square, which would be the first element of the output of the convolutional layer, formed from the light blue, dark green, and orange squares.
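
To make the setup concrete, here is a minimal NumPy sketch. The pixel and kernel values are hypothetical, picked only so that the per-channel products come out to 6, 4 and 3, and the bias is assumed to be zero. The sketch assumes the usual definition of a convolutional layer, as I understand it: a filter applied to a 3-channel input holds one weight per channel (so the 1x1 kernel is really 1x1x3), and the per-channel products are summed into a single value before the bias is added.

```python
import numpy as np

# Hypothetical values, chosen only so the per-channel products are 6, 4 and 3.
pixel = np.array([2.0, 4.0, 1.0])    # the single 1x1 pixel: R, G, B
kernel = np.array([3.0, 1.0, 3.0])   # the single 1x1 filter: one weight per input channel
bias = 0.0                           # assumed zero to keep the arithmetic simple

per_channel = pixel * kernel         # contribution of each channel: R, G, B
output = per_channel.sum() + bias    # a standard convolution sums over the input channels

print(per_channel)  # [6. 4. 3.]
print(output)       # 13.0
```

Under those assumptions the pink square would hold 6 + 4 + 3 = 13, not an average, and the depth of the output is 1 because there is only one filter.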

Greetings and thanks in advance.

