dimensions of feature maps

Question

I understand the dimensions of the image, and that there are 10 different kernels that transform it into the feature maps. The only thing that is unclear is why the 3 (the colour channels) at the end disappears after the first convolution, so that the feature map does not have the x3 at the end.

Answer 1

Hi Nikita,

To generate a single feature map, we use the whole depth of the image. That means a single two-dimensional feature map is produced from all the colour channels of the original image. We don't get 3 separate maps, one per colour; we get just one.
That makes sense when we recall that the idea of a feature map is to "show where a specific feature is present". If, for example, the feature we are interested in is a circle, we want to be able to identify red circles just as well as green ones. That's why the kernel that generates this feature map looks at all colours.

It is important to note, though, that what is outlined above does not hold only for the colours. It also happens in every subsequent convolution. Since a single layer usually outputs many feature maps, they are ALL considered by each kernel of the next layer. So a kernel takes all, let's say, 256 feature maps, combines them, and generates a single new feature map from them. Thus, in a sense, you can think of the different colours of the original image as "feature maps" for red, green and blue.

To give an example with numbers, imagine we have a convolutional layer with:
- input that consists of 256 feature maps with spatial dimensions of 128x128
- kernel size of 5x5
- we want to obtain 64 feature maps as an output

In this case, we can treat the input as a tensor with dimensions 128x128x256. This means that each kernel should have dimensions 5x5x256. One such kernel produces a single feature map with dimensions 124x124 (with no padding and a stride of 1, since 128 - 5 + 1 = 124).
To get 64 different feature maps, we need 64 different kernels.

Symbolically, we can represent it like this:
[128x128x256] * [5x5x256x64] -> [124x124x64]
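
For anyone who wants to check the shapes themselves, here is a minimal NumPy sketch of the same idea. It assumes a stride of 1, no padding and no bias term, and it uses scaled-down sizes (a 12x12x8 input and 4 kernels) so that it runs quickly. The function names `conv2d_valid` and `valid_output_shape` are made up for this illustration and are not taken from any library.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def valid_output_shape(h, w, kh, kw, c_out):
    """Output shape for stride 1 and no padding (assumed here)."""
    return (h - kh + 1, w - kw + 1, c_out)


def conv2d_valid(x, kernels):
    """x: (H, W, C_in); kernels: (kh, kw, C_in, C_out)."""
    kh, kw, c_in, c_out = kernels.shape
    assert x.shape[2] == c_in, "kernel depth must match input depth"
    # windows has shape (H-kh+1, W-kw+1, C_in, kh, kw)
    windows = sliding_window_view(x, (kh, kw), axis=(0, 1))
    # Sum over kernel height, kernel width AND all input channels,
    # so the input depth disappears from the output.
    return np.einsum("ijchw,hwco->ijo", windows, kernels)


rng = np.random.default_rng(0)
x = rng.standard_normal((12, 12, 8))         # small stand-in for 128x128x256
kernels = rng.standard_normal((5, 5, 8, 4))  # small stand-in for 5x5x256x64

out = conv2d_valid(x, kernels)
print(out.shape)                             # one 2D map per kernel

# Shape for the sizes used in the example above
full_shape = valid_output_shape(128, 128, 5, 5, 64)
print(full_shape)
```

Each of the output maps is two-dimensional: the input depth (8 here, 256 in the example above, 3 for an RGB image) is summed away inside every kernel, and the last dimension of the output only counts how many kernels there are.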

Hope this clears up some of the confusion regarding the dimensions of kernels and feature maps.

Best,
Nikola, 365 Team

Answer 2

I had the same question. So how do we calculate the convolution over three layers?

I understand how we apply the kernel to a single layer to get one value.

We can calculate this value for each layer, which gives us 3 values.

The question is: mathematically speaking, what operation do we perform to turn these into one value?

My guess is that our kernel also has three layers, i.e. the filter has 3 slices.

If we apply the same convolution equation and sum up the results of the three layers, we get the one value.
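
That guess agrees with how convolution over several channels is usually defined: the per-layer results are simply added together. A small NumPy sketch to check it, using a random 3x3 patch with 3 colour channels and a random 3x3x3 kernel (the sizes and variable names are chosen only for illustration, and any bias term is left out):

```python
import numpy as np

rng = np.random.default_rng(0)
patch = rng.standard_normal((3, 3, 3))   # 3x3 image patch, 3 colour channels
kernel = rng.standard_normal((3, 3, 3))  # one kernel slice per channel

# One value per layer: multiply element-wise and sum within each channel
per_channel = [np.sum(patch[:, :, c] * kernel[:, :, c]) for c in range(3)]

# Adding the 3 values gives the single output value
value = sum(per_channel)

# The same number, computed as one sum over the whole 3x3x3 block
value_direct = np.sum(patch * kernel)

print(np.isclose(value, value_direct))
```

So summing per layer and then adding the three results is the same as a single sum over the full depth of the patch, which is why only one value (and one feature map) comes out.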
