How does Conv2d work with different input and filter dimensions?

Issue

I wonder how TensorFlow's conv2d works when the input and filter dimensions differ.
For example, the input shape of a Conv2d layer is [1, 13, 13, 10] and the filter shape is
[20, 3, 3, 10] (20 filters of size 3×3, no padding).

In this situation, how do the filters work?
As far as I understand, each of the 20 filters computes a dot product over all 10 input channels.

(The first filter computes a dot product over all 10 input channels, the next filter does the same, and so on.)
So the output shape would be [1, 11, 11, 20].
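The spatial part of that guess can be checked with the usual "valid" (no padding) output-size formula, out = (in - kernel) // stride + 1. A minimal sketch, assuming NHWC layout and stride 1; the helper name conv2d_output_shape is hypothetical, not a TensorFlow function:

```python
def conv2d_output_shape(input_shape, num_filters, kernel_size, stride=1):
    """Output shape of a 'valid' (no padding) 2D convolution, NHWC layout.

    Hypothetical helper for illustration only.
    """
    batch, height, width, _channels = input_shape
    out_h = (height - kernel_size) // stride + 1
    out_w = (width - kernel_size) // stride + 1
    # The input channel count does not appear in the output:
    # every filter collapses all channels into one output channel.
    return (batch, out_h, out_w, num_filters)


print(conv2d_output_shape((1, 13, 13, 10), num_filters=20, kernel_size=3))
# (1, 11, 11, 20)
```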

Am I right?

Solution

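Assume your input is [b, h, w, c] (batch, height, width, channels) and you have N kernels of size kh×kw. Each kernel also has a channel dimension: it must span all c input channels, so each of your kernels is really 3×3×10. In TensorFlow's tf.nn.conv2d layout ([filter_height, filter_width, in_channels, out_channels]), the full filter tensor for your example is [3, 3, 10, 20]. The number of kernels N becomes the number of output channels.
Then your logic is correct. At every spatial position, each filter computes a dot product with the 3×3 patch in each input channel and sums the results across all channels (plus an optional bias), giving a single value. Each filter therefore produces one 11×11 feature map: without padding, a 3×3 kernel loses 2 rows and 2 columns (13 - 3 + 1 = 11). Stacking the 20 maps gives an output of [1, 11, 11, 20].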
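To make the "dot product per channel, then sum across channels" step concrete, here is a naive NumPy sketch of a valid, stride-1 convolution (strictly a cross-correlation, which is what conv2d layers compute). It is an illustration, not TensorFlow's implementation. It assumes TensorFlow's [kh, kw, in_channels, out_channels] filter layout, omits the bias, and converts the question's [20, 3, 3, 10] notation with a transpose. The function name conv2d_valid is hypothetical.

```python
import numpy as np


def conv2d_valid(x, w):
    """Naive 'valid' (no padding, stride 1) 2D cross-correlation, no bias.

    x: input of shape [batch, height, width, in_channels]
    w: filters of shape [kh, kw, in_channels, out_channels]
    """
    b, h, wd, c = x.shape
    kh, kw, c_in, n = w.shape
    assert c == c_in, "each filter must span all input channels"
    out = np.zeros((b, h - kh + 1, wd - kw + 1, n))
    for i in range(h - kh + 1):
        for j in range(wd - kw + 1):
            patch = x[:, i:i + kh, j:j + kw, :]  # shape [b, kh, kw, c]
            # Multiply the patch by every filter and sum over height,
            # width AND channels, giving one value per filter: [b, n].
            out[:, i, j, :] = np.tensordot(patch, w, axes=([1, 2, 3], [0, 1, 2]))
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((1, 13, 13, 10))
w_question = rng.standard_normal((20, 3, 3, 10))  # [filters, kh, kw, channels]
w = np.transpose(w_question, (1, 2, 3, 0))        # -> [3, 3, 10, 20]

y = conv2d_valid(x, w)
print(y.shape)  # (1, 11, 11, 20)

# The same output value, computed channel by channel and then summed,
# for filter 0 at spatial position (0, 0):
per_channel = [np.sum(x[0, 0:3, 0:3, ch] * w[:, :, ch, 0]) for ch in range(10)]
print(np.isclose(sum(per_channel), y[0, 0, 0, 0]))  # True
```

The per-channel check shows that the 10 input channels are not kept separate: they are summed into a single number per filter and position, which is why the output has 20 channels rather than 20 × 10.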