Image input to a neural network normally has three channels: R, G and B. Convolution and deconvolution layers, on the other hand, usually use several dozen channels, because three channels do not give the layers enough parameters. In semantic segmentation, the output channel count equals the number of classes, so it is usually smaller than the channel count of the convolution or deconvolution layers.
Next, we explain pointwise convolution, that is, a convolution with a 1×1 kernel. It is often used in semantic segmentation because this type of layer changes the channel count while keeping the image size.
The image below illustrates the basic idea of the calculation. The input image has three channels and there is one filter, so the output image has one channel.
The filter has three values, one for each input channel. Select any pixel in the input image, multiply the value at that location in each channel by the corresponding filter value, and sum the products. The sum is the output at the selected pixel. Repeating this for every pixel gives a one-channel output from one filter. Likewise, using three filters gives a three-channel output.
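The per-pixel calculation described above can be sketched in NumPy. This is a minimal illustration, not a reference implementation: the function name `pointwise_conv`, the channels-last array layout, the sample values, and the absence of a bias term are all assumptions made for the example.

```python
import numpy as np

def pointwise_conv(image, filters):
    """Apply a pointwise (1x1) convolution.

    image:   array of shape (height, width, in_channels)
    filters: array of shape (out_channels, in_channels)
    returns: array of shape (height, width, out_channels)
    """
    # For every pixel, compute the weighted sum over the input channels,
    # once per filter. No bias term is included in this sketch.
    return np.einsum("hwc,oc->hwo", image, filters)

# A tiny 2x2 image with three channels (values chosen arbitrarily).
image = np.array([
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[7.0, 8.0, 9.0], [0.0, 1.0, 2.0]],
])

# One filter with three values gives a one-channel output.
one_filter = np.array([[0.5, -1.0, 2.0]])
output = pointwise_conv(image, one_filter)
print(output.shape)     # (2, 2, 1)
print(output[0, 0, 0])  # 1*0.5 + 2*(-1.0) + 3*2.0 = 4.5

# Three filters give a three-channel output of the same height and width.
three_filters = np.array([
    [0.5, -1.0, 2.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
])
output3 = pointwise_conv(image, three_filters)
print(output3.shape)    # (2, 2, 3)
```

The height and width of the output match the input in both cases; only the channel count changes.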
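The number of filter values depends on the channel counts of both the input and the output: each filter has one value per input channel, and there is one filter per output channel. If the input has 10 channels and the output has 3, each filter has 10 values and there are 3 filters, for 30 filter values in total.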
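As a rough check of the 10-to-3 channel case, the filter shapes can be written out in NumPy. The image size of 4×5, the random values, and the channels-last layout are assumptions chosen only for illustration, and bias terms are left out.

```python
import numpy as np

in_channels, out_channels = 10, 3
height, width = 4, 5  # arbitrary image size for illustration

rng = np.random.default_rng(0)
image = rng.standard_normal((height, width, in_channels))

# One row per filter (output channel), one column per input channel.
filters = rng.standard_normal((out_channels, in_channels))

# A matrix product over the channel axis is the pointwise convolution.
output = image @ filters.T

print(filters.shape)  # (3, 10): 3 filters with 10 values each
print(filters.size)   # 30 filter values in total
print(output.shape)   # (4, 5, 3): same image size, 3 channels
```

Writing the layer as a matrix product over the channel axis makes it visible that the filter count sets the output channel count and the filter length must match the input channel count.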