Lesson 27  Network Architecture for Image Classification

Lesson 17 describes an example network architecture for deep learning. Let's review it here, using image classification as the example.

The following is an example of a network that consists of multiple convolution, pooling, and fully connected layers.

When the system receives an input image, it resizes the image to 320×320 . Assuming the three RGB channels, the input size is (3,320,320) .

The filter size of the first convolution layer is 2×2 , and the stride (slide size) is 2×2 . Assuming no padding, each spatial dimension is halved (320 / 2 = 160), so the output size is 3×160×160 .

The second layer is a pooling layer with a filter size of 2×2 and a stride of 2×2 . It halves each spatial dimension again (160 / 2 = 80), so the output size is 3×80×80 .
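The halving effect of pooling can be sketched on a small example. The lesson does not say which kind of pooling is used, so this sketch assumes max pooling; the function name `max_pool_2x2` and the 4×4 sample values are made up for illustration.

```python
def max_pool_2x2(channel):
    """Apply 2x2 max pooling with stride 2 to one channel (a list of rows).

    Assumption: max pooling; the lesson does not name the pooling type.
    """
    height, width = len(channel), len(channel[0])
    pooled = []
    for r in range(0, height - 1, 2):
        row = []
        for c in range(0, width - 1, 2):
            # Collect the four values covered by the 2x2 window
            window = (channel[r][c], channel[r][c + 1],
                      channel[r + 1][c], channel[r + 1][c + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 channel; pooling reduces it to 2x2
example = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 4, 6],
]
pooled = max_pool_2x2(example)
print(pooled)  # [[4, 5], [3, 7]]
```

Applied to each of the three 160×160 channels separately, the same operation would yield three 80×80 channels.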

One more convolution layer and one more pooling layer with the same parameters follow. Each halves the spatial size again (80 → 40 → 20), so the output size of the last pooling layer is 3×20×20 .
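The sizes above can be checked with a short calculation. This is a minimal sketch that assumes no padding, so the spatial output size is floor((input − filter) / stride) + 1; the function name `output_size` and the layer labels are illustrative, not taken from the lesson.

```python
def output_size(input_size, filter_size, stride):
    """Spatial output size of a convolution or pooling layer (no padding assumed)."""
    return (input_size - filter_size) // stride + 1


# Hypothetical labels for the four layers described in the text
layers = ["conv1", "pool1", "conv2", "pool2"]

size = 320  # resized input image is 320 x 320
sizes = []
for name in layers:
    # Every layer in this example uses a 2x2 filter and a stride of 2
    size = output_size(size, filter_size=2, stride=2)
    sizes.append(size)
    print(f"{name}: 3 x {size} x {size}")

# Spatial sizes after each layer: 160, 80, 40, 20
```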

The system flattens the 3×20×20 output into a column vector just before the first fully connected layer. Since 3 × 20 × 20 = 1200, the input size of that layer is (1200,1) .
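The flattening step can be sketched as follows. The element order (channel by channel, row by row) is an assumption, since the lesson does not specify it, and the zero-filled `feature_map` is only a placeholder for the real pooling output.

```python
channels, height, width = 3, 20, 20

# Placeholder for the output of the last pooling layer (all zeros)
feature_map = [[[0.0 for _ in range(width)]
                for _ in range(height)]
               for _ in range(channels)]


def flatten(volume):
    """Flatten a (channels, height, width) nested list into a column vector.

    Assumed order: channel by channel, then row by row.
    Each element becomes its own one-element row, giving shape (N, 1).
    """
    return [[value] for channel in volume for row in channel for value in row]


x = flatten(feature_map)
print(len(x), len(x[0]))  # 1200 1
```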

As Lesson 13 describes, the calculation formulas at each artificial neuron are as follows.

$$u = \sum_{i=1}^{n} w_i x_i + b$$
$$z = f(u)$$
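The per-neuron formulas above can be sketched in a few lines. The lesson leaves the activation function f unspecified, so the sigmoid used here is only an assumption, and the weights, inputs, and bias are made-up values.

```python
import math


def sigmoid(u):
    """One possible activation function f (an assumption, not from the lesson)."""
    return 1.0 / (1.0 + math.exp(-u))


def neuron(weights, inputs, bias, activation=sigmoid):
    """Compute z = f(u), where u is the weighted sum of the inputs plus the bias."""
    u = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(u)


# Hypothetical values: u = 0.5*2.0 - 0.25*4.0 + 1.0*1.0 - 1.0 = 0.0
z = neuron(weights=[0.5, -0.25, 1.0], inputs=[2.0, 4.0, 1.0], bias=-1.0)
print(z)  # 0.5, because sigmoid(0.0) = 0.5
```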

A fully connected layer has multiple artificial neurons, and every input element is connected to every output neuron. The input size of the first fully connected layer is (1200,1) , and the output size is (800,1) . In other words, the layer receives 1200 input values and has 800 output neurons. The formulas above can be written in matrix form for all neurons of the layer at once as follows.

$$u = W x + b$$
$$z = f(u)$$

The size of x is (1200,1) , and the size of u is (800,1) . Therefore, the size of W is (800,1200) , and the size of b is (800,1) . Because f is applied to each element of u, the size of z is also (800,1) .
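The shapes above can be checked with a small sketch of one fully connected layer. The ReLU activation, the random initial values, and the names `fully_connected`, `n_in`, and `n_out` are assumptions for illustration; column vectors are stored as flat Python lists to keep the code short.

```python
import random


def relu(u):
    """One possible activation function f (an assumption, not from the lesson)."""
    return max(0.0, u)


def fully_connected(W, x, b, activation):
    """Compute z = f(W x + b) for one fully connected layer."""
    # Each row of W holds the weights of one output neuron
    u = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
         for row, b_i in zip(W, b)]
    return [activation(u_i) for u_i in u]


random.seed(0)
n_in, n_out = 1200, 800

# Placeholder parameters: W is (800, 1200), b is (800, 1), x is (1200, 1)
W = [[random.uniform(-0.01, 0.01) for _ in range(n_in)] for _ in range(n_out)]
b = [0.0] * n_out
x = [random.random() for _ in range(n_in)]

z = fully_connected(W, x, b, relu)
print(len(W), len(W[0]), len(z))  # 800 1200 800
```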

The output size of the last fully connected layer is (2,1) , because we assume the network classifies images into two classes, such as dog and cat. If the network classified four classes, the output size would be (4,1) .

The activation function of the last fully connected layer should be softmax. Its output values are all non-negative and sum to 1, so they can be interpreted as the probabilities of the classes.
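A minimal sketch of softmax for the two-class case follows. Subtracting the maximum before exponentiation is a common numerical-stability measure and is not mentioned in the lesson; the input scores are made-up values.

```python
import math


def softmax(u):
    """Convert a list of scores into probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow and does not change the result
    shift = max(u)
    exps = [math.exp(v - shift) for v in u]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for the two classes (for example, dog and cat)
probs = softmax([2.0, 0.5])
print(probs, sum(probs))
```

The larger score receives the larger probability, and the class with the highest probability is taken as the prediction.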
