Description
In the tutorial beginner_source/blitz/neural_networks_tutorial.py, the explanation of the first linear layer's dimensions is unclear:
self.fc1 = nn.Linear(16 * 6 * 6, 120) # 6*6 from image dimension
The expected input image size is 32 x 32, but the visualization of the net shows a 5 x 5 feature map after the last max pool layer. Where does the extra 1 x 1 come from?
Calculating the layer dimensions can be a hurdle for beginners; it certainly is for me. It's confusing because the feature-map sizes depend on the input image size, which appears nowhere in the initialization parameters.
The paper linked in the docs is helpful: https://arxiv.org/pdf/1603.07285.pdf
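For reference, the output-size formula from that paper makes the 6 x 6 explicit. Below is a minimal sketch (the out_size helper is my own, and it assumes the tutorial's 3 x 3 convolutions and 2 x 2 max pools with no padding, which matches the trace further down):

def out_size(in_size, kernel, stride=1, padding=0):
    # out = floor((in - kernel + 2*padding) / stride) + 1
    return (in_size - kernel + 2 * padding) // stride + 1

size = 32                                   # 32 x 32 input image
size = out_size(size, kernel=3)             # conv1 (3x3)   -> 30
size = out_size(size, kernel=2, stride=2)   # maxpool1 (2x2) -> 15
size = out_size(size, kernel=3)             # conv2 (3x3)   -> 13
size = out_size(size, kernel=2, stride=2)   # maxpool2 (2x2) -> 6
print(size)  # 6, hence nn.Linear(16 * 6 * 6, 120)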
So, after printing the sizes before and after each step in the net (the instrumented forward() I used is sketched after the list below), I see that the net visualization image (https://pytorch.org/tutorials/_images/mnist.png) lists the wrong dimensions. The actual sizes after each step in the net are:
torch.Size([1, 1, 32, 32]) # input size
torch.Size([1, 6, 30, 30]) # after conv1
torch.Size([1, 6, 30, 30]) # after relu1
torch.Size([1, 6, 15, 15]) # after maxpool1
torch.Size([1, 16, 13, 13]) # after conv2
torch.Size([1, 16, 13, 13]) # after relu2
torch.Size([1, 16, 6, 6]) # after maxpool2
torch.Size([1, 576]) # after flattening
torch.Size([1, 120]) # after fully connected layer 1
torch.Size([1, 84]) # after fully connected layer 2
torch.Size([1, 10]) # after fully connected layer 3
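For completeness, here is roughly how I produced that trace. This is a minimal sketch, not the tutorial's exact code: the layer definitions follow the tutorial (3 x 3 convolutions, 2 x 2 max pools), and the print() calls are my additions for tracing.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(16 * 6 * 6, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        print(x.size())              # input size
        x = self.conv1(x)
        print(x.size())              # after conv1
        x = F.relu(x)
        print(x.size())              # after relu1
        x = F.max_pool2d(x, (2, 2))
        print(x.size())              # after maxpool1
        x = self.conv2(x)
        print(x.size())              # after conv2
        x = F.relu(x)
        print(x.size())              # after relu2
        x = F.max_pool2d(x, 2)
        print(x.size())              # after maxpool2
        x = x.view(x.size(0), -1)
        print(x.size())              # after flattening
        x = self.fc1(x)
        print(x.size())              # after fully connected layer 1
        x = F.relu(x)
        x = self.fc2(x)
        print(x.size())              # after fully connected layer 2
        x = F.relu(x)
        x = self.fc3(x)
        print(x.size())              # after fully connected layer 3
        return x

net = Net()
net(torch.randn(1, 1, 32, 32))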