This article is about one of the pre-trained CNN models, known as VGG-16. The process of using a pre-trained CNN is called transfer learning: instead of building a CNN from scratch, we reuse an existing one with a few modifications (sketched in code after this list):
- Removing the top (input) and bottom (output) layers
- Adding input layer with size equal to the dimension of the image
- Adding output layer with size equal to number of classes
- Adding additional layers (if needed)
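Below is a minimal sketch of these modifications in Keras (assuming TensorFlow 2.x is installed). The 2-class head matches the osteoarthritis task described later in this article; the extra 4096-unit layer is the optional "additional layer".

```python
import tensorflow as tf

# Load VGG16 without its original 1000-class top; weights come from ImageNet.
base = tf.keras.applications.VGG16(
    weights="imagenet",
    include_top=False,            # drop the original classifier head
    input_shape=(224, 224, 3),    # input layer sized to our images
)
base.trainable = False            # keep the pre-trained filters frozen

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),                        # 7*7*512 feature map -> vector
    tf.keras.layers.Dense(4096, activation="relu"),   # additional layer (optional)
    tf.keras.layers.Dense(2, activation="softmax"),   # output layer: 2 classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```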
The pre-trained model explained in this article is called VGGNet. This model was developed by University of Oxford researchers as a solution to the ImageNet task. The ImageNet data consists of 1000 classes with roughly 1000 images each, over a million images in total.
VGGNet

[Figure: VGGNet architecture; the blocks are numbered I/p, 1-13, O/p. Credit: Nshafiei, Neural network in Machine learning, Creative Commons Attribution-ShareAlike 4.0 License.]
This is the architecture of VGGNet. It was originally trained for the ImageNet task, a standard dataset containing 1000 classes, and was used for multiclass classification. Some modifications are made before using it for detecting OA: the output dimension is changed to 1*1*2, and the input images must be reshaped to 224*224, since that is the dimension VGGNet expects. The padding, stride, number of filters, and filter dimensions were chosen by the original researchers and found to work well; in general, other values could be used in their place.
The numbers given below the figure correspond to layer numbers. VGGNet here is thus 13-layered: it is a CNN up to layer 10, and the remaining layers form the FFN part.
| Colour index | Name |
| --- | --- |
| Grey | Convolution |
| Red | Pooling |
| Blue | FFN |
Computations and parameters for each layer
Input
224*224 images are converted into a tensor of dimension 224*224*3, with the three channels holding the RGB values.
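As a concrete example, here is a small sketch (assuming Pillow and NumPy are installed; the filename is hypothetical) of turning an image into that 224*224*3 array:

```python
import numpy as np
from PIL import Image

img = Image.open("knee_xray.png").convert("RGB")  # hypothetical input file
img = img.resize((224, 224))                      # reshape to the VGG-compatible size
x = np.asarray(img, dtype=np.float32)             # shape: (224, 224, 3)
print(x.shape)                                    # -> (224, 224, 3)
```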
Layer 1-C1
This is the first convolutional layer. Here 64 filters are used.
Wi =224, P=1, S=1, K=64, f=3*3
Wo =224 (this is the input Wi for the next layer)
Dim= 224*224*64
Parameter = 3*3*3*64 = 1728 (each 3*3 filter also spans the 3 input channels; biases are ignored here and below)
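Since every layer below repeats the same two formulas, here is a small helper (a sketch, not library code) that computes the output width Wo = floor((Wi - f + 2P)/S) + 1 and the convolution parameter count f*f*Cin*K, and reproduces the C1 numbers above:

```python
def out_width(wi: int, f: int, p: int, s: int) -> int:
    # Standard formula: Wo = floor((Wi - f + 2P) / S) + 1
    return (wi - f + 2 * p) // s + 1

def conv_params(f: int, c_in: int, k: int) -> int:
    # Each of the K filters has f*f weights per input channel (biases ignored).
    return f * f * c_in * k

print(out_width(224, f=3, p=1, s=1))   # 224  (C1 preserves the width)
print(conv_params(3, c_in=3, k=64))    # 1728 (C1 parameter count)
```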
Layer 2-P1
This is the first pooling layer
Wi =224, S=2, P=1, f=3
Wo=112 (this is the input Wi for the next layer)
Dim= 112*112*64 (pooling preserves the channel depth)
Parameter= 0
Layer 3-C2C3
Here two convolutions are applied, each with 128 filters.
Wi =112, P=1, S=1, K=128, f=3*3
Wo=112 (this is the input Wi for the next layer)
Dim= 112*112*128
Parameter = 3*3*64*128 + 3*3*128*128 = 221184
Layer 4- P2
Second pooling layer
Wi =112, P=1, S=2, f=3*3
Wo =56 (this is the input Wi for the next layer)
Dim= 56*56*128
Parameter= 0
Layer 5- C4C5C6
Combination of three convolutions
Wi =56, P=1, S=1, K=256, f=3*3
Wo = 56 (this is the input Wi for the next layer)
Dim= 56*56*256
Parameter = 3*3*128*256 + 2*(3*3*256*256) = 1474560
Layer 6-P3
Third pooling layer
Wi =56, P=1, S=2, f=3*3
Wo =28 (this is the input Wi for the next layer)
Dim= 28*28*256
Parameter= 0
Layer 7-C7C8C9
Combination of three convolutions
Wi =28, P=1, S=1, K=512, f=3*3
Wo =28 (this is the input Wi for the next layer)
Dim= 28*28*512
Parameter = 3*3*256*512 + 2*(3*3*512*512) = 5898240
Layer 8-P4
Fourth pooling layer
Wi =28, P=1, S=2, f=3*3
Wo =14 (this is the input Wi for the next layer)
Dim= 14*14*512
Parameter= 0
Layer 9-C10C11C12
Last convolutional block, a combination of three convolutions
Wi =14, P=1, S=1, K=512, f=3*3
Wo =14 (this is the input Wi for the next layer)
Dim= 14*14*512
Parameter = 3*(3*3*512*512) = 7077888
Layer 10-P5
Last pooling layer and last layer in CNN
Wi =14, P=1, S=2, f=3*3
Wo =7 (this is the input Wi for the next layer)
Dim= 7*7*512
Parameter = 0
With this the CNN part is over: the complex 224*224*3 input has boiled down to a 7*7*512 feature map.
Trends in CNN
As the layer number increases,
- The dimension decreases.
- The filter number increases.
- Filter dimension is constant.
In convolution
A padding of 1 and a stride of 1 (with 3*3 filters) carry the input dimensions unchanged to the output.
In pooling
A padding of 1 and a stride of 2 halve the dimensions.
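These two rules are enough to replay the whole convolutional stack. The self-contained sketch below applies them block by block and prints the Dim of every layer from C1 to P5:

```python
def out_width(wi, f, p, s):
    return (wi - f + 2 * p) // s + 1

blocks = [
    ("C1",      "conv", 64),  ("P1", "pool", None),
    ("C2-C3",   "conv", 128), ("P2", "pool", None),
    ("C4-C6",   "conv", 256), ("P3", "pool", None),
    ("C7-C9",   "conv", 512), ("P4", "pool", None),
    ("C10-C12", "conv", 512), ("P5", "pool", None),
]
w, depth = 224, 3
for name, kind, k in blocks:
    if kind == "conv":
        w = out_width(w, f=3, p=1, s=1)   # convolution: dimensions preserved
        depth = k                         # filter count becomes the new depth
    else:
        w = out_width(w, f=3, p=1, s=2)   # pooling: dimensions halved
    print(f"{name}: Dim = {w}*{w}*{depth}")
# Last line printed: "P5: Dim = 7*7*512"
```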
Layer 11- FF1
4096 neurons
Wo= 4096
Parameter = 512*7*7*4096 ≈ 102M
Layer 12- FF2
4096 neurons
Wo= 4096
Parameter = 4096*4096 ≈ 16.8M
Output layer
2 classes
- non-osteoarthritic
- osteoarthritic
Parameter= 4096*2= 8192
Parameters
| Layer | Number of parameters |
| --- | --- |
| Convolution (C1-C12) | ≈14.7M |
| FF1 | ≈102.8M |
| FF2 | ≈16.8M |
| Output | 8192 |
| Total | ≈134M |
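As a sanity check, the sketch below (biases ignored, as throughout this article) adds up the per-layer counts and lands on roughly 134M:

```python
# Convolution parameters per block: f*f*Cin*K for each convolution.
conv = [3*3*3*64,                          # C1
        3*3*64*128 + 3*3*128*128,          # C2-C3
        3*3*128*256 + 2 * (3*3*256*256),   # C4-C6
        3*3*256*512 + 2 * (3*3*512*512),   # C7-C9
        3 * (3*3*512*512)]                 # C10-C12
ff1 = 512*7*7*4096                         # flattened 7*7*512 map into 4096 neurons
ff2 = 4096*4096
out = 4096*2
total = sum(conv) + ff1 + ff2 + out
print(f"conv={sum(conv):,} ff1={ff1:,} ff2={ff2:,} total={total:,}")
# conv=14,673,600 ff1=102,760,448 ff2=16,777,216 total=134,219,456
```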
Learning all of these parameters takes a very long time, on the order of hours, for a machine running on a CPU alone. Hence speed enhancers are used: faster processors known as GPUs (Graphics Processing Units), which may finish the work up to 85% faster than a CPU.
