DEEP LEARNING SERIES- PART 10

This is the last article in this series. It covers another pre-trained CNN, the ResNet, along with an output-evaluation tool known as the confusion matrix.

ResNet

This is also known as a residual network. Its common variants are ResNet-50, ResNet-101, and ResNet-152, named after their number of layers. A simple technique was used to achieve this high number of layers.

Credit – Xiaozhu0429/ Wikimedia Commons / CC-BY-SA-4.0

The problem with using many layers is that the input information gets transformed by every layer, so after enough layers the original information becomes completely morphed. To prevent this, the input information is re-injected every couple of layers (a skip, or residual, connection) so that the deeper layers do not forget the original information. Using this simple technique, networks of 100+ layers were trained successfully.
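A minimal sketch of such a skip (residual) connection in PyTorch; this illustrates the idea rather than reproducing the exact ResNet code:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """The input is added back to the output of the conv layers,
    so the block only has to learn a correction (a 'residual')
    and the original information is never lost."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                     # keep the original input
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        out = out + identity             # re-inject the input (skip connection)
        return self.relu(out)

Because the identity passes through the addition unchanged, gradients can also flow straight back through it, which is what makes 100+ layer networks trainable.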

These three operations form the stem of the ResNet and recur throughout the network:

  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))

  (relu): ReLU(inplace=True)

  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1)

These are the layers found within the bottleneck blocks of the ResNet (a shortened printout; the downsampling branch appears only in the first block of a stage):

  (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
      (relu): ReLU(inplace=True)
      (downsample): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
  )
  (1): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
      (relu): ReLU(inplace=True)
  )
  (2): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
      (relu): ReLU(inplace=True)
  )

There are many bottleneck blocks like these throughout the network; this is how ResNet manages to go so deep while still producing good accuracy. As a matter of fact, ResNet won the ImageNet (ILSVRC) competition in 2015.

There are 4 stages in this architecture. Each stage is a stack of bottlenecks, and each bottleneck comprises convolutions followed by the ReLU activation function. Counting weight layers, ResNet-50 has 49 convolutions (not counting the downsampling shortcuts) and 1 FC layer, plus 2 pooling layers.

Type | No. of layers
7*7, k=64 convolution | 1
[1*1, k=64 + 3*3, k=64 + 1*1, k=256] bottleneck convolutions (3 blocks) | 9
[1*1, k=128 + 3*3, k=128 + 1*1, k=512] bottleneck convolutions (4 blocks) | 12
[1*1, k=256 + 3*3, k=256 + 1*1, k=1024] bottleneck convolutions (6 blocks) | 18
[1*1, k=512 + 3*3, k=512 + 1*1, k=2048] bottleneck convolutions (3 blocks) | 9
Fully connected | 1
Total weight layers | 50
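As a minimal sketch, the pre-trained network can be loaded from torchvision and its last layer replaced for a new task (the two-class output here is an assumption for illustration; newer torchvision versions use a weights= argument instead of pretrained=True):

import torch.nn as nn
from torchvision import models

model = models.resnet50(pretrained=True)   # ResNet-50, pre-trained on ImageNet
print(model.layer1)                        # the first stack of bottlenecks shown above

# Transfer learning: swap the 1000-class FC layer for a 2-class one
model.fc = nn.Linear(model.fc.in_features, 2)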

There is another measure, apart from accuracy, used to evaluate a model, especially in research papers: the confusion matrix. It appears in many places; in the medical field it can be seen in test reports, and its terms became popularized through the RT-PCR tests for COVID-19.

The confusion matrix is built from four counts: true positives, true negatives, false positives, and false negatives.

True positive – both the truth and the prediction are positive

True negative – both the truth and the prediction are negative

False positive – the truth is negative but the prediction is positive

False negative – the truth is positive but the prediction is negative

Out of these, the false negative is the dangerous one in medical testing, since a truly positive case goes undetected, so it has to be ensured that this value is minimal.
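A minimal sketch of computing a confusion matrix, assuming scikit-learn is available:

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # ground truth (1 = positive)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions

# For labels [0, 1] the matrix is laid out as:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))

Here the output is [[3, 1], [1, 3]]: three true negatives, one false positive, one false negative, and three true positives.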

We have now come to the end of the series. I hope you have gained some knowledge in this field of science. Deep learning is a very interesting field, since we can do a variety of projects using the artificial brain we carry with ourselves, and the technology available nowadays makes these implementations easy. So I recommend everyone to study these concepts and build projects with them. Till then,

HAPPY LEARNING!!!

DEEP LEARNING SERIES- PART 9

This article is about one of the pre-trained CNN models, the VGG-16. The process of reusing a pretrained CNN is known as transfer learning. In this case we need not build a CNN from scratch; instead we use an existing one with a few modifications (sketched in code after the list):

  • Removing the top (input) and bottom (output) layers
  • Adding an input layer with size equal to the dimensions of the image
  • Adding an output layer with size equal to the number of classes
  • Adding additional layers (if needed)
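A minimal sketch of these modifications using torchvision's pre-trained VGG-16; the two-class head is an assumption for the OA task discussed below:

import torch.nn as nn
from torchvision import models

model = models.vgg16(pretrained=True)

# Freeze the convolutional (feature-extraction) layers
for p in model.features.parameters():
    p.requires_grad = False

# Replace the output layer: 2 classes instead of ImageNet's 1000
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, 2)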

The pre-trained model explained in this article is called the VGGNet. It was developed by University of Oxford researchers as a solution to the ImageNet task. The ImageNet challenge data consists of 1000 classes with over a million labelled images in total.

VGGNet

Figure: the VGGNet architecture, with the layers numbered 1-13 from input (I/p) to output (O/p).

Credit – Nshafiei, "Neural network in machine learning", Creative Commons Attribution-ShareAlike 4.0 License.

This is the architecture of VGGNet. It was designed for the ImageNet dataset, a standard dataset containing 1000 classes, and was used for multiclass classification. Some modifications are made before using it for detecting OA: the output dimension is changed to 1*1*2, and the given images must be reshaped to 224*224, since that is the input dimension VGGNet expects. The padding, stride, number of filters, and filter dimensions were chosen by the researchers and found to work well; in general, other values could be used in their place.

The numbers given below the figure correspond to the layer numbers. In this numbering the VGGNet has 13 layers: it is a CNN till layer 10 and the remaining layers form the FNN.

Colour index | Name
Grey | Convolution
Red | Pooling
Blue | FFN

Computations and parameters for each layer

Input

224*224 images are converted into a vector of dimension 224*224*3 based on the RGB values.

Layer 1-C1

This is the first convolutional layer. Here 64 filters are used.

Wi =224, P=1, S=1, K=64, f=3*3

Wo =224 (this is the input Wi for the next layer)

Dim= 224*224*64

Parameter= 3*3*3*64 = 1728 (filter area × input depth × number of filters, ignoring biases)
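A minimal sketch verifying these numbers with PyTorch:

import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
x = torch.randn(1, 3, 224, 224)   # one 224*224 RGB image
print(conv1(x).shape)             # torch.Size([1, 64, 224, 224])
print(conv1.weight.numel())       # 1728 weights (plus 64 biases)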

Layer 2-P1

This is the first pooling layer

 Wi =224, S=2, P=1, f=3

Wo=112 (this is the input Wi for the next layer)

Dim= 112*112*64 (pooling preserves the depth of its input)

Parameter= 0

Layer 3-C2C3

Here two convolutions are applied. 128 filters are used.

Wi =112, P=1, S=1, K=128, f=3

Wo=112 (this is the input Wi for the next layer)

Dim= 112*112*128

Parameter= 3*3*64*128 + 3*3*128*128 = 221,184 (the two convolutions together)

Layer 4- P2

Second pooling layer

Wi =112, P=1, S=2, f=3*3

Wo =56 (this is the input Wi for the next layer)

Dim= 56*56*128

Parameter= 0

Layer 5- C4C5C6

Combination of three convolutions

Wi =56, P=1, S=1, K=256, f=3*3

Wo = 56 (this is the input Wi for the next layer)

Dim= 56*56*256

Parameter= 3*3*128*256 + 2*(3*3*256*256) = 1,474,560 (the three convolutions together)

Layer 6-P3

Third pooling layer

Wi =56, P=1, S=2, f=3*3

Wo =28 (this is the input Wi for the next layer)

Dim= 28*28*256

Parameter= 0

Layer 7-C7C8C9

Combination of three convolutions

Wi =28, P=1, S=1, K=512, f=3*3

Wo =28 (this is the input Wi for the next layer)

Dim= 28*28*512

Parameter= 3*3*256*512 + 2*(3*3*512*512) = 5,898,240 (the three convolutions together)

Layer 8-P4

Fourth pooling layer

Wi =28, P=1, S=2, f=3*3

Wo =14 (this is the input Wi for the next layer)

Dim= 14*14*512

Parameter= 0

Layer 9-C10C11C12

Last convolution layer, Combination of three convolutions

Wi =14, P=1, S=1, K=512, f=3*3

Wo =14 (this is the input Wi for the next layer)

Dim= 14*14*512

Parameter= 3*(3*3*512*512) = 7,077,888 (the three convolutions together)

Layer 10-P5

Last pooling layer and last layer in CNN

Wi =14, P=1, S=2, f=3*3

Wo =7 (this is the input Wi for the next layer)

Dim= 7*7*512

Parameter= 0 (pooling has no parameters)

Here the CNN part ends: a complex 224*224*3 input has boiled down to a 7*7*512 feature map.

Trends in CNN

As the layer number increases,

  1. The dimension decreases.
  2. The filter number increases.
  3. Filter dimension is constant.

In convolution

Padding of 1 and stride of 1 are used to transfer the original dimensions to the output.

In pooling

Padding of 1 and stride of 2 are used in order to halve the dimensions.

Layer 11- FF1

4096 neurons

Parameter= 512*7*7*4096=102M

Wo= 4096

Layer 12- FF2

4096 neurons

Wo= 4096

Parameter= 4096*4096 ≈ 16.8M

Output layer

2 classes

  • non-osteoarthritic
  • osteoarthritic

Parameter= 4096*2= 8192

Parameters

Layer | Value of parameters
Convolution layers | ≈15M
FF1 | ≈102M
FF2 | ≈17M
Output layer | 8192
Total | ≈134M

It takes a very long time, often hours, for a machine to learn all these parameters on a CPU. Hence faster processors known as GPUs (Graphics Processing Units) are used as speed enhancers, and they may finish the work up to 85% faster than a CPU.
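A minimal sketch of moving the computation to a GPU in PyTorch, assuming one is available:

import torch
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.vgg16().to(device)            # move all the parameters to the GPU
x = torch.randn(8, 3, 224, 224).to(device)   # a batch of 8 images
out = model(x)                               # the forward pass now runs on the GPU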

HAPPY LEARNING!!

DEEP LEARNING SERIES- PART 8

The previous article was about the padding, stride, and parameters of CNN. This article is about the pooling and the procedure to build an image classifier.

Pooling

This is another aspect of CNN. There are different types of pooling, like min pooling, max pooling, avg pooling, etc. The sliding process is the same as before, i.e. a kernel-sized window slides over the input vector, but the computation differs. In convolution, a 3*3 kernel applied over a 3*3 region computes a dot product; in pooling, the window instead summarizes the region with a single value and substitutes that value into the output vector. The type of pooling decides which value is taken. The following table shows the operation done by each type of pooling.

Type of pooling | Value seen in the output layer
Max pooling | Maximum of all considered cells
Min pooling | Minimum of all considered cells
Avg pooling | Average of all considered cells

The considered cells are bounded within the kernel dimensions.

The pictorial representation of average pooling is shown above. The number of parameters in pooling is zero.
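A minimal sketch of max and average pooling in PyTorch:

import torch
import torch.nn as nn

x = torch.tensor([[1.,  2.,  3.,  4.],
                  [5.,  6.,  7.,  8.],
                  [9., 10., 11., 12.],
                  [13., 14., 15., 16.]]).reshape(1, 1, 4, 4)

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)
print(max_pool(x))   # [[6, 8], [14, 16]]: maximum of each 2*2 region
print(avg_pool(x))   # [[3.5, 5.5], [11.5, 13.5]]: average of each region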

Convolution and pooling are the basis for feature extraction. The vector obtained from this step is fed into an FFN which then does the required task on the image.

Features of CNN

  1. Sparse connectivity
  2. Weight sharing.

    

Figure: feature extraction (CNN) followed by classification (FNN).

In general, the CNN comes first and the FFN later, but the order, number, and types of convolution and pooling layers can vary based on the complexity of the task and the choice of the user.

There are already many standard models like VGGNet, AlexNet, GoogLeNet, and ResNet, whose architectures have been defined by researchers. We only have to reshape our images to match the input dimensions of the chosen model.

General procedure to build an image classifier using CNN

  1. Obtain the data in the form of image datasets.
  2. Set the output classes for the model to perform the classification on.
  3. Transform, or specifically reshape, the dimensions of the images to be compatible with the model. The image size may be 20*20 but the model accepts only 200*200 images; then we must reshape them to that size.
  4. Split the given data into training data and evaluation data. This is done by creating new datasets for both training and validation. More images are required for training.
  5. Define the model used for this task.
  6. Roughly sketch the architecture of the network.
  7. Determine the number of convolutions, pooling etc. and their order
  8. Determine the dimensions for the first layer, padding, stride, number of filters and dimensions of filter.
  9. Apply the formula and find the output dimensions for the next layer.
  10. Repeat the previous two steps till the last layer in the CNN.
  11. Determine the number of layers and number of neurons per layer and parameters in FNN.
  12. Sketch the architecture with the parameters and dimension.
  13. Incorporate these details into the machine.
  14. Or import a predefined model. In that case the output layer of the FNN must be replaced so that its size equals the number of classes (or 1 for binary classification). This is known as transfer learning.
  15. Train the model using the training dataset and calculate the loss function for periodic steps in the training.
  16. Check if the machine has performed correctly by comparing the true output with model prediction and hence compute the training accuracy.
  17. Test the machine with the evaluation data and verify the performance on that data and compute the validation accuracy.
  18. If both accuracies are satisfactory, then the machine is complete. (A sketch of this procedure in code follows.)
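A minimal sketch of this procedure in PyTorch; the data paths, class count, and hyperparameters are assumptions for illustration:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

# Steps 1-4: load the image datasets and reshape to the model's input size
tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_set = datasets.ImageFolder("data/train", transform=tf)  # hypothetical paths
val_set = datasets.ImageFolder("data/val", transform=tf)

# Step 14: import a predefined model and replace its last layer (transfer learning)
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 2)

loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# Step 15: train and watch the loss
for epoch in range(5):
    for images, labels in DataLoader(train_set, batch_size=32, shuffle=True):
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()
    print(epoch, loss.item())

# Step 17: validation accuracy
with torch.no_grad():
    correct = sum((model(xb).argmax(1) == yb).sum().item()
                  for xb, yb in DataLoader(val_set, batch_size=32))
print("validation accuracy:", correct / len(val_set))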

HAPPY LEARNING!!

DEEP LEARNING SERIES- PART 7

The previous article was about the process of convolution and its implementation. This article is about the padding, stride and the parameters involved in a CNN.

We have seen that the output vector has reduced dimensions. A technique known as padding is used to preserve the original dimensions in the output vector. The only change in this process is that we add a boundary of 0s around the input vector and then do the convolution.

Procedure to implement padding

  1. To get an n*n output, use an (n+2)*(n+2) input.
  2. To get a 7*7 output, use a 9*9 input.
  3. In that 9*9 input, fill the first row, first column, last row, and last column with zeros.
  4. Now do the convolution operation on it using a filter.
  5. Observe that the output has the same dimensions as the original input.

Zero is used since it is insignificant: it keeps the output dimensions without affecting the results.

Here all the elements in the input vector have been transferred to the output. Hence using padding we can preserve the originality of the input. Padding is denoted using P. If P=1 then one layer of zeroes is added and so on.

It is not necessary for the filter or kernel to be applied at every cell. The pattern of applying the kernel to the input vector is determined by the stride, which sets the shift, or gap, between the positions where the filter is applied.

S=1 means no gap is created. The filter is applied to all the cells.

S=2 means gap of 1. The filter is applied to alternative cells. This halves the dimensions on the output vector.

This diagram shows the movement of the filter over a vector with strides of 1 and 2. With a stride of 2, alternate columns are accessed, so the number of computations per row halves. Hence the output dimensions shrink when a larger stride is used.
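A minimal sketch of these effects, assuming PyTorch:

import torch
import torch.nn as nn

x = torch.randn(1, 1, 7, 7)                                 # a 7*7 input

same = nn.Conv2d(1, 1, kernel_size=3, padding=1, stride=1)  # P=1, S=1
half = nn.Conv2d(1, 1, kernel_size=3, padding=1, stride=2)  # P=1, S=2
none = nn.Conv2d(1, 1, kernel_size=3, padding=0, stride=1)  # no padding

print(same(x).shape)   # 7*7: padding preserves the dimensions
print(half(x).shape)   # 4*4: a stride of 2 roughly halves them
print(none(x).shape)   # 5*5: without padding the output shrinks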

The padding and stride are some features used in CNN.

Parameters in a convolution layer

The following are the terms needed for calculating the parameter for a convolution layer.

Input layer

Width Wi – width of input image

Height Hi – height of input image

Depth Di – 3 since they follow RGB

We saw that a 7*7 input with a 3*3 kernel, no padding, and a stride of 1 gave a 5*5 output. It can be verified using the formula Wo = (Wi - f + 2P)/S + 1: here (7 - 3 + 0)/1 + 1 = 5.

The role of padding can also be verified using this calculation: with P = 1, (7 - 3 + 2)/1 + 1 = 7, so the original dimension is preserved.

The f is known as the filter size; it can be 1*1, 3*3, and so on. Since filters are square, a single value is enough to describe them. There is another term, K, which refers to the number of kernels used; this value is fixed by the user.
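A minimal sketch of these two calculations as Python functions:

def conv_output_width(wi, f, p, s):
    """Wo = (Wi - f + 2P)/S + 1, with integer division as in most frameworks."""
    return (wi - f + 2 * p) // s + 1

def conv_weights(f, d_in, k):
    """Weights in one convolution layer: K filters of size f*f*D_in (biases ignored)."""
    return f * f * d_in * k

print(conv_output_width(7, 3, 0, 1))    # 5  -> the 7*7 to 5*5 example
print(conv_output_width(224, 3, 1, 1))  # 224 -> padding preserves the width
print(conv_weights(3, 3, 10))           # 270 -> ten 3*3*3 kernels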

These filter values are similar to w and b: the machine learns the ideal values for these parameters for high efficiency. The significance of partial connectivity in CNN can be easily understood through the parameter counts.

Consider the same example of a (30*30*3) input, i.e. 2,700 values. A convolution layer using 10 kernels of size 3*3*3 needs only 3*3*3*10 = 270 weights, because the same small filters are shared across the whole image. But if the same input is fed to a fully connected layer of just 1,000 neurons, the parameters are already 30*30*3*1000 = 2.7 million, about ten thousand times more, and this grows further with every layer. This is significantly larger than the CNN, and the reason for this large number is the full connectivity.

Parameters: CNN = 3*3*3*10 = 270; FNN = 30*30*3*1000 = 2.7M

HAPPY READING!!

DEEP LEARNING SERIES- PART 6

The previous article was about the procedure to develop a deep learning network and an introduction to CNN. This article concentrates on the process of convolution: taking in two inputs (an image and a filter) and performing a transformation to produce an output image. This process is also common in mathematics and signal analysis. CNNs are mainly used to work with images.

In the CNN, partial connection is observed: not all the neurons are connected to those in the next layer, so the number of parameters reduces, leading to fewer computations. This sparse connectivity is characteristic of CNNs.

Convolution in mathematics refers to the process of combining two different functions. With respect to CNN, convolution occurs between the image and the filter or kernel. Convolution itself is one of the processes done on the image.

Here also the operation is mathematical: it is a kind of operation on two vectors. The input image gets converted into a vector based on colour and dimensions. The kernel, or filter, is a predefined vector with fixed values chosen to perform a particular function on the image.

Process of convolution

The kernel or filter is chosen with dimensions of 1*1, 3*3, 5*5, 7*7, and so on. The filter vector slides over the image vector and performs a dot product at each position, producing an output vector containing the result of each 3*3 dot product over the 7*7 input.

A 3*3 kernel slides over the 7*7 input vector to produce a 5*5 output vector. The reason for the reduction in dimension is that the kernel can only compute a dot product where it fits entirely inside the input: all the cells of the kernel must superimpose onto the vector, with none left outside. There are only 5 ways to place a 3-row filter within a 7-row vector, since 7 - 3 + 1 = 5.

This pictorial representation can help to understand even better. These colors might seem confusing, but follow these steps to analyze them.

  1. Look at the first row.
  2. Analyse and number the different colours used in that row
  3. Each colour represents a 3*3 kernel.
  4. In the first row the different colours are red, orange, light green, dark green and blue.
  5. They count up to five.
  6. Hence there are five ways to keep a 3 row filter over a 7 row vector.
  7. Repeat this analysis for the remaining rows.
  8. 25 different colours will be used in total. The math: there are 5 placements per row, and only 5 valid row positions for the kernel, giving 5*5 = 25 placements.
  9. The colours do not go beyond the valid region, signifying that the kernel cannot go beyond the dimensions of the input vector.

These are the 25 different ways to place a 3*3 filter over a 7*7 image vector, one for each cell of the 5*5 output. From this diagram, we can see that each row has five different colours. All nine cells of the kernel must fit inside the vector; this is the reason for the reduction in the dimension of the output vector.

Procedure to implement convolution

  1. Take the input image with given dimensions.
  2. Convert it into a vector whose values represent the colour of each pixel in the image. This is the input vector.
  3. Decide the dimensions, quantity, and values of the filters. The values in a filter depend on the function needed, like blurring, sharpening, edge detection, etc.; the quantity and dimensions are determined by the user.
  4. Take the filter and keep it over the input vector from the first cell. Assume a 3*3 filter kept over a 7*7 vector.
  5. Perform the following computations on them.

5a. Take the value in the first cell of the filter and the value in the corresponding cell of the vector.

5b. Multiply them.

5c. Take the values in the second cell of the filter and the vector.

5d. Multiply them.

5e. Repeat the procedure till the last cell.

5f. Take the sum of all nine products.

  • Place this value in the output vector.
  • Using the formula Wo = (Wi - f + 2P)/S + 1 (explained in the next article), find the dimensions of the output vector. (A code sketch of this procedure follows.)
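A minimal sketch of this sliding-window procedure in NumPy (single channel, one filter, no padding, stride 1):

import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image and take a dot product at each position."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1            # 7*7 input, 3*3 kernel -> 5*5 output
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            region = image[i:i + kh, j:j + kw]   # the patch under the kernel
            out[i, j] = np.sum(region * kernel)  # multiply cell by cell, then sum
    return out

image = np.arange(49, dtype=float).reshape(7, 7)
kernel = np.ones((3, 3)) / 9.0                   # an averaging (blurring) filter
print(convolve2d(image, kernel).shape)           # (5, 5): 25 placements in total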

HAPPY LEARNING!!

DEEP LEARNING SERIES- PART 5

The previous article was on algorithm and hyper-parameter tuning. This article is about the general steps for building a deep learning model and also the steps to improve its accuracy along with the second type of network known as CNN.

General procedure to build an AI machine

  1. Obtain the data in the form of excel sheets, CSV (comma-separated values) files, or image datasets.
  2. Perform some pre-processing onto the data like normalisation, binarisation etc. (apply principles of statistics)
  3. Split the given data into training data and testing data. Give more data to training, since more training data gives better accuracy. The standard train-test split ratio is 75:25.
  4. Define the class for the model. Class includes the initialisation, network architecture, regularisation, activation functions, loss function, learning algorithm and prediction.
  5. Plot the loss function and interpret the results.
  6. Compute the accuracy for both training and testing data and check onto the steps to improve it.

Steps to improve the accuracy

  1. Increase the training and testing data. More data can increase the accuracy since the machine learns better.
  2. Reduce the learning rate. High learning rate often affects the loss plot and accuracy.
  3. Increase the number of iterations (epochs). Training for more epochs can increase the accuracy.
  4. Hyper parameter tuning. One of the efficient methods to improve the accuracy.
  5. Pre-processing of data. It becomes hard for the machine to work on data with different ranges, hence it is recommended to standardise the data within a range of 0 to 1 for easy working (a sketch follows this list).
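A minimal sketch of the splitting and 0-to-1 standardisation steps, assuming scikit-learn and toy data:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X = np.random.rand(100, 4) * 500          # toy features with a large range
y = np.random.randint(0, 2, size=100)     # toy binary labels

X = MinMaxScaler().fit_transform(X)       # rescale every feature into 0..1

# Standard 75:25 train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)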

These are some of the processes used to construct a network. Only basics have been provided on the concepts and it is recommended to learn more about these concepts. 

Implementation of FFN in detecting OSTEOARTHRITIS (OA)

Advancements in the detection of OA have occurred through AI: machines have been created to detect OA from patients' X-ray images. Since the input is images, optimum performance can be obtained using CNNs, and since the output is binary, the task is binary classification. A combination of CNN and FFN is used: the CNN handles feature extraction, i.e. converting the image into a form accepted by the FFN without changing the values, and the FFN classifies the image into the two classes.

CNN-convolutional neural network

The convolutional neural network mainly works on image data and is used for feature extraction from the image. It is a partially connected neural network. Images can be interpreted by us but not directly by machines; machines interpret an image as a vector whose values represent the colour intensities of the image. Every colour can be expressed as a 3-D vector known as RGB (Red, Green, Blue), and the size of the vector equals the dimensions of the image.


This type of input is fed into the CNN. Several processing steps are applied to the image before classifying it. The combination of CNN and FNN serves the purpose of image classification.

Problems in using FFN for images

  • We have seen earlier that the gradients are chain-rule products of gradients at different layers. Image data has thousands of inputs, so a network can end up with millions of parameters, and it is very tedious to find the gradients for millions of parameters.
  • Using FFN for image data can often overfit the data, due to the large layers and large number of parameters.

The CNN can overcome the problems seen in FFN.

HAPPY LEARNING!!!

DEEP LEARNING SERIES- PART 4

The previous article dealt with the networks and the backpropagation algorithm. This article is about the mathematical implementation of the algorithm in FFN followed by an important concept called hyper-parameter tuning.

In this FFN we apply the backpropagation to find the partial derivative of the loss function with respect to w1 so as to update w1.

Hence using backpropagation the algorithm determines the update required in the parameters so as to match the predicted output with the true output. The algorithm which performs this is known as Vanilla Gradient Descent.

The way of reading the input is determined using the strategy.

Strategy | Meaning
Stochastic | Read the inputs one by one
Batch | Use the entire input at once
Mini-batch | Split the input into small batches

The sigmoid here is one type of activation function: the function that transforms the input of a particular neuron into its output. Differentiating the activation function gives the corresponding terms in the gradients.

There are two common phenomena seen in training networks. They are

  1. Under fitting
  2. Over fitting

If the model is too simple to learn the data then the model can underfit the data. In that case, complex models and algorithms must be used.

If the model is too complex relative to the data, the model can overfit. This can be visualized through the differences between the training and testing loss curves. The method adopted to correct this is known as regularisation. Overfitting and underfitting can be visualized by plotting the training and testing accuracies over the iterations; a perfect fit is represented by the two curves overlapping.

Regularisation is the procedure to prevent the overfitting of data. Indirectly, it helps in increasing the accuracy of the model. It is either done by

  1. Adding noise to the input.
  2. Finding the optimum number of iterations by early stopping.
  3. Normalising the data (applying a normal distribution to the input).
  4. Forming subsets of the network and training them using dropout.

So far we have seen a lot of examples for a lot of procedures. There will be confusion arising at this point on what combination of items to use in the network for maximum optimization. There is a process known as hyper-parameter tuning. With the help of this, we can find the combination of items for maximum efficiency. The following items can be selected using this method.

  1. Network architecture
    • Number of layers
    • Number of neurons in each layer
  2. Learning algorithm
    • Vanilla gradient descent
    • Momentum-based GD
    • Nesterov accelerated gradient
    • AdaGrad
    • RMSProp
    • Adam
  3. Initialisation
    • Zero
    • He
    • Xavier
  4. Activation functions
    • Sigmoid
    • Tanh
    • ReLU
    • Leaky ReLU
    • Softmax
  5. Strategy
    • Batch
    • Mini-batch
    • Stochastic
  6. Regularisation
    • L2 norm
    • Early stopping
    • Addition of noise
    • Normalisation
    • Drop-out

All these six categories are essential in building a network and improving its accuracy. Hyperparameter tuning can be done in two ways:

  1. Based on the knowledge of task
  2. Random combination

The first method involves determining the items based on the knowledge of the task to be performed. For example, if classification is considered then

  • Activation function – softmax in the output layer and sigmoid for the rest
  • Initialisation- zero or Xavier
  • Strategy- stochastic
  • Algorithm- vanilla GD

The second method involves the random combination of these items and finding the best combination for which the loss function is minimum and accuracy is high.
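A minimal sketch of the random-combination approach; the search space and the scoring function are hypothetical stand-ins:

import random

# A hypothetical search space covering the six categories above
space = {
    "layers": [2, 3, 4],
    "neurons": [32, 64, 128],
    "algorithm": ["vanilla-gd", "momentum", "adam", "rmsprop"],
    "activation": ["sigmoid", "tanh", "relu"],
    "strategy": ["batch", "mini-batch", "stochastic"],
    "regularisation": ["l2", "dropout", "early-stopping"],
}

def sample_config():
    return {k: random.choice(v) for k, v in space.items()}

def train_and_score(config):
    # Stand-in: in reality, build and train a model with this
    # configuration and return its validation accuracy.
    return random.random()

best = max((sample_config() for _ in range(20)), key=train_and_score)
print(best)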

Hyperparameter tuning would already be done by researchers who finally report the correct combination of items for maximum accuracy.

HAPPY READING!!!

DEEP LEARNING SERIES- PART 3

The previous article gave some introduction to the networks used in deep learning. This article provides more information on the different types of neural networks.

In a feed-forward neural network (FFN), all the neurons in one layer are connected to the next layer. The advantage is that all the information processed by the previous layer is fed to the next, giving clarity in the process. But the number of weights and biases increases significantly when there is a large number of inputs. This method is best used for text data.

In a convolutional neural network (CNN), only some of the neurons are connected to the next layer, i.e. the connection is partial. Information is fed into the next layer batch-wise. The advantage is that the number of parameters reduces significantly when compared to an FFN. This method is best used for image data, since there will be thousands of inputs.

In recurrent neural networks, the output of one neuron is fed back as an input to the neuron in the previous layer. A feed-forward and a feedback connection are established between the neurons. The advantage is that the neuron in the previous layer can perform efficiently and can update based on the output from the next neuron. This concept is similar to reinforcement learning in the brain. The brain learns an action based on punishment or reward given as feedback to the neuron corresponding to that action.

Once the final output is computed by the network, it is compared with the true value, and their difference is taken in some form, like the difference of squares. This term is known as the loss function.

It is worth explaining the role of the learning algorithm here. The learning algorithm tries to find the relation between the input and output. In neural networks, the output is indirectly related to the input, since there are hidden layers between them. The learning algorithm works to find the optimum w and b values for which the loss function is minimum, or ideally zero.

The algorithms in neural networks do this using a method called backpropagation. The algorithm starts tracing from the output: it computes the updates for the parameters of the neurons in that layer, then goes back to the previous layer and does the computations for the parameters of its neurons, and so on until it reaches the inputs. In this way, we can find the optimum values for the parameters.

The computations made by the algorithm depend on the type of algorithm. Most algorithms find the derivative of the loss function with respect to each parameter using backpropagation, and this derivative, scaled by the learning rate, is subtracted from the current value:

w = w - lr * dL/dw,  b = b - lr * dL/db

where lr is the learning rate, provided by the user. The smaller the learning rate, the better the results, but the more time training takes. The starting values for w and b are determined using the initialization.

Method | Meaning
Zero | w and b are set to zero
Xavier | w and b are inversely proportional to √n
He | w and b are inversely proportional to √(n/2)

where n refers to the number of neurons in a layer. The choice depends on the activation function used.

The sign of the derivative of the loss function determines how a parameter is updated.

Value of derivative | Consequence
-ve | Parameter increases
0 | No change
+ve | Parameter decreases

The derivative of the loss function with respect to the weight or bias in a particular layer can be determined using the chain rule used in calculus.
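A minimal sketch of the chain rule and the update step for a single sigmoid neuron with a squared loss:

import math

# One sigmoid neuron trained by gradient descent on a single example
x, y_true = 1.5, 1.0
w, b, lr = 0.1, 0.1, 0.5

for step in range(100):
    y = 1 / (1 + math.exp(-(w * x + b)))   # forward pass
    # Chain rule for L = (y - y_true)**2:
    dL_dy = 2 * (y - y_true)
    dy_dz = y * (1 - y)                    # derivative of the sigmoid
    w -= lr * dL_dy * dy_dz * x            # w = w - lr * dL/dw
    b -= lr * dL_dy * dy_dz                # b = b - lr * dL/db

print(round(y, 3))                         # the prediction approaches 1.0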

HAPPY READING!!

DEEP LEARNING- PART 2


The previous article gave a brief introduction to deep learning. This article deals with the networks used in deep learning, known as neural networks. As the name suggests, the network is made up of neurons.

The networks used in artificial intelligence are combinations of blocks arranged in layers. These blocks are called artificial neurons; they mimic the properties of natural neurons. One such neuron is the sigmoid neuron.

The sigmoid function is, in general, y = 1/(1 + exp(-(wx + b))). Every neural network consists of weights and biases.

Weights – the scalar quantities that get multiplied with the inputs

Biases – the threshold quantity above which a neuron fires

Notation | Meaning
x | Input
y | Output
w | Weight
b | Bias

Working of a neuron

This is a simple representation of a neuron, similar to the biological neuron. The inputs are given along with priorities known as weights: the higher the value of a weight, the more prioritized that input is. This is the reason our brain chooses one activity over another. An activity is carried out only if its neuron fires; similarly here, the result is forwarded to the next layer only if this particular neuron fires, that is, only if an output is produced from the neuron.

Condition for the neuron to fire

The neuron will produce an output only if the inputs satisfy the condition

w1*x1 + w2*x2 + ... + wn*xn >= b

As mentioned before, the bias is the threshold value, and the neuron fires only when the weighted sum of all the inputs crosses this bias.
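A minimal sketch of this firing condition:

def neuron_fires(inputs, weights, bias):
    """The neuron fires only if the weighted sum of its inputs crosses the bias."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return weighted_sum >= bias

print(neuron_fires([1, 0, 1], weights=[0.6, 0.4, 0.3], bias=0.8))  # True: 0.9 >= 0.8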

Classification of networks

Every neural network majorly consists of three layers:

  1. Input layer
  2. Hidden layer
  3. Output layer

Input layer

The input layer consists of inputs in the form of vectors. Images are converted into 1-D vectors. Input can be of any form like audio, text, video, image, etc. which get converted into vectors.

Hidden layer

This is the layer in which all the computations occur. It is generally not visible to the user, hence the name hidden layer. There may be a single layer or multiple layers based on the complexity of the task. Each layer processes a part of the task and sends it on: the incoming vector is multiplied with a weight matrix of the correct dimensions, and the result is passed to the next layer.

Output layer

The output layer gets information from the last hidden layer and is the last stage in the network. Its form depends upon the task given by the user; the output will be a 1-D vector. In the case of classification, the vector will have a high value for the predicted class. In the case of regression, the output vector will contain the numbers that answer the question posed by the user.

The next article is about the feed-forward neural network.

HAPPY LEARNING!!

DEEP LEARNING SERIES- PART 1

Have you ever wondered how the brain works? One way of understanding it is by cutting open the brain and analyzing the structures present inside it. This however can be done by researchers and doctors. Another method is by using electricity to stimulate several regions of the brain. But what if I say that it is possible to analyze and mimic the brain in our computers? Sounds quite interesting right! This particular technology is known as deep learning.

Deep learning is the technique of producing networks that process unstructured data and give output. With its help, it is possible to build and use brain-like networks for various tasks in our systems; it is like using the brain without taking it out. Deep learning is more advanced than machine learning and imitates the brain better, and the networks built using deep learning consist of parts known as neurons, similar to biological neurons. Artificial intelligence has attracted researchers in every domain for the past two decades, especially in the medical field, where AI is used to detect several diseases in healthcare.

Sl. no | Name | Description | Examples
1 | Data | Type of data provided as input | Binary (0, 1), Real
2 | Task | The operation required on the input | Classification (binary or multiclass), Regression (prediction)
3 | Model | The mathematical relation between input and output; varies with the task and complexity | MP neuron (y = x + b), Perceptron (y = wx + b), Sigmoid or logistic (y = 1/(1 + exp(-(wx + b)))); w and b are the parameters of the model
4 | Loss function | A measure of the error between the predicted and actual output (how much the o/p leads or lags the true value) | Square error: square of the difference between the predicted and actual output
5 | Algorithm | A learning procedure that tries to reduce the error computed before | Gradient descent, NAG, AdaGrad, Adam, RMSProp
6 | Evaluation | Finding how good the model has performed | Accuracy, Mean accuracy

Every model in this deep learning can be easily understood through these six domains. Or in other words, these six domains play an important role in the construction of any model. As we require cement, sand, pebbles, and bricks to construct a house we require these six domains to construct a network.

Now it will be easier to understand the general procedure for building networks.

  1. Take in the data (inputs and their corresponding outputs) from the user.
  2. Perform the task as specified by the user.
  3. Apply the specific relation declared by the user in the form of the model, assigning values to the model's parameters, to compute the predicted output.
  4. Find the loss the model has made by computing the difference between the predicted and actual output.
  5. Use a suitable learning algorithm to minimize the loss by finding the optimum values for the parameters in the network.
  6. Run the model and evaluate its performance to find its efficiency, and enhance it if found lacking.

By following these steps correctly, one can develop their own machine. In order to learn this better, pursuing AI either through courses or as a major is highly recommended, because understanding these concepts requires various divisions of mathematics, like statistics, probability, calculus, vectors, and matrices, apart from programming.


HAPPY READING!!

IMMUNOLOGY SERIES- PART 9- VACCINES

The previous article was all about the process of inflammation. This article is about vaccines.

Vaccines fall under the type of artificial active acquired immunity: artificial because the vaccine is given externally, active because the body itself generates the antibodies and the response, and acquired because the immunity is gained rather than present from birth. By now you must know what immunity is.

A vaccine is a biological preparation that provides active acquired immunity to a particular infectious disease. A vaccine typically contains an agent that resembles a disease-causing microorganism and is often made from weakened or killed forms of the microbe, its toxins, or one of its surface proteins (antigens). So these vaccines are nothing but the pathogen itself but it cannot cause any disease, instead, it triggers the immune system.

Here is a quick recap of the working principle of vaccines. The vaccine contains the pathogen as a whole or only its surface antigens. These antigens stimulate the immune system. If the immune system had a memory of this antigen, it would immediately produce an antibody, and phagocytosis of the antigen by macrophages, aided by the antibody, would follow. In this scenario the antigen is new and there is no memory, so the immune system struggles and takes time to produce the antibody.

So the antigens reign over the body for a while, and this can lead to inflammation. As a result, some of the symptoms of inflammation, like fever, heat, pain at the site of injection, and swelling, may appear. The chances of these are rare and the severity is low (lasting a few hours or days), since the pathogen is attenuated.

Once the immune system produces the correct antibody, phagocytosis of the antigen occurs and hence the causative agent is eliminated from the body (primary response). So if the same or similar pathogen which has disease-causing ability enters into the body, the memory triggers the immune system to produce the correct antibody. So a heightened and rapid response is generated in order to kick away the pathogen quickly (secondary response).

There are three types of vaccines:-

Live – infection is caused without any harm – measles & polio

Dead (killed) – immunity doesn't last long, requires booster doses – cholera

Microbial products – involves non-infectious parts of the pathogen, such as the capsule or a toxoid – anthrax, diphtheria

Hence using the vaccine as a stimulus, the body is able to generate a response that is stored and can be useful for preventing the disease caused by the pathogen.

There might be an idea to generate vaccines for all diseases so that all humans are protected. But there are some difficulties in this which are listed below:-

There are new microbes being discovered every day, and no one knows which microbe can cause disease. Multiple microbes can cause the same or similar diseases, so being immune to one microbe doesn't mean being immune to the disease.

The disease-causing microbe can undergo mutation, meaning there can be changes in its genetic material and hence the antigen can change. In this case, the antibody stimulated by the vaccine won't work. A suitable example is the common cold: it is impossible to produce a vaccine that covers all mutants of the viruses.

The pathogen has to be genetically modified so as to remove its disease-causing ability, which is easy to say but difficult to implement.

Also, the antigen chosen for the vaccine must be close to that of the original causative agent of the disease; if it is not, the vaccine will not work.

All the points above explain the difficulties in producing a vaccine. Despite them, research organizations in many countries have produced vaccines, especially for pandemics and dreadful diseases like COVID-19, hepatitis, and polio, and some of these vaccines provide lifetime immunity. We must take a minute to appreciate those whose immense work and contributions have stopped some of these dreadful diseases.

With this, we come to the end of the series. I hope all the concepts explained here were simple and clear, and that they have inculcated an interest in immunology. By now it should be clear how the immune system protects us from several microbes and diseases.

HAPPY LEARNING!!

IMMUNOLOGY SERIES- PART 8- INFLAMMATION

The previous article dealt with the types and functions of immunoglobulin. This article provides a complete explanation of the process of inflammation.

Inflammation is the process of protection which was seen as one of the six mechanisms of innate immunity.

Inflammation is one of the body's responses to the invasion of foreign particles. It is an important process in the human body that occurs to drive away the pathogen, and it is one of the stages seen in healing. Some of the changes that can be seen at the target site are:

  • Changes in blood flow (mostly blood loss)
  • Increase in platelets (to plug the damaged vessel)
  • Increase in immune cells
  • Supply of nutrients

The word inflammation refers to a burning sensation. There are five cardinal signs of inflammation, namely:

  • Rubor (redness)
  • Tumor (swelling)
  • Calor (heat)
  • Dolor (pain)
  • Functio laesa (loss of function)

These cardinal signs as well as the changes occur due to some mediators which are basically chemicals and also due to the action of various immune cells.

Mediator name | Effect
Bradykinin, histamine, serotonin | Increase vascular permeability
Prostaglandin | Decreases blood pressure
Cytokines | Produce fever
Toxic metabolites | Damage tissue

This inflammation can be either acute or chronic. As seen earlier, acute inflammation stays for a shorter time but produces more vigorous pain, whereas chronic inflammation stays longer with less vigorous pain. If the causative agent has been driven away, healing occurs either by complete restoration or by scar formation. There are chances that acute inflammation becomes chronic, which can be worse and can lead to several diseases and complications.

For a pathogen to establish its supremacy in the human body, it first has to pass through the epidermis, the outermost layer of the body. The epidermis contains skin-associated lymphoid tissue (SALT), so T and B lymphocytes are prominent in the skin, and most pathogens get destroyed at this stage. Let us assume that our pathogen is strong and has passed through it. The next layer it encounters is the dermis. As we go deeper into the skin, more and more immune cells get involved. In the dermis, the following immune cells are seen:

  • Macrophage
  • NK cells
  • Mast cell – produce histamine and serotonin
  • T helper cells – it provides help to other immune cells

The next stage is the hypodermis, which has a large number of macrophages and neutrophils that phagocytose the pathogen. These successive layers help in defending against the pathogens.

When a particular pathogen, say a virus, enters the cell, the immune system gets alerted through signals and immediately sends the correct immune cells to the target site. This occurs when the immune system already has information about the pathogen, gained either naturally through previous infection or artificially through a vaccine. This leads to the classification of immunity in humans.

So now we will consider a new, strong pathogen that has not been recognized by the immune system, has dodged those barriers, and has entered the skin. It multiplies at a rapid rate and colonizes that particular area. The cells in that area start to die, and they release several signals like TNF, cytokines, and interleukins. These combine with other signals, like the histamine and serotonin released from immune cells. Some of these signals produce direct effects on the target site, as seen in the table.

This array of signals triggers the immune system, which in turn starts the inflammation process, and the cardinal signs are observed. The process lasts for some time; as it proceeds, the pathogens decrease in number through phagocytosis and subsequently vanish from the body. This can be observed as a decrease in the signs. After this, the targeted site starts to heal, and the immune system learns how to defend against the pathogen when it enters the next time.

Now the damage caused by the pathogen has to be repaired by the process of healing, which has four phases:

  1. Haemostasis
  2. Inflammation
  3. Proliferation
  4. Maturation/Remodelling

The pathogen ruptures and damages the outer layer of the blood vessels, known as the endothelium, resulting in blood loss. The blood vessels start to contract to prevent further loss, and a plug is formed at the site of leakage by the platelets (haemostasis). Then the process of inflammation occurs, clearing out the dead cells and the pathogen. In the proliferative stage, new blood vessels are formed by a process known as neovascularisation, and new epithelium is formed. In the last phase, the newly formed tissue becomes stronger and more flexible. The combination of these steps brings the affected area back to normal.

Hence inflammation is an essential process in the immune system, and it has to occur to defeat the designs of the microbes. The next article is about vaccines and their principle of working.

HAPPY READING!!!

BOOK REVIEW: "THE WIZARD OF OZ"

ABOUT THE AUTHOR

L. Frank Baum was an American author, born on May 15, 1856, in Chittenango, New York. He wrote 14 novels on Oz, plus 41 other novels and many more works.

SUMMARY

Let's talk about one of the greatest literary works of L. Frank Baum, 'The Wizard of Oz', which became a classic of children's literature. The novel is about a girl named Dorothy, who lives in Kansas with her uncle Henry, her aunt Em, and her pet dog Toto. A sudden cyclone strikes and sweeps Dorothy and Toto away along with her uncle's farmhouse, dumping it in the Munchkin country of Oz and, in the process, killing the Wicked Witch of the East. Wanting to go back to her homeland, Dorothy embarks on a journey along the yellow brick road to the Emerald City of the great Wizard of Oz. On the way she makes friends with the Scarecrow, who wants a brain, the Tin Woodman, who wants a heart, and a cowardly Lion, who wants courage. After many adventures they reach the Emerald City and the great Wizard of Oz. The wizard lays down a condition: only if they kill the Wicked Witch of the West will their desires be fulfilled. They commence their journey to kill the witch, and after a lot of difficulties they succeed. On returning to the wizard, they are left shocked... Let me leave the summary on this note so the reader's curiosity is not killed.

THEME

The story has many themes: one must find strength in oneself and in one's friendships. The courage to tackle problems comes from within and from the good circle of friends who surround us. The grass is not greener on the other side; we should enjoy our present and stay contented from within. It also depicts that there is no place like home: one cannot find the happiness of family in a foreign land. Life throws many hurdles at you, but one must fight them with full potential and never lose hope.

IMMUNOLOGY SERIES- PART 7- TYPES OF IMMUNOGLOBULIN

The previous article dealt in detail with immunoglobulin and how they help in phagocytosis. This article is about the types of immunoglobulins, their functions.

The types of immunoglobulins are based on the types of light and heavy chains. There are two types of light chains, namely kappa and lambda. An immunoglobulin contains either kappa (K-K) or lambda (L-L) chains but never a mixture of both (K-L is not possible). About 60% of the immunoglobulins in humans have kappa chains.

The classes of immunoglobulins are based on the heavy chain. On this basis, there are five classes of immunoglobulins, namely:

  • Immunoglobulin G (IgG) – gamma
  • Immunoglobulin M (IgM) – mu
  • Immunoglobulin A (IgA) – alpha
  • Immunoglobulin D (IgD) – delta
  • Immunoglobulin E (IgE) – epsilon

These immunoglobulins have different configurations and play different roles in the human body. Immunoglobulin G is the most abundant, constituting about 80% of the total immunoglobulin. It is mostly present in the blood, plasma, and other body fluids. It has the lowest carbohydrate content of the five classes and, at 23 days, the longest half-life of all. Some of its unique features and functions:

  • This is the only immunoglobulin which can cross the placenta (this is a unique feature because this immunoglobulin provides immunity to the foetus inside the womb and also after birth for some months. Presence of others may indicate infection)
  • This helps in killing bacteria and viruses by opsonisation (the process of covering the pathogen with a protein coat such that the pathogens become more presentable to the immune cells)
  • Neutralize toxins
  • Activate complement by classical pathway (The complement system, also known as complement cascade, is a part of the immune system that enhances the ability of antibodies and phagocytic cells to clear microbes and damaged cells from an organism, promote inflammation, and attack the pathogen’s cell membrane)
  • Unique catabolism (breaking down of molecules) based on concentration
  • There are four sub classes (G1, G2, G3 and G4) out of which 1,3 and 4 cross the placenta and offer immunity
  • Also involved in Rh immunization (blood is either Rh+ve or Rh-ve based on the presence of the Rh factor in the blood). An Rh-ve mother carrying an Rh+ve child is not a problem in the first pregnancy but can be fatal in the second, killing the foetus.

Immunoglobulin M constitutes about 5-10% of the total immunoglobulin. It is a pentamer with a J chain, weighs about 900,000-1,000,000 Da (the heaviest of all), and has a half-life of 5 days. Some of its features:

  • Presence in a newborn indicates congenital infection, as IgM does not cross the placenta
  • Short-lived, so its presence indicates recent infection
  • First Ig to participate in the primary response
  • Opsonisation
  • Complement activation by the classical pathway
  • Bacterial agglutination
  • Plays an important role in ABO blood grouping (discovered by Landsteiner); there are 8 types of blood groups based on antigen, antibody, and Rh factor

Immunoglobulin A is also known as the secretory immunoglobulin and is mostly present in body secretions (tears, saliva, sebum, mucus, and milk), in which it is a dimer; in blood it is a monomer. It constitutes 10-15% of the total immunoglobulin, carries a J chain and a secretory piece, and has a half-life of 6-8 days.

  • The secretory piece protects the Ig from enzymes and juices
  • Complement activation by alternate pathway
  • Promote phagocytosis
  • Intracellular microorganism killing
  • First line of defense against some microbes

Immunoglobulin E is a monomer similar in structure to IgG. It is present in very low concentrations (about 0.3 µg/mL), weighs about 190,000 Da, has a half-life of about 2 days, and becomes inactivated at 56 °C.

  • Present extracellularly
  • Associated with allergic reactions like asthma, hay fever, and anaphylactic shock
  • Binds via its Fc region to mast cells and basophils, resulting in degranulation and the release of histamine, which causes allergy
  • Mediates some immunity reactions
  • No complement activation
  • Provides immunity against helminths

The last is immunoglobulin D. It is present in low concentrations on the surface of B lymphocytes and constitutes about 0.2% of the total immunoglobulin, with a half-life of 3 days. IgM and IgD bound on the B lymphocyte help in antigen identification.

Hence these were the different types of immunoglobulins and the mechanisms by which they help with immunity. The next article is about the process of inflammation.

HAPPY READING!!

IMMUNOLOGY SERIES- PART 6- IMMUNOGLOBULIN

The previous article was about the different types of immune cells. This article is about a special molecule in immunity known as immunoglobulin.

There might be a question as to what is so special about this immunoglobulin. There is a reason: these molecules play an important and inevitable role in the phagocytosis of pathogens. To understand this, it is essential to know about immunoglobulins.

The immunoglobulin is a gamma globulin, a specialized group of proteins (glycoproteins) produced in response to pathogens. It is produced by plasma cells (differentiated B lymphocytes). Immunoglobulins constitute 25-30% of the blood proteins.

There are two important terms that are commonly known to most people: the antigen and the antibody. The antigen is the molecule present on the surface of the pathogen that can stimulate an immune response. A small part of the antigen, called the epitope, interacts with the antibody; the epitope is known as the antigenic determinant site. An antigen can have many epitopes.

On the contrary, the antibody is the molecule produced in response to the antigen in order to eliminate it. The part of the antibody that interacts with the antigen is called a paratope; an antibody has at least 2 paratopes. Antibodies belong to the immunoglobulins: all antibodies are immunoglobulins, but not all immunoglobulins are antibodies. To understand how the antibody helps in immunity, it is essential to understand the structure of an antibody/immunoglobulin. The image below shows the general structure of an immunoglobulin:

There are two chains in an immunoglobulin, namely the light chain and the heavy chain. The light chain has about 212 amino acids (the building blocks of proteins) and the heavy chain about 450. Each chain has two types of regions, constant and variable, defined by the amino acid sequences. Half of the light chain (1 of 2 domains) is constant and the rest is variable; a quarter of the heavy chain (1 of 4 domains) is variable and the rest is constant. The chains are linked by two types of disulfide bonds, intra-chain (H-H and L-L) and inter-chain (H-L). These molecules contain carbohydrates (CHO), hence they are called glycoproteins.

The tips of the variable regions of the heavy and light chains are hypervariable in nature, and these constitute the antigen-binding site (Fab). They are hypervariable because they must produce amino acid sequences complementary to those of the antigen so that the two can interact. The other site is called the crystallizable region (Fc).

Having known all this, now it will be convenient to explain the process by which the antibody plays in the prevention of infections.

There are millions of substances passing through the blood every day, so there must be a criterion to identify whether a substance is pathogenic. This is where the antigen comes into play: the antigens present on the surface of pathogens alert the immune system, which then identifies the carrier as a pathogen. In response to the antigen, a suitable antibody is secreted and deployed to the target site. On reaching the antigen, the Fab region binds with it.

The ultimate aim of the immune system is to abolish the pathogen, and one way is by phagocytosing it. This is done by the macrophages, but it is essential for them to identify a substance before engulfing it. This is where the antibody comes into play: the Fc region of the antibody combines with the receptor of the macrophage, facilitating phagocytosis.

Hence the antibody acts like a bridge between the source (antigen) and the destination (macrophage), aiding phagocytosis. This is essential because in most cases it is difficult for macrophages to identify non-self objects on their own, and this is where the antibody helps.

In the case of a new pathogen, the antigen is new and there might not be a suitable antibody. In that case the macrophage cannot phagocytose the pathogen, and it reigns in the body, causing infection and disease.

The next article is about the types of immunoglobulins.

HAPPY LEARNING!!