DeepLearning.ai Study Group II
Report of Week 11
Deep Learning Study Group II is a 16-week study group in which we cover an advanced deep learning study series for AI enthusiasts and computer engineers. We follow the materials on https://www.deeplearning.ai each week and get together on Saturdays to discuss them.
On February 16, we continued to study Convolutional Neural Networks further, together with Ahmet Melek, by examining various case studies to learn how to build a well-functioning convolutional neural network on our own.
Melek started the session by reminding us of the concepts we covered during last week’s discussions, such as pooling, padding, and stride. He then underlined the importance of studying neural networks previously built by others, to see which tasks they perform and how, so that we can apply them to our own computer vision problems. When building a neural network architecture for such problems, we can start from a number of “classic” networks: LeNet-5, AlexNet, and VGGNet. Let’s take a look at the distinctive features of these models.
LeNet-5
7 layers (3 convolutional, 2 pooling, 1 fully connected, plus the output layer)
Used to classify handwritten grayscale digits
Softmax classifier at the output (the original network used a different output classifier)
The deeper you go (from left to right), the more the height and width shrink and the more the channels increase
Sigmoid/tanh activation functions
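As a rough illustration of how the height and width shrink layer by layer, the standard convolution output-size formula, floor((n + 2p - f) / s) + 1, can be traced through a LeNet-5-style stack. The 32x32 input, 5x5 filters, and 2x2 pooling below follow the classic configuration; this is a sketch, not the session’s code:

```python
def conv_out(n, f, stride=1, pad=0):
    """Output size of a convolution/pooling layer: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * pad - f) // stride + 1

# LeNet-5-style progression (32x32x1 grayscale input)
n = 32
n = conv_out(n, 5)            # conv1, 5x5 filters: 32 -> 28 (6 channels)
n = conv_out(n, 2, stride=2)  # pool1, 2x2 stride 2: 28 -> 14
n = conv_out(n, 5)            # conv2, 5x5 filters: 14 -> 10 (16 channels)
n = conv_out(n, 2, stride=2)  # pool2, 2x2 stride 2: 10 -> 5
print(n)  # 5 -> flattened into 5*5*16 = 400 units feeding the fully connected layers
```

The spatial size shrinks at every step while the channel count grows, exactly the pattern noted above.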
AlexNet
The first convolutional neural network to gain wide recognition (it won the ImageNet competition in 2012)
Similar to LeNet-5 but much deeper
8 layers (5 convolutional, 3 fully connected)
ReLU applied after every convolutional and fully connected layer
Originally applied local response normalization after the ReLUs, but this is considered unnecessary today
Uses dropout to address overfitting
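The ReLU activation and dropout mentioned above are simple to sketch in plain NumPy. This is an illustrative sketch; AlexNet’s actual keep probability and layer placement are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # ReLU: zero out negative activations; applied after every conv and FC layer
    return np.maximum(0, x)

def dropout(x, keep_prob=0.5, training=True):
    # Inverted dropout: randomly zero units during training and rescale the
    # survivors so the expected activation stays the same at test time
    if not training:
        return x
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

a = relu(np.array([-2.0, 0.5, 3.0]))   # negatives become 0, positives pass through
b = dropout(np.ones(10), training=False)  # at inference, dropout is a no-op
```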
VGGNet (VGG-16)
16 layers with weights (13 convolutional, 3 fully connected)
Replaces large filters with stacks of smaller (3x3) filters, which increases the depth of the model
Easy to implement since the architecture is highly uniform
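A quick parameter count shows why stacking 3x3 filters pays off: two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution, but with fewer weights and one extra non-linearity. The channel count below is a hypothetical example:

```python
def conv_params(f, c_in, c_out):
    # Weights in one convolutional layer (biases omitted): f * f * c_in per filter
    return f * f * c_in * c_out

c = 256  # hypothetical channel count, kept constant through the block
single_5x5 = conv_params(5, c, c)        # one 5x5 layer
stacked_3x3 = 2 * conv_params(3, c, c)   # two 3x3 layers, same 5x5 receptive field
print(stacked_3x3 < single_5x5)  # True: 18*c*c weights versus 25*c*c
```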
Apart from the traditional networks mentioned above, Melek went on to introduce some newer ones, which are popular for their ability to train much deeper networks without sacrificing accuracy.
The first is the Residual Neural Network (ResNet for short), which helps us avoid the vanishing gradient problem by building the model out of residual modules with skip connections.
Reduced filter sizes
More than 100 layers
Allows depth to increase without the training error increasing
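The residual idea can be sketched with a toy fully connected block in NumPy: the shortcut adds the input straight to the block’s output, so even if the weight layers contribute nothing, the signal (and its gradient) still passes through. This is an illustrative sketch, not ResNet’s actual convolutional blocks:

```python
import numpy as np

def residual_block(x, weight1, weight2):
    """Toy fully connected residual block: output = ReLU(F(x) + x).

    The shortcut (+ x) lets information and gradients flow past the weight
    layers, which is what mitigates the vanishing-gradient problem in very
    deep networks.
    """
    a = np.maximum(0, x @ weight1)   # first layer + ReLU
    f = a @ weight2                  # second layer: the residual F(x)
    return np.maximum(0, f + x)      # add the skip connection, then ReLU

x = np.ones(4)
# Zero weights make F(x) = 0, so the block passes x through unchanged:
out = residual_block(x, np.zeros((4, 4)), np.zeros((4, 4)))
print(out)  # [1. 1. 1. 1.]
```

Learning the identity mapping is therefore trivial for a residual block, which is why extra layers do not hurt training.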
Before introducing the next model, Melek mentioned the 1x1 convolution, a feature-pooling technique used to reduce dimensions, adjust the number of channels, and add non-linearity to a model. These properties make 1x1 convolutions the key ingredient of the Inception Network, a type of neural network that gets more work done with less computational power.
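Concretely, a 1x1 convolution is just a per-position linear map across the channel dimension, which is easy to sketch in NumPy. The 192-to-32 channel reduction below is a hypothetical example:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random((28, 28, 192))   # feature map: height x width x channels
w = rng.random((192, 32))       # 32 filters, each of size 1x1x192

# A 1x1 convolution mixes channels independently at every spatial position:
y = np.maximum(0, x @ w)        # the ReLU adds the non-linearity
print(y.shape)  # (28, 28, 32): channels reduced from 192 to 32
```

The spatial dimensions are untouched; only the channel count changes, which is what makes 1x1 convolutions a cheap bottleneck before expensive larger filters.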
To understand the Inception Network, we first need to grasp the inception module, which applies filters of several sizes in parallel to the output of the earlier layers and concatenates the results. The module lets a sparse convolutional architecture be approximated by dense building blocks.
Melek also pointed out that if the 1x1 convolutions applied before the larger filters are sized reasonably, performance does not suffer. The inception module first applies a 1x1 convolution to the input and then stacks the outputs of the various convolutions. In addition, inception networks attach side branches with their own softmax outputs to hidden units in the early layers, since those layers receive only a weak signal from the final softmax layer. These side branches also have a regularizing effect, helping to prevent overfitting.
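A minimal sketch of the module’s shape bookkeeping, assuming hypothetical branch widths and standing in for the 3x3 and 5x5 convolutions with simple channel-mixing maps: every branch preserves the spatial size (“same” padding in a real network), so the outputs can be concatenated along the channel axis:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.random((28, 28, 192))   # input feature map

def conv1x1(t, c_out):
    # 1x1 convolution as a per-position channel mixing with ReLU
    w = rng.random((t.shape[-1], c_out))
    return np.maximum(0, t @ w)

branch_1x1 = conv1x1(x, 64)                 # plain 1x1 branch
branch_3x3 = conv1x1(conv1x1(x, 96), 128)   # 1x1 bottleneck, then (omitted) 3x3
branch_5x5 = conv1x1(conv1x1(x, 16), 32)    # 1x1 bottleneck, then (omitted) 5x5
branch_pool = conv1x1(x, 32)                # pooling branch followed by 1x1

# All branches keep 28x28, so they stack along the channel axis:
out = np.concatenate([branch_1x1, branch_3x3, branch_5x5, branch_pool], axis=-1)
print(out.shape)  # (28, 28, 256), i.e. 64 + 128 + 32 + 32 channels
```

The bottleneck 1x1 convolutions (192 to 96, 192 to 16) are what keep the larger filters affordable.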
Since we are dealing with a huge amount of data, training models from scratch takes too much time and computational power. Melek therefore highlighted the importance of using open source implementations and transferring the knowledge someone else gained by training their network to our own model. This method is called Transfer Learning: a technique used when we already have a model trained on one task and re-purpose it for a second, related task.
By doing that, we may start from a higher performance curve and end up with a higher asymptote. This technique is therefore highly recommended when resources and time are limited, especially in computer vision projects. With a small dataset, we can freeze the parameters in all of the pre-trained layers and train only the parameters of our own softmax layer. With a larger labeled dataset, we can instead freeze only a subset of the layers.
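The freezing strategy can be sketched as simple bookkeeping over which layers receive gradient updates (hypothetical layer names; in Keras this corresponds to setting a layer’s trainable attribute to False):

```python
# Hypothetical pre-trained network: freeze everything except the new head
layers = ["conv1", "conv2", "conv3", "fc", "softmax"]

trainable = {name: False for name in layers}  # freeze all pre-trained layers...
trainable["softmax"] = True                   # ...and train only our softmax layer

# During training, only these layers would receive gradient updates:
updated = [name for name in layers if trainable[name]]
print(updated)  # ['softmax']
```

With a larger dataset, flipping more of these flags to True (say, the last few convolutional layers) corresponds to freezing only a group of layers, as described above.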
Melek proceeded to talk about practical techniques to lean on while dealing with these complex processes, such as data augmentation. There are various data augmentation techniques, each with its own perks, that can be used while training our model: mirroring, where we flip the image horizontally, as well as random cropping, rotating, and shearing of the images.
Additionally, Melek talked about another data augmentation method called color distortion. He stated that one efficient way to implement it is PCA color augmentation, which shifts an image’s R, G, and B values based on which values are most present: an image with strong green values and weak blue values will have its green values altered the most.
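Mirroring and random cropping are one-liners on an image array; a hedged NumPy sketch with a toy 32x32 image and a hypothetical crop size:

```python
import numpy as np

rng = np.random.default_rng(3)
image = rng.random((32, 32, 3))   # toy RGB image: height x width x channels

def mirror(img):
    # Horizontal flip: reverse the width axis
    return img[:, ::-1, :]

def random_crop(img, size=28):
    # Pick a random top-left corner and keep a size x size window
    h, w, _ = img.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size, :]

flipped = mirror(image)
crop = random_crop(image)
print(flipped.shape, crop.shape)  # (32, 32, 3) (28, 28, 3)
```

Each call to random_crop yields a slightly different view of the same image, which is exactly how augmentation multiplies a limited dataset.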
After the discussions, Melek guided us while we implemented these concepts we have learned with Keras and collaboratively solved the assignments.
Next week, we will learn how to use our convolutional neural networks for a much more complicated cause called Object Detection.
Guide of the Week: AHMET MELEK
Ahmet Melek is studying Business Management at Bogazici University. He previously worked on topics such as Blockchain, Biometrics, and the Semantic Web.
His main interest is Brain-Computer Interfaces, more specifically Machine Learning approaches to signal classification.