DeepLearning.ai Study Group II
Report of Week 3
Deep Learning Study Group II is a 16-week study group in which we cover an advanced deep learning series for AI enthusiasts and computer engineers. We follow the materials on https://www.deeplearning.ai each week and get together on Saturdays to discuss them.
On December 15, we gathered at our sanctuary for the third week of DeepLearning.ai Study Group II and discussed the course titled “Shallow Neural Networks”.
Since we had thoroughly explained the basics of neural networks in the previous week’s session, we were ready to dive into the technical details and focus on shallow neural networks this week with our guide Elif Salman. A shallow neural network is simply a neural network that has only one hidden layer.
A shallow neural network has three layers: an input layer that holds the raw input features; a hidden layer, so called because its values are not observed in the training set, whose parameters are learned by minimizing the cost function; and an output layer of one or more nodes that generates the predicted value.
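The three layers described above can be sketched as a single forward pass. This is only an illustrative sketch, not the code used in the session: the layer sizes, the tanh activation in the hidden layer, the small 0.01 scale on the initial weights, and all names (forward, W1, b1, W2, b2) are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real value into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer network.

    X has shape (n_x, m): one column per example.
    Returns predictions A2 of shape (1, m).
    """
    Z1 = W1 @ X + b1      # hidden layer, linear part
    A1 = np.tanh(Z1)      # hidden layer, activation (assumed tanh here)
    Z2 = W2 @ A1 + b2     # output layer, linear part
    A2 = sigmoid(Z2)      # output layer, predicted probability
    return A2

# Hypothetical sizes: 3 input features, 4 hidden units, 5 examples.
rng = np.random.default_rng(0)
n_x, n_h, m = 3, 4, 5
X = rng.standard_normal((n_x, m))
W1 = rng.standard_normal((n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((1, n_h)) * 0.01
b2 = np.zeros((1, 1))

A2 = forward(X, W1, b1, W2, b2)
print(A2.shape)
```

Each column of A2 is the prediction for one example, and every value lies strictly between 0 and 1 because of the sigmoid in the output layer.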
Elif went on to explain how to use different activation functions in different layers of a neural network. In binary classification, the sigmoid function is usually used only in the output layer, since the output value must be between 0 and 1.
In the other layers, we usually use ReLU, which outputs x if x is positive and 0 otherwise. The function is therefore the identity to the right of the y-axis and 0 to the left of it, which makes it cheap to compute and differentiate. This is also why it is the most widely used activation function in neural networks.
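To see how little work ReLU requires, here is a minimal NumPy sketch of the function and its derivative. The function names are chosen for this example, and treating the derivative at exactly 0 as 0 is a common convention assumed here (the derivative is not defined at that point).

```python
import numpy as np

def relu(z):
    # Identity for positive inputs, 0 otherwise.
    return np.maximum(0.0, z)

def d_relu(z):
    # Slope is 1 for positive inputs and 0 for negative inputs.
    # At z == 0 the derivative is undefined; 0 is used here by convention.
    return (np.asarray(z) > 0).astype(float)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))
print(d_relu(z))
```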
In addition to ReLU, the hyperbolic tangent function and leaky ReLU are also widely used activation functions. What all of these functions have in common is that they are nonlinear. Nonlinearity is essential: if a linear activation function is used, a linear combination of a linear combination can be expressed as a single linear combination, so two such layers collapse into one and the hidden layer adds nothing, whether the network is shallow or deep. To thoroughly explain these activation functions and sigmoid, Elif derived each function’s derivative on the board.
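Both points in this paragraph can be checked numerically. The sketch below first writes out the standard derivatives of sigmoid, tanh and leaky ReLU, then shows that two layers with a linear (identity) activation give exactly the same output as one merged layer. The leaky ReLU slope of 0.01, the layer sizes and all variable names are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1.0 - s)

def d_tanh(z):
    # tanh'(z) = 1 - tanh(z)^2
    return 1.0 - np.tanh(z) ** 2

def leaky_relu(z, alpha=0.01):
    # Like ReLU, but with a small slope alpha for negative inputs.
    return np.where(z > 0, z, alpha * z)

def d_leaky_relu(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)

# Two layers with a linear (identity) activation in between.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal((4, 1))
W2, b2 = rng.standard_normal((1, 4)), rng.standard_normal((1, 1))
two_layers = W2 @ (W1 @ X + b1) + b2

# The same mapping written as a single layer.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ X + b

collapsed = np.allclose(two_layers, one_layer)
print(collapsed)
```

Because W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2), the merged weights W and b reproduce the two-layer output for any input, which is why a nonlinear activation is needed for the hidden layer to contribute anything.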
It is extremely important to choose the learning rate carefully when updating the parameters: if the learning rate is too small, the model will train too slowly, but if it is too large, it may overshoot and skip the minimum. As Elif pointed out, “the safest choice is to begin by implementing a considerably larger learning rate then decrease it slowly while the train moves forward.”
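The effect of the learning rate is easy to see on a toy problem. The sketch below runs gradient descent on the one-dimensional cost J(w) = w^2, whose minimum is at w = 0. The cost function, the starting point, the specific learning rates and the 1 / (1 + decay * t) schedule are all assumptions picked for illustration, not values from the session.

```python
def gradient_descent(lr, steps=50, w0=5.0, decay=0.0):
    """Minimize J(w) = w**2 starting from w0.

    With decay > 0, the step size shrinks over time as lr / (1 + decay * t).
    """
    w = w0
    for t in range(steps):
        grad = 2.0 * w                       # dJ/dw
        step_size = lr / (1.0 + decay * t)   # constant when decay == 0
        w = w - step_size * grad
    return w

w_small = gradient_descent(lr=0.001)            # too small: barely moves
w_good = gradient_descent(lr=0.1)               # reasonable: close to 0
w_large = gradient_descent(lr=1.1)              # too large: overshoots and diverges
w_decay = gradient_descent(lr=0.9, decay=0.1)   # starts large, then decays

print(w_small, w_good, w_large, w_decay)
```

In this toy setting the smallest rate is still far from the minimum after 50 steps, the largest one moves further away with every step, and the decaying schedule starts with big steps but still settles near 0.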
Note that initializing the weights to zero will not work for neural networks as it did for logistic regression. For the neurons to differentiate from one another, their weights must be assigned different values at the beginning, so we initialize the weights randomly to break the symmetry. If all the initial weights are the same, the neurons in a layer compute the same output and receive the same updates, so they cannot diverge from each other and the whole layer acts as if it were a single neuron. Keep in mind that we generally draw these initial values from a standard normal distribution.
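The symmetry problem can be reproduced in a few lines. The sketch below trains a small one-hidden-layer network twice, once with every weight set to the same constant and once with random weights drawn from a standard normal distribution. The toy data, the tanh hidden layer, the learning rate, the 0.01 scale on the random weights and all names are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(W1, W2, X, Y, lr=0.5, steps=20):
    """Run a few gradient-descent steps and return the trained W1.

    Biases start at zero; the loss is binary cross-entropy.
    """
    n_h, m = W1.shape[0], X.shape[1]
    W1, W2 = W1.copy(), W2.copy()
    b1, b2 = np.zeros((n_h, 1)), np.zeros((1, 1))
    for _ in range(steps):
        # Forward pass.
        A1 = np.tanh(W1 @ X + b1)
        A2 = sigmoid(W2 @ A1 + b2)
        # Backward pass.
        dZ2 = A2 - Y
        dW2 = dZ2 @ A1.T / m
        db2 = dZ2.mean(axis=1, keepdims=True)
        dZ1 = (W2.T @ dZ2) * (1.0 - A1 ** 2)
        dW1 = dZ1 @ X.T / m
        db1 = dZ1.mean(axis=1, keepdims=True)
        # Parameter update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1

# Toy data: 2 features, 8 examples, label 1 when the features sum to > 0.
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 8))
Y = (X[0] + X[1] > 0).astype(float).reshape(1, -1)

# Same constant everywhere: the 3 hidden units start out identical.
W1_same = train(np.full((3, 2), 0.5), np.full((1, 3), 0.5), X, Y)
# Random initialization breaks the symmetry.
W1_rand = train(rng.standard_normal((3, 2)) * 0.01,
                rng.standard_normal((1, 3)) * 0.01, X, Y)

rows_identical_same = np.allclose(W1_same, W1_same[0])
rows_identical_rand = np.allclose(W1_rand, W1_rand[0])
print(rows_identical_same, rows_identical_rand)
```

Each row of W1 holds the weights of one hidden neuron. With the constant initialization, all rows are still equal after training, so the three neurons behave as one; with random initialization the rows differ.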
Next week, we will continue discussing neural networks and focus on a model that is much harder to train than this week’s: deep neural networks.
Guide of the Week: ELIF SALMAN
Elif received her B.Sc. degree in Electrical and Electronics Engineering from Bogazici University in 2017. Now, she is a Master's student and a teaching assistant in Computer Science and Engineering at Koç University.
Her research topics are hand tracking and interaction with projection-based augmented reality systems.