Study Group II

Report of Week 4

Deep Learning Study Group II is a 16-week study group in which we cover an advanced deep learning series for AI enthusiasts and computer engineers. Each week we follow up on the materials and get together on Saturdays to discuss them.

On December 22, we gathered for the fourth week of Study Group II and discussed the course titled “Deep Neural Networks”.

At the beginning of the session, Elif gave a brief description of this week’s model and told the participants: “to put it simply, a deep neural network is just an expanded version of the shallow neural networks we covered in the previous week’s discussions.”

Since everything we covered in the previous weeks’ sessions will be used to implement this model, let us recap the relevant concepts we have learned so far before going any further.

  • Binary Classification

  • Logistic Regression

  • Cost Function (Logistic Regression)

  • Gradient Descent (Logistic Regression)

  • Vectorization

  • Activation Functions

  • Forward Propagation & Backpropagation

  • Random Initialization

  • Shallow Neural Networks

As we said previously, what differentiates a deep neural network from a shallow one is basically that deep neural networks are composed of multiple hidden layers rather than a single hidden layer. Since we do not count the input layer when counting the number of layers in a neural network, for a network to be considered deep, it has to contain at least three layers other than the input layer, that is, two or more hidden layers plus the output layer. Elif also mentioned the importance of keeping track of matrix dimensions while implementing a neural network and how doing so helps avoid errors. Since deep neural networks involve many matrix multiplications, a mismatched matrix dimension will result in a miscalculation.
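The dimension bookkeeping can be made explicit with assertions. Below is a minimal sketch in NumPy, assuming the usual convention that layer l has n[l] units, W[l] has shape (n[l], n[l-1]), and b[l] has shape (n[l], 1); the layer sizes are illustrative, not from the session.

```python
import numpy as np

# Hypothetical layer sizes: input layer n[0]=4, two hidden layers
# of 5 and 3 units, and an output layer of 1 unit.
layer_dims = [4, 5, 3, 1]

# Initialize parameters and verify the matrix dimensions for each layer l:
# W[l] must be (n[l], n[l-1]) and b[l] must be (n[l], 1).
params = {}
for l in range(1, len(layer_dims)):
    params[f"W{l}"] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    assert params[f"W{l}"].shape == (layer_dims[l], layer_dims[l - 1])
    assert params[f"b{l}"].shape == (layer_dims[l], 1)
```

Checking the shapes this way catches a mismatched multiplication before it silently corrupts a computation.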

We proceeded to discuss why deep neural networks have an advantage over shallow ones, using a face recognition system as an example. In image classification, a neural network may learn to detect edges from raw pixels in the first layer, then use those edges to detect simple shapes in the second layer, and then use these shapes to detect higher-level features, such as facial shapes, in higher layers. In short, the deeper layers of a neural network typically compute more complex features of the input than the earlier layers. A deep neural network begins the recognition process by detecting simple, small parts such as edges, and when all of these simple detected functions come together, the network can learn much more complex functions. This process is similar to the way the human brain functions, as the brain also begins by detecting the edges in a visual scene and then proceeds to put the image together. Because of this, experts often draw an analogy between neural networks and the human brain.

Because a deep network composes simple functions in a tree-like structure, implementing a deep neural network requires fewer hidden units than a shallow network computing the same function. As a result, by adding more hidden layers we avoid having too many hidden units in our model, which ultimately shortens the computation. Since we can reach the same accuracy either way, it simply makes more sense to implement a deep neural network rather than a shallow one. Indeed, results from circuit theory show that there are functions a small deep network can compute for which a shallow network would need exponentially many hidden units.

To build the blocks of a deep neural network, we need to implement “a forward propagation step for each layer, and a corresponding backward propagation step”. During forward propagation, values such as Z[l] and the layer inputs A[l-1] (where l indicates the l-th layer) go into a cache; we use them later during backward propagation. After the prediction step, we compute the loss function by comparing our output with the actual output. Using the gradients of this loss, we update the parameters of each layer in our deep neural network.
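The forward step with caching can be sketched as follows. This is an illustrative implementation, assuming ReLU activations in the hidden layers and a sigmoid at the output, with parameters stored in a dictionary under keys like "W1" and "b1"; the function name `forward` is our own.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def forward(X, params, L):
    """Forward propagation through L layers, caching the values
    (A[l-1], Z[l]) that backward propagation will need."""
    caches = []
    A = X
    for l in range(1, L + 1):
        Z = params[f"W{l}"] @ A + params[f"b{l}"]
        caches.append((A, Z))  # A[l-1] and Z[l] go into the cache
        # Sigmoid at the output layer, ReLU in the hidden layers:
        A = 1 / (1 + np.exp(-Z)) if l == L else relu(Z)
    return A, caches
```

During the backward pass, each cached pair is popped in reverse order so that the gradients of layer l can be computed from Z[l] and A[l-1] without recomputing them.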

While developing a deep neural network, we need to use parameters, the learnable elements of the model such as the weights W and biases b. In addition to the parameters, we need to set hyperparameters: variables such as the learning rate, the number of iterations, or the number of hidden layers. Hyperparameters are settings that control the final values of the parameters (W and b). It is important to remember that deep learning is quite an empirical process. We need to try values, implement them, and change them continuously until we reach the intended result. While experimenting, we start to gain intuition, which leads us to an effective workflow for building deep neural networks.

Next week, we will start studying a new course and discuss the practical aspects of deep learning, putting into practice the things we have learned so far.

Guide of the Week: ELIF SALMAN

Elif received her B.Sc. degree in Electrical and Electronics Engineering from Bogazici University in 2017. Now, she is a Master's student and a teaching assistant in Computer Science and Engineering at Koç University.

Her research topics are hand tracking and interaction with projection-based augmented reality systems.

Subscribe to our newsletter here and stay tuned for more hacker-driven activities.

inzva is supported by BEV Foundation, an education foundation for the digital native generation which aims to build communities that foster peer-learning and encourage mastery through one-to-one mentorship.