DeepLearning.ai Study Group II
Report of Week 5
Deep Learning Study Group II is a 16-week study group covering an advanced deep learning series for AI enthusiasts and computer engineers. We follow the materials on https://www.deeplearning.ai each week and get together on Saturdays to discuss them.
On December 29, we gathered for the fifth week of DeepLearning.ai Study Group II and discussed the course titled “The Practical Aspects of Deep Learning”.
Our guide Ata Utku Oğuz, an alumnus of inzva’s DeepLearning.ai Study Group I from the previous year, started the session by talking about the tools and libraries he uses in his professional life and went on to explain how this week’s topic will help improve the models we train.
When it comes to practice, implementing a neural network is a profoundly iterative process. No matter how experienced a person is, when trying to figure out which model works best for a certain case, they will surely have to take a trial-and-error approach and experiment with various values so as to reach the best outcomes.
To make things easier while iterating, we split our data into train/dev/test sets, which are used for training, tuning the hyperparameters and testing, respectively. It is important to note that the train and test sets (and likewise the dev set) should be drawn from the same distribution. For example, we would get a high test error if we chose images of cat drawings for the train set and cat pictures taken by a camera for the test set. After properly splitting, we train algorithms on the training set, use the dev set to see which model works best, and then evaluate that model on the test set to see how well it performs. The test set results offer an unbiased estimate of the model’s performance. One of the participants, Yasin Kaya, highlighted the importance of labelling: if the data is not labelled accurately, we may still get misleading outcomes despite low train and test errors.
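The split described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the session: the function name `split_dataset`, the 80/10/10 proportions and the fixed seed are all assumptions made for the example. Shuffling one pool of examples before cutting it is one simple way to make the three sets share a distribution.

```python
import random


def split_dataset(examples, dev_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle examples and split them into train/dev/test lists.

    The fractions and the seed are illustrative defaults (assumptions),
    not values taken from the session.
    """
    if not 0 <= dev_fraction + test_fraction < 1:
        raise ValueError("dev_fraction + test_fraction must be in [0, 1)")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # shuffle a copy, reproducibly
    n_dev = int(len(shuffled) * dev_fraction)
    n_test = int(len(shuffled) * test_fraction)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test


train, dev, test = split_dataset(range(1000))
print(len(train), len(dev), len(test))  # prints: 800 100 100
```

With very large datasets, much smaller dev and test fractions are usually enough, so the defaults here should be treated as placeholders.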
After talking about the sets, we introduced the bias-variance tradeoff, which basically means finding a balance between bias and variance to achieve the best possible outcome. Bias is the difference between our model’s predictions and the correct values. If a model has high bias and low variance at the same time, it will underfit. Variance, on the other hand, is our algorithm’s sensitivity to different training sets, so a high-variance classifier tends to memorize the whole training set, which ultimately results in overfitting. Even though a high-variance model may reach a quite low error rate such as 1% on the train set, it may give a much larger error rate such as 11% on the test set. Overall, the model we train must neither overfit nor underfit if it is to minimize the errors.
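As a rough sketch of how these error rates can be read, the helper below labels a model from its train and dev errors. The function `diagnose`, the `baseline_error` parameter (the error we would consider unavoidable) and the 2% `tolerance` are hypothetical choices for illustration; in practice the thresholds depend on the problem.

```python
def diagnose(train_error, dev_error, baseline_error=0.0, tolerance=0.02):
    """Label a model from its error rates, given as fractions in [0, 1].

    baseline_error and tolerance are hypothetical knobs for this sketch.
    """
    # A large gap between training error and the baseline suggests bias.
    high_bias = (train_error - baseline_error) > tolerance
    # A large gap between dev error and training error suggests variance.
    high_variance = (dev_error - train_error) > tolerance
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias (underfitting)"
    if high_variance:
        return "high variance (overfitting)"
    return "low bias and low variance"


# The 1% train / 11% held-out example from the text above.
print(diagnose(0.01, 0.11))  # prints: high variance (overfitting)
```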
If we have a high bias problem, picking a more complex model with more hidden layers and/or training it longer will help solve it. If we have a high variance problem, we can try to get more data; if that is not possible, we can apply regularization, which reduces variance at the cost of a small increase in bias.
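One common form of regularization is the L2 penalty, which adds a term proportional to the squared weights to the cost. The sketch below assumes that form; the function names and the symbols `lam` (regularization strength) and `m` (number of training examples) are choices made for this example rather than anything specified in the session.

```python
def l2_regularized_cost(data_cost, weights, lam, m):
    """Return data_cost plus the L2 penalty (lam / (2 * m)) * sum(w^2)."""
    penalty = (lam / (2 * m)) * sum(w * w for w in weights)
    return data_cost + penalty


def l2_regularized_gradient(data_grad, weights, lam, m):
    """Add the penalty's gradient, (lam / m) * w, to each weight gradient."""
    return [g + (lam / m) * w for g, w in zip(data_grad, weights)]


# Hypothetical numbers: two weights, 10 training examples.
weights = [3.0, -4.0]
# Penalty here is (2 / 20) * (9 + 16) = 2.5, added to a data cost of 0.5.
cost = l2_regularized_cost(0.5, weights, lam=2.0, m=10)
grad = l2_regularized_gradient([0.1, 0.1], weights, lam=2.0, m=10)
print(cost, grad)
```

A larger `lam` pushes the weights toward zero more strongly, which tends to lower variance and raise bias, so it is typically tuned on the dev set.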
The training speed of the neural network is another concern, and there are some methods to speed up learning. The first is normalization: we subtract the mean from each input feature so that the data is centered at zero, and we scale each feature by its standard deviation. This gives the cost function a more symmetric, bowl-like shape, which speeds up learning. The second is to scale the randomly initialized weights with a formula instead of using arbitrary random values. Weights still need to be initialized randomly, but methods such as Xavier initialization, which set the spread of the random values according to the layer size, are known to work better than plain random initialization. Although these methods may give slightly better results, whether to implement them remains a choice.
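Both ideas can be sketched with the standard library alone. The function names are hypothetical, and the Xavier variant shown draws weights from a normal distribution with variance 1 / n_in, which is one common formulation and an assumption here; other variants use, for example, 2 / (n_in + n_out).

```python
import math
import random
import statistics


def normalize_feature(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values], mean, std
    return [(v - mean) / std for v in values], mean, std


def xavier_init(n_in, n_out, seed=0):
    """Draw an n_out x n_in weight matrix from N(0, 1 / n_in).

    The 1 / n_in variance is one common Xavier variant (an assumption).
    """
    rng = random.Random(seed)
    std = math.sqrt(1.0 / n_in)
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]


normalized, mean, std = normalize_feature([2.0, 4.0, 6.0])
weights = xavier_init(n_in=4, n_out=3)
print(normalized, len(weights), len(weights[0]))
```

The returned `mean` and `std` are kept so that the same values, computed on the training set, can be reused to normalize the dev and test sets.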
Another topic worth mentioning is gradient checking, which helps us debug by verifying whether our gradient calculations are correct. But, as our guide Utku pointed out, the libraries take care of the gradient computations for us anyway, so it is not necessary to implement it ourselves.
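For cases where gradients are written by hand, the check can be sketched as a comparison between the analytic gradient and a two-sided finite-difference estimate. The toy cost function, the step size `eps` and the function names below are assumptions made for illustration.

```python
import math


def numerical_gradient(f, theta, eps=1e-7):
    """Estimate the gradient of f at theta with centered differences."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad


def relative_difference(a, b):
    """Return ||a - b|| / (||a|| + ||b||), or 0.0 if both are zero."""
    num = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    den = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def cost(theta):
    """Toy cost used only for this example: t0^2 + 3 * t0 * t1."""
    return theta[0] ** 2 + 3 * theta[0] * theta[1]


def analytic_gradient(theta):
    """Hand-derived gradient of the toy cost above."""
    return [2 * theta[0] + 3 * theta[1], 3 * theta[0]]


theta = [1.0, 2.0]
diff = relative_difference(numerical_gradient(cost, theta),
                           analytic_gradient(theta))
print(diff)  # a very small value suggests the analytic gradient is correct
```

A relative difference that is not tiny, for instance after a sign error in `analytic_gradient`, points to a bug in the hand-written derivative.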
Next week’s discussion will be about different types of optimization methods such as (Stochastic) Gradient Descent, Momentum, RMSProp and Adam, and we will follow the course titled “Optimization Algorithms” on Coursera.
Guide of the Week: ATA UTKU OGUZ
Ata Utku received his B.Sc. degree in Electrical and Electronics Engineering from Bilkent University in 2014. He also graduated from UCLA in 2015 with a certificate in Project Management. He has been working at Türkiye İş Bankası as a data engineer since 2015.
His areas of work are data preparation, data analysis and data presentation.