DeepLearning.ai Study Group II
Report of Week 8
Deep Learning Study Group II is a 16-week study group in which we cover an advanced deep learning study series for AI enthusiasts and computer engineers. We follow the materials on https://www.deeplearning.ai each week and get together on Saturdays to discuss them.
On January 19, we gathered for the eighth week of the study group to start a new course, in which we learn how to properly structure a machine learning project.
Saturday’s discussions were led by Uzay Çetin, who is known in the field for his exceptional teaching skills and his love for Star Wars.
Çetin started off the discussion by talking about why establishing a solid machine learning strategy is important for our systems. When training a model, there are many things we can try in order to improve its accuracy: collecting more data, diversifying our data, training longer with gradient descent, switching to the Adam optimization algorithm, or simply using a bigger network with more hidden units.
Since there are many choices to pick from, it would be too time-consuming to try each one of them until we see what is wrong with our model in terms of aspects like accuracy and speed. To improve the effectiveness of our deep learning systems, we need to make sure we build a productive and efficient system by thoroughly and correctly analysing our project and the things going wrong with it.
Çetin then went on to explain the concept of orthogonalization, which, to put it simply, means tuning one thing at a time so that each change affects only one aspect of the system. Let’s recall the things that should go right for our machine learning system to run smoothly. First of all, the model must fit the training set well on the cost function, ideally reaching roughly human-level performance. Then we check whether it also does well on the dev set and then on the test set. Once we are sure that it fits those sets well, we can move on to see whether the model in fact does what we are trying to achieve in the real world.
Depending on the stage at which the accuracy problem appears, there are different remedies. For example, if there are a lot of errors on the training set, we have a bias problem, which can be addressed by training a bigger network or switching to a better optimization algorithm. If the problem occurs on the dev set, it is smarter to use regularization or to enlarge the training set by collecting more data. If the problem shows up on the test set, the best fix is a bigger dev set. The fastest way to solve the problem is to diagnose exactly where it occurs and build a strategy accordingly.
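The chain of checks described above can be sketched in a few lines of Python. This is only an illustration: the function name, the per-stage tolerances and the error values are assumptions made for the example, and the remedies are simply the ones listed in the paragraph above.

```python
# Stages are checked in this order; we stop at the first one that fails.
STAGES = ["training", "dev", "test"]

# Remedies as listed in the report, one group per stage.
REMEDIES = {
    "training": ["train a bigger network", "use a better optimization algorithm"],
    "dev": ["apply regularization", "collect a bigger training set"],
    "test": ["use a bigger dev set"],
}


def first_failing_stage(errors, tolerances):
    """Return the first stage whose error exceeds its tolerance, or None."""
    for stage in STAGES:
        if errors[stage] > tolerances[stage]:
            return stage
    return None


# Hypothetical error rates and tolerances, chosen only for illustration.
errors = {"training": 0.02, "dev": 0.09, "test": 0.10}
tolerances = {"training": 0.03, "dev": 0.04, "test": 0.05}

stage = first_failing_stage(errors, tolerances)
if stage is not None:
    print(stage, "->", REMEDIES[stage])
```

Here the training error is within its tolerance but the dev error is not, so the sketch points at the dev-set remedies and leaves the other knobs alone, which is the idea behind orthogonalization.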
We begin by setting up a single-number evaluation metric that tells us which classifier performs better and, together with a well-chosen dev set, speeds up the iterative process of improving our algorithm. In addition, Çetin mentioned the confusion matrix, which can be used along with a single-number evaluation metric. It counts the True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN) of a classifier. From these counts we compute precision, TP / (TP + FP), and recall, TP / (TP + FN). We then use the F1-score, which combines precision and recall so that classifiers can be compared on one number. It is important to note that the F1-score is the harmonic mean of precision and recall; the main idea is to combine several performance figures into one single value.
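As a minimal sketch of how the F1-score follows from the confusion matrix counts, the snippet below computes precision, recall and their harmonic mean. The function name and the counts for the two classifiers are made up for the example.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for two classifiers evaluated on the same dev set.
classifiers = {
    "A": {"tp": 95, "fp": 5, "fn": 10},
    "B": {"tp": 98, "fp": 2, "fn": 15},
}

scores = {name: precision_recall_f1(**c)[2] for name, c in classifiers.items()}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Classifier B has the higher precision and A the higher recall, so neither figure alone settles the comparison; the single F1 value does.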
When combining everything into one single value proves to be unproductive, we can instead treat different aspects of our algorithm separately. Let's say that we need to check both the accuracy and the running time of our algorithm. Combining these two into one overall evaluation metric would be artificial, since we have different satisfaction criteria for each. In this case, accuracy is the optimizing metric, which we want to be as good as possible, while running time is a satisficing metric, which only needs to meet a threshold (for example, staying below a given number of milliseconds).
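One way to picture the optimizing/satisficing split is as a filter followed by a maximum: first discard every model that misses the running-time limit, then pick the most accurate of the rest. The candidate models, their numbers and the 100 ms limit below are assumptions made for the example.

```python
def pick_model(candidates, max_runtime_ms):
    """Maximize accuracy among models that satisfy the running-time limit."""
    # Satisficing metric: running time only has to be within the limit.
    feasible = [c for c in candidates if c["runtime_ms"] <= max_runtime_ms]
    if not feasible:
        return None
    # Optimizing metric: among feasible models, higher accuracy wins.
    return max(feasible, key=lambda c: c["accuracy"])


# Hypothetical candidates, for illustration only.
candidates = [
    {"name": "A", "accuracy": 0.90, "runtime_ms": 80},
    {"name": "B", "accuracy": 0.92, "runtime_ms": 95},
    {"name": "C", "accuracy": 0.95, "runtime_ms": 1500},
]

best = pick_model(candidates, max_runtime_ms=100)
print(best["name"] if best else "no model meets the limit")
```

Model C is the most accurate but is rejected because it misses the limit; once the limit is met, being faster earns no extra credit.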
As in previous weeks’ sessions, Çetin reminded us that the train, dev and test datasets should come from the same distribution. If the data varies greatly between the train, dev and test sets, we may get a good result on the train set but a terrible result on the dev and test sets. Therefore, we should prevent those datasets from differing. Let’s say we would like to train a model that decides whether to approve a loan for people from the middle class. In this case, if we test it on people from the low-income class, the results would be meaningless. We should choose a dev set and a test set that reflect the data we expect to get in the future and on which doing well matters. The test error rate should be close to the train and dev error rates, and if the dataset is unbalanced we can use techniques such as upsampling and downsampling.
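For the unbalanced case, random upsampling can be sketched as follows: examples of the smaller classes are repeated at random until every class is as large as the biggest one. This is one simple variant among several, the function name and toy data are assumptions, and it is usually applied to the training set only.

```python
import random
from collections import Counter


def upsample(examples, labels, seed=0):
    """Randomly repeat minority-class examples until all classes match in size."""
    rng = random.Random(seed)
    by_label = {}
    for x, y in zip(examples, labels):
        by_label.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_label.values())
    out_x, out_y = [], []
    for y, xs in by_label.items():
        # Draw extra examples with replacement to reach the target size.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy dataset: 8 examples of class 0 and only 2 of class 1.
examples = list(range(10))
labels = [0] * 8 + [1] * 2

new_examples, new_labels = upsample(examples, labels)
print(Counter(new_labels))
```

Downsampling works the other way round, dropping examples from the larger class instead of repeating those of the smaller one.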
Previously, when we split the data, we would assign 70% of it to the train set and the remaining 30% to the dev/test sets, which worked well since datasets were not as large as they are today. With today's very large datasets, we can assign as much as 98% of the data to the train set, because even the remaining 2% contains enough examples for the dev and test sets. When the algorithm lets through unwanted images, we can block them by adjusting our evaluation metric or cost function so that such mistakes are penalized more heavily.
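A 98% train split can be sketched with a shuffled list of indices. The even 1%/1% division of the remainder between dev and test, the dataset size of one million and the function name are assumptions made for the example.

```python
import random


def split_indices(n, train_frac=0.98, dev_frac=0.01, seed=0):
    """Shuffle indices 0..n-1 and cut them into train, dev and test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_frac)
    n_dev = round(n * dev_frac)
    train = indices[:n_train]
    dev = indices[n_train:n_train + n_dev]
    test = indices[n_train + n_dev:]  # whatever is left goes to test
    return train, dev, test


# Hypothetical dataset size, for illustration only.
train, dev, test = split_indices(1_000_000)
print(len(train), len(dev), len(test))
```

Shuffling before cutting means all three parts are drawn from the same pool of examples, which keeps them on the same distribution.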
After all this, we can see how well our machine learning system performs in comparison with human-level performance. No matter how good an algorithm is in terms of accuracy, its progress slows down after it surpasses human-level performance, regardless of the amount of data it is exposed to. In addition, an algorithm can never do better than the Bayes optimal error, which is the theoretical lower limit of error. Since humans are in fact very good at many of the tasks that machine learning systems are designed to perform, human-level performance is not far from the Bayes optimal error. This explains why the algorithm's progress slows down after surpassing human-level performance, but it also means that as long as the system performs worse than human level, there is room for improvement. To improve our model’s performance further, we can get labeled data from humans, gain insight from manual error analysis, or conduct a better bias/variance analysis.
As you may remember from our previous weeks’ reports, high bias is called underfitting and high variance is called overfitting. If we have a high variance problem, the dev-set error will be considerably higher than the training-set error. High bias means a high training error, with a dev-set error at a similar level. If we have both high bias and high variance, the training error is high and the dev-set error is higher still. In the target analogy, low bias means that our shots land around the center of the target, and low variance means that all the shots are close to each other. To get a good score, we need both a low train error and a low dev-set error that are close to each other. A bigger model and better optimization help us reduce the avoidable bias, while we can reduce the variance by adding more training data and applying regularization techniques such as Dropout and L2. Moreover, both problems can be tackled by changing the neural network architecture or hyperparameters, for example with specialized architectures such as Recurrent Neural Networks or Convolutional Neural Networks.
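The bias/variance analysis can be reduced to two subtractions. In the sketch below, human-level error is used as a stand-in for the Bayes optimal error, avoidable bias is taken as the gap between training error and human-level error, and variance as the gap between dev error and training error. The function name and the error rates are assumptions made for the example.

```python
def diagnose(human_error, train_error, dev_error):
    """Return (avoidable_bias, variance, focus) from three error rates."""
    # Gap to human level: how much better the model could still fit the training set.
    avoidable_bias = train_error - human_error
    # Gap between dev and train: how poorly the model generalizes.
    variance = dev_error - train_error
    focus = "bias" if avoidable_bias > variance else "variance"
    return avoidable_bias, variance, focus


# Two hypothetical scenarios with the same train and dev errors.
print(diagnose(human_error=0.01, train_error=0.08, dev_error=0.10))
print(diagnose(human_error=0.075, train_error=0.08, dev_error=0.10))
```

With a human-level error of 1% the larger gap is the avoidable bias, so a bigger model or better optimization is the sensible next step; with a human-level error of 7.5% the same train and dev errors point to variance, so more data or regularization is more promising.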
After concluding our topics on Coursera, our lead Çetin proceeded to talk about more practical and real-life aspects of deep learning. Çetin also showed us Keras, which allows us to create a deep neural network easily with various built-in methods.
At the end of the session, Çetin had us review the topics we covered during the day, which you can find in the following file to catch up.
Next week, we will keep discussing how to build a productive machine learning system and see how we can use all the information we have learnt in real life.
Guide of the Week: UZAY ÇETIN
Uzay received his master's degree in artificial intelligence from Pierre and Marie Curie University (PARIS VI) and his doctoral degree in complex systems from Bogazici University. He joined the Istanbul Bilgi University Computer Engineering department as a faculty member in 2017. He organizes free machine learning (ML) courses for young people in Sariyer Municipality and works as an ML consultant with different firms.