Deep Learning for coders course (fast.ai) — Introduction

Zora Hirbodvash
Nov 6, 2020

This is my summary and set of notes for lesson 1 of the Deep Learning for Coders course, offered by Jeremy Howard and Rachel Thomas (https://course.fast.ai/). The course is based on the book Deep Learning for Coders with fastai and PyTorch. Sincere thanks to the book’s authors, Jeremy Howard and Sylvain Gugger. In this lesson, we cover the following topics:

1- A brief history of machine learning (ML)

2- Neural networks: A brief history

3- Arthur Samuel’s view of a machine learning model

4- What is a neural network?

5- The modern deep learning terminology

6- Limitations inherent to machine learning

7- Getting a GPU deep learning server

8- How fastai image recognizer works

A brief history of machine learning (ML)

Like regular programming, ML gets a computer to complete a specific task. The difference is that ML studies algorithms that improve automatically through experience.

ML research goes back to the work of an IBM researcher named Arthur Samuel, who started working on a different way to get computers to complete tasks. He called it machine learning. In 1962 he published an essay named “Artificial Intelligence: A Frontier of Automation”, in which he wrote:

“Programming a computer for such computations is, at best, a difficult task, not primarily because of any inherent complexity in the computer itself but, rather, because of the need to spell out every minute step of the process in the most exasperating detail. Computers, as any programmer will tell you, are giant morons, not giant brains.”

His basic idea was this: we show the machine some examples of the problem to solve first. Then the computer figures out how to solve other cases itself. Therefore, ML is different from regular programming since we are not telling the machine the exact steps needed to solve a problem.

His idea proved highly effective: his checkers-playing program learned so much that it beat the Connecticut state champion!

Why is it hard to use a traditional computer program to recognize images in a photo?

For the human brain, it is easy to recognize objects in photos. Our brain has learned which features define a particular object, such as a horse or a table. However, it is difficult to write clear rules that let a traditional program identify objects, since any object can vary widely in shape, texture, colour, and other features. Hence, it is impractical to encode these rules manually in a traditional program.

Neural networks: A brief history

Deep learning is a subset of a broader discipline of machine learning in artificial intelligence. Deep learning models use neural networks first introduced in the 1950s.

The effort to build a mathematical model of an artificial neuron dates back to 1943, when Warren McCulloch, a neurophysiologist, and Walter Pitts, a logician, set out to develop one. They realized that simple addition and thresholding could represent a simplified model of a real neuron. Later, a psychologist named Frank Rosenblatt continued this work. He developed the artificial neuron further, gave it the ability to learn, and built a device based on these principles, the Mark I Perceptron.

Marvin Minsky, an MIT professor, and Seymour Papert wrote a book about this invention called Perceptrons (MIT Press). It contained two crucial insights. First, a single layer of these devices was unable to learn some simple but critical mathematical functions (such as XOR). Second, multiple layers of the devices could address these limitations. Unfortunately, only the first insight was widely noticed, and researchers ignored the second for a while. As a result, the academic community largely gave up on neural networks for the next two decades.

Arguably the most valuable work on neural networks in the last 50 years was the multi-volume Parallel Distributed Processing (PDP) by David Rumelhart, James McClelland, and the PDP Research Group, released in 1986 by MIT Press. The approach used in PDP is very similar to the one used in today’s neural networks. In the 1980s, most models were built with a second layer of neurons, which resolved the problem identified by Minsky and Papert. Neural networks were then widely used for real, practical projects in the 80s and 90s.

However, a misunderstanding of the theoretical issues again held the field back. In theory, adding just one extra layer of neurons was enough to allow these networks to approximate any mathematical function. In practice, such networks were often too big and too slow to be useful. Nowadays, most researchers use more layers of neurons to get good performance.

Researchers suggested this principle 30 years ago, but it has only been taken seriously in the last decade. Neural networks are finally being used at scale. This is not only because of the use of more layers, but also because of improvements in computer hardware, increases in data availability, and algorithmic tweaks that allow neural networks to be trained faster and more easily.
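The XOR limitation described by Minsky and Papert can be checked with a few lines of Python. The sketch below is only an illustration, not code from the course: it assumes a neuron is "addition followed by thresholding", searches a coarse grid of weights for a single unit that computes XOR, and then builds a two-layer network with hand-picked weights. The grid search is a demonstration rather than a proof, and all function names are made up for this example.

```python
from itertools import product

def step(z):
    """Threshold activation: output 1 when the weighted sum is positive."""
    return 1 if z > 0 else 0

XOR_CASES = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def single_unit(w1, w2, b):
    """One artificial neuron: addition followed by thresholding."""
    return lambda x1, x2: step(w1 * x1 + w2 * x2 + b)

def computes_xor(f):
    """True when f gives the XOR answer on all four input pairs."""
    return all(f(x1, x2) == y for (x1, x2), y in XOR_CASES.items())

# Try every weight combination on a coarse grid: -4.0, -3.5, ..., 4.0.
grid = [i / 2 for i in range(-8, 9)]
single_layer_hits = [
    (w1, w2, b) for w1, w2, b in product(grid, repeat=3)
    if computes_xor(single_unit(w1, w2, b))
]

def two_layer(x1, x2):
    """Hidden units compute OR and AND; the output fires for OR-but-not-AND."""
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    return step(h_or - h_and - 0.5)

print(len(single_layer_hits))    # how many single units on the grid compute XOR
print(computes_xor(two_layer))   # whether the two-layer network computes XOR
```

No single unit on the grid reproduces XOR, while the two-layer network does, which matches the two insights from Perceptrons.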

Arthur Samuel’s view of a machine learning model

In his 1962 essay “Artificial Intelligence: A Frontier of Automation”, Arthur Samuel wrote:

“Suppose we arrange for some automatic means of testing the effectiveness of any current weight assignment in terms of actual performance and provide a mechanism for altering the weight assignment to maximize the performance. We need not go into the details of such a procedure to see that it could be made entirely automatic and to see that a machine so programmed would learn from its experience.”

The following concepts summarize Arthur Samuel’s view of a machine learning model:

1- The idea of a weight assignment

2- An automatic means of testing the model’s actual performance

3- The need for a mechanism to alter the weight assignment

We discuss these concepts one by one here.

1- Weight assignment

Weights are just variables, and a weight assignment is a particular choice of values for those variables. The program’s inputs are the values that it processes to produce its results. For example, it can take image pixels as inputs and return the classification “dog” as a result. The program’s weight assignments are other values that define how the program will operate. To clarify, let us go back to Samuel’s checkers program: different values of the weights would lead to different checkers-playing strategies.

2- Automatic means

In Samuel’s model, we need an automatic means of testing the effectiveness of any current weight assignment in terms of actual performance. For his checkers program, the actual performance would be how well it plays. We can test the performance of two models automatically by setting them to play against each other and seeing which one usually wins.

3- Mechanism

We need a mechanism for altering the weight assignment so as to maximize performance. In his checkers program, this could mean looking at the difference in weights between the winning and losing models and adjusting the weights a little further in the winning direction.

Now we can see why Samuel said that such a procedure could be made entirely automatic and that a machine so programmed would learn from its experience. Learning becomes entirely automatic when the adjustment of the weights is automatic too. Instead of improving a model by adjusting its weights manually, we rely on an automated mechanism that produces adjustments based on performance.
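Samuel’s three ingredients can be sketched in plain Python. This is a toy illustration under stated assumptions, not his checkers program: the "model" has a single weight, the automatic test is the negative mean squared error on a few made-up examples, and the mechanism simply tries nudging the weight up or down and keeps whichever choice performs best.

```python
def model(x, weight):
    """A toy program whose behaviour is set entirely by one weight."""
    return weight * x

# Made-up examples of the task: the desired output is 3 * x.
examples = [(x, 3.0 * x) for x in range(1, 6)]

def performance(weight):
    """Automatic test of a weight assignment: higher is better."""
    errors = [(model(x, weight) - y) ** 2 for x, y in examples]
    return -sum(errors) / len(errors)

def improve(weight, step_size=0.05):
    """Mechanism: nudge the weight down or up, keep the best performer."""
    candidates = [weight - step_size, weight, weight + step_size]
    return max(candidates, key=performance)

weight = 0.0                 # initial weight assignment
for _ in range(200):         # the loop runs with no manual adjustment
    weight = improve(weight)

print(round(weight, 2))
```

Nobody tells the program that the right weight is 3; it gets there only by repeatedly testing performance and adjusting.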

What is a neural network?

We are interested in a function so flexible that it can be used to solve any given problem just by changing its weights (the model parameters). A neural network behaves in exactly this way: it is a mathematical function that turns out to be extremely flexible, depending on its weights. The universal approximation theorem is a mathematical proof that, in theory, this function can solve any problem to any level of accuracy. What remains is to find a new “mechanism” for automatically updating the weights for every problem, that is, a completely general way to update the weights of a neural network so that it improves at any given task. This general method exists, and it is called stochastic gradient descent (SGD).
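To make the idea of SGD concrete, here is a minimal sketch in plain Python. It is an assumption-laden toy: instead of a neural network it fits a straight line with two weights (w and b) to made-up data generated from y = 2x + 1, and the learning rate and number of epochs are arbitrary choices. The update rule, stepping each weight against the gradient of the loss on one example at a time, is the same idea used for neural networks.

```python
import random

# Made-up training data generated from y = 2x + 1.
xs = [-1 + 2 * i / 19 for i in range(20)]
data = [(x, 2 * x + 1) for x in xs]

w, b = 0.0, 0.0            # the weights, starting from arbitrary values
learning_rate = 0.1
rng = random.Random(0)     # fixed seed so the run is repeatable

for epoch in range(100):
    rng.shuffle(data)                  # "stochastic": random example order
    for x, y in data:
        error = (w * x + b) - y        # prediction minus label
        # Gradient of the squared error with respect to each weight.
        grad_w = 2 * error * x
        grad_b = 2 * error
        # Move each weight a little in the direction that lowers the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))
```

The learned weights end up very close to the values 2 and 1 that generated the data.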

The modern deep learning terminology

The terminology has changed since Samuel was working in the 1960s. In modern deep learning terminology, the elements of the model are named as follows:

1- The functional form of the model is called its architecture.

2- The weights are called parameters.

3- The predictions are calculated from the data, not including the labels.

4- The model’s results are called predictions.

5- The measure of performance is called the loss.

6- The loss depends not only on the predictions but also on the correct labels.
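The terms above map onto code roughly as follows. This is a small sketch with hypothetical names and numbers, where the architecture is a straight line and the loss is the mean squared error:

```python
def architecture(x, parameters):
    """The functional form of the model: here, a straight line."""
    w, b = parameters
    return w * x + b

def loss(predictions, labels):
    """Mean squared error: needs both the predictions and the correct labels."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)

parameters = (2.0, 0.0)        # the weights
inputs = [1.0, 2.0, 3.0]       # the data, not including the labels
labels = [3.0, 5.0, 7.0]       # the correct answers

predictions = [architecture(x, parameters) for x in inputs]
current_loss = loss(predictions, labels)
print(predictions, current_loss)
```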

Limitations inherent to machine learning

Some fundamental things about training a deep learning model:

1- We need data to create a model.

2- A model can only learn the patterns seen in the input data used to train it.

3- This learning approach only creates predictions, not recommended actions.

4- Input data alone is not sufficient. It also needs to be labelled, and labelled data is often lacking.

5- Positive feedback loops: These can lead to a highly biased model. Consider a predictive policing model built to predict crimes. If we use arrests as a proxy for crime, the training data are already partly biased because of biases in existing policing processes. Law enforcement might then use the model to decide where to focus police activity, increasing arrests in those areas. Data on these additional arrests would be fed back in to retrain future versions of the model. This is a positive feedback loop: the more the model is used, the more biased the data become, which makes the model even more biased.
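The feedback loop in point 5 can be illustrated with a deliberately simple simulation. Everything in it is an assumption made for illustration: two areas have the same true crime rate, the arrest history starts slightly skewed (55 versus 45), and the "model" concentrates patrols on the area with more past arrests (squaring the counts stands in for that focusing effect). Because crime is equal, new arrests simply follow the patrols.

```python
def patrol_share(arrests):
    """Toy 'model': give more patrols to areas with more past arrests.

    Squaring is an assumption that stands in for concentrating police
    activity where the model predicts more crime.
    """
    scores = [a ** 2 for a in arrests]
    total = sum(scores)
    return [s / total for s in scores]

# Two areas with the SAME true crime rate but a slightly skewed history.
arrests = [55.0, 45.0]
history = []

for round_number in range(5):
    shares = patrol_share(arrests)         # the model decides where to patrol
    arrests = [100 * s for s in shares]    # equal crime, so arrests follow patrols
    history.append(shares[0])              # the new arrests become training data

print([round(s, 3) for s in history])
```

The share of patrols sent to the first area grows every round, even though the two areas are identical by construction.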

Getting a GPU deep learning server

A Graphics Processing Unit (GPU), also known as a graphics card, is a special kind of processor that can handle thousands of single tasks at the same time. It was designed for displaying 3D environments on a computer for playing games, and these basic tasks are very similar to what neural networks do. As a result, GPUs can run neural networks hundreds of times faster than regular CPUs. You will need access to a computer with an NVIDIA GPU, because other brands of GPU are not fully supported by the main deep learning libraries. The following links describe ways to use a GPU server online:

https://course.fast.ai/start_colab

https://course.fast.ai/start_gradient

What do you need to train a model?

We usually deal with either a classification or a regression problem. Classification predicts a class or category (in our example, dog or cat). Regression predicts a numeric quantity (for example, the price of a house based on its features). Either way, we first need an architecture: the mathematical model we are trying to fit. Data serves as the input to the model, and in deep learning the data usually need to be labelled. We need a loss function to measure the performance of the model. Finally, we need a way to update the parameters of the model to improve its performance.

How fastai image recognizer works

Here is the code for a dog and cat classifier. It predicts whether a given image shows a dog or a cat.

from fastai.vision.all import *

path = untar_data(URLs.PETS)/'images'

def is_cat(x): return x[0].isupper()

dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

learn = cnn_learner(dls, resnet34, metrics=error_rate)

learn.fine_tune(1)

Below, we go through the code line by line to see how to approach a deep learning problem in general.

Step 1: Load the libraries needed

from fastai.vision.all import *

The first line imports everything from the fastai.vision library. This gives us all of the functions and classes we will need to create a wide variety of computer vision models.

Step 2: Download the dataset

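We need to download our dataset. Depending on where the data lives, we can use one of the following. The first downloads and extracts the Pets dataset and points to its images folder; the second points to a copy that is already stored locally:

path = untar_data(URLs.PETS)/'images'

path = Path("/notebook/storage/data/oxford_iii_pet")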

After downloading the dataset, we need to understand how it is organized. Most of the time, a dataset is split into a training set, a validation set, and a test set.

Step 3: Load & Label the dataset

dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

To turn our downloaded data into a DataLoaders (https://docs.fast.ai/data.load) object we need to tell fastai these four things (https://docs.fast.ai/vision.data):

1) What kinds of data we are working with

2) How to get the list of items

3) How to label these items

4) How to create the validation set

When we create DataLoaders, we have to specify what kind of data we are working with. In this example the inputs are images, so we use ImageDataLoaders. For how to get the list of items, and for the available ways to label data, please see https://docs.fast.ai/vision.data.

Here, we use this function to label our data. In the Pets dataset, the file name starts with an uppercase letter if the image is a cat and with a lowercase letter otherwise:

def is_cat(x): return x[0].isupper()
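As a quick check of what this label function returns, the sketch below applies it to two file names. The names are illustrative examples written in the dataset’s naming style (breed, underscore, number):

```python
def is_cat(x):
    """Label function: True when the file name starts with an uppercase letter."""
    return x[0].isupper()

# Illustrative file names: an uppercase first letter marks a cat breed.
for name in ["Abyssinian_1.jpg", "beagle_32.jpg"]:
    print(name, is_cat(name))
```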

Next, the validation set is used to evaluate the model during training so that we can detect overfitting. Measuring performance on a validation set lets us check that the model performs well because it has learned appropriate features for prediction, not because it has memorized the training data (overfitting). The validation set is therefore what we use to measure the accuracy of the model. Here, valid_pct=0.2 holds out a randomly selected 20% of the data as the validation set, which is also the default.

The parameter seed=42 sets the random seed to the same value every time we run this code, which ensures we get the same validation set on every run. Therefore, if we change our model and retrain it, we know that any difference is due to the changes to the model and not to a different random validation set.
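The effect of a fixed seed can be shown with a small stand-alone sketch. This is not how fastai implements the split internally; split_items is a hypothetical helper written for this example, and the file names are made up. It only demonstrates that the same seed gives the same validation set.

```python
import random

def split_items(items, valid_pct=0.2, seed=None):
    """Shuffle a copy of the items and hold out valid_pct of them for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(valid_pct * len(shuffled))
    return shuffled[cut:], shuffled[:cut]    # (training set, validation set)

files = [f"image_{i}.jpg" for i in range(10)]   # made-up file names

train_a, valid_a = split_items(files, seed=42)
train_b, valid_b = split_items(files, seed=42)

print(valid_a == valid_b)            # same seed, same validation set
print(len(train_a), len(valid_a))    # sizes of the two sets
```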

Finally, we define the Transforms. A Transform contains code that is applied automatically during training. fastai includes many predefined Transforms, and adding a new one is as simple as writing a Python function. There are two kinds: item_tfms are applied to each individual item (in this case, each item is resized to a 224-pixel square), while batch_tfms are applied to a batch of items at a time using the GPU.

Overfitting

One of the most challenging issues in training ML models is overfitting. When a model fits too closely to the exact data it has been trained on, it struggles to make accurate predictions on unseen data, and unseen data are the only data that matter in practice. A neural network may simply memorize the dataset it is trained on. When it then faces unseen data, it relies on what it has memorized instead of recognizing the image from its features.
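The difference between memorizing and learning can be seen in a toy example. The sketch below is an illustration with made-up data, not a neural network: one "model" stores every training example in a dictionary and guesses 0 for anything it has not seen, while the other applies a rule that captures the underlying pattern.

```python
def true_label(x):
    """The hidden pattern behind the made-up data."""
    return 1 if x >= 50 else 0

train = [(x, true_label(x)) for x in range(0, 100, 2)]   # even numbers
valid = [(x, true_label(x)) for x in range(1, 100, 2)]   # odd numbers, unseen in training

# "Memorizing" model: looks up training examples, guesses 0 for anything else.
memory = dict(train)

def memorizer(x):
    return memory.get(x, 0)

def rule(x):
    """Model that has picked up the underlying feature (a threshold at 50)."""
    return 1 if x >= 50 else 0

def accuracy(model, data):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

print(accuracy(memorizer, train), accuracy(memorizer, valid))
print(accuracy(rule, train), accuracy(rule, valid))
```

The memorizer looks perfect on the training set but does no better than guessing on the validation set, which is why performance is always measured on data the model has not seen.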

Step 4: Train the model

At this point, we have to specify which architecture to use, that is, what kind of model to create. fastai offers many different architectures. This example uses a convolutional neural network (CNN). We also have to specify what data to train on and which metric to use.

learn = cnn_learner(dls, resnet34, metrics=error_rate)

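The structure of a CNN is inspired by how the human visual system works, and CNNs are the current state-of-the-art approach to creating computer vision models. cnn_learner also has a parameter called pretrained, which defaults to True. This means the model starts from weights that have already been trained on another dataset.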

We use resnet34 here. The 34 refers to the number of layers in this variant of the architecture; other options are 18, 50, 101, and 152. The most important thing to keep in mind is that models with more layers take longer to train and are more prone to overfitting.

A metric is a function that measures the quality of the model’s predictions using the validation set, and it is printed at the end of each epoch. This example uses error_rate, which tells us what percentage of images in the validation set are being classified incorrectly. Another common metric for classification is accuracy, which is just 1.0 - error_rate. A metric is different from the loss. The loss provides a measure of performance that the training system can use to update the weights automatically, so a good choice of loss is one that is easy for SGD to use. A metric is meant for human consumption: it tells you how close the model is to what you want it to do.
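The relationship between the two metrics can be written out in a few lines. These plain-Python functions borrow the names error_rate and accuracy for illustration only; they are not fastai’s implementations, and the predictions and labels are made up.

```python
def error_rate(predictions, labels):
    """Fraction of validation items classified incorrectly."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)

def accuracy(predictions, labels):
    """Fraction classified correctly, which is 1.0 minus the error rate."""
    return 1.0 - error_rate(predictions, labels)

# Made-up validation results: True means "cat".
labels      = [True, False, False, True]
predictions = [True, False, True,  True]

print(error_rate(predictions, labels), accuracy(predictions, labels))
```

One of the four made-up predictions is wrong, so the error rate is 0.25 and the accuracy is 0.75.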

The last method to describe is fine_tune. The architecture only describes a template for a mathematical function; it cannot do anything until we provide values for the parameters it contains. fine_tune tells fastai how to fit the parameters of the model so that it solves our problem. Its argument (1 here) is the number of epochs, that is, how many times to look at each image.

Zora Hirbodvash

I am a physicist, and I now work as a data scientist.