Deep Learning for Coders course (fast.ai): SGD for a linear model and the MNIST dataset with the fastai library

Zora Hirbodvash
Mar 26, 2021

This post is based on the course offered by Jeremy Howard and Rachel Thomas (https://course.fast.ai/). The material for this course is the book Deep Learning for Coders with fastai and PyTorch. Sincere thanks to the book's authors, Jeremy Howard and Sylvain Gugger. I also used information from https://docs.fast.ai/ in this blog.

A weight assignment is the current set of values of the model parameters. These weights need to be updated automatically to optimize the model. In other words, we need some automatic means of testing the effectiveness of the current weight assignment in terms of actual performance, together with a mechanism for altering the weight assignment so as to maximize that performance (an optimization algorithm).

There are seven steps to update the weights: initialize them, calculate the predictions, calculate the loss, calculate the gradients, step the weights, repeat the process, and stop. Here, we go through these seven steps for a simple example (a linear model) first and then work with the MNIST dataset.

We start with the linear model. Let's generate some noisy data:

x = torch.arange(0,100).float()
y = 8*x + 20 + torch.randn(100)*6
plt.scatter(x, y)

We speculate the data should be a straight line, hence:

def f(x, params):
    m, b = params
    return m*x + b

At this point, we can start by going through the seven steps to update the weight to optimize the model.

Step 1: Set some initial values for the weights

params = torch.randn(2).requires_grad_()

Step 2: Calculate the predictions

preds = f(x, params)

We can visualize everything to see how far the values of the predictions are from our target.

def show_preds(preds, ax=None):
    # detach from the computational graph so the values can be plotted
    predsplot = preds.detach().numpy()
    if ax is None: ax = plt.figure().add_axes([0.1,0.1,0.8,0.8])
    ax.scatter(x, y)
    ax.scatter(x, predsplot)

show_preds(preds)

Step 3: Calculate the loss

We are trying to find the function that fits the data best. Since the linear model is fully defined by the two parameters m and b, the problem reduces to finding the best values for those parameters. The best fit means that the difference between the predictions (obtained from the function) and the targets (y) should be small. A function that measures that difference is known as a loss function, and mean squared error is a common choice for continuous values.

def mse(preds,targets): return ((preds-targets)**2).mean()

loss=mse(preds,y)

loss

Step 4: Calculate the gradients

loss.backward()   # compute the gradient of the loss with respect to params
params.grad       # the gradients are stored in the .grad attribute

Step 5: Update the weights

lr = 1e-5

params.data -= lr * params.grad.data   # step in the opposite direction of the gradient
params.grad = None                     # reset the gradients so they don't accumulate

We can check if the loss has improved:

preds = f(x,params)

mse(preds, y)

Step 6: Repeat the process

def apply_step(params, prn=True):
    preds = f(x, params)
    loss = mse(preds, y)
    loss.backward()
    params.data -= lr * params.grad.data
    params.grad = None
    if prn: print(loss.item())
    return preds

for i in range(20): apply_step(params)

26225.740234375
107130.5
12706.787109375
1588.170654296875
278.917724609375
124.7435073852539
106.5823745727539
104.43724822998047
104.17802429199219
104.1406478881836
104.1296157836914
104.12178802490234
104.11418151855469
104.10661315917969
104.0989761352539
104.09158325195312
104.08395385742188
104.07635498046875
104.06875610351562
104.06132507324219

Step 7: Stop

We stopped after 20 epochs arbitrarily. In general, we decide when to stop by watching the training and validation losses and our metrics.
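In practice, a simple rule is to stop once the validation loss stops improving. Below is a minimal, hypothetical sketch of such a stopping rule; the valid_loss helper and the patience value are assumptions for illustration and are not part of the original post:

best_loss, patience, bad_epochs = float('inf'), 3, 0
for _ in range(100):
    apply_step(params, prn=False)
    vloss = valid_loss(params)   # assumed helper: mse of f(x, params) on a held-out validation set
    if vloss < best_loss:
        best_loss, bad_epochs = vloss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience: break   # stop when no improvement for `patience` epochs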

MNIST Dataset

Now, we can apply the same steps to a sample MNIST dataset.

First, we download a sample of MNIST (it only has images of the digits 3 and 7):

path = untar_data(URLs.MNIST_SAMPLE)

The ls method helps us see what's in the directory. The MNIST sample contains separate folders for the training set and the validation set. To start our model, we look at what is inside the training set: there is a folder of 3s and a folder of 7s. We can take a look inside these folders, using sorted to get the files in a consistent order. We then need to stack the individual image tensors in a collection into a single tensor; PyTorch provides a stack function for this purpose. We use the float type to make sure we can apply operations such as mean to our data if needed. When images are floats, the pixel values are expected to be between 0 and 1, so we divide them by 255.
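For example, the directory layout can be inspected like this (the contents noted in the comments are what the sample download typically contains, and are given only as a guide):

path.ls()                  # e.g. the folders 'train' and 'valid', plus 'labels.csv'
(path/'train').ls()        # the folders '3' and '7'
threes = (path/'train'/'3').ls().sorted()
threes[:3]                 # the first few image paths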

stacked_sevens = torch.stack([tensor(Image.open(i)) for i in (path/'train'/'7').ls().sorted()]).float()/255
stacked_threes = torch.stack([tensor(Image.open(i)) for i in (path/'train'/'3').ls().sorted()]).float()/255

We can use fastai's show_image function to display the tensors we built:

im3=stacked_threes[1]

show_image(im3)

We do the same thing for the validation set:

valid_3_tens = torch.stack([tensor(Image.open(o))
                            for o in (path/'valid'/'3').ls()])
valid_3_tens = valid_3_tens.float()/255
valid_7_tens = torch.stack([tensor(Image.open(o))
                            for o in (path/'valid'/'7').ls()])
valid_7_tens = valid_7_tens.float()/255

valid_3_tens.shape, valid_7_tens.shape

The images are our independent variables x, and we concatenate them into a single tensor. We also use the view method to change them from a rank-3 tensor (a list of matrices) to a rank-2 tensor (a list of vectors). The special value -1 tells view to make that axis just big enough to fit all the data. Finally, we need to label each image, so we use 1 for 3s and 0 for 7s.

train_x=torch.cat([stacked_threes,stacked_sevens]).view(-1,28*28)

train_y=tensor([1]*len(stacked_threes)+[0]*len(stacked_sevens)).unsqueeze(1)
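As a quick sanity check, we can look at the shapes (the counts below are what the standard MNIST_SAMPLE download gives; treat them as illustrative):

train_x.shape, train_y.shape
# (torch.Size([12396, 784]), torch.Size([12396, 1]))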

A Dataset in PyTorch is required to return a tuple of (x, y) when indexed. We can use the zip function for this purpose.

dset = list(zip(train_x,train_y))

x,y = dset[0]

x.shape,y

We create a DataLoader from our dataset so that we can train on mini-batches. A large batch size gives a more accurate and stable estimate of the gradients from the loss function; however, it takes longer per batch and processes fewer mini-batches per epoch.

dl=DataLoader(dset,batch_size=256)

xb,yb = first(dl)

xb.shape,yb.shape
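To see what a DataLoader does on its own, here is a small illustrative example on a toy collection (the shuffled output shown in the comment is just one possible result):

coll = range(15)
dl_toy = DataLoader(coll, batch_size=5, shuffle=True)
list(dl_toy)
# e.g. [tensor([ 3, 12,  8, 10,  2]), tensor([ 9,  4,  7, 14,  5]), tensor([ 1, 13,  0,  6, 11])]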

We will do the same steps for the validation set:

valid_x = torch.cat([valid_3_tens, valid_7_tens]).view(-1, 28*28)

valid_y = tensor([1]*len(valid_3_tens) + [0]*len(valid_7_tens)).unsqueeze(1)

valid_dset = list(zip(valid_x,valid_y))

valid_dl = DataLoader(valid_dset, batch_size=256)

Let's go through the seven steps to update the weights.

Step 1: Set some initial values for the weights

We have to define an initially random weight for every pixel:

def init_params(size, std=1.0): return (torch.randn(size)*std).requires_grad_()

weights = init_params((28*28,1))

bias = init_params(1)

Step 2: Calculate the predictions

We use a linear model to make predictions. To check things quickly, we also take a mini-batch of four images:

def linear1(xb): return xb@weights + bias

preds = linear1(train_x)

preds

batch = train_x[:4]

batch.shape

preds = linear1(batch)

preds

Step 3: Calculate the loss

Instead of mean squared error, we use a loss that measures how far each prediction is from its label: we squash the predictions to values between 0 and 1 with sigmoid, then take the distance from 1 for the 3s (labelled 1) and the distance from 0 for the 7s (labelled 0).

def mnist_loss(predictions, targets):
    # squash predictions into the range (0, 1)
    predictions = predictions.sigmoid()
    # distance from 1 where the target is 1, distance from 0 where it is 0
    return torch.where(targets==1, 1-predictions, predictions).mean()

loss = mnist_loss(preds, train_y[:4])

loss
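To make the torch.where part concrete, here is a small illustrative example with made-up predictions that are already between 0 and 1 (so the sigmoid step is skipped):

trgts = tensor([1, 0, 1])
prds  = tensor([0.9, 0.4, 0.2])
torch.where(trgts==1, 1-prds, prds)
# tensor([0.1000, 0.4000, 0.8000])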

Step 4: Calculate the gradients

loss.backward()

weights.grad.shape,weights.grad.mean(),bias.grad

Let’s put that all in a function:

def calc_grad(xb, yb, model):
    preds = model(xb)
    loss = mnist_loss(preds, yb)
    loss.backward()

and test it:

calc_grad(batch, train_y[:4], linear1)

weights.grad.mean(),bias.grad
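One thing to be aware of (this note is an addition for clarity): backward adds the new gradients to whatever is already stored in .grad, so calling calc_grad again without resetting makes the gradients accumulate:

calc_grad(batch, train_y[:4], linear1)
weights.grad.mean(), bias.grad   # roughly double the previous values

# reset the gradients before the next calculation
weights.grad.zero_()
bias.grad.zero_();

This is why the training loop below zeroes the gradients after every parameter update.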

Steps 5, 6 & 7: Update the weights, repeat the process, and stop

The last part is to update biases and weights based on the gradient and learning rate. Here is our basic training loop for an epoch:

def train_epoch(model, lr, params):
    for xb,yb in dl:
        calc_grad(xb, yb, model)
        for p in params:
            p.data -= p.grad*lr
            p.grad.zero_()

The performance of the model can be checked by the accuracy of the validation set:

def batch_accuracy(xb, yb):
    preds = xb.sigmoid()
    correct = (preds>0.5) == yb
    return correct.float().mean()

We can check it works:

batch_accuracy(linear1(batch), train_y[:4])

and then put the batches together:

def validate_epoch(model):
    accs = [batch_accuracy(model(xb), yb) for xb,yb in valid_dl]
    return round(torch.stack(accs).mean().item(), 4)

validate_epoch(linear1)

lr = 1.

params = weights,bias

train_epoch(linear1, lr, params)

validate_epoch(linear1)

for i in range(20):
    train_epoch(linear1, lr, params)
    print(validate_epoch(linear1), end=' ')

0.6298 0.8511 0.9268 0.9458 0.9521 0.9565 0.9604 0.9609 0.9639 0.9653 0.9673 0.9688 0.9692 0.9687 0.9697 0.9712 0.9717 0.9717 0.9717 0.9726

These steps can be wrapped up in an object called an optimizer. In this post, we have built the general foundation that such an object provides.
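As a minimal sketch of what such an optimizer object looks like (this class is an illustration in the spirit of the course material, not PyTorch's own implementation):

class BasicOptim:
    def __init__(self, params, lr): self.params, self.lr = list(params), lr

    def step(self, *args, **kwargs):
        # update each parameter in the opposite direction of its gradient
        for p in self.params: p.data -= p.grad.data * self.lr

    def zero_grad(self, *args, **kwargs):
        # reset the gradients so they don't accumulate across batches
        for p in self.params: p.grad = None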

PyTorch provides useful classes that make implementing this general foundation much easier. First, PyTorch's nn.Linear module can replace our linear1 function: it does the work of init_params and linear1 together and holds both the weights and the bias in a single class. We can use its parameters method to see which parameters it has that can be trained. fastai also provides the SGD class which, by default, does the same thing as the basic optimizer above:

linear_model = nn.Linear(28*28,1)
w,b = linear_model.parameters()
w.shape,b.shape

opt = SGD(linear_model.parameters(), lr)

def train_epoch(model):
    for xb,yb in dl:
        calc_grad(xb, yb, model)
        opt.step()        # update the parameters
        opt.zero_grad()   # reset the gradients

validate_epoch(linear_model)

def train_model(model, epochs):
    for i in range(epochs):
        train_epoch(model)
        print(validate_epoch(model), end=' ')

train_model(linear_model, 20)

# we can recreate the model and optimizer from scratch and train again
linear_model = nn.Linear(28*28,1)
opt = SGD(linear_model.parameters(), lr)
train_model(linear_model, 20)
