Deep Learning for Coders course (fast.ai): SGD for a linear model and the MNIST dataset with the fastai library
This post is based on the course offered by Jeremy Howard and Rachel Thomas (https://course.fast.ai/). The material for this course is a book named Deep Learning for Coders with fastai and PyTorch. Sincere thanks to the book authors, Jeremy Howard and Sylvain Gugger. I also used https://docs.fast.ai/ information in this blog.
A weight assignment is the current set of values of the model's parameters. These weights need to be updated automatically to optimize the model. In other words, we need an automatic means of testing the effectiveness of the current weight assignment in terms of actual performance, and a mechanism for altering the weight assignment to maximize that performance (an optimization algorithm).
There are seven steps to update the weights. We first go through these seven steps for a simple example (a linear model) and then apply them to the MNIST dataset.
Let's start with the linear model and generate some random data:
import torch
import matplotlib.pyplot as plt
x = torch.arange(0,100).float()
y = 8*x + 20 + torch.randn(100) * 6
plt.scatter(x, y, marker='.')
We assume the data follow a straight line, so we model them with a linear function:
def f(x, params):
    m, b = params
    return m*x + b
At this point, we can go through the seven steps to update the weights and optimize the model.
Step 1: Set some initial values for the weights
params=torch.randn(2).requires_grad_()
Step 2: Calculate the predictions
preds=f(x,params)
We can plot the predictions against the targets to see how far apart they are.
def show_preds(preds, ax=None):
    if ax is None: ax = plt.subplots()[1]
    ax.scatter(x, y)
    # detach() is needed because preds is part of the autograd graph
    ax.scatter(x, preds.detach().numpy())
show_preds(preds)
Step 3: Calculate the loss
We want to find the function that best fits the data. Since the linear model is fully defined by the two parameters m and b, the problem reduces to finding the best values for those parameters. The best fit means that the difference between the predictions (obtained from the function) and the target values (y) is as small as possible. A function that measures this difference is known as a loss function, and mean squared error is a common choice for continuous values.
def mse(preds,targets): return ((preds-targets)**2).mean()
loss=mse(preds,y)
loss
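To make the loss concrete, here is a small sketch in plain Python (no PyTorch) that computes the mean squared error by hand for three made-up predictions. The helper name mse_plain and the numbers are illustrative only and are not part of the course code.

```python
def mse_plain(preds, targets):
    """Mean of the squared differences between predictions and targets."""
    squared_errors = [(p - t) ** 2 for p, t in zip(preds, targets)]
    return sum(squared_errors) / len(squared_errors)

# Made-up values: two predictions are exact, one is off by 3.
example_preds = [10.0, 20.0, 33.0]
example_targets = [10.0, 20.0, 30.0]
example_loss = mse_plain(example_preds, example_targets)
print(example_loss)  # (0 + 0 + 9) / 3 = 3.0
```

Squaring means that one large error costs more than several small ones, which is why a single outlier can dominate this loss.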
Step 4: Calculate the gradients
loss.backward()
params.grad
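loss.backward() computes these gradients automatically. For this particular model, the derivatives of the MSE can also be written by hand, which gives a way to see what params.grad holds. The sketch below, in plain Python with made-up data, compares the hand-derived gradients with a finite-difference estimate. All function names here are illustrative assumptions, not PyTorch or fastai API.

```python
def f_plain(x, m, b):
    return m * x + b

def mse_loss(xs, ys, m, b):
    return sum((f_plain(x, m, b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def analytic_grads(xs, ys, m, b):
    """d(loss)/dm = mean(2 * error * x), d(loss)/db = mean(2 * error)."""
    n = len(xs)
    errors = [f_plain(x, m, b) - y for x, y in zip(xs, ys)]
    grad_m = sum(2 * e * x for e, x in zip(errors, xs)) / n
    grad_b = sum(2 * e for e in errors) / n
    return grad_m, grad_b

def numeric_grads(xs, ys, m, b, eps=1e-6):
    """Central finite differences, used only as a cross-check."""
    grad_m = (mse_loss(xs, ys, m + eps, b) - mse_loss(xs, ys, m - eps, b)) / (2 * eps)
    grad_b = (mse_loss(xs, ys, m, b + eps) - mse_loss(xs, ys, m, b - eps)) / (2 * eps)
    return grad_m, grad_b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # exactly y = 2x + 1
m0, b0 = 0.5, 0.0           # arbitrary starting parameters
print(analytic_grads(xs, ys, m0, b0))
print(numeric_grads(xs, ys, m0, b0))
```

Both gradients are negative here, which says that increasing m and b would reduce the loss. That is the direction the update step moves in.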
Step 5: Update the weights
lr = 1e-5
params.data -= lr * params.grad.data
params.grad = None
We can check if the loss has improved:
preds = f(x,params)
mse(preds, y)
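The learning rate lr controls how far each update moves the parameters. As a rough illustration, the sketch below applies a single update to a one-parameter model y = m*x in plain Python, once with a small step and once with a step that is too large. The data, the learning rates, and the helper names are made up for this example.

```python
def loss_and_grad(m, xs, ys):
    """MSE of the model y = m*x and its derivative with respect to m."""
    n = len(xs)
    errors = [m * x - y for x, y in zip(xs, ys)]
    loss = sum(e * e for e in errors) / n
    grad = sum(2 * e * x for e, x in zip(errors, xs)) / n
    return loss, grad

def one_step(m, xs, ys, lr):
    """Return the loss before and after a single gradient-descent update."""
    loss_before, grad = loss_and_grad(m, xs, ys)
    m_new = m - lr * grad
    loss_after, _ = loss_and_grad(m_new, xs, ys)
    return loss_before, loss_after

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # true slope is 2

small = one_step(0.0, xs, ys, lr=0.01)
large = one_step(0.0, xs, ys, lr=0.5)
print(small)  # loss goes down
print(large)  # loss goes up: the step overshoots the minimum
```

A loss that rises after an update is therefore a hint that the learning rate may be too high for the scale of the inputs.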
Step 6: Repeat the process
def apply_step(params, prn=True):
    preds = f(x, params)
    loss = mse(preds, y)
    loss.backward()
    params.data -= params.grad.data * lr
    params.grad = None
    if prn: print(loss.item())
    return preds
for i in range(20): apply_step(params)
26225.740234375
107130.5
12706.787109375
1588.170654296875
278.917724609375
124.7435073852539
106.5823745727539
104.43724822998047
104.17802429199219
104.1406478881836
104.1296157836914
104.12178802490234
104.11418151855469
104.10661315917969
104.0989761352539
104.09158325195312
104.08395385742188
104.07635498046875
104.06875610351562
104.06132507324219
Step 7: Stop
We stopped after 20 epochs arbitrarily. In general, we would decide when to stop based on the training and validation losses and on our metrics.
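One common way to decide when to stop is to watch the validation loss and stop once it has not improved for a few epochs. The sketch below is an assumption about how such a rule could look, not something taken from the course code. should_stop and patience are hypothetical names, and the loss values are made up.

```python
def should_stop(valid_losses, patience=3):
    """Return True when the best validation loss is at least `patience` epochs old."""
    if len(valid_losses) <= patience:
        return False
    best_epoch = valid_losses.index(min(valid_losses))
    epochs_since_best = len(valid_losses) - 1 - best_epoch
    return epochs_since_best >= patience

# Made-up validation losses: improving at first, then getting worse.
history = []
for epoch, loss in enumerate([0.90, 0.60, 0.45, 0.40, 0.41, 0.43, 0.44, 0.47]):
    history.append(loss)
    if should_stop(history):
        print(f"stopping after epoch {epoch}")
        break
```

With these numbers the best loss occurs at epoch 3, so the loop stops three epochs later and never sees the last value.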
MNIST Dataset
Now, we can apply the same steps to a sample MNIST dataset.
First, we download a sample of MNIST (it contains images of only the digits 3 and 7):
from fastai.vision.all import *
path = untar_data(URLs.MNIST_SAMPLE)
The ls method shows what is in the directory. The MNIST sample contains a train folder and a valid folder (plus a labels.csv file). Inside the training set there is a folder of 3s and a folder of 7s. We look inside these folders using sorted so that we always get the files in the same order. We then need to stack the individual image tensors into a single tensor, which is what PyTorch's stack function does. We convert to float so that operations such as mean work on our data. When images are stored as floats, the pixel values are expected to lie between 0 and 1, so we divide by 255.
stacked_sevens = torch.stack([tensor(Image.open(i)) for i in (path/'train'/'7').ls().sorted()]).float()/255
stacked_threes = torch.stack([tensor(Image.open(i)) for i in (path/'train'/'3').ls().sorted()]).float()/255
We can use fastai's show_image function to display one of the tensors we built.
im3=stacked_threes[1]
show_image(im3)
We do the same thing for validation set:
valid_3_tens = torch.stack([tensor(Image.open(o))
                            for o in (path/'valid'/'3').ls()])
valid_3_tens = valid_3_tens.float()/255
valid_7_tens = torch.stack([tensor(Image.open(o))
                            for o in (path/'valid'/'7').ls()])
valid_7_tens = valid_7_tens.float()/255
valid_3_tens.shape,valid_7_tens.shape
The independent variable x is the set of images, which we concatenate into a single tensor. We also use the view method to change it from a rank-3 tensor (a list of matrices) to a rank-2 tensor (a list of vectors). Passing -1 to view is a special value that means "make this axis as big as necessary to fit all the data". We also need a label for each image, so we use 1 for 3s and 0 for 7s.
train_x=torch.cat([stacked_threes,stacked_sevens]).view(-1,28*28)
train_y=tensor([1]*len(stacked_threes)+[0]*len(stacked_sevens)).unsqueeze(1)
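To see what view(-1, 28*28) does to the shape, here is a small sketch in plain Python that uses nested lists and 2x2 "images" instead of tensors and 28x28 ones. flatten_images and infer_rows are illustrative helpers, not PyTorch functions.

```python
def flatten_images(images):
    """Turn a list of 2-D images (rank 3 overall) into a list of 1-D vectors (rank 2)."""
    return [[pixel for row in image for pixel in row] for image in images]

def infer_rows(total_elements, row_length):
    """What -1 stands for in view(-1, row_length): the only row count that fits."""
    assert total_elements % row_length == 0, "row_length must divide the element count"
    return total_elements // row_length

# Three made-up 2x2 "images" instead of 28x28 ones.
images = [
    [[0, 1], [2, 3]],
    [[4, 5], [6, 7]],
    [[8, 9], [10, 11]],
]
flat = flatten_images(images)
print(flat)                       # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
print(infer_rows(3 * 2 * 2, 4))   # 3
```

Each image becomes one row, so the first axis still indexes images and the second axis now indexes pixels.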
A Dataset in PyTorch is required to return a tuple of (x,y) when indexed. Python's zip function, combined with list, gives a simple way to get this behaviour:
dset = list(zip(train_x,train_y))
x,y = dset[0]
x.shape,y
We create a DataLoader from our dataset. We train the model on mini-batches, so we need to choose a batch size. A larger batch size gives a more accurate and stable estimate of the gradients of the loss over the dataset, but each batch takes longer to process and there are fewer mini-batches (and so fewer weight updates) per epoch.
dl=DataLoader(dset,batch_size=256)
xb,yb = first(dl)
xb.shape,yb.shape
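The batching idea itself is simple and can be sketched in plain Python. The version below only slices a list of (x, y) pairs into consecutive chunks. The real DataLoader also collates the items into tensors and can shuffle them, which this sketch leaves out. iter_batches is a hypothetical helper name.

```python
def iter_batches(dataset, batch_size):
    """Yield consecutive (xs, ys) mini-batches from a list of (x, y) pairs."""
    for start in range(0, len(dataset), batch_size):
        chunk = dataset[start:start + batch_size]
        xs = [x for x, _ in chunk]
        ys = [y for _, y in chunk]
        yield xs, ys

# Ten made-up (x, y) pairs, batch size 4 -> batches of 4, 4 and 2 items.
toy_dset = list(zip(range(10), [i % 2 for i in range(10)]))
batch_sizes = [len(xs) for xs, ys in iter_batches(toy_dset, batch_size=4)]
print(batch_sizes)  # [4, 4, 2]
```

The last batch is smaller whenever the dataset size is not a multiple of the batch size.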
We will do the same steps for the validation set:
valid_x = torch.cat([valid_3_tens, valid_7_tens]).view(-1, 28*28)
valid_y = tensor([1]*len(valid_3_tens) + [0]*len(valid_7_tens)).unsqueeze(1)
valid_dset = list(zip(valid_x,valid_y))
valid_dl = DataLoader(valid_dset, batch_size=256)
Let's go through the seven steps to update the weights.
Step 1: Set some initial values for the weights
We have to define an initially random weight for every pixel:
def init_params(size, std=1.0): return (torch.randn(size)*std).requires_grad_()
weights = init_params((28*28,1))
bias = init_params(1)
Step 2: Calculate the predictions
We use a linear model to compute the predictions:
def linear1(xb): return xb@weights + bias
preds = linear1(train_x)
preds
batch = train_x[:4]
batch.shape
preds = linear1(batch)
preds
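The expression xb@weights + bias is a matrix multiplication followed by an addition: for every image in the batch, each pixel is multiplied by its weight, the products are summed, and the bias is added. A minimal plain-Python sketch of the same arithmetic, with 3 made-up "pixels" per image instead of 784, could look like this (linear_plain is an illustrative name):

```python
def linear_plain(batch, weights, bias):
    """For each row of the batch, return sum(pixel * weight) + bias."""
    return [sum(p * w for p, w in zip(row, weights)) + bias for row in batch]

# Two made-up "images" with 3 pixels each instead of 784.
toy_batch = [[0.0, 0.5, 1.0],
             [1.0, 0.0, 0.0]]
toy_weights = [0.2, -0.4, 0.6]
toy_bias = 0.1
toy_preds = linear_plain(toy_batch, toy_weights, toy_bias)
print(toy_preds)  # one raw score per image
```

The result has one number per image, which matches the shape of preds above: one column and as many rows as there are images in the batch.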
Step 3: Calculate the loss
def mnist_loss(predictions, targets):
    predictions = predictions.sigmoid()
    return torch.where(targets==1, 1-predictions, predictions).mean()
loss = mnist_loss(preds, train_y[:4])
loss
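The loss first squashes each raw prediction into the range 0 to 1 with a sigmoid and then measures how far that value is from the target. The plain-Python sketch below spells out the same logic item by item. The function names and input values are made up for illustration.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1 / (1 + math.exp(-z))

def mnist_loss_plain(raw_preds, targets):
    """Mean distance between sigmoid(prediction) and the 0/1 target."""
    per_item = []
    for z, target in zip(raw_preds, targets):
        p = sigmoid(z)
        # Target 1 (a 3): we want p near 1, so the loss is 1 - p.
        # Target 0 (a 7): we want p near 0, so the loss is p.
        per_item.append(1 - p if target == 1 else p)
    return sum(per_item) / len(per_item)

# Made-up raw outputs: confident and right, confident and wrong, undecided.
print(mnist_loss_plain([4.0], [1]))   # small loss
print(mnist_loss_plain([4.0], [0]))   # large loss
print(mnist_loss_plain([0.0], [1]))   # 0.5
```

Unlike accuracy, this loss changes smoothly when the weights change a little, which is what makes it usable for gradient descent.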
Step 4: Calculate the gradients
loss.backward()
weights.grad.shape,weights.grad.mean(),bias.grad
Let's put that all in a function:
def calc_grad(xb, yb, model):
    preds = model(xb)
    loss = mnist_loss(preds, yb)
    loss.backward()
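and test it. Note that loss.backward() adds to any gradients already stored in .grad. Since we already called it once in Step 4, we zero the gradients before the call and again afterwards:
weights.grad.zero_(); bias.grad.zero_()
calc_grad(batch, train_y[:4], linear1)
weights.grad.mean(),bias.grad
weights.grad.zero_(); bias.grad.zero_()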
Steps 5, 6 and 7: Update the weights, repeat the process, and stop
The last part is to update the weights and the bias based on the gradients and the learning rate. Here is our basic training loop for an epoch:
def train_epoch(model, lr, params):
    for xb,yb in dl:
        calc_grad(xb, yb, model)
        for p in params:
            p.data -= p.grad*lr
            p.grad.zero_()
The performance of the model can be checked with its accuracy on the validation set:
def batch_accuracy(xb, yb):
    preds = xb.sigmoid()
    correct = (preds>0.5) == yb
    return correct.float().mean()
We can check it works:
batch_accuracy(linear1(batch), train_y[:4])
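As a worked example of the accuracy calculation, the sketch below repeats the same logic in plain Python on four made-up raw outputs: apply the sigmoid, threshold at 0.5, and compare with the 0/1 label. batch_accuracy_plain is an illustrative name.

```python
import math

def batch_accuracy_plain(raw_preds, targets):
    """Fraction of items where (sigmoid(pred) > 0.5) matches the 0/1 target."""
    correct = 0
    for z, target in zip(raw_preds, targets):
        prob = 1 / (1 + math.exp(-z))
        predicted_label = 1 if prob > 0.5 else 0
        correct += int(predicted_label == target)
    return correct / len(targets)

# Made-up raw outputs for four images; the third one is misclassified.
toy_accuracy = batch_accuracy_plain([2.0, -1.5, 0.3, -0.2], [1, 0, 0, 0])
print(toy_accuracy)  # 0.75
```

Since the sigmoid of a number is above 0.5 exactly when the number is positive, the threshold amounts to checking the sign of the raw output.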
Then we put the batches together:
def validate_epoch(model):
    accs = [batch_accuracy(model(xb), yb) for xb,yb in valid_dl]
    return round(torch.stack(accs).mean().item(), 4)
validate_epoch(linear1)
lr = 1.
params = weights,bias
train_epoch(linear1, lr, params)
validate_epoch(linear1)
for i in range(20):
    train_epoch(linear1, lr, params)
    print(validate_epoch(linear1), end=' ')
0.6298 0.8511 0.9268 0.9458 0.9521 0.9565 0.9604 0.9609 0.9639 0.9653 0.9673 0.9688 0.9692 0.9687 0.9697 0.9712 0.9717 0.9717 0.9717 0.9726
In PyTorch, these steps are usually handled by an object called an optimizer. What we have built so far is the general foundation for such an object.
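To give a feel for what an optimizer object does, here is a minimal sketch with the same two methods used in the training loop, step and zero_grad. It works on plain Python floats instead of tensors. Param and BasicOptim are illustrative classes written for this post, not the PyTorch or fastai implementation, and the gradient is written by hand where loss.backward() would normally compute it.

```python
class Param:
    """Hypothetical stand-in for a tensor: a value plus its stored gradient."""
    def __init__(self, data):
        self.data = data
        self.grad = None

class BasicOptim:
    """Minimal SGD-style optimizer: step() updates, zero_grad() resets."""
    def __init__(self, params, lr):
        self.params = list(params)
        self.lr = lr

    def step(self):
        for p in self.params:
            p.data -= p.grad * self.lr

    def zero_grad(self):
        for p in self.params:
            p.grad = None

# Minimise (w - 3)^2 by hand: its derivative is 2 * (w - 3).
w_toy = Param(0.0)
opt_sketch = BasicOptim([w_toy], lr=0.1)
for _ in range(50):
    w_toy.grad = 2 * (w_toy.data - 3)   # stands in for loss.backward()
    opt_sketch.step()
    opt_sketch.zero_grad()
print(w_toy.data)  # close to 3
```

Keeping the update rule inside one object means the training loop no longer needs to know how the parameters are updated, only that step and zero_grad exist.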
PyTorch also provides classes that make this easier to implement. First, the nn.Linear module can replace our linear1 function. nn.Linear does the job of init_params and linear1 together, and holds both the weights and the bias in a single class. We can use its parameters method to see which trainable parameters it has. fastai provides the SGD class, which by default performs the same update step as our manual training loop:
linear_model = nn.Linear(28*28,1)
w,b = linear_model.parameters()
w.shape,b.shape
# The optimizer must exist before train_epoch is called
opt = SGD(linear_model.parameters(), lr)
def train_epoch(model):
    for xb,yb in dl:
        calc_grad(xb, yb, model)
        opt.step()
        opt.zero_grad()
validate_epoch(linear_model)
def train_model(model, epochs):
    for i in range(epochs):
        train_epoch(model)
        print(validate_epoch(model), end=' ')
train_model(linear_model, 20)