A general overview of neural networks

To understand deep learning, we first need to understand a single neuron, which is modeled mathematically as a perceptron. This idea can then be extended to the multilayer perceptron and, finally, to deep neural networks.

Let's start by defining what a perceptron is and why the model needs adjustable values (weights and biases).

A perceptron is a simple form of neural network. It is modeled on a biological neuron, whose three main parts are the dendrites that receive signals, the cell body that processes them, and the axon that carries the output. Each of these biological units can be replaced by a mathematical expression. Mathematically, the simplest perceptron takes two inputs (x1 and x2) into a single node, called the perceptron or neuron. The perceptron applies some function f(x1, x2) to the inputs and produces an output y. In a very simple example, f could be a sum or a difference, so the model would be y = x1 + x2 or y = x1 - x2. We can make the perceptron more flexible by adding an activation function.

For the perceptron to learn, it needs adjustable parameters that can be tuned to correct the output y. We therefore multiply the inputs by weights (w1 and w2) and add biases (b1 and b2). If the output is wrong, we adjust these weights and biases to bring y closer to the expected value. If the perceptron is a sum function, this gives y = (w1*x1 + b1) + (w2*x2 + b2). Because b1 and b2 simply add up to one constant, a single bias b = b1 + b2 is enough. In general, we write z = w*x + b, where w*x is summed over all inputs (w1*x1 + w2*x2 + ...). The perceptron then computes f(z).
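The weighted perceptron can be sketched in a few lines of Python. This sketch assumes plain Python lists for the inputs and weights and a single bias per neuron. The function name `perceptron` and the default identity activation are illustrative choices, not part of any library.

```python
def perceptron(x, w, b, activation=lambda z: z):
    """Compute f(z) with z = sum_i w_i * x_i + b (identity activation by default)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(z)


# With w = [1, 1] and b = 0 this reduces to the plain sum y = x1 + x2.
print(perceptron([2.0, 3.0], [1.0, 1.0], 0.0))    # 5.0
# With w = [1, -1] it becomes the difference y = x1 - x2.
print(perceptron([2.0, 3.0], [1.0, -1.0], 0.0))   # -1.0
# Separate biases b1 = 0.5 and b2 = 0.25 behave like one bias b = 0.75.
print(perceptron([2.0, 3.0], [0.5, -1.0], 0.75))  # -1.25
```

Passing a different `activation` (for example a step or sigmoid function) turns the same weighted sum into the f(z) described above.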

A single layer is not enough for most problems, because a single perceptron can only separate classes with a straight line (or, in higher dimensions, a hyperplane). Most researchers therefore stack several layers of neurons to get good performance. A multilayer perceptron connects many perceptrons into such a network. Keep in mind that there are different types of layers and different network configurations. In a multilayer perceptron, information flows from the input layer (the first layer) to the output layer (the last layer). The layers are fully connected: every neuron in one layer is connected to every neuron in the next layer. The layers between the input and output layers are called hidden layers, and there may be one or several of them. Hidden layers are difficult to interpret because of their high interconnectivity and their distance from the input and output layers. A network with two or more hidden layers is called a deep neural network.
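A fully connected forward pass can be sketched in plain Python. The layer sizes (2 inputs, two hidden layers of 3 neurons, 1 output), the weight values, and the helper names `dense_layer` and `forward` are hypothetical choices made for illustration. Real frameworks store the weights as matrices and perform the same arithmetic with matrix multiplication.

```python
import math


def dense_layer(inputs, weights, biases, activation):
    """One fully connected layer: every neuron receives every input.

    weights[j][i] is the weight from input i to neuron j; biases[j] is neuron j's bias.
    """
    return [
        activation(sum(w_ji * x_i for w_ji, x_i in zip(row, inputs)) + b_j)
        for row, b_j in zip(weights, biases)
    ]


def forward(x, layers):
    """Pass x through each (weights, biases, activation) layer in order."""
    for weights, biases, activation in layers:
        x = dense_layer(x, weights, biases, activation)
    return x


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical 2-3-3-1 network: two hidden layers, so "deep" in the sense above.
# The weights are arbitrary illustration values, not trained parameters.
layers = [
    ([[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]], [0.0, 0.1, -0.1], math.tanh),
    ([[0.3, -0.2, 0.5], [0.1, 0.4, -0.6], [-0.3, 0.2, 0.2]], [0.05, 0.0, 0.1], math.tanh),
    ([[0.6, -0.1, 0.4]], [0.0], sigmoid),
]
print(forward([1.0, 2.0], layers))  # a single value between 0 and 1
```

Each row of a weight matrix belongs to one neuron and has one entry per neuron in the previous layer. That is what "fully connected" means in code.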

The perceptron is an algorithm for supervised learning of binary classifiers. Consider a binary classification problem with a single output that takes either the value 0 or 1. To complete the model, we need to choose an activation function f(z). For a single output, the simplest choice is a step function, which outputs 1 when z reaches a threshold and 0 otherwise. A smoother alternative is the sigmoid (or logistic) function, which maps any z to a value between 0 and 1. Other common activation functions include the hyperbolic tangent (tanh), the rectified linear unit (ReLU), and the Gaussian. Many other activation functions can also be used in this model.
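Here is a small Python sketch of the activation functions named above. Several choices here are assumptions: the step threshold at z = 0, the Gaussian form exp(-z^2) (one common variant), and the function names. The sigmoid is written in a numerically stable form so that very large negative z does not overflow `math.exp`.

```python
import math


def step(z):
    """Unit step: 1 when z >= 0, otherwise 0 (threshold at 0 is a convention)."""
    return 1.0 if z >= 0 else 0.0


def sigmoid(z):
    """Logistic function: squashes any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow of exp(-z) for very negative z
    return e / (1.0 + e)


def tanh(z):
    """Hyperbolic tangent: squashes z into (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)


def gaussian(z):
    """Gaussian activation exp(-z^2), peaking at 1 when z = 0."""
    return math.exp(-z * z)


for z in (-2.0, 0.0, 2.0):
    print(z, step(z), round(sigmoid(z), 3), round(tanh(z), 3), relu(z), round(gaussian(z), 3))
```

The step function jumps abruptly from 0 to 1, while the sigmoid changes smoothly. This smoothness is what lets small changes in the weights produce small changes in the output.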

In a multi-class problem, the output layer has multiple neurons instead of a single output. Multi-class problems come in two main types. With non-exclusive classes, a data point can belong to several classes or categories at once. With mutually exclusive classes, each data point belongs to exactly one class. Each type has a suitable output activation. For non-exclusive classes, a sigmoid on each output neuron gives an independent probability per class. For mutually exclusive classes, the softmax function turns the outputs into probabilities that sum to 1.
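The two multi-class cases can be contrasted in a short Python sketch. It assumes a hypothetical 3-neuron output layer with made-up raw scores. The helper names `softmax` and `independent_sigmoids` are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    """Turn raw scores into probabilities that sum to 1 (mutually exclusive classes)."""
    m = max(zs)  # subtracting the max avoids overflow and does not change the result
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def independent_sigmoids(zs):
    """One sigmoid per output neuron: each class is a separate yes/no (non-exclusive)."""
    return [sigmoid(z) for z in zs]


scores = [2.0, 1.0, -1.0]  # hypothetical raw outputs z of a 3-neuron output layer
print(softmax(scores))               # sums to 1; the largest entry is the predicted class
print(independent_sigmoids(scores))  # each in (0, 1); threshold each one separately
```

With softmax, raising one class's probability lowers the others. With independent sigmoids, several classes can be "on" at the same time.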

Finally, the cost function (or loss function) measures how far the network's predictions are from the true values. In supervised learning, we evaluate the model by comparing the network's estimated outputs with the real outputs (the labelled data). During training, this comparison uses only the training set, so that we can go back and update the weights and biases as needed. On the test set, we only measure how well the model performs and do not update any parameters. One of the most common loss functions is the quadratic cost function. Gradient descent is then used to minimize this cost. It repeatedly adjusts the weights and biases in the direction that reduces the error.
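One common way to write the quadratic cost and the gradient descent update is shown below. The symbols are introduced here for illustration and do not appear in the text above: n is the number of training examples, y(x) is the label of example x, a(x) is the network's output for it, and η is the learning rate. The factor 1/2 is a convention that cancels when differentiating.

```latex
C(w, b) = \frac{1}{2n} \sum_{x} \lVert y(x) - a(x) \rVert^{2}

w \leftarrow w - \eta \, \frac{\partial C}{\partial w},
\qquad
b \leftarrow b - \eta \, \frac{\partial C}{\partial b}
```

Each update moves the weights and biases a small step (of size set by η) against the gradient, which is the direction in which the cost decreases fastest locally.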

In summary, a neural network takes in inputs, multiplies them by weights, and adds biases. The result is passed through an activation function, and after the last layer this produces the output(s). Finally, the loss function is calculated, and the parameters are updated by gradient descent if needed.
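The whole cycle (weighted sum plus bias, activation, quadratic cost, gradient descent update) can be put together in one small Python sketch. This sketch trains a single sigmoid neuron on the logical OR function, a hypothetical toy data set chosen because it is linearly separable. The learning rate, epoch count, zero initialization, and helper names are assumptions, not tuned or prescribed values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_perceptron(data, epochs=5000, lr=1.0):
    """Fit one sigmoid neuron with the quadratic cost and full-batch gradient descent.

    data: list of ((x1, x2), y) pairs with y in {0, 1}.
    """
    w = [0.0, 0.0]
    b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for (x1, x2), y in data:
            # Forward pass: weighted sum plus bias, then the activation.
            a = sigmoid(w[0] * x1 + w[1] * x2 + b)
            # dC/dz for C = (1/2n) * sum (a - y)^2, using sigmoid'(z) = a * (1 - a).
            delta = (a - y) * a * (1.0 - a) / n
            grad_w[0] += delta * x1
            grad_w[1] += delta * x2
            grad_b += delta
        # Gradient descent step: move each parameter against its gradient.
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Threshold the sigmoid output at 0.5 to get a 0/1 class."""
    return 1 if sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5 else 0


# Logical OR as a toy labelled training set.
or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(or_data)
print([predict(w, b, x) for x, _ in or_data])  # expected [0, 1, 1, 1]
```

A single neuron suffices here because OR is linearly separable. A problem such as XOR would need at least one hidden layer.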

Ph.D. Candidate in Photonics (University of Ottawa)