Introduction to Neural Networks

Neural Networks are much more complex than other Machine Learning models in the sense that they represent mathematical functions with millions of coefficients (the parameters). With such power, it is possible to train the machine on much more complicated tasks (object recognition, facial recognition, sentiment analysis, natural language analysis…).
However, developing such a complex function has a cost. To achieve this, it is often necessary to provide:
- A much larger dataset (millions of examples)
- A longer training time (sometimes several days)
- More computing capacity
To meet these challenges, researchers in the field have developed variants of Gradient Descent, as well as other techniques to compute derivatives more easily over millions of examples. Some of these solutions are:
- Mini-Batch Gradient Descent: a technique in which the dataset is split into small batches to simplify the gradient computation at each iteration (see the sketch after this list).
- Batch Normalization: scaling the internal input and output variables of the Neural Network to avoid extreme values in the gradient computations.
- Distributed Deep Learning: using the Cloud to split the task and assign it to several machines.
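As an illustration, here is a minimal Python sketch of Mini-Batch Gradient Descent. The grad_fn helper (which returns the gradient of the cost on a given batch), the learning rate, and the batch size are assumptions made for the example, not part of the article:

```python
import numpy as np

def minibatch_gradient_descent(X, y, theta, grad_fn, lr=0.01, batch_size=32, epochs=10):
    m = X.shape[0]
    for _ in range(epochs):
        indices = np.random.permutation(m)  # shuffle the examples once per epoch
        for start in range(0, m, batch_size):
            batch = indices[start:start + batch_size]
            # update the parameters using the gradient of this small batch only
            theta = theta - lr * grad_fn(X[batch], y[batch], theta)
    return theta
```

Each update touches only batch_size examples, which keeps the gradient computation cheap even when the dataset contains millions of rows.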
Understanding Neural Networks
This is what a Neural Network looks like:

You notice an input layer on the left, an output layer on the right, and several hidden layers.
The small circles are called neurons and represent activation functions. Let’s start by analyzing what happens in one neuron.
One-Neuron Network: The Perceptron
The simplest neural network that exists is called a perceptron. The inputs of the neuron are the features x multiplied by the parameters θ that must be learned. The computation performed by the neuron can be divided into two steps:
- The neuron computes the weighted sum z of all its inputs: z = ∑ θᵢxᵢ. This is a linear computation.
- The neuron passes z through its activation function, here the sigmoid (logistic) function σ(z) = 1 / (1 + e⁻ᶻ). This is a non-linear computation.

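Here is what this two-step computation could look like in a few lines of Python (the feature and parameter values are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    # the logistic activation function
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, theta):
    z = np.dot(x, theta)  # step 1: the linear sum z = ∑ θᵢxᵢ
    return sigmoid(z)     # step 2: the non-linear activation

x = np.array([0.5, -1.2, 3.0])      # example features
theta = np.array([0.8, 0.1, -0.4])  # example learned parameters
print(perceptron(x, theta))         # a value between 0 and 1
```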
Note: We often use activation functions other than the sigmoid, to simplify the gradient computation and obtain faster learning cycles:
- The hyperbolic tangent function tanh(z)
- The ReLU function ReLU(z) = max(0, z)
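Both alternatives are one-liners in numpy; this small comparison (on an arbitrary input) shows how they map the same z to different ranges:

```python
import numpy as np

z = np.array([-2.0, 0.0, 2.0])
print(np.tanh(z))          # tanh squashes z into (-1, 1)
print(np.maximum(0.0, z))  # ReLU keeps positive values and zeroes out the rest
```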
Multi-Neuron Networks: Deep Learning
To create a Neural Network, you just have to take several of these perceptrons and connect them to each other in a particular way:
- We assemble the neurons in columns (we say that we assemble them in a layer). Within a layer, the neurons are not connected to each other.
- We connect all the outputs of the neurons of one layer to the inputs of all the neurons of the layer that follows.
We can build a network with as many layers and neurons as we want. More layers means the network is deeper and the model richer, but also harder to train.
Here is an example of a network with 5 neurons (and 3 layers). All layers between the input layer and the output layer are called hidden layers, because we do not have access to their inputs/outputs.

In detail, a simpler network (with 3 neurons) would give us the output a₃ = σ(θ₁a₁ + θ₂a₂), where θ₁ and θ₂ are the coefficients of the connections between neurons a₁ → a₃ and a₂ → a₃. These are the parameters of our model.
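Plugging made-up values into this formula makes it concrete:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

a1, a2 = 0.6, 0.9           # hypothetical outputs of the first two neurons
theta1, theta2 = 0.4, -0.7  # hypothetical parameters for a1 -> a3 and a2 -> a3

a3 = sigmoid(theta1 * a1 + theta2 * a2)
print(a3)  # sigmoid(-0.39) ≈ 0.40
```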
In the following network, we have 6 parameters (which I distinguish by their colors).

Training a Neural Network
To solve a Supervised Learning problem, we need the following three elements:
- A Dataset
- A model and its parameters
- A Cost Function and its gradient
For the Dataset, it is sufficient to have arrays (X, y), as for the other problems. The features (x₁, x₂, x₃, …) are distributed at the network input (in the first layer).
Expressing the Cost Function and its gradient is mathematically delicate: we have to compute the contribution of each neuron to the final error. This is possible with a technique called Back Propagation (doing the path in the opposite direction, from y back to X). Finally, to minimize the Cost Function, it is sufficient to run Gradient Descent using the gradients computed with Back Propagation.
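To make these three elements concrete, here is a minimal sketch of the whole procedure: a small 2-layer network, a squared-error cost, Back Propagation to obtain the gradients, and Gradient Descent to update the parameters. The toy dataset, network size, and hyperparameters are made up for illustration (and, as in the formulas above, bias terms are omitted), so how well it converges depends on the random initialization:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up toy dataset (X, y): 4 examples, 2 features, binary targets (logical OR)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [1.]])

rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(2, 3))  # parameters: input layer -> hidden layer (3 neurons)
Theta2 = rng.normal(size=(3, 1))  # parameters: hidden layer -> output neuron
lr = 1.0                          # learning rate

for _ in range(5000):
    # Forward pass: from X toward y
    a1 = sigmoid(X @ Theta1)   # hidden-layer activations
    a2 = sigmoid(a1 @ Theta2)  # network output

    # Back Propagation: walk the error from y back toward X
    delta2 = (a2 - y) * a2 * (1 - a2)             # output-layer error term
    delta1 = (delta2 @ Theta2.T) * a1 * (1 - a1)  # hidden-layer error term

    # Gradient Descent step on each parameter matrix
    Theta2 -= lr * (a1.T @ delta2) / len(X)
    Theta1 -= lr * (X.T @ delta1) / len(X)

print(sigmoid(sigmoid(X @ Theta1) @ Theta2).round(2))  # predictions after training
```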