- $\large s$ = sample
- $\large S$ = number of samples in the training batch
- $\large l$ = layer
- $\large L$ = number of layers
- $\large n_l$ = neuron at layer $l$
- $\large N_l$ = number of neurons in layer $l$
- $\large w_{n_{l-1}n_l}$ = weight between neurons $n_{l-1}$ and $n_l$
- $\large b_{n_l}$ = bias of neuron $n_l$
- $\large z_{n_l}$ = intermediate quantity of neuron $n_l$
- $\large y_{n_l}$ = output of neuron $n_l$
- $\large \hat y_{n_l}$ = target output of neuron $n_l$
- $\large A_{n_l}$ = activation function at neuron $n_l$ {Binary Step, Linear, ReLU, Sigmoid, Tanh...}
- $\large C$ = cost function {MSE, SSE, WSE, NSE...}
- $\large O$ = optimization algorithm {Gradient Descent, ADAM, Quasi-Newton Method...}
- $\large α$ = learning rate
In order to reduce the errors of the network, the weights and biases are adjusted to minimize the cost function $C$. This is done by an optimization algorithm $O$ that adjusts the network parameters periodically, after running a certain number of training samples.
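As a minimal sketch of such an update, plain gradient descent moves every parameter against its cost gradient, scaled by the learning rate $α$. The function and variable names below are illustrative assumptions, not taken from this project:

```python
import numpy as np

def gradient_descent_step(weights, biases, grad_w, grad_b, alpha):
    """One plain gradient-descent update: move each parameter against
    its cost gradient, scaled by the learning rate alpha."""
    for l in range(len(weights)):
        weights[l] -= alpha * grad_w[l]   # w <- w - alpha * dC/dw
        biases[l]  -= alpha * grad_b[l]   # b <- b - alpha * dC/db

# Toy usage: a 2-3-1 network with random parameters and dummy gradients.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((2, 3)), rng.standard_normal((3, 1))]
biases  = [np.zeros(3), np.zeros(1)]
grad_w  = [np.ones((2, 3)), np.ones((3, 1))]
grad_b  = [np.ones(3), np.ones(1)]
gradient_descent_step(weights, biases, grad_w, grad_b, alpha=0.1)
```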
Weights and biases are modified according to their influence on the cost function, measured by the partial derivatives ${\partial C}/{\partial {w_{n_{l-1}n_l}}}$ and ${\partial C}/{\partial {b_{n_l}}}$.
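Assuming the usual feed-forward relations $z_{n_l} = \sum_{n_{l-1}} w_{n_{l-1}n_l}\, y_{n_{l-1}} + b_{n_l}$ and $y_{n_l} = A_{n_l}(z_{n_l})$ implied by the definitions above, the chain rule expands these derivatives as (writing $\dot C$ and $\dot A$ for the derivatives of the cost and activation functions):

$$\frac{\partial C}{\partial w_{n_{l-1}n_l}} = \dot C (y_{n_l}, \hat y_{n_l})\, \dot A_{n_l}(z_{n_l})\, y_{n_{l-1}} \qquad \frac{\partial C}{\partial b_{n_l}} = \dot C (y_{n_l}, \hat y_{n_l})\, \dot A_{n_l}(z_{n_l})$$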
The terms $\dot C (y_{n_l}, \hat y_{n_l})$ depend on the target output value of each neuron $\hat y_{n_l}$. However, a training data set only provides the values of $\hat y_{n_l}$ for the last layer $l = L$. For all previous layers $l < L$, the components $\dot C ( y_{n_l}, \hat y_{n_l})$ are instead computed as a weighted sum of the components previously calculated for the next layer, $\dot C (y_{n_{l+1}}, \hat y_{n_{l+1}})$:
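A minimal sketch of that backward recursion in code, assuming the weighted sum uses the connecting weights together with the next layer's activation derivatives $\dot A_{n_{l+1}}(z_{n_{l+1}})$; the function name and array shapes are illustrative, not from this project:

```python
import numpy as np

def backprop_cost_terms(dC_L, weights, dA_z):
    """Propagate the cost-derivative terms backwards, layer by layer.
    dC_L    : vector of dC/dy terms at the last layer (from the targets)
    weights : weights[l] has shape (N_l, N_{l+1}), connecting l to l+1
    dA_z    : dA_z[l] holds A'(z) for the neurons of layer l+1
    Returns dC with dC[l] = the dC/dy terms of layer l."""
    dC = [None] * (len(weights) + 1)
    dC[-1] = dC_L                       # known only at l = L
    for l in reversed(range(len(weights))):
        # Each neuron's term is a weighted sum of the terms already
        # computed for layer l + 1.
        dC[l] = weights[l] @ (dA_z[l] * dC[l + 1])
    return dC

# Toy usage mirroring the 2-3-1 example above.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((2, 3)), rng.standard_normal((3, 1))]
dA_z    = [np.ones(3), np.ones(1)]      # pretend A'(z) = 1 (linear)
dC_L    = np.array([0.5])               # from the output-layer targets
terms = backprop_cost_terms(dC_L, weights, dA_z)
```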