- List of Symbols
- Neuron Equations
- Training Algorithm
- Activation Functions
- Cost Functions
- Regularization
- Optimization Algorithms
- References
To reduce the error of the network, the weights and biases are adjusted so as to minimize the cost function.
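The adjustment itself is a standard gradient-descent step; a minimal sketch in the usual notation (the learning rate η is an assumption here, not defined in the text above):

```latex
% Assumed standard gradient-descent update; \eta is the learning rate
w \leftarrow w - \eta \, \frac{\partial C}{\partial w},
\qquad
b \leftarrow b - \eta \, \frac{\partial C}{\partial b}
```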
The chain rule makes it possible to split the derivative of the cost function into components.
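In the usual backpropagation notation (assumed here: z is the weighted input to a neuron, a its activation, and σ the activation function), the decomposition for a single weight reads:

```latex
% Assumed standard chain-rule decomposition for one weight w
\frac{\partial C}{\partial w}
  = \frac{\partial C}{\partial a}\,
    \frac{\partial a}{\partial z}\,
    \frac{\partial z}{\partial w},
\qquad
z = \sum_k w_k x_k + b, \quad a = \sigma(z)
```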
The terms of this decomposition are computed layer by layer, starting from the output layer; the learning rate η is typically a small positive constant, much smaller than 1.
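To make the product of these terms concrete, here is a minimal runnable sketch (an illustration, not code from this document) of one gradient-descent step for a single sigmoid neuron with a quadratic cost:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(w, b, x, y, eta=0.1):
    """One gradient-descent step, factored exactly as the chain rule above."""
    z = np.dot(w, x) + b            # weighted input
    a = sigmoid(z)                  # activation
    dC_da = a - y                   # quadratic cost: C = (a - y)^2 / 2
    da_dz = a * (1.0 - a)           # sigmoid derivative
    dz_dw = x                       # since z = w . x + b
    dC_dw = dC_da * da_dz * dz_dw   # chain rule product
    dC_db = dC_da * da_dz
    return w - eta * dC_dw, b - eta * dC_db

w, b = np.zeros(2), 0.0
for _ in range(1000):
    w, b = train_step(w, b, x=np.array([1.0, 0.5]), y=1.0)
```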
Extra terms are added to the cost function to address overfitting.
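A common concrete choice is L2 (weight decay) regularization; assuming penalty strength λ and n training samples (the specific penalty is not stated above), the regularized cost and its gradient are:

```latex
% Assumed L2 penalty added to the unregularized cost C_0
C = C_0 + \frac{\lambda}{2n} \sum_{w} w^{2},
\qquad
\frac{\partial C}{\partial w}
  = \frac{\partial C_0}{\partial w} + \frac{\lambda}{n}\, w
```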
The optimization algorithms differ mainly in how often the network parameters are updated:

- Batch gradient descent: the parameters are updated once per pass over the entire training set.
- Stochastic gradient descent: a gradient-descent step is performed after every training sample.
- Mini-batch gradient descent: the parameters are updated after every training batch, a small subset of the training set; typical batch sizes range from tens to a few hundred samples.

A runnable sketch of the mini-batch variant is given below.
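This is a minimal sketch, assuming a user-supplied gradient function `loss_grad` (a hypothetical name), parameters stored in a NumPy array, and illustrative hyperparameter values:

```python
import random

import numpy as np

def sgd(params, data, loss_grad, eta=0.01, batch_size=32, epochs=10):
    """Mini-batch SGD: average the gradient over each batch, then step."""
    data = list(data)                             # list of (x, y) samples
    for _ in range(epochs):
        random.shuffle(data)                      # new sample order per epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad = sum(loss_grad(params, x, y) for x, y in batch) / len(batch)
            params = params - eta * grad          # gradient-descent step
    return params
```

With `batch_size=1` this reduces to stochastic gradient descent, and with `batch_size=len(data)` it becomes batch gradient descent, which is why the three variants are usually presented together.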
- http://neuralnetworksanddeeplearning.com/
- https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/
- https://comsm0045-applied-deep-learning.github.io/Slides/COMSM0045_05.pdf
- https://towardsdatascience.com/optimizers-for-training-neural-network-59450d71caf6
- https://stats.stackexchange.com/questions/154879/a-list-of-cost-functions-used-in-neural-networks-alongside-applications
- https://en.wikipedia.org/wiki/Activation_function#Table_of_activation_functions