Check out the notebooks and my code to see what I've actually written! Used numpy, pytorch (my tensors are flowing), jupyter, matplotlib. Here is Karpathy's summary of the lectures.
By the way, after some thinking, I'd say the two most pivotal moments to understand in this whole thing (really, in the entirety of ML) are:
- Backpropagation: each node's gradient is its local derivative multiplied by the gradient flowing in from the node above, and how that cascades backwards through a whole (differentiable) system (see the sketch just below).
- The little code block in 2 - makemore (unclean version) that explains W1, and that matrix multiplication bit (right underneath a plt.show()).
The other lectures are important and very useful too, but these two are fundamental for understanding. The rest is "basically" optimisation (there's a lot in "basically").
There was a lot I found fun, too! (like the counts matrix, ..., the plt.show() of my actual weight gradients -- it shows what effect each weight has on the final loss! So crazy, and so cool.)
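To make that first point concrete, here's a minimal sketch of the chain rule done by hand (toy numbers, not code from the notebooks): every gradient is a local derivative times the gradient coming down from the node above.

```python
# Backprop by hand through L = (a * b) + c.
# Each gradient is (local derivative) * (gradient of the node above).
a, b, c = 2.0, -3.0, 10.0

d = a * b        # forward: d = -6.0
L = d + c        # forward: L = 4.0

# backward pass, starting from the top
dL_dL = 1.0
dL_dd = 1.0 * dL_dL          # d(L)/d(d) = 1 (local), times upstream grad
dL_dc = 1.0 * dL_dL          # d(L)/d(c) = 1 (local), times upstream grad
dL_da = b * dL_dd            # d(d)/d(a) = b (local), times upstream grad
dL_db = a * dL_dd            # d(d)/d(b) = a (local), times upstream grad

print(dL_da, dL_db, dL_dc)   # -3.0 2.0 1.0
```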
Name | Jupyter File | Colab Page |
---|---|---|
Makemore - Video 2 | Click here | |
Exercises - Video 2 | Click here | |
Implementing a proper language model -- just a classic bigram for now: a bigram count model vs. a bigram NN.
Viewing the activation rates of neurons in a single layer for fun
Visualising our character embeddings
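For context, a hedged sketch of how a plot like that can be made, assuming a 2D embedding matrix C of shape (27, 2) and an itos index-to-character mapping (random values stand in for the trained embeddings here):

```python
import torch
import matplotlib.pyplot as plt

# Assumed names: C is a (27, 2) embedding matrix, itos maps index -> character.
C = torch.randn(27, 2)                                   # stand-in for trained embeddings
itos = {i: s for i, s in enumerate('.abcdefghijklmnopqrstuvwxyz')}

plt.figure(figsize=(6, 6))
plt.scatter(C[:, 0], C[:, 1], s=200)
for i in range(C.shape[0]):
    # label every point with its character to see which ones cluster together
    plt.text(C[i, 0].item(), C[i, 1].item(), itos[i], ha='center', va='center', color='white')
plt.grid(True)
plt.show()
```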
- This lecture shows a lot of technical stuff about debugging a neural network, checking its statistics and 'health'. Check out the notebook!
- Tanh is saturated!!
Checking for dead neurons, and making sure there are no entire layers of dead neurons
Looking at our activation distributions after adding batchnorm (they are much more even now -- good)
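A hedged sketch of what these checks look like -- h here is a stand-in for one layer's tanh activations over a batch, not the notebook's actual tensors:

```python
import torch
import matplotlib.pyplot as plt

# Assumed setup: h is the tanh output of one hidden layer for a batch of inputs.
h = torch.tanh(torch.randn(32, 200) * 3)      # deliberately wide pre-activations to show saturation

# Saturation check: how often is |tanh| pushed near +-1 (where the gradient ~ 0)?
saturated = (h.abs() > 0.99)
print(f"saturated fraction: {saturated.float().mean().item():.2f}")

# White = saturated. A completely white *column* would be a dead neuron:
# it never operates in the linear region for any input, so it never learns.
plt.figure(figsize=(12, 4))
plt.imshow(saturated, cmap='gray', interpolation='nearest')
plt.show()

# Histogram of the activation distribution (batchnorm should keep this well-behaved).
plt.hist(h.view(-1).tolist(), bins=50)
plt.show()
```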
A bare implementation of neural networks. Created a Value wrapper class and implemented binary operations with per-node backward functions, so that one backward() call propagates gradients through an entire mathematical expression. Built Neuron, Layer, and MLP classes and forward-passed input data with targets. Called backward() on the loss Value and updated the weights; achieving gradient descent and thus a neural net.
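A heavily trimmed sketch of the core idea (just add, mul, and tanh -- not the full set of operations):

```python
import math

class Value:
    """Minimal sketch: wraps a scalar, remembers how it was made, and can backprop."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += 1.0 * out.grad       # local derivative (1) times upstream grad
            other.grad += 1.0 * out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(xy)/dx = y
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t ** 2) * out.grad  # d tanh(x)/dx = 1 - tanh(x)^2
        out._backward = _backward
        return out

    def backward(self):
        # build a topological order, then apply the chain rule node by node
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

# tiny usage example: a single neuron-ish expression
x, w, b = Value(2.0), Value(-3.0), Value(1.0)
out = (x * w + b).tanh()
out.backward()
print(x.grad, w.grad)    # gradients of the output w.r.t. the inputs
```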
A character-level bigram model for name generation. Evaluated using negative log likelihood (NLL) loss, then switched to constructing a neural network (in this case, effectively one layer of 27 neurons, each neuron having 27 weights). Converted characters to one-hot encoded vectors, then converted logits to probability distributions by exponentiating and normalising (softmax). Optimised the weights by minimising the NLL loss during the gradient descent stage. Created train, dev, and test sets for evaluating the models. Implemented all ideas and steps by hand. Pytorch, jupyter, matplotlib.
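A hedged sketch of that whole pipeline with placeholder random data (variable names are illustrative, not the notebook's exact ones) -- note the comment on the one-hot matmul, which is exactly the "pivotal" matrix multiplication mentioned at the top:

```python
import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(2147483647)

# stand-in training pairs: each x is the index of the current character,
# each y the index of the character that follows it (27 = a-z plus '.')
xs = torch.randint(0, 27, (100,), generator=g)
ys = torch.randint(0, 27, (100,), generator=g)

# "one layer of 27 neurons, each with 27 weights"
W = torch.randn((27, 27), generator=g, requires_grad=True)

for step in range(100):
    # forward pass
    xenc = F.one_hot(xs, num_classes=27).float()            # (100, 27)
    logits = xenc @ W                                        # one-hot @ W just selects row xs[i] of W
    counts = logits.exp()                                    # softmax, step 1
    probs = counts / counts.sum(1, keepdim=True)             # softmax, step 2
    loss = -probs[torch.arange(len(ys)), ys].log().mean()    # negative log likelihood

    # backward pass + gradient descent
    W.grad = None
    loss.backward()
    W.data += -10 * W.grad

print(loss.item())
```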
Like makemore, but went into depth on creating multiple layers and generalising to a context size -- how many characters we use to predict the next one. Also embedded the characters, and focused on tuning hyperparameters like the learning rate, embedding size, number of layers, ... Effectively recreated the model from the Bengio et al. 2003 MLP language model paper. Covered many basics of ML: model training, learning rate tuning, hyperparameters, evaluation, train/dev/test splits, under/overfitting, etc.
- Initialised the NN to have a much better initial loss (no hockey stick 'easy gains'!) -- you should always have a rough idea of what the initial loss should look like.
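A hedged sketch combining both points -- a Bengio-style embedding MLP with the output layer initialised small, so the initial loss lands near -ln(1/27) ≈ 3.29 (dimensions and names are illustrative, not the notebook's):

```python
import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(42)
vocab_size, block_size = 27, 3      # 27 characters, context of 3 previous characters
n_embd, n_hidden = 10, 200

C  = torch.randn((vocab_size, n_embd), generator=g)               # character embeddings
W1 = torch.randn((block_size * n_embd, n_hidden), generator=g) * 0.1
b1 = torch.zeros(n_hidden)
W2 = torch.randn((n_hidden, vocab_size), generator=g) * 0.01      # small => logits near 0 at init
b2 = torch.zeros(vocab_size)

# stand-in batch: 32 contexts of 3 character indices, plus their target characters
X = torch.randint(0, vocab_size, (32, block_size), generator=g)
Y = torch.randint(0, vocab_size, (32,), generator=g)

emb = C[X]                                  # (32, 3, 10): look up each context character
h = torch.tanh(emb.view(32, -1) @ W1 + b1)  # (32, 200)
logits = h @ W2 + b2                        # (32, 27)
loss = F.cross_entropy(logits, Y)

# With near-zero logits the model is ~uniform over 27 characters,
# so the initial loss should be about -ln(1/27) ~= 3.29 (no hockey stick).
print(loss.item())
```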
I've completed videos 5-7, but I'll write up their summaries later because... procrastination?