A work-in-progress project implementing GPT and related Transformer models from scratch, inspired by Andrej Karpathy's "Let's Build GPT" tutorial. This repository is actively evolving.
- Decoder-only implementation of the Transformer architecture
- Multi-head self-attention (multiple attention heads computed in parallel); see the first sketch after this list
- The Transformer block: communication (attention) followed by computation (feed-forward)
- Text generation conditioned on a given context (see the generation sketch below)
- Byte Pair Encoding (BPE) tokenization, popularized by the GPT-2 paper (see the BPE sketch below)
- Training on custom datasets
- Pretraining and fine-tuning experiments
- Implementation of an encoder block
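
Below is a minimal sketch, assuming PyTorch, of multi-head causal self-attention and a Transformer block in the "communication followed by computation" style. The class names (`Head`, `MultiHeadAttention`, `Block`) and hyperparameters are illustrative, not this repository's actual API.

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

class Head(nn.Module):
    """One head of causal self-attention."""
    def __init__(self, n_embd, head_size, block_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        # scaled dot-product attention with a causal mask
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        wei = F.softmax(wei, dim=-1)
        return wei @ v

class MultiHeadAttention(nn.Module):
    """Several attention heads run in parallel, outputs concatenated."""
    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        head_size = n_embd // n_head
        self.heads = nn.ModuleList([Head(n_embd, head_size, block_size) for _ in range(n_head)])
        self.proj = nn.Linear(n_embd, n_embd)

    def forward(self, x):
        return self.proj(torch.cat([h(x) for h in self.heads], dim=-1))

class Block(nn.Module):
    """Transformer block: communication (attention) followed by computation (MLP)."""
    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        self.sa = MultiHeadAttention(n_embd, n_head, block_size)
        self.ffwd = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.ReLU(), nn.Linear(4 * n_embd, n_embd)
        )
        self.ln1 = nn.LayerNorm(n_embd)
        self.ln2 = nn.LayerNorm(n_embd)

    def forward(self, x):
        x = x + self.sa(self.ln1(x))    # residual connection around attention
        x = x + self.ffwd(self.ln2(x))  # residual connection around the MLP
        return x
```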
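
For context-conditioned generation, a minimal sketch of the autoregressive sampling loop follows; it assumes `model` maps a `(B, T)` tensor of token ids to `(B, T, vocab_size)` logits, as a GPT-style decoder would.

```python
import torch
from torch.nn import functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size):
    """Append max_new_tokens sampled tokens to the context `idx` of shape (B, T)."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]          # crop to the context window
        logits = model(idx_cond)[:, -1, :]       # logits for the last position
        probs = F.softmax(logits, dim=-1)
        idx_next = torch.multinomial(probs, num_samples=1)
        idx = torch.cat((idx, idx_next), dim=1)  # append the sampled token
    return idx
```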
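
And a minimal sketch of the BPE merge-learning step over raw UTF-8 bytes, purely to illustrate the algorithm; the function names are not this repository's API.

```python
from collections import Counter

def get_pair_counts(ids):
    """Count adjacent pairs of token ids."""
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn `num_merges` merge rules over the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break
        pair = max(counts, key=counts.get)
        new_id = 256 + step  # byte values occupy ids 0..255
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges

if __name__ == "__main__":
    print(train_bpe("low lower lowest", 5))
```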
Contributions and feedback are welcome!