Personal repo for transformer code
- probably not well-organized or optimized, but getting there
models.py
: Transformer implementations with GQA and MLA attention mechanisms, RoPE embeddings
train.py
: Training loop and logging logic
data.py
: Basic character-level tokenization and dataset handling
utils.py
: Configuration dataclasses and JSON loading
inference.py
: Text generation utilities
config/
: JSON files for model architecture and training parameters
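For orientation, here is a minimal sketch of what character-level tokenization and dataset slicing typically look like. It is not the code in data.py: `CharTokenizer`, `make_examples`, and `block_size` are hypothetical names chosen for illustration.

```python
from typing import List, Tuple


class CharTokenizer:
    """Hypothetical character-level tokenizer (illustrative, not the repo's class)."""

    def __init__(self, text: str) -> None:
        # The vocabulary is the sorted set of distinct characters in the corpus.
        self.chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(self.chars)}
        self.itos = {i: ch for ch, i in self.stoi.items()}

    @property
    def vocab_size(self) -> int:
        return len(self.chars)

    def encode(self, s: str) -> List[int]:
        # Raises KeyError for characters that were not in the corpus.
        return [self.stoi[ch] for ch in s]

    def decode(self, ids: List[int]) -> str:
        return "".join(self.itos[i] for i in ids)


def make_examples(ids: List[int], block_size: int) -> List[Tuple[List[int], List[int]]]:
    """Slice a token stream into (input, target) pairs, targets shifted by one."""
    return [
        (ids[i : i + block_size], ids[i + 1 : i + block_size + 1])
        for i in range(len(ids) - block_size)
    ]
```

Each target sequence is the input shifted one position to the right, which is the usual setup for next-character prediction.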
To train on the tiny shakespeare dataset:
python train.py --model-config config/model_configs/ss_small.json \
--training-config config/training_configs/ss_small.json \
--dataset ./datasets/tinyshakespeare.txt
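The two JSON files passed above are loaded into configuration dataclasses. A rough sketch of that pattern follows, assuming a flat JSON object. The field names (`n_layers`, `n_heads`, `n_kv_heads`, `d_model`) and the `load_config` helper are hypothetical and may not match the keys in ss_small.json or the code in utils.py.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ModelConfig:
    # Hypothetical fields; the real config files may use different keys.
    n_layers: int = 4
    n_heads: int = 4
    n_kv_heads: int = 2
    d_model: int = 128


def load_config(cls, path):
    """Read a JSON file and build a dataclass, rejecting unknown keys."""
    with open(path, "r", encoding="utf-8") as fh:
        raw = json.load(fh)
    known = {field.name for field in fields(cls)}
    unknown = set(raw) - known
    if unknown:
        # Failing loudly catches typos in config keys early.
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    # Keys missing from the file fall back to the dataclass defaults.
    return cls(**raw)
```

Rejecting unknown keys is a design choice in this sketch: a misspelled key stops the run with an error and does not silently train with a default value.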
Visualize training with TensorBoard (loss curves and text samples are saved):
tensorboard --logdir runs/