This is a small repo where I'll be uploading my own implementations of multiple transformer architectures and cool features found in the latest models.
Development might be slow, as this is just a side project to tackle boredom or curiosity from time to time.
For now, the only architecture implemented is the base transformer presented in the "Attention Is All You Need" paper. I'm not testing on the same dataset as the paper though; that will change in the future.
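
For context, the heart of that architecture is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Here's a minimal PyTorch sketch of that formula for illustration (not the exact code in this repo):

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: tensors of shape (batch, heads, seq_len, d_k)
    d_k = q.size(-1)
    # Similarity scores, scaled by sqrt(d_k) to keep softmax gradients stable
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Block disallowed positions (e.g. padding or future tokens)
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    # Weighted sum of values
    return weights @ v
```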
To see how I'm using and training the model, check out opus.ipynb, and just start digging into each file if you want to see more!