This repository provides a modified implementation of the decoder component of the transformer architecture described in *Attention Is All You Need* [1]. In addition, example usage is provided by a language model trained on a small baking cookbook.
The architecture implemented in this repository is similar to the one described in *Improving Language Understanding by Generative Pre-Training* [2], in that it can be thought of as a standalone decoder component from [1] with the attention sub-layer that interfaces with an encoder removed. However, whereas the implementation in [2] makes some additional architectural changes, this implementation stays true to the original description in [1].
Figure 1. Left: the original transformer from [1]; middle: the implementation in this repository; right: OpenAI's transformer from [2].
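To make the omission concrete, the sketch below shows a single decoder block with the encoder-decoder attention sub-layer removed, keeping the masked self-attention, feed-forward, residual, and post-layer-norm structure of [1]. It is a minimal illustration only: the class name `DecoderBlock` and the hyperparameter names (`d_model`, `num_heads`, `d_ff`, defaults taken from [1]) are assumptions and may not match the identifiers used in this repository.

```python
import tensorflow as tf
from tensorflow.keras import layers

class DecoderBlock(layers.Layer):
    """Decoder block from [1] with the encoder-decoder attention omitted.

    Illustrative sketch only; names and defaults are assumptions, not
    this repository's actual identifiers.
    """

    def __init__(self, d_model=512, num_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=d_model // num_heads)
        self.ffn = tf.keras.Sequential([
            layers.Dense(d_ff, activation="relu"),
            layers.Dense(d_model),
        ])
        self.norm1 = layers.LayerNormalization(epsilon=1e-6)
        self.norm2 = layers.LayerNormalization(epsilon=1e-6)

    def call(self, x):
        # Masked self-attention: the causal mask stops each position
        # from attending to later positions.
        attn_out = self.self_attn(x, x, use_causal_mask=True)
        x = self.norm1(x + attn_out)    # residual + post-layer-norm, as in [1]
        ffn_out = self.ffn(x)
        return self.norm2(x + ffn_out)  # residual + post-layer-norm, as in [1]

# Example: a batch of 16 token embeddings of width d_model passes
# through the block unchanged in shape.
x = tf.random.normal((1, 16, 512))
y = DecoderBlock()(x)  # shape (1, 16, 512)
```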
- Keras/TensorFlow implementation of the architecture in Figure 1.
- Language model trained on a small baking cookbook.
```bash
git clone git@github.com:coxy1989/tfmr.git
cd tfmr
conda env create -f environment.yml
source activate tfmr
python modules/language_model.py
```
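For context on what a decoder-only language model like this does at inference time, here is a hedged sketch of greedy autoregressive decoding, the simplest way such a model generates text one token at a time. The `model` callable, `prompt_ids`, and `eos_id` are hypothetical placeholders for illustration and are not the actual API of `modules/language_model.py`.

```python
import numpy as np

def greedy_decode(model, prompt_ids, max_new_tokens=50, eos_id=None):
    """Greedy autoregressive decoding sketch.

    `model` is a hypothetical callable mapping token-id sequences of
    shape (1, t) to logits of shape (1, t, vocab_size); it stands in
    for this repository's actual model object.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model(np.array([ids]))          # (1, t, vocab_size)
        next_id = int(np.argmax(logits[0, -1]))  # most likely next token
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids
```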
[1] Vaswani et al. Attention Is All You Need. arXiv:1706.03762, 2017.
[2] Radford et al. Improving Language Understanding by Generative Pre-Training. OpenAI, 2018.