TLDR; The standard attention model does not take into account the "history" of attention activations, even though this should be a good predictor of what to attend to next. The authors augment a seq2seq network with a dynamic memory that, for each input, keeps track of an attention matrix over time. The model is evaluated on English-German and English-Chinese NMT tasks and beats competing models.
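
A minimal sketch of the idea, assuming an additive attention scorer whose score for each source position also sees a summary of the previous attention vectors over that position. The parameter names (`W_s`, `W_h`, `w_hist`, `v`) and the window size `k` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend_with_history(dec_state, enc_states, attn_history, params, k=3):
    """Additive attention where the score for each source position also
    depends on a summary of the last k attention vectors (the "memory")."""
    W_s, W_h, w_hist, v = params
    T = enc_states.shape[0]
    # History term: mean of the last k attention vectors (zeros at step 0).
    hist = np.mean(attn_history[-k:], axis=0) if attn_history else np.zeros(T)
    scores = np.array([
        v @ np.tanh(W_s @ dec_state + W_h @ enc_states[j] + w_hist * hist[j])
        for j in range(T)
    ])
    alpha = softmax(scores)        # attention weights for this decoder step
    context = alpha @ enc_states   # context vector fed to the decoder
    attn_history.append(alpha)     # grow the attention history over time
    return context, alpha

# Toy usage: 5 source positions, hidden size 8, 3 decoder steps.
rng = np.random.default_rng(0)
d, T = 8, 5
params = (rng.normal(size=(d, d)), rng.normal(size=(d, d)),
          rng.normal(size=d), rng.normal(size=d))
enc_states = rng.normal(size=(T, d))
history = []
for _ in range(3):
    context, alpha = attend_with_history(rng.normal(size=d), enc_states,
                                         history, params)
    print(np.round(alpha, 3))
```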

Notes

  • How expensive is this, and how much more difficult are these networks to train?
  • Sequentially attending to neighboring words makes sense for some language pairs, but not for others. This method seems rather restricted because it only takes into account a window of k time steps.