Abstract
propose an additional "Attentive Recurrent Network (ARN)" alongside the Transformer encoder to leverage the strengths of both attention and recurrent networks
experiments on WMT14 En-De and WMT17 Zh-En demonstrate the effectiveness of the approach
study reveals that a short-cut bridge from a shallow ARN outperforms its deep counterpart
Details
Main Approach
use an additional recurrent encoder on the source side
the recurrent model can be (a) a simple RNN, GRU, or LSTM, or (b) an Attentive Recurrent Network, where the context representation at each step is generated via attention with the previous hidden state as the query (see the sketch below)
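A minimal PyTorch sketch of such an attentive recurrent step loop, assuming a GRU-style transition and a fixed number of recurrent steps; the class name, `d_model`, and `num_steps` are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveRecurrentNetwork(nn.Module):
    """Sketch: each step attends over the source states with the previous
    hidden state as query, then updates the hidden state with a GRU cell."""

    def __init__(self, d_model: int, num_steps: int = 8):
        super().__init__()
        self.num_steps = num_steps            # fixed number of recurrent steps (assumption)
        self.query_proj = nn.Linear(d_model, d_model)
        self.cell = nn.GRUCell(d_model, d_model)

    def forward(self, source: torch.Tensor) -> torch.Tensor:
        # source: (batch, src_len, d_model), e.g. source embeddings or encoder states
        batch, _, d_model = source.shape
        h = source.new_zeros(batch, d_model)  # initial hidden state
        states = []
        for _ in range(self.num_steps):
            # attention: previous hidden state is the query over all source positions
            q = self.query_proj(h).unsqueeze(1)                     # (batch, 1, d_model)
            scores = torch.bmm(q, source.transpose(1, 2)) / d_model ** 0.5
            attn = F.softmax(scores, dim=-1)                        # (batch, 1, src_len)
            context = torch.bmm(attn, source).squeeze(1)            # (batch, d_model)
            h = self.cell(context, h)                               # recurrent update
            states.append(h)
        # stack of hidden states that the decoder can additionally attend to
        return torch.stack(states, dim=1)                           # (batch, num_steps, d_model)
```

Under this reading, the loop length is a hyperparameter rather than the sentence length, which is what the ablation on recurrent steps below varies.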
Impact of Components
ablation study on the size of the additional recurrent encoder
a smaller BiARN encoder attached directly to the top of the decoder outperforms all others
ablation study on the number of recurrent steps in the ARN
around 8 steps seems optimal
ablation study on how to integrate the representation on the decoder side
stacking it on top outperformed all other strategies (see the sketch below)
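A rough sketch, under my reading of "stack on top", of adding an extra cross-attention sub-layer over the BiARN states above the standard encoder-decoder attention in the top decoder layer; only the cross-attention part of the layer is shown, and the names (`d_model`, `nhead`, etc.) are hypothetical, not the paper's:

```python
import torch.nn as nn

class StackedARNIntegration(nn.Module):
    """Sketch: decoder states first attend to the Transformer encoder output,
    then to the BiARN states via a second cross-attention stacked on top."""

    def __init__(self, d_model: int, nhead: int = 8):
        super().__init__()
        self.enc_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.arn_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, dec_state, enc_out, arn_out):
        # dec_state: (batch, tgt_len, d_model)   decoder hidden states
        # enc_out:   (batch, src_len, d_model)   Transformer encoder output
        # arn_out:   (batch, num_steps, d_model) BiARN encoder output
        x, _ = self.enc_attn(dec_state, enc_out, enc_out)   # standard cross-attention
        dec_state = self.norm1(dec_state + x)
        x, _ = self.arn_attn(dec_state, arn_out, arn_out)   # stacked attention over ARN states
        return self.norm2(dec_state + x)
```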
Overall Result
with the additional ARN encoder, BLEU scores improve with statistical significance
Linguistic Analysis
what linguistic characteristics are the models learning?
the 1-Layer BiARN variant performs better on all syntactic tasks and on some semantic tasks
List of Linguistic Characteristics
SeLen : predict the length of a sentence
WC : recover the original word given its source embedding
TrDep : check whether the encoder infers the hierarchical structure of sentences
ToCo : classify sentences in terms of the sequence of their top constituents
BShif : test whether two consecutive tokens have been inverted
Tense : predict the tense of the main-clause verb
SubN : predict the number of the main-clause subject
ObjN : predict the number of the direct object of the main clause
SoMo : check whether a sentence has been modified by replacing a random noun or verb
CoIn : detect whether the order of two coordinate clauses has been inverted
Personal Thoughts
Translation requires a complicated encoding function on the source side. The strengths of attention, RNNs, and CNNs can complement each other to produce a richer representation.
This paper showed that there is some small room for improvement when an RNN encoder plays a part alongside the Transformer encoder via the short-cut trick.
Link: https://arxiv.org/pdf/1904.03092v1.pdf
Authors: Hao et al. 2019