```python
loss = tf.reduce_mean(-log_likelihood)
```
Here `log_likelihood` is the unnormalized score for each sample, and its magnitude depends on the number of timesteps in the batch. For example:

- Batch 1: max sequence length (timesteps) = 100
- Batch 2: max sequence length (timesteps) = 1000

The scale of the loss values will be considerably different for the two batches.
Shouldn't the loss be:

```python
loss = tf.reduce_mean(-log_likelihood / tf.cast(tf.shape(self.logits)[1], tf.float32))
```

`tf.shape(self.logits)[1]` is the max number of timesteps (sequence length) for that batch; the cast to `tf.float32` is needed because `tf.shape` returns int32 and dividing a float tensor by it would otherwise fail. This makes the loss independent of the sequence length.
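For concreteness, here is a minimal TF 1.x sketch of both variants. It assumes `tf.contrib.crf.crf_log_likelihood`; the placeholder names and shapes (`num_tags`, `logits`, `labels`, `sequence_lengths`) are illustrative, not taken from this repo:

```python
import tensorflow as tf  # TF 1.x

num_tags = 10  # illustrative tag-set size

# [batch, max_time, num_tags] unary scores; max_time varies per batch
logits = tf.placeholder(tf.float32, [None, None, num_tags])
labels = tf.placeholder(tf.int32, [None, None])      # [batch, max_time]
sequence_lengths = tf.placeholder(tf.int32, [None])  # [batch]

log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
    logits, labels, sequence_lengths)

# Unnormalized: each sample's log-likelihood is a sum over timesteps,
# so a batch padded to 1000 steps yields a much larger loss than one
# padded to 100 steps.
loss_unnormalized = tf.reduce_mean(-log_likelihood)

# Normalized by the batch's max timesteps (tf.shape returns int32,
# hence the cast before dividing the float tensor).
max_time = tf.cast(tf.shape(logits)[1], tf.float32)
loss_normalized = tf.reduce_mean(-log_likelihood / max_time)
```

A per-sample variant would divide by `tf.cast(sequence_lengths, tf.float32)` instead, which also corrects for padding differences between samples within the same batch.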