This repository has been archived by the owner on Nov 12, 2021. It is now read-only.

The difference in 5 equation in paper and code #35

Open
Tismoney opened this issue Jan 29, 2020 · 3 comments

Comments

@Tismoney

In the original paper, the last rule for updating the node vector is: $v_i^t = (1 - z_{s,i}^t) \odot v_i^{t-1} + z_{s,i}^t \odot \tilde{v}_i^t$. But the code does not follow the same rule:

inputgate = torch.sigmoid(i_i + h_i)
newgate = torch.tanh(i_n + resetgate * h_n)
hy = newgate + inputgate * (hidden - newgate)

The code's rule is: $v_i^t = (1 - z_{s,i}^t) \odot \tilde{v}_i^t + z_{s,i}^t \odot v_i^{t-1}$. The difference is that $\tilde{v}_i^t$ and $v_i^{t-1}$ are swapped.

Is this a mistake in the paper or in the code?
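The swap can be checked numerically. The sketch below (with hypothetical random stand-ins for the pre-activation terms `i_i`, `h_i`, `i_n`, `h_n`; the real values come from the repo's GNN cell) expands the code's one-liner algebraically and compares it against the paper's Eq. (5):

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins for the cell's pre-activation terms.
i_i, h_i = torch.randn(4), torch.randn(4)
i_n, h_n = torch.randn(4), torch.randn(4)
resetgate = torch.rand(4)
hidden = torch.randn(4)  # v_i^{t-1}, the previous node state

inputgate = torch.sigmoid(i_i + h_i)         # z_{s,i}^t in the paper's notation
newgate = torch.tanh(i_n + resetgate * h_n)  # candidate state, \tilde{v}_i^t

# The repo's update, and its algebraic expansion:
hy_code = newgate + inputgate * (hidden - newgate)
hy_expanded = (1 - inputgate) * newgate + inputgate * hidden
assert torch.allclose(hy_code, hy_expanded)

# The paper's Eq. (5) puts z on the candidate instead:
hy_paper = (1 - inputgate) * hidden + inputgate * newgate

# For generic inputs the two disagree, confirming the swap.
print(torch.allclose(hy_code, hy_paper))
```

So `hy = newgate + inputgate * (hidden - newgate)` is exactly `(1 - z) ⊙ ṽ + z ⊙ v^{t-1}`, i.e. the gate multiplies the previous state rather than the candidate.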

@Tismoney Tismoney changed the title The difference in 5 equ The difference in 5 equation in paper and code Jan 29, 2020
@Tismoney
Author

As I understand it, there is no conceptual mistake. In the paper, $z$ learns how much information to keep, but in the code it controls how much to forget. Am I right?
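This reading seems consistent with a basic identity of the logistic function: since $\sigma(-x) = 1 - \sigma(x)$, the network can turn a "keep" gate into a "forget" gate simply by negating the gate's weights, so the two parameterizations are equally expressive. A minimal check:

```python
import torch

x = torch.linspace(-5, 5, 11)

# sigmoid(-x) == 1 - sigmoid(x): negating the gate's pre-activation
# (i.e., flipping the sign of the gate weights) swaps the roles of
# "keep" and "forget" without changing what the model can represent.
assert torch.allclose(torch.sigmoid(-x), 1 - torch.sigmoid(x))
```

In other words, training can compensate for the swap, which would explain why the code still works despite differing from Eq. (5).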

@yichudu
Copy link

yichudu commented Aug 3, 2020

> As I understand it, there is no conceptual mistake. In the paper, $z$ learns how much information to keep, but in the code it controls how much to forget. Am I right?

Also, the readability of the code in this repository is quite poor.

@wangzglife

I agree with you; I also found that this may be a mistake.

3 participants