Why not detach the hidden state of GRU from the computational graph? #161

Open
MejiroSilence opened this issue Dec 19, 2024 · 0 comments

@MejiroSilence

In RNNs, gradients accumulate over time steps. If the sequence is long, gradients can become very large (exploding gradients) or very small (vanishing gradients), leading to unstable training or difficulty in convergence. Detaching the hidden state can limit gradient propagation within each time step, preventing gradient accumulation over the entire sequence, thus mitigating exploding/vanishing gradient problems.
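Below is a minimal sketch of what this would look like in PyTorch, assuming a standard `nn.GRU` trained with truncated BPTT: the hidden state is carried across chunks of the sequence but detached from the graph so gradients only span the current chunk. The module names, dimensions, and dummy data here are illustrative, not taken from this repository.

```python
import torch
import torch.nn as nn

# Hypothetical GRU + linear head; sizes chosen only for illustration.
gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)
optimizer = torch.optim.Adam(
    list(gru.parameters()) + list(head.parameters()), lr=1e-3
)

seq = torch.randn(4, 100, 8)      # (batch, time, features), dummy data
target = torch.randn(4, 100, 1)
chunk_len = 20
hidden = None                     # nn.GRU uses zeros when hidden is None

for start in range(0, seq.size(1), chunk_len):
    x = seq[:, start:start + chunk_len]
    y = target[:, start:start + chunk_len]

    out, hidden = gru(x, hidden)  # carry the hidden state forward...
    hidden = hidden.detach()      # ...but cut the graph so backprop stops here

    loss = nn.functional.mse_loss(head(out), y)
    optimizer.zero_grad()
    loss.backward()               # gradients only flow through the current chunk
    optimizer.step()
```

Without the `detach()`, the second chunk's `backward()` would try to backpropagate through the previous chunk's (already freed) graph, and gradients would otherwise accumulate across the whole sequence.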
