vw core CLI: --nn data-leak: SGD update is incorrect #4614
Comments
Can be reproduced with a simpler configuration:
command line:
Looking.
Can you please elaborate your suggestion?
Here is what happens to all the weights of a single-unit network after seeing the sequence of a/x, b/x, a/x features.
Output (note that the "x" weight stays the same after seeing an example from a different namespace, OutputWeight changed after the second a/x example, and OutputLayerConst changes every time):
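The same qualitative pattern can be reproduced with a minimal Python sketch. It assumes a network with a single tanh hidden unit, a linear output, squared loss and plain SGD; the class name `TinyNN`, the learning rate, the initial weights and the hidden-unit bias are all hypothetical choices for illustration, not VW's nn implementation.

```python
import math


class TinyNN:
    """Hypothetical one-hidden-unit network (tanh hidden unit, linear output,
    squared loss, plain SGD). An illustration only, not VW's nn reduction."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.w_in = {}       # input weights keyed by (namespace, feature)
        self.b_hidden = 0.0  # hidden-unit bias (assumed; shared by all examples)
        self.w_out = 0.5     # hidden -> output weight (shared by all examples)
        self.b_out = 0.0     # output-layer constant (shared by all examples)

    def _hidden(self, feats):
        z = self.b_hidden + sum(self.w_in.get(k, 0.0) * v for k, v in feats.items())
        return math.tanh(z)

    def predict(self, feats):
        return self.w_out * self._hidden(feats) + self.b_out

    def learn(self, feats, label):
        h = self._hidden(feats)
        g = (self.w_out * h + self.b_out) - label  # dLoss/dPrediction for 0.5*(pred-label)^2
        g_h = g * self.w_out * (1.0 - h * h)       # backprop through tanh (uses the old w_out)
        # Shared parameters: every example moves these, whatever its namespace.
        self.w_out -= self.lr * g * h
        self.b_out -= self.lr * g
        self.b_hidden -= self.lr * g_h
        # Input weights: only features present in this example get a nonzero gradient.
        for k, v in feats.items():
            self.w_in[k] = self.w_in.get(k, 0.0) - self.lr * g_h * v


net = TinyNN()
a_x = {("a", "x"): 1.0}
b_x = {("b", "x"): 1.0}

net.learn(a_x, 1.0)
w_ax, before = net.w_in[("a", "x")], net.predict(a_x)
net.learn(b_x, 0.0)  # example from a different namespace
print(net.w_in[("a", "x")] == w_ax)  # True: the a/x input weight is untouched
print(net.predict(a_x) != before)    # True: the a/x prediction still moved
```

In this toy model the b/x example cannot touch the a/x input weight, yet it still shifts the a/x prediction through the shared output weight, output constant and hidden bias.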
Hi Alexey, first, thanks so much for looking into this. I understand that the hidden layer currently has no notion of name-space isolation. Not sure if this is too involved (won't fix, or "it's a feature, not a bug"), but ideally, this switch would modify the current behavior to skip (not update) any hidden-layer pairing of features where either of the two crossed features is not in the present example's name-space. Is this clearer? Too hard or involved to implement? This is not critical for me; I figured out a way around the issue. I just thought it would be valuable to report it anyway, since this behavior is unexpected in the ways demonstrated: (1) non-monotonic convergence and (2) behavior unlike that of other reductions. Thanks again!
It sounds like this means the hidden layer is never going to be updated? But maybe I am missing something.
Fixed the link, which wasn't working. Thanks so much for the invite. Sorry, I can't make it. So from what you say, I understand that each node in the hidden layer is fully connected (to all possible inputs).
I realize this may not be trivial.
Describe the bug
Using --nn causes an unexpected data leak between separate name-spaces. The update goes the wrong way (against the desired gradient toward minimum loss).
This is especially important for the correctness of learning from time-series data-sets.
Examples coincident in time should not be able to interact.
The current vw update violates this principle when --nn is used, i.e. learning results are not generalizable when --nn is used with a time series plus coincident examples (separated into different name-spaces) after sorting by time.
How to reproduce
Full code and data to reproduce is here:
https://github.com/arielf/vw-bugs/tree/main/nn-data-leak
Version
9.7.0 (git commit: fc9ab25)
OS
Linux, Ubuntu 20.04
Language
CLI / C++
Additional context
I tried my best to explain the bug here.
https://github.com/arielf/vw-bugs/tree/main/nn-data-leak
My guess is that the leak happens through the full-connectivity of the features via the hidden layer.
Since the full connectivity is a done deal (imposed at the start of the run by the fact that we want a fully connected NN), it seems to me that the SGD update should somehow skip updates to weights that have nothing to do with the features in the example. In other words, the skips should happen at run time (rather than at initialization time), and only those target feature nodes that are present in the current example (and/or namespace) should be updated.
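To see where such a leak can enter, consider a simplified model: one tanh hidden unit with input weights $w_j$, hidden bias $b_h$, output weight $v$, output constant $c$, and squared loss. This is an assumption made for illustration, not a transcription of VW's nn code. The per-example gradients are:

$$
\begin{aligned}
z &= b_h + \sum_j w_j x_j, \qquad h = \tanh(z), \qquad \hat{y} = v\,h + c, \qquad \ell = \tfrac{1}{2}(\hat{y} - y)^2 \\
\frac{\partial \ell}{\partial c} &= \hat{y} - y, \qquad
\frac{\partial \ell}{\partial v} = (\hat{y} - y)\,h, \qquad
\frac{\partial \ell}{\partial b_h} = (\hat{y} - y)\,v\,(1 - h^2), \qquad
\frac{\partial \ell}{\partial w_j} = (\hat{y} - y)\,v\,(1 - h^2)\,x_j
\end{aligned}
$$

For a feature $j$ absent from the example, $x_j = 0$, so $w_j$ receives no update. The shared terms $c$, $v$ (and $b_h$, if the model has one) receive a nonzero gradient from essentially every example. Under this model, they are the channel through which examples in one name-space can shift predictions for another.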
Ideally, this skip vs. non-skip behavior (non-skip being the current default) should be controlled by a CLI switch, such as:
--respect_namespaces
--restricted_update
or something like this.

Would appreciate taking a look, thanks!
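One possible reading of the proposed switch is sketched below in Python. It uses a simplified single-tanh-unit network with squared loss, which is an assumption and not VW's code. A hypothetical `restricted_update` flag updates only the input weights of features present in the example and freezes the shared parameters. The flag name mirrors the suggestion above and is not an existing vw option; `sgd_step` and `predict` are likewise illustrative helpers.

```python
import math


def predict(params, feats):
    """Prediction of a hypothetical single-tanh-unit network with a linear output."""
    z = params["b_hidden"] + sum(params["w_in"].get(k, 0.0) * v for k, v in feats.items())
    return params["w_out"] * math.tanh(z) + params["b_out"]


def sgd_step(params, feats, label, lr=0.1, restricted_update=False):
    """One SGD step on squared loss.

    restricted_update (hypothetical): if True, only the input weights of features
    present in `feats` change; the shared parameters (w_out, b_out, b_hidden) are frozen.
    """
    w_in = params["w_in"]
    z = params["b_hidden"] + sum(w_in.get(k, 0.0) * v for k, v in feats.items())
    h = math.tanh(z)
    g = (params["w_out"] * h + params["b_out"]) - label  # dLoss/dPrediction
    g_h = g * params["w_out"] * (1.0 - h * h)            # gradient at the hidden pre-activation
    for k, v in feats.items():
        w_in[k] = w_in.get(k, 0.0) - lr * g_h * v
    if not restricted_update:
        params["w_out"] -= lr * g * h
        params["b_out"] -= lr * g
        params["b_hidden"] -= lr * g_h
    return params


p = {"w_in": {}, "b_hidden": 0.0, "w_out": 0.5, "b_out": 0.0}
sgd_step(p, {("a", "x"): 1.0}, 1.0, restricted_update=True)
before = predict(p, {("a", "x"): 1.0})
sgd_step(p, {("b", "x"): 1.0}, 0.0, restricted_update=True)
print(predict(p, {("a", "x"): 1.0}) == before)  # True: namespace b no longer moves a/x
```

This isolates name-spaces in the toy model, but, as pointed out earlier in the thread, it also means the shared layer never learns. It is therefore a trade-off rather than a drop-in fix.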