Large Elo fluctuations starting from ID253
Below is a brief summary of our investigation into the issues that started showing up in the Elo graph from ID253. We have been performing a number of tests that require changing parameters, generating new self-play games, and training on those. This process requires many self-play games, so your help generating them is still invaluable! We have some promising leads, but still require more time and testing. Thank you for your patience.
crem's summary:
- It was discovered that lczero.exe doesn't use the rule50 plane and the last all-ones plane (since the flip bugfix), even though both were filled during training. That was fixed in v0.10; the rating jumped for one epoch, then started to fall.
- Learning rate changed to 0.0005 / 0.0001 (from 0.001 / 0.0005). This helped a bit for a few epochs, but then the rating started to drop again.
- value_loss_weight changed to 0.25. Helped for one epoch so far (UPD 2018-05-14: for 4 epochs so far).
- FPU reduction is disabled in training games (in the command line sent from the server).
- Another thing that was introduced in v0.8 but not reverted is the Cpuct change from 0.85 to 0.6.
- There are discussions about whether we should roll back and, if so, what exactly (only the network, or also drop training games) and to which point.
- There are observations that we sample too many positions from the same game (several times more than there are moves in the game?), and also that our shuffle buffer is too small.
- It has been observed that the rule50 plane's average weights are much higher than those of the other planes. There is no explanation for this. It seems it was like that even before the rule50 bug: the weights were highest right after bootstrap and have been decreasing gradually, but not by much.
- The rule50 plane is the only plane that is not normalized (the other planes contain only 0 and 1, while rule50 contains numbers from 0 to 99). That doesn't easily explain the weight inflation, but it is probably still worth normalizing.
- There is a long proposal from Dubslow about development organization, including that we should feature freeze for now. Everyone seems to agree.
- There is a discussion that test and training data are not correctly separated: https://github.com/glinscott/leela-chess/issues/595
- After fixing the rule50 plane in lczero, lc0 and lczero return nearly identical results from the NN and MCTS. That makes them easy to compare, and means that probably no major bugs are left (or both lczero and lc0 have the same bugs).
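The rule50 normalization mentioned in the summary above can be sketched as follows. This is only an illustration, not the project's actual input encoding: the names `normalize_rule50` and `rule50_plane`, the 8x8 list-of-lists layout, and the divisor 99 (taken from the 0 to 99 range stated above) are all assumptions.

```python
# Hypothetical sketch: scale the rule50 counter into [0, 1] so that it is on
# the same scale as the binary (0/1) input planes. The names and the divisor
# are assumptions for illustration, not the actual lczero/lc0 encoding.

RULE50_MAX = 99  # assumed upper bound, from the 0..99 range mentioned above


def normalize_rule50(count, max_count=RULE50_MAX):
    """Return the rule50 counter scaled to the range [0.0, 1.0]."""
    clamped = max(0, min(count, max_count))  # guard against out-of-range input
    return clamped / max_count


def rule50_plane(count, size=8):
    """Build a size x size plane filled with the normalized counter."""
    value = normalize_rule50(count)
    return [[value] * size for _ in range(size)]


if __name__ == "__main__":
    plane = rule50_plane(33)
    print(plane[0][0])  # every square of the plane carries the same value
```

Whatever scale is chosen, training and inference would need to apply the same one, otherwise the network sees inputs it was not trained on.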
Error323's plan:
Thanks. Given all that's happened, may I propose the following. (Please correct me if I missed something.)
- Normalize 50 rule plane
- Adjust the training pipeline to the same position sampling rate as A0.
- Roll back from last high perf net with sane value head
- Let v0.10 fill the training window
- Resume training
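The sampling-rate item in the list above can be checked with a quick back-of-the-envelope calculation. The sketch below is illustrative only: the function names and every number in the example are hypothetical, and none of them are the actual settings of this project or of A0.

```python
# Hypothetical sketch: estimate how often positions are sampled in training.
# All names and example numbers are assumptions for illustration.


def samples_per_position(train_steps, batch_size, positions_in_window):
    """Average number of times each position in the window is drawn."""
    return train_steps * batch_size / positions_in_window


def samples_per_game(train_steps, batch_size, games_in_window):
    """Average number of positions drawn from each game in the window."""
    return train_steps * batch_size / games_in_window


def steps_for_target_rate(target_rate, batch_size, positions_in_window):
    """Training steps needed so each position is drawn target_rate times on average."""
    return round(target_rate * positions_in_window / batch_size)


if __name__ == "__main__":
    games = 500_000          # hypothetical training window, in games
    moves_per_game = 100     # hypothetical average game length
    positions = games * moves_per_game
    steps, batch = 10_000, 1024  # hypothetical steps per network and batch size

    per_game = samples_per_game(steps, batch, games)
    print("samples per game:", per_game)
    print("samples per position:", samples_per_position(steps, batch, positions))
    # If per_game exceeds moves_per_game, positions are being reused on average.
    print("oversampling:", per_game > moves_per_game)
```

Comparing samples per game against the average game length gives a direct check on the observation that more positions may be drawn from a game than it has moves.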
Enforce lc0 ASAP, and I mean ASAP, without OpenCL support. I know this is somewhat controversial and problematic for AMD users, but: 1) it's so much faster, and getting games is our #1 priority (right after being bug-free); 2) none of the people here are familiar with OpenCL, so we are making ourselves dependent on external factors, which is never a good idea; 3) it's easier to maintain and easier for newcomers to understand.
Take the union of the A0 and AG0 hyperparameter settings (in accordance with A0) and make sure we replicate the work by DeepMind as faithfully as possible. I strongly believe this is our best shot at success, as none of us has the amount of compute needed to scientifically propose superior hyperparameters. Even though we are still missing a subset of the parameters, this is easier to control than also deviating from the ones we do know (even if deviating would, in theory, produce faster convergence).