Large Elo fluctuations starting from ID253
Below is a brief summary of our investigation into the issues that started showing up in the Elo graph from ID253. We have been performing a number of tests that require changing parameters, generating new self-play games, and training on those. This process requires many self-play games, so your help generating them is still invaluable! We have some promising leads, but still require more time and testing. Thank you for your patience.
crem's summary:
- It was discovered that lczero.exe has not used the rule50 plane and the last all-ones plane since the flip bugfix, even though they were filled during training. That was fixed in v0.10; the rating jumped for one epoch, then started to fall.
- The learning rate was changed to 0.0005 / 0.0001 (from 0.001 / 0.0005). That helped a bit for a few epochs, but then the rating started to drop again.
- value_loss_weight was changed to 0.25. That has helped for one epoch so far.
- FPU reduction is disabled in training games (via the command line sent from the server).
- Another change that was introduced in v0.8 but not reverted is the Cpuct change from 0.85 to 0.6.
- There are discussions about whether we should roll back and, if so, what exactly (only the network, or also drop training games) and to which point.
- There are observations that we sample too many positions from the same game (several times more than there are moves in the game?), and also that our shuffle buffer is too small.
- There is an observation that the rule50 plane's average weights are much higher than those of the other planes. There is no explanation for that. It seems to have been like that even before the rule50 bug; the weights were highest right after bootstrap and have been decreasing gradually, but not by much.
- The rule50 plane is the only plane that is not normalized (the other planes contain only 0 and 1, while rule50 contains numbers from 0 to 99). That doesn't easily explain the weight inflation, but it is probably still worth normalizing.
- There is a long proposal from Dubslow about development organization, including that we have to feature freeze for now. Everyone seems to agree.
- There is a discussion that test and training data are not correctly separated: https://github.com/glinscott/leela-chess/issues/595
- After fixing the rule50 plane in lczero, Lc0 and lczero return nearly identical results from NN and MCTS. That makes them easy to compare and means that probably no major bugs are left (or both lczero and Lc0 have the same bugs).
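
The rule50 normalization mentioned in the summary could look roughly like the sketch below. It is only an illustration, not lczero or Lc0 code: `rule50_plane` and `RULE50_MAX` are hypothetical names, and it assumes the plane is an 8x8 grid filled uniformly with the counter and that dividing by 99 (the maximum stated above) is the chosen scaling.

```python
# Hypothetical sketch: scale the rule50 plane into [0, 1] like the binary planes.
RULE50_MAX = 99  # assumption: the counter ranges from 0 to 99, as in the summary


def rule50_plane(halfmove_clock, normalize=True):
    """Return an 8x8 plane filled with the (optionally scaled) rule50 counter."""
    if not 0 <= halfmove_clock <= RULE50_MAX:
        raise ValueError("halfmove clock out of range")
    if normalize:
        value = halfmove_clock / RULE50_MAX
    else:
        value = float(halfmove_clock)
    # Build each row separately so the rows do not share one list object.
    return [[value] * 8 for _ in range(8)]


raw = rule50_plane(42, normalize=False)  # every square holds 42.0
scaled = rule50_plane(42)                # every square holds 42 / 99
```

With the scaled version, every input plane stays within the same 0 to 1 range, which is the property the other planes already have.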
Error323's plan:
Thanks. Given all that's happened, may I propose the following. (Please correct me if I missed something).
- Adjust the training pipeline to the same position sampling rate as A0.
- Roll back to the last high-performance net with a sane value head
- Let v0.10 fill the 500k window
- Resume training
Make sure we stay as true to A0 as we can, and get crem's Lc0 version in there as the default.
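
The sampling-rate item in the plan comes down to simple arithmetic on how often each stored position is drawn for training. The sketch below shows that arithmetic under stated assumptions: the function names are hypothetical, and the numbers are illustrative only, not measured values from this project or from A0.

```python
def samples_per_position(train_steps, batch_size, games, avg_plies_per_game):
    """Average number of times each stored position is drawn for training."""
    total_samples = train_steps * batch_size
    total_positions = games * avg_plies_per_game
    return total_samples / total_positions


def steps_for_target_rate(target_rate, batch_size, games, avg_plies_per_game):
    """Training steps that give the target samples-per-position rate."""
    return round(target_rate * games * avg_plies_per_game / batch_size)


# Illustrative numbers only (assumptions, not project settings).
rate = samples_per_position(
    train_steps=10_000, batch_size=1024, games=40_000, avg_plies_per_game=128
)
steps = steps_for_target_rate(
    target_rate=0.5, batch_size=1024, games=40_000, avg_plies_per_game=128
)
```

A rate above 1 means positions are, on average, reused within the window. Lowering the number of steps per window (or enlarging the window) brings the rate down towards whatever target is chosen.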