Adjust root magic numbers to help search find moves for training #699

Open
Mardak opened this issue Jun 1, 2018 · 1 comment

Mardak commented Jun 1, 2018

As an alternative to #698 for those who don't like the inelegance of just forcing two visits per root move, there are existing numbers that affect the root's search: epsilon, alpha, fpu, etc. They're currently 0.25, 0.3, and 0.0 respectively.
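
For reference, epsilon and alpha here are the AlphaZero root Dirichlet-noise parameters: the root policy is mixed as P'(a) = (1 - epsilon) * P(a) + epsilon * noise_a, with the noise vector drawn from Dirichlet(alpha). Here's a minimal sketch of that mixing, using illustrative names rather than the exact identifiers in src/UCTNode.cpp:

```cpp
// Sketch of AlphaZero-style root noise mixing; "policy", "epsilon", and
// "alpha" are illustrative names, not lc0's actual identifiers.
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

void mix_root_noise(std::vector<float>& policy, float epsilon, float alpha) {
    static std::mt19937 rng{std::random_device{}()};
    std::gamma_distribution<float> gamma(alpha, 1.0f);

    // A Dirichlet(alpha) draw is a vector of independent Gamma(alpha, 1)
    // samples normalized to sum to 1.
    std::vector<float> noise(policy.size());
    for (auto& n : noise) n = gamma(rng);
    const float sum = std::accumulate(noise.begin(), noise.end(), 0.0f);
    if (sum <= 0.0f) return;  // degenerate draw; leave the policy as-is

    // P'(a) = (1 - epsilon) * P(a) + epsilon * noise_a
    for (std::size_t i = 0; i < policy.size(); ++i)
        policy[i] = (1.0f - epsilon) * policy[i] + epsilon * (noise[i] / sum);
}
```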

Using the same game as in the initial comment, analyzing the position where search should find Rxh4 https://clips.twitch.tv/NimbleLazyNewtPRChase:

```
position startpos moves d2d4 d7d5 c1f4 g7g6 e2e3 g8f6 c2c4 c7c5 d4c5 f8g7 b1c3 d8a5 c4d5 f6d5 d1d5 g7c3 b2c3 a5c3 e1e2 c3a1 f4e5 a1b1 e5h8 c8e6 d5d3 b1a2 e2f3 f7f6 h8g7 b8d7 f3g3 a8c8 c5c6 c8c6 d3d4 c6d6 d4b4 d6b6 b4h4 d7c5 h2h3 b6b2 g1e2 a2d5 g3h2 d5e5 e2g3 h7h5 h4d4 e5d4 e3d4 c5b3 g7h6 h5h4 g3e4 g6g5 f1d3 b3d4 h1a1 a7a6 e4c5 b2f2 d3e4 e6f5 e4b7 f2c2 a1a4 d4e2 c5e4 f5e4 b7e4 c2c1 e4d3 e2f4 d3a6 f4h5
```

[screenshot of the board position, 2018-05-31]

Doing 200 runs of go nodes 800, here's how many end up searching Rxh4 deeply enough that it becomes the most visited move, along with the maximum prior observed for Rxh4 across those runs:

| epsilon | alpha | fpu | finds Rxh4 (of 200) | max Rxh4 prior |
|---------|-------|------|---------------------|----------------|
| 0.25 | 0.3 | 0.0 | 65 | 13.14% |
| 0.25 | 0.6 | 0.0 | 79 | 10.43% |
| 0.25 | 0.9 | 0.0 | 76 | 5.15% |
| 0.25 | 3.0 | 0.0 | 83 | 3.71% |
| 0.5 | 0.3 | 0.0 | 91 | 23.64% |
| 0.5 | 0.6 | 0.0 | 111 | 17.49% |
| 0.5 | 0.9 | 0.0 | 122 | 16.15% |
| 0.5 | 3.0 | 0.0 | 162 | 5.22% |
| 0.25 | 0.3 | -0.1 | 77 | 10.97% |
| 0.25 | 0.6 | -0.1 | 81 | 10.22% |
| 0.25 | 0.9 | -0.1 | 97 | 7.79% |
| 0.25 | 3.0 | -0.1 | 100 | 3.82% |
| 0.5 | 0.3 | -0.1 | 101 | 18.36% |
| 0.5 | 0.6 | -0.1 | 116 | 16.57% |
| 0.5 | 0.9 | -0.1 | 126 | 14.62% |
| 0.5 | 3.0 | -0.1 | 159 | 8.17% |

The negative fpu values come from this patch:

```diff
diff --git a/src/UCTNode.cpp b/src/UCTNode.cpp
--- a/src/UCTNode.cpp
+++ b/src/UCTNode.cpp
@@ -362,4 +362,6 @@ UCTNode* UCTNode::uct_select_child(Color color, bool is_root) {
     if (!is_root || !cfg_noise) {
         fpu_reduction = cfg_fpu_reduction * std::sqrt(total_visited_policy);
+    } else {
+        fpu_reduction = -cfg_fpu_reduction;
     }
```
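
For context on why this helps: fpu_reduction is subtracted from the parent's evaluation to form the first-play-urgency value used for unvisited children during selection. Here's a self-contained sketch of that mechanism, with simplified stand-in names (Child, cfg_puct, select_child) rather than the actual lc0 code:

```cpp
// Sketch of PUCT child selection with first-play urgency; types and
// names are simplified stand-ins for the real lc0 structures.
#include <cmath>
#include <cstddef>
#include <vector>

struct Child {
    float prior;   // policy prior P(a)
    int   visits;  // N(a)
    float q;       // mean value, only meaningful when visits > 0
};

std::size_t select_child(const std::vector<Child>& children,
                         float parent_eval, int parent_visits,
                         float cfg_puct, float fpu_reduction) {
    // Unvisited children are scored with this first-play-urgency value.
    // With fpu_reduction < 0 it sits *above* the parent eval, so every
    // unvisited root move looks attractive enough to receive one visit.
    const float fpu_eval = parent_eval - fpu_reduction;

    std::size_t best = 0;
    float best_score = -1e9f;
    for (std::size_t i = 0; i < children.size(); ++i) {
        const Child& c = children[i];
        const float q = c.visits > 0 ? c.q : fpu_eval;
        const float u = cfg_puct * c.prior *
                        std::sqrt(static_cast<float>(parent_visits)) /
                        (1.0f + static_cast<float>(c.visits));
        if (q + u > best_score) {
            best_score = q + u;
            best = i;
        }
    }
    return best;
}
```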

First off, I suppose the question is: are AZ's numbers, which in this position result in about 1 in 3 runs finding the correct move, the desired amount of randomness?

The premise behind this issue and the other issue is that when a self-play game does end up in a learnable board state, it seems unfortunate for it to miss the opportunity to generate valuable training data for the correct move more often than not. Clearly, AZ's numbers are good enough to eventually generate strong networks, but perhaps search during training could be better optimized?

In the table, I also included the maximum prior N observed for Rxh4 across the 200 runs; with id359 it's normally 0.33%. As expected, increasing alpha decreases the max prior since the noise is spread over more moves, but at least for this move, just raising the prior to around 1% is usually enough for search to direct most of the visits to Rxh4. Of course, setting a negative fpu at the noised root helps give at least one visit to each move, but for this board position, 2 visits are needed to realize it's a good move.
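
To put rough numbers on why the prior matters: the PUCT exploration bonus for an unvisited move scales linearly with its prior, so tripling the prior triples the bonus. A toy calculation, assuming a placeholder c = 1 rather than lc0's actual tuned constant:

```cpp
// Toy comparison of the PUCT exploration bonus U = c * P * sqrt(N) / (1 + n)
// for an unvisited move (n = 0) at an 800-node search; c = 1 is an
// arbitrary placeholder, not lc0's tuned value.
#include <cmath>
#include <cstdio>

int main() {
    const float c = 1.0f;
    const float sqrt_n = std::sqrt(800.0f);                      // ~28.3
    std::printf("U at P=0.33%%: %.3f\n", c * 0.0033f * sqrt_n);  // ~0.093
    std::printf("U at P=1.00%%: %.3f\n", c * 0.0100f * sqrt_n);  // ~0.283
    return 0;
}
```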

Additionally, if these numbers become something the server can tell the client to use, similar to turning resignation on/off for a portion of game tasks, there could be a mix of epsilon/alpha/fpu numbers across games, e.g., as sketched below.
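
Purely as a hypothetical illustration of what such a server-assigned mix could look like (none of these names or numbers exist in the client today; everything here is made up):

```cpp
// Hypothetical sketch of a server-assigned parameter mix, analogous to how
// a fraction of game tasks already run with resignation disabled.
#include <vector>

struct RootSearchParams {
    float noise_epsilon;  // e.g. 0.25 or 0.5
    float noise_alpha;    // e.g. 0.3 through 3.0
    float root_fpu;       // e.g. 0.0 or -0.1
    float fraction;       // share of game tasks assigned these numbers
};

const std::vector<RootSearchParams> kParameterMix = {
    {0.25f, 0.3f,  0.0f, 0.8f},  // AZ defaults for most games
    {0.50f, 3.0f, -0.1f, 0.2f},  // more exploratory settings for the rest
};
```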

Videodr0me commented Jun 1, 2018

I tried a lot of these schemes at the root and throughout the tree; unfortunately, in self-play they are always hugely inferior. Just try the selfplay option of lc0 to test your approach (min 1000 games), for example with:
```
lc0-cudnn selfplay --parallelism=8 --backend=multiplexing "--backend-opts=cudnn(threads=2)" --games=10000 --visits=100 --temperature=1 --tempdecay-moves=10 player1: --your-modification=1 -player2: --your-modification=0
```

If you just want to find tactics this might be OK, but be aware that it might be a huge Elo loss.
