Mate in 1 missed tens of times in training game #644

Open
mooskagh opened this issue May 22, 2018 · 25 comments

Comments

@mooskagh
Contributor

mooskagh commented May 22, 2018

This game:
http://lczero.org/game/14027814

At the end of the game, mate in one was missed about 20 times.
While I understand that temperature can produce bad moves, that is surely too many for one game, so something is certainly going wrong.

Would be nice to look into it and try to reproduce.

Btw, the gzip file for this particular game was corrupted. Not sure if it's related; it could be written off as faulty hardware, but that doesn't seem very plausible.

The username of the person who generated this game is Wil54.

@mooskagh
Contributor Author

mooskagh commented May 22, 2018

More examples from Discord:
http://lczero.org/game/13916512
http://lczero.org/game/14109239

Something is clearly broken.
Even with a totally broken net, e.g. P=0 for the correct move and P=1 with Q=0.95 for the wrong move, it shouldn't fail.
Dirichlet noise would make those priors (on average) P=0.25 and P=0.75, and with such priors Q=1 vs Q=0.95 is a big enough difference to pick the correct move within 800 playouts.

Maybe there is some overflow in the temperature code.

@mooskagh
Contributor Author

Actually I'm wrong about 0.25. With many move candidates, that 0.25 will be split among all of them. But even with a prior of 0.01 or so, it should find it.
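
To put a number on that split, here is a minimal sketch. It assumes the AlphaZero-style mix P' = (1 - eps) * P + eps * noise with eps = 0.25, and a symmetric Dirichlet whose mean per move is 1/n. The function name and the choice of n = 27 legal moves are illustrative assumptions, not values read from the engine.

```python
def expected_noisy_prior(prior, num_moves, eps=0.25):
    """Expected prior after mixing in symmetric Dirichlet noise.

    Assumes P' = (1 - eps) * P + eps * noise, where each noise component
    has mean 1 / num_moves. eps = 0.25 is an assumption that matches the
    0.25 figure quoted in the comment above.
    """
    return (1.0 - eps) * prior + eps / num_moves


# Worst case from the comment above: the net gives the correct move P = 0.
for n in (2, 27):
    print(n, round(expected_noisy_prior(0.0, n), 4))
```

With 27 candidates the expected prior comes out a little under 0.01, in line with the "0.01 or so" estimate.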

@mooskagh
Contributor Author

The three examples were generated by three different users, all on client v0.10,
with nets 321 (game id13916512), 324 (id14027814) and 326 (id14109239).

@gyathaar

gyathaar commented May 22, 2018

Looked at the first game you posted (id14027814) and found a position with two different mate-in-1 moves, neither of which was played.
Since this game was played with lczero, I used that.
FEN for the position (with 4 moves to create some history):
position fen 4q3/8/8/8/8/K7/2k5/8 b - - 24 136 moves e8c6 a3a2 c6e8 a2a1

started ./lczero -w weights_324.txt.gz -v800 --fpu_reduction=0
(no randomness enabled, otherwise the same settings training games use; I wanted to see the percentages with no noise or randomness)

The move actually played, Kb3, is far down the list of top moves (the list below is sorted by visit count, most visited last):

info string Qe6 -> 1 (V: 50.00%) (N: 2.93%) PV: Qe6
info string Qf7 -> 1 (V: 50.00%) (N: 2.39%) PV: Qf7
info string Qg8 -> 1 (V: 50.00%) (N: 2.27%) PV: Qg8
info string Kc1 -> 7 (V: 98.75%) (N: 1.49%) PV: Kc1 Ka2 Qa4+
info string Kd1 -> 8 (V: 98.43%) (N: 1.67%) PV: Kd1 Kb2 Qe3 Kb1
info string Kd2 -> 8 (V: 98.74%) (N: 1.59%) PV: Kd2 Kb2 Qb5+ Ka3
info string Qg6 -> 10 (V: 98.78%) (N: 1.84%) PV: Qg6 Ka2 Qa6+
info string Qh5 -> 11 (V: 98.69%) (N: 2.21%) PV: Qh5 Ka2 Qa5+
info string Kd3 -> 11 (V: 98.87%) (N: 2.12%) PV: Kd3 Kb2 Qa4 Kb1
info string Kb3 -> 12 (V: 99.09%) (N: 2.10%) PV: Kb3 Kb1 Qe1+
info string Qc8 -> 14 (V: 98.66%) (N: 2.82%) PV: Qc8 Ka2 Qa6+
info string Qe2 -> 14 (V: 98.73%) (N: 2.83%) PV: Qe2 Ka2 Qa6+
info string Qd8 -> 14 (V: 98.73%) (N: 2.77%) PV: Qd8 Ka2 Qa5+
info string Qh8+ -> 14 (V: 98.77%) (N: 2.69%) PV: Qh8+ Ka2 Qa8+
info string Qf8 -> 14 (V: 98.77%) (N: 2.72%) PV: Qf8 Ka2 Qa8+
info string Qe3 -> 14 (V: 98.78%) (N: 2.75%) PV: Qe3 Ka2 Qa7+
info string Qe7 -> 15 (V: 98.68%) (N: 3.01%) PV: Qe7 Ka2 Qa7+
info string Qd7 -> 15 (V: 98.72%) (N: 2.89%) PV: Qd7 Ka2 Qa4+
info string Qc6 -> 16 (V: 98.80%) (N: 2.93%) PV: Qc6 Ka2 Qa4+
info string Qb8 -> 17 (V: 98.93%) (N: 3.10%) PV: Qb8 Ka2 Qb2+
info string Kc3 -> 17 (V: 99.00%) (N: 2.96%) PV: Kc3 Ka2 Qe1 Ka3
info string Qe4 -> 19 (V: 98.74%) (N: 3.62%) PV: Qe4 Ka2 Qa4+
info string Qe1+ -> 21 (V: 98.73%) (N: 3.98%) PV: Qe1+ Ka2 Qa5+
info string Qe5+ -> 25 (V: 98.67%) (N: 4.82%) PV: Qe5+ Ka2 Qa5+
info string Qb5 -> 37 (V: 99.06%) (N: 6.14%) PV: Qb5 Ka2 Qb2+
info string Qa8+ -> 91 (V: 100.00%) (N: 8.26%) PV: Qa8+
info string Qa4+ -> 233 (V: 100.00%) (N: 21.10%) PV: Qa4+
info string stm Black winrate 99.17%

The 2 mating moves are the 2 top picks.

Interestingly enough, I first tried this with network 237,
and these were its top 3 picks:

info string Kb3 -> 26 (V: 98.87%) (N: 5.47%) PV: Kb3 Kb1 Qe1+
info string Qa8+ -> 293 (V: 100.00%) (N: 27.33%) PV: Qa8+
info string Qa4+ -> 303 (V: 100.00%) (N: 28.06%) PV: Qa4+
info string stm Black winrate 99.41%

So while 237 was better at picking the 2 mating moves, it was also more likely to pick the move this training game played?
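
As a rough check on how often a mate should be played from these numbers, here is a minimal sketch. It assumes the client samples the move in proportion to root visit counts raised to 1/temperature (temperature 1 here); that sampling rule and the helper name `share` are assumptions for illustration. The visit counts are copied from the network 324 output above.

```python
# Visit counts from the network 324 run above (-v800, no noise).
visits = {
    "Qe6": 1, "Qf7": 1, "Qg8": 1, "Kc1": 7, "Kd1": 8, "Kd2": 8,
    "Qg6": 10, "Qh5": 11, "Kd3": 11, "Kb3": 12, "Qc8": 14, "Qe2": 14,
    "Qd8": 14, "Qh8+": 14, "Qf8": 14, "Qe3": 14, "Qe7": 15, "Qd7": 15,
    "Qc6": 16, "Qb8": 17, "Kc3": 17, "Qe4": 19, "Qe1+": 21, "Qe5+": 25,
    "Qb5": 37, "Qa8+": 91, "Qa4+": 233,
}
MATES = ("Qa8+", "Qa4+")


def share(moves, visits, temperature=1.0):
    """Probability mass on `moves` when sampling by visits ** (1 / temperature)."""
    weights = {m: n ** (1.0 / temperature) for m, n in visits.items()}
    total = sum(weights.values())
    return sum(weights[m] for m in moves) / total


mate_share = share(MATES, visits)
print(f"total visits: {sum(visits.values())}, mate share: {mate_share:.3f}")
```

Under that assumption the two mates together hold a bit less than half of the visits, so a non-mating move would be sampled in roughly every second attempt.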

@mooskagh
Contributor Author

So the NN is sane.

It seems that it's either a problem with temperature or maybe with the promoted queen?

Could you run it several times with the whole move history and --randomize to check which bestmove is returned?

Moves in uci format:

e2e4 d7d6 d2d4 g8f6 b1c3 b8d7 g2g4 h7h6 h2h3 e7e5 g1e2 b7b6 f1g2 c8b7 e1g1 e5d4 e2d4 g7g6 f1e1 d8c8 f2f4 f6h7 e4e5 d6e5 f4e5 b7g2 e5e6 d7e5 e6f7 e8f7 d4e2 g2b7 e1f1 f7g8 e2f4 h7g5 c3d5 g8h7 f4g6 h7g6 c2c4 b7d5 c4d5 g5h3 g1g2 c8g4 d1g4 e5g4 g2h3 g4f6 c1f4 f8d6 h3h4 d6e7 a1e1 f6d5 h4g3 a8f8 f4e5 f8f1 e1f1 h8b8 f1e1 b8c8 e1d1 c7c6 g3h2 e7h4 d1a1 g6f5 e5h8 c8h8 a2a4 f5e5 h2h3 h4f6 a1f1 d5f4 h3g3 h8g8 g3f3 f6h4 f1h1 g8g3 f3f2 g3h3 f2g1 e5d4 h1h3 f4h3 g1h1 h3f4 b2b4 c6c5 h1h2 h4g5 h2g3 g5f6 b4c5 d4c5 g3f2 c5d4 f2f3 f4e6 f3f2 e6c7 a4a5 b6a5 f2f3 c7a8 f3g2 f6d8 g2f2 d4e5 f2e3 e5d5 e3d2 a5a4 d2d3 d8b6 d3d2 d5c4 d2e1 b6d4 e1d1 a7a5 d1c2 h6h5 c2b1 d4a1 b1a1 c4d4 a1a2 d4c3 a2a3 a8b6 a3a2 c3d4 a2b2 a4a3 b2b3 b6d7 b3a4 d7c5 a4b5 d4d5 b5a5 d5e4 a5b6 e4e5 b6a7 c5d3 a7b6 d3b4 b6c7 e5f4 c7d8 f4e5 d8c8 h5h4 c8d7 e5d5 d7d8 d5c5 d8c8 c5d4 c8d8 b4a2 d8e8 d4e4 e8d7 a2c3 d7e6 e4f4 e6f6 f4e4 f6e7 c3a2 e7e6 e4f3 e6d7 f3e4 d7c7 e4e5 c7d7 e5f6 d7d8 f6f5 d8e7 f5g6 e7d8 g6f7 d8c8 a2b4 c8b7 b4d3 b7c7 f7g6 c7d7 d3b2 d7c7 g6f6 c7d6 f6f5 d6e7 h4h3 e7f8 f5f6 f8g8 b2d3 g8h7 h3h2 h7h8 f6e6 h8g7 h2h1 g7g8 h1e1 g8h7 e1h4 h7g8 e6f6 g8f8 h4h6 f8e8 h6g7 e8d8 g7a7 d8c8 d3e5 c8d8 a7a8 d8c7 e5c6 c7b6 a8a7 b6b5 c6a5 b5b4 a5b3 b4c3 a7e7 c3b3 e7e6 b3a4 e6b6 a4a3 f6e5 a3a2 e5f4 a2a1 f4g5 a1a2 g5g4 a2a3 g4g3 a3a2 g3f3 a2a1 b6b7 a1a2 f3e3 a2a3 e3d2 a3a2 d2c2 a2a1 b7b5 a1a2 b5e8 a2a3 e8c6 a3a2 c6e8 a2a1 c2b3 a1b1 b3c3 b1a1 e8a8 a1b1 a8a4 b1c1 a4a1

@killerducky
Collaborator

killerducky commented May 22, 2018

training.14027814 is missing
I don't see anything suspicious about 14109239

In 13916512 I see it play a move (d7d8) that the training data says had 0 visits, which should never happen. I remember tilps saying there was a potential problem with the temperature rework I did; not sure where he wrote that.

Update: see post below. Mistake in the decode script.

@gyathaar

Added --randomize and -n (like training games):
bestmove e8h5
bestmove e8a4
bestmove e8a8
bestmove e8b8
bestmove c2c3
bestmove e8h8
bestmove e8a8
bestmove e8b8
bestmove e8c8
bestmove e8f8

It only picked a mating move in 3 of 10 attempts on the same position.

@killerducky
Collaborator

@gyathaar can you do FEN + 8 moves just to eliminate doubts about that? Also please run with -l log.txt and upload the log somewhere.

@killerducky
Collaborator

temperature.cmd:

position startpos moves b2b3 g8f6 c1b2 d7d5 e2e3 c7c6 d2d4 d8a5 d1d2 a5d2 b1d2 c8f5 c2c4 e7e6 a2a3 h7h6 f2f3 f8d6 g1e2 e8g8 e2c1 c6c5 d4c5 d6c5 e1f2 c5e7 c4d5 e6d5 f1d3 f6e8 d3f5 b8c6 c1e2 e8c7 e2d4 h6h5 a1c1 c7e8 d4c6 b7c6 f5d7 c6c5 d7c6 a8d8 a3a4 e8d6 f2e2 d6f5 g2g4 h5g4 f3g4 f5h6 h2h3 d5d4 b2a3 d4e3 d2e4 f7f5 a3c5 e7c5 e4c5 d8c8 c6d5 g8h7 c5e6 f8e8 e2d3 f5g4 c1c8 e8c8 h3g4 h7g6 h1g1 g6f6 g4g5 f6e5 g5h6 e5d5 h6g7 a7a6 a4a5 d5e5 g1h1 e5e6 h1h8 e3e2 h8c8 e2e1q g7g8q e6d7 c8d8 d7c6 g8d5 c6c7 d5d6 c7b7 b3b4 e1f1 d3e4 f1e2 e4f5 e2c2 f5f6 c2c3 f6f7 c3f3 d6f6 f3b3 f6e6 b3f3 f7e7 f3e4 d8d7 b7b8 e6e4 b8c8
d
go nodes 800
undo
go nodes 800
undo
go nodes 800
undo
go nodes 800
undo
go nodes 800
undo
go nodes 800
undo
go nodes 800
undo
go nodes 800
undo
go nodes 800
undo
go nodes 800

Extra debug code in randomize_first_proportionally

+    for (auto a : accum_vector) {
+        myprintf("a %f\n", a);
+    }
+    myprintf("int_limit %d pick %d pick_scaled %f index %ld\n", int_limit, pick, pick_scaled, index);
+    //pick = Random::GetRng().RandInt<std::uint32_t>(int_limit);
+    //myprintf("int_limit %d pick %d pick_scaled %f index %ld (repick)\n", int_limit, pick, pick_scaled, index);
+    //pick = Random::GetRng().RandInt<std::uint32_t>(int_limit);
+    //myprintf("int_limit %d pick %d pick_scaled %f index %ld (repick)\n", int_limit, pick, pick_scaled, index);

rm log.txt; cat temperature.cmd | ./lczero -w ~/lcnetworks/id317 -s1 -t1 -n --randomize -l log.txt

grep index log.txt
int_limit 2147483647 pick 581605597 pick_scaled 0.270831 index 1
int_limit 2147483647 pick 1052307420 pick_scaled 0.490019 index 4
int_limit 2147483647 pick 607105765 pick_scaled 0.282706 index 1
int_limit 2147483647 pick 607105765 pick_scaled 0.282706 index 1
int_limit 2147483647 pick 607105765 pick_scaled 0.282706 index 1
int_limit 2147483647 pick 607105765 pick_scaled 0.282706 index 1
int_limit 2147483647 pick 607105765 pick_scaled 0.282706 index 1
int_limit 2147483647 pick 607105765 pick_scaled 0.282706 index 1
int_limit 2147483647 pick 607105765 pick_scaled 0.282706 index 1
int_limit 2147483647 pick 607105765 pick_scaled 0.282706 index 1
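
For readers following the debug output, here is a minimal sketch of how an integer draw could be turned into a move index. It is a hypothetical reconstruction, not the engine's code: it assumes `pick_scaled = pick / int_limit` (which does match the logged values) and that the index is the first bucket of the normalised cumulative visit counts above `pick_scaled`. The visit counts are invented, chosen only so that the indices line up with the log.

```python
from bisect import bisect_right
from itertools import accumulate


def pick_index(visits, pick, int_limit):
    """Map an integer draw in [0, int_limit) to a move index.

    Hypothetical reconstruction: cumulative visit counts are normalised
    to [0, 1] and the first bucket whose upper edge exceeds
    pick / int_limit is selected.
    """
    accum = list(accumulate(visits))
    total = accum[-1]
    pick_scaled = pick / int_limit
    index = bisect_right([a / total for a in accum], pick_scaled)
    # Guard against floating-point edge cases at the very top of the range.
    return min(index, len(visits) - 1), pick_scaled


# Invented visit counts (not from the real game); picks are from the log above.
visits = [160, 120, 40, 40, 120, 319]
for pick in (581605597, 1052307420, 607105765):
    print(pick_index(visits, pick, 2147483647))
```

The point of the sketch is that identical `pick` values always land on the same index, so the repeated `pick 607105765` lines mean the RNG itself returned the same number, independent of the selection logic.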

@gyathaar

gyathaar commented May 22, 2018

ok.. fen with 8 moves:
8/1q6/8/8/8/8/2k5/K7 b - - 20 134 moves b7b5 a1a2 b5e8 a2a3 e8c6 a3a2 c6e8 a2a1
( this position https://lichess.org/nYsaV2ww#275 )

./lczero -w weights_324.txt.gz --fpu_reduction=0 --randomize -n -t1 -v800 -l issue644.txt

Here are 20 attempts in log file
issue644.txt

grep bestmove issue644.txt | sort | uniq -c | sort -n
1 bestmove c2d3
1 bestmove e8b8
1 bestmove e8d7
1 bestmove e8d8
1 bestmove e8e2
1 bestmove e8e3
1 bestmove e8e4
1 bestmove e8e7
1 bestmove e8h8
2 bestmove e8e5
3 bestmove e8a8
6 bestmove e8a4

It picked a checkmate move in 9 of those 20 attempts.
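
The same tally can be checked in a few lines of Python. This is a sketch over the counts listed above; the set of mating moves (e8a8 and e8a4, i.e. Qa8+ and Qa4+) is taken from the earlier analysis of this position.

```python
from collections import Counter

# bestmove tallies from the 20 runs above
tally = Counter({
    "c2d3": 1, "e8b8": 1, "e8d7": 1, "e8d8": 1, "e8e2": 1, "e8e3": 1,
    "e8e4": 1, "e8e7": 1, "e8h8": 1, "e8e5": 2, "e8a8": 3, "e8a4": 6,
})
MATES = {"e8a8", "e8a4"}  # Qa8+ and Qa4+ in UCI notation

runs = sum(tally.values())
mates = sum(n for move, n in tally.items() if move in MATES)
print(f"{mates}/{runs} = {mates / runs:.2f}")
```

For comparison, the noise-free visit counts posted earlier for this position put 324 of 660 visits on the two mates, so a rate somewhat under one half is about what sampling by visit count would predict.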

@killerducky
Collaborator

I think the repeated numbers out of the RNG must be due to the thread_local in the code below. Each search is created on its own thread. My computer must start reusing the same thread_id, and therefore re-seeding the RNG the same way every time? This is maybe not ideal, but it's not the cause of this issue.

Random& Random::GetRng(void) {
  // the rng is initialized on first GetRng call which is after the cli parsing.
  static thread_local Random rng{cfg_rng_seed};
  return rng;
}
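
The effect can be illustrated outside the engine. The following is a minimal Python sketch, not the lczero code: it assumes each search thread builds its own RNG from the same configured seed, which is what a per-thread static initialised from `cfg_rng_seed` would do if nothing else is mixed into the seed. `SEED` and `search` are hypothetical names.

```python
import random
import threading

SEED = 42  # hypothetical stand-in for cfg_rng_seed


def search(results, slot):
    # Mimics a thread_local RNG: constructed on first use inside this
    # thread, always from the same configured seed.
    rng = random.Random(SEED)
    results[slot] = rng.randrange(2**31 - 1)


results = [None] * 5
for slot in range(len(results)):
    worker = threading.Thread(target=search, args=(results, slot))
    worker.start()
    worker.join()

print(results)  # every entry is the same number
```

A single RNG shared across searches (or a per-thread seed that differs between threads) would avoid the repeated first draw.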

@killerducky
Collaborator

After deleting the thread_local from the RNG code above, I now see the moves are being picked randomly. It looks to be working as intended. I also ran with go nodes 10 to test the 0 visit case, I didn't see any 0 visit moves getting picked.

@mooskagh
Contributor Author

@killerducky I've checked training data of that move, and there are many moves with probabilities in range [0%; 1%), and I believe d7d8 is one of them.

So it looks like it's an issue in your tool (something like if (int(probability)) output;) rather than in training data.

@killerducky
Collaborator

decode_training.py had this code: if prob > 0.01:. 1/800 is less than 1%, so it cut those moves. After fixing this I see d7d8 had a 0.9% chance (probably 7/800). So there isn't a clear bug, just a question of whether this game had more randomness than we expected.

ply 117 move 59 (Not actually part of training data)
us = White won
rule50_count 1 b_ooo b_oo, w_ooo, w_oo 0 0 0 0
  abcdefgh
8 ..k..... .k...... .k...... ........ ...R.... ...R.... ...R.... ...R....
7 ...RK... ...RK... ...RK... .k.RK... .k..K... .k..K... .k...K.. .k...K..
6 p....... p....... p...Q... p...Q... p...Q... p...Q... p...Q... p...Q...
5 P....... P....... P....... P....... P....... P....... P....... P.......
4 .P..Q... .P..Q... .P..q... .P..q... .P..q... .P...... .P...... .P......
3 ........ ........ ........ ........ ........ .....q.. .....q.. .q......
2 ........ ........ ........ ........ ........ ........ ........ ........
1 ........ ........ ........ ........ ........ ........ ........ ........
   reps 0   reps 0   reps 0   reps 0   reps 0   reps 0   reps 0   reps 0
e4b7 30.3%
e4a8  8.8%
e4c6  4.6%
...
d7d6  1.0%
d7d4  0.9%
d7d8  0.9%
d7c7  0.8%
d7d5  0.6%
e7f6  0.5%
e7f7  0.5%
e4f4  0.1%
e4e5  0.1%
d7b7  0.1%
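
The threshold problem described above can be sketched as follows. This is an illustration, not the actual decode_training.py: the helper name, the visit counts and the unvisited move are assumptions chosen to mirror the percentages in the dump.

```python
def visited_moves(probs, threshold=0.0):
    """Keep the moves whose training probability exceeds `threshold`."""
    return {move: p for move, p in probs.items() if p > threshold}


# Hypothetical visit counts out of 800; "e7e8" stands for any unvisited move.
probs = {"e4b7": 242 / 800, "d7d8": 7 / 800, "d7b7": 1 / 800, "e7e8": 0.0}

print(sorted(visited_moves(probs, threshold=0.01)))  # old filter
print(sorted(visited_moves(probs)))                  # fixed filter
```

With the 1% cutoff, any move with 7 or fewer visits out of 800 disappears from the decoded output and looks like it had 0 visits, while a cutoff of 0 only drops moves that were never visited.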

@mooskagh
Contributor Author

Fyi I ran this 800 times:

$ cat /tmp/uci | ./lczero -w ~/my/lc0/nets/id324.gz  --randomize -t 1 2> /dev/null | grep bestmove

(without Dirichlet noise, but with temperature)
and the distribution of bestmove is very similar to the distribution of visits for that position.
So no obvious temperature bugs...

While an immediate checkmate has a win probability of 100%, being a queen up gives the other moves a win probability of ~99.9%, which is good enough to attract many visits, so one of those moves can get picked.

I expect a similar thing happens in QRK vs K games: it's fine to drop the queen if the rook is enough to win.

contents of /tmp/uci

position startpos moves e2e4 d7d6 d2d4 g8f6 b1c3 b8d7 g2g4 h7h6 h2h3 e7e5 g1e2 b7b6 f1g2 c8b7 e1g1 e5d4 e2d4 g7g6 f1e1 d8c8 f2f4 f6h7 e4e5 d6e5 f4e5 b7g2 e5e6 d7e5 e6f7 e8f7 d4e2 g2b7 e1f1 f7g8 e2f4 h7g5 c3d5 g8h7 f4g6 h7g6 c2c4 b7d5 c4d5 g5h3 g1g2 c8g4 d1g4 e5g4 g2h3 g4f6 c1f4 f8d6 h3h4 d6e7 a1e1 f6d5 h4g3 a8f8 f4e5 f8f1 e1f1 h8b8 f1e1 b8c8 e1d1 c7c6 g3h2 e7h4 d1a1 g6f5 e5h8 c8h8 a2a4 f5e5 h2h3 h4f6 a1f1 d5f4 h3g3 h8g8 g3f3 f6h4 f1h1 g8g3 f3f2 g3h3 f2g1 e5d4 h1h3 f4h3 g1h1 h3f4 b2b4 c6c5 h1h2 h4g5 h2g3 g5f6 b4c5 d4c5 g3f2 c5d4 f2f3 f4e6 f3f2 e6c7 a4a5 b6a5 f2f3 c7a8 f3g2 f6d8 g2f2 d4e5 f2e3 e5d5 e3d2 a5a4 d2d3 d8b6 d3d2 d5c4 d2e1 b6d4 e1d1 a7a5 d1c2 h6h5 c2b1 d4a1 b1a1 c4d4 a1a2 d4c3 a2a3 a8b6 a3a2 c3d4 a2b2 a4a3 b2b3 b6d7 b3a4 d7c5 a4b5 d4d5 b5a5 d5e4 a5b6 e4e5 b6a7 c5d3 a7b6 d3b4 b6c7 e5f4 c7d8 f4e5 d8c8 h5h4 c8d7 e5d5 d7d8 d5c5 d8c8 c5d4 c8d8 b4a2 d8e8 d4e4 e8d7 a2c3 d7e6 e4f4 e6f6 f4e4 f6e7 c3a2 e7e6 e4f3 e6d7 f3e4 d7c7 e4e5 c7d7 e5f6 d7d8 f6f5 d8e7 f5g6 e7d8 g6f7 d8c8 a2b4 c8b7 b4d3 b7c7 f7g6 c7d7 d3b2 d7c7 g6f6 c7d6 f6f5 d6e7 h4h3 e7f8 f5f6 f8g8 b2d3 g8h7 h3h2 h7h8 f6e6 h8g7 h2h1q g7g8 h1e1 g8h7 e1h4 h7g8 e6f6 g8f8 h4h6 f8e8 h6g7 e8d8 g7a7 d8c8 d3e5 c8d8 a7a8 d8c7 e5c6 c7b6 a8a7 b6b5 c6a5 b5b4 a5b3 b4c3 a7e7 c3b3 e7e6 b3a4 e6b6 a4a3 f6e5 a3a2 e5f4 a2a1 f4g5 a1a2 g5g4 a2a3 g4g3 a3a2 g3f3 a2a1 b6b7 a1a2 f3e3 a2a3 e3d2 a3a2 d2c2 a2a1 b7b5 a1a2
go nodes 800
quit

@so-much-meta

I did take a look at 10k training games (latest batch at the time - ~16 hours ago), and found that no moves were selected with 0 probability (i.e., 0 node counts)... One thing that is probably irrelevant, but just a little interesting, is that there seemed to be a lot of probabilities of 1/799 - not 1/800 as I'd expect for nodes visited 800 times. More rarely, there were some probabilities less than 1/800 -- I think the smallest was like 1/885, suggesting a node that was visited about 885 times... I plan on looking more closely at all of this this evening, and verifying that the distribution of selected moves matches what I'd expect based on the probabilities - but a quick check showed that it all looked reasonable.

@mooskagh
Contributor Author

mooskagh commented May 23, 2018 via email

@Tilps
Contributor

Tilps commented May 24, 2018

799 is definitely expected; it's actually the 800s that have been confusing me. I didn't manage to work out why those would happen: even with tree reuse enabled and a sequence of completely forced moves, it shouldn't have started accumulating beyond the limit unless there were multiple threads.

@so-much-meta

Yeah, I do see some outliers on either side (node visits)... I'm looking at games14280000.tar.gz

Total Positions: 1323099
799: 1320116
0..499: 0
500..798: 223
800+: 2760
850+: 134

The outliers do seem to happen in groups -- so most likely same user, same games, same engine... It looks like that would signal a possible engine bug, but maybe pretty minor in severity given the counts.

@so-much-meta

Here's an example of one of the groups of outliers. It has both high and low visit counts. It's one of only 4 groups like this in the 1.3M positions from 10k games in training data I looked at.

1st number is position index, 2nd number is calculated number of visits, then the minimum and maximum of the policy distribution.

590801 853
==> [max,min] probs: [0.37983587 0.00117233]
590802 716
==> [max,min] probs: [0.67877096 0.00139665]
590803 868
==> [max,min] probs: [0.531106   0.00115207]
590804 643
==> [max,min] probs: [0.81804043 0.00155521]
590805 618
==> [max,min] probs: [0.8721683  0.00161812]
590807 755
==> [max,min] probs: [0.69933772 0.0013245 ]
590808 1086
==> [max,min] probs: [0.96869242 0.00184162]
590809 551
==> [max,min] probs: [0.98911071 0.00181488]
590810 866
==> [max,min] probs: [0.42725173 0.00115473]
590811 591
==> [max,min] probs: [0.94585449 0.00169205]
590812 672
==> [max,min] probs: [0.75595236 0.0014881 ]
590814 718
==> [max,min] probs: [0.71587741 0.00139276]
590816 717
==> [max,min] probs: [0.70850766 0.0013947 ]
590818 741
==> [max,min] probs: [0.58164644 0.00134953]
590819 580
==> [max,min] probs: [0.92241377 0.00172414]
590820 649
==> [max,min] probs: [0.75654852 0.00154083]
590821 554
==> [max,min] probs: [0.98014438 0.00180505]
590825 596
==> [max,min] probs: [0.86073828 0.00167785]
590826 1072
==> [max,min] probs: [0.99813432 0.00186567]
590827 636
==> [max,min] probs: [0.78144652 0.00157233]
590830 606
==> [max,min] probs: [0.9092409  0.00165017]
590831 702
==> [max,min] probs: [0.6837607 0.0014245]
590832 576
==> [max,min] probs: [0.91666669 0.00173611]
590833 1078
==> [max,min] probs: [0.9888683  0.00185529]
590834 1092
==> [max,min] probs: [0.99267399 0.0018315 ]
590835 603
==> [max,min] probs: [0.85406303 0.00165837]
590840 710
==> [max,min] probs: [0.67605633 0.00140845]
590841 665
==> [max,min] probs: [0.71879697 0.00150376]
590842 658
==> [max,min] probs: [0.7401216  0.00151976]
590845 568
==> [max,min] probs: [0.95070422 0.00176056]
590847 911
==> [max,min] probs: [0.61580682 0.00109769]
590849 1060
==> [max,min] probs: [0.99622643 0.00188679]
590853 647
==> [max,min] probs: [0.98608965 0.0015456 ]
590854 608
==> [max,min] probs: [0.90296054 0.00164474]
590856 863
==> [max,min] probs: [0.2410197  0.00926999]
590858 856
==> [max,min] probs: [0.64602804 0.35397196]
590859 893
==> [max,min] probs: [0.54535276 0.00111982]

@so-much-meta

And here's the code I used to calculate node visits. About 70% of them can be found by just doing 1/min(policy)... But if that number turns out to be too small (suggesting multiple visits to the least visited node), it starts to get a bit more complicated. Don't know if this is right, but it's at least close.

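from itertools import chain

import numpy as np
import pandas as pd

# all_probs: one policy distribution per position (probabilities of the
# visited moves), loaded from the training data beforehand.
visits_list = []
eps = 0.00005  # max RMS distance from integers for a candidate visit count
one_visit_count = 0
for idx, dist in enumerate(all_probs):
    # If the least-visited move got exactly one visit, total = 1 / min(policy).
    visits = round(1 / min(dist))
    if visits >= 799:
        one_visit_count += 1
    else:
        # Otherwise search for a total that makes every prob * visits an integer.
        # Note: 1000-1100 is tried before 500-600, so a true count in 500-600
        # can be reported as twice its value.
        arr = np.array(dist)
        for visits in chain(range(799, 900),
                            range(700, 799),
                            range(900, 1000),
                            range(600, 700),
                            range(1000, 1100),
                            range(500, 600)):
            err = np.sqrt(np.mean((np.round(arr * visits) - (arr * visits)) ** 2))
            if err < eps:
                break
        else:
            raise Exception("oops")
    # Report outliers only.
    if visits < 799 or visits > 850:
        print(idx, visits)
        print("==> [max,min] probs: {}".format(np.array([max(dist), min(dist)])))
    visits_list.append(visits)
print("Total positions: {}".format(len(visits_list)))
print(pd.DataFrame(visits_list, columns=['visits']).groupby('visits').size())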

@so-much-meta

Ok, more data points... The game indexes with the outliers were as follows: 4479, 4513, 7156, and 8439... That should correspond to:
http://lczero.org/game/14284479
http://lczero.org/game/14284513
http://lczero.org/game/14287156
http://lczero.org/game/14288439

Although the last two look normal, the first two appear to have moves encoded in UCI, whereas everything else is SAN... I don't think this is right??

@mooskagh
Contributor Author

Those are games by http://lczero.org/user/ignore
I believe that is @jnewlin12345 testing lc0 client in #647.
It would be nice to have a separate server for testing, but it's not trivial to set up because there are many hardcoded things there (I gave up after ~1 hour), and we decided that for a small number of games it's fine to use the production server.

But it shows that there is an error in lc0 somewhere. :) While I can explain <800 visits with --smart-pruning (which will be off by default but currently is not), large visit values are harder to explain.

Values slightly above 800 can also be explained by the default batch size of 256 (which will also be changed).

@jnewlin12345

@mooskagh I did run a couple of games with --smart-pruning

@so-much-meta

so-much-meta commented May 24, 2018

Ah nice - makes sense then :). Also, glad to hear lc0 has this type of pruning feature - though I’m curious how that might be used during training without excessively flattening out policies.

And I think there is a mistake in the numbers above. Some of the ones above 1000 should actually be half of what I calculated. You can verify this simply by computing 1/min(policy). This keeps everything consistent with pruning and the 256 batch size, I think. The reason is that I searched for candidate visit counts in 1000-1100 before 500-600.
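
The halving can be checked directly on the rows posted earlier. This is a small sketch using only the reported visit counts above 1000 and their minimum policy values from the listing; the variable names are illustrative.

```python
# (reported visits, min policy) for the rows above 1000 in the listing
rows = [
    (1086, 0.00184162),
    (1072, 0.00186567),
    (1078, 0.00185529),
    (1092, 0.0018315),
    (1060, 0.00188679),
]

for reported, min_policy in rows:
    estimate = round(1 / min_policy)
    print(reported, estimate, reported == 2 * estimate)
```

Each 1/min(policy) estimate lands in the 500-600 range and is exactly half of the reported count, which fits the search-order explanation.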


6 participants