implement bayesian optimization for parameter tuning #793

Closed
wakamex opened this issue Mar 10, 2019 · 4 comments

@wakamex commented Mar 10, 2019

we already have difficulty tuning existing parameters (as in pr #755), several new ones were just introduced (pr #750), and at least one more is waiting in pr #791

deepmind and others have had success with bayesian optimization for efficient tuning of large numbers of parameters. deepmind optimised between 3 and 8 parameters at a time (deepmind paper)

applied to SF it approximated the major piece values (4 params) in less than 10 mins, ending up only 14 elo weaker (LOS 25%), see discussion

it seems this could easily be applied to lc0, including upcoming PRs. i'll try to get something working using the linked resources. it would be great if anyone else wants to try or share their results.
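
to make it concrete, here's a minimal sketch of the loop using scikit-optimize's gp_minimize (just one common BO library, not necessarily what we'd end up using). the parameter names, ranges, and the play_match() stub are hypothetical stand-ins -- a real objective would play a short match against a baseline and return the score fraction:

```python
# minimal bayesian-optimization sketch with scikit-optimize.
# parameter names/ranges and play_match() are purely illustrative.
import random

from skopt import gp_minimize
from skopt.space import Real

# search space: a couple of lc0-style parameters (hypothetical names/ranges)
space = [
    Real(0.5, 5.0, name="cpuct"),
    Real(-1.0, 0.0, name="fpu_value"),
]

def play_match(cpuct, fpu_value):
    # stand-in for a real match runner: fakes a noisy score with a made-up
    # optimum so the sketch runs end to end. a real objective would run a
    # short engine match with these settings and return the score in [0, 1].
    true_score = 0.5 - 0.01 * (cpuct - 2.5) ** 2 - 0.1 * (fpu_value + 0.7) ** 2
    return true_score + random.gauss(0.0, 0.02)

def objective(params):
    # gp_minimize minimizes, so negate the (noisy) match score
    return -play_match(*params)

# gaussian-process bayesian optimization over the noisy objective;
# `noise` tells the GP the variance of the observation noise.
result = gp_minimize(objective, space, n_calls=40, noise=0.02**2, random_state=0)
print("best params:", result.x, "best score:", -result.fun)
```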

it also seems possible to distribute trials through the client, centralise results on the server to determine tuned values, and send out further trials for evaluation if needed. though i know nothing about how the client/server works, and it seems you can get good results locally as well.

however, deepmind seems confident that tuning greatly improved training, not just match play, when applied between subsequent versions. from the paper (my emphasis):

3.2 Task 2: Tuning fast AlphaGo players for data generation
We generated training datasets for the policy and value networks by running self-play games with a very short search time, e.g., 0.25 seconds in contrast to the regular search time. The improvement of AlphaGo over various versions depended on the quality of these datasets. Therefore, it was crucial for the fast players for data generation to be as strong as possible. Under this special time setting, the optimal hyper-parameters values were very different, making manual tuning prohibitive without proper prior knowledge. Tuning the different versions of the fast players resulted in Elo gains of 300, 285, 145, and 129 for four key versions of these players.

initial discussion on leela-zero
further discussion on fishcooking

@jhorthos (Contributor)

this is a good idea and i will try to educate myself on what is involved in efficient tuning. we already know of one obvious thing, which is to reduce or eliminate temperature. so far i have used CLOP on some parameters one at a time, and i can say with some confidence that if we stick with absolute FPU, -0.7 is a better setting than -1.0.
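
for reference, here's a sketch of how a fixed-games a/b comparison like that could be automated. the lc0 option name "FpuValue", the engine paths, and the time control are assumptions on my part, not a tested setup:

```python
# hedged sketch: fixed-games a/b match via cutechess-cli. the lc0 binary
# path, the "FpuValue" uci option name and the time control are assumptions.
import re
import subprocess

def match_score(fpu_value, baseline=-1.0, games=200):
    """play `games` games of tuned-vs-baseline lc0 and return the tuned
    side's score fraction, parsed from cutechess-cli's 'Score of ...' line."""
    cmd = [
        "cutechess-cli",
        "-engine", "cmd=lc0", "name=tuned", f"option.FpuValue={fpu_value}",
        "-engine", "cmd=lc0", "name=base", f"option.FpuValue={baseline}",
        "-each", "proto=uci", "tc=10+0.1",
        "-games", str(games),
    ]
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    # cutechess prints a running "Score of tuned vs base: W - L - D [frac] N"
    # line after each game; take the final one.
    scores = re.findall(r"Score of tuned vs base:.*\[([0-9.]+)\]", out)
    return float(scores[-1]) if scores else 0.5

print(match_score(-0.7))  # e.g. compare -0.7 against the -1.0 baseline
```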

@Naphthalin (Contributor)

currently @kiudee is using his library, with a slightly more advanced technique than this, for parameter tuning, so we are already doing this -- maybe add some documentation on the process, or at least link to @kiudee's repo?

@mooskagh (Member) commented May 9, 2020

I think that's implemented by @kiudee.
Documenting it is probably out of scope for this issue, although it's a good idea.

mooskagh closed this as completed May 9, 2020