Attempt to do better at estimating playouts remaining, while requiring less time to do so. #582

Open
wants to merge 3 commits into
base: next
from


Tilps
Contributor

@Tilps Tilps commented May 11, 2018

Implementing the idea in #581. A key aspect here is that we shift the start time later, but we don't decrement the playouts by how many there are at that start time. This means we generally shift from an underestimate that converges upwards to an overestimate that mostly converges downwards, so inaccuracy from calculating the estimate early is less likely to cause us to prune early.
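A minimal sketch of the idea described above, not the code in this PR: the function name, its parameters, and the `start_offset_ms` value are all hypothetical. It counts every playout since the search began but measures time only from a later start point, so the computed rate (and therefore the remaining-playouts estimate) errs on the high side.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper; names and signature are illustrative assumptions.
// playouts:        all playouts completed since the search began
// elapsed_ms:      wall-clock time since the search began
// start_offset_ms: how far the measurement start is shifted later (assumed)
// remaining_ms:    time left for this move
std::int64_t EstimatePlayoutsRemaining(std::int64_t playouts,
                                       std::int64_t elapsed_ms,
                                       std::int64_t start_offset_ms,
                                       std::int64_t remaining_ms) {
  // Time is measured from the shifted start, but playouts are NOT reduced
  // by the number already done at that point.
  const std::int64_t measured_ms = elapsed_ms - start_offset_ms;

  // Too little data: treat the rate as unreliable and return the maximum,
  // mirroring the `elapsed_millis < 10 || playouts < 10` guard in the diff.
  if (measured_ms < 10 || playouts < 10) {
    return std::numeric_limits<std::int64_t>::max();
  }
  if (remaining_ms <= 0) {
    return 0;
  }

  // Dividing the full playout count by the shortened interval overestimates
  // the rate, so early errors lean away from premature pruning.
  const double rate_per_ms =
      static_cast<double>(playouts) / static_cast<double>(measured_ms);
  return static_cast<std::int64_t>(rate_per_ms *
                                   static_cast<double>(remaining_ms));
}
```

With 100 playouts after 150 ms, a 50 ms offset, and 1000 ms left, this sketch estimates 1000 remaining playouts, against 666 when the offset is zero.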

I don't actually know if this provides an Elo win yet, but it does seem to provide an improvement in ability to estimate playouts at short time scales. I'm running a self-play tournament on 1+1 to start.

I think the logic may not currently be very sound with a large thread count and slow evals, though. If each of your threads can only do 50 nps but you have 40+ of them (as at TCEC), 10 playouts/10 ms is basically not going to limit anything. I need to think more about how to scale those constants with thread count.

@Tilps
Contributor Author

Tilps commented May 11, 2018

Currently +100 Elo after 25 games with network 245 at 1+1. (Large error bars.)

// Until we reach 1 second or 100 playouts playout_rate
// is not reliable, so just return max.
} else if (elapsed_millis < 10 || playouts < 10) {
// Until we reach 10 millisecond and 10 playouts playout_rate
Contributor

you changed the comment from "or" to "and" but the logic remains ||, not &&

Contributor

@jjoshua2 jjoshua2 May 11, 2018

a or b == !(!a and !b)

Contributor Author

Yes, the existing comment was wrong.
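To spell out the exchange above, here is a small sketch (function names are hypothetical) showing why the `||` guard agrees with the new "and" wording: the estimate is treated as unreliable while either threshold is unmet, which is the same as saying it becomes usable only once both 10 ms and 10 playouts have been reached.

```cpp
#include <cstdint>

// Hypothetical names; the condition itself is taken from the diff.
// True while the playout rate should not be trusted yet.
bool EstimateUnreliable(std::int64_t elapsed_millis, std::int64_t playouts) {
  return elapsed_millis < 10 || playouts < 10;
}

// True once BOTH thresholds are met, matching the comment's "and" wording.
bool BothThresholdsReached(std::int64_t elapsed_millis,
                           std::int64_t playouts) {
  return elapsed_millis >= 10 && playouts >= 10;
}

// By De Morgan's law, (a || b) == !(!a && !b), so for all inputs
// EstimateUnreliable(t, p) == !BothThresholdsReached(t, p).
```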

@Tilps
Contributor Author

Tilps commented May 11, 2018

Final results after 150 games are not very convincing: only +15 Elo, with error bars that include no improvement.

@Tilps
Contributor Author

Tilps commented May 12, 2018

I suspect that this approach has made the estimate too accurate, and that it might need a reduction multiplier, like the one jjoshua2 has in his tuning PRs, to compensate for the unlikelihood that every visit goes to one of the trailing ones.

@jjoshua2
Contributor

@zz tried CLOP tuning a combination of my patch and this, and it came out with 100 for slow mover, 1.4 for time multiplier, and 1.0 for pruning factor. I was very happy with that because it makes a lot of sense.

@Tilps
Contributor Author

Tilps commented May 12, 2018

Sounds good. I think this PR will probably be good to go if I add a multiplier to the minimum playouts equal to the number of threads. (I assume no one will ever set the threads at ridiculously large levels compared to their actual hardware capacity.)
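One possible shape for that change, as a sketch only: the constant and function names below are hypothetical, and the base values of 10 playouts and 10 ms are taken from the guard quoted in the review thread. The minimum playout count is multiplied by the thread count, so many slow threads finishing their first evals at once no longer satisfy the threshold immediately.

```cpp
#include <algorithm>
#include <cstdint>

// Assumed base thresholds, taken from the guard in the diff.
constexpr std::int64_t kBaseMinPlayouts = 10;
constexpr std::int64_t kMinElapsedMillis = 10;

// Hypothetical helper: scale the playout threshold by thread count.
// Non-positive thread counts are clamped to 1 so the floor never drops
// below the single-thread value.
std::int64_t MinPlayoutsForEstimate(int threads) {
  return kBaseMinPlayouts * std::max(1, threads);
}

// Hypothetical helper: the rate is trusted only once both the time
// threshold and the thread-scaled playout threshold are reached.
bool RateIsReliable(std::int64_t elapsed_millis, std::int64_t playouts,
                    int threads) {
  return elapsed_millis >= kMinElapsedMillis &&
         playouts >= MinPlayoutsForEstimate(threads);
}
```

Under these assumptions, 40 threads raise the minimum to 400 playouts, so a first batch of 40 playouts arriving at 20 ms would still be treated as too little data.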


3 participants