Authors:

| Nathan Guzman | Christopher Nguyen |
|---|---|
| Stanford University | Stanford University |
This project aims to develop a reinforcement learning (RL) agent inspired by AlphaGo Zero, using self-play and Monte-Carlo Tree Search (MCTS) for the board game Catan. By extending MCTS with policy optimization, we guide the simulation (rollout) policy toward strong actions rather than uniformly random, often sub-optimal ones. The agent learns strategies through exploration and exploitation during self-play. We evaluate the policy-optimized MCTS against plain MCTS with random rollouts and a heuristic-based AI player in terms of win rate, point-giving buildings built, and average points per game. Results show limited performance improvement, largely due to insufficient training time.
This project explores applying reinforcement learning (RL) to the strategic board game Settlers of Catan. By implementing MCTS enhanced with Proximal Policy Optimization (PPO) during rollout, we aim to develop an AI agent capable of making sophisticated moves, surpassing the random action selection in standard MCTS. Inspired by AlphaGo Zero, we use self-play, exploration, and exploitation to refine the agent's strategies. We evaluate the policy-optimized MCTS against baseline MCTS and heuristic players, aiming to address strategic decision-making challenges in complex environments.
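To make the idea of a policy-guided rollout concrete, the sketch below replaces the uniform-random rollout of standard MCTS with actions sampled from a small policy network. The `PolicyNet` architecture, the `encode_state` feature extractor, and the game-state interface (`clone()`, `legal_actions()`, `step()`, `is_terminal()`, `reward()`) are all hypothetical placeholders, not the project's actual code, and the PPO update that would train the network is omitted.

```python
# Sketch: policy-guided rollout for MCTS (assumed interfaces, not the project's API).
import torch
import torch.nn as nn


class PolicyNet(nn.Module):
    """Small MLP mapping a state feature vector to action logits."""

    def __init__(self, n_features: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x):
        return self.net(x)


def guided_rollout(state, policy: PolicyNet, encode_state, player):
    """Play out a game from `state`, sampling actions from the policy
    instead of uniformly at random (the PPO training step is omitted)."""
    sim = state.clone()                      # assumed: copyable game state
    while not sim.is_terminal():             # assumed terminal check
        legal = sim.legal_actions()          # assumed list of legal action indices
        logits = policy(encode_state(sim))   # encode_state: state -> feature tensor
        mask = torch.full_like(logits, float("-inf"))
        mask[legal] = 0.0                    # mask out illegal actions before sampling
        probs = torch.softmax(logits + mask, dim=-1)
        action = torch.multinomial(probs, 1).item()
        sim.step(action)
    return sim.reward(player)                # assumed scalar outcome for `player`
```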
This project was completed as the final project for our CS234 Reinforcement Learning class at Stanford University, which we took in Spring 2024.
Catan is a strategic board game where players build settlements, cities, and roads, competing for resources. Players collect and trade resources like wood, brick, wheat, sheep, and ore, which are produced based on dice rolls corresponding to numbered tiles. The game involves multi-action turns based on resource availability, build locations, and proximity to victory. Our project aims to provide insights into developing RL agents for complex real-world scenarios.
Each game features a unique hex tile layout with associated resources and dice numbers. The goal is to accumulate 10 or more victory points through construction and special achievements, requiring careful resource management, strategic placement, and negotiation.
- Roads: Cost brick and wood, max 15 per player.
- Settlements: Cost brick, wood, wheat, and sheep, max 5 per player.
- Cities: Built on settlements, cost 3 ore and 2 wheat, max 4 per player.
Players can trade resources with the bank (at a default 4:1 ratio) or with other players. Ports improve the exchange ratio, typically to 3:1, or 2:1 for a specific resource.
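For concreteness, the snippet below encodes the standard building costs and the default bank trade ratio as plain dictionaries and checks whether a hand of resources can afford a build. The names and structures are illustrative only and do not reflect the project's actual data structures.

```python
from collections import Counter

# Standard building costs (resource -> quantity); illustrative encoding only.
BUILD_COSTS = {
    "road":       {"brick": 1, "wood": 1},
    "settlement": {"brick": 1, "wood": 1, "wheat": 1, "sheep": 1},
    "city":       {"ore": 3, "wheat": 2},
}
BANK_TRADE_RATIO = 4  # default 4:1 without a port; ports lower this to 3 or 2


def can_afford(hand: Counter, building: str) -> bool:
    """Return True if `hand` covers the cost of `building`."""
    return all(hand[res] >= qty for res, qty in BUILD_COSTS[building].items())


hand = Counter({"brick": 1, "wood": 2, "wheat": 1, "sheep": 1})
print(can_afford(hand, "settlement"))  # True
print(can_afford(hand, "city"))        # False
```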
We focus on the Victory Point (VP) card, granting 2 victory points when drawn. Other development cards were not utilized in this project.
A dice roll of 7 allows the current player to move the robber to a tile, blocking that tile's resource production and stealing a resource from a player with a building adjacent to it.
The player with the longest road receives 2 victory points; this bonus passes to any player who later builds a longer road.
Figures: road, city, and settlement pieces; resource and development cards.
AlphaGo Zero demonstrated the power of combining MCTS with deep learning, starting from random play without supervision. We adopt a similar approach, focusing on implementing a workable Catan game using self-play and MCTS.
Szita et al. (2010) successfully adapted MCTS for Catan, showing strong performance compared to heuristic-based implementations. We followed a similar approach, focusing on rule changes and action availability.
The core of our project is the implementation of Monte-Carlo Tree Search (MCTS) to develop an AI agent for Catan. Using a pre-implemented game environment, we added MCTS with Upper Confidence Bound (UCB) selection to guide decision-making during tree search.
MCTS involves four phases: selection, expansion, simulation, and backpropagation. These phases are iterated to determine the best actions:
- Selection: Traversing the tree to select a promising node.
- Expansion: Adding a new node to the tree.
- Simulation: Running a simulation from the new node to a terminal state.
- Backpropagation: Updating the nodes based on the simulation results.
Figure: phases of the Monte Carlo Tree Search algorithm.
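A minimal sketch of these four phases with UCB1 selection is shown below. It assumes a hypothetical game-state interface (`clone()`, `legal_actions()`, `step()`, `is_terminal()`, `reward(player)`); these names are placeholders and do not correspond to the project's actual environment.

```python
# Minimal MCTS sketch with UCB1 selection, assuming a generic game-state API.
import math
import random


class Node:
    def __init__(self, state, parent=None, action=None):
        self.state = state          # game state reached after taking `action`
        self.parent = parent
        self.action = action        # action taken from the parent node
        self.children = []
        self.untried = list(state.legal_actions())
        self.visits = 0
        self.value = 0.0            # sum of simulation rewards

    def ucb1(self, c=1.41):
        # UCB1: exploitation (mean value) + exploration (visit-count bonus)
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))


def mcts(root_state, player, n_iters=1000):
    root = Node(root_state.clone())
    for _ in range(n_iters):
        node = root
        # 1. Selection: descend via UCB1 until a node with untried actions
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ch.ucb1())
        # 2. Expansion: add a child for one unexplored action
        if node.untried:
            action = node.untried.pop(random.randrange(len(node.untried)))
            next_state = node.state.clone()
            next_state.step(action)
            child = Node(next_state, parent=node, action=action)
            node.children.append(child)
            node = child
        # 3. Simulation: random rollout to a terminal state
        sim = node.state.clone()
        while not sim.is_terminal():
            sim.step(random.choice(sim.legal_actions()))
        reward = sim.reward(player)
        # 4. Backpropagation: update statistics along the path to the root
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Final decision: the most-visited action at the root
    return max(root.children, key=lambda ch: ch.visits).action
```

In the policy-optimized variant, the random rollout in step 3 is replaced by the policy-guided rollout sketched earlier.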
Experiments involved two or three players, comparing MCTS and heuristic-based agents over multiple games.
Initial results showed the heuristic-based agent outperformed MCTS. Modifying the reward function improved MCTS performance.
Adjusted rewards increased the MCTS win rate and average points per game.
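The report does not spell out the exact reward modification, but the idea of aligning intermediate rewards with victory-point progress can be sketched as below. The weights and the state accessors are hypothetical placeholders, not the values or interfaces used in the project.

```python
# Hypothetical shaped reward for a Catan rollout: terminal win/loss signal
# plus small bonuses for victory-point progress. All weights are illustrative.
def shaped_reward(state, player) -> float:
    reward = 0.0
    if state.is_terminal():
        reward += 1.0 if state.winner() == player else -1.0  # assumed winner() accessor
    # Dense shaping terms: reward partial progress toward 10 victory points
    reward += 0.1 * state.victory_points(player)     # assumed per-player VP count
    reward += 0.02 * state.num_settlements(player)   # point-giving buildings
    reward += 0.04 * state.num_cities(player)
    return reward
```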
Testing with three players provided insights into multi-player dynamics.
Reward alignment significantly improved MCTS performance, highlighting the importance of well-designed reward functions.
Catan's complexity, stochasticity, and multi-agent environment posed significant challenges for MCTS implementation.
Reward alignment and variability in performance were key observations, emphasizing the need for further tuning and training.
Future work includes performance tuning, optimizing the reward function, enabling all possible actions, and improving the custom Catan gym environment.
We successfully implemented an MCTS agent for Catan. While performance was limited, the project demonstrated the potential of MCTS in complex multi-player games and provided valuable insights for future improvements.
- Baier, H., & Cowling, P. I. (2018). Evolutionary MCTS for multi-action adversarial games. IEEE Conference on Computational Intelligence and Games (CIG), 1-8.
- Silver, D., Schrittwieser, J., Simonyan, K., et al. (2017). Mastering the game of Go without human knowledge. Nature, 550(7676), 354-359.
- Swiechowski, M., Godlewski, K., Sawicki, B., & Mandziuk, J. (2021). Monte Carlo tree search: A review of recent modifications and applications. arXiv preprint arXiv:2103.04931.
- Szita, I., Chaslot, G., & Spronck, P. (2010). Monte-Carlo tree search in settlers of Catan. Advances in Computer Games, 21-32.
- Vombatkere, K. (2022). Catan-AI. Retrieved from https://github.com/kvombatkere/Catan-AI