This assignment is motivated by AlphaZero, an artificial intelligence program that achieves superhuman levels of play in chess, shogi, and Go. Our main goal is to use deep Q-learning, a reinforcement learning algorithm, to build a model for Coganh, a Vietnamese chess variant. No such model has been built for Coganh before, so we introduce our own way to train and evaluate it. Although some aspects still need improvement and further experiments, we conclude that, by our own measurements, the resulting model is acceptable at the moment.
*Figure: benchmark of the two experimented methods.*
```
git clone https://github.com/edwardly1002/deep-Qlearning-coganh-vietnamese
cd deep-Qlearning-coganh-vietnamese
pip install tensorflow
```
You may want to install `tensorflow-gpu` to accelerate training. Do not forget to uninstall `tensorflow` and `tensorflow-cpu` first if you have already installed either of them.
```
pip install tensorflow-gpu
```
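After installing, you can quickly check whether TensorFlow actually sees the GPU (this call exists in TensorFlow 2.x; older versions use a different API):

```python
import tensorflow as tf

# An empty list means TensorFlow will fall back to the CPU.
print(tf.config.list_physical_devices("GPU"))
```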
A checkpoint of our trained model is available on Google Drive. Download it and put it in the folder named `cp`.
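If you want to inspect the checkpoint directly, something like the following should work, assuming it was saved as a standard Keras model (the filename below is a placeholder, not the checkpoint's actual name; use the file you downloaded):

```python
import tensorflow as tf

# "cp/model.h5" is a placeholder path; adjust it to the downloaded file.
model = tf.keras.models.load_model("cp/model.h5")
model.summary()
```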
You can play against the AI player by executing:

```
python vshuman.py
```
Then just follow the on-screen instructions. Remember that your pieces are labeled 1, and this cannot be changed; to play the other side, you need to refactor the script.
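The exact representation in the repo may differ, but a common encoding for two-player board games stores one side as +1 and the other as -1, in which case swapping sides amounts to negating the board. A purely illustrative sketch:

```python
import numpy as np

# Hypothetical 5x5 Coganh position: 1 = your pieces, -1 = AI, 0 = empty.
board = np.zeros((5, 5), dtype=int)
board[0, :] = -1   # illustrative placement only
board[4, :] = 1

flipped = -board   # the same position viewed from the other side
```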
You can watch games between two AI players (`AIvsAI.py`), Minimax vs. Minimax (`MvsM.py`), Minimax vs. a random player (`Mvsrandom.py`), or the AI vs. a random player (`vsrandom.py`). For example:

```
python AIvsAI.py
```
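All of these driver scripts follow the same basic shape: two agents alternate moves until the game ends. The sketch below is hypothetical; `env` and the `select_move` interface are illustrative names, not the repo's actual classes:

```python
def play_game(agent_a, agent_b, env):
    """Alternate moves between two agents; return the winner (+1, -1, or 0)."""
    board, player = env.reset(), 1
    while not env.is_over(board):
        agent = agent_a if player == 1 else agent_b
        move = agent.select_move(board, player)  # hypothetical interface
        board = env.apply(board, move)
        player = -player                         # switch sides
    return env.winner(board)
```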
Train the model by running `train_zero.py`:

```
python train_zero.py
```
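For orientation, a single deep Q-learning update looks roughly like the sketch below. The network architecture, action-space size, and discount factor here are assumptions for illustration, not the repo's actual code:

```python
import tensorflow as tf

GAMMA = 0.99        # discount factor; assumed, the repo's value may differ
BOARD_CELLS = 25    # flattened 5x5 board
N_ACTIONS = 200     # illustrative action-space size, not the repo's encoding

# A small fully connected Q-network; the actual architecture may differ.
q_net = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(BOARD_CELLS,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(N_ACTIONS),            # one Q-value per action
])
optimizer = tf.keras.optimizers.Adam(1e-3)

@tf.function
def dqn_update(states, actions, rewards, next_states, dones):
    """One deep Q-learning step on a batch of transitions."""
    # Bootstrapped target: r + gamma * max_a' Q(s', a');
    # future value is zeroed out for terminal states.
    next_q = tf.reduce_max(q_net(next_states), axis=1)
    targets = rewards + GAMMA * next_q * (1.0 - dones)
    with tf.GradientTape() as tape:
        q_all = q_net(states)
        # Pick out the Q-value of the action actually taken.
        q_taken = tf.reduce_sum(q_all * tf.one_hot(actions, N_ACTIONS), axis=1)
        loss = tf.reduce_mean(tf.square(targets - q_taken))
    grads = tape.gradient(loss, q_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_net.trainable_variables))
    return loss
```

In a full training loop this update typically sits inside an epsilon-greedy self-play loop with an experience-replay buffer, and a periodically synced target network is often added to stabilize the bootstrapped targets.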
We have created the training utilities listed below.
- Evaluate the model over a large number of games (100) against a random player using `vsrandom.py`.
- Evaluate the model in games against Minimax (depth 1 to 4) using `evaluate.py`.
- Since multiple checkpoints are saved during training, you may want to plot the model's strength over time. Use `record_vs_random.py` to compute the WRRG and WDRRG of each checkpoint (a sketch of this computation follows this list), then plot them with `plot_WRRD.py`. The resulting plots are saved in the folder `docs/images`.
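As a rough sketch of the kind of statistic `record_vs_random.py` records, win and win-plus-draw rates over a batch of games can be tallied as below, reusing the hypothetical `play_game` sketch above. The expansion of WRRG/WDRRG as win rate and win+draw rate is our assumption, not a definition from the repo:

```python
def rates_vs_random(agent, random_agent, env, n_games=100):
    """Assumed meanings: WRRG = win rate, WDRRG = win+draw rate vs. random."""
    wins = draws = 0
    for _ in range(n_games):
        result = play_game(agent, random_agent, env)  # +1 win, 0 draw, -1 loss
        wins += result == 1
        draws += result == 0
    return wins / n_games, (wins + draws) / n_games
```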
We have written a report, located in `docs`, that explains exactly what we are trying to do.
- The OOP architecture of the implementation is inspired by CodeLearn.
- The Minimax algorithm implemented in `src/Minimax.py` is borrowed from a GitHub page.