
# Quickstart

## Installation

This repo has been tested with TensorFlow versions 2.9-2.15. If you want to use the newest TensorFlow version (which handles all the CUDA setup for you), run `pip install -r requirements.txt`. Otherwise, you'll need to install `tensorflow-addons`, which requires `tensorflow<=2.12`; to use an older version, run `pip install -r requirements_tf_2_9.txt`. Note that only TensorFlow versions <2.11 support Windows natively. Finally, run `./init.sh` to compile the protobuf files.

## Data preparation

In order to start a training session, you first need to download training data from https://storage.lczero.org/files/training_data/. Several chunks/games are packed into each tar file, and each tar file contains an hour's worth of chunks. Preparing the data requires the following steps:

```bash
wget https://storage.lczero.org/files/training_data/training-run1--20200711-2017.tar
tar -xzf training-run1--20200711-2017.tar
```
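
If you want to script the download of several tar files, a minimal Python sketch using only the standard library (and assuming the URL pattern above) might look like:

```python
import tarfile
import urllib.request

# One hour of chunks, following the URL pattern above; adjust to the tars you need.
url = "https://storage.lczero.org/files/training_data/training-run1--20200711-2017.tar"
filename = url.rsplit("/", 1)[-1]

urllib.request.urlretrieve(url, filename)   # download the tar
with tarfile.open(filename, "r:*") as tar:  # "r:*" autodetects compression
    tar.extractall()                        # unpack the chunks
```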

## Training pipeline

Now that the data is in the right format, one can configure a training pipeline. This configuration is achieved through a yaml file; see `training/tf/configs/example.yaml`.

The configuration is pretty self-explanatory; if you're new to training, I suggest looking at the machine learning glossary by Google. Now you can invoke training with the following command:

```bash
./train.py --cfg configs/example.yaml --output /tmp/mymodel.txt
```

This will initialize the pipeline and start training a new neural network. You can view progress by invoking tensorboard:

```bash
tensorboard --logdir leelalogs
```

If you now point your browser at localhost:6006, you'll see the training progress as the training steps pass by. Have fun!

## Restoring models

The training pipeline will automatically restore from a previous model if one exists in your `training:path` as configured by your yaml config. For initializing from a raw `weights.txt` file, you can use `training/tf/net_to_model.py`; this will create a checkpoint for you.

## Supervised training

Generating training data from PGN files is currently broken and has low priority; feel free to create a PR.

# What's new here?

I've added a few features and modules to the training pipeline. Please direct any questions to the Leela Discord server.

## Architectural improvements

We've replaced the smolgen layer used by the second, third, and fourth models in the BT series with learned relative position encodings.
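
To illustrate the idea (this is not the repo's implementation), one common form of learned relative position encoding is a per-head bias table indexed by the file and rank offsets between squares, added to the attention logits. A minimal numpy sketch, with the head count and initialization chosen arbitrarily:

```python
import numpy as np

NUM_SQUARES = 64  # 8x8 board
NUM_HEADS = 8     # assumed head count, not the repo's actual value

# One learned scalar per head for each (file offset, rank offset) pair.
# Offsets range over -7..7, hence the 15x15 table.
rel_bias = (0.02 * np.random.randn(NUM_HEADS, 15, 15)).astype(np.float32)

def relative_bias_matrix(rel_bias):
    """Expand the (heads, 15, 15) table to a (heads, 64, 64) additive bias."""
    files = np.arange(NUM_SQUARES) % 8
    ranks = np.arange(NUM_SQUARES) // 8
    df = files[:, None] - files[None, :] + 7  # file offsets shifted to 0..14
    dr = ranks[:, None] - ranks[None, :] + 7  # rank offsets shifted to 0..14
    return rel_bias[:, df, dr]

def attention_logits(q, k, rel_bias):
    """q, k: (heads, 64, depth). Scaled dot-product logits plus the learned bias."""
    depth = q.shape[-1]
    logits = q @ np.swapaxes(k, -1, -2) / np.sqrt(depth)
    return logits + relative_bias_matrix(rel_bias)
```

Because the bias depends only on the offset between two squares, the same table is shared across all square pairs, which keeps the parameter count small compared to learning a full 64x64 bias per head.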

## Quality of life

There are three quality-of-life improvements: a progress bar, new metrics, and pure attention code.

Progress bar: A simple progress bar implemented with the Python `rich` module displays the current step (including part-steps if the batches are split) and the expected time to completion.
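
For reference, a bar of this kind can be built with `rich.progress` along roughly these lines (the step count below is a placeholder, not a value from the pipeline):

```python
from rich.progress import Progress

total_steps = 1000  # placeholder; the pipeline gets this from the yaml config

with Progress() as progress:
    task = progress.add_task("Training", total=total_steps)
    for step in range(total_steps):
        # ... run one training step (or part-step) here ...
        progress.update(task, advance=1)
```

By default `Progress` shows a completion percentage and an estimated time remaining, which matches the behavior described above.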

Pure attention: The pipeline no longer contains any code from the original ResNet architecture. This makes for clearer yamls and code. The protobuf has been updated to support smolgen, input gating, and the squared ReLU activation function.
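
The squared ReLU mentioned here is simply a ReLU followed by squaring; for reference:

```python
import numpy as np

def squared_relu(x):
    """Elementwise max(x, 0)^2."""
    return np.square(np.maximum(x, 0.0))
```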

### More metrics

I've added train value accuracy and train policy accuracy for the sake of completeness and to help detect overfitting; the speed difference is negligible. There are also three new loss metrics to evaluate policy. The cross entropy we are currently using is probably still the best for training, though we could try turning the task into a classification problem, effectively using one-hot vectors at the targets' best moves; this would, however, run the risk of overfitting.
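
For concreteness, top-1 policy accuracy can be computed as below (a sketch; the array names and shapes are assumptions, with `policy` and `target` holding per-move probabilities):

```python
import numpy as np

def policy_accuracy(policy, target):
    """Fraction of positions where the net's top move matches the target's top move.

    policy, target: (batch, num_moves) probability arrays.
    """
    return float(np.mean(np.argmax(policy, axis=-1) == np.argmax(target, axis=-1)))
```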

Thresholded policy accuracies: the thresholded policy accuracy @x% is the percentage of positions in which the net assigns a policy of at least x% to the move the target thinks is best.
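
In code, under the same assumed shapes as the sketch above:

```python
import numpy as np

def thresholded_policy_accuracy(policy, target, x=0.10):
    """Percent of positions with policy >= x at the target's best move."""
    best = np.argmax(target, axis=-1)
    return 100.0 * float(np.mean(policy[np.arange(len(policy)), best] >= x))
```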

Reducible policy loss is the amount of policy loss we can reduce, i.e., the policy loss minus the entropy of the policy target.
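
Written out for a target distribution $t$ and network policy $p$, this is exactly the KL divergence $D_{\mathrm{KL}}(t \,\|\, p)$:

$$
L_{\text{reducible}} \;=\; \underbrace{-\sum_m t_m \log p_m}_{\text{policy loss}} \;-\; \underbrace{\left(-\sum_m t_m \log t_m\right)}_{\text{target entropy}} \;=\; \sum_m t_m \log \frac{t_m}{p_m},
$$

which is zero precisely when the network reproduces the target exactly.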

The search policy loss is designed to loosely describe how long it would take to find the best move in the average position. It is implemented as the average of the multiplicative inverses of the network's policies at the targets' top moves, i.e., one over the harmonic mean of those values. This is not especially accurate, since the search algorithm will often give up on moves the network does not like unless they provide returns the network can immediately recognize.
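
Concretely, under the same assumed shapes as the earlier sketches (softmax policies, so every entry is strictly positive):

```python
import numpy as np

def search_policy_loss(policy, target):
    """Mean of 1 / p(target's best move), i.e. the reciprocal of their harmonic mean."""
    best = np.argmax(target, axis=-1)
    p_best = policy[np.arange(len(policy)), best]
    return float(np.mean(1.0 / p_best))
```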