Note: this is an old repository and the code does not work properly. Sequences of (s, a, r) should be batched properly, with a timestep dimension.
A2C with ConvLSTM agent playing Starcraft 2 (DeepMind's FullyConv LSTM)
Synchronous Advantage Actor-Critic (A2C, the synchronous variant of A3C) with a Convolutional LSTM, playing Starcraft 2 through DeepMind's pysc2 API.
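For readers unfamiliar with A2C, the following is a minimal NumPy sketch of how discounted returns and advantages are commonly computed from a synchronous rollout. It is not taken from this repository: the function names, the time-major `[T, N]` layout, and the default `gamma` are assumptions made for illustration.

```python
import numpy as np


def n_step_returns(rewards, dones, bootstrap_value, gamma=0.99):
    """Discounted returns for a time-major rollout (illustrative helper).

    rewards, dones: arrays of shape [T, N] (T timesteps, N parallel envs).
    bootstrap_value: critic estimate of shape [N] for the state after step T-1.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    returns = np.zeros_like(rewards)
    running = np.asarray(bootstrap_value, dtype=np.float64)
    # Walk backwards in time; a done flag cuts the bootstrap at episode ends.
    for t in reversed(range(rewards.shape[0])):
        running = rewards[t] + gamma * running * (1.0 - dones[t])
        returns[t] = running
    return returns


def advantages(returns, values):
    """Advantage estimate: return minus the critic's value prediction."""
    return np.asarray(returns) - np.asarray(values)
```

The advantage weights the policy-gradient term, while the returns serve as regression targets for the value head. All environments step in lockstep, which is what makes the method synchronous.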
The code is based on pekaalto's FullyConv Net, with some modifications to the original version and a ConvLSTM added after the state concatenation. Please note that PPO is not active here and that the code is for experimentation purposes.
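Because a ConvLSTM consumes sequences, rollout data needs an explicit time axis. The sketch below shows one possible way to stack per-step (s, a, r) tuples from N parallel environments into arrays with a timestep dimension. The helper names and the array shapes are hypothetical and do not reflect this repository's actual code.

```python
import numpy as np


def stack_rollout(steps):
    """Stack a list of per-timestep tuples into time-major arrays.

    steps: list of length T; each item is (states, actions, rewards) with
    states of shape [N, H, W, C], actions of shape [N], rewards of shape [N].
    Returns arrays shaped [T, N, H, W, C], [T, N] and [T, N].
    """
    states = np.stack([s for s, _, _ in steps], axis=0)
    actions = np.stack([a for _, a, _ in steps], axis=0)
    rewards = np.stack([r for _, _, r in steps], axis=0)
    return states, actions, rewards


def to_batch_major(x):
    """Swap [T, N, ...] to [N, T, ...], the layout many recurrent layers expect."""
    return np.swapaxes(x, 0, 1)
```

Keeping the time axis separate from the environment axis avoids mixing steps from different environments when the recurrent state is unrolled.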
- Python 3
- pysc2
- TensorFlow