In this work, we implemented World Models in order to train reinforcement learning agents in the simulated Neurosmash environment. We used background subtraction to enhance the performance of the variational autoencoder component, which allowed the recurrent world model to learn to predict plausible future states. We used a Deep Q-Network (DQN) as the controller component of the World Models setup. Both the standard DQN and the DQN with access to the world model achieved average win rates of over 70%. However, several baselines that had no access to any game-state information performed similarly. These findings indicate that the environment, in its current form, is not very complex.
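As a concrete illustration of the preprocessing idea, a minimal background-subtraction sketch (in NumPy, with hypothetical function names, not our exact implementation) could look like this:

```python
import numpy as np

def estimate_background(frames):
    """Estimate a static background as the per-pixel median over a stack of frames.

    frames: array of shape (N, H, W, C) with values in [0, 1].
    """
    return np.median(frames, axis=0)

def subtract_background(frame, background, threshold=0.1):
    """Keep only the pixels that differ noticeably from the background."""
    diff = np.abs(frame - background).max(axis=-1)   # per-pixel difference across channels
    mask = (diff > threshold).astype(frame.dtype)    # 1 = foreground, 0 = background
    return frame * mask[..., None], mask
```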
This is the repository for the final project of the (2019) "Neural Information Processing Systems" course at Radboud University. The remainder of our paper can be found here.
To reproduce our experiments, the following steps have to be taken. Note that steps 1 and 4 require a running Neurosmash environment, and that its settings (such as the port number) must match those used in the code.
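For reference, a minimal sketch of connecting to the environment is given below, assuming the course-provided Neurosmash.py wrapper; the constructor arguments, return values, and action encoding are assumptions here and should be checked against the wrapper you are using. The random-policy rollout at the end roughly corresponds to what get_data does in step 1.

```python
import numpy as np
import Neurosmash  # course-provided wrapper (interface assumed, check Neurosmash.py)

# These values must match the settings of the running Neurosmash environment.
ip        = "127.0.0.1"  # address of the machine running the environment
port      = 13000        # port number configured in the environment
size      = 96           # rendered frame size (size x size pixels)
timescale = 10           # simulation speed-up factor

env = Neurosmash.Environment(ip, port, size, timescale)

# A single random-policy episode, e.g. to collect frames for the VAE training set.
end, reward, state = env.reset()
frames = []
while not end:
    action = np.random.randint(3)  # assumed encoding: 0 = nothing, 1 = left, 2 = right
    end, reward, state = env.step(action)
    frames.append(np.array(state, dtype=np.uint8).reshape(size, size, 3))
```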
1. Generate a training set for the VAE by running get_data (the environment sketch above gives a rough idea of what this involves).
2. Run VAE_train to obtain two models: a VAE trained without a weighted loss function and a VAE trained with a weighted loss function (a sketch of such a weighted loss is given below this list).
3. Run rnn_VAE.py with the correct VAE model parameters specified at the top of the file (see the latent-RNN sketch below this list).
4. The final step is training our controller (i.e., the DQN) by running pipeline_DQN. For this we performed four experiments (the role of the hyperparameter flags is sketched below this list):
   - Vanilla DQN: set USE_WM=False, ZERO_INPUT=False
   - World Models: set USE_WM=True, USE_RNN=True, ZERO_INPUT=False
   - Zero-input model without RNN: set USE_WM=True, USE_RNN=False, ZERO_INPUT=True
   - Zero-input model with RNN: set USE_WM=True, USE_RNN=True, ZERO_INPUT=True
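For step 2, the weighted loss upweights the pixels that background subtraction marks as foreground, so reconstruction errors on the agents cost more than errors on the arena. A minimal PyTorch-style sketch of such a loss (tensor names and weighting scheme are illustrative, not copied from VAE_train):

```python
import torch

def weighted_vae_loss(recon, target, mask, mu, logvar, fg_weight=10.0, beta=1.0):
    """VAE loss with per-pixel weights: foreground pixels (mask == 1) count
    fg_weight times as much in the reconstruction term as background pixels.

    recon, target: (B, C, H, W) reconstructions and inputs in [0, 1]
    mask:          (B, 1, H, W) foreground mask from background subtraction
    mu, logvar:    (B, latent_dim) parameters of the approximate posterior
    """
    weights = 1.0 + (fg_weight - 1.0) * mask  # background -> 1, foreground -> fg_weight
    recon_loss = (weights * (recon - target) ** 2).sum(dim=(1, 2, 3)).mean()
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()
    return recon_loss + beta * kl
```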
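For step 3, the recurrent world model is trained on VAE latents. The sketch below shows the general idea of predicting the next latent from the current latent and action with an LSTM; this is a simplified, deterministic variant for illustration only, and the architecture in rnn_VAE.py may differ (the original World Models paper uses an MDN-RNN).

```python
import torch
import torch.nn as nn

class LatentRNN(nn.Module):
    """Predicts the next VAE latent z_{t+1} from (z_t, a_t) with an LSTM."""

    def __init__(self, latent_dim=32, action_dim=3, hidden_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(latent_dim + action_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, latent_dim)

    def forward(self, z, a, hidden=None):
        # z: (B, T, latent_dim), a: (B, T, action_dim) one-hot actions
        out, hidden = self.lstm(torch.cat([z, a], dim=-1), hidden)
        return self.head(out), hidden

# One training step: minimise the error between predicted and actual next latents.
model = LatentRNN()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
z, a = torch.randn(8, 50, 32), torch.zeros(8, 50, 3)  # dummy latent/action sequences
pred, _ = model(z[:, :-1], a[:, :-1])                 # predict z_{t+1} from (z_t, a_t)
loss = nn.functional.mse_loss(pred, z[:, 1:])
optim.zero_grad()
loss.backward()
optim.step()
```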
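Conceptually, the hyperparameter flags listed under step 4 determine what the DQN receives as input. The sketch below illustrates one way they could interact (variable names are hypothetical; this is not a copy of pipeline_DQN):

```python
import numpy as np

def build_controller_input(frame_z, rnn_hidden, USE_WM, USE_RNN, ZERO_INPUT):
    """Assemble the DQN input for the four experiment configurations.

    frame_z:    VAE latent code of the current frame
    rnn_hidden: hidden state of the recurrent world model
    """
    if not USE_WM:
        return frame_z                  # vanilla DQN: observation only (exact form depends on pipeline_DQN)
    features = [frame_z]
    if USE_RNN:
        features.append(rnn_hidden)     # add the world model's memory
    x = np.concatenate(features)
    if ZERO_INPUT:
        x = np.zeros_like(x)            # baseline: no game-state information at all
    return x
```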