Decision Transformer for offline single-agent autonomous highway driving.
.
├── README.md
├── modules # all model modules contained here
│ ├── __init__.py
├── pipelines # all training, testing, preprocessing, and data gathering pipelines contained here
│ ├── __init__.py
├── expert_scripts # all expert data collection contained here
├── example-notebooks # scratch/example Jupyter notebooks
├── experiments # all training, testing, and demo experiment files for various models
We used the open source highway-env Python framework to simulate a highway environment. highway-env is built on top of the OpenAI Gym toolkit, which is widely used in the field of reinforcement learning. highway-env provides a customizable and modular framework for designing experiments related to highway traffic, covering vehicle dynamics, traffic flow, and behaviour modelling, and it includes metrics and evaluation tools for assessing agents in the simulated environment. In our experiments we focus on a highway-env instance consisting of 3 lanes and 20 other simulated vehicles. Our goal is to train an agent that drives safely in this scenario while maximizing the default highway-env reward:

$$R(s, a) = a \cdot \frac{v - v_{\min}}{v_{\max} - v_{\min}} - b \cdot \text{collision}$$

where $v$, $v_{\min}$, $v_{\max}$ are the current, minimum, and maximum speed of the ego vehicle, and $a$, $b$ are coefficients weighting the speed term and the collision penalty.
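The snippet below is a minimal sketch of how such an instance can be created and configured. The config keys follow highway-env's documented options, the exact values are illustrative, and the random-action rollout is only a placeholder (our experiments use trained agents); it also assumes the classic OpenAI Gym reset/step API that older highway-env versions expose.

```python
import gym
import highway_env  # noqa: F401 -- registers "highway-v0" and related environments

# Illustrative configuration: a 3-lane highway with 20 other vehicles,
# scored with highway-env's default reward.
env = gym.make("highway-v0")
env.configure({
    "lanes_count": 3,
    "vehicles_count": 20,
})
obs = env.reset()

done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()  # placeholder policy
    obs, reward, done, info = env.step(action)
    total_reward += reward
print(f"episode return: {total_reward:.2f}")
```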
Three online RL methods were used to collect expert data: Proximal Policy Optimization (PPO), Deep Q-Network (DQN), and Monte Carlo Tree Search (MCTS). The scripts used to collect the data can be found in /expert_scripts.
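As a rough illustration of what such a collection script does (the real implementations live in /expert_scripts), the sketch below rolls out a trained policy and stores each episode as (observation, action, reward) sequences. The stable-baselines3 DQN model, checkpoint path, and episode count are purely hypothetical.

```python
import pickle

import gym
import highway_env  # noqa: F401
from stable_baselines3 import DQN

env = gym.make("highway-v0")
model = DQN.load("dqn_highway_expert")  # hypothetical checkpoint

trajectories = []
for _ in range(100):  # number of episodes is illustrative
    obs = env.reset()
    done = False
    episode = {"observations": [], "actions": [], "rewards": []}
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        next_obs, reward, done, info = env.step(action)
        episode["observations"].append(obs)
        episode["actions"].append(action)
        episode["rewards"].append(reward)
        obs = next_obs
    trajectories.append(episode)

with open("expert_data.pkl", "wb") as f:
    pickle.dump(trajectories, f)
```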
Below is a demonstration of the performance of the various experts. PPO and DQN are popular, state-of-the-art online RL methods, while MCTS explicitly searches the game tree from the current state at each step and therefore tends to find near-optimal, high-reward moves.
This is one of the two benchmark models used in the original DT paper. Following an imitation-learning approach, we trained an agent to mimic the behaviour of the expert whose data it is trained on.
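A minimal sketch of this baseline, assuming the expert data has already been flattened into (observation, action) pairs; the network width, action count, and training loop below are illustrative rather than the repository's actual settings.

```python
import torch
import torch.nn as nn

# Hypothetical expert data: flattened 5x5 kinematics observations and
# discrete meta-action labels (highway-env exposes 5 such actions).
obs = torch.randn(10_000, 25)
actions = torch.randint(0, 5, (10_000,))

policy = nn.Sequential(
    nn.Linear(25, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 5),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):  # illustrative
    logits = policy(obs)
    loss = loss_fn(logits, actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```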
This is the other benchmark model from the original DT paper: a state-of-the-art offline RL method that uses a temporal-difference learning approach.
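For context, temporal-difference methods fit a value function to a bootstrapped target instead of conditioning on returns the way DT does; in the discrete-action setting the one-step target and loss take the familiar form (shown for a generic Q-network with target parameters $\theta^-$):

$$y_t = r_t + \gamma \max_{a'} Q_{\theta^-}(s_{t+1}, a'), \qquad \mathcal{L}(\theta) = \mathbb{E}\left[\left(Q_\theta(s_t, a_t) - y_t\right)^2\right]$$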
Several experiments with various configurations of training datasets and parameters have been conducted; they can be found in /experiments/.
The DT model is based on GPT-2, defined in /modules/trajectory_gpt2.py.
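A hedged sketch of the core mechanism: returns-to-go, states, and actions are embedded, interleaved into one token sequence, and passed through a causal GPT-2 backbone that predicts the next action at each state position. The snippet uses Hugging Face's GPT2Model and made-up dimensions purely for illustration; it is not the repository's exact interface.

```python
import torch
import torch.nn as nn
from transformers import GPT2Config, GPT2Model

embed_dim, context_len, state_dim, act_dim = 128, 20, 25, 5  # illustrative sizes

backbone = GPT2Model(GPT2Config(
    n_embd=embed_dim, n_layer=3, n_head=4,
    vocab_size=1, n_positions=3 * context_len,
))

embed_rtg = nn.Linear(1, embed_dim)            # return-to-go token
embed_state = nn.Linear(state_dim, embed_dim)  # state token
embed_action = nn.Embedding(act_dim, embed_dim)
embed_timestep = nn.Embedding(1000, embed_dim)
predict_action = nn.Linear(embed_dim, act_dim)

def forward(rtg, states, actions, timesteps):
    # rtg: (B, T, 1), states: (B, T, state_dim), actions/timesteps: (B, T)
    B, T = actions.shape
    time_emb = embed_timestep(timesteps)
    # Interleave tokens as (R_1, s_1, a_1, R_2, s_2, a_2, ...).
    tokens = torch.stack(
        [embed_rtg(rtg) + time_emb,
         embed_state(states) + time_emb,
         embed_action(actions) + time_emb],
        dim=2,
    ).reshape(B, 3 * T, embed_dim)
    hidden = backbone(inputs_embeds=tokens).last_hidden_state
    return predict_action(hidden[:, 1::3])  # action logits read at the state positions
```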
We used two types of observation embeddings; a configuration sketch for each follows the list below.
- Kinematics: We used a 5x5 array of the positions and velocities of the 5 nearest vehicles.
- Image-Based: We used the 4 most recent grayscale bird's-eye-view images.
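For reference, the two observation types roughly correspond to the following highway-env observation configurations; the keys follow highway-env's documented options, but the exact shapes and values below are illustrative.

```python
# Kinematics: a 5x5 array (5 nearest vehicles x 5 features).
kinematics_config = {
    "observation": {
        "type": "Kinematics",
        "vehicles_count": 5,
        "features": ["presence", "x", "y", "vx", "vy"],
    }
}

# Image-based: a stack of the 4 most recent grayscale bird's-eye-view frames.
grayscale_config = {
    "observation": {
        "type": "GrayscaleObservation",
        "observation_shape": (128, 64),
        "stack_size": 4,
        "weights": [0.2989, 0.5870, 0.1140],  # RGB -> grayscale weights
    }
}
```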
LSTM is a type of recurrent neural network (RNN) classically used for sequence-modelling problems. A common criticism of DTs is that they are no different from sequence modelling with RNNs. We plan to replace the DT blocks with LSTM blocks to verify whether this criticism holds.
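A minimal sketch of the planned ablation, assuming the same interleaved (return-to-go, state, action) token layout as in the DT sketch above: the attention backbone is swapped for an LSTM while the embedding and prediction heads stay unchanged. The class name and sizes are made up for illustration.

```python
import torch.nn as nn

class LSTMBackbone(nn.Module):
    """Illustrative drop-in replacement for the GPT-2 backbone."""

    def __init__(self, embed_dim: int = 128, num_layers: int = 3):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, embed_dim,
                            num_layers=num_layers, batch_first=True)

    def forward(self, token_embeddings):
        # token_embeddings: (B, 3*T, embed_dim), same layout the DT uses;
        # the recurrence processes tokens left to right, so causality is preserved.
        hidden, _ = self.lstm(token_embeddings)
        return hidden
```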