
# Spinning Up Re-implementation

My re-implementation of the six reinforcement learning algorithms featured in OpenAI's Spinning Up.

With one exception, I tried to implement everything without looking at the reference code, using only the pseudocode on the Spinning Up site and the original papers. The exception is the first algorithm, VPG, for which I borrowed from Spinning Up's `ActorCritic` class.

## Code

`base` contains packages shared by the algorithm implementations. This includes the abstract base class `Algorithm`, which implements the training loop itself. Each algorithm is responsible for implementing `update` and `act`; `update` contains the logic for updating model parameters according to the specification of each specific algorithm. A minimal sketch of this structure is given below.
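To make the structure concrete, here is a minimal sketch of what such a base class might look like. This is an illustration under assumptions, not the repository's actual code: the names `Algorithm`, `act`, and `update` come from the description above, but everything else (the hyperparameters, the classic `gym` step API, per-epoch batching) is hypothetical.

```python
# Minimal sketch of a shared training loop; details are hypothetical.
from abc import ABC, abstractmethod

import gym


class Algorithm(ABC):
    """Owns the environment-interaction loop; concrete algorithms
    supply `act` and `update`."""

    def __init__(self, env: gym.Env, steps_per_epoch: int = 4000,
                 epochs: int = 750):
        self.env = env
        self.steps_per_epoch = steps_per_epoch
        self.epochs = epochs

    @abstractmethod
    def act(self, obs):
        """Choose an action given the current observation."""

    @abstractmethod
    def update(self, batch):
        """Update model parameters from collected experience, per the
        specific algorithm's update rule."""

    def train(self):
        """Collect experience, then delegate parameter updates."""
        obs = self.env.reset()
        for _ in range(self.epochs):
            batch = []
            for _ in range(self.steps_per_epoch):
                action = self.act(obs)
                next_obs, reward, done, _ = self.env.step(action)  # classic gym API
                batch.append((obs, action, reward, next_obs, done))
                obs = self.env.reset() if done else next_obs
            self.update(batch)
```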

Algorithm implementations:

VPG · TRPO · PPO · DDPG · TD3 · SAC
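As an illustration of how one algorithm plugs into the base class, below is a rough VPG-style subclass. This is a sketch, not the repo's implementation: the network sizes, learning rate, and the naive undiscounted return computation are placeholders (a real VPG would compute per-episode returns, e.g. with GAE, and train a value baseline).

```python
# Rough VPG-style sketch against the base class above; illustrative only.
import numpy as np
import torch
import torch.nn as nn
from torch.distributions import Normal


class VPG(Algorithm):
    def __init__(self, env, lr=3e-4, **kwargs):
        super().__init__(env, **kwargs)
        obs_dim = env.observation_space.shape[0]
        act_dim = env.action_space.shape[0]
        # Gaussian policy: an MLP mean plus a state-independent log-std.
        self.pi = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
        self.log_std = nn.Parameter(-0.5 * torch.ones(act_dim))
        self.opt = torch.optim.Adam(
            list(self.pi.parameters()) + [self.log_std], lr=lr)

    def _dist(self, obs):
        mu = self.pi(torch.as_tensor(np.asarray(obs), dtype=torch.float32))
        return Normal(mu, self.log_std.exp())

    def act(self, obs):
        with torch.no_grad():
            return self._dist(obs).sample().numpy()

    def update(self, batch):
        obs, acts, rews, _, _ = zip(*batch)
        # Naive undiscounted reward-to-go over the whole batch; a real
        # implementation would compute per-episode returns (e.g. GAE).
        rets = torch.tensor(rews, dtype=torch.float32).flip(0).cumsum(0).flip(0)
        logp = self._dist(obs).log_prob(
            torch.as_tensor(np.asarray(acts), dtype=torch.float32)).sum(-1)
        loss = -(logp * rets).mean()  # policy-gradient loss
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
```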

## Results

Below are benchmarks for each of the 6 algorithms in 6 MuJoCo environments. Each algorithm was trained with 3 random seeds per environment, with each seed exposed to 3 million total frames. These benchmarks can be compared to the Spinning Up Benchmarks. Note that this setup is not fair to the on-policy algorithms: they generally require significantly more experience to reach a level of performance comparable to the off-policy algorithms, so in most cases one could expect the on-policy algorithms to do better given, say, 10 million frames of experience. It would arguably be fairer to let every algorithm run to convergence rather than artificially restricting total experience, which is defensible because on-policy algorithms are generally less computationally intensive per update and faster in wall-clock time. However, the Spinning Up benchmarks use 3 million frames of experience, so I followed suit. A rough sketch of this benchmark protocol is given below.
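For concreteness, the protocol described above (3 seeds per algorithm/environment, 3 million frames each) could be driven by a loop like the following. `run_benchmark` and `make_algorithm` are hypothetical names introduced for illustration, not the repository's API.

```python
# Hypothetical driver for the benchmark protocol; names are placeholders.
import gym

SEEDS = (0, 1, 2)
TOTAL_FRAMES = 3_000_000
ENV_IDS = ("Swimmer-v3", "HalfCheetah-v3", "Hopper-v3",
           "Walker2d-v3", "Ant-v3", "Humanoid-v1")


def run_benchmark(make_algorithm, steps_per_epoch=4000):
    for env_id in ENV_IDS:
        for seed in SEEDS:
            env = gym.make(env_id)
            env.seed(seed)  # classic gym seeding
            algo = make_algorithm(env, steps_per_epoch=steps_per_epoch,
                                  epochs=TOTAL_FRAMES // steps_per_epoch)
            algo.train()
```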

Links to gifs of agent behavior are also included. There is no attempt to cherry-pick the best-performing random seed or particularly good episodes: each video shows three episodes recorded from the first random seed for each algorithm/environment pair.

### Swimmer-v3

*(benchmark plot: swimmer benchmark)*

Swimmer-v3 gifs: VPG · TRPO · PPO · DDPG · TD3 · SAC

### HalfCheetah-v3

*(benchmark plot: halfcheetah benchmark)*

HalfCheetah-v3 gifs: VPG · TRPO · PPO · DDPG · TD3 · SAC

### Hopper-v3

*(benchmark plot: hopper benchmark)*

Hopper-v3 gifs: VPG · TRPO · PPO · DDPG · TD3 · SAC

### Walker2d-v3

*(benchmark plot: walker2d benchmark)*

Walker2d-v3 gifs: VPG · TRPO · PPO · DDPG · TD3 · SAC

### Ant-v3

*(benchmark plot: ant benchmark)*

Ant-v3 gifs: VPG · TRPO · PPO · DDPG · TD3 · SAC

### Humanoid-v1

*(benchmark plot: humanoid benchmark)*

Humanoid-v1 gifs: VPG · TRPO · PPO · DDPG · TD3 · SAC

## Citation

The contents of this repository are based on OpenAI's spinningup repository.

```
@article{SpinningUp2018,
    author = {Achiam, Joshua},
    title = {{Spinning Up in Deep Reinforcement Learning}},
    year = {2018}
}
```