
# Spinning Up Re-implementation

My re-implementation of the six reinforcement learning algorithms featured in OpenAI's Spinning Up.

With one exception, I tried to implement everything without looking at the reference code, using only the pseudocode on the Spinning Up site and the original papers. The exception is the first algorithm, VPG, for which I borrowed from Spinning Up's `ActorCritic` class.

## Code

`base` contains packages shared by the algorithm implementations. This includes the abstract base class `Algorithm`, which implements the training loop itself. Each algorithm is responsible for implementing `update` and `act`; `update` contains the logic for updating model parameters according to the specification of each specific algorithm. A minimal sketch of this structure is given below.
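To make the structure concrete, here is a minimal sketch of what such a base class might look like. This is an illustration under assumptions, not the repository's actual code: the names `Algorithm`, `act`, and `update` come from the description above, but everything else (the hyperparameters, the classic `gym` step API, per-epoch batching) is hypothetical.

```python
# Minimal sketch of a shared training loop; details are hypothetical.
from abc import ABC, abstractmethod

import gym


class Algorithm(ABC):
    """Owns the environment-interaction loop; concrete algorithms
    supply `act` and `update`."""

    def __init__(self, env: gym.Env, steps_per_epoch: int = 4000,
                 epochs: int = 750):
        self.env = env
        self.steps_per_epoch = steps_per_epoch
        self.epochs = epochs

    @abstractmethod
    def act(self, obs):
        """Choose an action given the current observation."""

    @abstractmethod
    def update(self, batch):
        """Update model parameters from collected experience, per the
        specific algorithm's update rule."""

    def train(self):
        """Collect experience, then delegate parameter updates."""
        obs = self.env.reset()
        for _ in range(self.epochs):
            batch = []
            for _ in range(self.steps_per_epoch):
                action = self.act(obs)
                next_obs, reward, done, _ = self.env.step(action)  # classic gym API
                batch.append((obs, action, reward, next_obs, done))
                obs = self.env.reset() if done else next_obs
            self.update(batch)
```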

Algorithm implementations:

VPG · TRPO · PPO · DDPG · TD3 · SAC
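As an illustration of how one algorithm plugs into the base class, below is a rough VPG-style subclass. This is a sketch, not the repo's implementation: the network sizes, learning rate, and the naive undiscounted return computation are placeholders (a real VPG would compute per-episode returns, e.g. with GAE, and train a value baseline).

```python
# Rough VPG-style sketch against the base class above; illustrative only.
import numpy as np
import torch
import torch.nn as nn
from torch.distributions import Normal


class VPG(Algorithm):
    def __init__(self, env, lr=3e-4, **kwargs):
        super().__init__(env, **kwargs)
        obs_dim = env.observation_space.shape[0]
        act_dim = env.action_space.shape[0]
        # Gaussian policy: an MLP mean plus a state-independent log-std.
        self.pi = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
        self.log_std = nn.Parameter(-0.5 * torch.ones(act_dim))
        self.opt = torch.optim.Adam(
            list(self.pi.parameters()) + [self.log_std], lr=lr)

    def _dist(self, obs):
        mu = self.pi(torch.as_tensor(np.asarray(obs), dtype=torch.float32))
        return Normal(mu, self.log_std.exp())

    def act(self, obs):
        with torch.no_grad():
            return self._dist(obs).sample().numpy()

    def update(self, batch):
        obs, acts, rews, _, _ = zip(*batch)
        # Naive undiscounted reward-to-go over the whole batch; a real
        # implementation would compute per-episode returns (e.g. GAE).
        rets = torch.tensor(rews, dtype=torch.float32).flip(0).cumsum(0).flip(0)
        logp = self._dist(obs).log_prob(
            torch.as_tensor(np.asarray(acts), dtype=torch.float32)).sum(-1)
        loss = -(logp * rets).mean()  # policy-gradient loss
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
```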

## Results

Below are benchmarks for each of the 6 algorithms in 6 MuJoCo environments. Each algorithm was trained with 3 random seeds per environment, with each seed exposed to 3 million total frames. These benchmarks can be compared to the Spinning Up Benchmarks. Note that this setup is not fair to the on-policy algorithms: they generally require significantly more experience to reach a level of performance comparable to the off-policy algorithms, so in most cases one could expect the on-policy algorithms to do better given, say, 10 million frames of experience. It would arguably be fairer to let every algorithm run to convergence rather than artificially restricting total experience, which is defensible because on-policy algorithms are generally less computationally intensive per update and faster in wall-clock time. However, the Spinning Up benchmarks use 3 million frames of experience, so I followed suit. A rough sketch of this benchmark protocol is given below.
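For concreteness, the protocol described above (3 seeds per algorithm/environment, 3 million frames each) could be driven by a loop like the following. `run_benchmark` and `make_algorithm` are hypothetical names introduced for illustration, not the repository's API.

```python
# Hypothetical driver for the benchmark protocol; names are placeholders.
import gym

SEEDS = (0, 1, 2)
TOTAL_FRAMES = 3_000_000
ENV_IDS = ("Swimmer-v3", "HalfCheetah-v3", "Hopper-v3",
           "Walker2d-v3", "Ant-v3", "Humanoid-v1")


def run_benchmark(make_algorithm, steps_per_epoch=4000):
    for env_id in ENV_IDS:
        for seed in SEEDS:
            env = gym.make(env_id)
            env.seed(seed)  # classic gym seeding
            algo = make_algorithm(env, steps_per_epoch=steps_per_epoch,
                                  epochs=TOTAL_FRAMES // steps_per_epoch)
            algo.train()
```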

Links to gifs of agent behavior are also included. There is no attempt to cherry-pick the best-performing random seed or particularly good episodes: each video shows three episodes recorded from the first random seed for each algorithm/environment pair.

### Swimmer-v3

*(benchmark plot: swimmer benchmark)*

Swimmer-v3 gifs: VPG · TRPO · PPO · DDPG · TD3 · SAC

### HalfCheetah-v3

*(benchmark plot: halfcheetah benchmark)*

HalfCheetah-v3 gifs: VPG · TRPO · PPO · DDPG · TD3 · SAC

### Hopper-v3

*(benchmark plot: hopper benchmark)*

Hopper-v3 gifs: VPG · TRPO · PPO · DDPG · TD3 · SAC

### Walker2d-v3

*(benchmark plot: walker2d benchmark)*

Walker2d-v3 gifs: VPG · TRPO · PPO · DDPG · TD3 · SAC

### Ant-v3

*(benchmark plot: ant benchmark)*

Ant-v3 gifs: VPG · TRPO · PPO · DDPG · TD3 · SAC

### Humanoid-v1

*(benchmark plot: humanoid benchmark)*

Humanoid-v1 gifs: VPG · TRPO · PPO · DDPG · TD3 · SAC

## Citation

The contents of this repository are based on OpenAI's spinningup repository.

```
@article{SpinningUp2018,
    author = {Achiam, Joshua},
    title = {{Spinning Up in Deep Reinforcement Learning}},
    year = {2018}
}
```