From d3a278d63bdc31a6747ea57b7625bae43eef64c8 Mon Sep 17 00:00:00 2001
From: Vincent Moens
Date: Mon, 2 Sep 2024 16:56:06 +0100
Subject: [PATCH] [Doc] Document losses in README.md

ghstack-source-id: 5bf0c1f8a12b54aad667b0061633d9fdbf296c0b
Pull Request resolved: https://github.com/pytorch/rl/pull/2408
---
 README.md | 286 +++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 273 insertions(+), 13 deletions(-)

diff --git a/README.md b/README.md
index 64559f7af37..e663a376270 100644
--- a/README.md
+++ b/README.md
@@ -523,19 +523,279 @@ If you would like to contribute to new features, check our [call for contributio
 ## Examples, tutorials and demos
 
 A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are provided with an illustrative purpose:
-- [DQN](https://github.com/pytorch/rl/blob/main/sota-implementations/dqn)
-- [DDPG](https://github.com/pytorch/rl/blob/main/sota-implementations/ddpg/ddpg.py)
-- [IQL](https://github.com/pytorch/rl/blob/main/sota-implementations/iql/iql_offline.py)
-- [CQL](https://github.com/pytorch/rl/blob/main/sota-implementations/cql/cql_offline.py)
-- [TD3](https://github.com/pytorch/rl/blob/main/sota-implementations/td3/td3.py)
-- [TD3+BC](https://github.com/pytorch/rl/blob/main/sota-implementations/td3+bc/td3+bc.py)
-- [A2C](https://github.com/pytorch/rl/blob/main/examples/a2c_old/a2c.py)
-- [PPO](https://github.com/pytorch/rl/blob/main/sota-implementations/ppo/ppo.py)
-- [SAC](https://github.com/pytorch/rl/blob/main/sota-implementations/sac/sac.py)
-- [REDQ](https://github.com/pytorch/rl/blob/main/sota-implementations/redq/redq.py)
-- [Dreamer](https://github.com/pytorch/rl/blob/main/sota-implementations/dreamer/dreamer.py)
-- [Decision Transformers](https://github.com/pytorch/rl/blob/main/sota-implementations/decision_transformer)
-- [RLHF](https://github.com/pytorch/rl/blob/main/examples/rlhf)
+
+| Algorithm | Compile Support** | Tensordict-free API | Modular Losses | Continuous and Discrete |
+|---|---|---|---|---|
+| DQN | 1.53x | + | NA | + (through ActionDiscretizer transform) |
+| DDPG | 1.54x | + | + | - (continuous only) |
+| IQL | 2.55x | + | + | + |
+| CQL | 1.91x | + | + | + |
+| TD3 | 1.79x | + | + | - (continuous only) |
+| TD3+BC | untested | + | + | - (continuous only) |
+| A2C | 1.76x | + | - | + |
+| PPO | 2.67x | + | - | + |
+| SAC | 2.01x | + | - | + |
+| REDQ | 2.35x | + | - | - (continuous only) |
+| Dreamer v1 | untested | + | + (different classes) | - (continuous only) |
+| Decision Transformers | untested | + | NA | - (continuous only) |
+| CrossQ | untested | + | + | - (continuous only) |
+| Gail | untested | + | NA | + |
+| Impala | untested | + | - | + |
+| IQL (MARL) | untested | + | + | + |
+| DDPG (MARL) | untested | + | + | - (continuous only) |
+| PPO (MARL) | untested | + | - | + |
+| QMIX-VDN (MARL) | untested | + | NA | + |
+| SAC (MARL) | untested | + | - | + |
+| RLHF | NA | + | NA | NA |
+
+** The number indicates expected speed-up compared to eager mode when executed on CPU. Numbers may vary depending on
+  architecture and device.
 
 and many more to come!