PyTorch Lightning Implementations of Fundamental RL Algorithms
This section lists the off-policy algorithms implemented in this project. Off-policy methods can learn from experiences generated by a policy other than the one being improved, which lets them reuse past transitions from a replay buffer (see the sketch after this list).
- DQN (24-09-24)
- Double DQN (24-10-10)
- Dueling DQN (24-10-31)
- Noisy DQN
- DQN with Prioritized Experience Replay (24-10-25)
- C51 (24-11-05)
- QR-DQN
- N-Step DQN
- DDPG
- TD3
- SAC
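To make the shared off-policy pattern concrete, here is a minimal sketch of a DQN-style training step written as a LightningModule. The class name, network sizes, and hyperparameters are assumptions for illustration, not this repo's actual code; a full agent would also need an exploration policy, a replay-buffer DataLoader, and periodic target-network syncing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_lightning as pl


class DQNLightning(pl.LightningModule):
    """Minimal DQN training step; names and layer sizes are illustrative."""

    def __init__(self, obs_dim: int = 4, n_actions: int = 2, gamma: float = 0.99):
        super().__init__()
        self.save_hyperparameters()
        self.q_net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, n_actions)
        )
        # Frozen copy of the online network, synced periodically elsewhere.
        self.target_net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, n_actions)
        )
        self.target_net.load_state_dict(self.q_net.state_dict())

    def training_step(self, batch, batch_idx):
        # A batch of transitions sampled off-policy from a replay buffer.
        states, actions, rewards, next_states, dones = batch
        q_taken = self.q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            # Vanilla DQN target; Double DQN would instead select actions
            # with q_net and evaluate them with target_net.
            next_q = self.target_net(next_states).max(dim=1).values
            target = rewards + self.hparams.gamma * next_q * (1.0 - dones.float())
        loss = F.smooth_l1_loss(q_taken, target)
        self.log("train/loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.q_net.parameters(), lr=1e-3)
```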
This section outlines our future plans and upcoming implementations. We're continually expanding the collection of algorithms and improving the existing ones.
These are additional features and tools we've integrated to improve the functionality and usability of the implementations.
- Wandb Logger
- Record training videos and display them in wandb (24-09-25)
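A rough sketch of how these two features could fit together, using Lightning's WandbLogger plus Gymnasium's RecordVideo wrapper; the project name, run name, and video folder below are placeholder assumptions, not the repo's actual settings:

```python
import gymnasium as gym
import wandb
from pytorch_lightning.loggers import WandbLogger

# Placeholder project/run names; the run name follows the
# "(Env-Algo-Number)" convention noted further below.
logger = WandbLogger(project="fundamental-rl", name="CartPole-v1-DQN-1")

# Wrap the env so recorded episodes are written to disk as .mp4 files.
env = gym.wrappers.RecordVideo(
    gym.make("CartPole-v1", render_mode="rgb_array"),
    video_folder="videos",
    episode_trigger=lambda ep: ep % 50 == 0,  # record every 50th episode
)

# After a recorded episode finishes, upload the file to wandb.
logger.experiment.log({"eval/video": wandb.Video("videos/rl-video-episode-0.mp4")})
```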
This section tracks our recent accomplishments and completed tasks, providing a clear view of our progress.
- Implement Double DQN
- Change the wandb logging naming convention -> "(Env-Algo-Number)"
- Use argparse to manage hyperparameters (see the sketch after this list)
- Implement Dueling DQN
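A sketch of how argparse-managed hyperparameters might feed that run-naming convention; every flag name and default here is hypothetical, and the repo's actual arguments may differ:

```python
import argparse

parser = argparse.ArgumentParser(description="Train a DQN-family agent")
# Hypothetical flags for illustration only.
parser.add_argument("--env", default="CartPole-v1")
parser.add_argument("--algo", default="DQN")
parser.add_argument("--run-number", type=int, default=1)
parser.add_argument("--lr", type=float, default=1e-3)
parser.add_argument("--gamma", type=float, default=0.99)
args = parser.parse_args()

# Build a run name following the (Env-Algo-Number) convention,
# e.g. "CartPole-v1-DQN-1", and pass it as WandbLogger's `name`.
run_name = f"{args.env}-{args.algo}-{args.run_number}"
```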
Here we present the performance results of our implemented algorithms. These graphs and metrics help visualize the effectiveness of each method.
CartPole-v1 average reward: this graph shows the average reward achieved by our algorithms on the CartPole-v1 environment, demonstrating their learning progress and comparative performance.