Implementing "Sibling Rivalry" Method from "Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards" Paper #224

Open · 2 tasks done
vladyskai opened this issue Jan 11, 2024 · 1 comment
Labels: enhancement (New feature or request)

@vladyskai

🚀 Feature

I propose implementing the "Sibling Rivalry" method from the paper "Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards." Reference implementation on GitHub: https://github.com/salesforce/sibling-rivalry

The method tackles sparse-reward tasks in reinforcement learning (RL) by using self-balancing shaped rewards, and it is particularly effective in goal-reaching tasks.
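For concreteness, here is a minimal sketch of my understanding of the core idea: two "sibling" rollouts are collected toward the same goal, each gets a terminal reward that pulls it toward the goal and pushes it away from its sibling's terminal state, and a self-balancing rule decides which rollout is used for the update. The function name, the `epsilon` success threshold, and the exact inclusion rule below are my paraphrase, not the paper's code, so please check the paper and the linked repository for the precise formulation.

```python
import numpy as np

def sibling_rivalry_terminal_rewards(terminal_a, terminal_b, goal, dist, epsilon=0.1):
    """Relabel the terminal rewards of two "sibling" rollouts aimed at the same goal."""
    d_a, d_b = dist(terminal_a, goal), dist(terminal_b, goal)
    # Shaped terminal reward: pull toward the goal, push away from the sibling's end state.
    r_a = -d_a + dist(terminal_a, terminal_b)
    r_b = -d_b + dist(terminal_b, terminal_a)
    # Self-balancing inclusion rule (my reading of the paper): the rollout that ended
    # farther from the goal always enters the policy update; the one that ended closer
    # is only included if it actually reached the goal (distance below epsilon).
    if d_a <= d_b:
        include_a, include_b = d_a <= epsilon, True
    else:
        include_a, include_b = True, d_b <= epsilon
    return (r_a, include_a), (r_b, include_b)

# Toy usage with a Euclidean distance on 2-D positions:
dist = lambda x, y: float(np.linalg.norm(np.asarray(x) - np.asarray(y)))
print(sibling_rivalry_terminal_rewards([0.0, 1.0], [2.0, 2.0], [0.0, 0.0], dist))
```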

Motivation

While training a PPO agent to play the Battle City game, I've run into a significant challenge: the agent more or less learns to eliminate enemy tanks but fails to protect the base, often falling into suboptimal strategies like camping near the enemy spawn point. I have tried everything I could think of to make it defend the base, but it defaults to camping or doing nothing. This behavior suggests the agent is stuck in a local optimum, focusing solely on tank destruction and neglecting the critical objective of base defense. Implementing the "Sibling Rivalry" method could help the agent recognize situations where the base is in danger and learn strategies that involve defending it rather than just attacking enemies. I hope this method could help the agent escape this local optimum and overcome the current limitations in its learning process.

Pitch

I suggest integrating the "Sibling Rivalry" method into the PPO algorithm. This would require adapting the code in base/learners/distance.py from that repository and exposing it as an option in the PPO class; a rough sketch of what the reward relabeling step could look like is below.
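To make the shape of that integration a bit more concrete, here is a very rough sketch (reusing the sibling_rivalry_terminal_rewards helper sketched above) of how two sibling episodes could be relabeled before being written into PPO's rollout storage. The function name, the "per-step achieved goals" input format, and the zero-except-terminal reward layout are placeholders of my own, not the actual SB3-Contrib API; the real logic would presumably live inside the rollout collection of the PPO class, following base/learners/distance.py.

```python
import numpy as np

def relabel_sibling_episodes(achieved_a, achieved_b, goal, dist, epsilon=0.1):
    """Turn two sibling episodes (their per-step achieved goals, same desired goal)
    into per-step reward arrays: zero everywhere except the terminal step, plus a
    flag saying whether each episode should enter the PPO update at all."""
    (r_a, use_a), (r_b, use_b) = sibling_rivalry_terminal_rewards(
        achieved_a[-1], achieved_b[-1], goal, dist, epsilon
    )
    rewards_a = np.zeros(len(achieved_a))
    rewards_a[-1] = r_a
    rewards_b = np.zeros(len(achieved_b))
    rewards_b[-1] = r_b
    return (rewards_a, use_a), (rewards_b, use_b)
```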

Alternatives

Alternatives such as intrinsic curiosity modules and reward-relabeling strategies exist, but they often perform poorly in hard-exploration scenarios. According to the paper, the "Sibling Rivalry" method outperforms these techniques across diverse environments, including 3D construction and navigation tasks.

Additional context

This issue has been moved from the DLR-RM/stable-baselines3 repository. Also, I know little about writing proper code and do not have the technical skills to implement something this complex. I will give it a try anyway, but I am probably better suited to a simpler task such as benchmarking it.

Checklist

  • I have checked that there is no similar issue in the repo
  • If I'm requesting a new feature, I have proposed alternatives
@vladyskai added the enhancement (New feature or request) label on Jan 11, 2024
@araffin (Member) commented Jan 11, 2024

Original issue: DLR-RM/stable-baselines3#1802
