Added toggle to train after game or after step in DreamerV3 #319

Open · wants to merge 1 commit into base: main

Conversation

LucaVendruscolo

Summary


Added a config option called "train_after_step" to sheeprl/configs/exp/dreamer_v3.yaml. It is set to true by default, which keeps the standard behaviour; when set to false, the program waits until the end of the episode/game and then trains the algorithm on all the data gathered during that episode.
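A minimal, self-contained sketch of the gating this option is meant to implement (the function below is illustrative, not SheepRL's actual training loop; `reset_envs` follows the name used in the diff further down):

```python
# Illustrative stand-in for the gate inside the training loop.
# reset_envs counts how many environments finished an episode at this step.

def should_train(train_after_step: bool, reset_envs: int) -> bool:
    """Decide whether to run gradient steps at the current policy step."""
    # train_after_step=True: train every step (standard behaviour).
    # train_after_step=False: train only once an episode/game has ended,
    # so the update uses all the data gathered during that episode.
    return train_after_step or reset_envs > 0

assert should_train(True, 0)        # standard mode: always train
assert not should_train(False, 0)   # episode still running: skip training
assert should_train(False, 2)       # two episodes just ended: train now
```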

Type of Change

Please select the one relevant option below:

  • New feature (non-breaking change that adds functionality)

Checklist

Please confirm that the following tasks have been completed:

  • [x] I have tested my changes locally and they work as expected. (Please describe the tests you performed.)
  • [ ] I have added unit tests for my changes, or updated existing tests if necessary.
  • [ ] I have updated the documentation, if applicable.
  • [n/a] I have installed pre-commit and run it locally for my code changes.

@michele-milesi (Member) left a comment


Thanks for your PR. I would kindly ask you to:

  1. Fix the issues described in the other comments.
  2. Add this argument to all the Dreamer algorithms (Dreamer V1, V2, V3 and P2E-DV1, P2E-DV2, P2E-DV3), so that they are all configured in the same way.
  3. Update the documentation:
    a. Add here an explanation of the argument you are adding and how it influences the gradient steps.
    b. Update the yaml file here with the new version of the sheeprl/configs/algo/dreamer_v3.yaml file (after moving the argument definition as specified in the comment on the sheeprl/configs/exp/dreamer_v3.yaml file).

UPDATE:
A further consideration: with distributed training, all processes must train at the same time, which may not happen if episodes end at different policy steps on different ranks. I would therefore ask you to disable the possibility of training on episode end when training is distributed.
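One way to enforce this, as a sketch (the function name and call site are hypothetical; such a check would live wherever the config is validated before the training loop starts):

```python
# Hypothetical guard against the distributed case described above: ranks whose
# episodes end at different policy steps would run a different number of
# training calls, and the collectives inside training would then deadlock.

def check_train_on_episode_end(train_on_episode_end: bool, world_size: int) -> None:
    if train_on_episode_end and world_size > 1:
        raise ValueError(
            "algo.train_on_episode_end=True is not supported with "
            "distributed training (world_size > 1): episodes may end at "
            "different policy steps on different ranks."
        )
```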

@belerico what do you think?

)
cumulative_per_rank_gradient_steps += 1
train_step += world_size
if cfg.algo.train_after_game or reset_envs > 0:
Member left a comment

Hi @LucaVendruscolo, I would change the name of the argument to train_on_episode_end.
By default its value is set to False, which keeps the standard behaviour.
When set to True, training starts only once the episode has finished.
So, you should change the condition to: if (cfg.algo.train_on_episode_end and reset_envs > 0) or not cfg.algo.train_on_episode_end:.

I prefer to have a configuration with a clear argument name.
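A quick self-contained check that the proposed condition behaves as described, and that it is equivalent to the shorter `not cfg.algo.train_on_episode_end or reset_envs > 0` (plain Python, no SheepRL imports):

```python
# Truth-table check of the proposed gate (pure logic, no SheepRL imports).

def proposed(train_on_episode_end: bool, reset_envs: int) -> bool:
    return (train_on_episode_end and reset_envs > 0) or not train_on_episode_end

def simplified(train_on_episode_end: bool, reset_envs: int) -> bool:
    return not train_on_episode_end or reset_envs > 0

for flag in (True, False):
    for resets in (0, 1, 3):
        assert proposed(flag, resets) == simplified(flag, resets)

assert proposed(False, 0)      # flag off: train every step (standard)
assert not proposed(True, 0)   # flag on, episode still running: wait
assert proposed(True, 1)       # flag on, an episode just ended: train
```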

@@ -8,6 +8,7 @@ defaults:

 # Algorithm
 algo:
+  train_after_step: true
Member left a comment

Here you set an argument with a name different from the one in the other file. Please change its name and default value according to the previous comment, and move the definition of this parameter into the ./sheeprl/configs/algo/dreamer_v3.yaml file.
Please also add a brief comment indicating the meaning of the parameter.
