
Runner-Pool #11

Open
elrodrigues opened this issue Sep 18, 2023 · 3 comments
Assignees
elrodrigues

Labels
enhancement New feature or request

Comments
@elrodrigues
Collaborator

elrodrigues commented Sep 18, 2023

Build a Trainer/Runner pool (where each runner has one environment and one job associated with it) for parallel training.

@elrodrigues elrodrigues self-assigned this Sep 18, 2023
@elrodrigues elrodrigues added the enhancement New feature or request label Sep 18, 2023
@elrodrigues elrodrigues changed the title Job Profiles and Pools Job Profiles and Env Pools Sep 18, 2023
@elrodrigues elrodrigues changed the title Job Profiles and Env Pools Env Pool Wrapper Sep 18, 2023
@elrodrigues elrodrigues changed the title Env Pool Wrapper Runner-Pool Sep 18, 2023
@elrodrigues
Collaborator Author

elrodrigues commented Sep 18, 2023

The objective of this issue has changed because of the design of the middleware. Pool is probably the wrong word to use here, since I'm really talking about lazily spinning up Trainers/Runners when a new job is added and strapping a manager onto these runners, but it's the closest word I have to describe this.

This is a certified schizo moment.
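
Roughly what I have in mind, as a minimal sketch (Runner, RunnerManager and submit are placeholder names, not the actual middleware API):

from dataclasses import dataclass, field

@dataclass
class Runner:
    job_id: str
    env_name: str  # one environment and one job per runner

@dataclass
class RunnerManager:
    runners: dict[str, Runner] = field(default_factory=dict)

    def submit(self, job_id: str, env_name: str) -> Runner:
        # Lazily spin up a runner the first time a job is seen,
        # instead of pre-allocating a fixed pool.
        if job_id not in self.runners:
            self.runners[job_id] = Runner(job_id=job_id, env_name=env_name)
        return self.runners[job_id]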

@elrodrigues
Collaborator Author

The runners/trainers will have a 'hook' to sync their models to the manager's master model every couple of episodes. The manager will also periodically 'down'-sync its master model to its trainers.

I haven't yet decided the taus for the up-sync/hook and the down-sync. These will be set in the config.
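
For reference, the sync itself would just be a Polyak-style soft update parameterised by tau; a sketch assuming PyTorch modules (tau_up and tau_down are placeholder names until the config keys are decided):

import torch
import torch.nn as nn

def soft_update(src: nn.Module, dst: nn.Module, tau: float) -> None:
    # Polyak average: dst <- tau * src + (1 - tau) * dst
    with torch.no_grad():
        for d, s in zip(dst.parameters(), src.parameters()):
            d.mul_(1.0 - tau).add_(s, alpha=tau)

# up-sync (hook): blend a runner's model into the manager's master model
#   soft_update(runner_model, master_model, tau_up)
# down-sync: periodically blend the master model back into each trainer
#   soft_update(master_model, trainer_model, tau_down)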

@elrodrigues
Collaborator Author

This has changed a little now. The trainer's 'hook' is no longer an up-sync but instead a down-sync. The up-sync is handled internally by the trainer after its master model is set by the manager.

I imagine trainers implementing their own form of soft_update_agent(local, target). For BDQTrainer, for example, this function would contain something along the lines of:

# Polyak-average each online (local) network into its target network
BDQAgent.soft_update(self.pre_net, self.pre_target, self.tau)
BDQAgent.soft_update(self.state_net, self.state_target, self.tau)
for i in range(self.num_actions):
    BDQAgent.soft_update(self.adv_nets[i], self.adv_targets[i], self.tau)
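
And a hedged sketch of how I read the hook flow above (class and method names are placeholders): the manager pushes its master weights through the down-sync hook, and the trainer then runs its own internal sync via soft_update_agent.

import torch.nn as nn

class TrainerSyncMixin:
    master_net: nn.Module

    def down_sync_hook(self, master_state: dict) -> None:
        # Down-sync: the manager sets the trainer's master model...
        self.master_net.load_state_dict(master_state)
        # ...then the trainer handles the sync internally, e.g. the
        # per-network soft updates in the BDQTrainer example above.
        self.soft_update_agent()

    def soft_update_agent(self) -> None:
        raise NotImplementedError  # e.g. the BDQTrainer version above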
