```python
# Hard update: copy the online critic/actor weights into the target networks.
for target_param, param in zip(self.target_critic.parameters(), self.critic.parameters()):
    target_param.data.copy_(param.data)
for target_param, param in zip(self.target_actor.parameters(), self.actor.parameters()):
    target_param.data.copy_(param.data)
```
`return value + advantage - advantage.mean()` looks wrong and should be changed to `return value + advantage - advantage.mean(dim=1, keepdim=True)`. By definition, the advantage head's outputs should sum to zero across the action dimension, so the subtracted mean must be taken over the action dimension only, not over all entries of the batch.
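A small pure-Python sketch of the difference, using made-up numbers in place of a `(batch, n_actions)` advantage tensor; subtracting the per-row mean plays the role of `advantage.mean(dim=1, keepdim=True)` in PyTorch:

```python
# Hypothetical advantage values for a batch of 2 states, 3 actions each.
advantage = [[1.0, 2.0, 3.0],
             [10.0, 20.0, 30.0]]

# What advantage.mean() computes: one scalar mean over ALL entries.
flat = [a for row in advantage for a in row]
global_mean = sum(flat) / len(flat)  # 11.0

# What advantage.mean(dim=1, keepdim=True) computes: one mean per row.
row_means = [sum(row) / len(row) for row in advantage]  # [2.0, 20.0]

# Centering with the per-row mean makes every row sum to zero,
# which is exactly the identifiability constraint of the dueling head.
centered = [[a - m for a in row] for row, m in zip(advantage, row_means)]
print([sum(row) for row in centered])  # [0.0, 0.0]

# Centering with the global mean does not: rows on different scales
# keep a nonzero offset, so V and A are no longer separated correctly.
wrong = [[a - global_mean for a in row] for row in advantage]
print([sum(row) for row in wrong])  # [-27.0, 27.0]
```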
`self.policy_net = model.to(self.device)` together with `self.target_net = model.to(self.device)` is wrong and should be changed to `self.policy_net = DuelingNet(cfg.n_states, cfg.n_actions, hidden_dim=cfg.hidden_dim).to(self.device)` and `self.target_net = DuelingNet(cfg.n_states, cfg.n_actions, hidden_dim=cfg.hidden_dim).to(self.device)`. The original initialization binds `policy_net` and `target_net` to the same object at the same memory address, so every update to the policy network silently changes the target network as well; the corrected version constructs two distinct objects.
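A minimal sketch of the aliasing bug, with a stand-in `Net` class instead of the real `DuelingNet` (its `.to()` mimics `nn.Module.to`, which returns the module itself):

```python
class Net:
    """Stand-in for DuelingNet; only illustrates object identity."""
    def __init__(self):
        self.w = 0.0
    def to(self, device):
        # nn.Module.to moves the module in place and returns self.
        return self

model = Net()

# Original initialization: both names refer to the SAME object.
policy_net = model.to("cpu")
target_net = model.to("cpu")
print(policy_net is target_net)  # True

# Fixed initialization: two independent instances.
policy_net = Net().to("cpu")
target_net = Net().to("cpu")
print(policy_net is target_net)  # False

policy_net.w = 1.0
print(target_net.w)  # 0.0 — the target network is unaffected
```

With the aliased version, a "target" network provides no stabilization at all, because it always equals the policy network.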