fix(wzl): fix vdn mixer to avoid Q value dim mismatch with reward #13

zerlinwang · 2022-11-29T03:13:00Z

command

./train_mpe_vdn.sh

problem

Traceback (most recent call last):
  File "train/train_mpe.py", line 192, in <module>
    main(sys.argv[1:])
  File "train/train_mpe.py", line 177, in main
    total_num_steps = runner.run()
  File "/home/zerlinwang/Projects/off-policy/offpolicy/runner/rnn/base_runner.py", line 190, in run
    self.train()
  File "/home/zerlinwang/Projects/off-policy/offpolicy/runner/rnn/base_runner.py", line 272, in batch_train_q
    train_info, new_priorities, idxes = self.trainer.train_policy_on_batch(sample)
  File "/home/zerlinwang/Projects/off-policy/offpolicy/algorithms/qmix/qmix.py", line 164, in train_policy_on_batch
    Q_tot_target_seq = rewards + (1 - dones_env_batch) * self.args.gamma * next_step_Q_tot_seq
RuntimeError: The size of tensor a (32) must match the size of tensor b (800) at non-singleton dimension 1

reason

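next_step_Q_tot_seq has the wrong shape. The VDN mixer sums the agent Q-values and then calls .view(-1, 1, 1), which folds every leading dimension, including the batch dimension, into a single one. The result no longer lines up with rewards, whose dimension 1 holds the batch size (32 in the traceback, against 800 on the Q side).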

solution

Replace

    return agent_q_inps.sum(dim=-1).view(-1, 1, 1)

with

    batch_size = agent_q_inps.size(1)
    return agent_q_inps.sum(dim=-1).view(-1, batch_size, 1, 1)
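A minimal PyTorch sketch comparing the output shapes before and after the patch. It assumes the per-agent Q-values arrive as (T, batch_size, num_agents), with agents in the last dimension (what sum(dim=-1) reduces over) and the batch in dimension 1 (what agent_q_inps.size(1) in the fix implies). The function names and the sizes T = 25, B = 32, 3 agents are hypothetical and chosen only for illustration; they are not taken from the repository.

```python
import torch


def vdn_mix_before(agent_q_inps: torch.Tensor) -> torch.Tensor:
    # Original behaviour: sum over agents, then flatten all leading
    # dimensions (including the batch dimension) into one: (T * B, 1, 1).
    return agent_q_inps.sum(dim=-1).view(-1, 1, 1)


def vdn_mix_fixed(agent_q_inps: torch.Tensor) -> torch.Tensor:
    # Patched behaviour: dimension 1 is treated as the batch dimension and
    # kept separate, so the result has shape (T, B, 1, 1).
    batch_size = agent_q_inps.size(1)
    return agent_q_inps.sum(dim=-1).view(-1, batch_size, 1, 1)


if __name__ == "__main__":
    # Hypothetical sizes: T = 25, B = 32, 3 agents.
    q = torch.randn(25, 32, 3)
    print(vdn_mix_before(q).shape)  # torch.Size([800, 1, 1])
    print(vdn_mix_fixed(q).shape)   # torch.Size([25, 32, 1, 1])
```

With the old reshape the batch entries are merged into one leading dimension of 25 × 32 = 800, while the patched version keeps the batch size at dimension 1. Both return the same summed values; only the layout differs.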

result

fix(wzl): fix shape error in vdn mix

030e629
