Add the deep dyna-q agent #322

alirahkay · 2023-01-30T19:58:34Z

No description provided.

hive/agents/__init__.py

hive/agents/deep_dyna_q.py

dapatil211 · 2023-02-06T20:24:54Z

hive/agents/deep_dyna_q.py

+                )
+                batch = self.preprocess_update_batch(batch)
+
+                self._model_optimizer.zero_grad()


I think this can be moved down to just above line 435.

hive/agents/world_models/dyna_models.py

arnavkj1995 · 2023-03-20T21:43:37Z

hive/agents/deep_dyna_q.py

+                :py:class:`~hive.replays.circular_replay.CircularReplayBuffer`.
+            discount_rate (float): A number between 0 and 1 specifying how much
+                future rewards are discounted by the agent.
+            n_step (int): The horizon used in n-step returns to compute TD(n) targets.


Doubt: Is this the length of the horizon used while planning to tune the policy?
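For reference, the `n_step` docstring describes the horizon of the bootstrapped TD(n) target, which a DQN-style learner computes from stored transitions. Whether the PR also uses it as the planning rollout length is not visible in this excerpt. Below is a minimal sketch of a TD(n) target, assuming `bootstrap_value` stands in for the maximum over actions of Q at step t+n. The function name `n_step_target` is hypothetical, not Hive's API.

```python
from typing import Sequence


def n_step_target(
    rewards: Sequence[float],
    bootstrap_value: float,
    gamma: float,
    terminated: bool,
) -> float:
    """TD(n) target: discounted sum of n rewards plus a discounted bootstrap.

    rewards holds r_t, ..., r_{t+n-1}; bootstrap_value stands in for
    max_a Q(s_{t+n}, a). If the episode terminated inside the window,
    no bootstrap term is added.
    """
    n = len(rewards)
    target = sum(gamma**k * r for k, r in enumerate(rewards))
    if not terminated:
        target += gamma**n * bootstrap_value
    return target


# Three rewards of 1.0, gamma = 0.5, bootstrap value 10.0:
# 1 + 0.5 + 0.25 + 0.5**3 * 10 = 3.0
print(n_step_target([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5, terminated=False))  # → 3.0
```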

arnavkj1995 · 2023-03-20T21:44:48Z

hive/agents/deep_dyna_q.py

+            stack_size=stack_size,
+            gamma=discount_rate,
+        )
+        self._planning_buffer = planning_buffer(


Doubt: Why are there separate replay buffers for planning and learning?

arnavkj1995 · 2023-03-20T21:54:29Z

hive/agents/deep_dyna_q.py

+        ):
+            self._logger.log_scalar("train_qval", torch.max(qvals), self._timescale)
+            agent_traj_state = {}
+        return action, agent_traj_state


Minor comment: Defining agent_traj_state might not be necessary.

arnavkj1995 · 2023-03-20T21:56:19Z

hive/agents/deep_dyna_q.py

+            "observation": update_info["observation"],
+            "action": update_info["action"],
+            "reward": update_info["reward"],
+            "done": update_info["terminated"],


Why is "or update_info["truncated"]" not added for this replay buffer?
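For context on this question: with Gymnasium-style `terminated`/`truncated` flags, a common convention is to switch off bootstrapping only on termination, because a time-limit truncation does not make the next state worthless. The sketch below uses a hypothetical `td_target` helper (not code from the PR) to show how the two choices of `done` change a one-step target on a truncated transition. Which flag the buffer should store also depends on what else it uses `done` for, e.g. marking episode boundaries when building n-step returns or frame stacks.

```python
def td_target(reward: float, next_q_max: float, gamma: float, done: bool) -> float:
    """One-step TD target; `done` switches off the bootstrap term."""
    return reward + (0.0 if done else gamma * next_q_max)


# A transition cut off by a time limit: truncated, not terminated.
reward, next_q_max, gamma = 1.0, 5.0, 0.9
terminated, truncated = False, True

# done = terminated: keep bootstrapping through the time limit.
target_terminated_only = td_target(reward, next_q_max, gamma, done=terminated)
# done = terminated or truncated: treat the cut-off as if the episode had ended.
target_either = td_target(reward, next_q_max, gamma, done=terminated or truncated)
```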

arnavkj1995 · 2023-03-20T21:59:52Z

hive/agents/deep_dyna_q.py

+            return
+
+        (
+            preprocessed_learning_update_info,


Why have 2 replay buffers? From what I understand, both replay buffers store the same transitions; only the batch_size for planning and for model learning might differ. But that can be passed as a separate argument instead. Also, having 2 buffers increases the memory required.
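A minimal sketch of the single-buffer alternative, assuming both buffers really hold the same real transitions and only the batch sizes differ. `SharedReplay` is a hypothetical stand-in, not Hive's `CircularReplayBuffer` API. If the planning buffer is instead meant to hold model-generated transitions, as in some Dyna-style designs, two buffers would still be needed.

```python
import random
from typing import Any, Dict, List


class SharedReplay:
    """Hypothetical minimal circular buffer shared by model learning and planning."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._capacity = capacity
        self._storage: List[Dict[str, Any]] = []
        self._next = 0  # slot to overwrite once the buffer is full
        self._rng = random.Random(seed)

    def add(self, transition: Dict[str, Any]) -> None:
        if len(self._storage) < self._capacity:
            self._storage.append(transition)
        else:
            self._storage[self._next] = transition
        self._next = (self._next + 1) % self._capacity

    def size(self) -> int:
        return len(self._storage)

    def sample(self, batch_size: int) -> List[Dict[str, Any]]:
        # Uniform sampling with replacement; each consumer picks its own batch size.
        return [self._rng.choice(self._storage) for _ in range(batch_size)]


buffer = SharedReplay(capacity=4)
for t in range(6):
    buffer.add({"observation": t, "action": 0, "reward": 0.0, "done": False})

model_batch = buffer.sample(batch_size=32)    # e.g. world-model training
planning_batch = buffer.sample(batch_size=8)  # e.g. start states for planning
```

Sampling with replacement keeps the sketch short; a real buffer would typically also handle frame stacking and n-step bookkeeping.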

arnavkj1995 · 2023-03-20T22:07:32Z

hive/agents/world_models/dyna_models.py

+
+        # Observations
+        obs_pred_list = []
+        for a in range(self._act_dim):


Curious question: Isn't there a better way to do it without the for loop?
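One loop-free option, assuming the predictor consumes the concatenation of the encoded observation and an action encoding: pair every observation with every action, fold the action axis into the batch axis, and run the predictor once. The sketch uses NumPy, a linear map as a stand-in for the predictor network, and one-hot actions (an assumption; the PR appears to append a single action value). `predict_loop` and `predict_batched` are hypothetical names.

```python
import numpy as np


def predict_loop(features: np.ndarray, weights: np.ndarray, act_dim: int) -> np.ndarray:
    """Reference version: one predictor call per action, as in the for loop."""
    preds = []
    for a in range(act_dim):
        one_hot = np.zeros((features.shape[0], act_dim))
        one_hot[:, a] = 1.0
        preds.append(np.concatenate([features, one_hot], axis=1) @ weights)
    return np.stack(preds, axis=1)  # (batch, act_dim, out_dim)


def predict_batched(features: np.ndarray, weights: np.ndarray, act_dim: int) -> np.ndarray:
    """Same result with a single predictor call over all (observation, action) pairs."""
    batch, feat_dim = features.shape
    # (batch, act_dim, feat_dim): repeat each encoded observation once per action.
    tiled = np.broadcast_to(features[:, None, :], (batch, act_dim, feat_dim))
    # (batch, act_dim, act_dim): row a is the one-hot vector for action a.
    one_hots = np.broadcast_to(np.eye(act_dim), (batch, act_dim, act_dim))
    inputs = np.concatenate([tiled, one_hots], axis=-1)
    # Fold the action axis into the batch axis, apply the predictor once, unfold.
    flat = inputs.reshape(batch * act_dim, feat_dim + act_dim) @ weights
    return flat.reshape(batch, act_dim, -1)
```

In PyTorch, `Tensor.expand`, `torch.eye`, `torch.cat`, and `reshape` support the same pattern. Another option is to drop the action input and give the predictor a head with one output per action, as DQN does for Q-values.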

arnavkj1995 · 2023-03-20T22:10:10Z

hive/agents/world_models/dyna_models.py

+        # Observations
+        self._obs_encoder = observation_encoder_net(in_dim)
+        obs_predictor_in_dim = (
+            np.prod(calculate_output_dim(self._obs_encoder, in_dim)) + 1


Question: Is the extra dimension of 1 added for the action? I thought actions are generally one-hot encoded for discrete action spaces.
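If the `+ 1` is for a scalar action index, the predictor sees actions as ordered magnitudes (action 2 looks "larger" than action 1). With one-hot actions the input dimension would instead be `np.prod(calculate_output_dim(self._obs_encoder, in_dim)) + self._act_dim`. A minimal NumPy sketch of building such an input; `predictor_input` is a hypothetical helper, not part of the PR:

```python
import numpy as np


def predictor_input(encoded_obs: np.ndarray, actions: np.ndarray, act_dim: int) -> np.ndarray:
    """Concatenate flattened observation features with one-hot actions.

    encoded_obs: (batch, *feature_shape); actions: (batch,) integer indices.
    Returns an array of shape (batch, prod(feature_shape) + act_dim).
    """
    flat = encoded_obs.reshape(encoded_obs.shape[0], -1)
    one_hot = np.eye(act_dim)[actions]  # (batch, act_dim)
    return np.concatenate([flat, one_hot], axis=1)


encoded = np.zeros((2, 4, 3))  # two observations with 4 x 3 encoder features
x = predictor_input(encoded, np.array([0, 2]), act_dim=3)
# x has 12 feature columns followed by 3 one-hot action columns.
```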

Add the deep dyna-q agent

849c9df

alirahkay requested review from arnavkj1995 and dapatil211 January 31, 2023 19:16

dapatil211 requested changes Feb 6, 2023

Fix the code styling

e33f436

arnavkj1995 reviewed Mar 20, 2023

arnavkj1995 requested changes Mar 20, 2023
