dv3 Imagination notebook re-visited #285

Closed
anthony0727 opened this issue May 12, 2024 · 6 comments · Fixed by #290

Comments


anthony0727 commented May 12, 2024

I can't find a button to reopen the issue. Regarding #272, I have two questions:

  1. Shouldn't the initial stochastic_state be flattened, i.e.
stochastic_state = player.stochastic_state.view(1, 1, -1).clone()

in

    if i == initial_steps - imagination_steps - 1:
        stochastic_state = player.stochastic_state.clone()
        recurrent_state = player.recurrent_state.clone()
  2. Shouldn't 0.5 also be added to the imagined reconstructed observations (see the combined sketch after the snippet below)? I.e.
        step_data["rgb"] = rec_obs["rgb"].unsqueeze(0).detach().cpu().numpy() + 0.5

in

        # imagination step
        stochastic_state, recurrent_state = world_model.rssm.imagination(stochastic_state, recurrent_state, actions)
        # update current state
        imagined_latent_states = torch.cat((stochastic_state.view(1, 1, -1), recurrent_state), -1)
        rec_obs = world_model.observation_model(imagined_latent_states)
        step_data["rgb"] = rec_obs["rgb"].unsqueeze(0).detach().cpu().numpy()

anthony0727 commented May 12, 2024

  3. Also, -0.5 is missing here. This is quite critical in the test phase: when testing with the notebook, I couldn't get a return similar to the one seen during training, and this was the main cause:
                preprocessed_obs[k] = preprocessed_obs[k] / 255.0 - 0.5

in

            preprocessed_obs[k] = torch.as_tensor(v[np.newaxis], dtype=torch.float32, device=fabric.device)
            if k in cfg.algo.cnn_keys.encoder:
                preprocessed_obs[k] = preprocessed_obs[k] / 255.0 
        mask = {k: v for k, v in preprocessed_obs.items() if k.startswith("mask")}
  4. Shouldn't the fixes below be made to align the buffers' timeline? I'm a bit confused about this, though. I think -1 for the actions is correct:
            # actions actually played by the agent
            actions = torch.tensor(
                rb_initial["actions"][-imagination_steps + i - 1], # (anthony) subtract 1
                device=fabric.device,
                dtype=torch.float32,
            )[None]

I thought the original -imagination_steps + i would be correct together with -1 for the actions, but instead I got the desired trajectory with -imagination_steps + i + 1... confusing (see the sketch after the snippet below for the indexing I ended up with).

        # reconstruct the observations from the latent states used when interacting with the environment
        played_latent_states = torch.cat(
            (
                torch.tensor(rb_initial["stochastic_state"][-imagination_steps + i + 1], device=fabric.device), # (anthony) add 1
                torch.tensor(rb_initial["recurrent_state"][-imagination_steps + i + 1], device=fabric.device), # (anthony) add 1
            ),
            -1,
        )

@anthony0727

With the changes above, I now get the desired performance & results!


michele-milesi commented May 13, 2024

Hi @anthony0727,

  1. The player's stochastic state is already flattened, as you can see here:
    self.stochastic_state = stochastic_state.reshape(1, self.num_envs, -1)
    or
    self.stochastic_state = self.stochastic_state.view(
  2. You are right, the reconstructed observation should be incremented by 0.5
  3. Same as 2.: we modified Dreamer, and these two modifications escaped us.
  4. +1 is correct because there is a mismatch between actions and latent states (due to the inclusion of the initial latent state in the buffer). The problem with +1 is that the last action is the one in position 0.

I will fix these problems as soon as possible, thanks for reporting these.

EDIT:
4. As it stands, latent states at index i are generated by the observation at index i-1 and are used to generate the action at index i-1. This is because the initial latent state is buffered. I will modify it because it only creates confusion.

@belerico

@anthony0727 @michele-milesi thank you both!
Effectively, we save the stochastic and recurrent states that refer to the previous action instead of the ones that have just been generated.

michele-milesi linked a pull request (#290) on May 20, 2024 that will close this issue.
@michele-milesi

Hi @anthony0727,
we created a branch to fix this issue; can you check whether it works? (https://github.com/Eclectic-Sheep/sheeprl/tree/fix/dv3-imagination-notebook)

Thanks

@anthony0727

I'll look into it & the new dv3 soon, after my deadline!!
