Some questions about BEBOLD #12

CrazySssst · 2021-04-28T09:07:19Z

Sorry to bother you again.

I am trying to reproduce BeBold based on your code. My results for RIDE/RND/ICM/Count match those in the RIDE paper.

I then tried to modify your RND agent to implement BeBold, but I cannot reach the performance reported in the BeBold paper.

Could you give me some advice?

I use the following code to calculate the BeBold bonus:

        random_embedding = random_target_network(batch['partial_obs'].to(device=flags.device))
        predicted_embedding = predictor_network(batch['partial_obs'].to(device=flags.device))

        intrinsic_rewards = torch.norm(predicted_embedding.detach() - random_embedding.detach(), dim=2, p=2)

        intrinsic_rewards = intrinsic_rewards[1:] - intrinsic_rewards[:-1]
        # ep_ind is an indicator 
        intrinsic_rewards = torch.clamp(intrinsic_rewards, 0,100000) * ep_ind * (1-dones)

        rnd_loss = flags.rnd_loss_coef * losses.compute_forward_dynamics_loss(predicted_embedding[1:], random_embedding.detach()[1:])
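To make the intent of the snippet above easier to check, here is a minimal pure-Python sketch of the same arithmetic on toy values: the one-step difference in novelty, clipped at zero, then masked. It uses plain lists instead of tensors, and the function name and its arguments are hypothetical stand-ins, not names from the repository. It also assumes that `ep_ind` is an episodic first-visit indicator, which the snippet does not spell out.

```python
def bebold_style_bonus(novelty, first_visit, dones):
    """Clipped one-step novelty difference, masked per step.

    novelty:     T+1 per-state novelty values (e.g. RND prediction error)
    first_visit: T values, 1.0 if the next state is new in this episode
                 (assumed meaning of `ep_ind`)
    dones:       T values, 1.0 if the episode ended at that step
    """
    if not (len(novelty) - 1 == len(first_visit) == len(dones)):
        raise ValueError("expected T+1 novelty values and T mask values")

    bonus = []
    for t in range(len(dones)):
        # Difference between the novelty of the next state and the current one.
        diff = novelty[t + 1] - novelty[t]
        # Only reward moving to a more novel state.
        clipped = max(diff, 0.0)
        # Zero the bonus on revisits and across episode boundaries.
        bonus.append(clipped * first_visit[t] * (1.0 - dones[t]))
    return bonus


# Toy example: 4 states, hence 3 transitions.
novelty = [0.5, 2.0, 1.0, 3.0]
first_visit = [1.0, 1.0, 0.0]
dones = [0.0, 0.0, 0.0]
rewards = bebold_style_bonus(novelty, first_visit, dones)
print(rewards)
```

Note that T+1 novelty values yield only T bonuses, so the masks and anything else multiplied with the bonus need to cover the same T transitions.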

Here are the hyperparameters:

    "args": {
        "alpha": 0.99,
        "baseline_cost": 0.5,
        "batch_size": 32,
        "checkpoint_num_frames": 10000000,
        "disable_checkpoint": false,
        "disable_cuda": false,
        "discounting": 0.99,
        "entropy_cost": 0.0005,
        "env": "MiniGrid-KeyCorridorS4R3-v0",
        "env_seed": 1,
        "epsilon": 1e-05,
        "fix_seed": false,
        "forward_loss_coef": 10.0,
        "intrinsic_reward_coef": 0.1,
        "inverse_loss_coef": 0.1,
        "learning_rate": 0.0001,
        "max_grad_norm": 40.0,
        "model": "bebold_count_rnd",
        "momentum": 0,
        "no_reward": false,
        "num_actors": 40,
        "num_buffers": 80,
        "num_input_frames": 1,
        "num_threads": 4,
        "queue_timeout": 1,
        "rnd_loss_coef": 0.1,
        "run_id": 0,
        "save_interval": 10000000,
        "seed": 0,
        "total_frames": 40000000,
        "unroll_length": 100,
        "use_fullobs_intrinsic": false,
        "use_fullobs_policy": false
    }

Thanks in advance
