Merge pull request #57 from Eclectic-Sheep/feature/minerl_wrapper

Feature/minerl wrapper

Showing 14 changed files with 870 additions and 3 deletions.
## Install Gymnasium MuJoCo/DMC environments

First, install the proper environments:

- MuJoCo (Gymnasium): no extra packages are needed; the `pip install -e .` command is enough to make available all the MuJoCo environments provided by Gymnasium.
- DMC: you have to install extra packages with the following command: `pip install -e .[dmc]`.

## Install OpenGL rendering backend packages

MuJoCo supports three different OpenGL rendering backends: EGL (headless), GLFW (windowed), and OSMesa (headless).
For each of them, you need to install some packages:

- GLFW: `sudo apt-get install libglfw3 libglew2.0`
- EGL: `sudo apt-get install libglew2.0`
- OSMesa: `sudo apt-get install libgl1-mesa-glx libosmesa6`

To use one of these rendering backends, set the `MUJOCO_GL` environment variable to `"glfw"`, `"egl"`, or `"osmesa"`, respectively.

For more information, see [https://github.com/deepmind/dm_control](https://github.com/deepmind/dm_control) and [https://mujoco.readthedocs.io/en/stable/programming/index.html#using-opengl](https://mujoco.readthedocs.io/en/stable/programming/index.html#using-opengl).

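From Python, the backend can be selected by setting the variable before any MuJoCo-related import (a minimal sketch; `"egl"` is just one of the three values listed above):

```python
import os

# Select the rendering backend before anything MuJoCo-related is imported.
# Valid values, matching the backends above: "glfw", "egl", "osmesa".
os.environ["MUJOCO_GL"] = "egl"

print(os.environ["MUJOCO_GL"])  # egl
```

Setting the variable inside the process only works if it happens before the first MuJoCo import; otherwise, export it in the shell before launching training.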
## MuJoCo Gymnasium

To train your agents on the [MuJoCo environments](https://gymnasium.farama.org/environments/mujoco/) provided by Gymnasium, it is sufficient to set the `env_id` to the name of the environment you want to use. For instance, set it to `"Walker2d-v4"` to train your agent on the *walker* environment.

## DeepMind Control

To train your agents on the [DeepMind control suite](https://github.com/deepmind/dm_control/blob/main/dm_control/suite/README.md), prefix `"dmc_"` to the name of the environment you want to use. A list of the available environments can be found [here](https://arxiv.org/abs/1801.00690). For instance, to train your agent on the *walker walk* task, set the `env_id` to `"dmc_walker_walk"`.
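The naming convention above can be sketched as a small dispatch function (illustrative only: `resolve_env` is not part of the codebase, and the actual routing is done by the environment wrappers):

```python
def resolve_env(env_id: str):
    """Route an env_id to its backend, following the naming convention above."""
    if env_id.startswith("dmc_"):
        # "dmc_walker_walk" -> DMC domain "walker", task "walk"
        domain, task = env_id[len("dmc_"):].split("_", 1)
        return ("dm_control", domain, task)
    # Anything else is treated as a Gymnasium environment id.
    return ("gymnasium", env_id)

print(resolve_env("dmc_walker_walk"))  # ('dm_control', 'walker', 'walk')
print(resolve_env("Walker2d-v4"))      # ('gymnasium', 'Walker2d-v4')
```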
## Install MineDojo environment

First, you need to install JDK 1.8. On Debian-based systems you can run the following:

```bash
sudo apt update -y
sudo apt install -y software-properties-common
sudo add-apt-repository ppa:openjdk-r/ppa
sudo apt update -y
sudo apt install -y openjdk-8-jdk
sudo update-alternatives --config java
```

> **Note**
>
> If you work on another OS, you can follow the instructions [here](https://docs.minedojo.org/sections/getting_started/install.html#on-macos) to install JDK 1.8.

Now you can install the MineDojo environment:

```bash
pip install -e .[minedojo]
```

## MineDojo environments

It is possible to train your agents on all the tasks provided by MineDojo: prefix `"minedojo_"` to the `task_id` of the task on which you want to train your agent and pass it to the `env_id` argument.
For instance, set the `env_id` argument to `"minedojo_open-ended"` to select the MineDojo open-ended environment.

### Observation Space

We slightly modified the observation space by reshaping it (based on the idea proposed by Hafner in [DreamerV3](https://arxiv.org/abs/2301.04104)):

- An inventory vector with one entry for each item of the game, giving the quantity of the corresponding item in the inventory.
- A max inventory vector with one entry for each item, containing the maximum number of items obtained by the agent so far in the episode.
- A delta inventory vector with one entry for each item, containing the difference in the items in the inventory after the performed action.
- The RGB first-person camera image.
- A vector of three elements representing the life, food, and oxygen levels of the agent.
- A one-hot vector indicating the equipped item.
- A mask for the action type, indicating which actions can be executed.
- A mask for the equip/place arguments, indicating which elements can be equipped or placed.
- A mask for the destroy arguments, indicating which items can be destroyed.
- A mask for *craft smelt*, indicating which items can be crafted.

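The three inventory vectors can be sketched as follows (a minimal illustration with a toy number of item types; the names do not match the actual implementation):

```python
def update_inventory_obs(inventory, prev_inventory, max_inventory):
    """Compute the delta and running-max inventory vectors described above."""
    delta = [c - p for c, p in zip(inventory, prev_inventory)]
    new_max = [max(m, c) for m, c in zip(max_inventory, inventory)]
    return delta, new_max

prev = [1, 0, 2, 0]
curr = [0, 3, 2, 0]  # item 0 was consumed, three of item 1 were gathered
delta, max_inv = update_inventory_obs(curr, prev, prev[:])
print(delta)    # [-1, 3, 0, 0]
print(max_inv)  # [1, 3, 2, 0]
```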
### Action Space

We decided to convert the 8-dimensional multi-discrete action space into a 3-dimensional multi-discrete action space: the first dimension maps all the functional actions (movement, craft, jump, camera, attack, ...); the second one maps the argument for the *craft* action; the third one maps the argument for the *equip*, *place*, and *destroy* actions. Moreover, we restrict the look up/down actions between `min_pitch` and `max_pitch` degrees.
In addition, we add the forward action when the agent selects one of the following actions: `jump`, `sprint`, and `sneak`.
Finally, we added sticky actions for the `jump` and `attack` actions. You can set the values of the `sticky_jump` and `sticky_attack` parameters through the `mine_sticky_jump` and `mine_sticky_attack` arguments, respectively. The sticky actions, if set, force the agent to repeat the selected actions for a certain number of steps.

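A sticky action can be sketched as a simple counter (illustrative only; the real wrapper handles this inside the environment step):

```python
class StickyAction:
    """Force a binary action (e.g. jump or attack) to repeat once triggered.

    `num_repeats` plays the role of `mine_sticky_jump` / `mine_sticky_attack`.
    """

    def __init__(self, num_repeats: int):
        self.num_repeats = num_repeats
        self.counter = 0

    def __call__(self, selected: int) -> int:
        if selected:
            # (Re)start the sticky window when the agent picks the action.
            self.counter = self.num_repeats
        if self.counter > 0:
            self.counter -= 1
            return 1  # keep executing the action
        return 0

jump = StickyAction(num_repeats=3)
print([jump(a) for a in [1, 0, 0, 0, 0]])  # [1, 1, 1, 0, 0]
```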
> **Note**
>
> Since the MineDojo environments have a multi-discrete action space, the sticky actions can be easily implemented: the agent will perform the selected action and the sticky actions simultaneously.
>
> The action repeat in the Minecraft environments is set to 1; indeed, it makes no sense to force the agent to repeat an action such as crafting (it may not have enough material for the second action).

## Headless machines

If you work on a headless machine, you need a software renderer. We recommend one of the following solutions:

1. Install the `xvfb` software with the `sudo apt install xvfb` command and prefix the train command with `xvfb-run`. For instance, to train DreamerV2 on the open-ended task on a headless machine, you need to run the following command: `xvfb-run lightning run model --devices=1 sheeprl.py dreamer_v2 --env_id=minedojo_open-ended`, or `MINEDOJO_HEADLESS=1 lightning run model --devices=1 sheeprl.py dreamer_v2 --env_id=minedojo_open-ended`.
2. Exploit the [PyVirtualDisplay](https://github.com/ponty/PyVirtualDisplay) package.
## Install MineRL environment

First, you need to install JDK 1.8. On Debian-based systems you can run the following:

```bash
sudo add-apt-repository ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install openjdk-8-jdk
```

> **Note**
>
> If you work on another OS, you can follow the instructions [here](https://minerl.readthedocs.io/en/v0.4.4/tutorials/index.html) to install JDK 1.8.

Now you can install the MineRL environment:

```bash
pip install -e .[minerl]
```

## MineRL environments

We modified the MineRL environments to have a custom action and observation space. We provide three different tasks:

1. Navigate: set the `env_id` argument to `"minerl_custom_navigate"`.
2. Obtain Iron Pickaxe: set the `env_id` argument to `"minerl_custom_obtain_iron_pickaxe"`.
3. Obtain Diamond: set the `env_id` argument to `"minerl_custom_obtain_diamond"`.

> **Note**
>
> In all these environments, it is possible to have a dense or sparse reward: set the `minerl_dense` argument to `True` for a dense reward, or to `False` otherwise.
>
> In the Navigate task, there is also the possibility to train the agent on an extreme environment (for more info, check [here](https://minerl.readthedocs.io/en/v0.4.4/environments/index.html#minerlnavigateextreme-v0)). To choose whether or not to train the agent on an extreme environment, set the `minerl_extreme` argument to `True` or `False`.
>
> In addition, in all the environments, it is possible to set the break speed multiplier through the `mine_break_speed` argument.

### Observation Space

We slightly modified the observation space by adding the *life stats* (life, food, and oxygen) and reshaping those already present (based on the idea proposed by Hafner in [DreamerV3](https://arxiv.org/abs/2301.04104)):

- An inventory vector with one entry for each item of the game, giving the quantity of the corresponding item in the inventory.
- A max inventory vector with one entry for each item, containing the maximum number of items obtained by the agent so far in the episode.
- The RGB first-person camera image.
- A vector of three elements representing the life, food, and oxygen levels of the agent.
- A one-hot vector indicating the equipped item (only for the *obtain* tasks).
- A scalar indicating the compass angle to the goal location (only for the *navigate* tasks).

### Action Space

We decided to convert the multi-discrete action space into a discrete action space. Moreover, we restrict the look up/down actions between `min_pitch` and `max_pitch` degrees.
In addition, we add the forward action when the agent selects one of the following actions: `jump`, `sprint`, and `sneak`.
Finally, we added sticky actions for the `jump` and `attack` actions. You can set the values of the `sticky_jump` and `sticky_attack` parameters through the `mine_sticky_jump` and `mine_sticky_attack` arguments, respectively. The sticky actions, if set, force the agent to repeat the selected actions for a certain number of steps.

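Converting a multi-discrete space into a single discrete space can be sketched by enumerating every combination of sub-actions (toy sizes below; the real MineRL space is larger and this enumeration is not the actual implementation):

```python
import itertools

# Toy multi-discrete space: 3 movement options, jump on/off, attack on/off.
sizes = [3, 2, 2]

# Every discrete index maps back to exactly one multi-discrete combination.
index_to_action = list(itertools.product(*(range(n) for n in sizes)))

def discrete_to_multi(index: int):
    return index_to_action[index]

print(len(index_to_action))   # 12 discrete actions
print(discrete_to_multi(0))   # (0, 0, 0)
print(discrete_to_multi(11))  # (2, 1, 1)
```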
> **Note**
>
> Since the MineRL environments have a multi-discrete action space, the sticky actions can be easily implemented: the agent will perform the selected action and the sticky actions simultaneously.
>
> The action repeat in the Minecraft environments is set to 1; indeed, it makes no sense to force the agent to repeat an action such as crafting (it may not have enough material for the second action).

## Headless machines

If you work on a headless machine, you need a software renderer. We recommend one of the following solutions:

1. Install the `xvfb` software with the `sudo apt install xvfb` command and prefix the train command with `xvfb-run`. For instance, to train DreamerV2 on the navigate task on a headless machine, you need to run the following command: `xvfb-run lightning run model --devices=1 sheeprl.py dreamer_v2 --env_id=minerl_custom_navigate`.
2. Exploit the [PyVirtualDisplay](https://github.com/ponty/PyVirtualDisplay) package.