Merge pull request #57 from Eclectic-Sheep/feature/minerl_wrapper

Feature/minerl wrapper

Showing 14 changed files with 870 additions and 3 deletions.
## Install Gymnasium MuJoCo/DMC environments

First, install the proper environments:

- MuJoCo (Gymnasium): no extra packages are needed; the `pip install -e .` command is enough to make available all the MuJoCo environments provided by Gymnasium.
- DMC: you have to install extra packages with the following command: `pip install -e .[dmc]`.

## Install OpenGL rendering backend packages

MuJoCo supports three different OpenGL rendering backends: EGL (headless), GLFW (windowed), and OSMesa (headless).
For each of them, you need to install some packages:

- GLFW: `sudo apt-get install libglfw3 libglew2.0`
- EGL: `sudo apt-get install libglew2.0`
- OSMesa: `sudo apt-get install libgl1-mesa-glx libosmesa6`

To use one of these rendering backends, set the `MUJOCO_GL` environment variable to `"glfw"`, `"egl"`, or `"osmesa"`, respectively.

For more information, see [https://github.com/deepmind/dm_control](https://github.com/deepmind/dm_control) and [https://mujoco.readthedocs.io/en/stable/programming/index.html#using-opengl](https://mujoco.readthedocs.io/en/stable/programming/index.html#using-opengl).

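From Python, the backend can be selected by setting the variable before any MuJoCo-related import (a minimal sketch; `"egl"` is just one of the three values listed above):

```python
import os

# Select the rendering backend before anything MuJoCo-related is imported.
# Valid values, matching the backends above: "glfw", "egl", "osmesa".
os.environ["MUJOCO_GL"] = "egl"

print(os.environ["MUJOCO_GL"])  # egl
```

Setting the variable inside the process only works if it happens before the first MuJoCo import; otherwise, export it in the shell before launching training.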
## MuJoCo Gymnasium

To train your agents on the [MuJoCo environments](https://gymnasium.farama.org/environments/mujoco/) provided by Gymnasium, it is sufficient to set the `env_id` to the name of the environment you want to use. For instance, set it to `"Walker2d-v4"` to train your agent on the *walker* environment.

## DeepMind Control

To train your agents on the [DeepMind control suite](https://github.com/deepmind/dm_control/blob/main/dm_control/suite/README.md), prefix `"dmc_"` to the name of the environment you want to use. A list of the available environments can be found [here](https://arxiv.org/abs/1801.00690). For instance, to train your agent on the *walker walk* task, set the `env_id` to `"dmc_walker_walk"`.
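The naming convention above can be sketched as a small dispatch function (illustrative only: `resolve_env` is not part of the codebase, and the actual routing is done by the environment wrappers):

```python
def resolve_env(env_id: str):
    """Route an env_id to its backend, following the naming convention above."""
    if env_id.startswith("dmc_"):
        # "dmc_walker_walk" -> DMC domain "walker", task "walk"
        domain, task = env_id[len("dmc_"):].split("_", 1)
        return ("dm_control", domain, task)
    # Anything else is treated as a Gymnasium environment id.
    return ("gymnasium", env_id)

print(resolve_env("dmc_walker_walk"))  # ('dm_control', 'walker', 'walk')
print(resolve_env("Walker2d-v4"))      # ('gymnasium', 'Walker2d-v4')
```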
## Install MineDojo environment

First, you need to install JDK 1.8. On Debian-based systems you can run the following:

```bash
sudo apt update -y
sudo apt install -y software-properties-common
sudo add-apt-repository ppa:openjdk-r/ppa
sudo apt update -y
sudo apt install -y openjdk-8-jdk
sudo update-alternatives --config java
```

> **Note**
>
> If you work on another OS, you can follow the instructions [here](https://docs.minedojo.org/sections/getting_started/install.html#on-macos) to install JDK 1.8.

Now you can install the MineDojo environment:

```bash
pip install -e .[minedojo]
```

## MineDojo environments

It is possible to train your agents on all the tasks provided by MineDojo: prefix `"minedojo_"` to the `task_id` of the task on which you want to train your agent and pass it to the `env_id` argument.
For instance, set the `env_id` argument to `"minedojo_open-ended"` to select the MineDojo open-ended environment.

### Observation Space

We slightly modified the observation space by reshaping it (based on the idea proposed by Hafner in [DreamerV3](https://arxiv.org/abs/2301.04104)):

- An inventory vector with one entry for each item of the game, giving the quantity of the corresponding item in the inventory.
- A max inventory vector with one entry for each item, containing the maximum number of items obtained by the agent so far in the episode.
- A delta inventory vector with one entry for each item, containing the difference in the items in the inventory after the performed action.
- The RGB first-person camera image.
- A vector of three elements representing the life, food, and oxygen levels of the agent.
- A one-hot vector indicating the equipped item.
- A mask for the action type, indicating which actions can be executed.
- A mask for the equip/place arguments, indicating which elements can be equipped or placed.
- A mask for the destroy arguments, indicating which items can be destroyed.
- A mask for *craft smelt*, indicating which items can be crafted.

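The three inventory vectors can be sketched as follows (a minimal illustration with a toy number of item types; the names do not match the actual implementation):

```python
def update_inventory_obs(inventory, prev_inventory, max_inventory):
    """Compute the delta and running-max inventory vectors described above."""
    delta = [c - p for c, p in zip(inventory, prev_inventory)]
    new_max = [max(m, c) for m, c in zip(max_inventory, inventory)]
    return delta, new_max

prev = [1, 0, 2, 0]
curr = [0, 3, 2, 0]  # item 0 was consumed, three of item 1 were gathered
delta, max_inv = update_inventory_obs(curr, prev, prev[:])
print(delta)    # [-1, 3, 0, 0]
print(max_inv)  # [1, 3, 2, 0]
```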
### Action Space

We decided to convert the 8-dimensional multi-discrete action space into a 3-dimensional multi-discrete action space: the first dimension maps all the functional actions (movement, craft, jump, camera, attack, ...); the second one maps the argument for the *craft* action; the third one maps the argument for the *equip*, *place*, and *destroy* actions. Moreover, we restrict the look up/down actions between `min_pitch` and `max_pitch` degrees.
In addition, we add the forward action when the agent selects one of the following actions: `jump`, `sprint`, and `sneak`.
Finally, we added sticky actions for the `jump` and `attack` actions. You can set the values of the `sticky_jump` and `sticky_attack` parameters through the `mine_sticky_jump` and `mine_sticky_attack` arguments, respectively. The sticky actions, if set, force the agent to repeat the selected actions for a certain number of steps.

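A sticky action can be sketched as a simple counter (illustrative only; the real wrapper handles this inside the environment step):

```python
class StickyAction:
    """Force a binary action (e.g. jump or attack) to repeat once triggered.

    `num_repeats` plays the role of `mine_sticky_jump` / `mine_sticky_attack`.
    """

    def __init__(self, num_repeats: int):
        self.num_repeats = num_repeats
        self.counter = 0

    def __call__(self, selected: int) -> int:
        if selected:
            # (Re)start the sticky window when the agent picks the action.
            self.counter = self.num_repeats
        if self.counter > 0:
            self.counter -= 1
            return 1  # keep executing the action
        return 0

jump = StickyAction(num_repeats=3)
print([jump(a) for a in [1, 0, 0, 0, 0]])  # [1, 1, 1, 0, 0]
```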
> **Note**
>
> Since the MineDojo environments have a multi-discrete action space, the sticky actions can be easily implemented: the agent will perform the selected action and the sticky actions simultaneously.
>
> The action repeat in the Minecraft environments is set to 1; indeed, it makes no sense to force the agent to repeat an action such as crafting (it may not have enough material for the second action).

## Headless machines

If you work on a headless machine, you need a software renderer. We recommend one of the following solutions:

1. Install the `xvfb` software with the `sudo apt install xvfb` command and prefix the train command with `xvfb-run`. For instance, to train DreamerV2 on the open-ended task on a headless machine, you need to run the following command: `xvfb-run lightning run model --devices=1 sheeprl.py dreamer_v2 --env_id=minedojo_open-ended`, or `MINEDOJO_HEADLESS=1 lightning run model --devices=1 sheeprl.py dreamer_v2 --env_id=minedojo_open-ended`.
2. Exploit the [PyVirtualDisplay](https://github.com/ponty/PyVirtualDisplay) package.
## Install MineRL environment

First, you need to install JDK 1.8. On Debian-based systems you can run the following:

```bash
sudo add-apt-repository ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install openjdk-8-jdk
```

> **Note**
>
> If you work on another OS, you can follow the instructions [here](https://minerl.readthedocs.io/en/v0.4.4/tutorials/index.html) to install JDK 1.8.

Now you can install the MineRL environment:

```bash
pip install -e .[minerl]
```

## MineRL environments

We modified the MineRL environments to have a custom action and observation space. We provide three different tasks:

1. Navigate: set the `env_id` argument to `"minerl_custom_navigate"`.
2. Obtain Iron Pickaxe: set the `env_id` argument to `"minerl_custom_obtain_iron_pickaxe"`.
3. Obtain Diamond: set the `env_id` argument to `"minerl_custom_obtain_diamond"`.

> **Note**
>
> In all these environments, it is possible to have a dense or sparse reward: set the `minerl_dense` argument to `True` for a dense reward, or to `False` otherwise.
>
> In the Navigate task, there is also the possibility to train the agent on an extreme environment (for more info, check [here](https://minerl.readthedocs.io/en/v0.4.4/environments/index.html#minerlnavigateextreme-v0)). To choose whether or not to train the agent on an extreme environment, set the `minerl_extreme` argument to `True` or `False`.
>
> In addition, in all the environments, it is possible to set the break speed multiplier through the `mine_break_speed` argument.

### Observation Space

We slightly modified the observation space by adding the *life stats* (life, food, and oxygen) and reshaping those already present (based on the idea proposed by Hafner in [DreamerV3](https://arxiv.org/abs/2301.04104)):

- An inventory vector with one entry for each item of the game, giving the quantity of the corresponding item in the inventory.
- A max inventory vector with one entry for each item, containing the maximum number of items obtained by the agent so far in the episode.
- The RGB first-person camera image.
- A vector of three elements representing the life, food, and oxygen levels of the agent.
- A one-hot vector indicating the equipped item (only for the *obtain* tasks).
- A scalar indicating the compass angle to the goal location (only for the *navigate* tasks).

### Action Space

We decided to convert the multi-discrete action space into a discrete action space. Moreover, we restrict the look up/down actions between `min_pitch` and `max_pitch` degrees.
In addition, we add the forward action when the agent selects one of the following actions: `jump`, `sprint`, and `sneak`.
Finally, we added sticky actions for the `jump` and `attack` actions. You can set the values of the `sticky_jump` and `sticky_attack` parameters through the `mine_sticky_jump` and `mine_sticky_attack` arguments, respectively. The sticky actions, if set, force the agent to repeat the selected actions for a certain number of steps.

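Converting a multi-discrete space into a single discrete space can be sketched by enumerating every combination of sub-actions (toy sizes below; the real MineRL space is larger and this enumeration is not the actual implementation):

```python
import itertools

# Toy multi-discrete space: 3 movement options, jump on/off, attack on/off.
sizes = [3, 2, 2]

# Every discrete index maps back to exactly one multi-discrete combination.
index_to_action = list(itertools.product(*(range(n) for n in sizes)))

def discrete_to_multi(index: int):
    return index_to_action[index]

print(len(index_to_action))   # 12 discrete actions
print(discrete_to_multi(0))   # (0, 0, 0)
print(discrete_to_multi(11))  # (2, 1, 1)
```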
> **Note**
>
> Since the MineRL environments have a multi-discrete action space, the sticky actions can be easily implemented: the agent will perform the selected action and the sticky actions simultaneously.
>
> The action repeat in the Minecraft environments is set to 1; indeed, it makes no sense to force the agent to repeat an action such as crafting (it may not have enough material for the second action).

## Headless machines

If you work on a headless machine, you need a software renderer. We recommend one of the following solutions:

1. Install the `xvfb` software with the `sudo apt install xvfb` command and prefix the train command with `xvfb-run`. For instance, to train DreamerV2 on the navigate task on a headless machine, you need to run the following command: `xvfb-run lightning run model --devices=1 sheeprl.py dreamer_v2 --env_id=minerl_custom_navigate`.
2. Exploit the [PyVirtualDisplay](https://github.com/ponty/PyVirtualDisplay) package.