Feature/minerl wrapper #57

Merged · 16 commits · Jul 14, 2023
19 changes: 19 additions & 0 deletions README.md
@@ -23,6 +23,21 @@ The algorithms sheeped by sheeprl out-of-the-box are:

and more are coming soon! [Open a PR](https://github.com/Eclectic-Sheep/sheeprl/pulls) if you have any particular request :sheep:


The environments supported by sheeprl are:

| Environment        | Installation command         | More info                                       | Status             |
| ------------------ | ---------------------------- | ----------------------------------------------- | ------------------ |
| Classic Control | `pip install -e .` | | :heavy_check_mark: |
| Box2D | `pip install -e .` | | :heavy_check_mark: |
| Mujoco (Gymnasium) | `pip install -e .` | [how_to/mujoco](./howto/learn_in_dmc.md) | :heavy_check_mark: |
| Atari | `pip install -e .[atari]` | [how_to/atari](./howto/learn_in_atari.md) | :heavy_check_mark: |
| DeepMind Control | `pip install -e .[dmc]` | [how_to/dmc](./howto/learn_in_dmc.md) | :heavy_check_mark: |
| MineRL | `pip install -e .[minerl]` | [how_to/minerl](./howto/learn_in_minerl.md) | :heavy_check_mark: |
| MineDojo | `pip install -e .[minedojo]` | [how_to/minedojo](./howto/learn_in_minedojo.md) | :heavy_check_mark: |
| DIAMBRA | | | :construction: |
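
For example, a minimal sketch of enabling one of the Minecraft environments and launching a run (the command follows the pattern documented in `howto/learn_in_minerl.md`; treat the algorithm choice and flags as illustrative):

```bash
# install MineRL support (do not combine it with the minedojo extra, see the installation notes below)
pip install -e .[minerl]

# train DreamerV2 on the custom Navigate task; xvfb-run is only needed on headless machines
xvfb-run lightning run model --devices=1 sheeprl.py dreamer_v2 --env_id=minerl_custom_navigate
```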


## Why

We want to provide a framework for RL algorithms that is at the same time simple and scalable thanks to Lightning Fabric.
@@ -75,6 +90,8 @@ pip install "sheeprl @ git+https://github.com/Eclectic-Sheep/sheeprl.git"
pip install "sheeprl[atari,mujoco,dev] @ git+https://github.com/Eclectic-Sheep/sheeprl.git"
# or, to install with minedojo environment support, do
pip install "sheeprl[minedojo,dev] @ git+https://github.com/Eclectic-Sheep/sheeprl.git"
# or, to install with minerl environment support, do
pip install "sheeprl[minerl,dev] @ git+https://github.com/Eclectic-Sheep/sheeprl.git"
# or, to install all extras, do
pip install "sheeprl[atari,mujoco,miedojo,dev,test] @ git+https://github.com/Eclectic-Sheep/sheeprl.git"
```
@@ -86,6 +103,8 @@ pip install "sheeprl[atari,mujoco,minedojo,dev,test] @ git+https://github.com/Ec
> If you are on an M-series Mac and encounter an error attributed to box2d-py during install, you need to install SWIG using the instructions shown below.
>
> If you want to install MineDojo environment support, Java JDK 8 is required: you can install it by following the instructions at this [link](https://docs.minedojo.org/sections/getting_started/install.html#on-ubuntu-20-04).
>
> **MineRL** and **MineDojo** environments have **conflicting requirements**, so **DO NOT install them together** with the `pip install -e .[minerl,minedojo]` command. Instead, **install them individually** with either `pip install -e .[minerl]` or `pip install -e .[minedojo]` before running an experiment with the MineRL or MineDojo environment, respectively.

<details>
<summary>Installing SWIG</summary>
9 changes: 6 additions & 3 deletions howto/learn_in_atari.md
@@ -9,9 +9,7 @@ The code for this section is available in `algos/ppo_pixel/ppo_atari.py`.
First we should install the Atari environments with:

```bash
pip install gymnasium[other]
pip install gymnasium[atari]
pip install gymnasium[accept-rom-license]
pip install .[atari]
```

For more information: https://gymnasium.farama.org/environments/atari/
@@ -154,14 +152,19 @@ Options:
--sheeprl_help Show this message and exit.

Commands:
dreamer_v1
dreamer_v2
droq
p2e_dv1
ppo
ppo_atari
ppo_continuous
ppo_decoupled
ppo_pixel_continuous
ppo_recurrent
sac
sac_decoupled
sac_pixel_continuous
```

Once this is done, we are all set. We can now train the model by running:
22 changes: 22 additions & 0 deletions howto/learn_in_dmc.md
@@ -0,0 +1,22 @@
## Install Gymnasium MuJoCo/DMC environments
First you should install the proper environments:

- MuJoCo (Gymnasium): you do not need to install extra packages; the `pip install -e .` command is enough to make available all the MuJoCo environments provided by Gymnasium
- DMC: you have to install extra packages with the following command: `pip install -e .[dmc]`.

## Install OpenGL rendering backend packages

MuJoCo supports three different OpenGL rendering backends: EGL (headless), GLFW (windowed), OSMesa (headless).
For each of them, you need to install some packages:
- GLFW: `sudo apt-get install libglfw3 libglew2.0`
- EGL: `sudo apt-get install libglew2.0`
- OSMesa: `sudo apt-get install libgl1-mesa-glx libosmesa6`

In order to use one of these rendering backends, you need to set the `MUJOCO_GL` environment variable to `"glfw"`, `"egl"`, or `"osmesa"`, respectively.

For more information: [https://github.com/deepmind/dm_control](https://github.com/deepmind/dm_control) and [https://mujoco.readthedocs.io/en/stable/programming/index.html#using-opengl](https://mujoco.readthedocs.io/en/stable/programming/index.html#using-opengl)

## MuJoCo Gymnasium
In order to train your agents on the [MuJoCo environments](https://gymnasium.farama.org/environments/mujoco/) provided by Gymnasium, it is sufficient to set the `env_id` to the name of the environment you want to use. For instance, set it to `"Walker2d-v4"` if you want to train your agent on the *Walker2d* environment.

## DeepMind Control
In order to train your agents on the [DeepMind Control suite](https://github.com/deepmind/dm_control/blob/main/dm_control/suite/README.md), you have to prefix `"dmc_"` to the name of the environment you want to use. A list of the available environments can be found [here](https://arxiv.org/abs/1801.00690). For instance, if you want to train your agent on the *walker walk* environment, you need to set the `env_id` to `"dmc_walker_walk"`.
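
As a sketch, assuming the `lightning run model` entry point used in the other how-to guides (the algorithm choice is illustrative):

```bash
# Gymnasium MuJoCo: plain environment id
lightning run model --devices=1 sheeprl.py dreamer_v2 --env_id=Walker2d-v4

# DeepMind Control: "dmc_" prefix, here with headless EGL rendering
MUJOCO_GL=egl lightning run model --devices=1 sheeprl.py dreamer_v2 --env_id=dmc_walker_walk
```
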
54 changes: 54 additions & 0 deletions howto/learn_in_minedojo.md
@@ -0,0 +1,54 @@
## Install MineDojo environment
First you need to install JDK 1.8; on Debian-based systems you can run the following:

```bash
sudo apt update -y
sudo apt install -y software-properties-common
sudo add-apt-repository ppa:openjdk-r/ppa
sudo apt update -y
sudo apt install -y openjdk-8-jdk
sudo update-alternatives --config java
```

> **Note**
>
> If you work on another OS, you can follow the instructions [here](https://docs.minedojo.org/sections/getting_started/install.html#on-macos) to install JDK 1.8.

Now, you can install the MineDojo environment:

```bash
pip install -e .[minedojo]
```

## MineDojo environments
It is possible to train your agents on all the tasks provided by MineDojo: you need to prefix `"minedojo"` to the `task_id` of the task on which you want to train your agent and pass it to the `env_id` argument.
For instance, you have to set the `env_id` argument to `"minedojo_open-ended"` to select the MineDojo open-ended environment.

### Observation Space
We slightly modified the observation space, by reshaping it (based on the idea proposed by Hafner in [DreamerV3](https://arxiv.org/abs/2301.04104)):
- We represent the inventory with a vector with one entry for each item of the game which gives the quantity of the corresponding item in the inventory.
- A max inventory vector with one entry for each item which contains the maximum number of items obtained by the agent so far in the episode.
- A delta inventory vector with one entry for each item which contains the difference of the items in the inventory after the performed action.
- The RGB first-person camera image.
- A vector of three elements representing the life, the food and the oxygen levels of the agent.
- A one-hot vector indicating the equipped item.
- A mask for the action type indicating which actions can be executed.
- A mask for the equip/place arguments indicating which elements can be equipped or placed.
- A mask for the destroy arguments indicating which items can be destroyed.
- A mask for *craft smelt* indicating which items can be crafted.

### Action Space
We decided to convert the 8-dimensional multi-discrete action space into a 3-dimensional multi-discrete action space: the first dimension maps all the functional actions (movement, craft, jump, camera, attack, ...); the second one maps the argument for the *craft* action; the third one maps the argument for the *equip*, *place*, and *destroy* actions. Moreover, we restrict the look up/down actions between `min_pitch` and `max_pitch` degrees.
In addition, we added the forward action when the agent selects one of the following actions: `jump`, `sprint`, and `sneak`.
Finally, we added sticky actions for the `jump` and `attack` actions. You can set the values of the `sticky_jump` and `sticky_attack` parameters through the `mine_sticky_jump` and `mine_sticky_attack` arguments, respectively. The sticky actions, if set, force the agent to repeat the selected actions for a certain number of steps.
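
For instance, a hedged sketch of overriding these parameters from the command line (the flag names mirror the `DreamerV2Args` fields; the exact CLI syntax is assumed):

```bash
lightning run model --devices=1 sheeprl.py dreamer_v2 --env_id=minedojo_open-ended \
  --mine_sticky_attack=30 --mine_sticky_jump=10
```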

> **Note**
> Since the MineDojo environments have a multi-discrete action space, the sticky actions can be easily implemented. The agent will perform the selected action and the sticky actions simultaneously.
>
> The action repeat in the Minecraft environments is set to 1; indeed, it makes no sense to force the agent to repeat an action such as crafting (it may not have enough material for the second action).

## Headless machines

If you work on a headless machine, you need a software renderer. We recommend one of the following solutions:
1. Install the `xvfb` software with the `sudo apt install xvfb` command and prefix the train command with `xvfb-run`. For instance, to train DreamerV2 on the open-ended task on a headless machine, you need to run the following command: `xvfb-run lightning run model --devices=1 sheeprl.py dreamer_v2 --env_id=minedojo_open-ended`, or `MINEDOJO_HEADLESS=1 lightning run model --devices=1 sheeprl.py dreamer_v2 --env_id=minedojo_open-ended`.
2. Exploit the [PyVirtualDisplay](https://github.com/ponty/PyVirtualDisplay) package.
56 changes: 56 additions & 0 deletions howto/learn_in_minerl.md
@@ -0,0 +1,56 @@
## Install MineRL environment
First you need to install JDK 1.8; on Debian-based systems you can run the following:

```bash
sudo add-apt-repository ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install openjdk-8-jdk
```

> **Note**
>
> If you work on another OS, you can follow the instructions [here](https://minerl.readthedocs.io/en/v0.4.4/tutorials/index.html) to install JDK 1.8.

Now, you can install the MineRL environment:

```bash
pip install -e .[minerl]
```

## MineRL environments
We modified the MineRL environments to have a custom action and observation space. We provide three different tasks:
1. Navigate: you need to set the `env_id` argument to `"minerl_custom_navigate"`.
2. Obtain Iron Pickaxe: you need to set the `env_id` argument to `"minerl_custom_obtain_iron_pickaxe"`.
3. Obtain Diamond: you need to set the `env_id` argument to `"minerl_custom_obtain_diamond"`.

> **Note**
> In all these environments, you can choose whether or not to use a dense reward: set the `minerl_dense` argument to `True` for a dense reward, or to `False` otherwise.
>
> In the Navigate task, you can also choose whether or not to train the agent on an extreme environment (for more info, check [here](https://minerl.readthedocs.io/en/v0.4.4/environments/index.html#minerlnavigateextreme-v0)): set the `minerl_extreme` argument to `True` or `False`.
>
> In addition, in all the environments, it is possible to set the break speed multiplier through the `mine_break_speed` argument, as in the sketch after this note.
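
A minimal sketch of a launch command with these options (the flag names come from `DreamerV2Args`; the exact CLI syntax is assumed):

```bash
lightning run model --devices=1 sheeprl.py dreamer_v2 --env_id=minerl_custom_navigate \
  --minerl_dense=True --minerl_extreme=False --mine_break_speed=100
```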

### Observation Space
We slightly modified the observation space, by adding the *life stats* (life, food and oxygen) and reshaping those already present (based on the idea proposed by Hafner in [DreamerV3](https://arxiv.org/abs/2301.04104)):
- We represent the inventory with a vector with one entry for each item of the game which gives the quantity of the corresponding item in the inventory.
- A max inventory vector with one entry for each item which contains the maximum number of items obtained by the agent so far in the episode.
- The RGB first-person camera image.
- A vector of three elements representing the life, the food and the oxygen levels of the agent.
- A one-hot vector indicating the equipped item, only for the *obtain* tasks.
- A scalar indicating the compass angle to the goal location, only for the *navigate* tasks.

### Action Space
We decided to convert the multi-discrete action space into a discrete action space. Moreover, we restrict the look up/down actions between `min_pitch` and `max_pitch` degrees.
In addition, we added the forward action when the agent selects one of the following actions: `jump`, `sprint`, and `sneak`.
Finally, we added sticky actions for the `jump` and `attack` actions. You can set the values of the `sticky_jump` and `sticky_attack` parameters through the `mine_sticky_jump` and `mine_sticky_attack` arguments, respectively. The sticky actions, if set, force the agent to repeat the selected actions for a certain number of steps.

> **Note**
> Since the MineRL environments have a multi-discrete action space, the sticky actions can be easily implemented. The agent will perform the selected action and the sticky actions simultaneously.
>
> The action repeat in the Minecraft environments is set to 1; indeed, it makes no sense to force the agent to repeat an action such as crafting (it may not have enough material for the second action).

## Headless machines

If you work on a headless machine, you need a software renderer. We recommend one of the following solutions:
1. Install the `xvfb` software with the `sudo apt install xvfb` command and prefix the train command with `xvfb-run`. For instance, to train DreamerV2 on the navigate task on a headless machine, you need to run the following command: `xvfb-run lightning run model --devices=1 sheeprl.py dreamer_v2 --env_id=minerl_custom_navigate`.
2. Exploit the [PyVirtualDisplay](https://github.com/ponty/PyVirtualDisplay) package.
1 change: 1 addition & 0 deletions pyproject.toml
@@ -57,6 +57,7 @@ atari = [
"gymnasium[other]==0.28.*",
]
minedojo = ["minedojo==0.1"]
minerl = ["minerl==0.4.4"]

[tool.ruff]
line-length = 120
6 changes: 6 additions & 0 deletions sheeprl/__init__.py
@@ -23,5 +23,11 @@
except ModuleNotFoundError:
pass

# Needed because MineRL 0.4.4 is not compatible with the latest version of numpy
import numpy as np

np.float = np.float32
np.int = np.int64
np.bool = bool

__version__ = "0.1.0"
5 changes: 5 additions & 0 deletions sheeprl/algos/dreamer_v2/args.py
@@ -108,3 +108,8 @@ class DreamerV2Args(StandardArgs):
mine_start_position: Optional[List[str]] = Arg(
default=None, help="The starting position of the agent in Minecraft environment. (x, y, z, pitch, yaw)"
)
minerl_dense: bool = Arg(default=False, help="whether or not the task has dense reward")
minerl_extreme: bool = Arg(default=False, help="whether or not the task is extreme")
mine_break_speed: int = Arg(default=100, help="the break speed multiplier of Minecraft environments")
mine_sticky_attack: int = Arg(default=30, help="the sticky value for the attack action")
mine_sticky_jump: int = Arg(default=10, help="the sticky value for the jump action")
16 changes: 16 additions & 0 deletions sheeprl/algos/dreamer_v2/utils.py
@@ -85,6 +85,22 @@ def make_env(
start_position=start_position,
)
args.action_repeat = 1
elif "minerl" in _env_id:
from sheeprl.envs.minerl import MineRLWrapper

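# Strip the "minerl_" prefix from the env id to recover the MineRL task id,
# e.g. "minerl_custom_navigate" -> "custom_navigate"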
task_id = "_".join(env_id.split("_")[1:])
env = MineRLWrapper(
task_id,
height=64,
width=64,
pitch_limits=(args.mine_min_pitch, args.mine_max_pitch),
seed=args.seed,
break_speed_multiplier=args.mine_break_speed,
sticky_attack=args.mine_sticky_attack,
sticky_jump=args.mine_sticky_jump,
dense=args.minerl_dense,
extreme=args.minerl_extreme,
)
else:
env_spec = gym.spec(env_id).entry_point
env = gym.make(env_id, render_mode="rgb_array")