
[Feature] Add support for Unity MLAgents environments. #1201

Closed
wants to merge 148 commits

Conversation

hyerra
Contributor

@hyerra hyerra commented May 29, 2023

Description

This adds support for Unity MLAgents environments, a popular way of creating games for reinforcement learning settings.

This pull request is definitely still a work in progress, and I would love feedback/suggestions on how to make it better. I still have to write unit tests and update the documentation, but I wanted to get some general input on this work before starting on those.

The code right now should work well for single-agent environments. However, we need support for heterogeneous spaces for this to work well with multiple agents (or even a single agent with different types of observations). Ideally, we would need some kind of Tuple support.

A big limitation of this code is that in multi-agent reinforcement learning settings, all agents must have the same type of observation and action space, because the spec created in _make_specs must be fixed. It would be nice if specs could be defined on a per-agent level to fix this; I'm open to other suggestions as well. One other important point: the number of agents is not necessarily fixed. New agents may be spawned, and agents sometimes do not request a decision at a given timestep (so the number of agents requesting decisions can vary over time). This is an issue only for multi-agent settings, not single-agent ones.

Motivation and Context

Why is this change required? What problem does it solve?
Closes #1110

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

  • New feature (non-breaking change which adds core functionality)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 29, 2023
@hyerra
Contributor Author

hyerra commented May 29, 2023

cc: @vmoens @matteobettini

@vmoens
Contributor

vmoens commented May 29, 2023

Thanks so much for this contribution, really appreciated.

A few high level remarks/questions:

  • A quick description of how to install MLAgents would be awesome. Along the same lines: how difficult would it be to write a dedicated CI workflow to test environment execution? Is there a Docker image or anything we can use?
  • The specs can be stacked together even if they are heterogeneous.
    This feature needs a bit of polishing, but I believe this is exactly the kind of use case we need to make it work for, by fireproofing it!
    Here's an example:
from torchrl.data import CompositeSpec, UnboundedContinuousTensorSpec as Spec
import torch

spec1 = CompositeSpec(obs=Spec(shape=(3,)))
spec2 = CompositeSpec(obs=Spec(shape=(4,)))
s = torch.stack([spec1, spec2], 0)
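
For instance (a small extension of the snippet above; exact lazy-stack behaviour may vary by torchrl version), indexing the stacked spec recovers the heterogeneous components:

print(s[0])  # CompositeSpec with "obs" of shape (3,)
print(s[1])  # CompositeSpec with "obs" of shape (4,)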

The code will surely break here and there when playing with collectors and the like, but nothing that can't be fixed! We can work on that; I know @matteobettini is also looking forward to this feature.

  • For varying numbers of agents, I'm not sure how to go about that. In the past, I worked in contexts where a simple mask could be used. You would have
>>> env.step(tensordict)
>>> tensordict["next", "valid_mask"]
Tensor([True, False, True])

which would say that agents 0 and 2 are present, but agent 1 isn't. Then you would have

>>> tensordict["next", "agent_feature"]
TensorDict({
    "observation": ...
}, batch_size=[3])

which contains all the features on a per-agent basis, and you can recover the valid ones with

>>> tensordict["next", "agent_feature"][tensordict["next", "valid_mask"]]
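
A minimal runnable sketch of this masking pattern (the keys "valid_mask" and "agent_feature" are illustrative, not a fixed torchrl API):

import torch
from tensordict import TensorDict

n_agents = 3
next_td = TensorDict(
    {
        "valid_mask": torch.tensor([True, False, True]),
        "agent_feature": TensorDict(
            {"observation": torch.randn(n_agents, 4)},
            batch_size=[n_agents],
        ),
    },
    batch_size=[],
)

# Keep only the agents that actually requested a decision:
valid = next_td["agent_feature"][next_td["valid_mask"]]  # batch_size [2]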

@matteobettini
Contributor

This is extremely exciting, thank you!

Yes, there is a lot of meat to discuss for multi-agent.

  • As @vmoens said, yes, the idea for heterogeneous spaces is to hide the heterogeneity behind a lazy stack of tensors (a list). This is high priority for us.

  • For agents that do not participate in certain steps, we can use a mask as @vmoens said. Or we could go a step further and carry only the tensors of the participating agents.

For example, let's say we have an env with 5 agents. We would normally transport tensors (or lazy stacks) of size [*B, 5, *F]. Now, if at a certain step only 2 agents partake, we could carry only [*B, 2, *F] and have a function in the env that tells us which agents those 2 correspond to (see the sketch after this list). Wdyt?

  • For agents being added and removed during execution, I don't know how we would handle this, as torchrl specs are static.
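
A hypothetical sketch of the "carry only the participating agents" idea (all names here are made up for illustration):

import torch

n_agents, feat = 5, 8
participating = torch.tensor([1, 3])         # indices of the agents that took a step
obs = torch.randn(len(participating), feat)  # shape [2, F] instead of [5, F]

def expand_to_all_agents(values, index, n_agents):
    # Scatter the compact per-agent tensor back into an [n_agents, F] layout.
    out = torch.zeros(n_agents, values.shape[-1], dtype=values.dtype)
    out[index] = values
    return out

full_obs = expand_to_all_agents(obs, participating, n_agents)  # [5, F], zeros for absentees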

@hyerra
Contributor Author

hyerra commented May 30, 2023

Thanks for the feedback everyone, I'll look into this soon and get back to you

@vmoens vmoens changed the title Add support for Unity MLAgents environments. [Feature] Add support for Unity MLAgents environments. Jun 2, 2023
@hyerra
Contributor Author

hyerra commented Jun 24, 2023

Hi @vmoens @matteobettini, thanks for the feedback and sorry for the delay! I've addressed some of the issues, and I also have a question about the heterogeneous specs via stacking. I've pushed my latest changes so you can take a look. First, though, to answer some of the questions:

  1. Regarding testing, we could use some of the example environments provided by MLAgents: something like unit tests plus integration tests built on those environments. As for the installation description, would you like to see it in the documentation, or would you prefer I describe it here?
  2. For the stacking, I didn't realize that! I worked on implementing it but ran into some issues, described below.
  3. For the varying number of agents, I implemented what you suggested. Essentially, the environment now behaves like a parallel environment: all agents' observations are recorded and sent to the user at once, rather than one at a time, accompanied by a valid mask that tells which agents requested a decision. The only requirement now is that all agents MUST request a decision at the first timestep, so that _make_specs can build the proper specs for all agents. After the first timestep, agents do not need to request a decision, and the valid mask filters which agents actually did. Regarding @matteobettini's suggestion of carrying the tensors forward: the problem is that you can only provide actions to agents that requested a decision at that timestep, and any agent that requested a decision but receives no action gets the default 0 action. This can lead to unexpected behavior: if Agent 1 requests a decision at timestep 1 and Agent 2 requests one at timestep 2, we wouldn't be able to supply Agent 1's action at timestep 2 (it didn't request a decision there), and Agent 1 would be given the 0 action, which might not be desired.

Regarding the heterogeneous specs, I'm getting a really weird issue that I've been stuck on; I'm not sure if you have any insights. Essentially, in _make_specs there's this piece of code:

agent_observation_specs = [
    _unity_to_torchrl_spec_transform(
        spec, dtype=np.dtype("float32"), device=self.device
    )
    for spec in behavior_unity_spec.observation_specs
]
agent_observation_spec = torch.stack(agent_observation_specs, dim=0)
observation_specs[agent_id] = agent_observation_spec

Using this code produces the following error:

AttributeError: 'TransformedEnv' object has no attribute 'action_spec'. Did you mean: 'action_key'?

However, if I change the code slightly so that I include every observation besides the last one, it works. Namely, I need to change the code to:

agent_observation_specs = [
    _unity_to_torchrl_spec_transform(
        spec, dtype=np.dtype("float32"), device=self.device
    )
    for spec in behavior_unity_spec.observation_specs
]
agent_observation_spec = torch.stack(agent_observation_specs[:-1], dim=0)
observation_specs[agent_id] = agent_observation_spec

Notice the [:-1] on the second-to-last line. Once this change is made and the last observation is ignored, I don't get that error and the code works with a random policy. Importantly, the first 3 observations in my case have the same shape, while the last observation has a different shape. Here is a log of the observation specs:

[UnboundedContinuousTensorSpec(
    shape=torch.Size([128, 128, 3]),
    space=None,
    device=cpu,
    dtype=torch.float32,
    domain=continuous),
UnboundedContinuousTensorSpec(
    shape=torch.Size([128, 128, 3]),
    space=None,
    device=cpu,
    dtype=torch.float32,
    domain=continuous),
UnboundedContinuousTensorSpec(
    shape=torch.Size([128, 128, 3]),
    space=None,
    device=cpu,
    dtype=torch.float32,
    domain=continuous),
UnboundedContinuousTensorSpec(
    shape=torch.Size([4]),
    space=None,
    device=cpu,
    dtype=torch.float32,
    domain=continuous)]

I'm not sure how this relates to the action spec error above though and have been having some trouble debugging that.

@hyerra hyerra force-pushed the main branch 2 times, most recently from e2466bd to 47df231 Compare June 25, 2023 14:32
@hyerra
Contributor Author

hyerra commented Jun 28, 2023

@vmoens just wanted to follow up in case this might have gotten lost over the weekend. Also, I noticed in #1027 that the VMAS environment was updated to use nested keys, e.g. Agent -> Observation, Agent -> Action. Should we do something similar here?

@matteobettini
Contributor

Hey @hyerra, sorry for the delay (I am just starting to work here with Vincent). I can help you with this. Feel free to message me via email if you want to discuss more.

Basically, regarding your last comment: yes! We are moving to that format for multi-agent. As you noticed, all the keys that have the agent dimension will live under the "agents" nested key. You can look at the vmas.py file in #1027 to see what I mean, and conform Unity ML to that. If a key is shared among all agents (as done and reward sometimes are), it stays in the normal place and not under "agents".

With regards to your problem with stacking, here is our view on stacking heterogeneous tensors and specs:
Tensors or specs can be stacked under the same key (or lazy stack) when they are semantically similar (i.e. they typically come from similar sources and can be processed by the same neural network).

For example, you can do

import torch
from torchrl.data import UnboundedContinuousTensorSpec

torch.stack(
    [
        UnboundedContinuousTensorSpec(
            shape=torch.Size([128, 128, 3]), device="cpu", dtype=torch.float32
        ),
        UnboundedContinuousTensorSpec(
            shape=torch.Size([64, 64, 3]), device="cpu", dtype=torch.float32
        ),
    ],
    dim=0,
)

because these (e.g. images) are semantically similar.
In practice (in the code) we consider things to be semantically similar when they have the same number of dims and the same dtype.

In your case, the last spec is dramatically different from the first 3: the first 3 are likely images, while the last is just a small vector. Therefore you should stack them under different keys, with something like

import torch
from torchrl.data import CompositeSpec, UnboundedContinuousTensorSpec

image_spec = UnboundedContinuousTensorSpec(
    shape=torch.Size([128, 128, 3]), device="cpu", dtype=torch.float32
)
vec_spec = UnboundedContinuousTensorSpec(
    shape=torch.Size([4]), device="cpu", dtype=torch.float32
)

stack = torch.stack(
    [
        CompositeSpec(image=image_spec),
        CompositeSpec(image=image_spec),
        CompositeSpec(image=image_spec),
        CompositeSpec(vec=vec_spec),
    ],
    dim=0,
)

Then stack[2] will retrieve an image spec and stack[-1] will retrieve a vec spec.

Does this make sense?

I am down to also schedule a brief call if you would like further help

@hyerra
Contributor Author

hyerra commented Jun 29, 2023

Hi @matteobettini, no worries, thanks for the help! I think your feedback makes sense; maybe I can try to fix it, and if we're still running into difficulties we could hop on a call?

Just to make sure I understand: individual agent observations go under an agents key and become nested, while things that are shared between all agents remain at the top level.

Also, with regards to the second point about having different keys: I can't necessarily predetermine which index has different observations or know which ones will be similar. For my own environments I could; however, everyone else can customize their observations. So in someone else's environment, they might decide to have 4 images, some vector-based observations, and maybe a Ray sensor (MLAgents has a lot!).

In this case, would you recommend treating each observation as drastically different, using the keys "observation_1", "observation_2", etc., and stacking them like that?

@matteobettini
Contributor

matteobettini commented Jun 30, 2023

Hi @matteobettini, no worries, thanks for the help! I think your feedback makes sense; maybe I can try to fix it, and if we're still running into difficulties we could hop on a call?

Sure

Just to make sure I understand: individual agent observations go under an agents key and become nested, while things that are shared between all agents remain at the top level.

Yes; that is because tensors which are shared will have one dimension fewer, while the "agents" tensordict will have a batch_size with a dimension corresponding to the number of agents.
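
A small sketch of that layout (shapes and key names are illustrative, following the VMAS-style convention discussed above):

import torch
from tensordict import TensorDict

n_agents = 3
td = TensorDict(
    {
        # shared entries live at the root and have no agent dimension
        "done": torch.zeros(1, dtype=torch.bool),
        # per-agent entries live under "agents", whose batch_size carries the agent dimension
        "agents": TensorDict(
            {
                "observation": torch.randn(n_agents, 16),
                "reward": torch.zeros(n_agents, 1),
            },
            batch_size=[n_agents],
        ),
    },
    batch_size=[],
)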

Also, with regards to the second point about having different keys: I can't necessarily predetermine which index has different observations or know which ones will be similar. For my own environments I could; however, everyone else can customize their observations. So in someone else's environment, they might decide to have 4 images, some vector-based observations, and maybe a Ray sensor (MLAgents has a lot!).

In this case, would you recommend treating each observation as drastically different, using the keys "observation_1", "observation_2", etc., and stacking them like that?

Here are some options I can think of:

  • One option would be to try to identify the sensor used and assign the observation that name, or use some other identification method and group under the same name observations with the same number of dims and dtype (although I am not a super fan of this).
  • I am wondering: are ML-Agents observations always unnamed? Sometimes observations come in a dict (this is possible in VMAS), so if that is possible in ML-Agents you could carry those names forward.
  • There is always the option of naming them observation_agent_i, as you say.

From these options, my take would be: see if there is any way to get the sensor names from the Unity env, and if there isn't, we can either try to stack observations when they are compatible or have a key per agent otherwise.

@vmoens I would like to know your take on this (tl;dr: when obs can be heterogeneous but we do not have access to their semantic differences).

@matteobettini
Contributor

@hyerra some FYI for you on timelines:

For a full multi-agent integration we need to support nested parametric keys and heterogeneous tensors across all the library components. These features will currently cause bugs here and there (you can find some issues and PRs I opened about this).

I am currently working on the nested side of things, and after that I will move on to the heterogeneous part.

If you encounter issues with these things, please open them as separate GitHub issues and cc me.

@hyerra
Contributor Author

hyerra commented Jun 30, 2023

@matteobettini This makes sense. I made some progress on the nested keys yesterday but ran into some issues. I'll keep you posted on what I find!

Also, for similar/different observations, I think option 3 is the best. The way MLAgents works is that there are different behaviors, and all agents fall under a certain behavior. Each behavior defines the observation and action space for all agents that fall under it. Observations, specifically, are collected through "Sensors" in Unity, like a Camera Sensor or a Ray Sensor. In their docs, only a list of observation specs is returned (no name information), and each element of the list corresponds to an observation a sensor collected.

I think option 3 is the best for the following reasons:

  • Yeah, I agree that option 1 might not be the ideal approach. If we get it wrong, users might get confused as to what's happening. Another thing that makes this approach error-prone is that users can actually stack observations temporally in Unity. For instance, I could stack the last two images that were captured, which might be useful in some cases. If users did this and stacked 2 images together, the number of channels would grow from 3 to 6, which could lead us to incorrectly classify an observation.
  • For option 2, this would be ideal, but I don't think we have access to the name information.

Since there's a lot of flexibility, maybe it's best to just do option 3, since it's the safest. If people really did want to stack things together and knew it was safe, maybe we could offer some sort of Transform for it, but we could explore that later too.

Does this make sense? I can work on an implementation and share the issues I find.

@matteobettini
Contributor

Also, for similar/different observations, I think option 3 is the best. The way MLAgents works is that there are different behaviors, and all agents fall under a certain behavior. Each behavior defines the observation and action space for all agents that fall under it. Observations, specifically, are collected through "Sensors" in Unity, like a Camera Sensor or a Ray Sensor. In their docs, only a list of observation specs is returned (no name information), and each element of the list corresponds to an observation a sensor collected.

If there is a way to get the behavior names, maybe that could also be an option, since behaviors, I guess, are aligned with neural networks.

In any case, I agree: if there is no principled way to get observation semantics (names), we are left only with option 3. We can maybe allow users to label observations in the future to make things nicer.

Let's see if we can reach a working solution this way and then iterate.

On the nested side I am fixing a bunch of things, so in a week or so that should work easily. Feel free to open issues about any problems you find there.

Heterogeneous stacking and passing to collectors will be a harder problem, but I'll eventually get to that too.

@hyerra
Contributor Author

hyerra commented Jul 4, 2023

Sounds good! Also, I was wondering what the best way to encode individual agent observations is. Essentially, I have the observation spec set up now so that it looks something like:

CompositeSpec("agents":
   CompositeSpec("observation":
       LazyStack(
           LazyStack( # agent_1 observations
               CompositeSpec("observation_1": ...),
               CompositeSpec("observation_2": ...),
           ),
           LazyStack( # agent_2 observations
               CompositeSpec("observation_1": ...),
               CompositeSpec("observation_2": ...),
           ),
          ...
      ),
      ....
   )
)

Given a list of numpy arrays representing an individual agent's observations:

[numpy_agent_1_observation_1, numpy_agent_1_observation_2]

How do I convert this into a tensordict for that individual agent? I think I have most of it successfully converted, but I'm having difficulty converting the observation part. Essentially, I want to have a tensordict for just that agent, similar to how you do it in VMAS, so something like:

agent_td = TensorDict(
    source={
        "agents": {
            "observation": self.read_obs(step.obs),
            "behavior_name": self.read_behavior(behavior_name_),
            "reward": self.read_reward(step.reward),
            ...
        }
    }
)

Essentially, I'm not sure how to write the read_obs function to do the conversion.

@matteobettini
Contributor

Sounds good! Also, I was wondering what the best way to encode individual agent observations is. Essentially, I have the observation spec set up now so that it looks something like:

Mmh, OK, so you are stacking individual agent observations and also stacking the agents. Why do you need to stack each agent's observations? Couldn't it simply be a TensorDict? Otherwise this is getting a little bit convoluted even just to access.

How do I convert this into a tensordict for that individual agent? I think I have most of it successfully converted, but I'm having difficulty converting the observation part. Essentially, I want to have a tensordict for just that agent, similar to how you do it in VMAS, so something like:

Supposing that in your case each agent has an individual reward, it should look like

agent_i_td = TensorDict(
    source={
        "agents": {
            "observation": TensorDict({
                "obs_1": Tensor,
                "obs_2": Tensor,
            }),
            "reward": Tensor,
        }
    }
)
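
For what it's worth, here is one possible read_obs sketch matching that structure: it converts a list of per-agent numpy observations into a TensorDict keyed "obs_1", "obs_2", ... (the function name comes from the snippet above; the body and signature are assumptions, not an existing helper):

import numpy as np
import torch
from tensordict import TensorDict

def read_obs(obs_list, device="cpu"):
    # One entry per sensor observation; keys are positional since
    # ML-Agents does not expose observation names.
    return TensorDict(
        {
            f"obs_{i + 1}": torch.as_tensor(np.asarray(obs), device=device)
            for i, obs in enumerate(obs_list)
        },
        batch_size=[],
    )

agent_obs = read_obs(
    [np.zeros((128, 128, 3), dtype=np.float32), np.zeros((4,), dtype=np.float32)]
)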

The problem is that in your case, if you have no guarantees that obs are similar (among agents and even within an agent), I don't know if stacking makes sense at all.

Another option (since we have completely zero guarantees on obs similarity) could be to put all obs in the root td, like so

all_agents_td = TensorDict(
    source={
        "agent_1_observation_1": Tensor,
        "agent_1_observation_n": Tensor,
        ...,
        "agent_n_observation_n": Tensor,
        "done": Tensor,
        "agents": TensorDict({
            "reward": Tensor,
        }, batch_size=[n_agents]),
    }
)

I don't know what makes more sense.
cc @vmoens if you have an opinion

@hyerra
Contributor Author

hyerra commented Jul 5, 2023

Mmh, OK, so you are stacking individual agent observations and also stacking the agents. Why do you need to stack each agent's observations? Couldn't it simply be a TensorDict? Otherwise this is getting a little bit convoluted even just to access.

Yep, currently I'm stacking the individual agent observations and then stacking the agents together. As for it simply being a TensorDict, I think I'm a little confused: how would the observation_spec look in that case?

Also, we could do the second approach as well; however, having everything grouped under the agents key seems nice. I can do either approach. @vmoens do you have a preference?

@hyerra
Contributor Author

hyerra commented Jul 8, 2023

Hey @matteobettini, I think I'm pretty close to finishing the implementation of the solution we discussed earlier, and I'll push my changes soon so we can both take a look and iterate.

However, one thing I noticed is that I think _EnvWrapper is transforming the specs automatically. For instance, for the done spec (this also happens for the reward spec, by the way), I might assign it something like this:

CompositeSpec(
    agents: CompositeSpec(
        done: DiscreteTensorSpec(
            shape=torch.Size([4, 1]),
            space=DiscreteBox(n=2),
            device=cpu,
            dtype=torch.bool,
            domain=discrete),
        device=cpu,
        shape=torch.Size([4])),
    device=cpu,
    shape=torch.Size([]))

However, when I do print(self.done_spec), I will get:

DiscreteTensorSpec(
    shape=torch.Size([4, 1]),
    space=DiscreteBox(n=2),
    device=cpu,
    dtype=torch.bool,
    domain=discrete)

Do we want this behavior, or should _EnvWrapper not remove the nesting? If we don't want this, I could raise an issue.

@hyerra
Contributor Author

hyerra commented Jul 8, 2023

Just updated the PR @matteobettini if you want to take a look and let me know what your thoughts are from a high level. It works using a random policy on my end, but still of course we need to do some rigorous testing and have a way to test this in CI on a real environment. Just wanted to get your thoughts on the overall structure and high-level implementation.

@matteobettini
Contributor

However, when I do print(self.done_spec)

This is normal: it gives you the leaf spec, to make it look like single-agent.
To retrieve the full nested spec you can do

env.input_spec["_action_spec"] (or reward, or done and so on)

Just updated the PR @matteobettini if you want to take a look and let me know what your thoughts are from a high level. It works using a random policy on my end, but still of course we need to do some rigorous testing and have a way to test this in CI on a real environment. Just wanted to get your thoughts on the overall structure and high-level implementation.

Cool, I'll take a look Monday.

Contributor

@matteobettini matteobettini left a comment


It seems good to me!

See a few comments.

  • Also, can you add some tests, so that we can have them along the way? See the VMAS tests for the kind of things to test.
  • Can you give me a few instructions on how to install Unity? It seems it requires a paid license, is that so?

(Inline review comments on torchrl/envs/libs/unity.py)
@hyerra
Contributor Author

hyerra commented Jul 10, 2023

Also, for your comment about tests, yep for sure, I'll work on this soon! And let me get back to you about the Unity license this afternoon. Personally, I've just been using an educational/personal license.

@hyerra
Contributor Author

hyerra commented Jul 10, 2023

For the license, do you know if you could use the Personal license here? If so, I think you should be able to download/install Unity with that. They also have some games here that we could use for testing. Essentially, it would just involve choosing one of these games and building it, I believe. We could then test our integration on the built game. I'll double-check to make sure this works, though. So far, I've just been testing on my own custom game environments.

matteobettini and others added 27 commits October 10, 2023 10:16
Co-authored-by: Mateusz Guzek <[email protected]>
Co-authored-by: vmoens <[email protected]>
Co-authored-by: Alessandro Pietro Bardelli <[email protected]>
Co-authored-by: Tom Begley <[email protected]>
@matteobettini
Contributor

For the record, there are now more MARL environments in the library to take example from that use the MARL grouping from #1463 (e.g., PettingZoo).

We also have some utils for the grouping mechanism:

class MarlGroupMapType(Enum): ...

def check_marl_grouping(group_map: Dict[str, List[str]], agent_names: List[str]): ...
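
For example, a hedged sketch of grouping Unity agents by behaviour with those utils (behaviour and agent names are made up; the import path may differ across torchrl versions):

from torchrl.envs.utils import check_marl_grouping

# Map each ML-Agents behaviour to the agents that share it.
group_map = {
    "behavior_a": ["agent_0", "agent_1"],
    "behavior_b": ["agent_2"],
}
check_marl_grouping(group_map, agent_names=["agent_0", "agent_1", "agent_2"])  # raises if invalid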

I believe we could use that API to group agents here by behaviours.

@vmoens
Contributor

vmoens commented May 16, 2024

Closed in favour of #2161

@vmoens vmoens closed this May 16, 2024
Successfully merging this pull request may close these issues.

[Feature Request] Unity MLAgents Support