[Feature] Add support for Unity MLAgents environments. #1201
Conversation
Thanks so much for this contribution, really appreciated. A few high-level remarks/questions:
```python
from torchrl.data import CompositeSpec, UnboundedContinuousTensorSpec as Spec
import torch

spec1 = CompositeSpec(obs=Spec(shape=(3,)))
spec2 = CompositeSpec(obs=Spec(shape=(4,)))
s = torch.stack([spec1, spec2], 0)
```

The code will surely break here and there when playing with collectors and similar, but nothing that can't be fixed! We can work on that; I know @matteobettini is also looking forward to this feature.
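Continuing the spec-stacking snippet above, a minimal sketch of how the stacked spec could be consumed downstream (this assumes that indexing the lazy stack along dim 0 gives back the element specs):

```python
# each element of the stack keeps its own shape
assert s[0]["obs"].shape == torch.Size([3])
assert s[1]["obs"].shape == torch.Size([4])
```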
```python
>>> env.step(tensordict)
>>> tensordict["next", "valid_mask"]
Tensor([True, False, True])
```

which would say that agents 0 and 2 are present, but 1 isn't. Then you would have

```python
>>> tensordict["next", "agent_feature"]
TensorDict({
    "observation": ...
}, batch_size=[3])
```

which contains all the features on a per-agent basis, and you can recover the valid ones with

```python
>>> tensordict["next", "agent_feature"][tensordict["next", "valid_mask"]]
```
This is extremely exciting, thank you! Yes, there is a lot of meat to discuss for multi-agent.
For example, let's say we have an env with 5 agents. We would normally transport tensors (or lazy stacks) of size [*B, 5, *F]. Now, if at a certain step only 2 agents partake, we could carry only [*B, 2, *F] and have a function in the env which tells me what those 2 correspond to. Wdyt?
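To make that concrete, a minimal sketch of the idea (ignoring the batch dims *B; the `active_agents` entry and the shapes are made up for illustration):

```python
import torch
from tensordict import TensorDict

feat = 16
# only agents 0 and 3 out of the 5 partake at this step
active_agents = torch.tensor([0, 3])
step_td = TensorDict(
    {
        "agents": TensorDict(
            # carry only the active agents: [2, *F] instead of [5, *F]
            {"observation": torch.randn(len(active_agents), feat)},
            batch_size=[len(active_agents)],
        ),
        # the env would expose which rows correspond to which agents
        "active_agents": active_agents,
    },
    batch_size=[],
)
```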
Thanks for the feedback everyone, I'll look into this soon and get back to you.
Hi @vmoens @matteobettini, thanks for the feedback and sorry for the delay! I've addressed some of the issues, and I also have a question about the heterogeneous specs via stacking. I've pushed my latest changes so you can take a look. First though, to answer some of the questions.
With regards to the heterogeneous specs, I'm getting a really weird issue that I've been kind of stuck on. I'm not sure if you might have any insights. Essentially, in:

```python
agent_observation_specs = [
    _unity_to_torchrl_spec_transform(
        spec, dtype=np.dtype("float32"), device=self.device
    )
    for spec in behavior_unity_spec.observation_specs
]
agent_observation_spec = torch.stack(agent_observation_specs, dim=0)
observation_specs[agent_id] = agent_observation_spec
```

Using this code produces the following error:
However, if I change the code slightly so that I include every observation besides the last one, it works. Namely, I need to change the code to be:

```python
agent_observation_specs = [
    _unity_to_torchrl_spec_transform(
        spec, dtype=np.dtype("float32"), device=self.device
    )
    for spec in behavior_unity_spec.observation_specs
]
agent_observation_spec = torch.stack(agent_observation_specs[:-1], dim=0)
observation_specs[agent_id] = agent_observation_spec
```

Notice the `[:-1]` on the second-to-last line. Once this change is made, and I ignore the last observation, I don't get that error and the code works using a random policy. An important point to note is that the first 3 observations in my case have the same shape, whereas the last observation has a different shape. Here is a log of the observation specs:
I'm not sure how this relates to the action spec error above, though, and I've been having some trouble debugging that.
Hey @hyerra, sorry for the delay (I am just starting here, working with Vincent). I can help you with this. Feel free to message me also via email if you want to discuss more. Basically, regarding your last comment: yes! We are moving to that format for multi-agent. As you noticed, all the keys that have the agent dimension will be present under the "agents" nested key. You can look at the vmas.py file in #1027 to see what I mean and make the Unity ML integration uniform with that. If a key is shared among all agents (like done and reward sometimes are), it will be in the normal place and not under "agents". With regards to your problem with stacking, here is our view on stacking heterogeneous tensors and specs: for example, you can stack specs with different shapes under the same key (as in the snippet earlier in this thread)
because these (e.g. images) are semantically similar. In your case, the last spec is dramatically different from the first 3: the first 3 are most likely images and the last is just a small vector. Therefore you should stack them under different keys, with something like the sketch below.
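A minimal sketch of that idea (the shapes and key names here are placeholders, mirroring the spec-stacking snippet earlier in the thread):

```python
from torchrl.data import CompositeSpec, UnboundedContinuousTensorSpec as Spec
import torch

# each element of the stack is a CompositeSpec; the image-like observations
# live under "image" while the small vector lives under "vec"
image_specs = [CompositeSpec(image=Spec(shape=(3, 64, 64))) for _ in range(3)]
vec_spec = CompositeSpec(vec=Spec(shape=(8,)))
stack = torch.stack(image_specs + [vec_spec], 0)
```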
Then when you do stack[2] you will retrieve an image spec, and with stack[-1] you will retrieve a vec spec. Does this make sense? I am also happy to schedule a brief call if you would like further help.
Hi @matteobettini, no worries, thanks for the help! I think your feedback makes sense; maybe I can try to fix it, and if we are still running into difficulties we could hop on a call? Just to make sure I understand: individual agent observations go under an "agents" key and become nested, while things that are shared between all agents remain at the top level. Also, with regards to the second point about having different keys, I can't necessarily predetermine which index has different observations and which ones will be similar. For my own environments I could; however, everyone else can customize their observations. For instance, in someone else's environment they might decide to have 4 images, some vector-based observations, and maybe a Ray sensor (MLAgents has a lot!). In this case, would you recommend treating each observation as drastically different, using the keys "observation_1", "observation_2", etc., and stacking them like that?
Sure
Yes, that is because tensors which are shared will have one dimension less, while the "agents" tensordict will have a batch_size with a dimension corresponding to the number of agents.
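For example, a minimal sketch of that layout (the shapes are illustrative only):

```python
import torch
from tensordict import TensorDict

n_agents, batch = 5, 32
td = TensorDict(
    {
        # shared entries have no agent dimension
        "done": torch.zeros(batch, 1, dtype=torch.bool),
        # per-agent entries live under "agents", whose batch_size carries n_agents
        "agents": TensorDict(
            {"observation": torch.zeros(batch, n_agents, 16)},
            batch_size=[batch, n_agents],
        ),
    },
    batch_size=[batch],
)
```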
Here are some options I can think of:
From these options, my take would be: see if there is any way to get the sensor names from the Unity env; if there isn't, we can either try to stack the observations if they are compatible, or have a key per agent otherwise. @vmoens I would like to know your take on this (tl;dr: when obs can be heterogeneous but we do not have access to their semantic differences).
@hyerra some FYI for you on timelines: for a full multi-agent integration we need to support nested parametric keys and heterogeneous tensors across all the library components. These features will currently cause bugs here and there (you can find some issues and PRs I opened about this). I am currently working on the nested side of things, and after that I will move on to the heterogeneous part. If you encounter issues on these things, please open them as separate GitHub issues and cc me.
@matteobettini This makes sense. I made some progress on this yesterday for having nested keys but ran into some issues; I'll keep you posted on what I find! Also, for similar/different observations, I think option 3 is the best. The way MLAgents works is that there are different behaviors, and all of the agents fall under a certain behavior. Each behavior defines the observation and action space for all agents that fall under that behavior. Observations specifically are collected through "Sensors" in Unity, like a Camera Sensor or a Ray Sensor for instance. In their docs, only a list of observation specs is returned (no name information), and each element of the list corresponds to an observation a sensor collected. I think option 3 is the best for the following reasons:
Since there's a lot of flexibility, maybe it's best to just do 3 since it's the safest option. If people really did want to stack things together and they knew it was safe, maybe we could offer some sort of Transform for it, but we could explore that later too. Does this make sense? I can work on an implementation and share the issues I find.
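To make option 3 a bit more concrete, here is a hypothetical sketch of a per-behavior observation spec (the key names and shapes are made up for illustration):

```python
from torchrl.data import CompositeSpec, UnboundedContinuousTensorSpec as Spec

# one entry per sensor observation, keyed by position since MLAgents
# exposes no sensor names
behavior_observation_spec = CompositeSpec(
    observation_0=Spec(shape=(3, 64, 64)),  # e.g. a Camera Sensor
    observation_1=Spec(shape=(802,)),       # e.g. a Ray Sensor
)
```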
If there is a way to get the behavior names, maybe that could also be an option, since behaviors I guess are aligned with neural networks. In any case, I agree: if there is no way to get observation semantics (names) in a principled manner, we are left only with option 3. We can maybe allow users in the future to label observations to make things nicer. Let's start by seeing if we can reach a working solution this way and then we can iterate. On the nested side I am fixing a bunch of things, so in a week or so that should work easily. Feel free to open any issues or problems you find on that. Heterogeneous stacking and passing to collectors will be a harder problem, but I'll eventually get to that too.
Sounds good! Also, I was wondering what the best way to encode individual agent observations is. Essentially, I have the observation spec set up now so that it looks something like:
Given a list of numpy vectors representing an individual agent's observations:
How do I convert this into a tensordict for that individual agent? I think I have most of it successfully converted, but I'm having difficulty converting the observation part. Essentially, I want to have a tensordict for just that agent, similar to how you do it in VMAS, so something like:
Essentially, I'm not sure how to write the
Mmh, ok, so you are stacking individual agent observations and also stacking the agents. Why do you need to stack each agent's observations? Couldn't it simply be a TensorDict? Otherwise this is getting a little bit convoluted even just to access.
Supposing that in your case each agent has an individual reward, it should look like:

```python
agent_i_td = TensorDict(
    source={
        "agents": {
            "observation": TensorDict({
                "obs_1": Tensor,
                "obs_2": Tensor,
            }),
            "reward": Tensor,
        },
    },
)
```

The problem is that in your case, if you have no guarantees that obs are similar (among agents and even within an agent), I don't know if stacking makes sense at all. Another option (since we have completely 0 guarantees on obs similarity) could be to put all obs in the root td, like so:

```python
all_agents_td = TensorDict(
    source={
        "agent_1_observation_1": Tensor,
        "agent_1_observation_n": Tensor,
        ...,
        "agent_n_observation_n": Tensor,
        "done": Tensor,
        "agents": TensorDict({
            "reward": Tensor,
        }, [n_agents]),
    },
)
```

I dunno what makes more sense.
Yep, currently I'm stacking the individual agent observations and then stacking the agents together. For the part about it being a TensorDict, I think I'm a little confused. How would the
Also, we could do the second approach as well. However, having everything grouped in the agents key seems nice. I can do either approach. @vmoens do you have a preference?
Hey @matteobettini, I think I'm pretty close to finishing the implementation of the solution we discussed earlier, and I'll push my changes soon so we can both take a look and iterate. However, one thing I noticed is that I think the
However, when I do
Do we want this behavior or should
Just updated the PR @matteobettini, if you want to take a look and let me know what your thoughts are from a high level. It works using a random policy on my end, but of course we still need to do some rigorous testing and have a way to test this in CI on a real environment. Just wanted to get your thoughts on the overall structure and high-level implementation.
This is normal. It gives you the leaf spec to make it look like the single-agent case.
Cool, I'll take a look Monday.
It seems good to me!
See a few comments.
- Also, can you add some tests so that we can have them along the way? See the vmas tests for the kind of things to test.
- Can you give me a few instructions on how to install Unity? It seems it requires a paid license, is that so?
Also, for your comment about tests, yep for sure, I'll work on this soon! And let me get back to you about the Unity license this afternoon. Personally, I've just been using an educational/personal license.
For the license, do you know if you could use the Personal license here? If so, I think you should be able to download/install Unity from that. They also have some example games here that we could use for testing. Essentially, it would just involve choosing one of these games and building it, I believe. We could then test our integration on the built game. I'll double-check to make sure this works, though. So far, I've just been testing this on my own custom game environments.
For the record, there are now more MARL environments in the library to take example from that use the MARL grouping introduced in #1463 (e.g., PettingZoo). We also have some utils for the grouping mechanism (see lines 616 and 691 in fdee633). I believe we could use that API to group agents here by behaviours.
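A rough sketch of how that grouping could look for MLAgents behaviours (behaviour and agent names are made up; this assumes the `check_marl_grouping` utility from `torchrl.envs.utils`):

```python
from torchrl.envs.utils import check_marl_grouping

# one group per MLAgents behaviour, each holding the agents that share it
agent_names = ["behaviorA_0", "behaviorA_1", "behaviorB_0"]
group_map = {
    "behaviorA": ["behaviorA_0", "behaviorA_1"],
    "behaviorB": ["behaviorB_0"],
}
check_marl_grouping(group_map, agent_names)  # validates the grouping
```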
Closed in favour of #2161.
Description
This adds support for Unity MLAgents environments, a popular framework for creating games for reinforcement learning settings.
This pull request is still definitely a work-in-progress, and I would love to get feedback/suggestions on how to make this better. I still have to write unit tests and update documentation, but wanted to get some general input on this work before I started making these.
The code right now should work well for single agent environments. However, we need support for heterogeneous spaces in order for this to work well with multiple agents (or even a single agent with different types of observations). Ideally, we would need some kind of Tuple support.
A big limitation of this code, though, is that in multi-agent reinforcement learning settings it requires all of the agents to have the same type of observation and action space, because the spec created in _make_specs must be fixed. It would be nice if we could have specs defined on a per-agent level in order to fix this. I'm also open to other suggestions! Another important point to note is that the number of agents is not necessarily fixed: new agents may be spawned, and sometimes agents might not request a decision at a timestep (which means there might be more/fewer agents requesting decisions from time to time). This is an issue only for multi-agent settings, not single-agent ones.
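For illustration, a rough usage sketch of what the integration aims to enable (the wrapper name `UnityMLAgentsEnv` and its constructor arguments are assumptions for illustration, not necessarily the final API of this PR):

```python
from torchrl.envs import UnityMLAgentsEnv  # assumed entry point

# point the wrapper at a built Unity executable (the path is a placeholder)
env = UnityMLAgentsEnv(file_name="path/to/built_game")
td = env.reset()
td = env.rand_step(td)
env.close()
```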
Motivation and Context
Why is this change required? What problem does it solve?
close #1110
Types of changes
What types of changes does your code introduce? Remove all that do not apply:
Checklist
Go over all the following points, and put an `x` in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!