This repo contains the source code of MATE, the Multi-Agent Tracking Environment. The full documentation can be found at https://mate-gym.readthedocs.io. The full list of implemented agents can be found in the section Implemented Algorithms. For a detailed description, please check out our paper (PDF, bibtex).
This is an asymmetric two-team zero-sum stochastic game with partial observations, in which each team has multiple agents (multiplayer). Intra-team communications are allowed, but inter-team communications are prohibited. The game is cooperative among teammates, but competitive between the two teams (opponents).
git config --global core.symlinks true # required on Windows
pip3 install git+https://github.com/XuehaiPan/mate.git#egg=mate
NOTE: Python 3.7+ is required, and Python versions lower than 3.7 are not supported.
It is highly recommended to create a new isolated virtual environment for MATE using conda:
git clone https://github.com/XuehaiPan/mate.git && cd mate
conda env create --no-default-packages --file conda-recipes/basic.yaml # or full-cpu.yaml to install RLlib
conda activate mate
Make the MultiAgentTracking environment and play!
import mate
# Base environment for MultiAgentTracking
env = mate.make('MultiAgentTracking-v0')
env.seed(0)
done = False
camera_joint_observation, target_joint_observation = env.reset()
while not done:
    camera_joint_action, target_joint_action = env.action_space.sample()  # your agent here (this takes random actions)
    (
        (camera_joint_observation, target_joint_observation),
        (camera_team_reward, target_team_reward),
        done,
        (camera_infos, target_infos)
    ) = env.step((camera_joint_action, target_joint_action))
Another example with a built-in single-team wrapper (see also Built-in Wrappers):
import mate
env = mate.make('MultiAgentTracking-v0')
env = mate.MultiTarget(env, camera_agent=mate.GreedyCameraAgent(seed=0))
env.seed(0)
done = False
target_joint_observation = env.reset()
while not done:
    target_joint_action = env.action_space.sample()  # your agent here (this takes random actions)
    target_joint_observation, target_team_reward, done, target_infos = env.step(target_joint_action)
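The camera team can be wrapped in the same way. Below is a symmetric sketch using the mate.MultiCamera wrapper with a built-in mate.GreedyTargetAgent as the scripted opponent (the target_agent keyword mirrors the camera_agent keyword above and also appears in the combined wrapper example further down; the return structure is assumed to mirror the target-team case):

import mate

env = mate.make('MultiAgentTracking-v0')
# Wrap into a camera-team environment; the targets are controlled by a built-in scripted agent.
env = mate.MultiCamera(env, target_agent=mate.GreedyTargetAgent(seed=0))
env.seed(0)
done = False
camera_joint_observation = env.reset()
while not done:
    camera_joint_action = env.action_space.sample()  # your agent here (this takes random actions)
    camera_joint_observation, camera_team_reward, done, camera_infos = env.step(camera_joint_action)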
4 Cameras vs. 8 Targets (9 Obstacles)
mate/evaluate.py contains the example evaluation code for the MultiAgentTracking environment. Try out the following demos:
# <MultiAgentTracking<MultiAgentTracking-v0>>(4 cameras, 2 targets, 9 obstacles)
python3 -m mate.evaluate --episodes 1 --config MATE-4v2-9.yaml
# <MultiAgentTracking<MultiAgentTracking-v0>>(4 cameras, 8 targets, 9 obstacles)
python3 -m mate.evaluate --episodes 1 --config MATE-4v8-9.yaml
# <MultiAgentTracking<MultiAgentTracking-v0>>(8 cameras, 8 targets, 9 obstacles)
python3 -m mate.evaluate --episodes 1 --config MATE-8v8-9.yaml
# <MultiAgentTracking<MultiAgentTracking-v0>>(4 cameras, 8 targets, 0 obstacle)
python3 -m mate.evaluate --episodes 1 --config MATE-4v8-0.yaml
# <MultiAgentTracking<MultiAgentTracking-v0>>(0 camera, 8 targets, 32 obstacles)
python3 -m mate.evaluate --episodes 1 --config MATE-Navigation.yaml
4 Cameras vs. 2 Targets (9 obstacles) | 4 Cameras vs. 8 Targets (9 obstacles) | 8 Cameras vs. 8 Targets (9 obstacles) | 4 Cameras vs. 8 Targets (no obstacles) | 8 Targets Navigation (no cameras)
You can specify the agent classes and arguments by:
python3 -m mate.evaluate --camera-agent module:class --camera-kwargs <JSON-STRING> --target-agent module:class --target-kwargs <JSON-STRING>
You can find example agent code in the examples directory. The full list of implemented agents can be found in the section Implemented Algorithms. For example:
# Example demos in examples
python3 -m examples.naive
# Use the evaluation script
python3 -m mate.evaluate --episodes 1 --render-communication \
--camera-agent examples.greedy:GreedyCameraAgent --camera-kwargs '{"memory_period": 20}' \
--target-agent examples.greedy:GreedyTargetAgent \
--config MATE-4v8-9.yaml \
--seed 0
You can implement your own custom agent classes to play around with. See Make Your Own Agents for more details.
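As a starting point, a custom agent can be as small as the sketch below. The base-class name mate.TargetAgentBase and the act()/reset() hooks are assumptions for illustration only; refer to Make Your Own Agents for the actual interface.

import mate

# Hypothetical minimal custom agent (the base class and method names below are
# assumptions; consult "Make Your Own Agents" for the exact interface).
class MyTargetAgent(mate.TargetAgentBase):  # assumed base class
    def reset(self, observation=None):
        super().reset(observation)  # assumed per-episode reset hook

    def act(self, observation, info=None, deterministic=None):
        # Replace this with your own policy; here we simply sample a random action
        # (self.action_space is assumed to be provided by the base class).
        return self.action_space.sample()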
The MultiAgentTracking environment accepts a Python dictionary mapping or a configuration file in JSON or YAML format.
If you want to use customized environment configurations, you can copy the default configuration file:
cp "$(python3 -m mate.assets)"/MATE-4v8-9.yaml MyEnvCfg.yaml
Then make your own modifications to it. Use the modified environment with:
env = mate.make('MultiAgentTracking-v0', config='/path/to/your/cfg/file')
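Because a plain Python dictionary is also accepted, you can load a preset YAML file with PyYAML, tweak it in code, and pass the mapping directly. A sketch (the valid keys are those defined in the preset files under mate/assets):

import mate
import yaml  # PyYAML

# Load the copied configuration into a dictionary, edit it programmatically,
# and pass the mapping to mate.make() instead of a file path.
with open('MyEnvCfg.yaml') as file:
    config = yaml.safe_load(file)

# ... modify entries of `config` here ...

env = mate.make('MultiAgentTracking-v0', config=config)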
There are several preset configuration files in the mate/assets directory.
# <MultiAgentTracking<MultiAgentTracking-v0>>(4 cameras, 2 targets, 9 obstacles)
env = mate.make('MATE-4v2-9-v0')
# <MultiAgentTracking<MultiAgentTracking-v0>>(4 cameras, 8 targets, 9 obstacles)
env = mate.make('MATE-4v8-9-v0')
# <MultiAgentTracking<MultiAgentTracking-v0>>(8 cameras, 8 targets, 9 obstacles)
env = mate.make('MATE-8v8-9-v0')
# <MultiAgentTracking<MultiAgentTracking-v0>>(4 cameras, 8 targets, 0 obstacles)
env = mate.make('MATE-4v8-0-v0')
# <MultiAgentTracking<MultiAgentTracking-v0>>(0 camera, 8 targets, 32 obstacles)
env = mate.make('MATE-Navigation-v0')
You can reinitialize the environment with a new configuration without creating a new instance:
>>> env = mate.make('MultiAgentTracking-v0', wrappers=[mate.MoreTrainingInformation]) # we support wrappers
>>> print(env)
<MoreTrainingInformation<MultiAgentTracking<MultiAgentTracking-v0>>(4 cameras, 8 targets, 9 obstacles)>
>>> env.load_config('MATE-8v8-9.yaml')
>>> print(env)
<MoreTrainingInformation<MultiAgentTracking<MultiAgentTracking-v0>>(8 cameras, 8 targets, 9 obstacles)>
In addition, we provide a script, mate/assets/generator.py, to generate a configuration file with a reasonable camera placement:
python3 -m mate.assets.generator --path 24v48.yaml --num-cameras 24 --num-targets 48 --num-obstacles 20
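The generated file can then be used like any other configuration file, e.g. (assuming the --path value above):

env = mate.make('MultiAgentTracking-v0', config='24v48.yaml')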
See Environment Customization for more details.
MATE provides multiple wrappers for different settings, such as full observability, discrete action spaces, and single-team multi-agent control. See Built-in Wrappers for more details.
| Category | Wrapper | Description |
|---|---|---|
| observation | `EnhancedObservation` | Enhance the agent's observation, which sets all observation masks to `True`. |
| | `SharedFieldOfView` | Share the field of view among agents in the same team, which applies the `or` operator over the observation masks. The target agents share the empty status of warehouses. |
| | `MoreTrainingInformation` | Add more environment and agent information to the `info` field of `step()`, enabling full observability of the environment. |
| | `RescaledObservation` | Rescale all entity states in the observation to [-1, +1]. |
| | `RelativeCoordinates` | Convert all locations of other entities in the observation to relative coordinates. |
| action | `DiscreteCamera` | Allow cameras to use discrete actions. |
| | `DiscreteTarget` | Allow targets to use discrete actions. |
| reward | `AuxiliaryCameraRewards` | Add additional auxiliary rewards for each individual camera. |
| | `AuxiliaryTargetRewards` | Add additional auxiliary rewards for each individual target. |
| single-team | `MultiCamera` | Wrap into a single-team multi-agent environment (camera team). |
| | `MultiTarget` | Wrap into a single-team multi-agent environment (target team). |
| | `SingleCamera` | Wrap into a single-team single-agent environment (camera team). |
| | `SingleTarget` | Wrap into a single-team single-agent environment (target team). |
| communication | `MessageFilter` | Filter out messages from agents in intra-team communications. |
| | `RandomMessageDropout` | Randomly drop messages in communication channels. |
| | `RestrictedCommunicationRange` | Add a restricted communication range to channels. |
| | `NoCommunication` | Disable intra-team communications, i.e., filter out all messages. |
| | `ExtraCommunicationDelays` | Add extra message delays to communication channels. |
| miscellaneous | `RepeatedRewardIndividualDone` | Repeat the `reward` field and assign an individual `done` field in `step()`, similar to MPE. |
You can create an environment with multiple wrappers at once. For example:
env = mate.make('MultiAgentTracking-v0',
                wrappers=[
                    mate.EnhancedObservation,
                    mate.MoreTrainingInformation,
                    mate.WrapperSpec(mate.DiscreteCamera, levels=5),
                    mate.WrapperSpec(mate.MultiCamera, target_agent=mate.GreedyTargetAgent(seed=0)),
                    mate.RepeatedRewardIndividualDone,
                    mate.WrapperSpec(mate.AuxiliaryCameraRewards,
                                     coefficients={'raw_reward': 1.0,
                                                   'coverage_rate': 1.0,
                                                   'soft_coverage_score': 1.0,
                                                   'baseline': -2.0}),
                ])
The following algorithms are implemented in the examples directory:
- Rule-based:
  - Random (source: `mate/agents/random.py`)
  - Naive (source: `mate/agents/naive.py`)
  - Greedy (source: `mate/agents/greedy.py`)
  - Heuristic (source: `mate/agents/heuristic.py`)
- Multi-Agent Reinforcement Learning Algorithms:
  - IQL (https://arxiv.org/abs/1511.08779)
  - QMIX (https://arxiv.org/abs/1803.11485)
  - MADDPG (MA-TD3) (https://arxiv.org/abs/1706.02275)
  - IPPO (https://arxiv.org/abs/2011.09533)
  - MAPPO (https://arxiv.org/abs/2103.01955)
- Multi-Agent Reinforcement Learning Algorithms with Multi-Agent Communication:
  - TarMAC (base algorithm: IPPO) (https://arxiv.org/abs/1810.11187)
  - TarMAC (base algorithm: MAPPO)
  - I2C (base algorithm: MAPPO) (https://arxiv.org/abs/2006.06455)
- Population Based Adversarial Policy Learning, available meta-solvers:
  - Self-Play (SP)
  - Fictitious Self-Play (FSP) (https://proceedings.mlr.press/v37/heinrich15.html)
  - PSRO-Nash (NE) (https://arxiv.org/abs/1711.00832)
NOTE: All learning-based algorithms are tested with Ray 1.12.0 on Ubuntu 20.04 LTS.
If you find MATE useful, please consider citing:
@inproceedings{pan2022mate,
title = {{MATE}: Benchmarking Multi-Agent Reinforcement Learning in Distributed Target Coverage Control},
author = {Xuehai Pan and Mickel Liu and Fangwei Zhong and Yaodong Yang and Song-Chun Zhu and Yizhou Wang},
booktitle = {Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year = {2022},
url = {https://openreview.net/forum?id=SyoUVEyzJbE}
}
MIT License