Non-linear policy graph improvement algorithm for finding policies for decentralized partially observable Markov decision processes, or Dec-POMDPs.
A Dec-POMDP is a decentralized decision-making problem under uncertainty, where a team of agents acts jointly to maximize the expected sum of shared rewards. The objective is to find for each agent a policy that tells how that agent should act to maximize the shared reward.
The algorithm implemented here is suitable for standard Dec-POMDPs and for Dec-POMDPs with an information-theoretic reward function, that is, a reward that depends on the joint information state of the agents. Such Dec-POMDPs can model decentralized information gathering tasks, such as monitoring or surveillance tasks.
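In symbols, with notation chosen here for illustration rather than taken verbatim from the papers, the team seeks a joint policy $\pi = (\pi_1, \ldots, \pi_n)$, one local policy per agent, that maximizes the expected sum of shared rewards over the planning horizon $T$:

$$
V(\pi) = \mathbb{E}\left[\sum_{t=0}^{T-1} R_t \,\middle|\, \pi\right],
$$

where, in the information-theoretic variant, the reward on the final step is a function of the agents' joint belief (for example its negative Shannon entropy, see the `-e` option below) rather than of the state and joint action.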
The algorithm represents policies as finite state controllers, also known as policy graphs. The policy graphs are improved iteratively by finding the action labels of the nodes and the configuration of the out-edges that correspond to observations.
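To make the representation concrete, here is a minimal conceptual sketch of a policy graph node; the type and member names are illustrative and do not correspond to the repository's actual data structures.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Conceptual sketch only (names are illustrative, not the repository's API):
// a policy graph node carries an action label and, for each observation,
// an out-edge to a successor node. Policy graph improvement alternates
// between re-labelling node actions and re-wiring these out-edges.
struct PolicyGraphNode {
  std::size_t action;                                // index of the action label
  std::unordered_map<std::size_t, std::size_t> next; // observation index -> successor node index
};

// One agent's local policy is a collection of such nodes: execution starts at
// an initial node, takes its action, and follows the out-edge that matches
// the received observation.
using LocalPolicyGraph = std::vector<PolicyGraphNode>;
```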
You can find a high-level summary of this work in this blog post.
The follow-up to this work is Multi-agent active perception with prediction rewards, which you may be interested in checking out as well.
For more background information on the algorithm and Dec-POMDPs with information-theoretic rewards, you can refer to the publications below. If you find this repository useful, please consider citing them.
- Mikko Lauri, Joni Pajarinen, Jan Peters. Multi-agent active information gathering in discrete and continuous-state decentralized POMDPs by policy graph improvement, Autonomous Agents and Multi-Agent Systems, 34(42), 2020.
BibTeX:
```
@article{Lauri2020multiagent,
  author  = {Lauri, Mikko and Pajarinen, Joni and Peters, Jan},
  title   = {Multi-agent active information gathering in discrete and continuous-state decentralized POMDPs by policy graph improvement},
  journal = {Autonomous Agents and Multi-Agent Systems},
  volume  = {34},
  number  = {42},
  year    = {2020},
  doi     = {10.1007/s10458-020-09467-6}
}
```
- Mikko Lauri, Joni Pajarinen, Jan Peters. Information gathering in decentralized POMDPs by policy graph improvement, in 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2019).
Link to conference proceedings. You can find a preprint on arXiv.
BibTeX:
```
@inproceedings{Lauri2019information,
  author    = {Mikko Lauri and Joni Pajarinen and Jan Peters},
  title     = {Information gathering in decentralized POMDPs by policy graph improvement},
  booktitle = {Autonomous Agents and Multiagent Systems (AAMAS)},
  year      = {2019}
}
```
Your compiler must support C++14.
Required system libraries:
- eigen3
- Boost libraries: program_options regex graph
Required external libraries:
- MADP toolbox: installed automatically, or you can provide a path to an existing installation via the CMake option `-DMADPPATH=/path/to/madp`. If you need a different version than the auto-installed one, modify the `GIT_TAG` in `CMakeLists.txt` or use the `-DMADPPATH` option to point to your preferred installation.
Compile with:
```
mkdir build && cd build
cmake ..
make
```
You can use the `xdot` viewer to visualize the policy graphs.
The MAV and Information gathering rovers domains from the AAMAS paper are included in the subfolder `problems`.
After you have compiled the software, you can try to find a policy for the classical Tiger Dec-POMDP.
We set the random seed to 100 with the option `-s` and run 100 improvement steps on the policy graphs with the `-i` option:
```
./NPGI -s 100 -i 100 ../problems/dectiger.dpomdp
```
You will see some output with policy values and timings. The software will create the following files in the current directory:
- `best_policy_agentI.dot`: the best policy for agent `I` in GraphViz DOT format
- `best_value.txt`: the value of the best joint policy, found by applying the best policy graph of each agent
- `stepXYZ_agentI.dot`: the policy graph of agent `I` after improvement step `XYZ`, in GraphViz DOT format
- `policy_values.txt`: the values of the policies found at each improvement step
- `duration_microseconds.txt`: how long each improvement step took, in microseconds
The software supports using negative Shannon information entropy as the final step reward.
This can be useful for information gathering tasks such as those discussed in the paper.
If you want to solve a Dec-POMDP that uses negative entropy as the reward on the final step, just run `NPGI` with the option `-e`.
For example, we can solve the MAV problem from the AAMAS paper for a horizon of 3 steps, with a policy graph width of 2.
First, remember to uncompress the problem file in the `problems` folder!
Then run:
```
./NPGI -e -h 3 -w 2 -i 30 ../problems/mav.dpomdp
```
The outputs are as in the usual Dec-POMDP case.
The software only supports using the negative entropy of the belief state as a final reward.
If you want to modify the information reward, you can do so by editing the two functions named `final_reward` in the file `RewardMatrix.hpp`.
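For orientation, a minimal sketch of the negative-entropy computation is shown below; it is illustrative only, and the actual `final_reward` functions in `RewardMatrix.hpp` have their own signatures, which you should follow when editing them.

```cpp
#include <Eigen/Dense>
#include <cmath>

// Illustrative sketch only (not the repository's actual interface):
// negative Shannon entropy of a belief, i.e. a probability vector over
// states, R(b) = sum_s b(s) * log b(s), with 0 * log 0 taken as 0.
// Values closer to 0 indicate a more certain belief.
double negative_entropy(const Eigen::VectorXd& belief) {
  double value = 0.0;
  for (Eigen::Index s = 0; s < belief.size(); ++s) {
    const double p = belief(s);
    if (p > 0.0) {
      value += p * std::log(p);
    }
  }
  return value;
}
```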
You can look at the files `solve_graphsensing.cpp`, `GraphSensingProblem.h`, and `GraphSensingProblem.cpp` to see how you can define your own problem with a continuous state.
More details on this problem and the algorithm variant that solves it are provided in the journal article linked above.
- Mikko Lauri - University of Hamburg, Hamburg, Germany
Licensed under the Apache 2.0 license - see LICENSE for details.
- Joni Pajarinen for providing a reference implementation of the original policy graph improvement algorithm
- We use the parser from the excellent MADP toolbox to read .dpomdp files.
- We use combinations_cpp by Artem Amirkhanov to enumerate combinations.