SubWorld Dynamic Programming

The goal of this project is to solve SubWorld, a semi-controlled sensing POMDP (more officially referred to as our Nautical Navigation Environment). The environment is a semi-controlled sensing problem because it combines the state-modifying actions of typical POMDPs with the non-state-modifying, information-revealing measurement actions of controlled sensing POMDPs.

This repository contains the code required to create the following dynamic-programming-based SubWorld components:

  • Charts
  • Water currents
  • Value functions
  • Policies

If you use this work, please cite

	author = {Beeler, Chris and Li, Xinkai and Bellinger, Colin and Crowley, Mark and Fraser, Maia and Tamblyn, Isaac},
	journal = {Proceedings of the Canadian Conference on Artificial Intelligence},
	year = {2024},
	month = {may 27},
	note = {},
	publisher = {Canadian Artificial Intelligence Association (CAIAC)},
	title = {Dynamic programming with incomplete information to overcome navigational uncertainty in {POMDPs}},

Additional information for the above manuscript can be found here.

Released Versions

For specific versions used in published (or submitted) papers, please view the other existing branches.

| Version | Paper |
| --- | --- |
| v1.0 | Beeler, Chris, et al. "Dynamic programming with incomplete information to overcome navigational uncertainty in POMDPs." Proceedings of the Canadian Conference on Artificial Intelligence (2024). |


pip install ./

Generating Charts

Running will generate a single chart with water currents stored in data/charts/charts_SEED.npz. Running will generate many charts with water currents stored in data/charts/charts_SEED.npz. These charts are generated in parallel using multiprocessing.

The charts and water currents can be visualized using

Generating Value Functions

Running will generate the value function assuming the chart has no water currents. The value function is generated using Bellman's dynamic programming algorithm and will be stored in data/value/value_SEED.npz. Running will generate the value functions for each chart from stored in data/value/value_SEED.npz.
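As a rough illustration of the Bellman iteration described above, here is a minimal sketch. It is not the repository's implementation. It assumes a small grid with four unit moves, a reward of 1 for reaching the target, and hypothetical names (`value_iteration`, `blocked`); the actual code works with discretized throttle and heading actions.

```python
def value_iteration(dim, target, discount=0.9, tol=-6, blocked=frozenset()):
    """Bellman value iteration on a dim x dim grid with no water currents.

    Assumptions (not taken from the repository): four unit moves,
    reward 1.0 on entering the target, 0.0 otherwise.
    """
    eps = 10.0 ** tol  # tol is the log10 of the convergence tolerance
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    value = [[0.0] * dim for _ in range(dim)]
    while True:
        delta = 0.0
        new_value = [row[:] for row in value]
        for x in range(dim):
            for y in range(dim):
                if (x, y) == target or (x, y) in blocked:
                    continue  # terminal or impassable cells keep value 0
                best = float("-inf")
                for dx, dy in moves:
                    nx, ny = x + dx, y + dy
                    outside = not (0 <= nx < dim and 0 <= ny < dim)
                    if outside or (nx, ny) in blocked:
                        nx, ny = x, y  # hitting an edge or island: stay put
                    reward = 1.0 if (nx, ny) == target else 0.0
                    best = max(best, reward + discount * value[nx][ny])
                new_value[x][y] = best
                delta = max(delta, abs(best - value[x][y]))
        value = new_value
        if delta < eps:  # stop once the largest update falls below tolerance
            return value
```

With `discount=0.5` on a 3 x 3 grid and the target in one corner, each cell's value halves with every extra step needed to reach the target, which makes the output easy to check by hand.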

The value functions can be visualized using

Generating Policies

Running will generate a policy based on the value function. The policy's trajectory will be stored in data/policy/policy_gps_SEED.npz. Running will generate policies for each value function from stored in data/policy/policy_gps_SEED.npz.
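To make the idea of a policy derived from a value function concrete, the sketch below follows the highest-valued neighbouring cell until the target is reached or `n_steps` runs out. It is a simplified, fully observed illustration: the function name `greedy_trajectory` and the four-move grid are assumptions, and it ignores the GPS, the Current Profiler, and the uncertainty handling that the repository's policies account for.

```python
def greedy_trajectory(value, start, target, n_steps):
    """Follow the best-valued neighbour from start for at most n_steps moves."""
    dim = len(value)
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    pos = start
    path = [pos]
    for _ in range(n_steps):
        if pos == target:
            break  # episode ends once the target is reached
        # Keep only neighbours that stay inside the chart.
        candidates = [(pos[0] + dx, pos[1] + dy) for dx, dy in moves]
        candidates = [c for c in candidates
                      if 0 <= c[0] < dim and 0 <= c[1] < dim]
        # The target is terminal (value 0), so it is preferred explicitly.
        pos = max(candidates,
                  key=lambda c: float("inf") if c == target
                  else value[c[0]][c[1]])
        path.append(pos)
    return path


# Hand-built value table for a 3 x 3 grid with the target at (2, 2):
# each cell holds 0.5 ** (steps_to_target - 1); the target itself holds 0.
example_value = [
    [0.125, 0.25, 0.5],
    [0.25, 0.5, 1.0],
    [0.5, 1.0, 0.0],
]
example_path = greedy_trajectory(example_value, (0, 0), (2, 2), n_steps=10)
```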

The policy trajectories can be visualized using

YAML Parameters

The parameters used in each task are stored in params.yaml. Some parameters may carry forward into other tasks.

Generating Charts

| Parameter | Type | Description |
| --- | --- | --- |
| seed | Non-negative int | The random seed used to create the islands that define the chart. |
| dim | Positive int | The dimension size of the chart. |
| n_islands | Non-negative int or None | The number of islands that will be generated. Setting None will result in a random number of islands. |
| min_islands | Non-negative int | The minimum number of islands that will be generated if n_islands is None. |
| max_islands | Non-negative int | The maximum number of islands that will be generated if n_islands is None. |
| size | Non-negative float | The size of the x and y dimensions of the chart. The submarine can move up to 1 unit per action. |
| min_height | Non-negative float | The minimum height of each island. |
| max_height | Positive float > min_height | The maximum height of each island. |
| x_decay_min | Positive float | The minimum decay rate in the x direction of each island. |
| x_decay_max | Positive float > x_decay_min | The maximum decay rate in the x direction of each island. |
| y_decay_min | Positive float | The minimum decay rate in the y direction of each island. |
| y_decay_max | Positive float > y_decay_min | The maximum decay rate in the y direction of each island. |
| max_cur | Non-negative float | The maximum water current magnitude. |
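The sketch below shows one way the chart parameters above could fit together. It is an assumption-laden illustration, not the repository's generator: it treats each island as a bump whose height decays exponentially with squared distance in x and y, samples every island property uniformly, and reads `dim` as the number of grid cells per axis and `size` as the physical extent. The function name `make_chart` is hypothetical.

```python
import math
import random


def make_chart(seed, dim, n_islands, min_islands, max_islands, size,
               min_height, max_height,
               x_decay_min, x_decay_max, y_decay_min, y_decay_max):
    """Return a dim x dim grid of heights built from randomly placed islands."""
    rng = random.Random(seed)  # same seed -> same chart
    if n_islands is None:
        # Assumed behaviour: pick the island count uniformly in the given range.
        n_islands = rng.randint(min_islands, max_islands)
    islands = [
        {
            "cx": rng.uniform(0.0, size),
            "cy": rng.uniform(0.0, size),
            "height": rng.uniform(min_height, max_height),
            "x_decay": rng.uniform(x_decay_min, x_decay_max),
            "y_decay": rng.uniform(y_decay_min, y_decay_max),
        }
        for _ in range(n_islands)
    ]
    step = size / dim  # physical width of one grid cell
    chart = [[0.0] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(dim):
            x, y = (i + 0.5) * step, (j + 0.5) * step  # cell centre
            chart[i][j] = sum(
                isl["height"] * math.exp(
                    -isl["x_decay"] * (x - isl["cx"]) ** 2
                    - isl["y_decay"] * (y - isl["cy"]) ** 2)
                for isl in islands
            )
    return chart
```

Larger decay rates give narrower islands, and seeding a dedicated `random.Random` instance keeps charts reproducible per seed.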

Value Function Generation

| Parameter | Type | Description |
| --- | --- | --- |
| target_x | Non-negative float < dim or None | The x coordinate of the target. |
| target_y | Non-negative float < dim or None | The y coordinate of the target. |
| discount | Non-negative float < 1 | The discount factor used in Bellman's dynamic programming algorithm. |
| n_t | Positive int | The number of discretized throttle actions. |
| n_h | Positive int | The number of discretized heading actions. |
| tol | float | Log base 10 of the convergence tolerance used in Bellman's dynamic programming algorithm. For example, tol=-6 gives a tolerance of 1e-6. |
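The following sketch illustrates how `n_t`, `n_h`, and `tol` might be turned into concrete quantities. The spacing is an assumption: throttles are spread evenly over (0, 1], matching the stated limit of 1 unit per action, and headings are spread evenly over [0, 2π). The helper names `action_grid` and `tolerance` are hypothetical.

```python
import math


def action_grid(n_t, n_h):
    """Return (dx, dy) displacements for every throttle/heading pair."""
    throttles = [(i + 1) / n_t for i in range(n_t)]            # (0, 1]
    headings = [2.0 * math.pi * j / n_h for j in range(n_h)]   # [0, 2*pi)
    return [(t * math.cos(h), t * math.sin(h))
            for t in throttles for h in headings]


def tolerance(tol):
    """Convert the log10 tolerance parameter into an absolute tolerance."""
    return 10.0 ** tol
```

Under this scheme the action set has `n_t * n_h` entries, so the cost of each Bellman sweep grows linearly in both parameters.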

Policy Generation

| Parameter | Type | Description |
| --- | --- | --- |
| sub_x | Non-negative float < dim or None | The x coordinate for the submarine's starting position. Setting None will result in a random coordinate. |
| sub_y | Non-negative float < dim or None | The y coordinate for the submarine's starting position. Setting None will result in a random coordinate. |
| n_steps | Positive int | The maximum number of steps the agent can take before the episode ends. |
| gps_cost | Non-negative float | The cost required to use the GPS. If set to zero, the GPS will be used at every step. |
| cur_cost | Non-negative float | The cost required to use the Current Profiler. If set to zero, the Current Profiler will be used at every step. |
| uncert_pos | Non-negative float | The rate at which uncertainty in position increases. |
| uncert_cur | Non-negative float | The rate at which uncertainty in water current increases. |
| uncert_res | Positive int | The number of trajectories simulated for each axis of uncertainty, (x, y) for both position and water current, per action estimate. For example, uncert_res=3 gives 3^4 = 81 trajectories simulated per action estimate. |
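The trajectory count for `uncert_res` follows from taking every combination of offsets along four axes: position x, position y, current x, and current y. The sketch below enumerates such combinations. The even spacing over a symmetric interval and the names `offsets` and `uncertainty_samples` are assumptions made for illustration.

```python
from itertools import product


def offsets(spread, res):
    """Return res evenly spaced offsets in [-spread, spread]."""
    if res == 1:
        return [0.0]  # a single sample sits on the current estimate
    return [-spread + 2.0 * spread * k / (res - 1) for k in range(res)]


def uncertainty_samples(pos_spread, cur_spread, uncert_res):
    """Return every (dx, dy, dcx, dcy) combination of position/current offsets."""
    pos = offsets(pos_spread, uncert_res)
    cur = offsets(cur_spread, uncert_res)
    # Four axes with uncert_res values each -> uncert_res ** 4 combinations.
    return list(product(pos, pos, cur, cur))
```

Because the count grows as the fourth power of `uncert_res`, small increases in resolution are expensive: 3 gives 81 combinations, while 5 gives 625.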

Extra Parameters for Parallelization

| Parameter | Type | Description |
| --- | --- | --- |
| n_cpu | Positive int | The number of CPUs used in multiprocessing.Pool when generating many charts/value functions. |
| maps_i | Non-negative int | The seed to start at when generating many charts. |
| maps_f | Positive int > maps_i | The seed to end at when generating many charts. |
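A minimal sketch of how these parameters could drive `multiprocessing.Pool` is shown below. The worker `generate_chart` is a hypothetical stand-in that only returns the output path, and treating `maps_f` as an exclusive upper bound is an assumption; the repository may include the final seed.

```python
from multiprocessing import Pool


def seed_range(maps_i, maps_f):
    """Seeds to process. Assumption: maps_f is exclusive in this sketch."""
    return list(range(maps_i, maps_f))


def generate_chart(seed):
    """Hypothetical stand-in for the real generator: returns the output path."""
    return f"data/charts/charts_{seed}.npz"


def generate_many(maps_i, maps_f, n_cpu):
    """Map the worker over all seeds using n_cpu worker processes."""
    with Pool(processes=n_cpu) as pool:
        return pool.map(generate_chart, seed_range(maps_i, maps_f))
```

When used from a script, `generate_many` should be called under an `if __name__ == "__main__":` guard so that worker processes can import the module safely on platforms that spawn rather than fork.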

