Added examples to documentation (followed outline of POMDPExamples.jl)
dylan-asmar committed Jan 24, 2024
1 parent 6c19b3d commit 7c2562e
Showing 11 changed files with 903 additions and 0 deletions.
6 changes: 6 additions & 0 deletions docs/Project.toml
@@ -1,12 +1,18 @@
[deps]
BasicPOMCP = "d721219e-3fc6-5570-a8ef-e5402f47c49e"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
LightGraphs = "093fc24a-ae57-5d10-9952-331d41423f4d"
NamedTupleTools = "d9ec5142-1e00-5aa0-9d6a-321866360f50"
NativeSARSOP = "a07c76ea-660d-4c9a-8028-2e6dbd212cb8"
POMDPLinter = "f3bd98c0-eb40-45e2-9eb1-f2763262d755"
POMDPModels = "355abbd5-f08e-5560-ac9e-8b5f2592a0ca"
POMDPTools = "7588e00f-9cae-40de-98dc-e0c70c48cdd7"
POMDPs = "a93abf59-7444-517b-a68a-c42f96afdd7d"
QMDP = "3aa3ecc9-5a5d-57c8-8188-3e47bd8068d2"
QuickPOMDPs = "8af83fb2-a731-493c-9049-9e19dbce6165"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"

[compat]
Documenter = "1"
10 changes: 10 additions & 0 deletions docs/make.jl
@@ -37,6 +37,15 @@ makedocs(
"run_simulation.md",
"policy_interaction.md"
],

"Examples and Gallery" => [
"examples.md",
"example_defining_problems.md",
"example_solvers.md",
"example_simulations.md",
"example_gridworld_mdp.md",
"gallery.md"
],

"POMDPTools" => [
"POMDPTools/index.md",
@@ -59,4 +68,5 @@ makedocs(

deploydocs(
repo = "github.com/JuliaPOMDP/POMDPs.jl.git",
push_preview=true
)
314 changes: 314 additions & 0 deletions docs/src/example_defining_problems.md
@@ -0,0 +1,314 @@
# Defining a POMDP
As mentioned in the [Defining POMDPs and MDPs](@ref defining_pomdps) section, there are various ways to define a POMDP using POMDPs.jl. In this section, we provide more examples of how to define a POMDP using the different interfaces.

There is a large variety of problems that can be expressed as MDPs and POMDPs, and different solvers require different components of the POMDPs.jl interface to be defined. Therefore, these examples are not intended to cover all possible use cases. When developing a problem, if you have an idea of what solver(s) you would like to use, it is recommended to use [POMDPLinter](https://github.com/JuliaPOMDP/POMDPLinter.jl) to help determine which components of the POMDPs.jl interface need to be defined. Reference the [Checking Requirements](@ref) section for an example of using POMDPLinter.
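
As a quick preview (a sketch only; see the [Checking Requirements](@ref) section and the POMDPLinter documentation for details), querying a solver's requirements against one of the models shipped with [POMDPModels.jl](https://github.com/JuliaPOMDP/POMDPModels.jl) might look like the following. The exact output depends on whether the solver declares requirements.

```julia
using POMDPs
using POMDPModels  # provides BabyPOMDP
using POMDPLinter
using QMDP         # provides QMDPSolver

# Print the POMDPs.jl functions that QMDPSolver expects the problem to implement.
@requirements_info QMDPSolver() BabyPOMDP()
```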

## CryingBaby Problem Definition
For the examples, we will use the CryingBaby problem from [Algorithms for Decision Making](https://algorithmsbook.com/) by Mykel J. Kochenderfer, Tim A. Wheeler, and Kyle H. Wray.

!!! note
    This crying baby problem follows the description in Algorithms for Decision Making and is different from the `BabyPOMDP` defined in [POMDPModels.jl](https://github.com/JuliaPOMDP/POMDPModels.jl).

From [Appendix F](https://algorithmsbook.com/files/appendix-f.pdf) of Algorithms for Decision Making:
> The crying baby problem is a simple POMDP with two states, three actions, and two observations. Our goal is to care for a baby, and we do so by choosing at each time step whether to feed the baby, sing to the baby, or ignore the baby.
>
> The baby becomes hungry over time. We do not directly observe whether the baby is hungry; instead, we receive a noisy observation in the form of whether the baby is crying. The state, action, and observation spaces are as follows:
> ```math
> \begin{align*}
> \mathcal{S} &= \{\text{sated}, \text{hungry} \}\\
> \mathcal{A} &= \{\text{feed}, \text{sing}, \text{ignore} \} \\
> \mathcal{O} &= \{\text{crying}, \text{quiet} \}
> \end{align*}
> ```
>
> Feeding will always sate the baby. Ignoring the baby risks a sated baby becoming hungry, and ensures that a hungry baby remains hungry. Singing to the baby is an information-gathering action with the same transition dynamics as ignoring, but without the potential for crying when sated (not hungry) and with an increased chance of crying when hungry.
>
> The transition dynamics are as follows:
> ```math
> \begin{align*}
> & T(\text{sated} \mid \text{hungry}, \text{feed}) = 100\% \\
> & T(\text{hungry} \mid \text{hungry}, \text{sing}) = 100\% \\
> & T(\text{hungry} \mid \text{hungry}, \text{ignore}) = 100\% \\
> & T(\text{sated} \mid \text{sated}, \text{feed}) = 100\% \\
> & T(\text{hungry} \mid \text{sated}, \text{sing}) = 10\% \\
> & T(\text{hungry} \mid \text{sated}, \text{ignore}) = 10\%
> \end{align*}
> ```
>
> The observation dynamics are as follows:
> ```math
> \begin{align*}
> & O(\text{crying} \mid \text{feed}, \text{hungry}) = 80\% \\
> & O(\text{crying} \mid \text{sing}, \text{hungry}) = 90\% \\
> & O(\text{crying} \mid \text{ignore}, \text{hungry}) = 80\% \\
> & O(\text{crying} \mid \text{feed}, \text{sated}) = 10\% \\
> & O(\text{crying} \mid \text{sing}, \text{sated}) = 0\% \\
> & O(\text{crying} \mid \text{ignore}, \text{sated}) = 10\%
> \end{align*}
> ```
>
> The reward function assigns ``−10`` reward if the baby is hungry, independent of the action taken. The effort of feeding the baby adds a further ``−5`` reward, whereas singing adds ``−0.5`` reward. As baby caregivers, we seek the optimal infinite-horizon policy with discount factor ``\gamma = 0.9``.
## [QuickPOMDP Interface](@id quick_crying)
```julia
using POMDPs
using POMDPTools
using QuickPOMDPs

quick_crying_baby_pomdp = QuickPOMDP(
states = [:sated, :hungry],
actions = [:feed, :sing, :ignore],
observations = [:quiet, :crying],
initialstate = Deterministic(:sated),
discount = 0.9,
transition = function (s, a)
if a == :feed
return Deterministic(:sated)
elseif s == :sated # :sated and a != :feed
return SparseCat([:sated, :hungry], [0.9, 0.1])
else # s == :hungry and a != :feed
return Deterministic(:hungry)
end
end,
observation = function (a, sp)
if sp == :hungry
if a == :sing
return SparseCat([:crying, :quiet], [0.9, 0.1])
else # a == :ignore || a == :feed
return SparseCat([:crying, :quiet], [0.8, 0.2])
end
else # sp = :sated
if a == :sing
return Deterministic(:quiet)
else # a == :ignore || a == :feed
return SparseCat([:crying, :quiet], [0.1, 0.9])
            end
        end
end,
reward = function (s, a)
r = 0.0
if s == :hungry
r += -10.0
end
if a == :feed
r += -5.0
elseif a == :sing
            r += -0.5
end
return r
end
)
```
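
Because `QuickPOMDP` produces a model that supports the explicit interface, the definition can be spot-checked by querying it directly. The calls below are a brief sketch of that kind of check; the expected values in the comments follow from the problem description above.

```julia
# Query the QuickPOMDP model through the standard POMDPs.jl interface functions.
transition(quick_crying_baby_pomdp, :sated, :ignore)  # SparseCat over [:sated, :hungry]
observation(quick_crying_baby_pomdp, :sing, :hungry)  # SparseCat over [:crying, :quiet]
reward(quick_crying_baby_pomdp, :hungry, :feed)       # -15.0
rand(initialstate(quick_crying_baby_pomdp))           # :sated
```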

## [Explicit Interface](@id explicit_crying)
```julia
using POMDPs
using POMDPTools

struct CryingBabyState
hungry::Bool
end

struct CryingBabyPOMDP <: POMDP{CryingBabyState, Symbol, Symbol}
p_sated_to_hungry::Float64
p_cry_feed_hungry::Float64
p_cry_sing_hungry::Float64
p_cry_ignore_hungry::Float64
p_cry_feed_sated::Float64
p_cry_sing_sated::Float64
p_cry_ignore_sated::Float64
reward_hungry::Float64
reward_feed::Float64
reward_sing::Float64
discount_factor::Float64
end

function CryingBabyPOMDP(;
p_sated_to_hungry=0.1,
p_cry_feed_hungry=0.8,
p_cry_sing_hungry=0.9,
p_cry_ignore_hungry=0.8,
p_cry_feed_sated=0.1,
p_cry_sing_sated=0.0,
p_cry_ignore_sated=0.1,
reward_hungry=-10.0,
reward_feed=-5.0,
reward_sing=-0.5,
discount_factor=0.9
)
return CryingBabyPOMDP(p_sated_to_hungry, p_cry_feed_hungry,
p_cry_sing_hungry, p_cry_ignore_hungry, p_cry_feed_sated,
p_cry_sing_sated, p_cry_ignore_sated, reward_hungry,
reward_feed, reward_sing, discount_factor)
end

POMDPs.actions(::CryingBabyPOMDP) = [:feed, :sing, :ignore]
POMDPs.states(::CryingBabyPOMDP) = [CryingBabyState(false), CryingBabyState(true)]
POMDPs.observations(::CryingBabyPOMDP) = [:crying, :quiet]
POMDPs.stateindex(::CryingBabyPOMDP, s::CryingBabyState) = s.hungry ? 2 : 1
POMDPs.obsindex(::CryingBabyPOMDP, o::Symbol) = o == :crying ? 1 : 2
POMDPs.actionindex(::CryingBabyPOMDP, a::Symbol) = a == :feed ? 1 : a == :sing ? 2 : 3

function POMDPs.transition(pomdp::CryingBabyPOMDP, s::CryingBabyState, a::Symbol)
if a == :feed
return Deterministic(CryingBabyState(false))
    elseif !s.hungry # sated and a != :feed
return SparseCat([CryingBabyState(false), CryingBabyState(true)], [1 - pomdp.p_sated_to_hungry, pomdp.p_sated_to_hungry])
else # s == :hungry and a != :feed
return Deterministic(CryingBabyState(true))
end
end

function POMDPs.observation(pomdp::CryingBabyPOMDP, a::Symbol, sp::CryingBabyState)
if sp.hungry
if a == :sing
return SparseCat([:crying, :quiet], [pomdp.p_cry_sing_hungry, 1 - pomdp.p_cry_sing_hungry])
        elseif a == :ignore
return SparseCat([:crying, :quiet], [pomdp.p_cry_ignore_hungry, 1 - pomdp.p_cry_ignore_hungry])
else # a == :feed
return SparseCat([:crying, :quiet], [pomdp.p_cry_feed_hungry, 1 - pomdp.p_cry_feed_hungry])
end
else # sated
if a == :sing
return SparseCat([:crying, :quiet], [pomdp.p_cry_sing_sated, 1 - pomdp.p_cry_sing_sated])
        elseif a == :ignore
return SparseCat([:crying, :quiet], [pomdp.p_cry_ignore_sated, 1 - pomdp.p_cry_ignore_sated])
else # a == :feed
return SparseCat([:crying, :quiet], [pomdp.p_cry_feed_sated, 1 - pomdp.p_cry_feed_sated])
end
end
end

function POMDPs.reward(pomdp::CryingBabyPOMDP, s::CryingBabyState, a::Symbol)
r = 0.0
if s.hungry
r += pomdp.reward_hungry
end
if a == :feed
r += pomdp.reward_feed
elseif a == :sing
r += pomdp.reward_sing
end
return r
end

POMDPs.discount(pomdp::CryingBabyPOMDP) = pomdp.discount_factor

POMDPs.initialstate(::CryingBabyPOMDP) = Deterministic(CryingBabyState(false))

explicit_crying_baby_pomdp = CryingBabyPOMDP()
```
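
With every distribution enumerated by hand, it is easy for probabilities to end up not summing to one. One way to guard against this (a sketch, assuming the testing utilities exported by POMDPTools) is:

```julia
# Check that all transition and observation distributions have non-negative
# probabilities that sum to one.
using POMDPTools
@assert has_consistent_distributions(explicit_crying_baby_pomdp)
```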

## [Generative Interface](@id gen_crying)
This crying baby problem is small enough that it should not be implemented with the generative interface in practice (the explicit distributions above are easy to define and support more solvers). However, this example is provided for pedagogical purposes.

```julia
using POMDPs
using POMDPTools
using Random

struct GenCryingBabyState
hungry::Bool
end

struct GenCryingBabyPOMDP <: POMDP{GenCryingBabyState, Symbol, Symbol}
p_sated_to_hungry::Float64
p_cry_feed_hungry::Float64
p_cry_sing_hungry::Float64
p_cry_ignore_hungry::Float64
p_cry_feed_sated::Float64
p_cry_sing_sated::Float64
p_cry_ignore_sated::Float64
reward_hungry::Float64
reward_feed::Float64
reward_sing::Float64
discount_factor::Float64

GenCryingBabyPOMDP() = new(0.1, 0.8, 0.9, 0.8, 0.1, 0.0, 0.1, -10.0, -5.0, -0.5, 0.9)
end

function POMDPs.gen(pomdp::GenCryingBabyPOMDP, s::GenCryingBabyState, a::Symbol, rng::AbstractRNG)
if a == :feed
sp = GenCryingBabyState(false)
else
sp = rand(rng) < pomdp.p_sated_to_hungry ? GenCryingBabyState(true) : GenCryingBabyState(false)
end

if sp.hungry
if a == :sing
o = rand(rng) < pomdp.p_cry_sing_hungry ? :crying : :quiet
        elseif a == :ignore
o = rand(rng) < pomdp.p_cry_ignore_hungry ? :crying : :quiet
else # a == :feed
o = rand(rng) < pomdp.p_cry_feed_hungry ? :crying : :quiet
end
else # sated
if a == :sing
o = rand(rng) < pomdp.p_cry_sing_sated ? :crying : :quiet
        elseif a == :ignore
o = rand(rng) < pomdp.p_cry_ignore_sated ? :crying : :quiet
else # a == :feed
o = rand(rng) < pomdp.p_cry_feed_sated ? :crying : :quiet
end
end

r = 0.0
if sp.hungry
r += pomdp.reward_hungry
end
if a == :feed
r += pomdp.reward_feed
elseif a == :sing
r += pomdp.reward_sing
end

return (sp=sp, o=o, r=r)
end

POMDPs.initialstate(::GenCryingBabyPOMDP) = Deterministic(GenCryingBabyState(false))

gen_crying_baby_pomdp = GenCryingBabyPOMDP()
```
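
Since only `gen` and `initialstate` are defined, this model can only be interacted with by sampling. The snippet below is a short sketch of drawing a single step with the `@gen` macro from POMDPs.jl, which calls the `gen` function defined above.

```julia
rng = MersenneTwister(1)
s = rand(rng, initialstate(gen_crying_baby_pomdp))  # GenCryingBabyState(false)
sp, o, r = @gen(:sp, :o, :r)(gen_crying_baby_pomdp, s, :ignore, rng)
```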

## [Probability Tables](@id tab_crying)
For this implementation we will use the following indices:
- States
- `:sated` = 1
- `:hungry` = 2
- Actions
- `:feed` = 1
- `:sing` = 2
- `:ignore` = 3
- Observations
- `:crying` = 1
- `:quiet` = 2

```julia
using POMDPModels

T = zeros(2, 3, 2) # |S'| x |A| x |S|, T[sp, a, s] = p(sp | a, s)
T[:, 1, :] = [1.0 1.0;
0.0 0.0]
T[:, 2, :] = [0.9 0.0;
0.1 1.0]
T[:, 3, :] = [0.9 0.0;
0.1 1.0]

O = zeros(2, 3, 2) # |O| x |A| x |S'|, O[o, a, sp] = p(o | a, sp)
O[:, 1, :] = [0.1 0.8;
0.9 0.2]
O[:, 2, :] = [0.0 0.9;
1.0 0.1]
O[:, 3, :] = [0.1 0.8;
0.9 0.2]

R = zeros(2, 3) # |S| x |A|, R[s, a]
R = [-5.0 -0.5 0.0;
     -15.0 -10.5 -10.0]

discount = 0.9

tabular_crying_baby_pomdp = TabularPOMDP(T, R, O, discount)
```
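
With raw arrays it is easy to transpose an index by accident, so a quick check that every slice is a valid probability distribution can catch mistakes early (a sketch, assuming the arrays defined above are in scope):

```julia
# Summing over the first index (sp for T, o for O) should give 1
# for every action/state combination.
@assert all(sum(T, dims=1) .≈ 1.0)
@assert all(sum(O, dims=1) .≈ 1.0)
```
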
2 changes: 2 additions & 0 deletions docs/src/example_gridworld_mdp.md
@@ -0,0 +1,2 @@
# GridWorld MDP using Value Iteration and MCTS
