QuickPOMDP Interfaces in python do not work: Solver application and isterminal issues #8

Open
afansi opened this issue Nov 28, 2019 · 4 comments

@afansi

afansi commented Nov 28, 2019

Hello Guys,

I have been trying to use the QuickPOMDPs interfaces from Python. I managed to use the DiscreteExplicitPOMDP interface, but I am struggling to use the QuickPOMDP interface from Python.

Here are the difficulties I am facing:

  1. I can define a QuickPOMDP object, but I cannot apply any solver to it, for example SARSOP. I was able to run this solver on a DiscreteExplicitPOMDP object, as in tiger.py, so it is not an installation issue. I get the following error (the same happens with other solvers such as QMDPSolver):

========================================

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-5d5ce468204e> in <module>
      1 solver = SARSOPSolver()
----> 2 policy = solve(solver, pomdp)

TypeError: '>' not supported between instances of 'method' and 'int'

=============================================

Here is my code:

import julia
from julia.QuickPOMDPs import DiscreteExplicitPOMDP, QuickPOMDP
from julia.POMDPs import solve, pdf
from julia.QMDP import QMDPSolver
from julia.SARSOP import SARSOPSolver
from julia.POMDPPolicies import alphavectors, RandomPolicy
from julia.POMDPModelTools import Deterministic, SparseCat
from julia.POMDPSimulators import stepthrough, HistoryRecorder, eachstep, simulate
from julia.BeliefUpdaters import DiscreteUpdater
from julia import Base
from julia import Random

import pickle
import itertools
import copy
import time
from collections import namedtuple
import typing
import random

class POMDPGenerator:
    def __init__(self, seed=1234):
        self.states = ['left', 'right']
        self.actions = ['left', 'right', 'listen']
        self.observations = ['left', 'right']
        
        self.rng = random.Random(seed)
        self.good_obs = .85
        self.init_state = .5
        self.random_obs = .5
  
    
    # POMDPs.jl index functions return 1-based indices (Julia arrays start at 1)
    def stateindex(self, s):
        return self.states.index(s) + 1
    
    def actionindex(self, a):
        return self.actions.index(a) + 1
    
    def obsindex(self, o):
        return self.observations.index(o) + 1
    
    def initialstate_distribution(self):
        #self.init_state = self.rng.random()    
        return SparseCat(self.states, [self.init_state, 1-self.init_state])
    
    def initialstate(self, rng):
        # return Random.rand(rng, self.initialstate_distribution())
        return Random.rand(self.initialstate_distribution())
    
    def transition(self, s, a):
        if a == 'listen':
            sp = s
            return SparseCat([sp], [1.0])
        else: # a door is opened
            return self.initialstate_distribution()
        
    def transition2(self, s, a, sp):
        if a == 'listen':
            if sp == s:
                return 1.0
            else:
                return 0.0
        else: # a door is opened
            #d= self.initialstate_distribution()
            if sp=='left':
                return self.init_state
            else:
                return 1.0-self.init_state
        
    def observation(self, s, a, sp):        
        return self.observation2(a, sp)
        
    def observation2(self, a, sp):
        if a == 'listen':
            if 'left' == sp:
                return SparseCat(['left', 'right'], [self.good_obs, 1.0-self.good_obs])
            else:
                return SparseCat(['left', 'right'], [1.0-self.good_obs, self.good_obs])
        else:
            return SparseCat(['left', 'right'], [self.random_obs, 1.0-self.random_obs])
        
    def observation3(self, a, sp, o):
        if a == 'listen':
            if o == sp:
                return self.good_obs
            else:
                return 1.0-self.good_obs
        else:
            if o == 'left':
                return self.random_obs
            else:
                return 1.0-self.random_obs

            
    def reward(self, s, a):
        if a == 'listen':
            return -1.0
        elif s == a: # the tiger was found
            return -100.0
        else: # the tiger was escaped
            return 10.0
        
    def generate_pomdp(self, discount=0.95):
        
        return QuickPOMDP(
            initialstate_distribution = self.initialstate_distribution,
            transition = self.transition,
            observation = self.observation,
            reward=self.reward,
            states=self.states,
            actions=self.actions,
            observations=self.observations,
            initialstate=self.initialstate(Random.AbstractRNG),
            discount=discount,
            stateindex=self.stateindex,
            actionindex=self.actionindex,
            obsindex=self.obsindex,
        )
    
    def generate_pomdp2(self, discount=0.95):
        
        return DiscreteExplicitPOMDP(
            
            self.states,
            self.actions,
            self.observations,
            self.transition2,
            self.observation3,
            self.reward,
            discount,
            self.initialstate_distribution(),
        )
    
    def isterminal(self, s):
        return s =='terminal'
    
    def generate_pomdp_with_terminal(self, discount=0.95):
        
        return QuickPOMDP(
            initialstate_distribution = self.initialstate_distribution,
            transition = self.transition,
            observation = self.observation,
            reward=self.reward,
            states=self.states + ['terminal'],
            actions=self.actions,
            observations=self.observations,
            initialstate=self.initialstate(Random.AbstractRNG),
            discount=discount,
            stateindex=self.stateindex,
            actionindex=self.actionindex,
            obsindex=self.obsindex,
            isterminal=self.isterminal
        )
    
    def generate_pomdp_without_terminal(self, discount=0.95):
        
        return QuickPOMDP(
            initialstate_distribution = self.initialstate_distribution,
            transition = self.transition,
            observation = self.observation,
            reward=self.reward,
            states=self.states + ['terminal'],
            actions=self.actions,
            observations=self.observations,
            initialstate=self.initialstate(Random.AbstractRNG),
            discount=discount,
            stateindex=self.stateindex,
            actionindex=self.actionindex,
            obsindex=self.obsindex,
        )




Gen = POMDPGenerator()
pomdp = Gen.generate_pomdp()

solver = SARSOPSolver()
policy = solve(solver, pomdp)

print('alpha vectors:')
for v in alphavectors(policy):
    print(v)

print()

for step in stepthrough(pomdp, policy, "s,a,o", max_steps=10):
    print(step.s)
    print(step.a)
    print(step.o)
    print()

  2. Another problem is that the isterminal parameter of QuickPOMDP is not handled correctly. When I implement a function isterminal(s) that returns a boolean and pass it as the isterminal argument of the QuickPOMDP interface, I get this error:

==========================================

TypeError: Julia exception: TypeError: non-boolean (PyObject) used in boolean context
Stacktrace:
 [1] iterate(::POMDPSimulators.POMDPSimIterator{(:s, :a, :o),QuickPOMDPs.QuickPOMDP{UUID("0fbc5ae6-f61e-43a8-ac7e-9abfa223a2ec"),String,String,String,NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :initialstate_distribution, :observation, :actionindex, :transition, :reward, :initialstate),Tuple{PyObject,PyObject,PyObject,Array{String,1},Array{String,1},Float64,Array{String,1},PyObject,PyObject,PyObject,PyObject,PyObject,String}}},POMDPPolicies.RandomPolicy{Random.MersenneTwister,QuickPOMDPs.QuickPOMDP{UUID("0fbc5ae6-f61e-43a8-ac7e-9abfa223a2ec"),String,String,String,NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :initialstate_distribution, :observation, :actionindex, :transition, :reward, :initialstate),Tuple{PyObject,PyObject,PyObject,Array{String,1},Array{String,1},Float64,Array{String,1},PyObject,PyObject,PyObject,PyObject,PyObject,String}}},BeliefUpdaters.NothingUpdater},BeliefUpdaters.NothingUpdater,Random.MersenneTwister,Nothing,String}, ::Tuple{Int64,String,Nothing}) at /home/fansitca/.julia/packages/POMDPSimulators/nMXAP/src/stepthrough.jl:86 (repeats 2 times)
 [2] jlwrap_iterator(::POMDPSimulators.POMDPSimIterator{(:s, :a, :o),QuickPOMDPs.QuickPOMDP{UUID("0fbc5ae6-f61e-43a8-ac7e-9abfa223a2ec"),String,String,String,NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :initialstate_distribution, :observation, :actionindex, :transition, :reward, :initialstate),Tuple{PyObject,PyObject,PyObject,Array{String,1},Array{String,1},Float64,Array{String,1},PyObject,PyObject,PyObject,PyObject,PyObject,String}}},POMDPPolicies.RandomPolicy{Random.MersenneTwister,QuickPOMDPs.QuickPOMDP{UUID("0fbc5ae6-f61e-43a8-ac7e-9abfa223a2ec"),String,String,String,NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :initialstate_distribution, :observation, :actionindex, :transition, :reward, :initialstate),Tuple{PyObject,PyObject,PyObject,Array{String,1},Array{String,1},Float64,Array{String,1},PyObject,PyObject,PyObject,PyObject,PyObject,String}}},BeliefUpdaters.NothingUpdater},BeliefUpdaters.NothingUpdater,Random.MersenneTwister,Nothing,String}) at /home/fansitca/.julia/packages/PyCall/ttONZ/src/pyiterator.jl:150
 [3] pyjlwrap_getiter(::Ptr{PyCall.PyObject_struct}) at /home/fansitca/.julia/packages/PyCall/ttONZ/src/pyiterator.jl:131
 [4] macro expansion at /home/fansitca/.julia/packages/PyCall/ttONZ/src/exception.jl:81 [inlined]
 [5] __pycall!(::PyObject, ::Ptr{PyCall.PyObject_struct}, ::PyObject, ::Ptr{Nothing}) at /home/fansitca/.julia/packages/PyCall/ttONZ/src/pyfncall.jl:44
 [6] _pycall!(::PyObject, ::PyObject, ::Tuple{Array{String,1}}, ::Int64, ::Ptr{Nothing}) at /home/fansitca/.julia/packages/PyCall/ttONZ/src/pyfncall.jl:29
 [7] #call#111 at /home/fansitca/.julia/packages/PyCall/ttONZ/src/pyfncall.jl:11 [inlined]
 [8] (::PyObject)(::Array{String,1}) at /home/fansitca/.julia/packages/PyCall/ttONZ/src/pyfncall.jl:89
 [9] top-level scope at none:0
 [10] eval(::Module, ::Any) at ./boot.jl:319
 [11] exec_options(::Base.JLOptions) at ./client.jl:243
 [12] _start() at ./client.jl:425

=========================================

The following code reproduces the error:

Gen = POMDPGenerator()
pomdp = Gen.generate_pomdp_with_terminal()
policy = RandomPolicy(pomdp)
for step in stepthrough(pomdp, policy, "s,a,o", max_steps=10):
    print(step.s)
    print(step.a)
    print(step.o)
    print()

Note that the same code works when the isterminal parameter is removed:

Gen = POMDPGenerator()
pomdp = Gen.generate_pomdp_without_terminal()
policy = RandomPolicy(pomdp)
for step in stepthrough(pomdp, policy, "s,a,o", max_steps=10):
    print(step.s)
    print(step.a)
    print(step.o)
    print()
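
The Julia error says a PyObject, not a Bool, was used as a condition, which fits the maintainer's guess in the reply below, so the Python callbacks are probably not at fault. It is still cheap to rule them out. The sketch below is a minimal pure-Python pre-flight check, assuming the model is described by plain Python callables like those in POMDPGenerator; `check_discrete_callbacks` is a hypothetical helper, not part of QuickPOMDPs or POMDPs.jl. It checks that the state index maps onto 1..n (Julia indexing is 1-based), that isterminal returns a genuine bool, and that every transition row sums to one.

```python
from typing import Callable, List, Sequence


def check_discrete_callbacks(
    states: Sequence[str],
    actions: Sequence[str],
    stateindex: Callable[[str], int],
    transition_prob: Callable[[str, str, str], float],
    isterminal: Callable[[str], bool],
    tol: float = 1e-9,
) -> List[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []

    # Julia arrays are 1-based, so the indices should be exactly 1..n.
    indices = sorted(stateindex(s) for s in states)
    if indices != list(range(1, len(states) + 1)):
        problems.append(f"stateindex does not map onto 1..{len(states)}: {indices}")

    # isterminal should return a real bool, not merely a truthy object.
    for s in states:
        result = isterminal(s)
        if not isinstance(result, bool):
            problems.append(f"isterminal({s!r}) returned {type(result).__name__}")

    # Every row T(s, a, .) should be a probability distribution over `states`.
    for s in states:
        for a in actions:
            total = sum(transition_prob(s, a, sp) for sp in states)
            if abs(total - 1.0) > tol:
                problems.append(f"T({s!r}, {a!r}, .) sums to {total}")

    return problems


if __name__ == "__main__":
    states = ["left", "right", "terminal"]
    actions = ["left", "right", "listen"]

    # Same logic as POMDPGenerator.transition2 with init_state = 0.5.
    def transition2(s, a, sp):
        init_state = 0.5
        if a == "listen":
            return 1.0 if sp == s else 0.0
        return init_state if sp == "left" else 1.0 - init_state

    for problem in check_discrete_callbacks(
        states, actions, lambda s: states.index(s) + 1, transition2, lambda s: s == "terminal"
    ):
        print(problem)
```

Run as-is, this flags the door-opening rows: if transition2 were reused with 'terminal' added to the state list, 'terminal' falls into its else branch, so those rows sum to 1.5 instead of 1.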

I am using Julia 1.0.5 (2019-09-09). Here is the list of installed packages:

Dict{String,Union{Nothing, VersionNumber}} with 18 entries:
  "BeliefUpdaters"  => v"0.1.2"
  "POMDPModelTools" => v"0.2.0"
  "Distributions"   => v"0.21.8"
  "QuickPOMDPs"     => v"0.2.0"
  "BasicPOMCP"      => v"0.2.1"
  "PyCall"          => v"1.91.2"
  "QMDP"            => v"0.1.2"
  "Compose"         => v"0.7.4"
  "IJulia"          => v"1.20.2"
  "Colors"          => v"0.9.6"
  "POMDPSimulators" => v"0.3.2"
  "POMDPPolicies"   => v"0.2.1"
  "SARSOP"          => v"0.4.0"
  "StaticArrays"    => v"0.12.1"
  "POMDPToolbox"    => v"0.3.0"
  "POMDPGifs"       => v"0.1.0"
  "POMDPs"          => v"0.8.1"
  "Parameters"      => v"0.12.0"

@zsunberg
Member

Hi @afansi, thanks for reporting this! I think it should be fairly straightforward to fix. I'm guessing the problem is that QuickPOMDPs does not recognize that the PyObject is a function. I'll address it as soon as possible after the Thanksgiving holiday.
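
The guess above can be illustrated with a pure-Python analogy. This is not QuickPOMDPs' Julia code; `Model`, `resolve` and `naive_resolve` are hypothetical names. Suppose a model field may hold either a constant or a function, and the package decides which by a type check. If the check is too narrow (in Julia, plausibly because a PyObject is not a subtype of `Function`), the callable is returned unevaluated, and the first comparison against a number fails. A bound method such as `self.transition` has Python type `method`, which is the type named in the first error message.

```python
import types


class Model:
    """Hypothetical stand-in for a class whose bound methods are passed as callbacks."""

    def prob(self):
        return 0.85


def resolve(field, *args):
    """Call `field` if it is any kind of callable; otherwise treat it as a constant."""
    if callable(field):
        return field(*args)
    return field


def naive_resolve(field, *args):
    """Too-narrow check: only plain functions count, so bound methods slip through."""
    if isinstance(field, types.FunctionType):
        return field(*args)
    return field


if __name__ == "__main__":
    m = Model()
    print(resolve(m.prob) > 0)  # the method is called and returns 0.85
    try:
        _ = naive_resolve(m.prob) > 0  # the method object itself is compared with 0
    except TypeError as err:
        print(err)
```

Checking with `callable` accepts plain functions, bound methods and callable objects alike, while the narrow `isinstance` check reproduces the same `'>' not supported between instances of 'method' and 'int'` failure.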

@zsunberg
Member

Working on this... almost done.

@zsunberg
Member

zsunberg commented Nov 30, 2019

@afansi, this has been fixed in the quick_pycall branch in #9. See examples/issue_8.py for some changes that had to be made to the Python code. I'm still deciding whether to merge it, because it introduces a dependency on PyCall.

@afansi
Author

afansi commented Dec 1, 2019

Hi @zsunberg, thank you. I will give it a try right away.
