Misc experiments

$$V(s) \, = \max_a \left\{ \sum_{s'} P_a(s,s') \left( R_a(s,s') + \gamma V(s') \right) \right\}$$

Without \left and \right:

$$V(s) \, = \max_a \{ \sum_{s'} P_a(s,s') ( R_a(s,s') + \gamma V(s') ) \}$$
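The equation above is the Bellman optimality equation, and one common way to solve it numerically is value iteration: apply the right-hand side repeatedly until $V$ stops changing. The following is a minimal sketch, not something taken from this repository. The function names, the nested-list layout `P[a][s][s2]` / `R[a][s][s2]`, and the two-state toy MDP are all assumptions made for illustration.

```python
def bellman_backup(V, P, R, gamma):
    """Apply the right-hand side of the equation once to every state.

    P[a][s][s2] is the probability of moving from s to s2 under action a,
    R[a][s][s2] is the reward for that transition (hypothetical layout).
    """
    n_actions, n_states = len(P), len(V)
    return [
        max(
            sum(P[a][s][s2] * (R[a][s][s2] + gamma * V[s2])
                for s2 in range(n_states))
            for a in range(n_actions)
        )
        for s in range(n_states)
    ]


def value_iteration(P, R, gamma, tol=1e-10, max_iters=10_000):
    """Iterate the backup from V = 0 until the largest change is below tol."""
    V = [0.0] * len(P[0])
    for _ in range(max_iters):
        V_new = bellman_backup(V, P, R, gamma)
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new
    return V


# Toy MDP (assumed): action 0 stays put, action 1 switches state,
# and landing in state 1 pays a reward of 1.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
R = [
    [[0.0, 1.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
V = value_iteration(P, R, gamma=0.5)
```

For this toy problem the fixed point can be checked by hand: staying in state 1 forever gives $V(1) = 1/(1 - 0.5) = 2$, and switching from state 0 gives $V(0) = 1 + 0.5 \cdot 2 = 2$.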

The monad-parametrized representation is isomorphic to a state transition function together with an initial state of type $s$, i.e., $(a \times s \to b \times s) \ \times \ s$ (where $a$ and $b$ are the input and output value types, respectively).
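To make the pair $(a \times s \to b \times s) \times s$ concrete, here is a minimal sketch of the right-hand side of the isomorphism only: a step function bundled with an initial state, plus a driver that threads the state through a sequence of inputs. It does not model the monad-parametrized representation itself, and the names `run`, `running_sum`, and `machine` are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

A = TypeVar("A")  # input value type
B = TypeVar("B")  # output value type
S = TypeVar("S")  # state type


def run(step: Callable[[A, S], Tuple[B, S]], state: S,
        inputs: Iterable[A]) -> List[B]:
    """Feed inputs through the transition function, threading the state."""
    outputs = []
    for x in inputs:
        y, state = step(x, state)  # (a, s) -> (b, s)
        outputs.append(y)
    return outputs


def running_sum(x: int, total: int) -> Tuple[int, int]:
    """Example transition: output and next state are both the new total."""
    total += x
    return total, total


# The pair (transition function, initial state).
machine = (running_sum, 0)
result = run(*machine, [1, 2, 3, 4])
print(result)  # [1, 3, 6, 10]
```

Keeping the initial state next to the step function is what makes the pair self-contained: `run` needs nothing else to produce the output sequence.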