To replicate the experiments, see here.
This project provides a number of solvers for JLabGym. JLabGym is a Java-based environment that lets you use the game Lab Recruits as an 'AI Gym': an environment in which you can try out your AI, or whatever algorithm X, to control an agent that performs tasks in the environment. The Gym itself is included in this project; for its documentation, see here. In the original gym, the task is to discover connections between buttons and doors in a given Lab Recruits level. Here we change that: the goal is to open a given target door (more precisely, to reach a state where the door is observed to be open).
Provided algorithms:
- Random: repeatedly alternates between exploring to discover game objects and randomly choosing a pair (b,d) of a button and a door; it then toggles b to find out whether it opens d. Implementing class: `BaseSearchAlgorithm`.
- Evo: implements an evolutionary algorithm for automated testing. Each gene in a chromosome is a sequence `navigateTo(o); interact(o)` where o is a button. Exploration is implicitly invoked if o is not yet known to the agent. The fitness of a chromosome is maximal if it solves the given testing task; otherwise it is the number of button-door connections found when the chromosome is executed, plus the number of doors that are open at the end of the execution (a sketch of this fitness computation appears after this list). Chromosomes are generated offline, without taking into account whether they can actually be fully executed; if a chromosome is only partially executable, its fitness is calculated after the last executable gene. Implementing class: `Evolutionary`.
- MCTS: implements a reinforcement learning algorithm called Monte Carlo tree search, popularly used to train computers to play board games such as Go and Hex. The search tree is generated online. At each node N in the search tree, possible actions take the form of a sequence `navigateTo(o); interact(o)` where o is a button. However, unlike Evo (which is offline), only buttons that are reachable from N are considered, so every path in the search tree is always executable. The reward of a play is defined in the same way as the fitness value of Evo. Implementing class: `MCTS`. The current implementation stops the algorithm as soon as a sequence of steps that completes the given task is found; if you wish to continue the learning, you can modify the termination condition in the implementation. Note that despite this early termination, each episode still learns from previous ones.
- Q: implements the Q-learning algorithm, a reinforcement learning algorithm. Each 'state' in the Q-table takes the form of a sequence σ of the buttons interacted with so far. The table is populated incrementally as the algorithm proceeds. Like MCTS, Q is online: when it is in a state σ, possible actions again take the form of a sequence a = `navigateTo(o); interact(o)` where o is a button, and only buttons that are reachable from σ are considered. The direct reward of executing a is a maximum value if it solves the given testing task; otherwise it is 3C + O, where C is the number of new button-door connections discovered by a and O is the number of open doors in the new state minus the number of open doors in σ (see the sketch after this list). Implementing class: `QAlg`. As in MCTS, the implementation stops the algorithm as soon as a sequence of actions that completes the given task is found. You can modify the termination condition in the implementation. Note that despite this early termination, each episode still learns from previous ones.
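To make the fitness and reward definitions above concrete, here is a minimal sketch in Java of how such values could be computed from the state observed after executing a chromosome (Evo) or an action (Q); MCTS uses the same quantity as Evo's fitness as the reward of a play. The class and method names below are hypothetical and only illustrate the calculation; they are not the actual code in `BaseSearchAlgorithm`, `Evolutionary`, `MCTS`, or `QAlg`.

```java
/**
 * Hypothetical sketch (not the actual project code) of the fitness/reward
 * calculations described in the algorithm list above.
 */
public class RewardSketch {

    /** Large value returned when the testing task itself is solved. */
    static final int MAX_REWARD = 1_000_000;

    /**
     * Evo-style fitness: maximal if the target door is observed to be open,
     * otherwise the number of button-door connections discovered while
     * executing the chromosome plus the number of doors open at the end.
     */
    static int evoFitness(boolean taskSolved, int connectionsFound, int doorsOpenAtEnd) {
        if (taskSolved) return MAX_REWARD;
        return connectionsFound + doorsOpenAtEnd;
    }

    /**
     * Q-style direct reward of an action a: maximal if a solves the task,
     * otherwise 3C + O, where C is the number of new connections discovered
     * by a and O is the change in the number of open doors.
     */
    static int qReward(boolean taskSolved, int newConnections,
                       int doorsOpenAfter, int doorsOpenBefore) {
        if (taskSolved) return MAX_REWARD;
        return 3 * newConnections + (doorsOpenAfter - doorsOpenBefore);
    }
}
```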
All these algorithms are implemented to operate on top of the automated navigation and exploration provided by another library called iv4xrDemo, which in turn is based on a BDI agent programming framework called iv4xr/aplib. By leveraging this auto-navigation and exploration, the algorithms can operate using 'actions' at a high level: an algorithm only needs to specify which button or door it wants to go to and interact with; the underlying path-finding algorithm will guide the test agent to the target item, provided the item's location is known to the agent (e.g. it saw it a few minutes ago) and the agent believes that the path to the item is clear (e.g. not blocked by a closed door, or a door the agent believes to be closed).
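As an illustration of what such a high-level action looks like, the sketch below models it as a navigate-then-interact pair. The interface here is a stand-in introduced only for this example; the real navigation and interaction goals are provided by iv4xrDemo (on top of iv4xr/aplib) and have different names and signatures.

```java
/**
 * Hypothetical sketch of a high-level action as used by the algorithms above.
 * TestAgentApi is a stand-in for the navigation/interaction layer that
 * iv4xrDemo provides; it is not the actual iv4xrDemo API.
 */
public class HighLevelActionSketch {

    /** Stand-in for the auto-navigation layer provided by iv4xrDemo. */
    interface TestAgentApi {
        /** Path-find to an in-game object, if its location is known and believed reachable. */
        boolean navigateTo(String objectId);
        /** Interact with (e.g. toggle) the object the agent has reached. */
        boolean interact(String objectId);
    }

    /**
     * A single high-level action as used by Evo, MCTS, and Q:
     * navigateTo(o); interact(o), where o is a button.
     */
    static boolean toggleButton(TestAgentApi agent, String buttonId) {
        return agent.navigateTo(buttonId) && agent.interact(buttonId);
    }
}
```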
You need to first install the game Lab Recruits. You can get a pre-compiled executable from the GitHub home of Lab Recruits; we need version 2.3.3. If the executables are no longer there, you will have to build the game yourself using Unity :) See the README of Lab Recruits for the specific version of Unity that you need.
In the project root, create a directory named `gym`, if it is not already there. Then put the Lab Recruits executable there:
```
(project root)
|-- gym
    |-- Windows
    |   |-- bin
    |-- Mac
    |   |-- bin
    |-- Linux
        |-- bin
```
- Windows: put `LabRecruits.exe` and related files in `gym/Windows/bin`.
- Mac: put `LabRecruits.app` in `gym/Mac/bin`.
- Linux: put the `LabRecruits` executable and related files in `gym/Linux/bin`.
After this, you are good to go.
See the method `createAnAlgorithm()` in the class `STVRExperiment`. There you can specify which algorithm you want to run, which Lab Recruits level you want to target, etc. Invoking `createAnAlgorithm()` returns an instance i of `MyTestingAI`. Invoking `i.exploreLRLogic()` will run the algorithm.
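A minimal sketch of that flow is shown below. It is only illustrative: the actual parameters of `createAnAlgorithm()` (algorithm choice, target level, whether it is static, any exceptions thrown) are defined in `STVRExperiment` and may differ from what is assumed here.

```java
// Hypothetical usage sketch; check STVRExperiment for the real signature
// of createAnAlgorithm() and how the algorithm and level are selected.
public class RunAlgorithmSketch {
    public static void main(String[] args) throws Exception {
        // ASSUMPTION: createAnAlgorithm() is called with no arguments here;
        // in STVRExperiment this is where you choose the algorithm to run
        // and the Lab Recruits level to target.
        MyTestingAI i = STVRExperiment.createAnAlgorithm();
        // Run the selected algorithm on the selected level.
        i.exploreLRLogic();
    }
}
```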