
RLbT

Welcome to the RLbT wiki!

RLbT is a tool that applies curiosity-driven Reinforcement Learning to the automated testing of games. RLbT represents high-level actions in the game, such as interacting with an entity or moving to an entity, as the possible choices of the RL agent, and the corresponding reward is computed from the state reached after the action has been performed. RLbT performs explorative testing to increase coverage of the game under test: the reward function encourages the agent to be curious and to explore as many new areas of the game as possible. RLbT keeps track of coverage information as well as additional data, such as the agent's positions for calculating spatial coverage, and reports it at the end of the exploration session.

RLbT works in two different modes: 1) single-agent mode and 2) multi-agent mode. In single-agent mode, a single player agent is deployed in the game and performs actions, while in multi-agent mode multiple agents are deployed. In the latter case, the agents are specialised into active and passive (observer) agents. The active agent (typically one per session) performs the actions in the game environment, while the passive agents observe the environment and report their observations to the active agent.

First run

The RLbT build is managed by Maven, and it should suffice to run the following command from the project root:

mvn clean package -DskipTests

This should perform a basic build and produce an executable jar, iv4xr-rlbt-x.x-jar-with-dependencies.jar, in the target folder. To get a flavour of RLbT in single-agent mode, run the following command:

java -jar target/iv4xr-rlbt-x.x-jar-with-dependencies.jar -trainingMode -burlapConfig src/test/resources/configurations/burlap_test.config -sutConfig src/test/resources/configurations/lrLevelSingleAgent.config

To get a flavour of RLbT in multi-agent mode, run the following command:

java -jar target/iv4xr-rlbt-x.x-jar-with-dependencies.jar -multiagentTrainingMode -burlapConfig src/test/resources/configurations/burlap_test.config -sutConfig src/test/resources/configurations/lrLevelMultiAgent.config

This launches RLbT on a simple example level distributed with the package. If all works correctly, the tool prints log messages and creates the folder rlbt-files. The output files are stored in the subfolder results/$level-name$/$training-mode$/$timestamp$.
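For reference, the resulting directory layout should look roughly like the sketch below; the exact set of files may differ between versions (episodesummary.csv is described in the next section):

rlbt-files/
  results/
    $level-name$/
      $training-mode$/
        $timestamp$/
          episodesummary.csv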

RLbT output

By default, RLbT creates a folder named rlbt-files in the location from which RLbT is run. The results of a run on a given level, in a given training mode, at a given timestamp are stored inside rlbt-files under the sub-folder results/$level-name$/$training-mode$/$timestamp$. RLbT produces the following output files:

  • episodesummary.csv: a summary of the training episodes (see the illustrative sketch after this list), including
    • the number of actions performed in each episode
    • the total reward/penalty obtained in each episode
    • the entity coverage achieved per episode and globally, measured as the percentage of entities the agent manages to interact with during the training session
    • the training time
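As a purely illustrative sketch of what such a file may contain, an episode summary could look like the lines below; the header and values here are hypothetical and not taken from the tool's actual output:

episode, actions, reward, entityCoverage, trainingTime
1, 120, -35.0, 25%, 00:02:14
2, 118, 12.5, 37%, 00:02:10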

RLbT parameters

BURLAP Q-learning configuration file

The BURLAP Q-learning configuration file is a text file where each line contains a parameter and its value, separated by an equals (=) sign. The available parameters are listed below; a sample file follows the list:

  • burlap.algorithm: only QLearning is available for now
  • burlap.qlearning.qinit: initial Q-value to use everywhere
  • burlap.qlearning.lr: learning rate. The learning rate determines to what extent newly acquired information overrides old information: a factor of 0 makes the agent learn nothing (exclusively exploiting prior knowledge), while a factor of 1 makes it consider only the most recent information (ignoring prior knowledge).
  • burlap.qlearning.gamma: discount factor (gamma). It determines the importance of future rewards: a factor of 0 makes the agent consider only current rewards, while a factor approaching 1 makes it strive for long-term reward.
  • burlap.qlearning.epsilonval: value of the epsilon parameter of the epsilon-greedy policy, used to balance exploration and exploitation while interacting with the Reinforcement Learning environment
  • burlap.qlearning.out_qtable: path to the Q-table file
  • burlap.num_of_episodes: number of episodes to run
  • burlap.max_update_cycles: number of steps in each episode
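For example, a minimal BURLAP configuration could look as follows. The parameter names are the ones listed above, while the values and the Q-table path are illustrative assumptions, not recommended defaults:

burlap.algorithm=QLearning
burlap.qlearning.qinit=0.0
burlap.qlearning.lr=0.1
burlap.qlearning.gamma=0.99
burlap.qlearning.epsilonval=0.1
burlap.qlearning.out_qtable=rlbt-files/qtable.ser
burlap.num_of_episodes=50
burlap.max_update_cycles=200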

SUT configuration file

The SUT configuration file is a text file where each line contains a parameter and its value, separated by an equals (=) sign. Depending on the training mode, a specific SUT configuration file should be used. The available parameters are listed below; a sample single-agent file follows the list:

  • labrecruits.level_name : name of the file containing the Lab Recruits level (without the .csv extension)

  • labrecruits.level_folder : folder where the Lab Recruits level is stored

  • labrecruits.execution_folder : folder where the Lab Recruits game is located

  • labrecruits.use_graphics : whether or not to enable graphics

  • labrecruits.max_ticks_per_action : maximum number of game ticks allowed to complete a goal

  • labrecruits.max_actions_per_episode : number of actions in each BURLAP episode

  • labrecruits.target_entity_name : name of the entity in the level that the agent has to reach

  • labrecruits.target_entity_type : type of the target entity (Switch or Door)

  • labrecruits.target_entity_property_name : name of the entity property that the agent has to check (isOn for a Switch, isOpen for a Door)

  • labrecruits.target_entity_property_value : value of the property that the agent has to check

  • labrecruits.search_mode : either 'CoverageOriented' or 'GoalOriented'. In 'CoverageOriented' mode the RL agent aims to cover as many entities as possible, while in 'GoalOriented' mode it tries to learn how to achieve a goal (such as reaching a specific room).

  • labrecruits.functionalCoverage : 'true'/'false' to switch on/off the calculation of the functional coverage achieved during training sessions

  • labrecruits.rewardtype : either 'Sparse' or 'CuriousityDriven'. The 'Sparse' reward type follows a classic RL approach based on the sparse reward received from the environment, while the 'CuriousityDriven' type uses a curiosity-driven reward mechanism that encourages the agent to explore the space of interactions in the game.

  • For single-agent training mode, the name of the RL agent must also be defined:

    • labrecruits.agent_id : name of the agent in the Lab Recruits level

  • For multi-agent training mode, the following parameters are available:

    • labrecruits.agentpassive_id : name of the passive agent in the Lab Recruits level
    • labrecruits.agentactive_id : name of the active agent in the Lab Recruits level
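For illustration, a single-agent SUT configuration might look roughly like the sketch below; the level name, folder paths, and entity and agent identifiers are placeholder assumptions, not values shipped with the tool:

labrecruits.level_name=buttons_doors_1
labrecruits.level_folder=src/test/resources/levels
labrecruits.execution_folder=gym/Windows/bin
labrecruits.use_graphics=false
labrecruits.max_ticks_per_action=50
labrecruits.max_actions_per_episode=200
labrecruits.target_entity_name=door1
labrecruits.target_entity_type=Door
labrecruits.target_entity_property_name=isOpen
labrecruits.target_entity_property_value=true
labrecruits.search_mode=GoalOriented
labrecruits.functionalCoverage=false
labrecruits.rewardtype=Sparse
labrecruits.agent_id=agent1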

References

[1] R. Ferdous, F. M. Kifetew, D. Prandi, A. Susi. Towards Agent-Based Testing of 3D Games using Reinforcement Learning. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE 2022). [doi:10.1145/3551349.3560507](https://dl.acm.org/doi/abs/10.1145/3551349.3560507).