Merge pull request #262 from /issues/261-ohol-docs
WIP: Issues/261 ohol docs
jessesnyder authored Aug 16, 2023
2 parents 338af7b + e524d66 commit f5e511e
Showing 3 changed files with 196 additions and 16 deletions.
56 changes: 53 additions & 3 deletions README.md
@@ -4,7 +4,7 @@
[![codecov](https://codecov.io/gh/Dallinger/Griduniverse/branch/master/graph/badge.svg)](https://codecov.io/gh/Dallinger/Griduniverse)


Reinforcement learning is an area of machine learning that considers the problem faced by a decision-maker in a setting partly under control of the environment. To illustrate the complexities of learning in even simple scenarios, researchers often turn to so-called “Gridworlds”, toy problems that nonetheless capture the rich difficulties that arise when learning in an uncertain world. By adjusting the state space (i.e., the grid), the set of actions available to the decision maker, the reward function, and the mapping between actions and states, a richly structured array of reinforcement learning problems can be generated — a Griduniverse, one might say. To design a successful reinforcement learning AI system, then, is to develop an algorithm that learns well across many such Gridworlds. Indeed, state-of-the-art reinforcement learning algorithms such as deep Q-networks, for example, have achieved professional-level performance across tens of video games from raw pixel input.
Reinforcement learning is an area of machine learning that considers the problem faced by a decision-maker in a setting partly under control of the environment. To illustrate the complexities of learning in even simple scenarios, researchers often turn to so-called “Gridworlds”, toy problems that nonetheless capture the rich difficulties that arise when learning in an uncertain world. By adjusting the state space (i.e., the grid), the set of actions available to the decision maker, the reward function, and the mapping between actions and states, a richly structured array of reinforcement learning problems can be generated — a Griduniverse, one might say. To design a successful reinforcement learning AI system, then, is to develop an algorithm that learns well across many such Gridworlds. Indeed, state-of-the-art reinforcement learning algorithms such as deep Q-networks, for example, have achieved professional-level performance across tens of video games from raw pixel input.

Fig. 1. A small Gridworld, reprinted from Sutton & Barto (1998). At each time step, the agent selects a move (up, down, left, right) and receives the reward specified in the grid.

@@ -29,9 +29,23 @@ Players take on one of some number of distinguishable identities (NUM_IDENTITIES

Fig. 3. Sample colors that could serve as distinguishable identities.

### Objects
### Other Grid Objects

The world may contain non-player objects that are immovable (e.g., walls) or movable (e.g., blocks).
The world may contain various other non-player objects, some of which may be interactive, provide "calories" or "points", or enable players to progress through the
game in various ways, and some of which are inert and non-interactive (walls). Points contribute to the player's overall score in the game.

#### Walls

Walls are a labyrinth of immovable obstacles added to the grid when it's initially constructed. The density and contiguity of the labyrinth can be configured
via configuration parameters (see below).

#### Items

Griduniverse provides a rich system for defining interactive and/or nutrition-providing "items" which will also be added to the world. In addition to defining
properties of the items themselves (caloric value, whether they can be carried by players, whether they respawn automatically, etc.), experiment authors
can also define transitions that execute when players interact with the item on the block they currently occupy, potentially in combination with an item
they are carrying. For example, a player carrying a stone might be able to transform the stone into a more useful "sharp stone" by sharpening against a
"large rock" that exists in the block they currently occupy. For more details, see [Items and Transitions](#items-and-transitions) below.

### Chatroom

@@ -481,6 +495,42 @@ Default is False.

Which Bot class to run. Default: `RandomBot`.

## Items and Transitions

Griduniverse provides a configuration syntax
(see [game_config.yml](./dlgr/griduniverse/game_config.yml)) for defining custom
objects that will be added to the grid world, and transitions that can be triggered by
players, either independently or in cooperation, that extract some value from the items
they're interacting with, and transform items of one type into another type. This makes
it possible for the experiment author to create pathways for technological evolution in
the game. For example, a `wild_carrot_plant` may only yield a `wild_carrot` if the
`wild_carrot` can be cut from the plant using a `sharpened_stone`, and only unsharpened
`stone`s exist in the grid world's initial state. Two players might need to collaborate
to sharpen a plain `stone` against a `big_hard_rock` to transition the `stone` into
a `sharpened_stone`, which can then be used to harvest a `wild_carrot` from the `wild_carrot_plant`.

Transitions are modeled as a pair of states: one prior to the transition's execution, and one after
the transition has finished. Each state has two sub-components: the item
in the possession of the player executing the transition, and the item in the grid block
they currently occupy during the transition.

Prior to transition execution:
- `actor_start` - the ID of the item the player must be holding for the transition to be available
- `target_start` - the ID of the item that must exist on the player's current grid block for the
transition to be available

After transition execution:
- `actor_end` - the ID of the item that will exist in the player's hand after the transition
has executed
- `target_end` - the ID of the item left in the player's grid block after the transition executes

Note that any of these values may be `null`. For example, a transition may result in the item
in the player's current grid block being consumed, leaving nothing behind.
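
The four values above can be sketched in Python as follows. This is only an illustration of the start/end state model, not Griduniverse's implementation: the `Transition` class, the `apply_transition` helper, and the specific `sharpen` and `eat_carrot` definitions are hypothetical, and Python's `None` stands in for YAML's `null`.

```python
from typing import NamedTuple, Optional


class Transition(NamedTuple):
    """Hypothetical container for the four item IDs described above."""

    actor_start: Optional[str]   # item the player must be holding (None = empty hand)
    target_start: Optional[str]  # item that must be on the player's grid block
    actor_end: Optional[str]     # item in the player's hand afterwards
    target_end: Optional[str]    # item left on the grid block afterwards


def apply_transition(transition, held_item, block_item):
    """Return the new (held_item, block_item) pair, or None when the
    transition is not available for the current state."""
    current_state = (held_item, block_item)
    required_state = (transition.actor_start, transition.target_start)
    if current_state != required_state:
        return None
    return transition.actor_end, transition.target_end


# Assumed example: sharpening a stone against a rock that stays in place.
sharpen = Transition("stone", "big_hard_rock", "sharpened_stone", "big_hard_rock")

# Assumed example: an empty-handed player consumes the item, leaving nothing.
eat_carrot = Transition(None, "wild_carrot", None, None)

print(apply_transition(sharpen, "stone", "big_hard_rock"))
print(apply_transition(eat_carrot, None, "wild_carrot"))
```

In this sketch a transition is available only when both the held item and the block item match its start state, which mirrors the way `actor_start` and `target_start` are described as joint preconditions.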

### Configuration

See the detailed explanations of each item and transition value in the `item_defaults`
and `transition_defaults` definitions in [game_config.yml](./dlgr/griduniverse/game_config.yml).
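
One of the values documented in `game_config.yml` is `probability_distribution`, which may name a custom function added to `distributions.py`. The comments there require a function named `[some_name]_probability_distribution`, taking `(rows, columns, *args)` and returning a two-item `[row, column]` array of integers. A minimal sketch of such a function is below. The quadrant-biasing behavior and the use of zero-based coordinates are assumptions made for illustration, not taken from the existing module.

```python
import random


def amazing_probability_distribution(rows, columns, *args):
    """Hypothetical custom distribution that favors the top-left quadrant.

    Returns a two-item list [row, column] of integer grid coordinates,
    assumed here to be zero-based.
    """
    # Roughly three draws out of four are restricted to the top-left quadrant;
    # the remainder are uniform over the whole grid.
    if random.random() < 0.75:
        max_row = max(rows // 2, 1)
        max_column = max(columns // 2, 1)
    else:
        max_row, max_column = rows, columns
    return [random.randrange(max_row), random.randrange(max_column)]


print(amazing_probability_distribution(10, 20))
```

Following the naming rule in the configuration comments, an item would select this function with `probability_distribution: "amazing"`.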

## Griduniverse bots

9 changes: 4 additions & 5 deletions dlgr/griduniverse/experiment.py
@@ -806,7 +806,7 @@ def spawn_item(self, position=None, item_id=None):
item_id = obj.get("item_id", 1)

if not position:
position = self._random_empty_position(item_id)
position = self._find_empty_position(item_id)

item_props = self.item_config[item_id]
new_item = Item(
@@ -894,7 +894,7 @@ def spawn_player(self, id=None, **kwargs):
"""Spawn a player."""
player = Player(
id=id,
position=self._random_empty_position(player=True),
position=self._find_empty_position(player=True),
num_possible_colors=self.num_colors,
motion_speed_limit=self.motion_speed_limit,
motion_cost=self.motion_cost,
@@ -912,9 +912,8 @@ def spawn_player(self, id=None, **kwargs):
self._start_if_ready()
return player

def _random_empty_position(self, item_id=None, player=False):
"""Select an empty cell at random, using the configured probability
distribution."""
def _find_empty_position(self, item_id=None, player=False):
"""Select an empty cell, using the configured probability distribution."""
rows = self.rows
columns = self.columns
empty_cell = False
147 changes: 139 additions & 8 deletions dlgr/griduniverse/game_config.yml
@@ -1,6 +1,18 @@
---
player_config:
# The distribution model used when spawning players on the grid.
# Griduniverse provides a pre-defined set of distribution options:
# - random
# - sinusoidal
# - horizontal
# - vertical
# - edge_bias
# - center_bias
#
# See the distributions.py module for their implementations
probability_distribution: "random"

# Griduniverse uses different colors to represent player groups or teams
available_colors:
BLUE: [0.50, 0.86, 1.00]
YELLOW: [1.00, 0.86, 0.50]
@@ -9,38 +21,157 @@ player_config:
PURPLE: [0.85, 0.60, 0.85]
TEAL: [0.77, 0.96, 0.90]


item_defaults:
# Each item definition must include a unique item_id. The actual value doesn't matter,
# though an easily identifiable label can be helpful in developing and
# debugging your experiment.
item_id: default

# How many instances of this item should the world initially include?
# If "respawn" (see below) is true, consumed items will be replaced to maintain
# this number of instances. Note that this value may increase over the course
# of a game depending on the values of seasonal_growth_rate and spawn_rate (see below).
item_count: 8

# How many calories does a single instance provide when a player consumes it?
# In the simple case, the consuming player will get all the caloric benefit from
# items they consume, but there are also options for dividing this benefit among
# other players (see public_good_multiplier, below).
calories: 0

# Can a player co-occupy the grid block this item is sitting on?
crossable: true
interactive: false

# Does a player need to explicitly interact with this item via the action button
# when they co-occupy its grid block, or do they immediately consume the item
# without needing to take any explicit action?
interactive: true

# How rapidly this item progresses through its maturation lifecycle
maturation_speed: 0.0

# Level of maturity ("ripeness") at which this item is ready for consumption.
# Prior to reaching this threshold, a player may be able to co-occupy the same
# grid block, but not consume the item.
maturation_threshold: 0.0

# Some items can be consumed (or otherwise used) more than once; for example,
# a berry bush might provide multiple "servings" of berries before it's exhausted.
# On the last use, a special transition may be triggered (see note on last_use
# in transition_defaults configuration below), which may transform this item into
# another. For example, a berry bush may be transformed into an empty berry bush,
# which might have different properties (different sprite, perhaps non-crossable,
# etc.)
n_uses: 1
name: Food

# Friendly name of the item that may be displayed to players.
name: Generic Item

# Controls whether this item can be "planted" (added to the gridworld) by the players
# themselves.
plantable: false

# If this item is plantable (see above), specifies how many points/calories are
# deducted from the player's score each time they plant one.
planting_cost: 1

# Controls whether this item can be picked up and carried to another location by the player.
portable: true

# The distribution model used when spawning instances of this item on the grid.
# Griduniverse provides a pre-defined set of distribution options:
# - random
# - sinusoidal
# - horizontal
# - vertical
# - edge_bias
# - center_bias
#
# See the distributions.py module for their implementations.
#
# To implement a custom distribution option, add a function to distributions.py,
# with a name following the pattern [some_name]_probability_distribution(), with
# a signature matching the other functions in the module (rows, columns, *args),
# and returning a two-item array of integers representing a [row, column] grid position.
#
# To use your custom distribution for an item, specify only the prefix portion as the
# configuration value here (if your function name is "amazing_probability_distribution",
# the value to use here would be "amazing").
probability_distribution: "random"
public_good: 0.0

# Basis for computing calories credited to all *other players* when a player
# consumes an instance of this item. The credit will be equal to:
# calories * public_good_multiplier / number of players
public_good_multiplier: 0.0

# Controls whether a replacement of this same item should be immediately added to the
# gridworld when an existing item is consumed.
respawn: false

# If the current number of instances of this item in the gridworld exceeds the
# configured item_count (because players are planting additional instances, for example),
# should we prune items to limit the total to item_count?
#
# Note that item_count is potentially dynamic, changing over time based on
# seasonal_growth_rate and spawn_rate (see below).
limit_quantity: false

# Degree to which the quantity of this item should fluctuate based on "seasons"
# (expressed as alternating rounds of the game, so there are just two seasons).
# This value is an *exponential* multiplier.
seasonal_growth_rate: 1.0

# At what rate should additional instances of this item be added to the gridworld?
# A rate of 1.0 means that the target number of items (item_count) will not grow over
# time, but a value greater than 1.0 will result in a steadily growing number of items
# of this type.
spawn_rate: 1.0
sprite: "#8a9b0f,#7a6b54"

# Visual representation of this item in the UI.
# This value can be any of:
# - A single hex color value, prefixed with "color:". Example: "color:#8a9b0f"
# - A comma-separated pair of hex colors representing the item's immature and mature
# states (rendered color will be along a continuum between these colors based on
# current maturity), also prefixed with "color:". Example: "color:#8a9b0f,#7a6b54"
# - A unicode emoji, prefixed with "emoji:". Example: "emoji:🍓"
# - The path of an image within the images/ folder, prefixed with "image:".
# Example: "image:sprites/strawberry.png"
sprite: "color:#8a9b0f,#7a6b54"

transition_defaults:
visible: seen # Can be set to "never", "always", or "seen" for transitions that become
# visible to a player after they have been executed for the first time
# Can be set to "never", "always", or "seen" for transitions that become
# visible to a player after they have been executed for the first time
visible: seen

# item_id for the item that will exist in the player's hand after the transition
# has executed
actor_end: null

# item_id for the item that must be in the player's hand in order to execute
# the transition
actor_start: null

# item_id for the item that will exist in the player's grid block after the transition
# has executed
target_end: null

# item_id for the item that must exist at the player's current position in order
# to execute the transition
target_start: null

# For items that have an n_uses value greater than 1, if last_use is true,
# the transition will be executed when the final use is exhausted. For example,
# a gooseberry bush with 5 uses could be transitioned to an empty bush when the
# last serving of berries has been harvested. In this case, the target_start
# would be the item_id of the gooseberry bush, and the target_end would be the
# item_id of the item representing the empty bush.
last_use: false
modify_uses: [0, 0] # How should the number of uses for the actor and target
# be changed by the transition.

# How should the number of uses for the actor and target be changed by the transition.
# These can be positive or negative integers: -1 would decrement n_uses, 1 would add
# an additional use.
modify_uses: [0, 0]

items:
# Legacy GU Food item