Merge pull request #262 from /issues/261-ohol-docs

WIP: Issues/261 ohol docs
Dallinger · Aug 16, 2023 · f5e511e · f5e511e
2 parents 338af7b + e524d66
commit f5e511e

Showing 3 changed files with 196 additions and 16 deletions.
diff --git a/README.md b/README.md
@@ -4,7 +4,7 @@
 [![codecov](https://codecov.io/gh/Dallinger/Griduniverse/branch/master/graph/badge.svg)](https://codecov.io/gh/Dallinger/Griduniverse)
 
 
-Reinforcement learning is an area of machine learning that considers the problem faced by a decision-maker in a setting partly under control of the environment. To illustrate the complexities of learning in even simple scenarios, researchers often turn to so-called “Gridworlds”, toy problems that nonetheless capture the rich difficulties that arise when learning in an uncertain world. By adjusting the state space (i.e., the grid), the set of actions available to the decision maker, the reward function, and the mapping between actions and states, a richly structured array of reinforcement learning problems can be generated — a Griduniverse, one might say. To design a successful reinforcement learning AI system, then, is to develop an algorithm that learns well across many such Gridworlds. Indeed, state-of-the-art reinforcement learning algorithms such as deep Q-networks, for example, have achieved professional-level performance across tens of video games from raw pixel input.  
+Reinforcement learning is an area of machine learning that considers the problem faced by a decision-maker in a setting partly under control of the environment. To illustrate the complexities of learning in even simple scenarios, researchers often turn to so-called “Gridworlds”, toy problems that nonetheless capture the rich difficulties that arise when learning in an uncertain world. By adjusting the state space (i.e., the grid), the set of actions available to the decision maker, the reward function, and the mapping between actions and states, a richly structured array of reinforcement learning problems can be generated — a Griduniverse, one might say. To design a successful reinforcement learning AI system, then, is to develop an algorithm that learns well across many such Gridworlds. Indeed, state-of-the-art reinforcement learning algorithms such as deep Q-networks, for example, have achieved professional-level performance across tens of video games from raw pixel input.
 
 Fig. 1. A small Gridworld, reprinted from Sutton & Barto (1998). At each time step, the agent selects a move (up, down, left, right) and receives the reward specified in the grid.
 
@@ -29,9 +29,23 @@ Players take on one of some number of distinguishable identities (NUM_IDENTITIES
 
 Fig. 3. Sample colors that could serve as distinguishable identities.
 
-### Objects
+### Other Grid Objects
 
-The world may contain non-player objects that are immovable (e.g., walls) or movable (e.g., blocks).
+The world may contain various other non-player objects, some of which may be interactive, provide "calories" or "points", or enable players to progress through the
+game in various ways, and some of which are inert and non-interactive (walls). Points contribute to the player's overall score in the game.
+
+#### Walls
+
+Walls form a labyrinth of immovable obstacles added to the grid when it's initially constructed. The density and contiguity of the labyrinth can be set
+via configuration parameters (see below).
+
+#### Items
+
+Griduniverse provides a rich system for defining interactive and/or nutrition-providing "items" which will also be added to the world. In addition to defining
+properties of the items themselves (caloric value, whether they can be carried by players, whether they respawn automatically, etc.), experiment authors
+can also define transitions that execute when players interact with the item on the block they currently occupy, potentially in combination with an item
+they are carrying. For example, a player carrying a stone might be able to transform the stone into a more useful "sharp stone" by sharpening it against a
+"large rock" that exists in the block they currently occupy. For more details, see [Items and Transitions](#items-and-transitions) below.
 
 ### Chatroom
 
@@ -481,6 +495,42 @@ Default is False.
 
 Which Bot class to run. Default: `RandomBot`.
 
+## Items and Transitions
+
+Griduniverse provides a configuration syntax
+(see [game_config.yml](./dlgr/griduniverse/game_config.yml)) for defining custom
+objects that will be added to the grid world, and transitions that can be triggered by
+players, either independently or in cooperation, that extract some value from the items
+they're interacting with, and transform items of one type into another type. This makes
+it possible for the experiment author to create pathways for technological evolution in
+the game. For example, a `wild_carrot_plant` may only yield a `wild_carrot` if the
+`wild_carrot` can be cut from the plant using a `sharpened_stone`, and only unsharpened
+`stone`s exist in the grid world's initial state. Two players might need to collaborate
+to sharpen a plain `stone` against a `big_hard_rock` to transition the `stone` into
+a `sharpened_stone`, which can then be used to harvest a `wild_carrot` from the `wild_carrot_plant`.
+
+Transitions are modeled as a pair of states: prior to the transition execution, and after
+the transition has finished. Each state has two sub-components: the item
+in the possession of the player executing the transition, and the item in the grid block
+they currently occupy during the transition.
+
+Prior to transition execution:
+  - `actor_start` - the ID of the item the player must be holding for the transition to be available
+  - `target_start` - the ID of the item that must exist on the player's current grid block for the
+     transition to be available
+
+After transition execution:
+  - `actor_end` - the ID of the item that will exist in the player's hand after the transition
+     has executed
+  - `target_end` - the ID of the item left in the player's grid block after the transition executes
+
+Note that any of these values may be `null`. For example, a transition may result in the item
+in the player's current grid block being consumed, leaving nothing behind.
+
+### Configuration
+
+For detailed explanations of each item and transition value, see the `item_defaults`
+and `transition_defaults` definitions in [game_config.yml](./dlgr/griduniverse/game_config.yml).
 
 ## Griduniverse bots
 
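The transition model added to the README above (a starting pair of `actor_start` and `target_start` mapped to an ending pair of `actor_end` and `target_end`) can be illustrated with a minimal Python sketch. This is not Griduniverse's implementation: the `Transition` tuple, the `TRANSITIONS` lookup table, and `apply_transition` are hypothetical names, the requirement that two players collaborate is ignored, and the choice to leave a `wild_carrot` on the block after harvesting is an assumption made only for illustration.

```python
from typing import NamedTuple, Optional


class Transition(NamedTuple):
    """Hypothetical in-memory form of one transition definition."""

    actor_start: Optional[str]   # item the player must hold (None = empty hand)
    target_start: Optional[str]  # item that must be on the player's grid block
    actor_end: Optional[str]     # item in the player's hand afterwards
    target_end: Optional[str]    # item left on the grid block afterwards


# Index the example transitions by their required starting state.
# The ending states are assumptions chosen to mirror the README's example.
TRANSITIONS = {
    (t.actor_start, t.target_start): t
    for t in [
        Transition("stone", "big_hard_rock", "sharpened_stone", "big_hard_rock"),
        Transition(
            "sharpened_stone", "wild_carrot_plant", "sharpened_stone", "wild_carrot"
        ),
    ]
}


def apply_transition(held, on_block):
    """Return the (held, on_block) pair after the matching transition.

    When no transition is defined for the starting pair, the inputs are
    returned unchanged.
    """
    transition = TRANSITIONS.get((held, on_block))
    if transition is None:
        return held, on_block
    return transition.actor_end, transition.target_end


if __name__ == "__main__":
    print(apply_transition("stone", "big_hard_rock"))
    print(apply_transition("stone", "wild_carrot_plant"))
```

Keying the table on the starting pair makes availability a single dictionary lookup, and a `None` entry naturally represents an empty hand or an empty grid block, matching the `null` values the README allows.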

diff --git a/dlgr/griduniverse/experiment.py b/dlgr/griduniverse/experiment.py
@@ -806,7 +806,7 @@ def spawn_item(self, position=None, item_id=None):
                         item_id = obj.get("item_id", 1)
 
         if not position:
-            position = self._random_empty_position(item_id)
+            position = self._find_empty_position(item_id)
 
         item_props = self.item_config[item_id]
         new_item = Item(
@@ -894,7 +894,7 @@ def spawn_player(self, id=None, **kwargs):
         """Spawn a player."""
         player = Player(
             id=id,
-            position=self._random_empty_position(player=True),
+            position=self._find_empty_position(player=True),
             num_possible_colors=self.num_colors,
             motion_speed_limit=self.motion_speed_limit,
             motion_cost=self.motion_cost,
@@ -912,9 +912,8 @@ def spawn_player(self, id=None, **kwargs):
         self._start_if_ready()
         return player
 
-    def _random_empty_position(self, item_id=None, player=False):
-        """Select an empty cell at random, using the configured probability
-        distribution."""
+    def _find_empty_position(self, item_id=None, player=False):
+        """Select an empty cell, using the configured probability distribution."""
         rows = self.rows
         columns = self.columns
         empty_cell = False
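
The body of `_find_empty_position` is not shown in this diff, so the following is only a rough sketch of the idea its docstring describes: draw candidate cells from a configured distribution until an unoccupied one is found. It assumes distribution functions follow the `[some_name]_probability_distribution(rows, columns, *args)` convention described in the `game_config.yml` comments of this same change. The `DISTRIBUTIONS` registry, the `top_half` distribution, the `occupied` set, and the `max_attempts` cap are all hypothetical.

```python
import random


def random_probability_distribution(rows, columns, *args):
    """Uniformly pick a [row, column] position (stand-in for the built-in)."""
    return [random.randrange(rows), random.randrange(columns)]


def top_half_probability_distribution(rows, columns, *args):
    """Hypothetical custom distribution: only the top half of the grid."""
    return [random.randrange(max(1, rows // 2)), random.randrange(columns)]


# Hypothetical registry keyed by the prefix used as the configuration value.
DISTRIBUTIONS = {
    "random": random_probability_distribution,
    "top_half": top_half_probability_distribution,
}


def find_empty_position(rows, columns, occupied, distribution="random",
                        max_attempts=1000):
    """Sample positions until one is not in `occupied`.

    `occupied` is a set of (row, column) tuples. The attempt cap is an
    assumption added so the sketch cannot loop forever on a full grid.
    """
    sampler = DISTRIBUTIONS[distribution]
    for _ in range(max_attempts):
        row, column = sampler(rows, columns)
        if (row, column) not in occupied:
            return [row, column]
    raise RuntimeError("no empty cell found within max_attempts")


if __name__ == "__main__":
    print(find_empty_position(10, 10, occupied={(0, 0), (5, 5)}))
```

Because a distribution such as `top_half` is no longer uniformly random over the grid, the more general name `_find_empty_position` describes the behavior better than `_random_empty_position` did.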

diff --git a/dlgr/griduniverse/game_config.yml b/dlgr/griduniverse/game_config.yml
@@ -1,6 +1,18 @@
 ---
 player_config:
+  # The distribution model used when spawning players on the grid.
+  # Griduniverse provides a pre-defined set of distribution options:
+  #    - random
+  #    - sinusoidal
+  #    - horizontal
+  #    - vertical
+  #    - edge_bias
+  #    - center_bias
+  #
+  # See the distributions.py module for their implementations
   probability_distribution: "random"
+
+  # Griduniverse uses different colors to represent player groups or teams
   available_colors:
     BLUE: [0.50, 0.86, 1.00]
     YELLOW: [1.00, 0.86, 0.50]
@@ -9,38 +21,157 @@ player_config:
     PURPLE: [0.85, 0.60, 0.85]
     TEAL: [0.77, 0.96, 0.90]
 
+
 item_defaults:
+  # Each item definition must include a unique item_id. The actual value doesn't matter,
+  # though an easily identifiable label can be helpful in developing and
+  # debugging your experiment.
   item_id: default
+
+  # How many instances of this item should the world initially include?
+  # If "respawn" (see below) is true, consumed items will be replaced to maintain
+  # this number of instances. Note that this value may increase over the course
+  # of a game depending on the values of seasonal_growth_rate and spawn_rate (see below).
   item_count: 8
+
+  # How many calories does a single instance provide when a player consumes it?
+  # In the simple case, the consuming player will get all the caloric benefit from
+  # items they consume, but there are also options for dividing this benefit among
+  # other players (see public_good_multiplier, below).
   calories: 0
+
+  # Can a player co-occupy the grid block this item is sitting on?
   crossable: true
-  interactive: false
+
+  # Does a player need to explicitly interact with this item via the action button
+  # when they co-occupy its grid block, or do they immediately consume the item
+  # without needing to take any explicit action?
+  interactive: true
+
+  # How rapidly this item progresses through its maturation lifecycle
   maturation_speed: 0.0
+
+  # Level of maturity ("ripeness") at which this item is ready for consumption.
+  # Prior to reaching this threshold, a player may be able to co-occupy the same
+  # grid block, but not consume the item.
   maturation_threshold: 0.0
+
+  # Some items can be consumed (or otherwise used) more than once; for example,
+  # a berry bush might provide multiple "servings" of berries before it's exhausted.
+  # On the last use, a special transition may be triggered (see note on last_use
+  # in transition_defaults configuration below), which may transform this item into
+  # another. For example, a berry bush may be transformed into an empty berry bush,
+  # which might have different properties (different sprite, perhaps non-crossable,
+  # etc.)
   n_uses: 1
-  name: Food
+
+  # Friendly name of the item that may be displayed to players.
+  name: Generic Item
+
+  # Controls whether this item can be "planted" (added to the gridworld) by the players
+  # themselves.
   plantable: false
+
+  # If this item is plantable (see above), specifies how many points/calories are
+  # deducted from the player's score each time they plant one.
   planting_cost: 1
+
+  # Controls whether this item can be picked up and carried to another location by the player.
   portable: true
+
+  # The distribution model used when spawning instances of this item on the grid.
+  # Griduniverse provides a pre-defined set of distribution options:
+  #    - random
+  #    - sinusoidal
+  #    - horizontal
+  #    - vertical
+  #    - edge_bias
+  #    - center_bias
+  #
+  # See the distributions.py module for their implementations.
+  #
+  # To implement a custom distribution option, add a function to distributions.py,
+  # with a name following the pattern [some_name]_probability_distribution(), with
+  # a signature matching the other functions in the module (rows, columns, *args),
+  # and returning a two-item array of integers representing a [row, column] grid position.
+  #
+  # To use your custom distribution for an item, specify only the prefix portion as the
+  # configuration value here (if your function name is "amazing_probability_distribution",
+  # the value to use here would be "amazing").
   probability_distribution: "random"
-  public_good: 0.0
+
+  # Basis for computing calories credited to all *other players* when a player
+  # consumes an instance of this item. The credit will be equal to:
+  #     calories * public_good_multiplier / number of players
   public_good_multiplier: 0.0
+
+  # Controls whether a replacement of this same item should be immediately added to the
+  # gridworld when an existing item is consumed.
   respawn: false
+
+  # If the current number of instances of this item in the gridworld exceeds the
+  # configured item_count (because players are planting additional instances, for example),
+  # should we prune items to limit the total to item_count?
+  #
+  # Note that item_count is potentially dynamic, changing over time based on
+  # seasonal_growth_rate and spawn_rate (see below).
   limit_quantity: false
+
+  # Degree to which the quantity of this item should fluctuate based on "seasons"
+  # (expressed as alternating rounds of the game, so there are just two seasons).
+  # This value is an *exponential* multiplier.
   seasonal_growth_rate: 1.0
+
+  # At what rate should additional instances of this item be added to the gridworld?
+  # A rate of 1.0 means that the target number of items (item_count) will not grow over
+  # time, but a value greater than 1.0 will result in a steadily growing number of items
+  # of this type.
   spawn_rate: 1.0
-  sprite: "#8a9b0f,#7a6b54"
+
+  # Visual representation of this item in the UI.
+  # This value can be any of:
+  #    - A single hex color value, prefixed with "color:". Example: "color:#8a9b0f"
+  #    - A comma-separated pair of hex colors representing the item's immature and mature
+  #      states (rendered color will be along a continuum between these colors based on
+  #      current maturity), also prefixed with "color:". Example: "color:#8a9b0f,#7a6b54"
+  #    - A unicode emoji, prefixed with "emoji:". Example: "emoji:🍓"
+  #    - The path of an image within the images/ folder, prefixed with "image:".
+  #      Example: "image:sprites/strawberry.png"
+  sprite: "color:#8a9b0f,#7a6b54"
 
 transition_defaults:
-  visible: seen # Can be set to "never", "always", or "seen" for transitions that become
-                 # visible to a player after they have been executed for the first time
+  # Can be set to "never", "always", or "seen" for transitions that become
+  # visible to a player after they have been executed for the first time
+  visible: seen
+
+  # item_id for the item that will exist in the player's hand after the transition
+  # has executed
   actor_end: null
+
+  # item_id for the item that must be in the player's hand in order to execute
+  # the transition
   actor_start: null
+
+  # item_id for the item that will exist in the player's grid block after the transition
+  # has executed
   target_end: null
+
+  # item_id for the item that must exist at the player's current position in order
+  # to execute the transition
   target_start: null
+
+  # For items that have an n_uses value greater than 1, if last_use is true,
+  # the transition will be executed when the final use is exhausted. For example,
+  # a gooseberry bush with 5 uses could be transitioned to an empty bush when the
+  # last serving of berries has been harvested. In this case, the target_start
+  # would be the item_id of the gooseberry bush, and the target_end would be the
+  # item_id of the item representing the empty bush.
   last_use: false
-  modify_uses: [0, 0] # How should the number of uses for the actor and target
-                      # be changed by the transition.
+
+  # How should the number of uses for the actor and target be changed by the transition?
+  # These can be positive or negative integers: -1 would decrement n_uses, 1 would add
+  # an additional use.
+  modify_uses: [0, 0]
 
 items:
   # Legacy GU Food item