Examples

This directory includes a number of working examples of Acme agents. They are not meant to be comprehensive; instead they illustrate common use cases to which Acme agents can be applied.

Our quickstart guide can be used to get up and running quickly: the notebook shows how to instantiate a simple agent and run it on an environment. You can also take a look at our tutorial, which takes a more in-depth look at the construction of the D4PG agent and highlights the general structure shared by most agents implemented in Acme.
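
Concretely, most examples follow the same few steps: wrap an environment, build networks, construct an agent, and run an environment loop. The following is a minimal sketch of that pattern, loosely based on the quickstart; the network sizes and the environment choice are illustrative, and exact constructor arguments can vary across Acme versions.

```python
# A minimal sketch of the typical Acme example, loosely following the
# quickstart. Network sizes and the environment choice are illustrative.
import acme
from acme import specs, wrappers
from acme.agents.tf import d4pg
from acme.tf import networks
from acme.tf import utils as tf2_utils
from dm_control import suite
import numpy as np
import sonnet as snt

# Load a control-suite environment and cast observations to float32.
environment = wrappers.SinglePrecisionWrapper(suite.load('cartpole', 'balance'))
environment_spec = specs.make_environment_spec(environment)
num_dimensions = np.prod(environment_spec.actions.shape, dtype=int)

# A deterministic policy and a distributional critic, as in the D4PG example.
policy_network = snt.Sequential([
    networks.LayerNormMLP((256, 256, 256)),
    networks.NearZeroInitializedLinear(num_dimensions),
    networks.TanhToSpec(environment_spec.actions),
])
critic_network = snt.Sequential([
    networks.CriticMultiplexer(),
    networks.LayerNormMLP((512, 512, 256)),
    networks.DiscreteValuedHead(vmin=-150., vmax=150., num_atoms=51),
])

agent = d4pg.D4PG(
    environment_spec=environment_spec,
    policy_network=policy_network,
    critic_network=critic_network,
    observation_network=tf2_utils.batch_concat,  # flatten dict observations
)

# The environment loop ties the actor and the environment together.
loop = acme.EnvironmentLoop(environment, agent)
loop.run(num_episodes=100)
```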

Continuous control

We include a number of agents running on continuous control tasks. These agents are representative examples, but any continuous-control algorithm implemented in Acme can be swapped in; a short sketch of the environment setup follows the list below.

Note that many of the examples, particularly those based on the DeepMind Control Suite, require a MuJoCo license to run. See our tutorial for more details, or refer to the dm_control repository for further information.

  • D4PG: a Distributed Distributional Deep Deterministic Policy Gradient (D4PG) agent, which combines a deterministic policy with a distributional critic, running on the DeepMind Control Suite or the OpenAI Gym. By default it runs on the "half cheetah" environment from the OpenAI Gym.
  • MPO: a maximum a posteriori policy optimization (MPO) agent, which combines a stochastic policy with a distributional critic.
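
Either environment family ends up behind the same dm_env interface once wrapped, which is what makes the agents interchangeable. A rough sketch of that wrapping step; the Gym environment id is an assumption that depends on your gym and MuJoCo versions:

```python
# Sketch of the environment wrapping step, showing that either family ends
# up behind the same dm_env interface. The Gym id 'HalfCheetah-v2' is an
# assumption and depends on the installed gym and MuJoCo versions.
import gym
from acme import wrappers
from dm_control import suite

# OpenAI Gym: adapt the Gym API to dm_env, then cast to float32.
gym_environment = wrappers.SinglePrecisionWrapper(
    wrappers.GymWrapper(gym.make('HalfCheetah-v2')))

# DeepMind Control Suite: already dm_env, so only the precision cast is needed.
dm_environment = wrappers.SinglePrecisionWrapper(suite.load('cheetah', 'run'))
```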

Discrete agents (Atari)

The development of the Arcade Learning Environment, and the coinciding use of Atari as a benchmark, has played a prominent role in the modern usage and testing of reinforcement learning algorithms. As a result, we've also included direct examples of prominent discrete-action algorithms implemented in Acme and running on this environment; a typical preprocessing stack is sketched after the list below.

  • DQN: a "classic" benchmark agent for Atari.
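
The wrapper arguments below are illustrative and may differ from the example scripts; the stack assumes gym's Atari extras are installed:

```python
# Sketch of the usual Atari preprocessing stack, assuming gym's Atari extras
# are installed; wrapper arguments here mirror common defaults and may differ
# from the example scripts.
import gym
from acme import wrappers

environment = gym.make('PongNoFrameskip-v4', full_action_space=True)
environment = wrappers.GymAtariAdapter(environment)
environment = wrappers.AtariWrapper(environment, to_float=True)
environment = wrappers.SinglePrecisionWrapper(environment)
```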

Offline agents

Acme includes examples of offline agents, i.e. agents trained using external data generated by another agent (a schematic sketch of this setting follows the list):

  • BC: a behaviour cloning agent.
  • BC (JAX): a behaviour cloning agent (implemented in JAX).
  • BCQ: an implementation of the batch-constrained Q-learning (BCQ) agent.
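
The common thread in these examples is that learning consumes a fixed dataset rather than live interaction. As a rough illustration of the offline setting, here is a generic behaviour-cloning training step in plain TensorFlow; it is not Acme's BC implementation, and the synthetic dataset stands in for real demonstration data:

```python
# A generic behaviour-cloning training step in plain TensorFlow, shown only
# to illustrate the offline setting; this is NOT Acme's BC implementation.
# The synthetic dataset below stands in for real demonstration data.
import sonnet as snt
import tensorflow as tf

num_actions = 4  # placeholder action-space size
demonstrations = tf.data.Dataset.from_tensor_slices((
    tf.random.normal([1024, 8]),                                    # observations
    tf.random.uniform([1024], maxval=num_actions, dtype=tf.int64),  # actions
)).batch(64)

policy = snt.nets.MLP([256, 256, num_actions])
optimizer = snt.optimizers.Adam(1e-4)

@tf.function
def bc_step(observations, actions):
  # Supervised learning: maximize the likelihood of the demonstrated actions.
  with tf.GradientTape() as tape:
    logits = policy(observations)
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=actions, logits=logits))
  gradients = tape.gradient(loss, policy.trainable_variables)
  optimizer.apply(gradients, policy.trainable_variables)
  return loss

for observations, actions in demonstrations:  # a fixed dataset; no acting
  bc_step(observations, actions)
```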

Similarly, we also include so-called "from demonstration" agents, which mix offline and online data (the sketch after the list illustrates the mixing):

  • DQfD: a Deep Q-learning from Demonstrations (DQfD) agent running on hard-exploration tasks within bsuite (e.g. deep sea) using demonstration data.
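
Schematically, each learner batch is drawn from a mixture of demonstration data and the agent's own experience. The datasets and mixing ratio below are illustrative stand-ins, not Acme's actual pipeline:

```python
# Schematic of the from-demonstrations idea using tf.data: each learner
# batch mixes the agent's own replay stream with demonstration data. The
# datasets and the 75/25 ratio below are illustrative stand-ins.
import tensorflow as tf

online_data = tf.data.Dataset.from_tensor_slices(
    tf.random.normal([1000, 8])).repeat()
demo_data = tf.data.Dataset.from_tensor_slices(
    tf.random.normal([200, 8])).repeat()

# Sample each learner batch from both streams with fixed probabilities.
mixed = tf.data.experimental.sample_from_datasets(
    [online_data, demo_data], weights=[0.75, 0.25]).batch(64)
```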

Behaviour Suite

The Behaviour Suite for Reinforcement Learning defines a collection of tasks and environments which collectively investigate the core capabilities of RL algorithms along a number of axes. The examples we include show how to run Acme agents on this suite; a loading sketch follows the list.

  • DQN: an off-policy DQN example;
  • Impala: an on-policy Impala agent; and
  • MCTS: a model-based agent running on the task suite using either a simulator of the environment or a learned model.
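
Loading a bsuite task and handing it to any of the agents above follows the usual pattern; the bsuite id and results directory below are illustrative:

```python
# Sketch: load one bsuite task, recording results to CSV, and hand it to any
# of the agents above. The bsuite id and results directory are illustrative.
import acme
from acme import wrappers
import bsuite

raw_environment = bsuite.load_and_record_to_csv(
    bsuite_id='deep_sea/0', results_dir='/tmp/bsuite')
environment = wrappers.SinglePrecisionWrapper(raw_environment)

# `agent` would be built as in the earlier sketches, e.g. a DQN agent:
# loop = acme.EnvironmentLoop(environment, agent)
# loop.run(num_episodes=environment.bsuite_num_episodes)
```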

For more information see https://github.com/deepmind/bsuite.