State: Agent And ETS

Mix.install([
  {:jason, "~> 1.4"},
  {:kino, "~> 0.9", override: true},
  {:youtube, github: "brooklinjazz/youtube"},
  {:hidden_cell, github: "brooklinjazz/hidden_cell"}
])

Review Questions

Upon completing this lesson, a student should be able to answer the following questions.

  • Why would you use an Agent instead of a GenServer?
  • Why might you use :ets instead of an Agent?
  • Why and how could you implement a cache?

State

State allows us to persist a value during the runtime of our application. We've seen how we can start a GenServer process which stores and manipulates its internal state.

Now we're going to learn about managing state using Agents and ETS (Erlang Term Storage).

Agents

Agents are a wrapper around state. They are less fully-featured and simpler than a GenServer. Unlike GenServers, they do not send and receive messages. Instead, the Agent module provides functions such as Agent.get/3 and Agent.update/3 to get and update the Agent process's state.

{:ok, counter_pid} = Agent.start_link(fn -> 0 end)
Agent.update(counter_pid, fn state -> state + 1 end)
Agent.get(counter_pid, fn state -> state end)

Often, we'll put Agent functions into a module to abstract how state is stored and manipulated and only expose the desired functionality.

defmodule Counter do
  use Agent

  def start_link(_opts) do
    Agent.start_link(fn -> 0 end)
  end

  def increment(pid) do
    Agent.update(pid, fn state -> state + 1 end)
  end

  def get_count(pid) do
    Agent.get(pid, fn state -> state end)
  end
end

{:ok, counter_pid} = Counter.start_link([])

:ok = Counter.increment(counter_pid)
:ok = Counter.increment(counter_pid)
2 = Counter.get_count(counter_pid)

Anything you can accomplish with an Agent, you could instead accomplish with a GenServer. However, an Agent is lighter-weight and simpler when all you need is to manage state.

Erlang Term Storage (ETS)

We can use ETS tables for a convenient in-memory cache. ETS is provided by Erlang and exposed in Elixir through the :ets module.

While an Agent is a simple solution for an in-memory cache best for small loads or non-concurrent operations, an :ets table generally performs better and can support concurrent read and write operations.

Generally, we can use :ets for a more performant in-memory key-value store.

:ets tables serve a purpose much like a GenServer holding state, except they are designed explicitly for in-memory key-value storage.

flowchart
subgraph ETS
  Process[Process] --> State
  State --> KV[Key Value Storage]
end

Livebook allows us to view the contents of an :ets table as a table. Let's start a new :ets table called :example_table.

table = :ets.new(:example_table, [])

We can then insert values into the table with :ets.insert/2. We insert values as {key, value} tuples.

The key and the value can be nearly any Elixir term, but we'll often use an atom for the key.

Your Turn

Try changing the :key and value to see that the :ets table will store any key-value pair. However, the :ets table will only store one value for a given key.

:ets.insert(table, {:key, "value"})
kw_list = :ets.lookup(table, :key)

value = kw_list[:key]
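To see the one-value-per-key behavior in isolation, here's a minimal sketch using a hypothetical :overwrite_demo table (the names are illustrative, not from the lesson):

```elixir
# Create a fresh table (the default :set type: one value per unique key).
overwrite_demo = :ets.new(:overwrite_demo, [])

# Insert two values under the same key; the second overwrites the first.
:ets.insert(overwrite_demo, {:greeting, "hi"})
:ets.insert(overwrite_demo, {:greeting, "hello"})

# Only the most recent value remains.
:ets.lookup(overwrite_demo, :greeting)
```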

Named Tables

:ets tables can be created with the :named_table option. This allows us to refer to the table by its name instead of the reference returned by :ets.new/2.

:ets.new(:my_table, [:named_table])

Now we can insert values into the table using the atom name :my_table instead of a table reference.

:ets.insert(:my_table, {:key, "value"})

The same goes for looking up values.

:ets.lookup(:my_table, :key)

Your Turn

In the Elixir cell below, create a :super_heros :ets table. You should be able to insert super heroes and their secret identities.

:ets.insert(:super_heros, {"Spider Man", "Peter Parker"})

ETS Configuration

:ets tables are configured with a table type and an access control.

Table Types

  • :set (default). One value per unique key.
  • :ordered_set One value per unique key, ordered by Erlang term order.
  • :bag Many values per key, but only one instance of each value per key.
  • :duplicate_bag Many values per key with duplicates allowed.

Access Control

  • :protected (default) Reads allowed from all processes. Writes allowed only for the owner process.
  • :public Reads/writes allowed from all processes.
  • :private Reads/writes allowed only for the owner process.

By default, :ets tables use the :set and :protected configuration values, so including or excluding them when starting an :ets table has no effect.

default_table = :ets.new(:default_table, [:set, :protected])
:ets.insert(default_table, {:key, "value"})

# We return the default table to display the table in Livebook.
default_table

With :protected access control, the :ets table raises an ArgumentError if another process attempts to write to it.

Uncomment and execute this code. Notice that the child process crashes.

# Task.start(fn ->
# :ets.insert(default_table, {:new_key, "new value"})
# end)

However, reading from other processes is allowed.

Task.start(fn ->
  :ets.lookup(default_table, :key) |> IO.inspect(label: "lookup result")
end)

:public

A public table can be read and written from any process, so the following no longer crashes.

public = :ets.new(:public_example, [:public])

Task.start(fn ->
  :ets.insert(public, {:key, "value"})
end)

# We return the public table to display it in Livebook.
public

:bag

A :bag table type allows multiple values under the same key, but not duplicate copies of the same value under that key.

bag = :ets.new(:bag_example, [:bag])

:ets.insert(bag, {:key, "duplicate value"})
:ets.insert(bag, {:key, "duplicate value"})
:ets.insert(bag, {:key, "non-duplicate value"})

# We return the bag table to display it in Livebook.
bag
:ets.lookup(bag, :key)

:duplicate_bag

:duplicate_bag allows duplicate entries: the same key-value pair can be stored multiple times.

bag = :ets.new(:duplicate_bag_example, [:duplicate_bag])

:ets.insert(bag, {:key, "duplicate value"})
:ets.insert(bag, {:key, "duplicate value"})
:ets.insert(bag, {:key, "non-duplicate value"})

# We return the bag table to display it in Livebook.
bag
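As a quick sketch of how the two bag types differ on lookup — using hypothetical :bag_demo and :dup_bag_demo tables not found in the lesson:

```elixir
bag_demo = :ets.new(:bag_demo, [:bag])
dup_bag_demo = :ets.new(:dup_bag_demo, [:duplicate_bag])

# Insert the same key-value pair twice into each table.
for table <- [bag_demo, dup_bag_demo] do
  :ets.insert(table, {:key, "value"})
  :ets.insert(table, {:key, "value"})
end

# :bag stores the duplicate insert only once; :duplicate_bag keeps both.
{length(:ets.lookup(bag_demo, :key)), length(:ets.lookup(dup_bag_demo, :key))}
```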

Your Turn

In the Elixir cell below, use the :ordered_set and :private configuration values to make an ordered and private :ets table.

Demonstrate that the table will not allow read/write operations to the :ets table from another process.

Example Solution
table = :ets.new(:example, [:private, :ordered_set])

# Writing from the owner process is allowed.
:ets.insert(table, {:key, "value"})

# The following will cause a crash.
spawn(fn ->
  :ets.insert(table, {:key, "value"})
end)

Race Conditions

Concurrency is a powerful tool, but it introduces a new class of bugs. For example, race conditions are a common problem.

A race condition occurs when operations occur in an unexpected order.

Below we have an :ets table, which stores a count. Two tasks read the current count and then increment it. We use Process.sleep(100) to simulate a time-consuming computation, then insert a new count.

If these tasks were synchronous, we would expect the count to increment twice and return two. However, when both of these operations are concurrent, both tasks read the count when it's zero, then increment the count to one. They then save the resulting count in the :ets table. As a result, the count is one rather than the expected value of two. This is a classic issue you'll often run into when working with concurrency.

sequenceDiagram
ETS Table->>ETS Table: sets count = 0
    ETS Table->>Task 1: reads count (0)
    ETS Table->>Task 2: reads count (0)
    Task 1->>ETS Table: sets count = 1
    Task 2->>ETS Table: sets count = 1
table = :ets.new(:concurrent, [:public])

:ets.insert(table, {:count, 0})

increment_task1 =
  Task.async(fn ->
    [count: count] = :ets.lookup(table, :count)
    Process.sleep(100)
    :ets.insert(table, {:count, count + 1})
  end)

increment_task2 =
  Task.async(fn ->
    [count: count] = :ets.lookup(table, :count)
    Process.sleep(100)
    :ets.insert(table, {:count, count + 1})
  end)

Task.await(increment_task1)
Task.await(increment_task2)

table
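For counters specifically, one way to sidestep this race — an addition beyond the lesson's example — is :ets.update_counter/3, which reads and increments the value in a single atomic operation. A sketch, using a hypothetical :atomic_counter table:

```elixir
atomic_table = :ets.new(:atomic_counter, [:public])
:ets.insert(atomic_table, {:count, 0})

tasks =
  for _ <- 1..2 do
    Task.async(fn ->
      Process.sleep(100)
      # Atomically increments the counter; there is no separate
      # read-then-write window for another process to slip into.
      :ets.update_counter(atomic_table, :count, 1)
    end)
  end

Task.await_many(tasks)
:ets.lookup(atomic_table, :count)
```

Because the increment is atomic, both tasks are counted and the final value is two, unlike the read-then-insert version above.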

For an overview of race conditions and how large systems deal with many concurrent operations, there is an excellent video by Tom Scott.

YouTube.new("https://www.youtube.com/watch?v=RY_2gElt3SA")

Caches

A cache allows us to store computed data so that we can avoid recomputing it.

We commonly use caches to avoid repeating expensive computations or re-retrieving external resources.

flowchart LR
input --> e[expensive computation] --> output

A cache can store the expected output for a given input to a function for quicker access.

flowchart
input --> cache --cached value--> output
cache --no cached value --> e[expensive computation] --> output

Some caches are smart and save newly computed values so the same value is never computed twice.

Other caches hold static or infrequently changing data that is occasionally updated in the background.

Let's take a slow implementation of the Fibonacci sequence $fib(n) = fib(n - 1) + fib(n - 2)$ as an example. Currently, SlowFib will recompute many of the same inputs. In the example below, we print every computation of fib(n). Notice that many of the same inputs repeat.

defmodule SlowFib do
  def of(0), do: 0
  def of(1), do: 1

  def of(n) do
    IO.puts("Computing fib(#{n})")
    of(n - 1) + of(n - 2)
  end
end

SlowFib.of(10)

We can improve performance by saving each input/output in a cache so we don't have to re-compute the same values over and over. You can think of this cache as a table of inputs and outputs. For each input, we store the computed output rather than re-performing the calculation.

defmodule Fib do
  def get(n) do
    sequence =
      Stream.unfold({1, 1}, fn {a, b} ->
        {a, {b, a + b}}
      end)
      |> Enum.take(n)

    [0 | sequence]
  end
end

Fib.get(150)
|> Enum.with_index()
|> Enum.map(fn {value, index} -> %{input: index, output: value} end)
|> Kino.DataTable.new()

For any already computed input, the cache will return the cached output. This significantly improves our performance at the cost of memory and initial setup time.

However, be aware that caching can lead to issues and should be well considered. The cache permanently takes up memory in your program, so there's a considerable memory cost. Often it's best to consider how you can improve the performance of your application rather than immediately reaching for a cache.

For example, a slow Fibonacci function with a cache is no replacement for a fast Fibonacci function!

There are many ways to implement a cache. For example, GenServer, Agent, and :ets all expose state that could be used as a cache. We can even use module attributes for simple static caches.
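As a sketch of the module attribute approach (the StaticCache module and its data are hypothetical): module attributes are evaluated at compile time, so they suit small data that never changes at runtime.

```elixir
defmodule StaticCache do
  # This map is computed once at compile time and baked into the module.
  @roman %{1 => "I", 2 => "II", 3 => "III", 4 => "IV", 5 => "V"}

  # Lookups read the precomputed map; nothing is recomputed at runtime.
  def roman(n), do: Map.get(@roman, n)
end

StaticCache.roman(4)
```

The trade-off is that updating the data requires recompiling the module, which is why this approach only fits truly static caches.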

Here's an example cache using Agent which can get and set values.

defmodule AgentCache do
  use Agent

  def start_link(opts \\ []) do
    Agent.start_link(fn -> %{} end, opts)
  end

  def get(pid, input) do
    Agent.get(pid, fn state -> state[input] end)
  end

  def set(pid, input, output) do
    Agent.update(pid, fn state -> Map.put(state, input, output) end)
  end
end

We can use this cache to store values during the runtime of our application. For example, here's how we can use this cache with our Fib module. Notice we no longer re-compute already cached values.

# A named process simplifies accessing the cache.
AgentCache.start_link(name: :fib_cache)
# Reset the cache to preserve the example.
Agent.update(:fib_cache, fn _state -> %{} end)

defmodule CachedFib do
  def of(0), do: 0
  def of(1), do: 1

  def of(n) do
    cached_value = AgentCache.get(:fib_cache, n)

    if cached_value do
      cached_value
    else
      IO.puts("Computing fib(#{n})")
      result = of(n - 1) + of(n - 2)
      AgentCache.set(:fib_cache, n, result)
      result
    end
  end
end

CachedFib.of(100)

Your Turn

Compare the SlowFib module with the CachedFib module. What value of n can they both handle before they are no longer performant?

Depending on your computer, we recommend you don't call SlowFib with a value larger than 25 or so; otherwise, you might crash this Livebook!

However, CachedFib should handle larger numbers.

Your Turn

Use GenServer to create a cache.

Example Solution
defmodule GenServerCache do
  use GenServer

  def start_link(opts) do
    GenServer.start_link(__MODULE__, [], opts)
  end

  def get(pid, key) do
    GenServer.call(pid, {:get, key})
  end

  def set(pid, key, value) do
    GenServer.call(pid, {:set, key, value})
  end

  @impl true
  def init(_opts) do
    {:ok, %{}}
  end

  @impl true
  def handle_call({:get, key}, _from, state) do
    {:reply, state[key], state}
  end

  def handle_call({:set, key, value}, _from, state) do
    {:reply, :ok, Map.put(state, key, value)}
  end
end
defmodule GenServerCache do
  use GenServer
end

Your Turn

Use :ets to create a cache.

Example Solution
defmodule ETSCache do
  def new(opts \\ []) do
    :ets.new(:cache, opts)
  end

  def set(ref, key, value) do
    :ets.insert(ref, {key, value})
  end

  def get(ref, key) do
    case :ets.lookup(ref, key) do
      [{_key, value}] -> value
      _ -> nil
    end
  end
end
defmodule ETSCache do
end

Further Reading

Consider the following resource(s) to deepen your understanding of the topic.

Commit Your Progress

DockYard Academy now recommends you use the latest Release rather than forking or cloning our repository.

Run git status to ensure there are no undesirable changes. Then run the following in your command line from the curriculum folder to commit your progress.

$ git add .
$ git commit -m "finish State: Agent And ETS reading"
$ git push

We're proud to offer our open-source curriculum free of charge for anyone to learn from at their own pace.

We also offer a paid course where you can learn from an instructor alongside a cohort of your peers. We will accept applications for the June-August 2023 cohort soon.
