
User guide #364

Merged
91 commits merged on Feb 23, 2024

Commits (91)
5ca37a4
Add small user guide intro
KohlerHECTOR Jul 24, 2023
de0678f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 24, 2023
ad0b97b
Some draft of a plan for the user guide
KohlerHECTOR Jul 24, 2023
4b8a535
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 24, 2023
a43fe65
some text
KohlerHECTOR Jul 24, 2023
b97174e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 24, 2023
1015af9
Merge remote-tracking branch 'origin/KohlerHECTOR/issue325' into User…
JulienT01 Sep 19, 2023
69cd7bc
Merge remote-tracking branch 'origin/main' into User_guide
JulienT01 Sep 19, 2023
f0d2076
update userguide installation page : remove ffmpeg(added in setup.py)…
JulienT01 Sep 21, 2023
c002585
update CSS to distinguish "code" from "output"
JulienT01 Sep 22, 2023
047ff8d
rename agentManager -> experimentManager
JulienT01 Sep 22, 2023
2831f03
update CSS to distinguish "code" from "output" with markDown
JulienT01 Sep 22, 2023
a119c0e
updating quickstart
JulienT01 Sep 22, 2023
2570e22
updating quickstart doc from .rst to .md
JulienT01 Sep 27, 2023
668aa30
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 27, 2023
89aff8c
fix pandoc mistakes
JulienT01 Sep 27, 2023
9409244
updating TutorialDeepRL doc from .rst to .md
JulienT01 Sep 27, 2023
ac592fa
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 27, 2023
0d8e81c
add warning about seeding
JulienT01 Sep 29, 2023
dfa2e2a
adding temporarily userguide2 with a link from index
JulienT01 Oct 3, 2023
7b6e146
adding 'environment' page
JulienT01 Oct 3, 2023
1eb6443
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 3, 2023
f52aa3d
updating comments
JulienT01 Oct 3, 2023
4854537
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 3, 2023
c2c6914
add basewrapper to the doc "Environments"
JulienT01 Oct 3, 2023
069b98d
updating video breakout
JulienT01 Oct 3, 2023
5f7f1a5
add empty experimentManager page
JulienT01 Oct 11, 2023
922b114
change path for environment.md
JulienT01 Oct 11, 2023
fadbdb6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 11, 2023
ba55916
Add experimentManager userguide page
JulienT01 Oct 13, 2023
fc0e90a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 13, 2023
5ebff6b
typo correction
JulienT01 Oct 13, 2023
57f0168
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 13, 2023
db9d43d
add Agent in new userguide
JulienT01 Oct 17, 2023
71cb78d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 17, 2023
9e7fef8
add logging page
JulienT01 Nov 14, 2023
5c85c5b
Merge branch 'main' into User_guide
JulienT01 Nov 14, 2023
cb71225
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 14, 2023
72452c4
update tests to be run
JulienT01 Nov 14, 2023
f1fb20e
Merge branch 'User_guide' of github.com:rlberry-py/rlberry into User_…
JulienT01 Nov 14, 2023
7f2ecbd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 14, 2023
ea87202
Merge remote-tracking branch 'origin/main' into User_guide
JulienT01 Nov 22, 2023
b8a198a
move import inside def _make_tuple_env(env):
JulienT01 Nov 24, 2023
2a458d7
update the 2 quickstarts with the new module separation (research/sco…
JulienT01 Nov 24, 2023
bef16c0
Update the (begin of the) userguide with the new module separation (r…
JulienT01 Nov 24, 2023
f8ba387
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 24, 2023
790c9f9
Update the example with the new module separation (research/scool/main)
JulienT01 Nov 24, 2023
8bf748e
update Dummy agent to improve coverage
JulienT01 Nov 28, 2023
b6a0c11
update introduction from Userguide
JulienT01 Nov 28, 2023
320b2f3
add link to the userguide
JulienT01 Nov 29, 2023
ebb7451
add initial userguide for seeding
JulienT01 Nov 30, 2023
c66f85f
add ExperimentManager example with(out) seed
JulienT01 Nov 30, 2023
86274a7
word update
JulienT01 Nov 30, 2023
ca573ac
word update
JulienT01 Nov 30, 2023
fc7cb11
update poetry after update on rlberry-scool
JulienT01 Dec 4, 2023
5f0b5a9
add save/load experiment user guide
JulienT01 Dec 5, 2023
fd7528d
Merge remote-tracking branch 'origin/main' into User_guide
JulienT01 Dec 12, 2023
bbdb73c
Merge remote-tracking branch 'main' into User_guide
JulienT01 Dec 15, 2023
ff6a71f
UCBVI in now in rlberry_research
JulienT01 Jan 16, 2024
c73607f
Merge remote-tracking branch 'origin/main' into User_guide
JulienT01 Jan 16, 2024
66880e6
first corrections from review
JulienT01 Feb 14, 2024
cc505c3
update link and typo
JulienT01 Feb 14, 2024
bc7a9f0
update "center alignment" for images
JulienT01 Feb 14, 2024
61cea45
typo
JulienT01 Feb 14, 2024
2fb3827
update text on comparing agents
JulienT01 Feb 14, 2024
3efb754
add link to comparison page
JulienT01 Feb 14, 2024
f8493c0
typo
JulienT01 Feb 14, 2024
4909992
update from review
JulienT01 Feb 14, 2024
67ed9ea
update "notes" icon
JulienT01 Feb 15, 2024
03cc647
typo
JulienT01 Feb 15, 2024
714e36f
add informations about gymnasium
JulienT01 Feb 15, 2024
b436a1d
Merge remote-tracking branch 'origin/main' into User_guide
JulienT01 Feb 15, 2024
0fd46d0
update images and visualisation part
JulienT01 Feb 15, 2024
9d96258
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 15, 2024
baca4e7
switch old and new user guide
JulienT01 Feb 15, 2024
b3f6246
update following review
JulienT01 Feb 15, 2024
7d3f9d6
typo
JulienT01 Feb 15, 2024
9116152
update new visualisation
JulienT01 Feb 16, 2024
660f114
update following review
JulienT01 Feb 16, 2024
e242752
update from review
JulienT01 Feb 16, 2024
e20a5fe
update from review
JulienT01 Feb 16, 2024
d129fc3
typo
JulienT01 Feb 16, 2024
768ae95
typo
JulienT01 Feb 16, 2024
965e29e
Merge branch 'main' into User_guide
TimotheeMathieu Feb 20, 2024
e376bf3
change .rst to .md
JulienT01 Feb 21, 2024
adce03c
rework of the index
JulienT01 Feb 21, 2024
d1fb937
more readable userguide
JulienT01 Feb 21, 2024
398dc0f
typo
JulienT01 Feb 21, 2024
658997a
rework of the index
JulienT01 Feb 21, 2024
8618221
Merge remote-tracking branch 'origin/main' into User_guide
JulienT01 Feb 23, 2024
8d607c7
typo
JulienT01 Feb 23, 2024
Binary file added docs/_video/user_guide_video/_env_page_chain.mp4
1 change: 1 addition & 0 deletions docs/api.rst
@@ -68,6 +68,7 @@ Base class
:template: class.rst

envs.interface.Model
envs.basewrapper.Wrapper

Spaces
------
581 changes: 581 additions & 0 deletions docs/basics/DeepRLTutorial/TutorialDeepRL.md


591 changes: 0 additions & 591 deletions docs/basics/DeepRLTutorial/TutorialDeepRL.rst


Binary file modified docs/basics/DeepRLTutorial/output_10_3.png
Binary file modified docs/basics/DeepRLTutorial/output_6_3.png
Binary file modified docs/basics/DeepRLTutorial/output_9_3.png
1 change: 1 addition & 0 deletions docs/basics/comparison.md
@@ -1,3 +1,4 @@
(comparison_page)=

# Comparison of Agents

Binary file added docs/basics/quick_start_rl/Figure_1.png
Binary file added docs/basics/quick_start_rl/Figure_2.png
Binary file added docs/basics/quick_start_rl/Figure_3.png
Binary file removed docs/basics/quick_start_rl/agent_manager_diagram.png
Binary file removed docs/basics/quick_start_rl/output_10_1.png
Binary file removed docs/basics/quick_start_rl/output_19_0.png
356 changes: 356 additions & 0 deletions docs/basics/quick_start_rl/quickstart.md
@@ -0,0 +1,356 @@
(quick_start)=

Quick Start for Reinforcement Learning in rlberry
=================================================


Importing required libraries
----------------------------

```python
import numpy as np
import pandas as pd
import time
from rlberry.agents import AgentWithSimplePolicy
from rlberry_research.agents import UCBVIAgent
from rlberry_research.envs import Chain
from rlberry.manager import (
    ExperimentManager,
    evaluate_agents,
    plot_writer_data,
    read_writer_data,
)
from rlberry.wrappers import WriterWrapper
```

Choosing an RL environment
--------------------------

In this tutorial, we will use the Chain environment (from [rlberry_scool](https://github.com/rlberry-py/rlberry-scool)),
a very simple environment where the agent has to go from one end of a
chain to the other.

```python
env_ctor = Chain
env_kwargs = dict(L=10, fail_prob=0.1)
# Chain of length 10. With probability 0.1, the agent fails to take the action it intends.
env = env_ctor(**env_kwargs)
```

The agent has two actions, go left or go right, but it may
move in the opposite direction according to the failure probability
`fail_prob=0.1`.
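
Before rendering anything, we can sanity-check the environment's interface. A minimal sketch, assuming the gymnasium-style `observation_space` and `action_space` attributes that rlberry environments expose:

```python
# Quick sanity check of the Chain environment (assumption: rlberry
# environments follow the gymnasium API, so these attributes exist).
print(env.observation_space)  # expected: Discrete(10), one state per link
print(env.action_space)       # expected: Discrete(2), 0 = left, 1 = right
```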

Let us see a graphical representation:

```python
env.enable_rendering()
observation, info = env.reset()
for tt in range(5):
    observation, reward, terminated, truncated, info = env.step(1)
    done = terminated or truncated
video = env.save_video("video_chain.mp4", framerate=5)
```

```none
ffmpeg version n5.0 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 11.2.0 (GCC)
configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-amf --enable-avisynth --enable-cuda-llvm --enable-lto --enable-fontconfig --enable-gmp --enable-gnutls --enable-gpl --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libdav1d --enable-libdrm --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libiec61883 --enable-libjack --enable-libmfx --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-librav1e --enable-librsvg --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb --enable-libxml2 --enable-libxvid --enable-libzimg --enable-nvdec --enable-nvenc --enable-shared --enable-version3
libavutil 57. 17.100 / 57. 17.100
libavcodec 59. 18.100 / 59. 18.100
libavformat 59. 16.100 / 59. 16.100
libavdevice 59. 4.100 / 59. 4.100
libavfilter 8. 24.100 / 8. 24.100
libswscale 6. 4.100 / 6. 4.100
libswresample 4. 3.100 / 4. 3.100
libpostproc 56. 3.100 / 56. 3.100
Input #0, rawvideo, from 'pipe:':
Duration: N/A, start: 0.000000, bitrate: 7680 kb/s
Stream #0:0: Video: rawvideo (RGB[24] / 0x18424752), rgb24, 800x80, 7680 kb/s, 5 tbr, 5 tbn
Stream mapping:
Stream #0:0 -> #0:0 (rawvideo (native) -> h264 (libx264))
[libx264 @ 0x564b9e570340] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x564b9e570340] profile High, level 1.2, 4:2:0, 8-bit
[libx264 @ 0x564b9e570340] 264 - core 164 r3081 19856cc - H.264/MPEG-4 AVC codec - Copyleft 2003-2021 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=2 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=5 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'video_chain.mp4':
Metadata:
encoder : Lavf59.16.100
Stream #0:0: Video: h264 (avc1 / 0x31637661), yuv420p(tv, progressive), 800x80, q=2-31, 5 fps, 10240 tbn
Metadata:
encoder : Lavc59.18.100 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
frame= 6 fps=0.0 q=-1.0 Lsize= 4kB time=00:00:00.60 bitrate= 48.8kbits/s speed=56.2x
video:3kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 32.212582%
[libx264 @ 0x564b9e570340] frame I:1 Avg QP: 6.94 size: 742
[libx264 @ 0x564b9e570340] frame P:5 Avg QP:22.68 size: 267
[libx264 @ 0x564b9e570340] mb I I16..4: 95.2% 0.0% 4.8%
[libx264 @ 0x564b9e570340] mb P I16..4: 1.2% 2.1% 2.0% P16..4: 0.2% 0.0% 0.0% 0.0% 0.0% skip:94.6%
[libx264 @ 0x564b9e570340] 8x8 transform intra:8.2% inter:0.0%
[libx264 @ 0x564b9e570340] coded y,uvDC,uvAC intra: 6.5% 12.3% 11.4% inter: 0.0% 0.0% 0.0%
[libx264 @ 0x564b9e570340] i16 v,h,dc,p: 79% 1% 20% 0%
[libx264 @ 0x564b9e570340] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 0% 0% 100% 0% 0% 0% 0% 0% 0%
[libx264 @ 0x564b9e570340] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 52% 22% 19% 1% 0% 3% 1% 3% 1%
[libx264 @ 0x564b9e570340] i8c dc,h,v,p: 92% 4% 3% 0%
[libx264 @ 0x564b9e570340] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0x564b9e570340] kb/s:13.85
```
<br>

<video controls="controls" style="max-width: 600px;">
<source src="../../video_chain_quickstart.mp4" type="video/mp4">
</video>



Defining an agent and a baseline
--------------------------------

We will compare a RandomAgent (which selects actions at random) to the
UCBVIAgent (from [rlberry_research](https://github.com/rlberry-py/rlberry-research)), an algorithm designed to perform
efficient exploration. Our goal is to assess the performance of the
two algorithms.

Here is the code to create the RandomAgent baseline:

```python
# Create random agent as a baseline
class RandomAgent(AgentWithSimplePolicy):
    name = "RandomAgent"

    def __init__(self, env, **kwargs):
        AgentWithSimplePolicy.__init__(self, env, **kwargs)

    def fit(self, budget=100, **kwargs):
        observation, info = self.env.reset()
        for ep in range(budget):
            action = self.policy(observation)
            observation, reward, terminated, truncated, info = self.env.step(action)

    def policy(self, observation):
        return self.env.action_space.sample()  # choose an action at random
```
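
As a quick check of the agent interface, the class can also be used on its own; a minimal sketch, assuming an environment instance can be passed directly to the constructor (as the base class allows):

```python
# Standalone use of the agent API, outside ExperimentManager
# (assumption: the constructor accepts an env instance directly).
agent = RandomAgent(env)
agent.fit(budget=10)  # take 10 random steps in the environment
observation, info = env.reset()
print(agent.policy(observation))  # a random action: 0 or 1
```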

Experiment Manager
------------------

One of the main features of rlberry is its
[ExperimentManager](rlberry.manager.ExperimentManager)
class. Here is a diagram that briefly explains what it does.


```{image} experiment_manager_diagram.png
:align: center
```

In a few words, ExperimentManager spawns agents and environments for
training; once the agents are trained, it uses them with fresh
environments to evaluate how well they perform. All of these steps can
be repeated several times to assess the stochasticity of the agents
and/or the environment.
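
Schematically, the calling pattern looks like this; a minimal sketch reusing the pieces defined above (the sections below use the same call with realistic budgets, `init_kwargs`, and `eval_kwargs`):

```python
# Minimal ExperimentManager pattern (a sketch; see the concrete
# examples below for tuned arguments).
manager = ExperimentManager(
    RandomAgent,             # the agent class to train
    (env_ctor, env_kwargs),  # how to (re)build training environments
    fit_budget=10,           # training budget per agent instance
    n_fit=1,                 # number of independent agent instances
)
manager.fit()
```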

Comparing the expected rewards of the final policies
----------------------------------------------------

We want to assess the expected reward of the policy learned by our
agents for a time horizon of (say) $T=20$.

To evaluate the agents during training, we (arbitrarily) use 10 Monte-Carlo simulations (`n_simulations` in `eval_kwargs`): the evaluation is run 10 times for each agent, and we take the mean of the obtained rewards.

To check variability, we can train several instances of the same agent with
`n_fit` (here we use only 1, to be faster). Each agent instance trains with a given
budget `fit_budget` (here 100). Note that `fit_budget` may not mean
the same thing for different agents.

To manage the agents, we use an ExperimentManager, which spawns agents
as needed during the experiment.

To summarize: we will train 1 agent (`n_fit`) with a budget of 100 (`fit_budget`); during training, each agent is evaluated with 10 Monte-Carlo runs (`n_simulations`); and we do this for both agents (`UCBVIAgent` and `RandomAgent`).

```python
# Define parameters
ucbvi_params = {"gamma": 0.9, "horizon": 100}

# Create ExperimentManager to fit 1 agent
ucbvi_stats = ExperimentManager(
    UCBVIAgent,
    (env_ctor, env_kwargs),
    fit_budget=100,
    eval_kwargs=dict(eval_horizon=20, n_simulations=10),
    init_kwargs=ucbvi_params,
    n_fit=1,
)
ucbvi_stats.fit()

# Create ExperimentManager for baseline
baseline_stats = ExperimentManager(
    RandomAgent,
    (env_ctor, env_kwargs),
    fit_budget=100,
    eval_kwargs=dict(eval_horizon=20, n_simulations=10),
    n_fit=1,
)
baseline_stats.fit()
```

```none
[INFO] Running ExperimentManager fit() for UCBVI with n_fit = 1 and max_workers = None.
[INFO] ... trained!
[INFO] Running ExperimentManager fit() for RandomAgent with n_fit = 1 and max_workers = None.
[INFO] ... trained!
```

<br>

Evaluating and comparing the agents:

```python
output = evaluate_agents([ucbvi_stats, baseline_stats], n_simulations=10, plot=True)
```

```none
[INFO] Evaluating UCBVI...
[INFO] [eval]... simulation 1/10
[INFO] [eval]... simulation 2/10
[INFO] [eval]... simulation 3/10
[INFO] [eval]... simulation 4/10
[INFO] [eval]... simulation 5/10
[INFO] [eval]... simulation 6/10
[INFO] [eval]... simulation 7/10
[INFO] [eval]... simulation 8/10
[INFO] [eval]... simulation 9/10
[INFO] [eval]... simulation 10/10
[INFO] Evaluating RandomAgent...
[INFO] [eval]... simulation 1/10
[INFO] [eval]... simulation 2/10
[INFO] [eval]... simulation 3/10
[INFO] [eval]... simulation 4/10
[INFO] [eval]... simulation 5/10
[INFO] [eval]... simulation 6/10
[INFO] [eval]... simulation 7/10
[INFO] [eval]... simulation 8/10
[INFO] [eval]... simulation 9/10
[INFO] [eval]... simulation 10/10
```

<br>

```{image} Figure_1.png
:align: center
```
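
`evaluate_agents` also returns the raw evaluation values, so we can inspect them directly (assuming, as a sketch, that it returns a pandas DataFrame with one column per agent):

```python
# Summary statistics of the evaluations (assumption: evaluate_agents
# returns a pandas DataFrame with one column per agent).
print(output.describe())
```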

<span>&#9728;</span> : For a more in-depth methodology for comparing agents, you can check the [comparison page](comparison_page).

Comparing the agents during the learning period
-----------------------------------------------

In the previous section, we compared the performance of the **final**
policies learned by the agents, **after** the learning period.

To compare the performance of the agents **during** the learning period
(i.e., inside the `fit` method), we can estimate their cumulative regret, which is
the difference between the rewards an optimal agent would gather and the
rewards gathered by our agents during training. Alternatively, if we
cannot compute the optimal policy, we can simply compare the rewards
gathered during learning, instead of the regret.

First, we have to record the rewards during the fit, as this is not done
automatically. To do this, we can use the
[WriterWrapper](rlberry.wrappers.writer_utils.WriterWrapper)
module, or simply the [writer](rlberry.agents.Agent.writer) attribute.

```python
class RandomAgent2(RandomAgent):
    name = "RandomAgent2"

    def __init__(self, env, **kwargs):
        RandomAgent.__init__(self, env, **kwargs)
        self.env = WriterWrapper(self.env, self.writer, write_scalar="reward")


class UCBVIAgent2(UCBVIAgent):
    name = "UCBVIAgent2"

    def __init__(self, env, **kwargs):
        UCBVIAgent.__init__(self, env, **kwargs)
        self.env = WriterWrapper(self.env, self.writer, write_scalar="reward")
```


Then, we fit the two agents.

```python
# Create ExperimentManager for UCBVI to fit 10 agents
ucbvi_stats = ExperimentManager(
    UCBVIAgent2,
    (env_ctor, env_kwargs),
    fit_budget=50,
    init_kwargs=ucbvi_params,
    n_fit=10,
)
ucbvi_stats.fit()

# Create ExperimentManager for baseline to fit 10 agents
baseline_stats = ExperimentManager(
    RandomAgent2,
    (env_ctor, env_kwargs),
    fit_budget=5000,
    n_fit=10,
)
baseline_stats.fit()
```

```none
[INFO] Running ExperimentManager fit() for UCBVIAgent2 with n_fit = 10 and max_workers = None.
[INFO] ... trained!
[INFO] Running ExperimentManager fit() for RandomAgent2 with n_fit = 10 and max_workers = None.
[INFO] ... trained!
[INFO] Running ExperimentManager fit() for OptimalAgent with n_fit = 10 and max_workers = None.
[INFO] ... trained!
```

Note that `fit_budget` may not mean the same thing for different agents. For `RandomAgent`, `fit_budget` is the number of steps the agent is allowed to take in the environment; since the reward is recorded every time `env.step` is called, this is also the number of recorded rewards.

For UCBVI, `fit_budget` is the number of iterations of the algorithm, and each iteration takes `horizon` (here 100) steps in the environment, so the total number of recorded steps is `fit_budget` × `horizon`. Hence the budgets used above: 50 × 100 = 5000 steps for UCBVI, which matches the 5000 steps of the random baseline.



Finally, we plot the reward. Each curve shows the mean value over the 10 fitted agents, with two options (raw and smoothed).

```python
# Plot of the raw reward.
output = plot_writer_data(
    [ucbvi_stats, baseline_stats],
    tag="reward",
    title="Episode Reward",
)


# Plot of the smoothed reward.
output = plot_writer_data(
    [ucbvi_stats, baseline_stats],
    tag="reward",
    smooth=True,
    title="Episode Reward smoothed",
)
```

JulienT01 marked this conversation as resolved.
Show resolved Hide resolved
```{image} Figure_2.png
:align: center
```
```{image} Figure_3.png
:align: center
```
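
We can also work with the logged data directly, for instance to compare total training rewards as discussed above. A minimal sketch, assuming `read_writer_data` returns a pandas DataFrame with `name`, `n_simu` (the index of the fitted instance), and `value` columns:

```python
# Total training reward per agent, averaged over the n_fit instances
# (assumption: the DataFrame has "name", "n_simu" and "value" columns,
# with one row per recorded env.step).
df = read_writer_data([ucbvi_stats, baseline_stats], tag="reward")
per_instance = df.groupby(["name", "n_simu"])["value"].sum()
print(per_instance.groupby("name").mean())
```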


<span>&#9728;</span> : As you can see, different visualizations are possible. For more information on plots and visualization, you can check the [visualization page (under construction)](visualization_page).