### A Pluto.jl notebook ###
# v0.20.4
using Markdown
using InteractiveUtils
# This Pluto notebook uses @bind for interactivity. When running this notebook outside of Pluto, the following 'mock version' of @bind gives bound variables a default value (instead of an error).
macro bind(def, element)
    #! format: off
    quote
        local iv = try Base.loaded_modules[Base.PkgId(Base.UUID("6e696c72-6542-2067-7265-42206c756150"), "AbstractPlutoDingetjes")].Bonds.initial_value catch; b -> missing; end
        local el = $(esc(element))
        global $(esc(def)) = Core.applicable(Base.get, el) ? Base.get(el) : iv(el)
        el
    end
    #! format: on
end
# ╔═╡ f0ca431d-082b-4a25-8d07-8aea51517435
begin
    using Pkg
    Pkg.activate("../.")
    using PlutoUI
end
# ╔═╡ 359ce3ca-9fe3-4248-808e-52cd58b62bc6
# ╠═╡ show_logs = false
begin
    if !@isdefined MountainCar
        include("mountaincar.jl") # uses Cairo, Color, Printf
    end
end
# ╔═╡ 1ecaaeb0-38ce-11eb-2304-65c70a221e0a
md"""
# Building and Evaluating Autonomous Systems
AA120Q: *Building Trust in Autonomy*
v2025.0.1
## Why Build Autonomous Systems?
Autonomous systems are transforming industries, from aerospace to self-driving cars, by enabling machines to perform complex tasks with minimal human intervention. These systems must:
- Make intelligent decisions in dynamic, uncertain environments.
- Optimize outcomes while considering constraints like safety, efficiency, and resource limitations.
In this notebook, we will explore:
1. **Decision-making processes**: How we can model agents making decisions.
2. **Policy evaluation**: Comparing different strategies (policies) for solving a given problem.
"""
# ╔═╡ a6a13246-e465-4ebb-af2c-4eb998f334a9
md"#### Packages Used in this Notebook"
# ╔═╡ fe2c8100-38ce-11eb-281e-277e6848a981
md"""
# Decision-Making in Autonomous Systems
Decision-making is at the core of any intelligent system. It involves selecting the best course of action based on the system's goals and its understanding of the environment. An _intelligent agent_ interacts with the environment to achieve its objectives over time.
At its simplest, an _agent_ interacts with an environment by taking actions determined by its state:
$$a_t = \pi(s_t)$$
where $s_t$ is the state at time $t$, $a_t$ is the action at time $t$, and $\pi$ is the policy that maps states to actions.
However, in real-world scenarios, agents often operate with incomplete or uncertain information. In such cases, decision-making involves reasoning about the environment using observations. The agent maintains a **belief** about the environment, updating it based on the previous action taken and the observation received:
$$b_t = \text{update}(b_{t-1}, a_{t-1}, o_{t-1})$$
$$a_t = \pi(b_t)$$
where $o_{t-1}$ is the observation received at time $t-1$, $b_t$ is the belief at time $t$, and $\text{update}$ is the belief-update function.
$(PlutoUI.LocalResource("./figures/agent_environment_interaction.png"))
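Conceptually, this interaction is a loop: update the belief, select an action, then act and observe. Below is a minimal sketch of that loop, where `b₀`, `update_belief`, `policy`, `act!`, `env`, and `T` are all hypothetical placeholders (none are defined in this notebook):

```julia
b = b₀                           # hypothetical initial belief
a, o = nothing, nothing          # no action or observation yet
for t in 1:T
    b = update_belief(b, a, o)   # bₜ = update(bₜ₋₁, aₜ₋₁, oₜ₋₁)
    a = policy(b)                # aₜ = π(bₜ)
    o = act!(env, a)             # apply aₜ and receive the next observation
end
```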
"""
# ╔═╡ 26fabac1-76ac-4ed6-9032-03bfcbf2eaa1
md"""
# Mountain Car Problem
The [Mountain Car problem](https://en.wikipedia.org/wiki/Mountain_car_problem) is a fundamental benchmark in decision-making and reinforcement learning. An underpowered car must reach the top of a steep hill, but it lacks the power to climb directly. Instead, the car must build momentum, rocking back and forth between the slopes, to make it up the hill.
- **State**: The car's position $x$ and velocity $v$.
- **Actions**: Choose to accelerate left, right, or coast.
- **Reward**: Reward designs vary; common choices are a positive value for reaching the goal or the car's height, which encourages upward movement.
This problem is a simple example that illustrates the challenge of balancing immediate rewards with long-term strategy—a key concept in intelligent decision-making. We'll simulate the car's dynamics and evaluate different strategies!
"""
# ╔═╡ a7298c09-e0be-41ba-859e-1e523bc75cbf
md"""
## Problem Definition
The problem is defined in `mountaincar.jl`, which we included at the top of this notebook.
### State
The file defines a `MountainCar` type with properties `x::Float64` and `v::Float64` for the position and velocity of the car. The position is restricted to ``x \in [-1.2, 0.6]`` and the velocity is restricted to ``v \in [-0.07, 0.07]``.
### Actions
The possible actions are accelerations, represented as `Symbol`s: `:left`, `:right`, and `:coast`. The magnitude of the acceleration is $0.001$ in the designated direction.
### Reward
The reward is defined as the height of the car, with the lowest point in the terrain set at ``0``. The reward is computed by the function `reward(car::MountainCar, a::Symbol)`, which returns a `Float64`.
### Dynamics
The dynamics are provided by the function `update(car::MountainCar, a::Symbol)`, which returns a new `MountainCar` with the updated position and velocity.
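For example, a single step of the car can be simulated directly with these functions (a small sketch; `MountainCar`, `update`, and `reward` are all provided by `mountaincar.jl`):

```julia
car = MountainCar(-0.5, 0.0)  # position x = -0.5, velocity v = 0.0
a = :right                    # accelerate toward the right peak
car′ = update(car, a)         # new MountainCar with updated x and v
r = reward(car′, a)           # Float64: the height of the car
```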
"""
# ╔═╡ 4129dd45-d0b0-48a0-ad94-0afc65bbc6c2
md"""
## Visualization of Mountain Car"""
# ╔═╡ df864630-38e4-11eb-227e-4be958de9558
md"""
## Policy Evaluation
One way to evaluate a policy is by simulating it and analyzing the resulting metrics.
In **deterministic environments**, a single simulation run is often sufficient to assess a policy's performance since the outcome is consistent. However, in **stochastic environments**, where outcomes vary due to randomness, evaluating a policy typically requires running multiple simulations and aggregating results. This approach helps estimate the policy's expected performance and provides insights into its robustness under uncertainty.
For example, given a policy $\pi(s) \rightarrow a$ and a distribution over the initial states, we can simulate the policy across many scenarios, compute the reward for each run, and use the mean reward as an evaluation metric.
In the case of the **Mountain Car domain**, which is deterministic, we only need to perform one simulation to evaluate a policy. Here, we use the sum of rewards over a fixed number of steps as our metric of interest.
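For a stochastic domain, evaluation might look like the following minimal sketch, where `simulate` and `sample_initial_state` are hypothetical placeholders (the deterministic Mountain Car code below does not need them):

```julia
using Statistics: mean, std

# Hypothetical: simulate returns the sum of rewards of one rollout
# of `policy` starting from a sampled initial state.
returns = [simulate(policy, sample_initial_state()) for _ in 1:1000]
μ, σ = mean(returns), std(returns)  # estimated expected return and its spread
```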
"""
# ╔═╡ aa87ed01-6a92-4dbf-8a60-b8a57767845e
md"""
### Policy 1: _Floor It!_
The first policy we will consider is the simplest possible strategy: always accelerate towards the peak on the right.
This policy does not consider the car's velocity or position—it just applies maximum acceleration to the right at all times.
"""
# ╔═╡ 8df03a10-771e-4998-b442-b370a0744982
function policy_floor_it(car::MountainCar)
    return :right # Always accelerate to the right!
end
# ╔═╡ 9e515a61-efc2-4aaf-9b62-268b41eaa3b4
md"""
### Policy 2: _Accelerate in the Direction of Velocity_
The second policy uses the car's current velocity. This policy accelerates in the direction the car is already traveling, aiming to maintain or increase its speed.
- If the car's velocity is negative, the policy accelerates left.
- If the velocity is zero or positive, it accelerates right.
"""
# ╔═╡ 117c546b-3fa2-4ad4-9f7c-5c5acd2b74c7
function policy_accel_in_direc_vel(car::MountainCar)
    if car.v < 0
        return :left
    else # car.v ≥ 0
        return :right
    end
end
# ╔═╡ bf7395e4-5ce7-48c2-9804-5ada290b691c
md"""
### Policy Evaluation
Now that we have our policies, we need to evaluate them. We can do this by simulating them and collecting the rewards during the simulation.
"""
# ╔═╡ 07e0217c-6215-4c20-95a2-b91a6a3a0db7
function run_single_simulation(policy::Function; number_steps=500)
    rewards = zeros(number_steps) # Vector to hold rewards
    states = Vector{MountainCar}(undef, number_steps) # Vector to hold states
    actions = Vector{Symbol}(undef, number_steps) # Vector to hold actions taken
    sₜ = MountainCar(-0.5, 0.0) # Start at x = -0.5, v = 0.0
    for step_num in 1:number_steps
        aₜ = policy(sₜ)
        sₜ = update(sₜ, aₜ)
        rₜ = reward(sₜ, aₜ)
        rewards[step_num] = rₜ # store reward
        states[step_num] = sₜ # store state
        actions[step_num] = aₜ # store action
    end
    return rewards, states, actions
end
# ╔═╡ cd67112a-d161-42fc-829e-7713d39ef6fe
function evaluate_policy(policy::Function;
    number_steps=500, number_of_simulations=1
)
    rewards = zeros(number_of_simulations)
    for sim_num in 1:number_of_simulations
        sim_rewards, _, _ = run_single_simulation(policy; number_steps=number_steps)
        sum_of_rewards = sum(sim_rewards)
        rewards[sim_num] = sum_of_rewards
    end
    if number_of_simulations == 1
        return rewards[1]
    else
        return rewards
    end
end
# ╔═╡ 171f5bf8-d210-4f41-9772-191bd9f92368
rewards_policy_1 = evaluate_policy(policy_floor_it)
# ╔═╡ 19480506-a380-4b10-998a-ffd569f7569a
rewards_policy_2 = evaluate_policy(policy_accel_in_direc_vel)
# ╔═╡ 836e7602-b0eb-4265-8cb3-ea75259d9497
md"""
### Visualize the Policy
While metrics are essential for evaluating a policy's performance, visualizing the policy or its behavior in action can provide additional, critical insights. Visualizations help us identify patterns, inefficiencies, or unexpected behavior that metrics alone might not reveal.
"""
# ╔═╡ 1c7a90d3-0d3a-4a1a-8024-551640478d74
function animate(policy::Function; number_steps=200)
    rewards, states, actions = run_single_simulation(policy; number_steps=number_steps)
    frames = []
    for (rₜ, sₜ, aₜ) in zip(rewards, states, actions)
        frame_t = render_mountain_car(sₜ; render_pos_overlay=true, reward=rₜ, action=aₜ)
        push!(frames, frame_t)
    end
    return frames
end;
# ╔═╡ 3bafccd4-a910-4131-bbb8-6ea11fd93eb4
animation_floor_it = animate(policy_floor_it);
# ╔═╡ 3e323c6a-a07b-47a8-b9f8-bf83a805ccf6
md"#### Animation of Policy 1: Floor It!"
# ╔═╡ b9af3a4d-6cc8-4d4d-893a-9447601294fd
animation_accel_in_d = animate(policy_accel_in_direc_vel);
# ╔═╡ ba84676f-7427-4126-845e-b21806b48aba
md"#### Animation of Policy 2: Accelerate in the Direction of Velocity"
# ╔═╡ 58869265-0fe0-470a-bf03-1fe4f887718f
md"""
# Backend
_Helper functions and project management. Please do not edit._
"""
# ╔═╡ 2c3b3292-38e7-11eb-2ee1-518ce8840823
PlutoUI.TableOfContents()
# ╔═╡ dd82ccdb-4565-48a1-b037-2b8596e05157
begin
    start_code() = html"""
    <div class='container'><div class='line'></div><span class='text' style='color:#B1040E'><b><code>&lt;START CODE&gt;</code></b></span><div class='line'></div></div>
    <p> </p>
    <!-- START_CODE -->
    """
    end_code() = html"""
    <!-- END CODE -->
    <p><div class='container'><div class='line'></div><span class='text' style='color:#B1040E'><b><code>&lt;END CODE&gt;</code></b></span><div class='line'></div></div></p>
    """
    function combine_html_md(contents::Vector; return_html=true)
        process(str) = str isa HTML ? str.content : html(str)
        return join(map(process, contents))
    end
    function html_expand(title, content::Markdown.MD)
        return HTML("<details><summary>$title</summary>$(html(content))</details>")
    end
    function html_expand(title, contents::Vector)
        html_code = combine_html_md(contents; return_html=false)
        return HTML("<details><summary>$title</summary>$html_code</details>")
    end
    html_space() = html"<br><br><br><br><br><br><br><br><br><br><br><br><br><br>"
    html_half_space() = html"<br><br><br><br><br><br><br>"
    html_quarter_space() = html"<br><br><br>"
    Bonds = PlutoUI.BuiltinsNotebook.AbstractPlutoDingetjes.Bonds
    struct DarkModeIndicator
        default::Bool
    end
    DarkModeIndicator(; default::Bool=false) = DarkModeIndicator(default)
    function Base.show(io::IO, ::MIME"text/html", link::DarkModeIndicator)
        print(io, """
        <span>
        <script>
            const span = currentScript.parentElement
            span.value = window.matchMedia('(prefers-color-scheme: dark)').matches
        </script>
        </span>
        """)
    end
    Base.get(checkbox::DarkModeIndicator) = checkbox.default
    Bonds.initial_value(b::DarkModeIndicator) = b.default
    Bonds.possible_values(b::DarkModeIndicator) = [false, true]
    Bonds.validate_value(b::DarkModeIndicator, val) = val isa Bool
end
# ╔═╡ 365c58a0-38e5-11eb-2652-7f33ce2a63ab
md"""
Position: $(@bind pos Slider(-1.2:0.1:0.6; default=-0.5))
Velocity: $(@bind vel Slider(-0.070:0.01:0.07; default=0.0))
Action: $(@bind act Select([:left, :right, :coast]))
"""
# ╔═╡ 45e3b7f0-38e5-11eb-2688-0782700d5bb8
render_mountain_car(MountainCar(pos, vel); render_pos_overlay=true, action=act)
# ╔═╡ 3854a2b4-3fc7-45a0-980c-f97a4c517efa
@bind t_1 PlutoUI.Clock(interval=1/10, max_value=length(animation_floor_it))
# ╔═╡ 4a64a3eb-6eed-4839-abe2-68163ac0da54
animation_floor_it[t_1]
# ╔═╡ b1bc3676-4764-464c-a5fa-98f4d5f3f677
@bind t_2 PlutoUI.Clock(interval=1/10, max_value=length(animation_accel_in_d))
# ╔═╡ 335e61e8-db8d-4a0d-a0f8-ccd38a2b7ae5
animation_accel_in_d[t_2]
# ╔═╡ 2b1661a0-38e7-11eb-0e0a-45a7964fa12f
html_half_space()
# ╔═╡ e02824f3-1728-460c-8f65-a96ce5572778
html"""
<style>
    h3 {
        border-bottom: 1px dotted var(--rule-color);
    }
    summary {
        font-weight: 500;
        font-style: italic;
    }
    .container {
        display: flex;
        align-items: center;
        width: 100%;
        margin: 1px 0;
    }
    .line {
        flex: 1;
        height: 2px;
        background-color: #B83A4B;
    }
    .text {
        margin: 0 5px;
        white-space: nowrap; /* Prevents text from wrapping */
    }
    h2hide {
        border-bottom: 2px dotted var(--rule-color);
        font-size: 1.8rem;
        font-weight: 700;
        margin-bottom: 0.5rem;
        margin-block-start: calc(2rem - var(--pluto-cell-spacing));
        font-feature-settings: "lnum", "pnum";
        color: var(--pluto-output-h-color);
        font-family: Vollkorn, Palatino, Georgia, serif;
        line-height: 1.25em;
        margin-block-end: 0;
        display: block;
        margin-inline-start: 0px;
        margin-inline-end: 0px;
        unicode-bidi: isolate;
    }
    h3hide {
        border-bottom: 1px dotted var(--rule-color);
        font-size: 1.6rem;
        font-weight: 600;
        color: var(--pluto-output-h-color);
        font-feature-settings: "lnum", "pnum";
        font-family: Vollkorn, Palatino, Georgia, serif;
        line-height: 1.25em;
        margin-block-start: 0;
        margin-block-end: 0;
        display: block;
        margin-inline-start: 0px;
        margin-inline-end: 0px;
        unicode-bidi: isolate;
    }
    .styled-button {
        background-color: var(--pluto-output-color);
        color: var(--pluto-output-bg-color);
        border: none;
        padding: 10px 20px;
        border-radius: 5px;
        cursor: pointer;
        font-family: Alegreya Sans, Trebuchet MS, sans-serif;
    }
</style>
<script>
    const buttons = document.querySelectorAll('input[type="button"]');
    buttons.forEach(button => button.classList.add('styled-button'));
</script>"""
# ╔═╡ Cell order:
# ╟─1ecaaeb0-38ce-11eb-2304-65c70a221e0a
# ╟─f0ca431d-082b-4a25-8d07-8aea51517435
# ╟─a6a13246-e465-4ebb-af2c-4eb998f334a9
# ╠═359ce3ca-9fe3-4248-808e-52cd58b62bc6
# ╟─fe2c8100-38ce-11eb-281e-277e6848a981
# ╟─26fabac1-76ac-4ed6-9032-03bfcbf2eaa1
# ╟─a7298c09-e0be-41ba-859e-1e523bc75cbf
# ╟─4129dd45-d0b0-48a0-ad94-0afc65bbc6c2
# ╟─365c58a0-38e5-11eb-2652-7f33ce2a63ab
# ╟─45e3b7f0-38e5-11eb-2688-0782700d5bb8
# ╟─df864630-38e4-11eb-227e-4be958de9558
# ╟─aa87ed01-6a92-4dbf-8a60-b8a57767845e
# ╠═8df03a10-771e-4998-b442-b370a0744982
# ╟─9e515a61-efc2-4aaf-9b62-268b41eaa3b4
# ╠═117c546b-3fa2-4ad4-9f7c-5c5acd2b74c7
# ╟─bf7395e4-5ce7-48c2-9804-5ada290b691c
# ╠═07e0217c-6215-4c20-95a2-b91a6a3a0db7
# ╠═cd67112a-d161-42fc-829e-7713d39ef6fe
# ╠═171f5bf8-d210-4f41-9772-191bd9f92368
# ╠═19480506-a380-4b10-998a-ffd569f7569a
# ╟─836e7602-b0eb-4265-8cb3-ea75259d9497
# ╟─1c7a90d3-0d3a-4a1a-8024-551640478d74
# ╟─3bafccd4-a910-4131-bbb8-6ea11fd93eb4
# ╟─3e323c6a-a07b-47a8-b9f8-bf83a805ccf6
# ╟─3854a2b4-3fc7-45a0-980c-f97a4c517efa
# ╟─4a64a3eb-6eed-4839-abe2-68163ac0da54
# ╟─b9af3a4d-6cc8-4d4d-893a-9447601294fd
# ╟─ba84676f-7427-4126-845e-b21806b48aba
# ╟─b1bc3676-4764-464c-a5fa-98f4d5f3f677
# ╟─335e61e8-db8d-4a0d-a0f8-ccd38a2b7ae5
# ╟─2b1661a0-38e7-11eb-0e0a-45a7964fa12f
# ╟─58869265-0fe0-470a-bf03-1fe4f887718f
# ╟─2c3b3292-38e7-11eb-2ee1-518ce8840823
# ╟─dd82ccdb-4565-48a1-b037-2b8596e05157
# ╟─e02824f3-1728-460c-8f65-a96ce5572778