<h1 id="component-failure-accident-models-and-methods">4.5 Component Failure
Accident Models and Methods</h1>
<p>As a system is being created and used, it is important to analyze it
to identify potential risks. One way of doing this is to look at systems
through the lens of an accident model: a theory about how accidents
happen in systems and the factors that lead to them <span
class="citation" data-cites="Marsden2017">[1]</span>. We will now look
at some common accident models. These impact system design and
operational decisions.</p>
<h2 id="swiss-cheese-model">4.5.1 Swiss Cheese Model</h2>
<p><strong>The Swiss cheese model helps us analyze defenses and identify
pathways to accidents <span class="citation"
data-cites="Reason1990">[2]</span>.</strong> The diagram in Figure 4.4 shows multiple slices of Swiss cheese,
each representing a particular defense feature in a system. The holes in
a slice represent the weaknesses in a defense feature—the ways in which
it could be bypassed. If there are any places where holes in all the
slices line up, creating a continuous hole through all of them, this
represents a possible route to an accident. This model highlights the
importance of defense in depth, since having more layers of defense
reduces the probability of there being a pathway to an accident that can
bypass them all.</p>
<figure id="fig:cheese">
<img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/swiss_cheese.png" class="tb-img-full"/>
<p class="tb-caption">Figure 4.4: Each layer of defense (safety culture, red teaming, etc.) is a layer of defense with its own
holes in the Swiss cheese model. With enough layers, we hope to avoid pathways that can bypass
them all.
</p>
<!--<figcaption>Swiss cheese defense model</figcaption>-->
</figure>
<p>Consider the example of an infectious disease as a hazard. There are
multiple possible defenses that reduce the risk of infection.
Preventative measures include avoiding large gatherings, wearing a mask
and regularly washing hands. Protective measures include maintaining a
healthy lifestyle to support a strong immune system, getting vaccinated,
and having access to healthcare. Each of these defenses can be
considered a slice of cheese in the diagram.<p>
However, none of these defenses are 100% effective. Even if an
individual avoids large gatherings, they could still become infected
while buying groceries. A mask might mostly block contact with the
pathogen, but some of it could still get through. Vaccination might not
protect those who are immunocompromised due to other conditions, and may
not be effective against all variants of the pathogen. These
imperfections are represented by the holes in the slices of cheese. From
this, we can infer various potential routes to an accident, such as an
immunocompromised person with a poorly fitting mask in a shopping mall,
or someone who has been vaccinated encountering a new variant at the
shops that can evade vaccine-induced immunity.</p>
<p><strong>We can improve safety by increasing the number of slices, or
by reducing the holes.</strong> Adding more layers of defense will
reduce the chances of holes lining up to create an unobstructed path
through all the defenses. For example, adopting more of the practices
outlined above would reduce an individual’s chances of infection more
than if they adopt just one.<p>
Similarly, reducing the size of a hole in any layer of defense will
reduce the probability that it will overlap with a hole in another
layer. For example, we could reduce the weaknesses in wearing a mask by
getting a better-fitting, more effective mask. Scientists might also
develop a vaccine that is effective against a wider range of variants,
thus reducing the weaknesses in vaccination as a layer of defense.</p>
<p><strong>We can think of layers of defense as giving us additional
nines of reliability.</strong> In many cases, it seems reasonable to
assume that adding a layer of defense helps reduce remaining risks by
approximately an order of magnitude by eliminating 90% of the risks
still present. Consider how adding the following three layers of defense
can give our AIs additional nines of reliability:<p>
</p>
<ol>
<li><p><em>Adversarial fine-tuning</em>: By fine-tuning our model, we
ensure that it rejects harmful requests. This works mostly reliably,
filtering out 90% of the harmful requests received.</p></li>
<li><p><em>Artificial conscience</em>: By giving our AI an artificial
conscience, it is less likely to take actions that result in low human
wellbeing in pursuit of its objective. However, 10% of the time, it may
take actions that are great for its objective and bad for human
wellbeing regardless.</p></li>
<li><p><em>AI watchdogs</em>: By monitoring deployed AIs to detect signs
of malfeasance, we catch AIs acting in ways contrary to how we want them
to act nine times out of ten.</p></li>
</ol>
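<p>As a rough illustration, the sketch below (a minimal example with hypothetical, assumed pass-through rates, not a real reliability calculation) shows how independent layers that each catch 90% of harmful cases combine to give roughly three nines of reliability.</p>
<pre><code class="language-python">import math

# A minimal sketch of the "nines of reliability" arithmetic, assuming
# (hypothetically) that each layer fails independently and lets through
# 10% of the harmful cases that reach it.

def residual_risk(pass_through_rates):
    """Probability that a harmful case slips past every layer, assuming independence."""
    risk = 1.0
    for rate in pass_through_rates:
        risk *= rate
    return risk

def nines_of_reliability(risk):
    """Express reliability as a count of 'nines' (e.g. a risk of 0.001 is three nines)."""
    return -math.log10(risk)

# Hypothetical layers: adversarial fine-tuning, artificial conscience, and
# AI watchdogs, each letting through roughly 10% of harmful cases.
layers = [0.1, 0.1, 0.1]
risk = residual_risk(layers)
print(f"residual risk: {risk:.4f}")                               # 0.0010
print(f"nines of reliability: {nines_of_reliability(risk):.1f}")  # ~3.0
</code></pre>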
<p><strong>Swiss cheese model for emergent capabilities.</strong> To
reduce the risk of unexpected emergent capabilities, multiple lines of
defense could be employed. For example, models could be gradually scaled
(e.g., using <span class="math inline">3×</span> more compute than the
previous training run, rather than a larger number such as <span
class="math inline">10×</span>); as a result, there will be fewer
emergent capabilities to manage. An additional layer of defense is
screening for hazardous capabilities, which could involve deploying
comprehensive test beds, and red teaming with behavioral tests and
representation reading. Another defense is staged releases; rather than
release the model to all users at once, gradually release it to more and
more users, and manage discovered capabilities as they emerge. Finally,
post-deployment monitoring through anomaly detection adds another layer
of defense.</p>
<p>Each of these aims at a largely different area, with the first focusing
on robustness, the second on control, and the third on monitoring. By
ensuring we have many defenses, we prevent a wider array of risks,
improving our system's reliability by many nines.<p>
</p>
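<p>For the gradual-scaling layer specifically, the sketch below (with hypothetical compute figures) illustrates why smaller scale-ups help: a 3&#215; policy reaches the same overall compute increase through many more intermediate runs than a 10&#215; policy, giving more opportunities to screen for newly emergent capabilities before scaling further.</p>
<pre><code class="language-python"># A minimal sketch comparing scaling policies, with hypothetical compute figures.
# Smaller per-run scale-ups mean more intermediate runs, and therefore more
# chances to screen for emergent capabilities before the next scale-up.

def runs_needed(overall_ratio, scale_factor):
    """Number of scale-ups needed to grow compute by overall_ratio,
    multiplying compute by scale_factor on each run."""
    runs = 0
    compute = 1.0
    while overall_ratio > compute:
        compute *= scale_factor
        runs += 1
    return runs

# Hypothetical goal: increase training compute 1000-fold overall.
for factor in (3, 10):
    print(f"{factor}x per run: {runs_needed(1000, factor)} training runs")
# 3x per run: 7 training runs
# 10x per run: 3 training runs
</code></pre>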
<h2 id="bow-tie-model">4.5.2 Bow Tie Model</h2>
<p><strong>The bow tie model splits defenses into preventative and
protective measures<span class="citation"
data-cites="Marsden2017">[1]</span>.</strong> In the diagram in Figure 4.5, the triangle on the left-hand side
contains features that are intended to prevent an accident from
happening, while the triangle on the right-hand side contains features
that are intended to mitigate damage if an accident does happen. The
point in the middle where the two triangles meet can be thought of as
the point where any given accident happens. This is an example of a bow
tie diagram, which can help us visualize the preventative and protective
measures in place against any potential adverse event.<p>
</p>
<figure id="fig:bowtie">
<img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/Bow-tie updated.png" class="tb-img-full"/>
<p class="tb-caption">Figure 4.5: The bow tie diagram can tie together hazards and their consequences with control and
recovery measures to mitigate the effects of an adverse event.</p>
<!--<figcaption>Bow tie diagram</figcaption>-->
</figure>
<p>For example, if an individual goes rock climbing, a potential
accident is falling. We can draw a bow tie for this situation, with the
center representing a fall. On the left, we note any measures the
individual could take to prevent a fall, for example using chalk on
their hands to increase friction. On the right, we note any protective
measures they could take to reduce harm from falling, such as having a
cushioned floor below.</p>
<p><strong>Bow tie analysis of proxy gaming.</strong> In the Single Agent Safety chapter, we
learned that one hazard of using AIs is that they might learn to “game”
the objectives we give them. If the specified objectives are only
proxies for what we actually want, then an AI might find a way of
optimizing them that is not beneficial overall, possibly due to
unintended harmful side effects.<p>
To analyze this hazard, we can draw a bow tie diagram, with the center
representing the event of an AI gaming its proxy goals in a
counterproductive way. On the left-hand side, we list preventative
measures, such as ensuring that we can control AI drives like
power-seeking. If the AI system has less power (for example fewer
resources), this would reduce the probability that it finds a way to
optimize its goal in a way that conflicts with our values (as well as
the severity of the impact if it does). On the right-hand side, we list
protective measures, such as improving anomaly detection tools that can
recognize any AI behavior that resembles proxy gaming. This would help
human operators to notice activity like this early and take corrective
action to limit the damage caused.<p>
The exact measures on either side of the bow tie depend on which event
we put at the center. We can make a system safer by individually
considering each hazard associated with it, and ensuring we implement
both preventative and protective measures against that hazard.</p>
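<p>To make this concrete, here is a minimal sketch of how the bow tie for the proxy-gaming hazard might be recorded as a simple data structure; the class and field names are illustrative, not a standard format.</p>
<pre><code class="language-python">from dataclasses import dataclass, field

# A minimal, illustrative record of a bow tie diagram. The top event sits at the
# center; preventative measures form the left side and protective measures the right.

@dataclass
class BowTie:
    top_event: str
    preventative: list[str] = field(default_factory=list)  # stop the event occurring
    protective: list[str] = field(default_factory=list)    # limit damage if it occurs

proxy_gaming = BowTie(
    top_event="AI games its proxy objective in a counterproductive way",
    preventative=[
        "Control AI drives such as power-seeking (e.g. limit resources and power)",
    ],
    protective=[
        "Anomaly detection that flags behavior resembling proxy gaming",
        "Human operators notice early and take corrective action to limit damage",
    ],
)

print(f"Top event: {proxy_gaming.top_event}")
print("Preventative (left side):", proxy_gaming.preventative)
print("Protective (right side):", proxy_gaming.protective)
</code></pre>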
<h2 id="fault-tree-analysis-method">4.5.3 Fault Tree Analysis Method</h2>
<p><strong>Fault tree analysis works backward to identify possible
causes of negative outcomes.</strong> Fault tree analysis (FTA) is a
top-down process that starts by considering specific negative outcomes
that could result from a system, and then works backward to identify
possible routes to those outcomes. In other words, FTA involves
“backchaining” from a possible accident to find its potential
causes.<p>
</p>
<figure id="wrap-fig:nasa-fta">
<img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/NASA_FTA.png" class="tb-img-full" style="width: 70%"/>
<p class="tb-caption">Figure 4.6: Using a fault tree, we can work backwards from a failure (no water flow) to its possible
causes (such as a blockage or lack of flow at source). <span class="citation"
data-cites="nasafta">[8]</span></p>
<!--<figcaption>NASA example fault tree</figcaption>-->
</figure>
<p>For each negative outcome, we work backward through the system,
identifying potential causes of that event. We can then draw a “fault
tree” showing all the possible pathways through the system to the
negative outcome. By studying this fault tree, we can identify ways of
improving the system that remove these pathways. In Figure 4.6, we trace backwards from a
pump failure to two types of failure: mechanical and electrical failure.
Each of these has further subtypes. For a fuse failure to occur, there must be
a circuit overload, which can happen as a result of a shorted wire
or a power surge. Hence, we know what sorts of hazards we might
need to think about.</p>
<p><strong>Example: Fire Hazard.</strong> We could also consider the
risk of a fire outbreak as a negative outcome. We then work backward,
thinking about the different requirements for this to happen—fuel,
oxygen, and sufficient heat energy. This differs from the water pump
failure since all of these are necessary rather than just one of them.
Working backward again, we can think about all the possible sources of
each of these requirements in the system. After completing this process,
we can identify multiple combinations of events that could lead to the
same negative outcome.</p>
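<p>The two examples differ in their gate logic: the pump fault tree combines causes with OR gates (any one cause suffices), while the fire tree uses an AND gate (all requirements must be present). The sketch below, using hypothetical and assumed-independent basic-event probabilities, shows how this changes the arithmetic.</p>
<pre><code class="language-python"># A minimal sketch of fault tree gate arithmetic, assuming independent basic
# events with hypothetical probabilities.

def or_gate(probabilities):
    """P(at least one input event occurs), assuming independence."""
    p_none = 1.0
    for p in probabilities:
        p_none *= (1.0 - p)
    return 1.0 - p_none

def and_gate(probabilities):
    """P(all input events occur), assuming independence."""
    p_all = 1.0
    for p in probabilities:
        p_all *= p
    return p_all

# Pump example: failure if the fuse fails OR a mechanical fault occurs.
p_circuit_overload = or_gate([0.01, 0.02])    # shorted wire OR power surge
p_fuse_fails = p_circuit_overload             # fuse failure requires a circuit overload
p_pump_fails = or_gate([p_fuse_fails, 0.03])  # electrical OR mechanical failure

# Fire example: fuel AND oxygen AND sufficient heat must all be present.
p_fire = and_gate([0.5, 0.9, 0.1])

print(f"P(pump failure) ~ {p_pump_fails:.4f}")  # ~0.0589
print(f"P(fire)         ~ {p_fire:.4f}")        # 0.0450
</code></pre>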
<figure id="fig:FTA-decision">
<img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/AI_harm.png" class="tb-img-full"/>
<p class="tb-caption">Figure 4.7: An FTA decision-tree can identify several potential problems by interrogating important
contextual questions. <span class="citation" data-cites="critch2023tasra">[3]</span></p>
<!--<figcaption>AI FTA decision-tree <span class="citation"-->
<!--data-cites="critch2023tasra">[3]</span></figcaption>-->
</figure>
<p><strong>FTA can be used to guide decisions.</strong> Figure 4.7 shows how we can use fault-tree
style reasoning to create concrete questions about risks. We can trace
this through to identify risks depending on the answers to these
questions. For instance, if there is no unified group accountable for
creating AIs, then we know that diffusion of responsibility is a risk.
If there is such a group, then we need to question their beliefs,
intentions, and incentives.<p>
By thinking more broadly about all the ways in which a specific accident
could be caused, whether by a single component failing or by a
combination of many smaller events, FTA can discover hazards that are
important to find. However, the backchaining process that FTA relies on
also has limitations, which we will discuss in the next section.</p>
<h2 id="limitations">4.5.4 Limitations</h2>
<p><strong>Chain-of-Events Reasoning.</strong> The Swiss cheese and bow
tie models and the FTA method can be useful for identifying hazards and
reducing risk in some systems. However, they share some limitations that
make them inapplicable to many of the complex systems that are built and
operated today. Within the field of safety engineering, these approaches
are largely considered overly simplistic and outdated. We will now
discuss the limitations of these approaches in more detail, before
moving on to describe more sophisticated accident models that may be
better suited to risk management in AI systems.</p>
<p><strong>Chain-of-events reasoning is sometimes too simplistic for
useful risk analysis.</strong> All of these models and techniques are
based on backchaining or linear “chain-of-events” reasoning. This way of
thinking assumes there is a neat line of successive events, each one
directly causing the next, that ultimately leads to the accident. The
goal is then to map out this line of events and trace it back to
identify the “root cause”—usually a component failure or human error—to
blame. However, given the numerous factors that are at play in many
systems, it does not usually make sense to single out just one event as
the cause of an accident. Moreover, this approach puts the emphasis
largely on the details of “what” specifically happened, while neglecting
the bigger question of “why” it happened. This worldview often ignores
broader systemic issues and can be overly simplistic. Rather than breaking
an accident down into a chain of discrete events, a complex systems perspective often
sees it as the product of complex interactions between many
factors.</p>
<h3 id="complex-and-sociotechnical-systems">Complex and Sociotechnical
Systems</h3>
<p>Component failure accident models are particularly inadequate for
analyzing complex systems and sociotechnical systems. We cannot always
assume direct, linear causality in complex and sociotechnical systems,
so the assumption of a linear “chain of events” breaks down.</p>
<p><strong>In complex systems, many components interact to produce
emergent properties.</strong> Complex systems are everywhere. A hive of
bees consists of individuals that work together to achieve a common
goal, a body comprises many organs that interact to form a single
organism, and large-scale weather patterns are produced by the
interactions of countless particles in the atmosphere. In all these
examples, we find collective properties that are not found in any of the
components but are produced by the interactions between them. In other
words, a complex system is “more than the sum of its parts.” As
discussed in the Complex Systems chapter, these systems exhibit emergent
features that cannot be usefully understood through a reductive analysis
of components.</p>
<p><strong>Sociotechnical systems involve interactions between humans
and technologies.</strong> For example, a car on the road is a
sociotechnical system, where a human driver and technological vehicle
work together to get from the starting point to the destination. At
progressively higher levels of complexity, vehicles interacting with one
another on a road also form a sociotechnical system, as does the entire
transport network. With the widespread prevalence of computers,
virtually all workplaces are now sociotechnical systems, and so is the
overarching economy.<p>
There are three main reasons why component failure accident models are
insufficient for analyzing complex and sociotechnical systems: accidents
sometimes happen without any individual component failing, accidents are
not always the result of linear causality, and direct causality is
sometimes less important than indirect, remote, or “diffuse” causes such
as safety culture. We will now look at each of these reasons in more
detail.</p>
<h3 id="accidents-without-failure">Accidents Without Failure</h3>
<p><strong>Sometimes accidents happen due to interactions, even if no
single component fails <span class="citation"
data-cites="Leveson2020Introduction">[4]</span>.</strong> The component
failure accident models generally consider each component individually,
but components often behave differently within a complex or
sociotechnical system than they do in isolation. For example, a human
operator who is normally diligent might be less so if they believe that
a piece of technology is taking care of one of their
responsibilities.<p>
Accidents can also arise through interactions between components, even
if every component functions exactly as it was intended to. Consider the
Mars Polar Lander, a spacecraft launched by NASA in 1999, which crashed
on the Martian surface later that year. It was equipped with reverse
thrusters to slow its descent to the surface, and sensors on its legs to
detect a signal generated by landing to turn the thrusters off. However,
the legs had been stowed away for most of the journey. When they
extended in preparation for landing, the sensors interpreted it as a
landing. This caused the software to switch the thrusters off before the
craft had landed, so it crashed on the surface <span class="citation"
data-cites="albee2000Report">[5]</span>.<p>
In this case, there was no component failure. Each component did what it
was designed and intended to do; the accident was caused by an
unanticipated interaction between components. This illustrates the
importance of looking at the bigger picture, and considering the system
as a whole, rather than just looking at each component in isolation, in
a reductionist way.</p>
<h3 id="nonlinear-causality">Nonlinear Causality</h3>
<p><strong>Sometimes, we cannot tease out a neat, linear chain of events
or a “root cause” <span class="citation"
data-cites="Leveson2020Introduction">[4]</span>.</strong> Complex and
sociotechnical systems usually involve a large number of interactions
and feedback loops. Due to the many interdependencies and circular
processes, it is not always feasible to trace an accident back to a
single “root cause.” The high degree of complexity involved in many
systems of work is illustrated in Figure 4.8, which
shows that the interconnectedness of a system cannot be accurately reduced to
a single line from start to finish.<p>
</p>
<figure id="feedback">
<img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/ML Training Data Feedback Loop.png" class="tb-img-full" style="width: 80%"/>
<p class="tb-caption">Figure 4.8: There are feedback loops in the creation and deployment of AI systems. For example, the curation of training data used
for developing an AI system exhibits feedback loops. <span class="citation"
data-cites="rismani2023beyond">[6]</p>
<!--<figcaption>Feedback Loops in AI Training</figcaption>-->
</figure>
<p><strong>AI systems can exhibit feedback loops and nonlinear
causality.</strong> Reinforcement learning systems involve complexity
and feedback loops. These systems gather information from the
environment to make decisions, which then impact the environment,
influencing their subsequent decisions. Consider an ML system that ranks
advertisements based on their relevance to search terms. The system
learns about relevance by tracking the number of clicks each ad receives
and adjusts its model accordingly.<p>
However, the number of clicks an ad receives depends not only on its
intrinsic relevance but also on its position in the list: higher ads
receive more clicks. If the system underestimates the effect of
position, it may continually place one ad at the top since it receives
many clicks, even if it has lower intrinsic relevance, leading to
failure. Conversely, if the system overestimates the effect of position,
the top ad will receive fewer clicks than it expects, and so it may
constantly shuffle ads, resulting in a random order rather than
relevance-based ranking, also leading to failure.<p>
Many complex and sociotechnical systems comprise a whole network of
interactions between components, including multiple feedback loops like
this one. When thinking about systems like these, it is difficult to
reduce any accidents to a simple chain of successive events that we can
trace back to a root cause.</p>
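<p>To make the feedback loop concrete, here is a minimal, deterministic sketch of the ad-ranking example with hypothetical numbers: because the system treats raw click rates as relevance, the ad it happens to place on top is overestimated and keeps the top slot, even though the other ad is intrinsically more relevant.</p>
<pre><code class="language-python"># A minimal sketch of the position-bias feedback loop, with hypothetical numbers.
# Expected click-through rate = true relevance x position boost of the slot shown.

true_relevance = {"ad_A": 0.20, "ad_B": 0.30}  # ad_B is genuinely more relevant
position_boost = [1.0, 0.5]                    # the lower slot gets half as many clicks

# The system's relevance estimates; ad_A starts slightly ahead (say, from noisy
# early data), so it is placed in the top slot first.
estimated = {"ad_A": 0.26, "ad_B": 0.25}

for step in range(5):
    # Rank ads by the current estimates: the highest estimate gets the top slot.
    ranking = sorted(estimated, key=estimated.get, reverse=True)
    for slot, ad in enumerate(ranking):
        observed_ctr = true_relevance[ad] * position_boost[slot]
        # The system ignores position effects and treats the observed click
        # rate as the ad's relevance, so the top ad is overestimated.
        estimated[ad] = observed_ctr
    print(step, ranking, {ad: round(v, 3) for ad, v in estimated.items()})

# ad_A keeps the top slot on every step (estimate 0.20 vs. 0.15), even though
# ad_B's true relevance is higher. Overcorrecting for position instead would
# produce the opposite failure: constant reshuffling of the ranking.
</code></pre>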
<h3 id="indirect-causality">Indirect Causality</h3>
<p><strong>Sometimes, it is more useful to focus on underlying
conditions than specific events <span class="citation"
data-cites="Leveson2020Introduction">[4]</span>.</strong> Even if we can
pinpoint specific events that directly led to an accident, it is
sometimes more fruitful to focus on broader systemic issues. For
example, rising global temperatures certainly increase the frequency and
severity of natural disasters, including hurricanes, wildfires, and
floods. Although it would be difficult to prove that any particular
emission of greenhouse gases directly caused any specific disaster, it
would be remiss of us to ignore the effects of climate change when
trying to evaluate the risk of future disasters. Similarly, a poor diet
and lack of exercise are likely to result in ill health, even though no
single instance of unhealthy behavior could be directly blamed for an
individual developing a disease.</p>
<p><strong>Example: spilling drinks.</strong> Consider a scenario where
an event planner is faced with the problem of people spilling drinks at
a crowded venue. One possible solution might be to contact the
individuals who dropped their drinks last time and ask them to be more
careful. While this approach aims to reduce the failure rate of
individual components (people), it fails to address the underlying issue
of high density that contributes to the problem.<p>
Alternatively, the event planner could consider two more effective
strategies. Firstly, they could move the event to a bigger venue with
more space. By modifying the system architecture, the planner would
provide attendees with more room to move around, reducing the likelihood
of people bumping into each other and spilling their drinks. This
approach addresses the diffuse variable of high density.<p>
Secondly, the event planner could limit the number of attendees,
ensuring that the venue does not become overcrowded. By reducing the
overall density of people, the planner would again mitigate the risk of
collisions and drink spills. This solution also targets the diffuse
variable of high density, acknowledging the systemic nature of the
problem.<p>
Compared with the first option, which solely focuses on the behavior of
individuals and would likely fail to eliminate spillages, the latter two
strategies recognize the importance of modifying the environment and
addressing the broader factors that contribute to the issue. By
prioritizing architectural adjustments or managing attendee numbers, the
event planner can more effectively prevent drink spills and create a
safer and more enjoyable experience for everyone.</p>
<p><strong>Sharp end versus blunt end.</strong> The part of a system
where specific events happen, such as people bumping into each other, is
sometimes referred to as the “sharp end” of the system. The higher-level
conditions of the system, such as the density of people in a venue, are
sometimes called the “blunt end.” As outlined in the example of people
spilling drinks, “sharp-end” interventions may not always be effective.
These are related to “proximate causes” and “distal causes,”
respectively.</p>
<p><strong>Diffuse causality suggests broader, “blunt-end”
interventions.</strong> In general, it might not be possible to prove
that “systemic factors” directly caused an accident. However, systemic
factors might “diffusely” affect the whole system in ways that increase
the probability of an accident, making them relevant for risk analysis.
Under conditions like this, even if the specific event that led to an
accident had not happened, something else might have been likely to
happen instead. This means it can be more effective to tackle problems
less directly, such as by changing system architecture or introducing
bottom-up interventions that affect systemic conditions, rather than by
attempting to control the sharp end.<p>
For example, consider autonomous weapons. Weaponizing any single AI system would not necessarily lead to war or
any other kind of catastrophe. However, the existence of these systems
increases the probability of a rogue AI system causing a disaster.
Minimizing the use of autonomous weapons might be a better way of
addressing this risk than attempting to introduce lots of safeguards to
prevent loss of control over a rogue system.</p>
<p><strong>Meadows’ twelve leverage points.</strong> One way of
identifying broad interventions that could yield significant results is
to consider Meadows’ twelve leverage points. These are twelve points
within a system, described by environmental scientist Donella Meadows,
where a small change can make a large difference <span class="citation"
data-cites="ref-pan2022effects">[8]</span>. Depending on the system
under consideration, some of these characteristics may be difficult or
impossible to change, for example large physical infrastructure or the
laws of physics. The points are therefore often listed in order of
increasing efficacy, taking into account their tractability.<p>
The lower end of the list of leverage points includes: parameters or
numbers that can be tweaked, such as taxes or the amount of resources
allocated to a particular purpose; the size of buffers in a system, such
as water reservoirs or a store’s reserve stock, where a larger buffer
makes a system more stable and a smaller buffer makes a system more
flexible; and the structure of material flows through a system, such as
the layout of a transport network or the way that people progress
through education and employment.<p>
The middle of the list of leverage points includes: lags between an
input to a system and the system’s response, which can cause the system
to oscillate through over- and undershooting; negative feedback loops
that can be strengthened to help a system self-balance; and positive
feedback loops that can be weakened to prevent runaway escalation at an
earlier stage.<p>
The next three leverage points in Meadows’ list are: the structure of
information flows in a system, which can increase accountability by
making the consequences of a decision more apparent to decision-makers,
for example by publishing information about companies’ emissions; the
rules of the system, such as national laws, or examination policies in
educational institutes, which can be changed in social systems to
influence behavior; and the ability to self-organize and adapt, which
can be promoted by maintaining diversity, such as biodiversity in an
ecosystem and openness to new ideas in a company or institution.<p>
Finally, leverage points toward the higher end of the list include: the
goal of the system, which, if changed, could completely redirect the
system’s activities; the paradigm or mindset that gave rise to the goal,
which, if adjusted, could transform the collective understanding of what
a system can and should be aiming for; and the realization that there
are multiple worldviews or paradigms besides an organization’s current
one, which can empower people to adopt a different paradigm when
appropriate.</p>
<p><strong>Summary.</strong> In complex and sociotechnical systems,
accidents cannot always be reduced to a linear chain of successive
events. The large number of complex interactions and feedback loops
means that accidents can happen even if no single component fails and
that it may not be possible to identify a root cause. Since we may not
be able to anticipate every potential pathway to an accident, it is
often more fruitful to address systemic factors that diffusely influence
risk. Meadows’ twelve leverage points can help us identify systemic
factors that, if changed, could have far-reaching, system-wide
effects.</p>
<br>
<br>
<h3>References</h3>
<div id="refs" class="references csl-bib-body" data-entry-spacing="0"
role="list">
<div id="ref-Marsden2017" class="csl-entry" role="listitem">
<div class="csl-left-margin">[1] E.
Marsden, <span>“Designing for safety: Inherent safety, designed
in.”</span> Accessed: Jul. 31, 2017. [Online]. Available: <a
href="https://risk-engineering.org/safe-design/">https://risk-engineering.org/safe-design/</a></div>
</div>
<div id="ref-Reason1990" class="csl-entry" role="listitem">
<div class="csl-left-margin">[2] J.
Reason, <span>“The contribution of latent human failures to the
breakdown of complex systems.”</span> <em>Philosophical transactions of
the Royal Society of London. Series B, Biological sciences</em>, vol.
327 1241, pp. 475–84, 1990, Available: <a
href="https://api.semanticscholar.org/CorpusID:1249973">https://api.semanticscholar.org/CorpusID:1249973</a></div>
</div>
<div id="ref-critch2023tasra" class="csl-entry" role="listitem">
<div class="csl-left-margin">[3] A.
Critch and S. Russell, <span>“TASRA: A taxonomy and analysis of
societal-scale risks from AI.”</span> 2023. Available: <a
href="https://arxiv.org/abs/2306.06924">https://arxiv.org/abs/2306.06924</a></div>
</div>
<div id="ref-Leveson2020Introduction" class="csl-entry" role="listitem">
<div class="csl-left-margin">[4] N.
G. Leveson, <span>“Introduction to STAMP: Part 1.”</span> MIT
Partnership for Systems Approaches to Safety and Security (PSASS),
2020.</div>
</div>
<div id="ref-albee2000Report" class="csl-entry" role="listitem">
<div class="csl-left-margin">[5] A.
Albee <em>et al.</em>, <span>“Report on the loss of the Mars Polar
Lander and Deep Space 2 missions,”</span> Jet Propulsion Laboratory,
California Institute of Technology, 2000.</div>
</div>
<div id="ref-rismani2023beyond" class="csl-entry" role="listitem">
<div class="csl-left-margin">[6] S. Rismani <em>et al.</em>, <span>“Beyond the ML Model: Applying Safety Engineering Frameworks to Text-to-Image Development”</span> Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 2023.</div>
</div>
<div id="ref-meadows1999leverage" class="csl-entry" role="listitem">
<div class="csl-left-margin">[7] D.
H. Meadows, <span>“Leverage points: Places to intervene in a
system,”</span> 1999.</div>
</div>
<div id="ref-nasafta" class="csl-entry" role="listitem">
<div class="csl-left-margin">[8] D.
William Vesely, Joanne Dugan, Joseph Fragola, Joseph Minarick III, Jan Railsback and Michael Stamatelatos, <span>“Fault Tree Handbook with Aerospace Fault Tree Handbook with Aerospace Applications Applications,”</span> 2002.</div>
Available: <a
href="http://www.mwftr.com/CS2/Fault%20Tree%20Handbook_NASA.pdf">link</a></div>
</div>
<div id="ref-pan2022effects" class="csl-entry" role="listitem">
<div class="csl-left-margin">[9] Alexander Pan, Kush Bhatia, and Jacob Steinhardt, <span>“The Effects of Reward Misspecification: Mapping
and Mitigating Misaligned Models,”</span> 2022.</div> arXiv: 2201.03544
</div>