<pre class='metadata'>
Title: Web Neural Network API
Shortname: webnn
Level: None
Status: w3c/ED
Group: webmlwg
TR: https://www.w3.org/TR/webnn/
URL: https://webmachinelearning.github.io/webnn/
Editor: Ningxin Hu 68202, Intel Corporation https://intel.com
Editor: Dwayne Robinson 140212, Microsoft Corporation https://microsoft.com
Former Editor: Chai Chaoweeraprasit 120203, Microsoft Corporation https://microsoft.com
Abstract: This document describes a dedicated low-level API for neural network inference hardware acceleration.
Repository: https://github.com/webmachinelearning/webnn
Test Suite: https://github.com/web-platform-tests/wpt/tree/master/webnn
Implementation Report: https://wpt.fyi/results/webnn?label=master&label=experimental&aligned&q=webnn
!Other: <a href="https://webmachinelearning.github.io/webnn-status/">Implementation Status</a>, <a href="https://github.com/webmachinelearning/webnn/blob/master/explainer.md">Explainer</a>, <a href="https://github.com/webmachinelearning/webnn-samples">Samples</a>
Markup Shorthands: markdown yes
Markup Shorthands: dfn yes
Markup Shorthands: idl yes
Markup Shorthands: css no
Logo: https://webmachinelearning.github.io/webmachinelearning-logo.png
Deadline: 2023-10-01
Assume Explicit For: yes
Status Text: <p>
Since the <a href="https://www.w3.org/TR/2023/CR-webnn-20230330/">initial Candidate Recommendation Snapshot</a> the Working Group has gathered further <a href="https://webmachinelearning.github.io/webnn-status/">implementation experience</a> and added new operations and data types needed for well-known <a href="https://github.com/webmachinelearning/webnn/issues/375">transformers to support generative AI use cases</a>. In addition, informed by this implementation experience, the group removed <code>MLCommandEncoder</code>, support for synchronous execution, and higher-level operations that can be expressed in terms of lower-level primitives in a performant manner. The group has also updated the specification to use modern authoring conventions to improve interoperability and precision of normative definitions.
The group is developing a new feature, a <a href="https://github.com/webmachinelearning/webnn/issues/482">backend-agnostic storage type</a>, to improve performance and interoperability between the WebNN and WebGPU APIs and purpose-built hardware for ML, and expects to republish this document as a Candidate Recommendation Snapshot when ready for implementation.
This document is maintained and
updated at any time. Some parts of this document are work in progress and
further improvements are expected to be reflected in revised Candidate
Recommendation Drafts and Snapshots.
</p>
<p>Before requesting transition to <a href="https://www.w3.org/standards/types#PR">Proposed Recommendation</a>, the Working Group will seek to demonstrate that:</p>
<ul>
<li>the API is implementable on top of existing APIs of major platforms, such as Android, Windows and macOS/iOS;</li>
<li>it has at least two independent, interoperable implementations of every feature defined in the specification, where interoperability can be verified by passing open test suites, and two or more implementations interoperating with each other;</li>
<li>it has an open test suite of every feature defined in the specification.</li>
</ul>
Text Macro: EMULATED generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
</pre>
<pre class="anchors">
urlPrefix: https://tc39.es/ecma262/; spec: ECMA-262
    type: dfn
        text: element size; url: table-the-typedarray-constructors
        text: element type; url: table-the-typedarray-constructors
        text: view constructor; url: table-the-typedarray-constructors
        text: equally close values; url: sec-ecmascript-language-types-number-type
</pre>
<pre class="link-defaults">
spec:html;
type:interface; text:Navigator
spec:webidl;
type:dfn; text:record
type:dfn; text:resolve
spec:ecmascript; for:ECMAScript;
type:dfn; text:realm
</pre>
<style>
/* Make <dl> blocks more distinct from their surroundings. */
main dl:not(.switch) {
border-left: thin solid #f3e48c;
padding-left: .5em;
}
/* <p> by default has these margins. Update ul/ol/dl to match,
* since they are also put in places where paragraphs go. */
p, ul, ol, dl {
margin: 1em 0;
}
/* Override rule making <code> smaller than surrounding text. */
dfn code {
font-size: 100%;
}
/* Style <details>, for clarity of presentation of these blocks. */
details {
padding: .5em;
border: thin solid #88e !important;
border-radius: .5em;
}
summary {
font-weight: bold;
margin: -0.5em -0.5em 0;
padding: 0.5em;
}
/* Algorithm declaration and steps. */
.algorithm > summary {
font-weight: normal;
}
.algorithm > ol {
position: relative;
}
.algorithm > ol::after {
content: "Algorithm";
font-weight: bold;
font-style: italic;
font-size: 130%;
color: rgba(0, 0, 0, 0.15);
color: var(--watermark-text);
position: absolute;
right: .3em;
bottom: -1em;
}
/* Internal slots */
div.internal-slots {
padding: .5em;
border: thin solid #88e !important;
border-radius: .5em;
}
.internal-slots {
position: relative;
}
.internal-slots::after {
font-weight: bold;
font-style: italic;
font-size: 130%;
color: rgba(0, 0, 0, 0.15);
color: var(--watermark-text);
position: absolute;
right: .3em;
bottom: .1em;
}
/*
* Ensure that argumentdef blocks don't overflow algorithm section borders. This is made far harder
* than it needs to be because the top-level W3C stylesheet has several @media + min-width variants
* that mark themselves as !important and then proceed to do the wrong thing.
*/
@media screen and (min-width: 78em) {
body:not(.toc-inline) .algorithm .overlarge {
margin-right: auto !important;
}
}
@media screen and (min-width: 90em) {
body:not(.toc-inline) .algorithm .overlarge {
margin-right: auto !important;
}
}
.algorithm .overlarge {
margin-right: auto !important;
}
/*
* The default algorithm style has a caption that doesn't suit this spec's
* formatting particularly well. Hide it.
*/
.algorithm .argumentdef {
margin-top: 0;
}
.algorithm .argumentdef>caption {
display: none;
}
/*
* Add vertical lines to demarcate multi-column cells.
*/
table.data td[colspan] {
border-left-style: dotted;
border-right-style: dotted;
}
table.data.no-colspan-center td[colspan],
table.data.no-colspan-center th[colspan] {
text-align: unset;
}
table.data tr.row-continuation td,
table.data tr.row-continuation th {
border-top: none;
}
/*
* Sticky table headers.
*/
.overlarge {
/* position: sticky doesn't work inside scrollable elements. */
overflow-x: unset;
}
thead.stickyheader th, th.stickyheader {
position: sticky;
top: 0;
background: #f8f8f8;
background: var(--stickyheader-background);
}
/*
* Generic table format.
*/
th {
text-align: left;
}
th, td {
border-bottom: 1px solid black;
border-collapse: collapse; /* BUG: This property only applies to TABLE. */
padding-left: 5px;
padding-right: 5px;
}
/*
* Grid table format.
*/
table.grid {
border-collapse: collapse;
table-layout: fixed;
width: 100%;
}
table.grid td, table.grid th {
border: 1px solid black;
}
table.grid th {
text-align: center;
}
/* For a table header cell that is split diagonally */
th.split {
background-image: url("data:image/svg+xml;utf8,<svg xmlns='http://www.w3.org/2000/svg' version='1.1' preserveAspectRatio='none' viewBox='0 0 1 1'><line x1='0' y1='0' x2='1' y2='1' stroke='black' vector-effect='non-scaling-stroke'/></svg>");
background-repeat: no-repeat;
background-size: 100% 100%;
position: relative;
}
th.split .bottom-left {
display: block;
position: absolute;
bottom: 1px;
left: 5px;
}
th.split .top-right {
display: block;
position: absolute;
top: 1px;
right: 5px;
}
/*
* Darkmode colors
*/
:root {
--watermark-text: rgba(0, 0, 0, 15%);
--stickyheader-background: #f8f8f8;
--tint-red: rgba(255, 0, 0, 6%);
--tint-green: rgba(0, 255, 0, 10%);
--tint-blue: rgba(0, 0, 255, 5%);
--tint-purple: rgba(255, 0, 255, 5%);
}
@media (prefers-color-scheme:dark) {
:root {
--watermark-text: rgba(255, 255, 255, 25%);
--stickyheader-background: #181818;
--tint-red: rgba(255, 0, 0, 20%);
--tint-green: rgba(0, 255, 0, 18%);
--tint-blue: rgba(0, 130, 255, 24%);
--tint-purple: rgba(255, 0, 255, 22%);
}
}
/* Floating button for collapse/expand all details elements */
.collapse-expand-button {
position: fixed;
bottom: 40px;
right: 40px;
width: 40px;
height: 40px;
border: none;
border-radius: 50%;
background-color: green;
color: ghostwhite;
font-size: 32px;
text-align: center;
align-items:center;
justify-content:center;
cursor: pointer;
}
.collapse-expand-button:hover {
background-color: green;
}
.collapse-expand-button.expand {
background-color: red;
}
.collapse-expand-button.expand::before {
content: "+";
}
.collapse-expand-button.collapse {
background-color: green;
}
.collapse-expand-button.collapse::before {
content: "-";
}
.collapse-expand-button .tooltiptext {
visibility: hidden;
bottom: 20px;
right: 20px;
width: 120px;
background-color: ghostwhite;
color: black;
font-size: 18px;
text-align: center;
align-items:center;
justify-content:center;
padding: 5px 0;
border-radius: 5px;
/* position */
position: absolute;
z-index: 1;
bottom: 100%;
left: 50%;
margin-left: -60px;
/* Use half of the width (120/2 = 60), to center the tooltip */
}
.collapse-expand-button:hover .tooltiptext {
visibility: visible;
opacity: 0.75;
}
/* end of floating collapse/expand button */
</style>
<button class="collapse-expand-button" onclick="toggleCE()">
<span class="tooltiptext">Collapse all</span>
</button>
<script>
var ceButton = document.querySelector(".collapse-expand-button");
ceButton.classList.add("collapse"); // All details are expanded by default.
var scrollY = window.scrollY;
window.addEventListener('scroll', function() {
scrollY = window.scrollY;
ceButton.style.top = scrollY + window.innerHeight - 60 + 'px';
});
function toggleCE() {
var button = document.querySelector(".collapse-expand-button");
var tip = document.querySelector(".tooltiptext");
var allDetails = document.querySelectorAll(':not(.head) > details');
Array.from(allDetails).forEach(function(detail, index) {
if (button.classList.contains("expand")) {
detail.open = true;
} else {
detail.removeAttribute('open');
}
});
if (button.classList.contains("expand")) {
button.classList.remove("expand");
button.classList.add("collapse");
tip.innerHTML = "Collapse all";
} else {
button.classList.remove("collapse");
button.classList.add("expand");
tip.innerHTML = "Expand all";
}
}
// Ensure clicks on active parts of a definition don't toggle details.
document.addEventListener('DOMContentLoaded', function() {
var targets = document.querySelectorAll('summary dfn,summary var');
Array.from(targets).forEach(function(target) {
target.addEventListener('click', function(e) {
e.preventDefault();
});
});
});
</script>
Introduction {#intro}
=====================
The Web Neural Network API defines a web-friendly hardware-agnostic abstraction layer that makes use of Machine Learning capabilities of operating systems and underlying hardware platforms without being tied to platform-specific capabilities. The abstraction layer addresses the requirements of key Machine Learning JavaScript frameworks and also allows web developers familiar with the ML domain to write custom code without the help of libraries.
For an illustrated introduction, please see the <a href="https://github.com/webmachinelearning/webnn/blob/master/explainer.md">explainer</a>.
Use cases {#usecases}
=====================
## Application Use Cases ## {#usecases-application}
This section illustrates application-level use cases for neural network
inference hardware acceleration. All applications in those use cases can be
built on top of pre-trained deep neural network (DNN) [[models]].
Note: Please be aware that some of the use cases described here are, by their very nature, privacy-invasive. Developers who are planning to use the API for such use cases should ensure that the API is being used to benefit users, for purposes that users understand and approve. They should apply the Ethical Principles for Web Machine Learning [[webmachinelearning-ethics]] and implement appropriate privacy risk mitigations such as transparency, data minimisation, and user controls.
### Person Detection ### {#usecase-person-detection}
A user opens a web-based video conferencing application, but temporarily
leaves her room. The application monitors whether she is in front of her
PC by using object detection (for example, approaches
such as [[SSD]] or [[YOLO]] that use a single DNN) to detect regions in a camera
input frame that include persons.
When she comes back, the application automatically detects her and notifies
other online users that she is active now.
### Semantic Segmentation ### {#usecase-segmentation}
A user joins a teleconference via a web-based video conferencing application at
her desk since no meeting room in her office is available. During the
teleconference, she does not want her room and the people in the background to be
visible. To protect the privacy of the other people and the surroundings, the
application runs a machine learning model such as [[DeepLabv3+]], [[MaskR-CNN]]
or [[SegAny]] to semantically split an image into segments and replaces
segments that represent other people and background with another picture.
### Skeleton Detection ### {#usecase-skeleton-detection}
A web-based video conferencing application tracks the pose of a user's skeleton by
running a machine learning model that allows for real-time human pose
estimation, such as [[PoseNet]], to recognize her gestures and body language. When
she raises her hand, her microphone is automatically unmuted and she can start
speaking on the teleconference.
### Face Recognition ### {#usecase-face-recognition}
There are multiple people in the conference room and they join an online meeting
using a web-based video conferencing application. The application detects faces
of participants by using object detection (for example, using object detection
approaches such as [[SSD]]) and checks whether each face was present at the
previous meeting by running a machine learning model such as [[FaceNet]],
which verifies whether two faces are identical.
### Facial Landmark Detection ### {#usecase-facial-landmarks}
A user wants to find new glasses that fit her beautifully on an online glasses
store. The online store offers a web-based try-on simulator that runs a machine
learning model such as Face Alignment Network [[FAN]] to detect facial landmarks
like eyes, nose, mouth, etc. When she chooses a pair of glasses, the simulator
properly renders the selected glasses at the detected position of the eyes on her
facial image.
### Style Transfer ### {#usecase-style-transfer}
A user is looking for cosmetics on an online store and wondering which color may
suit her face. The online store shows sample facial makeup images of cosmetics,
and offers a makeup simulator that runs a machine learning model like
[[ContextualLoss]] or [[PairedCycleGAN]] to transfer the makeup style of the
sample makeup image to her facial image. She can use the simulator to check how
the selected makeup looks on her face.
### Super Resolution ### {#usecase-super-resolution}
A web-based video conferencing application is receiving a video stream from its peer, but
the resolution of the video becomes lower due to network congestion. To prevent
degradation of the perceived video quality, the application runs a machine
learning model for super-resolution such as [[SRGAN]] to generate
higher-resolution video frames.
### Image Captioning ### {#usecase-image-captioning}
For better accessibility, a web-based presentation application provides
automatic image captioning by running a machine learning model such as
[[im2txt]], which predicts explanatory words for the presentation slides.
### Text-to-image ### {#usecase-text-to-image}
Images are a core part of modern web experiences. An ability to generate images
based on text input in a privacy-preserving manner enables visual
personalization and adaptation of web applications and content. For example, a web
application can use as an input a natural language description on the web page
or a description provided by the user within a text prompt to produce an
image matching the text description. This text-to-image use case, enabled by the
latent diffusion model architecture [[LDM]], forms the basis for additional
text-to-image use cases. Examples include inpainting, where a portion of an existing
image on the web page is selectively modified using the newly generated content,
and the converse, outpainting, where an original image is extended beyond its
original dimensions, filling the empty space with generated content.
### Machine Translation ### {#usecase-translation}
Multiple people from various countries are talking via a web-based real-time
text chat application. The application translates their conversation by using a
machine learning model such as [[GNMT]] or [[OpenNMT]], which translates each
message into a different language.
### Emotion Analysis ### {#usecase-emotion-analysis}
A user is talking to her friend via a web-based real-time text chat application,
and she is wondering how the friend feels because she cannot see the friend's
face. The application analyses the friend's emotion by using a machine learning
model such as [[DeepMoji]], which infers emotion from input texts, and displays
an emoji that represents the estimated emotion.
### Video Summarization ### {#usecase-video-summalization}
A web-based video conferencing application records received video streams, and
it needs to reduce the amount of recorded video data to be stored. The application generates
a short version of the recorded video by using a machine learning model for
video summarization such as [[Video-Summarization-with-LSTM]].
### Noise Suppression ### {#usecase-noise-suppression}
A web-based video conferencing application records received audio streams, but
background noise is usually present. The application leverages real-time
noise suppression using a recurrent neural network such as [[RNNoise]] to
suppress dynamic background noise, like a baby crying or a dog barking, and improve
the audio experience in video conferences.
### Speech Recognition ### {#usecase-speech-recognition}
Speech recognition, also known as speech to text, enables recognition and
translation of spoken language into text. Example applications of speech
recognition include transcription, automatic translation, multimodal interaction,
real-time captioning and virtual assistants. Speech recognition improves
accessibility of auditory content and makes it possible to interact with such
content in a privacy-preserving manner in a textual form. Examples of common
use cases include watching videos or participating in online meetings using
real-time captioning. Models such as [[Whisper]] approach humans in their accuracy
and robustness and are well positioned to improve accessibility of such use cases.
### Text Generation ### {#usecase-text-generation}
Various text generation use cases are enabled by large language models (LLMs) that
are able to perform tasks where a general ability to predict the next item
in a text sequence is required. This class of models can translate texts, answer
questions based on a text input, summarize a larger body of text, or generate
text output based on a textual input. LLMs enable better performance compared to
older models based on RNN, CNN, or LSTM architectures and further improve the
performance of many other use cases discussed in this section.
Examples of LLMs include [[t5-small]], [[m2m100_418M]], [[gpt2]], and [[llama-2-7b]].
### Detecting fake video ### {#usecase-detecting-fake-video}
A user is exposed to realistic fake videos generated by ‘deepfake’ techniques on the web.
A fake video can swap the speaker’s face with the president’s face to incite
a user politically or to manipulate the user’s opinion. Deepfake detection
applications such as [[FaceForensics++]] analyze the videos and protect a user against
fake videos or images. When she watches a fake video on the web, the
detection application alerts her to the fraudulent video in real time.
## Framework Use Cases ## {#usecases-framework}
This section collects framework-level use cases for a dedicated low-level API
for neural network inference hardware acceleration. It is expected that Machine
Learning frameworks will be key consumers of the Web Neural Network API (WebNN
API) and the low-level details exposed through the WebNN API are abstracted out
from typical web developers. However, it is also expected that web developers
with specific interest and competence in Machine Learning will want to interface
with the WebNN API directly instead of a higher-level ML framework.
### Custom Layer ### {#usecase-custom-layer}
A web application developer wants to run a DNN model on the WebNN API. However,
she has found that some activation functions like [[LeakyReLU]], [[ELU]],
etc. are not included in the WebNN API. To address this issue, she constructs
custom layers of the additional activation functions on top of the WebNN API.
Note that the scope of custom layers may include convolution, normalization,
etc. as well as activation.
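As a rough illustration of this idea, the sketch below composes a leaky ReLU from element-wise `max`, `min`, `mul` and `add` helpers. The helpers operate on plain `Float32Array` values and are hypothetical stand-ins for primitive operations; they are not WebNN API calls, and the helper names are assumptions made for this example only.

```javascript
// Hypothetical element-wise primitives over Float32Array (not WebNN API calls).
const map2 = (a, b, f) => a.map((v, i) => f(v, b[i]));
const fill = (n, value) => new Float32Array(n).fill(value);
const max = (a, b) => map2(a, b, Math.max);
const min = (a, b) => map2(a, b, Math.min);
const mul = (a, b) => map2(a, b, (x, y) => x * y);
const add = (a, b) => map2(a, b, (x, y) => x + y);

// Custom layer built only from the primitives above:
// leakyRelu(x) = max(x, 0) + alpha * min(x, 0)
function leakyRelu(x, alpha = 0.01) {
  const zeros = fill(x.length, 0);
  const positive = max(x, zeros);
  const negative = mul(fill(x.length, alpha), min(x, zeros));
  return add(positive, negative);
}
```

For instance, `leakyRelu(Float32Array.of(-2, 0, 3), 0.5)` yields the values -1, 0 and 3.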
### Network Concatenation ### {#usecase-network-concat}
A web application uses a DNN model whose model data for the upper convolutional
layers and the lower fully-connected layers are stored in separate files, since
the model data of the fully-connected layers are periodically updated due to fine
tuning on the server side.
Therefore, the application first downloads both partial model files and
concatenates them into a single model. When the model is updated, the
application downloads the fine-tuned part of the model and replaces only the
fully-connected layers with it.
### Performance Adaptation ### {#usecase-perf-adapt}
A web application developer has a concern about performance of her DNN model on
mobile devices. She has confirmed that it may run too slowly on mobile devices
which do not have GPU acceleration. To address this issue, her web application
uses the WebNN API to confirm whether acceleration is available or not, so
that the application can display a warning on devices without acceleration.
After several weeks, she has developed a tiny DNN model that can even run on
CPU. In order to accommodate CPU execution, she modifies the application
so that the application loads the tiny model in the case of CPU-only devices.
### Operation Level Execution ### {#usecase-op-level-exec}
A JavaScript ML framework is responsible for loading, interpreting and executing an ML model. During the model execution phase, the framework iterates through the operations of the model and executes each operation on the hardware device, like CPU, GPU or ML accelerator. To avoid unnecessary data copying across devices, the framework selects the same device to execute the operations. For a compute-intensive operation, such as convolution 2D or matrix multiplication, the framework uses the WebNN API to execute it with the ML-specific acceleration available on that selected device.
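The execution loop described in this use case might look roughly like the following sketch. The `kernels` table, the operation record fields (`type`, `inputNames`, `outputName`) and `runModel` are all hypothetical names chosen for illustration; the kernels are plain JavaScript stand-ins for where a framework could call into an accelerated backend.

```javascript
// Hypothetical per-device kernel table. A real framework could route
// compute-intensive kernels such as matmul to an accelerated backend.
const kernels = {
  cpu: {
    // Matrix multiplication over nested arrays: (m x k) * (k x n).
    matmul: ([a, b]) =>
      a.map(row => b[0].map((_, j) => row.reduce((s, v, k) => s + v * b[k][j], 0))),
    // Element-wise ReLU over a nested array.
    relu: ([x]) => x.map(row => row.map(v => Math.max(v, 0))),
  },
};

// Execute each operation in order on a single device, so intermediate
// results never have to be copied between devices.
function runModel(operations, inputs, device = 'cpu') {
  const values = new Map(Object.entries(inputs));
  for (const { type, inputNames, outputName } of operations) {
    const kernel = kernels[device][type];
    if (!kernel) throw new Error(`No ${type} kernel for device ${device}`);
    values.set(outputName, kernel(inputNames.map(name => values.get(name))));
  }
  return values;
}
```

Keeping all intermediate values in one `Map` tied to one device reflects the point above about avoiding cross-device copies.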
### Integration with real-time video processing ### {#usecase-real-time-video-processing}
The user experience of WebRTC-based video conferencing is enhanced using real-time video processing. For example, background blur implemented using a [[#usecase-segmentation]] model blurs the background in the user's live camera feed. To satisfy the performance requirements of this use case, the WebNN API integrates with primitives from other Web APIs that make up the media pipeline to allow WebNN API-based transformation of real-time video streams.
Security Considerations {#security}
===================================
This specification defines a low-level API for neural network inference hardware acceleration. This API is considered a powerful feature [[POWERFUL-FEATURES]] because it grants low-level access to a user's computer. To meet the authentication and confidentiality expectations of a powerful feature and to prevent man-in-the-middle attacks, all interfaces defined by this specification are only available in a secure context.
This API is disabled by default in all cross-origin frames using the [[#permissions-policy-integration]]. This prevents third-party content from using this API unless the embedding page explicitly sets a policy that grants permission.
This API allows creation of an {{MLContext}} from a {{GPUDevice}} defined by the WebGPU specification. See <a href="https://gpuweb.github.io/gpuweb/#security-considerations">WebGPU Security Considerations</a> for more information regarding security characteristics of this context.
This API provides an abstraction across GPU, CPU, and dedicated ML accelerator hardware. When using a GPU, <a href="https://www.w3.org/TR/webgpu/#security-dos">denial of service</a> considerations similar to WebGPU apply. When using a CPU or a dedicated ML accelerator, the types of potential resource contention are different and mitigations will be implementation and configuration dependent. Implementations should use whatever mechanisms are available from the platform to prevent sites from using an unfair amount of system resources. These compute units are shared resources, and the use of any compute API will affect overall performance on a fully-loaded system.
Once the graph is fully constructed and compiled, the input shapes of each operation in the graph are inferred and finalized. Bounds checking occurs when the compute method is invoked to execute the graph against the actual data; no actual data is bound to the compiled graph before this stage. It is the implementation's responsibility to make sure proper bounds checking occurs against the shapes of the data already inferred by that time.
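A minimal sketch of such a check follows, assuming each input has a finalized shape and the bound data arrives as a typed array. The names `validateBindings`, `descriptors` and `bindings` are hypothetical and do not correspond to anything defined in this specification.

```javascript
// Number of elements implied by a shape, e.g. [2, 3] -> 6.
const elementCount = shape => shape.reduce((n, d) => n * d, 1);

// Hypothetical check performed when data is bound at execution time.
// `descriptors` maps each input name to an object holding its finalized shape.
function validateBindings(descriptors, bindings) {
  for (const [name, { shape }] of Object.entries(descriptors)) {
    const data = bindings[name];
    if (!ArrayBuffer.isView(data)) {
      throw new TypeError(`Missing or invalid buffer for input "${name}"`);
    }
    if (data.length !== elementCount(shape)) {
      throw new RangeError(
        `Input "${name}" has ${data.length} elements, expected ${elementCount(shape)}`);
    }
  }
}
```

Rejecting a mismatched buffer before any kernel runs keeps later stages from reading or writing past the inferred bounds.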
Issue: Document operations susceptible to out-of-bounds access as guidance to implementers.
Implementations must defend against control-flow attacks based on changes to data considered to be constant. For example, optimizations in the underlying platform may assume that a weight remains unchanged throughout a computation. If the API allowed the contents of buffers holding weights to change during a computation then those optimization assumptions would be invalidated, causing undefined behavior in the underlying platform. The API mitigates this category of attacks from script by always copying or transferring buffers, but implementations should consider additional defenses such as process isolation of data assumed to be constant.
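The copying mitigation can be pictured with the following sketch, in which a constant snapshots the caller's data when it is created. `createConstant` is a hypothetical helper used only for illustration and is not part of the API.

```javascript
// Hypothetical constant holder: snapshots the caller's typed array on
// creation so that later writes by script cannot change what a computation
// observes.
function createConstant(source) {
  const snapshot = source.slice(); // copies into a new, independent buffer
  return {
    length: snapshot.length,
    read: index => snapshot[index],
  };
}
```

After `createConstant(weights)` returns, writing to `weights` has no effect on the values the constant reports.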
As a future-proofing measure, the API design allows certain operations that can be generically emulated to be deprecated for security, performance, or other reasons without breaking compatibility. This is made possible by high-level functions that are defined in terms of smaller primitive operations defined in this specification. This enables a native implementation of a high-level function to be replaced with a polyfill implementation.
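The following sketch illustrates the pattern with softmax, assuming a hypothetical `backend` object that exposes primitives over plain arrays. None of these names are WebNN API calls; the point is only that a high-level function can fall back to a decomposition when no native version is offered.

```javascript
// Hypothetical primitives on plain arrays (not WebNN API calls).
const primitives = {
  exp: x => x.map(v => Math.exp(v)),
  reduceSum: x => x.reduce((s, v) => s + v, 0),
  reduceMax: x => Math.max(...x),
  sub: (x, s) => x.map(v => v - s),
  div: (x, s) => x.map(v => v / s),
};

// Decomposition of softmax into the primitives above. Subtracting the
// maximum first keeps exp() from overflowing for large inputs.
function emulatedSoftmax(p, x) {
  const e = p.exp(p.sub(x, p.reduceMax(x)));
  return p.div(e, p.reduceSum(e));
}

// Use a native softmax when the backend offers one, else the decomposition.
function softmax(backend, x) {
  return typeof backend.softmax === 'function'
    ? backend.softmax(x)
    : emulatedSoftmax(backend, x);
}
```

Because callers only see `softmax`, a native version could be withdrawn later and the decomposition would take over without changing calling code.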
Issue: Investigate side channel attack feasibility considering the current state where CPU is shared between processes running renderers.
In order to not allow an attacker to target a specific implementation that may contain a flaw, the [[#programming-model-device-selection]] mechanism is a hint only, and the concrete device selection is left to the implementation - a user agent could for instance choose never to run a model on a device with known vulnerabilities. As a further mitigation, no device enumeration mechanism is defined.
Issue: Hinting partially mitigates the concern. Investigate additional mitigations.
The API design minimizes the attack surface for the compiled computational graph. The {{MLGraphBuilder}} interface that hosts the various operations is a data definition API and as such doesn't execute anything; it only constructs data. It follows that the potential for an attack is limited to the point where data is bound to the graph before executing it by invoking the {{MLContext}}.{{MLContext/compute()}} method. This enables implementers to focus on hardening the {{MLContext}}.{{MLContext/compute()}} method, for example by making sure it honors the boundary of data and fails appropriately when the bounds are not respected.
Purpose-built Web APIs for measuring high-resolution time mitigate against timing attacks using techniques such as resolution reduction, adding jitter, detection of abuse and API call throttling [[hr-time-3]]. The practical deployment of WebNN implementations are likely to bring enough jitter to make timing attacks impractical (e.g. because they would use IPC) but implementers are advised to consider and test their implementations against timing attacks.
## Guidelines for new operations ## {#security-new-ops}
To ensure operations defined in this specification are shaped in a way they can be implemented securely, this section includes guidelines on how operations are expected to be defined to reduce potential for implementation problems. These guidelines are expected to evolve over time to align with industry best practices:
- Prefer simplicity of arguments
- Don't use parsers for complex data formats
- If an operation can be decomposed to low level primitives:
    - Add an informative emulation path
    - Prefer primitives over new high level operations but consider performance consequences
- Operations should follow a consistent style for inputs and attributes
    - Operation families such as pooling and reduction should share API shape and options
- Formalize failure cases into test cases whenever possible
- When in doubt, leave it out: the API surface should be as small as possible while still satisfying the use cases
- Try to keep the API free of implementation details that might inhibit future evolution, do not overspecify
- Fail fast: the sooner the web developer is informed of an issue, the better
In general, always consider the security and privacy implications as documented in [[security-privacy-questionnaire]] by the Technical Architecture Group and the Privacy Interest Group when adding new features.
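The guideline on decomposition can be illustrated with a small numeric model. The sketch below is not WebNN code: the helper names (`add`, `clamp`, `mul`, `div`, `hardSwishEmulated`) are hypothetical and operate on plain `Float32Array` values, and hard-swish is used only as a familiar example of a high-level operation that can be expressed through element-wise primitives.

```javascript
// Hypothetical element-wise primitives over Float32Array (illustration only).
const mapElements = (x, f) => Float32Array.from(x, f);
const add = (x, c) => mapElements(x, (v) => v + c);
const mul = (a, b) => mapElements(a, (v, i) => v * b[i]);
const div = (x, c) => mapElements(x, (v) => v / c);
const clamp = (x, lo, hi) => mapElements(x, (v) => Math.min(hi, Math.max(lo, v)));

// A high-level operation expressed purely through the primitives above:
// hardSwish(x) = x * clamp(x + 3, 0, 6) / 6
function hardSwishEmulated(x) {
  return div(mul(x, clamp(add(x, 3), 0, 6)), 6);
}

console.log(hardSwishEmulated(new Float32Array([-4, 0, 3, 6])));
```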
Privacy Considerations {#privacy}
===================================
This API enhances privacy compared to cloud-based inference, since input data such as locally sourced images or video streams stay within the browser's sandbox.
This API exposes the minimum amount of information necessary to address the identified [[#usecases]] for the best performance and reliability of results.
No information from the underlying platform is exposed directly. An execution time analysis may reveal indirectly the performance of the underlying platform's neural network hardware acceleration capabilities relative to another underlying platform.
Note: The group is <a href="https://github.com/webmachinelearning/webnn/issues/85">soliciting further input</a> on the proposed execution time analysis fingerprinting vector and will augment this section with more information and mitigations to inform the implementers of this API.
Unlike WebGPU, this API does not intrinsically support custom shader authoring and, as a result, is not prone to timing attacks that rely on shader caches or other persistent data. The API builds upon pre-existing shaders and lower level primitives of the browser or the underlying OS. Web developers who interface with {{GPUDevice}} are expected to be aware of <a href="https://gpuweb.github.io/gpuweb/#privacy-user-agent-state">WebGPU compilation cache considerations</a>.
The WebGPU API identifies <a href="https://gpuweb.github.io/gpuweb/#privacy-machine-artifacts">machine-specific artifacts</a> as a privacy consideration. Similarly, the WebNN API's compute unit scheduling may under certain circumstances introduce a fingerprint. However, similarly to WebGPU, such fingerprints are identical across most or all of the devices of each vendor, mitigating the concern. Furthermore, software implementations can be used to further eliminate such artifacts.
The WebNN API defines two developer-settable preferences to help inform [[#programming-model-device-selection]] and allow the implementation to better select the most appropriate underlying execution device for the workload. An {{MLDeviceType}} normatively indicates the kind of device and is one of: {{MLDeviceType/"cpu"}}, {{MLDeviceType/"gpu"}}, {{MLDeviceType/"npu"}}. If this type cannot be satisfied, the promise returned by {{ML/createContext()}} is rejected with a "{{NotSupportedError}}" {{DOMException}}; thus this type can in some cases add two bits of entropy to the fingerprint. An {{MLPowerPreference}} indicates a preference related to power consumption, is considered a hint only, and as such does not increase the entropy of the fingerprint.
Issue(623): {{MLContextOptions}} is under active development, and the design is expected to change, informed by further implementation experience and new use cases from the wider web community.
If a future version of this specification introduces support for a new {{MLDeviceType}} that can only support a subset of {{MLOperandDataType}}s, that may introduce a new fingerprint.
In general, implementers of this API are expected to apply <a href="https://gpuweb.github.io/gpuweb/#privacy-considerations">WebGPU Privacy Considerations</a> to their implementations where applicable.
Ethical Considerations {#ethics}
===================================
The Working Group has started documenting ethical issues associated with using Machine Learning on the Web, to help identify what mitigations its normative specifications should take into account. The Working Group publishes and maintains an Ethical Principles for Web Machine Learning document [[webmachinelearning-ethics]] open to contributions from the wider community via a dedicated <a href="https://github.com/webmachinelearning/webmachinelearning-ethics">GitHub repository</a>.
# Programming Model # {#programming-model}
## Overview ## {#programming-model-overview}
At the heart of neural networks is a <dfn>computational graph</dfn> of mathematical operations.
These operations are the building blocks of modern machine learning technologies in
computer vision, natural language processing, and robotics.
The WebNN API is a specification for constructing, compiling, and executing computational
graphs of neural networks.
The {{MLGraph}} interface represents a compiled computational graph that is immutable (that is, a model).
The {{MLGraphBuilder}} interface serves as a builder (factory) to construct a [=computational graph=] (its <dfn for=MLGraphBuilder>graph</dfn>) that is then compiled to create an {{MLGraph}}.
In WebNN, a [=computational graph=] is composed of <dfn>operators</dfn> which act on data, and are the nodes of the graph. {{MLOperand}}s are a representation of data that flows within the computational graph, and are the edges of the graph. {{MLOperand}}s include a [=computational graph=]'s <dfn for="computational graph">input</dfn> values for inference, <dfn for="computational graph">constants</dfn> (including trained weights) used for inference, intermediate values (often referred to as activations) computed during inference, as well as the output values of inference. An [=operator=]'s <dfn for=operator>input</dfn> is one or more {{MLOperand}}s. An [=operator=]'s <dfn for=operator>output</dfn> is one or more {{MLOperand}}s. [=Operators=] have operator-specific parameters that control their behavior, which can include zero or more <dfn for=operator lt="activation|activation function">activation functions</dfn>.
A key part of the {{MLGraphBuilder}} interface is its set of methods, such as {{MLGraphBuilder/gemm()}} and {{MLGraphBuilder/relu()}}, each of which creates an [=operator=] that represents the actual operation to perform on the input data when the computation is run, and returns a new {{MLOperand}} holding the operator. Methods that create an {{MLOperand}} connect any [=operator/inputs=] and [=operator/activations=] to the operator. Each method invocation returns a distinct new value, without changing the value of any other {{MLOperand}}.
An [=operator=] has a <dfn for=operator>label</dfn>, a string which may be included in diagnostics such as [=exception=] messages. When an [=operator=] is created its [=operator/label=] is initialized in an [=implementation-defined=] manner and may include the passed {{MLOperatorOptions/label}}.
Note: Implementations are encouraged to use the {{MLOperatorOptions/label}} provided by developers to enhance error messages and improve debuggability, both for synchronous errors during graph construction and for errors that occur during asynchronous {{MLGraphBuilder/build()}} or {{MLContext/compute()}} operations.
At inference time, every {{MLOperand}} will be bound to a tensor (the actual data). Tensors are essentially multidimensional arrays. The representation of the tensors is implementation dependent, but it typically includes the array data stored in some buffer (memory) and some metadata describing the array data (such as its shape).
Operations within the computational graph have functional semantics. This allows the implementation
to potentially share the array data between multiple tensors. For example, the implementation
of operations such as reshape or slice may return a view of the input tensor
that shares the same buffer as that tensor. (In the case of reshape,
the entire data is shared, while in the case of slice, a part of the input data is shared.)
The implementation may use views, as above, for intermediate values.
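As a loose analogy in script (an assumption made for illustration, not a description of any implementation's tensor representation), JavaScript typed arrays show the same idea: a reshape-like view and a slice-like view can both refer to the buffer of the original data without copying it. The object layout with `shape` and `values` is hypothetical.

```javascript
// Backing storage for a 2x3 "tensor" in row-major order.
const data = new Float32Array([1, 2, 3, 4, 5, 6]);

// "reshape" to 3x2: only the shape metadata changes, all data is shared.
const original = { shape: [2, 3], values: data };
const reshaped = { shape: [3, 2], values: data };

// "slice" of the second row: a view over part of the same buffer.
const secondRow = data.subarray(3, 6);

// Both views alias the original storage; no element was copied.
console.log(reshaped.values === original.values); // true
console.log(secondRow.buffer === data.buffer);    // true
```

Sharing of this kind is only safe because operations do not mutate their inputs.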
Before execution, the computational graph that is used to compute one or more specified outputs needs to be converted, compiled, and optimized. The key purpose of the compilation step is to enable optimizations that span two or more operations, such as operation or loop fusion. The user agent may also perform these optimizations during graph conversion.
The {{MLGraphBuilder}}.{{MLGraphBuilder/build()}} method compiles the graph in the background without blocking the calling thread, and returns a {{Promise}} that resolves to an {{MLGraph}}. Each {{MLGraphBuilder}} can build at most one {{MLGraph}}.
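The construction and compilation flow described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `buildTinyGraph`, the operand names, and the shapes are illustrative, and the builder is passed in by the caller so that the snippet does not depend on a particular environment.

```javascript
// Builds y = relu(gemm(x, w)) with a builder supplied by the caller.
// `builder` is expected to be an MLGraphBuilder; the operand names and
// shapes are illustrative assumptions.
function buildTinyGraph(builder) {
  const x = builder.input('x', { dataType: 'float32', shape: [1, 4] });
  const w = builder.constant(
    { dataType: 'float32', shape: [4, 2] },
    new Float32Array(8).fill(0.5)
  );
  // Each call returns a new operand; x and w are left unchanged.
  const y = builder.relu(builder.gemm(x, w));
  // build() compiles in the background and returns a promise for the graph.
  return builder.build({ y });
}
```

Because each {{MLGraphBuilder}} can build at most one {{MLGraph}}, a caller of this hypothetical helper would pass a fresh builder for every graph.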
The underlying implementation of an {{MLGraph}} will be composed of platform-specific representations of operators and operands which correspond to the {{MLGraphBuilder}}'s [=operators=] and {{MLOperand}}s, but which are not script-visible and may be compositions or decompositions of the graph as constructed by script.
Once the {{MLGraph}} is constructed, the {{MLContext}}.{{MLContext/compute()}} method executes the graph asynchronously, either on a parallel timeline in a separate worker thread for CPU execution or on a GPU timeline in a GPU command queue. This method returns immediately without blocking the calling thread while the actual execution is offloaded to a different timeline. The caller supplies the input values using {{MLNamedArrayBufferViews}}, binding the input {{MLOperand}}s to their values. The caller also supplies pre-allocated buffers for output {{MLOperand}}s using {{MLNamedArrayBufferViews}}. The execution produces the results of the computation from all the inputs bound to the graph. The results are placed in the bound outputs when the operation completes successfully on the offloaded timeline, at which point the calling thread is signaled. This type of execution supports both CPU and GPU devices.
## Device Selection ## {#programming-model-device-selection}
The {{MLContext}} interface represents a global state of neural network execution. One of the important context states is the underlying execution device that manages the resources and facilitates the compilation and the eventual execution of the neural network graph. In addition to the default method of creation with {{MLContextOptions}}, an {{MLContext}} could also be created from a specific {{GPUDevice}} that is already in use by the application.
When a GPU context executes a graph with a constant or an input in the system memory as an {{ArrayBufferView}}, the input content is automatically uploaded from the system memory to the GPU memory, and downloaded back to the system memory of an {{ArrayBufferView}} output buffer at the end of the graph execution. These data upload and download cycles occur only when the execution device requires the data to be copied out of and back into the system memory, such as in the case of the GPU. They do not occur when the device is a CPU device. Additionally, the result of the graph execution is in a known layout format. While the execution may be optimized for a native memory access pattern in an intermediate result within the graph, the output of the last operation of the graph must convert the content back to a known layout format at the end of the graph in order to maintain the expected behavior from the caller's perspective.
When an {{MLContext}} is created with {{MLContextOptions}}, the user agent selects and creates the underlying execution device by taking into account the application's {{MLPowerPreference}} and {{MLDeviceType}} options.
## Task Source ## {#programming-model-task-source}
The <dfn>ML task source</dfn> is a [=task source=] to be used for all [=tasks=] related to asynchronous compilation and execution of {{MLGraph}}s and creation of {{MLContext}}s.
<div algorithm>
<p>To <dfn>queue an ML task</dfn> given a [=global object=] |global| and a series of steps |steps|, [=queue a global task=] on the [=ML task source=] with |global| and |steps|.
</div>
## Permissions Policy Integration ## {#permissions-policy-integration}
This specification defines a [=policy-controlled feature=] identified by the
string "<code><dfn data-lt="webnn-feature">webnn</dfn></code>".
Its [=policy-controlled feature/default allowlist=] is <code>'self'</code>.
API {#api}
=====================
## The navigator.ml interface ## {#api-navigator-ml}
An {{ML}} object is available in the {{Window}} and {{DedicatedWorkerGlobalScope}} contexts through the {{Navigator}}
and {{WorkerNavigator}} interfaces respectively and is exposed via `navigator.ml`.
<script type=idl>
interface mixin NavigatorML {
  [SecureContext, SameObject] readonly attribute ML ml;
};
Navigator includes NavigatorML;
WorkerNavigator includes NavigatorML;
</script>
## {{ML}} interface ## {#api-ml}
<script type=idl>
enum MLDeviceType {
  "cpu",
  "gpu",
  "npu"
};
enum MLPowerPreference {
  "default",
  "high-performance",
  "low-power"
};
dictionary MLContextOptions {
  MLDeviceType deviceType = "cpu";
  MLPowerPreference powerPreference = "default";
};
[SecureContext, Exposed=(Window, DedicatedWorker)]
interface ML {
  Promise<MLContext> createContext(optional MLContextOptions options = {});
  Promise<MLContext> createContext(GPUDevice gpuDevice);
};
</script>
### {{MLContextOptions}} ### {#api-mlcontextoptions}
Issue(623): {{MLContextOptions}} is under active development, and the design is expected to change, informed by further implementation experience and new use cases from the wider web community. The Working Group is considering additional API controls to allow the definition of a fallback device, multiple devices in a preferred order, or an exclusion of a specific device. Other considerations under discussion include error handling, ultimate fallback, and quantized operators. Feedback is welcome on any of these design considerations from web developers, library authors, OS and hardware vendors, and other stakeholders via GitHub:
The <dfn dfn-for=MLContextOptions dfn-type=dict-member>deviceType</dfn> option is an <dfn dfn-type=enum>MLDeviceType</dfn> and indicates the application's preference for the kind of device used for the context. It is one of the following:
<dl dfn-for="MLDeviceType">
<dt>"<dfn enum-value>cpu</dfn>"</dt>
<dd>Provides the broadest compatibility and usability across all client devices with varying degrees of performance.</dd>
<dt>"<dfn enum-value>gpu</dfn>"</dt>
<dd>Provides the broadest range of achievable performance across graphics hardware platforms from consumer devices to professional workstations. The underlying platform implementation may fall back to other devices for certain operators and parts of the graph.</dd>
<dt>"<dfn enum-value>npu</dfn>"</dt>
<dd>Provides power efficiency for sustained workloads across hardware platforms with purpose-built accelerators. The underlying platform implementation may fall back to other devices for certain operators and parts of the graph.</dd>
</dl>
The <dfn dfn-for=MLContextOptions dfn-type=dict-member>powerPreference</dfn> option is an <dfn dfn-type=enum>MLPowerPreference</dfn> and indicates the application's preference as related to power consumption. It is one of the following:
<dl dfn-for="MLPowerPreference">
<dt>"<dfn enum-value>default</dfn>"</dt>
<dd>Let the user agent select the most suitable behavior.</dd>
<dt>"<dfn enum-value>high-performance</dfn>"</dt>
<dd>Prioritizes execution speed over power consumption.</dd>
<dt>"<dfn enum-value>low-power</dfn>"</dt>
<dd>Prioritizes power consumption over other considerations such as execution speed.</dd>
</dl>
### {{ML/createContext()}} ### {#api-ml-createcontext}
<div dfn-for="ML/createContext(options), ML/createContext(gpuDevice)" dfn-type=argument>
**Arguments:**
- <dfn>options</dfn>: an {{MLContextOptions}}. Provides the application's preferences for the context.
- <dfn>gpuDevice</dfn>: a {{GPUDevice}}. A specific device to use with the context.
**Returns:** an {{MLContext}}.
</div>
<details open algorithm>
<summary>
To <dfn>create a context</dfn> given [=realm=] |realm| and |options| (a {{GPUDevice}} or {{MLContextOptions}}), run these steps:
</summary>
1. Let |context| be a new {{MLContext}} object with |realm|.
1. If |options| is a {{GPUDevice}} object:
    1. Set |context|.{{MLContext/[[contextType]]}} to "[=context type/webgpu=]".
    1. Set |context|.{{MLContext/[[deviceType]]}} to {{MLDeviceType/"gpu"}}.
    1. Set |context|.{{MLContext/[[powerPreference]]}} to {{MLPowerPreference/"default"}}.
1. Otherwise:
    1. Set |context|.{{MLContext/[[contextType]]}} to "[=context type/default=]".
    1. If |options|["{{MLContextOptions/deviceType}}"] [=map/exists=], then set |context|.{{MLContext/[[deviceType]]}} to |options|["{{MLContextOptions/deviceType}}"]. Otherwise, set |context|.{{MLContext/[[deviceType]]}} to {{MLDeviceType/"cpu"}}.
    1. If |options|["{{MLContextOptions/powerPreference}}"] [=map/exists=], then set |context|.{{MLContext/[[powerPreference]]}} to |options|["{{MLContextOptions/powerPreference}}"]. Otherwise, set |context|.{{MLContext/[[powerPreference]]}} to {{MLPowerPreference/"default"}}.
1. If the user agent cannot support |context|.{{MLContext/[[contextType]]}}, |context|.{{MLContext/[[deviceType]]}} and |context|.{{MLContext/[[powerPreference]]}}, return failure.
1. Return |context|.
</details>
<details open algorithm>
<summary>
The <dfn method for=ML>createContext(|options|)</dfn> method steps are:
</summary>
1. Let |global| be [=this=]'s [=relevant global object=].
1. If |global|'s [=associated Document=] is not [=allowed to use=] the [=webnn-feature|webnn=] feature, return [=a new promise=] [=rejected=] with a "{{SecurityError}}" {{DOMException}}.
1. Let |realm| be [=this=]'s [=relevant realm=].
1. Let |promise| be [=a new promise=].
1. Run the following steps [=in parallel=].
    1. Let |context| be the result of [=creating a context=] given |realm| and |options|. If that returns failure, then [=queue an ML task=] with |global| to [=reject=] |promise| with a "{{NotSupportedError}}" {{DOMException}} and abort these steps.
    1. [=Queue an ML task=] with |global| to [=resolve=] |promise| with |context|.
1. Return |promise|.
</details>
<details open algorithm>
<summary>
The <dfn method for=ML>createContext(|gpuDevice|)</dfn> method steps are:
</summary>
1. Let |global| be [=this=]'s [=relevant global object=].
1. If |global|'s [=associated Document=] is not [=allowed to use=] the [=webnn-feature|webnn=] feature, return [=a new promise=] [=rejected=] with a "{{SecurityError}}" {{DOMException}}.
1. Let |realm| be [=this=]'s [=relevant realm=].
1. Let |promise| be [=a new promise=].
1. Run the following steps [=in parallel=].
    1. Let |context| be the result of [=creating a context=] given |realm| and |gpuDevice|. If that returns failure, then [=queue an ML task=] with |global| to [=reject=] |promise| with a "{{NotSupportedError}}" {{DOMException}} and abort these steps.
    1. [=Queue an ML task=] with |global| to [=resolve=] |promise| with |context|.
1. Return |promise|.
</details>
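From the caller's side, the rejection behavior in the steps above might be handled as in the sketch below. The helper name `createContextWithFallback` and the idea of trying device types in a preferred order are application-level assumptions, not something this specification defines, and {{MLContextOptions}} is expected to change. The `ml` parameter stands for the object exposed as `navigator.ml`.

```javascript
// Tries device types in order of preference and returns the first context
// that the user agent can create. `ml` is expected to be navigator.ml.
async function createContextWithFallback(ml, deviceTypes = ['npu', 'gpu', 'cpu']) {
  for (const deviceType of deviceTypes) {
    try {
      return await ml.createContext({ deviceType });
    } catch (error) {
      // NotSupportedError: this request cannot be satisfied; try the next one.
      // Anything else (for example SecurityError) is not recoverable here.
      if (error.name !== 'NotSupportedError') throw error;
    }
  }
  throw new Error('No supported device type found');
}
```

A "{{SecurityError}}" is rethrown on purpose: it signals that the document is not allowed to use the feature, so retrying with another device type would not help.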
## {{MLContext}} interface ## {#api-mlcontext}
The {{MLContext}} interface represents a global state of neural network compute workload and execution processes. Each {{MLContext}} object has associated [=context type=], {{MLDeviceType}} and {{MLPowerPreference}}.
<script type=idl>
typedef record<USVString, ArrayBufferView> MLNamedArrayBufferViews;
dictionary MLComputeResult {
  MLNamedArrayBufferViews inputs;
  MLNamedArrayBufferViews outputs;
};
[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLContext {
  Promise<MLComputeResult> compute(
      MLGraph graph, MLNamedArrayBufferViews inputs, MLNamedArrayBufferViews outputs);
  MLOpSupportLimits opSupportLimits();
};
</script>
<div class=internal-slots>
{{MLContext}} has the following internal slots:
<dl dfn-type=attribute dfn-for="MLContext">
: <dfn>\[[contextType]]</dfn> of type [=context type=].
::
The {{MLContext}}'s [=context type=].
: <dfn>\[[deviceType]]</dfn> of type {{MLDeviceType}}.
::
The {{MLContext}}'s {{MLDeviceType}}.
: <dfn>\[[powerPreference]]</dfn> of type {{MLPowerPreference}}.
::
The {{MLContext}}'s {{MLPowerPreference}}.
</dl>
</div>
The <dfn>context type</dfn> is the type of the execution context that manages the resources and facilitates the compilation and execution of the neural network graph:
<dl dfn-for="context type">
<dt>"<dfn>default</dfn>"</dt>
<dd>Context created per user preference options.</dd>
<dt>"<dfn>webgpu</dfn>"</dt>
<dd>Context created from WebGPU device.</dd>
</dl>
<div class="note">
When the {{MLContext/[[contextType]]}} is set to [=context type/default=] with the {{MLContextOptions}}.{{MLContextOptions/deviceType}} set to {{MLDeviceType/"gpu"}}, the user agent is responsible for creating an internal GPU device that operates within the context and is capable of ML workload submission on behalf of the calling application. In this setting, however, only {{ArrayBufferView}} inputs and outputs are allowed in and out of the graph execution, since the application has no way to know what type of internal GPU device is being created on its behalf. In this case, the user agent is responsible for automatic uploads and downloads of the inputs and outputs to and from the GPU memory using that internal device.
</div>
<dl dfn-type=dict-member dfn-for=MLComputeResult>
: <dfn>inputs</dfn>
:: An object where the keys are the graph input names, and the values are the transferred {{ArrayBufferView}}s for the supplied input tensor values.
: <dfn>outputs</dfn>
:: An object where the keys are the graph output names, and the values are the transferred {{ArrayBufferView}}s for the computed output tensor values.
</dl>
<details open algorithm>
<summary>
To <dfn>validate buffer with descriptor</dfn> given {{ArrayBufferView}} |bufferView| and {{MLOperandDescriptor}} |descriptor|, run the following steps:
</summary>
1. If |bufferView|'s [=element type=] does not match |descriptor|.{{MLOperandDescriptor/dataType}} according to [this table](#appendices-mloperanddatatype-arraybufferview-compatibility), return false.
1. If |bufferView|.\[[ByteLength]] is not equal to |descriptor|'s [=MLOperandDescriptor/byte length=], return false.
1. Return true.
</details>
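The checks above can be modeled in script as a rough sketch. The lookup table below is an assumed subset of the compatibility table referenced by the algorithm (it deliberately omits the other data types), and the function names are hypothetical. The byte length of a descriptor is taken to be the product of its shape dimensions multiplied by the element size.

```javascript
// Assumed subset of the data type -> view type compatibility table.
const VIEW_FOR_DATA_TYPE = {
  float32: Float32Array,
  int32: Int32Array,
  uint32: Uint32Array,
  int8: Int8Array,
  uint8: Uint8Array,
};

// Byte length implied by a descriptor: element count times element size.
// A scalar (empty shape) has one element.
function descriptorByteLength(descriptor) {
  const elements = descriptor.shape.reduce((n, d) => n * d, 1);
  return elements * VIEW_FOR_DATA_TYPE[descriptor.dataType].BYTES_PER_ELEMENT;
}

// Returns true only if both the element type and the byte length match.
function validateBufferWithDescriptor(bufferView, descriptor) {
  const expectedView = VIEW_FOR_DATA_TYPE[descriptor.dataType];
  if (!expectedView || !(bufferView instanceof expectedView)) return false;
  return bufferView.byteLength === descriptorByteLength(descriptor);
}
```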
<details open algorithm>
<summary>
To <dfn>execute graph</dfn>, given {{MLGraph}} |graph|, {{MLNamedArrayBufferViews}} |inputs| and {{MLNamedArrayBufferViews}} |outputs|, run the following steps. They return {{undefined}}, or an error.
</summary>
1. Let |inputResources| be the input resources of |graph|.{{MLGraph/[[implementation]]}}.
1. [=map/For each=] |name| → |inputValue| of |inputs|:
    1. Let |inputDescriptor| be |graph|.{{MLGraph/[[inputDescriptors]]}}[|name|].
    1. Let |inputTensor| be a new tensor for |graph|.{{MLGraph/[[implementation]]}} as follows:
        1. Set the data type of |inputTensor| to the one that matches |inputValue|'s [=element type=].
        1. Set the shape of |inputTensor| to |inputDescriptor|.{{MLOperandDescriptor/shape}}.
        1. Set the values of elements in |inputTensor| to the values of elements in |inputValue|.
    1. Request the underlying implementation of |graph| to bind |inputResources|[|name|] to |inputTensor|.
1. [=map/For each=] |name| → |outputValue| of |outputs|:
    1. Issue a compute request to |graph|.{{MLGraph/[[implementation]]}} given |name| and |inputResources| and wait for completion.
        1. If that returns an error, then return an "{{OperationError}}" {{DOMException}}.
        1. Otherwise, let |outputTensor| be the result.
    1. Let |outputDesc| be |graph|.{{MLGraph/[[outputDescriptors]]}}[|name|].
    1. If the byte length of |outputTensor| is not equal to |outputDesc|'s [=MLOperandDescriptor/byte length=], then return a {{TypeError}}.
    1. If |outputTensor|'s [=element type=] doesn't match |outputValue|'s [=element type=], then return a {{TypeError}}.
    1. Request the underlying implementation of |graph| to set the values of elements in |outputValue| to the values of elements in |outputTensor|.
1. Return {{undefined}}.
</details>
### {{MLNamedArrayBufferViews}} transfer algorithm ### {#mlnamedarraybufferviews-transfer-alg}
<details open algorithm>
<summary>
To <dfn for="MLNamedArrayBufferViews">transfer</dfn> an {{MLNamedArrayBufferViews}} |views| with [=realm=] |realm|:
</summary>
1. [=map/For each=] |name| → |view| of |views|:
    1. If |view| is not [=BufferSource/transferable=], then throw a {{TypeError}}.
1. Let |transferredViews| be a new {{MLNamedArrayBufferViews}}.
1. [=map/For each=] |name| → |view| of |views|:
    1. Let |transferredBuffer| be the result of [=ArrayBuffer/transfer|transferring=] |view|'s [=BufferSource/underlying buffer=].
    1. [=Assert=]: The above step never throws an exception.
    1. Let |constructor| be the appropriate [=view constructor=] for the type of {{ArrayBufferView}} |view| from |realm|.
    1. Let |elementsNumber| be the result of |view|'s [=BufferSource/byte length=] / |view|'s [=element size=].
    1. Let |transferredView| be [$Construct$](|constructor|, |transferredBuffer|, |view|.\[[ByteOffset]], |elementsNumber|).
    1. Set |transferredViews|[|name|] to |transferredView|.
1. Return |transferredViews|.
</details>
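For intuition, the transfer steps can be approximated in script. This sketch makes several assumptions: it handles typed arrays only (not `DataView`), it assumes every view has its own buffer, it uses `structuredClone()` with a transfer list as a stand-in for the buffer transfer, and the function name `transferViews` is hypothetical.

```javascript
// Sketch of transferring a record of typed-array views.
function transferViews(views) {
  // First pass: reject before detaching anything.
  for (const view of Object.values(views)) {
    const buffer = view.buffer;
    // Shared buffers and already-detached buffers cannot be transferred.
    if (!(buffer instanceof ArrayBuffer) || buffer.detached === true) {
      throw new TypeError('view is not transferable');
    }
  }
  const transferred = {};
  for (const [name, view] of Object.entries(views)) {
    // Capture metadata before the transfer detaches the original buffer.
    const { byteOffset, byteLength, constructor: ViewType } = view;
    const length = byteLength / ViewType.BYTES_PER_ELEMENT;
    const buffer = structuredClone(view.buffer, { transfer: [view.buffer] });
    transferred[name] = new ViewType(buffer, byteOffset, length);
  }
  return transferred;
}
```

After the call, the original views are detached and report a byte length of zero, while the returned views cover the same bytes at the same offsets.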
### {{MLContext/compute()}} ### {#api-mlcontext-compute}
Asynchronously carries out the computational workload of a compiled {{MLGraph}} on a separate timeline, either on a worker thread for CPU execution, or on a GPU/NPU timeline for submitting a workload onto the command queue. The asynchronous nature of this call avoids blocking the calling thread while the result is being computed. This method of execution requires an {{MLContext}} created with {{MLContextOptions}}. Otherwise, it [=exception/throws=] an "{{OperationError}}" {{DOMException}}.
<div class="note">
In accordance with the [=ArrayBufferView/write|Web IDL warning=], to prevent the calling thread from modifying the input and output resources while the computation is ongoing, this method [=MLNamedArrayBufferViews/transfer|transfers=] the input and output {{MLNamedArrayBufferViews}} to new views that share the same backing memory allocations. The transferred views are returned to the caller via the promise fulfillment with the computation result written into the backing memory of the output views.
</div>
<div dfn-for="MLContext/compute(graph, inputs, outputs)" dfn-type=argument>
**Arguments:**
- <dfn>graph</dfn>: an {{MLGraph}}. The compiled graph to be executed.
- <dfn>inputs</dfn>: an {{MLNamedArrayBufferViews}}. The resources of inputs. Will be [=MLNamedArrayBufferViews/transfer|transferred=] if there are no validation errors.
- <dfn>outputs</dfn>: an {{MLNamedArrayBufferViews}}. The pre-allocated resources of required outputs. Will be [=MLNamedArrayBufferViews/transfer|transferred=] if there are no validation errors.
**Returns:** {{Promise}}<{{MLComputeResult}}>.
</div>
Note: Invocations of {{MLContext/compute()}} will fail if any of the {{MLContext/compute(graph, inputs, outputs)/graph}}'s inputs are not provided as {{MLContext/compute(graph, inputs, outputs)/inputs}}, or if any requested {{MLContext/compute(graph, inputs, outputs)/outputs}} do not match the {{MLContext/compute(graph, inputs, outputs)/graph}}'s outputs.
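A call to {{MLContext/compute()}} might look like the sketch below. The function name `runOnce`, the operand names `x` and `y`, and their shapes are illustrative assumptions; the context and graph are passed in by the caller. The point of the sketch is that the views passed in are transferred, so the results have to be read from the views returned through the promise.

```javascript
// Runs one inference. `context` is expected to be an MLContext and `graph`
// an MLGraph with a float32 input 'x' of shape [1, 4] and a float32 output
// 'y' of shape [1, 2]; these names and shapes are illustrative assumptions.
async function runOnce(context, graph, values) {
  const inputs = { x: new Float32Array(values) };
  const outputs = { y: new Float32Array(2) };
  const result = await context.compute(graph, inputs, outputs);
  // The views passed in have been transferred and should no longer be used;
  // the computed data lives in the views returned with the result.
  return result.outputs.y;
}
```

To run the graph again without reallocating, a caller could pass `result.inputs` and `result.outputs` to the next invocation.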
<details open algorithm>
<summary>
The <dfn method for=MLContext>compute(|graph|, |inputs|, |outputs|)</dfn> method steps are:
</summary>
1. Let |global| be [=this=]'s [=relevant global object=].
1. Let |realm| be [=this=]'s [=relevant realm=].
1. If |graph|.{{MLGraph/[[context]]}} is not [=this=], then return [=a new promise=] [=rejected=] with a {{TypeError}}.
1. If |graph|.{{MLGraph/[[context]]}}.{{MLContext/[[contextType]]}} is not "[=context type/default=]", then return [=a new promise=] [=rejected=] with an "{{OperationError}}" {{DOMException}}.
1. [=map/For each=] |name| → |descriptor| of |graph|.{{MLGraph/[[inputDescriptors]]}}:
    1. If |inputs|[|name|] does not [=map/exist=], then return [=a new promise=] [=rejected=] with a {{TypeError}}.
    1. If [=validating buffer with descriptor=] given |inputs|[|name|] and |descriptor| returns false, then return [=a new promise=] [=rejected=] with a {{TypeError}}.
1. [=map/For each=] |name| → |resource| of |outputs|:
    1. If |graph|.{{MLGraph/[[outputDescriptors]]}}[|name|] does not [=map/exist=], then return [=a new promise=] [=rejected=] with a {{TypeError}}.
    1. If [=validating buffer with descriptor=] given |resource| and |graph|.{{MLGraph/[[outputDescriptors]]}}[|name|] returns false, then return [=a new promise=] [=rejected=] with a {{TypeError}}.
1. Let |transferredInputs| be the result of [=MLNamedArrayBufferViews/transfer|transferring=] {{MLNamedArrayBufferViews}} |inputs| with |realm|. If that threw an exception, then return [=a new promise=] [=rejected=] with that exception.
1. Let |transferredOutputs| be the result of [=MLNamedArrayBufferViews/transfer|transferring=] {{MLNamedArrayBufferViews}} |outputs| with |realm|. If that threw an exception, then return [=a new promise=] [=rejected=] with that exception.
1. Let |promise| be [=a new promise=].
1. Run the following steps [=in parallel=]:
1. Invoke [=execute graph=] given |graph|, |transferredInputs| and |transferredOutputs|. If that returns an error, then [=queue an ML task=] with |global| to [=reject=] |promise| with an equivalent error in |realm| and abort these steps.