From 8e0b2bcf8fcd16248aca92f25633721820fbb208 Mon Sep 17 00:00:00 2001 From: Erich Gubler Date: Tue, 27 Feb 2024 21:37:04 -0500 Subject: [PATCH 001/285] typos: s/paramters/parameters (#4502) --- wgsl/index.bs | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index a39d3f779d..f2635a70c8 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -1481,7 +1481,7 @@ Let |TemplateList| be a record type containing: -
+
Note:The algorithm can be modified to find the source ranges for [=template parameters=], as follows: * Modify |UnclosedCandidate| to add the following fields: @@ -6328,7 +6328,7 @@ The indirection operator converts a pointer to its correspon |eN|: TN |tg| [=syntax_sym/_template_args_start=]
|e1|,
...,
|eN|
[=syntax_sym/_template_args_end=]
: [=AllTypes=] Each [=type-generator=] has its own requirements on the template parameters it requires and accepts, - and defines how the template paramters help determine the resulting type. + and defines how the template parameters help determine the resulting type. The expressions |e1| through |eN| are the [=template parameters=] for the type-generator. From 5b46514900a485e0598cf07a1b3724ae6742e6b1 Mon Sep 17 00:00:00 2001 From: Jiawei Shao Date: Thu, 7 Mar 2024 08:33:51 +0800 Subject: [PATCH 002/285] Fix the computation of inter-stage shader components (#4503) * Fix the computation of inter-stage shader components In the validation of inter-stage interfaces, each user-defined inter-stage shader variable should always consume 4 inter-stage shader components because in latest Vulkan SPEC the Location value specifies an interface slot comprised of a 32-bit four-component vector conveyed between stages. Fixes: #1962 --- spec/index.bs | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/spec/index.bs b/spec/index.bs index 724c6e9d2b..8a42d98cb1 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -8055,7 +8055,8 @@ dictionary GPURenderPipelineDescriptor - There must be no more than |maxVertexShaderOutputComponents| scalar components across all user-defined outputs for |descriptor|.{{GPURenderPipelineDescriptor/vertex}}. - (For example, a `f32` output uses 1 component, and a `vec3` output uses 3 components.) + Each user-defined output of |descriptor|.{{GPURenderPipelineDescriptor/vertex}} + consumes 4 scalar components. - The [=location=] of each user-defined output of |descriptor|.{{GPURenderPipelineDescriptor/vertex}} must be < |device|.limits.{{supported limits/maxInterStageShaderVariables}}. @@ -8075,6 +8076,8 @@ dictionary GPURenderPipelineDescriptor - There must be no more than |maxFragmentShaderInputComponents| scalar components across all user-defined inputs for |descriptor|.{{GPURenderPipelineDescriptor/fragment}}. 
+ Each user-defined input of |descriptor|.{{GPURenderPipelineDescriptor/fragment}} + consumes 4 scalar components. - For each user-defined input of |descriptor|.{{GPURenderPipelineDescriptor/fragment}} there must be a user-defined output of |descriptor|.{{GPURenderPipelineDescriptor/vertex}} that [=location=], type, and [=interpolation=] of the input. From 4e138b1f121c096dbe615bdfdb40aea7e3645121 Mon Sep 17 00:00:00 2001 From: Christopher Cameron <32557109+ccameron-chromium@users.noreply.github.com> Date: Wed, 13 Mar 2024 20:55:52 +0100 Subject: [PATCH 003/285] Clarify handling of out of range color values (#4499) * Clarify handling of out of range color values. Co-authored-by: Kai Ninomiya --- spec/index.bs | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 8a42d98cb1..246334bd94 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -13683,12 +13683,22 @@ dictionary GPUCanvasConfiguration { ### Canvas Color Space ### {#canvas-color-space} -During presentation, the chrominance of color values outside of the [0, 1] range is not to be -clamped to that range; extended values may be used to display colors outside of the gamut defined -by the canvas' color space's primaries, when permitted by the configured -{{GPUCanvasConfiguration/format}} and the user's display capabilities. -This is in contrast with luminance, which is to be clamped to the maximum standard dynamic range -luminance. +During presentation, the color values in the canvas are converted to the color +space of the screen. Color values are then clamped to the `[0, 1]` interval in +the color space of the screen. + +
+ For example, suppose that the value `(1.035, -0.175, -0.140)` is written to an + `'srgb'` canvas. + + If this is presented to an sRGB screen, then this will be converted to sRGB + (which is a no-op, because the canvas is sRGB), and then will be clamped to + the sRGB value `(1.0, 0.0, 0.0)`. + + If this is presented to a Display P3 screen, then this will be converted to + the value `(0.948, 0.106, 0.01)` in the Display P3 color space, and no + clamping will be needed. +
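The clamping step that patch 003 describes can be sketched in TypeScript as below. This is only an illustration under stated assumptions, not spec text: `clampToUnitInterval` and the `RGB` type are hypothetical names, and the conversion from the canvas color space to the screen color space that precedes clamping is deliberately left out.

```typescript
// Hypothetical helper (not a WebGPU API): clamps each channel of a color
// to the [0, 1] interval. The input is assumed to already be expressed in
// the color space of the screen.
type RGB = [number, number, number];

function clampToUnitInterval(color: RGB): RGB {
  const clamp01 = (v: number): number => Math.min(1, Math.max(0, v));
  return [clamp01(color[0]), clamp01(color[1]), clamp01(color[2])];
}

// The patch's example: an 'srgb' canvas value presented on an sRGB screen.
// The conversion is a no-op there, so only the clamp changes the value.
const onSrgbScreen = clampToUnitInterval([1.035, -0.175, -0.14]);
console.log(onSrgbScreen); // components 1, 0, 0
```

On a Display P3 screen the same canvas value would first be converted, and the patch states that the converted value already lies inside `[0, 1]`, so the clamp would leave it unchanged.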
From 0e2f483d7baf1843c62d60339a45c239f8e408f9 Mon Sep 17 00:00:00 2001 From: Greggman Date: Fri, 15 Mar 2024 04:19:56 +0900 Subject: [PATCH 004/285] Compat: Disallow texture format reinterpretation (#4511) --- proposals/compatibility-mode.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/proposals/compatibility-mode.md b/proposals/compatibility-mode.md index 4feabd017f..97f90810f8 100644 --- a/proposals/compatibility-mode.md +++ b/proposals/compatibility-mode.md @@ -159,6 +159,12 @@ If used via an entry point in a shader module passed to `createRenderPipeline`, **Justification**: OpenGL ES 3.1 does not support copying of multisample textures. +### 13. Disallow texture format reinterpretation + +When calling `createTexture`, the `viewFormats`, if specified, must be the same format as the texture. + +**Justification**: OpenGL ES 3.1 does not support texture format reinterpretation. + ## Issues Q: OpenGL ES does not have "coarse" and "fine" variants of the derivative instructions (`dFdx()`, `dFdy()`, `fwidth()`). Should WGSL's "fine" derivatives (`dpdxFine()`, `dpdyFine()`, and `fwidthFine()`) be required to deliver high precision results? See [Issue 4325](https://github.com/gpuweb/gpuweb/issues/4325). From 78d073db4bf73231b689f2a3f47749122b146261 Mon Sep 17 00:00:00 2001 From: alan-baker Date: Fri, 15 Mar 2024 17:07:24 -0400 Subject: [PATCH 005/285] Add const_assert to statement behaviors (#4516) --- wgsl/index.bs | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/wgsl/index.bs b/wgsl/index.bs index f2635a70c8..880972aa6a 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -7571,6 +7571,10 @@ non-empty [=behavior=] for each statement, and function. 
continue; {Continue} + + const_assert |e|; + + {Next} if |e| |s1| else |s2| From 435be3bd6402be1d3e89bd67ba53f8c1b90ea80e Mon Sep 17 00:00:00 2001 From: Greggman Date: Sat, 16 Mar 2024 07:14:57 +0900 Subject: [PATCH 006/285] Compat: Validate depthOrArrayLayers matches textureBindingViewDimension (#4520) --- proposals/compatibility-mode.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/proposals/compatibility-mode.md b/proposals/compatibility-mode.md index 97f90810f8..b5db19f53a 100644 --- a/proposals/compatibility-mode.md +++ b/proposals/compatibility-mode.md @@ -165,6 +165,17 @@ When calling `createTexture`, the `viewFormats`, if specified, must be the same **Justification**: OpenGL ES 3.1 does not support texture format reinterpretation. +### 14. Require `depthOrArrayLayers` to be compatible with `textureBindingViewDimension` in `createTexture`. + +When creating a texture you can pass in a `textureBindingViewDimension`. + +* If `textureBindingViewDimension` is `"2d"` and `depthOrArrayLayers` is not 1, a validation error is generated. + +* If `textureBindingViewDimension` is `"cube"` and `depthOrArrayLayers` is not 6, a validation error is generated. + +**Justification**: OpenGL ES 3.1 cannot create 2d textures with more than 1 layer nor can it +create cube maps that are not exactly 6 layers. + ## Issues Q: OpenGL ES does not have "coarse" and "fine" variants of the derivative instructions (`dFdx()`, `dFdy()`, `fwidth()`). Should WGSL's "fine" derivatives (`dpdxFine()`, `dpdyFine()`, and `fwidthFine()`) be required to deliver high precision results? See [Issue 4325](https://github.com/gpuweb/gpuweb/issues/4325). 
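The two checks that patch 006 adds as rule 14 could be expressed roughly as follows. This is a sketch under assumptions: `validateLayerCount` is a hypothetical function, not part of any WebGPU implementation, and it covers only the two cases the patch lists.

```typescript
// Hypothetical validator for Compatibility-mode rule 14. The dimension
// strings mirror the GPUTextureViewDimension enum values.
type BindingViewDimension =
  | "1d" | "2d" | "2d-array" | "cube" | "cube-array" | "3d";

// Returns a validation error message, or null if rule 14 is satisfied.
function validateLayerCount(
  textureBindingViewDimension: BindingViewDimension,
  depthOrArrayLayers: number,
): string | null {
  if (textureBindingViewDimension === "2d" && depthOrArrayLayers !== 1) {
    return '"2d" requires depthOrArrayLayers to be 1';
  }
  if (textureBindingViewDimension === "cube" && depthOrArrayLayers !== 6) {
    return '"cube" requires depthOrArrayLayers to be 6';
  }
  return null; // Other createTexture validation is out of scope here.
}

console.log(validateLayerCount("cube", 6)); // null
console.log(validateLayerCount("2d", 4));   // an error message
```

Returning a message instead of throwing is a choice made for the sketch; WebGPU itself reports such failures as validation errors on the device.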
From 53e8c777655b70f28dcab9069c4ce78336139510 Mon Sep 17 00:00:00 2001 From: Gabriel Vogel Date: Sun, 17 Mar 2024 01:20:53 +0100 Subject: [PATCH 007/285] Use y component in vec4 constructor (#4522) --- wgsl/index.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index 880972aa6a..43c83c8fd7 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -13062,7 +13062,7 @@ specify the component type; the component type is inferred from the constructor Parameterization `T` is [=scalar=] Description - [=Component-wise=] construction of a four-component [=vector=] with `e1`, `e2`, `v1.x`, and `v1.x` as components. + [=Component-wise=] construction of a four-component [=vector=] with `e1`, `e2`, `v1.x`, and `v1.y` as components.
Overload From 9b9c407f433e023906213a3820b342b830a5c05f Mon Sep 17 00:00:00 2001 From: alan-baker Date: Fri, 22 Mar 2024 13:21:39 -0400 Subject: [PATCH 008/285] [editorial] Clarify that override declarations must be at module scope (#4529) --- wgsl/index.bs | 2 ++ 1 file changed, 2 insertions(+) diff --git a/wgsl/index.bs b/wgsl/index.bs index 43c83c8fd7..d9875fbdc6 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -4696,6 +4696,8 @@ be used via type inference. An override-declaration specifies a name for a [=pipeline-overridable=] constant value. +An override-declaration [=shader-creation error|must=] only be declared at +[=module scope=]. The value of a pipeline-overridable constant is fixed at [=pipeline creation|pipeline-creation time=]. The value is one provided by the WebGPU pipeline-creation method, if From c187604b0964efdfe5486a760ad28374c672e00e Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Fri, 22 Mar 2024 17:49:37 -0700 Subject: [PATCH 009/285] [editorial] Clarify that texture copy ranges are 'physical' (#4530) This makes it a little less buried how to do image copies with mip-mapped compressed textures. --- spec/sections/copies.bs | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/spec/sections/copies.bs b/spec/sections/copies.bs index f57b2554f7..9de125f659 100644 --- a/spec/sections/copies.bs +++ b/spec/sections/copies.bs @@ -179,7 +179,7 @@ dictionary GPUImageCopyTexture { |imageCopyTexture|.{{GPUImageCopyTexture/texture}}.{{GPUTexture/mipLevelCount}}. - |imageCopyTexture|.{{GPUImageCopyTexture/origin}}.[=GPUOrigin3D/x=] must be a multiple of |blockWidth|. - |imageCopyTexture|.{{GPUImageCopyTexture/origin}}.[=GPUOrigin3D/y=] must be a multiple of |blockHeight|. 
- - The [=imageCopyTexture subresource size=] of |imageCopyTexture| is equal to |copySize| if either of + - The [=imageCopyTexture physical subresource size=] of |imageCopyTexture| is equal to |copySize| if either of the following conditions is true: - |imageCopyTexture|.{{GPUImageCopyTexture/texture}}.{{GPUTexture/format}} is a depth-stencil format. - |imageCopyTexture|.{{GPUImageCopyTexture/texture}}.{{GPUTexture/sampleCount}} > 1. @@ -312,8 +312,8 @@ dictionary GPUImageCopyExternalImage { ### Subroutines ### {#image-copies-subroutines} -
- imageCopyTexture subresource size +
+ imageCopyTexture physical subresource size **Arguments:** @@ -321,7 +321,7 @@ dictionary GPUImageCopyExternalImage { **Returns:** {{GPUExtent3D}} - The [=imageCopyTexture subresource size=] of |imageCopyTexture| is calculated as follows: + The [=imageCopyTexture physical subresource size=] of |imageCopyTexture| is calculated as follows: Its [=GPUExtent3D/width=], [=GPUExtent3D/height=] and [=GPUExtent3D/depthOrArrayLayers=] are the width, height, and depth, respectively, of the [=physical miplevel-specific texture extent=] of |imageCopyTexture|.{{GPUImageCopyTexture/texture}} [=subresource=] at [=mipmap level=] @@ -393,7 +393,7 @@ dictionary GPUImageCopyExternalImage { 1. Let |blockWidth| be the [=texel block width=] of |imageCopyTexture|.{{GPUImageCopyTexture/texture}}.{{GPUTexture/format}}. 1. Let |blockHeight| be the [=texel block height=] of |imageCopyTexture|.{{GPUImageCopyTexture/texture}}.{{GPUTexture/format}}. - 1. Let |subresourceSize| be the [=imageCopyTexture subresource size=] of |imageCopyTexture|. + 1. Let |subresourceSize| be the [=imageCopyTexture physical subresource size=] of |imageCopyTexture|. 1. Return whether all the conditions below are satisfied: - (|imageCopyTexture|.{{GPUImageCopyTexture/origin}}.[=GPUOrigin3D/x=] + |copySize|.[=GPUExtent3D/width=]) ≤ |subresourceSize|.[=GPUExtent3D/width=] @@ -401,6 +401,11 @@ dictionary GPUImageCopyExternalImage { - (|imageCopyTexture|.{{GPUImageCopyTexture/origin}}.[=GPUOrigin3D/z=] + |copySize|.[=GPUExtent3D/depthOrArrayLayers=]) ≤ |subresourceSize|.[=GPUExtent3D/depthOrArrayLayers=] - |copySize|.[=GPUExtent3D/width=] must be a multiple of |blockWidth|. - |copySize|.[=GPUExtent3D/height=] must be a multiple of |blockHeight|. + + Note: + The texture copy range is validated against the *physical* (rounded-up) + size for [=compressed formats=], allowing copies to access texture + blocks which are not fully inside the texture.
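To make the note in patch 009 concrete, the following sketch computes the physical (rounded-up) size of one dimension of a mip level and checks a copy range against it. The helper names are hypothetical, and the sketch assumes that the logical mip size is `max(1, size >> mipLevel)` and that the physical size rounds this up to a multiple of the texel block size.

```typescript
// Hypothetical helper: physical extent of one dimension (width or height)
// of a mip level, for a format with the given texel block size.
function physicalMipExtent(
  baseSize: number,
  blockSize: number,
  mipLevel: number,
): number {
  const logical = Math.max(1, baseSize >> mipLevel);
  return Math.ceil(logical / blockSize) * blockSize;
}

// Hypothetical helper: the per-dimension range check from the patch,
// origin + copySize <= subresourceSize.
function copyRangeFits(
  origin: number,
  copySize: number,
  subresourceSize: number,
): boolean {
  return origin + copySize <= subresourceSize;
}

// A 60-texel-wide texture with 4x4 blocks: mip level 2 is logically
// 15 texels wide but physically 16.
const physicalWidth = physicalMipExtent(60, 4, 2);
console.log(physicalWidth);                       // 16
console.log(copyRangeFits(12, 4, physicalWidth)); // true
```

The last call shows the case the note points at: the final block column starts at texel 12 and extends to texel 16, past the logical width of 15, yet the copy is valid because validation uses the physical size.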
From b24d76be52a240bc55361cb8b7bffaadc2af4446 Mon Sep 17 00:00:00 2001 From: Greggman Date: Tue, 26 Mar 2024 12:56:11 -0700 Subject: [PATCH 010/285] Minor typos (#4544) --- wgsl/index.bs | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index d9875fbdc6..507570ace2 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -15194,7 +15194,7 @@ Calls to these functions:
Description Returns the partial derivative of `e` with respect to window x coordinates using local differences. - This may result in fewer unique positions that `dpdxFine(e)`. + This may result in fewer unique positions than `dpdxFine(e)`. Returns an [=indeterminate value=] if called in [=uniform control flow|non-uniform control flow=].
@@ -15248,7 +15248,7 @@ Calls to these functions: Description Returns the partial derivative of `e` with respect to window y coordinates using local differences. - This may result in fewer unique positions that `dpdyFine(e)`. + This may result in fewer unique positions than `dpdyFine(e)`. Returns an [=indeterminate value=] if called in [=uniform control flow|non-uniform control flow=]. From 649cfc15d5c67e4f10f75a29036aa8168ff327dd Mon Sep 17 00:00:00 2001 From: Greggman Date: Tue, 26 Mar 2024 14:39:32 -0700 Subject: [PATCH 011/285] Fix some typos (#4543) --- proposals/compatibility-mode.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/proposals/compatibility-mode.md b/proposals/compatibility-mode.md index b5db19f53a..a7f4048f0e 100644 --- a/proposals/compatibility-mode.md +++ b/proposals/compatibility-mode.md @@ -2,7 +2,7 @@ This proposal is **under active development, but has not been standardized for inclusion in the WebGPU specification**. WebGPU implementations **must not** expose this functionality; doing so is a spec violation. Note however, an implementation might provide an option (e.g. command line flag) to enable a draft implementation, for developers who want to test this proposal. -The changes merged into this document are those for which the GPU for the Web Community Group has achieved **tentative** consensus prior to official standardization of the whole propsal. New items will be added to this doc as tentative consensus on further issues is achieved. +The changes merged into this document are those for which the GPU for the Web Community Group has achieved **tentative** consensus prior to official standardization of the whole proposal. New items will be added to this doc as tentative consensus on further issues is achieved. 
## Problem @@ -10,7 +10,7 @@ WebGPU is a good match for modern explicit graphics APIs such as Vulkan, Metal a ## Goals -The primary goal of WebGPU Compatibility mode is to increase the reach of WebGPU by providing an opt-in, slightly restricted subset of WebGPU which will run on older APIs such as D3D11 and OpenGL ES. The set of restrictions in Compatibility mode should be kept to a minimum in order to make it easy to port exsting WebGPU applications. This will increase adoption of WebGPU applications via a wider userbase. +The primary goal of WebGPU Compatibility mode is to increase the reach of WebGPU by providing an opt-in, slightly restricted subset of WebGPU which will run on older APIs such as D3D11 and OpenGL ES. The set of restrictions in Compatibility mode should be kept to a minimum in order to make it easy to port existing WebGPU applications. This will increase adoption of WebGPU applications via a wider userbase. Since WebGPU Compatibility mode is a subset of WebGPU, all valid Compatibility mode applications are also valid WebGPU applications. Consequently, Compatibility mode applications will also run on user agents which do not support Compatibility mode. Such user agents will simply ignore the option requesting a Compatibility mode Adapter and return a Core WebGPU Adapter instead. @@ -79,7 +79,7 @@ Each `GPUColorTargetState` in a `GPUFragmentState` must have the same `blend.alp ### 5. Views of the same texture used in a single draw may not differ in mip levels. -A draw call may not bind two views of the same texture differing in `baseMipLevel` or `mipLevelCount`. Only a single mip level range range per texture is supported. This is enforced via validation at draw time. +A draw call may not bind two views of the same texture differing in `baseMipLevel` or `mipLevelCount`. Only a single mip level range per texture is supported. This is enforced via validation at draw time. 
**Justification**: OpenGL ES does not support texture views, but one mip level subset may be specified per texture using `glTexParameter*()` via the `GL_TEXTURE_BASE_LEVEL` and `GL_TEXTURE_MAX_LEVEL` parameters. @@ -105,9 +105,9 @@ Calls to `createTexture()` or `createBindGroupLayout()` with this combination ca ### 9. Depth bias clamp must be zero. -During createRenderPipeline(), GPUDepthStencilState.depthBiasClamp must be zero, or a validation error occurs. +During `createRenderPipeline()` and `createRenderPipelineAsync()`, `GPUDepthStencilState.depthBiasClamp` must be zero, or a validation error occurs. -**Justification**: GLSL ES 3.1 does not support glPolygonOffsetClamp(). +**Justification**: GLSL ES 3.1 does not support `glPolygonOffsetClamp()`. ### 10. Lower limits. @@ -180,4 +180,4 @@ create cube maps that are not exactly 6 layers. Q: OpenGL ES does not have "coarse" and "fine" variants of the derivative instructions (`dFdx()`, `dFdy()`, `fwidth()`). Should WGSL's "fine" derivatives (`dpdxFine()`, `dpdyFine()`, and `fwidthFine()`) be required to deliver high precision results? See [Issue 4325](https://github.com/gpuweb/gpuweb/issues/4325). -A: Unclear. In SPIR-V, Fine variants must include the value of P for the local fragment, while Coarse variants do not. WGSL is less constraining, and simply says that Coarse "may result in fewer unique positions that dpdxFine(e)." +A: Unclear. In SPIR-V, Fine variants must include the value of P for the local fragment, while Coarse variants do not. WGSL is less constraining, and simply says that Coarse "may result in fewer unique positions than `dpdxFine(e)`." 
From a401f7f8156fc1c395582f7aabb35905fee846f4 Mon Sep 17 00:00:00 2001 From: Greggman Date: Tue, 26 Mar 2024 14:39:52 -0700 Subject: [PATCH 012/285] Compat: Disallow bgra8unorm-srgb textures (#4542) See https://github.com/gpuweb/gpuweb/issues/4514 --- proposals/compatibility-mode.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/proposals/compatibility-mode.md b/proposals/compatibility-mode.md index a7f4048f0e..82860ecc39 100644 --- a/proposals/compatibility-mode.md +++ b/proposals/compatibility-mode.md @@ -176,6 +176,10 @@ When creating a texture you can pass in a `textureBindingViewDimension`. **Justification**: OpenGL ES 3.1 cannot create 2d textures with more than 1 layer nor can it create cube maps that are not exactly 6 layers. +## 15. Disallow bgra8unorm-srgb textures + +**Justification**: OpenGL ES 3.1 does not support bgra8unorm-srgb textures. + ## Issues Q: OpenGL ES does not have "coarse" and "fine" variants of the derivative instructions (`dFdx()`, `dFdy()`, `fwidth()`). Should WGSL's "fine" derivatives (`dpdxFine()`, `dpdyFine()`, and `fwidthFine()`) be required to deliver high precision results? See [Issue 4325](https://github.com/gpuweb/gpuweb/issues/4325). From 51e3223777f90c05a0154491e1761d6658d13c8d Mon Sep 17 00:00:00 2001 From: Teodor Tanasoaia <28601907+teoxoy@users.noreply.github.com> Date: Wed, 27 Mar 2024 21:32:36 +0100 Subject: [PATCH 013/285] Increase `maxInterStageShaderComponents` to 64 (#4517) --- spec/index.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/index.bs b/spec/index.bs index 246334bd94..9a8601191f 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -1791,7 +1791,7 @@ A supported limits object has a value for every limit defined by when creating a {{GPURenderPipeline}}. 
maxInterStageShaderComponents - {{GPUSize32}} [=limit class/maximum=] 60 + {{GPUSize32}} [=limit class/maximum=] 64 The maximum allowed number of components of input or output variables for inter-stage communication (like vertex outputs or fragment inputs). From bb2f972a216c092321fc69547e3fa7601e454b1a Mon Sep 17 00:00:00 2001 From: alan-baker Date: Sun, 31 Mar 2024 19:28:39 -0400 Subject: [PATCH 014/285] Make shader-creation errors possible for array sizes (#4552) --- wgsl/index.bs | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index 507570ace2..0b32ae4f17 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -2800,9 +2800,13 @@ An expression [=shader-creation error|must not=] evaluate to a runtime-sized arr The element count expression |N| of a fixed-size array is subject to the following constraints: * It [=shader-creation error|must=] be an [=override-expression=]. * It [=shader-creation error|must=] evaluate to a [=type/concrete=] [=integer scalar=]. -* It is a [=pipeline-creation error=] if |N| is not greater than zero. +* If |N| is not greater than zero: + * It is a [=shader-creation error=] if |N| is a [=const-expression=]. + * Otherwise, it is a [=pipeline-creation error=]. -Note: The element count value is fully determined at [=pipeline creation=] time. +Note: The element count value is fully determined at [=pipeline creation=] +time if |N| depends on any [=override-declarations=], and [=shader module creation=] +otherwise. Note: To qualify for type-equivalency, any override expression that is not a const expression must be an **identifier**. 
See Workgroup variables sized by overridable constants From 1fae0df0f9fb62a75ffaf9afd87505799f1c3936 Mon Sep 17 00:00:00 2001 From: Greggman Date: Tue, 2 Apr 2024 11:02:56 +0900 Subject: [PATCH 015/285] Specify copy rules for different canvas contexts (#4366) * Specify copy rules for more external types WebGL contexts use context.drawingBufferWidth and context.drawingBufferHeight. ImageBitmapRenderingContexts use their internal "output bitmap"'s width and height (which are opaque to the app) but the app itself added the ImageBitmap to the context and so had its chance to know the size. HTMLImageElement uses naturalWidth and naturalHeight ImageData uses width, height * Switch to one table cell for both dimensions For HTMLImageElement it looks like naturalWidth, naturalHeight are supposed to always work. --- spec/index.bs | 10 +++++++--- spec/sections/copies.bs | 40 ++++++++++++++++++++++++++-------------- 2 files changed, 33 insertions(+), 17 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 9a8601191f..ccd55b2670 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -73,9 +73,13 @@ spec: WebCodecs; urlPrefix: https://www.w3.org/TR/webcodecs/# text: Close VideoFrame; url: close-videoframe spec: WEBGL-1; urlPrefix: https://www.khronos.org/registry/webgl/specs/latest/1.0/# type: interface - text: WebGLRenderingContext; url: WEBGLRENDERINGCONTEXT - type: attribute; for: WebGLRenderingContext + text: WebGLRenderingContextBase; url: WEBGLRENDERINGCONTEXTBASE + type: attribute; for: WebGLRenderingContextBase text: drawingBufferColorSpace; url: DOM-WebGLRenderingContext-drawingBufferColorSpace + type: attribute; for: WebGLRenderingContextBase + text: drawingBufferWidth; url: DOM-WebGLRenderingContext-drawingBufferWidth + type: attribute; for: WebGLRenderingContextBase + text: drawingBufferHeight; url: DOM-WebGLRenderingContext-drawingBufferHeight type: dictionary text: WebGLContextAttributes; url: WEBGLCONTEXTATTRIBUTES type: dfn @@ -2198,7 +2202,7 @@ For
various image sources of {{GPUImageCopyExternalImage}}: - Color space is controlled via the {{CanvasRenderingContext2DSettings/colorSpace}} context creation attribute. - WebGL canvas: - Premultiplication is controlled via the `premultipliedAlpha` option in {{WebGLContextAttributes}}. - - Color space is controlled via the {{WebGLRenderingContext}}'s {{WebGLRenderingContext/drawingBufferColorSpace}} state. + - Color space is controlled via the {{WebGLRenderingContextBase}}'s {{WebGLRenderingContextBase/drawingBufferColorSpace}} state. Note: Check browser implementation support for these features before relying on them. diff --git a/spec/sections/copies.bs b/spec/sections/copies.bs index 9de125f659..4920ad7103 100644 --- a/spec/sections/copies.bs +++ b/spec/sections/copies.bs @@ -268,30 +268,42 @@ dictionary GPUImageCopyExternalImage { Source type - Width - Height + Dimensions {{ImageBitmap}} - {{ImageBitmap/width|ImageBitmap.width}} - {{ImageBitmap/height|ImageBitmap.height}} + {{ImageBitmap/width|ImageBitmap.width}}, + {{ImageBitmap/height|ImageBitmap.height}} + + {{HTMLImageElement}} + {{HTMLImageElement/naturalWidth|HTMLImageElement.naturalWidth}}, + {{HTMLImageElement/naturalHeight|HTMLImageElement.naturalHeight}} {{HTMLVideoElement}} - [=video/intrinsic width|intrinsic width of the frame=] - [=video/intrinsic height|intrinsic height of the frame=] + [=video/intrinsic width|intrinsic width of the frame=], + [=video/intrinsic height|intrinsic height of the frame=] {{VideoFrame}} - {{VideoFrame/codedWidth|VideoFrame.codedWidth}} - {{VideoFrame/codedHeight|VideoFrame.codedHeight}} + {{VideoFrame/codedWidth|VideoFrame.codedWidth}}, + {{VideoFrame/codedHeight|VideoFrame.codedHeight}} + + {{ImageData}} + {{ImageData/width|ImageData.width}}, + {{ImageData/height|ImageData.height}} + + {{HTMLCanvasElement}} or {{OffscreenCanvas}} with {{CanvasRenderingContext2D}} or {{GPUCanvasContext}} + {{HTMLCanvasElement/width|HTMLCanvasElement.width}}, + 
{{HTMLCanvasElement/height|HTMLCanvasElement.height}} - {{HTMLCanvasElement}} - {{HTMLCanvasElement/width|HTMLCanvasElement.width}} - {{HTMLCanvasElement/height|HTMLCanvasElement.height}} + {{HTMLCanvasElement}} or {{OffscreenCanvas}} with {{WebGLRenderingContextBase}} + {{WebGLRenderingContextBase/drawingBufferWidth|WebGLRenderingContextBase.drawingBufferWidth}}, + {{WebGLRenderingContextBase/drawingBufferHeight|WebGLRenderingContextBase.drawingBufferHeight}} - {{OffscreenCanvas}} - {{OffscreenCanvas/width|OffscreenCanvas.width}} - {{OffscreenCanvas/height|OffscreenCanvas.height}} + {{HTMLCanvasElement}} or {{OffscreenCanvas}} with {{ImageBitmapRenderingContext}} + {{ImageBitmapRenderingContext}}'s internal output bitmap + {{ImageBitmap/width|ImageBitmap.width}}, + {{ImageBitmap/height|ImageBitmap.height}} From 3bcc6f599edff0a753c6163742f0b13c53523701 Mon Sep 17 00:00:00 2001 From: Mehmet Oguz Derin Date: Wed, 10 Apr 2024 02:53:49 +0900 Subject: [PATCH 016/285] Move Attribute Values out of Enumerants (#4559) --- wgsl/index.bs | 84 +++++++++++-------- wgsl/syntax.bnf | 18 +++- wgsl/syntax/attribute.syntax.bs.include | 6 +- .../builtin_value_name.syntax.bs.include | 5 ++ ...nterpolate_sampling_name.syntax.bs.include | 5 ++ .../interpolate_type_name.syntax.bs.include | 5 ++ wgsl/wgsl.recursive.bs.include | 6 +- 7 files changed, 86 insertions(+), 43 deletions(-) create mode 100644 wgsl/syntax/builtin_value_name.syntax.bs.include create mode 100644 wgsl/syntax/interpolate_sampling_name.syntax.bs.include create mode 100644 wgsl/syntax/interpolate_type_name.syntax.bs.include diff --git a/wgsl/index.bs b/wgsl/index.bs index 0b32ae4f17..b3fadea6db 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -1312,6 +1312,18 @@ The spelling of the token may be the same as an [=identifier=], but the token do Section [[#context-dependent-name-tokens]] lists all such tokens. 
+## Built-in Value Names ## {#builtin-value-names} + +A built-in value name-token is a [=token=] used in the name of a [=built-in value=]. +The spelling of the token may be the same as an [=identifier=] but does not [=resolves|resolve to=] a declared object. +The token must not be a [=keyword=] or [=reserved word=]. + +See [[#builtin-inputs-outputs]]. + +
+path: syntax/builtin_value_name.syntax.bs.include
+
+ ## Diagnostic Rule Names ## {#diagnostic-rule-names} A diagnostic name-token is a [=token=] used in the name of a diagnostic [=diagnostic/triggering rule=]. @@ -1324,6 +1336,30 @@ See [[#diagnostics]]. path: syntax/diagnostic_name_token.syntax.bs.include +## Interpolation Type Names ## {#interpolation-type-names} + +An interpolation type name-token is a [=token=] used in the name of an [=interpolation type=]. +The spelling of the token may be the same as an [=identifier=] but does not [=resolves|resolve to=] a declared object. +The token must not be a [=keyword=] or [=reserved word=]. + +See [[#interpolation]]. + +
+path: syntax/interpolate_type_name.syntax.bs.include
+
+ +## Interpolation Sampling Names ## {#interpolation-sampling-names} + +An interpolation sampling name-token is a [=token=] used in the name of an [=interpolation sampling=]. +The spelling of the token may be the same as an [=identifier=] but does not [=resolves|resolve to=] a declared object. +The token must not be a [=keyword=] or [=reserved word=]. + +See [[#interpolation]]. + +
+path: syntax/interpolate_sampling_name.syntax.bs.include
+
+ ## Template Lists ## {#template-lists-sec} Template parameterization is a way to specify parameters that modify a general concept. @@ -1771,7 +1807,7 @@ For example, WGSL predeclares: * [=built-in functions=], * built-in types such as [=i32=] and [=f32=], * built-in [=type-generators=] such as `array`, `ptr`, and `texture_2d`, and -* [=enumerants=] such as [=access/read_write=], [=interpolation type/perspective=], and [=texel format/rgba8unorm=]. +* [=enumerants=] such as [=access/read_write=], [=address spaces/workgroup=], and [=texel format/rgba8unorm=]. The scope of a declaration is the set of program source locations where a declared identifier potentially denotes @@ -3134,27 +3170,6 @@ The enumeration types exist, but cannot be spelled in WGSL source. [=address spaces/workgroup=] [=address spaces/uniform=] [=address spaces/storage=] - [=interpolation type=] - [=interpolation type/perspective=] - [=interpolation type/linear=] - [=interpolation type/flat=] - [=interpolation sampling=] - [=interpolation sampling/center=] - [=interpolation sampling/centroid=] - [=interpolation sampling/sample=] - [=built-in value=] - [=built-in values/vertex_index=] - [=built-in values/instance_index=] - [=built-in values/position=] - [=built-in values/front_facing=] - [=built-in values/frag_depth=] - [=built-in values/local_invocation_id=] - [=built-in values/local_invocation_index=] - [=built-in values/global_invocation_id=] - [=built-in values/workgroup_id=] - [=built-in values/num_workgroups=] - [=built-in values/sample_index=] - [=built-in values/sample_mask=] [=texel format=] [=texel format/rgba8unorm=] [=texel format/rgba8snorm=] @@ -8267,11 +8282,11 @@ Unless explicitly permitted below, an attribute [=shader-creation error|must not See [[#resource-interface]]. `builtin` - [=shader-creation error|Must=] be an [=enumerant=] for a [=built-in value=]. + [=shader-creation error|Must=] be a [=built-in value name-token=] for a [=built-in value=]. 
[=shader-creation error|Must=] only be applied to an entry point function parameter, entry point return type, or member of a [=structure=]. - Specifies that the associated object is a built-in value, as denoted by the specified [=enumerant=]. + Specifies that the associated object is a built-in value, as denoted by the specified [=token=]. See [[#builtin-inputs-outputs]]. `const` @@ -8317,10 +8332,11 @@ Unless explicitly permitted below, an attribute [=shader-creation error|must not `interpolate` One or two parameters. - The first parameter [=shader-creation error|must=] be an [=enumerant=] for an [=interpolation type=]. + The first parameter [=shader-creation error|must=] be an + [=interpolation type name-token=] for an [=interpolation type=]. The second parameter, if present, [=shader-creation error|must=] be - an [=enumerant=] for the [=interpolation sampling=]. + an [=interpolation sampling name-token=] for the [=interpolation sampling=]. [=shader-creation error|Must=] only be applied to a declaration that has a [=attribute/location=] attribute applied. @@ -9210,23 +9226,23 @@ the [=attribute/interpolate=] attribute. WGSL offers two aspects of interpolation to control: the type of interpolation, and the sampling of the interpolation. -The interpolation type [=shader-creation error|must=] be one of the following [=predeclared=] [=enumerants=]: -: perspective +The interpolation type [=shader-creation error|must=] be one of the following [=predeclared=] [=enumerants=]: +: perspective :: Values are interpolated in a perspective correct manner. -: linear +: linear :: Values are interpolated in a linear, non-perspective correct manner. -: flat +: flat :: Values are not interpolated. Interpolation sampling is not used with `flat` interpolation. 
-The interpolation sampling [=shader-creation error|must=] be one of the following [=predeclared=] [=enumerants=]: -: center +The interpolation sampling [=shader-creation error|must=] be one of the following [=predeclared=] [=enumerants=]: +: center :: Interpolation is performed at the center of the pixel. -: centroid +: centroid :: Interpolation is performed at a point that lies within all the samples covered by the fragment within the current primitive. This value is the same for all samples in the primitive. -: sample +: sample :: Interpolation is performed per sample. The [=fragment=] shader is invoked once per sample when this attribute is applied. diff --git a/wgsl/syntax.bnf b/wgsl/syntax.bnf index 450049a153..36d9ecaad7 100644 --- a/wgsl/syntax.bnf +++ b/wgsl/syntax.bnf @@ -104,13 +104,13 @@ template_arg_expression : attribute : '@' 'align' '(' expression attrib_end | '@' 'binding' '(' expression attrib_end -| '@' 'builtin' '(' expression attrib_end +| '@' 'builtin' '(' builtin_value_name attrib_end | '@' 'const' | '@' 'diagnostic' diagnostic_control | '@' 'group' '(' expression attrib_end | '@' 'id' '(' expression attrib_end -| '@' 'interpolate' '(' expression attrib_end -| '@' 'interpolate' '(' expression ',' expression attrib_end +| '@' 'interpolate' '(' interpolate_type_name attrib_end +| '@' 'interpolate' '(' interpolate_type_name ',' interpolate_sampling_name attrib_end | '@' 'invariant' | '@' 'location' '(' expression attrib_end | '@' 'must_use' @@ -127,10 +127,22 @@ attrib_end : ',' ? 
')' ; +builtin_value_name : + ident_pattern_token +; + diagnostic_control : '(' severity_control_name ',' diagnostic_rule_name attrib_end ; +interpolate_type_name : + ident_pattern_token +; + +interpolate_sampling_name : + ident_pattern_token +; + struct_decl : 'struct' ident struct_body_decl ; diff --git a/wgsl/syntax/attribute.syntax.bs.include b/wgsl/syntax/attribute.syntax.bs.include index a077aae467..883c8d7fd9 100644 --- a/wgsl/syntax/attribute.syntax.bs.include +++ b/wgsl/syntax/attribute.syntax.bs.include @@ -5,7 +5,7 @@ | `'@'` `'binding'` `'('` [=syntax/expression=] [=syntax/attrib_end=] - | `'@'` `'builtin'` `'('` [=syntax/expression=] [=syntax/attrib_end=] + | `'@'` `'builtin'` `'('` [=syntax/builtin_value_name=] [=syntax/attrib_end=] | `'@'` `'const'` @@ -15,9 +15,9 @@ | `'@'` `'id'` `'('` [=syntax/expression=] [=syntax/attrib_end=] - | `'@'` `'interpolate'` `'('` [=syntax/expression=] [=syntax/attrib_end=] + | `'@'` `'interpolate'` `'('` [=syntax/interpolate_type_name=] [=syntax/attrib_end=] - | `'@'` `'interpolate'` `'('` [=syntax/expression=] `','` [=syntax/expression=] [=syntax/attrib_end=] + | `'@'` `'interpolate'` `'('` [=syntax/interpolate_type_name=] `','` [=syntax/interpolate_sampling_name=] [=syntax/attrib_end=] | `'@'` `'invariant'` diff --git a/wgsl/syntax/builtin_value_name.syntax.bs.include b/wgsl/syntax/builtin_value_name.syntax.bs.include new file mode 100644 index 0000000000..51542e6439 --- /dev/null +++ b/wgsl/syntax/builtin_value_name.syntax.bs.include @@ -0,0 +1,5 @@ +
+ builtin_value_name : + + [=syntax/ident_pattern_token=] +
diff --git a/wgsl/syntax/interpolate_sampling_name.syntax.bs.include b/wgsl/syntax/interpolate_sampling_name.syntax.bs.include new file mode 100644 index 0000000000..db8b5532fb --- /dev/null +++ b/wgsl/syntax/interpolate_sampling_name.syntax.bs.include @@ -0,0 +1,5 @@ +
+ interpolate_sampling_name : + + [=syntax/ident_pattern_token=] +
diff --git a/wgsl/syntax/interpolate_type_name.syntax.bs.include b/wgsl/syntax/interpolate_type_name.syntax.bs.include new file mode 100644 index 0000000000..c355328507 --- /dev/null +++ b/wgsl/syntax/interpolate_type_name.syntax.bs.include @@ -0,0 +1,5 @@ +
+ interpolate_type_name : + + [=syntax/ident_pattern_token=] +
diff --git a/wgsl/wgsl.recursive.bs.include b/wgsl/wgsl.recursive.bs.include index bf2bd92b5f..f47d837e2c 100644 --- a/wgsl/wgsl.recursive.bs.include +++ b/wgsl/wgsl.recursive.bs.include @@ -27,7 +27,7 @@ | `'@'` `'binding'` `'('` [=recursive descent syntax/expression=] `','` ? `')'` - | `'@'` `'builtin'` `'('` [=recursive descent syntax/expression=] `','` ? `')'` + | `'@'` `'builtin'` `'('` [=syntax/ident_pattern_token=] `','` ? `')'` | `'@'` `'compute'` @@ -41,9 +41,9 @@ | `'@'` `'id'` `'('` [=recursive descent syntax/expression=] `','` ? `')'` - | `'@'` `'interpolate'` `'('` [=recursive descent syntax/expression=] `','` ? `')'` + | `'@'` `'interpolate'` `'('` [=syntax/ident_pattern_token=] `','` ? `')'` - | `'@'` `'interpolate'` `'('` [=recursive descent syntax/expression=] `','` [=recursive descent syntax/expression=] `','` ? `')'` + | `'@'` `'interpolate'` `'('` [=syntax/ident_pattern_token=] `','` [=syntax/ident_pattern_token=] `','` ? `')'` | `'@'` `'invariant'` From 5ad4179f23d9fcfe0a12574c35630d240bf0106e Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Tue, 9 Apr 2024 14:36:27 -0700 Subject: [PATCH 017/285] Specify that writeTexture must fail if the texture is destroyed (#4567) --- spec/index.bs | 1 + 1 file changed, 1 insertion(+) diff --git a/spec/index.bs b/spec/index.bs index ccd55b2670..3034f8bafa 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -12738,6 +12738,7 @@ GPUQueue includes GPUObjectBase; [$generate a validation error$] and stop.
+ - |texture|.{{GPUTexture/[[destroyed]]}} is `false`. - [$validating GPUImageCopyTexture$](|destination|, |size|) returns `true`. - |texture|.{{GPUTexture/usage}} includes {{GPUTextureUsage/COPY_DST}}. - |texture|.{{GPUTexture/sampleCount}} is 1. From 8bb5ca9907dd3ef4914d12136323c925128a3293 Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Thu, 11 Apr 2024 22:27:52 -0700 Subject: [PATCH 018/285] Clarify behavior of out-of-gamut canvas (#4476) It's OK if out-of-gamut intermediate values are written to the canvas, it only matters what's there when it gets presented. --- spec/index.bs | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/spec/index.bs b/spec/index.bs index 3034f8bafa..8b62bdee74 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -13755,7 +13755,8 @@ is being composited into (e.g. an HTML page rendering, or a 2D canvas). Read RGBA as premultiplied: color values are premultiplied by their alpha value. 100% red at 50% alpha is `[0.5, 0, 0, 0.5]`. - If [=out-of-gamut premultiplied RGBA values=] are output to the canvas, and the canvas is: + If the canvas texture contains [=out-of-gamut premultiplied RGBA values=] at the time the + canvas contents are read, the behavior depends on whether the canvas is:
: [$get a copy of the image contents of a context|used as an image source$] @@ -13763,6 +13764,8 @@ is being composited into (e.g. an HTML page rendering, or a 2D canvas). : displayed to the screen :: Compositing results are undefined. + + Note: This is true even if color space conversion would produce in-gamut values before compositing, because the intermediate format for compositing is not specified.
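Reviewer sketch for PATCH 018 (not part of the patch): the hunk above links the term "out-of-gamut premultiplied RGBA values" without restating its definition. The snippet below assumes the term means a premultiplied value in which some color channel exceeds alpha, which is consistent with the `[0.5, 0, 0, 0.5]` example in the context lines. The helper names `premultiply` and `isOutOfGamutPremultiplied` are made up for illustration and are not WebGPU API.

```javascript
// Premultiply an unpremultiplied [r, g, b, a] color (channels in 0..1).
function premultiply([r, g, b, a]) {
  return [r * a, g * a, b * a, a];
}

// Assumption: a premultiplied value counts as out of gamut when any color
// channel exceeds alpha, i.e. un-premultiplying would give a channel above 1.
function isOutOfGamutPremultiplied([r, g, b, a]) {
  return r > a || g > a || b > a;
}

// 100% red at 50% alpha, as in the spec text quoted in the hunk.
const halfRed = premultiply([1, 0, 0, 0.5]);
console.log(halfRed, isOutOfGamutPremultiplied(halfRed));

// A value a shader could write as an intermediate result; under the clarified
// wording it only matters if it is still in the texture when the canvas is read.
console.log(isOutOfGamutPremultiplied([1, 0, 0, 0.5]));
```

Under that assumption, a page that wants defined compositing results would make sure no such value remains in the canvas texture by the time its contents are read.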
From 3e3cb27cbe36491b160c758aa9416dec7367f7ec Mon Sep 17 00:00:00 2001 From: Greggman Date: Sat, 13 Apr 2024 02:21:21 +0900 Subject: [PATCH 019/285] Compat: disallow textureLoad with depth textures (#4564) * Compat: disallow textureLoad with depth textures --- proposals/compatibility-mode.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/proposals/compatibility-mode.md b/proposals/compatibility-mode.md index 82860ecc39..890c90eb77 100644 --- a/proposals/compatibility-mode.md +++ b/proposals/compatibility-mode.md @@ -180,6 +180,16 @@ create cube maps that are not exactly 6 layers. **Justification**: OpenGL ES 3.1 does not support bgra8unorm-srgb textures. +## 16. Disallow `textureLoad` with `texture_depth?` textures + +If a `texture_depth`, `texture_depth_2d_array`, or `texture_depth_cube` are used in a `textureLoad` call via an entry point +in a shader module passed to `createRenderPipeline`, `createRenderPipelineAsync`, +`createComputePipeline`, or `createComputePipelineAsync` a validation error is generated. + +**Justification**: OpenGL ES 3.1 does not support `texelFetch` for depth textures. + +Note: this does not affect textures made with depth formats bound to `texture_2d`. + ## Issues Q: OpenGL ES does not have "coarse" and "fine" variants of the derivative instructions (`dFdx()`, `dFdy()`, `fwidth()`). Should WGSL's "fine" derivatives (`dpdxFine()`, `dpdyFine()`, and `fwidthFine()`) be required to deliver high precision results? See [Issue 4325](https://github.com/gpuweb/gpuweb/issues/4325). 
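Reviewer sketch for PATCH 019 (not part of the patch): one way the new restriction might be observed from script. It assumes the rule covers `texture_depth_2d`, as the `texture_depth?` heading suggests, and that the caller already holds a `GPUDevice`. How a compatibility-mode device is obtained is left out on purpose. `tryDepthTextureLoad` and `DEPTH_LOAD_WGSL` are hypothetical names; the device methods are the standard `pushErrorScope`, `createShaderModule`, `createComputePipeline`, and `popErrorScope`.

```javascript
// WGSL that calls textureLoad on a depth texture binding.
const DEPTH_LOAD_WGSL = `
@group(0) @binding(0) var depthTex: texture_depth_2d;
@group(0) @binding(1) var<storage, read_write> dst: array<f32>;

@compute @workgroup_size(1)
fn main() {
  dst[0] = textureLoad(depthTex, vec2u(0, 0), 0);
}`;

// Hypothetical helper: creates a compute pipeline from the shader above inside
// a validation error scope and returns the promise from popErrorScope().
// That promise resolves to a GPUError if one was captured, otherwise to null.
function tryDepthTextureLoad(device) {
  device.pushErrorScope('validation');
  const module = device.createShaderModule({ code: DEPTH_LOAD_WGSL });
  device.createComputePipeline({
    layout: 'auto',
    compute: { module, entryPoint: 'main' },
  });
  return device.popErrorScope();
}
```

If the proposal is implemented as written, the promise would be expected to resolve to a validation error on a compatibility-mode device and to `null` on a core device. Per the note in the patch, a depth-format texture bound as `texture_2d<f32>` would not be affected.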
From 4301de165857335245a7208272d23c97919bfc93 Mon Sep 17 00:00:00 2001 From: Ryan Harrison Date: Tue, 16 Apr 2024 14:05:12 -0400 Subject: [PATCH 020/285] wgsl: Add a carve out for AF `fract` accuracy (#4541) Fixes #4523 --- wgsl/index.bs | 1 + 1 file changed, 1 insertion(+) diff --git a/wgsl/index.bs b/wgsl/index.bs index b3fadea6db..92bf2a2dfa 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -11929,6 +11929,7 @@ the rules in [[#floating-point-overflow]] apply. The accuracy of an [=AbstractFloat=] operation is as follows: * A correct result is required when the corresponding [=f32=] operation requires a correct result. * A [=correctly rounded=] result is required when the corresponding [=f32=] operation requires a correctly rounded result. +* `fract(x)`'s error is inherited from `x - floor(x)`, where the intermediate calculations are performed as [=AbstractFloat=] operations. * Otherwise, the error of the corresponding [=f32=] operation is an absolute error, a relative error, an error inherited from a potential implementation, or a combination of these. In this case the error of the [=AbstractFloat=] is unbounded. From b934de2997bd9ae31c88d10aa8bfef1cdbf6d730 Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Tue, 16 Apr 2024 15:21:06 -0700 Subject: [PATCH 021/285] [editorial] Update old incorrectly styled timeline boxes, clarify examples (#4571) --- spec/index.bs | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 8b62bdee74..439c71b2e8 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -1154,22 +1154,22 @@ not require multiple processes, or even multiple threads. This styling is non-normative; the specification text always describes the association.
- : Immutable value example definition + : Immutable value example term definition :: Can be used on any timeline.
- : Content-timeline example definition + : Content-timeline example term definition :: Can only be used on the content timeline.
- : Device-timeline example definition + : Device-timeline example term definition :: Can only be used on the device timeline.
- : Queue-timeline example definition + : Queue-timeline example term definition :: Can only be used on the queue timeline.
@@ -1177,20 +1177,20 @@ not require multiple processes, or even multiple threads.
Steps executed on the [=content timeline=] look like this. - [=Immutable value example definition=]. - [=Content-timeline example definition=]. + [=Immutable value example term=] usage. + [=Content-timeline example term=] usage.
Steps executed on the [=device timeline=] look like this. - [=Immutable value example definition=]. - [=Device-timeline example definition=]. + [=Immutable value example term=] usage. + [=Device-timeline example term=] usage.
Steps executed on the [=queue timeline=] look like this. - [=Immutable value example definition=]. - [=Queue-timeline example definition=]. + [=Immutable value example term=] usage. + [=Queue-timeline example term=] usage.
@@ -11666,7 +11666,7 @@ It must only be included by interfaces which also include those mixins. Issue the following steps on the [=Device timeline=] of |this|.{{GPUObjectBase/[[device]]}}: -
+
1. [$Validate the encoder state$] of |this|. If it returns false, stop. 1. If |size| is missing, set |size| to max(0, |buffer|.{{GPUBuffer/size}} - |offset|). 1. If any of the following conditions are unsatisfied, make |this| [=invalid=] and stop. @@ -11706,7 +11706,7 @@ It must only be included by interfaces which also include those mixins. Issue the following steps on the [=Device timeline=] of |this|.{{GPUObjectBase/[[device]]}}: -
+
1. [$Validate the encoder state$] of |this|. If it returns false, stop. 1. Let |bufferSize| be 0 if |buffer| is `null`, or |buffer|.{{GPUBuffer/size}} if not. 1. If |size| is missing, set |size| to max(0, |bufferSize| - |offset|). @@ -11757,7 +11757,7 @@ It must only be included by interfaces which also include those mixins. Issue the following steps on the [=Device timeline=] of |this|.{{GPUObjectBase/[[device]]}}: -
+
1. [$Validate the encoder state$] of |this|. If it returns false, stop. 1. If any of the following conditions are unsatisfied, make |this| [=invalid=] and stop. @@ -11818,7 +11818,7 @@ It must only be included by interfaces which also include those mixins. Issue the following steps on the [=Device timeline=] of |this|.{{GPUObjectBase/[[device]]}}: -
+
1. [$Validate the encoder state$] of |this|. If it returns false, stop. 1. If any of the following conditions are unsatisfied, make |this| [=invalid=] and stop. @@ -11895,7 +11895,7 @@ It must only be included by interfaces which also include those mixins. Issue the following steps on the [=Device timeline=] of |this|.{{GPUObjectBase/[[device]]}}: -
+
1. [$Validate the encoder state$] of |this|. If it returns false, stop. 1. If any of the following conditions are unsatisfied, make |this| [=invalid=] and stop. @@ -11967,7 +11967,7 @@ It must only be included by interfaces which also include those mixins. Issue the following steps on the [=Device timeline=] of |this|.{{GPUObjectBase/[[device]]}}: -
+
1. [$Validate the encoder state$] of |this|. If it returns false, stop. 1. If any of the following conditions are unsatisfied, make |this| [=invalid=] and stop. From 4cde0a69163d52e5ca86ed56400c175155d677ef Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Wed, 17 Apr 2024 11:10:52 -0700 Subject: [PATCH 022/285] [editorial] Tweak "requirements" style for clarity, in a few instances (#4570) * [editorial] Tweak "requirements" style for clarity We used to try to always write validation requirements as unordered lists, but very many of them ended up needing ordered steps in them. This tweaked style uses ordered steps while also using the clickable variable `|must|` to highlight the actual validation requirements. Also fixes some minor style things nearby. * If any are unmet * address comments --- spec/index.bs | 97 +++++++++++++++++++++++++++------------------------ 1 file changed, 51 insertions(+), 46 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 439c71b2e8..d216567bce 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -7184,7 +7184,7 @@ run the following steps: 1. Set |storageTextureLayout|.{{GPUStorageTextureBindingLayout/viewDimension}} to |resource|'s dimension. 1. If the access mode is: - +
: `read` :: Set |textureLayout|.{{GPUStorageTextureBindingLayout/access}} to {{GPUStorageTextureAccess/"read-only"}}. @@ -7354,22 +7354,21 @@ typedef double GPUPipelineConstantValue; // May represent WGSL's bool, f32, i32, To get the entry point({{GPUShaderStage}} |stage|, {{GPUProgrammableStage}} |descriptor|) - - If |descriptor|.{{GPUProgrammableStage/entryPoint}} is [=map/exists|provided=]: + 1. If |descriptor|.{{GPUProgrammableStage/entryPoint}} is [=map/exists|provided=]: - - If |descriptor|.{{GPUProgrammableStage/module}} contains an entry point + 1. If |descriptor|.{{GPUProgrammableStage/module}} contains an entry point whose name equals |descriptor|.{{GPUProgrammableStage/entryPoint}}, and whose shader stage equals |stage|, return that entry point. - - Otherwise, return `null`. + Otherwise, return `null`. - - Otherwise: + Otherwise: - - If there is exactly one entry point in |descriptor|.{{GPUProgrammableStage/module}} + 1. If there is exactly one entry point in |descriptor|.{{GPUProgrammableStage/module}} whose shader stage equals |stage|, return that entry point. - - Otherwise, return `null`. - + Otherwise, return `null`.
@@ -7381,35 +7380,36 @@ typedef double GPUPipelineConstantValue; // May represent WGSL's bool, f32, i32, - {{GPUProgrammableStage}} |descriptor| - {{GPUPipelineLayout}} |layout| - Return `true` if all requirements in the following steps are satisfied, and `false` otherwise: + All of the requirements in the following steps |must| be met. + If any are unmet, return `false`; otherwise, return `true`. - - |descriptor|.{{GPUProgrammableStage/module}} must be a [=valid=] {{GPUShaderModule}}. - - Let |entryPoint| be [$get the entry point$](|stage|, |descriptor|). - - |entryPoint| must not be `null`. - - For each |binding| that is [=statically used=] by |entryPoint|: - - [$validating shader binding$](|binding|, |layout|) must return `true`. - - For each texture and sampler [=statically used=] together by |entryPoint| in texture sampling calls: + 1. |descriptor|.{{GPUProgrammableStage/module}} |must| be a [=valid=] {{GPUShaderModule}}. + 1. Let |entryPoint| be [$get the entry point$](|stage|, |descriptor|). + 1. |entryPoint| |must| not be `null`. + 1. For each |binding| that is [=statically used=] by |entryPoint|: + - [$validating shader binding$](|binding|, |layout|) |must| return `true`. + 1. For each texture and sampler [=statically used=] together by |entryPoint| in texture sampling calls: 1. Let |texture| be the {{GPUBindGroupLayoutEntry}} corresponding to the sampled texture in the call. 1. Let |sampler| be the {{GPUBindGroupLayoutEntry}} corresponding to the used sampler in the call. 1. If |sampler|.{{GPUSamplerBindingLayout/type}} is {{GPUSamplerBindingType/"filtering"}}, - then |texture|.{{GPUTextureBindingLayout/sampleType}} must be + then |texture|.{{GPUTextureBindingLayout/sampleType}} |must| be {{GPUTextureSampleType/"float"}}. Note: {{GPUSamplerBindingType/"comparison"}} samplers can also only be used with {{GPUTextureSampleType/"depth"}} textures, because they are the only texture type that can be bound to WGSL `texture_depth_*` bindings. 
- - For each |key| → |value| in |descriptor|.{{GPUProgrammableStage/constants}}: - 1. |key| must equal the [=pipeline-overridable constant identifier string=] of + 1. For each |key| → |value| in |descriptor|.{{GPUProgrammableStage/constants}}: + 1. |key| |must| equal the [=pipeline-overridable constant identifier string=] of some [=pipeline-overridable=] constant defined in the shader module |descriptor|.{{GPUProgrammableStage/module}} by the rules defined in [=WGSL identifier comparison=]. Let the type of that constant be |T|. - 1. Converting the IDL value |value| [$to WGSL type$] |T| must not throw a {{TypeError}}. - - For each [=pipeline-overridable constant identifier string=] |key| which is + 1. Converting the IDL value |value| [$to WGSL type$] |T| |must| not throw a {{TypeError}}. + 1. For each [=pipeline-overridable constant identifier string=] |key| which is [=statically used=] by |entryPoint|: - If the pipeline-overridable constant identified by |key| [=pipeline-overridable constant default value|does not have a default value=], - |descriptor|.{{GPUProgrammableStage/constants}} must [=map/contain=] |key|. - - [=pipeline-creation error|Pipeline-creation=] [=program errors=] must not + |descriptor|.{{GPUProgrammableStage/constants}} |must| [=map/contain=] |key|. + 1. [=pipeline-creation error|Pipeline-creation=] [=program errors=] |must| not result from the rules of the [[WGSL]] specification.
@@ -7658,25 +7658,27 @@ dictionary GPUComputePipelineDescriptor |descriptor|.{{GPUPipelineDescriptorBase/layout}} is {{GPUAutoLayoutMode/"auto"}}, and |descriptor|.{{GPUPipelineDescriptorBase/layout}} otherwise. - 1. If any of the requirements in the following steps are unsatisfied, - [$generate a validation error$], make |pipeline| [=invalid=], and stop. + 1. All of the requirements in the following steps |must| be met. + If any are unmet, [$generate a validation error$], make |pipeline| [=invalid=], and stop.
- - |layout| must be [$valid to use with$] |this|. - - [$validating GPUProgrammableStage$]({{GPUShaderStage/COMPUTE}}, - |descriptor|.{{GPUComputePipelineDescriptor/compute}}, |layout|) must succeed. - - Let |entryPoint| be [$get the entry point$]({{GPUShaderStage/COMPUTE}}, |descriptor|.{{GPUComputePipelineDescriptor/compute}}). [=Assert=] |entryPoint| is not `null`. - - Let |workgroupStorageUsed| be the sum of [=roundUp=](16, [$SizeOf$](|T|)) over each + 1. |layout| |must| be [$valid to use with$] |this|. + 1. [$validating GPUProgrammableStage$]({{GPUShaderStage/COMPUTE}}, + |descriptor|.{{GPUComputePipelineDescriptor/compute}}, |layout|) |must| succeed. + 1. Let |entryPoint| be [$get the entry point$]({{GPUShaderStage/COMPUTE}}, |descriptor|.{{GPUComputePipelineDescriptor/compute}}). + + [=Assert=] |entryPoint| is not `null`. + 1. Let |workgroupStorageUsed| be the sum of [=roundUp=](16, [$SizeOf$](|T|)) over each type |T| of all variables with address space "[=address spaces/workgroup=]" [=statically used=] by |entryPoint|. - |workgroupStorageUsed| must be ≤ + |workgroupStorageUsed| |must| be ≤ |device|.limits.{{supported limits/maxComputeWorkgroupStorageSize}}. - - |entryPoint| must use ≤ + 1. |entryPoint| |must| use ≤ |device|.limits.{{supported limits/maxComputeInvocationsPerWorkgroup}} per workgroup. - - Each component of |entryPoint|'s - `workgroup_size` attribute must be ≤ the corresponding component in + 1. Each component of |entryPoint|'s + `workgroup_size` attribute |must| be ≤ the corresponding component in [|device|.limits.{{supported limits/maxComputeWorkgroupSizeX}}, |device|.limits.{{supported limits/maxComputeWorkgroupSizeY}}, |device|.limits.{{supported limits/maxComputeWorkgroupSizeZ}}]. @@ -11759,18 +11761,21 @@ It must only be included by interfaces which also include those mixins.
1. [$Validate the encoder state$] of |this|. If it returns false, stop. - 1. If any of the following conditions are unsatisfied, make |this| [=invalid=] and stop. + 1. All of the requirements in the following steps |must| be met. + If any are unmet, make |this| [=invalid=] and stop.
- - It is [$valid to draw$] with |this|. - - Let |buffers| be |this|.{{GPURenderCommandsMixin/[[pipeline]]}}.{{GPURenderPipeline/[[descriptor]]}}.{{GPURenderPipelineDescriptor/vertex}}.{{GPUVertexState/buffers}}. - - For each {{GPUIndex32}} |slot| from `0` to |buffers|.length (non-inclusive): - - If |buffers|[|slot|] is `null`, [=iteration/continue=]. - - Let |bufferSize| be |this|.{{GPURenderCommandsMixin/[[vertex_buffer_sizes]]}}[|slot|]. - - Let |stride| be |buffers|[|slot|].{{GPUVertexBufferLayout/arrayStride}}. - - Let |lastStride| be max(|attribute|.{{GPUVertexAttribute/offset}} + sizeof(|attribute|.{{GPUVertexAttribute/format}})) - for each |attribute| in |buffers|[|slot|].{{GPUVertexBufferLayout/attributes}}. - - Let |strideCount| be computed based on |buffers|[|slot|].{{GPUVertexBufferLayout/stepMode}}: + 1. It |must| be [$valid to draw$] with |this|. + 1. Let |buffers| be |this|.{{GPURenderCommandsMixin/[[pipeline]]}}.{{GPURenderPipeline/[[descriptor]]}}.{{GPURenderPipelineDescriptor/vertex}}.{{GPUVertexState/buffers}}. + 1. For each {{GPUIndex32}} |slot| from `0` to |buffers|.length (non-inclusive): + 1. If |buffers|[|slot|] is `null`, [=iteration/continue=]. + 1. Let |bufferSize| be |this|.{{GPURenderCommandsMixin/[[vertex_buffer_sizes]]}}[|slot|]. + 1. Let |stride| be |buffers|[|slot|].{{GPUVertexBufferLayout/arrayStride}}. + 1. Let |attributes| be |buffers|[|slot|].{{GPUVertexBufferLayout/attributes}} + 1. Let |lastStride| be the maximum value of + (|attribute|.{{GPUVertexAttribute/offset}} + sizeof(|attribute|.{{GPUVertexAttribute/format}})) + over each |attribute| in |attributes|, or 0 if |attributes| is [=list/empty=]. + 1. Let |strideCount| be computed based on |buffers|[|slot|].{{GPUVertexBufferLayout/stepMode}}:
: {{GPUVertexStepMode/"vertex"}} @@ -11778,8 +11783,8 @@ It must only be included by interfaces which also include those mixins. : {{GPUVertexStepMode/"instance"}} :: |firstInstance| + |instanceCount|
- - If |strideCount| ≠ `0` - - Ensure (|strideCount| − `1`) × |stride| + |lastStride| ≤ |bufferSize|. + 1. If |strideCount| ≠ `0`: + 1. (|strideCount| − `1`) × |stride| + |lastStride| |must| be ≤ |bufferSize|.
1. Increment |this|.{{GPURenderCommandsMixin/[[drawCount]]}} by 1. @@ -13170,7 +13175,7 @@ To mitigate security and privacy concerns, their precision must be reduced:
To get the current queue timestamp: - + - Let |fineTimestamp| be the current timestamp value of the current [=queue timeline=], in nanoseconds, relative to an implementation-defined point in the past. - Return the result of calling [=coarsen time=] on |fineTimestamp|. From 67d98a8170a7bb505801d5db320ac93f364312fd Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Wed, 17 Apr 2024 13:20:55 -0700 Subject: [PATCH 023/285] Issue templates: fix links, remove blank option, improve descriptions/sorting (#4575) --- .github/ISSUE_TEMPLATE/browser.md | 25 +++++++++++++++---------- .github/ISSUE_TEMPLATE/config.yml | 2 ++ .github/ISSUE_TEMPLATE/issue.md | 10 ---------- .github/ISSUE_TEMPLATE/question.md | 10 +++++----- .github/ISSUE_TEMPLATE/webgpu.md | 8 ++++++++ .github/ISSUE_TEMPLATE/wgsl.md | 4 ++-- 6 files changed, 32 insertions(+), 27 deletions(-) create mode 100644 .github/ISSUE_TEMPLATE/config.yml delete mode 100644 .github/ISSUE_TEMPLATE/issue.md create mode 100644 .github/ISSUE_TEMPLATE/webgpu.md diff --git a/.github/ISSUE_TEMPLATE/browser.md b/.github/ISSUE_TEMPLATE/browser.md index 6dd9786f5a..56e89194bd 100644 --- a/.github/ISSUE_TEMPLATE/browser.md +++ b/.github/ISSUE_TEMPLATE/browser.md @@ -1,23 +1,28 @@ --- -name: Bug -about: 'Bug - please file against relevant browser instead' +name: Browser/implementation bug +about: 'For bugs in a WebGPU implementation, please file against relevant browser/implementation instead' title: '' -labels: '' +labels: 'invalid' assignees: '' --- -**Please do not file bugs on this GitHub repository.** +**Please do not file browser/implementation bugs on this GitHub repository.** Instead, file a bug against the relevant browser. 
-Chrome: - Known issues: https://bugs.chromium.org/p/chromium/issues/list?q=component:Blink%3EWebGPU - To file a new issue: https://bugs.chromium.org/p/chromium/issues/entry?components=Blink%3EWebGPU +Chrome (or Dawn/Tint): + Known issues: + https://issues.chromium.org/savedsearches/6760928 + https://crbug.com/dawn + https://crbug.com/tint + To file a new issue: + https://issues.chromium.org/issues/new?component=1456980 WebKit: -https://bugs.webkit.org/buglist.cgi?bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&component=WebGPU + https://bugs.webkit.org/buglist.cgi?bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&component=WebGPU Firefox: -https://bugzilla.mozilla.org/buglist.cgi?product=Core&component=Graphics%3A%20WebGPU + https://bugzilla.mozilla.org/buglist.cgi?product=Core&component=Graphics%3A%20WebGPU -See also: [implementation status](https://github.com/gpuweb/gpuweb/wiki/Implementation-Status). +See also: + [Implementation Status](https://github.com/gpuweb/gpuweb/wiki/Implementation-Status) wiki page diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml new file mode 100644 index 0000000000..ea56ea0f84 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/config.yml @@ -0,0 +1,2 @@ +# "webgpu.md" is the catch-all template, we don't need a "blank issue" button +blank_issues_enabled: false diff --git a/.github/ISSUE_TEMPLATE/issue.md b/.github/ISSUE_TEMPLATE/issue.md deleted file mode 100644 index e9f2e1909b..0000000000 --- a/.github/ISSUE_TEMPLATE/issue.md +++ /dev/null @@ -1,10 +0,0 @@ ---- -name: WebGPU spec issue -about: 'Standardization/specification issue' -title: '' -labels: '' -assignees: '' - ---- - - diff --git a/.github/ISSUE_TEMPLATE/question.md b/.github/ISSUE_TEMPLATE/question.md index 8f6e199050..73da8a67eb 100644 --- a/.github/ISSUE_TEMPLATE/question.md +++ b/.github/ISSUE_TEMPLATE/question.md @@ -1,11 +1,11 @@ --- -name: WebGPU question. 
-about: 'Browser-non-specific WebGPU question' +name: Question about WebGPU +about: 'Question? Please open a GitHub "Discussion" instead' title: '' -labels: 'question' +labels: 'invalid' assignees: '' --- - -If you have a Q&A style question about using WebGPU, consider using a GitHub "Discussion". +If you have a Q&A style question about using WebGPU, please use a GitHub "Discussion": +https://github.com/gpuweb/gpuweb/discussions diff --git a/.github/ISSUE_TEMPLATE/webgpu.md b/.github/ISSUE_TEMPLATE/webgpu.md new file mode 100644 index 0000000000..cf4cc9e6ec --- /dev/null +++ b/.github/ISSUE_TEMPLATE/webgpu.md @@ -0,0 +1,8 @@ +--- +name: WebGPU spec issue, or general, wiki, etc. +about: 'WebGPU standardization/specification issue, or miscellaneous issue report' +title: '' +labels: '' +assignees: '' + +--- diff --git a/.github/ISSUE_TEMPLATE/wgsl.md b/.github/ISSUE_TEMPLATE/wgsl.md index 61dd038832..62bcbb4b63 100644 --- a/.github/ISSUE_TEMPLATE/wgsl.md +++ b/.github/ISSUE_TEMPLATE/wgsl.md @@ -1,6 +1,6 @@ --- -name: WebGPU Shading Language Issue. 
-about: 'WebGPU Shading Language Issues' +name: WGSL spec issue +about: 'WebGPU Shading Language standardization/specification issue' title: '' labels: 'wgsl' assignees: '' From 2c0a9f1161ca51af3247a3e0454b04c5392163bd Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Wed, 17 Apr 2024 13:22:31 -0700 Subject: [PATCH 024/285] Update ISSUE_TEMPLATE/browser.md --- .github/ISSUE_TEMPLATE/browser.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/ISSUE_TEMPLATE/browser.md b/.github/ISSUE_TEMPLATE/browser.md index 56e89194bd..be8992aee5 100644 --- a/.github/ISSUE_TEMPLATE/browser.md +++ b/.github/ISSUE_TEMPLATE/browser.md @@ -1,6 +1,6 @@ --- name: Browser/implementation bug -about: 'For bugs in a WebGPU implementation, please file against relevant browser/implementation instead' +about: 'For bugs in a WebGPU implementation, please file against the relevant browser/implementation instead' title: '' labels: 'invalid' assignees: '' From 3269e861ecd981d456cf37053fea3a0cf2cb904b Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Wed, 17 Apr 2024 14:46:46 -0700 Subject: [PATCH 025/285] Remove resolved TODOs (#4580) --- spec/index.bs | 4 ---- 1 file changed, 4 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index d216567bce..f4fd1536d7 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -2044,14 +2044,12 @@ interface GPUAdapterInfo { A normalized identifier string is one that follows the following pattern: `[a-z0-9]+(-[a-z0-9]+)*` -
Examples of valid normalized identifier strings include: @@ -6935,8 +6933,6 @@ There are two ways to create pipelines: GPUPipelineError describes a pipeline creation failure. - - +{{GPUExternalTextureDescriptor}} dictionaries have the following members: + +
+ : source + :: + The video source to import the external texture from. + + : colorSpace + :: + The color space the image contents of {{GPUExternalTextureDescriptor/source}} will be + converted into when reading. +
+
: importExternalTexture(descriptor) :: @@ -5601,6 +5614,14 @@ dictionary GPUBindGroupLayoutDescriptor }; +{{GPUBindGroupLayoutDescriptor}} dictionaries have the following members: + +
+ : entries + :: + A list of entries describing the shader resource bindings for a bind group. +
+ A {{GPUBindGroupLayoutEntry}} describes a single shader resource binding to be included in a {{GPUBindGroupLayout}}. -
-
- To create a new WebGPU object({{GPUObjectBase}} |parent|, - interface |T|, {{GPUObjectDescriptorBase}} |descriptor|) - (where |T| extends {{GPUObjectBase}}): - - 1. Let |device| be |parent|.{{GPUObjectBase/[[device]]}}. - 1. Let |object| be a new instance of |T|. - 1. Set |object|.{{GPUObjectBase/[[device]]}} to |device|. - 1. Set |object|.{{GPUObjectBase/label}} to |descriptor|.{{GPUObjectDescriptorBase/label}}. - 1. Return |object|. -
+
+ To create a new WebGPU object({{GPUObjectBase}} |parent|, + interface |T|, {{GPUObjectDescriptorBase}} |descriptor|) + (where |T| extends {{GPUObjectBase}}), run the following [=content timeline=] steps: + + 1. Let |device| be |parent|.{{GPUObjectBase/[[device]]}}. + 1. Let |object| be a new instance of |T|. + 1. Set |object|.{{GPUObjectBase/[[device]]}} to |device|. + 1. Set |object|.{{GPUObjectBase/label}} to |descriptor|.{{GPUObjectDescriptorBase/label}}. + 1. Return |object|.
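As an illustration only, the five steps above can be sketched in TypeScript. `DeviceSketch`, `ObjectBaseSketch`, `ObjectDescriptorBaseSketch` and `createNewWebGPUObject` are hypothetical names invented for this sketch; they are not part of the WebGPU API, and the `device` field merely models the `[[device]]` internal slot.

```typescript
// Hypothetical stand-in for the spec's internal device object.
class DeviceSketch {}

// Hypothetical stand-in for GPUObjectBase; `device` models the [[device]] slot.
class ObjectBaseSketch {
  device: DeviceSketch | null = null;
  label = "";
}

// Hypothetical stand-in for GPUObjectDescriptorBase.
interface ObjectDescriptorBaseSketch {
  label?: string;
}

function createNewWebGPUObject<T extends ObjectBaseSketch>(
  parent: ObjectBaseSketch,
  ctor: new () => T,
  descriptor: ObjectDescriptorBaseSketch,
): T {
  const device = parent.device;          // 1. Let device be parent.[[device]].
  const object = new ctor();             // 2. Let object be a new instance of T.
  object.device = device;                // 3. Set object.[[device]] to device.
  object.label = descriptor.label ?? ""; // 4. Copy the label (assumed to default to "").
  return object;                         // 5. Return object.
}
```

The point of the sketch is that the new object inherits its device from the parent rather than from the descriptor.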
{{GPUObjectBase}} has the following [=immutable properties=]: @@ -702,9 +700,9 @@ like buffer state "[=GPUBuffer/[[internal state]]/destroyed=]". [=Internal objects=] of some types *can* become [=invalid=] after they are created; specifically, [=devices=], [=adapters=], {{GPUCommandBuffer}}s, and command/pass/bundle encoders. -
+
A given {{GPUObjectBase}} |object| is valid to use with - a |targetObject| if and only if the following requirements are met: + a |targetObject| if all of the requirements in the following [=device timeline=] steps are met:
- |object| must be [=valid=]. @@ -731,13 +729,13 @@ Several operations in WebGPU return promises. WebGPU does not make any guarantees about the order in which these promises settle (resolve or reject), except for the following: --
+-
For some {{GPUQueue}} |q|, if |p1| = |q|.{{GPUQueue/onSubmittedWorkDone()}} is called before |p2| = |q|.{{GPUQueue/onSubmittedWorkDone()}}, then |p1| must settle before |p2|.
--
+-
For some {{GPUQueue}} |q| and {{GPUBuffer}} |b| on the same {{GPUDevice}}, if |p1| = |b|.{{GPUBuffer/mapAsync()}} is called before |p2| = |q|.{{GPUQueue/onSubmittedWorkDone()}}, @@ -822,6 +820,12 @@ not require multiple processes, or even multiple threads. on the compute units of the GPU. It includes actual draw, copy, and compute jobs that run on the GPU. +: Timeline-agnostic +:: Associated with any of the above timelines + + Steps may be issued to any timeline if they only operate on [=immutable properties=] or + arguments passed from the calling steps. +
The following show the styling of steps and values associated with each timeline. This styling is non-normative; the specification text always describes the association. @@ -833,20 +837,25 @@ not require multiple processes, or even multiple threads.
: Content-timeline example term definition - :: Can only be used on the content timeline. + :: Can only be used on the [=content timeline=].
: Device-timeline example term definition - :: Can only be used on the device timeline. + :: Can only be used on the [=device timeline=].
: Queue-timeline example term definition - :: Can only be used on the queue timeline. + :: Can only be used on the [=queue timeline=].
+
+ Steps which are [=timeline-agnostic=] look like this. + + [=Immutable value example term=] usage. +
Steps executed on the [=content timeline=] look like this. @@ -1135,9 +1144,9 @@ it and all objects created on it (directly, e.g. {{GPUDevice/createTexture()}}, or indirectly, e.g. {{GPUTexture/createView()}}) become implicitly [$valid to use with|unusable$]. -A [=device=] has the following internal slots: +A [=device=] has the following [=immutable properties=]: -
+
: \[[adapter]], of type [=adapter=], readonly :: The [=adapter=] from which this device was created. @@ -1153,9 +1162,9 @@ A [=device=] has the following internal slots: No [=limit/better=] limits can be used, even if the underlying [=adapter=] can support them.
-
+
When a new device |device| is created from [=adapter=] |adapter| - with {{GPUDeviceDescriptor}} |descriptor|: + with {{GPUDeviceDescriptor}} |descriptor|, run the following [=device timeline=] steps: - Set |device|.{{device/[[adapter]]}} to |adapter|. @@ -1185,7 +1194,7 @@ API tries to behave like nothing is wrong to avoid interrupting the runtime flow no validation errors are raised, most promises resolve normally, etc.
- To lose the device(|device|, |reason|): + To lose the device(|device|, |reason|) run the following [=device timeline=] steps: 1. Make |device| [=invalid=]. 1. Let |gpuDevice| be the [=content timeline=] {{GPUDevice}} corresponding to |device|. @@ -1245,7 +1254,7 @@ and using optional API surfaces results in the following: - Using a new WGSL `enable` directive always results in a {{GPUDevice/createShaderModule()}} [$validation error$]. -
+
A {{GPUFeatureName}} |feature| is enabled for a {{GPUObjectBase}} |object| if and only if |object|.{{GPUObjectBase/[[device]]}}.{{device/[[features]]}} [=list/contains=] |feature|. @@ -1682,9 +1691,9 @@ interface GPUAdapterInfo { other fields when possible.
-
+
To create a new adapter info for a given [=adapter=] |adapter|, run the - following steps: + following [=content timeline=] steps: 1. Let |adapterInfo| be a new {{GPUAdapterInfo}}. @@ -1713,7 +1722,7 @@ interface GPUAdapterInfo { 1. Return |adapterInfo|.
-
+
A normalized identifier string is one that follows the following pattern: `[a-z0-9]+(-[a-z0-9]+)*` @@ -1771,9 +1780,9 @@ For more information on issuing CORS requests for image and video elements, cons WebGPU defines a new [=task source=] called the WebGPU task source. It is used for the {{GPUDevice/uncapturederror}} event and {{GPUDevice}}.{{GPUDevice/lost}}. -
+
To queue a global task for {{GPUDevice}} |device|, - with a series of steps |steps|: + with a series of steps |steps| on the [=content timeline=]: 1. [=Queue a global task=] on the [=WebGPU task source=], with the global object that was used to create |device|, and the steps |steps|. @@ -1788,9 +1797,9 @@ It is used for the automatic, timed expiry (destruction) of certain objects: - {{GPUTexture}}s returned by {{GPUCanvasContext/getCurrentTexture()}} - {{GPUExternalTexture}}s created from {{HTMLVideoElement}}s -
+
To queue an automatic expiry task - with {{GPUDevice}} |device| and a series of steps |steps|: + with {{GPUDevice}} |device| and a series of steps |steps| on the [=content timeline=]: 1. [=Queue a global task=] on the [=automatic expiry task source=], with the global object that was used to create |device|, and the steps |steps|. @@ -1885,7 +1894,7 @@ them to WGSL values (`bool`, `i32`, `u32`, `f32`, `f16`).
To convert an IDL value |idlValue| of type {{double}} or {{float}} to WGSL type |T|, - possibly throwing a {{TypeError}}: + possibly throwing a {{TypeError}}, run the following [=device timeline=] steps: Note: This {{TypeError}} is generated in the [=device timeline=] and never surfaced to JavaScript. @@ -1936,7 +1945,7 @@ them to WGSL values (`bool`, `i32`, `u32`, `f32`, `f16`).
To convert a {{GPUColor}} |color| to a texel value of texture format |format|, - possibly throwing a {{TypeError}}: + possibly throwing a {{TypeError}}, run the following [=device timeline=] steps: Note: This {{TypeError}} is generated in the [=device timeline=] and never surfaced to JavaScript. @@ -2659,7 +2668,7 @@ Those not defined here are defined elsewhere in this document. mapped memory that was just unmapped.
-
+
A {{GPUDevice}}'s allowed buffer usages are: - Always allowed: @@ -2677,7 +2686,7 @@ Those not defined here are defined elsewhere in this document.
-
+
A {{GPUDevice}}'s allowed texture usages are: - Always allowed: @@ -2876,9 +2885,9 @@ enum GPUBufferMapState { They are tracked so they can be detached when {{GPUBuffer/unmap()}} is called. -
+
To initialize an active buffer mapping with mode |mode| and - range |range|: + range |range|, run the following [=content timeline=] steps: 1. Let |size| be |range|[1] - |range|[0]. 1. Let |data| be [=?=] [$CreateByteDataBlock$](|size|). @@ -3630,7 +3639,7 @@ GPUTexture includes GPUObjectBase; and its underlying memory can be freed. -
+
compute render extent(baseSize, mipLevel) **Arguments:** @@ -3640,6 +3649,8 @@ GPUTexture includes GPUObjectBase; **Returns:** {{GPUExtent3DDict}} + [=Device timeline=] steps: + 1. Let |extent| be a new {{GPUExtent3DDict}} object. 1. Set |extent|.{{GPUExtent3DDict/width}} to max(1, |baseSize|.[=GPUExtent3D/width=] ≫ |mipLevel|). 1. Set |extent|.{{GPUExtent3DDict/height}} to max(1, |baseSize|.[=GPUExtent3D/height=] ≫ |mipLevel|). @@ -3650,7 +3661,7 @@ GPUTexture includes GPUObjectBase; The logical miplevel-specific texture extent of a [=texture=] is the size of the [=texture=] in texels at a specific miplevel. It is calculated by this procedure: -
+
Logical miplevel-specific texture extent(descriptor, mipLevel) **Arguments:** @@ -3689,7 +3700,7 @@ The physical miplevel-specific texture extent of a [=texture=] is [=texture=] in texels at a specific miplevel that includes the possible extra padding to form complete [=texel blocks=] in the [=texture=]. It is calculated by this procedure: -
+
Physical miplevel-specific texture extent(descriptor, mipLevel) **Arguments:** @@ -3792,7 +3803,7 @@ dictionary GPUTextureDescriptor Formats in this list must be [=texture view format compatible=] with the texture format. -
+
Two {{GPUTextureFormat}}s |format| and |viewFormat| are texture view format compatible if: - |format| equals |viewFormat|, or @@ -3874,13 +3885,13 @@ The {{GPUTextureUsage}} flags determine how a {{GPUTexture}} may be used after i {{GPURenderPassDepthStencilAttachment}}.{{GPURenderPassDepthStencilAttachment/view}}.) -
- maximum mipLevel count(dimension, size) +
+ maximum mipLevel count(|dimension|, |size|) **Arguments:** - - {{GPUTextureDescriptor/dimension}} |dimension| - - {{GPUTextureDescriptor/size}} |size| + - {{GPUTextureDimension}} |dimension| + - {{GPUExtent3D}} |size| 1. Calculate the max dimension value |m|: - If |dimension| is: @@ -3952,70 +3963,79 @@ The {{GPUTextureUsage}} flags determine how a {{GPUTexture}} may be used after i
-
- validating GPUTextureDescriptor({{GPUDevice}} |this|, {{GPUTextureDescriptor}} |descriptor|): +
+ validating GPUTextureDescriptor(|this|, |descriptor|): - Return `true` if all of the following requirements are met, and `false` otherwise: + **Arguments:** - - |this| must be a [=valid=] {{GPUDevice}}. - - |descriptor|.{{GPUTextureDescriptor/usage}} must not be 0. - - |descriptor|.{{GPUTextureDescriptor/usage}} must contain only bits present in |this|'s [=allowed texture usages=]. - - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/width=], - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/height=], - and |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/depthOrArrayLayers=] must be > zero. - - |descriptor|.{{GPUTextureDescriptor/mipLevelCount}} must be > zero. - - |descriptor|.{{GPUTextureDescriptor/sampleCount}} must be either 1 or 4. - - If |descriptor|.{{GPUTextureDescriptor/dimension}} is: + - {{GPUDevice}} |this| + - {{GPUTextureDescriptor}} |descriptor| -
- : {{GPUTextureDimension/"1d"}} - :: - - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/width=] must be ≤ - |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureDimension1D}}. - - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/height=] must be 1. - - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/depthOrArrayLayers=] must be 1. - - |descriptor|.{{GPUTextureDescriptor/sampleCount}} must be 1. - - |descriptor|.{{GPUTextureDescriptor/format}} must not be a [=compressed format=] or [=depth-or-stencil format=]. + [=Device timeline=] steps: - : {{GPUTextureDimension/"2d"}} - :: - - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/width=] must be ≤ - |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureDimension2D}}. - - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/height=] must be ≤ - |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureDimension2D}}. - - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/depthOrArrayLayers=] must be ≤ - |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureArrayLayers}}. + 1. Return `true` if all of the following requirements are met, and `false` otherwise: - : {{GPUTextureDimension/"3d"}} - :: - - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/width=] must be ≤ - |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureDimension3D}}. - - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/height=] must be ≤ - |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureDimension3D}}. - - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/depthOrArrayLayers=] must be ≤ - |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureDimension3D}}. - - |descriptor|.{{GPUTextureDescriptor/sampleCount}} must be 1. - - |descriptor|.{{GPUTextureDescriptor/format}} must not be a [=compressed format=] or [=depth-or-stencil format=]. -
- - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/width=] must be multiple of [=texel block width=]. - - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/height=] must be multiple of [=texel block height=]. - - If |descriptor|.{{GPUTextureDescriptor/sampleCount}} > 1: - - |descriptor|.{{GPUTextureDescriptor/mipLevelCount}} must be 1. - - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/depthOrArrayLayers=] must be 1. - - |descriptor|.{{GPUTextureDescriptor/usage}} must not include the {{GPUTextureUsage/STORAGE_BINDING}} bit. - - |descriptor|.{{GPUTextureDescriptor/usage}} must include the {{GPUTextureUsage/RENDER_ATTACHMENT}} bit. - - |descriptor|.{{GPUTextureDescriptor/format}} must support multisampling according to [[#texture-format-caps]]. - - |descriptor|.{{GPUTextureDescriptor/mipLevelCount}} must be ≤ - [$maximum mipLevel count$](|descriptor|.{{GPUTextureDescriptor/dimension}}, |descriptor|.{{GPUTextureDescriptor/size}}). - - If |descriptor|.{{GPUTextureDescriptor/usage}} includes the {{GPUTextureUsage/RENDER_ATTACHMENT}} bit: - - |descriptor|.{{GPUTextureDescriptor/format}} must be a [=renderable format=]. - - |descriptor|.{{GPUTextureDescriptor/dimension}} must be either {{GPUTextureDimension/"2d"}} or {{GPUTextureDimension/"3d"}}. - - If |descriptor|.{{GPUTextureDescriptor/usage}} includes the {{GPUTextureUsage/STORAGE_BINDING}} bit: - - |descriptor|.{{GPUTextureDescriptor/format}} must be listed in [[#plain-color-formats]] table - with {{GPUTextureUsage/STORAGE_BINDING}} capability for the appropriate access mode. - - For each |viewFormat| in |descriptor|.{{GPUTextureDescriptor/viewFormats}}, - |descriptor|.{{GPUTextureDescriptor/format}} and |viewFormat| must be - [=texture view format compatible=]. +
+ - |this| must be a [=valid=] {{GPUDevice}}. + - |descriptor|.{{GPUTextureDescriptor/usage}} must not be 0. + - |descriptor|.{{GPUTextureDescriptor/usage}} must contain only bits present in |this|'s [=allowed texture usages=]. + - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/width=], + |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/height=], + and |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/depthOrArrayLayers=] must be > zero. + - |descriptor|.{{GPUTextureDescriptor/mipLevelCount}} must be > zero. + - |descriptor|.{{GPUTextureDescriptor/sampleCount}} must be either 1 or 4. + - If |descriptor|.{{GPUTextureDescriptor/dimension}} is: + +
+ : {{GPUTextureDimension/"1d"}} + :: + - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/width=] must be ≤ + |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureDimension1D}}. + - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/height=] must be 1. + - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/depthOrArrayLayers=] must be 1. + - |descriptor|.{{GPUTextureDescriptor/sampleCount}} must be 1. + - |descriptor|.{{GPUTextureDescriptor/format}} must not be a [=compressed format=] or [=depth-or-stencil format=]. + + : {{GPUTextureDimension/"2d"}} + :: + - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/width=] must be ≤ + |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureDimension2D}}. + - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/height=] must be ≤ + |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureDimension2D}}. + - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/depthOrArrayLayers=] must be ≤ + |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureArrayLayers}}. + + : {{GPUTextureDimension/"3d"}} + :: + - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/width=] must be ≤ + |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureDimension3D}}. + - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/height=] must be ≤ + |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureDimension3D}}. + - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/depthOrArrayLayers=] must be ≤ + |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureDimension3D}}. + - |descriptor|.{{GPUTextureDescriptor/sampleCount}} must be 1. + - |descriptor|.{{GPUTextureDescriptor/format}} must not be a [=compressed format=] or [=depth-or-stencil format=]. +
+ - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/width=] must be a multiple of [=texel block width=]. + - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/height=] must be a multiple of [=texel block height=]. + - If |descriptor|.{{GPUTextureDescriptor/sampleCount}} > 1: + - |descriptor|.{{GPUTextureDescriptor/mipLevelCount}} must be 1. + - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/depthOrArrayLayers=] must be 1. + - |descriptor|.{{GPUTextureDescriptor/usage}} must not include the {{GPUTextureUsage/STORAGE_BINDING}} bit. + - |descriptor|.{{GPUTextureDescriptor/usage}} must include the {{GPUTextureUsage/RENDER_ATTACHMENT}} bit. + - |descriptor|.{{GPUTextureDescriptor/format}} must support multisampling according to [[#texture-format-caps]]. + - |descriptor|.{{GPUTextureDescriptor/mipLevelCount}} must be ≤ + [$maximum mipLevel count$](|descriptor|.{{GPUTextureDescriptor/dimension}}, |descriptor|.{{GPUTextureDescriptor/size}}). + - If |descriptor|.{{GPUTextureDescriptor/usage}} includes the {{GPUTextureUsage/RENDER_ATTACHMENT}} bit: + - |descriptor|.{{GPUTextureDescriptor/format}} must be a [=renderable format=]. + - |descriptor|.{{GPUTextureDescriptor/dimension}} must be either {{GPUTextureDimension/"2d"}} or {{GPUTextureDimension/"3d"}}. + - If |descriptor|.{{GPUTextureDescriptor/usage}} includes the {{GPUTextureUsage/STORAGE_BINDING}} bit: + - |descriptor|.{{GPUTextureDescriptor/format}} must be listed in [[#plain-color-formats]] table + with {{GPUTextureUsage/STORAGE_BINDING}} capability for the appropriate access mode. + - For each |viewFormat| in |descriptor|.{{GPUTextureDescriptor/viewFormats}}, + |descriptor|.{{GPUTextureDescriptor/format}} and |viewFormat| must be + [=texture view format compatible=]. +
@@ -4390,9 +4410,9 @@ enum GPUTextureAspect {
-
+
When resolving GPUTextureViewDescriptor defaults for {{GPUTextureView}} - |texture| with a {{GPUTextureViewDescriptor}} |descriptor| run the following steps: + |texture| with a {{GPUTextureViewDescriptor}} |descriptor|, run the following [=device timeline=] steps: 1. Let |resolved| be a copy of |descriptor|. 1. If |resolved|.{{GPUTextureViewDescriptor/format}} is not [=map/exist|provided=]: @@ -4447,7 +4467,7 @@ enum GPUTextureAspect { 1. Return |resolved|.
-
+
To determine the array layer count of {{GPUTexture}} |texture|, run the following steps: @@ -4654,7 +4674,7 @@ A format is filterable if it suppor that is, it can be used with {{GPUSamplerBindingType/"filtering"}} {{GPUSampler}}s. See [[#texture-format-caps]]. -
+
resolving GPUTextureAspect(format, aspect) **Arguments:** @@ -4687,9 +4707,9 @@ the behavior the same as when the format is unknown to the implementation. See [[#texture-format-caps]] for information about which {{GPUTextureFormat}}s require features. -
- Validate texture format required features of a {{GPUTextureFormat}} - |format| with logical [=device=] |device| by running the following steps: +
+ To Validate texture format required features of a {{GPUTextureFormat}} |format|
+ with logical [=device=] |device|, run the following [=content timeline=] steps: 1. If |format| requires a feature and |device|.{{device/[[features]]}} does not [=list/contain=] the feature: @@ -5436,12 +5456,14 @@ type and each [=binding type=] has an associated [=internal usage=], given by th [=internal usage/constant=] -
+
The [=list=] of {{GPUBindGroupLayoutEntry}} values |entries| exceeds the binding slot limits of [=supported limits=] |limits| if the number of slots used toward a limit exceeds the supported value in |limits|. Each entry may use multiple slots toward multiple limits. + [=Device timeline=] steps: + 1. For each |entry| in |entries|, if:
@@ -5727,13 +5749,16 @@ A {{GPUBindGroupLayout}} object has the following internal slots: ### Compatibility ### {#bind-group-compatibility} -
+
Two {{GPUBindGroupLayout}} objects |a| and |b| are considered group-equivalent if and only if all of the following conditions are satisfied: - - |a|.{{GPUBindGroupLayout/[[exclusivePipeline]]}} == |b|.{{GPUBindGroupLayout/[[exclusivePipeline]]}}. - - for any binding number |binding|, one of the following conditions is satisfied: - - it's missing from both |a|.{{GPUBindGroupLayout/[[entryMap]]}} and |b|.{{GPUBindGroupLayout/[[entryMap]]}}. - - |a|.{{GPUBindGroupLayout/[[entryMap]]}}[|binding|] == |b|.{{GPUBindGroupLayout/[[entryMap]]}}[|binding|] + +
+ - |a|.{{GPUBindGroupLayout/[[exclusivePipeline]]}} == |b|.{{GPUBindGroupLayout/[[exclusivePipeline]]}}. + - for any binding number |binding|, one of the following conditions is satisfied: + - it's missing from both |a|.{{GPUBindGroupLayout/[[entryMap]]}} and |b|.{{GPUBindGroupLayout/[[entryMap]]}}. + - |a|.{{GPUBindGroupLayout/[[entryMap]]}}[|binding|] == |b|.{{GPUBindGroupLayout/[[entryMap]]}}[|binding|] +
If bind groups layouts are [=group-equivalent=] they can be interchangeably used in all contents. @@ -5769,7 +5794,7 @@ A {{GPUBindGroup}} object has the following internal slots: associated with lists of the [=internal usage=] flags.
-
+
The bound buffer ranges of a {{GPUBindGroup}} |bindGroup|, given [=list=]<GPUBufferDynamicOffset> |dynamicOffsets|, are computed as follows: @@ -6016,14 +6041,21 @@ following members:
-
- effective buffer binding size(binding) - 1. If |binding|.{{GPUBufferBinding/size}} is not [=map/exist|provided=]: - 1. Return max(0, |binding|.{{GPUBufferBinding/buffer}}.{{GPUBuffer/size}} - |binding|.{{GPUBufferBinding/offset}}); - 1. Return |binding|.{{GPUBufferBinding/size}}. +
+ effective buffer binding size(|binding|) + + **Arguments:** + + - {{GPUBufferBinding}} |binding| + + **Returns:** {{GPUSize64}} + + 1. If |binding|.{{GPUBufferBinding/size}} is not [=map/exist|provided=]: + 1. Return max(0, |binding|.{{GPUBufferBinding/buffer}}.{{GPUBuffer/size}} - |binding|.{{GPUBufferBinding/offset}}); + 1. Return |binding|.{{GPUBufferBinding/size}}.
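A minimal TypeScript sketch of the two steps above may make the clamping behaviour easier to see. `BufferBindingSketch` and `effectiveBufferBindingSize` are hypothetical names used only for illustration; the interface flattens `binding.buffer.size` into a plain `bufferSize` number and is not the real `GPUBufferBinding` dictionary.

```typescript
// Hypothetical, flattened view of a buffer binding (not the real GPUBufferBinding).
interface BufferBindingSketch {
  bufferSize: number; // stands in for binding.buffer.size
  offset: number;     // stands in for binding.offset
  size?: number;      // stands in for binding.size; undefined means "not provided"
}

function effectiveBufferBindingSize(binding: BufferBindingSketch): number {
  if (binding.size === undefined) {
    // No explicit size: bind the rest of the buffer, clamped so it never goes negative.
    return Math.max(0, binding.bufferSize - binding.offset);
  }
  // An explicit size is returned unchanged.
  return binding.size;
}
```

Under these assumptions, a 256-byte buffer bound at offset 64 with no explicit size yields 192, and an offset past the end of the buffer yields 0 rather than a negative value.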
-
+
Two {{GPUBufferBinding}} objects |a| and |b| are considered buffer-binding-aliasing if and only if all of the following are true: - |a|.{{GPUBufferBinding/buffer}} == |b|.{{GPUBufferBinding/buffer}} @@ -6659,11 +6691,15 @@ enum GPUPipelineErrorReason { : constructor() ::
+ **Arguments:** +
                 |message|: Error message of the base {{DOMException}}.
                 |options|: Options specific to {{GPUPipelineError}}.
             
+ [=Content timeline=] steps: + 1. Set [=this=].[=DOMException/name=] to `"GPUPipelineError"`. 1. Set [=this=].[=DOMException/message=] to |message|. 1. Set [=this=].{{GPUPipelineError/reason}} to |options|.{{GPUPipelineErrorInit/reason}}. @@ -6778,8 +6814,8 @@ interface mixin GPUPipelineBase { |this|.{{GPUPipelineBase/[[layout]]}}.{{GPUPipelineLayout/[[bindGroupLayouts]]}}[|index|]. Note: {{GPUBindGroupLayout}} is only ever used by-value, not by-reference, - so this is equivalent to returning the same internal object in a new wrapper. - A new {{GPUBindGroupLayout}} wrapper is returned each time to avoid a round-trip + so this is equivalent to returning the same [=internal object=] with a new [=WebGPU interface=]. + A new {{GPUBindGroupLayout}} [=WebGPU interface=] is returned each time to avoid a round-trip between the [=Content timeline=] and the [=Device timeline=].
@@ -6795,10 +6831,10 @@ is recommended in most cases. Bind groups created from default layouts cannot be pipelines, and the structure of the default layout may change when altering shaders, causing unexpected bind group creation errors. -
+
To create a default pipeline layout for {{GPUPipelineBase}} |pipeline|, -run the following steps: +run the following [=device timeline=] steps: 1. Let |groupCount| be 0. 1. Let |groupDescs| be a sequence of |device|.{{device/[[limits]]}}.{{supported limits/maxBindGroups}} @@ -7049,9 +7085,9 @@ typedef double GPUPipelineConstantValue; // May represent WGSL's bool, f32, i32,
-
+
To get the entry point({{GPUShaderStage}} |stage|, - {{GPUProgrammableStage}} |descriptor|) + {{GPUProgrammableStage}} |descriptor|), run the following [=device timeline=] steps: 1. If |descriptor|.{{GPUProgrammableStage/entryPoint}} is [=map/exists|provided=]: @@ -7112,7 +7148,7 @@ typedef double GPUPipelineConstantValue; // May represent WGSL's bool, f32, i32, result from the rules of the [[WGSL]] specification.
-
+
validating shader binding(|variable|, |layout|) **Arguments:** @@ -7258,7 +7294,7 @@ typedef double GPUPipelineConstantValue; // May represent WGSL's bool, f32, i32,
-
+
The minimum buffer binding size for a buffer binding variable |var| is computed as follows: 1. Let |T| be the [=store type=] of |var|. @@ -7274,7 +7310,7 @@ typedef double GPUPipelineConstantValue; // May represent WGSL's bool, f32, i32, within the bound region of the buffer.
-
+
A resource binding, [=pipeline-overridable=] constant, shader stage input, or shader stage output is considered to be statically used by an entry point if it is present in the [=interface of a shader @@ -7710,39 +7746,43 @@ dictionary GPURenderPipelineDescriptor - {{GPUPipelineLayout}} |layout| - {{GPUDevice}} |device| - Return `true` if all of the following conditions are satisfied: + [=Device timeline=] steps: + + 1. Return `true` if all of the following conditions are satisfied: - - [$validating GPUVertexState$](|device|, |descriptor|.{{GPURenderPipelineDescriptor/vertex}}, |layout|) succeeds. - - If |descriptor|.{{GPURenderPipelineDescriptor/fragment}} is [=map/exist|provided=]: - - [$validating GPUFragmentState$](|device|, |descriptor|.{{GPURenderPipelineDescriptor/fragment}}, |layout|) succeeds. - - If the [=builtin/sample_mask=] builtin is a [=shader stage output=] of - |descriptor|.{{GPURenderPipelineDescriptor/fragment}}: - - |descriptor|.{{GPURenderPipelineDescriptor/multisample}}.{{GPUMultisampleState/alphaToCoverageEnabled}} is `false`. - - If the [=builtin/frag_depth=] builtin is a [=shader stage output=] of - |descriptor|.{{GPURenderPipelineDescriptor/fragment}}: - - |descriptor|.{{GPURenderPipelineDescriptor/depthStencil}} must be - [=map/exist|provided=], and - |descriptor|.{{GPURenderPipelineDescriptor/depthStencil}}.{{GPUDepthStencilState/format}} - must have a [=aspect/depth=] aspect. - - [$validating GPUPrimitiveState$](|descriptor|.{{GPURenderPipelineDescriptor/primitive}}, |device|) succeeds. - - If |descriptor|.{{GPURenderPipelineDescriptor/depthStencil}} is [=map/exist|provided=]: - - [$validating GPUDepthStencilState$](|descriptor|.{{GPURenderPipelineDescriptor/depthStencil}}) succeeds. - - [$validating GPUMultisampleState$](|descriptor|.{{GPURenderPipelineDescriptor/multisample}}) succeeds. - - If |descriptor|.{{GPURenderPipelineDescriptor/multisample}}.{{GPUMultisampleState/alphaToCoverageEnabled}} - is true: - 1. 
|descriptor|.{{GPURenderPipelineDescriptor/fragment}} must be [=map/exist|provided=]. - 1. |descriptor|.{{GPURenderPipelineDescriptor/fragment}}.{{GPUFragmentState/targets}}[0] - must [=list/exist=] and be non-null. - 1. |descriptor|.{{GPURenderPipelineDescriptor/fragment}}.{{GPUFragmentState/targets}}[0].{{GPUColorTargetState/format}} - must be a {{GPUTextureFormat}} which is [=blendable=] and has an alpha channel. - - There must exist at least one attachment, either: - - A non-`null` value in - |descriptor|.{{GPURenderPipelineDescriptor/fragment}}.{{GPUFragmentState/targets}}, or - - A |descriptor|.{{GPURenderPipelineDescriptor/depthStencil}}. - - [$validating inter-stage interfaces$](|device|, |descriptor|) returns `true`. +
+ - [$validating GPUVertexState$](|device|, |descriptor|.{{GPURenderPipelineDescriptor/vertex}}, |layout|) succeeds. + - If |descriptor|.{{GPURenderPipelineDescriptor/fragment}} is [=map/exist|provided=]: + - [$validating GPUFragmentState$](|device|, |descriptor|.{{GPURenderPipelineDescriptor/fragment}}, |layout|) succeeds. + - If the [=builtin/sample_mask=] builtin is a [=shader stage output=] of + |descriptor|.{{GPURenderPipelineDescriptor/fragment}}: + - |descriptor|.{{GPURenderPipelineDescriptor/multisample}}.{{GPUMultisampleState/alphaToCoverageEnabled}} is `false`. + - If the [=builtin/frag_depth=] builtin is a [=shader stage output=] of + |descriptor|.{{GPURenderPipelineDescriptor/fragment}}: + - |descriptor|.{{GPURenderPipelineDescriptor/depthStencil}} must be + [=map/exist|provided=], and + |descriptor|.{{GPURenderPipelineDescriptor/depthStencil}}.{{GPUDepthStencilState/format}} + must have a [=aspect/depth=] aspect. + - [$validating GPUPrimitiveState$](|descriptor|.{{GPURenderPipelineDescriptor/primitive}}, |device|) succeeds. + - If |descriptor|.{{GPURenderPipelineDescriptor/depthStencil}} is [=map/exist|provided=]: + - [$validating GPUDepthStencilState$](|descriptor|.{{GPURenderPipelineDescriptor/depthStencil}}) succeeds. + - [$validating GPUMultisampleState$](|descriptor|.{{GPURenderPipelineDescriptor/multisample}}) succeeds. + - If |descriptor|.{{GPURenderPipelineDescriptor/multisample}}.{{GPUMultisampleState/alphaToCoverageEnabled}} + is true: + 1. |descriptor|.{{GPURenderPipelineDescriptor/fragment}} must be [=map/exist|provided=]. + 1. |descriptor|.{{GPURenderPipelineDescriptor/fragment}}.{{GPUFragmentState/targets}}[0] + must [=list/exist=] and be non-null. + 1. |descriptor|.{{GPURenderPipelineDescriptor/fragment}}.{{GPUFragmentState/targets}}[0].{{GPUColorTargetState/format}} + must be a {{GPUTextureFormat}} which is [=blendable=] and has an alpha channel. 
+ - There must exist at least one attachment, either: + - A non-`null` value in + |descriptor|.{{GPURenderPipelineDescriptor/fragment}}.{{GPUFragmentState/targets}}, or + - A |descriptor|.{{GPURenderPipelineDescriptor/depthStencil}}. + - [$validating inter-stage interfaces$](|device|, |descriptor|) returns `true`. +
-
+
validating inter-stage interfaces(|device|, |descriptor|) **Arguments:** @@ -7752,6 +7792,8 @@ dictionary GPURenderPipelineDescriptor **Returns:** {{boolean}} + [=Device timeline=] steps: + 1. Let |maxVertexShaderOutputComponents| be |device|.limits.{{supported limits/maxInterStageShaderComponents}}. 1. If |descriptor|.{{GPURenderPipelineDescriptor/primitive}}.{{GPUPrimitiveState/topology}} @@ -7873,20 +7915,24 @@ constructs and rasterizes primitives from its vertex inputs: Requires the {{GPUFeatureName/"depth-clip-control"}} feature to be enabled. -
+
validating GPUPrimitiveState(|descriptor|, |device|) **Arguments:** - {{GPUPrimitiveState}} |descriptor| - {{GPUDevice}} |device| - Return `true` if all of the following conditions are satisfied: + [=Device timeline=] steps: + + 1. Return `true` if all of the following conditions are satisfied: - - If |descriptor|.{{GPUPrimitiveState/topology}} is not - {{GPUPrimitiveTopology/"line-strip"}} or {{GPUPrimitiveTopology/"triangle-strip"}}: - - |descriptor|.{{GPUPrimitiveState/stripIndexFormat}} must not be [=map/exist|provided=]. - - If |descriptor|.{{GPUPrimitiveState/unclippedDepth}} is `true`: - - {{GPUFeatureName/"depth-clip-control"}} must be [=enabled for=] |device|. +
+ - If |descriptor|.{{GPUPrimitiveState/topology}} is not + {{GPUPrimitiveTopology/"line-strip"}} or {{GPUPrimitiveTopology/"triangle-strip"}}: + - |descriptor|.{{GPUPrimitiveState/stripIndexFormat}} must not be [=map/exist|provided=]. + - If |descriptor|.{{GPUPrimitiveState/unclippedDepth}} is `true`: + - {{GPUFeatureName/"depth-clip-control"}} must be [=enabled for=] |device|. +
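Reading aid for the hunk above: the two conditions of validating GPUPrimitiveState can be sketched as a plain predicate. This is a minimal sketch under stated assumptions, not the spec's algorithm: `PrimitiveStateSketch`, `validatePrimitiveState`, and the `enabledFeatures` array are hypothetical names standing in for {{GPUPrimitiveState}} and the device's enabled feature set.

```typescript
// Hypothetical mirror of the GPUPrimitiveState members that the checks read.
interface PrimitiveStateSketch {
  topology: string;          // e.g. "triangle-list", "line-strip"
  stripIndexFormat?: string; // left undefined when not provided
  unclippedDepth: boolean;
}

// Returns true only when both conditions listed in the hunk hold.
// `enabledFeatures` is an assumed stand-in for the features enabled on the device.
function validatePrimitiveState(
  descriptor: PrimitiveStateSketch,
  enabledFeatures: readonly string[],
): boolean {
  const isStripTopology =
    descriptor.topology === "line-strip" ||
    descriptor.topology === "triangle-strip";

  // stripIndexFormat must not be provided for non-strip topologies.
  if (!isStripTopology && descriptor.stripIndexFormat !== undefined) {
    return false;
  }

  // unclippedDepth requires the "depth-clip-control" feature.
  if (
    descriptor.unclippedDepth &&
    enabledFeatures.indexOf("depth-clip-control") === -1
  ) {
    return false;
  }

  return true;
}
```

Returning a boolean rather than throwing mirrors the "Return `true` if all of the following conditions are satisfied" wording; the caller decides how to report the failure.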
-A {{GPUImageDataLayout}} is a layout of images within some linear memory. +An image is comprised of one or more rows of [=texel blocks=], referred to here as +block rows. Each [=block row=] of an [=image=] must contain the same number of +[=texel blocks=], and all [=texel blocks=] in an [=image=] are of the same {{GPUTextureFormat}}. + +A {{GPUImageDataLayout}} is a layout of [=images=] within some linear memory. It's used when copying data between a [=texture=] and a {{GPUBuffer}}, or when scheduling a write into a [=texture=] from the {{GPUQueue}}. - For {{GPUTextureDimension/2d}} textures, data is copied between one or multiple contiguous [=images=] and [=array layers=]. - For {{GPUTextureDimension/3d}} textures, data is copied between one or multiple contiguous [=images=] and depth [=slices=]. -Issue: Define images more precisely. In particular, define them as being comprised of [=texel blocks=]. - -Operations that copy between byte arrays and textures always work with rows of [=texel blocks=], -which we'll call block rows. It's not possible to update only a part of a [=texel block=]. +Operations that copy between byte arrays and textures always operate on whole [=texel blocks=]. +It's not possible to update only a part of a [=texel block=]. [=Texel blocks=] are tightly packed within each [=block row=] in the linear memory layout of an image copy, with each subsequent texel block immediately following the previous texel block, From 93b7e83e14524e970600181bccbd4e11fbbeeac5 Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Fri, 24 May 2024 13:57:38 -0700 Subject: [PATCH 074/285] Clarify list of context compatible formats (#4665) * Clarify list of context compatible formats * Apply Kai's suggestion * Update spec/index.bs * Fix formatting issue. 
--------- Co-authored-by: Kai Ninomiya --- spec/index.bs | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 68784c445f..ed4fb34534 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -13486,11 +13486,11 @@ specified points. ## GPUCanvasConfiguration ## {#canvas-configuration} -The supported context formats are a [=set=] of {{GPUTextureFormat}}s that must be -supported when specified as a {{GPUCanvasConfiguration}}.{{GPUCanvasConfiguration/format}} -regardless of the given {{GPUCanvasConfiguration}}.{{GPUCanvasConfiguration/device}}, -initially set to: «{{GPUTextureFormat/"bgra8unorm"}}, {{GPUTextureFormat/"rgba8unorm"}}, -{{GPUTextureFormat/"rgba16float"}}». +The supported context formats are the [=set=] of {{GPUTextureFormat}}s: +«{{GPUTextureFormat/"bgra8unorm"}}, {{GPUTextureFormat/"rgba8unorm"}}, +{{GPUTextureFormat/"rgba16float"}}». These formats must be supported when specified as a +{{GPUCanvasConfiguration}}.{{GPUCanvasConfiguration/format}} regardless of the given +{{GPUCanvasConfiguration}}.{{GPUCanvasConfiguration/device}}. Note: Canvas configuration cannot use `srgb` formats like {{GPUTextureFormat/"bgra8unorm-srgb"}}. Instead, use the non-`srgb` equivalent ({{GPUTextureFormat/"bgra8unorm"}}), specify the `srgb` From 9a9fcc9a8eb21f6beb5207ebf1d0b44466947f44 Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Fri, 24 May 2024 16:09:44 -0700 Subject: [PATCH 075/285] Point to TC39 draft of source map spec instead of sourcemaps.info (#4670) --- spec/index.bs | 25 +++++++++---------------- 1 file changed, 9 insertions(+), 16 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index ed4fb34534..5f481c3404 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -23,19 +23,6 @@ Markup Shorthands: css no Assume Explicit For: yes -
-{
-    "SourceMap": {
-        "authors": [
-            "John Lenz",
-            "Nick Fitzgerald"
-        ],
-        "href": "https://sourcemaps.info/spec.html",
-        "title": "Source Map Revision 3 Proposal"
-    }
-}
-
-
- - : requestAdapterInfo() - :: - Requests the {{GPUAdapterInfo}} for this {{GPUAdapter}}. - - Note: Adapter info values are returned with a Promise to give user agents an - opportunity to perform potentially long-running checks in the future. - -
-
- **Called on:** {{GPUAdapter}} |this|. - - **Returns:** {{Promise}}<{{GPUAdapterInfo}}> - - [=Content timeline=] steps: - - 1. Let |promise| be [=a new promise=]. - 1. Let |adapter| be |this|.{{GPUAdapter/[[adapter]]}}. - 1. Run the following steps [=in parallel=]: - 1. [=Resolve=] |promise| with a [$new adapter info$] for |adapter|. - - 1. Return |promise|. -
-
From 33fba809ff825bf9da3d4310cba869096ee654fd Mon Sep 17 00:00:00 2001 From: Mehmet Oguz Derin Date: Wed, 29 May 2024 07:29:47 +0900 Subject: [PATCH 081/285] Allow compressed formats for 3D textures (#4677) --- spec/index.bs | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 9484d49084..459bd0fd59 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -3907,9 +3907,7 @@ enum GPUTextureDimension { : "3d" :: Specifies a texture that has a width, height, and depth. {{GPUTextureDimension/"3d"}} - textures cannot be multisampled or use compressed or depth/stencil formats. - - + textures cannot be multisampled or use depth/stencil formats. ### Texture Usages ### {#texture-usage} @@ -4088,7 +4086,7 @@ The {{GPUTextureUsage}} flags determine how a {{GPUTexture}} may be used after i - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/depthOrArrayLayers=] must be ≤ |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureDimension3D}}. - |descriptor|.{{GPUTextureDescriptor/sampleCount}} must be 1. - - |descriptor|.{{GPUTextureDescriptor/format}} must not be a [=compressed format=] or [=depth-or-stencil format=]. + - |descriptor|.{{GPUTextureDescriptor/format}} must not be a [=depth-or-stencil format=]. - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/width=] must be multiple of [=texel block width=]. - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/height=] must be multiple of [=texel block height=]. 
From 158e0cfbbd8a60d2251699ccdd23e9d6670e21b8 Mon Sep 17 00:00:00 2001 From: Teodor Tanasoaia <28601907+teoxoy@users.noreply.github.com> Date: Wed, 29 May 2024 01:10:40 +0200 Subject: [PATCH 082/285] make sure the device is valid when creating a texture view (#4678) --- spec/index.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/index.bs b/spec/index.bs index 459bd0fd59..9df7d24f1e 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -4427,7 +4427,7 @@ enum GPUTextureAspect { [$generate a validation error$], [$invalidate$] |view|, and stop.
- - |this| must be [$valid$]. + - |this| is [$valid to use with$] |this|.{{GPUObjectBase/[[device]]}}. - |descriptor|.{{GPUTextureViewDescriptor/aspect}} must be present in |this|.{{GPUTexture/format}}. - If the |descriptor|.{{GPUTextureViewDescriptor/aspect}} is {{GPUTextureAspect/"all"}}: - |descriptor|.{{GPUTextureViewDescriptor/format}} must equal either From 72a4cb61a9762dd78e4e6180d04bde64d3e18757 Mon Sep 17 00:00:00 2001 From: Teodor Tanasoaia <28601907+teoxoy@users.noreply.github.com> Date: Wed, 29 May 2024 01:21:55 +0200 Subject: [PATCH 083/285] add missing `multisample.count` validation (#4681) --- spec/index.bs | 1 + 1 file changed, 1 insertion(+) diff --git a/spec/index.bs b/spec/index.bs index 9df7d24f1e..eb50813f4b 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -8159,6 +8159,7 @@ interacts with a render pass's multisampled attachments. 1. Return `true` if all of the following conditions are satisfied:
+ - |descriptor|.{{GPUMultisampleState/count}} must be either 1 or 4. - If |descriptor|.{{GPUMultisampleState/alphaToCoverageEnabled}} is `true`: - |descriptor|.{{GPUMultisampleState/count}} > 1.
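Reading aid for the hunk above: with the added line, the visible conditions of validating GPUMultisampleState reduce to two checks. The following is a minimal sketch that covers only the conditions shown in this hunk; `MultisampleStateSketch` and `validateMultisampleState` are hypothetical names, not spec identifiers.

```typescript
// Hypothetical mirror of the GPUMultisampleState members used by the checks.
interface MultisampleStateSketch {
  count: number;
  alphaToCoverageEnabled: boolean;
}

// Covers only the two conditions visible in the hunk.
function validateMultisampleState(descriptor: MultisampleStateSketch): boolean {
  // New in this patch: count must be either 1 or 4.
  if (descriptor.count !== 1 && descriptor.count !== 4) {
    return false;
  }
  // Pre-existing condition: alpha-to-coverage needs a multisampled target.
  if (descriptor.alphaToCoverageEnabled && !(descriptor.count > 1)) {
    return false;
  }
  return true;
}
```

With the count restricted to 1 or 4, the second condition effectively requires a count of 4 whenever alpha-to-coverage is enabled.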
From 93faa4bf752f2152d0295c443d9697e4a5b0e337 Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Tue, 28 May 2024 16:57:28 -0700 Subject: [PATCH 084/285] Detail binding size check at draw/dispatch (#4682) * Detail binding size check at draw/dispatch * Update spec/index.bs --------- Co-authored-by: Kai Ninomiya --- spec/index.bs | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index eb50813f4b..ab14568f5b 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -5955,6 +5955,14 @@ following members: {{GPUExternalTexture}}, or {{GPUBufferBinding}}. +A {{GPUBindGroupEntry}} object also has the following internal slots: + +
+ : \[[prevalidatedSize]], of type boolean + :: + Whether or not this binding entry had its buffer size validated at time of creation. +
+ @@ -7904,9 +7907,16 @@ dictionary GPURenderPipelineDescriptor 1. Let |maxVertexShaderOutputComponents| be |device|.limits.{{supported limits/maxInterStageShaderComponents}}. - 1. If |descriptor|.{{GPURenderPipelineDescriptor/primitive}}.{{GPUPrimitiveState/topology}} - is {{GPUPrimitiveTopology/"point-list"}}: - 1. Decrement |maxVertexShaderOutputComponents| by 1. + 1. Let |maxVertexShaderOutputVariables| be + |device|.limits.{{supported limits/maxInterStageShaderVariables}}. + 1. If |descriptor|.{{GPURenderPipelineDescriptor/primitive}}.{{GPUPrimitiveState/topology}} + is {{GPUPrimitiveTopology/"point-list"}}: + 1. Decrement |maxVertexShaderOutputComponents| by 1. + 1. If [=builtin/clip_distances=] is declared in the output of + |descriptor|.{{GPURenderPipelineDescriptor/vertex}}: + 1. Let |clipDistancesSize| be the array size of [=builtin/clip_distances=]. + 1. Decrement |maxVertexShaderOutputComponents| by [=roundUp=](4, |clipDistancesSize|). + 1. Decrement |maxVertexShaderOutputVariables| by ([=roundUp=](4, |clipDistancesSize|) / 4). 1. Return `false` if any of the following requirements are unmet: - There must be no more than |maxVertexShaderOutputComponents| scalar components across all user-defined outputs for @@ -7915,7 +7925,7 @@ dictionary GPURenderPipelineDescriptor consumes 4 scalar components. - The [=location=] of each user-defined output of |descriptor|.{{GPURenderPipelineDescriptor/vertex}} must be - < |device|.limits.{{supported limits/maxInterStageShaderVariables}}. + < |maxVertexShaderOutputVariables|. 1. If |descriptor|.{{GPURenderPipelineDescriptor/fragment}} [=map/exist|is provided=]: 1. Let |maxFragmentShaderInputComponents| be |device|.limits.{{supported limits/maxInterStageShaderComponents}}. 
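Reading aid for the hunk above: the adjusted vertex-output budget can be worked through numerically. This is a sketch under stated assumptions: `roundUp(k, n)` is taken to mean the smallest multiple of `k` that is at least `n`, and `InterStageLimitsSketch`, `vertexOutputBudget`, and `vertexOutputsFit` are hypothetical helpers, not spec names.

```typescript
// Hypothetical subset of the device limits read by the steps.
interface InterStageLimitsSketch {
  maxInterStageShaderComponents: number;
  maxInterStageShaderVariables: number;
}

interface VertexOutputBudget {
  components: number; // maxVertexShaderOutputComponents after decrements
  variables: number;  // maxVertexShaderOutputVariables after decrements
}

// Assumed definition: smallest multiple of k that is >= n.
function roundUp(k: number, n: number): number {
  return Math.ceil(n / k) * k;
}

// clipDistancesSize is undefined when clip_distances is not declared.
function vertexOutputBudget(
  limits: InterStageLimitsSketch,
  topology: string,
  clipDistancesSize?: number,
): VertexOutputBudget {
  let components = limits.maxInterStageShaderComponents;
  let variables = limits.maxInterStageShaderVariables;
  if (topology === "point-list") {
    components -= 1;
  }
  if (clipDistancesSize !== undefined) {
    const rounded = roundUp(4, clipDistancesSize);
    components -= rounded;
    variables -= rounded / 4;
  }
  return { components, variables };
}

// Each user-defined output consumes 4 scalar components, and every
// location must be below the (possibly reduced) variable budget.
function vertexOutputsFit(
  locations: readonly number[],
  budget: VertexOutputBudget,
): boolean {
  if (locations.length * 4 > budget.components) {
    return false;
  }
  return locations.every((location) => location < budget.variables);
}
```

For example, with illustrative limits of 60 components and 16 variables, a `clip_distances` array of size 5 rounds up to 8, leaving 52 components and 14 usable locations.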
@@ -14514,6 +14524,12 @@ inside a primitive, is defined by the following inequalities: - −|p|.w ≤ |p|.y ≤ |p|.w - 0 ≤ |p|.z ≤ |p|.w (depth clipping) +When the {{GPUFeatureName/"clip-distances"}} feature is enabled, this [=clip volume=] can +be further restricted by user-defined half-spaces by declaring [=builtin/clip_distances=] in the +output of vertex stage. Each value in the [=builtin/clip_distances=] array will be linearly +interpolated across the primitive, and the portion of the primitive with interpolated distances less +than 0 will be clipped. + If |descriptor|.{{GPURenderPipelineDescriptor/primitive}}.{{GPUPrimitiveState/unclippedDepth}} is `true`, [=depth clipping=] is not applied: the [=clip volume=] is not bounded in the z dimension. @@ -15319,6 +15335,16 @@ This feature adds no [=optional API surfaces=]. Makes textures with formats {{GPUTextureFormat/"r32float"}}, {{GPUTextureFormat/"rg32float"}}, and {{GPUTextureFormat/"rgba32float"}} [=filterable=]. +

`"clip-distances"` +

+ +Allows the use of [=builtin/clip_distances=] in WGSL. + +This feature adds the following [=optional API surfaces=]: + +- New WGSL extensions: + - [=extension/clip_distances=] + # Appendices # {#appendices} ## Texture Format Capabilities ## {#texture-format-caps} diff --git a/wgsl/index.bs b/wgsl/index.bs index 79d3275a7c..4342d428a3 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -1753,6 +1753,11 @@ The valid [=enable-extensions=] are listed in the following table. `f16` `"shader-f16"` The [=f16=] type is valid to use in the WGSL module. Otherwise, using [=f16=] (directly or indirectly) will result in a [=shader-creation error=]. + `clip_distances` + `"clip_distances"` + The built-in variable [=built-in values/clip_distances=] is valid to use in the WGSL + module. Otherwise, using [=built-in values/clip_distances=] will result in a + [=shader-creation error=].
@@ -8971,6 +8976,11 @@ Each is described in detail in subsequent sections. input u32 + [=built-in values/clip_distances=] + vertex + output + array<f32, N> (`N` ≤ `8`) + [=built-in values/position=] vertex output @@ -9035,7 +9045,8 @@ Each is described in detail in subsequent sections.
struct VertexOutput { - @builtin(position) my_pos: vec4<f32> + @builtin(position) my_pos: vec4<f32>, + @builtin(clip_distances) my_clip_distances: array<f32, 8>, } @vertex @@ -9066,6 +9077,26 @@ Each is described in detail in subsequent sections.
+##### `clip_distances` ##### {#clip-distances-builtin-value} + + +
Name + clip_distances +
Stage + [=vertex shader stage|vertex=] +
Type + array<f32, N> +
Direction + Output +
Description + + Each value in the array represents a distance to a user-defined clip plane. A clip distance of + `0` means the vertex is on the plane, a positive distance means the vertex is inside the clip + half-space, and a negative distance means the vertex is outside the clip half-space. The array + size of [=built-in values/clip_distances=] [=shader-creation error|must=] be ≤ `8`. + See [[WebGPU#primitive-clipping]]. +
+ ##### `frag_depth` ##### {#frag-depth-builtin-value} From 7f84265605e07d9ecad93f0eef0a64653a573545 Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Thu, 30 May 2024 10:14:12 -0700 Subject: [PATCH 086/285] Complete algorithm for copyTextureToTexture (#4672) * Complete algorithm for copyTextureToTexture * Address feedback from Kai --- spec/index.bs | 27 ++++++++++++++++++++++++++- spec/sections/copies.bs | 25 +++++++++++++++++++++++-- 2 files changed, 49 insertions(+), 3 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index ea4e96edaa..bc88fd31b0 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -9928,7 +9928,32 @@ dictionary GPUCommandEncoderDescriptor
[=Queue timeline=] steps: - Issue: Define copy, including provision for snorm. + 1. Let |blockWidth| be the [=texel block width=] of |source|.{{GPUImageCopyTexture/texture}}. + 1. Let |blockHeight| be the [=texel block height=] of |source|.{{GPUImageCopyTexture/texture}}. + + 1. Let |srcOrigin| be |source|.{{GPUImageCopyTexture/origin}}; + 1. Let |srcBlockOriginX| be (|srcOrigin|.[=GPUOrigin3D/x=] ÷ |blockWidth|). + 1. Let |srcBlockOriginY| be (|srcOrigin|.[=GPUOrigin3D/y=] ÷ |blockHeight|). + + 1. Let |dstOrigin| be |destination|.{{GPUImageCopyTexture/origin}}; + 1. Let |dstBlockOriginX| be (|dstOrigin|.[=GPUOrigin3D/x=] ÷ |blockWidth|). + 1. Let |dstBlockOriginY| be (|dstOrigin|.[=GPUOrigin3D/y=] ÷ |blockHeight|). + + 1. Let |blockColumns| be (|copySize|.[=GPUExtent3D/width=] ÷ |blockWidth|). + 1. Let |blockRows| be (|copySize|.[=GPUExtent3D/height=] ÷ |blockHeight|). + + 1. [=Assert=] that |srcBlockOriginX|, |srcBlockOriginY|, |dstBlockOriginX|, |dstBlockOriginY|, + |blockColumns|, and |blockRows| are integers. + + 1. For each |z| in the range [0, |copySize|.[=GPUExtent3D/depthOrArrayLayers=] − 1]: + 1. Let |srcSubregion| be [$texture copy sub-region$] (|z| + |srcOrigin|.[=GPUOrigin3D/z=]) of |source|. + 1. Let |dstSubregion| be [$texture copy sub-region$] (|z| + |dstOrigin|.[=GPUOrigin3D/z=]) of |destination|. + + 1. For each |y| in the range [0, |blockRows| − 1]: + 1. For each |x| in the range [0, |blockColumns| − 1]: + 1. Set [=texel block=] (|dstBlockOriginX| + |x|, |dstBlockOriginY| + |y|) of + |dstSubregion| to be an [=equivalent texel representation=] to [=texel block=] + (|srcBlockOriginX| + |x|, |srcBlockOriginY| + |y|) of |srcSubregion|.
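Reading aid for the steps above: the inner `y`/`x` loops copy whole texel blocks between two sub-regions. The sketch below models a single depth slice or array layer as a grid of opaque block values, so it illustrates only the index arithmetic. `BlockGrid`, `copyBlockRegion`, and the use of a plain assignment in place of an "equivalent texel representation" are assumptions made for illustration.

```typescript
// One depth slice or array layer, modelled as rows of opaque texel blocks.
type BlockGrid = number[][];

interface Origin2D { x: number; y: number; }
interface Extent2D { width: number; height: number; }

// Copies block-aligned texels from src to dst for a single sub-region.
// Origins and sizes are in texels; the grids are indexed in blocks.
function copyBlockRegion(
  src: BlockGrid,
  srcOrigin: Origin2D,
  dst: BlockGrid,
  dstOrigin: Origin2D,
  copySize: Extent2D,
  blockWidth: number,
  blockHeight: number,
): void {
  // Mirrors the "Assert ... are integers" step.
  const aligned =
    srcOrigin.x % blockWidth === 0 && srcOrigin.y % blockHeight === 0 &&
    dstOrigin.x % blockWidth === 0 && dstOrigin.y % blockHeight === 0 &&
    copySize.width % blockWidth === 0 && copySize.height % blockHeight === 0;
  if (!aligned) {
    throw new Error("origins and copy size must be block-aligned");
  }

  const srcBlockOriginX = srcOrigin.x / blockWidth;
  const srcBlockOriginY = srcOrigin.y / blockHeight;
  const dstBlockOriginX = dstOrigin.x / blockWidth;
  const dstBlockOriginY = dstOrigin.y / blockHeight;
  const blockColumns = copySize.width / blockWidth;
  const blockRows = copySize.height / blockHeight;

  for (let y = 0; y < blockRows; y++) {
    const srcRow = src[srcBlockOriginY + y];
    const dstRow = dst[dstBlockOriginY + y];
    if (srcRow === undefined || dstRow === undefined) {
      throw new Error("block row out of range");
    }
    for (let x = 0; x < blockColumns; x++) {
      const block = srcRow[srcBlockOriginX + x];
      if (block === undefined) {
        throw new Error("block column out of range");
      }
      // Plain assignment is the trivial case of an equivalent representation.
      dstRow[dstBlockOriginX + x] = block;
    }
  }
}
```

The outer `z` loop in the steps would call this once per depth slice or array layer, selecting each grid with the texture copy sub-region algorithm.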
diff --git a/spec/sections/copies.bs b/spec/sections/copies.bs index 8bb5d4d3c9..a45e429b0f 100644 --- a/spec/sections/copies.bs +++ b/spec/sections/copies.bs @@ -28,8 +28,8 @@ and "immediate" {{GPUQueue}} operations: - {{GPUQueue/writeTexture()}}, for {{ArrayBuffer}}-to-{{GPUTexture}} writes - {{GPUQueue/copyExternalImageToTexture()}}, for copies from Web Platform image sources to textures -Some texel values have multiple possible representations of some values, -e.g. as `r8snorm`, -1.0 can be represented as either -127 or -128. +Some texel values have multiple possible representations +of some values, e.g. as `r8snorm`, -1.0 can be represented as either -127 or -128. Copy commands are not guaranteed to preserve the source's bit-representation. The following definitions are used by these methods: @@ -167,6 +167,27 @@ dictionary GPUImageCopyTexture { Defines which aspects of the {{GPUImageCopyTexture/texture}} to copy to/from. +
+ The texture copy sub-region for depth slice or array layer |index| of {{GPUImageCopyTexture}} + |copyTexture| is determined by running the following steps: + + 1. Let |texture| be |copyTexture|.{{GPUImageCopyTexture/texture}}. + 1. If |texture|.{{GPUTexture/dimension}} is: +
+ : {{GPUTextureDimension/1d}} + :: 1. [=Assert=] |index| is `0` + 1. Let |depthSliceOrLayer| be |texture| + + : {{GPUTextureDimension/2d}} + :: Let |depthSliceOrLayer| be array layer |index| of |texture| + + : {{GPUTextureDimension/3d}} + :: Let |depthSliceOrLayer| be depth slice |index| of |texture| +
+ 1. Let |textureMip| be mip level |copyTexture|.{{GPUImageCopyTexture/mipLevel}} of |depthSliceOrLayer|. + 1. Return aspect |copyTexture|.{{GPUImageCopyTexture/aspect}} of |textureMip|. +
+
validating GPUImageCopyTexture(|imageCopyTexture|, |copySize|) From b09fbac2d504a20a0e41d3638926ee05866fa0fc Mon Sep 17 00:00:00 2001 From: alan-baker Date: Thu, 30 May 2024 14:38:15 -0400 Subject: [PATCH 087/285] [editorial] Fix var and value table footnote (#4684) * Change the footnote about vertex shader access to be consistent with the Address Space section --- wgsl/index.bs | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index 4342d428a3..ef3546b4e7 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -4725,9 +4725,9 @@ effective-value-type. creation|pipeline-creation time=]. 4. [=Override-declarations=] are part of the shader interface, but are not bound resources. -5. [=Storage buffers=] with an access mode other than [=access/read=] and - [=type/storage textures=] cannot be [=statically accessed=] - in a [=vertex shader stage=]. +5. [=Storage buffers=] and [=type/storage textures=] with an access mode other + than [=access/read=] cannot be [=statically accessed=] in a [=vertex shader + stage=]. See WebGPU {{GPUDevice/createBindGroupLayout()}}. 6. [=Atomic types=] can only appear in mutable storage buffers or workgroup variables. From 7f3c1a309aa04b0ce7307c865590b012b6077f74 Mon Sep 17 00:00:00 2001 From: Mehmet Oguz Derin Date: Fri, 31 May 2024 22:31:39 +0900 Subject: [PATCH 088/285] Restrict compressed 3D to BCn (#4685) --- spec/index.bs | 24 +++++++++++++++++------- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index bc88fd31b0..8a243ab85f 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -3910,7 +3910,8 @@ enum GPUTextureDimension { : "3d" :: Specifies a texture that has a width, height, and depth. {{GPUTextureDimension/"3d"}} - textures cannot be multisampled or use depth/stencil formats. 
+ textures cannot be multisampled, and their format must support 3d textures + (all [[#plain-color-formats|plain color formats]] and some [[#packed-formats|packed/compressed formats]]). ### Texture Usages ### {#texture-usage} @@ -4089,7 +4090,8 @@ The {{GPUTextureUsage}} flags determine how a {{GPUTexture}} may be used after i - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/depthOrArrayLayers=] must be ≤ |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureDimension3D}}. - |descriptor|.{{GPUTextureDescriptor/sampleCount}} must be 1. - - |descriptor|.{{GPUTextureDescriptor/format}} must not be a [=depth-or-stencil format=]. + - |descriptor|.{{GPUTextureDescriptor/format}} must support {{GPUTextureDimension/"3d"}} + textures according to [[#texture-format-caps]]. - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/width=] must be multiple of [=texel block width=]. - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/height=] must be multiple of [=texel block height=]. @@ -15225,7 +15227,7 @@ This feature adds the following [=optional API surfaces=]: -Allows for explicit creation of textures of BC compressed formats. +Allows for explicit creation of textures of BC compressed formats. Supports both 2D and 3D textures. This feature adds the following [=optional API surfaces=]: @@ -15250,7 +15252,7 @@ This feature adds the following [=optional API surfaces=]: -Allows for explicit creation of textures of ETC2 compressed formats. +Allows for explicit creation of textures of ETC2 compressed formats. Only supports 2D textures. This feature adds the following [=optional API surfaces=]: @@ -15270,7 +15272,7 @@ This feature adds the following [=optional API surfaces=]: -Allows for explicit creation of textures of ASTC compressed formats. +Allows for explicit creation of textures of ASTC compressed formats. Only supports 2D textures. 
This feature adds the following [=optional API surfaces=]: @@ -15376,7 +15378,9 @@ This feature adds the following [=optional API surfaces=]: ### Plain color formats ### {#plain-color-formats} -All plain color formats support {{GPUTextureUsage/COPY_SRC}}, {{GPUTextureUsage/COPY_DST}}, and {{GPUTextureUsage/TEXTURE_BINDING}} usage. +All plain color formats support {{GPUTextureUsage/COPY_SRC}}, {{GPUTextureUsage/COPY_DST}}, +and {{GPUTextureUsage/TEXTURE_BINDING}} usage. Additionally, all plain color formats support +textures with {{GPUTextureDimension/"3d"}} dimension. The {{GPUTextureUsage/RENDER_ATTACHMENT}} and {{GPUTextureUsage/STORAGE_BINDING}} columns specify support for {{GPUTextureUsage/RENDER_ATTACHMENT|GPUTextureUsage.RENDER_ATTACHMENT}} @@ -15794,7 +15798,8 @@ depth and stencil aspects. All [=depth-or-stencil formats=] support the {{GPUTextureUsage/COPY_SRC}}, {{GPUTextureUsage/COPY_DST}}, {{GPUTextureUsage/TEXTURE_BINDING}}, and {{GPUTextureUsage/RENDER_ATTACHMENT}} usages. All of these formats support multisampling. -However, certain copy operations also restrict the source and destination formats. +However, certain copy operations also restrict the source and destination formats, and none of +these formats support textures with {{GPUTextureDimension/"3d"}} dimension. Depth textures cannot be used with {{GPUSamplerBindingType/"filtering"}} samplers, but can always be used with {{GPUSamplerBindingType/"comparison"}} samplers even if they use filtering. @@ -15952,6 +15957,7 @@ The [=texel block memory cost=] of each of these formats is the same as its
@@ -15959,12 +15965,14 @@ The [=texel block memory cost=] of each of these formats is the same as its
[=Texel block copy footprint=] (Bytes) {{GPUTextureSampleType}} Texel block [=texel block width|width=]/[=texel block height|height=] + {{GPUTextureDimension/"3d"}} [=Feature=]
4 {{GPUTextureSampleType/"float"}},
{{GPUTextureSampleType/"unfilterable-float"}}
1 × 1 +
{{GPUTextureFormat/bc1-rgba-unorm}} 8 {{GPUTextureSampleType/"float"}},
{{GPUTextureSampleType/"unfilterable-float"}}
4 × 4 + {{GPUFeatureName/texture-compression-bc}}
{{GPUTextureFormat/bc1-rgba-unorm-srgb}} @@ -16003,6 +16011,7 @@ The [=texel block memory cost=] of each of these formats is the same as its 8 {{GPUTextureSampleType/"float"}},
{{GPUTextureSampleType/"unfilterable-float"}}
4 × 4 + {{GPUFeatureName/texture-compression-etc2}}
{{GPUTextureFormat/etc2-rgb8unorm-srgb}} @@ -16031,6 +16040,7 @@ The [=texel block memory cost=] of each of these formats is the same as its 16 {{GPUTextureSampleType/"float"}},
{{GPUTextureSampleType/"unfilterable-float"}}
4 × 4 + {{GPUFeatureName/texture-compression-astc}}
{{GPUTextureFormat/astc-4x4-unorm-srgb}} From 064562f2adb8633266aba7b1d7f4ad70a9077085 Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Fri, 31 May 2024 10:04:40 -0700 Subject: [PATCH 089/285] Complete remaining texture copy algorithms (#4686) * Complete remaining texture copy algorithms * Address Kai's feedback --- spec/index.bs | 84 +++++++++++++++++++++++++++++++++++------ spec/sections/copies.bs | 17 +++++++-- 2 files changed, 85 insertions(+), 16 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 8a243ab85f..ee3ecf3367 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -9817,7 +9817,29 @@ dictionary GPUCommandEncoderDescriptor
[=Queue timeline=] steps: - Issue: Define copy, including provision for snorm. + 1. Let |blockWidth| be the [=texel block width=] of |destination|.{{GPUImageCopyTexture/texture}}. + 1. Let |blockHeight| be the [=texel block height=] of |destination|.{{GPUImageCopyTexture/texture}}. + + 1. Let |dstOrigin| be |destination|.{{GPUImageCopyTexture/origin}}; + 1. Let |dstBlockOriginX| be (|dstOrigin|.[=GPUOrigin3D/x=] ÷ |blockWidth|). + 1. Let |dstBlockOriginY| be (|dstOrigin|.[=GPUOrigin3D/y=] ÷ |blockHeight|). + + 1. Let |blockColumns| be (|copySize|.[=GPUExtent3D/width=] ÷ |blockWidth|). + 1. Let |blockRows| be (|copySize|.[=GPUExtent3D/height=] ÷ |blockHeight|). + + 1. [=Assert=] that |dstBlockOriginX|, |dstBlockOriginY|, |blockColumns|, and |blockRows| are integers. + + 1. For each |z| in the range [0, |copySize|.[=GPUExtent3D/depthOrArrayLayers=] − 1]: + 1. Let |dstSubregion| be [$texture copy sub-region$] (|z| + |dstOrigin|.[=GPUOrigin3D/z=]) of |destination|. + + 1. For each |y| in the range [0, |blockRows| − 1]: + 1. For each |x| in the range [0, |blockColumns| − 1]: + 1. Let |blockOffset| be the [$texel block byte offset$] of |source| for (|x|, |y|, |z|) of + |destination|.{{GPUImageCopyTexture/texture}}. + + 1. Set [=texel block=] (|dstBlockOriginX| + |x|, |dstBlockOriginY| + |y|) of + |dstSubregion| to be an [=equivalent texel representation=] to the [=texel block=] + described by |source|.{{GPUImageCopyBuffer/buffer}} at offset |blockOffset|.
@@ -9868,7 +9890,29 @@ dictionary GPUCommandEncoderDescriptor
[=Queue timeline=] steps: - Issue: Define copy, including provision for snorm. + 1. Let |blockWidth| be the [=texel block width=] of |source|.{{GPUImageCopyTexture/texture}}. + 1. Let |blockHeight| be the [=texel block height=] of |source|.{{GPUImageCopyTexture/texture}}. + + 1. Let |srcOrigin| be |source|.{{GPUImageCopyTexture/origin}}; + 1. Let |srcBlockOriginX| be (|srcOrigin|.[=GPUOrigin3D/x=] ÷ |blockWidth|). + 1. Let |srcBlockOriginY| be (|srcOrigin|.[=GPUOrigin3D/y=] ÷ |blockHeight|). + + 1. Let |blockColumns| be (|copySize|.[=GPUExtent3D/width=] ÷ |blockWidth|). + 1. Let |blockRows| be (|copySize|.[=GPUExtent3D/height=] ÷ |blockHeight|). + + 1. [=Assert=] that |srcBlockOriginX|, |srcBlockOriginY|, |blockColumns|, and |blockRows| are integers. + + 1. For each |z| in the range [0, |copySize|.[=GPUExtent3D/depthOrArrayLayers=] − 1]: + 1. Let |srcSubregion| be [$texture copy sub-region$] (|z| + |srcOrigin|.[=GPUOrigin3D/z=]) of |source|. + + 1. For each |y| in the range [0, |blockRows| − 1]: + 1. For each |x| in the range [0, |blockColumns| − 1]: + 1. Let |blockOffset| be the [$texel block byte offset$] of |destination| for (|x|, |y|, |z|) of + |source|.{{GPUImageCopyTexture/texture}}. + + 1. Set |destination|.{{GPUImageCopyBuffer/buffer}} at offset |blockOffset| to be an + [=equivalent texel representation=] to [=texel block=] + (|srcBlockOriginX| + |x|, |srcBlockOriginY| + |y|) of |srcSubregion|.
@@ -12678,6 +12722,10 @@ GPUQueue includes GPUObjectBase; 1. [=?=] [$validate GPUOrigin3D shape$](|destination|.{{GPUImageCopyTexture/origin}}). 1. [=?=] [$validate GPUExtent3D shape$](|size|). 1. Let |dataBytes| be [=get a copy of the buffer source|a copy of the bytes held by the buffer source=] |data|. + + Note: This is described as copying all of |data| to the device timeline, + but in practice |data| could be much larger than necessary. + Implementations should optimize by copying only the necessary bytes. 1. Issue the subsequent steps on the [=Device timeline=] of |this|.
@@ -12698,22 +12746,34 @@ GPUQueue includes GPUObjectBase; |dataLayout|.{{GPUImageDataLayout/bytesPerRow}} or |dataLayout|.{{GPUImageDataLayout/offset}}.
- 1. Let |contents| be the contents of the [=images=] seen by - viewing |dataBytes| with |dataLayout| and |size|. - - Issue: Specify more formally. - - Note: This is described as copying all of |data| to the device timeline, - but in practice |data| could be much larger than necessary. - Implementations should optimize by copying only the necessary bytes. 1. Issue the subsequent steps on the [=Queue timeline=] of |this|.
[=Queue timeline=] steps: - 1. Write |contents| into |destination|. + 1. Let |blockWidth| be the [=texel block width=] of |destination|.{{GPUImageCopyTexture/texture}}. + 1. Let |blockHeight| be the [=texel block height=] of |destination|.{{GPUImageCopyTexture/texture}}. + + 1. Let |dstOrigin| be |destination|.{{GPUImageCopyTexture/origin}}; + 1. Let |dstBlockOriginX| be (|dstOrigin|.[=GPUOrigin3D/x=] ÷ |blockWidth|). + 1. Let |dstBlockOriginY| be (|dstOrigin|.[=GPUOrigin3D/y=] ÷ |blockHeight|). + + 1. Let |blockColumns| be (|copySize|.[=GPUExtent3D/width=] ÷ |blockWidth|). + 1. Let |blockRows| be (|copySize|.[=GPUExtent3D/height=] ÷ |blockHeight|). + + 1. [=Assert=] that |dstBlockOriginX|, |dstBlockOriginY|, |blockColumns|, and |blockRows| are integers. - Issue: Define copy, including provision for snorm. + 1. For each |z| in the range [0, |copySize|.[=GPUExtent3D/depthOrArrayLayers=] − 1]: + 1. Let |dstSubregion| be [$texture copy sub-region$] (|z| + |dstOrigin|.[=GPUOrigin3D/z=]) of |destination|. + + 1. For each |y| in the range [0, |blockRows| − 1]: + 1. For each |x| in the range [0, |blockColumns| − 1]: + 1. Let |blockOffset| be the [$texel block byte offset$] of |dataLayout| for (|x|, |y|, |z|) of + |destination|.{{GPUImageCopyTexture/texture}}. + + 1. Set [=texel block=] (|dstBlockOriginX| + |x|, |dstBlockOriginY| + |y|) of + |dstSubregion| to be an [=equivalent texel representation=] to the [=texel block=] + described by |dataBytes| at offset |blockOffset|.
diff --git a/spec/sections/copies.bs b/spec/sections/copies.bs index a45e429b0f..19e12d57a0 100644 --- a/spec/sections/copies.bs +++ b/spec/sections/copies.bs @@ -67,8 +67,6 @@ This includes [[#copying-depth-stencil|copies]] to/from specific aspects of [=de stencil values are tightly packed in an array of bytes; depth values are tightly packed in an array of the appropriate type ("depth16unorm" or "depth32float"). -Issue: Define the exact copy semantics, by reference to common algorithms shared by the copy methods. -
: offset :: @@ -188,6 +186,19 @@ dictionary GPUImageCopyTexture { 1. Return aspect |copyTexture|.{{GPUImageCopyTexture/aspect}} of |textureMip|. +
+ The texel block byte offset of data described by {{GPUImageDataLayout}} |dataLayout| + corresponding to [=texel block=] |x|, |y| of depth slice or array layer |z| of a {{GPUTexture}} |texture| is + determined by running the following steps: + + 1. Let |blockBytes| be the [=texel block copy footprint=] of |texture|.{{GPUTexture/format}}. + 1. Let |imageOffset| be (|z| × |dataLayout|.{{GPUImageDataLayout/rowsPerImage}} × + |dataLayout|.{{GPUImageDataLayout/bytesPerRow}}) + |dataLayout|.{{GPUImageDataLayout/offset}}. + 1. Let |rowOffset| be (|y| × |dataLayout|.{{GPUImageDataLayout/bytesPerRow}}) + |imageOffset|. + 1. Let |blockOffset| be (|x| × |blockBytes|) + |rowOffset|. + 1. Return |blockOffset|. +
+
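The texel block byte offset steps above can be sketched in JavaScript as follows. This is only an illustration under stated assumptions: `texelBlockByteOffset` is a hypothetical helper, not part of the WebGPU API, and `blockBytes` stands in for the texel block copy footprint of the texture's format, which a real implementation would look up from the format.

```javascript
// Hypothetical helper mirroring the "texel block byte offset" steps.
// Assumption: `blockBytes` is the texel block copy footprint of the format,
// passed in directly instead of being derived from a GPUTexture.
function texelBlockByteOffset(dataLayout, blockBytes, x, y, z) {
  // Start of texel image z: whole images are rowsPerImage * bytesPerRow apart.
  const imageOffset =
    z * dataLayout.rowsPerImage * dataLayout.bytesPerRow + dataLayout.offset;
  // Start of texel block row y within that image.
  const rowOffset = y * dataLayout.bytesPerRow + imageOffset;
  // Blocks are tightly packed within a row.
  return x * blockBytes + rowOffset;
}

// Example values (illustrative only): a 4-byte-per-block format.
const layout = { offset: 16, bytesPerRow: 256, rowsPerImage: 4 };
const blockOffset = texelBlockByteOffset(layout, 4, 2, 1, 1);
console.log(blockOffset);
```

With these example values the image offset is 1 × 4 × 256 + 16 = 1040, the row offset is 1 × 256 + 1040 = 1296, and the block offset is 2 × 4 + 1296 = 1304.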
validating GPUImageCopyTexture(|imageCopyTexture|, |copySize|) @@ -267,8 +278,6 @@ dictionary GPUImageCopyTexture {
-Issue(gpuweb/gpuweb#69): Define the copies with {{GPUTextureDimension/1d}} and {{GPUTextureDimension/3d}} textures. -

`GPUImageCopyTextureTagged`

From 4512a31b8facf2780ac5b0a322b904f535b5702e Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Mon, 3 Jun 2024 20:32:39 -0700 Subject: [PATCH 090/285] Rename image->texel image, block row->texel block row (#4689) --- spec/sections/copies.bs | 32 +++++++++++++++++++------------- 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/spec/sections/copies.bs b/spec/sections/copies.bs index 19e12d57a0..c4ec03bf0c 100644 --- a/spec/sections/copies.bs +++ b/spec/sections/copies.bs @@ -46,22 +46,25 @@ dictionary GPUImageDataLayout { }; -An image is comprised of one or more rows of [=texel blocks=], referred to here as -block rows. Each [=block row=] of an [=image=] must contain the same number of -[=texel blocks=], and all [=texel blocks=] in an [=image=] are of the same {{GPUTextureFormat}}. +A texel image is comprised of one or more rows of [=texel blocks=], referred to here +as texel block rows. Each [=texel block row=] of a [=texel image=] must contain the +same number of [=texel blocks=], and all [=texel blocks=] in a [=texel image=] are of the same +{{GPUTextureFormat}}. -A {{GPUImageDataLayout}} is a layout of [=images=] within some linear memory. +A {{GPUImageDataLayout}} is a layout of [=texel images=] within some linear memory. It's used when copying data between a [=texture=] and a {{GPUBuffer}}, or when scheduling a write into a [=texture=] from the {{GPUQueue}}. -- For {{GPUTextureDimension/2d}} textures, data is copied between one or multiple contiguous [=images=] and [=array layers=]. -- For {{GPUTextureDimension/3d}} textures, data is copied between one or multiple contiguous [=images=] and depth [=slices=]. +- For {{GPUTextureDimension/2d}} textures, data is copied between one or multiple contiguous + [=texel images=] and [=array layers=]. +- For {{GPUTextureDimension/3d}} textures, data is copied between one or multiple contiguous + [=texel images=] and depth [=slices=]. 
Operations that copy between byte arrays and textures always operate on whole [=texel block=]. It's not possible to update only a part of a [=texel block=]. -[=Texel blocks=] are tightly packed within each [=block row=] in the linear memory layout of an -image copy, with each subsequent texel block immediately following the previous texel block, +[=Texel blocks=] are tightly packed within each [=texel block row=] in the linear memory layout of an +image copy, with each subsequent [=texel block=] immediately following the previous [=texel block=], with no padding. This includes [[#copying-depth-stencil|copies]] to/from specific aspects of [=depth-or-stencil format=] textures: stencil values are tightly packed in an array of bytes; @@ -76,17 +79,20 @@ depth values are tightly packed in an array of the appropriate type ("depth16uno : bytesPerRow :: - The stride, in bytes, between the beginning of each [=block row=] and the subsequent [=block row=]. + The stride, in bytes, between the beginning of each [=texel block row=] and the subsequent + [=texel block row=]. - Required if there are multiple [=block rows=] (i.e. the copy height or depth is more than one block). + Required if there are multiple [=texel block rows=] (i.e. the copy height or depth is more + than one block). : rowsPerImage :: - Number of [=block rows=] per single [=image=] of the [=texture=]. + Number of [=texel block rows=] per single [=texel image=] of the [=texture=]. {{GPUImageDataLayout/rowsPerImage}} × - {{GPUImageDataLayout/bytesPerRow}} is the stride, in bytes, between the beginning of each [=image=] of data and the subsequent [=image=]. + {{GPUImageDataLayout/bytesPerRow}} is the stride, in bytes, between the beginning of each + [=texel image=] of data and the subsequent [=texel image=]. - Required if there are multiple [=images=] (i.e. the copy depth is more than one). + Required if there are multiple [=texel images=] (i.e. the copy depth is more than one).

`GPUImageCopyBuffer` From b468f37a13b428a676990fe156dc66dd952aedd4 Mon Sep 17 00:00:00 2001 From: Teodor Tanasoaia <28601907+teoxoy@users.noreply.github.com> Date: Tue, 4 Jun 2024 05:35:55 +0200 Subject: [PATCH 091/285] Make `unmap()` device timeline validation explicit (#4679) Co-authored-by: Kai Ninomiya --- spec/index.bs | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/spec/index.bs b/spec/index.bs index ee3ecf3367..ab42da795b 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -3581,7 +3581,13 @@ The {{GPUMapMode}} flags determine how a {{GPUBuffer}} is mapped when calling
[=Device timeline=] steps: - 1. If |this| is [$invalid$], return. + 1. If any of the following conditions are unsatisfied, return. + +
+ - |this| is [$valid to use with$] |this|.{{GPUObjectBase/[[device]]}}. +
+ + 1. [=Assert=] |this|.{{GPUBuffer/[[internal state]]}} is "[=GPUBuffer/[[internal state]]/unavailable=]". 1. If |bufferUpdate| is not `null`: 1. Issue the following steps on the [=Queue timeline=] of |this|.{{GPUObjectBase/[[device]]}}.{{GPUDevice/queue}}: From 61a383a11bbd8ef246e605219314516c1231aafa Mon Sep 17 00:00:00 2001 From: munrocket Date: Tue, 4 Jun 2024 22:41:48 +0400 Subject: [PATCH 092/285] Adjusts subgroups built-in to atomics operations (#4627) This PR is part of #4528 which only changes two operations ``` subgroupSum -> subgroupAdd subgroupProduct -> subgroupMul ``` And making builtin's consistent with GLSL, WGSL atomics and wgpu. --- proposals/subgroups.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/proposals/subgroups.md b/proposals/subgroups.md index c98a04f1b3..70f8782028 100644 --- a/proposals/subgroups.md +++ b/proposals/subgroups.md @@ -97,10 +97,10 @@ Using f16 as a parameter in any of these functions requires `subgroups-f16` to b | `fn subgroupShuffleXor(v : T, mask : u32) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Returns `v` from the active invocation whose subgroup_invocation_id matches `subgroup_invocation_id ^ mask`.
`mask` must be dynamically uniform. | | `fn subgroupShuffleUp(v : T, delta : u32) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Returns `v` from the active invocation whose subgroup_invocation_id matches `subgroup_invocation_id - delta` | | `fn subgroupShuffleDown(v : T, delta : u32) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Returns `v` from the active invocation whose subgroup_invocation_id matches `subgroup_invocation_id + delta` | -| `fn subgroupSum(e : T) -> T` | `T` must be u32, i32, f32, or a vector of those types | Reduction
Adds `e` among all active invocations and returns that result | -| `fn subgroupExclusiveSum(e : T) -> T)` | `T` must be u32, i32, f32, f16 or a vector of those types | Exclusive scan
Returns the sum of `e` for all active invocations with subgroup_invocation_id less than this invocation | -| `fn subgroupProduct(e : T) -> T` | `T` must be u32, i32, f32, or a vector of those types | Reduction
Multiplies `e` among all active invocations and returns that result | -| `fn subgroupExclusiveProduct(e : T) -> T)` | `T` must be u32, i32, f32, f16 or a vector of those types | Exclusive scan
Returns the product of `e` for all active invocations with subgroup_invocation_id less than this invocation | +| `fn subgroupAdd(e : T) -> T` | `T` must be u32, i32, f32, or a vector of those types | Reduction
Adds `e` among all active invocations and returns that result | +| `fn subgroupExclusiveAdd(e : T) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Exclusive scan
Returns the sum of `e` for all active invocations with subgroup_invocation_id less than this invocation | +| `fn subgroupMul(e : T) -> T` | `T` must be u32, i32, f32, or a vector of those types | Reduction
Multiplies `e` among all active invocations and returns that result | +| `fn subgroupExclusiveMul(e : T) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Exclusive scan
Returns the product of `e` for all active invocations with subgroup_invocation_id less than this invocation | | `fn subgroupAnd(e : T) -> T` | `T` must be u32, i32, or a vector of those types | Reduction
Performs a bitwise and of `e` among all active invocations and returns that result | | `fn subgroupOr(e : T) -> T` | `T` must be u32, i32, or a vector of those types | Reduction
Performs a bitwise or of `e` among all active invocations and returns that result | | `fn subgroupXor(e : T) -> T` | `T` must be u32, i32, or a vector of those types | Reduction
Performs a bitwise xor of `e` among all active invocations and returns that result | @@ -231,10 +231,10 @@ D3D12 would have to be proven empricially. | `subgroupShuffleXor` | OpGroupNonUniformShuffleXor | simd_shuffle_xor | WaveReadLaneAt with index equal `subgroup_invocation_id ^ mask` | | `subgroupShuffleUp` | OpGroupNonUniformShuffleUp | simd_shuffle_up | WaveReadLaneAt with index equal `subgroup_invocation_id - delta` | | `subgroupShuffleDown` | OpGroupNonUniformShuffleDown | simd_shuffle_down | WaveReadLaneAt with index equal `subgroup_invocation_id + delta` | -| `subgroupSum` | OpGroupNonUniform[IF]Add with Reduce operation | simd_sum | WaveActiveSum | -| `subgroupExclusiveSum` | OpGroupNonUniform[IF]Add with ExclusiveScan operation | simd_prefix_exclusive_sum | WavePrefixSum | -| `subgroupProduct` | OpGroupNonUniform[IF]Mul with Reduce operation | simd_product | WaveActiveProduct | -| `subgroupExclusiveProduct` | OpGroupNonUniform[IF]Add with ExclusiveScan operation | simd_prefix_exclusive_product | WavePrefixProduct | +| `subgroupAdd` | OpGroupNonUniform[IF]Add with Reduce operation | simd_sum | WaveActiveSum | +| `subgroupExclusiveAdd` | OpGroupNonUniform[IF]Add with ExclusiveScan operation | simd_prefix_exclusive_sum | WavePrefixSum | +| `subgroupMul` | OpGroupNonUniform[IF]Mul with Reduce operation | simd_product | WaveActiveProduct | +| `subgroupExclusiveMul` | OpGroupNonUniform[IF]Mul with ExclusiveScan operation | simd_prefix_exclusive_product | WavePrefixProduct | | `subgroupAnd` | OpGroupNonUniformBitwiseAnd with Reduce operation | simd_and | WaveActiveBitAnd | | `subgroupOr` | OpGroupNonUniformBitwiseOr with Reduce operation | simd_or | WaveActiveBitOr | | `subgroupXor` | OpGroupNonUniformBitwiseXor with Reduce operation | simd_xor | WaveActiveBitXor | From 6b2e3015d1368d1644c05d6a4ead0f26e56d369e Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Thu, 6 Jun 2024 13:42:15 -0700 Subject: [PATCH 093/285] Add a definition for 64-bit unsigned ints 
(#4666) * Add a definition for 64-bit unsigned ints * Address David's feedback --- spec/index.bs | 5 +++-- wgsl/index.bs | 14 ++++++++++++++ 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index ab42da795b..3a35222c3e 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -104,6 +104,7 @@ spec: WGSL; urlPrefix: https://gpuweb.github.io/gpuweb/wgsl/# text: @binding; url: attribute-binding text: @group; url: attribute-group text: line break; url: line-break + text: 64-bit unsigned integer; url: 64-bit-integer for: address spaces text: workgroup; url: address-spaces-workgroup for: builtin @@ -13153,7 +13154,7 @@ When beginning a render pass, {{GPURenderPassDescriptor}}.{{GPURenderPassDescrip must be set to be able to use occlusion queries during the pass. An occlusion query is begun and ended by calling {{GPURenderPassEncoder/beginOcclusionQuery()}} and {{GPURenderPassEncoder/endOcclusionQuery()}} in pairs that cannot be nested, and resolved into a -{{GPUBuffer}} as 64-bit unsigned integer by {{GPUCommandEncoder}}.{{GPUCommandEncoder/resolveQuerySet()}}. +{{GPUBuffer}} as a [=64-bit unsigned integer=] by {{GPUCommandEncoder}}.{{GPUCommandEncoder/resolveQuerySet()}}. ## Timestamp Query ## {#timestamp} @@ -13162,7 +13163,7 @@ Timestamp queries allow applications to write timestamps to a {{GPUQuerySet}}, u - {{GPUComputePassDescriptor}}.{{GPUComputePassDescriptor/timestampWrites}} - {{GPURenderPassDescriptor}}.{{GPURenderPassDescriptor/timestampWrites}} -and then resolve timestamp values (in nanoseconds as a 64-bit unsigned integer) into +and then resolve timestamp values (in nanoseconds as a [=64-bit unsigned integer=]) into a {{GPUBuffer}}, using {{GPUCommandEncoder}}.{{GPUCommandEncoder/resolveQuerySet()}}. Timestamp values are implementation defined and may not increase monotonically. 
The physical device diff --git a/wgsl/index.bs b/wgsl/index.bs index ef3546b4e7..665f9fe400 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -10249,6 +10249,20 @@ host-shared buffer, then: Note: Recall that [=i32=] uses twos-complement representation, so the sign bit is in bit position 31. +64-bit integer layout: Some features of the WebGPU API write 64-bit +unsigned integer values into buffers. When such a value |V| appears at byte +offset |k| of a host-shared buffer, then: + * Byte |k| contains bits 0 through 7 of |V| + * Byte |k|+1 contains bits 8 through 15 of |V| + * Byte |k|+2 contains bits 16 through 23 of |V| + * Byte |k|+3 contains bits 24 through 31 of |V| + * Byte |k|+4 contains bits 32 through 39 of |V| + * Byte |k|+5 contains bits 40 through 47 of |V| + * Byte |k|+6 contains bits 48 through 55 of |V| + * Byte |k|+7 contains bits 56 through 63 of |V| + +Note: WGSL does not have a [=type/concrete=] [=64-bit integer=] type. + A value |V| of type [=f32=] is represented in [[!IEEE-754|IEEE-754]] binary32 format. It has one sign bit, 8 exponent bits, and 23 fraction bits. When |V| is placed at byte offset |k| of host-shared buffer, then: From 4e03db2a5c6b171f8f8a68eee7e5ee443ab1a818 Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Fri, 7 Jun 2024 13:52:48 -0700 Subject: [PATCH 094/285] Fixed baseVertex type for drawIndexedIndirect (#4691) --- spec/index.bs | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 3a35222c3e..ef25d10acb 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -11931,15 +11931,19 @@ It must only be included by interfaces which also include those mixins. See [[#rendering-operations]] for the detailed specification. The indirect drawIndexed parameters encoded in the buffer must be a - tightly packed block of **five 32-bit unsigned integer values (20 bytes total)**, given in - the same order as the arguments for {{GPURenderCommandsMixin/drawIndexed()}}. 
For example: + tightly packed block of **five 32-bit values (20 bytes total)**, given in the same order as + the arguments for {{GPURenderCommandsMixin/drawIndexed()}}. The value corresponding to + `baseVertex` is a signed 32-bit integer, and all others are unsigned 32-bit integers. + For example:
             let drawIndexedIndirectParameters = new Uint32Array(5);
+            let drawIndexedIndirectParametersSigned = new Int32Array(drawIndexedIndirectParameters.buffer);
             drawIndexedIndirectParameters[0] = indexCount;
             drawIndexedIndirectParameters[1] = instanceCount;
             drawIndexedIndirectParameters[2] = firstIndex;
-            drawIndexedIndirectParameters[3] = baseVertex;
+            // baseVertex is a signed value.
+            drawIndexedIndirectParametersSigned[3] = baseVertex;
             drawIndexedIndirectParameters[4] = firstInstance;
         
@@ -11994,7 +11998,7 @@ It must only be included by interfaces which also include those mixins. (|indirectOffset| + 4) bytes. 1. Let |firstIndex| be an unsigned 32-bit integer read from |indirectBuffer| at (|indirectOffset| + 8) bytes. - 1. Let |baseVertex| be an unsigned 32-bit integer read from |indirectBuffer| at + 1. Let |baseVertex| be a signed 32-bit integer read from |indirectBuffer| at (|indirectOffset| + 12) bytes. 1. Let |firstInstance| be an unsigned 32-bit integer read from |indirectBuffer| at (|indirectOffset| + 16) bytes. @@ -12729,7 +12733,7 @@ GPUQueue includes GPUObjectBase; 1. [=?=] [$validate GPUOrigin3D shape$](|destination|.{{GPUImageCopyTexture/origin}}). 1. [=?=] [$validate GPUExtent3D shape$](|size|). 1. Let |dataBytes| be [=get a copy of the buffer source|a copy of the bytes held by the buffer source=] |data|. - + Note: This is described as copying all of |data| to the device timeline, but in practice |data| could be much larger than necessary. Implementations should optimize by copying only the necessary bytes. From 7b63ba7bd8eea92d9965e29ec54f58181c8fb59b Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Fri, 7 Jun 2024 15:46:31 -0700 Subject: [PATCH 095/285] Clarify that using the standard sample patterns is mandatory (#4693) --- spec/index.bs | 3 +++ 1 file changed, 3 insertions(+) diff --git a/spec/index.bs b/spec/index.bs index ef25d10acb..70157c51dc 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -14712,6 +14712,9 @@ such that the pixel ranges from (0, 0) to (1, 1): Sample 3: (0.625, 0.875)

+Implementations must use the [=standard sample pattern=] for the given +{{GPURenderPipelineDescriptor/multisample}}.{{GPUMultisampleState/count}} when performing rasterization. + Let's define a FragmentDestination to contain:
: position From 3e6b8e93a899c14e4099435e0df22c6167ffb606 Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Fri, 7 Jun 2024 17:25:21 -0700 Subject: [PATCH 096/285] More rigorously define the relationship between device/GPUDevice (#4699) --- spec/index.bs | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 70157c51dc..a18f4317fb 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -1222,6 +1222,14 @@ A [=device=] has the following [=immutable properties=]: No [=limit/better=] limits can be used, even if the underlying [=adapter=] can support them.
+A [=device=] also has the following [=content timeline property=]: + +
+ : [[content device]], of type {{GPUDevice}}, readonly + :: + The [=Content timeline=] {{GPUDevice}} interface which this device is associated with. +
+
When a new device |device| is created from [=adapter=] |adapter| with {{GPUDeviceDescriptor}} |descriptor|, run the following [=device timeline=] steps: @@ -1257,10 +1265,7 @@ no validation errors are raised, most promises resolve normally, etc. To lose the device(|device|, |reason|) run the following [=device timeline=] steps: 1. [$Invalidate$] |device|. - 1. Let |gpuDevice| be the [=content timeline=] {{GPUDevice}} corresponding to |device|. - - Issue: Define this more rigorously. - 1. Issue the following steps on the [=content timeline=] of |gpuDevice|: + 1. Issue the following steps on the [=content timeline=] of |device|.{{device/[[content device]]}}:
1. Resolve |device|.{{GPUDevice/lost}} with a new {{GPUDeviceLostInfo}} with {{GPUDeviceLostInfo/reason}} set to |reason| and @@ -2506,7 +2511,14 @@ interface GPUAdapter {
[=Content timeline=] steps: - 1. [=Resolve=] |promise| with a new {{GPUDevice}} object |device|. + + 1. Let |gpuDevice| be a new {{GPUDevice}} instance. + 1. Set |gpuDevice|.{{GPUObjectBase/[[device]]}} to |device|. + 1. Set |device|.{{device/[[content device]]}} to |gpuDevice|. + 1. Set |gpuDevice|.{{GPUObjectBase/label}} to |descriptor|.{{GPUObjectDescriptorBase/label}}. + + 1. [=Resolve=] |promise| with |gpuDevice|. Note: If the device is already lost because the adapter could not fulfill the request, From 174f42c6d4d7d5c3acd79b2b65ee714ccc7eb153 Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Mon, 10 Jun 2024 15:47:28 -0700 Subject: [PATCH 097/285] Remove outdated inline issue (#4703) --- spec/index.bs | 2 -- 1 file changed, 2 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index a18f4317fb..28d096726b 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -7807,8 +7807,6 @@ dictionary GPURenderPipelineDescriptor 1. Set |pipeline|.{{GPURenderPipeline/[[writesStencil]]}} to true. 1. Set |pipeline|.{{GPUPipelineBase/[[layout]]}} to |layout|.
- - Issue: need description of the render states.
: createRenderPipelineAsync(descriptor) From a99e48e6e2f8b5af2c9d63797f22116e52a52670 Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Tue, 11 Jun 2024 13:07:12 -0700 Subject: [PATCH 098/285] Add queue steps for copyExternalImageToTexture (#4698) * Add queue steps for copyExternalImageToTexture * Address feedback from Kai * Add back equivalent texel language * Update spec/index.bs --------- Co-authored-by: Kai Ninomiya --- spec/index.bs | 28 +++++++++++++++++++++++++++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/spec/index.bs b/spec/index.bs index 28d096726b..f8e1489a05 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -12894,7 +12894,33 @@ GPUQueue includes GPUObjectBase; - {{GPUTextureFormat/"rgba16float"}} - {{GPUTextureFormat/"rgba32float"}}
- 1. Issue: Do the actual copy. + + 1. If |copySize|.[=GPUExtent3D/depthOrArrayLayers=] is > 0, issue the subsequent + steps on the [=Queue timeline=] of |this|. +
+
+ [=Queue timeline=] steps: + + 1. [=Assert=] that the [=texel block width=] of |destination|.{{GPUImageCopyTexture/texture}} is 1, + the [=texel block height=] of |destination|.{{GPUImageCopyTexture/texture}} is 1, and that + |copySize|.[=GPUExtent3D/depthOrArrayLayers=] is 1. + + 1. Let |srcOrigin| be |source|.{{GPUImageCopyExternalImage/origin}}. + 1. Let |dstOrigin| be |destination|.{{GPUImageCopyTexture/origin}}. + 1. Let |dstSubregion| be [$texture copy sub-region$] (|dstOrigin|.[=GPUOrigin3D/z=]) of |destination|. + + 1. For each |y| in the range [0, |copySize|.[=GPUExtent3D/height=] − 1]: + 1. Let |srcY| be |y| if |source|.{{GPUImageCopyExternalImage/flipY}} is `false` and + (|copySize|.[=GPUExtent3D/height=] − 1 − |y|) otherwise. + 1. For each |x| in the range [0, |copySize|.[=GPUExtent3D/width=] − 1]: + 1. Set [=texel block=] + (|dstOrigin|.[=GPUOrigin3D/x=] + |x|, |dstOrigin|.[=GPUOrigin3D/y=] + |y|) of + |dstSubregion| to be an [=equivalent texel representation=] of the pixel at + (|srcOrigin|.[=GPUOrigin2D/x=] + |x|, |srcOrigin|.[=GPUOrigin2D/y=] + |srcY|) of + |source|.{{GPUImageCopyExternalImage/source}} after applying any + [[#color-space-conversions|color encoding]] required by + |destination|.{{GPUImageCopyTextureTagged/colorSpace}} and + |destination|.{{GPUImageCopyTextureTagged/premultipliedAlpha}}.
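The per-pixel loop in the queue timeline steps above, including the `flipY` row mapping, can be sketched on plain arrays. This is a minimal illustration and not the spec's algorithm verbatim: `copyWithFlipY` is a hypothetical function, `src` and `dst` are ordinary 2D arrays standing in for the external image and the destination sub-region, and the color space and premultiplied-alpha conversion is omitted.

```javascript
// Hypothetical stand-in for the copy loop: `src` and `dst` are arrays of rows,
// not real GPU resources, and no color conversion is applied (assumption).
function copyWithFlipY(src, srcOrigin, dst, dstOrigin, copySize, flipY) {
  for (let y = 0; y < copySize.height; y++) {
    // With flipY, destination row y reads from the mirrored source row
    // inside the copy rectangle; otherwise rows map one to one.
    const srcY = flipY ? copySize.height - 1 - y : y;
    for (let x = 0; x < copySize.width; x++) {
      dst[dstOrigin.y + y][dstOrigin.x + x] =
        src[srcOrigin.y + srcY][srcOrigin.x + x];
    }
  }
  return dst;
}

const src = [
  ["a", "b"],
  ["c", "d"],
  ["e", "f"],
];
const dst = [
  [".", ".", "."],
  [".", ".", "."],
  [".", ".", "."],
];
// Copy the bottom two source rows, flipped, into the top-right of dst.
copyWithFlipY(src, { x: 0, y: 1 }, dst, { x: 1, y: 0 }, { width: 2, height: 2 }, true);
console.log(dst.map((row) => row.join("")).join("\n"));
```

Note that the flip is applied relative to the copy rectangle (rows `srcOrigin.y` through `srcOrigin.y + height − 1`), not relative to the full source image, which matches how |srcY| is derived from |copySize| in the steps.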
From 0e9ee00cbbe5648f81bfe3e0db9110639472f148 Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Tue, 11 Jun 2024 13:08:51 -0700 Subject: [PATCH 099/285] Improve render bundle steps (#4702) * Improve render bundle steps * Address Kai's feedback * One more bit of feedback to address --- spec/index.bs | 66 ++++++++++++++++++++++++++------------------------- 1 file changed, 34 insertions(+), 32 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index f8e1489a05..55a51da0c6 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -10686,7 +10686,7 @@ dictionary GPUComputePassDescriptor |this|.device.limits.{{supported limits/maxComputeWorkgroupsPerDimension}}.
- 1. Let |passState| be a snapshot of |this|'s current state. + 1. Let |bindingState| be a snapshot of |this|'s current state. 1. [$Enqueue a command$] on |this| which issues the subsequent steps on the [=Queue timeline=].
@@ -10694,8 +10694,8 @@ dictionary GPUComputePassDescriptor [=Queue timeline=] steps: 1. Execute a grid of workgroups with dimensions [|workgroupCountX|, |workgroupCountY|, - |workgroupCountZ|] with |passState|.{{GPUComputePassEncoder/[[pipeline]]}} using - |passState|.{{GPUBindingCommandsMixin/[[bind_groups]]}}. + |workgroupCountZ|] with |bindingState|.{{GPUComputePassEncoder/[[pipeline]]}} using + |bindingState|.{{GPUBindingCommandsMixin/[[bind_groups]]}}.
@@ -10752,7 +10752,7 @@ dictionary GPUComputePassDescriptor
1. Add |indirectBuffer| to the [=usage scope=] as [=internal usage/input=]. - 1. Let |passState| be a snapshot of |this|'s current state. + 1. Let |bindingState| be a snapshot of |this|'s current state. 1. [$Enqueue a command$] on |this| which issues the subsequent steps on the [=Queue timeline=].
@@ -10769,8 +10769,8 @@ dictionary GPUComputePassDescriptor |this|.device.limits.{{supported limits/maxComputeWorkgroupsPerDimension}}, stop. 1. Execute a grid of workgroups with dimensions [|workgroupCountX|, |workgroupCountY|, - |workgroupCountZ|] with |passState|.{{GPUComputePassEncoder/[[pipeline]]}} using - |passState|.{{GPUBindingCommandsMixin/[[bind_groups]]}}. + |workgroupCountZ|] with |bindingState|.{{GPUComputePassEncoder/[[pipeline]]}} using + |bindingState|.{{GPUBindingCommandsMixin/[[bind_groups]]}}.
@@ -11777,7 +11777,7 @@ It must only be included by interfaces which also include those mixins.
1. Increment |this|.{{GPURenderCommandsMixin/[[drawCount]]}} by 1. - 1. Let |passState| be a snapshot of |this|'s current state. + 1. Let |bindingState| be a snapshot of |this|'s current state. 1. [$Enqueue a render command$] on |this| which issues the subsequent steps on the [=Queue timeline=] with |renderState| when executed.
@@ -11786,7 +11786,7 @@ It must only be included by interfaces which also include those mixins. 1. Draw |instanceCount| instances, starting with instance |firstInstance|, of primitives consisting of |vertexCount| verticies, starting with vertex |firstVertex|, - with the states from |passState| and |renderState|. + with the states from |bindingState| and |renderState|.
@@ -11838,7 +11838,7 @@ It must only be included by interfaces which also include those mixins.
1. Increment |this|.{{GPURenderCommandsMixin/[[drawCount]]}} by 1. - 1. Let |passState| be a snapshot of |this|'s current state. + 1. Let |bindingState| be a snapshot of |this|'s current state. 1. [$Enqueue a render command$] on |this| which issues the subsequent steps on the [=Queue timeline=] with |renderState| when executed.
@@ -11848,7 +11848,7 @@ It must only be included by interfaces which also include those mixins. 1. Draw |instanceCount| instances, starting with instance |firstInstance|, of primitives consisting of |indexCount| indexed verticies, starting with index |firstIndex| from vertex |baseVertex|, - with the states from |passState| and |renderState|. + with the states from |bindingState| and |renderState|.
Note: a valid program should also never use vertex indices with @@ -11914,7 +11914,7 @@ It must only be included by interfaces which also include those mixins. 1. Add |indirectBuffer| to the [=usage scope=] as [=internal usage/input=]. 1. Increment |this|.{{GPURenderCommandsMixin/[[drawCount]]}} by 1. - 1. Let |passState| be a snapshot of |this|'s current state. + 1. Let |bindingState| be a snapshot of |this|'s current state. 1. [$Enqueue a render command$] on |this| which issues the subsequent steps on the [=Queue timeline=] with |renderState| when executed.
@@ -11931,7 +11931,7 @@ It must only be included by interfaces which also include those mixins. (|indirectOffset| + 12) bytes. 1. Draw |instanceCount| instances, starting with instance |firstInstance|, of primitives consisting of |vertexCount| verticies, starting with vertex |firstVertex|, - with the states from |passState| and |renderState|. + with the states from |bindingState| and |renderState|.
@@ -11995,7 +11995,7 @@ It must only be included by interfaces which also include those mixins. 1. Add |indirectBuffer| to the [=usage scope=] as [=internal usage/input=]. 1. Increment |this|.{{GPURenderCommandsMixin/[[drawCount]]}} by 1. - 1. Let |passState| be a snapshot of |this|'s current state. + 1. Let |bindingState| be a snapshot of |this|'s current state. 1. [$Enqueue a render command$] on |this| which issues the subsequent steps on the [=Queue timeline=] with |renderState| when executed.
@@ -12015,7 +12015,7 @@ It must only be included by interfaces which also include those mixins. 1. Draw |instanceCount| instances, starting with instance |firstInstance|, of primitives consisting of |indexCount| indexed verticies, starting with index |firstIndex| from vertex |baseVertex|, - with the states from |passState| and |renderState|. + with the states from |bindingState| and |renderState|.
@@ -12384,30 +12384,34 @@ attachments used by this encoder. 1. For each |bundle| in |bundles|: 1. Increment |this|.{{GPURenderCommandsMixin/[[drawCount]]}} by |bundle|.{{GPURenderBundle/[[drawCount]]}}. + 1. [$Enqueue a render command$] on |this| which issues the following steps on the + [=Queue timeline=] with |renderState| when executed: - 1. [=map/Clear=] |this|.{{GPUBindingCommandsMixin/[[bind_groups]]}}. - 1. Set |this|.{{GPURenderCommandsMixin/[[pipeline]]}} to `null`. - 1. Set |this|.{{GPURenderCommandsMixin/[[index_buffer]]}} to `null`. - 1. [=map/Clear=] |this|.{{GPURenderCommandsMixin/[[vertex_buffers]]}}. +
+ [=Queue timeline=] steps: - 1. Let |passState| be a snapshot of |this|'s current state. - 1. [$Enqueue a render command$] on |this| which issues the subsequent steps on the - [=Queue timeline=] with |renderState| when executed. -
-
- [=Queue timeline=] steps: + 1. Execute each command in |bundle|.{{GPURenderBundle/[[command_list]]}} + with |renderState|. - 1. For each |bundle| in |bundles|: - 1. Execute each command in |bundle|.{{GPURenderBundle/[[command_list]]}} - with |passState| and |renderState|. + Note: |renderState| cannot be changed by executing render bundles. Binding state was + already captured at bundle encoding time, and so isn't used when executing bundles. +
- Note: |renderState| cannot be changed by executing render bundles. - Also note, no mutable |passState| state is visible to render bundles. + 1. [$Reset the render pass binding state$] of |this|.
-
+
+ To Reset the render pass binding state of {{GPURenderPassEncoder}} |encoder| run + the following [=device timeline=] steps: + + 1. [=map/Clear=] |encoder|.{{GPUBindingCommandsMixin/[[bind_groups]]}}. + 1. Set |encoder|.{{GPURenderCommandsMixin/[[pipeline]]}} to `null`. + 1. Set |encoder|.{{GPURenderCommandsMixin/[[index_buffer]]}} to `null`. + 1. [=map/Clear=] |encoder|.{{GPURenderCommandsMixin/[[vertex_buffers]]}}. +
+ # Bundles # {#bundles} A bundle is a partial, limited pass that is encoded once and can then be executed multiple times as @@ -12523,8 +12527,6 @@ GPURenderBundleEncoder includes GPURenderCommandsMixin; 1. Set |e|.{{GPURenderCommandsMixin/[[stencilReadOnly]]}} to |descriptor|.{{GPURenderBundleEncoderDescriptor/stencilReadOnly}}. 1. Set |e|.{{GPUCommandsMixin/[[state]]}} to "[=encoder state/open=]". 1. Set |e|.{{GPURenderCommandsMixin/[[drawCount]]}} to 0. - - Issue: Describe the reset of the steps for {{GPUDevice/createRenderBundleEncoder()}}.
From 538a4c75b819a48b0d4f631d68be7ce24c379e8c Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Thu, 13 Jun 2024 13:37:14 -0700 Subject: [PATCH 100/285] Algorithm steps for attachment load/store (#4706) * Algorithm steps for attachment load/store * Address Kai's feedback --- spec/index.bs | 144 ++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 140 insertions(+), 4 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 55a51da0c6..56e4d456f0 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -9598,6 +9598,7 @@ dictionary GPUCommandEncoderDescriptor
1. Set |pass|.{{GPURenderCommandsMixin/[[drawCount]]}} to 0. 1. Set |pass|.{{GPURenderPassEncoder/[[maxDrawCount]]}} to |descriptor|.{{GPURenderPassDescriptor/maxDrawCount}}. + 1. Set |pass|.{{GPURenderPassEncoder/[[maxDrawCount]]}} to |descriptor|.{{GPURenderPassDescriptor/maxDrawCount}}. 1. [$Enqueue a command$] on |this| which issues the subsequent steps on the [=Queue timeline=] when executed. @@ -9607,10 +9608,72 @@ dictionary GPUCommandEncoderDescriptor 1. Let the {{GPUCommandBuffer/[[renderState]]}} of the currently executing {{GPUCommandBuffer}} be a new [=RenderState=]. - 1. Issue: Perform attachment loads/clears. -
- Issue: specify the behavior of read-only depth/stencil + 1. Set {{GPUCommandBuffer/[[renderState]]}}.{{RenderState/[[colorAttachments]]}} to + |descriptor|.{{GPURenderPassDescriptor/colorAttachments}}. + 1. Set {{GPUCommandBuffer/[[renderState]]}}.{{RenderState/[[depthStencilAttachment]]}} to + |descriptor|.{{GPURenderPassDescriptor/depthStencilAttachment}}. + + 1. For each non-`null` |colorAttachment| in |descriptor|.{{GPURenderPassDescriptor/colorAttachments}}: + 1. Let |colorView| be |colorAttachment|.{{GPURenderPassColorAttachment/view}}. + 1. If |colorView|.{{GPUTextureView/[[descriptor]]}}.{{GPUTextureViewDescriptor/dimension}} is: +
+ : {{GPUTextureViewDimension/"3d"}} + :: + Let |colorSubregion| be |colorAttachment|.{{GPURenderPassColorAttachment/depthSlice}} of + |colorView|. + + : Otherwise + :: + Let |colorSubregion| be |colorView|. +
+ + 1. If |colorAttachment|.{{GPURenderPassColorAttachment/loadOp}} is: +
+ : {{GPULoadOp/"load"}} + :: + Ensure the contents of |colorSubregion| are loaded into the [=framebuffer memory=] + associated with |colorSubregion|. + + : {{GPULoadOp/"clear"}} + :: + Set every [=texel block|texel=] of the [=framebuffer memory=] associated with + |colorSubregion| to |colorAttachment|.{{GPURenderPassColorAttachment/clearValue}}. +
+ + 1. If |depthStencilAttachment| is not `null`: + 1. If |depthStencilAttachment|.{{GPURenderPassDepthStencilAttachment/depthLoadOp}} is: +
+ : {{GPULoadOp/"load"}} + :: + Ensure the contents of the [=aspect/depth=] [=GPUTextureView/subresource=] of + |depthStencilView| are loaded into the [=framebuffer memory=] associated with + |depthStencilView|. + + : {{GPULoadOp/"clear"}} + :: + Set every [=texel block|texel=] of the [=framebuffer memory=] associated with the + [=aspect/depth=] [=GPUTextureView/subresource=] of |depthStencilView| to + |depthStencilAttachment|.{{GPURenderPassDepthStencilAttachment/depthClearValue}}. +
+ + 1. If |depthStencilAttachment|.{{GPURenderPassDepthStencilAttachment/stencilLoadOp}} is: +
+ : {{GPULoadOp/"load"}} + :: + Ensure the contents of the [=aspect/stencil=] [=GPUTextureView/subresource=] of + |depthStencilView| are loaded into the [=framebuffer memory=] associated with + |depthStencilView|. + + : {{GPULoadOp/"clear"}} + :: + Set every [=texel block|texel=] of the [=framebuffer memory=] associated with the + [=aspect/stencil=] [=GPUTextureView/subresource=] of |depthStencilView| to + |depthStencilAttachment|.{{GPURenderPassDepthStencilAttachment/stencilClearValue}}. +
+ + Issue: specify the behavior of read-only depth/stencil +
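As an informal illustration of the load/clear steps in this hunk (it is not part of the patch), the per-attachment decision can be sketched in Python. `Attachment`, `apply_load_op`, `begin_pass`, and the plain list standing in for framebuffer memory are hypothetical names chosen for this sketch; a real implementation works on GPU memory rather than Python lists.

```python
from dataclasses import dataclass


@dataclass
class Attachment:
    """Hypothetical stand-in for a color attachment and its load state."""
    texture_texels: list                        # texels currently stored in the subregion
    load_op: str                                # "load" or "clear", mirroring GPULoadOp
    clear_value: tuple = (0.0, 0.0, 0.0, 0.0)   # used only when load_op == "clear"


def apply_load_op(attachment: Attachment) -> list:
    """Return the initial framebuffer memory for one attachment."""
    if attachment.load_op == "load":
        # "load": framebuffer memory starts out with the existing texel contents.
        return list(attachment.texture_texels)
    if attachment.load_op == "clear":
        # "clear": every texel of framebuffer memory is set to the clear value.
        return [attachment.clear_value for _ in attachment.texture_texels]
    raise ValueError(f"unknown load op: {attachment.load_op!r}")


def begin_pass(color_attachments: list) -> list:
    """Apply the load op of each non-null attachment; None entries are skipped."""
    return [None if a is None else apply_load_op(a) for a in color_attachments]
```

The depth and stencil aspects follow the same pattern, each with its own load op and clear value.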
: beginComputePass(descriptor) @@ -10915,8 +10978,20 @@ When executing encoded render pass commands as part of a {{GPUCommandBuffer}}, a : \[[stencilReference]], of type {{GPUStencilValue}} :: Current stencil reference value, initially `0`. + + : \[[colorAttachments]], of type [=sequence=]<{{GPURenderPassColorAttachment}}?> + :: The color attachments and state for this render pass. + + : \[[depthStencilAttachment]], of type {{GPURenderPassDepthStencilAttachment}}? + :: The depth/stencil attachment and state for this render pass. +Render passes also have framebuffer memory, which contains the [=texel block|texel=] data associated with +each attachment that is written into by draw commands and read from for blending and depth/stencil testing. + +Note: Depending on the GPU hardware, [=framebuffer memory=] may be the memory allocated by the attachment textures or +may be a separate area of memory that the texture data is copied to and from, such as with tile-based architectures. + ### Render Pass Encoder Creation ### {#render-pass-encoder-creation} @@ -8448,12 +8478,14 @@ enum GPUBlendFactor { GPUBlendFactor Blend factor RGBA components + [=Feature=] "zero" (0, 0, 0, 0) + "one" (1, 1, 1, 1) @@ -8490,6 +8522,19 @@ enum GPUBlendFactor { "one-minus-constant" (1 - Rconst, 1 - Gconst, 1 - Bconst, 1 - Aconst) + + "src1" + (Rsrc1, Gsrc1, Bsrc1, Asrc1) + {{GPUFeatureName/dual-source-blending}} + + "one-minus-src1" + (1 - Rsrc1, 1 - Gsrc1, 1 - Bsrc1, 1 - Asrc1) + + "src1-alpha" + (Asrc1, Asrc1, Asrc1, Asrc1) + + "one-minus-src1-alpha" + (1 - Asrc1, 1 - Asrc1, 1 - Asrc1, 1 - Asrc1) @@ -15620,6 +15665,23 @@ This feature adds the following [=optional API surfaces=]: - New WGSL extensions: - [=extension/clip_distances=] +

`"dual-source-blending"` +

+ +Allows the use of [=blend_src=] in WGSL and simultaneously using both pixel shader outputs +(`@blend_src(0)` and `@blend_src(1)`) as inputs to a blending operation with the single color +attachment at [=location=] `0`. + +This feature adds the following [=optional API surfaces=]: +- Allows the use of the below {{GPUBlendFactor}}s: + - {{GPUBlendFactor/"src1"}} + - {{GPUBlendFactor/"one-minus-src1"}} + - {{GPUBlendFactor/"src1-alpha"}} + - {{GPUBlendFactor/"one-minus-src1-alpha"}} + +- New WGSL extensions: + - [=extension/dual_source_blending=] + # Appendices # {#appendices} ## Texture Format Capabilities ## {#texture-format-caps} diff --git a/wgsl/index.bs b/wgsl/index.bs index 665f9fe400..9d7666b64a 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -1335,6 +1335,7 @@ The [=syntax/attribute=] names are: * `'interpolate'` * `'invariant'` * `'location'` +* `'blend_src'` * `'must_use'` * `'size'` * `'vertex'` @@ -1758,6 +1759,10 @@ The valid [=enable-extensions=] are listed in the following table. The built-in variable [=built-in values/clip_distances=] is valid to use in the WGSL module. Otherwise, using [=built-in values/clip_distances=] will result in a [=shader-creation error=]. + `dual_source_blending` + `"dual_source_blending"` + The attribute [=attribute/blend_src=] is valid to use in the WGSL module. Otherwise, using + [=attribute/blend_src=] will result in a [=shader-creation error=].
@@ -3106,12 +3111,13 @@ The following attributes can be applied to structure members: * [=attribute/align=] * [=attribute/builtin=] * [=attribute/location=] + * [=attribute/blend_src=] * [=attribute/interpolate=] * [=attribute/invariant=] * [=attribute/size=] -Attributes [=attribute/builtin=], [=attribute/location=], [=attribute/interpolate=], and [=attribute/invariant=] -are [=IO attributes=]. +Attributes [=attribute/builtin=], [=attribute/location=], [=attribute/blend_src=], +[=attribute/interpolate=], and [=attribute/invariant=] are [=IO attributes=]. An [=IO attribute=] on a member of a structure *S* has effect only when *S* is used as the type of a [=formal parameter=] or [=return type=] of an [=entry point=]. See [[#stage-inputs-outputs]]. @@ -8020,6 +8026,7 @@ WGSL defines the following attributes that can be applied to function parameters and return types: * [=attribute/builtin=] * [=attribute/location=] + * [=attribute/blend_src=] * [=attribute/interpolate=] * [=attribute/invariant=] @@ -8411,6 +8418,33 @@ path: syntax/binding_attr.syntax.bs.include +## `blend_src` ## {#blend-src-attr} + +
+path: syntax/blend_src_attr.syntax.bs.include
+
+ + + + + +
`blend_src` Attribute
Description + + Specifies a part of the [=fragment=] output when the feature [=extension/dual_source_blending=] + is enabled. + See [[#input-output-locations]]. + + [=shader-creation error|Must=] only be applied to a member of a [=structure=] type with a + [=attribute/location=] attribute. + [=shader-creation error|Must=] only be applied to declarations of objects with [=numeric scalar=] + or [=numeric vector=] type. + [=shader-creation error|Must=] only be used as an output of the [=fragment=] shader stage. + +
Parameters + [=shader-creation error|Must=] be a [=const-expression=] that [=resolves=] to an [=i32=] or [=u32=] with value of `0` or `1`. + +
+ ## `builtin` ## {#builtin-attr}
@@ -8922,6 +8956,7 @@ or to further describe the properties of an input or output.
 The IO attributes are:
 * [=attribute/builtin=]
 * [=attribute/location=]
+* [=attribute/blend_src=]
 * [=attribute/interpolate=]
 * [=attribute/invariant=]
 
@@ -9429,11 +9464,21 @@ Each user-defined [=user-defined input datum|input=] and [=user-defined output d
 Each structure member in the entry point IO [=shader-creation error|must=] be one of either a built-in value
 (see [[#builtin-inputs-outputs]]), or assigned a location.
 
-Locations [=shader-creation error|must not=] overlap within each of the following sets:
-* Members within a structure type.
-    This applies to any structure, not just those used in shader stage inputs or outputs.
-* An entry point's shader stage inputs,
-    i.e. locations for its formal parameters, or for the members of its formal parameters of structure type.
+
+
+ For each entry point defined in a WGSL module, let |inputs| be its set of shader stage inputs + (i.e. locations for its formal parameters, or for the members of its formal parameters of structure type). + - |inputs| [=shader-creation error|must not=] contain two entries with the same [=attribute/location=] value. +
+
+ For each structure type |S| defined in a WGSL module (not just those used in shader stage inputs or outputs), + let |members| be the set of members of |S| that have [=attribute/location=] attributes. + - If any entry in |members| specifies a [=attribute/blend_src=] attribute: + - |members| [=shader-creation error|must=] contain exactly `2` entries, + one with `@location(0) @blend_src(0)` and one with `@location(0) @blend_src(1)`. + - All the |members| [=shader-creation error|must=] have the same data type. + - Otherwise, |members| [=shader-creation error|must not=] contain two entries with the same [=attribute/location=] value. +
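The structure rule above can be sketched as a small checker. This is an illustration only, not text from the patch: `check_struct_locations`, the tuple layout, and the error strings are hypothetical, and types are compared by name for simplicity.

```python
def check_struct_locations(members):
    """Validate the location-bearing members of one structure.

    `members` is a list of (location, blend_src, type_name) tuples, where
    blend_src is None for a member without a @blend_src attribute.
    Returns a list of error strings; an empty list means the rule is satisfied.
    """
    errors = []
    uses_blend_src = any(src is not None for _, src, _ in members)
    if uses_blend_src:
        # Exactly two members: @location(0) @blend_src(0) and @location(0) @blend_src(1).
        keys = {(loc, src) for loc, src, _ in members}
        if len(members) != 2 or keys != {(0, 0), (0, 1)}:
            errors.append(
                "expected exactly @location(0) @blend_src(0) and @location(0) @blend_src(1)"
            )
        # Both members must share one data type.
        if len({type_name for _, _, type_name in members}) > 1:
            errors.append("blend_src members must have the same data type")
    else:
        # Without @blend_src, location values must simply be unique.
        locations = [loc for loc, _, _ in members]
        if len(locations) != len(set(locations)):
            errors.append("duplicate @location value")
    return errors
```

For instance, `check_struct_locations([(0, 0, "vec4<f32>"), (0, 1, "vec4<f32>")])` reports no errors, while mixing a `@blend_src` member with a plain `@location(1)` member does.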
Note: Location numbering is distinct between inputs and outputs: Location numbers for an entry point's shader stage inputs do not conflict with location numbers for the entry point's shader stage outputs. diff --git a/wgsl/syntax.bnf b/wgsl/syntax.bnf index cfdf33f7e4..42c64a9c15 100644 --- a/wgsl/syntax.bnf +++ b/wgsl/syntax.bnf @@ -109,6 +109,10 @@ binding_attr : '@' 'binding' '(' expression ',' ? ')' ; +blend_src_attr : + '@' 'blend_src' '(' expression ',' ? ')' +; + builtin_attr : '@' 'builtin' '(' builtin_value_name ',' ? ')' ; @@ -184,6 +188,7 @@ attribute : '@' ident_pattern_token argument_expression_list ? | align_attr | binding_attr +| blend_src_attr | builtin_attr | const_attr | diagnostic_attr diff --git a/wgsl/syntax/attribute.syntax.bs.include b/wgsl/syntax/attribute.syntax.bs.include index f920f354b8..044e1b1a4d 100644 --- a/wgsl/syntax/attribute.syntax.bs.include +++ b/wgsl/syntax/attribute.syntax.bs.include @@ -7,6 +7,8 @@ | [=syntax/binding_attr=] + | [=syntax/blend_src_attr=] + | [=syntax/builtin_attr=] | [=syntax/const_attr=] diff --git a/wgsl/syntax/blend_src_attr.syntax.bs.include b/wgsl/syntax/blend_src_attr.syntax.bs.include new file mode 100644 index 0000000000..1a60f36175 --- /dev/null +++ b/wgsl/syntax/blend_src_attr.syntax.bs.include @@ -0,0 +1,5 @@ +
+ blend_src_attr : + + `'@'` `'blend_src'` `'('` [=syntax/expression=] `','` ? `')'` +
diff --git a/wgsl/wgsl.recursive.bs.include b/wgsl/wgsl.recursive.bs.include index 5027ed9b23..d8ffdda1ac 100644 --- a/wgsl/wgsl.recursive.bs.include +++ b/wgsl/wgsl.recursive.bs.include @@ -45,6 +45,8 @@ | `'@'` `'binding'` `'('` [=recursive descent syntax/expression=] `','` ? `')'` + | `'@'` `'blend_src'` `'('` [=recursive descent syntax/expression=] `','` ? `')'` + | `'@'` `'builtin'` `'('` [=syntax/ident_pattern_token=] `','` ? `')'` | `'@'` `'diagnostic'` [=recursive descent syntax/diagnostic_control=] From bd5cc72c3dc08152a754157c348580cba96b6f3c Mon Sep 17 00:00:00 2001 From: Jiawei Shao Date: Fri, 21 Jun 2024 00:19:51 +0800 Subject: [PATCH 102/285] Add missing `dual-source-blending` in `GPUFeatureName` (#4710) --- spec/index.bs | 1 + 1 file changed, 1 insertion(+) diff --git a/spec/index.bs b/spec/index.bs index 4655577668..b6c4725dab 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -2640,6 +2640,7 @@ enum GPUFeatureName { "bgra8unorm-storage", "float32-filterable", "clip-distances", + "dual-source-blending", }; From 214629e3a3907df2cf0fa174970aef859491c4ed Mon Sep 17 00:00:00 2001 From: jzm-intel Date: Fri, 21 Jun 2024 12:55:29 +0800 Subject: [PATCH 103/285] [Editorial] Fix a missing parenthesis (#4712) This CL add a missing parenthesis in note text. --- spec/index.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/index.bs b/spec/index.bs index b6c4725dab..c86dc714ee 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -1256,7 +1256,7 @@ of objects on the device or potentially corrupt internal implementation/driver s the device **should** be lost to prevent these changes from being observable. Note: -For all device losses not initiated by the application (via {{GPUDevice/destroy()}}, +For all device losses not initiated by the application (via {{GPUDevice/destroy()}}), user agents should consider issuing developer-visible warnings *unconditionally*, even if the {{GPUDevice/lost}} promise is handled. 
These scenarios should be rare, and the signal is vital to developers because most of the WebGPU From daaf4ed76b8fb780c4f609ddc25fe271dfcb4702 Mon Sep 17 00:00:00 2001 From: David Neto Date: Mon, 24 Jun 2024 19:54:32 -0400 Subject: [PATCH 104/285] Fix \ escaping in Python strings (#4716) --- wgsl/tools/TSPath.py | 6 +++--- wgsl/tools/extract-grammar.py | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/wgsl/tools/TSPath.py b/wgsl/tools/TSPath.py index 750ee8d5eb..d460bb126a 100644 --- a/wgsl/tools/TSPath.py +++ b/wgsl/tools/TSPath.py @@ -324,17 +324,17 @@ def parse(self,path): # Indexed # These are only valid inside parens. - m = re.fullmatch('(\d+)(.*)',path) + m = re.fullmatch('(\\d+)(.*)',path) if m: return IndexedNode(int(m.group(1)),self.parse(m.group(2))) # IndexedChild - m = re.fullmatch('\[(\d+)\](.*)',path) + m = re.fullmatch('\\[(\\d+)\\](.*)',path) if m: return IndexedChildNode(int(m.group(1)),self.parse(m.group(2))) # Named - m = re.fullmatch('(\w+)(.*)',path) + m = re.fullmatch('(\\w+)(.*)',path) if m: return NamedNode(m.group(1),self.parse(m.group(2))) diff --git a/wgsl/tools/extract-grammar.py b/wgsl/tools/extract-grammar.py index ac54bc7819..ca2fbaff19 100755 --- a/wgsl/tools/extract-grammar.py +++ b/wgsl/tools/extract-grammar.py @@ -96,7 +96,7 @@ def read_lines_from_file(filename, exclusions): parts = [j for i in [i.split("\n") for i in file.readlines()] for j in i if len(j) > 0] result = [] - include_re = re.compile('(?!.*\.syntax\.bs\.include)path:\s+(\S+)') + include_re = re.compile('(?!.*\\.syntax\\.bs\\.include)path:\\s+(\\S+)') for line in parts: m = include_re.match(line) if m: From b1cc2ac7dc0aee36a76ad8d9cd0fc35f0979ef4d Mon Sep 17 00:00:00 2001 From: Teodor Tanasoaia <28601907+teoxoy@users.noreply.github.com> Date: Tue, 25 Jun 2024 20:47:21 +0200 Subject: [PATCH 105/285] Guarantee that OOB writes to storage textures will be ignored (#4194) See https://github.com/gpuweb/gpuweb/issues/3893#issuecomment-1453847182 
Co-authored-by: Mehmet Oguz Derin --- wgsl/index.bs | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index 9d7666b64a..663a3a047f 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -17390,9 +17390,7 @@ The [=logical texel address=] is invalid if: for the corresponding element, or * `array_index` is outside the range of `[0, textureNumLayers(t))` -If the logical texel addresss is invalid, the built-in function may do any of the following: -* not be executed -* store `value` to some in bounds texel +If the logical texel addresss is invalid, the built-in function [=behavioral requirement|will=] not be executed. ## Atomic Built-in Functions ## {#atomic-builtin-functions} From 29e048a5321a5c3bebc6026d34886b4b547b146a Mon Sep 17 00:00:00 2001 From: Teodor Tanasoaia <28601907+teoxoy@users.noreply.github.com> Date: Tue, 25 Jun 2024 20:48:43 +0200 Subject: [PATCH 106/285] [subgroups proposal] correct that the `id` must be a const-expression, not `e` (#4715) --- proposals/subgroups.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/subgroups.md b/proposals/subgroups.md index 70f8782028..31aed58238 100644 --- a/proposals/subgroups.md +++ b/proposals/subgroups.md @@ -106,7 +106,7 @@ Using f16 as a parameter in any of these functions requires `subgroups-f16` to b | `fn subgroupXor(e : T) -> T` | `T` must be u32, i32, or a vector of those types | Reduction
Performs a bitwise xor of `e` among all active invocations and returns that result | | `fn subgroupMin(e : T) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Reduction
Performs a min of `e` among all active invocations and returns that result | | `fn subgroupMax(e : T) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Reduction
Performs a max of `e` among all active invocations and returns that result | -| `fn quadBroadcast(e : T, id : I)` | `T` must be u32, i32, f32, f16 or a vector of those types
`I` must be u32 or i32 | Broadcasts `e` from the quad invocation with id equal to `id`
`e` must be a constant-expression2 | +| `fn quadBroadcast(e : T, id : I)` | `T` must be u32, i32, f32, f16 or a vector of those types
`I` must be u32 or i32 | Broadcasts `e` from the quad invocation with id equal to `id`
`id` must be a constant-expression2 | | `fn quadSwapX(e : T)` | `T` must be u32, i32, f32, f16 or a vector of those types | Swaps `e` between invocations in the quad in the X direction | | `fn quadSwapY(e : T)` | `T` must be u32, i32, f32, f16 or a vector of those types | Swaps `e` between invocations in the quad in the Y direction | | `fn quadSwapDiagonal(e : T)` | `T` must be u32, i32, f32, f16 or a vector of those types | Swaps `e` between invocations in the quad diagnoally | From 384140fca8b43918d18d8bf3048f4563a0d5d882 Mon Sep 17 00:00:00 2001 From: James Price Date: Tue, 25 Jun 2024 14:58:02 -0400 Subject: [PATCH 107/285] Rename `subgroups-f16` to `subgroups_f16` for WGSL (#4718) The `ident_pattern_token` grammar rule that we use for extension names does not allow hyphens. --- proposals/subgroups.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/proposals/subgroups.md b/proposals/subgroups.md index 31aed58238..692cd2c5f8 100644 --- a/proposals/subgroups.md +++ b/proposals/subgroups.md @@ -42,14 +42,14 @@ Add two new enable extensions. | Enable | Description | | --- | --- | | **subgroups** | Adds built-in values and functions for subgroups | -| **subgroups-f16** | Allows f16 to be used in subgroups operations | +| **subgroups_f16** | Allows f16 to be used in subgroups operations | -Note: Metal can always provide subgroups-f16, Vulkan requires +Note: Metal can always provide subgroups_f16, Vulkan requires VK_KHR_shader_subgroup_extended_types ([~61%](https://vulkan.gpuinfo.org/listdevicescoverage.php?extension=VK_KHR_shader_subgroup_extended_types&platform=all) of devices), and D3D12 requires SM6.2. -**TODO**: Can we drop **subgroups-f16**? +**TODO**: Can we drop **subgroups_f16**? According to this [analysis](https://github.com/teoxoy/gpuinfo-vulkan-query/blob/8681e0074ece1b251177865203d18b018e05d67a/subgroups.txt#L1071-L1466) Only 4% of devices that support both f16 and subgroups could not support subgroup extended types. 
@@ -83,7 +83,7 @@ Note: HLSL does not expose a subgroup_id or num_subgroups equivalent. ## Built-in Functions All built-in function can only be used in `compute` or `fragment` shader stages. -Using f16 as a parameter in any of these functions requires `subgroups-f16` to be enabled. +Using f16 as a parameter in any of these functions requires `subgroups_f16` to be enabled. | Function | Preconditions | Description | | --- | --- | --- | From 3cb784d18449dce892ffb08f970d4781e602e3fc Mon Sep 17 00:00:00 2001 From: Mehmet Oguz Derin Date: Wed, 26 Jun 2024 23:43:52 +0900 Subject: [PATCH 108/285] Extensions editorial: linking and syncing (#4720) --- wgsl/index.bs | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index 663a3a047f..08cd915d8b 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -1406,6 +1406,8 @@ path: syntax/enable_extension_name.syntax.bs.include The [=enable-extension=] names are: * `'f16'` +* `'clip_distances'` +* `'dual_source_blending'` The valid [=language extension=] names are listed in [[#language-extensions-sec]] but in general have the same form as an [=identifier=]: @@ -1752,15 +1754,15 @@ The valid [=enable-extensions=] are listed in the following table. Description `f16` - `"shader-f16"` + [[WebGPU#shader-f16|"shader-f16"]] The [=f16=] type is valid to use in the WGSL module. Otherwise, using [=f16=] (directly or indirectly) will result in a [=shader-creation error=]. `clip_distances` - `"clip_distances"` + [[WebGPU#dom-gpufeaturename-clip-distances|"clip-distances"]] The built-in variable [=built-in values/clip_distances=] is valid to use in the WGSL module. Otherwise, using [=built-in values/clip_distances=] will result in a [=shader-creation error=]. `dual_source_blending` - `"dual_source_blending"` + [[WebGPU#dom-gpufeaturename-dual-source-blending|"dual-source-blending"]] The attribute [=attribute/blend_src=] is valid to use in the WGSL module. 
Otherwise, using [=attribute/blend_src=] will result in a [=shader-creation error=]. From c0844c0bd062003b1167dcdcce4b4e8e12b78148 Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Wed, 26 Jun 2024 09:19:19 -0700 Subject: [PATCH 109/285] Add compute algorithm to detailed operations (#4719) --- spec/index.bs | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 63 insertions(+), 1 deletion(-) diff --git a/spec/index.bs b/spec/index.bs index c86dc714ee..d3696f8ff4 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -106,12 +106,18 @@ spec: WGSL; urlPrefix: https://gpuweb.github.io/gpuweb/wgsl/# text: @group; url: attribute-group text: line break; url: line-break text: 64-bit unsigned integer; url: 64-bit-integer + text: Synchronization Built-in Functions; url: sync-builtin-functions for: address spaces text: workgroup; url: address-spaces-workgroup for: builtin text: sample_mask; url: built-in-values-sample_mask text: frag_depth; url: built-in-values-frag_depth text: clip_distances; url: built-in-values-clip_distances + text: workgroup_id; url: built-in-values-workgroup_id + text: local_invocation_id; url: built-in-values-local_invocation_id + text: global_invocation_id; url: built-in-values-global_invocation_id + text: num_workgroups; url: built-in-values-num_workgroups + text: local_invocation_index; url: built-in-values-local_invocation_index for: extension text: f16; url: extension-f16 text: clip_distances; url: extension-clip_distances @@ -14541,7 +14547,63 @@ These operations are encoded within {{GPUComputePassEncoder}} as: - {{GPUComputePassEncoder/dispatchWorkgroups()}} - {{GPUComputePassEncoder/dispatchWorkgroupsIndirect()}} -

Editorial note: describe the computing algorithm +The main compute algorithm: + +

+ compute(descriptor, dispatchCall) + + **Arguments:** + + - |descriptor|: Description of the current {{GPUComputePipeline}}. + - |dispatchCall|: The dispatch call parameters. May come from function arguments or an {{GPUBufferUsage/INDIRECT}} buffer. + + 1. Let |computeInvocations| be an [=list/empty=] [=list=]. + 1. Let |computeStage| be |descriptor|.{{GPUComputePipelineDescriptor/compute}}. + 1. Let |workgroupSize| be the `@workgroup_size` declared for |computeStage|.{{GPUProgrammableStage/entryPoint}} + of |computeStage|.{{GPUProgrammableStage/module}}. + 1. For |workgroupX| in range [0, |dispatchCall|.`workgroupCountX`): + 1. For |workgroupY| in range [0, |dispatchCall|.`workgroupCountY`): + 1. For |workgroupZ| in range [0, |dispatchCall|.`workgroupCountZ`): + 1. For |localX| in range [0, |workgroupSize|.`x`): + 1. For |localY| in range [0, |workgroupSize|.`y`): + 1. For |localZ| in range [0, |workgroupSize|.`z`): + 1. Let |invocation| be { |computeStage|, |workgroupX|, |workgroupY|, |workgroupZ|, |localX|, |localY|, |localZ| } + 1. [=list/Append=] |invocation| to |computeInvocations|. + + 1. For every |invocation| in |computeInvocations|, in any order the [=device=] chooses, including in parallel: + 1. Set the shader [=builtins=]: + - Set the [=builtin/num_workgroups=] builtin, if any, to (
+ |dispatchCall|.`workgroupCountX`,
+ |dispatchCall|.`workgroupCountY`,
+ |dispatchCall|.`workgroupCountZ`
+ )
+ - Set the [=builtin/workgroup_id=] builtin, if any, to (
+ |invocation|.|workgroupX|,
+ |invocation|.|workgroupY|,
+ |invocation|.|workgroupZ|
+ )
+ - Set the [=builtin/local_invocation_id=] builtin, if any, to (
+ |invocation|.|localX|,
+ |invocation|.|localY|,
+ |invocation|.|localZ|
+ )
+ - Set the [=builtin/global_invocation_id=] builtin, if any, to (
+ |invocation|.|workgroupX| * |workgroupSize|.`x` + |invocation|.|localX|,
+ |invocation|.|workgroupY| * |workgroupSize|.`y` + |invocation|.|localY|,
+ |invocation|.|workgroupZ| * |workgroupSize|.`z` + |invocation|.|localZ|
+ )
. + - Set the [=builtin/local_invocation_index=] builtin, if any, to + |invocation|.|localX| + (|invocation|.|localY| * |workgroupSize|.`x`) + + (|invocation|.|localZ| * |workgroupSize|.`x` * |workgroupSize|.`y`) + + 1. Invoke the compute shader entry point described by |invocation|.|computeStage|. + + Note: Shader invocations have no guaranteed order, and will generally run in parallel according to device + capabilities. Developers should not assume that any given invocation or workgroup will complete before any + other one is started. Some devices may appear to execute in a consistent order, but this behavior should not + be relied on as it will not perform identically across all devices. Shaders that require synchronization + across invocations must use [=Synchronization Built-in Functions=] to coordinate execution. +
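The built-in values set by the steps above can be sketched as follows. This is an illustration rather than part of the patch: `enumerate_invocations` and the dictionary keys are hypothetical, each range is read as half-open (from 0 up to but not including the count), and `localZ` iterates over the `z` dimension of the workgroup size, which is an assumption about the intended reading.

```python
from itertools import product


def enumerate_invocations(workgroup_count, workgroup_size):
    """Yield the built-in values for every invocation of one dispatch.

    Both arguments are (x, y, z) tuples. The iteration order here is
    arbitrary; a device may run invocations in any order or in parallel.
    """
    (cx, cy, cz), (sx, sy, sz) = workgroup_count, workgroup_size
    for wx, wy, wz in product(range(cx), range(cy), range(cz)):
        for lx, ly, lz in product(range(sx), range(sy), range(sz)):
            yield {
                "num_workgroups": (cx, cy, cz),
                "workgroup_id": (wx, wy, wz),
                "local_invocation_id": (lx, ly, lz),
                # Position of the invocation within the whole dispatch grid.
                "global_invocation_id": (wx * sx + lx, wy * sy + ly, wz * sz + lz),
                # Linearized position of the invocation within its workgroup.
                "local_invocation_index": lx + ly * sx + lz * sx * sy,
            }
```

A dispatch of `(2, 1, 1)` workgroups with `@workgroup_size(2, 2, 1)` yields eight invocations, each with a distinct `global_invocation_id`.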
The [=device=] may become [=lose the device|lost=] if [=shader execution end|shader execution does not end=] From 0a0e553c04f74397f7154b5ae381b4105d1aee86 Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Wed, 26 Jun 2024 10:46:44 -0700 Subject: [PATCH 110/285] Remove or resolve some editorial notes (#4717) * Remove unnecessary editoral notes * Resolve note about specifying indirect commands better --- spec/index.bs | 37 ++++++++++++------------------------- 1 file changed, 12 insertions(+), 25 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index d3696f8ff4..09806c2a41 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -107,6 +107,10 @@ spec: WGSL; urlPrefix: https://gpuweb.github.io/gpuweb/wgsl/# text: line break; url: line-break text: 64-bit unsigned integer; url: 64-bit-integer text: Synchronization Built-in Functions; url: sync-builtin-functions + for: interpolation type + text: flat; url: interpolation-type-flat + text: linear; url: interpolation-type-linear + text: perspective; url: interpolation-type-perspective for: address spaces text: workgroup; url: address-spaces-workgroup for: builtin @@ -14532,10 +14536,6 @@ This section describes the details of various GPU operations. Issue: This section is incomplete. -## Transfer ## {#transfer-operations} - -

Editorial note: describe the transfers at the high level - ## Computing ## {#computing-operations} Computing operations provide direct access to GPU's programmable hardware. @@ -14631,7 +14631,7 @@ The main rendering algorithm: **Arguments:** - |descriptor|: Description of the current {{GPURenderPipeline}}. - - |drawCall|: The draw call parameters. + - |drawCall|: The draw call parameters. May come from function arguments or an {{GPUBufferUsage/INDIRECT}} buffer. - |state|: [=RenderState=] of the {{GPURenderCommandsMixin}} where the draw call is issued. 1. **Resolve indices**. See [[#index-resolution]]. @@ -14679,7 +14679,7 @@ a list of vertices to process for each instance. **Arguments:** - - |drawCall|: The draw call parameters. + - |drawCall|: The draw call parameters. May come from function arguments or an {{GPUBufferUsage/INDIRECT}} buffer. - |state|: The snapshot of the {{GPURenderCommandsMixin}} state at the time of the draw call. **Returns:** list of integer indices. @@ -14701,11 +14701,9 @@ a list of vertices to process for each instance. 1. Set each |vertexIndexList| item |i| to the value |drawCall|.firstVertex + |i|. 1. Return |vertexIndexList|. - Note: in case of indirect draw calls, the `indexCount`, `vertexCount`, + Note: in the case of indirect draw calls, the `indexCount`, `vertexCount`, and other properties of |drawCall| are read from the indirect buffer instead of the draw command itself. - -

Editorial note: specify indirect commands better.

@@ -14747,7 +14745,7 @@ clip space positions for [[#primitive-clipping]], as well as other data for the **Arguments:** - |vertexIndexList|: List of vertex indices to process (mutable, passed by reference). - - |drawCall|: The draw call parameters. + - |drawCall|: The draw call parameters. May come from function arguments or an {{GPUBufferUsage/INDIRECT}} buffer. - |desc|: The descriptor of type {{GPUVertexState}}. - |state|: The snapshot of the {{GPURenderCommandsMixin}} state at the time of the draw call. @@ -14842,7 +14840,7 @@ Primitives are assembled by a fixed-function stage of GPUs. **Arguments:** - |vertexIndexList|: List of vertex indices to process. - - |drawCall|: The draw call parameters. + - |drawCall|: The draw call parameters. May come from function arguments or an {{GPUBufferUsage/INDIRECT}} buffer. - |desc|: The descriptor of type {{GPUPrimitiveState}}. For each instance, the primitives get assembled from the vertices that have been @@ -14887,8 +14885,6 @@ Primitives are assembled by a fixed-function stage of GPUs. Each subsequent primitive takes 1 vertices. -

Editorial note: should this be defined more formally? - Any incomplete primitives are dropped.

@@ -14938,29 +14934,27 @@ For each vertex output value "v" with a corresponding fragment input, |a|.v and |b|.v would be the outputs for |a| and |b| vertices respectively. The clipped shader output |c|.v is produced based on the interpolation qualifier:
- : "flat" + : [=interpolation type/flat=] :: Flat interpolation is unaffected, and is based on provoking vertex, which is the first vertex in the primitive. The output value is the same for the whole primitive, and matches the vertex output of the [=provoking vertex=]: |c|.v = [=provoking vertex=].v - : "linear" + : [=interpolation type/linear=] :: The interpolation ratio gets adjusted against the perspective coordinates of the [=clip position=]s, so that the result of interpolation is linear in screen space.

Editorial note: provide more specifics here, if possible - : "perspective" + : [=interpolation type/perspective=] :: The value is linearly interpolated in clip space, producing perspective-correct values: |c|.v = |t| × |a|.v + (1 − |t|) × |b|.v

-

Editorial note: link to interpolation qualifiers in WGSL - The result of primitive clipping is a new set of primitives, which are contained within the [=clip volume=]. @@ -15028,8 +15022,6 @@ Rasterization produces a list of RasterizationPoints, each contai :: refers to [[#barycentric-coordinates]] -

Editorial note: define the depth computation algorithm -

rasterize(primitiveList, state) @@ -15204,10 +15196,7 @@ Otherwise, the polygon is back-facing. 1. Let |rp| be a new [=RasterizationPoint=] object 1. Compute the list |b| as [[#barycentric-coordinates]] of that fragment. Set |rp|.[=RasterizationPoint/barycentricCoordinates=] to |b|. - 1. Let |d||i| be the depth value of |v||i|. - -

Editorial note: define how this value is constructed. 1. Set |rp|.[=RasterizationPoint/depth=] to ∑ (|b||i| × |d||i|) 1. Append |rp| to |rasterizationPoints|. @@ -15265,8 +15254,6 @@ This stage produces a Fragment for each [=RasterizationPoint=]: 1. Let |value| be the interpolated fragment input, based on |rp|.[=RasterizationPoint/barycentricCoordinates=], |rp|.[=RasterizationPoint/primitiveVertices=], and the [=interpolation=] qualifier on the input. - -

Editorial note: describe the exact equations. 1. Set the corresponding fragment shader [=location=] input to |value|. 1. Invoke the fragment shader entry point described by |desc|. From 1f4aec4eaef33742b8699ad78820bddd93b8a4d7 Mon Sep 17 00:00:00 2001 From: Christopher Cameron <32557109+ccameron-chromium@users.noreply.github.com> Date: Wed, 26 Jun 2024 23:27:03 +0200 Subject: [PATCH 111/285] Add extended range (#4500) --- spec/index.bs | 90 ++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 74 insertions(+), 16 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 09806c2a41..4c226d1620 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -13869,12 +13869,22 @@ enum GPUCanvasAlphaMode { "premultiplied", }; +enum GPUCanvasToneMappingMode { + "standard", + "extended", +}; + +dictionary GPUCanvasToneMapping { + GPUCanvasToneMappingMode mode = "standard"; +}; + dictionary GPUCanvasConfiguration { required GPUDevice device; required GPUTextureFormat format; GPUTextureUsageFlags usage = 0x10; // GPUTextureUsage.RENDER_ATTACHMENT sequence viewFormats = []; PredefinedColorSpace colorSpace = "srgb"; + GPUCanvasToneMapping toneMapping = {}; GPUCanvasAlphaMode alphaMode = "opaque"; }; @@ -13910,6 +13920,11 @@ dictionary GPUCanvasConfiguration { The color space that values written into textures returned by {{GPUCanvasContext/getCurrentTexture()}} should be displayed with. + : toneMapping + :: + The tone mapping determines how the content of textures returned by + {{GPUCanvasContext/getCurrentTexture()}} are to be displayed. + : alphaMode :: Determines the effect that alpha values will have on the content of textures returned by @@ -13951,23 +13966,10 @@ dictionary GPUCanvasConfiguration { ### Canvas Color Space ### {#canvas-color-space} During presentation, the color values in the canvas are converted to the color -space of the screen. Color values are then clamped to the `[0, 1]` interval in -the color space of the screen. - -

- For example, suppose that the value `(1.035, -0.175, -0.140)` is written to an - `'srgb'` canvas. +space of the screen. - If this is presented to an sRGB screen, then this will be converted to sRGB - (which is a no-op, because the canvas is sRGB), and then will be clamped to - the sRGB value `(1.0, 0.0, 0.0)`. - - If this is presented to a Display P3 screen, then this will be converted to - the value `(0.948, 0.106, 0.01)` in the Display P3 color space, and no - clamping will be needed. -
- - +The {{GPUCanvasConfiguration/toneMapping}} determines the handling of values +outside of the `[0, 1]` interval in the color space of the screen. ### Canvas Context sizing ### {#context-sizing} @@ -13998,6 +14000,62 @@ In WebGPU, it does so by [$Replace the drawing buffer|replacing the drawing buff if their value is not changed.
+

`GPUCanvasToneMappingMode` +

+ +This enum specifies how color values are displayed to the screen. + +
+ : "standard" + :: + Color values within the standard dynamic range of the screen are unchanged, and + all other color values are projected to the standard dynamic range of the screen. + + Note: + This projection is often accomplished by clamping color values in the color space + of the screen to the `[0, 1]` interval. + +
+ For example, suppose that the value `(1.035, -0.175, -0.140)` is written to an + `'srgb'` canvas. + + If this is presented to an sRGB screen, then this will be converted to sRGB + (which is a no-op, because the canvas is sRGB), then projected into the display's space. + Using component-wise clamping, this results in the sRGB value `(1.0, 0.0, 0.0)`. + + If this is presented to a Display P3 screen, then this will be converted to + the value `(0.948, 0.106, 0.01)` in the Display P3 color space, and no + clamping will be needed. +
+ + : "extended" + :: + Color values in the extended dynamic range of the screen are unchanged, and all + other color values are projected to the extended dynamic range of the screen. + + Note: + This projection is often accomplished by clamping color values in the color space of + the screen to the interval of values that the screen is capable of displaying, + which may include values greater than `1`. + +
+ For example, suppose that the value `(2.5, -0.15, -0.15)` is written to an + `'srgb'` canvas. + + If this is presented to an sRGB screen that is capable of displaying values + in the `[0, 4]` interval in sRGB space, then this will be converted to sRGB + (which is a no-op, because the canvas is sRGB), then projected into the display's space. + If using component-wise clamping, this results in the sRGB value `(2.5, 0.0, 0.0)`. + + If this is presented to a Display P3 screen that is capable of displaying + values in the `[0, 2]` interval in Display P3 space, then this will be + converted to the value `(2.3, 0.545, 0.386)` in the Display P3 color space, + then projected into the display's space. + If using component-wise clamping, this results in the Display P3 value `(2.0, 0.545, 0.386)`. +
+ +
+

`GPUCanvasAlphaMode`
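
Note: The component-wise clamping mentioned for the "standard" and "extended" tone mapping modes in the patch above can be sketched as follows. This is only an illustration under stated assumptions: `RGB`, `clampToDisplayRange`, and `maxValue` are hypothetical names that do not appear in the spec, the color is assumed to be already converted to the color space of the screen, and a real implementation may use a projection other than clamping.

```typescript
// Illustrative sketch only; these names are hypothetical and not part of WebGPU.
type RGB = [number, number, number];

// Clamp each component to [0, maxValue].
// For "standard" tone mapping, maxValue is 1.
// For "extended" tone mapping, maxValue is assumed to be the largest value
// the screen can display in its own color space (for example 4 or 2).
function clampToDisplayRange(color: RGB, maxValue: number): RGB {
  const clamp = (c: number): number => Math.min(Math.max(c, 0), maxValue);
  return [clamp(color[0]), clamp(color[1]), clamp(color[2])];
}

// "standard" mode, sRGB canvas presented on an sRGB screen.
const standardSrgb = clampToDisplayRange([1.035, -0.175, -0.140], 1);

// "extended" mode, sRGB screen able to display the [0, 4] interval.
const extendedSrgb = clampToDisplayRange([2.5, -0.15, -0.15], 4);

// "extended" mode, Display P3 screen able to display the [0, 2] interval.
// The input is the value after conversion to Display P3, as given in the example.
const extendedP3 = clampToDisplayRange([2.3, 0.545, 0.386], 2);
```

The three calls reuse the values from the examples in the patch. On the API side, a page would opt in through the `toneMapping` member of `GPUCanvasConfiguration`, for instance `toneMapping: { mode: "extended" }`, which follows from the dictionary definitions added above.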

From 22dc18a85e1b391da4c4e57b45c16ee8aa03664f Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Wed, 26 Jun 2024 15:07:15 -0700 Subject: [PATCH 112/285] Clarify workgroupSize calculation (#4722) --- spec/index.bs | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 4c226d1620..27ba28a6e9 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -14617,8 +14617,8 @@ The main compute algorithm: 1. Let |computeInvocations| be an [=list/empty=] [=list=]. 1. Let |computeStage| be |descriptor|.{{GPUComputePipelineDescriptor/compute}}. - 1. Let |workgroupSize| be the `@workgroup_size` declared for |computeStage|.{{GPUProgrammableStage/entryPoint}} - of |computeStage|.{{GPUProgrammableStage/module}} + 1. Let |workgroupSize| be the computed workgroup size for |computeStage|.{{GPUProgrammableStage/entryPoint}} after + applying |computeStage|.{{GPUProgrammableStage/constants}} to |computeStage|.{{GPUProgrammableStage/module}}. 1. For |workgroupX| in range [0, |dispatchCall|.`workgroupCountX`]: 1. For |workgroupY| in range [0, |dispatchCall|.`workgroupCountY`]: 1. 
For |workgroupZ| in range [0, |dispatchCall|.`workgroupCountZ`]: From 922484631a6756deb17faeb617802e2821b8cbc8 Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Thu, 27 Jun 2024 15:01:19 -0700 Subject: [PATCH 113/285] Add algorithm for linear interpolation (#4725) * Add algorithm for linear interpolation * Address some feedback --- spec/index.bs | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 27ba28a6e9..eeb550c1cc 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -107,6 +107,7 @@ spec: WGSL; urlPrefix: https://gpuweb.github.io/gpuweb/wgsl/# text: line break; url: line-break text: 64-bit unsigned integer; url: 64-bit-integer text: Synchronization Built-in Functions; url: sync-builtin-functions + text: interpolation-sampling; url: interpolation-sampling for: interpolation type text: flat; url: interpolation-type-flat text: linear; url: interpolation-type-linear @@ -14994,23 +14995,19 @@ The clipped shader output |c|.v is produced based on the interpolation qualifier
: [=interpolation type/flat=] :: - Flat interpolation is unaffected, and is based on provoking vertex, - which is the first vertex in the primitive. The output value is the same - for the whole primitive, and matches the vertex output of the [=provoking vertex=]: - |c|.v = [=provoking vertex=].v + Flat interpolation is unaffected, and is based on the provoking vertex, + which is determined by the [=interpolation sampling=] mode declared in the shader. The + output value is the same for the whole primitive, and matches the vertex output of the + [=provoking vertex=]. : [=interpolation type/linear=] :: The interpolation ratio gets adjusted against the perspective coordinates of the [=clip position=]s, so that the result of interpolation is linear in screen space. -

Editorial note: provide more specifics here, if possible - : [=interpolation type/perspective=] :: - The value is linearly interpolated in clip space, producing perspective-correct values: - - |c|.v = |t| × |a|.v + (1 − |t|) × |b|.v + The value is linearly interpolated in clip space, producing perspective-correct values.

The result of primitive clipping is a new set of primitives, which are contained From 074f5e83fa594ae3aba919de25ed6d37274b8dac Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Mon, 1 Jul 2024 12:53:45 -0700 Subject: [PATCH 114/285] Define level of detail (#4694) * Define level of detail * Update spec/index.bs Co-authored-by: Kai Ninomiya * Use biblio for Vulkan reference --------- Co-authored-by: Kai Ninomiya --- spec/index.bs | 31 ++++++++++++++++++++++++++----- 1 file changed, 26 insertions(+), 5 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index eeb550c1cc..63ed63e160 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -108,6 +108,7 @@ spec: WGSL; urlPrefix: https://gpuweb.github.io/gpuweb/wgsl/# text: 64-bit unsigned integer; url: 64-bit-integer text: Synchronization Built-in Functions; url: sync-builtin-functions text: interpolation-sampling; url: interpolation-sampling + text: textureSampleLevel; url: texturesamplelevel for: interpolation type text: flat; url: interpolation-type-flat text: linear; url: interpolation-type-linear @@ -137,6 +138,22 @@ spec: Strings on the Web; urlPrefix: https://w3c.github.io/string-meta/# text: best practices for language and direction information; url: bp_and-reco +
+{
+  "vulkan": {
+    "authors": [
+      "The Khronos Vulkan Working Group"
+    ],
+    "href": "https://registry.khronos.org/vulkan/specs/1.3/html/vkspec.html",
+    "title": "Vulkan 1.3",
+    "publisher": "Khronos",
+    "deliveredBy": [
+      "https://www.khronos.org/"
+    ]
+  }
+}
+
+ @@ -3656,9 +3673,9 @@ Each subresource in a mipmap level is approximately half the size in each spatial dimension, of the corresponding resource in the lesser level (see [=logical miplevel-specific texture extent=]). The subresource in level 0 has the dimensions of the texture itself. -These are typically used to represent levels of detail of a texture. -{{GPUSampler}} and WGSL provide facilities for selecting and interpolating between levels of -detail, explicitly or automatically. +Smaller levels are typically used to store lower resolution versions of the same image. +{{GPUSampler}} and WGSL provide facilities for selecting and interpolating between [=levels of +detail=], explicitly or automatically. A {{GPUTextureDimension/"2d"}} texture may be an array of array layers. Each subresource in a layer is the same size as the corresponding resources in other layers. @@ -5176,7 +5193,7 @@ dictionary GPUSamplerDescriptor : lodMinClamp : lodMaxClamp :: - Specifies the minimum and maximum levels of detail, respectively, used internally when + Specifies the minimum and maximum [=levels of detail=], respectively, used internally when sampling a texture. : compare @@ -5205,7 +5222,11 @@ dictionary GPUSamplerDescriptor
-Issue: explain how LOD is calculated and if there are differences here between platforms. +Level of detail (LOD) describes which mip level(s) are selected when sampling a texture. +It may be specified explicitly through shader methods like [=textureSampleLevel=] or implicitly +determined from the [=texture coordinate=] derivatives. See +[[vulkan#textures-lod-and-scale-factor|Scale Factor Operation, LOD Operation and Image Level Selection]] in the +[[vulkan inline]] spec for details on how implicit LODs are calculated. {{GPUAddressMode}} describes the behavior of the sampler if the sample footprint extends beyond the bounds of the sampled texture. From bbe58b845e1465ef6c9e552d18b40d25d401b9e5 Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Mon, 1 Jul 2024 13:52:05 -0700 Subject: [PATCH 115/285] Move vulkan reference into a non-normative note --- spec/index.bs | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 63ed63e160..3330250e78 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -5222,11 +5222,12 @@ dictionary GPUSamplerDescriptor
-Level of detail (LOD) describes which mip level(s) are selected when sampling a texture. -It may be specified explicitly through shader methods like [=textureSampleLevel=] or implicitly -determined from the [=texture coordinate=] derivatives. See -[[vulkan#textures-lod-and-scale-factor|Scale Factor Operation, LOD Operation and Image Level Selection]] in the -[[vulkan inline]] spec for details on how implicit LODs are calculated. +Level of detail (LOD) describes which mip level(s) are selected when sampling a +texture. It may be specified explicitly through shader methods like [=textureSampleLevel=] or implicitly determined from +the [=texture coordinate=] derivatives. + +Note: See [[vulkan#textures-lod-and-scale-factor|Scale Factor Operation, LOD Operation and Image Level Selection]] in +the [[vulkan inline]] spec for an example of how implicit LODs may be calculated. {{GPUAddressMode}} describes the behavior of the sampler if the sample footprint extends beyond the bounds of the sampled texture. From 40a69c8583a17c9c1e033664bc43e57ea07f5593 Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Wed, 3 Jul 2024 10:58:23 -0700 Subject: [PATCH 116/285] Update viewport depth range validation (#4730) --- spec/index.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/index.bs b/spec/index.bs index 3330250e78..c2eda30899 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -12325,7 +12325,7 @@ attachments used by this encoder. - |y| + |height| ≤ |this|.{{GPURenderPassEncoder/[[attachment_size]]}}.height - 0.0 ≤ |minDepth| ≤ 1.0 - 0.0 ≤ |maxDepth| ≤ 1.0 - - |minDepth| < |maxDepth| + - |minDepth| ≤ |maxDepth|
1. [$Enqueue a render command$] on |this| which issues the subsequent steps on the From 1c712f75e99f38f9b033ee449d0eafad8c79ddc9 Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Wed, 3 Jul 2024 11:00:24 -0700 Subject: [PATCH 117/285] Add description of sample footprint (#4683) * Add description of sampler footprint * Add non-normative Vulkan reference * Remove reference to sample footprint --- spec/index.bs | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index c2eda30899..0ddef2de70 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -5179,12 +5179,12 @@ dictionary GPUSamplerDescriptor : magFilter :: - Specifies the sampling behavior when the sample footprint is smaller than or equal to one + Specifies the sampling behavior when the sampled area is smaller than or equal to one texel. : minFilter :: - Specifies the sampling behavior when the sample footprint is larger than one texel. + Specifies the sampling behavior when the sampled area is larger than one texel. : mipmapFilter :: @@ -5229,10 +5229,8 @@ the [=texture coordinate=] derivatives. Note: See [[vulkan#textures-lod-and-scale-factor|Scale Factor Operation, LOD Operation and Image Level Selection]] in the [[vulkan inline]] spec for an example of how implicit LODs may be calculated. -{{GPUAddressMode}} describes the behavior of the sampler if the sample footprint extends beyond -the bounds of the sampled texture. - -Issue: Describe a "sample footprint" in greater detail. +{{GPUAddressMode}} describes the behavior of the sampler if the sampled texels extend beyond the +bounds of the sampled texture. -{{GPUAdapter}} has the following attributes: +{{GPUAdapter}} has the following [=immutable properties=] -
+
: features :: The set of values in `this`.{{GPUAdapter/[[adapter]]}}.{{adapter/[[features]]}}. @@ -2533,11 +2548,7 @@ interface GPUAdapter { : isFallbackAdapter :: Returns the value of {{GPUAdapter/[[adapter]]}}.{{adapter/[[fallback]]}}. -
- -{{GPUAdapter}} has the following internal slots: -
: \[[adapter]], of type [=adapter=], readonly :: The [=adapter=] to which this {{GPUAdapter}} refers. @@ -2816,9 +2827,9 @@ interface GPUDevice : EventTarget { GPUDevice includes GPUObjectBase; -{{GPUDevice}} has the following attributes: +{{GPUDevice}} has the following [=immutable properties=]: -
+
: features :: A set containing the {{GPUFeatureName}} values of the features @@ -2836,8 +2847,7 @@ GPUDevice includes GPUObjectBase; The {{GPUObjectBase/[[device]]}} for a {{GPUDevice}} is the [=device=] that the {{GPUDevice}} refers to. -{{GPUDevice}} has the methods listed in its WebIDL definition above. -Those not defined here are defined elsewhere in this document. +{{GPUDevice}} has the following methods:
: destroy() @@ -3376,6 +3386,8 @@ unmaps it, freeing any memory allocated for the mapping. Note: This allows the user agent to reclaim the GPU memory associated with the {{GPUBuffer}} once all previously submitted operations using it are complete. +{{GPUBuffer}} has the following methods: +
: destroy() :: @@ -3474,6 +3486,8 @@ The {{GPUMapMode}} flags determine how a {{GPUBuffer}} is mapped when calling initialized data (zeros) or data written by the webpage during a previous mapping.
+{{GPUBuffer}} has the following methods: +
: mapAsync(mode, offset, size) :: @@ -3830,9 +3844,9 @@ interface GPUTexture { GPUTexture includes GPUObjectBase; -{{GPUTexture}} has the following attributes: +{{GPUTexture}} has the following [=immutable properties=]: -
+
: width :: The width of this {{GPUTexture}}. @@ -3864,23 +3878,18 @@ GPUTexture includes GPUObjectBase; : usage :: The allowed usages for this {{GPUTexture}}. -
- -{{GPUTexture}} has the following internal slots: - -
- : \[[size]], of type {{GPUExtent3D}} - :: - The size of the texture (same as the {{GPUTexture/width}}, {{GPUTexture/height}}, and - {{GPUTexture/depthOrArrayLayers}} attributes). : \[[viewFormats]], of type [=sequence=]<{{GPUTextureFormat}}> :: The set of {{GPUTextureFormat}}s that can be used as the {{GPUTextureViewDescriptor}}.{{GPUTextureViewDescriptor/format}} when creating views on this {{GPUTexture}}. +
- : \[[destroyed]], of type `boolean`, initially false +{{GPUTexture}} has the following [=device timeline properties=]: + +
+ : \[[destroyed]], of type {{boolean}}, initially `false` :: If the texture is destroyed, it can no longer be used in any operation, and its underlying memory can be freed. @@ -4203,7 +4212,6 @@ The {{GPUTextureUsage}} flags determine how a {{GPUTexture}} may be used after i - [$validating GPUTextureDescriptor$](|this|, |descriptor|) returns `true`.
- 1. Set |t|.{{GPUTexture/[[size]]}} to |descriptor|.{{GPUTextureDescriptor/size}}. 1. Set |t|.{{GPUTexture/[[viewFormats]]}} to |descriptor|.{{GPUTextureDescriptor/viewFormats}}.
@@ -4306,6 +4314,8 @@ garbage collection by calling {{GPUTexture/destroy()}}. Note: This allows the user agent to reclaim the GPU memory associated with the {{GPUTexture}} once all previously submitted operations using it are complete. +{{GPUTexture}} has the following methods: +
: destroy() :: @@ -4319,6 +4329,11 @@ all previously submitted operations using it are complete. [=Content timeline=] steps: + 1. Issue the subsequent steps on the [=device timeline=]. +
+
+ [=Device timeline=] steps: + 1. Set |this|.{{GPUTexture/[[destroyed]]}} to true.
@@ -4338,20 +4353,20 @@ interface GPUTextureView { GPUTextureView includes GPUObjectBase; -{{GPUTextureView}} has the following internal slots: +{{GPUTextureView}} has the following [=immutable properties=]: -
- : \[[texture]] +
+ : \[[texture]], readonly :: The {{GPUTexture}} into which this is a view. - : \[[descriptor]] + : \[[descriptor]], readonly :: The {{GPUTextureViewDescriptor}} describing this texture view. All optional fields of {{GPUTextureViewDescriptor}} are defined. - : \[[renderExtent]] + : \[[renderExtent]], readonly :: For renderable views, this is the effective {{GPUExtent3DDict}} for rendering. @@ -4666,7 +4681,7 @@ enum GPUTextureAspect { 1. Set |view|.{{GPUTextureView/[[texture]]}} to |this|. 1. Set |view|.{{GPUTextureView/[[descriptor]]}} to |descriptor|. 1. If |this|.{{GPUTexture/usage}} contains {{GPUTextureUsage/RENDER_ATTACHMENT}}: - 1. Let |renderExtent| be [$compute render extent$](|this|.{{GPUTexture/[[size]]}}, |descriptor|.{{GPUTextureViewDescriptor/baseMipLevel}}). + 1. Let |renderExtent| be [$compute render extent$]([|this|.{{GPUTexture/width}}, |this|.{{GPUTexture/height}}, |this|.{{GPUTexture/depthOrArrayLayers}}], |descriptor|.{{GPUTextureViewDescriptor/baseMipLevel}}). 1. Set |view|.{{GPUTextureView/[[renderExtent]]}} to |renderExtent|.
@@ -5021,20 +5036,23 @@ interface GPUExternalTexture { GPUExternalTexture includes GPUObjectBase; -{{GPUExternalTexture}} has the following internal slots: +{{GPUExternalTexture}} has the following [=immutable properties=]: -
- : \[[expired]], of type `boolean` +
+ : \[[descriptor]], of type {{GPUExternalTextureDescriptor}}, readonly :: - Indicates whether the object has expired (can no longer be used). - Initially set to `false`. + The descriptor with which the texture was created. +
- Note: - Unlike similar `\[[destroyed]]` slots, this can change from `true` back to `false`. +{{GPUExternalTexture}} has the following [=immutable properties=]: - : \[[descriptor]], of type {{GPUExternalTextureDescriptor}} +
+ : \[[expired]], of type {{boolean}}, initially `false` :: - The descriptor with which the texture was created. + Indicates whether the object has expired (can no longer be used). + + Note: + Unlike `[[destroyed]]` slots, which are similar, this can change from `true` back to `false`.
### Importing External Textures ### {#external-texture-creation} @@ -5254,18 +5272,18 @@ interface GPUSampler { GPUSampler includes GPUObjectBase; -{{GPUSampler}} has the following internal slots: +{{GPUSampler}} has the following [=immutable properties=]: -
+
: \[[descriptor]], of type {{GPUSamplerDescriptor}}, readonly :: The {{GPUSamplerDescriptor}} with which the {{GPUSampler}} was created. - : \[[isComparison]], of type {{boolean}} + : \[[isComparison]], of type {{boolean}}, readonly :: Whether the {{GPUSampler}} is used as a comparison sampler. - : \[[isFiltering]], of type {{boolean}} + : \[[isFiltering]], of type {{boolean}}, readonly :: Whether the {{GPUSampler}} weights multiple samples of a texture.
@@ -5547,10 +5565,10 @@ interface GPUBindGroupLayout { GPUBindGroupLayout includes GPUObjectBase; -{{GPUBindGroupLayout}} has the following internal slots: +{{GPUBindGroupLayout}} has the following [=immutable properties=]:
- : \[[descriptor]], of type {{GPUBindGroupLayoutDescriptor}} + : \[[descriptor]], of type {{GPUBindGroupLayoutDescriptor}}, readonly ::
@@ -5911,19 +5929,19 @@ dictionary GPUExternalTextureBindingLayout { }; -A {{GPUBindGroupLayout}} object has the following internal slots: +A {{GPUBindGroupLayout}} object has the following [=device timeline properties=]: -
- : \[[entryMap]], of type [=ordered map=]<{{GPUSize32}}, {{GPUBindGroupLayoutEntry}}> +
+ : \[[entryMap]], of type [=ordered map=]<{{GPUSize32}}, {{GPUBindGroupLayoutEntry}}>, readonly :: The map of binding indices pointing to the {{GPUBindGroupLayoutEntry}}s, which this {{GPUBindGroupLayout}} describes. - : \[[dynamicOffsetCount]], of type {{GPUSize32}} + : \[[dynamicOffsetCount]], of type {{GPUSize32}}, readonly :: The number of buffer bindings with dynamic offsets in this {{GPUBindGroupLayout}}. - : \[[exclusivePipeline]], of type {{GPUPipelineBase}}?, initially `null` + : \[[exclusivePipeline]], of type {{GPUPipelineBase}}?, readonly :: The pipeline that created this {{GPUBindGroupLayout}}, if it was created as part of a [[#default-pipeline-layout|default pipeline layout]]. If not `null`, {{GPUBindGroup}}s @@ -6010,6 +6028,7 @@ A {{GPUBindGroupLayout}} object has the following internal slots: 1. Set |layout|.{{GPUBindGroupLayout/[[dynamicOffsetCount]]}} to the number of entries in |descriptor| where {{GPUBindGroupLayoutEntry/buffer}} is [=map/exist|provided=] and {{GPUBindGroupLayoutEntry/buffer}}.{{GPUBufferBindingLayout/hasDynamicOffset}} is `true`. + 1. Set |layout|.{{GPUBindGroupLayout/[[exclusivePipeline]]}} to `null`. 1. For each {{GPUBindGroupLayoutEntry}} |entry| in |descriptor|.{{GPUBindGroupLayoutDescriptor/entries}}: 1. Insert |entry| into |layout|.{{GPUBindGroupLayout/[[entryMap]]}} @@ -6048,9 +6067,9 @@ interface GPUBindGroup { GPUBindGroup includes GPUObjectBase; -A {{GPUBindGroup}} object has the following internal slots: +{{GPUBindGroup}} has the following [=device timeline properties=]: -
+
: \[[layout]], of type {{GPUBindGroupLayout}}, readonly :: The {{GPUBindGroupLayout}} associated with this {{GPUBindGroup}}. @@ -6137,10 +6156,10 @@ following members: {{GPUExternalTexture}}, or {{GPUBufferBinding}}.
-A {{GPUBindGroupEntry}} object also has the following internal slots: +{{GPUBindGroupEntry}} has the following [=device timeline properties=]: -
- : \[[prevalidatedSize]], of type boolean +
+ : \[[prevalidatedSize]], of type {{boolean}} :: Whether or not this binding entry had its buffer size validated at time of creation.
@@ -6371,10 +6390,10 @@ interface GPUPipelineLayout { GPUPipelineLayout includes GPUObjectBase; -{{GPUPipelineLayout}} has the following internal slots: +{{GPUPipelineLayout}} has the following [=device timeline properties=]: -
- : \[[bindGroupLayouts]], of type [=list=]<{{GPUBindGroupLayout}}> +
+ : \[[bindGroupLayouts]], of type [=list=]<{{GPUBindGroupLayout}}>, readonly :: The {{GPUBindGroupLayout}} objects provided at creation in {{GPUPipelineLayoutDescriptor/bindGroupLayouts|GPUPipelineLayoutDescriptor.bindGroupLayouts}}.
@@ -6763,7 +6782,7 @@ any specific point in the code at all. {{GPUCompilationMessage}} has the following attributes: -
+
: message :: The human-readable, [=localizable text=] for this compilation message. @@ -7004,7 +7023,7 @@ enum GPUPipelineErrorReason { {{GPUPipelineError}} has the following attributes: -
+
: reason :: A read-only [=slot-backed attribute=] exposing the type of error encountered in pipeline creation @@ -7059,9 +7078,9 @@ interface mixin GPUPipelineBase { }; -{{GPUPipelineBase}} has the following internal slots: +{{GPUPipelineBase}} has the following [=device timeline properties=]: -
+
: \[[layout]], of type `GPUPipelineLayout` :: The definition of the layout of resources which can be used with `this`. @@ -7868,19 +7887,19 @@ GPURenderPipeline includes GPUObjectBase; GPURenderPipeline includes GPUPipelineBase; -{{GPURenderPipeline}} has the following internal slots: +{{GPURenderPipeline}} has the following [=device timeline properties=]: -
- : \[[descriptor]], of type {{GPURenderPipelineDescriptor}} +
+ : \[[descriptor]], of type {{GPURenderPipelineDescriptor}}, readonly :: The {{GPURenderPipelineDescriptor}} describing this pipeline. All optional fields of {{GPURenderPipelineDescriptor}} are defined. - : \[[writesDepth]], of type boolean + : \[[writesDepth]], of type {{boolean}}, readonly :: True if the pipeline writes to the depth component of the depth/stencil attachment - : \[[writesStencil]], of type boolean + : \[[writesStencil]], of type {{boolean}}, readonly :: True if the pipeline writes to the stencil component of the depth/stencil attachment
@@ -9581,17 +9600,17 @@ interface GPUCommandBuffer { GPUCommandBuffer includes GPUObjectBase; -{{GPUCommandBuffer}} has the following internal slots: +{{GPUCommandBuffer}} has the following [=device timeline properties=]: -
- : \[[command_list]], of type [=list=]<[=GPU command=]> +
+ : \[[command_list]], of type [=list=]<[=GPU command=]>, readonly :: A [=list=] of [=GPU commands=] to be executed on the [=Queue timeline=] when this command buffer is submitted. - : \[[renderState]], of type [=RenderState=] + : \[[renderState]], of type [=RenderState=], initially `null` :: - The current state used by any render pass commands being executed, initially `null`. + The current state used by any render pass commands being executed.
### Command Buffer Creation ### {#command-buffer-creation} @@ -9617,9 +9636,9 @@ interface mixin GPUCommandsMixin { }; -{{GPUCommandsMixin}} adds the following internal slots to interfaces which include it: +{{GPUCommandsMixin}} has the following [=device timeline properties=]: -
+
: \[[state]], of type [=encoder state=], initially "[=encoder state/open=]" :: The current state of the encoder. @@ -10512,16 +10531,16 @@ interface mixin GPUBindingCommandsMixin { {{GPUObjectBase}} and {{GPUCommandsMixin}} members on the same object. It must only be included by interfaces which also include those mixins. -{{GPUBindingCommandsMixin}} has the following internal slots: +{{GPUBindingCommandsMixin}} has the following [=device timeline properties=]: -
- : \[[bind_groups]], of type [=ordered map=]<{{GPUIndex32}}, {{GPUBindGroup}}> +
+ : \[[bind_groups]], of type [=ordered map=]<{{GPUIndex32}}, {{GPUBindGroup}}>, initially empty :: - The current {{GPUBindGroup}} for each index, initially empty. + The current {{GPUBindGroup}} for each index. - : \[[dynamic_offsets]], of type [=ordered map=]<{{GPUIndex32}}, [=list=]<{{GPUBufferDynamicOffset}}> > + : \[[dynamic_offsets]], of type [=ordered map=]<{{GPUIndex32}}, [=list=]<{{GPUBufferDynamicOffset}}>>, initially empty :: - The current dynamic offsets for each {{GPUBindingCommandsMixin/[[bind_groups]]}} entry, initially empty. + The current dynamic offsets for each {{GPUBindingCommandsMixin/[[bind_groups]]}} entry.
## Bind Groups ## {#programmable-passes-bind-groups} @@ -10726,9 +10745,9 @@ It must only be included by interfaces which also include those mixins. [=Device timeline=] steps: 1. For each |stage| in [{{GPUShaderStage/VERTEX}}, {{GPUShaderStage/FRAGMENT}}, {{GPUShaderStage/COMPUTE}}]: - 1. Let |bufferBindings| be a [=list=] of ({{GPUBufferBinding}}, `boolean`) pairs, + 1. Let |bufferBindings| be a [=list=] of ({{GPUBufferBinding}}, {{boolean}}) pairs, where the latter indicates whether the resource was used as writable. - 1. Let |textureViews| be a [=list=] of ({{GPUTextureView}}, `boolean`) pairs, + 1. Let |textureViews| be a [=list=] of ({{GPUTextureView}}, {{boolean}}) pairs, where the latter indicates whether the resource was used as writable. 1. For each pair of ({{GPUIndex32}} |bindGroupIndex|, {{GPUBindGroupLayout}} |bindGroupLayout|) in |pipeline|.{{GPUPipelineBase/[[layout]]}}.{{GPUPipelineLayout/[[bindGroupLayouts]]}}: @@ -10743,7 +10762,7 @@ It must only be included by interfaces which also include those mixins. {{GPUBufferBinding}} |resource|) in |bufferRanges|, in which |bindGroupLayoutEntry|.{{GPUBindGroupLayoutEntry/visibility}} contains |stage|: 1. Let |resourceWritable| be (|bindGroupLayoutEntry|.{{GPUBindGroupLayoutEntry/buffer}}.{{GPUBufferBindingLayout/type}} == {{GPUBufferBindingType/"storage"}}). - 1. For each pair ({{GPUBufferBinding}} |pastResource|, `boolean` |pastResourceWritable|) in |bufferBindings|: + 1. For each pair ({{GPUBufferBinding}} |pastResource|, {{boolean}} |pastResourceWritable|) in |bufferBindings|: 1. If (|resourceWritable| or |pastResourceWritable|) is true, and |pastResource| and |resource| are [=buffer-binding-aliasing=], return `true`. 1. [=list/append|Append=] (|resource|, |resourceWritable|) to |bufferBindings|. @@ -10755,7 +10774,7 @@ It must only be included by interfaces which also include those mixins. 
|bindGroupLayoutEntry|.{{GPUBindGroupLayoutEntry/storageTexture}}.{{GPUStorageTextureBindingLayout/access}} is a writable access mode. 1. If |bindGroupLayoutEntry|.{{GPUBindGroupLayoutEntry/storageTexture}} is not [=map/exist|provided=], **continue**. - 1. For each pair ({{GPUTextureView}} |pastResource|, `boolean` |pastResourceWritable|) in |textureViews|, + 1. For each pair ({{GPUTextureView}} |pastResource|, {{boolean}} |pastResourceWritable|) in |textureViews|, 1. If (|resourceWritable| or |pastResourceWritable|) is true, and |pastResource| and |resource| is [=texture-view-aliasing=], return `true`. 1. [=list/append|Append=] (|resource|, |resourceWritable|) to |textureViews|. @@ -10787,15 +10806,15 @@ interface mixin GPUDebugCommandsMixin { {{GPUObjectBase}} and {{GPUCommandsMixin}} members on the same object. It must only be included by interfaces which also include those mixins. -{{GPUDebugCommandsMixin}} adds the following internal slots to interfaces which include it: +{{GPUDebugCommandsMixin}} has the following [=device timeline properties=]: -
+
: \[[debug_group_stack]], of type [=stack=]<{{USVString}}> :: A stack of active debug group labels.
-{{GPUDebugCommandsMixin}} adds the following methods to interfaces which include it: +{{GPUDebugCommandsMixin}} has the following methods:
: pushDebugGroup(groupLabel) @@ -10905,20 +10924,20 @@ GPUComputePassEncoder includes GPUDebugCommandsMixin; GPUComputePassEncoder includes GPUBindingCommandsMixin; -{{GPUComputePassEncoder}} has the following internal slots: +{{GPUComputePassEncoder}} has the following [=device timeline properties=]: -
+
: \[[command_encoder]], of type {{GPUCommandEncoder}}, readonly :: The {{GPUCommandEncoder}} that created this compute pass encoder. - : \[[pipeline]], of type {{GPUComputePipeline}}, readonly - :: - The current {{GPUComputePipeline}}, initially `null`. - : \[[endTimestampWrite]], of type [=GPU command=]?, readonly, defaulting to `null` :: [=GPU command=], if any, writing a timestamp when the pass ends. + + : \[[pipeline]], of type {{GPUComputePipeline}}, initially `null` + :: + The current {{GPUComputePipeline}}.
### Compute Pass Encoder Creation ### {#compute-pass-encoder-creation} @@ -11233,7 +11252,7 @@ GPURenderPassEncoder includes GPUBindingCommandsMixin; GPURenderPassEncoder includes GPURenderCommandsMixin; -{{GPURenderPassEncoder}} has the following internal slots used for validation while encoding: +{{GPURenderPassEncoder}} has the following [=device timeline properties=]:
: \[[command_encoder]], of type {{GPUCommandEncoder}}, readonly @@ -11251,10 +11270,6 @@ GPURenderPassEncoder includes GPURenderCommandsMixin; The {{GPUQuerySet}} to store occlusion query results for the pass, which is initialized with {{GPURenderPassDescriptor}}.{{GPURenderPassDescriptor/occlusionQuerySet}} at pass creation time. - : \[[occlusion_query_active]], of type {{boolean}} - :: - Whether the pass's {{GPURenderPassEncoder/[[occlusion_query_set]]}} is being written. - : \[[endTimestampWrite]], of type [=GPU command=]?, readonly, defaulting to `null` :: [=GPU command=], if any, writing a timestamp when the pass ends. @@ -11262,14 +11277,18 @@ GPURenderPassEncoder includes GPURenderCommandsMixin; : \[[maxDrawCount]] of type {{GPUSize64}}, readonly :: The maximum number of draws allowed in this pass. + + : \[[occlusion_query_active]], of type {{boolean}} + :: + Whether the pass's {{GPURenderPassEncoder/[[occlusion_query_set]]}} is being written.
When executing encoded render pass commands as part of a {{GPUCommandBuffer}}, an internal RenderState object is used to track the current state required for rendering. -[=RenderState=] contains the following internal slots used for execution of rendering commands: +[=RenderState=] has the following [=queue timeline properties=]: -
+
: \[[occlusionQueryIndex]], of type {{GPUSize32}} :: The index into {{GPURenderPassEncoder/[[occlusion_query_set]]}} at which to store the @@ -11992,18 +12011,18 @@ interface mixin GPURenderCommandsMixin { {{GPUObjectBase}}, {{GPUCommandsMixin}}, and {{GPUBindingCommandsMixin}} members on the same object. It must only be included by interfaces which also include those mixins. -{{GPURenderCommandsMixin}} has the following internal slots: +{{GPURenderCommandsMixin}} has the following [=device timeline properties=]: -
+
: \[[layout]], of type {{GPURenderPassLayout}}, readonly :: The layout of the render pass. - : \[[depthReadOnly]], of type boolean, readonly + : \[[depthReadOnly]], of type {{boolean}}, readonly :: If `true`, indicates that the depth component is not modified. - : \[[stencilReadOnly]], of type boolean, readonly + : \[[stencilReadOnly]], of type {{boolean}}, readonly :: If `true`, indicates that the stencil component is not modified. @@ -12011,13 +12030,13 @@ It must only be included by interfaces which also include those mixins. :: The [=usage scope=] for this render pass or bundle. - : \[[pipeline]], of type {{GPURenderPipeline}} + : \[[pipeline]], of type {{GPURenderPipeline}}, initially `null` :: - The current {{GPURenderPipeline}}, initially `null`. + The current {{GPURenderPipeline}}. - : \[[index_buffer]], of type {{GPUBuffer}} + : \[[index_buffer]], of type {{GPUBuffer}}, initially `null` :: - The current buffer to read index data from, initially `null`. + The current buffer to read index data from. : \[[index_format]], of type {{GPUIndexFormat}} :: @@ -12032,14 +12051,13 @@ It must only be included by interfaces which also include those mixins. The size in bytes of the section of {{GPURenderCommandsMixin/[[index_buffer]]}} currently set, initially `0`. - : \[[vertex_buffers]], of type [=ordered map=]<slot, {{GPUBuffer}}> + : \[[vertex_buffers]], of type [=ordered map=]<slot, {{GPUBuffer}}>, initially empty :: - The current {{GPUBuffer}}s to read vertex data from for each slot, initially empty. + The current {{GPUBuffer}}s to read vertex data from for each slot. - : \[[vertex_buffer_sizes]], of type [=ordered map=]<slot, {{GPUSize64}}> + : \[[vertex_buffer_sizes]], of type [=ordered map=]<slot, {{GPUSize64}}>, initially empty :: - The size in bytes of the section of {{GPUBuffer}} currently set for each slot, initially - empty. + The size in bytes of the section of {{GPUBuffer}} currently set for each slot. 
: \[[drawCount]], of type {{GPUSize64}} :: @@ -12930,11 +12948,11 @@ GPURenderBundle includes GPUObjectBase; :: The layout of the render bundle. - : \[[depthReadOnly]], of type boolean + : \[[depthReadOnly]], of type {{boolean}} :: If `true`, indicates that the depth component is not modified by executing this render bundle. - : \[[stencilReadOnly]], of type boolean + : \[[stencilReadOnly]], of type {{boolean}} :: If `true`, indicates that the stencil component is not modified by executing this render bundle. @@ -13465,11 +13483,13 @@ GPUQueue includes GPUObjectBase; : {{GPUExternalTexture}} |et| :: |et|.{{GPUExternalTexture/[[expired]]}} must be `false`. : {{GPUQuerySet}} |qs| - :: |qs| must be in the [=query set state/available=] state. - For occlusion queries, the {{GPURenderPassDescriptor/occlusionQuerySet}} - in {{GPUCommandEncoder/beginRenderPass()}} is not "used" unless - it is also used by {{GPURenderPassEncoder/beginOcclusionQuery()}}. + :: |qs|.{{GPUQuerySet/[[destroyed]]}} must be `false`.
+ + Note: + For occlusion queries, the {{GPURenderPassDescriptor/occlusionQuerySet}} + in {{GPUCommandEncoder/beginRenderPass()}} is not "used" unless + it is also used by {{GPURenderPassEncoder/beginOcclusionQuery()}}.
1. For each |commandBuffer| in |commandBuffers|: @@ -13545,9 +13565,9 @@ interface GPUQuerySet { GPUQuerySet includes GPUObjectBase; -{{GPUQuerySet}} has the following attributes: +{{GPUQuerySet}} has the following [=immutable properties=]: -
+
: type :: The type of the queries managed by this {{GPUQuerySet}}. @@ -13557,22 +13577,13 @@ GPUQuerySet includes GPUObjectBase; The number of queries managed by this {{GPUQuerySet}}.
-{{GPUQuerySet}} has the following internal slots: +{{GPUQuerySet}} has the following [=device timeline properties=]: -
- : \[[state]], of type [=query set state=] +
+ : \[[destroyed]], of type {{boolean}}, initially `false` :: - The current state of the {{GPUQuerySet}}. -
- -Each {{GPUQuerySet}} has a current query set state on the [=Device timeline=] -which is one of the following: - -
- : "available" - :: The {{GPUQuerySet}} is available for GPU operations on its content. - : "destroyed" - :: The {{GPUQuerySet}} is no longer available for any operations except {{GPUQuerySet/destroy}}. + If the query set is destroyed, it can no longer be used in any operation, + and its underlying memory can be freed.
### QuerySet Creation ### {#queryset-creation} @@ -13635,8 +13646,6 @@ dictionary GPUQuerySetDescriptor - |this| must not be [$invalid|lost$]. - |descriptor|.{{GPUQuerySetDescriptor/count}} must be ≤ 4096.
- - 1. Set |q|.{{GPUQuerySet/[[state]]}} to [=query set state/available=].
@@ -13652,11 +13661,13 @@ dictionary GPUQuerySetDescriptor
-### QuerySet Destruction ### {#queryset-destruction} +### Query Set Destruction ### {#queryset-destruction} An application that no longer requires a {{GPUQuerySet}} can choose to lose access to it before garbage collection by calling {{GPUQuerySet/destroy()}}. +{{GPUQuerySet}} has the following methods: +
: destroy() :: @@ -13670,7 +13681,12 @@ garbage collection by calling {{GPUQuerySet/destroy()}}. [=Content timeline=] steps: - 1. Set |this|.{{GPUQuerySet/[[state]]}} to [=query set state/destroyed=]. + 1. Issue the subsequent steps on the [=device timeline=]. +
+
+ [=Device timeline=] steps: + + 1. Set |this|.{{GPUQuerySet/[[destroyed]]}} to `true`.
@@ -13811,17 +13827,13 @@ interface GPUCanvasContext { }; -{{GPUCanvasContext}} has the following attributes: +{{GPUCanvasContext}} has the following [=content timeline properties=]: -
+
: canvas :: The canvas this context was created from. -
- -{{GPUCanvasContext}} has the following internal slots: -
: \[[configuration]], of type {{GPUCanvasConfiguration}}?, initially `null` :: The options this context is currently configured with. @@ -14480,9 +14492,9 @@ this possibility, using only the error's {{GPUError/message}} when possible, and `instanceof`. Use `error.constructor.name` when it's necessary to serialize an error (e.g. into JSON, for a debug report). -{{GPUError}} has the following attributes: +{{GPUError}} has the following [=immutable properties=]: -
+
: message :: A human-readable, [=localizable text=] message providing information about the error that @@ -14580,8 +14592,9 @@ A GPU error scope captures {{GPUError}}s that were generated whil [=GPU error scope=] was current. Error scopes are used to isolate errors that occur within a set of WebGPU calls, typically for debugging purposes or to make an operation more fault tolerant. -[=GPU error scope=] has the following internal slots: -
+[=GPU error scope=] has the following [=device timeline properties=]: + +
: \[[errors]], of type [=list=]<{{GPUError}}>, initially [] :: The {{GPUError}}s, if any, observed while the [=GPU error scope=] was current. @@ -14621,9 +14634,9 @@ partial interface GPUDevice { Indicates that the error scope will catch a {{GPUInternalError}}.
-{{GPUDevice}} has the following internal slots: +{{GPUDevice}} has the following [=device timeline properties=]: -
+
: \[[errorScopeStack]], of type [=stack=]<[=GPU error scope=]> :: A [=stack=] of [=GPU error scopes=] that have been pushed to the {{GPUDevice}}. @@ -14869,7 +14882,7 @@ dictionary GPUUncapturedErrorEventInit : EventInit { {{GPUUncapturedErrorEvent}} has the following attributes: -
+
: error :: A [=slot-backed attribute=] holding an object representing the error that was uncaptured. @@ -14883,9 +14896,9 @@ partial interface GPUDevice { }; -{{GPUDevice}} has the following attributes: +{{GPUDevice}} has the following [=content timeline properties=]: -
+
: onuncapturederror :: An [=event handler IDL attribute=] for the {{GPUDevice/uncapturederror}} event type. From 0f098a93b6470c2395d650d798231fe4d4e9bdb9 Mon Sep 17 00:00:00 2001 From: David Neto Date: Mon, 19 Aug 2024 13:38:43 -0400 Subject: [PATCH 161/285] wgsl: define 'extended real' numbers (#4819) This will help when defining the domain of a numerical function. Also, create [INF],[PINF],[NINF] macros to ensure consistent typography for infinities. Issue: #2765 --- wgsl/index.bs | 39 +++++++++++++++++++++++---------------- 1 file changed, 23 insertions(+), 16 deletions(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index 25c412bbb5..da5b1ec0e4 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -19,6 +19,9 @@ Text Macro: ALLINTEGRALDECL S is AbstractInt, i32, or u32
T is S or vecN<S> Text Macro: ALLFLOATINGDECL S is AbstractFloat, f32, or f16
T is S or vecN<S> Text Macro: ALLNUMERICDECL S is AbstractInt, AbstractFloat, i32, u32, f32, or f16
T is S, or vecN<S> Text Macro: ALLSIGNEDNUMERICDECL S is AbstractInt, AbstractFloat, i32, f32, or f16
T is S, or vecN<S> +Text Macro: INF ∞ +Text Macro: PINF +∞ +Text Macro: NINF −∞ Ignored Vars: i, c0, e, e1, e2, e3, edge, eN, p, s1, s2, sn, AS, AM, N, newbits, M, C, R, v, Stride, Offset, Align, Extent, T, T1 !Participate: File an issue (open issues) @@ -481,7 +484,7 @@ Following syntax notation describes the conventions of the syntactic grammar of Angles: * By convention, angles are measured in radians. -* The reference ray for measuring angles is the ray from the origin (0,0) toward (+∞,0). +* The reference ray for measuring angles is the ray from the origin (0,0) toward ([PINF],0). * Let θ be the angle subtended by a comparison ray and the reference ray. Then θ increases as the comparison ray moves counterclockwise. * There are 2 π radians in a complete circle. @@ -508,28 +511,32 @@ sense. Specifically: Then the area |a| is a hyperbolic angle such that |x| is the hyperbolic cosine of |a|, and |y| is the hyperbolic sine of |a|. +The extended real numbers +(also known as the affinely extended real numbers) is the set of real numbers together with [PINF] and [NINF]. +Computers use [[#floating-point-types|floating point types]] to approximately represent the extended reals, including values for both infinities. + An interval is a contiguous set of numbers with a lower and upper bound. Depending on context, they are sets of integers, floating point numbers, or real numbers. * The closed interval [*a*,*b*] is the set of numbers *x* such that *a* ≤ *x* ≤ *b*. * The half-open interval [*a*,*b*) is the set of numbers *x* such that *a* ≤ *x* < *b*. * The half-open interval (*a*,*b*] is the set of numbers *x* such that *a* < *x* ≤ *b*. 
-The floor expression is defined over real numbers |x| extended with +∞ and −∞: +The floor expression is defined for [=extended real=] numbers |x|: -* ⌊ + ∞ ⌋ = +∞ -* ⌊ − ∞ ⌋ = −∞ +* ⌊ [PINF] ⌋ = [PINF] +* ⌊ [NINF] ⌋ = [NINF] * for real number |x|, ⌊|x|⌋ = |k|, where |k| is the unique integer such that |k| ≤ |x| < |k|+1 -The ceiling expression is defined over real numbers |x| extended with +∞ and −∞: +The ceiling expression is defined for [=extended real=] numbers |x|: -* ⌈ +∞ ⌉ = +∞ -* ⌈ −∞ ⌉ = −∞ +* ⌈ [PINF] ⌉ = [PINF] +* ⌈ [NINF] ⌉ = [NINF] * for real number |x|, ⌈|x|⌉ = |k|, where |k| is the unique integer such that |k|-1 < |x| ≤ |k| -The truncate function is defined over real numbers |x| extended with +∞ and −∞: +The truncate function is defined for [=extended real=] numbers |x|: -* truncate(+∞) = +∞ -* truncate(−∞) = −∞ +* truncate([PINF]) = [PINF] +* truncate([NINF]) = [NINF] * for real number |x|, computes the nearest whole number whose absolute value is less than or equal to the absolute value of |x|: * truncate(|x|) = ⌊|x|⌋ if |x| ≥ 0, and ⌈|x|⌉ if |x| < 0. @@ -12110,7 +12117,7 @@ the following differences: behaviors depending on whether the expression is a [=const-expression=], an [=override-expression=], or a [=runtime expression=]. * IEEE-754 defines five kinds of exceptions: - * Invalid operation. These operations yield a NaN. An example of an invalid operation is 0 × ∞. + * Invalid operation. These operations yield a NaN. An example of an invalid operation is 0 × [INF]. * Division by zero. This occurs when an operation on finite operands is defined as having an exact infinite result. Examples are 1 ÷ 0, and log(0). * Overflow. See [[#floating-point-overflow]]. @@ -12167,11 +12174,11 @@ The final value of the expression is determined in two stages, via intermediate From *X*, compute *X'* in *T* by rounding: * If *X* is in the finite range of *T* then *X'* is the result of rounding *X* up or down. * If *X* is NaN, then *X'* is NaN. 
-* If *MAX(T)* < *X* < 2*EMAX(T)+1*, then either rounding direction is used: *X'* is *MAX(T)* or +∞. -* If 2*EMAX(T)+1* ≤ *X*, then *X'* = +∞. +* If *MAX(T)* < *X* < 2*EMAX(T)+1*, then either rounding direction is used: *X'* is *MAX(T)* or [PINF]. +* If 2*EMAX(T)+1* ≤ *X*, then *X'* = [PINF]. * Note: This matches the [[!IEEE-754|IEEE-754]] rule. -* If −*MAX(T)* > *X* > −2*EMAX(T)+1*, then either rounding direction is used: *X'* is −*MAX(T)* or −∞. -* If −2*EMAX(T)+1* ≥ *X*, then *X'* = −∞. +* If −*MAX(T)* > *X* > −2*EMAX(T)+1*, then either rounding direction is used: *X'* is −*MAX(T)* or [NINF]. +* If −2*EMAX(T)+1* ≥ *X*, then *X'* = [NINF]. * Note: This matches the IEEE-754 rule. From *X'*, compute the final value of the expression, *X''*, or detect a program error: @@ -13842,7 +13849,7 @@ Note: The result is not mathematically meaningful when `abs(e)` > 1. Description Returns the inverse hyperbolic cosine (cosh-1) of `x`, as a [=hyperbolic angle=].
- That is, approximates `a` with 0 ≤ a ≤ ∞, such that `cosh`(`a`) = `x`. + That is, approximates `a` with 0 ≤ a ≤ [INF], such that `cosh`(`a`) = `x`. [=Component-wise=] when `T` is a vector. From c1990f4f6a8ad55ad593462c585483dcbab7850e Mon Sep 17 00:00:00 2001 From: Matthew Wong <110081332+matthew-wong1@users.noreply.github.com> Date: Tue, 20 Aug 2024 22:29:03 +0800 Subject: [PATCH 162/285] Add extra example for operations involving abstract numerics (#4795) * add additional example relating to an operation between an abstract numeric and non-abstract numeric --- wgsl/index.bs | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/wgsl/index.bs b/wgsl/index.bs index da5b1ec0e4..d931b1d0db 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -2564,6 +2564,15 @@ Example: `1u + 2.5` results in a [=shader-creation error=]: * There is no feasible automatic conversion from a GPU-materialized [=integer scalar=] type to a floating point type. * No type rule matches *e*`+`*f* with *e* in an [=integer scalar=] type, and *f* in a floating point type. +Example: `-1 * i32(-2147483648)` does not result in a [=shader-creation error=]: +* The `-1` term is an expression of type [=AbstractInt=]. +* The `i32(-2147483648)` term is an expression of type [=i32=]. +* There is no overload of the multiply operator for these two types and the [=i32=] term cannot be up-converted to [=AbstractInt=]. +* The only feasible automatic conversion is converting the [=AbstractInt=] to [=i32=] so: + * The resulting computation is a multiplication operation between [=i32=]. + * [=i32=] operations use two's complement arithmetic and have defined overflow behavior. + * Wrapping occurs. +
// Explicitly-typed unsigned integer literal. From 9465f414657b172bad9f33968bb89f45e33261f2 Mon Sep 17 00:00:00 2001 From: Jiawei Shao <jiawei.shao@intel.com> Date: Wed, 21 Aug 2024 18:16:11 +0800 Subject: [PATCH 163/285] Fix rules about dual source blending in `validating GPUFragmentState` (#4791) This patch fixes several rules about dual source blending in the algorithm `validating GPUFragmentState` (1) When `Src1` is used, the fragment output at location 0 must have `@blend_src(1)` whether the colorWriteMask is 0 or not, which is required by Metal validation layer. (2) The first validation rule is not needed any more because: - According to the second rule (required by D3D12) we must have exactly one color target - According to (1), the fragment output at location 0 must have `blend_src(1)`. --- spec/index.bs | 8 +++----- wgsl/index.bs | 3 ++- 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index ad42f9424a..4638987507 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -111,6 +111,7 @@ spec: WGSL; urlPrefix: https://gpuweb.github.io/gpuweb/wgsl/# text: interpolation-sampling; url: interpolation-sampling text: textureSampleLevel; url: texturesamplelevel text: interpolation type; url: interpolation-type + text: use dual source blending; url: use-dual-source-blending for: interpolation type text: flat; url: interpolation-type-flat text: linear; url: interpolation-type-linear @@ -8529,12 +8530,9 @@ dictionary GPUFragmentState - |entryPoint| must have a [=shader stage output=] with [=location=] equal to |index| and [=blend_src=] omitted or equal to 0. - If |usesDualSourceBlending| is `true`: - - All the [=shader stage output=] values of |entryPoint| must have a [=blend_src=] attribute. - |descriptor|.{{GPUFragmentState/targets}}.length must be 1. - - Let |colorState| be |descriptor|.{{GPUFragmentState/targets}}[0]. 
- - If |colorState|.{{GPUColorTargetState/writeMask}} is not 0: - - |entryPoint| must have a [=shader stage output=] with [=location=] equal to 0 - and [=blend_src=] equal to 1. + - All the [=shader stage outputs=] with [=location=] in |entryPoint| must be in one + struct and [=use dual source blending=]. - [$Validating GPUFragmentState's color attachment bytes per sample$](|device|, |descriptor|.{{GPUFragmentState/targets}}) succeeds. </div> </div> diff --git a/wgsl/index.bs b/wgsl/index.bs index d931b1d0db..cb675f0252 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -9570,7 +9570,8 @@ Each structure member in the entry point IO [=shader-creation error|must=] be on <div algorithm="locations in structs"> For each structure type |S| defined in a WGSL module (not just those used in shader stage inputs or outputs), let |members| be the set of members of |S| that have [=attribute/location=] attributes. - - If any entry in |members| specifies a [=attribute/blend_src=] attribute: + - If any entry in |members| specifies a [=attribute/blend_src=] attribute, |members| must <dfn export> + use dual source blending</dfn>, which means: - |members| [=shader-creation error|must=] contain exactly `2` entries, one with `@location(0) @blend_src(0)` and one with `@location(0) @blend_src(1)`. - All the |members| [=shader-creation error|must=] have same data type. From 8759aa70c5c129acb8e22228aa37e3455dd1a8a5 Mon Sep 17 00:00:00 2001 From: David Neto <dneto@google.com> Date: Wed, 21 Aug 2024 10:55:57 -0400 Subject: [PATCH 164/285] wsgl: remove TODO about acos, asin accuracy for f16 (#4823) Conformance tests now check the stated accuracy. --- wgsl/index.bs | 2 -- 1 file changed, 2 deletions(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index cb675f0252..b83ddffac6 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -12284,7 +12284,6 @@ the rules in [[#floating-point-overflow]] apply. 
<td>The worse of: * Absolute error 3.91&times;10<sup>-3</sup> * Inherited from `atan2(sqrt(1.0 - x * x), x)` - <p>TODO: check this with conformance tests <tr><td>`acosh(x)`<td colspan=2 style="text-align:left;">Inherited from `log(x + sqrt(x * x - 1.0))` <tr><td>`asin(x)`<td> The worse of: @@ -12294,7 +12293,6 @@ the rules in [[#floating-point-overflow]] apply. <td>The worse of: * Absolute error 3.91&times;10<sup>-3</sup> * Inherited from `atan2(x, sqrt(1.0 - x * x))` - <p>TODO: check this with conformance tests <tr><td>`asinh(x)`<td colspan=2 style="text-align:left;">Inherited from `log(x + sqrt(x * x + 1.0))` <tr><td>`atan(x)`<td>4096 ULP<td>5 ULP <tr><td>`atan2(y, x)`<td>4096 ULP for `|x|` in the range [2<sup>-126</sup>, 2<sup>126</sup>], and `y` is finite and normal<td>5 ULP for `|x|` in the range [2<sup>-14</sup>, 2<sup>14</sup>], and `y` is finite and normal From fb06f3acc9604d5010b0b5c169d0a6328bc69ea4 Mon Sep 17 00:00:00 2001 From: David Neto <dneto@google.com> Date: Tue, 27 Aug 2024 13:42:32 -0400 Subject: [PATCH 165/285] wsgl: clarify fp edge cases (#4825) * wsgl: clarify fp edge cases * Refactor the initial material in the "Floating point evaluation" section into two new sections: * Add subsection "Overview of IEEE-754" which: * Defines enough of IEEE-754 behaviour needed to describe the WGSL's differences. * Defines 'subnormal' and 'normal'. * Defines 'intermediate result', 'rounding', and 'rounding mode'. * Defines key characteristics of binary16, binary32, binary64 formats. * Gives the algorithm to compute a floating point value from its bit representation. * Describes operation on infinite values. * Absorbs description of 'domain' of an fp operation. * Describes operation on NaN values. * Absorbs description of IEEE 754 exceptions. * Add subsection "Differences from IEEE-754" * Has most of the other material that used to be at the top of "Floating point evaluation" * Adds a clause about differences between corresponding WGSL and IEEE operations.
* Add a new section "Domains of Floating Point Expressions and Built-in Functions" * Describes how the rest of the spec defines the domains of specific operations. * Often the same from IEEE-754 * Describes inferring the domain of component-wise operations from scalar domain. * Describes general rule of "inferred from linear terms", for inherited-from implementations that are linear expansions. * Replace existing uses of 'denormalized' to 'subnormal'. * Link uses of 'intermediate result' to the definition. * List domain restrictions as needed for all operations: * add, subtract, multiply, divide, remainder * acos, acosh, asin, atanh, cos, distance, inverseSqrt, log, log2, normalize, pow, sin, sqrt, tan * In float conversion to float, move the NaN clause to be second, instead of subordinate to the third. It reads better that way Fixed: #2765, #2884, #3220, #4219, #4527 * Update wgsl/index.bs use "infinitely precise" Co-authored-by: alan-baker <alanbaker@google.com> * Update wgsl/index.bs use "min" and "max" instead of "minimum" and "maximum" Co-authored-by: alan-baker <alanbaker@google.com> * Update wgsl/index.bs Co-authored-by: alan-baker <alanbaker@google.com> * Update wgsl/index.bs Use vector notation for FMA "implied from linear terms" rule. Co-authored-by: alan-baker <alanbaker@google.com> * Update 'cross' parameter notations in accuracy table: go back to x, y in definition section: use 'a' and 'b'. * fp bits to value algorithm: Drop E from formula when it's 0 * Cross-reference from "domain" to "differences from IEEE754" Restructure the paragraphs about domain into a definition and a bulleted list. This helps to scope the cross-reference mention of what WGSL does when evaluating an operation outside its domain. Also, link from the definition of "extended reals" to the Floating Point Evaluation section. Do this because we already have a sentence about how fp types approximate the extended reals.
So link to the whole fp eval section so the reader can quickly jump to the rules. * Move link to 'exception' to outer bullet point * Reword 'mantissa' to 'significand' * Cross link to binary16 binary32 and binary64 definitions * Update wgsl/index.bs fix rendering Co-authored-by: alan-baker <alanbaker@google.com> * Update wgsl/index.bs fixed rendering Co-authored-by: alan-baker <alanbaker@google.com> * Fix more rendering in domain description of 'cross' --------- Co-authored-by: alan-baker <alanbaker@google.com> --- wgsl/index.bs | 515 +++++++++++++++++++++++++++++++++++++------------- 1 file changed, 381 insertions(+), 134 deletions(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index b83ddffac6..99bd540735 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -22,7 +22,7 @@ Text Macro: ALLSIGNEDNUMERICDECL S is AbstractInt, AbstractFloat, i32, f32, or f Text Macro: INF &infin; Text Macro: PINF &plus;&infin; Text Macro: NINF &minus;&infin; -Ignored Vars: i, c0, e, e1, e2, e3, edge, eN, p, s1, s2, sn, AS, AM, N, newbits, M, C, R, v, Stride, Offset, Align, Extent, T, T1 +Ignored Vars: i, c0, e, e1, e2, e3, edge, eN, p, s1, s2, sn, AS, AM, N, newbits, M, C, R, v, Stride, Offset, Align, Extent, T, T1, E, S, F, x, y, a, b !Participate: <a href="https://github.com/gpuweb/gpuweb/issues/new?labels=wgsl">File an issue</a> (<a href="https://github.com/gpuweb/gpuweb/issues?q=is%3Aissue+is%3Aopen+label%3Awgsl">open issues</a>) !Tests: <a href=https://github.com/gpuweb/cts/tree/main/src/webgpu/shader/>WebGPU CTS shader/</a> @@ -511,12 +511,17 @@ sense. Specifically: Then the area |a| is a hyperbolic angle such that |x| is the hyperbolic cosine of |a|, and |y| is the hyperbolic sine of |a|. +<dfn noexport>Positive infinity</dfn>, denoted by &infin; or [PINF], is a unique value strictly greater than all real numbers. + +<dfn noexport>Negative infinity</dfn>, denoted by [NINF], is a unique value strictly lower than all real numbers. 
+ The <dfn>extended real</dfn> numbers (also known as the affinely extended real numbers) is the set of real numbers together with [PINF] and [NINF]. Computers use [[#floating-point-types|floating point types]] to approximately represent the extended reals, including values for both infinities. +See [[#floating-point-evaluation]]. An <dfn noexport>interval</dfn> is a contiguous set of numbers with a lower and upper bound. -Depending on context, they are sets of integers, floating point numbers, or real numbers. +Depending on context, they are sets of integers, floating point numbers, real numbers, or [=extended real=] numbers. * The closed interval [*a*,*b*] is the set of numbers *x* such that *a* &le; *x* &le; *b*. * The half-open interval [*a*,*b*) is the set of numbers *x* such that *a* &le; *x* &lt; *b*. * The half-open interval (*a*,*b*] is the set of numbers *x* such that *a* &lt; *x* &le; *b*. @@ -1122,16 +1127,16 @@ or a [=hexadecimal floating point literal=]. path: syntax/float_literal.syntax.bs.include </pre> -A [=floating point literal=] has two logical parts: a mantissa to representing a fraction, and an optional exponent. -Roughly, the value of the literal is the mantissa multiplied by a base value raised to the given exponent. -A mantissa digit is <dfn dfn-for="mantissa">significant</dfn> if it is non-zero, -or if there are mantissa digits to its left and to its right that are both non-zero. +A [=floating point literal=] has two logical parts: a significand to represent a fraction, and an optional exponent. +Roughly, the value of the literal is the significand multiplied by a base value raised to the given exponent. +A significand digit is <dfn dfn-for="significand">significant</dfn> if it is non-zero, +or if there are significand digits to its left and to its right that are both non-zero. Significant digits are counted from left-to-right: the *N*'th significant digit has *N*-1 significant digits to its left. 
A <dfn noexport>decimal floating point literal</dfn> is: -* A mantissa, specified as a sequence of digits, with an optional decimal point (`.`) somewhere among them. - The mantissa represents a fraction in base 10 notation. +* A significand, specified as a sequence of digits, with an optional decimal point (`.`) somewhere among them. + The significand represents a fraction in base 10 notation. * Then an optional exponent suffix consisting of: * `e` or `E`. * Then an exponent specified as an decimal number with an optional leading sign (`+` or `-`). @@ -1158,30 +1163,30 @@ path: syntax/decimal_float_literal.syntax.bs.include <div algorithm="mathematical value of decimal floating point literal"> The mathematical value of a [=decimal floating point literal=] is computed as follows: -* Compute |effective_mantissa| from |mantissa|: - * If |mantissa| has 20 or fewer [=mantissa/significant=] digits, then |effective_mantissa| is |mantissa|. +* Compute |effective_significand| from |significand|: + * If |significand| has 20 or fewer [=significand/significant=] digits, then |effective_significand| is |significand|. * Otherwise: - * Let |truncated_mantissa| be the same as |mantissa| except each digit to the + * Let |truncated_significand| be the same as |significand| except each digit to the right of the 20th significant digit is replaced with 0. - * Let |truncated_mantissa_next| be the same as |mantissa| except: + * Let |truncated_significand_next| be the same as |significand| except: * the 20th significant digit is incremented by 1, and carries are propagated to the left as needed needed to ensure each digit remains in the range 0 through 9, and * each digit to the right of the 20th significant digit is replaced with 0. - * Set |effective_mantissa| to either |truncated_mantissa| or |truncated_mantissa_next|. + * Set |effective_significand| to either |truncated_significand| or |truncated_significand_next|. This is an implementation choice. 
-* The mathematical value of the literal is the mathematical value of |effective_mantissa| as a decimal fraction, +* The mathematical value of the literal is the mathematical value of |effective_significand| as a decimal fraction, multiplied by 10 to the power of the exponent. When no exponent is specified, an exponent of 0 is assumed. </div> -Note: The decimal mantissa is truncated after 20 decimal digits, preserving approximately log(10)/log(2)&times;20 &approx; 66.4 significant bits in the fraction. +Note: The decimal significand is truncated after 20 decimal digits, preserving approximately log(10)/log(2)&times;20 &approx; 66.4 significant bits in the fraction. A <dfn noexport>hexadecimal floating point literal</dfn> is: * A `0x` or `0X` prefix -* Then a mantissa, specified as a sequence of hexadecimal digits, with an optional hexadecimal point (`.`) somewhere among them. - The mantissa represents a fraction in base 16 notation. +* Then a significand, specified as a sequence of hexadecimal digits, with an optional hexadecimal point (`.`) somewhere among them. + The significand represents a fraction in base 16 notation. * Then an optional exponent suffix consisting of: * `p` or `P` * Then an exponent specified as an decimal number with an optional leading sign (`+` or `-`). @@ -1206,24 +1211,24 @@ path: syntax/hex_float_literal.syntax.bs.include <div algorithm="mathematical value of hexadecimal floating point literal"> The mathematical value of a [=hexadecimal floating point literal=] is computed as follows: -* Compute *effective_mantissa* from *mantissa*: - * If *mantissa* has 16 or fewer [=mantissa/significant=] digits, then *effective_mantissa* is *mantissa*. +* Compute |effective_significand| from |significand|: + * If |significand| has 16 or fewer [=significand/significant=] digits, then |effective_significand| is |significand|. 
* Otherwise: - * Let |truncated_mantissa| be the same as |mantissa| except each digit to the + * Let |truncated_significand| be the same as |significand| except each digit to the right of the 16th significant digit is replaced with 0. - * Let |truncated_mantissa_next| be the same as |mantissa| except: + * Let |truncated_significand_next| be the same as |significand| except: * the 16th significant digit is incremented by 1, and carries are propagated to the left as needed needed to ensure each digit remains in the range 0 through `f`, and * each digit to the right of the 16th significant digit is replaced with 0. - * Set |effective_mantissa| to either |truncated_mantissa| or |truncated_mantissa_next|. + * Set |effective_significand| to either |truncated_significand| or |truncated_significand_next|. This is an implementation choice. -* The mathematical value of the literal is the mathematical value of |effective_mantissa| as a hexadecimal fraction, +* The mathematical value of the literal is the mathematical value of |effective_significand| as a hexadecimal fraction, multiplied by 2 to the power of the exponent. When no exponent is specified, an exponent of 0 is assumed. </div> -Note: The hexadecimal mantissa is truncated after 16 hexadecimal digits, preserving approximately 4 &times;16 &equals; 64 significant bits in the fraction. +Note: The hexadecimal significand is truncated after 16 hexadecimal digits, preserving approximately 4 &times;16 &equals; 64 significant bits in the fraction. When a [=numeric literal=] has a suffix, the literal denotes a value in a specific [=type/concrete=] [=scalar=] type. @@ -1251,8 +1256,8 @@ A [=shader-creation error=] results if: * A [=decimal floating point literal=] with a `f` or `h` suffix overflows the target type. * A [=floating point literal=] with a `h` suffix is used while the [=extension/f16|f16 extension=] is not enabled. 
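Editorial sketch (not part of the patch): the hexadecimal floating point literal procedure in this hunk can be illustrated in Python with exact rational arithmetic. The function names `effective_significand_candidates` and `literal_value` and the `max_significant` parameter are hypothetical, chosen only for this illustration. The sketch assumes the significand arrives as a string of hexadecimal digits with at most one point, and with the `0x` prefix and the exponent suffix already removed.

```python
from fractions import Fraction


def effective_significand_candidates(significand: str, max_significant: int = 16):
    """Return the two permitted effective significands (truncated, truncated_next).

    Hypothetical helper: `significand` is a string of hex digits with an
    optional point, for example "1.8". Both results are equal when no
    truncation is needed.
    """
    int_part, _, frac_part = significand.partition(".")
    digits = (int_part + frac_part).lower()
    # Each digit after the hexadecimal point divides the value by 16.
    scale = 16 ** len(frac_part)

    nonzero = [i for i, d in enumerate(digits) if d != "0"]
    if not nonzero:
        return Fraction(0), Fraction(0)

    # A digit is significant if it is non-zero or lies between non-zero digits,
    # so the significant digits are exactly the span from first to last non-zero.
    first, last = nonzero[0], nonzero[-1]
    if last - first + 1 <= max_significant:
        exact = Fraction(int(digits, 16), scale)
        return exact, exact

    # Zero every digit to the right of the max_significant'th significant digit.
    cut = first + max_significant
    dropped = len(digits) - cut
    kept = int(digits[:cut], 16)
    truncated = Fraction(kept * 16 ** dropped, scale)
    # Incrementing the last kept digit; integer addition propagates carries.
    truncated_next = Fraction((kept + 1) * 16 ** dropped, scale)
    return truncated, truncated_next


def literal_value(effective_significand: Fraction, exponent: int = 0) -> Fraction:
    """Multiply the effective significand by 2 raised to the exponent."""
    return effective_significand * Fraction(2) ** exponent
```

An implementation may pick either returned candidate when truncation occurs. For instance, the significand of `0x1.00000001p0` has only 9 significant digits, so both candidates equal the exact value.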
-Note: The hexadecimal float value 0x1.00000001p0 requires 33 mantissa bits to be represented exactly, -but [=f32=] only has 23 explicit mantissa bits. +Note: The hexadecimal float value 0x1.00000001p0 requires 33 significand bits to be represented exactly, +but [=f32=] only has 23 explicit significand bits. Note: If you want to use an `f` suffix to force a hexadecimal float literal to be of type [=f32=], the literal must also use a binary exponent. For example, write `0x1p0f`. In comparison, `0x1f` is a hexadecimal integer literal. @@ -1553,7 +1558,7 @@ Let |TemplateList| be a record type containing: **Output:** |DiscoveredTemplateLists|, a list of |TemplateList| records. -**Algorithm:** +**Procedure:** * Initialize |DiscoveredTemplateLists| to an empty list. * Initialize a |Pending| variable to be an empty stack of |UnclosedCandidate| records. * Initialize a |CurrentPosition| integer variable to 0. @@ -2519,7 +2524,7 @@ and with a numeric range and precision that may be larger than directly implemen WGSL defines two <dfn>abstract numeric types</dfn> for these evaluations: * The <dfn noexport>AbstractInt</dfn> type is the set of integers |i|, with -2<sup>63</sup> &leq; |i| &lt; 2<sup>63</sup>. * The <dfn noexport>AbstractFloat</dfn> type is the set of finite floating point numbers representable - in the [[!IEEE-754|IEEE-754]] binary64 (double precision) format. + in the [[!IEEE-754|IEEE-754]] [=ieee754/binary64=] (double precision) format. An evaluation of an expression in one of these types [=shader-creation error|must not=] overflow or produce an infinite or NaN value. @@ -2720,11 +2725,11 @@ Note: [=AbstractInt=] is also an integer type. ### Floating Point Types ### {#floating-point-types} The <dfn noexport>f32</dfn> type is the set of 32-bit floating point values of the -[[!IEEE-754|IEEE-754]] binary32 (single precision) format. +[[!IEEE-754|IEEE-754]] [=ieee754/binary32=] (single precision) format. See [[#floating-point-evaluation]] for details.
The <dfn noexport>f16</dfn> type is the set of 16-bit floating point values of the -[[!IEEE-754|IEEE-754]] binary16 (half precision) format. It is a [=shader-creation error=] +[[!IEEE-754|IEEE-754]] [=ieee754/binary16=] (half precision) format. It is a [=shader-creation error=] if the [=f16=] type is used unless the program contains the `enable f16;` directive to enable the [=extension/f16|f16 extension=]. See [[#floating-point-evaluation]] for details. @@ -2733,7 +2738,7 @@ Each has a corresponding negative value. <table class='data'> <caption>Extreme values for floating point types</caption> <thead> - <tr><th>Type<th>Smallest positive denormal<th>Smallest positive normal<th>Largest positive finite<th>Largest finite power of 2 + <tr><th>Type<th>Smallest positive [=ieee754/subnormal=]<th>Smallest positive [=ieee754/normal=]<th>Largest positive finite<th>Largest finite power of 2 </thead> <tr><td rowspan=2>f32<td>1.40129846432481707092e-45f<td>1.17549435082228750797e-38f<td>3.40282346638528859812e+38f<td rowspan=2>0x1p+127f <tr><td>0x1p-149f<td>0x1p-126f<td>0x1.fffffep+127f @@ -4203,10 +4208,10 @@ Note: The channel transfer function for 8snorm maps {-128,...,127} to the floati <tr><td>8sint<td>8<td>signed integer |v| &isinv; {-128,...,127}<td>i32<td> |v|<td> max(-128, min(127, `T`)) <tr><td>16uint<td>16<td>unsigned integer |v| &isinv; {0,...,65535}<td>u32<td> |v|<td> min(65535, `T`) <tr><td>16sint<td>16<td>signed integer |v| &isinv; {-32768,...,32767}<td>i32<td> |v|<td> max(-32768, min(32767, `T`)) - <tr><td>16float<td>16<td>[[!IEEE-754|IEEE-754]] binary16 16-bit floating point value |v|, with 1 sign bit, 5 exponent bits, 10 mantissa bits<td>f32<td>|v|<td>`quantizeToF16(T)` + <tr><td>16float<td>16<td>[[!IEEE-754|IEEE-754]] [=ieee754/binary16=] 16-bit floating point value |v|<td>f32<td>|v|<td>`quantizeToF16(T)` <tr><td>32uint<td>32<td>32-bit unsigned integer value |v|<td>u32<td>|v|<td>`T` <tr><td>32sint<td>32<td>32-bit signed integer value |v|<td>i32<td>|v|<td>`T` - 
<tr><td>32float<td>32<td>[[!IEEE-754|IEEE-754]] binary32 32-bit floating point value |v|<td>f32<td>|v|<td>`T` + <tr><td>32float<td>32<td>[[!IEEE-754|IEEE-754]] [=ieee754/binary32=] 32-bit floating point value |v|<td>f32<td>|v|<td>`T` </table> The texel formats listed in the @@ -5213,7 +5218,7 @@ The type of a `const` expression [=shader-creation error|must=] [=type rules|res Note: [=type/abstract|Abstract types=] can be the inferred type of a const-expression. A const-expression |E| [=behavioral requirement|will=] be evaluated if and only if: -* |E| is [=top-level expression=], +* |E| is [=top-level expression=], * |E| is a [=subexpression=] of an expression |OuterE|, and |OuterE| [=behavioral requirement|will=] be evaluated, and evaluation of |OuterE| requires |E| to be evaluated, @@ -6006,16 +6011,33 @@ See [[#sync-builtin-functions]]. <td>|e1| `+` |e2| : |T| <td>Addition. [=Component-wise=] when |T| is a vector. + If |T| is a floating point type, the scalar [=domain=] is + the set of all pairs of [=extended reals=] (|x|,|y|) except: + * ([NINF],[PINF]) + * ([PINF],[NINF]) + <tr algorithm="subtraction"> <td>|e1| : |T|<br>|e2| : |T|<br>[ALLNUMERICDECL] <td>|e1| `-` |e2| : |T| <td>Subtraction [=Component-wise=] when |T| is a vector. + If |T| is a floating point type, the scalar [=domain=] is + the set of all pairs of [=extended reals=] (|x|,|y|) except: + * ([NINF],[NINF]) + * ([PINF],[PINF]) + <tr algorithm="multiplication"> <td>|e1| : |T|<br>|e2| : |T|<br>[ALLNUMERICDECL] <td>|e1| `*` |e2| : |T| <td>Multiplication. [=Component-wise=] when |T| is a vector. + If |T| is a floating point type, the scalar [=domain=] is + the set of all pairs of [=extended reals=] (|x|,|y|) except: + * (0,[NINF]) + * (0,[PINF]) + * ([NINF], 0) + * ([PINF], 0) + <tr algorithm="division"> <td>|e1| : |T|<br>|e2| : |T|<br>[ALLNUMERICDECL] <td>|e1| `/` |e2| : |T| @@ -6053,6 +6075,14 @@ See [[#sync-builtin-functions]]. 
|e1|&nbsp;=&nbsp;|q|&nbsp;&times;&nbsp;|e2|&nbsp;+&nbsp;|r|, where 0 &le; |r| &lt; |e2|. + If |T| is a floating point type, the scalar [=domain=] is + the set of all pairs of [=extended reals=] (|x|,|y|) except: + * (0,0) + * ([NINF],[NINF]) + * ([NINF],[PINF]) + * ([PINF],[NINF]) + * ([PINF],[PINF]) + <tr algorithm="Remainder"> <td>|e1| : |T|<br>|e2| : |T|<br>[ALLNUMERICDECL] <td>|e1| `%` |e2| : |T| @@ -6092,7 +6122,25 @@ See [[#sync-builtin-functions]]. |e1|&nbsp;=&nbsp;|q|&nbsp;&times;&nbsp;|e2|&nbsp;+&nbsp;|r|, where |q| is an integer and 0 &le; |r| &lt; |e2|. - If |T| is a floating point type, the result is equal to:<br> |e1| - |e2| * trunc(|e1| / |e2|) + If |T| is a floating point type, the result is equal to:<br> |e1| - |e2| * trunc(|e1| / |e2|). + + If |T| is a floating point type, the scalar [=domain=] is + the set of all pairs of [=extended reals=] (|x|,|y|) except: + + * Cases outside the domain of |x| / |y|: + * (0,0) + * ([NINF],[NINF]) + * ([NINF],[PINF]) + * ([PINF],[NINF]) + * ([PINF],[PINF]) + * Additional cases outside the domain of |y| * trunc(|x| / |y|): + * |y| is infinite, and |x| is finite, implying trunc(|x| / |y|) is 0. + * |y| is 0, and |x| is infinite, implying trunc(|x| / |y|) is infinite. + <!--- what about subtraction of infinities of the same sign: + |x| = [PINF], |y| * trunc(|x| / |y|) = [PINF] which happens when + |x| is finite and,... but already a contradition + |x| = [NINF], |y| * trunc(|x| / |y|) = [NINF] which happens when + |x| is finite and,... but already a contradition --> </table> @@ -10406,7 +10454,7 @@ offset |k| of a host-shared buffer, then: Note: WGSL does not have a [=type/concrete=] [=64-bit integer=] type. -A value |V| of type [=f32=] is represented in [[!IEEE-754|IEEE-754]] binary32 format. +A value |V| of type [=f32=] is represented in [[!IEEE-754|IEEE-754]] [=ieee754/binary32=] format. It has one sign bit, 8 exponent bits, and 23 fraction bits. 
When |V| is placed at byte offset |k| of host-shared buffer, then: * Byte |k| contains bits 0 through 7 of the fraction. @@ -10416,7 +10464,7 @@ When |V| is placed at byte offset |k| of host-shared buffer, then: * Bits 0 through 6 of byte |k|+3 contain bits 1 through 7 of the exponent. * Bit 7 of byte |k|+3 contains the sign bit. -A value |V| of type [=f16=] is represented in [[!IEEE-754|IEEE-754]] binary16 format. +A value |V| of type [=f16=] is represented in [[!IEEE-754|IEEE-754]] [=ieee754/binary16=] format. It has one sign bit, 5 exponent bits, and 10 fraction bits. When |V| is placed at byte offset |k| of host-shared buffer, then: * Byte |k| contains bits 0 through 7 of the fraction. @@ -12118,31 +12166,135 @@ If one of these functions is called in non-uniform control flow, then the result ## Floating Point Evaluation ## {#floating-point-evaluation} -WGSL follows the [[!IEEE-754|IEEE-754]] standard for floating point computation with -the following differences: -* No rounding mode is specified. An implementation may round a value up or down. -* No floating point exceptions are generated. - * A floating point operation in WGSL [=behavioral requirement|will=] produce an intermediate result +WGSL floating point features are based on the [[!IEEE-754|IEEE-754]] standard for floating point, +but with reduced functionality reflecting the compromises made by GPUs, and with some additional guardrails for portability. + +### Overview of IEEE-754 ### {#overview-of-ieee-754} + +WGSL floating point types are based on [[!IEEE-754|IEEE-754]] binary floating point types. + +An [[!IEEE-754|IEEE-754]] binary floating point type approximates the [=extended real=] number line as follows: +* The type has a finite set of values, including distinct categories for: + * Positive and negative rational numbers. + * Each of these is finite, and is either [=ieee754/normal=] or [=ieee754/subnormal=] as defined below. + * [=Positive infinity=] and [=negative infinity=]. + * NaN values. 
A <dfn noexport>NaN</dfn>, short for "Not a Number", represents the result of an [=ieee754/invalid operation=]. + IEEE-754 requires both signaling and quiet NaNs, for distinguishing cases related to error reporting. + WGSL does not require such error reporting, and may yield [=indeterminate values=] instead of NaNs. + See [[#differences-from-ieee754]]. +* The type supports operations including: + * Basic arithmetic such as: addition (`+`), subtraction (`-`), multiplication (`*`), and division (`/`). + * Conversion to and from other numeric types. + * Built-in functions such as: [[#max-float-builtin|max]], [[#sqrt-builtin|sqrt]], [[#cos-builtin|cos]]. + * <div class=note>Note: Infinities are ordinary participants in most operations. For example, adding [PINF] to 5 produces [PINF].</div> +* The type has a bit representation characterized by: + * A fixed bit width, where each value's bit representation has three contiguous bit fields, ordered from most significant bit to least: + * A 1-bit <dfn dfn-for="ieee754" noexport>sign field</dfn>. + * A fixed-width <dfn dfn-for="ieee754" noexport>exponent field</dfn>. + * A fixed-width <dfn dfn-for="ieee754" noexport>trailing significand field</dfn>. + * An integer-valued <dfn dfn-for="ieee754" noexport>exponent bias</dfn> related to interpretation of the [=ieee754/exponent field=].
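As an illustration only (not part of the patch or the specification), the following Python sketch shows one way the three bit fields listed above could be extracted and combined into a value. The names `FORMATS` and `interpret_bits` are hypothetical. The field widths and biases are the standard IEEE-754 parameters for binary16, binary32, and binary64, and the decoding assumes the usual IEEE-754 interpretation: an all-ones exponent field encodes infinity or NaN, an all-zeros exponent field encodes zero or a subnormal value, and everything else is a normal value.

```python
import math

# Hypothetical table: (exponent field width, trailing significand field width, exponent bias).
FORMATS = {
    "binary16": (5, 10, 15),
    "binary32": (8, 23, 127),
    "binary64": (11, 52, 1023),
}


def interpret_bits(bits: int, fmt: str = "binary32") -> float:
    """Map a bit pattern to the value it represents (returned as a Python float)."""
    ew, tsw, bias = FORMATS[fmt]
    # Partition the bits, most significant field first: sign, exponent, trailing significand.
    sign = (bits >> (ew + tsw)) & 1
    e = (bits >> tsw) & ((1 << ew) - 1)
    t = bits & ((1 << tsw) - 1)
    s = -1.0 if sign else 1.0
    if e == (1 << ew) - 1:
        # Exponent field is all 1 bits: infinity when T is 0, otherwise NaN.
        return s * math.inf if t == 0 else math.nan
    if e == 0:
        # Exponent field is all 0 bits: a signed zero (T == 0) or a subnormal value.
        return s * math.ldexp(t, 1 - bias - tsw)
    # Normal value: (1 + T * 2^-tsw) * 2^(E - bias), written with an integer significand.
    return s * math.ldexp((1 << tsw) + t, e - bias - tsw)
```

For instance, under these assumptions `interpret_bits(0x3F800000)` yields 1.0, and `interpret_bits(0x00000001)` yields the smallest positive binary32 subnormal, 2<sup>-149</sup>.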
+ +The IEEE-754 floating point types of interest are: + +* <dfn dfn-for="ieee754" noexport>binary16</dfn>: + * [=ieee754/exponent field=] width 5 + * [=ieee754/trailing significand field=] width 10 + * [=ieee754/exponent bias=] 15 +* <dfn dfn-for="ieee754" noexport>binary32</dfn>: + * [=ieee754/exponent field=] width 8 + * [=ieee754/trailing significand field=] width 23 + * [=ieee754/exponent bias=] 127 +* <dfn dfn-for="ieee754" noexport>binary64</dfn>: + * [=ieee754/exponent field=] width 11 + * [=ieee754/trailing significand field=] width 52 + * [=ieee754/exponent bias=] 1023 + +The following algorithm maps a bit representation of a floating point value to its corresponding [=extended real=] value, or NaN: +<blockquote algorithm="floating point interpretation of bits"> +**Algorithm:** <dfn noexport>Floating point interpretation of bits</dfn> + +**Input**: <var>Bits</var>, a bit representation for a value in a binary floating point type. + +**Output:** |F|, the floating point value represented by |Bits|. + +**Procedure:** +* Let <var>bias</var> be the [=ieee754/exponent bias=] for the type. +* Let <var>tsw</var> be the bit width of the [=ieee754/trailing significand field=] for the type. +* Partition <var>Bits</var> into the [=ieee754/sign field=], [=ieee754/exponent field=], and the [=ieee754/trailing significand field=]. +* Let <var>Sign</var>, |E|, and |T| be the interpretations of those respective fields as unsigned integers. +* If the [=ieee754/exponent field=] is all 1 bits, then: + * The result |F| &equals; [PINF] when <var>Sign</var> &equals; 0 and |T| &equals; 0. + * The result |F| &equals; [NINF] when <var>Sign</var> &equals; 1 and |T| &equals; 0. + * The result |F| is a [=NaN=] when |T| &ne; 0. +* Otherwise, if the [=ieee754/exponent field=] is all 0 bits, then: + * The result |F| &equals; (&minus; 1)<sup><var>Sign</var></sup> &times; 2<sup>&minus;<var>bias</var></sup> &times; |T| &times; 2<sup>&minus;<var>tsw</var>&plus;1</sup>. 
+ * If |T| = 0, then the value is a zero. + * Each floating point type has both a positive zero and a negative zero. + A <dfn noexport>negative zero</dfn> is a zero value with `1` for its [=ieee754/sign field|sign=] bit. + Negative zero and positive zero compare as equal. + IEEE-754 uses negative zero to indicate certain edge cases unimportant to WGSL. + * If |T| &ne; 0, then the value |F| is <dfn dfn-for="ieee754" noexport>subnormal</dfn>. + (<dfn noexport>Denormalized</dfn> is a synonym for subnormal.) +* Otherwise, the [=ieee754/exponent field=] is neither all 1 bits, nor all 0 bits: + * The result |F| &equals; (&minus; 1)<sup><var>Sign</var></sup> &times; 2<sup>(|E|&minus;<var>bias</var>)</sup> &times; ( 1 + |T| &times; 2<sup>&minus;<var>tsw</var></sup>). + * The value |F| is <dfn dfn-for="ieee754" noexport>normal</dfn>. + +</blockquote> + + +The <dfn>domain</dfn> of a floating point operation is the set of [=extended real=] number inputs for which the operation is well defined. + +* For example, the domain of the mathematical function &radic; is the interval [0,[PINF]]: &radic; is not well defined for inputs less than zero. +* When applied to an input *inside* its [=domain=], an operation is defined in terms of an infinitely precise [=extended real=] <dfn>intermediate result</dfn>, + which is then converted to a floating point result, via [=rounding=]. +* When an operation is evaluated *outside* its [=domain=], + the default exception handling rules of IEEE-754 require an implementation to generate an [=ieee754/exception=] and yield a [=NaN=] value. + In contrast, WGSL does not mandate floating point exceptions, and may instead yield an [=indeterminate value=]. See [[#differences-from-ieee754]]. + +<dfn noexport>Rounding</dfn> maps an [=extended real=] value |x| to a value <var>x'</var> in the floating point type. +When |x| is in the floating point type, then rounding maps |x| to itself: |x| &equals; <var>x'</var>. 
+Rounding may [=ieee754/overflow=] when |x| is outside the finite range of the type. +Otherwise <var>x'</var> is either the lowest floating point value above |x|, or +the highest floating point value below |x|; +a <dfn dfn-for="ieee754">rounding mode</dfn> determines which one is chosen. + +Generally, an operation with a [=NaN=] input will yield a [=NaN=] output. +Exceptions include: +* A NaN is never equal to, less than, or greater than any other floating point value. Such comparisons yield false. +* For `min(x,y)` and `max(x,y)` operations, if one of the inputs is NaN, the result is the other input. + +IEEE-754 defines five kinds of <dfn dfn-for="ieee754">exceptions</dfn>: + +* <dfn dfn-for="ieee754">Invalid operation</dfn>. + This occurs when an operation is evaluated on [=extended real=] inputs outside its [=domain=]. + Such operations yield a NaN. + Examples of invalid operations are 0 &times; [INF], and `sqrt`(&minus;1). +* <dfn dfn-for="ieee754">Division by zero</dfn>. + This occurs when an operation on finite operands is defined as having an exact infinite result. + Examples are 1 &divide; 0, and log(0). +* <dfn dfn-for="ieee754">Overflow</dfn>. This occurs when an [=intermediate result=] exceeds the finite range of the type. See [[#floating-point-overflow]]. +* <dfn dfn-for="ieee754" noexport>Underflow</dfn>. This occurs when the [=intermediate result=] or the rounded result is [=ieee754/subnormal=]. +* <dfn dfn-for="ieee754" noexport>Inexact</dfn>. This occurs when the rounded result is different from the [=intermediate result=], + or when overflow occurs. + +### Differences from IEEE-754 ### {#differences-from-ieee754} + +WGSL follows the [[!IEEE-754|IEEE-754]] standard, but with the following differences: +* No [=ieee754/rounding mode=] is specified. An implementation may round an [=intermediate result=] up or down. +* No floating point [=ieee754/exceptions=] are generated.
+ * A floating point operation in WGSL [=behavioral requirement|will=] produce an [=intermediate result=] according to IEEE-754 rules, but exceptions mandated by IEEE-754 will map to different behaviors depending on whether the expression is a [=const-expression=], an [=override-expression=], or a [=runtime expression=]. - * IEEE-754 defines five kinds of exceptions: - * Invalid operation. These operations yield a NaN. An example of an invalid operation is 0 &times; [INF]. - * Division by zero. This occurs when an operation on finite operands is defined as having an exact infinite result. - Examples are 1 &divide; 0, and log(0). - * Overflow. See [[#floating-point-overflow]]. - * Underflow. This occurs when the rounded or unrounded result is subnormal. - * Inexact. This occurs when the rounded result is different from the intermediate result, - or when overflow occurs. * Consider an operation on finite operands. The operation produces overflow, infinity, or a NaN if and only if IEEE-754 would require the - operation to signal an invalid operation, division-by-zero, or overflow exception. + operation to signal an [=ieee754/overflow=], [=ieee754/invalid operation=], or [=ieee754/division by zero=] exception. * Signaling NaNs may not be generated. Any signaling NaN may be converted to a quiet NaN. -* Overflow, infinities, and NaNs generated before runtime are errors. +* Overflow, infinities, and NaNs generated before [=shader execution start|runtime=] [=behavioral requirement|will=] generate errors. * [=Const-expressions=] and [=override-expressions=] over finite values [=behavioral requirement|will=] generate overflow, infinities, and NaNs - as intermediate values, following IEEE-754 rules. + as [=intermediate result=] values, following IEEE-754 rules. * Note: This rule requires implementations to reliably detect overflow, infinities, and NaNs to within accuracy limits for these kinds of expressions, so that errors can be generated consistently. 
* A [=shader-creation error=] results if any [=const-expression=] of @@ -12150,22 +12302,27 @@ the following differences: * A [=pipeline-creation error=] results if any [=override-expression=] of floating-point type overflows or evaluates to NaN or infinity. * Implementations may assume that overflow, infinities, and NaNs are not present at runtime. - * In such an implementation, if the intermediate result of evaluating a [=runtime expression=] overflows, + * In such an implementation, if the [=intermediate result=] of evaluating a [=runtime expression=] overflows, or yields infinity or a NaN, the final result [=behavioral requirement|will=] be an [=indeterminate value=] of the target type. * Note: This means some functions (e.g. `min` and `max`) may not return the expected result due to optimizations about the presence of NaNs and infinities. -* Implementations may ignore the sign of a zero. +* Implementations may ignore the [=ieee754/sign field=] of a zero. That is, a zero with a positive sign may behave like a zero with a negative sign, and vice versa. -* To <dfn noexport title="flushed to zero">flush to zero</dfn> is to replace a denormalized value for a floating point type +* To <dfn noexport title="flushed to zero">flush to zero</dfn> is to replace a [=ieee754/subnormal=] value for a floating point type with a zero value of that type. * Any inputs or outputs of operations listed in [[#floating-point-accuracy]] may be flushed to zero. - * Additionally, intermediate values of operations listed in + * Additionally, [=intermediate result=] values of operations listed in [[#bit-reinterp-builtin-functions]], [[#pack-builtin-functions]], or [[#unpack-builtin-functions]] may be flushed to zero. - * Other operations are required to preserve denormalized numbers. + * Other operations are required to preserve [=ieee754/subnormal=] numbers. * The accuracy of operations is given in [[#floating-point-accuracy]].
+* Some built-in functions in WGSL have different semantics from the corresponding IEEE-754 operation. + Such cases are listed as needed at the definition of the WGSL built-in function. + + For example, the WGSL [[#fma-builtin]] function may expand to an ordinary multiply (including a rounding step) and an add (and another rounding step), + while the IEEE-754 `fusedMultiplyAdd` operation requires that only a final rounding step occurs. ### Floating Point Overflow ### {#floating-point-overflow} @@ -12178,8 +12335,8 @@ For a floating point type *T*, define *MAX(T)* as the largest positive finite va and 2<sup>*EMAX(T)*</sup> as the largest power of 2 representable by *T*. In particular, EMAX([=f32=]) = 127, and EMAX([=f16=]) = 15. -Let *X* be an infinite-precision intermediate result from a floating point computation. -The final value of the expression is determined in two stages, via intermediate values *X'* and *X''* as follows: +Let *X* be an infinitely precise [=intermediate result=] from a floating point computation. +The final value of the expression is determined in two stages, via [=intermediate result=] values *X'* and *X''* as follows: From *X*, compute *X'* in *T* by rounding: * If *X* is in the finite range of *T* then *X'* is the result of rounding *X* up or down. @@ -12211,8 +12368,8 @@ The <dfn>correctly rounded</dfn> result of the operation for floating point type </div> -That is, the result may be rounded up or down: -WGSL does not specify a rounding mode. +That is, the result may be [=rounding|rounded=] up or down: +WGSL does not specify a [=ieee754/rounding mode=]. Note: Floating point types include positive and negative infinity, so the correctly rounded result may be finite or infinite. @@ -12237,8 +12394,8 @@ possibilities: The given expression is only one valid implementation of the function. A WebGPU implementation may implement the operation differently, with better accuracy or with greater tolerance for extreme inputs.
- Additionally, an implementation may treat intermediate results as subject to the rules for - floating-point evaluation (e.g. they may be rounded and/or [=flush to zero|flushed-to-zero=]). + Additionally, an implementation may treat [=intermediate results=] as subject to the rules for + floating-point evaluation (e.g. they may be [=rounding|rounded=] and/or [=flush to zero|flushed-to-zero=]). When the accuracy for an operation is specified over an input range, the accuracy is undefined for input values outside that range. @@ -12303,13 +12460,13 @@ the rules in [[#floating-point-overflow]] apply. The infinitely precise result is computed as either `min(max(x,low),high)`, or with a median-of-3-values formulation. These may differ when `low > high`. - If `x` and either `low` or `high` are denormalized, the result may be any of the denormalized values. - This follows from the possible results from the `min` and `max` functions on denormalized inputs. + If `x` and either `low` or `high` are [=ieee754/subnormal=], the result may be any of the [=ieee754/subnormal=] values. + This follows from the possible results from the `min` and `max` functions on [=ieee754/subnormal=] inputs. <tr><td>`cos(x)` <td>Absolute error at most 2<sup>-11</sup> when `x` is in the interval [-&pi;, &pi;] <td>Absolute error at most 2<sup>-7</sup> when `x` is in the interval [-&pi;, &pi;] <tr><td>`cosh(x)`<td colspan=2 style="text-align:left;">Inherited from `(exp(x) + exp(-x)) * 0.5` - <tr><td>`cross(x, y)`<td colspan=2 style="text-align:left;">Inherited from `(x[i] * y[j] - x[j] * y[i])` + <tr><td>`cross(x, y)`<td colspan=2 style="text-align:left;">Inherited from `(x[i] * y[j] - x[j] * y[i])` where `i` &ne; `j` <tr><td>`degrees(x)`<td colspan=2 style="text-align:left;">Inherited from `x * 57.295779513082322865` <tr><td>`determinant(m:mat2x2<T>)`<br> `determinant(m:mat3x3<T>)`<br> @@ -12354,18 +12511,18 @@ the rules in [[#floating-point-overflow]] apply.
<td>Absolute error at most 2<sup>-7</sup> when `x` is in the interval [0.5, 2.0].<br> 3 ULP when `x` is outside the interval [0.5, 2.0].<br> <tr><td>`max(x, y)`<td colspan=2 style="text-align:left;">Correctly rounded - <p>If both `x` and `y` are denormalized, the result may be either input. + <p>If both `x` and `y` are [=ieee754/subnormal=], the result may be either input. <tr><td>`min(x, y)`<td colspan=2 style="text-align:left;">Correctly rounded. - <p>If both `x` and `y` are denormalized, the result may be either input. + <p>If both `x` and `y` are [=ieee754/subnormal=], the result may be either input. <tr><td>`mix(x, y, z)`<td colspan=2 style="text-align:left;">Inherited from `x * (1.0 - z) + y * z` <tr><td>`modf(x)`<td colspan=2 style="text-align:left;">Correctly rounded <tr><td>`normalize(x)`<td colspan=2 style="text-align:left;">Inherited from `x / length(x)` - <tr><td>`pack4x8snorm(x)`<td colspan=2 style="text-align:left;">Correctly rounded intermediate value. Correct result. - <tr><td>`pack4x8unorm(x)`<td colspan=2 style="text-align:left;">Correctly rounded intermediate value. Correct result. - <tr><td>`pack2x16snorm(x)`<td colspan=2 style="text-align:left;">Correctly rounded intermediate value. Correct result. - <tr><td>`pack2x16unorm(x)`<td colspan=2 style="text-align:left;">Correctly rounded intermediate value. Correct result. - <tr><td>`pack2x16float(x)`<td colspan=2 style="text-align:left;">Correctly rounded intermediate value. Correct result. + <tr><td>`pack4x8snorm(x)`<td colspan=2 style="text-align:left;">Correctly rounded [=intermediate result=] value. Correct result. + <tr><td>`pack4x8unorm(x)`<td colspan=2 style="text-align:left;">Correctly rounded [=intermediate result=] value. Correct result. + <tr><td>`pack2x16snorm(x)`<td colspan=2 style="text-align:left;">Correctly rounded [=intermediate result=] value. Correct result. + <tr><td>`pack2x16unorm(x)`<td colspan=2 style="text-align:left;">Correctly rounded [=intermediate result=] value. 
Correct result. + <tr><td>`pack2x16float(x)`<td colspan=2 style="text-align:left;">Correctly rounded [=intermediate result=] value. Correct result. <tr><td>`pow(x, y)`<td colspan=2 style="text-align:left;">Inherited from `exp2(y * log2(x))` <tr><td>`quantizeToF16(x)`<td colspan=2 style="text-align:left;">Correctly rounded @@ -12421,9 +12578,9 @@ The accuracy of an [=AbstractFloat=] operation is as follows: depends critically on the underlying floating point type. The [=ULP=] for an [=AbstractFloat=] value assumes -AbstractFloat is identical to the [[!IEEE-754|IEEE-754]] binary64 type. +AbstractFloat is identical to the [[!IEEE-754|IEEE-754]] [=ieee754/binary64=] type. -One [=ULP=] for an f32 value is 2<sup>29</sup> times larger than 1 ULP for an IEEE 754 binary64 value, +One [=ULP=] for an f32 value is 2<sup>29</sup> times larger than 1 ULP for an IEEE-754 binary64 value, since the significand in the binary64 format is 29 bits longer than the significand in the f32 type. For example, suppose the true result value of an operation is |x|, but it is computed as <var>x'</var>. @@ -12441,7 +12598,7 @@ expression such that the answer is the same if computed exactly. For example: However, the result may not be the same when computed in floating point. The reassociated result may be inaccurate due to approximation, or may trigger -an overflow or NaN when computing intermediate results. +an overflow or NaN when computing [=intermediate results=]. An implementation may reassociate operations. @@ -12457,7 +12614,7 @@ In this section, a floating point type may be any of: * A hypothetical type corresponding to a binary format defined by the [[!IEEE-754|IEEE-754]] floating point standard. -Note: Recall that the [=f32=] WGSL type corresponds to the IEEE-754 binary32 format, and the [=f16=] WGSL type corresponds to the IEEE-754 binary16 format. 
+Note: Recall that the [=f32=] WGSL type corresponds to the IEEE-754 [=ieee754/binary32=] format, and the [=f16=] WGSL type corresponds to the IEEE-754 [=ieee754/binary16=] format. The <dfn noexport>scalar floating point to integral conversion</dfn> algorithm is: <blockquote algorithm="convert a float value to an integral value"> @@ -12468,6 +12625,8 @@ To convert a floating point scalar value |X| to an [=integer scalar=] type |T|: </blockquote> Note: In other words, floating point to integer conversion rounds toward zero, then saturates in the target type. +This saturation requirement is one place where WGSL mandates a meaningful result, +but where [[!IEEE-754|IEEE-754]] mandates an invalid operation exception and a NaN result. <div class=note><span class=marker>Note:</span> For example: * 3.9f converted to [=u32=] is 3u @@ -12484,6 +12643,7 @@ The <dfn noexport>numeric scalar conversion to floating point</dfn> algorithm is When converting a [=numeric scalar=] value to a floating point type: * If the original value is exactly representable in the destination type, then the result is that value. * Additionally, if the original value is zero and of [=integer scalar=] type, then the resulting value has a zero sign bit. +* If the original value is a NaN for the source type, then the result is a NaN in the destination type. * Otherwise, the original value is not exactly representable. * If the original value is different from but lies between two adjacent finite values representable in the destination type, then the result is one of those two values. @@ -12494,18 +12654,17 @@ When converting a [=numeric scalar=] value to a floating point type: * A [=pipeline-creation error=] results if the original expression is an [=override-expression=]. * Otherwise the conversion proceeds as follows: 1. Set |X| to the original value. - 2. 
If the source type is a floating point type with more mantissa bits than the destination type, - the extra mantissa bits of the source value *may* be discarded (i.e. treated as if they are 0). + 2. If the source type is a floating point type with more significand bits than the destination type, + the extra significand bits of the source value *may* be discarded (i.e. treated as if they are 0). Update |X| accordingly. 3. If |X| is the most-positive or most-negative normal value of the destination type, then the result is |X|. 4. Otherwise, the result is the infinity value of the destination type, with the same sign as |X|. - * Otherwise, if the original value is a NaN for the source type, then the result is a NaN in the destination type. </blockquote> NOTE: An integer value may lie between two adjacent representable floating point values. In particular, the [=f32=] type uses 23 explicit fractional bits. -Additionally, when the floating point value is in the normal range (the exponent is neither extreme value), then the mantissa is +Additionally, when the floating point value is in the normal range (the exponent is neither extreme value), then the significand is the set of fractional bits together with an extra 1-bit at the most significant position at bit position 23. Then, for example, integers 2<sup>28</sup> and 1+2<sup>28</sup> both map to the same floating point value: the difference in the least significant 1 bit is not representable by the floating point format. @@ -12515,7 +12674,45 @@ Note: The original value is always within range of the destination type when the original type is one of [=i32=] or [=u32=] and the destination type is [=f32=]. Note: The original value is always within range of the destination type when -the source type is a floating point type with fewer exponent and mantissa bits than the target floating point type. +the source type is a floating point type with fewer exponent and significand bits than the target floating point type. 
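As an illustration only (not part of the patch or the specification), the following Python sketch mirrors the scalar floating point to integral conversion summarized earlier in this section: round toward zero, then saturate in the target type. The names `RANGES` and `float_to_int` are hypothetical. The handling of NaN inputs is not covered by the text shown here, so the sketch simply rejects them; that choice is an assumption of the sketch, not a statement about WGSL.

```python
import math

# Value ranges of the WGSL concrete integer types (hypothetical helper table).
RANGES = {
    "i32": (-2**31, 2**31 - 1),
    "u32": (0, 2**32 - 1),
}


def float_to_int(x: float, target: str) -> int:
    """Round x toward zero, then saturate to the range of the target integer type."""
    lo, hi = RANGES[target]
    if math.isnan(x):
        # Assumption: NaN inputs are out of scope for this sketch.
        raise ValueError("NaN input is not handled by this sketch")
    if math.isinf(x):
        # Infinities saturate to the nearest end of the target range.
        return hi if x > 0 else lo
    truncated = math.trunc(x)  # rounds toward zero
    return max(lo, min(hi, truncated))
```

Under these assumptions, `float_to_int(3.9, "u32")` gives 3, matching the `3.9f` to `3u` example in the note, and out-of-range inputs clamp to the nearest representable integer instead of producing a NaN.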
+ +### Domains of Floating Point Expressions and Built-in Functions ### {#domains-of-floating-point-expressions-and-builtins} + +Previous sections describe the expected behavior when a floating point expression is evaluated +outside its [=domain=]. + +Sections [[#arithmetic-expr]] and [[#numeric-builtin-functions]] define the [=domains=] for floating point expressions and built-in functions, respectively. +If no restriction is listed for a given operation, then the domain is total: the domain includes all finite and infinite inputs. +Otherwise an explicit domain is listed. + +In many cases where a WGSL operation corresponds to an operation defined by [[!IEEE-754|IEEE-754]], they will have the same domain. +For example, both the WGSL and IEEE-754 `acos` operations have a domain of [&minus;1,1]. + +For a [=component-wise=] WGSL operation with an explicitly listed domain, only the scalar case is described. The vector case is inferred from the component-wise semantics. + +Some WGSL operations may be implemented in terms of other WGSL expressions. +Section [[#floating-point-accuracy]] lists these as having [=inherited from=] accuracy. +When listing the domain for one of these operations, either: + +* The domain is explicitly described, or +* The domain is stated as <dfn noexport>implied from linear terms</dfn>, meaning the domain + is derived by: + * Assuming the original operation was replaced by the "inherited from" expression, which is a combination of floating point addition, subtraction, and multiplication operations. + * Applying and combining the domain restrictions for those remaining operations over the given parameters. + +For example: The `dot(a,b)` function over 2-element vectors |a| and |b| has its accuracy [=inherited from=] the expression + |a|[0] * |b|[0] + |a|[1] * |b|[1]. +This uses two floating point multiplications, and one floating point addition. 
+* Floating point multiplication is well defined over the extended reals except when one operand is zero and the other is infinite. +* Floating point addition is well defined except when the two operands are infinities of opposite sign. +* Therefore, the domain is all pairs of extended real two-element vectors |a| and |b| except: + * Implied from the multiplications: + * |a|[i] is zero and |b|[i] is infinite. + * |a|[i] is infinite and |b|[i] is zero. + * Implied from the addition: + * |a|[0] &times; |b|[0] is [PINF] and |a|[1] &times; |b|[1] is [NINF] + * |a|[0] &times; |b|[0] is [NINF] and |a|[1] &times; |b|[1] is [PINF] + # Keyword and Token Summary # {#grammar} @@ -12851,7 +13048,7 @@ specify the component type; the component type is inferred from the constructor If `T` is [=i32=], this is an identity operation.<br> If `T` is [=u32=], this is a reinterpretation of bits (i.e. 
the result is the unique value in [=u32=] that has the same bit pattern as `e`).<br> - If `T` is a [[#floating-point-types|floating point type]], `e` is [=scalar floating point to integral conversion|converted=] to [=u32=], rounding towards zero.<br> + If `T` is a [[#floating-point-types|floating point type]], `e` is [=scalar floating point to integral conversion|converted=] to [=u32=], [=rounding=] towards zero.<br> If `T` is [=bool=], the result is `1u` if `e` is `true` and `0u` otherwise.<br> If `T` is [=AbstractInt=], this is an identity operation if the `e` can be represented in [=u32=], otherwise it produces a [=shader-creation error=]. <tr><td> @@ -13837,10 +14034,8 @@ fn num_point_lights() -> u32 { [=Component-wise=] when `T` is a vector. <tr> - <td> - <td> - -Note: The result is not mathematically meaningful when `abs(e)` &gt; 1. + <td>Scalar [=domain=] + <td>Interval [&minus;1, 1] </table> ### `acosh` ### {#acosh-builtin} @@ -13861,11 +14056,8 @@ Note: The result is not mathematically meaningful when `abs(e)` &gt; 1. [=Component-wise=] when `T` is a vector. <tr> - <td> - <td> - -Note: The result is not mathematically meaningful when `x` &lt; 1. - + <td>Scalar [=domain=] + <td>Interval [1, [PINF]] </table> ### `asin` ### {#asin-builtin} @@ -13886,10 +14078,8 @@ Note: The result is not mathematically meaningful when `x` &lt; 1. [=Component-wise=] when `T` is a vector. <tr> - <td> - <td> - -Note: The result is not mathematically meaningful when `abs(e)` &gt; 1. + <td>Scalar [=domain=] + <td>Interval [&minus;1, 1] </table> ### `asinh` ### {#asinh-builtin} @@ -13909,6 +14099,11 @@ Note: The result is not mathematically meaningful when `abs(e)` &gt; 1. That is, approximates `a` such that `sinh`(`y`) = `a`. [=Component-wise=] when `T` is a vector. + + <!-- No restriction on scalar domain. + log(x + sqrt( x * x + 1)) requires + x + sqrt( x * x + 1) >= 0 + But the sqrt term is always at least as large as abs(x). 
QED --> </table> ### `atan` ### {#atan-builtin} @@ -13948,10 +14143,8 @@ Note: The result is not mathematically meaningful when `abs(e)` &gt; 1. [=Component-wise=] when `T` is a vector. <tr> - <td> - <td> - -Note: The result is not mathematically meaningful when `abs(t)` &ge; 1. + <td>Scalar [=domain=] + <td>Interval [&minus;1, 1] </table> @@ -13980,7 +14173,7 @@ Note: The result is not mathematically meaningful when `abs(t)` &ge; 1. <div class=note> <span class=marker>Note:</span> The error in the result is unbounded: - * When `abs(x)` is very small, e.g. subnormal for its type, + * When `abs(x)` is very small, e.g. [=ieee754/subnormal=] for its type, * At the origin (`x`,`y`) = (0,0), or * When `y` is subnormal or infinite. @@ -14048,6 +14241,9 @@ Note: The result is not mathematically meaningful when `abs(t)` &ge; 1. <td>Description <td>Returns the cosine of `e`, where `e` is in radians. [=Component-wise=] when `T` is a vector. + <tr> + <td>Scalar [=domain=] + <td>Interval ([NINF], [PINF]) </table> ### `cosh` ### {#cosh-builtin} @@ -14068,6 +14264,10 @@ Note: The result is not mathematically meaningful when `abs(t)` &ge; 1. but not necessarily computed that way. [=Component-wise=] when `T` is a vector + + <!-- No restriction on domain from the inehrited formula: + (exp(x) + (exp(-x))) * 0.5 + No danger from subtracting infinites of the same sign. --> </table> ### `countLeadingZeros` ### {#countLeadingZeros-builtin} @@ -14132,8 +14332,8 @@ Note: The result is not mathematically meaningful when `abs(t)` &ge; 1. <td style="width:10%">Overload <td class="nowrap"> <xmp highlight=wgsl> - @const @must_use fn cross(e1: vec3<T>, - e2: vec3<T>) -> vec3<T> + @const @must_use fn cross(a: vec3<T>, + b: vec3<T>) -> vec3<T> Parameterization @@ -14141,6 +14341,13 @@ Note: The result is not mathematically meaningful when `abs(t)` ≥ 1. Description Returns the cross product of `e1` and `e2`. 
+ + Domain + [=Implied from linear terms=] given by a possible implementation: + + * |a|[1] × |b|[2] − |a|[2] × |b|[1] + * |a|[2] × |b|[0] − |a|[0] × |b|[2] + * |a|[0] × |b|[1] − |a|[1] × |b|[0] ### `degrees` ### {#degrees-builtin} @@ -14174,6 +14381,9 @@ Note: The result is not mathematically meaningful when `abs(t)` ≥ 1. Description Returns the determinant of `e`. + + Domain + [=Implied from linear terms=] in a standard mathematical definition of the determinant. ### `distance` ### {#distance-builtin} @@ -14191,6 +14401,10 @@ Note: The result is not mathematically meaningful when `abs(t)` ≥ 1. Description Returns the distance between `e1` and `e2` (e.g. `length(e1 - e2)`). + + The [=domain=] is all vectors (|e1|,|e2|) where the subtraction |e1|−|e2| is valid. + That is, the set of all vectors except where `e1[i]` and `e2[i]` are the same infinite value, + for some component `i`. ### `dot` ### {#dot-builtin} @@ -14208,6 +14422,9 @@ Note: The result is not mathematically meaningful when `abs(t)` ≥ 1. Description Returns the dot product of `e1` and `e2`. + + Domain + [=Implied from linear terms=] of the sum over terms |e1|[i] × |e2|[i]. ### `dot4U8Packed` ### {#dot4U8Packed-builtin} @@ -14360,6 +14577,9 @@ Note: The result is not mathematically meaningful when `abs(t)` ≥ 1. Description Returns `e1` if `dot(e2, e3)` is negative, and `-e1` otherwise. + + Domain + The domain restrictions arise from the `dot(e2,e3)` operation: they are [=implied from linear terms=] of the sum over terms |e2|[i] × |e3|[i]. ### `firstLeadingBit` (signed) ### {#firstLeadingBit-signed-builtin} @@ -14474,13 +14694,16 @@ the sign bit appears in the most significant bit position. Note: The name `fma` is short for "fused multiply add". Note: - The [[!IEEE-754|IEEE-754]] `fusedMultiplyAdd` operation computes the intermediate results - as if with unbounded range and precision, and only the final result is rounded - to the destination type. 
+ The [[!IEEE-754|IEEE-754]] `fusedMultiplyAdd` operation computes the [=intermediate results=] + as if with unbounded range and precision, and only the final result is [=rounding|rounded=] + to a value in the destination type. However, the [[#floating-point-accuracy]] rule for `fma` allows an implementation which performs an ordinary multiply to the target type followed by an ordinary addition. - In this case the intermediate values may overflow or lose accuracy, and the overall + In this case the [=intermediate result=] values may overflow or lose accuracy, and the overall operation is not "fused" at all. + + Domain + [=Implied from linear terms=] of the expression |e1| × |e2| + |e3|. ### `fract` ### {#fract-builtin} @@ -14525,7 +14748,7 @@ For example, if `e` is a very small negative number, then `fract(e)` may be 1.0. * When `e` is zero, the fraction is zero. * When `e` is non-zero and normal, `e` = `fraction * 2``exponent`, where the fraction is in the range [0.5, 1.0) or (-1.0, -0.5]. - * Otherwise, `e` is denormalized, NaN, or infinite. The result fraction and exponent are [=indeterminate values=]. + * Otherwise, `e` is [=ieee754/subnormal=], NaN, or infinite. The result fraction and exponent are [=indeterminate values=]. Returns the `__frexp_result_f32` built-in structure, defined as follows: ```wgsl @@ -14573,7 +14796,7 @@ but a value may infer the type. * When `e` is zero, the fraction is zero. * When `e` is non-zero and normal, `e` = `fraction * 2``exponent`, where the fraction is in the range [0.5, 1.0) or (-1.0, -0.5]. - * Otherwise, `e` is denormalized, NaN, or infinite. The result fraction and exponent are [=indeterminate values=]. + * Otherwise, `e` is [=ieee754/subnormal=], NaN, or infinite. The result fraction and exponent are [=indeterminate values=]. Returns the `__frexp_result_f16` built-in structure, defined as if as follows: ```wgsl @@ -14610,7 +14833,7 @@ but a value may infer the type. * When `e` is zero, the fraction is zero. 
* When `e` is non-zero and normal, `e` = `fraction * 2``exponent`, where the fraction is in the range [0.5, 1.0) or (-1.0, -0.5]. - * When `e` is denormalized, the fraction and exponent are have unbounded error. + * When `e` is [=ieee754/subnormal=], the fraction and exponent are have unbounded error. The fraction may be any AbstractFloat value, and the exponent may be any AbstractInt value. Note: AbstractFloat expressions resulting in infinity or NaN cause a [=shader-creation error=]. @@ -14735,7 +14958,7 @@ but a value may infer the type. * When `ei` is zero, the fraction is zero. * When `ei` is non-zero and normal, `ei` = `fraction * 2``exponent`, where the fraction is in the range [0.5, 1.0) or (-1.0, -0.5]. - * When `ei` is denormalized, the fraction and exponent are have unbounded error. + * When `ei` is [=ieee754/subnormal=], the fraction and exponent are have unbounded error. The fraction may be any AbstractFloat value, and the exponent may be any AbstractInt value. Note: AbstractFloat expressions resulting in infinity or NaN cause a [=shader-creation error=]. @@ -14809,10 +15032,8 @@ but a value may infer the type. Returns the reciprocal of `sqrt(e)`. [=Component-wise=] when `T` is a vector. - - - -Note: The result is not mathematically meaningful if `e` ≤ 0. + Scalar [=domain=] + Interval [0, [PINF]] ### `ldexp` ### {#ldexp-builtin} @@ -14847,7 +15068,7 @@ Note: The result is not mathematically meaningful if `e` ≤ 0. Here, *bias* is the exponent bias of the floating point format: * 15 for `f16` * 127 for `f32` - * 1023 for AbstractFloat, when AbstractFloat is [[!IEEE-754|IEEE-754]] binary64 + * 1023 for AbstractFloat, when AbstractFloat is [[!IEEE-754|IEEE-754]] [=ieee754/binary64=] If `x` is zero or a finite normal value for its type, then: @@ -14881,6 +15102,7 @@ Note: The result is not mathematically meaningful if `e` ≤ 0. Note: The scalar case may be evaluated as `sqrt(e * e)`, which may unnecessarily overflow or lose accuracy. 
+ ### `log` ### {#log-builtin} @@ -14899,10 +15121,8 @@ Note: The result is not mathematically meaningful if `e` ≤ 0. Returns the natural logarithm of `e`. [=Component-wise=] when `T` is a vector. - - - -Note: The result is not mathematically meaningful if `e` < 0. + Scalar [=domain=] + Interval [0, [PINF]] ### `log2` ### {#log2-builtin} @@ -14921,10 +15141,8 @@ Note: The result is not mathematically meaningful if `e` < 0. Returns the base-2 logarithm of `e`. [=Component-wise=] when `T` is a vector. - - - -Note: The result is not mathematically meaningful if `e` < 0. + Scalar [=domain=] + Interval [0, [PINF]] ### `max` ### {#max-float-builtin} @@ -14945,7 +15163,7 @@ Note: The result is not mathematically meaningful if `e` < 0. [=Component-wise=] when `T` is a vector. If `e1` and `e2` are floating-point values, then: - * If both `e1` and `e2` are denormalized, then the result may be *either* value. + * If both `e1` and `e2` are [=ieee754/subnormal=], then the result may be *either* value. * If one operand is a NaN, the other is returned. * If both operands are NaNs, a NaN is returned. @@ -14968,7 +15186,7 @@ Note: The result is not mathematically meaningful if `e` < 0. [=Component-wise=] when `T` is a vector. If `e1` and `e2` are floating-point values, then: - * If both `e1` and `e2` are denormalized, then the result may be *either* value. + * If both `e1` and `e2` are [=ieee754/subnormal=], then the result may be *either* value. * If one operand is a NaN, the other is returned. * If both operands are NaNs, a NaN is returned. @@ -14988,8 +15206,12 @@ Note: The result is not mathematically meaningful if `e` < 0. [ALLFLOATINGDECL] Description - Returns the linear blend of `e1` and `e2` (e.g. `e1 * (1 - e3) + e2 * e3`). + Returns the linear blend of `e1` and `e2` (e.g. `e1 * (T(1) - e3) + e2 * e3`). [=Component-wise=] when `T` is a vector. + + Domain + [=Implied from linear terms=] of the expressions: |e1|[i] × (1 − |e3|[i]) + |e2|[i] × |e3|[i]. 
@@ -15010,6 +15232,9 @@ Note: The result is not mathematically meaningful if `e` < 0. +
Returns the component-wise linear blend of `e1` and `e2`, using scalar blending factor `e3` for each component.
Same as `mix(e1, e2, T2(e3))`. +
Domain + [=Implied from linear terms=] of the expressions: |e1|[i] × (1 − |e3|) + |e2|[i] × |e3|.
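As a rough illustration only (not part of the specification text), the "implied from linear terms" rule can be modeled outside WGSL. The Python sketch below uses ordinary Python floats as stand-ins for extended reals and checks the two-element `dot` expression and the per-component `mix` expression `e1 * (1 - e3) + e2 * e3`. The helper names (`mul_in_domain`, `add_in_domain`, `dot2_in_domain`, `mix_in_domain`) are hypothetical and chosen for this example; NaN inputs are assumed not to occur.

```python
import math

def mul_in_domain(x: float, y: float) -> bool:
    """Multiplication is outside its domain when one operand is zero and the other is infinite."""
    return not ((x == 0.0 and math.isinf(y)) or (math.isinf(x) and y == 0.0))

def add_in_domain(x: float, y: float) -> bool:
    """Addition is outside its domain when the operands are infinities of opposite sign."""
    return not (math.isinf(x) and math.isinf(y) and x != y)

def dot2_in_domain(a, b) -> bool:
    """Domain check for a[0]*b[0] + a[1]*b[1], combining the restrictions of each linear term."""
    # First the restrictions implied by the two multiplications.
    if not all(mul_in_domain(a[i], b[i]) for i in range(2)):
        return False
    # Then the restriction implied by the addition of the two products.
    return add_in_domain(a[0] * b[0], a[1] * b[1])

def mix_in_domain(e1: float, e2: float, e3: float) -> bool:
    """Domain check for one component of e1 * (1 - e3) + e2 * e3."""
    # 1 - e3 has a finite operand, so the subtraction itself is always in domain.
    w = 1.0 - e3
    if not (mul_in_domain(e1, w) and mul_in_domain(e2, e3)):
        return False
    return add_in_domain(e1 * w, e2 * e3)
```

Under these assumptions, `dot2_in_domain((math.inf, -math.inf), (1.0, 1.0))` is false because the two products are infinities of opposite sign, while finite inputs are always accepted.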
### `modf` ### {#modf-builtin} @@ -15248,6 +15473,8 @@ but a value may infer the type. Description Returns a unit vector in the same direction as `e`. + + The [=domain=] is all vectors except the zero vector. ### `pow` ### {#pow-builtin} @@ -15266,6 +15493,17 @@ but a value may infer the type. Description Returns `e1` raised to the power `e2`. [=Component-wise=] when `T` is a vector. + + Scalar [=domain=] + The set of all pairs of [=extended reals=] (|x|,|y|) except: + + * |x| < 0. + * |x| is 1 and |y| is infinite. + * |x| is infinite and |y| is 0. + + This rule arises from the fact the result may be computed as + `exp2(y * log2(x))`. + ### `quantizeToF16` ### {#quantizeToF16-builtin} @@ -15282,8 +15520,8 @@ but a value may infer the type. Description Quantizes a 32-bit floating point value `e` as if `e` were converted to - a [[!IEEE-754|IEEE 754]] binary16 value, and then converted back to a - IEEE 754 binary32 value. + a [[!IEEE-754|IEEE-754]] [=ieee754/binary16=] value, and then converted back to a + IEEE-754 [=ieee754/binary32=] value. If `e` is outside the finite range of binary16, then: * It is a [=shader-creation error=] if `e` is a [=const-expression=]. @@ -15291,7 +15529,7 @@ but a value may infer the type. * Otherwise the result is an [=indeterminate value=] for `T`. The intermediate binary16 value may be [=flushed to zero=], i.e. the final - result may be zero if the intermediate binary16 value is denormalized. + result may be zero if the intermediate binary16 value is [=ieee754/subnormal=]. See [[#floating-point-conversion]]. @@ -15452,6 +15690,9 @@ Note: The vec2<f32> case is the same as `unpack2x16float(pack2x16float(e)) Description Returns the sine of `e`, where `e` is in radians. [=Component-wise=] when `T` is a vector. + + Scalar [=domain=] + Interval ([NINF], [PINF]) ### `sinh` ### {#sinh-builtin} @@ -15517,6 +15758,9 @@ Note: The vec2<f32> case is the same as `unpack2x16float(pack2x16float(e)) Description Returns the square root of `e`. 
[=Component-wise=] when `T` is a vector. + + Scalar [=domain=] + Interval [0, [PINF]] ### `step` ### {#step-builtin} @@ -15552,6 +15796,9 @@ Note: The vec2<f32> case is the same as `unpack2x16float(pack2x16float(e)) Description Returns the tangent of `e`, where `e` is in radians. [=Component-wise=] when `T` is a vector. + + Scalar [=domain=] + Interval ([NINF], [PINF]) ### `tanh` ### {#tanh-builtin} @@ -17756,7 +18003,7 @@ Note: For packing snorm values, the normalized floating point values are in the Description Converts two floating point values to half-precision floating point numbers, and then combines them into one `u32` value.
- Component `e[i]` of the input is converted to a [[!IEEE-754|IEEE-754]] binary16 value, which is then + Component `e[i]` of the input is converted to a [[!IEEE-754|IEEE-754]] [=ieee754/binary16=] value, which is then placed in bits 16 × `i` through 16 × `i` + 15 of the result. @@ -17886,7 +18133,7 @@ Note: For unpacking snorm values, the normalized floating point result is in the as a floating point value.
Component `i` of the result is the f32 representation of `v`, where `v` is the interpretation of bits 16×`i` through 16×`i + 15` of `e` - as an [[!IEEE-754|IEEE-754]] binary16 value. + as an [[!IEEE-754|IEEE-754]] [=ieee754/binary16=] value. See [[#floating-point-conversion]]. From 0e120c6d6dfcba69e096272c43455e5b7281c3ab Mon Sep 17 00:00:00 2001 From: David Neto Date: Tue, 27 Aug 2024 17:52:38 -0400 Subject: [PATCH 166/285] Clarify validation of storage texture texel-format and access modes (#4831) - a texel mode is valid for storage textures if STORAGE_BINDING is valid with any access mode. - validation of texel-format and access-mode occurs at pipeline-creation time, not shader-creation time. Fixed: #4711 --- wgsl/index.bs | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index 99bd540735..5445341677 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -289,6 +289,7 @@ spec: WebGPU; urlPrefix: https://gpuweb.github.io/gpuweb/# text: wgslLanguageFeatures; url: gpuwgsllanguagefeatures type: abstract-op text: validating GPUProgrammableStage; url: abstract-opdef-validating-gpuprogrammablestage + text: validating shader binding; url: abstract-opdef-validating-shader-binding # Introduction # {#intro} @@ -4217,7 +4218,7 @@ Note: The channel transfer function for 8snorm maps {-128,...,127} to the floati The texel formats listed in the Texel Formats for Storage Textures table correspond to the [[WebGPU#plain-color-formats|WebGPU plain color formats]] -which support the WebGPU {{GPUTextureUsage/STORAGE_BINDING}} usage. +which support the WebGPU {{GPUTextureUsage/STORAGE_BINDING}} usage with at least one [=access mode=]. These texel formats are used to parameterize the [=type/storage texture=] types defined in [[#texture-storage]]. @@ -4400,6 +4401,10 @@ used to convert the shader value to the stored texel. 
* *Format* [=shader-creation error|must=] be an [=enumerant=] for one of the [=storage-texel-formats|texel formats for storage textures=] * *Access* [=shader-creation error|must=] be an [=enumerant=] for one of the [=access modes=]. +* No [=shader-creation error=] occurs due to an invalid combination of *Format* and *Access*. + Combinations of *Format* with *Access* are checked in the [$validating shader binding|shader binding validation$] + step during pipeline creation. + An invalid combination [=behavioral requirement|will=] result in a [=pipeline-creation error=]. ### Depth Texture Types ### {#texture-depth} From 7fa78f6d639356a12b10a3b18d8a6cec9ebfccd1 Mon Sep 17 00:00:00 2001 From: David Neto Date: Wed, 28 Aug 2024 11:07:13 -0400 Subject: [PATCH 167/285] Better explain that you can't assign to a multi-letter swizzle (#4835) It's a consequence of the type rules. But causual readers won't see that. Fixed: #4833 --- wgsl/index.bs | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index 5445341677..b40f0c1766 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -5569,23 +5569,30 @@ The result type depends on the number of letters provided. Assuming a `vec4 |e|`.x`: |T|
|e|`.r`: |T| Select the first component of |e| + + This is a single-letter [=swizzle=]. |e|: vec|N|<|T|>
|e|`.y`: |T|
|e|`.g`: |T| Select the second component of |e| + + This is a single-letter [=swizzle=]. |e|: vec|N|<|T|>
|N| is 3 or 4 |e|`.z`: |T|
|e|`.b`: |T| Select the third component of |e| + + This is a single-letter [=swizzle=]. |e|: vec4<|T|> |e|`.w`: |T|
|e|`.a`: |T| Select the fourth component of |e| + This is a single-letter [=swizzle=]. |e|: vec|N|<|T|>
|i|: [INT]
@@ -5619,6 +5626,13 @@ The result type depends on the number of letters provided. Assuming a `vec4 #### Vector Multiple Component Selection #### {#vector-multi-component} +The expressions in this section are all multi-letter [=swizzles=]. +Each forms a [=vector=] from the components of another vector. + +A multi-letter [=swizzle=] cannot appear on the left-hand side of an [=statement/assignment=]: +The left-hand side of an assignment must be of [=reference type=], +but a multi-letter swizzle expression always yields a value of [=vector=] type. + @@ -5708,10 +5722,18 @@ Note: In the table above, [=reference types=] are implicitly handled via the [=l #### Component Reference from Vector Memory View #### {#component-reference-from-vector-memory-view} +The expressions in this section form a [=memory view=] of a single component of a [=vector=] from +the memory view of the whole vector. + +The WGSL [=type rules=] imply that such expressions can appear: +* on the left-hand side of an [=statement/assignment=], to write to that component in memory, or +* any place where a value of the vector component type can appear. + In this case the [=Load Rule=] applies, loading the vector component from memory and yielding that component as the result. + A [=write access=] to component of a vector **may** access all of the [=memory location|memory locations=] associated with that vector. -Note: This means accesses to different components of a vector by different +Note: This means accesses to different components of a vector in memory by different invocations must be synchronized if at least one access is a [=write access=]. See [[#sync-builtin-functions]]. 
From 60b2926f0b517cf342a2e908ac11577a6bd5ea56 Mon Sep 17 00:00:00 2001 From: David Neto Date: Wed, 28 Aug 2024 11:24:18 -0400 Subject: [PATCH 168/285] subgroups: subgroupBroadcast 'id' parameter is const-expression (#4820) * subgroups: subgroupBroadcast 'id' parameter is const-expression This avoids undefined behaviour if the implementation doesn't know to implement it as a shuffle. Intentional uses of non-const IDs are either: - shuffle - broadcast-first Issue: #4306 Issue: crbug.com/360181411 * Update proposals/subgroups.md fix footnotes Co-authored-by: alan-baker * Update proposals/subgroups.md fix footnotes Co-authored-by: alan-baker --------- Co-authored-by: alan-baker --- proposals/subgroups.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/proposals/subgroups.md b/proposals/subgroups.md index 692cd2c5f8..7e80e2ad28 100644 --- a/proposals/subgroups.md +++ b/proposals/subgroups.md @@ -90,11 +90,11 @@ Using f16 as a parameter in any of these functions requires `subgroups_f16` to b | `fn subgroupElect() -> bool` | | Returns true if this invocation has the lowest subgroup_invocation_id among active invocations in the subgroup | | `fn subgroupAll(e : bool) -> bool` | | Returns true if `e` is true for all active invocations in the subgroup | | `fn subgroupAny(e : bool) -> bool` | | Returns true if `e` is true for any active invocation in the subgroup | -| `fn subgroupBroadcast(e : T, id : I) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types
`I` must be i32 or u32 | Broadcasts `e` from subgroup_invocation_id `id` to all active invocations. `id` must be dynamically uniform1 | +| `fn subgroupBroadcast(e : T, id : I) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types
`I` must be i32 or u32 | Broadcasts `e` from the invocation whose subgroup_invocation_id matches `id`, to all active invocations.
`id` must be a constant-expression. Use `subgroupShuffle` if you need a non-constant `id`. | | `fn subgroupBroadcastFirst(e : T) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Broadcasts `e` from the active invocation with the lowest subgroup_invocation_id in the subgroup to all other active invocations | | `fn subgroupBallot(pred : bool) -> vec4` | | Returns a set of bitfields where the bit corresponding to subgroup_invocation_id is 1 if `pred` is true for that active invocation and 0 otherwise. | | `fn subgroupShuffle(v : T, id : I) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types
`I` must be u32 or i32 | Returns `v` from the active invocation whose subgroup_invocation_id matches `id` | -| `fn subgroupShuffleXor(v : T, mask : u32) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Returns `v` from the active invocation whose subgroup_invocation_id matches `subgroup_invocation_id ^ mask`.
`mask` must be dynamically uniform. | +| `fn subgroupShuffleXor(v : T, mask : u32) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Returns `v` from the active invocation whose subgroup_invocation_id matches `subgroup_invocation_id ^ mask`.
`mask` must be dynamically uniform1 | | `fn subgroupShuffleUp(v : T, delta : u32) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Returns `v` from the active invocation whose subgroup_invocation_id matches `subgroup_invocation_id - delta` | | `fn subgroupShuffleDown(v : T, delta : u32) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Returns `v` from the active invocation whose subgroup_invocation_id matches `subgroup_invocation_id + delta` | | `fn subgroupAdd(e : T) -> T` | `T` must be u32, i32, f32, or a vector of those types | Reduction
Adds `e` among all active invocations and returns that result | @@ -111,7 +111,7 @@ Using f16 as a parameter in any of these functions requires `subgroups_f16` to b | `fn quadSwapY(e : T)` | `T` must be u32, i32, f32, f16 or a vector of those types | Swaps `e` between invocations in the quad in the Y direction | | `fn quadSwapDiagonal(e : T)` | `T` must be u32, i32, f32, f16 or a vector of those types | Swaps `e` between invocations in the quad diagonally | 1. This is the first instance of dynamic uniformity. See the portability and uniformity section for more details. -2. Unlike `subgroupBroadcast`, SPIR-V does not have a shuffle operation to fall back on, so this requirement must be surfaced. +2. Unlike `subgroupBroadcast`, there is no alternative if the author wants a non-constant `id`: SPIR-V does not have a quad shuffle operation to fall back on. **TODO**: Are quad operations worth it? Quad operations present even less portability than subgroup operations due to From aa904e495d181ced4aff5be54bd59964c9b5753e Mon Sep 17 00:00:00 2001 From: alan-baker Date: Wed, 28 Aug 2024 12:59:37 -0400 Subject: [PATCH 169/285] Add inclusive scan add and mul operations (#4822) --- proposals/subgroups.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/proposals/subgroups.md b/proposals/subgroups.md index 7e80e2ad28..b4774bc625 100644 --- a/proposals/subgroups.md +++ b/proposals/subgroups.md @@ -99,8 +99,10 @@ Using f16 as a parameter in any of these functions requires `subgroups_f16` to b | `fn subgroupShuffleDown(v : T, delta : u32) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Returns `v` from the active invocation whose subgroup_invocation_id matches `subgroup_invocation_id + delta` | | `fn subgroupAdd(e : T) -> T` | `T` must be u32, i32, f32, or a vector of those types | Reduction
Adds `e` among all active invocations and returns that result | | `fn subgroupExclusiveAdd(e : T) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Exclusive scan
Returns the sum of `e` for all active invocations with subgroup_invocation_id less than this invocation | +| `fn subgroupInclusiveAdd(e : T) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Inclusive scan
Returns the sum of `e` for all active invocations with subgroup_invocation_id less than or equal to this invocation | | `fn subgroupMul(e : T) -> T` | `T` must be u32, i32, f32, or a vector of those types | Reduction
Multiplies `e` among all active invocations and returns that result | | `fn subgroupExclusiveMul(e : T) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Exclusive scan
Returns the product of `e` for all active invocations with subgroup_invocation_id less than this invocation | +| `fn subgroupInclusiveMul(e : T) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Inclusive scan
Returns the product of `e` for all active invocations with subgroup_invocation_id less than or equal to this invocation | | `fn subgroupAnd(e : T) -> T` | `T` must be u32, i32, or a vector of those types | Reduction
Performs a bitwise and of `e` among all active invocations and returns that result | | `fn subgroupOr(e : T) -> T` | `T` must be u32, i32, or a vector of those types | Reduction
Performs a bitwise or of `e` among all active invocations and returns that result | | `fn subgroupXor(e : T) -> T` | `T` must be u32, i32, or a vector of those types | Reduction
Performs a bitwise xor of `e` among all active invocations and returns that result | @@ -233,8 +235,10 @@ D3D12 would have to be proven empirically. | `subgroupShuffleDown` | OpGroupNonUniformShuffleDown | simd_shuffle_down | WaveReadLaneAt with index equal `subgroup_invocation_id + delta` | | `subgroupAdd` | OpGroupNonUniform[IF]Add with Reduce operation | simd_sum | WaveActiveSum | | `subgroupExclusiveAdd` | OpGroupNonUniform[IF]Add with ExclusiveScan operation | simd_prefix_exclusive_sum | WavePrefixSum | +| `subgroupInclusiveAdd` | OpGroupNonUniform[IF]Add with InclusiveScan operation | simd_prefix_inclusive_sum | WavePrefixSum(x) + x | | `subgroupMul` | OpGroupNonUniform[IF]Mul with Reduce operation | simd_product | WaveActiveProduct | | `subgroupExclusiveMul` | OpGroupNonUniform[IF]Mul with ExclusiveScan operation | simd_prefix_exclusive_product | WavePrefixProduct | +| `subgroupInclusiveMul` | OpGroupNonUniform[IF]Mul with InclusiveScan operation | simd_prefix_inclusive_product | WavePrefixProduct(x) * x | | `subgroupAnd` | OpGroupNonUniformBitwiseAnd with Reduce operation | simd_and | WaveActiveBitAnd | | `subgroupOr` | OpGroupNonUniformBitwiseOr with Reduce operation | simd_or | WaveActiveBitOr | | `subgroupXor` | OpGroupNonUniformBitwiseXor with Reduce operation | simd_xor | WaveActiveBitXor | From 93fc5df1691491d1ac3eaf77b8dd203212f54a14 Mon Sep 17 00:00:00 2001 From: David Neto Date: Wed, 28 Aug 2024 14:40:01 -0400 Subject: [PATCH 170/285] interstage-interpolation checks generate pipeline-creation errors, not shader-creation (#4837) Reference the validation algorithm in the WebGPU spec.
Fixed: #4836 --- wgsl/index.bs | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index b40f0c1766..34fc092414 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -290,6 +290,7 @@ spec: WebGPU; urlPrefix: https://gpuweb.github.io/gpuweb/# type: abstract-op text: validating GPUProgrammableStage; url: abstract-opdef-validating-gpuprogrammablestage text: validating shader binding; url: abstract-opdef-validating-shader-binding + text: Interstage interface validation; url: abstract-opdef-validating-inter-stage-interfaces # Introduction # {#intro} @@ -9791,9 +9792,10 @@ For user-defined IO of scalar or vector floating-point type: User-defined [=vertex=] outputs and [=fragment=] inputs of scalar or vector integer type [=shader-creation error|must=] always be specified with interpolation type `flat`. -Interpolation attributes [=shader-creation error|must=] match between [=vertex=] outputs and [=fragment=] -inputs with the same [=attribute/location=] assignment within the same [=pipeline=]. - +[$Interstage interface validation$] checks that, within a [=GPURenderPipeline|render pipeline=], +the interpolation properties of each user-defined [=fragment=] input match +the interpolation properties of a vertex output with the same [=attribute/location=] assignment. +If not, a [=pipeline-creation error=] [=behavioral requirement|will=] result. ### Resource Interface ### {#resource-interface} From b205c456f026f561bb41126232348c524ba95b7f Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Wed, 28 Aug 2024 14:13:58 -0700 Subject: [PATCH 171/285] Allow unknown limits to be requested with value `undefined` (#4781) --- spec/index.bs | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 4638987507..cbc8f011e0 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -1359,7 +1359,7 @@ implicitly [$valid to use with|unusable$]. |descriptor|.{{GPUDeviceDescriptor/requiredFeatures}}. 1. 
Let |limits| be a [=supported limits=] object with all values set to their defaults. 1. For each (|key|, |value|) pair in |descriptor|.{{GPUDeviceDescriptor/requiredLimits}}: - 1. If |value| is [=limit/better=] than |limits|[|key|]: + 1. If |value| is not `undefined` and |value| is [=limit/better=] than |limits|[|key|]: 1. Set |limits|[|key|] to |value|. 1. Let |device| be a [=device=] object. 1. Set |device|.{{device/[[adapter]]}} to |adapter|. @@ -2613,11 +2613,16 @@ interface GPUAdapter {
 1. |adapter|.{{adapter/[[state]]}} must not be {{adapter/[[state]]/"consumed"}}. - 1. For each [|key|, |value|] in |descriptor|.{{GPUDeviceDescriptor/requiredLimits}}: + 1. For each [|key|, |value|] in |descriptor|.{{GPUDeviceDescriptor/requiredLimits}} + for which |value| is not `undefined`: 1. |key| |must| be the name of a member of [=supported limits=]. 1. |value| |must| be no [=limit/better=] than |adapter|.{{adapter/[[limits]]}}[|key|]. 1. If |key|'s [=limit class|class=] is [=limit class/alignment=], |value| |must| be a power of 2 less than 2<sup>32</sup>. + + Note: + User agents should consider issuing developer-visible warnings when + |key| is not recognized, even when |value| is `undefined`. 
 If any are unmet, issue the following steps on contentTimeline @@ -2686,7 +2691,7 @@ interface GPUAdapter { dictionary GPUDeviceDescriptor : GPUObjectDescriptorBase { sequence<GPUFeatureName> requiredFeatures = []; - record<DOMString, GPUSize64> requiredLimits = {}; + record<DOMString, (GPUSize64 or undefined)> requiredLimits = {}; GPUQueueDescriptor defaultQueue = {}; }; @@ -2707,7 +2712,7 @@ dictionary GPUDeviceDescriptor Specifies the [=limits=] that are required by the device request. The request will fail if the adapter cannot provide these limits. - Each key must be the name of a member of [=supported limits=]. + Each key with a non-`undefined` value must be the name of a member of [=supported limits=]. API calls on the resulting device perform validation according to the exact limits of the device (not the adapter; see [[#limits]]). From bda63a8e1680cd09dc5744bd4ecfda4c2836393a Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Wed, 28 Aug 2024 14:17:14 -0700 Subject: [PATCH 172/285] Remove maxInterStageShaderComponents (#4783) --- correspondence/index.bs | 13 +++++------ spec/index.bs | 49 ++++++++++++----------------------------- 2 files changed, 19 insertions(+), 43 deletions(-) diff --git a/correspondence/index.bs b/correspondence/index.bs index 3d2e7a4dfc..16c8ddf482 100644 --- a/correspondence/index.bs +++ b/correspondence/index.bs @@ -229,17 +229,14 @@ User agents are not required to use these formulas and may expose whatever they
-
Vector decomposition: multiple component selection
`maxVertexInputBindingStride` *No documented limit?* 2048 B = `D3D12_SO_BUFFER_MAX_STRIDE_IN_BYTES` -
`maxInterStageShaderComponents` - [#1962](https://github.com/gpuweb/gpuweb/issues/1962) - `min(maxVertexOutputComponents, maxFragmentInputComponents)` - `Maximum number of input components to a fragment function, declared with the stage_in qualifier`, subtract 4 for non-Apple GPUs - 120 = `maxInterStageShaderVariables * 4`
`maxInterStageShaderVariables` - [#1962](https://github.com/gpuweb/gpuweb/issues/1962) + [#1962](https://github.com/gpuweb/gpuweb/issues/1962#issuecomment-1136316791) `min(maxVertexOutputComponents // 4, maxFragmentInputComponents // 4)` - `Maximum number of inputs (scalars or vectors) to a fragment function, declared with the stage_in qualifier`, subtract 2 for non-Apple GPUs + Min of: + + - `Maximum scalar or vector inputs to a fragment function`, subtract 2 for non-Apple GPUs + - `(Maximum number of input components to a fragment function) / 4`, subtract 1 for non-Apple GPUs 30 = `min(D3D12_VS_OUTPUT_REGISTER_COUNT - 1, D3D12_PS_INPUT_REGISTER_COUNT - 2)`
`maxColorAttachments` diff --git a/spec/index.bs b/spec/index.bs index cbc8f011e0..456b34cd19 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -1683,12 +1683,6 @@ A supported limits object has a value for every limit defined by The maximum allowed {{GPUVertexBufferLayout/arrayStride}} when creating a {{GPURenderPipeline}}. -
maxInterStageShaderComponents - {{GPUSize32}} [=limit class/maximum=] 64 -
- The maximum allowed number of components of input or output variables - for inter-stage communication (like vertex outputs or fragment inputs). -
maxInterStageShaderVariables {{GPUSize32}} [=limit class/maximum=] 16
@@ -1780,7 +1774,6 @@ interface GPUSupportedLimits { readonly attribute unsigned long long maxBufferSize; readonly attribute unsigned long maxVertexAttributes; readonly attribute unsigned long maxVertexBufferArrayStride; - readonly attribute unsigned long maxInterStageShaderComponents; readonly attribute unsigned long maxInterStageShaderVariables; readonly attribute unsigned long maxColorAttachments; readonly attribute unsigned long maxColorAttachmentBytesPerSample; @@ -8170,45 +8163,31 @@ dictionary GPURenderPipelineDescriptor [=Device timeline=] steps: - 1. Let |maxVertexShaderOutputComponents| be - |device|.limits.{{supported limits/maxInterStageShaderComponents}}. 1. Let |maxVertexShaderOutputVariables| be |device|.limits.{{supported limits/maxInterStageShaderVariables}}. + 1. Let |maxVertexShaderOutputLocation| be + |device|.limits.{{supported limits/maxInterStageShaderVariables}} - 1. 1. If |descriptor|.{{GPURenderPipelineDescriptor/primitive}}.{{GPUPrimitiveState/topology}} is {{GPUPrimitiveTopology/"point-list"}}: - 1. Decrement |maxVertexShaderOutputComponents| by 1. + 1. Decrement |maxVertexShaderOutputVariables| by 1. 1. If [=builtin/clip_distances=] is declared in the output of |descriptor|.{{GPURenderPipelineDescriptor/vertex}}: 1. Let |clipDistancesSize| be the array size of [=builtin/clip_distances=]. - 1. Decrement |maxVertexShaderOutputComponents| by [=roundUp=](4, |clipDistancesSize|). - 1. Decrement |maxVertexShaderOutputVariables| by ([=roundUp=](4, |clipDistancesSize|) / 4). + 1. Decrement |maxVertexShaderOutputVariables| by ceil(|clipDistancesSize| / 4). + 1. Decrement |maxVertexShaderOutputLocation| by ceil(|clipDistancesSize| / 4). 1. 
Return `false` if any of the following requirements are unmet: - - There must be no more than |maxVertexShaderOutputComponents| scalar - components across all user-defined outputs for + - There must be no more than |maxVertexShaderOutputVariables| user-defined outputs for |descriptor|.{{GPURenderPipelineDescriptor/vertex}}. - Each user-defined output of |descriptor|.{{GPURenderPipelineDescriptor/vertex}} - consumes 4 scalar components. - The [=location=] of each user-defined output of |descriptor|.{{GPURenderPipelineDescriptor/vertex}} must be - < |maxVertexShaderOutputVariables|. + ≤ |maxVertexShaderOutputLocation|. 1. If |descriptor|.{{GPURenderPipelineDescriptor/fragment}} [=map/exist|is provided=]: - 1. Let |maxFragmentShaderInputComponents| be - |device|.limits.{{supported limits/maxInterStageShaderComponents}}. - 1. If the `front_facing` [=builtin=] is an input of - |descriptor|.{{GPURenderPipelineDescriptor/fragment}}: - 1. Decrement |maxFragmentShaderInputComponents| by 1. - 1. If the `sample_index` [=builtin=] is an input of - |descriptor|.{{GPURenderPipelineDescriptor/fragment}}: - 1. Decrement |maxFragmentShaderInputComponents| by 1. - 1. If the `sample_mask` [=builtin=] is an input of - |descriptor|.{{GPURenderPipelineDescriptor/fragment}}: - 1. Decrement |maxFragmentShaderInputComponents| by 1. + 1. Let |maxFragmentShaderInputVariables| be + |device|.limits.{{supported limits/maxInterStageShaderVariables}}. + 1. If any of the `front_facing`, `sample_index`, or `sample_mask` [=builtins=] are an input of + |descriptor|.{{GPURenderPipelineDescriptor/fragment}}: + 1. Decrement |maxFragmentShaderInputVariables| by 1. 1. Return `false` if any of the following requirements are unmet: - - There must be no more than |maxFragmentShaderInputComponents| scalar - components across all user-defined inputs for - |descriptor|.{{GPURenderPipelineDescriptor/fragment}}. 
- Each user-defined input of |descriptor|.{{GPURenderPipelineDescriptor/fragment}} - consumes 4 scalar components. - For each user-defined input of |descriptor|.{{GPURenderPipelineDescriptor/fragment}} there must be a user-defined output of |descriptor|.{{GPURenderPipelineDescriptor/vertex}} that [=location=], type, and [=interpolation=] of the input. @@ -8217,8 +8196,8 @@ dictionary GPURenderPipelineDescriptor their values will be discarded. 1. [=Assert=] that the [=location=] of each user-defined input of |descriptor|.{{GPURenderPipelineDescriptor/fragment}} is less - than |device|.limits.{{supported limits/maxInterStageShaderVariables}} - (resulting from the above rules). + than |device|.limits.{{supported limits/maxInterStageShaderVariables}}. + (This follows from the above rules.) 1. Return `true`. From ed7dd0728e73d869fbd20dfe46c05c33e1ded8ad Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Wed, 28 Aug 2024 15:01:23 -0700 Subject: [PATCH 173/285] Generate a validation error on mapAsync early-reject (#4786) --- spec/index.bs | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/spec/index.bs b/spec/index.bs index 456b34cd19..3a53631fae 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -3522,6 +3522,8 @@ The {{GPUMapMode}} flags determine how a {{GPUBuffer}} is mapped when calling 1. Let contentTimeline be the current [=Content timeline=]. 1. If |this|.{{GPUBuffer/[[pending_map]]}} is not `null`: + 1. Issue the |early-reject steps| on the [=Device timeline=] of + |this|.{{GPUObjectBase/[[device]]}}. 1. Return [=a promise rejected with=] {{OperationError}}. 1. Let |p| be a new {{Promise}}. 1. Set |this|.{{GPUBuffer/[[pending_map]]}} to |p|. @@ -3529,6 +3531,12 @@ The {{GPUMapMode}} flags determine how a {{GPUBuffer}} is mapped when calling |this|.{{GPUObjectBase/[[device]]}}. 1. Return |p|. +
+ [=Device timeline=] |early-reject steps|: + + 1. [$Generate a validation error$]. + 1. Return. +
[=Device timeline=] |validation steps|: From f1b76ce6a7859b24598ff2399729f1f4b2e0650c Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Thu, 29 Aug 2024 00:07:35 -0700 Subject: [PATCH 174/285] Remove GPUShaderModuleDescriptor.sourceMap (#4840) TBD exactly how we will support source maps in the future, but we realized this isn't the correct way. If there are any applications out there setting this member, this is a non-breaking change because browsers will just ignore the key. (No browsers were implementing this anyway, so at best we were just type-checking it.) History: added in 645 Fixes 4808 --- spec/index.bs | 16 ---------------- 1 file changed, 16 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 3a53631fae..8b3c22d881 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -6563,7 +6563,6 @@ GPUShaderModule includes GPUObjectBase; dictionary GPUShaderModuleDescriptor : GPUObjectDescriptorBase { required USVString code; - object sourceMap; sequence compilationHints = []; }; @@ -6574,21 +6573,6 @@ dictionary GPUShaderModuleDescriptor The WGSL source code for the shader module. - : sourceMap - :: - If defined, **may** be interpreted in the [[!SourceMap]] v3 format. - - If an implementation supports this option but is unable to process the provided value, - it should show a developer-visible warning but must not produce any application-observable - error. - - Note: - Source map support is optional, but serves as a semi-standardized way to support dev-tool - integration such as source-language debugging. - - WGSL names (identifiers) in source maps follow the rules defined in [=WGSL identifier - comparison=]. - : compilationHints :: A list of {{GPUShaderModuleCompilationHint}}s. From edfffd9ea6f6dfdf6205f96e49dc0699e0e2f2be Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Thu, 29 Aug 2024 10:58:55 -0700 Subject: [PATCH 175/285] Change mapAsync to also early-reject if already mapped (#4787) Not just if already _pending_ mapping. 
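
As a rough sketch of the early-reject behaviour after this change, the content-timeline branch can be modelled as follows. This is an illustration under stated assumptions, not the real `GPUBuffer`: `ModelBuffer`, `MapAsyncOutcome`, and the `validationErrors` counter are hypothetical, the call is modelled synchronously instead of returning a promise, and the counter merely stands in for generating a validation error on the device timeline.

```typescript
// Hypothetical model; this is not the real GPUBuffer implementation.
type MapState = "unmapped" | "pending" | "mapped";

type MapAsyncOutcome =
  | { kind: "rejected"; error: "OperationError" }
  | { kind: "pending" };

class ModelBuffer {
  mapState: MapState = "unmapped";
  // Stands in for "Generate a validation error" on the device timeline.
  validationErrors = 0;

  // Models only the early-reject branch of the content-timeline steps.
  mapAsync(): MapAsyncOutcome {
    // Reject whenever the buffer is not "unmapped": this covers both a
    // pending map request and a buffer that is already mapped.
    if (this.mapState !== "unmapped") {
      this.validationErrors += 1;
      return { kind: "rejected", error: "OperationError" };
    }
    this.mapState = "pending";
    return { kind: "pending" };
  }
}
```

In this model, a second `mapAsync()` call on the same buffer is rejected and counted as a validation error whether the first request is still pending or has already completed.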
--- spec/index.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/index.bs b/spec/index.bs index 8b3c22d881..a9a3bd9a5d 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -3521,7 +3521,7 @@ The {{GPUMapMode}} flags determine how a {{GPUBuffer}} is mapped when calling [=Content timeline=] steps: 1. Let contentTimeline be the current [=Content timeline=]. - 1. If |this|.{{GPUBuffer/[[pending_map]]}} is not `null`: + 1. If |this|.{{GPUBuffer/mapState}} is not {{GPUBufferMapState/"unmapped"}}: 1. Issue the |early-reject steps| on the [=Device timeline=] of |this|.{{GPUObjectBase/[[device]]}}. 1. Return [=a promise rejected with=] {{OperationError}}. From 078959e2e5e2d87fef50653de2c9a3c6585e4bf5 Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Fri, 30 Aug 2024 14:06:00 -0700 Subject: [PATCH 176/285] Additional cases for non-exact texel copies (#4842) --- spec/index.bs | 2 ++ spec/sections/copies.bs | 14 +++++++++++--- 2 files changed, 13 insertions(+), 3 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index a9a3bd9a5d..a57ac326c5 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -100,8 +100,10 @@ spec: WGSL; urlPrefix: https://gpuweb.github.io/gpuweb/wgsl/# text: language extension; url: language-extension text: runtime-sized; url: runtime-sized text: WGSL floating point conversion; url: floating-point-conversion + text: WGSL floating point behaviors; url: differences-from-ieee754 text: WGSL identifier comparison; url: identifier-comparison text: WGSL scalar type; url: scalar-types + text: indeterminate value; url: indeterminate-values text: @binding; url: attribute-binding text: @group; url: attribute-group text: @interpolate; url: interpolate-attr diff --git a/spec/sections/copies.bs b/spec/sections/copies.bs index efa9cc68e7..6d7fb51def 100644 --- a/spec/sections/copies.bs +++ b/spec/sections/copies.bs @@ -28,9 +28,17 @@ and "immediate" {{GPUQueue}} operations: - {{GPUQueue/writeTexture()}}, for {{ArrayBuffer}}-to-{{GPUTexture}} writes 
 - {{GPUQueue/copyExternalImageToTexture()}}, for copies from Web Platform image sources to textures -Some texel values have multiple possible representations -of some values, e.g. as `r8snorm`, -1.0 can be represented as either -127 or -128. -Copy commands are not guaranteed to preserve the source's bit-representation. +During a texel copy texels are copied over with an equivalent texel representation. +Texel copies only guarantee that valid, normal numeric values in the source have the same numeric +value in the destination, and may not preserve the bit-representations of the following values: + +- snorm values may represent -1.0 as either -127 or -128. +- The signs of zero values may not be preserved. +- Subnormal floating-point values may be replaced by either -0.0 or +0.0. +- Any NaN or Infinity is an invalid value and may be replaced by an [=indeterminate value=]. + +Note: Copies may be performed with WGSL shaders, which means that any of the documented +[=WGSL floating point behaviors=] may be observed. The following definitions are used by these methods: From 4117c565661afd23b3d26aae87b7a87cd517ee4d Mon Sep 17 00:00:00 2001 From: Teodor Tanasoaia <28601907+teoxoy@users.noreply.github.com> Date: Fri, 30 Aug 2024 23:24:48 +0200 Subject: [PATCH 177/285] add dynamic offset to "Validate encoder bind groups" validation function (#4843) --- spec/index.bs | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index a57ac326c5..fe277354b7 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -10691,12 +10691,25 @@ It must only be included by interfaces which also include those mixins. For each pair of ({{GPUIndex32}} |index|, {{GPUBindGroupLayout}} |bindGroupLayout|) in |pipeline|.{{GPUPipelineBase/[[layout]]}}.{{GPUPipelineLayout/[[bindGroupLayouts]]}}: - Let |bindGroup| be |encoder|.{{GPUBindingCommandsMixin/[[bind_groups]]}}[|index|]. 
+ - Let |dynamicOffsets| be |encoder|.{{GPUBindingCommandsMixin/[[dynamic_offsets]]}}[|index|]. - |bindGroup| must not be `null`. - |bindGroup|.{{GPUBindGroup/[[layout]]}} must be [=group-equivalent=] with |bindGroupLayout|. - - For each |entry| of |bindGroup|.{{GPUBindGroup/[[entries]]}}: - - If |entry|.{{GPUBindGroupEntry/[[prevalidatedSize]]}} is `false`: - - [$effective buffer binding size$](|entry|.{{GPUBindGroupEntry/resource}}) must be ≥ [=minimum buffer binding size=] - of the binding variable in |pipeline|'s shader that corresponds to |entry|. + - Let |dynamicOffsetIndex| be 0. + - For each {{GPUBindGroupEntry}} |bindGroupEntry| in |bindGroup|.{{GPUBindGroup/[[entries]]}}, + sorted by |bindGroupEntry|.{{GPUBindGroupEntry/binding}}: + - Let |bindGroupLayoutEntry| be + |bindGroup|.{{GPUBindGroup/[[layout]]}}.{{GPUBindGroupLayout/[[entryMap]]}}[|bindGroupEntry|.{{GPUBindGroupEntry/binding}}]. + - If |bindGroupLayoutEntry|.{{GPUBindGroupLayoutEntry/buffer}} is not + [=map/exists|provided=], **continue**. + - Let |bound| be a copy of |bindGroupEntry|.{{GPUBindGroupEntry/resource}}. + - [=Assert=] |bound| is a {{GPUBufferBinding}}. + - If |bindGroupLayoutEntry|.{{GPUBindGroupLayoutEntry/buffer}}.{{GPUBufferBindingLayout/hasDynamicOffset}}: + - Increment |bound|.{{GPUBufferBinding/offset}} by + |dynamicOffsets|[|dynamicOffsetIndex|]. + - Increment |dynamicOffsetIndex| by 1. + - If |bindGroupEntry|.{{GPUBindGroupEntry/[[prevalidatedSize]]}} is `false`: + - [$effective buffer binding size$](|bound|) must be ≥ [=minimum buffer binding size=] + of the binding variable in |pipeline|'s shader that corresponds to |bindGroupEntry|. - [$Encoder bind groups alias a writable resource$](|encoder|, |pipeline|) must be `false`.
From a5cc30af2f43f4c015abe002cfa956401382a177 Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Fri, 30 Aug 2024 15:20:26 -0700 Subject: [PATCH 178/285] Disable async operations after device destruction starts (#4788) --- spec/index.bs | 194 ++++++++++++++++++++++++++++++++------------------ 1 file changed, 124 insertions(+), 70 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index fe277354b7..e7bfb2f7cd 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -1248,6 +1248,8 @@ both an instance of compute/rendering functionality on the platform underlying a browser, and an instance of a browser's implementation of WebGPU on top of that functionality. +[=Adapters=] are exposed via {{GPUAdapter}}. + [=Adapters=] do not uniquely represent underlying implementations: calling {{GPU/requestAdapter()}} multiple times returns a different [=adapter=] object each time. @@ -1301,8 +1303,6 @@ improved privacy. It is not required that a [=fallback adapter=] is available on -[=Adapters=] are exposed via {{GPUAdapter}}. -
To expire a {{GPUAdapter}} |adapter|, run the following [=device timeline=] steps: @@ -1320,6 +1320,8 @@ through which [=internal objects=] are created. It can be shared across multiple [=agents=] (e.g. dedicated workers). --> +[=Devices=] are exposed via {{GPUDevice}}. + A [=device=] is the exclusive owner of all [=internal objects=] created from it: when the [=device=] becomes [$invalid$] (is [=lose the device|lost=] or {{GPUDevice/destroy()|destroyed}}), @@ -1345,7 +1347,7 @@ implicitly [$valid to use with|unusable$]. No [=limit/better=] limits can be used, even if the underlying [=adapter=] can support them. -[=device=] has the following [=content timeline property=]: +[=device=] has the following [=content timeline properties=]:
: [[content device]], of type {{GPUDevice}}, readonly @@ -1353,6 +1355,20 @@ implicitly [$valid to use with|unusable$]. The [=Content timeline=] {{GPUDevice}} interface which this device is associated with.
+[=device=] has the following [=device timeline properties=]: + +
+ : [[destroy started]], of type `boolean`, initially `false` + :: + Becomes true when a {{GPUDevice/destroy()}} operation is started, + and remains true once it is finished. + + Note: + Once destruction starts, ongoing operations can complete and send messages back to the + [=content timeline=], but no new operations can start which do so. + See also [[#errors-and-debugging]]. +
+
To create a new device from [=adapter=] |adapter| with {{GPUDeviceDescriptor}} |descriptor|, run the following [=device timeline=] steps: @@ -1402,10 +1418,20 @@ no validation errors are raised, most promises resolve normally, etc. 1. Complete any outstanding steps that are waiting until |device| becomes lost. - Note: No errors are generated after device loss. See [[#errors-and-debugging]]. + Note: No errors are generated from a device which is lost or pending destruction. + See [[#errors-and-debugging]].
-[=Devices=] are exposed via {{GPUDevice}}. +
+ To listen for timeline event + |event| on [=device=] |device|, handled by |steps| on timeline |timeline|: + + - If or when the [=device timeline=] has been informed of the completion of |event|, or + - If |device|.{{device/[[destroy started]]}} is already `true`, or + - If |device| is [$invalid|lost$] already, or when it [=becomes lost=]: + + Then issue |steps| on |timeline|. +
## Optional Capabilities ## {#optional-capabilities} @@ -2879,8 +2905,15 @@ to.
[=Device timeline=] steps: - 1. Once all currently-enqueued operations on any queue on this device - are completed, issue the subsequent steps on the current timeline. + 1. Set |this|.{{GPUObjectBase/[[device]]}}.{{device/[[destroy started]]}} to `true`. + 1. Once: + - All currently-enqueued operations on any queue on this device have completed, and + - Any [=device timeline=] steps that were listening for completion of + queue operations have completed + ([=asserting=] that no new listeners were added after {{device/[[destroy started]]}} was set): + + Then issue the subsequent steps on the + current timeline.
1. [=Lose the device=](|this|.{{GPUObjectBase/[[device]]}}, @@ -3574,6 +3607,16 @@ The {{GPUMapMode}} flags determine how a {{GPUBuffer}} is mapped when calling Note: Since the buffer is mapped, its contents cannot change between this step and {{GPUBuffer/unmap()}}. + 1. If |this|.{{GPUObjectBase/[[device]]}}.{{device/[[destroy started]]}} is `true`: + 1. Return. + + Note: + The map promise will already have been aborted, because + {{GPUDevice/destroy()|GPUDevice.destroy()}} + aborts it, so there is no need to do so here. + + 1. When either of the following events occur (whichever comes first), or if either has already occurred: @@ -6866,14 +6909,11 @@ Note: {{GPUCompilationMessage}}.{{GPUCompilationMessage/offset}} and
[=Device timeline=] |synchronization steps|: - 1. When either of the following events occur (whichever comes first), - or if either has already occurred: - - - The [=device timeline=] becomes informed that [=shader module creation=] has - completed for |this| (successfully or unsuccessfully). - - |this|.{{GPUObjectBase/[[device]]}} [=becomes lost=]. - - Then issue the subsequent steps on contentTimeline. + 1. Let |event| occur upon the (successful or unsuccessful) completion of + [=shader module creation=] for |this|. + 1. [$Listen for timeline event$] |event| + on |this|.{{GPUObjectBase/[[device]]}}, handled by + the subsequent steps on contentTimeline.
[=Content timeline=] steps: @@ -7778,29 +7818,31 @@ dictionary GPUComputePipelineDescriptor 1. Let |pipeline| be a new {{GPUComputePipeline}} created as if |this|.{{GPUDevice/createComputePipeline()}} was called with |descriptor|, except capturing any errors as |error|, rather than dispatching them to the device. - - 1. When either of the following events occur (whichever comes first), - or if either has already occurred: - - - The [=device timeline=] becomes informed that [=pipeline creation=] has - completed for |pipeline| (successfully or unsuccessfully). - - |this| [=becomes lost=]. - - Then issue the subsequent steps on the [=device timeline=] of |this|. + 1. Let |event| occur upon the (successful or unsuccessful) completion of + [=pipeline creation=] for |pipeline|. + 1. [$Listen for timeline event$] |event| + on |this|.{{GPUObjectBase/[[device]]}}, handled by + the subsequent steps on the [=device timeline=] of |this|.
[=Device timeline=] steps: - 1. If |this| is [$invalid|lost$] or |pipeline| is [$valid$], - issue the following steps on |contentTimeline|, and return. + 1. If |pipeline| is [$valid$], + |this|.{{GPUObjectBase/[[device]]}}.{{device/[[destroy started]]}} is `true`, + or |this| is [$invalid|lost$]: -
- [=Content timeline=] steps: + 1. Issue the following steps on |contentTimeline|: - 1. [=Resolve=] |promise| with |pipeline|. -
+
+ [=Content timeline=] steps: - Note: No errors are generated after device loss. See [[#errors-and-debugging]]. + 1. [=Resolve=] |promise| with |pipeline|. +
+ + 1. Return. + + Note: No errors are generated from a device which is lost or pending destruction. + See [[#errors-and-debugging]]. 1. If |pipeline| is [$invalid$] and |error| is an [$internal error$], issue the following steps on |contentTimeline|, and return. @@ -8052,29 +8094,31 @@ dictionary GPURenderPipelineDescriptor 1. Let |pipeline| be a new {{GPURenderPipeline}} created as if |this|.{{GPUDevice/createRenderPipeline()}} was called with |descriptor|, except capturing any errors as |error|, rather than dispatching them to the device. - - 1. When either of the following events occur (whichever comes first), - or if either has already occurred: - - - The [=device timeline=] becomes informed that [=pipeline creation=] has - completed for |pipeline| (successfully or unsuccessfully). - - |this| [=becomes lost=]. - - Then issue the subsequent steps on the [=device timeline=] of |this|. + 1. Let |event| occur upon the (successful or unsuccessful) completion of + [=pipeline creation=] for |pipeline|. + 1. [$Listen for timeline event$] |event| + on |this|.{{GPUObjectBase/[[device]]}}, handled by + the subsequent steps on the [=device timeline=] of |this|.
[=Device timeline=] steps: - 1. If |this| is [$invalid|lost$] or |pipeline| is [$valid$], - issue the following steps on |contentTimeline|, and return. + 1. If |pipeline| is [$valid$], + |this|.{{GPUObjectBase/[[device]]}}.{{device/[[destroy started]]}} is `true`, + or |this| is [$invalid|lost$]: -
- [=Content timeline=] steps: + 1. Issue the following steps on |contentTimeline|: - 1. [=Resolve=] |promise| with |pipeline|. -
+
+ [=Content timeline=] steps: + + 1. [=Resolve=] |promise| with |pipeline|. +
+ + 1. Return. - Note: No errors are generated after device loss. See [[#errors-and-debugging]]. + Note: No errors are generated from a device which is lost or pending destruction. + See [[#errors-and-debugging]]. 1. If |pipeline| is [$invalid$] and |error| is an [$internal error$], issue the following steps on |contentTimeline|, and return. @@ -13519,14 +13563,11 @@ GPUQueue includes GPUObjectBase;
[=Device timeline=] |synchronization steps|: - 1. When either of the following events occur (whichever comes first), - or if either has already occurred: - - - The [=device timeline=] becomes informed of the completion of - all currently-enqueued operations. - - |this|.{{GPUObjectBase/[[device]]}} [=becomes lost=]. - - Then issue the subsequent steps on contentTimeline. + 1. Let |event| occur upon the completion of + all currently-enqueued operations. + 1. [$Listen for timeline event$] |event| + on |this|.{{GPUObjectBase/[[device]]}}, handled by + the subsequent steps on contentTimeline.
[=Content timeline=] steps: @@ -14416,16 +14457,20 @@ is being composited into (e.g. an HTML page rendering, or a 2D canvas). During the normal course of operation of WebGPU, errors are raised via [$dispatch error$]. -After a device is [=lose the device|lost=] (described below), errors are no longer surfaced, -where possible. ({{GPUBuffer/mapAsync()}} does produce an error, because it is impossible to -provide the correct mapped data after the device has been lost.) - -At this point, implementations do not need to run validation or error tracking: +After a device is {{device/[[destroy started]]}} or [=lose the device|lost=], +errors are no longer surfaced, where possible. +After this point, implementations do not need to run validation or error tracking: - The validity of objects on the device becomes unobservable. - {{GPUDevice/popErrorScope()}} and {{GPUDevice/uncapturederror}} stop reporting errors. (No errors are generated by the device loss itself. Instead, the {{GPUDevice}}.{{GPUDevice/lost}} promise resolves to indicate the device is lost.) +- All operations which send a message back to the [=content timeline=] will skip their usual steps. + Most will appear to succeed, except for {{GPUBuffer/mapAsync()}}, which produces an error + because it is impossible to provide the correct mapped data after the device has been lost. + + This makes it unobservable whether other types of operations (that don't send messages back) + actually execute or not. ## Fatal Errors ## {#fatal-errors} @@ -14474,7 +14519,8 @@ and the {{GPUDevice/uncapturederror}} event. Errors must only be generated for operations that explicitly state the conditions one may be generated under in their respective algorithms, and the subtype of error that is generated. -No errors are generated after device loss. See [[#errors-and-debugging]]. +No errors are generated from a device which is lost or pending destruction. +See [[#errors-and-debugging]]. 
Note: {{GPUError}} may gain new subtypes in future versions of this spec. Applications should handle this possibility, using only the error's {{GPUError/message}} when possible, and specializing using @@ -14662,9 +14708,12 @@ partial interface GPUDevice {
[=Device timeline=] steps: - 1. If |device| is [$invalid|lost$], return. + Note: No errors are generated from a device which is lost or pending destruction. + If this algorithm is called while + |device|.{{GPUObjectBase/[[device]]}}.{{device/[[destroy started]]}} is `true` + or |device| is [$invalid|lost$], it will not be observable to the application. + See [[#errors-and-debugging]]. - Note: No errors are generated after device loss. See [[#errors-and-debugging]]. 1. Let |scope| be the [$current error scope$] for |error| and |device|. 1. If |scope| is not `undefined`: 1. [=list/Append=] |error| to |scope|.{{GPU error scope/[[errors]]}}. @@ -14745,17 +14794,22 @@ partial interface GPUDevice {
[=Device timeline=] |check steps|: - 1. If |this| is [$invalid|lost$], issue the following steps on - contentTimeline and return: + 1. If |this|.{{GPUObjectBase/[[device]]}}.{{device/[[destroy started]]}} or + |this| is [$invalid|lost$]: -
- [=Content timeline=] steps: + 1. Issue the following steps on + contentTimeline: - 1. [=Resolve=] |promise| with `null`. -
+
+ [=Content timeline=] steps: + + 1. [=Resolve=] |promise| with `null`. +
- Note: No errors are generated after device loss. See [[#errors-and-debugging]]. + 1. Return. + Note: No errors are generated from a device which is lost or pending destruction. + See [[#errors-and-debugging]]. 1. If any of the following requirements are unmet:
From 68b60b35f62d975af2ab5b6a9c19a466c9dcec25 Mon Sep 17 00:00:00 2001 From: Samson <16504129+sagudev@users.noreply.github.com> Date: Tue, 3 Sep 2024 22:13:22 +0200 Subject: [PATCH 179/285] Fix typo expresion -> expres**s**ion (#4846) --- wgsl/index.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index 34fc092414..be6aa7f153 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -7740,7 +7740,7 @@ path: syntax/const_assert_statement.syntax.bs.include const z = x + y - 2; const_assert z > 0; // valid in functions. let a = 3; - const_assert a != 0; // invalid, the expresion must be a const-expression. + const_assert a != 0; // invalid, the expression must be a const-expression. }
From 0ebebbe5323f4ebe1faca0432c8b2e0684ca963f Mon Sep 17 00:00:00 2001 From: David Neto Date: Wed, 4 Sep 2024 17:01:03 -0400 Subject: [PATCH 180/285] wgsl: dynamically infinite loops are dynamic errors (#4847) Device loss is not the only possibility. The loop may be terminated early, skipped, etc. Issue: #1987 --- wgsl/index.bs | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index be6aa7f153..c29ec8182a 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -270,6 +270,7 @@ spec: WebGPU; urlPrefix: https://gpuweb.github.io/gpuweb/# text: binding member; url: binding-member text: binding resource type; url: binding-resource-type text: binding type; url: binding-type + text: device loss; url: lose-the-device text: GPU error scope; url: gpu-error-scope text: front-facing; url: front-facing text: shader-output mask; url: shader-output-mask @@ -7317,7 +7318,7 @@ creating a new instance of the [=variable declaration|variable=] or [=value decl path: syntax/loop_statement.syntax.bs.include -A loop statement repeatedly executes a loop body; +A loop statement repeatedly executes a loop body; the loop body is specified as a [=compound statement=]. Each execution of the loop body is called an iteration. @@ -7327,6 +7328,9 @@ This repetition can be interrupted by a [=statement/break=], or Optionally, the last statement in the loop body may be a [=statement/continuing=] statement. +A [=dynamic error=] occurs if the [=statement/loop=] would execute an unbounded number of [=iterations=]. +This may result in early termination of the loop, other non-local effects, or even [=device loss=]. + When one of the statements in the loop body is a [=declaration=], it follows the normal [=scope=] and [=lifetime=] rules of a declaration in a [=compound statement=]. That is, the loop body is a sequence of statements, and if one of those is a declaration @@ -7491,6 +7495,9 @@ Converts to:
+A [=dynamic error=] occurs if the [=statement/for=] loop would execute an unbounded number of [=iterations=]. +This may result in early termination of the loop, other non-local effects, or even [=device loss=]. + ### While Statement ### {#while-statement}
@@ -7510,6 +7517,9 @@ The following statement forms are equivalent:
 * `loop { if !` *condition* `{break;}` *body_statements* `}`
 * `for (;`  *condition* `;) {` *body_statements*  `}`
 
+A [=dynamic error=] occurs if the [=statement/while=] loop would execute an unbounded number of [=iterations=].
+This may result in early termination of the loop, other non-local effects, or even [=device loss=].
+
 ### Break Statement ### {#break-statement}
 
 

From 36d1c091b3b2aabee3ed656a65f557d6b25e694b Mon Sep 17 00:00:00 2001
From: Jiawei Shao 
Date: Thu, 5 Sep 2024 09:32:47 +0800
Subject: [PATCH 181/285] Add missing validation rule about
 |maxFragmentShaderInputVariables| (#4851)

This patch adds the missing validation rule about the maximum number
of fragment shader input variables. The total number of user-defined
fragment shader input variables must be less than or equal to
|maxFragmentShaderInputVariables|.

Issue: #4688
---
 spec/index.bs | 2 ++
 1 file changed, 2 insertions(+)
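
Reviewer sketch (not part of the patch): the added rule amounts to a
simple count check. The `FragmentInput` shape and the function name below
are hypothetical, and `maxFragmentShaderInputVariables` is assumed to have
been computed already by the surrounding validation steps.

```typescript
// Hypothetical shape for illustration only; not a WebGPU API type.
interface FragmentInput {
  location: number; // the @location(n) of a user-defined fragment input
}

// Returns true when the number of user-defined fragment shader inputs
// does not exceed the (already computed) limit.
function fragmentInputCountIsValid(
  inputs: readonly FragmentInput[],
  maxFragmentShaderInputVariables: number,
): boolean {
  return inputs.length <= maxFragmentShaderInputVariables;
}
```

This only models the count rule; the separate per-input location check
against maxInterStageShaderVariables is left out.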

diff --git a/spec/index.bs b/spec/index.bs
index e7bfb2f7cd..0981c025a0 100644
--- a/spec/index.bs
+++ b/spec/index.bs
@@ -8232,6 +8232,8 @@ dictionary GPURenderPipelineDescriptor
 
                 Note: Vertex-only pipelines **can** have user-defined outputs in the vertex stage;
                 their values will be discarded.
+            - There must be no more than |maxFragmentShaderInputVariables| user-defined inputs for
+                |descriptor|.{{GPURenderPipelineDescriptor/fragment}}.
         1. [=Assert=] that the [=location=] of each user-defined input of
             |descriptor|.{{GPURenderPipelineDescriptor/fragment}} is less
             than |device|.limits.{{supported limits/maxInterStageShaderVariables}}.

From b39d86d356eb759d7564bc7c808ca62fce8bbf3e Mon Sep 17 00:00:00 2001
From: Brandon Jones 
Date: Thu, 5 Sep 2024 09:48:40 -0700
Subject: [PATCH 182/285] Add usage to GPUTextureViewDescriptor (#4746)

Fixes #4426

Adds usage to texture views, defaulting them to the full set of usages
from the texture. Validates that they must be a subset of the texture
usage and validates that the view format is storage compatible if the
`STORAGE_BINDING` usage is specified.

Co-authored-by: Kai Ninomiya 
---
 spec/index.bs | 37 ++++++++++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 7 deletions(-)
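
Reviewer sketch (not part of the patch): a rough model of the default and
the subset rule described in the commit message. The function names are
hypothetical; the bit values are assumed to mirror the GPUTextureUsage
constants. The format-dependent checks (renderable format, storage
capability) are deliberately omitted.

```typescript
// Assumed to match the GPUTextureUsage flag values.
const TEXTURE_BINDING = 0x04;
const STORAGE_BINDING = 0x08;
const RENDER_ATTACHMENT = 0x10;

// A view usage of 0 defaults to the full usage set of the texture.
function resolveViewUsage(textureUsage: number, requestedUsage: number = 0): number {
  return requestedUsage === 0 ? textureUsage : requestedUsage;
}

// The view usage must not contain any bit that the texture lacks.
function viewUsageIsSubset(textureUsage: number, viewUsage: number): boolean {
  return (viewUsage & ~textureUsage) === 0;
}

const textureUsage = TEXTURE_BINDING | RENDER_ATTACHMENT;
console.log(resolveViewUsage(textureUsage) === textureUsage); // true
console.log(viewUsageIsSubset(textureUsage, STORAGE_BINDING)); // false
```

Under this model, a view that requests only TEXTURE_BINDING from the
texture above is valid, while one that requests STORAGE_BINDING is not.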

diff --git a/spec/index.bs b/spec/index.bs
index 0981c025a0..9be7347c78 100644
--- a/spec/index.bs
+++ b/spec/index.bs
@@ -4338,10 +4338,12 @@ The {{GPUTextureUsage}} flags determine how a {{GPUTexture}} may be used after i
                 - |descriptor|.{{GPUTextureDescriptor/dimension}} must be either {{GPUTextureDimension/"2d"}} or {{GPUTextureDimension/"3d"}}.
             - If |descriptor|.{{GPUTextureDescriptor/usage}} includes the {{GPUTextureUsage/STORAGE_BINDING}} bit:
                 - |descriptor|.{{GPUTextureDescriptor/format}} must be listed in [[#plain-color-formats]] table
-                    with {{GPUTextureUsage/STORAGE_BINDING}} capability for the appropriate access mode.
+                    with {{GPUTextureUsage/STORAGE_BINDING}} capability for at least one access mode.
             - For each |viewFormat| in |descriptor|.{{GPUTextureDescriptor/viewFormats}},
                 |descriptor|.{{GPUTextureDescriptor/format}} and |viewFormat| must be
                 [=texture view format compatible=].
+
+                Issue(#4852): Either validate or encourage a warning here if there is no valid viewFormat/usage combination.
         
@@ -4453,6 +4455,7 @@ dictionary GPUTextureViewDescriptor : GPUObjectDescriptorBase { GPUTextureFormat format; GPUTextureViewDimension dimension; + GPUTextureUsageFlags usage = 0; GPUTextureAspect aspect = "all"; GPUIntegerCoordinate baseMipLevel = 0; GPUIntegerCoordinate mipLevelCount; @@ -4473,6 +4476,16 @@ dictionary GPUTextureViewDescriptor :: The dimension to view the texture as. + : usage + :: + The allowed {{GPUTextureUsage|usage(s)}} for the texture view. Must be a subset of the + {{GPUTexture/usage}} flags of the texture. If 0, defaults to the full set of + {{GPUTexture/usage}} flags of the texture. + + Note: If the view's {{GPUTextureViewDescriptor/format}} doesn't support all of the + texture's {{GPUTextureDescriptor/usage}}s, the default will fail, + and the view's {{GPUTextureViewDescriptor/usage}} must be specified explicitly. + : aspect :: Which {{GPUTextureAspect|aspect(s)}} of the texture are accessible to the texture view. @@ -4682,6 +4695,12 @@ enum GPUTextureAspect { |this|.{{GPUTexture/format}}, |descriptor|.{{GPUTextureViewDescriptor/aspect}}). + - |descriptor|.{{GPUTextureViewDescriptor/usage}} must be a subset of |this|.{{GPUTexture/usage}}. + - If |descriptor|.{{GPUTextureViewDescriptor/usage}} includes the {{GPUTextureUsage/RENDER_ATTACHMENT}} bit: + - |descriptor|.{{GPUTextureViewDescriptor/format}} must be a [=renderable format=]. + - If |descriptor|.{{GPUTextureViewDescriptor/usage}} includes the {{GPUTextureUsage/STORAGE_BINDING}} bit: + - |descriptor|.{{GPUTextureViewDescriptor/format}} must be listed in [[#plain-color-formats]] table + with {{GPUTextureUsage/STORAGE_BINDING}} capability for at least one access mode. - |descriptor|.{{GPUTextureViewDescriptor/mipLevelCount}} must be > 0. - |descriptor|.{{GPUTextureViewDescriptor/baseMipLevel}} + |descriptor|.{{GPUTextureViewDescriptor/mipLevelCount}} must be ≤ @@ -4732,7 +4751,7 @@ enum GPUTextureAspect { 1. Let |view| be a new {{GPUTextureView}} object. 1. 
Set |view|.{{GPUTextureView/[[texture]]}} to |this|. 1. Set |view|.{{GPUTextureView/[[descriptor]]}} to |descriptor|. - 1. If |this|.{{GPUTexture/usage}} contains {{GPUTextureUsage/RENDER_ATTACHMENT}}: + 1. If |descriptor|.{{GPUTextureViewDescriptor/usage}} contains {{GPUTextureUsage/RENDER_ATTACHMENT}}: 1. Let |renderExtent| be [$compute render extent$]([|this|.{{GPUTexture/width}}, |this|.{{GPUTexture/height}}, |this|.{{GPUTexture/depthOrArrayLayers}}], |descriptor|.{{GPUTextureViewDescriptor/baseMipLevel}}). 1. Set |view|.{{GPUTextureView/[[renderExtent]]}} to |renderExtent|.
@@ -4792,6 +4811,8 @@ enum GPUTextureAspect { :: Set |resolved|.{{GPUTextureViewDescriptor/arrayLayerCount}} to the [$array layer count$] of |texture| − |resolved|.{{GPUTextureViewDescriptor/baseArrayLayer}}. + 1. If |resolved|.{{GPUTextureViewDescriptor/usage}} is `0`: + set |resolved|.{{GPUTextureViewDescriptor/usage}} to |texture|.{{GPUTexture/usage}}. 1. Return |resolved|.
@@ -6320,7 +6341,8 @@ following members: - |layoutBinding|.{{GPUBindGroupLayoutEntry/texture}}.{{GPUTextureBindingLayout/sampleType}} is [[#texture-format-caps|compatible]] with |resource|'s {{GPUTextureViewDescriptor/format}}. - - |texture|'s {{GPUTextureDescriptor/usage}} includes {{GPUTextureUsage/TEXTURE_BINDING}}. + - |resource|.{{GPUTextureView/[[descriptor]]}}.{{GPUTextureViewDescriptor/usage}} + includes {{GPUTextureUsage/TEXTURE_BINDING}}. - If |layoutBinding|.{{GPUBindGroupLayoutEntry/texture}}.{{GPUTextureBindingLayout/multisampled}} is `true`, |texture|'s {{GPUTextureDescriptor/sampleCount}} > `1`, Otherwise |texture|'s {{GPUTextureDescriptor/sampleCount}} is `1`. @@ -6334,7 +6356,8 @@ following members: is equal to |resource|'s {{GPUTextureViewDescriptor/dimension}}. - |layoutBinding|.{{GPUBindGroupLayoutEntry/storageTexture}}.{{GPUStorageTextureBindingLayout/format}} is equal to |resource|.{{GPUTextureView/[[descriptor]]}}.{{GPUTextureViewDescriptor/format}}. - - |texture|'s {{GPUTextureDescriptor/usage}} includes {{GPUTextureUsage/STORAGE_BINDING}}. + - |resource|.{{GPUTextureView/[[descriptor]]}}.{{GPUTextureViewDescriptor/usage}} + includes {{GPUTextureUsage/STORAGE_BINDING}}. - |resource|.{{GPUTextureView/[[descriptor]]}}.{{GPUTextureViewDescriptor/mipLevelCount}} must be 1. : {{GPUBindGroupLayoutEntry/buffer}} @@ -8991,8 +9014,8 @@ operations are performed:
: compare :: - The {{GPUCompareFunction}} used when testing fragments against - {{GPURenderPassDescriptor/depthStencilAttachment}} stencil values. + The {{GPUCompareFunction}} used when testing testing fragments against + {{GPURenderPassDescriptor/depthStencilAttachment}} stencil value. : failOp :: @@ -11600,7 +11623,7 @@ dictionary GPURenderPassColorAttachment {
 1. Let |descriptor| be |view|.{{GPUTextureView/[[descriptor]]}}. - 1. |view|.{{GPUTextureView/[[texture]]}}.{{GPUTexture/usage}} + 1. |descriptor|.{{GPUTextureViewDescriptor/usage}} must contain {{GPUTextureUsage/RENDER_ATTACHMENT}}. 1. |descriptor|.{{GPUTextureViewDescriptor/dimension}} must be {{GPUTextureViewDimension/"2d"}} or {{GPUTextureViewDimension/"2d-array"}} or {{GPUTextureViewDimension/"3d"}}.

From 7084b31c332b1aa3f1df0c413be0210cd7d1e785 Mon Sep 17 00:00:00 2001
From: Greggman
Date: Thu, 5 Sep 2024 17:17:08 -0700
Subject: [PATCH 183/285] Uncapturederror and preventDefault (#4708)

* Uncapturederror and preventDefault

Like JavaScript error events, if a listener added to uncapturederror
calls `preventDefault` on the event, the user agent should not log
the error to the console.

Co-authored-by: Kai Ninomiya
---
 spec/index.bs | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/spec/index.bs b/spec/index.bs
index 9be7347c78..c59399991a 100644
--- a/spec/index.bs
+++ b/spec/index.bs
@@ -14756,9 +14756,10 @@ partial interface GPUDevice {
 |device|, with an {{GPUUncapturedErrorEvent/error}} of |error|.
- Note: If (and only if) there are no {{GPUDevice/uncapturederror}} handlers are - registered, user agents **should** surface uncaptured errors to developers, - for example as warnings in the browser's developer console. + Note: After dispatching the event, user agents **should** surface uncaptured errors to + developers, for example as warnings in the browser's developer console, unless the event's + {{Event/defaultPrevented}} is true. In other words, calling {{Event/preventDefault()}} + on the event should silence the console warning.
Note: The user agent may choose to throttle or limit the number of {{GPUUncapturedErrorEvent}}s From 13fdb965ca73869ae10559c4a67e3beaa793d3cc Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Fri, 6 Sep 2024 15:57:19 -0700 Subject: [PATCH 184/285] Describing depth/stencil processing in Detailed Operations (#4849) Part one of two to fill out the output merging section. --- spec/index.bs | 104 +++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 90 insertions(+), 14 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index c59399991a..ff9e3180ba 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -9014,8 +9014,8 @@ operations are performed:
: compare :: - The {{GPUCompareFunction}} used when testing testing fragments against - {{GPURenderPassDescriptor/depthStencilAttachment}} stencil value. + The {{GPUCompareFunction}} used when testing the {{RenderState/[[stencilReference]]}} value + against the fragment's {{GPURenderPassDescriptor/depthStencilAttachment}} stencil values. : failOp :: @@ -15116,14 +15116,15 @@ The main rendering algorithm: 1. **Process fragments**. See [[#fragment-processing]]. Gather a list of |fragments|, resulting from executing - [$process fragment$](|rasterPoint|, |descriptor|.{{GPURenderPipelineDescriptor/fragment}}, |state|) + [$process fragment$](|rasterPoint|, |descriptor|, |state|) for each |rasterPoint| in |rasterizationList|. - 1. **Process depth/stencil**. + 1. **Process depth/stencil**. See [[#output-merging]]. -

Editorial note: fill out the section, using |fragments| + Execute [$process depth stencil$](|fragment|, |descriptor|, |state|) For each non-null + |fragment| of |fragments|. - 1. **Write pixels**. + 1. **Write pixels**. See [[#output-merging]].

Editorial note: fill out the section @@ -15687,29 +15688,49 @@ computes the fragment data (often a color) to be written into render targets. This stage produces a Fragment for each [=RasterizationPoint=]:

- destination refers to [=FragmentDestination=]. + - frontFacing is true if it's a fragment on the front face of a primitive. - coverageMask refers to multisample coverage mask (see [[#sample-masking]]). - depth refers to the depth in [=viewport coordinates=], i.e. between the {{RenderState/[[viewport]]}} `minDepth` and `maxDepth`. - colors refers to the list of color values, one for each target in {{GPURenderPassDescriptor/colorAttachments}}. + - depthPassed + is `true` if the fragment passed the {{GPUDepthStencilState/depthCompare}} operation. + - stencilPassed + is `true` if the fragment passed the stencil {{GPUStencilFaceState/compare}} operation.
- process fragment(rp, desc, state) + process fragment(rp, descriptor, state) **Arguments:** - |rp|: The [=RasterizationPoint=], produced by [[#rasterization]]. - - |desc|: The descriptor of type {{GPUFragmentState}}. + - |descriptor|: The descriptor of type {{GPURenderPipelineDescriptor}}. - |state|: The active [=RenderState=]. **Returns:** [=Fragment=] or `null`. + 1. Let |fragmentDesc| be |descriptor|.{{GPURenderPipelineDescriptor/fragment}}. + 1. Let |depthStencilDesc| be |descriptor|.{{GPURenderPipelineDescriptor/depthStencil}}. 1. Let |fragment| be a new [=Fragment=] object. 1. Set |fragment|.[=Fragment/destination=] to |rp|.[=RasterizationPoint/destination=]. + 1. Set |fragment|.[=Fragment/frontFacing=] to |rp|.[=RasterizationPoint/frontFacing=]. 1. Set |fragment|.[=Fragment/coverageMask=] to |rp|.[=RasterizationPoint/coverageMask=]. 1. Set |fragment|.[=Fragment/depth=] to |rp|.[=RasterizationPoint/depth=]. - 1. If |desc| is not `null`: + 1. If `frag_depth` [=builtin=] is not produced by the shader: + 1. Set |fragment|.[=Fragment/depthPassed=] to the result of [$compare fragment$](|fragment|.[=Fragment/destination=], + |fragment|.[=Fragment/depth=], depth, |state|.{{RenderState/[[depthStencilAttachment]]}}, + |depthStencilDesc|?.{{GPUDepthStencilState/depthCompare}}). + 1. Set |stencilState| to |depthStencilDesc|?.{{GPUDepthStencilState/stencilFront}} if + |rp|.[=RasterizationPoint/frontFacing=] is `true` and |depthStencilDesc|?.{{GPUDepthStencilState/stencilBack}} + otherwise. + 1. Set |fragment|.[=Fragment/stencilPassed=] to the result of [$compare fragment$](|fragment|.[=Fragment/destination=], + |state|.{{RenderState/[[stencilReference]]}}, stencil, |state|.{{RenderState/[[depthStencilAttachment]]}}, + |stencilState|?.{{GPUStencilFaceState/compare}}). + 1. If |fragmentDesc| is not `null`: + 1. If |fragment|.[=Fragment/depthPassed=] is `false` and the `frag_depth` [=builtin=] is not produced by the + shader the following steps may be skipped. 1. 
Set the shader input [=builtins=]. For each non-composite argument of the entry point, annotated as a [=builtin=], set its value based on the annotation: @@ -15731,7 +15752,7 @@ This stage produces a Fragment for each [=RasterizationPoint=]: based on |rp|.[=RasterizationPoint/barycentricCoordinates=], |rp|.[=RasterizationPoint/primitiveVertices=], and the [=interpolation=] qualifier on the input. 1. Set the corresponding fragment shader [=location=] input to |value|. - 1. Invoke the fragment shader entry point described by |desc|. + 1. Invoke the fragment shader entry point described by |fragmentDesc|. The [=device=] may become [=lose the device|lost=] if [=shader execution end|shader execution does not end=] @@ -15743,6 +15764,9 @@ This stage produces a Fragment for each [=RasterizationPoint=]: 1. If `frag_depth` [=builtin=] is produced by the shader as |value|: 1. Let |vp| be |state|.{{RenderState/[[viewport]]}}. 1. Set |fragment|.[=Fragment/depth=] to clamp(|value|, |vp|.`minDepth`, |vp|.`maxDepth`). + 1. Set |fragment|.[=Fragment/depthPassed=] to the result of [$compare fragment$](|fragment|.[=Fragment/destination=], + |fragment|.[=Fragment/depth=], depth, |state|.{{RenderState/[[depthStencilAttachment]]}}, + |depthStencilDesc|?.{{GPUDepthStencilState/depthCompare}}). 1. If `sample_mask` [=builtin=] is produced by the shader as |value|: 1. Set |fragment|.[=Fragment/coverageMask=] to |fragment|.[=Fragment/coverageMask=] ∧ |value|. @@ -15750,17 +15774,69 @@ This stage produces a Fragment for each [=RasterizationPoint=]: 1. Return |fragment|.
+
+ compare fragment(destination, value, aspect, attachment, compareFunc) + + **Arguments:** + + - |destination|: The [=FragmentDestination=]. + - |value|: The value to be compared. + - |aspect|: The aspect of |attachement| to sample values from. + - |attachment|: The attachment to be compared against. + - |compareFunc|: The comparison function to use. + + **Returns:** `true` if the comparison passes, or `false` otherwise + + - If |attachement| is `undefined` or does not have |aspect|, return `true`. + - If |compareFunc| is `undefined` or {{GPUCompareFunction/"always"}}, return `true`. + - Let |attachmentValue| be the value of |aspect| of |attachment| at |destination|. + - Return `true` if comparing |value| with |attachmentValue| using |compareFunc| succeeds, and `false` otherwise. +
+ Processing of fragments happens in parallel, while any side effects, such as writes into {{GPUBufferBindingType/"storage"|GPUBufferBindingType."storage"}} bindings, may happen in any order. ### Output Merging ### {#output-merging} -

Editorial note: fill out this section +Output merging is a fixed-function stage of the render [=pipeline=] that +outputs the fragment color, depth and stencil data to be written into the render pass attachments. -The depth input to this stage, if any, is clamped to the current -{{RenderState/[[viewport]]}} depth range -(regardless of whether the fragment shader stage writes the `frag_depth` builtin). +

+ process depth stencil(fragment, descriptor, state) + + **Arguments:** + + - |fragment|: The [=Fragment=], produced by [[#fragment-processing]]. + - |descriptor|: The descriptor of type {{GPURenderPipelineDescriptor}}. + - |state|: The active [=RenderState=]. + + 1. Let |depthStencilDesc| be |descriptor|.{{GPURenderPipelineDescriptor/depthStencil}}. + + 1. If |descriptor|.{{GPURenderPipeline/[[writesDepth]]}} is `true` and |fragment|.[=Fragment/depthPassed=] is `true`: + 1. Set the value of the depth aspect of |state|.{{RenderState/[[depthStencilAttachment]]}} at + |fragment|.[=Fragment/destination=] to |fragment|.[=Fragment/depth=]. + + 1. If |descriptor|.{{GPURenderPipeline/[[writesStencil]]}} is true: + 1. Set |stencilState| to |depthStencilDesc|.{{GPUDepthStencilState/stencilFront}} if + |fragment|.[=Fragment/frontFacing=] is `true` and + |depthStencilDesc|.{{GPUDepthStencilState/stencilBack}} otherwise. + 1. If |fragment|.[=Fragment/stencilPassed=] is `false`: + - Let |stencilOp| be |stencilState|.{{GPUStencilFaceState/failOp}}. + + Else if |fragment|.[=Fragment/depthPassed=] is `false`: + - Let |stencilOp| be |stencilState|.{{GPUStencilFaceState/depthFailOp}}. + + Else: + - Let |stencilOp| be |stencilState|.{{GPUStencilFaceState/passOp}}. + 1. Update the value of the stencil aspect of |state|.{{RenderState/[[depthStencilAttachment]]}} at + |fragment|.[=Fragment/destination=] by performing the operation described by |stencilOp|. +
+ +The depth input to this stage, if any, is clamped to the current {{RenderState/[[viewport]]}} depth +range (regardless of whether the fragment shader stage writes the `frag_depth` builtin). + +

Editorial note: fill out this section ### No Color Output ### {#no-color-output} From eddc196297cd0f4b19544480528927b7e84c6625 Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Mon, 9 Sep 2024 09:34:53 -0700 Subject: [PATCH 185/285] Filling out the output merging section (#4855) Slightly reformats some of the depth/stencil processing for consistency and adds algorithmic description of color attachment processing. --- spec/index.bs | 66 +++++++++++++++++++++++++++++++++++---------------- 1 file changed, 46 insertions(+), 20 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index ff9e3180ba..b9bb682bbe 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -15085,14 +15085,16 @@ blocks in hardware. The main rendering algorithm:

- render(descriptor, drawCall, state) + render(pipeline, drawCall, state) **Arguments:** - - |descriptor|: Description of the current {{GPURenderPipeline}}. + - |pipeline|: The current {{GPURenderPipeline}}. - |drawCall|: The draw call parameters. May come from function arguments or an {{GPUBufferUsage/INDIRECT}} buffer. - |state|: [=RenderState=] of the {{GPURenderCommandsMixin}} where the draw call is issued. + 1. Let |descriptor| be |pipeline|.{{GPURenderPipeline/[[descriptor]]}}. + 1. **Resolve indices**. See [[#index-resolution]]. Let |vertexList| be the result of [$resolve indices$](|drawCall|, |state|). @@ -15119,14 +15121,11 @@ The main rendering algorithm: [$process fragment$](|rasterPoint|, |descriptor|, |state|) for each |rasterPoint| in |rasterizationList|. - 1. **Process depth/stencil**. See [[#output-merging]]. - - Execute [$process depth stencil$](|fragment|, |descriptor|, |state|) For each non-null - |fragment| of |fragments|. - 1. **Write pixels**. See [[#output-merging]]. -

Editorial note: fill out the section + For each non-null |fragment| of |fragments|: + - Execute [$process depth stencil$](|fragment|, |pipeline|, |state|). + - Execute [$process color attachments$](|fragment|, |pipeline|, |state|).

### Index Resolution ### {#index-resolution} @@ -15720,13 +15719,13 @@ This stage produces a Fragment for each [=RasterizationPoint=]: 1. Set |fragment|.[=Fragment/depth=] to |rp|.[=RasterizationPoint/depth=]. 1. If `frag_depth` [=builtin=] is not produced by the shader: 1. Set |fragment|.[=Fragment/depthPassed=] to the result of [$compare fragment$](|fragment|.[=Fragment/destination=], - |fragment|.[=Fragment/depth=], depth, |state|.{{RenderState/[[depthStencilAttachment]]}}, + |fragment|.[=Fragment/depth=], "[=aspect/depth=]", |state|.{{RenderState/[[depthStencilAttachment]]}}, |depthStencilDesc|?.{{GPUDepthStencilState/depthCompare}}). 1. Set |stencilState| to |depthStencilDesc|?.{{GPUDepthStencilState/stencilFront}} if |rp|.[=RasterizationPoint/frontFacing=] is `true` and |depthStencilDesc|?.{{GPUDepthStencilState/stencilBack}} otherwise. 1. Set |fragment|.[=Fragment/stencilPassed=] to the result of [$compare fragment$](|fragment|.[=Fragment/destination=], - |state|.{{RenderState/[[stencilReference]]}}, stencil, |state|.{{RenderState/[[depthStencilAttachment]]}}, + |state|.{{RenderState/[[stencilReference]]}}, "[=aspect/stencil=]", |state|.{{RenderState/[[depthStencilAttachment]]}}, |stencilState|?.{{GPUStencilFaceState/compare}}). 1. If |fragmentDesc| is not `null`: 1. If |fragment|.[=Fragment/depthPassed=] is `false` and the `frag_depth` [=builtin=] is not produced by the @@ -15765,7 +15764,7 @@ This stage produces a Fragment for each [=RasterizationPoint=]: 1. Let |vp| be |state|.{{RenderState/[[viewport]]}}. 1. Set |fragment|.[=Fragment/depth=] to clamp(|value|, |vp|.`minDepth`, |vp|.`maxDepth`). 1. 
Set |fragment|.[=Fragment/depthPassed=] to the result of [$compare fragment$](|fragment|.[=Fragment/destination=], - |fragment|.[=Fragment/depth=], depth, |state|.{{RenderState/[[depthStencilAttachment]]}}, + |fragment|.[=Fragment/depth=], "[=aspect/depth=]", |state|.{{RenderState/[[depthStencilAttachment]]}}, |depthStencilDesc|?.{{GPUDepthStencilState/depthCompare}}). 1. If `sample_mask` [=builtin=] is produced by the shader as |value|: 1. Set |fragment|.[=Fragment/coverageMask=] to |fragment|.[=Fragment/coverageMask=] ∧ |value|. @@ -15781,13 +15780,13 @@ This stage produces a Fragment for each [=RasterizationPoint=]: - |destination|: The [=FragmentDestination=]. - |value|: The value to be compared. - - |aspect|: The aspect of |attachement| to sample values from. + - |aspect|: The [=aspect=] of |attachment| to sample values from. - |attachment|: The attachment to be compared against. - - |compareFunc|: The comparison function to use. + - |compareFunc|: The {{GPUCompareFunction}} to use, or `undefined`. **Returns:** `true` if the comparison passes, or `false` otherwise - - If |attachement| is `undefined` or does not have |aspect|, return `true`. + - If |attachment| is `undefined` or does not have |aspect|, return `true`. - If |compareFunc| is `undefined` or {{GPUCompareFunction/"always"}}, return `true`. - Let |attachmentValue| be the value of |aspect| of |attachment| at |destination|. - Return `true` if comparing |value| with |attachmentValue| using |compareFunc| succeeds, and `false` otherwise. @@ -15803,21 +15802,21 @@ Output merging is a fixed-function stage of the render [=pipeline=] that outputs the fragment color, depth and stencil data to be written into the render pass attachments.
- process depth stencil(fragment, descriptor, state) + process depth stencil(fragment, pipeline, state) **Arguments:** - |fragment|: The [=Fragment=], produced by [[#fragment-processing]]. - - |descriptor|: The descriptor of type {{GPURenderPipelineDescriptor}}. + - |pipeline|: The current {{GPURenderPipeline}}. - |state|: The active [=RenderState=]. - 1. Let |depthStencilDesc| be |descriptor|.{{GPURenderPipelineDescriptor/depthStencil}}. + 1. Let |depthStencilDesc| be |pipeline|.{{GPURenderPipeline/[[descriptor]]}}.{{GPURenderPipelineDescriptor/depthStencil}}. - 1. If |descriptor|.{{GPURenderPipeline/[[writesDepth]]}} is `true` and |fragment|.[=Fragment/depthPassed=] is `true`: + 1. If |pipeline|.{{GPURenderPipeline/[[writesDepth]]}} is `true` and |fragment|.[=Fragment/depthPassed=] is `true`: 1. Set the value of the depth aspect of |state|.{{RenderState/[[depthStencilAttachment]]}} at |fragment|.[=Fragment/destination=] to |fragment|.[=Fragment/depth=]. - 1. If |descriptor|.{{GPURenderPipeline/[[writesStencil]]}} is true: + 1. If |pipeline|.{{GPURenderPipeline/[[writesStencil]]}} is true: 1. Set |stencilState| to |depthStencilDesc|.{{GPUDepthStencilState/stencilFront}} if |fragment|.[=Fragment/frontFacing=] is `true` and |depthStencilDesc|.{{GPUDepthStencilState/stencilBack}} otherwise. @@ -15836,7 +15835,34 @@ outputs the fragment color, depth and stencil data to be written into the render The depth input to this stage, if any, is clamped to the current {{RenderState/[[viewport]]}} depth range (regardless of whether the fragment shader stage writes the `frag_depth` builtin). -

Editorial note: fill out this section +

+ process color attachments(fragment, pipeline, state) + + **Arguments:** + + - |fragment|: The [=Fragment=], produced by [[#fragment-processing]]. + - |pipeline|: The current {{GPURenderPipeline}}. + - |state|: The active [=RenderState=]. + + 1. If |fragment|.[=Fragment/depthPassed=] is `false` or |fragment|.[=Fragment/stencilPassed=] is `false`, return. + + 1. Let |targets| be |pipeline|.{{GPURenderPipeline/[[descriptor]]}}.{{GPURenderPipelineDescriptor/fragment}}.{{GPUFragmentState/targets}}. + 1. For each |attachment| of |state|.{{RenderState/[[colorAttachments]]}}: + 1. Let |color| be the value from |fragment|.[=Fragment/colors=] that corresponds with |attachment|. + 1. Let |targetDesc| be the |targets| entry that corresponds with |attachment|. + + 1. If |targetDesc|.{{GPUColorTargetState/blend}} is [=map/exist|provided=]: + 1. Let |colorBlend| be |targetDesc|.{{GPUColorTargetState/blend}}.{{GPUBlendState/color}}. + 1. Let |alphaBlend| be |targetDesc|.{{GPUColorTargetState/blend}}.{{GPUBlendState/alpha}}. + 1. Set the RGB components of |color| to the value computed by performing the operation described by + |colorBlend|.{{GPUBlendComponent/operation}} with the values described by + |colorBlend|.{{GPUBlendComponent/srcFactor}} and |colorBlend|.{{GPUBlendComponent/dstFactor}}. + 1. Set the alpha component of |color| to the value computed by performing the operation described by + |alphaBlend|.{{GPUBlendComponent/operation}} with the values described by + |alphaBlend|.{{GPUBlendComponent/srcFactor}} and |alphaBlend|.{{GPUBlendComponent/dstFactor}}. + + 1. Set the value of |attachment| at |fragment|.[=Fragment/destination=] to |color|. +
### No Color Output ### {#no-color-output} From 7ef8208faab892b7526ff9c09e6b7f70afe52a19 Mon Sep 17 00:00:00 2001 From: Mehmet Oguz Derin Date: Tue, 10 Sep 2024 02:28:38 +0900 Subject: [PATCH 186/285] AbstractInt is two's complement, also improve the integer types text (#4858) --- wgsl/index.bs | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index c29ec8182a..e2f30e1869 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -2525,7 +2525,8 @@ Certain expressions are evaluated at [=shader module creation|shader-creation ti and with a numeric range and precision that may be larger than directly implemented by the GPU. WGSL defines two abstract numeric types for these evaluations: -* The AbstractInt type is the set of integers |i|, with -263 ≤ |i| < 263. +* The AbstractInt type is the set of integers representable + in the 64-bit two's complement format, with the sign bit in the most significant bit position. * The AbstractFloat type is the set of finite floating point numbers representable in the [[!IEEE-754|IEEE-754]] [=ieee754/binary64=] (double precision) format. @@ -2538,7 +2539,7 @@ A type is concrete if it is not abstract. A [=numeric literal=] without a suffix denotes a value in an [=abstract numeric type=]: * An [=integer literal=] without an `i` or `u` suffix denotes an [=AbstractInt=] value. -* A [=floating point literal=] without an `f` or `h` suffix denotes a [=AbstractFloat=] value. +* A [=floating point literal=] without an `f` or `h` suffix denotes an [=AbstractFloat=] value. Example: The expression `log2(32)` is analyzed as follows: * `log2(32)` is parsed as a function call to the `log2` builtin function with operand [=AbstractInt=] value 32. @@ -2708,7 +2709,7 @@ The bool type contains the values `true` and `false`. The u32 type is the set of 32-bit unsigned integers. The i32 type is the set of 32-bit signed integers. 
-It uses a two's complementation representation, with the sign bit in the most significant bit position. +It uses the two's complement representation, with the sign bit in the most significant bit position. [[#arithmetic-expr|Expressions]] on [=type/concrete=] integer types that overflow produce a result that is modulo 2bitwidth From 61b9608e1b9a1019fe860aa49e66d5044d0f5e00 Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Tue, 10 Sep 2024 16:40:43 -0700 Subject: [PATCH 187/285] [editorial] Fix fencepost error in maxBindGroupsPlusVertexBuffers validation (#4860) --- spec/index.bs | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index b9bb682bbe..0c1985d9b3 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -8040,19 +8040,21 @@ dictionary GPURenderPipelineDescriptor 1. Let |layout| be a new [$default pipeline layout$] for |pipeline| if |descriptor|.{{GPUPipelineDescriptorBase/layout}} is {{GPUAutoLayoutMode/"auto"}}, and |descriptor|.{{GPUPipelineDescriptorBase/layout}} otherwise. - 1. If any of the following conditions are unsatisfied: - [$generate a validation error$], [$invalidate$] |pipeline|, and stop. + 1. All of the requirements in the following steps |must| be met. + If any are unmet, [$invalidate$] |pipeline| and return.
- - |layout| is [$valid to use with$] |this|. - - [$validating GPURenderPipelineDescriptor$](|descriptor|, |layout|, |this|) succeeds. - - |layout|.{{GPUPipelineLayout/[[bindGroupLayouts]]}}.length + |vertexBufferCount| is ≤ - |this|.{{GPUObjectBase/[[device]]}}.{{device/[[limits]]}}.{{supported limits/maxBindGroupsPlusVertexBuffers}}, - where |vertexBufferCount| is the maximum index in |descriptor|.{{GPURenderPipelineDescriptor/vertex}}.{{GPUVertexState/buffers}} that is not `undefined`. + 1. |layout| |must| be [$valid to use with$] |this|. + 1. [$validating GPURenderPipelineDescriptor$](|descriptor|, |layout|, |this|) must succeed. + 1. Let |vertexBufferCount| be the index of the last non-null entry in + |descriptor|.{{GPURenderPipelineDescriptor/vertex}}.{{GPUVertexState/buffers}}, + plus 1; or 0 if there are none. + 1. |layout|.{{GPUPipelineLayout/[[bindGroupLayouts]]}}.[=list/size=] + |vertexBufferCount| must be ≤ + |this|.{{GPUObjectBase/[[device]]}}.{{device/[[limits]]}}.{{supported limits/maxBindGroupsPlusVertexBuffers}}.
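The fencepost fix in the reworded steps above can be sketched as follows. This is only an illustrative sketch, not spec text: `vertexBufferCount`, `withinBindGroupsPlusVertexBuffers`, and the `VertexBufferLayoutLike` type are hypothetical names introduced here. The old wording used the maximum defined index itself, so a single buffer at index 0 counted as 0; the new wording counts it as 1.

```typescript
// Hypothetical stand-in for a GPUVertexBufferLayout entry; holes may be null or undefined.
type VertexBufferLayoutLike = { arrayStride: number } | null | undefined;

// Index of the last non-null entry, plus 1; or 0 if there are none.
function vertexBufferCount(buffers: readonly VertexBufferLayoutLike[]): number {
  // Walk backwards so the first hit is the last non-null entry.
  for (let i = buffers.length - 1; i >= 0; i--) {
    if (buffers[i] != null) {
      return i + 1;
    }
  }
  return 0;
}

// Assumed shape of the limit check: bind group layout count plus vertex buffer count.
function withinBindGroupsPlusVertexBuffers(
  bindGroupLayoutCount: number,
  buffers: readonly VertexBufferLayoutLike[],
  maxBindGroupsPlusVertexBuffers: number,
): boolean {
  return bindGroupLayoutCount + vertexBufferCount(buffers) <= maxBindGroupsPlusVertexBuffers;
}
```

Trailing `null` entries do not contribute to the count, while `null` holes before the last used slot do, which matches the "index of the last non-null entry, plus 1" phrasing.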
1. If any [=pipeline-creation error|pipeline-creation=] [=uncategorized errors=] result from the implementation of pipeline creation, - [$generate an internal error$], [$invalidate$] |pipeline|, and stop. + [$generate an internal error$], [$invalidate$] |pipeline| and return. Note: Even if the implementation detected [=uncategorized errors=] in shader module From 1eab553f90fb5ac3bd348316f777087ffcdb0fbb Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Tue, 10 Sep 2024 16:41:01 -0700 Subject: [PATCH 188/285] [editorial] Use `.[=list/size=]` instead of `.length` (#4861) --- spec/index.bs | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 0c1985d9b3..98d2406c87 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -8527,7 +8527,7 @@ dictionary GPUFragmentState
- [$validating GPUProgrammableStage$]({{GPUShaderStage/FRAGMENT}}, |descriptor|, |layout|, |device|) succeeds. - - |descriptor|.{{GPUFragmentState/targets}}.length must be ≤ + - |descriptor|.{{GPUFragmentState/targets}}.[=list/size=] must be ≤ |device|.{{device/[[limits]]}}.{{supported limits/maxColorAttachments}}. - Let |entryPoint| be [$get the entry point$]({{GPUShaderStage/FRAGMENT}}, |descriptor|). - Let |usesDualSourceBlending| be `false`. @@ -8579,7 +8579,7 @@ dictionary GPUFragmentState - |entryPoint| must have a [=shader stage output=] with [=location=] equal to |index| and [=blend_src=] omitted or equal to 0. - If |usesDualSourceBlending| is `true`: - - |descriptor|.{{GPUFragmentState/targets}}.length must be 1. + - |descriptor|.{{GPUFragmentState/targets}}.[=list/size=] must be 1. - All the [=shader stage outputs=] with [=location=] in |entryPoint| must be in one struct and [=use dual source blending=]. - [$Validating GPUFragmentState's color attachment bytes per sample$](|device|, |descriptor|.{{GPUFragmentState/targets}}) succeeds. @@ -9608,11 +9608,11 @@ dictionary GPUVertexAttribute {
- [$validating GPUProgrammableStage$]({{GPUShaderStage/VERTEX}}, |descriptor|, |layout|, |device|) succeeds. - - |descriptor|.{{GPUVertexState/buffers}}.length is ≤ + - |descriptor|.{{GPUVertexState/buffers}}.[=list/size=] is ≤ |device|.{{GPUObjectBase/[[device]]}}.{{device/[[limits]]}}.{{supported limits/maxVertexBuffers}}. - Each |vertexBuffer| layout descriptor in the list |descriptor|.{{GPUVertexState/buffers}} passes [$validating GPUVertexBufferLayout$](|device|, |vertexBuffer|, |descriptor|) - - The sum of |vertexBuffer|.{{GPUVertexBufferLayout/attributes}}.length, + - The sum of |vertexBuffer|.{{GPUVertexBufferLayout/attributes}}.[=list/size=], over every |vertexBuffer| in |descriptor|.{{GPUVertexState/buffers}}, is ≤ |device|.{{GPUObjectBase/[[device]]}}.{{device/[[limits]]}}.{{supported limits/maxVertexAttributes}}. @@ -10650,7 +10650,7 @@ It must only be included by interfaces which also include those mixins.
- |index| must be < |this|.{{GPUObjectBase/[[device]]}}.{{device/[[limits]]}}.{{supported limits/maxBindGroups}}. - - |dynamicOffsets|.length must equal |dynamicOffsetCount|. + - |dynamicOffsets|.[=list/size=] must equal |dynamicOffsetCount|.
1. If |bindGroup| is `null`: 1. [=map/Remove=] |this|.{{GPUBindingCommandsMixin/[[bind_groups]]}}[|index|]. @@ -11462,7 +11462,7 @@ dictionary GPURenderPassDescriptor Given a {{GPUDevice}} |device| and {{GPURenderPassDescriptor}} |this|, the following validation rules apply: - 1. |this|.{{GPURenderPassDescriptor/colorAttachments}}.length must be ≤ + 1. |this|.{{GPURenderPassDescriptor/colorAttachments}}.[=list/size=] must be ≤ |device|.{{device/[[limits]]}}.{{supported limits/maxColorAttachments}}. 1. For each non-`null` |colorAttachment| in |this|.{{GPURenderPassDescriptor/colorAttachments}}: @@ -12314,7 +12314,7 @@ It must only be included by interfaces which also include those mixins.
1. It |must| be [$valid to draw$] with |this|. 1. Let |buffers| be |this|.{{GPURenderCommandsMixin/[[pipeline]]}}.{{GPURenderPipeline/[[descriptor]]}}.{{GPURenderPipelineDescriptor/vertex}}.{{GPUVertexState/buffers}}. - 1. For each {{GPUIndex32}} |slot| from `0` to |buffers|.length (non-inclusive): + 1. For each {{GPUIndex32}} |slot| from `0` to |buffers|.[=list/size=] (non-inclusive): 1. If |buffers|[|slot|] is `null`, [=iteration/continue=]. 1. Let |bufferSize| be |this|.{{GPURenderCommandsMixin/[[vertex_buffer_sizes]]}}[|slot|]. 1. Let |stride| be |buffers|[|slot|].{{GPUVertexBufferLayout/arrayStride}}. @@ -12384,7 +12384,7 @@ It must only be included by interfaces which also include those mixins. - |firstIndex| + |indexCount| ≤ |this|.{{GPURenderCommandsMixin/[[index_buffer_size]]}} ÷ |this|.{{GPURenderCommandsMixin/[[index_format]]}}'s byte size; - Let |buffers| be |this|.{{GPURenderCommandsMixin/[[pipeline]]}}.{{GPURenderPipeline/[[descriptor]]}}.{{GPURenderPipelineDescriptor/vertex}}.{{GPUVertexState/buffers}}. - - For each {{GPUIndex32}} |slot| from `0` to |buffers|.length (non-inclusive): + - For each {{GPUIndex32}} |slot| from `0` to |buffers|.[=list/size=] (non-inclusive): - If |buffers|[|slot|] is `null`, [=iteration/continue=]. - Let |bufferSize| be |this|.{{GPURenderCommandsMixin/[[vertex_buffer_sizes]]}}[|slot|]. - Let |stride| be |buffers|[|slot|].{{GPUVertexBufferLayout/arrayStride}}. @@ -12591,7 +12591,7 @@ It must only be included by interfaces which also include those mixins. must be `true`. - Let |pipelineDescriptor| be |encoder|.{{GPURenderCommandsMixin/[[pipeline]]}}.{{GPURenderPipeline/[[descriptor]]}}. 
- For each {{GPUIndex32}} |slot| `0` to - |pipelineDescriptor|.{{GPURenderPipelineDescriptor/vertex}}.{{GPUVertexState/buffers}}.length: + |pipelineDescriptor|.{{GPURenderPipelineDescriptor/vertex}}.{{GPUVertexState/buffers}}.[=list/size=]: - If |pipelineDescriptor|.{{GPURenderPipelineDescriptor/vertex}}.{{GPUVertexState/buffers}}[|slot|] is not `null`, |encoder|.{{GPURenderCommandsMixin/[[vertex_buffers]]}} must [=map/contain=] |slot|. - Validate {{supported limits/maxBindGroupsPlusVertexBuffers}}: @@ -13076,7 +13076,7 @@ GPURenderBundleEncoder includes GPURenderCommandsMixin;
- |this| must not be [$invalid|lost$]. - - |descriptor|.{{GPURenderPassLayout/colorFormats}}.length must be ≤ + - |descriptor|.{{GPURenderPassLayout/colorFormats}}.[=list/size=] must be ≤ |this|.{{device/[[limits]]}}.{{supported limits/maxColorAttachments}}. - For each non-`null` |colorFormat| in |descriptor|.{{GPURenderPassLayout/colorFormats}}: - |colorFormat| must be a [=color renderable format=]. @@ -15995,7 +15995,7 @@ integers and single-precision floats. [=Content timeline=] steps: - 1. Throw a {{TypeError}} if |color| is a sequence and |color|.length ≠ 4. + 1. Throw a {{TypeError}} if |color| is a sequence and |color|.[=list/size=] ≠ 4.
@@ -14028,6 +14029,22 @@ interface GPUCanvasContext {
+ : getConfiguration() + :: + Returns the context configuration. + +
+
+ **Called on:** {{GPUCanvasContext}} |this|. + + **Returns:** {{GPUCanvasConfiguration}} + + [=Content timeline=] steps: + + 1. Return |this|.{{GPUCanvasContext/[[configuration]]}}. +
+
+ : getCurrentTexture() :: Get the {{GPUTexture}} that will be composited to the document by the {{GPUCanvasContext}} From 1f32757f1283307a3e1791c33d9bbb70a665fe4e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fran=C3=A7ois=20Beaufort?= Date: Thu, 26 Sep 2024 06:43:01 +0000 Subject: [PATCH 215/285] Revert "Add GPUCanvasContext getConfiguration() method" This reverts commit 513ed5f2a123dbd4351116b4f3c0719d02a59453. --- spec/index.bs | 17 ----------------- 1 file changed, 17 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 0d74406907..26216ad202 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -13887,7 +13887,6 @@ interface GPUCanvasContext { undefined configure(GPUCanvasConfiguration configuration); undefined unconfigure(); - GPUCanvasConfiguration getConfiguration(); GPUTexture getCurrentTexture(); }; @@ -14029,22 +14028,6 @@ interface GPUCanvasContext {
- : getConfiguration() - :: - Returns the context configuration. - -
-
- **Called on:** {{GPUCanvasContext}} |this|. - - **Returns:** {{GPUCanvasConfiguration}} - - [=Content timeline=] steps: - - 1. Return |this|.{{GPUCanvasContext/[[configuration]]}}. -
-
- : getCurrentTexture() :: Get the {{GPUTexture}} that will be composited to the document by the {{GPUCanvasContext}} From 7cffc109d9e971dfa4a950466b798a4cdf3294a7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fran=C3=A7ois=20Beaufort?= Date: Sat, 28 Sep 2024 00:30:49 +0200 Subject: [PATCH 216/285] Add GPUCanvasContext getConfiguration() method (#4899) --- spec/index.bs | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/spec/index.bs b/spec/index.bs index 26216ad202..aef3bc1f13 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -13887,6 +13887,7 @@ interface GPUCanvasContext { undefined configure(GPUCanvasConfiguration configuration); undefined unconfigure(); + GPUCanvasConfiguration? getConfiguration(); GPUTexture getCurrentTexture(); }; @@ -14028,6 +14029,30 @@ interface GPUCanvasContext { + : getConfiguration() + :: + Returns the context configuration. + +
+
+ **Called on:** {{GPUCanvasContext}} |this|. + + **Returns:** {{GPUCanvasConfiguration}} or `null` + + [=Content timeline=] steps: + + 1. Let |configuration| be a copy of |this|.{{GPUCanvasContext/[[configuration]]}}. + 1. Return |configuration|. +
+
+ +
+ When {{GPUCanvasContext/getConfiguration()}} shows that + {{GPUCanvasConfiguration/toneMapping}} is implemented and the '@media/dynamic-range' media + query indicates HDR support, the WebGPU canvas **should** render content using the full + HDR range instead of clamping values to the SDR range of the HDR display. +
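One way an application might act on the note above is sketched here. It is a hedged sketch under stated assumptions: `canUseFullHdrRange` and `CanvasConfigurationLike` are hypothetical names, the configuration object is assumed to be whatever `getConfiguration()` returned (or `null`), and the `hdrDisplay` flag is assumed to come from a check such as `window.matchMedia("(dynamic-range: high)").matches` in a browser.

```typescript
// Minimal local stand-ins for the WebGPU dictionary shapes; only the fields used here.
type ToneMappingModeLike = "standard" | "extended";
interface CanvasConfigurationLike {
  toneMapping?: { mode?: ToneMappingModeLike };
}

// Returns true when the returned configuration reports toneMapping
// and the display is reported as HDR-capable.
function canUseFullHdrRange(
  configuration: CanvasConfigurationLike | null,
  hdrDisplay: boolean,
): boolean {
  if (configuration === null) {
    return false; // context is not configured
  }
  if (configuration.toneMapping === undefined) {
    return false; // toneMapping not reported, treated here as not implemented
  }
  return hdrDisplay;
}
```

Keeping the media query result as a plain boolean parameter keeps the sketch independent of the DOM, so the same logic can be exercised outside a browser.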
+ : getCurrentTexture() :: Get the {{GPUTexture}} that will be composited to the document by the {{GPUCanvasContext}} From 2505300ee589f03018a6d07ca7c7ab55a0b915d6 Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Mon, 30 Sep 2024 14:51:40 -0700 Subject: [PATCH 217/285] Add [[lastPresentedImage]] as fallback for canvas image (#4902) --- spec/index.bs | 69 ++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 52 insertions(+), 17 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index aef3bc1f13..376e9085ed 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -13955,6 +13955,15 @@ interface GPUCanvasContext { [$Expire the current texture$] sets the currentTexture to `null`. It is called by {{GPUCanvasContext/configure()}}, resizing the canvas, presentation, {{OffscreenCanvas/transferToImageBitmap()}}, and others. + + : \[[lastPresentedImage]], of type `(readonly image)?`, initially `null` + :: + The image most recently presented for this canvas in "[$updating the rendering of a WebGPU canvas$]". + If the device is lost or destroyed, this image **may** be used as a fallback in + "[$get a copy of the image contents of a context$]" in order to prevent the canvas from going blank. + + Note: + This property only needs to exist in implementations which implement the fallback, which is optional.
{{GPUCanvasContext}} has the following methods: @@ -13988,8 +13997,7 @@ interface GPUCanvasContext { [$GPUTextureDescriptor for the canvas and configuration$](|this|.{{GPUCanvasContext/canvas}}, |configuration|). 1. Set |this|.{{GPUCanvasContext/[[configuration]]}} to |configuration|. 1. Set |this|.{{GPUCanvasContext/[[textureDescriptor]]}} to |descriptor|. - 1. [$Replace the drawing buffer$] of |this|, which resets - |this|.{{GPUCanvasContext/[[drawingBuffer]]}} with a bitmap with the new format and tags. + 1. [$Replace the drawing buffer$] of |this|. 1. Issue the subsequent steps on the [=Device timeline=] of |device|.
@@ -14070,7 +14078,7 @@ interface GPUCanvasContext { (see [=automatic expiry task source=]). Expiry is only guaranteed when a visible canvas is displayed ([$updating the rendering of a WebGPU canvas$]) and in other - callers of [$Replace the drawing buffer$]. + callers of "[$Expire the current texture$]".
@@ -14125,17 +14133,31 @@ interface GPUCanvasContext { [=Content timeline=] steps: - 1. If |context|.{{GPUCanvasContext/[[configuration]]}} is `null`: - 1. Return a transparent black image of the same size as |context|.{{GPUCanvasContext/canvas}}. + 1. Let |snapshot| be a transparent black image of the same size as |context|.{{GPUCanvasContext/canvas}}. + 1. Let |configuration| be |context|.{{GPUCanvasContext/[[configuration]]}}. + 1. If |configuration| is `null`: + 1. Return |snapshot|. - Note: The {{GPUCanvasContext/[[configuration]]}} will be `null` if the context has not been + Note: The configuration will be `null` if the context has not been configured or has been {{GPUCanvasContext/unconfigure()|unconfigured}}. This is identical to the behavior when the canvas has no context. 1. Ensure that all submitted work items (e.g. queue submissions) have completed writing to the image (via |context|.{{GPUCanvasContext/[[currentTexture]]}}). - 1. Let |snapshot| be a copy of |context|.{{GPUCanvasContext/[[drawingBuffer]]}}. - 1. Let |alphaMode| be |context|.{{GPUCanvasContext/[[configuration]]}}.{{GPUCanvasConfiguration/alphaMode}}. + 1. If |configuration|.{{GPUCanvasConfiguration/device}} is found to be [=valid=]: + + 1. Set |snapshot| to a copy of |context|.{{GPUCanvasContext/[[drawingBuffer]]}}. + + Else, if |context|.{{GPUCanvasContext/[[lastPresentedImage]]}} is not `null`: + + 1. **Optionally**, set |snapshot| to a copy of |context|.{{GPUCanvasContext/[[lastPresentedImage]]}}. + + Note: + This is optional because the {{GPUCanvasContext/[[lastPresentedImage]]}} may no longer exist, + depending on what caused device loss. + Implementations may choose to skip it even if they still have access to that image. + + 1. Let |alphaMode| be |configuration|.{{GPUCanvasConfiguration/alphaMode}}. 1.
: If |alphaMode| is {{GPUCanvasAlphaMode/"opaque"}}: @@ -14145,13 +14167,14 @@ interface GPUCanvasContext { Note: If the {{GPUCanvasContext/[[currentTexture]]}}, if any, has been destroyed - (for example in [$Replace the drawing buffer$]), the alpha channel is unobservable, + (for example in "[$Expire the current texture$]"), the alpha channel is unobservable, and implementations may clear the alpha channel in-place. : Otherwise: :: Tag |snapshot| with |alphaMode|.
- + 1. Tag |snapshot| with the {{GPUCanvasConfiguration/colorSpace}} and + {{GPUCanvasConfiguration/toneMapping}} of |configuration|. 1. Return |snapshot|. @@ -14171,7 +14194,8 @@ interface GPUCanvasContext { In this case, the drawing buffer will remain blank until the context is configured. - If not, the drawing buffer has the specified |configuration|.{{GPUCanvasConfiguration/format}} and is tagged with the specified - |configuration|.{{GPUCanvasConfiguration/colorSpace}}. + |configuration|.{{GPUCanvasConfiguration/colorSpace}} and + |configuration|.{{GPUCanvasConfiguration/toneMapping}}. Note: |configuration|.{{GPUCanvasConfiguration/alphaMode}} is ignored until "[$get a copy of the image contents of a context$]". @@ -14213,8 +14237,9 @@ specified points. This occurs in many places, including: - When an {{HTMLCanvasElement}} has its rendering updated. - - When an {{OffscreenCanvas}} with a [=placeholder canvas element=] has its rendering updated. + - Including when the canvas is the [=placeholder canvas element=] of an {{OffscreenCanvas}}. - When {{OffscreenCanvas/transferToImageBitmap()}} creates an {{ImageBitmap}} from the bitmap. + (See also [$transferToImageBitmap from WebGPU$].) - When WebGPU canvas contents are read using other Web APIs, like {{CanvasDrawImage/drawImage()}}, `texImage2D()`, `texSubImage2D()`, {{HTMLCanvasElement/toDataURL()}}, {{HTMLCanvasElement/toBlob()}}, and so on. @@ -14231,8 +14256,8 @@ specified points.
When updating the rendering of a WebGPU canvas (an {{HTMLCanvasElement}} or an {{OffscreenCanvas}} with a [=placeholder canvas element=]) - with a {{GPUCanvasContext}} |context|, which occurs in the following sub-steps of the - [=event loop processing model=], run the following [=content timeline=] steps: + with a {{GPUCanvasContext}} |context|, which occurs before getting the canvas's image contents, + in the following sub-steps of the [=event loop processing model=]: - "update the rendering or user interface of that `Document`" - "update the rendering of that dedicated worker" @@ -14245,28 +14270,38 @@ specified points. {{OffscreenCanvas}}es from {{HTMLCanvasElement/transferControlToOffscreen()}} [cannot be sent to these workers](https://github.com/whatwg/html/issues/10112). - Run the following steps: + Run the following [=content timeline=] steps: 1. [$Expire the current texture$] of |context|. Note: If this already happened in the task queued by {{GPUCanvasContext/getCurrentTexture()}}, it has no effect. + 1. Set |context|.{{GPUCanvasContext/[[lastPresentedImage]]}} to + |context|.{{GPUCanvasContext/[[drawingBuffer]]}}. + + Note: This is just a reference, not a copy; the drawing buffer's contents can't change + in-place after the current texture has expired. Note: This does not happen for standalone {{OffscreenCanvas}}es (created by `new OffscreenCanvas()`).
-
+
+ transferToImageBitmap from WebGPU: + When {{OffscreenCanvas/transferToImageBitmap()}} is called on a canvas with {{GPUCanvasContext}} |context|, after creating an {{ImageBitmap}} from the canvas's bitmap, run the following [=content timeline=] steps: 1. [$Replace the drawing buffer$] of |context|. - Note: This is equivalent to "moving" the (possibly alpha-cleared) image contents into the + Note: This makes {{OffscreenCanvas/transferToImageBitmap()}} + equivalent to "moving" (and possibly alpha-clearing) the image contents into the ImageBitmap, without a copy.
+- The [$update the canvas size$] algorithm. + ## GPUCanvasConfiguration ## {#canvas-configuration} The supported context formats are the [=set=] of {{GPUTextureFormat}}s: From bbfbc77249d4adea79cf4b897e80c4e867b1c093 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fran=C3=A7ois=20Beaufort?= Date: Wed, 2 Oct 2024 04:32:08 +0000 Subject: [PATCH 218/285] Update toneMapping mode in configure --- spec/index.bs | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/spec/index.bs b/spec/index.bs index 376e9085ed..94a5bff317 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -13996,6 +13996,13 @@ interface GPUCanvasContext { 1. Let |descriptor| be the [$GPUTextureDescriptor for the canvas and configuration$](|this|.{{GPUCanvasContext/canvas}}, |configuration|). 1. Set |this|.{{GPUCanvasContext/[[configuration]]}} to |configuration|. + 1. Set |this|.{{GPUCanvasContext/[[configuration]]}}.{{GPUCanvasConfiguration/toneMapping}}.{{GPUCanvasToneMapping/mode}} + to {{GPUCanvasToneMappingMode/"standard"}} if either of the following conditions is true: + - If |configuration|.{{GPUCanvasConfiguration/toneMapping}} is not provided. + - If |configuration|.{{GPUCanvasConfiguration/toneMapping}}.{{GPUCanvasToneMapping/mode}} + is {{GPUCanvasToneMappingMode/"extended"}} and the user agent does not + support displaying color values in the extended dynamic range of the screen + in a canvas. 1. Set |this|.{{GPUCanvasContext/[[textureDescriptor]]}} to |descriptor|. 1. [$Replace the drawing buffer$] of |this|. 1. Issue the subsequent steps on the [=Device timeline=] of |device|. From 9ce4c1e53d74252b3c7cc2f1727f4910ff457b07 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fran=C3=A7ois=20Beaufort?= Date: Wed, 2 Oct 2024 04:32:45 +0000 Subject: [PATCH 219/285] Revert "Update toneMapping mode in configure" This reverts commit bbfbc77249d4adea79cf4b897e80c4e867b1c093. 
--- spec/index.bs | 7 ------- 1 file changed, 7 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 94a5bff317..376e9085ed 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -13996,13 +13996,6 @@ interface GPUCanvasContext { 1. Let |descriptor| be the [$GPUTextureDescriptor for the canvas and configuration$](|this|.{{GPUCanvasContext/canvas}}, |configuration|). 1. Set |this|.{{GPUCanvasContext/[[configuration]]}} to |configuration|. - 1. Set |this|.{{GPUCanvasContext/[[configuration]]}}.{{GPUCanvasConfiguration/toneMapping}}.{{GPUCanvasToneMapping/mode}} - to {{GPUCanvasToneMappingMode/"standard"}} if either of the following conditions is true: - - If |configuration|.{{GPUCanvasConfiguration/toneMapping}} is not provided. - - If |configuration|.{{GPUCanvasConfiguration/toneMapping}}.{{GPUCanvasToneMapping/mode}} - is {{GPUCanvasToneMappingMode/"extended"}} and the user agent does not - support displaying color values in the extended dynamic range of the screen - in a canvas. 1. Set |this|.{{GPUCanvasContext/[[textureDescriptor]]}} to |descriptor|. 1. [$Replace the drawing buffer$] of |this|. 1. Issue the subsequent steps on the [=Device timeline=] of |device|. 
From 4e1cd5662908867fdbf6d127ce7813de26124826 Mon Sep 17 00:00:00 2001 From: Mehmet Oguz Derin Date: Tue, 8 Oct 2024 02:06:19 +0900 Subject: [PATCH 220/285] Update Tree-sitter version (#4827) * Adjust to new tree-sitter Python system * Point to Dockerfile * Use generate to populate files * Use fork image * Improve Mermaid support * Import wgsl_unit_tests for 't' flow only * Adjust diagrams to new container * Use main repo reference * Remove DrawIO from container json * Remove DrawIO from extensions json * Migrate to C * Fix URL * Fix URL * Update scanner to work with tree-sitter * Complete migration * Match CI --- .github/workflows/build-push-custom-image.yml | 3 +- .gitignore | 24 +- spec/img/buffer-map-failure.mmd.svg | 2 +- spec/img/buffer-map-unmap.mmd.svg | 2 +- tools/custom-action/Dockerfile | 30 +- tools/custom-action/dependency-versions.sh | 7 +- tools/install-dependencies.sh | 2 +- wgsl/Makefile | 10 +- wgsl/grammar/.gitignore | 39 + wgsl/grammar/Cargo.toml | 26 + wgsl/grammar/Makefile | 114 ++ wgsl/grammar/Package.swift | 61 + wgsl/grammar/README.md | 3 + wgsl/grammar/binding.gyp | 31 + .../grammar/bindings/c/tree-sitter-wgsl.pc.in | 11 + wgsl/grammar/bindings/go/binding.go | 14 + .../python/tree_sitter_wgsl/__init__.py | 42 + wgsl/grammar/bindings/rust/build.rs | 20 + wgsl/grammar/go.mod | 5 + wgsl/grammar/package.json | 57 + wgsl/grammar/pyproject.toml | 29 + wgsl/grammar/setup.py | 63 + .../scanner.cc => grammar/src/scanner.c} | 1153 +++++++++-------- wgsl/tools/extract-grammar.py | 93 +- wgsl/tools/wgsl_unit_tests.py | 8 +- 25 files changed, 1220 insertions(+), 629 deletions(-) create mode 100644 wgsl/grammar/.gitignore create mode 100644 wgsl/grammar/Cargo.toml create mode 100644 wgsl/grammar/Makefile create mode 100644 wgsl/grammar/Package.swift create mode 100644 wgsl/grammar/README.md create mode 100644 wgsl/grammar/binding.gyp create mode 100644 wgsl/grammar/bindings/c/tree-sitter-wgsl.pc.in create mode 100644 wgsl/grammar/bindings/go/binding.go 
create mode 100644 wgsl/grammar/bindings/python/tree_sitter_wgsl/__init__.py create mode 100644 wgsl/grammar/bindings/rust/build.rs create mode 100644 wgsl/grammar/go.mod create mode 100644 wgsl/grammar/package.json create mode 100644 wgsl/grammar/pyproject.toml create mode 100644 wgsl/grammar/setup.py rename wgsl/{tools/scanner.cc => grammar/src/scanner.c} (52%) diff --git a/.github/workflows/build-push-custom-image.yml b/.github/workflows/build-push-custom-image.yml index dd1356b53d..f87c067f9f 100644 --- a/.github/workflows/build-push-custom-image.yml +++ b/.github/workflows/build-push-custom-image.yml @@ -51,7 +51,8 @@ jobs: - name: Build and push Docker image uses: docker/build-push-action@v5.1.0 with: - context: tools/custom-action + context: . + file: tools/custom-action/Dockerfile platforms: linux/amd64,linux/arm64 push: true tags: ${{ steps.meta.outputs.tags }} diff --git a/.gitignore b/.gitignore index f3bb780865..6089155d43 100644 --- a/.gitignore +++ b/.gitignore @@ -2,7 +2,29 @@ out/ spec/index.html spec/index.pre.html spec/webgpu.idl -wgsl/grammar/ +wgsl/grammar/** +!wgsl/grammar/bindings +!wgsl/grammar/bindings/c +!wgsl/grammar/bindings/c/tree-sitter-wgsl.pc.in +!wgsl/grammar/bindings/go +!wgsl/grammar/bindings/go/binding.go +!wgsl/grammar/bindings/python +!wgsl/grammar/bindings/python/tree_sitter_wgsl +!wgsl/grammar/bindings/python/tree_sitter_wgsl/__init__.py +!wgsl/grammar/bindings/rust +!wgsl/grammar/bindings/rust/build.rs +!wgsl/grammar/src +!wgsl/grammar/src/scanner.c +!wgsl/grammar/.gitignore +!wgsl/grammar/binding.gyp +!wgsl/grammar/Cargo.toml +!wgsl/grammar/go.mod +!wgsl/grammar/Makefile +!wgsl/grammar/package.json +!wgsl/grammar/Package.swift +!wgsl/grammar/pyproject.toml +!wgsl/grammar/README.md +!wgsl/grammar/setup.py wgsl/index.html wgsl/index.pre.html wgsl/index.bs.pre diff --git a/spec/img/buffer-map-failure.mmd.svg b/spec/img/buffer-map-failure.mmd.svg index 6ef16fba50..ae1f621b11 100644 --- a/spec/img/buffer-map-failure.mmd.svg +++ 
b/spec/img/buffer-map-failure.mmd.svg @@ -1 +1 @@ -[single-line SVG; markup lost in extraction. Sequence diagram between "Device timeline" and "Content timeline". State labels: [[mapping]] is null, [[pending_map]] is null, [[internal state]] is "available"; [[mapping]] is null, [[pending_map]] is non-null; (failure, state unchanged); [[mapping]] is null, [[pending_map]] is null. Messages: mapAsync(), mapAsync() response.] \ No newline at end of file +[regenerated single-line SVG with the same text labels] \ No newline at end of file diff --git a/spec/img/buffer-map-unmap.mmd.svg b/spec/img/buffer-map-unmap.mmd.svg index 0baaabedcb..cc3f2cc76c 100644 --- a/spec/img/buffer-map-unmap.mmd.svg +++ b/spec/img/buffer-map-unmap.mmd.svg @@ -1 +1 @@ -[single-line SVG; markup lost in extraction. Sequence diagram between "Device timeline" and "Content timeline". State labels: [[mapping]] is null, [[pending_map]] is null, [[internal state]] is "available"; [[mapping]] is null, [[pending_map]] is non-null, [[internal state]] is "unavailable"; [[internal state]] is "unavailable", [[mapping]] is non-null, [[pending_map]] is null; [[mapping]] is null, [[pending_map]] is null, [[internal state]] is "available". Messages: mapAsync(), mapAsync() response, unmap().] \ No newline at end of file +[regenerated single-line SVG with the same text labels] \ No newline at end of file diff --git a/tools/custom-action/Dockerfile b/tools/custom-action/Dockerfile index 8c25e3257a..5f52107bca 100644 --- a/tools/custom-action/Dockerfile +++ b/tools/custom-action/Dockerfile @@ -1,4 +1,4 @@ -FROM ubuntu:23.10 +FROM
debian:12.6 ENV LANG=en_US.UTF-8 @@ -7,8 +7,10 @@ RUN \ apt update -y && \ apt install -y locales && \ locale-gen en_US.UTF-8 && \ - sysctl -w kernel.unprivileged_userns_clone=1 && \ apt install -y \ + build-essential \ + clang \ + chromium \ nodejs \ npm \ git \ @@ -25,28 +27,28 @@ RUN \ SHELL ["/bin/bash", "-c"] -COPY entrypoint.sh prepare.sh dependency-versions.sh / +COPY tools/custom-action/entrypoint.sh tools/custom-action/prepare.sh tools/custom-action/dependency-versions.sh / + +COPY wgsl/grammar /grammar + +RUN \ + update-alternatives --install /usr/bin/cc cc /usr/bin/clang 100 && \ + update-alternatives --install /usr/bin/c++ c++ /usr/bin/clang++ 100 + +ENV CC=clang +ENV CXX=clang++ RUN \ chmod +x /entrypoint.sh /prepare.sh /dependency-versions.sh && \ source /dependency-versions.sh && \ python3 -m pip install --break-system-packages \ + build==$PIP_BUILD_VERSION \ tree_sitter==$PIP_TREE_SITTER_VERSION && \ npm install -g @mermaid-js/mermaid-cli@$NPM_MERMAID_CLI_VERSION && \ npm install -g tree-sitter-cli@$NPM_TREE_SITTER_CLI_VERSION && \ node "/usr/local/lib/node_modules/@mermaid-js/mermaid-cli/node_modules/puppeteer/install.js" && \ - mkdir /grammar && \ - echo "{" > /grammar/package.json && \ - echo " \"name\": \"tree-sitter-wgsl\"," >> /grammar/package.json && \ - echo " \"dependencies\": {" >> /grammar/package.json && \ - echo " \"nan\": \"$NPM_NAN_VERSION\"" >> /grammar/package.json && \ - echo " }," >> /grammar/package.json && \ - echo " \"devDependencies\": {" >> /grammar/package.json && \ - echo " \"tree-sitter-cli\": \"$NPM_TREE_SITTER_CLI_VERSION\"" >> /grammar/package.json && \ - echo " }," >> /grammar/package.json && \ - echo " \"main\": \"bindings/node\"" >> /grammar/package.json && \ - echo "}" >> /grammar/package.json && \ cd /grammar && \ + tree-sitter generate && \ npm install ENTRYPOINT [ "/entrypoint.sh" ] diff --git a/tools/custom-action/dependency-versions.sh b/tools/custom-action/dependency-versions.sh index 31adb0a5df..868deba65b 
100644 --- a/tools/custom-action/dependency-versions.sh +++ b/tools/custom-action/dependency-versions.sh @@ -1,4 +1,5 @@ -export PIP_TREE_SITTER_VERSION=0.20.4 -export NPM_MERMAID_CLI_VERSION=10.6.1 +export PIP_BUILD_VERSION=1.2.1 +export PIP_TREE_SITTER_VERSION=0.22.3 +export NPM_MERMAID_CLI_VERSION=10.9.1 export NPM_NAN_VERSION=2.15.0 -export NPM_TREE_SITTER_CLI_VERSION=0.20.8 +export NPM_TREE_SITTER_CLI_VERSION=0.22.6 diff --git a/tools/install-dependencies.sh b/tools/install-dependencies.sh index 56ffd979be..56091803d3 100755 --- a/tools/install-dependencies.sh +++ b/tools/install-dependencies.sh @@ -12,7 +12,7 @@ for opt in "$@"; do code=0 ;; wgsl) - python3 -m pip install tree_sitter==$PIP_TREE_SITTER_VERSION + python3 -m pip install --upgrade build==$PIP_BUILD_VERSION tree_sitter==$PIP_TREE_SITTER_VERSION code=0 ;; diagrams) diff --git a/wgsl/Makefile b/wgsl/Makefile index c45b1cfb25..eaa7854e96 100644 --- a/wgsl/Makefile +++ b/wgsl/Makefile @@ -11,7 +11,7 @@ clean: # Generate spec HTML from Bikeshed source. 
-WGSL_SOURCES:=index.bs ./tools/scanner.cc wgsl.recursive.bs.include wgsl.reserved.bs.include $(wildcard syntax/*.syntax.bs.include) $(wildcard img/*.svg) +WGSL_SOURCES:=index.bs ./grammar/src/scanner.c wgsl.recursive.bs.include wgsl.reserved.bs.include $(wildcard syntax/*.syntax.bs.include) $(wildcard img/*.svg) index.pre.html: $(WGSL_SOURCES) DIE_ON=everything bash ../tools/invoke-bikeshed.sh $@ $(WGSL_SOURCES) @@ -27,14 +27,14 @@ img/%.mmd.svg: diagrams/%.mmd ../tools/invoke-mermaid.sh ../tools/mermaid.json bash ../tools/invoke-mermaid.sh -i $< -o $@ TREESITTER_GRAMMAR_INPUT := grammar/grammar.js -TREESITTER_PARSER := grammar/build/wgsl.so +TREESITTER_PARSER := grammar/src/scanner.o # Extract WGSL grammar from the spec -$(TREESITTER_GRAMMAR_INPUT): index.bs ./tools/scanner.cc ./tools/extract-grammar.py - source ../tools/custom-action/dependency-versions.sh && python3 ./tools/extract-grammar.py --spec index.bs --scanner ./tools/scanner.cc --tree-sitter-dir grammar --flow x +$(TREESITTER_GRAMMAR_INPUT): index.bs ./grammar/src/scanner.c ./tools/extract-grammar.py + source ../tools/custom-action/dependency-versions.sh && python3 ./tools/extract-grammar.py --spec index.bs --scanner ./grammar/src/scanner.c --tree-sitter-dir grammar --flow x # Build a Treesitter parser to validate grammar extract and later examples in spec $(TREESITTER_PARSER): $(TREESITTER_GRAMMAR_INPUT) - source ../tools/custom-action/dependency-versions.sh && python3 ./tools/extract-grammar.py --spec index.bs --scanner ./tools/scanner.cc --tree-sitter-dir grammar --flow b + source ../tools/custom-action/dependency-versions.sh && python3 ./tools/extract-grammar.py --spec index.bs --scanner ./grammar/src/scanner.c --tree-sitter-dir grammar --flow b .PHONY: validate-examples # Use Treesitter to parse many code examples in the spec. 
diff --git a/wgsl/grammar/.gitignore b/wgsl/grammar/.gitignore new file mode 100644 index 0000000000..dd5cc848e4 --- /dev/null +++ b/wgsl/grammar/.gitignore @@ -0,0 +1,39 @@ +# Rust artifacts +Cargo.lock +target/ + +# Node artifacts +build/ +prebuilds/ +node_modules/ +*.tgz + +# Swift artifacts +.build/ +Package.resolved + +# Go artifacts +go.sum +_obj/ + +# Python artifacts +.venv/ +dist/ +*.egg-info +*.whl + +# C artifacts +*.a +*.so +*.so.* +*.dylib +*.dll +*.pc + +# Example dirs +/examples/*/ + +# Grammar volatiles +*.wasm +*.obj +*.o diff --git a/wgsl/grammar/Cargo.toml b/wgsl/grammar/Cargo.toml new file mode 100644 index 0000000000..6620371aa4 --- /dev/null +++ b/wgsl/grammar/Cargo.toml @@ -0,0 +1,26 @@ +[package] +name = "tree-sitter-wgsl" +description = "WGSL grammar for tree-sitter" +version = "0.0.7" +license = "BSD-3-Clause" +readme = "README.md" +keywords = ["incremental", "parsing", "tree-sitter", "wgsl"] +categories = ["parsing", "text-editors"] +repository = "https://github.com/gpuweb/tree-sitter-wgsl" +edition = "2021" +autoexamples = false + +build = "bindings/rust/build.rs" +include = ["bindings/rust/*", "grammar.js", "queries/*", "src/*"] + +[lib] +path = "bindings/rust/lib.rs" + +[dependencies] +tree-sitter-language = "0.1" + +[dev-dependencies] +tree-sitter = { version = "0.22" } + +[build-dependencies] +cc = "1.0.87" diff --git a/wgsl/grammar/Makefile b/wgsl/grammar/Makefile new file mode 100644 index 0000000000..5de5e849d4 --- /dev/null +++ b/wgsl/grammar/Makefile @@ -0,0 +1,114 @@ +ifeq ($(OS),Windows_NT) +$(error Windows is not supported) +endif + +VERSION := 0.0.7 + +LANGUAGE_NAME := tree-sitter-wgsl + +# repository +SRC_DIR := src + +PARSER_REPO_URL := $(shell git -C $(SRC_DIR) remote get-url origin 2>/dev/null) + +ifeq ($(PARSER_URL),) + PARSER_URL := $(subst .git,,$(PARSER_REPO_URL)) +ifeq ($(shell echo $(PARSER_URL) | grep '^[a-z][-+.0-9a-z]*://'),) + PARSER_URL := $(subst :,/,$(PARSER_URL)) + PARSER_URL := $(subst 
git@,https://,$(PARSER_URL)) +endif +endif + +TS ?= tree-sitter + +# install directory layout +PREFIX ?= /usr/local +INCLUDEDIR ?= $(PREFIX)/include +LIBDIR ?= $(PREFIX)/lib +PCLIBDIR ?= $(LIBDIR)/pkgconfig + +# source/object files +PARSER := $(SRC_DIR)/parser.c +EXTRAS := $(filter-out $(PARSER),$(wildcard $(SRC_DIR)/*.c)) +OBJS := $(patsubst %.c,%.o,$(PARSER) $(EXTRAS)) + +# flags +ARFLAGS ?= rcs +override CFLAGS += -I$(SRC_DIR) -std=c11 -fPIC + +# ABI versioning +SONAME_MAJOR := 1 +SONAME_MINOR := 4 + +# OS-specific bits +ifeq ($(shell uname),Darwin) + SOEXT = dylib + SOEXTVER_MAJOR = $(SONAME_MAJOR).$(SOEXT) + SOEXTVER = $(SONAME_MAJOR).$(SONAME_MINOR).$(SOEXT) + LINKSHARED := $(LINKSHARED)-dynamiclib -Wl, + ifneq ($(ADDITIONAL_LIBS),) + LINKSHARED := $(LINKSHARED)$(ADDITIONAL_LIBS), + endif + LINKSHARED := $(LINKSHARED)-install_name,$(LIBDIR)/lib$(LANGUAGE_NAME).$(SOEXTVER),-rpath,@executable_path/../Frameworks +else + SOEXT = so + SOEXTVER_MAJOR = $(SOEXT).$(SONAME_MAJOR) + SOEXTVER = $(SOEXT).$(SONAME_MAJOR).$(SONAME_MINOR) + LINKSHARED := $(LINKSHARED)-shared -Wl, + ifneq ($(ADDITIONAL_LIBS),) + LINKSHARED := $(LINKSHARED)$(ADDITIONAL_LIBS) + endif + LINKSHARED := $(LINKSHARED)-soname,lib$(LANGUAGE_NAME).$(SOEXTVER) +endif +ifneq ($(filter $(shell uname),FreeBSD NetBSD DragonFly),) + PCLIBDIR := $(PREFIX)/libdata/pkgconfig +endif + +all: lib$(LANGUAGE_NAME).a lib$(LANGUAGE_NAME).$(SOEXT) $(LANGUAGE_NAME).pc + +lib$(LANGUAGE_NAME).a: $(OBJS) + $(AR) $(ARFLAGS) $@ $^ + +lib$(LANGUAGE_NAME).$(SOEXT): $(OBJS) + $(CC) $(LDFLAGS) $(LINKSHARED) $^ $(LDLIBS) -o $@ +ifneq ($(STRIP),) + $(STRIP) $@ +endif + +$(LANGUAGE_NAME).pc: bindings/c/$(LANGUAGE_NAME).pc.in + sed -e 's|@URL@|$(PARSER_URL)|' \ + -e 's|@VERSION@|$(VERSION)|' \ + -e 's|@LIBDIR@|$(LIBDIR)|' \ + -e 's|@INCLUDEDIR@|$(INCLUDEDIR)|' \ + -e 's|@REQUIRES@|$(REQUIRES)|' \ + -e 's|@ADDITIONAL_LIBS@|$(ADDITIONAL_LIBS)|' \ + -e 's|=$(PREFIX)|=$${prefix}|' \ + -e 's|@PREFIX@|$(PREFIX)|' $< > $@ + +$(PARSER): 
$(SRC_DIR)/grammar.json + $(TS) generate --no-bindings $^ + +install: all + install -d '$(DESTDIR)$(INCLUDEDIR)'/tree_sitter '$(DESTDIR)$(PCLIBDIR)' '$(DESTDIR)$(LIBDIR)' + install -m644 bindings/c/$(LANGUAGE_NAME).h '$(DESTDIR)$(INCLUDEDIR)'/tree_sitter/$(LANGUAGE_NAME).h + install -m644 $(LANGUAGE_NAME).pc '$(DESTDIR)$(PCLIBDIR)'/$(LANGUAGE_NAME).pc + install -m644 lib$(LANGUAGE_NAME).a '$(DESTDIR)$(LIBDIR)'/lib$(LANGUAGE_NAME).a + install -m755 lib$(LANGUAGE_NAME).$(SOEXT) '$(DESTDIR)$(LIBDIR)'/lib$(LANGUAGE_NAME).$(SOEXTVER) + ln -sf lib$(LANGUAGE_NAME).$(SOEXTVER) '$(DESTDIR)$(LIBDIR)'/lib$(LANGUAGE_NAME).$(SOEXTVER_MAJOR) + ln -sf lib$(LANGUAGE_NAME).$(SOEXTVER_MAJOR) '$(DESTDIR)$(LIBDIR)'/lib$(LANGUAGE_NAME).$(SOEXT) + +uninstall: + $(RM) '$(DESTDIR)$(LIBDIR)'/lib$(LANGUAGE_NAME).a \ + '$(DESTDIR)$(LIBDIR)'/lib$(LANGUAGE_NAME).$(SOEXTVER) \ + '$(DESTDIR)$(LIBDIR)'/lib$(LANGUAGE_NAME).$(SOEXTVER_MAJOR) \ + '$(DESTDIR)$(LIBDIR)'/lib$(LANGUAGE_NAME).$(SOEXT) \ + '$(DESTDIR)$(INCLUDEDIR)'/tree_sitter/$(LANGUAGE_NAME).h \ + '$(DESTDIR)$(PCLIBDIR)'/$(LANGUAGE_NAME).pc + +clean: + $(RM) $(OBJS) $(LANGUAGE_NAME).pc lib$(LANGUAGE_NAME).a lib$(LANGUAGE_NAME).$(SOEXT) + +test: + $(TS) test + +.PHONY: all install uninstall clean test diff --git a/wgsl/grammar/Package.swift b/wgsl/grammar/Package.swift new file mode 100644 index 0000000000..8d3f747442 --- /dev/null +++ b/wgsl/grammar/Package.swift @@ -0,0 +1,61 @@ +// swift-tools-version:5.4 +import PackageDescription + +let package = Package( + name: "TreeSitterWgsl", + products: [ + .library(name: "TreeSitterWgsl", targets: ["TreeSitterWgsl"]), + ], + dependencies: [ + .package(url: "https://github.com/ChimeHQ/SwiftTreeSitter", from: "0.8.0"), + ], + targets: [ + .target( + name: "TreeSitterWgsl", + dependencies: [], + path: ".", + exclude: [ + "Cargo.toml", + "Makefile", + "binding.gyp", + "bindings/c", + "bindings/go", + "bindings/node", + "bindings/python", + "bindings/rust", + "prebuilds", + "grammar.js", + 
"package.json", + "package-lock.json", + "pyproject.toml", + "setup.py", + "test", + "examples", + ".editorconfig", + ".github", + ".gitignore", + ".gitattributes", + ".gitmodules", + ], + sources: [ + "src/parser.c", + // NOTE: if your language has an external scanner, add it here. + "src/scanner.c", + ], + resources: [ + .copy("queries") + ], + publicHeadersPath: "bindings/swift", + cSettings: [.headerSearchPath("src")] + ), + .testTarget( + name: "TreeSitterWgslTests", + dependencies: [ + "SwiftTreeSitter", + "TreeSitterWgsl", + ], + path: "bindings/swift/TreeSitterWgslTests" + ) + ], + cLanguageStandard: .c11 +) diff --git a/wgsl/grammar/README.md b/wgsl/grammar/README.md new file mode 100644 index 0000000000..a57b2aa79a --- /dev/null +++ b/wgsl/grammar/README.md @@ -0,0 +1,3 @@ +This directory contains tree-sitter grammar files with necessary modifications to build to validate WebGPU Shading Language (WGSL) grammar and also check the code snippets in the specification text. + +Any changes to the content of this directory should be copied to the tree-sitter-wgsl repository: https://github.com/gpuweb/tree-sitter-wgsl diff --git a/wgsl/grammar/binding.gyp b/wgsl/grammar/binding.gyp new file mode 100644 index 0000000000..ecb5f6a397 --- /dev/null +++ b/wgsl/grammar/binding.gyp @@ -0,0 +1,31 @@ +{ + "targets": [ + { + "target_name": "tree_sitter_wgsl_binding", + "dependencies": [ + "=42", "wheel"] +build-backend = "setuptools.build_meta" + +[project] +name = "tree-sitter-wgsl" +description = "WGSL grammar for tree-sitter" +version = "0.0.7" +keywords = ["incremental", "parsing", "tree-sitter", "wgsl"] +classifiers = [ + "Intended Audience :: Developers", + "License :: OSI Approved :: The 3-Clause BSD License (BSD-3-Clause)", + "Topic :: Software Development :: Compilers", + "Topic :: Text Processing :: Linguistic", + "Typing :: Typed" +] +requires-python = ">=3.9" +license.text = "BSD-3-Clause" +readme = "README.md" + +[project.urls] +Homepage = 
"https://github.com/gpuweb/tree-sitter-wgsl" + +[project.optional-dependencies] +core = ["tree-sitter~=0.22"] + +[tool.cibuildwheel] +build = "cp39-*" +build-frontend = "build" diff --git a/wgsl/grammar/setup.py b/wgsl/grammar/setup.py new file mode 100644 index 0000000000..ab568ba74b --- /dev/null +++ b/wgsl/grammar/setup.py @@ -0,0 +1,63 @@ +from os.path import isdir, join +from platform import system + +from setuptools import Extension, find_packages, setup +from setuptools.command.build import build +from wheel.bdist_wheel import bdist_wheel + + +class Build(build): + def run(self): + if isdir("queries"): + dest = join(self.build_lib, "tree_sitter_wgsl", "queries") + self.copy_tree("queries", dest) + super().run() + + +class BdistWheel(bdist_wheel): + def get_tag(self): + python, abi, platform = super().get_tag() + if python.startswith("cp"): + python, abi = "cp39", "abi3" + return python, abi, platform + + +setup( + packages=find_packages("bindings/python"), + package_dir={"": "bindings/python"}, + package_data={ + "tree_sitter_wgsl": ["*.pyi", "py.typed"], + "tree_sitter_wgsl.queries": ["*.scm"], + }, + ext_package="tree_sitter_wgsl", + ext_modules=[ + Extension( + name="_binding", + sources=[ + "bindings/python/tree_sitter_wgsl/binding.c", + "src/parser.c", + # NOTE: if your language uses an external scanner, add it here. 
+ "src/scanner.c", + ], + extra_compile_args=[ + "-std=c11", + "-fvisibility=hidden", + ] if system() != "Windows" else [ + "/std:c11", + "/utf-8", + ], + define_macros=[ + ("Py_LIMITED_API", "0x03090000"), + ("PY_SSIZE_T_CLEAN", None), + ("TREE_SITTER_HIDE_SYMBOLS", None), + ], + include_dirs=["src"], + py_limited_api=True, + ) + ], + cmdclass={ + "build": Build, + "bdist_wheel": BdistWheel + }, + zip_safe=False +) diff --git a/wgsl/tools/scanner.cc b/wgsl/grammar/src/scanner.c similarity index 52% rename from wgsl/tools/scanner.cc rename to wgsl/grammar/src/scanner.c index 24c1d12026..084b3fc7ba 100644 --- a/wgsl/tools/scanner.cc +++ b/wgsl/grammar/src/scanner.c @@ -1,12 +1,11 @@ -#include +#include "tree_sitter/parser.h" +#include +#include +#include +#include +#include +#include #include -#include -#include -#include -#include -#include -#include -#include #define ENABLE_LOGGING 0 @@ -16,82 +15,100 @@ #define LOG(...) #endif -namespace { - /// The possible external tokens matched by this custom scanner. /// The order of the entries in this enumerator must match the 'externals' in /// the grammar.js. enum Token { BLOCK_COMMENT, - DISAMBIGUATE_TEMPLATE, // A zero-length token used to scan ahead + DISAMBIGUATE_TEMPLATE, // A zero-length token used to scan ahead TEMPLATE_ARGS_START, TEMPLATE_ARGS_END, - LESS_THAN, // '<' - LESS_THAN_EQUAL, // '<=' - SHIFT_LEFT, // '<<' - SHIFT_LEFT_ASSIGN, // '<<=' - GREATER_THAN, // '>' - GREATER_THAN_EQUAL, // '>=' - SHIFT_RIGHT, // '>>' - SHIFT_RIGHT_ASSIGN, // '>>=' + LESS_THAN, // '<' + LESS_THAN_EQUAL, // '<=' + SHIFT_LEFT, // '<<' + SHIFT_LEFT_ASSIGN, // '<<=' + GREATER_THAN, // '>' + GREATER_THAN_EQUAL, // '>=' + SHIFT_RIGHT, // '>>' + SHIFT_RIGHT_ASSIGN, // '>>=' // A sentinel value used to signal an error has occurred already. 
// https://tree-sitter.github.io/tree-sitter/creating-parsers#other-external-scanner-details ERROR, }; -const char* str(Token tok,bool brief=false) { +static const char *tree_sitter_wgsl_str(enum Token tok, bool brief) { switch (tok) { - case Token::BLOCK_COMMENT: - return "BLOCK_COMMENT"; - case Token::DISAMBIGUATE_TEMPLATE: - return "DISAMBIGUATE_TEMPLATE"; - case Token::TEMPLATE_ARGS_START: - return "TEMPLATE_ARGS_START"; - case Token::TEMPLATE_ARGS_END: - return "TEMPLATE_ARGS_END"; - case Token::LESS_THAN: - return brief ? "<" : "LESS_THAN"; - case Token::LESS_THAN_EQUAL: - return brief ? "<=" : "LESS_THAN_EQUAL"; - case Token::SHIFT_LEFT: - return brief ? "<<" : "SHIFT_LEFT"; - case Token::SHIFT_LEFT_ASSIGN: - return brief ? "<<=" : "SHIFT_LEFT_ASSIGN"; - case Token::GREATER_THAN: - return brief ? ">" : "GREATER_THAN"; - case Token::GREATER_THAN_EQUAL: - return brief ? ">=" : "GREATER_THAN_EQUAL"; - case Token::SHIFT_RIGHT: - return brief ? ">>" : "SHIFT_RIGHT"; - case Token::SHIFT_RIGHT_ASSIGN: - return brief ? ">>=" : "SHIFT_RIGHT_ASSIGN"; - case Token::ERROR: - return "ERROR"; - default: - return ""; + case BLOCK_COMMENT: + return "BLOCK_COMMENT"; + case DISAMBIGUATE_TEMPLATE: + return "DISAMBIGUATE_TEMPLATE"; + case TEMPLATE_ARGS_START: + return "TEMPLATE_ARGS_START"; + case TEMPLATE_ARGS_END: + return "TEMPLATE_ARGS_END"; + case LESS_THAN: + return brief ? "<" : "LESS_THAN"; + case LESS_THAN_EQUAL: + return brief ? "<=" : "LESS_THAN_EQUAL"; + case SHIFT_LEFT: + return brief ? "<<" : "SHIFT_LEFT"; + case SHIFT_LEFT_ASSIGN: + return brief ? "<<=" : "SHIFT_LEFT_ASSIGN"; + case GREATER_THAN: + return brief ? ">" : "GREATER_THAN"; + case GREATER_THAN_EQUAL: + return brief ? ">=" : "GREATER_THAN_EQUAL"; + case SHIFT_RIGHT: + return brief ? ">>" : "SHIFT_RIGHT"; + case SHIFT_RIGHT_ASSIGN: + return brief ? 
">>=" : "SHIFT_RIGHT_ASSIGN"; + case ERROR: + return "ERROR"; + default: + return ""; } } -using CodePoint = uint32_t; +typedef uint32_t CodePoint; -static constexpr CodePoint kEOF = 0; +static const CodePoint kEOF = 0; -struct CodePointRange { - CodePoint first; // First code point in the interval - CodePoint last; // Last code point in the interval (inclusive) -}; +typedef struct { + CodePoint first; // First code point in the interval + CodePoint last; // Last code point in the interval (inclusive) +} CodePointRange; -inline bool operator<(CodePoint code_point, CodePointRange range) { +static bool code_point_less_than(CodePoint code_point, CodePointRange range) { return code_point < range.first; } -inline bool operator<(CodePointRange range, CodePoint code_point) { +static bool range_less_than(CodePointRange range, CodePoint code_point) { return range.last < code_point; } +/* Implement C++ std::binary_search using C */ +static bool binary_search(const CodePointRange *ranges, size_t num_ranges, + CodePoint code_point) { + size_t left = 0; + size_t right = num_ranges; + + while (left < right) { + size_t mid = left + (right - left) / 2; + if (range_less_than(ranges[mid], code_point)) { + left = mid + 1; + } else if (code_point_less_than(code_point, ranges[mid])) { + right = mid; + } else { + return true; + } + } + + return false; +} + // Interval ranges of all code points in the Unicode 14 XID_Start set // This array needs to be in ascending order. 
-constexpr CodePointRange kXIDStartRanges[] = { +static const CodePointRange kXIDStartRanges[] = { {0x00041, 0x0005a}, {0x00061, 0x0007a}, {0x000aa, 0x000aa}, {0x000b5, 0x000b5}, {0x000ba, 0x000ba}, {0x000c0, 0x000d6}, {0x000d8, 0x000f6}, {0x000f8, 0x002c1}, {0x002c6, 0x002d1}, @@ -314,13 +331,13 @@ constexpr CodePointRange kXIDStartRanges[] = { }; // Number of ranges in kXIDStartRanges -constexpr size_t kNumXIDStartRanges = +const size_t kNumXIDStartRanges = sizeof(kXIDStartRanges) / sizeof(kXIDStartRanges[0]); // The additional code point interval ranges for the Unicode 14 XID_Continue // set. This extends the values in kXIDStartRanges. // This array needs to be in ascending order. -constexpr CodePointRange kXIDContinueRanges[] = { +static const CodePointRange kXIDContinueRanges[] = { {0x00030, 0x00039}, {0x0005f, 0x0005f}, {0x000b7, 0x000b7}, {0x00300, 0x0036f}, {0x00387, 0x00387}, {0x00483, 0x00487}, {0x00591, 0x005bd}, {0x005bf, 0x005bf}, {0x005c1, 0x005c2}, @@ -445,12 +462,12 @@ constexpr CodePointRange kXIDContinueRanges[] = { }; // Number of ranges in kXIDContinueRanges -constexpr size_t kNumXIDContinueRanges = +const size_t kNumXIDContinueRanges = sizeof(kXIDContinueRanges) / sizeof(kXIDContinueRanges[0]); /// @param code_point the input code_point /// @return true if the code_point is part of the XIDStart unicode set -bool is_xid_start(CodePoint code_point) { +static bool is_xid_start(CodePoint code_point) { // Fast path for ASCII. 
if ((code_point >= 'a' && code_point <= 'z') || (code_point >= 'A' && code_point <= 'Z')) { @@ -462,603 +479,683 @@ bool is_xid_start(CodePoint code_point) { if (code_point < 0x000aa) { return false; } - return std::binary_search(kXIDStartRanges, - kXIDStartRanges + kNumXIDStartRanges, code_point); + return binary_search(kXIDStartRanges, kNumXIDStartRanges, code_point); } /// @param code_point the input code_point /// @return true if the code_point is part of the XIDContinue unicode set -bool is_xid_continue(CodePoint code_point) { +static bool is_xid_continue(CodePoint code_point) { // Short circuit ASCII. The binary search will find these last, but most // of our current source is ASCII, so handle them quicker. if ((code_point >= '0' && code_point <= '9') || code_point == '_') { return true; } return is_xid_start(code_point) || - std::binary_search(kXIDContinueRanges, - kXIDContinueRanges + kNumXIDContinueRanges, - code_point); + binary_search(kXIDContinueRanges, kNumXIDContinueRanges, code_point); } -/// @return true if @p code_point is considered a whitespace -bool is_space(CodePoint code_point) { +/// @return true if @p code_point is considered a blankspace +static bool is_space(CodePoint code_point) { switch (code_point) { - case 0x0020: - case 0x0009: - case 0x000a: - case 0x000b: - case 0x000c: - case 0x000d: - case 0x0085: - case 0x200e: - case 0x200f: - case 0x2028: - case 0x2029: - return true; - default: - return false; + case 0x0020: + case 0x0009: + case 0x000a: + case 0x000b: + case 0x000c: + case 0x000d: + case 0x0085: + case 0x200e: + case 0x200f: + case 0x2028: + case 0x2029: + return true; + default: + return false; } } /// A fixed capacity, dynamic sized queue of bits (expressed as bools) -template -class BitQueue { - public: - /// @param index the index of the bit starting from the front - /// @return the bit value - auto operator[](size_t index) { - assert(index < count()); // TODO(dneto): this should error out. 
- return bits_[(index + read_offset_) % CAPACITY_IN_BITS]; - } +#define BITQUEUE_CAPACITY 64 + +typedef struct { + uint64_t bits; + size_t count; + size_t read_offset; +} BitQueue; + +/// @param index the index of the bit starting from the front +/// @return the bit value +static bool bitqueue_get(BitQueue *queue, size_t index) { + assert(index < queue->count); + return (queue->bits >> ((index + queue->read_offset) % BITQUEUE_CAPACITY)) & + 1; +} - /// Removes the bit at the front of the queue - /// @returns the value of the bit that was removed - bool pop_front() { - assert(count_ > 0); - bool value = (*this)[0]; - count_--; - read_offset_++; - return value; +static void bitqueue_set(BitQueue *queue, size_t index, bool value) { + assert(index < queue->count); + size_t bit_index = (index + queue->read_offset) % BITQUEUE_CAPACITY; + if (value) { + queue->bits |= (1ULL << bit_index); + } else { + queue->bits &= ~(1ULL << bit_index); } +} - /// Appends a bit to the back of the queue - void push_back(bool value) { - assert(count_ < CAPACITY_IN_BITS); - count_++; - (*this)[count_ - 1] = value; - } +/// Removes the bit at the front of the queue +/// @returns the value of the bit that was removed +static bool bitqueue_pop_front(BitQueue *queue) { + assert(queue->count > 0); + bool value = bitqueue_get(queue, 0); + queue->count--; + queue->read_offset++; + return value; +} - /// @returns true if the queue holds no bits. - bool empty() const { return count_ == 0; } +/// Appends a bit to the back of the queue +static void bitqueue_push_back(BitQueue *queue, bool value) { + assert(queue->count < BITQUEUE_CAPACITY); + queue->count++; + bitqueue_set(queue, queue->count - 1, value); +} - /// @returns the number of bits held by the queue. - size_t count() const { return count_; } +/// @returns true if the queue holds no bits. +static bool bitqueue_empty(const BitQueue *queue) { return queue->count == 0; } + +/// @returns the number of bits held by the queue. 
+static size_t bitqueue_count(const BitQueue *queue) { return queue->count; } - private: - std::bitset bits_; - size_t count_ = 0; // number of bits contained - size_t read_offset_ = 0; // read offset in bits - // #if ENABLE_LOGGING - public: - void to_chars(std::string& str) { - std::stringstream ss; - ss << count_ << ":"; - for (auto i = 0; i < count_; ++i) { - bool is_template = (*this)[i]; - ss << (is_template ? "#" : "."); - } - str = ss.str(); +static void bitqueue_to_chars(const BitQueue *queue, char *str) { + sprintf(str, "%zu:", queue->count); + for (size_t i = 0; i < queue->count; ++i) { + strcat(str, bitqueue_get(queue, i) ? "#" : "."); } +} #endif -}; -class Lexer { - public: - Lexer(TSLexer* l) : lexer_(l) {} +typedef struct { + TSLexer *lexer; +} Lexer; - /// Advances the lexer by one code point. - void advance() { lexer_->advance(lexer_, /* whitespace */ false); } +static void lexer_init(Lexer *lexer, TSLexer *l) { lexer->lexer = l; } - /// Returns the next code point, advancing the lexer by one code point. - CodePoint next() { - // TODO(dneto): should assert !lexer_->eof(lexer_) - CodePoint lookahead = lexer_->lookahead; - advance(); - return lookahead; - } +/// Advances the lexer by one code point. +static void lexer_advance(Lexer *lexer) { lexer->lexer->advance(lexer->lexer, false); } - /// @return the next code point without advancing the lexer, or kEOF if there - /// are no more code points - CodePoint peek() { return lexer_->eof(lexer_) ? kEOF : lexer_->lookahead; } +/// Returns the next code point, advancing the lexer by one code point. +static CodePoint lexer_next(Lexer *lexer) { + // TODO(dneto): should assert !lexer_->eof(lexer_) + CodePoint lookahead = lexer->lexer->lookahead; + lexer_advance(lexer); + return lookahead; +} + +/// @return the next code point without advancing the lexer, or kEOF if there +/// are no more code points +static CodePoint lexer_peek(Lexer *lexer) { + return lexer->lexer->eof(lexer->lexer) ? 
kEOF : lexer->lexer->lookahead; +} + +/// @return true if the next code point is equal to @p code_point. +/// @note if the code point was found, then the lexer is advanced to that code +/// point. +static bool lexer_match(Lexer *lexer, CodePoint code_point) { + if (lexer_peek(lexer) == code_point) { + lexer_advance(lexer); + return true; + } + return false; +} - /// @return true if the next code point is equal to @p code_point. - /// @note if the code point was found, then the lexer is advanced to that code - /// point. - bool match(CodePoint code_point) { - if (peek() == code_point) { - advance(); +/// @return true if the next code point is found in @p code_points. +/// @note if the code point was found, then the lexer is advanced to that code +/// point. +static bool lexer_match_anyof(Lexer *lexer, const CodePoint *code_points, + size_t count) { + for (size_t i = 0; i < count; i++) { + if (lexer_match(lexer, code_points[i])) { return true; } - return false; } + return false; +} - /// @return true if the next code point is found in @p code_points. - /// @note if the code point was found, then the lexer is advanced to that code - /// point. - bool match_anyof(std::initializer_list code_points) { - for (CodePoint code_point : code_points) { - if (match(code_point)) { - return true; - } - } +/// Attempts to match an identifier pattern that starts with XIDStart followed +/// by any number of XIDContinue code points. +static bool lexer_match_identifier(Lexer *lexer) { + if (!is_xid_start(lexer_peek(lexer))) { return false; } - /// Attempts to match an identifier pattern that starts with XIDStart followed by - /// any number of XIDContinue code points. 
- bool match_identifier() { - if (!is_xid_start(peek())) { - return false; - } + bool is_ascii = true; + CodePoint start = lexer_next(lexer); + if (start >= 0x80) { + is_ascii = false; + } - bool is_ascii = true; - if (CodePoint start = next(); start < 0x80) { - } else { + while (true) { + if (!is_xid_continue(lexer_peek(lexer))) { + break; + } + CodePoint code_point = lexer_next(lexer); + if (code_point >= 0x80) { is_ascii = false; } + } - while (true) { - if (!is_xid_continue(peek())) { - break; - } - if (CodePoint code_point = next(); code_point < 0x80) { - } else { - is_ascii = false; - } - } + if (is_ascii) { + LOG("ident is ascii"); + } else { + LOG("ident"); + } - if (is_ascii) { - LOG("ident is ascii"); - } else { - LOG("ident"); - } + return true; +} - return true; +/// Attempts to match a /* block comment */ +static bool lexer_match_block_comment(Lexer *lexer) { + // TODO(dneto): Need to un-advance if matched '/' but not '*' + if (!lexer_match(lexer, '/') || !lexer_match(lexer, '*')) { + return false; } - /// Attempts to match a /* block comment */ - bool match_block_comment() { - // TODO(dneto): Need to un-advance if matched '/' but not '*' - if (!match('/') || !match('*')) { - return false; - } - - size_t nesting = 1; - while (nesting > 0 && !match(kEOF)) { - // TODO(dneto): If we match '/' but not '*' there is no way to un-advance - // back to make '/' the lookahead. - if (match('/') && match('*')) { - nesting++; + size_t nesting = 1; + while (nesting > 0 && !lexer_match(lexer, kEOF)) { + // TODO(dneto): If we match '/' but not '*' there is no way to un-advance + // back to make '/' the lookahead. 
+ if (lexer_match(lexer, '/') && lexer_match(lexer, '*')) { + nesting++; // TODO(dneto): Same here, need to be able to un-advance to before '*' - } else if (match('*') && match('/')) { - nesting--; - } else { - next(); - } + } else if (lexer_match(lexer, '*') && lexer_match(lexer, '/')) { + nesting--; + } else { + lexer_next(lexer); } - return true; } + return true; +} - /// Advances the lexer while the next code point is considered whitespace - void skip_whitespace() { - while (is_space(peek())) { - lexer_->advance(lexer_, /* whitespace */ true); - } +/// Advances the lexer while the next code point is considered blankspace +static void lexer_skip_blankspace(Lexer *lexer) { + while (is_space(lexer_peek(lexer))) { + lexer->lexer->advance(lexer->lexer, true); } +} - private: - TSLexer* lexer_; -}; +typedef struct { + BitQueue lt_is_tmpl; // Queue of disambiguated '<' + BitQueue gt_is_tmpl; // Queue of disambiguated '>' +} ScannerState; + +typedef struct { + ScannerState state; +} Scanner; + +/* Stack entry for template argument parsing */ +typedef struct { + size_t index; // Index of the opening '>' in lt_is_tmpl + size_t expr_depth; // The value of 'expr_depth' for the opening '<' +} StackEntry; + +/* Dynamic array for StackEntry */ +typedef struct { + StackEntry *data; + size_t size; + size_t capacity; +} StackEntryArray; + +static void stack_entry_array_init(StackEntryArray *array) { + array->data = NULL; + array->size = 0; + array->capacity = 0; +} -struct Scanner { - struct State { - BitQueue<64> lt_is_tmpl; // Queue of disambiguated '<' - BitQueue<64> gt_is_tmpl; // Queue of disambiguated '>' - bool empty() const { return lt_is_tmpl.empty() && gt_is_tmpl.empty(); } - }; - State state; - static_assert(sizeof(State) < TREE_SITTER_SERIALIZATION_BUFFER_SIZE); - // State is trivially copyable, so it can be serialized and deserialized - // with memcpy. 
- static_assert(std::is_trivially_copyable::value); - - /// Updates #state with the disambiguated '<' and '>' tokens. - /// The following assumptions are made on entry: - /// * lexer has just advanced to the end of an identifier - /// On exit, all '<' and '>' template tokens will be paired up to the closing - /// '>' for the first '<'. - void classify_template_args(Lexer& lexer) { - LOG("classify_template_args()"); - - if (!lexer.match('<')) { - LOG(" missing '<'"); +static void stack_entry_array_push(StackEntryArray *array, StackEntry entry) { + if (array->size == array->capacity) { + size_t new_capacity = array->capacity == 0 ? 1 : array->capacity * 2; + StackEntry *new_data = + realloc(array->data, new_capacity * sizeof(StackEntry)); + if (new_data == NULL) { + /* Handle allocation failure */ return; } + array->data = new_data; + array->capacity = new_capacity; + } + array->data[array->size++] = entry; +} - // The current expression nesting depth. - size_t expr_depth = 0; +static void stack_entry_array_pop(StackEntryArray *array) { + if (array->size > 0) { + array->size--; + } +} - // A stack of '<' tokens. Each is a candidate for the start of a template list. - // Used to pair '<' and '>' tokens at the same expression depth. - struct StackEntry { - size_t index; // Index of the opening '>' in lt_is_tmpl - size_t expr_depth; // The value of 'expr_depth' for the opening '<' - }; - // The stack of unclosed candidates for template-list starts. 
- std::vector lt_stack; +static StackEntry *stack_entry_array_back(StackEntryArray *array) { + if (array->size > 0) { + return &array->data[array->size - 1]; + } + return NULL; +} - LOG("classify_template_args() '<' (initial)"); - lt_stack.push_back(StackEntry{state.lt_is_tmpl.count(), expr_depth}); - // Default to less-than (or less-than-equal, or left-shift, or left-shift-equal) - state.lt_is_tmpl.push_back(false); +static bool stack_entry_array_empty(StackEntryArray *array) { + return array->size == 0; +} - while (!lt_stack.empty() && !lexer.match(kEOF)) { - lexer.skip_whitespace(); +static void stack_entry_array_clear(StackEntryArray *array) { array->size = 0; } - // TODO: skip line-ending comments. - if (lexer.match_block_comment()) { - continue; - } +static void stack_entry_array_free(StackEntryArray *array) { + free(array->data); + array->data = NULL; + array->size = 0; + array->capacity = 0; +} - // A template list can't contain an assignment or a compound assignment. - // There is logic below which clears the stack when reaching one of those. - // It looks for a '=' code point. But we still want to allow - // comparison operations inside expressions. So we must pre-emptively - // allow operators: == >= <= != - - // Look for a nested template-list. - if (lexer.match_identifier()) { - lexer.skip_whitespace(); // TODO: Skip comments - if (lexer.match('<')) { - LOG("classify_template_args() '<' after ident"); - // Record this '<' in state.lt_is_tmpl, initially treating this as the operator - // in an expression (less-than, less-than, equal, left-shift, or left-shift-equal). - // If this '<' is recorded in lt_stack, and a corresponding '>' is found, then this - // will be transformed into a template-start token. - state.lt_is_tmpl.push_back(false); - - if (lexer.match('=')) { - // We entered the loop at "ident<=". No template arg can start with '=', - // so consider "<=" to be a single token. 
- // Litmus test: "alias z = a;" - } else if (lexer.match('<')) { - // We entered the loop at "ident<<". No template arg can start with '<', - // so consider "<<" to be a single token. - // Litmus test: "alias z = a;" - state.lt_is_tmpl.push_back(false); - } else { - lt_stack.push_back(StackEntry{state.lt_is_tmpl.count()-1, expr_depth}); - } - } - continue; - } +/// Updates #state with the disambiguated '<' and '>' tokens. +/// The following assumptions are made on entry: +/// * lexer has just advanced to the end of an identifier +/// On exit, all '<' and '>' template tokens will be paired up to the closing +/// '>' for the first '<'. +static void classify_template_args(Scanner *scanner, Lexer *lexer) { + LOG("classify_template_args()"); + + if (!lexer_match(lexer, '<')) { + LOG(" missing '<'"); + return; + } - // Each '<' must be recorded in the lt_is_tmpl queue. - // Each '>' must be recorded in the gt_is_tmpl queue. + // The current expression nesting depth. + size_t expr_depth = 0; - if (lexer.match('<')) { - // Litmus test: "alias z =a<1<()>;" - LOG("classify_template_args() '<'"); - state.lt_is_tmpl.push_back(false); - continue; - } + // A stack of '<' tokens. Each is a candidate for the start of a template + // list. Used to pair '<' and '>' tokens at the same expression depth. + StackEntryArray lt_stack; + stack_entry_array_init(&lt_stack); + + LOG("classify_template_args() '<' (initial)"); + StackEntry entry = {bitqueue_count(&scanner->state.lt_is_tmpl), expr_depth}; + stack_entry_array_push(&lt_stack, entry); + // Default to less-than (or less-than-equal, or left-shift, or + // left-shift-equal) + bitqueue_push_back(&scanner->state.lt_is_tmpl, false); + + while (!stack_entry_array_empty(&lt_stack) && !lexer_match(lexer, kEOF)) { + lexer_skip_blankspace(lexer); + + // TODO: skip line-ending comments.
+ if (lexer_match_block_comment(lexer)) { + continue; + } - if (lexer.match('>')) { - LOG("classify_template_args() '>'"); - if (!lt_stack.empty() && lt_stack.back().expr_depth == expr_depth) { - LOG(" TEMPLATE MATCH"); - state.gt_is_tmpl.push_back(true); - state.lt_is_tmpl[lt_stack.back().index] = true; - lt_stack.pop_back(); + // A template list can't contain an assignment or a compound assignment. + // There is logic below which clears the stack when reaching one of those. + // It looks for a '=' code point. But we still want to allow + // comparison operations inside expressions. So we must pre-emptively + // allow operators: == >= <= != + + // Look for a nested template-list. + if (lexer_match_identifier(lexer)) { + lexer_skip_blankspace(lexer); + if (lexer_match(lexer, '<')) { + LOG("classify_template_args() '<' after ident"); + bitqueue_push_back(&scanner->state.lt_is_tmpl, false); + + if (lexer_match(lexer, '=')) { + // We entered the loop at "ident<=". No template arg can start with + // '=', so consider "<=" to be a single token. Litmus test: "alias z + // = a;" + } else if (lexer_match(lexer, '<')) { + // We entered the loop at "ident<<". No template arg can start with + // '<', so consider "<<" to be a single token. Litmus test: "alias z + // = a;" + bitqueue_push_back(&scanner->state.lt_is_tmpl, false); } else { - LOG(" non-template '>'"); - state.gt_is_tmpl.push_back(false); - // Pre-emptvely allow >= as a comparison operator: - // Skip over '=', if present. - lexer.match('='); + StackEntry new_entry = { + bitqueue_count(&scanner->state.lt_is_tmpl) - 1, expr_depth}; + stack_entry_array_push(&lt_stack, new_entry); } - continue; } + continue; + } - // Pre-emptively allow the != operator. - // As a side effect, allow unary negation operator ! - if (lexer.match('!')) { - lexer.match('='); - continue; - } + // Each '<' must be recorded in the lt_is_tmpl queue. + // Each '>' must be recorded in the gt_is_tmpl queue.
- CodePoint was = lexer.peek(); - if (lexer.match_anyof({'(', '['})) { - LOG(" %c expr_depth++", static_cast(was)); - // Entering a nested expression - expr_depth++; - continue; - } + if (lexer_match(lexer, '<')) { + // Litmus test: "alias z =a<1<()>;" + LOG("classify_template_args() '<'"); + bitqueue_push_back(&scanner->state.lt_is_tmpl, false); + continue; + } - if (lexer.match_anyof({')', ']'})) { - LOG(" %c expr_depth--", static_cast(was)); - // Exiting a nested expression - // Pop the stack until we return to the current expression - // expr_depth - while (!lt_stack.empty() && lt_stack.back().expr_depth == expr_depth) { - lt_stack.pop_back(); - } - if (expr_depth > 0) { - expr_depth--; - } - continue; + if (lexer_match(lexer, '>')) { + LOG("classify_template_args() '>'"); + StackEntry *back = stack_entry_array_back(&lt_stack); + if (back != NULL && back->expr_depth == expr_depth) { + LOG(" TEMPLATE MATCH"); + bitqueue_push_back(&scanner->state.gt_is_tmpl, true); + bitqueue_set(&scanner->state.lt_is_tmpl, back->index, true); + stack_entry_array_pop(&lt_stack); + } else { + LOG(" non-template '>'"); + bitqueue_push_back(&scanner->state.gt_is_tmpl, false); + // Pre-emptvely allow >= as a comparison operator: + // Skip over '=', if present. + lexer_match(lexer, '='); } + continue; + } - was = lexer.peek(); - if (lexer.match('=')) { - // A subtle point. The '=' we just matched might be the start of a - // syntactic token, or the end of a compound-assignment operator like += - // In either case, it's fine to proceed with the logic below. + // Pre-emptively allow the != operator. + // As a side effect, allow unary negation operator ! + if (lexer_match(lexer, '!')) { + lexer_match(lexer, '='); + continue; + } - if (lexer.match('=')) { - // Pre-emptively allow equality == - continue; - } - // A template list can't contain an assignment, because an expression - // can't contain an assignment.
- // This might be a regular assignment, or the tail end of a compound - // assignment. - LOG(" %c expression terminator", was); - expr_depth = 0; - lt_stack.clear(); - continue; - } + CodePoint was = lexer_peek(lexer); + if (lexer_match(lexer, '(') || lexer_match(lexer, '[')) { + LOG(" %c expr_depth++", (int)was); + // Entering a nested expression + expr_depth++; + continue; + } - was = lexer.peek(); - if (lexer.match_anyof({';', '{', ':'})) { - LOG(" %c expression terminator", was); - // Expression terminating tokens. No template list can - // hold these code points, so clear the stack and expression depth. - expr_depth = 0; - lt_stack.clear(); - continue; + if (lexer_match(lexer, ')') || lexer_match(lexer, ']')) { + LOG(" %c expr_depth--", (int)was); + // Exiting a nested expression + // Pop the stack until we return to the current expression + // expr_depth + while (!stack_entry_array_empty(&lt_stack) && + stack_entry_array_back(&lt_stack)->expr_depth == expr_depth) { + stack_entry_array_pop(&lt_stack); } - - bool short_circuit = false; - if (lexer.match('&')) { - short_circuit = lexer.match('&'); - } else if (lexer.match('|')) { - short_circuit = lexer.match('|'); + if (expr_depth > 0) { + expr_depth--; } - if (short_circuit) { - LOG(" short-circuiting expression"); - // Treat 'a < b || c > d' as a logical binary operator of two - // comparison operators instead of a single template argument - // 'b||c'. Use parentheses around 'b||c' to parse as a - // template argument list. - while (!lt_stack.empty() && lt_stack.back().expr_depth == expr_depth) { - lt_stack.pop_back(); - } + continue; + } + + was = lexer_peek(lexer); + if (lexer_match(lexer, '=')) { + // A subtle point. The '=' we just matched might be the start of a + // syntactic token, or the end of a compound-assignment operator like += + // In either case, it's fine to proceed with the logic below.
+ + if (lexer_match(lexer, '=')) { + // Pre-emptively allow equality == continue; } + // A template list can't contain an assignment, because an expression + // can't contain an assignment. + // This might be a regular assignment, or the tail end of a compound + // assignment. + LOG(" %c expression terminator", (int)was); + expr_depth = 0; + stack_entry_array_clear(&lt_stack); + continue; + } - LOG(" skip: '%c'",char(lexer.peek())); - lexer.next(); + was = lexer_peek(lexer); + if (lexer_match(lexer, ';') || lexer_match(lexer, '{') || + lexer_match(lexer, ':')) { + LOG(" %c expression terminator", (int)was); + // Expression terminating tokens. No template list can + // hold these code points, so clear the stack and expression depth. + expr_depth = 0; + stack_entry_array_clear(&lt_stack); + continue; } - } - std::string valids(const bool* const valid_symbols) { - std::string result; - for (int i = 0; i < static_cast(ERROR) ; i++) { - result += std::string(valid_symbols[i] ? "+" : "_"); + bool short_circuit = false; + if (lexer_match(lexer, '&')) { + short_circuit = lexer_match(lexer, '&'); + } else if (lexer_match(lexer, '|')) { + short_circuit = lexer_match(lexer, '|'); } - for (int i = 0; i < static_cast(ERROR) ; i++) { - if (valid_symbols[i]) { - result += std::string(" ") + str(static_cast(i),true); + if (short_circuit) { + LOG(" short-circuiting expression"); + // Treat 'a < b || c > d' as a logical binary operator of two + // comparison operators instead of a single template argument + // 'b||c'. Use parentheses around 'b||c' to parse as a + // template argument list. + while (!stack_entry_array_empty(&lt_stack) && + stack_entry_array_back(&lt_stack)->expr_depth == expr_depth) { + stack_entry_array_pop(&lt_stack); } + continue; } - return result; - } - /// The external token scanner function. Handles block comments and - /// template-argument-list vs less-than / greater-than disambiguation.
- /// @return true if lexer->result_symbol was assigned a Token, or - /// false if the token should be taken from the regular WGSL tree-sitter - /// grammar. - bool scan(TSLexer* ts_lexer, const bool* const valid_symbols) { - Lexer lexer{ts_lexer}; + LOG(" skip: '%c'", (char)lexer_peek(lexer)); + lexer_next(lexer); + } - LOG("scan: '%c' [%u] %s", char(lexer.peek()), unsigned(ts_lexer->get_column(ts_lexer)), valids(valid_symbols).c_str()); + stack_entry_array_free(&lt_stack); +} - if (valid_symbols[Token::ERROR]) { - ts_lexer->result_symbol = Token::ERROR; - return true; +static char *valids(const bool *const valid_symbols) { + static char result[256]; + char *p = result; + for (int i = 0; i < ERROR; i++) { + *p++ = valid_symbols[i] ? '+' : '_'; + } + *p++ = ' '; + for (int i = 0; i < ERROR; i++) { + if (valid_symbols[i]) { + p += sprintf(p, " %s", tree_sitter_wgsl_str((enum Token)i, true)); } + } + *p = '\0'; + return result; +} - if (valid_symbols[Token::DISAMBIGUATE_TEMPLATE]) { - // The parser is telling us the _disambiguate_template token - // may appear at the current position. - // The next token may be the start of a template list, so - // scan forward and use the token-list disambiguation - // algorithm to mark template-list-start and template-list-end - // tokens. These are recorded in the lt and gt bit queues. +/// The external token scanner function. Handles block comments and +/// template-argument-list vs less-than / greater-than disambiguation. +/// @return true if lexer->result_symbol was assigned a Token, or +/// false if the token should be taken from the regular WGSL tree-sitter +/// grammar.
+static bool scanner_scan(Scanner *scanner, TSLexer *ts_lexer, + const bool *const valid_symbols) { + Lexer lexer; + lexer_init(&lexer, ts_lexer); + + LOG("scan: '%c' [%u] %s", (char)lexer_peek(&lexer), + ts_lexer->get_column(ts_lexer), valids(valid_symbols)); + + if (valid_symbols[ERROR]) { + ts_lexer->result_symbol = ERROR; + return true; + } - // Call mark_end so that we can "advance" past codepoints without - // automatically including them in the resulting token. - ts_lexer->mark_end(ts_lexer); - ts_lexer->result_symbol = Token::DISAMBIGUATE_TEMPLATE; - - // TODO(dneto): should also skip comments, both line comments - // and block comments. - // https://github.com/gpuweb/gpuweb/issues/3876 - lexer.skip_whitespace(); - if (lexer.peek() == '<') { - if (state.lt_is_tmpl.empty()) { - classify_template_args(lexer); - } + if (valid_symbols[DISAMBIGUATE_TEMPLATE]) { + // The parser is telling us the _disambiguate_template token + // may appear at the current position. + // The next token may be the start of a template list, so + // scan forward and use the token-list disambiguation + // algorithm to mark template-list-start and template-list-end + // tokens. These are recorded in the lt and gt bit queues. + + // Call mark_end so that we can "advance" past codepoints without + // automatically including them in the resulting token. + ts_lexer->mark_end(ts_lexer); + ts_lexer->result_symbol = DISAMBIGUATE_TEMPLATE; + + // TODO(dneto): should also skip comments, both line comments + // and block comments. + // https://github.com/gpuweb/gpuweb/issues/3876 + lexer_skip_blankspace(&lexer); + if (lexer_peek(&lexer) == '<') { + if (bitqueue_empty(&scanner->state.lt_is_tmpl)) { + classify_template_args(scanner, &lexer); } - - // This has to return true so that Treesitter will save - // the state generated by the disambiguation scan. 
- return true; } - lexer.skip_whitespace(); + // This has to return true so that Treesitter will save + // the state generated by the disambiguation scan. + return true; + } + + lexer_skip_blankspace(&lexer); + + // TODO(dneto): checkpoint and rewind if failed. + if (lexer_match_block_comment(&lexer)) { + ts_lexer->mark_end(ts_lexer); + ts_lexer->result_symbol = BLOCK_COMMENT; + return true; + } - auto match = [&](Token token) { + // TODO(dneto): Check valid array first. + if (lexer_match(&lexer, '<')) { + if (!bitqueue_empty(&scanner->state.lt_is_tmpl) && + bitqueue_pop_front(&scanner->state.lt_is_tmpl)) { ts_lexer->mark_end(ts_lexer); - ts_lexer->result_symbol = token; + ts_lexer->result_symbol = TEMPLATE_ARGS_START; return true; - }; - - // TODO(dneto): checkpoint and rewind if failed. - if (lexer.match_block_comment()) { - return match(Token::BLOCK_COMMENT); } - - // TODO(dneto): Check valid array first. - if (lexer.match('<')) { - if (!state.lt_is_tmpl.empty() && state.lt_is_tmpl.pop_front()) { - return match(Token::TEMPLATE_ARGS_START); - } - if (lexer.match('=')) { - return match(Token::LESS_THAN_EQUAL); + if (lexer_match(&lexer, '=')) { + ts_lexer->mark_end(ts_lexer); + ts_lexer->result_symbol = LESS_THAN_EQUAL; + return true; + } + if (lexer_match(&lexer, '<')) { + // Consume the '<' in the lt queue. + // Litmus test: "alias z = a<1<()>;" + if (!bitqueue_empty(&scanner->state.lt_is_tmpl)) { + bitqueue_pop_front(&scanner->state.lt_is_tmpl); } - if (lexer.match('<')) { - // Consume the '<' in the lt queue. 
- // Litmus test: "alias z = a<1<()>;" - if (!state.lt_is_tmpl.empty()) { - state.lt_is_tmpl.pop_front(); - } - if (lexer.match('=')) { - return match(Token::SHIFT_LEFT_ASSIGN); - } - return match(Token::SHIFT_LEFT); + if (lexer_match(&lexer, '=')) { + ts_lexer->mark_end(ts_lexer); + ts_lexer->result_symbol = SHIFT_LEFT_ASSIGN; + return true; } - return match(Token::LESS_THAN); + ts_lexer->mark_end(ts_lexer); + ts_lexer->result_symbol = SHIFT_LEFT; + return true; } + ts_lexer->mark_end(ts_lexer); + ts_lexer->result_symbol = LESS_THAN; + return true; + } - // TODO(dneto): check valid array first. - if (lexer.match('>')) { - if (!state.gt_is_tmpl.empty() && state.gt_is_tmpl.pop_front()) { - return match(Token::TEMPLATE_ARGS_END); - } - if (lexer.match('=')) { - return match(Token::GREATER_THAN_EQUAL); + // TODO(dneto): check valid array first. + if (lexer_match(&lexer, '>')) { + if (!bitqueue_empty(&scanner->state.gt_is_tmpl) && + bitqueue_pop_front(&scanner->state.gt_is_tmpl)) { + ts_lexer->mark_end(ts_lexer); + ts_lexer->result_symbol = TEMPLATE_ARGS_END; + return true; + } + if (lexer_match(&lexer, '=')) { + ts_lexer->mark_end(ts_lexer); + ts_lexer->result_symbol = GREATER_THAN_EQUAL; + return true; + } + if (lexer_match(&lexer, '>')) { + // Consume the '>' in the gt queue. + if (!bitqueue_empty(&scanner->state.gt_is_tmpl)) { + bitqueue_pop_front(&scanner->state.gt_is_tmpl); } - if (lexer.match('>')) { - // Consume the '>' in the gt queue. 
- if (!state.gt_is_tmpl.empty()) { - state.gt_is_tmpl.pop_front(); - } - if (lexer.match('=')) { - return match(Token::SHIFT_RIGHT_ASSIGN); - } - return match(Token::SHIFT_RIGHT); + if (lexer_match(&lexer, '=')) { + ts_lexer->mark_end(ts_lexer); + ts_lexer->result_symbol = SHIFT_RIGHT_ASSIGN; + return true; } - return match(Token::GREATER_THAN); + ts_lexer->mark_end(ts_lexer); + ts_lexer->result_symbol = SHIFT_RIGHT; + return true; } - - return false; // Use regular parsing + ts_lexer->mark_end(ts_lexer); + ts_lexer->result_symbol = GREATER_THAN; + return true; } - /// Serializes the scanner state into @p buffer. - unsigned serialize(char* buffer) { - if (state.empty()) { - return 0; - } + return false; // Use regular parsing +} + +/// Serializes the scanner state into @p buffer. +static unsigned scanner_serialize(Scanner *scanner, char *buffer) { + if (bitqueue_empty(&scanner->state.lt_is_tmpl) && + bitqueue_empty(&scanner->state.gt_is_tmpl)) { + return 0; + } #if ENABLE_LOGGING - std::string lt_str; state.lt_is_tmpl.to_chars(lt_str); - std::string gt_str; state.gt_is_tmpl.to_chars(gt_str); - LOG("serialize(lt_is_tmpl: %s, gt_is_tmpl: %s)", - lt_str.c_str(), gt_str.c_str()); + char lt_str[256], gt_str[256]; + bitqueue_to_chars(&scanner->state.lt_is_tmpl, lt_str); + bitqueue_to_chars(&scanner->state.gt_is_tmpl, gt_str); + LOG("serialize(lt_is_tmpl: %s, gt_is_tmpl: %s)", lt_str, gt_str); #endif - size_t bytes_written = 0; - auto write = [&](const void* data, size_t num_bytes) { - assert(bytes_written + num_bytes <= - TREE_SITTER_SERIALIZATION_BUFFER_SIZE); - memcpy(buffer + bytes_written, data, num_bytes); - bytes_written += num_bytes; - }; - write(&state.lt_is_tmpl, sizeof(state.lt_is_tmpl)); - write(&state.gt_is_tmpl, sizeof(state.gt_is_tmpl)); - // TODO(dneto): implicit conversion be narrowing. 
- return bytes_written; - } + size_t bytes_written = 0; + memcpy(buffer + bytes_written, &scanner->state.lt_is_tmpl, + sizeof(scanner->state.lt_is_tmpl)); + bytes_written += sizeof(scanner->state.lt_is_tmpl); + memcpy(buffer + bytes_written, &scanner->state.gt_is_tmpl, + sizeof(scanner->state.gt_is_tmpl)); + bytes_written += sizeof(scanner->state.gt_is_tmpl); + // TODO(dneto): implicit conversion be narrowing. + return (unsigned)bytes_written; +} - /// Deserializes the scanner state from @p buffer. - void deserialize(const char* const buffer, unsigned length) { - if (length == 0) { - state = {}; - } else { - size_t bytes_read = 0; - auto read = [&](void* data, size_t num_bytes) { - assert(bytes_read + num_bytes <= length); - memcpy(data, buffer + bytes_read, num_bytes); - bytes_read += num_bytes; - }; - read(&state.lt_is_tmpl, sizeof(state.lt_is_tmpl)); - read(&state.gt_is_tmpl, sizeof(state.gt_is_tmpl)); +/// Deserializes the scanner state from @p buffer. +static void scanner_deserialize(Scanner *scanner, const char *buffer, + unsigned length) { + if (length == 0) { + memset(&scanner->state, 0, sizeof(scanner->state)); + } else { + size_t bytes_read = 0; + memcpy(&scanner->state.lt_is_tmpl, buffer + bytes_read, + sizeof(scanner->state.lt_is_tmpl)); + bytes_read += sizeof(scanner->state.lt_is_tmpl); + memcpy(&scanner->state.gt_is_tmpl, buffer + bytes_read, + sizeof(scanner->state.gt_is_tmpl)); + bytes_read += sizeof(scanner->state.gt_is_tmpl); #if ENABLE_LOGGING - std::string lt_str; state.lt_is_tmpl.to_chars(lt_str); - std::string gt_str; state.gt_is_tmpl.to_chars(gt_str); - LOG("deserialize(lt_is_tmpl: %s, gt_is_tmpl: %s)", - lt_str.c_str(), gt_str.c_str()); + char lt_str[256], gt_str[256]; + bitqueue_to_chars(&scanner->state.lt_is_tmpl, lt_str); + bitqueue_to_chars(&scanner->state.gt_is_tmpl, gt_str); + LOG("deserialize(lt_is_tmpl: %s, gt_is_tmpl: %s)", lt_str, gt_str); #endif - assert(bytes_read == length); - } + assert(bytes_read == length); } -}; - -} // 
anonymous namespace - -extern "C" { +} // Called once when language is set on a parser. // Allocates memory for storing scanner state. -void* tree_sitter_wgsl_external_scanner_create() { - return new Scanner(); +void *tree_sitter_wgsl_external_scanner_create() { + Scanner *scanner = (Scanner *)calloc(1, sizeof(Scanner)); + return scanner; } // Called once parser is deleted or different language set. // Frees memory storing scanner state. -void tree_sitter_wgsl_external_scanner_destroy(void* const payload) { - Scanner* const scanner = static_cast(payload); - delete scanner; +void tree_sitter_wgsl_external_scanner_destroy(void *payload) { + Scanner *scanner = (Scanner *)payload; + free(scanner); } // Called whenever this scanner recognizes a token. // Serializes scanner state into buffer. -unsigned tree_sitter_wgsl_external_scanner_serialize(void* const payload, - char* const buffer) { - Scanner* scanner = static_cast(payload); - return scanner->serialize(buffer); +unsigned tree_sitter_wgsl_external_scanner_serialize(void *payload, + char *buffer) { + Scanner *scanner = (Scanner *)payload; + return scanner_serialize(scanner, buffer); } // Called when handling edits and ambiguities. // Deserializes scanner state from buffer. -void tree_sitter_wgsl_external_scanner_deserialize(void* const payload, - const char* const buffer, - unsigned const length) { - Scanner* const scanner = static_cast(payload); - scanner->deserialize(buffer, length); +void tree_sitter_wgsl_external_scanner_deserialize(void *payload, + const char *buffer, + unsigned length) { + Scanner *scanner = (Scanner *)payload; + scanner_deserialize(scanner, buffer, length); } // Scans for tokens. 
-bool tree_sitter_wgsl_external_scanner_scan(void* const payload, - TSLexer* const lexer, - const bool* const valid_symbols) { - Scanner* const scanner = static_cast(payload); - if (scanner->scan(lexer, valid_symbols)) { - LOG("scan returned: %s", str(static_cast(lexer->result_symbol))); +bool tree_sitter_wgsl_external_scanner_scan(void *payload, TSLexer *lexer, + const bool *valid_symbols) { + Scanner *scanner = (Scanner *)payload; + if (scanner_scan(scanner, lexer, valid_symbols)) { + LOG("scan returned: %s", + tree_sitter_wgsl_str((enum Token)lexer->result_symbol, false)); return true; } return false; } - -} // extern "C" diff --git a/wgsl/tools/extract-grammar.py b/wgsl/tools/extract-grammar.py index ca2fbaff19..ed50fc1294 100755 --- a/wgsl/tools/extract-grammar.py +++ b/wgsl/tools/extract-grammar.py @@ -13,8 +13,6 @@ import string import shutil -import wgsl_unit_tests - from distutils.ccompiler import new_compiler from distutils.unixccompiler import UnixCCompiler from tree_sitter import Language, Parser @@ -34,7 +32,7 @@ def __init__(self,bs_filename, tree_sitter_dir, scanner_cc_filename, syntax_file self.bs_filename = bs_filename self.grammar_dir = tree_sitter_dir self.scanner_cc_filename = scanner_cc_filename - self.wgsl_shared_lib = os.path.join(self.grammar_dir,"build","wgsl.so") + self.wgsl_shared_lib = os.path.join(self.grammar_dir,"dist","tree_sitter_wgsl-0.0.7.tar.gz") self.grammar_filename = os.path.join(self.grammar_dir,"grammar.js") self.syntax_filename = syntax_filename self.syntax_dir = syntax_dir @@ -1129,18 +1127,6 @@ def not_token_only(value): for line in scan_result['raw']: previous_file.write(line) - with open(os.path.join(options.grammar_dir,"package.json"), "w") as grammar_package: - grammar_package.write('{\n') - grammar_package.write(' "name": "tree-sitter-wgsl",\n') - grammar_package.write(' "dependencies": {\n') - grammar_package.write(' "nan": "' + value_from_dotenv("NPM_NAN_VERSION") + '"\n') - grammar_package.write(' },\n') - 
grammar_package.write(' "devDependencies": {\n') - grammar_package.write(' "tree-sitter-cli": "' + value_from_dotenv("NPM_TREE_SITTER_CLI_VERSION") + '"\n') - grammar_package.write(' },\n') - grammar_package.write(' "main": "bindings/node"\n') - grammar_package.write('}\n') - return True def flow_build(options): @@ -1158,21 +1144,12 @@ def flow_build(options): # See: https://github.com/tree-sitter/tree-sitter-rust/blob/master/src/scanner.c os.makedirs(os.path.join(options.grammar_dir, "src"), exist_ok=True) - - # Remove the old custom scanner, if it exists. - scanner_c_staging = os.path.join(options.grammar_dir, "src", "scanner.c") - if os.path.exists(scanner_c_staging): - os.remove(scanner_c_staging) - # Copy the new scanner into place, if newer - scanner_cc_staging = os.path.join(options.grammar_dir, "src", "scanner.cc") - if newer_than(options.scanner_cc_filename, scanner_cc_staging): - shutil.copyfile(options.scanner_cc_filename, scanner_cc_staging) - + scanner_cc_staging = os.path.join(options.grammar_dir, "src", "scanner.c") # Use "npm install" to create the tree-sitter CLI that has WGSL # support. But "npm install" fetches data over the network. # That can be flaky, so only invoke it when needed. - if os.path.exists("grammar/node_modules/tree-sitter-cli") and os.path.exists("grammar/node_modules/nan"): + if os.path.exists("grammar/node_modules/tree-sitter-cli"): # "npm install" has been run already. pass else: @@ -1185,49 +1162,26 @@ def flow_build(options): # cwd=options.grammar_dir, check=True) def build_library(output_path, input_files): - # The py-tree-sitter build_library method with C++17 flags """ - Build a dynamic library at the given path, based on the parser - repositories at the given paths. - - Returns `True` if the dynamic library was compiled and `False` if - the library already existed and was modified more recently than - any of the source files. + Run `python3 -m pip install -e . 
--user --break-system-packages` + in grammar_dir to install the tree-sitter language package """ - - cpp = False - source_paths = [] - for input_file in input_files: - source_paths.append(input_file) - if input_file.endswith(".cc"): - cpp = True - - compiler = new_compiler() - if isinstance(compiler, UnixCCompiler): - compiler.set_executables(compiler_cxx="c++") - - with TemporaryDirectory(suffix="tree_sitter_language") as out_dir: - object_paths = [] - for source_path in source_paths: - flags = ["-fPIC"] - if source_path.endswith(".c"): - flags.append("-std=c99") - else: - flags.append("-std=c++17") - object_paths.append( - compiler.compile( - [source_path], - output_dir=out_dir, - include_dirs=[os.path.dirname(source_path)], - extra_preargs=flags, - )[0] - ) - compiler.link_shared_object( - object_paths, - output_path, - target_lang="c++" if cpp else "c", + try: + subprocess.run( + ["python3", "-m", "pip", "install", "-e", ".", "--user", "--break-system-packages"], + check=True, + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + text=True, + cwd=options.grammar_dir ) - + print("Tree-sitter language package installed successfully.") + except subprocess.CalledProcessError as e: + print(f"Error installing tree-sitter language package: {e}") + print(f"stdout: {e.stdout}") + print(f"stderr: {e.stderr}") + return False + if newer_than(scanner_cc_staging, options.wgsl_shared_lib) or newer_than(options.grammar_filename,options.wgsl_shared_lib): print("{}: ...Building custom scanner: {}".format(options.script,options.wgsl_shared_lib)) build_library(options.wgsl_shared_lib, @@ -1246,10 +1200,10 @@ def flow_examples(options,scan_result): print("{}: Examples...".format(options.script)) examples = scan_result['example'] - WGSL_LANGUAGE = Language(options.wgsl_shared_lib, "wgsl") + import tree_sitter_wgsl + WGSL_LANGUAGE = Language(tree_sitter_wgsl.language()) - parser = Parser() - parser.set_language(WGSL_LANGUAGE) + parser = Parser(WGSL_LANGUAGE) errors = 0 for key, value 
in examples.items(): @@ -1374,6 +1328,7 @@ def main(): if not flow_examples(options,scan_result): return 1 if 't' in args.flow: + import wgsl_unit_tests test_options = wgsl_unit_tests.Options(options.wgsl_shared_lib) if not wgsl_unit_tests.run_tests(test_options): return 1 diff --git a/wgsl/tools/wgsl_unit_tests.py b/wgsl/tools/wgsl_unit_tests.py index 4b5b953527..6f42ebefbf 100644 --- a/wgsl/tools/wgsl_unit_tests.py +++ b/wgsl/tools/wgsl_unit_tests.py @@ -36,6 +36,7 @@ import os import sys from tree_sitter import Language, Parser +import tree_sitter_wgsl from TSPath import TSPath SCRIPT='wgsl_unit_tests.py' @@ -100,12 +101,9 @@ def run_tests(options): Returns True if all tests passed """ global cases - if not os.path.exists(options.shared_lib): - raise RuntimeException("missing shared library {}",options.shared_lib) - language = Language(options.shared_lib, "wgsl") - parser = Parser() - parser.set_language(language) + language = Language(tree_sitter_wgsl.language()) + parser = Parser(language) print("{}: ".format(SCRIPT),flush=True,end='') From 566f047a6b161c4ef2cb9d927cb710a9f3c3d578 Mon Sep 17 00:00:00 2001 From: alan-baker Date: Mon, 7 Oct 2024 13:30:43 -0400 Subject: [PATCH 221/285] Make image reads and writes non-private (#4913) * This should have been done along with read/write storage images Co-authored-by: Mehmet Oguz Derin --- wgsl/index.bs | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/wgsl/index.bs b/wgsl/index.bs index 6544bce2bf..912a57aeb9 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -10798,6 +10798,16 @@ All non-atomic [=write accesses=] in the [=address spaces/storage=] or with `NonPrivatePointer | MakePointerAvailable` memory operands with the `Workgroup` scope. +All non-atomic [=read accesses=] in the the [=address spaces/handle=] address +space are considered [=memory model non-private|non-private=] and correspond to +read operations with `NonPrivateTexel | MakeTexelVisible` memory operands with +the `Workgroup` scope. 
+ +All non-atomic [=write accesses=] in the [=address spaces/handle=] address +space are considered [=memory model non-private|non-private=] and correspond to +write operations with `NonPrivateTexel | MakeTexelAvailable` memory operands +with the `Workgroup` scope. + # Execution # {#execution} [[#overview]] describes how a shader is invoked and partitioned into [=invocations=]. From 204f038b8663ba680231d95d2fc2df69697658f0 Mon Sep 17 00:00:00 2001 From: seven332 Date: Wed, 9 Oct 2024 21:03:09 +0800 Subject: [PATCH 222/285] Fix typo "vec3(e4,e4,e6)" -> "vec3(e4,e5,e6)" (#4915) --- wgsl/index.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index 912a57aeb9..b9310e9f33 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -13346,7 +13346,7 @@ specify the component type; the component type is inferred from the constructor
Description Construct a 3x3 column-major [=matrix=] from elements. - Same as mat3x3(vec3(e1,e2,e3), vec3(e4,e4,e6), vec3(e7,e8,e9)). + Same as mat3x3(vec3(e1,e2,e3), vec3(e4,e5,e6), vec3(e7,e8,e9)).
#### `mat3x4` #### {#mat3x4-builtin} From 245abe28a8adb90595af97c1a044b4a60b2d92b0 Mon Sep 17 00:00:00 2001 From: Samson <16504129+sagudev@users.noreply.github.com> Date: Wed, 9 Oct 2024 20:13:57 +0200 Subject: [PATCH 223/285] Check for supported context formats in `configure()` on content timeline (#4911) This fixes https://github.com/gpuweb/gpuweb/issues/4906 --- spec/index.bs | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 376e9085ed..c09f28fda7 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -13992,7 +13992,9 @@ interface GPUCanvasContext { 1. [=?=] [$Validate texture format required features$] of |configuration|.{{GPUCanvasConfiguration/format}} with |device|.{{GPUObjectBase/[[device]]}}. 1. [=?=] [$Validate texture format required features$] of each element of - |configuration|.{{GPUTextureDescriptor/viewFormats}} with |device|.{{GPUObjectBase/[[device]]}}. + |configuration|.{{GPUCanvasConfiguration/viewFormats}} with |device|.{{GPUObjectBase/[[device]]}}. + 1. If [=Supported context formats=] does not [=set/contain=] + |configuration|.{{GPUCanvasConfiguration/format}}, throw a {{TypeError}}. 1. Let |descriptor| be the [$GPUTextureDescriptor for the canvas and configuration$](|this|.{{GPUCanvasContext/canvas}}, |configuration|). 1. Set |this|.{{GPUCanvasContext/[[configuration]]}} to |configuration|. @@ -14008,8 +14010,6 @@ interface GPUCanvasContext {
- [$validating GPUTextureDescriptor$](|device|, |descriptor|) must return true. - - [=Supported context formats=] must [=set/contain=] - |configuration|.{{GPUCanvasConfiguration/format}}.
Note: This early validation remains valid until the next From 3846d7836654f2a7d9dac0dd4d78bed141315cf9 Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Wed, 9 Oct 2024 11:14:15 -0700 Subject: [PATCH 224/285] Add "float32-blendable" feature (#4896) --- spec/index.bs | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index c09f28fda7..a3683581a1 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -2812,6 +2812,7 @@ enum GPUFeatureName { "rg11b10ufloat-renderable", "bgra8unorm-storage", "float32-filterable", + "float32-blendable", "clip-distances", "dual-source-blending", }; @@ -16390,6 +16391,12 @@ This feature adds no [=optional API surfaces=]. Makes textures with formats {{GPUTextureFormat/"r32float"}}, {{GPUTextureFormat/"rg32float"}}, and {{GPUTextureFormat/"rgba32float"}} [=filterable=]. +

`"float32-blendable"` +

+ +Makes textures with formats {{GPUTextureFormat/"r32float"}}, {{GPUTextureFormat/"rg32float"}}, and +{{GPUTextureFormat/"rgba32float"}} [=blendable=]. +

`"clip-distances"`

@@ -16758,7 +16765,7 @@ The [=texel block memory cost=] of each of these formats is the same as its - {{GPUTextureSampleType/"float"}} if {{GPUFeatureName/"float32-filterable"}} is enabled ✓ - + If {{GPUFeatureName/"float32-blendable"}} is enabled ✓ ✓ @@ -16793,7 +16800,7 @@ The [=texel block memory cost=] of each of these formats is the same as its - {{GPUTextureSampleType/"float"}} if {{GPUFeatureName/"float32-filterable"}} is enabled ✓ - + If {{GPUFeatureName/"float32-blendable"}} is enabled ✓ @@ -16828,7 +16835,7 @@ The [=texel block memory cost=] of each of these formats is the same as its - {{GPUTextureSampleType/"float"}} if {{GPUFeatureName/"float32-filterable"}} is enabled ✓ - + If {{GPUFeatureName/"float32-blendable"}} is enabled ✓ From e7427e49f2949aa57ecab9f6bebb66026cb50e55 Mon Sep 17 00:00:00 2001 From: David Neto Date: Thu, 17 Oct 2024 17:24:26 -0400 Subject: [PATCH 225/285] Fix WGSL grammar flow for Python virtual environments (#4928) 1. move 'tree-sitter generate' step before npm install. 2. If we're operating in a virtual environment then user site packages are not accessible. In that case do an ordinary 'pip install' instead of a user-install of the tree-sitter-wgsl package. --- wgsl/tools/extract-grammar.py | 31 ++++++++++++++++++++++--------- 1 file changed, 22 insertions(+), 9 deletions(-) diff --git a/wgsl/tools/extract-grammar.py b/wgsl/tools/extract-grammar.py index ed50fc1294..dac58d6ef0 100755 --- a/wgsl/tools/extract-grammar.py +++ b/wgsl/tools/extract-grammar.py @@ -28,7 +28,7 @@ class Options(): A class to store various options including file paths and verbosity. 
""" def __init__(self,bs_filename, tree_sitter_dir, scanner_cc_filename, syntax_filename, syntax_dir): - self.script = 'extract-grammar.py' + self.script = os.path.basename(__file__) self.bs_filename = bs_filename self.grammar_dir = tree_sitter_dir self.scanner_cc_filename = scanner_cc_filename @@ -760,7 +760,7 @@ def read_spec(options): last_value) if scanner_span.name() == scanner_token.name(): result[scanner_span.name()][last_key] = last_value - scanner_i += scanner_parse[-1] # Advance line index + scanner_i += scanner_parse[-1] # Advance line index for j in scanner_spans: if scanner_span == j: # Check if we should stop using this scanner. @@ -925,7 +925,7 @@ def reproduce_rule(production): print("ERROR: Syntax source should match reproduction for styling and language") print("\n".join(difflib.unified_diff(syntax_source.splitlines(), syntax_target.splitlines()))) sys.exit(1) - + result[scanner_rule.name()] = syntax_dict return result @@ -1146,16 +1146,25 @@ def flow_build(options): os.makedirs(os.path.join(options.grammar_dir, "src"), exist_ok=True) scanner_cc_staging = os.path.join(options.grammar_dir, "src", "scanner.c") + if os.path.exists("grammar/src/tree_sitter/parser.h"): + print("{}: skipping tree-sitter generate: grammar/src/tree_sitter/parser.h already exists".format(options.script)) + else: + cmd = ["npx", "tree-sitter-cli@" + value_from_dotenv("NPM_TREE_SITTER_CLI_VERSION"), "generate"] + print("{}: {}".format(options.script, " ".join(cmd))) + subprocess.run(cmd, cwd=options.grammar_dir, check=True) + # Use "npm install" to create the tree-sitter CLI that has WGSL # support. But "npm install" fetches data over the network. # That can be flaky, so only invoke it when needed. if os.path.exists("grammar/node_modules/tree-sitter-cli"): # "npm install" has been run already. 
+ print("{}: skipping npm install: grammar/node_modules/tree-sitter-cli already exists".format(options.script)) pass else: - subprocess.run(["npm", "install"], cwd=options.grammar_dir, check=True) - subprocess.run(["npx", "tree-sitter-cli@" + value_from_dotenv("NPM_TREE_SITTER_CLI_VERSION"), "generate"], - cwd=options.grammar_dir, check=True) + cmd = ["npm", "install"] + print("{}: {}".format(options.script, " ".join(cmd))) + subprocess.run(cmd, cwd=options.grammar_dir, check=True) + # Following are commented for future reference to expose playground # Remove "--docker" if local environment matches with the container # subprocess.run(["npx", "tree-sitter-cli@" + value_from_dotenv("NPM_TREE_SITTER_CLI_VERSION"), "build-wasm", "--docker"], @@ -1163,12 +1172,16 @@ def flow_build(options): def build_library(output_path, input_files): """ - Run `python3 -m pip install -e . --user --break-system-packages` - in grammar_dir to install the tree-sitter language package + Build and install the tree-sitter language package """ try: + if "VIRTUAL_ENV" in os.environ: + cmd = ["python3", "-m", "pip", "install", "-e", "."] + else: + cmd = ["python3", "-m", "pip", "install", "-e", ".", "--user", "--break-system-packages"] + print("{}: {}".format(options.script, " ".join(cmd))) subprocess.run( - ["python3", "-m", "pip", "install", "-e", ".", "--user", "--break-system-packages"], + cmd, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, From 39dbefdfd01d7b787283d2ef47aa1a45f7a5c684 Mon Sep 17 00:00:00 2001 From: David Neto Date: Thu, 17 Oct 2024 18:01:45 -0400 Subject: [PATCH 226/285] avoid excessive rebuild and install of treesitter WGSL grammar (#4929) * Avoid excessive rebuild and install of treesitter WGSL grammar Use a grammar/build.stamp file to mark successful install of the package Update scripts to avoid references to the .so file, since now we rely on installing the treesitter parser into the Python environment. 
* The build flow should always run tree-sitter generate This avoids the problem of missing grammar/bindings/python/tree_sitter_wgsl/binding.c --- wgsl/Makefile | 15 +++++++++------ wgsl/tools/extract-grammar.py | 30 +++++++++++++++++------------- wgsl/tools/wgsl_unit_tests.py | 8 ++------ 3 files changed, 28 insertions(+), 25 deletions(-) diff --git a/wgsl/Makefile b/wgsl/Makefile index eaa7854e96..7e0e28714a 100644 --- a/wgsl/Makefile +++ b/wgsl/Makefile @@ -27,18 +27,21 @@ img/%.mmd.svg: diagrams/%.mmd ../tools/invoke-mermaid.sh ../tools/mermaid.json bash ../tools/invoke-mermaid.sh -i $< -o $@ TREESITTER_GRAMMAR_INPUT := grammar/grammar.js -TREESITTER_PARSER := grammar/src/scanner.o + +# A file used to signal the WGSL parser was successfully installed. +TREESITTER_PARSER_STAMP := grammar/build.stamp + # Extract WGSL grammar from the spec $(TREESITTER_GRAMMAR_INPUT): index.bs ./grammar/src/scanner.c ./tools/extract-grammar.py source ../tools/custom-action/dependency-versions.sh && python3 ./tools/extract-grammar.py --spec index.bs --scanner ./grammar/src/scanner.c --tree-sitter-dir grammar --flow x # Build a Treesitter parser to validate grammar extract and later examples in spec -$(TREESITTER_PARSER): $(TREESITTER_GRAMMAR_INPUT) +$(TREESITTER_PARSER_STAMP): $(TREESITTER_GRAMMAR_INPUT) source ../tools/custom-action/dependency-versions.sh && python3 ./tools/extract-grammar.py --spec index.bs --scanner ./grammar/src/scanner.c --tree-sitter-dir grammar --flow b .PHONY: validate-examples # Use Treesitter to parse many code examples in the spec. 
-validate-examples: $(TREESITTER_PARSER) +validate-examples: $(TREESITTER_PARSER_STAMP) source ../tools/custom-action/dependency-versions.sh && python3 ./tools/extract-grammar.py --flow e .PHONY: tspath_tests @@ -47,12 +50,12 @@ tspath_tests: ./tools/TSPath.py .PHONY: unit_tests # Use Treesitter to parse code samples -unit_tests: $(TREESITTER_PARSER) ./tools/wgsl_unit_tests.py - python3 ./tools/wgsl_unit_tests.py --parser $(TREESITTER_PARSER) +unit_tests: $(TREESITTER_PARSER_STAMP) ./tools/wgsl_unit_tests.py + python3 ./tools/wgsl_unit_tests.py # The grammar in JSON form, emitted by Treesitter. WGSL_GRAMMAR=grammar/src/grammar.json -$(WGSL_GRAMMAR) : $(TREESITTER_PARSER) +$(WGSL_GRAMMAR) : $(TREESITTER_PARSER_STAMP) .PHONY: nfkc nfkc: diff --git a/wgsl/tools/extract-grammar.py b/wgsl/tools/extract-grammar.py index dac58d6ef0..bff1ba31f6 100755 --- a/wgsl/tools/extract-grammar.py +++ b/wgsl/tools/extract-grammar.py @@ -1139,6 +1139,9 @@ def flow_build(options): print("missing grammar file: {}") return False + # The 'build.stamp' file is touched when the parser is successfully installed. 
+ stampfile = os.path.join(options.grammar_dir, 'build.stamp') + # External scanner for nested block comments # For the API, see https://tree-sitter.github.io/tree-sitter/creating-parsers#external-scanners # See: https://github.com/tree-sitter/tree-sitter-rust/blob/master/src/scanner.c @@ -1146,12 +1149,10 @@ def flow_build(options): os.makedirs(os.path.join(options.grammar_dir, "src"), exist_ok=True) scanner_cc_staging = os.path.join(options.grammar_dir, "src", "scanner.c") - if os.path.exists("grammar/src/tree_sitter/parser.h"): - print("{}: skipping tree-sitter generate: grammar/src/tree_sitter/parser.h already exists".format(options.script)) - else: - cmd = ["npx", "tree-sitter-cli@" + value_from_dotenv("NPM_TREE_SITTER_CLI_VERSION"), "generate"] - print("{}: {}".format(options.script, " ".join(cmd))) - subprocess.run(cmd, cwd=options.grammar_dir, check=True) + + cmd = ["npx", "tree-sitter-cli@" + value_from_dotenv("NPM_TREE_SITTER_CLI_VERSION"), "generate"] + print("{}: {}".format(options.script, " ".join(cmd))) + subprocess.run(cmd, cwd=options.grammar_dir, check=True) # Use "npm install" to create the tree-sitter CLI that has WGSL # support. But "npm install" fetches data over the network. 
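Reviewer note on the `build.stamp` approach in this patch: the idea can be sketched in isolation as below. This is a minimal sketch under stated assumptions, not the script's code. The body of the script's own `newer_than` helper is not shown in this series, so the version here is a hypothetical stand-in, and `needs_rebuild` and `touch` are illustrative names.

```python
import os
import tempfile


def newer_than(first, second):
    """Hypothetical stand-in: True if `first` was modified after `second`.

    A missing `second` (no stamp yet) is treated as out of date.
    """
    if not os.path.exists(second):
        return True
    if not os.path.exists(first):
        return False
    return os.path.getmtime(first) > os.path.getmtime(second)


def needs_rebuild(inputs, stampfile):
    """Rebuild when any input is newer than the stamp file."""
    return any(newer_than(path, stampfile) for path in inputs)


def touch(path):
    """Create or truncate `path`, marking a successful install."""
    with open(path, "w"):
        pass
```

The stamp is written only after the install step succeeds, so a failed `pip install` leaves it stale and the next run retries.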
@@ -1170,7 +1171,7 @@ def flow_build(options): # subprocess.run(["npx", "tree-sitter-cli@" + value_from_dotenv("NPM_TREE_SITTER_CLI_VERSION"), "build-wasm", "--docker"], # cwd=options.grammar_dir, check=True) - def build_library(output_path, input_files): + def build_library(input_files): """ Build and install the tree-sitter language package """ @@ -1188,6 +1189,9 @@ def build_library(output_path, input_files): text=True, cwd=options.grammar_dir ) + + with open(stampfile, 'w') as f: + print("created file: {}".format(stampfile)) print("Tree-sitter language package installed successfully.") except subprocess.CalledProcessError as e: print(f"Error installing tree-sitter language package: {e}") @@ -1195,11 +1199,11 @@ def build_library(output_path, input_files): print(f"stderr: {e.stderr}") return False - if newer_than(scanner_cc_staging, options.wgsl_shared_lib) or newer_than(options.grammar_filename,options.wgsl_shared_lib): - print("{}: ...Building custom scanner: {}".format(options.script,options.wgsl_shared_lib)) - build_library(options.wgsl_shared_lib, - [scanner_cc_staging, - os.path.join(options.grammar_dir,"src","parser.c")]) + if newer_than(scanner_cc_staging, stampfile) or newer_than(options.grammar_filename,stampfile): + print("{}: ...Building custom scanner".format(options.script)) + build_library([scanner_cc_staging, os.path.join(options.grammar_dir,"src","parser.c")]) + else: + print("{}: ...Skip building tree_sitter_wgsl: grammar/build.stamp is fresh".format(options.script)) return True def flow_examples(options,scan_result): @@ -1342,7 +1346,7 @@ def main(): return 1 if 't' in args.flow: import wgsl_unit_tests - test_options = wgsl_unit_tests.Options(options.wgsl_shared_lib) + test_options = wgsl_unit_tests.Options() if not wgsl_unit_tests.run_tests(test_options): return 1 return 0 diff --git a/wgsl/tools/wgsl_unit_tests.py b/wgsl/tools/wgsl_unit_tests.py index 6f42ebefbf..1f01a5adeb 100644 --- a/wgsl/tools/wgsl_unit_tests.py +++ 
b/wgsl/tools/wgsl_unit_tests.py @@ -92,8 +92,7 @@ def GetCases(): return cases class Options: - def __init__(self,shared_lib): - self.shared_lib = shared_lib + def __init__(self): self.verbose = False def run_tests(options): @@ -131,12 +130,9 @@ def main(): argparser.add_argument("--verbose","-v", action='store_true', help="be verbose") - argparser.add_argument("--parser", - help="path the shared library for the WGSL tree-sitter parser", - default="grammar/build/wgsl.so") args = argparser.parse_args() - options = Options(args.parser) + options = Options() options.verbose = args.verbose if not run_tests(options): From 91f09120cc6f1168069ce7d2d824ab6e4526f9f0 Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Thu, 17 Oct 2024 21:03:18 -0400 Subject: [PATCH 227/285] Move specs to CG-DRAFT status (#4931) This should fix the build for now. See #4924 --- spec/index.bs | 2 +- wgsl/index.bs | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index a3683581a1..f1025b66d4 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -2,7 +2,7 @@ Title: WebGPU Shortname: webgpu Level: None -Status: w3c/ED +Status: w3c/CG-DRAFT Group: webgpu ED: https://gpuweb.github.io/gpuweb/ TR: https://www.w3.org/TR/webgpu/ diff --git a/wgsl/index.bs b/wgsl/index.bs index b9310e9f33..936eefbd5e 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -2,7 +2,7 @@ Title: WebGPU Shading Language Shortname: WGSL Level: None -Status: w3c/ED +Status: w3c/CG-DRAFT Group: webgpu ED: https://gpuweb.github.io/gpuweb/wgsl/ TR: https://www.w3.org/TR/WGSL/ From 0b4fc951daff199e56f93ade2d47ac7ad4103edd Mon Sep 17 00:00:00 2001 From: David Neto Date: Fri, 18 Oct 2024 11:15:57 -0400 Subject: [PATCH 228/285] wgsl: Define 'finite range' (#4930) --- wgsl/index.bs | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index 936eefbd5e..d810217604 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -12235,20 +12235,28 
@@ An [[!IEEE-754|IEEE-754]] binary floating point type approximates the [=extended * A fixed-width trailing significand field. * An integer-valued exponent bias related to interpretation of the [=ieee754/exponent field=]. +The finite range of a floating point type is the [=interval=] [|low|, |high|], +where |low| is the lowest finite value in the type, and |high| is the highest finite value in the type. + The IEEE-754 floating point types of interest are: * binary16: * [=ieee754/exponent field=] width 5 * [=ieee754/trailing significand field=] width 10 * [=ieee754/exponent bias=] 15 + * [=finite range=]: [−65504, 65504] * binary32: * [=ieee754/exponent field=] width 8 * [=ieee754/trailing significand field=] width 23 * [=ieee754/exponent bias=] 127 + * [=finite range=]: [ − (2 − 2<sup>−23</sup>) × 2<sup>127</sup>, (2 − 2<sup>−23</sup>) × 2<sup>127</sup> ], + or approximately [ −3.4028235 × 10<sup>38</sup>, 3.4028235 × 10<sup>38</sup> ]. * binary64: * [=ieee754/exponent field=] width 11 * [=ieee754/trailing significand field=] width 52 * [=ieee754/exponent bias=] 1023 + * [=finite range=]: [ − (2 − 2<sup>−52</sup>) × 2<sup>1023</sup>, (2 − 2<sup>−52</sup>) × 2<sup>1023</sup> ], + or approximately [ − 1.7976931348623157 × 10<sup>308</sup>, 1.7976931348623157 × 10<sup>308</sup> ]. The following algorithm maps a bit representation of a floating point value to its corresponding [=extended real=] value, or NaN:
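Reviewer note on the finite-range bullets added in this hunk (this is separate from the bit-mapping algorithm that the last context line introduces): the listed bounds can be checked from the field widths alone. The sketch below assumes that the largest finite value has an all-ones trailing significand and an unbiased exponent equal to the exponent bias, which holds for the three formats listed. `finite_range` is an illustrative name, not something defined by the spec.

```python
import math
import sys


def finite_range(trailing_significand_width, exponent_bias):
    """Return (low, high) for an IEEE-754 binary format.

    Assumes the maximum finite value is (2 - 2**-p) * 2**emax,
    with p the trailing significand width and emax equal to the bias.
    """
    significand = 2.0 - 2.0 ** -trailing_significand_width
    high = math.ldexp(significand, exponent_bias)
    return (-high, high)


BINARY16 = finite_range(10, 15)
BINARY32 = finite_range(23, 127)
BINARY64 = finite_range(52, 1023)
```

For binary64 the result coincides with the largest Python `float`, since Python floats are binary64 on common platforms.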
@@ -12294,7 +12302,7 @@ The domain of a floating point operation is the set of [=extended rea Rounding maps an [=extended real=] value |x| to a value x' in the floating point type. When |x| is in the floating point type, then rounding maps |x| to itself: |x| = x'. -Rounding may [=ieee754/overflow=] when |x| is outside the finite range of the type. +Rounding may [=ieee754/overflow=] when |x| is outside the [=finite range=] of the type. Otherwise x' is the either the lowest floating point value above |x|, or the highest floating point value below |x|; a rounding mode determines which one is chosen. @@ -12313,7 +12321,7 @@ IEEE-754 defines five kinds of exceptions: * Division by zero. This occurs when an operation on finite operands is defined as having an exact infinite result. Examples are 1 ÷ 0, and log(0). -* Overflow. This occurs when an [=intermediate result=] exceeds the finite range of the type. See [[#floating-point-overflow]]. +* Overflow. This occurs when an [=intermediate result=] exceeds the [=finite range=] of the type. See [[#floating-point-overflow]]. * Underflow. This occurs when the [=intermediate result=] or the rounded result is [=ieee754/subnormal=]. * Inexact. This occurs when the rounded result is different from the [=intermediate result=], or when overflow occurs. @@ -12380,7 +12388,7 @@ Let *X* be an infinitely precise [=intermediate result=] from a floating point c The final value of the expression is determined in two stages, via [=intermediate result=] values *X'* and *X''* as follows: From *X*, compute *X'* in *T* by rounding: -* If *X* is in the finite range of *T* then *X'* is the result of rounding *X* up or down. +* If *X* is in the [=finite range=] of *T* then *X'* is the result of rounding *X* up or down. * If *X* is NaN, then *X'* is NaN. * If *MAX(T)* < *X* < 2*EMAX(T)+1*, then either rounding direction is used: *X'* is *MAX(T)* or [PINF]. * If 2*EMAX(T)+1* ≤ *X*, then *X'* = [PINF]. 
@@ -12417,7 +12425,7 @@ the correctly rounded result may be finite or infinite. The units in the last place, ULP, for a floating point number `x` is defined as follows [[!Muller2005]]: -* If `x` is in the finite range of the floating point type, then ULP(x) is +* If `x` is in the [=finite range=] of the floating point type, then ULP(x) is the minimum distance between two non-equal, finite floating point numbers `a` and `b` such that `a` ≤ `x` ≤ `b` (i.e. `ulp(x) = min``a,b``|b - a|`). @@ -12449,7 +12457,7 @@ possibilities: When the accuracy for an operation is specified over an input range, the accuracy is undefined for input values outside that range. -If an allowed result is outside the finite range of the result type, then +If an allowed result is outside the [=finite range=] of the result type, then the rules in [[#floating-point-overflow]] apply. #### Accuracy of Concrete Floating Point Expressions #### {#concrete-float-accuracy} @@ -12717,7 +12725,7 @@ When converting a [=numeric scalar=] value to a floating point type: then the result is one of those two values. WGSL does not specify whether the larger or smaller representable value is chosen, and different instances of such a conversion may choose differently. - * Otherwise, the original value lies outside the finite range of the destination type: + * Otherwise, the original value lies outside the [=finite range=] of the destination type: * A [=shader-creation error=] results if the original expression is a [=const-expression=]. * A [=pipeline-creation error=] results if the original expression is an [=override-expression=]. * Otherwise the conversion proceeds as follows: @@ -15591,7 +15599,7 @@ but a value may infer the type. a [[!IEEE-754|IEEE-754]] [=ieee754/binary16=] value, and then converted back to a IEEE-754 [=ieee754/binary32=] value. 
- If `e` is outside the finite range of binary16, then: + If `e` is outside the [=finite range=] of binary16, then: * It is a [=shader-creation error=] if `e` is a [=const-expression=]. * It is a [=pipeline-creation error=] if `e` is an [=override-expression=]. * Otherwise the result is an [=indeterminate value=] for `T`. @@ -18239,7 +18247,7 @@ Note: For packing snorm values, the normalized floating point values are in the 16 × `i` + 15 of the result. See [[#floating-point-conversion]]. - If either `e[0]` or `e[1]` is outside the finite range of binary16 then: + If either `e[0]` or `e[1]` is outside the [=finite range=] of binary16 then: * It is a [=shader-creation error=] if `e` is a [=const-expression=]. * It is a [=pipeline-creation error=] if `e` is an [=override-expression=]. * Otherwise the result is an [=indeterminate value=] for u32. From 1e1f0e068a6a9db2ebe93384d3e6756c875e03e6 Mon Sep 17 00:00:00 2001 From: Mehmet Oguz Derin Date: Sat, 19 Oct 2024 00:16:17 +0900 Subject: [PATCH 229/285] Similarity of variable and value declarations is a non-normative note (#4932) --- wgsl/index.bs | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index d810217604..c05699e5d5 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -4617,7 +4617,13 @@ declaration until the end of the brace-delimited list of statements immediately enclosing the declaration. A function-scope declaration is a [=dynamic context=]. -Variable and value declarations have a similar overall syntax: +
+Variable and value declarations have a similar overall syntax. The following non-normative +illustration shows the general form of variable and value declarations, where `[...]` denotes +optional parts, `...*` denotes zero or more repetitions of the preceding, and `...+` denotes one or +more repetitions of the preceding. For specific syntactic rules, see the respective sections for +the elements. + // Specific value declarations. const name [: type] = initializer ; @@ -4639,6 +4645,7 @@ Variable and value declarations have a similar overall syntax: [attribute]+ var name : sampler_type; [attribute]+ var<storage[, access_mode]> name : type; +
Each such declaration [=shader-creation error|must=] have an explicitly specified type or an initializer. From a7b9e9dfbf704db95a893c8dcf873e1a27760292 Mon Sep 17 00:00:00 2001 From: alan-baker Date: Fri, 18 Oct 2024 13:29:52 -0400 Subject: [PATCH 230/285] Require delta in up/down shuffle to be dynamically uniform (#4917) * Required by MSL --- proposals/subgroups.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/subgroups.md b/proposals/subgroups.md index dc9a9135f3..82b50c8f0e 100644 --- a/proposals/subgroups.md +++ b/proposals/subgroups.md @@ -95,8 +95,8 @@ Using f16 as a parameter in any of these functions requires `subgroups_f16` to b | `fn subgroupBallot(pred : bool) -> vec4` | | Returns a set of bitfields where the bit corresponding to subgroup_invocation_id is 1 if `pred` is true for that active invocation and 0 otherwise. | | `fn subgroupShuffle(v : T, id : I) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types
`I` must be u32 or i32 | Returns `v` from the active invocation whose subgroup_invocation_id matches `id` | | `fn subgroupShuffleXor(v : T, mask : u32) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Returns `v` from the active invocation whose subgroup_invocation_id matches `subgroup_invocation_id ^ mask`.
`mask` must be dynamically uniform<sup>1</sup> |
-| `fn subgroupShuffleUp(v : T, delta : u32) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Returns `v` from the active invocation whose subgroup_invocation_id matches `subgroup_invocation_id - delta` |
-| `fn subgroupShuffleDown(v : T, delta : u32) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Returns `v` from the active invocation whose subgroup_invocation_id matches `subgroup_invocation_id + delta` |
+| `fn subgroupShuffleUp(v : T, delta : u32) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Returns `v` from the active invocation whose subgroup_invocation_id matches `subgroup_invocation_id - delta`<br>`delta` must be dynamically uniform<sup>1</sup> |
+| `fn subgroupShuffleDown(v : T, delta : u32) -> T` | `T` must be u32, i32, f32, f16 or a vector of those types | Returns `v` from the active invocation whose subgroup_invocation_id matches `subgroup_invocation_id + delta`<br>`delta` must be dynamically uniform<sup>1</sup> |
 | `fn subgroupAdd(e : T) -> T` | `T` must be u32, i32, f32, or a vector of those types | Reduction<br>Adds `e` among all active invocations and returns that result |
 | `fn subgroupExclusiveAdd(e : T) -> T)` | `T` must be u32, i32, f32, f16 or a vector of those types | Exclusive scan<br>Returns the sum of `e` for all active invocations with subgroup_invocation_id less than this invocation |
 | `fn subgroupInclusiveAdd(e : T) -> T)` | `T` must be u32, i32, f32, f16 or a vector of those types | Inclusive scan
Returns the sum of `e` for all active invocations with subgroup_invocation_id less than or equal to this invocation | From 6e41932ccacfcd2609232eacc2974c6dd97d356b Mon Sep 17 00:00:00 2001 From: alan-baker Date: Fri, 18 Oct 2024 14:57:05 -0400 Subject: [PATCH 231/285] Add an appendix for CTS status (#4923) * Tracks CTS status * Made discussion issue a link --- proposals/subgroups.md | 47 ++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 45 insertions(+), 2 deletions(-) diff --git a/proposals/subgroups.md b/proposals/subgroups.md index 82b50c8f0e..4783a39753 100644 --- a/proposals/subgroups.md +++ b/proposals/subgroups.md @@ -2,9 +2,9 @@ Status: **Draft** -Last modified: 2023-11-07 +Last modified: 2024-10-16 -Issue: #4306 +Issue: [#4306](https://github.com/gpuweb/gpuweb/issues/4306) # Requirements @@ -252,3 +252,46 @@ D3D12 would have to be proven empricially. 1. All group non-uniform instructions use the `Subgroup` scope. 2. To avoid constant-expression requirement, use SPIR-V 1.5 or OpGroupNonUniformShuffle. 
+
+# Appendix C: CTS Status
+
+Last updated: 2024-10-16
+
+| Built-in value | Validation | Compute | Fragment |
+| --- | --- | --- | --- |
+| `subgroup_invocation_id` | ✓ | ✓ | ✓ |
+| `subgroup_size` | ✓ | ✓ | ✓ |
+
+| Built-in function | Validation | Compute | Fragment |
+| --- | --- | --- | --- |
+| `subgroupElect` | ✓ | ✗ | ✗ |
+| `subgroupAll` | ✓ | ✓ | ✓ |
+| `subgroupAny` | ✓ | ✓ | ✓ |
+| `subgroupBroadcast` | ✓ | ✓ | ✗ |
+| `subgroupBroadcastFirst`<sup>1</sup> | ✓ | ✗ | ✗ |
+| `subgroupBallot` | ✓ | ✓ | ✗ |
+| `subgroupShuffle` | ✓ | ✗ | ✗ |
+| `subgroupShuffleXor` | ✓ | ✗ | ✗ |
+| `subgroupShuffleUp` | ✓ | ✗ | ✗ |
+| `subgroupShuffleDown` | ✓ | ✗ | ✗ |
+| `subgroupAdd` | ✓ | ✓ | ✗ |
+| `subgroupExclusiveAdd` | ✓ | ✓ | ✗ |
+| `subgroupInclusiveAdd` | ✓ | ✓ | ✗ |
+| `subgroupMul` | ✓ | ✓ | ✗ |
+| `subgroupExclusiveMul` | ✓ | ✓ | ✗ |
+| `subgroupInclusiveMul` | ✓ | ✓ | ✗ |
+| `subgroupAnd` | ✓ | ✓ | ✓ |
+| `subgroupOr` | ✓ | ✓ | ✓ |
+| `subgroupXor` | ✓ | ✓ | ✓ |
+| `subgroupMin` | ✓ | ✗ | ✗ |
+| `subgroupMax` | ✓ | ✗ | ✗ |
+| `quadBroadcast` | ✓ | ✓ | ✓ |
+| `quadSwapX` | ✓ | ✓ | ✓ |
+| `quadSwapY` | ✓ | ✓ | ✓ |
+| `quadSwapDiagonal` | ✓ | ✓ | ✓ |
+1. Indirectly tested via other built-in functions.
+
+| Diagnostic | Validation |
+| --- | --- |
+| `subgroup_uniformity` | ✗ |
+| `subgroup_branching` | ✗ |

From 86c3257e1f264d17f0881c8bfb087fddd9de22aa Mon Sep 17 00:00:00 2001
From: David Neto
Date: Fri, 18 Oct 2024 16:46:35 -0400
Subject: [PATCH 232/285] wgsl: Remove [INF] macro and its uses (#4933)

The IEEE spec uses ∞ by itself to indicate either positive or negative infinity.
The WGSL spec was using it to mean positive infinity; replace those uses by [PINF].
---
 wgsl/index.bs | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/wgsl/index.bs b/wgsl/index.bs
index c05699e5d5..7a8bbe0743 100644
--- a/wgsl/index.bs
+++ b/wgsl/index.bs
@@ -19,7 +19,6 @@ Text Macro: ALLINTEGRALDECL S is AbstractInt, i32, or u32
T is S or vecN<S Text Macro: ALLFLOATINGDECL S is AbstractFloat, f32, or f16
T is S or vecN<S> Text Macro: ALLNUMERICDECL S is AbstractInt, AbstractFloat, i32, u32, f32, or f16
T is S, or vecN<S> Text Macro: ALLSIGNEDNUMERICDECL S is AbstractInt, AbstractFloat, i32, f32, or f16
T is S, or vecN<S> -Text Macro: INF ∞ Text Macro: PINF +∞ Text Macro: NINF −∞ Ignored Vars: i, c0, e, e1, e2, e3, edge, eN, p, s1, s2, sn, AS, AM, N, newbits, M, C, R, v, Stride, Offset, Align, Extent, T, T1, E, S, F, x, y, a, b @@ -500,7 +499,7 @@ sense. Specifically: Then the area |a| is a hyperbolic angle such that |x| is the hyperbolic cosine of |a|, and |y| is the hyperbolic sine of |a|. -Positive infinity, denoted by ∞ or [PINF], is a unique value strictly greater than all real numbers. +Positive infinity, denoted by [PINF], is a unique value strictly greater than all real numbers. Negative infinity, denoted by [NINF], is a unique value strictly lower than all real numbers. @@ -12324,7 +12323,7 @@ IEEE-754 defines five kinds of exceptions: * Invalid operation. This occurs when an operation is evaluated on [=extended real=] inputs outside its [=domain=]. Such operations yield a NaN. - Examples of invalid operations are 0 × [INF], and `sqrt`(−1). + Examples of invalid operations are 0 × [PINF], and `sqrt`(−1). * Division by zero. This occurs when an operation on finite operands is defined as having an exact infinite result. Examples are 1 ÷ 0, and log(0). @@ -14135,7 +14134,7 @@ fn num_point_lights() -> u32 { Description Returns the inverse hyperbolic cosine (cosh-1) of `x`, as a [=hyperbolic angle=].
- That is, approximates `a` with 0 ≤ a ≤ [INF], such that `cosh`(`a`) = `x`. + That is, approximates `a` with 0 ≤ a ≤ [PINF], such that `cosh`(`a`) = `x`. [=Component-wise=] when `T` is a vector. From 23671a8e292d712a08a02f4eea5bc11a067e1155 Mon Sep 17 00:00:00 2001 From: David Neto Date: Fri, 18 Oct 2024 17:48:43 -0400 Subject: [PATCH 233/285] wgsl: rebuild grammar module only if grammar contents has changed (#4934) This allows even less rebuilding. Most edits to index.bs don't change the grammar. --- wgsl/tools/extract-grammar.py | 36 +++++++++++++++++++++++++++-------- 1 file changed, 28 insertions(+), 8 deletions(-) diff --git a/wgsl/tools/extract-grammar.py b/wgsl/tools/extract-grammar.py index bff1ba31f6..6ea200db16 100755 --- a/wgsl/tools/extract-grammar.py +++ b/wgsl/tools/extract-grammar.py @@ -1131,14 +1131,29 @@ def not_token_only(value): def flow_build(options): """ - Build the shared library for the custom tree-sitter scanner. + Build and install the tree_sitter_wgsl Python module, including the custom scanner """ - print("{}: Build...".format(options.script)) + print("{}: Build tree_sitter_wgsl...".format(options.script)) if not os.path.exists(options.grammar_filename): print("missing grammar file: {}") return False + # Only rebuild if the grammar has changed. + grammar_is_fresh = True + with open(options.grammar_filename,"r") as current_file: + current_lines = current_file.readlines() + previously_scanned_grammar_file = options.grammar_filename + ".pre" + if os.path.exists(previously_scanned_grammar_file): + # Check against previously scanned text + with open(previously_scanned_grammar_file,"r") as previous_file: + previous_lines = previous_file.readlines() + grammar_is_fresh = current_lines != previous_lines + + if not grammar_is_fresh: + print("{}: ...Skip rebuilding because the grammar has not changed".format(options.script)) + return True + # The 'build.stamp' file is touched when the parser is successfully installed. 
stampfile = os.path.join(options.grammar_dir, 'build.stamp') @@ -1151,7 +1166,7 @@ def flow_build(options): cmd = ["npx", "tree-sitter-cli@" + value_from_dotenv("NPM_TREE_SITTER_CLI_VERSION"), "generate"] - print("{}: {}".format(options.script, " ".join(cmd))) + print("{}: {}".format(options.script, " ".join(cmd))) subprocess.run(cmd, cwd=options.grammar_dir, check=True) # Use "npm install" to create the tree-sitter CLI that has WGSL @@ -1159,11 +1174,11 @@ def flow_build(options): # That can be flaky, so only invoke it when needed. if os.path.exists("grammar/node_modules/tree-sitter-cli"): # "npm install" has been run already. - print("{}: skipping npm install: grammar/node_modules/tree-sitter-cli already exists".format(options.script)) + print("{}: skipping npm install: grammar/node_modules/tree-sitter-cli already exists".format(options.script)) pass else: cmd = ["npm", "install"] - print("{}: {}".format(options.script, " ".join(cmd))) + print("{}: {}".format(options.script, " ".join(cmd))) subprocess.run(cmd, cwd=options.grammar_dir, check=True) # Following are commented for future reference to expose playground @@ -1180,7 +1195,7 @@ def build_library(input_files): cmd = ["python3", "-m", "pip", "install", "-e", "."] else: cmd = ["python3", "-m", "pip", "install", "-e", ".", "--user", "--break-system-packages"] - print("{}: {}".format(options.script, " ".join(cmd))) + print("{}: {}".format(options.script, " ".join(cmd))) subprocess.run( cmd, check=True, @@ -1190,11 +1205,16 @@ def build_library(input_files): cwd=options.grammar_dir ) + # Save the grammar contents for comparing against next time. 
+ with open(previously_scanned_grammar_file,"w") as previous_file: + for line in current_lines: + previous_file.write(line) + previous_file.close() with open(stampfile, 'w') as f: print("created file: {}".format(stampfile)) - print("Tree-sitter language package installed successfully.") + print("{}: ...Successfully built and installed tree_sitter_wgsl.".format(options.script)) except subprocess.CalledProcessError as e: - print(f"Error installing tree-sitter language package: {e}") + print(f"Error installing tree_sitter_wgsl language package: {e}") print(f"stdout: {e.stdout}") print(f"stderr: {e.stderr}") return False From 485366ec66c69eec4e041d403a73022c040c29ee Mon Sep 17 00:00:00 2001 From: fyellin Date: Thu, 24 Oct 2024 20:40:59 -0700 Subject: [PATCH 234/285] Fix typo x4 (#4936) verticies => vertices --- spec/index.bs | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index f1025b66d4..2d7573d48a 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -12352,7 +12352,7 @@ It must only be included by interfaces which also include those mixins. [=Queue timeline=] steps: 1. Draw |instanceCount| instances, starting with instance |firstInstance|, of - primitives consisting of |vertexCount| verticies, starting with vertex |firstVertex|, + primitives consisting of |vertexCount| vertices, starting with vertex |firstVertex|, with the states from |bindingState| and |renderState|.
@@ -12413,7 +12413,7 @@ It must only be included by interfaces which also include those mixins. [=Queue timeline=] steps: 1. Draw |instanceCount| instances, starting with instance |firstInstance|, of - primitives consisting of |indexCount| indexed verticies, starting with index + primitives consisting of |indexCount| indexed vertices, starting with index |firstIndex| from vertex |baseVertex|, with the states from |bindingState| and |renderState|.
@@ -12499,7 +12499,7 @@ It must only be included by interfaces which also include those mixins. 1. Let |firstInstance| be an unsigned 32-bit integer read from |indirectBuffer| at (|indirectOffset| + 12) bytes. 1. Draw |instanceCount| instances, starting with instance |firstInstance|, of - primitives consisting of |vertexCount| verticies, starting with vertex |firstVertex|, + primitives consisting of |vertexCount| vertices, starting with vertex |firstVertex|, with the states from |bindingState| and |renderState|.
@@ -12583,7 +12583,7 @@ It must only be included by interfaces which also include those mixins. 1. Let |firstInstance| be an unsigned 32-bit integer read from |indirectBuffer| at (|indirectOffset| + 16) bytes. 1. Draw |instanceCount| instances, starting with instance |firstInstance|, of - primitives consisting of |indexCount| indexed verticies, starting with index + primitives consisting of |indexCount| indexed vertices, starting with index |firstIndex| from vertex |baseVertex|, with the states from |bindingState| and |renderState|. From fc8eddfa56ac54c438e1562a49ce7768ee6ae4cd Mon Sep 17 00:00:00 2001 From: David Neto Date: Mon, 28 Oct 2024 16:15:05 -0400 Subject: [PATCH 235/285] wgsl: cleanup uses of "infinity" in "differences from IEEE" and fp conversion sections (#4935) * wgsl: cleanup uses of "infinity" in "differences from IEEE" and fp conversion sections "conversion" -> "scalar conversion" and fix crossreferences Disentangle the shader-creation and pipeline-creation errors from the semantics. Define "Finite math assumption". Rephrase floating point conversion for infinities and shader and pipeline errors. Make it a real "Algorithm" section with clickable variable references. Fixed: #3135 * Apply review feedback. - generalize the second bullet about when scalar conversions occur. - using more precise constants for floating point type minima and maxima --- wgsl/index.bs | 127 +++++++++++++++++++++++++++++--------------------- 1 file changed, 75 insertions(+), 52 deletions(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index 7a8bbe0743..2b6936c85c 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -2748,6 +2748,13 @@ The numeric scalar types are [=AbstractInt=], The integer scalar types are [=AbstractInt=], [=i32=], and [=u32=]. +A scalar conversion maps a value in one scalar type to a value in a different scalar type. +Generally the result value is close to the original value, within the limitations of the destination type. 
+Scalar conversions occur either: +* By explicitly invoking a [[#value-constructor-builtin-function|value constructor]], or +* When converting a [=const-expression=] in an [=abstract numeric type=] to another type, + via a [=feasible automatic conversion=]. + ### Vector Types ### {#vector-types} A vector is a grouped sequence of 2, 3, or 4 [=scalar=] @@ -12231,7 +12238,7 @@ An [[!IEEE-754|IEEE-754]] binary floating point type approximates the [=extended See [[#differences-from-ieee754]]. * The type supports operations including: * Basic arithmetic such as: addition (`+`), subtraction (`-`), mutiplication (`*`), and division (`/`). - * Conversion to and from other numeric types. + * [=Scalar conversion=] to and from other numeric types. * Built-in functions such as: [[#max-float-builtin|max]], [[#sqrt-builtin|sqrt]], [[#cos-builtin|cos]] *
Note: Infinities are ordinary participants in most operations. For example adding [PINF] to 5 produces [PINF].
* The type has a bit representation characterized by: @@ -12256,13 +12263,13 @@ The IEEE-754 floating point types of interest are: * [=ieee754/trailing significand field=] width 23 * [=ieee754/exponent bias=] 127 * [=finite range=]: [ − (2 − 2−23) × 2127, (2 − 2−23) × 2127 ], - or approximately [ −3.4028235 × 1038, 3.4028235 × 1038 ]. + or approximately [ − 3.4028235 × 1038, 3.4028235 × 1038 ]. * binary64: * [=ieee754/exponent field=] width 11 * [=ieee754/trailing significand field=] width 52 * [=ieee754/exponent bias=] 1023 * [=finite range=]: [ − (2 − 2−52) × 21023, (2 − 2−52) × 21023 ], - or approximately [ − 1.7976931348623157 × 10308, 1.7976931348623157 × 10308 ]. + or approximately [ − 1.798 × 10308, 1.798 × 10308 ]. The following algorithm maps a bit representation of a floating point value to its corresponding [=extended real=] value, or NaN:
@@ -12300,9 +12307,9 @@ The following algorithm maps a bit representation of a floating point value to i The domain of a floating point operation is the set of [=extended real=] number inputs for which the operation is well defined. * For example, the domain of the mathematical function √ is the interval [0,[PINF]]: √ is not well defined for inputs less than zero. -* When applied to an input *inside* its [=domain=], an operation is defined in terms of an infinitely precise [=extended real=] intermediate result, +* When evaluated *inside* its [=domain=], an operation is defined in terms of an infinitely precise [=extended real=] intermediate result, which is then converted to a floating point result, via [=rounding=]. -* When an operation is evaluated *outside* its [=domain=], +* When evaluated *outside* its [=domain=], the default exception handling rules of IEEE-754 require an implementation to generate an [=ieee754/exception=] and yield a [=NaN=] value. In contrast, WGSL does not mandate floating point exceptions, and may instead yield an [=indeterminate value=]. See [[#differences-from-ieee754]]. @@ -12327,7 +12334,7 @@ IEEE-754 defines five kinds of exceptions: * Division by zero. This occurs when an operation on finite operands is defined as having an exact infinite result. Examples are 1 ÷ 0, and log(0). -* Overflow. This occurs when an [=intermediate result=] exceeds the [=finite range=] of the type. See [[#floating-point-overflow]]. +* Overflow. This occurs when an [=intermediate result=] exceeds the [=finite range=] of the type. See [[#floating-point-rounding-and-overflow]]. * Underflow. This occurs when the [=intermediate result=] or the rounded result is [=ieee754/subnormal=]. * Inexact. This occurs when the rounded result is different from the [=intermediate result=], or when overflow occurs. 
@@ -12336,34 +12343,38 @@ IEEE-754 defines five kinds of exceptions: WGSL follows the [[!IEEE-754|IEEE-754]] standard, but with the following differences: * No [=ieee754/rounding mode=] is specified. An implementation may round an [=intermediate result=] up or down. +* When [=scalar conversion|converting=] a floating point value |x| to an integer type, + |x| is first clamped to the value range of the target type. See [[#floating-point-conversion]]. * No floating point [=ieee754/exceptions=] are generated. * A floating point operation in WGSL [=behavioral requirement|will=] produce an [=intermediate result=] according to IEEE-754 rules, but exceptions mandated by IEEE-754 will map to different behaviors depending on whether the expression is a [=const-expression=], an [=override-expression=], or a [=runtime expression=]. * Consider an operation on finite operands. - The operation produces overflow, infinity, or a NaN if and only if IEEE-754 would require the + The [=intermediate result=] for an operation produces overflow, infinity, or a NaN if and only if IEEE-754 would require the operation to signal an [=ieee754/overflow=], [=ieee754/invalid operation=], or [=ieee754/division by zero=] exception. + Behavior is further modified by the [=Finite Math Assumption=]. * Signaling NaNs may not be generated. - Any signaling NaN may be converted to a quiet NaN. -* Overflow, infinities, and NaNs generated before [=shader execution start|runtime=] [=behavioral requirement|will=] generate errors. - * [=Const-expressions=] and [=override-expressions=] over finite values - [=behavioral requirement|will=] generate overflow, infinities, and NaNs - as [=intermediate result=] values, following IEEE-754 rules. - * Note: This rule requires implementations to reliably detect overflow, infinities, and NaNs - to within accuracy limits for these kinds of expressions, so that errors can be generated consistently. 
- * A [=shader-creation error=] results if any [=const-expression=] of - floating-point type overflows or evaluates to NaN or infinity. - * A [=pipeline-creation error=] results if any [=override-expression=] of - floating-point type overflows or evaluates to NaN or infinity. -* Implementations may assume that overflow, infinities, and NaNs are not present at runtime. - * In such an implementation, if the [=intermediate result=] of evaluating a [=runtime expression=] overflows, - or yields infinity or a NaN, the final result [=behavioral requirement|will=] be - an [=indeterminate value=] of the target type. - * Note: This means some functions (e.g. `min` and `max`) - may not return the expected result due to optimizations about the presence - of NaNs and infinities. -* Implementations may ignore the [=ieee754/sign field=] of a zero. + In an intermediate calculation, any signaling NaN may be converted to a quiet NaN. +* Finite Math Assumption: + * [=ieee754/Overflow=], infinities, and NaNs generated before [=shader execution start|shader execution=] [=behavioral requirement|will=] generate errors. + * [=Const-expressions=] and [=override-expressions=] over finite values + [=behavioral requirement|will=] generate overflow, infinities, and NaNs + as [=intermediate result=] values, following IEEE-754 rules. + * Note: This rule requires implementations to reliably detect overflow, infinities, and NaNs + to within accuracy limits for these kinds of expressions, so that errors can be generated consistently. + * A [=shader-creation error=] results if any [=const-expression=] of + floating-point type overflows or evaluates to NaN or infinity. + * A [=pipeline-creation error=] results if any [=override-expression=] of + floating-point type overflows or evaluates to NaN or infinity. + * Implementations may assume that overflow, infinities, and NaNs are not present during [=shader execution start|shader execution=]. 
+ * In such an implementation, if the [=intermediate result=] of evaluating a [=runtime expression=] overflows, + or yields an infinity or a NaN, the final result [=behavioral requirement|will=] be + an [=indeterminate value=] of the target type. + * Note: This means some functions (e.g. `min` and `max`) + may not return the expected result due to optimizations about the presence + of NaNs and infinities. +* Implementations may ignore the [=ieee754/sign field=] of a floating point zero value. That is, a zero with a positive sign may behave like a zero a with a negative sign, and vice versa. * To flush to zero is to replace a [=ieee754/subnormal=] value for a floating point type with a zero value of that type. @@ -12379,12 +12390,12 @@ WGSL follows the [[!IEEE-754|IEEE-754]] standard, but with the following differe For example the WGSL [[#fma-builtin]] function may expand to an ordinary multiply (including a rounding step) and an add (and another rounding step), while the IEEE-754 `fusedMultiplyAdd` operation requires that only final rounding step occurs. -### Floating Point Overflow ### {#floating-point-overflow} +### Floating Point Rounding and Overflow ### {#floating-point-rounding-and-overflow} Overflowing computations can round to infinity or to the nearest finite value. -The outcome depends on the magnitude of the overflowing value and on whether -evaluation occurs during shader execution. +The outcome depends on the magnitude of the overflowing [=intermediate result=] value and on whether +evaluation occurs during [=shader module creation=], [=pipeline creation=], or during [=shader execution start|shader execution=]. For a floating point type *T*, define *MAX(T)* as the largest positive finite value of *T*, and 2*EMAX(T)* as the largest power of 2 representable by *T*. @@ -12398,13 +12409,13 @@ From *X*, compute *X'* in *T* by rounding: * If *X* is NaN, then *X'* is NaN. 
* If *MAX(T)* < *X* < 2*EMAX(T)+1*, then either rounding direction is used: *X'* is *MAX(T)* or [PINF]. * If 2*EMAX(T)+1* ≤ *X*, then *X'* = [PINF]. - * Note: This matches the [[!IEEE-754|IEEE-754]] rule. + * Note: This clause matches the [[!IEEE-754|IEEE-754]] rule. * If −*MAX(T)* > *X* > −2*EMAX(T)+1*, then either rounding direction is used: *X'* is −*MAX(T)* or [NINF]. * If −2*EMAX(T)+1* ≥ *X*, then *X'* = [NINF]. - * Note: This matches the IEEE-754 rule. + * Note: This clause matches the IEEE-754 rule. From *X'*, compute the final value of the expression, *X''*, or detect a program error: -* If *X'* is infinity or NaN, then: +* If *X'* is an infinity or NaN, then by the [=Finite Math Assumption=]: * If the expression is a [=const-expression=], generate a [=shader-creation error=]. * If the expression is a [=override-expression=], generate a [=pipeline-creation error=]. * Otherwise the expression is a [=runtime expression=] and *X''* is an [=indeterminate value=]. @@ -12464,7 +12475,7 @@ When the accuracy for an operation is specified over an input range, the accuracy is undefined for input values outside that range. If an allowed result is outside the [=finite range=] of the result type, then -the rules in [[#floating-point-overflow]] apply. +the rules in [[#floating-point-rounding-and-overflow]] apply. #### Accuracy of Concrete Floating Point Expressions #### {#concrete-float-accuracy} @@ -12691,6 +12702,8 @@ than performing a multiply followed by an addition. ### Floating Point Conversion ### {#floating-point-conversion} +This section describes the details of a [=scalar conversion=] where either the source or destination is a floating point type. + In this section, a floating point type may be any of: * The [=f32=], [=f16=], and [=AbstractFloat=] types in WGSL. 
* A hypothetical type corresponding to a binary format defined by the [[!IEEE-754|IEEE-754]] @@ -12701,7 +12714,8 @@ Note: Recall that the [=f32=] WGSL type corresponds to the IEEE-754 [=ieee754/bi The scalar floating point to integral conversion algorithm is:
To convert a floating point scalar value |X| to an [=integer scalar=] type |T|: -* If the original value of |X| is exactly representable in the target type |T|, then the result is that value. +* If |X| is a [=NaN=], the result is an [=indeterminate value=] in |T|. +* If |X| is exactly representable in the target type |T|, then the result is that value. * Otherwise, the result is the value in |T| that is closest to [=truncate=](|X|).
@@ -12720,27 +12734,36 @@ but where [[!IEEE-754|IEEE-754]] mandates an invalid operation exception and a N + The numeric scalar conversion to floating point algorithm is: -
-When converting a [=numeric scalar=] value to a floating point type: -* If the original value is exactly representable in the destination type, then the result is that value. - * Additionally, if the original value is zero and of [=integer scalar=] type, then the resulting value has a zero sign bit. -* If the original value is a NaN for the source type, then the result is a NaN in the destination type. -* Otherwise, the original value is not exactly representable. - * If the original value is different from but lies between two adjacent finite values representable in the destination type, - then the result is one of those two values. - WGSL does not specify whether the larger or smaller representable +
+**Algorithm:** Numeric [=scalar conversion=] to floating point + +**Inputs:** +* |X|, a [=numeric scalar=] value of type |S| +* |T|, a destination floating point type. + +**Output:** XOut, the result of converting |X| to type |T|, or generate an error. + +**Procedure:** +* If |X| is a NaN for the source type |S|, then XOut is a NaN in type |T|. +* If |X| is exactly representable in the destination type |T|, then XOut is the value in |T| equal to |X|. + * Additionally, if |X| is zero and of [=integer scalar=] type, then XOut has a zero [=ieee754/sign field|sign bit=]. +* Otherwise, |X| is not exactly representable in |T|: + * If |X| lies between two adjacent finite values in |T|, + then XOut is one of those two values. + WGSL does not specify whether the higher or lower representable value is chosen, and different instances of such a conversion may choose differently. - * Otherwise, the original value lies outside the [=finite range=] of the destination type: - * A [=shader-creation error=] results if the original expression is a [=const-expression=]. - * A [=pipeline-creation error=] results if the original expression is an [=override-expression=]. + * Otherwise, |X| lies outside the [=finite range=] of the destination type: + * A [=shader-creation error=] results if the expression for |X| is a [=const-expression=]. + * A [=pipeline-creation error=] results if the expression for |X| is an [=override-expression=]. * Otherwise the conversion proceeds as follows: - 1. Set |X| to the original value. - 2. If the source type is a floating point type with more significand bits than the destination type, - the extra significand bits of the source value *may* be discarded (i.e. treated as if they are 0). - Update |X| accordingly. - 3. If |X| is the most-positive or most-negative normal value of the destination type, then the result is |X|. - 4. Otherwise, the result is the infinity value of the destination type, with the same sign as |X|. + 1. 
Set X' to the original value |X|. + 2. If source type |S| is a floating point type with more significand bits than the destination type |T|, + the extra significand bits of the source value |X| *may* be discarded (i.e. treated as if they are 0). + Update X' accordingly. + 3. If X' is the most-positive or most-negative finite value of the destination type |T|, then set XOut = X'. + 4. Otherwise, set XOut to the infinity value of destination type |T|, with the same sign as X'.
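Reviewer note (illustrative only, not part of the patch): the floating point to integer conversion steps touched by this change can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. The function name `float_to_int` and the range constants are hypothetical. Python floats are binary64, so they only stand in for an f32 or AbstractFloat input. The spec makes the NaN result an indeterminate value; this sketch arbitrarily returns 0 for it.

```python
import math

# Value ranges of the WGSL concrete integer types.
I32_MIN, I32_MAX = -(2**31), 2**31 - 1
U32_MIN, U32_MAX = 0, 2**32 - 1


def float_to_int(x: float, lo: int, hi: int) -> int:
    """Sketch: convert a floating point scalar to an integer type with range [lo, hi]."""
    if math.isnan(x):
        # Spec: the result is an indeterminate value in T.
        # Returning 0 is an arbitrary choice made for this sketch only.
        return 0
    if math.isinf(x):
        # truncate(X) is still infinite, so the closest value in T is an endpoint.
        return hi if x > 0 else lo
    truncated = math.trunc(x)  # round toward zero
    # The value in T closest to truncate(X) is truncate(X) clamped to [lo, hi].
    # When X is exactly representable in T, this returns X unchanged.
    return max(lo, min(hi, truncated))


if __name__ == "__main__":
    print(float_to_int(3.9, I32_MIN, I32_MAX))    # → 3
    print(float_to_int(-3.9, I32_MIN, I32_MAX))   # → -3
    print(float_to_int(1e20, I32_MIN, I32_MAX))   # → 2147483647
    print(float_to_int(-1.0, U32_MIN, U32_MAX))   # → 0
```

The clamp is why an out-of-range input such as 1e20 yields the largest i32 rather than the invalid operation exception and NaN result that IEEE-754 would mandate.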
From bd5b126fbd563b9813730a8756d9167897dff085 Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Mon, 28 Oct 2024 16:54:55 -0400 Subject: [PATCH 236/285] Allow out-of-memory errors in createTexture/createQuerySet (#4941) --- spec/index.bs | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 2d7573d48a..eddde137b9 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -3402,8 +3402,7 @@ The {{GPUBufferUsage}} flags determine how a {{GPUBuffer}} may be used after its 1. Create a device allocation for |b| where each byte is zero. If the allocation fails without side-effects, - [$generate an out-of-memory error$], - make [$invalidate$] |b|, and return. + [$generate an out-of-memory error$], [$invalidate$] |b|, and return. @@ -4273,6 +4272,12 @@ The {{GPUTextureUsage}} flags determine how a {{GPUTexture}} may be used after i 1. Set |t|.{{GPUTexture/[[viewFormats]]}} to |descriptor|.{{GPUTextureDescriptor/viewFormats}}. + + 1. Create a device allocation for |t| where each block has an + [=equivalent texel representation=] to a block with a bit representation of zero. + + If the allocation fails without side-effects, + [$generate an out-of-memory error$], [$invalidate$] |t|, and return. @@ -13711,6 +13716,11 @@ dictionary GPUQuerySetDescriptor - |this| must not be [$invalid|lost$]. - |descriptor|.{{GPUQuerySetDescriptor/count}} must be ≤ 4096. + + 1. Create a device allocation for |q| where each entry in the query set is zero. + + If the allocation fails without side-effects, + [$generate an out-of-memory error$], [$invalidate$] |q|, and return. 
From bb78f45260f2570943952e4b5cf3fd0bac8a0481 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fran=C3=A7ois=20Beaufort?= Date: Thu, 31 Oct 2024 23:35:17 +0100 Subject: [PATCH 237/285] Remove superfluous exposed on GPUDevice onuncapturederror (#4947) --- spec/index.bs | 1 - 1 file changed, 1 deletion(-) diff --git a/spec/index.bs b/spec/index.bs index eddde137b9..18b5e5158a 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -15040,7 +15040,6 @@ dictionary GPUUncapturedErrorEventInit : EventInit { From cfb491495f2759ade2eba186e2a23e67503eb654 Mon Sep 17 00:00:00 2001 From: David Neto Date: Thu, 31 Oct 2024 16:34:17 -0700 Subject: [PATCH 238/285] Clarify: texture+sampling validation applies to pairs used together in a builtin (#4945) Fixed: #4944 --- spec/index.bs | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/spec/index.bs b/spec/index.bs index 18b5e5158a..aad4d6b016 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -72,6 +72,7 @@ spec: WEBGL-1; urlPrefix: https://www.khronos.org/registry/webgl/specs/latest/1. text: WebGL Drawing Buffer; url: THE_DRAWING_BUFFER spec: WGSL; urlPrefix: https://gpuweb.github.io/gpuweb/wgsl/# type: dfn + text: functions in the shader stage; url: functions-in-a-shader-stage text: f16; url: f16 text: location; url: input-output-locations text: blend_src; url: input-output-locations @@ -7512,7 +7513,8 @@ typedef double GPUPipelineConstantValue; // May represent WGSL's bool, f32, i32, 1. |entryPoint| |must| not be `null`. 1. For each |binding| that is [=statically used=] by |entryPoint|: - [$validating shader binding$](|binding|, |layout|) |must| return `true`. - 1. For each texture and sampler [=statically used=] together by |entryPoint| in texture sampling calls: + 1. For each texture and sampler used together in a texture builtin function call in any of the + [=functions in the shader stage=] rooted at |entryPoint|: 1. Let |texture| be the {{GPUBindGroupLayoutEntry}} corresponding to the sampled texture in the call. 1. 
Let |sampler| be the {{GPUBindGroupLayoutEntry}} corresponding to the used sampler in the call. 1. If |sampler|.{{GPUSamplerBindingLayout/type}} is {{GPUSamplerBindingType/"filtering"}}, From d03b256c77ad4bf24bbafdcd4c88f4481c7b6a85 Mon Sep 17 00:00:00 2001 From: David Neto Date: Thu, 31 Oct 2024 17:23:24 -0700 Subject: [PATCH 239/285] Fix default pipeline layout determination of sampling type for f32 textures (#4949) Instead of the condition being: - used in a textureSample* call. the condition should be: - used in a texture builtin call that also uses a sampler Because the latter is textureSample* but also adds textureGather* calls. Issue: #4944 --- spec/index.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/index.bs b/spec/index.bs index aad4d6b016..34e4baad6e 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -7284,7 +7284,7 @@ run the following [=device timeline=] steps: Else if the sampled type of |resource| is:
- : `f32` and there exists a [=static use=] of |resource| by |stageDesc| with a `textureSample*` builtin + : `f32` and there exists a [=static use=] of |resource| by |stageDesc| in a texture builtin function call that also uses a sampler :: Set |textureLayout|.{{GPUTextureBindingLayout/sampleType}} to {{GPUTextureSampleType/"float"}} : `f32` otherwise :: Set |textureLayout|.{{GPUTextureBindingLayout/sampleType}} to {{GPUTextureSampleType/"unfilterable-float"}} From 7a2472b26b0bc0de4efae79ab9db7650bdbd0009 Mon Sep 17 00:00:00 2001 From: Stephen White Date: Fri, 1 Nov 2024 17:32:36 -0400 Subject: [PATCH 240/285] Compatibility Mode: add vertex storage buffer and texture limits (#4927) --- proposals/compatibility-mode.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/proposals/compatibility-mode.md b/proposals/compatibility-mode.md index 7405b00132..e51725b56f 100644 --- a/proposals/compatibility-mode.md +++ b/proposals/compatibility-mode.md @@ -197,6 +197,18 @@ generate a validation error. other APIs only support the first vertex so only `@interpolation(flat, either)` is supported in compatibility mode. +## 18. Introduce new `maxStorageBuffersInVertexStage` and `maxStorageTexturesInVertexStage` limits. + +If the number of shader variables of type `storage_buffer` in a vertex shader exceeds the `maxStorageBuffersInVertexStage` limit, a validation error will occur at pipeline creation time. + +If the number of shader variables of type `texture_storage_1d`, `texture_storage_2d`, `texture_storage_2d_array` and `texture_storage_3d` in a vertex shader exceeds the `maxStorageTexturesInVertexStage` limit, a validation error will occur at pipeline creation time. + +In Compatibility mode, these new limits will have a default of zero. In Core mode, they will default to the maximum value of a GPUSize32. + +In addition to the new limits, the existing `maxStorageBuffersPerShaderStage` and `maxStorageTexturesPerShaderStage` limits continue to apply to all stages. 
E.g., the effective storage buffer limit in the vertex stage is `min(maxStorageBuffersPerShaderStage, maxStorageBuffersInVertexStage)`. + +**Justification**: OpenGL ES 3.1 allows `MAX_VERTEX_SHADER_STORAGE_BLOCKS` and `MAX_VERTEX_IMAGE_UNIFORMS` to be zero, and there are a significant number of devices in the field with that value. + ## Issues Q: OpenGL ES does not have "coarse" and "fine" variants of the derivative instructions (`dFdx()`, `dFdy()`, `fwidth()`). Should WGSL's "fine" derivatives (`dpdxFine()`, `dpdyFine()`, and `fwidthFine()`) be required to deliver high precision results? See [Issue 4325](https://github.com/gpuweb/gpuweb/issues/4325). From 3f48a1d8fd538b43c5699adfa359a39dccba91f1 Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Fri, 1 Nov 2024 18:28:59 -0400 Subject: [PATCH 241/285] Rename image copies -> texel copies (#4838) * [subst] GPUImageDataLayout -> GPUTexelCopyBufferLayout * [subst] GPUImageCopyBuffer -> GPUTexelCopyBufferInfo * [subst] GPUImageCopyTexture -> GPUTexelCopyTextureInfo * [subst] GPUImageCopyTextureTagged -> GPUCopyExternalImageDestInfo * [subst] GPUImageCopyExternalImage -> GPUCopyExternalImageSourceInfo * [subst] GPUImageCopyExternalImageSource -> GPUCopyExternalImageSource * image {copy,data} -> texel {copy,data} * image copy -> [=texel copy=] * update section headers * [subst] |imageCopyTexture| -> |texelCopyTextureInfo| * [subst] Update algorithm name * [subst] |dataLayout| -> |bufferLayout| * Add docs to explain the names * Restore approximate link for #typedefdef-gpuimagecopyexternalimagesource --- spec/index.bs | 136 +++++++++++------------ spec/sections/copies.bs | 234 +++++++++++++++++++++------------------- 2 files changed, 195 insertions(+), 175 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 34e4baad6e..505e91e5e7 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -2113,7 +2113,7 @@ implementations **should** elide it, for performance. 
For optimal performance, applications **should** set their color space and encoding options so that the number of necessary conversions is minimized throughout the process. -For various image sources of {{GPUImageCopyExternalImage}}: +For various image sources of {{GPUCopyExternalImageSourceInfo}}: - {{ImageBitmap}}: - Premultiplication is controlled via {{ImageBitmapOptions/premultiplyAlpha}}. @@ -4580,7 +4580,7 @@ enum GPUTextureViewDimension {
Cubemap faces. The +U/+V axes indicate the individual faces' texture coordinates, - and thus the [=image copy=] memory layout of each face. + and thus the [=texel copy=] memory layout of each face.
@@ -4878,7 +4878,7 @@ The texel block width and texel block height speci of values for every texture format. The texel block copy footprint of an [=aspect=] of a {{GPUTextureFormat}} is the number of -bytes one texel block occupies during an [=image copy=], if applicable. +bytes one texel block occupies during a [=texel copy=], if applicable. Note: The texel block memory cost of a {{GPUTextureFormat}} is the number of @@ -9781,18 +9781,18 @@ interface GPUCommandEncoder { GPUSize64 size); undefined copyBufferToTexture( - GPUImageCopyBuffer source, - GPUImageCopyTexture destination, + GPUTexelCopyBufferInfo source, + GPUTexelCopyTextureInfo destination, GPUExtent3D copySize); undefined copyTextureToBuffer( - GPUImageCopyTexture source, - GPUImageCopyBuffer destination, + GPUTexelCopyTextureInfo source, + GPUTexelCopyBufferInfo destination, GPUExtent3D copySize); undefined copyTextureToTexture( - GPUImageCopyTexture source, - GPUImageCopyTexture destination, + GPUTexelCopyTextureInfo source, + GPUTexelCopyTextureInfo destination, GPUExtent3D copySize); undefined clearBuffer( @@ -10230,7 +10230,9 @@ dictionary GPUCommandEncoderDescriptor
-## Image Copy Commands ## {#commands-image-copies} +

Texel Copy Commands + +

: copyBufferToTexture(source, destination, copySize) @@ -10254,7 +10256,7 @@ dictionary GPUCommandEncoderDescriptor [=Content timeline=] steps: - 1. [=?=] [$validate GPUOrigin3D shape$](|destination|.{{GPUImageCopyTexture/origin}}). + 1. [=?=] [$validate GPUOrigin3D shape$](|destination|.{{GPUTexelCopyTextureInfo/origin}}). 1. [=?=] [$validate GPUExtent3D shape$](|copySize|). 1. Issue the subsequent steps on the [=Device timeline=] of |this|.{{GPUObjectBase/[[device]]}}: @@ -10263,12 +10265,12 @@ dictionary GPUCommandEncoderDescriptor 1. [$Validate the encoder state$] of |this|. If it returns false, return. 1. Let |aligned| be `true`. - 1. Let |dataLength| be |source|.{{GPUImageCopyBuffer/buffer}}.{{GPUBuffer/size}}. + 1. Let |dataLength| be |source|.{{GPUTexelCopyBufferInfo/buffer}}.{{GPUBuffer/size}}. 1. If any of the following conditions are unsatisfied, [$invalidate$] |this| and return.
- - [$validating GPUImageCopyBuffer$](|source|) returns `true`. - - |source|.{{GPUImageCopyBuffer/buffer}}.{{GPUBuffer/usage}} contains + - [$validating GPUTexelCopyBufferInfo$](|source|) returns `true`. + - |source|.{{GPUTexelCopyBufferInfo/buffer}}.{{GPUBuffer/usage}} contains {{GPUBufferUsage/COPY_SRC}}. - [$validating texture buffer copy$](|destination|, |source|, |dataLength|, |copySize|, {{GPUTextureUsage/COPY_DST}}, |aligned|) returns `true`.
@@ -10279,10 +10281,10 @@ dictionary GPUCommandEncoderDescriptor
[=Queue timeline=] steps: - 1. Let |blockWidth| be the [=texel block width=] of |destination|.{{GPUImageCopyTexture/texture}}. - 1. Let |blockHeight| be the [=texel block height=] of |destination|.{{GPUImageCopyTexture/texture}}. + 1. Let |blockWidth| be the [=texel block width=] of |destination|.{{GPUTexelCopyTextureInfo/texture}}. + 1. Let |blockHeight| be the [=texel block height=] of |destination|.{{GPUTexelCopyTextureInfo/texture}}. - 1. Let |dstOrigin| be |destination|.{{GPUImageCopyTexture/origin}}. + 1. Let |dstOrigin| be |destination|.{{GPUTexelCopyTextureInfo/origin}}. 1. Let |dstBlockOriginX| be (|dstOrigin|.[=GPUOrigin3D/x=] ÷ |blockWidth|). 1. Let |dstBlockOriginY| be (|dstOrigin|.[=GPUOrigin3D/y=] ÷ |blockHeight|). @@ -10297,11 +10299,11 @@ dictionary GPUCommandEncoderDescriptor 1. For each |y| in the range [0, |blockRows| − 1]: 1. For each |x| in the range [0, |blockColumns| − 1]: 1. Let |blockOffset| be the [$texel block byte offset$] of |source| for (|x|, |y|, |z|) of - |destination|.{{GPUImageCopyTexture/texture}}. + |destination|.{{GPUTexelCopyTextureInfo/texture}}. 1. Set [=texel block=] (|dstBlockOriginX| + |x|, |dstBlockOriginY| + |y|) of |dstSubregion| to be an [=equivalent texel representation=] to the [=texel block=] - described by |source|.{{GPUImageCopyBuffer/buffer}} at offset |blockOffset|. + described by |source|.{{GPUTexelCopyBufferInfo/buffer}} at offset |blockOffset|.
@@ -10327,7 +10329,7 @@ dictionary GPUCommandEncoderDescriptor [=Content timeline=] steps: - 1. [=?=] [$validate GPUOrigin3D shape$](|source|.{{GPUImageCopyTexture/origin}}). + 1. [=?=] [$validate GPUOrigin3D shape$](|source|.{{GPUTexelCopyTextureInfo/origin}}). 1. [=?=] [$validate GPUExtent3D shape$](|copySize|). 1. Issue the subsequent steps on the [=Device timeline=] of |this|.{{GPUObjectBase/[[device]]}}: @@ -10336,12 +10338,12 @@ dictionary GPUCommandEncoderDescriptor 1. [$Validate the encoder state$] of |this|. If it returns false, return. 1. Let |aligned| be `true`. - 1. Let |dataLength| be |destination|.{{GPUImageCopyBuffer/buffer}}.{{GPUBuffer/size}}. + 1. Let |dataLength| be |destination|.{{GPUTexelCopyBufferInfo/buffer}}.{{GPUBuffer/size}}. 1. If any of the following conditions are unsatisfied, [$invalidate$] |this| and return.
- - [$validating GPUImageCopyBuffer$](|destination|) returns `true`. - - |destination|.{{GPUImageCopyBuffer/buffer}}.{{GPUBuffer/usage}} contains + - [$validating GPUTexelCopyBufferInfo$](|destination|) returns `true`. + - |destination|.{{GPUTexelCopyBufferInfo/buffer}}.{{GPUBuffer/usage}} contains {{GPUBufferUsage/COPY_DST}}. - [$validating texture buffer copy$](|source|, |destination|, |dataLength|, |copySize|, {{GPUTextureUsage/COPY_SRC}}, |aligned|) returns `true`.
@@ -10352,10 +10354,10 @@ dictionary GPUCommandEncoderDescriptor
[=Queue timeline=] steps: - 1. Let |blockWidth| be the [=texel block width=] of |source|.{{GPUImageCopyTexture/texture}}. - 1. Let |blockHeight| be the [=texel block height=] of |source|.{{GPUImageCopyTexture/texture}}. + 1. Let |blockWidth| be the [=texel block width=] of |source|.{{GPUTexelCopyTextureInfo/texture}}. + 1. Let |blockHeight| be the [=texel block height=] of |source|.{{GPUTexelCopyTextureInfo/texture}}. - 1. Let |srcOrigin| be |source|.{{GPUImageCopyTexture/origin}}. + 1. Let |srcOrigin| be |source|.{{GPUTexelCopyTextureInfo/origin}}. 1. Let |srcBlockOriginX| be (|srcOrigin|.[=GPUOrigin3D/x=] ÷ |blockWidth|). 1. Let |srcBlockOriginY| be (|srcOrigin|.[=GPUOrigin3D/y=] ÷ |blockHeight|). @@ -10370,9 +10372,9 @@ dictionary GPUCommandEncoderDescriptor 1. For each |y| in the range [0, |blockRows| − 1]: 1. For each |x| in the range [0, |blockColumns| − 1]: 1. Let |blockOffset| be the [$texel block byte offset$] of |destination| for (|x|, |y|, |z|) of - |source|.{{GPUImageCopyTexture/texture}}. + |source|.{{GPUTexelCopyTextureInfo/texture}}. - 1. Set |destination|.{{GPUImageCopyBuffer/buffer}} at offset |blockOffset| to be an + 1. Set |destination|.{{GPUTexelCopyBufferInfo/buffer}} at offset |blockOffset| to be an [=equivalent texel representation=] to [=texel block=] (|srcBlockOriginX| + |x|, |srcBlockOriginY| + |y|) of |srcSubregion|.
@@ -10401,8 +10403,8 @@ dictionary GPUCommandEncoderDescriptor [=Content timeline=] steps: - 1. [=?=] [$validate GPUOrigin3D shape$](|source|.{{GPUImageCopyTexture/origin}}). - 1. [=?=] [$validate GPUOrigin3D shape$](|destination|.{{GPUImageCopyTexture/origin}}). + 1. [=?=] [$validate GPUOrigin3D shape$](|source|.{{GPUTexelCopyTextureInfo/origin}}). + 1. [=?=] [$validate GPUOrigin3D shape$](|destination|.{{GPUTexelCopyTextureInfo/origin}}). 1. [=?=] [$validate GPUExtent3D shape$](|copySize|). 1. Issue the subsequent steps on the [=Device timeline=] of |this|.{{GPUObjectBase/[[device]]}}: @@ -10413,17 +10415,17 @@ dictionary GPUCommandEncoderDescriptor 1. If any of the following conditions are unsatisfied, [$invalidate$] |this| and return.
- - Let |srcTexture| be |source|.{{GPUImageCopyTexture/texture}}. - - Let |dstTexture| be |destination|.{{GPUImageCopyTexture/texture}}. - - [$validating GPUImageCopyTexture$](|source|, |copySize|) returns `true`. + - Let |srcTexture| be |source|.{{GPUTexelCopyTextureInfo/texture}}. + - Let |dstTexture| be |destination|.{{GPUTexelCopyTextureInfo/texture}}. + - [$validating GPUTexelCopyTextureInfo$](|source|, |copySize|) returns `true`. - |srcTexture|.{{GPUTexture/usage}} contains {{GPUTextureUsage/COPY_SRC}}. - - [$validating GPUImageCopyTexture$](|destination|, |copySize|) returns `true`. + - [$validating GPUTexelCopyTextureInfo$](|destination|, |copySize|) returns `true`. - |dstTexture|.{{GPUTexture/usage}} contains {{GPUTextureUsage/COPY_DST}}. - |srcTexture|.{{GPUTexture/sampleCount}} is equal to |dstTexture|.{{GPUTexture/sampleCount}}. - |srcTexture|.{{GPUTexture/format}} and |dstTexture|.{{GPUTexture/format}} must be [=copy-compatible=]. - If |srcTexture|.{{GPUTexture/format}} is a depth-stencil format: - - |source|.{{GPUImageCopyTexture/aspect}} and |destination|.{{GPUImageCopyTexture/aspect}} + - |source|.{{GPUTexelCopyTextureInfo/aspect}} and |destination|.{{GPUTexelCopyTextureInfo/aspect}} must both refer to all aspects of |srcTexture|.{{GPUTexture/format}} and |dstTexture|.{{GPUTexture/format}}, respectively. - The [$set of subresources for texture copy$](|source|, |copySize|) and @@ -10436,14 +10438,14 @@ dictionary GPUCommandEncoderDescriptor
[=Queue timeline=] steps: - 1. Let |blockWidth| be the [=texel block width=] of |source|.{{GPUImageCopyTexture/texture}}. - 1. Let |blockHeight| be the [=texel block height=] of |source|.{{GPUImageCopyTexture/texture}}. + 1. Let |blockWidth| be the [=texel block width=] of |source|.{{GPUTexelCopyTextureInfo/texture}}. + 1. Let |blockHeight| be the [=texel block height=] of |source|.{{GPUTexelCopyTextureInfo/texture}}. - 1. Let |srcOrigin| be |source|.{{GPUImageCopyTexture/origin}}. + 1. Let |srcOrigin| be |source|.{{GPUTexelCopyTextureInfo/origin}}. 1. Let |srcBlockOriginX| be (|srcOrigin|.[=GPUOrigin3D/x=] ÷ |blockWidth|). 1. Let |srcBlockOriginY| be (|srcOrigin|.[=GPUOrigin3D/y=] ÷ |blockHeight|). - 1. Let |dstOrigin| be |destination|.{{GPUImageCopyTexture/origin}}. + 1. Let |dstOrigin| be |destination|.{{GPUTexelCopyTextureInfo/origin}}. 1. Let |dstBlockOriginX| be (|dstOrigin|.[=GPUOrigin3D/x=] ÷ |blockWidth|). 1. Let |dstBlockOriginY| be (|dstOrigin|.[=GPUOrigin3D/y=] ÷ |blockHeight|). @@ -13229,14 +13231,14 @@ interface GPUQueue { optional GPUSize64 size); undefined writeTexture( - GPUImageCopyTexture destination, + GPUTexelCopyTextureInfo destination, AllowSharedBufferSource data, - GPUImageDataLayout dataLayout, + GPUTexelCopyBufferLayout dataLayout, GPUExtent3D size); undefined copyExternalImageToTexture( - GPUImageCopyExternalImage source, - GPUImageCopyTextureTagged destination, + GPUCopyExternalImageSourceInfo source, + GPUCopyExternalImageDestInfo destination, GPUExtent3D copySize); }; GPUQueue includes GPUObjectBase; @@ -13331,7 +13333,7 @@ GPUQueue includes GPUObjectBase; [=Content timeline=] steps: - 1. [=?=] [$validate GPUOrigin3D shape$](|destination|.{{GPUImageCopyTexture/origin}}). + 1. [=?=] [$validate GPUOrigin3D shape$](|destination|.{{GPUTexelCopyTextureInfo/origin}}). 1. [=?=] [$validate GPUExtent3D shape$](|size|). 1. Let |dataBytes| be [=get a copy of the buffer source|a copy of the bytes held by the buffer source=] |data|. 
@@ -13349,13 +13351,13 @@ GPUQueue includes GPUObjectBase; [$generate a validation error$] and return.
- - |destination|.{{GPUImageCopyTexture/texture}}.{{GPUTexture/[[destroyed]]}} is `false`. + - |destination|.{{GPUTexelCopyTextureInfo/texture}}.{{GPUTexture/[[destroyed]]}} is `false`. - [$validating texture buffer copy$](|destination|, |dataLayout|, |dataLength|, |size|, {{GPUTextureUsage/COPY_DST}}, |aligned|) returns `true`. Note: unlike {{GPUCommandEncoder}}.{{GPUCommandEncoder/copyBufferToTexture()}}, there is no alignment requirement on either - |dataLayout|.{{GPUImageDataLayout/bytesPerRow}} or |dataLayout|.{{GPUImageDataLayout/offset}}. + |dataLayout|.{{GPUTexelCopyBufferLayout/bytesPerRow}} or |dataLayout|.{{GPUTexelCopyBufferLayout/offset}}.
1. Issue the subsequent steps on the [=Queue timeline=] of |this|. @@ -13363,10 +13365,10 @@ GPUQueue includes GPUObjectBase;
[=Queue timeline=] steps: - 1. Let |blockWidth| be the [=texel block width=] of |destination|.{{GPUImageCopyTexture/texture}}. - 1. Let |blockHeight| be the [=texel block height=] of |destination|.{{GPUImageCopyTexture/texture}}. + 1. Let |blockWidth| be the [=texel block width=] of |destination|.{{GPUTexelCopyTextureInfo/texture}}. + 1. Let |blockHeight| be the [=texel block height=] of |destination|.{{GPUTexelCopyTextureInfo/texture}}. - 1. Let |dstOrigin| be |destination|.{{GPUImageCopyTexture/origin}}; + 1. Let |dstOrigin| be |destination|.{{GPUTexelCopyTextureInfo/origin}}; 1. Let |dstBlockOriginX| be (|dstOrigin|.[=GPUOrigin3D/x=] ÷ |blockWidth|). 1. Let |dstBlockOriginY| be (|dstOrigin|.[=GPUOrigin3D/y=] ÷ |blockHeight|). @@ -13381,7 +13383,7 @@ GPUQueue includes GPUObjectBase; 1. For each |y| in the range [0, |blockRows| − 1]: 1. For each |x| in the range [0, |blockColumns| − 1]: 1. Let |blockOffset| be the [$texel block byte offset$] of |dataLayout| for (|x|, |y|, |z|) of - |destination|.{{GPUImageCopyTexture/texture}}. + |destination|.{{GPUTexelCopyTextureInfo/texture}}. 1. Set [=texel block=] (|dstBlockOriginX| + |x|, |dstBlockOriginY| + |y|) of |dstSubregion| to be an [=equivalent texel representation=] to the [=texel block=] @@ -13395,7 +13397,7 @@ GPUQueue includes GPUObjectBase; into the destination texture. This operation performs [[#color-space-conversions|color encoding]] into the destination - encoding according to the parameters of {{GPUImageCopyTextureTagged}}. + encoding according to the parameters of {{GPUCopyExternalImageDestInfo}}. Copying into a `-srgb` texture results in the same texture bytes, not the same decoded values, as copying into the corresponding non-`-srgb` format. @@ -13435,10 +13437,10 @@ GPUQueue includes GPUObjectBase; [=Content timeline=] steps: - 1. [=?=] [$validate GPUOrigin2D shape$](|source|.{{GPUImageCopyExternalImage/origin}}). - 1. [=?=] [$validate GPUOrigin3D shape$](|destination|.{{GPUImageCopyTexture/origin}}). 
+ 1. [=?=] [$validate GPUOrigin2D shape$](|source|.{{GPUCopyExternalImageSourceInfo/origin}}). + 1. [=?=] [$validate GPUOrigin3D shape$](|destination|.{{GPUTexelCopyTextureInfo/origin}}). 1. [=?=] [$validate GPUExtent3D shape$](|copySize|). - 1. Let |sourceImage| be |source|.{{GPUImageCopyExternalImage/source}} + 1. Let |sourceImage| be |source|.{{GPUCopyExternalImageSourceInfo/source}} 1. If |sourceImage| [=is not origin-clean=], throw a {{SecurityError}} and return. 1. If any of the following requirements are unmet, throw an {{OperationError}} and return. @@ -13457,14 +13459,14 @@ GPUQueue includes GPUObjectBase;
[=Device timeline=] steps: - 1. Let |texture| be |destination|.{{GPUImageCopyTexture/texture}}. + 1. Let |texture| be |destination|.{{GPUTexelCopyTextureInfo/texture}}. 1. If any of the following requirements are unmet, [$generate a validation error$] and return.
- |usability| must be `good`. - |texture|.{{GPUTexture/[[destroyed]]}} must be `false`. - |texture| must be [$valid to use with$] |this|. - - [$validating GPUImageCopyTexture$](destination, copySize) must return `true`. + - [$validating GPUTexelCopyTextureInfo$](|destination|, |copySize|) must return `true`. - |texture|.{{GPUTexture/usage}} must include both {{GPUTextureUsage/RENDER_ATTACHMENT}} and {{GPUTextureUsage/COPY_DST}}. - |texture|.{{GPUTexture/dimension}} must be {{GPUTextureDimension/"2d"}}. @@ -13492,26 +13494,26 @@ GPUQueue includes GPUObjectBase;
[=Queue timeline=] steps: - 1. [=Assert=] that the [=texel block width=] of |destination|.{{GPUImageCopyTexture/texture}} is 1, - the [=texel block height=] of |destination|.{{GPUImageCopyTexture/texture}} is 1, and that + 1. [=Assert=] that the [=texel block width=] of |destination|.{{GPUTexelCopyTextureInfo/texture}} is 1, + the [=texel block height=] of |destination|.{{GPUTexelCopyTextureInfo/texture}} is 1, and that |copySize|.[=GPUExtent3D/depthOrArrayLayers=] is 1. - 1. Let |srcOrigin| be |source|.{{GPUImageCopyExternalImage/origin}}. - 1. Let |dstOrigin| be |destination|.{{GPUImageCopyTexture/origin}}. + 1. Let |srcOrigin| be |source|.{{GPUCopyExternalImageSourceInfo/origin}}. + 1. Let |dstOrigin| be |destination|.{{GPUTexelCopyTextureInfo/origin}}. 1. Let |dstSubregion| be [$texture copy sub-region$] (|dstOrigin|.[=GPUOrigin3D/z=]) of |destination|. 1. For each |y| in the range [0, |copySize|.[=GPUExtent3D/height=] − 1]: - 1. Let |srcY| be |y| if |source|.{{GPUImageCopyExternalImage/flipY}} is `false` and + 1. Let |srcY| be |y| if |source|.{{GPUCopyExternalImageSourceInfo/flipY}} is `false` and (|copySize|.[=GPUExtent3D/height=] − 1 − |y|) otherwise. 1. For each |x| in the range [0, |copySize|.[=GPUExtent3D/width=] − 1]: 1. Set [=texel block=] (|dstOrigin|.[=GPUOrigin3D/x=] + |x|, |dstOrigin|.[=GPUOrigin3D/y=] + |y|) of |dstSubregion| to be an [=equivalent texel representation=] of the pixel at (|srcOrigin|.[=GPUOrigin2D/x=] + |x|, |srcOrigin|.[=GPUOrigin2D/y=] + |srcY|) of - |source|.{{GPUImageCopyExternalImage/source}} after applying any + |source|.{{GPUCopyExternalImageSourceInfo/source}} after applying any [[#color-space-conversions|color encoding]] required by - |destination|.{{GPUImageCopyTextureTagged/colorSpace}} and - |destination|.{{GPUImageCopyTextureTagged/premultipliedAlpha}}. + |destination|.{{GPUCopyExternalImageDestInfo/colorSpace}} and + |destination|.{{GPUCopyExternalImageDestInfo/premultipliedAlpha}}.
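The row selection in the Queue timeline steps above can be sketched as follows. This is a minimal illustration and not part of the patch: `sourceRowFor` is a hypothetical helper name, and only the `flipY` row mapping is modelled (origins and color encoding are left out).

```typescript
// Hypothetical helper: picks the source row for destination row `y`,
// following the srcY step of copyExternalImageToTexture().
function sourceRowFor(y: number, copyHeight: number, flipY: boolean): number {
  // Without flipY the rows map one-to-one; with flipY the bottom row of the
  // source region lands in the first row of the destination region.
  return flipY ? copyHeight - 1 - y : y;
}

// Source rows visited, in destination order, for a copy that is 4 texels high.
const flippedRows: number[] = [0, 1, 2, 3].map((y) => sourceRowFor(y, 4, true));
const unflippedRows: number[] = [0, 1, 2, 3].map((y) => sourceRowFor(y, 4, false));
```

Note that the source origin is still added afterwards, so it stays relative to the top-left corner of the source image.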
@@ -16911,8 +16913,8 @@ be used with {{GPUSamplerBindingType/"comparison"}} samplers even if they use fi [=Texel block memory cost=] (Bytes) Aspect {{GPUTextureSampleType}} - Valid [=image copy=] source - Valid [=image copy=] destination + Valid [=texel copy=] source + Valid [=texel copy=] destination [=Texel block copy footprint=] (Bytes) Aspect-specific format @@ -17025,7 +17027,7 @@ As a result, copies into such textures are only valid from other textures of the The depth aspects of depth24plus formats ({{GPUTextureFormat/"depth24plus"}} and {{GPUTextureFormat/"depth24plus-stencil8"}}) have opaque representations (implemented as either [=24-bit depth=] or {{GPUTextureFormat/"depth32float"}}). -As a result, depth-aspect [=image copies=] are not allowed with these formats. +As a result, depth-aspect [=texel copies=] are not allowed with these formats.
It is possible to imitate these disallowed copies: diff --git a/spec/sections/copies.bs b/spec/sections/copies.bs index 6d7fb51def..3b2266c799 100644 --- a/spec/sections/copies.bs +++ b/spec/sections/copies.bs @@ -13,9 +13,11 @@ and "immediate" {{GPUQueue}} operations: - {{GPUQueue/writeBuffer()}}, for {{ArrayBuffer}}-to-{{GPUBuffer}} writes -## Image Copies ## {#image-copies} +

Texel Copies + +

-Image copy operations operate on texture/"image" data, rather than bytes. +Texel copy operations operate on texture/"image" data, rather than bytes. WebGPU provides "buffered" {{GPUCommandEncoder}} commands: @@ -42,12 +44,16 @@ Note: Copies may be performed with WGSL shaders, which means that any of the doc The following definitions are used by these methods: -

`GPUImageDataLayout` +

`GPUTexelCopyBufferLayout` +

+"{{GPUTexelCopyBufferLayout}}" describes the "**layout**" of texels in a "**buffer**" of bytes +({{GPUBuffer}} or {{AllowSharedBufferSource}}) in a "[=texel copy=]" operation. + -
+
: buffer :: - A buffer which either contains image data to be copied or will store the image data being + A buffer which either contains texel data to be copied or will store the texel data being copied, depending on the method it is being passed to.
- validating GPUImageCopyBuffer + validating GPUTexelCopyBufferInfo **Arguments:** - - {{GPUImageCopyBuffer}} |imageCopyBuffer| + - {{GPUTexelCopyBufferInfo}} |imageCopyBuffer| **Returns:** {{boolean}} @@ -138,21 +146,23 @@ dictionary GPUImageCopyBuffer 1. Return `true` if and only if all of the following conditions are satisfied:
- - |imageCopyBuffer|.{{GPUImageCopyBuffer/buffer}} must be a [=valid=] {{GPUBuffer}}. - - |imageCopyBuffer|.{{GPUImageDataLayout/bytesPerRow}} must be a multiple of 256. + - |imageCopyBuffer|.{{GPUTexelCopyBufferInfo/buffer}} must be a [=valid=] {{GPUBuffer}}. + - |imageCopyBuffer|.{{GPUTexelCopyBufferLayout/bytesPerRow}} must be a multiple of 256.
-

`GPUImageCopyTexture` +

`GPUTexelCopyTextureInfo` +

-In an [=image copy=] operation, a {{GPUImageCopyTexture}} defines a {{GPUTexture}} and, together with -the `copySize`, the sub-region of the texture (spanning one or more contiguous -[=texture subresources=] at the same mip-map level). +"{{GPUTexelCopyTextureInfo}}" describes the "**info**" ({{GPUTexture}}, etc.) +about a "**texture**" source or destination of a "[=texel copy=]" operation. +Together with the `copySize`, it describes a sub-region of a texture +(spanning one or more contiguous [=texture subresources=] at the same mip-map level). -
+
: texture :: Texture to copy to/from. : mipLevel :: - Mip-map level of the {{GPUImageCopyTexture/texture}} to copy to/from. + Mip-map level of the {{GPUTexelCopyTextureInfo/texture}} to copy to/from. : origin :: @@ -176,14 +186,14 @@ dictionary GPUImageCopyTexture { : aspect :: - Defines which aspects of the {{GPUImageCopyTexture/texture}} to copy to/from. + Defines which aspects of the {{GPUTexelCopyTextureInfo/texture}} to copy to/from.
- The texture copy sub-region for depth slice or array layer |index| of {{GPUImageCopyTexture}} + The texture copy sub-region for depth slice or array layer |index| of {{GPUTexelCopyTextureInfo}} |copyTexture| is determined by running the following steps: - 1. Let |texture| be |copyTexture|.{{GPUImageCopyTexture/texture}}. + 1. Let |texture| be |copyTexture|.{{GPUTexelCopyTextureInfo/texture}}. 1. If |texture|.{{GPUTexture/dimension}} is:
: {{GPUTextureDimension/1d}} @@ -196,61 +206,61 @@ dictionary GPUImageCopyTexture { : {{GPUTextureDimension/3d}} :: Let |depthSliceOrLayer| be depth slice |index| of |texture|
- 1. Let |textureMip| be mip level |copyTexture|.{{GPUImageCopyTexture/mipLevel}} of |depthSliceOrLayer|. - 1. Return aspect |copyTexture|.{{GPUImageCopyTexture/aspect}} of |textureMip|. + 1. Let |textureMip| be mip level |copyTexture|.{{GPUTexelCopyTextureInfo/mipLevel}} of |depthSliceOrLayer|. + 1. Return aspect |copyTexture|.{{GPUTexelCopyTextureInfo/aspect}} of |textureMip|.
- The texel block byte offset of data described by {{GPUImageDataLayout}} |dataLayout| + The texel block byte offset of data described by {{GPUTexelCopyBufferLayout}} |bufferLayout| corresponding to [=texel block=] |x|, |y| of depth slice or array layer |z| of a {{GPUTexture}} |texture| is determined by running the following steps: 1. Let |blockBytes| be the [=texel block copy footprint=] of |texture|.{{GPUTexture/format}}. - 1. Let |imageOffset| be (|z| × |dataLayout|.{{GPUImageDataLayout/rowsPerImage}} × - |dataLayout|.{{GPUImageDataLayout/bytesPerRow}}) + |dataLayout|.{{GPUImageDataLayout/offset}}. - 1. Let |rowOffset| be (|y| × |dataLayout|.{{GPUImageDataLayout/bytesPerRow}}) + |imageOffset|. + 1. Let |imageOffset| be (|z| × |bufferLayout|.{{GPUTexelCopyBufferLayout/rowsPerImage}} × + |bufferLayout|.{{GPUTexelCopyBufferLayout/bytesPerRow}}) + |bufferLayout|.{{GPUTexelCopyBufferLayout/offset}}. + 1. Let |rowOffset| be (|y| × |bufferLayout|.{{GPUTexelCopyBufferLayout/bytesPerRow}}) + |imageOffset|. 1. Let |blockOffset| be (|x| × |blockBytes|) + |rowOffset|. 1. Return |blockOffset|.
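The texel block byte offset steps above amount to a small piece of arithmetic, sketched here for illustration. The interface and function names are hypothetical, and `blockBytes` stands in for the texel block copy footprint of the texture format, which the caller is assumed to look up.

```typescript
// Hypothetical mirror of the GPUTexelCopyBufferLayout members used by the steps.
interface TexelCopyBufferLayoutSketch {
  offset: number;
  bytesPerRow: number;
  rowsPerImage: number;
}

// Byte offset of texel block (x, y) in depth slice or array layer z.
function texelBlockByteOffset(
  bufferLayout: TexelCopyBufferLayoutSketch,
  blockBytes: number,
  x: number,
  y: number,
  z: number
): number {
  // Skip z whole images, then the layout's starting offset.
  const imageOffset =
    z * bufferLayout.rowsPerImage * bufferLayout.bytesPerRow + bufferLayout.offset;
  // Skip y rows of blocks within that image.
  const rowOffset = y * bufferLayout.bytesPerRow + imageOffset;
  // Skip x blocks within that row.
  return x * blockBytes + rowOffset;
}

const exampleLayout: TexelCopyBufferLayoutSketch = {
  offset: 16,
  bytesPerRow: 256,
  rowsPerImage: 4,
};
// 1 * 4 * 256 + 16 = 1040, plus 1 * 256 = 1296, plus 2 * 4 = 1304.
const exampleBlockOffset = texelBlockByteOffset(exampleLayout, 4, 2, 1, 1);
```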
- validating GPUImageCopyTexture(|imageCopyTexture|, |copySize|) + validating GPUTexelCopyTextureInfo(|texelCopyTextureInfo|, |copySize|) **Arguments:** - - {{GPUImageCopyTexture}} |imageCopyTexture| + - {{GPUTexelCopyTextureInfo}} |texelCopyTextureInfo| - {{GPUExtent3D}} |copySize| **Returns:** {{boolean}} [=Device timeline=] steps: - 1. Let |blockWidth| be the [=texel block width=] of |imageCopyTexture|.{{GPUImageCopyTexture/texture}}.{{GPUTexture/format}}. - 1. Let |blockHeight| be the [=texel block height=] of |imageCopyTexture|.{{GPUImageCopyTexture/texture}}.{{GPUTexture/format}}. + 1. Let |blockWidth| be the [=texel block width=] of |texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/texture}}.{{GPUTexture/format}}. + 1. Let |blockHeight| be the [=texel block height=] of |texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/texture}}.{{GPUTexture/format}}. 1. Return `true` if and only if all of the following conditions apply:
- - [$validating texture copy range$](|imageCopyTexture|, |copySize|) returns `true`. - - |imageCopyTexture|.{{GPUImageCopyTexture/texture}} must be a [=valid=] {{GPUTexture}}. - - |imageCopyTexture|.{{GPUImageCopyTexture/mipLevel}} must be < - |imageCopyTexture|.{{GPUImageCopyTexture/texture}}.{{GPUTexture/mipLevelCount}}. - - |imageCopyTexture|.{{GPUImageCopyTexture/origin}}.[=GPUOrigin3D/x=] must be a multiple of |blockWidth|. - - |imageCopyTexture|.{{GPUImageCopyTexture/origin}}.[=GPUOrigin3D/y=] must be a multiple of |blockHeight|. - - The [=imageCopyTexture physical subresource size=] of |imageCopyTexture| is equal to |copySize| if either of + - [$validating texture copy range$](|texelCopyTextureInfo|, |copySize|) returns `true`. + - |texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/texture}} must be a [=valid=] {{GPUTexture}}. + - |texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/mipLevel}} must be < + |texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/texture}}.{{GPUTexture/mipLevelCount}}. + - |texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/origin}}.[=GPUOrigin3D/x=] must be a multiple of |blockWidth|. + - |texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/origin}}.[=GPUOrigin3D/y=] must be a multiple of |blockHeight|. + - The [=GPUTexelCopyTextureInfo physical subresource size=] of |texelCopyTextureInfo| is equal to |copySize| if either of the following conditions is true: - - |imageCopyTexture|.{{GPUImageCopyTexture/texture}}.{{GPUTexture/format}} is a depth-stencil format. - - |imageCopyTexture|.{{GPUImageCopyTexture/texture}}.{{GPUTexture/sampleCount}} > 1. + - |texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/texture}}.{{GPUTexture/format}} is a depth-stencil format. + - |texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/texture}}.{{GPUTexture/sampleCount}} > 1.
- validating texture buffer copy(|imageCopyTexture|, |dataLayout|, |dataLength|, |copySize|, |textureUsage|, |aligned|) + validating texture buffer copy(|texelCopyTextureInfo|, |bufferLayout|, |dataLength|, |copySize|, |textureUsage|, |aligned|) **Arguments:** - - {{GPUImageCopyTexture}} |imageCopyTexture| - - {{GPUImageDataLayout}} |dataLayout| + - {{GPUTexelCopyTextureInfo}} |texelCopyTextureInfo| + - {{GPUTexelCopyBufferLayout}} |bufferLayout| - {{GPUSize64Out}} |dataLength| - {{GPUExtent3D}} |copySize| - {{GPUTextureUsage}} |textureUsage| @@ -260,60 +270,63 @@ dictionary GPUImageCopyTexture { [=Device timeline=] steps: - 1. Let |texture| be |imageCopyTexture|.{{GPUImageCopyTexture/texture}} + 1. Let |texture| be |texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/texture}} 1. Let |aspectSpecificFormat| = |texture|.{{GPUTexture/format}}. 1. Let |offsetAlignment| = [=texel block copy footprint=] of |texture|.{{GPUTexture/format}}. 1. Return `true` if and only if all of the following conditions apply:
- 1. [$validating GPUImageCopyTexture$](|imageCopyTexture|, |copySize|) returns `true`. + 1. [$validating GPUTexelCopyTextureInfo$](|texelCopyTextureInfo|, |copySize|) returns `true`. 1. |texture|.{{GPUTexture/sampleCount}} is 1. 1. |texture|.{{GPUTexture/usage}} contains |textureUsage|. 1. If |texture|.{{GPUTexture/format}} is a [=depth-or-stencil format=]: - 1. |imageCopyTexture|.{{GPUImageCopyTexture/aspect}} must refer to a single aspect of + 1. |texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/aspect}} must refer to a single aspect of |texture|.{{GPUTexture/format}}. 1. If |textureUsage| is:
: {{GPUTextureUsage/COPY_SRC}} - :: That aspect must be a valid image copy source according to [[#depth-formats]]. + :: That aspect must be a valid [=texel copy=] source according to [[#depth-formats]]. : {{GPUTextureUsage/COPY_DST}} - :: That aspect must be a valid image copy destination according to [[#depth-formats]]. + :: That aspect must be a valid [=texel copy=] destination according to [[#depth-formats]].
1. Set |aspectSpecificFormat| to the [=aspect-specific format=] according to [[#depth-formats]]. 1. Set |offsetAlignment| to 4. 1. If |aligned| is `true`: - 1. |dataLayout|.{{GPUImageDataLayout/offset}} is a multiple of |offsetAlignment|. - 1. [$validating linear texture data$](|dataLayout|, + 1. |bufferLayout|.{{GPUTexelCopyBufferLayout/offset}} is a multiple of |offsetAlignment|. + 1. [$validating linear texture data$](|bufferLayout|, |dataLength|, |aspectSpecificFormat|, |copySize|) succeeds.
-

`GPUImageCopyTextureTagged` +

`GPUCopyExternalImageDestInfo` +

WebGPU textures hold raw numeric data, and are not tagged with semantic metadata describing colors. However, {{GPUQueue/copyExternalImageToTexture()}} copies from sources that describe colors. -A {{GPUImageCopyTextureTagged}} is a {{GPUImageCopyTexture}} which is additionally tagged with +"{{GPUCopyExternalImageDestInfo}}" describes the "**info**" about the "destination" of a +"copyExternalImageToTexture()" operation. +It is a {{GPUTexelCopyTextureInfo}} which is additionally tagged with color space/encoding and alpha-premultiplication metadata, so that semantic color data may be preserved during copies. -This metadata affects only the semantics of the {{GPUQueue/copyExternalImageToTexture()}} -operation, not the semantics of the destination texture. +This metadata affects only the semantics of the copy +operation, not the state or semantics of the destination texture object. -
+
: colorSpace :: Describes the color space and encoding used to encode data into the destination texture. @@ -323,7 +336,7 @@ dictionary GPUImageCopyTextureTagged Otherwise, the results are clamped to the target texture format's range. Note: - If {{GPUImageCopyTextureTagged/colorSpace}} matches the source image, + If {{GPUCopyExternalImageDestInfo/colorSpace}} matches the source image, conversion may not be necessary. See [[#color-space-conversion-elision]]. : premultipliedAlpha @@ -331,19 +344,24 @@ dictionary GPUImageCopyTextureTagged Describes whether the data written into the texture should have its RGB channels premultiplied by the alpha channel, or not. - If this option is set to `true` and the {{GPUImageCopyExternalImage/source}} is also + If this option is set to `true` and the {{GPUCopyExternalImageSourceInfo/source}} is also premultiplied, the source RGB values must be preserved even if they exceed their corresponding alpha values. Note: - If {{GPUImageCopyTextureTagged/premultipliedAlpha}} matches the source image, + If {{GPUCopyExternalImageDestInfo/premultipliedAlpha}} matches the source image, conversion may not be necessary. See [[#color-space-conversion-elision]].
-

`GPUImageCopyExternalImage` +

`GPUCopyExternalImageSourceInfo` + +

+"{{GPUCopyExternalImageSourceInfo}}" describes the "**info**" about the "**source**" of a +"copyExternalImageToTexture()" operation. + -{{GPUImageCopyExternalImage}} has the following members: +{{GPUCopyExternalImageSourceInfo}} has the following members: -
+
: source :: - The source of the [=image copy=]. The copy source data is captured at the moment that + The source of the [=texel copy=]. The copy source data is captured at the moment that {{GPUQueue/copyExternalImageToTexture()}} is issued. Source size is determined as described by the [=external source dimensions=] table. @@ -380,7 +398,7 @@ dictionary GPUImageCopyExternalImage { If this option is set to `true`, the copy is flipped vertically: the bottom row of the source region is copied into the first row of the destination region, and so on. - The {{GPUImageCopyExternalImage/origin}} option is still relative to the top-left corner + The {{GPUCopyExternalImageSourceInfo/origin}} option is still relative to the top-left corner of the source image, increasing downward.
@@ -432,20 +450,20 @@ are defined by the source type, given by this table: ### Subroutines ### {#image-copies-subroutines} -
- imageCopyTexture physical subresource size +
+ GPUTexelCopyTextureInfo physical subresource size **Arguments:** - - {{GPUImageCopyTexture}} |imageCopyTexture| + - {{GPUTexelCopyTextureInfo}} |texelCopyTextureInfo| **Returns:** {{GPUExtent3D}} - The [=imageCopyTexture physical subresource size=] of |imageCopyTexture| is calculated as follows: + The [=GPUTexelCopyTextureInfo physical subresource size=] of |texelCopyTextureInfo| is calculated as follows: Its [=GPUExtent3D/width=], [=GPUExtent3D/height=] and [=GPUExtent3D/depthOrArrayLayers=] are the width, height, and depth, respectively, - of the [=physical miplevel-specific texture extent=] of |imageCopyTexture|.{{GPUImageCopyTexture/texture}} [=subresource=] at [=mipmap level=] - |imageCopyTexture|.{{GPUImageCopyTexture/mipLevel}}. + of the [=physical miplevel-specific texture extent=] of |texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/texture}} [=subresource=] at [=mipmap level=] + |texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/mipLevel}}.
@@ -453,7 +471,7 @@ are defined by the source type, given by this table: **Arguments:** - : {{GPUImageDataLayout}} |layout| + : {{GPUTexelCopyBufferLayout}} |layout| :: Layout of the linear texture data. : {{GPUSize64}} |byteSize| :: Total size of the linear data, in bytes. @@ -474,18 +492,18 @@ are defined by the source type, given by this table:
- If |heightInBlocks| > 1, - |layout|.{{GPUImageDataLayout/bytesPerRow}} must be specified. + |layout|.{{GPUTexelCopyBufferLayout/bytesPerRow}} must be specified. - If |copyExtent|.[=GPUExtent3D/depthOrArrayLayers=] > 1, - |layout|.{{GPUImageDataLayout/bytesPerRow}} and - |layout|.{{GPUImageDataLayout/rowsPerImage}} must be specified. - - If specified, |layout|.{{GPUImageDataLayout/bytesPerRow}} + |layout|.{{GPUTexelCopyBufferLayout/bytesPerRow}} and + |layout|.{{GPUTexelCopyBufferLayout/rowsPerImage}} must be specified. + - If specified, |layout|.{{GPUTexelCopyBufferLayout/bytesPerRow}} must be ≥ |bytesInLastRow|. - - If specified, |layout|.{{GPUImageDataLayout/rowsPerImage}} + - If specified, |layout|.{{GPUTexelCopyBufferLayout/rowsPerImage}} must be ≥ |heightInBlocks|.
1. Let: - - |bytesPerRow| be |layout|.{{GPUImageDataLayout/bytesPerRow}} ?? 0. - - |rowsPerImage| be |layout|.{{GPUImageDataLayout/rowsPerImage}} ?? 0. + - |bytesPerRow| be |layout|.{{GPUTexelCopyBufferLayout/bytesPerRow}} ?? 0. + - |rowsPerImage| be |layout|.{{GPUTexelCopyBufferLayout/rowsPerImage}} ?? 0. Note: These default values have no effect, as they're always multiplied by 0. 1. Let |requiredBytesInCopy| be 0. @@ -499,7 +517,7 @@ are defined by the source type, given by this table:
- The layout fits inside the linear data: - |layout|.{{GPUImageDataLayout/offset}} + |requiredBytesInCopy| ≤ |byteSize|. + |layout|.{{GPUTexelCopyBufferLayout/offset}} + |requiredBytesInCopy| ≤ |byteSize|.
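The conditions of validating linear texture data that are visible in the hunks above can be sketched as below. This is an illustration only: the names are hypothetical, and `heightInBlocks`, `bytesInLastRow`, and `requiredBytesInCopy` are taken as inputs because the steps that compute them fall outside the diff context shown here.

```typescript
// Hypothetical stand-in for GPUTexelCopyBufferLayout; optional members may be absent.
interface LinearLayoutSketch {
  offset: number;
  bytesPerRow?: number;
  rowsPerImage?: number;
}

// Values the spec derives in steps that these hunks do not show.
interface LinearCopyInputsSketch {
  heightInBlocks: number;
  bytesInLastRow: number;
  depthOrArrayLayers: number;
  requiredBytesInCopy: number;
}

// Returns true only if every condition visible in the hunks holds.
function visibleLinearDataChecks(
  layout: LinearLayoutSketch,
  byteSize: number,
  inputs: LinearCopyInputsSketch
): boolean {
  const { bytesPerRow, rowsPerImage } = layout;
  // bytesPerRow is required once there is more than one row of blocks.
  if (inputs.heightInBlocks > 1 && bytesPerRow === undefined) return false;
  // Both strides are required once there is more than one image.
  if (
    inputs.depthOrArrayLayers > 1 &&
    (bytesPerRow === undefined || rowsPerImage === undefined)
  ) {
    return false;
  }
  // If specified, the strides must cover a full row and a full image.
  if (bytesPerRow !== undefined && bytesPerRow < inputs.bytesInLastRow) return false;
  if (rowsPerImage !== undefined && rowsPerImage < inputs.heightInBlocks) return false;
  // The layout must fit inside the linear data.
  return layout.offset + inputs.requiredBytesInCopy <= byteSize;
}

// requiredBytesInCopy is simply assumed here; it comes from the omitted steps.
const sampleInputs: LinearCopyInputsSketch = {
  heightInBlocks: 4,
  bytesInLastRow: 64,
  depthOrArrayLayers: 2,
  requiredBytesInCopy: 1856,
};
const fitsInBuffer = visibleLinearDataChecks(
  { offset: 0, bytesPerRow: 256, rowsPerImage: 4 },
  2048,
  sampleInputs
);
```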
@@ -508,22 +526,22 @@ are defined by the source type, given by this table: **Arguments:** - : {{GPUImageCopyTexture}} |imageCopyTexture| + : {{GPUTexelCopyTextureInfo}} |texelCopyTextureInfo| :: The texture subresource being copied into and copy origin. : {{GPUExtent3D}} |copySize| :: The size of the texture. [=Device timeline=] steps: - 1. Let |blockWidth| be the [=texel block width=] of |imageCopyTexture|.{{GPUImageCopyTexture/texture}}.{{GPUTexture/format}}. - 1. Let |blockHeight| be the [=texel block height=] of |imageCopyTexture|.{{GPUImageCopyTexture/texture}}.{{GPUTexture/format}}. - 1. Let |subresourceSize| be the [=imageCopyTexture physical subresource size=] of |imageCopyTexture|. + 1. Let |blockWidth| be the [=texel block width=] of |texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/texture}}.{{GPUTexture/format}}. + 1. Let |blockHeight| be the [=texel block height=] of |texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/texture}}.{{GPUTexture/format}}. + 1. Let |subresourceSize| be the [=GPUTexelCopyTextureInfo physical subresource size=] of |texelCopyTextureInfo|. 1. Return whether all the conditions below are satisfied:
- - (|imageCopyTexture|.{{GPUImageCopyTexture/origin}}.[=GPUOrigin3D/x=] + |copySize|.[=GPUExtent3D/width=]) ≤ |subresourceSize|.[=GPUExtent3D/width=] - - (|imageCopyTexture|.{{GPUImageCopyTexture/origin}}.[=GPUOrigin3D/y=] + |copySize|.[=GPUExtent3D/height=]) ≤ |subresourceSize|.[=GPUExtent3D/height=] - - (|imageCopyTexture|.{{GPUImageCopyTexture/origin}}.[=GPUOrigin3D/z=] + |copySize|.[=GPUExtent3D/depthOrArrayLayers=]) ≤ |subresourceSize|.[=GPUExtent3D/depthOrArrayLayers=] + - (|texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/origin}}.[=GPUOrigin3D/x=] + |copySize|.[=GPUExtent3D/width=]) ≤ |subresourceSize|.[=GPUExtent3D/width=] + - (|texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/origin}}.[=GPUOrigin3D/y=] + |copySize|.[=GPUExtent3D/height=]) ≤ |subresourceSize|.[=GPUExtent3D/height=] + - (|texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/origin}}.[=GPUOrigin3D/z=] + |copySize|.[=GPUExtent3D/depthOrArrayLayers=]) ≤ |subresourceSize|.[=GPUExtent3D/depthOrArrayLayers=] - |copySize|.[=GPUExtent3D/width=] must be a multiple of |blockWidth|. - |copySize|.[=GPUExtent3D/height=] must be a multiple of |blockHeight|. @@ -542,17 +560,17 @@ are defined by the source type, given by this table:
- The set of subresources for texture copy(|imageCopyTexture|, |copySize|) - is the subset of subresources of |texture| = |imageCopyTexture|.{{GPUImageCopyTexture/texture}} + The set of subresources for texture copy(|texelCopyTextureInfo|, |copySize|) + is the subset of subresources of |texture| = |texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/texture}} for which each subresource |s| satisfies the following: - The [=mipmap level=] of |s| equals - |imageCopyTexture|.{{GPUImageCopyTexture/mipLevel}}. + |texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/mipLevel}}. - The [=aspect=] of |s| is in the [=GPUTextureAspect/set of aspects=] of - |imageCopyTexture|.{{GPUImageCopyTexture/aspect}}. + |texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/aspect}}. - If |texture|.{{GPUTexture/dimension}} is {{GPUTextureDimension/"2d"}}: - The [=array layer=] of |s| is ≥ - |imageCopyTexture|.{{GPUImageCopyTexture/origin}}.[=GPUOrigin3D/z=] and < - |imageCopyTexture|.{{GPUImageCopyTexture/origin}}.[=GPUOrigin3D/z=] + + |texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/origin}}.[=GPUOrigin3D/z=] and < + |texelCopyTextureInfo|.{{GPUTexelCopyTextureInfo/origin}}.[=GPUOrigin3D/z=] + |copySize|.[=GPUExtent3D/depthOrArrayLayers=].
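Note: The copy-range conditions and the "2d" array-layer rule in the hunks above can be sketched as plain JavaScript. This is only an illustration under stated assumptions: `isTextureCopyRangeValid` and `copiedArrayLayers` are hypothetical helper names, not WebGPU API, and the arguments are plain objects shaped like `GPUOrigin3D` and `GPUExtent3D` dictionaries.

```javascript
// Hypothetical helper (not part of the WebGPU API): evaluates the five
// conditions listed in the validation steps for a texture copy range.
function isTextureCopyRangeValid(origin, copySize, subresourceSize, blockWidth, blockHeight) {
  return (
    // The copy must stay inside the physical subresource size on each axis.
    origin.x + copySize.width <= subresourceSize.width &&
    origin.y + copySize.height <= subresourceSize.height &&
    origin.z + copySize.depthOrArrayLayers <= subresourceSize.depthOrArrayLayers &&
    // The copy extent must cover whole texel blocks.
    copySize.width % blockWidth === 0 &&
    copySize.height % blockHeight === 0
  );
}

// Hypothetical helper: for a "2d" texture, the array layers selected by the
// copy are those >= origin.z and < origin.z + depthOrArrayLayers.
function copiedArrayLayers(origin, copySize) {
  const layers = [];
  for (let layer = origin.z; layer < origin.z + copySize.depthOrArrayLayers; layer++) {
    layers.push(layer);
  }
  return layers;
}
```

For a format with 4x4 texel blocks, a copy of width 10 fails the multiple-of-block-width condition even when it fits inside the subresource.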
From eea67b5b4d9cb6b5e10261ae59dfc62ca16fe300 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fran=C3=A7ois=20Beaufort?= Date: Sat, 2 Nov 2024 00:45:41 +0100 Subject: [PATCH 242/285] Allow empty bindGroupLayouts in GPUPipelineLayoutDescriptor (#4946) --- spec/index.bs | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 505e91e5e7..72b50c1f4b 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -6517,7 +6517,7 @@ A {{GPUPipelineLayout}} is created via {{GPUDevice/createPipelineLayout()|GPUDev @@ -6527,9 +6527,9 @@ pipeline, and have the following members:
: bindGroupLayouts :: - A list of {{GPUBindGroupLayout}}s the pipeline will use. Each element corresponds to a - [=@group=] attribute in the {{GPUShaderModule}}, with the `N`th element corresponding with - `@group(N)`. + A list of optional {{GPUBindGroupLayout}}s the pipeline will use. Each element corresponds + to a [=@group=] attribute in the {{GPUShaderModule}}, with the `N`th element corresponding + with `@group(N)`.
@@ -6559,14 +6559,17 @@ pipeline, and have the following members: [=Device timeline=] |initialization steps|: 1. Let |limits| be |this|.{{GPUObjectBase/[[device]]}}.{{device/[[limits]]}}. + 1. Let |bindGroupLayouts| be a copy of |descriptor|.{{GPUPipelineLayoutDescriptor/bindGroupLayouts}} + 1. For each |i| in the [=list/get the indices|indices=] of |bindGroupLayouts|: + 1. If |bindGroupLayouts|[|i|] is `undefined` or [=list/empty=], set |bindGroupLayouts|[|i|] to `null`. 1. Let |allEntries| be the result of concatenating |bgl|.{{GPUBindGroupLayout/[[descriptor]]}}.{{GPUBindGroupLayoutDescriptor/entries}} - for all |bgl| in |descriptor|.{{GPUPipelineLayoutDescriptor/bindGroupLayouts}}. + for all non-`null` |bgl| in |bindGroupLayouts|. 1. If any of the following conditions are unsatisfied [$generate a validation error$], [$invalidate$] |pl| and return.
- - Every {{GPUBindGroupLayout}} in |descriptor|.{{GPUPipelineLayoutDescriptor/bindGroupLayouts}} + - Every non-`null` {{GPUBindGroupLayout}} in |bindGroupLayouts| must be [$valid to use with$] |this| and have a {{GPUBindGroupLayout/[[exclusivePipeline]]}} of `null`. - The [=list/size=] of |descriptor|.{{GPUPipelineLayoutDescriptor/bindGroupLayouts}} @@ -6574,8 +6577,7 @@ pipeline, and have the following members: - |allEntries| must not [=exceeds the binding slot limits|exceed the binding slot limits=] of |limits|.
- 1. Set the |pl|.{{GPUPipelineLayout/[[bindGroupLayouts]]}} to - |descriptor|.{{GPUPipelineLayoutDescriptor/bindGroupLayouts}}. + 1. Set the |pl|.{{GPUPipelineLayout/[[bindGroupLayouts]]}} to |bindGroupLayouts|.
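Note: A rough JavaScript sketch of the initialization steps in this hunk, assuming each bind group layout is modeled as a plain object with an `entries` array standing in for `[[descriptor]].entries`. The function names are hypothetical and only illustrate the normalization to `null` and the concatenation over non-`null` layouts.

```javascript
// Hypothetical model of the initialization steps (not WebGPU API).
// Copies the list and replaces missing slots with null.
function normalizeBindGroupLayouts(descriptorLayouts) {
  // Array.from visits holes in sparse arrays as undefined, unlike Array.prototype.map.
  return Array.from(descriptorLayouts, (bgl) =>
    bgl === undefined || bgl === null ? null : bgl
  );
}

// Concatenates the entries of every non-null layout, mirroring |allEntries|.
function collectAllEntries(bindGroupLayouts) {
  return bindGroupLayouts
    .filter((bgl) => bgl !== null)
    .flatMap((bgl) => bgl.entries);
}
```

Keeping the `null` placeholders preserves the index of every remaining layout, so the `N`th element still corresponds to `@group(N)`.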
@@ -10781,6 +10783,7 @@ It must only be included by interfaces which also include those mixins. - All bind groups used by the pipeline must be set and compatible with the pipeline layout: For each pair of ({{GPUIndex32}} |index|, {{GPUBindGroupLayout}} |bindGroupLayout|) in |pipeline|.{{GPUPipelineBase/[[layout]]}}.{{GPUPipelineLayout/[[bindGroupLayouts]]}}: + - If |bindGroupLayout| is `null`, [=iteration/continue=]. - Let |bindGroup| be |encoder|.{{GPUBindingCommandsMixin/[[bind_groups]]}}[|index|]. - Let |dynamicOffsets| be |encoder|.{{GPUBindingCommandsMixin/[[dynamic_offsets]]}}[|index|]. - |bindGroup| must not be `null`. From b6a764223275356f485506693413d1bbfcb368b3 Mon Sep 17 00:00:00 2001 From: Corentin Wallez Date: Sat, 2 Nov 2024 00:50:37 +0100 Subject: [PATCH 243/285] Add 1-component vertex formats and unorm8x4-bgra (#4951) Fixes #4549 --- spec/index.bs | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 70 insertions(+) diff --git a/spec/index.bs b/spec/index.bs index 72b50c1f4b..8194e271e4 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -9238,22 +9238,31 @@ shader. 
@@ -9282,6 +9292,12 @@ enum GPUVertexFormat { Example WGSL type + + "uint8" + unsigned int + 1 + 1 + u32 "uint8x2" unsigned int @@ -9294,6 +9310,12 @@ enum GPUVertexFormat { 4 4 vec4<u32> + + "sint8" + signed int + 1 + 1 + i32 "sint8x2" signed int @@ -9306,6 +9328,12 @@ enum GPUVertexFormat { 4 4 vec4<i32> + + "unorm8" + unsigned normalized + 1 + 1 + f32 "unorm8x2" unsigned normalized @@ -9318,6 +9346,12 @@ enum GPUVertexFormat { 4 4 vec4<f32> + + "snorm8" + signed normalized + 1 + 1 + f32 "snorm8x2" signed normalized @@ -9330,6 +9364,12 @@ enum GPUVertexFormat { 4 4 vec4<f32> + + "uint16" + unsigned int + 1 + 2 + u32 "uint16x2" unsigned int @@ -9342,6 +9382,12 @@ enum GPUVertexFormat { 4 8 vec4<u32> + + "sint16" + signed int + 1 + 2 + i32 "sint16x2" signed int @@ -9354,6 +9400,12 @@ enum GPUVertexFormat { 4 8 vec4<i32> + + "unorm16" + unsigned normalized + 1 + 2 + f32 "unorm16x2" unsigned normalized @@ -9366,6 +9418,12 @@ enum GPUVertexFormat { 4 8 vec4<f32> + + "snorm16" + signed normalized + 1 + 2 + f32 "snorm16x2" signed normalized @@ -9378,6 +9436,12 @@ enum GPUVertexFormat { 4 8 vec4<f32> + + "float16" + float + 1 + 2 + f32 "float16x2" float @@ -9468,6 +9532,12 @@ enum GPUVertexFormat { 4 4 vec4<f32> + + "unorm8x4-bgra" + unsigned normalized + 4 + 4 + vec4<f32> From 075be5e590ac38ac7210373b12e0e8fb996a0048 Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Mon, 4 Nov 2024 10:17:25 -0800 Subject: [PATCH 244/285] Clarify that filtering sampler_comparison is allowed with depth texture (#4950) Issue: #4944, followup to #4945 --- spec/index.bs | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 8194e271e4..c657f23e50 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -134,6 +134,9 @@ spec: WGSL; urlPrefix: https://gpuweb.github.io/gpuweb/wgsl/# text: f16; url: extension-f16 text: clip_distances; url: extension-clip_distances text: dual_source_blending; url: extension-dual_source_blending + for: type + 
text: sampled texture; url: type-sampled-texture + text: depth texture; url: type-depth-texture type: abstract-op text: SizeOf; url: sizeof spec: Internationalization Glossary; urlPrefix: https://www.w3.org/TR/i18n-glossary/# @@ -7515,10 +7518,11 @@ typedef double GPUPipelineConstantValue; // May represent WGSL's bool, f32, i32, 1. |entryPoint| |must| not be `null`. 1. For each |binding| that is [=statically used=] by |entryPoint|: - [$validating shader binding$](|binding|, |layout|) |must| return `true`. - 1. For each texture and sampler used together in a texture builtin function call in any of the - [=functions in the shader stage=] rooted at |entryPoint|: - 1. Let |texture| be the {{GPUBindGroupLayoutEntry}} corresponding to the sampled texture in the call. - 1. Let |sampler| be the {{GPUBindGroupLayoutEntry}} corresponding to the used sampler in the call. + 1. For each texture builtin function call in any of the [=functions in the shader stage=] rooted at |entryPoint|, + if it uses a |textureBinding| of [=type/sampled texture=] or [=type/depth texture=] type + together with a |samplerBinding| of `sampler` type (excluding `sampler_comparison`): + 1. Let |texture| be the {{GPUBindGroupLayoutEntry}} corresponding to |textureBinding|. + 1. Let |sampler| be the {{GPUBindGroupLayoutEntry}} corresponding to |samplerBinding|. 1. If |sampler|.{{GPUSamplerBindingLayout/type}} is {{GPUSamplerBindingType/"filtering"}}, then |texture|.{{GPUTextureBindingLayout/sampleType}} |must| be {{GPUTextureSampleType/"float"}}. 
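Note: The revised sampler rule can be read as the following JavaScript predicate. This is a minimal sketch, assuming `textureLayout` and `samplerLayout` are plain objects shaped like `GPUTextureBindingLayout` and `GPUSamplerBindingLayout`, and `wgslSamplerType` is the WGSL type of the sampler binding. The function name is hypothetical.

```javascript
// Hypothetical predicate (not WebGPU API) for one texture builtin call that
// uses a sampled or depth texture together with a sampler binding.
function isSamplerTexturePairValid(textureLayout, samplerLayout, wgslSamplerType) {
  // Calls that use a `sampler_comparison` binding are excluded from this check.
  if (wgslSamplerType !== "sampler") {
    return true;
  }
  // A "filtering" sampler of WGSL type `sampler` requires a "float" sample type.
  if (samplerLayout.type === "filtering") {
    return textureLayout.sampleType === "float";
  }
  return true;
}
```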
From 63b719d61dbf169e4691381264e763db5ae9536e Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Mon, 4 Nov 2024 10:35:29 -0800 Subject: [PATCH 245/285] Allow `featureLevel: "compatibility"`, which does nothing (#4897) * Briefly document featureLevel * Allow "compatibility" but it does nothing * Define "core" as well, and say "compatibility" shouldn't be used --- spec/index.bs | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index c657f23e50..3ca8c96322 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -2290,7 +2290,8 @@ interface GPU { 1. All of the requirements in the following steps |must| be met.
- 1. |options|.{{GPURequestAdapterOptions/featureLevel}} |must| be `undefined`. + 1. |options|.{{GPURequestAdapterOptions/featureLevel}} |must| be + a [=feature level string=].
If they are met **and** the user agent chooses to return an adapter: @@ -2433,7 +2434,7 @@ configuration is suitable for the application. @@ -2518,6 +2529,17 @@ enum GPUPowerPreference { [=fallback adapter=]. Developers that wish to prevent their applications from running on [=fallback adapters=] should check the {{GPUAdapter}}.{{GPUAdapter/isFallbackAdapter}} attribute prior to requesting a {{GPUDevice}}. + + : xrCompatible + :: + When set to `true` indicates that the best [=adapter=] for rendering to a [=WebXR session=] + must be returned. If the user agent or system does not support [=WebXR sessions=] then + adapter selection may ignore this value. + + Note: + If {{GPURequestAdapterOptions/xrCompatible}} is not set to `true` when the adapter is + requested, {{GPUDevice}}s created from the adapter cannot be used to render for + [=WebXR sessions=].
From a851e93678a32b15aa4da916e71a82896be600cf Mon Sep 17 00:00:00 2001 From: David Neto Date: Tue, 19 Nov 2024 16:08:21 -0500 Subject: [PATCH 255/285] wgsl: bool has size and alignment of 4 bytes (#4974) * wgsl: bool has size and alignment of 4 bytes Fixed: #4972 * Explain why bool is 4 bytes, in a note --- wgsl/index.bs | 66 +++++++++++++++++++++++++++++++-------------------- 1 file changed, 40 insertions(+), 26 deletions(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index 285f0657ba..a3186dea94 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -899,20 +899,16 @@ shader that goes beyond the specified limits. Maximum combined [=byte-size=] of all [=variables=] instantiated in the [=address spaces/private=] address space that are [=statically accessed=] by a single [=shader=] - - For the purposes of this limit, [=bool=] has a size of 4 bytes. 8192 Maximum combined [=byte-size=] of all [=variables=] instantiated in the [=address spaces/function=] address space that are declared in a single [=function/function=] - - For the purposes of this limit, [=bool=] has a size of 4 bytes. 8192 Maximum combined [=byte-size=] of all [=variables=] instantiated in the [=address spaces/workgroup=] address space that are [=statically accessed=] by a single [=shader=] - For the purposes of this limit, [=bool=] has a size of 4 bytes and a + For the purposes of this limit, a [=fixed footprint|fixed-footprint=] array is treated as a [=creation-fixed footprint=] array when substituting the override value. @@ -10140,7 +10136,7 @@ When writing a [=variable declaration=] or a [=pointer type=] in WGSL source: ## Memory Layout ## {#memory-layouts} The layout of types in WGSL is independent of [=address space=]. -Strictly speaking, however, that layout can only be observed by host-shareable +Strictly speaking, however, that layout can only be observed by [=host-shareable=] buffers. 
[=Uniform buffer=] and [=storage buffer=] variables are used to share bulk data organized as a sequence of bytes in memory. @@ -10167,13 +10163,16 @@ The memory layout of a type is significant only when evaluating an expression wi An 8-bit byte is the most basic unit of [=host-shareable=] memory. The terms defined in this section express counts of 8-bit bytes. -We will use the following notation: -* AlignOf(|T|) is the [=alignment=] of host-shareable type |T|. -* AlignOfMember(|S|, |i|) is the alignment of the |i|'th member of the host-shareable structure |S|. -* SizeOf(|T|) is the [=byte-size=] of host-shareable type |T|. -* SizeOfMember(|S|, |i|) is the size of the |i|'th member of the host-shareable structure |S|. -* OffsetOfMember(|S|, |i|) is the offset of the |i|'th member from the start of the host-shareable structure |S|. -* StrideOf(|A|) is the element stride of host-shareable array type |A|, defined +We will use the following notation, where +|T| is a [=host-shareable=] or [=fixed footprint=] type, +|S| is a host-shareable or fixed footprint structure type, +and |A| is a host-shareable or fixed footprint array or runtime-sized array: +* AlignOf(|T|) is the [=alignment=] of |T|. +* AlignOfMember(|S|, |i|) is the alignment of the |i|'th member of |S|. +* SizeOf(|T|) is the [=byte-size=] of |T|. +* SizeOfMember(|S|, |i|) is the size of the |i|'th member of |S|. +* OffsetOfMember(|S|, |i|) is the offset of the |i|'th member from the start of |S|. +* StrideOf(|A|) is the element stride of |A|, defined as the number of bytes from the start of one array element to the start of the next element. It equals the size of the array's element type, rounded up to the alignment of the element type:

@@ -10202,18 +10201,22 @@ The size may include non-addressable padding at the end of the type. Consequently, loads and stores of a value might access fewer memory locations than the value's size. -Alignment and size of [=host-shareable=] types are defined recursively in the +Alignment and size of [=host-shareable=] and [=fixed footprint=] types are defined recursively in the following table: - +
- Alignment and size for host-shareable types
+ Alignment and size for host-shareable and fixed footprint types
Host-shareable type |T| +
Host-shareable or fixed footprint type |T| [=AlignOf=](|T|) [=SizeOf=](|T|)
[=bool=] +
See Note. +
4 + 4
[=i32=], [=u32=], or [=f32=] 4 4 @@ -10223,19 +10226,19 @@ following table:
[=atomic type|atomic<|T|>=] 4 4 -
[=vector|vec=]2<|T|>, |T| is [=i32=], [=u32=], or [=f32=] +
[=vector|vec=]2<|T|>, |T| is [=bool=], [=i32=], [=u32=], or [=f32=] 8 8
vec2<f16> 4 4 -
vec3<|T|>, |T| is [=i32=], [=u32=], or [=f32=] +
vec3<|T|>, |T| is [=bool=], [=i32=], [=u32=], or [=f32=] 16 12
vec3<f16> 8 6 -
vec4<|T|>, |T| is [=i32=], [=u32=], or [=f32=] +
vec4<|T|>, |T| is [=bool=], [=i32=], [=u32=], or [=f32=] 16 16
vec4<f16> @@ -10312,6 +10315,13 @@ following table: where NRuntime is the runtime-determined number of elements of |T|
+

+Note: +Many GPUs cannot implement single-byte writes without introducing potential data races. +By specifying that a `bool` value occupies 4 bytes with 4 byte alignment, +implementations can support adjacent boolean values in memory without introducing data races. +
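Note: As an illustration of how the table values feed into StrideOf, the sketch below hard-codes a few (AlignOf, SizeOf) pairs from the table above and rounds the element size up to the element alignment. The `LAYOUT` table, `roundUp`, and `strideOf` are hypothetical names used only for this example.

```javascript
// A few (alignment, size) pairs in bytes, copied from the table above.
const LAYOUT = {
  "bool": { align: 4, size: 4 },
  "f32": { align: 4, size: 4 },
  "vec2<f32>": { align: 8, size: 8 },
  "vec3<f32>": { align: 16, size: 12 },
  "vec3<bool>": { align: 16, size: 12 },
  "vec4<f32>": { align: 16, size: 16 },
  "vec3<f16>": { align: 8, size: 6 },
};

// Rounds n up to the next multiple of k.
function roundUp(k, n) {
  return Math.ceil(n / k) * k;
}

// StrideOf for an array whose element type is `elementType`:
// the element size rounded up to the element alignment.
function strideOf(elementType) {
  const { align, size } = LAYOUT[elementType];
  return roundUp(align, size);
}
```

With these values an array of `vec3<f32>` has a 16-byte stride even though each element occupies 12 bytes, and an array of `bool` has a 4-byte stride.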
+ ### Structure Member Layout ### {#structure-member-layout} The internal layout of a [=structure=] is computed from the sizes and alignments of its members. @@ -10462,7 +10472,7 @@ For each member index |i| > 1: ### Internal Layout of Values ### {#internal-value-layout} -This section describes how the internals of a value are placed in the byte locations +This section describes how the internals of a [=host-shareable=] value are placed in the byte locations of a buffer, given an assumed placement of the overall value. These layouts depend on the value's type, and the [=attribute/align=] and [=attribute/size=] attributes on structure members. @@ -10474,6 +10484,10 @@ non-negative integer |c|. The data [=behavioral requirement|will=] appear identically regardless of the address space. +Note: The [=bool=] type is not [=host-shareable=]. +WGSL specifies that a [=bool=] value has a size and alignment of 4 bytes, +but does not specify the internal layout of a bool value. + When a value |V| of type [=u32=] or [=i32=] is placed at byte offset |k| of a host-shared buffer, then: * Byte |k| contains bits 0 through 7 of |V| @@ -10547,7 +10561,7 @@ then: The [=address spaces/storage=] and [=address spaces/uniform=] address spaces have different buffer layout constraints which are described in this section. -Note: All [=address spaces=] except [=address spaces/uniform=] have the same +All [=address spaces=] except [=address spaces/uniform=] have the same constraints as the [=address spaces/storage=] address space. All structure and array types directly or indirectly referenced by a variable @@ -10555,20 +10569,20 @@ All structure and array types directly or indirectly referenced by a variable Violations of an address space constraint results in a [=shader-creation error=]. 
In this section we define RequiredAlignOf(|S|, |C|) as the -byte offset [=alignment=] requirement of values of host-shareable type |S| when +byte offset [=alignment=] requirement of values of host-shareable or fixed-footprint type |S| when used in address space |C|. - - @@ -15855,11 +15855,21 @@ Note: The vec2<f32> case is the same as `unpack2x16float(pack2x16float(e)) For scalar `T`, the result is `t * t * (3.0 - 2.0 * t)`,
- where `t = clamp((x - low) / (high - low), 0.0, 1.0)`. + where `t = clamp((x - edge0) / (edge1 - edge0), 0.0, 1.0)`. + + Qualitatively: + * When `edge0` < `edge1`, the function is 0 for `x` below `edge0`, then smoothly rises + until `x` reaches `edge1`, and remains at 1 afterward. + * When `edge0` > `edge1`, the function is 1 for `x` below `edge1`, then smoothly descends + until `x` reaches `edge0`, and remains at 0 afterward. + + If `edge0` = `edge1`: + * It is a [=shader-creation error=] if `edge0` and `edge1` are [=const-expressions=]. + * It is a [=pipeline-creation error=] if `edge0` and `edge1` are [=override-expressions=]. + * Otherwise, the result is an [=indeterminate value=] for `T`. + In this case the computation performs a floating point [=ieee754/division by zero=], + and the [=Finite Math Assumption=] applies. - If `low >= high`: - * It is a [=shader-creation error=] if `low` and `high` are [=const-expressions=]. - * It is a [=pipeline-creation error=] if `low` and `high` are [=override-expressions=].
- Alignment requirements of a host-shareable type for + Alignment requirements of a host-shareable or fixed footprint type for [=address spaces/storage=] and [=address spaces/uniform=] address spaces
Host-shareable type |S| +
Host-shareable or fixed footprint type |S| [=RequiredAlignOf=](|S|, [=address spaces/storage=]) [=RequiredAlignOf=](|S|, [=address spaces/uniform=])
[=i32=], [=u32=], [=f32=], or [=f16=] +
[=bool=], [=i32=], [=u32=], [=f32=], or [=f16=] [=AlignOf=](|S|) [=AlignOf=](|S|)
[=atomic types|atomic=]<T> @@ -18091,7 +18105,7 @@ fn atomicCompareExchangeWeak(atomic_ptr: ptr, read_write>, cmp : T if comparison { *atomic_ptr = v; } - return _atomic_compare_exchange_result(old, comparison); + return _atomic_compare_exchange_result(old, comparison); } From 9a0e419d08e6fcaf28ab9a355a4ec12f8722a277 Mon Sep 17 00:00:00 2001 From: David Neto Date: Wed, 20 Nov 2024 09:57:17 -0500 Subject: [PATCH 256/285] wgsl: smoothstep is well defined for edge0 > edge1 (#4981) Fixes: #4900 --- wgsl/index.bs | 24 +++++++++++++++++------- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index a3186dea94..5b3afc983d 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -12643,7 +12643,7 @@ the rules in [[#floating-point-rounding-and-overflow]] apply. Absolute error at most 2-7 when `x` is in the interval [-π, π]
`sinh(x)`Inherited from `(exp(x) - exp(-x)) * 0.5`
`saturate(x)`Correctly rounded -
`smoothstep(low, high, x)`Inherited from `t * t * (3.0 - 2.0 * t)`,
where `t = clamp((x - low) / (high - low), 0.0, 1.0)` +
`smoothstep(edge0, edge1, x)`Inherited from `t * t * (3.0 - 2.0 * t)`,
where `t = clamp((x - edge0) / (edge1 - edge0), 0.0, 1.0)`
`sqrt(x)`Inherited from `1.0 / inverseSqrt(x)`
`step(edge, x)`Correctly rounded
`tan(x)`Inherited from `sin(x) / cos(x)` @@ -15841,8 +15841,8 @@ Note: The vec2<f32> case is the same as `unpack2x16float(pack2x16float(e)) Overload - @const @must_use fn smoothstep(low: T, - high: T, + @const @must_use fn smoothstep(edge0: T, + edge1: T, x: T) -> T
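Note: The scalar formula for `smoothstep` can be tried out with the JavaScript sketch below. It evaluates in double precision rather than `f32`, so it only illustrates the shape of the function, including the descending case when `edge0` > `edge1`. The equal-edges case is left unasserted because WGSL treats it as an error or an indeterminate value.

```javascript
// clamp as min(max(v, lo), hi)
function clamp(v, lo, hi) {
  return Math.min(Math.max(v, lo), hi);
}

// Scalar smoothstep following the formula in the patch:
// t * t * (3.0 - 2.0 * t), where t = clamp((x - edge0) / (edge1 - edge0), 0.0, 1.0)
function smoothstep(edge0, edge1, x) {
  const t = clamp((x - edge0) / (edge1 - edge0), 0.0, 1.0);
  return t * t * (3.0 - 2.0 * t);
}
```

With `edge0 = 1` and `edge1 = 0`, inputs below 0 map to 1 and inputs above 1 map to 0, which matches the qualitative description for `edge0` > `edge1`.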
### `sqrt` ### {#sqrt-builtin} From 6738a8a4b4eae1a73b9c8ed980d3c5008037c912 Mon Sep 17 00:00:00 2001 From: David Neto Date: Wed, 20 Nov 2024 11:01:17 -0500 Subject: [PATCH 257/285] Clarify uniqueness of nearest enclosing diagnostic (#4980) Also add a note saying multiple non-conflicting global diagnostic filters are permitted. Issue: #4976 --- wgsl/index.bs | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index 5b3afc983d..f68d21a35f 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -862,6 +862,8 @@ Two [=diagnostic filters=] *DF*(*AR1*,*NS1*,*TR1*) and *DF*(*AR2*,*NS2*,*TR2*) < [=Diagnostic filters=] [=shader-creation error|must=] not [=diagnostic/conflict=]. +Note: Multiple [=global diagnostic filters=] are permitted when they do not [=diagnostic/conflict=]. + WGSL's diagnostic filters are designed so their affected ranges nest perfectly. If the affected range of DF1 overlaps with the affected range of DF2, then either DF1's affected range is fully contained in DF2's affected range, or the other way around. @@ -871,7 +873,10 @@ if one exists, is the diagnostic filter *DF(AR,NS,TR)* where: * *L* falls in the affected range *AR*, and * If there is another filter *DF'(AR',NS',TR)* where *L* falls in *AR'*, then *AR* is contained in *AR'*. -Because affected ranges nest, the nearest enclosing diagnostic is unique, or does not exist. +Because affected ranges nest, the nearest enclosing diagnostic: +* is a unique [=range diagnostic filter=], +* otherwise is one of a set of duplicate (non-[=diagnostic/conflict|conflicting=]) [=global diagnostic filters=], +* otherwise does not exist. 
## Limits ## {#limits} From 1df9c5b2bd62a95c3141b8ced105c57d22ad64c1 Mon Sep 17 00:00:00 2001 From: Brandon Jones Date: Wed, 20 Nov 2024 09:34:29 -0800 Subject: [PATCH 258/285] Adding clearer notes and examples for dynamic offset ordering (#4982) * Adding clearer notes and examples for dynamic offset ordering * Apply Kai's suggestions from code review * Add a static+dynamic offset example --------- Co-authored-by: Kai Ninomiya --- spec/index.bs | 84 +++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 75 insertions(+), 9 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 5b2a827913..9e2c700dfe 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -1871,7 +1871,7 @@ including the absence of those values. The {{GPUAdapterInfo}} for an adapter is exposed via {{GPUAdapter/info|GPUAdapter.info}} and {{GPUDevice/adapterInfo|GPUDevice.adapterInfo}}). -This info is immutable: +This info is immutable: for a given adapter, each {{GPUAdapterInfo}} attribute will return the same value every time it's accessed. Note: @@ -10742,17 +10742,13 @@ It must only be included by interfaces which also include those mixins. : |dynamicOffsets|, of type [=sequence=]<{{GPUBufferDynamicOffset}}>, non-nullable, defaulting to `[]` :: Array containing buffer offsets in bytes for each entry in - |bindGroup| marked as {{GPUBindGroupLayoutEntry/buffer}}.{{GPUBufferBindingLayout/hasDynamicOffset}}. + |bindGroup| marked as {{GPUBindGroupLayoutEntry/buffer}}.{{GPUBufferBindingLayout/hasDynamicOffset}}, + ordered by {{GPUBindGroupLayoutEntry}}.{{GPUBindGroupLayoutEntry/binding}}. + See [note](#dynamicOffsetOrder) for additional details. **Returns:** {{undefined}} - Note: - |dynamicOffsets|[|i|] is used for the |i|-th dynamic buffer binding in the bind group, - when bindings are ordered by {{GPUBindGroupLayoutEntry}}.{{GPUBindGroupLayoutEntry/binding}}. 
- Said differently |dynamicOffsets| are in the same order as dynamic buffer binding's - {{GPUBindGroupLayoutEntry}}.{{GPUBindGroupLayoutEntry/binding}}. - [=Content timeline=] steps: 1. Issue the subsequent steps on the [=Device timeline=] of @@ -10819,7 +10815,9 @@ It must only be included by interfaces which also include those mixins. |index|: The index to set the bind group at. |bindGroup|: Bind group to use for subsequent render or compute commands. |dynamicOffsetsData|: Array containing buffer offsets in bytes for each entry in - |bindGroup| marked as {{GPUBindGroupLayoutEntry/buffer}}.{{GPUBufferBindingLayout/hasDynamicOffset}}. + |bindGroup| marked as {{GPUBindGroupLayoutEntry/buffer}}.{{GPUBufferBindingLayout/hasDynamicOffset}}, + ordered by {{GPUBindGroupLayoutEntry}}.{{GPUBindGroupLayoutEntry/binding}}. + See [note](#dynamicOffsetOrder) for additional details. |dynamicOffsetsDataStart|: Offset in elements into |dynamicOffsetsData| where the buffer offset data begins. |dynamicOffsetsDataLength|: Number of buffer offsets to read from |dynamicOffsetsData|. @@ -10845,6 +10843,74 @@ It must only be included by interfaces which also include those mixins.
+
+ Dynamic offsets are applied in {{GPUBindGroupLayoutEntry}}.{{GPUBindGroupLayoutEntry/binding}} order. + + This means that if `dynamic bindings` is the list of each {{GPUBindGroupLayoutEntry}} in the {{GPUBindGroupLayout}} + with {{GPUBindGroupLayoutEntry/buffer}}?.{{GPUBufferBindingLayout/hasDynamicOffset}} set to `true`, sorted by + {{GPUBindGroupLayoutEntry}}.{{GPUBindGroupLayoutEntry/binding}}, then `dynamic offsets[i]`, as supplied to + [=GPUBindingCommandsMixin/setBindGroup()=], will correspond to `dynamic bindings[i]`. + +
+ For a {{GPUBindGroupLayout}} created with the following call: + +
+            // Note the bindings are listed out-of-order in this array, but it
+            // doesn't matter because they will be sorted by binding index.
+            let layout = gpuDevice.createBindGroupLayout({
+                entries: [{
+                    binding: 1,
+                    buffer: {},
+                }, {
+                    binding: 2,
+                    buffer: { hasDynamicOffset: true },
+                }, {
+                    binding: 0,
+                    buffer: { hasDynamicOffset: true },
+                }]
+            });
+        
+ + Used by a {{GPUBindGroup}} created with the following call: + +
+            // Like above, the array order doesn't matter here.
+            // It doesn't even need to match the order used in the layout.
+            let bindGroup = gpuDevice.createBindGroup({
+                layout: layout,
+                entries: [{
+                    binding: 1,
+                    resource: { buffer: bufferA, offset: 256 },
+                }, {
+                    binding: 2,
+                    resource: { buffer: bufferB, offset: 512 },
+                }, {
+                    binding: 0,
+                    resource: { buffer: bufferC },
+                }]
+            });
+        
+ + And bound with the following call: + +
+            pass.setBindGroup(0, bindGroup, [1024, 2048]);
+        
+ + The following buffer offsets will be applied: + + + + +
Binding Buffer Offset +
0 bufferC 1024 (Dynamic) +
1 bufferA 256 (Static) +
2 bufferB 2560 (Static + Dynamic) +
+
+
+ +
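Note: The ordering rule from the example above can be checked with a small JavaScript model. It is a sketch only: `resolveBufferOffsets` is a hypothetical helper operating on plain descriptor-like objects, it uses the `hasDynamicOffset` member named in the prose, and it ignores all validation.

```javascript
// Hypothetical model (not WebGPU API): computes the effective buffer offset of
// each binding, given layout entries, bind group entries, and the dynamic
// offsets passed to setBindGroup().
function resolveBufferOffsets(layoutEntries, bindGroupEntries, dynamicOffsets) {
  // Dynamic bindings are ordered by binding index, not by array position.
  const dynamicBindings = layoutEntries
    .filter((entry) => entry.buffer?.hasDynamicOffset === true)
    .map((entry) => entry.binding)
    .sort((a, b) => a - b);

  const offsets = new Map();
  for (const entry of bindGroupEntries) {
    const staticOffset = entry.resource.offset ?? 0;
    const i = dynamicBindings.indexOf(entry.binding);
    // dynamicOffsets[i] applies to the i-th dynamic binding, if this is one.
    const dynamicOffset = i === -1 ? 0 : dynamicOffsets[i];
    offsets.set(entry.binding, staticOffset + dynamicOffset);
  }
  return offsets;
}
```

Applied to the layout, bind group, and `[1024, 2048]` offsets from the example, this model yields 1024 for binding 0, 256 for binding 1, and 2560 for binding 2, matching the table.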
To Iterate over each dynamic binding offset in a given {{GPUBindGroup}} |bindGroup| with a given list of |steps| to be executed for each dynamic offset, run the following [=device timeline=] steps: From 70d7c5e616e9a9e744b7930e7e8eb78b88a56483 Mon Sep 17 00:00:00 2001 From: James Price Date: Wed, 20 Nov 2024 13:21:45 -0500 Subject: [PATCH 259/285] Texel buffer proposal (#4912) * WIP: texture buffers * Remove memory model modifications These have landed in the main spec. * Address some review comments * Make it a core feature * Rename to "texel buffer" * Remove formats that are not widely supported * Update open questions with decision on core vs extension * Add restriction for vertex shaders --- proposals/texel-buffers.md | 401 +++++++++++++++++++++++++++++++++++++ 1 file changed, 401 insertions(+) create mode 100644 proposals/texel-buffers.md diff --git a/proposals/texel-buffers.md b/proposals/texel-buffers.md new file mode 100644 index 0000000000..bb6fece866 --- /dev/null +++ b/proposals/texel-buffers.md @@ -0,0 +1,401 @@ +# Texel Buffers + +**Roadmap:** This proposal is **under active development, but has not been standardized for inclusion in the WebGPU specification. The proposal is likely to change before it is standardized.** WebGPU implementations **must not** expose this functionality; doing so is a spec violation. Note however, an implementation might provide an option (e.g. command line flag) to enable a draft implementation, for developers who want to test this proposal. + +Last modified: 2024-10-07 + +Issue: [#162](https://github.com/gpuweb/gpuweb/issues/162) + +# WGSL + + +## Extension Names + +Add `'texel_buffer'` as a new language extension name. 
+ + +## Language Extensions + +[[Add new table entry to *Language-extensions*]] + +| WGSL language extension | Description | +| ----------------------- | ------------------------------------------------------------------------ | +| **texel_buffer** | Allows the use of the `texel_buffer` type and related builtin functions. | + + +## Texel Buffer Types + +[[New subsection of **Texture and Sampler Types**]] + +A **texel buffer** supports accessing texels stored in a 1D buffer using texture load and store functions. + +Unlike other WGSL texture types, the texels of a texel buffer are stored in a `GPUBuffer`, and bound to the pipeline via a `GPUTexelBufferView`. +Additionally, the maximum number of texels in a texel buffer is often much larger than for storage textures. See https://gpuweb.github.io/gpuweb/#supported-limits + +A texel buffer type must be parameterized by one of the [texel formats](https://w3.org/TR/WGSL/#texel-formats) for storage textures. +The texel format determines the conversion function as specified in [Texel Formats](https://w3.org/TR/WGSL/#texel-formats). + +For a `textureStore` operation, the inverse of the conversion function is used to convert the shader value to the stored texel. + +| Type | Description | +| ------------------------------------ | ------------------------------ | +| **texel_buffer**<_Format_, _Access_> | A texel buffer type that accesses buffer data using texture functions. | + +- _Format_ must be an enumerant for one of the texel formats for storage textures +- _Access_ must be `read` or `read_write` + +Writes to texel buffers are visible to the same invocation, and can be synchronized with other invocations from the same workgroup using a `textureBarrier`. + + +## Restrictions on Functions + +Add `texel_buffer` to the list of valid function parameter types. 
+ + +## Texture Built-in Functions + +[[Add new overloads]] + +| Parameterization | Overload | +| ------------------------------ | ------------------------ | +| _AM_ is `read` or `read_write` | `@must_use fn textureDimensions(t : texel_buffer<F, AM>) -> u32` | +| _C_ is `i32` or `u32`<br>_AM_ is `read` or `read_write`<br>_CF_ depends on the storage texel format _F_. [See the texel format table](https://w3.org/TR/WGSL/#storage-texel-formats) for the mapping of texel format to channel format. | `@must_use fn textureLoad(t : texel_buffer<F, AM>, coords : C) -> vec4<CF>` | +| _C_ is `i32` or `u32`<br>_CF_ depends on the storage texel format _F_. [See the texel format table](https://w3.org/TR/WGSL/#storage-texel-formats) for the mapping of texel format to channel format. | `@must_use fn textureStore(t : texel_buffer<F, read_write>, coords : C, value: vec4<CF>)` | + + +# API + + +## Limits + +| Limit name | Type | Limit class | Default | +| ---------------------- | ----------- | ----------- | ------------------------- | +| **maxTexelBufferSize** | `GPUSize64` | maximum | 134217728 bytes (128 MiB) | + + +## Adapter Capability Guarantees + +Add "`maxTexelBufferSize` must be <= `maxBufferSize`". + + +## Resource Usages + +[[Modify description of internal usages]] + +**storage**
+Read/write storage resource binding. Allowed by buffer `STORAGE`, texture `STORAGE_BINDING`, or buffer `TEXEL_BUFFER`. + +**storage-read**
+Read-only storage resource bindings. Preserves the contents. Allowed by buffer `STORAGE`, texture `STORAGE_BINDING`, or buffer `TEXEL_BUFFER`. + + +## Buffer Usages + +[[Add new const to `GPUBufferUsage` namespace]] + +```javascript + const GPUFlagsConstant TEXEL_BUFFER = 0x0400; +``` + + +## GPUTexelBufferView + +[[New subsection of **Textures and Texture Views**]] + +A `GPUTexelBufferView` is a view onto some subset of the buffer subresources defined by a particular `GPUBuffer`. + +```javascript +[Exposed=(Window, Worker), SecureContext] +interface GPUTexelBufferView { +}; +GPUTexelBufferView includes GPUObjectBase; +``` + +`GPUTexelBufferView` has the following immutable properties: + +> **[[buffer]], readonly**
+> The `GPUBuffer` into which this is a view. +> +> **[[descriptor]], readonly**
+> The `GPUTexelBufferViewDescriptor` describing this texel buffer view. +> +> All optional fields of `GPUTexelBufferViewDescriptor` are defined. + + +### Texel Buffer View Creation + +```javascript +dictionary GPUTexelBufferViewDescriptor : GPUObjectDescriptorBase { + GPUTextureFormat format; + GPUSize64 offset = 0; + GPUSize64 size; +}; +``` + +`GPUTexelBufferViewDescriptor` has the following members: + +> **format, of type GPUTextureFormat**
+> The format of the texel buffer view. +> +> **offset, of type GPUSize64, defaulting to 0**
+> The offset, in bytes, from the beginning of the buffer to the range exposed by the texel buffer view. +> +> **size, of type GPUSize64**
+> The size, in bytes, of the texel buffer view. If not provided, specifies the range starting at `offset` and ending at the end of the buffer. + +**createView(descriptor)**
+Creates a `GPUTexelBufferView`. + +> **Called on:** `GPUBuffer` _this_. +> +> **Arguments:** +> +> | Parameter | Type | `Nullable` | `Optional` | Description | +> | ------------ | ------------------------------ | ---------- | ---------- | ---------------- | +> | `descriptor` | `GPUTexelBufferViewDescriptor` | ✘ | ✔ | Description of the `GPUTexelBufferView` to create. | +> +> **Returns:** _view_, of type `GPUTexelBufferView`. +> +> [Content timeline](https://w3.org/TR/WGSL/#content-timeline) steps: +> +> 1. ? Validate +> 2. Let _view_ be ! [create a new WebGPU object](https://w3.org/TR/WGSL/#abstract-opdef-create-a-new-webgpu-object)(_this_, `GPUTexelBufferView`, _descriptor_) +> 3. Issue the _initialization steps_ on the Device timeline of _this_. +> 4. Return _view_. +> +> [Device timeline](https://w3.org/TR/WGSL/#device-timeline) steps: +> +> 1. If any of the following conditions are unsatisfied generate a validation error, invalidate _view_ and return. +> - _this_ is valid to use with _this_.[[device]]. +> - _this_.usage must contain the `TEXEL_BUFFER` bit +> - _descriptor_.`offset` + _descriptor_.`size` must be <= _this_.`size` +> - _descriptor_.`size` must be <= _limits_.`maxTexelBufferSize`. +> - _descriptor_.`size` must be a multiple of the texel size of _descriptor_.`format`. +> - _descriptor_.`offset` must be a multiple of `256`. +> 2. Let _view_ be a new `GPUTexelBufferView` object. +> 3. Set _view_.[[buffer]] to _this_. +> 4. Set _view_.[[descriptor]] to _descriptor_. + + +## Bind Group Layout Creation + +[[Add new field to **GPUBindGroupLayoutEntry**]] + +```javascript +GPUTexelBufferBindingLayout texelBuffer; +``` + +**texelBuffer, of type [GPUTexelBufferBindingLayout]**
+When provided, indicates the binding resource type for this `GPUBindGroupLayoutEntry` is `GPUTexelBufferBindingLayout`. + +[[Add new entry to table of `GPUBindGroupLayoutEntry` members]] + +| Binding member | Resource type | Binding type | Binding usage | +| -------------- | ---------------------- | --------------------------------- | ------------- | +| texelBuffer | `GPUTexelBufferView` | `storage`
`read-only-storage` | storage
storage-read | + +**TODO:** Do these use buffer slots, texture slots, storage texture slots, or a new type of slot? + +[[Add new enum and dictionary]] + +```javascript +enum GPUTexelBufferAccess { + "read-only", + "read-write", +}; + +dictionary GPUTexelBufferBindingLayout { + GPUTexelBufferAccess access = "read-write"; + GPUTextureFormat format; +}; +``` + +`GPUTexelBufferBindingLayout` dictionaries have the following members: + +**access, of type GPUTexelBufferAccess, defaulting to "read-write"**
+Indicates the access mode that will be used for texel buffer views bound to this binding. +**format, of type GPUTextureFormat**
+The required format of texel buffer views bound to this binding. + +[[Add new validation when *entry*.`visibility` includes `VERTEX` ]] + +* If *entry*.`texelBuffer` is provided, *entry*.`texelBuffer`.`access` must be `"read-only"`. + + +## Bind Group Creation + +[[Add new validation rules for `GPUBindGroupEntry` in `createBindGroup`]] + +**texelBuffer** + +- _resource_ is a `GPUTexelBufferView`. +- _resource_ is valid to use with _this_. +- _layoutBinding_.texelBuffer.format is equal to _resource_.format. +- _resource_.[[buffer]].usage includes `TEXEL_BUFFER`. + + +## Default Pipeline Layout + +[[Add new steps for creating default pipeline layout]] + +> If _resource_ is for a texel buffer binding: +> +> - Let _texelBufferLayout_ be a new `GPUTexelBufferBindingLayout`. +> - Set _texelBufferLayout_.format to _resource_’s format. +> - If the access mode is:
+> -> **read**
+> Set _texelBufferLayout_.access to `"read-only"`.
+> -> **read_write**
+> Set _texelBufferLayout_.access to `"read-write"`.
+> - Set _entry_.texelBuffer to _texelBufferLayout_.
+
+
+## Bind Groups
+
+[[Add new aliasing limitations for texel buffers]]
+
+**Replace:** “writable buffer binding range” with “writable buffer binding range or texel buffer view”
+
+**Replace:** “of the same buffer” with “of the same buffer or texel buffer view”
+
+
+## Plain color formats
+
+[[Add new column to format table for `TEXEL_BUFFER`]]
+
+| Format                    | `TEXEL_BUFFER` |
+| ------------------------- | -------------- |
+| **8-bit per component**   |                |
+| `r8unorm`                 |                |
+| `r8snorm`                 |                |
+| `r8uint`                  |                |
+| `r8sint`                  |                |
+| `rg8unorm`                |                |
+| `rg8snorm`                |                |
+| `rg8uint`                 |                |
+| `rg8sint`                 |                |
+| `rgba8unorm`              | ✔              |
+| `rgba8unorm-srgb`         |                |
+| `rgba8snorm`              |                |
+| `rgba8uint`               | ✔              |
+| `rgba8sint`               | ✔              |
+| `bgra8unorm`              |                |
+| `bgra8unorm-srgb`         |                |
+| **16-bit per component**  |                |
+| `r16uint`                 |                |
+| `r16sint`                 |                |
+| `r16float`                |                |
+| `rg16uint`                |                |
+| `rg16sint`                |                |
+| `rg16float`               |                |
+| `rgba16uint`              | ✔              |
+| `rgba16sint`              | ✔              |
+| `rgba16float`             | ✔              |
+| **32-bit per component**  |                |
+| `r32uint`                 | ✔              |
+| `r32sint`                 | ✔              |
+| `r32float`                | ✔              |
+| `rg32uint`                |                |
+| `rg32sint`                |                |
+| `rg32float`               |                |
+| `rgba32uint`              | ✔              |
+| `rgba32sint`              | ✔              |
+| `rgba32float`             | ✔              |
+| **mixed component width** |                |
+| `rgb10a2uint`             |                |
+| `rgb10a2unorm`            |                |
+| `rg11b10ufloat`           |                |
+
+
+# Appendix A: Implementation details
+
+
+### Vulkan
+
+In Vulkan, a `read_write` texel buffer would map to a storage texel buffer decorated as `Coherent`, and shader accesses would be performed with the `OpImageRead` and `OpImageWrite` instructions.
+A texel buffer with a `read-only` access mode could use a uniform texel buffer instead, which would use `OpImageFetch` instead of `OpImageRead`.
+
+When the `TEXEL_BUFFER` usage flag is set on buffer creation, both of the Vulkan texel buffer bits would be set:<br>
+`VK_BUFFER_USAGE_UNIFORM_TEXEL_BUFFER_BIT | VK_BUFFER_USAGE_STORAGE_TEXEL_BUFFER_BIT`
+
+The [required image formats](https://registry.khronos.org/vulkan/specs/1.2-extensions/html/vkspec.html#features-required-format-support) for storage texel buffers include:
+
+```
+R8G8B8A8_UNORM
+R8G8B8A8_UINT
+R8G8B8A8_SINT
+R16G16B16A16_UINT
+R16G16B16A16_SINT
+R16G16B16A16_SFLOAT
+R32_UINT
+R32_SINT
+R32_SFLOAT
+R32G32_UINT
+R32G32_SINT
+R32G32_SFLOAT
+R32G32B32A32_UINT
+R32G32B32A32_SINT
+R32G32B32A32_SFLOAT
+```
+
+For the other formats, [gpuinfo.org](https://vulkan.gpuinfo.org/listbufferformats.php) has information on how widespread support is.
+For 1- and 2-channel `R{8,16}_{SINT,UINT,SFLOAT}`, support is currently around 80% for storage texel buffers.
+
+**TODO:** We should do this query against WebGPU's baseline requirements, as the percentage for devices we actually support may be higher.
+
+Vulkan has a `maxTexelBufferElements` limit for the maximum size of a texel buffer.
+[gpuinfo.org shows that](https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxTexelBufferElements&platform=all) more than 85% of devices support 128MB texel buffers.
+
+
+### Metal
+
+Metal has a `texture_buffer` type, introduced in Metal 2.1, that provides similar functionality.
+The [Metal Feature Set Tables](https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf) show the supported formats for each access mode.
+Unnormalized integer and floating point formats are supported for all access modes, as is `RGBA8Unorm`.
+Using a `read-write` access mode also requires support for the Tier 2 `MTLReadWriteTextureTier`.
+A `mem_texture` fence would be needed to make texel buffer writes visible within an invocation.
+
+To get coverage on older Metal versions, it would be possible to polyfill by using a regular device buffer and doing the format conversions inside the shader.
+This requires that the storage format is specified inside the shader.
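The polyfill idea above (a plain buffer plus format conversions done by the shader) can be sketched as follows. This is only an illustration under stated assumptions: it is written in JavaScript rather than a shading language, it handles `rgba8unorm` only, and every function name is hypothetical and not part of the proposal. The clamp-and-round rule for the inverse conversion is the usual unorm convention, assumed here rather than taken from this document, and out-of-bounds coordinates are not handled.

```javascript
// Hypothetical helper names; none of them come from the proposal.

// Conversion function for one unorm8 channel: stored byte -> shader value in [0, 1].
function unorm8ToFloat(byte) {
  return byte / 255;
}

// Inverse conversion: shader value -> stored byte (clamp to [0, 1], scale, round).
function floatToUnorm8(value) {
  const clamped = Math.min(1, Math.max(0, value));
  return Math.round(clamped * 255);
}

// Emulates a texel load from an rgba8unorm texel buffer backed by raw bytes.
// `bytes` is a Uint8Array standing in for the device buffer; `coord` is the 1D texel index.
function loadRgba8Unorm(bytes, coord) {
  const base = coord * 4; // an rgba8unorm texel occupies 4 bytes
  return [0, 1, 2, 3].map((c) => unorm8ToFloat(bytes[base + c]));
}

// Emulates a texel store: converts a 4-component value and writes it at `coord`.
function storeRgba8Unorm(bytes, coord, value) {
  const base = coord * 4;
  for (let c = 0; c < 4; c++) {
    bytes[base + c] = floatToUnorm8(value[c]);
  }
}
```

Because the conversion depends on the texel format, a polyfill along these lines needs the format at shader compile time, which is the constraint noted above.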
+
+The maximum texel buffer size is 64MB for the Apple2 GPU family, and 256MB for Apple3 and above.
+
+
+### D3D12
+
+In D3D12, a texel buffer can map to an Unordered Access View (UAV) for a buffer with a `DXGI_FORMAT`, and that UAV can be accessed in the shader with 32-bit result types.
+See [Typed unordered access view (UAV) loads](https://docs.microsoft.com/en-us/windows/win32/direct3d12/typed-unordered-access-view-loads).
+The `RWBuffer` should be prefixed with `globallycoherent`, and the element type needs to be prefixed with `unorm` or `snorm` if a normalized format is being used.
+
+Format support for typed UAV loads and stores in D3D12 can be checked [here](https://docs.microsoft.com/en-us/windows/win32/direct3ddxgi/hardware-support-for-direct3d-12-0-formats).
+The set of required formats includes:
+
+```
+R8G8B8A8_UNORM
+R8G8B8A8_UINT
+R8G8B8A8_SINT
+R16G16B16A16_UINT
+R16G16B16A16_SINT
+R16G16B16A16_FLOAT
+R8_UINT
+R8_SINT
+R16_UINT
+R16_SINT
+R16_FLOAT
+R32_UINT
+R32_SINT
+R32_FLOAT
+R32G32B32A32_UINT
+R32G32B32A32_SINT
+R32G32B32A32_FLOAT
+```
+
+
+# Open Questions
+
+1. Should this be an extension, or a core feature?
+   - To make it core, implementations would need to polyfill for Metal <2.1. We would also need to drop the formats that are not required everywhere (e.g. `R8_UINT`), or make them optional.
+   - Decision at F2F:
+     - Make it core.
+     - Drop the formats that are not widespread (leaving them for a [future texture format tier extension](https://github.com/gpuweb/gpuweb/issues/3837)).
+     - We do not need to support Metal <2.1 (Metal 2.2 is our minimum requirement now).

From bd061d4f39044ddefa22501883887e5b266b5bf0 Mon Sep 17 00:00:00 2001
From: David Neto
Date: Thu, 21 Nov 2024 09:45:17 -0500
Subject: [PATCH 260/285] wgsl: @align(n) must divide required-align-of, for all structs (#4978)

* wgsl: @align(n) must divide required-align-of, for all structs

Reverses: #3756

- The previous phrasing was a note that was "implied" by other rules.
It wasn't actually perfectly implied. - Avoids a silly and confusing case with the first element of a struct. - Brings the spec more in line with Naga and WebKit. - More clearly applies the rule for *all* structs, no matter if it is actually instantiated by a variable. * Add missing word Co-authored-by: alan-baker --------- Co-authored-by: alan-baker --- wgsl/index.bs | 67 ++++++++++++++++++++++++++++++--------------------- 1 file changed, 40 insertions(+), 27 deletions(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index f68d21a35f..40fb474b83 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -8557,19 +8557,28 @@ path: syntax/align_attr.syntax.bs.include [=shader-creation error|Must=] only be applied to a member of a [=structure=] type. - Note: This attribute influences how a value of the enclosing structure type can appear in memory: - at which byte addresses the structure itself and its component members can appear. - In particular, the rules in [[#memory-layouts]] combine to imply the following constraint: + This attribute influences how a value of the enclosing structure type can appear in memory: + it constrains the byte addresses at which the structure itself and its component members can appear. -

+

If `align(`|n|`)` is applied to a member of |S| - with type |T|, and |S| is the [=store type=] - or contained in the store type for a variable in address space |C|, + with type |T|, + and |S| can be the [=store type=] for a variable in address space |AS|, + where |AS| is not [=address spaces/uniform=], then |n| [=shader-creation error|must=] satisfy: - |n| = |k| × [=RequiredAlignOf=](|T|,|C|) +
+ |n| = |k| × [=RequiredAlignOf=](|T|,|AS|) for some positive integer |k|. +
+
+ +

The rules for alignment and size are mutually recursive. + However, the above constraint is well defined because it depends on the required alignment of a + *nested* type, and types have bounded [=nesting depth=].

+ See [[#memory-layouts]]. + Parameters [=shader-creation error|Must=] be a [=const-expression=] that [=type rules|resolves=] to an [=i32=] or [=u32=].
@@ -10343,37 +10352,37 @@ by [=SizeOfMember=](|S|, |i|) and [=AlignOfMember=](|S|, |i|), respectively. The member sizes and alignments are used to calculate each member's byte offset from the start of the structure, as described in [[#internal-value-layout]]. -

+

[=SizeOfMember=](|S|, |i|) is |k| if the |i|'th member of |S| has attribute [=attribute/size=](|k|). Otherwise, it is [=SizeOf=](|T|) where |T| is the type of the member. -

+
-

+

[=AlignOfMember=](|S|, |i|) is |k| if the |i|'th member of |S| has attribute [=attribute/align=](|k|). Otherwise, it is [=AlignOf=](|T|) where |T| is the type of the member. -

+
If a structure member has the [=attribute/size=] attribute applied, the value [=shader-creation error|must=] be at least as large as the size of the member's type: -

+

[=SizeOfMember=](|S|, |i|) ≥ [=SizeOf=](T)
Where |T| is the type of the |i|'th member of |S|. -

+
The first structure member always has a zero byte offset from the start of the structure: -

+

[=OffsetOfMember=](S, 1) = 0 -

+
Each subsequent member is placed at the lowest offset that satisfies the member type alignment, and which avoids overlap with the previous member. For each member index |i| > 1: -

+

[=OffsetOfMember=](|S|, |i|) = [=roundUp=]([=AlignOfMember=](|S|, |i| ), [=OffsetOfMember=](|S|, |i|-1) + [=SizeOfMember=](|S|, |i|-1))
-

+
@@ -10579,13 +10588,15 @@ used in address space |C|. <table class='data'> <caption> - Alignment requirements of a host-shareable or fixed footprint type for - [=address spaces/storage=] and [=address spaces/uniform=] address spaces + Alignment requirements of host-shareable or fixed footprint types in address space |C| </caption> <thead> - <tr><th>Host-shareable or fixed footprint type |S| - <th>[=RequiredAlignOf=](|S|, [=address spaces/storage=]) - <th>[=RequiredAlignOf=](|S|, [=address spaces/uniform=]) + <tr><th>Host-shareable or fixed footprint type |S|, + assuming |S| can appear in |C| + <th>[=RequiredAlignOf=](|S|, |C|),<br/> + |C| is not [=address spaces/uniform=] + <th>[=RequiredAlignOf=](|S|, |C|),<br/> + |C| is [=address spaces/uniform=] </thead> <tr><td>[=bool=], [=i32=], [=u32=], [=f32=], or [=f16=] <td>[=AlignOf=](|S|) @@ -10607,7 +10618,7 @@ used in address space |C|. <tr algorithm="alignment of an runtime-sized array"> <td>array&lt;T&gt; <td>[=AlignOf=](|S|) - <td>[=roundUp=](16, [=AlignOf=](|S|)) + <td>not applicable <tr algorithm="alignment of a structure"> <td>[=structure|struct=] |S| <td>[=AlignOf=](|S|) @@ -10618,25 +10629,27 @@ Structure members of type |T| [=shader-creation error|must=] have a byte offset from the start of the structure that is a multiple of the [=RequiredAlignOf=](|T|, |C|) for the address space |C|: -<p algorithm="structure member minimum alignment"> +<blockquote algorithm="structure member minimum alignment"> [=OffsetOfMember=](|S|, |i|) = |k| &times; [=RequiredAlignOf=](|T|, C)<br> Where |k| is a non-negative integer and the |i|'th member of structure |S| has type |T| -</p> +</blockquote> Arrays of element type |T| [=shader-creation error|must=] have an [=element stride=] that is a multiple of the [=RequiredAlignOf=](|T|, |C|) for the address space |C|: -<p algorithm="array element minimum alignment"> +<blockquote algorithm="array element minimum alignment"> [=StrideOf=](array<|T|, |N|>) = |k| &times; 
[=RequiredAlignOf=](|T|, C)<br> [=StrideOf=](array<|T|>) = |k| &times; [=RequiredAlignOf=](|T|, C)<br> Where |k| is a positive integer -</p> +</blockquote> +<!-- Enforcing the rule on @align negates this: Note: [=RequiredAlignOf=](|T|, |C|) does not impose any additional restrictions on the values permitted for an [=attribute/align=] attribute, nor does it affect the rules of [=AlignOf=](|T|). Data is laid out with the rules defined in previous sections and then the resulting layout is validated against the [=RequiredAlignOf=](|T|, |C|) rules. +--> The [=address spaces/uniform=] address space also requires that: * Array elements are aligned to 16 byte boundaries. From de8f070620195942682d959f4dddeae8b04ab1ef Mon Sep 17 00:00:00 2001 From: Greggman <github@greggman.com> Date: Fri, 22 Nov 2024 08:36:57 -0800 Subject: [PATCH 261/285] Compat: Disallow using a depth texture with a non-comparison sampler (#4988) --- proposals/compatibility-mode.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/proposals/compatibility-mode.md b/proposals/compatibility-mode.md index e51725b56f..8c1fe1322a 100644 --- a/proposals/compatibility-mode.md +++ b/proposals/compatibility-mode.md @@ -209,6 +209,13 @@ In addition to the new limits, the existing `maxStorageBuffersPerShaderStage` an **Justification**: OpenGL ES 3.1 allows `MAX_VERTEX_SHADER_STORAGE_BLOCKS` and `MAX_VERTEX_IMAGE_UNIFORMS` to be zero, and there are a significant number of devices in the field with that value. +## 19. Disallow using a depth texture with a non-comparison sampler + +Using a depth texture `texture_depth_2d`, `texture_depth_cube`, `texture_depth_2d_array` with a non-comparison +sampler in a shader will generate a validation error at pipeline creation time. + +**Justification**: OpenGL ES 3.1 says such usage has undefined behavior. + ## Issues Q: OpenGL ES does not have "coarse" and "fine" variants of the derivative instructions (`dFdx()`, `dFdy()`, `fwidth()`). 
Should WGSL's "fine" derivatives (`dpdxFine()`, `dpdyFine()`, and `fwidthFine()`) be required to deliver high precision results? See [Issue 4325](https://github.com/gpuweb/gpuweb/issues/4325). From 21ad1ef61f77cd90480221e683d1a9836de2a2a3 Mon Sep 17 00:00:00 2001 From: alan-baker <alanbaker@google.com> Date: Fri, 22 Nov 2024 14:10:08 -0500 Subject: [PATCH 262/285] Fix host-shareable types note (#4992) --- wgsl/index.bs | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index 40fb474b83..00d3066439 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -3390,9 +3390,9 @@ A type is <dfn noexport>host-shareable</dfn> if it is both [=type/concrete=] and * a [=runtime-sized=] array type, if its element type is host-shareable * a [=structure=] type, if all its members are host-shareable -Note: Restrictions on the types of inter-stage inputs and outputs]] are +Note: Restrictions on the types of inter-stage inputs and outputs are described in [[#stage-inputs-outputs]] and subsequent sections. -Those types are also sized, but the counting is differs. +Those types are also sized, but the counting differs. Note: [[#texture-sampler-types|Textures and samplers]] can also be shared between the host and the GPU, but their contents are opaque. From 0609d948d9239b86e51da02ba404d8a594b52b3f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fran=C3=A7ois=20Beaufort?= <beaufortfrancois@gmail.com> Date: Fri, 22 Nov 2024 22:24:54 +0100 Subject: [PATCH 263/285] Fix styling of adapter immutable properties (#4991) --- spec/index.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/index.bs b/spec/index.bs index 9e2c700dfe..50d7f1b50c 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -1275,7 +1275,7 @@ improved privacy. 
It is not required that a [=fallback adapter=] is available on [=adapter=] has the following [=immutable properties=]: -<dl dfn-type=attribute dfn-for=adapter data-timeline=device> +<dl dfn-type=attribute dfn-for=adapter data-timeline=const> : <dfn>\[[features]]</dfn>, of type [=ordered set=]&lt;{{GPUFeatureName}}&gt;, readonly :: The [=features=] which can be used to create devices on this adapter. From e567f2e5486b317d4d67767fd8d591d1e658eff0 Mon Sep 17 00:00:00 2001 From: Kai Ninomiya <kainino@chromium.org> Date: Tue, 26 Nov 2024 10:13:08 -0800 Subject: [PATCH 264/285] Fix attribute validation (#4998) * Switch "validating GPUVertexState" to algorithm style * Fix attribute validation --- spec/index.bs | 51 ++++++++++++++++++++++++--------------------------- 1 file changed, 24 insertions(+), 27 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 50d7f1b50c..dd47603f56 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -9664,13 +9664,12 @@ dictionary GPUVertexAttribute { </dl> <div algorithm data-timeline=device> - <dfn abstract-op>validating GPUVertexBufferLayout</dfn>(device, descriptor, vertexStage) + <dfn abstract-op>validating GPUVertexBufferLayout</dfn>(device, descriptor) **Arguments:** - {{GPUDevice}} |device| - {{GPUVertexBufferLayout}} |descriptor| - - {{GPUProgrammableStage}} |vertexStage| [=Device timeline=] steps: @@ -9693,21 +9692,6 @@ dictionary GPUVertexAttribute { [$GPUVertexFormat/byteSize$](|attrib|.{{GPUVertexAttribute/format}}). - |attrib|.{{GPUVertexAttribute/shaderLocation}} is &lt; |device|.{{GPUObjectBase/[[device]]}}.{{device/[[limits]]}}.{{supported limits/maxVertexAttributes}}. - - Let |entryPoint| be [$get the entry point$]({{GPUShaderStage/VERTEX}}, |vertexStage|). [=Assert=] it is not `null`. 
- For every vertex attribute |var| [=statically used=] by |entryPoint|, - there is a corresponding |attrib| element of |descriptor|.{{GPUVertexBufferLayout/attributes}} for which - all of the following are true: - - The type |T| of |var| is compatible with |attrib|.{{GPUVertexAttribute/format}}'s [=vertex data type=]: - - <dl class=switch> - : "unorm", "snorm", or "float" - :: |T| must be `f32` or `vecN<f32>`. - : "uint" - :: |T| must be `u32` or `vecN<u32>`. - : "sint" - :: |T| must be `i32` or `vecN<i32>`. - </dl> - - The shader location is |attrib|.{{GPUVertexAttribute/shaderLocation}}. </div> </div> @@ -9722,21 +9706,34 @@ dictionary GPUVertexAttribute { [=Device timeline=] steps: - 1. Return `true`, if and only if, all of the following conditions are satisfied: + 1. Let |entryPoint| be [$get the entry point$]({{GPUShaderStage/VERTEX}}, |descriptor|). + 1. [=Assert=] |entryPoint| is not `null`. + 1. All of the requirements in the following steps |must| be met. <div class=validusage> - - [$validating GPUProgrammableStage$]({{GPUShaderStage/VERTEX}}, |descriptor|, |layout|, |device|) succeeds. - - |descriptor|.{{GPUVertexState/buffers}}.[=list/size=] is &le; + 1. [$validating GPUProgrammableStage$]({{GPUShaderStage/VERTEX}}, |descriptor|, |layout|, |device|) |must| succeed. + 1. |descriptor|.{{GPUVertexState/buffers}}.[=list/size=] |must| be &le; |device|.{{GPUObjectBase/[[device]]}}.{{device/[[limits]]}}.{{supported limits/maxVertexBuffers}}. - - Each |vertexBuffer| layout descriptor in the list |descriptor|.{{GPUVertexState/buffers}} - passes [$validating GPUVertexBufferLayout$](|device|, |vertexBuffer|, |descriptor|) - - The sum of |vertexBuffer|.{{GPUVertexBufferLayout/attributes}}.[=list/size=], + 1. Each |vertexBuffer| layout descriptor in the list |descriptor|.{{GPUVertexState/buffers}} + |must| pass [$validating GPUVertexBufferLayout$](|device|, |vertexBuffer|). + 1. 
The sum of |vertexBuffer|.{{GPUVertexBufferLayout/attributes}}.[=list/size=], over every |vertexBuffer| in |descriptor|.{{GPUVertexState/buffers}}, - is &le; + |must| be &le; |device|.{{GPUObjectBase/[[device]]}}.{{device/[[limits]]}}.{{supported limits/maxVertexAttributes}}. - - Each |attrib| in the union of all {{GPUVertexAttribute}} - across |descriptor|.{{GPUVertexState/buffers}} has a distinct - |attrib|.{{GPUVertexAttribute/shaderLocation}} value. + 1. For every vertex attribute declaration (at location |location| with type |T|) that is + [=statically used=] by |entryPoint|, there |must| be exactly one pair (|i|, |j|) for which + |descriptor|.{{GPUVertexState/buffers}}[|i|]?.{{GPUVertexBufferLayout/attributes}}[|j|].{{GPUVertexAttribute/shaderLocation}} == |location|. + + Let |attrib| be that {{GPUVertexAttribute}}. + 1. |T| |must| be compatible with |attrib|.{{GPUVertexAttribute/format}}'s [=vertex data type=]: + <dl class=switch> + : "unorm", "snorm", or "float" + :: |T| must be `f32` or `vecN<f32>`. + : "uint" + :: |T| must be `u32` or `vecN<u32>`. + : "sint" + :: |T| must be `i32` or `vecN<i32>`. + </dl> </div> </div> From 42bd9a154658d142707cab9e3df4b6e90bc33d61 Mon Sep 17 00:00:00 2001 From: Greggman <github@greggman.com> Date: Tue, 26 Nov 2024 19:12:22 -0800 Subject: [PATCH 265/285] Change GLSL loop examples to WGSL examples. (#4997) I believe these examples are left over from before WGSL had `for` loops. Now that WGSL has for loops these examples showing a `for` loop expressed with `loop` can directly reference `for` in WGSL. Fixed: #4996 --- wgsl/index.bs | 31 ++++++++++++++++--------------- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index 00d3066439..6c0c036c09 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -7340,23 +7340,24 @@ until the end of the loop body. 
The declaration executes each time it is reached, so each new iteration creates a new instance of the [=variable declaration|variable=] or [=value declaration|value=], and re-initializes it. -Note: The loop statement is one of the biggest differences from other shader -languages. +Note: The loop statement is a specialized construct, you probably want the `for` +or `while` statements. The loop statement is one of the biggest differences from +other shader languages. This design directly expresses loop idioms commonly found in compiled code. In particular, placing the loop update statements at the end of the loop body allows them to naturally use values defined in the loop body. -<div class='example glsl' heading='GLSL Loop'> - <xmp> - int a = 2; - for (int i = 0; i < 4; i++) { +<div class='example wgsl function-scope' heading="for loop"> + <xmp highlight=wgsl> + var a: i32 = 2; + for (var i: i32 = 0; i < 4; i++) { a *= 2; }
-
+
var a: i32 = 2; var i: i32 = 0; // <1> @@ -7371,18 +7372,18 @@ allows them to naturally use values defined in the loop body. </div> * <1> The initialization is listed before the loop. -<div class='example glsl' heading='GLSL Loop with continue'> - <xmp> - int a = 2; - int step = 1; - for (int i = 0; i < 4; i += step) { - if (i % 2 == 0) continue; +<div class='example wgsl function-scope' heading="for loop with continue"> + <xmp highlight=wgsl> + var a: i32 = 2; + let step: i32 = 1; + for (var i: i32 = 0; i < 4; i += step) { + if (i % 2 == 0) { continue; } a *= 2; }
-
+
var a: i32 = 2; var i: i32 = 0; @@ -7399,7 +7400,7 @@ allows them to naturally use values defined in the loop body.
-
+
var a: i32 = 2; var i: i32 = 0; From bfd7d1db0050cf13e8a744067a766e6f1d908f4b Mon Sep 17 00:00:00 2001 From: Greggman <github@greggman.com> Date: Wed, 27 Nov 2024 08:33:17 -0800 Subject: [PATCH 266/285] Compat: Limit the number of texture+sampler combinations (#4989) --- proposals/compatibility-mode.md | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/proposals/compatibility-mode.md b/proposals/compatibility-mode.md index 8c1fe1322a..e4013409ab 100644 --- a/proposals/compatibility-mode.md +++ b/proposals/compatibility-mode.md @@ -216,6 +216,28 @@ sampler in a shader will generate a validation error at pipeline creation time. **Justification**: OpenGL ES 3.1 says such usage has undefined behavior. +## 20. Limit the number of texture+sampler combinations in a stage. + +If the number of texture+sampler combinations used a in single stage in a pipeline exceeds +`min(maxSampledTexturesPerShaderStage, maxSamplersPerShaderStage)` a validation error is generated. + +The validation occurs as follows: + +``` +maxCombinationsPerStage = min(maxSampledTexturesPerShaderStage, maxSamplersPerShaderStage) +for each stage of the pipeline: + sum = 0 + for each texture binding in the pipeline layout which is visible to that stage: + sum += max(1, number of texture sampler combos for that texture binding) + for each external texture binding in the pipeline layout which is visible to that stage: + sum += 1 // for LUT texture + LUT sampler + sum += 3 * max(1, number of external_texture sampler combos) // for Y+U+V + if sum > maxCombinationsPerStage + generate a validation error. +``` + +**Justification**: In OpenGL ES 3.1 does not support more combinations. Sampler units and texture units are bound together. Texture unit X uses sampler unit X. + ## Issues Q: OpenGL ES does not have "coarse" and "fine" variants of the derivative instructions (`dFdx()`, `dFdy()`, `fwidth()`). 
Should WGSL's "fine" derivatives (`dpdxFine()`, `dpdyFine()`, and `fwidthFine()`) be required to deliver high precision results? See [Issue 4325](https://github.com/gpuweb/gpuweb/issues/4325). From 97a3eada8636f037e76ad3f8c0181480c55a282e Mon Sep 17 00:00:00 2001 From: alan-baker <alanbaker@google.com> Date: Mon, 2 Dec 2024 18:10:18 -0500 Subject: [PATCH 267/285] Update CTS status for subgroups feature (#5002) --- proposals/subgroups.md | 28 ++++++++++++++++------------ 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/proposals/subgroups.md b/proposals/subgroups.md index b496c70797..9b846a9537 100644 --- a/proposals/subgroups.md +++ b/proposals/subgroups.md @@ -2,7 +2,7 @@ Status: **Draft** -Last modified: 2024-11-15 +Last modified: 2024-12-02 Issue: [#4306](https://github.com/gpuweb/gpuweb/issues/4306) @@ -264,7 +264,7 @@ D3D12 would have to be proven empricially. # Appendix C: CTS Status -Last updated: 2024-10-16 +Last updated: 2024-12-02 | Built-in value | Validation | Compute | Fragment | | --- | --- | --- | --- | @@ -276,13 +276,13 @@ Last updated: 2024-10-16 | `subgroupElect` | &check; | &cross; | &cross; | | `subgroupAll` | &check; | &check; | &check; | | `subgroupAny` | &check; | &check; | &check; | -| `subgroupBroadcast` | &check; | &check; | &cross; | -| `subgroupBroadcastFirst`<sup>1</sup> | &check; | &cross; | &cross; | -| `subgroupBallot` | &check; | &check; | &cross; | -| `subgroupShuffle` | &check; | &cross; | &cross; | -| `subgroupShuffleXor` | &check; | &cross; | &cross; | -| `subgroupShuffleUp` | &check; | &cross; | &cross; | -| `subgroupShuffleDown` | &check; | &cross; | &cross; | +| `subgroupBroadcast` | &check; | &check; | &check; | +| `subgroupBroadcastFirst` | &check; | &check; | &check; | +| `subgroupBallot` | &check; | &check; | &check; | +| `subgroupShuffle` | &check; | &check; | &check; | +| `subgroupShuffleXor` | &check; | &check; | &check; | +| `subgroupShuffleUp` | &check; | &check; | &check; | +| `subgroupShuffleDown` | 
&check; | &check; | &check; | | `subgroupAdd` | &check; | &check; | &cross; | | `subgroupExclusiveAdd` | &check; | &check; | &cross; | | `subgroupInclusiveAdd` | &check; | &check; | &cross; | @@ -298,9 +298,13 @@ Last updated: 2024-10-16 | `quadSwapX` | &check; | &check; | &check; | | `quadSwapY` | &check; | &check; | &check; | | `quadSwapDiagonal` | &check; | &check; | &check; | -1. Indirectly tested via other built-in functions. | Diagnostic | Validation | | --- | --- | -| `subgroup_uniformity` | &cross; | -| `subgroup_branching` | &cross; | +| `subgroup_uniformity` | &check; | + +| Uniformity analysis | Validation | +| --- | --- | +| `subgroup_size` uniform in compute | &check; | +| Built-in functions require uniformity | &check; | +| Shuffle delta/mask params require uniformity | &check; | From d248f85c9fd459c76e7a6d7524206abe25f9a954 Mon Sep 17 00:00:00 2001 From: Stephen White <senorblanco@chromium.org> Date: Wed, 4 Dec 2024 10:16:44 -0500 Subject: [PATCH 268/285] Compat mode: make f16 and f32 rendering optional. (#4983) Introduce float16-renderable and float32-renderable as optional features in Compatibility mode, and required in Core. --- proposals/compatibility-mode.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/proposals/compatibility-mode.md b/proposals/compatibility-mode.md index e4013409ab..3be5832859 100644 --- a/proposals/compatibility-mode.md +++ b/proposals/compatibility-mode.md @@ -238,6 +238,18 @@ for each stage of the pipeline: **Justification**: In OpenGL ES 3.1 does not support more combinations. Sampler units and texture units are bound together. Texture unit X uses sampler unit X. +## 21. Introduce new `float16-renderable` and `float32-renderable` features. + +When supported, `float16-renderable` allows the `RENDER_ATTACHMENT` usage on textures with format `"r16float"`, `"rg16float"`, and `"rgba16float"`. 
+ +When supported, `float32-renderable` allows the `RENDER_ATTACHMENT` usage on textures with format `"r32float"`, `"rg32float"`, and `"rgba32float"`. + +Without support, an error will occur at texture creation time as described in section 6.1.3. + +Support for both features is mandatory in core WebGPU. + +**Justification**: OpenGL ES 3.1 does not require the relevant f16- or f32-based texture formats (`R16F`, `RG16F`, `RGBA16F`, `R32F`, `RG32F`, and `RGBA32F`) to be color-renderable. While there exist OpenGL ES extensions to enable renderability (`GL_EXT_COLOR_BUFFER_HALF_FLOAT` and `GL_EXT_COLOR_BUFFER_FLOAT`), there are a significant number of devices which lack support for these extensions. + ## Issues Q: OpenGL ES does not have "coarse" and "fine" variants of the derivative instructions (`dFdx()`, `dFdy()`, `fwidth()`). Should WGSL's "fine" derivatives (`dpdxFine()`, `dpdyFine()`, and `fwidthFine()`) be required to deliver high precision results? See [Issue 4325](https://github.com/gpuweb/gpuweb/issues/4325). From b36b7cbe4b69ac476bf16cd4c34171e1d3ec57f0 Mon Sep 17 00:00:00 2001 From: Stephen White <senorblanco@chromium.org> Date: Fri, 6 Dec 2024 10:25:43 -0500 Subject: [PATCH 269/285] Add new fragment storage buffer, texture limits. (#5010) Add new limits for storage buffers and textures in the fragment stage. --- proposals/compatibility-mode.md | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/proposals/compatibility-mode.md b/proposals/compatibility-mode.md index 3be5832859..e71d17fe31 100644 --- a/proposals/compatibility-mode.md +++ b/proposals/compatibility-mode.md @@ -209,14 +209,26 @@ In addition to the new limits, the existing `maxStorageBuffersPerShaderStage` an **Justification**: OpenGL ES 3.1 allows `MAX_VERTEX_SHADER_STORAGE_BLOCKS` and `MAX_VERTEX_IMAGE_UNIFORMS` to be zero, and there are a significant number of devices in the field with that value. -## 19. 
Disallow using a depth texture with a non-comparison sampler +## 19. Introduce new `maxStorageBuffersInFragmentStage` and `maxStorageTexturesInFragmentStage` limits. + +If the number of shader variables of type `storage_buffer` in a fragment shader exceeds the `maxStorageBuffersInFragmentStage` limit, a validation error will occur at pipeline creation time. + +If the number of shader variables of type `texture_storage_1d`, `texture_storage_2d`, `texture_storage_2d_array` and `texture_storage_3d` in a fragment shader exceeds the `maxStorageTexturesInFragmentStage` limit, a validation error will occur at pipeline creation time. + +In Compatibility mode, these new limits will have a default of zero. In Core mode, they will default to the maximum value of a GPUSize32. + +In addition to the new limits, the existing `maxStorageBuffersPerShaderStage` and `maxStorageTexturesPerShaderStage` limits continue to apply to all stages. E.g., the effective storage buffer limit in the fragment stage is `min(maxStorageBuffersPerShaderStage, maxStorageBuffersInFragmentStage)`. + +**Justification**: OpenGL ES 3.1 allows `MAX_FRAGMENT_SHADER_STORAGE_BLOCKS` and `MAX_FRAGMENT_IMAGE_UNIFORMS` to be zero, and there are a significant number of devices in the field with that value. + +## 20. Disallow using a depth texture with a non-comparison sampler Using a depth texture `texture_depth_2d`, `texture_depth_cube`, `texture_depth_2d_array` with a non-comparison sampler in a shader will generate a validation error at pipeline creation time. **Justification**: OpenGL ES 3.1 says such usage has undefined behavior. -## 20. Limit the number of texture+sampler combinations in a stage. +## 21. Limit the number of texture+sampler combinations in a stage. If the number of texture+sampler combinations used a in single stage in a pipeline exceeds `min(maxSampledTexturesPerShaderStage, maxSamplersPerShaderStage)` a validation error is generated. 
@@ -238,7 +250,7 @@ for each stage of the pipeline: **Justification**: In OpenGL ES 3.1 does not support more combinations. Sampler units and texture units are bound together. Texture unit X uses sampler unit X. -## 21. Introduce new `float16-renderable` and `float32-renderable` features. +## 22. Introduce new `float16-renderable` and `float32-renderable` features. When supported, `float16-renderable` allows the `RENDER_ATTACHMENT` usage on textures with format `"r16float"`, `"rg16float"`, and `"rgba16float"`. From 73a75f549ded049ead52015e249a4f85aa2635bc Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fran=C3=A7ois=20Daoust?= <fd@tidoust.net> Date: Tue, 10 Dec 2024 20:42:20 +0100 Subject: [PATCH 270/285] Fix typos in refs to adapter capability guarantees section (#5015) --- spec/index.bs | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index dd47603f56..b9a06c26f8 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -1455,7 +1455,7 @@ A <dfn dfn>feature</dfn> is a set of optional WebGPU functionality that is not s on all implementations, typically due to hardware or system software constraints. All [=features=] are optional, but [=adapters=] make some guarantees about their availability -(see [[#adapter capability guarantees]]. +(see [[#adapter-capability-guarantees]]). A [=device=] supports the exact set of features determined at creation (see [[#optional-capabilities]]). API calls perform validation according to these features (not the [=adapter=]'s features): @@ -1493,7 +1493,7 @@ generally only request better limits if they may actually require them. Each limit has a <dfn dfn for=limit>default</dfn> value. [=Adapters=] are always guaranteed to support the defaults or [=limit/better=] -(see [[#adapter capability guarantees]]. +(see [[#adapter-capability-guarantees]]). A [=device=] supports the exact set of limits determined at creation (see [[#optional-capabilities]]). 
API calls perform validation according to these limits (not the [=adapter=]'s limits), From 327948dbf8b08be99bf77c519185ca3ddba1125b Mon Sep 17 00:00:00 2001 From: alan-baker <alanbaker@google.com> Date: Thu, 12 Dec 2024 11:32:06 -0500 Subject: [PATCH 271/285] Update CTS status (#5019) --- proposals/subgroups.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/proposals/subgroups.md b/proposals/subgroups.md index 9b846a9537..4006e2fda5 100644 --- a/proposals/subgroups.md +++ b/proposals/subgroups.md @@ -2,7 +2,7 @@ Status: **Draft** -Last modified: 2024-12-02 +Last modified: 2024-12-12 Issue: [#4306](https://github.com/gpuweb/gpuweb/issues/4306) @@ -264,7 +264,7 @@ D3D12 would have to be proven empricially. # Appendix C: CTS Status -Last updated: 2024-12-02 +Last updated: 2024-12-12 | Built-in value | Validation | Compute | Fragment | | --- | --- | --- | --- | @@ -273,7 +273,7 @@ Last updated: 2024-12-02 | Built-in function | Validation | Compute | Fragment | | --- | --- | --- | --- | -| `subgroupElect` | &check; | &cross; | &cross; | +| `subgroupElect` | &check; | &check; | &check; | | `subgroupAll` | &check; | &check; | &check; | | `subgroupAny` | &check; | &check; | &check; | | `subgroupBroadcast` | &check; | &check; | &check; | @@ -292,8 +292,8 @@ Last updated: 2024-12-02 | `subgroupAnd` | &check; | &check; | &check; | | `subgroupOr` | &check; | &check; | &check; | | `subgroupXor` | &check; | &check; | &check; | -| `subgroupMin` | &check; | &cross; | &cross; | -| `subgroupMax` | &check; | &cross; | &cross; | +| `subgroupMin` | &check; | &check; | &check; | +| `subgroupMax` | &check; | &check; | &check; | | `quadBroadcast` | &check; | &check; | &check; | | `quadSwapX` | &check; | &check; | &check; | | `quadSwapY` | &check; | &check; | &check; | From a87657b2115925db17c43486d1fb37856b56c2cc Mon Sep 17 00:00:00 2001 From: Mehmet Oguz Derin <mehmetoguzderin@mehmetoguzderin.com> Date: Sat, 14 Dec 2024 15:33:52 +0900 Subject: [PATCH 
272/285] Add attributes and settings for consistent line endings (#5021) * Add attributes * Normalize line endings * Trim trailing blankspaces * Make endings consistent --- .gitattributes | 1 + .github/workflows/build-validate-publish.yml | 4 +- .github/workflows/preview-pull-request.yml | 2 +- .vscode/settings.json | 3 + correspondence/img/.gitignore | 1 + design/AdapterIdentifiers.md | 444 +++++++++---------- design/RejectedErrorHandling.md | 2 +- explainer/index.bs | 2 - proposals/compatibility-mode.md | 4 +- tools/copy-if-different.sh | 1 - tools/wgsl-meeting-helper | 1 - wgsl/Makefile | 1 - wgsl/diagrams/uniformity_1.mmd | 1 - wgsl/media-type-registration.txt | 6 +- wgsl/tools/TSPath.py | 4 +- wgsl/tools/analyze/Grammar.py | 1 - wgsl/tools/analyze/ObjectRegistry.py | 2 +- wgsl/tools/analyze/test.py | 10 +- 18 files changed, 243 insertions(+), 247 deletions(-) create mode 100644 .gitattributes diff --git a/.gitattributes b/.gitattributes new file mode 100644 index 0000000000..6313b56c57 --- /dev/null +++ b/.gitattributes @@ -0,0 +1 @@ +* text=auto eol=lf diff --git a/.github/workflows/build-validate-publish.yml b/.github/workflows/build-validate-publish.yml index be00c85116..88862bb420 100644 --- a/.github/workflows/build-validate-publish.yml +++ b/.github/workflows/build-validate-publish.yml @@ -10,13 +10,13 @@ name: build-validate-publish on: pull_request: paths-ignore: [ "tools/custom-action/Dockerfile" ] - + push: branches: [main] paths-ignore: - "tools/custom-action/Dockerfile" - "tools/custom-action/entrypoint.sh" - + # Allows admins to trigger the workflow manually from GitHub's UI. 
workflow_dispatch: diff --git a/.github/workflows/preview-pull-request.yml b/.github/workflows/preview-pull-request.yml index 335615f402..3d38e3a197 100644 --- a/.github/workflows/preview-pull-request.yml +++ b/.github/workflows/preview-pull-request.yml @@ -53,7 +53,7 @@ jobs: uses: ./tools/custom-action/ with: check-repo-clean: 'OFF' - + # Adjusts Bikeshed specs - name: Adjust Bikeshed if: ${{ github.event.workflow_run.event == 'pull_request' && env.PR }} diff --git a/.vscode/settings.json b/.vscode/settings.json index a743ef65b5..4757593b6f 100644 --- a/.vscode/settings.json +++ b/.vscode/settings.json @@ -9,6 +9,9 @@ "files.associations": { "*.bs.include": "bikeshed", }, + "files.eol": "\n", + "files.trimTrailingWhitespace": true, + "files.insertFinalNewline": true, "[bikeshed]": { "editor.detectIndentation": false, "editor.indentSize": 4, diff --git a/correspondence/img/.gitignore b/correspondence/img/.gitignore index e69de29bb2..8b13789179 100644 --- a/correspondence/img/.gitignore +++ b/correspondence/img/.gitignore @@ -0,0 +1 @@ + diff --git a/design/AdapterIdentifiers.md b/design/AdapterIdentifiers.md index 6a3360100b..dd913c6711 100644 --- a/design/AdapterIdentifiers.md +++ b/design/AdapterIdentifiers.md @@ -1,222 +1,222 @@ -# WebGPU Adapter Identifiers - -**This document is outdated. `adapter.requestAdapterInfo()` has been replaced with -`adapter.info` and `unmaskHints` doesn't exist anymore. See: -[#4536](https://github.com/gpuweb/gpuweb/issues/4536), -[#4316](https://github.com/gpuweb/gpuweb/pull/4316). - -## Introduction - -The WebGL extension [WEBGL_debug_renderer_info](https://www.khronos.org/registry/webgl/extensions/WEBGL_debug_renderer_info/) reports identifying information about a device's graphics driver for the purposes of debugging or detection and avoidance of bugs or performance pitfalls on a particular driver or piece of hardware. 
- -These identifiers have proven to be a valuable tool for developers over the years (See [Appendix B: Motivating real-world use cases](#Appendix-B-Motivating-real-world-use-cases)), but have also been observed to be frequently used as a source of high-entropy fingerprinting data. Additionally, the format that WebGL returns the identifiers in (a string of undefined structure) is difficult to work with, akin to the user agent string. - -For WebGPU we need mechanisms which report similar data about the GPU hardware (called an "adapter" in WebGPU) to enable legitimate development use cases, such as driver bug workarounds, while minimizing the amount of fingerprintable data that is exposed without user consent. This document will refer to that data as "adapter identifiers". - -## Use Cases: - -### Bug workarounds -A WebGPU developer wants to ensure that their content works on all devices, but is aware of a bug on a specific family of GPUs that causes corrupted rendering. Using a minimal subset of adapter identifiers they can identify when a user's GPUs is part of a group which includes the known-buggy hardware and switch to a slower code path that doesn't provoke the issue. - -### Filing issue reports -A WebGPU developer has included a "Report an issue" button on their page. Normally they have found that they need very little adapter information to operate, but when users experience a problem they want to gather as much data as they can about the problem. On the report filing page they include UI to allow the user to include their GPU information in the report, which when checked causes the browser to confirm that they want to let the page know their full adapter details. - -### Performance optimization -A WebGPU developer wants all users to experience good performance on their page, but has developed some effects that are not practical on mobile GPUs. 
They check the adapter identifiers on page load to get a broad idea of what family of GPU the user has to start them off with a reasonable set of defaults. On a settings page, however, they can include a button which detects the best settings for their device. Clicking it may prompt the user for consent to see more detailed GPU information so that ideal settings for their device can be selected. - -### WebGPU developer community assets -A common and useful asset for developers is sites such as https://gpuinfo.org/, which visualize your current devices capabilities in an easy to read format and (with user's consent) can collect information about GPU capabilities to report in aggregate to other developers, giving them a sense of how widespread various capabilities are. Offering a way for users to opt-in to contributing to such a database is desirable. - -## Goals: - - Offer a mechanism to report GPU adapter identifiers in a scalable way. - - Allow for reporting no information (tracking prevention modes, privacy-oriented UAs). - - Enable UAs to decide for themselves how much information to expose by default. - - Allow developers to have some input on how much information they need, especially with respect to triggering user prompts. - - Any such feature needs to be invokable late in the device lifetime, to allow for cases like filing bug reports. - - Developers need to know when a call may cause a user prompt to be shown so that they can avoid that path if desired. - - Offer control over how much data is exposed to embedded content/iframes. - - Minimize string parsing for accuracy and developer convenience. - -## API usage - -> Full details of how to use WebGPU will not be covered here. Please refer to the [WebGPU explainer](https://gpuweb.github.io/gpuweb/explainer/) or [WebGPU spec](https://gpuweb.github.io/gpuweb) for further information. 
- -### Masked adapter identifiers - -The first step when using WebGPU is to query a `GPUAdapter`, of which there may be several available to the system. This will typically correspond to a physical or software emulated GPU. - -```js -const gpuAdapter = await navigator.gpu.requestAdapter(); -``` - -WebGPU applications often require a significant amount of resource initialization at startup, and it's possible that the resources being initialized may need to be altered depending on the adapter in use. For example: Shader sources may need to be re-written to avoid a known bug or lower detail meshes and textures may need to be fetched to avoid overtaxing a slower device. In these cases some amount of adapter identifiers need to be queried very early in the application's lifetime, and preferably without invoking a user consent prompt. (Nobody likes to be asked for permission immediately on navigation, at which point they likely have little to know context for why the permission is needed.) - -In this case, the developer would call the `requestAdapterInfo()` method of the `GPUAdapter`, which returns a `GPUAdapterInfo` interface containing several potential identifiers for the adapter, and may contain values similar to the following: - -```js -const adapterInfo = await gpuAdapter.requestAdapterInfo(); -console.log(adapterInfo); - -// Output: -{ - vendor: 'nvidia', - architecture: 'turing', - device: '', - description: '' -} -``` - -Note that some values of the interface are the empty string, because the UA deemed that they were too high-entropy to return without explicit user consent. If the UA wished, it would have the ability to return empty string for all values. 
This would be most commonly expected in "enhanced privacy" modes like [Edge's strict tracking prevention](https://support.microsoft.com/en-us/microsoft-edge/learn-about-tracking-prevention-in-microsoft-edge-5ac125e8-9b90-8d59-fa2c-7f2e9a44d869) or [Firefox's Enhanced Tracking Protection](https://support.mozilla.org/en-US/kb/enhanced-tracking-protection-firefox-desktop). Ideally returning little to no identifiers is common enough that user agents that wish to expose very little information by default can do so without severe compatibility concerns. - -The information that _is_ returned should be helpful in identifying broad buckets of adapters with similar capabilities and performance characteristics. For example, Nvidia's "Turing" architecture [covers a range of nearly 40 different GPUs](https://en.wikipedia.org/wiki/Turing_(microarchitecture)#Products_using_Turing) across a wide range of prices and form factors. Identifing the adapter as an Turing device is enough to allow developers to activate broad workarounds aimed at that family of hardware and make some assumptions about baseline performance, but is also broad enough to not give away too much identifiable information about the user. - -Additionally, in some cases the UA may find it beneficial to return a value that is not the most accurate one that could be reported but still gives developers a reasonable reference point with a lower amount of entropy. - -Finally, it may not always be possible or practical to detemine a value for some fields (like a GPU's architecture) and in those cases returning empty string is acceptible even if the user agent would have considered the information low-entropy. - -### Unmasked adapter identifiers - -At some point during the lifetime of the application the developer may determine that they need more information about the user's specific adapter. A common scenario would be filing a bug report. 
The developer will be able to best respond to the user's issue if they know exactly what device is being used. In this case, they can request an "unmasked" version any fields of the `GPUAdapterInfo`: - -```js -feedbackButton.addEventListener('click', async ()=> { - const unmaskHints = ['architecture', 'device', 'description']; - const unmaskedAdapterInfo = await gpuAdapter.requestAdapterInfo(unmaskHints); - generateUserFeedback(unmaskedAdapterInfo); -}); -``` - -The resolved value is the adapter's `GPUAdapterInfo` with any fields specified by `unmaskHints` that were previously omitted or reported with a less accurate value now populated with the most accurate information the UA will deliver. For example: - -```js -console.log(unmaskedAdapterInfo); - -// Output: -{ - vendor: 'nvidia', - architecture: 'turing', - device: '0x8644', - description: 'NVIDIA GeForce GTX 1660 SUPER' -} -``` - -Because the unmasked values may contain higher entropy identifying information, the bar for querying it is quite a bit higher. Calling `requestAdapterInfo()` with any `unmaskHints` requires user activation, and will reject the promise otherwise. If the `unmaskHints` array contains any previously masked value it also requires that user consent be given before returning, and as such may display a prompt to the user asking if the page can access the newly requested GPU details before allowing the promise to resolve. If the user declines to give consent then the promise is rejected. - -Once the user has given their consent any future calls to `requestAdapterInfo()` should return the unmasked fields even if no `unmaskHints` are specified, and future instances of the same underlying adapter returned from `navigator.gpu.requestAdapter()` on that page load should also return unmasked data without requiring hints to be passed. 
- -Even after `unmaskHints` have been passed to `requestAdapterInfo()` the UA is still allowed to return empty string for attributes requested in the `unmaskHints` array if the UA cannot determine the value in question or decides not to reveal it. (UAs should not request user consent when unmasking is requested for attributes that will be left empty.) - -### Identifier formatting - -To minimize developer work and reduce the chances of fingerprinting via casing differences between platforms, and string values reported as part of the `GPUAdapterInfo` conform to strict formatting rules. They must be lowercase ASCII strings containing no spaces, with separate words concatenated with a hyphen ("-") character. - -The exception to this is `description`, which may be a string reported directly from the driver without modification. As a result, however, `description` should always be omitted from masked adapters. Additionally, enough information should be offered via other fields that developers don't feel the need to attempt parsing the `description` string. - -User agents should also make an effort to normalize the strings returned, ideally through a public registery. This especially applies to fields like `vendor` which are presumed to have a relatively low number of possible values. - -Some values, such as `architecture`, are unlikely to be directly provided by the driver. As such, User Agents are expected to make a best-effort at identifying and reporting common architectures, and report empty string otherwise. - -### Iframe controls - -In addition to using the above mechanisms to hit a balance between offering developers useful information and mitigating fingerprinting concerns, [Permissions Policy](https://w3c.github.io/webappsec-permissions-policy/) should be used to control whether or not WebGPU features are exposed to iframes. 
- -The recommended feature identifier is `"webgpu"`, and the [default allowlist](https://w3c.github.io/webappsec-permissions-policy/#default-allowlist) for this feature would be `["self"]`. This allows documents from the top level browsing context use the feature by default, but requires documents included in iframes to be explicitly granted permission from the top level context in order to use WebGPU, like so: - -```html -<iframe src="https://example.com/embed" allow="webgpu"></iframe> -``` - -If the `"webgpu"` feature is not granted to a page, all calls that page makes to `navigator.gpu.requestAdapter()` will resolve to `null`. - -This helps strike a balance between enabling powerful rendering and computation capabilities on the web and a desire to mitigate abuse by bad actors. - -## Proposed IDL - -```webidl -partial interface GPUAdapter { - Promise<GPUAdapterInfo> requestAdapterInfo(optional sequence<DOMString> unmaskHints = []); -}; - -interface GPUAdapterInfo { - DOMString vendor; - DOMString architecture; - DOMString device; - DOMString description; -}; -``` - -## Appendix A: Alternatives considered - -### A single identifier string -Previously the WebGPU spec had a single string identifier, `GPUAdapter.name`, which would have reported a string very similar to the values reported by `WEBGL_debug_renderer_info`. [Concerns were raised about this approach](https://github.com/gpuweb/gpuweb/issues/2191), and the group generally agreed that we wanted something with finer grained control over the values reported and that was less problematic to parse for developers. - -### Force reliance on feature detection -It was suggested that, similar to other web platform features, no identifiers should be exposed at all and instead developers should rely on feature tests to determine if they need to take a different code path. Unfortunately this is impractical for GPU APIs such as WebGPU or WebGL. 
There have been multiple documented bugs in the past that are not trivially detectable, such as bugs which are only provoked under high memory usage situations or which only occur intermittently over long time periods. In addition, reading back information from the GPU in order to detect certain classes of issues is not trivial, and in some cases may actually change the driver's behavior. - -This means that realtime bug detection can be extremely constly, and may incur performance penalties or add significantly to startup time. As such it is not desirable or practical to ask developers to try and provoke any known driver issues on application startup. - -### Rely on the UA, etc. to fix bugs -It was also suggested that developers should generally not be the ones shouldering the burden of detecting and working around driver or hardware issues, and instead that responsibility should lie with the hardware manufacturer, OS, or User Agent. In general we agree with this sentiment! User agents, in particular, have a history of implementing workarounds for issues observed on a specific OS, GPU, or driver, as well as working with the appropriate parties to ensure that the problems are fixed upstream. (For example, you can see the [list of bugs that Chromium works around currently here](https://source.chromium.org/chromium/chromium/src/+/main:gpu/config/gpu_driver_bug_list.json). All modern browsers have some variation of this type of workaround list.) This is work we expect to continue in perpetuity. - -However, we have also observed that developers cannot rely on platform owners alone to resolve issues. For one, no matter how quickly a user agent or hardware manufacturer responds to bug reports there will always be some period of development, testing, and deployment before developers can rely on the fix, and even then they will likely have to contend with users on older software versions for a long time. 
This effect is exaggerated when considering that in some cases user agents only release new updates on a yearly cadence. - -In some other cases, the issue may not be one of correctness, but of performance. If a certain technique is performed by the GPU in a conformant manner but performs poorly compared to other devices it is generally not the User Agent's place to intervene. An individual developer, however, can make quality vs. performance tradeoffs that are appropriate for their application as long as they are given sufficent information to know when the tradeoff in necessary. - -### Inference from other signals -There are some other properties, such as a `GPUAdapter`'s limits and available features, that could be used in some cases to infer what kind of device a developer is using. Additionally, developers could use other platform signals (user agent string, screen resolution, etc) to infer that they are on a known device which has a certain class of GPU. (For example, a specific generation iPhone.) The concern with this approach is that it encourages developers to collect _more_ identifiable user information for a less reliable result. - -In practical terms it's likely that not providing adapter identifiers via WebGPU will simply encourage developers to initialize and tear down a WebGL context prior to initializing WebGPU simply to get the `WEBGL_debug_renderer_info` strings, which may return info from the incorrect adapter and is not a pattern we want to encourage. - -## Appendix B: Motivating real-world use cases - -These are some known use cases for GPU identifiers that we have heard of in the past. These refer to WebGL applications specifically, but we have every reason to expect that they will be applicable to WebGPU as well. - -### Developer feedback on WEBGL\_debug\_renderer\_info: -Ken Russell (@kenrussell) collected quotes from various WebGL developers and reported them to the WebGL Working Group in 2019. 
- -The following are some quoted reasons why various pages use `WEBGL_debug_renderer_info`: - -**Unity** - - Using exact GPU info+device+OS+browser to ... identify weak fillrate systems for whether to use "SD" or "HD" rendering - -**Uber** - - Use this feature to activate nVidia/Intel specific GLSL workarounds. - - Print the driver in the console when we create contexts, so that when remote operators (e.g. in Asia/Australia) report problems we can ... unblock them with minimal effort. - -**[Sketchfab](https://sketchfab.com)** - - Report user GPU in our automatic error reporting tools. When we need to reproduce shader bugs it's invaluable. - - Warn users when they are switched to software webgl acceleration. "Otherwise users might think the Sketchfab render is very slow, using their laptop batteries, and pushing laptop fan to the max where just restarting/reloading chrome fixes it." - -**[Scirra](https://www.construct.net/en)** - - identifying GPUs affected by driver bugs, and working around it - - analytics on the unmasked renderer to identify the impact of such bugs and help us decide how to respond - - identifying which GPU is really in use on dual-GPU systems - - displaying it to the user as a diagnostic (also for them to identify which GPU is in use)." - -**[Figma](https://www.figma.com/)** - - Rely on this feature to be able to track down and detect obscure GPU issues with users that have old unreliable hardware. - - "Without this information, we would have been unable to debug and fix these WebGL implementation bugs that we've been encountering." - - Use this information to enable workarounds for WebGL implementation bugs. "The workarounds are not enabled by default because they are slower, and in some cases actually even incorrect (but less incorrect than when the bug is triggered)." 
- -**[noclip.website](https://noclip.website/)** - - detect and work around known bugs in drivers - - provide better error messages to users - - "The immediate impact if this extension was removed would be that all Apple devices would fail to render." (Due to a driver bug at the time.) - -### Tweets replying to [Dean Jackson's](https://twitter.com/grorgwork/status/1062395616867700736) inquiry about removing WEBGL\_debug\_renderer\_info: - - - Google maps, [to identify poorly performing devices.](https://twitter.com/gfxprogrammer/status/1062422760662528000?s=20) - - [Active Theory](https://activetheory.net/), [to scale visual quality](https://twitter.com/michaeltheory/status/1062402110396874752?s=20) - - [2DKit](http://2dkit.com/), [to estimate available memory and scale quality](https://twitter.com/b_garcia/status/1062413508212600832?s=20) - - [Matterport](https://matterport.com/), [to identify when to serve higher resolution textures](https://twitter.com/haeric/status/1134155677411110913?s=20) - -## Appendix C: API Prior Art - -### Native equivalents: -The following structures are what expose similar information in the various native libraries, though they obviously don't have the same privacy considerations. Included here as reference. - - [VkPhysicalDeviceProperties](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkPhysicalDeviceProperties.html) - - [DXGI_ADAPTER_DESC](https://docs.microsoft.com/en-us/windows/win32/api/dxgi/ns-dxgi-dxgi_adapter_desc) - - [MTLDevice](https://developer.apple.com/documentation/metal/mtldevice) - -### Prior art on the Web Platform: -[User-Agent client hints](https://web.dev/user-agent-client-hints/), and especially [NavigatorUAData.getHighEntropyValues()](https://developer.mozilla.org/en-US/docs/Web/API/NavigatorUAData/getHighEntropyValues), have been introduced previously as a more privacy preserving and developer friendly alternative to UA string parsing. 
+# WebGPU Adapter Identifiers + +**This document is outdated. `adapter.requestAdapterInfo()` has been replaced with +`adapter.info` and `unmaskHints` doesn't exist anymore. See: +[#4536](https://github.com/gpuweb/gpuweb/issues/4536), +[#4316](https://github.com/gpuweb/gpuweb/pull/4316). + +## Introduction + +The WebGL extension [WEBGL_debug_renderer_info](https://www.khronos.org/registry/webgl/extensions/WEBGL_debug_renderer_info/) reports identifying information about a device's graphics driver for the purposes of debugging or detection and avoidance of bugs or performance pitfalls on a particular driver or piece of hardware. + +These identifiers have proven to be a valuable tool for developers over the years (See [Appendix B: Motivating real-world use cases](#Appendix-B-Motivating-real-world-use-cases)), but have also been observed to be frequently used as a source of high-entropy fingerprinting data. Additionally, the format that WebGL returns the identifiers in (a string of undefined structure) is difficult to work with, akin to the user agent string. + +For WebGPU we need mechanisms which report similar data about the GPU hardware (called an "adapter" in WebGPU) to enable legitimate development use cases, such as driver bug workarounds, while minimizing the amount of fingerprintable data that is exposed without user consent. This document will refer to that data as "adapter identifiers". + +## Use Cases: + +### Bug workarounds +A WebGPU developer wants to ensure that their content works on all devices, but is aware of a bug on a specific family of GPUs that causes corrupted rendering. Using a minimal subset of adapter identifiers they can identify when a user's GPUs is part of a group which includes the known-buggy hardware and switch to a slower code path that doesn't provoke the issue. + +### Filing issue reports +A WebGPU developer has included a "Report an issue" button on their page. 
Normally they have found that they need very little adapter information to operate, but when users experience a problem they want to gather as much data as they can about the problem. On the report filing page they include UI to allow the user to include their GPU information in the report, which when checked causes the browser to confirm that they want to let the page know their full adapter details. + +### Performance optimization +A WebGPU developer wants all users to experience good performance on their page, but has developed some effects that are not practical on mobile GPUs. They check the adapter identifiers on page load to get a broad idea of what family of GPU the user has to start them off with a reasonable set of defaults. On a settings page, however, they can include a button which detects the best settings for their device. Clicking it may prompt the user for consent to see more detailed GPU information so that ideal settings for their device can be selected. + +### WebGPU developer community assets +A common and useful asset for developers is sites such as https://gpuinfo.org/, which visualize your current device's capabilities in an easy-to-read format and (with the user's consent) can collect information about GPU capabilities to report in aggregate to other developers, giving them a sense of how widespread various capabilities are. Offering a way for users to opt in to contributing to such a database is desirable. + +## Goals: + - Offer a mechanism to report GPU adapter identifiers in a scalable way. + - Allow for reporting no information (tracking prevention modes, privacy-oriented UAs). + - Enable UAs to decide for themselves how much information to expose by default. + - Allow developers to have some input on how much information they need, especially with respect to triggering user prompts. + - Any such feature needs to be invokable late in the device lifetime, to allow for cases like filing bug reports. 
+ - Developers need to know when a call may cause a user prompt to be shown so that they can avoid that path if desired. + - Offer control over how much data is exposed to embedded content/iframes. + - Minimize string parsing for accuracy and developer convenience. + +## API usage + +> Full details of how to use WebGPU will not be covered here. Please refer to the [WebGPU explainer](https://gpuweb.github.io/gpuweb/explainer/) or [WebGPU spec](https://gpuweb.github.io/gpuweb) for further information. + +### Masked adapter identifiers + +The first step when using WebGPU is to query a `GPUAdapter`, of which there may be several available to the system. This will typically correspond to a physical or software emulated GPU. + +```js +const gpuAdapter = await navigator.gpu.requestAdapter(); +``` + +WebGPU applications often require a significant amount of resource initialization at startup, and it's possible that the resources being initialized may need to be altered depending on the adapter in use. For example: Shader sources may need to be re-written to avoid a known bug or lower detail meshes and textures may need to be fetched to avoid overtaxing a slower device. In these cases some amount of adapter identifiers need to be queried very early in the application's lifetime, and preferably without invoking a user consent prompt. (Nobody likes to be asked for permission immediately on navigation, at which point they likely have little to no context for why the permission is needed.) 
+ +In this case, the developer would call the `requestAdapterInfo()` method of the `GPUAdapter`, which returns a `GPUAdapterInfo` interface containing several potential identifiers for the adapter, and may contain values similar to the following: + +```js +const adapterInfo = await gpuAdapter.requestAdapterInfo(); +console.log(adapterInfo); + +// Output: +{ + vendor: 'nvidia', + architecture: 'turing', + device: '', + description: '' +} +``` + +Note that some values of the interface are the empty string, because the UA deemed that they were too high-entropy to return without explicit user consent. If the UA wished, it would have the ability to return empty string for all values. This would be most commonly expected in "enhanced privacy" modes like [Edge's strict tracking prevention](https://support.microsoft.com/en-us/microsoft-edge/learn-about-tracking-prevention-in-microsoft-edge-5ac125e8-9b90-8d59-fa2c-7f2e9a44d869) or [Firefox's Enhanced Tracking Protection](https://support.mozilla.org/en-US/kb/enhanced-tracking-protection-firefox-desktop). Ideally returning little to no identifiers is common enough that user agents that wish to expose very little information by default can do so without severe compatibility concerns. + +The information that _is_ returned should be helpful in identifying broad buckets of adapters with similar capabilities and performance characteristics. For example, Nvidia's "Turing" architecture [covers a range of nearly 40 different GPUs](https://en.wikipedia.org/wiki/Turing_(microarchitecture)#Products_using_Turing) across a wide range of prices and form factors. Identifying the adapter as a Turing device is enough to allow developers to activate broad workarounds aimed at that family of hardware and make some assumptions about baseline performance, but is also broad enough to not give away too much identifiable information about the user. 
+ +Additionally, in some cases the UA may find it beneficial to return a value that is not the most accurate one that could be reported but still gives developers a reasonable reference point with a lower amount of entropy. + +Finally, it may not always be possible or practical to determine a value for some fields (like a GPU's architecture) and in those cases returning empty string is acceptable even if the user agent would have considered the information low-entropy. + +### Unmasked adapter identifiers + +At some point during the lifetime of the application the developer may determine that they need more information about the user's specific adapter. A common scenario would be filing a bug report. The developer will be able to best respond to the user's issue if they know exactly what device is being used. In this case, they can request an "unmasked" version of any fields of the `GPUAdapterInfo`: + +```js +feedbackButton.addEventListener('click', async ()=> { + const unmaskHints = ['architecture', 'device', 'description']; + const unmaskedAdapterInfo = await gpuAdapter.requestAdapterInfo(unmaskHints); + generateUserFeedback(unmaskedAdapterInfo); +}); +``` + +The resolved value is the adapter's `GPUAdapterInfo` with any fields specified by `unmaskHints` that were previously omitted or reported with a less accurate value now populated with the most accurate information the UA will deliver. For example: + +```js +console.log(unmaskedAdapterInfo); + +// Output: +{ + vendor: 'nvidia', + architecture: 'turing', + device: '0x8644', + description: 'NVIDIA GeForce GTX 1660 SUPER' +} +``` + +Because the unmasked values may contain higher entropy identifying information, the bar for querying it is quite a bit higher. Calling `requestAdapterInfo()` with any `unmaskHints` requires user activation, and will reject the promise otherwise. 
If the `unmaskHints` array contains any previously masked value it also requires that user consent be given before returning, and as such may display a prompt to the user asking if the page can access the newly requested GPU details before allowing the promise to resolve. If the user declines to give consent then the promise is rejected. + +Once the user has given their consent any future calls to `requestAdapterInfo()` should return the unmasked fields even if no `unmaskHints` are specified, and future instances of the same underlying adapter returned from `navigator.gpu.requestAdapter()` on that page load should also return unmasked data without requiring hints to be passed. + +Even after `unmaskHints` have been passed to `requestAdapterInfo()` the UA is still allowed to return empty string for attributes requested in the `unmaskHints` array if the UA cannot determine the value in question or decides not to reveal it. (UAs should not request user consent when unmasking is requested for attributes that will be left empty.) + +### Identifier formatting + +To minimize developer work and reduce the chances of fingerprinting via casing differences between platforms, any string values reported as part of the `GPUAdapterInfo` conform to strict formatting rules. They must be lowercase ASCII strings containing no spaces, with separate words concatenated with a hyphen ("-") character. + +The exception to this is `description`, which may be a string reported directly from the driver without modification. As a result, however, `description` should always be omitted from masked adapters. Additionally, enough information should be offered via other fields that developers don't feel the need to attempt parsing the `description` string. + +User agents should also make an effort to normalize the strings returned, ideally through a public registry. This especially applies to fields like `vendor` which are presumed to have a relatively low number of possible values. 
+ +Some values, such as `architecture`, are unlikely to be directly provided by the driver. As such, User Agents are expected to make a best effort at identifying and reporting common architectures, and report empty string otherwise. + +### Iframe controls + +In addition to using the above mechanisms to strike a balance between offering developers useful information and mitigating fingerprinting concerns, [Permissions Policy](https://w3c.github.io/webappsec-permissions-policy/) should be used to control whether or not WebGPU features are exposed to iframes. + +The recommended feature identifier is `"webgpu"`, and the [default allowlist](https://w3c.github.io/webappsec-permissions-policy/#default-allowlist) for this feature would be `["self"]`. This allows documents from the top level browsing context to use the feature by default, but requires documents included in iframes to be explicitly granted permission from the top level context in order to use WebGPU, like so: + +```html +<iframe src="https://example.com/embed" allow="webgpu"></iframe> +``` + +If the `"webgpu"` feature is not granted to a page, all calls that page makes to `navigator.gpu.requestAdapter()` will resolve to `null`. + +This helps strike a balance between enabling powerful rendering and computation capabilities on the web and a desire to mitigate abuse by bad actors. + +## Proposed IDL + +```webidl +partial interface GPUAdapter { + Promise<GPUAdapterInfo> requestAdapterInfo(optional sequence<DOMString> unmaskHints = []); +}; + +interface GPUAdapterInfo { + DOMString vendor; + DOMString architecture; + DOMString device; + DOMString description; +}; +``` + +## Appendix A: Alternatives considered + +### A single identifier string +Previously the WebGPU spec had a single string identifier, `GPUAdapter.name`, which would have reported a string very similar to the values reported by `WEBGL_debug_renderer_info`. 
[Concerns were raised about this approach](https://github.com/gpuweb/gpuweb/issues/2191), and the group generally agreed that we wanted something with finer grained control over the values reported and that was less problematic to parse for developers. + +### Force reliance on feature detection +It was suggested that, similar to other web platform features, no identifiers should be exposed at all and instead developers should rely on feature tests to determine if they need to take a different code path. Unfortunately this is impractical for GPU APIs such as WebGPU or WebGL. There have been multiple documented bugs in the past that are not trivially detectable, such as bugs which are only provoked under high memory usage situations or which only occur intermittently over long time periods. In addition, reading back information from the GPU in order to detect certain classes of issues is not trivial, and in some cases may actually change the driver's behavior. + +This means that realtime bug detection can be extremely costly, and may incur performance penalties or add significantly to startup time. As such it is not desirable or practical to ask developers to try and provoke any known driver issues on application startup. + +### Rely on the UA, etc. to fix bugs +It was also suggested that developers should generally not be the ones shouldering the burden of detecting and working around driver or hardware issues, and instead that responsibility should lie with the hardware manufacturer, OS, or User Agent. In general we agree with this sentiment! User agents, in particular, have a history of implementing workarounds for issues observed on a specific OS, GPU, or driver, as well as working with the appropriate parties to ensure that the problems are fixed upstream. (For example, you can see the [list of bugs that Chromium works around currently here](https://source.chromium.org/chromium/chromium/src/+/main:gpu/config/gpu_driver_bug_list.json). 
All modern browsers have some variation of this type of workaround list.) This is work we expect to continue in perpetuity. + +However, we have also observed that developers cannot rely on platform owners alone to resolve issues. For one, no matter how quickly a user agent or hardware manufacturer responds to bug reports there will always be some period of development, testing, and deployment before developers can rely on the fix, and even then they will likely have to contend with users on older software versions for a long time. This effect is exaggerated when considering that in some cases user agents only release new updates on a yearly cadence. + +In some other cases, the issue may not be one of correctness, but of performance. If a certain technique is performed by the GPU in a conformant manner but performs poorly compared to other devices it is generally not the User Agent's place to intervene. An individual developer, however, can make quality vs. performance tradeoffs that are appropriate for their application as long as they are given sufficient information to know when the tradeoff is necessary. + +### Inference from other signals +There are some other properties, such as a `GPUAdapter`'s limits and available features, that could be used in some cases to infer what kind of device a developer is using. Additionally, developers could use other platform signals (user agent string, screen resolution, etc.) to infer that they are on a known device which has a certain class of GPU. (For example, a specific generation iPhone.) The concern with this approach is that it encourages developers to collect _more_ identifiable user information for a less reliable result. 
+ +In practical terms it's likely that not providing adapter identifiers via WebGPU will simply encourage developers to initialize and tear down a WebGL context prior to initializing WebGPU simply to get the `WEBGL_debug_renderer_info` strings, which may return info from the incorrect adapter and is not a pattern we want to encourage. + +## Appendix B: Motivating real-world use cases + +These are some known use cases for GPU identifiers that we have heard of in the past. These refer to WebGL applications specifically, but we have every reason to expect that they will be applicable to WebGPU as well. + +### Developer feedback on WEBGL\_debug\_renderer\_info: +Ken Russell (@kenrussell) collected quotes from various WebGL developers and reported them to the WebGL Working Group in 2019. + +The following are some quoted reasons why various pages use `WEBGL_debug_renderer_info`: + +**Unity** + - Using exact GPU info+device+OS+browser to ... identify weak fillrate systems for whether to use "SD" or "HD" rendering + +**Uber** + - Use this feature to activate nVidia/Intel specific GLSL workarounds. + - Print the driver in the console when we create contexts, so that when remote operators (e.g. in Asia/Australia) report problems we can ... unblock them with minimal effort. + +**[Sketchfab](https://sketchfab.com)** + - Report user GPU in our automatic error reporting tools. When we need to reproduce shader bugs it's invaluable. + - Warn users when they are switched to software webgl acceleration. "Otherwise users might think the Sketchfab render is very slow, using their laptop batteries, and pushing laptop fan to the max where just restarting/reloading chrome fixes it." 
+ +**[Scirra](https://www.construct.net/en)** + - identifying GPUs affected by driver bugs, and working around it + - analytics on the unmasked renderer to identify the impact of such bugs and help us decide how to respond + - identifying which GPU is really in use on dual-GPU systems + - displaying it to the user as a diagnostic (also for them to identify which GPU is in use)." + +**[Figma](https://www.figma.com/)** + - Rely on this feature to be able to track down and detect obscure GPU issues with users that have old unreliable hardware. + - "Without this information, we would have been unable to debug and fix these WebGL implementation bugs that we've been encountering." + - Use this information to enable workarounds for WebGL implementation bugs. "The workarounds are not enabled by default because they are slower, and in some cases actually even incorrect (but less incorrect than when the bug is triggered)." + +**[noclip.website](https://noclip.website/)** + - detect and work around known bugs in drivers + - provide better error messages to users + - "The immediate impact if this extension was removed would be that all Apple devices would fail to render." (Due to a driver bug at the time.) 
+ +### Tweets replying to [Dean Jackson's](https://twitter.com/grorgwork/status/1062395616867700736) inquiry about removing WEBGL\_debug\_renderer\_info: + + - Google maps, [to identify poorly performing devices.](https://twitter.com/gfxprogrammer/status/1062422760662528000?s=20) + - [Active Theory](https://activetheory.net/), [to scale visual quality](https://twitter.com/michaeltheory/status/1062402110396874752?s=20) + - [2DKit](http://2dkit.com/), [to estimate available memory and scale quality](https://twitter.com/b_garcia/status/1062413508212600832?s=20) + - [Matterport](https://matterport.com/), [to identify when to serve higher resolution textures](https://twitter.com/haeric/status/1134155677411110913?s=20) + +## Appendix C: API Prior Art + +### Native equivalents: +The following structures are what expose similar information in the various native libraries, though they obviously don't have the same privacy considerations. Included here as reference. + - [VkPhysicalDeviceProperties](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkPhysicalDeviceProperties.html) + - [DXGI_ADAPTER_DESC](https://docs.microsoft.com/en-us/windows/win32/api/dxgi/ns-dxgi-dxgi_adapter_desc) + - [MTLDevice](https://developer.apple.com/documentation/metal/mtldevice) + +### Prior art on the Web Platform: +[User-Agent client hints](https://web.dev/user-agent-client-hints/), and especially [NavigatorUAData.getHighEntropyValues()](https://developer.mozilla.org/en-US/docs/Web/API/NavigatorUAData/getHighEntropyValues), have been introduced previously as a more privacy preserving and developer friendly alternative to UA string parsing. diff --git a/design/RejectedErrorHandling.md b/design/RejectedErrorHandling.md index a0e5b3698f..230d9fca1f 100644 --- a/design/RejectedErrorHandling.md +++ b/design/RejectedErrorHandling.md @@ -30,7 +30,7 @@ interface GPUDeviceLostEvent : Event { }; ``` -If `GPUAdapter`'s `isReady` attribute is false, `createDevice` will fail. 
+If `GPUAdapter`'s `isReady` attribute is false, `createDevice` will fail. `isReady` may be set to `false` when a `"gpu-device-lost"` event fires. It will always be set to `true` when a `"gpu-adapter-ready"` event fires. diff --git a/explainer/index.bs b/explainer/index.bs index 3428709c18..82c7b84de6 100644 --- a/explainer/index.bs +++ b/explainer/index.bs @@ -1267,5 +1267,3 @@ However investigation in WebGL show that GPU timings can be used to leak from su # WebGPU Shading Language # {#wgsl} - - diff --git a/proposals/compatibility-mode.md b/proposals/compatibility-mode.md index e71d17fe31..a16c0ece3c 100644 --- a/proposals/compatibility-mode.md +++ b/proposals/compatibility-mode.md @@ -45,7 +45,7 @@ See "Texture view dimension may be specified", below. ## Compatibility mode restrictions -### 1. Texture view dimension may be specified +### 1. Texture view dimension may be specified When specifying a texture, a `textureBindingViewDimension` property determines the views which can be bound from that texture for sampling (see "Proposed IDL changes", above). Binding a view of a different dimension for sampling than specified at texture creation time will cause a validation error. If `textureBindingViewDimension` is unspecified, use [the same algorithm as `createView()`](https://gpuweb.github.io/gpuweb/#abstract-opdef-resolving-gputextureviewdescriptor-defaults): ``` @@ -193,7 +193,7 @@ Note: this does not affect textures made with depth formats bound to `texture_2d If code is passed to `createShaderModule` that uses `@interpolation(flat)` or `@interpolation(flat, first)` generate a validation error. -**Justification**: OpenGL ES 3.1 only supports the last vertex as the provoking vertex where as +**Justification**: OpenGL ES 3.1 only supports the last vertex as the provoking vertex where as other APIs only support the first vertex so only `@interpolation(flat, either)` is supported in compatibility mode. 
diff --git a/tools/copy-if-different.sh b/tools/copy-if-different.sh index 22f381adc2..4ab3ec39b5 100755 --- a/tools/copy-if-different.sh +++ b/tools/copy-if-different.sh @@ -21,4 +21,3 @@ fi if [ ! -f "$output" ] || ! diff "$input" "$output" >/dev/null; then cp "$input" "$output" fi - diff --git a/tools/wgsl-meeting-helper b/tools/wgsl-meeting-helper index 2815128ffd..28dec6f9f9 100755 --- a/tools/wgsl-meeting-helper +++ b/tools/wgsl-meeting-helper @@ -113,4 +113,3 @@ async function run(targetFunction, ...args) { } argumentProcessor.parse(); - diff --git a/wgsl/Makefile b/wgsl/Makefile index 7e0e28714a..89d7b8ae37 100644 --- a/wgsl/Makefile +++ b/wgsl/Makefile @@ -145,4 +145,3 @@ lalr_star: $(ANALYZER) analyze/test/star.json .PHONY: lalr_442 lalr_442: $(ANALYZER) analyze/test/ex442.json python3 $(ANALYZE_SCRIPT) -lalr analyze/test/ex442.json - diff --git a/wgsl/diagrams/uniformity_1.mmd b/wgsl/diagrams/uniformity_1.mmd index 902d9eae14..fba6c43ca0 100644 --- a/wgsl/diagrams/uniformity_1.mmd +++ b/wgsl/diagrams/uniformity_1.mmd @@ -37,4 +37,3 @@ flowchart BT textureSamplearg2[textureSample arg_2] textureSamplereturnvalue[textureSample return_value] CFaftertextureSample([CF_after_textureSample]) - diff --git a/wgsl/media-type-registration.txt b/wgsl/media-type-registration.txt index f8e8cae1f0..45285f17c7 100644 --- a/wgsl/media-type-registration.txt +++ b/wgsl/media-type-registration.txt @@ -18,7 +18,7 @@ Security considerations: considerations, see [WebGPU] Section 2.1 Security Considerations. Interoperability considerations: - + Implementations of WebGPU may have different capabilities, and these differences may affect what features may be exercised by WGSL programs. See [WebGPU] Section 3.6 Optional capabilities, @@ -46,7 +46,7 @@ Additional information: Macintosh file type code(s): TEXT Person & email address to contact for further information: - David Neto, dneto@google.com, or the Editors listed in [WGSL]. 
+ David Neto, dneto@google.com, or the Editors listed in [WGSL]. Intended usage: COMMON @@ -61,5 +61,3 @@ Normative References: [WGSL] W3C, “WebGPU Shading Language” W3C Working Draft, October 2022. <https://www.w3.org/TR/wgsl/>. - - diff --git a/wgsl/tools/TSPath.py b/wgsl/tools/TSPath.py index d460bb126a..e9d62c830f 100644 --- a/wgsl/tools/TSPath.py +++ b/wgsl/tools/TSPath.py @@ -37,7 +37,7 @@ Examples: - /translation_unit + /translation_unit The translation_unit node at the top of the tree //enable_directive @@ -197,7 +197,7 @@ class SeqNode(ExprNode): def __init__(self,children): super().__init__(ENKind.seq) self.exprs = children - + def match(self,ts_node): result = [] # Walk through both lists diff --git a/wgsl/tools/analyze/Grammar.py b/wgsl/tools/analyze/Grammar.py index 02a23697dd..42939c67a3 100755 --- a/wgsl/tools/analyze/Grammar.py +++ b/wgsl/tools/analyze/Grammar.py @@ -3023,4 +3023,3 @@ def LALR1_ItemSets(self, max_item_sets=None): item_sets = self.LALR1(max_item_sets=max_item_sets).states return item_sets - diff --git a/wgsl/tools/analyze/ObjectRegistry.py b/wgsl/tools/analyze/ObjectRegistry.py index 53f66d677f..e04ea02196 100755 --- a/wgsl/tools/analyze/ObjectRegistry.py +++ b/wgsl/tools/analyze/ObjectRegistry.py @@ -134,7 +134,7 @@ def register(self,registerable): assert registerable.reg_info.obj is not None return registerable.reg_info.obj - + key = registerable.key if key in self.key_to_object: return self.key_to_object[key] diff --git a/wgsl/tools/analyze/test.py b/wgsl/tools/analyze/test.py index 53e83d9735..9321badff0 100755 --- a/wgsl/tools/analyze/test.py +++ b/wgsl/tools/analyze/test.py @@ -1,23 +1,23 @@ #!/usr/bin/env python3 -# +# # Copyright 2022 Google LLC # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: -# +# # 1. Redistributions of works must retain the original copyright # notice, this list of conditions and the following disclaimer. 
-# +# # 2. Redistributions in binary form must reproduce the original # copyright notice, this list of conditions and the following disclaimer # in the documentation and/or other materials provided with the # distribution. -# +# # 3. Neither the name of the W3C nor the names of its contributors # may be used to endorse or promote products derived from this work # without specific prior written permission. -# +# # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS From c7fc9bb18caf0bc946ed35abae40ab1f1f67ae78 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fran=C3=A7ois=20Beaufort?= <beaufortfrancois@gmail.com> Date: Tue, 17 Dec 2024 09:27:18 +0100 Subject: [PATCH 273/285] Explainer: Update Adapter Selection and Device Init section (#5011) --- explainer/index.bs | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/explainer/index.bs b/explainer/index.bs index 82c7b84de6..4fbf30755b 100644 --- a/explainer/index.bs +++ b/explainer/index.bs @@ -189,12 +189,12 @@ issued through "child" objects. To get an adapter, an application calls `navigator.gpu.requestAdapter()`, optionally passing options which may influence what adapter is chosen, like a `powerPreference` (`"low-power"` or `"high-performance"`) or -`forceSoftware` to force a software implementation. +`forceFallbackAdapter` to force a software implementation. `requestAdapter()` never rejects, but may resolve to null if an adapter can't be returned with the specified options. -A returned adapter exposes a `name` ([=implementation-defined=]), a boolean `isSoftware` so +A returned adapter exposes `info` (`vendor`/`architecture`/etc., implementation-defined), a boolean `isFallbackAdapter` so applications with fallback paths (like WebGL or 2D canvas) can avoid slow software implementations, and the [[#optional-capabilities]] available on the adapter. 
From 254de568a17cb68df04503f4cd4a937087802f62 Mon Sep 17 00:00:00 2001 From: Stephen White <senorblanco@chromium.org> Date: Wed, 18 Dec 2024 19:33:42 -0500 Subject: [PATCH 274/285] Compatibility Mode: update to use featureLevel. (#5012) Now that GPUAdapter.featureLevel has landed in the core spec, remove the GPUAdapter.compatibilityMode IDL change from the proposal. Explain that setting GPUAdapter.featureLevel to "compatibility" enables Compatibility Mode validation. Change the GPUAdapter.isCompatibilityMode boolean to a "featureLevel" DOMString. --- proposals/compatibility-mode.md | 15 ++++----------- 1 file changed, 4 insertions(+), 11 deletions(-) diff --git a/proposals/compatibility-mode.md b/proposals/compatibility-mode.md index a16c0ece3c..991180e8ef 100644 --- a/proposals/compatibility-mode.md +++ b/proposals/compatibility-mode.md @@ -16,24 +16,17 @@ Since WebGPU Compatibility mode is a subset of WebGPU, all valid Compatibility m ## WebGPU Spec Changes -```webidl -partial dictionary GPURequestAdapterOptions { - boolean compatibilityMode = false; -} -``` +When calling `GPU.requestAdapter()`, passing `featureLevel = "compatibility"` in the `GPURequestAdapterOptions` will indicate to the User Agent to select the Compatibility subset of WebGPU. Any Devices created from the resulting Adapter on supporting UAs will support only Compatibility mode. Calls to APIs unsupported by Compatibility mode will result in validation errors. -When calling `GPU.RequestAdapter()`, passing `compatibilityMode = true` in the `GPURequestAdapterOptions` will indicate to the User Agent to select the Compatibility subset of WebGPU. Any Devices created from the resulting Adapter on supporting UAs will support only Compatibility mode. Calls to APIs unsupported by Compatibility mode will result in validation errors. 
- -Note that a supporting User Agent may return a `compatibilityMode = true` Adapter which is backed by a fully WebGPU-capable hardware adapter, such as D3D12, Metal or Vulkan, so long as it validates all subsequent API calls made on the Adapter and the objects it vends against the Compatibility subset. +Note that a supporting User Agent may return a `featureLevel = "compatibility"` Adapter which is backed by a fully WebGPU-capable hardware adapter, such as D3D12, Metal or Vulkan, so long as it validates all subsequent API calls made on the Adapter and the objects it vends against the Compatibility subset. ```webidl partial interface GPUAdapter { - readonly attribute boolean isCompatibilityMode; + readonly attribute DOMString featureLevel; } ``` -As a convenience to the developer, the Adapter returned will have the `isCompatibilityMode` property set to `true`. - +As a convenience to the developer, the Adapter returned will have the `featureLevel` property set to `"compatibility"`. ```webidl partial dictionary GPUTextureDescriptor { From 0bf3070ff54529574d216d62967f2409ffa03279 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fran=C3=A7ois=20Beaufort?= <beaufortfrancois@gmail.com> Date: Thu, 19 Dec 2024 23:26:42 +0100 Subject: [PATCH 275/285] Fix maxTextureDimension limits refs (#5027) --- spec/index.bs | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index b9a06c26f8..64b2a3bf23 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -4321,6 +4321,7 @@ The {{GPUTextureUsage}} flags determine how a {{GPUTexture}} may be used after i [=Device timeline=] steps: + 1. Let |limits| be |this|.{{device/[[limits]]}}. 1. 
Return `true` if all of the following requirements are met, and `false` otherwise: <div class=validusage> @@ -4338,7 +4339,7 @@ The {{GPUTextureUsage}} flags determine how a {{GPUTexture}} may be used after i : {{GPUTextureDimension/"1d"}} :: - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/width=] must be &le; - |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureDimension1D}}. + |limits|.{{supported limits/maxTextureDimension1D}}. - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/height=] must be 1. - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/depthOrArrayLayers=] must be 1. - |descriptor|.{{GPUTextureDescriptor/sampleCount}} must be 1. @@ -4347,20 +4348,20 @@ The {{GPUTextureUsage}} flags determine how a {{GPUTexture}} may be used after i : {{GPUTextureDimension/"2d"}} :: - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/width=] must be &le; - |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureDimension2D}}. + |limits|.{{supported limits/maxTextureDimension2D}}. - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/height=] must be &le; - |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureDimension2D}}. + |limits|.{{supported limits/maxTextureDimension2D}}. - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/depthOrArrayLayers=] must be &le; - |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureArrayLayers}}. + |limits|.{{supported limits/maxTextureArrayLayers}}. : {{GPUTextureDimension/"3d"}} :: - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/width=] must be &le; - |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureDimension3D}}. + |limits|.{{supported limits/maxTextureDimension3D}}. - |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/height=] must be &le; - |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureDimension3D}}. + |limits|.{{supported limits/maxTextureDimension3D}}. 
- |descriptor|.{{GPUTextureDescriptor/size}}.[=GPUExtent3D/depthOrArrayLayers=] must be &le; - |this|.{{GPUDevice/limits}}.{{GPUSupportedLimits/maxTextureDimension3D}}. + |limits|.{{supported limits/maxTextureDimension3D}}. - |descriptor|.{{GPUTextureDescriptor/sampleCount}} must be 1. - |descriptor|.{{GPUTextureDescriptor/format}} must support {{GPUTextureDimension/"3d"}} textures according to [[#texture-format-caps]]. From 7f4ea7dcad35718c6dc3369901ef5bcfc197745a Mon Sep 17 00:00:00 2001 From: alan-baker <alanbaker@google.com> Date: Fri, 20 Dec 2024 11:50:23 -0500 Subject: [PATCH 276/285] Update subgroups CTS status (#5026) --- proposals/subgroups.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/proposals/subgroups.md b/proposals/subgroups.md index 4006e2fda5..912ce40a8b 100644 --- a/proposals/subgroups.md +++ b/proposals/subgroups.md @@ -264,7 +264,7 @@ D3D12 would have to be proven empricially. # Appendix C: CTS Status -Last updated: 2024-12-12 +Last updated: 2024-12-18 | Built-in value | Validation | Compute | Fragment | | --- | --- | --- | --- | @@ -283,12 +283,12 @@ Last updated: 2024-12-12 | `subgroupShuffleXor` | &check; | &check; | &check; | | `subgroupShuffleUp` | &check; | &check; | &check; | | `subgroupShuffleDown` | &check; | &check; | &check; | -| `subgroupAdd` | &check; | &check; | &cross; | -| `subgroupExclusiveAdd` | &check; | &check; | &cross; | -| `subgroupInclusiveAdd` | &check; | &check; | &cross; | -| `subgroupMul` | &check; | &check; | &cross; | -| `subgroupExclusiveMul` | &check; | &check; | &cross; | -| `subgroupInclusiveMul` | &check; | &check; | &cross; | +| `subgroupAdd` | &check; | &check; | &check; | +| `subgroupExclusiveAdd` | &check; | &check; | &check; | +| `subgroupInclusiveAdd` | &check; | &check; | &check; | +| `subgroupMul` | &check; | &check; | &check; | +| `subgroupExclusiveMul` | &check; | &check; | &check; | +| `subgroupInclusiveMul` | &check; | &check; | &check; | | `subgroupAnd` | 
&check; | &check; | &check; | | `subgroupOr` | &check; | &check; | &check; | | `subgroupXor` | &check; | &check; | &check; | From c29a5d8c272b0bd84a0dc7f71893b8c1ce63ebb9 Mon Sep 17 00:00:00 2001 From: David Neto <dneto@google.com> Date: Fri, 20 Dec 2024 11:51:32 -0500 Subject: [PATCH 277/285] texel buffer proposal: Update from review (#4985) * texel buffer proposal: Update from review - Clarify Metal limits are in units of pixels - Add TODO to get info for pre-Apple silicon Metal GPUs - Add @teoxoy's question about uniform texel buffer as an open question Followup to #4912 * Update per review Resolve the 'uniform texel buffer' question: read-only views map to uniform texel buffers. Add the question about needing another create-view method for these. --- proposals/texel-buffers.md | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/proposals/texel-buffers.md b/proposals/texel-buffers.md index bb6fece866..bd8d31a802 100644 --- a/proposals/texel-buffers.md +++ b/proposals/texel-buffers.md @@ -358,7 +358,10 @@ A `mem_texture` fence would be needed to make texel buffer writes visible within To get coverage on older Metal versions, it would be possible to polyfill by using a regular device buffer and doing the format conversions inside the shader. This requires that the storage format is specified inside the shader. -The maximum texel buffer size is 64MB for the Apple2 GPU family, and 256MB for Apple3 and above. +The maximum texel buffer size is 64 M pixels for the Apple2 GPU family, and 256 M pixels for Apple3 and above. +The texel buffer size is also bounded above by the generic buffer size constraint. + +**TODO**: Get data for non-Apple GPUs. ### D3D12 @@ -391,7 +394,7 @@ R32G32B32A32_FLOAT ``` -# Open Questions +# Open and Resolved Questions 1. Should this be an extension, or a core feature? - To make it core, implementations would need to polyfill for Metal <2.1.
We would also need to drop the formats that are not required everywhere (e.g. `R8_UINT`), or make them optional. @@ -399,3 +402,14 @@ R32G32B32A32_FLOAT - Make it core. - Drop the formats that are not widespread (leaving them for a [future texture format tier extension](https://github.com/gpuweb/gpuweb/issues/3837)). - We do not need to support Metal <2.1 (Metal 2.2 is our minimum requirement now). +2. In the original issue it was mentioned [#162 (comment)](https://github.com/gpuweb/gpuweb/issues/162#issuecomment-452771668) + that uniform texel buffers support more formats. This proposal is only for storage texel buffers but uses the name "texture buffer" throughout. + + - Is it worth adding uniform texel buffers? Besides wider format support, are they faster? + - If the answer is yes or not sure, we should probably use "storage texture/texel buffer" for this proposal instead. + - Decision at F2F: + - Implementations may map read-only texel buffer to uniform texel buffers in the underlying API. + This unifies things on the WebGPU side. + The texture format table can then enable usage of more formats for read-only texel buffers. +3. Do we need the new createView method on the buffer? Implementations could create the view at bind time, assuming they are light weight. + - Corentin Wallez raised this at the F2F. From 21cae9b2165c82cc9f892e5a6142ab22b628027c Mon Sep 17 00:00:00 2001 From: Greggman <github@greggman.com> Date: Sat, 21 Dec 2024 01:55:26 +0900 Subject: [PATCH 278/285] Call out `texture_mulitsampled_2d` parameterization (#5030) Fixed: #5028 --- wgsl/index.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index 6c0c036c09..e2dcdd97a2 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -4318,7 +4318,7 @@ WebGPU [$validating shader binding|validates$] compatibility between the texture, the {{GPUTextureBindingLayout/sampleType}} of the bind group layout, and the [=sampled type=] of the texture variable. 
-The texture is parameterized by a [=sampled type=] and +`texture_multisampled_2d` is parameterized by a [=sampled type=] and [=shader-creation error|must=] be `f32`, `i32`, or `u32`. <table class='data'> From 0a2cda4d9eb206f2d26b11f594f882ff05edeae1 Mon Sep 17 00:00:00 2001 From: Kai Ninomiya <kainino@chromium.org> Date: Wed, 8 Jan 2025 15:07:40 -0800 Subject: [PATCH 279/285] Add placeholder Deadlines to specs as a workaround for Bikeshed problem (#5040) --- spec/index.bs | 2 ++ wgsl/index.bs | 2 ++ 2 files changed, 4 insertions(+) diff --git a/spec/index.bs b/spec/index.bs index 64b2a3bf23..ebd2d92218 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -21,7 +21,9 @@ Markup Shorthands: dfn yes Markup Shorthands: idl yes Markup Shorthands: css no Assume Explicit For: yes +Deadline: 1111-11-11 </pre> +<!-- TODO(https://github.com/speced/bikeshed/issues/3000): Deadline is a hack for a Bikeshed compile error. --> <pre class=link-defaults> spec:webidl; type:exception; text:TypeError diff --git a/wgsl/index.bs b/wgsl/index.bs index e2dcdd97a2..6beed5c3b9 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -37,7 +37,9 @@ Markup Shorthands: biblio yes Markup Shorthands: idl yes Markup Shorthands: css no Assume Explicit For: yes +Deadline: 1111-11-11 </pre> +<!-- TODO(https://github.com/speced/bikeshed/issues/3000): Deadline is a hack for a Bikeshed compile error. --> <style> tr:nth-child(2n) { From 94f0ddec0ed412d654999447c24ea642488ed8bd Mon Sep 17 00:00:00 2001 From: Kai Ninomiya <kainino@chromium.org> Date: Wed, 8 Jan 2025 15:18:03 -0800 Subject: [PATCH 280/285] Move Correspondence Reference from LD to UD, to fix CSS (#5038) `Status: LD` is "Living Document", which makes sense but has broken CSS if used with `Group: webgpu`. If not used with `Group: webgpu` then the license is CC0 (Public Domain) which is probably fine but not really appropriate. 
`Status: UD` is "Unofficial Proposal Draft", which is not really what I would call this document, but it renders correctly :) Any other status causes an error message saying something like: "You used Status NOTE, but your Group (WEBGPU) is limited to the statuses CG-DRAFT, CG-FINAL, or UD." (Probably LD isn't supposed to be used either and that's why it's broken.) Documentation: https://speced.github.io/bikeshed/#metadata --- correspondence/index.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/correspondence/index.bs b/correspondence/index.bs index 16c8ddf482..c3e8877a39 100644 --- a/correspondence/index.bs +++ b/correspondence/index.bs @@ -2,7 +2,7 @@ Title: WebGPU Correspondence Reference Shortname: webgpu-correspondence Level: None -Status: LD +Status: UD Group: webgpu URL: https://gpuweb.github.io/gpuweb/correspondence/ !Participate: <a href="https://github.com/gpuweb/gpuweb/issues/new">File an issue</a> (<a href="https://github.com/gpuweb/gpuweb/issues">open issues</a>) From ee18f9feb58882093c64353c4a2947b94c5495c8 Mon Sep 17 00:00:00 2001 From: Kai Ninomiya <kainino@chromium.org> Date: Wed, 8 Jan 2025 15:18:26 -0800 Subject: [PATCH 281/285] Document maxFragmentCombinedOutputResources (#5039) --- correspondence/index.bs | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/correspondence/index.bs b/correspondence/index.bs index c3e8877a39..eb4092d07f 100644 --- a/correspondence/index.bs +++ b/correspondence/index.bs @@ -177,12 +177,14 @@ User agents are not required to use these formulas and may expose whatever they <tr> <th>`maxStorageTexturesPerShaderStage` <td>[#409](https://github.com/gpuweb/gpuweb/issues/409) - <td>`maxPerStageDescriptorStorageImages` + <td>*Strategy-dependent.* Choose a value &le; `maxPerStageDescriptorStorageImages` + while adhering to [[#vulkan-maxFragmentCombinedOutputResources]]. 
<td rowspan=2>*Strategy-dependent.* Allocate `Maximum number of Unordered Access Views in all descriptor tables across all stages` (guaranteed to be 64) across stages across these two limits. For example, 32 for each shader stage, split as 16 textures and 16 buffers per shader stage. <tr> <th>`maxStorageBuffersPerShaderStage` <td>[#409](https://github.com/gpuweb/gpuweb/issues/409) - <td>`maxPerStageDescriptorStorageBuffers` + <td>*Strategy-dependent.* Choose a value &le; `maxPerStageDescriptorStorageBuffers` + while adhering to [[#vulkan-maxFragmentCombinedOutputResources]]. <td rowspan=3>*Strategy-dependent.* Allocate `Maximum number of entries in the buffer argument table, per graphics or kernel function` across these three limits. <tr> <th>`maxUniformBuffersPerShaderStage` @@ -241,9 +243,9 @@ User agents are not required to use these formulas and may expose whatever they <tr> <th>`maxColorAttachments` <td>[#2820](https://github.com/gpuweb/gpuweb/issues/2820) - <td>`min(maxColorAttachments, maxFragmentOutputAttachments, maxFragmentCombinedOutputResources)` - <td>`Maximum number of color render targets per render - pass descriptor` + <td>*Strategy-dependent.* Choose a value &le; `min(maxColorAttachments, maxFragmentOutputAttachments)` + while adhering to [[#vulkan-maxFragmentCombinedOutputResources]]. + <td>`Maximum number of color render targets per render pass descriptor` <td>8 = `D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT` <tr> <th>`maxColorAttachmentBytesPerSample` @@ -286,3 +288,16 @@ User agents are not required to use these formulas and may expose whatever they <td><p class=issue>*No documented limit?* <td>65535 = `D3D12_CS_DISPATCH_MAX_THREAD_GROUPS_PER_DIMENSION` </table> + +## Vulkan `maxFragmentCombinedOutputResources` ## {#vulkan-maxFragmentCombinedOutputResources} + +Choose `maxStorageBuffersPerShaderStage`, `maxStorageTexturesPerShaderStage`, and `maxColorAttachments` +such that their sum is &le; Vulkan's `maxFragmentCombinedOutputResources`. 
+ +<p class=advisement> +Warning: +`maxFragmentCombinedOutputResources` is incorrectly reported on many +[Intel, AMD, NVIDIA](https://github.com/gpuweb/gpuweb/issues/4018#issuecomment-1499725189), and +[Imagination](https://github.com/gpuweb/gpuweb/issues/3631#issuecomment-1498747606) drivers. +On these drivers, the combined limit may need to be ignored. +</p> From 6f0dd66fe630bd0a07f87c4732846a3a46c4ca2a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fran=C3=A7ois=20Daoust?= <fd@tidoust.net> Date: Fri, 10 Jan 2025 10:03:23 +0100 Subject: [PATCH 282/285] Adjust metadata and workflow to resume publication to /TR (#5041) New spec drafts need to be published as "Candidate Recommendation Drafts". This status requires a deadline date to be specified. That deadline could be set as a `W3C_BUILD_OVERRIDE` parameter in the workflow, but then it's inherent to the specs, so it seems good to have it defined in the source directly. (That information is not useful for the Editor's Draft in theory, but also needed in practice since Bikeshed replaces all macro texts in the boilerplate before it drops the bits that aren't needed). The update also adds a link to the test suite in WebGPU and renames "Tests" into "Test Suite" in WGSL (usual name across specs). 
--- .github/workflows/publish-TR-webgpu.yml | 2 +- .github/workflows/publish-TR-wgsl.yml | 2 +- spec/index.bs | 4 ++-- wgsl/index.bs | 5 ++--- 4 files changed, 6 insertions(+), 7 deletions(-) diff --git a/.github/workflows/publish-TR-webgpu.yml b/.github/workflows/publish-TR-webgpu.yml index 3046b7aa43..3dc6da5ddd 100644 --- a/.github/workflows/publish-TR-webgpu.yml +++ b/.github/workflows/publish-TR-webgpu.yml @@ -34,4 +34,4 @@ jobs: W3C_ECHIDNA_TOKEN: ${{ secrets.ECHIDNA_TOKEN_WEBGPU }} W3C_WG_DECISION_URL: https://lists.w3.org/Archives/Public/public-gpu/2021Apr/0004.html W3C_BUILD_OVERRIDE: | - status: WD + status: CRD diff --git a/.github/workflows/publish-TR-wgsl.yml b/.github/workflows/publish-TR-wgsl.yml index 4b5ba1f6b4..50ac413d46 100644 --- a/.github/workflows/publish-TR-wgsl.yml +++ b/.github/workflows/publish-TR-wgsl.yml @@ -34,4 +34,4 @@ jobs: W3C_ECHIDNA_TOKEN: ${{ secrets.ECHIDNA_TOKEN_WGSL }} W3C_WG_DECISION_URL: https://lists.w3.org/Archives/Public/public-gpu/2021Apr/0004.html W3C_BUILD_OVERRIDE: | - status: WD + status: CRD diff --git a/spec/index.bs b/spec/index.bs index ebd2d92218..c286e5c7b4 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -8,6 +8,7 @@ ED: https://gpuweb.github.io/gpuweb/ TR: https://www.w3.org/TR/webgpu/ Repository: gpuweb/gpuweb !Participate: <a href="https://github.com/gpuweb/gpuweb/issues/new">File an issue</a> (<a href="https://github.com/gpuweb/gpuweb/issues">open issues</a>) +!Test Suite: <a href="https://github.com/gpuweb/cts">WebGPU CTS</a> Editor: Kai Ninomiya, Google https://www.google.com, kainino@google.com, w3cid 99487 Editor: Brandon Jones, Google https://www.google.com, bajones@google.com, w3cid 87824 @@ -21,9 +22,8 @@ Markup Shorthands: dfn yes Markup Shorthands: idl yes Markup Shorthands: css no Assume Explicit For: yes -Deadline: 1111-11-11 +Deadline: 2025-02-28 </pre> -<!-- TODO(https://github.com/speced/bikeshed/issues/3000): Deadline is a hack for a Bikeshed compile error. 
--> <pre class=link-defaults> spec:webidl; type:exception; text:TypeError diff --git a/wgsl/index.bs b/wgsl/index.bs index 6beed5c3b9..33d1c24843 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -24,7 +24,7 @@ Text Macro: NINF &minus;&infin; Ignored Vars: i, c0, e, e1, e2, e3, edge, eN, p, s1, s2, sn, AS, AM, N, newbits, M, C, R, v, Stride, Offset, Align, Extent, T, T1, E, S, F, x, y, a, b !Participate: <a href="https://github.com/gpuweb/gpuweb/issues/new?labels=wgsl">File an issue</a> (<a href="https://github.com/gpuweb/gpuweb/issues?q=is%3Aissue+is%3Aopen+label%3Awgsl">open issues</a>) -!Tests: <a href=https://github.com/gpuweb/cts/tree/main/src/webgpu/shader/>WebGPU CTS shader/</a> +!Test Suite: <a href=https://github.com/gpuweb/cts/tree/main/src/webgpu/shader/>WebGPU CTS shader/</a> Editor: Alan Baker, Google https://www.google.com, alanbaker@google.com, w3cid 129277 Editor: Mehmet Oguz Derin, mehmetoguzderin@mehmetoguzderin.com, w3cid 101130 @@ -37,9 +37,8 @@ Markup Shorthands: biblio yes Markup Shorthands: idl yes Markup Shorthands: css no Assume Explicit For: yes -Deadline: 1111-11-11 +Deadline: 2025-02-28 </pre> -<!-- TODO(https://github.com/speced/bikeshed/issues/3000): Deadline is a hack for a Bikeshed compile error. 
--> <style> tr:nth-child(2n) { From bb105afa0160573210b5fab1342dda1d1a7f9c2d Mon Sep 17 00:00:00 2001 From: alan-baker <alanbaker@google.com> Date: Thu, 16 Jan 2025 11:00:59 -0500 Subject: [PATCH 283/285] Add subgroups feature (#4963) * API * new feature: subgroups * new properties for adapter info: * subgroupMinSize * subgroupMaxSize * WGSL * new enable: subgroups * new built-in values * subgroup_invocation_id * subgroup_size * subgroup and quad built-in functions * new uniformity diagnostic subgroup_uniformity --- spec/index.bs | 44 +++ wgsl/index.bs | 791 +++++++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 831 insertions(+), 4 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index c286e5c7b4..2485ee8301 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -136,9 +136,12 @@ spec: WGSL; urlPrefix: https://gpuweb.github.io/gpuweb/wgsl/# text: f16; url: extension-f16 text: clip_distances; url: extension-clip_distances text: dual_source_blending; url: extension-dual_source_blending + text: subgroup; url: extension-subgroups for: type text: sampled texture; url: type-sampled-texture text: depth texture; url: type-depth-texture + for: builtin-value + text: subgroup size; url: subgroup-size type: abstract-op text: SizeOf; url: sizeof spec: Internationalization Glossary; urlPrefix: https://www.w3.org/TR/i18n-glossary/# @@ -1898,6 +1901,8 @@ interface GPUAdapterInfo { readonly attribute DOMString architecture; readonly attribute DOMString device; readonly attribute DOMString description; + readonly attribute unsigned long subgroupMinSize; + readonly attribute unsigned long subgroupMaxSize; }; </script> @@ -1930,6 +1935,16 @@ interface GPUAdapterInfo { this value is not recommended. Applications which change their behavior based on the {{GPUAdapterInfo}}, such as applying workarounds for known driver issues, should rely on the other fields when possible. 
+ + : <dfn>subgroupMinSize</dfn> + :: + If the {{GPUFeatureName/"subgroups"}} feature is supported, the minimum + supported [=builtin-value/subgroup size=] for the [=adapter=]. + + : <dfn>subgroupMaxSize</dfn> + :: + If the {{GPUFeatureName/"subgroups"}} feature is supported, the maximum + supported [=builtin-value/subgroup size=] for the [=adapter=]. </dl> <div algorithm data-timeline=content> @@ -1960,6 +1975,20 @@ interface GPUAdapterInfo { instead set |adapterInfo|.{{GPUAdapterInfo/description}} to the empty string or a reasonable approximation of a description. + 1. If {{GPUFeatureName/"subgroups"}} is supported, set {{GPUAdapterInfo/subgroupMinSize}} + to the smallest supported subgroup size. Otherwise, set this value to 4. + + Note: To preserve privacy, the user agent may choose to not support some features or provide values + for the property which do not distinguish different devices, but are still usable + (e.g. use the default value of 4 for all devices). + + 1. If {{GPUFeatureName/"subgroups"}} is supported, set {{GPUAdapterInfo/subgroupMaxSize}} + to the largest supported subgroup size. Otherwise, set this value to 128. + + Note: To preserve privacy, the user agent may choose to not support some features or provide values + for the property which do not distinguish different devices, but are still usable + (e.g. use the default value of 128 for all devices). + 1. Return |adapterInfo|. </div> @@ -2848,6 +2877,7 @@ enum GPUFeatureName { "float32-blendable", "clip-distances", "dual-source-blending", + "subgroups", }; </script> @@ -16613,6 +16643,20 @@ This feature adds the following [=optional API surfaces=]: - New WGSL extensions: - [=extension/dual_source_blending=] +<h3 id=subgroups data-dfn-type=enum-value data-dfn-for=GPUFeatureName>`"subgroups"` +<span id=dom-gpufeaturename-subgroups></span> +</h3> + +Allows the use of the subgroup and quad operations in WGSL. 
+ +This feature adds no [=optional API surfaces=], but the following entries of {{GPUAdapterInfo}} +expose real values whenever the feature is available on the adapter: +- {{GPUAdapterInfo/subgroupMinSize}} +- {{GPUAdapterInfo/subgroupMaxSize}} + +- New WGSL extensions: + - [=extension/subgroups=] + # Appendices # {#appendices} ## Texture Format Capabilities ## {#texture-format-caps} diff --git a/wgsl/index.bs b/wgsl/index.bs index 33d1c24843..3a2060fab9 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -269,6 +269,9 @@ spec: WebGPU; urlPrefix: https://gpuweb.github.io/gpuweb/# text: rasterizationpoint-depth; url: rasterizationpoint-depth text: rasterizationpoint-perspectivedivisor; url: rasterizationpoint-perspectivedivisor text: fragmentdestination-position; url: fragmentdestination-position + text: subgroups-feature; url: subgroups + text: subgroupMinSize; url: dom-gpuadapterinfo-subgroupminsize + text: subgroupMaxSize; url: dom-gpuadapterinfo-subgroupmaxsize for: supported limits text: maxComputeWorkgroupStorageSize; url: dom-supported-limits-maxcomputeworkgroupstoragesize type: attribute @@ -763,6 +766,19 @@ The following table lists the standard set of triggering rules that can be filte <td>A call to a builtin function computes derivatives, but [=uniformity analysis=] cannot prove that the call occurs in [=uniform control flow=]. See [[#uniformity]]. + + <tr> + <td><dfn noexport dfn-for="trigger">subgroup_uniformity</dfn> + <td>[=severity/error=] + <td>The location of the [=call site=] for any + [[#subgroup-builtin-functions|subgroup]] or [[#quad-builtin-functions|quad]] built-in function. + + <td>A call to a subgroup or quad builtin function, but [=uniformity analysis=] cannot prove that the call occurs in [=uniform control flow=]. 
+ Additionally, when uniformity analysis cannot prove that the following parameter values are [=uniform value|uniform=]: + * `delta` in [[#subgroupshuffleup-builtin|subgroupShuffleUp]] and [[#subgroupshuffledown-builtin|subgroupShuffleDown]] + * `mask` in [[#subgroupshufflexor-builtin|subgroupShuffleXor]] + + See [[#uniformity]]. </table> Using an unrecognized triggering rule consisting of a single [=diagnostic name-token=] should trigger a warning from the user agent. @@ -1381,6 +1397,8 @@ The [=built-in value=] names are: * <a for="built-in values" lt=global_invocation_id>`'global_invocation_id'`</a> * <a for="built-in values" lt=workgroup_id>`'workgroup_id'`</a> * <a for="built-in values" lt=num_workgroups>`'num_workgroups'`</a> +* <a for="built-in values" lt=subgroup_invocation_id>`'subgroup_invocation_id'`</a> +* <a for="built-in values" lt=subgroup_size>`'subgroup_size'`</a> ### Diagnostic Rule Names ### {#diagnostic-rule-names} @@ -1395,6 +1413,7 @@ path: syntax/diagnostic_name_token.syntax.bs.include The pre-defined [=diagnostic/triggering rule|diagnostic rule=] names are: * <a for=trigger lt=derivative_uniformity>`'derivative_uniformity'`</a> +* <a for=trigger lt=subgroup_uniformity>`'subgroup_uniformity'`</a> ### Diagnostic Severity Control Names ### {#diagnostic-severity-control-names} @@ -1781,6 +1800,12 @@ The valid [=enable-extensions=] are listed in the following table. <td>[[WebGPU#dom-gpufeaturename-dual-source-blending|"dual-source-blending"]] <td>The attribute [=attribute/blend_src=] is valid to use in the WGSL module. Otherwise, using [=attribute/blend_src=] will result in a [=shader-creation error=]. + <tr><td><dfn noexport dfn-for="extension">`subgroups`</dfn> + <td>"[=subgroups-feature|subgroups=]" + <td>It is valid to use [[#builtin-inputs-outputs|subgroup built-in values]], + [[#subgroup-builtin-functions|subgroup built-in functions]], and + [[#quad-builtin-functions|quad built-in functions]] in a WGSL module. 
+ Otherwise, any use will result in a [=shader-creation error=]. </table> <div class='example wgsl using extensions expect-error' heading="Using hypothetical enable-extensions"> @@ -9273,6 +9298,20 @@ Each is described in detail in subsequent sections. <td>compute <td>input <td>vec3&lt;u32&gt; + + <tr><td rowspan=2>[=built-in values/subgroup_invocation_id=] + <td>compute + <td rowspan=2>input + <td rowspan=2>u32 + <td rowspan=2>[=extension/subgroups=] + <tr><td>fragment + + <tr><td rowspan=2>[=built-in values/subgroup_size=] + <td>compute + <td rowspan=2>input + <td rowspan=2>u32 + <td rowspan=2>[=extension/subgroups=] + <tr><td>fragment </table> <div class='example wgsl global-scope' heading="Declaring built-in values"> @@ -9639,6 +9678,40 @@ Each is described in detail in subsequent sections. [=group_count_y=] - 1, [=group_count_z=] - 1). </table> +##### `subgroup_invocation_id` ##### {#subgroup-invocation-id-builtin-value} + +<table class='data'> + <tr><td style="width:10%">Name + <td><dfn noexport dfn-for="built-in values">subgroup_invocation_id</dfn> + <tr><td style="width:10%">Stage + <td>[=compute shader stage|compute=] or [=fragment shader stage|fragment=] + <tr><td style="width:10%">Type + <td>u32 + <tr><td style="width:10%">Direction + <td>Input + <tr><td style="width:10%">Description + <td> + The current invocation's [=subgroup invocation ID=]. + + The ID is within the range [0, [=built-in values/subgroup_size=] - 1]. +</table> + +##### `subgroup_size` ##### {#subgroup-size-builtin-value} + +<table class='data'> + <tr><td style="width:10%">Name + <td><dfn noexport dfn-for="built-in values">subgroup_size</dfn> + <tr><td style="width:10%">Stage + <td>[=compute shader stage|compute=] or [=fragment shader stage|fragment=] + <tr><td style="width:10%">Type + <td>u32 + <tr><td style="width:10%">Direction + <td>Input + <tr><td style="width:10%">Description + <td> + The [=subgroup size=] of current invocation's subgroup. 
+</table> + #### User-defined Inputs and Outputs #### {#user-defined-inputs-outputs} User-defined data can be passed as input to the start of a pipeline, passed @@ -10928,6 +11001,16 @@ when [=uniformity analysis=] cannot prove that a particular [[#collective-operat * If a uniformity failure is triggered for a [[#sync-builtin-functions|synchronization builtin]], an [=severity/error=] [=diagnostic=] is [=triggered=], which results in a [=shader-creation error=]. +* If a uniformity failure is triggered for a [[#subgroup-builtin-functions|subgroup]] or + [[#quad-builtin-functions|quad]] builtin, then a [=trigger/subgroup_uniformity=] + [=diagnostic=] is [=triggered=] + * The diagnostic's [=diagnostic/triggering location=] is the location of the [=call site=] of that builtin + or the location of the parameter required to be uniform in the case of + [[#subgroupshuffleup-builtin|subgroupShuffleUp]], + [[#subgroupshuffledown-builtin|subgroupShuffleDown]], or + [[#subgroupshufflexor-builtin|subgroupShuffleXor]] + * The diagnostic's [=diagnostic/severity=] defaults to an [=severity/error=], but can be controlled with + a [=diagnostic filter=]. ### Terminology and Concepts ### {#uniformity-concepts} @@ -10977,8 +11060,9 @@ Note: Another way of saying the same thing is that we do a topological sort of f Additionally, for each function call, the analysis computes and propagates the set of [=diagnostic/triggering rules=], if any, that would be triggered if that call cannot be proven to be in uniform control flow. We call this the <dfn noexport>potential-trigger-set</dfn> for the call. 
-The elements of this set are drawn from two possibilites: -* [=trigger/derivative_uniformity=], for functions relying on computing a derivative, or +The elements of this set are drawn from the following possibilities: +* [=trigger/derivative_uniformity=], for functions relying on computing a derivative, +* [=trigger/subgroup_uniformity=], for functions in [[#subgroup-builtin-functions]] or [[#quad-builtin-functions]], or * an unnamed triggering rule, for the uniformity requirements that cannot be filtered. * This is used for compute shader functions relying on [[#sync-builtin-functions|synchronization functions]]. @@ -11638,7 +11722,36 @@ Here is the list of exceptions: - [=ReturnValueMayBeNonUniform=] if the argument corresponding to the `t` parameter is a [=type/read-write storage texture=] - [=NoRestriction=] otherwise - +- A call to a function in + [[#subgroup-builtin-functions]] or [[#quad-builtin-functions]]: + - Has a [=function tag=] of [=ReturnValueMayBeNonUniform=]. + - Let *DF* be the [=nearest enclosing diagnostic filter=] for the call site location and triggering rule [=trigger/subgroup_uniformity=] + - Has a [=call site tag=] as follows: + - If *DF* exists, then let *S* be the *DF*'s new severity parameter. + - If *S* is the severity [=severity/off=], the call site tag is [=CallSiteNoRestriction=]. + - Otherwise, the call site tag is [=CallSiteRequiredToBeUniform.S=], with [=potential-trigger-set=] + consisting of a [=trigger/subgroup_uniformity=] element. + - If there is no such *DF*, + the call site tag is [=CallSiteRequiredToBeUniform.S|CallSiteRequiredToBeUniform.error=], with [=potential-trigger-set=] + consisting of a [=trigger/subgroup_uniformity=] element. + - Additionally for the case of a call to [[#subgroupshuffleup-builtin|subgroupShuffleUp]] or [[#subgroupshuffledown-builtin|subgroupShuffleDown]], + the parameter `delta` has a [=parameter tag=] of: + - If *DF* exists, then let *S* be *DF*'s new severity parameter.
+ - If *S* is the severity [=severity/off=], the parameter tag is [=NoRestriction=]. + - Otherwise, the parameter tag is [=ParameterRequiredToBeUniform.S=] with [=potential-trigger-set=] + consisting of a [=trigger/subgroup_uniformity=] element. + - If there is no such *DF*, + the parameter tag is [=ParameterRequiredToBeUniform.S|ParameterRequiredToBeUniform.error=], with [=potential-trigger-set=] + consisting of a [=trigger/subgroup_uniformity=] element. + - Additionally for the case of a call to [[#subgroupshufflexor-builtin|subgroupShuffleXor]], + the parameter `mask` has a [=parameter tag=] of: + - If *DF* exists, then let *S* be *DF*'s new severity parameter. + - If *S* is the severity [=severity/off=], the parameter tag is [=NoRestriction=]. + - Otherwise, the parameter tag is [=ParameterRequiredToBeUniform.S=] with [=potential-trigger-set=] + consisting of a [=trigger/subgroup_uniformity=] element. + - If there is no such *DF*, + the parameter tag is [=ParameterRequiredToBeUniform.S|ParameterRequiredToBeUniform.error=], with [=potential-trigger-set=] + consisting of a [=trigger/subgroup_uniformity=] element. Note: A WGSL implementation will ensure that if control flow prior to a function call is [=uniform control flow|uniform=], it will also be uniform @@ -11781,6 +11894,7 @@ The rules for analyzing expressions take as argument both the expression itself The following built-in input variables are considered uniform: - [=built-in values/workgroup_id=] - [=built-in values/num_workgroups=] +- [=built-in values/subgroup_size=] when used in a [=compute shader stage=] All other ones (see [=built-in values=]) are considered non-uniform. @@ -12175,9 +12289,18 @@ WebGPU provides no guarantees about: ## Fragment Shaders and Helper Invocations ## {#fragment-shaders-helper-invocations} Invocations in the [=fragment shader stage=] are divided into 2x2 grids of -invocations with neighbouring [=built-in values/position|positions=] in the X and Y dimensions. 
+invocations with neighbouring [=built-in values/position|positions=] in the X +and Y dimensions. Each of these grids is referred to as a <dfn noexport>quad</dfn>. Quads can collaborate in some collective operations (see [[#derivatives]]). +An invocation's <dfn noexport>quad invocation ID</dfn> is the unique ID within +the quad, where: +* ID 0 is the upper-left invocation. +* ID 1 is the upper-right invocation. +* ID 2 is the lower-left invocation. +* ID 3 is the lower-right invocation. + +Note: There is no built-in value accessor for the quad ID. Ordinarily, [[WebGPU#fragment-processing|fragment processing]] creates one invocation of a fragment shader for each @@ -12206,6 +12329,58 @@ executing a [=statement/discard=] statement), execution of the quad may be terminated; however, such termination is not considered to produce [=uniform control flow|non-uniform control flow=]. +## Subgroups ## {#subgroups} + +A <dfn noexport>subgroup</dfn> is a set of invocations which concurrently +execute a [=compute shader stage|compute=] or [=fragment shader +stage|fragment=] shader stage [=entry point=], and which can efficiently share +data and [[#subgroup-ops|collectively]] compute results. +Each invocation in a compute or fragment shader belongs to exactly one subgroup. +In a compute shader, each subgroup is a subset of a particular [=compute shader stage/workgroup=]. +In a fragment shader, a subgroup might contain invocations from multiple [=draw commands=]. +Each [=quad=] will be contained in a single subgroup. + +The <dfn noexport>subgroup size</dfn> is the maximum number of invocations in a subgroup. +This value is accessible via the [=built-in values/subgroup_size=] built-in value. +Subgroup size is a [=uniform value=] within a [=compute shader stage/workgroup=], +but not within a [=draw command=]. 
+All subgroup sizes are within the range [4, 128] and the value for a shader +compiled for a specific device will be within the range [[=subgroupMinSize=], +[=subgroupMaxSize=]] for the [[WebGPU#gpuadapter]]. +The actual size is a function of the device properties and device compiler. +Each device supports a subset of the possible range of subgroup sizes (possibly a single value). +The device compiler selects a size from the supported sizes using a variety of heuristics. +Each subgroup may contain fewer invocations than the reported subgroup size +(e.g. if fewer invocations than the subgroup size are launched). + +An invocation's <dfn noexport>subgroup invocation ID</dfn> is a unique ID within the subgroup. +This id is accessible via the [=built-in values/subgroup_invocation_id=] +built-in value and is in the range [0, `subgroup_size` - 1]. +There is no defined relationship between `subgroup_invocation_id` and +[=built-in values/local_invocation_index=]. +To avoid non-portable code, shader authors should not assume a particular +mapping between these two values. + +When invocations in the same subgroup execute different control flow paths, we +say subgroup execution has diverged. +This is a special case of [=uniform control flow|non-uniform control flow=]. +Divergence affects the semantics of subgroup operations. +The invocations in a subgroup that concurrently execute a subgroup operation +are <dfn noexport dfn-for="subgroups">active</dfn> for that operation. +Other invocations in the subgroup are <dfn noexport dfn-for="subgroups">inactive</dfn> +for that operation. +When the subgroup size exceeds the number of invocations in a subgroup, the +extra hypothetical invocations are considered inactive. +[=Helper invocations=] may be active or inactive in an operation. +That is, on some devices helper invocations may participate in subgroup operations, +while on other devices they may not. 
+ +Note: There is considerable non-portability among underlying devices when +operating in non-uniform control flow, and device compilers often aggressively +optimize such code. +The result is that the subgroup may contain a different set of active +invocations than the shader author expects. + ## Collective Operations ## {#collective-operations} ### Barriers ### {#barrier} @@ -12252,6 +12427,30 @@ For each call to one of these functions, a [=trigger/derivative_uniformity=] [=d If one of these functions is called in non-uniform control flow, then the result is an [=indeterminate value=]. +Note: Derivatives are an implicit type of [[#quad-ops|quad operation]]. +Their use does not require the [=extension/subgroups=] extension. + +### Subgroup Operations ### {#subgroup-ops} + +The [[#subgroup-builtin-functions|subgroup built-in functions]] allow efficient +communication and computation between the invocations in a [=subgroup=]. +Subgroup operations are single-instruction multiple-thread (SIMT) operations. + +The [=subgroups/active=] invocations in a subgroup communicate to determine results. +Therefore, portability is maximized when these functions are invoked when all +invocations are active (i.e. in [=uniform control flow=] at the subgroup +level). + +### Quad Operations ### {#quad-ops} + +The [[#quad-builtin-functions|quad built-in functions]] operate on a [=quad=] +of invocations. +They are useful for data communication among the [=quad=]. + +The [=subgroups/active=] invocations in a quad communicate to determine results. +Therefore, portability is maximized when these functions are invoked when all +invocations are active (i.e. in [=uniform control flow=] at the quad level). + ## Floating Point Evaluation ## {#floating-point-evaluation} WGSL floating point features are based on the [[!IEEE-754|IEEE-754]] standard for floating point, @@ -12676,6 +12875,24 @@ the rules in [[#floating-point-rounding-and-overflow]] apply. 
<tr><td>`unpack2x16snorm(x)`<td>3 ULP<td>N/A <tr><td>`unpack2x16unorm(x)`<td>3 ULP<td>N/A <tr><td>`unpack2x16float(x)`<td>Correctly rounded<td>N/A + <tr><td>`subgroupBroadcast(x, i)`<td colspan=2>Correctly rounded + <tr><td>`subgroupBroadcastFirst(x)`<td colspan=2>Correctly rounded + <tr><td>`subgroupAdd(x)`<td colspan=2>Inherited from sum of x for all [=subgroups/active=] invocations in the subgroup + <tr><td>`subgroupExclusiveAdd(x)`<td colspan=2>Inherited from the sum of x for all [=subgroups/active=] invocations in the subgroup whose [=subgroup invocation ID=]s are less than the current invocation's ID. + <tr><td>`subgroupInclusiveAdd(x)`<td colspan=2>Inherited from the sum of x for all [=subgroups/active=] invocations in the subgroup whose [=subgroup invocation ID=]s are less than or equal to current invocation's ID. + <tr><td>`subgroupMul(x)`<td colspan=2>Inherited from the product of x for all [=subgroups/active=] invocations in the subgroup + <tr><td>`subgroupExclusiveMul(x)`<td colspan=2>Inherited from the product of x<sub>i</sub> for all [=subgroups/active=] invocations in the subgroup whose [=subgroup invocation ID=] is less than ID of the i'th invocation + <tr><td>`subgroupInclusiveMul(x)`<td colspan=2>Inherited from the product of x<sub>i</sub> for all [=subgroups/active=] invocations in the subgroup whose [=subgroup invocation ID=] is less than or equal to ID of the i'th invocation + <tr><td>`subgroupMax(x)`<td colspan=2>Inherited from max(x) for all [=subgroups/active=] invocations in the subgroup + <tr><td>`subgroupMin(x)`<td colspan=2>Inherited from min(x) for all [=subgroups/active=] invocations in the subgroup + <tr><td>`subgroupShuffle(x, id)`<td colspan=2>Correctly rounded + <tr><td>`subgroupShuffleDown(x, delta)`<td colspan=2>Correctly rounded + <tr><td>`subgroupShuffleUp(x, delta)`<td colspan=2>Correctly rounded + <tr><td>`subgroupShuffleXor(x, mask)`<td colspan=2>Correctly rounded + <tr><td>`quadBroadcast(x, id)`<td colspan=2>Correctly 
rounded + <tr><td>`quadSwapDiagonal(x)`<td colspan=2>Correctly rounded + <tr><td>`quadSwapX(x)`<td colspan=2>Correctly rounded + <tr><td>`quadSwapY(x)`<td colspan=2>Correctly rounded </table> @@ -18538,6 +18755,572 @@ All synchronization functions [=shader-creation error|must=] only be invoked in space. </table> +## Subgroup Built-in Functions ## {#subgroup-builtin-functions} + +See [[#subgroup-ops]]. + +Calls to these functions: +* [=shader-creation error|Must=] only be used in a [=fragment=] or [=compute=] shader stage. +* [=Trigger=] a [=trigger/subgroup_uniformity=] [=diagnostic=] if [=uniformity analysis=] + cannot prove the call is in [=uniform control flow=]. + +Note: For the [=compute shader stage=], the scope of uniform control flow is +the [=compute shader stage/workgroup=]. +For the [=fragment shader stage=], the scope of uniform control flow is the +[=draw command=]. +Both of these scopes are larger than [=subgroup=]. + +### `subgroupAdd` ### {#subgroupadd-builtin} + +<table class='data builtin'> + <tr algorithm="subgroupAdd"> + <td style="width:10%">Overload + <td class="nowrap"> + <xmp highlight=wgsl> + @must_use fn subgroupAdd(e : T) -> T + + + Preconditions + `T` is [=type/concrete=] [=numeric scalar=] or [=numeric vector=] + + Description + Reduction operation.
+ + Returns the sum of `e` among all [=subgroups/active=] invocations in the [=subgroup=]. + + +#### `subgroupExclusiveAdd` #### {#subgroupexclusiveadd-builtin} + + + + + +
Overload + + + @must_use fn subgroupExclusiveAdd(e : T) -> T + +
Preconditions + `T` is [=type/concrete=] [=numeric scalar=] or [=numeric vector=] +
Description + Exclusive prefix scan operation.
+ + Returns the sum of `e` among all [=subgroups/active=] invocations in the + [=subgroup=] whose [=subgroup invocation IDs=] are less than the current + invocation's id. + + The value returned for the invocation with the lowest id among active invocations is `T(0)`. +
+ +#### `subgroupInclusiveAdd` #### {#subgroupinclusiveadd-builtin} + + + + + +
Overload + + + @must_use fn subgroupInclusiveAdd(e : T) -> T + +
Preconditions + `T` is [=type/concrete=] [=numeric scalar=] or [=numeric vector=] +
Description + Inclusive prefix scan operation.
+ + Returns the sum of `e` among all [=subgroups/active=] invocations in the + [=subgroup=] whose [=subgroup invocation IDs=] are less than or equal to + the current invocation's id. + + Note: equivalent to `subgroupExclusiveAdd(x) + x`. +
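The relationship between the exclusive and inclusive scans can be illustrated with a small, non-normative Python model. This is only a sketch: the function names, the list-based representation of a subgroup, and the use of `None` for invocations that do not execute the call are assumptions made for illustration and are not part of WGSL.

```python
def subgroup_exclusive_add(values, active):
    """Model subgroupExclusiveAdd for one subgroup.

    values[i] is the value of `e` held by subgroup invocation ID i.
    active[i] is True if that invocation is active for the operation.
    Inactive invocations do not execute the call, so they get None.
    """
    results = []
    running = 0  # the lowest active ID receives T(0)
    for value, is_active in zip(values, active):
        if is_active:
            results.append(running)  # sum of active values with lower IDs
            running += value
        else:
            results.append(None)
    return results


def subgroup_inclusive_add(values, active):
    """Model subgroupInclusiveAdd: like the exclusive scan, plus own value."""
    results = []
    running = 0
    for value, is_active in zip(values, active):
        if is_active:
            running += value
            results.append(running)  # sum of active values with IDs <= own
        else:
            results.append(None)
    return results


values = [3, 1, 4, 1, 5, 9, 2, 6]
active = [True, True, False, True, True, True, False, True]
excl = subgroup_exclusive_add(values, active)
incl = subgroup_inclusive_add(values, active)
```

In this model, for every active invocation the inclusive result equals the exclusive result plus the invocation's own value, and inactive invocations contribute nothing to either scan.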
+ +### `subgroupAll` ### {#subgroupall-builtin} + + + + +
Overload + + + @must_use fn subgroupAll(e : bool) -> bool + +
Description + Returns `true` if `e` is `true` for all [=subgroups/active=] invocations in the [=subgroup=]. +
+ +### `subgroupAnd` ### {#subgroupand-builtin} + + + + + +
Overload + + + @must_use fn subgroupAnd(e : T) -> T + +
Preconditions + `T` is [INTEGRAL] +
Description + Reduction operation.
+ + Returns the bitwise and (`&`) of `e` among all [=subgroups/active=] + invocations in the [=subgroup=]. +
+ +### `subgroupAny` ### {#subgroupany-builtin} + + + + +
Overload + + + @must_use fn subgroupAny(e : bool) -> bool + +
Description + Returns `true` if `e` is `true` for any [=subgroups/active=] invocations in the [=subgroup=]. +
+ +### `subgroupBallot` ### {#subgroupballot-builtin} + + + + +
Overload + + + @must_use fn subgroupBallot(pred : bool) -> vec4<u32> + +
Description + Returns a bitmask of the [=subgroups/active=] invocations in the + [=subgroup=] for whom `pred` is `true`.
+ + The x component of the return value contains invocations 0 through 31.
+ The y component of the return value contains invocations 32 through 63.
+ The z component of the return value contains invocations 64 through 95.
+ The w component of the return value contains invocations 96 through 127.
+ + Within each component, the IDs are in ascending order by bit position + (e.g. ID 32 is at bit position 0 in the y component). +
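The bit layout described above can be sketched as a non-normative Python model. The function name and the list representation of the `vec4<u32>` result are assumptions for illustration only.

```python
def subgroup_ballot(pred, active):
    """Model subgroupBallot: return [x, y, z, w] as four 32-bit masks.

    pred[i] is the predicate of subgroup invocation ID i, and
    active[i] is True if that invocation is active for the operation.
    """
    assert len(pred) == len(active) <= 128
    components = [0, 0, 0, 0]
    for invocation_id, (p, is_active) in enumerate(zip(pred, active)):
        if is_active and p:
            # Component = ID / 32, bit position = ID % 32.
            components[invocation_id // 32] |= 1 << (invocation_id % 32)
    return components


# A 64-wide subgroup where the predicate holds for IDs 0, 5, 32 and 63.
size = 64
pred = [i in (0, 5, 32, 63) for i in range(size)]
active = [True] * size
ballot = subgroup_ballot(pred, active)
```

Here IDs 0 and 5 set bits 0 and 5 of the x component, while IDs 32 and 63 set bits 0 and 31 of the y component. The z and w components stay zero because the modelled subgroup has only 64 invocations.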
+ +### `subgroupBroadcast` ### {#subgroupbroadcast-builtin} + + + + + +
Overload + + + @must_use fn subgroupBroadcast(e : T, id : I) -> T + +
Preconditions + `T` is [=type/concrete=] [=numeric scalar=] or [=numeric vector=]
+ `I` is [=u32=] or [=i32=] +
Description + Returns the value of `e` from the invocation whose [=subgroup invocation ID=] + matches `id` in the subgroup to all [=subgroups/active=] invocations in the [=subgroup=]. + + `id` [=shader-creation error|must=] be a [=const-expression=] in the range [0, 128). + + It is a [=dynamic error=] if `id` does not select an [=subgroups/active=] + invocation. + + Note: If a non-constant version of `id` is required, use + [[#subgroupshuffle-builtin|subgroupShuffle]] instead. +
+ +#### `subgroupBroadcastFirst` #### {#subgroupbroadcastfirst-builtin} + + + + + +
Overload + + + @must_use fn subgroupBroadcastFirst(e : T) -> T + +
Preconditions + `T` is [=type/concrete=] [=numeric scalar=] or [=numeric vector=] +
Description + Returns the value of `e` from the invocation that has the lowest + [=subgroup invocation ID=] among [=subgroups/active=] invocations in the + [=subgroup=] to all active invocations in the subgroup. +
+ +### `subgroupElect` ### {#subgroupelect-builtin} + + + + +
Overload + + + @must_use fn subgroupElect() -> bool + +
Description + Returns `true` if the current invocation has the lowest [=subgroup invocation ID=] + among [=subgroups/active=] invocations in the [=subgroup=]. +
+ +### `subgroupMax` ### {#subgroupmax-builtin} + + + + + +
Overload + + + @must_use fn subgroupMax(e : T) -> T + +
Preconditions + `T` is [=type/concrete=] [=numeric scalar=] or [=numeric vector=] +
Description + Reduction operation.
+ + Returns the maximum value of `e` among all [=subgroups/active=] invocations in the [=subgroup=]. +
+ +### `subgroupMin` ### {#subgroupmin-builtin} + + + + + +
Overload + + + @must_use fn subgroupMin(e : T) -> T + +
Preconditions + `T` is [=type/concrete=] [=numeric scalar=] or [=numeric vector=] +
Description + Reduction operation.
+ + Returns the minimum value of `e` among all [=subgroups/active=] invocations in the [=subgroup=]. +
+ +### `subgroupMul` ### {#subgroupmul-builtin} + + + + + +
Overload + + + @must_use fn subgroupMul(e : T) -> T + +
Preconditions + `T` is [=type/concrete=] [=numeric scalar=] or [=numeric vector=] +
Description + Reduction operation.
+ + Returns the product of `e` among all [=subgroups/active=] invocations in the [=subgroup=]. +
+ +#### `subgroupExclusiveMul` #### {#subgroupexclusivemul-builtin} + + + + + +
Overload + + + @must_use fn subgroupExclusiveMul(e : T) -> T + +
Preconditions + `T` is [=type/concrete=] [=numeric scalar=] or [=numeric vector=] +
Description + Exclusive prefix scan operation.
+ + Returns the product of `e` among all [=subgroups/active=] invocations in + the [=subgroup=] whose [=subgroup invocation IDs=] are less than the + current invocation's id. + + The value returned for the invocation with the lowest id among active invocations is `T(1)`. +
+ +#### `subgroupInclusiveMul` #### {#subgroupinclusivemul-builtin} + + + + + +
Overload + + + @must_use fn subgroupInclusiveMul(e : T) -> T + +
Preconditions + `T` is [=type/concrete=] [=numeric scalar=] or [=numeric vector=] +
Description + Inclusive prefix scan operation.
+ + Returns the product of `e` among all [=subgroups/active=] invocations in + the [=subgroup=] whose [=subgroup invocation IDs=] are less than or equal + to the current invocation's id. + + Note: equivalent to `subgroupExclusiveMul(x) * x`. +
+ +### `subgroupOr` ### {#subgroupor-builtin} + + + + + +
Overload + + + @must_use fn subgroupOr(e : T) -> T + +
Preconditions + `T` is [INTEGRAL] +
Description + Reduction operation.
+ + Returns the bitwise or (`|`) of `e` among all [=subgroups/active=] + invocations in the [=subgroup=]. +
+ +### `subgroupShuffle` ### {#subgroupshuffle-builtin} + + + + + +
Overload + + + @must_use fn subgroupShuffle(e : T, id : I) -> T + +
Preconditions + `T` is [=type/concrete=] [=numeric scalar=] or [=numeric vector=]
+ `I` is [=u32=] or [=i32=] +
Description + Returns `e` from the invocation whose [=subgroup invocation ID=] matches `id`. + + If `id` is outside the range [0, 128), then: + * It is a [=shader-creation error=] if `id` is a [=const-expression=]. + * It is a [=pipeline-creation error=] if `id` is an [=override-expression=]. + + An [=indeterminate value=] is returned if `id` does not select an + [=subgroups/active=] invocation. +
+ +#### `subgroupShuffleDown` #### {#subgroupshuffledown-builtin} + + + + + +
Overload + + + @must_use fn subgroupShuffleDown(e : T, delta : u32) -> T + +
Preconditions + `T` is [=type/concrete=] [=numeric scalar=] or [=numeric vector=] +
Description + Returns `e` from the invocation whose [=subgroup invocation ID=] + matches `subgroup_invocation_id + delta` for the current invocation. + + If `delta` is greater than 127, then: + * It is a [=shader-creation error=] if `delta` is a [=const-expression=]. + * It is a [=pipeline-creation error=] if `delta` is an [=override-expression=]. + + A [=trigger/subgroup_uniformity=] [=diagnostic=] is [=triggered=] if + `delta` is not a [=uniform value=]. + An [=indeterminate value=] is returned if `subgroup_invocation_id + delta` + does not select an [=subgroups/active=] invocation or if `delta` is not a + uniform value within the subgroup. +
+ +#### `subgroupShuffleUp` #### {#subgroupshuffleup-builtin} + + + + + +
Overload + + + @must_use fn subgroupShuffleUp(e : T, delta : u32) -> T + +
Preconditions + `T` is [=type/concrete=] [=numeric scalar=] or [=numeric vector=] +
Description + Returns `e` from the invocation whose [=subgroup invocation ID=] + matches `subgroup_invocation_id - delta` for the current invocation. + + If `delta` is greater than 127, then: + * It is a [=shader-creation error=] if `delta` is a [=const-expression=]. + * It is a [=pipeline-creation error=] if `delta` is an [=override-expression=]. + + A [=trigger/subgroup_uniformity=] [=diagnostic=] is [=triggered=] if + `delta` is not a [=uniform value=]. + An [=indeterminate value=] is returned if `subgroup_invocation_id - delta` + does not select an [=subgroups/active=] invocation or if `delta` is not a + uniform value within the subgroup. +
+ +#### `subgroupShuffleXor` #### {#subgroupshufflexor-builtin} + + + + + +
Overload + + + @must_use fn subgroupShuffleXor(e : T, mask : u32) -> T + +
Preconditions + `T` is [=type/concrete=] [=numeric scalar=] or [=numeric vector=] +
Description + Returns `e` from the invocation whose [=subgroup invocation ID=] + matches `subgroup_invocation_id ^ mask` for the current invocation. + + If `mask` is greater than 127, then: + * It is a [=shader-creation error=] if `mask` is a [=const-expression=]. + * It is a [=pipeline-creation error=] if `mask` is an [=override-expression=]. + + A [=trigger/subgroup_uniformity=] [=diagnostic=] is [=triggered=] if + `mask` is not a [=uniform value=]. + An [=indeterminate value=] is returned if `subgroup_invocation_id ^ mask` + does not select an [=subgroups/active=] invocation or if `mask` is not a + uniform value within the subgroup. +
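As a non-normative illustration, the XOR shuffle can be modelled in Python and used to build a butterfly reduction. The function names are hypothetical, the subgroup is represented as a plain list, and `None` stands in for an indeterminate value. The model assumes `mask` is uniform across the subgroup.

```python
def subgroup_shuffle_xor(values, active, mask):
    """Model subgroupShuffleXor for one subgroup with a uniform mask.

    Invocation i reads the value held by invocation (i ^ mask).
    None models both "did not execute" and "indeterminate value".
    """
    size = len(values)
    results = []
    for invocation_id in range(size):
        if not active[invocation_id]:
            results.append(None)  # inactive invocations do not execute the call
            continue
        source = invocation_id ^ mask
        if source < size and active[source]:
            results.append(values[source])
        else:
            results.append(None)  # source is not an active invocation
    return results


def butterfly_sum(values):
    """Sum across a fully active, power-of-two sized subgroup via XOR shuffles."""
    size = len(values)
    active = [True] * size
    current = list(values)
    mask = 1
    while mask < size:
        partner = subgroup_shuffle_xor(current, active, mask)
        current = [own + other for own, other in zip(current, partner)]
        mask *= 2
    return current


swapped = subgroup_shuffle_xor(list(range(8)), [True] * 8, 1)
totals = butterfly_sum([1, 2, 3, 4, 5, 6, 7, 8])
```

With `mask` equal to 1, neighbouring pairs exchange values. Doubling the mask at each step leaves every invocation holding the full sum, which is one common use of this built-in when all invocations are active.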
+ +### `subgroupXor` ### {#subgroupxor-builtin} + + + + + +
Overload + + + @must_use fn subgroupXor(e : T) -> T + +
Preconditions + `T` is [INTEGRAL] +
Description + Reduction operation.
+ + Returns the bitwise xor (`^`) of `e` among all [=subgroups/active=] + invocations in the [=subgroup=]. +
+ +## Quad Operations ## {#quad-builtin-functions} + +See [[#quad-ops]]. + +Calls to these functions: +* [=shader-creation error|Must=] only be used in a [=fragment=] or [=compute=] shader stage. +* [=Trigger=] a [=trigger/subgroup_uniformity=] [=diagnostic=] if [=uniformity analysis=] + cannot prove the call is in [=uniform control flow=]. + +Note: For the [=compute shader stage=], the scope of uniform control flow is +the [=compute shader stage/workgroup=]. +For the [=fragment shader stage=], the scope of uniform control flow is the +[=draw command=]. +Both of these scopes are larger than [=quad=]. + +### `quadBroadcast` ### {#quadbroadcast-builtin} + + + + + +
Overload + + + @must_use fn quadBroadcast(e : T, id : I) -> T + +
Preconditions + `T` is [=type/concrete=] [=numeric scalar=] or [=numeric vector=]
+ `I` is [=u32=] or [=i32=] +
Description + Returns the value of `e` from the invocation whose [=quad invocation ID=] + matches `id` in the quad to all [=subgroups/active=] invocations in the quad. + + `id` [=shader-creation error|must=] be a [=const-expression=] in the range [0, 4). + + An [=indeterminate value=] is returned if `id` does not select an + [=subgroups/active=] invocation. + + Note: Unlike [[#subgroupbroadcast-builtin|subgroupBroadcast]], there is currently + no non-constant alternative. +
+ +### `quadSwapDiagonal` ### {#quadswapdiagonal-builtin} + + + + + +
Overload + + + @must_use fn quadSwapDiagonal(e : T) -> T + +
Preconditions + `T` is [=type/concrete=] [=numeric scalar=] or [=numeric vector=] +
Description + Returns the value of `e` from the invocation in the [=quad=] with the opposite coordinates. + That is: + * IDs 0 and 3 swap. + * IDs 1 and 2 swap. +
+ +### `quadSwapX` ### {#quadswapx-builtin} + + + + + +
Overload + + + @must_use fn quadSwapX(e : T) -> T + +
Preconditions + `T` is [=type/concrete=] [=numeric scalar=] or [=numeric vector=] +
Description + Returns the value of `e` from the invocation in the [=quad=] sharing the same X dimension. + That is: + * IDs 0 and 1 swap. + * IDs 2 and 3 swap. +
+ +### `quadSwapY` ### {#quadswapy-builtin} + + + + + +
Overload + + + @must_use fn quadSwapY(e : T) -> T + +
Preconditions + `T` is [=type/concrete=] [=numeric scalar=] or [=numeric vector=] +
Description + Returns the value of `e` from the invocation in the [=quad=] sharing the same Y dimension. + That is: + * IDs 0 and 2 swap. + * IDs 1 and 3 swap. +
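The three quad swaps can be summarised with a short, non-normative Python model. The function names are hypothetical, and expressing each swap as an XOR on the quad invocation ID is a convenience of this sketch that follows from the ID layout (0 upper-left, 1 upper-right, 2 lower-left, 3 lower-right). It is not wording taken from the specification, and it assumes all four invocations are active.

```python
def quad_swap_x(quad):
    """IDs 0 and 1 swap, IDs 2 and 3 swap (flip bit 0 of the quad ID)."""
    return [quad[i ^ 1] for i in range(4)]


def quad_swap_y(quad):
    """IDs 0 and 2 swap, IDs 1 and 3 swap (flip bit 1 of the quad ID)."""
    return [quad[i ^ 2] for i in range(4)]


def quad_swap_diagonal(quad):
    """IDs 0 and 3 swap, IDs 1 and 2 swap (flip both bits of the quad ID)."""
    return [quad[i ^ 3] for i in range(4)]


# Values held by the upper-left, upper-right, lower-left, lower-right invocations.
quad = ["ul", "ur", "ll", "lr"]
after_x = quad_swap_x(quad)
after_y = quad_swap_y(quad)
after_diagonal = quad_swap_diagonal(quad)
```

Applying any of the three swaps twice returns the original arrangement, and in this model a diagonal swap is the same as an X swap followed by a Y swap.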
+ # Grammar for Recursive Descent Parsing # {#grammar-recursive-descent} This section is non-normative. From 3dc53130dafbd7754134058d2863b229102fabd2 Mon Sep 17 00:00:00 2001 From: Mehmet Oguz Derin Date: Wed, 22 Jan 2025 03:56:46 +0900 Subject: [PATCH 284/285] Fix tags that cause build errors in the WGSL spec (#5046) --- wgsl/index.bs | 22 ++++++++++------------ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/wgsl/index.bs b/wgsl/index.bs index 3a2060fab9..79a396774c 100644 --- a/wgsl/index.bs +++ b/wgsl/index.bs @@ -694,14 +694,14 @@ A [=diagnostic=] has the following properties: * A [=diagnostic/triggering location=]. The severity of a diagnostic is one of the following, ordered from greatest to least: -: error +: error :: The diagnostic is an error. This corresponds to a [=shader-creation error=] or to a [=pipeline-creation error=]. -: warning +: warning :: The diagnostic describes an anomaly that merits the attention of the application developer, but is not an error. -: info +: info :: The diagnostic describes a notable condition that merits attention of the application developer, but is not an error or warning. -: off +: off :: The diagnostic is disabled. It will not be conveyed to the application. The name of a [=diagnostic/triggering rule=] is either: @@ -7121,7 +7121,7 @@ first a [=read access=] gets the old value, and then a [=write access=] stores t A compound assignment can rewritten as different WGSL code that uses a [=simple assignment=] instead. The idea is to use a pointer to hold the result of evaluating the reference once. -

For example, +

For example, when |e1| is *not* a reference to a component inside a vector, then
|e1|` += `|e2|; @@ -7131,9 +7131,8 @@ can be rewritten as `{ let p = &(`|e1|`); *p = *p + (`|e2|`); }`
where the identifier `p` is chosen to be different from all other identifiers in the program. -

- -

When +

+
When |e1| is a reference to a component inside a vector, the above technique needs to be modified because WGSL does not allow [[#address-of-expr|taking the address]] in that case. For example, if ev is a reference to a vector, the statement @@ -9718,7 +9717,7 @@ User-defined data can be passed as input to the start of a pipeline, passed between stages of a pipeline or output from the end of a pipeline. Each user-defined input datum and -user-defined output datum [=shader-creation error|must=]: +user-defined output datum [=shader-creation error|must=]: * be of [=numeric scalar=] type or [=numeric vector=] type. * be assigned an IO location. See [[#input-output-locations]]. @@ -10002,7 +10001,7 @@ is determined from the size of the corresponding {{GPUBufferBinding}}: * Let |EBS| be the [=effective buffer binding size=] for the {{GPUBufferBinding}} bound to the pipeline binding address corresponding to the storage buffer variable. -* Then NRuntime, i.e. +* Then NRuntime, i.e. the number of elements in the runtime-sized array, is the largest integer such that [=SizeOf=](|T|) ≤ |EBS|. @@ -10103,7 +10102,6 @@ The following table shows examples of [=NRuntime=] for the `point` member of the 102531[=truncate=]( ( 1025 - 16 ) ÷ 32) ) 103931[=truncate=]( ( 1039 - 16 ) ÷ 32) ) 104032[=truncate=]( ( 1040 - 16 ) ÷ 32) ) -
@@ -12479,7 +12477,7 @@ An [[!IEEE-754|IEEE-754]] binary floating point type approximates the [=extended * A 1-bit sign field. * A fixed-width exponent field. * A fixed-width trailing significand field. - * An integer-valued exponent bias related to interpretation of the [=ieee754/exponent field=]. + * An integer-valued exponent bias related to interpretation of the [=ieee754/exponent field=]. The finite range of a floating point type is the [=interval=] [|low|, |high|], where |low| is the lowest finite value in the type, and |high| is the highest finite value in the type. From d53c4c12bb2a94d5220ae4cc003239e869d81eaa Mon Sep 17 00:00:00 2001 From: Kai Ninomiya Date: Tue, 21 Jan 2025 19:09:26 -0800 Subject: [PATCH 285/285] [editorial] API spec: Fix incorrect end tags (#5047) --- spec/index.bs | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/spec/index.bs b/spec/index.bs index 2485ee8301..1a1f4a1ce5 100644 --- a/spec/index.bs +++ b/spec/index.bs @@ -2374,6 +2374,7 @@ interface GPU { 1. Return either {{GPUTextureFormat/"rgba8unorm"}} or {{GPUTextureFormat/"bgra8unorm"}}, depending on which format is optimal for displaying WebGPU canvases on this system. +
@@ -8796,27 +8797,27 @@ location: - - - - - -
RGBAsrc + RGBAsrc Color output by the fragment shader for the color attachment. If the shader doesn't return an alpha channel, src-alpha blend factors cannot be used.
RGBAsrc1 + RGBAsrc1 Color output by the fragment shader for the color attachment with "@blend_src" attribute equal to `1`. If the shader doesn't return an alpha channel, src1-alpha blend factors cannot be used.
RGBAdst + RGBAdst Color currently in the color attachment. Missing green/blue/alpha channels default to `0, 0, 1`, respectively.
RGBAconst + RGBAconst The current {{RenderState/[[blendConstant]]}}.
RGBAsrcFactor + RGBAsrcFactor The source blend factor components, as defined by {{GPUBlendComponent/srcFactor}}.
RGBAdstFactor + RGBAdstFactor The destination blend factor components, as defined by {{GPUBlendComponent/dstFactor}}.
@@ -12251,7 +12252,6 @@ called the render pass encoder can no longer be used. update the attachment. Validation that requires the store op to not be provided for read-only attachments is done in [$GPURenderPassDepthStencilAttachment/GPURenderPassDepthStencilAttachment Valid Usage$].
-