Options cost estimates

siliconvoodoo edited this page May 16, 2023 · 7 revisions

Introduction

To help engineers and technical artists optimize their projects, o3de has a flexible concept of shader builds based on a tree of "virtual" variants that may or may not physically exist. A variant physically exists when a bytecode has been built with hardcoded values for the options of that variant. If it is not present, a dynamic fallback is used in which the option is a variable. The root is the bytecode with all options as variables. (They are unpacked from a bitfield in an auto-generated fallback key, usually a uint4, packed in a constant buffer with the rest of the SRG constants.)
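To make the bitfield idea concrete, here is a minimal Python sketch of how one option value could be pulled out of a packed key. The layout is an assumption made for illustration (four 32-bit words, word 0 holding the lowest bits); the helper name unpack_option is hypothetical, and the real unpacking is done by auto-generated shader code. The offset and size parameters mirror the keyOffset and keySize fields that azslc reports with --options.

```python
def unpack_option(key_words, key_offset, key_size):
    """Extract a key_size-bit field starting at bit key_offset.

    key_words is a list of 32-bit words standing in for the uint4 fallback key.
    Assumed layout (illustration only): word 0 holds the lowest bits.
    """
    packed = 0
    for index, word in enumerate(key_words):
        packed |= (word & 0xFFFFFFFF) << (32 * index)
    mask = (1 << key_size) - 1
    return (packed >> key_offset) & mask


# A made-up key whose lowest word is 0b0110.
key = [0b0110, 0, 0, 0]
print(unpack_option(key, 0, 2))  # bits 0..1 -> 2
print(unpack_option(key, 2, 1))  # bit 2 -> 1
```

A baked variant does not need this step for the options it hardcodes, which is the whole point of baking.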

Here is the o3de document page about them: https://www.o3de.org/docs/atom-guide/dev-guide/shaders/azsl/shader-variant-options/

The Shader Management Console (SMC) is a tool for experimenting with the presence or absence of physical bytecodes, depending on the options the engineer or designer deems important.

Here is the o3de wiki document user guide for SMC: https://github.com/o3de/o3de/wiki/%5BShader-Management-Console%5D-User-Guide

To help with that decision, we optionally use AMD RGA to count the number of registers used by a variant.
Secondly, AZSLc computes an estimate that we can call the "impact cost" of an option. The intention is to provide a reasonable default order of priority for variant baking, so that the variant bytecodes with options-as-constants get baked first for the most impactful options.

Gist

As a supporting example, let's take this extract from the enhanced PBR forward pass shader of o3de:

void ApplyDirectLighting( SurfaceData_EnhancedPBR  surface, inout  LightingData_BasePBR  lightingData, float4 screenUv)
{
    if( IsDirectLightingEnabled() )
    {
        if (o_enableDirectionalLights)
        {
            ApplyDirectionalLights(surface, lightingData, screenUv);
        }
        if (o_enablePunctualLights)
        {
            ApplySimplePointLights(surface, lightingData);
            ApplySimpleSpotLights(surface, lightingData);
        }
        if (o_enableAreaLights)
        {
            ApplyPointLights(surface, lightingData);
            ApplyDiskLights(surface, lightingData);
            ApplyCapsuleLights(surface, lightingData);
            ApplyQuadLights(surface, lightingData);
            ApplyPolygonLights(surface, lightingData);
        }
    }
    else if(IsDebuggingEnabled_PLACEHOLDER() && GetRenderDebugViewMode() == RenderDebugViewMode::CascadeShadows)
    {
        if (o_enableDirectionalLights)
        {
            ApplyDirectionalLights(surface, lightingData, screenUv);
        }
    }
}

The options o_enableDirectionalLights, o_enablePunctualLights and o_enableAreaLights guard the execution of code blocks. We can therefore consider that the cost of, say, the o_enablePunctualLights option is the cost of ApplySimplePointLights plus the cost of ApplySimpleSpotLights.

This is how AZSLc will estimate the "impact cost" of options.
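The idea can be sketched in a few lines of Python. This is a toy model, not AZSLc's actual implementation: the per-function costs, the ComputeLighting helper and the table names are all made up for illustration. It only shows the principle of summing the cost of every call guarded by an option, with memoization so each function is costed once.

```python
from functools import lru_cache

# Made-up data: function -> (own cost, functions it calls). Not real AZSLc numbers.
FUNCTIONS = {
    "ApplySimplePointLights": (100, ["ComputeLighting"]),
    "ApplySimpleSpotLights": (120, ["ComputeLighting"]),
    "ComputeLighting": (40, []),  # hypothetical shared helper
}

# Calls found inside the blocks guarded by each option (hypothetical table).
GUARDED_CALLS = {
    "o_enablePunctualLights": ["ApplySimplePointLights", "ApplySimpleSpotLights"],
}


@lru_cache(maxsize=None)
def function_cost(name):
    """Own cost of a function plus the cost of everything it calls."""
    own_cost, callees = FUNCTIONS[name]
    return own_cost + sum(function_cost(callee) for callee in callees)


def option_cost(option):
    """Sum of the costs of all calls guarded by the option."""
    return sum(function_cost(name) for name in GUARDED_CALLS[option])


print(option_cost("o_enablePunctualLights"))  # (100 + 40) + (120 + 40) = 300
```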

A case study through manual execution

Let's analyze the practical results we have as of the latest PR for that feature: https://github.com/o3de/o3de-azslc/pull/85

First, we need to find two things: the azslc executable, and the .azslin file prepared by the asset processor (AP). Let's use the Everything application (comparable to Unix's locate). I got a bunch of results because I version the binaries for regression testing, but in your case you want the one you just built from the relevant git branch, so it will look like o3de-azslc/build...

Now for the input shader. You first need to have run a project, the editor or the sample viewer for instance, because we need a cache of assets made by the AP. The AP has shader builders that perform complex preparations before azslc is invoked, such as preprocessing and SRG header injection. The result is a .azslin file, and that is what azslc.exe can digest. (Counterintuitively, it cannot digest the source-controlled .azsl files from the git repo: they aren't ready for compilation, because shader building is a long chain of tools.)

Here is how I went about it: use a bit of wildcard, sort by size to get the biggest monster, and pick the DX12, non-customz file.

Then Shift + right-click -> Copy as path.

And here is the command line in my case:

$ "D:\o3de-azslc\build\win_x64\Release\azslc.exe" "D:\o3de-atom-sampleviewer\Cache\pc\materials\types\enhancedpbr_mainpipeline_forwardpass_enhancedlighting_dx12.azslin" --options

The result will look like:

{
   "ShaderOptions":[
      {
         "costImpact":7,
         "defaultValue":"",
         "keyOffset":0,
         "keySize":2,
         "kind":"user-defined",
         "name":"o_opacity_mode",
         "order":0,
         "range":false,
         "type":"OpacityMode",
         "values":[
            "OpacityMode::Opaque",
            "OpacityMode::Cutout",
            "OpacityMode::Blended",
            "OpacityMode::TintedTransparent"
         ]

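To list only the costs from that output, we can write a tiny Python filter such as this one:

import json
import sys

# Read the JSON emitted by `azslc --options` from stdin.
data = json.load(sys.stdin)
# Print each option's name next to its estimated impact cost.
for option in data["ShaderOptions"]:
    print(option["name"], " ", option["costImpact"])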

Now we can pipe the previous command into that Python script:

$ "D:\o3de-azslc\build\win_x64\Release\azslc.exe" "D:\o3de-atom-sampleviewer\Cache\pc\materials\types\enhancedpbr_mainpipeline_forwardpass_enhancedlighting_dx12.azslin" --options | py "D:\o3de-azslc\bin\win_x64\Release\filter.py"

Result:

o_opacity_mode   7
o_specularF0_enableMultiScatterCompensation   2
o_enableShadows   2489
o_enableDirectionalLights   4192
o_enablePunctualLights   392
o_enableAreaLights   3881
o_enableSphereLights   736
o_enableSphereLightShadows   310
o_enableDiskLights   756
o_enableDiskLightShadows   284
o_enableCapsuleLights   479
o_enableQuadLightLTC   1864
o_enableQuadLightApprox   1843
o_enablePolygonLights   464
o_enableIBL   1
...
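Since the goal is a baking priority, it can be handy to sort that list by cost rather than read it in declaration order. Below is a small sketch along those lines. It assumes only the "ShaderOptions", "name" and "costImpact" fields shown in the output above; rank_options is a hypothetical helper name, and the sample JSON is a hand-trimmed subset of the results listed above.

```python
import json


def rank_options(options_json):
    """Return (name, costImpact) pairs, most impactful option first."""
    data = json.loads(options_json)
    pairs = [(opt["name"], opt["costImpact"]) for opt in data["ShaderOptions"]]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hand-trimmed sample using a few of the values listed above.
sample = json.dumps({
    "ShaderOptions": [
        {"name": "o_opacity_mode", "costImpact": 7},
        {"name": "o_enableShadows", "costImpact": 2489},
        {"name": "o_enableDirectionalLights", "costImpact": 4192},
        {"name": "o_enableIBL", "costImpact": 1},
    ]
})

for name, cost in rank_options(sample):
    print(name, cost)
```

With the real output, the same function can be fed sys.stdin.read() in place of the sample string.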

Analysis of the analysis

o_enableIBL

Let's take a look at what's going on with o_enableIBL, and why it costs only 1. When we search through the .azslin file, we find only 2 occurrences: the declaration, and this line:

OUT.m_normal.a = EncodeUnorm2BitFlags(o_enableIBL, o_specularF0_enableMultiScatterCompensation);

At this stage, IBL is only a GBuffer flag, so its associated cost is merely that of reading it from the constant buffer (at worst). This is a subtlety that optimizing engineers will have to keep in mind: each shader does its own compartmentalized job, and its score does not necessarily reflect the whole cost of the feature indicated by the option, just the cost in that shader.

Lights

How about o_enableDirectionalLights with a score of 4192, versus o_enableAreaLights with a score of 3881? In the earlier code snippet we saw that o_enableDirectionalLights guards a call to ApplyDirectionalLights, whereas o_enableAreaLights guards 5 calls to the point, disk, capsule, quad and polygon light evaluators, which ought to be more complex than the directional one.

Let's use the --verbose flag of azslc to get more data points.

CLI: "D:\o3de-azslc\build\win_x64\Release\azslc.exe" "D:\o3de-atom-sampleviewer\Cache\pc\materials\types\enhancedpbr_mainpipeline_forwardpass_enhancedlighting_dx12.azslin" --options --verbose | grep 'ApplyDirectionalLights\|o_enableDirectionalLights' | grep -v 'seenat\|new'

The greps filter out the output of the earlier semantic analysis and symbol registration passes. Result:

214: var decl: o_enableDirectionalLights
10449: register func: /ApplyDirectionalLights full identity: /ApplyDirectionalLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR,?float4)
Analyzing /o_enableDirectionalLights
 /ApplyDirectionalLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR,?float4) non-memoized. discovering cost
 /ApplyDirectionalLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR,?float4) call score 2094 added
 /ApplyDirectionalLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR,?float4) call score 2094 added
/o_enableDirectionalLights final cost 4192
                        "name" : "o_enableDirectionalLights",
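If grep is not at hand, the same lines can be picked out with a short Python script. This is only a sketch: the two regular expressions are inferred from the few verbose lines shown above, not from any documented format, so they may need adjusting against other output. The summarize name is hypothetical.

```python
import re

# Patterns inferred from the sample lines above (assumption, not a documented format).
CALL_SCORE = re.compile(r"^\s*(?P<func>/\S+)\s+call score (?P<score>\d+) added")
FINAL_COST = re.compile(r"^(?P<option>/\S+) final cost (?P<cost>\d+)")


def summarize(lines):
    """Collect (function, score) call entries and per-option final costs."""
    calls, finals = [], {}
    for line in lines:
        match = CALL_SCORE.match(line)
        if match:
            calls.append((match.group("func"), int(match.group("score"))))
            continue
        match = FINAL_COST.match(line)
        if match:
            finals[match.group("option")] = int(match.group("cost"))
    return calls, finals


SIG = "/ApplyDirectionalLights(/SurfaceData_EnhancedPBR,/LightingData_BasePBR,?float4)"
verbose_lines = [
    "Analyzing /o_enableDirectionalLights",
    " " + SIG + " call score 2094 added",
    " " + SIG + " call score 2094 added",
    "/o_enableDirectionalLights final cost 4192",
]

calls, finals = summarize(verbose_lines)
print(len(calls), sum(score for _, score in calls), finals)
```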

The function guarded by the option actually scores 2094, about half the option's reported cost. It turns out that in the else if branch just below the normal shader execution path, a debug case calls the same Apply function again, which doubles the estimated cost. So we can manually correct the estimate and assume that the "real" score is closer to 2094 than to 4192. For baking variants, this makes the order of importance: o_enableAreaLights > o_enableDirectionalLights > o_enablePunctualLights.
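That manual correction can be written down as a tiny script. It is a sketch of the reasoning above, not something AZSLc does: the scores are the ones reported in this case study, and the table of duplicated debug calls is filled in by hand. The corrected_ranking name is hypothetical.

```python
# Scores reported by `azslc --options` in this case study.
reported = {
    "o_enableDirectionalLights": 4192,
    "o_enablePunctualLights": 392,
    "o_enableAreaLights": 3881,
}

# Hand-maintained: call score of ApplyDirectionalLights, counted a second time
# because of the debug-only branch.
debug_duplicates = {"o_enableDirectionalLights": 2094}


def corrected_ranking(reported_costs, duplicates):
    """Subtract the debug-only duplicates, then sort by decreasing cost."""
    corrected = {
        name: cost - duplicates.get(name, 0)
        for name, cost in reported_costs.items()
    }
    return sorted(corrected.items(), key=lambda item: item[1], reverse=True)


for name, cost in corrected_ranking(reported, debug_duplicates):
    print(name, cost)
```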
