Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix out of memory building shaders #12166

Closed
wants to merge 7 commits into from
Closed

Conversation

jjspace
Copy link
Contributor

@jjspace jjspace commented Aug 28, 2024

Description

Our glslToJavascript build function had some outstanding issues that lead to huge spikes in memory usage when running the local development server. After some performance digging I made a couple small but very impactful changes

  • The check for the last modified date for the minifyState cache was always true because the writeFile always updated it regardless if it changed. This caused every file to need to be rebuilt. Now it checks first
    • huge reduction in files that need to be processed after the initial run
  • The file watcher in the server triggered a full rebuild for all files even if a single one was changed
    • Augmented the glslToJavascript function to take specific files to look at. Guarantees it runs super fast on a single file when one changes and no longer bottlenecks
  • Pulled out some path variables that do not need to be calculated every loop per file

Issue number and link

Fixes #12164

Testing plan

  • Run all the build functions and ensure they still work as expected
  • Run npm start for the local server
  • Modify any shader files and make sure memory does not skyrocket
    • Try this on main and see that it does

Author checklist

  • I have submitted a Contributor License Agreement
  • I have added my name to CONTRIBUTORS.md
  • I have updated CHANGES.md with a short summary of my change
  • I have added or updated unit tests to ensure consistent code coverage
  • I have updated the inline documentation, and included code examples where relevant
  • I have performed a self-review of my code

@jjspace jjspace requested review from ggetz and javagl August 28, 2024 18:39
Copy link

Thank you for the pull request, @jjspace!

✅ We can confirm we have a CLA on file for you.

@javagl
Copy link
Contributor

javagl commented Aug 29, 2024

As mentioned in the issue: I also noticed this, but didn't pinpoint the exact workflow that caused it. I now tried to reproduce it systematically. But that turns out to be a bit difficult.

Issue details

The basic workflow to reproduce the issue:

  • Check out main (as of the time of writing this)
  • npm install/run build/run start
  • Open a Sandcastle like http://localhost:8080/Apps/Sandcastle/?src=3D%20Models.html
  • Open the developer console with F12 (this seems to have an effect, but not 100% sure - https://shouldiblamecaching.com/ )
  • Modify and save a shader file
  • Re-run the sandcastle with "Run (F8)" (this seems to force the actual re-build)

The last two steps are the important ones: It seems like not all kinds of modifications cause the error. A somewhat reliable way of enforcing "errors" was to insert....

    ...
    color = vec4(0.1, 1.0, 1.0, 1.0); // THIS LINE
    out_FragColor = color;
}

at the end of packages\engine\Source\Shaders\Model\ModelFS.glsl (which then also serves as a visual hint of whether the modified or unmodified shader is used).

Sometimes it works. But in doubt, one can modify that line (e.g. change the 'red' component of the color) once or twice. Eventually, it should cause ~"an error" during the build.

This error will announce itself by the growing memory usage. And eventually, it will

  • either cause an out-of-memory error
  • or bail out with another error

The out-of-memory error is one of the scary looking ones:

<--- Last few GCs --->

[13352:000001F8D7FE79F0]   184215 ms: Mark-sweep 4046.7 (4137.3) -> 4034.8 (4141.3) MB, 1371.5 / 0.0 ms  (average mu = 0.476, current mu = 0.027) allocation failure; scavenge might 
not succeed
[13352:000001F8D7FE79F0]   186660 ms: Mark-sweep 4050.6 (4141.3) -> 4038.8 (4145.3) MB, 2419.0 / 0.0 ms  (average mu = 0.240, current mu = 0.010) allocation failure; scavenge might 
not succeed

<--- JS stacktrace --->

 1: 00007FF6EE2A806F node_api_throw_syntax_error+175583
 2: 00007FF6EE22A976 v8::internal::wasm::WasmCode::safepoint_table_offset+59766
 3: 00007FF6EE22C660 v8::internal::wasm::WasmCode::safepoint_table_offset+67168
 4: 00007FF6EECD73A4 v8::Isolate::ReportExternalAllocationLimitReached+116
 5: 00007FF6EECC2732 v8::Isolate::Exit+674
 6: 00007FF6EEB4462C v8::internal::EmbedderStackStateScope::ExplicitScopeForTesting+124
 7: 00007FF6EEB4184B v8::internal::Heap::CollectGarbage+3963
 8: 00007FF6EEB57A83 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath+2099
 9: 00007FF6EEB5832D v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath+93
10: 00007FF6EEB67AF0 v8::internal::Factory::NewFillerObject+816
11: 00007FF6EE858535 v8::internal::DateCache::Weekday+1349
12: 00007FF6EED752F1 v8::internal::SetupIsolateDelegate::SetupHeap+558193
13: 00007FF6EED2C4CF v8::internal::SetupIsolateDelegate::SetupHeap+259663
14: 00007FF66F15E431

But for me, more frequently, it ended with this one

[Error: EPERM: operation not permitted, open 'C:\cesium\packages\engine\Source\Shaders\Builtin\CzmBuiltins.js'] {
  errno: -4048,
  code: 'EPERM',
  syscall: 'open',
  path: 'C:\\cesium\\packages\\engine\\Source\\Shaders\\Builtin\\CzmBuiltins.js'
}

What's particularly annoying: It seems to leave ~"some of the build output in a messed-up state". Just trying to restart will then cause a

Error: Build failed with 1 error:
packages/engine/Source/Renderer/ShaderSource.js:4:24: ERROR: Could not resolve "../Shaders/Builtin/CzmBuiltins.js"

So a fresh npm run build will be neccessary.


"Review"

Using the state of this PR, after modifying the ModelFS.glsl as described above, re-running the sandcastle always caused the following error for me:

[Cesium WebGL] Fragment shader compile log: ERROR: 0:6: 'czm_textureCube' : no matching overloaded function found
ERROR: 0:6: '=' : dimension mismatch
ERROR: 0:6: '=' : cannot convert from 'const mediump float' to 'highp 4-component vector of float'
ERROR: 0:7: 'czm_gammaCorrect' : no matching overloaded function found
ERROR: 0:7: 'rgb' :  field selection requires structure, vector, or interface block on left hand side
ERROR: 0:7: 'constructor' : not enough data provided for construction

[Cesium WebGL]  Fragment shader source:
#version 300 es
#ifdef GL_FRAGMENT_PRECISION_HIGH
    precision highp float;
    precision highp int;
#else
    precision mediump float;
    precision mediump int;
    #define highp mediump
#endif

#define OES_texture_float_linear

#define OES_texture_float


#line 0
layout(location = 0) out vec4 out_FragColor;

uniform float czm_morphTime;


#line 0
uniform samplerCube u_cubeMap;

in vec3 v_texCoord;

void main()
{
    vec4 color = czm_textureCube(u_cubeMap, normalize(v_texCoord));
    out_FragColor = vec4(czm_gammaCorrect(color).rgb, czm_morphTime);
}

An error occurred while rendering.  Rendering has stopped.
RuntimeError: Fragment shader failed to compile.  Compile log: ERROR: 0:6: 'czm_textureCube' : no matching overloaded function found
ERROR: 0:6: '=' : dimension mismatch
ERROR: 0:6: '=' : cannot convert from 'const mediump float' to 'highp 4-component vector of float'
ERROR: 0:7: 'czm_gammaCorrect' : no matching overloaded function found
ERROR: 0:7: 'rgb' :  field selection requires structure, vector, or interface block on left hand side
ERROR: 0:7: 'constructor' : not enough data provided for construction

Error
    at new RuntimeError (http://localhost:8080/Build/CesiumUnminified/index.js:11491:11)
    at createAndLinkProgram (http://localhost:8080/Build/CesiumUnminified/index.js:26973:9)
    at reinitialize (http://localhost:8080/Build/CesiumUnminified/index.js:27126:19)
    at initialize2 (http://localhost:8080/Build/CesiumUnminified/index.js:27121:3)
    at ShaderProgram._bind (http://localhost:8080/Build/CesiumUnminified/index.js:27180:3)
    at beginDraw (http://localhost:8080/Build/CesiumUnminified/index.js:34576:17)
    at Context.draw (http://localhost:8080/Build/CesiumUnminified/index.js:34670:3)
    at DrawCommand.execute (http://localhost:8080/Build/CesiumUnminified/index.js:24165:11)
    at executeCommand (http://localhost:8080/Build/CesiumUnminified/index.js:224330:13)
    at executeCommands2 (http://localhost:8080/Build/CesiumUnminified/index.js:224428:7)

Doing a fresh npm run build/npm run start fixed this and used the modified shader, but ... something's still wrong here.

Copy link
Contributor

@javagl javagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I did not (yet) review the code changes (because I'll have to read that more thoroughly, and don't really know the build code even before this change), but ... based on a test, as described in the comment, it seems like there's something wrong...)

@jjspace
Copy link
Contributor Author

jjspace commented Aug 29, 2024

This error will announce itself by the growing memory usage. And eventually, it will

  • either cause an out-of-memory error
  • or bail out with another error

I believe these are both due to the root cause I sorta highlighted in the original description. The read operations across the ~100+ shader files we have can take around 700ms on my system after I did some performance logging. These are currently done async so they can happen in different orders. ON TOP of that multiple "build calls" to glslToJavascript can be kicked off before the previous one finishes. I think this is what triggers the memory growth and the out-of-memory. The other error I think happens because of unique timing of different glslToJavascript calls ending in overlapped states so the file it's trying to access sometimes isn't available? Just a best guess since I haven't seen that myself.

Doing a fresh npm run build/npm run start fixed this and used the modified shader, but ... something's still wrong here.

I can observe this error now too but don't see it in main either. I think this might be something I missed/misunderstood about the final build steps and I might be doing an early return that we can't do or missed steps that I didn't think were necessary. I'll take another look

Edit: Ok I figured out what is going on. When we run glslToJavascript we progressively build a list of all the builtin functions that are discovered then generate CzmBuiltins.js with those. With my updates we're not actually processing the majority of shader files anymore. This is really good for the pure build speed but it means we aren't able to build that list of builtins because most aren't processed at all. I was trying to avoid rebuilding this whole function but it just seems so flawed in some of it's processes that I might have to. I'll try and see what I can do in the meantime though.

@jjspace
Copy link
Contributor Author

jjspace commented Aug 29, 2024

@javagl this should be fixed now. The builtins were getting regenerated but empty with my first change. I've now switched to always generate the full file regardless if we've already processed it in builting. This has the benefits of still only needing to build glsl files that have actually changed but still generating the full required builtins file for all existing builtin/*.js files.

@javagl
Copy link
Contributor

javagl commented Aug 29, 2024

The other error I think happens because of unique timing of different glslToJavascript calls ending in overlapped states so the file it's trying to access sometimes isn't available? Just a best guess since I haven't seen that myself.

The symptoms and error message suggested some concurrency issue - roughly: Trying to open/modify a file that is already opened. Whether or not that is caused by the memory issue, or an unrelated issue, is hard to say. I took care to only modify one file, and wait for the build to complete, but when the build itself is accessing multiple files concurrently, this may not avoid it.

(One "stress test" that I indended to do - once this PR avoided the memory issue - was to quickly edit a file, hit "reload", edit and save the file, and hit reload again. Well, how else to ~"provoke/test/debug" concurrency issues....? If this causes some access error (independent of the memory issue), it can probably be tracked separately).

I'll try out that state early tomorrow.

@jjspace
Copy link
Contributor Author

jjspace commented Aug 30, 2024

Just recording some offline conversation with @ggetz. It's possible the better change here would be to actually load the glsl files directly instead of having to build them in the first place like we do.
Given that and to avoid changing the build right before the release, don't merge this until after next tuesday so the release is unaffected.

@javagl
Copy link
Contributor

javagl commented Aug 30, 2024

Sorry for the delay. (I've been caught up in another PR, where this build issue has actually bugged me for a while - so it's good when this is about to be fixed).

With the current state, it does not cause any delays or runaway memory usage for me. But it also doesn't seem to pick up the changes in the shader file. The workflow that I just tried:

  • npm run build/npm run start
  • Open the "3D Models" sandcastle
  • add this color = vec4(0.2, 1.0, 1.0, 1.0); at the bottom of ModelFS.glsl
  • Hit 'Run (F8)'
  • It does print "Built index.js in 0.4 seconds.". But the plane still has its original colors.

When shutting down and restarting, and then hitting 'Run (F8)', it does pick up the change.

@ggetz
Copy link
Contributor

ggetz commented Sep 3, 2024

Just recording some offline conversation with @ggetz. It's possible the better change here would be to actually load the glsl files directly instead of having to build them in the first place like we do.

To elaborate on this, I think what we'd be able to do is pass a loader option to esbuild in order to load .glsl files as text. Then replace each **/Shaders/*.js import with the corresponding **/Shaders/*.glsl import. During dev, this means that updates to the GLSL files should now be able to be managed alongside JS in the same manner.

However, this won't account for built-ins or for the custom GLSL "minification" which we do during release. One option would be to create a resolve plugin to handle these cases.

@ggetz
Copy link
Contributor

ggetz commented Sep 18, 2024

@jjspace Are there plans to move forward with this soon? Or should we close for the time-being?

The last commit before Prettier v3 was used to reformat the code
This commit reformats all code using prettier v3 relative to the pre-prettier-v3 tag
@ggetz
Copy link
Contributor

ggetz commented Oct 3, 2024

@jjspace I'm going to close for the time being. Feel free to re-open once this is ready for review again!

@ggetz ggetz closed this Oct 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Development server memory skyrockets after changing shaders
3 participants