Skip to content

Commit

Permalink
Merge pull request #1198 from CesiumGS/simple-performance-test
Browse files Browse the repository at this point in the history
First pass at performance testing
  • Loading branch information
kring authored Oct 16, 2023
2 parents 69f3dbf + 7b47a8b commit 20b6e63
Show file tree
Hide file tree
Showing 35 changed files with 1,173 additions and 56 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/testWindows.yml
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,7 @@ jobs:
timeout-minutes: 60
run: |
cd "${{ inputs.unreal-binaries-path }}"
./UnrealEditor-Cmd.exe "$env:TESTS_PROJECT_ROOT/TestsProject.uproject" -execcmds="Automation RunTests Cesium.;quit" -nullrhi -unattended -nosplash -ReportExportPath="$env:TESTS_PROJECT_LOGS"
./UnrealEditor-Cmd.exe "$env:TESTS_PROJECT_ROOT/TestsProject.uproject" -execcmds="Automation RunTests Cesium.Unit.;quit" -nullrhi -unattended -nosplash -ReportExportPath="$env:TESTS_PROJECT_LOGS"
- name: Display tests log
if: always()
run: |
Expand Down
1 change: 1 addition & 0 deletions Documentation/developer-setup-windows.md
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,7 @@ To build a "Release" build of cesium-native,
- Add a new configuration by clicking the `+` and choose `x64-Release`.
- Select the new "x64-Release" from the Solution Configuration dropdown.
- Right-click on `CMakeLists.txt` again and choose "Install".
> In Visual Studio, this defaults to the "RelWithDebInfo" configuration type. You can change this at any time
## Visual Studio Code
Expand Down
29 changes: 29 additions & 0 deletions Documentation/performance-profiling-setup-test.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
# Set up a repeatable test

We need an area of code to execute repeatedly, with as many variables locked down as possible.


### Set up Unreal
1) Open Unreal Editor (UnrealEditor.exe)
2) Create a blank map (project doesn't matter. Choose an existing one or create a new one)
3) Go to Edit->Plugins
4) Search for "Functional Testing plugin". Check it to enable it
![smaller](https://github.com/CesiumGS/cesium-unreal/assets/130494071/5a3bc9de-cdaf-4d9d-842d-104719426663)
5) Save all
6) Set this map as the 'Editor Startup Map' so it loads when starting from Visual Studio
![smaller 2](https://github.com/CesiumGS/cesium-unreal/assets/130494071/8ba5c6c2-8c97-4048-afe2-db74770d85cc)


### Build Release Code

We need to make sure all our C++ code is building in release mode, preferably with debug symbols.

> This assumes that you have already built your code successfully and are familiar with the concepts from our [developer setup guide](https://github.com/CesiumGS/cesium-unreal/blob/ue5-main/Documentation/developer-setup-windows.md). Although you could profile a debug build, it is typically more useful to build in release, since this is how a game is usually packaged.
1) If building the cesium-native library, make sure you are using a release configuration derived from "RelWithDebInfo"
2) Open your Unreal project's Visual Studio solution (.sln). This example uses the solution generated from [cesium-unreal-samples](https://github.com/CesiumGS/cesium-unreal-samples)
3) Choose "Development Editor"

![smaller 3](https://github.com/CesiumGS/cesium-unreal/assets/130494071/0e70065f-c717-466b-a92b-cab1dcfdd29b)

4) From the menu, choose Build -> Build Solution
83 changes: 83 additions & 0 deletions Documentation/performance-profiling-with-cpu-usage.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@

This guide will help you find performance problems in your C++ code using the [CPU Usage tool](https://learn.microsoft.com/en-us/visualstudio/profiling/beginners-guide-to-performance-profiling?view=vs-2022) included in Visual Studio's Diagnostic tools window.

The CPU Usage tool is easy to set up with minimal impact on how your app is built or how it runs. If you use Visual Studio often, you may have this running already. This is a sampling-based profiler, with pros and cons detailed [here](https://learn.microsoft.com/en-us/visualstudio/profiling/understanding-performance-collection-methods-perf-profiler?view=vs-2022).

# Set up a repeatable test

In this example, we will use our Cesium performance tests. Follow the steps outlined [here](https://github.com/CesiumGS/cesium-unreal/blob/profiling-documentation/Documentation/performance-profiling-setup-test.md).

# Prepare for capture

### Visual Studio

1) Open your project's Visual Studio solution (.sln). This example uses the solution generated from [cesium-unreal-samples](https://github.com/CesiumGS/cesium-unreal-samples)
2) From the menu, choose Debug->Windows->Show Diagnostic Tools
3) Configure it. Uncheck 'Memory Usage'. Under Settings, Uncheck "Enable CPU Profiling", we'll turn this back on later.

<img width="484" alt="DiagSetup" src="https://github.com/CesiumGS/cesium-unreal/assets/130494071/798d794c-19f0-4f15-93e1-1815e3f1e75b">

4) Optionally, find two places in your code to set breakpoints. In our example, performance test start / end marks are perfect.

![Breakpoint Set small](https://github.com/CesiumGS/cesium-unreal/assets/130494071/5a793b9c-fd68-42ed-96ae-6ec884c38951)

>We could profile the entire debugging session if we needed to. But it's generally good practice to reduce your timing capture as much as possible. This can improve responsiveness when using resource intensive profiling tools, like memory tracking.
# Run the timing capture session

1) From Visual Studio, start your debugging session (Debug->Start Debugging, F5)
2) Find the performance tests in Unreal. Tools->Test Automation
![Automation Window small](https://github.com/CesiumGS/cesium-unreal/assets/130494071/d27e7d67-3658-4cb2-ab10-777498cba0da)

3) Check "LoadTestDenver"
4) Click "Start Tests"
5) Your first break point should hit in Visual Studio
6) Go back to the Diagnostic Tools window, click on "Record CPU Profile". It should turn red.

![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/ce0c7e86-c1ef-4a01-97fd-c97275b6f62b)

7) Continue the debugging session (Debug->Continue, F5)
8) Your second break point should hit
9) Go back to the Diagnostic Tools window, you should now see a report

# Interpret the report

![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/a9fb3e0b-86f5-4239-b4ab-c7f9b1dba4a5)

This can be a bit daunting at first, but most profiling tools have a similar workflow.

### Start at the timeline

![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/da733adc-6cae-4c89-8a6c-01a367667a0d)

Note the highlighted area to the right where the CPU usage spikes. This corresponds to the breakpoints that we set.

All data from the report will reflect this selection only.

### Trace calls with high usage

From the main window, click on "Open Details"
![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/f34b5ee3-15b7-485a-a90a-8f71310b1b44)

The CPU Usage window will appear. Set "Current View" to "Functions", then find the 'Self CPU' column and sort descending (down arrow).

![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/20836178-8337-4d53-be58-f388db905f9f)

This window now shows the functions that have the highest time spent within themselves only. Useful for finding individual functions that are called often or need to be optimized.

In this example, `stbir_resample_horizontal_downsample` is of particular interest because it's in the code base we built. Entries with `[External]` or originate from an unfamiliar module are generally ignored, although it is useful to know we are calling into them.

Right click on the `stbir_resample_horizontal_downsample` row, select "View in Call Tree".

![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/e5a88c06-5b76-4a07-83be-db5147a961b2)

# Draw conclusions

The window above is starting to show some actionable information:
- 20% of the sampled CPU time was spent in `CesiumTextureUtility::loadTextureAnyThreadPart`. Basically, we're loading textures
- There are a variety of `stbir_XXX functions` that are taking the bulk of the time, and might be candidates for optimization
- The highest cost single function is `stbir_resample_horizontal_downsample`, but keep in mind the limits of a sampling profiler. We don't know how many times it was called, just that it was being executed ~6% of the time.

Are these functions worth investigating and potentially optimizing? Maybe. Again, know this is a sampling profiler. Even if you optimize the highest cost function to 0.001%, you are only improving CPU efficiency.

If your goal is to reach absolute numbers, like specific loading times, or frames per second, you may need another type of profiling tool.
108 changes: 108 additions & 0 deletions Documentation/performance-profiling-with-unreal-insights.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,108 @@
This guide will help you find performance problems in your C++ code using [Unreal Insights](https://docs.unrealengine.com/5.0/en-US/unreal-insights-in-unreal-engine/), included with Unreal Engine.

Unreal Insights can display the scope of timing events as well as activity across threads. There is minimal impact to app execution, and you can set up your own custom events. It provides more functionality than an exclusive [CPU sampling-based profiler](https://learn.microsoft.com/en-us/visualstudio/profiling/understanding-performance-collection-methods-perf-profiler?view=vs-2022), although both tools can complement each other.

# Set up a repeatable test

In this example, we will use our Cesium performance tests. Follow the steps outlined [here](https://github.com/CesiumGS/cesium-unreal/blob/profiling-documentation/Documentation/performance-profiling-setup-test.md).

# Prepare for capture

### Unreal Editor
1) In Visual Studio, click Debug -> Start Debugging (F5)
2) In Unreal, click Tools->Test Automation
3) Check the Cesium.Performance.LoadTestDenver row (don't start the test yet)
![Automation Window small](https://github.com/CesiumGS/cesium-unreal/assets/130494071/d27e7d67-3658-4cb2-ab10-777498cba0da)
4) Click Tools->Run Unreal Insights
5) In Unreal Insights, click on the "Connection" tab (don't connect yet)

> You can also find UnrealInsights.exe in UE_5.X\Engine\Binaries\Win64
![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/eadd4013-ca10-4b61-bb7d-0ab233440a39)

# Run the timing capture session
1) In Unreal Insights, click "Connect"
2) In Unreal Editor, click "Start Tests" (you should already have the Test Automation window open)
3) When the test ends, close Unreal Editor. We don't need it anymore.
4) In Unreal Insights, click the Trace Store tab, notice the trace that was just created
5) Click on it, then click on the 'Open Trace' button

> On the right side, there's a "Explore Trace Store Directory" button. You can click on this to delete or organize your traces
![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/f1e34fbc-35cd-4bc3-b935-5e322f5d9ba6)

# Interpret the report

![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/9cab7cf1-ab6d-4b58-a362-fc21ccff0334)

By default, the Timings Insights Tab is shown. More detail can be found [here](https://docs.unrealengine.com/5.0/en-US/timing-insights-in-unreal-engine-5/).

For this session, there are several sections of interest for us:
- The Frames panel (top, a timeline view)
- The Timings panel (middle, mostly empty because nothing is selected)
- The Log Panel (bottom)
- The Timers tab (right)

### Isolate your area of interest

1) In the Log Panel, search for "mark". This will show the logging of our timing marks for our test. Select the start mark, then hold shift and down arrow to select the end mark too
![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/7cccc075-edf7-4b8e-b704-b9efc9de1a3c)

2) Notice that the Timings panel is now displaying timing data, with a specific time region highlighted
3) In the Timings panel, select View Mode -> Compact Mode to see more of a bird's eye view
4) Select All Tracks and uncheck the following threads that don't have much activity for our test: ```RenderThread 3-7, BackgroundThreadPool #1, ForegroundWorker #0-#1, DDC IO ThreadPool #0-#2, Reserve Worker #0-#13, AudioMixerXXX```
5) Use the mouse wheel to zoom in to selected region. Right click and drag to pan left and right.

The view should be a lot cleaner now
![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/aca0680e-3dc3-4d23-9838-8f598f384089)


### Examine high traffic timers

Let's look at the Timers tab.

![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/823fc4d4-25d3-40dc-9b41-1cffee560454)

Every row is a timing event. Some events come from the engine, some are custom timers in the Cesium for Unreal plugin code. You'll notice that Incl is sorting descending, showing the events with the highest inclusive time.

> You may feel the need to jump right in to `Cesium::CreateRHITexture2D`. It seems to have one of the highest exclusive times (Excl) of any of the events in the list, 1 second. After all, our selection is only 1.2 seconds long, so this must be the performance bottleneck right? Hold on. The total sampled time at the top (CPU) is 19.8s, indicating the times are the total sampled times across threads, not absolute session duration.
Given that the sampled time of the highest cost calls are actually somewhat small compared to the total sampled CPU time, our bottleneck is most likely outside of our timed events.

This brings us to...


### Examine low use areas

1) Go back to the Timings panel.
2) In All Tracks, check Game Frames
2) Turn off compact mode by unchecking "View Mode->Compact Mode".
3) In View Mode, set "Depth Limit" to "4 lanes"
4) Zoom and pan to an area of the selection where the background workers haven't started loading yet

![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/caa47e66-b088-46d8-9aa0-1916a65777de)

The selected area is the first phase of the loading test. This is a region between when the start mark was logged until when background workers start loading models.

It lasts about 8 game frames, or 388 ms, and does not seem to be making use of background threads at all. Could be something to investigate.


### Examine fragmented use areas

1) Find the Timings panel
2) In View Mode, set "Depth Limit" to "Unlimited"
3) Zoom a bit into an area where our background workers are very busy

![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/29d7c3a2-3710-4a2b-a4f1-09050bdb9287)

This selected area is zoomed in enough to see that the background workers are all calling the same functions. They finish their work, then wait for more work to be available. Some of this work seems to take longer than others, especially at the beginning.

Note the gaps between the work. In general, there seems to be more inactivity than activity during this timeframe. Ideally, we would like to see all work squished together, with no waits in between. Improvements like this should bring the total execution duration lower. In this case, total load time.

# Draw conclusions

We've identified some actionable information so far, even if it only leads to investigation:
* There is a 388 ms low use area at the beginning of the test (~30%). What is happening here? Can it be faster?
* During high use areas, background threads seems to inactive more than they are active. Why? Can this be optimized so they are always active?

It's very common for profiling to be an iterative process. The result of a profiling session could easily be just adding more event timers, or digging deeper into how something works. Before we can expect that code change that results in a heroic 10x speedup, we need to be able to see clearly what is going on.
8 changes: 6 additions & 2 deletions Documentation/release-process.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,15 @@

This is the process we follow when releasing a new version of Cesium for Unreal on GitHub and on the Unreal Engine Marketplace.

## Test the release candidate

## Verify the code
* Update any hard coded API keys in the code
* CesiumSceneGeneration.cpp, testIonToken - Make sure matches samples project
* Google3dTilesLoadTest.cpp, testGoogleUrl - Make sure matches samples project
* Verify that the cesium-native submodule in the `extern` directory references the expected commit of cesium-native. Update it if necessary. Verify that CI has completed successfully for that commit of cesium-native.
* Merge `ue4-main` into `ue5-main`.
* Wait for CI to complete for the `ue4-main` and `ue5-main` branches. Verify that it does so successfully.

## Test the release candidate
* Remove all existing copies of the Cesium for Unreal plugin from the engine plugin directories on your system. On Windows this is usually `C:\Program Files\Epic Games\UE_4.26\Engine\Plugins\Marketplace`, `C:\Program Files\Epic Games\UE_4.27\Engine\Plugins\Marketplace`, and `C:\Program Files\Epic Games\UE_5.0\Engine\Plugins\Marketplace`.
* Download the `UE4.27-AllPlatforms` or `UE4.26-AllPlatforms` for the `ue4-main` branch of cesium-unreal. Extract it to the appropriate Unreal Engine installation's engine plugins directory.
* In the `ue5-main` branch, go to https://github.com/CesiumGS/cesium-unreal/actions and click the most recent build of the branch (it should be near the top). Scroll down to "Artifcats" and download the artifact that doesn't have an operating system in its name. It will also be the largest artifact. Extract it to `C:\Program Files\Epic Games\UE_5.0\Engine\Plugins\Marketplace` or equivalent.
Expand Down
1 change: 1 addition & 0 deletions Source/CesiumEditor/Private/CesiumEditorSubLevelMutex.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
#include "CesiumSubLevelComponent.h"
#include "CesiumSubLevelSwitcherComponent.h"
#include "Components/ActorComponent.h"
#include "Engine/World.h"
#include "LevelInstance/LevelInstanceActor.h"

CesiumEditorSubLevelMutex::CesiumEditorSubLevelMutex() {
Expand Down
Loading

0 comments on commit 20b6e63

Please sign in to comment.