-
Notifications
You must be signed in to change notification settings - Fork 295
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #1198 from CesiumGS/simple-performance-test
First pass at performance testing
- Loading branch information
Showing
35 changed files
with
1,173 additions
and
56 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
# Set up a repeatable test | ||
|
||
We need an area of code to execute repeatedly, with as many variables locked down as possible. | ||
|
||
|
||
### Set up Unreal | ||
1) Open Unreal Editor (UnrealEditor.exe) | ||
2) Create a blank map (project doesn't matter. Choose an existing one or create a new one) | ||
3) Go to Edit->Plugins | ||
4) Search for "Functional Testing plugin". Check it to enable it | ||
![smaller](https://github.com/CesiumGS/cesium-unreal/assets/130494071/5a3bc9de-cdaf-4d9d-842d-104719426663) | ||
5) Save all | ||
6) Set this map as the 'Editor Startup Map' so it loads when starting from Visual Studio | ||
![smaller 2](https://github.com/CesiumGS/cesium-unreal/assets/130494071/8ba5c6c2-8c97-4048-afe2-db74770d85cc) | ||
|
||
|
||
### Build Release Code | ||
|
||
We need to make sure all our C++ code is building in release mode, preferably with debug symbols. | ||
|
||
> This assumes that you have already built your code successfully and are familiar with the concepts from our [developer setup guide](https://github.com/CesiumGS/cesium-unreal/blob/ue5-main/Documentation/developer-setup-windows.md). Although you could profile a debug build, it is typically more useful to build in release, since this is how a game is usually packaged. | ||
1) If building the cesium-native library, make sure you are using a release configuration derived from "RelWithDebInfo" | ||
2) Open your Unreal project's Visual Studio solution (.sln). This example uses the solution generated from [cesium-unreal-samples](https://github.com/CesiumGS/cesium-unreal-samples) | ||
3) Choose "Development Editor" | ||
|
||
![smaller 3](https://github.com/CesiumGS/cesium-unreal/assets/130494071/0e70065f-c717-466b-a92b-cab1dcfdd29b) | ||
|
||
4) From the menu, choose Build -> Build Solution |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,83 @@ | ||
|
||
This guide will help you find performance problems in your C++ code using the [CPU Usage tool](https://learn.microsoft.com/en-us/visualstudio/profiling/beginners-guide-to-performance-profiling?view=vs-2022) included in Visual Studio's Diagnostic tools window. | ||
|
||
The CPU Usage tool is easy to set up with minimal impact on how your app is built or how it runs. If you use Visual Studio often, you may have this running already. This is a sampling-based profiler, with pros and cons detailed [here](https://learn.microsoft.com/en-us/visualstudio/profiling/understanding-performance-collection-methods-perf-profiler?view=vs-2022). | ||
|
||
# Set up a repeatable test | ||
|
||
In this example, we will use our Cesium performance tests. Follow the steps outlined [here](https://github.com/CesiumGS/cesium-unreal/blob/profiling-documentation/Documentation/performance-profiling-setup-test.md). | ||
|
||
# Prepare for capture | ||
|
||
### Visual Studio | ||
|
||
1) Open your project's Visual Studio solution (.sln). This example uses the solution generated from [cesium-unreal-samples](https://github.com/CesiumGS/cesium-unreal-samples) | ||
2) From the menu, choose Debug->Windows->Show Diagnostic Tools | ||
3) Configure it. Uncheck 'Memory Usage'. Under Settings, Uncheck "Enable CPU Profiling", we'll turn this back on later. | ||
|
||
<img width="484" alt="DiagSetup" src="https://github.com/CesiumGS/cesium-unreal/assets/130494071/798d794c-19f0-4f15-93e1-1815e3f1e75b"> | ||
|
||
4) Optionally, find two places in your code to set breakpoints. In our example, performance test start / end marks are perfect. | ||
|
||
![Breakpoint Set small](https://github.com/CesiumGS/cesium-unreal/assets/130494071/5a793b9c-fd68-42ed-96ae-6ec884c38951) | ||
|
||
>We could profile the entire debugging session if we needed to. But it's generally good practice to reduce your timing capture as much as possible. This can improve responsiveness when using resource intensive profiling tools, like memory tracking. | ||
# Run the timing capture session | ||
|
||
1) From Visual Studio, start your debugging session (Debug->Start Debugging, F5) | ||
2) Find the performance tests in Unreal. Tools->Test Automation | ||
![Automation Window small](https://github.com/CesiumGS/cesium-unreal/assets/130494071/d27e7d67-3658-4cb2-ab10-777498cba0da) | ||
|
||
3) Check "LoadTestDenver" | ||
4) Click "Start Tests" | ||
5) Your first break point should hit in Visual Studio | ||
6) Go back to the Diagnostic Tools window, click on "Record CPU Profile". It should turn red. | ||
|
||
![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/ce0c7e86-c1ef-4a01-97fd-c97275b6f62b) | ||
|
||
7) Continue the debugging session (Debug->Continue, F5) | ||
8) Your second break point should hit | ||
9) Go back to the Diagnostic Tools window, you should now see a report | ||
|
||
# Interpret the report | ||
|
||
![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/a9fb3e0b-86f5-4239-b4ab-c7f9b1dba4a5) | ||
|
||
This can be a bit daunting at first, but most profiling tools have a similar workflow. | ||
|
||
### Start at the timeline | ||
|
||
![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/da733adc-6cae-4c89-8a6c-01a367667a0d) | ||
|
||
Note the highlighted area to the right where the CPU usage spikes. This corresponds to the breakpoints that we set. | ||
|
||
All data from the report will reflect this selection only. | ||
|
||
### Trace calls with high usage | ||
|
||
From the main window, click on "Open Details" | ||
![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/f34b5ee3-15b7-485a-a90a-8f71310b1b44) | ||
|
||
The CPU Usage window will appear. Set "Current View" to "Functions", then find the 'Self CPU' column and sort descending (down arrow). | ||
|
||
![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/20836178-8337-4d53-be58-f388db905f9f) | ||
|
||
This window now shows the functions that have the highest time spent within themselves only. Useful for finding individual functions that are called often or need to be optimized. | ||
|
||
In this example, `stbir_resample_horizontal_downsample` is of particular interest because it's in the code base we built. Entries with `[External]` or originate from an unfamiliar module are generally ignored, although it is useful to know we are calling into them. | ||
|
||
Right click on the `stbir_resample_horizontal_downsample` row, select "View in Call Tree". | ||
|
||
![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/e5a88c06-5b76-4a07-83be-db5147a961b2) | ||
|
||
# Draw conclusions | ||
|
||
The window above is starting to show some actionable information: | ||
- 20% of the sampled CPU time was spent in `CesiumTextureUtility::loadTextureAnyThreadPart`. Basically, we're loading textures | ||
- There are a variety of `stbir_XXX functions` that are taking the bulk of the time, and might be candidates for optimization | ||
- The highest cost single function is `stbir_resample_horizontal_downsample`, but keep in mind the limits of a sampling profiler. We don't know how many times it was called, just that it was being executed ~6% of the time. | ||
|
||
Are these functions worth investigating and potentially optimizing? Maybe. Again, know this is a sampling profiler. Even if you optimize the highest cost function to 0.001%, you are only improving CPU efficiency. | ||
|
||
If your goal is to reach absolute numbers, like specific loading times, or frames per second, you may need another type of profiling tool. |
108 changes: 108 additions & 0 deletions
108
Documentation/performance-profiling-with-unreal-insights.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,108 @@ | ||
This guide will help you find performance problems in your C++ code using [Unreal Insights](https://docs.unrealengine.com/5.0/en-US/unreal-insights-in-unreal-engine/), included with Unreal Engine. | ||
|
||
Unreal Insights can display the scope of timing events as well as activity across threads. There is minimal impact to app execution, and you can set up your own custom events. It provides more functionality than an exclusive [CPU sampling-based profiler](https://learn.microsoft.com/en-us/visualstudio/profiling/understanding-performance-collection-methods-perf-profiler?view=vs-2022), although both tools can complement each other. | ||
|
||
# Set up a repeatable test | ||
|
||
In this example, we will use our Cesium performance tests. Follow the steps outlined [here](https://github.com/CesiumGS/cesium-unreal/blob/profiling-documentation/Documentation/performance-profiling-setup-test.md). | ||
|
||
# Prepare for capture | ||
|
||
### Unreal Editor | ||
1) In Visual Studio, click Debug -> Start Debugging (F5) | ||
2) In Unreal, click Tools->Test Automation | ||
3) Check the Cesium.Performance.LoadTestDenver row (don't start the test yet) | ||
![Automation Window small](https://github.com/CesiumGS/cesium-unreal/assets/130494071/d27e7d67-3658-4cb2-ab10-777498cba0da) | ||
4) Click Tools->Run Unreal Insights | ||
5) In Unreal Insights, click on the "Connection" tab (don't connect yet) | ||
|
||
> You can also find UnrealInsights.exe in UE_5.X\Engine\Binaries\Win64 | ||
![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/eadd4013-ca10-4b61-bb7d-0ab233440a39) | ||
|
||
# Run the timing capture session | ||
1) In Unreal Insights, click "Connect" | ||
2) In Unreal Editor, click "Start Tests" (you should already have the Test Automation window open) | ||
3) When the test ends, close Unreal Editor. We don't need it anymore. | ||
4) In Unreal Insights, click the Trace Store tab, notice the trace that was just created | ||
5) Click on it, then click on the 'Open Trace' button | ||
|
||
> On the right side, there's a "Explore Trace Store Directory" button. You can click on this to delete or organize your traces | ||
![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/f1e34fbc-35cd-4bc3-b935-5e322f5d9ba6) | ||
|
||
# Interpret the report | ||
|
||
![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/9cab7cf1-ab6d-4b58-a362-fc21ccff0334) | ||
|
||
By default, the Timings Insights Tab is shown. More detail can be found [here](https://docs.unrealengine.com/5.0/en-US/timing-insights-in-unreal-engine-5/). | ||
|
||
For this session, there are several sections of interest for us: | ||
- The Frames panel (top, a timeline view) | ||
- The Timings panel (middle, mostly empty because nothing is selected) | ||
- The Log Panel (bottom) | ||
- The Timers tab (right) | ||
|
||
### Isolate your area of interest | ||
|
||
1) In the Log Panel, search for "mark". This will show the logging of our timing marks for our test. Select the start mark, then hold shift and down arrow to select the end mark too | ||
![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/7cccc075-edf7-4b8e-b704-b9efc9de1a3c) | ||
|
||
2) Notice that the Timings panel is now displaying timing data, with a specific time region highlighted | ||
3) In the Timings panel, select View Mode -> Compact Mode to see more of a bird's eye view | ||
4) Select All Tracks and uncheck the following threads that don't have much activity for our test: ```RenderThread 3-7, BackgroundThreadPool #1, ForegroundWorker #0-#1, DDC IO ThreadPool #0-#2, Reserve Worker #0-#13, AudioMixerXXX``` | ||
5) Use the mouse wheel to zoom in to selected region. Right click and drag to pan left and right. | ||
|
||
The view should be a lot cleaner now | ||
![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/aca0680e-3dc3-4d23-9838-8f598f384089) | ||
|
||
|
||
### Examine high traffic timers | ||
|
||
Let's look at the Timers tab. | ||
|
||
![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/823fc4d4-25d3-40dc-9b41-1cffee560454) | ||
|
||
Every row is a timing event. Some events come from the engine, some are custom timers in the Cesium for Unreal plugin code. You'll notice that Incl is sorting descending, showing the events with the highest inclusive time. | ||
|
||
> You may feel the need to jump right in to `Cesium::CreateRHITexture2D`. It seems to have one of the highest exclusive times (Excl) of any of the events in the list, 1 second. After all, our selection is only 1.2 seconds long, so this must be the performance bottleneck right? Hold on. The total sampled time at the top (CPU) is 19.8s, indicating the times are the total sampled times across threads, not absolute session duration. | ||
Given that the sampled time of the highest cost calls are actually somewhat small compared to the total sampled CPU time, our bottleneck is most likely outside of our timed events. | ||
|
||
This brings us to... | ||
|
||
|
||
### Examine low use areas | ||
|
||
1) Go back to the Timings panel. | ||
2) In All Tracks, check Game Frames | ||
2) Turn off compact mode by unchecking "View Mode->Compact Mode". | ||
3) In View Mode, set "Depth Limit" to "4 lanes" | ||
4) Zoom and pan to an area of the selection where the background workers haven't started loading yet | ||
|
||
![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/caa47e66-b088-46d8-9aa0-1916a65777de) | ||
|
||
The selected area is the first phase of the loading test. This is a region between when the start mark was logged until when background workers start loading models. | ||
|
||
It lasts about 8 game frames, or 388 ms, and does not seem to be making use of background threads at all. Could be something to investigate. | ||
|
||
|
||
### Examine fragmented use areas | ||
|
||
1) Find the Timings panel | ||
2) In View Mode, set "Depth Limit" to "Unlimited" | ||
3) Zoom a bit into an area where our background workers are very busy | ||
|
||
![image](https://github.com/CesiumGS/cesium-unreal/assets/130494071/29d7c3a2-3710-4a2b-a4f1-09050bdb9287) | ||
|
||
This selected area is zoomed in enough to see that the background workers are all calling the same functions. They finish their work, then wait for more work to be available. Some of this work seems to take longer than others, especially at the beginning. | ||
|
||
Note the gaps between the work. In general, there seems to be more inactivity than activity during this timeframe. Ideally, we would like to see all work squished together, with no waits in between. Improvements like this should bring the total execution duration lower. In this case, total load time. | ||
|
||
# Draw conclusions | ||
|
||
We've identified some actionable information so far, even if it only leads to investigation: | ||
* There is a 388 ms low use area at the beginning of the test (~30%). What is happening here? Can it be faster? | ||
* During high use areas, background threads seems to inactive more than they are active. Why? Can this be optimized so they are always active? | ||
|
||
It's very common for profiling to be an iterative process. The result of a profiling session could easily be just adding more event timers, or digging deeper into how something works. Before we can expect that code change that results in a heroic 10x speedup, we need to be able to see clearly what is going on. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.