Rust + Bevy implementation of GPU accelerated collision detection and standard CPU collision detection (narrow phases only), designed to realistically test the performance differences between each technique.
- Install the latest nightly version of Rust with rustup
- Clone the repo
Cargo run --release
to test it compiles properly- Go through the "run_tests.ipynb" jupyter notebook (requires python 3) to replicate my full comparison tests.
People using Rust for...
- 🎮 Game development
- 🧬 Simulations
... needing faster ⏩ collision detection.
Especially if you are already using the Bevy engine, although the code can be adapted to work with any rust codebase.
GPU acceleration can provide major performance increases over CPU-based collision detection
Note the logarithmic scale of the x axis ABOVE.
And here is a zoomed version to show in more detail the point where GPU acceleration becomes valuable (NOT log scale):
And here is a fully zoomed-in version to show the critical point where GPU acceleration becomes valuable:
I needed performant collision detection for a much larger scale than normal; hundreds of thousands of simultaneously colliding entities, at least. I tried popular existing collision detection solutions, but my brief testing indicated that their performance was unacceptable for the scales I required.
- ❌ Web-based applications, because they don't have low level GPU access.
- ❌ Games with very high existing GPU usage. However,
${\textsf{\color{lightgreen}However, for most games the extra GPU usage is small enough to not be a problem:}}$ Collision detection for 160k collisions per frame, for example, used only about 7% of GPU capacity (RTX 3070 laptop version).
This technique is only for narrow-phase of collision detection. If performance is an issue, you should first prioritize implementing a performant broad-phase to your collision detection as this is an easier way of making big performance gains. And for truly massive simulations a broad phase is required, because of the practical limits of narrow-phase collosion detection.
- around 200-300k collisions per frame for videogame applications. Total collisions can be much higher if you also implement broad-phase filtering (see above).
- tens or hundreds of millions of collisions per frame, limited only by the RAM available for storing all of the collision pairs.
- The main waste with GPU based collision detection is having to pre-allocate a lot of memory which we don't actually end up using, since we don't know ahead of time the number of collisions that will be detected. Testing indicates anything that can be done to pre-estimate the number of collisions yields large performance improvements (this is the purpose of the
max_detectable_collisions_scale
variable in the GPU code, but a lot of improvements can still be made to that part of the code). - A major bottleneck is the render device's maximum storage buffer size. If there is a way to safely increase this buffer size limit, performance can be dramatically improved. (This may be hardware limited, I haven't had the time to look into it yet.)
- Switch to integer positions and integer math instead of floating point. This requires client code to use integer positions, which is less convenient, which is why the code currently uses floating point positions.
- For truly massive simulations, running batches in parallel in order to utilize more of the GPU may be possible.
You may notice that simply combining the collision processing with the collision detection can dramatically improve the CPU algorithm's speed so that it is actually faster than the GPU method. However we have to keep in mind that the same thing can be said for the GPU algorithm. If we also put collision processing directly onto the GPU we will also gain dramatic performance improvements.
If you are trying to get the absolute best possible performance in your application you will probably have to use this strategy, but otherwise you should avoid it because it creates highly coupled, difficult to maintain code.
I have not done this, for either CPU or GPU, because I want this test to be representative of the general case, where we don't know what exactly the client is going to do with the collisions detected.