GPU-Accelerated Collision Detection

By Sheldon Frith

Rust + Bevy implementation of GPU accelerated collision detection and standard CPU collision detection (narrow phases only), designed to realistically test the performance differences between each technique.

How to Run:

Install the latest nightly version of Rust with rustup
Clone the repo
Cargo run --release to test it compiles properly
Go through the "run_tests.ipynb" jupyter notebook (requires python 3) to replicate my full comparison tests.

Who this is useful for?

People using Rust for...

🎮 Game development
🧬 Simulations

... needing faster ⏩ collision detection.

Especially if you are already using the Bevy engine, although the code can be adapted to work with any rust codebase.

Performance Results:

TLDR

GPU acceleration can provide major performance increases over CPU-based collision detection ${\textsf{\color{lightgreen}starting at around 15k collisions per frame}}$ and higher.

% Frame Time Reduction using GPU:

Note the logarithmic scale of the x axis ABOVE.

And here is a zoomed version to show in more detail the point where GPU acceleration becomes valuable (NOT log scale):

Raw Frame Time:

Slightly more zoomed-in:

And here is a fully zoomed-in version to show the critical point where GPU acceleration becomes valuable:

Rationale 💡

I needed performant collision detection for a much larger scale than normal; hundreds of thousands of simultaneously colliding entities, at least. I tried popular existing collision detection solutions, but my brief testing indicated that their performance was unacceptable for the scales I required.

Caveats:

Not Suitable For:

❌ Web-based applications, because they don't have low level GPU access.
❌ Games with very high existing GPU usage. However, ${\textsf{\color{lightgreen}However, for most games the extra GPU usage is small enough to not be a problem:}}$ Collision detection for 160k collisions per frame, for example, used only about 7% of GPU capacity (RTX 3070 laptop version).

Narrow vs Broad Phase

See narrow vs broad phase.

This technique is only for narrow-phase of collision detection. If performance is an issue, you should first prioritize implementing a performant broad-phase to your collision detection as this is an easier way of making big performance gains. And for truly massive simulations a broad phase is required, because of the practical limits of narrow-phase collosion detection.

Practical Upper Limits on Collisions/Frame

around 200-300k collisions per frame for videogame applications. Total collisions can be much higher if you also implement broad-phase filtering (see above).
tens or hundreds of millions of collisions per frame, limited only by the RAM available for storing all of the collision pairs.

Further Performance Improvements

The main waste with GPU based collision detection is having to pre-allocate a lot of memory which we don't actually end up using, since we don't know ahead of time the number of collisions that will be detected. Testing indicates anything that can be done to pre-estimate the number of collisions yields large performance improvements (this is the purpose of the max_detectable_collisions_scale variable in the GPU code, but a lot of improvements can still be made to that part of the code).
A major bottleneck is the render device's maximum storage buffer size. If there is a way to safely increase this buffer size limit, performance can be dramatically improved. (This may be hardware limited, I haven't had the time to look into it yet.)
Switch to integer positions and integer math instead of floating point. This requires client code to use integer positions, which is less convenient, which is why the code currently uses floating point positions.
For truly massive simulations, running batches in parallel in order to utilize more of the GPU may be possible.

Inlining

You may notice that simply combining the collision processing with the collision detection can dramatically improve the CPU algorithm's speed so that it is actually faster than the GPU method. However we have to keep in mind that the same thing can be said for the GPU algorithm. If we also put collision processing directly onto the GPU we will also gain dramatic performance improvements.

If you are trying to get the absolute best possible performance in your application you will probably have to use this strategy, but otherwise you should avoid it because it creates highly coupled, difficult to maintain code.

I have not done this, for either CPU or GPU, because I want this test to be representative of the general case, where we don't know what exactly the client is going to do with the collisions detected.

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
project		project
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GPU-Accelerated Collision Detection

By Sheldon Frith

Table of Contents:

How to Run:

Who this is useful for?

Performance Results:

TLDR

% Frame Time Reduction using GPU:

Raw Frame Time:

Rationale 💡

Caveats:

Not Suitable For:

Narrow vs Broad Phase

Practical Upper Limits on Collisions/Frame

Further Performance Improvements

Inlining

About

Releases

Packages

Languages

License

Sheldonfrith/gpu_collision_detection_bevy

Folders and files

Latest commit

History

Repository files navigation

GPU-Accelerated Collision Detection

By Sheldon Frith

Table of Contents:

How to Run:

Who this is useful for?

Performance Results:

TLDR

% Frame Time Reduction using GPU:

Raw Frame Time:

Rationale 💡

Caveats:

Not Suitable For:

Narrow vs Broad Phase

Practical Upper Limits on Collisions/Frame

Further Performance Improvements

Inlining

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages