University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3
- Taylor Nelms
- Tested on: Windows 10, Intel i3 Coffee Lake 4-core 3.6GHz processor, 16GB RAM, NVidia GeForce GTX1650 4GB
A path tracer is a method of rendering virtual geometry onto the screen. Notably, it does so by simulating how light moves around a scene. This is in contrast to traditional rendering methods, which transform the geometry more directly from world-space to screen-space. While path tracers are slower than traditional renderers, they can natively perform much more impressive feats overall.
For example, features such as caustics (the more intense light on the floor at the bottom-left of the image), or complex reflections and re-reflections, are easier to achieve with path tracing than with other methods.
This particular implementation runs on the GPU (graphics processing unit) via the CUDA framework, which lets us parallelize the otherwise serial task of computing each pixel's color one at a time. This allows for significant speed-ups over CPU path tracers.
An Object in the scene description may be given the type "mesh." Its transformation parameters act the same way as for other objects, but you may also specify a `FILE` string. This can be the path to an `.obj` file or a `.gltf` file (the latter of which must be in the same place as its assets).
This loads all of the file's triangles into our data structures, after which the mesh can be rendered sensibly. As of now, no material characteristics are loaded in; however, textures are recognized for `.gltf` files, across a few texture map types. All of my `.gltf` files came from Sketchfab's automated conversion system, so certain of its conventions may have become baked into the code.
I implemented a simple axis-aligned bounding volume for my loaded meshes. That is to say, I constructed a "box" around each mesh based on the maximum and minimum `x`, `y`, and `z` values across all of its triangle vertices, and did ray intersection tests on the box before attempting to intersect with each triangle in the mesh.
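In sketch form, the box test is a standard slab intersection; this is a simplified version with illustrative names, not the exact code in this repo:

```cuda
#include <cfloat>
#include <glm/glm.hpp>

// Minimal slab-method ray/AABB test. `Ray`, `bboxMin`, and `bboxMax` are
// illustrative names; the repo's own structs may differ.
struct Ray { glm::vec3 origin; glm::vec3 direction; };

__host__ __device__ bool intersectAABB(const Ray& r,
                                       const glm::vec3& bboxMin,
                                       const glm::vec3& bboxMax) {
    float tmin = -FLT_MAX;
    float tmax =  FLT_MAX;
    for (int axis = 0; axis < 3; ++axis) {
        // IEEE infinities keep the divide sane when a direction component is 0.
        float invD = 1.0f / r.direction[axis];
        float t0 = (bboxMin[axis] - r.origin[axis]) * invD;
        float t1 = (bboxMax[axis] - r.origin[axis]) * invD;
        if (invD < 0.0f) { float tmp = t0; t0 = t1; t1 = tmp; }
        tmin = t0 > tmin ? t0 : tmin;
        tmax = t1 < tmax ? t1 : tmax;
        if (tmax < tmin) return false; // the ray misses the box
    }
    return tmax >= 0.0f; // reject boxes entirely behind the ray
}
```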
The image of the three guns was a good testbed for bounding-box testing; the scene had a lot of triangles, but they were very densely localized. As such, culling a significant number of the triangle intersection tests paid off.
With bounding-box culling, I saw a time of 144s to reach 500 iterations. Without it, the same scene took 236s. This additionally gives evidence that one of the more time-consuming parts of the path tracer is intersecting each ray with the scene geometry; taking out roughly 2/3 of the intersection tests cut the render time to about 61% of the original (roughly a 1.6x speedup).
Using the CUDA texture memory, I was able to hook up `.gltf` files with their textures within the ray tracer. I worked almost entirely from files downloaded from Sketchfab, so their naming conventions may have ended up baked into my implementation. (Specifically, the file naming for their texture image files is how I distinguish between different types of texture mappings.)
I achieved this by loading each of four potential texture layers into the GPU's texture memory: color, emissivity, metallicRoughness (which also encodes an unused ambient-occlusion parameter), and a bump texture (normal mapping). (These were what I found in the Sketchfab models I was downloading.) If a loaded model had a texture, I kept track of its existence and, when applicable, pulled it from texture memory. This cut down on some amount of the frustration of interpolating between values, as CUDA handled the filtering for me. That convenience was counterbalanced by the frustration of figuring out how to navigate texture memory the way CUDA wanted me to.
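The lookup side is pleasantly short once the texture objects exist; in sketch form (illustrative names, not the repo's identifiers):

```cuda
#include <cuda_runtime.h>
#include <glm/glm.hpp>

// Sample a color texture at interpolated UV coordinates. The texture object
// is assumed to have been created with cudaCreateTextureObject() over a
// cudaArray holding the image, with linear filtering and normalized coords.
__device__ glm::vec3 sampleColorTexture(cudaTextureObject_t materialColorTex,
                                        glm::vec2 uv) {
    // tex2D performs the bilinear interpolation in hardware, which is the
    // convenience described above.
    float4 texel = tex2D<float4>(materialColorTex, uv.x, uv.y);
    return glm::vec3(texel.x, texel.y, texel.z);
}
```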
Here is a series of images of the same scene, with differing levels of texturing applied:
No Textures | Color Texture | Color and Emissivity Textures | Color, Emissivity, and Metallic Textures | Color, Emissivity, Metallic, and Normal Textures

The above images took the following times to complete (across 2000 iterations):
| Textures | None | Color | Emissive | Metallic/Roughness | Normal |
|---|---|---|---|---|---|
| Time (s) | 171 | 172 | 173 | 189 | 198 |
It is unsurprising that the color and emissivity textures did not affect performance much, as their values were easy to pull directly from texture memory. I imagine the metallic/roughness textures worsened performance at least a little because they introduced variance in the rendering method for the mesh; instead of always being diffuse, a ray occasionally needed to bounce off in a specular pattern, which may (due to the roughness) have involved a power function for the non-perfect specularity. The normal maps likely hurt performance as well, by increasing the variation, and thus the divergence, among threads within a warp.
I also implemented a very simple procedural texture for use on my cube primitives (though, in theory, it could be applied to the triangle meshes as well without too much issue). It is a simple checkerboard color texture, combined with a slight set of normal variations that create a waviness in a rough grid pattern. It is, reluctantly, shown here (the back wall combines the checkerboard and some bumpiness/"waviness"):
However, all it did was fill in texture memory in the same way that the loaded textures did; given that the GPU accesses both in the same way, there was no noticeable performance difference.
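In sketch form, generating the checkerboard into a buffer before upload looks something like this (tile count and colors are illustrative):

```cuda
#include <cuda_runtime.h>
#include <vector>

// Fill a host-side buffer with a checkerboard pattern before uploading it
// to a cudaArray, mirroring how the loaded image textures are handled.
void makeCheckerboard(std::vector<float4>& pixels, int width, int height,
                      int tiles) {
    pixels.resize(width * height);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int cx = (x * tiles) / width;   // which tile this texel falls in
            int cy = (y * tiles) / height;
            bool dark = ((cx + cy) & 1) != 0; // alternate tiles
            float v = dark ? 0.1f : 0.9f;
            pixels[y * width + x] = make_float4(v, v, v, 1.0f);
        }
    }
}
```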
Implemented specular reflections with a configurable exponent. Pictured below is a comparison of various exponent values for specularity. Notice that the very high value is effectively mirror-like; with such a highly specular object, the slight variations we get off the "mirror" direction are small enough to, effectively, not alter the ray at all. In this fashion, if we wished, we could eliminate the idea of "reflectivity" from our material description altogether.
Note: I used powers of three solely because they created a reasonable range of shininess across 7 samples; I have no idea if there was any computational speedup or slowdown because of this.
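For reference, a configurable exponent `n` can be realized by importance-sampling a direction about the mirror reflection with density proportional to cos^n(theta); in sketch form (the names, and the use of thrust's RNG, are illustrative rather than a copy of this repo's code):

```cuda
#include <thrust/random.h>
#include <glm/glm.hpp>
#include <glm/gtc/constants.hpp>

// Sample a direction about the perfect mirror reflection with a cos^n lobe.
// Higher specExponent values concentrate samples toward the mirror direction.
__device__ glm::vec3 sampleSpecular(glm::vec3 incident, glm::vec3 normal,
                                    float specExponent,
                                    thrust::default_random_engine& rng) {
    thrust::uniform_real_distribution<float> u01(0.0f, 1.0f);
    float u1 = u01(rng), u2 = u01(rng);

    // Spherical offsets about the mirror direction, from the standard
    // power-cosine sampling formulas.
    float cosTheta = powf(u1, 1.0f / (specExponent + 1.0f));
    float sinTheta = sqrtf(fmaxf(0.0f, 1.0f - cosTheta * cosTheta));
    float phi = 2.0f * glm::pi<float>() * u2;

    glm::vec3 mirror = glm::reflect(incident, normal);

    // Build an orthonormal basis around the mirror direction.
    glm::vec3 up = fabsf(mirror.z) < 0.999f ? glm::vec3(0, 0, 1)
                                            : glm::vec3(1, 0, 0);
    glm::vec3 tangent = glm::normalize(glm::cross(up, mirror));
    glm::vec3 bitangent = glm::cross(mirror, tangent);

    return glm::normalize(sinTheta * cosf(phi) * tangent +
                          sinTheta * sinf(phi) * bitangent +
                          cosTheta * mirror);
}
```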
Refraction turned out to be trickier than I anticipated. Notably, it made triangle intersection tests more difficult, because I now had to check my meshes for backface triangles. (A smarter implementation than mine might only do so if the material for the mesh as a whole were refractive.) However, allowing for refraction on more complex models meant that I could display much more interesting results.
You can see some of the other objects in the scene through Zelda's dress in this image; in particular, the shadow of the ball behind her.
Of course, it's also better with a more interesting background:
Here, you see the emissive materials behind her come through, distorted but still clear. The "candle" light through her hand is also visible.
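In sketch form, the refraction step looks something like this (simplified, with illustrative names; note the backface handling that motivated the intersection changes above):

```cuda
#include <glm/glm.hpp>

// Compute a refracted ray direction through a surface. `ior` is the
// material's index of refraction. When the ray exits the medium we flip
// the normal and invert the ratio, which is why back-facing triangles
// must be intersectable.
__device__ glm::vec3 refractDirection(glm::vec3 incident, glm::vec3 normal,
                                      float ior) {
    float eta = 1.0f / ior;                    // assume entering from air
    if (glm::dot(incident, normal) > 0.0f) {
        // Leaving the medium: we hit a backface, so flip the frame.
        normal = -normal;
        eta = ior;
    }
    glm::vec3 refracted = glm::refract(incident, normal, eta);
    if (glm::length(refracted) < 1e-6f) {
        // Total internal reflection: glm::refract returns a zero vector.
        return glm::reflect(incident, normal);
    }
    return glm::normalize(refracted);
}
```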
I implemented some simple antialiasing as well, randomly jittering each starting camera ray within its pixel. The differences may be seen between these two images:
No Antialiasing | Antialiasing

You can spot the difference around the sphere in particular; the presence of "jaggies" in the non-antialiased picture gives it away.
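The jitter itself is tiny; in sketch form (the `Camera` fields here are illustrative stand-ins, and `Ray` is as in the earlier sketch):

```cuda
#include <thrust/random.h>
#include <glm/glm.hpp>

struct Camera {
    glm::vec3 position, view, up, right;
    glm::vec2 resolution, pixelLength;
};

// Generate a camera ray jittered uniformly within pixel (x, y). Without the
// jitter, every iteration samples the exact pixel center, producing jaggies.
__device__ Ray generateJitteredRay(const Camera& cam, int x, int y,
                                   thrust::default_random_engine& rng) {
    thrust::uniform_real_distribution<float> u01(0.0f, 1.0f);
    float jx = u01(rng) - 0.5f;   // offset in [-0.5, 0.5) within the pixel
    float jy = u01(rng) - 0.5f;

    Ray r;
    r.origin = cam.position;
    r.direction = glm::normalize(cam.view
        - cam.right * cam.pixelLength.x * ((float)x + jx - cam.resolution.x * 0.5f)
        - cam.up    * cam.pixelLength.y * ((float)y + jy - cam.resolution.y * 0.5f));
    return r;
}
```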
In order to attempt to reduce warp divergence, and to better make use of the GPU's resources, I implemented a pass that sorts the path segments by material between computing intersections and shading them.
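The sort itself can be a single `thrust::sort_by_key` over the material IDs; in sketch form (struct and field names are illustrative stand-ins modeled on the base code):

```cuda
#include <thrust/sort.h>
#include <thrust/device_ptr.h>

// Order intersections by material ID, so threads in a warp are likely to
// shade the same material.
struct MaterialIdLess {
    __host__ __device__ bool operator()(const ShadeableIntersection& a,
                                        const ShadeableIntersection& b) const {
        return a.materialId < b.materialId;
    }
};

// Sort the intersections and their path segments together by material.
void sortByMaterial(ShadeableIntersection* dev_intersections,
                    PathSegment* dev_paths, int numPaths) {
    thrust::device_ptr<ShadeableIntersection> keys(dev_intersections);
    thrust::device_ptr<PathSegment> vals(dev_paths);
    thrust::sort_by_key(keys, keys + numPaths, vals, MaterialIdLess());
}
```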
For a simple scene, such as `scenes/checkersdemo.txt`, I was able to get 200 iterations deep in 20s without material sorting. With sorting, it took 50s.
Now, that was with only a dozen or so primitives; surely, when dealing with hundreds or thousands of triangles, the performance will be improved!
For a more complex scene, such as `scenes/teapotdemo.txt` (containing some 16,000 triangles), it took 75s to get to 200 iterations without sorting the materials; with sorting, it took 79s. Still not a performance boost, but better nonetheless.
When working with a significantly complex scene, such as `scenes/bunnydemo.txt` (containing 144,000 triangles), it took 184s to get to just 50 iterations (pardon my impatience); with material sorting, it took 186s total.
I can honestly conclude that it did not make much of a difference in my application; I suspect that what warp divergence I encountered came from the randomness involved in the ray-scattering function, which happened after the material sorting. Additionally, many of the objects in my scenes were spatially very localized as well, which probably cut down on the unsorted divergence.
It was useful to save the first camera-ray and scene intersection across iterations. I did not have the chance to do an extensive performance analysis, but an off-the-cuff trial on `scenes/altardemo.txt` showed that, with the optimization, running 1000 iterations took 98s, as opposed to 111s without. Given a max depth of 12 on that scene, that's a pretty reasonable speedup.
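In sketch form, the branch inside the per-depth loop looks something like this (all names, and the kernel signature, are illustrative stand-ins):

```cuda
// Reuse the cached first-bounce intersections on every iteration after the
// first, since the camera rays are deterministic. (This is also why the
// cache must be disabled under antialiasing; see the notes below.)
if (depth == 0 && iter > 1 && cacheFirstBounce) {
    cudaMemcpy(dev_intersections, dev_firstBounceCache,
               numPaths * sizeof(ShadeableIntersection),
               cudaMemcpyDeviceToDevice);
} else {
    computeIntersections<<<numBlocks, blockSize>>>(
        depth, numPaths, dev_paths, dev_geoms, numGeoms, dev_intersections);
    if (depth == 0 && iter == 1 && cacheFirstBounce) {
        // Populate the cache on the very first bounce of the first iteration.
        cudaMemcpy(dev_firstBounceCache, dev_intersections,
                   numPaths * sizeof(ShadeableIntersection),
                   cudaMemcpyDeviceToDevice);
    }
}
```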
I made use of stream compaction to get rid of terminated paths as I went through the depth of each iteration, removing those that had hit nothing or had hit an emitter. I used the `thrust` library to do so, and unfortunately, I did not get a chance to examine the performance implications of doing so; I imagine they are significant.
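With `thrust`, the compaction can be a single partition keyed on the remaining bounce count; in sketch form (illustrative names):

```cuda
#include <thrust/partition.h>
#include <thrust/device_ptr.h>

// Predicate: a path is still alive if it has bounces left to trace.
struct PathAlive {
    __host__ __device__ bool operator()(const PathSegment& p) const {
        return p.remainingBounces > 0;
    }
};

// Compact the path pool, returning the new count of live paths. Terminated
// paths (hit nothing, or hit a light) are moved past the partition point.
int compactPaths(PathSegment* dev_paths, int numPaths) {
    thrust::device_ptr<PathSegment> paths(dev_paths);
    thrust::device_ptr<PathSegment> mid =
        thrust::partition(paths, paths + numPaths, PathAlive());
    return (int)(mid - paths);
}
```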
The OpenImageDenoiser was a particularly interesting (albeit late) addition. It uses machine learning (read: magic) to take some of the gritty noise out of a ray-traced image and reconstruct it as if it were closer to convergence.
I elected to only feed the image in at the very end of a run, so as not to sully the process of accumulating light up to that point. Notably, running an image for longer still improves the final result, but by feeding the initial normal and albedo maps into the denoiser alongside the color, I was able to get smoother images far earlier than I would have anticipated. See the following comparison of image quality for the same scene after different numbers of iterations through the path tracer:
50 iterations, not filtered | 50 iterations, filtered | 200 iterations, not filtered | 200 iterations, filtered | 2000 iterations, not filtered | 2000 iterations, filtered

As you can see, the filtering smoothed out even particularly rough images, but also eliminated some significant actual detail; only the last image was able to achieve a good amount of detail that got through the filtering process.
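For reference, the denoise call itself is short with OIDN's C++ API; this sketch assumes the accumulated color, albedo, and normal buffers have already been copied back to the host as packed float3 images:

```cpp
#include <OpenImageDenoise/oidn.hpp>

// Run OIDN's ray-tracing filter once on the final accumulated image.
// color/albedo/normal/output are host-side float3 buffers of width*height;
// the surrounding buffer management is omitted.
void denoiseImage(float* color, float* albedo, float* normal,
                  float* output, int width, int height) {
    oidn::DeviceRef device = oidn::newDevice();
    device.commit();

    oidn::FilterRef filter = device.newFilter("RT"); // generic ray-tracing filter
    filter.setImage("color",  color,  oidn::Format::Float3, width, height);
    filter.setImage("albedo", albedo, oidn::Format::Float3, width, height);
    filter.setImage("normal", normal, oidn::Format::Float3, width, height);
    filter.setImage("output", output, oidn::Format::Float3, width, height);
    filter.set("hdr", true);  // the accumulator holds unclamped radiance
    filter.commit();
    filter.execute();
}
```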
Most of the switches for features and performance in the code are in `#define` variables at the top of `utilities.h`. Note that, if antialiasing is used, the first intersection is not cached.
I put the `tinyobjloader` library contents into the `external` folder, so I had to include the relevant header and source file in the project, as well as mark their locations to be included and linked.
I added the OpenImageDenoiser library to the `external` folder, and so added the line `include_directories(external/oidn/include)` so that the headers could be read sensibly. Additionally, I added the subdirectory `external/oidn`, and linked the `OpenImageDenoise` library in the `target_link_libraries` call. This did not end up doing all the necessary linking (see below), but it helped.
Notably, this required having Intel's `tbb` installed; I achieved this by signing up for, and subsequently installing, Intel Parallel Studio. Time will tell if I made the right decision.
Additionally, I decided to compile this all with C++17, in case I decided to make use of the `std::filesystem` library (a slight quality-of-life fix over just calling it via `std::experimental::filesystem`). I admittedly am not sure whether this change actually took.
Look, I don't like what I did either.
I manually copied the `OpenImageDenoise.dll` and `tbb.dll` from their rightful homes to the directory containing my built `Release` executable, so that it might run.
Certainly, CMake has a way to do this, but as somebody who is not a CMake wizard at this point, this will have to do.
- Models downloaded from Morgan McGuire's Computer Graphics Archive
  - Bunny, Dragon, Teapot, Tree, Fireplace Room
- Turbosquid
  - Wine Glass by OmniStorm (unused)
  - Secondary Wine Glass by Mig91 (unused)
- Sketchfab
  - Fountain by Eugen Shuklin
  - Altar by William Chang
  - Zelda by ThatOneGuyWhoDoesThings
  - Sheik by ThatOneGuyWhoDoesThings
  - Hunter Rifle by cotman sam
  - Textured Cube by Stakler (used for development)
  - Sci-fi blaster by Artem Goyko
  - Plasma gun by alx_flameniro
- Used the TinyObjLoader library for loading `*.obj` files
- Used the TinyGltf library for loading `*.gltf` files
  - I also lifted their `gltf_loader` files from their raytrace examples (and fixed a bug therein). I did not use any other code from the example folder.
- OpenImageDenoiser for post-processing
- Formerly: the ray-triangle intersection algorithm was stolen from the Wikipedia article for the Möller–Trumbore intersection algorithm. Now I'm using glm's, to my chagrin (I swear I could have intersected both front and back faces had I gotten that algorithm to work correctly).