The project is a hybrid GI solution, partly inspired by Lumen and DDGI. The core idea is to provide a complete GI solution (including Indirect Lighting, Infinite Bounce, Emissive Lighting, Glossy Reflection, Shadow and AO) that can run on both ray-tracing-capable and non-ray-tracing hardware. Due to time constraints there are still some performance issues in this project, but as a prototype I believe it can offer some new ideas to people who are interested in GI.
Before we start, let's talk about DDGI (Dynamic Diffuse Global Illumination).
As described in the original DDGI publications (description, paper), DDGI is a real-time, probe-based ray-tracing technique that mainly addresses the diffuse term of global illumination for dynamic scenes and dynamic light sources.
A traditional light probe collects lighting information from the probe's point of view; when a point on the screen is shaded, the probes close to the shading point are gathered and interpolated. The discrete information captured by each probe is thus reconstructed into a continuous lighting signal through interpolation, but wherever the signal changes abruptly (e.g. across a wall), this interpolation causes light-leaking problems.
DDGI, however, avoids light leaking by additionally storing geometric (distance) information of the scene in each probe and resolving visibility probabilistically.
A DDGI probe uses an octahedral map to store its spherical information; the octahedral map is a convenient way to compress a sphere into a Texture2D. (For those familiar with Lumen: Lumen uses the same probe representation.)
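For reference, here is a minimal GLSL sketch of the octahedral mapping (the function names are illustrative, not the project's API): octEncode folds a unit direction into the [-1, 1]^2 square and octDecode reverses it; remapping the result to [0, 1] gives the texel coordinates inside a probe's tile.
// Minimal octahedral mapping sketch (names are illustrative).
vec2 signNotZero(vec2 v)
{
    return vec2(v.x >= 0.0 ? 1.0 : -1.0, v.y >= 0.0 ? 1.0 : -1.0);
}
// Unit direction -> point in [-1, 1]^2.
vec2 octEncode(vec3 n)
{
    vec2 p = n.xy * (1.0 / (abs(n.x) + abs(n.y) + abs(n.z)));
    return (n.z < 0.0) ? (1.0 - abs(p.yx)) * signNotZero(p) : p;
}
// Point in [-1, 1]^2 -> unit direction.
vec3 octDecode(vec2 p)
{
    vec3 n = vec3(p.x, p.y, 1.0 - abs(p.x) - abs(p.y));
    if (n.z < 0.0)
        n.xy = (1.0 - abs(n.yx)) * signNotZero(n.xy);
    return normalize(n);
}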
Step1. For each probe, generate 100-300 rays and trace them to obtain the color and hit distance of each ray. Two Texture2Ds are created to record this per-ray information:
//Radiance for every ray.
internal.radiance->setName("DDGI Raytrace Radiance");
internal.radiance->buildTexture(TextureFormat::R11G11B10F, pipeline.raysPerProbe, totalProbes);
//Direction and Depth for every ray.
internal.directionDepth->setName("DDGI Raytrace Direction Depth");
internal.directionDepth->buildTexture(TextureFormat::RGBA16, pipeline.raysPerProbe, totalProbes);
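The reference DDGI implementation distributes the rays with a spherical Fibonacci pattern and rotates the whole set by a random per-frame rotation; assuming the project follows the same approach, a minimal sketch of the direction generation could look like this (uRandomRotation is an assumed per-frame uniform):
// Spherical Fibonacci direction for ray i out of n rays (i in [0, n)).
uniform mat3 uRandomRotation; // assumed per-frame random rotation matrix
vec3 sphericalFibonacci(float i, float n)
{
    const float PHI = 1.6180339887; // golden ratio
    float frac = fract(i * (PHI - 1.0));
    float phi = 2.0 * 3.14159265 * frac;
    float cosTheta = 1.0 - (2.0 * i + 1.0) / n;
    float sinTheta = sqrt(clamp(1.0 - cosTheta * cosTheta, 0.0, 1.0));
    return vec3(cos(phi) * sinTheta, sin(phi) * sinTheta, cosTheta);
}
vec3 probeRayDirection(uint rayIndex, uint raysPerProbe)
{
    return normalize(uRandomRotation * sphericalFibonacci(float(rayIndex), float(raysPerProbe)));
}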
Since the diffuse GI at each ray's hit point is looked up from the probes of the previous frame, infinite bounces are accumulated over time.
The vertical axis is the ProbeId and the horizontal axis is the Radiance obtained for each ray.
Step2. Once the radiance and depth have been obtained, the probes can be updated, i.e., the radiance map is used to update each probe's irradiance (the distance information is updated in the same way). The probe grids are double-buffered:
for (int32_t i = 0; i < 2; i++)
{
    internal.depth[i]->setName("DDGI Depth Probe Grid " + std::to_string(i));
    internal.depth[i]->buildTexture(TextureFormat::RG16F, depthWidth, depthHeight); // Depth and Depth^2
    internal.irradiance[i]->setName("DDGI Irradiance Probe Grid " + std::to_string(i));
    internal.irradiance[i]->buildTexture(TextureFormat::RGBA16, irradianceWidth, irradianceHeight);
}
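Conceptually, each octahedral texel accumulates the cosine-weighted radiance of all rays of its probe and is then blended with the previous frame's value through a hysteresis factor. A simplified sketch, assuming per-ray fetch helpers and a uHysteresis uniform (both illustrative):
// Per-texel irradiance update sketch (helpers and uHysteresis are assumptions).
vec3 updateIrradianceTexel(vec3 texelDirection, uint probeIndex, uint raysPerProbe, vec3 previousIrradiance)
{
    vec3 sum = vec3(0.0);
    float totalWeight = 0.0;
    for (uint r = 0; r < raysPerProbe; r++)
    {
        vec3 rayDirection = loadRayDirection(probeIndex, r); // assumed: from the direction/depth texture
        vec3 rayRadiance  = loadRayRadiance(probeIndex, r);  // assumed: from the radiance texture
        float weight = max(0.0, dot(texelDirection, rayDirection)); // cosine lobe
        sum += rayRadiance * weight;
        totalWeight += weight;
    }
    vec3 newIrradiance = (totalWeight > 0.0) ? sum / totalWeight : vec3(0.0);
    // Temporal hysteresis: keep most of the previous frame's result for stability (e.g. ~0.97).
    return mix(newIrradiance, previousIrradiance, uHysteresis);
}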
Step3. Border update
Because the octahedral probe tiles are fetched with bilinear interpolation, the one-texel border around each tile requires a special operation: it is filled with a copy of the opposite edge (and corners), so that a bilinear fetch across the tile edge wraps correctly around the octahedral seam instead of bleeding into the neighboring probe's tile.
Step4. Each shaded pixel samples the 8 probes of its surrounding cell. One important point is that when sampling the probes we must perform a visibility test to prevent light-leaking problems.
- Visibility testing. Since a wall may sit between a probe and the shading point, we need to perform a visibility test to avoid light leaking.
As noted above, each probe's depth texture stores the mean ray-hit distance and its square, i.e. $E[r]$ and $E[r^2]$, along the direction towards the shading point. From these we introduce Chebyshev's inequality:
$$ P(r > d) \le \frac{\sigma^2}{\sigma^2 + (d - \mu)^2} $$
where $\mu = E[r]$ is the mean distance, $\sigma^2 = E[r^2] - E[r]^2$ is the variance, and $d$ is the distance from the shading point to the probe. When $d \le \mu$ the probe is considered fully visible (weight 1); when $d > \mu$ the right-hand side is used as the probe's visibility weight, so probes that are likely occluded contribute almost nothing.
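A minimal sketch of this visibility weight in GLSL (the cube on the result is a common sharpening trick from the reference implementation; names are illustrative):
// meanMean2 = (E[r], E[r^2]) sampled from the probe's depth octahedral map.
float chebyshevVisibility(vec2 meanMean2, float distanceToProbe)
{
    float mean = meanMean2.x;
    if (distanceToProbe <= mean)
        return 1.0; // the shading point is closer than the average occluder: fully visible
    float variance = abs(meanMean2.y - mean * mean);
    float diff = distanceToProbe - mean;
    float weight = variance / (variance + diff * diff); // Chebyshev upper bound on P(r > d)
    return max(weight * weight * weight, 0.0);          // cubed to sharpen the falloff
}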
The probe data from the previous frame is reused to shade the indirect illumination of the current frame, so the bounces accumulate frame after frame and effectively converge to infinite bounces.
SDF can be considered a standard feature of next-generation engines: it not only enables ray-tracing-like queries, but also supports SDF soft shadows, SDF AO and GPU-particle collision detection.
During the trace phase of DDGI's probes, we need to record the distance traveled by each ray as well as its color. This is straightforward on hardware that supports ray tracing, but since most machines do not support hardware ray tracing, we need another way to obtain this information.
First of all, SDF is a very good acceleration structure that can replace hardware ray tracing in some cases, so in the current project, machines without hardware ray tracing fall back to SDF tracing.
- Step1. Create MeshDistanceField
For each object in the scene, generate its corresponding SDF, i.e. create a Texture3D (R16F) with a resolution of 32-128 (chosen on demand) to store the distance information, and generate its mipmaps at the same time.
Since the SDF generation process itself casts rays, we can use a Uniform Grid, KD-Tree or BVH to accelerate the construction. In this project a BVH is used to accelerate the intersection tests.
A padding operation is also performed for each object to prevent leaks caused by geometry that is too thin.
- Step2. Create GlobalDistanceField
After the MeshDistanceFields above are available, the GlobalDistanceField is created and its corresponding mipmaps are generated. Since the current algorithm does not use a Lumen-like detail trace (it may be implemented in a future version, but GlobalSDF works better with the DDGI cascade), the global field is created purely to accelerate the SDF trace.
// Initialize from the value currently stored in the global SDF, rescaled by DistanceMax.
float minDistance = DistanceMax;
minDistance *= imageLoad(uGlobalSDF, voxelCoord).r;

// Merge every mesh SDF that overlaps this voxel by taking the minimum distance.
for (uint i = 0; i < pushConsts.objectsCount; i++)
{
    uint objId = pushConsts.objects[i];
    MeshDistanceField meshDF = data[objId];
    minDistance = min(minDistance, distanceToMeshSDF(minDistance, meshDF, uMeshSDF[objId], voxelWorldPos));
}
In a game we need real-time updates, so the GlobalSDF must be updated whenever something in the scene changes. For this the space is partitioned, i.e. only the grid cells that dynamic objects move through are updated; the current algorithm uses a Uniform Grid (Chunks) for this partitioning.
One point to note is that the mipmaps must be generated manually: we cannot compute a mip texel by averaging the source texels, we have to take the minimum value, similar to how a Hi-Z buffer is built.
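A minimal compute-shader sketch of one such min-downsample step (binding names are illustrative, not the project's):
// Downsample one mip level of the global SDF by taking the minimum of each 2x2x2 block.
layout(local_size_x = 4, local_size_y = 4, local_size_z = 4) in;
layout(binding = 0) uniform sampler3D uSrcMip;               // previous mip level
layout(binding = 1, r16f) uniform writeonly image3D uDstMip; // mip level being written
void main()
{
    ivec3 dstCoord = ivec3(gl_GlobalInvocationID.xyz);
    ivec3 srcCoord = dstCoord * 2;
    float minDist = 1e30;
    for (int z = 0; z < 2; z++)
        for (int y = 0; y < 2; y++)
            for (int x = 0; x < 2; x++)
                minDist = min(minDist, texelFetch(uSrcMip, srcCoord + ivec3(x, y, z), 0).r);
    imageStore(uDstMip, dstCoord, vec4(minDist)); // conservative: never overestimates the distance
}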
Once we have the SDF acceleration structure we can trace against it, but an SDF is essentially just a distance function with no color or material information, so additional data is needed to recover the color and material at the points the SDF trace hits.
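For reference, the SDF trace itself is a standard sphere-tracing loop; a minimal sketch against the global field (sampleGlobalSDF is an assumed helper that reads the Texture3D in world space):
// Sphere-trace the global SDF: advance by the sampled distance until we are close to a surface.
float traceGlobalSDF(vec3 origin, vec3 direction, float tMax)
{
    float t = 0.0;
    for (int i = 0; i < 128 && t < tMax; i++)
    {
        float d = sampleGlobalSDF(origin + direction * t); // assumed helper: world-space distance lookup
        if (d < 0.01)
            return t;   // hit: return the travelled distance
        t += d;         // safe step: nothing can be closer than d
    }
    return -1.0;        // miss
}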
Lumen's Surface Cache is a very efficient way to obtain materials for SDF hits, and the same approach is used in the current algorithm for the SDF material representation.
Each object in the scene is captured from 6 directions and rendered into Surface Atlases; this pass works like a traditional GBuffer pass. The current algorithm generates Color/Normal/Depth/PBR/Emissive/LightCache atlases and a TileBuffer for each face.
In order to improve the overall rendering efficiency and avoid overdraw, we need to cull the objects in the scene (including the Tile associated with each object) in this stage.
For point and spot lights, only the objects within the light's range are lit; for directional lights, all potentially affected objects are lit (on demand).
Objects outside the camera far plane are culled, and objects whose volume is below a size threshold are ignored. Since world space is divided into Chunks, to keep the SDF intersection fast an additional culling pass is performed per Chunk: each Chunk keeps only the objects that intersect it, with an upper limit on the object count.
Obtaining the material at a point in SDF space works as follows (see the sketch after this list):
- Take the world position of the intersection and project it to the corresponding ChunkId.
- Iterate over all tiles in that Chunk and compare the hit point's normal with the orientation of each captured face; if the ray is almost parallel to the face, discard that face, otherwise use the normal agreement as the face's weight.
- If the face is usable, sample it. Because the SDF intersection has limited precision, we fetch 4 points around the hit point and bilinearly interpolate them to keep the sampling correct.
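A heavily simplified sketch of that per-face weighting (the SurfaceTile layout, the chunk/tile helpers and projectToTileUV are hypothetical placeholders, not the project's actual API):
// Accumulate the surface-cache color for one SDF hit point (illustrative only).
struct SurfaceTile { vec3 captureDirection; vec4 atlasUVRect; }; // hypothetical layout
uniform sampler2D uSurfaceColorAtlas;                            // hypothetical binding
vec3 sampleSurfaceCache(vec3 hitWorldPos, vec3 hitNormal, uint chunkId)
{
    vec3 color = vec3(0.0);
    float totalWeight = 0.0;
    for (uint i = 0; i < chunkTileCount(chunkId); i++)  // hypothetical helper
    {
        SurfaceTile tile = loadChunkTile(chunkId, i);   // hypothetical helper
        float facing = dot(hitNormal, -tile.captureDirection);
        if (facing < 0.1)
            continue;                                   // face almost parallel to the surface: skip
        vec2 uv = projectToTileUV(tile, hitWorldPos);   // hypothetical helper: world pos -> atlas UV
        color += facing * textureLod(uSurfaceColorAtlas, uv, 0.0).rgb;
        totalWeight += facing;
    }
    return (totalWeight > 0.0) ? color / totalWeight : vec3(0.0);
}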
Afterwards, the tiles of all affected objects are updated (the SDF scene representation does not need to be real-time, so this work can be spread over several frames), and direct lighting is evaluated from the current SDF-GBuffer and stored in the SurfaceLightCache. Because the SDF provides visibility queries, the SurfaceDirectLight stage can query shadows directly from the SDF, as shown in the following figure.
Infinite bounce is a built-in property of DDGI; one point to note is that each indirect-lighting result is also written back into the SDF scene (LightCache) until it converges.
When the hardware supports it, ray tracing can be enabled instead of the SDF trace; the rest of the algorithm barely differs, so I won't go into detail in this section.
Reflections are implemented on top of the same ray-tracing / SDF-trace machinery and mainly cover Glossy Reflection and Specular Reflection. In the current implementation, when the surface roughness is below 0.05 the surface is treated as a mirror and the specular reflection algorithm is used.
When the roughness is between 0.05 and 0.7 the material is treated as Glossy; since a glossy material reflects light in many directions, we use blue noise and a Sobol sequence to generate spatially well-distributed samples, and importance sampling to ensure fast convergence (Reference).
When the roughness is greater than 0.7, we directly shade the surface from the probes.
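A minimal sketch of this roughness-based path selection (thresholds as stated above; the trace/sample helpers are placeholders, not the project's API):
// Pick the reflection path based on surface roughness (helper functions are placeholders).
vec3 shadeReflection(vec3 worldPos, vec3 viewDir, vec3 normal, float roughness)
{
    if (roughness < 0.05)
    {
        // Mirror-like: a single reflection ray is enough.
        return traceReflection(worldPos, reflect(-viewDir, normal));
    }
    else if (roughness < 0.7)
    {
        // Glossy: importance-sample the GGX lobe with blue-noise / Sobol samples, then denoise.
        vec2 xi = nextBlueNoiseSobolSample();
        vec3 rayDir = importanceSampleGGX(xi, normal, viewDir, roughness);
        return traceReflection(worldPos, rayDir);
    }
    // Very rough: the probe (irradiance) result is good enough.
    return sampleProbeIrradiance(worldPos, normal);
}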
A notable limitation: the SDF path cannot deliver sharp specular reflections (because the scene is only represented by its SDF and surface cache). To make up for this, screen-space reflections could be layered on top of the SDF reflections (not implemented here), which is the approach Lumen takes.
(Figure: SDF Reflection vs. Raytracing Reflection)
SDF can support soft shadows, but due to the limited accuracy of the GlobalSDF it is currently not recommended to use the GlobalSDF for visibility calculations.
An alternative soft-shadow implementation based on ray tracing is provided instead (at a higher cost).
In the current algorithm the soft shadows use a single sample per pixel, so the raw result is very noisy; the project uses SVGF for denoising (the same applies to the reflections).
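For reference, the classic SDF soft-shadow approximation mentioned above tracks the smallest distance-to-occluder relative to the distance traveled, which approximates a penumbra cone; a minimal sketch (sampleGlobalSDF as before; k controls penumbra hardness):
// Cone-approximated SDF soft shadow: 1 = fully lit, 0 = fully shadowed.
float sdfSoftShadow(vec3 origin, vec3 lightDir, float tMax, float k)
{
    float shadow = 1.0;
    float t = 0.05; // small bias to leave the surface
    for (int i = 0; i < 64 && t < tMax; i++)
    {
        float d = sampleGlobalSDF(origin + lightDir * t); // assumed helper
        if (d < 0.001)
            return 0.0;                    // occluder hit: fully shadowed
        shadow = min(shadow, k * d / t);   // narrowest "cone" seen so far
        t += d;
    }
    return clamp(shadow, 0.0, 1.0);
}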
Current real-time ray-tracing algorithms are basically built on a very low sample count plus denoising. In this algorithm, the denoiser is mainly applied to glossy reflections and soft shadows.
A Gaussian filter is a low-pass filter (its core is a weighted average inside the filter kernel); the result keeps only the low-frequency information, so the filtered image looks uniformly blurred. For a game, pure low-pass filtering is not good enough. Reference
Since Gaussian filtering is low-pass, it does not preserve boundaries well in images with strong signal changes. Bilateral filtering addresses this: an important additional term is the brightness change of the current pixel, used as a weight, i.e. a large change in brightness gets a small weight. Reference
In Gaussian filtering the weight only depends on the distance between pixels; bilateral filtering adds the brightness of the pixels. In real-time rendering we can additionally use the GBuffer to guide the filter, and when more factors are considered (e.g. depth, normal features) it is called joint bilateral filtering.
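A minimal sketch of such a joint bilateral weight between a center pixel p and a neighbor q (the sigma parameters are tuning constants chosen here for illustration, not the project's values):
// Joint bilateral weight: spatial distance x luminance difference x depth difference x normal similarity.
float jointBilateralWeight(vec2 offset, float lumP, float lumQ,
                           float depthP, float depthQ, vec3 normalP, vec3 normalQ)
{
    const float sigmaSpatial = 2.0;
    const float sigmaLuma    = 0.25;
    const float sigmaDepth   = 0.1;
    float wSpatial = exp(-dot(offset, offset) / (2.0 * sigmaSpatial * sigmaSpatial));
    float wLuma    = exp(-abs(lumP - lumQ) / sigmaLuma);
    float wDepth   = exp(-abs(depthP - depthQ) / sigmaDepth);
    float wNormal  = pow(max(dot(normalP, normalQ), 0.0), 32.0);
    return wSpatial * wLuma * wDepth * wNormal;
}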
Spatiotemporal Variance-Guided Filtering (SVGF) NVIDIA-Link
For denoising, the current GI algorithm uses SVGF, a denoiser that works both spatially and temporally, built on joint bilateral filtering with the addition of variance analysis and a few other features.
For the spatial part, SVGF uses an À-Trous wavelet filter: the same small kernel is applied repeatedly with a progressively increasing step size, which covers a large footprint at low computational cost, as shown in the figure.
for (int yy = -RADIUS; yy <= RADIUS; yy++)
{
    for (int xx = -RADIUS; xx <= RADIUS; xx++)
    {
        // STEP_SIZE doubles every À-Trous iteration (1, 2, 4, ...).
        const ivec2 sampleCoord = coord + ivec2(xx, yy) * STEP_SIZE;
        // Edge-stopping weights (depth / normal / luminance) are applied here.
    }
}
In the À-Trous pass we first consider depth. Take points A and B in the figure: they lie on the same surface, so they should contribute to each other during filtering, yet their depths differ. Therefore the depth term is based on the depth gradient (the expected depth change along the screen-space offset) rather than on the raw depth difference.
Next we consider the effect of normals on the denoiser.
As shown in the figure, the smaller the angle between the normals of A and B, the more likely the two points lie on the same surface, so the weight $W_n$ approaches 1.
The color term is driven by the change in brightness, i.e. luminance; but because the input is noisy, the raw luminance difference is unreliable, so it is normalized by the estimated variance.
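A minimal sketch of the three edge-stopping weights as they appear in the SVGF paper (the sigma values are the usual defaults, not necessarily the project's):
// SVGF-style edge-stopping weights between the center pixel p and a neighbor q.
float weightDepth(float depthP, float depthQ, float depthGradient, float stepLength)
{
    const float sigmaZ = 1.0;
    // Normalize by the expected depth change along this offset (the depth gradient).
    return exp(-abs(depthP - depthQ) / (sigmaZ * abs(depthGradient) * stepLength + 1e-4));
}
float weightNormal(vec3 normalP, vec3 normalQ)
{
    const float sigmaN = 128.0;
    return pow(max(dot(normalP, normalQ), 0.0), sigmaN); // ~1 when the normals agree
}
float weightLuminance(float lumP, float lumQ, float varianceP)
{
    const float sigmaL = 4.0;
    // Divide by the standard deviation so noisy regions tolerate larger differences.
    return exp(-abs(lumP - lumQ) / (sigmaL * sqrt(max(varianceP, 0.0)) + 1e-4));
}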
The temporal part mainly reuses information from previous frames, which can be thought of as increasing the effective sample count of each pixel.
To fetch the previous frame's pixel in a reasonable way, we use motion vectors that record how each pixel moved, so in the current frame the corresponding pixel of the previous frame can be found by reprojecting through the motion vector.
The main check is on the geometry: the angle between the two normals must be below a specific threshold, and coplanarity can be judged with a plane-distance term such as $F_n = \mathbf{n}_A \cdot (\mathbf{p}_B - \mathbf{p}_A)$.
If $F_n$ is greater than zero (in practice, greater than a small epsilon), the two points are not on the same plane and the history sample is discarded.
Because the GBuffer is available in this stage, we can also read the MeshID of the corresponding points and discard the history sample if the MeshIDs of A and B do not match.
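Putting these checks together, a minimal sketch of the history-validation step (the GBuffer fetch helpers and thresholds are placeholders):
// Decide whether the reprojected history sample at prevUV is still valid for the current pixel.
bool isHistoryValid(vec2 uv, vec2 prevUV)
{
    if (any(lessThan(prevUV, vec2(0.0))) || any(greaterThan(prevUV, vec2(1.0))))
        return false;                               // reprojected outside the screen

    vec3 normal     = fetchNormal(uv);              // placeholder GBuffer helpers
    vec3 prevNormal = fetchPrevNormal(prevUV);
    uint meshId     = fetchMeshId(uv);
    uint prevMeshId = fetchPrevMeshId(prevUV);
    vec3 worldPos   = fetchWorldPos(uv);
    vec3 prevPos    = fetchPrevWorldPos(prevUV);

    if (meshId != prevMeshId)
        return false;                               // different object
    if (dot(normal, prevNormal) < 0.9)
        return false;                               // normals disagree too much
    if (abs(dot(normal, prevPos - worldPos)) > 0.05)
        return false;                               // not on the same plane
    return true;
}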