Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update from bevy 0.12 to 0.14.2 #39

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

glasspangolin
Copy link

@glasspangolin glasspangolin commented Nov 18, 2024

Key observations of the commit are:

  • more careful memory management to comply with newer stricter WGPU.
    • The prim_raster move from 64 bytes to 32 bytes to satisfy a compile time WGPU limit.
    • the DiReservoirData struct. I recognise it's probably a downgrade since I've gone from 16->64 bytes per entry, but it was more readable for me and I think it solved a WGPU compile issue. Feel free to revert.
    • Figuring out padding and alignment was difficult, and I fear it might be platform dependent validation by WGPU. I found at least on my laptop the limit in size for the prim_raster buffers was 32 (2xRgba32Float). The byte alignment for texture buffers was 256, and the alignment for buffers within the shader ecosystem was 32. You may find yours is different at which point we may have to find a dynamic solution.
    • safe_any_orthonormal_pair as the glam any_orthonormal_pair was flagged by WGPU as a potential source of NaN.
  • upgraded rust-gpu depending on a more recent nightly rust.
    • I think I have to patch in the entire bevy engine from my github just to add the inline_const feature flag.
    • I have also hosted rust-gpu as I feel the slightest change there would break everything, feel free to revert to the official repo or your own hosted version.
  • A lot of subtle changes resulting from bevy 0.12 to 0.14.2, I did what I could to keep the functionality as similar as possible.

I hope this is ok! Feel free to tear this apart as it's my first contribution to a repo. I want to learn and I'm happy to go back to the drawing board!

-- Type naming changes
-- Some fairly large changes with separation of graphics world and game world.
-- New colours.
- WGPU became more fussy:
-- Avoiding extreme float values NaN, Inf.
-- had to change prim raster buffer packing arrangement 64 bytes to 32 bytes (Rgba32Float x 4 -> Rgba32Float x 1 + Rgba16Float x 2).
-- Seems to be more strict about padding and format of textures.
- Attempt to avoid type confusion between glam types and spir_v::glam, with conversion where necessary.
- updated requirement to nightly-2024-04-24 for rust-gpu.
-- Unfortunately, this results in a stubborn:
error[E0658]: inline-const is experimental
Which I found a work around to get a local version of bevy and add #![feature(inline_const)] to the crate root.
… does involve patching all bevy dependencies to add one line. This might be palatable because it is a similar style to the removed naga patch.

- Fixed one last example of newer WGPU versions being fussy with u8 cast as u32.
@Patryk27
Copy link
Owner

Amazing, thanks for preparing the changes! I'm a bit busy at the moment, but I should be able to take a look in a couple of days 🙂

@glasspangolin
Copy link
Author

Thanks! I'm not sure if this is something that makes sense on Github, but maybe this would be best left as a branch for now. I noticed this issue:
Rust-GPU/rust-gpu#50 (comment)
In rust-gpu suggesting they are going to update soon, and it looks like inline_const was stabilised a couple of days after the nightly build rust-gpu requires:
rust-lang/rust#104087
then we might be able to get away with no patching of dependencies!

Copy link
Owner

@Patryk27 Patryk27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks so much for preparing the update - I know things like those can get mundane, especially with the long compilation times, and you did a great job 🙂

I've taken a look and while I haven't reviewed everything yet, I think it's ready for the first round!

One important difference is that the renders are now much dimmer, in both the path-traced and restir-ed modes - before (current main):

image

... after:

image

I think this might be related to changes in gbuffer.rs - base_color is now encoded in 16-bits and the gamma correction is gone (the .powf(1.0 / 2.0) thingie), could you take a look at that?

Thanks & cheers;

.gitignore Show resolved Hide resolved
bevy-strolle/Cargo.toml Show resolved Hide resolved
bevy-strolle/examples/cornell.rs Show resolved Hide resolved
bevy-strolle/src/rendering_node.rs Show resolved Hide resolved
.map(|v| v.physical_position)
.unwrap_or_default();

let fov_y = std::f32::consts::FRAC_PI_4;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's with the manual calculations here? 🤔

We can continue to reuse the matrix from Bevy, it's just that the function name changed from projection.get_projection_matrix() to projection.get_clip_from_view() (bevyengine/bevy#13489).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I found it just didn't work using the bevy projection, and you are correct this is lazy, so I tried to figure out the reason.

What I'm thinking now is it comes down to the implementation of get_clip_from_view:

fn get_clip_from_view(&self) -> Mat4 {
    Mat4::perspective_infinite_reverse_rh(self.fov, self.aspect_ratio, self.near)
}

The "infinite" bit seems to cause a problem and my guess is it ruins the depth mapping down the line. I have changed it to a half way solution where I take the values from the bevy camera and manually run perspective_rh:

let fov_y = p_projection.fov;
let aspect_ratio = p_projection.aspect_ratio;
let near_plane = p_projection.near;
let far_plane = p_projection.far;

let p_mat = Mat4::perspective_rh(fov_y, aspect_ratio, far_plane, near_plane);

I could have misunderstood something, let me know!

@@ -125,7 +140,7 @@ impl Light {
}

pub fn clear_slot(&mut self) {
self.d3.x = f32::from_bits(0);
self.d3.x = f32::from_bits(1);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔

@@ -50,7 +65,7 @@ impl Light {
Self {
// TODO incorrect
d0: position.extend(25.0),
d1: color.extend(f32::INFINITY),
d1: color.extend(1000_000f32),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not f32::INFINITY? (including the other places)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning is this temperamental wgpu run time check. To be clear, setting this particular case back to f32::INFINITY doesn't cause any errors, but elsewhere it does and I'm not sure about the specific rules. For example, in another place changing the large number to:

impl TriangleHit {
    pub fn none() -> Self {
        Self {
            distance: f32::INFINITY,
            point: Default::default(),
            normal: Default::default(),
            uv: Default::default(),
            material_id: MaterialId::new(0),
        }
    }

Leads to this error:

Shader validation error:
  ┌─ C:\...\bvh_heatmap-main.spv:1:1
  │
1 │
  │   naga::Expression [23]


    Constant expression [23] is invalid
    Float literal is infinite

I'm just being consistent not using f32::INFINITY or f32::NaN anywhere in strolle-gpu, as wgpu seems to randomly complain on a case by case basis maybe depending on the spir-v generated? I don't know.

@@ -41,6 +41,21 @@ pub struct Light {
pub prev_d2: Vec4,
}

impl Default for Light {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning behind this change? 👀

The comments mention altering the defaults to non-zero radius and range, but considering the place this method is called (strolle/src/lights.rs:111), I would actually expect it to generate an empty light (with zero radius and range).

Using all zeros for defaults also has a nice side effect of allowing the compiler to use memset instead of memcpy when allocating larger arrays - for instance, if you preview the assembly for https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=2bd5c339dab8cdb05c47d3b38d2649c5 you'll see a simple callq *memset@GOTPCREL(%rip), but changing the default range to 32.0 requires compiler to generate an explicit loop, which might be slower.

Those zero-defaults don't matter here for performance, I just thought it'll be nice to mention it!

let emissive_g_u = (self.emissive.y.clamp(0.0, 1.0) * 255.0).round() as u32;
let emissive_b_u = (self.emissive.z.clamp(0.0, 1.0) * 255.0).round() as u32;

let emissive_packed = (emissive_g_u << 8) | emissive_b_u;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the second gbuffer texture is now Rgba16Float, this is invalid and will cause the read values to be different:

#![feature(f16)]

fn main() {
    // Shader A generates the values and stores them:
    let emis_g = 50;
    let emis_b = 25;
    
    let out = f32::from_bits((emis_g << 8) | emis_b);
    
    // When shaders stores pixels into the texture, GPU automatically converts
    // the result to f16 (as specified by the texture format):
    let out = out as f16;
    
    // Shader B now loads the values back again, but whoops - they are
    // different!
    let out = (out as f32).to_bits();
    
    let emis_b = out & 0xff;
    let emis_g = (out >> 8) & 0xff;
    
    dbg!(emis_b, emis_g);
}

That's because using f32::from_bits() etc. here is just a trick to pass arbitrary numbers between consecutive shaders - this works, because when you're writing into a 32-bit data texture, the GPU doesn't care about the contents you're writing, it's all just some data.

But the moment you use a 16-bit texture, you force the GPU to interpret the written numbers strictly as floating-point values, which renders (heh) floating point tricks futile, since casting f32 to f16 preserves value (e.g. 1.23 -> 1.23), not the underlying bit pattern.

We should either use two 32-bit textures or find another way to encode this information, one that doesn't rely on bit tricks.

base_color.w,
]))
};
let base_color = self.base_color.clamp(Vec4::ZERO, Vec4::ONE);
Copy link
Owner

@Patryk27 Patryk27 Nov 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you've kinda swapped base_color and emissive in here, but doing so actually reduces the quality.

base_color comes from the triangle's texture, which is always a 24-bit image, one that has 8 bits for each channel (see strolle/src/images.rs:75) - so encoding this 24-bit base_color in 3 x f32 = 96 bits is just wasteful.

Emissive, on the other hand, is a different story - while base_color remains in range 0..255 (or well, 0..=1 if you consider it "float"ed), an emissive color can be arbitrarily large. Rightfully so, emissives are basically lights and you wouldn't cap light's intensity, it can be as bright as it gets!

That's why the original implementation here stored "quantized" base_color (which wasn't really quantized, since the underlying data is 24-bit anyway) , but it kept emissive as-is - it's because it's alright for an emissive color to be e.g. rgb(500, 500, 500) - that'd be just a bright light.

(while it doesn't make sense for base_color to be rgb(500, 500, 500), since base_color doesn't emit light - base_color basically tells how much light gets reflected, so it can't end up generating light and thus must remain capped between 0..=1, aka 0..255.)

The code before said // TODO doesn't need to use as much space because there's a clever way of encoding emissives - the rgb9e5 format (used e.g. by Kajiya). If you'd like, you can try to take a stab at it 🙂

The most important thing about rgb9e5 and similar formats is that they try to quantize chromaticity, but preserve luminance - intuitively, for a packed (quantized) emissive color we care more about it's "strength" rather than its "chroma" and we can exploit this fact when quantizing it.

@jpx40
Copy link

jpx40 commented Dec 10, 2024

u might need to move this to 0.15 now

@Patryk27
Copy link
Owner

Hi, @glasspangolin - how are things going, do you need some extra help / more time / something's unclear? 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants