Will changing film sizes cause issues for the efficiency of mega kernels? #1169

saedrna · 2024-05-11T09:15:38Z

Hi there!

I am working on retrieving the reflectance (albedo) of a vase from real photos using inverse rendering. Since part of the image is transparent, I apply dataset transformations to subset portions of the image, employing random scaling and cropping techniques. I use crop_offset_x and crop_width for the film as specified below. Because the image is randomly scaled, the film dimensions ("width": film_width and "height": film_height) will vary with almost every iteration.

I discovered that turning off the mega kernel feature of drjit significantly improves the speed of the entire inverse rendering and gradient optimization process, nearly doubling the performance (at the cost of much more VRAM, about 3x). By examining the logs, I noticed that almost every drjit.eval call related to OptiX results in a kernel cache miss. I suspect that varying film sizes might lead to a different mega kernel each time, which could be causing the cache misses.

  dr.set_flag(dr.JitFlag.LoopRecord, False)
  dr.set_flag(dr.JitFlag.VCallRecord, False)
  dr.set_flag(dr.JitFlag.VCallOptimize, False)

Is this the issue? And in which of the following cases will the cached kernels still be reused?

[ ] Change the width and height of the film
[ ] Change the crop_offset and crop_width
[ ] Change the fov of sensor
[ ] Change the to_world

  for i in range(B):
      to_world = batch["to_world"][i].cpu().numpy()
      film_width = batch["film_width"][i].item()
      film_height = batch["film_height"][i].item()
      fov_x = batch["fov_x"][i].item()
      crop_offset_x = batch["crop_offset_x"][i].item()
      crop_offset_y = batch["crop_offset_y"][i].item()
      crop_width = batch["crop_width"][i].item()
      crop_height = batch["crop_height"][i].item()

      film = mi.load_dict(
          {
              "type": "hdrfilm",
              "width": film_width,
              "height": film_height,
              "crop_offset_x": crop_offset_x,
              "crop_offset_y": crop_offset_y,
              "crop_width": crop_width,
              "crop_height": crop_height,
              "rfilter": {"type": "box"},
              "pixel_format": "rgba",
          }
      )
      sensor = mi.load_dict(
          {
              "type": "perspective",
              "fov": fov_x,
              "to_world": mi.ScalarTransform4f(to_world),
              "film": film,
          }
      )

      result = mi.render(self.scene, params=self.params, sensor=sensor, spp=num_spp * 2, spp_grad=num_spp)

saedrna · 2024-05-11T13:05:47Z

saedrna
May 11, 2024
Author

Almost all of the cache misses look like the ones below. After several iterations, each OptiX kernel is used only a single time, e.g. the ID 7812ed4459359786 occurs only once. Other kernels are launched multiple times.

jit_eval(): launching 1 kernel.
  -> launching 7812ed4459359786 (via OptiX, n=4194304, in=55, out=0, se=18, ops=4545, jit=1.1142 ms):
     cache miss, build: 198.143 ms, 0 B.
jit_eval(): done.

jit_eval(): launching 1 kernel.
  -> launching e8e7f22bee765614 (via OptiX, n=8388608, in=48, out=0, se=5, ops=3489, jit=776 us):
     cache miss, build: 153.124 ms, 0 B.
jit_eval(): done.
jit_eval(): launching 2 kernels.
  -> launching 12dc40514a9d0185 (via OptiX, n=8388608, in=53, out=0, se=18, ops=3685, jit=911.1 us):
     cache miss, build: 174.074 ms, 0 B.
  -> launching bf853982b393f43a (n=1048576, in=1, out=1, ops=22, jit=8.7 us):
jit_eval(): done.
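
To quantify how often this happens, the log can be tallied per kernel ID. The following is only a minimal sketch: it assumes the log format shown above (a "-> launching <id>" line, optionally followed by a "cache miss" line), and `summarize_log` is a hypothetical helper name, not part of Dr.Jit.

```python
import re
from collections import Counter

# Matches lines such as:
#   -> launching 7812ed4459359786 (via OptiX, n=4194304, in=55, ...):
LAUNCH_RE = re.compile(r"->\s+launching\s+([0-9a-f]{16})\s+\((via OptiX, )?")


def summarize_log(text):
    """Count launches and cache misses per kernel ID in a log excerpt.

    Returns (launches, misses, optix_ids), where the first two are Counters
    keyed by kernel ID and the third is the set of IDs launched via OptiX.
    """
    launches = Counter()
    misses = Counter()
    optix_ids = set()
    last_id = None
    for line in text.splitlines():
        match = LAUNCH_RE.search(line)
        if match:
            last_id = match.group(1)
            launches[last_id] += 1
            if match.group(2):
                optix_ids.add(last_id)
        elif "cache miss" in line and last_id is not None:
            # The miss message follows the launch line it belongs to.
            misses[last_id] += 1
    return launches, misses, optix_ids
```

Kernels with `launches[id] == 1` and `misses[id] == 1` are the ones that were compiled and then never reused.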

4 replies

merlinND May 14, 2024
Collaborator

Hello @saedrna,

Since you already have a convenient setup to test your hypothesis, how about running the following test:

1. Switch to pure forward rendering (no inverse rendering).
2. Vary only one of: to_world, film dimensions, crop dimensions, crop offset, fov.
3. Run the for loop and observe the logs to find out whether the kernels are unique or reused.

You can repeat this experiment with each quantity to find out which one(s) trigger recompilation.

If there are no recompilations in forward-only mode, could you please share the definition of your loss function?
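
The one-quantity-at-a-time test described above could be organized roughly as follows. This is a sketch under stated assumptions, not code from the thread: `one_factor_trials`, `run_experiment` and the callback are hypothetical names, and `fake_render` is only a stand-in for a real render call plus log inspection.

```python
def one_factor_trials(baseline, variations):
    """Yield (name, config) pairs where exactly one quantity differs from the baseline."""
    for name, values in variations.items():
        for value in values:
            config = dict(baseline)
            config[name] = value
            yield name, config


def run_experiment(baseline, variations, render_and_count_misses):
    """Sum the cache misses observed for each varied quantity.

    `render_and_count_misses(config)` is a hypothetical callback that renders
    once with `config` and returns the number of cache misses it logged.
    """
    misses = {name: 0 for name in variations}
    render_and_count_misses(baseline)  # warm-up: compile the baseline kernels once
    for name, config in one_factor_trials(baseline, variations):
        misses[name] += render_and_count_misses(config)
    return misses


# Stand-in for a real render call, for demonstration only: it pretends that
# the crop offset is baked into the kernel while the field of view is not.
_seen_offsets = set()


def fake_render(config):
    offset = config["crop_offset"]
    miss = offset not in _seen_offsets
    _seen_offsets.add(offset)
    return int(miss)


baseline = {"fov": 45.0, "crop_offset": (0, 0)}
variations = {"fov": [40.0, 50.0], "crop_offset": [(5, 5), (10, 10)]}
result = run_experiment(baseline, variations, fake_render)
print(result)  # {'fov': 0, 'crop_offset': 2}
```

Any quantity with a non-zero count after the warm-up is a candidate for triggering recompilation.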

saedrna May 18, 2024
Author

Sorry for the late reply, I have been out this week. I think I found the reason for the cache misses.
I removed almost all of the code, keeping only mi.render, and I also removed the gradient optimization part. So this is the forward-rendering-only case.
I tested the three cases below.

Film size is the whole image, no crop and varying to_sensor (according to SFM): cache HIT in every iteration.
Film size is the whole image, crop to fixed window (0,0,1024,1024 for example) and varying to_sensor (according to SFM): cache HIT in every iteration.
Film size is the whole image, crop to different window (according to the real image part) and varying to_sensor (according to SFM): cache MISS in every iteration.

Therefore, it seems to me that whenever the camera-space ray information differs, the rendering misses the cache.

merlinND May 21, 2024
Collaborator

Thanks for running these tests. Indeed, the Film class stores the crop size and offset as scalars (CPU-side literals):

mitsuba3/include/mitsuba/render/film.h

Lines 220 to 224 in c9e881b

    protected:
        ScalarVector2u m_size;
        ScalarVector2u m_crop_size;
        ScalarPoint2u m_crop_offset;
        bool m_sample_border;

If no special care is taken when using such literal values, they end up baked into the compiled kernels. And indeed, they are used by the Perspective sensor as-is:

mitsuba3/src/sensors/perspective.cpp

Lines 173 to 176 in c9e881b

    void update_camera_transforms() {
        m_camera_to_sample = perspective_projection(
            m_film->size(), m_film->crop_size(), m_film->crop_offset(),
            m_x_fov, Float(m_near_clip), Float(m_far_clip));

However, the result of that computation is made opaque at the end of the function:

mitsuba3/src/sensors/perspective.cpp

Lines 197 to 198 in c9e881b

    dr::make_opaque(m_camera_to_sample, m_sample_to_camera, m_dx, m_dy, m_x_fov,
                    m_image_rect, m_normalization, m_principal_point_offset);

So instead, I think that you should look into other usages of crop_offset() and crop_size(), for example in the Integrator and ImageBlock. If you see that quantity participating in a computation together with a JIT array (e.g. pixel position + crop offset), try replacing the literal crop offset by an opaque version (dr::opaque() or dr::make_opaque()).
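
As a rough illustration of why a baked-in literal defeats the kernel cache, here is a toy model in plain Python. It does not use Dr.Jit and does not reflect its internals: `ToyKernelCache` and the fake kernel strings are purely hypothetical. The only assumption is that the cache key is derived from the generated kernel source, so a literal crop offset changes the key, while an opaque value is read from a parameter at run time and leaves the source unchanged.

```python
import hashlib


class ToyKernelCache:
    """Caches 'compiled kernels' keyed by a hash of their source text."""

    def __init__(self):
        self.compiled = {}
        self.misses = 0

    def launch(self, source, params=()):
        # `params` are run-time inputs; they do not take part in the cache key.
        key = hashlib.sha256(source.encode()).hexdigest()[:16]
        if key not in self.compiled:
            self.misses += 1  # a real JIT would pay for a compilation here
            self.compiled[key] = source
        return key


def kernel_with_literal(crop_offset_x):
    # The value is embedded in the source: each new offset yields new source text.
    return f"pixel_x = thread_id + {crop_offset_x}"


def kernel_with_opaque():
    # The value is loaded from a parameter at run time: the source never changes.
    return "pixel_x = thread_id + load(param0)"


literal_cache = ToyKernelCache()
opaque_cache = ToyKernelCache()
for offset in (0, 5, 10, 15, 20):
    literal_cache.launch(kernel_with_literal(offset))
    opaque_cache.launch(kernel_with_opaque(), params=(offset,))

print(literal_cache.misses, opaque_cache.misses)  # 5 1
```

In this toy model, five different offsets cost five compilations in the literal case and a single one in the opaque case.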

saedrna May 23, 2024
Author

Honestly, I do not quite understand the opaque mechanism of drjit. I found this thread [#802] and tried to do something similar to what is mentioned in that issue, as shown below.

    params["sensor.to_world"] = to_world
    params["sensor.x_fov"] = x_fov
+   dr.make_opaque(params['sensor.to_world'])
+   dr.make_opaque(params['sensor.x_fov'])
    params.update()

Now my sensor code looks like this. First, I attach the sensor to the scene.

    film = mi.load_dict(
        {
            "type": "hdrfilm",
            "rfilter": {"type": "box"},
            "pixel_format": "rgba",
        }
    )
    sensor = mi.load_dict(
        {
            "type": "perspective",
            "fov": 45.0,
            "film": film,
        }
    )

    scene = mi.load_dict(
        {"type": "scene", "integrator": integrator, "shape": mesh, "light": envmap, "sensor": sensor}
    )

Then I make the parameters opaque in the render iterations.

    sensor_params = {}
    sensor_params["sensor.film.size"] = mi.ScalarVector2u(film_width, film_height)
    dr.make_opaque(sensor_params["sensor.film.size"])
    # It seems that the film size must be updated first; otherwise, updating the crop size may trigger an assertion failure (if it is larger due to a different scale)
    self.params.update(sensor_params)
    sensor_params["sensor.film.crop_size"] = mi.ScalarVector2u(crop_width, crop_height)
    sensor_params["sensor.film.crop_offset"] = mi.ScalarPoint2u(crop_offset_x, crop_offset_y)
    sensor_params["sensor.x_fov"] = mi.Float(fov_x)
    sensor_params["sensor.to_world"] = to_world
    for key in sensor_params.keys():
        dr.make_opaque(sensor_params[key])

    if self.hparams.camera_space_light:
        sensor_params[self.key_lgt_xform] = to_world

    self.params.update(sensor_params)
    result = mi.render(self.scene, params=self.params, spp=num_spp * 2, spp_grad=num_spp)

But this still does not solve the cache miss issue.

saedrna · 2024-05-23T15:49:59Z

saedrna
May 23, 2024
Author

I managed to create two reproducible cases that show the problem. Changing the sensor results in 5 cache hits, but changing the film causes cache misses. I would expect that changing fov_x definitely changes the rays, yet the cache is still hit (which is strange to me).

I removed the directory ~/.drjit before running the script below.

import mitsuba as mi
import drjit as dr

mi.set_variant('cuda_ad_rgb')

def test_sensor():
    scene = mi.load_dict(mi.cornell_box())
    img = mi.render(scene)  # Launches 2 kernels
    dr.eval()

    dr.set_log_level(dr.LogLevel.Warn)  # Suppress kernel launches
    params = mi.traverse(scene)

    for i in range(5):
        dr.set_log_level(dr.LogLevel.Warn)  # Suppress kernel launches
        params['sensor.x_fov'] = params['sensor.x_fov'] + 0.5
        params['sensor.to_world'] = params['sensor.to_world'] @ mi.Transform4f.translate([0.1, 0, 0])
        dr.make_opaque(params['sensor.to_world'])
        dr.make_opaque(params['sensor.x_fov'])
        params.update()
        dr.eval()  # In case there are scheduled variables

        dr.set_log_level(dr.LogLevel.Info)  # Log kernel launches
        img = mi.render(scene)
        mi.Bitmap(img).write(f'sensor_{i}.exr')


def test_film():
    scene = mi.load_dict(mi.cornell_box())
    img = mi.render(scene)  # Launches 2 kernels
    dr.eval()

    dr.set_log_level(dr.LogLevel.Warn)  # Suppress kernel launches
    params = mi.traverse(scene)

    for i in range(5):
        dr.set_log_level(dr.LogLevel.Warn)  # Suppress kernel launches
        params['sensor.film.crop_offset'] = mi.ScalarVector2u(i * 5, i * 5)
        params['sensor.film.crop_size'] = mi.ScalarVector2u(128, 128)

        dr.make_opaque(params['sensor.film.crop_offset'])
        dr.make_opaque(params['sensor.film.crop_size'])
        params.update()
        dr.eval()  # In case there are scheduled variables

        dr.set_log_level(dr.LogLevel.Info)  # Log kernel launches
        img = mi.render(scene)
        mi.Bitmap(img).write(f'film_{i}.exr')


print('Launch sensor kernels ...')
test_sensor()
print('Launch film kernels ...')
test_film()

I also attached the log below.

Launch sensor kernels ...
jit_eval(): launching 1 kernel.
  -> launching db6476cb2f433782 (via OptiX, n=4194304, in=48, out=0, se=1, ops=1217, jit=375.233 us):
     cache miss, build: 106.832 ms, 0 B.
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching 694449ebc8bb02b2 (n=196608, in=1, out=1, ops=21, jit=7.88 us):
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching db6476cb2f433782 (via OptiX, n=4194304, in=48, out=0, se=1, ops=1217, jit=404.023 us):
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching 694449ebc8bb02b2 (n=196608, in=1, out=1, ops=21, jit=5.102 us):
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching db6476cb2f433782 (via OptiX, n=4194304, in=48, out=0, se=1, ops=1217, jit=380.82 us):
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching 694449ebc8bb02b2 (n=196608, in=1, out=1, ops=21, jit=4.811 us):
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching db6476cb2f433782 (via OptiX, n=4194304, in=48, out=0, se=1, ops=1217, jit=382.372 us):
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching 694449ebc8bb02b2 (n=196608, in=1, out=1, ops=21, jit=4.58 us):
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching db6476cb2f433782 (via OptiX, n=4194304, in=48, out=0, se=1, ops=1217, jit=810.283 us):
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching 694449ebc8bb02b2 (n=196608, in=1, out=1, ops=21, jit=9.942 us):
jit_eval(): done.
Launch film kernels ...
jit_eval(): launching 1 kernel.
  -> launching bf2e82abe1a3e3af (n=1, in=2, out=1, ops=9, jit=8.293 us):
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching f8ff08575273c672 (n=1, in=2, out=1, ops=10, jit=2.822 us):
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching cbfb9a23d3173c8c (n=1, in=1, out=1, ops=7, jit=1.829 us):
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching 13289b644e29b02a (via OptiX, n=4194304, in=48, out=0, se=1, ops=1217, jit=547.959 us):
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching 694449ebc8bb02b2 (n=196608, in=1, out=1, ops=21, jit=6.657 us):
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching 73adc348244ba07e (via OptiX, n=1048576, in=48, out=0, se=1, ops=1217, jit=391.27 us):
     cache miss, build: 109.276 ms, 0 B.
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching 694449ebc8bb02b2 (n=49152, in=1, out=1, ops=21, jit=12.318 us):
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching 2a4d7243d54e468c (via OptiX, n=1048576, in=48, out=0, se=1, ops=1222, jit=389.18 us):
     cache miss, build: 108.419 ms, 0 B.
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching 694449ebc8bb02b2 (n=49152, in=1, out=1, ops=21, jit=8.433 us):
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching 4e26a70e9de7ab1b (via OptiX, n=1048576, in=48, out=0, se=1, ops=1223, jit=389.113 us):
     cache miss, build: 110.472 ms, 0 B.
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching 694449ebc8bb02b2 (n=49152, in=1, out=1, ops=21, jit=8.43 us):
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching ea08e7b15709d7f6 (via OptiX, n=1048576, in=48, out=0, se=1, ops=1223, jit=387.838 us):
     cache miss, build: 109.513 ms, 0 B.
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching 694449ebc8bb02b2 (n=49152, in=1, out=1, ops=21, jit=8.989 us):
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching 0a53a94d12cbb4bf (via OptiX, n=1048576, in=48, out=0, se=1, ops=1223, jit=390.398 us):
     cache miss, build: 105.258 ms, 0 B.
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching 694449ebc8bb02b2 (n=49152, in=1, out=1, ops=21, jit=8.186 us):
jit_eval(): done.
jit_shutdown(): detected variable leaks:
 - variable r930 is still being referenced! (ref=1, ref_se=0, type=void, size=1, stmt="", dep=[0, 0, 0, 0])
jit_shutdown(): 1 variables are still referenced!
jit_malloc_shutdown(): leaked
 - device memory: 256 B in 4 allocations

Process finished with exit code 139 (interrupted by signal 11:SIGSEGV)

0 replies

njroussel · 2024-05-29T12:46:39Z

njroussel
May 29, 2024
Collaborator

Hi @saedrna

To answer your original questions, and as you've observed in your experiments, kernels are/should be reused in the following cases:
[ ] Change the width and height of the film
[ ] Change the crop_offset and crop_width
[x] Change the fov of sensor
[x] Change the to_world

Why not films? There's actually a PR and an issue about this already: #920 and #908. The PR isn't perfect and we never got around to totally fixing it.

0 replies
