Summary

Once again, thanks for the amazing work and sorry for another silly question. I am trying to implement a custom BRDF where I try to look up some rows from a I would appreciate it if you could give me some feedback.

System configuration

System information:
OS: Windows-10
Dr.Jit: 0.4.4
Installed Mitsuba with:
Installed PyTorch with:
Description

My original code is more involved and uses For simplicity, here is a silly reproducer that should roughly give you an idea of what I am trying to do. Unless I set the following flags

dr.set_flag(dr.JitFlag.VCallRecord, False)
dr.set_flag(dr.JitFlag.LoopRecord, False)

the following lines of code do not work in my custom BSDF:

# ... my custom BSDF
def sample(
    self: mi.BSDF,
    ctx: mi.BSDFContext,
    si: mi.SurfaceInteraction3f,
    sample1: float,
    sample2: mi.Point2f,
    active: bool = True,
):
    cos_theta_i = mi.Frame3f.cos_theta(si.wi)
    # fill up the BSDFSample
    bs = mi.BSDFSample3f()
    active &= cos_theta_i > 0.0
    """ NOTE: The original code is like this:
    if (unlikely(dr::none_or<false>(active) ||
                 !ctx.is_enabled(BSDFFlags::DiffuseReflection)))
        return { bs, 0.f };
    # Somehow cannot combine `active` with the expression below
    """
    if not ctx.is_enabled(mi.BSDFFlags.DiffuseReflection):
        return (bs, 0.0)
    ### here is the reproducer
    a, b = 10, 9
    tmp_idx = dr.clamp(mi.UInt(sample2.x * a), 0, a - 1)
    tmp_data = dr.zeros(mi.TensorXf, (a, b))
    value = tmp_data[tmp_idx]
    ###
    bs.wo = mi.warp.square_to_cosine_hemisphere(sample2)
    bs.pdf = mi.warp.square_to_cosine_hemisphere_pdf(bs.wo)
    bs.eta = 1.0
    bs.sampled_type = mi.UInt32(+mi.BSDFFlags.DiffuseReflection)
    bs.sampled_component = 0
    value = self.m_albedo.eval(si, active)
    return (
        bs,
        dr.select(active & (bs.pdf > 0.0), mi.depolarizer(value), 0.0),
    )

I also tried disabling loop recording locally, like this, but it did not work:

### reproducer
loop_record = dr.flag(dr.JitFlag.LoopRecord)
vcall_record = dr.flag(dr.JitFlag.VCallRecord)
dr.set_flag(dr.JitFlag.LoopRecord, False)
dr.set_flag(dr.JitFlag.VCallRecord, False)
a, b = 10, 9
tmp_idx = dr.clamp(mi.UInt(sample2.x * a), 0, a - 1)
tmp_data = dr.zeros(mi.TensorXf, (a, b))
value = tmp_data[tmp_idx]
dr.set_flag(dr.JitFlag.LoopRecord, loop_record)
dr.set_flag(dr.JitFlag.VCallRecord, vcall_record)
###

I also tried by flattening the For efficiency reasons, I am trying to run things in megakernel/recorded mode. I've seen different discussions somehow related to this (#1004, #866). But I am still wondering if it would be possible to access the Is there a way to do this without disabling the loop record? I would really appreciate your feedback!

PS: Please let me know if you need further information.
Replies: 1 comment 3 replies
Hello @sapo17,

From the reproducer I am not sure I understand the role of Note however that you will have to gather values "one by one" (from the point of view of each thread); you cannot get a dynamically-sized row of the

Please also remember to always share full error messages when reporting issues or asking questions; it helps narrow down the problem.
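To picture what gathering "one by one" means, here is a minimal sketch in plain Python, with a flat list standing in for the row-major storage of a tensor of shape (a, b). It only illustrates the index arithmetic and is not Dr.Jit code: `row_index`, `gather_entry`, `flat` and `samples` are hypothetical names, and in Dr.Jit the lookup would presumably be a `dr.gather` on the tensor's flat `.array` instead of a list comprehension.

```python
# Plain-Python stand-in for a row-major (a, b) tensor stored as a flat list.
a, b = 10, 9
flat = [float(i) for i in range(a * b)]  # entry (r, c) lives at index r * b + c


def row_index(sample_x, a):
    """Map a sample in [0, 1] to a row index clamped to [0, a - 1]."""
    return min(max(int(sample_x * a), 0), a - 1)


def gather_entry(flat, rows, col, b):
    """Each 'thread' reads exactly one value: column `col` of its own row."""
    return [flat[r * b + col] for r in rows]


# Three hypothetical threads, each with its own sample value.
samples = [0.0, 0.45, 1.0]
rows = [row_index(s, a) for s in samples]
col0 = gather_entry(flat, rows, 0, b)
```

The result has one value per thread, so its width equals the number of threads. Reading a full row would mean repeating the gather once per column.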
Thanks for the additional detail @sapo17.
I think that the key thing to understand is that in symbolic mode, e.g. inside of a symbolic (recorded) loop of the path tracer, the wavefront size (== width of the variables == number of threads the kernel will be launched with) is fixed.
Outside of a symbolic loop, Dr.Jit would automatically introduce a kernel boundary for you and launch the different kernels with their required widths. But in the body of a recorded loop, that is not possible.
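The fixed-width constraint can be mimicked with a toy model in plain Python. This is only a sketch of the bookkeeping, not how Dr.Jit is implemented: `check_symbolic_width` is a hypothetical helper, the sizes are made up, and the rule that width-1 values are accepted reflects ordinary scalar broadcasting.

```python
def check_symbolic_width(values, wavefront_size):
    """Toy rule: inside a recorded loop a variable must have the wavefront
    width (or width 1, which broadcasts). Any other width is rejected."""
    width = len(values)
    if width not in (1, wavefront_size):
        raise ValueError(f"width {width} != wavefront size {wavefront_size}")
    return values


wavefront_size, b = 4, 9   # hypothetical number of rays and row length
tmp_idx = [0, 3, 3, 7]     # one row index per thread

# Fetching whole rows at once yields wavefront_size * b flat indices: rejected.
whole_rows = [r * b + c for r in tmp_idx for c in range(b)]
try:
    check_symbolic_width(whole_rows, wavefront_size)
    rejected = False
except ValueError:
    rejected = True

# Visiting one column at a time keeps every intermediate at the wavefront width.
per_column = [
    check_symbolic_width([r * b + c for r in tmp_idx], wavefront_size)
    for c in range(b)
]
```

In this model the whole-row lookup fails because its width is `wavefront_size * b`, while the column-by-column variant passes because each step produces exactly one index per thread.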
In your example code, you have:
dr.width(tmp_idx) == dr.width(sample2.x): the wavefront size (= number of rays), which is fine
wavefront_size * n (?) in get_c…