Hello, I modified the main loop of `render_sample` to handle non-uniform sampling. It works fine, even with passes, but with passes the memory cost becomes very high (and abnormal). The adaptive sampling value is retrieved from a Tensor, and the loop looks like this:

```cpp
dr::Loop<Bool> sampling_loop("Adaptive sampling", local_sampling, adaptive_sampling, pos);
while (sampling_loop(local_sampling < adaptive_sampling)) {
    dr::scatter(pos, coords, offset + local_sampling);
    local_sampling += 1;
}
```

For passes, I filter the values with […]. Is there a way to improve this structure to avoid cache flushes?
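A minimal sketch of what "retrieved from a Tensor" could look like with Dr.Jit's C++ API, for context (a CUDA backend is assumed, and `spp_tensor`, `pixel_index`, and `per_pixel_spp` are placeholder names, not code from the post):

```cpp
#include <drjit/cuda.h>    // dr::CUDAArray
#include <drjit/tensor.h>  // dr::Tensor

namespace dr = drjit;

using Float    = dr::CUDAArray<float>;
using UInt32   = dr::CUDAArray<uint32_t>;
using TensorXf = dr::Tensor<Float>;

// Look up one target sample count per pixel from the tensor's flat storage.
UInt32 per_pixel_spp(const TensorXf &spp_tensor, const UInt32 &pixel_index) {
    Float spp = dr::gather<Float>(spp_tensor.array(), pixel_index);
    return UInt32(spp); // truncate to an integer sample budget
}
```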
Hi @Angom8

I think I understood what you're doing.

The high memory usage is somewhat expected, I believe. With this approach you effectively need to write `pos` to global memory, so that requires a storage of `N_rays * 3 * 4` bytes. In general, you want to avoid storing anything that scales with your number of rays (that's one of the goals of megakernels). You should be able to write this without any `scatter` to generate `pos`.

I'm confused about the passes here. Are you not updating the positions between them? If so, why are they run separately? I don't think this matters in any case, but I think I'm misunderstanding your explanation.

Fundamentally, I don't think there is any reason why you shouldn't be able to achieve perfect cache reuse here. Every render step has the same set of computations; they only differ in what their initial rays are.
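To put the `N_rays * 3 * 4` bytes figure above in perspective: at 1920×1080 with 64 samples per pixel, that is already roughly 1.6 GB for the positions alone. Below is a minimal sketch of the scatter-free structure suggested here (a CUDA backend is assumed; `adaptive_render` and `render_one_sample` are hypothetical names, not part of Mitsuba's API): each lane keeps its running sample index and radiance accumulator as loop state, so nothing that scales with rays × samples is ever written to global memory.

```cpp
#include <drjit/cuda.h>   // dr::CUDAArray
#include <drjit/loop.h>   // dr::Loop

namespace dr = drjit;

using Float  = dr::CUDAArray<float>;
using UInt32 = dr::CUDAArray<uint32_t>;
using Bool   = dr::mask_t<Float>;

// Hypothetical helper: maps (pixel index, sample index) to a radiance sample.
extern Float render_one_sample(const UInt32 &pixel_index, const UInt32 &sample_idx);

// One lane per pixel; each lane consumes its own adaptive sample budget.
Float adaptive_render(const UInt32 &pixel_index, const UInt32 &adaptive_spp) {
    size_t n = dr::width(pixel_index);
    UInt32 sample_idx = dr::zeros<UInt32>(n);
    Float  accum      = dr::zeros<Float>(n);

    dr::Loop<Bool> loop("Adaptive sampling", sample_idx, accum);
    while (loop(sample_idx < adaptive_spp)) {
        // The sample position is derived on the fly from (pixel, sample index);
        // it stays in registers and is never scattered to a global 'pos' buffer.
        accum += render_one_sample(pixel_index, sample_idx);
        sample_idx += 1;
    }

    // Average over the per-pixel budget (assumed to be at least one sample).
    return accum / Float(adaptive_spp);
}
```

The per-pixel budget from the question could be passed in as `adaptive_spp` here; the only remaining global write would be the final accumulation into the film, which scales with the number of pixels rather than with the number of samples.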