Hi dear author,
It's an honor to open an issue here. I compiled your program "raytracing" successfully, and I used nvprof to measure sm_efficiency, which is only 1.79%.
That's actually super interesting, but not very surprising. My code only uses a single SM, and the P100 has 56 of them. Some quick back-of-the-napkin math tells us that the one SM we are using is busy 100% of the time, because 100 / 56 = 1.7857, or about 1.79%.
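The back-of-the-napkin math above can be checked in a couple of lines (the SM count is the P100's published spec; the "1 active SM" figure is my reading of the reply above):

```python
# If only 1 of the P100's 56 SMs is ever active, the average
# sm_efficiency across the whole GPU is 1/56.
NUM_SMS_P100 = 56
active_sms = 1
sm_efficiency = 100.0 * active_sms / NUM_SMS_P100
print(f"{sm_efficiency:.2f}%")  # 1.79%
```

That matches the 1.79% nvprof reports, so the metric is averaging one fully busy SM over 56 mostly idle ones.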
Now, an interesting question would be: how can we change the code to use all available SMs? To be clear, I don't have an answer, but as far as I could tell from Googling around, this is not something you normally have to do by hand, since the CUDA scheduler is smart enough to figure out a good allocation for us. My intuition is that since every pixel takes a while to render (by default I run with a depth of 400), there is no need for multiple SMs, because the one SM we do use is already fully utilized.
That being said, if you figure out a way to make the code faster I am always interested!
Thank you.
Best Regards
William