The refine_map function is running quite slowly, even on a decent system. We use prange from Numba to iterate over each pixel of reconstruction space. At each pixel, we need to access a masked subset of 8 icolf columns (masking the sinogram).
Quite frequently when monitoring with htop, the processes that Numba creates are sitting in a D state, indicating uninterruptible sleep. Checking the wait channel with ps -o pid,stat,wchan:30,command -x gives:
PID STAT WCHAN
55794 Dl do_user_addr_fault
I believe this indicates a pagefault caused by our memory access pattern - we have many parallel processes (could be up to 196 at the ESRF) all wanting different portions of the same columnfile.
Is there a way we can mitigate this with the right columnfile sorting, or clever chunking/grouping?
> I believe this indicates a pagefault caused by our memory access pattern - we
> have many parallel processes (could be up to 196 at the ESRF) all wanting
> different portions of the same columnfile.
Pagefault = you asked for a new page because you are allocating memory (this is
usually the problem).
Read memory access = there is no conflict between threads for reads, but they
share the same L1/L2/L3 cache. It might be better for adjacent threads to work
on adjacent points in space (e.g. grid tiles).
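A minimal sketch of the grid-tile idea, assuming the reconstruction pixels sit on a regular (i, j) grid. The helper name `tile_order` and the tile size are hypothetical and not part of ImageD11; the function only produces a tile-by-tile visiting order.

```python
def tile_order(ni, nj, tile=8):
    """Return (i, j) pixel coordinates ordered tile by tile.

    Pixels inside one tile are listed consecutively, so consecutive
    iterations touch neighbouring points in reconstruction space.
    """
    order = []
    for ti in range(0, ni, tile):          # top-left corner of each tile
        for tj in range(0, nj, tile):
            for i in range(ti, min(ti + tile, ni)):
                for j in range(tj, min(tj + tile, nj)):
                    order.append((i, j))
    return order
```

The parallel loop could then run over this list (converted to an integer array) instead of over raw row-major pixel indices. Whether this helps depends on how the scheduler hands iterations to threads and on how much the sinogram masks of neighbouring pixels overlap, so it would need measuring.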
Multiprocessing vs threading = each process manages its own heap and so you
avoid thread conflicts on memory management.
Did you try to run under py-spy? This might give clues. Otherwise
https://github.com/pythonspeed/profila
Given that you are merging reflections (line 301, perhaps the weights are not
optimal here): you could make a static array for each thread with dimensions:
[ hmax, kmax, lmax, 2, NY ]
Then merge the reflections into this static array. It is the same idea as the
sinogram merging that I added in #331 (moment sinograms).
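A rough sketch of the per-thread static array, under several assumptions that are not from the ImageD11 code: h, k, l are already shifted to non-negative integer indices, `iy` is the integer dty bin, and the axis of length 2 holds (sum of intensity, count). The function name `merge_reflections` and the explicit chunking are hypothetical.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fall back to plain Python so the sketch still runs
    prange = range

    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@njit(parallel=True)
def merge_reflections(h, k, l, iy, intensity, hmax, kmax, lmax, ny, nchunks):
    # One private accumulator per chunk, allocated once up front:
    # [chunk, hmax, kmax, lmax, 2, NY]
    acc = np.zeros((nchunks, hmax, kmax, lmax, 2, ny))
    n = h.shape[0]
    step = (n + nchunks - 1) // nchunks
    for c in prange(nchunks):
        lo = c * step
        hi = min(lo + step, n)
        for p in range(lo, hi):
            # Assumed layout: slot 0 = summed intensity, slot 1 = count
            acc[c, h[p], k[p], l[p], 0, iy[p]] += intensity[p]
            acc[c, h[p], k[p], l[p], 1, iy[p]] += 1.0
    # Serial reduction of the private accumulators
    merged = np.zeros((hmax, kmax, lmax, 2, ny))
    for c in range(nchunks):
        merged += acc[c]
    return merged
```

Each chunk writes only to its own slice, so no allocation or locking happens inside the parallel loop. The cost is `nchunks` copies of the array, which may be too much memory for large hmax, kmax, lmax or NY.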
Not sure this helps ;-)
ImageD11/ImageD11/sinograms/point_by_point.py
Line 1385 in 83ad4d7