Slow execution of point-by-point refinement #348

jadball · 2024-11-09T12:25:27Z

The refine_map function is running quite slowly, even on a decent system. We use prange from Numba to iterate over each pixel of the reconstruction space. At each pixel, we need to access a masked subset of 8 icolf columns (masking the sinogram).

Quite frequently, when monitoring with htop, the processes that Numba creates are sitting in the D state, indicating uninterruptible sleep. Checking the wait channel with ps -o pid,stat,wchan:30,command -x gives:

```
  PID STAT WCHAN
55794 Dl   do_user_addr_fault
```
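Numba's parallel workers are threads inside a single Python process, so the same state and wait-channel information can also be collected per thread. A minimal sketch, assuming Linux and its /proc filesystem (the helper names thread_states and blocked_threads are hypothetical):

```python
import os


def thread_states(pid):
    """Return a list of (tid, state, wchan) for every thread of `pid`.

    Linux only: reads /proc/<pid>/task/<tid>/stat and .../wchan.
    `state` is the one-letter code shown by ps/htop (R, S, D, ...).
    """
    rows = []
    task_dir = f"/proc/{pid}/task"
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                stat = f.read()
        except OSError:
            continue  # thread exited between listdir() and open()
        # The thread name is in parentheses and may contain spaces, so take
        # the first field after the last ')' as the state.
        state = stat.rsplit(")", 1)[1].split()[0]
        try:
            with open(f"{task_dir}/{tid}/wchan") as f:
                wchan = f.read().strip() or "0"
        except OSError:
            wchan = "?"  # thread exited, or wchan is not readable for us
        rows.append((int(tid), state, wchan))
    return rows


def blocked_threads(pid):
    """Threads of `pid` currently in uninterruptible sleep (state D)."""
    return [row for row in thread_states(pid) if row[1] == "D"]
```

Polling blocked_threads(55794) while refine_map runs would show how many workers are stalled at the same time and in which kernel function they are waiting.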

I believe this indicates a pagefault caused by our memory access pattern - we have many parallel processes (could be up to 196 at the ESRF) all wanting different portions of the same columnfile.
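One way to test this hypothesis is to count the page faults incurred by a call. A sketch using Python's standard resource module (Unix only): ru_minflt counts minor faults, which are resolved without disk I/O and include the first touch of newly allocated memory. On Linux, RUSAGE_SELF sums over all threads of the process, so faults taken by Numba workers are included. minor_faults and touch_new_buffer are hypothetical helpers; the latter only stands in for the real workload.

```python
import resource


def minor_faults(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, minor page faults incurred).

    The count is process-wide, so faults from worker threads are included,
    but so are faults from anything else running concurrently.
    """
    before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    result = fn(*args, **kwargs)
    after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    return result, after - before


def touch_new_buffer(nbytes=64 * 1024 * 1024):
    """Stand-in workload: allocate a fresh buffer and write once per 4 KiB page."""
    buf = bytearray(nbytes)
    for i in range(0, nbytes, 4096):
        buf[i] = 1
    return len(buf)


result, faults = minor_faults(touch_new_buffer)
print(f"{faults} minor page faults")
```

Wrapping a call such as refine_map(...) in minor_faults and comparing the count with the number of pixels refined would suggest whether memory is being allocated per pixel.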

Is there a way we can mitigate this with the right columnfile sorting, or clever chunking/grouping?
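As a sketch of what "the right columnfile sorting" could mean: if peaks are sorted by omega bin and then by dty, the peaks inside a dty window for one omega bin form a contiguous slice that np.searchsorted finds by binary search, so a pixel reads a few short contiguous runs of each column instead of evaluating a full-length boolean mask. This assumes the per-pixel mask is a dty window around a position predicted for each omega bin; the names omega_bin, dty, ypred and tol are hypothetical, and the real mask in refine_map may use other criteria.

```python
import numpy as np


def sort_peaks(omega_bin, dty):
    """Permutation sorting peaks by omega bin, then dty, plus bin offsets.

    omega_bin must hold non-negative integers. After sorting, the peaks of
    bin b occupy positions starts[b]:starts[b + 1], in increasing dty.
    """
    order = np.lexsort((dty, omega_bin))  # last key (omega_bin) is primary
    counts = np.bincount(omega_bin)
    starts = np.concatenate(([0], np.cumsum(counts)))
    return order, starts


def select_window(dty_sorted, starts, ypred, tol):
    """Positions (in sorted order) of peaks with ypred[b] - tol <= dty <= ypred[b] + tol.

    Each omega bin contributes one contiguous slice found by binary search,
    instead of evaluating a boolean mask over every peak.
    """
    pieces = []
    for b in range(len(starts) - 1):
        lo, hi = starts[b], starts[b + 1]
        seg = dty_sorted[lo:hi]
        first = np.searchsorted(seg, ypred[b] - tol, side="left")
        last = np.searchsorted(seg, ypred[b] + tol, side="right")
        pieces.append(np.arange(lo + first, lo + last))
    if not pieces:
        return np.empty(0, dtype=np.intp)
    return np.concatenate(pieces)
```

The permutation would be applied once to every column (for example dty = dty[order]), so that the selected slices are contiguous in memory for all the columns a pixel reads.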

ImageD11/ImageD11/sinograms/point_by_point.py

Line 1385 in 83ad4d7

```python
def refine_map(refine_points, all_pbpmap_ubis, ri_col, rj_col, sx_grid, sy_grid, mask,  # refinement stuff
```


jonwright · 2024-11-12T07:43:32Z (via email)

> I believe this indicates a pagefault caused by our memory access pattern - we have many parallel processes (could be up to 196 at the ESRF) all wanting different portions of the same columnfile.

Page fault = you asked for a new page because you are allocating memory (this is usually the problem).

Read memory access = there is no conflict between threads for reads, but they share the same L1/L2/L3 cache. It might be better for adjacent threads to work on adjacent points in space (e.g. grid tiles).

Multiprocessing vs threading = each process manages its own heap, so you avoid thread conflicts on memory management.

Did you try running under py-spy? This might give clues. Otherwise, try https://github.com/pythonspeed/profila

Given that you are merging reflections (line 301; perhaps the weights are not optimal here), you could make a static array for each thread with dimensions [ hmax, kmax, lmax, 2, NY ], then merge the reflections into this static array. It is the same idea as the sinogram merging that I added in #331 (moment sinograms).

Not sure this helps ;-)
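A minimal sketch of two of the suggestions above (grid tiles, and a static merge array reused across points), assuming Numba; it falls back to plain Python if Numba is not installed. Points are numbered tile by tile so that each chunk handed to a thread covers a compact patch of the grid, and each chunk allocates one accumulator of shape (nh, nk, nl, 2, ny) once and reuses it for every point, resetting only the cells it touched. The CSR-style ptr/idx peak lists, h, k, l already shifted to non-negative indices, and the use of summed intensity and peak count as the two channels are assumptions for illustration, not what refine_map actually does.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # plain-Python fallback so the sketch still runs
    prange = range

    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]  # used as @njit
        return lambda f: f  # used as @njit(...)


def tile_order(ni, nj, tile=8):
    """(i, j) indices of an ni x nj grid, listed tile by tile (row-major inside a tile)."""
    i, j = np.meshgrid(np.arange(ni), np.arange(nj), indexing="ij")
    i, j = i.ravel(), j.ravel()
    # np.lexsort uses the last key as the primary one.
    order = np.lexsort((j % tile, i % tile, j // tile, i // tile))
    return i[order], j[order]


@njit(parallel=True)
def merge_per_point(ptr, idx, h, k, l, iy, intensity, dims, chunk):
    """Merge the peaks idx[ptr[p]:ptr[p+1]] of each point p by (h, k, l, iy).

    Returns per point the number of merged reflections and the sum of their
    mean intensities (a stand-in for the real per-point work).
    """
    npts = len(ptr) - 1
    nmerged = np.zeros(npts, dtype=np.int64)
    score = np.zeros(npts)
    starts = np.arange(0, npts, chunk)
    for c in prange(len(starts)):
        lo = starts[c]
        hi = min(lo + chunk, npts)
        # Allocated once per chunk, not once per point.
        # Channel 0 = summed intensity, channel 1 = number of peaks.
        acc = np.zeros((dims[0], dims[1], dims[2], 2, dims[3]))
        for p in range(lo, hi):
            for t in range(ptr[p], ptr[p + 1]):
                j = idx[t]
                if acc[h[j], k[j], l[j], 1, iy[j]] == 0:
                    nmerged[p] += 1
                acc[h[j], k[j], l[j], 0, iy[j]] += intensity[j]
                acc[h[j], k[j], l[j], 1, iy[j]] += 1.0
            # Read each touched cell once and zero it, rather than clearing
            # the whole (possibly large) accumulator for every point.
            for t in range(ptr[p], ptr[p + 1]):
                j = idx[t]
                cnt = acc[h[j], k[j], l[j], 1, iy[j]]
                if cnt > 0:
                    score[p] += acc[h[j], k[j], l[j], 0, iy[j]] / cnt
                    acc[h[j], k[j], l[j], 0, iy[j]] = 0.0
                    acc[h[j], k[j], l[j], 1, iy[j]] = 0.0
    return nmerged, score
```

Allocating per chunk rather than strictly per thread costs a few extra allocations when there are more chunks than threads, but the number of allocations no longer grows with the number of points. Building ptr/idx in the order returned by tile_order keeps neighbouring points in the same chunk.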
