Add support for ndimage.map_coordinates #235

Open
jni opened this issue May 26, 2021 · 12 comments

@jni
Contributor

jni commented May 26, 2021

This could be done in two stages:

  1. Add support assuming that the number of coordinates is small and fits in worker memory. This is useful when grabbing a small number of coordinates from a large distributed array (which happens to be my use case! 😊), and should be relatively easy to do.
  2. Add support when the coordinates array is also distributed. This is more challenging.
@jakirkham
Member

This seems like a map_overlap case. Guess the question is how much overlap (probably a function of spline order)?

Maybe I'm missing something, but what makes the distributed case harder? It seems like this should be separable by chunks of coordinates.

@GenevieveBuckley
Collaborator

This seems like a map_overlap case. Guess the question is how much overlap (probably a function of spline order)?

Yes, except that this is another case where the output isn't going to be the same shape as the input array, which complicates things. We had planned to have a discussion about this in general - I've sent an email to sort out scheduling for next week.

import numpy as np
import scipy.ndimage as ndi
import dask.array as da

image = da.from_array(np.arange(100.).reshape((10, 10)), chunks=5)
# `inds` is a (2, n) array of point coordinates; its definition isn't shown here

ndi.map_coordinates(np.array(image), inds)
array([ 3.75660377, 24.        ])  # correct output from scipy

image.map_blocks(ndi.map_coordinates, coordinates=inds, drop_axis=[0, 1]).compute()
array([ 3.75660377, 24.        ])  # looks ok, but won't be robust to edge effects between array chunks

image.map_overlap(ndi.map_coordinates, depth=3, coordinates=inds, drop_axis=[0, 1]).compute()
array([18.36856823,  1.        ])  # nope!

@GenevieveBuckley
Collaborator

Maybe I'm missing something, but what makes the distributed case harder? It seems like this should be separable by chunks of coordinates.

I don't think you're missing anything; it seems like it should be achievable.

@GenevieveBuckley
Collaborator

I think we need to add the overlap depth to the indices we're looking up, but I can't quite wrap my head around what boundary conditions we need.

@GenevieveBuckley
Collaborator

E.g.:

depth_value = 3
image.map_overlap(ndi.map_coordinates, depth=depth_value, coordinates=inds + depth_value, boundary=0.0, drop_axis=[0, 1]).compute()
array([ 5.09632784, 24.        ])  # still not correct

@jni
Contributor Author

jni commented May 26, 2021

@jakirkham the issue I'm pre-empting (which might indeed not be super serious) is that the coordinates I want to extract could be in any chunk, and presented in any order. So I can't just chunk my input coordinates naively; I have to partition them according to the chunk of the raw data that each one needs to access.

Or I guess I could send every chunk of the coordinates to every chunk of the raw data, but that seems wasteful.
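
One way to do that partitioning, as a rough sketch (not dask-image code, and assuming the coordinates fit in memory as a NumPy array): read the chunk boundaries off image.chunks and use np.searchsorted to find which chunk each point falls into, then bucket the point indices by chunk.

import numpy as np
import dask.array as da

image = da.from_array(np.arange(100.).reshape((10, 10)), chunks=5)
coords = np.array([[1.5, 7.2, 3.0],
                   [8.1, 2.4, 9.9]])  # (ndim, npoints), in arbitrary order

# chunk edges along each axis, e.g. chunks=((5, 5), (5, 5)) -> edges [0, 5, 10]
edges = [np.cumsum((0,) + c) for c in image.chunks]

# for every point, the index of the chunk it falls into along each axis
chunk_index = np.stack([
    np.clip(np.searchsorted(e, coords[d], side="right") - 1, 0, len(e) - 2)
    for d, e in enumerate(edges)
])

# bucket the point indices by the chunk they belong to
groups = {}
for i, key in enumerate(map(tuple, chunk_index.T)):
    groups.setdefault(key, []).append(i)
print(groups)  # {(0, 1): [0, 2], (1, 0): [1]}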

@m-albert
Collaborator

m-albert commented May 26, 2021

I'd also think that "stage 2" is not easy to solve optimally. For the case of stage 1 (which should cover most use cases for now): The problem I'd see with directly applying map_overlap and dropping the image axes would be that the full image is gathered to be input to map_coordinates. So first the coordinates would need to be assigned to the chunks they map into.

However, applying map_overlap between a chunked coordinate array and the image would probably be inefficient, since many unmapped chunks would be loaded unnecessarily. For applications like this it would be nice to be able to mask entire chunks so that dask graphs could be culled smartly (as suggested by @GenevieveBuckley here dask/dask#7652, see also #217).

Alternatively, for every coordinate the required input slice could be determined (see dask_image.ndinterp.affine_transform for how this depends on the order). Then the coordinates would need to be sent to different tasks, and determining how to do this best is probably a nontrivial optimization problem. A simple (though far from optimal) solution could be to group them by the image chunk they match to, which seems like a natural choice. Then for each group a task could be defined that precisely slices the input array (compared to map_overlap, this would also avoid constructing overlaps for the dimensions where it's not needed).
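
As a rough sketch of that grouped slicing (reusing the hypothetical grouping from the sketch above, and guessing a spline margin of order + 1 rather than the exact value dask_image.ndinterp.affine_transform uses): for each group, slice the dask array tightly around its points plus the margin, materialize only that region, interpolate locally, and write the results back in the original point order.

import numpy as np
import scipy.ndimage as ndi
import dask.array as da

def evaluate_group(image, coords, point_idx, order=3):
    # slice the dask array tightly around one group of points (plus a margin
    # for the spline support), materialize only that region, and interpolate
    pts = coords[:, point_idx]
    margin = order + 1  # assumed margin; results near the cut edges are approximate
    lo = np.maximum(np.floor(pts.min(axis=1)).astype(int) - margin, 0)
    hi = np.minimum(np.ceil(pts.max(axis=1)).astype(int) + margin + 1, image.shape)
    region = np.asarray(image[tuple(slice(l, h) for l, h in zip(lo, hi))])
    return ndi.map_coordinates(region, pts - lo[:, np.newaxis], order=order)

image = da.from_array(np.arange(100.).reshape((10, 10)), chunks=5)
coords = np.array([[1.5, 7.2, 3.0],
                   [8.1, 2.4, 9.9]])
groups = {(0, 1): [0, 2], (1, 0): [1]}  # as computed in the previous sketch

out = np.empty(coords.shape[1])
for idx in groups.values():
    out[idx] = evaluate_group(image, coords, idx)
print(out)  # values in the original point order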

@jakirkham
Member

jakirkham commented May 26, 2021

Yeah, I was thinking about this when I wrote the above last night. This sounds kind of like a shuffle, but I have been spending a fair bit of time on that use case, so I may just be seeing use cases that are not there.

@m-albert
Collaborator

Yeah, I was thinking about this when I wrote the above last night. This sounds kind of like a shuffle, but I have been spending a fair bit of time on that use case, so I may just be seeing use cases that are not there.

True, seems like a similar use case!

A not-too-bad approximation of the "stage 2" case could probably be to take a more or less optimized "stage 1" implementation as described above and apply map_blocks over the input coordinate dask array. The scheduler would then take care of grouping tasks that work on related image chunks.

@m-albert
Collaborator

A not-too-bad approximation of the "stage 2" case could probably be to take a more or less optimized "stage 1" implementation as described above and apply map_blocks over the input coordinate dask array. The scheduler would then take care of grouping tasks that work on related image chunks.

Thinking about it again, this probably wouldn't be quite as straightforward: if the coordinates are given as a dask array, the coordinates themselves, and therefore how they map onto the input array, wouldn't be known at scheduling time. Maybe launching tasks from within tasks could be useful here.
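
As a rough sketch of the tasks-from-tasks idea with dask.distributed (all function names here are hypothetical, not dask-image API): a task that receives a concrete block of coordinates can open a worker_client at run time and submit one sub-task per piece of work it discovers, which is roughly the pattern a distributed-coordinates implementation would need.

import numpy as np
import scipy.ndimage as ndi
from dask.distributed import Client, worker_client

def interpolate_region(region, coords, order=3):
    # sub-task: plain SciPy interpolation on an in-memory region
    return ndi.map_coordinates(region, coords, order=order)

def lookup_points(coord_block, image):
    # parent task: the coordinates are concrete here, so it can decide at run
    # time what to submit; a real implementation would group coord_block by
    # image chunk (see the sketches above) and submit one sub-task per chunk
    with worker_client() as client:
        futures = [client.submit(interpolate_region, image, coord_block[:, [i]])
                   for i in range(coord_block.shape[1])]
        return np.concatenate(client.gather(futures))

if __name__ == "__main__":
    client = Client(processes=False)
    image = np.arange(100.).reshape((10, 10))
    coords = np.array([[2.0, 4.0],
                       [4.0, 0.0]])  # points (2, 4) and (4, 0)
    print(client.submit(lookup_points, coords, image).result())  # [24. 40.]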

@jakirkham
Member

Yeah, that's why I'm thinking of shuffle: it also has to make decisions based on values that are not known until they are computed. People have been spending a lot of time optimizing that problem, so we could benefit from that work fairly easily if we are able to articulate our problem in those terms.

@m-albert
Collaborator

@jakirkham Do you have a link to some more details about the shuffle problem and how existing solutions to it can be used?

In the meantime, #237 suggests an implementation for use case 1.
