Replies: 1 comment
By default, dask uses the threaded scheduler, which won't help you with netCDF. First, I don't understand where you are computing values in the example. Re: vectorization: I recommend starting here: https://tutorial.xarray.dev/advanced/apply_ufunc/automatic-vectorizing-numpy.html and reading the rest of the series.
I want to apply a function to a 3-D dataset in parallel. The function takes a 1-D numpy array of values as input and returns a 1-D array of values of the same length. The function's input is assumed to be the time series of values for a single lat/lon slice of a lat/lon/time cube. Conceptually, the intention is to compute a lat/lon/time cube of the same size and shape as the input by applying the function to each time series slice.
The data originates as a NetCDF with the variable having dimensions (time, lat, lon).
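Written as a plain loop, the intent described above might look like the following minimal sketch. The names `running_total` and `apply_per_cell` are hypothetical stand-ins (the real function is not shown in this post), and the cube is a small synthetic array with the same (time, lat, lon) layout as the NetCDF variable.

```python
import numpy as np


def running_total(series: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the real function: 1-D in, 1-D out, same length."""
    return np.cumsum(series)


def apply_per_cell(cube: np.ndarray, func) -> np.ndarray:
    """Apply func to the time series at every (lat, lon) cell of a (time, lat, lon) cube."""
    n_time, n_lat, n_lon = cube.shape
    out = np.empty_like(cube)
    for i in range(n_lat):
        for j in range(n_lon):
            # Each slice cube[:, i, j] is one 1-D time series of length n_time.
            out[:, i, j] = func(cube[:, i, j])
    return out


# Small synthetic cube standing in for the NetCDF variable (assumption).
cube = np.arange(24, dtype=float).reshape(4, 3, 2)
result = apply_per_cell(cube, running_total)

# NumPy's apply_along_axis expresses the same loop in a single call.
same_result = np.apply_along_axis(running_total, 0, cube)
```

This is only the serial baseline; it makes the desired input and output shapes explicit but does nothing to parallelize the work.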
What I've tried so far is to read the NetCDF and then re-orient the data with a transpose to make the time dimension innermost, since this should make the retrieval of each slice more efficient. Then I used apply_ufunc, like so:

When I run the code above, it takes hours to complete on a dataset from a NetCDF file that is only 1.1 GB, so not very big. While it runs, I don't see my machine using more than a single CPU core, so it seems that this doesn't run in parallel using multiprocessing. I feel like I must be doing something wrong.
Maybe I haven't started a required Dask service, or somewhere I have a configuration setting that's preventing multi-core execution? Perhaps I need to somehow "vectorize" the function?
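On the vectorization question, here is a minimal sketch of how apply_ufunc can map a 1-D function over every lat/lon time series. It is not the code from this post: `demean` is a hypothetical stand-in for the real function, and the data is a small synthetic DataArray rather than the NetCDF variable. The `dask="parallelized"` argument only takes effect when the input is backed by dask arrays (for example after chunking along lat/lon while keeping time in a single chunk); with in-memory data, as here, it is ignored.

```python
import numpy as np
import xarray as xr


def demean(series: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: 1-D time series in, 1-D array of the same length out."""
    return series - series.mean()


# Small synthetic (time, lat, lon) cube standing in for the NetCDF variable (assumption).
rng = np.random.default_rng(0)
da = xr.DataArray(
    rng.normal(size=(12, 4, 5)),
    dims=("time", "lat", "lon"),
    name="values",
)

result = xr.apply_ufunc(
    demean,
    da,
    input_core_dims=[["time"]],   # the function consumes the whole time axis
    output_core_dims=[["time"]],  # and returns an axis of the same length
    vectorize=True,               # loop the 1-D function over every lat/lon cell
    dask="parallelized",          # used only if `da` is dask-backed (chunked)
    output_dtypes=[da.dtype],
)

# apply_ufunc moves core dimensions to the end, so restore the original order.
result = result.transpose("time", "lat", "lon")
```

Note that `vectorize=True` wraps the function with numpy.vectorize, which is a convenience loop rather than a speed-up; any parallelism would have to come from running on chunked data with a suitable dask scheduler.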
I have developed a way to utilize multiprocessing and shared memory arrays to get a full parallelization of this process, but it's a Rube Goldberg machine that I'd love to replace if I can get the split-apply-combine approach with xarray to work.
Please advise if there's anything I'm doing above that's wrong or any other optimizations I can try to use for this. Thanks in advance for any comments or suggestions.