How do you put batches back together after processing #43

Open
jhamman opened this issue Nov 18, 2021 · 4 comments

@jhamman
Contributor

jhamman commented Nov 18, 2021

In #37, @robintw wrote:

  1. How do you put batches back together after processing?
    My machine learning model is producing a single value as an output, so for a batch of 100 64x64 patches, I get an output of a 100-element array. What's the best way of putting this back into a DataArray that has the same format/co-ordinates as the original input array? I'd be happy with either an array with dimensions of original_size / 64 in both the x and y dimension, or an array of the same size as the input with the single output value repeated for each of the input pixels in that batch.

I've tried to put some of this together myself, but it seems that the x co-ordinate value in the batch DataArray is the same for each batch. I'd have thought this would represent the x co-ordinates that had been extracted from the original DataArray, but it doesn't seem to. For example, if I run:

batches = []
for i, batch in enumerate(bgen):
    batches.append(batch)
    if i == 1:
        break
to get the first two batches, I can then compare their x co-ordinate values:

np.all(batches[0].to_array().squeeze().x == batches[1].to_array().squeeze().x)
and it shows that they're all equal.

Do you have any ideas as to what I could do to be able to put the batches back together?

@tcchiao and I discussed this today, and she is planning to add an example to the demo notebook.
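
A minimal sketch of the first option in the question (one output value per 64x64 patch), using only numpy and xarray rather than xbatcher. The array sizes, coordinate spacing, and `preds` are made up for illustration. It assumes the patches are non-overlapping and arrive in row-major order (y varies slowest), which should be checked against the actual generator:

```python
import numpy as np
import xarray as xr

PATCH = 64  # patch edge length in pixels, as in the question

# Hypothetical input: a 256x192 image with regularly spaced coordinates.
da = xr.DataArray(
    np.random.default_rng(0).random((256, 192)),
    dims=("y", "x"),
    coords={"y": np.arange(256) * 10.0, "x": np.arange(192) * 10.0},
)

ny = da.sizes["y"] // PATCH
nx = da.sizes["x"] // PATCH

# Stand-in for the model output: one value per patch, concatenated over
# all batches. ASSUMPTION: patches are non-overlapping and ordered
# row-major (y varies slowest); verify this for your generator.
preds = np.arange(ny * nx, dtype=float)

# Coarse grid: one cell per patch. coarsen(...).mean() is used only to get
# coordinates at the mean position of each patch; its data is replaced.
template = da.coarsen(y=PATCH, x=PATCH).mean()
coarse = template.copy(data=preds.reshape(ny, nx))

# Full-resolution grid: repeat each patch value over its 64x64 pixels,
# reusing the coordinates of the original array.
full = da.copy(data=np.kron(coarse.values, np.ones((PATCH, PATCH))))
```

Here `coarse` has dimensions of original_size / 64 in y and x, and `full` has the same shape and coordinates as the input with each patch value repeated, which correspond to the two output layouts mentioned in the question.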

@RichardScottOZ
Contributor

Good...hard part!

@dhruvbalwada

Just wondering if this example is available somewhere?

I came across this discussion: https://discourse.pangeo.io/t/vectorized-sklearn/1444 , which seemed to be solving a similar problem.

@dhruvbalwada mentioned this issue Oct 24, 2022
@maxrjones assigned maxrjones and unassigned tcchiao Oct 25, 2022
@maxrjones
Member

Just wondering if this example is available somewhere?

AFAIK the example has not yet been made. It's helpful to hear more interest in this component of the documentation.

@dhruvbalwada

I worked out that it is often quite simple to put batches back together.
At least in the simple situations that I am working with, just using .unstack('samples') will put the batches back together into the original geospatial data format. Happy to add a few lines about this in the demo notebook, if you think that is the appropriate place for it.
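
A rough sketch of the `.unstack('samples')` idea, written with plain xarray rather than xbatcher. It assumes the `samples` dimension was created by stacking `y` and `x`, so every sample still carries its original coordinates. The toy array, the batch size of 5, and the `model` function are hypothetical stand-ins:

```python
import numpy as np
import xarray as xr

# Hypothetical 3x4 field with labelled coordinates.
da = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=("y", "x"),
    coords={"y": [10.0, 20.0, 30.0], "x": [1.0, 2.0, 3.0, 4.0]},
    name="feature",
)

# Flatten the spatial dims into a single MultiIndexed "samples" dim.
stacked = da.stack(samples=("y", "x"))

# Split into batches of 5 samples (the last batch is shorter).
batch_size = 5
batches = [
    stacked.isel(samples=slice(i, i + batch_size))
    for i in range(0, stacked.sizes["samples"], batch_size)
]


def model(values):
    """Stand-in for a model that returns one value per sample."""
    return values * 2


# Run the model on the raw values and concatenate the outputs in batch order.
outputs = np.concatenate([model(batch.values) for batch in batches])

# Reattach the (y, x) labels by reusing the stacked array as a template,
# then unstack to recover the original 2D layout.
restored = stacked.copy(data=outputs).unstack("samples").transpose("y", "x")
```

This only works as written if the outputs are concatenated in the same order the samples were batched. If the batches are shuffled, the order would need to be tracked separately.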
