Always stream out blocks in `dods_encode` #10

dcherian · 2024-12-04T04:43:09Z

jhamman · 2024-12-04T04:53:53Z

opendap_protocol/protocol.py

-        for block in serialize_data.blocks:
+        flat = data.ravel()
+        for start in range(0, data.size, chunk_size):
+            block = flat[slice(start, chunk_size)]


Should this be:

Suggested change

-            block = flat[slice(start, chunk_size)]
+            block = flat[slice(start, start + chunk_size)]

Clearly this needs tests!
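A minimal sketch of the kind of test that would catch this, assuming a hypothetical `iter_blocks` helper that mirrors the loop in the diff (it is not the actual function in `protocol.py`). With the original slice, every block after the first comes back empty, so checking that the concatenated blocks reproduce the flattened array is enough to expose the bug.

```python
import numpy as np


def iter_blocks(data, chunk_size):
    """Yield consecutive blocks of the C-order flattened array.

    Hypothetical helper for illustration; not the code in protocol.py.
    """
    flat = data.ravel()
    for start in range(0, data.size, chunk_size):
        # The stop index has to advance with start. slice(start, chunk_size)
        # is empty as soon as start >= chunk_size.
        yield flat[start : start + chunk_size]


def test_blocks_cover_whole_array():
    data = np.arange(10).reshape(2, 5)
    blocks = list(iter_blocks(data, chunk_size=4))
    # The last block is allowed to be short.
    assert [b.size for b in blocks] == [4, 4, 2]
    np.testing.assert_array_equal(np.concatenate(blocks), data.ravel())
```

Running the same check against `flat[slice(start, chunk_size)]` gives block sizes of 4, 0 and 0 for this input, so the test fails as intended.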

mpiannucci · 2024-12-04T11:34:46Z

opendap_protocol/protocol.py

+# We load one `DASK_ENCODE_CHUNK_SIZE`-sized block of linearized data
+# into memory at a time. A block may overlap with multiple dask chunks,
+# so let's cache those chunks since we might come back to them.
+cache = Cache(Config.DASK_CACHE_SIZE)


Shouldn't this be configured at the server level? That is what we do

I now think we should apply the cache more locally, inside the loop in `dods_encode`. We want to cache aggressively when we have multiple batches to stream out for a single request from a single array. This is because the order in which we yield bytes can be orthogonal to the chunking, so we can visit the same chunk multiple times.

I think the more global server cache is appropriate for a less aggressive cache across multiple requests.
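To make the "visit the same chunk multiple times" point concrete, here is a small sketch using plain NumPy and `functools.lru_cache` as a stand-in for the dask cache. Everything in it (`SOURCE`, `load_chunk`, `read_block`, `stream`, the column-wise chunk layout) is made up for illustration. The array is chunked along columns, while bytes are streamed in C order, so each row-sized block touches every chunk.

```python
from functools import lru_cache

import numpy as np

SHAPE = (4, 6)
CHUNK_COLS = 3  # each hypothetical chunk holds all rows of 3 columns
SOURCE = np.arange(24).reshape(SHAPE)
loads = []  # records every simulated chunk load


def load_chunk(chunk_id):
    """Stand-in for an expensive chunk read or compute."""
    loads.append(chunk_id)
    cols = slice(chunk_id * CHUNK_COLS, (chunk_id + 1) * CHUNK_COLS)
    return SOURCE[:, cols]


def read_block(start, stop, get_chunk):
    """Gather flat elements [start, stop) in C order, one lookup per chunk touched."""
    rows, cols = np.unravel_index(np.arange(start, stop), SHAPE)
    out = np.empty(stop - start, dtype=SOURCE.dtype)
    for chunk_id in np.unique(cols // CHUNK_COLS):
        mask = cols // CHUNK_COLS == chunk_id
        chunk = get_chunk(int(chunk_id))
        out[mask] = chunk[rows[mask], cols[mask] % CHUNK_COLS]
    return out


def stream(block_size, get_chunk):
    """Read the whole array as a list of linearized blocks."""
    return [
        read_block(s, min(s + block_size, SOURCE.size), get_chunk)
        for s in range(0, SOURCE.size, block_size)
    ]


# Without a cache: every block reloads every chunk it touches.
uncached = stream(6, load_chunk)
n_uncached = len(loads)

# With a small per-request cache: each chunk is loaded once.
loads.clear()
cached = stream(6, lru_cache(maxsize=2)(load_chunk))
n_cached = len(loads)
```

In this toy layout the uncached pass performs 8 chunk loads (2 chunks for each of 4 blocks) and the cached pass performs 2. A cache scoped to the request gets that benefit without holding chunks in memory across unrelated requests.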

Perhaps we can pair at some point and just iterate through some options with a benchmark problem.

Also seems like a good place to stick in a bit of async: compute the next iteration while streaming out the current iteration.
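One possible shape for that overlap, sketched with a single worker thread rather than `asyncio` (an assumption on my part; either could work). `stream_with_prefetch` and `compute_block` are hypothetical names, and `compute_block(i)` stands in for whatever produces the i-th encoded block.

```python
from concurrent.futures import ThreadPoolExecutor


def stream_with_prefetch(compute_block, n_blocks):
    """Yield compute_block(i) for i in range(n_blocks).

    Block i + 1 is submitted to a worker thread before block i is
    yielded, so it can be computed while the consumer sends block i.
    """
    if n_blocks <= 0:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(compute_block, 0)
        for i in range(n_blocks):
            block = pending.result()  # wait for the block we are about to send
            if i + 1 < n_blocks:
                # Start the next block before handing this one to the consumer.
                pending = pool.submit(compute_block, i + 1)
            yield block


# Usage: blocks come out in order, exactly as in the sequential loop.
blocks = list(stream_with_prefetch(lambda i: bytes([i]) * 2, 3))
```

This keeps at most two blocks in memory at a time (the one being sent and the one being computed), which matters when blocks are sized in tens of megabytes.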

dcherian · 2024-12-04T22:56:36Z

opendap_protocol/protocol.py

+        if isinstance(data, da.Array):
+            block = data[slice(start, end)].compute()
+        elif has_xarray and isinstance(data, Variable):
+            npidxr = np.unravel_index(np.arange(start, min(end, data.size)), shape=data.shape)


for 30MB blocks,
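For readers unfamiliar with the `np.unravel_index` call in the diff above, this is a small sketch of what it computes, using a plain NumPy array. The array and the `start`/`end` values are made up, and indexing semantics for an xarray `Variable` may differ and are not shown here.

```python
import numpy as np

data = np.arange(24).reshape(2, 3, 4)
start, end = 5, 30  # end deliberately overshoots data.size

# Flat positions of the block, clipped to the array size as in the diff.
flat_positions = np.arange(start, min(end, data.size))

# One index array per dimension, in C (row-major) order.
npidxr = np.unravel_index(flat_positions, shape=data.shape)

# Pointwise indexing with the tuple picks out the same elements as
# slicing the flattened array, without flattening the whole array first.
block = data[npidxr]
```

For a NumPy array, `block` equals `data.ravel()[start:end]`; the `min(end, data.size)` clip is what keeps the final, shorter block from raising an out-of-bounds error.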

Add dask caching, avoid rechunk

415e020

jhamman reviewed Dec 4, 2024

mpiannucci reviewed Dec 4, 2024

Always stream + support xarray Variables.

370134e

dcherian commented Dec 4, 2024

dcherian changed the title ~~Add dask caching, avoid rechunk~~ Always stream out blocks in dods_encode Dec 4, 2024
