Always stream out blocks in dods_encode
#10
base: master
Conversation
opendap_protocol/protocol.py
Outdated
for block in serialize_data.blocks:
flat = data.ravel()
for start in range(0, data.size, chunk_size):
    block = flat[slice(start, chunk_size)]
Should this be `block = flat[slice(start, start + chunk_size)]` instead of `block = flat[slice(start, chunk_size)]`?
clearly this needs tests!
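Something along these lines could serve as a starting point for such a test; `iter_blocks` is a hypothetical stand-in for the streaming loop in `dods_encode`, written with the suggested `start + chunk_size` stop index:

```python
import numpy as np

def iter_blocks(data, chunk_size):
    # Hypothetical stand-in for the streaming loop: yield fixed-size blocks
    # of the flattened array, using start + chunk_size as the stop index.
    flat = data.ravel()
    for start in range(0, data.size, chunk_size):
        yield flat[start:start + chunk_size]

def test_blocks_cover_whole_array():
    data = np.arange(10).reshape(2, 5)
    blocks = list(iter_blocks(data, chunk_size=3))
    # Every element should appear exactly once and in order; with
    # slice(start, chunk_size) every block after the first would be
    # empty, so the reassembled array would be truncated and this fails.
    np.testing.assert_array_equal(np.concatenate(blocks), data.ravel())
```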
opendap_protocol/protocol.py
Outdated
# we load one `DASK_ENCODE_CHUNK_SIZE`-sized block of linearized data
# into memory at one go. This may overlap with multiple dask chunks,
# so let's cache those chunks since we might come back to them.
cache = Cache(Config.DASK_CACHE_SIZE)
Shouldn't this be configured at the server level? That is what we do
Now I think we should apply the cache more locally, in that loop in dods_encode. We want to cache aggressively when we have multiple batches to stream out for a single request from a single array. This is because the order in which we yield bytes can be orthogonal to the chunking, and we can visit the same chunk multiple times.
I think the more global server cache is appropriate for a less aggressive cache across multiple requests.
Perhaps we can pair at some point and just iterate through some options with a benchmark problem.
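For illustration, a rough sketch of what a per-call chunk cache inside that loop could look like. This is not the PR's implementation: the `lru_cache` is only a stand-in for `Cache(Config.DASK_CACHE_SIZE)`, and the sketch assumes uniform 1-D chunking after flattening:

```python
import numpy as np
import dask.array as da
from functools import lru_cache

def stream_blocks(data: da.Array, chunk_size: int):
    """Hypothetical sketch of a per-call chunk cache: block boundaries need
    not line up with dask chunk boundaries, so the same chunk may be needed
    for several consecutive output blocks."""
    flat = data.reshape(-1)            # still lazy, nothing computed yet
    chunk_len = flat.chunks[0][0]      # assume uniform chunking for the sketch

    @lru_cache(maxsize=8)              # stand-in for Config.DASK_CACHE_SIZE
    def load_chunk(i):
        return flat.blocks[i].compute()

    for start in range(0, flat.size, chunk_size):
        stop = min(start + chunk_size, flat.size)
        pieces = []
        # Visit every dask chunk that overlaps the [start, stop) window.
        for i in range(start // chunk_len, (stop - 1) // chunk_len + 1):
            chunk = load_chunk(i)
            lo = max(start - i * chunk_len, 0)
            hi = min(stop - i * chunk_len, chunk.size)
            pieces.append(chunk[lo:hi])
        yield np.concatenate(pieces)
```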
Also seems like a good place to stick in a bit of async: compute the next iteration while streaming out the current iteration.
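One sketch of that overlap, using a single worker thread to prefetch the next block while the current one is yielded (names and structure are illustrative, not from this PR):

```python
from concurrent.futures import ThreadPoolExecutor

def stream_with_prefetch(ranges, compute_block):
    """Hypothetical sketch: while one block is being streamed out, the next
    one is already being computed in a worker thread.

    ``ranges`` is an iterable of (start, stop) tuples and ``compute_block``
    is whatever loads that slice into memory (e.g. a dask .compute())."""
    ranges = list(ranges)
    if not ranges:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(compute_block, *ranges[0])
        for nxt in ranges[1:]:
            block = future.result()                    # wait for current block
            future = pool.submit(compute_block, *nxt)  # start computing the next
            yield block                                # stream current block
        yield future.result()
```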
if isinstance(data, da.Array):
    block = data[slice(start, end)].compute()
elif has_xarray and isinstance(data, Variable):
    npidxr = np.unravel_index(np.arange(start, min(end, data.size)), shape=data.shape)
dods_encode
cc @jhamman @mpiannucci
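For context on the `Variable` branch quoted above: `np.unravel_index` turns the flat index range back into one index array per dimension, so the same linear `[start, end)` window can be pulled out of the non-flattened array. A tiny standalone illustration (not the PR code):

```python
import numpy as np

data = np.arange(12).reshape(3, 4)
start, end = 5, 9

# One index array per dimension for flat positions 5..8 of a (3, 4) array.
npidxr = np.unravel_index(np.arange(start, min(end, data.size)), shape=data.shape)
block = data[npidxr]

assert np.array_equal(block, data.ravel()[start:end])  # [5, 6, 7, 8]
```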