-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improves IO performance for the report readers #104
Conversation
> IO: * Loop through rows (timesteps) instead of columns (node_ids). - Changes from tsteps * node_ids reads, to tsteps reads (i.e., constant time in terms of IO, regardless of node_ids reqs.). * Read all node_ids in a buffer every timestep, between min and max offsets of node_ids requested. > Memory * Allocate buffer before passing to HighFive to avoid extra and unnecessary memory allocations before reading. * Change data structure of node_ids/offsets from vector to map for faster search (from n^2 to nlogn). * In soma reports, assign values directly to return buffer instead of std::copy/memcpy. > New features and others: * Add support for strided reads, allowing to reduce the amount of timesteps (e.g., 1 by default, 2 for every 2 timesteps, etc.). * Eliminate duplicated code, avoid calling HDF5 metadata in every iteration, updated unit tests, and other minor changes.
Really nice way to improve things |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Some nitpicks about the docs… if that's intentional, please ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice addition.
@jorblancoa and I went through all the comments and suggestions, and integrated everything (specially Jorge, thanks for the effort). Thanks for all the great feedback! If no one has any additional comments, we are ready to close the PR and merge the changes! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice work!
Changes
IO:
constant time in terms of IO, regardless of node_ids reqs.).
offsets of node_ids requested.
Memory:
unnecessary memory allocations before reading.
for faster search (from n^2 to nlogn).
of std::copy/memcpy.
New features and others:
timesteps (e.g., 1 by default, 2 for every 2 timesteps, etc.).
iteration, updated unit tests, and other minor changes.
Important note
Performance evaluations and other related discussions can be obtained here:
https://bbpteam.epfl.ch/project/issues/browse/REP-60
https://bbpteam.epfl.ch/project/issues/browse/BLPY-217