Refactor HDF5 module and object information #1

Merged
merged 7 commits into from
Nov 15, 2023

@g5t g5t commented Nov 15, 2023

Storing any mccode_antlr object into an HDF5 file requires keeping track of the non-default properties of the object plus minimal metadata about the object and the version of mccode_antlr.

At the moment, the version must be identical at writing and reading time, to protect against internal object changes during this pre-v1.0 era.
This requirement could eventually be relaxed, potentially with a built-in equivalency graph and/or automatic conversion.
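
As an illustration of the strict version requirement, here is a minimal sketch. It is not the mccode_antlr implementation: a plain `dict` stands in for an HDF5 group's attributes, and the names `CURRENT_VERSION`, `check_version`, and the `"version"` key are hypothetical.

```python
# Hypothetical stand-in: a plain dict plays the role of an HDF5 group's attributes.
CURRENT_VERSION = "0.1.0"  # assumed placeholder, not the real mccode_antlr version


def check_version(attrs: dict, current: str = CURRENT_VERSION) -> None:
    """Raise if the stored version is not identical to the running version."""
    stored = attrs.get("version")
    if stored != current:
        raise RuntimeError(
            f"File written with version {stored!r}, but running version is {current!r}"
        )


# Reading succeeds only when the versions match exactly.
check_version({"version": "0.1.0"})
```

A relaxed check would replace the equality test with a lookup of compatible versions, which is the softening mentioned above.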

The object metadata is only its name at present, e.g., mccode_antlr.instr.Instr must store 'Instr' as an attribute of the HDF5 group that it is written into.

Initial implementation

The first approach to recording this metadata was to store the module and object metadata as separate attributes on every group. This seemed wasteful, so an attempt was made to de-duplicate the metadata by storing the version once and reducing the object name to an integer attribute that indexes a file-global list of 'known' object names.
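
The de-duplication scheme can be sketched roughly as follows. This is an illustrative reconstruction, not the code from this pull request: plain dictionaries stand in for the file root and for each group's attributes, and the names `write_deduplicated`, `read_name`, `"known"`, and `"type_index"` are hypothetical.

```python
def write_deduplicated(root: dict, group: dict, name: str, version: str) -> None:
    """Store the version once at the root and an integer index on the group."""
    root.setdefault("version", version)       # version written a single time
    known = root.setdefault("known", [])      # file-global list of object names
    if name not in known:
        known.append(name)
    group["type_index"] = known.index(name)   # per-group integer attribute


def read_name(root: dict, group: dict) -> str:
    """Recover the object name from the group's index into the global list."""
    return root["known"][group["type_index"]]


root, first, second = {}, {}, {}
write_deduplicated(root, first, "Instr", "0.1.0")
write_deduplicated(root, second, "Instance", "0.1.0")
```

Every read has to touch the root-level list in addition to the group itself, which is the extra, non-local access discussed in the next section.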

Why the initial implementation was abandoned

Surprisingly, de-duplicating this information did not significantly reduce the size of the test HDF5 files, and at the same time it significantly increased the test runtime.
The slowdown seems to come from slow access to the global information, which lacks data locality with the groups that reference it.

Attempted improvements

The HDF5IO object was changed to cache the global metadata, in the hope of speeding up the lookup of an object name from its index in the 'known' list. This did not produce the desired improvement in speed, possibly due to a poor cache implementation.

Current implementation

Using an attribute on every group seems to be the best solution for the speed of writing and reading a mccode_antlr object.
The metadata associated with every string-valued group attribute is surprisingly large, so the two pieces of information are now packed together into a single string attribute, e.g., '{version}/{name}', which produces demonstrably smaller files.
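
A minimal sketch of packing and unpacking the combined attribute value, assuming the '{version}/{name}' layout given above. The helper names `pack` and `unpack` are hypothetical, and splitting on the last '/' is an assumption made here so that a version string containing a slash would still parse.

```python
def pack(version: str, name: str) -> str:
    """Combine version and object name into one string attribute value."""
    return f"{version}/{name}"


def unpack(packed: str) -> tuple[str, str]:
    """Split the combined value back into (version, name) at the last '/'."""
    version, sep, name = packed.rpartition("/")
    if not sep:
        raise ValueError(f"Expected '<version>/<name>', got {packed!r}")
    return version, name


value = pack("0.1.0", "Instr")
version, name = unpack(value)
```

With h5py, a value like this could be stored through a group's `attrs` mapping, so each group carries one string attribute instead of two.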

@g5t g5t merged commit 504e9d6 into main Nov 15, 2023
11 checks passed