Refactor HDF5 module and object information #1
Storing any mccode_antlr object into an HDF5 file requires keeping track of the non-default properties of the object, plus minimal metadata about the object and the version of mccode_antlr.
At the moment the version is required to be equal at writing and reading time, to protect against internal object changes during this pre-v1.0 era.
Eventually this requirement could be softened, potentially with some built-in equivalency graph and/or automatic conversion.
The object metadata is only its name at present, e.g., mccode_antlr.instr.Instr must store 'Instr' as an attribute of the HDF5 group that it is written into.
Initial implementation
The first approach to recording this metadata was to store the module and object metadata as separate attributes on every group. This seemed wasteful, so an attempt was made to de-duplicate the metadata by storing the version once and reducing each object name to an integer attribute that indexes a file-global list of 'known' object names.
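The two layouts described above can be contrasted with a minimal sketch. Plain dictionaries stand in for HDF5 groups and their attributes, and every attribute name used here (module_version, object_name, known_names, name_index) is hypothetical rather than taken from mccode_antlr.

```python
from typing import Tuple

# Plain dicts stand in for HDF5 groups; all attribute names are hypothetical.


def write_per_group(group: dict, version: str, name: str) -> None:
    """First approach: every group carries both pieces of metadata."""
    group["attrs"] = {"module_version": version, "object_name": name}


def write_deduplicated(root: dict, group: dict, version: str, name: str) -> None:
    """Second approach: version stored once, names replaced by list indices."""
    root_attrs = root.setdefault("attrs", {})
    root_attrs.setdefault("module_version", version)
    known = root_attrs.setdefault("known_names", [])
    if name not in known:
        known.append(name)
    # The group itself only records an integer index into the global list.
    group["attrs"] = {"name_index": known.index(name)}


def read_deduplicated(root: dict, group: dict) -> Tuple[str, str]:
    """Reading a group also requires the file-global attributes."""
    root_attrs = root["attrs"]
    name = root_attrs["known_names"][group["attrs"]["name_index"]]
    return root_attrs["module_version"], name
```

In the de-duplicated layout, reading any one group requires a second lookup in the file-global attributes.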
Why the initial implementation was abandoned
Surprisingly, de-duplicating this information did not significantly reduce the size of the test HDF5 files, and at the same time it significantly increased the test runtime.
This seems to be because the global information is slow to access, due to a lack of data locality.
Attempted improvements
The HDF5IO object was improved to cache the global metadata, with the idea that this would give a faster lookup of the object name from its index in the 'known' list. This did not produce the desired improvement in speed, possibly due to a poor cache implementation.
Current implementation
Using an attribute for every group seems to be the best solution for the speed of writing and reading a mccode_antlr object.
The metadata associated with every string-valued group attribute is surprisingly large, so the two pieces of information are now packed together into a single string attribute, e.g., '{version}/{name}', which produces demonstrably smaller files.
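As an illustration only, packing and checking such an attribute might look like the sketch below. The attribute key and the helper names (ATTR_KEY, pack, unpack, stamp, check) are assumptions made for this example and are not the names used by HDF5IO. The attrs argument is any mutable mapping. The attrs object of an h5py group offers a similar mapping interface, but this sketch has not been run against a real file.

```python
from typing import MutableMapping, Tuple

ATTR_KEY = "mccode_antlr_object"  # hypothetical attribute name


def pack(version: str, name: str) -> str:
    """Combine the module version and object name into one string."""
    return f"{version}/{name}"


def unpack(packed: str) -> Tuple[str, str]:
    """Split on the last '/', so the object name is always the final piece."""
    version, sep, name = packed.rpartition("/")
    if not sep:
        raise ValueError(f"Not a packed version/name string: {packed!r}")
    return version, name


def stamp(attrs: MutableMapping, obj_type: type, version: str) -> None:
    """Record the packed metadata as a single attribute of a group."""
    attrs[ATTR_KEY] = pack(version, obj_type.__name__)


def check(attrs: MutableMapping, obj_type: type, version: str) -> None:
    """Require equal versions and a matching object name at reading time."""
    stored_version, stored_name = unpack(attrs[ATTR_KEY])
    if stored_version != version:
        raise RuntimeError(f"Written with {stored_version}, read with {version}")
    if stored_name != obj_type.__name__:
        raise RuntimeError(f"Expected {obj_type.__name__}, found {stored_name}")
```

The strict equality in check mirrors the current requirement that the version match at writing and reading time. A softer rule would replace only that comparison.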