Implementation in HDF5
This section describes how the previously described model for data and metadata is stored in an HDF5 file. HDF5 provides three elements that define the structure of the format. A group can contain zero or more groups, datasets, or attributes. An attribute in HDF5 consists of a name and one or more values of the same datatype. Probably the most important structuring element in HDF5 is the dataset: it can hold arbitrary data in a multidimensional array and stores its name, dimensionality, and data type. A dataset may further carry zero or more attributes. Since every element in HDF5 has a name, each element can be referenced by a path.
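To make the idea of path-based access concrete, the following minimal sketch models an HDF5 tree as nested Python dicts instead of a real file. This is an assumption made only to keep the example self-contained: `make_group`, `make_dataset`, and `resolve` are hypothetical helpers, not part of any HDF5 library.

```python
# In-memory stand-in for an HDF5 tree (illustration only, not a real HDF5 API):
# a group holds attributes and named children; a dataset holds attributes and data.

def make_group(**attrs):
    return {"kind": "group", "attrs": dict(attrs), "children": {}}

def make_dataset(data, **attrs):
    return {"kind": "dataset", "attrs": dict(attrs), "data": list(data)}

def resolve(root, path):
    """Follow an absolute path such as '/foo/values' starting at the root group."""
    node = root
    for part in path.strip("/").split("/"):
        if not part:  # the path "/" refers to the root itself
            continue
        if node["kind"] != "group" or part not in node["children"]:
            raise KeyError(f"no element {part!r} in path {path!r}")
        node = node["children"][part]
    return node

root = make_group()
foo = make_group(greeting="hello world")
root["children"]["foo"] = foo
foo["children"]["values"] = make_dataset([1.0, 2.0])

print(resolve(root, "/foo")["attrs"]["greeting"])  # hello world
print(resolve(root, "/foo/values")["data"])        # [1.0, 2.0]
```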
To visualize the hierarchical structure of the HDF5 tree, the HDF5 elements are shown as a nested list with multiple indentation levels. Each element is represented by its element type, followed by ':' and the element name. Attributes can have a value assigned. Literals enclosed in '<>' are placeholders for the actual values:
<type>:<name> = [<type>|<value>]
A group named 'foo' with an attribute named 'greeting' and the value 'hello world' is shown like this:
group:foo
    attribute:greeting = 'hello world'
An optional subgroup or attribute is enclosed in square brackets:
group:bar
    [attribute:definition] = 'group definition'  // optional attribute
A group that is a reference (hard link) to another group in the same file is marked with a comment:
group:foo = DataArray  // a reference to a DataArray
For datasets the following annotations are used:
dataset:<name> = <type>[]           // dataset with one dimension
dataset:<name> = <type>[][]         // dataset with two dimensions
dataset:<name> = <type>[] ... n     // dataset with n dimensions
dataset:foo = double[1.0, 2.0]      // dataset named 'foo' with one dimension of double
                                    // values and the content 1.0 and 2.0
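The notation can also be produced mechanically. The sketch below is a hypothetical `render` helper (not part of the specification) that prints a tree of `(type, name, value, children)` tuples with one indentation level per nesting depth. As a simplification, dataset values are printed as plain Python lists without the leading element type (e.g. `double`).

```python
def render(element, indent=0):
    """Render (type, name, value, children) tuples in the '<type>:<name> = <value>' notation."""
    kind, name, value, children = element
    line = " " * indent + f"{kind}:{name}"
    if value is not None:
        line += f" = {value!r}"  # repr() gives 'hello world' with single quotes
    lines = [line]
    for child in children:
        lines.extend(render(child, indent + 4))  # four spaces per hierarchy level
    return lines

tree = ("group", "foo", None, [
    ("attribute", "greeting", "hello world", []),
    ("dataset", "values", [1.0, 2.0], []),
])
print("\n".join(render(tree)))
```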
- All HDF5 elements representing an entity of the data or metadata model are named after that entity's name.
- All optional attributes that are empty or set to NULL must be omitted.
- 1-n relations in the data model are represented by a nested, tree-like structure in HDF5.
- n-m relations are represented by references (hard links) to the related groups.
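The rule about empty optional attributes can be applied just before writing an entity. The following minimal sketch assumes the attributes are collected in a dict and that the caller knows which names the schema marks as optional (in square brackets); `clean_attributes` and the example values are hypothetical.

```python
def clean_attributes(attrs, optional):
    """Drop optional attributes that are None or empty before writing.

    `optional` holds the attribute names the schema marks with [...].
    Required attributes are always kept, even when empty, so that a
    schema violation stays visible instead of being silently removed.
    """
    cleaned = {}
    for key, value in attrs.items():
        if key in optional and (value is None or value == "" or value == []):
            continue
        cleaned[key] = value
    return cleaned

attrs = {"name": "spikes", "type": "example.spikes", "definition": None, "unit": ""}
print(clean_attributes(attrs, optional={"definition", "unit", "label"}))
# {'name': 'spikes', 'type': 'example.spikes'}
```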
The root of an HDF5 file implementing this standard contains a set of attributes describing the format and version, as well as time stamps with the creation date and the date of the last update. Furthermore, the root contains two groups named 'metadata' and 'data'.
root:
    attribute:format = string
    attribute:version = string
    attribute:created_at = date
    attribute:updated_at = date
    group:metadata
    group:data
A third group named 'terminology' is reserved for future use.
All entities of the data model are stored under the path '/data' in the HDF5 file.
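A new file therefore starts from a fixed skeleton. The sketch below builds that skeleton in memory; the format name and version passed in are placeholders, and encoding the `date` type as an ISO 8601 string in UTC is an assumption, since the text above does not fix the date representation.

```python
from datetime import datetime, timezone

def new_root(format_name, version):
    """Build the root layout: four attributes plus the 'metadata' and 'data' groups."""
    # Assumption: dates are stored as ISO 8601 strings in UTC.
    now = datetime.now(timezone.utc).isoformat()
    return {
        "attrs": {
            "format": format_name,
            "version": version,
            "created_at": now,
            "updated_at": now,  # identical to created_at for a fresh file
        },
        "groups": {"metadata": {}, "data": {}},  # 'terminology' is reserved, not created
    }

root = new_root("my_format", "1.0")  # placeholder values
print(sorted(root["attrs"]))   # ['created_at', 'format', 'updated_at', 'version']
print(sorted(root["groups"]))  # ['data', 'metadata']
```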
Entities of the type Block can only be stored at the first hierarchy level below the '/data' path. Blocks always have the following structure:
group:<block_name>
    attribute:type = string
    attribute:name = string
    attribute:created_at = date
    attribute:updated_at = date
    attribute:entity_id = string
    [attribute:definition] = string
    [group:metadata] = Section          // a reference to a metadata Section
    [group:sources] = Source[]          // all sources in this block
    [group:data_arrays] = DataArray[]   // all data arrays in this block
    [group:multi_tags] = MultiTag[]     // all multi-tags in this block
    [group:tags] = Tag[]                // all tags in this block
Entities of the type DataArray can only be defined in the group called 'data_arrays' of their respective Block. The path to a DataArray is therefore always of the form:
/data/<block_name>/data_arrays/<data_array_name>
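Because the location is fixed, the path can be derived from the two names alone. This is a minimal sketch with a hypothetical `data_array_path` helper; it only rejects names containing '/', which HDF5 reserves as the path separator.

```python
def data_array_path(block_name, data_array_name):
    """Return /data/<block_name>/data_arrays/<data_array_name>."""
    for name in (block_name, data_array_name):
        # '/' separates path components in HDF5, so it cannot appear inside a name.
        if not name or "/" in name:
            raise ValueError(f"invalid entity name: {name!r}")
    return f"/data/{block_name}/data_arrays/{data_array_name}"

print(data_array_path("session1", "membrane_voltage"))
# /data/session1/data_arrays/membrane_voltage
```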
In HDF5, DataArray objects always have the following structure:
group:<data_array_name>
    attribute:type = string
    attribute:name = string
    attribute:created_at = date
    attribute:updated_at = date
    attribute:entity_id = string
    [attribute:definition] = string
    [attribute:label] = string
    [attribute:unit] = string
    [attribute:expansion_origin] = double
    [group:metadata] = Section          // a reference to a metadata Section
    [group:dimensions] = Dimension[]    // definition of all dimensions
    dataset:data = <data_type>[] ... n
    [dataset:polynom_coefficients] = double[]
    [group:sources] = Source[]
The field 'data_type' defined in the model becomes part of the datatype definition of the HDF5 'data' dataset and is therefore not represented as a separate field. Native data type names are mapped to HDF5 data types as defined in the following table:
| Type name | HDF5 data type |
|---|---|
| byte | H5T_STD_I8LE |
| uint16 | H5T_STD_U16LE |
| uint, uint32 | H5T_STD_U32LE |
| int16 | H5T_STD_I16LE |
| int, int32 | H5T_STD_I32LE |
| long, int64 | H5T_STD_I64LE |
| float | H5T_IEEE_F32LE |
| double | H5T_IEEE_F64LE |
| string | H5T_C_S1 |
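The table translates directly into a lookup. The sketch below encodes exactly the rows above, with one entry per alias, and treats any unlisted name as an error rather than guessing a type; `hdf5_type` is a hypothetical helper.

```python
# Type names from the model mapped to HDF5 predefined types (rows of the table above).
HDF5_TYPES = {
    "byte": "H5T_STD_I8LE",
    "uint16": "H5T_STD_U16LE",
    "uint": "H5T_STD_U32LE",
    "uint32": "H5T_STD_U32LE",
    "int16": "H5T_STD_I16LE",
    "int": "H5T_STD_I32LE",
    "int32": "H5T_STD_I32LE",
    "long": "H5T_STD_I64LE",
    "int64": "H5T_STD_I64LE",
    "float": "H5T_IEEE_F32LE",
    "double": "H5T_IEEE_F64LE",
    "string": "H5T_C_S1",
}

def hdf5_type(type_name):
    """Return the HDF5 type for a model type name; unknown names raise ValueError."""
    try:
        return HDF5_TYPES[type_name]
    except KeyError:
        raise ValueError(f"no HDF5 mapping for type {type_name!r}") from None

print(hdf5_type("int"))     # H5T_STD_I32LE
print(hdf5_type("double"))  # H5T_IEEE_F64LE
```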
The dimension types Set, Range, and Sample can only be defined inside the group called 'dimensions' of their parent DataArray. The order of each dimension, which is also used as the name of its group, defines the sequential arrangement of the dimensions: the dimension with the lowest order value applies to the first dimension of the data, and so on. The structure of all dimension entities is defined below:
// Set
group:<dimension_order>
    attribute:dimension_type = 'set'
    [dataset:labels] = string[]

// Range
group:<dimension_order>
    attribute:dimension_type = 'range'
    [attribute:label] = string
    [attribute:unit] = string
    dataset:tics = double[]

// Sample
group:<dimension_order>
    attribute:dimension_type = 'sample'
    attribute:sampling_interval = double
    [attribute:label] = string
    [attribute:unit] = string
    [attribute:offset] = double
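A reader has to sort the dimension groups before matching them to the axes of the data. The sketch below assumes the group names are integer strings such as '1' and '2', and that a Sample dimension places point i at offset + i * sampling_interval, with the optional offset defaulting to 0.0. Both are interpretations for illustration, and the helper names are hypothetical.

```python
def ordered_dimensions(dimensions_group):
    """Return dimension descriptions sorted numerically by their group name ('1', '2', ...)."""
    return [dimensions_group[key] for key in sorted(dimensions_group, key=int)]

def sample_positions(dim, count):
    """Coordinates of `count` points along a 'sample' dimension (assumed: offset + i * interval)."""
    offset = dim.get("offset", 0.0)  # optional attribute; 0.0 assumed when omitted
    return [offset + i * dim["sampling_interval"] for i in range(count)]

dims = {
    "2": {"dimension_type": "set", "labels": ["V1", "V2"]},
    "1": {"dimension_type": "sample", "sampling_interval": 0.5, "offset": 1.0},
}
first, second = ordered_dimensions(dims)
print(first["dimension_type"], second["dimension_type"])  # sample set
print(sample_positions(first, 4))  # [1.0, 1.5, 2.0, 2.5]
```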
Two types of tags are defined, Tag and MultiTag, which tag data in two different forms.
Entities of the type Tag can only be defined at the first hierarchy level of the group 'tags' of their parent Block.
TODO the purpose of this entity
The structure of a Tag in an HDF5 file is defined as depicted below:
group:<tag_name>
    attribute:type = string
    attribute:name = string
    attribute:created_at = date
    attribute:updated_at = date
    attribute:entity_id = string
    [attribute:definition] = string
    [group:sources] = Source[]
    [group:references] = DataArray[]
    [group:features] = Feature[]
    [group:metadata] = Section          // a reference to a metadata Section
    dataset:position = double[]
    [dataset:extent] = double[]
    [dataset:units] = string[]
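The `position` and `extent` datasets describe the tagged region. The sketch below shows one plausible reading: one entry per dimension, the extent measured from the position in the same units, and a Tag without an extent marking a single point. These semantics are assumptions, since the schema above only gives the field types, and `in_tag` is a hypothetical helper.

```python
def in_tag(point, position, extent=None):
    """Return True if `point` lies inside the region marked by a Tag.

    Assumes one entry per dimension and that the extent (if present) is
    measured from the position in the same units; without an extent the
    Tag marks a single point.
    """
    if len(point) != len(position):
        raise ValueError("point and position must have the same number of dimensions")
    if extent is None:
        return list(point) == list(position)
    if len(extent) != len(position):
        raise ValueError("position and extent must have the same number of dimensions")
    return all(p <= x <= p + e for x, p, e in zip(point, position, extent))

print(in_tag([1.2, 3.0], position=[1.0, 2.0], extent=[0.5, 2.0]))  # True
print(in_tag([1.6, 3.0], position=[1.0, 2.0], extent=[0.5, 2.0]))  # False
```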
Entities of the type MultiTag can only be defined at the first hierarchy level of the group 'multi_tags' of their parent Block.
TODO the purpose of this entity
The structure of a MultiTag in an HDF5 file is defined as depicted below:
group:<multi_tag_name>
    attribute:type = string
    attribute:name = string
    attribute:created_at = date
    attribute:updated_at = date
    attribute:entity_id = string
    [attribute:definition] = string
    group:positions = DataArray         // reference to a DataArray with positions
    [group:metadata] = Section          // reference to a metadata Section
    [group:sources] = Source[]
    [group:extents] = DataArray         // reference to a DataArray with extents
    [group:references] = DataArray[]    // referenced DataArrays
    [group:features] = Feature[]
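Unlike a Tag, a MultiTag keeps its positions and extents in referenced DataArrays, so one MultiTag can mark many regions. The sketch below assumes both arrays are two-dimensional, with one row per tagged region and one column per dimension. This layout is an interpretation for illustration, and `multi_tag_regions` is a hypothetical helper.

```python
def multi_tag_regions(positions, extents=None):
    """Pair each row of a MultiTag's positions with the matching row of its extents.

    Assumes positions and extents are 2-D (one row per tagged region, one
    column per dimension); without extents every region is a single point.
    """
    if extents is not None and len(extents) != len(positions):
        raise ValueError("positions and extents must have the same number of rows")
    regions = []
    for row, pos in enumerate(positions):
        ext = extents[row] if extents is not None else [0.0] * len(pos)
        regions.append([(p, p + e) for p, e in zip(pos, ext)])
    return regions

positions = [[0.0], [2.5], [7.0]]  # e.g. three stimulus onsets on a time axis
extents = [[1.0], [1.0], [0.5]]    # duration of each stimulus
print(multi_tag_regions(positions, extents))
# [[(0.0, 1.0)], [(2.5, 3.5)], [(7.0, 7.5)]]
```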
Feature entities can only be defined in the group 'features' of their parent Tag or MultiTag entities. The schema below describes the structure:
group:<feature_name>
    attribute:created_at = date
    attribute:updated_at = date
    attribute:entity_id = string
    attribute:link_type = enum{tagged, untagged, indexed}
    group:data = DataArray              // reference to a DataArray
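The `link_type` attribute accepts only three values, so a reader should reject anything else. The sketch below assumes the value is read back as a plain string (the schema writes `enum{...}` but does not say how the enum is encoded in HDF5). `LinkType` and `parse_link_type` are hypothetical names.

```python
from enum import Enum

class LinkType(Enum):
    """Allowed values of a Feature's 'link_type' attribute."""
    TAGGED = "tagged"
    UNTAGGED = "untagged"
    INDEXED = "indexed"

def parse_link_type(value):
    """Convert the stored attribute string into a LinkType, rejecting unknown values."""
    try:
        return LinkType(value)
    except ValueError:
        allowed = [member.value for member in LinkType]
        raise ValueError(f"invalid link_type {value!r}; expected one of {allowed}") from None

print(parse_link_type("indexed"))  # LinkType.INDEXED
```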
Entities of the type Source can only be defined inside the group 'sources' of the parent Block entity.
group:<source_name>
    attribute:type = string
    attribute:name = string
    attribute:created_at = date
    attribute:updated_at = date
    attribute:entity_id = string
    [attribute:definition] = string
    [group:metadata] = Section          // reference to a metadata Section
    [group:sources] = Source[]
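Since a Source may itself contain a 'sources' group, the Sources of a Block form a tree. The sketch below walks that tree depth-first over an in-memory dict model (an assumption in place of a real file; the source names are made up).

```python
def walk_sources(sources, depth=0):
    """Yield (depth, name) for every Source, descending into nested 'sources' groups."""
    for name, source in sources.items():
        yield depth, name
        yield from walk_sources(source.get("sources", {}), depth + 1)

block_sources = {
    "subject_42": {"sources": {"cell_1": {}, "cell_2": {}}},
    "electrode_A": {},
}
for depth, name in walk_sources(block_sources):
    print("  " * depth + name)
```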
Group entities are used to create subgroups below the Block level. A Group can contain DataArrays, Tags, and MultiTags, and each of these entities can be a member of several Groups.
group:<group_name>
    attribute:type = string
    attribute:name = string
    attribute:created_at = date
    attribute:updated_at = date
    attribute:entity_id = string
    [attribute:definition] = string
    [group:metadata] = Section          // a reference to a metadata Section
    [group:sources] = Source[]          // all sources in this group
    [group:data_arrays] = DataArray[]   // all data arrays in this group
    [group:multi_tags] = MultiTag[]     // all multi-tags in this group
    [group:tags] = Tag[]                // all tags in this group
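Membership in several Groups relies on hard links: the entity is stored once and becomes reachable under several paths. The sketch below mimics this with shared Python object references in an in-memory model. This is an analogy chosen for illustration, not an HDF5 API, and the group names are made up.

```python
# The DataArray is stored once under its Block and linked (not copied) into two Groups.
block = {"data_arrays": {}, "groups": {}}
voltage = {"name": "voltage", "data": [0.1, 0.2]}
block["data_arrays"]["voltage"] = voltage

for group_name in ("trial_1", "good_recordings"):
    group = block["groups"].setdefault(group_name, {"data_arrays": {}})
    group["data_arrays"]["voltage"] = voltage  # a link to the same object

voltage["data"].append(0.3)  # a change is visible through every link
print(block["groups"]["trial_1"]["data_arrays"]["voltage"]["data"])  # [0.1, 0.2, 0.3]
print(block["groups"]["good_recordings"]["data_arrays"]["voltage"] is voltage)  # True
```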
The metadata model (odML) is implemented in HDF5 as follows. Metadata objects can only be located in the 'metadata' group, which is a direct child of the root node. Sections are stored in a flat way, without a hierarchy among them. Properties and Values, on the other hand, are stored as children of their parent Sections and their parent Properties, respectively.
group:<section_name>
    attribute:type = string
    attribute:name = string
    attribute:created_at = date
    attribute:updated_at = date
    attribute:entity_id = string
    [attribute:definition] = string
    [attribute:link] = string
    [attribute:repository] = string
    [attribute:mapping] = string
    [group:properties] = Property[]
    [group:sections] = Section[]
The Property dataset is a compound dataset that stores the values of a property.
dataset:<property_name>                 // compound type dataset
    attribute:name = string
    attribute:created_at = date
    attribute:updated_at = date
    attribute:entity_id = string
    [attribute:definition] = string
    [attribute:unit] = string
    [attribute:mapping] = string
    member:value = <data_type>
    member:uncertainty = double
    member:reference = string
    member:filename = string
    member:encoder = string
    member:checksum = string
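Each element of the compound dataset is one record with the six members listed above. The sketch below models such a record as a dataclass and converts rows to plain tuples in member order, as they might be handed to a compound-type writer. The class name and the default values (0.0 and empty strings) are assumptions for illustration.

```python
from dataclasses import dataclass, astuple

@dataclass
class PropertyValue:
    """One row of the compound Property dataset (member names as in the schema above)."""
    value: object            # the Property's <data_type>, e.g. float or str
    uncertainty: float = 0.0  # assumed default
    reference: str = ""
    filename: str = ""
    encoder: str = ""
    checksum: str = ""

rows = [PropertyValue(12.5, uncertainty=0.1), PropertyValue(13.0, uncertainty=0.2)]
print([astuple(r) for r in rows])
# [(12.5, 0.1, '', '', '', ''), (13.0, 0.2, '', '', '', '')]
```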