```Python
assert np.all(array_hkl == array_obj)
```

A major benefit of `hickle` over `pickle` is that it allows fancy HDF5 features to
be applied, by passing keyword arguments on to `h5py`. So, you can do things like:
```Python
hkl.dump(array_obj, 'test_lzf.hkl', mode='w', compression='lzf', scaleoffset=0,
         chunks=(100, 100), shuffle=True, fletcher32=True)
```
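
Compression settings are stored per dataset inside the HDF5 file itself, so no extra keyword
arguments are needed to read the data back. A minimal check, reusing `array_obj` and the
`test_lzf.hkl` file from the example above:

```Python
import numpy as np
import hickle as hkl

# h5py reads the compression filter settings from the file itself,
# so load() needs no compression-related arguments
array_hkl = hkl.load('test_lzf.hkl')
assert np.all(array_hkl == array_obj)
```
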
Starting with version 4.x hickle offers the possibility to define dedicated loaders for custom
classes, and starting with hickle 5.x these can be collected in module, package and application
specific loader modules.

```Python
class MyClass():
    def __init__(self):
        self.name = 'MyClass'
        self.value = 42
```
To create a loader for `MyClass`, the `create_MyClass_dataset` function and either the `load_MyClass`
function or the `MyClassContainer` class have to be defined.

```Python
import h5py
from hickle.helpers import no_compression

def create_MyClass_dataset(py_obj, h_group, name, **kwargs):
    """
    py_obj ..... the instance of MyClass to be dumped
    h_group .... the h5py.Group py_obj should be dumped into
    name ....... the name of the h5py.Dataset or h5py.Group representing py_obj
    **kwargs ... the compression keyword arguments passed to hickle.dump
    """

    # if the content of MyClass can be represented as a single matrix, vector or
    # scalar value, then create a dataset of appropriate size: either set its shape
    # and dtype parameters to the appropriate size and type, or directly pass the
    # data using the data parameter
    ds = h_group.create_dataset(name, data=py_obj.value, **kwargs)

    ## NOTE: if your class represents a scalar using an empty tuple for shape,
    ## then kwargs have to be filtered by no_compression
    # ds = h_group.create_dataset(name, data=py_obj.value, shape=(), **no_compression(kwargs))

    # set additional attributes providing additional specialisation of content
    ds.attrs['name'] = py_obj.name

    # when done return the new dataset object and an empty tuple or list
    return ds, ()


def load_MyClass(h_node, base_type, py_obj_type):
    new_instance = py_obj_type()
    new_instance.name = h_node.attrs['name']
    new_instance.value = h_node[()]

    return new_instance
```

In case a `MyClass` instance contains multiple sub-items which have to be
stored as individual h5py.Dataset or h5py.Group objects, then define `create_MyClass_dataset`
using the `create_group` method instead of `create_dataset`, and define the corresponding
`MyClassContainer` class.

```Python
import h5py
from hickle.helpers import PyContainer

def create_MyClass_dataset(py_obj, h_group, name, **kwargs):
    """
    py_obj ..... the instance of MyClass to be dumped
    h_group .... the h5py.Group py_obj should be dumped into
    name ....... the name of the h5py.Dataset or h5py.Group representing py_obj
    **kwargs ... the compression keyword arguments passed to hickle.dump
    """

    ds = h_group.create_group(name)

    # set additional attributes providing additional specialisation of content
    ds.attrs['name'] = py_obj.name

    # when done return the new group object and a tuple, list or generator function
    # providing for all subitems a tuple or list containing
    # name ..... the name to be used storing the subitem within the h5py.Group object
    # item ..... the subitem object to be stored
    # attrs .... dictionary included in attrs of created h5py.Group or h5py.Dataset
    # kwargs ... the kwargs as passed to the create_MyClass_dataset function
    return ds, (('name', py_obj.name, {}, kwargs), ('value', py_obj.value, {'the answer': True}, kwargs))


class MyClassContainer(PyContainer):
    """
    Valid container classes must be derived from the hickle.helpers.PyContainer class
    """

    def __init__(self, h5_attrs, base_type, object_type):
        """
        h5_attrs ...... the attrs dictionary attached to the group representing MyClass
        base_type ..... byte string naming the loader to be used for restoring the MyClass object
        object_type ... MyClass class or MyClass subclass object
        """

        # the optional protected _content parameter of the PyContainer __init__
        # method can be used to change the data structure used to store
        # the subitems passed to the append method of the PyContainer class;
        # by default it is set to []
        super().__init__(h5_attrs, base_type, object_type, _content=dict())

    def filter(self, h_parent):  # optional overload
        """
        generator member function which can be overloaded to reorganize subitems
        of the h_parent h5py.Group before they are restored by hickle. Its default
        implementation simply yields from h_parent.items().
        """
        yield from super().filter(h_parent)

    def append(self, name, item, h5_attrs):  # optional overload
        """
        in case the _content parameter was explicitly set, or subitems should be stored
        in a specific order or have to be preprocessed before the next item is appended,
        then this can be done before storing them in self._content.

        name ....... the name identifying the subitem within the parent h5py.Group
        item ....... the object representing the subitem
        h5_attrs ... attrs dictionary attached to the h5py.Dataset or h5py.Group representing the item
        """
        self._content[name] = item

    def convert(self):
        """
        called by hickle when all subitems have been appended to the MyClass PyContainer;
        this method must be implemented by the MyClass PyContainer.
        """

        # self.object_type points to MyClass or any of its subclasses
        new_instance = self.object_type()
        new_instance.__dict__.update(self._content)

        return new_instance
```

In a last step, the loader for MyClass has to be registered with hickle. This is done by calling the
`hickle.lookup.LoaderManager.register_class` method.

```Python
from hickle.lookup import LoaderManager

# to register loader for object mapped to h5py.Dataset use
```
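
Once a loader has been registered, `MyClass` objects can be dumped and loaded like any natively
supported type. A minimal sketch, assuming the registration above has been completed (the file
name `myclass.hkl` is just an example):

```Python
import hickle as hkl

obj = MyClass()

# hickle calls create_MyClass_dataset behind the scenes when dumping
hkl.dump(obj, 'myclass.hkl', mode='w')

# and load_MyClass (or MyClassContainer) when restoring
restored = hkl.load('myclass.hkl')
assert restored.name == obj.name and restored.value == obj.value
```
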
For packages and application packages, the `load_MyPackage.py` loader module has to be stored in the
`hickle_loaders` directory of the package directory (the first one which contains an `__init__.py` file) and
should be structured as follows.

```Python
from hickle.helpers import PyContainer

## define below all create_MyClass_dataset and load_MyClass functions and MyClassContainer classes
```

The HDF5 file format adds a noticeable amount of metadata overhead for every object stored
to a file. Therefore mindlessly storing plenty of tiny objects and scalar values, instead of combining
them into a single dataset, will cause the HDF5 file, and thus the file created by hickle, to explode:
file sizes of several tens of GB are possible where a pickle file would just need some 100 MB.
This can be prevented by having the `create_MyClass_dataset` method combine sub-items into bigger
numpy arrays or other data structures which can be mapped to `h5py.Dataset` objects, and having the
`load_MyClass` function and/or `MyClassContainer.convert` method restore the actual structure of the
sub-items on load.
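
As an illustration of this pattern, here is a minimal sketch for a hypothetical `MyBunch` class
whose scalar sub-items are packed into a single numpy array on dump and unpacked again on load
(all names here are illustrative and not part of the hickle API):

```Python
import numpy as np

class MyBunch():
    """hypothetical class holding many scalar sub-items"""
    def __init__(self):
        self.values = {'a': 1.0, 'b': 2.0, 'c': 3.0}

def create_MyBunch_dataset(py_obj, h_group, name, **kwargs):
    # pack all scalar sub-items into one numpy array instead of
    # creating one tiny h5py.Dataset per scalar
    keys = sorted(py_obj.values)
    data = np.array([py_obj.values[k] for k in keys])
    ds = h_group.create_dataset(name, data=data, **kwargs)
    # remember the key names so load_MyBunch can rebuild the dict
    ds.attrs['keys'] = np.array(keys, dtype='S')
    return ds, ()

def load_MyBunch(h_node, base_type, py_obj_type):
    # restore the original dict structure from the single array
    new_instance = py_obj_type()
    keys = [k.decode() for k in h_node.attrs['keys']]
    new_instance.values = dict(zip(keys, h_node[()]))
    return new_instance
```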

Recent changes
