Easy to use HDF5 C++ templates for Serial and Parallel HDF5

News: an optional custom datatype argument has been added to h5::create, so custom datatypes can be passed along with the other arguments.

hsize_t dataset_size = 1'000'000;

using custom_t = char[42];
h5::dt_t<custom_t> hdf5_data_type{H5Tcreate(H5T_STRING, sizeof(custom_t))};
H5Tset_cset(hdf5_data_type, H5T_CSET_UTF8); // you can always call HDF5 CAPI functions on any H5CPP resource 

// `custom_t` is passed as template parameter, 
h5::ds_t ds = h5::create<custom_t>(fd, "my/dataset", h5::chunk{4096}, h5::current_dims{dataset_size},
    hdf5_data_type); // <- the h5::dt_t<custom_t> argument is passed along

// example interop with the HDF5 CAPI, where you need to pass the `dt_t<custom_t>` type descriptor
h5::sp_t mem_space{H5Screate_simple(1, &dataset_size, nullptr )};
H5Sselect_all(mem_space);
// file space
h5::sp_t file_space{H5Dget_space(ds)};
H5Sselect_all(file_space);

H5Dwrite(ds, hdf5_data_type, mem_space, file_space, H5P_DEFAULT, data.data()); // `data` holds `dataset_size` records of `custom_t`

The h5cpp compiler can easily be compiled with stock LLVM 6.0 and clang 6.0. The binary release h5cpp-dev_1.10.4-5.xxx contains bug fixes and half-float support, and ships as deb, rpm and tarball packages.

The Hierarchical Data Format, prevalent in high performance scientific computing, sits directly on top of sequential or parallel file systems, providing block and stream operations on standardized or custom binary/text objects. Scientific computing platforms such as Python, R, Matlab, Fortran, Julia [and many more...] come with the necessary libraries to read and write HDF5 datasets. This edition simplifies interactions with popular linear algebra libraries, provides compiler assisted seamless object persistence and Standard Template Library support, and is equipped with a novel error handling architecture.

H5CPP is a novel approach to persistence in the field of machine learning; it provides high performance sequential and block access to HDF5 containers through modern C++. Download packages from here. If you are interested in the h5cpp LLVM/clang based source code transformation tool, you can find it in this separate project.

You can read this blog post published on the HDF Group blog to find out how the project originated. Click here for the Doxygen based documentation pages. Browse the highlighted examples, follow this link to our spring presentation, or take a peek at the upcoming ISC'19 BOF, where I am presenting H5CPP.

H5CPP for MPI

Proud to announce to the HPC community that H5CPP is now MPI capable. The prerequisites are a C++17 capable MPI compiler and linking against the parallel HDF5 library. The template system provides the same easy to use functionality as the serial version, and may be enabled by including the parallel hdf5.h and then passing h5::mpiio({mpi_comm, mpi_info}) to h5::create | h5::open | h5::write | h5::read, together with the h5::independent and h5::collective data transfer properties. There are examples for independent and collective IO, as well as a short program to demonstrate throughput. The MPI extension supports all parallel HDF5 features; while the documentation is in progress, please look at the tail end of H5Pall.hpp for details.
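
A minimal sketch of a collective write, assuming the headers were built against parallel HDF5; the file name, dataset path and slab sizes below are placeholders invented for the example:

#include <mpi.h>
#include <h5cpp/all>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);
    {   // scoped so the file descriptor closes before MPI_Finalize
        // every rank opens the same container through MPI-IO
        h5::fd_t fd = h5::create("parallel.h5", H5F_ACC_TRUNC,
                h5::mpiio({MPI_COMM_WORLD, MPI_INFO_NULL}));
        // each rank contributes one 1024 element slab of a shared 1D dataset
        std::vector<double> slab(1024, static_cast<double>(rank));
        h5::write(fd, "shared/dataset", slab,
                h5::current_dims{static_cast<hsize_t>(nranks) * 1024},
                h5::offset{static_cast<hsize_t>(rank) * 1024},
                h5::count{1024}, h5::collective);
    }
    MPI_Finalize();
}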

Note: the h5::append operator and attributes are not yet tested, and are probably non-functional.

Tested against:

  • gcc 7.4.0, gcc 8.3.0, gcc 9.0.1
  • clang 6.0

Note: the preferred compiler is gcc; however, work is ongoing to broaden support for the major modern C++ compilers. Please contact me if you run into any shortcomings.

Templates:

create a dataset within an opened HDF5 file

file ::= const h5::fd_t& fd | const std::string& file_path;
dataspace ::= const h5::sp_t& dataspace | const h5::current_dims& current_dim [, const h5::max_dims& max_dims ] |  
    [,const h5::current_dims& current_dim] , const h5::max_dims& max_dims;

template <typename T> h5::ds_t create( file, const std::string& dataset_path, dataspace, 
    [, const h5::lcpl_t& lcpl] [, const h5::dcpl_t& dcpl] [, const h5::dapl_t& dapl] [, const h5::dt_t<T>& dt]);
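
A minimal sketch of the call (the file name, dataset path and shapes are made up for illustration):

#include <h5cpp/all>

int main() {
    h5::fd_t fd = h5::create("example.h5", H5F_ACC_TRUNC);
    // 2D dataset of doubles, extendable along the first dimension,
    // with chunking and GZIP compression folded into the dcpl
    h5::ds_t ds = h5::create<double>(fd, "measurements/temperature",
        h5::current_dims{100, 64}, h5::max_dims{H5S_UNLIMITED, 64},
        h5::chunk{20, 64} | h5::gzip{9});
}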

read a dataset and return the created object, or fill an object passed by reference

dataset ::= (const h5::fd_t& fd | const std::string& file_path, const std::string& dataset_path ) | const h5::ds_t& ds;

template <typename T> T read( dataset
    [, const h5::offset_t& offset]  [, const h5::stride_t& stride] [, const h5::count_t& count]
    [, const h5::dxpl_t& dxpl ] ) const;
template <typename T> h5::err_t read( dataset, T& ref 
    [, const h5::offset_t& offset]  [, const h5::stride_t& stride] [, const h5::count_t& count]
    [, const h5::dxpl_t& dxpl ] ) [noexcept] const;						 
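
As a rough sketch (file and dataset names are placeholders), a slice of a 1D dataset can be read into an STL container like this:

#include <h5cpp/all>
#include <vector>

int main() {
    // exception throwing overload: the return type is requested explicitly
    auto v = h5::read<std::vector<float>>("example.h5", "raw/signal",
        h5::offset{100}, h5::count{50});   // elements [100, 150)
}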

write a dataset to a specified location

dataset ::= (const h5::fd_t& fd | const std::string& file_path, const std::string& dataset_path ) | const h5::ds_t& ds;

template <typename T> h5::err_t write( dataset, const T* ptr
    [, const hsize_t* offset] [, const hsize_t* stride], const hsize_t* count [, const h5::dxpl_t& dxpl] ) noexcept;
template <typename T> h5::err_t write( dataset, const T& ref
    [, const h5::offset_t& offset] [, const h5::stride_t& stride] [, const h5::dxpl_t& dxpl] ) [noexcept];
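
A minimal sketch of the object based overload (the names below are placeholders):

#include <h5cpp/all>
#include <vector>

int main() {
    h5::fd_t fd = h5::create("example.h5", H5F_ACC_TRUNC);
    std::vector<double> v(64, 1.0);
    // full write: the dataset is created on demand, the count is taken from the container
    h5::write(fd, "raw/baseline", v);
    // partial update: overwrite the first 10 elements in place
    std::vector<double> patch(10, 2.0);
    h5::write(fd, "raw/baseline", patch, h5::offset{0});
}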

append to an extendable dataset of C/C++ structs

#include <h5cpp/core>
	#include "your_data_definition.h"
#include <h5cpp/io>
template <typename T> void h5::append(h5::pt_t& ds, const T& ref) [noexcept];

All file and dataset IO descriptors implement the RAII idiom and close the underlying resource when going out of scope; they may be seamlessly passed to HDF5 CAPI calls when implicit conversion is enabled. Similarly, the templates can take CAPI hid_t identifiers as arguments where applicable, provided the conversion policy allows it. See conversion policy for details.
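
For example (a sketch, with an arbitrary group name), an h5::fd_t may be handed straight to a CAPI call that expects a hid_t:

#include <h5cpp/all>

int main() {
    h5::fd_t fd = h5::create("example.h5", H5F_ACC_TRUNC);
    // with implicit conversion enabled the descriptor converts to hid_t
    hid_t gr = H5Gcreate(fd, "/raw", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Gclose(gr);
}   // fd goes out of scope here and closes the underlying HDF5 file handle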

install:

On the project's download page you will find Debian, RPM and general tar.gz packages with detailed instructions. Alternatively, get the download links to the header-only h5cpp-dev_1.10.4.deb and the binary compiler h5cpp_1.10.4.deb directly from this page.

supported classes:

T := ([unsigned] ( int8_t | int16_t | int32_t | int64_t )) | ( float | double  )
S := T | c/c++ struct | std::string
ref := std::vector<S> 
	| arma::Row<T> | arma::Col<T> | arma::Mat<T> | arma::Cube<T> 
	| Eigen::Matrix<T,Dynamic,Dynamic> | Eigen::Matrix<T,Dynamic,1> | Eigen::Matrix<T,1,Dynamic>
	| Eigen::Array<T,Dynamic,Dynamic>  | Eigen::Array<T,Dynamic,1>  | Eigen::Array<T,1,Dynamic>
	| blaze::DynamicVector<T,blaze::rowVector> |  blaze::DynamicVector<T,blaze::colVector>
	| blaze::DynamicMatrix<T,blaze::rowMajor>  |  blaze::DynamicMatrix<T,blaze::colMajor>
	| itpp::Mat<T> | itpp::Vec<T>
	| blitz::Array<T,1> | blitz::Array<T,2> | blitz::Array<T,3>
	| dlib::Matrix<T>   | dlib::Vector<T,1> 
	| ublas::matrix<T>  | ublas::vector<T>
ptr 	:= T* 
accept 	:= ref | ptr 

In addition to the standard data types offered by BLAS/LAPACK-style libraries and POD structs, std::vector<std::string> is also supported, mapping N dimensional variable-length C-like string HDF5 datasets to std::vector<std::string> objects.
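
A brief sketch (the file and dataset names are invented for the example):

#include <h5cpp/all>
#include <iostream>
#include <string>
#include <vector>

int main() {
    h5::fd_t fd = h5::open("strings.h5", H5F_ACC_RDONLY);
    // a variable-length string dataset maps directly onto std::vector<std::string>
    auto names = h5::read<std::vector<std::string>>(fd, "catalog/names");
    for (const auto& name : names)
        std::cout << name << "\n";
}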

short examples:

to read/map a 10x5 matrix from a 3D array at location {3,4,1}

#include <armadillo>
#include <h5cpp/all>
...
auto fd = h5::open("some_file.h5", H5F_ACC_RDWR);
/* thanks to RVO the returned arma::Mat<double> object is sized 10x5 and filled in place */
try {
	/* extents of unit dimension are dropped: a 2D object is returned */
	auto M = h5::read<arma::mat>(fd, "path/to/object",
        h5::offset{3,4,1}, h5::count{10,1,5}, h5::stride{3,1,1}, h5::block{2,1,1});
} catch (const std::runtime_error& ex ){
	...
}
// fd closes underlying resource (raii idiom)

to write the entire matrix back to a different file

#include <Eigen/Dense>
#include <h5cpp/all>

h5::fd_t fd = h5::create("some_file.h5",H5F_ACC_TRUNC);
h5::write(fd,"/result",M);

to create a dataset recording a stream of structs into an extendable, chunked dataset with GZIP level 9 compression:

#include <h5cpp/core>
	#include "your_data_definition.h"
#include <h5cpp/io>
...
auto ds = h5::create<some_type>(fd,"bids", h5::max_dims{H5S_UNLIMITED}, h5::chunk{1000} | h5::gzip{9});

to append records to an HDF5 datastream

#include <h5cpp/core>
	#include "your_data_definition.h"
#include <h5cpp/io>
auto fd = h5::create("NYSE high freq dataset.h5");
h5::pt_t pt = h5::create<ns::nyse_stock_quote>( fd, 
		"price_quotes/2018-01-05.qte",h5::max_dims{H5S_UNLIMITED}, h5::chunk{1024} | h5::gzip{9} );
ns::nyse_stock_quote qu;   // the record type matches the dataset's element type

bool having_a_good_day{true};
while( having_a_good_day ){
	try {
		receive_data_from_udp_stream( qu );
		h5::append(pt, qu);
	} catch ( ... ){
		if( cant_fix_connection() )
			having_a_good_day = false;
	}
}