
DataFrame API

Ben Murray edited this page Apr 19, 2021 · 8 revisions

DataFrames

The ExeTera DataFrame object is intended to be familiar to users of Pandas, albeit not identical.

Differences

ExeTera works with Datasets, which are backed by physical key-value HDF5 datastores on disk. As a result, there are necessarily some differences from the Pandas DataFrame:

  • Pandas DataFrames require all Series (Fields, in ExeTera terms) to be the same length. ExeTera does not enforce this, although some operations only make sense when all fields are the same length. ExeTera allows a DataFrame to hold fields of different lengths because requiring equal lengths would force operations such as applying a filter to process every field of the DataFrame at once, which would run out of memory on large DataFrames.
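To illustrate why unequal lengths can arise, here is a minimal plain-Python sketch (it does not use ExeTera). It applies a boolean filter one field at a time. The helper `filter_fields_one_at_a_time` and the `frame` dictionary are hypothetical stand-ins. In an on-disk store, only one field would need to be loaded at a time. In this sketch the fields are ordinary in-memory lists, used purely for illustration. Between steps, the frame holds fields of different lengths.

```python
from typing import Dict, List, Sequence


def filter_fields_one_at_a_time(fields: Dict[str, List], keep: Sequence[bool]) -> List[Dict[str, int]]:
    """Apply a boolean filter to each field in turn (hypothetical helper).

    Each field is replaced by its filtered copy before the next field is
    touched, mimicking a store that processes one field at a time. Here the
    fields are in-memory lists purely for illustration. Returns a snapshot of
    every field's length after each step.
    """
    snapshots = []
    for name in list(fields):
        values = fields[name]
        if len(values) != len(keep):
            raise ValueError(f"field {name!r} has length {len(values)}, filter has {len(keep)}")
        # filter this field only; the remaining fields keep their old length for now
        fields[name] = [v for v, k in zip(values, keep) if k]
        snapshots.append({n: len(f) for n, f in fields.items()})
    return snapshots


frame = {"id": [1, 2, 3, 4], "age": [34, 51, 27, 62]}
steps = filter_fields_one_at_a_time(frame, [True, False, True, False])
print(steps)  # → [{'id': 2, 'age': 4}, {'id': 2, 'age': 2}]
```

After the first step, `id` has already been filtered while `age` has not. Only once every field has been processed are the lengths equal again.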

DataFrame usage examples

Create a new Field

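from exetera.core.session import Session

with Session() as s:
    # one way to obtain a DataFrame: create it in a new dataset (path and names are placeholders)
    ds = s.open_dataset('example.hdf5', 'w', 'ds')
    df = ds.create_dataframe('df')
    i_f = df.create_indexed_string('i_foo')  # variable-length strings
    f_f = df.create_fixed_string('f_foo', 8)  # fixed-length byte strings, 8 bytes each
    n_f = df.create_numeric('n_foo', 'int32')  # numeric values stored as int32
    c_f = df.create_categorical('c_foo', 'int8', {b'a': 0, b'b': 1})  # int8 codes plus a key mapping values to codes
    t_f = df.create_timestamp('t_foo')  # timestamp values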
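The field types above differ mainly in how values are laid out. As we understand it, an indexed string field stores an index of offsets into one flat byte buffer, so strings can vary in length without padding. A categorical field stores small integer codes alongside a key such as `{b'a': 0, b'b': 1}`. The following plain-Python sketch illustrates those two encodings only. The helper names are hypothetical, and this is not ExeTera's actual implementation or on-disk format.

```python
from typing import Dict, List, Tuple


def encode_indexed_strings(strings: List[str]) -> Tuple[List[int], bytes]:
    """Pack variable-length strings into an offsets index plus one flat byte buffer (hypothetical helper)."""
    index = [0]
    chunks = []
    for s in strings:
        b = s.encode("utf-8")
        chunks.append(b)
        # each entry ends where the next one starts
        index.append(index[-1] + len(b))
    return index, b"".join(chunks)


def decode_indexed_string(index: List[int], values: bytes, i: int) -> str:
    """Recover entry i by slicing between consecutive offsets (hypothetical helper)."""
    return values[index[i]:index[i + 1]].decode("utf-8")


def encode_categorical(raw: List[bytes], key: Dict[bytes, int]) -> List[int]:
    """Replace each raw value with its integer code from the key (hypothetical helper)."""
    return [key[v] for v in raw]


index, values = encode_indexed_strings(["ab", "", "cde"])
print(index, values)  # → [0, 2, 2, 5] b'abcde'
print(decode_indexed_string(index, values, 2))  # → cde
print(encode_categorical([b"a", b"b", b"a"], {b"a": 0, b"b": 1}))  # → [0, 1, 0]
```

The offsets layout makes empty strings cheap: the empty entry simply has two equal consecutive offsets.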