
DataFrame API

Ben Murray edited this page Apr 19, 2021 · 8 revisions

DataFrames

The ExeTera DataFrame object is designed to feel familiar to Pandas users, although it is not identical to the Pandas DataFrame.

Differences

ExeTera works with Datasets, which are backed by physical key-value HDF5 datastores on disk. As a result, ExeTera DataFrames necessarily differ from Pandas DataFrames in some respects:

  • Pandas DataFrames enforce that all Series (Fields in ExeTera terms) are the same length. ExeTera does not require this, although some operations only make sense when all fields are the same length. ExeTera allows DataFrames to have fields of different lengths because applying filters and similar operations to a whole DataFrame would run out of memory on large DataFrames.
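
One way to read the point about field lengths is that a filter can be applied one field at a time, so only a single field needs to be held in memory, and the fields briefly differ in length partway through. The sketch below is a toy model in plain Python, not ExeTera code: `fields`, `same_length` and `apply_filter_to_field` are hypothetical names, and ordinary lists stand in for HDF5-backed fields.

```python
from itertools import compress

# Toy stand-in for a DataFrame: a dict of named "fields" (plain lists here).
fields = {
    "id": [1, 2, 3, 4],
    "age": [34, 51, 27, 45],
}
keep = [True, False, True, True]  # boolean filter, one flag per row


def same_length(fs):
    """Return True if every field has the same number of rows."""
    return len({len(v) for v in fs.values()}) <= 1


def apply_filter_to_field(fs, name, flags):
    """Filter a single field, leaving the other fields untouched."""
    fs[name] = list(compress(fs[name], flags))


apply_filter_to_field(fields, "id", keep)
lengths_match_midway = same_length(fields)  # only "id" has been filtered

apply_filter_to_field(fields, "age", keep)
lengths_match_after = same_length(fields)   # both fields have been filtered
```

Under this reading, a container that insisted on equal lengths at all times could not pass through the intermediate state, which is why a check such as `same_length` is better applied only by the operations that need it.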

DataFrame usage examples

Create a new field

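df = ...  # placeholder: get a DataFrame from somewhere
i_f = df.create_indexed_string('i_foo')    # variable-length strings
f_f = df.create_fixed_string('f_foo', 8)   # fixed-length strings of length 8
n_f = df.create_numeric('n_foo', 'int32')  # numeric field stored as int32
c_f = df.create_categorical('c_foo', 'int8', {b'a': 0, b'b': 1})  # maps labels to int8 codes
t_f = df.create_timestamp('t_foo')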

Copy a field from another dataframe

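df1 = ...  # placeholder: get a DataFrame from somewhere
df2 = ...  # placeholder: get another DataFrame from somewhere
df2['foo'] = df1['foo']     # copy 'foo' from df1 into df2
df2['foobar'] = df2['bar']  # copy 'bar' within df2 under the new name 'foobar'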

Apply a filter to all fields in a dataframe

df1 = ...   # placeholder: get a DataFrame from somewhere
filt = ...  # placeholder: get a filter from somewhere
df2 = df1.apply_filter(filt)  # creates a new dataframe from the filtered dataframe
df1.apply_filter(filt, in_place=True)  # destructively filters the dataframe
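
The difference between the two calls above can be sketched with a small stand-in written in plain Python. This is only an illustration of the semantics described in the comments, not ExeTera's implementation: the `apply_filter` function here is a hypothetical helper, dicts of lists stand in for DataFrames, and the filter is assumed to be a sequence of booleans with one flag per row.

```python
from itertools import compress


def apply_filter(fields, flags, in_place=False):
    """Keep only the rows whose flag is True, for every field.

    Hypothetical helper modelled on the two calls above.
    """
    target = fields if in_place else {}
    for name in list(fields):
        target[name] = list(compress(fields[name], flags))
    return target


df1 = {"foo": [10, 20, 30], "bar": ["a", "b", "c"]}
filt = [True, False, True]

df2 = apply_filter(df1, filt)           # new container; df1 is left alone
rows_left_in_df1 = len(df1["foo"])      # df1 still has all of its rows here

apply_filter(df1, filt, in_place=True)  # df1 itself is now filtered
```

After the second call the unfiltered rows of `df1` are gone, which is what "destructively" means in the comment above.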