DataFrame API
Ben Murray edited this page Apr 19, 2021 · 8 revisions
The ExeTera DataFrame object is intended to be familiar to users of Pandas, albeit not identical. ExeTera works with Datasets, which are backed by physical key-value HDF5 datastores on drives, and, as such, there are necessarily some differences from the Pandas DataFrame:
- Pandas DataFrames enforce that all Series (Fields in ExeTera terms) are the same length. ExeTera doesn't require this, although some operations only make sense when all fields are the same length. ExeTera allows DataFrames to have fields of different lengths because applying filters and similar operations to a whole DataFrame would run out of memory on large DataFrames.
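
The point about differing field lengths can be illustrated without ExeTera at all. The following is a minimal sketch, assuming a DataFrame is modelled as a plain dictionary of NumPy arrays; the `fields` dict and the `filter_one_field_at_a_time` helper are hypothetical names for illustration, not ExeTera's API or implementation. Filtering one field at a time only needs one field in memory per step, but it means the fields briefly have different lengths partway through.

```python
import numpy as np

# Hypothetical stand-in for a DataFrame: a dict mapping field names to arrays.
fields = {
    "id": np.arange(6, dtype=np.int32),
    "age": np.array([30, 41, 25, 67, 52, 19], dtype=np.int32),
}
mask = fields["age"] >= 30  # boolean filter, one entry per row


def filter_one_field_at_a_time(fields, mask):
    """Filter each field in turn, yielding the field lengths after each step."""
    for name in list(fields):
        fields[name] = fields[name][mask]
        yield {n: len(a) for n, a in fields.items()}


lengths_per_step = list(filter_one_field_at_a_time(fields, mask))
```

After the first step `id` has already been shortened while `age` still has its original length; a container that insisted on equal lengths at all times could not pass through this intermediate state.
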
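```python
df = ...  # placeholder: get a DataFrame from somewhere

# Create fields of several types on the DataFrame
i_f = df.create_indexed_string('i_foo')                          # indexed strings
f_f = df.create_fixed_string('f_foo', 8)                         # fixed-length strings of length 8
n_f = df.create_numeric('n_foo', 'int32')                        # int32 values
c_f = df.create_categorical('c_foo', 'int8', {b'a': 0, b'b': 1}) # int8 codes with a label-to-code key
t_f = df.create_timestamp('t_foo')                               # timestamps
```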
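
For readers more used to NumPy, here is a rough sketch of what the type arguments in calls such as `create_fixed_string('f_foo', 8)`, `create_numeric('n_foo', 'int32')` and `create_categorical('c_foo', 'int8', {b'a': 0, b'b': 1})` suggest about the data a field holds. It is an illustration using plain NumPy arrays under that assumption, not ExeTera's storage code, and all variable names are hypothetical.

```python
import numpy as np

# Fixed-length strings: every entry occupies exactly 8 bytes ('S8').
fixed = np.array([b"abc", b"12345678"], dtype="S8")

# Numeric field: a homogeneous array of 32-bit integers.
numeric = np.array([1, 2, 3], dtype="int32")

# Categorical field: small integer codes plus a key mapping labels to codes.
key = {b"a": 0, b"b": 1}
labels = [b"a", b"b", b"b", b"a"]
codes = np.array([key[label] for label in labels], dtype="int8")

# Decoding inverts the key to recover the original labels.
inverse_key = {code: label for label, code in key.items()}
decoded = [inverse_key[int(code)] for code in codes]
```
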
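```python
df1 = ...  # placeholder: get a DataFrame from somewhere
df2 = ...  # placeholder: get another DataFrame from somewhere

df2['foo'] = df1['foo']     # assign field 'foo' from df1 into df2
df2['foobar'] = df2['bar']  # assign df2's own field 'bar' under the new name 'foobar'
```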
```python
df1 = ...   # placeholder: get a DataFrame from somewhere
filt = ...  # placeholder: get a filter from somewhere

df2 = df1.apply_filter(filt)           # creates a new dataframe holding the filtered fields
df1.apply_filter(filt, in_place=True)  # destructively filters df1 itself
```
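
For comparison with Pandas, the two `apply_filter` calls correspond roughly to boolean-mask indexing, which returns a new object, and dropping the unwanted rows in place. The sketch below uses only Pandas with made-up data; it is an analogy for readers coming from Pandas, not a description of how ExeTera implements `apply_filter`.

```python
import pandas as pd

pdf1 = pd.DataFrame({"foo": [1, 2, 3, 4], "bar": [10, 20, 30, 40]})
filt = pdf1["foo"] % 2 == 0  # boolean filter: keep even values of 'foo'

# Non-destructive: a new DataFrame is returned and pdf1 is left as it was.
pdf2 = pdf1[filt]

# Destructive: drop the rows that fail the filter from pdf1 itself.
pdf1.drop(pdf1.index[~filt], inplace=True)
```
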