This repository has been archived by the owner on Jun 18, 2023. It is now read-only.

Result file IO abstractions #69

Open
ceholden opened this issue Dec 3, 2015 · 0 comments


Motivation

Right now we're using NumPy saved files that store structured arrays for the results, but this might change in the future (see #14), especially to accommodate visualization utilities that would benefit from having all the results for an image in one intelligently indexed container.

Another annoyance is that each CLI utility duplicates the code to open, inspect, read, and write the result files. Ideally this should be refactored into a common set of functions.

Proposal

Implement a "driver" for each format (so just NumPy for now) that contains the logic for inspecting, reading, and writing that format. Eventually this will require updating the configuration files to specify which result storage driver should be used.
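To make the idea concrete, here is a rough sketch of what a driver registry could look like. Everything in it is an assumption for illustration: the names `ResultDriver`, `register_driver`, `get_driver`, and `NumPyDriver` are hypothetical and are not existing code in this repository, and the NumPy driver's methods are left as empty placeholders.

```python
from abc import ABC, abstractmethod

# Hypothetical registry mapping a format name (e.g. "numpy") to a driver class
_DRIVERS = {}


def register_driver(name):
    """Class decorator that records a driver class under a format name."""
    def decorator(cls):
        _DRIVERS[name] = cls
        return cls
    return decorator


def get_driver(name):
    """Instantiate the driver registered for a result format name."""
    if name not in _DRIVERS:
        raise ValueError("No result driver registered for format %r" % name)
    return _DRIVERS[name]()


class ResultDriver(ABC):
    """Interface every result storage driver would need to implement."""

    @abstractmethod
    def iter_records(self, config):
        """Yield result records in whatever unit suits the format."""

    @abstractmethod
    def query_records(self, config, start, end):
        """Yield result records matching a date range."""


@register_driver("numpy")
class NumPyDriver(ResultDriver):
    """Placeholder driver; the real reading logic is intentionally omitted."""

    def iter_records(self, config):
        return iter(())

    def query_records(self, config, start, end):
        return iter(())
```

With something like this, a CLI utility would only need the format name from the configuration file to get an object exposing the shared interface, and a new format would be added by registering another class.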

The currently implemented iter_records, for example, would still iterate over result records, but would do so in a way that makes sense for the format. For the current NumPy saved files, we'd yield one row's worth of records at a time. If we used something that stores results in blocks, maybe it would be chunks of data regardless of the row:

driver = drivers.register(result_format)

for rec in driver.iter_records(config):
    pass  # do stuff
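For the NumPy case, the iteration could look roughly like the sketch below. It assumes one saved `.npz` file per row and that the structured array lives under a key named `record`; both the file pattern and the key are assumptions here and would need to match whatever the result files actually contain. It also takes a directory path directly rather than a `config` object to stay self-contained.

```python
import glob
import os

import numpy as np


def iter_records(result_dir, pattern="*.npz", key="record"):
    """Yield the structured array stored in each result file, one file at a time.

    `pattern` and `key` are hypothetical defaults, not confirmed file layout.
    """
    for path in sorted(glob.glob(os.path.join(result_dir, pattern))):
        # Read the array fully, then close the archive before yielding
        with np.load(path) as archive:
            records = archive[key]
        yield records
```

A block-based format could implement the same generator but yield one chunk per iteration instead of one file, so calling code would not need to change.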

We usually want to query the records based on the segment dates, so there could be a higher-level API that performs a query optimized for the format (NumPy files would just use a simple np.where against the records, but we could use in-kernel searches if using PyTables):

driver = drivers.register(result_format)

for matching_rec in driver.query_records(config, start='2000-01-01', end='2001-01-01'):
    pass  # do more stuff
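As a sketch of the NumPy side of such a query, the function below filters a structured array with `np.where`. It makes several assumptions that are not stated in this issue: that the records have `start` and `end` fields holding ordinal dates, and that "matching" means the segment overlaps the requested date range. It operates on an already loaded array rather than a `config` object.

```python
from datetime import date

import numpy as np


def query_records(records, start, end):
    """Return records whose segment overlaps the [start, end] date range.

    Assumes hypothetical `start` and `end` fields stored as ordinal dates.
    """
    lo = date.fromisoformat(start).toordinal()
    hi = date.fromisoformat(end).toordinal()
    # A segment overlaps the window if it starts before the window ends
    # and ends after the window starts
    idx = np.where((records["start"] <= hi) & (records["end"] >= lo))[0]
    return records[idx]
```

A driver backed by a format with its own indexing could expose the same signature but push the date condition down into the storage layer instead of loading every record first.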

Justification

If we refactor out all of the result IO from the CLI scripts, we'll make testing much easier and probably reduce the overall amount of code. Refactoring out just the NumPy format probably won't take too much time and would set us up to easily transition to a better file format.
