Managing file I/O and Overwrites #4

Open
iguinn opened this issue Mar 2, 2021 · 3 comments

@iguinn
Collaborator

iguinn commented Mar 2, 2021

This is a continuation of a conversation from pull request legend-exp/pygama#153.
Summary:
@sweigart made the overwrite option act as expected for raw_to_dsp. However, ultimately we want to make a few more changes:

  1. Be able to overwrite only specific fields in an HDF5 table. According to @jasondet, "proper overwrite (at the file as well as dataset level) is now implemented but untested in LH5Store in the refactor branch on my fork", so once this is tested and pulled into the main branch we can use it in our processing.
  2. raw_to_dsp will ultimately not handle file I/O, but will instead act as a table_in -> table_out function. According to @mmatteo, the I/O will instead be handled by the dataflow manager (see the sketch at the end of this comment).

If I missed anything important in this summary please add on to it!
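
To make item 2 concrete, here is a rough sketch of what raw_to_dsp could look like as a pure table_in -> table_out function, with all file I/O pushed out to the caller. This is not the current pygama API: `build_processing_chain` and the table objects below are stand-ins, and the signatures are assumptions.

```python
# Illustrative only: names and signatures are assumptions, not the real API.

def raw_to_dsp(table_in, dsp_config):
    """Run the DSP transforms defined in dsp_config on the rows currently
    held in table_in and return the resulting output table. No file I/O
    happens in here; the caller owns reading and writing."""
    proc_chain, table_out = build_processing_chain(table_in, dsp_config)
    proc_chain.execute()  # process whatever rows table_in currently holds
    return table_out
```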

@iguinn
Collaborator Author

iguinn commented Mar 2, 2021

About handling I/O in the dataflow manager: right now we do not read/write an entire file all at once, but in chunks of ~3000 waveforms at a time. As a result, it's not clear to me how raw_to_dsp as a table_in -> table_out function will work. The current pseudocode for raw_to_dsp is:

 Make input table based on contents of input file (but don't read it yet!)
 Make processing chain and output table from JSON config file
 for chunk in file
     read chunk from input file into input table
     execute the processing chain
     write chunk from output table to output file

If we want the dataflow manager to handle the I/O steps, it will have to run that full loop. That also means it will have to interact with the processing chain, not just the input and output tables, so under this proposal raw_to_dsp would also have to return the processing chain (a rough sketch follows).
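
As a sketch of that arrangement (the LH5Store method names, the `read_n_rows` helper, and the raw_to_dsp return value are all assumptions used for illustration, not the actual API):

```python
# Illustrative sketch only: the dataflow manager owns the chunked I/O loop,
# while raw_to_dsp builds the buffers and the processing chain and hands the
# chain back to the caller. Names and signatures below are assumptions.
store = LH5Store()
chunk = 3000  # waveforms per read, as in the current raw_to_dsp

# raw_to_dsp does no file I/O itself; it sets up the in-memory input/output
# buffers (based on the raw file's structure) and returns the chain
table_in, table_out, proc_chain = raw_to_dsp(raw_file, raw_group, dsp_config)

n_rows_tot = store.read_n_rows(raw_group, raw_file)  # assumed helper
for start in range(0, n_rows_tot, chunk):
    # the manager fills the input buffer from the raw file...
    store.read_object(raw_group, raw_file, start_row=start,
                      n_rows=chunk, obj_buf=table_in)
    # ...runs the DSP transforms on that chunk...
    proc_chain.execute()
    # ...and appends the output buffer to the dsp file
    store.write_object(table_out, dsp_group, dsp_file)
```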

@jasondet
Contributor

jasondet commented Apr 8, 2021

These should both be handled with the refactor. Let's keep this open then come back to it then.

@gipert gipert transferred this issue from legend-exp/pygama May 24, 2023
@jasondet
Contributor

In lh5.store.write(), for append or overwrite mode we need to check whether an object being written is going to be a new element of a struct (or a new column of a table, etc.) and update the corresponding attribute.
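
For illustration, here is a minimal sketch of that attribute bookkeeping in plain h5py, assuming the LH5 convention that a struct/table group carries a `datatype` attribute of the form `table{field_a,field_b,...}`. The helper name is hypothetical and the parsing is simplified.

```python
import h5py

def register_new_field(lh5_file, group_path, field):
    """Hypothetical helper: if `field` was just written as a new member of
    the struct/table group at `group_path`, add it to the group's
    'datatype' attribute (e.g. "table{energy,timestamp}")."""
    with h5py.File(lh5_file, "r+") as f:
        grp = f[group_path]
        dt = grp.attrs.get("datatype", "table{}")
        if isinstance(dt, bytes):        # attribute may come back as bytes
            dt = dt.decode()
        prefix, _, body = dt.partition("{")
        names = [n for n in body.rstrip("}").split(",") if n]
        if field not in names:
            names.append(field)
            grp.attrs["datatype"] = prefix + "{" + ",".join(names) + "}"
```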
