Support setting DSP output column level HDF5 settings #21

Open
gipert opened this issue Nov 10, 2023 · 8 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@gipert
Member

gipert commented Nov 10, 2023

This would be needed if we want to use custom compression settings or any other HDF5 dataset option. We could add a new JSON field listing attributes that should be attached to the output LGDOs, e.g.:

```json
"wf_bl": {
    "function": "bl_subtract",
    "module": "dspeed.processors",
    "args": ["waveform", "baseline", "wf_bl"],
    "unit": "ADC",
    "lgdo_attrs": {
        "0vbb": "I want to believe!",
        "hdf5_settings": {
            "compression": "gzip",
            "shuffle": true
        }
    }
}
```

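A minimal sketch of how a processing chain could consume such a field, assuming the special key `hdf5_settings` is split off and forwarded as dataset-creation keyword arguments (`compression` and `shuffle` are valid keyword arguments of h5py's `create_dataset`), while every other entry is kept as a plain attribute. The helper name `split_lgdo_attrs` is hypothetical and not part of dspeed.

```python
from __future__ import annotations

from typing import Any


def split_lgdo_attrs(lgdo_attrs: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split a (hypothetical) "lgdo_attrs" entry into plain attributes and HDF5 settings.

    Plain attributes would be attached to the output LGDO; the "hdf5_settings"
    mapping would instead be passed as keyword arguments when the dataset is created.
    """
    attrs = dict(lgdo_attrs)  # copy so the caller's configuration is not mutated
    hdf5_kwargs = attrs.pop("hdf5_settings", {})
    if not isinstance(hdf5_kwargs, dict):
        raise TypeError("hdf5_settings must be a mapping of dataset options")
    return attrs, dict(hdf5_kwargs)


config = {
    "0vbb": "I want to believe!",
    "hdf5_settings": {"compression": "gzip", "shuffle": True},
}
attrs, hdf5_kwargs = split_lgdo_attrs(config)
print(attrs)        # {'0vbb': 'I want to believe!'}
print(hdf5_kwargs)  # {'compression': 'gzip', 'shuffle': True}
```

Keeping the split in one place means the nested dict never reaches the attribute-writing code path.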
@iguinn
Collaborator

iguinn commented Jan 10, 2024

I added this feature, but it doesn't work for this:

```json
"hdf5_settings": {
    "compression": "gzip",
    "shuffle": true
}
```

giving TypeError: Object dtype dtype('O') has no native HDF5 equivalent. Is this how you had intended it to be used for setting the compression options, or am I missing something?

However, it does work for plain string attributes. I tried adding a "description" attribute, and one really nice thing is that it is displayed by the -a option of lh5ls!

```
├── ch1027200 · HDF5 group
│   └── dsp · table{wf_presum,waveform_windowed}
│       ├── waveform_windowed · table{t0,dt,values} ── {'description': 'Waveform windowed from 42 us to 64.4 us'}
│       │   ├── dt · array<1>{real} ── {'units': 'ns'}
│       │   ├── t0 · array<1>{real} ── {'units': 'ns'}
│       │   └── values · array_of_equalsized_arrays<1,1>{real}
│       └── wf_presum · table{t0,dt,values} ── {'description': 'Waveform presummed by a factor of 6'}
│           ├── dt · array<1>{real} ── {'units': 'ns'}
│           ├── t0 · array<1>{real} ── {'units': 'ns'}
│           └── values · array_of_equalsized_arrays<1,1>{real}
```

I think this would be a great feature to take advantage of to make our analysis self-documenting.
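The TypeError reported above most likely comes from h5py itself: HDF5 attributes can hold strings, numbers and arrays, but a nested dict such as `hdf5_settings` is converted to a NumPy object array, which has no HDF5 equivalent. A minimal stdlib sketch of a pre-write check that would surface a clearer error, assuming only scalars, strings and flat lists of those should be written as attributes (the helper name `check_attr_value` is hypothetical):

```python
from __future__ import annotations

from typing import Any

# Attribute value types assumed to be safely storable as HDF5 attributes.
_SCALAR_TYPES = (str, bytes, bool, int, float)


def check_attr_value(name: str, value: Any) -> None:
    """Raise a descriptive error if `value` is unlikely to be storable as an HDF5 attribute."""
    if isinstance(value, _SCALAR_TYPES):
        return
    # Flat lists/tuples of scalars map onto 1D HDF5 attribute arrays.
    if isinstance(value, (list, tuple)) and all(isinstance(v, _SCALAR_TYPES) for v in value):
        return
    raise TypeError(
        f"attribute {name!r} has type {type(value).__name__}, which has no HDF5 "
        "equivalent; nested mappings like 'hdf5_settings' must be handled separately"
    )


check_attr_value("description", "Waveform presummed by a factor of 6")  # passes
try:
    check_attr_value("hdf5_settings", {"compression": "gzip", "shuffle": True})
except TypeError as err:
    print(err)
```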

@gipert
Member Author

gipert commented Jan 11, 2024

> giving TypeError: Object dtype dtype('O') has no native HDF5 equivalent. Is this how you had intended it to be used for setting the compression options, or am I missing something?

Uhm, it's supposed to work, see:

https://legend-pydataobj.readthedocs.io/en/latest/api/lgdo.lh5.html#lgdo.lh5.store.LH5Store.write

Let me have a look...

@gipert
Member Author

gipert commented Jan 11, 2024

I implemented exactly the same thing here:

legend-exp/pygama#543

and it works there. Did you check your legend-pydataobj version?

@iguinn
Collaborator

iguinn commented Jan 11, 2024

Oh, I think the problem is that it doesn't work on WaveformTables, though it does on other objects. I'm guessing it doesn't work on composite LGDOs?

@gipert
Member Author

gipert commented Jan 11, 2024

Indeed!

@iguinn
Collaborator

iguinn commented Jan 11, 2024

How do we want to handle waveform compression, then? Should the LGDO WaveformArray object propagate the options to the table values, or should the processing_chain handle it?
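One way to picture the first option: settings attached to a composite object are pushed down to its leaf datasets, with settings set directly on a field taking precedence. The sketch below uses hypothetical stand-in classes (`Leaf`, `Composite`) rather than the real lgdo `Table`/`WaveformTable` types, and only computes which HDF5 settings each leaf dataset would receive; it does not write anything.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Union


@dataclass
class Leaf:
    """Stand-in for a non-composite LGDO (e.g. an array) written as one dataset."""
    attrs: dict[str, Any] = field(default_factory=dict)


@dataclass
class Composite:
    """Stand-in for a composite LGDO (e.g. a table) whose fields are LGDOs."""
    fields: dict[str, Union[Leaf, Composite]]
    attrs: dict[str, Any] = field(default_factory=dict)


def resolve_hdf5_settings(
    obj: Union[Leaf, Composite], path: str = "", inherited: dict[str, Any] | None = None
) -> dict[str, dict[str, Any]]:
    """Map each leaf dataset path to the HDF5 settings it would be written with."""
    # Settings on this object override those inherited from its parents.
    settings = {**(inherited or {}), **obj.attrs.get("hdf5_settings", {})}
    if isinstance(obj, Leaf):
        return {path: settings}
    resolved: dict[str, dict[str, Any]] = {}
    for name, child in obj.fields.items():
        resolved.update(resolve_hdf5_settings(child, f"{path}/{name}", settings))
    return resolved


wf = Composite(
    fields={
        "t0": Leaf(),
        "dt": Leaf(),
        "values": Leaf(attrs={"hdf5_settings": {"shuffle": True}}),
    },
    attrs={"hdf5_settings": {"compression": "gzip"}},
)
for dset, opts in resolve_hdf5_settings(wf, "wf_bl").items():
    print(dset, opts)
# wf_bl/t0 {'compression': 'gzip'}
# wf_bl/dt {'compression': 'gzip'}
# wf_bl/values {'compression': 'gzip', 'shuffle': True}
```

The same resolution could live in the processing_chain instead; the trade-off is whether the I/O layer or the DSP configuration owns the inheritance rule.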

@gipert
Member Author

gipert commented Jan 12, 2024

Good point... I guess there is no clean way to do this. I think we need something similar to what I have implemented for the daq2lh5 configuration:

https://legend-daq2lh5.readthedocs.io/en/latest/api/daq2lh5.html#daq2lh5.data_decoder.DataDecoder

i.e.

```json
"wf_bl": {
    "function": "bl_subtract",
    "module": "dspeed.processors",
    "args": ["waveform", "baseline", "wf_bl"],
    "unit": "ADC",
    "waveform_compression": {
        "values": "RadwareSigcompress(codec_shift=-32768)",
        "t0": "gzip"
    },
    "lgdo_attrs": {
        "0vbb": "I want to believe!",
        "hdf5_settings": {
            "compression": "gzip",
            "shuffle": true
        }
    }
}
```

or similar. This extra waveform_compression field would obviously only work for WaveformTable outputs. I'm not sure how to specify extra HDF5 settings for the individual WaveformTable fields, though; happy to hear if you have better ideas.
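A minimal sketch of how a `waveform_compression` entry like the one above could be validated and split per field, assuming that strings naming an h5py compression filter ("gzip", "lzf", "szip") become dataset options and anything else is treated as a waveform codec specification. The function name and the returned layout are hypothetical, not an existing dspeed or lgdo API.

```python
from __future__ import annotations

from typing import Any

# Compression filter names that h5py accepts directly via `compression=`.
H5PY_FILTERS = {"gzip", "lzf", "szip"}
WAVEFORM_TABLE_FIELDS = ("t0", "dt", "values")


def resolve_waveform_compression(
    output_type: str, spec: dict[str, str]
) -> dict[str, dict[str, Any]]:
    """Split a hypothetical "waveform_compression" entry into per-field settings.

    Returns {field: {"hdf5": {...}}} for plain HDF5 filters and
    {field: {"codec": "<codec string>"}} for waveform codecs.
    """
    if output_type != "WaveformTable":
        raise ValueError("waveform_compression only applies to WaveformTable outputs")
    resolved: dict[str, dict[str, Any]] = {}
    for name, setting in spec.items():
        if name not in WAVEFORM_TABLE_FIELDS:
            raise KeyError(f"{name!r} is not a WaveformTable field")
        if setting in H5PY_FILTERS:
            # Plain HDF5 filter: forwarded to the dataset-creation call.
            resolved[name] = {"hdf5": {"compression": setting}}
        else:
            # Anything else is assumed to be a waveform codec specification.
            resolved[name] = {"codec": setting}
    return resolved


print(resolve_waveform_compression(
    "WaveformTable",
    {"values": "RadwareSigcompress(codec_shift=-32768)", "t0": "gzip"},
))
```

Returning a small dict per field leaves room for extra per-field HDF5 options (e.g. `shuffle`) later without changing the shape of the result.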

For development, this is useful to convert waveform codec strings to Python objects: https://legend-pydataobj.readthedocs.io/en/stable/api/lgdo.compression.html#lgdo.compression.utils.str2wfcodec
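In practice str2wfcodec is the function to use. To illustrate what such a conversion involves, here is a minimal stdlib sketch (not the lgdo implementation) that turns a codec string into a class name plus literal keyword arguments without evaluating any code; the result could then be looked up in a registry of codec classes. The function name `parse_codec_string` is hypothetical.

```python
from __future__ import annotations

import ast
from typing import Any


def parse_codec_string(spec: str) -> tuple[str, dict[str, Any]]:
    """Parse e.g. "RadwareSigcompress(codec_shift=-32768)" into (name, kwargs).

    Only a bare name or a call with literal keyword arguments is accepted.
    """
    node = ast.parse(spec, mode="eval").body
    if isinstance(node, ast.Name):  # bare name, no arguments
        return node.id, {}
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name) or node.args:
        raise ValueError(f"unsupported codec specification: {spec!r}")
    kwargs: dict[str, Any] = {}
    for kw in node.keywords:
        if kw.arg is None:  # reject **mapping unpacking
            raise ValueError(f"unsupported codec specification: {spec!r}")
        kwargs[kw.arg] = ast.literal_eval(kw.value)  # literals only, no code execution
    return node.func.id, kwargs


print(parse_codec_string("RadwareSigcompress(codec_shift=-32768)"))
# ('RadwareSigcompress', {'codec_shift': -32768})
```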

@iguinn
Collaborator

iguinn commented Jan 17, 2024

#44 has been accepted; however, it is only a partial solution to this issue, so I am leaving it open.
