
Using custom windows (FLEXWIN/PyFlex) #264

Open

lsawade opened this issue May 7, 2024 · 4 comments

Comments

@lsawade

lsawade commented May 7, 2024

Hi Devs,

I want to quantify the uncertainty of my global CMTs using MTUQ, but to do that, I somehow need to work with the custom windows that I selected in PyFlex for the global long-period case.

Is it possible to implement it?


TL;DR

My global long-period use case is somewhat niche compared to local/regional cases. Windows are fairly long at times but irregular, and I use three different wave categories, each with different windows. I chatted with @thurinj at SSA, and he suggested that it would be best to open a ticket to start discussing this.

So, what have I done so far?

I have already implemented a rough Green's function client for MTUQ that takes in local subsets of the Green's functions computed with my non-merged version of specfem3d_globe.

What do I need?

To truly test MTUQ with my inverted moment tensors, I need to make approximately the same measurements as in my inversion, hence the requirement for custom windows.

What am I willing to do?

If this is at all possible within the MTUQ framework, I'm more than happy to implement it, as this would be a great application for the moment tensors. So, if possible, I'm willing to do the leg work -- under guidance of course!

@thurinj
Member

thurinj commented May 7, 2024

Hi @lsawade, thanks for following up and opening an issue!

It would make total sense to have an additional window category (`custom`?) with completely user-defined time windows (or populated by another software in the processing chain in your case).

I don't foresee specific challenges in implementing this; pretty much everything happens in mtuq/process_data.py, and you can probably start by looking at what happens for each of the existing window_type parameters. I don't think it would clash with anything currently implemented.

I also want to stress that it would be a cool opportunity for the code at large, not just for this specific application: with a custom window type we could use multiple body-wave windows (which might be needed in your case), but we wouldn't have any plug-and-play waveform visualization. There's already a cool example of an application to the S1222a marsquake using a modified version of MTUQ to handle P and S body waves (https://doi.org/10.1029/2023JE007793), but it is unfortunately not currently supported.

We have weight files that are used as input for a lot of tuning parameters (origin time, static time shift, data weight...), but because they are supposed to stay compatible with the legacy code CAPUAF, I recommend not using them. How do you store your time-window information (waveform header, external file)? This might guide how you approach your implementation, but it's better if you can make it general/flexible.

@rmodrak
Member

rmodrak commented May 8, 2024 via email

@lsawade
Author

lsawade commented May 8, 2024

Hi @thurinj, hi @rmodrak

Thanks for the swift replies! It does indeed look simpler than expected, and I'd be happy to implement a custom window type and tests for it.

@thurinj, to answer your question about my setup: my current workflow is pretty ugly. I process observed and synthetic data in up to three different categories (body, surface, mantle), then run PyFlex for each category and simply attach a list of PyFlex window objects to each corresponding observed Trace.stats. Finally, I store the entire stream as a pickle object. It is not ideal in terms of portability, but it reduces the number of files that have to be read and matched.
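For concreteness, here is a minimal sketch of that storage pattern. Plain dictionaries stand in for ObsPy Trace/Trace.stats, and the PyFlex Window objects are represented as simple (start, end) tuples; all names are illustrative, not actual MTUQ or PyFlex API:

```python
import pickle

# Stand-ins for ObsPy Traces: real code would attach the list of pyflex
# Window objects to trace.stats after window selection (names illustrative).
traces = [
    {"id": "II.AAK.00.LHZ", "windows": [(120.0, 480.0), (900.0, 1500.0)]},
    {"id": "IU.ANMO.00.LHZ", "windows": [(130.0, 470.0)]},
]

# Persist the whole "stream" with its attached windows in a single blob,
# so observed data and windows travel together in one file.
blob = pickle.dumps(traces)

# Restoring gives back traces with their windows already matched.
restored = pickle.loads(blob)
assert restored[0]["windows"][0] == (120.0, 480.0)
```

The upside is exactly what is described above: one file, no re-matching of windows to traces; the downside is that pickles are Python- and version-specific, hence the portability concern.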

That being said, I think a workflow that is universal to MTUQ is probably the best way to move forward, and I'm happy to preprocess things accordingly.

I guess what it really boils down to is how to enable selecting multiple windows per trace. The SAC reader hints that a station_id can carry the same trace twice, but I don't know whether this would make things weird when comparing observed and synthetics; I have to assume it does. Otherwise, I could just add multiple traces to the same station, each with separate custom windows (picks of start and end time).

The ugly way that immediately jumped to mind was to check all traces in each category (body, surface, mantle) for the maximum number of windows N, and build one dataset per window index, DS_1 through DS_N, for each category, where DS_1 would contain almost all stations and DS_N barely any. But that does not really make sense.

Another option I thought of is to take the entire trace but taper it to zero outside the windows. The downside is the potential weirdness of the cross-correlation (if the waveforms roughly match, that should be fine). This approach would also keep the waveforms in each category at the same length, albeit a lengthy one, of course...
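The tapering idea can be sketched as building a 0-to-1 mask over the trace samples: 1 inside each window, 0 outside, with cosine ramps at the window edges so the taper doesn't introduce sharp discontinuities. This is a standalone sketch, not MTUQ code; the function name and taper fraction are assumptions:

```python
import math

def window_mask(n, windows, taper_frac=0.1):
    """Build a length-n mask that is 1.0 inside each (i0, i1) sample
    window and 0.0 outside, with cosine ramps over taper_frac of each
    window's length. Overlapping windows take the maximum weight."""
    mask = [0.0] * n
    for i0, i1 in windows:
        ramp = max(1, int(taper_frac * (i1 - i0)))
        for i in range(i0, i1):
            if i < i0 + ramp:            # rising cosine ramp at window start
                w = 0.5 * (1.0 - math.cos(math.pi * (i - i0) / ramp))
            elif i >= i1 - ramp:         # falling cosine ramp at window end
                w = 0.5 * (1.0 - math.cos(math.pi * (i1 - 1 - i) / ramp))
            else:                        # flat interior of the window
                w = 1.0
            mask[i] = max(mask[i], w)
    return mask

# Two windows on a 100-sample trace; multiply this mask into the data
# to zero everything outside the selected windows.
mask = window_mask(100, [(10, 40), (60, 90)])
assert mask[0] == 0.0 and mask[25] == 1.0 and mask[50] == 0.0
```

Multiplying both observed and synthetic traces by the same mask keeps them the same length, at the cost of cross-correlating a lot of zeros, which is the "potential weirdness" mentioned above.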

Either way, I'll wait for your thoughts, @rmodrak!

@thurinj
Member

thurinj commented May 23, 2024

Hi @lsawade,

Coming back to the discussion: the way we process data in MTUQ is to load all three-component data as a Dataset object, which is essentially a fancy list of ObsPy Streams with a few utility methods. We then map the same processing object over the whole dataset, so every trace is treated the same way; that is where you would want something different.

I think the main item, as @rmodrak mentioned, is to get an input of all your start and end times for each trace in your dataset, and iterate through it with a different windowing setting per trace. It could be a JSON/YAML file serving as a dual input alongside window_type in the input parameter file, something like:

```python
ProcessData(
    filter_type='Bandpass',
    freq_min=fmin,
    freq_max=fmax,
    pick_type='taup',
    taup_model=model,
    window_type='custom',
    window_file='windows_param.json',
    capuaf_file=path_weights,
)
```

with a check that window_file is set whenever window_type == 'custom' (we have a couple of these kinds of conditional checks in the process_data.py script anyway).

I imagine you could either have separate files for each waveform group, or a single structured file with different sections per group. The first option would reduce implementation complexity, since you wouldn't need any in-code check for the waveform type. Either way, the input-file approach could be more robust and straightforward than dealing with the SAC header.

And as you loop through the whole dataset during processing, you can directly look up the starttime and endtime for each trace in the associated input file.
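One possible shape for such a windows_param.json, and the per-trace lookup during the processing loop, could look like the following. The file layout (one section per wave group, mapping trace ids to lists of [start, end] offsets in seconds) is purely a suggestion, not an existing MTUQ format:

```python
import json

# Hypothetical windows_param.json content: sections per wave group,
# each mapping a trace id to a list of [start, end] time offsets.
windows_json = """
{
  "body": {
    "II.AAK.00.LHZ": [[120.0, 480.0]],
    "IU.ANMO.00.LHZ": [[130.0, 470.0]]
  },
  "surface": {
    "II.AAK.00.LHZ": [[900.0, 2400.0]]
  }
}
"""

windows = json.loads(windows_json)

def get_windows(group, trace_id):
    """Return the custom windows for one trace in one wave group,
    or an empty list if no window was selected for it."""
    return windows.get(group, {}).get(trace_id, [])

assert get_windows("body", "II.AAK.00.LHZ") == [[120.0, 480.0]]
```

Traces absent from the file simply come back with no windows, which gives a natural way to drop unselected stations while looping over the dataset.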

Let me know if I've missed something!
