
SpeasyVariable does not check VALIDMIN and VALIDMAX #161

Closed
Beforerr opened this issue Nov 13, 2024 · 4 comments · Fixed by #168
Labels
enhancement New feature or request

Comments

@Beforerr
Contributor

Description

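CDF files actually contain metadata (VALIDMIN and VALIDMAX) that define the valid range of the data; values outside it should not be treated as valid (e.g. they should become NaN). It would be nice if this were validated at the library level rather than by the user.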

What I Did

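I have a simple function to validate this:

```python
import numpy as np
from speasy.products.variable import SpeasyVariable


def get_data_and_time(v: SpeasyVariable):
    # Replace fill values by NaN first
    v = v.replace_fillval_by_nan()
    data = v.values
    time = v.time
    # VALIDMIN/VALIDMAX may be a scalar or hold one value per component;
    # np.asarray lets either case broadcast along the last axis of data
    v_valid_min = v.meta.get("VALIDMIN")
    v_valid_max = v.meta.get("VALIDMAX")

    # A time step is kept only if all of its components are in range
    # (NaN compares False, so rows containing NaN are dropped too)
    all_cond_axis = tuple(range(1, data.ndim))
    cond = np.ones(data.shape[0], dtype=bool)
    if v_valid_min is not None:
        cond &= (data >= np.asarray(v_valid_min)).all(axis=all_cond_axis)
    if v_valid_max is not None:
        cond &= (data <= np.asarray(v_valid_max)).all(axis=all_cond_axis)
    return data[cond], time[cond]
```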
jeandet added the enhancement (New feature or request) label on Nov 13, 2024
@jeandet
Member

jeandet commented Nov 13, 2024

@Beforerr, this is definitely something we can integrate into Speasy. To be sure: you want to filter out any invalid data points from a Speasy variable?
If so, we can easily add an `only_valid_values` method with the same interface as `replace_fillval_by_nan`.
The NumPy support of Speasy variables should make this simpler too; I have to check whether Speasy variables behave well with comparisons and NumPy indexing.

jeandet self-assigned this on Nov 13, 2024
jeandet added this to the 1.5 milestone on Nov 13, 2024
@Beforerr
Contributor Author

Yes, it would be a great idea to add a method like this. Also, if I remember correctly, this mainly applies to local CDF files. When getting data from CDAS (I'm not sure about other services), something similar is already done.

@jeandet
Member

jeandet commented Dec 15, 2024

@Beforerr I've prepared a PR (#168) with two new methods: `clamp_with_nan`, which replaces values outside the valid range with NaNs, and `sanitized`, which drops problematic values based on boolean flags.
I also extended NumPy support to allow comparison and indexing, so you can simply write something like [this](https://github.com/jeandet/speasy/blob/valid_min_and_valid_max/speasy/products/variable.py#L694).
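The two behaviours described in this comment (replace out-of-range values with NaN vs. drop bad samples) can be sketched with plain NumPy arrays. This is a minimal illustration under stated assumptions, not the code from #168: `clamp_with_nan_sketch` and `sanitized_sketch` are hypothetical names, and VALIDMIN/VALIDMAX are assumed to be either scalars or one value per component, broadcast along the last axis.

```python
import numpy as np


def clamp_with_nan_sketch(values, valid_min, valid_max):
    """Hypothetical helper: replace out-of-range values with NaN, keeping the shape."""
    values = np.asarray(values, dtype=float)
    lo = np.asarray(valid_min, dtype=float)
    hi = np.asarray(valid_max, dtype=float)
    # NaN compares False on both sides, so existing NaNs are simply kept as NaN
    out_of_range = (values < lo) | (values > hi)
    return np.where(out_of_range, np.nan, values)


def sanitized_sketch(time, values, valid_min, valid_max):
    """Hypothetical helper: drop every time step with a NaN/inf or out-of-range component."""
    values = np.asarray(values, dtype=float)
    clamped = clamp_with_nan_sketch(values, valid_min, valid_max)
    # Reduce over all axes except time (axis 0), as in the function from the issue
    other_axes = tuple(range(1, clamped.ndim))
    keep = np.isfinite(clamped).all(axis=other_axes)
    return np.asarray(time)[keep], values[keep]


time = np.arange(4)
values = np.array([[1.0, 2.0, 3.0],
                   [100.0, 2.0, 3.0],   # first component above VALIDMAX
                   [1.0, np.nan, 3.0],  # already NaN (e.g. a former fill value)
                   [0.0, 0.0, -5.0]])
t, v = sanitized_sketch(time, values, [0, 0, -10], [10, 10, 10])
print(t)  # [0 3]
```

Clamping keeps the time axis intact, which suits plotting; dropping shortens the series, which suits computations that cannot handle NaN.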

@Beforerr
Contributor Author

Hi @jeandet, thanks for the update! The `sanitized` method and the extended NumPy support look really useful. I appreciate the effort you put into the PR.
