Item creation failures #17
Comments
Thank you for the report. I'm very sorry about the issues, which did not appear in the limited set of examples that I had access to. I've now included all the example files in the test suite, fixed the issues as described below, and will commit to GitHub shortly. Please see below what the (temporary) solutions look like. We should contact NOAA for clarification. Would you do that or shall I?

KeyError: 'flash_frame_time_offset_of_first_event'
Sometimes the frame times seem to be missing. For now, I leave them out of the geoparquet if they are missing. Still, we should ask NOAA whether that is intentional.

IndexError: index exceeds dimension bounds
This means that for some variables the reported count (e.g. in event_count) does not reflect the number of rows in the variables (e.g. event_id). For now, I use the count value by default, but if it doesn't match I fall back to the number of rows that are actually available and raise a warning. Still, we should ask NOAA whether this is intentional and what we should do in such a case.

TypeError: unsupported type for timedelta seconds component: NoneType
It seems that a None/NaN/null value is reported for some time offsets. For now, I simply write "null" to the geoparquet file, too. I'm also adding a short description to the column so that users are aware. Still, we should ask NOAA whether that is intentional and/or has any implications.

KeyError: 'GOES_Test'
Some files report GOES_Test instead of GOES_West/East. The example file is from before the satellite had "finished drifting". Your example is from October 2018. I'm not sure how to handle these files, as the geometry will likely not be the geometry that we have pre-defined for the final position of both satellites. I've added some fallback code, but it only applies after the drifting period, so I'm not sure how to handle the files from before that. All code right now relies on the final positions. Something to ask NOAA about?

TypeError: only integer scalar arrays can be converted to a scalar index
This happens when no events/flashes occurred in the timeframe. Will be fixed. |
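For readers following along, the two temporary workarounds described above (falling back to the available row count, and passing missing time offsets through as null) could look roughly like the sketch below. This is not the package's actual code; the helper names `resolve_count` and `offset_to_time` are hypothetical and only illustrate the idea.

```python
import warnings
from datetime import datetime, timedelta, timezone
from typing import Optional


def resolve_count(reported: int, available: int, name: str = "event") -> int:
    """Use the reported count unless it disagrees with the rows actually present."""
    if reported != available:
        # Hypothetical warning text; the real package may word this differently.
        warnings.warn(
            f"{name}_count reports {reported} rows but {available} are available; "
            f"using {available}"
        )
        return available
    return reported


def offset_to_time(base: datetime, offset_seconds: Optional[float]) -> Optional[datetime]:
    """Return base + offset, or None when the offset is missing (None or NaN)."""
    # NaN is the only float that is not equal to itself.
    if offset_seconds is None or offset_seconds != offset_seconds:
        return None
    return base + timedelta(seconds=offset_seconds)
```

Returning None instead of raising keeps the row in the output, so the missing offset ends up as a null value in the written column.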
Thanks for the quick updates! Yeah, we definitely could have supplied more examples. But this is pretty standard. Odds and ends turn up once the full dataset starts to be processed. I'm good with the changes. I agree that the items from when the satellite was drifting (GOES_Test) are an issue. Perhaps a warning about potentially bad geometry in the Item description? Something to discuss with @gadomski later today, along with potential contact with NOAA. |
Shall I re-release or do we wait for the NOAA feedback? |
Let's wait until we get fixes for all the above issues, then release. If @pjhartzell needs to use this package with only some of the fixes, he can pin to a commit. |
@gadomski found some docs relating to the first error: KeyError: 'flash_frame_time_offset_of_first_event'. Evidently there was a version change (the version is not indicated anywhere) that added the three variables containing
The table above this snippet in the doc lists the names of the variables that were added in version DO.07.00.00.
|
Thanks a lot, yeah I will make a small change and make the tests a bit more precise! |
I've just made the changes. Any news from NOAA regarding the other points? @gadomski |
Also in the doc that @gadomski found are warnings about missing We might need to check each variable that is supposed to have an |
I haven't found anything for the IndexError: index exceeds dimension bounds error. I think your method of overriding the provided |
Ah... interesting. I checked the values with Panoply, assumed that what I got there was correct, and based my fix on it. It seems that was wrong, because the file is effectively defective due to the missing unsigned indicator. I'll look into this. I'm wondering whether it should be our responsibility to fix defective files? Users who read the netCDF files with "normal software" like Panoply will also get NaN values, and as such may get different results depending on whether they use the "fixed" geoparquet or the "defective" netCDF files. That's somewhat odd. (I assume normal users would not read these guide books / PDFs from NOAA.) |
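To illustrate what a missing unsigned indicator means in practice: the stored bits are read as a signed integer, so large values show up as negative numbers. A minimal sketch of reinterpreting such a value is below. The helper `as_unsigned` is hypothetical, and the 16-bit default is an assumption for illustration; the actual width has to be checked per variable.

```python
def as_unsigned(value: int, bits: int = 16) -> int:
    """Reinterpret a two's-complement signed integer as unsigned (same bit pattern).

    The 16-bit default is an assumption; pass the real storage width of the variable.
    """
    return value % (1 << bits)
```

For example, `as_unsigned(-1)` gives 65535, while non-negative values such as `as_unsigned(5)` are returned unchanged.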
Yeah, I agree that this is a hidden bug and most users will struggle to find it and fix it. @gadomski, your thoughts on repairing the netCDF files? I hate to modify source files, but it definitely creates a disconnect between the corrected data in the parquet files and the defective source file. |
In principle, we could simply provide an option for this. See #18 for details. |
So going through the issues again, I think we were able to solve them somewhat and there are two that could be sent to NOAA for clarification. Thanks for the help Pete and Preston.
|
Agreed that we can still follow up with NOAA about the index exceeding dimension bounds. Some thoughts on the drift period, the "GOES-Test
|
Thanks for verifying with the data holdings that the fallback code is not needed. I removed/changed it to use the platform for identifying the actual slot in case it is GOES-Test. So this issue is basically finished, except that we could still ask NOAA about the remaining questions, right? |
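A rough sketch of the "use the platform when the slot is GOES-Test" idea follows. It is not the package's implementation: the function name `resolve_slot`, the slot strings, and the platform-to-slot mapping are assumptions for illustration and would need to be checked against the actual files.

```python
# Assumed mapping for illustration only; verify against the data holdings.
PLATFORM_TO_SLOT = {"G16": "GOES-East", "G17": "GOES-West"}


def resolve_slot(orbital_slot: str, platform_id: str) -> str:
    """Return the reported slot, or derive it from the platform for GOES-Test files."""
    if orbital_slot != "GOES-Test":
        return orbital_slot
    try:
        return PLATFORM_TO_SLOT[platform_id]
    except KeyError:
        raise ValueError(f"No known slot for platform {platform_id!r}") from None
```

Note that this only picks a slot name; it does not address whether the pre-defined geometry is valid for files from the drift period.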
I believe the only remaining question is the "IndexError: index exceeds dimension bounds" question. I'll tag @gadomski on whether that's worth following up with NOAA. |
I opened a PR with some edits on how I think the Test orbital slot should be handled. Let me know your thoughts. I'll review the 0.2.0 release PR once we finish this up. |
Opened a new issue for the problem that reported counts do not reflect the number of rows in the variables: #22 |
Seeing some item creation failures. I randomly sampled the Planetary Computer's holdings for 1000 NetCDFs and cataloged the error tracebacks in the attached file: log_summary.txt
It looks like there are some inconsistencies in the NetCDF content in up to 10% of the sampled files. I've bundled the example NetCDF files listed in the log_summary in the attached zip file: example_files.zip. The NetCDF files are also publicly available.
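For anyone wanting to reproduce this kind of summary, a minimal sketch of cataloging failures by error type is below. It is not the script that produced log_summary.txt; `summarize_failures` is a hypothetical helper, and `create_item` stands in for whatever item-creation function is being exercised.

```python
from collections import Counter
from typing import Callable, Iterable


def summarize_failures(paths: Iterable[str], create_item: Callable[[str], object]) -> Counter:
    """Run create_item on each path and count failures by exception type and message."""
    summary: Counter = Counter()
    for path in paths:
        try:
            create_item(path)
        except Exception as exc:  # deliberately broad: the goal is to catalog every failure
            summary[f"{type(exc).__name__}: {exc}"] += 1
    return summary
```

Calling `summary.most_common()` on the result lists the most frequent error signatures first.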