Current carbon workflows in NEONiso are broken on current DP4.00200.001 files #97

Open
rfiorella opened this issue Apr 7, 2024 · 8 comments

Comments

@rfiorella
Collaborator

Describe the bug
A clear and concise description of what the bug is.

To Reproduce

  1. Download DP4.00200.001 files processed in 2024.
  2. Run calibrate_carbon on all files from a site (e.g., BART).
  3. The run fails with the error "There are no data meeting the criteria level dp01, averaging interval 9, and variables isoCo2.", inherited from neonUtilities::stackEddy. I think this happens for one or both of two reasons: a) these variables are not consistent throughout the time series in the currently released files, and b) on manual inspection, many of the isoCo2 variables are null vectors in the data files, so the data are no longer being written to the h5 files by the NEON flow.stor.towr workflow.

@cflorian1 @NDurden @ddurden any ideas what might have happened here? Anything I can do to help out?

@NDurden
Contributor

NDurden commented Apr 8, 2024

@rfiorella, in the latest release:

  • Averaging Interval Change: The isoCo2 data are now provided with averaging intervals of 6 and 30 minutes. When stacking these data with neonUtilities::stackEddy, set the averaging interval to 6 minutes instead of the previous 9 minutes.
  • We also applied your isotope calibration in this release. This includes:
    Expanded file:
    - Contains the calibrated delta13CCo2 and rtioMoleDryCo2 from both methods (Bowling and linear regression).
    - The mean values of delta13CCo2 and rtioMoleDryCo2 are the means of the values calibrated with the Bowling method.
    - The raw mean values of delta13CCo2 and rtioMoleDryCo2 are now called "meanRaw".
    - Contains the calibration parameters (i.e., slope and offset) from both methods, located in the dp01/data/isoCo2/calData folder.
    Basic file:
    - Contains only calibrated data from the Bowling method, including mean values based on this method and the calibration parameters from the Bowling method.
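
Files processed before this change would still carry the 9-minute label, so a caller may need to decide per file which averaging interval to request. Below is a minimal Python sketch of that decision, assuming the interval appears as a two-digit suffix such as `_06m` or `_09m` on the group names under dp01/data/isoCo2. The example group names and both helper functions are hypothetical illustrations, not part of neonUtilities or NEONiso.

```python
import re

# Matches a trailing averaging-interval label such as "_09m" or "_06m".
# The suffix format is an assumption based on the labels named in this thread.
AVG_LABEL = re.compile(r"_(\d{2})m$")


def averaging_intervals(group_names):
    """Return the set of averaging intervals (in minutes) found in group names."""
    found = set()
    for name in group_names:
        match = AVG_LABEL.search(name)
        if match:
            found.add(int(match.group(1)))
    return found


def pick_isoco2_interval(group_names):
    """Prefer the new 6-minute label and fall back to the legacy 9-minute one."""
    found = averaging_intervals(group_names)
    for minutes in (6, 9):
        if minutes in found:
            return minutes
    return None  # neither label is present in this file


# Hypothetical group names standing in for one legacy and one reprocessed file.
legacy_file = ["co2Low_09m", "000_010_30m", "calData"]
reprocessed_file = ["co2Low_06m", "000_010_30m", "calData"]
print(pick_isoco2_interval(legacy_file))       # 9
print(pick_isoco2_interval(reprocessed_file))  # 6
```

The returned value could then be passed as the averaging interval when stacking that file, rather than hard-coding 9 or 6 for a whole site.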

I hope this helps. Please let me know if you have any further questions or encounter issues.

@rfiorella
Collaborator Author

Thanks @NDurden, that is very helpful information! (And apologies that I forgot some of this; I now remember that we discussed some of it last year.)

The first issue has been covered (#82) in the NEONiso 0.7.0 release - so I think that issue has been resolved!

I am a bit surprised that the changes you described for the expanded and basic files are causing issues here. I suspect the problem is related to some HDF5 groups that have null vectors/data frames in the new files (I have only checked the basic files, not the expanded files yet). I'll attach some screenshots shortly that show what I mean.

@cklunch would probably know for sure whether this would throw an error, but I think stacking these files at the 'isoCo2' level fails when any file in the set has a null group for a variable selected through 'var' in stackEddy. Maybe there's a workaround that I'm not aware of, but I'm wondering whether each monthly file would need a single record whose dates cover some period in that month, with the rest of the columns set to NA, so that the files can be stacked.
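
The placeholder idea above could look roughly like the following Python sketch: one record per month with only the time bounds filled in and every other column missing. The function name and the example column list are hypothetical, `None` stands in for R's NA, and spanning the whole month is just one possible choice of period.

```python
import calendar
from datetime import datetime, timezone


def placeholder_row(columns, year, month):
    """Build one all-missing record whose time bounds span the given month (UTC)."""
    last_day = calendar.monthrange(year, month)[1]
    row = {name: None for name in columns}  # None plays the role of NA
    row["timeBgn"] = datetime(year, month, 1, tzinfo=timezone.utc)
    row["timeEnd"] = datetime(year, month, last_day, 23, 59, 59, tzinfo=timezone.utc)
    return row


# Hypothetical column set for one isoCo2 variable.
columns = ["timeBgn", "timeEnd", "mean", "min", "max", "vari", "numSamp"]
row = placeholder_row(columns, 2023, 7)
print(row["timeBgn"].isoformat(), row["timeEnd"].isoformat(), row["mean"])
```

A row like this keeps the column layout identical across months, so an empty month would contribute one all-missing record instead of a null group.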

@rfiorella rfiorella changed the title Current carbon workflows in NEONiso are horribly broken on current DP4.00200.001 files Current carbon workflows in NEONiso are broken on current DP4.00200.001 files Apr 8, 2024
@cklunch

cklunch commented Apr 10, 2024

@rfiorella One other piece of relevant intel - we're still working on getting all the provisional files re-processed, so there are still some months of data with the 9-min label. @cflorian1 knows the schedule for the re-processing better than I do.

I'll take a look in the next couple of days, but I suspect the problem with the variable names is the same as (part of) the problem with the new averaging intervals. Currently stackEddy() is erroring out if it encounters a single file that doesn't match the var or avg requested. It was never a problem before because the files were so consistent, but now it needs an update. I'm planning to update in the next version so it skips the mismatched file and moves on to the next.
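
The planned behavior, skipping a mismatched file and moving on, can be sketched as follows. This is a minimal Python illustration of the control flow only, not the neonUtilities implementation: the dictionary layout standing in for the HDF5 files, the file names, and `stack_matching` itself are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StackResult:
    rows: list = field(default_factory=list)     # stacked records from matching files
    skipped: list = field(default_factory=list)  # files with no data for the request


def stack_matching(files, var, avg):
    """Stack rows from every file that has data for (var, avg) and skip the rest.

    `files` maps a file name to {(variable, averaging_minutes): [rows]}.
    """
    result = StackResult()
    for name, groups in files.items():
        rows = groups.get((var, avg))
        if not rows:
            # Missing or empty group: note the file and continue,
            # instead of raising on the first mismatch.
            result.skipped.append(name)
            continue
        result.rows.extend(rows)
    if not result.rows:
        # Fail only when nothing at all matched the request.
        raise ValueError(f"no data for variable {var!r} at {avg} min in any file")
    return result


files = {
    "BART.2023-06.h5": {("isoCo2", 6): [{"timeBgn": "2023-06-01T00:00:00Z"}]},
    "BART.2023-07.h5": {("isoCo2", 9): [{"timeBgn": "2023-07-01T00:00:00Z"}]},  # legacy label
    "BART.2023-10.h5": {("isoCo2", 6): []},  # group present but empty
}
result = stack_matching(files, var="isoCo2", avg=6)
print(len(result.rows), result.skipped)
# 1 ['BART.2023-07.h5', 'BART.2023-10.h5']
```

Returning the skipped file names alongside the stacked rows lets the caller see which months were left out rather than having them disappear silently.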

@cflorian1

@rfiorella after reading through this I agree with @cklunch. This is very likely due to the few months of provisional data (July through September 2023) that didn't get reprocessed with the latest code and still have the 09m file structure naming convention. I'm not the best person to explain this, but the reason these haven't been reprocessed yet is that we migrated fully to GCP and we don't have a processing environment with the L0p files already generated to feed into the ECSE L1-4 code. To avoid rerunning everything (both ECSE and ECTE, which is resource intensive), we would need to update the DAG to split the processes. The alternative, and the currently preferred solution, is to use the L0p files that are in the neon-sae-files bucket as inputs. This requires changing the input file location for the L1-4 DAG and changing some of the Airflow triggers. We just settled on this solution yesterday, and as far as I know it is currently in the works. You aren't the only one having trouble with the inconsistent naming convention, so I'm trying to get this resolved as soon as possible.

@rfiorella
Collaborator Author

@cflorian1 @cklunch great, thank you both for the explanation! Let me know when you think you have the reprocessing finished and I'll try again and hopefully be able to close this issue.

@cklunch

cklunch commented May 1, 2024

@rfiorella neonUtilities 2.4.2 was just released on CRAN, with the stackEddy() fix for handling mismatched files. So that half of the problem is fixed, and @cflorian1 can update you when the reprocessing is done.

@cflorian1

@rfiorella, reprocessing is complete. All DP4.00200.001 files should now have the same 06m naming convention for isoCo2, and we have successfully tested stacking of the most recent files.

@rfiorella
Collaborator Author

rfiorella commented May 23, 2024 via email
