Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove cdscan for e3sm_diags #598

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

Remove cdscan for e3sm_diags #598

wants to merge 2 commits into from

Conversation

forsyth2
Copy link
Collaborator

Resolves #346. Resolves #80. Follow-up to #519.

@forsyth2 forsyth2 self-assigned this May 28, 2024
@@ -101,19 +101,9 @@ create_links_ts()
YYYY=`printf "%04d" ${year}`
for file in ${ts_dir_source}/${v}_${YYYY}*.nc
do
# Add this time series file to the list of files for cdscan to use
echo ${file} >> ${v}_files.txt
cp ${file} ${ts_dir_destination}/${v}_${YYYY}*.nc
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose ${ts_dir_destination}/ isn't strictly necessary, since we have cd ${ts_dir_destination} above.

@forsyth2
Copy link
Collaborator Author

@tomvothecoder Re: #519 (comment) -- this PR includes the changes that couldn't be included in #519 since the e3sm_diags refactor isn't finished yet. My initial testing suggests this PR's implementation will be sufficient to fully remove CDAT dependencies from zppy. But we'll have to wait to merge it until after the e3sm_diags refactor is done.

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Aug 5, 2024

@tomvothecoder I created a new "min case" cfg for testing sets that have already been migrated. They appear to generate alright using E3SM Diags built off the latest CDAT branch: https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_www/issue-346-diags-20240802/v3.LR.historical_0051/e3sm_diags/

@forsyth2
Copy link
Collaborator Author

I ran zppy -c tests/integration/generated/test_min_case_e3sm_diags_cdat_migrated_sets_chrysalis.cfg.

The cfg
[default]
case = "v3.LR.historical_0051"
constraint = ""
dry_run = "False"
environment_commands = ""
input = /lcrc/group/e3sm2/ac.wlin/E3SMv3/v3.LR.historical_0051
input_subdir = archive/atm/hist
mapping_file = "map_ne30pg2_to_cmip6_180x360_aave.20200201.nc"
output = "/lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051"
partition = "debug"
qos = "regular"
www = "/lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_www/test-diags-no-cdat-20240917"
years = "1985:1989:2",

[climo]
active = True
walltime = "00:30:00"

  [[ atm_monthly_180x360_aave ]]
  frequency = "monthly"
  input_files = "eam.h0"
  input_subdir = "archive/atm/hist"
  vars = ""

  [[ atm_monthly_diurnal_8xdaily_180x360_aave ]]
  frequency = "diurnal_8xdaily"
  input_files = "eam.h3"
  input_subdir = "archive/atm/hist"
  vars = "PRECT"

  [[ land_monthly_climo ]]
  frequency = "monthly"
  input_files = "elm.h0"
  input_subdir = "archive/lnd/hist"
  mapping_file = "map_r05_to_cmip6_180x360_aave.20231110.nc"
  vars = ""

[ts]
active = True
e3sm_to_cmip_environment_commands = ""
walltime = "00:30:00"

  [[ atm_monthly_180x360_aave ]]
  frequency = "monthly"
  input_files = "eam.h0"
  input_subdir = "archive/atm/hist"
  ts_fmt = "cmip"

  [[ rof_monthly ]]
  extra_vars = 'areatotal2'
  frequency = "monthly"
  input_files = "mosart.h0"
  input_subdir = "archive/rof/hist"
  mapping_file = ""
  vars = "RIVER_DISCHARGE_OVER_LAND_LIQ"

  [[ land_monthly ]]
  extra_vars = "landfrac"
  frequency = "monthly"
  input_files = "elm.h0"
  input_subdir = "archive/lnd/hist"
  mapping_file = "map_r05_to_cmip6_180x360_aave.20231110.nc"
  ts_fmt = "cmip"
  vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILICE,SOILLIQ,SOILWATER_10CM,TSA,TSOI,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"

[tc_analysis]
active = True
scratch = "/lcrc/globalscratch/ac.forsyth2/zppy_weekly_comprehensive_v3_scratch/test-diags-no-cdat-20240917/v3.LR.historical_0051"
walltime = "00:30:00"

[e3sm_diags]
active = True
environment_commands = "source /home/ac.forsyth2/miniconda3/etc/profile.d/conda.sh; conda activate e3sm_diags_cdat_branch_20240916"
grid = '180x360_aave'
multiprocessing = True
num_workers = 8
partition = "compute"
qos = "regular"
ref_final_yr = 1986
ref_start_yr = 1985
ref_years = "1985-1986",
# Also lat_lon_land for mvm
#sets="lat_lon","lat_lon_river","cosp_histogram","zonal_mean_xy","zonal_mean_2d","zonal_mean_2d_stratosphere","aerosol_aeronet","polar","meridional_mean_2d","area_mean_time_series","aerosol_budget","annual_cycle_zonal_mean","diurnal_cycle","tc_analysis","enso_diags", "streamflow"
sets="lat_lon","tc_analysis","enso_diags","streamflow",
short_name = "v3.LR.historical_0051"
ts_num_years = 2
walltime = "5:00:00"
years = "1987:1989:2"

  [[ atm_monthly_180x360_aave ]]
  climo_diurnal_frequency = "diurnal_8xdaily"
  climo_diurnal_subsection = "atm_monthly_diurnal_8xdaily_180x360_aave"
  climo_subsection = "atm_monthly_180x360_aave"
  dc_obs_climo = '/lcrc/group/e3sm/public_html/e3sm_diags_test_data/unit_test_complete_run/obs/climatology'

  [[ atm_monthly_180x360_aave_mvm ]]
  # Test model-vs-model using the same files as the reference
  climo_diurnal_frequency = "diurnal_8xdaily"
  climo_diurnal_subsection = "atm_monthly_diurnal_8xdaily_180x360_aave"
  climo_subsection = "atm_monthly_180x360_aave"
  diff_title = "Difference"
  partition = "compute"
  qos = "regular"
  ref_name = "v3.LR.historical_0051"
  reference_data_path = "/lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/clim"
  run_type = "model_vs_model"
  short_ref_name = "same simulation"
  swap_test_ref = False
  tag = "model_vs_model"
  ts_num_years_ref = 2
  ts_subsection = "atm_monthly_180x360_aave"

#   [[ lnd_monthly_mvm_lnd ]]
#  # Test model-vs-model using the same files as the reference
#   climo_subsection = "land_monthly_climo"
#   diff_title = "Difference"
#   partition = "compute"
#   qos = "regular"
#   ref_name = "v3.LR.historical_0051"
#   reference_data_path = "/lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/lnd/180x360_aave/clim"
#   run_type = "model_vs_model"
#   sets = "lat_lon_land",
#   short_ref_name = "same simulation"
#   swap_test_ref = False
#   tag = "model_vs_model"
#   ts_num_years_ref = 2

I specifically set:

sets="lat_lon","tc_analysis","enso_diags","streamflow",

to test the latest sets added on https://github.com/E3SM-Project/e3sm_diags/commits/cdat-migration-fy24 (apparently missing qbo as a recent set). [Note I still included lat_lon as a set since I recall E3SM Diags sometimes running into problems if that set isn't run first.]

These three sets unfortunately ran into some errors. https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_www/test-diags-no-cdat-20240917/v3.LR.historical_0051/e3sm_diags/atm_monthly_180x360_aave/model_vs_obs_1987-1988/viewer/ only shows lat_lon. Now, by #625, apparently tc_analysis wasn't even showing up correctly on main. But that still means enso_diags and streamflow ran into some issues on the migrated branch.

$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/scripts

$ grep -v "OK" *status
# No outright errors

$ grep -in tc_analysis e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.o585928
$ grep -in enso_diags e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.o585928
$ grep -in streamflow e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.o585928

tc_analysis:

IndexError: list index out of range

This error matches the one on main, as noted in #625

enso_diags:

OSError: No files found for target variable TAUX or derived variables ([('tauu',), ('surf_mom_flux_\
U',)]) in ts.
OSError: No files found for target variable LHFLX or derived variables ([('hfls',), ('QFLX',), ('su\
rface_upward_latent_heat_flux',)]) in ts.
OSError: No files found for target variable SHFLX or derived variables ([('hfss',), ('surf_sens_flu\
x',)]) in ts
OSError: No files found for target variable LHFLX or derived variables ([('hfls',), ('QFLX',), ('su\
rface_upward_latent_heat_flux',)]) in ts.
OSError: No files found for target variable SHFLX or derived variables ([('hfss',), ('surf_sens_flu\
x',)]) in ts.
OSError: No files found for target variable NET_FLUX_SRF or derived variables ([('FSNS', 'FLNS', 'Q\
FLX', 'PRECC', 'PRECL', 'PRECSC', 'PRECSL', 'SHFLX'), ('FSNS', 'FLNS', 'LHFLX', 'SHFLX'), ('FSNS', \
'FLNS', 'QFLX', 'SHFLX'), ('rsds', 'rsus', 'rlds', 'rlus', 'hfls', 'hfss')]) in ts.
OSError: No files found for target variable PRECT or derived variables ([('PRECT',), ('pr',), ('PRE\
CC', 'PRECL'), ('sat_gauge_precip',), ('PrecipLiqSurfMassFlux', 'PrecipIceSurfMassFlux')]) in ts.
OSError: No files found for target variable TAUX or derived variables ([('tauu',), ('surf_mom_flux_\
U',)]) in ts.
OSError: No files found for target variable TAUY or derived variables ([('tauv',), ('surf_mom_flux_\
V',)]) in ts.
OSError: No files found for target variable FSNS or derived variables ([('sfc_net_sw_all_mon',), ('\
rsds', 'rsus')]) in ts.
OSError: No files found for target variable NET_FLUX_SRF or derived variables ([('FSNS', 'FLNS', 'Q\
FLX', 'PRECC', 'PRECL', 'PRECSC', 'PRECSL', 'SHFLX'), ('FSNS', 'FLNS', 'LHFLX', 'SHFLX'), ('FSNS', \
'FLNS', 'QFLX', 'SHFLX'), ('rsds', 'rsus', 'rlds', 'rlus', 'hfls', 'hfss')]) in ts.

streamflow:

OSError: No time series `.nc` file was found for 'RIVER_DISCHARGE_OVER_LAND_LIQ' in 'rof'

The enso_diags and streamflow errors seem to suggest missing data, but this confuses me because the comprehensive_v3 test uses the same simulation output as its data source...

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Oct 1, 2024

@chengzhuzhang @tomvothecoder

https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_www/test-diags-no-cdat-20240917/v3.LR.historical_0051/e3sm_diags/atm_monthly_180x360_aave/model_vs_obs_1987-1988/viewer/ shows only lat_lon despite tests/integration/template_min_case_e3sm_diags_cdat_migrated_sets.cfg having sets="lat_lon","tc_analysis","enso_diags","streamflow",.

The relevant output directory can be seen here, and is also mentioned in a code block above.

$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/scripts

$ grep -v "OK" *status
# No obvious failures

# Possibly useful commands:
$ grep -in error e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.o585928
$ grep -in tc_analysis e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.o585928
$ grep -in enso_diags e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.o585928
$ grep -in streamflow e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.o585928

Note https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_www/test-diags-no-cdat-20240917/v3.LR.historical_0051/e3sm_diags/atm_monthly_180x360_aave_mvm/model_vs_model_1987-1988/viewer/ also shows only lat_lon in the final viewer.

@chengzhuzhang
Copy link
Collaborator

chengzhuzhang commented Oct 1, 2024

@forsyth2 I'm looking at the log files (as below), it appears that time series files are not correctly feed into e3sm_diags, which means the code change that removes cdscan step is not working with e3sm_diags s new code base correctly.

>cp: cannot create regular file 'ts/FSNTOA_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/FLUT_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/FSNT_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/FLNT_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/FSNS_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/FLNS_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/SHFLX_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/QFLX_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/TAUX_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/TAUY_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/PRECC_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/PRECL_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/PRECSC_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/PRECSL_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/TS_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/TREFHT_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/CLDTOT_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/CLDHGH_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/CLDMED_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/CLDLOW_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/U_1987*.nc': No such file or directory
cp: target 'RIVER_DISCHARGE_OVER_LAND_LIQ_*.nc' is not a directory

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Oct 1, 2024

Thanks @chengzhuzhang. I haven't had problems with previous added sets though. I'll try to debug this further to see if it's something with how this PR handles these sets in particular...

@chengzhuzhang
Copy link
Collaborator

chengzhuzhang commented Oct 1, 2024

@tomvothecoder we will need to coordinate on this issue with e3sm_diags refactoring.

The original cdscan codes basically subset and extract the metadata from input file for the specified time period and generate an xml file with file name as T_startTime_endTime.xml which cdms2 is able to read and process.

In the new e3sm_diags code, for time-series, we use xarray.open_dataset, and the input needs to be one single .nc file. I think we may need to update to xarray.open_mfdataset (https://github.com/E3SM-Project/e3sm_diags/blob/7b53932d72ef7782818a5bd1e4e9d527420fec49/e3sm_diags/driver/utils/dataset_xr.py#L1035)which allows multiple files, and also to support subset time based on start and end years (i think the later is already supported by https://github.com/E3SM-Project/e3sm_diags/blob/7b53932d72ef7782818a5bd1e4e9d527420fec49/e3sm_diags/driver/utils/dataset_xr.py#L1185 ?).

@chengzhuzhang
Copy link
Collaborator

Thanks @chengzhuzhang. I haven't had problems with previous added sets though. I'll try to debug this further to see if it's something with how this PR handles these sets in particular...

Could you clarify "the previous added sets", i think the issue only relevant for time-series based diagnostics sets. For sets depends on climo datasets, they should be fine.

@chengzhuzhang
Copy link
Collaborator

Thanks @chengzhuzhang. I haven't had problems with previous added sets though. I'll try to debug this further to see if it's something with how this PR handles these sets in particular...

Could you clarify "the previous added sets", i think the issue only relevant for time-series based diagnostics sets. For sets depends on climo datasets, they should be fine.

@forsyth2 I saw the test results from Aug 6, yes by then those sets only rely on climo files.

@chengzhuzhang chengzhuzhang reopened this Oct 1, 2024
@forsyth2
Copy link
Collaborator Author

forsyth2 commented Oct 1, 2024

@chengzhuzhang Does this mean the bug is for certain in the Diags code or should I dive deeper into the zppy changes in this PR?

@chengzhuzhang
Copy link
Collaborator

@chengzhuzhang Does this mean the bug is for certain in the Diags code or should I dive deeper into the zppy changes in this PR?

It is not a bug in diags. We can also add the time selecting and slicing code in zppy to be able to remove cdscan. We should have a discussion with @tomvothecoder on how best to approach this.

@tomvothecoder
Copy link
Collaborator

tomvothecoder commented Oct 1, 2024

@chengzhuzhang Does this mean the bug is for certain in the Diags code or should I dive deeper into the zppy changes in this PR?

It is not a bug in diags.

To summarize this issue, based on what I'm reading:

  • lat_lon works fine because the refactored e3sm_diags uses xc.open_mfdataset() for climatology datasets (link), which supports directories or globs for .nc files (like how zppy specifies here)
  • enso_diags and streamflow fail because they use time series datasets (via a directory). The refactored e3sm_diags uses xc.open_dataset() for time series files (link).
  • tc_analysis seems to have been failing on main and dev branch for another reason

I will open a PR to update e3sm_diags to use xc.open_mfdataset() for time series files (here).


also to support subset time based on start and end years (i think the later is already supported by E3SM-Project/e3sm_diags@7b53932/e3sm_diags/driver/utils/dataset_xr.py#L1185 ?).
We can also add the time selecting and slicing code in zppy to be able to remove cdscan. We should have a discussion with @tomvothecoder on how best to approach this.

The time slicing and subsetting code you listed parses the filepath to get the start and end years. I'm not sure if this currently works with a directory specified as an input path.

@tomvothecoder
Copy link
Collaborator

@forsyth2 Can you provide the input paths for the test data? I am trying to reproduce your errors by running prov/e3sm.py but the test_ts is a local reference.

@tomvothecoder
Copy link
Collaborator

tomvothecoder commented Oct 1, 2024

Note, opening multiple time series files is actually supported already if deriving from other variables.

If we are trying to open up multiple datasets for the same variable (e.g., split up by months or something) without deriving, then we need to add it in #861.

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Oct 1, 2024

@tomvothecoder

Can you provide the input paths for the test data?

Everything is specified in the cfg I have copied in #598 (comment). Specifically:

input = /lcrc/group/e3sm2/ac.wlin/E3SMv3/v3.LR.historical_0051
output = "/lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051"

For the test_ts directory specifically, I believe that would be copied from:

/lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Oct 16, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace CDAT Explore replacement of cdscan
3 participants