Update ESM1.5 driver file naming. #93
There are a few things here:
A few changes can be made to simplify the code & touch less of the existing interface. The
um2nc-standalone/umpost/conversion_driver_esm1p5.py
Lines 343 to 346 in 2c5fcea
I'd recommend the following changes:
Do you want to see if those changes help simplify the naming feature & tests?
Ah, there's a video about software architecture called "Boundaries", also known as "Functional Core, Imperative Shell" (FCIS). This is the sort of approach guiding
Sweet, I've had a go at putting in these changes.
Oops, sorry about that! This should be fixed in aff66bd, and hopefully the changes are a bit easier to read. I've moved the
I have a couple of extra questions/comments, which I've added over the code.
I've listed a few first-round fixes (somehow they got bundled into a review instead of individual comments).
The biggest current problem is the failing CI; once that passes, we can go over the other steps.
Adding a review in the event comments are blocked without one.
This should be good to go!
@marc-white, I'm just pinging you for a second pair of eyes to look over the changes. Any suggestions or feedback you have are welcome!
stem = fields_file_name[0:FF_UNIT_INDEX + 1]
unit = fields_file_name[FF_UNIT_INDEX]
This suffices for this use case, but you may want to consider adding a TODO to switch over to regular expressions. They're more flexible (especially if this conversion ever needs to be done for other models/filename patterns), and I think they're also a bit clearer as to what you're doing (i.e., it would be more explicit what you're pulling out of the filename pattern and from where).
Also forgot to mention that REs would give immediate feedback on whether or not fields_file_name conforms to the expected pattern.
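A minimal sketch of what a regex-based split might look like. The pattern and the function name split_fields_file_name are hypothetical: the pattern is inferred only from the single example name aiihca.paa1jan mentioned in this PR, and is not the actual ESM1.5 naming rule.

```python
import re

# Hypothetical pattern (an assumption, inferred from "aiihca.paa1jan"):
#   stem = "<run id>.p<unit>", followed by the remaining characters.
FF_NAME_PATTERN = re.compile(
    r"(?P<stem>[a-z0-9]+\.p(?P<unit>[a-z0-9]))(?P<rest>[a-z0-9]+)"
)


def split_fields_file_name(fields_file_name):
    """Return (stem, unit), or raise ValueError if the name does not match."""
    match = FF_NAME_PATTERN.fullmatch(fields_file_name)
    if match is None:
        # Immediate feedback when the name does not follow the pattern.
        raise ValueError(f"Unexpected fields file name: {fields_file_name!r}")
    return match.group("stem"), match.group("unit")
```

With named groups it is explicit which part of the file name each value comes from, and a non-conforming name fails loudly rather than being sliced into something meaningless.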
That's a good idea. I've added a # TODO comment in b014029 and will make a new issue for this.
assert nc_write_path == Path("/test/path/NetCDF/fields_123.file.nc")
assert nc_write_path == expected_nc_write_path
Is there a need to test against what happens if a totally invalid filename gets fired into get_nc_write_path? Is that behaviour even defined/consistent?
Good question. Currently get_nc_write_path is only given file names that match an expected format, as supplied by the find_matching_fields_files function. However, it probably makes sense to deal more explicitly with invalid file names being supplied. I think if we swap to using regex following your suggestion, this should be easier!
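To make the question about invalid names concrete, here is a small sketch of how index-based slicing behaves on names that do not follow the expected format. FF_UNIT_INDEX = 8 is an assumption inferred from the example name aiihca.paa1jan; the real constant is not shown in this thread, and split_by_index is a hypothetical stand-in for the slicing lines under review.

```python
FF_UNIT_INDEX = 8  # assumption: position of the unit character in "aiihca.paa1jan"


def split_by_index(fields_file_name):
    """Mimic the index-based slicing discussed in the review."""
    stem = fields_file_name[0:FF_UNIT_INDEX + 1]
    unit = fields_file_name[FF_UNIT_INDEX]
    return stem, unit


# A long enough invalid name is sliced without any complaint.
silently_accepted = split_by_index("not_a_fields_file")

# A name that is too short fails, but only with a generic IndexError.
try:
    split_by_index("short")
    short_name_error = None
except IndexError as err:
    short_name_error = err
```

Under this assumption the behaviour is defined but not very helpful: the outcome depends on the length of the name rather than on whether it looks like a fields file.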
I think this is ready to go as-is; comments added are things to consider/possible TODO items to add.
This PR attempts to close #91.
It updates the naming of the output netCDF files to include a YYYYMM date string and a suffix stating the data frequency in the file. E.g. the fields file aiihca.paa1jan gets converted to aiihca.pa-010101_mon.nc, following the convention in the p73 archive.
Getting this information involves two steps:
mule.FixedLengthHeader.from_file
mon, dai, 6hr, or 3hr.
Originally, setting the output filepath was handled by get_nc_write_path(), which was called from inside convert_fields_file_list() and just glued .nc onto the name:
um2nc-standalone/umpost/conversion_driver_esm1p5.py
Lines 141 to 143 in 70a890c
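As a rough illustration of the target naming convention described in this PR (stem, date string, frequency suffix, then .nc), the assembly step could be sketched as below. The function name format_nc_name is hypothetical, and the sketch assumes the date string is a zero-padded four-digit year followed by a two-digit month, which is consistent with the aiihca.pa-010101_mon.nc example but not confirmed beyond it.

```python
# Frequency suffixes named in the PR description.
VALID_FREQUENCIES = ("mon", "dai", "6hr", "3hr")


def format_nc_name(stem, year, month, frequency):
    """Build an output name such as 'aiihca.pa-010101_mon.nc' (hypothetical helper)."""
    if frequency not in VALID_FREQUENCIES:
        raise ValueError(f"Unknown frequency suffix: {frequency!r}")
    if not 1 <= month <= 12:
        raise ValueError(f"Month out of range: {month}")
    # Assumed layout: zero-padded 4-digit year, then zero-padded 2-digit month.
    return f"{stem}-{year:04d}{month:02d}_{frequency}.nc"
```

Because this step takes plain values and does no file access, it can be tested without any fields files on disk.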
I thought adding changes 1 and 2 to the naming here would involve burying more I/O, which might make convert_fields_file_list harder to test, and would also make the function a bit inflexible (e.g. the same function couldn't be used with a different naming convention).
I tried moving the file naming out of the conversion function and up to the higher-level convert_esm1p5_output_dir function. It uses an updated get_nc_write_path() to create a list of (input_path, output_path) pairs, which gets fed into convert_fields_file_list:
um2nc-standalone/umpost/conversion_driver_esm1p5.py
Lines 343 to 347 in 2c5fcea
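A minimal sketch of the general shape of this approach, not the code in the repository: the names build_io_pairs and append_nc_suffix are hypothetical, and the naming rule is passed in as a plain function so that the pairing step does no I/O and works with any convention.

```python
from pathlib import Path


def append_nc_suffix(fields_file_name):
    """Hypothetical naming rule: glue '.nc' onto the input name."""
    return fields_file_name + ".nc"


def build_io_pairs(input_paths, output_dir, name_func):
    """Return (input_path, output_path) pairs without touching the filesystem."""
    return [
        (input_path, output_dir / name_func(input_path.name))
        for input_path in input_paths
    ]


pairs = build_io_pairs(
    [Path("/test/path/fields_123.file")],
    Path("/test/path/NetCDF"),
    append_nc_suffix,
)
```

A conversion function that receives such pairs only needs to read and write the paths it is given, so swapping in a different naming convention means passing a different name_func.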
get_nc_write_path then performs steps 1 and 2, and I tried to make it so that it could be a target for more "mid level" testing.
I went back and forth a bit on the best way to structure the changes and am still not completely sure, so any suggestions would be welcome!