Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor steps in blending code #443

Open
wants to merge 31 commits into
base: master
Choose a base branch
from

Conversation

sidekock
Copy link
Contributor

@sidekock sidekock commented Dec 6, 2024

This is the refactored version of the blending code of steps, see issue #440

@sidekock sidekock added the enhancement New feature or request label Dec 6, 2024
@sidekock sidekock self-assigned this Dec 6, 2024
@sidekock sidekock linked an issue Dec 6, 2024 that may be closed by this pull request
@sidekock
Copy link
Contributor Author

sidekock commented Dec 6, 2024

Hi reviewers,
The blending code is finally refactored. The following things still need to happen but I would like to do these during the review process:

  • Documentation needs to be updated to the new structure (please let me know where more documentation is needed)
  • @RubenImhoff could you please take a look at the # TODO's, I added some and some remain from the original version. I would like to remove most of these if possible. Some are still there for me to check I don't forget last checks.
    Look forward to all the feedback!

Copy link

codecov bot commented Dec 6, 2024

Codecov Report

Attention: Patch coverage is 72.22222% with 10 lines in your changes missing coverage. Please review.

Project coverage is 84.22%. Comparing base (8b5333c) to head (1b82512).
Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
pysteps/nowcasts/steps.py 72.22% 10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #443      +/-   ##
==========================================
+ Coverage   84.03%   84.22%   +0.19%     
==========================================
  Files         160      160              
  Lines       13031    13243     +212     
==========================================
+ Hits        10950    11154     +204     
- Misses       2081     2089       +8     
Flag Coverage Δ
unit_tests 84.22% <72.22%> (+0.19%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@sidekock
Copy link
Contributor Author

sidekock commented Dec 6, 2024

Also, @ladc and I found a possible bug in the code that needs to be fixed. I added a test to the test files but commented it out for the moment as this would break the tests (also in the old version of the code)

@RubenImhoff
Copy link
Contributor

@sidekock, fantastic work! I'll try to give it a good review before the Christmas break.

@dnerini
Copy link
Member

dnerini commented Dec 15, 2024

Also, @ladc and I found a possible bug in the code that needs to be fixed. I added a test to the test files but commented it out for the moment as this would break the tests (also in the old version of the code)

could you say a bit more about this bug @sidekock ? is it something that we should address in a separate bug fix?

@sidekock
Copy link
Contributor Author

sidekock commented Dec 15, 2024

Also, @ladc and I found a possible bug in the code that needs to be fixed. I added a test to the test files but commented it out for the moment as this would break the tests (also in the old version of the code)

could you say a bit more about this bug @sidekock ? is it something that we should address in a separate bug fix?

I should check it to be sure but don't have my computer with me now but is has to do with iterations over model numbers vs over duplicated model numbers. We just missed this issue in the tests because it is something that does not frequently happen. A possible reason this could fail is when you have a 2 NWP members but want to generate a blended nowcast with 4 members. This is a small bugfix I can do but I wanted to wait until this issue is tackled fully. It is one of the #TODO comments I added to the code so if you want you can take a look through the code.

The specific TODO is this one:
"TODO: check if j is the best accessor for this variable"
The type of test it failed (which is not yet present in the master branch) is this one:

n_models=2, timesteps=3, n_ens_members=4, n_cascade_levels=8, mask_method="incremental", probmatching_method="cdf", blend_nwp_members=True&False, weights_method="spn", decomposed_nwp=True, expected_n_ens_members=2, zero_radar=False, zero_nwp=False, smooth_radar_mask_range=0, resample_distribution=False),

.gitignore Outdated Show resolved Hide resolved
Co-authored-by: mats-knmi <[email protected]>
@mats-knmi
Copy link
Contributor

I have already noted some initial things I noticed, but other than that it looks really good, much more readable than before.
I will try to find some more time to look into this more in depth (it is a really big file to review haha).
Regarding docstrings: I think what we did for the steps code still makes sense here right? Adding a docstring to StepsBlendingConfig and one to compute_forecast to explain what all the variables do?

@sidekock
Copy link
Contributor Author

I have already noted some initial things I noticed, but other than that it looks really good, much more readable than before. I will try to find some more time to look into this more in depth (it is a really big file to review haha). Regarding docstrings: I think what we did for the steps code still makes sense here right? Adding a docstring to StepsBlendingConfig and one to compute_forecast to explain what all the variables do?

Indeed a very big file :) Ill do the docstrings one of the next days and try to tackle the duplicate comments at that point too

# 8.5 Blend the cascades
final_blended_forecast_single_member = []
for t_sub in self.__state.subtimesteps:
# TODO: does it make sense to use sub time steps - check if it works?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the sub timesteps: I would love it if we could make it so that pysteps can just handle a variable timestep.

When you would go for example from a 5 to 15 minute timestep you could aggregate the previous 3 precipitation fields and multiply the last motion field by 3 and you would then probably just be able to continue in a 15 minute timestep from there right?

This would be much preferable to the sub timestep, which still has to do all the computations at the smaller timesteps. Also it should make the code easier to follow maybe

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is part of the refactoring, but maybe something for after

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, but I think the refactoring makes the process a lot easier to to this.

@mats-knmi
Copy link
Contributor

I just scrolled though the entire code. I did not read everything, but I mostly tried to follow along. I have added some additonal comments to the things I found, other than that it is good to go from my part (after @RubenImhoff also has had a look).

….velocity_perturbations = [] in __initialize_random_generators
Copy link
Contributor

@RubenImhoff RubenImhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sidekock, super nice work! As @mats-knmi already mentioned, it is a lot of code, so I'm sure I've missed things. My main recommendation would be do to some comparison tests to ensure all functionality still operates the same.

In addition, some overall points I had:

  • Docstrings: already mentioned, but good to make this as clear as possible to in the end have a blending module that is more understandable than the old version was. :)
  • Also, don't forget to sometimes add some short docstrings line-by-line in the individual functions.
  • Question: What internal functions in this piece of code do have overlap (or more or less overlap) with nowcasts/steps.py. Meaning, could we put some functions separate in a utils script or so?

can be given as float32. They will then be converted to float64 before computations
to minimize loss in precision.
# TODO: compare old and new version of the code, run a benchmark to compare the two
# TODO: look at the documentation and try to improve it, lots of things are now combined together
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you still planning to do that as part of this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed that is the goal. Based on the feedback on the code I would first change the code and the, when it is in a final state. I will first compare master to this branch and then focus on documentation

pysteps/blending/steps.py Show resolved Hide resolved
"""
###
# 8. Start the forecasting loop
###
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments above can be removed.

self.__perturb_blend_and_advect_extrapolation_and_noise_to_current_timestep(
t, j, worker_state
)
# 8.5 Blend the cascades
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# 8.5 Blend the cascades
# Blend the cascades

pysteps/blending/steps.py Show resolved Hide resolved
# latest extrapolated radar rainfall field blended with the
# nwp model(s) rainfall forecast fields as 'benchmark'.

# 8.7.1 first blend the extrapolated rainfall field (the field
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# 8.7.1 first blend the extrapolated rainfall field (the field
# First blend the extrapolated rainfall field (the field

Perhaps give this the variable name, instead of extrapolated rainfall field, as it occurs in the class.

precip_forecast_probability_matching_blended
)

# 8.7.2. Apply the masking and prob. matching
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# 8.7.2. Apply the masking and prob. matching
# Apply the masking and probability matching

return out


def calculate_weights_bps(correlations):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we are cleaning up the code, would it make sense to move calculate_weights_bps and calculate_weights_spn to a separate model, called weights.py or so? However, at the same time, these weights are quite specific to the STEPS approach, so I'm not sure what is best. It would make the code here a little shorter and cleaner at least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure where to move it best. Two other functions are used a few times as well and I don't know where to move them to... Some, I would say, could maybe be added to utils

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll think about it. Maybe I'll put them in a separate weights.py module, but otherwise they're good where they are now.

pysteps/blending/steps.py Outdated Show resolved Hide resolved
pysteps/blending/steps.py Outdated Show resolved Hide resolved
@sidekock
Copy link
Contributor Author

@sidekock, super nice work! As @mats-knmi already mentioned, it is a lot of code, so I'm sure I've missed things. My main recommendation would be do to some comparison tests to ensure all functionality still operates the same.

In addition, some overall points I had:

* Docstrings: already mentioned, but good to make this as clear as possible to in the end have a blending module that is more understandable than the old version was. :)

* Also, don't forget to sometimes add some short docstrings line-by-line in the individual functions.

* Question: What internal functions in this piece of code do have overlap (or more or less overlap) with `nowcasts/steps.py`. Meaning, could we put some functions separate in a utils script or so?

Hi ruben,
Docstrings are the thing to look at if the code has a finalized form but that does not seem to far away now :). Regarding your last question: conceptually there is quite a bit of similarity between nowcasting and blending steps. However there are very few pieces of code that are duplicate, this is because we have additional NWP fields to take into account. Stylisticly, I agree that we should keep the duplicated code to a minimum but readability wise and for understanding, I also see the benefit of keeping all the steps of the workflow in one file. If this range would be really needed, we could also make a new issue out of this, because i think it is probably something we do best on its own.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

Refactor steps in blending code
4 participants