
Share FCI L1c metadata between segments #2828

Draft
wants to merge 9 commits into main

pnuu
Member

@pnuu pnuu commented Jun 14, 2024

Some of the metadata are identical in every FCI L1c segment, so they only need to be read once. This saves a lot of time in Scene creation when the data are in S3 storage:

  • main - 37.0 s
  • PR - 23.0 s
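
For reference, the saving implied by the two timings above can be worked out as follows. This is plain arithmetic on the quoted numbers, not a new measurement.

```python
# Scene creation timings quoted above (data in S3 storage).
main_seconds = 37.0
pr_seconds = 23.0

saved_seconds = main_seconds - pr_seconds
reduction_percent = 100.0 * saved_seconds / main_seconds
speedup = main_seconds / pr_seconds

print(f"saved {saved_seconds:.1f} s "
      f"({reduction_percent:.1f} % less time, {speedup:.2f}x faster)")
```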

There will be conflicts with #2686, but maybe adding the pickle would also benefit this feature.

  • Closes #xxxx
  • Tests added
  • Fully documented
  • Add your name to AUTHORS.md if not there already

@pnuu added the enhancement, component:readers, refactor, and PCW (Pytroll Contributors' Week) labels Jun 14, 2024
@pnuu self-assigned this Jun 14, 2024

codecov bot commented Jun 14, 2024

Codecov Report

Attention: Patch coverage is 96.87500% with 2 lines in your changes missing coverage. Please review.

Project coverage is 95.94%. Comparing base (ee2273a) to head (f46df94).
Report is 10 commits behind head on main.

Files                          Patch %   Lines
satpy/readers/fci_l1c_nc.py    91.30%    2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2828   +/-   ##
=======================================
  Coverage   95.94%   95.94%           
=======================================
  Files         366      366           
  Lines       53515    53580   +65     
=======================================
+ Hits        51343    51409   +66     
+ Misses       2172     2171    -1     
Flag             Coverage Δ
behaviourtests   4.04% <0.00%> (-0.01%) ⬇️
unittests        96.04% <96.87%> (+<0.01%) ⬆️


@coveralls

coveralls commented Jun 14, 2024

Pull Request Test Coverage Report for Build 9515844065

Details

  • 9 of 28 (32.14%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.03%) to 96.006%

Changes Missing Coverage       Covered Lines   Changed/Added Lines   %
satpy/readers/fci_l1c_nc.py    4               23                    17.39%
Totals Coverage Status
Change from base Build 9513521871: -0.03%
Covered Lines: 51566
Relevant Lines: 53711

💛 - Coveralls

@pnuu
Member Author

pnuu commented Jun 14, 2024

I'll see what I can do for testing. It's not easy, because the file handler is never used in the tests.

@pnuu
Member Author

pnuu commented Jun 19, 2024

I've now added two tests that hopefully show that the storing and reusing of common items between the FCI L1c segments work.

@coveralls

coveralls commented Jun 19, 2024

Pull Request Test Coverage Report for Build 9581505593

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 48 of 51 (94.12%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.002%) to 96.038%

Changes Missing Coverage       Covered Lines   Changed/Added Lines   %
satpy/readers/fci_l1c_nc.py    20              23                    86.96%
Totals Coverage Status
Change from base Build 9513521871: -0.002%
Covered Lines: 51605
Relevant Lines: 53734

💛 - Coveralls

@coveralls

coveralls commented Jun 20, 2024

Pull Request Test Coverage Report for Build 9592333785

Details

  • 48 of 51 (94.12%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.002%) to 96.039%

Changes Missing Coverage       Covered Lines   Changed/Added Lines   %
satpy/readers/fci_l1c_nc.py    20              23                    86.96%
Totals Coverage Status
Change from base Build 9588363290: -0.002%
Covered Lines: 51621
Relevant Lines: 53750

💛 - Coveralls

@coveralls

coveralls commented Jun 20, 2024

Pull Request Test Coverage Report for Build 9592773461

Details

  • 49 of 52 (94.23%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.002%) to 96.039%

Changes Missing Coverage       Covered Lines   Changed/Added Lines   %
satpy/readers/fci_l1c_nc.py    20              23                    86.96%
Totals Coverage Status
Change from base Build 9588363290: -0.002%
Covered Lines: 51622
Relevant Lines: 53751

💛 - Coveralls

@coveralls

coveralls commented Jun 20, 2024

Pull Request Test Coverage Report for Build 9593191408

Details

  • 62 of 64 (96.88%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.006%) to 96.047%

Changes Missing Coverage       Covered Lines   Changed/Added Lines   %
satpy/readers/fci_l1c_nc.py    21              23                    91.3%
Totals Coverage Status
Change from base Build 9588363290: 0.006%
Covered Lines: 51638
Relevant Lines: 53763

💛 - Coveralls

@simonrp84
Member

Interesting!
In the code I can see some metadata listed as non-shareable. Do you have a list of the shareable metadata? i.e: Which ones don't change between segments and are hence affected by this PR?

@pnuu
Member Author

pnuu commented Jun 20, 2024

Anything in the list shown here that doesn't end in one of the strings listed in NONSHAREABLE_VARIABLE_ENDINGS can be shared between the segments. Note that {chan_name} is replaced with each of the 16 channels FCI has, so the list is quite long.
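
As a rough illustration of the rule described above, the following sketch expands a few variable-name templates per channel and keeps only the names that do not end in a non-shareable suffix. The suffix list is copied from the diff quoted later in this thread; the template subset, the two channel names, and the helper names `expand_templates` and `is_shareable` are assumptions made for the example and are not taken from the PR. The placeholder is spelled `{channel_name}` here, as in the YAML quoted later in the thread.

```python
# Suffixes copied from the diff quoted later in this conversation.
NONSHAREABLE_VARIABLE_ENDINGS = [
    "index",
    "time",
    "measured/effective_radiance",
    "measured/y",
    "position_row",
    "index_map",
    "pixel_quality",
]

# Illustrative subset only: the real list is longer and covers 16 channels.
VARIABLE_TEMPLATES = [
    "data/{channel_name}/measured/x",
    "data/{channel_name}/measured/y",
    "data/{channel_name}/measured/effective_radiance",
    "data/{channel_name}/measured/radiance_unit_conversion_coefficient",
]
CHANNELS = ["vis_06", "ir_105"]


def expand_templates(templates, channels):
    """Fill the channel placeholder for every channel (hypothetical helper)."""
    return [t.format(channel_name=c) for c in channels for t in templates]


def is_shareable(key):
    """A key is shared unless it ends in one of the non-shareable suffixes."""
    return not any(key.endswith(end) for end in NONSHAREABLE_VARIABLE_ENDINGS)


shareable = [k for k in expand_templates(VARIABLE_TEMPLATES, CHANNELS)
             if is_shareable(k)]
for key in shareable:
    print(key)
```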

@gerritholl
Collaborator

> Interesting! In the code I can see some metadata listed as non-shareable. Do you have a list of the shareable metadata? i.e: Which ones don't change between segments and are hence affected by this PR?

In https://github.com/pytroll/satpy/pull/2686/files#diff-2170f8edf16088150763d5f3a6cbd69d62600d238c0ba80a41afcb4832fb7b5d I explicitly list which metadata can be shared between segments.

Collaborator

@gerritholl left a comment

Thank you for the work! I left some comments/questions where I think things might be implemented differently. In addition, could more implementation be in netcdf_utils so it can be used by other readers where relevant?

Comment on lines +161 to +169
    NONSHAREABLE_VARIABLE_ENDINGS = [
        "index",
        "time",
        "measured/effective_radiance",
        "measured/y",
        "position_row",
        "index_map",
        "pixel_quality"]

Collaborator

In #2686 I have added this information to the YAML file. Essentially, I have changed required_netcdf_variables to be a dict rather than a list. The keys of the dict remain the required variable names, and the values indicate how they can be shared between segments or between repeat cycles:

    required_netcdf_variables: &required-variables
      # key/value; keys are names, values are lists of strings stating whether this may be
      # cached between segments, between repeat cycles, or neither
      attr/platform:
        - segment
        - rc
      data/{channel_name}/measured/start_position_row:
        - rc
      data/{channel_name}/measured/end_position_row:
        - rc
      data/{channel_name}/measured/radiance_to_bt_conversion_coefficient_wavenumber:
        - segment
        - rc
      data/{channel_name}/measured/radiance_to_bt_conversion_coefficient_a:
        - segment
        - rc
      data/{channel_name}/measured/radiance_to_bt_conversion_coefficient_b:
        - segment
        - rc
      data/{channel_name}/measured/radiance_to_bt_conversion_constant_c1:
        - segment
        - rc
      data/{channel_name}/measured/radiance_to_bt_conversion_constant_c2:
        - segment
        - rc
      data/{channel_name}/measured/radiance_unit_conversion_coefficient:
        - segment
        - rc
      data/{channel_name}/measured/channel_effective_solar_irradiance:
        - segment
        - rc
      data/{channel_name}/measured/effective_radiance: []
      data/{channel_name}/measured/x:
        - segment
        - rc
      data/{channel_name}/measured/y:
        - rc
      data/{channel_name}/measured/pixel_quality: []
      data/{channel_name}/measured/index_map: []

See https://github.com/pytroll/satpy/pull/2686/files#diff-2170f8edf16088150763d5f3a6cbd69d62600d238c0ba80a41afcb4832fb7b5dR23-R84

We should agree on an approach to avoid a duplication of information. One difference is that I need information not only on what can be shared between segments, but also between repeat cycles. By adding information to the YAML file, I don't need to repeat variable names (or parts of variable names) in different places (YAML file + source code).
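
A minimal sketch of how such a mapping could be queried once the YAML has been loaded into a Python dict. The dict below is a hand-copied excerpt of the YAML above, and `names_cacheable_across` is a hypothetical helper, not a function from #2686 or from satpy. A name whose list is empty, or that is missing from the mapping, is never returned, so nothing is cached unless it is listed explicitly.

```python
# Excerpt of the mapping quoted above, as it would look after YAML loading.
required_netcdf_variables = {
    "attr/platform": ["segment", "rc"],
    "data/{channel_name}/measured/start_position_row": ["rc"],
    "data/{channel_name}/measured/x": ["segment", "rc"],
    "data/{channel_name}/measured/y": ["rc"],
    "data/{channel_name}/measured/effective_radiance": [],
}


def names_cacheable_across(scope, variables):
    """Return the variable names that list ``scope`` ("segment" or "rc")."""
    return [name for name, scopes in variables.items() if scope in scopes]


print(names_cacheable_across("segment", required_netcdf_variables))
print(names_cacheable_across("rc", required_netcdf_variables))
```

Keeping the sharing scope next to each variable name means a single lookup answers both the per-segment and the per-repeat-cycle question.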

I think swath_number is not shareable between segments.

Member Author

Yup. I kinda made this on the basis of "let's see if this makes things faster", so the approach was the simplest I saw. I think yours is more general, and I'll update this PR to match yours when yours is ready.

Could be that swath_number can't be shared, but apparently sharing it didn't break anything when creating imagery 😅

satpy/readers/fci_l1c_nc.py (outdated)
Comment on lines +732 to +733
    if any(key.endswith(k) for k in NONSHAREABLE_VARIABLE_ENDINGS):
        continue
Collaborator

By default you are sharing (in case of unknown variable names). Would it be safer to default to not-sharing, i.e. use a whitelist rather than a blacklist?
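
The difference between the two defaults can be sketched as below. The deny list is the one from the diff; the allow list, the unknown variable name, and both function names are hypothetical and only serve to show how each policy treats a name nobody anticipated.

```python
# Deny list from the diff under review.
NONSHAREABLE_VARIABLE_ENDINGS = (
    "index", "time", "measured/effective_radiance", "measured/y",
    "position_row", "index_map", "pixel_quality",
)

# Hypothetical allow list, for illustration only.
SHAREABLE_VARIABLE_ENDINGS = (
    "measured/x",
    "radiance_unit_conversion_coefficient",
)


def shareable_by_denylist(key):
    """Share everything except what is explicitly excluded."""
    return not key.endswith(NONSHAREABLE_VARIABLE_ENDINGS)


def shareable_by_allowlist(key):
    """Share only what is explicitly included."""
    return key.endswith(SHAREABLE_VARIABLE_ENDINGS)


# A made-up variable that neither list knows about.
unknown = "data/vis_06/measured/some_new_variable"
print(shareable_by_denylist(unknown))   # shared by default
print(shareable_by_allowlist(unknown))  # not shared by default
```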

Member Author

I think your dict of sharing stuff would make this obsolete in any case as it explicitly says what can be shared and how.

        if any(key.endswith(k) for k in NONSHAREABLE_VARIABLE_ENDINGS):
            continue
        shared_info[key] = self.file_content[key]
    filetype_info["shared_info"] = shared_info
Collaborator

Is filetype_info really an appropriate place to put this cache? I fear this might be confusing.

Member Author

We talked about this with @ameraner at PCW, and this approach was already in use somewhere. LI L2, perhaps? It was also the path of least resistance, as it was already in place and the same dict is passed between the different file handlers by the YAML reader.

If there are other backwards-compatible ways to pass the info between the file handlers, let me know.
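
The mechanism discussed in this thread, where one dict object is handed to every file handler and the first handler fills a cache that later handlers reuse, can be sketched roughly as follows. `SegmentHandler`, `read_variable`, `ALL_KEYS`, and the file names are hypothetical stand-ins, not satpy classes or functions; only the `"shared_info"` key and the suffix list come from the diff.

```python
NONSHAREABLE_VARIABLE_ENDINGS = (
    "index", "time", "measured/effective_radiance", "measured/y",
    "position_row", "index_map", "pixel_quality",
)
# Illustrative variable names each handler needs.
ALL_KEYS = [
    "attr/platform",
    "data/vis_06/measured/x",
    "data/vis_06/measured/effective_radiance",
]
read_log = []  # records every simulated (slow) read


def read_variable(filename, key):
    """Pretend to read one variable from a file and log the access."""
    read_log.append((filename, key))
    return f"{key} from {filename}"


class SegmentHandler:
    """Toy file handler that reuses shareable items found in filetype_info."""

    def __init__(self, filename, filetype_info):
        # Start from whatever an earlier handler has already stored.
        self.file_content = dict(filetype_info.get("shared_info", {}))
        for key in ALL_KEYS:
            if key not in self.file_content:
                self.file_content[key] = read_variable(filename, key)
        if "shared_info" not in filetype_info:
            # First handler: publish everything that may be shared.
            filetype_info["shared_info"] = {
                k: v for k, v in self.file_content.items()
                if not k.endswith(NONSHAREABLE_VARIABLE_ENDINGS)
            }


filetype_info = {}  # the same dict object is passed to every handler
first = SegmentHandler("segment_01.nc", filetype_info)
second = SegmentHandler("segment_02.nc", filetype_info)
print(len(read_log))  # total number of simulated reads
```

In this toy setup the second handler only reads the per-segment radiance itself; everything else comes from the dict filled by the first handler.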

@pnuu
Member Author

pnuu commented Jun 20, 2024

> In addition, could more implementation be in netcdf_utils so it can be used by other readers where relevant?

I think so. As I said in another comment above, this snowballed from a quick test at PCW, so I touched as few other parts of the code as possible. The FCI L1c data format is the most demanding in this regard, and the most important to me, so I started here.

I'll see about generalizing things later, and I'm converting this to a draft for the summer period.

@pnuu marked this pull request as draft June 20, 2024 10:36

@coveralls

coveralls commented Jun 20, 2024

Pull Request Test Coverage Report for Build 9596154108

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 62 of 64 (96.88%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.006%) to 96.047%

Changes Missing Coverage       Covered Lines   Changed/Added Lines   %
satpy/readers/fci_l1c_nc.py    21              23                    91.3%
Totals Coverage Status
Change from base Build 9588363290: 0.006%
Covered Lines: 51638
Relevant Lines: 53763

💛 - Coveralls

Labels: component:readers, enhancement, PCW (Pytroll Contributors' Week), refactor
Projects
Status: In Progress

5 participants