Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Share FCI L1c metadata between segments #2828

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from
34 changes: 34 additions & 0 deletions satpy/readers/fci_l1c_nc.py
Original file line number Diff line number Diff line change
Expand Up @@ -158,6 +158,15 @@
"grid_width": 5568},
}

NONSHAREABLE_VARIABLE_ENDINGS = [
"index",
"time",
"measured/effective_radiance",
"measured/y",
"position_row",
"index_map",
"pixel_quality"]

Comment on lines +161 to +169
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #2686 I have added this information to the YAML file. Essentially, I have changed required_variable_names to be a dict rather than a list. The keys of the dict remain the required variable names, and the values indicate how they can be shared between segments or between repeat cycles:

    required_netcdf_variables: &required-variables
      # key/value; keys are names, value is a list of string on how this may be
      # cached between segments or between repeat cycles or neither
      attr/platform:
        - segment
        - rc
      data/{channel_name}/measured/start_position_row:
        - rc
      data/{channel_name}/measured/end_position_row:
        - rc
      data/{channel_name}/measured/radiance_to_bt_conversion_coefficient_wavenumber:
        - segment
        - rc
      data/{channel_name}/measured/radiance_to_bt_conversion_coefficient_a:
        - segment
        - rc
      data/{channel_name}/measured/radiance_to_bt_conversion_coefficient_b:
        - segment
        - rc
      data/{channel_name}/measured/radiance_to_bt_conversion_constant_c1:
        - segment
        - rc
      data/{channel_name}/measured/radiance_to_bt_conversion_constant_c2:
        - segment
        - rc
      data/{channel_name}/measured/radiance_unit_conversion_coefficient:
        - segment
        - rc
      data/{channel_name}/measured/channel_effective_solar_irradiance:
        - segment
        - rc
      data/{channel_name}/measured/effective_radiance: []
      data/{channel_name}/measured/x:
        - segment
        - rc
      data/{channel_name}/measured/y:
        - rc
      data/{channel_name}/measured/pixel_quality: []
      data/{channel_name}/measured/index_map: []

See https://github.com/pytroll/satpy/pull/2686/files#diff-2170f8edf16088150763d5f3a6cbd69d62600d238c0ba80a41afcb4832fb7b5dR23-R84

We should agree on an approach to avoid a duplication of information. One difference is that I need information not only on what can be shared between segments, but also between repeat cycles. By adding information to the YAML file, I don't need to repeat variable names (or parts of variable names) in different places (YAML file + source code).

I think swath_number is not shareable between segments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. I kinda made this on the basis "lets see if this makes things faster" so the approach was the simplest I saw. I think yours is more general and I'll update this PR to match yours when yours is ready.

Could be that swath_number can't be shared, but apparently sharing it didn't break anything when creating imagery 😅


def _get_aux_data_name_from_dsname(dsname):
aux_data_name = [key for key in AUX_DATA.keys() if key in dsname]
Expand Down Expand Up @@ -700,6 +709,31 @@
res = 100 * radiance * np.float32(np.pi) * np.float32(sun_earth_distance) ** np.float32(2) / cesi
return res

def _collect_listed_variables(self, file_handle, listed_variables, filetype_info):
listed_variables = self._recycle_shared_info(listed_variables, filetype_info)
super()._collect_listed_variables(file_handle, listed_variables, filetype_info)
self._store_shared_info(filetype_info)

def _recycle_shared_info(self, listed_variables, filetype_info):
pnuu marked this conversation as resolved.
Show resolved Hide resolved
if "shared_info" in filetype_info:
shared_info = filetype_info["shared_info"]
for key in shared_info:
self.file_content[key] = shared_info[key]
try:
listed_variables.remove(key)
except ValueError:
pass

Check warning on line 725 in satpy/readers/fci_l1c_nc.py

View check run for this annotation

Codecov / codecov/patch

satpy/readers/fci_l1c_nc.py#L724-L725

Added lines #L724 - L725 were not covered by tests
return listed_variables

def _store_shared_info(self, filetype_info):
if "shared_info" not in filetype_info:
shared_info = {}
for key in self.file_content:
if any(key.endswith(k) for k in NONSHAREABLE_VARIABLE_ENDINGS):
continue
Comment on lines +732 to +733
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default you are sharing (in case of unknown variable names). Would it be safer to default to not-sharing, i.e. use a whitelist rather than a blacklist?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your dict of sharing stuff would make this obsolete in any case as it explicitly says what can be shared and how.

shared_info[key] = self.file_content[key]
filetype_info["shared_info"] = shared_info
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is filetype_info really an appropriate place to put this cache? I fear this might be confusing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about this with @ameraner at PCW and this was already used in somewhere. LI L2, perhaps? It was also the path of least resistance as it was already in place and the same dict is passed between different filehandlers by the YAML reader.

If there are other backwards compatible ways to pass the info between the file handlers then let me know.



def _ensure_dataarray(arr):
if not isinstance(arr, xr.DataArray):
Expand Down
9 changes: 5 additions & 4 deletions satpy/readers/netcdf_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -111,7 +111,9 @@ def __init__(self, filename, filename_info, filetype_info,

listed_variables = filetype_info.get("required_netcdf_variables")
if listed_variables:
self._collect_listed_variables(file_handle, listed_variables)
variable_name_replacements = self.filetype_info.get("variable_name_replacements")
listed_variables = self._get_required_variable_names(listed_variables, variable_name_replacements)
self._collect_listed_variables(file_handle, listed_variables, filetype_info)
else:
self.collect_metadata("", file_handle)
self.collect_dimensions("", file_handle)
Expand Down Expand Up @@ -168,9 +170,8 @@ def _collect_variable_info(self, var_name, var_obj):
self.file_content[var_name + "/dimensions"] = var_obj.dimensions
self._collect_attrs(var_name, var_obj)

def _collect_listed_variables(self, file_handle, listed_variables):
variable_name_replacements = self.filetype_info.get("variable_name_replacements")
for itm in self._get_required_variable_names(listed_variables, variable_name_replacements):
def _collect_listed_variables(self, file_handle, listed_variables, filetype_info):
for itm in listed_variables:
parts = itm.split("/")
grp = file_handle
for p in parts[:-1]:
Expand Down
78 changes: 78 additions & 0 deletions satpy/tests/reader_tests/test_fci_l1c_nc.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
#!/usr/bin/env python

Check warning on line 1 in satpy/tests/reader_tests/test_fci_l1c_nc.py

View check run for this annotation

CodeScene Delta Analysis / CodeScene Cloud Delta Analysis (main)

❌ New issue: Low Cohesion

This module has at least 4 different responsibilities amongst its 60 functions, threshold = 4. Cohesion is calculated using the LCOM4 metric. Low cohesion means that the module/class has multiple unrelated responsibilities, doing too many things and breaking the Single Responsibility Principle.
# -*- coding: utf-8 -*-
# Copyright (c) 2019 Satpy developers
#
Expand Down Expand Up @@ -921,3 +921,81 @@
np.array([-5568000.227139, -5368000.221262,
5568000.100073, -5568000.227139]),
decimal=2)


def _mock_np2str(val):
return val


def parse_fdhsi_filename_info(filename):
"""Parse filename info from the filename."""
from trollsift import parse

fmt = ("{pflag}_{location_indicator},{data_designator},MTI{spacecraft_id:1d}+{data_source}-1C-RRAD-FDHSI-"
"{coverage}-{subsetting}-{component1}-BODY-{component3}-{purpose}-{format}_{oflag}_{originator}_"
"{processing_time:%Y%m%d%H%M%S}_{facility_or_tool}_{environment}_{start_time:%Y%m%d%H%M%S}_{end_time:%Y%m%d%H%M%S}_"
"{processing_mode}_{special_compression}_{disposition_mode}_{repeat_cycle_in_day:>04d}_{count_in_repeat_cycle:>04d}.nc")
filename_info = parse(fmt, filename)
return filename_info


def test_reusing_shared_segment_info():
"""Test that certain data are reused after the first segment has been handled.

This is the case where something has been already handled and stored to filetype_info["shared_info"] dict.
"""
filename = TEST_FILENAMES["fdhsi"][0]
filename_info = parse_fdhsi_filename_info(filename)
filetype_info = {
"shared_info": {
"attr/platform": "mock-satellite"
},
"required_netcdf_variables": ["attr/platform"]
}

with mock.patch("satpy.readers.netcdf_utils.NetCDF4FsspecFileHandler._get_file_handle"):
with mock.patch("satpy.readers.netcdf_utils.np2str", _mock_np2str):
fh_ = FCIL1cNCFileHandler(filename, filename_info, filetype_info)

assert fh_.file_content["attr/platform"] == filetype_info["shared_info"]["attr/platform"]


def test_store_shared_segment_info():
"""Test that certain data are collected to filetype_info["shared_info"] from the first segment."""
filename = TEST_FILENAMES["fdhsi"][0]
filename_info = parse_fdhsi_filename_info(filename)
filetype_info = {"required_netcdf_variables": ["attr/platform"]}

with mock.patch("satpy.readers.netcdf_utils.NetCDF4FsspecFileHandler._get_file_handle") as get_file_handle:
get_file_handle.return_value.platform = "mock-satellite"
with mock.patch("satpy.readers.netcdf_utils.np2str", _mock_np2str):
_ = FCIL1cNCFileHandler(filename, filename_info, filetype_info)

assert filetype_info["shared_info"]["attr/platform"] == "mock-satellite"


def test_store_shared_segment_info_ignore_nonshareable():
"""Test that nonshareable segment info are not collected for sharing."""
variables = [
"data/ir105/measured/effective_radiance",
"data/ir105/measured/y",
"data/ir105/measured/end_position_row",
"data/ir105/measured/start_position_row",
"data/ir105/measured/index_map",
"data/ir105/measured/pixel_quality",
"index",
"time",
]

filename = TEST_FILENAMES["fdhsi"][0]
filename_info = parse_fdhsi_filename_info(filename)
filetype_info = {"required_netcdf_variables": variables}

with mock.patch("satpy.readers.netcdf_utils.NetCDF4FsspecFileHandler._get_file_handle") as get_file_handle:
for itm in variables:
setattr(get_file_handle.return_value, itm, "mock-value")
with mock.patch("satpy.readers.netcdf_utils.np2str", _mock_np2str):
_ = FCIL1cNCFileHandler(filename, filename_info, filetype_info)

for itm in variables:
assert itm not in filetype_info["shared_info"]
Loading