Resample by year using mean #4518

BorjaEst · 2022-01-21T10:34:45Z

BorjaEst
Jan 21, 2022

Is there a similar function to cf_xarray resample using mean?

dataset.cf.resample(T="Y").mean()

Answered by pp-mo

pp-mo · 2022-01-21T11:49:09Z

pp-mo
Jan 21, 2022
Maintainer

I don't have much detailed knowledge of xarray (I assume this is 'really' an xarray operation?), so I'm not entirely sure what this exact operation involves. Certainly the cf-xarray docs don't help much!
But ...
the existing "Iris way" of performing aggregated statistics (equivalent to a 'groupby') is to first create an auxiliary 'categorised coordinate' and then use Cube.aggregated_by.
There is already a User Guide section on it, here: https://scitools-iris.readthedocs.io/en/latest/userguide/cube_statistics.html#partially-reducing-data-dimensions
Can you confirm whether that is what you were looking for?

As a general development point, I think we should probably get around to providing a 'groupby'?
-- as I suspect that is a much more common terminology for this, and a term which people are likely to search on.

BTW if this is "just a question", then it might be more friendly to put this type of query on StackOverflow (though I must admit there isn't much specifically about Iris there).
But I guess a Discussion raises it as a general Development issue -- an idea to discuss, about a thing we are missing or need to improve.

0 replies

BorjaEst · 2022-01-21T14:14:50Z

BorjaEst
Jan 21, 2022
Author

You are probably right, so I created the question: Group_by in iris using mean calculation.

To reply to your post (thanks for the help): it did indeed point me in the right direction.
I found that it is possible to call iris.coord_categorisation.add_year:

import iris.analysis
import iris.coord_categorisation
iris.coord_categorisation.add_year(cube, 'time', name='year')
result = cube.aggregated_by(['year'], iris.analysis.MEAN)

That produced the following output:

print(result)
atmosphere_mole_content_of_ozone / (mol m-2) (time: 1; latitude: 18; longitude: 36)
     Dimension coordinates:
          time                                    x            -              -
          latitude                                -            x              -
          longitude                               -            -              x
     Auxiliary coordinates:
          year                                    x            -              -
     Attributes:
          ...
     Cell methods:
          mean: area
          maximum: time (interval: 1 day)
          mean: year

Although, if it is of interest, I was expecting something more like:

atmosphere_mole_content_of_ozone / (mol m-2) (time: 1; latitude: 18; longitude: 36)
     Dimension coordinates:
          time                                    x            -              -
          latitude                                -            x              -
          longitude                               -            -              x
     Attributes:
          ...
     Cell methods:
          mean: area
          maximum: time (interval: 1 day)
          mean: time (interval: 1 year)

Although both are probably valid CF.

1 reply

BorjaEst Jan 21, 2022
Author

Maybe produced output is not valid?
I ran cf-checker and it returned:

ERROR: (7.3): Invalid 'name' in cell_methods attribute: year

Probably because "year" is an Auxiliary coordinate but not a Dimension coordinate.

pp-mo · 2022-01-26T18:29:29Z

pp-mo
Jan 26, 2022
Maintainer

@BorjaEst: "Maybe produced output is not valid?"

Ok that is worrying.
Thanks for raising this @BorjaEst
I will take a look ...

0 replies

pp-mo · 2022-01-26T18:46:52Z

pp-mo
Jan 26, 2022
Maintainer

@BorjaEst suggestion: "time (interval: 1 year)"

I think that would not in fact be correct: the "interval" should record the original spacing of the observations (which is lost when forming a statistical measure).
https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#recording-spacing-original-data

So e.g. "mean: time (interval: 1 day)" would mean that it is an average of daily values.

As to Iris, I suspect that the result of a statistical aggregation never adds a stated interval, because the aggregation operation itself is a generalised thing: the original data sampling interval would not always be regular, so we probably don't specially check for that. Certainly if you had monthly or yearly values, for instance, that wouldn't be straightforward, because the spacing is only 'regular' with respect to a calendar (not in the actual numeric intervals).

Actually, looking at the code, Iris has only a very generic and minimal approach to adding cell-methods on aggregation.

0 replies

BorjaEst · 2022-03-09T08:49:32Z

BorjaEst
Mar 9, 2022
Author

@pp-mo thanks for the comments. I am sorry to write again this late; I am new to the CF conventions and find them quite complicated, but I gain experience each day.

I have read your point and you are correct: the interval is for the original spacing.
However, after reading the conventions again, I think there is a convention for this case, which is called Climatological Statistics.

Here is the example mentioned:

This example shows the metadata for the average seasonal-minimum temperature for the four standard climatological seasons MAM JJA SON DJF, made from data for March 1960 to February 1991.

dimensions:
  time=4;
  nv=2;
variables:
  float temperature(time,lat,lon);
    temperature:long_name="surface air temperature";
    temperature:cell_methods="time: minimum within years time: mean over years";
    temperature:units="K";
  double time(time);
    time:climatology="climatology_bounds";
    time:units="days since 1960-1-1";
  double climatology_bounds(time,nv);
data:  // time coordinates translated to date/time format
  time="1960-4-16", "1960-7-16", "1960-10-16", "1961-1-16" ;
  climatology_bounds="1960-3-1",  "1990-6-1",
                     "1960-6-1",  "1990-9-1",
                     "1960-9-1",  "1990-12-1",
                     "1960-12-1", "1991-3-1" ;

So it applies a statistical calculation over the time coordinate. In terms of Iris, the expected result should be the following:

atmosphere_mole_content_of_ozone / (mol m-2) (time: 1; latitude: 18; longitude: 36)
     Dimension coordinates:
          time                                    x            -              -
          latitude                                -            x              -
          longitude                               -            -              x
     Attributes:
          ...
     Cell methods:
          mean: area
          maximum within days: time
          mean over years: time
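
The two-step meaning of the cell_methods string in the CF example, "time: minimum within years time: mean over years", can be illustrated with plain Python. This is a sketch on made-up numbers (two years, two seasons, three samples each, all assumed), not real climatology data.

```python
from statistics import mean

# Hypothetical temperatures in K, keyed by (year, season).
samples = {
    (1960, "MAM"): [280.0, 282.0, 285.0],
    (1961, "MAM"): [279.0, 283.0, 284.0],
    (1960, "JJA"): [290.0, 293.0, 295.0],
    (1961, "JJA"): [292.0, 294.0, 296.0],
}

# Step 1, "minimum within years": one minimum per season of each year.
minima = {key: min(values) for key, values in samples.items()}

# Step 2, "mean over years": average those minima across years, per season.
climatology = {
    season: mean(v for (year, s), v in minima.items() if s == season)
    for season in ("MAM", "JJA")
}

print(climatology)  # {'MAM': 279.5, 'JJA': 291.0}
```

The result has one value per season, which is why the CF example has time=4 together with climatology bounds spanning the whole averaging period.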

0 replies

pp-mo · 2022-09-28T13:28:16Z

pp-mo
Sep 28, 2022
Maintainer

Archiving "answered" Q+As

0 replies
