Skip to content

Latest commit

 

History

History
422 lines (336 loc) · 29.2 KB

ch03.adoc

File metadata and controls

422 lines (336 loc) · 29.2 KB

Description of the Data

The attributes described in this section are used to provide a description of the content and the units of measurement for each variable. We continue to support the use of the units and long_name attributes as defined in COARDS. We extend COARDS by adding the optional standard_name attribute which is used to provide unique identifiers for variables. This is important for data exchange since one cannot necessarily identify a particular variable based on the name assigned to it by the institution that provided the data.

The standard_name attribute can be used to identify variables that contain coordinate data. But since it is an optional attribute, applications that implement these standards must continue to be able to identify coordinate types based on the COARDS conventions.

Units

The units attribute is required for all variables that represent dimensional quantities (except for boundary variables defined in [cell-boundaries] and climatology variables defined in [climatological-statistics]). The units attribute is permitted but not required for dimensionless quantities (see Section 3.1.1, "Dimensionless units").

The value of the units attribute is a string that can be recognized by the UDUNITS package [UDUNITS], with the exceptions that are given in Section 3.1.1, "Dimensionless units" and Section 3.1.3, "Scale factors and offsets". Note that case is significant in the units strings. Note also that CF depends on UDUNITS only for the definition of legal units strings. CF does not assume or require that the UDUNITS software will be used for units conversion. In most units conversions, the sole operation on the data is multiplication by a scale factor. Special treatment is required in converting the units of variables that involve temperature (Section 3.1.2, "Temperature units") and the units of time coordinate variables ([time-coordinate]).

The COARDS convention prohibits the unit degrees altogether, but this unit is not forbidden by the CF convention because it may in fact be appropriate for a variable containing, say, solar zenith angle. The unit degrees is also allowed on coordinate variables such as the latitude and longitude coordinates of a transformed grid. In this case the coordinate values are not true latitudes and longitudes, which must always be identified using the more specific forms of degrees as described in [latitude-coordinate] and [longitude-coordinate].

Dimensionless units

A variable with no units attribute is assumed to be dimensionless. However, a units attribute specifying a dimensionless unit may optionally be included. The canonical unit (see also Section 3.3, "Standard Name") for dimensionless quantities that represent fractions, or parts of a whole, is 1. The UDUNITS package defines a few dimensionless units, such as percent, ppm (parts per million, 1e-6), and ppb (parts per billion, 1e-9). As an alternative to the canonical units of 1 or some other unitless number, the units for a dimensionless quantity may be given as a ratio of dimensional units, for instance mg kg-1 for a mass ratio of 1e-6, or microlitre litre-1 for a volume ratio of 1e-6. Data-producers are invited to consider whether this alternative would be more helpful to the users of their data.

The CF convention supports dimensionless units that are UDUNITS compatible, with one exception, concerning the dimensionless units defined by UDUNITS for volume ratios, such as ppmv and ppbv. These units are allowed in the units attribute by CF only if the data variable has no standard_name. These units are prohibited by CF if there is a standard_name, because the standard_name defines whether the quantity is a volume ratio, so the units are needed only to indicate a dimensionless number.

Information describing a dimensionless physical quantity itself (e.g. "area fraction" or "probability") does not belong in the units attribute, but should be given in the long_name or standard_name attributes (see Section 3.2, "Long Name" and Section 3.3, "Standard Name"), in the same way as for physical quantities with dimensional units. As an exception, to maintain backwards compatibility with COARDS, the text strings level, layer, and sigma_level are allowed in the units attribute, in order to indicate dimensionless vertical coordinates. This use of units is not compatible with UDUNITS, and is deprecated by this standard because conventions for more precisely identifying dimensionless vertical coordinates are available (see [dimensionless-vertical-coordinate]).

The UDUNITS syntax that allows scale factors and offsets to be applied to a unit is not supported by this standard, except for case of specifying reference time, see section [time-coordinate]. The application of any scale factors or offsets to data should be indicated by the scale_factor and add_offset attributes. Use of these attributes for data packing, which is their most important application, is discussed in detail in [packed-data].

Temperature units

The units of temperature imply an origin (i.e. zero point) for the associated measurement scale. When the temperature value is the degree of warmth with respect to the origin of the measurement scale, we call it an on-scale temperature. When units of on-scale temperature are converted, the data may require the addition of an offset as well as multiplication by a scale factor, because the physical meaning of a numerical value of zero for an on-scale temperature depends on the unit of measurement. On-scale temperature is unique among quantities in the respect that the origin and the unit of measurement are both defined by the units and therefore cannot be chosen independently. For all other quantities, the origin and the unit of measurement are independent. Converting the unit of measurement alone, without changing the origin, does not change the meaning of zero. For example (using bold to indicate a numerical data value), 0 kilogram is the same mass as 0 pound, and 0 seconds since 1970-1-1 means the same as 0 days since 1970-1-1, but 0 degC is not the same temperature as 0 degF (= -17.8 degC), because these two temperature units implicitly refer to measurement scales which have different origins.

On the other hand, when the temperature value is a temperature difference, which compares two on-scale temperatures with the same origin, the value of that origin is irrelevant as it cancels out when taking the difference. Therefore to convert the units of a temperature difference requires only multiplication by a scale factor, without the addition of an offset.

The units attribute does not distinguish between on-scale temperatures and temperature differences. This ambiguity also affects units of temperature raised to some power e.g. K^2 or multiplied by other units e.g. W m-2 K-1, degF/foot or degC m s-1. A standard_name (Section 3.3, "Standard Name") or standard_name modifier ([standard-name-modifiers]) may clarify the intention, but they are optional. Some statistical operations described by the cell_methods attribute ([cell-methods]; [appendix-cell-methods]) imply that temperature must be interpreted as temperature difference, but this attribute is optional too.

In order to convert the units correctly, it is essential to know whether a temperature is on-scale or a difference. Therefore this standard strongly recommends that any variable whose units involve a temperature unit should also have a units_metadata attribute to make the distinction. This attribute must have one of the following three values: temperature: on_scale, temperature: difference, temperature: unknown. The units_metadata attribute, standard_name modifier ([standard-name-modifiers]) and cell_methods attribute ([appendix-cell-methods]) must be consistent if present. A variable must not have a units_metadata attribute if it has no units attribute or if its units do not involve a temperature unit.

Example 3.1. Use of units_metadata to distinguish temperature quantities
variables:
  float Tonscale;
    Tonscale:long_name="global-mean surface temperature";
    Tonscale:standard_name="surface_temperature";
    Tonscale:units="degC";
    Tonscale:units_metadata="temperature: on_scale";
    Tonscale:cell_methods="area: mean";
  float Tdifference;
    Tdifference:long_name="change in global-mean surface temperature relative to pre-industrial";
    Tdifference:standard_name="surface_temperature";
    Tdifference:units="degC";
    Tdifference:units_metadata="temperature: difference";
    Tdifference:cell_methods="area: mean";

With temperature: unknown, correct conversion of the units cannot be guaranteed. This value of units_metadata indicates that the data-writer does not know whether the temperature is on-scale or a difference. If the units_metadata attribute is not present, the data-reader should assume temperature: unknown. The units_metadata attribute was introduced in CF 1.11. In data written according to versions before 1.11, temperature: unknown should be assumed for all units involving temperature, if it cannot be deduced from other metadata. We note (for guidance only regarding temperature: unknown, not as a CF convention) that the UDUNITS software assumes temperature: on_scale for units strings containing only a unit of temperature, and temperature: difference for units strings in which a unit of temperature is raised to any power other than unity, or multiplied or divided by any other unit.

With temperature: on_scale, correct conversion can be guaranteed only for pure temperature units. If the quantity is an on-scale temperature multiplied by some other quantity, it is not possible to convert the data from the units given to any other units that involve a temperature with a different origin, given only the units. For instance, when temperature is on-scale, a value in kg degree_C m-2 can be converted to a value in kg K m-2 only if we know separately the values in degree_C and kg m-2 of which it is the product.

Scale factors and offsets

UDUNITS recognises the SI prefixes shown in Prefixes for decimal multiples and submultiples of units for decimal multiples and submultiples of units, and allows them to be applied to non-SI units as well. UDUNITS offers a syntax for indicating arbitrary scale factors and offsets to be applied to a unit. (Note that this is different from the scale factors and offsets used for converting between units, as discussed for temperature in Section 3.1.2, "Temperature units".) This UDUNITS syntax for arbitrary transformation of units is not supported by the CF standard, except for the case of specifying reference time ([time-coordinate]). The application of any scale factors or offsets to data should be indicated by the scale_factor and add_offset attributes. Use of these attributes for data packing, which is their most important application, is discussed in detail in [packed-data].

Table 3.1. Prefixes for decimal multiples and submultiples of units
Factor Prefix Abbreviation Factor Prefix Abbreviation

1e1

deca,deka

da

1e-1

deci

d

1e2

hecto

h

1e-2

centi

c

1e3

kilo

k

1e-3

milli

m

1e6

mega

M

1e-6

micro

u

1e9

giga

G

1e-9

nano

n

1e12

tera

T

1e-12

pico

p

1e15

peta

P

1e-15

femto

f

1e18

exa

E

1e-18

atto

a

1e21

zetta

Z

1e-21

zepto

z

1e24

yotta

Y

1e-24

yocto

y

Long Name

The long_name attribute is defined by the NUG to contain a long descriptive name which may, for example, be used for labeling plots. For backwards compatibility with COARDS this attribute is optional. But it is highly recommended that either this or the standard_name attribute defined in the next section be provided for all data variables and variables containing coordinate data, in order to make the file self-describing. If a variable has no long_name attribute then an application may use, as a default, the standard_name if it exists, or the variable name itself.

Standard Name

A fundamental requirement for exchange of scientific data is the ability to describe precisely the physical quantities being represented. To some extent this is the role of the long_name attribute as defined in the NUG. However, usage of long_name is completely ad-hoc. For many applications it is desirable to have a more definitive description of the quantity, which allows users of data from different sources (some of which might be models and others observational) to determine whether quantities are in fact comparable. For this reason each variable may optionally be given a "standard name", whose meaning is defined by this convention. There may be several variables in a dataset with any given standard name, and these may be distinguished by other metadata, such as coordinates ([coordinate-types]) and cell_methods ([cell-methods]).

A standard name is associated with a variable via the attribute standard_name which takes a string value comprised of a standard name optionally followed by one or more blanks and a standard name modifier (a string value from [standard-name-modifiers]).

The set of permissible standard names is contained in the standard name table. The table entry for each standard name contains the following:

standard name

The name used to identify the physical quantity. A standard name contains no whitespace and is case sensitive.

canonical units

Representative units of the physical quantity. Unless it is dimensionless, a variable with a standard_name attribute must have units which are physically equivalent (not necessarily identical) to the canonical units, possibly modified by an operation specified by the standard name modifier (see below and [standard-name-modifiers]) or by the cell_methods attribute (see [cell-methods] and [appendix-cell-methods]) or both.

description

The description is meant to clarify the qualifiers of the fundamental quantities such as which surface a quantity is defined on or what the flux sign conventions are. We don’t attempt to provide precise definitions of fundumental physical quantities (e.g., temperature) which may be found in the literature. The description may define rules on the variable type, attributes and coordinates which must be complied with by any variable carrying that standard name (such as in Example 3.5).

When appropriate, the table entry also contains the corresponding GRIB parameter code(s) (from ECMWF and NCEP) and AMIP identifiers.

The standard name table is located at https://cfconventions.org/Data/cf-standard-names/current/src/cf-standard-name-table.xml, written in compliance with the XML format, as described in [standard-name-table-format]. Knowledge of the XML format is only necessary for application writers who plan to directly access the table. A formatted text version of the table is provided at https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html, and this table may be consulted in order to find the standard name that should be assigned to a variable. Some standard names (e.g. region and area_type) are used to indicate quantities which are permitted to take only certain standard values. This is indicated in the definition of the quantity in the standard name table, accompanied by a list or a link to a list of the permitted values.

Standard names by themselves are not always sufficient to describe a quantity. For example, a variable may contain data to which spatial or temporal operations have been applied. Or the data may represent an uncertainty in the measurement of a quantity. These quantity attributes are expressed as modifiers of the standard name. Modifications due to common statistical operations are expressed via the cell_methods attribute (see [cell-methods] and [appendix-cell-methods]). Other types of quantity modifiers are expressed using the optional modifier part of the standard_name attribute. The permissible values of these modifiers are given in [standard-name-modifiers].

Example 3.2. Use of standard_name
float psl(lat,lon) ;
  psl:long_name = "mean sea level pressure" ;
  psl:units = "hPa" ;
  psl:standard_name = "air_pressure_at_sea_level" ;

The description in the standard name table entry for air_pressure_at_sea_level clarifies that "sea level" refers to the mean sea level, which is close to the geoid in sea areas.

Ancillary Data

When one data variable provides metadata about the individual values of another data variable it may be desirable to express this association by providing a link between the variables. For example, instrument data may have associated measures of uncertainty. The attribute ancillary_variables is used to express these types of relationships. It is a string attribute whose value is a blank separated list of variable names. The nature of the relationship between variables associated via ancillary_variables must be determined by other attributes. The variables listed by the ancillary_variables attribute will often have the standard name of the variable which points to them including a modifier ([standard-name-modifiers]) to indicate the relationship. The dimensions of an ancillary variable must be the same as or a subset of the dimensions of the variable to which it is related, but their order is not restricted, and with one exception: If an ancillary variable of a data variable that has been compressed by gathering ([compression-by-gathering]) does not span the compressed dimension, then its dimensions may be any subset of the data variable’s uncompressed dimensions, i.e. any of the dimensions of the data variable except the compressed dimension, and any of the dimensions listed by the compress attribute of the compressed coordinate variable.

Example 3.3. Ancillary instrument data
  float q(time) ;
    q:standard_name = "specific_humidity" ;
    q:units = "g/g" ;
    q:ancillary_variables = "q_error_limit q_detection_limit" ;
  float q_error_limit(time)
    q_error_limit:standard_name = "specific_humidity standard_error" ;
    q_error_limit:units = "g/g" ;
  float q_detection_limit(time)
    q_detection_limit:standard_name = "specific_humidity detection_minimum" ;
    q_detection_limit:units = "g/g" ;

Alternatively, ancillary_variables may be used as status flags indicating the operational status of an instrument producing the data or as quality flags indicating the results of a quality control test, or some other quantitative quality assessment, performed against the measurements contained in the source variable. In these cases, the flag variable will include a standard name that differs from that of the source variable and indicates the specific type of flag the variable represents.

The standard names table includes many names intended to be used in this situation, both general names meant to be used to flexibly represent any type of status or quality assessment, as well as names for specific quality control tests commonly applied to geophysical phenomena timeseries data. Several examples are listed below:

Sample flag variable standard names:
  • status_flag and quality_flag: general flag categories for instrument status or quality assessment

  • climatology_test_quality_flag, flat_line_test_quality_flag, gap_test_quality_flag, spike_test_quality_flag: a subset of standard name flags used to indicate the results of commonly-used geophysical timeseries data quality control tests (consult the standard names table for a full list of published flags)

  • aggregate_quality_flag: flag indicating an aggregate summary of all quality tests performed on the data variable, both automated and manual (i.e. a master quality flag for a particular variable)

The following example illustrates the use of three of these flags to represent two independent quality control tests and an aggregate flag that combines the results of the two tests.

Example 3.4. Ancillary quality flag data
float salinity(time, z);
        salinity:units = "1";
        salinity:long_name = "Salinity";
        salinity:standard_name = "sea_water_practical_salinity";
        salinity:ancillary_variables = "salinity_qc_generic salinity_qc_flat_line_test salinity_qc_agg";

    int salinity_qc_generic(time, z);
        salinity_qc_generic:long_name = "Salinity Generic QC Process Flag";
        salinity_qc_generic:standard_name = "quality_flag";

    int salinity_qc_flat_line_test(time, z);
        salinity_qc_flat_line_test:long_name = "Salinity Flat Line Test Flag";
        salinity_qc_flat_line_test:standard_name = "flat_line_test_quality_flag";

    int salinity_qc_agg(time, z);
        salinity_qc_agg:long_name = "Salinity Aggregate Flag";
        salinity_qc_agg:standard_name = "aggregate_quality_flag";

Note that the ancillary variables in this example are simplified to exclude flag_values, flag_masks and flag_meanings attributes described in Section 3.5, "Flags" that they would ordinarily require

Flags

The attributes flag_values, flag_masks and flag_meanings are intended to make variables that contain flag values self describing. Status codes and Boolean (binary) condition flags may be expressed with different combinations of flag_values and flag_masks attribute definitions.

The flag_values and flag_meanings attributes describe a status flag consisting of mutually exclusive coded values. The flag_values attribute is the same type as the variable to which it is attached, and contains a list of the possible flag values. The flag_meanings attribute is a string whose value is a blank separated list of descriptive words or phrases, one for each flag value. Each word or phrase should consist of characters from the alphanumeric set and the following five: '_', '-', '.', '+', '@'. If multi-word phrases are used to describe the flag values, then the words within a phrase should be connected with underscores. The following example illustrates the use of flag values to express a speed quality with an enumerated status code.

Example 3.5. A flag variable, using flag_values
  byte current_speed_qc(time, depth, lat, lon) ;
    current_speed_qc:long_name = "Current Speed Quality" ;
    current_speed_qc:standard_name = "status_flag" ;
    current_speed_qc:_FillValue = -128b ;
    current_speed_qc:valid_range = 0b, 2b ;
    current_speed_qc:flag_values = 0b, 1b, 2b ;
    current_speed_qc:flag_meanings = "quality_good sensor_nonfunctional
                                      outside_valid_range" ;

Note that the data variable containing current speed has an ancillary_variables attribute with a value containing current_speed_qc.

The flag_masks and flag_meanings attributes describe a number of independent Boolean conditions using bit field notation by setting unique bits in each flag_masks value. The flag_masks attribute is the same type as the variable to which it is attached, and contains a list of values matching unique bit fields. The flag_meanings attribute is defined as above, one for each flag_masks value. A flagged condition is identified by performing a bitwise AND of the variable value and each flag_masks value; a non-zero result indicates a true condition. Thus, any or all of the flagged conditions may be true, depending on the variable bit settings. The following example illustrates the use of flag_masks to express six sensor status conditions.

Example 3.6. A flag variable, using flag_masks
  byte sensor_status_qc(time, depth, lat, lon) ;
    sensor_status_qc:long_name = "Sensor Status" ;
    sensor_status_qc:standard_name = "status_flag" ;
    sensor_status_qc:_FillValue = 0b ;
    sensor_status_qc:valid_range = 1b, 63b ;
    sensor_status_qc:flag_masks = 1b, 2b, 4b, 8b, 16b, 32b ;
    sensor_status_qc:flag_meanings = "low_battery processor_fault
                                      memory_fault disk_fault
                                      software_fault
                                      maintenance_required" ;

A variable with standard name of region, area_type or any other standard name which requires string-valued values from a defined list may use flags together with flag_values and flag_meanings attributes to record the translation to the string values. The following example illustrates this using integer flag values for a variable with standard name region and flag_values selected from the standardized region names (see section 6.1.1).

Example 3.7. A region variable, using flag_values
int basin(lat, lon);
       standard_name: region;
       flag_values: 1, 2, 3;
       flag_meanings:"atlantic_arctic_ocean indo_pacific_ocean global_ocean";
data:
   basin: 1, 1, 1, 1, 2, ..... ;

The flag_masks, flag_values and flag_meanings attributes, used together, describe a blend of independent Boolean conditions and enumerated status codes. The flag_masks and flag_values attributes are both the same type as the variable to which they are attached. A flagged condition is identified by a bitwise AND of the variable value and each flag_masks value; a result that matches the flag_values value indicates a true condition. Repeated flag_masks define a bit field mask that identifies a number of status conditions with different flag_values. The flag_meanings attribute is defined as above, one for each flag_masks bit field and flag_values definition. Each flag_values and flag_masks value must coincide with a flag_meanings value. The following example illustrates the use of flag_masks and flag_values to express two sensor status conditions and one enumerated status code.

Example 3.8. A flag variable, using flag_masks and flag_values
  byte sensor_status_qc(time, depth, lat, lon) ;
    sensor_status_qc:long_name = "Sensor Status" ;
    sensor_status_qc:standard_name = "status_flag" ;
    sensor_status_qc:_FillValue = 0b ;
    sensor_status_qc:valid_range = 1b, 15b ;
    sensor_status_qc:flag_masks = 1b, 2b, 12b, 12b, 12b ;
    sensor_status_qc:flag_values = 1b, 2b, 4b, 8b, 12b ;
    sensor_status_qc:flag_meanings =
         "low_battery
          hardware_fault
          offline_mode calibration_mode maintenance_mode" ;

In this case, mutually exclusive values are blended with Boolean values to maximize use of the available bits in a flag value. The table below represents the four binary digits (bits) expressed by the sensor_status_qc variable in the previous example.

Bit 0 and Bit 1 are Boolean values indicating a low battery condition and a hardware fault, respectively. The next two bits (Bit 2 and Bit 3) express an enumeration indicating abnormal sensor operating modes. Thus, if Bit 0 is set, the battery is low and if Bit 1 is set, there is a hardware fault - independent of the current sensor operating mode.

Table 3.2. Flag Variable Bits (from Example)
Bit 3 (MSB) Bit 2 Bit 1 Bit 0 (LSB)

H/W Fault

Low Batt

The remaining bits (Bit 2 and Bit 3) are decoded as follows:

Table 3.3. Flag Variable Bit 2 and Bit 3 (from Example)
Bit 3 Bit 2 Mode

0

1

offline_mode

1

0

calibration_mode

1

1

maintenance_mode

The "12b" flag mask is repeated in the sensor_status_qc flag_masks definition to explicitly declare the recommended bit field masks to repeatedly AND with the variable value while searching for matching enumerated values. An application determines if any of the conditions declared in the flag_meanings list are true by simply iterating through each of the flag_masks and AND’ing them with the variable. When a result is equal to the corresponding flag_values element, that condition is true. The repeated flag_masks enable a simple mechanism for clients to detect all possible conditions.