3. Data pre-processing

Most datasets can be processed with the esd package, for example by extracting a sub-sample or subset (subset()), aggregating in time and space (aggregate() and aggregate.area()), computing anomalies and climatologies (anomaly()), performing an empirical orthogonal function analysis (EOF()), or performing a principal component analysis (PCA()), depending on the type of the input object. The main functionalities for pre-processing the data are listed below.

Sub-setting

To extract a subset of the data in time and/or space, the generic function subset() has been extended in esd.

> subset(x,it,is)

where it and is are time and space indexes, respectively.
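
For example, a minimal sketch subsetting an ERA-Interim 2m temperature field (the domain and period are illustrative, and passing is as a list of longitude/latitude ranges is an assumption about the accepted formats):

# Example (sketch): subset a field in time and space
t2m <- t2m.ERAINT(lon=c(-30,50),lat=c(40,80))        # retrieve the field
t2m.sub <- subset(t2m,it=c(1981,2010),               # years 1981-2010
                  is=list(lon=c(0,20),lat=c(55,65))) # assumed lon/lat window format
map(t2m.sub)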

Aggregating

To compute aggregates of the dataset in time or space, use the following commands.

In time, use

> aggregate(x,it,is,...)

where it and is are time and space indexes, respectively. Note that other derivative or conversion tools based on aggregate() have also been implemented, such as as.monthly(), as.4seasons(), and as.annual(), which compute monthly, seasonal, and annual aggregates, respectively.
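
For instance, a minimal sketch using these conversion tools (the retrieval call and domain are illustrative; the default aggregation statistic is assumed to be the mean):

# Example (sketch): temporal aggregation of a field
t2m <- t2m.ERAINT(lon=c(-30,50),lat=c(40,80))
t2m.mon <- as.monthly(t2m)   # monthly aggregates
t2m.sea <- as.4seasons(t2m)  # seasonal aggregates (DJF, MAM, JJA, SON)
t2m.ann <- as.annual(t2m)    # annual aggregates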

In space, use

> aggregate.area(x,it,is,...)

where it and is are time and space indexes, respectively.
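
A minimal sketch of area aggregation into a single time series (the FUN argument used to select the statistic is an assumption):

# Example (sketch): spatial aggregation of a field into an area mean
t2m <- t2m.ERAINT(lon=c(-30,50),lat=c(40,80))
t2m.avg <- aggregate.area(t2m,FUN='mean')  # assumed area-mean statistic
plot(t2m.avg)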

Anomaly

To compute the anomaly of an object x, use

> anomaly(x,ref,...)

where ref is the reference period; if it is not specified, the whole time period covered by the dataset is used by default. The ... are additional arguments to be passed to the function according to the type of the x object.
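
For example, a sketch of computing anomalies relative to 1981-2010 (passing ref as a vector of years is an assumption about the expected format):

# Example (sketch): anomalies relative to a reference period
t2m <- t2m.ERAINT(lon=c(-30,50),lat=c(40,80))
t2m.anom <- anomaly(t2m,ref=1981:2010)  # assumed format: vector of reference years
map(t2m.anom)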

Regridding

For any gridding, regridding, or grid transformation, i.e. from higher to lower resolution or vice versa, use

> regrid(x,it,is,...)

where it and is are time and space indexes, respectively, and ... are additional arguments to be passed to the function according to the type of the x object. Note that is can take several formats; for instance, another field object can be passed as is so that its grid is used as the target grid, as in the example below.

# Example: regrid MERRA 2m temperature onto the ERA-Interim grid
eraint <- t2m.ERAINT(lon=c(-30,50),lat=c(40,80))  # ERA-Interim t2m over Europe
merra <- t2m.MERRA(lon=c(-30,50),lat=c(40,80))    # MERRA t2m over the same domain
merra.new <- regrid(merra,is=eraint,verbose=TRUE) # regrid MERRA onto the ERA-Interim grid
map(merra)
title("Original field")
x11() ; map(merra.new)                            # open a new device and map the result
title("Regridded field")

Empirical Orthogonal Functions for field objects (EOF())

To compute the empirical orthogonal functions of a field object, use

> EOF(x,it,is,...)

where it and is are time and space indexes, respectively, and ... are additional arguments to be passed to the function according to the type of the x object.
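
For example, a minimal sketch of an EOF analysis of winter (DJF) mean 2m temperature, reusing the as.4seasons()/subset() steps from the worked example below:

# Example (sketch): EOF analysis of a seasonal field
t2m <- t2m.ERAINT(lon=c(-30,50),lat=c(40,80))
djf <- subset(as.4seasons(t2m),it=1)  # winter (DJF) season
eof <- EOF(djf)                       # empirical orthogonal functions
plot(eof)                             # plot the result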

Principal Component Analysis for station objects (PCA())

To perform a principal component analysis on a station object (typically holding several station records), use

> PCA(x,it,is,...)

where it and is are time and space indexes, respectively, and ... are additional arguments to be passed to the function according to the type of the x object.
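
A minimal sketch, assuming y is a station object holding several records (e.g. constructed with as.station(), as in the worked example below):

# Example (sketch): PCA of a group of station records
# y is assumed to be a 'station' object containing several stations
pca <- PCA(y)  # principal component analysis
plot(pca)      # plot the result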

Example: comparing ERA40 with the NCEP-NCAR reanalysis 2m temperature for winter (DJF)

Read the ERA40 and NCEP-NCAR datasets using retrieve(), e.g.

> ncep <- retrieve('~/data/ncep/air.mon.mean.nc',lon=c(-40,40),lat=c(40,75))
> era40 <- retrieve('~/data/ERA40/era40_t2m.nc',lon=c(-40,40),lat=c(40,75))

Extract the common sub-period 1958-2002 as

> ncep <- subset(ncep,it=c(1958,2002))
> era40 <- subset(era40,it=c(1958,2002))

Aggregate to seasons and extract winter (DJF). Note that, in this example, NCEP contains monthly data whereas ERA40 contains daily data. This can be checked using

> class(era40)
> class(ncep)  

Convert from monthly to four seasons and extract the winter season (set it=1 or it='jan') as

> NCEP <- as.4seasons(ncep)
> DJF <- subset(NCEP,it=1)

Convert from daily to monthly, then to four seasons, and extract winter as

> Era40 <- as.monthly(era40)
> ERA40 <- as.4seasons(Era40)
> djf <- subset(ERA40,it=1)

Compute anomalies, perform spatial averaging, convert to station objects, and combine the two station records as

> djfs <- as.station(spatial.avg.field(anomaly(djf)))
> attr(djfs,"location") <- "ERA40"
> DJFs <- as.station(spatial.avg.field(anomaly(DJF)))
> attr(DJFs,"location") <- "NCEP-NCAR"
> com <- combine.stations(djfs,DJFs)

Finally, plot the result

> plot(com)

![](https://github.com/metno/esd/blob/master/img/era40vsncep.png)