
Titanlib documentation

Thomas Nipen edited this page May 11, 2022 · 14 revisions

Titanlib is a library of quality control functions. The library is written in C++, but automatic wrappers for python and R are created using SWIG. The examples shown on this wiki use the python and R interfaces.

Titanlib consists of a collection of quality control functions that can be applied to observations. These functions are listed in the API reference. This wiki describes the features of the latest released version of titanlib; the API reference, however, lists the functions for all versions of the API.

General concepts

Titanlib functions have different signatures, but in general have the following components:

  • Points object containing station position information
  • observation values
  • input parameters for the test
  • output parameters

As an example, consider the isolation check:

import titanlib

points = titanlib.Points(lats, lons, elevs)
flags = titanlib.isolation_check(points, observations, radius)

where lats, lons, and elevs are vectors with the stations' coordinates, observations is a vector of observation values, radius is a parameter required by the test, and flags is the vector of output flags from the test.
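To make the general shape concrete (positions and a parameter in, one flag per station out), here is a minimal pure-Python sketch of what an isolation check does. It is not titanlib's implementation: the name toy_isolation_check, the num_min parameter, and the use of kilometres for the radius are assumptions of this sketch, and isolation is judged from station positions alone (elevations and observation values are ignored).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def toy_isolation_check(lats, lons, radius_km, num_min=1):
    """Hypothetical stand-in, not titanlib's implementation.

    Returns one flag per station: 1 if the station has fewer than num_min
    other stations within radius_km, otherwise 0.
    """
    n = len(lats)
    if len(lons) != n:
        raise ValueError("lats and lons must have the same length")
    flags = []
    for i in range(n):
        # Count every other station inside the search radius.
        neighbours = sum(
            1
            for j in range(n)
            if j != i and haversine_km(lats[i], lons[i], lats[j], lons[j]) <= radius_km
        )
        flags.append(1 if neighbours < num_min else 0)
    return flags


# Two stations about 1 km apart and one roughly 1100 km away.
print(toy_isolation_check([60.0, 60.01, 70.0], [10.0, 10.0, 10.0], radius_km=5.0))  # [0, 0, 1]
```

Because the sketch compares every pair of stations, its cost grows quadratically with the number of stations; it is meant only to show the inputs and the meaning of the output flags.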

Station metadata

Titanlib uses a Points object to encapsulate the geographical positions of the observations. The object is created by passing in equally sized 1D vectors of latitude, longitude, and elevation. Latitude and longitude are in decimal degrees, with negative values for south and west. Elevations are in meters. Here is an example:

import titanlib

lats = [57.2, 50.5, 11.2]
lons = [10.3, 13.2, 29.5]
elevs = [50, 912, 31]
points = titanlib.Points(lats, lons, elevs)
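The conventions above (equally sized vectors, decimal degrees, negative values for south and west) can be checked before a Points object is built. The helper below is hypothetical and not part of titanlib; it also assumes longitudes lie in [-180, 180], which the text above does not state.

```python
def check_station_metadata(lats, lons, elevs):
    """Hypothetical helper: sanity-check inputs before building a Points object.

    Returns the number of stations if all checks pass.
    """
    if not (len(lats) == len(lons) == len(elevs)):
        raise ValueError("lats, lons and elevs must have the same length")
    for lat, lon in zip(lats, lons):
        # Decimal degrees; south and west are negative.
        if not -90.0 <= lat <= 90.0:
            raise ValueError(f"latitude out of range: {lat}")
        # Assumption of this sketch: longitudes in [-180, 180].
        if not -180.0 <= lon <= 180.0:
            raise ValueError(f"longitude out of range: {lon}")
    return len(lats)


n_stations = check_station_metadata([57.2, 50.5, 11.2], [10.3, 13.2, 29.5], [50, 912, 31])
print(n_stations)  # 3
```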

For gridded observation sets, the 2D grid needs to be flattened (serialised) into 1D vectors of latitude, longitude, and elevation, from which a Points object can be created. The observation values must be flattened in the same order.
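A minimal sketch of flattening a small, made-up grid with NumPy, assuming latitudes, longitudes, elevations, and values are stored as 2D arrays of the same shape. The resulting 1D vectors are what would be passed on to Points; the final reshape shows how a 1D flag vector maps back onto the grid.

```python
import numpy as np

# Hypothetical 2 x 3 grid: rows are latitudes, columns are longitudes.
grid_lons, grid_lats = np.meshgrid([10.0, 11.0, 12.0], [60.0, 61.0])
grid_elevs = np.array([[100.0, 150.0, 200.0],
                       [120.0, 170.0, 220.0]])
grid_values = np.array([[1.5, 2.0, 2.5],
                        [1.0, 1.8, 2.2]])

# Flatten every field with the same (row-major) ordering, so index k
# refers to the same grid cell in all four 1D vectors.
lats = grid_lats.ravel()
lons = grid_lons.ravel()
elevs = grid_elevs.ravel()
values = grid_values.ravel()

# A 1D vector of flags (one per grid cell) can be mapped back onto the grid.
flags = np.zeros(values.size, dtype=int)  # placeholder for a test's output
flag_grid = flags.reshape(grid_values.shape)

print(lats)  # [60. 60. 60. 61. 61. 61.]
print(lons)  # [10. 11. 12. 10. 11. 12.]
```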

Observation values

Observations are represented by a vector with the same length as the latitude, longitude, and elevation vectors in the Points object, where element i of each vector refers to the same station.

Parameters

Parameters can be either scalars or vectors. A scalar value applies to all observations. A vector parameter can have either length 1, in which case its value is broadcast to all observations, or the same length as the observations, in which case the parameter can vary from station to station. The latter is common when the observations come from different data sources with different characteristics.
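The broadcasting rule above can be spelled out as a small NumPy helper. expand_parameter is a hypothetical name used only for illustration and is not part of titanlib: a scalar or length-1 vector is repeated for every observation, a vector of matching length is used as-is, and any other length is rejected.

```python
import numpy as np


def expand_parameter(param, n_obs):
    """Hypothetical helper: return one parameter value per observation."""
    arr = np.atleast_1d(np.asarray(param, dtype=float))
    if arr.ndim != 1:
        raise ValueError("parameter must be a scalar or a 1D vector")
    if arr.size == 1:
        # Scalar or length-1 vector: broadcast to all observations.
        return np.full(n_obs, arr[0])
    if arr.size == n_obs:
        # One value per station, e.g. differing by data source.
        return arr
    raise ValueError(f"parameter has length {arr.size}, expected 1 or {n_obs}")


n_obs = 4
radius_all = expand_parameter(2500.0, n_obs)    # scalar
radius_one = expand_parameter([2500.0], n_obs)  # length 1, broadcast
radius_per_station = expand_parameter([2500.0, 2500.0, 5000.0, 5000.0], n_obs)  # two data sources
```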

Return values

Each function returns a vector of flags, where a value of 0 means the observation passed the test and a value of 1 means it was flagged. The output has one flag per input station.
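Because the flags line up one-to-one with the input observations, NumPy boolean indexing is a convenient way to separate observations that passed from those that were flagged. The values and flags below are made up for illustration.

```python
import numpy as np

values = np.array([12.1, -45.0, 13.4, 11.8])
flags = np.array([0, 1, 0, 0])  # hypothetical output of a test; 1 = flagged

assert flags.size == values.size  # one flag per input observation

passed = values[flags == 0]          # observations that passed
flagged_idx = np.flatnonzero(flags)  # positions of flagged observations
print(passed)       # [12.1 13.4 11.8]
print(flagged_idx)  # [1]
```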

Dataset interface

In a typical QC pipeline, a sequence of tests is run on the input data. After each test, only the observations that have not yet been flagged should be used in subsequent tests. Keeping track of which observations remain, and mapping each test's flags back onto the full set of observations, is tedious and error-prone. Titanlib therefore includes a dataset class that simplifies running multiple tests on the same data.
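The bookkeeping described above can be sketched in a few lines of NumPy: before each test, select the indices of observations that are still unflagged, run the test on that subset, and map the subset's flags back to positions in the full vector. run_sequentially, toy_range, and toy_outlier are hypothetical illustrations, not titanlib functions.

```python
import numpy as np


def run_sequentially(values, tests):
    """Hypothetical sketch: apply each test only to observations not flagged so far.

    Each test takes a 1D array of values and returns a same-length array
    of 0/1 flags.
    """
    values = np.asarray(values, dtype=float)
    flags = np.zeros(values.size, dtype=int)
    for test in tests:
        active = np.flatnonzero(flags == 0)  # indices still in play
        if active.size == 0:
            break  # everything is already flagged
        sub_flags = np.asarray(test(values[active]))
        flags[active[sub_flags == 1]] = 1  # map subset flags back to full-size indices
    return flags


def toy_range(v):
    # Flag values outside [-50, 50].
    return ((v < -50) | (v > 50)).astype(int)


def toy_outlier(v):
    # Flag values more than 10 units from the median of the values passed in.
    return (np.abs(v - np.median(v)) > 10).astype(int)


print(run_sequentially([1.0, 2.0, 999.0, 998.0, 14.0], [toy_range, toy_outlier]))  # [0 0 1 1 1]
```

The order matters here: run on the full data, toy_outlier would see a median of 14.0 and flag 1.0 and 2.0 while letting 14.0 pass, because the two gross errors at 999.0 and 998.0 drag the median upward.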

First, a dataset object is created. All titanlib functions can be run on this dataset. The functions are similar to the regular API, except that the station metadata and observations need not be specified, since they are stored in the dataset object.

The dataset object also keeps track of which observations have been flagged by earlier tests and only uses non-flagged observations in subsequent tests.

The final flags can be retrieved from the dataset object at the end.

Here is an example:

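import titanlib

dataset = titanlib.Dataset(lats, lons, elevs, values)
dataset.range_check(min, max)    # flag values outside [min, max]
dataset.isolation_check(radius)  # only observations not flagged so far are used
print(dataset.flags)             # combined flags from all tests run so far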