Data model

DP³ data model

The basic elements of the DP³ data model are entities (or objects). Each entity record (object instance) has a set of attributes. Each attribute has a value (associated with a particular entity), optionally accompanied by a timestamp (so that a history of previous values can be stored) and a confidence value.

There can also be relations between entities. A relation can have attributes of its own.

TODO scheme

TODO make clear difference between entity type (object class) and entity (object instance), etc.

TODO example

Attributes

There are three main types of attributes supported by DP³, each handled quite differently:

Plain attributes
- Common attributes with only one value of some data type.
- No history is stored.
- Confidence can be stored optionally.
Observations
- A history of attribute values is stored as tuples containing the value and observation time (or time interval), optionally with confidence estimation.
- A mechanism to derive the most probable value (and its confidence) of the attribute at any given time is provided.
- These attributes may be single-valued or multi-valued.
  - TODO: describe multi-value
Timeseries
- Regular or irregular timeseries, i.e. a sequence of timestamped numerical data.
- Multiple values per time instant are supported (multivariate timeseries).
- Types of timeseries:
  - regular - regularly-sampled timeseries, i.e. time is divided into intervals of a fixed length and exactly one value (or one set of values) is assigned to each interval. For example, a temperature measured every 5 minutes. If no data are received for an interval, it's filled with N/A (nan). (TODO make it configurable, zero or nan?)
  - irregular - irregularly-sampled timeseries, i.e. a timestamp is explicitly attached to each value (or set of values) and the gaps between these timestamps are generally not equal.
  - irregular_intervals - same as irregular, but an interval (two timestamps) is attached to each value instead of a single timestamp. The intervals may overlap.

Configuration

Each attribute is specified by the following set of parameters:

param	for types	data-type	default value	description
`id`	all	string (identifier)	(mandatory)	Short string identifying the attribute, its machine name. Must match the regex `[a-zA-Z_][a-zA-Z0-9_-]*`; most importantly, it can't contain a dot. Lower-case only is recommended. TODO: maybe allow some special symbols as prefixes?
`type`	all	string	(mandatory)	Type of attribute. Can be either `plain`, `observations` or `timeseries`.
`name`	all	string	same as `id`	Attribute name for humans
`description`	all	string	""	Longer description of the attribute, if needed
`color`	all	`#xxxxxx`	None	Color to use in GUI (useful mostly for tag values), not used currently
`data_type`	plain/observations	string, one of the types below	(mandatory)	Data type of attribute value, see below for the list of supported data types
`categories`	plain/observations	array of strings	None	List of categories if `data_type=category` and the set of possible values is known in advance and should be enforced. If not specified, any string can be stored as attr value, but only a small number of unique values are expected (which is important for display/search in GUI, for example)
`confidence`	plain/observations	bool	false	Whether a confidence value should be stored along with data value or not.
`multi_value`	observations	bool	false	Whether multiple values can be set at the same time (can be enabled for all data types except "tag" and "binary")
`history_params`	observations	object, see below	(mandatory)	History and time aggregation parameters. A subobject with fields described in the table below.
`history_force_graph`	observations	bool	false	By default, if the data type of the attribute is array, its history is shown in the web interface as a table. This option forces a tag-like graph instead, with the comma-joined values of the array used as tags.
`editable`	plain/observations	bool	false	Whether value of this attribute is editable via web interface.
`timeseries_type`	timeseries	string	(mandatory)	One of: `regular`, `irregular` or `irregular_intervals`
`timeseries_params`	timeseries	object, see below	None	History parameters for timeseries. A subobject with fields described in the table below.
`series`	timeseries	object of objects, see below	(mandatory)	Configuration of series of data represented by this timeseries.
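
As a rough illustration of how the parameters in this table could be checked, the sketch below validates a single attribute specification held in a Python dict. The function name `check_attr_spec` and the dict layout are assumptions made for this example and are not part of DP³; only the `id` regex, the type names and the mandatory parameters are taken from the table.

```python
import re

# Regex from the `id` row of the table; fullmatch() anchors it at both ends.
ATTR_ID_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_-]*")

# Mandatory parameters per attribute type, as listed in the table.
MANDATORY = {
    "plain": ["id", "type", "data_type"],
    "observations": ["id", "type", "data_type", "history_params"],
    "timeseries": ["id", "type", "timeseries_type", "series"],
}


def check_attr_spec(spec: dict) -> list:
    """Return a list of problems found in a (hypothetical) attribute spec dict."""
    attr_type = spec.get("type")
    if attr_type not in MANDATORY:
        return [f"unknown type: {attr_type!r}"]
    problems = [
        f"missing mandatory parameter: {key}"
        for key in MANDATORY[attr_type]
        if key not in spec
    ]
    attr_id = spec.get("id")
    if isinstance(attr_id, str) and not ATTR_ID_RE.fullmatch(attr_id):
        problems.append(f"invalid id: {attr_id!r}")
    return problems


# Example: an id containing a dot is rejected.
print(check_attr_spec({"id": "bad.id", "type": "plain", "data_type": "category"}))
```

Returning a list of problems rather than raising on the first one makes it possible to report everything wrong with a specification at once.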

History params

param	type/format	default value	description
`max_age`	`<int><s/m/h/d>` (e.g. `30s`, `12h`, `7d`)	None	How many seconds/minutes/hours/days of history to keep (older data-points/intervals are removed).
`max_items`	int (>0)	None	How many data-points/intervals to store (oldest ones are removed when limit is exceeded).
`expire_time`	`<int><s/m/h/d>` or `inf`	inf	How long after the end time (`t2`) the last value is still considered valid (i.e. is used as the "current value"). Zero (`0`) means to strictly follow `t1,t2`. Zero can be specified without a unit (`s/m/h/d`).

Note: At least one of max_age and max_items SHOULD be defined, otherwise the amount of stored data can grow unbounded.
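
The `<int><s/m/h/d>` format could be parsed along the following lines. This is a minimal sketch: `parse_duration` is a hypothetical helper, not a DP³ function, and returning `None` for `inf` is a choice made for this example (per the table, `inf` applies to `expire_time` only).

```python
import re
from datetime import timedelta
from typing import Optional

# Unit letters from the table mapped to timedelta keyword arguments.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
_DURATION_RE = re.compile(r"(\d+)([smhd])")


def parse_duration(text: str) -> Optional[timedelta]:
    """Parse '<int><s/m/h/d>'; return None for 'inf' (no limit)."""
    if text == "inf":
        return None
    if text == "0":  # zero may be given without a unit
        return timedelta(0)
    match = _DURATION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


print(parse_duration("12h"))  # prints 12:00:00
```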

Timeseries params

param	type/format	default value	description
`max_age`	`<int><s/m/h/d>` (e.g. `30s`, `12h`, `7d`)	None	How many seconds/minutes/hours/days of history to keep (older data-points/intervals are removed).

Note: max_age SHOULD be defined, otherwise the amount of stored data can grow unbounded.

Series

The key of each entry in the series object is its id, a short string identifying the series (e.g. bytes, temperature, parcels).

param	type/format	default value	description
`type`	string	(mandatory)	Data type of series. Only `int` and `float` are allowed (also `time`, but that's used internally, see below).

A time series (axis) is added implicitly by DP³. Its form depends on the selected timeseries_type:

regular: "time": { "data_type": "time" }
irregular: "time": { "data_type": "time" }
irregular_intervals: "time_first": { "data_type": "time" }, "time_last": { "data_type": "time" }
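
To make the effect of the implicit series concrete, the sketch below merges them into a user-configured `series` object. The type names follow the values of the `timeseries_type` parameter; `full_series` and the constant name are hypothetical and only illustrate the list above, not how DP³ implements it.

```python
# Implicit time series per timeseries_type, copied from the list above.
IMPLICIT_TIME_SERIES = {
    "regular": {"time": {"data_type": "time"}},
    "irregular": {"time": {"data_type": "time"}},
    "irregular_intervals": {
        "time_first": {"data_type": "time"},
        "time_last": {"data_type": "time"},
    },
}


def full_series(timeseries_type: str, series: dict) -> dict:
    """Return the configured series together with the implicit time series."""
    merged = dict(IMPLICIT_TIME_SERIES[timeseries_type])
    merged.update(series)
    return merged


# Example: one configured series plus the two implicit interval bounds.
print(sorted(full_series("irregular_intervals", {"bytes": {"type": "int"}})))
```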

Data ingestion (datapoint API)

Data-points

All data are written to DP³ in the form of data-points. A data-point sets the value of a given attribute of a given entity. It is a JSON-encoded object with the set of keys defined in the table below. The presence of some keys depends on the primary type of the attribute (plain/observations/timeseries).

`key`	description	data-type	required?	plain	observations	timeseries
`type`	Entity type	string	mandatory	✔	✔	✔
`id`	Entity identification	string	mandatory	✔	✔	✔
`attr`	Attribute name	string	mandatory	✔	✔	✔
`v`	The value to set, depends on attr. type and data-type, see below	--	mandatory	✔	✔	✔
`t1`	Start time of the observation interval	string (RFC 3339 format)	mandatory	--	✔	✔
`t2`	End time of the observation interval	string (RFC 3339 format)	optional, default=`t1`	--	✔	✔
`c`	Confidence	float (0.0-1.0)	optional, default=1.0	✔	✔	✔
`src`	Identification of the information source	string	optional, default=""	✔	✔	✔
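
The key table above can be read as a small validation routine. The following sketch checks the mandatory keys and fills in the documented defaults. `normalize_datapoint` is a hypothetical name, and applying the `t2` default only to observations is an assumption of this example (for regular timeseries, `t2` is derived from the data instead).

```python
def normalize_datapoint(dp: dict, attr_type: str) -> dict:
    """Check mandatory keys and fill in defaults for a data-point dict."""
    required = ["type", "id", "attr", "v"]
    if attr_type in ("observations", "timeseries"):
        required.append("t1")  # t1 is not used for plain attributes
    missing = [key for key in required if key not in dp]
    if missing:
        raise ValueError(f"missing keys: {missing}")

    # Defaults from the table; keys present in dp take precedence.
    out = {"c": 1.0, "src": "", **dp}
    if attr_type == "observations":
        out.setdefault("t2", out["t1"])
    if not 0.0 <= out["c"] <= 1.0:
        raise ValueError("confidence must be within 0.0-1.0")
    return out


dp = {"type": "ip", "id": "192.168.0.1", "attr": "note", "v": "My home router"}
print(normalize_datapoint(dp, "plain"))
```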

Further details depend on the particular type of the attribute ...

Plain

TODO

Example:

{
  "type": "ip",
  "id": "192.168.0.1",
  "attr": "note",
  "v": "My home router",
  "src": "web_gui"
}

Observations

TODO (existing data-points)

Example:

{
  "type": "ip",
  "id": "192.168.0.1",
  "attr": "open_ports",
  "v": [22, 80, 443],
  "t1": "2022-08-01T12:00:00",
  "t2": "2022-08-01T12:10:00",
  "src": "open_ports_module"
}

Timeseries

Timeseries are sent to DP³ in "chunks", short timeseries that can later be joined together. Each chunk bears value(s) for one or more time instants.

The time-series datapoint looks like the other ones, but its value (v) is an object (dictionary) whose values are arrays containing values of sub-series.

All arrays must have the same length.

t1 and t2 of the data-point should specify the observation period covered by this chunk. All times within v must lie between t1 and t2.

In case of irregular (or irregular_intervals) timeseries, there are implicit time (irregular) or time_first and time_last (irregular_intervals) sub-series to store time information.

In regular time-series, time is not passed explicitly. The first value of each sub-series is the value of the interval starting at t1, the second is the value of the next interval (starting at t1 + time_step), etc. If t2 is given, it must equal t1 + n*time_step, where n is the number of items in the sub-series (t2 can be omitted, in which case it's computed automatically).

For regular timeseries, the intervals of individual chunks must not overlap. Any gaps between intervals will be filled by "N/A" values (or zeros, depending on configuration - TODO).
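
How gaps between non-overlapping chunks end up as N/A values can be pictured with the following sketch. It is an illustration only: `join_regular_chunks` is a hypothetical helper that assumes NaN as the fill value, a non-empty list of chunks, and chunk start times aligned to the `time_step` grid. It says nothing about how DP³ stores the data internally.

```python
import math
from datetime import datetime, timedelta


def join_regular_chunks(chunks: list, time_step: timedelta):
    """Join (t1, values) chunks of one sub-series, filling gaps with NaN."""
    chunks = sorted(chunks, key=lambda chunk: chunk[0])
    start: datetime = chunks[0][0]
    joined = []
    for t1, values in chunks:
        # Index of this chunk's first interval relative to the overall start.
        offset = (t1 - start) // time_step
        if offset < len(joined):
            raise ValueError("chunks overlap")
        joined.extend([math.nan] * (offset - len(joined)))  # fill the gap
        joined.extend(values)
    return start, joined


step = timedelta(minutes=5)
first = (datetime(2022, 8, 1, 12, 0), [1, 3])
later = (datetime(2022, 8, 1, 12, 20), [5])
print(join_regular_chunks([first, later], step)[1])  # two NaN values fill the gap
```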

Example of regularly sampled timeseries:

{
  ...
  "t1": "2022-08-01T12:00:00",
  "t2": "2022-08-01T12:20:00", // assuming time_step = 5 min
  "v": {
    "a": [1, 3, 0, 2]
  }
}
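
The rule `t2 = t1 + n*time_step` from the regular example above can be checked with a few lines of Python. This is a sketch under stated assumptions: `regular_intervals` is a hypothetical helper, timestamps are already parsed into `datetime` objects, and `time_step` is passed in explicitly.

```python
from datetime import datetime, timedelta


def regular_intervals(t1: datetime, time_step: timedelta, values: list, t2=None):
    """Pair each value with the start of its interval and verify or compute t2."""
    expected_t2 = t1 + len(values) * time_step
    if t2 is None:
        t2 = expected_t2  # t2 omitted: computed automatically
    elif t2 != expected_t2:
        raise ValueError(f"t2 must be {expected_t2.isoformat()}")
    starts = [t1 + i * time_step for i in range(len(values))]
    return list(zip(starts, values)), t2


pairs, t2 = regular_intervals(
    datetime(2022, 8, 1, 12, 0),
    timedelta(minutes=5),
    [1, 3, 0, 2],
    t2=datetime(2022, 8, 1, 12, 20),
)
print(t2.isoformat())  # prints 2022-08-01T12:20:00
```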

In irregular time-series, timestamps must always be present.

Example of irregular timeseries:

{
  ...
  "t1": "2022-08-01T12:00:00",
  "t2": "2022-08-01T12:05:00",
  "v": {
    "time": ["2022-08-01T12:00:00", "2022-08-01T12:01:10", "2022-08-01T12:01:15", "2022-08-01T12:03:30"],
    "x": [0.5, 0.8, 1.2, 0.7],
    "y": [-1, 3, 0, 0]
  }
}
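
The constraints on an irregular chunk (equal array lengths, a mandatory `time` sub-series, all times between t1 and t2) can be sketched as a simple check. `check_irregular_chunk` is a hypothetical name, and `datetime.fromisoformat` stands in for a full RFC 3339 parser here, which is enough for timestamps in the form used by the examples.

```python
from datetime import datetime


def check_irregular_chunk(dp: dict) -> None:
    """Raise ValueError if an irregular timeseries data-point is inconsistent."""
    v = dp["v"]
    if "time" not in v:
        raise ValueError("irregular timeseries need a 'time' sub-series")
    if len({len(array) for array in v.values()}) != 1:
        raise ValueError("all arrays must have the same length")
    t1 = datetime.fromisoformat(dp["t1"])
    t2 = datetime.fromisoformat(dp["t2"])
    for stamp in v["time"]:
        if not t1 <= datetime.fromisoformat(stamp) <= t2:
            raise ValueError(f"{stamp} lies outside t1..t2")


check_irregular_chunk({
    "t1": "2022-08-01T12:00:00",
    "t2": "2022-08-01T12:05:00",
    "v": {
        "time": ["2022-08-01T12:00:00", "2022-08-01T12:01:10"],
        "x": [0.5, 0.8],
    },
})  # passes silently
```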
