Data model
The basic elements of the DP³ data model are entities (or objects). Each entity record (object instance) has a set of attributes. Each attribute has a value (associated with a particular entity), optionally accompanied by a timestamp (so that a history of previous values can be stored) and a confidence value.
There can also be relations between entities. A relation can have attributes associated with it as well.
TODO scheme
TODO make clear difference between entity type (object class) and entity (object instance), etc.
TODO example
There are three main types of attributes supported by DP³, each handled quite differently:
- Plain attributes
  - Common attributes with only one value of some data type.
  - No history is stored.
  - Confidence can be stored optionally.
- Observations
  - A history of attribute values is stored as tuples containing the value and observation time (or time interval), optionally with a confidence estimate.
  - A mechanism to derive the most probable value (and its confidence) of the attribute at any given time is provided.
  - These attributes may be single-value or multi-value.
  - TODO: describe multi-value
- Timeseries
  - Regular or irregular timeseries, i.e. a sequence of timestamped numerical data.
  - Multiple values per time instant are supported (multivariate timeseries).
  - Types of timeseries:
    - `regular` - regularly sampled timeseries, i.e. time is divided into intervals of a fixed length and exactly one value (or one set of values) is assigned to each interval. For example, a temperature measured every 5 minutes. If no data are received for an interval, it's filled with N/A (`nan`). (TODO make it configurable, zero or nan?)
    - `irregular` - irregularly sampled timeseries, i.e. a timestamp is explicitly attached to each value (or set of values) and the gaps between these timestamps are generally not equal.
    - `irregular_intervals` - same as `irregular`, but an interval (two timestamps) is attached to each value instead of a single timestamp. The intervals may overlap.

Each attribute is specified by the following set of parameters:
param | for types | data-type | default value | description |
---|---|---|---|---|
`id` | all | string (identifier) | (mandatory) | Short string identifying the attribute, its machine name (must match the regex `[a-zA-Z_][a-zA-Z0-9_-]*`; most importantly, it can't contain a dot). Lower-case only is recommended. TODO: maybe allow some special symbols as prefixes? |
`type` | all | string | (mandatory) | Type of attribute. One of `plain`, `observations` or `timeseries`. |
`name` | all | string | same as `id` | Attribute name for humans |
`description` | all | string | "" | Longer description of the attribute, if needed |
`color` | all | `#xxxxxx` | None | Color to use in GUI (useful mostly for tag values), not used currently |
`data_type` | plain/observations | string, one of the types below | (mandatory) | Data type of attribute value, see below for the list of supported data types |
`categories` | plain/observations | array of strings | None | List of categories if `data_type=category` and the set of possible values is known in advance and should be enforced. If not specified, any string can be stored as the attribute value, but only a small number of unique values is expected (which is important for display/search in GUI, for example) |
`confidence` | plain/observations | bool | false | Whether a confidence value should be stored along with the data value or not. |
`multi_value` | observations | bool | false | Whether multiple values can be set at the same time (can be enabled for all data types except "tag" and "binary") |
`history_params` | observations | object, see below | (mandatory) | History and time aggregation parameters. A subobject with fields described in the table below. |
`history_force_graph` | observations | bool | false | By default, if the data type of the attribute is array, its history is shown in the web interface as a table. This option forces a tag-like graph with the comma-joined values of that array as tags. |
`editable` | plain/observations | bool | false | Whether the value of this attribute is editable via the web interface. |
`timeseries_type` | timeseries | string | (mandatory) | One of: `regular`, `irregular` or `irregular_intervals` |
`timeseries_params` | timeseries | object, see below | None | History parameters for timeseries. A subobject with fields described in the table below. |
`series` | timeseries | object of objects, see below | (mandatory) | Configuration of series of data represented by this timeseries. |
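
To illustrate the parameters above, here is a minimal Python sketch that checks one attribute specification for its mandatory parameters. The function name `check_attr_spec`, the dict layout and the example attribute `device_type` are assumptions made for this illustration and are not part of DP³; only the parameter names, the allowed `type` values and the `id` regex are taken from the table.

```python
import re

# Regex for attribute ids, copied from the `id` row of the table above.
ATTR_ID_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_-]*")

# Parameters marked "(mandatory)" in the table, grouped by attribute type.
MANDATORY = {
    "plain": {"id", "type", "data_type"},
    "observations": {"id", "type", "data_type", "history_params"},
    "timeseries": {"id", "type", "timeseries_type", "series"},
}


def check_attr_spec(spec: dict) -> list:
    """Return a list of problems found in a (hypothetical) attribute spec dict."""
    attr_type = spec.get("type")
    if attr_type not in MANDATORY:
        return [f"unknown attribute type: {attr_type!r}"]
    missing = MANDATORY[attr_type] - set(spec)
    problems = [f"missing mandatory parameter: {name}" for name in sorted(missing)]
    attr_id = spec.get("id")
    if isinstance(attr_id, str) and not ATTR_ID_RE.fullmatch(attr_id):
        problems.append(f"invalid id: {attr_id!r}")
    return problems


# Hypothetical plain attribute with an enforced set of categories.
example = {
    "id": "device_type",
    "type": "plain",
    "data_type": "category",
    "categories": ["router", "server"],
}
print(check_attr_spec(example))
```

An empty list means that no problem was found. A real loader would also have to check data types and the nested `history_params`, `timeseries_params` and `series` objects.
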
param | type/format | default value | description |
---|---|---|---|
`max_age` | `<int><s/m/h/d>` (e.g. `30s`, `12h`, `7d`) | None | How many seconds/minutes/hours/days of history to keep (older data-points/intervals are removed). |
`max_items` | int (>0) | None | How many data-points/intervals to store (oldest ones are removed when the limit is exceeded). |
`expire_time` | `<int><s/m/h/d>` or `inf` | `inf` | How long after the end time (`t2`) the last value is considered valid (i.e. is used as the "current value"). Zero (`0`) means to strictly follow `t1`, `t2`. Zero can be specified without a unit (`s/m/h/d`). |

Note: At least one of `max_age` and `max_items` SHOULD be defined, otherwise the amount of stored data can grow unbounded.
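
The `<int><s/m/h/d>` format used by `max_age` and `expire_time` can be parsed as in the following minimal sketch. The helper `parse_duration` is hypothetical and not part of DP³. Representing `inf` as `None` and accepting a bare `0` for every parameter (the table only states this for `expire_time`) are assumptions of this example.

```python
import re
from datetime import timedelta
from typing import Optional

# Map the unit letters from the table to timedelta keyword arguments.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
_DURATION_RE = re.compile(r"(\d+)([smhd])")


def parse_duration(text: str, allow_inf: bool = False) -> Optional[timedelta]:
    """Parse '<int><s/m/h/d>' into a timedelta; None stands for 'inf'."""
    if allow_inf and text == "inf":
        return None
    if text == "0":  # zero may be written without a unit
        return timedelta(0)
    match = _DURATION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


print(parse_duration("12h"))
print(parse_duration("inf", allow_inf=True))
```
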
param | type/format | default value | description |
---|---|---|---|
`max_age` | `<int><s/m/h/d>` (e.g. `30s`, `12h`, `7d`) | None | How many seconds/minutes/hours/days of history to keep (older data-points/intervals are removed). |

Note: `max_age` SHOULD be defined, otherwise the amount of stored data can grow unbounded.
The key of each entry in the `series` object is its `id`, a short string identifying the series (e.g. `bytes`, `temperature`, `parcels`).
param | type/format | default value | description |
---|---|---|---|
`type` | string | (mandatory) | Data type of series. Only `int` and `float` are allowed (also `time`, but that's used internally, see below). |
A time series (axis) is added implicitly by DP³. Which one is added depends on the selected `timeseries_type`:
- regular: `"time": { "data_type": "time" }`
- irregular: `"time": { "data_type": "time" }`
- irregular_intervals: `"time_first": { "data_type": "time" }, "time_last": { "data_type": "time" }`
All data are written to DP³ in the form of data-points. A data-point sets the value of a given attribute of a given entity. It is a JSON-encoded object with the set of keys defined in the table below. The presence of some keys depends on the primary type of the attribute (plain/observations/timeseries).
key | description | data-type | required? | plain | observations | timeseries |
---|---|---|---|---|---|---|
`type` | Entity type | string | mandatory | ✔ | ✔ | ✔ |
`id` | Entity identification | string | mandatory | ✔ | ✔ | ✔ |
`attr` | Attribute name | string | mandatory | ✔ | ✔ | ✔ |
`v` | The value to set, depends on attr. type and data-type, see below | -- | mandatory | ✔ | ✔ | ✔ |
`t1` | Start time of the observation interval | string (RFC 3339 format) | mandatory | -- | ✔ | ✔ |
`t2` | End time of the observation interval | string (RFC 3339 format) | optional, default=`t1` | -- | ✔ | ✔ |
`c` | Confidence | float (0.0-1.0) | optional, default=1.0 | ✔ | ✔ | ✔ |
`src` | Identification of the information source | string | optional, default="" | ✔ | ✔ | ✔ |
More details depend on the particular type of the attribute ...
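
The key table above can be read as a small validation routine. The following sketch checks the mandatory keys and fills in the documented defaults. The function `normalize_datapoint` is hypothetical and only illustrates the table; it does not reflect how DP³ itself processes data-points. Leaving `t2` untouched for timeseries is a choice made for this example.

```python
# Keys that are mandatory for every attribute type.
COMMON_KEYS = ("type", "id", "attr", "v")


def normalize_datapoint(dp: dict, attr_type: str) -> dict:
    """Check mandatory keys and fill in defaults for one data-point.

    attr_type is the primary attribute type: "plain", "observations"
    or "timeseries".
    """
    missing = [key for key in COMMON_KEYS if key not in dp]
    if attr_type in ("observations", "timeseries") and "t1" not in dp:
        missing.append("t1")
    if missing:
        raise ValueError(f"missing mandatory keys: {missing}")

    out = dict(dp)  # do not modify the caller's object
    if attr_type == "observations":
        out.setdefault("t2", out["t1"])  # t2 defaults to t1
    out.setdefault("c", 1.0)   # confidence defaults to 1.0
    out.setdefault("src", "")  # source defaults to an empty string
    if not 0.0 <= out["c"] <= 1.0:
        raise ValueError("confidence must lie between 0.0 and 1.0")
    return out


dp = {"type": "ip", "id": "192.168.0.1", "attr": "note", "v": "My home router"}
print(normalize_datapoint(dp, "plain"))
```
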
TODO
Example:
{
"type": "ip",
"id": "192.168.0.1",
"attr": "note",
"v": "My home router",
"src": "web_gui"
}
TODO (existing data-points)
Example:
{
"type": "ip",
"id": "192.168.0.1",
"attr": "open_ports",
"v": [22, 80, 443],
"t1": "2022-08-01T12:00:00",
"t2": "2022-08-01T12:10:00",
"src": "open_ports_module"
}
Timeseries are sent to DP³ in "chunks", short timeseries that can later be joined together. Each chunk carries value(s) for one or more time instants.

The timeseries data-point looks like the other ones, but its value (`v`) is an object (dictionary) whose values are arrays containing the values of sub-series. All arrays must have the same length.

`t1` and `t2` of the data-point should specify the observation period covered by this chunk. All times within `v` must lie between `t1` and `t2`.

In the case of `irregular` (or `irregular_intervals`) timeseries, there are implicit `time` (`irregular`) or `time_first` and `time_last` (`irregular_intervals`) sub-series to store time information.

In regular timeseries, time is not passed explicitly. The first value of each sub-series is the value of the interval starting at `t1`, the second is the value of the next interval (starting at `t1 + time_step`), etc. If `t2` is given, it must be `t1 + n*time_step`, where `n` is the number of items in the sub-series (`t2` can be omitted, in which case it's computed automatically).

For regular timeseries, the intervals of individual chunks must not overlap. Any gaps between intervals will be filled by "N/A" values (or zeros, depending on configuration - TODO).
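
The relation between `t1`, `t2`, `time_step` and the number of values in a regular chunk can be sketched as follows. The function `regular_chunk_end` is a hypothetical helper written for this illustration; `time_step` is assumed to be available as a `timedelta`.

```python
from datetime import datetime, timedelta
from typing import Optional


def regular_chunk_end(t1: datetime, time_step: timedelta, v: dict,
                      t2: Optional[datetime] = None) -> datetime:
    """Return the end time of a regular chunk, validating t2 if it is given."""
    if not v:
        raise ValueError("chunk contains no sub-series")
    lengths = {len(series) for series in v.values()}
    if len(lengths) != 1:
        raise ValueError("all sub-series must have the same length")
    n = lengths.pop()
    expected = t1 + n * time_step  # t2 = t1 + n*time_step
    if t2 is not None and t2 != expected:
        raise ValueError(f"t2 must be {expected.isoformat()}")
    return expected


end = regular_chunk_end(
    datetime(2022, 8, 1, 12, 0, 0),
    timedelta(minutes=5),
    {"a": [1, 3, 0, 2]},
)
print(end.isoformat())  # 2022-08-01T12:20:00
```

Four values with a 5-minute step cover 20 minutes, so the computed end time is 12:20:00.
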
Example of regularly sampled timeseries:
{
...
"t1": "2022-08-01T12:00:00",
"t2": "2022-08-01T12:20:00", // assuming time_step = 5 min
"v": {
"a": [1, 3, 0, 2]
}
}
In irregular time-series, timestamps must always be present.
Example of irregular timeseries:
{
...
"t1": "2022-08-01T12:00:00",
"t2": "2022-08-01T12:05:00",
"v": {
"time": ["2022-08-01T12:00:00", "2022-08-01T12:01:10", "2022-08-01T12:01:15", "2022-08-01T12:03:30"],
"x": [0.5, 0.8, 1.2, 0.7],
"y": [-1, 3, 0, 0]
}
}
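
The constraints on an irregular chunk (a `time` sub-series is present, all arrays have equal length, all timestamps lie between `t1` and `t2`) can be checked as in the sketch below. `check_irregular_chunk` is a hypothetical helper, not a DP³ function. Parsing timestamps with `datetime.fromisoformat` is an assumption that works for the timestamps shown here; full RFC 3339 input may need a dedicated parser.

```python
from datetime import datetime


def check_irregular_chunk(dp: dict) -> None:
    """Raise ValueError if an irregular timeseries data-point is inconsistent."""
    v = dp["v"]
    if "time" not in v:
        raise ValueError("irregular timeseries need a 'time' sub-series")
    if len({len(series) for series in v.values()}) != 1:
        raise ValueError("all sub-series must have the same length")
    t1 = datetime.fromisoformat(dp["t1"])
    t2 = datetime.fromisoformat(dp["t2"])
    for stamp in v["time"]:
        if not t1 <= datetime.fromisoformat(stamp) <= t2:
            raise ValueError(f"timestamp {stamp} lies outside [t1, t2]")


chunk = {
    "t1": "2022-08-01T12:00:00",
    "t2": "2022-08-01T12:05:00",
    "v": {
        "time": ["2022-08-01T12:00:00", "2022-08-01T12:01:10",
                 "2022-08-01T12:01:15", "2022-08-01T12:03:30"],
        "x": [0.5, 0.8, 1.2, 0.7],
        "y": [-1, 3, 0, 0],
    },
}
check_irregular_chunk(chunk)  # passes silently for the example above
```
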