The Universal Device Management Interface (UDMI) provides a high-level specification for the management and operation of physical IoT systems. This data is typically exchanged with a cloud entity that can maintain a "digital twin" or "shadow device" in the cloud. Nominally meant for use with Google's Cloud IoT Core, as a schema it can be applied to any set of data or hosting setup. Additionally, the schema has provisions for basic telemetry ingestion, such as datapoint streaming from an IoT device.
By design, this schema is intended to be:
- Universal: Apply to all subsystems in a building, not a singular vertical solution.
- Device: Operations on an IoT device, a managed entity in physical space.
- Management: Focus on device management, rather than command & control.
- Interface: Define an interface specification, rather than a client-library or RPC mechanism.
See the associated UDMI Tech Stack for details about the transport mechanism outside of the core schema definition. For questions and discussion pertaining to this topic, please join/monitor the [email protected] email list.
The essence behind UDMI is an automated mechanism for IoT system management. Many current systems require direct-to-device access, such as through a web browser or telnet/ssh session. These techniques do not scale to robust managed ecosystems since they rely too heavily on manual operation (aren't automated), and increase the security exposure of the system (since they need to expose these management ports).
UDMI is intended to support a few primary use-cases:
- Telemetry Ingestion: Ingest device data points in a standardized format.
- On-Prem Actuation: Ability to effect on-prem device behavior.
- Device Testability: e.g. Trigger a fake alarm to test reporting mechanisms.
- Commissioning Tools: Streamline complete system setup and install.
- Operational Diagnostics: Make it easy for system operators to diagnose basic faults.
- Status and Logging: Report system operational metrics to hosting infrastructure.
- Key Rotation: Manage encryption keys and certificates in accordance with best practice.
- Credential Exchange: Bootstrap higher-layer authentication to restricted resources.
- Firmware Updates: Initiate, monitor, and track firmware updates across an entire fleet of devices.
- On-Prem Discovery: Enumerate on-prem devices to aid setup or anomaly detection.
- Gateway Proxy: Proxy data/connection for non-UDMI devices, allowing adaptation to legacy systems.
All these situations are conceptually about the management of devices, which is distinct from their control or operation. These concepts are similar to the management, control, and data planes of Software Defined Networks. Once operational, the system should be able to operate completely autonomously from the management capabilities, which are only required to diagnose or tweak system behavior.
In order to provide for management automation, UDMI strives for the following principles:
- Secure and Authenticated: Requires a properly secure and authenticated channel from the device to managing infrastructure.
- Declarative Specification: The schema describes the desired state of the system, relying on the underlying mechanisms to match actual state with desired state. This is conceptually similar to Kubernetes-style configuration files.
- Minimal Elegant Design: Initially underspecified, with an eye towards making it easy to add new capabilities in the future. It is easier to add something than it is to remove it.
- Reduced Choices: In the long run, choice leads to more work to implement, and more ambiguity. Strive towards having only one way of doing each thing.
- Structure and Clarity: This is not a "compressed" format and not designed for very large structures or high-bandwidth streams.
- Property Names: Uses snake_case convention for property names.
- Resource Names: Overall structure (when flattened to paths) follows the API Resource Names guideline.
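The two naming principles above can be illustrated with a minimal sketch. It assumes a message is available as a nested Python dict and that flattened paths are joined with `/`; the helper names (`flatten`, `invalid_property_names`), the regular expression, and the sample layout are hypothetical and not part of the UDMI schema.

```python
import re

# Assumed definition of snake_case: lowercase words joined by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def flatten(tree, prefix=""):
    """Yield (path, value) pairs for every leaf of a nested dict."""
    for key, value in tree.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def invalid_property_names(tree, prefix=""):
    """Return the paths of all keys that are not snake_case."""
    bad = []
    for key, value in tree.items():
        path = f"{prefix}/{key}" if prefix else key
        if not SNAKE_CASE.match(key):
            bad.append(path)
        if isinstance(value, dict):
            bad.extend(invalid_property_names(value, path))
    return bad


# Hypothetical structure for illustration only; the nesting is an assumption.
sample = {
    "system": {"min_loglevel": 300},
    "pointset": {"sample_rate_sec": 60, "sampleLimitSec": 5},
}
```

Here `invalid_property_names(sample)` would flag only the camelCase key, reported by its flattened path.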
Schemas are broken down into several top-level sub-schemas that are invoked for different aspects of device management:
- Device state (example), sent from device to cloud, defined by state.json. There is one current state per device, which is considered sticky until a new state message is sent. The state is composed of several subsections (e.g. system or pointset) that describe the relevant sub-state components.
- Device config (example), passed from cloud to device, defined by config.json. There is one active config per device, which is considered current until a new config is received.
- Message envelope (example) for server-side attributes of received messages, defined by envelope.json. This is automatically generated by the transport layer and is then available for server-side processing.
- Device metadata (example) stored in the cloud about a device, but not directly available to or on the device, defined by metadata.json. This is essentially a specification about how the device should be configured or expectations about what the device should be doing.
- Device properties (example) which is used for configuring a system (e.g., when registering a device with the cloud), defined by properties.json.
- Streaming device telemetry, which can take on several different forms, depending on the intended use.
  - Streaming pointset (example) from device to cloud, defined by pointset.json. pointset is used for delivering a set of data point telemetry.
  - Monitoring logentry messages (example) from devices, defined by logentry.json.
  - Local discovery messages (example) that show the results of local scans or probes to determine which devices are on the local network, defined by discovery.json.
A device client implementation will typically only be aware of the state, config, and one or more telemetry messages (e.g. pointset), while all others are meant for the supporting infrastructure. Additionally, the state and config parts are composed of several distinct subsections (e.g. system, pointset, or gateway) that relate to various bits of functionality.
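As a rough summary of the sub-schemas listed above, the following sketch records each one with its schema file and the direction it travels. The `Flow` categories, the `SubSchema` type, and the classification of envelope, metadata, and properties as infrastructure-only are this sketch's reading of the descriptions above, not definitions taken from the schema itself.

```python
from dataclasses import dataclass
from enum import Enum


class Flow(Enum):
    DEVICE_TO_CLOUD = "device_to_cloud"
    CLOUD_TO_DEVICE = "cloud_to_device"
    CLOUD_ONLY = "cloud_only"  # assumption: handled only by supporting infrastructure


@dataclass(frozen=True)
class SubSchema:
    name: str
    schema_file: str
    flow: Flow
    telemetry: bool = False


SUB_SCHEMAS = [
    SubSchema("state", "state.json", Flow.DEVICE_TO_CLOUD),
    SubSchema("config", "config.json", Flow.CLOUD_TO_DEVICE),
    SubSchema("envelope", "envelope.json", Flow.CLOUD_ONLY),
    SubSchema("metadata", "metadata.json", Flow.CLOUD_ONLY),
    SubSchema("properties", "properties.json", Flow.CLOUD_ONLY),
    SubSchema("pointset", "pointset.json", Flow.DEVICE_TO_CLOUD, telemetry=True),
    SubSchema("logentry", "logentry.json", Flow.DEVICE_TO_CLOUD, telemetry=True),
    SubSchema("discovery", "discovery.json", Flow.DEVICE_TO_CLOUD, telemetry=True),
]


def device_facing(schemas):
    """Names of the sub-schemas a device client would send or receive."""
    return [s.name for s in schemas if s.flow is not Flow.CLOUD_ONLY]
```

Under these assumptions, `device_facing(SUB_SCHEMAS)` yields state, config, and the three telemetry forms, matching the set a device client is expected to know about.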
To verify correct operation of a real system, follow the instructions outlined in the validator subsystem docs, which provides for a suitable communication channel. Additional sample messages are easy to include in the regression suite if there are new cases to test.
- See notes below about 'State status' fields.
- There is an implicit minimum update interval of one second applied to state updates, and it is considered an error to update device state more often than that.
- `last_config` should be the timestamp from the `timestamp` field of the last successfully parsed `config` message.
- `sample_rate_sec`: Sampling rate for the system, which should proactively send an update (e.g. pointset, logentry, discovery message) at this interval.
- `sample_limit_sec`: Minimum time between sample updates. Updates that happen faster than this time (e.g. due to cov events) should be coalesced so that only the most recent update is sent.
- `force_value`: Override value for a point to be used during diagnostics and diagnosis. Should override any operational values, but not override alarm conditions.
- `min_loglevel`: Indicates the minimum loglevel for reporting log messages below which log entries should not be sent. See note below for a description of the level value.
- See notes below about 'logentry entries' fields.
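The coalescing behavior described for `sample_limit_sec` can be sketched as a small rate limiter that keeps only the most recent pending update. This is one possible approach, not a prescribed implementation; the `Coalescer` class, its method names, and the injectable `clock` parameter are hypothetical.

```python
import time


class Coalescer:
    """Releases at most one update per min_interval_sec, keeping the newest."""

    def __init__(self, min_interval_sec, clock=time.monotonic):
        self.min_interval_sec = min_interval_sec
        self.clock = clock      # injectable so the behavior can be tested
        self.last_sent = None   # time the last update was released
        self.pending = None     # most recent update still waiting to be sent

    def offer(self, update):
        """Record an update; return it if it may be sent now, else None."""
        self.pending = update   # a newer update replaces any older pending one
        return self.poll()

    def poll(self):
        """Return the pending update once the minimum interval has elapsed."""
        if self.pending is None:
            return None
        now = self.clock()
        too_soon = (
            self.last_sent is not None
            and now - self.last_sent < self.min_interval_sec
        )
        if too_soon:
            return None
        update, self.pending = self.pending, None
        self.last_sent = now
        return update
```

A device would call `offer()` whenever a new sample is produced and `poll()` periodically to flush a held-back update. The same helper constructed with an interval of one second could also keep state updates within the implicit minimum update interval noted earlier.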
The State and Logentry messages both have `status` and `entries` sub-fields, respectively, that follow the same structure.
- State `status` entries represent 'sticky' conditions that persist until the situation is cleared, e.g. "device disconnected".
- A `statuses` entry is a map of 'sticky' conditions that are keyed on a value that can be used to manage updates by a particular (device dependent) subsystem.
- Logentry `entries` fields are transitory events that happen, e.g. "connection failed".
- The log `entries` field is an array that can be used to coalesce multiple log updates into one message.
- Config parse errors should be represented as a system-level device state `status` entry.
- The `message` field should be a one-line representation of the triggering condition.
- The `detail` field can be multi-line and include more detail, e.g. a complete program stack-trace.
- The `category` field is a device-specific representation of which sub-system the message comes from. In a Java environment, for example, it would be the fully qualified path name of the Class triggering the message.
- A `status` or `statuses` `timestamp` field should be the timestamp the condition was triggered, or most recently updated. It might be different than the top-level message `timestamp` if the condition is not checked often or is sticky until it's cleared.
- A logentry `entries` `timestamp` field is the time that the event occurred, which is potentially different than the top-level `timestamp` field (which is when the log was sent).
- The status `level` should conform to the numerical Stackdriver LogEntry levels. The `DEFAULT` value of 0 is not allowed (lowest value is 100, maximum 800).
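The field rules above can be made concrete with a short sketch that builds an entry, enforces the level range, and coalesces several entries into one logentry message while honoring `min_loglevel`. The function names, the exact dictionary layout, and the use of timestamp strings are assumptions made for illustration; the authoritative structure is defined by the schema files.

```python
MIN_LEVEL = 100  # lowest allowed level; the DEFAULT value of 0 is rejected
MAX_LEVEL = 800  # highest allowed level


def make_entry(message, category, level, timestamp, detail=None):
    """Build one status/log entry (hypothetical layout), checking the rules."""
    if not MIN_LEVEL <= level <= MAX_LEVEL:
        raise ValueError(f"level {level} is outside {MIN_LEVEL}..{MAX_LEVEL}")
    if "\n" in message:
        raise ValueError("message must be one line; put extra text in detail")
    entry = {
        "message": message,
        "category": category,
        "level": level,
        "timestamp": timestamp,  # when the event or condition occurred
    }
    if detail is not None:
        entry["detail"] = detail  # may be multi-line, e.g. a stack trace
    return entry


def build_logentry(entries, min_loglevel, sent_timestamp):
    """Coalesce entries at or above min_loglevel into a single message."""
    kept = [e for e in entries if e["level"] >= min_loglevel]
    if not kept:
        return None  # nothing worth sending
    # The top-level timestamp is when the log is sent, not when events occurred.
    return {"timestamp": sent_timestamp, "entries": kept}
```

Each entry keeps its own `timestamp`, so the time an event occurred survives even when several entries are sent together in one later message.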