From 7c8beea8c2e1fab1a69a61e289eb1ebc918ae35e Mon Sep 17 00:00:00 2001 From: Ze Gan Date: Tue, 27 Aug 2024 06:22:24 +0000 Subject: [PATCH 01/13] init doc Signed-off-by: Ze Gan --- .../netlink_dma_channel.drawio.svg | 4 + doc/stream-telemetry/stream-telemetry-hld.md | 930 ++++++++++++++++++ 2 files changed, 934 insertions(+) create mode 100644 doc/stream-telemetry/netlink_dma_channel.drawio.svg create mode 100644 doc/stream-telemetry/stream-telemetry-hld.md diff --git a/doc/stream-telemetry/netlink_dma_channel.drawio.svg b/doc/stream-telemetry/netlink_dma_channel.drawio.svg new file mode 100644 index 0000000000..4bc2bdf783 --- /dev/null +++ b/doc/stream-telemetry/netlink_dma_channel.drawio.svg @@ -0,0 +1,4 @@ + + + +
[Figure: netlink_dma_channel.drawio.svg, text labels only. Components: ASIC, DMA Engine, Netlink Module (genetlink, family: sonic_stel, group: stats), Counter Syncd. IPFIX message: IPFIX header followed by snapshots, each holding observation time milliseconds plus port and queue stats (port 2, port 8, queue 1, queue 5); annotated with poll interval and chunk size. Ring buffer: size = cache size. IPFIX parser: registered IPFIX templates (ID 256, ID 257); drop if no template can be decided; otherwise convert to GNMI message.]
\ No newline at end of file diff --git a/doc/stream-telemetry/stream-telemetry-hld.md b/doc/stream-telemetry/stream-telemetry-hld.md new file mode 100644 index 0000000000..eb184cc7da --- /dev/null +++ b/doc/stream-telemetry/stream-telemetry-hld.md @@ -0,0 +1,930 @@ +# Stream telemetry high level design # +- [Stream telemetry high level design](#stream-telemetry-high-level-design) + - [Table of Content](#table-of-content) + - [Revision](#revision) + - [Scope](#scope) + - [Definitions/Abbreviations](#definitionsabbreviations) + - [Overview](#overview) + - [Requirements](#requirements) + - [Architecture Design](#architecture-design) + - [High-Level Design](#high-level-design) + - [Modules](#modules) + - [Netlink Module](#netlink-module) + - [Data format](#data-format) + - [IPFIX header](#ipfix-header) + - [IPFIX template](#ipfix-template) + - [IPFIX data](#ipfix-data) + - [Bandwidth Estimation](#bandwidth-estimation) + - [Config DB](#config-db) + - [STREAM\_TELEMETRY\_PROFILE](#stream_telemetry_profile) + - [STREAM\_TELEMETRY\_GROUP](#stream_telemetry_group) + - [StateDb](#statedb) + - [STREAM\_TELEMETRY\_SESSION](#stream_telemetry_session) + - [Work Flow](#work-flow) + - [SAI API](#sai-api) + - [Create HOSTIF object](#create-hostif-object) + - [Creating TAM transport object](#creating-tam-transport-object) + - [Creating TAM collector object](#creating-tam-collector-object) + - [Creating TAM report object](#creating-tam-report-object) + - [Creating TAM telemetry type object](#creating-tam-telemetry-type-object) + - [Creating TAM telemetry object](#creating-tam-telemetry-object) + - [Create TAM counter subscription objects](#create-tam-counter-subscription-objects) + - [Create TAM object](#create-tam-object) + - [Query IPFIX template](#query-ipfix-template) + - [Enable/Disable telemetry stream](#enabledisable-telemetry-stream) + - [Configuration and management](#configuration-and-management) + - [Manifest (if the feature is an Application 
Extension)](#manifest-if-the-feature-is-an-application-extension) + - [CLI/YANG model Enhancements](#cliyang-model-enhancements) + - [Config DB Enhancements](#config-db-enhancements) + - [Warmboot and Fastboot Design Impact](#warmboot-and-fastboot-design-impact) + - [Memory Consumption](#memory-consumption) + - [Restrictions/Limitations](#restrictionslimitations) + - [Testing Requirements/Design](#testing-requirementsdesign) + - [Unit Test cases](#unit-test-cases) + - [System Test cases](#system-test-cases) + - [Open/Action items - if any](#openaction-items---if-any) + + +## Table of Content + +### Revision + +### Scope + +This section describes the scope of this high-level design document in SONiC. + +### Definitions/Abbreviations + +This section covers the abbreviation if any, used in this high-level design document and its definitions. + +### Overview + +The purpose of this section is to give an overview of high-level design document and its architecture implementation in SONiC. + +### Requirements + +This section list out all the requirements for the HLD coverage and exemptions (not supported) if any for this design. + +### Architecture Design + +This section covers the changes that are required in the SONiC architecture. In general, it is expected that the current architecture is not changed. +This section should explain how the new feature/enhancement (module/sub-module) fits in the existing architecture. + +If this feature is a SONiC Application Extension mention which changes (if any) needed in the Application Extension infrastructure to support new feature. 
+ +``` mermaid + +--- +title: Stream telemetry architecture +--- +flowchart BT + subgraph Redis + config_db[(CONFIG_DB)] + state_db[(STATE_DB)] + end + + subgraph SONiC service + subgraph GNMI container + gnmi(GNMI server) + counter_syncd(Counter Syncd) + end + subgraph SWSS container + subgraph Orchagent + st_orch(Stream Telemetry Orch) + end + end + subgraph SYNCD container + syncd(Syncd) + end + end + + subgraph Linux Kernel + dma_engine(DMA Engine) + netlink_module(Netlink Module) + end + + asic[\ASIC\] + + config_db --STREAM_TELEMETRY_PROFILE + STREAM_TELEMETRY_GROUP--> st_orch + st_orch --SAI_OBJECT_TYPE_TAM_REPORT + SAI_OBJECT_TYPE_TAM_TEL_TYPE + SAI_OBJECT_TYPE_TAM_TRANSPORT + SAI_OBJECT_TYPE_TAM_COLLECTOR + SAI_OBJECT_TYPE_TAM_EVENT + SAI_OBJECT_TYPE_TAM--> syncd + syncd --IPFIX template--> st_orch + syncd --TAM configuration--> dma_engine + syncd --TAM configuration--> netlink_module + st_orch --STREAM_TELEMETRY_SESSION--> state_db + state_db --STREAM_TELEMETRY_SESSION--> counter_syncd + asic --counters--> dma_engine + dma_engine --IPFIX record--> netlink_module + netlink_module --IPFIX record--> counter_syncd + counter_syncd --telemetry message--> gnmi +``` + +### High-Level Design ### + +#### Modules #### + +##### Netlink Module ##### + +generic_netlink + +netlink configuration constants in /etc/sonic/constants.yml + +``` yaml +constants: + stream_telemetry: + genl_family: "sonic_stel" + genl_multicast_group: "stats" + +``` + +Ring buffer model + +Pin CPU? + +![netlink_dma_channel](netlink_dma_channel.drawio.svg) + +#### Data format #### + +We use IPFIX as the report format. All numbers in an IPFIX message are in network byte order (big-endian). 
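As a minimal sketch of what network byte order means for a consumer of these messages (for example counter syncd), the helpers below read big-endian integers from a raw byte buffer without depending on host endianness. The function names `read_be16`, `read_be32` and `read_be64` are hypothetical and are not part of this design or of any SONiC library.

```cpp
#include <cstdint>

// Hypothetical helpers (not from the design): decode unsigned integers that
// are stored in network byte order (big-endian) in a raw byte buffer.
// They work on any host, because they never reinterpret memory directly.
inline uint16_t read_be16(const uint8_t* p) {
    return static_cast<uint16_t>((static_cast<uint16_t>(p[0]) << 8) | p[1]);
}

inline uint32_t read_be32(const uint8_t* p) {
    return (static_cast<uint32_t>(p[0]) << 24) |
           (static_cast<uint32_t>(p[1]) << 16) |
           (static_cast<uint32_t>(p[2]) << 8) |
           static_cast<uint32_t>(p[3]);
}

inline uint64_t read_be64(const uint8_t* p) {
    // The first four bytes are the most significant half.
    return (static_cast<uint64_t>(read_be32(p)) << 32) | read_be32(p + 4);
}
```

With these helpers, the first two bytes of an IPFIX message decode to the version `0x000a` and the next two bytes to the message length.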
+ +References for IPFIX: + +- [Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information](https://datatracker.ietf.org/doc/html/rfc7011) +- [IP Flow Information Export (IPFIX) Entities](https://www.iana.org/assignments/ipfix/ipfix.xhtml) + +##### IPFIX header ##### + +``` mermaid + +--- +title: stream message IPFIX header +--- +packet-beta +0-15: "Version = 0x000a" +16-31: "Message Length = (16 + payload) bytes" +32-63: "Export Timestamp: Second" +64-95: "Sequence Number: starts from 0, incremental sequence counter modulo 2^32" +96-127: "Observation Domain ID = 0, always 0" + +``` + +##### IPFIX template ##### + +``` mermaid + +--- +title: stream message of IPFIX template +--- +packet-beta +0-15: "Set ID = 2" +16-31: "Set Length = (12 + Number of Element * 8) bytes" +32-47: "Template ID >= 256, configured" +48-63: "Number of Fields = 1 + Number of stats" +64-79: "Element ID=observationTimeMilliseconds (324)" +80-95: "Field length = 8 bytes" +96-96: "1" +97-111: "Element ID = SAI STATS ID 1" +112-127: "Field Length = 0 or 8 bytes" +128-159: "Enterprise Number = SAI TYPE ID 1" +160-191: "..." +192-192: "1" +193-207: "Element ID = SAI STATS ID N" +208-223: "Field Length = 0 or 8 bytes" +224-255: "Enterprise Number = SAI TYPE ID N" + +``` + +- For high-frequency counters, the native IPFIX export timestamp has a resolution of one second, which cannot meet our requirement. So we introduce an extra element, `observationTimeMilliseconds`, for each record. +- Normally, we use the [SAI_OBJECT_TYPE](https://github.com/opencomputeproject/SAI/blob/master/inc/saitypes.h) directly as the IPFIX enterprise number and derive the IPFIX element ID from the SAI stats ID by OR-ing it with `0x8000`, which sets the enterprise bit. +For example, for `SAI_QUEUE_STAT_WRED_ECN_MARKED_PACKETS=0x00000022` of `SAI_OBJECT_TYPE_QUEUE = 21`, the enterprise number is `0x00000015` and the element ID is `0x8022`. 
+- We currently don't support stats IDs that exceed `65535`, because the IPFIX specification limits the element ID to two bytes. If larger stats IDs are needed in the future, we will need to extend IPFIX with a private encoding that uses 8-byte element IDs. +- For flexibility and efficiency, this system supports partial telemetry: it only reports stats from the selected objects (ports, queues and so on). For example, if we only configure stats reporting on Ethernet2 and Ethernet8, the report data only includes stats from these two ports, even though the switch has 256 ports. To achieve this, the selection information is encoded into the IPFIX template via the field length. **The template should include ALL objects for the stats. If an object is selected, the length of the corresponding field is 8; otherwise it is 0.** +For example, if the switch has 8 ports but we only want `SAI_PORT_STAT_IF_IN_OCTETS = 0` on Ethernet2 and Ethernet5, the template will look like: + +``` + +0...31 +|Set ID = 2|Set Length = 76<12+8*8>| +|Template ID = 256|Number of Fields = 9| +|Type = 324|Field Length = 8| +|1|Element ID = 0|Field Length = 0| +|Enterprise Number = 0| +|1|Element ID = 0|Field Length = 0| +|Enterprise Number = 0| +|1|Element ID = 0|Field Length = 8| +|Enterprise Number = 0| +|1|Element ID = 0|Field Length = 0| +|Enterprise Number = 0| +|1|Element ID = 0|Field Length = 0| +|Enterprise Number = 0| +|1|Element ID = 0|Field Length = 8| +|Enterprise Number = 0| +|1|Element ID = 0|Field Length = 0| +|Enterprise Number = 0| +|1|Element ID = 0|Field Length = 0| +|Enterprise Number = 0| + +``` + +##### IPFIX data ##### + +An IPFIX data message has a two-level hierarchy: chunks and snapshots. A chunk contains a number of snapshots, and a snapshot is a binary block that can be interpreted with the IPFIX template mentioned above. 
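To make the template layout above concrete, the following is a minimal sketch of how a receiver might walk a template set and work out which fields are selected. It assumes one template record per set and an 8-byte `observationTimeMilliseconds` field, as in the template diagram. The names `FieldSpec`, `ParsedTemplate`, `parse_template_set` and `element_id_from_stat` are hypothetical and are not taken from SAI or SONiC.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One field specifier of an IPFIX template record (RFC 7011 field specifier format).
struct FieldSpec {
    uint16_t element_id;  // 15-bit element ID with the enterprise bit removed
    uint16_t length;      // field length in bytes; 0 means "object not selected" in this design
    uint32_t enterprise;  // enterprise number, 0 if the enterprise bit is not set
};

struct ParsedTemplate {
    uint16_t template_id = 0;
    std::vector<FieldSpec> fields;
    size_t record_length = 0;  // sum of all field lengths in one snapshot
};

// Derive the on-wire element ID from a SAI stats ID by setting the enterprise bit.
constexpr uint16_t element_id_from_stat(uint16_t sai_stat_id) {
    return static_cast<uint16_t>(sai_stat_id | 0x8000);
}

static uint16_t be16(const uint8_t* p) { return static_cast<uint16_t>((p[0] << 8) | p[1]); }
static uint32_t be32(const uint8_t* p) { return (uint32_t(be16(p)) << 16) | be16(p + 2); }

// Parse one template set (Set ID = 2) that holds a single template record.
// Returns false if the buffer is truncated or is not a template set.
bool parse_template_set(const std::vector<uint8_t>& buf, ParsedTemplate& out) {
    if (buf.size() < 8 || be16(&buf[0]) != 2) return false;
    const size_t set_len = be16(&buf[2]);
    if (set_len < 8 || set_len > buf.size()) return false;

    ParsedTemplate t;
    t.template_id = be16(&buf[4]);
    const uint16_t field_count = be16(&buf[6]);
    size_t off = 8;
    for (uint16_t i = 0; i < field_count; ++i) {
        if (off + 4 > set_len) return false;
        const uint16_t raw_id = be16(&buf[off]);
        FieldSpec f{static_cast<uint16_t>(raw_id & 0x7fff), be16(&buf[off + 2]), 0};
        off += 4;
        if (raw_id & 0x8000) {  // enterprise bit set: a 4-byte enterprise number follows
            if (off + 4 > set_len) return false;
            f.enterprise = be32(&buf[off]);
            off += 4;
        }
        t.record_length += f.length;
        t.fields.push_back(f);
    }
    out = t;
    return true;
}
```

For the 8-port example template, the parser yields 9 fields, of which only the timestamp and the two selected ports have a non-zero length.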
+ +The binary block of a snapshot is as follows: + +``` mermaid + +--- +title: stream message of IPFIX data +--- +packet-beta +0-15: "Set ID = Same as template ID" +16-31: "Set Length = (8 + Number of valid stats * 8) bytes" +32-63: "Record 1: observationTimeMilliseconds" +64-95: "Record 2: Stats 1" +96-127: "..." +128-159: "Record N + 1: Stats N" + +``` + +- The chunk size can be configured via SAI. +- The snapshot structure is derived from the IPFIX template, which in turn is derived from the stats we want to record. + +The following is an example IPFIX message for the same stats as the IPFIX template example, with a chunk size of 3: + +``` + +0...31 +|Version = 0x000a|Message Length = 64| +|Export Timestamp = 2024-08-29 20:30:60| +|Sequence Number = 1| +|Observation Domain ID = 0| +|Set ID = 256|Set Length = 24| +|Record 1: observationTimeMilliseconds = 100| +|Record 2: SAI_PORT_STAT_IF_IN_OCTETS = 10 | +|Record 3: SAI_PORT_STAT_IF_IN_OCTETS = 0 | +|Set ID = 256|Set Length = 24| +|Record 1: observationTimeMilliseconds = 200| +|Record 2: SAI_PORT_STAT_IF_IN_OCTETS = 10 | +|Record 3: SAI_PORT_STAT_IF_IN_OCTETS = 5 | +|Set ID = 256|Set Length = 24| +|Record 1: observationTimeMilliseconds = 300| +|Record 2: SAI_PORT_STAT_IF_IN_OCTETS = 30 | +|Record 3: SAI_PORT_STAT_IF_IN_OCTETS = 20 | + +``` + +##### Bandwidth Estimation ##### + +We estimate the bandwidth from the effective data size only, not the actual data size, because the extra information in a message, namely the IPFIX header (16 bytes), the data prefix (4 bytes) and the observation time milliseconds (4 bytes), is negligible. For example, if we collect 30 stats on 256 ports with a chunk size of 100, the percentage of effective data = `(4 * 30 * 256 * 100) / (16
+ 4 * 100 + 4 * 100 + 4 * 30 * 256 * 100) = 99.9%`. + +The following table shows the telemetry bandwidth of one cluster: + +| # of stats per port | # of ports per switch | # of switch | frequency (ms) | Total BW per switch(Mbps) | Total BW(Mbps) | +| ------------------- | --------------------- | ----------- | -------------- | ------------------------- | -------------- | +| 30 | 512 | 10000 | 1 | 122.88 | 1,228,800 | + +- *Total BW per switch = <# of stats per port> * <# of ports per switch> * <1000 / frequency (ms)> * 8 / 1,000,000* +- *Total BW = <Total BW per switch> * <# of switch>* + +#### Config DB #### + +Any configuration change in the Config DB interrupts the existing session and starts a new one. + +##### STREAM_TELEMETRY_PROFILE ##### + +``` +STREAM_TELEMETRY_PROFILE:{{profile_name}} + "stream_status": {{enable/disable}} + "poll_interval": {{uint32}} + "profile_id": {{uint16}} + "chunk_size": {{uint32}} (OPTIONAL) + "cache_size": {{uint32}} (OPTIONAL) +``` + +``` +key = STREAM_TELEMETRY_PROFILE:profile_name ; a string as the identifier of stream telemetry +; field = value +stream_status = enable/disable ; Enable/Disable stream. +poll_interval = uint32 ; The interval to poll counter, unit milliseconds. +profile_id = uint16 ; A numeric identifier of stream telemetry. The range is 256-65535. +chunk_size = uint32 ; number of stats groups in a telemetry message. +cache_size = uint32 ; number of chunks that can be cached. +``` + +##### STREAM_TELEMETRY_GROUP ##### + +``` +STREAM_TELEMETRY_GROUP:{{group_name}}:{{profile_name}} + "object_names": {{list of object name}} + "object_counters": {{list of stats of object}} +``` + +``` +key = STREAM_TELEMETRY_GROUP:group_name:profile_name + ; group_name is the object type, like PORT, QUEUE or INGRESS_PRIORITY_GROUP. + ; Multiple groups can be bound to the same stream telemetry profile. +; field = value +object_names = list of object name + ; The object names in the group, like Ethernet0,Ethernet8. Comma separated list. 
+object_counters = list of stats of object + ; The stats name in the group. like SAI_PORT_STAT_IF_IN_OCTETS,SAI_PORT_STAT_IF_IN_UCAST_PKTS. + ; comma separated list. +``` + +#### StateDb #### + +##### STREAM_TELEMETRY_SESSION ##### + +``` +STREAM_TELEMETRY_SESSION:{{profile_name}} + "session_status": {{enable/disable}} + "session_template": {{binary array}} +``` + +``` +key = STREAM_TELEMETRY_SESSION:profile_name ; a string as the identifier of stream telemetry +; field = value +session_status = enable/disable ; Enable/Disable stream. +session_template = binary array; The IPFIX template to interpret the message from netlink +``` + +#### Work Flow + +``` mermaid + +sequenceDiagram + autonumber + box Redis + participant config_db as CONFIG_DB + participant state_db as STATE_DB + end + box GNMI container + participant gnmi as gnmi server + participant counter as counter syncd + end + box SWSS container + participant st_orch as Stream Telemetry Orch + end + box SYNCD container + participant syncd + end + box Linux Kernel + participant dma_engine as DMA Engine + participant netlink_module as Netlink module + end + participant asic as ASIC + + counter --> counter: Initialize genetlink + config_db ->> st_orch: STREAM_TELEMETRY_PROFILE + opt Is the first telemetry profile? + st_orch ->> syncd: Initialize HOSTIF, TAM_TRANSPORT, TAM_collector. + syncd --) st_orch: + note right of st_orch: The collector object will be reused + end + config_db ->> st_orch: STREAM_TELEMETRY_GROUP + st_orch ->> syncd: Config TAM objects + alt Is stream status enabled? + st_orch ->> syncd: Enable telemetry stream + syncd ->> dma_engine: Config stats + syncd ->> st_orch: Query IPFIX template + st_orch ->> state_db: Update STREAM_TELEMETRY_SESSION enabled + state_db ->> counter: Register IPFIX template + loop Push stats until stream disabled + loop collect a chunk of stats + dma_engine ->> asic: query stats from asic + asic --) dma_engine: + dma_engine ->> netlink_module: Push stats + end + alt counter syncd is ready to receive? + netlink_module ->> counter: Push a chunk of stats with IPFIX message + else + netlink_module ->> netlink_module: Cache data + end + end + else + st_orch ->> syncd: Disable telemetry stream + syncd ->> dma_engine: Stop stream + st_orch ->> state_db: Update STREAM_TELEMETRY_SESSION to disabled + state_db ->> counter: Unregister IPFIX template + end + loop Receive IPFIX message of stats from genetlink + alt Have this template of IPFIX been registered? + counter ->> gnmi: Push message to GNMI server + else + counter ->> counter: Discard this message + end + end + +``` + +This section covers the high level design of the feature/enhancement. This section covers the following points in detail. + + - Is it a built-in SONiC feature or a SONiC Application Extension? + - What are the modules and sub-modules that are modified for this design? + - What are the repositories that would be changed? + - Module/sub-module interfaces and dependencies. + - SWSS and Syncd changes in detail + - DB and Schema changes (APP_DB, ASIC_DB, COUNTERS_DB, LOGLEVEL_DB, CONFIG_DB, STATE_DB) + - Sequence diagram if required. 
+ - Linux dependencies and interface + - Warm reboot requirements/dependencies + - Fastboot requirements/dependencies + - Scalability and performance requirements/impact + - Memory requirements + - Docker dependency + - Build dependency if any + - Management interfaces - SNMP, CLI, RestAPI, etc., + - Serviceability and Debug (logging, counters, trace etc) related design + - Is this change specific to any platform? Are there dependencies for platforms to implement anything to make this feature work? If yes, explain in detail and inform community in advance. + - SAI API requirements, CLI requirements, ConfigDB requirements. Design is covered in following sections. + +### SAI API ### + +This section covers the changes made or new API added in SAI API for implementing this feature. If there is no change in SAI API for HLD feature, it should be explicitly mentioned in this section. +This section should list the SAI APIs/objects used by the design so that silicon vendors can implement the required support in their SAI. Note that the SAI requirements should be discussed with SAI community during the design phase and ensure the required SAI support is implemented along with the feature/enhancement. 
+ +``` mermaid + +--- +title: Stream Telemetry SAI Objects +--- +erDiagram + hostif[HOSTIF] { + SAI_ID SAI_VALUE "Comments" + SAI_HOSTIF_ATTR_TYPE SAI_HOSTIF_TYPE_GENETLINK + SAI_HOSTIF_ATTR_OPER_STATUS true + SAI_HOSTIF_ATTR_NAME sonic_stel "constant variables" + SAI_HOSTIF_ATTR_GENETLINK_MCGRP_NAME stats "constant variables" + } + transport[TAM_transport] { + SAI_ID SAI_VALUE "Comments" + SAI_TAM_TRANSPORT_ATTR_TRANSPORT_TYPE SAI_TAM_TRANSPORT_TYPE_NONE + } + collector[TAM_collector] { + SAI_ID SAI_VALUE "Comments" + SAI_TAM_COLLECTOR_ATTR_TRANSPORT sai_tam_transport_obj + SAI_TAM_COLLECTOR_ATTR_LOCALHOST true + SAI_TAM_COLLECTOR_ATTR_HOSTIF sai_hostif_obj + SAI_TAM_COLLECTOR_ATTR_DSCP_VALUE _0 + } + report[TAM_report] { + SAI_ID SAI_VALUE "Comments" + SAI_TAM_REPORT_ATTR_TYPE SAI_TAM_REPORT_TYPE_IPFIX + SAI_TAM_REPORT_ATTR_REPORT_MODE SAI_TAM_REPORT_MODE_BULK + SAI_TAM_REPORT_ATTR_REPORT_INTERVAL poll_interval "STREAM_TELEMETRY_PROFILE:profile_name[poll_interval] on Config DB" + SAI_TAM_REPORT_ATTR_TEMPLATE_REPORT_INTERVAL _0 "Don't push the template, Because we hope the template can be proactively queried by orchagent" + SAI_TAM_REPORT_ATTR_REPORT_IPFIX_TEMPLATE_ID profile_id "STREAM_TELEMETRY_PROFILE:profile_name[profile_id] on Config DB" + SAI_TAM_REPORT_ATTR_REPORT_INTERVAL_UNIT SAI_TAM_REPORT_INTERVAL_UNIT_MSEC + } + telemetry_type[TAM_telemetry_type] { + SAI_ID SAI_VALUE "Comments" + SAI_TAM_TEL_TYPE_ATTR_SWITCH_ENABLE_XXXX_STATS true "Based on the STREAM_TELEMETRY_GROUP on Config DB, to enable corresponding capabilities." 
+ SAI_TAM_TEL_TYPE_ATTR_REPORT_ID sai_tam_report_obj + } + telemetry[TAM_telemetry] { + SAI_ID SAI_VALUE "Comments" + SAI_TAM_TELEMETRY_ATTR_TAM_TYPE_LIST sai_tam_tel_type_obj + SAI_TAM_TELEMETRY_ATTR_COLLECTOR_LIST sai_tam_collector_obj + SAI_TAM_TELEMETRY_ATTR_REPORTING_TYPE SAI_TAM_REPORTING_TYPE_COUNT_BASED + SAI_TAM_TELEMETRY_ATTR_REPORTING_CHUNK_SIZE chunk_size "STREAM_TELEMETRY_PROFILE:profile_name[chunk_size] on Config DB" + SAI_TAM_TELEMETRY_ATTR_CACHE_SIZE cache_size "STREAM_TELEMETRY_PROFILE:profile_name[cache_size] on Config DB" + } + counter_subscription[TAM_counter_subscription] { + SAI_ID SAI_VALUE "Comments" + SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_TEL_TYPE sai_tam_tel_type_obj + SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_OBJECT_ID port_obj + SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_STAT_ID SAI_PORT_STAT_IF_IN_OCTETS "A stats in sai_port_stat_t" + SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_LABEL index "The index of IPFIX template" + } + TAM[TAM] { + SAI_ID SAI_VALUE "Comments" + SAI_TAM_ATTR_TELEMETRY_OBJECTS_LIST sai_tam_telemetry_obj + SAI_TAM_ATTR_TAM_BIND_POINT_TYPE_LIST SAI_TAM_BIND_POINT_TYPE_PORT + } + + collector ||--|| hostif: binds + collector ||--|| transport: binds + telemetry_type ||--|| report: binds + telemetry ||..o| telemetry_type: binds + telemetry }o..|{ collector: binds + counter_subscription }|--|| telemetry_type: binds + TAM ||--|| telemetry: binds + +``` + +| Object Type | Scope | +| ------------------------ | ---------------------------- | +| HOSTIF | Global | +| TAM_transport | Global | +| TAM_collector | Global | +| TAM | per STREAM_TELEMETRY profile | +| TAM_telemetry | per STREAM_TELEMETRY profile | +| TAM_telemetry_type | per STREAM_TELEMETRY profile | +| TAM_report | per STREAM_TELEMETRY profile | +| TAM_counter_subscription | per stats of object | + + +#### Create HOSTIF object #### + +``` c++ + +sai_attr_list[0].id = SAI_HOSTIF_ATTR_TYPE; +sai_attr_list[0].value.s32 = SAI_HOSTIF_TYPE_GENETLINK; + +sai_attr_list[1].id = SAI_HOSTIF_ATTR_OPER_STATUS; 
+sai_attr_list[1].value.booldata = true; + +// Set genetlink family +sai_attr_list[2].id = SAI_HOSTIF_ATTR_NAME; +strncpy(sai_attr_list[2].value.chardata, "sonic_stel", strlen("sonic_stel") + 1); + +// Set genetlink group +sai_attr_list[3].id = SAI_HOSTIF_ATTR_GENETLINK_MCGRP_NAME; +strncpy(sai_attr_list[3].value.chardata, "stats", strlen("stats") + 1); + +attr_count = 4; +create_hostif(&sai_hostif_obj, switch_id, attr_count, sai_attr_list); + +``` + +#### Creating TAM transport object #### + +``` c++ + +sai_attr_list[0].id = SAI_TAM_TRANSPORT_ATTR_TRANSPORT_TYPE; +sai_attr_list[0].value.s32 = SAI_TAM_TRANSPORT_TYPE_NONE; + +attr_count = 1; +sai_create_tam_transport_fn(&sai_tam_transport_obj, switch_id, attr_count, sai_attr_list); + +``` + +#### Creating TAM collector object #### + +``` c++ +typedef enum _sai_tam_collector_attr_t +{ + // ... + + /** + * @brief Hostif object used to reach local host via GENETLINK + * + * @type sai_object_id_t + * @flags CREATE_AND_SET + * @objects SAI_OBJECT_TYPE_HOSTIF + * @allownull true + * @default SAI_NULL_OBJECT_ID + * @validonly SAI_TAM_COLLECTOR_ATTR_LOCALHOST == true + */ + SAI_TAM_COLLECTOR_ATTR_HOSTIF, + + // ... +} sai_tam_collector_attr_t; +``` + +``` c++ + +sai_attr_list[0].id = SAI_TAM_COLLECTOR_ATTR_TRANSPORT; +sai_attr_list[0].value.oid = sai_tam_transport_obj; + +sai_attr_list[1].id = SAI_TAM_COLLECTOR_ATTR_LOCALHOST; +sai_attr_list[1].value.booldata = true; + +sai_attr_list[2].id = SAI_TAM_COLLECTOR_ATTR_HOSTIF; +sai_attr_list[2].value.oid = sai_hostif_obj; + +sai_attr_list[3].id = SAI_TAM_COLLECTOR_ATTR_DSCP_VALUE; +sai_attr_list[3].value.u8 = 0; + +attr_count = 4; +sai_create_tam_collector_fn(&sai_tam_collector_obj, switch_id, attr_count, sai_attr_list); + +``` + +#### Creating TAM report object #### + +``` c++ +/** + * @brief Attributes for TAM report + */ +typedef enum _sai_tam_report_attr_t +{ + + // ... 
+ + /** + * @brief Set ID for IPFIX template + * + * According to the IPFIX spec, the available range should be 256-65535. + * The value 0 means the ID will be decided by the vendor's SAI. + * + * @type sai_uint16_t + * @flags CREATE_AND_SET + * @default 0 + * @validonly SAI_TAM_REPORT_ATTR_TYPE == SAI_TAM_REPORT_TYPE_IPFIX + */ + SAI_TAM_REPORT_ATTR_REPORT_IPFIX_TEMPLATE_ID, + + /** + * @brief query IPFIX template + * + * Return the IPFIX template binary buffer + * + * @type sai_uint8_list_t + * @flags READ_ONLY + * @validonly SAI_TAM_REPORT_ATTR_TYPE == SAI_TAM_REPORT_TYPE_IPFIX + */ + SAI_TAM_REPORT_ATTR_IPFIX_TEMPLATE, + + // ... + +} sai_tam_report_attr_t; + +``` + +``` c++ + +sai_attr_list[0].id = SAI_TAM_REPORT_ATTR_TYPE; +sai_attr_list[0].value.s32 = SAI_TAM_REPORT_TYPE_IPFIX; + +sai_attr_list[1].id = SAI_TAM_REPORT_ATTR_REPORT_MODE; +sai_attr_list[1].value.s32 = SAI_TAM_REPORT_MODE_BULK; + +sai_attr_list[2].id = SAI_TAM_REPORT_ATTR_REPORT_INTERVAL; +sai_attr_list[2].value.u32 = poll_interval; // STREAM_TELEMETRY_PROFILE:profile_name[poll_interval] on Config DB + +// sai_attr_list[].id = SAI_TAM_REPORT_ATTR_ENTERPRISE_NUMBER; Ignore this value + +sai_attr_list[3].id = SAI_TAM_REPORT_ATTR_TEMPLATE_REPORT_INTERVAL; +sai_attr_list[3].value.s32 = 0; // Don't push the template, Because we hope the template can be proactively queried by orchagent + +sai_attr_list[4].id = SAI_TAM_REPORT_ATTR_REPORT_IPFIX_TEMPLATE_ID; +sai_attr_list[4].value.u16 = profile_id;// STREAM_TELEMETRY_PROFILE:profile_name[profile_id] on Config DB; + +sai_attr_list[5].id = SAI_TAM_REPORT_ATTR_REPORT_INTERVAL_UNIT; +sai_attr_list[5].value.s32 = SAI_TAM_REPORT_INTERVAL_UNIT_MSEC; + +attr_count = 6; +sai_create_tam_report_fn(&sai_tam_report_obj, switch_id, attr_count, sai_attr_list); + +``` + +#### Creating TAM telemetry type object #### + +``` c++ + +sai_attr_list[0].id = SAI_TAM_TEL_TYPE_ATTR_TAM_TELEMETRY_TYPE; +sai_attr_list[0].value.s32 = SAI_TAM_TELEMETRY_TYPE_COUNTER_SUBSCRIPTION; + 
+// Enable the capabilities that correspond to the STREAM_TELEMETRY_GROUP entries in Config DB. +sai_attr_list[1].id = SAI_TAM_TEL_TYPE_ATTR_SWITCH_ENABLE_PORT_STATS; +sai_attr_list[1].value.booldata = true; + +sai_attr_list[2].id = SAI_TAM_TEL_TYPE_ATTR_SWITCH_ENABLE_MMU_STATS; +sai_attr_list[2].value.booldata = true; + +// ... + +sai_attr_list[3].id = SAI_TAM_TEL_TYPE_ATTR_REPORT_ID; +sai_attr_list[3].value.oid = sai_tam_report_obj; + +attr_count = 4; +sai_create_tam_tel_type_fn(&sai_tam_tel_type_obj, switch_id, attr_count, sai_attr_list); + +``` + +#### Creating TAM telemetry object #### + +Extend TAM telemetry attributes in SAI + +``` c++ + +typedef enum _sai_tam_reporting_type_t +{ + /** + * @brief Report type is time based + */ + SAI_TAM_REPORTING_TYPE_TIME_BASED, + + /** + * @brief Report type is count based + */ + SAI_TAM_REPORTING_TYPE_COUNT_BASED, + +} sai_tam_reporting_type_t; + +typedef enum _sai_tam_telemetry_attr_t +{ + // ... + + /** + * @brief Tam telemetry reporting unit + * + * @type sai_tam_reporting_unit_t + * @flags CREATE_AND_SET + * @default SAI_TAM_REPORTING_UNIT_SEC + * @condition SAI_TAM_TELEMETRY_ATTR_REPORTING_TYPE == SAI_TAM_REPORTING_TYPE_TIME_BASED + */ + SAI_TAM_TELEMETRY_ATTR_TAM_REPORTING_UNIT, + + /** + * @brief Tam event reporting interval + * + * defines the gap between two reports + * + * @type sai_uint32_t + * @flags CREATE_AND_SET + * @default 1 + * @condition SAI_TAM_TELEMETRY_ATTR_REPORTING_TYPE == SAI_TAM_REPORTING_TYPE_TIME_BASED + */ + SAI_TAM_TELEMETRY_ATTR_REPORTING_INTERVAL, + + /** + * @brief Tam telemetry reporting type + * + * @type sai_tam_reporting_type_t + * @flags CREATE_AND_SET + * @default SAI_TAM_REPORTING_TYPE_TIME_BASED + */ + SAI_TAM_TELEMETRY_ATTR_REPORTING_TYPE, + + /** + * @brief Tam telemetry reporting chunk size + * + * defines the size of reporting chunk, which means TAM will report to the collector every time + * if the report count reaches the chunk size. 
+ * + * @type sai_uint32_t + * @flags CREATE_AND_SET + * @default 1 + * @condition SAI_TAM_TELEMETRY_ATTR_REPORTING_TYPE == SAI_TAM_REPORTING_TYPE_COUNT_BASED + */ + SAI_TAM_TELEMETRY_ATTR_REPORTING_CHUNK_SIZE, + + /** + * @brief Tam telemetry cache size + * + * If the collector isn't ready to receive the report, this value indicates how many + * reports can be cached. 0 means no cache which is the default behavior. + * + * @type sai_uint32_t + * @flags CREATE_AND_SET + * @default 0 + */ + SAI_TAM_TELEMETRY_ATTR_CACHE_SIZE, + +} sai_tam_telemetry_attr_t; + +``` + +``` c++ + +sai_attr_list[0].id = SAI_TAM_TELEMETRY_ATTR_TAM_TYPE_LIST; +sai_attr_list[0].value.objlist.count = 1; +sai_attr_list[0].value.objlist.list[0] = sai_tam_tel_type_obj; + +sai_attr_list[1].id = SAI_TAM_TELEMETRY_ATTR_COLLECTOR_LIST; +sai_attr_list[1].value.objlist.count = 1; +sai_attr_list[1].value.objlist.list[0] = sai_tam_collector_obj; + +sai_attr_list[2].id = SAI_TAM_TELEMETRY_ATTR_REPORTING_TYPE; +sai_attr_list[2].value.s32 = SAI_TAM_REPORTING_TYPE_COUNT_BASED; + +sai_attr_list[3].id = SAI_TAM_TELEMETRY_ATTR_REPORTING_CHUNK_SIZE; +sai_attr_list[3].value.u32 = chunk_size; // STREAM_TELEMETRY_PROFILE:profile_name[chunk_size] on Config DB + +sai_attr_list[4].id = SAI_TAM_TELEMETRY_ATTR_CACHE_SIZE; +sai_attr_list[4].value.u32 = cache_size; // STREAM_TELEMETRY_PROFILE:profile_name[cache_size] on Config DB + +attr_count = 5; + +sai_create_tam_telemetry_fn(&sai_tam_telemetry_obj, switch_id, attr_count, sai_attr_list); + +``` + +#### Create TAM counter subscription objects #### + +Create the corresponding counter subscription objects based on the STREAM_TELEMETRY_GROUP entries in Config DB. 
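As a rough sketch of that mapping, the code below expands the comma separated `object_names` and `object_counters` fields of one STREAM_TELEMETRY_GROUP entry into one planned subscription per (object, stat) pair, matching the "per stats of object" scope in the table above. The names `SubscriptionPlan`, `split_csv` and `plan_subscriptions` are hypothetical, and assigning labels as a running index is an assumption for illustration; the design only states that the label is the index of the IPFIX template.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical plan for one TAM counter subscription: one stat on one object.
struct SubscriptionPlan {
    std::string object_name;   // e.g. "Ethernet0"
    std::string counter_name;  // e.g. "SAI_PORT_STAT_IF_IN_OCTETS"
    uint64_t label;            // assumed here to be a running index
};

// Split a comma separated Config DB value such as "Ethernet0,Ethernet8".
// Empty items are skipped.
std::vector<std::string> split_csv(const std::string& value) {
    std::vector<std::string> items;
    std::string::size_type start = 0;
    while (start <= value.size()) {
        const std::string::size_type end = value.find(',', start);
        const std::string item = value.substr(
            start, end == std::string::npos ? std::string::npos : end - start);
        if (!item.empty()) items.push_back(item);
        if (end == std::string::npos) break;
        start = end + 1;
    }
    return items;
}

// Expand one group into a flat list of subscriptions, object by object.
std::vector<SubscriptionPlan> plan_subscriptions(const std::string& object_names,
                                                 const std::string& object_counters,
                                                 uint64_t first_label = 0) {
    const std::vector<std::string> objects = split_csv(object_names);
    const std::vector<std::string> counters = split_csv(object_counters);
    std::vector<SubscriptionPlan> plans;
    uint64_t label = first_label;
    for (const std::string& object : objects) {
        for (const std::string& counter : counters) {
            plans.push_back({object, counter, label++});
        }
    }
    return plans;
}
```

Each planned entry would then be turned into one SAI counter subscription object, after resolving the object name to its SAI object ID and the counter name to its stat enum value.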
+ +``` c++ + +// Create counter subscription list + +sai_attr_list[0].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_TEL_TYPE; +sai_attr_list[0].value.oid = sai_tam_tel_type_obj; + +sai_attr_list[1].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_OBJECT_ID; +sai_attr_list[1].value.oid = port_obj; + +sai_attr_list[2].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_STAT_ID; +sai_attr_list[2].value.oid = SAI_PORT_STAT_IF_IN_OCTETS; + +sai_attr_list[3].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_LABEL; +sai_attr_list[3].value.oid = index; // The index of IPFIX template + +attr_count = 4; + +create_tam_counter_subscription(&sai_tam_counter_subscription_obj, switch_id, attr_count, sai_attr_list); +// If this stat of the object cannot support the poll frequency, this API should return SAI_STATUS_NOT_SUPPORTED. +``` + +#### Create TAM object #### + +``` c++ + +sai_attr_list[0].id = SAI_TAM_ATTR_TELEMETRY_OBJECTS_LIST; +sai_attr_list[0].value.objlist.count = 1; +sai_attr_list[0].value.objlist.list[0] = sai_tam_telemetry_obj; + +// The bind point types are enum values, so they go into an s32 list +sai_attr_list[1].id = SAI_TAM_ATTR_TAM_BIND_POINT_TYPE_LIST; +sai_attr_list[1].value.s32list.count = 2; +sai_attr_list[1].value.s32list.list[0] = SAI_TAM_BIND_POINT_TYPE_PORT; +sai_attr_list[1].value.s32list.list[1] = SAI_TAM_BIND_POINT_TYPE_QUEUE; + +attr_count = 2; +sai_create_tam_fn(&sai_tam_obj, switch_id, attr_count, sai_attr_list); + +``` + +#### Query IPFIX template #### + +``` c++ + +sai_attribute_t attr; +attr.id = SAI_TAM_REPORT_ATTR_IPFIX_TEMPLATE; +get_tam_report_attribute(sai_tam_report_obj, 1, &attr); + +std::vector<uint8_t> ipfix_template(attr.value.u8list.list, attr.value.u8list.list + attr.value.u8list.count); +// Save ipfix_template to STATE DB + +// Free memory +free(attr.value.u8list.list); + +``` + +#### Enable/Disable telemetry stream #### + +``` c++ + +sai_attribute_t attr; + +// Enable telemetry stream + +attr.id = SAI_TAM_TELEMETRY_ATTR_TAM_TYPE_LIST; +attr.value.objlist.count = 1; +attr.value.objlist.list[0] = sai_tam_tel_type_obj; + +// Disable telemetry stream + +attr.id = SAI_TAM_TELEMETRY_ATTR_TAM_TYPE_LIST; 
+attr.value.objlist.count = 1; +attr.value.objlist.list[0] = sai_tam_tel_type_obj; + +get_tam_telemetry_attribute(&sai_tam_telemetry_obj, 1, &attr); + +``` + +### Configuration and management +This section should have sub-sections for all types of configuration and management related design. Example sub-sections for "CLI" and "Config DB" are given below. Sub-sections related to data models (YANG, REST, gNMI, etc.,) should be added as required. +If there is breaking change which may impact existing platforms, please call out in the design and get platform vendors reviewed. + +#### Manifest (if the feature is an Application Extension) + +Paste a preliminary manifest in a JSON format. + +#### CLI/YANG model Enhancements + +This sub-section covers the addition/deletion/modification of CLI changes and YANG model changes needed for the feature in detail. If there is no change in CLI for HLD feature, it should be explicitly mentioned in this section. Note that the CLI changes should ensure downward compatibility with the previous/existing CLI. i.e. Users should be able to save and restore the CLI from previous release even after the new CLI is implemented. +This should also explain the CLICK and/or KLISH related configuration/show in detail. +https://github.com/sonic-net/sonic-utilities/blob/master/doc/Command-Reference.md needs be updated with the corresponding CLI change. + +#### Config DB Enhancements + +This sub-section covers the addition/deletion/modification of config DB changes needed for the feature. If there is no change in configuration for HLD feature, it should be explicitly mentioned in this section. This section should also ensure the downward compatibility for the change. + +### Warmboot and Fastboot Design Impact +Mention whether this feature/enhancement has got any requirements/dependencies/impact w.r.t. warmboot and fastboot. Ensure that existing warmboot/fastboot feature is not affected due to this design and explain the same. 
+
+### Memory Consumption
+This sub-section covers the memory consumption analysis for the new feature: no memory consumption is expected when the feature is disabled via compilation and no growing memory consumption while feature is disabled by configuration.
+### Restrictions/Limitations
+
+### Testing Requirements/Design
+Explain what kind of unit testing, system testing, regression testing, warmboot/fastboot testing, etc.,
+Ensure that the existing warmboot/fastboot requirements are met. For example, if the current warmboot feature expects maximum of 1 second or zero second data disruption, the same should be met even after the new feature/enhancement is implemented. Explain the same here.
+Example sub-sections for unit test cases and system test cases are given below.
+
+#### Unit Test cases
+
+#### System Test cases
+
+### Open/Action items - if any
+
+
+NOTE: All the sections and sub-sections given above are mandatory in the design document. Users can add additional sections/sub-sections if required.
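As an illustration of how the byte array saved by the `Query IPFIX template` step might be consumed, the following is a minimal sketch of decoding one IPFIX template set. It assumes the layout defined in RFC 7011 (a set header with Set ID 2, a template record header, then field specifiers whose first bit marks an enterprise-specific element) and a buffer that holds exactly one template record. `FieldSpec`, `TemplateSet`, and `parse_template_set` are hypothetical names used only for this sketch; they are not part of SAI or SONiC.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One field specifier of an IPFIX template record (RFC 7011, section 3.2).
// Hypothetical type for illustration only.
struct FieldSpec {
    uint16_t element_id = 0;        // 15-bit element ID, enterprise bit removed
    uint16_t length = 0;            // field length in bytes
    bool enterprise = false;        // true if the enterprise bit was set
    uint32_t enterprise_number = 0; // meaningful only when enterprise is true
};

// Decoded template record. Hypothetical type for illustration only.
struct TemplateSet {
    uint16_t template_id = 0;
    std::vector<FieldSpec> fields;
};

// Read big-endian (network order) integers; callers check the bounds.
static uint16_t read_u16(const std::vector<uint8_t>& b, size_t off) {
    return static_cast<uint16_t>((b[off] << 8) | b[off + 1]);
}

static uint32_t read_u32(const std::vector<uint8_t>& b, size_t off) {
    return (static_cast<uint32_t>(read_u16(b, off)) << 16) | read_u16(b, off + 2);
}

// Parse a buffer that starts with one template set holding one template record.
// Returns false on truncated or malformed input.
bool parse_template_set(const std::vector<uint8_t>& buf, TemplateSet& out) {
    out = TemplateSet{};
    if (buf.size() < 8) return false;              // set header + record header
    const uint16_t set_id = read_u16(buf, 0);
    const uint16_t set_len = read_u16(buf, 2);
    if (set_id != 2 || set_len < 8 || set_len > buf.size()) return false;

    out.template_id = read_u16(buf, 4);
    const uint16_t field_count = read_u16(buf, 6);
    size_t off = 8;
    for (uint16_t i = 0; i < field_count; ++i) {
        if (off + 4 > set_len) return false;       // element ID + field length
        const uint16_t raw_id = read_u16(buf, off);
        FieldSpec f;
        f.enterprise = (raw_id & 0x8000) != 0;
        f.element_id = static_cast<uint16_t>(raw_id & 0x7FFF);
        f.length = read_u16(buf, off + 2);
        off += 4;
        if (f.enterprise) {
            if (off + 4 > set_len) return false;   // enterprise number
            f.enterprise_number = read_u32(buf, off);
            off += 4;
        }
        out.fields.push_back(f);
    }
    return true;
}
```

Returning `false` on any truncated or malformed buffer lets the caller drop the template rather than misinterpret later data records.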
From 3303c56c66c2d688a1c6d65c7d184e5997240d9e Mon Sep 17 00:00:00 2001
From: Ze Gan
Date: Wed, 4 Sep 2024 19:39:31 +0800
Subject: [PATCH 02/13] Fix length

Signed-off-by: Ze Gan
---
 doc/stream-telemetry/stream-telemetry-hld.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/stream-telemetry/stream-telemetry-hld.md b/doc/stream-telemetry/stream-telemetry-hld.md
index eb184cc7da..815c47aebe 100644
--- a/doc/stream-telemetry/stream-telemetry-hld.md
+++ b/doc/stream-telemetry/stream-telemetry-hld.md
@@ -212,7 +212,7 @@ For example, if the switch has 8 ports, but we only want to get the `SAI_PORT_ST
 0...31
 |Set ID = 2|Set Length = 76<12+8*8>|
 |Template ID = 256|Number of Fields = 9|
-|Type = 324|Field Length = 16|
+|Type = 324|Field Length = 8|
 |1|Element ID = 0|Field Length = 0|
 |Enterprise Number = 0|
 |1|Element ID = 0|Field Length = 0|
From 5ac14811020d0db07506bf6cf105616f8d361d06 Mon Sep 17 00:00:00 2001
From: Ze Gan
Date: Fri, 6 Sep 2024 16:47:20 +0800
Subject: [PATCH 03/13] Fix comments

Signed-off-by: Ze Gan
---
 .../sonic-stream-telemetry.yang | 81 +++
 doc/stream-telemetry/stream-telemetry-hld.md | 659 +++++++++++-------
 2 files changed, 486 insertions(+), 254 deletions(-)
 create mode 100644 doc/stream-telemetry/sonic-stream-telemetry.yang

diff --git a/doc/stream-telemetry/sonic-stream-telemetry.yang b/doc/stream-telemetry/sonic-stream-telemetry.yang
new file mode 100644
index 0000000000..39f4216e6f
--- /dev/null
+++ b/doc/stream-telemetry/sonic-stream-telemetry.yang
@@ -0,0 +1,81 @@
+module sonic-stream-telemetry {
+    yang-version 1.1;
+
+    namespace "http://github.com/sonic-net/sonic-stream-telemetry";
+
+    prefix sonic-stream-telemetry;
+
+    container sonic-stream-telemetry {
+        container STREAM_TELEMETRY_PROFILE {
+            description "STREAM_TELEMETRY_PROFILE part of config_db.json";
+            list STREAM_TELEMETRY_PROFILE_LIST {
+
+                key "name";
+
+                leaf name {
+                    type string {
+                        length 1..128;
+                    }
+                }
+
+                leaf stream_state {
+                    type string {
+                        pattern "enabled|disabled";
+                    }
+                }
+
+                leaf poll_interval {
+                    description "The interval to poll counter, unit milliseconds.";
+                    type uint32;
+                }
+
+                leaf chunk_size {
+                    type uint32;
+                    default 0;
+                }
+
+                leaf cache_size {
+                    type uint32;
+                    default 0;
+                }
+            }
+        }
+
+        container STREAM_TELEMETRY_GROUP {
+            description "STREAM_TELEMETRY_GROUP part of config_db.json";
+            list STREAM_TELEMETRY_GROUP_LIST {
+                key "profile_name group_name";
+
+                leaf profile_name {
+                    type leafref {
+                        path "/sonic-stream-telemetry:sonic-stream-telemetry/STREAM_TELEMETRY_PROFILE/STREAM_TELEMETRY_PROFILE_LIST/name";
+                    }
+                }
+
+                // The table name of config db
+                leaf group_name {
+                    type string {
+                        pattern "PORT|QUEUE|BUFFER_PG|BUFFER_POOL|BUFFER_QUEUE";
+                    }
+                }
+
+                leaf-list object_names {
+                    type string {
+                        pattern '\w+(\|\d+(-\d+)?)?';
+                        error-message "Invalid object names";
+                    }
+                    description "The object names to be monitored";
+                }
+
+                leaf-list object_counters {
+                    type string {
+                        pattern "SAI_[A-Z]+_STAT_([A-Z]+_)*[A-Z]+";
+                        error-message "Invalid STATS ID for SAI object";
+                    }
+                    description "The SAI STATS ID";
+                }
+
+            }
+        }
+    }
+}
diff --git a/doc/stream-telemetry/stream-telemetry-hld.md b/doc/stream-telemetry/stream-telemetry-hld.md
index 815c47aebe..59940ce719 100644
--- a/doc/stream-telemetry/stream-telemetry-hld.md
+++ b/doc/stream-telemetry/stream-telemetry-hld.md
@@ -1,76 +1,90 @@
-# Stream telemetry high level design #
-- [Stream telemetry high level design](#stream-telemetry-high-level-design)
-  - [Table of Content](#table-of-content)
-    - [Revision](#revision)
-    - [Scope](#scope)
-    - [Definitions/Abbreviations](#definitionsabbreviations)
-    - [Overview](#overview)
-    - [Requirements](#requirements)
-    - [Architecture Design](#architecture-design)
-    - [High-Level Design](#high-level-design)
-      - [Modules](#modules)
-        - [Netlink Module](#netlink-module)
-      - [Data format](#data-format)
-        - [IPFIX header](#ipfix-header)
-        - [IPFIX template](#ipfix-template)
-        - [IPFIX data](#ipfix-data)
-        - [Bandwidth
Estimation](#bandwidth-estimation) - - [Config DB](#config-db) - - [STREAM\_TELEMETRY\_PROFILE](#stream_telemetry_profile) - - [STREAM\_TELEMETRY\_GROUP](#stream_telemetry_group) - - [StateDb](#statedb) - - [STREAM\_TELEMETRY\_SESSION](#stream_telemetry_session) - - [Work Flow](#work-flow) - - [SAI API](#sai-api) - - [Create HOSTIF object](#create-hostif-object) - - [Creating TAM transport object](#creating-tam-transport-object) - - [Creating TAM collector object](#creating-tam-collector-object) - - [Creating TAM report object](#creating-tam-report-object) - - [Creating TAM telemetry type object](#creating-tam-telemetry-type-object) - - [Creating TAM telemetry object](#creating-tam-telemetry-object) - - [Create TAM counter subscription objects](#create-tam-counter-subscription-objects) - - [Create TAM object](#create-tam-object) - - [Query IPFIX template](#query-ipfix-template) - - [Enable/Disable telemetry stream](#enabledisable-telemetry-stream) - - [Configuration and management](#configuration-and-management) - - [Manifest (if the feature is an Application Extension)](#manifest-if-the-feature-is-an-application-extension) - - [CLI/YANG model Enhancements](#cliyang-model-enhancements) - - [Config DB Enhancements](#config-db-enhancements) - - [Warmboot and Fastboot Design Impact](#warmboot-and-fastboot-design-impact) - - [Memory Consumption](#memory-consumption) - - [Restrictions/Limitations](#restrictionslimitations) - - [Testing Requirements/Design](#testing-requirementsdesign) - - [Unit Test cases](#unit-test-cases) - - [System Test cases](#system-test-cases) - - [Open/Action items - if any](#openaction-items---if-any) - - -## Table of Content - -### Revision - -### Scope - -This section describes the scope of this high-level design document in SONiC. - -### Definitions/Abbreviations - -This section covers the abbreviation if any, used in this high-level design document and its definitions. 
- -### Overview - -The purpose of this section is to give an overview of high-level design document and its architecture implementation in SONiC. - -### Requirements - -This section list out all the requirements for the HLD coverage and exemptions (not supported) if any for this design. - -### Architecture Design - -This section covers the changes that are required in the SONiC architecture. In general, it is expected that the current architecture is not changed. -This section should explain how the new feature/enhancement (module/sub-module) fits in the existing architecture. - -If this feature is a SONiC Application Extension mention which changes (if any) needed in the Application Extension infrastructure to support new feature. +# Stream telemetry high level design + +## Table of Content ## + +- [Revision](#revision) +- [Scope](#scope) +- [Definitions/Abbreviations](#definitionsabbreviations) +- [Overview](#overview) +- [Requirements](#requirements) +- [Architecture Design](#architecture-design) +- [High-Level Design](#high-level-design) + - [Modules](#modules) + - [Counter Syncd](#counter-syncd) + - [Stream Telemetry Orch](#stream-telemetry-orch) + - [Netlink Module and DMA Engine](#netlink-module-and-dma-engine) + - [Data format](#data-format) + - [IPFIX header](#ipfix-header) + - [IPFIX template](#ipfix-template) + - [IPFIX data](#ipfix-data) + - [Bandwidth Estimation](#bandwidth-estimation) + - [Config DB](#config-db) + - [STREAM\_TELEMETRY\_PROFILE](#stream_telemetry_profile) + - [STREAM\_TELEMETRY\_GROUP](#stream_telemetry_group) + - [StateDb](#statedb) + - [STREAM\_TELEMETRY\_SESSION](#stream_telemetry_session) + - [Work Flow](#work-flow) + - [SAI API](#sai-api) + - [Create HOSTIF object](#create-hostif-object) + - [Creating TAM transport object](#creating-tam-transport-object) + - [Creating TAM collector object](#creating-tam-collector-object) + - [Creating TAM report object](#creating-tam-report-object) + - [Creating TAM telemetry type 
object](#creating-tam-telemetry-type-object) + - [Creating TAM telemetry object](#creating-tam-telemetry-object) + - [Create TAM counter subscription objects](#create-tam-counter-subscription-objects) + - [Create TAM object](#create-tam-object) + - [Query IPFIX template](#query-ipfix-template) + - [Enable/Disable telemetry stream](#enabledisable-telemetry-stream) +- [Configuration and management](#configuration-and-management) + - [Manifest (if the feature is an Application Extension)](#manifest-if-the-feature-is-an-application-extension) + - [CLI/YANG model Enhancements](#cliyang-model-enhancements) + - [Config CLI](#config-cli) + - [Inspect stream CLI](#inspect-stream-cli) + - [YANG](#yang) + - [Config DB Enhancements](#config-db-enhancements) + - [Warmboot and Fastboot Design Impact](#warmboot-and-fastboot-design-impact) + - [Memory Consumption](#memory-consumption) + - [Restrictions/Limitations](#restrictionslimitations) + - [Testing Requirements/Design](#testing-requirementsdesign) + - [Unit Test cases](#unit-test-cases) + - [System Test cases](#system-test-cases) + - [Open/Action items - if any](#openaction-items---if-any) + +## Revision + +| Rev | Date | Author | Change Description | +| --- | ---------- | ------ | ------------------ | +| 0.1 | 09/06/2024 | Ze Gan | Initial version | + +## Scope + +This document outlines the high-level design of stream telemetry, focusing primarily on the internal aspects of SONiC rather than external telemetry systems. + +## Definitions/Abbreviations + +| Abbreviation | Description | +| ------------ | ----------------------------------------- | +| SAI | The Switch Abstraction Interface | +| IPFIX | Internet Protocol Flow Information Export | +| TAM | Telemetry and Monitoring | +| BW | Bandwidth | + +## Overview + +The existing telemetry solution of SONiC relies on the syncd process to proactively query stats and counters via the SAI API. This approach causes the syncd process to spend excessive time on SAI communication. 
The stream telemetry described in this document aims to provide a more efficient method for collecting object stats. The main idea is that selected stats will be proactively pushed from the vendor's driver to the collector via netlink. + +## Requirements + +- The number of SAI object types should not exceed 32,768 ($2^{15}$). This means the value of SAI_OBJECT_TYPE_MAX should be less than 32,768. +- The number of SAI object extension types should not exceed 32,768. +- The number of stats types for a single SAI object type should not exceed 32,768. +- The number of extension stats types for a single SAI object type should not exceed 32,768. +- The number of SAI objects of the same type should not exceed 32,768. +- The vendor SDK should support publishing stats in IPFIX format and its IPFIX template. +- If a polling frequency for stats cannot be supported, the vendor's SDK should report this error. +- When reconfiguring any stream settings, whether it is the polling interval or the stats list, the existing stream will be interrupted and regenerated. + +## Architecture Design ``` mermaid @@ -124,15 +138,15 @@ flowchart BT counter_syncd --telemetry message--> gnmi ``` -### High-Level Design ### +STATE_DB channel model? Produce Table/Consume Table -#### Modules #### +## High-Level Design -##### Netlink Module ##### +### Modules -generic_netlink +#### Counter Syncd -netlink configuration constants in /etc/sonic/constants.yml +The `counter syncd` is a new module that runs within the GNMI container. Its primary responsibility is to receive counter messages via netlink and convert them into GNMI messages for an external collector. It subscribes to a socket of a specific family and multicast group of generic netlink. The configuration for generic netlink is defined as constants in `/etc/sonic/constants.yml` as follows. ``` yaml constants: @@ -143,22 +157,34 @@ constants: ``` -Ring buffer model +#### Stream Telemetry Orch -Pin CPU? 
+The `Stream Telemetry Orch` is a new object within the Orchagent. It has the following primary duties:
+
+1. Maintain the TAM SAI objects according to the stream telemetry configuration in the config DB.
+2. Generate a unique template ID for each stream telemetry profile to ensure distinct identification and management.
+3. Register and activate streams on counter syncd.
+
+`Stream Telemetry Orch` leverages `tam_counter_subscription` objects to bind monitored objects, such as ports, buffers, or queues, to streams. Therefore, this orch must ensure that the lifecycle of each `tam_counter_subscription` object stays within the lifecycle of the object it monitors.
+
+#### Netlink Module and DMA Engine
+
+These two modules need to be provided by vendors. This document proposes a ring buffer communication model to support all expected TAM configurations as follows.
 ![netlink_dma_channel](netlink_dma_channel.drawio.svg)
-#### Data format ####
+### Data format
-We would like to use IPFIX as the report format, and the bytes order of all numbers in the IPFIX message is network-order(Big-endian order).
+We will use IPFIX as the report format, with all numbers in the IPFIX message in network byte order (big-endian).
-The reference of IPFIX:
+For more information on IPFIX, refer to the following resources:
 - [Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information](https://datatracker.ietf.org/doc/html/rfc7011)
 - [IP Flow Information Export (IPFIX) Entities](https://www.iana.org/assignments/ipfix/ipfix.xhtml)
-##### IPFIX header #####
+#### IPFIX header
+
+The `Version` and `Observation Domain ID` fields of the IPFIX header are identical for each IPFIX message.
``` mermaid @@ -174,7 +200,7 @@ packet-beta ``` -##### IPFIX template ##### +#### IPFIX template ``` mermaid @@ -183,70 +209,79 @@ title: stream message of IPFIX template --- packet-beta 0-15: "Set ID = 2" -16-31: "Set Length = (12 + Number of Element * 8) bytes" +16-31: "Set Length = (12 + Number of Stats * 8) bytes" 32-47: "Template ID = > 256 configured" 48-63: "Number of Fields = 1 + Number of stats" -64-79: "Element ID=observationTimeMilliseconds (324)" -80-95: "Field length = 8 bytes" +64-79: "Element ID=observationTimeNanoseconds (325)" +80-95: "Field length = 4 bytes" 96-96: "1" -97-111: "Element ID = SAI STATS ID 1" -112-127: "Field Length = 0 or 8 bytes" -128-159: "Enterprise Number = SAI TYPE ID 1" +97-111: "Element ID = Object index for the stats 1" +112-127: "Field Length = 8 bytes" +128-159: "Enterprise Number = SAI TYPE ID + SAI STATS ID for the stats 1" 160-191: "..." 192-192: "1" -193-207: "Element ID = SAI STATS ID N" -208-223: "Field Length = 0 or 8 bytes" -224-255: "Enterprise Number = SAI TYPE ID N" +193-207: "Element ID = Object index for the stats N" +208-223: "Field Length = 8 bytes" +224-255: "Enterprise Number = SAI TYPE ID + SAI STATS ID for the stats N" ``` -- To some high frequency counters, the unit of timestamp of native IPFIX is second which cannot meet our requirement. So, we introduce an extra element, `observationTimeMilliseconds`, for each record. -- Normally, to use the [SAI_OBJECT_TYPE](https://github.com/opencomputeproject/SAI/blob/master/inc/saitypes.h) as the enterprise number of IPFIX directly and derive the element ID of IPFIX from SAI stats ID via AND `0x8000`. -For example, to the `SAI_QUEUE_STAT_WRED_ECN_MARKED_PACKETS=0x00000022` of `SAI_OBJECT_TYPE_QUEUE = 21`, the enterprise number would be: `0x000000015`, and the element ID would be `0x8022`. -- We don't support the stats ID exceeds to `65535` currently, because the IPFIX specification itself limits the element ID to two bytes. 
If a larger stats IDs are needed in the future, we will need to extend IPFIX to a private encoding with 8 bytes element IDs.
-- In order to the a flexibility and an efficiency, this system will support partial telemetry. It means this system will only report stats from selected ports/queue or so on. For example, if we only configure to report stats on Ethernet2 and Ethernet8, the report data will only include stats from these two ports even though there are 256 ports on this switch. To achieve this, the selection information needs to be encoded into the IPFIX template via field length. **The template should includes ALL objects for the stats. Meanwhile, if an object is selected, the length of corresponding field is 8, vice versa it's 0.**
-For example, if the switch has 8 ports, but we only want to get the `SAI_PORT_STAT_IF_IN_OCTETS = 0` on Ethernet2 and Ethernet5. The template will look like:
+- For high-frequency counters, the native IPFIX timestamp unit of seconds is insufficient. Therefore, we introduce an additional element, `observationTimeNanoseconds`, for each record to meet our requirements.
+- The enterprise bit is always set to 1 for stats records.
+- The element ID of IPFIX is derived from the object index. For example, for `Ethernet5`, the element ID will be `0x5 | 0x8000 = 0x8005`.
+- The enterprise number is derived from the combination of the [SAI_OBJECT_TYPE](https://github.com/opencomputeproject/SAI/blob/master/inc/saitypes.h) and its corresponding stats ID: the object type occupies the upper 16 bits and the stats ID the lower 16 bits, and the most significant bit of each half is the SAI extension flag. For example, for `SAI_QUEUE_STAT_WRED_ECN_MARKED_PACKETS=0x00000022` of `SAI_OBJECT_TYPE_QUEUE=0x00000015`, the enterprise number will be `0x00000015 << 16 | 0x00000022 = 0x00150022`.
+ +``` mermaid +--- +title: Enterprise number encoding +--- +packet-beta +0: "EF" +1-15: "SAI TYPE ID" +16: "EF" +17-31: "SAI STATS ID" ``` -0...31 -|Set ID = 2|Set Length = 76<12+8*8>| -|Template ID = 256|Number of Fields = 9| -|Type = 324|Field Length = 8| -|1|Element ID = 0|Field Length = 0| -|Enterprise Number = 0| -|1|Element ID = 0|Field Length = 0| -|Enterprise Number = 0| -|1|Element ID = 0|Field Length = 8| -|Enterprise Number = 0| -|1|Element ID = 0|Field Length = 0| -|Enterprise Number = 0| -|1|Element ID = 0|Field Length = 0| -|Enterprise Number = 0| -|1|Element ID = 0|Field Length = 8| -|Enterprise Number = 0| -|1|Element ID = 0|Field Length = 0| -|Enterprise Number = 0 -|1|Element ID = 0|Field Length = 0| -|Enterprise Number = 0| +**EF is the extension flag: If this type or stat is an SAI extension, it should be set to 1.** + +For example, if the switch has 8 ports, but we only want to get the `SAI_PORT_STAT_IF_IN_ERRORS = 0x00000004` of `SAI_OBJECT_TYPE_PORT = 0x00000001` on Ethernet2 and Ethernet5, the template will look like this: + +``` mermaid + +packet-beta +0-15: "Set ID = 2" +16-31: "Set Length = 28 bytes" +32-47: "Template ID = 256" +48-63: "Number of Fields = 3" +64-79: "Element ID=325" +80-95: "Field length = 4 bytes" +96-96: "1" +97-111: "Element ID = 2 (port index)" +112-127: "Field Length = 8 bytes" +128-159: "Enterprise Number = 0x00010004" +160-160: "1" +161-175: "Element ID = 5 (port index)" +176-191: "Field Length = 8 bytes" +192-223: "Enterprise Number = 0x00010004" ``` -##### IPFIX data ##### +#### IPFIX data -A message of IPFIX data contains two level hierarchies, namely chunks and snapshots. A chunk contains a numbers of snapshots. And a snapshot is a binary block that can be interpreted by the IPFIX template mentioned above. +An IPFIX data message consists of two hierarchical levels: chunk and snapshots. 
A chunk contains multiple snapshots, and a snapshot is a binary block that can be interpreted using the IPFIX template mentioned above.
-The binary block of snapshot is as follows:
+The binary structure of a snapshot is as follows:
 ``` mermaid
 ---
-title: stream message of IPFIX data
+title: A snapshot of IPFIX data
 ---
 packet-beta
 0-15: "Set ID = Same as template ID"
-16-31: "Set Length = (8 + Number of valid stats * 8) bytes"
-32-63: "Rcord 1: observationTimeMilliseconds"
+16-31: "Set Length = (8 + Number of stats * 8) bytes"
+32-63: "Record 1: observationTimeNanoseconds"
 64-95: "Record 2: Stats 1"
 96-127: "..."
 128-159: "Record N + 1: Stats N"
@@ -254,56 +289,64 @@ packet-beta
 ```
 - The chunk size can be configured via SAI.
-- The shot structure is derived from the IPFIX template, which is also derived from the stats we want to record.
+- The snapshot structure is derived from the IPFIX template, which is based on the stats we want to record.
-The following is an IPFIX message example of the same stats record as the IPFIX template example, and the chunk size is 3
+Below is an example of an IPFIX message for the same stats record as the IPFIX template example, with a chunk size of 3:
-```
-
-0...31
-|Version = 0x000a|Message Length = 64|
-|Export Timestamp = 2024-08-29 20:30:60|
-|Sequence Number = 1|
-|Observation Domain ID = 0|
-|Set ID = 256|Set Length = 24|
-|Record 1: observationTimeMilliseconds = 100|
-|Record 2: SAI_PORT_STAT_IF_IN_OCTETS = 10 |
-|Record 3: SAI_PORT_STAT_IF_IN_OCTETS = 0 |
-|Set ID = 256|Set Length = 24|
-|Record 1: observationTimeMilliseconds = 200|
-|Record 2: SAI_PORT_STAT_IF_IN_OCTETS = 10 |
-|Record 3: SAI_PORT_STAT_IF_IN_OCTETS = 5 |
-|Set ID = 256|Set Length = 24|
-|Record 1: observationTimeMilliseconds = 300|
-|Record 2: SAI_PORT_STAT_IF_IN_OCTETS = 30 |
-|Record 3: SAI_PORT_STAT_IF_IN_OCTETS = 20 |
+``` mermaid
+---
+title: stream message IPFIX
+---
+packet-beta
+0-15: "Version = 0x000a"
+16-31: "Message Length = 112 bytes"
+32-63: "Export Timestamp = 2024-08-29 20:30:60"
+64-95: "Sequence Number = 1"
+96-127: "Observation Domain ID = 0"
+128-143: "Set ID = 256"
+144-159: "Set Length = 32 bytes"
+160-191: "observationTimeNanoseconds = 10000"
+192-255: "Port 1: SAI_PORT_STAT_IF_IN_ERRORS = 10"
+256-319: "Port 2: SAI_PORT_STAT_IF_IN_ERRORS = 0"
+320-383: "Port 3: SAI_PORT_STAT_IF_IN_ERRORS = 5"
+384-399: "Set ID = 256"
+400-415: "Set Length = 32 bytes"
+416-447: "observationTimeNanoseconds = 20000"
+448-511: "Port 1: SAI_PORT_STAT_IF_IN_ERRORS = 15"
+512-575: "Port 2: SAI_PORT_STAT_IF_IN_ERRORS = 0"
+576-639: "Port 3: SAI_PORT_STAT_IF_IN_ERRORS = 6"
+640-655: "Set ID = 256"
+656-671: "Set Length = 32 bytes"
+672-703: "observationTimeNanoseconds = 30000"
+704-767: "Port 1: SAI_PORT_STAT_IF_IN_ERRORS = 20"
+768-831: "Port 2: SAI_PORT_STAT_IF_IN_ERRORS = 0"
+832-895: "Port 3: SAI_PORT_STAT_IF_IN_ERRORS = 8"
 ```
-##### Bandwidth Estimation #####
+### Bandwidth Estimation
-We estimate the bandwidth based only on the effective data size, not the actual data size. Because the extra information of a message is the IPFIX header(16 bytes), data prefix(4 bytes) and observation time milliseconds(4 bytes) which is negligible. For example, if we want to collect 30 stats on 256 ports, and the chunk size is 100. The percentage of effective data = `(4 * 30 * 256 * 100) / (16 + 4 * 100 + 4 * 100 + 4 * 30 * 256 * 100) = 99.9%`.
+We estimate the bandwidth based only on the effective data size, not the actual data size. The extra information in a message, such as the IPFIX header (16 bytes), data prefix (4 bytes), and observation time (4 bytes), is negligible. For example, if we want to collect 30 stats on 64 ports, and the chunk size is 100: $The Percentage Of Effective Data = \frac{8 \times 30 \times 64 \times 100_{Effective Data}}{16_{Header} + 4 \times 100_{Data Prefix} + 4 \times 100_{Observation Time} + 8 \times 30 \times 64 \times 100_{Effective Data}} \approx 99.9\%$ .
 The following table is telemetry bandwidth of one cluster
-| # of stats per port | # of ports per switch | # of switch | frequency (ms) | Total BW per switch(Mbps) | Total BW(Mbps) |
+| # of stats per port | # of ports per switch | # of switch | frequency (us) | Total BW per switch(Mbps) | Total BW(Mbps) |
 | ------------------- | --------------------- | ----------- | -------------- | ------------------------- | -------------- |
-| 30 | 512 | 10000 | 1 | 122.88 | 1,228,800 |
+| 30 | 64 | 10,000 | 10 | 12,288 | 122,880,000 |
-- *Total BW per switch = <# of stats per port> * <# of ports per switch> * * 8 / 1,000,000*
-- *Total BM = * <# of switch>*
+- ${Total BW Per Switch} = \frac{{\verb|#| Of Stats Per Port} \times 8_{bytes} \times {\verb|#| Of Ports Per Switch} \times \frac{1,000,000}{Frequency_{us}} \times 8_{bits}}{1,000,000}$
+- ${Total BW} = {Total BW Per Switch} \times {\verb|#| Of Switch}$
-#### Config DB ####
+### Config DB
-Any configuration changes in the config DB will interrupt existing session and restart new one.
+Any configuration changes in the config DB will interrupt the existing session and initiate a new one.
-##### STREAM_TELEMETRY_PROFILE ##### +#### STREAM_TELEMETRY_PROFILE ``` -STREAM_TELEMETRY_PROFILE:{{profile_name}} - "stream_status": {{enable/disable}} +STREAM_TELEMETRY_PROFILE|{{profile_name}} + "stream_state": {{enabled/disabled}} "poll_interval": {{uint32}} - "profile_id": {{uint16}} "chunk_size": {{uint32}} (OPTIONAL) "cache_size": {{uint32}} (OPTIONAL) ``` @@ -311,40 +354,43 @@ STREAM_TELEMETRY_PROFILE:{{profile_name}} ``` key = STREAM_TELEMETRY_PROFILE:profile_name a string as the identifier of stream telemetry ; field = value -stream_status = enable/disable ; Enable/Disable stream. +stream_state = enabled/disabled ; Enabled/Disabled stream. poll_interval = uint32 ; The interval to poll counter, unit milliseconds. -profile_id = uint16 ; A numeric identifier of stream telemetry. The range is 256-65535. chunk_size = uint32 ; number of stats groups in a telemetry message. cache_size = uint32 ; number of chunks that can be cached. ``` -##### STREAM_TELEMETRY_GROUP ##### +#### STREAM_TELEMETRY_GROUP ``` -STREAM_TELEMETRY_GROUP:{{group_name}}:{{profile_name}} +STREAM_TELEMETRY_GROUP|{{profile_name}}|{{group_name}} "object_names": {{list of object name}} "object_counters": {{list of stats of object}} ``` ``` key = STREAM_TELEMETRY_GROUP:group_name:profile_name - ; group_name is the object type, like PORT, QUEUE or INGRESS_PRIORITY_GROUP. + ; group_name is the object type, like PORT, BUFFER_PG or BUFFER_POOL. ; Multiple groups can be bound to a same stream telemetry profile. ; field = value -object_names = list of object name - ; The object name in the group, like Ethernet0,Ethernet8. comma separated list. +object_names = A comma separated list of object name. + ; The syntax of object name is top_object_name|index_range. + ; The object_name is the object of the top level, like port, Ethernet0,Ethernet4. + ; The index range is the object in second level, like priority group. + ; An example is Ethernet0|0,Ethernet4|3-4. 
object_counters = list of stats of object ; The stats name in the group. like SAI_PORT_STAT_IF_IN_OCTETS,SAI_PORT_STAT_IF_IN_UCAST_PKTS. ; comma separated list. ``` -#### StateDb #### +### StateDb -##### STREAM_TELEMETRY_SESSION ##### +#### STREAM_TELEMETRY_SESSION ``` -STREAM_TELEMETRY_SESSION:{{profile_name}} - "session_status": {{enable/disable}} +STREAM_TELEMETRY_SESSION|{{profile_name}} + "session_status": {{enabled/disabled}} + "session_type": {{ipfix}} "session_template": {{binary array}} ``` @@ -352,10 +398,11 @@ STREAM_TELEMETRY_SESSION:{{profile_name}} key = STREAM_TELEMETRY_SESSION:profile_name ; a string as the identifier of stream telemetry ; field = value session_status = enable/disable ; Enable/Disable stream. -session_template = binary array; The IPFIX template to interpret the message from netlink +session_type = ipfix ; Specified the session type. +session_template = binary array; The IPFIX template to interpret the message of this session. ``` -#### Work Flow +### Work Flow ``` mermaid @@ -424,31 +471,7 @@ sequenceDiagram ``` -This section covers the high level design of the feature/enhancement. This section covers the following points in detail. - - - Is it a built-in SONiC feature or a SONiC Application Extension? - - What are the modules and sub-modules that are modified for this design? - - What are the repositories that would be changed? - - Module/sub-module interfaces and dependencies. - - SWSS and Syncd changes in detail - - DB and Schema changes (APP_DB, ASIC_DB, COUNTERS_DB, LOGLEVEL_DB, CONFIG_DB, STATE_DB) - - Sequence diagram if required. 
- - Linux dependencies and interface - - Warm reboot requirements/dependencies - - Fastboot requirements/dependencies - - Scalability and performance requirements/impact - - Memory requirements - - Docker dependency - - Build dependency if any - - Management interfaces - SNMP, CLI, RestAPI, etc., - - Serviceability and Debug (logging, counters, trace etc) related design - - Is this change specific to any platform? Are there dependencies for platforms to implement anything to make this feature work? If yes, explain in detail and inform community in advance. - - SAI API requirements, CLI requirements, ConfigDB requirements. Design is covered in following sections. - -### SAI API ### - -This section covers the changes made or new API added in SAI API for implementing this feature. If there is no change in SAI API for HLD feature, it should be explicitly mentioned in this section. -This section should list the SAI APIs/objects used by the design so that silicon vendors can implement the required support in their SAI. Note that the SAI requirements should be discussed with SAI community during the design phase and ensure the required SAI support is implemented along with the feature/enhancement. 
+### SAI API ``` mermaid @@ -480,7 +503,7 @@ erDiagram SAI_TAM_REPORT_ATTR_REPORT_MODE SAI_TAM_REPORT_MODE_BULK SAI_TAM_REPORT_ATTR_REPORT_INTERVAL poll_interval "STREAM_TELEMETRY_PROFILE:profile_name[poll_interval] on Config DB" SAI_TAM_REPORT_ATTR_TEMPLATE_REPORT_INTERVAL _0 "Don't push the template, Because we hope the template can be proactively queried by orchagent" - SAI_TAM_REPORT_ATTR_REPORT_IPFIX_TEMPLATE_ID profile_id "STREAM_TELEMETRY_PROFILE:profile_name[profile_id] on Config DB" + SAI_TAM_REPORT_ATTR_REPORT_IPFIX_TEMPLATE_ID template_id "A unique id generated by stream telemetry orch" SAI_TAM_REPORT_ATTR_REPORT_INTERVAL_UNIT SAI_TAM_REPORT_INTERVAL_UNIT_MSEC } telemetry_type[TAM_telemetry_type] { @@ -508,15 +531,19 @@ erDiagram SAI_TAM_ATTR_TELEMETRY_OBJECTS_LIST sai_tam_telemetry_obj SAI_TAM_ATTR_TAM_BIND_POINT_TYPE_LIST SAI_TAM_BIND_POINT_TYPE_PORT } + switch[Switch] { + SAI_ID SAI_VALUE "Comments" + SAI_SWITCH_ATTR_TAM_OBJECT_ID sai_tam_obj + } - collector ||--|| hostif: binds - collector ||--|| transport: binds - telemetry_type ||--|| report: binds - telemetry ||..o| telemetry_type: binds - telemetry }o..|{ collector: binds - counter_subscription }|--|| telemetry_type: binds - TAM ||--|| telemetry: binds - + collector |o--|| hostif: binds + collector |o--|| transport: binds + telemetry_type |o--|| report: binds + telemetry |o--o| telemetry_type: binds + telemetry }o--o{ collector: binds + counter_subscription }o--|| telemetry_type: binds + TAM |o--o| telemetry: binds + switch |o..o{ TAM: binds ``` | Object Type | Scope | @@ -530,8 +557,7 @@ erDiagram | TAM_report | per STREAM_TELEMETRY profile | | TAM_counter_subscription | per stats of object | - -#### Create HOSTIF object #### +#### Create HOSTIF object ``` c++ @@ -554,7 +580,7 @@ create_hostif(sai_hostif_obj, switch_id, attr_count, sai_attr_list); ``` -#### Creating TAM transport object #### +#### Creating TAM transport object ``` c++ @@ -566,7 +592,7 @@ 
sai_create_tam_transport_fn(&sai_tam_transport_obj, switch_id, attr_count, sai_a ``` -#### Creating TAM collector object #### +#### Creating TAM collector object ``` c++ typedef enum _sai_tam_collector_attr_t @@ -608,7 +634,7 @@ sai_create_tam_collector_fn(&sai_tam_collector_obj, switch_id, attr_count, sai_a ``` -#### Creating TAM report object #### +#### Creating TAM report object ``` c++ /** @@ -666,7 +692,7 @@ sai_attr_list[3].id = SAI_TAM_REPORT_ATTR_TEMPLATE_REPORT_INTERVAL; sai_attr_list[3].value.s32 = 0; // Don't push the template, Because we hope the template can be proactively queried by orchagent sai_attr_list[4].id = SAI_TAM_REPORT_ATTR_REPORT_IPFIX_TEMPLATE_ID; -sai_attr_list[4].value.u16 = profile_id;// STREAM_TELEMETRY_PROFILE:profile_name[profile_id] on Config DB; +sai_attr_list[4].value.u16 = template_id;// A unique id generated by stream telemetry orch sai_attr_list[5].id = SAI_TAM_REPORT_ATTR_REPORT_INTERVAL_UNIT; sai_attr_list[5].value.s32 = SAI_TAM_REPORT_INTERVAL_UNIT_MSEC; @@ -676,7 +702,7 @@ sai_create_tam_report_fn(&sai_tam_report_obj, switch_id, attr_count, sai_attr_li ``` -#### Creating TAM telemetry type object #### +#### Creating TAM telemetry type object ``` c++ @@ -700,7 +726,7 @@ sai_create_tam_tel_type_fn(&sai_tam_tel_type_obj, switch_id, attr_count, sai_att ``` -#### Creating TAM telemetry object #### +#### Creating TAM telemetry object Extern TAM telemetry attributes in SAI @@ -809,10 +835,105 @@ sai_create_tam_telemetry_fn(&sai_tam_telemetry_obj, switch_id, attr_count, sai_a ``` -#### Create TAM counter subscription objects #### +#### Create TAM counter subscription objects Based on the STREAM_TELEMETRY_GROUP on Config DB, to create corresponding counter subscription objects. 
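The mapping from a `STREAM_TELEMETRY_GROUP` entry to counter subscriptions can be sketched as a cross product: one subscription per (object, stat) pair, matching the "per stats of object" scope in the table above. This is only an illustrative sketch. `Subscription` and `expand_group` are hypothetical names, not part of SAI or orchagent, and the real implementation would resolve each name to a SAI object ID and stat enum before calling the create API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pair type: (object name, stat name) for one counter subscription.
using Subscription = std::pair<std::string, std::string>;

// Expand one group into the list of (object, stat) pairs that each need
// their own TAM counter subscription object.
std::vector<Subscription> expand_group(const std::vector<std::string>& object_names,
                                       const std::vector<std::string>& object_counters) {
    std::vector<Subscription> out;
    out.reserve(object_names.size() * object_counters.size());
    for (const auto& obj : object_names) {
        for (const auto& stat : object_counters) {
            out.emplace_back(obj, stat);
        }
    }
    return out;
}

// Usage sketch: two ports and two stats yield four subscriptions.
void example_expand_group() {
    const auto subs = expand_group({"Ethernet0", "Ethernet4"},
                                   {"SAI_PORT_STAT_IF_IN_OCTETS", "SAI_PORT_STAT_IF_IN_UCAST_PKTS"});
    assert(subs.size() == 4);
}
```

The number of subscription objects therefore grows with the product of the two lists, which is worth keeping in mind when sizing a group.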
+Proposal a new subscription mode: OBJECT TYPE and Index + +``` c++ + +typedef enum _sai_tam_counter_subscription_type_t +{ + /** @brief Object based subscription */ + SAI_TAM_COUNTER_SUBSCRIPTION_OBJECT_ID_BASE, + + /** @brief Index based subscription */ + SAI_TAM_COUNTER_SUBSCRIPTION_OBJECT_INDEX_BASE, + +} sai_tam_counter_subscription_type_t; + +typedef enum _sai_tam_counter_subscription_attr_t +{ + + /** + * @brief Subscribed object + * + * @type sai_object_id_t + * @flags MANDATORY_ON_CREATE | CREATE_ONLY + * @objects SAI_OBJECT_TYPE_BUFFER_POOL, SAI_OBJECT_TYPE_INGRESS_PRIORITY_GROUP, SAI_OBJECT_TYPE_PORT, SAI_OBJECT_TYPE_QUEUE + * @validonly SAI_TAM_COUNTER_SUBSCRIPTION_TYPE == SAI_TAM_COUNTER_SUBSCRIPTION_OBJECT_ID_BASE + */ + SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_OBJECT_ID, + + /** + * @brief Subscribed stat enum + * + * @type sai_uint32_t + * @flags MANDATORY_ON_CREATE | CREATE_ONLY + */ + SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_STAT_ID, + + // ... + + /** + * @brief Tam telemetry reporting type + * + * @type sai_tam_reporting_type_t + * @flags MANDATORY_ON_CREATE | CREATE_ONLY + * @default SAI_TAM_COUNTER_SUBSCRIPTION_OBJECT_ID_BASE + */ + SAI_TAM_COUNTER_SUBSCRIPTION_TYPE, + + /** + * @brief Subscribed object + * + * @type sai_object_type_t + * @flags MANDATORY_ON_CREATE | CREATE_ONLY + * @validonly SAI_TAM_COUNTER_SUBSCRIPTION_TYPE == SAI_TAM_COUNTER_SUBSCRIPTION_OBJECT_INDEX_BASE + */ + SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_OBJECT_TYPE, + + /** + * @brief Subscribed object + * + * @type sai_uint32_t + * @flags MANDATORY_ON_CREATE | CREATE_ONLY + * @validonly SAI_TAM_COUNTER_SUBSCRIPTION_TYPE == SAI_TAM_COUNTER_SUBSCRIPTION_OBJECT_INDEX_BASE + */ + SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_OBJECT_INDEX, + +} sai_tam_counter_subscription_attr_t; + +``` + +- Index ID based + +``` c++ + +// Create counter subscription list + +sai_attr_list[0].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_TEL_TYPE; +sai_attr_list[0].value.oid = sai_tam_tel_type_obj; + +sai_attr_list[1].id = 
SAI_TAM_COUNTER_SUBSCRIPTION_TYPE; +sai_attr_list[1].value.s32 = SAI_TAM_COUNTER_SUBSCRIPTION_OBJECT_INDEX_BASE; + +sai_attr_list[2].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_OBJECT_TYPE; +sai_attr_list[2].value.s32 = SAI_OBJECT_TYPE_PORT; + +sai_attr_list[3].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_OBJECT_INDEX; +sai_attr_list[3].value.u32 = 2; // Calculate this index according to + + +attr_count = 4; + +create_tam_counter_subscription(&sai_tam_counter_subscription_obj, switch_id, attr_count, sai_attr_lis); +// If this stats of object cannot support this poll frequency, this API should return SAI_STATUS_NOT_SUPPORTED. +``` + +- Object ID based + ``` c++ // Create counter subscription list @@ -826,16 +947,13 @@ sai_attr_list[1].value.oid = port_obj; sai_attr_list[2].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_STAT_ID; sai_attr_list[2].value.oid = SAI_PORT_STAT_IF_IN_OCTETS; -sai_attr_list[3].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_LABEL; -sai_attr_list[3].value.oid = index; // The index of IPFIX template - -attr_count = 4; +attr_count = 3; create_tam_counter_subscription(&sai_tam_counter_subscription_obj, switch_id, attr_count, sai_attr_lis); // If this stats of object cannot support this poll frequency, this API should return SAI_STATUS_NOT_SUPPORTED. 
``` -#### Create TAM object #### +#### Create TAM object ``` c++ @@ -853,7 +971,7 @@ sai_create_tam_fn(&sai_tam_obj, switch_id, attr_count, sai_attr_list); ``` -#### Query IPFIX template #### +#### Query IPFIX template ``` c++ @@ -868,63 +986,96 @@ free(attr.value.u8list.list); ``` -#### Enable/Disable telemetry stream #### +#### Enable/Disable telemetry stream ``` c++ -sai_attribute_t attr; +sai_object_id_t obj_list[100] = { 0 }; +sai_attr.value.count = 0; + +sai_attribute_t sai_attr; +sai_attr.id = SAI_SWITCH_ATTR_TAM_OBJECT_ID; +sai_attr.value.oidlist = obj_list; +sai_attr.value.count = 0; + +get_switch_attribute(switch_id, 1, &sai_attr); // Enable telemetry stream -attr.id = SAI_TAM_TELEMETRY_ATTR_TAM_TYPE_LIST; -attr.value.objlist.count = 1; -attr.value.objlist.list[0] = sai_tam_tel_type_obj; +sai_attr.value.oidlist[sai_attr.value.count] = sai_tam_obj; +sai_attr.value.count++; // Disable telemetry stream -attr.id = SAI_TAM_TELEMETRY_ATTR_TAM_TYPE_LIST; -attr.value.objlist.count = 1; -attr.value.objlist.list[0] = sai_tam_tel_type_obj; +std::remove(sai_attr.value.oidlist, sai_attr.value.oidlist + sai_attr.value.count, sai_tam_obj); +sai_attr.value.count--; -get_tam_telemetry_attribute(&sai_tam_telemetry_obj, 1, &attr); +set_switch_attribute(switch_id, sai_attr) ``` -### Configuration and management -This section should have sub-sections for all types of configuration and management related design. Example sub-sections for "CLI" and "Config DB" are given below. Sub-sections related to data models (YANG, REST, gNMI, etc.,) should be added as required. -If there is breaking change which may impact existing platforms, please call out in the design and get platform vendors reviewed. +## Configuration and management + +### Manifest (if the feature is an Application Extension) + +N/A + +### CLI/YANG model Enhancements -#### Manifest (if the feature is an Application Extension) +#### Config CLI -Paste a preliminary manifest in a JSON format. 
+``` shell -#### CLI/YANG model Enhancements +# Add a new profile +sudo config stream_telemetry profile add $profile_name --stream_state=$stream_state --poll_interval=$poll_interval --chunk_size=$chunk_size --cache_size=$cache_size -This sub-section covers the addition/deletion/modification of CLI changes and YANG model changes needed for the feature in detail. If there is no change in CLI for HLD feature, it should be explicitly mentioned in this section. Note that the CLI changes should ensure downward compatibility with the previous/existing CLI. i.e. Users should be able to save and restore the CLI from previous release even after the new CLI is implemented. -This should also explain the CLICK and/or KLISH related configuration/show in detail. -https://github.com/sonic-net/sonic-utilities/blob/master/doc/Command-Reference.md needs be updated with the corresponding CLI change. +# Change stream state +sudo config stream_telemetry profile set $profile_name --stream_state=$stream_state + +# Remove a existing profile +sudo config stream_telemetry group "$profile|$group_name" --object_names="$object1,$object2" --object_counters="$object_counters1,$object_counters2" + +``` -#### Config DB Enhancements +#### Inspect stream CLI -This sub-section covers the addition/deletion/modification of config DB changes needed for the feature. If there is no change in configuration for HLD feature, it should be explicitly mentioned in this section. This section should also ensure the downward compatibility for the change. - -### Warmboot and Fastboot Design Impact -Mention whether this feature/enhancement has got any requirements/dependencies/impact w.r.t. warmboot and fastboot. Ensure that existing warmboot/fastboot feature is not affected due to this design and explain the same. 
+Fetch all counters on the stream-telemetry + +``` shell +sudo stream-telemetry $profile_name --json/--table --duration=$duration +``` + +#### YANG + +[sonic-stream-telemetry.yang](sonic-stream-telemetry.yang) + +### Config DB Enhancements + +[Config DB](#config-db) + +### Warmboot and Fastboot Design Impact + +Warmboot/fastboot support is not required. ### Memory Consumption -This sub-section covers the memory consumption analysis for the new feature: no memory consumption is expected when the feature is disabled via compilation and no growing memory consumption while feature is disabled by configuration. -### Restrictions/Limitations -### Testing Requirements/Design -Explain what kind of unit testing, system testing, regression testing, warmboot/fastboot testing, etc., -Ensure that the existing warmboot/fastboot requirements are met. For example, if the current warmboot feature expects maximum of 1 second or zero second data disruption, the same should be met even after the new feature/enhancement is implemented. Explain the same here. -Example sub-sections for unit test cases and system test cases are given below. +In addition to constant memory consumption, dynamic memory consumption can be adjusted by configuring the chunk size and cache size of the stream-telemetry profile table in the config DB. + +$Dynamic Memory Consumption_{bytes} = \sum_{Profile} ({Cache Size} \times {Chunk Size} \times 8_{bytes} \times \sum_{Group} ({Object Count} \times {Stat Count}))$ + +### Restrictions/Limitations + +[Requirements](#requirements) + +### Testing Requirements/Design + +#### Unit Test cases -#### Unit Test cases +- Test that the `STREAM_TELEMETRY_GROUP` can be correctly converted to the SAI objects and their corresponding SAI STAT IDs by the Orchagent. #### System Test cases -### Open/Action items - if any +- Test that the counter can be correctly monitored by the counter syncd. +- Test that the counter can be correctly fetched using the telemetry stream CLI. 
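The dynamic memory formula in the Memory Consumption section can be evaluated with a few lines of code. The sketch below is a direct transcription of that formula, assuming 8 bytes per counter as stated there. `Group`, `Profile` and `dynamic_memory_bytes` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical containers mirroring the terms of the formula.
struct Group {
    uint64_t object_count;  // number of monitored objects in the group
    uint64_t stat_count;    // number of stats per object
};

struct Profile {
    uint64_t cache_size;    // cache_size from STREAM_TELEMETRY_PROFILE
    uint64_t chunk_size;    // chunk_size from STREAM_TELEMETRY_PROFILE
    std::vector<Group> groups;
};

// sum over profiles of: cache_size * chunk_size * 8 bytes * sum over groups of (objects * stats)
uint64_t dynamic_memory_bytes(const std::vector<Profile>& profiles) {
    constexpr uint64_t kBytesPerCounter = 8;
    uint64_t total = 0;
    for (const auto& profile : profiles) {
        uint64_t counters = 0;
        for (const auto& group : profile.groups) {
            counters += group.object_count * group.stat_count;
        }
        total += profile.cache_size * profile.chunk_size * kBytesPerCounter * counters;
    }
    return total;
}

// Usage sketch: one profile, cache 10, chunk 2, two groups of 2 objects x 2 stats.
// 10 * 2 * 8 * (4 + 4) = 1280 bytes.
void example_dynamic_memory() {
    const std::vector<Profile> profiles = {Profile{10, 2, {Group{2, 2}, Group{2, 2}}}};
    assert(dynamic_memory_bytes(profiles) == 1280);
}
```

Because cache size and chunk size multiply, doubling either one doubles the dynamic footprint of a profile.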
- -NOTE: All the sections and sub-sections given above are mandatory in the design document. Users can add additional sections/sub-sections if required. +### Open/Action items - if any From b60811d08df77ddc6d40e2a378e15ef1a0ac2c68 Mon Sep 17 00:00:00 2001 From: Ze Gan Date: Tue, 10 Sep 2024 11:43:52 +0800 Subject: [PATCH 04/13] Remove subscription index Signed-off-by: Ze Gan --- doc/stream-telemetry/stream-telemetry-hld.md | 95 -------------------- 1 file changed, 95 deletions(-) diff --git a/doc/stream-telemetry/stream-telemetry-hld.md b/doc/stream-telemetry/stream-telemetry-hld.md index 59940ce719..b580e1a882 100644 --- a/doc/stream-telemetry/stream-telemetry-hld.md +++ b/doc/stream-telemetry/stream-telemetry-hld.md @@ -839,101 +839,6 @@ sai_create_tam_telemetry_fn(&sai_tam_telemetry_obj, switch_id, attr_count, sai_a Based on the STREAM_TELEMETRY_GROUP on Config DB, to create corresponding counter subscription objects. -Proposal a new subscription mode: OBJECT TYPE and Index - -``` c++ - -typedef enum _sai_tam_counter_subscription_type_t -{ - /** @brief Object based subscription */ - SAI_TAM_COUNTER_SUBSCRIPTION_OBJECT_ID_BASE, - - /** @brief Index based subscription */ - SAI_TAM_COUNTER_SUBSCRIPTION_OBJECT_INDEX_BASE, - -} sai_tam_counter_subscription_type_t; - -typedef enum _sai_tam_counter_subscription_attr_t -{ - - /** - * @brief Subscribed object - * - * @type sai_object_id_t - * @flags MANDATORY_ON_CREATE | CREATE_ONLY - * @objects SAI_OBJECT_TYPE_BUFFER_POOL, SAI_OBJECT_TYPE_INGRESS_PRIORITY_GROUP, SAI_OBJECT_TYPE_PORT, SAI_OBJECT_TYPE_QUEUE - * @validonly SAI_TAM_COUNTER_SUBSCRIPTION_TYPE == SAI_TAM_COUNTER_SUBSCRIPTION_OBJECT_ID_BASE - */ - SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_OBJECT_ID, - - /** - * @brief Subscribed stat enum - * - * @type sai_uint32_t - * @flags MANDATORY_ON_CREATE | CREATE_ONLY - */ - SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_STAT_ID, - - // ... 
- - /** - * @brief Tam telemetry reporting type - * - * @type sai_tam_reporting_type_t - * @flags MANDATORY_ON_CREATE | CREATE_ONLY - * @default SAI_TAM_COUNTER_SUBSCRIPTION_OBJECT_ID_BASE - */ - SAI_TAM_COUNTER_SUBSCRIPTION_TYPE, - - /** - * @brief Subscribed object - * - * @type sai_object_type_t - * @flags MANDATORY_ON_CREATE | CREATE_ONLY - * @validonly SAI_TAM_COUNTER_SUBSCRIPTION_TYPE == SAI_TAM_COUNTER_SUBSCRIPTION_OBJECT_INDEX_BASE - */ - SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_OBJECT_TYPE, - - /** - * @brief Subscribed object - * - * @type sai_uint32_t - * @flags MANDATORY_ON_CREATE | CREATE_ONLY - * @validonly SAI_TAM_COUNTER_SUBSCRIPTION_TYPE == SAI_TAM_COUNTER_SUBSCRIPTION_OBJECT_INDEX_BASE - */ - SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_OBJECT_INDEX, - -} sai_tam_counter_subscription_attr_t; - -``` - -- Index ID based - -``` c++ - -// Create counter subscription list - -sai_attr_list[0].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_TEL_TYPE; -sai_attr_list[0].value.oid = sai_tam_tel_type_obj; - -sai_attr_list[1].id = SAI_TAM_COUNTER_SUBSCRIPTION_TYPE; -sai_attr_list[1].value.s32 = SAI_TAM_COUNTER_SUBSCRIPTION_OBJECT_INDEX_BASE; - -sai_attr_list[2].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_OBJECT_TYPE; -sai_attr_list[2].value.s32 = SAI_OBJECT_TYPE_PORT; - -sai_attr_list[3].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_OBJECT_INDEX; -sai_attr_list[3].value.u32 = 2; // Calculate this index according to - - -attr_count = 4; - -create_tam_counter_subscription(&sai_tam_counter_subscription_obj, switch_id, attr_count, sai_attr_lis); -// If this stats of object cannot support this poll frequency, this API should return SAI_STATUS_NOT_SUPPORTED. 
-``` - -- Object ID based - ``` c++ // Create counter subscription list From f092315e9cb4e9b06a85b7fb2e6f6279d2d61e47 Mon Sep 17 00:00:00 2001 From: Ze Gan Date: Fri, 13 Sep 2024 20:27:40 +0800 Subject: [PATCH 05/13] Refactor STREAM_TELEMETRY_GROUP Signed-off-by: Ze Gan --- .../sonic-stream-telemetry.yang | 53 ++++++--- doc/stream-telemetry/stream-telemetry-hld.md | 16 +-- doc/stream-telemetry/stream_telemetry.json | 105 ++++++++++++++++++ 3 files changed, 152 insertions(+), 22 deletions(-) create mode 100644 doc/stream-telemetry/stream_telemetry.json diff --git a/doc/stream-telemetry/sonic-stream-telemetry.yang b/doc/stream-telemetry/sonic-stream-telemetry.yang index 39f4216e6f..52134bb090 100644 --- a/doc/stream-telemetry/sonic-stream-telemetry.yang +++ b/doc/stream-telemetry/sonic-stream-telemetry.yang @@ -5,6 +5,22 @@ module sonic-stream-telemetry { prefix sonic-stream-telemetry; + import sonic-port { + prefix port; + } + + import sonic-buffer-pool { + prefix bpl; + } + + import sonic-buffer-pg { + prefix bpg; + } + + import sonic-buffer-queue { + prefix bql; + } + container sonic-stream-telemetry { container STREAM_TELEMETRY_PROFILE { description "STREAM_TELEMETRY_PROFILE part of config_db.json"; @@ -19,19 +35,23 @@ module sonic-stream-telemetry { } leaf stream_state { + mandatory true; type string { pattern "enabled|disabled"; } } leaf poll_interval { + mandatory true; description "The interval to poll counter, unit milliseconds."; type uint32; } leaf chunk_size { - type uint32; - default 0; + type uint32 { + range "1..4294967295"; + } + default 1; } leaf cache_size { @@ -54,25 +74,30 @@ module sonic-stream-telemetry { // The table name of config db leaf group_name { - type string { - pattern "PORT|QUEUE|BUFFER_PG|BUFFER_POOL|BUFFER_QUEUE"; + type enumeration { + enum PORT; + enum BUFFER_POOL; + enum BUFFER_PG; + enum BUFFER_QUEUE; } } leaf-list object_names { - type string { - pattern "\w+(\|\d+(-\d+)?)?"; - error-message "Invalid object names"; - } - 
description "The object names to be monitored"; + type string; + must "( ../group_name = 'PORT' and current() = /port:sonic-port/port:PORT/port:PORT_LIST/port:name )" + + " or ( ../group_name = 'BUFFER_POOL' and current() = /bpl:sonic-buffer-pool/bpl:BUFFER_POOL/bpl:BUFFER_POOL_LIST/bpl:name )" + + " or ( ../group_name = 'BUFFER_PG' and substring-before(current(), '|') = /bpg:sonic-buffer-pg/bpg:BUFFER_PG/bpg:BUFFER_PG_LIST/bpg:port and re-match(substring-after(current(), '|'), '[0-9]+') )" + + " or ( ../group_name = 'BUFFER_QUEUE' and substring-before(current(), '|') = /bql:sonic-buffer-queue/bql:BUFFER_QUEUE/bql:BUFFER_QUEUE_LIST/bql:port and re-match(substring-after(current(), '|'), '[0-9]+') )"; } + must "count(object_names) > 0"; + leaf-list object_counters { - string { - pattern "SAI_[A-Z]+_STAT_([A-Z]+_)*[A-Z]+"; - error-message "Invalid STATS ID for SAI object"; - } - description "The SAI STATS ID"; + type string; + must "( ../group_name = 'PORT' and re-match(current(), 'IF_IN_OCTETS|IF_IN_UCAST_PKTS|IF_IN_DISCARDS|IF_IN_ERRORS|IN_CURR_OCCUPANCY_BYTES|IF_OUT_OCTETS|IF_OUT_DISCARDS|IF_OUT_ERRORS|IF_OUT_UCAST_PKTS|OUT_CURR_OCCUPANCY_BYTES') )" + + " or ( ../group_name = 'BUFFER_POOL' and re-match(current(), 'PACKETS|BYTES|DROPPED_PACKETS|CURR_OCCUPANCY_BYTES|WATERMARK_BYTES|WRED_ECN_MARKED_PACKETS') )" + + " or ( ../group_name = 'BUFFER_PG' and re-match(current(), 'PACKETS|BYTES|CURR_OCCUPANCY_BYTES|WATERMARK_BYTES|XOFF_ROOM_CURR_OCCUPANCY_BYTES|XOFF_ROOM_WATERMARK_BYTES|DROPPED_PACKETS') )" + + " or ( ../group_name = 'BUFFER_QUEUE' and re-match(current(), 'DROPPED_PACKETS|CURR_OCCUPANCY_BYTES|WATERMARK_BYTES|CURR_OCCUPANCY_BYTES|XOFF_ROOM_WATERMARK_BYTES') )"; } } diff --git a/doc/stream-telemetry/stream-telemetry-hld.md b/doc/stream-telemetry/stream-telemetry-hld.md index b580e1a882..3c088dee1e 100644 --- a/doc/stream-telemetry/stream-telemetry-hld.md +++ b/doc/stream-telemetry/stream-telemetry-hld.md @@ -1,6 +1,6 @@ # Stream telemetry high level design -## 
Table of Content ## +## Table of Content - [Revision](#revision) - [Scope](#scope) @@ -373,16 +373,16 @@ key = STREAM_TELEMETRY_GROUP:group_name:profile_name ; group_name is the object type, like PORT, BUFFER_PG or BUFFER_POOL. ; Multiple groups can be bound to a same stream telemetry profile. ; field = value -object_names = A comma separated list of object name. - ; The syntax of object name is top_object_name|index_range. +object_names = A list of object name. + ; The syntax of object name is top_object_name|index. ; The object_name is the object of the top level, like port, Ethernet0,Ethernet4. - ; The index range is the object in second level, like priority group. - ; An example is Ethernet0|0,Ethernet4|3-4. -object_counters = list of stats of object - ; The stats name in the group. like SAI_PORT_STAT_IF_IN_OCTETS,SAI_PORT_STAT_IF_IN_UCAST_PKTS. - ; comma separated list. + ; The index indicates the object in second level, like priority group. + ; An example is Ethernet0|0,Ethernet4|3. +object_counters = A list of stats of object; ``` +For the schema of `STREAM_TELEMETRY_GROUP`, please refer to its [YANG model](sonic-stream-telemetry.yang). 
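To make the `top_object_name|index` syntax of `object_names` concrete, the following sketch splits one entry into its top-level name and optional second-level index. It is an illustration under stated assumptions, not the orchagent implementation. `ObjectName` and `parse_object_name` are hypothetical names, and the sketch assumes the index is a plain decimal number (ranges such as `3-4` are rejected).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical parsed form of one object_names entry.
struct ObjectName {
    std::string top;     // top-level object, e.g. "Ethernet4"
    bool has_index;      // true for "Ethernet4|3", false for "Ethernet0"
    uint32_t index;      // second-level index, e.g. priority group or queue number
};

// Returns false for malformed entries: empty name, empty index, or non-digit index.
bool parse_object_name(const std::string& entry, ObjectName& out) {
    const std::string::size_type bar = entry.find('|');
    if (bar == std::string::npos) {
        if (entry.empty()) return false;
        out = ObjectName{entry, false, 0};
        return true;
    }
    const std::string top = entry.substr(0, bar);
    const std::string idx = entry.substr(bar + 1);
    // At most 9 digits, so the value always fits in uint32_t.
    if (top.empty() || idx.empty() || idx.size() > 9) return false;
    uint32_t value = 0;
    for (char c : idx) {
        if (c < '0' || c > '9') return false;
        value = value * 10 + static_cast<uint32_t>(c - '0');
    }
    out = ObjectName{top, true, value};
    return true;
}

// Usage sketch with the example entries from the schema description.
void example_parse_object_name() {
    ObjectName name;
    assert(parse_object_name("Ethernet4|3", name));
    assert(name.top == "Ethernet4" && name.has_index && name.index == 3);
}
```

Keeping the index optional lets the same parser serve `PORT` groups (name only) and `BUFFER_PG` or `BUFFER_QUEUE` groups (name plus index).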
+ ### StateDb #### STREAM_TELEMETRY_SESSION diff --git a/doc/stream-telemetry/stream_telemetry.json b/doc/stream-telemetry/stream_telemetry.json new file mode 100644 index 0000000000..9c704d5011 --- /dev/null +++ b/doc/stream-telemetry/stream_telemetry.json @@ -0,0 +1,105 @@ +{ + "STREAM_TELEMETRY_VALID_CASE": { + "sonic-port:sonic-port": { + "sonic-port:PORT": { + "PORT_LIST": [ + { + "name": "Ethernet0", + "lanes": "0", + "speed": 25000 + }, + { + "name": "Ethernet4", + "lanes": "4", + "speed": 25000 + } + ] + } + }, + "sonic-buffer-pool:sonic-buffer-pool": { + "sonic-buffer-pool:BUFFER_POOL": { + "BUFFER_POOL_LIST": [ + { + "name": "egress_lossless_pool", + "mode": "static", + "size": "300", + "type": "ingress" + } + ] + } + }, + "sonic-buffer-profile:sonic-buffer-profile": { + "sonic-buffer-profile:BUFFER_PROFILE": { + "BUFFER_PROFILE_LIST": [ + { + "name": "lossless_buffer_profile", + "size": "1518", + "dynamic_th": "2", + "pool": "egress_lossless_pool" + } + ] + } + }, + "sonic-buffer-pg:sonic-buffer-pg": { + "sonic-buffer-pg:BUFFER_PG": { + "BUFFER_PG_LIST": [ + { + "port": "Ethernet4", + "pg_num": "3", + "profile": "lossless_buffer_profile" + } + ] + } + }, + "sonic-buffer-queue:sonic-buffer-queue": { + "sonic-buffer-queue:BUFFER_QUEUE": { + "BUFFER_QUEUE_LIST": [ + { + "port": "Ethernet0", + "qindex": "15", + "profile": "lossless_buffer_profile" + } + ] + } + }, + "sonic-stream-telemetry:sonic-stream-telemetry": { + "sonic-stream-telemetry:STREAM_TELEMETRY_PROFILE": { + "STREAM_TELEMETRY_PROFILE_LIST": [ + { + "name": "high_frequency_counters", + "stream_state": "enabled", + "poll_interval": 100 + } + ] + }, + "sonic-stream-telemetry:STREAM_TELEMETRY_GROUP": { + "STREAM_TELEMETRY_GROUP_LIST": [ + { + "profile_name": "high_frequency_counters", + "group_name": "PORT", + "object_names": ["Ethernet0", "Ethernet4"], + "object_counters": ["IF_IN_OCTETS", "OUT_CURR_OCCUPANCY_BYTES"] + }, + { + "profile_name": "high_frequency_counters", + "group_name": 
"BUFFER_POOL", + "object_names": ["egress_lossless_pool"], + "object_counters": ["PACKETS", "WRED_ECN_MARKED_PACKETS"] + }, + { + "profile_name": "high_frequency_counters", + "group_name": "BUFFER_PG", + "object_names": ["Ethernet4|0", "Ethernet4|3"], + "object_counters": ["CURR_OCCUPANCY_BYTES", "XOFF_ROOM_WATERMARK_BYTES"] + }, + { + "profile_name": "high_frequency_counters", + "group_name": "BUFFER_QUEUE", + "object_names": ["Ethernet0|15", "Ethernet0|3"], + "object_counters": ["WATERMARK_BYTES", "XOFF_ROOM_WATERMARK_BYTES"] + } + ] + } + } + } + } From 6c2888eb527849e0d98a0693b9d979330ebb43ff Mon Sep 17 00:00:00 2001 From: Ze Gan Date: Thu, 19 Sep 2024 20:52:27 +0800 Subject: [PATCH 06/13] Refactor SAI for hostif trap Signed-off-by: Ze Gan --- doc/stream-telemetry/stream-telemetry-hld.md | 129 +++++++++++++------ 1 file changed, 88 insertions(+), 41 deletions(-) diff --git a/doc/stream-telemetry/stream-telemetry-hld.md b/doc/stream-telemetry/stream-telemetry-hld.md index 3c088dee1e..80cf8d10bc 100644 --- a/doc/stream-telemetry/stream-telemetry-hld.md +++ b/doc/stream-telemetry/stream-telemetry-hld.md @@ -25,14 +25,17 @@ - [STREAM\_TELEMETRY\_SESSION](#stream_telemetry_session) - [Work Flow](#work-flow) - [SAI API](#sai-api) - - [Create HOSTIF object](#create-hostif-object) + - [Creating HOSTIF object](#creating-hostif-object) + - [Creating HOSTIF trap group](#creating-hostif-trap-group) + - [Creating HOSTIF user defined trap](#creating-hostif-user-defined-trap) + - [Creating Hostif table entry](#creating-hostif-table-entry) - [Creating TAM transport object](#creating-tam-transport-object) - [Creating TAM collector object](#creating-tam-collector-object) - [Creating TAM report object](#creating-tam-report-object) - [Creating TAM telemetry type object](#creating-tam-telemetry-type-object) - [Creating TAM telemetry object](#creating-tam-telemetry-object) - - [Create TAM counter subscription objects](#create-tam-counter-subscription-objects) - - [Create TAM 
object](#create-tam-object) + - [Creating TAM counter subscription objects](#creating-tam-counter-subscription-objects) + - [Creating TAM object](#creating-tam-object) - [Query IPFIX template](#query-ipfix-template) - [Enable/Disable telemetry stream](#enabledisable-telemetry-stream) - [Configuration and management](#configuration-and-management) @@ -152,7 +155,7 @@ The `counter syncd` is a new module that runs within the GNMI container. Its pri constants: stream_telemetry: genl_family: "sonic_stel" - genl_multicast_group: "stats" + genl_multicast_group: "ipfix" } ``` @@ -427,7 +430,7 @@ sequenceDiagram participant netlink_module as Netlink module end participant asic as ASIC - + counter --> counter: Initialize genetlink config_db ->> st_orch: STREAM_TELEMETRY_PROFILE opt Is the first telemetry profile? @@ -479,13 +482,29 @@ sequenceDiagram title: Stream Telemetry SAI Objects --- erDiagram + hostif_trap_group [HOSTIF_trap_group] { + SAI_ID SAI_VALUE "Comments" + } hostif[HOSTIF] { SAI_ID SAI_VALUE "Comments" SAI_HOSTIF_ATTR_TYPE SAI_HOSTIF_TYPE_GENETLINK SAI_HOSTIF_ATTR_OPER_STATUS true SAI_HOSTIF_ATTR_NAME sonic_stel "constant variables" - SAI_HOSTIF_ATTR_GENETLINK_MCGRP_NAME stats "constant variables" + SAI_HOSTIF_ATTR_GENETLINK_MCGRP_NAME ipfix "constant variables" + } + host_table_entry [HOSTIF_table_entry] { + SAI_ID SAI_VALUE "Comments" + SAI_HOSTIF_TABLE_ENTRY_ATTR_TYPE SAI_HOSTIF_TABLE_ENTRY_TYPE_TRAP_ID + SAI_HOSTIF_TABLE_ENTRY_ATTR_TRAP_ID sai_hostif_udt_obj + SAI_HOSTIF_TABLE_ENTRY_ATTR_CHANNEL_TYPE SAI_HOSTIF_TABLE_ENTRY_CHANNEL_TYPE_GENETLINK + SAI_HOSTIF_TABLE_ENTRY_ATTR_HOST_IF sai_hostif_obj + } + hostif_trap [HostIF_user_defined_trap] { + SAI_ID SAI_VALUE "Comments" + SAI_HOSTIF_USER_DEFINED_TRAP_ATTR_TYPE SAI_HOSTIF_USER_DEFINED_TRAP_TYPE_TAM + SAI_HOSTIF_USER_DEFINED_TRAP_ATTR_TRAP_GROUP sai_trap_group_obj } + transport[TAM_transport] { SAI_ID SAI_VALUE "Comments" SAI_TAM_TRANSPORT_ATTR_TRANSPORT_TYPE SAI_TAM_TRANSPORT_TYPE_NONE @@ -494,7 +513,7 
@@ erDiagram SAI_ID SAI_VALUE "Comments" SAI_TAM_COLLECTOR_ATTR_TRANSPORT sai_tam_transport_obj SAI_TAM_COLLECTOR_ATTR_LOCALHOST true - SAI_TAM_COLLECTOR_ATTR_HOSTIF sai_hostif_obj + SAI_TAM_COLLECTOR_ATTR_HOSTIF sai_hostif_udt_obj SAI_TAM_COLLECTOR_ATTR_DSCP_VALUE _0 } report[TAM_report] { @@ -536,7 +555,10 @@ erDiagram SAI_SWITCH_ATTR_TAM_OBJECT_ID sai_tam_obj } - collector |o--|| hostif: binds + host_table_entry |o--|| hostif: binds + host_table_entry |o--|| hostif_trap: binds + hostif_trap |o--|| hostif_trap_group: binds + collector |o--|| hostif_trap: binds collector |o--|| transport: binds telemetry_type |o--|| report: binds telemetry |o--o| telemetry_type: binds @@ -549,6 +571,9 @@ erDiagram | Object Type | Scope | | ------------------------ | ---------------------------- | | HOSTIF | Global | +| HOSTIF_trap_group | Global | +| HostIF_user_defined_trap | Global | +| HOSTIF_table_entry | Global | | TAM_transport | Global | | TAM_collector | Global | | TAM | per STREAM_TELEMETRY profile | @@ -557,7 +582,7 @@ erDiagram | TAM_report | per STREAM_TELEMETRY profile | | TAM_counter_subscription | per stats of object | -#### Create HOSTIF object +#### Creating HOSTIF object ``` c++ @@ -573,48 +598,70 @@ strncpy(sai_attr_list[2].value.chardata, "sonic_stel", strlen("sonic_stel") + 1) // Set genetlink group sai_attr_list[3].id = SAI_HOSTIF_ATTR_GENETLINK_MCGRP_NAME; -strncpy(sai_attr_list[3].value.chardata, "stats", strlen("stats") + 1); +strncpy(sai_attr_list[3].value.chardata, "ipfix", strlen("ipfix") + 1); attr_count = 4; create_hostif(sai_hostif_obj, switch_id, attr_count, sai_attr_list); ``` -#### Creating TAM transport object +#### Creating HOSTIF trap group ``` c++ -sai_attr_list[0].id = SAI_TAM_TRANSPORT_ATTR_TRANSPORT_TYPE; -sai_attr_list[0].value.s32 = SAI_TAM_TRANSPORT_TYPE_NONE; +create_hostif_trap_group(sai_trap_group_obj, switch_id, 0, NULL); -attr_count = 1; -sai_create_tam_transport_fn(&sai_tam_transport_obj, switch_id, attr_count, sai_attr_list); +``` 
+ +#### Creating HOSTIF user defined trap + +``` c++ + +sai_attr_list[0].id = SAI_HOSTIF_USER_DEFINED_TRAP_ATTR_TYPE; +sai_attr_list[0].value.s32 = SAI_HOSTIF_USER_DEFINED_TRAP_TYPE_TAM; + +sai_attr_list[1].id = SAI_HOSTIF_USER_DEFINED_TRAP_ATTR_TRAP_GROUP; +sai_attr_list[1].value.oid = sai_trap_group_obj; + +attr_count = 2; +sai_create_hostif_user_defined_trap_fn(&sai_hostif_udt_obj, switch_id, attr_count, sai_attr_list); ``` -#### Creating TAM collector object +#### Creating Hostif table entry ``` c++ -typedef enum _sai_tam_collector_attr_t -{ - // ... - /** - * @brief Hostif object used to reach local host via GENETLINK - * - * @type sai_object_id_t - * @flags CREATE_AND_SET - * @objects SAI_OBJECT_TYPE_HOSTIF - * @allownull true - * @default SAI_NULL_OBJECT_ID - * @validonly SAI_TAM_COLLECTOR_ATTR_LOCALHOST == true - */ - SAI_TAM_COLLECTOR_ATTR_HOSTIF, +sai_attr_list[0].id = SAI_HOSTIF_TABLE_ENTRY_ATTR_TYPE; +sai_attr_list[0].value.s32 = SAI_HOSTIF_TABLE_ENTRY_TYPE_TRAP_ID; + +sai_attr_list[1].id = SAI_HOSTIF_TABLE_ENTRY_ATTR_TRAP_ID; +sai_attr_list[1].value.oid = sai_hostif_udt_obj; + +sai_attr_list[2].id = SAI_HOSTIF_TABLE_ENTRY_ATTR_CHANNEL_TYPE; +sai_attr_list[2].value.s32 = SAI_HOSTIF_TABLE_ENTRY_CHANNEL_TYPE_GENETLINK; + +sai_attr_list[3].id = SAI_HOSTIF_TABLE_ENTRY_ATTR_HOST_IF; +sai_attr_list[3].value.oid = sai_hostif_obj; + +attr_count = 4; +sai_create_hostif_table_entry_fn(&sai_hostif_table_entry_obj, switch_id, attr_count, sai_attr_list); - // ... 
-} sai_tam_collector_attr_t; ``` +#### Creating TAM transport object + +``` c++ + +sai_attr_list[0].id = SAI_TAM_TRANSPORT_ATTR_TRANSPORT_TYPE; +sai_attr_list[0].value.s32 = SAI_TAM_TRANSPORT_TYPE_NONE; +attr_count = 1; +sai_create_tam_transport_fn(&sai_tam_transport_obj, switch_id, attr_count, sai_attr_list); + +``` + +#### Creating TAM collector object + ``` c++ sai_attr_list[0].id = SAI_TAM_COLLECTOR_ATTR_TRANSPORT; @@ -623,8 +670,8 @@ sai_attr_list[0].value.oid = sai_tam_transport_obj; sai_attr_list[1].id = SAI_TAM_COLLECTOR_ATTR_LOCALHOST; sai_attr_list[1].value.booldata = true; -sai_attr_list[2].id = SAI_TAM_COLLECTOR_ATTR_HOSTIF; -sai_attr_list[2].value.oid = sai_hostif_obj; +sai_attr_list[2].id = SAI_TAM_COLLECTOR_ATTR_HOSTIF_TRAP; +sai_attr_list[2].value.oid = sai_hostif_udt_obj; sai_attr_list[3].id = SAI_TAM_COLLECTOR_ATTR_DSCP_VALUE; sai_attr_list[3].value.u8 = 0; @@ -659,7 +706,7 @@ typedef enum _sai_tam_report_attr_t SAI_TAM_REPORT_ATTR_REPORT_IPFIX_TEMPLATE_ID, /** - * @brief query IPFIX template + * @brief query IPFIX template * * Return the IPFIX template binary buffer * @@ -835,7 +882,7 @@ sai_create_tam_telemetry_fn(&sai_tam_telemetry_obj, switch_id, attr_count, sai_a ``` -#### Create TAM counter subscription objects +#### Creating TAM counter subscription objects Based on the STREAM_TELEMETRY_GROUP on Config DB, to create corresponding counter subscription objects. @@ -858,19 +905,19 @@ create_tam_counter_subscription(&sai_tam_counter_subscription_obj, switch_id, at // If this stats of object cannot support this poll frequency, this API should return SAI_STATUS_NOT_SUPPORTED. 
``` -#### Create TAM object +#### Creating TAM object ``` c++ sai_attr_list[0].id = SAI_TAM_ATTR_TELEMETRY_OBJECTS_LIST; sai_attr_list[0].value.objlist.count = 1; sai_attr_list[0].value.objlist.list[0] = sai_tam_telemetry_obj; - + sai_attr_list[1].id = SAI_TAM_ATTR_TAM_BIND_POINT_TYPE_LIST; sai_attr_list[1].value.objlist.count = 2; sai_attr_list[1].value.objlist.list[0] = SAI_TAM_BIND_POINT_TYPE_PORT; -sai_attr_list[1].value.objlist.list[0] = SAI_TAM_BIND_POINT_TYPE_QUEUE; - +sai_attr_list[1].value.objlist.list[1] = SAI_TAM_BIND_POINT_TYPE_QUEUE; + attr_count = 2; sai_create_tam_fn(&sai_tam_obj, switch_id, attr_count, sai_attr_list); @@ -925,7 +972,7 @@ set_switch_attribute(switch_id, sai_attr) N/A -### CLI/YANG model Enhancements +### CLI/YANG model Enhancements #### Config CLI From 748d68b308564c402e31f41926afa129099a5e39 Mon Sep 17 00:00:00 2001 From: Ze Gan Date: Fri, 11 Oct 2024 14:38:34 +0800 Subject: [PATCH 07/13] remove useless content Signed-off-by: Ze Gan --- doc/stream-telemetry/stream-telemetry-hld.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/doc/stream-telemetry/stream-telemetry-hld.md b/doc/stream-telemetry/stream-telemetry-hld.md index 80cf8d10bc..3271ea2d30 100644 --- a/doc/stream-telemetry/stream-telemetry-hld.md +++ b/doc/stream-telemetry/stream-telemetry-hld.md @@ -141,8 +141,6 @@ flowchart BT counter_syncd --telemetry message--> gnmi ``` -STATE_DB channel model? 
Produce Table/Consume Table - ## High-Level Design ### Modules From 9e38bd9e51b2a6c0ab6af6d00b5d9a11f24ba2b2 Mon Sep 17 00:00:00 2001 From: Ze Gan Date: Fri, 11 Oct 2024 19:01:37 +0800 Subject: [PATCH 08/13] Add object label in SAI Signed-off-by: Ze Gan --- doc/stream-telemetry/stream-telemetry-hld.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/doc/stream-telemetry/stream-telemetry-hld.md b/doc/stream-telemetry/stream-telemetry-hld.md index 3271ea2d30..9d2ab8f5de 100644 --- a/doc/stream-telemetry/stream-telemetry-hld.md +++ b/doc/stream-telemetry/stream-telemetry-hld.md @@ -541,7 +541,7 @@ erDiagram SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_TEL_TYPE sai_tam_tel_type_obj SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_OBJECT_ID port_obj SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_STAT_ID SAI_PORT_STAT_IF_IN_OCTETS "A stats in sai_port_stat_t" - SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_LABEL index "The index of IPFIX template" + SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_LABEL index "Element ID of the object in the IPFIX template" } TAM[TAM] { SAI_ID SAI_VALUE "Comments" @@ -897,7 +897,10 @@ sai_attr_list[1].value.oid = port_obj; sai_attr_list[2].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_STAT_ID; sai_attr_list[2].value.oid = SAI_PORT_STAT_IF_IN_OCTETS; -attr_count = 3; +sai_attr_list[3].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_LABEL; +sai_attr_list[3].value.oid = index; // Element ID of the object in the IPFIX template + +attr_count = 4; create_tam_counter_subscription(&sai_tam_counter_subscription_obj, switch_id, attr_count, sai_attr_lis); // If this stats of object cannot support this poll frequency, this API should return SAI_STATUS_NOT_SUPPORTED. 
From 4658b7cc84c0856936d6d15389acf2f26a165389 Mon Sep 17 00:00:00 2001 From: Ze Gan Date: Fri, 11 Oct 2024 19:30:25 +0800 Subject: [PATCH 09/13] Move gnmi to opentelemetry Signed-off-by: Ze Gan --- doc/stream-telemetry/stream-telemetry-hld.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/doc/stream-telemetry/stream-telemetry-hld.md b/doc/stream-telemetry/stream-telemetry-hld.md index 9d2ab8f5de..822ff98618 100644 --- a/doc/stream-telemetry/stream-telemetry-hld.md +++ b/doc/stream-telemetry/stream-telemetry-hld.md @@ -101,8 +101,8 @@ flowchart BT end subgraph SONiC service - subgraph GNMI container - gnmi(GNMI server) + subgraph OpenTelemetry container + otel(OpenTelemetry Collector) counter_syncd(Counter Syncd) end subgraph SWSS container @@ -138,7 +138,7 @@ flowchart BT asic --counters--> dma_engine dma_engine --IPFIX record--> netlink_module netlink_module --IPFIX record--> counter_syncd - counter_syncd --telemetry message--> gnmi + counter_syncd -- open telemetry message --> otel ``` ## High-Level Design @@ -147,7 +147,7 @@ flowchart BT #### Counter Syncd -The `counter syncd` is a new module that runs within the GNMI container. Its primary responsibility is to receive counter messages via netlink and convert them into GNMI messages for an external collector. It subscribes to a socket of a specific family and multicast group of generic netlink. The configuration for generic netlink is defined as constants in `/etc/sonic/constants.yml` as follows. +The `counter syncd` is a new module that runs within the OpenTelemetry container. Its primary responsibility is to receive counter messages via netlink and convert them into open telemetry messages for a collector. It subscribes to a socket of a specific family and multicast group of generic netlink. The configuration for generic netlink is defined as constants in `/etc/sonic/constants.yml` as follows. 
``` yaml constants: @@ -413,8 +413,8 @@ sequenceDiagram participant config_db as CONFIG_DB participant state_db as STATE_DB end - box GNMI container - participant gnmi as gnmi server + box OpenTelemetry container + participant otel as OpenTelemetry Collector participant counter as counter syncd end box SWSS container @@ -464,7 +464,7 @@ sequenceDiagram end loop Receive IPFIX message of stats from genetlink alt Have this template of IPFIX been registered? - counter ->> gnmi: Push message to GNMI server + counter ->> otel: Push message to OpenTelemetry Collector else counter ->> counter: Discard this message end From a2bdf0e94351def9ded31abc8eb74fd82e5f9a11 Mon Sep 17 00:00:00 2001 From: Ze Gan Date: Sat, 12 Oct 2024 11:03:17 +0800 Subject: [PATCH 10/13] Update SAI to follow convention Signed-off-by: Ze Gan --- doc/stream-telemetry/stream-telemetry-hld.md | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/doc/stream-telemetry/stream-telemetry-hld.md b/doc/stream-telemetry/stream-telemetry-hld.md index 822ff98618..53281c2eba 100644 --- a/doc/stream-telemetry/stream-telemetry-hld.md +++ b/doc/stream-telemetry/stream-telemetry-hld.md @@ -698,19 +698,19 @@ typedef enum _sai_tam_report_attr_t * * @type sai_uint16_t * @flags CREATE_AND_SET + * @isvlan false * @default 0 * @validonly SAI_TAM_REPORT_ATTR_TYPE == SAI_TAM_REPORT_TYPE_IPFIX */ SAI_TAM_REPORT_ATTR_REPORT_IPFIX_TEMPLATE_ID, /** - * @brief query IPFIX template + * @brief Query IPFIX template * * Return the IPFIX template binary buffer * - * @type sai_uint8_list_t + * @type sai_u8_list_t * @flags READ_ONLY - * @validonly SAI_TAM_REPORT_ATTR_TYPE == SAI_TAM_REPORT_TYPE_IPFIX */ SAI_TAM_REPORT_ATTR_IPFIX_TEMPLATE, @@ -777,6 +777,9 @@ Extern TAM telemetry attributes in SAI ``` c++ +/** + * @brief TAM reporting type + */ typedef enum _sai_tam_reporting_type_t { /** @@ -801,7 +804,7 @@ typedef enum _sai_tam_telemetry_attr_t * @type sai_tam_reporting_unit_t * @flags CREATE_AND_SET * 
@default SAI_TAM_REPORTING_UNIT_SEC - * @condition SAI_TAM_TELEMETRY_ATTR_REPORTING_TYPE == SAI_TAM_REPORTING_TYPE_TIME_BASED + * @validonly SAI_TAM_TELEMETRY_ATTR_TAM_REPORTING_TYPE == SAI_TAM_REPORTING_TYPE_TIME_BASED */ SAI_TAM_TELEMETRY_ATTR_TAM_REPORTING_UNIT, @@ -813,7 +816,7 @@ typedef enum _sai_tam_telemetry_attr_t * @type sai_uint32_t * @flags CREATE_AND_SET * @default 1 - * @condition SAI_TAM_TELEMETRY_ATTR_REPORTING_TYPE == SAI_TAM_REPORTING_TYPE_TIME_BASED + * @validonly SAI_TAM_TELEMETRY_ATTR_TAM_REPORTING_TYPE == SAI_TAM_REPORTING_TYPE_TIME_BASED */ SAI_TAM_TELEMETRY_ATTR_REPORTING_INTERVAL, @@ -824,7 +827,7 @@ typedef enum _sai_tam_telemetry_attr_t * @flags CREATE_AND_SET * @default SAI_TAM_REPORTING_TYPE_TIME_BASED */ - SAI_TAM_TELEMETRY_ATTR_REPORTING_TYPE, + SAI_TAM_TELEMETRY_ATTR_TAM_REPORTING_TYPE, /** * @brief Tam telemetry reporting chunk size @@ -835,7 +838,7 @@ typedef enum _sai_tam_telemetry_attr_t * @type sai_uint32_t * @flags CREATE_AND_SET * @default 1 - * @condition SAI_TAM_TELEMETRY_ATTR_REPORTING_TYPE == SAI_TAM_REPORTING_TYPE_COUNT_BASED + * @validonly SAI_TAM_TELEMETRY_ATTR_TAM_REPORTING_TYPE == SAI_TAM_REPORTING_TYPE_COUNT_BASED */ SAI_TAM_TELEMETRY_ATTR_REPORTING_CHUNK_SIZE, From 24fb36b566df7f532a414c907e41e1cdeb1d2335 Mon Sep 17 00:00:00 2001 From: Ze Gan Date: Mon, 21 Oct 2024 20:30:40 +0800 Subject: [PATCH 11/13] Add netlink message Signed-off-by: Ze Gan --- doc/stream-telemetry/stream-telemetry-hld.md | 30 ++++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/doc/stream-telemetry/stream-telemetry-hld.md b/doc/stream-telemetry/stream-telemetry-hld.md index 53281c2eba..ec1178f19a 100644 --- a/doc/stream-telemetry/stream-telemetry-hld.md +++ b/doc/stream-telemetry/stream-telemetry-hld.md @@ -17,6 +17,7 @@ - [IPFIX header](#ipfix-header) - [IPFIX template](#ipfix-template) - [IPFIX data](#ipfix-data) + - [Netlink message](#netlink-message) - [Bandwidth Estimation](#bandwidth-estimation) - [Config DB](#config-db) - 
[STREAM\_TELEMETRY\_PROFILE](#stream_telemetry_profile) @@ -325,6 +326,35 @@ packet-beta 832-895: "Port 3: SAI_PORT_STAT_IF_IN_ERRORS = 8" ``` +#### Netlink message + +We expect that all control messages and out-of-band information will be transmitted by the SAI. Therefore, we do not need to read the attribute header of netlink and message header of Genetlink from the socket. The sample code for building the message from the kernel side should look as follows: + +``` c + +struct genl_multicast_group stel_mcgrps[] = { + { .name = "ipfix" }, +}; + +// Family definition +static struct genl_family nl_bench_family = { + .name = "sonic_stel", + .version = 1, + // ... + .mcgrps = stel_mcgrps, + .n_mcgrps = ARRAY_SIZE(stel_mcgrps), +}; + + +void send_msg_to_user(int ipfix_msg_len, const void *ipfix_msg) +{ + struct sk_buff *skb_out = nlmsg_new(ipfix_msg_len, GFP_KERNEL); + nla_put_nohdr(skb_out, ipfix_msg_len, ipfix_msg); + genlmsg_multicast(&genl_family, skb_out, 0, 0/* group_id to ipfix group */, GFP_KERNEL); +} + +``` + ### Bandwidth Estimation We estimate the bandwidth based only on the effective data size, not the actual data size. The extra information in a message, such as the IPFIX header (16 bytes), data prefix (4 bytes), and observation time milliseconds (4 bytes), is negligible. For example, if we want to collect 30 stats on 64 ports, and the chunk size is 100: $The Percentage Of Effective Data = \frac{8 \times 30 \times 64 \times 100_{Effective Data}}{16_{Header} + 4 \times 100_{Data Prefix} + 4 \times 100_{Observation Time Milliseconds} + 8 \times 30 \times 64 \times 100_{Effective Data}} \approx 99.9\%$ . 
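As a rough check of the estimate above, the arithmetic can be written out step by step. This is only a worked instance of the formula in the preceding paragraph, using its own sizes (8-byte counters, a 16-byte IPFIX header, and 4 bytes each of data prefix and observation time per chunk entry):

$Effective Data = 8 \times 30 \times 64 \times 100 = 1536000$ bytes

$Overhead = 16 + 4 \times 100 + 4 \times 100 = 816$ bytes

$The Percentage Of Effective Data = \frac{1536000}{1536000 + 816} \approx 99.95\%$

The overhead is therefore about 0.05% of each message, which is consistent with treating it as negligible.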
From 02acf74c167fc9a7f7863d2f24aa4bd233b30b71 Mon Sep 17 00:00:00 2001 From: Ze Gan Date: Mon, 28 Oct 2024 15:26:15 +0800 Subject: [PATCH 12/13] Update from review Signed-off-by: Ze Gan --- .../netlink_dma_channel.drawio.svg | 2 +- doc/stream-telemetry/stream-telemetry-hld.md | 326 ++++++++++++------ 2 files changed, 212 insertions(+), 116 deletions(-) diff --git a/doc/stream-telemetry/netlink_dma_channel.drawio.svg b/doc/stream-telemetry/netlink_dma_channel.drawio.svg index 4bc2bdf783..73f5ef5013 100644 --- a/doc/stream-telemetry/netlink_dma_channel.drawio.svg +++ b/doc/stream-telemetry/netlink_dma_channel.drawio.svg @@ -1,4 +1,4 @@ -
[Figure text, previous netlink_dma_channel.drawio.svg: genetlink (family: sonic_stel, group: stats); Netlink Module; Counter Syncd; DMA Engine; ASIC; IPFIX header; observation time milliseconds; port 2 stats 1; port 8 stats 2; queue 1 stats 2; queue 5 stats 2; poll interval; chunk size; Registered IPFIX template (IDs 256, 257, 257); IPFIX Message; IPFIX parser; Ring buffer (size = cache size); Drop (if no template can be decided); Convert to GNMi message; GNMi message]
\ No newline at end of file +
[Figure text, updated netlink_dma_channel.drawio.svg: genetlink (family: sonic_stel, group: ipfix); Netlink Module; Counter Syncd; DMA Engine; ASIC; Netlink header; IPFIX Header; IPFIX recordings 1 to 5; observation time milliseconds; port 2 stats 1; port 8 stats 1; queue 1 stats 2; queue 5 stats 2; bulk count; Registered IPFIX template (IDs 256, 257, 258); IPFIX parser; Ring buffer (capability = cache count); Drop (if no template can be decided); Convert to OpenTelemetryMessage; OpenTelemetryMessage]
\ No newline at end of file diff --git a/doc/stream-telemetry/stream-telemetry-hld.md b/doc/stream-telemetry/stream-telemetry-hld.md index ec1178f19a..c3fd883391 100644 --- a/doc/stream-telemetry/stream-telemetry-hld.md +++ b/doc/stream-telemetry/stream-telemetry-hld.md @@ -20,12 +20,14 @@ - [Netlink message](#netlink-message) - [Bandwidth Estimation](#bandwidth-estimation) - [Config DB](#config-db) + - [DEVICE\_METADATA](#device_metadata) - [STREAM\_TELEMETRY\_PROFILE](#stream_telemetry_profile) - [STREAM\_TELEMETRY\_GROUP](#stream_telemetry_group) - [StateDb](#statedb) - [STREAM\_TELEMETRY\_SESSION](#stream_telemetry_session) - [Work Flow](#work-flow) - [SAI API](#sai-api) + - [Initialize TAM cache count for Switch](#initialize-tam-cache-count-for-switch) - [Creating HOSTIF object](#creating-hostif-object) - [Creating HOSTIF trap group](#creating-hostif-trap-group) - [Creating HOSTIF user defined trap](#creating-hostif-user-defined-trap) @@ -39,6 +41,7 @@ - [Creating TAM object](#creating-tam-object) - [Query IPFIX template](#query-ipfix-template) - [Enable/Disable telemetry stream](#enabledisable-telemetry-stream) + - [Query stream telemetry capability](#query-stream-telemetry-capability) - [Configuration and management](#configuration-and-management) - [Manifest (if the feature is an Application Extension)](#manifest-if-the-feature-is-an-application-extension) - [CLI/YANG model Enhancements](#cliyang-model-enhancements) @@ -85,7 +88,8 @@ The existing telemetry solution of SONiC relies on the syncd process to proactiv - The number of extension stats types for a single SAI object type should not exceed 32,768. - The number of SAI objects of the same type should not exceed 32,768. - The vendor SDK should support publishing stats in IPFIX format and its IPFIX template. -- If a polling frequency for stats cannot be supported, the vendor's SDK should report this error. +- If a polling frequency for stats cannot be supported, the vendor's SDK should return this error. 
+- The vendor SDK should support querying the minimal polling interval for each counter. - When reconfiguring any stream settings, whether it is the polling interval or the stats list, the existing stream will be interrupted and regenerated. ## Architecture Design @@ -215,7 +219,7 @@ packet-beta 32-47: "Template ID = > 256 configured" 48-63: "Number of Fields = 1 + Number of stats" 64-79: "Element ID=observationTimeNanoseconds (325)" -80-95: "Field length = 4 bytes" +80-95: "Field length = 8 bytes" 96-96: "1" 97-111: "Element ID = Object index for the stats 1" 112-127: "Field Length = 8 bytes" @@ -271,7 +275,7 @@ packet-beta #### IPFIX data -An IPFIX data message consists of two hierarchical levels: chunk and snapshots. A chunk contains multiple snapshots, and a snapshot is a binary block that can be interpreted using the IPFIX template mentioned above. +An IPFIX data message consists of a snapshot that is a binary block that can be interpreted using the IPFIX template mentioned above. The binary structure of a snapshot is as follows: @@ -283,17 +287,15 @@ title: A snapshot of IPFIX data packet-beta 0-15: "Set ID = Same as template ID" 16-31: "Set Length = (8 + Number of stats * 8) bytes" -32-63: "Rcord 1: observationTimeNanoseconds" -64-95: "Record 2: Stats 1" -96-127: "..." -128-159: "Record N + 1: Stats N" - +32-95: "Data 1: observationTimeNanoseconds" +96-127: "Data 2: Stats 1" +128-159: "..." +160-191: "Data N + 1: Stats N" ``` -- The chunk size can be configured via SAI. - The snapshot structure is derived from the IPFIX template, which is based on the stats we want to record. 
-Below is an example of an IPFIX message for the same stats record as the IPFIX template example, with a chunk size of 3: +Below is an example of an IPFIX message for the same stats record as the IPFIX template example: ``` mermaid @@ -308,27 +310,15 @@ packet-beta 96-127: "Observation Domain ID = 0" 128-143: "Set ID = 256" 144-159: "Set Length = 32 bytes" -160-191: "observationTimeNanoseconds = 10000" -192-255: "Port 1: SAI_PORT_STAT_IF_IN_ERRORS = 10" -256-319: "Port 2: SAI_PORT_STAT_IF_IN_ERRORS = 0" -320-383: "Port 3: SAI_PORT_STAT_IF_IN_ERRORS = 5" -384-399: "Set ID = 256" -400-415: "Set Length = 32 bytes" -416-447: "observationTimeNanoseconds = 20000" -448-511: "Port 1: SAI_PORT_STAT_IF_IN_ERRORS = 15" -512-575: "Port 2: SAI_PORT_STAT_IF_IN_ERRORS = 0" -576-639: "Port 3: SAI_PORT_STAT_IF_IN_ERRORS = 6" -640-655: "Set ID = 256" -656-671: "Set Length = 32 bytes" -672-703: "observationTimeNanoseconds = 30000" -704-767: "Port 1: SAI_PORT_STAT_IF_IN_ERRORS = 20" -768-831: "Port 2: SAI_PORT_STAT_IF_IN_ERRORS = 0" -832-895: "Port 3: SAI_PORT_STAT_IF_IN_ERRORS = 8" +160-223: "observationTimeNanoseconds = 10000" +224-287: "Port 1: SAI_PORT_STAT_IF_IN_ERRORS = 10" +288-351: "Port 2: SAI_PORT_STAT_IF_IN_ERRORS = 0" +352-415: "Port 3: SAI_PORT_STAT_IF_IN_ERRORS = 5" ``` #### Netlink message -We expect that all control messages and out-of-band information will be transmitted by the SAI. Therefore, we do not need to read the attribute header of netlink and message header of Genetlink from the socket. The sample code for building the message from the kernel side should look as follows: +We expect all control messages and out-of-band information to be transmitted by the SAI. Therefore, it is unnecessary to read the attribute header of netlink and the message header of Genetlink from the socket. Instead, we can insert a bulk of IPFIX recordings as the payload of the netlink message. 
The sample code for building the message from the kernel side is as follows: ``` c @@ -346,10 +336,20 @@ static struct genl_family nl_bench_family = { }; -void send_msg_to_user(int ipfix_msg_len, const void *ipfix_msg) +void send_msgs_to_user(/* ... */) { struct sk_buff *skb_out = nlmsg_new(ipfix_msg_len, GFP_KERNEL); - nla_put_nohdr(skb_out, ipfix_msg_len, ipfix_msg); + + for (size_t i = 0; i < bulk_count; i++) + { + struct ipfix *msg = ring_buffer.pop(); + if (msg == NULL) + { + break; + } + nla_append(skb_out, msg->data, msg->len); + } + genlmsg_multicast(&genl_family, skb_out, 0, 0/* group_id to ipfix group */, GFP_KERNEL); } @@ -357,9 +357,9 @@ void send_msg_to_user(int ipfix_msg_len, const void *ipfix_msg) ### Bandwidth Estimation -We estimate the bandwidth based only on the effective data size, not the actual data size. The extra information in a message, such as the IPFIX header (16 bytes), data prefix (4 bytes), and observation time milliseconds (4 bytes), is negligible. For example, if we want to collect 30 stats on 64 ports, and the chunk size is 100: $The Percentage Of Effective Data = \frac{8 \times 30 \times 64 \times 100_{Effective Data}}{16_{Header} + 4 \times 100_{Data Prefix} + 4 \times 100_{Observation Time Milliseconds} + 8 \times 30 \times 64 \times 100_{Effective Data}} \approx 99.9\%$ . +We estimate the bandwidth based only on the effective data size, not the actual data size. The extra information in a message, such as the IPFIX header (16 bytes), data prefix (4 bytes), and observation time milliseconds (8 bytes), is negligible. 
For example, an IPFIX message could include $The Maximal Number Of Counters In One Message = \frac{0xFFFF_{Max Length Bytes} - 16_{Header Bytes} - 4_{DataPrefix Bytes} - 8_{Observation Time Milliseconds Bytes}}{8_{bytes}} \approx 8188$, so $The Percentage Of Effective Data = \frac{0xFFFF_{Max Length Bytes} - 16_{Header Bytes} - 4_{DataPrefix Bytes} - 8_{Observation Time Milliseconds Bytes}} {0xFFFF_{Max Length Bytes}} \approx 99.9\%$ . -The following table is telemetry bandwidth of one cluster +The following table is an example of telemetry bandwidth of one cluster | # of stats per port | # of ports per switch | # of switch | frequency (us) | Total BW per switch(Mbps) | Total BW(Mbps) | | ------------------- | --------------------- | ----------- | -------------- | ------------------------- | -------------- | @@ -372,23 +372,28 @@ The following table is telemetry bandwidth of one cluster Any configuration changes in the config DB will interrupt the existing session and initiate a new one. +#### DEVICE_METADATA + +``` +DEVICE_METADATA|localhost + "stream_telemetry_cache_size": number of messages that can be cached. +``` + #### STREAM_TELEMETRY_PROFILE ``` STREAM_TELEMETRY_PROFILE|{{profile_name}} "stream_state": {{enabled/disabled}} "poll_interval": {{uint32}} - "chunk_size": {{uint32}} (OPTIONAL) - "cache_size": {{uint32}} (OPTIONAL) + "bulk_size": {{uint32}} (OPTIONAL) ``` ``` key = STREAM_TELEMETRY_PROFILE:profile_name a string as the identifier of stream telemetry ; field = value stream_state = enabled/disabled ; Enabled/Disabled stream. -poll_interval = uint32 ; The interval to poll counter, unit milliseconds. -chunk_size = uint32 ; number of stats groups in a telemetry message. -cache_size = uint32 ; number of chunks that can be cached. +poll_interval = uint32 ; The interval to poll counter, unit microseconds. 
+bulk_size = uint32 ; Defines the size of reporting bulk, which means TAM will report to the collector every time ``` #### STREAM_TELEMETRY_GROUP @@ -504,6 +509,8 @@ sequenceDiagram ### SAI API +The SAI logic for stream telemetry follows the existing SAI documentation: [Granular-Counter-Subscription.md](https://github.com/opencomputeproject/SAI/blob/master/doc/TAM/Granular-Counter-Subscription.md). + ``` mermaid --- @@ -550,7 +557,6 @@ erDiagram SAI_TAM_REPORT_ATTR_REPORT_MODE SAI_TAM_REPORT_MODE_BULK SAI_TAM_REPORT_ATTR_REPORT_INTERVAL poll_interval "STREAM_TELEMETRY_PROFILE:profile_name[poll_interval] on Config DB" SAI_TAM_REPORT_ATTR_TEMPLATE_REPORT_INTERVAL _0 "Don't push the template, Because we hope the template can be proactively queried by orchagent" - SAI_TAM_REPORT_ATTR_REPORT_IPFIX_TEMPLATE_ID template_id "A unique id generated by stream telemetry orch" SAI_TAM_REPORT_ATTR_REPORT_INTERVAL_UNIT SAI_TAM_REPORT_INTERVAL_UNIT_MSEC } telemetry_type[TAM_telemetry_type] { @@ -563,8 +569,7 @@ erDiagram SAI_TAM_TELEMETRY_ATTR_TAM_TYPE_LIST sai_tam_tel_type_obj SAI_TAM_TELEMETRY_ATTR_COLLECTOR_LIST sai_tam_collector_obj SAI_TAM_TELEMETRY_ATTR_REPORTING_TYPE SAI_TAM_REPORTING_TYPE_COUNT_BASED - SAI_TAM_TELEMETRY_ATTR_REPORTING_CHUNK_SIZE chunk_size "STREAM_TELEMETRY_PROFILE:profile_name[chunk_size] on Config DB" - SAI_TAM_TELEMETRY_ATTR_CACHE_SIZE cache_size "STREAM_TELEMETRY_PROFILE:profile_name[cache_size] on Config DB" + SAI_TAM_TELEMETRY_ATTR_REPORTING_BULK_SIZE bulk_count "STREAM_TELEMETRY_PROFILE:profile_name[bulk_count] on Config DB" } counter_subscription[TAM_counter_subscription] { SAI_ID SAI_VALUE "Comments" @@ -572,6 +577,7 @@ erDiagram SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_OBJECT_ID port_obj SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_STAT_ID SAI_PORT_STAT_IF_IN_OCTETS "A stats in sai_port_stat_t" SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_LABEL index "Element ID of the object in the IPFIX template" + SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_STATS_MODE SAI_STATS_MODE_READ "Set read 
counter mode" } TAM[TAM] { SAI_ID SAI_VALUE "Comments" @@ -581,6 +587,7 @@ erDiagram switch[Switch] { SAI_ID SAI_VALUE "Comments" SAI_SWITCH_ATTR_TAM_OBJECT_ID sai_tam_obj + SAI_SWITCH_ATTR_TAM_CACHE_COUNT cache_count } host_table_entry |o--|| hostif: binds @@ -610,6 +617,41 @@ erDiagram | TAM_report | per STREAM_TELEMETRY profile | | TAM_counter_subscription | per stats of object | +#### Initialize TAM cache count for Switch + +``` c++ +/** + * @brief Attribute Id in sai_set_switch_attribute() and + * sai_get_switch_attribute() calls. + */ +typedef enum _sai_switch_attr_t +{ + // ... + + /** + * @brief Tam telemetry cache count + * + * If the collector isn't ready to receive the report, this value indicates how many + * reports that can be cached. 0 means no cache which is the default behavior. + * + * @type sai_uint32_t + * @flags CREATE_ONLY + * @default 0 + */ + SAI_SWITCH_ATTR_TAM_CACHE_COUNT, + + // ... +} sai_switch_attr_t; + +sai_attr_list[0].id = SAI_SWITCH_ATTR_TAM_CACHE_COUNT; +sai_attr_list[0].value.u32 = cache_count; + +// ... + +create_switch(&gSwitchId, (uint32_t)sai_attr_list.size(), sai_attr_list.data()); + +``` + #### Creating HOSTIF object ``` c++ @@ -711,45 +753,6 @@ sai_create_tam_collector_fn(&sai_tam_collector_obj, switch_id, attr_count, sai_a #### Creating TAM report object -``` c++ -/** - * @brief Attributes for TAM report - */ -typedef enum _sai_tam_report_attr_t -{ - - // ... - - /** - * @brief Set ID for IPFIX template - * - * According to the IPFIX spec, the available range should be 256-65535. - * The value 0 means the ID will be decided by the vendor's SAI. 
- * - * @type sai_uint16_t - * @flags CREATE_AND_SET - * @isvlan false - * @default 0 - * @validonly SAI_TAM_REPORT_ATTR_TYPE == SAI_TAM_REPORT_TYPE_IPFIX - */ - SAI_TAM_REPORT_ATTR_REPORT_IPFIX_TEMPLATE_ID, - - /** - * @brief Query IPFIX template - * - * Return the IPFIX template binary buffer - * - * @type sai_u8_list_t - * @flags READ_ONLY - */ - SAI_TAM_REPORT_ATTR_IPFIX_TEMPLATE, - - // ... - -} sai_tam_report_attr_t; - -``` - ``` c++ sai_attr_list[0].id = SAI_TAM_REPORT_ATTR_TYPE; @@ -766,13 +769,10 @@ sai_attr_list[2].value.u32 = poll_interval; // STREAM_TELEMETRY_PROFILE:profile_ sai_attr_list[3].id = SAI_TAM_REPORT_ATTR_TEMPLATE_REPORT_INTERVAL; sai_attr_list[3].value.s32 = 0; // Don't push the template, Because we hope the template can be proactively queried by orchagent -sai_attr_list[4].id = SAI_TAM_REPORT_ATTR_REPORT_IPFIX_TEMPLATE_ID; -sai_attr_list[4].value.u16 = template_id;// A unique id generated by stream telemetry orch +sai_attr_list[4].id = SAI_TAM_REPORT_ATTR_REPORT_INTERVAL_UNIT; +sai_attr_list[4].value.s32 = SAI_TAM_REPORT_INTERVAL_UNIT_MSEC; -sai_attr_list[5].id = SAI_TAM_REPORT_ATTR_REPORT_INTERVAL_UNIT; -sai_attr_list[5].value.s32 = SAI_TAM_REPORT_INTERVAL_UNIT_MSEC; - -attr_count = 6; +attr_count = 5; sai_create_tam_report_fn(&sai_tam_report_obj, switch_id, attr_count, sai_attr_list); ``` @@ -854,35 +854,23 @@ typedef enum _sai_tam_telemetry_attr_t * @brief Tam telemetry reporting type * * @type sai_tam_reporting_type_t - * @flags CREATE_AND_SET + * @flags CREATE_ONLY * @default SAI_TAM_REPORTING_TYPE_TIME_BASED */ SAI_TAM_TELEMETRY_ATTR_TAM_REPORTING_TYPE, /** - * @brief Tam telemetry reporting chunk size + * @brief Tam telemetry reporting bulk count * - * defines the size of reporting chunk, which means TAM will report to the collector every time - * if the report count reaches the chunk size. + * defines the size of reporting bulk, which means TAM will report to the collector every time + * if the report count reaches the bulk count. 
* * @type sai_uint32_t * @flags CREATE_AND_SET * @default 1 * @validonly SAI_TAM_TELEMETRY_ATTR_TAM_REPORTING_TYPE == SAI_TAM_REPORTING_TYPE_COUNT_BASED */ - SAI_TAM_TELEMETRY_ATTR_REPORTING_CHUNK_SIZE, - - /** - * @brief Tam telemetry cache size - * - * If the collector isn't ready to receive the report, this value indicates how many - * reports that can be cached. 0 means no cache which is the default behavior. - * - * @type sai_uint32_t - * @flags CREATE_AND_SET - * @default 0 - */ - SAI_TAM_TELEMETRY_ATTR_CACHE_SIZE, + SAI_TAM_TELEMETRY_ATTR_REPORTING_BULK_COUNT, } sai_tam_telemetry_attr_t; @@ -901,13 +889,10 @@ sai_attr_list[1].value.objlist.list[0] = sai_tam_collector_obj; sai_attr_list[2].id = SAI_TAM_TELEMETRY_ATTR_REPORTING_TYPE; sai_attr_list[2].value.s32 = SAI_TAM_REPORTING_TYPE_COUNT_BASED -sai_attr_list[3].id = SAI_TAM_TELEMETRY_ATTR_REPORTING_CHUNK_SIZE; -sai_attr_list[3].value.u32 = chunk_size; // STREAM_TELEMETRY_PROFILE:profile_name[chunk_size] on Config DB - -sai_attr_list[4].id = SAI_TAM_TELEMETRY_ATTR_CACHE_SIZE; -sai_attr_list[4].value.u32 = cache_size; // STREAM_TELEMETRY_PROFILE:profile_name[cache_size] on Config DB +sai_attr_list[3].id = SAI_TAM_TELEMETRY_ATTR_REPORTING_BULK_COUNT; +sai_attr_list[3].value.u32 = bulk_count; // STREAM_TELEMETRY_PROFILE:profile_name[bulk_count] on Config DB -attr_count = 5; +attr_count = 4; sai_create_tam_telemetry_fn(&sai_tam_telemetry_obj, switch_id, attr_count, sai_attr_list); @@ -919,6 +904,40 @@ Based on the STREAM_TELEMETRY_GROUP on Config DB, to create corresponding counte ``` c++ +/** + * @brief Counter Subscription attributes + */ +typedef enum _sai_tam_counter_subscription_attr_t +{ + +// ... + + /** + * @brief Telemetry label + * + * Label to identify this counter in telemetry reports. + * If the report type is IPFIX, this label will be used as the element ID in the IPFIX template. 
+ * + * @type sai_uint64_t + * @flags CREATE_ONLY + * @default 0 + */ + SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_LABEL, + + /** + * @brief Setting of read-clear or read-only for statistics read. + * + * @type sai_stats_mode_t + * @flags CREATE_ONLY + * @default SAI_STATS_MODE_READ + */ + SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_STATS_MODE, + +// ... + +} sai_tam_counter_subscription_attr_t; + + // Create counter subscription list sai_attr_list[0].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_TEL_TYPE; @@ -933,7 +952,11 @@ sai_attr_list[2].value.oid = SAI_PORT_STAT_IF_IN_OCTETS; sai_attr_list[3].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_LABEL; sai_attr_list[3].value.oid = index; // Element ID of the object in the IPFIX template -attr_count = 4; +sai_attr_list[4].id =SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_STATS_MODE; +sai_attr_list[4].value.s32 = SAI_STATS_MODE_READ; +// sai_attr_list[4].value.s32 = SAI_STATS_MODE_READ_AND_CLEAR; // It could be read and clear for queue watermark + +attr_count = 5; create_tam_counter_subscription(&sai_tam_counter_subscription_obj, switch_id, attr_count, sai_attr_lis); // If this stats of object cannot support this poll frequency, this API should return SAI_STATUS_NOT_SUPPORTED. @@ -960,15 +983,41 @@ sai_create_tam_fn(&sai_tam_obj, switch_id, attr_count, sai_attr_list); #### Query IPFIX template ``` c++ +/** + * @brief Attributes for TAM report + */ +typedef enum _sai_tam_report_attr_t +{ -sai_attribute_t attr; -get_tam_report_attribute(&sai_tam_report_obj, 1, &attr); + // ... -std::vector ipfix_template(attr.value.u8list.list, attr.value.u8list.list + attr.value.u8list.count); -// Save ipfix_template to STATE DB + /** + * @brief Query IPFIX template + * + * Return the IPFIX template binary buffer + * + * @type sai_u8_list_t + * @flags READ_ONLY + */ + SAI_TAM_REPORT_ATTR_IPFIX_TEMPLATES, + + // ... 
-// Free memory -free(attr.value.u8list.list); +} sai_tam_report_attr_t; + +``` + +``` c++ + +std::vector<uint8_t> template_buffer(64*1024*10, 0); + +sai_attribute_t sai_attr_list[1]; + +sai_attr_list[0].id = SAI_TAM_REPORT_ATTR_IPFIX_TEMPLATES; +sai_attr_list[0].value.u8list.list = template_buffer.data(); +sai_attr_list[0].value.u8list.count = (uint32_t)template_buffer.size(); + +get_tam_report_attribute(sai_tam_report_obj, 1, sai_attr_list); ``` @@ -1000,6 +1049,52 @@ set_switch_attribute(switch_id, sai_attr) ``` +#### Query stream telemetry capability + +``` c++ + +/** + * @brief Query statistics capability for statistics bound at object level under the stream telemetry mode + * + * @param[in] switch_id SAI Switch object id + * @param[in] object_type SAI object type + * @param[inout] stats_capability List of implemented enum values, the statistics modes (bit mask) supported and minimal polling interval per value + * + * @return #SAI_STATUS_SUCCESS on success, #SAI_STATUS_BUFFER_OVERFLOW if lists size insufficient, failure status code on error + */ +sai_status_t sai_query_stats_st_capability( + _In_ sai_object_id_t switch_id, + _In_ sai_object_type_t object_type, + _Inout_ sai_stat_st_capability_list_t *stats_capability); + +/** + * @brief Stat capability under the stream telemetry mode + */ +typedef struct _sai_stat_st_capability_t +{ + /** + * @brief Typical stat capability + */ + sai_stat_capability_t capability; + + /** + * @brief Minimal polling interval in nanoseconds + * + * If polling interval is less than this value, it will be unacceptable. 
+     */
+    uint32_t minimal_polling_interval;
+
+} sai_stat_st_capability_t;
+
+typedef struct _sai_stat_st_capability_list_t
+{
+    uint32_t count;
+    sai_stat_st_capability_t *list;
+
+} sai_stat_st_capability_list_t;
+
+```
+
 ## Configuration and management
 
 ### Manifest (if the feature is an Application Extension)
@@ -1062,6 +1157,7 @@ $Dynamic Memory Consumption_{bytes} = \sum_{Profile} ({Cache Size} \times {Chunk
 #### System Test cases
 
 - Test that the counter can be correctly monitored by the counter syncd.
-- Test that the counter can be correctly fetched using the telemetry stream CLI.
+- Verify that the bulk size is accurate when reading messages from the netlink socket.
+- Ensure that counters can be correctly retrieved using the telemetry stream CLI.
 
 ### Open/Action items - if any

From feda94f04c17f25cfce72f5a20750f0bb8dacb80 Mon Sep 17 00:00:00 2001
From: Ze Gan
Date: Mon, 28 Oct 2024 15:59:13 +0800
Subject: [PATCH 13/13] Update from review

Signed-off-by: Ze Gan
---
 .../sonic-stream-telemetry.yang              |  6 +----
 doc/stream-telemetry/stream-telemetry-hld.md | 25 +++++++++++++++++--
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/doc/stream-telemetry/sonic-stream-telemetry.yang b/doc/stream-telemetry/sonic-stream-telemetry.yang
index 52134bb090..4b292caf0d 100644
--- a/doc/stream-telemetry/sonic-stream-telemetry.yang
+++ b/doc/stream-telemetry/sonic-stream-telemetry.yang
@@ -47,17 +47,13 @@ module sonic-stream-telemetry {
                 type uint32;
             }
 
-            leaf chunk_size {
+            leaf bulk_size {
                 type uint32 {
                     range "1..4294967295";
                 }
                 default 1;
             }
 
-            leaf cache_size {
-                type uint32;
-                default 0;
-            }
         }
 
     }
diff --git a/doc/stream-telemetry/stream-telemetry-hld.md b/doc/stream-telemetry/stream-telemetry-hld.md
index c3fd883391..b5ac5b3b5e 100644
--- a/doc/stream-telemetry/stream-telemetry-hld.md
+++ b/doc/stream-telemetry/stream-telemetry-hld.md
@@ -275,7 +275,7 @@ packet-beta
 
 #### IPFIX data
 
-An IPFIX data message consists of a snapshot that is a binary block that can be
interpreted using the IPFIX template mentioned above.
+An IPFIX data message consists of one or more snapshots, each of which is a binary block that can be interpreted using the IPFIX template mentioned above.
 
 The binary structure of a snapshot is as follows:
 
@@ -295,7 +295,7 @@ packet-beta
 
 - The snapshot structure is derived from the IPFIX template, which is based on the stats we want to record.
 
-Below is an example of an IPFIX message for the same stats record as the IPFIX template example:
+Below is an example of an IPFIX message with 3 snapshots for the same stats record as the IPFIX template example:
 
 ``` mermaid
 
@@ -308,14 +308,35 @@ packet-beta
 32-63: "Export Timestamp = 2024-08-29 20:30:60"
 64-95: "Sequence Number = 1"
 96-127: "Observation Domain ID = 0"
+
 128-143: "Set ID = 256"
 144-159: "Set Length = 36 bytes"
 160-223: "observationTimeNanoseconds = 10000"
 224-287: "Port 1: SAI_PORT_STAT_IF_IN_ERRORS = 10"
 288-351: "Port 2: SAI_PORT_STAT_IF_IN_ERRORS = 0"
 352-415: "Port 3: SAI_PORT_STAT_IF_IN_ERRORS = 5"
+
+416-431: "Set ID = 256"
+432-447: "Set Length = 36 bytes"
+448-511: "observationTimeNanoseconds = 20000"
+512-575: "Port 1: SAI_PORT_STAT_IF_IN_ERRORS = 15"
+576-639: "Port 2: SAI_PORT_STAT_IF_IN_ERRORS = 0"
+640-703: "Port 3: SAI_PORT_STAT_IF_IN_ERRORS = 6"
+
+704-719: "Set ID = 256"
+720-735: "Set Length = 36 bytes"
+736-799: "observationTimeNanoseconds = 30000"
+800-863: "Port 1: SAI_PORT_STAT_IF_IN_ERRORS = 20"
+864-927: "Port 2: SAI_PORT_STAT_IF_IN_ERRORS = 0"
+928-991: "Port 3: SAI_PORT_STAT_IF_IN_ERRORS = 8"
+
 ```
 
+- If the number of stats in a group is small, multiple snapshots may be encoded into a single IPFIX message.
+- If the number of stats in a group exceeds 8K, the group must be split across multiple IPFIX messages.
+
+The IPFIX template should be provided by vendors. This document does not restrict how to split or concatenate snapshots, but each separated snapshot must include its own `observationTimeNanoseconds`.
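The three-snapshot message in the diagram can be walked with a small parser. The following is a minimal sketch under stated assumptions, not the counter syncd implementation: it assumes the 16-byte IPFIX message header and 4-byte set header defined in RFC 7011 (where Set Length counts the set header itself, giving 36 bytes per snapshot here), big-endian fields, and a template in which every field is 64 bits wide. `Snapshot`, `parse_snapshots`, `build_example_message`, and the byte helpers are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical decoded form of one snapshot (one IPFIX data set).
struct Snapshot
{
    uint64_t observation_time_ns;
    std::vector<uint64_t> stats;
};

// Read a big-endian unsigned integer of `width` bytes.
static uint64_t read_be(const uint8_t *p, size_t width)
{
    uint64_t v = 0;
    for (size_t i = 0; i < width; i++)
        v = (v << 8) | p[i];
    return v;
}

// Append `v` to `buf` as a big-endian integer of `width` bytes.
static void put_be(std::vector<uint8_t> &buf, uint64_t v, size_t width)
{
    for (size_t i = 0; i < width; i++)
        buf.push_back(static_cast<uint8_t>(v >> (8 * (width - 1 - i))));
}

// Decode every data set (Set ID >= 256) in one IPFIX message as a snapshot.
// Assumption: all template fields are 64 bits wide. Returns an empty vector
// if the message is malformed.
std::vector<Snapshot> parse_snapshots(const std::vector<uint8_t> &msg)
{
    const size_t kMsgHeader = 16; // version, length, export time, sequence, domain ID
    const size_t kSetHeader = 4;  // set ID + set length

    std::vector<Snapshot> out;
    if (msg.size() < kMsgHeader || read_be(&msg[2], 2) != msg.size())
        return out;

    size_t off = kMsgHeader;
    while (off + kSetHeader <= msg.size())
    {
        const uint64_t set_id = read_be(&msg[off], 2);
        const size_t set_len = static_cast<size_t>(read_be(&msg[off + 2], 2));
        if (set_len < kSetHeader || off + set_len > msg.size())
            return {};

        if (set_id >= 256)
        {
            const size_t body_len = set_len - kSetHeader;
            if (body_len < 8 || body_len % 8 != 0)
                return {};

            const uint8_t *body = &msg[off + kSetHeader];
            Snapshot s;
            s.observation_time_ns = read_be(body, 8); // each snapshot carries its own timestamp
            for (size_t i = 8; i < body_len; i += 8)
                s.stats.push_back(read_be(body + i, 8));
            out.push_back(s);
        }
        off += set_len;
    }
    return out;
}

// Build the three-snapshot example message from the diagram.
std::vector<uint8_t> build_example_message()
{
    const uint64_t rows[3][4] = {{10000, 10, 0, 5}, {20000, 15, 0, 6}, {30000, 20, 0, 8}};
    std::vector<uint8_t> msg;
    put_be(msg, 10, 2);          // version 10 = IPFIX
    put_be(msg, 16 + 3 * 36, 2); // total message length in bytes
    put_be(msg, 0, 4);           // export time (placeholder value)
    put_be(msg, 1, 4);           // sequence number
    put_be(msg, 0, 4);           // observation domain ID
    for (const auto &row : rows)
    {
        put_be(msg, 256, 2); // set ID = template ID
        put_be(msg, 36, 2);  // set length, including the 4-byte set header
        for (uint64_t field : row)
            put_be(msg, field, 8);
    }
    return msg;
}
```

Calling `parse_snapshots(build_example_message())` yields three `Snapshot` values, one per set, each with its own timestamp and three counters. A real consumer would map each counter position back to an object and stat through the registered template instead of assuming fixed 64-bit fields.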
+
 #### Netlink message
 
 We expect all control messages and out-of-band information to be transmitted by the SAI. Therefore, it is unnecessary to read the attribute header of netlink and the message header of Genetlink from the socket. Instead, we can insert a bulk of IPFIX recordings as the payload of the netlink message. The sample code for building the message from the kernel side is as follows: