Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[doc]: Stream Telemetry HLD #1795

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Conversation

Pterosaur
Copy link
Contributor

@Pterosaur Pterosaur commented Sep 1, 2024

Stream Telemetry HLD initial version

doc/stream-telemetry/stream-telemetry-hld.md Outdated Show resolved Hide resolved
### Architecture Design

This section covers the changes that are required in the SONiC architecture. In general, it is expected that the current architecture is not changed.
This section should explain how the new feature/enhancement (module/sub-module) fits in the existing architecture.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here. these "guidance" words are better to be removed. very confusing to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, Yes, there are many template "guidance" words I haven't removed them.


#### Modules ####

##### Netlink Module #####
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know the numbered prefix are not there in the template, but the doc will be much more readable with it. we should consider add them here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @zhangyanzhao , Do you have any suggestions or concerns about adding a number prefix to each title in the template?

doc/stream-telemetry/stream-telemetry-hld.md Show resolved Hide resolved

Pin CPU?

![netlink_dma_channel](netlink_dma_channel.drawio.svg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we still uses the chunk size concept in our API?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we have.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or leverage the existing solution, time-based reporting?

Signed-off-by: Ze Gan <[email protected]>
Signed-off-by: Ze Gan <[email protected]>
@Pterosaur Pterosaur requested a review from r12f September 9, 2024 12:33
@Pterosaur Pterosaur marked this pull request as ready for review September 9, 2024 12:33
Signed-off-by: Ze Gan <[email protected]>

- For high-frequency counters, the native IPFIX timestamp unit of seconds is insufficient. Therefore, we introduce an additional element, `observationTimeNanoseconds`, for each record to meet our requirements.
- The enterprise bit is always set to 1 for stats records.
- The element ID of IPFIX is derived from the object index. For example, for `Ethernet5`, the element ID will be `0x5 | 0x8000 = 0x8005`.
Copy link
Contributor

@marian-pritsak marian-pritsak Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a variable element ID per SAI object will blow up the number of templates in the report (separate template ID per port, queue etc.).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind, I assume all stats(per port, queue etc) in one profile will be encoded into ONE template. I will generate a unique template ID for a profile.


### Data format

We will use IPFIX as the report format, with all numbers in the IPFIX message in network-order (Big-endian).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot make the assumption that data will be Big- or Little-endian. If it is collected by the ASIC and DMAd to a ring buffer, then kernel driver would need to correct the endianness of each data set. We need to have a bit in the element ID or Enterprise number that would indicate endianness.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you worried that endian conversion will consume too many CPU clocks?

```
STREAM_TELEMETRY_GROUP|{{profile_name}}|{{group_name}}
"object_names": {{list of object name}}
"object_counters": {{list of stats of object}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you plan to handle the update of the counter list or object list from the report POV? is it going to be a new template ID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I assume we will use a new template ID.
We treat any configuration update as a new profile. The previous session will be interrupted. The SAI report object will get a new SAI_TAM_REPORT_ATTR_REPORT_IPFIX_TEMPLATE_ID

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants