IP Multicast HLD #1808
base: master
Conversation
Signed-off-by: philo <[email protected]>
- Reads IP multicast routing messages from the kernel and listens for IP multicast routing changes in the kernel
- Based on the netlink messages, resolves the source IP address, destination IP address, inbound interface, and outbound interface members of each IP multicast route and writes them into the APPL DB
- During a warm reboot, compares the kernel multicast routes with the APPL DB data to update the warm reboot data
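For illustration, a minimal sketch (not part of the HLD) of the rtnetlink subscription such a daemon could use; kernel multicast route updates arrive as RTM_NEWROUTE/RTM_DELROUTE messages with rtm_family set to RTNL_FAMILY_IPMR (IPv4, RTMGRP_IPV4_MROUTE group). All names besides the kernel constants are assumptions.

```cpp
#include <cstdio>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

// Sketch: subscribe to kernel IPv4 multicast route notifications
// (the IPv6 equivalent would use RTMGRP_IPV6_MROUTE / RTNL_FAMILY_IP6MR).
int main()
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0) { perror("socket"); return 1; }

    sockaddr_nl sa{};
    sa.nl_family = AF_NETLINK;
    sa.nl_groups = RTMGRP_IPV4_MROUTE;              // multicast route change notifications
    if (bind(fd, reinterpret_cast<sockaddr *>(&sa), sizeof(sa)) < 0) { perror("bind"); return 1; }

    char buf[8192];
    while (true) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n <= 0) break;
        int len = static_cast<int>(n);
        for (nlmsghdr *nlh = reinterpret_cast<nlmsghdr *>(buf);
             NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
            if (nlh->nlmsg_type != RTM_NEWROUTE && nlh->nlmsg_type != RTM_DELROUTE)
                continue;
            rtmsg *rtm = static_cast<rtmsg *>(NLMSG_DATA(nlh));
            if (rtm->rtm_family != RTNL_FAMILY_IPMR)  // multicast routes, not unicast ones
                continue;
            // RTA_SRC / RTA_DST carry the (S,G) addresses, RTA_IIF the inbound interface,
            // RTA_MULTIPATH the outbound interface list; a real syncd would parse these
            // and write a MROUTE_TABLE entry into the APPL DB.
            printf("%s mroute message received\n",
                   nlh->nlmsg_type == RTM_NEWROUTE ? "add" : "del");
        }
    }
    close(fd);
    return 0;
}
```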
Need a mechanism to propagate the IPMC counters in the hardware (SAI_IPMC_ENTRY_ATTR_COUNTER_ID) to the kernel/FRR.
In the SAI definition, SAI_IPMC_ENTRY_ATTR_COUNTER_ID means 'packet hits count', which I understand as the statistic for IPMC hit packets.
Why do we need to pass IPMC packet statistics to kernel/FRR? It seems that there is no use case for this at the moment.
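If a use case does appear later, one possible (hypothetical) mechanism would be to attach a generic SAI counter object to each IPMC entry and have a poller export the hit counts, e.g. to COUNTERS_DB. The sketch below only illustrates the SAI calls involved; the variable names and API-pointer wiring are assumptions, not part of this HLD.

```cpp
#include <sai.h>

extern sai_counter_api_t *sai_counter_api;  // assumed obtained via sai_api_query()
extern sai_ipmc_api_t    *sai_ipmc_api;

// Sketch: create a generic counter and bind it to an existing IPMC entry.
sai_object_id_t bind_ipmc_counter(sai_object_id_t switch_id, const sai_ipmc_entry_t &ipmc_entry)
{
    sai_attribute_t attr;
    attr.id = SAI_COUNTER_ATTR_TYPE;
    attr.value.s32 = SAI_COUNTER_TYPE_REGULAR;

    sai_object_id_t counter_oid = SAI_NULL_OBJECT_ID;
    sai_counter_api->create_counter(&counter_oid, switch_id, 1, &attr);

    attr.id = SAI_IPMC_ENTRY_ATTR_COUNTER_ID;   // the 'packet hits count' binding discussed above
    attr.value.oid = counter_oid;
    sai_ipmc_api->set_ipmc_entry_attribute(&ipmc_entry, &attr);
    return counter_oid;
}

// Sketch: poll the counter; a flex-counter style task could publish these values.
void poll_ipmc_counter(sai_object_id_t counter_oid)
{
    sai_stat_id_t stat_ids[] = { SAI_COUNTER_STAT_PACKETS, SAI_COUNTER_STAT_BYTES };
    uint64_t counters[2] = { 0, 0 };
    sai_counter_api->get_counter_stats(counter_oid, 2, stat_ids, counters);
}
```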
IP multicast is a network communication technique that allows a single sender to send packets to multiple destinations without having to send packets separately for each destination. This method greatly saves bandwidth resources and improves the efficiency of the network. Depending on the multicast group membership model, multicast service models can be divided into ASM (Any-Source Multicast) and SSM (Source-Specific Multicast). To ensure efficient transmission of multicast data, IP multicast uses the RPF (Reverse Path Forwarding) mechanism.
Routers dynamically establish and maintain a multicast forwarding table through protocols such as IGMP, PIM, and MSDP, recording multicast group membership and the corresponding outbound interfaces.
Can you request a slot to review this HLD in the Routing WG? FYI @eddieruan-alibaba
A few questions from the community meeting today.
- Why didn't we explore the fpmsyncd path for programming the multicast routes? We came to know that pimd is directly programming the kernel. What if we have the BGP IPv4 multicast configuration, won't there be an MRTM (part of zebra) to consolidate multicast routes from different protocols (e.g. pimd, BGP, static, etc.)? We need to discuss with the FRR folks and decide.
Can you describe your use case as well?
I second @venkatmahalingam's comment. It is better to explore the fpmsyncd approach to program mroutes directly from pimd instead of via the Linux kernel, for the following two reasons:
- Scale and performance
- Feature velocity

Both of these considerations are related to your use case and roadmap.
No problem, we will request a Routing WG review of the HLD next. As for choosing between fpmsyncd and the kernel, both solutions have their own significant advantages, so it's hard for me to decide :) Let's leave this discussion until after we have communicated with the Routing WG and see what their feedback is.
; Store IP multicast routing data
key = MROUTE_TABLE:vrf_name|source_ip|dest_ip
; APPL DB usually uses ':' as the separator, but '|' is chosen here.
Can we have key prefixes, e.g. MROUTE_TABLE:vrf_name:src-<source_ip>:dst-<dest_ip>, to get rid of this issue?
Great suggestion, but we would still need to parse the src- and dst- prefixes. Maybe using @src-ip would be a bit more concise?
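For illustration, a hypothetical (S,G) entry (addresses made up) would look like this under the current proposal and under the suggested prefixed layout:

```text
MROUTE_TABLE:Vrf-blue|10.1.1.1|225.1.1.1
MROUTE_TABLE:Vrf-blue:src-10.1.1.1:dst-225.1.1.1
```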
#### mgmanager
Follow the nexthop group design and manage the Group and Members separately:
We don't generally follow the 'manager' naming convention to handle routes towards the ASIC. Can we rename it to mroutegrouporch or combine this functionality as part of mrouteorch?
- 'Combine this functionality as part of mrouteorch' is not a good solution. Based on the example of the nexthop group, further development would become very difficult to maintain due to code redundancy.
- Since the mroute group doesn't have an APPL DB entry, naming it with 'orch' would result in mroutegrouporch using 'orch' but not using doTask to generate the group, which feels a bit odd.
- So we used 'manager' to indicate the management of the mroute group.
The frr-pimd/pim6d daemon process is introduced in the FRR container in order to learn IP multicast routes under the PIM protocol and install them into the kernel.
For the implementation of the data plane: firstly, the fpm component does not support IP multicast routing right now; secondly, we want to support as many IP multicast protocols as possible; thirdly, the implementation and support of the FRR container are beyond the design scope of this document. Therefore, this design uses the Linux kernel as the multicast route source.
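For context, and assuming the stock FRR daemons file mechanism is used to start the new daemons (the exact sonic-frr integration and template changes are outside this excerpt, and pim6d requires a recent FRR), enabling them might look like:

```text
# /etc/frr/daemons (sketch; other entries assumed unchanged)
pimd=yes     # IPv4 PIM: learns multicast routes and installs them into the kernel
pim6d=yes    # IPv6 PIM
```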
Similar to fpmsyncd, for IPMC can we consider using ipmcfpmsyncd instead of mroutesyncd? This channel can also be used to get multicast-enabled interface updates from the kernel along with mroute entries.
The decision between using fpmsyncd or the kernel will be made after further discussion with the Routing WG. We will update the HLD accordingly.
In ECMP routing there is already a very elegant nexthop group design and implementation, so this design references it for IP multicast routing and uses mgmanager to manage the ipmc group and the rpf group.
The following diagram summarizes the key structure of IPMC functionality in SONiC:
The Linux kernel notifies multicast interface creation or deletion using RTM_NEWNETCONF and RTM_DELNETCONF messages.
Given this, can the same path used for mroute updates also be applied for multicast-enabled interfaces?
Since the current interface configurations are all passed through the config DB, having the kernel as a separate source for the multicast-enabled state of interfaces doesn't seem necessary.
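For reference only, if the RTM_NEWNETCONF path discussed above were ever adopted, the per-interface multicast forwarding state is carried in the NETCONFA_MC_FORWARDING attribute of netconf messages. A minimal parsing sketch follows (function name and surrounding wiring are assumed, not part of the HLD):

```cpp
#include <cstdio>
#include <linux/netconf.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

// Sketch: extract the interface index and multicast-forwarding flag from an
// RTM_NEWNETCONF / RTM_DELNETCONF message (already read from a NETLINK_ROUTE
// socket subscribed to the RTNLGRP_IPV4_NETCONF group).
void handle_netconf(const nlmsghdr *nlh)
{
    if (nlh->nlmsg_type != RTM_NEWNETCONF && nlh->nlmsg_type != RTM_DELNETCONF)
        return;

    const netconfmsg *ncm = static_cast<const netconfmsg *>(NLMSG_DATA(nlh));
    int payload = NLMSG_PAYLOAD(nlh, sizeof(netconfmsg));

    int ifindex = -1;
    int mc_forwarding = -1;
    for (const rtattr *rta = reinterpret_cast<const rtattr *>(
             reinterpret_cast<const char *>(ncm) + NLMSG_ALIGN(sizeof(netconfmsg)));
         RTA_OK(rta, payload); rta = RTA_NEXT(rta, payload)) {
        if (rta->rta_type == NETCONFA_IFINDEX)
            ifindex = *static_cast<const int *>(RTA_DATA(rta));
        else if (rta->rta_type == NETCONFA_MC_FORWARDING)
            mc_forwarding = *static_cast<const int *>(RTA_DATA(rta));
    }
    printf("family=%d ifindex=%d mc_forwarding=%d\n", ncm->ncm_family, ifindex, mc_forwarding);
}
```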
- Listens to the MROUTE_TABLE in the APPL DB and queries the corresponding rpf group and ipmc group
- Associates the corresponding rpf group and ipmc group IDs and invokes the SAI API to create IP multicast routes
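As a sketch of what the second bullet could translate to at the SAI layer for an (S,G) IPv4 entry, once the rpf group and ipmc group object IDs have been obtained; the function name, error handling, and surrounding orchestration are simplified assumptions, not the HLD's final code:

```cpp
#include <sai.h>

extern sai_ipmc_api_t *sai_ipmc_api;  // assumed obtained via sai_api_query(SAI_API_IPMC, ...)

// Sketch: program one (S,G) IPv4 multicast route with its RPF group and output (IPMC) group.
sai_status_t create_sg_entry(sai_object_id_t switch_id,
                             sai_object_id_t vr_id,
                             const sai_ip_address_t &source,
                             const sai_ip_address_t &group,
                             sai_object_id_t rpf_group_oid,
                             sai_object_id_t ipmc_group_oid)
{
    sai_ipmc_entry_t entry = {};
    entry.switch_id   = switch_id;
    entry.vr_id       = vr_id;
    entry.type        = SAI_IPMC_ENTRY_TYPE_SG;  // (S,G); (*,G) would use SAI_IPMC_ENTRY_TYPE_XG
    entry.source      = source;
    entry.destination = group;

    sai_attribute_t attrs[3];
    attrs[0].id        = SAI_IPMC_ENTRY_ATTR_PACKET_ACTION;
    attrs[0].value.s32 = SAI_PACKET_ACTION_FORWARD;
    attrs[1].id        = SAI_IPMC_ENTRY_ATTR_RPF_GROUP_ID;
    attrs[1].value.oid = rpf_group_oid;
    attrs[2].id        = SAI_IPMC_ENTRY_ATTR_OUTPUT_GROUP_ID;
    attrs[2].value.oid = ipmc_group_oid;

    return sai_ipmc_api->create_ipmc_entry(&entry, 3, attrs);
}
```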
Please take care of defining CoPP rules and the trapping behaviour for multicast control and data packets when the multicast feature is enabled.
```text
RTM_NEWROUTE
RTM_DELROUTE
```
Please update the netlink messages used for multicast-enabled interfaces.
- The MgManager exposes group query and management interfaces
- Internally, RpfMember and RpfGroup manage reference counting and exception handling, and invoke the SAI API to generate the corresponding group IDs
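Purely as an illustration of the 'manager' pattern described in these bullets (class and method names are hypothetical, not the HLD's actual definitions), a reference-counted group manager might look like the sketch below, with an IpmcGroup counterpart following the same shape. The group ID is shared between routes with the same interface set and the SAI objects are removed when the last reference goes away.

```cpp
#include <map>
#include <set>
#include <sai.h>

// Illustrative only: reference-counted RPF group management keyed by the set of
// inbound router interfaces.
class MgManager
{
public:
    // Returns an existing RPF group OID for this interface set, or creates one.
    sai_object_id_t getOrCreateRpfGroup(const std::set<sai_object_id_t> &rif_set)
    {
        auto it = m_rpfGroups.find(rif_set);
        if (it != m_rpfGroups.end()) {
            it->second.ref_count++;
            return it->second.oid;
        }
        sai_object_id_t oid = createRpfGroupInSai(rif_set);
        m_rpfGroups[rif_set] = {oid, 1};
        return oid;
    }

    // Releases one reference; removes the SAI group when no route uses it any more.
    void releaseRpfGroup(const std::set<sai_object_id_t> &rif_set)
    {
        auto it = m_rpfGroups.find(rif_set);
        if (it == m_rpfGroups.end())
            return;
        if (--it->second.ref_count == 0) {
            removeRpfGroupInSai(it->second.oid);
            m_rpfGroups.erase(it);
        }
    }

private:
    struct GroupEntry { sai_object_id_t oid; unsigned ref_count; };
    std::map<std::set<sai_object_id_t>, GroupEntry> m_rpfGroups;

    // Placeholder: a real implementation would call create_rpf_group and then
    // create one rpf group member per inbound router interface.
    sai_object_id_t createRpfGroupInSai(const std::set<sai_object_id_t> &)
    {
        return SAI_NULL_OBJECT_ID;
    }
    // Placeholder: a real implementation would remove the members and then the group.
    void removeRpfGroupInSai(sai_object_id_t) {}
};
```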
Can you please add details on how IPMC groups and RPF groups are shared between multicast routes?
Yes, it will be revised in the next commit.
Description: A table was added to store IP multicast routing data. One entry corresponds to one multicast route.

Schema:
The APP_DB schema for INTF_TABLE to store per-interface multicast forwarding is missing.
Yes, it will be revised in the next commit.
#### Scalability and performance requirements
In terms of capacity, the SAI API does not implement a query interface for the capacity of IP multicast routes and multicast member groups as it does for ECMP routes. Therefore, the relevant capacity is not restricted. However, the device must continue to run normally in scenarios where the capacity is exceeded.
The SAI switch attribute SAI_SWITCH_ATTR_AVAILABLE_IPMC_ENTRY is available to query the capacity of IP multicast routes; please include it.
OK, it will be revised in the next commit. The focus here is on the group size and the number of members within each group.
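For reference, a small sketch of how the attribute mentioned above could be polled so the orchagent can detect when the IPMC table is close to exhaustion; error handling is trimmed and the API-pointer wiring is assumed:

```cpp
#include <sai.h>

extern sai_switch_api_t *sai_switch_api;  // assumed obtained via sai_api_query(SAI_API_SWITCH, ...)

// Sketch: query how many IPMC entries the ASIC can still accept.
uint32_t available_ipmc_entries(sai_object_id_t switch_id)
{
    sai_attribute_t attr;
    attr.id = SAI_SWITCH_ATTR_AVAILABLE_IPMC_ENTRY;
    if (sai_switch_api->get_switch_attribute(switch_id, 1, &attr) != SAI_STATUS_SUCCESS)
        return 0;  // treat a query failure as "no headroom known"
    return attr.value.u32;
}
```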
| ---------------------------------------------------- | -------------- |
| sai_router_intfs_api->set_router_interface_attribute | SAI_ROUTER_INTERFACE_ATTR_V4_MCAST_ENABLE<br/>SAI_ROUTER_INTERFACE_ATTR_V6_MCAST_ENABLE |
| sai_rpf_group_api->create_rpf_group | NULL |
| sai_ipmc_group_api->create_ipmc_group_member | SAI_RPF_GROUP_MEMBER_ATTR_RPF_GROUP_ID<br/>SAI_RPF_GROUP_MEMBER_ATTR_RPF_INTERFACE_ID |
Typo? It should be sai_rpf_group_api->create_rpf_group_member.
Yes, it will be revised in the next commit.
This HLD introduces the IPMC data plane implementation in SONiC, including the database changes, Linux multicast route listening and handling, and orchagent IPMC support.