Buffers Configuration Update design
The switch Buffers Configuration feature distributes buffer memory among the ports to guarantee lossless traffic flow.
The current buffer configuration is predefined for the worst-case scenario and does not take into account the settings and conditions of individual ports.
The proposed update selects buffer profiles dynamically, based on current port parameters, to make the most efficient use of buffer memory and maximize switch throughput.
Supported port speeds: 10G, 25G, 40G, 50G, 100G
Supported cable lengths: 5m, 40m, 300m
Port speed is explicitly specified in the minigraph configuration file.
The cable length for each port can be specified in the minigraph, or derived from the role of the device and the role of the port's neighbor device. Device roles are also stored in the minigraph, under PngDec/Devices/Device (the i:type attribute):
```xml
<Device i:type="ToRRouter">
  <Hostname>str-msn2700-04</Hostname>
  <HwSku>ACS-MSN2700</HwSku>
  <ManagementAddress xmlns:a="Microsoft.Search.Autopilot.NetMux">
    <a:IPPrefix>10.3.147.47</a:IPPrefix>
  </ManagementAddress>
</Device>
```
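A minimal Python sketch of reading the device role from this element with the standard `xml.etree.ElementTree` module. It assumes the `i:` prefix is bound to the XML Schema instance namespace (the fragment above does not show that declaration) and that no default namespace is in effect; a full minigraph file may declare one, in which case element names must be namespace-qualified. `device_roles` and `SAMPLE` are illustrative names only.

```python
import xml.etree.ElementTree as ET

# Assumption: "i:" is bound to the standard XML Schema instance namespace.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# The Device fragment above, wrapped in a root that declares the "i:" prefix.
SAMPLE = """<Devices xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <Device i:type="ToRRouter">
    <Hostname>str-msn2700-04</Hostname>
    <HwSku>ACS-MSN2700</HwSku>
    <ManagementAddress xmlns:a="Microsoft.Search.Autopilot.NetMux">
      <a:IPPrefix>10.3.147.47</a:IPPrefix>
    </ManagementAddress>
  </Device>
</Devices>"""


def device_roles(xml_text):
    """Map each device hostname to its role (the i:type attribute)."""
    root = ET.fromstring(xml_text)
    roles = {}
    for dev in root.iter("Device"):
        hostname = dev.findtext("Hostname")
        # ElementTree expands namespaced attributes to "{uri}local".
        role = dev.get(f"{{{XSI}}}type")
        if hostname and role:
            roles[hostname] = role
    return roles


if __name__ == "__main__":
    print(device_roles(SAMPLE))  # {'str-msn2700-04': 'ToRRouter'}
```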
The table below maps device and neighbor roles to cable length:
Device and Neighbor roles | Cable length |
---|---|
ToRRouter(T0) - Server | 5m |
ToRRouter(T0) - LeafRouter(T1) | 40m |
LeafRouter(T1) - SpineRouter(T2) | 300m |
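The role-pair lookup can be sketched in Python as follows. As in the `ports2cable` template later in this page, keys are `"<role>_<role>"` and the pair is tried in both orders; an explicit length from the minigraph takes precedence, and when no pair matches the longest supported length (300m) is used. The function and constant names are illustrative, not part of the design.

```python
# Role pairs from the table above (keys as in the ports2cable template).
PORTS2CABLE = {
    "ToRRouter_Server": "5m",
    "LeafRouter_ToRRouter": "40m",
    "SpineRouter_LeafRouter": "300m",
}
MAX_LENGTH = "300m"  # longest supported cable, used when no role pair matches


def cable_length(switch_role, neighbor_role, explicit=None):
    """Return the cable length for a port.

    An explicit value from the minigraph wins; otherwise the role pair is
    looked up in either order, falling back to the maximum length.
    """
    if explicit:
        return explicit
    for key in (f"{switch_role}_{neighbor_role}", f"{neighbor_role}_{switch_role}"):
        if key in PORTS2CABLE:
            return PORTS2CABLE[key]
    return MAX_LENGTH
```

For example, a ToRRouter port facing a LeafRouter matches `LeafRouter_ToRRouter` only in reverse order and gets 40m.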
Buffers Configuration profiles should be defined for all combinations of port speed and cable length. The profile for each port is selected based on the port speed and the attached cable length. If no exact match is found, the closest profile with greater parameters is used.
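A minimal Python sketch of the selection rule, assuming "closest option with greater parameters" means rounding the speed and the cable length up independently to the nearest supported value (an exact match is returned unchanged). Keys follow the `"<speed>_<cable>m"` form used by `portconfig2profile`, with speeds in Mb/s; `select_profile_key` is a hypothetical helper.

```python
SPEEDS = [10000, 25000, 40000, 50000, 100000]  # supported port speeds (Mb/s), ascending
CABLES = [5, 40, 300]                          # supported cable lengths (m), ascending


def select_profile_key(speed, cable_m):
    """Return the "<speed>_<cable>m" key of the profile to use.

    Picks the smallest supported speed >= the port speed and the smallest
    supported cable length >= the actual length. Raises ValueError when a
    parameter exceeds every supported value.
    """
    try:
        s = next(v for v in SPEEDS if v >= speed)
        c = next(v for v in CABLES if v >= cable_m)
    except StopIteration:
        raise ValueError(f"no profile for {speed} / {cable_m}m") from None
    return f"{s}_{c}m"
```

For example, a 40G port on a 10m cable has no exact profile and is rounded up to `40000_40m`.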
The Buffers Configuration update will be done in two stages. The first part changes the data model and the generation of the buffers configuration JSON. The second part implements updating port buffer profiles at run time; the update is triggered by a port speed change.
Part 1 changes implement port buffer configuration at switch initialization.
Files *buffers.json (e.g. msn2700.32ports.buffers.json), located at sonic-buildimage/src/sonic-swss/swssconfig/sample/, contain the buffer-related switch configuration (pools, profiles, binding of profiles to ports, etc.).
The number of profiles and their parameters are hardware specific and should be calculated for every switch model separately. The table below lists the profiles for the Mellanox MSN2700:
Profile Name | Size | threshold | Pool Name | Xon | Xoff |
---|---|---|---|---|---|
pg_lossless_40G_5m_profile | 34K | 1 | ingress_lossless | 18K | 16K |
pg_lossless_50G_5m_profile | 34K | 1 | ingress_lossless | 18K | 16K |
pg_lossless_100G_5m_profile | 36K | 1 | ingress_lossless | 18K | 18K |
pg_lossless_40G_40m_profile | 41K | 1 | ingress_lossless | 18K | 23K |
pg_lossless_50G_40m_profile | 41K | 1 | ingress_lossless | 18K | 23K |
pg_lossless_100G_40m_profile | 53K | 1 | ingress_lossless | 18K | 35K |
pg_lossless_40G_300m_profile | 92K | 1 | ingress_lossless | 18K | 74K |
pg_lossless_50G_300m_profile | 92K | 1 | ingress_lossless | 18K | 74K |
pg_lossless_100G_300m_profile | 180K | 1 | ingress_lossless | 18K | 162K |
... | ... | ... | ... | ... | ... |
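In every row listed above, Size equals Xon + Xoff. The short Python check below encodes the listed rows (values in K, as in the table) and reports any row that breaks this relation; whether the relation holds for other switch models is not stated here, so treat it as an observation about this table only.

```python
# (profile name, size, xon, xoff) in K, copied from the MSN2700 table above.
PROFILES = [
    ("pg_lossless_40G_5m_profile", 34, 18, 16),
    ("pg_lossless_50G_5m_profile", 34, 18, 16),
    ("pg_lossless_100G_5m_profile", 36, 18, 18),
    ("pg_lossless_40G_40m_profile", 41, 18, 23),
    ("pg_lossless_50G_40m_profile", 41, 18, 23),
    ("pg_lossless_100G_40m_profile", 53, 18, 35),
    ("pg_lossless_40G_300m_profile", 92, 18, 74),
    ("pg_lossless_50G_300m_profile", 92, 18, 74),
    ("pg_lossless_100G_300m_profile", 180, 18, 162),
]


def mismatched_rows(rows):
    """Return the names of profiles whose size differs from xon + xoff."""
    return [name for name, size, xon, xoff in rows if size != xon + xoff]


if __name__ == "__main__":
    print(mismatched_rows(PROFILES))  # []
```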
- PG profiles should be declared for all combinations of supported speed and cable length.
- A new field, status, will be added to the profile table to indicate the profile status. Its values are "active" and "inactive": "active" is set when the profile is created in SAI and used by one or more ports; "inactive" means the profile is not used and not created in SAI.
```
; field = value
status  = "active" / "inactive" ; Profile status.
```
Initially, all profiles should be declared with status "inactive".
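The status rule above amounts to reference counting per profile. A minimal Python sketch, assuming each port references exactly one PG profile; `ProfileStatusTracker` is a hypothetical class, and the comments mark where the SAI create/remove calls would go rather than making them.

```python
from collections import Counter


class ProfileStatusTracker:
    """Keep each PG profile's status in line with how many ports use it."""

    def __init__(self, profile_names):
        # All profiles start out declared as "inactive".
        self.status = {name: "inactive" for name in profile_names}
        self.port_profile = {}   # port name -> profile name
        self.users = Counter()   # profile name -> number of ports using it

    def set_port_profile(self, port, profile):
        """Point a port at a profile and refresh the affected statuses."""
        old = self.port_profile.get(port)
        if old == profile:
            return
        if old is not None:
            self.users[old] -= 1
            if self.users[old] == 0:
                # Last user gone: the profile would be removed from SAI here.
                self.status[old] = "inactive"
        self.port_profile[port] = profile
        self.users[profile] += 1
        # At least one user: the profile would be created in SAI if needed.
        self.status[profile] = "active"
```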
The existing PG configuration in the JSON file:
```json
{
    "BUFFER_PG_TABLE:Ethernet0,Ethernet4,Ethernet8,...:3-4": {
        "profile" : "[BUFFER_PROFILE_TABLE:pg_lossless_profile]"
    },
    "OP": "SET"
}
```
is replaced with the Jinja2 template described below.
Port parameters to profile look-up table:
```jinja
{#- Entries for all other supported speed/cable combinations (e.g. 10G, 25G) must also be listed.
    TBD: should the value hold only the profile name, or name + size? -#}
{%- set portconfig2profile = {
    '40000_5m'   : 'pg_lossless_40G_5m_profile',
    '40000_40m'  : 'pg_lossless_40G_40m_profile',
    '40000_300m' : 'pg_lossless_40G_300m_profile',
    '50000_5m'   : 'pg_lossless_50G_5m_profile',
    '50000_40m'  : 'pg_lossless_50G_40m_profile',
    '50000_300m' : 'pg_lossless_50G_300m_profile',
    '100000_5m'  : 'pg_lossless_100G_5m_profile',
    '100000_40m' : 'pg_lossless_100G_40m_profile',
    '100000_300m': 'pg_lossless_100G_300m_profile'
} -%}
```
Port and neighbor role to cable length look-up table:
```jinja
{%- set ports2cable = {
    'ToRRouter_Server'       : '5m',
    'LeafRouter_ToRRouter'   : '40m',
    'SpineRouter_LeafRouter' : '300m'
} %}
```
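Macro (function) to determine the cable length of a port. It returns the length specified in the minigraph when present (a cable key in the interface entry), otherwise derives it from the switch and neighbor roles, and falls back to the maximum supported length:
```jinja
{%- set max_length = '300m' %}
{%- set switch_role = minigraph_devices[minigraph_hostname]['type'] %}
{%- macro cable_length(interface) -%}
{%- if interface['cable'] is defined -%}
{#- Cable length specified explicitly in the minigraph -#}
{{ interface['cable'] }}
{%- elif interface['name'] in minigraph_neighbors -%}
{#- Derive the length from the switch role and the neighbor's role, in either order -#}
{%- set nei = minigraph_neighbors[interface['name']]['name'] -%}
{%- set nei_role = minigraph_devices[nei]['type'] -%}
{%- if (switch_role ~ '_' ~ nei_role) in ports2cable -%}
{{ ports2cable[switch_role ~ '_' ~ nei_role] }}
{%- elif (nei_role ~ '_' ~ switch_role) in ports2cable -%}
{{ ports2cable[nei_role ~ '_' ~ switch_role] }}
{%- else -%}
{{ max_length }}
{%- endif -%}
{%- else -%}
{{ max_length }}
{%- endif -%}
{%- endmacro %}
```
Loop to generate the Ethernet port-to-profile mapping tables. Variables assigned inside a Jinja2 for loop are not visible after the loop, so the running pool size is kept in a namespace object (Jinja2 2.10 or later). The closest-greater fallback is done inline; supported_speeds, supported_cables and pg_profile_size (profile name to size, which would have to be declared together with the profiles) are illustrative helper names:
```jinja
{%- set pg_range = '3-4' %}
{%- set supported_speeds = [10000, 25000, 40000, 50000, 100000] %}
{%- set supported_cables = [5, 40, 300] %}
{%- set ns = namespace(pool_size=0) %}
{%- for interface in ethernet_interfaces %}
{%- set speed = interface['speed'] | int %}
{%- set cable_m = cable_length(interface) | trim | replace('m', '') | int %}
{#- Closest supported speed/cable pair greater than or equal to the actual one -#}
{%- set sel = namespace(key=none) %}
{%- for s in supported_speeds %}
{%- for c in supported_cables %}
{%- if sel.key is none and s >= speed and c >= cable_m %}
{%- set sel.key = s ~ '_' ~ c ~ 'm' %}
{%- endif %}
{%- endfor %}
{%- endfor %}
{%- set profile = portconfig2profile[sel.key] %}
{%- set ns.pool_size = ns.pool_size + pg_profile_size[profile] | int %}
{
    "BUFFER_PG_TABLE:{{ interface['name'] }}:{{ pg_range }}": {
        "profile" : "[BUFFER_PROFILE_TABLE:{{ profile }}]"
    },
    "OP": "SET"
}{% if not loop.last %},{% endif %}
{%- endfor %}
{%- set ingress_lossless_pg_pool_size = ns.pool_size %}
```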
The buffer pool ingress_lossless_pool, previously used for all ingress lossless profiles, is now split into two parts:
- a static part that keeps the name ingress_lossless_pool, with its size reduced but still large enough to serve all lossless profiles except the PG profiles;
- a PG part, ingress_lossless_pg_pool, whose size is the sum of the sizes of the PG profiles used by all ports.
Example:
```jinja
{
    "BUFFER_POOL_TABLE:ingress_lossless_pg_pool": {
        "size": "{{ ingress_lossless_pg_pool_size }}",
        "type": "ingress",
        "mode": "dynamic"
    },
    "OP": "SET"
}
```
The calculation of ingress_lossless_pg_pool_size is covered in the chapter "BUFFER_PG_TABLE update".
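The PG pool size is the sum, over all ports, of the size of the PG profile each port uses. A minimal Python sketch using the 5m profile sizes from the MSN2700 table (in K); as in the template loop, each port contributes one profile size. `pg_pool_size_k` and `PROFILE_SIZE_K` are illustrative names.

```python
# PG profile sizes in K, taken from the MSN2700 table (5m profiles only).
PROFILE_SIZE_K = {
    "pg_lossless_40G_5m_profile": 34,
    "pg_lossless_50G_5m_profile": 34,
    "pg_lossless_100G_5m_profile": 36,
}


def pg_pool_size_k(port_profiles):
    """Sum the PG profile sizes over all ports (port name -> profile name)."""
    return sum(PROFILE_SIZE_K[profile] for profile in port_profiles.values())
```

For example, two 100G ports and one 40G port, all on 5m cables, need 36 + 36 + 34 = 106K.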
The declarations of the following tables could also be generated by templates over the interface list, which makes the config file more generic:
- BUFFER_PORT_INGRESS_PROFILE_LIST
- BUFFER_PORT_EGRESS_PROFILE_LIST
- BUFFER_QUEUE_TABLE
E.g.
```jinja
{
    "BUFFER_QUEUE_TABLE:{{ all_ethernet_interfaces }}:0-1": {
        "profile" : "[BUFFER_PROFILE_TABLE:q_lossy_profile]"
    },
    "OP": "SET"
}
```
The value of the all_ethernet_interfaces variable can be assigned in the loop used for the PG profiles (see chapter "BUFFER_PG_TABLE update").
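The value of all_ethernet_interfaces is simply the comma-joined list of interface names. A minimal Python sketch of the resulting BUFFER_QUEUE_TABLE entry; `queue_table_entry` is a hypothetical helper. In the Jinja2 template itself, the same list could also be built without a loop as `ethernet_interfaces | map(attribute='name') | join(',')`, assuming each entry has a `name` key as in the PG loop.

```python
def queue_table_entry(interface_names, queues="0-1", profile="q_lossy_profile"):
    """Build one BUFFER_QUEUE_TABLE entry covering all given interfaces."""
    all_ethernet_interfaces = ",".join(interface_names)
    key = f"BUFFER_QUEUE_TABLE:{all_ethernet_interfaces}:{queues}"
    return {key: {"profile": f"[BUFFER_PROFILE_TABLE:{profile}]"}, "OP": "SET"}
```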
Part 2 changes implement port buffer configuration at run time.
The following table is generated from the portconfig2profile dictionary in the template:
```
{
    "BUFFER_PORT_CONFIG_TO_PG_PROFILE_TABLE": {
        "40000_5m"  : "[BUFFER_PROFILE_TABLE:pg_lossless_40G_5m_profile]",
        "50000_5m"  : "[BUFFER_PROFILE_TABLE:pg_lossless_50G_5m_profile]",
        "100000_5m" : "[BUFFER_PROFILE_TABLE:pg_lossless_100G_5m_profile]",
        ...
    },
    "OP": "SET"
}
```
This table is used to select a port's buffer profile by port speed and cable length.
The port speed can be read from PORT_TABLE (or obtained in the update handlers).
TBD: where to get the cable length (port role) at run time? A possible solution, until CONFIG_DB is implemented, is to generate the following table at the switch configuration stage:
```
{
    "BUFFER_PORT_CABLE_LENGTH_TABLE": {
        "Ethernet0" : "5m",
        "Ethernet4" : "5m",
        "Ethernet8" : "40m",
        ...
    },
    "OP": "SET"
}
```
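With both tables available, the run-time lookup is a key construction followed by a dictionary access. A minimal Python sketch with abbreviated copies of the two tables above; `pg_profile_for` is a hypothetical helper, and the closest-greater fallback is deliberately omitted, so a speed/cable pair missing from the table raises KeyError.

```python
# Abbreviated copies of the two generated tables above.
PORT_CONFIG_TO_PG_PROFILE = {
    "40000_5m": "[BUFFER_PROFILE_TABLE:pg_lossless_40G_5m_profile]",
    "50000_5m": "[BUFFER_PROFILE_TABLE:pg_lossless_50G_5m_profile]",
    "100000_5m": "[BUFFER_PROFILE_TABLE:pg_lossless_100G_5m_profile]",
}
PORT_CABLE_LENGTH = {"Ethernet0": "5m", "Ethernet4": "5m", "Ethernet8": "40m"}


def pg_profile_for(port, speed):
    """Return the PG profile reference for a port at the given speed (Mb/s).

    The speed would come from PORT_TABLE; the cable length comes from the
    generated cable-length table. No closest-greater fallback is applied.
    """
    key = f"{speed}_{PORT_CABLE_LENGTH[port]}"
    return PORT_CONFIG_TO_PG_PROFILE[key]
```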
The BufferOrch component updates a port's buffer profile when port parameters change, by changing the "profile" field of the port's entry in BUFFER_PG_TABLE.
Currently, the buffer configuration is planned to be updated only on a port speed change.
The existing BUFFER_PG_TABLE update handler takes care of applying the new profile to the port.
To handle this event, BufferOrch uses the existing Observer mechanism: PortsOrch notifies it when the port speed has actually changed.
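The notification flow can be sketched in Python as a plain observer pattern. The class and method names below are hypothetical stand-ins and do not reproduce the orchagent C++ classes: the PortsOrch stand-in notifies after applying a speed, and the BufferOrch stand-in rewrites the port's BUFFER_PG_TABLE profile using the speed/cable lookup table.

```python
class PortsOrchSketch:
    """Stand-in for PortsOrch: notifies observers after a speed change."""

    def __init__(self):
        self.observers = []

    def attach(self, observer):
        self.observers.append(observer)

    def set_speed(self, port, speed):
        # The speed would be applied to the hardware here, before notifying.
        for observer in self.observers:
            observer.update_speed(port, speed)


class BufferOrchSketch:
    """Stand-in for BufferOrch: rewrites the port's PG profile on notification."""

    def __init__(self, lookup, cable, pg_range="3-4"):
        self.lookup = lookup      # "<speed>_<cable>" -> profile reference
        self.cable = cable        # port name -> cable length
        self.pg_range = pg_range
        self.pg_table = {}        # "<port>:<pg_range>" -> {"profile": ...}

    def update_speed(self, port, speed):
        key = f"{speed}_{self.cable[port]}"
        self.pg_table[f"{port}:{self.pg_range}"] = {"profile": self.lookup[key]}
```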
Alternative #1:
- subscribe to PORT_TABLE updates and, on a port speed change, update the port's buffer profile
- pros: possibly simpler (though this is doubtful), and no changes are needed in other components' code
- cons: changes in APP DB happen before the parameters are actually applied (arguably this could also be a pro)
The BufferOrch handler for BUFFER_PORT_CONFIG_TO_PG_PROFILE_TABLE only stores the table locally (in a std::map), as a look-up table that converts speed/cable length to a buffer profile name.