
Buffers Configuration Update design


Overview

The Switch Buffers Configuration feature distributes buffer memory among the ports in order to guarantee lossless traffic flow. The current buffers configuration is predefined for the "worst case scenario" and does not take particular port settings and conditions into account.
The suggested update to the Buffers Configuration introduces dynamic profile selection based on current port parameters, to ensure the most efficient buffer memory usage and maximize switch throughput.

General feature design

Supported port speeds: 10G, 25G, 40G, 50G, 100G
Supported cable lengths: 5m, 40m, 300m

Port speed is explicitly specified in the minigraph configuration file.
Cable length for each port can either be specified in the minigraph or determined from the roles of the device and its neighbor device. Device roles are also stored in the minigraph under PngDec/Devices/Device:

  <Device i:type="ToRRouter">
    <Hostname>str-msn2700-04</Hostname>
    <HwSku>ACS-MSN2700</HwSku>
    <ManagementAddress xmlns:a="Microsoft.Search.Autopilot.NetMux">
       <a:IPPrefix>10.3.147.47</a:IPPrefix>
    </ManagementAddress>
  </Device>

The table below shows the mapping of device and neighbor roles to cable length:

Device and Neighbor roles          Cable length
ToRRouter(T0) - Server             5m
ToRRouter(T0) - LeafRouter(T1)     40m
LeafRouter(T1) - SpineRouter(T2)   300m

Buffers Configuration profiles should be defined for all combinations of port speed and cable length. The profile for each port is selected based on the port speed and the attached cable length. If no exact match is found, the closest profile with greater parameters is used.
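
As an illustration, a minimal Python sketch of this "closest greater" selection (the helper name, the supported-value lists and the '<speed>_<cable>m' key format are assumptions for this sketch; the actual selection is done in the Jinja2 template below):

# Illustrative sketch only: pick the profile for the nearest supported speed and
# cable length that are greater than or equal to the requested ones.
SUPPORTED_SPEEDS = [10000, 25000, 40000, 50000, 100000]   # Mb/s
SUPPORTED_CABLES = [5, 40, 300]                            # meters

def find_closest_greater_profile(portconfig2profile, speed, cable):
    up_speed = min(s for s in SUPPORTED_SPEEDS if s >= speed)
    up_cable = min(c for c in SUPPORTED_CABLES if c >= cable)
    return portconfig2profile['%d_%dm' % (up_speed, up_cable)]

# Example: a 47m cable on a 40G port falls back to the 40G/300m profile.
profiles = {'40000_300m': 'pg_lossless_40G_300m_profile'}
print(find_closest_greater_profile(profiles, 40000, 47))   # pg_lossless_40G_300m_profile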

The Buffers Configuration update will be done in two stages. The first part covers the data model changes and the generation of the buffers configuration JSON. The second part implements the port buffer profile update at run time, triggered by a port speed change.

Components to be updated (Part #1)

Changes in Part #1 implement port buffers configuration at switch initialization.

Data Model Update

Update buffers configuration in the JSON file

The *buffers.json files (e.g. msn2700.32ports.buffers.json) located at sonic-buildimage/src/sonic-swss/swssconfig/sample/ contain the buffers-related switch configuration (pools, profiles, binding of profiles to ports, etc.).

New profiles

The number of profiles, as well as their parameters, is hardware specific and should be calculated for every switch model separately. The table below contains the list of profiles for the Mellanox MSN2700.

Profile Name                    Size   Threshold   Pool Name          Xon   Xoff
pg_lossless_40G_5m_profile      34K    1           ingress_lossless   18K   16K
pg_lossless_50G_5m_profile      34K    1           ingress_lossless   18K   16K
pg_lossless_100G_5m_profile     36K    1           ingress_lossless   18K   18K
pg_lossless_40G_40m_profile     41K    1           ingress_lossless   18K   23K
pg_lossless_50G_40m_profile     41K    1           ingress_lossless   18K   23K
pg_lossless_100G_40m_profile    53K    1           ingress_lossless   18K   35K
pg_lossless_40G_300m_profile    92K    1           ingress_lossless   18K   74K
pg_lossless_50G_300m_profile    92K    1           ingress_lossless   18K   74K
pg_lossless_100G_300m_profile   180K   1           ingress_lossless   18K   162K
...                             ...    ...         ...                ...   ...

Profiles declaration (table BUFFER_PROFILE_TABLE)

  • PG profiles for all combinations of supported speed and cable length should be declared.
  • A new field status will be added to the profile table to indicate the profile status. The field takes the values "active" or "inactive". The value is set to "active" when the profile is created in SAI and used by one or more ports; "inactive" means the profile is not used and not created in SAI.
; field   = value
status    = "active/inactive"   ; Profile state.

Initially, all profiles should be declared with the status "inactive".
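
For example, one (initially inactive) profile declaration in the buffers JSON could be generated as in the following sketch; the field names (pool, size, dynamic_th, xon, xoff), the pool reference and the byte values derived from the 34K/18K/16K figures are assumptions here, the new status field is the point of the example:

import json

# Sketch of a single profile declaration for the buffers JSON, declared inactive.
entry = {
    "BUFFER_PROFILE_TABLE:pg_lossless_40G_5m_profile": {
        "pool": "[BUFFER_POOL_TABLE:ingress_lossless_pool]",   # assumed pool reference
        "size": "34816",        # 34K
        "dynamic_th": "1",
        "xon": "18432",         # 18K
        "xoff": "16384",        # 16K
        "status": "inactive"    # new field: not yet created in SAI, not used by any port
    },
    "OP": "SET"
}

print(json.dumps(entry, indent=4))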

BUFFER_PG_TABLE update

Existing PG configuration in the JSON file:

"BUFFER_PG_TABLE:Ethernet0,Ethernet4,Ethernet8,...:3-4": {
 	"profile" : "[BUFFER_PROFILE_TABLE:pg_lossless_profile]"
},
"OP": "SET"

is replaced with the Jinja2 template described below.

Port parameters to profile look-up table:

{%- set portconfig2profile = {
	'40000_5m'   : 'pg_lossless_40G_5m_profile',  {# name + size? #}
	'40000_40m'  : 'pg_lossless_40G_40m_profile',
	'40000_300m' : 'pg_lossless_40G_300m_profile',

	'50000_5m'   : 'pg_lossless_50G_5m_profile',
	'50000_40m'  : 'pg_lossless_50G_40m_profile',
	'50000_300m' : 'pg_lossless_50G_300m_profile',

	'100000_5m'  : 'pg_lossless_100G_5m_profile',
	'100000_40m' : 'pg_lossless_100G_40m_profile',
	'100000_300m': 'pg_lossless_100G_300m_profile'
	
	...
	}
-%}

Port and neighbor role to cable length look-up table:

{% set ports2cable = {
	'ToRRouter_Server'       : '5m',
	'LeafRouter_ToRRouter'   : '40m',
	'SpineRouter_LeafRouter' : '300m'
	}
%}

Macro (function) to determine cable length:

{% set switch_role = minigraph_devices[minigraph_hostname]['type'] %}

{% macro cable_length(interface_name) %}
{#- pseudocode -#}
if cable length found in minigraph
	return the interface's 'cable' value from ethernet_interfaces
else
	{% set nei = minigraph_neighbors[interface_name]['name'] -%}
	{% set nei_role = minigraph_devices[nei]['type'] -%}
	if switch_role + '_' + nei_role (or the reverse) found in ports2cable
		return {{ ports2cable[switch_role + '_' + nei_role] or ports2cable[nei_role + '_' + switch_role] }}
	else
		return max_length
{%- endmacro %}

Loop to generate Ethernet port-to-profile mapping tables:

{% set pg_range = '3-4' %}

{% set ingress_lossless_pg_pool_size = 0 %}
{% for interface in ethernet_interfaces %}
	{#- pseudocode -#}
	{% set speed = interface['speed'] %}
	{% set cable = cable_length(interface['name']) %}
	{% set port_config = speed + '_' + cable -%}
	if !(portconfig2profile has_key port_config)
		port_config = find_closest_greater_profile(speed, cable)
	profile = portconfig2profile[port_config]
	ingress_lossless_pg_pool_size += profile.size
	{
		"BUFFER_PG_TABLE:{{ interface['name'] }}:{{ pg_range }}": {
		        "profile" : "[BUFFER_PROFILE_TABLE:{{ profile }}]"
		},
		"OP": "SET"
	}{% if not loop.last %},{% endif %}

{% endfor %}

BUFFER_POOL_TABLE update

The buffer pool ingress_lossless_pool, previously used for all ingress lossless profiles, is now split into two parts:

  • a static part with the same name ingress_lossless_pool, whose size is decreased but remains sufficient to serve all lossless profiles except the PG profiles
  • a PG part, whose size is calculated as the sum of the sizes needed by the profiles used for all ports.

Example:

{
	"BUFFER_POOL_TABLE:ingress_lossless_pg_pool": {
		"size": "{{ ingress_lossless_pg_pool_size }}",
		"type": "ingress",
		"mode": "dynamic"
	},
	"OP": "SET"
}

The calculation of ingress_lossless_pg_pool_size is covered in the chapter "BUFFER_PG_TABLE update".
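
As a purely illustrative example of that calculation, a hypothetical 32-port setup where every port is mapped to the 40G/40m profile (41K) would give:

# Illustrative arithmetic only: 32 ports, each using pg_lossless_40G_40m_profile (41K).
PROFILE_SIZE = 41 * 1024                      # 41K, from the profile table above
ports = ['Ethernet%d' % (4 * i) for i in range(32)]
ingress_lossless_pg_pool_size = PROFILE_SIZE * len(ports)
print(ingress_lossless_pg_pool_size)          # 1343488 bytes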

Other changes

The declaration of the following tables could be replaced with a template in the interfaces list part. This will make the config file more generic:

  • BUFFER_PORT_INGRESS_PROFILE_LIST
  • BUFFER_PORT_EGRESS_PROFILE_LIST
  • BUFFER_QUEUE_TABLE

E.g.

{
	"BUFFER_QUEUE_TABLE:{{ all_ethernet_interfaces }}:0-1": {
		"profile" : "[BUFFER_PROFILE_TABLE:q_lossy_profile]"
	},
	"OP": "SET"
}

The value of the all_ethernet_interfaces variable can be assigned in the loop used for PG profiles (see the chapter "BUFFER_PG_TABLE update"), as sketched below.
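
For illustration, a Python sketch of what that accumulation amounts to (the real file would build the value inside the Jinja2 loop; the interface list here is an example):

# Sketch: build the "Ethernet0,Ethernet4,..." key used by the shared table entries.
ethernet_interfaces = [{'name': 'Ethernet0'}, {'name': 'Ethernet4'}, {'name': 'Ethernet8'}]
all_ethernet_interfaces = ','.join(intf['name'] for intf in ethernet_interfaces)
print(all_ethernet_interfaces)   # Ethernet0,Ethernet4,Ethernet8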

Components to be updated (Part #2)

Changes in Part #2 implement port buffers configuration at run time.

Data Model Update

New table BUFFER_PORT_CONFIG_TO_PG_PROFILE_TABLE

Generated from minigraph's portconfig2profile dictionary:

"BUFFER_PORT_CONFIG_TO_PG_PROFILE_TABLE": {
	"40000_5m" : "[BUFFER_PROFILE_TABLE:pg_lossless_40000_5m_profile]"
	"50000_5m" : "[BUFFER_PROFILE_TABLE:pg_lossless_40000_5m_profile]"
	"100000_5m" : "[BUFFER_PROFILE_TABLE:pg_lossless_100G_5m_profile]"
	...
},
"OP": "SET"

This table is used to pick the buffers profile for a port by its speed and cable length.
The port speed can be read from the PORT_TABLE table (or in the update handlers).

TBD: where to get the cable length (port role) at run time? Possible solution (until CONFIG_DB is implemented): generate the following table at the switch configuration stage:

"BUFFER_PORT_CABLE_LENGTH_TABLE": {
	"Ethernet0" : "5m"
	"Ethernet4" : "5m"
	"Ethernet8" : "40m"
	...
},
"OP": "SET"

Orchagent Update

The BufferOrch component updates the port buffer profile when port parameters change. The buffer profile update is done by changing the "profile" parameter in the appropriate BUFFER_PG_TABLE entry.
Currently it is planned to update the buffer configuration only on port speed change.
The existing BUFFER_PG_TABLE updates handler will take care of setting the new parameters on the port.
To handle this event, bufferorch uses the existing Observer mechanism (PortsOrch will notify about the actual speed change), as sketched below.
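
The intended flow could be sketched as follows (illustrative Python pseudocode for the C++ bufferorch logic; app_db with a set method is a hypothetical stand-in for the APP DB producer interface, and the 3-4 PG range follows the template above):

# Sketch of the speed-change handling: called when PortsOrch notifies bufferorch
# (via the Observer mechanism) that a new port speed has actually been applied.
def on_port_speed_change(port, new_speed, cable_table, config_to_profile, app_db):
    key = '%s_%s' % (new_speed, cable_table[port])   # e.g. "40000_40m"
    profile_ref = config_to_profile[key]             # from BUFFER_PORT_CONFIG_TO_PG_PROFILE_TABLE
    # Rewriting the "profile" field of the port's BUFFER_PG_TABLE entry triggers the
    # existing BUFFER_PG_TABLE handler, which applies the new profile to the port.
    app_db.set('BUFFER_PG_TABLE:%s:3-4' % port, {'profile': profile_ref})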

Alternative #1:

  • subscribe to PORT_TABLE updates and, on a port speed change, update the port's buffer profile
    • pros: it is simpler(?) (it is not, actually) and requires no changes in other components' code
    • cons: changes in APP DB happen before the parameters are actually applied (pro?).

The handler of the BUFFER_PORT_CONFIG_TO_PG_PROFILE_TABLE table in bufferorch only stores a local look-up table (std::map) that converts speed/cable length to a buffer profile name.

Open questions
