
Buffers Configuration Update design

Andriy Moroz edited this page Aug 31, 2017 · 4 revisions

Overview

The Switch Buffers Configuration feature distributes buffer memory among the ports in order to guarantee lossless traffic flow. The current buffers configuration is predefined for the worst-case scenario and does not take particular port settings and conditions into account.
The suggested update selects buffer profiles dynamically, based on the current port parameters, to use buffer memory more efficiently and maximize switch throughput.

General feature design

Port speed supported: 10G, 25G, 40G, 50G, 100G
Cable length supported: 5m, 40m, 300m

Port speed is explicitly specified in the minigraph configuration file.
Cable length for each port can be specified in the minigraph or determined from the device role and the role of the port's neighbor device. The device role is also stored in the minigraph under PngDec/Devices/Device, in the 'type' attribute:

  <Device i:type="ToRRouter">
      ...
  </Device>

The table below shows the mapping of device and neighbor roles to cable length:

Device and Neighbor roles            Cable length
ToRRouter(T0) - Server               5m
ToRRouter(T0) - LeafRouter(T1)       40m
LeafRouter(T1) - SpineRouter(T2)     300m
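The role-to-cable-length lookup can be sketched in Python as follows. This is only an illustration, not part of the proposed templates: the function name cable_length, the explicit parameter, and the fallback to the longest supported cable for unknown role pairs are assumptions.

```python
# Role pairs copied from the mapping table; a frozenset makes the pair order irrelevant.
ROLE_PAIR_TO_CABLE = {
    frozenset({"ToRRouter", "Server"}): "5m",
    frozenset({"ToRRouter", "LeafRouter"}): "40m",
    frozenset({"LeafRouter", "SpineRouter"}): "300m",
}

# Assumption: an unknown role pair falls back to the longest supported cable.
MAX_CABLE = "300m"


def cable_length(device_role, neighbor_role, explicit=None):
    """Return the cable length for a port.

    A length given explicitly in the minigraph wins; otherwise the
    (device role, neighbor role) pair decides.
    """
    if explicit is not None:
        return explicit
    pair = frozenset({device_role, neighbor_role})
    return ROLE_PAIR_TO_CABLE.get(pair, MAX_CABLE)


print(cable_length("ToRRouter", "Server"))        # 5m
print(cable_length("LeafRouter", "ToRRouter"))    # 40m
```

Keying the table by an unordered pair avoids having to try both "device_neighbor" and "neighbor_device" spellings.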

Buffers Configuration profiles should be defined for all combinations of port speed and cable length. The profile for each port is selected based on the port speed and the attached cable length. If no exactly matching profile is found, the closest profile with greater parameters is used.
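A minimal sketch of the selection rule, assuming that "closest with greater parameters" means rounding the speed and the cable length up independently to the nearest supported value. The helper names round_up and select_profile are hypothetical.

```python
# Supported values taken from the lists in this section.
SUPPORTED_SPEEDS_G = [10, 25, 40, 50, 100]
SUPPORTED_CABLES_M = [5, 40, 300]


def round_up(value, supported):
    """Return the smallest supported value >= value, or None if there is none."""
    candidates = [s for s in supported if s >= value]
    return min(candidates) if candidates else None


def select_profile(speed_g, cable_m):
    """Pick a profile name, rounding speed and cable length up when needed."""
    speed = round_up(speed_g, SUPPORTED_SPEEDS_G)
    cable = round_up(cable_m, SUPPORTED_CABLES_M)
    if speed is None or cable is None:
        raise ValueError(f"no profile covers {speed_g}G / {cable_m}m")
    return f"pg_lossless_{speed}G_{cable}m_profile"


print(select_profile(40, 5))    # pg_lossless_40G_5m_profile
print(select_profile(40, 10))   # pg_lossless_40G_40m_profile
```

Rounding up rather than down keeps the selected profile at least as large as an exact match would be, which is the conservative choice for lossless traffic.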

The Buffers Configuration update will be done in two stages. The first stage changes the data model and the generation of the buffers configuration json. The second stage implements port buffer profile updates at run time, triggered by a port speed change.

Components to be updated

Implementation of this feature consists of two parts. Part 1 covers the initial buffers configuration, performed only on switch start. Part 2 allows the buffer configuration to change when the port speed changes.

Part 1: Initial configuration

Changes in Part 1 implement port buffers configuration on switch initialization.

Data Model Update

Update buffers configuration in json file

Files *buffers.json (e.g. msn2700.32ports.buffers.json) located at sonic-buildimage/src/sonic-swss/swssconfig/sample/ contain the buffer-related switch configuration (pools, profiles, binding of profiles to ports, etc.).

New profiles

The number of profiles and their parameters are hardware specific and should be calculated separately for every switch model. The table below lists profiles for the Mellanox MSN2700.

Profile Name                     Size   Threshold  Pool Name         Xon   Xoff
pg_lossless_40G_5m_profile       34K    1          ingress_lossless  18K   16K
pg_lossless_50G_5m_profile       34K    1          ingress_lossless  18K   16K
pg_lossless_100G_5m_profile      36K    1          ingress_lossless  18K   18K
pg_lossless_40G_40m_profile      41K    1          ingress_lossless  18K   23K
pg_lossless_50G_40m_profile      41K    1          ingress_lossless  18K   23K
pg_lossless_100G_40m_profile     53K    1          ingress_lossless  18K   35K
pg_lossless_40G_300m_profile     92K    1          ingress_lossless  18K   74K
pg_lossless_50G_300m_profile     92K    1          ingress_lossless  18K   74K
pg_lossless_100G_300m_profile    180K   1          ingress_lossless  18K   162K
...                              ...    ...        ...               ...   ...

Profiles declaration (table BUFFER_PROFILE_TABLE)
  • PG Profiles for all combinations of supported speed and cable length should be declared.

  • PG profiles are declared as a map in the j2 template, like this:

      {% set pg_profiles = {
          'pg_lossless_10G_5m_profile':   { 'xon': 18432, 'xoff': 16384, 'size': 34816, 'dynamic_th': 1 },
          'pg_lossless_25G_5m_profile':   { 'xon': 18432, 'xoff': 16384, 'size': 34816, 'dynamic_th': 1 },
          'pg_lossless_40G_5m_profile':   { 'xon': 18432, 'xoff': 16384, 'size': 34816, 'dynamic_th': 1 },
          'pg_lossless_50G_5m_profile':   { 'xon': 18432, 'xoff': 16384, 'size': 34816, 'dynamic_th': 1 },
          ...
          }
      %}
    
  • "SET" block will be generated only for those profiles which were actually used for at least one port.
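The "only used profiles get a SET block" rule can be illustrated with a short Python sketch. It is not the proposed template: the function name profile_set_blocks is hypothetical, and the entry layout is an assumption modeled on the BUFFER_PG_TABLE entries and on the fields of the pg_profiles map. The three sample rows come from the MSN2700 table, where every listed size equals xon + xoff.

```python
import json

KB = 1024

# (xon, xoff) in KB for three rows of the MSN2700 table.
# In every listed row, size = xon + xoff (e.g. 18K + 16K = 34K).
PG_PROFILES_KB = {
    "pg_lossless_40G_5m_profile": (18, 16),
    "pg_lossless_100G_40m_profile": (18, 35),
    "pg_lossless_100G_300m_profile": (18, 162),
}


def profile_set_blocks(used_profiles):
    """Build SET entries only for profiles referenced by at least one port."""
    blocks = []
    for name in sorted(set(used_profiles)):
        xon_kb, xoff_kb = PG_PROFILES_KB[name]
        blocks.append({
            # Assumed key layout, mirroring "BUFFER_PG_TABLE:<port>:<range>".
            f"BUFFER_PROFILE_TABLE:{name}": {
                "xon": str(xon_kb * KB),
                "xoff": str(xoff_kb * KB),
                "size": str((xon_kb + xoff_kb) * KB),
                "dynamic_th": "1",
            },
            "OP": "SET",
        })
    return blocks


# Profiles assigned to three example ports; one profile is used twice.
used = [
    "pg_lossless_40G_5m_profile",
    "pg_lossless_40G_5m_profile",
    "pg_lossless_100G_40m_profile",
]
print(json.dumps(profile_set_blocks(used), indent=4))
```

The unused pg_lossless_100G_300m_profile produces no entry, and the profile shared by two ports is emitted once.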

BUFFER_PG_TABLE update

The existing PG configuration in the json file:

{
	"BUFFER_PG_TABLE:Ethernet0,Ethernet4,Ethernet8,...:3-4": {
		"profile" : "[BUFFER_PROFILE_TABLE:pg_lossless_profile]"
	},
	"OP": "SET"
}

is replaced with the jinja2 template described below.

Port parameters to profile look-up table:

{# name + size? #}
{%- set portconfig2profile = {
	'40000_5m'   : 'pg_lossless_40G_5m_profile',
	'40000_40m'  : 'pg_lossless_40G_40m_profile',
	'40000_300m' : 'pg_lossless_40G_300m_profile',

	'50000_5m'   : 'pg_lossless_50G_5m_profile',
	'50000_40m'  : 'pg_lossless_50G_40m_profile',
	'50000_300m' : 'pg_lossless_50G_300m_profile',

	'100000_5m'  : 'pg_lossless_100G_5m_profile',
	'100000_40m' : 'pg_lossless_100G_40m_profile',
	'100000_300m': 'pg_lossless_100G_300m_profile'

	...
	}
-%}

Port and neighbor role to cable length look-up table:

{% set ports2cable = {
	'ToRRouter_Server'       : '5m',
	'LeafRouter_ToRRouter'   : '40m',
	'SpineRouter_LeafRouter' : '300m'
	}
%}

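Macro (function) to determine cable length:

{% set switch_role = minigraph_devices[minigraph_hostname]['type'] %}
{# Assumption: fall back to the longest supported cable length #}
{% set max_length = '300m' %}

{% macro cable_length(interface_name) -%}
{#- Assumption: an explicit length, if any, is stored under the interface's 'cable' key -#}
{%- set explicit = ethernet_interfaces | selectattr('name', 'equalto', interface_name) | selectattr('cable', 'defined') | map(attribute='cable') | first -%}
{%- if explicit -%}
	{{- explicit -}}
{%- else -%}
	{%- set nei = minigraph_neighbors[interface_name]['name'] -%}
	{%- set nei_role = minigraph_devices[nei]['type'] -%}
	{#- The two roles may appear in either order in ports2cable -#}
	{{- ports2cable.get(switch_role + '_' + nei_role) or ports2cable.get(nei_role + '_' + switch_role) or max_length -}}
{%- endif -%}
{%- endmacro %}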

Loop to generate Ethernet port-to-profile mapping tables:

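{% set pg_range = '3-4' %}

{# Sizes are collected in a list and summed after the loop, because a "set" inside a for loop is not visible outside it #}
{% set pg_sizes = [] %}
{% for interface in ethernet_interfaces %}
	{% set speed = interface['speed'] | string %}
	{% set cable = cable_length(interface['name']) | trim %}
	{% set port_config = speed + '_' + cable %}
	{% if port_config not in portconfig2profile %}
		{# find_closest_greater_profile is a macro still to be defined; it should return the key of the closest greater speed and cable combination #}
		{% set port_config = find_closest_greater_profile(speed, cable) | trim %}
	{% endif %}
	{% set profile = portconfig2profile[port_config] %}
	{# append() returns None, so the empty "if" only serves to run the call #}
	{% if pg_sizes.append(pg_profiles[profile]['size']) %}{% endif %}
	{
		"BUFFER_PG_TABLE:{{ interface['name'] }}:{{ pg_range }}": {
		        "profile" : "[BUFFER_PROFILE_TABLE:{{ profile }}]"
		},
		"OP": "SET"
	}{% if not loop.last %},{% endif %}

{% endfor %}
{% set ingress_lossless_pg_pool_size = pg_sizes | sum %}

BUFFER_POOL_TABLE update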

The buffer pool ingress_lossless_pool, previously used for all ingress lossless profiles, is now split into two parts:

  • a static part, which keeps the name ingress_lossless_pool and has a reduced size that is still enough to serve all lossless profiles except the PG profiles
  • a PG part, whose size is calculated as the sum of the sizes of the profiles used on all ports.

Example:

{
    "BUFFER_POOL_TABLE:ingress_lossless_pg_pool": {
        "size": "{{ ingress_lossless_pg_pool_size }}",
        "type": "ingress",
        "mode": "dynamic"
    },
    "OP": "SET"
}

The ingress_lossless_pg_pool_size calculation is covered in section "BUFFER_PG_TABLE update".
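As a worked example of the pool size calculation, the sketch below sums one profile size per port, as the generation loop does. The port-to-profile assignment is invented for illustration, and the function names pg_pool_size and pg_pool_entry are hypothetical; the sizes are the MSN2700 table values converted to bytes.

```python
# Profile sizes in bytes: KB values from the MSN2700 table times 1024.
PROFILE_SIZE = {
    "pg_lossless_40G_5m_profile": 34 * 1024,
    "pg_lossless_100G_40m_profile": 53 * 1024,
}


def pg_pool_size(port_profiles):
    """Sum the size of the profile assigned to each port (one term per port)."""
    return sum(PROFILE_SIZE[profile] for profile in port_profiles.values())


def pg_pool_entry(port_profiles):
    """Build the ingress_lossless_pg_pool entry in the layout shown above."""
    return {
        "BUFFER_POOL_TABLE:ingress_lossless_pg_pool": {
            "size": str(pg_pool_size(port_profiles)),
            "type": "ingress",
            "mode": "dynamic",
        },
        "OP": "SET",
    }


# Hypothetical assignment: two 40G/5m ports and one 100G/40m port.
ports = {
    "Ethernet0": "pg_lossless_40G_5m_profile",
    "Ethernet4": "pg_lossless_40G_5m_profile",
    "Ethernet8": "pg_lossless_100G_40m_profile",
}
print(pg_pool_size(ports))   # 2 * 34816 + 54272 = 123904
```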

Other changes

The declarations of the following tables could be replaced with a template in the interfaces list part. This would make the config file more generic:

  • BUFFER_PORT_INGRESS_PROFILE_LIST
  • BUFFER_PORT_EGRESS_PROFILE_LIST
  • BUFFER_QUEUE_TABLE

E.g.

{
	"BUFFER_QUEUE_TABLE:{{ all_ethernet_interfaces }}:0-1": {
		"profile" : "[BUFFER_PROFILE_TABLE:q_lossy_profile]"
	},
	"OP": "SET"
}

The value of the all_ethernet_interfaces variable can be assigned in the loop used for PG profiles (see section "BUFFER_PG_TABLE update").
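A small Python sketch of how the all_ethernet_interfaces value could be built and used. It assumes the value is a comma-separated list of port names, as in the existing BUFFER_PG_TABLE key; the function name queue_table_entry is hypothetical.

```python
import json


def queue_table_entry(interface_names, queues="0-1", profile="q_lossy_profile"):
    """Build one BUFFER_QUEUE_TABLE entry covering all given ports."""
    # Assumption: ports are joined with commas, as in
    # "BUFFER_PG_TABLE:Ethernet0,Ethernet4,Ethernet8,...:3-4".
    all_ethernet_interfaces = ",".join(interface_names)
    return {
        f"BUFFER_QUEUE_TABLE:{all_ethernet_interfaces}:{queues}": {
            "profile": f"[BUFFER_PROFILE_TABLE:{profile}]"
        },
        "OP": "SET",
    }


entry = queue_table_entry(["Ethernet0", "Ethernet4", "Ethernet8"])
print(json.dumps(entry, indent=4))
```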

Components to be updated (Part #2)

Changes in Part 2 implement port buffers configuration at run time.

Data Model Update

Profiles declaration (table BUFFER_PROFILE_TABLE)
  • All profiles will be declared in the initial json file.
  • A new field, status, will be added to the profile table to indicate the profile status. Its value is either "active" or "inactive". The value is set to "active" when the profile is created in SAI and used by one or more ports. The value "inactive" means the profile is not used and has not been created in SAI.

; field   = value
status    = "active/inactive"   ; Profile state.

Initially all profiles should be declared with status "inactive".

Orchagent Update

The BufferOrch component should be updated to create only the profiles that are actually used. The status field should be updated accordingly.
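The status bookkeeping can be sketched as a per-profile set of ports. This is an illustration in Python, not BufferOrch code: the class ProfileTracker and its methods are hypothetical, and switching a profile back to "inactive" once its last port leaves is an assumption.

```python
from collections import defaultdict


class ProfileTracker:
    """Track which profiles are in use and derive their status."""

    def __init__(self, profile_names):
        # Initially every declared profile is inactive.
        self.status = {name: "inactive" for name in profile_names}
        self.users = defaultdict(set)   # profile name -> ports using it
        self.port_profile = {}          # port name -> current profile

    def bind(self, port, profile):
        """Bind a port to a profile, updating the status of both profiles."""
        old = self.port_profile.get(port)
        if old == profile:
            return
        if old is not None:
            self.users[old].discard(port)
            if not self.users[old]:
                # Assumption: an unused profile goes back to inactive.
                self.status[old] = "inactive"
        self.port_profile[port] = profile
        self.users[profile].add(port)
        # A real implementation would create the profile in SAI at this point.
        self.status[profile] = "active"


tracker = ProfileTracker(["pg_lossless_40G_5m_profile", "pg_lossless_100G_5m_profile"])
tracker.bind("Ethernet0", "pg_lossless_40G_5m_profile")
tracker.bind("Ethernet0", "pg_lossless_100G_5m_profile")   # e.g. after a speed change
print(tracker.status)
```

After the second bind, the 40G profile has no users and is reported as inactive, while the 100G profile is active.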

Open questions

  • Where to get the cable length and the port_config-to-buffer-profile mapping at run time (Part 2)?