Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Orchagent] Add optional create_switch timeout parameter #3258

Merged
merged 4 commits into from
Sep 25, 2024

Conversation

pavannaregundi
Copy link
Contributor

@pavannaregundi pavannaregundi commented Aug 16, 2024

Change adds optional create_switch timeout as command line parameter to orchagent.

What I did
Change adds optional create_switch timeout as command line parameter to orchagent.

Why I did it
Older platforms are seeing increase in time required in bookworm based branches.

How I verified it
Changes are verified to see passed timeout value is reflected only for create_switch api call from orchagent.

Details if related
Migrating some older armhf platform from SONiC 202311 to master and 202405 branch, resulted in syncd create_switch timeout.
It is analyzed to see that SAI SDK code remains the same and time consume by SAI SDK during init when some(snmp mgmt-framework lldp gnmi dhcp_relay radv teamd pmon bgp) of the docker in sonic are disabled is same as running on 202311(with same environment).
Some of the other observations are:

  • Switch create time varies from 50sec to 85 sec randomly depending on CPU load.
  • Running bullseye based 5.10 sonic kernel yielded same result on latest master.
  • Running docker ps -a in 202405 shows all dockers attempting to start, while in 202311 it delays most dockers
  • No single docker seems to have a significant impact

@Blueve
Copy link

Blueve commented Aug 16, 2024

@prsunny can you help review?

@kcudnik this is the alternative way to allow some low performance switch run higher SONiC version
Previous PR has closed: sonic-net/sonic-sairedis#1406

Change adds optional create_switch timeout as command line
parameter to orchagent.

Signed-off-by: Pavan Naregundi <[email protected]>
dvs.runcmd("cp /usr/bin/orchagent.sh /usr/bin/orchagent.sh_zmq_ut_backup")
dvs.runcmd("sed -i.bak 's/\/usr\/bin\/orchagent /\/usr\/bin\/orchagent -q tcp:\/\/127.0.0.1:8100 /g' /usr/bin/orchagent.sh")
dvs.runcmd("sed -i.bak 's/\/usr\/bin\/orchagent /\/usr\/bin\/orchagent -q tcp:\/\/127.0.0.1:8100 -t 60 /g' /usr/bin/orchagent.sh")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this specific test is changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the change is in main() passing a new argument to orchagent, found we that we are doing a similar argument change in this test case. This change does not affect the behavior of the test as timeout value we are sending is still same as default(60 sec). But helps to cover the UT for new code change.


void usage()
{
cout << "usage: orchagent [-h] [-r record_type] [-d record_location] [-f swss_rec_filename] [-j sairedis_rec_filename] [-b batch_size] [-m MAC] [-i INST_ID] [-s] [-z mode] [-k bulk_size] [-q zmq_server_address] [-c mode]" << endl;
cout << "usage: orchagent [-h] [-r record_type] [-d record_location] [-f swss_rec_filename] [-j sairedis_rec_filename] [-b batch_size] [-m MAC] [-i INST_ID] [-s] [-z mode] [-k bulk_size] [-q zmq_server_address] [-c mode] [-t create_switch_timeout]" << endl;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we planning to change the existing timeout? How will this be controlled from user perspective?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option provides the flexibility to override the existing timeout, it is optional because we most of platforms' performance are good enough to create switch within default timeout.

The timeout setting can be controlled by platform designer, e.g. https://github.com/sonic-net/sonic-buildimage/pull/19928/files

@Blueve
Copy link

Blueve commented Sep 24, 2024

@pavannaregundi can you create PR for 202311 as there is conflict?

@prsunny prsunny merged commit 3c230d2 into sonic-net:master Sep 25, 2024
17 checks passed
@Blueve
Copy link

Blueve commented Oct 11, 2024

ADO: 29208990

rajkumar38 added a commit to rajkumar38/sonic-swss that referenced this pull request Oct 11, 2024
@kcudnik
Copy link
Contributor

kcudnik commented Oct 11, 2024

switch_create must take less than 30 seconds on warm boot scenario to not cause dataplane interruption, higher values are not acceptable in production at all

mssonicbld pushed a commit to mssonicbld/sonic-swss that referenced this pull request Oct 11, 2024
)

What I did
Change adds optional create_switch timeout as command line parameter to orchagent.

Why I did it
Older platforms are seeing increase in time required in bookworm based branches.
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202405: #3324

mssonicbld pushed a commit that referenced this pull request Oct 11, 2024
What I did
Change adds optional create_switch timeout as command line parameter to orchagent.

Why I did it
Older platforms are seeing increase in time required in bookworm based branches.
yxieca pushed a commit that referenced this pull request Oct 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants