Fix port up/bfd sessions bringup notification delay issue. #3269
@liuh-80 I built an image with this change and tested it. On the first boot after installation on a single linecard, all ports came up within 8 minutes and all 34k routes were installed. On a subsequent reboot of a single linecard, it took about 7 minutes for all links to come up and all 34k routes to be installed. It seems this change addresses the issue. We need to do more testing to verify that, including the OC testing.
@bocon13, we found a performance issue caused by your PR. Can you review this fix? Performance issue: sonic-net/sonic-buildimage#19569
@liuh-80, we need some UTs to prevent such a regression.
Thanks @liuh-80! Just curious: what is the result of the Consumer popping notifications once vs. popping until the size of the entries is 0?
I will add a sonic-swss test case to prevent this issue from happening again.
Popping until the size is 0 makes the Consumer pop all notifications that belong to the current consumer, so higher-priority notifications are blocked.
/AzurePipelines run Azure.sonic-swss
Azure Pipelines successfully started running 1 pipeline(s).
/AzurePipelines run Azure.sonic-swss
Azure Pipelines successfully started running 1 pipeline(s).
Does this mean the SAI bulk APIs that orchagent invokes for routes etc. will now have a limit of 128 entries?
Yes, they will have a 128-entry limit.
I think it's a timing issue. For example, many test cases failed in the validation of this PR, but after I increased the wait_for_n_keys timeout, many of them passed. However, this change does impact performance: after it, every doTask() call can handle only 128 entries, so some scenarios take longer. I'm trying to change only RouteOrch to improve performance.
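To put the 128-entry limit in perspective, here is a small back-of-the-envelope sketch. It is not orchagent code: the function name and the assumption that each pass handles exactly one full batch are hypothetical, and 34000 simply stands in for the "34k routes" mentioned earlier in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of sonic-swss): how many passes are needed
// to consume `total_entries` when each pass handles at most `batch_size`.
std::size_t passesNeeded(std::size_t total_entries, std::size_t batch_size)
{
    assert(batch_size > 0);
    // Ceiling division without floating point.
    return (total_entries + batch_size - 1) / batch_size;
}
```

Under these assumptions, passesNeeded(34000, 128) gives 266 passes for a full route table, compared with a single pass when the consumer drains everything at once. That difference is one plausible reason some scenarios take longer after this change.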
ba0cd6d to ff0b951
We tried the fix from this PR on the Voq chassis and are seeing the following issue:
Hi @saksarav-nokia, the code change in this PR does not change any orch logic, so as I understand it, this issue is an existing bug in orchagent. Is there a plan to fix it?
@bingwang-ms for viz
Cherry-pick PR to 202405: #3328
Fix port up/bfd sessions bringup notification delay issue.
Why I did it
Fix following issue:
sonic-net/sonic-buildimage#19569
Work item tracking
How I did it
Revert the change in Consumer::execute() that was introduced by this commit:
9258978#diff-96451cb89f907afccbd39ddadb6d30aa21fe6fbd01b1cbaf6362078b926f1f08
The change in that commit adds a do-while loop:
do
{
    std::deque<KeyOpFieldsValuesTuple> entries;
    table->pops(entries);
    update_size = addToSync(entries);
} while (update_size != 0);
The addToSync() method returns the number of entries, so the loop keeps popping until a pop returns no entries.
This means that when there is a massive number of route notifications, other higher-priority notifications, such as port-up notifications, are blocked until all route notifications have been handled.
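The blocking effect described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the sonic-swss implementation: routesHandledBeforePortUp and kPopBatchSize are hypothetical names, the batch size of 128 is taken from the review discussion, and the model assumes the scheduler can only notice a higher-priority notification between two execute() calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Assumed pop batch size, taken from the 128-entry limit discussed in review.
constexpr std::size_t kPopBatchSize = 128;

// Toy model (hypothetical, not orchagent code).
// A port-up notification arrives once `arrival` route entries have been
// consumed. Returns how many route entries are consumed before the scheduler
// regains control and can serve the port-up notification.
std::size_t routesHandledBeforePortUp(std::size_t pending_routes,
                                      std::size_t arrival,
                                      bool drain_until_empty)
{
    assert(arrival <= pending_routes);
    std::size_t handled = 0;

    // Each outer iteration models one execute() call of the route consumer.
    // The scheduler checks for higher-priority work only between calls.
    while (handled < arrival && pending_routes > 0)
    {
        do
        {
            // One pops() call: take at most one batch from the table.
            const std::size_t batch = std::min(kPopBatchSize, pending_routes);
            pending_routes -= batch;
            handled += batch;
        } while (drain_until_empty && pending_routes > 0);
    }
    return handled;
}
```

With 34000 pending routes and a port-up notification arriving right after the first entry, the drain-until-empty variant consumes all 34000 route entries before port-up gets a turn, while the single-pop variant yields after 128. In this model, that gap is the notification delay the revert is meant to remove.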
How to verify it
Pass all UT.
Manually verify issue fixed.
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog
Fix port up/bfd sessions bringup notification delay issue.