Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

if one test fails everything after fails #99

Closed
mgheorghe opened this issue May 2, 2023 · 11 comments
Closed

if one test fails everything after fails #99

mgheorghe opened this issue May 2, 2023 · 11 comments

Comments

@mgheorghe
Copy link

run create/remove only both will pass
run create/get/remove since get fails remove will fail also

run #94 all at once notice that half way through they will start failing
run #94 one file at a time and notice all will pass

observed same behavior with real switch and in the CI pipeline with saivs

from pprint import pprint

import pytest


class TestSaiVlan:
    # object with no parents

    def test_vlan_create(self, npu):
        commands = [
            {
                'name': 'vlan_1',
                'op': 'create',
                'type': 'SAI_OBJECT_TYPE_VLAN',
                'attributes': ['SAI_VLAN_ATTR_VLAN_ID', '10'],
            }
        ]

        results = [*npu.process_commands(commands)]
        print('======= SAI commands RETURN values create =======')
        pprint(results)
        assert all(results), 'Create error'

    def test_sai_vlan_attr_vlan_id_get(self, npu):
        commands = [
            {
                'name': 'sai_vlan_attr_vlan_id_get',
                'vlan_oid': '$vlan_1',
                'op': 'get',
                'type': 'SAI_OBJECT_TYPE_VLAN',
                'attributes': ['SAI_VLAN_ATTR_VLAN_ID', '111'],
            }
        ]
        results = [*npu.process_commands(commands)]
        print('======= SAI commands RETURN values get =======')
        pprint(results)
        assert all([result[1].value() == '10' for result in results]), 'Get error'

    def test_vlan_remove(self, npu):
        commands = [
            {
                'name': 'vlan_1',
                'op': 'remove',
                'type': 'SAI_OBJECT_TYPE_VLAN',
                'attributes': ['SAI_VLAN_ATTR_VLAN_ID', '10'],
            }
        ]

        results = [*npu.process_commands(commands)]
        print('======= SAI commands RETURN values remove =======')
        pprint(results)
        assert all(
            [result == 'SAI_STATUS_SUCCESS' for result in results]
        ), 'Remove error'

@andriy-kokhan
Copy link
Contributor

andriy-kokhan commented May 3, 2023

I'm assuming you are running this test with sairedis RPC. If so, syncd application is used as a SAI driver. Often, in case of some issue with SAI call, syncd application simply exits which causes subsequent TCs to fail.

If you check the logs (/var/log/syslog), at the very end, you will find something like this:

May  3 06:39:21 234e2ae01e6b syncd: :- sai_deserialize_object_id: invalid oid SAI_OBJECT_TYPE_VLAN
May  3 06:39:21 234e2ae01e6b syncd: :- run: Runtime error: :- sai_deserialize_object_id: invalid oid SAI_OBJECT_TYPE_VLAN

There are two issues with get TCs:

  1. get command should use a key from create command as a name
  2. result[1].value() should be replaced with result.value()
    def test_sai_vlan_attr_vlan_id_get(self, npu):
        commands = [
            {
                'name': 'vlan_1',
                'op': 'get',
                'attributes': ['SAI_VLAN_ATTR_VLAN_ID', ''],
            }
        ]
        results = [*npu.process_commands(commands)]
        print('======= SAI commands RETURN values get =======')
        pprint(results)
        assert all([result.value() == '10' for result in results]), 'Get error'

With these changes I was able to pass all 3 TCs.

# pytest --testbed=saivs_standalone -v -k "TestSaiVlan" 

collected 575 items / 572 deselected / 3 selected       

test_vlan_dd.py::TestSaiVlan::test_vlan_create PASSED                           [ 33%]
test_vlan_dd.py::TestSaiVlan::test_sai_vlan_attr_vlan_id_get PASSED             [ 66%]
test_vlan_dd.py::TestSaiVlan::test_vlan_remove PASSED                           [100%]

Obviously, SAI-C checks should be extended to catch misconfiguration like this. Will try to address this in subsequent PRs.

@andriy-kokhan
Copy link
Contributor

As per the latest updates, the TC should be changed as follows:

    def test_sai_vlan_attr_vlan_id_get(self, npu):
        commands = [
            {
                'name': 'vlan_1',
                'op': 'get',
                'attributes': ['SAI_VLAN_ATTR_VLAN_ID'],
            }
        ]
        results = [*npu.process_commands(commands)]
        print('======= SAI commands RETURN values get =======')
        pprint(results)
        assert all([result[0].value() == '10' for result in results]), 'Get error'

@mgheorghe
Copy link
Author

this issue is not about the get test case, it is to show that once one test fails everything after it fails.
the grpc explanation makes sense, if that is the root cause, can we reconnect in case we get disconnected ?
something like before each sai call check if connection if still active and if not reconnect

@andriy-kokhan
Copy link
Contributor

It's not just about the re-connection. In fact, syncd app restarts. So, the next TC should start from switch init phase again. The approach how you can deal with it is already proposed in some TCs. E.g.,

@pytest.fixture(autouse=True)

@mgheorghe
Copy link
Author

i'll have to look if the autouse will solve this:
probably too late but i finally i generated 2 examples please see will fail mgheorghe#2 will pass mgheorghe#3 using saivs

@andriy-kokhan
Copy link
Contributor

In your example the get operation cause the failure because of redundant parameters in the command.
Should be reduced to

        commands = [
            {
                'name': 'vlan_1',
                'op': 'get',
                'attributes': ['SAI_VLAN_ATTR_VLAN_ID'],
            }
        ]

That's why I'm saying that SAI-C checks should be extended to catch misconfiguration like this. The test should not fail because of the extra parameters even they are wrong.

@mgheorghe
Copy link
Author

mgheorghe commented May 4, 2023

i'm not concerned about the get test, my concern is with the remove failing when it should not.
while i see the example with the fixture, i'll give it a try for now but i still believe this should be handled at the framework level and user should not have to call npu.reset after every failure.

for now let me give the fixture a try, see how it goes and i'll come back with more feedback

@andriy-kokhan
Copy link
Contributor

That's the way how syncd works. So, in case of Redis RPC, we simply can not handle all possible scenarios in SAI-C itself.
Adding a few more check to avoid possible misconfigurations - #105

@mgheorghe
Copy link
Author

when running saivs i noticed sometimes after a test fail dyncD gets into a bad state and anything after that fails and i need to restart the containers.
ERROR api/test_vlan.py::TestSaiVlan::test_vlan_create - AssertionError: SyncD has not started yet...

what will be the best place to raise a syncd issue because this may not be a sai-challenger issue and looks more like syncd reliability issue.

@andriy-kokhan
Copy link
Contributor

Please check #154 . It should address this issue.

@andriy-kokhan
Copy link
Contributor

Closing as per comment above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants