Add test cases for panic/OPAL TI for TOD failover recovery failure. #445

Open
wants to merge 2 commits into
base: master

maheshsal
Contributor

Two commits

Commit 1: hmi: Add test case to trigger TOD topology switch.

This test triggers a TOD topology failover on all the chips to exercise the
OPAL TI and panic paths, and to make sure the OS does not get stuck while
going down.

This test needs the following skiboot and kernel commits to pass:

skiboot:
  497734984 opal/hmi: set a flag to inform OS that TOD/TB has failed.
  ca349b836 opal/hmi: Don't retry TOD recovery if it is already in failed state.
  017da88b2 opal/hmi: Fix double unlock of hmi lock in failure path.

kernel:
  http://patchwork.ozlabs.org/patch/1051379/

Commit 2: Opal TI: Add test for OPAL TI.

Trigger a manual OPAL TI by directly setting the SCOM address provided in the
device-tree node ibm,sw-xstop-fir. This tests the basic functionality of
OPAL TI under normal circumstances.
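
As an illustration of the step described above, here is a minimal sketch of how a test helper might decode the ibm,sw-xstop-fir property before writing to the SCOM register. It assumes the property holds two big-endian 32-bit cells (SCOM address, then bit position) and that the bit position uses IBM numbering, where bit 0 is the most significant bit of a 64-bit register. Both are assumptions, not taken from this PR. The helper names `parse_xstop_fir` and `ibm_bit_mask` and the sample bytes are hypothetical.

```python
import struct


def parse_xstop_fir(raw: bytes) -> tuple[int, int]:
    """Decode the raw property bytes into (scom_address, bit_position).

    ASSUMPTION: the property is exactly two big-endian 32-bit cells.
    """
    if len(raw) != 8:
        raise ValueError("expected 8 bytes (two cells), got %d" % len(raw))
    scom_addr, bit_pos = struct.unpack(">II", raw)
    return scom_addr, bit_pos


def ibm_bit_mask(bit_pos: int, width: int = 64) -> int:
    """Return the register value with only `bit_pos` set.

    ASSUMPTION: IBM bit numbering, i.e. bit 0 is the most significant bit.
    """
    if not 0 <= bit_pos < width:
        raise ValueError("bit position %d out of range" % bit_pos)
    return 1 << (width - 1 - bit_pos)


# Illustrative sample bytes only; real values must be read from the
# device tree of the system under test.
sample = bytes.fromhex("050120000000001f")
addr, bit = parse_xstop_fir(sample)
value = ibm_bit_mask(bit)
print("scom address: 0x%x, value: 0x%016x" % (addr, value))
```

The resulting address and value would then be handed to whatever SCOM write mechanism the test uses on the host.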

Observations:

  • On Zaius, panic + reboot after HMI failure works fine. But on one of the Witherspoon systems I have seen hangs in ipmi_msg_sync while dumping the dmesg buffer to nvram (pnv_platform_error_reboot->panic_flush_kmsg_end->kmsg_dump->pstore_dump->OPAL..calls..->ipmi_queue_msg_sync). I am investigating further to understand why we don't get an IPMI timeout, which could get the system out of the hang.

  • On manual OPAL TI, I see the following messages:
    3.24326|secure|SecureROM valid - enabling functionality
    4.57365|IPMI: shutdown requested

    I need to try this on a few other systems with the latest PNOR.

NOTE: The above tests verify that the system reboots successfully after a panic or OPAL TI; otherwise the test fails with an appropriate error message.
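
To make the pass/fail rule in the note concrete, here is a minimal sketch of a polling helper that waits for the system to come back and fails with a descriptive error if it does not. This is not the code from this PR. `wait_for_reboot`, `RebootTimeout`, the `is_back_up` callback and the default timeouts are all hypothetical names and values chosen for illustration.

```python
import time


class RebootTimeout(AssertionError):
    """Raised when the system does not come back within the timeout."""


def wait_for_reboot(is_back_up, timeout_s=600.0, poll_s=5.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll `is_back_up()` until it returns True or `timeout_s` elapses.

    `is_back_up` is a caller-supplied callable (hypothetical), for example
    one that checks whether the host console has reached a login prompt.
    `clock` and `sleep` are injectable so the helper can be unit tested.
    """
    deadline = clock() + timeout_s
    while True:
        if is_back_up():
            return
        if clock() >= deadline:
            raise RebootTimeout(
                "system did not reboot within %.0fs after panic/OPAL TI"
                % timeout_s)
        sleep(poll_s)
```

Raising a subclass of AssertionError lets a unittest-style runner report the timeout as a test failure with the message attached, rather than as an unexpected error.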

The tests can be run independently with the options below:
--run testcases.OpTestHMIHandling.OpalTI
--run testcases.OpTestHMIHandling.TodTopologyFailoverOpalTI
--run testcases.OpTestHMIHandling.TodTopologyFailoverPanic

Signed-off-by: Mahesh Salgaonkar <[email protected]>
Signed-off-by: Mahesh Salgaonkar <[email protected]>
@maheshsal
Contributor Author

maheshsal commented Mar 22, 2019

Observations:

On Zaius, panic + reboot after HMI failure works fine. But on one of the Witherspoon systems I
have seen hangs in ipmi_msg_sync while dumping the dmesg buffer to nvram
(pnv_platform_error_reboot->panic_flush_kmsg_end->kmsg_dump->pstore_dump
->OPAL..calls..->ipmi_queue_msg_sync). I am investigating further to understand why we don't get
an IPMI timeout, which could get the system out of the hang.

The hang on Witherspoon mentioned above is now fixed by the skiboot patch at http://patchwork.ozlabs.org/patch/1061289/

@hegdevasant

Can you please rebase this PR?

-Vasant

@PraveenPenguin PraveenPenguin force-pushed the master branch 2 times, most recently from 4d0cb14 to b976629 Compare October 6, 2023 07:38