#0: Bugfix for launch message setting on watcher hang for Tensix
On active erisc, when we exit to base FW due to a watcher assert, we
need to set the launch message to DONE. This is because the next run
will look at the launch message and try to force exit a running kernel
if it finds one via launch message GO. The launch message was being set
to DONE for Tensix as well, causing SD to finish before picking up the
assert.
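
The behavior described above can be illustrated with a small host-side model. This is only a sketch under stated assumptions: `MockCore`, `RunMsg`, `CoreType`, `assert_and_hang_model`, `host_sees_kernel_done`, and `next_run_forces_exit` are hypothetical names invented for illustration and are not part of tt-metal. It models the post-fix rule that only active erisc marks the launch message DONE on a watcher assert, while Tensix hangs with the launch message left at GO.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the launch message values (not the real firmware types).
enum class RunMsg : std::uint8_t { Go, Done };
enum class CoreType { Tensix, ActiveErisc };

// Hypothetical model of one core's mailbox state as seen by the host.
struct MockCore {
    explicit MockCore(CoreType t) : type(t) {}
    CoreType type;
    RunMsg run = RunMsg::Go;      // a kernel has been launched and is running
    bool assert_tripped = false;  // watcher assert status
    bool hung = false;            // core spins forever and never completes
};

// Device side, after the fix: what a watcher assert does on each core type.
void assert_and_hang_model(MockCore& core) {
    core.assert_tripped = true;
    if (core.type == CoreType::ActiveErisc) {
        // erisc exits to base FW, so it marks the kernel DONE first;
        // otherwise the next run would see GO and try to force an exit.
        core.run = RunMsg::Done;
    } else {
        // Tensix hangs and leaves the launch message at GO, so the host
        // does not treat the kernel as finished before seeing the assert.
        core.hung = true;
    }
}

// Host side: a simplified completion check based only on the launch message.
bool host_sees_kernel_done(const MockCore& core) {
    return core.run == RunMsg::Done;
}

// Host side: would the next run try to force-exit a kernel it believes is running?
bool next_run_forces_exit(const MockCore& core) {
    return core.run == RunMsg::Go;
}
```

In this model, an asserting erisc core ends with the launch message at DONE, so a later run does not attempt a force exit. An asserting Tensix core stays at GO, so the host's completion check does not succeed before the assert is picked up.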
tt-dma committed Sep 20, 2024
1 parent a3afddb commit c78033c
Showing 2 changed files with 8 additions and 5 deletions.
8 changes: 5 additions & 3 deletions tt_metal/hw/inc/debug/assert.h
@@ -17,12 +17,14 @@ void assert_and_hang(uint32_t line_num) {
         v->which = debug_get_which_riscv();
     }
 
-    // Update launch msg to show that we've exited.
+    // Hang, or in the case of erisc, early exit.
+#if defined(COMPILE_FOR_ERISC)
+    // Update launch msg to show that we've exited. This is required so that the next run doesn't think there's a kernel
+    // still running and try to make it exit.
     tt_l1_ptr launch_msg_t *launch_msg = GET_MAILBOX_ADDRESS_DEV(launch);
     launch_msg->go.run = RUN_MSG_DONE;
 
-    // Hang, or in the case of erisc, early exit.
-#if defined(COMPILE_FOR_ERISC)
+    // This exits to base FW
     internal_::disable_erisc_app();
     erisc_early_exit(eth_l1_mem::address_map::ERISC_MEM_MAILBOX_STACK_SAVE);
 #endif
5 changes: 3 additions & 2 deletions tt_metal/hw/inc/debug/sanitize_noc.h
@@ -130,11 +130,12 @@ inline void debug_sanitize_post_noc_addr_and_hang(
         v[noc_id].invalid = invalid;
     }
 
-    // Update launch msg to show that we've exited.
+#if defined(COMPILE_FOR_ERISC)
+    // Update launch msg to show that we've exited. This is required so that the next run doesn't think there's a kernel
+    // still running and try to make it exit.
     tt_l1_ptr launch_msg_t *launch_msg = GET_MAILBOX_ADDRESS_DEV(launch);
     launch_msg->go.run = RUN_MSG_DONE;
 
-#if defined(COMPILE_FOR_ERISC)
     // For erisc, we can't hang the kernel/fw, because the core doesn't get restarted when a new
     // kernel is written. In this case we'll do an early exit back to base FW.
     internal_::disable_erisc_app();
