Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#14826: Remove misoptimizations from init code #14861

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

nathan-TT
Copy link
Contributor

Ticket

#14826

Problem description

Empty kernels are larger than expected.

What's changed

  1. Stop wzerorange being recognized as memset. Memset is no longer pulled in.

  2. Reduce insns in data image copy. Original loop was 21 isnsn (3.5 per word), new loop is 10 insns (3.3 per word).

  3. Do not use a loop for residue. We only have to handle 0, 1 and 2 cases. A loop is more overhead.

  4. Sprinkle a few more unroll-inhibiting pragmas around.

Rename init code as do_crt1, to make it clearer what it is doing.

These changes remove 436 bytes from a kernel code.

Checklist

  • [ YES ] Post commit CI passes
  • Blackhole Post commit (if applicable)
  • Model regression CI testing passes (if applicable)
  • Device performance regression CI testing passes (if applicable)
  • New/Existing tests provide coverage for changes

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Clang-Tidy found issue(s) with the introduced code (1/4)

tt_metal/hw/firmware/src/brisck.cc Show resolved Hide resolved
tt_metal/hw/firmware/src/brisck.cc Show resolved Hide resolved
tt_metal/hw/firmware/src/erisc.cc Show resolved Hide resolved
tt_metal/hw/firmware/src/erisc.cc Show resolved Hide resolved
tt_metal/hw/firmware/src/erisc.cc Show resolved Hide resolved
tt_metal/hw/firmware/src/erisc.cc Show resolved Hide resolved
tt_metal/hw/inc/firmware_common.h Show resolved Hide resolved
tt_metal/hw/inc/firmware_common.h Show resolved Hide resolved
tt_metal/hw/inc/firmware_common.h Show resolved Hide resolved
tt_metal/hw/inc/firmware_common.h Show resolved Hide resolved
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Clang-Tidy found issue(s) with the introduced code (2/4)

tt_metal/hw/inc/firmware_common.h Show resolved Hide resolved
tt_metal/hw/inc/firmware_common.h Show resolved Hide resolved
tt_metal/hw/inc/firmware_common.h Show resolved Hide resolved
tt_metal/hw/inc/firmware_common.h Show resolved Hide resolved
tt_metal/hw/inc/firmware_common.h Show resolved Hide resolved
tt_metal/hw/inc/firmware_common.h Show resolved Hide resolved
tt_metal/hw/inc/firmware_common.h Show resolved Hide resolved
tt_metal/hw/inc/firmware_common.h Show resolved Hide resolved
tt_metal/hw/inc/firmware_common.h Show resolved Hide resolved
tt_metal/hw/inc/firmware_common.h Outdated Show resolved Hide resolved
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Clang-Tidy found issue(s) with the introduced code (3/4)

tt_metal/hw/inc/firmware_common.h Outdated Show resolved Hide resolved
tt_metal/hw/inc/firmware_common.h Show resolved Hide resolved
tt_metal/hw/inc/firmware_common.h Show resolved Hide resolved
tt_metal/hw/inc/firmware_common.h Show resolved Hide resolved
tt_metal/hw/inc/firmware_common.h Show resolved Hide resolved
tt_metal/hw/inc/firmware_common.h Show resolved Hide resolved
tt_metal/hw/inc/firmware_common.h Show resolved Hide resolved
tt_metal/hw/firmware/src/ncrisck.cc Show resolved Hide resolved
tt_metal/hw/firmware/src/ncrisck.cc Show resolved Hide resolved
tt_metal/hw/toolchain/substitutes.cpp Show resolved Hide resolved
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Clang-Tidy found issue(s) with the introduced code (4/4)

tt_metal/hw/toolchain/substitutes.cpp Show resolved Hide resolved
tt_metal/hw/toolchain/substitutes.cpp Outdated Show resolved Hide resolved
tt_metal/hw/firmware/src/trisck.cc Show resolved Hide resolved
tt_metal/hw/firmware/src/trisck.cc Show resolved Hide resolved
tt_metal/hw/firmware/src/idle_erisck.cc Show resolved Hide resolved
tt_metal/hw/firmware/src/idle_erisck.cc Show resolved Hide resolved
@nathan-TT
Copy link
Contributor Author

nathan-TT commented Nov 12, 2024

I was dissatisfied with the volatile asms, so changed to actual assembly instructions, hiding the form of the loop from the compiler.

All the git hub comments are unavoidable changes -- this is startup code, of course it's going to be 'exciting'

1) Stop wzerorange being recognized as memset
2) Reduce insns in data image copy
3) Do not use a loop for residue

Rename init code as do_crt1, to make it clearer what it is doing.
Copy link
Contributor

@pgkeller pgkeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment tweak requested

tt_metal/hw/inc/firmware_common.h Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants