Tilize block not working for less than 32 rows #14609
Comments
@nvelickovicTT @ncvetkovicTT do you guys know when you'll be able to get to this issue? |
@sjameelTT we are currently looking into a related issue with tilize on BH. Most likely it is the same underlying cause as in your case, but in any case, we'll know more when we fix the one currently investigated. |
Thanks @nvelickovicTT , hoping that fixing the other issue solves this one! |
Escalating this issue to P0 due to blocking multiple items. |
- The issue is with transpose_wh_init_short not performing sync between MATH and PACK. One way to resolve it is to use the full transpose_wh_init. Further investigation is needed to find a proper solution.
Hey @zzigler-tt @sjameelTT, I have pushed a workaround for this issue to the |
Thanks @ncvetkovicTT, let us know if you can get an ETA on how long a proper fix might take. If it will take long, we will go with the workaround. If it's a couple of days, we can wait. Either is fine; knowing the ETA will just help us make the right decision. |
You'll have the ETA on Monday <3 |
TL;DR: I don't think I will find a proper fix next week, so I'd say we'll have the solution after 18 Nov.
Hey @sjameelTT @ntarafdar, I tried out some potential solutions for other problems that I thought might help with this one, but to no avail. To give you a brief overview, there seem to be issues with synchronizing the Tensix cores in the kernel that's in use. I tried moving around some synchronization points in the underlying LLKs, but I have to do proper debugging. Instead of analyzing the LLKs visually and trying to figure out where the sync issue is, I'll probably have to look at some waveforms, which will take some time to set up.
Introducing a couple of NOPs in transpose_wh_init_short fixes the issue. The way I would debug this further is to have one dump with a working number of NOPs and another with a failing number of NOPs. I hope this is possible to extract, but more likely there will be a gradual increase in failed runs as I reduce the number of NOPs. Since I have to set everything up, there will be a day or two of overhead work. @nvelickovicTT do you agree on the ETA? |
Could the issue be arising from Blackhole being able to bundle 4 tensix instructions into a single packet? |
@abhullar-tt I am not sure, but it seems like we're really missing some STALLWAIT somewhere. |
@abhullar-tt are you referring to coalesced writes? That is an interesting perspective, we haven't tried disabling that and testing. |
Hey @abhullar-tt, @sjameelTT and @zzigler-tt, you can find another branch called ncvetkovic/14609_transpose_tilize_bug which proposes another fix for the issue. @rtawfik01 remembered that there was already an issue behaving like this. The real cause of the bug is not the out-of-sync PACK/MATH as I initially thought, but the fact that I think that the issue where this was first mentioned is #12234, but I am not sure if this was really fixed or not, do you perhaps know @ttmtrajkovic? @nvelickovicTT FYI |
@ttmtrajkovic just to recap, Those flags can only be toggled when both math & pack are fully idle, so that means in fused kernels with pack_untilize, each init function will need syncs for both math & pack to be idle in order to reconfig those bits. Enabling those bits caused issues with destination register debug prints, and also potential increase in power draw for the destination reg. If the debug print issue is fixed, the above bits can be set to true for all ops and it will not impact functionality. Otherwise if we don't want these bits set as true for all ops, then the workaround here is to add CSR waits to make sure all transactions in-flight for both pack & math are complete. |
To recreate
git checkout sjameel/transpose_tilize_bug
build
TT_METAL_DPRINT_ONE_FILE_PER_RISC=1 TT_METAL_DPRINT_CORES=0,0 pytest tests/tt_eager/python_api_testing/unit_testing/misc/test_transpose.py::test_tranpose_hw_rm_no_padding
It will print these dprint outputs in
generated/dprint
Before tilize_block data:
after tilize (human readable)
Bottom 4 rows have incorrect values in the last 8 or so values.
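When comparing the "before" and "after" dumps, a small helper can pinpoint exactly which rows and columns disagree instead of eyeballing them. The following is only a sketch: it assumes both dumps have already been parsed into plain 2D lists of floats (the dprint file format is not shown in this thread, so parsing is left out), and the function name find_mismatches, the 16x32 shape, and the corrupted values are all hypothetical, chosen to mimic the pattern described above.

```python
def find_mismatches(expected, actual, tol=0.0):
    """Return {row_index: [column indices]} for every element that differs by more than tol."""
    if len(expected) != len(actual):
        raise ValueError("row counts differ")
    mismatches = {}
    for r, (exp_row, act_row) in enumerate(zip(expected, actual)):
        if len(exp_row) != len(act_row):
            raise ValueError(f"row {r}: column counts differ")
        bad_cols = [
            c for c, (e, a) in enumerate(zip(exp_row, act_row)) if abs(e - a) > tol
        ]
        if bad_cols:
            mismatches[r] = bad_cols
    return mismatches


# Hypothetical 16x32 input (fewer than 32 rows); the values are illustrative only.
rows, cols = 16, 32
expected = [[float(r * cols + c) for c in range(cols)] for r in range(rows)]

# Copy the reference, then corrupt the last 8 values of the bottom 4 rows
# to imitate the failure pattern reported in this issue.
actual = [row[:] for row in expected]
for r in range(rows - 4, rows):
    for c in range(cols - 8, cols):
        actual[r][c] = -1.0

report = find_mismatches(expected, actual)
for r, bad in sorted(report.items()):
    print(f"row {r}: columns {bad[0]}..{bad[-1]} ({len(bad)} mismatches)")
```

Passing a nonzero tol would allow for rounding differences if the dumps are printed in a lower-precision format.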