First of all, thanks for the huge effort spent on reverse engineering AMX and documenting it. It is really appreciated. I've been able to try out AMX for an application of mine and that would have been impossible without your extensive documentation.
I just wanted to confirm a few things, and I was hoping that, with the experience you gained from your reverse engineering effort, you could help. Perhaps you could even add some pointers to your already excellent documentation.
1. Am I understanding correctly that there is some redundancy between matint/matfp and mac16/fm[as][16,32,64] when you only need to do outer products? By choosing ALU mode 0 or 1 in matint/matfp, it appears to me that you can emulate almost all of the functionality of mac16/fm[as][16,32,64]. Is that right, or do you see any particular advantage to mac16/fm[as][16,32,64] over matint/matfp? I'm just trying to understand the thought process of the AMX designers, and why they would have separate instructions when they could have merged them.
2. That said, one piece of functionality I was unable to find in matint/matfp is the ability to do a multiply-only operation x*y rather than accumulating into or subtracting from z. With mac16/fm[as][16,32,64] this can be done by setting bit 27 to 1. This matters, for example, in the first loop iteration of a matrix multiply, when z may already hold non-zero values. Is there a combination of bits I can use to do an outer product without accumulating into or subtracting from z?
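To make the concern about the first loop iteration concrete, here is a small pure-Python model of outer-product accumulation. It does not touch AMX at all: `outer_step`, `accumulate`, and the `skip_z` flag are hypothetical names that merely stand in for the multiply-only behaviour described above, and the z[j][i] += x[i] * y[j] layout is an assumption made for illustration.

```python
def outer_step(z, x, y, skip_z=False):
    """Model one step: z[j][i] = (0 if skip_z else z[j][i]) + x[i] * y[j]."""
    return [
        [(0 if skip_z else z[j][i]) + x[i] * y[j] for i in range(len(x))]
        for j in range(len(y))
    ]


def accumulate(z, xs, ys, skip_z_on_first):
    """Sum outer products of paired vectors, optionally ignoring z on step 0."""
    for k, (x, y) in enumerate(zip(xs, ys)):
        z = outer_step(z, x, y, skip_z=(skip_z_on_first and k == 0))
    return z


xs = [[1, 2], [3, 4]]
ys = [[5, 6], [7, 8]]
stale_z = [[100, 100], [100, 100]]  # leftovers from earlier work

# Reference result: start from a clean accumulator.
expected = accumulate([[0, 0], [0, 0]], xs, ys, skip_z_on_first=False)
# Always accumulating into stale z carries the leftovers into the result.
wrong = accumulate(stale_z, xs, ys, skip_z_on_first=False)
# Multiply-only on the first step discards the leftovers.
right = accumulate(stale_z, xs, ys, skip_z_on_first=True)
```

In this model `right` matches `expected` while `wrong` is off by the stale contents of z, which is why some way of ignoring or clearing z before the first accumulation is needed.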
RE 1, my assumption is that we're seeing ISA evolution; there was an AMX on iPhone hardware before AMX on M1, and my guess is that mac16/fm[as][16,32,64] were in the first iteration of AMX, and that matint/matfp were added later. If that is the case, then with a zoo of old iPhones you could probably find the point in history when the new instructions were added.
RE 2, rather than peeling one loop iteration and having that iteration skip the z input, issue some instruction to initialise the relevant part of z to zero before the loop (e.g. fma16 with bits 27, 28, 29, and 62 all set to 1, which sets all of z to zero).
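As a rough sketch of the operand mentioned in that example, the 64-bit value with bits 27, 28, 29, and 62 set can be built as follows. Only the bit positions come from the reply; the helper name `operand_with_bits` is hypothetical, and how the operand is actually handed to the instruction (inline assembly or a wrapper macro) is not shown here.

```python
def operand_with_bits(*bits):
    """Build a 64-bit operand value with the given bit positions set."""
    value = 0
    for b in bits:
        if not 0 <= b < 64:
            raise ValueError(f"bit {b} is outside a 64-bit operand")
        value |= 1 << b
    return value


# Bits taken from the reply above: 27, 28, 29 and 62.
ZERO_Z_OPERAND = operand_with_bits(27, 28, 29, 62)
print(hex(ZERO_Z_OPERAND))  # 0x4000000038000000
```

Issuing this once before the loop lets every iteration use the same accumulating form, so no iteration needs to be peeled.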
This is a question rather than an issue.