Add support for packed vector instructions for floating point and integer operations.
Design and implement a generic signature that supports various explicit operations (e.g., mul, add) on packed values, for instance 64-bit floating point values held in 256-bit packed vector registers.
Design and implement various structures that match the above signature (e.g., for packed 64-bit floats and for packed 64-bit integers). Make use of the MLKit `prim` feature for intrinsics.
Implement support for the intrinsics in the `Compiler/Lambda/LambdaExp` MLKit intermediate language, to be targeted by the operations in the structures. Implement support for the operations all the way down to the `Compiler/Backend/X64/CodeGenX64` / `Compiler/Backend/X64/CodeGenUtilX64` modules (e.g., extend the operations in `Compiler/Backend/PrimName.sml`).
Implement operations for loading from and storing to memory. We can use the `BlockF64` values for representing and allocating memory.
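As a rough illustration of the signature and structure items above, here is a minimal sketch in portable Standard ML. Everything in it is an assumption made for illustration: the names `PACKED`, `F64x4`, `broadcast`, `load` and `store` are hypothetical, the packed value is modelled as a plain 4-tuple rather than a `ymm` register, and an ordinary `real array` stands in for `BlockF64` memory. A real implementation would presumably map each operation to an intrinsic through `prim` instead.

```sml
(* Hypothetical signature for packed values; not an existing MLKit API. *)
signature PACKED =
sig
  type elem                          (* scalar lane type *)
  type t                             (* packed value with `lanes` elements *)
  type block                         (* backing memory for load/store *)
  val lanes     : int
  val broadcast : elem -> t          (* copy one scalar into every lane *)
  val add       : t * t -> t         (* lane-wise addition *)
  val mul       : t * t -> t         (* lane-wise multiplication *)
  val load      : block * int -> t   (* read `lanes` elements from index i *)
  val store     : block * int * t -> unit
end

(* Reference model: four 64-bit floats as a tuple, memory as a real array.
   Assumption: this only models the semantics; it uses no vector registers. *)
structure F64x4 :> PACKED where type elem = real
                          where type block = real array =
struct
  type elem = real
  type t = real * real * real * real
  type block = real array
  val lanes = 4
  fun broadcast (x : real) : t = (x, x, x, x)
  fun add ((a, b, c, d) : t, (e, f, g, h) : t) : t = (a + e, b + f, c + g, d + h)
  fun mul ((a, b, c, d) : t, (e, f, g, h) : t) : t = (a * e, b * f, c * g, d * h)
  (* Array.sub / Array.update raise Subscript if i .. i+3 is out of range. *)
  fun load (arr : block, i : int) : t =
    (Array.sub (arr, i), Array.sub (arr, i + 1),
     Array.sub (arr, i + 2), Array.sub (arr, i + 3))
  fun store (arr : block, i : int, (a, b, c, d) : t) : unit =
    (Array.update (arr, i, a);     Array.update (arr, i + 1, b);
     Array.update (arr, i + 2, c); Array.update (arr, i + 3, d))
end

(* Usage: add two groups of four elements in place. *)
val a = Array.fromList [1.0, 2.0, 3.0, 4.0]
val b = Array.fromList [10.0, 20.0, 30.0, 40.0]
val () = F64x4.store (a, 0, F64x4.add (F64x4.load (a, 0), F64x4.load (b, 0)))
```

Because the structure is sealed with `:>`, clients can only build and inspect packed values through the signature, which is what would let the tuple model be swapped for intrinsics later. A structure for packed 64-bit integers would follow the same shape.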
Discussion.
An important aspect here is that the implementation will have to include boxing operations that implicitly box the vector values into memory. The optimiser can then eliminate box-unbox and unbox-box compositions. Boxing is needed because, in general, it is impossible to ensure that a value is not passed to a generic function, stored in a data structure, or captured in a closure, and it is assumed that all values can be represented in one 64-bit word (perhaps tagged with the LSB being 1, if the GC should not traverse the value).
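The elimination of box-unbox and unbox-box compositions can be sketched on a toy expression language. The datatype below is hypothetical and far simpler than `LambdaExp`; it assumes boxed values are immutable, so that re-boxing an unboxed value can safely reuse the original box.

```sml
(* Toy IR; constructor names are illustrative, not taken from MLKit. *)
datatype exp =
    Var of string
  | Box of exp                  (* store a packed value in memory, yield a pointer *)
  | Unbox of exp                (* read a packed value back from a pointer *)
  | Prim of string * exp list   (* packed operation on unboxed operands *)

(* Bottom-up rewrite: Unbox (Box e) ==> e  and  Box (Unbox e) ==> e. *)
fun simplify (e : exp) : exp =
  case e of
    Var _ => e
  | Prim (p, es) => Prim (p, map simplify es)
  | Unbox e1 => (case simplify e1 of
                   Box e2 => e2
                 | e1' => Unbox e1')
  | Box e1 => (case simplify e1 of
                 Unbox e2 => e2
               | e1' => Box e1')

(* add (mul (x, y), z) where every source-level operation takes and
   returns boxed values. *)
val before =
  Box (Prim ("add",
    [Unbox (Box (Prim ("mul", [Unbox (Var "x"), Unbox (Var "y")]))),
     Unbox (Var "z")]))

val after = simplify before
(* after = Box (Prim ("add",
     [Prim ("mul", [Unbox (Var "x"), Unbox (Var "y")]), Unbox (Var "z")])) *)
```

The intermediate result of `mul` no longer touches memory; only the inputs and the final result remain boxed.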
I foresee some issues with implementing support for register allocation on the `ymm` registers. We must also make sure that the optimiser (i.e., module `Compiler/Lambda/OptLambda`) does not pass wide 256-bit values to generic functions. Moreover, such values can neither be passed as arguments to functions nor stored in closures; they are solely for operations within basic blocks. Ideally, these restrictions could be enforced in `Compiler/Lambda/LambdaStatSem`.
An interesting application would be to use these operations to implement some of the operations in the `Real64Array` / `Real64Vector` structures efficiently.
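To make the array application concrete, here is a hedged sketch of an element-wise addition that processes four elements per step and finishes with a scalar loop for the remainder. The helpers `load4`, `store4` and `add4` are hypothetical stand-ins for the packed operations (modelled with tuples), `addArrays` is an illustrative name, and a plain `real array` is used instead of `Real64Array` to keep the example portable.

```sml
(* Tuple-based stand-ins for the packed operations; illustrative only. *)
type f64x4 = real * real * real * real

fun load4 (a : real array, i : int) : f64x4 =
  (Array.sub (a, i), Array.sub (a, i + 1), Array.sub (a, i + 2), Array.sub (a, i + 3))

fun store4 (a : real array, i : int, (x, y, z, w) : f64x4) : unit =
  (Array.update (a, i, x);     Array.update (a, i + 1, y);
   Array.update (a, i + 2, z); Array.update (a, i + 3, w))

fun add4 ((a, b, c, d) : f64x4, (e, f, g, h) : f64x4) : f64x4 =
  (a + e, b + f, c + g, d + h)

(* Returns a fresh array with dst[i] = xs[i] + ys[i].
   Assumption: xs and ys have the same length. *)
fun addArrays (xs : real array, ys : real array) : real array =
  let
    val n = Array.length xs
    val dst = Array.array (n, 0.0)
    val nvec = n - n mod 4          (* largest multiple of 4 not exceeding n *)
    fun vecLoop i =
      if i < nvec
      then (store4 (dst, i, add4 (load4 (xs, i), load4 (ys, i))); vecLoop (i + 4))
      else ()
    fun tailLoop i =                (* scalar loop for the last n mod 4 elements *)
      if i < n
      then (Array.update (dst, i, Array.sub (xs, i) + Array.sub (ys, i)); tailLoop (i + 1))
      else ()
  in
    vecLoop 0; tailLoop nvec; dst
  end

val sums = addArrays (Array.fromList [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                      Array.fromList [10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
```

With six elements, the first four are handled by one packed step and the last two by the scalar tail.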
References
Book
Optimizing Subroutines in Assembly Language
x86 and amd64 instruction reference
Formally optimal boxing
Notes on x86-64 Programming
Twitter-post on the AVX landscape