CodeGen: Implement support for math.lerp lowering #1609
+108
−0
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
To implement math.lerp without branches, we add SELECT_NUM which
selects one of the two inputs based on the comparison condition.
For simplicity, we only support C == D for now; this can be extended to
a more generic version with a IrCondition operand E, but that requires
more work on the SSE side (to flip the comparison for some conditions
like Greater, and expose more generic vcmpsd).
Note: On AArch64 this will effectively result in a change in floating point
behavior between native code and non-native code: clang synthesizes
fmadd (because floating point contraction is allowed by default, and the
arch always has the instruction), whereas this change will use fmul+fadd.
I am not sure if this is good or bad, and if this is a problem in C or not.
Specifically, clang's behavior results in different results between X64
and AArch64 when not using codegen, and with this change the behavior
when using codegen is... the same? :)
Fixing this will require either using LERP_NUM instead and hand-coding
lowering, or exposing some sort of "quasi" MADD_NUM (which would
lower to fma on AArch64 and mul+add on X64).
A small benefit to the current approach is
lerp(1, 5, t)
constant-folds thesubtraction. With LERP_NUM this optimization will need to be implemented
manually as a partial constant-folding for LERP_NUM.
A similar problem exists today for vector.cross & vector.dot. So maybe this
is not something we need to fix, unsure. Thoughts welcome, I'll leave this
in draft for now.