CodeGen: Implement support for math.lerp lowering #1609

zeux · 2025-01-09T18:22:40Z

To implement math.lerp without branches, we add SELECT_NUM which
selects one of the two inputs based on the comparison condition.

For simplicity, we only support C == D for now; this can be extended to
a more generic version with a IrCondition operand E, but that requires
more work on the SSE side (to flip the comparison for some conditions
like Greater, and expose more generic vcmpsd).

Note: On AArch64 this will effectively result in a change in floating point
behavior between native code and non-native code: clang synthesizes
fmadd (because floating point contraction is allowed by default, and the
arch always has the instruction), whereas this change will use fmul+fadd.

I am not sure if this is good or bad, and if this is a problem in C or not.
Specifically, clang's behavior results in different results between X64
and AArch64 when not using codegen, and with this change the behavior
when using codegen is... the same? :)

Fixing this will require either using LERP_NUM instead and hand-coding
lowering, or exposing some sort of "quasi" MADD_NUM (which would
lower to fma on AArch64 and mul+add on X64).

A small benefit to the current approach is lerp(1, 5, t) constant-folds the
subtraction. With LERP_NUM this optimization will need to be implemented
manually as a partial constant-folding for LERP_NUM.

A similar problem exists today for vector.cross & vector.dot. So maybe this
is not something we need to fix, unsure. Thoughts welcome, I'll leave this
in draft for now.

zeux · 2025-01-09T19:31:08Z

I'll wait until the next upstream sync to rebase this anyway but I'm leaning towards keeping this implementation for now.

Specifically:

There is already a set of functions that have similar discrepancies between NCG & interpreted execution
The behavior in interpreted execution is different between 32-bit ARM v7 (mul+add) and 64-bit ARM v8 (fma)
In typical offline compilation contexts like Roblox, if the bytecode is compiled on a different architecture, constant folding can compute different results
The general problem is not quite limited to fma either: a lot of trigonometric functions will return different results based on the libm used, denormal behavior is different based on FTZ flags (that may differ between compiler & runtime) and even if the flags are the same, FTZ behavior is different on ARM vs X86
MSVC does not use fma for Windows-on-ARM builds for lerp implementation, so using fma in codegen would create another inconsistency there.

So basically this space just feels too fragmented for any decision to be obviously right. Henceforth the "use mul+add" approach does not seem obviously wrong :)

To implement math.lerp without branches, we add SELECT_NUM which selects one of the two inputs based on the comparison condition. For simplicity, we only support C == D for now; this can be extended to a more generic version with a IrCondition operand E, but that requires more work on the SSE side (to flip the comparison for some conditions like Greater, and expose more generic vcmpsd).

vegorov-rbx · 2025-01-13T20:48:40Z

Yes, there are many small inconsistencies like that, so we provide the best effort solution, but do not place the strictness guarantees on floating-point.
So it's ok to proceed and if we want to use compiler flags to improve the situation in one case or another, that can be done separately.

CodeGen/src/IrLoweringA64.cpp

CodeGen/include/Luau/IrData.h

- A64 can reuse any of a/b/c/d for output register - X64 can reuse c/d for output register; only b can't be reused because it might be clobbered when a is loaded from memory - Remove unused condition argument from SELECT_NUM as we assume == for now

zeux force-pushed the fnlerp-cg branch from 341fd55 to 07578df Compare January 10, 2025 19:46

zeux marked this pull request as ready for review January 10, 2025 19:46

zeux requested a review from vegorov-rbx January 10, 2025 19:46

vegorov-rbx reviewed Jan 13, 2025

View reviewed changes

CodeGen/src/IrLoweringA64.cpp Outdated Show resolved Hide resolved

vegorov-rbx reviewed Jan 13, 2025

View reviewed changes

CodeGen/include/Luau/IrData.h Show resolved Hide resolved

Address code review feedback

9641df9

- A64 can reuse any of a/b/c/d for output register - X64 can reuse c/d for output register; only b can't be reused because it might be clobbered when a is loaded from memory - Remove unused condition argument from SELECT_NUM as we assume == for now

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CodeGen: Implement support for math.lerp lowering #1609

CodeGen: Implement support for math.lerp lowering #1609

zeux commented Jan 9, 2025 •

edited

Loading

zeux commented Jan 9, 2025

vegorov-rbx commented Jan 13, 2025

CodeGen: Implement support for math.lerp lowering #1609

Are you sure you want to change the base?

CodeGen: Implement support for math.lerp lowering #1609

Conversation

zeux commented Jan 9, 2025 • edited Loading

zeux commented Jan 9, 2025

vegorov-rbx commented Jan 13, 2025

zeux commented Jan 9, 2025 •

edited

Loading