-
Notifications
You must be signed in to change notification settings - Fork 11.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch DirectX Target to use the Itanium ABI #111632
base: main
Are you sure you want to change the base?
Conversation
To consolidate behavior of function mangling and limit the number of places that ABI changes will need to be made, this switches the DirectX target used for HLSL to use the Itanium ABI from the Microsoft ABI. The Itanium ABI has greater flexibility in decisions regarding mangling of new types of which we have more than a few yet to add. This required adding a function to call all global destructors as the Microsoft ABI had done. This requires a few changes to tests. Most notably the mangling style has changed which accounts for most of the changes. In making those changes, I took the opportunity to harmonize some very similar tests for greater consistency. I also shaved off some unneeded run flags that had probably been copied over from one test to another. Other changes effected by using the new ABI include using smaller types when possible in a few instances, eliminating an unnecessary alloca in one instance in this-assignment.hlsl, and changing the order of inout parameters getting copied in and out. That last is a subtle change in functionality, but one where there was sufficient inconsistency in the past that standardizing is important, but the particular direction of the standardization is less important for the sake of existing shaders. fixes llvm#110736
@llvm/pr-subscribers-backend-directx @llvm/pr-subscribers-clang Author: Greg Roth (pow2clk) ChangesTo consolidate behavior of function mangling and limit the number of places that ABI changes will need to be made, this switches the DirectX target used for HLSL to use the Itanium ABI from the Microsoft ABI. The Itanium ABI has greater flexibility in decisions regarding mangling of new types of which we have more than a few yet to add. This required adding a function to call all global destructors as the Microsoft ABI had done. This requires a few changes to tests. Most notably the mangling style has changed which accounts for most of the changes. In making those changes, I took the opportunity to harmonize some very similar tests for greater consistency. I also shaved off some unneeded run flags that had probably been copied over from one test to another. Other changes effected by using the new ABI include using smaller types when possible in a few instances, eliminating an unnecessary alloca in one instance in this-assignment.hlsl, and changing the order of inout parameters getting copied in and out. That last is a subtle change in functionality, but one where there was sufficient inconsistency in the past that standardizing is important, but the particular direction of the standardization is less important for the sake of existing shaders. fixes #110736 Patch is 142.96 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/111632.diff 48 Files Affected:
diff --git a/clang/lib/Basic/Targets/DirectX.h b/clang/lib/Basic/Targets/DirectX.h
index cf7ea5e83503dc..19b61252409b09 100644
--- a/clang/lib/Basic/Targets/DirectX.h
+++ b/clang/lib/Basic/Targets/DirectX.h
@@ -62,7 +62,7 @@ class LLVM_LIBRARY_VISIBILITY DirectXTargetInfo : public TargetInfo {
PlatformName = llvm::Triple::getOSTypeName(Triple.getOS());
resetDataLayout("e-m:e-p:32:32-i1:32-i8:8-i16:16-i32:32-i64:64-f16:16-f32:"
"32-f64:64-n8:16:32:64");
- TheCXXABI.set(TargetCXXABI::Microsoft);
+ TheCXXABI.set(TargetCXXABI::GenericItanium);
}
bool useFP16ConversionIntrinsics() const override { return false; }
void getTargetDefines(const LangOptions &Opts,
diff --git a/clang/lib/CodeGen/ItaniumCXXABI.cpp b/clang/lib/CodeGen/ItaniumCXXABI.cpp
index 965e09a7a760ec..75dab596e1b2c4 100644
--- a/clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ b/clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -2997,6 +2997,10 @@ void ItaniumCXXABI::registerGlobalDtor(CodeGenFunction &CGF, const VarDecl &D,
if (D.isNoDestroy(CGM.getContext()))
return;
+ // HLSL doesn't support atexit.
+ if (CGM.getLangOpts().HLSL)
+ return CGM.AddCXXDtorEntry(dtor, addr);
+
// OpenMP offloading supports C++ constructors and destructors but we do not
// always have 'atexit' available. Instead lower these to use the LLVM global
// destructors which we can handle directly in the runtime. Note that this is
diff --git a/clang/test/CodeGenHLSL/ArrayTemporary.hlsl b/clang/test/CodeGenHLSL/ArrayTemporary.hlsl
index 63a30b61440eb5..7d77c0aff736cc 100644
--- a/clang/test/CodeGenHLSL/ArrayTemporary.hlsl
+++ b/clang/test/CodeGenHLSL/ArrayTemporary.hlsl
@@ -68,11 +68,11 @@ void call4(float Arr[2][2]) {
// CHECK: [[Tmp2:%.*]] = alloca [4 x float]
// CHECK: [[Tmp3:%.*]] = alloca [3 x i32]
// CHECK: call void @llvm.memcpy.p0.p0.i32(ptr align 4 [[Tmp1]], ptr align 4 [[FA2]], i32 8, i1 false)
-// CHECK: call void @"??$template_fn@$$BY01M@@YAXY01M@Z"(ptr noundef byval([2 x float]) align 4 [[Tmp1]])
+// CHECK: call void @_Z11template_fnIA2_fEvT_(ptr noundef byval([2 x float]) align 4 [[Tmp1]])
// CHECK: call void @llvm.memcpy.p0.p0.i32(ptr align 4 [[Tmp2]], ptr align 4 [[FA4]], i32 16, i1 false)
-// CHECK: call void @"??$template_fn@$$BY03M@@YAXY03M@Z"(ptr noundef byval([4 x float]) align 4 [[Tmp2]])
+// CHECK: call void @_Z11template_fnIA4_fEvT_(ptr noundef byval([4 x float]) align 4 [[Tmp2]])
// CHECK: call void @llvm.memcpy.p0.p0.i32(ptr align 4 [[Tmp3]], ptr align 4 [[IA3]], i32 12, i1 false)
-// CHECK: call void @"??$template_fn@$$BY02H@@YAXY02H@Z"(ptr noundef byval([3 x i32]) align 4 [[Tmp3]])
+// CHECK: call void @_Z11template_fnIA3_iEvT_(ptr noundef byval([3 x i32]) align 4 [[Tmp3]])
template<typename T>
void template_fn(T Val) {}
@@ -90,7 +90,7 @@ void template_call(float FA2[2], float FA4[4], int IA3[3]) {
// CHECK: [[Addr:%.*]] = getelementptr inbounds [2 x float], ptr [[FA2]], i32 0, i32 0
// CHECK: [[Tmp:%.*]] = load float, ptr [[Addr]]
-// CHECK: call void @"??$template_fn@M@@YAXM@Z"(float noundef [[Tmp]])
+// CHECK: call void @_Z11template_fnIfEvT_(float noundef [[Tmp]])
// CHECK: [[Idx0:%.*]] = getelementptr inbounds [2 x float], ptr [[FA2]], i32 0, i32 0
// CHECK: [[Val0:%.*]] = load float, ptr [[Idx0]]
diff --git a/clang/test/CodeGenHLSL/BasicFeatures/OutputArguments.hlsl b/clang/test/CodeGenHLSL/BasicFeatures/OutputArguments.hlsl
index 58237889db1dca..6afead4f233660 100644
--- a/clang/test/CodeGenHLSL/BasicFeatures/OutputArguments.hlsl
+++ b/clang/test/CodeGenHLSL/BasicFeatures/OutputArguments.hlsl
@@ -260,10 +260,10 @@ void order_matters(inout int X, inout int Y) {
// CHECK: store i32 [[VVal]], ptr [[Tmp0]]
// CHECK: [[VVal:%.*]] = load i32, ptr [[V]]
// CHECK: store i32 [[VVal]], ptr [[Tmp1]]
-// CHECK: call void {{.*}}order_matters{{.*}}(ptr noalias noundef nonnull align 4 dereferenceable(4) [[Tmp1]], ptr noalias noundef nonnull align 4 dereferenceable(4) [[Tmp0]])
-// CHECK: [[Arg1Val:%.*]] = load i32, ptr [[Tmp1]]
+// CHECK: call void {{.*}}order_matters{{.*}}(ptr noalias noundef nonnull align 4 dereferenceable(4) [[Tmp0]], ptr noalias noundef nonnull align 4 dereferenceable(4) [[Tmp1]])
+// CHECK: [[Arg1Val:%.*]] = load i32, ptr [[Tmp0]]
// CHECK: store i32 [[Arg1Val]], ptr [[V]]
-// CHECK: [[Arg2Val:%.*]] = load i32, ptr [[Tmp0]]
+// CHECK: [[Arg2Val:%.*]] = load i32, ptr [[Tmp1]]
// CHECK: store i32 [[Arg2Val]], ptr [[V]]
// OPT: ret i32 2
@@ -289,17 +289,19 @@ void setFour(inout int I) {
// CHECK: [[B:%.*]] = alloca %struct.B
// CHECK: [[Tmp:%.*]] = alloca i32
-// CHECK: [[BFLoad:%.*]] = load i32, ptr [[B]]
-// CHECK: [[BFshl:%.*]] = shl i32 [[BFLoad]], 24
-// CHECK: [[BFashr:%.*]] = ashr i32 [[BFshl]], 24
-// CHECK: store i32 [[BFashr]], ptr [[Tmp]]
+// CHECK: [[BFLoad:%.*]] = load i16, ptr [[B]]
+// CHECK: [[BFshl:%.*]] = shl i16 [[BFLoad]], 8
+// CHECK: [[BFashr:%.*]] = ashr i16 [[BFshl]], 8
+// CHECK: [[BFcast:%.*]] = sext i16 [[BFashr]] to i32
+// CHECK: store i32 [[BFcast]], ptr [[Tmp]]
// CHECK: call void {{.*}}setFour{{.*}}(ptr noalias noundef nonnull align 4 dereferenceable(4) [[Tmp]])
// CHECK: [[RetVal:%.*]] = load i32, ptr [[Tmp]]
-// CHECK: [[BFLoad:%.*]] = load i32, ptr [[B]]
-// CHECK: [[BFValue:%.*]] = and i32 [[RetVal]], 255
-// CHECK: [[ZerodField:%.*]] = and i32 [[BFLoad]], -256
-// CHECK: [[BFSet:%.*]] = or i32 [[ZerodField]], [[BFValue]]
-// CHECK: store i32 [[BFSet]], ptr [[B]]
+// CHECK: [[TruncVal:%.*]] = trunc i32 [[RetVal]] to i16
+// CHECK: [[BFLoad:%.*]] = load i16, ptr [[B]]
+// CHECK: [[BFValue:%.*]] = and i16 [[TruncVal]], 255
+// CHECK: [[ZerodField:%.*]] = and i16 [[BFLoad]], -256
+// CHECK: [[BFSet:%.*]] = or i16 [[ZerodField]], [[BFValue]]
+// CHECK: store i16 [[BFSet]], ptr [[B]]
// OPT: ret i32 8
export int case11() {
diff --git a/clang/test/CodeGenHLSL/GlobalConstructorFunction.hlsl b/clang/test/CodeGenHLSL/GlobalConstructorFunction.hlsl
index b39311ad67cd62..c0eb1b138ed047 100644
--- a/clang/test/CodeGenHLSL/GlobalConstructorFunction.hlsl
+++ b/clang/test/CodeGenHLSL/GlobalConstructorFunction.hlsl
@@ -25,11 +25,11 @@ void main(unsigned GI : SV_GroupIndex) {}
// CHECK: define void @main()
// CHECK-NEXT: entry:
// Verify function constructors are emitted
-// NOINLINE-NEXT: call void @"?call_me_first@@YAXXZ"()
-// NOINLINE-NEXT: call void @"?then_call_me@@YAXXZ"()
+// NOINLINE-NEXT: call void @_Z13call_me_firstv()
+// NOINLINE-NEXT: call void @_Z12then_call_mev()
// NOINLINE-NEXT: %0 = call i32 @llvm.dx.flattened.thread.id.in.group()
-// NOINLINE-NEXT: call void @"?main@@YAXI@Z"(i32 %0)
-// NOINLINE-NEXT: call void @"?call_me_last@@YAXXZ"(
+// NOINLINE-NEXT: call void @_Z4mainj(i32 %0)
+// NOINLINE-NEXT: call void @_Z12call_me_lastv(
// NOINLINE-NEXT: ret void
// Verify constructor calls are inlined when AlwaysInline is run
diff --git a/clang/test/CodeGenHLSL/GlobalConstructorLib.hlsl b/clang/test/CodeGenHLSL/GlobalConstructorLib.hlsl
index 78f6475462bc47..09c44f6242c53c 100644
--- a/clang/test/CodeGenHLSL/GlobalConstructorLib.hlsl
+++ b/clang/test/CodeGenHLSL/GlobalConstructorLib.hlsl
@@ -13,7 +13,7 @@ void FirstEntry() {}
// CHECK: define void @FirstEntry()
// CHECK-NEXT: entry:
// NOINLINE-NEXT: call void @_GLOBAL__sub_I_GlobalConstructorLib.hlsl()
-// NOINLINE-NEXT: call void @"?FirstEntry@@YAXXZ"()
+// NOINLINE-NEXT: call void @_Z10FirstEntryv()
// Verify inlining leaves only calls to "llvm." intrinsics
// INLINE-NOT: call {{[^@]*}} @{{[^l][^l][^v][^m][^\.]}}
// CHECK: ret void
@@ -25,7 +25,7 @@ void SecondEntry() {}
// CHECK: define void @SecondEntry()
// CHECK-NEXT: entry:
// NOINLINE-NEXT: call void @_GLOBAL__sub_I_GlobalConstructorLib.hlsl()
-// NOINLINE-NEXT: call void @"?SecondEntry@@YAXXZ"()
+// NOINLINE-NEXT: call void @_Z11SecondEntryv()
// Verify inlining leaves only calls to "llvm." intrinsics
// INLINE-NOT: call {{[^@]*}} @{{[^l][^l][^v][^m][^\.]}}
// CHECK: ret void
@@ -33,6 +33,10 @@ void SecondEntry() {}
// Verify the constructor is alwaysinline
// NOINLINE: ; Function Attrs: {{.*}}alwaysinline
-// NOINLINE-NEXT: define internal void @_GLOBAL__sub_I_GlobalConstructorLib.hlsl() [[IntAttr:\#[0-9]+]]
+// NOINLINE-NEXT: define linkonce_odr void @_ZN4hlsl8RWBufferIfEC2Ev({{.*}} [[CtorAttr:\#[0-9]+]]
-// NOINLINE: attributes [[IntAttr]] = {{.*}} alwaysinline
+// NOINLINE: ; Function Attrs: {{.*}}alwaysinline
+// NOINLINE-NEXT: define internal void @_GLOBAL__sub_I_GlobalConstructorLib.hlsl() [[InitAttr:\#[0-9]+]]
+
+// NOINLINE-DAG: attributes [[InitAttr]] = {{.*}} alwaysinline
+// NOINLINE-DAG: attributes [[CtorAttr]] = {{.*}} alwaysinline
diff --git a/clang/test/CodeGenHLSL/GlobalConstructors.hlsl b/clang/test/CodeGenHLSL/GlobalConstructors.hlsl
index 7e2f288726c954..7b26dba0d19010 100644
--- a/clang/test/CodeGenHLSL/GlobalConstructors.hlsl
+++ b/clang/test/CodeGenHLSL/GlobalConstructors.hlsl
@@ -12,5 +12,5 @@ void main(unsigned GI : SV_GroupIndex) {}
//CHECK-NEXT: entry:
//CHECK-NEXT: call void @_GLOBAL__sub_I_GlobalConstructors.hlsl()
//CHECK-NEXT: %0 = call i32 @llvm.dx.flattened.thread.id.in.group()
-//CHECK-NEXT: call void @"?main@@YAXI@Z"(i32 %0)
+//CHECK-NEXT: call void @_Z4mainj(i32 %0)
//CHECK-NEXT: ret void
diff --git a/clang/test/CodeGenHLSL/GlobalDestructors.hlsl b/clang/test/CodeGenHLSL/GlobalDestructors.hlsl
index ea28354222f885..f98318601134bb 100644
--- a/clang/test/CodeGenHLSL/GlobalDestructors.hlsl
+++ b/clang/test/CodeGenHLSL/GlobalDestructors.hlsl
@@ -1,7 +1,7 @@
-// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -std=hlsl202x -emit-llvm -disable-llvm-passes %s -o - | FileCheck %s --check-prefixes=CS,NOINLINE,CHECK
-// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -std=hlsl202x -emit-llvm -disable-llvm-passes %s -o - | FileCheck %s --check-prefixes=LIB,NOINLINE,CHECK
-// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -std=hlsl202x -emit-llvm -O0 %s -o - | FileCheck %s --check-prefixes=INLINE,CHECK
-// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -std=hlsl202x -emit-llvm -O0 %s -o - | FileCheck %s --check-prefixes=INLINE,CHECK
+// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -emit-llvm -disable-llvm-passes %s -o - | FileCheck %s --check-prefixes=CS,NOINLINE,CHECK
+// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -emit-llvm -disable-llvm-passes %s -o - | FileCheck %s --check-prefixes=LIB,NOINLINE,CHECK
+// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -emit-llvm -O0 %s -o - | FileCheck %s --check-prefixes=INLINE,CHECK
+// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -emit-llvm -O0 %s -o - | FileCheck %s --check-prefixes=INLINE,CHECK
// Tests that constructors and destructors are appropriately generated for globals
// and that their calls are inlined when AlwaysInline is run
@@ -59,7 +59,7 @@ void main(unsigned GI : SV_GroupIndex) {
// Verify destructor is emitted
// NOINLINE-NEXT: call void @_GLOBAL__sub_I_GlobalDestructors.hlsl()
// NOINLINE-NEXT: %0 = call i32 @llvm.dx.flattened.thread.id.in.group()
-// NOINLINE-NEXT: call void @"?main@@YAXI@Z"(i32 %0)
+// NOINLINE-NEXT: call void @_Z4mainj(i32 %0)
// NOINLINE-NEXT: call void @_GLOBAL__D_a()
// NOINLINE-NEXT: ret void
// Verify inlining leaves only calls to "llvm." intrinsics
@@ -71,8 +71,8 @@ void main(unsigned GI : SV_GroupIndex) {
// NOINLINE: define internal void @_GLOBAL__D_a() [[IntAttr:\#[0-9]+]]
// NOINLINE-NEXT: entry:
-// NOINLINE-NEXT: call void @"??1Tail@@QAA@XZ"(ptr @"?T@?1??Wag@@YAXXZ@4UTail@@A")
-// NOINLINE-NEXT: call void @"??1Pupper@@QAA@XZ"(ptr @"?GlobalPup@@3UPupper@@A")
+// NOINLINE-NEXT: call void @_ZN4TailD1Ev(ptr @_ZZ3WagvE1T)
+// NOINLINE-NEXT: call void @_ZN6PupperD1Ev(ptr @GlobalPup)
// NOINLINE-NEXT: ret void
// NOINLINE: attributes [[IntAttr]] = {{.*}} alwaysinline
diff --git a/clang/test/CodeGenHLSL/basic_types.hlsl b/clang/test/CodeGenHLSL/basic_types.hlsl
index 15c963dfa666f4..d987af45a649fb 100644
--- a/clang/test/CodeGenHLSL/basic_types.hlsl
+++ b/clang/test/CodeGenHLSL/basic_types.hlsl
@@ -6,38 +6,38 @@
// RUN: -emit-llvm -disable-llvm-passes -o - -DNAMESPACED| FileCheck %s
-// CHECK:"?uint16_t_Val@@3GA" = global i16 0, align 2
-// CHECK:"?int16_t_Val@@3FA" = global i16 0, align 2
-// CHECK:"?uint_Val@@3IA" = global i32 0, align 4
-// CHECK:"?uint64_t_Val@@3KA" = global i64 0, align 8
-// CHECK:"?int64_t_Val@@3JA" = global i64 0, align 8
-// CHECK:"?int16_t2_Val@@3T?$__vector@F$01@__clang@@A" = global <2 x i16> zeroinitializer, align 4
-// CHECK:"?int16_t3_Val@@3T?$__vector@F$02@__clang@@A" = global <3 x i16> zeroinitializer, align 8
-// CHECK:"?int16_t4_Val@@3T?$__vector@F$03@__clang@@A" = global <4 x i16> zeroinitializer, align 8
-// CHECK:"?uint16_t2_Val@@3T?$__vector@G$01@__clang@@A" = global <2 x i16> zeroinitializer, align 4
-// CHECK:"?uint16_t3_Val@@3T?$__vector@G$02@__clang@@A" = global <3 x i16> zeroinitializer, align 8
-// CHECK:"?uint16_t4_Val@@3T?$__vector@G$03@__clang@@A" = global <4 x i16> zeroinitializer, align 8
-// CHECK:"?int2_Val@@3T?$__vector@H$01@__clang@@A" = global <2 x i32> zeroinitializer, align 8
-// CHECK:"?int3_Val@@3T?$__vector@H$02@__clang@@A" = global <3 x i32> zeroinitializer, align 16
-// CHECK:"?int4_Val@@3T?$__vector@H$03@__clang@@A" = global <4 x i32> zeroinitializer, align 16
-// CHECK:"?uint2_Val@@3T?$__vector@I$01@__clang@@A" = global <2 x i32> zeroinitializer, align 8
-// CHECK:"?uint3_Val@@3T?$__vector@I$02@__clang@@A" = global <3 x i32> zeroinitializer, align 16
-// CHECK:"?uint4_Val@@3T?$__vector@I$03@__clang@@A" = global <4 x i32> zeroinitializer, align 16
-// CHECK:"?int64_t2_Val@@3T?$__vector@J$01@__clang@@A" = global <2 x i64> zeroinitializer, align 16
-// CHECK:"?int64_t3_Val@@3T?$__vector@J$02@__clang@@A" = global <3 x i64> zeroinitializer, align 32
-// CHECK:"?int64_t4_Val@@3T?$__vector@J$03@__clang@@A" = global <4 x i64> zeroinitializer, align 32
-// CHECK:"?uint64_t2_Val@@3T?$__vector@K$01@__clang@@A" = global <2 x i64> zeroinitializer, align 16
-// CHECK:"?uint64_t3_Val@@3T?$__vector@K$02@__clang@@A" = global <3 x i64> zeroinitializer, align 32
-// CHECK:"?uint64_t4_Val@@3T?$__vector@K$03@__clang@@A" = global <4 x i64> zeroinitializer, align 32
-// CHECK:"?half2_Val@@3T?$__vector@$f16@$01@__clang@@A" = global <2 x half> zeroinitializer, align 4
-// CHECK:"?half3_Val@@3T?$__vector@$f16@$02@__clang@@A" = global <3 x half> zeroinitializer, align 8
-// CHECK:"?half4_Val@@3T?$__vector@$f16@$03@__clang@@A" = global <4 x half> zeroinitializer, align 8
-// CHECK:"?float2_Val@@3T?$__vector@M$01@__clang@@A" = global <2 x float> zeroinitializer, align 8
-// CHECK:"?float3_Val@@3T?$__vector@M$02@__clang@@A" = global <3 x float> zeroinitializer, align 16
-// CHECK:"?float4_Val@@3T?$__vector@M$03@__clang@@A" = global <4 x float> zeroinitializer, align 16
-// CHECK:"?double2_Val@@3T?$__vector@N$01@__clang@@A" = global <2 x double> zeroinitializer, align 16
-// CHECK:"?double3_Val@@3T?$__vector@N$02@__clang@@A" = global <3 x double> zeroinitializer, align 32
-// CHECK:"?double4_Val@@3T?$__vector@N$03@__clang@@A" = global <4 x double> zeroinitializer, align 32
+// CHECK: @uint16_t_Val = global i16 0, align 2
+// CHECK: @int16_t_Val = global i16 0, align 2
+// CHECK: @uint_Val = global i32 0, align 4
+// CHECK: @uint64_t_Val = global i64 0, align 8
+// CHECK: @int64_t_Val = global i64 0, align 8
+// CHECK: @int16_t2_Val = global <2 x i16> zeroinitializer, align 4
+// CHECK: @int16_t3_Val = global <3 x i16> zeroinitializer, align 8
+// CHECK: @int16_t4_Val = global <4 x i16> zeroinitializer, align 8
+// CHECK: @uint16_t2_Val = global <2 x i16> zeroinitializer, align 4
+// CHECK: @uint16_t3_Val = global <3 x i16> zeroinitializer, align 8
+// CHECK: @uint16_t4_Val = global <4 x i16> zeroinitializer, align 8
+// CHECK: @int2_Val = global <2 x i32> zeroinitializer, align 8
+// CHECK: @int3_Val = global <3 x i32> zeroinitializer, align 16
+// CHECK: @int4_Val = global <4 x i32> zeroinitializer, align 16
+// CHECK: @uint2_Val = global <2 x i32> zeroinitializer, align 8
+// CHECK: @uint3_Val = global <3 x i32> zeroinitializer, align 16
+// CHECK: @uint4_Val = global <4 x i32> zeroinitializer, align 16
+// CHECK: @int64_t2_Val = global <2 x i64> zeroinitializer, align 16
+// CHECK: @int64_t3_Val = global <3 x i64> zeroinitializer, align 32
+// CHECK: @int64_t4_Val = global <4 x i64> zeroinitializer, align 32
+// CHECK: @uint64_t2_Val = global <2 x i64> zeroinitializer, align 16
+// CHECK: @uint64_t3_Val = global <3 x i64> zeroinitializer, align 32
+// CHECK: @uint64_t4_Val = global <4 x i64> zeroinitializer, align 32
+// CHECK: @half2_Val = global <2 x half> zeroinitializer, align 4
+// CHECK: @half3_Val = global <3 x half> zeroinitializer, align 8
+// CHECK: @half4_Val = global <4 x half> zeroinitializer, align 8
+// CHECK: @float2_Val = global <2 x float> zeroinitializer, align 8
+// CHECK: @float3_Val = global <3 x float> zeroinitializer, align 16
+// CHECK: @float4_Val = global <4 x float> zeroinitializer, align 16
+// CHECK: @double2_Val = global <2 x double> zeroinitializer, align 16
+// CHECK: @double3_Val = global <3 x double> zeroinitializer, align 32
+// CHECK: @double4_Val = global <4 x double> zeroinitializer, align 32
#ifdef NAMESPACED
#define TYPE_DECL(T) hlsl::T T##_Val
diff --git a/clang/test/CodeGenHLSL/builtins/RWBuffer-annotations.hlsl b/clang/test/CodeGenHLSL/builtins/RWBuffer-annotations.hlsl
index 7ca78e60fb9c59..e1e047485e4df0 100644
--- a/clang/test/CodeGenHLSL/builtins/RWBuffer-annotations.hlsl
+++ b/clang/test/CodeGenHLSL/builtins/RWBuffer-annotations.hlsl
@@ -16,9 +16,9 @@ void main() {
}
// CHECK: !hlsl.uavs = !{![[Single:[0-9]+]], ![[Array:[0-9]+]], ![[SingleAllocated:[0-9]+]], ![[ArrayAllocated:[0-9]+]], ![[SingleSpace:[0-9]+]], ![[ArraySpace:[0-9]+]]}
-// CHECK-DAG: ![[Single]] = !{ptr @"?Buffer1@@3V?$RWBuffer@M@hlsl@@A", i32 10, i32 9, i1 false, i32 -1, i32 0}
-// CHECK-DAG: ![[Array]] = !{ptr @"?BufferArray@@3PAV?$RWBuffer@T?$__vector@M$03@__clang@@@hlsl@@A", i32 10, i32 9, i1 false, i32 -1, i32 0}
-// CHECK-DAG: ![[SingleAllocated]] = !{ptr @"?Buffer2@@3V?$RWBuffer@M@hlsl@@A", i32 10, i32 9, i1 false, i32 3, i32 0}
-// CHECK-DAG: ![[ArrayAllocated]] = !{ptr @"?BufferArray2@@3PAV?$RWBuffer@T?$__vector@M$03@__clang@@@hlsl@@A", i32 10, i32 9, i1 false, i32 4, i32 0}
-// CHECK-DAG: ![[SingleSpace]] = !{ptr @"?Buffer3@@3V?$RWBuffer@M@hlsl@@A", i32 10, i32 9, i1 false, i32 3, i32 1}
-// CHECK-DAG: ![[ArraySpace]] = !{ptr @"?BufferArray3@@3PAV?$RWBuffer@T?$__vector@M$03@__clang@@@hlsl@@A", i32 10, i32 9, i1 false, i32 4, i32 1}
+// CHECK-DAG: ![[Single]] = !{ptr @Buffer1, i32 10, i32 9, i1 false, i32 -1, i32 0}
+// CHECK-DAG: ![[Array]] = !{ptr @BufferArray, i32 10, i32 9, i1 false, i32 -1, i32 0}
+// CHECK-DAG: ![[SingleAllocated]] = !{ptr @Buffer2, i32 10, i32 9, i1 false, i32 3, i32 0}
+// CHECK-DAG: ![[ArrayAllocated]] = !{ptr @BufferArray2, i32 10, i32 9, i1 false, i32 4, i32 0}
+// CHECK-DAG: ![[SingleSpace]] = !{ptr @Buffer3, i32 10, i32 9, i1 false, i32 3, i32 1}
+// CHECK-DAG: ![[ArraySpace]] = !{ptr @BufferArray3, i32 10, i32 9, i1 false, i32 4, i32 1}
diff --git a/clang/test/CodeGenHLSL/builtins/RWBuffer-elementtype.hlsl b/clang/test/CodeGenHLSL/builtins/RWBuffer-elementtype.hlsl
index 036c9c28ef2779..eca4f1598fd658 100644
--- a/clang/test/CodeGenHLSL/builtins/RWBuffer-elementtype.hlsl
+++ b/clang/test/CodeGenHLSL/builtins/RWBuffer-elementtype.hlsl
@@ -37,16 +37,16 @@ void main(int GI : SV_GroupIndex) {
BufF32x3[GI] = 0;
}
-// CHECK: !{{[0-9]+}} = !{ptr @"?BufI16@@3V?$RWBuffer@F@hlsl@@A", i32 10, i32 2,
-// CHECK: !{{[0-9]+}} = !{ptr @"?BufU16@@3V?$RWBuffer@G@hlsl@@A", i32 10, i32 3,
-// CHECK: !{{[0-9]+}} = !{ptr @"?BufI32@@3V?$RWBuffer@H@hlsl@@A", i32 10, i32 4,
-// CHECK: !{{[0-9]+}} = !{ptr @"?BufU32@@3V?$RWBuffer@I@hlsl@@A", i32 10, i32 5,
-// CHECK: !{{[0-9]+}} = !{ptr @"?BufI64@@3V?$RWBuffer@J@hlsl@@A", i32 10, i32 6,
-// CHECK: !{{[0-9]+}} = !{ptr @"?BufU64@@3V?$RWBuffer@K@hlsl@@A", i32 10, i32 7,
-// CHECK: !{{[0-9]+}} = !{ptr @"?BufF16@@3V?$RWBuffer@$f16@@hlsl@@A", i32 10, i32 8,
-// CHECK: !{{[0-9]+}} = !{ptr @"?BufF32@@3V?$RWBuffer@M@hlsl@@A", i32 10, i32 9,
-// CHECK: !{{[0-9]+}} = !{ptr @"?BufF64@@3V?$RWBuffer@N@hlsl@@A", i32 10, i32 10,
-// CHECK: !{{[0-9]+}} = !{ptr @"?BufI16x4@@3V?$RWBuffer@T?$__vector@F$03@__clang@@@hlsl@@A", i32 10, i32 2,
-// CHECK: !{{[0-9]+}} = !{ptr @"?BufU32x3@@3V?$R...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments to explain the changes that aren't just mangling style changes.
|
||
// CHECK-LABEL: define noundef i64 @_Z6shru64mm(i64 noundef %V, i64 noundef %S) #0 { | ||
// CHECK-DAG: %[[Masked:.*]] = and i64 %{{.*}}, 63 | ||
// CHECK-DAG: %{{.*}} = lshr i64 %{{.*}}, %[[Masked]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all incidental, I just thought we should test some logical shifts.
// CHECK: store i32 [[Arg1Val]], ptr [[V]] | ||
// CHECK: [[Arg2Val:%.*]] = load i32, ptr [[Tmp0]] | ||
// CHECK: [[Arg2Val:%.*]] = load i32, ptr [[Tmp1]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where the new order of copy in copy out is reflected
// CHECK: [[BFValue:%.*]] = and i16 [[TruncVal]], 255 | ||
// CHECK: [[ZerodField:%.*]] = and i16 [[BFLoad]], -256 | ||
// CHECK: [[BFSet:%.*]] = or i16 [[ZerodField]], [[BFValue]] | ||
// CHECK: store i16 [[BFSet]], ptr [[B]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an instance of using smaller types for bitfields. Doing so introduced some additional commands.
|
||
using hlsl::abs; | ||
|
||
#ifdef __HLSL_ENABLE_16_BIT | ||
// NATIVE_HALF: define noundef i16 @ | ||
// NATIVE_HALF-LABEL: define noundef i16 @_Z16test_abs_int16_t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a lot of inconsistency in these sorts of checks. Some included more and some less. I tried to include enough to unambiguously identify the function in question.
// CHECK-NEXT: store ptr {{.*}}, ptr [[ThisPtrAddr]] | ||
// CHECK-NEXT: [[ThisPtr:%.*]] = load ptr, ptr [[ThisPtrAddr]] | ||
// CHECK-NEXT: call void @llvm.memcpy.p0.p0.i32(ptr align 4 [[ThisPtr]], ptr align 4 [[Obj:%.*]], i32 8, i1 false) | ||
// CHECK-NEXT: [[FirstAddr:%.*]] = getelementptr inbounds nuw %struct.Pair, ptr [[ThisPtr]], i32 0, i32 0 | ||
// CHECK-NEXT: [[First:%.*]] = load i32, ptr [[FirstAddr]] | ||
// CHECK-NEXT: [[FirstPlusTwo:%.*]] = add nsw i32 [[First]], 2 | ||
// CHECK-NEXT: store i32 [[FirstPlusTwo]], ptr [[FirstAddr]] | ||
// CHECK-NEXT: call void @llvm.memcpy.p0.p0.i32(ptr align 4 [[AggRes]], ptr align 4 [[Obj]], i32 8, i1 false) | ||
// CHECK-NEXT: call void @llvm.memcpy.p0.p0.i32(ptr align 4 {{.*}}, ptr align 4 [[Obj]], i32 8, i1 false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't explain exactly why the ResPtr
alloca was removed, but it was clearly unnecessary. The wildcard here represents the same location as was previously captured in AggRes
and copied into ResPtr
, but then ResPtr
was never used.
// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -emit-llvm -disable-llvm-passes %s -o - | FileCheck %s --check-prefixes=CS,NOINLINE,CHECK | ||
// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -emit-llvm -disable-llvm-passes %s -o - | FileCheck %s --check-prefixes=LIB,NOINLINE,CHECK | ||
// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.0-compute -emit-llvm -O0 %s -o - | FileCheck %s --check-prefixes=INLINE,CHECK | ||
// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.3-library -emit-llvm -O0 %s -o - | FileCheck %s --check-prefixes=INLINE,CHECK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hlsl202x is the default now, so this flag is redundant.
// RUN: FileCheck %s --check-prefixes=CHECK,NATIVE_HALF | ||
// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.3-library %s \ | ||
// RUN: -emit-llvm -disable-llvm-passes -o - | \ | ||
// RUN: FileCheck %s --check-prefixes=CHECK,NO_HALF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file extension already has the same effect as the flag -x hlsl
// CHECK-NEXT: = or i32 [[Tmp2]], 1 | ||
// CHECK: init.check: | ||
// CHECK-NEXT: call void @_ZN4hlsl8RWBufferIiEC1Ev | ||
// CHECK-NEXT: store i8 1, ptr @_ZGVZ4mainvE5mybuf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes represent a different, but equivalent way of protecting the one-time initialization of a static local variable
To consolidate behavior of function mangling and limit the number of places that ABI changes will need to be made, this switches the DirectX target used for HLSL to use the Itanium ABI from the Microsoft ABI. The Itanium ABI has greater flexibility in decisions regarding mangling of new types of which we have more than a few yet to add.
One effect of this will be that linking library shaders compiled with DXC will not be possible with shaders compiled with clang. That isn't considered a terribly interesting use case and one that would likely have been onerous to maintain anyway.
This involved adding a function to call all global destructors as the Microsoft ABI had done.
This requires a few changes to tests. Most notably the mangling style has changed which accounts for most of the changes. In making those changes, I took the opportunity to harmonize some very similar tests for greater consistency. I also shaved off some unneeded run flags that had probably been copied over from one test to another.
Other changes effected by using the new ABI include using different types when manipulating smaller bitfields, eliminating an unnecessary alloca in one instance in this-assignment.hlsl, changing the way static local initialization is guarded, and changing the order of inout parameters getting copied in and out. That last is a subtle change in functionality, but one where there was sufficient inconsistency in the past that standardizing is important, but the particular direction of the standardization is less important for the sake of existing shaders.
fixes #110736