[RFC][Transforms][IPO] Add func suffix in ArgumentPromotion and DeadArgumentElimination #109899

Open
wants to merge 1 commit into
base: main

Conversation

yonghong-song
Contributor

@yonghong-song yonghong-song commented Sep 25, 2024

The goal is to add a suffix to the names of functions rewritten by the Argument Promotion and Dead Argument Elimination passes, so that users know the function signature has changed. One motivation is to help kernel tracing with BPF.
The previous patch is [1]; it was reverted due to some test failures. This patch fixes a test failure on top of [1].

There were some concerns that the function suffix might affect sample-based profiling. I ran some experiments, and they show that this is not the case: sample profiling gets function names from DWARF, and the function names in DWARF do not carry the suffixes added by this patch, so sample profiling works fine with this patch.
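To illustrate the relationship between a suffixed symbol name and the unsuffixed name described above, here is a minimal sketch of how a tool could recover the base name from a symbol. `stripSignatureSuffixes` is a hypothetical helper written for this example; it is not part of LLVM or of this patch, and whether any real tracing or profiling tool needs such a step is an assumption.

```cpp
#include <string>

// Hypothetical helper (not LLVM code): repeatedly strips the suffixes this
// patch appends (".argprom", ".argelim", ".retelim") from the end of a
// symbol name. Other suffixes, such as ".memprof.1", are left untouched.
std::string stripSignatureSuffixes(std::string Name) {
  static const char *const Suffixes[] = {".argprom", ".argelim", ".retelim"};
  bool Stripped = true;
  while (Stripped) {
    Stripped = false;
    for (const char *S : Suffixes) {
      const std::string Suffix(S);
      // Require a non-empty base name in front of the suffix.
      if (Name.size() > Suffix.size() &&
          Name.compare(Name.size() - Suffix.size(), Suffix.size(),
                       Suffix) == 0) {
        Name.erase(Name.size() - Suffix.size());
        Stripped = true;
      }
    }
  }
  return Name;
}
```

With this sketch, `stripSignatureSuffixes("dcast.retelim")` yields `dcast`, and a name that was rewritten twice, such as `callee_4xi32.argprom.argprom`, yields `callee_4xi32`. One caveat: a function whose original name already ends in one of these suffixes would be stripped too.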

For a detailed description of the patch, see [1].

[1] #105742
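As a rough illustration of the naming rule this patch introduces, the following standalone sketch models the suffix selection using plain strings. `TransformKind` and `suffixedName` are hypothetical names for this example only. The actual change calls `NF->setName(...)` on the new function in each pass.

```cpp
#include <string>

// Standalone model of the renaming rule; this is not LLVM code.
enum class TransformKind { ArgumentPromotion, DeadArgumentElimination };

// Returns the name the rewritten function would get under this patch.
// NumArgumentsEliminated is only meaningful for DeadArgumentElimination.
std::string suffixedName(const std::string &Name, TransformKind Kind,
                         unsigned NumArgumentsEliminated = 0) {
  if (Kind == TransformKind::ArgumentPromotion)
    return Name + ".argprom";
  // Dead Argument Elimination: ".argelim" if any argument was removed,
  // otherwise ".retelim".
  return Name + (NumArgumentsEliminated ? ".argelim" : ".retelim");
}
```

Under this model, `suffixedName("b", TransformKind::ArgumentPromotion)` gives `b.argprom`, and a function that is promoted again picks up a second suffix, matching the `callee_4xi32.argprom.argprom` expectation in the AArch64 test.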

@llvmbot
Collaborator

llvmbot commented Sep 25, 2024

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-lto
@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-function-specialization

Author: None (yonghong-song)

Changes

The goal is to add a suffix to the names of functions rewritten by the Argument Promotion and Dead Argument Elimination passes, so that users know the function signature has changed. One motivation is to help kernel tracing with BPF.
The previous patch is [1]; it was reverted due to some test failures. This patch fixes a test failure on top of [1].

For a detailed description of the patch, see [1].

[1] #105742


Patch is 123.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/109899.diff

80 Files Affected:

  • (modified) compiler-rt/test/cfi/stats.cpp (+2-2)
  • (modified) llvm/lib/Transforms/IPO/ArgumentPromotion.cpp (+1)
  • (modified) llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp (+4)
  • (modified) llvm/test/Analysis/LazyCallGraph/remove-dead-function-spurious-ref-edge.ll (+2-2)
  • (modified) llvm/test/BugPoint/remove_arguments_test.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/arg_promotion.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/internalize.ll (+1-1)
  • (modified) llvm/test/ThinLTO/X86/memprof-aliased-location1.ll (+12-12)
  • (modified) llvm/test/ThinLTO/X86/memprof-aliased-location2.ll (+12-12)
  • (modified) llvm/test/ThinLTO/X86/memprof-basic.ll (+18-1)
  • (modified) llvm/test/ThinLTO/X86/memprof-duplicate-context-ids.ll (+13-1)
  • (modified) llvm/test/ThinLTO/X86/memprof-funcassigncloning.ll (+18-1)
  • (modified) llvm/test/ThinLTO/X86/memprof-indirectcall.ll (+14-1)
  • (modified) llvm/test/ThinLTO/X86/memprof-inlined.ll (+14-1)
  • (modified) llvm/test/Transforms/ArgumentPromotion/2008-02-01-ReturnAttrs.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/BPF/argpromotion.ll (+1-1)
  • (modified) llvm/test/Transforms/ArgumentPromotion/X86/attributes.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/X86/min-legal-vector-width.ll (+16-16)
  • (modified) llvm/test/Transforms/ArgumentPromotion/X86/thiscall.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/actual-arguments.ll (+5-5)
  • (modified) llvm/test/Transforms/ArgumentPromotion/aggregate-promote-dead-gep.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/aggregate-promote.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/align.ll (+8-8)
  • (modified) llvm/test/Transforms/ArgumentPromotion/allocsize.ll (+8-8)
  • (modified) llvm/test/Transforms/ArgumentPromotion/attrs.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/basictest.ll (+4-4)
  • (modified) llvm/test/Transforms/ArgumentPromotion/bitcasts.ll (+4-4)
  • (modified) llvm/test/Transforms/ArgumentPromotion/byval-2.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/byval-with-padding.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/byval.ll (+10-10)
  • (modified) llvm/test/Transforms/ArgumentPromotion/chained.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/control-flow2.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/crash.ll (+1-1)
  • (modified) llvm/test/Transforms/ArgumentPromotion/dbg.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/fp80.ll (+6-6)
  • (modified) llvm/test/Transforms/ArgumentPromotion/inalloca.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/invalidation.ll (+3-3)
  • (modified) llvm/test/Transforms/ArgumentPromotion/load-alignment-value-overflows-addrspace-size.ll (+4-4)
  • (modified) llvm/test/Transforms/ArgumentPromotion/max-elements-limit.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/metadata.ll (+4-4)
  • (modified) llvm/test/Transforms/ArgumentPromotion/min-legal-vector-width.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/nonzero-address-spaces.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/opaque-ptr.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/pr27568.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/pr32917.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/pr33641_remove_arg_dbgvalue.ll (+1-1)
  • (modified) llvm/test/Transforms/ArgumentPromotion/profile.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/propagate-remove-dead-args.ll (+9-9)
  • (modified) llvm/test/Transforms/ArgumentPromotion/recursion/aggregate-promote-recursive.ll (+3-3)
  • (modified) llvm/test/Transforms/ArgumentPromotion/recursion/argpromotion-recursion-pr1259.ll (+4-4)
  • (modified) llvm/test/Transforms/ArgumentPromotion/recursion/recursion-mixed-calls.ll (+6-6)
  • (modified) llvm/test/Transforms/ArgumentPromotion/recursion/recursion-non-zero-offset.ll (+4-4)
  • (modified) llvm/test/Transforms/ArgumentPromotion/reserve-tbaa.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/sret.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/store-into-inself.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/unused-argument.ll (+4-4)
  • (modified) llvm/test/Transforms/Attributor/reduced/clear_cached_analysis_for_deleted_functions.ll (+2-2)
  • (modified) llvm/test/Transforms/DeadArgElim/2007-02-07-FuncRename.ll (+1-1)
  • (modified) llvm/test/Transforms/DeadArgElim/2007-12-20-ParamAttrs.ll (+2-2)
  • (modified) llvm/test/Transforms/DeadArgElim/2010-04-30-DbgInfo.ll (+2-2)
  • (modified) llvm/test/Transforms/DeadArgElim/aggregates.ll (+5-5)
  • (modified) llvm/test/Transforms/DeadArgElim/call_profile.ll (+2-2)
  • (modified) llvm/test/Transforms/DeadArgElim/comdat.ll (+1-1)
  • (modified) llvm/test/Transforms/DeadArgElim/dbginfo-update-dbgval-local.ll (+3-3)
  • (modified) llvm/test/Transforms/DeadArgElim/dbginfo.ll (+1-1)
  • (modified) llvm/test/Transforms/DeadArgElim/deadretval.ll (+2-2)
  • (modified) llvm/test/Transforms/DeadArgElim/fct_ptr.ll (+1-1)
  • (modified) llvm/test/Transforms/DeadArgElim/func_metadata.ll (+2-2)
  • (modified) llvm/test/Transforms/DeadArgElim/funclet.ll (+1-1)
  • (modified) llvm/test/Transforms/DeadArgElim/keepalive.ll (+2-2)
  • (modified) llvm/test/Transforms/DeadArgElim/nonzero-address-spaces.ll (+2-2)
  • (modified) llvm/test/Transforms/DeadArgElim/returned.ll (+5-5)
  • (modified) llvm/test/Transforms/DeadArgElim/variadic_safety.ll (+1-1)
  • (modified) llvm/test/Transforms/FunctionSpecialization/function-specialization2.ll (+6-6)
  • (modified) llvm/test/Transforms/FunctionSpecialization/global-var-constants.ll (+7-7)
  • (modified) llvm/test/Transforms/FunctionSpecialization/non-argument-tracked.ll (+12-12)
  • (modified) llvm/test/Transforms/FunctionSpecialization/specialization-order.ll (+6-6)
  • (modified) llvm/test/Transforms/PhaseOrdering/dae-dce.ll (+4-2)
  • (modified) llvm/test/Transforms/PhaseOrdering/dce-after-argument-promotion.ll (+2-2)
  • (modified) llvm/test/Transforms/SCCP/recursion.ll (+3-3)
diff --git a/compiler-rt/test/cfi/stats.cpp b/compiler-rt/test/cfi/stats.cpp
index ca6b3bf0df4814..9c4900e86129aa 100644
--- a/compiler-rt/test/cfi/stats.cpp
+++ b/compiler-rt/test/cfi/stats.cpp
@@ -26,12 +26,12 @@ extern "C" __attribute__((noinline)) void nvcall(A *a) {
 }
 
 extern "C" __attribute__((noinline)) A *dcast(A *a) {
-  // CHECK: stats.cpp:[[@LINE+1]] {{_?}}dcast cfi-derived-cast 24
+  // CHECK: stats.cpp:[[@LINE+1]] {{_?}}dcast.retelim cfi-derived-cast 24
   return (A *)(ABase *)a;
 }
 
 extern "C" __attribute__((noinline)) A *ucast(A *a) {
-  // CHECK: stats.cpp:[[@LINE+1]] {{_?}}ucast cfi-unrelated-cast 81
+  // CHECK: stats.cpp:[[@LINE+1]] {{_?}}ucast.retelim cfi-unrelated-cast 81
   return (A *)(char *)a;
 }
 
diff --git a/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp b/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
index 1f9b546ed29996..c8b75dd475ae44 100644
--- a/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
+++ b/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
@@ -215,6 +215,7 @@ doPromotion(Function *F, FunctionAnalysisManager &FAM,
 
   F->getParent()->getFunctionList().insert(F->getIterator(), NF);
   NF->takeName(F);
+  NF->setName(NF->getName() + ".argprom");
 
   // Loop over all the callers of the function, transforming the call sites to
   // pass in the loaded pointers.
diff --git a/llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp b/llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp
index d1548592b1ce26..b912cc66d19db5 100644
--- a/llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp
+++ b/llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp
@@ -889,6 +889,10 @@ bool DeadArgumentEliminationPass::removeDeadStuffFromFunction(Function *F) {
   // it again.
   F->getParent()->getFunctionList().insert(F->getIterator(), NF);
   NF->takeName(F);
+  if (NumArgumentsEliminated)
+    NF->setName(NF->getName() + ".argelim");
+  else
+    NF->setName(NF->getName() + ".retelim");
   NF->IsNewDbgInfoFormat = F->IsNewDbgInfoFormat;
 
   // Loop over all the callers of the function, transforming the call sites to
diff --git a/llvm/test/Analysis/LazyCallGraph/remove-dead-function-spurious-ref-edge.ll b/llvm/test/Analysis/LazyCallGraph/remove-dead-function-spurious-ref-edge.ll
index 2bc486f541c71f..4f16c02b1473ff 100644
--- a/llvm/test/Analysis/LazyCallGraph/remove-dead-function-spurious-ref-edge.ll
+++ b/llvm/test/Analysis/LazyCallGraph/remove-dead-function-spurious-ref-edge.ll
@@ -9,7 +9,7 @@ define internal void @a() alwaysinline {
 }
 
 define internal void @b(ptr) noinline {
-; CHECK-LABEL: @b(
+; CHECK-LABEL: @b.argprom(
 ; CHECK-NEXT:    ret void
 ;
   ret void
@@ -17,7 +17,7 @@ define internal void @b(ptr) noinline {
 
 define internal void @c() noinline {
 ; CHECK-LABEL: @c(
-; CHECK-NEXT:    call void @b()
+; CHECK-NEXT:    call void @b.argprom()
 ; CHECK-NEXT:    ret void
 ;
   call void @b(ptr @a)
diff --git a/llvm/test/BugPoint/remove_arguments_test.ll b/llvm/test/BugPoint/remove_arguments_test.ll
index 9e9c51eaafc383..bb93e45e4b46ef 100644
--- a/llvm/test/BugPoint/remove_arguments_test.ll
+++ b/llvm/test/BugPoint/remove_arguments_test.ll
@@ -11,7 +11,7 @@
 
 declare i32 @test2()
 
-; CHECK: define void @test() {
+; CHECK: define void @test.argelim() {
 define i32 @test(i32 %A, ptr %B, float %C) {
 	call i32 @test2()
 	ret i32 %1
diff --git a/llvm/test/CodeGen/AArch64/arg_promotion.ll b/llvm/test/CodeGen/AArch64/arg_promotion.ll
index cc37d230c6cbe4..724a7f109f1e29 100644
--- a/llvm/test/CodeGen/AArch64/arg_promotion.ll
+++ b/llvm/test/CodeGen/AArch64/arg_promotion.ll
@@ -38,16 +38,16 @@ define dso_local void @caller_4xi32(ptr noalias %src, ptr noalias %dst) #1 {
 ; CHECK-LABEL: define dso_local void @caller_4xi32(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[SRC_VAL:%.*]] = load <4 x i32>, ptr [[SRC:%.*]], align 16
-; CHECK-NEXT:    call fastcc void @callee_4xi32(<4 x i32> [[SRC_VAL]], ptr noalias [[DST:%.*]])
+; CHECK-NEXT:    call fastcc void @callee_4xi32.argprom.argprom(<4 x i32> [[SRC_VAL]], ptr noalias [[DST:%.*]])
 ; CHECK-NEXT:    ret void
 ;
 entry:
-  call fastcc void @callee_4xi32(ptr noalias %src, ptr noalias %dst)
+  call fastcc void @callee_4xi32.argprom(ptr noalias %src, ptr noalias %dst)
   ret void
 }
 
-define internal fastcc void @callee_4xi32(ptr noalias %src, ptr noalias %dst) #1 {
-; CHECK-LABEL: define internal fastcc void @callee_4xi32(
+define internal fastcc void @callee_4xi32.argprom(ptr noalias %src, ptr noalias %dst) #1 {
+; CHECK-LABEL: define internal fastcc void @callee_4xi32.argprom.argprom(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    store <4 x i32> [[SRC_0_VAL:%.*]], ptr [[DST:%.*]], align 16
 ; CHECK-NEXT:    ret void
@@ -65,7 +65,7 @@ define dso_local void @caller_i256(ptr noalias %src, ptr noalias %dst) #0 {
 ; CHECK-LABEL: define dso_local void @caller_i256(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[SRC_VAL:%.*]] = load i256, ptr [[SRC:%.*]], align 16
-; CHECK-NEXT:    call fastcc void @callee_i256(i256 [[SRC_VAL]], ptr noalias [[DST:%.*]])
+; CHECK-NEXT:    call fastcc void @callee_i256.argprom(i256 [[SRC_VAL]], ptr noalias [[DST:%.*]])
 ; CHECK-NEXT:    ret void
 ;
 entry:
@@ -74,7 +74,7 @@ entry:
 }
 
 define internal fastcc void @callee_i256(ptr noalias %src, ptr noalias %dst) #0 {
-; CHECK-LABEL: define internal fastcc void @callee_i256(
+; CHECK-LABEL: define internal fastcc void @callee_i256.argprom(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    store i256 [[SRC_0_VAL:%.*]], ptr [[DST:%.*]], align 16
 ; CHECK-NEXT:    ret void
@@ -159,7 +159,7 @@ define dso_local void @caller_struct4xi32(ptr noalias %src, ptr noalias %dst) #1
 ; CHECK-NEXT:    [[SRC_VAL:%.*]] = load <4 x i32>, ptr [[SRC:%.*]], align 16
 ; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr i8, ptr [[SRC]], i64 16
 ; CHECK-NEXT:    [[SRC_VAL1:%.*]] = load <4 x i32>, ptr [[TMP0]], align 16
-; CHECK-NEXT:    call fastcc void @callee_struct4xi32(<4 x i32> [[SRC_VAL]], <4 x i32> [[SRC_VAL1]], ptr noalias [[DST:%.*]])
+; CHECK-NEXT:    call fastcc void @callee_struct4xi32.argprom(<4 x i32> [[SRC_VAL]], <4 x i32> [[SRC_VAL1]], ptr noalias [[DST:%.*]])
 ; CHECK-NEXT:    ret void
 ;
 entry:
@@ -168,7 +168,7 @@ entry:
 }
 
 define internal fastcc void @callee_struct4xi32(ptr noalias %src, ptr noalias %dst) #1 {
-; CHECK-LABEL: define internal fastcc void @callee_struct4xi32(
+; CHECK-LABEL: define internal fastcc void @callee_struct4xi32.argprom(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    store <4 x i32> [[SRC_0_VAL:%.*]], ptr [[DST:%.*]], align 16
 ; CHECK-NEXT:    [[DST2:%.*]] = getelementptr inbounds [[STRUCT_4XI32:%.*]], ptr [[DST]], i64 0, i32 1
diff --git a/llvm/test/CodeGen/AMDGPU/internalize.ll b/llvm/test/CodeGen/AMDGPU/internalize.ll
index 6b2a4d5fc328b4..08b42f93bf5f47 100644
--- a/llvm/test/CodeGen/AMDGPU/internalize.ll
+++ b/llvm/test/CodeGen/AMDGPU/internalize.ll
@@ -10,7 +10,7 @@
 ; ALL: gvar_used
 @gvar_used = addrspace(1) global i32 undef, align 4
 
-; OPT: define internal fastcc void @func_used_noinline(
+; OPT: define internal fastcc void @func_used_noinline.argelim(
 ; OPT-NONE: define fastcc void @func_used_noinline(
 define fastcc void @func_used_noinline(ptr addrspace(1) %out, i32 %tid) #1 {
 entry:
diff --git a/llvm/test/ThinLTO/X86/memprof-aliased-location1.ll b/llvm/test/ThinLTO/X86/memprof-aliased-location1.ll
index 42819d5421ca0f..8be9727b316d28 100644
--- a/llvm/test/ThinLTO/X86/memprof-aliased-location1.ll
+++ b/llvm/test/ThinLTO/X86/memprof-aliased-location1.ll
@@ -84,22 +84,22 @@ attributes #0 = { noinline optnone }
 ;; The first call to foo does not allocate cold memory. It should call the
 ;; original functions, which ultimately call the original allocation decorated
 ;; with a "notcold" attribute.
-; IR:   call {{.*}} @_Z3foov()
+; IR:   call {{.*}} @_Z3foov.retelim()
 ;; The second call to foo allocates cold memory. It should call cloned functions
 ;; which ultimately call a cloned allocation decorated with a "cold" attribute.
-; IR:   call {{.*}} @_Z3foov.memprof.1()
-; IR: define internal {{.*}} @_Z3barv()
+; IR:   call {{.*}} @_Z3foov.memprof.1.retelim()
+; IR: define internal {{.*}} @_Z3barv.retelim()
 ; IR:   call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
-; IR: define internal {{.*}} @_Z3bazv()
-; IR:   call {{.*}} @_Z3barv()
-; IR: define internal {{.*}} @_Z3foov()
-; IR:   call {{.*}} @_Z3bazv()
-; IR: define internal {{.*}} @_Z3barv.memprof.1()
+; IR: define internal {{.*}} @_Z3bazv.retelim()
+; IR:   call {{.*}} @_Z3barv.retelim()
+; IR: define internal {{.*}} @_Z3foov.retelim()
+; IR:   call {{.*}} @_Z3bazv.retelim()
+; IR: define internal {{.*}} @_Z3barv.memprof.1.retelim()
 ; IR:   call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
-; IR: define internal {{.*}} @_Z3bazv.memprof.1()
-; IR:   call {{.*}} @_Z3barv.memprof.1()
-; IR: define internal {{.*}} @_Z3foov.memprof.1()
-; IR:   call {{.*}} @_Z3bazv.memprof.1()
+; IR: define internal {{.*}} @_Z3bazv.memprof.1.retelim()
+; IR:   call {{.*}} @_Z3barv.memprof.1.retelim()
+; IR: define internal {{.*}} @_Z3foov.memprof.1.retelim()
+; IR:   call {{.*}} @_Z3bazv.memprof.1.retelim()
 ; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
 ; IR: attributes #[[COLD]] = { "memprof"="cold" }
 
diff --git a/llvm/test/ThinLTO/X86/memprof-aliased-location2.ll b/llvm/test/ThinLTO/X86/memprof-aliased-location2.ll
index 663f8525043c2f..4c18cf8226c8bb 100644
--- a/llvm/test/ThinLTO/X86/memprof-aliased-location2.ll
+++ b/llvm/test/ThinLTO/X86/memprof-aliased-location2.ll
@@ -84,22 +84,22 @@ attributes #0 = { noinline optnone }
 ;; The first call to foo does not allocate cold memory. It should call the
 ;; original functions, which ultimately call the original allocation decorated
 ;; with a "notcold" attribute.
-; IR:   call {{.*}} @_Z3foov()
+; IR:   call {{.*}} @_Z3foov.retelim()
 ;; The second call to foo allocates cold memory. It should call cloned functions
 ;; which ultimately call a cloned allocation decorated with a "cold" attribute.
-; IR:   call {{.*}} @_Z3foov.memprof.1()
-; IR: define internal {{.*}} @_Z3barv()
+; IR:   call {{.*}} @_Z3foov.memprof.1.retelim()
+; IR: define internal {{.*}} @_Z3barv.retelim()
 ; IR:   call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
-; IR: define internal {{.*}} @_Z3bazv()
-; IR:   call {{.*}} @_Z3barv()
-; IR: define internal {{.*}} @_Z3foov()
-; IR:   call {{.*}} @_Z3bazv()
-; IR: define internal {{.*}} @_Z3barv.memprof.1()
+; IR: define internal {{.*}} @_Z3bazv.retelim()
+; IR:   call {{.*}} @_Z3barv.retelim()
+; IR: define internal {{.*}} @_Z3foov.retelim()
+; IR:   call {{.*}} @_Z3bazv.retelim()
+; IR: define internal {{.*}} @_Z3barv.memprof.1.retelim()
 ; IR:   call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
-; IR: define internal {{.*}} @_Z3bazv.memprof.1()
-; IR:   call {{.*}} @_Z3barv.memprof.1()
-; IR: define internal {{.*}} @_Z3foov.memprof.1()
-; IR:   call {{.*}} @_Z3bazv.memprof.1()
+; IR: define internal {{.*}} @_Z3bazv.memprof.1.retelim()
+; IR:   call {{.*}} @_Z3barv.memprof.1.retelim()
+; IR: define internal {{.*}} @_Z3foov.memprof.1.retelim()
+; IR:   call {{.*}} @_Z3bazv.memprof.1.retelim()
 ; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
 ; IR: attributes #[[COLD]] = { "memprof"="cold" }
 
diff --git a/llvm/test/ThinLTO/X86/memprof-basic.ll b/llvm/test/ThinLTO/X86/memprof-basic.ll
index 6922dbfd368467..b7aadf8e32a771 100644
--- a/llvm/test/ThinLTO/X86/memprof-basic.ll
+++ b/llvm/test/ThinLTO/X86/memprof-basic.ll
@@ -53,7 +53,7 @@
 ;; We should have cloned bar, baz, and foo, for the cold memory allocation.
 ; RUN:	cat %t.ccg.cloned.dot | FileCheck %s --check-prefix=DOTCLONED
 
-; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST
 
 
 ;; Try again but with distributed ThinLTO
@@ -303,6 +303,23 @@ attributes #0 = { noinline optnone }
 ; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
 ; IR: attributes #[[COLD]] = { "memprof"="cold" }
 
+; IRNODIST: define {{.*}} @main
+; IRNODIST:   call {{.*}} @_Z3foov.retelim()
+; IRNODIST:   call {{.*}} @_Z3foov.memprof.1.retelim()
+; IRNODIST: define internal {{.*}} @_Z3barv.retelim()
+; IRNODIST:   call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z3bazv.retelim()
+; IRNODIST:   call {{.*}} @_Z3barv.retelim()
+; IRNODIST: define internal {{.*}} @_Z3foov.retelim()
+; IRNODIST:   call {{.*}} @_Z3bazv.retelim()
+; IRNODIST: define internal {{.*}} @_Z3barv.memprof.1.retelim()
+; IRNODIST:   call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z3bazv.memprof.1.retelim()
+; IRNODIST:   call {{.*}} @_Z3barv.memprof.1.retelim()
+; IRNODIST: define internal {{.*}} @_Z3foov.memprof.1.retelim()
+; IRNODIST:   call {{.*}} @_Z3bazv.memprof.1.retelim()
+; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
+; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }
 
 ; STATS: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
 ; STATS-BE: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
diff --git a/llvm/test/ThinLTO/X86/memprof-duplicate-context-ids.ll b/llvm/test/ThinLTO/X86/memprof-duplicate-context-ids.ll
index 65d794e9cba87c..bfc7b02a956c6f 100644
--- a/llvm/test/ThinLTO/X86/memprof-duplicate-context-ids.ll
+++ b/llvm/test/ThinLTO/X86/memprof-duplicate-context-ids.ll
@@ -68,7 +68,7 @@
 ; RUN:  -o %t.out 2>&1 | FileCheck %s --check-prefix=DUMP \
 ; RUN:  --check-prefix=STATS --check-prefix=STATS-BE --check-prefix=REMARKS
 
-; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST
 
 
 ;; Try again but with distributed ThinLTO
@@ -247,6 +247,18 @@ attributes #0 = { noinline optnone}
 ; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
 ; IR: attributes #[[COLD]] = { "memprof"="cold" }
 
+; IRNODIST: define internal {{.*}} @_Z1Dv.retelim()
+; IRNODIST:   call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z1Fv.retelim()
+; IRNODIST:   call {{.*}} @_Z1Dv.retelim()
+; IRNODIST: define internal {{.*}} @_Z1Bv.retelim()
+; IRNODIST:   call {{.*}} @_Z1Dv.memprof.1.retelim()
+; IRNODIST: define internal {{.*}} @_Z1Ev.retelim()
+; IRNODIST:   call {{.*}} @_Z1Dv.memprof.1.retelim()
+; IRNODIST: define internal {{.*}} @_Z1Dv.memprof.1.retelim()
+; IRNODIST:   call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
+; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
+; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }
 
 ; STATS: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
 ; STATS-BE: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
diff --git a/llvm/test/ThinLTO/X86/memprof-funcassigncloning.ll b/llvm/test/ThinLTO/X86/memprof-funcassigncloning.ll
index f1a494d077fefc..4153524bf44706 100644
--- a/llvm/test/ThinLTO/X86/memprof-funcassigncloning.ll
+++ b/llvm/test/ThinLTO/X86/memprof-funcassigncloning.ll
@@ -61,7 +61,7 @@
 ; RUN:  -o %t.out 2>&1 | FileCheck %s --check-prefix=DUMP \
 ; RUN:  --check-prefix=STATS --check-prefix=STATS-BE --check-prefix=REMARKS
 
-; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST
 
 
 ;; Try again but with distributed ThinLTO
@@ -283,6 +283,23 @@ attributes #0 = { noinline optnone }
 ; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
 ; IR: attributes #[[COLD]] = { "memprof"="cold" }
 
+; IRNODIST: define internal {{.*}} @_Z1EPPcS0_.argelim(
+; IRNODIST:   call {{.*}} @_Znam(i64 noundef 10) #[[NOTCOLD:[0-9]+]]
+; IRNODIST:   call {{.*}} @_Znam(i64 noundef 10) #[[NOTCOLD]]
+; IRNODIST: define internal {{.*}} @_Z1BPPcS0_(
+; IRNODIST:   call {{.*}} @_Z1EPPcS0_.argelim(
+; IRNODIST: define internal {{.*}} @_Z1CPPcS0_(
+; IRNODIST:   call {{.*}} @_Z1EPPcS0_.memprof.3.argelim(
+; IRNODIST: define internal {{.*}} @_Z1DPPcS0_(
+; IRNODIST:   call {{.*}} @_Z1EPPcS0_.memprof.2.argelim(
+; IRNODIST: define internal {{.*}} @_Z1EPPcS0_.memprof.2.argelim(
+; IRNODIST:   call {{.*}} @_Znam(i64 noundef 10) #[[COLD:[0-9]+]]
+; IRNODIST:   call {{.*}} @_Znam(i64 noundef 10) #[[NOTCOLD]]
+; IRNODIST: define internal {{.*}} @_Z1EPPcS0_.memprof.3.argelim(
+; IRNODIST:   call {{.*}} @_Znam(i64 noundef 10) #[[NOTCOLD]]
+; IRNODIST:   call {{.*}} @_Znam(i64 noundef 10) #[[COLD]]
+; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
+; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }
 
 ; STATS: 2 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
 ; STATS-BE: 2 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
diff --git a/llvm/test/ThinLTO/X86/memprof-indirectcall.ll b/llvm/test/ThinLTO/X86/memprof-indirectcall.ll
index 07a52f441ca278..ba8811b46175e3 100644
--- a/llvm/test/ThinLTO/X86/memprof-indirectcall.ll
+++ b/llvm/test/ThinLTO/X86/memprof-indirectcall.ll
@@ -74,7 +74,7 @@
 ;; from main allocating cold memory.
 ; RUN:  cat %t.ccg.cloned.dot | FileCheck %s --check-prefix=DOTCLONED
 
-; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST
 
 
 ;; Try again but with distributed ThinLTO
@@ -419,6 +419,19 @@ attributes #0 = { noinline optnone }
 ; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
 ; IR: attributes #[[COLD]] = { "memprof"="cold" }
 
+; IRNODIST: define {{.*}} @main(
+; IRNODIST:   call {{.*}} @_Z3foov.argelim()
+; IRNODIST:   call {{.*}} @_Z3foov.memprof.1.argelim()
+; IRNODIST:   call {{.*}} @_Z3barP1A.argelim(
+; IRNODIST:   call {{.*}} @_Z3barP1A.argelim(
+; IRNODIST:   call {{.*}} @_Z3barP1A.argelim(
+; IRNODIST:   call {{.*}} @_Z3barP1A.argelim(
+; IRNODIST: define internal {{.*}} @_Z3foov.argelim()
+; IRNODIST:   call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z3foov.memprof.1.argelim()
+; IRNODIST:   call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
+; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
+; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }
 
 ; STATS: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
 ; STATS-BE: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
diff --git a/llvm/test/ThinLTO/X86/memprof-inlined.ll b/llvm/test/ThinLTO/X86/memprof-inlined.ll
index 89df345b220423..7111a536a3110a 100644
--- a/llvm/test/ThinLTO/X86/memprof-inlined.ll
+++ b/llvm/test/ThinLTO/X86/memprof-inlined.ll
@@ -63,7 +63,7 @@
 ;; cold memory.
 ; RUN:	cat %t.ccg.cloned.dot | FileCheck %s --check-prefix=DOTCLONED
 
-; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST
 
 
 ;; Try again but with distributed ThinLTO
@@ -323,6 +323,19 @@ attributes #0 = { noinline optnone }
 ; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
 ; IR: attributes #[[COLD]] = { "memprof"="cold" }
 
+; IRNODIST: define internal {{.*}} @_Z3barv.retelim()
+; IRNODIST:   call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z3foov.retelim()
+; IRNODIST:   call {{.*}} @_Z3barv.retelim()
+; IRNODIST: define {{.*}} @main()
+; IRNODIST:   call {{.*}} @_Z3foov.retelim()
+; IRNODIST:   call {{.*}} @_Z3foov.memprof.1.retelim()
+; IRNODIST: define internal {{.*}} @_Z3barv.memprof.1.retelim()
+; IRNODIST:   call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z3foov.memprof.1.retelim()
+; IRNODIST:   call {{.*}} @_Z3barv.memprof.1.retelim()
+; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
+; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }
 
 ; STATS: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
 ; STATS-BE: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
diff --git a/llvm/test/Transforms/ArgumentPromotion/2008-02-01-ReturnAttrs.ll b/llvm/test/Transforms/ArgumentPromotion/2008-02-01-ReturnAttrs.ll
index daa4e1fb757d21..51839033177034 100644
--- a/llvm/test/Transforms/ArgumentPromotion/2008-02-01-ReturnAttrs.ll
+++ b/llvm/test/Transforms/ArgumentPromotion/2008-02-01-ReturnAttrs.ll
@@ -3,7 +3,7 @@
 ; RUN: cat %t | FileChe...
[truncated]

@llvmbot
Collaborator

llvmbot commented Sep 25, 2024

@llvm/pr-subscribers-llvm-analysis

Author: None (yonghong-song)

Changes

The goal is to add a suffix to the names of functions rewritten by the Argument Promotion and Dead Argument Elimination passes, so that users know the function signature has changed. One motivation is to help kernel tracing with BPF.
The previous patch is [1]; it was reverted due to some test failures. This patch fixes a test failure on top of [1].

For a detailed description of the patch, see [1].

[1] #105742


Patch is 123.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/109899.diff

80 Files Affected:

  • (modified) compiler-rt/test/cfi/stats.cpp (+2-2)
  • (modified) llvm/lib/Transforms/IPO/ArgumentPromotion.cpp (+1)
  • (modified) llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp (+4)
  • (modified) llvm/test/Analysis/LazyCallGraph/remove-dead-function-spurious-ref-edge.ll (+2-2)
  • (modified) llvm/test/BugPoint/remove_arguments_test.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/arg_promotion.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/internalize.ll (+1-1)
  • (modified) llvm/test/ThinLTO/X86/memprof-aliased-location1.ll (+12-12)
  • (modified) llvm/test/ThinLTO/X86/memprof-aliased-location2.ll (+12-12)
  • (modified) llvm/test/ThinLTO/X86/memprof-basic.ll (+18-1)
  • (modified) llvm/test/ThinLTO/X86/memprof-duplicate-context-ids.ll (+13-1)
  • (modified) llvm/test/ThinLTO/X86/memprof-funcassigncloning.ll (+18-1)
  • (modified) llvm/test/ThinLTO/X86/memprof-indirectcall.ll (+14-1)
  • (modified) llvm/test/ThinLTO/X86/memprof-inlined.ll (+14-1)
  • (modified) llvm/test/Transforms/ArgumentPromotion/2008-02-01-ReturnAttrs.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/BPF/argpromotion.ll (+1-1)
  • (modified) llvm/test/Transforms/ArgumentPromotion/X86/attributes.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/X86/min-legal-vector-width.ll (+16-16)
  • (modified) llvm/test/Transforms/ArgumentPromotion/X86/thiscall.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/actual-arguments.ll (+5-5)
  • (modified) llvm/test/Transforms/ArgumentPromotion/aggregate-promote-dead-gep.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/aggregate-promote.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/align.ll (+8-8)
  • (modified) llvm/test/Transforms/ArgumentPromotion/allocsize.ll (+8-8)
  • (modified) llvm/test/Transforms/ArgumentPromotion/attrs.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/basictest.ll (+4-4)
  • (modified) llvm/test/Transforms/ArgumentPromotion/bitcasts.ll (+4-4)
  • (modified) llvm/test/Transforms/ArgumentPromotion/byval-2.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/byval-with-padding.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/byval.ll (+10-10)
  • (modified) llvm/test/Transforms/ArgumentPromotion/chained.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/control-flow2.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/crash.ll (+1-1)
  • (modified) llvm/test/Transforms/ArgumentPromotion/dbg.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/fp80.ll (+6-6)
  • (modified) llvm/test/Transforms/ArgumentPromotion/inalloca.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/invalidation.ll (+3-3)
  • (modified) llvm/test/Transforms/ArgumentPromotion/load-alignment-value-overflows-addrspace-size.ll (+4-4)
  • (modified) llvm/test/Transforms/ArgumentPromotion/max-elements-limit.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/metadata.ll (+4-4)
  • (modified) llvm/test/Transforms/ArgumentPromotion/min-legal-vector-width.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/nonzero-address-spaces.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/opaque-ptr.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/pr27568.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/pr32917.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/pr33641_remove_arg_dbgvalue.ll (+1-1)
  • (modified) llvm/test/Transforms/ArgumentPromotion/profile.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/propagate-remove-dead-args.ll (+9-9)
  • (modified) llvm/test/Transforms/ArgumentPromotion/recursion/aggregate-promote-recursive.ll (+3-3)
  • (modified) llvm/test/Transforms/ArgumentPromotion/recursion/argpromotion-recursion-pr1259.ll (+4-4)
  • (modified) llvm/test/Transforms/ArgumentPromotion/recursion/recursion-mixed-calls.ll (+6-6)
  • (modified) llvm/test/Transforms/ArgumentPromotion/recursion/recursion-non-zero-offset.ll (+4-4)
  • (modified) llvm/test/Transforms/ArgumentPromotion/reserve-tbaa.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/sret.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/store-into-inself.ll (+2-2)
  • (modified) llvm/test/Transforms/ArgumentPromotion/unused-argument.ll (+4-4)
  • (modified) llvm/test/Transforms/Attributor/reduced/clear_cached_analysis_for_deleted_functions.ll (+2-2)
  • (modified) llvm/test/Transforms/DeadArgElim/2007-02-07-FuncRename.ll (+1-1)
  • (modified) llvm/test/Transforms/DeadArgElim/2007-12-20-ParamAttrs.ll (+2-2)
  • (modified) llvm/test/Transforms/DeadArgElim/2010-04-30-DbgInfo.ll (+2-2)
  • (modified) llvm/test/Transforms/DeadArgElim/aggregates.ll (+5-5)
  • (modified) llvm/test/Transforms/DeadArgElim/call_profile.ll (+2-2)
  • (modified) llvm/test/Transforms/DeadArgElim/comdat.ll (+1-1)
  • (modified) llvm/test/Transforms/DeadArgElim/dbginfo-update-dbgval-local.ll (+3-3)
  • (modified) llvm/test/Transforms/DeadArgElim/dbginfo.ll (+1-1)
  • (modified) llvm/test/Transforms/DeadArgElim/deadretval.ll (+2-2)
  • (modified) llvm/test/Transforms/DeadArgElim/fct_ptr.ll (+1-1)
  • (modified) llvm/test/Transforms/DeadArgElim/func_metadata.ll (+2-2)
  • (modified) llvm/test/Transforms/DeadArgElim/funclet.ll (+1-1)
  • (modified) llvm/test/Transforms/DeadArgElim/keepalive.ll (+2-2)
  • (modified) llvm/test/Transforms/DeadArgElim/nonzero-address-spaces.ll (+2-2)
  • (modified) llvm/test/Transforms/DeadArgElim/returned.ll (+5-5)
  • (modified) llvm/test/Transforms/DeadArgElim/variadic_safety.ll (+1-1)
  • (modified) llvm/test/Transforms/FunctionSpecialization/function-specialization2.ll (+6-6)
  • (modified) llvm/test/Transforms/FunctionSpecialization/global-var-constants.ll (+7-7)
  • (modified) llvm/test/Transforms/FunctionSpecialization/non-argument-tracked.ll (+12-12)
  • (modified) llvm/test/Transforms/FunctionSpecialization/specialization-order.ll (+6-6)
  • (modified) llvm/test/Transforms/PhaseOrdering/dae-dce.ll (+4-2)
  • (modified) llvm/test/Transforms/PhaseOrdering/dce-after-argument-promotion.ll (+2-2)
  • (modified) llvm/test/Transforms/SCCP/recursion.ll (+3-3)
diff --git a/compiler-rt/test/cfi/stats.cpp b/compiler-rt/test/cfi/stats.cpp
index ca6b3bf0df4814..9c4900e86129aa 100644
--- a/compiler-rt/test/cfi/stats.cpp
+++ b/compiler-rt/test/cfi/stats.cpp
@@ -26,12 +26,12 @@ extern "C" __attribute__((noinline)) void nvcall(A *a) {
 }
 
 extern "C" __attribute__((noinline)) A *dcast(A *a) {
-  // CHECK: stats.cpp:[[@LINE+1]] {{_?}}dcast cfi-derived-cast 24
+  // CHECK: stats.cpp:[[@LINE+1]] {{_?}}dcast.retelim cfi-derived-cast 24
   return (A *)(ABase *)a;
 }
 
 extern "C" __attribute__((noinline)) A *ucast(A *a) {
-  // CHECK: stats.cpp:[[@LINE+1]] {{_?}}ucast cfi-unrelated-cast 81
+  // CHECK: stats.cpp:[[@LINE+1]] {{_?}}ucast.retelim cfi-unrelated-cast 81
   return (A *)(char *)a;
 }
 
diff --git a/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp b/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
index 1f9b546ed29996..c8b75dd475ae44 100644
--- a/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
+++ b/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
@@ -215,6 +215,7 @@ doPromotion(Function *F, FunctionAnalysisManager &FAM,
 
   F->getParent()->getFunctionList().insert(F->getIterator(), NF);
   NF->takeName(F);
+  NF->setName(NF->getName() + ".argprom");
 
   // Loop over all the callers of the function, transforming the call sites to
   // pass in the loaded pointers.
diff --git a/llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp b/llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp
index d1548592b1ce26..b912cc66d19db5 100644
--- a/llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp
+++ b/llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp
@@ -889,6 +889,10 @@ bool DeadArgumentEliminationPass::removeDeadStuffFromFunction(Function *F) {
   // it again.
   F->getParent()->getFunctionList().insert(F->getIterator(), NF);
   NF->takeName(F);
+  if (NumArgumentsEliminated)
+    NF->setName(NF->getName() + ".argelim");
+  else
+    NF->setName(NF->getName() + ".retelim");
   NF->IsNewDbgInfoFormat = F->IsNewDbgInfoFormat;
 
   // Loop over all the callers of the function, transforming the call sites to
diff --git a/llvm/test/Analysis/LazyCallGraph/remove-dead-function-spurious-ref-edge.ll b/llvm/test/Analysis/LazyCallGraph/remove-dead-function-spurious-ref-edge.ll
index 2bc486f541c71f..4f16c02b1473ff 100644
--- a/llvm/test/Analysis/LazyCallGraph/remove-dead-function-spurious-ref-edge.ll
+++ b/llvm/test/Analysis/LazyCallGraph/remove-dead-function-spurious-ref-edge.ll
@@ -9,7 +9,7 @@ define internal void @a() alwaysinline {
 }
 
 define internal void @b(ptr) noinline {
-; CHECK-LABEL: @b(
+; CHECK-LABEL: @b.argprom(
 ; CHECK-NEXT:    ret void
 ;
   ret void
@@ -17,7 +17,7 @@ define internal void @b(ptr) noinline {
 
 define internal void @c() noinline {
 ; CHECK-LABEL: @c(
-; CHECK-NEXT:    call void @b()
+; CHECK-NEXT:    call void @b.argprom()
 ; CHECK-NEXT:    ret void
 ;
   call void @b(ptr @a)
diff --git a/llvm/test/BugPoint/remove_arguments_test.ll b/llvm/test/BugPoint/remove_arguments_test.ll
index 9e9c51eaafc383..bb93e45e4b46ef 100644
--- a/llvm/test/BugPoint/remove_arguments_test.ll
+++ b/llvm/test/BugPoint/remove_arguments_test.ll
@@ -11,7 +11,7 @@
 
 declare i32 @test2()
 
-; CHECK: define void @test() {
+; CHECK: define void @test.argelim() {
 define i32 @test(i32 %A, ptr %B, float %C) {
 	call i32 @test2()
 	ret i32 %1
diff --git a/llvm/test/CodeGen/AArch64/arg_promotion.ll b/llvm/test/CodeGen/AArch64/arg_promotion.ll
index cc37d230c6cbe4..724a7f109f1e29 100644
--- a/llvm/test/CodeGen/AArch64/arg_promotion.ll
+++ b/llvm/test/CodeGen/AArch64/arg_promotion.ll
@@ -38,16 +38,16 @@ define dso_local void @caller_4xi32(ptr noalias %src, ptr noalias %dst) #1 {
 ; CHECK-LABEL: define dso_local void @caller_4xi32(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[SRC_VAL:%.*]] = load <4 x i32>, ptr [[SRC:%.*]], align 16
-; CHECK-NEXT:    call fastcc void @callee_4xi32(<4 x i32> [[SRC_VAL]], ptr noalias [[DST:%.*]])
+; CHECK-NEXT:    call fastcc void @callee_4xi32.argprom.argprom(<4 x i32> [[SRC_VAL]], ptr noalias [[DST:%.*]])
 ; CHECK-NEXT:    ret void
 ;
 entry:
-  call fastcc void @callee_4xi32(ptr noalias %src, ptr noalias %dst)
+  call fastcc void @callee_4xi32.argprom(ptr noalias %src, ptr noalias %dst)
   ret void
 }
 
-define internal fastcc void @callee_4xi32(ptr noalias %src, ptr noalias %dst) #1 {
-; CHECK-LABEL: define internal fastcc void @callee_4xi32(
+define internal fastcc void @callee_4xi32.argprom(ptr noalias %src, ptr noalias %dst) #1 {
+; CHECK-LABEL: define internal fastcc void @callee_4xi32.argprom.argprom(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    store <4 x i32> [[SRC_0_VAL:%.*]], ptr [[DST:%.*]], align 16
 ; CHECK-NEXT:    ret void
@@ -65,7 +65,7 @@ define dso_local void @caller_i256(ptr noalias %src, ptr noalias %dst) #0 {
 ; CHECK-LABEL: define dso_local void @caller_i256(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[SRC_VAL:%.*]] = load i256, ptr [[SRC:%.*]], align 16
-; CHECK-NEXT:    call fastcc void @callee_i256(i256 [[SRC_VAL]], ptr noalias [[DST:%.*]])
+; CHECK-NEXT:    call fastcc void @callee_i256.argprom(i256 [[SRC_VAL]], ptr noalias [[DST:%.*]])
 ; CHECK-NEXT:    ret void
 ;
 entry:
@@ -74,7 +74,7 @@ entry:
 }
 
 define internal fastcc void @callee_i256(ptr noalias %src, ptr noalias %dst) #0 {
-; CHECK-LABEL: define internal fastcc void @callee_i256(
+; CHECK-LABEL: define internal fastcc void @callee_i256.argprom(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    store i256 [[SRC_0_VAL:%.*]], ptr [[DST:%.*]], align 16
 ; CHECK-NEXT:    ret void
@@ -159,7 +159,7 @@ define dso_local void @caller_struct4xi32(ptr noalias %src, ptr noalias %dst) #1
 ; CHECK-NEXT:    [[SRC_VAL:%.*]] = load <4 x i32>, ptr [[SRC:%.*]], align 16
 ; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr i8, ptr [[SRC]], i64 16
 ; CHECK-NEXT:    [[SRC_VAL1:%.*]] = load <4 x i32>, ptr [[TMP0]], align 16
-; CHECK-NEXT:    call fastcc void @callee_struct4xi32(<4 x i32> [[SRC_VAL]], <4 x i32> [[SRC_VAL1]], ptr noalias [[DST:%.*]])
+; CHECK-NEXT:    call fastcc void @callee_struct4xi32.argprom(<4 x i32> [[SRC_VAL]], <4 x i32> [[SRC_VAL1]], ptr noalias [[DST:%.*]])
 ; CHECK-NEXT:    ret void
 ;
 entry:
@@ -168,7 +168,7 @@ entry:
 }
 
 define internal fastcc void @callee_struct4xi32(ptr noalias %src, ptr noalias %dst) #1 {
-; CHECK-LABEL: define internal fastcc void @callee_struct4xi32(
+; CHECK-LABEL: define internal fastcc void @callee_struct4xi32.argprom(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    store <4 x i32> [[SRC_0_VAL:%.*]], ptr [[DST:%.*]], align 16
 ; CHECK-NEXT:    [[DST2:%.*]] = getelementptr inbounds [[STRUCT_4XI32:%.*]], ptr [[DST]], i64 0, i32 1
diff --git a/llvm/test/CodeGen/AMDGPU/internalize.ll b/llvm/test/CodeGen/AMDGPU/internalize.ll
index 6b2a4d5fc328b4..08b42f93bf5f47 100644
--- a/llvm/test/CodeGen/AMDGPU/internalize.ll
+++ b/llvm/test/CodeGen/AMDGPU/internalize.ll
@@ -10,7 +10,7 @@
 ; ALL: gvar_used
 @gvar_used = addrspace(1) global i32 undef, align 4
 
-; OPT: define internal fastcc void @func_used_noinline(
+; OPT: define internal fastcc void @func_used_noinline.argelim(
 ; OPT-NONE: define fastcc void @func_used_noinline(
 define fastcc void @func_used_noinline(ptr addrspace(1) %out, i32 %tid) #1 {
 entry:
diff --git a/llvm/test/ThinLTO/X86/memprof-aliased-location1.ll b/llvm/test/ThinLTO/X86/memprof-aliased-location1.ll
index 42819d5421ca0f..8be9727b316d28 100644
--- a/llvm/test/ThinLTO/X86/memprof-aliased-location1.ll
+++ b/llvm/test/ThinLTO/X86/memprof-aliased-location1.ll
@@ -84,22 +84,22 @@ attributes #0 = { noinline optnone }
 ;; The first call to foo does not allocate cold memory. It should call the
 ;; original functions, which ultimately call the original allocation decorated
 ;; with a "notcold" attribute.
-; IR:   call {{.*}} @_Z3foov()
+; IR:   call {{.*}} @_Z3foov.retelim()
 ;; The second call to foo allocates cold memory. It should call cloned functions
 ;; which ultimately call a cloned allocation decorated with a "cold" attribute.
-; IR:   call {{.*}} @_Z3foov.memprof.1()
-; IR: define internal {{.*}} @_Z3barv()
+; IR:   call {{.*}} @_Z3foov.memprof.1.retelim()
+; IR: define internal {{.*}} @_Z3barv.retelim()
 ; IR:   call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
-; IR: define internal {{.*}} @_Z3bazv()
-; IR:   call {{.*}} @_Z3barv()
-; IR: define internal {{.*}} @_Z3foov()
-; IR:   call {{.*}} @_Z3bazv()
-; IR: define internal {{.*}} @_Z3barv.memprof.1()
+; IR: define internal {{.*}} @_Z3bazv.retelim()
+; IR:   call {{.*}} @_Z3barv.retelim()
+; IR: define internal {{.*}} @_Z3foov.retelim()
+; IR:   call {{.*}} @_Z3bazv.retelim()
+; IR: define internal {{.*}} @_Z3barv.memprof.1.retelim()
 ; IR:   call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
-; IR: define internal {{.*}} @_Z3bazv.memprof.1()
-; IR:   call {{.*}} @_Z3barv.memprof.1()
-; IR: define internal {{.*}} @_Z3foov.memprof.1()
-; IR:   call {{.*}} @_Z3bazv.memprof.1()
+; IR: define internal {{.*}} @_Z3bazv.memprof.1.retelim()
+; IR:   call {{.*}} @_Z3barv.memprof.1.retelim()
+; IR: define internal {{.*}} @_Z3foov.memprof.1.retelim()
+; IR:   call {{.*}} @_Z3bazv.memprof.1.retelim()
 ; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
 ; IR: attributes #[[COLD]] = { "memprof"="cold" }
 
diff --git a/llvm/test/ThinLTO/X86/memprof-aliased-location2.ll b/llvm/test/ThinLTO/X86/memprof-aliased-location2.ll
index 663f8525043c2f..4c18cf8226c8bb 100644
--- a/llvm/test/ThinLTO/X86/memprof-aliased-location2.ll
+++ b/llvm/test/ThinLTO/X86/memprof-aliased-location2.ll
@@ -84,22 +84,22 @@ attributes #0 = { noinline optnone }
 ;; The first call to foo does not allocate cold memory. It should call the
 ;; original functions, which ultimately call the original allocation decorated
 ;; with a "notcold" attribute.
-; IR:   call {{.*}} @_Z3foov()
+; IR:   call {{.*}} @_Z3foov.retelim()
 ;; The second call to foo allocates cold memory. It should call cloned functions
 ;; which ultimately call a cloned allocation decorated with a "cold" attribute.
-; IR:   call {{.*}} @_Z3foov.memprof.1()
-; IR: define internal {{.*}} @_Z3barv()
+; IR:   call {{.*}} @_Z3foov.memprof.1.retelim()
+; IR: define internal {{.*}} @_Z3barv.retelim()
 ; IR:   call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
-; IR: define internal {{.*}} @_Z3bazv()
-; IR:   call {{.*}} @_Z3barv()
-; IR: define internal {{.*}} @_Z3foov()
-; IR:   call {{.*}} @_Z3bazv()
-; IR: define internal {{.*}} @_Z3barv.memprof.1()
+; IR: define internal {{.*}} @_Z3bazv.retelim()
+; IR:   call {{.*}} @_Z3barv.retelim()
+; IR: define internal {{.*}} @_Z3foov.retelim()
+; IR:   call {{.*}} @_Z3bazv.retelim()
+; IR: define internal {{.*}} @_Z3barv.memprof.1.retelim()
 ; IR:   call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
-; IR: define internal {{.*}} @_Z3bazv.memprof.1()
-; IR:   call {{.*}} @_Z3barv.memprof.1()
-; IR: define internal {{.*}} @_Z3foov.memprof.1()
-; IR:   call {{.*}} @_Z3bazv.memprof.1()
+; IR: define internal {{.*}} @_Z3bazv.memprof.1.retelim()
+; IR:   call {{.*}} @_Z3barv.memprof.1.retelim()
+; IR: define internal {{.*}} @_Z3foov.memprof.1.retelim()
+; IR:   call {{.*}} @_Z3bazv.memprof.1.retelim()
 ; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
 ; IR: attributes #[[COLD]] = { "memprof"="cold" }
 
diff --git a/llvm/test/ThinLTO/X86/memprof-basic.ll b/llvm/test/ThinLTO/X86/memprof-basic.ll
index 6922dbfd368467..b7aadf8e32a771 100644
--- a/llvm/test/ThinLTO/X86/memprof-basic.ll
+++ b/llvm/test/ThinLTO/X86/memprof-basic.ll
@@ -53,7 +53,7 @@
 ;; We should have cloned bar, baz, and foo, for the cold memory allocation.
 ; RUN:	cat %t.ccg.cloned.dot | FileCheck %s --check-prefix=DOTCLONED
 
-; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST
 
 
 ;; Try again but with distributed ThinLTO
@@ -303,6 +303,23 @@ attributes #0 = { noinline optnone }
 ; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
 ; IR: attributes #[[COLD]] = { "memprof"="cold" }
 
+; IRNODIST: define {{.*}} @main
+; IRNODIST:   call {{.*}} @_Z3foov.retelim()
+; IRNODIST:   call {{.*}} @_Z3foov.memprof.1.retelim()
+; IRNODIST: define internal {{.*}} @_Z3barv.retelim()
+; IRNODIST:   call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z3bazv.retelim()
+; IRNODIST:   call {{.*}} @_Z3barv.retelim()
+; IRNODIST: define internal {{.*}} @_Z3foov.retelim()
+; IRNODIST:   call {{.*}} @_Z3bazv.retelim()
+; IRNODIST: define internal {{.*}} @_Z3barv.memprof.1.retelim()
+; IRNODIST:   call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z3bazv.memprof.1.retelim()
+; IRNODIST:   call {{.*}} @_Z3barv.memprof.1.retelim()
+; IRNODIST: define internal {{.*}} @_Z3foov.memprof.1.retelim()
+; IRNODIST:   call {{.*}} @_Z3bazv.memprof.1.retelim()
+; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
+; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }
 
 ; STATS: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
 ; STATS-BE: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
diff --git a/llvm/test/ThinLTO/X86/memprof-duplicate-context-ids.ll b/llvm/test/ThinLTO/X86/memprof-duplicate-context-ids.ll
index 65d794e9cba87c..bfc7b02a956c6f 100644
--- a/llvm/test/ThinLTO/X86/memprof-duplicate-context-ids.ll
+++ b/llvm/test/ThinLTO/X86/memprof-duplicate-context-ids.ll
@@ -68,7 +68,7 @@
 ; RUN:  -o %t.out 2>&1 | FileCheck %s --check-prefix=DUMP \
 ; RUN:  --check-prefix=STATS --check-prefix=STATS-BE --check-prefix=REMARKS
 
-; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST
 
 
 ;; Try again but with distributed ThinLTO
@@ -247,6 +247,18 @@ attributes #0 = { noinline optnone}
 ; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
 ; IR: attributes #[[COLD]] = { "memprof"="cold" }
 
+; IRNODIST: define internal {{.*}} @_Z1Dv.retelim()
+; IRNODIST:   call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z1Fv.retelim()
+; IRNODIST:   call {{.*}} @_Z1Dv.retelim()
+; IRNODIST: define internal {{.*}} @_Z1Bv.retelim()
+; IRNODIST:   call {{.*}} @_Z1Dv.memprof.1.retelim()
+; IRNODIST: define internal {{.*}} @_Z1Ev.retelim()
+; IRNODIST:   call {{.*}} @_Z1Dv.memprof.1.retelim()
+; IRNODIST: define internal {{.*}} @_Z1Dv.memprof.1.retelim()
+; IRNODIST:   call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
+; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
+; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }
 
 ; STATS: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
 ; STATS-BE: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
diff --git a/llvm/test/ThinLTO/X86/memprof-funcassigncloning.ll b/llvm/test/ThinLTO/X86/memprof-funcassigncloning.ll
index f1a494d077fefc..4153524bf44706 100644
--- a/llvm/test/ThinLTO/X86/memprof-funcassigncloning.ll
+++ b/llvm/test/ThinLTO/X86/memprof-funcassigncloning.ll
@@ -61,7 +61,7 @@
 ; RUN:  -o %t.out 2>&1 | FileCheck %s --check-prefix=DUMP \
 ; RUN:  --check-prefix=STATS --check-prefix=STATS-BE --check-prefix=REMARKS
 
-; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST
 
 
 ;; Try again but with distributed ThinLTO
@@ -283,6 +283,23 @@ attributes #0 = { noinline optnone }
 ; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
 ; IR: attributes #[[COLD]] = { "memprof"="cold" }
 
+; IRNODIST: define internal {{.*}} @_Z1EPPcS0_.argelim(
+; IRNODIST:   call {{.*}} @_Znam(i64 noundef 10) #[[NOTCOLD:[0-9]+]]
+; IRNODIST:   call {{.*}} @_Znam(i64 noundef 10) #[[NOTCOLD]]
+; IRNODIST: define internal {{.*}} @_Z1BPPcS0_(
+; IRNODIST:   call {{.*}} @_Z1EPPcS0_.argelim(
+; IRNODIST: define internal {{.*}} @_Z1CPPcS0_(
+; IRNODIST:   call {{.*}} @_Z1EPPcS0_.memprof.3.argelim(
+; IRNODIST: define internal {{.*}} @_Z1DPPcS0_(
+; IRNODIST:   call {{.*}} @_Z1EPPcS0_.memprof.2.argelim(
+; IRNODIST: define internal {{.*}} @_Z1EPPcS0_.memprof.2.argelim(
+; IRNODIST:   call {{.*}} @_Znam(i64 noundef 10) #[[COLD:[0-9]+]]
+; IRNODIST:   call {{.*}} @_Znam(i64 noundef 10) #[[NOTCOLD]]
+; IRNODIST: define internal {{.*}} @_Z1EPPcS0_.memprof.3.argelim(
+; IRNODIST:   call {{.*}} @_Znam(i64 noundef 10) #[[NOTCOLD]]
+; IRNODIST:   call {{.*}} @_Znam(i64 noundef 10) #[[COLD]]
+; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
+; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }
 
 ; STATS: 2 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
 ; STATS-BE: 2 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
diff --git a/llvm/test/ThinLTO/X86/memprof-indirectcall.ll b/llvm/test/ThinLTO/X86/memprof-indirectcall.ll
index 07a52f441ca278..ba8811b46175e3 100644
--- a/llvm/test/ThinLTO/X86/memprof-indirectcall.ll
+++ b/llvm/test/ThinLTO/X86/memprof-indirectcall.ll
@@ -74,7 +74,7 @@
 ;; from main allocating cold memory.
 ; RUN:  cat %t.ccg.cloned.dot | FileCheck %s --check-prefix=DOTCLONED
 
-; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST
 
 
 ;; Try again but with distributed ThinLTO
@@ -419,6 +419,19 @@ attributes #0 = { noinline optnone }
 ; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
 ; IR: attributes #[[COLD]] = { "memprof"="cold" }
 
+; IRNODIST: define {{.*}} @main(
+; IRNODIST:   call {{.*}} @_Z3foov.argelim()
+; IRNODIST:   call {{.*}} @_Z3foov.memprof.1.argelim()
+; IRNODIST:   call {{.*}} @_Z3barP1A.argelim(
+; IRNODIST:   call {{.*}} @_Z3barP1A.argelim(
+; IRNODIST:   call {{.*}} @_Z3barP1A.argelim(
+; IRNODIST:   call {{.*}} @_Z3barP1A.argelim(
+; IRNODIST: define internal {{.*}} @_Z3foov.argelim()
+; IRNODIST:   call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z3foov.memprof.1.argelim()
+; IRNODIST:   call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
+; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
+; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }
 
 ; STATS: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
 ; STATS-BE: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
diff --git a/llvm/test/ThinLTO/X86/memprof-inlined.ll b/llvm/test/ThinLTO/X86/memprof-inlined.ll
index 89df345b220423..7111a536a3110a 100644
--- a/llvm/test/ThinLTO/X86/memprof-inlined.ll
+++ b/llvm/test/ThinLTO/X86/memprof-inlined.ll
@@ -63,7 +63,7 @@
 ;; cold memory.
 ; RUN:	cat %t.ccg.cloned.dot | FileCheck %s --check-prefix=DOTCLONED
 
-; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST
 
 
 ;; Try again but with distributed ThinLTO
@@ -323,6 +323,19 @@ attributes #0 = { noinline optnone }
 ; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
 ; IR: attributes #[[COLD]] = { "memprof"="cold" }
 
+; IRNODIST: define internal {{.*}} @_Z3barv.retelim()
+; IRNODIST:   call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z3foov.retelim()
+; IRNODIST:   call {{.*}} @_Z3barv.retelim()
+; IRNODIST: define {{.*}} @main()
+; IRNODIST:   call {{.*}} @_Z3foov.retelim()
+; IRNODIST:   call {{.*}} @_Z3foov.memprof.1.retelim()
+; IRNODIST: define internal {{.*}} @_Z3barv.memprof.1.retelim()
+; IRNODIST:   call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z3foov.memprof.1.retelim()
+; IRNODIST:   call {{.*}} @_Z3barv.memprof.1.retelim()
+; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
+; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }
 
 ; STATS: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
 ; STATS-BE: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
diff --git a/llvm/test/Transforms/ArgumentPromotion/2008-02-01-ReturnAttrs.ll b/llvm/test/Transforms/ArgumentPromotion/2008-02-01-ReturnAttrs.ll
index daa4e1fb757d21..51839033177034 100644
--- a/llvm/test/Transforms/ArgumentPromotion/2008-02-01-ReturnAttrs.ll
+++ b/llvm/test/Transforms/ArgumentPromotion/2008-02-01-ReturnAttrs.ll
@@ -3,7 +3,7 @@
 ; RUN: cat %t | FileChe...
[truncated]

@yonghong-song
Contributor Author

This pull request includes 4 commits. The first 3 commits are from the previously reviewed pull request, #105742. The last commit fixes additional test failures.

@efriedma-quic could you take a look? If everything looks good, could you approve it? Thanks!

@arsenm
Contributor

arsenm commented Sep 25, 2024

> This pull request includes 4 commits. The first 3 commits are from the previously reviewed pull request, #105742. The last commit fixes additional test failures.

You can just rebase it to get rid of them

Contributor

@teresajohnson teresajohnson left a comment

This is going to cause SamplePGO tooling (both in the compiler and out of it) to need updating. Here's a place in the compiler that needs updating, e.g.:

const char *KnownSuffixes[] = {LLVMSuffix, PartSuffix, UniqSuffix};

To avoid affecting profile handling, and avoid a lot of test churn, can you put this under an option (ideally defaulted off)?
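
For illustration, here is a minimal sketch of why a new suffix matters to name canonicalization in profile tooling. It is not LLVM's implementation: `canonicalName` and `kKnownSuffixes` are hypothetical names, and the suffix spellings (`.llvm.`, `.part.`, `.__uniq.`) are assumed values for the three constants in the line quoted above.

```cpp
#include <array>
#include <string_view>

// Assumed spellings of the known suffixes; hypothetical stand-ins for the
// constants named in the KnownSuffixes line above.
constexpr std::array<std::string_view, 3> kKnownSuffixes = {
    ".llvm.", ".part.", ".__uniq."};

// Hypothetical canonicalization: strip a known suffix only when it is the
// last dot-separated component group of the name (suffix + dot-free tail).
std::string_view canonicalName(std::string_view name) {
  for (std::string_view suffix : kKnownSuffixes) {
    const auto pos = name.rfind(suffix);
    if (pos == std::string_view::npos)
      continue;
    // The final '.' in the name must be the one that ends the suffix;
    // otherwise something else (e.g. ".argelim") follows it.
    const auto lastDot = name.rfind('.');
    if (lastDot == pos + suffix.size() - 1)
      name = name.substr(0, pos);
  }
  return name;
}
```

Under these assumptions, `foo.llvm.123` canonicalizes to `foo`, while `foo.argelim` comes back unchanged. A trailing `.argelim` also keeps an earlier `.__uniq.` suffix from being stripped, so a tool written this way would need its suffix list updated.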

@@ -303,6 +303,23 @@ attributes #0 = { noinline optnone }
 ; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
 ; IR: attributes #[[COLD]] = { "memprof"="cold" }

+; IRNODIST: define {{.*}} @main
Contributor

For all the memprof tests, probably better to just loosen up the original matching a bit (by removing the () and/or adding {{.*}})

Contributor Author

good point. I can do this.

Contributor

Thanks (I still see the test churn but assume you haven't had a chance to update those yet).

Contributor Author

The above 'IRNODIST' thing is already gone in the latest patch. The above code is marked as 'Outdated'.

-define internal fastcc void @callee_4xi32(ptr noalias %src, ptr noalias %dst) #1 {
-; CHECK-LABEL: define internal fastcc void @callee_4xi32(
+define internal fastcc void @callee_4xi32.argprom(ptr noalias %src, ptr noalias %dst) #1 {
+; CHECK-LABEL: define internal fastcc void @callee_4xi32.argprom.argprom(
Contributor

Can you change it to avoid adding cascading suffixes? This gets a little verbose and potentially even harder for e.g. profile tooling that tries to ignore suffixes.

Contributor Author

I did this for two reasons. First, GCC also produces cascading suffixes. For example, when I compiled LLVM with GCC, I got the following symbol:

_ZN5clang19RecursiveASTVisitorIN12_GLOBAL__N_119PluralMisuseChecker13MethodCrawlerEE14TraverseIfStmtEPNS_6IfStmtEPN4llvm15SmallVectorImplINS7_14PointerIntPairIPNS_4StmtELj1EbNS7_21PointerLikeTypeTraitsISB_EENS7_18PointerIntPairInfoISB_Lj1ESD_EEEEEE.part.0.constprop.0.isra.0

Second, cascading suffixes hint at which signature-changing transformations were applied, which makes it easier to work out the changed function signature.
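
One possible middle ground, sketched below under assumptions, is to append a given suffix only when the name does not already end with it. `appendSuffixOnce` is a hypothetical helper and is not part of this patch; the patch itself calls `setName(getName() + suffix)` unconditionally.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not in the patch): append `suffix` to `name` unless
// `name` already ends with exactly that suffix.
std::string appendSuffixOnce(std::string name, std::string_view suffix) {
  const bool alreadyThere =
      name.size() >= suffix.size() &&
      name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
  if (!alreadyThere)
    name.append(suffix);
  return name;
}
```

With this helper, `callee_4xi32.argprom` stays as it is when `.argprom` is applied again, while different suffixes, such as `.argprom` followed by `.argelim`, still stack and keep the hint about which transformations ran.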

@mtrofin
Member

mtrofin commented Sep 25, 2024

I recommend having an RFC for this. First, names are important in a number of scenarios, currently - @xur-llvm can detail cases where the Linux kernel wouldn't build because of name suffixes.

Second, I'd like to take a step back and understand alternatives (for which a more detailed description of the scenario, in an RFC, would be a good/necessary idea). For example, and in the absence of more information, I wonder why not leave the names alone, use function-level metadata, and save it into a section in the binary?

@arsenm
Contributor

arsenm commented Sep 25, 2024

> So users will know that function signature get changed

Generally users wouldn't need to know that; but they would know from the signature itself?

@yonghong-song
Contributor Author

> This is going to cause SamplePGO tooling (both in the compiler and out of it) to need updating. Here's a place in the compiler that needs updating, e.g.:
>
> const char *KnownSuffixes[] = {LLVMSuffix, PartSuffix, UniqSuffix};
>
> To avoid affecting profile handling, and avoid a lot of test churn, can you put this under an option (ideally defaulted off)?

Thanks for the pointer. I will take a look. IIUC, besides the '.llvm.' suffix, LLVM has some other suffixes as well, e.g., '.' for full LTO and '.specialized' (in Transforms/IPO/FunctionSpecialization.cpp). Are they handled properly?

@yonghong-song
Contributor Author

> I recommend having an RFC for this. First, names are important in a number of scenarios, currently - @xur-llvm can detail cases where the Linux kernel wouldn't build because of name suffixes.

Full LTO already produces lots of suffixes; how does profiling handle this?
Yes, I would like to know more about this, and I think we should resolve it. GCC adds suffixes too; does GCC have the same problem?

> Second, I'd like to take a step back and understand alternatives (for which a more detailed description of the scenario, in an RFC, would be a good/necessary idea). For example, and in the absence of more information, I wonder why not leave the names alone, use function-level metadata, and save it into a section in the binary?

There is already a lot of discussion in #105742. Ultimately, what we want is the precise function signature for every function. What you proposed is okay: save a function -> signature mapping in a section of the binary. I am wondering how this can be done.

@yonghong-song
Contributor Author

> > So users will know that function signature get changed
>
> Generally users wouldn't need to know that; but they would know from the signature itself?

In kernel tracing, if a function's name is unchanged, its signature is assumed to match the source code. If the compiler silently changes the signature, kernel tracing could produce incorrect results. So we either need to change the function name to indicate that the signature has changed, or we need additional information in the binary that says the signature has changed and, better still, what the new signature is.
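
A tracing tool could use the suffixes from this patch roughly as follows. This is a hedged sketch: `signatureChangeFromSymbol` and `SignatureChange` are hypothetical names, and only the suffix strings (`.argprom`, `.argelim`, `.retelim`) come from the patch.

```cpp
#include <array>
#include <optional>
#include <string_view>

// Which signature-changing transformation the trailing suffix points to.
enum class SignatureChange {
  ArgumentPromotion,   // ".argprom"
  ArgumentElimination, // ".argelim": at least one argument was removed
  ReturnElimination,   // ".retelim": only the return value was removed
};

// Hypothetical tracer-side check: inspect the trailing suffix of a symbol
// name and report which transformation (if any) it indicates.
std::optional<SignatureChange> signatureChangeFromSymbol(std::string_view symbol) {
  struct Entry {
    std::string_view suffix;
    SignatureChange kind;
  };
  static constexpr std::array<Entry, 3> table = {{
      {".argprom", SignatureChange::ArgumentPromotion},
      {".argelim", SignatureChange::ArgumentElimination},
      {".retelim", SignatureChange::ReturnElimination},
  }};
  for (const Entry &e : table) {
    // Require a non-empty base name in front of the suffix.
    if (symbol.size() > e.suffix.size() &&
        symbol.substr(symbol.size() - e.suffix.size()) == e.suffix)
      return e.kind;
  }
  return std::nullopt;
}
```

A result of `std::nullopt` means none of these suffixes was found, so the tool would fall back to the source-level signature.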

@arsenm
Contributor

arsenm commented Sep 29, 2024

> In kernel tracing, if a function's name is unchanged, its signature is assumed to match the source code. If the compiler silently changes the signature, kernel tracing could produce incorrect results. So we either need to change the function name to indicate that the signature has changed, or we need additional information in the binary that says the signature has changed and, better still, what the new signature is.

The signature won't change if it's externally visible.

@yonghong-song
Contributor Author

> > In kernel tracing, if a function's name is unchanged, its signature is assumed to match the source code. If the compiler silently changes the signature, kernel tracing could produce incorrect results. So we either need to change the function name to indicate that the signature has changed, or we need additional information in the binary that says the signature has changed and, better still, what the new signature is.
>
> The signature won't change if it's externally visible.

@arsenm Indeed, you are right. We are only talking about static functions, whose signatures may change.

@yonghong-song
Contributor Author

This is going to cause SamplePGO tooling (both in the compiler and out of it) to need updating. Here's a place in the compiler that needs updating, e.g.:

const char *KnownSuffixes[] = {LLVMSuffix, PartSuffix, UniqSuffix};

To avoid affecting profile handling, and avoid a lot of test churn, can you put this under an option (ideally defaulted off)?
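To illustrate why a list like KnownSuffixes matters: a tool that canonicalizes names by stripping only the suffixes it recognizes will leave an unrecognized suffix such as .argelim in place. The following is a minimal Python sketch of that idea, not LLVM's actual implementation; the function name `canonical_name` and the literal marker strings are assumptions made for illustration.

```python
# Assumed marker strings for illustration; the real values are defined in LLVM.
KNOWN_SUFFIXES = (".llvm.", ".part.", ".__uniq.")


def canonical_name(symbol, known_suffixes=KNOWN_SUFFIXES):
    """Strip a trailing '<marker><token>' only when the marker is recognized."""
    name = symbol
    for marker in known_suffixes:
        pos = name.rfind(marker)
        if pos == -1:
            continue
        # Strip only when a single dot-free token follows the marker.
        if "." not in name[pos + len(marker):]:
            name = name[:pos]
    return name
```

In this sketch `foo.__uniq.123` canonicalizes to `foo`, while `foo.__uniq.123.argelim` is returned unchanged because `.argelim` is not in the list. That is the kind of mismatch a tool working from symbol-table names would have to handle.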

I tried an example with bpftool (https://github.com/torvalds/linux/tree/master/tools/bpf/bpftool). I built libbpf/bpftool with the additional flags -gline-tables-only -fdebug-info-for-profiling -funique-internal-linkage-names. I also intentionally modified one static function, 'btf_new', so that it eventually gets the .argelim suffix.

I then used the following command to generate the training data.

  sudo perf record -e BR_INST_RETIRED.NEAR_TAKEN:uppp -b -o perf.data -c 10059 --buildid-mmap ./bpftool prog
  sudo perf script -F ip,brstack -i perf.data --show-mmap-events &> perfscript.out 
  llvm-profgen --binary ./bpftool --perfscript=perfscript.out --output=sample.perfscript.bin
  llvm-profdata merge --sample --text sample.perfscript.bin --output=sample.perfscript.txt

I checked sample.perfscript.txt; the 'btf_new' symbol is indeed in the training data:

$ llvm-readelf -s bpftool | grep btf_new 
   420: 000000000004a780  2396 FUNC    LOCAL  DEFAULT    13 _ZL7btf_newPKvjP3btfi.__uniq.13970676711478106820152951367420834074.argelim
$ grep btf_new sample.perfscript.txt
_ZL7btf_newPKvjP3btfi.__uniq.13970676711478106820152951367420834074:324779:0

Another example

$ llvm-readelf -s bpftool | grep print_boot_time
   265: 0000000000026580   262 FUNC    LOCAL  DEFAULT    13 _ZL15print_boot_timeyPcj.__uniq.209043448395238328353871160106749095556.argelim
$ grep print_boot_time sample.perfscript.txt
_ZL15print_boot_timeyPcj.__uniq.209043448395238328353871160106749095556:95:0

I did some code inspection and found that llvm-profgen uses the symbol table to find the code and disassemble it. But what is eventually written to the training data is based on dwarf (address range -> func name in dwarf). In dwarf, the func name is

$ llvm-dwarfdump bpftool | grep _ZL7btf_newPKvjP3btfi
                  DW_AT_call_origin     (0x00014d02 "_ZL7btf_newPKvjP3btfi.__uniq.13970676711478106820152951367420834074")
                DW_AT_linkage_name      ("_ZL7btf_newPKvjP3btfi.__uniq.13970676711478106820152951367420834074")
                  DW_AT_call_origin     (0x00014d02 "_ZL7btf_newPKvjP3btfi.__uniq.13970676711478106820152951367420834074")
                  DW_AT_call_origin     (0x00014d02 "_ZL7btf_newPKvjP3btfi.__uniq.13970676711478106820152951367420834074")
                  DW_AT_call_origin     (0x00014d02 "_ZL7btf_newPKvjP3btfi.__uniq.13970676711478106820152951367420834074")
                  DW_AT_call_origin     (0x00014d02 "_ZL7btf_newPKvjP3btfi.__uniq.13970676711478106820152951367420834074")
                  DW_AT_call_origin     (0x00014d02 "_ZL7btf_newPKvjP3btfi.__uniq.13970676711478106820152951367420834074")

You can see that the linkage name above is the one in the training data.

So I think the .argelim suffix should not impact sampling based profile.

The same for .argprom suffix:

$ llvm-readelf -s bpftool | grep btf_invalidate_raw_data_1
   414: 0000000000055210    60 FUNC    LOCAL  DEFAULT    13 _ZL25btf_invalidate_raw_data_1P3btfPKc.__uniq.13970676711478106820152951367420834074.argprom
$ llvm-dwarfdump bpftool | grep btf_invalidate_raw_data_1
                  DW_AT_call_origin     (0x00015e44 "_ZL25btf_invalidate_raw_data_1P3btfPKc.__uniq.13970676711478106820152951367420834074")
                DW_AT_linkage_name      ("_ZL25btf_invalidate_raw_data_1P3btfPKc.__uniq.13970676711478106820152951367420834074")
                DW_AT_name      ("btf_invalidate_raw_data_1")
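The lookup described above (address range -> func name in dwarf) can be sketched as follows. This is only an illustration in Python, not llvm-profgen's code. The `ranges` table and the helper `name_for_address` are hypothetical, and the dwarf ranges are assumed to match the address and size shown in the llvm-readelf output.

```python
import bisect

# (start, end, dwarf linkage name), sorted by start address.
# Assumption: the dwarf ranges mirror the symbol-table address and size.
ranges = [
    (0x26580, 0x26580 + 262,
     "_ZL15print_boot_timeyPcj.__uniq.209043448395238328353871160106749095556"),
    (0x4A780, 0x4A780 + 2396,
     "_ZL7btf_newPKvjP3btfi.__uniq.13970676711478106820152951367420834074"),
]
starts = [start for start, _, _ in ranges]


def name_for_address(addr):
    """Return the dwarf name whose [start, end) range covers addr, else None."""
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0 and ranges[i][0] <= addr < ranges[i][1]:
        return ranges[i][2]
    return None
```

Because the name comes from the dwarf table rather than from the symbol table, a sampled address inside btf_new resolves to the name without .argelim, even though the symbol table entry carries that suffix.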

…rgumentElimination

The ArgumentPromotion and DeadArgumentElimination passes can change
function signatures, but the function name remains the same as before
the transformation. This makes tracing with bpf programs hard,
since users tend to rely on the function signature in the source.
See discussion [1] for details.

This patch adds a suffix to functions whose signatures
are changed. The suffix lets users know that the function
signature has changed and that they need to inspect the IR or binary
to find the modified signature before tracing those functions.

The suffix for ArgumentPromotion is ".argprom" and
the suffix for DeadArgumentElimination is ".argelim".
The suffix also gives users a hint about what kind of
transformation has been done.

With this patch, I built a recent linux kernel with
full LTO enabled. I got 4 functions with only argpromotion like
  set_track_update.argelim.argprom
  pmd_trans_huge_lock.argprom
  ...
I got 1058 functions with only deadargelim like
  process_bit0.argelim
  pci_io_ecs_init.argelim
  ...
I got 3 functions with both argpromotion and deadargelim
  set_track_update.argelim.argprom
  zero_pud_populate.argelim.argprom
  zero_pmd_populate.argelim.argprom

There are some concerns that the func suffix may impact sample based
profiling. I did some experiments which show that this is not the
case. Sample profiling gets func names from dwarf, and those
func names in dwarf do not have the suffixes added by this patch,
so sample profiling works fine with this patch.

  [1] llvm#104678
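
Counts like the ones in the commit message above can be gathered from a symbol list with a small script. The following Python sketch assumes the names have already been extracted from the symbol table (for example with llvm-readelf -s); the helper name `classify` and the unsuffixed name `plain_func` are hypothetical.

```python
from collections import Counter


def classify(symbol):
    """Label a symbol by which of the two pass suffixes it carries."""
    parts = symbol.split(".")[1:]  # dot-separated components after the base name
    has_prom = "argprom" in parts
    has_elim = "argelim" in parts
    if has_prom and has_elim:
        return "both"
    if has_prom:
        return "argprom only"
    if has_elim:
        return "argelim only"
    return "unchanged"


symbols = [
    "pmd_trans_huge_lock.argprom",
    "process_bit0.argelim",
    "pci_io_ecs_init.argelim",
    "set_track_update.argelim.argprom",
    "plain_func",  # hypothetical name with no suffix
]
counts = Counter(classify(s) for s in symbols)
```

Splitting on '.' instead of using a substring match keeps a base name that merely contains the text "argprom" from being misclassified.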
@yonghong-song yonghong-song changed the title [Transforms][IPO] Add func suffix in ArgumentPromotion and DeadArgumentElimination [RFC][Transforms][IPO] Add func suffix in ArgumentPromotion and DeadArgumentElimination Oct 6, 2024
@yonghong-song
Contributor Author

@teresajohnson @arsenm I tried sampling based profiling with one of the bpf applications and it looks like the added suffixes do not affect sampling based profiling. See the details above. I also marked the patch as RFC as you suggested.

For func suffixes (including more than one suffix), gcc already sets a precedent. Below are some examples from building clang with gcc:

$ llvm-readelf -s clang | grep isra | grep constprop.
...
135408: 00000000061d7140  1529 FUNC    LOCAL  DEFAULT    14 _ZN4llvm4yaml7yamlizeISt6vectorIN12_GLOBAL__N_15ParamESaIS4_EENS0_12EmptyContextEEENSt9enable_ifIXsrNS0_18has_SequenceTraitsIT_EE5valueEvE4typeERNS0_2IOERSA_bRT0_.constprop.0.isra.0
135415: 00000000061de7b0  6558 FUNC    LOCAL  DEFAULT    14 _ZN4llvm4yaml7yamlizeISt6vectorIN12_GLOBAL__N_15ClassESaIS4_EENS0_12EmptyContextEEENSt9enable_ifIXsrNS0_18has_SequenceTraitsIT_EE5valueEvE4typeERNS0_2IOERSA_bRT0_.constprop.0.isra.0
135416: 00000000061e0150  1755 FUNC    LOCAL  DEFAULT    14 _ZN4llvm4yaml7yamlizeISt6vectorIN12_GLOBAL__N_18FunctionESaIS4_EENS0_12EmptyContextEEENSt9enable_ifIXsrNS0_18has_SequenceTraitsIT_EE5valueEvE4typeERNS0_2IOERSA_bRT0_.constprop.0.isra.0
135417: 00000000061e0830  5534 FUNC    LOCAL  DEFAULT    14 _ZN4llvm4yaml7yamlizeISt6vectorIN12_GLOBAL__N_13TagESaIS4_EENS0_12EmptyContextEEENSt9enable_ifIXsrNS0_18has_SequenceTraitsIT_EE5valueEvE4typeERNS0_2IOERSA_bRT0_.constprop.0.isra.0
135419: 00000000061e3ab0  1502 FUNC    LOCAL  DEFAULT    14 _ZN4llvm4yaml7yamlizeISt6vectorIN12_GLOBAL__N_19VersionedESaIS4_EENS0_12EmptyContextEEENSt9enable_ifIXsrNS0_18has_SequenceTraitsIT_EE5valueEvE4typeERNS0_2IOERSA_bRT0_.constprop.0.isra.0

And there are even some cases having three suffixes:

$ llvm-readelf -s clang | grep isra | grep constprop | grep part
  5663: 000000000147dd70  1041 FUNC    LOCAL  DEFAULT    14 _ZN4llvm10GCNTTIImpl18getVectorInstrCostEjPNS_4TypeENS_19TargetTransformInfo14TargetCostKindEjPNS_5ValueES6_.part.0.constprop.1.isra.0
  5664: 000000000147e190  1041 FUNC    LOCAL  DEFAULT    14 _ZN4llvm10GCNTTIImpl18getVectorInstrCostEjPNS_4TypeENS_19TargetTransformInfo14TargetCostKindEjPNS_5ValueES6_.part.0.constprop.0.isra.0

So if new suffixes in clang won't affect functionality, then it should be okay for clang as well to allow multiple suffixes.
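
For tools that need to cope with several stacked suffixes, one option is to split a name into its base and a list of (suffix, index) pairs. The sketch below is a Python illustration under the assumption that every suffix has the form `.name`, optionally followed by `.N`, as in the gcc examples above and the .argelim/.argprom suffixes from this patch. The function name `split_clone_suffixes` is hypothetical.

```python
import re

# One trailing suffix: ".word" optionally followed by ".digits".
_SUFFIX = re.compile(r"\.([A-Za-z_]+)(?:\.(\d+))?$")


def split_clone_suffixes(symbol):
    """Peel trailing '.name[.N]' suffixes off a symbol, keeping their order."""
    suffixes = []
    while True:
        m = _SUFFIX.search(symbol)
        if m is None:
            break
        index = int(m.group(2)) if m.group(2) is not None else None
        suffixes.append((m.group(1), index))
        symbol = symbol[: m.start()]
    suffixes.reverse()  # suffixes were collected from right to left
    return symbol, suffixes
```

For a name ending in `.part.0.constprop.1.isra.0` this yields the base name plus `[("part", 0), ("constprop", 1), ("isra", 0)]`; for `.argelim.argprom` the indices are `None`.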

Please let me know what you think.

@teresajohnson
Contributor

I am currently OOO so added a couple reviewers familiar with SamplePGO and other profile matching (e.g. memprof) that might be affected.

@teresajohnson
Contributor

@teresajohnson @arsenm I tried sampling based profiling with one of the bpf applications and it looks like the added suffixes do not affect sampling based profiling. See the details above. I also marked the patch as RFC as you suggested.

Regarding that analysis, I just want to clarify: you are showing that the profiled binary has the new suffixes in its symbol table, but that the dwarf data for the same binary does not have the new suffixes, and that llvm-profgen will construct the profile from the dwarf, so it will not contain the suffixes? I am not very familiar with llvm-profgen so I defer to @huangjd. It would be good to confirm with a round trip through the feedback path that things work as expected.

For func suffixes (including more than one suffix), gcc already sets a precedent. Below are some examples from building clang with gcc:

$ llvm-readelf -s clang | grep isra | grep constprop.
...
135408: 00000000061d7140  1529 FUNC    LOCAL  DEFAULT    14 _ZN4llvm4yaml7yamlizeISt6vectorIN12_GLOBAL__N_15ParamESaIS4_EENS0_12EmptyContextEEENSt9enable_ifIXsrNS0_18has_SequenceTraitsIT_EE5valueEvE4typeERNS0_2IOERSA_bRT0_.constprop.0.isra.0
135415: 00000000061de7b0  6558 FUNC    LOCAL  DEFAULT    14 _ZN4llvm4yaml7yamlizeISt6vectorIN12_GLOBAL__N_15ClassESaIS4_EENS0_12EmptyContextEEENSt9enable_ifIXsrNS0_18has_SequenceTraitsIT_EE5valueEvE4typeERNS0_2IOERSA_bRT0_.constprop.0.isra.0
135416: 00000000061e0150  1755 FUNC    LOCAL  DEFAULT    14 _ZN4llvm4yaml7yamlizeISt6vectorIN12_GLOBAL__N_18FunctionESaIS4_EENS0_12EmptyContextEEENSt9enable_ifIXsrNS0_18has_SequenceTraitsIT_EE5valueEvE4typeERNS0_2IOERSA_bRT0_.constprop.0.isra.0
135417: 00000000061e0830  5534 FUNC    LOCAL  DEFAULT    14 _ZN4llvm4yaml7yamlizeISt6vectorIN12_GLOBAL__N_13TagESaIS4_EENS0_12EmptyContextEEENSt9enable_ifIXsrNS0_18has_SequenceTraitsIT_EE5valueEvE4typeERNS0_2IOERSA_bRT0_.constprop.0.isra.0
135419: 00000000061e3ab0  1502 FUNC    LOCAL  DEFAULT    14 _ZN4llvm4yaml7yamlizeISt6vectorIN12_GLOBAL__N_19VersionedESaIS4_EENS0_12EmptyContextEEENSt9enable_ifIXsrNS0_18has_SequenceTraitsIT_EE5valueEvE4typeERNS0_2IOERSA_bRT0_.constprop.0.isra.0

And there are even some cases having three suffixes:

$ llvm-readelf -s clang | grep isra | grep constprop | grep part
  5663: 000000000147dd70  1041 FUNC    LOCAL  DEFAULT    14 _ZN4llvm10GCNTTIImpl18getVectorInstrCostEjPNS_4TypeENS_19TargetTransformInfo14TargetCostKindEjPNS_5ValueES6_.part.0.constprop.1.isra.0
  5664: 000000000147e190  1041 FUNC    LOCAL  DEFAULT    14 _ZN4llvm10GCNTTIImpl18getVectorInstrCostEjPNS_4TypeENS_19TargetTransformInfo14TargetCostKindEjPNS_5ValueES6_.part.0.constprop.0.isra.0

So if new suffixes in clang won't affect functionality, then it should be okay for clang as well to allow multiple suffixes.

Please let me know what you think.

Except we don't tend to feed back profiles collected from gcc built binaries to clang for SamplePGO, etc, so we need to ensure it will still work in clang.

@teresajohnson
Contributor

I recommend having a RFC for this. First, names are important in a number of scenarios, currently - @xur-llvm can detail cases where the linux kernel wouldn't build because of name suffixes.

Full LTO already has lots of suffixes; how does profiling handle this? Yes, I would like to know more about this and I think we should resolve it. gcc has suffixes too; does gcc have the same problem?

That's a good question, we don't tend to use Full LTO so I don't know in practice

@yonghong-song
Contributor Author

I am currently OOO so added a couple reviewers familiar with SamplePGO and other profile matching (e.g. memprof) that might be affected.

Sounds good to me. I will double check memprof as well.

@snehasish
Contributor

There are downstream (internal) usages which rely on function names as they exist in the symbol table. Memprof relies on the dwarf linkage name so its usage is similar to llvm-profgen. In the past, suffixes after a period in the symbol were meant to be interpreted as clones of the original function; however, this is not well defined.

Personally, I'm not in favour of overloading the existing usage by appending suffixes to indicate the optimizations performed. This seems to be a hacky approach and instead we should adopt a more formal mechanism to communicate such information to external tools. As @mtrofin suggested, it would be good to start an RFC discussion on discourse to gather the different usages from a broader set of users than those on this patch before proceeding.

Since not all suffixes are the same, key considerations for the discourse RFC could be -

  • when the symbol name changes
    clang backend e.g. -funique-internal-linkage-names
    LLVM IR - function specialization
    ThinLTO promotion
    Backend - Propeller / FDO based function splitting
  • what do the suffixes imply (clones, modifications, parts etc)
  • how should debuggers and profiling tools treat the symbols

Some of these were discussed piecemeal in #105742; however, a broader discussion could be beneficial. What do you think?

@mtrofin
Member

mtrofin commented Oct 8, 2024

I recommend having a RFC for this. First, names are important in a number of scenarios, currently - @xur-llvm can detail cases where the linux kernel wouldn't build because of name suffixes.

Full LTO already has lots of suffixes; how does profiling handle this? Yes, I would like to know more about this and I think we should resolve it. gcc has suffixes too; does gcc have the same problem?

Second, I'd like to take a step back and understand alternatives (for which a more detailed description of the scenario, in a RFC, would be a good/necessary idea). For example, and in the absence of more information, I wonder why not leave the names as they are, use function-level metadata, and save it into a section in the binary?

A lot of discussion already in #105742. Ultimately, what we want is the precise func signature for every func. What you proposed is okay, save func -> signature in a section of the binary. I am wondering how this can be done.

What seems to be missing in those discussions, and I'd hope to see more spelled out in a RFC, is the user scenario: why does the function name not reflecting argument changes matter, what user scenario breaks?

@mtrofin mtrofin self-requested a review October 8, 2024 23:31