Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[InstrPGO] Instrument sampling profile based cold function #109837

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

wlei-llvm
Copy link
Contributor

@wlei-llvm wlei-llvm commented Sep 24, 2024

This patch adds support for cold function coverage instrumentation based on sampling PGO counts. The major motivation is to detect dead functions for the services that are optimized with sampling PGO. If a function is covered by sampling profile count (e.g., those with an entry count > 0), we choose to skip instrumenting those functions, which significantly reduces the instrumentation overhead.

More details about the implementation and flags:

  • Added a flag --instrument-cold-function-coverage in PGOInstrumentation.cpp as the main switch to control skipping the instrumentation.
  • Built the extra instrumentation passes(a bundle of passes in addPGOInstrPasses) under sampling PGO pipeline. This is controlled by --instrument-sample-cold-function-path flag.
  • Added a driver flag -fprofile-generate-cold-function-coverage:
      1. Config the flags in one place, i,e. adding --instrument-sample-cold-function-path=<...> and --instrument-cold-function-coverage, and --pgo-function-entry-coverage. Note that the instrumentation file path is passed through --instrument-sample-cold-function-path, because we cannot use the PGOOptions.ProfileFile as it's already used by -fprofile-sample-use=<...>.
      1. makes linker to link compiler_rt.profile lib(see ToolChain.cpp#L1125-L1131 ).
  • Added a flag(--cold-function-coverage-max-entry-count) to config entry count to determine cold function.

Overall, the full command is like:

clang++ -O2 -fprofile-generate-cold-function-coverage=<...> -fprofile-sample-use=<...>  code.cc -o code

Copy link

github-actions bot commented Sep 24, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@wlei-llvm wlei-llvm marked this pull request as ready for review October 1, 2024 17:46
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' PGO Profile Guided Optimizations llvm:transforms labels Oct 1, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Oct 1, 2024

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-pgo

Author: Lei Wang (wlei-llvm)

Changes

This patch adds support for cold function coverage instrumentation based on sampling PGO counts. The major motivation is to detect dead functions for the services that are optimized with sampling PGO. If a function is covered by sampling profile count (e.g., those with an entry count > 0), we choose to skip instrumenting those functions, which significantly reduces the instrumentation overhead.

More details about the implementation and flags:

  • Added a flag --instrument-cold-function-coverage in PGOInstrumentation.cpp as the main switch to control skipping the instrumentation.
  • Built the extra instrumentation passes(a bundle of passes in addPGOInstrPasses) under sampling PGO pipeline. This is controlled by --instrument-sample-cold-function-path flag.
  • Added a driver flag -fprofile-generate-cold-function-coverage:
      1. Config the flags in one place, i,e. adding --instrument-sample-cold-function-path=&lt;...&gt; and --instrument-cold-function-coverage, and --pgo-function-entry-coverage. Note that the instrumentation file path is passed through --instrument-sample-cold-function-path, because we cannot use the PGOOptions.ProfileFile as it's already used by -fprofile-sample-use=&lt;...&gt;.
      1. makes linker to link compiler_rt.profile lib(see ToolChain.cpp#L1125-L1131 ).
  • Added a flag(--cold-function-coverage-max-entry-count) to config entry count to determine cold function.

Overall, the full command is like:

clang++ -O2 -fprofile-generate-cold-function-coverage=&lt;...&gt; -fprofile-sample-use=&lt;...&gt;  code.cc -o code

Full diff: https://github.com/llvm/llvm-project/pull/109837.diff

8 Files Affected:

  • (modified) clang/include/clang/Driver/Options.td (+6)
  • (modified) clang/lib/Driver/ToolChain.cpp (+3-1)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+18)
  • (added) clang/test/CodeGen/Inputs/pgo-cold-func.prof (+2)
  • (added) clang/test/CodeGen/pgo-cold-function-coverage.c (+12)
  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+18)
  • (modified) llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (+16)
  • (added) llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll (+24)
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 002f60350543d9..6bb92427f2d53f 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -1784,6 +1784,12 @@ defm debug_info_for_profiling : BoolFOption<"debug-info-for-profiling",
   PosFlag<SetTrue, [], [ClangOption, CC1Option],
           "Emit extra debug info to make sample profile more accurate">,
   NegFlag<SetFalse>>;
+def fprofile_generate_cold_function_coverage : Flag<["-"], "fprofile-generate-cold-function-coverage">, 
+    Group<f_Group>, Visibility<[ClangOption, CLOption]>,
+    HelpText<"Generate instrumented code to cold functions into default.profraw file (overridden by '=' form of option or LLVM_PROFILE_FILE env var)">;
+def fprofile_generate_cold_function_coverage_EQ : Joined<["-"], "fprofile-generate-cold-function-coverage=">, 
+    Group<f_Group>, Visibility<[ClangOption, CLOption]>, MetaVarName<"<directory>">,
+    HelpText<"Generate instrumented code to cold functions into <directory>/default.profraw (overridden by LLVM_PROFILE_FILE env var)">; 
 def fprofile_instr_generate : Flag<["-"], "fprofile-instr-generate">,
     Group<f_Group>, Visibility<[ClangOption, CLOption]>,
     HelpText<"Generate instrumented code to collect execution counts into default.profraw file (overridden by '=' form of option or LLVM_PROFILE_FILE env var)">;
diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index 16f9b629fc538c..e56db150c6b47e 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -889,7 +889,9 @@ bool ToolChain::needsProfileRT(const ArgList &Args) {
          Args.hasArg(options::OPT_fprofile_instr_generate) ||
          Args.hasArg(options::OPT_fprofile_instr_generate_EQ) ||
          Args.hasArg(options::OPT_fcreate_profile) ||
-         Args.hasArg(options::OPT_forder_file_instrumentation);
+         Args.hasArg(options::OPT_forder_file_instrumentation) ||
+         Args.hasArg(options::OPT_fprofile_generate_cold_function_coverage) ||
+         Args.hasArg(options::OPT_fprofile_generate_cold_function_coverage_EQ);
 }
 
 bool ToolChain::needsGCovInstrumentation(const llvm::opt::ArgList &Args) {
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 0bab48caf1a5e2..00602a08232ba2 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -649,6 +649,24 @@ static void addPGOAndCoverageFlags(const ToolChain &TC, Compilation &C,
     }
   }
 
+  if (auto *ColdFuncCoverageArg = Args.getLastArg(
+          options::OPT_fprofile_generate_cold_function_coverage,
+          options::OPT_fprofile_generate_cold_function_coverage_EQ)) {
+    SmallString<128> Path(
+        ColdFuncCoverageArg->getOption().matches(
+            options::OPT_fprofile_generate_cold_function_coverage_EQ)
+            ? ColdFuncCoverageArg->getValue()
+            : "");
+    llvm::sys::path::append(Path, "default_%m.profraw");
+    CmdArgs.push_back("-mllvm");
+    CmdArgs.push_back(Args.MakeArgString(
+        Twine("--instrument-sample-cold-function-path=") + Path));
+    CmdArgs.push_back("-mllvm");
+    CmdArgs.push_back("--instrument-cold-function-coverage");
+    CmdArgs.push_back("-mllvm");
+    CmdArgs.push_back("--pgo-function-entry-coverage");
+  }
+
   Arg *PGOGenArg = nullptr;
   if (PGOGenerateArg) {
     assert(!CSPGOGenerateArg);
diff --git a/clang/test/CodeGen/Inputs/pgo-cold-func.prof b/clang/test/CodeGen/Inputs/pgo-cold-func.prof
new file mode 100644
index 00000000000000..e50be02e0a8545
--- /dev/null
+++ b/clang/test/CodeGen/Inputs/pgo-cold-func.prof
@@ -0,0 +1,2 @@
+foo:1:1
+ 1: 1
diff --git a/clang/test/CodeGen/pgo-cold-function-coverage.c b/clang/test/CodeGen/pgo-cold-function-coverage.c
new file mode 100644
index 00000000000000..0d2767c022b9f1
--- /dev/null
+++ b/clang/test/CodeGen/pgo-cold-function-coverage.c
@@ -0,0 +1,12 @@
+// Test -fprofile-generate-cold-function-coverage 
+// RUN: %clang -O2 -fprofile-generate-cold-function-coverage=/xxx/yyy/ -fprofile-sample-accurate -fprofile-sample-use=%S/Inputs/pgo-cold-func.prof  -S -emit-llvm -o - %s | FileCheck %s
+
+// CHECK: @__llvm_profile_filename = {{.*}} c"/xxx/yyy/default_%m.profraw\00"
+
+// CHECK: @__profc_bar
+// CHECK-NOT: @__profc_foo
+
+int bar(int x) { return x;}
+int foo(int x) { 
+    return x;
+}
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 8f151a99b11709..f75abe0c7c7649 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -296,7 +296,13 @@ static cl::opt<bool> UseLoopVersioningLICM(
     "enable-loop-versioning-licm", cl::init(false), cl::Hidden,
     cl::desc("Enable the experimental Loop Versioning LICM pass"));
 
+static cl::opt<std::string> InstrumentSampleColdFuncPath(
+    "instrument-sample-cold-function-path", cl::init(""),
+    cl::desc("File path for instrumenting sampling PGO guided cold functions"),
+    cl::Hidden);
+
 extern cl::opt<std::string> UseCtxProfile;
+extern cl::opt<bool> InstrumentColdFunctionCoverage;
 
 namespace llvm {
 extern cl::opt<bool> EnableMemProfContextDisambiguation;
@@ -1119,6 +1125,18 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
       // removed.
       MPM.addPass(
           PGOIndirectCallPromotion(true /* IsInLTO */, true /* SamplePGO */));
+
+    if (InstrumentSampleColdFuncPath.getNumOccurrences() &&
+        Phase != ThinOrFullLTOPhase::ThinLTOPostLink) {
+      assert(!InstrumentSampleColdFuncPath.empty() &&
+             "File path is requeired for instrumentation generation");
+      InstrumentColdFunctionCoverage = true;
+      addPreInlinerPasses(MPM, Level, Phase);
+      addPGOInstrPasses(MPM, Level, /* RunProfileGen */ true,
+                        /* IsCS */ false, /* AtomicCounterUpdate */ false,
+                        InstrumentSampleColdFuncPath, "",
+                        IntrusiveRefCntPtr<vfs::FileSystem>());
+    }
   }
 
   // Try to perform OpenMP specific optimizations on the module. This is a
diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
index 10442fa0bb9003..890ad853d86461 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
@@ -319,6 +319,18 @@ static cl::opt<unsigned> PGOFunctionCriticalEdgeThreshold(
     cl::desc("Do not instrument functions with the number of critical edges "
              " greater than this threshold."));
 
+cl::opt<bool> InstrumentColdFunctionCoverage(
+    "instrument-cold-function-coverage", cl::init(false), cl::Hidden,
+    cl::desc("Enable cold function coverage instrumentation (currently only "
+             "used under sampling "
+             " PGO pipeline))"));
+
+static cl::opt<uint64_t> ColdFuncCoverageMaxEntryCount(
+    "cold-function-coverage-max-entry-count", cl::init(0), cl::Hidden,
+    cl::desc("When enabling cold function coverage instrumentation, skip "
+             "instrumenting the "
+             "function whose entry count is above the given value"));
+
 extern cl::opt<unsigned> MaxNumVTableAnnotations;
 
 namespace llvm {
@@ -1891,6 +1903,10 @@ static bool skipPGOGen(const Function &F) {
     return true;
   if (F.getInstructionCount() < PGOFunctionSizeThreshold)
     return true;
+  if (InstrumentColdFunctionCoverage &&
+      (!F.getEntryCount() ||
+       F.getEntryCount()->getCount() > ColdFuncCoverageMaxEntryCount))
+    return true;
   return false;
 }
 
diff --git a/llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll b/llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll
new file mode 100644
index 00000000000000..ba1a11f7184356
--- /dev/null
+++ b/llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll
@@ -0,0 +1,24 @@
+; RUN: opt < %s  --passes=pgo-instr-gen -instrument-cold-function-coverage -S  | FileCheck --check-prefixes=COLD %s
+; RUN: opt < %s  --passes=pgo-instr-gen -instrument-cold-function-coverage -cold-function-coverage-max-entry-count=1 -S  | FileCheck --check-prefixes=ENTRY-COUNT %s
+
+; COLD: call void @llvm.instrprof.increment(ptr @__profn_foo, i64  [[#]], i32 1, i32 0)
+; COLD-NOT: __profn_main
+
+; ENTRY-COUNT: call void @llvm.instrprof.increment(ptr @__profn_foo, i64  [[#]], i32 1, i32 0)
+; ENTRY-COUNT: call void @llvm.instrprof.increment(ptr @__profn_main, i64 [[#]], i32 1, i32 0)
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define void @foo() !prof !0 {
+entry:
+  ret void
+}
+
+define i32 @main() !prof !1 {
+entry:
+  ret i32 0
+}
+
+!0 = !{!"function_entry_count", i64 0}
+!1 = !{!"function_entry_count", i64 1}

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 1, 2024

@llvm/pr-subscribers-clang

Author: Lei Wang (wlei-llvm)

Changes

This patch adds support for cold function coverage instrumentation based on sampling PGO counts. The major motivation is to detect dead functions for the services that are optimized with sampling PGO. If a function is covered by sampling profile count (e.g., those with an entry count > 0), we choose to skip instrumenting those functions, which significantly reduces the instrumentation overhead.

More details about the implementation and flags:

  • Added a flag --instrument-cold-function-coverage in PGOInstrumentation.cpp as the main switch to control skipping the instrumentation.
  • Built the extra instrumentation passes(a bundle of passes in addPGOInstrPasses) under sampling PGO pipeline. This is controlled by --instrument-sample-cold-function-path flag.
  • Added a driver flag -fprofile-generate-cold-function-coverage:
      1. Config the flags in one place, i,e. adding --instrument-sample-cold-function-path=&lt;...&gt; and --instrument-cold-function-coverage, and --pgo-function-entry-coverage. Note that the instrumentation file path is passed through --instrument-sample-cold-function-path, because we cannot use the PGOOptions.ProfileFile as it's already used by -fprofile-sample-use=&lt;...&gt;.
      1. makes linker to link compiler_rt.profile lib(see ToolChain.cpp#L1125-L1131 ).
  • Added a flag(--cold-function-coverage-max-entry-count) to config entry count to determine cold function.

Overall, the full command is like:

clang++ -O2 -fprofile-generate-cold-function-coverage=&lt;...&gt; -fprofile-sample-use=&lt;...&gt;  code.cc -o code

Full diff: https://github.com/llvm/llvm-project/pull/109837.diff

8 Files Affected:

  • (modified) clang/include/clang/Driver/Options.td (+6)
  • (modified) clang/lib/Driver/ToolChain.cpp (+3-1)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+18)
  • (added) clang/test/CodeGen/Inputs/pgo-cold-func.prof (+2)
  • (added) clang/test/CodeGen/pgo-cold-function-coverage.c (+12)
  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+18)
  • (modified) llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (+16)
  • (added) llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll (+24)
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 002f60350543d9..6bb92427f2d53f 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -1784,6 +1784,12 @@ defm debug_info_for_profiling : BoolFOption<"debug-info-for-profiling",
   PosFlag<SetTrue, [], [ClangOption, CC1Option],
           "Emit extra debug info to make sample profile more accurate">,
   NegFlag<SetFalse>>;
+def fprofile_generate_cold_function_coverage : Flag<["-"], "fprofile-generate-cold-function-coverage">, 
+    Group<f_Group>, Visibility<[ClangOption, CLOption]>,
+    HelpText<"Generate instrumented code to cold functions into default.profraw file (overridden by '=' form of option or LLVM_PROFILE_FILE env var)">;
+def fprofile_generate_cold_function_coverage_EQ : Joined<["-"], "fprofile-generate-cold-function-coverage=">, 
+    Group<f_Group>, Visibility<[ClangOption, CLOption]>, MetaVarName<"<directory>">,
+    HelpText<"Generate instrumented code to cold functions into <directory>/default.profraw (overridden by LLVM_PROFILE_FILE env var)">; 
 def fprofile_instr_generate : Flag<["-"], "fprofile-instr-generate">,
     Group<f_Group>, Visibility<[ClangOption, CLOption]>,
     HelpText<"Generate instrumented code to collect execution counts into default.profraw file (overridden by '=' form of option or LLVM_PROFILE_FILE env var)">;
diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index 16f9b629fc538c..e56db150c6b47e 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -889,7 +889,9 @@ bool ToolChain::needsProfileRT(const ArgList &Args) {
          Args.hasArg(options::OPT_fprofile_instr_generate) ||
          Args.hasArg(options::OPT_fprofile_instr_generate_EQ) ||
          Args.hasArg(options::OPT_fcreate_profile) ||
-         Args.hasArg(options::OPT_forder_file_instrumentation);
+         Args.hasArg(options::OPT_forder_file_instrumentation) ||
+         Args.hasArg(options::OPT_fprofile_generate_cold_function_coverage) ||
+         Args.hasArg(options::OPT_fprofile_generate_cold_function_coverage_EQ);
 }
 
 bool ToolChain::needsGCovInstrumentation(const llvm::opt::ArgList &Args) {
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 0bab48caf1a5e2..00602a08232ba2 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -649,6 +649,24 @@ static void addPGOAndCoverageFlags(const ToolChain &TC, Compilation &C,
     }
   }
 
+  if (auto *ColdFuncCoverageArg = Args.getLastArg(
+          options::OPT_fprofile_generate_cold_function_coverage,
+          options::OPT_fprofile_generate_cold_function_coverage_EQ)) {
+    SmallString<128> Path(
+        ColdFuncCoverageArg->getOption().matches(
+            options::OPT_fprofile_generate_cold_function_coverage_EQ)
+            ? ColdFuncCoverageArg->getValue()
+            : "");
+    llvm::sys::path::append(Path, "default_%m.profraw");
+    CmdArgs.push_back("-mllvm");
+    CmdArgs.push_back(Args.MakeArgString(
+        Twine("--instrument-sample-cold-function-path=") + Path));
+    CmdArgs.push_back("-mllvm");
+    CmdArgs.push_back("--instrument-cold-function-coverage");
+    CmdArgs.push_back("-mllvm");
+    CmdArgs.push_back("--pgo-function-entry-coverage");
+  }
+
   Arg *PGOGenArg = nullptr;
   if (PGOGenerateArg) {
     assert(!CSPGOGenerateArg);
diff --git a/clang/test/CodeGen/Inputs/pgo-cold-func.prof b/clang/test/CodeGen/Inputs/pgo-cold-func.prof
new file mode 100644
index 00000000000000..e50be02e0a8545
--- /dev/null
+++ b/clang/test/CodeGen/Inputs/pgo-cold-func.prof
@@ -0,0 +1,2 @@
+foo:1:1
+ 1: 1
diff --git a/clang/test/CodeGen/pgo-cold-function-coverage.c b/clang/test/CodeGen/pgo-cold-function-coverage.c
new file mode 100644
index 00000000000000..0d2767c022b9f1
--- /dev/null
+++ b/clang/test/CodeGen/pgo-cold-function-coverage.c
@@ -0,0 +1,12 @@
+// Test -fprofile-generate-cold-function-coverage 
+// RUN: %clang -O2 -fprofile-generate-cold-function-coverage=/xxx/yyy/ -fprofile-sample-accurate -fprofile-sample-use=%S/Inputs/pgo-cold-func.prof  -S -emit-llvm -o - %s | FileCheck %s
+
+// CHECK: @__llvm_profile_filename = {{.*}} c"/xxx/yyy/default_%m.profraw\00"
+
+// CHECK: @__profc_bar
+// CHECK-NOT: @__profc_foo
+
+int bar(int x) { return x;}
+int foo(int x) { 
+    return x;
+}
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 8f151a99b11709..f75abe0c7c7649 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -296,7 +296,13 @@ static cl::opt<bool> UseLoopVersioningLICM(
     "enable-loop-versioning-licm", cl::init(false), cl::Hidden,
     cl::desc("Enable the experimental Loop Versioning LICM pass"));
 
+static cl::opt<std::string> InstrumentSampleColdFuncPath(
+    "instrument-sample-cold-function-path", cl::init(""),
+    cl::desc("File path for instrumenting sampling PGO guided cold functions"),
+    cl::Hidden);
+
 extern cl::opt<std::string> UseCtxProfile;
+extern cl::opt<bool> InstrumentColdFunctionCoverage;
 
 namespace llvm {
 extern cl::opt<bool> EnableMemProfContextDisambiguation;
@@ -1119,6 +1125,18 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
       // removed.
       MPM.addPass(
           PGOIndirectCallPromotion(true /* IsInLTO */, true /* SamplePGO */));
+
+    if (InstrumentSampleColdFuncPath.getNumOccurrences() &&
+        Phase != ThinOrFullLTOPhase::ThinLTOPostLink) {
+      assert(!InstrumentSampleColdFuncPath.empty() &&
+             "File path is requeired for instrumentation generation");
+      InstrumentColdFunctionCoverage = true;
+      addPreInlinerPasses(MPM, Level, Phase);
+      addPGOInstrPasses(MPM, Level, /* RunProfileGen */ true,
+                        /* IsCS */ false, /* AtomicCounterUpdate */ false,
+                        InstrumentSampleColdFuncPath, "",
+                        IntrusiveRefCntPtr<vfs::FileSystem>());
+    }
   }
 
   // Try to perform OpenMP specific optimizations on the module. This is a
diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
index 10442fa0bb9003..890ad853d86461 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
@@ -319,6 +319,18 @@ static cl::opt<unsigned> PGOFunctionCriticalEdgeThreshold(
     cl::desc("Do not instrument functions with the number of critical edges "
              " greater than this threshold."));
 
+cl::opt<bool> InstrumentColdFunctionCoverage(
+    "instrument-cold-function-coverage", cl::init(false), cl::Hidden,
+    cl::desc("Enable cold function coverage instrumentation (currently only "
+             "used under sampling "
+             " PGO pipeline))"));
+
+static cl::opt<uint64_t> ColdFuncCoverageMaxEntryCount(
+    "cold-function-coverage-max-entry-count", cl::init(0), cl::Hidden,
+    cl::desc("When enabling cold function coverage instrumentation, skip "
+             "instrumenting the "
+             "function whose entry count is above the given value"));
+
 extern cl::opt<unsigned> MaxNumVTableAnnotations;
 
 namespace llvm {
@@ -1891,6 +1903,10 @@ static bool skipPGOGen(const Function &F) {
     return true;
   if (F.getInstructionCount() < PGOFunctionSizeThreshold)
     return true;
+  if (InstrumentColdFunctionCoverage &&
+      (!F.getEntryCount() ||
+       F.getEntryCount()->getCount() > ColdFuncCoverageMaxEntryCount))
+    return true;
   return false;
 }
 
diff --git a/llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll b/llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll
new file mode 100644
index 00000000000000..ba1a11f7184356
--- /dev/null
+++ b/llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll
@@ -0,0 +1,24 @@
+; RUN: opt < %s  --passes=pgo-instr-gen -instrument-cold-function-coverage -S  | FileCheck --check-prefixes=COLD %s
+; RUN: opt < %s  --passes=pgo-instr-gen -instrument-cold-function-coverage -cold-function-coverage-max-entry-count=1 -S  | FileCheck --check-prefixes=ENTRY-COUNT %s
+
+; COLD: call void @llvm.instrprof.increment(ptr @__profn_foo, i64  [[#]], i32 1, i32 0)
+; COLD-NOT: __profn_main
+
+; ENTRY-COUNT: call void @llvm.instrprof.increment(ptr @__profn_foo, i64  [[#]], i32 1, i32 0)
+; ENTRY-COUNT: call void @llvm.instrprof.increment(ptr @__profn_main, i64 [[#]], i32 1, i32 0)
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define void @foo() !prof !0 {
+entry:
+  ret void
+}
+
+define i32 @main() !prof !1 {
+entry:
+  ret i32 0
+}
+
+!0 = !{!"function_entry_count", i64 0}
+!1 = !{!"function_entry_count", i64 1}

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp Outdated Show resolved Hide resolved
@@ -1784,6 +1784,12 @@ defm debug_info_for_profiling : BoolFOption<"debug-info-for-profiling",
PosFlag<SetTrue, [], [ClangOption, CC1Option],
"Emit extra debug info to make sample profile more accurate">,
NegFlag<SetFalse>>;
def fprofile_generate_cold_function_coverage : Flag<["-"], "fprofile-generate-cold-function-coverage">,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to expose this through the clang frontend?

Copy link
Contributor Author

@wlei-llvm wlei-llvm Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if my PR description is not clear. Note that there is no use for -fprofile-generate and -fprofile-instr-generate here, so a driver flag here is to configure the instr file path and make linker to link the compiler_rt.profile object files (just similar to -fprofile-[instr]-generate=).

The reason for not using -fprofile-[instr]-generate is because those flags conflict with -fprofile-sample-use, see PGOOptions, ProfileFile is a shared file path which means the two flags are actually mutually exclusive.

Another option is to make -fprofile-generate compatible with -fprofile-samle-use(like refactoring PGOOptions or adding another flag to configure the file path things), this seems to me they are more complicated than the current one. But I’m open to any suggestions on this.

Copy link
Member

@mtrofin mtrofin Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant, why not just use clang ... -mllvm -instrument-cold-function-coverage? Is this a clang - only feature (i.e. rust can't use it?) Is it just for symmetry with the current PGO flags?

(This is not a pushback, mainly curious. Also the patch would be quite smaller if you didn't need to pass through the flags from clang to llvm)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant, why not just use clang ... -mllvm -instrument-cold-function-coverage? Is this a clang - only feature (i.e. rust can't use it?) Is it just for symmetry with the current PGO flags?

(This is not a pushback, mainly curious. Also the patch would be quite smaller if you didn't need to pass through the flags from clang to llvm)

I see, thanks for the suggestion! We also need to link runtime lib(compiler_rt.profile.a)
but yeah, I agree it's possible to pass directly to llvm and linker without through clang. Then the full command line would be like

clang ... -mllvm -instrument-cold-function-coverage -mllvm -instrument-sample-cold-function-path=<file-path> -mllvm --pgo-function-entry-coverage 
ld.lld ... --u__llvm_runtime_variable  .../lib/x86_64-unknown-linux-gnu/libclang_rt.profile.a

So yes, adding the clang driver flag is kind of to mirror current IRPGO flags, for easy maintenance purpose(given that -fprofile-generate doesn't work with -fprofile-sample-use) and also to centralize the configuration for the convenience. IMO, the compiler_rt is probably the main blocker here, I didn't find an easy way to bundle it with a llvm flag.

Appreciate any further suggestions!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would -Wl,-lclang_rt.profile work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also to centralize the configuration for the convenience

+1 I think a frontend flag is useful. It allows us to identify incompatibilities early and provide clear error messages instead of more obscure failures down the line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would incompatibilities and ability to provide useful error messages be dependent on the flag being in the frontend? Is it that the backend can't actually reliably point the finger at the specific flag causing the conflict, so a message would be diluted to "sampling won't work with this"?

(probably a subject for a different forum tho)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would -Wl,-lclang_rt.profile work?

Got it, I think it should work, except it requires another linker flag: the --u__llvm_runtime_variable, we can configure it to linker too, but basically those instr PGO flags control addProfileRTLibs (which seems not equivalent to -Wl,-lclang_rt.profile), I just wanted to mirror those flags so that we don't need to maintain if anything changes to addProfileRTLibs in the future. (Though I feel this code should be very stable, should not be a big problem, so mainly for the convenience for compiler user to use one flag instead of using/maintaining multiple flags )
Overall, I agree that it's feasible to do it without clang flag. I'm not very familiar with the convention for adding driver flags, so if you think this doesn't justify it, I will drop it from this patch. Thanks for the discussion!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it that the backend can't actually reliably point the finger at the specific flag causing the conflict

Yes, if we know that this instrumentation mode should not be mixed with e.g. sanitizers or something else we can enforce these checks early. I don't see a particular downside to adding a frontend flag. The convenience of bundling the 2 lld flags and 3 mllvm flags into a single frontend flag seems like a good enough motivation to do so.

@snehasish
Copy link
Contributor

The major motivation is to detect dead functions for the services that are optimized with sampling PGO.

For your use case, can you use ProfileSymbolList to identify very cold functions (executed but unsampled) or are you indeed looking for functions that are never executed?

Will this be used to guide developers with diagnostics or more aggressive compiler driven optimizations?

@ellishg
Copy link
Contributor

ellishg commented Oct 1, 2024

Why not use the existing -pgo-function-entry-coverage (https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113/14?u=ellishg) LLVM flag? It takes advantage of the llvm.instrprof.cover intrinsic which has less size and runtime overhead than llvm.instrprof.increment. It also supports IRPGO and CSIRPGO while it seems that this PR requires a sample PGO input.

@wlei-llvm
Copy link
Contributor Author

For your use case, can you use ProfileSymbolList to identify very cold functions (executed but unsampled) or are you indeed looking for functions that are never executed?

We are indeed focusing on functions that are never executed. The ProfileSymbolList is already used to identify functions with a “0” entry count, which is the target of the instrumentation in this work. We then aim to further distinguish the dead functions from them.

Will this be used to guide developers with diagnostics or more aggressive compiler driven optimizations?

We haven’t considered to use them for any compiler optimization. The primary goal is to improve the developer experience, just to take the general benefits from removing dead code: improving readability, good codebase maintainability, reducing build time, etc. Another potential use might be to verify the sampling PGO profile coverage/quality, say to check if we missed any functions that are actually hot.

@wlei-llvm
Copy link
Contributor Author

Thanks for the context!

Why not use the existing -pgo-function-entry-coverage (https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113/14?u=ellishg) LLVM flag? It takes advantage of the llvm.instrprof.cover intrinsic which has less size and runtime overhead than llvm.instrprof.increment.

We do use the -pgo-function-entry-coverage in this PR, see here. but furthermore, we skip instrumenting the functions that are covered by sampling PGO profile.

It also supports IRPGO and CSIRPGO while it seems that this PR requires a sample PGO input.

Yeah, as I commented above, unfortunately currently the IRPGO main flag is incompatible with the Sampling PGO flag(also a bit diverged for the pipeline passes), that's why we have to add extra instr passes under sampling PGO pipeline and added a new driver flag.

@ellishg
Copy link
Contributor

ellishg commented Oct 1, 2024

Why not use the existing -pgo-function-entry-coverage (https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113/14?u=ellishg) LLVM flag? It takes advantage of the llvm.instrprof.cover intrinsic which has less size and runtime overhead than llvm.instrprof.increment.

We do use the -pgo-function-entry-coverage in this PR, see here. but furthermore, we skip instrumenting the functions that are covered by sampling PGO profile.

Oh, I missed that and your earlier comment. Makes more sense, thanks!

clang/test/CodeGen/pgo-cold-function-coverage.c Outdated Show resolved Hide resolved
clang/lib/Driver/ToolChains/Clang.cpp Outdated Show resolved Hide resolved
; RUN: opt < %s --passes=pgo-instr-gen -instrument-cold-function-coverage -S | FileCheck --check-prefixes=COLD %s
; RUN: opt < %s --passes=pgo-instr-gen -instrument-cold-function-coverage -cold-function-coverage-max-entry-count=1 -S | FileCheck --check-prefixes=ENTRY-COUNT %s

; COLD: call void @llvm.instrprof.increment(ptr @__profn_foo, i64 [[#]], i32 1, i32 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to support -instrument-cold-function-coverage without -pgo-function-entry-coverage? If not, I think we should add the flag here so we get the llvm.instrprof.cover intrinsic to more closely align with expected behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!(it should still work without -pgo-function-entry-coverage, but agree to test the most common case)

Comment on lines 1129 to 1139
if (InstrumentSampleColdFuncPath.getNumOccurrences() &&
Phase != ThinOrFullLTOPhase::ThinLTOPostLink) {
assert(!InstrumentSampleColdFuncPath.empty() &&
"File path is requeired for instrumentation generation");
InstrumentColdFunctionCoverage = true;
addPreInlinerPasses(MPM, Level, Phase);
addPGOInstrPasses(MPM, Level, /* RunProfileGen */ true,
/* IsCS */ false, /* AtomicCounterUpdate */ false,
InstrumentSampleColdFuncPath, "",
IntrusiveRefCntPtr<vfs::FileSystem>());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking at this a bit closer, I'm not sure why it needs to be tied to closely to SamplePGO. Couldn't we move this out of the LoadSampleProfile and move it after IsPGOInstrUse/IsCtxProfUse? That way we can use IRPGO, CSIRPGO, and SamplePGO profile counts to block instrumentation hot functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking at this a bit closer, I'm not sure why it needs to be tied to closely to SamplePGO. Couldn't we move this out of the LoadSampleProfile and move it after IsPGOInstrUse/IsCtxProfUse? That way we can use IRPGO, CSIRPGO, and SamplePGO profile counts to block instrumentation hot functions.

Ah, good point, moved them closer to other IRPGO passes. Thanks for the suggestion!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Now I think this should be independent of SamplePGO. Can we remove "sample" from the flag and variable names?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean the sample in InstrumentSampleColdFuncPathflag or all other variables(like IsSampleColdFuncCovGen)? I thought something like:

if (IsSampleColdFuncCovGen || IsXXXColdFuncCovGen) {
     addPGOInstrPasses(.., .., InstrumentColdFuncCoveragePath, .. )
}

InstrumentColdFuncCoveragePath would be a general flag that can be used for all cold function coverage case. but currently IsSampleColdFuncCovGen only represents for sampling PGO cold func. And If we want to extend it in future, then we can add more bool flag IsXXXColdFuncCovGen...

I also added an assertion:

  assert((InstrumentColdFuncCoveragePath.empty() || LoadSampleProfile) &&
         "Sampling-based cold functon coverage is not supported without "
         "providing sampling PGO profile");

Seems I have to remove that if we want it to be general. (just curious) do you plan to extend it for your case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how there would be any functional difference between using IsSampleColdFuncCovGen and IsXXXColdFuncCovGen, but I could be missing something. If we can have a single flag for all cases, I think that would be cleaner and I can also try to use it for my case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how there would be any functional difference between using IsSampleColdFuncCovGen and IsXXXColdFuncCovGen, but I could be missing something. If we can have a single flag for all cases, I think that would be cleaner and I can also try to use it for my case.

I see, sounds good.

My previous thought is: there could be functional difference for using the -instrument-cold-function-coverage flag with sampling PGO flag(-fprofile-sample-use) vs without sampling PGO flags. With the sampling PGO flag, if a function symbol shows in the binary but not in the profile, we can treat it as cold function(set 0 entry count), we then would instrument those function. But without sampling PGO flag(also no other PGO options), the entry count is unknown(-1), then we don't instrument them.

So user needs to be aware of the combination use of those flags, I was then trying to prevent the misuse, to disallow the use of --instrument-cold-function-coverage without (sampling) PGO use options(use the assertion in the version).

But on a second thought, if we want to extend for other PGO options, say if the profile annotation and instrumentation are not in the same pipeline, it's probably hard to check this locally, we can leave it to the build system.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a function symbol shows in the binary but not in the profile, we can treat it as cold function(set 0 entry count)

I wasn't aware that SamplePGO did this. This is indeed different than IRPGO. Perhaps it makes more sense to use a separate flag to control this case. Maybe something like -instrument-cold-function-coverage-mode={optimistic,conservative} where optimistic means we optimistically treat unprofiled functions as cold and conservative means we conservatively assume unprofiled functions is hot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, that's more flexible, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category llvm:transforms PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants