Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Constrain use LiveMask by the operand's LaneMask for RP calculation. #111452

Closed
wants to merge 2 commits into from

Conversation

jrbyrnes
Copy link
Contributor

@jrbyrnes jrbyrnes commented Oct 7, 2024

For speculative RP queries, recede may calculate inaccurate masks for subreg uses. Previously, the calculation would look at any live lane for the use at the position of the MI in the LIS. This also adds lanes for any subregs which are live at but not used by the instruction. By constraining against the getSubRegIndexLaneMask for the operand's subreg, we are sure to not pick up on these extra lanes.

For current clients of recede, this is not an issue. This is because 1. the current clients do not violate the program order in the LIS, and 2. the change to RP is based on the difference between previous mask and new mask. Since current clients are not exposed to this issue, this patch is sort of NFC.

Co-authored-by: Valery Pykhtin [email protected]

…ulation.

CoAuthor: Valery Pykhtin <[email protected]>

For speculative RP queries, recede may calculate inaccurate masks for subreg uses. Previously, the calcaultion would look at any live lane for the use at the position of the MI in the LIS. This also adds lanes for any subregs which are live at but not used by the instruction. By constraining against the getSubRegIndexLaneMask for the operand's subreg, we are sure to not pick up on these extra lanes.

For current clients of recede, this is not an issue. This is because 1. the current clients do not violate the program order in the LIS, and 2. the change to RP is based on the difference between previous mask and new mask. Since current clients are not exposed to this issue, this patch is sort of NFC.

Change-Id: I63c788785b104ed88297d06e9b2b0dbc0bf6d040
@llvmbot
Copy link
Collaborator

llvmbot commented Oct 7, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Jeffrey Byrnes (jrbyrnes)

Changes

CoAuthor: Valery Pykhtin <[email protected]>

For speculative RP queries, recede may calculate inaccurate masks for subreg uses. Previously, the calcaultion would look at any live lane for the use at the position of the MI in the LIS. This also adds lanes for any subregs which are live at but not used by the instruction. By constraining against the getSubRegIndexLaneMask for the operand's subreg, we are sure to not pick up on these extra lanes.

For current clients of recede, this is not an issue. This is because 1. the current clients do not violate the program order in the LIS, and 2. the change to RP is based on the difference between previous mask and new mask. Since current clients are not exposed to this issue, this patch is sort of NFC.

Change-Id: I63c788785b104ed88297d06e9b2b0dbc0bf6d040


Full diff: https://github.com/llvm/llvm-project/pull/111452.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/GCNRegPressure.cpp (+31-21)
  • (modified) llvm/lib/Target/AMDGPU/GCNRegPressure.h (+5-4)
diff --git a/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp b/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
index cb0624f11592d2..29cf9fcc715bdd 100644
--- a/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
@@ -259,7 +259,8 @@ static void
 collectVirtualRegUses(SmallVectorImpl<RegisterMaskPair> &RegMaskPairs,
                       const MachineInstr &MI, const LiveIntervals &LIS,
                       const MachineRegisterInfo &MRI) {
-  SlotIndex InstrSI;
+
+  auto &TRI = *MRI.getTargetRegisterInfo();
   for (const auto &MO : MI.operands()) {
     if (!MO.isReg() || !MO.getReg().isVirtual())
       continue;
@@ -267,25 +268,31 @@ collectVirtualRegUses(SmallVectorImpl<RegisterMaskPair> &RegMaskPairs,
       continue;
 
     Register Reg = MO.getReg();
-    if (llvm::any_of(RegMaskPairs, [Reg](const RegisterMaskPair &RM) {
-          return RM.RegUnit == Reg;
-        }))
-      continue;
+    auto I = llvm::find_if(RegMaskPairs, [Reg](const RegisterMaskPair &RM) {
+      return RM.RegUnit == Reg;
+    });
+
+    auto &P = I == RegMaskPairs.end()
+                  ? RegMaskPairs.emplace_back(Reg, LaneBitmask::getNone())
+                  : *I;
 
-    LaneBitmask UseMask;
-    auto &LI = LIS.getInterval(Reg);
+    P.LaneMask |= MO.getSubReg() ? TRI.getSubRegIndexLaneMask(MO.getSubReg())
+                                 : MRI.getMaxLaneMaskForVReg(Reg);
+  }
+
+  SlotIndex InstrSI;
+  for (auto &P : RegMaskPairs) {
+    auto &LI = LIS.getInterval(P.RegUnit);
     if (!LI.hasSubRanges())
-      UseMask = MRI.getMaxLaneMaskForVReg(Reg);
-    else {
-      // For a tentative schedule LIS isn't updated yet but livemask should
-      // remain the same on any schedule. Subreg defs can be reordered but they
-      // all must dominate uses anyway.
-      if (!InstrSI)
-        InstrSI = LIS.getInstructionIndex(*MO.getParent()).getBaseIndex();
-      UseMask = getLiveLaneMask(LI, InstrSI, MRI);
-    }
+      continue;
+
+    // For a tentative schedule LIS isn't updated yet but livemask should
+    // remain the same on any schedule. Subreg defs can be reordered but they
+    // all must dominate uses anyway.
+    if (!InstrSI)
+      InstrSI = LIS.getInstructionIndex(MI).getBaseIndex();
 
-    RegMaskPairs.emplace_back(Reg, UseMask);
+    P.LaneMask = getLiveLaneMask(LI, InstrSI, MRI, P.LaneMask);
   }
 }
 
@@ -294,22 +301,25 @@ collectVirtualRegUses(SmallVectorImpl<RegisterMaskPair> &RegMaskPairs,
 
 LaneBitmask llvm::getLiveLaneMask(unsigned Reg, SlotIndex SI,
                                   const LiveIntervals &LIS,
-                                  const MachineRegisterInfo &MRI) {
-  return getLiveLaneMask(LIS.getInterval(Reg), SI, MRI);
+                                  const MachineRegisterInfo &MRI,
+                                  LaneBitmask Mask) {
+  return getLiveLaneMask(LIS.getInterval(Reg), SI, MRI, Mask);
 }
 
 LaneBitmask llvm::getLiveLaneMask(const LiveInterval &LI, SlotIndex SI,
-                                  const MachineRegisterInfo &MRI) {
+                                  const MachineRegisterInfo &MRI,
+                                  LaneBitmask Mask) {
   LaneBitmask LiveMask;
   if (LI.hasSubRanges()) {
     for (const auto &S : LI.subranges())
-      if (S.liveAt(SI)) {
+      if ((S.LaneMask & Mask).any() && S.liveAt(SI)) {
         LiveMask |= S.LaneMask;
         assert(LiveMask == (LiveMask & MRI.getMaxLaneMaskForVReg(LI.reg())));
       }
   } else if (LI.liveAt(SI)) {
     LiveMask = MRI.getMaxLaneMaskForVReg(LI.reg());
   }
+  LiveMask &= Mask;
   return LiveMask;
 }
 
diff --git a/llvm/lib/Target/AMDGPU/GCNRegPressure.h b/llvm/lib/Target/AMDGPU/GCNRegPressure.h
index 54dc1972d27619..e71d6bb4becf8c 100644
--- a/llvm/lib/Target/AMDGPU/GCNRegPressure.h
+++ b/llvm/lib/Target/AMDGPU/GCNRegPressure.h
@@ -265,13 +265,14 @@ class GCNDownwardRPTracker : public GCNRPTracker {
                const LiveRegSet *LiveRegsCopy = nullptr);
 };
 
-LaneBitmask getLiveLaneMask(unsigned Reg,
-                            SlotIndex SI,
+LaneBitmask getLiveLaneMask(unsigned Reg, SlotIndex SI,
                             const LiveIntervals &LIS,
-                            const MachineRegisterInfo &MRI);
+                            const MachineRegisterInfo &MRI,
+                            LaneBitmask Mask = LaneBitmask::getAll());
 
 LaneBitmask getLiveLaneMask(const LiveInterval &LI, SlotIndex SI,
-                            const MachineRegisterInfo &MRI);
+                            const MachineRegisterInfo &MRI,
+                            LaneBitmask Mask = LaneBitmask::getAll());
 
 GCNRPTracker::LiveRegSet getLiveRegs(SlotIndex SI, const LiveIntervals &LIS,
                                      const MachineRegisterInfo &MRI);

@vpykhtin
Copy link
Contributor

vpykhtin commented Oct 8, 2024

I'm ok with this change, assuming Matt's comments are addressed.

Change-Id: Ibf989b53555fe686693ca01d2e63f64aed58ded7
@jrbyrnes
Copy link
Contributor Author

jrbyrnes commented Oct 8, 2024

Thanks for review

5cb6b15

@jrbyrnes jrbyrnes closed this Oct 8, 2024
jrbyrnes referenced this pull request Oct 8, 2024
…ulation.

For speculative RP queries, recede may calculate inaccurate masks for subreg uses. Previously, the calculation would look at any live lane for the use at the position of the MI in the LIS. This also adds lanes for any subregs which are live at but not used by the instruction. By constraining against the getSubRegIndexLaneMask for the operand's subreg, we are sure to not pick up on these extra lanes.

For current clients of recede, this is not an issue. This is because 1. the current clients do not violate the program order in the LIS, and 2. the change to RP is based on the difference between previous mask and new mask. Since current clients are not exposed to this issue, this patch is sort of NFC.

Co-authored-by: Valery Pykhtin [email protected]
Change-Id: Iaed80271226b2587297e6fb78fe081afec1a9275
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants