Skip to content

[IPO] Added attributor for identifying invariant loads #141800

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

zGoldthorpe
Copy link
Contributor

The attributor conservatively marks pointers whose loads are eligible to be marked as !invariant.load.
It does so by identifying:

  1. Pointers marked noalias and readonly
  2. Pointers whose underlying objects are all eligible for invariant loads.

The attributor then manifests this attribute at non-atomic non-volatile load instructions.

@llvmbot
Copy link
Member

llvmbot commented May 28, 2025

@llvm/pr-subscribers-llvm-transforms

Author: None (zGoldthorpe)

Changes

The attributor conservatively marks pointers whose loads are eligible to be marked as !invariant.load.
It does so by identifying:

  1. Pointers marked noalias and readonly
  2. Pointers whose underlying objects are all eligible for invariant loads.

The attributor then manifests this attribute at non-atomic non-volatile load instructions.


Patch is 23.62 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/141800.diff

5 Files Affected:

  • (modified) llvm/include/llvm/Transforms/IPO/Attributor.h (+38)
  • (modified) llvm/lib/Transforms/IPO/Attributor.cpp (+2)
  • (modified) llvm/lib/Transforms/IPO/AttributorAttributes.cpp (+245)
  • (modified) llvm/test/Transforms/Attributor/multiple-offsets-pointer-info.ll (+4-4)
  • (added) llvm/test/Transforms/Attributor/tag-invariant-loads.ll (+220)
diff --git a/llvm/include/llvm/Transforms/IPO/Attributor.h b/llvm/include/llvm/Transforms/IPO/Attributor.h
index c628bbb007230..53fa7a04dc5b5 100644
--- a/llvm/include/llvm/Transforms/IPO/Attributor.h
+++ b/llvm/include/llvm/Transforms/IPO/Attributor.h
@@ -6289,6 +6289,44 @@ struct AAUnderlyingObjects : AbstractAttribute {
                           AA::ValueScope Scope = AA::Interprocedural) const = 0;
 };
 
+/// An abstract interface for identifying pointers from which loads can be
+/// marked invariant.
+struct AAInvariantLoadPointer : public AbstractAttribute {
+  AAInvariantLoadPointer(const IRPosition &IRP) : AbstractAttribute(IRP) {}
+
+  /// See AbstractAttribute::isValidIRPositionForInit
+  static bool isValidIRPositionForInit(Attributor &A, const IRPosition &IRP) {
+    if (!IRP.getAssociatedType()->isPointerTy())
+      return false;
+    return AbstractAttribute::isValidIRPositionForInit(A, IRP);
+  }
+
+  /// Create an abstract attribute view for the position \p IRP.
+  static AAInvariantLoadPointer &createForPosition(const IRPosition &IRP,
+                                                   Attributor &A);
+
+  /// Return true if the pointer's contents are known to remain invariant.
+  virtual bool isKnownInvariant() const = 0;
+
+  /// Return true if the pointer's contents are assumed to remain invariant.
+  virtual bool isAssumedInvariant() const = 0;
+
+  /// See AbstractAttribute::getName().
+  StringRef getName() const override { return "AAInvariantLoadPointer"; }
+
+  /// See AbstractAttribute::getIdAddr().
+  const char *getIdAddr() const override { return &ID; }
+
+  /// This function should return true if the type of the \p AA is
+  /// AAInvariantLoadPointer
+  static bool classof(const AbstractAttribute *AA) {
+    return (AA->getIdAddr() == &ID);
+  }
+
+  /// Unique ID (due to the unique address).
+  static const char ID;
+};
+
 /// An abstract interface for address space information.
 struct AAAddressSpace : public StateWrapper<BooleanState, AbstractAttribute> {
   AAAddressSpace(const IRPosition &IRP, Attributor &A)
diff --git a/llvm/lib/Transforms/IPO/Attributor.cpp b/llvm/lib/Transforms/IPO/Attributor.cpp
index cbdbf9ae1494d..1dc576656d12a 100644
--- a/llvm/lib/Transforms/IPO/Attributor.cpp
+++ b/llvm/lib/Transforms/IPO/Attributor.cpp
@@ -3620,6 +3620,8 @@ void Attributor::identifyDefaultAbstractAttributes(Function &F) {
       if (SimplifyAllLoads)
         getAssumedSimplified(IRPosition::value(I), nullptr,
                              UsedAssumedInformation, AA::Intraprocedural);
+      getOrCreateAAFor<AAInvariantLoadPointer>(
+          IRPosition::value(*LI->getPointerOperand()));
       getOrCreateAAFor<AAAddressSpace>(
           IRPosition::value(*LI->getPointerOperand()));
     } else {
diff --git a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
index 470c5308edca4..f0647747d6c7f 100644
--- a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
+++ b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
@@ -191,6 +191,7 @@ PIPE_OPERATOR(AAInterFnReachability)
 PIPE_OPERATOR(AAPointerInfo)
 PIPE_OPERATOR(AAAssumptionInfo)
 PIPE_OPERATOR(AAUnderlyingObjects)
+PIPE_OPERATOR(AAInvariantLoadPointer)
 PIPE_OPERATOR(AAAddressSpace)
 PIPE_OPERATOR(AAAllocationInfo)
 PIPE_OPERATOR(AAIndirectCallInfo)
@@ -12534,6 +12535,248 @@ struct AAIndirectCallInfoCallSite : public AAIndirectCallInfo {
 };
 } // namespace
 
+/// --------------------- Invariant Load Pointer -------------------------------
+namespace {
+
+struct AAInvariantLoadPointerImpl
+    : public StateWrapper<BitIntegerState<uint8_t, 7>, AAInvariantLoadPointer,
+                          uint8_t> {
+  // load invariance is implied by, but not equivalent to IS_NOALIAS |
+  // IS_READONLY, as load invariance is also implied by all underlying objects
+  // being load invariant.
+  //
+  // IS_INVARIANT is set to indicate that the contents of the pointer are
+  // *known* to be invariant.
+  enum {
+    IS_INVARIANT = 1 << 0,
+    IS_NOALIAS = 1 << 1,
+    IS_READONLY = 1 << 2,
+  };
+  static_assert(getBestState() == (IS_INVARIANT | IS_NOALIAS | IS_READONLY),
+                "Unexpected best state!");
+
+  using Base = StateWrapper<BitIntegerState<uint8_t, 7>, AAInvariantLoadPointer,
+                            uint8_t>;
+
+  // the BitIntegerState is optimistic about noalias and readonly, but
+  // pessimistic about invariance
+  AAInvariantLoadPointerImpl(const IRPosition &IRP, Attributor &A)
+      : Base(IRP, IS_NOALIAS | IS_READONLY) {}
+
+  void initialize(Attributor &A) final {
+    // conservatively assume that the pointer's contents are not invariant,
+    // until proven otherwise.
+    removeAssumedBits(IS_INVARIANT);
+  }
+
+  bool isKnownInvariant() const final {
+    return isKnown(IS_INVARIANT) || isKnown(IS_NOALIAS | IS_READONLY);
+  }
+
+  bool isAssumedInvariant() const final {
+    return isAssumed(IS_INVARIANT) || isAssumed(IS_NOALIAS | IS_READONLY);
+  }
+
+  ChangeStatus updateImpl(Attributor &A) override {
+    if (isKnownInvariant())
+      return ChangeStatus::UNCHANGED;
+
+    ChangeStatus Changed = ChangeStatus::UNCHANGED;
+
+    Changed |= updateNoAlias(A);
+    Changed |= updateReadOnly(A);
+
+    bool UsedAssumedInformation = false;
+    const auto IsInvariantLoadIfPointer = [&](const Value &V) {
+      if (!V.getType()->isPointerTy())
+        return true;
+      const auto *IsInvariantLoadPointer =
+          A.getOrCreateAAFor<AAInvariantLoadPointer>(IRPosition::value(V), this,
+                                                     DepClassTy::REQUIRED);
+      if (IsInvariantLoadPointer->isKnownInvariant())
+        return true;
+      if (!IsInvariantLoadPointer->isAssumedInvariant())
+        return false;
+
+      UsedAssumedInformation = true;
+      return true;
+    };
+
+    const auto *AUO = A.getOrCreateAAFor<AAUnderlyingObjects>(
+        getIRPosition(), this, DepClassTy::REQUIRED);
+
+    if (!AUO->forallUnderlyingObjects(IsInvariantLoadIfPointer)) {
+      removeAssumedBits(IS_INVARIANT);
+      return ChangeStatus::CHANGED;
+    }
+
+    if (!UsedAssumedInformation) {
+      // pointer is known (not assumed) to be invariant
+      addKnownBits(IS_INVARIANT);
+      return ChangeStatus::CHANGED;
+    }
+
+    return Changed;
+  }
+
+  ChangeStatus manifest(Attributor &A) override {
+    if (!isKnownInvariant())
+      return ChangeStatus::UNCHANGED;
+
+    ChangeStatus Changed = ChangeStatus::UNCHANGED;
+    Value *Ptr = &getAssociatedValue();
+    const auto TagInvariantLoads = [&](const Use &U, bool &) {
+      if (U.get() != Ptr)
+        return true;
+      auto *I = dyn_cast<Instruction>(U.getUser());
+      if (!I)
+        return true;
+
+      // Ensure that we are only changing uses from the corresponding callgraph
+      // SSC in the case that the AA isn't run on the entire module
+      if (!A.isRunOn(I->getFunction()))
+        return true;
+
+      if (I->hasMetadata(LLVMContext::MD_invariant_load))
+        return true;
+
+      if (auto *LI = dyn_cast<LoadInst>(I)) {
+        if (LI->isVolatile() || LI->isAtomic())
+          return true;
+
+        LI->setMetadata(LLVMContext::MD_invariant_load,
+                        MDNode::get(LI->getContext(), {}));
+        Changed = ChangeStatus::CHANGED;
+      }
+      return true;
+    };
+
+    (void)A.checkForAllUses(TagInvariantLoads, *this, *Ptr);
+    return Changed;
+  }
+
+  /// See AbstractAttribute::getAsStr().
+  const std::string getAsStr(Attributor *) const override {
+    std::string Str;
+    raw_string_ostream OS(Str);
+    OS << "load invariant pointer: " << isKnown() << '\n';
+    return Str;
+  }
+
+  /// See AbstractAttribute::trackStatistics().
+  void trackStatistics() const override {}
+
+protected:
+  ChangeStatus updateNoAlias(Attributor &A) {
+    if (isKnown(IS_NOALIAS) || !isAssumed(IS_NOALIAS))
+      return ChangeStatus::UNCHANGED;
+
+    const auto *ANoAlias = A.getOrCreateAAFor<AANoAlias>(getIRPosition(), this,
+                                                         DepClassTy::REQUIRED);
+    if (!ANoAlias)
+      return tryInferNoAlias(A);
+
+    if (!ANoAlias->isAssumedNoAlias()) {
+      removeAssumedBits(IS_NOALIAS);
+      return ChangeStatus::CHANGED;
+    }
+    if (ANoAlias->isKnownNoAlias())
+      addKnownBits(IS_NOALIAS);
+
+    return ChangeStatus::UNCHANGED;
+  }
+
+  /// Fallback method if updateNoAlias fails to infer noalias information from
+  /// AANoAlias.
+  virtual ChangeStatus tryInferNoAlias(Attributor &A) {
+    return ChangeStatus::UNCHANGED;
+  }
+
+  ChangeStatus updateReadOnly(Attributor &A) {
+    if (isKnown(IS_READONLY) || !isAssumed(IS_READONLY))
+      return ChangeStatus::UNCHANGED;
+
+    // AAMemoryBehavior may crash if value is global
+    if (!getAssociatedFunction())
+      return tryInferReadOnly(A);
+
+    const auto *AMemoryBehavior = A.getOrCreateAAFor<AAMemoryBehavior>(
+        getIRPosition(), this, DepClassTy::REQUIRED);
+    if (!AMemoryBehavior)
+      return tryInferReadOnly(A);
+
+    if (!AMemoryBehavior->isAssumedReadOnly()) {
+      removeAssumedBits(IS_READONLY);
+      return ChangeStatus::CHANGED;
+    }
+    if (AMemoryBehavior->isKnownReadOnly())
+      addKnownBits(IS_READONLY);
+
+    return ChangeStatus::UNCHANGED;
+  }
+
+  /// Fallback method if updateReadOnly fails to infer readonly information from
+  /// AAMemoryBehavior.
+  virtual ChangeStatus tryInferReadOnly(Attributor &A) {
+    return ChangeStatus::UNCHANGED;
+  }
+};
+
+struct AAInvariantLoadPointerFloating final : AAInvariantLoadPointerImpl {
+  AAInvariantLoadPointerFloating(const IRPosition &IRP, Attributor &A)
+      : AAInvariantLoadPointerImpl(IRP, A) {}
+};
+
+struct AAInvariantLoadPointerReturned final : AAInvariantLoadPointerImpl {
+  AAInvariantLoadPointerReturned(const IRPosition &IRP, Attributor &A)
+      : AAInvariantLoadPointerImpl(IRP, A) {}
+};
+
+struct AAInvariantLoadPointerCallSiteReturned final
+    : AAInvariantLoadPointerImpl {
+  AAInvariantLoadPointerCallSiteReturned(const IRPosition &IRP, Attributor &A)
+      : AAInvariantLoadPointerImpl(IRP, A) {}
+};
+
+struct AAInvariantLoadPointerArgument final : AAInvariantLoadPointerImpl {
+  AAInvariantLoadPointerArgument(const IRPosition &IRP, Attributor &A)
+      : AAInvariantLoadPointerImpl(IRP, A) {}
+
+protected:
+  ChangeStatus tryInferNoAlias(Attributor &A) override {
+    const auto *Arg = getAssociatedArgument();
+    if (Arg->hasNoAliasAttr()) {
+      addKnownBits(IS_NOALIAS);
+      return ChangeStatus::UNCHANGED;
+    }
+
+    // noalias information is not provided, and cannot be inferred from
+    // AANoAlias
+    removeAssumedBits(IS_NOALIAS);
+    return ChangeStatus::CHANGED;
+  }
+
+  ChangeStatus tryInferReadOnly(Attributor &A) override {
+    const auto *Arg = getAssociatedArgument();
+    if (Arg->onlyReadsMemory()) {
+      addKnownBits(IS_READONLY);
+      return ChangeStatus::UNCHANGED;
+    }
+
+    // readonly information is not provided, and cannot be inferred from
+    // AAMemoryBehavior
+    removeAssumedBits(IS_READONLY);
+    return ChangeStatus::CHANGED;
+  }
+};
+
+struct AAInvariantLoadPointerCallSiteArgument final
+    : AAInvariantLoadPointerImpl {
+  AAInvariantLoadPointerCallSiteArgument(const IRPosition &IRP, Attributor &A)
+      : AAInvariantLoadPointerImpl(IRP, A) {}
+};
+} // namespace
+
 /// ------------------------ Address Space  ------------------------------------
 namespace {
 
@@ -13031,6 +13274,7 @@ const char AAInterFnReachability::ID = 0;
 const char AAPointerInfo::ID = 0;
 const char AAAssumptionInfo::ID = 0;
 const char AAUnderlyingObjects::ID = 0;
+const char AAInvariantLoadPointer::ID = 0;
 const char AAAddressSpace::ID = 0;
 const char AAAllocationInfo::ID = 0;
 const char AAIndirectCallInfo::ID = 0;
@@ -13165,6 +13409,7 @@ CREATE_VALUE_ABSTRACT_ATTRIBUTE_FOR_POSITION(AAPotentialValues)
 CREATE_VALUE_ABSTRACT_ATTRIBUTE_FOR_POSITION(AANoUndef)
 CREATE_VALUE_ABSTRACT_ATTRIBUTE_FOR_POSITION(AANoFPClass)
 CREATE_VALUE_ABSTRACT_ATTRIBUTE_FOR_POSITION(AAPointerInfo)
+CREATE_VALUE_ABSTRACT_ATTRIBUTE_FOR_POSITION(AAInvariantLoadPointer)
 CREATE_VALUE_ABSTRACT_ATTRIBUTE_FOR_POSITION(AAAddressSpace)
 CREATE_VALUE_ABSTRACT_ATTRIBUTE_FOR_POSITION(AAAllocationInfo)
 
diff --git a/llvm/test/Transforms/Attributor/multiple-offsets-pointer-info.ll b/llvm/test/Transforms/Attributor/multiple-offsets-pointer-info.ll
index f04ac4d73340f..9e58a35107491 100644
--- a/llvm/test/Transforms/Attributor/multiple-offsets-pointer-info.ll
+++ b/llvm/test/Transforms/Attributor/multiple-offsets-pointer-info.ll
@@ -10,7 +10,7 @@ define i8 @select_offsets_simplifiable_1(i1 %cnd1, i1 %cnd2) {
 ; CHECK-LABEL: define {{[^@]+}}@select_offsets_simplifiable_1
 ; CHECK-SAME: (i1 [[CND1:%.*]], i1 [[CND2:%.*]]) {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[BYTES:%.*]] = call ptr @calloc(i64 noundef 1024, i64 noundef 1)
+; CHECK-NEXT:    [[BYTES:%.*]] = call noalias ptr @calloc(i64 noundef 1024, i64 noundef 1)
 ; CHECK-NEXT:    [[GEP23:%.*]] = getelementptr inbounds [1024 x i8], ptr [[BYTES]], i64 0, i64 23
 ; CHECK-NEXT:    store i8 23, ptr [[GEP23]], align 4
 ; CHECK-NEXT:    [[GEP29:%.*]] = getelementptr inbounds [1024 x i8], ptr [[BYTES]], i64 0, i64 29
@@ -190,7 +190,7 @@ define i8 @select_offsets_not_simplifiable_3(i1 %cnd1, i1 %cnd2) {
 ; CHECK-LABEL: define {{[^@]+}}@select_offsets_not_simplifiable_3
 ; CHECK-SAME: (i1 [[CND1:%.*]], i1 [[CND2:%.*]]) {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[BYTES:%.*]] = call ptr @calloc(i64 noundef 1024, i64 noundef 1)
+; CHECK-NEXT:    [[BYTES:%.*]] = call noalias ptr @calloc(i64 noundef 1024, i64 noundef 1)
 ; CHECK-NEXT:    [[SEL0:%.*]] = select i1 [[CND1]], i64 23, i64 29
 ; CHECK-NEXT:    [[SEL1:%.*]] = select i1 [[CND2]], i64 [[SEL0]], i64 7
 ; CHECK-NEXT:    [[GEP_SEL:%.*]] = getelementptr inbounds [1024 x i8], ptr [[BYTES]], i64 0, i64 [[SEL1]]
@@ -214,7 +214,7 @@ define i8 @select_offsets_not_simplifiable_4(i1 %cnd1, i1 %cnd2) {
 ; CHECK-LABEL: define {{[^@]+}}@select_offsets_not_simplifiable_4
 ; CHECK-SAME: (i1 [[CND1:%.*]], i1 [[CND2:%.*]]) {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[BYTES:%.*]] = call ptr @calloc(i64 noundef 1024, i64 noundef 1)
+; CHECK-NEXT:    [[BYTES:%.*]] = call noalias ptr @calloc(i64 noundef 1024, i64 noundef 1)
 ; CHECK-NEXT:    [[SEL0:%.*]] = select i1 [[CND1]], i64 23, i64 29
 ; CHECK-NEXT:    [[SEL1:%.*]] = select i1 [[CND2]], i64 [[SEL0]], i64 7
 ; CHECK-NEXT:    [[GEP_SEL:%.*]] = getelementptr inbounds [1024 x i8], ptr [[BYTES]], i64 0, i64 [[SEL1]]
@@ -445,7 +445,7 @@ define i8 @phi_gep_not_simplifiable_2(i1 %cnd1, i1 %cnd2) {
 ; CHECK-LABEL: define {{[^@]+}}@phi_gep_not_simplifiable_2
 ; CHECK-SAME: (i1 [[CND1:%.*]], i1 [[CND2:%.*]]) {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[BYTES:%.*]] = call ptr @calloc(i64 noundef 1024, i64 noundef 1)
+; CHECK-NEXT:    [[BYTES:%.*]] = call noalias ptr @calloc(i64 noundef 1024, i64 noundef 1)
 ; CHECK-NEXT:    [[GEP23:%.*]] = getelementptr inbounds [1024 x i8], ptr [[BYTES]], i64 0, i64 23
 ; CHECK-NEXT:    br i1 [[CND1]], label [[THEN:%.*]], label [[ELSE:%.*]]
 ; CHECK:       then:
diff --git a/llvm/test/Transforms/Attributor/tag-invariant-loads.ll b/llvm/test/Transforms/Attributor/tag-invariant-loads.ll
new file mode 100644
index 0000000000000..6df07a0d68bee
--- /dev/null
+++ b/llvm/test/Transforms/Attributor/tag-invariant-loads.ll
@@ -0,0 +1,220 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes=attributor %s -S | FileCheck %s
+
+@G = global i32 zeroinitializer, align 4
+
+declare ptr @get_ptr()
+declare noalias ptr @get_noalias_ptr()
+
+define i32 @test_plain(ptr %ptr) {
+; CHECK-LABEL: define i32 @test_plain(
+; CHECK-SAME: ptr nofree noundef nonnull readonly align 4 captures(none) dereferenceable(4) [[PTR:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:    [[VAL:%.*]] = load i32, ptr [[PTR]], align 4
+; CHECK-NEXT:    ret i32 [[VAL]]
+;
+  %val = load i32, ptr %ptr, align 4
+  ret i32 %val
+}
+
+define i32 @test_noalias_ptr(ptr noalias %ptr) {
+; CHECK-LABEL: define i32 @test_noalias_ptr(
+; CHECK-SAME: ptr noalias nofree noundef nonnull readonly align 4 captures(none) dereferenceable(4) [[PTR:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[VAL:%.*]] = load i32, ptr [[PTR]], align 4, !invariant.load [[META0:![0-9]+]]
+; CHECK-NEXT:    ret i32 [[VAL]]
+;
+  %val = load i32, ptr %ptr, align 4
+  ret i32 %val
+}
+
+define i32 @test_swap(ptr noalias %ptr, i32 %write) {
+; CHECK-LABEL: define i32 @test_swap(
+; CHECK-SAME: ptr noalias nofree noundef nonnull align 4 captures(none) dereferenceable(4) [[PTR:%.*]], i32 [[WRITE:%.*]]) #[[ATTR1:[0-9]+]] {
+; CHECK-NEXT:    [[VAL:%.*]] = load i32, ptr [[PTR]], align 4
+; CHECK-NEXT:    store i32 [[WRITE]], ptr [[PTR]], align 4
+; CHECK-NEXT:    ret i32 [[VAL]]
+;
+  %val = load i32, ptr %ptr, align 4
+  store i32 %write, ptr %ptr, align 4
+  ret i32 %val
+}
+
+define i32 @test_volatile_load(ptr noalias %ptr) {
+; CHECK-LABEL: define i32 @test_volatile_load(
+; CHECK-SAME: ptr noalias nofree noundef align 4 [[PTR:%.*]]) #[[ATTR2:[0-9]+]] {
+; CHECK-NEXT:    [[VAL:%.*]] = load volatile i32, ptr [[PTR]], align 4
+; CHECK-NEXT:    ret i32 [[VAL]]
+;
+  %val = load volatile i32, ptr %ptr, align 4
+  ret i32 %val
+}
+
+define i32 @test_atomic_load(ptr noalias %ptr) {
+; CHECK-LABEL: define i32 @test_atomic_load(
+; CHECK-SAME: ptr noalias nofree noundef nonnull readonly align 4 captures(none) dereferenceable(4) [[PTR:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[VAL:%.*]] = load atomic i32, ptr [[PTR]] unordered, align 4
+; CHECK-NEXT:    ret i32 [[VAL]]
+;
+  %val = load atomic i32, ptr %ptr unordered, align 4
+  ret i32 %val
+}
+
+define i32 @test_atomic_volatile_load(ptr noalias %ptr) {
+; CHECK-LABEL: define i32 @test_atomic_volatile_load(
+; CHECK-SAME: ptr noalias nofree noundef align 4 [[PTR:%.*]]) #[[ATTR2]] {
+; CHECK-NEXT:    [[VAL:%.*]] = load atomic volatile i32, ptr [[PTR]] unordered, align 4
+; CHECK-NEXT:    ret i32 [[VAL]]
+;
+  %val = load atomic volatile i32, ptr %ptr unordered, align 4
+  ret i32 %val
+}
+
+define i32 @test_global() {
+; CHECK-LABEL: define i32 @test_global(
+; CHECK-SAME: ) #[[ATTR3:[0-9]+]] {
+; CHECK-NEXT:    [[VAL:%.*]] = load i32, ptr @G, align 4
+; CHECK-NEXT:    ret i32 [[VAL]]
+;
+  %val = load i32, ptr @G, align 4
+  ret i32 %val
+}
+
+define internal i32 @test_internal_noalias_load(ptr %ptr) {
+; CHECK-LABEL: define internal i32 @test_internal_noalias_load(
+; CHECK-SAME: ptr noalias nofree noundef nonnull readonly align 4 captures(none) dereferenceable(4) [[PTR:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[VAL:%.*]] = load i32, ptr [[PTR]], align 4, !invariant.load [[META0]]
+; CHECK-NEXT:    ret i32 [[VAL]]
+;
+  %val = load i32, ptr %ptr, align 4
+  ret i32 %val
+}
+
+define i32 @test_call_internal_noalias(ptr noalias %ptr) {
+; CHECK-LABEL: define i32 @test_call_internal_noalias(
+; CHECK-SAME: ptr noalias nofree readonly captures(none) [[PTR:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[VAL:%.*]] = call i32 @test_internal_noalias_load(ptr noalias nofree noundef readonly align 4 captures(none) [[PTR]]) #[[ATTR4:[0-9]+]]
+; CHECK-NEXT:    ret i32 [[VAL]]
+;
+  %val = call i32 @test_internal_noalias_load(ptr %ptr)
+  ret i32 %val
+}
+
+define internal i32 @test_internal_load(ptr %ptr) {
+; CHECK-LABEL: define internal i32 @test_internal_load(
+; CHECK-SAME: ptr nofree noundef nonnull readonly align 4 captures(none) dereferenceable(4) [[PTR:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[VAL:%.*]] = load i32, ptr [[PTR]], align 4
+; CHECK-NEXT:    ret i32 [[VAL]]
+;
+  %val = load i32, ptr %ptr, align 4
+  ret i32 %val
+}
+
+define i32 @test_call_internal(ptr %ptr) {
+; CHECK-LABEL: define i32 @test_call_internal(
+; CHECK-SAME: ptr nofree readonly captures(none) [[PTR:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[VAL:%.*]] = call i32 @test_internal_load(ptr nofree noundef readonly align 4 captures(none) [[PTR]]) #[[ATTR4]]
+; CHECK-NEXT:    ret i32 [[VAL]]
+;
+  %val = call i32 @test_internal_load(ptr %ptr)
+  ret i32 %val
+}
+
+define i32 @test_call_ptr() {
+; CHECK-LABEL: define i32 @test_call_ptr() {
+; CHECK-NEXT:    [[PTR:%.*]] = call ptr @get_ptr()
+; CHECK-NEXT:    [[VAL:%.*]] = load i32, ptr [[PTR]], align 4
+; CHECK-NEXT:    ret i32 [[VAL]]
+;
+  %ptr = call ptr @get_ptr()
+  %val = load i32, ptr %ptr, align 4
+  ret i32 %val
+}
+
+define i32 @test_call_noalias_ptr() ...
[truncated]

@shiltian shiltian requested review from arsenm and jdoerfert May 28, 2025 17:07
define i32 @test_noalias_ptr(ptr noalias %ptr) {
; CHECK-LABEL: define i32 @test_noalias_ptr(
; CHECK-SAME: ptr noalias nofree noundef nonnull readonly align 4 captures(none) dereferenceable(4) [[PTR:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: [[VAL:%.*]] = load i32, ptr [[PTR]], align 4, !invariant.load [[META0:![0-9]+]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know if this is true in general? My understanding is that !invariant.load means that the value doesn't change at any program point where the pointer is dereferencable, while readonly noalias only guarantees a lack of modification during the execution of the function.

That is, is this safe with respect to inlining?

Consider

define i1 @check(ptr noalias readonly nonnull noundef align 4 dereferencable(4) %ptr) {
   %v = load i32, ptr %p ; This can be !invariant.load by current analysis
   %r = cmpi eq i32 %v, 0
   ret i1 %r
}
define i1 @foo(ptr %p, i32 %x) {
  store i32 %x, ptr %p
  %v = call i1 @check(ptr %p)
  ret i1 %v
}

As far as I'm aware, if you inline @check into @foo, the !invariant.load can safely be hoisted above the store (or the store becomes UB and is eliminatable, or so on), thus defeating the purpose of @foo.

I think this analysis needs some information (which, I suspect, is triple-specific) about objects that have a program-wide lifetime (kernel arguments on GPUs, arguments to main(), etc.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right. I misunderstood the strength of the assumptions on an !invariant.load. I'm not sure if there is a useful way to salvage the attributor at this level of generality. It's probably best to close this PR and narrow the scope, as you suggest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only applies to !isCallableCC functions at most

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and if there are no atomic loads

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is readonly even available in the presence of atomic loads? Or at least in combination with noalias?

That is, I'm not sure define void @waitOnSpinlock(ptr readonly noalios %lock) is correct?

... That being said, yeah, this inference requires either !isCallableCC or that the address space of the value being read from is something like ptr addrspace(4) on AMDGPU where the data is guaranteed to be invariant (or .rodata on x86, if there's a representation for things that go there).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I went digging, and the atomics guid says

Alias analysis: Note that AA will return ModRef for anything Acquire or Release, and for the address accessed by any Monotonic operation.

So I'm pretty sure it's true than an acquire or stronger atomic - which is what you'd need for any sort of "wait for someone to modify this" - should block readonly

That is, it seems true that acquire semantics on a load (or stronger) make the loaded-from location not readonly in the LLVM sense.

And for weaker semantics - monotonic, say, - you're dealing with things like atomic counters where you can reorder things however you'd like so long as you don't lose writes.

That is, I claim that

define void @f1(ptr addrspace(1) readonly noalias noundef %flag, ...) {
  ; ..., no uses of %flag
spin:
  %can.proceed = load i1, ptr addrspace(1) %flag, acquire
  br i1 %can.proceed, label %cont, label %spin
cont:
  ; ..., more things, don't modify %flag
  ret void
}

is ill-formed and if the attributor adds the readonly flag to code like this it's wrong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, the attributor does add readonly to %flag

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readonly doesn't imply the memory doesn't change, it only states that the function does not write through a reference derived from this argument. I don't think the AA property does anything here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

So what we basically want is a per-pointer nosync analysis ... that is, "a reference derived is from this argument isn't used to synchronize concurrent execution"

(Checking for nosync on the kernel itself is too strong - it'll catch stuff like ds.bpermute)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have improved the analysis to detect effectful atomic loads from pointers, which should address these concerns.

@zGoldthorpe
Copy link
Contributor Author

zGoldthorpe commented May 28, 2025

I have tried addressing most of the feedback.
However, I haven't done anything regarding atomics, yet (besides never marking atomic loads as !invariant.load).

Assuming !isCallableCC, are nonatomic loads to readonly noalias pointers potentially non-invariant once there is also an atomic load to the same pointer in the same function?

"Side effects" include volatile loads and atomic loads that are at least
monotonic.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants