Commit 9e77f59
[LV] Account for vp_merge in out of loop EVL reductions in legacy cost model (#115903)
In #101641, support for out-of-loop reductions with EVL tail folding was added by transforming selects to vp_merges in transformRecipestoEVLRecipes.

Whereas the select was previously free, the vp_merge is not: it incurs a cost on RISC-V in the VPlan cost model. This diverged from the legacy cost model and triggered the "VPlan cost model and legacy cost model disagreed" assertion when building 502.gcc_r from SPEC CPU 2017.

Neither the select nor the vp_merge recipe from the VPlan exists in the underlying instructions, so I thought it would make the most sense to fix this by adding the cost to the underlying phi instruction in getInstructionCost.

It's worth noting that on RISC-V this vp_merge won't actually generate any instructions, because the mask is all true and it will be folded away. So we should update the cost model at some point to reflect that.
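To illustrate why the out-of-loop reduction needs a vp_merge at all, here is a minimal standalone sketch in plain C++ (not LLVM code). It assumes a hypothetical fixed lane count `VL` in place of a scalable VF, and the names `vp_merge`, `evl_sum`, `Vec` and `Mask` are illustrative only. The helper models the lane-wise behaviour of `llvm.vp.merge`: lanes below the EVL whose mask bit is set take the new value, and every other lane keeps the old accumulator, so the tail iteration does not corrupt inactive lanes.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical fixed lane count standing in for a scalable VF such as vscale x 4.
constexpr std::size_t VL = 4;

using Vec = std::array<int32_t, VL>;
using Mask = std::array<bool, VL>;

// Models llvm.vp.merge lane by lane: a lane below evl with its mask bit set
// takes on_true; every other lane takes on_false.
Vec vp_merge(const Mask &mask, const Vec &on_true, const Vec &on_false,
             std::size_t evl) {
  assert(evl <= VL);
  Vec result{};
  for (std::size_t i = 0; i < VL; ++i)
    result[i] = (i < evl && mask[i]) ? on_true[i] : on_false[i];
  return result;
}

// Sketch of an EVL tail-folded sum with an out-of-loop reduction: partial sums
// live in a vector accumulator and are reduced to a scalar only after the loop.
int32_t evl_sum(const std::vector<int32_t> &a, int32_t start) {
  Vec rdx{};       // widened reduction phi; lanes start at the identity 0
  rdx[0] = start;  // the start value is placed in lane 0
  Mask all_true;
  all_true.fill(true);

  for (std::size_t i = 0; i < a.size();) {
    std::size_t evl = std::min(VL, a.size() - i);  // active lanes this iteration

    Vec add;
    add.fill(-1);  // sentinel standing in for undefined inactive lanes
    for (std::size_t l = 0; l < evl; ++l)
      add[l] = rdx[l] + a[i + l];

    // Inactive lanes (>= evl) must keep the previous accumulator value.
    rdx = vp_merge(all_true, add, rdx, evl);
    i += evl;
  }
  // The reduction to a scalar happens once, outside the loop.
  return std::accumulate(rdx.begin(), rdx.end(), int32_t{0});
}
```

In this sketch the mask is always all true, as in the commit message, so only the EVL decides which lanes are updated. Without the merge, the sentinel in the inactive lanes of the last iteration would leak into the final sum.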
1 parent 9a730d8 commit 9e77f59

2 files changed: +35 -0 lines changed

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Lines changed: 10 additions & 0 deletions
@@ -6566,6 +6566,16 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I,
                                      CmpInst::BAD_ICMP_PREDICATE, CostKind);
     }
 
+    // When tail folding with EVL, if the phi is part of an out of loop
+    // reduction then it will be transformed into a wide vp_merge.
+    if (VF.isVector() && foldTailWithEVL() &&
+        Legal->getReductionVars().contains(Phi) && !isInLoopReduction(Phi)) {
+      IntrinsicCostAttributes ICA(
+          Intrinsic::vp_merge, ToVectorTy(Phi->getType(), VF),
+          {ToVectorTy(Type::getInt1Ty(Phi->getContext()), VF)});
+      return TTI.getIntrinsicInstrCost(ICA, CostKind);
+    }
+
     return TTI.getCFInstrCost(Instruction::PHI, CostKind);
   }
   case Instruction::UDiv:

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+; RUN: opt -passes=loop-vectorize -debug-only=loop-vectorize \
+; RUN: -force-tail-folding-style=data-with-evl \
+; RUN: -prefer-predicate-over-epilogue=predicate-dont-vectorize \
+; RUN: -mtriple=riscv64 -mattr=+v -S < %s 2>&1 | FileCheck %s
+
+; CHECK: Cost of 2 for VF vscale x 4: WIDEN-INTRINSIC vp<%{{.+}}> = call llvm.vp.merge(ir<true>, ir<%add>, ir<%rdx>, vp<%{{.+}}>)
+; CHECK: LV: Found an estimated cost of 2 for VF vscale x 4 For instruction: %rdx = phi i32 [ %start, %entry ], [ %add, %loop ]
+
+define i32 @add(ptr %a, i64 %n, i32 %start) {
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
+  %rdx = phi i32 [ %start, %entry ], [ %add, %loop ]
+  %arrayidx = getelementptr inbounds i32, ptr %a, i64 %iv
+  %0 = load i32, ptr %arrayidx, align 4
+  %add = add nsw i32 %0, %rdx
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, %n
+  br i1 %exitcond.not, label %exit, label %loop
+
+exit:
+  ret i32 %add
+}