Standards conformance
@@ -1183,7 +1190,7 @@ Mcuda
-## Notes
+## Notes
**Standards conformance:**
@@ -1290,7 +1297,7 @@ GNU is the only compiler with options governing the use of non-standard intrinsics
**Warn for bad call checking**: This Cray option ("-eb") issues a warning message rather than an error message when the compiler detects a call to a procedure with one or more dummy arguments having the TARGET, VOLATILE or ASYNCHRONOUS attribute and there is not an explicit interface definition.
-## Notes
+## Appendix
### What is and is not included
diff --git a/flang/documentation/ParserCombinators.md b/flang/docs/ParserCombinators.md
similarity index 98%
rename from flang/documentation/ParserCombinators.md
rename to flang/docs/ParserCombinators.md
index 47ad18f023f95..9878589438450 100644
--- a/flang/documentation/ParserCombinators.md
+++ b/flang/docs/ParserCombinators.md
@@ -1,4 +1,4 @@
-
+# Parser Combinators
+
+```eval_rst
+.. contents::
+ :local:
+```
+
+This document is a primer on Parser Combinators and their use in Flang.
+
## Concept
The Fortran language recognizer here can be classified as an LL recursive
descent parser. It is composed from a *parser combinator* library that
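As background for readers new to the technique, here is a minimal, self-contained C++17 illustration of the combinator idea. The names `Parser`, `ch`, and `seq` are invented for this sketch; Flang's actual library defines many more primitives and operates on a richer parse state:

```cpp
#include <cstddef>
#include <functional>
#include <iostream>
#include <optional>
#include <string>
#include <utility>

// A parser takes the input and a position; on success it yields a value
// and the position just past what it consumed.
template <typename T>
using Parser = std::function<std::optional<std::pair<T, std::size_t>>(
    const std::string &, std::size_t)>;

// Fundamental parser: recognize one expected character.
Parser<char> ch(char expected) {
  return [expected](const std::string &s, std::size_t pos)
             -> std::optional<std::pair<char, std::size_t>> {
    if (pos < s.size() && s[pos] == expected)
      return std::make_pair(expected, pos + 1);
    return std::nullopt;
  };
}

// Combinator: run parser 'a', then parser 'b', pairing their results.
template <typename A, typename B>
Parser<std::pair<A, B>> seq(Parser<A> a, Parser<B> b) {
  return [a, b](const std::string &s, std::size_t pos)
             -> std::optional<std::pair<std::pair<A, B>, std::size_t>> {
    auto ra = a(s, pos);
    if (!ra)
      return std::nullopt;
    auto rb = b(s, ra->second);
    if (!rb)
      return std::nullopt;
    return std::make_pair(std::make_pair(ra->first, rb->first), rb->second);
  };
}

int main() {
  auto ab = seq(ch('a'), ch('b')); // recognizes "ab"
  std::cout << (ab("abc", 0) ? "match\n" : "no match\n");
}
```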
diff --git a/flang/documentation/Parsing.md b/flang/docs/Parsing.md
similarity index 97%
rename from flang/documentation/Parsing.md
rename to flang/docs/Parsing.md
index b961cd630ae18..dec63e6fbdab4 100644
--- a/flang/documentation/Parsing.md
+++ b/flang/docs/Parsing.md
@@ -1,4 +1,4 @@
-
-The F18 Parser
-==============
+# The F18 Parser
+
+```eval_rst
+.. contents::
+ :local:
+```
+
This program source code implements a parser for the Fortran programming
language.
@@ -42,8 +47,8 @@ source file and receive its parse tree and error messages. The interfaces
of the Parsing class correspond to the two major passes of the parser,
which are described below.
-Prescanning and Preprocessing
------------------------------
+## Prescanning and Preprocessing
+
The first pass is performed by an instance of the Prescanner class,
with help from an instance of Preprocessor.
@@ -100,8 +105,8 @@ The content of the cooked character stream is available and useful
for debugging, being as it is a simple value forwarded from the first major
pass of the compiler to the second.
-Source Provenance
------------------
+## Source Provenance
+
The prescanner constructs a chronicle of every file that is read by the
parser, viz. the original source file and all others that it directly
or indirectly includes. One copy of the content of each of these files
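A minimal sketch of the core idea, an interval map from cooked-stream offsets back to the originating file and offset, is shown below. All names here are invented for illustration; Flang's real provenance machinery also models macro expansion and include chains.

```cpp
#include <cstddef>
#include <iostream>
#include <map>
#include <string>

// Each contiguous chunk of the cooked character stream remembers which
// file it came from and where in that file it starts.
struct Origin {
  std::string file;
  std::size_t fileOffset;
};

class ProvenanceMap {
  // Key: starting offset of a chunk within the cooked stream.
  std::map<std::size_t, Origin> chunks_;

public:
  void addChunk(std::size_t cookedStart, Origin origin) {
    chunks_.emplace(cookedStart, std::move(origin));
  }

  // Map a cooked-stream offset back to a file and position therein.
  Origin locate(std::size_t cookedOffset) const {
    auto it = chunks_.upper_bound(cookedOffset);
    --it; // the chunk whose start is <= cookedOffset
    Origin o = it->second;
    o.fileOffset += cookedOffset - it->first;
    return o;
  }
};

int main() {
  ProvenanceMap map;
  map.addChunk(0, {"main.f90", 0});
  map.addChunk(100, {"incl.h", 0}); // an included file begins at offset 100
  Origin o = map.locate(105);
  std::cout << o.file << " @ " << o.fileOffset << "\n"; // incl.h @ 5
}
```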
@@ -124,8 +129,8 @@ Simple `const char *` pointers to characters in the cooked character
stream, or to contiguous ranges thereof, are used as source position
indicators within the parser and in the parse tree.
-Messages
---------
+## Messages
+
Message texts, and snprintf-like formatting strings for constructing
messages, are instantiated in the various components of the parser with
C++ user defined character literals tagged with `_err_en_US` and `_en_US`
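The `_err_en_US`/`_en_US` tagging is built on C++ user-defined literals. A self-contained sketch of the technique (the type and fields below are simplified stand-ins, not Flang's actual message classes):

```cpp
#include <cstddef>
#include <iostream>
#include <string>

// A fixed message text, tagged at the point of definition with its
// severity and (implicitly, by suffix choice) its dialect of English.
struct MessageFixedText {
  const char *text;
  std::size_t length;
  bool isError;
};

constexpr MessageFixedText operator""_err_en_US(const char *s, std::size_t n) {
  return {s, n, /*isError=*/true};
}
constexpr MessageFixedText operator""_en_US(const char *s, std::size_t n) {
  return {s, n, /*isError=*/false};
}

int main() {
  constexpr auto msg = "expected end of statement"_err_en_US;
  std::cout << (msg.isError ? "error: " : "note: ")
            << std::string(msg.text, msg.length) << "\n";
}
```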
@@ -134,8 +139,8 @@ English used in the United States) so that they may be easily identified
for localization. As described above, messages are associated with
source code positions by means of provenance values.
-The Parse Tree
---------------
+## The Parse Tree
+
Each of the ca. 450 numbered requirement productions in the standard
Fortran language grammar, as well as the productions implied by legacy
extensions and preserved obsolescent features, maps to a distinct class
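For illustration, a toy version of that mapping might look like the sketch below; the class and production names are invented, and the real parse-tree classes carry additional boilerplate such as source ranges:

```cpp
#include <string>
#include <tuple>
#include <variant>

// One small class per production: std::variant members for productions
// that are alternatives, std::tuple members for sequences.
struct IntLiteralConstant {
  std::string digits;
};
struct CharLiteralConstant {
  std::string value;
};

// literal-constant -> int-literal-constant | char-literal-constant
struct LiteralConstant {
  std::variant<IntLiteralConstant, CharLiteralConstant> u;
};

// signed-int-literal-constant -> [sign] int-literal-constant
struct SignedIntLiteralConstant {
  std::tuple<std::string, IntLiteralConstant> t;
};

int main() {
  LiteralConstant lc{IntLiteralConstant{"42"}};
  return lc.u.index() == 0 ? 0 : 1;
}
```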
@@ -174,8 +179,8 @@ stability of pointers into these lists.
There is a general purpose library by means of which parse trees may
be traversed.
-Parsing
--------
+## Parsing
+
This compiler attempts to recognize the entire cooked character stream
(see above) as a Fortran program. It records the reductions made during
a successful recognition as a parse tree value. The recognized grammar
@@ -203,8 +208,8 @@ of "parser combinator" template functions that compose them to form more
complicated recognizers and their correspondences to the construction
of parse tree values.
-Unparsing
----------
+## Unparsing
+
Parse trees can be converted back into free form Fortran source code.
This formatter is not really a classical "pretty printer", but is
more of a data structure dump whose output is suitable for compilation
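As a rough, self-contained illustration of what such a dump amounts to (reusing the toy node types from the earlier sketch; the real unparser is driven by a generic tree-visiting library and handles formatting details):

```cpp
#include <iostream>
#include <string>
#include <variant>

// Toy node types, as in the parse-tree sketch above.
struct IntLiteralConstant { std::string digits; };
struct CharLiteralConstant { std::string value; };
struct LiteralConstant {
  std::variant<IntLiteralConstant, CharLiteralConstant> u;
};

// Unparsing: recursively emit source text from tree values.
void Unparse(std::ostream &os, const IntLiteralConstant &x) { os << x.digits; }
void Unparse(std::ostream &os, const CharLiteralConstant &x) {
  os << '\'' << x.value << '\'';
}
void Unparse(std::ostream &os, const LiteralConstant &x) {
  std::visit([&os](const auto &alt) { Unparse(os, alt); }, x.u);
}

int main() {
  LiteralConstant lc{CharLiteralConstant{"hello"}};
  Unparse(std::cout, lc); // prints 'hello'
  std::cout << "\n";
}
```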
diff --git a/flang/documentation/Preprocessing.md b/flang/docs/Preprocessing.md
similarity index 94%
rename from flang/documentation/Preprocessing.md
rename to flang/docs/Preprocessing.md
index eff3f921e43c5..3c6984cfa2fd0 100644
--- a/flang/documentation/Preprocessing.md
+++ b/flang/docs/Preprocessing.md
@@ -1,4 +1,4 @@
-
-Fortran Preprocessing
-=====================
+# Fortran Preprocessing
+
+```eval_rst
+.. contents::
+ :local:
+```
+
-Behavior common to (nearly) all compilers:
-------------------------------------------
+## Behavior common to (nearly) all compilers:
+
* Macro and argument names are sensitive to case.
* Fixed form right margin clipping after column 72 (or 132)
has precedence over macro name recognition, and also over
@@ -39,9 +43,8 @@ Behavior common to (nearly) all compilers:
* A `#define` directive intermixed with continuation lines can't
define a macro that's invoked earlier in the same continued statement.
-Behavior that is not consistent over all extant compilers but which
-probably should be uncontroversial:
------------------------------------
+## Behavior that is not consistent over all extant compilers but which probably should be uncontroversial:
+
* Invoked macro names can straddle a Fortran line continuation.
* ... unless implicit fixed form card padding intervenes; i.e.,
in fixed form, a continued macro name has to be split at column
@@ -65,8 +68,8 @@ probably should be uncontroversial:
directive indicator.
* `#define KWM !` allows KWM to signal a comment.
-Judgement calls, where precedents are unclear:
-----------------------------------------------
+## Judgement calls, where precedents are unclear:
+
* Expressions in `#if` and `#elif` should support both Fortran and C
operators; e.g., `#if 2 .LT. 3` should work.
* If a function-like macro does not close its parentheses, line
@@ -84,16 +87,16 @@ Judgement calls, where precedents are unclear:
lines, it may or may not affect text in the continued statement that
appeared before the directive.
-Behavior that few compilers properly support (or none), but should:
--------------------------------------------------------------------
+## Behavior that few compilers properly support (or none), but should:
+
* A macro invocation can straddle free form continuation lines in all of their
forms, with continuation allowed in the name, before the arguments, and
within the arguments.
* Directives can be capitalized in free form, too.
* `__VA_ARGS__` and `__VA_OPT__` work in variadic function-like macros.
-In short, a Fortran preprocessor should work as if:
----------------------------------------------------
+## In short, a Fortran preprocessor should work as if:
+
1. Fixed form lines are padded up to column 72 (or 132) and clipped thereafter.
2. Fortran comments are removed.
3. C-style line continuations are processed in preprocessing directives.
@@ -125,8 +128,7 @@ text.
OpenMP-style directives that look like comments are not addressed by
this scheme but are obvious extensions.
-Appendix
-========
+## Appendix
`N` in the table below means "not supported"; this doesn't
mean a bug, it just means that a particular behavior was
not observed.
diff --git a/flang/documentation/RuntimeDescriptor.md b/flang/docs/RuntimeDescriptor.md
similarity index 95%
rename from flang/documentation/RuntimeDescriptor.md
rename to flang/docs/RuntimeDescriptor.md
index 9a43fa9b46e02..b253c153f61ec 100644
--- a/flang/documentation/RuntimeDescriptor.md
+++ b/flang/docs/RuntimeDescriptor.md
@@ -1,4 +1,4 @@
-
+# Runtime Descriptors
+
+```eval_rst
+.. contents::
+ :local:
+```
+
## Concept
The properties that characterize data values and objects in Fortran
programs must sometimes be materialized when the program runs.
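As a rough sketch of what "materialized" properties can look like at run time (all field names invented; the actual flang runtime descriptor is richer and is laid out to interoperate with the C descriptor of ISO_Fortran_binding.h):

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>

// Per-dimension bounds and stride of a Fortran array.
struct Dimension {
  std::int64_t lowerBound;
  std::int64_t extent;
  std::int64_t byteStride;
};

// A minimal rank-2 array descriptor: base address, element size, rank,
// and per-dimension information.
struct Descriptor {
  void *baseAddr;
  std::size_t elementBytes;
  int rank;
  Dimension dim[2]; // enough for this example

  // Address of element (i, j), using Fortran lower-bound-relative
  // subscripts.
  void *element(std::int64_t i, std::int64_t j) const {
    auto *p = static_cast<char *>(baseAddr);
    p += (i - dim[0].lowerBound) * dim[0].byteStride;
    p += (j - dim[1].lowerBound) * dim[1].byteStride;
    return p;
  }
};

int main() {
  double data[3][4] = {}; // A(4,3) in Fortran column-major terms
  Descriptor d{data, sizeof(double), 2,
               {{1, 4, sizeof(double)}, {1, 3, 4 * sizeof(double)}}};
  *static_cast<double *>(d.element(2, 1)) = 3.14;
  std::cout << data[0][1] << "\n"; // 3.14
}
```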
diff --git a/flang/documentation/Semantics.md b/flang/docs/Semantics.md
similarity index 99%
rename from flang/documentation/Semantics.md
rename to flang/docs/Semantics.md
index 3f185f9f52b8a..361426c936c24 100644
--- a/flang/documentation/Semantics.md
+++ b/flang/docs/Semantics.md
@@ -1,4 +1,4 @@
-
+# Semantics
+
diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -3965,11 +3968,5 @@ static Value *simplifySelectWithICmpCond(Value *CondVal, Value *TrueVal,
// (ShAmt == 0) ? fshl(X, *, ShAmt) : X --> X
// (ShAmt == 0) ? fshr(*, X, ShAmt) : X --> X
- if (match(TrueVal, isFsh) && FalseVal == X && CmpLHS == ShAmt &&
- Pred == ICmpInst::ICMP_EQ)
- return X;
- // (ShAmt != 0) ? X : fshl(X, *, ShAmt) --> X
- // (ShAmt != 0) ? X : fshr(*, X, ShAmt) --> X
- if (match(FalseVal, isFsh) && TrueVal == X && CmpLHS == ShAmt &&
- Pred == ICmpInst::ICMP_NE)
+ if (match(TrueVal, isFsh) && FalseVal == X && CmpLHS == ShAmt)
return X;
// Test for a zero-shift-guard-op around rotates. These are used to
@@ -4004,11 +4007,6 @@ static Value *simplifySelectWithICmpCond(Value *CondVal, Value *TrueVal,
m_Intrinsic<Intrinsic::fshr>(m_Value(X),
m_Deferred(X),
m_Value(ShAmt)));
- // (ShAmt != 0) ? fshl(X, X, ShAmt) : X --> fshl(X, X, ShAmt)
- // (ShAmt != 0) ? fshr(X, X, ShAmt) : X --> fshr(X, X, ShAmt)
- if (match(TrueVal, isRotate) && FalseVal == X && CmpLHS == ShAmt &&
- Pred == ICmpInst::ICMP_NE)
- return TrueVal;
// (ShAmt == 0) ? X : fshl(X, X, ShAmt) --> fshl(X, X, ShAmt)
// (ShAmt == 0) ? X : fshr(X, X, ShAmt) --> fshr(X, X, ShAmt)
if (match(FalseVal, isRotate) && TrueVal == X && CmpLHS == ShAmt &&
@@ -4025,27 +4023,20 @@ static Value *simplifySelectWithICmpCond(Value *CondVal, Value *TrueVal,
// arms of the select. See if substituting this value into the arm and
// simplifying the result yields the same value as the other arm.
if (Pred == ICmpInst::ICMP_EQ) {
- if (SimplifyWithOpReplaced(FalseVal, CmpLHS, CmpRHS, Q, MaxRecurse) ==
+ if (SimplifyWithOpReplaced(FalseVal, CmpLHS, CmpRHS, Q,
+ /* AllowRefinement */ false, MaxRecurse) ==
TrueVal ||
- SimplifyWithOpReplaced(FalseVal, CmpRHS, CmpLHS, Q, MaxRecurse) ==
+ SimplifyWithOpReplaced(FalseVal, CmpRHS, CmpLHS, Q,
+ /* AllowRefinement */ false, MaxRecurse) ==
TrueVal)
return FalseVal;
- if (SimplifyWithOpReplaced(TrueVal, CmpLHS, CmpRHS, Q, MaxRecurse) ==
+ if (SimplifyWithOpReplaced(TrueVal, CmpLHS, CmpRHS, Q,
+ /* AllowRefinement */ true, MaxRecurse) ==
FalseVal ||
- SimplifyWithOpReplaced(TrueVal, CmpRHS, CmpLHS, Q, MaxRecurse) ==
+ SimplifyWithOpReplaced(TrueVal, CmpRHS, CmpLHS, Q,
+ /* AllowRefinement */ true, MaxRecurse) ==
FalseVal)
return FalseVal;
- } else if (Pred == ICmpInst::ICMP_NE) {
- if (SimplifyWithOpReplaced(TrueVal, CmpLHS, CmpRHS, Q, MaxRecurse) ==
- FalseVal ||
- SimplifyWithOpReplaced(TrueVal, CmpRHS, CmpLHS, Q, MaxRecurse) ==
- FalseVal)
- return TrueVal;
- if (SimplifyWithOpReplaced(FalseVal, CmpLHS, CmpRHS, Q, MaxRecurse) ==
- TrueVal ||
- SimplifyWithOpReplaced(FalseVal, CmpRHS, CmpLHS, Q, MaxRecurse) ==
- TrueVal)
- return TrueVal;
}
return nullptr;
diff --git a/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp b/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
index 3629dbff102c3..39069e24e0612 100644
--- a/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
@@ -1592,12 +1592,16 @@ TypeIndex CodeViewDebug::lowerTypeArray(const DICompositeType *Ty) {
assert(Element->getTag() == dwarf::DW_TAG_subrange_type);
const DISubrange *Subrange = cast<DISubrange>(Element);
- // This ought to allow `lowerBound: 0`, https://bugs.llvm.org/show_bug.cgi?id=47287
- // assert(!Subrange->getRawLowerBound() &&
- // "codeview doesn't support subranges with lower bounds");
int64_t Count = -1;
- if (auto *CI = Subrange->getCount().dyn_cast<ConstantInt *>())
- Count = CI->getSExtValue();
+ // Calculate the count if either LowerBound is absent or is zero and
+ // either of Count or UpperBound are constant.
+ auto *LI = Subrange->getLowerBound().dyn_cast<ConstantInt *>();
+ if (!Subrange->getRawLowerBound() || (LI && (LI->getSExtValue() == 0))) {
+ if (auto *CI = Subrange->getCount().dyn_cast<ConstantInt *>())
+ Count = CI->getSExtValue();
+ else if (auto *UI = Subrange->getUpperBound().dyn_cast<ConstantInt *>())
+ Count = UI->getSExtValue() + 1; // LowerBound is zero
+ }
// Forward declarations of arrays without a size and VLAs use a count of -1.
// Emit a count of zero in these cases to match what MSVC does for arrays
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
index e958f38e486b0..ceeae14c10738 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
@@ -1417,8 +1417,10 @@ static bool hasVectorBeenPadded(const DICompositeType *CTy) {
Elements[0]->getTag() == dwarf::DW_TAG_subrange_type &&
"Invalid vector element array, expected one element of type subrange");
const auto Subrange = cast<DISubrange>(Elements[0]);
- const auto CI = Subrange->getCount().get<ConstantInt *>();
- const int32_t NumVecElements = CI->getSExtValue();
+ const auto NumVecElements =
+ Subrange->getCount()
+ ? Subrange->getCount().get<ConstantInt *>()->getSExtValue()
+ : 0;
// Ensure we found the element count and that the actual size is wide
// enough to contain the requested size.
diff --git a/llvm/lib/CodeGen/MachineCopyPropagation.cpp b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
index 70d6dcc2e3e29..4c4839ca65229 100644
--- a/llvm/lib/CodeGen/MachineCopyPropagation.cpp
+++ b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
@@ -336,10 +336,8 @@ static bool isNopCopy(const MachineInstr &PreviousCopy, unsigned Src,
unsigned Def, const TargetRegisterInfo *TRI) {
Register PreviousSrc = PreviousCopy.getOperand(1).getReg();
Register PreviousDef = PreviousCopy.getOperand(0).getReg();
- if (Src == PreviousSrc) {
- assert(Def == PreviousDef);
+ if (Src == PreviousSrc && Def == PreviousDef)
return true;
- }
if (!TRI->isSubRegister(PreviousSrc, Src))
return false;
unsigned SubIdx = TRI->getSubRegIndex(PreviousSrc, Src);
diff --git a/llvm/lib/CodeGen/PHIEliminationUtils.cpp b/llvm/lib/CodeGen/PHIEliminationUtils.cpp
index bae96eb84521a..2a72717e711dc 100644
--- a/llvm/lib/CodeGen/PHIEliminationUtils.cpp
+++ b/llvm/lib/CodeGen/PHIEliminationUtils.cpp
@@ -27,31 +27,35 @@ llvm::findPHICopyInsertPoint(MachineBasicBlock* MBB, MachineBasicBlock* SuccMBB,
// Usually, we just want to insert the copy before the first terminator
// instruction. However, for the edge going to a landing pad, we must insert
// the copy before the call/invoke instruction. Similarly for an INLINEASM_BR
- // going to an indirect target.
- if (!SuccMBB->isEHPad() && !SuccMBB->isInlineAsmBrIndirectTarget())
+ // going to an indirect target. This is similar to SplitKit.cpp's
+ // computeLastInsertPoint, and similarly assumes that there cannot be multiple
+ // instructions that are Calls with EHPad successors or INLINEASM_BR in a
+ // block.
+ bool EHPadSuccessor = SuccMBB->isEHPad();
+ if (!EHPadSuccessor && !SuccMBB->isInlineAsmBrIndirectTarget())
return MBB->getFirstTerminator();
- // Discover any defs/uses in this basic block.
- SmallPtrSet<MachineInstr *, 8> DefUsesInMBB;
+ // Discover any defs in this basic block.
+ SmallPtrSet<MachineInstr *, 8> DefsInMBB;
MachineRegisterInfo& MRI = MBB->getParent()->getRegInfo();
- for (MachineInstr &RI : MRI.reg_instructions(SrcReg)) {
+ for (MachineInstr &RI : MRI.def_instructions(SrcReg))
if (RI.getParent() == MBB)
- DefUsesInMBB.insert(&RI);
- }
+ DefsInMBB.insert(&RI);
- MachineBasicBlock::iterator InsertPoint;
- if (DefUsesInMBB.empty()) {
- // No defs. Insert the copy at the start of the basic block.
- InsertPoint = MBB->begin();
- } else if (DefUsesInMBB.size() == 1) {
- // Insert the copy immediately after the def/use.
- InsertPoint = *DefUsesInMBB.begin();
- ++InsertPoint;
- } else {
- // Insert the copy immediately after the last def/use.
- InsertPoint = MBB->end();
- while (!DefUsesInMBB.count(&*--InsertPoint)) {}
- ++InsertPoint;
+ MachineBasicBlock::iterator InsertPoint = MBB->begin();
+ // Insert the copy at the _latest_ point of:
+ // 1. Immediately AFTER the last def
+ // 2. Immediately BEFORE a call/inlineasm_br.
+ for (auto I = MBB->rbegin(), E = MBB->rend(); I != E; ++I) {
+ if (DefsInMBB.contains(&*I)) {
+ InsertPoint = std::next(I.getReverse());
+ break;
+ }
+ if ((EHPadSuccessor && I->isCall()) ||
+ I->getOpcode() == TargetOpcode::INLINEASM_BR) {
+ InsertPoint = I.getReverse();
+ break;
+ }
}
// Make sure the copy goes after any phi nodes but before
diff --git a/llvm/lib/CodeGen/RegAllocFast.cpp b/llvm/lib/CodeGen/RegAllocFast.cpp
index 5396f9f3a1432..cf3eaba23bee9 100644
--- a/llvm/lib/CodeGen/RegAllocFast.cpp
+++ b/llvm/lib/CodeGen/RegAllocFast.cpp
@@ -106,8 +106,13 @@ namespace {
/// that it is alive across blocks.
BitVector MayLiveAcrossBlocks;
- /// State of a register unit.
- enum RegUnitState {
+ /// State of a physical register.
+ enum RegState {
+ /// A disabled register is not available for allocation, but an alias may
+ /// be in use. A register can only be moved out of the disabled state if
+ /// all aliases are disabled.
+ regDisabled,
+
/// A free register is not currently in use and can be allocated
/// immediately without checking aliases.
regFree,
@@ -121,8 +126,8 @@ namespace {
/// register. In that case, LiveVirtRegs contains the inverse mapping.
};
- /// Maps each physical register to a RegUnitState enum or virtual register.
- std::vector<unsigned> RegUnitStates;
+ /// Maps each physical register to a RegState enum or a virtual register.
+ std::vector<unsigned> PhysRegState;
SmallVector<Register, 16> VirtDead;
SmallVector<MachineInstr *, 32> Coalesced;
@@ -184,10 +189,6 @@ namespace {
bool isLastUseOfLocalReg(const MachineOperand &MO) const;
void addKillFlag(const LiveReg &LRI);
-#ifndef NDEBUG
- bool verifyRegStateMapping(const LiveReg &LR) const;
-#endif
-
void killVirtReg(LiveReg &LR);
void killVirtReg(Register VirtReg);
void spillVirtReg(MachineBasicBlock::iterator MI, LiveReg &LR);
@@ -195,7 +196,7 @@ namespace {
void usePhysReg(MachineOperand &MO);
void definePhysReg(MachineBasicBlock::iterator MI, MCPhysReg PhysReg,
- unsigned NewState);
+ RegState NewState);
unsigned calcSpillCost(MCPhysReg PhysReg) const;
void assignVirtToPhysReg(LiveReg &, MCPhysReg PhysReg);
@@ -228,7 +229,7 @@ namespace {
bool mayLiveOut(Register VirtReg);
bool mayLiveIn(Register VirtReg);
- void dumpState() const;
+ void dumpState();
};
} // end anonymous namespace
@@ -239,8 +240,7 @@ INITIALIZE_PASS(RegAllocFast, "regallocfast", "Fast Register Allocator", false,
false)
void RegAllocFast::setPhysRegState(MCPhysReg PhysReg, unsigned NewState) {
- for (MCRegUnitIterator UI(PhysReg, TRI); UI.isValid(); ++UI)
- RegUnitStates[*UI] = NewState;
+ PhysRegState[PhysReg] = NewState;
}
/// This allocates space for the specified virtual register to be held on the
@@ -384,23 +384,12 @@ void RegAllocFast::addKillFlag(const LiveReg &LR) {
}
}
-#ifndef NDEBUG
-bool RegAllocFast::verifyRegStateMapping(const LiveReg &LR) const {
- for (MCRegUnitIterator UI(LR.PhysReg, TRI); UI.isValid(); ++UI) {
- if (RegUnitStates[*UI] != LR.VirtReg)
- return false;
- }
-
- return true;
-}
-#endif
-
/// Mark virtreg as no longer available.
void RegAllocFast::killVirtReg(LiveReg &LR) {
- assert(verifyRegStateMapping(LR) && "Broken RegState mapping");
addKillFlag(LR);
- MCPhysReg PhysReg = LR.PhysReg;
- setPhysRegState(PhysReg, regFree);
+ assert(PhysRegState[LR.PhysReg] == LR.VirtReg &&
+ "Broken RegState mapping");
+ setPhysRegState(LR.PhysReg, regFree);
LR.PhysReg = 0;
}
@@ -427,9 +416,7 @@ void RegAllocFast::spillVirtReg(MachineBasicBlock::iterator MI,
/// Do the actual work of spilling.
void RegAllocFast::spillVirtReg(MachineBasicBlock::iterator MI, LiveReg &LR) {
- assert(verifyRegStateMapping(LR) && "Broken RegState mapping");
-
- MCPhysReg PhysReg = LR.PhysReg;
+ assert(PhysRegState[LR.PhysReg] == LR.VirtReg && "Broken RegState mapping");
if (LR.Dirty) {
// If this physreg is used by the instruction, we want to kill it on the
@@ -437,7 +424,7 @@ void RegAllocFast::spillVirtReg(MachineBasicBlock::iterator MI, LiveReg &LR) {
bool SpillKill = MachineBasicBlock::iterator(LR.LastUse) != MI;
LR.Dirty = false;
- spill(MI, LR.VirtReg, PhysReg, SpillKill);
+ spill(MI, LR.VirtReg, LR.PhysReg, SpillKill);
if (SpillKill)
LR.LastUse = nullptr; // Don't kill register again
@@ -473,16 +460,53 @@ void RegAllocFast::usePhysReg(MachineOperand &MO) {
assert(PhysReg.isPhysical() && "Bad usePhysReg operand");
markRegUsedInInstr(PhysReg);
+ switch (PhysRegState[PhysReg]) {
+ case regDisabled:
+ break;
+ case regReserved:
+ PhysRegState[PhysReg] = regFree;
+ LLVM_FALLTHROUGH;
+ case regFree:
+ MO.setIsKill();
+ return;
+ default:
+ // The physreg was allocated to a virtual register. That means the value we
+ // wanted has been clobbered.
+ llvm_unreachable("Instruction uses an allocated register");
+ }
- for (MCRegUnitIterator UI(PhysReg, TRI); UI.isValid(); ++UI) {
- switch (RegUnitStates[*UI]) {
+ // Maybe a superregister is reserved?
+ for (MCRegAliasIterator AI(PhysReg, TRI, false); AI.isValid(); ++AI) {
+ MCPhysReg Alias = *AI;
+ switch (PhysRegState[Alias]) {
+ case regDisabled:
+ break;
case regReserved:
- RegUnitStates[*UI] = regFree;
+ // Either PhysReg is a subregister of Alias and we mark the
+ // whole register as free, or PhysReg is the superregister of
+ // Alias and we mark all the aliases as disabled before freeing
+ // PhysReg.
+ // In the latter case, since PhysReg was disabled, this means that
+ // its value is defined only by physical sub-registers. This check
+ // is performed by the assert of the default case in this loop.
+ // Note: The value of the superregister may only be partial
+ // defined, that is why regDisabled is a valid state for aliases.
+ assert((TRI->isSuperRegister(PhysReg, Alias) ||
+ TRI->isSuperRegister(Alias, PhysReg)) &&
+ "Instruction is not using a subregister of a reserved register");
LLVM_FALLTHROUGH;
case regFree:
+ if (TRI->isSuperRegister(PhysReg, Alias)) {
+ // Leave the superregister in the working set.
+ setPhysRegState(Alias, regFree);
+ MO.getParent()->addRegisterKilled(Alias, TRI, true);
+ return;
+ }
+ // Some other alias was in the working set - clear it.
+ setPhysRegState(Alias, regDisabled);
break;
default:
- llvm_unreachable("Unexpected reg unit state");
+ llvm_unreachable("Instruction uses an alias of an allocated register");
}
}
@@ -495,20 +519,38 @@ void RegAllocFast::usePhysReg(MachineOperand &MO) {
/// similar to defineVirtReg except the physreg is reserved instead of
/// allocated.
void RegAllocFast::definePhysReg(MachineBasicBlock::iterator MI,
- MCPhysReg PhysReg, unsigned NewState) {
- for (MCRegUnitIterator UI(PhysReg, TRI); UI.isValid(); ++UI) {
- switch (unsigned VirtReg = RegUnitStates[*UI]) {
+ MCPhysReg PhysReg, RegState NewState) {
+ markRegUsedInInstr(PhysReg);
+ switch (Register VirtReg = PhysRegState[PhysReg]) {
+ case regDisabled:
+ break;
+ default:
+ spillVirtReg(MI, VirtReg);
+ LLVM_FALLTHROUGH;
+ case regFree:
+ case regReserved:
+ setPhysRegState(PhysReg, NewState);
+ return;
+ }
+
+ // This is a disabled register, disable all aliases.
+ setPhysRegState(PhysReg, NewState);
+ for (MCRegAliasIterator AI(PhysReg, TRI, false); AI.isValid(); ++AI) {
+ MCPhysReg Alias = *AI;
+ switch (Register VirtReg = PhysRegState[Alias]) {
+ case regDisabled:
+ break;
default:
spillVirtReg(MI, VirtReg);
- break;
+ LLVM_FALLTHROUGH;
case regFree:
case regReserved:
+ setPhysRegState(Alias, regDisabled);
+ if (TRI->isSuperRegister(PhysReg, Alias))
+ return;
break;
}
}
-
- markRegUsedInInstr(PhysReg);
- setPhysRegState(PhysReg, NewState);
}
/// Return the cost of spilling clearing out PhysReg and aliases so it is free
@@ -521,24 +563,46 @@ unsigned RegAllocFast::calcSpillCost(MCPhysReg PhysReg) const {
<< " is already used in instr.\n");
return spillImpossible;
}
+ switch (Register VirtReg = PhysRegState[PhysReg]) {
+ case regDisabled:
+ break;
+ case regFree:
+ return 0;
+ case regReserved:
+ LLVM_DEBUG(dbgs() << printReg(VirtReg, TRI) << " corresponding "
+ << printReg(PhysReg, TRI) << " is reserved already.\n");
+ return spillImpossible;
+ default: {
+ LiveRegMap::const_iterator LRI = findLiveVirtReg(VirtReg);
+ assert(LRI != LiveVirtRegs.end() && LRI->PhysReg &&
+ "Missing VirtReg entry");
+ return LRI->Dirty ? spillDirty : spillClean;
+ }
+ }
- for (MCRegUnitIterator UI(PhysReg, TRI); UI.isValid(); ++UI) {
- switch (unsigned VirtReg = RegUnitStates[*UI]) {
+ // This is a disabled register, add up cost of aliases.
+ LLVM_DEBUG(dbgs() << printReg(PhysReg, TRI) << " is disabled.\n");
+ unsigned Cost = 0;
+ for (MCRegAliasIterator AI(PhysReg, TRI, false); AI.isValid(); ++AI) {
+ MCPhysReg Alias = *AI;
+ switch (Register VirtReg = PhysRegState[Alias]) {
+ case regDisabled:
+ break;
case regFree:
+ ++Cost;
break;
case regReserved:
- LLVM_DEBUG(dbgs() << printReg(VirtReg, TRI) << " corresponding "
- << printReg(PhysReg, TRI) << " is reserved already.\n");
return spillImpossible;
default: {
LiveRegMap::const_iterator LRI = findLiveVirtReg(VirtReg);
assert(LRI != LiveVirtRegs.end() && LRI->PhysReg &&
"Missing VirtReg entry");
- return LRI->Dirty ? spillDirty : spillClean;
+ Cost += LRI->Dirty ? spillDirty : spillClean;
+ break;
}
}
}
- return 0;
+ return Cost;
}
/// This method updates local state so that we know that PhysReg is the
@@ -845,17 +909,9 @@ void RegAllocFast::handleThroughOperands(MachineInstr &MI,
if (!Reg || !Reg.isPhysical())
continue;
markRegUsedInInstr(Reg);
-
- for (MCRegUnitIterator UI(Reg, TRI); UI.isValid(); ++UI) {
- if (!ThroughRegs.count(RegUnitStates[*UI]))
- continue;
-
- // Need to spill any aliasing registers.
- for (MCRegUnitRootIterator RI(*UI, TRI); RI.isValid(); ++RI) {
- for (MCSuperRegIterator SI(*RI, TRI, true); SI.isValid(); ++SI) {
- definePhysReg(MI, *SI, regFree);
- }
- }
+ for (MCRegAliasIterator AI(Reg, TRI, true); AI.isValid(); ++AI) {
+ if (ThroughRegs.count(PhysRegState[*AI]))
+ definePhysReg(MI, *AI, regFree);
}
}
@@ -919,40 +975,37 @@ void RegAllocFast::handleThroughOperands(MachineInstr &MI,
}
#ifndef NDEBUG
-
-void RegAllocFast::dumpState() const {
- for (unsigned Unit = 1, UnitE = TRI->getNumRegUnits(); Unit != UnitE;
- ++Unit) {
- switch (unsigned VirtReg = RegUnitStates[Unit]) {
+void RegAllocFast::dumpState() {
+ for (unsigned Reg = 1, E = TRI->getNumRegs(); Reg != E; ++Reg) {
+ if (PhysRegState[Reg] == regDisabled) continue;
+ dbgs() << " " << printReg(Reg, TRI);
+ switch(PhysRegState[Reg]) {
case regFree:
break;
case regReserved:
- dbgs() << " " << printRegUnit(Unit, TRI) << "[P]";
+ dbgs() << "*";
break;
default: {
- dbgs() << ' ' << printRegUnit(Unit, TRI) << '=' << printReg(VirtReg);
- LiveRegMap::const_iterator I = findLiveVirtReg(VirtReg);
- assert(I != LiveVirtRegs.end() && "have LiveVirtRegs entry");
- if (I->Dirty)
- dbgs() << "[D]";
- assert(TRI->hasRegUnit(I->PhysReg, Unit) && "inverse mapping present");
+ dbgs() << '=' << printReg(PhysRegState[Reg]);
+ LiveRegMap::iterator LRI = findLiveVirtReg(PhysRegState[Reg]);
+ assert(LRI != LiveVirtRegs.end() && LRI->PhysReg &&
+ "Missing VirtReg entry");
+ if (LRI->Dirty)
+ dbgs() << "*";
+ assert(LRI->PhysReg == Reg && "Bad inverse map");
break;
}
}
}
dbgs() << '\n';
// Check that LiveVirtRegs is the inverse.
- for (const LiveReg &LR : LiveVirtRegs) {
- Register VirtReg = LR.VirtReg;
- assert(VirtReg.isVirtual() && "Bad map key");
- MCPhysReg PhysReg = LR.PhysReg;
- if (PhysReg != 0) {
- assert(Register::isPhysicalRegister(PhysReg) &&
- "mapped to physreg");
- for (MCRegUnitIterator UI(PhysReg, TRI); UI.isValid(); ++UI) {
- assert(RegUnitStates[*UI] == VirtReg && "inverse map valid");
- }
- }
+ for (LiveRegMap::iterator i = LiveVirtRegs.begin(),
+ e = LiveVirtRegs.end(); i != e; ++i) {
+ if (!i->PhysReg)
+ continue;
+ assert(i->VirtReg.isVirtual() && "Bad map key");
+ assert(Register::isPhysicalRegister(i->PhysReg) && "Bad map value");
+ assert(PhysRegState[i->PhysReg] == i->VirtReg && "Bad inverse map");
}
}
#endif
@@ -1194,7 +1247,7 @@ void RegAllocFast::allocateBasicBlock(MachineBasicBlock &MBB) {
this->MBB = &MBB;
LLVM_DEBUG(dbgs() << "\nAllocating " << MBB);
- RegUnitStates.assign(TRI->getNumRegUnits(), regFree);
+ PhysRegState.assign(TRI->getNumRegs(), regDisabled);
assert(LiveVirtRegs.empty() && "Mapping not cleared from last block?");
MachineBasicBlock::iterator MII = MBB.begin();
diff --git a/llvm/lib/CodeGen/SelectionDAG/FastISel.cpp b/llvm/lib/CodeGen/SelectionDAG/FastISel.cpp
index fc6c3a145f132..f5948d2a20dca 100644
--- a/llvm/lib/CodeGen/SelectionDAG/FastISel.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/FastISel.cpp
@@ -690,6 +690,12 @@ bool FastISel::selectGetElementPtr(const User *I) {
Register N = getRegForValue(I->getOperand(0));
if (!N) // Unhandled operand. Halt "fast" selection and bail.
return false;
+
+ // FIXME: The code below does not handle vector GEPs. Halt "fast" selection
+ // and bail.
+ if (isa<VectorType>(I->getType()))
+ return false;
+
bool NIsKill = hasTrivialKill(I->getOperand(0));
// Keep a running tab of the total offset to coalesce multiple N = N + Offset
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 1d596c89c9113..d2930391f87a5 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -169,32 +169,6 @@ static cl::opt<unsigned> SwitchPeelThreshold(
// store [4096 x i8] %data, [4096 x i8]* %buffer
static const unsigned MaxParallelChains = 64;
-// Return the calling convention if the Value passed requires ABI mangling as it
-// is a parameter to a function or a return value from a function which is not
-// an intrinsic.
-static Optional<CallingConv::ID> getABIRegCopyCC(const Value *V) {
- if (auto *R = dyn_cast<ReturnInst>(V))
- return R->getParent()->getParent()->getCallingConv();
-
- if (auto *CI = dyn_cast<CallInst>(V)) {
- const bool IsInlineAsm = CI->isInlineAsm();
- const bool IsIndirectFunctionCall =
- !IsInlineAsm && !CI->getCalledFunction();
-
- // It is possible that the call instruction is an inline asm statement or an
- // indirect function call in which case the return value of
- // getCalledFunction() would be nullptr.
- const bool IsInstrinsicCall =
- !IsInlineAsm && !IsIndirectFunctionCall &&
- CI->getCalledFunction()->getIntrinsicID() != Intrinsic::not_intrinsic;
-
- if (!IsInlineAsm && !IsInstrinsicCall)
- return CI->getCallingConv();
- }
-
- return None;
-}
-
static SDValue getCopyFromPartsVector(SelectionDAG &DAG, const SDLoc &DL,
const SDValue *Parts, unsigned NumParts,
MVT PartVT, EVT ValueVT, const Value *V,
@@ -409,7 +383,7 @@ static SDValue getCopyFromPartsVector(SelectionDAG &DAG, const SDLoc &DL,
// as appropriate.
for (unsigned i = 0; i != NumParts; ++i)
Ops[i] = getCopyFromParts(DAG, DL, &Parts[i], 1,
- PartVT, IntermediateVT, V);
+ PartVT, IntermediateVT, V, CallConv);
} else if (NumParts > 0) {
// If the intermediate type was expanded, build the intermediate
// operands from the parts.
@@ -418,7 +392,7 @@ static SDValue getCopyFromPartsVector(SelectionDAG &DAG, const SDLoc &DL,
unsigned Factor = NumParts / NumIntermediates;
for (unsigned i = 0; i != NumIntermediates; ++i)
Ops[i] = getCopyFromParts(DAG, DL, &Parts[i * Factor], Factor,
- PartVT, IntermediateVT, V);
+ PartVT, IntermediateVT, V, CallConv);
}
// Build a vector with BUILD_VECTOR or CONCAT_VECTORS from the
@@ -1624,7 +1598,7 @@ SDValue SelectionDAGBuilder::getValueImpl(const Value *V) {
unsigned InReg = FuncInfo.InitializeRegForValue(Inst);
RegsForValue RFV(*DAG.getContext(), TLI, DAG.getDataLayout(), InReg,
- Inst->getType(), getABIRegCopyCC(V));
+ Inst->getType(), None);
SDValue Chain = DAG.getEntryNode();
return RFV.getCopyFromRegs(DAG, FuncInfo, getCurSDLoc(), Chain, nullptr, V);
}
@@ -5555,7 +5529,7 @@ bool SelectionDAGBuilder::EmitFuncArgumentDbgValue(
if (VMI != FuncInfo.ValueMap.end()) {
const auto &TLI = DAG.getTargetLoweringInfo();
RegsForValue RFV(V->getContext(), TLI, DAG.getDataLayout(), VMI->second,
- V->getType(), getABIRegCopyCC(V));
+ V->getType(), None);
if (RFV.occupiesMultipleRegs()) {
splitMultiRegDbgValue(RFV.getRegsAndSizes());
return true;
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 96df20039b15d..64af293caf9ea 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -5726,6 +5726,11 @@ SDValue TargetLowering::getNegatedExpression(SDValue Op, SelectionDAG &DAG,
return SDValue();
}
+ auto RemoveDeadNode = [&](SDValue N) {
+ if (N && N.getNode()->use_empty())
+ DAG.RemoveDeadNode(N.getNode());
+ };
+
SDLoc DL(Op);
switch (Opcode) {
@@ -5746,8 +5751,10 @@ SDValue TargetLowering::getNegatedExpression(SDValue Op, SelectionDAG &DAG,
// If we already have the use of the negated floating constant, it is free
// to negate it even it has multiple uses.
- if (!Op.hasOneUse() && CFP.use_empty())
+ if (!Op.hasOneUse() && CFP.use_empty()) {
+ RemoveDeadNode(CFP);
break;
+ }
Cost = NegatibleCost::Neutral;
return CFP;
}
@@ -5804,13 +5811,19 @@ SDValue TargetLowering::getNegatedExpression(SDValue Op, SelectionDAG &DAG,
// Negate the X if its cost is less or equal than Y.
if (NegX && (CostX <= CostY)) {
Cost = CostX;
- return DAG.getNode(ISD::FSUB, DL, VT, NegX, Y, Flags);
+ SDValue N = DAG.getNode(ISD::FSUB, DL, VT, NegX, Y, Flags);
+ if (NegY != N)
+ RemoveDeadNode(NegY);
+ return N;
}
// Negate the Y if it is not expensive.
if (NegY) {
Cost = CostY;
- return DAG.getNode(ISD::FSUB, DL, VT, NegY, X, Flags);
+ SDValue N = DAG.getNode(ISD::FSUB, DL, VT, NegY, X, Flags);
+ if (NegX != N)
+ RemoveDeadNode(NegX);
+ return N;
}
break;
}
@@ -5847,7 +5860,10 @@ SDValue TargetLowering::getNegatedExpression(SDValue Op, SelectionDAG &DAG,
// Negate the X if its cost is less or equal than Y.
if (NegX && (CostX <= CostY)) {
Cost = CostX;
- return DAG.getNode(Opcode, DL, VT, NegX, Y, Flags);
+ SDValue N = DAG.getNode(Opcode, DL, VT, NegX, Y, Flags);
+ if (NegY != N)
+ RemoveDeadNode(NegY);
+ return N;
}
// Ignore X * 2.0 because that is expected to be canonicalized to X + X.
@@ -5858,7 +5874,10 @@ SDValue TargetLowering::getNegatedExpression(SDValue Op, SelectionDAG &DAG,
// Negate the Y if it is not expensive.
if (NegY) {
Cost = CostY;
- return DAG.getNode(Opcode, DL, VT, X, NegY, Flags);
+ SDValue N = DAG.getNode(Opcode, DL, VT, X, NegY, Flags);
+ if (NegX != N)
+ RemoveDeadNode(NegX);
+ return N;
}
break;
}
@@ -5887,13 +5906,19 @@ SDValue TargetLowering::getNegatedExpression(SDValue Op, SelectionDAG &DAG,
// Negate the X if its cost is less or equal than Y.
if (NegX && (CostX <= CostY)) {
Cost = std::min(CostX, CostZ);
- return DAG.getNode(Opcode, DL, VT, NegX, Y, NegZ, Flags);
+ SDValue N = DAG.getNode(Opcode, DL, VT, NegX, Y, NegZ, Flags);
+ if (NegY != N)
+ RemoveDeadNode(NegY);
+ return N;
}
// Negate the Y if it is not expensive.
if (NegY) {
Cost = std::min(CostY, CostZ);
- return DAG.getNode(Opcode, DL, VT, X, NegY, NegZ, Flags);
+ SDValue N = DAG.getNode(Opcode, DL, VT, X, NegY, NegZ, Flags);
+ if (NegX != N)
+ RemoveDeadNode(NegX);
+ return N;
}
break;
}
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index 2c94c2c62e5f0..42c1fa8af0e6b 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -1827,7 +1827,10 @@ Value *TargetLoweringBase::getIRStackGuard(IRBuilder<> &IRB) const {
if (getTargetMachine().getTargetTriple().isOSOpenBSD()) {
Module &M = *IRB.GetInsertBlock()->getParent()->getParent();
PointerType *PtrTy = Type::getInt8PtrTy(M.getContext());
- return M.getOrInsertGlobal("__guard_local", PtrTy);
+ Constant *C = M.getOrInsertGlobal("__guard_local", PtrTy);
+ if (GlobalVariable *G = dyn_cast_or_null<GlobalVariable>(C))
+ G->setVisibility(GlobalValue::HiddenVisibility);
+ return C;
}
return nullptr;
}
diff --git a/llvm/lib/Extensions/Extensions.cpp b/llvm/lib/Extensions/Extensions.cpp
index e69de29bb2d1d..2fe537f91876a 100644
--- a/llvm/lib/Extensions/Extensions.cpp
+++ b/llvm/lib/Extensions/Extensions.cpp
@@ -0,0 +1,15 @@
+#include "llvm/Passes/PassPlugin.h"
+#define HANDLE_EXTENSION(Ext) \
+ llvm::PassPluginLibraryInfo get##Ext##PluginInfo();
+#include "llvm/Support/Extension.def"
+
+
+namespace llvm {
+ namespace details {
+ void extensions_anchor() {
+#define HANDLE_EXTENSION(Ext) \
+ static auto Ext = get##Ext##PluginInfo();
+#include "llvm/Support/Extension.def"
+ }
+ }
+}
diff --git a/llvm/lib/Extensions/LLVMBuild.txt b/llvm/lib/Extensions/LLVMBuild.txt
index 2005830a4dd7a..7a98c8f680513 100644
--- a/llvm/lib/Extensions/LLVMBuild.txt
+++ b/llvm/lib/Extensions/LLVMBuild.txt
@@ -18,4 +18,4 @@
type = Library
name = Extensions
parent = Libraries
-required_libraries =
+required_libraries = Support
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 9468a3aa3c8dd..6c72cd01ce6ea 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -185,16 +185,18 @@ void OpenMPIRBuilder::finalize() {
}
Value *OpenMPIRBuilder::getOrCreateIdent(Constant *SrcLocStr,
- IdentFlag LocFlags) {
+ IdentFlag LocFlags,
+ unsigned Reserve2Flags) {
// Enable "C-mode".
LocFlags |= OMP_IDENT_FLAG_KMPC;
- GlobalVariable *&DefaultIdent = IdentMap[{SrcLocStr, uint64_t(LocFlags)}];
- if (!DefaultIdent) {
+ Value *&Ident =
+ IdentMap[{SrcLocStr, uint64_t(LocFlags) << 31 | Reserve2Flags}];
+ if (!Ident) {
Constant *I32Null = ConstantInt::getNullValue(Int32);
- Constant *IdentData[] = {I32Null,
- ConstantInt::get(Int32, uint64_t(LocFlags)),
- I32Null, I32Null, SrcLocStr};
+ Constant *IdentData[] = {
+ I32Null, ConstantInt::get(Int32, uint32_t(LocFlags)),
+ ConstantInt::get(Int32, Reserve2Flags), I32Null, SrcLocStr};
Constant *Initializer = ConstantStruct::get(
cast<StructType>(IdentPtr->getPointerElementType()), IdentData);
@@ -203,15 +205,16 @@ Value *OpenMPIRBuilder::getOrCreateIdent(Constant *SrcLocStr,
for (GlobalVariable &GV : M.getGlobalList())
if (GV.getType() == IdentPtr && GV.hasInitializer())
if (GV.getInitializer() == Initializer)
- return DefaultIdent = &GV;
-
- DefaultIdent = new GlobalVariable(M, IdentPtr->getPointerElementType(),
- /* isConstant = */ false,
- GlobalValue::PrivateLinkage, Initializer);
- DefaultIdent->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
- DefaultIdent->setAlignment(Align(8));
+ return Ident = &GV;
+
+ auto *GV = new GlobalVariable(M, IdentPtr->getPointerElementType(),
+ /* isConstant = */ true,
+ GlobalValue::PrivateLinkage, Initializer);
+ GV->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
+ GV->setAlignment(Align(8));
+ Ident = GV;
}
- return DefaultIdent;
+ return Ident;
}
Constant *OpenMPIRBuilder::getOrCreateSrcLocStr(StringRef LocStr) {
@@ -227,11 +230,30 @@ Constant *OpenMPIRBuilder::getOrCreateSrcLocStr(StringRef LocStr) {
GV.getInitializer() == Initializer)
return SrcLocStr = ConstantExpr::getPointerCast(&GV, Int8Ptr);
- SrcLocStr = Builder.CreateGlobalStringPtr(LocStr);
+ SrcLocStr = Builder.CreateGlobalStringPtr(LocStr, /* Name */ "",
+ /* AddressSpace */ 0, &M);
}
return SrcLocStr;
}
+Constant *OpenMPIRBuilder::getOrCreateSrcLocStr(StringRef FunctionName,
+ StringRef FileName,
+ unsigned Line,
+ unsigned Column) {
+ SmallString<128> Buffer;
+ Buffer.push_back(';');
+ Buffer.append(FileName);
+ Buffer.push_back(';');
+ Buffer.append(FunctionName);
+ Buffer.push_back(';');
+ Buffer.append(std::to_string(Line));
+ Buffer.push_back(';');
+ Buffer.append(std::to_string(Column));
+ Buffer.push_back(';');
+ Buffer.push_back(';');
+ return getOrCreateSrcLocStr(Buffer.str());
+}
+
Constant *OpenMPIRBuilder::getOrCreateDefaultSrcLocStr() {
return getOrCreateSrcLocStr(";unknown;unknown;0;0;;");
}
@@ -241,17 +263,13 @@ OpenMPIRBuilder::getOrCreateSrcLocStr(const LocationDescription &Loc) {
DILocation *DIL = Loc.DL.get();
if (!DIL)
return getOrCreateDefaultSrcLocStr();
- StringRef Filename =
+ StringRef FileName =
!DIL->getFilename().empty() ? DIL->getFilename() : M.getName();
StringRef Function = DIL->getScope()->getSubprogram()->getName();
Function =
!Function.empty() ? Function : Loc.IP.getBlock()->getParent()->getName();
- std::string LineStr = std::to_string(DIL->getLine());
- std::string ColumnStr = std::to_string(DIL->getColumn());
- std::stringstream SrcLocStr;
- SrcLocStr << ";" << Filename.data() << ";" << Function.data() << ";"
- << LineStr << ";" << ColumnStr << ";;";
- return getOrCreateSrcLocStr(SrcLocStr.str());
+ return getOrCreateSrcLocStr(Function, FileName, DIL->getLine(),
+ DIL->getColumn());
}
Value *OpenMPIRBuilder::getOrCreateThreadID(Value *Ident) {
diff --git a/llvm/lib/IR/Globals.cpp b/llvm/lib/IR/Globals.cpp
index dd8e62164de1e..ed946ef3fd12b 100644
--- a/llvm/lib/IR/Globals.cpp
+++ b/llvm/lib/IR/Globals.cpp
@@ -104,7 +104,8 @@ bool GlobalValue::isInterposable() const {
bool GlobalValue::canBenefitFromLocalAlias() const {
// See AsmPrinter::getSymbolPreferLocal().
- return GlobalObject::isExternalLinkage(getLinkage()) && !isDeclaration() &&
+ return hasDefaultVisibility() &&
+ GlobalObject::isExternalLinkage(getLinkage()) && !isDeclaration() &&
!isa<GlobalIFunc>(this) && !hasComdat();
}
diff --git a/llvm/lib/IR/IRBuilder.cpp b/llvm/lib/IR/IRBuilder.cpp
index 1fffce015f707..a82f15895782c 100644
--- a/llvm/lib/IR/IRBuilder.cpp
+++ b/llvm/lib/IR/IRBuilder.cpp
@@ -42,13 +42,14 @@ using namespace llvm;
/// created.
GlobalVariable *IRBuilderBase::CreateGlobalString(StringRef Str,
const Twine &Name,
- unsigned AddressSpace) {
+ unsigned AddressSpace,
+ Module *M) {
Constant *StrConstant = ConstantDataArray::getString(Context, Str);
- Module &M = *BB->getParent()->getParent();
- auto *GV = new GlobalVariable(M, StrConstant->getType(), true,
- GlobalValue::PrivateLinkage, StrConstant, Name,
- nullptr, GlobalVariable::NotThreadLocal,
- AddressSpace);
+ if (!M)
+ M = BB->getParent()->getParent();
+ auto *GV = new GlobalVariable(
+ *M, StrConstant->getType(), true, GlobalValue::PrivateLinkage,
+ StrConstant, Name, nullptr, GlobalVariable::NotThreadLocal, AddressSpace);
GV->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
GV->setAlignment(Align(1));
return GV;
diff --git a/llvm/lib/IR/LegacyPassManager.cpp b/llvm/lib/IR/LegacyPassManager.cpp
index 74869fa62c66f..4189aea46294c 100644
--- a/llvm/lib/IR/LegacyPassManager.cpp
+++ b/llvm/lib/IR/LegacyPassManager.cpp
@@ -1475,74 +1475,6 @@ void FPPassManager::dumpPassStructure(unsigned Offset) {
}
}
-#ifdef EXPENSIVE_CHECKS
-namespace {
-namespace details {
-
-// Basic hashing mechanism to detect structural change to the IR, used to verify
-// pass return status consistency with actual change. Loosely copied from
-// llvm/lib/Transforms/Utils/FunctionComparator.cpp
-
-class StructuralHash {
- uint64_t Hash = 0x6acaa36bef8325c5ULL;
-
- void update(uint64_t V) { Hash = hashing::detail::hash_16_bytes(Hash, V); }
-
-public:
- StructuralHash() = default;
-
- void update(Function &F) {
- if (F.empty())
- return;
-
- update(F.isVarArg());
- update(F.arg_size());
-
- SmallVector<const BasicBlock *, 8> BBs;
- SmallPtrSet<const BasicBlock *, 16> VisitedBBs;
-
- BBs.push_back(&F.getEntryBlock());
- VisitedBBs.insert(BBs[0]);
- while (!BBs.empty()) {
- const BasicBlock *BB = BBs.pop_back_val();
- update(45798); // Block header
- for (auto &Inst : *BB)
- update(Inst.getOpcode());
-
- const Instruction *Term = BB->getTerminator();
- for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) {
- if (!VisitedBBs.insert(Term->getSuccessor(i)).second)
- continue;
- BBs.push_back(Term->getSuccessor(i));
- }
- }
- }
-
- void update(Module &M) {
- for (Function &F : M)
- update(F);
- }
-
- uint64_t getHash() const { return Hash; }
-};
-
-} // namespace details
-
-uint64_t StructuralHash(Function &F) {
- details::StructuralHash H;
- H.update(F);
- return H.getHash();
-}
-
-uint64_t StructuralHash(Module &M) {
- details::StructuralHash H;
- H.update(M);
- return H.getHash();
-}
-
-} // end anonymous namespace
-
-#endif
/// Execute all of the passes scheduled for execution by invoking
/// runOnFunction method. Keep track of whether any of the passes modifies
@@ -1581,16 +1513,7 @@ bool FPPassManager::runOnFunction(Function &F) {
{
PassManagerPrettyStackEntry X(FP, F);
TimeRegion PassTimer(getPassTimer(FP));
-#ifdef EXPENSIVE_CHECKS
- uint64_t RefHash = StructuralHash(F);
-#endif
LocalChanged |= FP->runOnFunction(F);
-
-#ifdef EXPENSIVE_CHECKS
- assert((LocalChanged || (RefHash == StructuralHash(F))) &&
- "Pass modifies its input and doesn't report it.");
-#endif
-
if (EmitICRemark) {
unsigned NewSize = F.getInstructionCount();
@@ -1691,17 +1614,7 @@ MPPassManager::runOnModule(Module &M) {
PassManagerPrettyStackEntry X(MP, M);
TimeRegion PassTimer(getPassTimer(MP));
-#ifdef EXPENSIVE_CHECKS
- uint64_t RefHash = StructuralHash(M);
-#endif
-
LocalChanged |= MP->runOnModule(M);
-
-#ifdef EXPENSIVE_CHECKS
- assert((LocalChanged || (RefHash == StructuralHash(M))) &&
- "Pass modifies its input and doesn't report it.");
-#endif
-
if (EmitICRemark) {
// Update the size of the module.
unsigned ModuleCount = M.getInstructionCount();
diff --git a/llvm/lib/Support/X86TargetParser.cpp b/llvm/lib/Support/X86TargetParser.cpp
index c629f872df121..4c2d4efbfca8d 100644
--- a/llvm/lib/Support/X86TargetParser.cpp
+++ b/llvm/lib/Support/X86TargetParser.cpp
@@ -522,7 +522,7 @@ static constexpr FeatureBitset ImpliedFeaturesAVX5124FMAPS = {};
static constexpr FeatureBitset ImpliedFeaturesAVX5124VNNIW = {};
// SSE4_A->FMA4->XOP chain.
-static constexpr FeatureBitset ImpliedFeaturesSSE4_A = FeatureSSSE3;
+static constexpr FeatureBitset ImpliedFeaturesSSE4_A = FeatureSSE3;
static constexpr FeatureBitset ImpliedFeaturesFMA4 = FeatureAVX | FeatureSSE4_A;
static constexpr FeatureBitset ImpliedFeaturesXOP = FeatureFMA4;
diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index 83653dcbb8cf7..c6cc6e9e84718 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -1694,11 +1694,10 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
StackOffset DeallocateBefore = {}, DeallocateAfter = SVEStackSize;
MachineBasicBlock::iterator RestoreBegin = LastPopI, RestoreEnd = LastPopI;
if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
- RestoreBegin = std::prev(RestoreEnd);;
- while (IsSVECalleeSave(RestoreBegin) &&
- RestoreBegin != MBB.begin())
+ RestoreBegin = std::prev(RestoreEnd);
+ while (RestoreBegin != MBB.begin() &&
+ IsSVECalleeSave(std::prev(RestoreBegin)))
--RestoreBegin;
- ++RestoreBegin;
assert(IsSVECalleeSave(RestoreBegin) &&
IsSVECalleeSave(std::prev(RestoreEnd)) && "Unexpected instruction");
diff --git a/llvm/lib/Target/AArch64/SVEInstrFormats.td b/llvm/lib/Target/AArch64/SVEInstrFormats.td
index c56a65b9e2124..e86f2a6ebde46 100644
--- a/llvm/lib/Target/AArch64/SVEInstrFormats.td
+++ b/llvm/lib/Target/AArch64/SVEInstrFormats.td
@@ -5416,7 +5416,7 @@ multiclass sve_mem_64b_sst_vi_ptrs<bits<3> opc, string asm,
def : InstAlias<asm # "\t$Zt, $Pg, [$Zn, $imm5]", (!cast<Instruction>(NAME # _IMM) ZPR64:$Zt, PPR3bAny:$Pg, ZPR64:$Zn, imm_ty:$imm5), 0>;
-def : InstAlias<asm # "\t$Zt, $Pg, [$Zn]", (!cast<Instruction>(NAME # _IMM) Z_s:$Zt, PPR3bAny:$Pg, ZPR64:$Zn, 0), 1>;
+def : InstAlias<asm # "\t$Zt, $Pg, [$Zn]", (!cast<Instruction>(NAME # _IMM) Z_d:$Zt, PPR3bAny:$Pg, ZPR64:$Zn, 0), 1>;
def : Pat<(op (nxv2i64 ZPR:$data), (nxv2i1 PPR:$gp), (nxv2i64 ZPR:$ptrs), imm_ty:$index, vt),
(!cast(NAME # _IMM) ZPR:$data, PPR:$gp, ZPR:$ptrs, imm_ty:$index)>;
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index ffcf4c30bc70d..92980d2406cf2 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -192,8 +192,8 @@ static bool updateOperand(FoldCandidate &Fold,
if (Fold.isImm()) {
if (MI->getDesc().TSFlags & SIInstrFlags::IsPacked &&
!(MI->getDesc().TSFlags & SIInstrFlags::IsMAI) &&
- AMDGPU::isInlinableLiteralV216(static_cast<int32_t>(Fold.ImmToFold),
- ST.hasInv2PiInlineImm())) {
+ AMDGPU::isFoldableLiteralV216(Fold.ImmToFold,
+ ST.hasInv2PiInlineImm())) {
// Set op_sel/op_sel_hi on this operand or bail out if op_sel is
// already set.
unsigned Opcode = MI->getOpcode();
@@ -209,30 +209,30 @@ static bool updateOperand(FoldCandidate &Fold,
ModIdx = AMDGPU::getNamedOperandIdx(Opcode, ModIdx);
MachineOperand &Mod = MI->getOperand(ModIdx);
unsigned Val = Mod.getImm();
- if ((Val & SISrcMods::OP_SEL_0) || !(Val & SISrcMods::OP_SEL_1))
- return false;
- // Only apply the following transformation if that operand requries
- // a packed immediate.
- switch (TII.get(Opcode).OpInfo[OpNo].OperandType) {
- case AMDGPU::OPERAND_REG_IMM_V2FP16:
- case AMDGPU::OPERAND_REG_IMM_V2INT16:
- case AMDGPU::OPERAND_REG_INLINE_C_V2FP16:
- case AMDGPU::OPERAND_REG_INLINE_C_V2INT16:
- // If upper part is all zero we do not need op_sel_hi.
- if (!isUInt<16>(Fold.ImmToFold)) {
- if (!(Fold.ImmToFold & 0xffff)) {
- Mod.setImm(Mod.getImm() | SISrcMods::OP_SEL_0);
+ if (!(Val & SISrcMods::OP_SEL_0) && (Val & SISrcMods::OP_SEL_1)) {
+ // Only apply the following transformation if that operand requries
+ // a packed immediate.
+ switch (TII.get(Opcode).OpInfo[OpNo].OperandType) {
+ case AMDGPU::OPERAND_REG_IMM_V2FP16:
+ case AMDGPU::OPERAND_REG_IMM_V2INT16:
+ case AMDGPU::OPERAND_REG_INLINE_C_V2FP16:
+ case AMDGPU::OPERAND_REG_INLINE_C_V2INT16:
+ // If upper part is all zero we do not need op_sel_hi.
+ if (!isUInt<16>(Fold.ImmToFold)) {
+ if (!(Fold.ImmToFold & 0xffff)) {
+ Mod.setImm(Mod.getImm() | SISrcMods::OP_SEL_0);
+ Mod.setImm(Mod.getImm() & ~SISrcMods::OP_SEL_1);
+ Old.ChangeToImmediate((Fold.ImmToFold >> 16) & 0xffff);
+ return true;
+ }
Mod.setImm(Mod.getImm() & ~SISrcMods::OP_SEL_1);
- Old.ChangeToImmediate((Fold.ImmToFold >> 16) & 0xffff);
+ Old.ChangeToImmediate(Fold.ImmToFold & 0xffff);
return true;
}
- Mod.setImm(Mod.getImm() & ~SISrcMods::OP_SEL_1);
- Old.ChangeToImmediate(Fold.ImmToFold & 0xffff);
- return true;
+ break;
+ default:
+ break;
}
- break;
- default:
- break;
}
}
}
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
index 00e6d517bde58..3df2157fc402d 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
@@ -1282,6 +1282,19 @@ bool isInlinableIntLiteralV216(int32_t Literal) {
return Lo16 == Hi16 && isInlinableIntLiteral(Lo16);
}
+bool isFoldableLiteralV216(int32_t Literal, bool HasInv2Pi) {
+ assert(HasInv2Pi);
+
+ int16_t Lo16 = static_cast<int16_t>(Literal);
+ if (isInt<16>(Literal) || isUInt<16>(Literal))
+ return true;
+
+ int16_t Hi16 = static_cast<int16_t>(Literal >> 16);
+ if (!(Literal & 0xffff))
+ return true;
+ return Lo16 == Hi16;
+}
+
bool isArgPassedInSGPR(const Argument *A) {
const Function *F = A->getParent();
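The new predicate accepts a packed 16-bit-pair literal when it fits in a signed or unsigned 16-bit immediate, when its low half is zero, or when both halves are identical. A standalone restatement of that logic, for illustration only (plain C++, outside LLVM):

```cpp
#include <cassert>
#include <cstdint>

// Mirrors isFoldableLiteralV216: a 32-bit value interpreted as two
// packed 16-bit halves is foldable when it fits in a 16-bit immediate,
// when only its high half is used, or when both halves match.
static bool isFoldableV216(int32_t literal) {
  if (literal >= -32768 && literal <= 65535) // isInt<16> || isUInt<16>
    return true;
  if ((literal & 0xffff) == 0) // low half is zero
    return true;
  auto lo = static_cast<int16_t>(literal);
  auto hi = static_cast<int16_t>(literal >> 16);
  return lo == hi; // same value replicated in both halves
}

int main() {
  assert(isFoldableV216(0x00001234));  // fits in 16 bits
  assert(isFoldableV216(0x56780000));  // low half zero
  assert(isFoldableV216(0x4a3b4a3b));  // halves equal
  assert(!isFoldableV216(0x12345678)); // none of the above
}
```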
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
index e71554575f6af..26bb77f4b4c74 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
@@ -660,6 +660,9 @@ bool isInlinableLiteralV216(int32_t Literal, bool HasInv2Pi);
LLVM_READNONE
bool isInlinableIntLiteralV216(int32_t Literal);
+LLVM_READNONE
+bool isFoldableLiteralV216(int32_t Literal, bool HasInv2Pi);
+
bool isArgPassedInSGPR(const Argument *Arg);
LLVM_READONLY
diff --git a/llvm/lib/Target/PowerPC/PPCBoolRetToInt.cpp b/llvm/lib/Target/PowerPC/PPCBoolRetToInt.cpp
index 2259a29f838ab..f125ca011cd22 100644
--- a/llvm/lib/Target/PowerPC/PPCBoolRetToInt.cpp
+++ b/llvm/lib/Target/PowerPC/PPCBoolRetToInt.cpp
@@ -78,9 +78,9 @@ class PPCBoolRetToInt : public FunctionPass {
Value *Curr = WorkList.back();
WorkList.pop_back();
auto *CurrUser = dyn_cast<User>(Curr);
- // Operands of CallInst are skipped because they may not be Bool type,
- // and their positions are defined by ABI.
- if (CurrUser && !isa<CallInst>(Curr))
+ // Operands of CallInst/Constant are skipped because they may not be Bool
+ // type. For CallInst, their positions are defined by ABI.
+ if (CurrUser && !isa<CallInst>(Curr) && !isa<Constant>(Curr))
for (auto &Op : CurrUser->operands())
if (Defs.insert(Op).second)
WorkList.push_back(Op);
@@ -90,6 +90,9 @@ class PPCBoolRetToInt : public FunctionPass {
// Translate a i1 value to an equivalent i32/i64 value:
Value *translate(Value *V) {
+ assert(V->getType() == Type::getInt1Ty(V->getContext()) &&
+ "Expect an i1 value");
+
Type *IntTy = ST->isPPC64() ? Type::getInt64Ty(V->getContext())
: Type::getInt32Ty(V->getContext());
@@ -252,9 +255,9 @@ class PPCBoolRetToInt : public FunctionPass {
auto *First = dyn_cast<User>(Pair.first);
auto *Second = dyn_cast<User>(Pair.second);
assert((!First || Second) && "translated from user to non-user!?");
- // Operands of CallInst are skipped because they may not be Bool type,
- // and their positions are defined by ABI.
- if (First && !isa<CallInst>(First))
+ // Operands of CallInst/Constant are skipped because they may not be Bool
+ // type. For CallInst, their positions are defined by ABI.
+ if (First && !isa<CallInst>(First) && !isa<Constant>(First))
for (unsigned i = 0; i < First->getNumOperands(); ++i)
Second->setOperand(i, BoolToIntMap[First->getOperand(i)]);
}
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
index 5c1a4cb16568c..f54f1673526dd 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -799,7 +799,7 @@ PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM,
setOperationAction(ISD::MUL, MVT::v4f32, Legal);
setOperationAction(ISD::FMA, MVT::v4f32, Legal);
- if (TM.Options.UnsafeFPMath || Subtarget.hasVSX()) {
+ if (Subtarget.hasVSX()) {
setOperationAction(ISD::FDIV, MVT::v4f32, Legal);
setOperationAction(ISD::FSQRT, MVT::v4f32, Legal);
}
@@ -920,6 +920,8 @@ PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM,
setOperationAction(ISD::SUB, MVT::v2i64, Expand);
}
+ setOperationAction(ISD::SETCC, MVT::v1i128, Expand);
+
setOperationAction(ISD::LOAD, MVT::v2i64, Promote);
AddPromotedToType (ISD::LOAD, MVT::v2i64, MVT::v2f64);
setOperationAction(ISD::STORE, MVT::v2i64, Promote);
@@ -1258,6 +1260,9 @@ PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM,
setLibcallName(RTLIB::SRA_I128, nullptr);
}
+ if (!isPPC64)
+ setMaxAtomicSizeInBitsSupported(32);
+
setStackPointerRegisterToSaveRestore(isPPC64 ? PPC::X1 : PPC::R1);
// We have target-specific dag combine patterns for the following nodes:
@@ -1295,12 +1300,6 @@ PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM,
setTargetDAGCombine(ISD::SELECT_CC);
}
- // Use reciprocal estimates.
- if (TM.Options.UnsafeFPMath) {
- setTargetDAGCombine(ISD::FDIV);
- setTargetDAGCombine(ISD::FSQRT);
- }
-
if (Subtarget.hasP9Altivec()) {
setTargetDAGCombine(ISD::ABS);
setTargetDAGCombine(ISD::VSELECT);
diff --git a/llvm/lib/Target/PowerPC/PPCInstr64Bit.td b/llvm/lib/Target/PowerPC/PPCInstr64Bit.td
index 6956c40a70be5..de42d354a0488 100644
--- a/llvm/lib/Target/PowerPC/PPCInstr64Bit.td
+++ b/llvm/lib/Target/PowerPC/PPCInstr64Bit.td
@@ -1026,8 +1026,8 @@ def : InstAlias<"mfamr $Rx", (MFSPR8 g8rc:$Rx, 29)>;
foreach SPRG = 0-3 in {
def : InstAlias<"mfsprg $RT, "#SPRG, (MFSPR8 g8rc:$RT, !add(SPRG, 272))>;
def : InstAlias<"mfsprg"#SPRG#" $RT", (MFSPR8 g8rc:$RT, !add(SPRG, 272))>;
- def : InstAlias<"mfsprg "#SPRG#", $RT", (MTSPR8 !add(SPRG, 272), g8rc:$RT)>;
- def : InstAlias<"mfsprg"#SPRG#" $RT", (MTSPR8 !add(SPRG, 272), g8rc:$RT)>;
+ def : InstAlias<"mtsprg "#SPRG#", $RT", (MTSPR8 !add(SPRG, 272), g8rc:$RT)>;
+ def : InstAlias<"mtsprg"#SPRG#" $RT", (MTSPR8 !add(SPRG, 272), g8rc:$RT)>;
}
def : InstAlias<"mfasr $RT", (MFSPR8 g8rc:$RT, 280)>;
diff --git a/llvm/lib/Target/PowerPC/PPCMIPeephole.cpp b/llvm/lib/Target/PowerPC/PPCMIPeephole.cpp
index d2aba6bd6e8de..227c863685ae9 100644
--- a/llvm/lib/Target/PowerPC/PPCMIPeephole.cpp
+++ b/llvm/lib/Target/PowerPC/PPCMIPeephole.cpp
@@ -1555,6 +1555,8 @@ bool PPCMIPeephole::emitRLDICWhenLoweringJumpTables(MachineInstr &MI) {
MI.getOperand(1).setReg(SrcMI->getOperand(1).getReg());
MI.getOperand(2).setImm(NewSH);
MI.getOperand(3).setImm(NewMB);
+ MI.getOperand(1).setIsKill(SrcMI->getOperand(1).isKill());
+ SrcMI->getOperand(1).setIsKill(false);
LLVM_DEBUG(dbgs() << "To: ");
LLVM_DEBUG(MI.dump());
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
index a9b9eceb41304..925636c823219 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
@@ -601,8 +601,8 @@ bool WebAssemblyTargetLowering::isIntDivCheap(EVT VT,
}
bool WebAssemblyTargetLowering::isVectorLoadExtDesirable(SDValue ExtVal) const {
- MVT ExtT = ExtVal.getSimpleValueType();
- MVT MemT = cast<LoadSDNode>(ExtVal->getOperand(0))->getSimpleValueType(0);
+ EVT ExtT = ExtVal.getValueType();
+ EVT MemT = cast<LoadSDNode>(ExtVal->getOperand(0))->getValueType(0);
return (ExtT == MVT::v8i16 && MemT == MVT::v8i8) ||
(ExtT == MVT::v4i32 && MemT == MVT::v4i16) ||
(ExtT == MVT::v2i64 && MemT == MVT::v2i32);
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
index db27711f29b17..fa695c39cd1eb 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
@@ -1148,22 +1148,6 @@ static Instruction *canonicalizeAbsNabs(SelectInst &Sel, ICmpInst &Cmp,
return &Sel;
}
-static Value *simplifyWithOpReplaced(Value *V, Value *Op, Value *ReplaceOp,
- const SimplifyQuery &Q) {
- // If this is a binary operator, try to simplify it with the replaced op
- // because we know Op and ReplaceOp are equivalent.
- // For example: V = X + 1, Op = X, ReplaceOp = 42
- // Simplifies as: add(42, 1) --> 43
- if (auto *BO = dyn_cast<BinaryOperator>(V)) {
- if (BO->getOperand(0) == Op)
- return SimplifyBinOp(BO->getOpcode(), ReplaceOp, BO->getOperand(1), Q);
- if (BO->getOperand(1) == Op)
- return SimplifyBinOp(BO->getOpcode(), BO->getOperand(0), ReplaceOp, Q);
- }
-
- return nullptr;
-}
-
/// If we have a select with an equality comparison, then we know the value in
/// one of the arms of the select. See if substituting this value into an arm
/// and simplifying the result yields the same value as the other arm.
@@ -1190,20 +1174,45 @@ static Value *foldSelectValueEquivalence(SelectInst &Sel, ICmpInst &Cmp,
if (Cmp.getPredicate() == ICmpInst::ICMP_NE)
std::swap(TrueVal, FalseVal);
+ auto *FalseInst = dyn_cast<Instruction>(FalseVal);
+ if (!FalseInst)
+ return nullptr;
+
+ // InstSimplify already performed this fold if it was possible subject to
+ // current poison-generating flags. Try the transform again with
+ // poison-generating flags temporarily dropped.
+ bool WasNUW = false, WasNSW = false, WasExact = false;
+ if (auto *OBO = dyn_cast<OverflowingBinaryOperator>(FalseVal)) {
+ WasNUW = OBO->hasNoUnsignedWrap();
+ WasNSW = OBO->hasNoSignedWrap();
+ FalseInst->setHasNoUnsignedWrap(false);
+ FalseInst->setHasNoSignedWrap(false);
+ }
+ if (auto *PEO = dyn_cast<PossiblyExactOperator>(FalseVal)) {
+ WasExact = PEO->isExact();
+ FalseInst->setIsExact(false);
+ }
+
// Try each equivalence substitution possibility.
// We have an 'EQ' comparison, so the select's false value will propagate.
// Example:
// (X == 42) ? 43 : (X + 1) --> (X == 42) ? (X + 1) : (X + 1) --> X + 1
- // (X == 42) ? (X + 1) : 43 --> (X == 42) ? (42 + 1) : 43 --> 43
Value *CmpLHS = Cmp.getOperand(0), *CmpRHS = Cmp.getOperand(1);
- if (simplifyWithOpReplaced(FalseVal, CmpLHS, CmpRHS, Q) == TrueVal ||
- simplifyWithOpReplaced(FalseVal, CmpRHS, CmpLHS, Q) == TrueVal ||
- simplifyWithOpReplaced(TrueVal, CmpLHS, CmpRHS, Q) == FalseVal ||
- simplifyWithOpReplaced(TrueVal, CmpRHS, CmpLHS, Q) == FalseVal) {
- if (auto *FalseInst = dyn_cast<Instruction>(FalseVal))
- FalseInst->dropPoisonGeneratingFlags();
+ if (SimplifyWithOpReplaced(FalseVal, CmpLHS, CmpRHS, Q,
+ /* AllowRefinement */ false) == TrueVal ||
+ SimplifyWithOpReplaced(FalseVal, CmpRHS, CmpLHS, Q,
+ /* AllowRefinement */ false) == TrueVal) {
return FalseVal;
}
+
+ // Restore poison-generating flags if the transform did not apply.
+ if (WasNUW)
+ FalseInst->setHasNoUnsignedWrap();
+ if (WasNSW)
+ FalseInst->setHasNoSignedWrap();
+ if (WasExact)
+ FalseInst->setIsExact();
+
return nullptr;
}
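
The rewritten fold follows a save/drop/retry/restore pattern: clear the poison-generating flags, ask the simplifier again, and restore the flags only if the fold fails; on success the flags must stay dropped, because the substituted value need not satisfy them. A self-contained sketch of the pattern with hypothetical stand-in types:

```cpp
// Stand-in for an instruction carrying poison-generating flags.
struct Inst {
  bool NUW = false, NSW = false, Exact = false;
};

// Try a fold with flags temporarily dropped; restore them on failure.
template <typename TryFold>
bool foldWithFlagsDropped(Inst &I, TryFold tryFold) {
  Inst Saved = I;                  // remember the original flags
  I.NUW = I.NSW = I.Exact = false; // drop flags that block the fold
  if (tryFold(I))
    return true;                   // success: flags stay dropped
  I = Saved;                       // failure: instruction is unchanged
  return false;
}
```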
diff --git a/llvm/test/CodeGen/AArch64/arm64-fast-isel-conversion-fallback.ll b/llvm/test/CodeGen/AArch64/arm64-fast-isel-conversion-fallback.ll
index 7c546936ba27a..392af063eb8a0 100644
--- a/llvm/test/CodeGen/AArch64/arm64-fast-isel-conversion-fallback.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-fast-isel-conversion-fallback.ll
@@ -4,8 +4,8 @@
define i32 @fptosi_wh(half %a) nounwind ssp {
entry:
; CHECK-LABEL: fptosi_wh
-; CHECK: fcvt s0, h0
-; CHECK: fcvtzs [[REG:w[0-9]+]], s0
+; CHECK: fcvt s1, h0
+; CHECK: fcvtzs [[REG:w[0-9]+]], s1
; CHECK: mov w0, [[REG]]
%conv = fptosi half %a to i32
ret i32 %conv
@@ -15,8 +15,8 @@ entry:
define i32 @fptoui_swh(half %a) nounwind ssp {
entry:
; CHECK-LABEL: fptoui_swh
-; CHECK: fcvt s0, h0
-; CHECK: fcvtzu [[REG:w[0-9]+]], s0
+; CHECK: fcvt s1, h0
+; CHECK: fcvtzu [[REG:w[0-9]+]], s1
; CHECK: mov w0, [[REG]]
%conv = fptoui half %a to i32
ret i32 %conv
diff --git a/llvm/test/CodeGen/AArch64/arm64-fast-isel-conversion.ll b/llvm/test/CodeGen/AArch64/arm64-fast-isel-conversion.ll
index d8abf14c1366b..ed03aec07e7da 100644
--- a/llvm/test/CodeGen/AArch64/arm64-fast-isel-conversion.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-fast-isel-conversion.ll
@@ -54,8 +54,8 @@ entry:
; CHECK: ldrh w8, [sp, #12]
; CHECK: str w8, [sp, #8]
; CHECK: ldr w8, [sp, #8]
-; CHECK: ; kill: def $x8 killed $w8
-; CHECK: str x8, [sp]
+; CHECK: mov x9, x8
+; CHECK: str x9, [sp]
; CHECK: ldr x0, [sp]
; CHECK: ret
%a.addr = alloca i8, align 1
@@ -109,8 +109,8 @@ entry:
; CHECK: strh w8, [sp, #12]
; CHECK: ldrsh w8, [sp, #12]
; CHECK: str w8, [sp, #8]
-; CHECK: ldrsw x8, [sp, #8]
-; CHECK: str x8, [sp]
+; CHECK: ldrsw x9, [sp, #8]
+; CHECK: str x9, [sp]
; CHECK: ldr x0, [sp]
; CHECK: ret
%a.addr = alloca i8, align 1
diff --git a/llvm/test/CodeGen/AArch64/arm64-vcvt_f.ll b/llvm/test/CodeGen/AArch64/arm64-vcvt_f.ll
index e1e889b906c01..6b3e8d747d43d 100644
--- a/llvm/test/CodeGen/AArch64/arm64-vcvt_f.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-vcvt_f.ll
@@ -285,11 +285,11 @@ define i16 @to_half(float %in) {
; FAST: // %bb.0:
; FAST-NEXT: sub sp, sp, #16 // =16
; FAST-NEXT: .cfi_def_cfa_offset 16
-; FAST-NEXT: fcvt h0, s0
+; FAST-NEXT: fcvt h1, s0
; FAST-NEXT: // implicit-def: $w0
-; FAST-NEXT: fmov s1, w0
-; FAST-NEXT: mov.16b v1, v0
-; FAST-NEXT: fmov w8, s1
+; FAST-NEXT: fmov s0, w0
+; FAST-NEXT: mov.16b v0, v1
+; FAST-NEXT: fmov w8, s0
; FAST-NEXT: mov w0, w8
; FAST-NEXT: str w0, [sp, #12] // 4-byte Folded Spill
; FAST-NEXT: mov w0, w8
diff --git a/llvm/test/CodeGen/AArch64/emutls.ll b/llvm/test/CodeGen/AArch64/emutls.ll
index 85d2c1a3b3151..25be391bbfaa4 100644
--- a/llvm/test/CodeGen/AArch64/emutls.ll
+++ b/llvm/test/CodeGen/AArch64/emutls.ll
@@ -155,7 +155,6 @@ entry:
; ARM64: .data{{$}}
; ARM64: .globl __emutls_v.i4
; ARM64-LABEL: __emutls_v.i4:
-; ARM64-NEXT: .L__emutls_v.i4$local:
; ARM64-NEXT: .xword 4
; ARM64-NEXT: .xword 4
; ARM64-NEXT: .xword 0
@@ -163,7 +162,6 @@ entry:
; ARM64: .section .rodata,
; ARM64-LABEL: __emutls_t.i4:
-; ARM64-NEXT: .L__emutls_t.i4$local:
; ARM64-NEXT: .word 15
; ARM64-NOT: __emutls_v.i5:
diff --git a/llvm/test/CodeGen/AArch64/fast-isel-sp-adjust.ll b/llvm/test/CodeGen/AArch64/fast-isel-sp-adjust.ll
index 22e3ccf2b1209..8d62fb3556661 100644
--- a/llvm/test/CodeGen/AArch64/fast-isel-sp-adjust.ll
+++ b/llvm/test/CodeGen/AArch64/fast-isel-sp-adjust.ll
@@ -15,7 +15,8 @@
; CHECK-LABEL: foo:
; CHECK: sub
; CHECK-DAG: mov x[[SP:[0-9]+]], sp
-; CHECK-DAG: mov w[[OFFSET:[0-9]+]], #4104
+; CHECK-DAG: mov [[TMP:w[0-9]+]], #4104
+; CHECK: mov w[[OFFSET:[0-9]+]], [[TMP]]
; CHECK: strb w0, [x[[SP]], x[[OFFSET]]]
define void @foo(i8 %in) {
diff --git a/llvm/test/CodeGen/AArch64/framelayout-sve-calleesaves-fix.mir b/llvm/test/CodeGen/AArch64/framelayout-sve-calleesaves-fix.mir
new file mode 100644
index 0000000000000..a3cbd39c6531f
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/framelayout-sve-calleesaves-fix.mir
@@ -0,0 +1,36 @@
+# NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+# RUN: llc -mattr=+sve -mtriple=aarch64-none-linux-gnu -start-before=prologepilog %s -o - | FileCheck %s
+
+--- |
+ define aarch64_sve_vector_pcs void @fix_restorepoint_p4() { entry: unreachable }
+ ; CHECK-LABEL: fix_restorepoint_p4:
+ ; CHECK: // %bb.0: // %entry
+ ; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
+ ; CHECK-NEXT: addvl sp, sp, #-2
+ ; CHECK-NEXT: str p4, [sp, #7, mul vl] // 2-byte Folded Spill
+ ; CHECK-NEXT: str z8, [sp, #1, mul vl] // 16-byte Folded Spill
+ ; CHECK-NEXT: addvl sp, sp, #-1
+ ; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x18, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 24 * VG
+ ; CHECK-NEXT: .cfi_escape 0x10, 0x48, 0x0a, 0x11, 0x70, 0x22, 0x11, 0x78, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d8 @ cfa - 16 - 8 * VG
+ ; CHECK-NEXT: .cfi_offset w29, -16
+ ; CHECK-NEXT: // implicit-def: $z8
+ ; CHECK-NEXT: // implicit-def: $p4
+ ; CHECK-NEXT: addvl sp, sp, #1
+ ; CHECK-NEXT: ldr p4, [sp, #7, mul vl] // 2-byte Folded Reload
+ ; CHECK-NEXT: ldr z8, [sp, #1, mul vl] // 16-byte Folded Reload
+ ; CHECK-NEXT: addvl sp, sp, #2
+ ; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
+ ; CHECK-NEXT: ret
+...
+name: fix_restorepoint_p4
+stack:
+ - { id: 0, stack-id: sve-vec, size: 16, alignment: 16 }
+body: |
+ bb.0.entry:
+ $z8 = IMPLICIT_DEF
+ $p4 = IMPLICIT_DEF
+ B %bb.1
+
+ bb.1.entry:
+ RET_ReallyLR
+---
diff --git a/llvm/test/CodeGen/AArch64/popcount.ll b/llvm/test/CodeGen/AArch64/popcount.ll
index 105969717e46b..1e796fff710c0 100644
--- a/llvm/test/CodeGen/AArch64/popcount.ll
+++ b/llvm/test/CodeGen/AArch64/popcount.ll
@@ -10,11 +10,12 @@ define i8 @popcount128(i128* nocapture nonnull readonly %0) {
; CHECK-NEXT: // implicit-def: $q1
; CHECK-NEXT: mov v1.16b, v0.16b
; CHECK-NEXT: mov v1.d[1], x8
-; CHECK-NEXT: cnt v0.16b, v1.16b
-; CHECK-NEXT: uaddlv h0, v0.16b
+; CHECK-NEXT: cnt v1.16b, v1.16b
+; CHECK-NEXT: uaddlv h2, v1.16b
; CHECK-NEXT: // implicit-def: $q1
-; CHECK-NEXT: mov v1.16b, v0.16b
-; CHECK-NEXT: fmov w0, s1
+; CHECK-NEXT: mov v1.16b, v2.16b
+; CHECK-NEXT: fmov w1, s1
+; CHECK-NEXT: mov w0, w1
; CHECK-NEXT: ret
Entry:
%1 = load i128, i128* %0, align 16
@@ -36,21 +37,21 @@ define i16 @popcount256(i256* nocapture nonnull readonly %0) {
; CHECK-NEXT: // implicit-def: $q1
; CHECK-NEXT: mov v1.16b, v0.16b
; CHECK-NEXT: mov v1.d[1], x9
-; CHECK-NEXT: cnt v0.16b, v1.16b
-; CHECK-NEXT: uaddlv h0, v0.16b
+; CHECK-NEXT: cnt v1.16b, v1.16b
+; CHECK-NEXT: uaddlv h2, v1.16b
; CHECK-NEXT: // implicit-def: $q1
-; CHECK-NEXT: mov v1.16b, v0.16b
-; CHECK-NEXT: fmov w9, s1
+; CHECK-NEXT: mov v1.16b, v2.16b
+; CHECK-NEXT: fmov w10, s1
; CHECK-NEXT: ldr d0, [x0]
; CHECK-NEXT: // implicit-def: $q1
; CHECK-NEXT: mov v1.16b, v0.16b
; CHECK-NEXT: mov v1.d[1], x8
-; CHECK-NEXT: cnt v0.16b, v1.16b
-; CHECK-NEXT: uaddlv h0, v0.16b
+; CHECK-NEXT: cnt v1.16b, v1.16b
+; CHECK-NEXT: uaddlv h2, v1.16b
; CHECK-NEXT: // implicit-def: $q1
-; CHECK-NEXT: mov v1.16b, v0.16b
-; CHECK-NEXT: fmov w8, s1
-; CHECK-NEXT: add w0, w8, w9
+; CHECK-NEXT: mov v1.16b, v2.16b
+; CHECK-NEXT: fmov w11, s1
+; CHECK-NEXT: add w0, w11, w10
; CHECK-NEXT: ret
Entry:
%1 = load i256, i256* %0, align 16
@@ -69,11 +70,11 @@ define <1 x i128> @popcount1x128(<1 x i128> %0) {
; CHECK-NEXT: fmov d0, x0
; CHECK-NEXT: mov v0.d[1], x1
; CHECK-NEXT: cnt v0.16b, v0.16b
-; CHECK-NEXT: uaddlv h0, v0.16b
-; CHECK-NEXT: // implicit-def: $q1
-; CHECK-NEXT: mov v1.16b, v0.16b
-; CHECK-NEXT: fmov w0, s1
-; CHECK-NEXT: // kill: def $x0 killed $w0
+; CHECK-NEXT: uaddlv h1, v0.16b
+; CHECK-NEXT: // implicit-def: $q0
+; CHECK-NEXT: mov v0.16b, v1.16b
+; CHECK-NEXT: fmov w2, s0
+; CHECK-NEXT: mov w0, w2
; CHECK-NEXT: movi v0.2d, #0000000000000000
; CHECK-NEXT: mov x1, v0.d[1]
; CHECK-NEXT: ret
diff --git a/llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll b/llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll
index e26b1c9471049..40ef3b00da6d4 100644
--- a/llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll
+++ b/llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll
@@ -69,15 +69,15 @@ define amdgpu_kernel void @extract_w_offset_vgpr(i32 addrspace(1)* %out) {
; GCN: renamable $vgpr30 = COPY killed renamable $vgpr14
; GCN: renamable $vgpr31 = COPY killed renamable $vgpr15
; GCN: renamable $vgpr32 = COPY killed renamable $vgpr16
- ; GCN: renamable $sgpr0_sgpr1 = S_MOV_B64 $exec
+ ; GCN: renamable $sgpr20_sgpr21 = S_MOV_B64 $exec
; GCN: renamable $vgpr1 = IMPLICIT_DEF
- ; GCN: renamable $sgpr2_sgpr3 = IMPLICIT_DEF
+ ; GCN: renamable $sgpr22_sgpr23 = IMPLICIT_DEF
; GCN: SI_SPILL_V32_SAVE killed $vgpr0, %stack.0, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)
; GCN: SI_SPILL_S128_SAVE killed $sgpr4_sgpr5_sgpr6_sgpr7, %stack.1, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store 16 into %stack.1, align 4, addrspace 5)
; GCN: SI_SPILL_V512_SAVE killed $vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31_vgpr32, %stack.2, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 64 into %stack.2, align 4, addrspace 5)
- ; GCN: SI_SPILL_S64_SAVE killed $sgpr0_sgpr1, %stack.3, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store 8 into %stack.3, align 4, addrspace 5)
+ ; GCN: SI_SPILL_S64_SAVE killed $sgpr20_sgpr21, %stack.3, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store 8 into %stack.3, align 4, addrspace 5)
; GCN: SI_SPILL_V32_SAVE killed $vgpr1, %stack.4, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)
- ; GCN: SI_SPILL_S64_SAVE killed $sgpr2_sgpr3, %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store 8 into %stack.5, align 4, addrspace 5)
+ ; GCN: SI_SPILL_S64_SAVE killed $sgpr22_sgpr23, %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store 8 into %stack.5, align 4, addrspace 5)
; GCN: bb.1:
; GCN: successors: %bb.1(0x40000000), %bb.3(0x40000000)
; GCN: $sgpr0_sgpr1 = SI_SPILL_S64_RESTORE %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (load 8 from %stack.5, align 4, addrspace 5)
@@ -91,8 +91,8 @@ define amdgpu_kernel void @extract_w_offset_vgpr(i32 addrspace(1)* %out) {
; GCN: renamable $vgpr18 = V_MOV_B32_e32 undef $vgpr3, implicit $exec, implicit killed $vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17, implicit $m0
; GCN: S_SET_GPR_IDX_OFF implicit-def $mode, implicit $mode
; GCN: renamable $vgpr19 = COPY renamable $vgpr18
- ; GCN: renamable $sgpr2_sgpr3 = COPY renamable $sgpr4_sgpr5
- ; GCN: SI_SPILL_S64_SAVE killed $sgpr2_sgpr3, %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store 8 into %stack.5, align 4, addrspace 5)
+ ; GCN: renamable $sgpr6_sgpr7 = COPY renamable $sgpr4_sgpr5
+ ; GCN: SI_SPILL_S64_SAVE killed $sgpr6_sgpr7, %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store 8 into %stack.5, align 4, addrspace 5)
; GCN: SI_SPILL_S64_SAVE killed $sgpr0_sgpr1, %stack.6, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store 8 into %stack.6, align 4, addrspace 5)
; GCN: SI_SPILL_V32_SAVE killed $vgpr19, %stack.4, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)
; GCN: SI_SPILL_V32_SAVE killed $vgpr0, %stack.7, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 4 into %stack.7, addrspace 5)
diff --git a/llvm/test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll b/llvm/test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll
index b119ffd303e08..e991c550c6be0 100644
--- a/llvm/test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll
+++ b/llvm/test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll
@@ -11,7 +11,7 @@
define amdgpu_kernel void @spill_sgprs_to_multiple_vgprs(i32 addrspace(1)* %out, i32 %in) #0 {
; GCN-LABEL: spill_sgprs_to_multiple_vgprs:
; GCN: ; %bb.0:
-; GCN-NEXT: s_load_dword s0, s[0:1], 0xb
+; GCN-NEXT: s_load_dword s2, s[0:1], 0xb
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:11]
; GCN-NEXT: ;;#ASMEND
@@ -42,354 +42,352 @@ define amdgpu_kernel void @spill_sgprs_to_multiple_vgprs(i32 addrspace(1)* %out,
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[84:91]
; GCN-NEXT: ;;#ASMEND
+; GCN-NEXT: v_writelane_b32 v0, s4, 0
+; GCN-NEXT: v_writelane_b32 v0, s5, 1
+; GCN-NEXT: v_writelane_b32 v0, s6, 2
+; GCN-NEXT: v_writelane_b32 v0, s7, 3
+; GCN-NEXT: v_writelane_b32 v0, s8, 4
+; GCN-NEXT: v_writelane_b32 v0, s9, 5
+; GCN-NEXT: v_writelane_b32 v0, s10, 6
+; GCN-NEXT: v_writelane_b32 v0, s11, 7
+; GCN-NEXT: ;;#ASMSTART
+; GCN-NEXT: ; def s[4:11]
+; GCN-NEXT: ;;#ASMEND
+; GCN-NEXT: v_writelane_b32 v0, s4, 8
+; GCN-NEXT: v_writelane_b32 v0, s5, 9
+; GCN-NEXT: v_writelane_b32 v0, s6, 10
+; GCN-NEXT: v_writelane_b32 v0, s7, 11
+; GCN-NEXT: v_writelane_b32 v0, s8, 12
+; GCN-NEXT: v_writelane_b32 v0, s9, 13
+; GCN-NEXT: v_writelane_b32 v0, s10, 14
+; GCN-NEXT: v_writelane_b32 v0, s11, 15
+; GCN-NEXT: ;;#ASMSTART
+; GCN-NEXT: ; def s[4:11]
+; GCN-NEXT: ;;#ASMEND
+; GCN-NEXT: v_writelane_b32 v0, s4, 16
+; GCN-NEXT: v_writelane_b32 v0, s5, 17
+; GCN-NEXT: v_writelane_b32 v0, s6, 18
+; GCN-NEXT: v_writelane_b32 v0, s7, 19
+; GCN-NEXT: v_writelane_b32 v0, s8, 20
+; GCN-NEXT: v_writelane_b32 v0, s9, 21
+; GCN-NEXT: v_writelane_b32 v0, s10, 22
+; GCN-NEXT: v_writelane_b32 v0, s11, 23
+; GCN-NEXT: ;;#ASMSTART
+; GCN-NEXT: ; def s[4:11]
+; GCN-NEXT: ;;#ASMEND
+; GCN-NEXT: v_writelane_b32 v0, s4, 24
+; GCN-NEXT: v_writelane_b32 v0, s5, 25
+; GCN-NEXT: v_writelane_b32 v0, s6, 26
+; GCN-NEXT: v_writelane_b32 v0, s7, 27
+; GCN-NEXT: v_writelane_b32 v0, s8, 28
+; GCN-NEXT: v_writelane_b32 v0, s9, 29
+; GCN-NEXT: v_writelane_b32 v0, s10, 30
+; GCN-NEXT: v_writelane_b32 v0, s11, 31
+; GCN-NEXT: ;;#ASMSTART
+; GCN-NEXT: ; def s[4:11]
+; GCN-NEXT: ;;#ASMEND
+; GCN-NEXT: v_writelane_b32 v0, s4, 32
+; GCN-NEXT: v_writelane_b32 v0, s5, 33
+; GCN-NEXT: v_writelane_b32 v0, s6, 34
+; GCN-NEXT: v_writelane_b32 v0, s7, 35
+; GCN-NEXT: v_writelane_b32 v0, s8, 36
+; GCN-NEXT: v_writelane_b32 v0, s9, 37
+; GCN-NEXT: v_writelane_b32 v0, s10, 38
+; GCN-NEXT: v_writelane_b32 v0, s11, 39
+; GCN-NEXT: ;;#ASMSTART
+; GCN-NEXT: ; def s[4:11]
+; GCN-NEXT: ;;#ASMEND
+; GCN-NEXT: v_writelane_b32 v0, s4, 40
+; GCN-NEXT: v_writelane_b32 v0, s5, 41
+; GCN-NEXT: v_writelane_b32 v0, s6, 42
+; GCN-NEXT: v_writelane_b32 v0, s7, 43
+; GCN-NEXT: v_writelane_b32 v0, s8, 44
+; GCN-NEXT: v_writelane_b32 v0, s9, 45
+; GCN-NEXT: v_writelane_b32 v0, s10, 46
+; GCN-NEXT: v_writelane_b32 v0, s11, 47
+; GCN-NEXT: ;;#ASMSTART
+; GCN-NEXT: ; def s[4:11]
+; GCN-NEXT: ;;#ASMEND
+; GCN-NEXT: v_writelane_b32 v0, s4, 48
+; GCN-NEXT: v_writelane_b32 v0, s5, 49
+; GCN-NEXT: v_writelane_b32 v0, s6, 50
+; GCN-NEXT: v_writelane_b32 v0, s7, 51
+; GCN-NEXT: v_writelane_b32 v0, s8, 52
+; GCN-NEXT: v_writelane_b32 v0, s9, 53
+; GCN-NEXT: v_writelane_b32 v0, s10, 54
+; GCN-NEXT: v_writelane_b32 v0, s11, 55
+; GCN-NEXT: ;;#ASMSTART
+; GCN-NEXT: ; def s[4:11]
+; GCN-NEXT: ;;#ASMEND
+; GCN-NEXT: s_mov_b32 s3, 0
; GCN-NEXT: s_waitcnt lgkmcnt(0)
-; GCN-NEXT: v_writelane_b32 v0, s0, 0
-; GCN-NEXT: v_writelane_b32 v0, s4, 1
-; GCN-NEXT: v_writelane_b32 v0, s5, 2
-; GCN-NEXT: v_writelane_b32 v0, s6, 3
-; GCN-NEXT: v_writelane_b32 v0, s7, 4
-; GCN-NEXT: v_writelane_b32 v0, s8, 5
-; GCN-NEXT: v_writelane_b32 v0, s9, 6
-; GCN-NEXT: v_writelane_b32 v0, s10, 7
-; GCN-NEXT: v_writelane_b32 v0, s11, 8
-; GCN-NEXT: ;;#ASMSTART
-; GCN-NEXT: ; def s[0:7]
-; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_writelane_b32 v0, s0, 9
-; GCN-NEXT: v_writelane_b32 v0, s1, 10
-; GCN-NEXT: v_writelane_b32 v0, s2, 11
-; GCN-NEXT: v_writelane_b32 v0, s3, 12
-; GCN-NEXT: v_writelane_b32 v0, s4, 13
-; GCN-NEXT: v_writelane_b32 v0, s5, 14
-; GCN-NEXT: v_writelane_b32 v0, s6, 15
-; GCN-NEXT: v_writelane_b32 v0, s7, 16
-; GCN-NEXT: ;;#ASMSTART
-; GCN-NEXT: ; def s[0:7]
-; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_writelane_b32 v0, s0, 17
-; GCN-NEXT: v_writelane_b32 v0, s1, 18
-; GCN-NEXT: v_writelane_b32 v0, s2, 19
-; GCN-NEXT: v_writelane_b32 v0, s3, 20
-; GCN-NEXT: v_writelane_b32 v0, s4, 21
-; GCN-NEXT: v_writelane_b32 v0, s5, 22
-; GCN-NEXT: v_writelane_b32 v0, s6, 23
-; GCN-NEXT: v_writelane_b32 v0, s7, 24
-; GCN-NEXT: ;;#ASMSTART
-; GCN-NEXT: ; def s[0:7]
-; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_writelane_b32 v0, s0, 25
-; GCN-NEXT: v_writelane_b32 v0, s1, 26
-; GCN-NEXT: v_writelane_b32 v0, s2, 27
-; GCN-NEXT: v_writelane_b32 v0, s3, 28
-; GCN-NEXT: v_writelane_b32 v0, s4, 29
-; GCN-NEXT: v_writelane_b32 v0, s5, 30
-; GCN-NEXT: v_writelane_b32 v0, s6, 31
-; GCN-NEXT: v_writelane_b32 v0, s7, 32
-; GCN-NEXT: ;;#ASMSTART
-; GCN-NEXT: ; def s[0:7]
-; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_writelane_b32 v0, s0, 33
-; GCN-NEXT: v_writelane_b32 v0, s1, 34
-; GCN-NEXT: v_writelane_b32 v0, s2, 35
-; GCN-NEXT: v_writelane_b32 v0, s3, 36
-; GCN-NEXT: v_writelane_b32 v0, s4, 37
-; GCN-NEXT: v_writelane_b32 v0, s5, 38
-; GCN-NEXT: v_writelane_b32 v0, s6, 39
-; GCN-NEXT: v_writelane_b32 v0, s7, 40
-; GCN-NEXT: ;;#ASMSTART
-; GCN-NEXT: ; def s[0:7]
-; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_writelane_b32 v0, s0, 41
-; GCN-NEXT: v_writelane_b32 v0, s1, 42
-; GCN-NEXT: v_writelane_b32 v0, s2, 43
-; GCN-NEXT: v_writelane_b32 v0, s3, 44
-; GCN-NEXT: v_writelane_b32 v0, s4, 45
-; GCN-NEXT: v_writelane_b32 v0, s5, 46
-; GCN-NEXT: v_writelane_b32 v0, s6, 47
-; GCN-NEXT: v_writelane_b32 v0, s7, 48
-; GCN-NEXT: ;;#ASMSTART
-; GCN-NEXT: ; def s[0:7]
-; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_writelane_b32 v0, s0, 49
-; GCN-NEXT: v_writelane_b32 v0, s1, 50
-; GCN-NEXT: v_writelane_b32 v0, s2, 51
-; GCN-NEXT: v_writelane_b32 v0, s3, 52
-; GCN-NEXT: v_writelane_b32 v0, s4, 53
-; GCN-NEXT: v_writelane_b32 v0, s5, 54
-; GCN-NEXT: v_writelane_b32 v0, s6, 55
-; GCN-NEXT: v_writelane_b32 v0, s7, 56
-; GCN-NEXT: ;;#ASMSTART
-; GCN-NEXT: ; def s[0:7]
-; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: s_mov_b32 s8, 0
-; GCN-NEXT: v_readlane_b32 s9, v0, 0
-; GCN-NEXT: s_cmp_lg_u32 s9, s8
-; GCN-NEXT: v_writelane_b32 v0, s12, 57
-; GCN-NEXT: v_writelane_b32 v0, s13, 58
-; GCN-NEXT: v_writelane_b32 v0, s14, 59
-; GCN-NEXT: v_writelane_b32 v0, s15, 60
-; GCN-NEXT: v_writelane_b32 v0, s16, 61
-; GCN-NEXT: v_writelane_b32 v0, s17, 62
-; GCN-NEXT: v_writelane_b32 v0, s18, 63
-; GCN-NEXT: v_writelane_b32 v1, s19, 0
-; GCN-NEXT: v_writelane_b32 v1, s20, 1
-; GCN-NEXT: v_writelane_b32 v1, s21, 2
-; GCN-NEXT: v_writelane_b32 v1, s22, 3
-; GCN-NEXT: v_writelane_b32 v1, s23, 4
-; GCN-NEXT: v_writelane_b32 v1, s24, 5
-; GCN-NEXT: v_writelane_b32 v1, s25, 6
-; GCN-NEXT: v_writelane_b32 v1, s26, 7
-; GCN-NEXT: v_writelane_b32 v1, s27, 8
-; GCN-NEXT: v_writelane_b32 v1, s36, 9
-; GCN-NEXT: v_writelane_b32 v1, s37, 10
-; GCN-NEXT: v_writelane_b32 v1, s38, 11
-; GCN-NEXT: v_writelane_b32 v1, s39, 12
-; GCN-NEXT: v_writelane_b32 v1, s40, 13
-; GCN-NEXT: v_writelane_b32 v1, s41, 14
-; GCN-NEXT: v_writelane_b32 v1, s42, 15
-; GCN-NEXT: v_writelane_b32 v1, s43, 16
-; GCN-NEXT: v_writelane_b32 v1, s44, 17
-; GCN-NEXT: v_writelane_b32 v1, s45, 18
-; GCN-NEXT: v_writelane_b32 v1, s46, 19
-; GCN-NEXT: v_writelane_b32 v1, s47, 20
-; GCN-NEXT: v_writelane_b32 v1, s48, 21
-; GCN-NEXT: v_writelane_b32 v1, s49, 22
-; GCN-NEXT: v_writelane_b32 v1, s50, 23
-; GCN-NEXT: v_writelane_b32 v1, s51, 24
-; GCN-NEXT: v_writelane_b32 v1, s52, 25
-; GCN-NEXT: v_writelane_b32 v1, s53, 26
-; GCN-NEXT: v_writelane_b32 v1, s54, 27
-; GCN-NEXT: v_writelane_b32 v1, s55, 28
-; GCN-NEXT: v_writelane_b32 v1, s56, 29
-; GCN-NEXT: v_writelane_b32 v1, s57, 30
-; GCN-NEXT: v_writelane_b32 v1, s58, 31
-; GCN-NEXT: v_writelane_b32 v1, s59, 32
-; GCN-NEXT: v_writelane_b32 v1, s60, 33
-; GCN-NEXT: v_writelane_b32 v1, s61, 34
-; GCN-NEXT: v_writelane_b32 v1, s62, 35
-; GCN-NEXT: v_writelane_b32 v1, s63, 36
-; GCN-NEXT: v_writelane_b32 v1, s64, 37
-; GCN-NEXT: v_writelane_b32 v1, s65, 38
-; GCN-NEXT: v_writelane_b32 v1, s66, 39
-; GCN-NEXT: v_writelane_b32 v1, s67, 40
-; GCN-NEXT: v_writelane_b32 v1, s68, 41
-; GCN-NEXT: v_writelane_b32 v1, s69, 42
-; GCN-NEXT: v_writelane_b32 v1, s70, 43
-; GCN-NEXT: v_writelane_b32 v1, s71, 44
-; GCN-NEXT: v_writelane_b32 v1, s72, 45
-; GCN-NEXT: v_writelane_b32 v1, s73, 46
-; GCN-NEXT: v_writelane_b32 v1, s74, 47
-; GCN-NEXT: v_writelane_b32 v1, s75, 48
-; GCN-NEXT: v_writelane_b32 v1, s76, 49
-; GCN-NEXT: v_writelane_b32 v1, s77, 50
-; GCN-NEXT: v_writelane_b32 v1, s78, 51
-; GCN-NEXT: v_writelane_b32 v1, s79, 52
-; GCN-NEXT: v_writelane_b32 v1, s80, 53
-; GCN-NEXT: v_writelane_b32 v1, s81, 54
-; GCN-NEXT: v_writelane_b32 v1, s82, 55
-; GCN-NEXT: v_writelane_b32 v1, s83, 56
-; GCN-NEXT: v_writelane_b32 v1, s84, 57
-; GCN-NEXT: v_writelane_b32 v1, s85, 58
-; GCN-NEXT: v_writelane_b32 v1, s86, 59
-; GCN-NEXT: v_writelane_b32 v1, s87, 60
-; GCN-NEXT: v_writelane_b32 v1, s88, 61
-; GCN-NEXT: v_writelane_b32 v1, s89, 62
-; GCN-NEXT: v_writelane_b32 v1, s90, 63
-; GCN-NEXT: v_writelane_b32 v2, s91, 0
-; GCN-NEXT: v_writelane_b32 v2, s0, 1
-; GCN-NEXT: v_writelane_b32 v2, s1, 2
-; GCN-NEXT: v_writelane_b32 v2, s2, 3
-; GCN-NEXT: v_writelane_b32 v2, s3, 4
-; GCN-NEXT: v_writelane_b32 v2, s4, 5
-; GCN-NEXT: v_writelane_b32 v2, s5, 6
-; GCN-NEXT: v_writelane_b32 v2, s6, 7
-; GCN-NEXT: v_writelane_b32 v2, s7, 8
+; GCN-NEXT: s_cmp_lg_u32 s2, s3
+; GCN-NEXT: v_writelane_b32 v0, s12, 56
+; GCN-NEXT: v_writelane_b32 v0, s13, 57
+; GCN-NEXT: v_writelane_b32 v0, s14, 58
+; GCN-NEXT: v_writelane_b32 v0, s15, 59
+; GCN-NEXT: v_writelane_b32 v0, s16, 60
+; GCN-NEXT: v_writelane_b32 v0, s17, 61
+; GCN-NEXT: v_writelane_b32 v0, s18, 62
+; GCN-NEXT: v_writelane_b32 v0, s19, 63
+; GCN-NEXT: v_writelane_b32 v1, s20, 0
+; GCN-NEXT: v_writelane_b32 v1, s21, 1
+; GCN-NEXT: v_writelane_b32 v1, s22, 2
+; GCN-NEXT: v_writelane_b32 v1, s23, 3
+; GCN-NEXT: v_writelane_b32 v1, s24, 4
+; GCN-NEXT: v_writelane_b32 v1, s25, 5
+; GCN-NEXT: v_writelane_b32 v1, s26, 6
+; GCN-NEXT: v_writelane_b32 v1, s27, 7
+; GCN-NEXT: v_writelane_b32 v1, s36, 8
+; GCN-NEXT: v_writelane_b32 v1, s37, 9
+; GCN-NEXT: v_writelane_b32 v1, s38, 10
+; GCN-NEXT: v_writelane_b32 v1, s39, 11
+; GCN-NEXT: v_writelane_b32 v1, s40, 12
+; GCN-NEXT: v_writelane_b32 v1, s41, 13
+; GCN-NEXT: v_writelane_b32 v1, s42, 14
+; GCN-NEXT: v_writelane_b32 v1, s43, 15
+; GCN-NEXT: v_writelane_b32 v1, s44, 16
+; GCN-NEXT: v_writelane_b32 v1, s45, 17
+; GCN-NEXT: v_writelane_b32 v1, s46, 18
+; GCN-NEXT: v_writelane_b32 v1, s47, 19
+; GCN-NEXT: v_writelane_b32 v1, s48, 20
+; GCN-NEXT: v_writelane_b32 v1, s49, 21
+; GCN-NEXT: v_writelane_b32 v1, s50, 22
+; GCN-NEXT: v_writelane_b32 v1, s51, 23
+; GCN-NEXT: v_writelane_b32 v1, s52, 24
+; GCN-NEXT: v_writelane_b32 v1, s53, 25
+; GCN-NEXT: v_writelane_b32 v1, s54, 26
+; GCN-NEXT: v_writelane_b32 v1, s55, 27
+; GCN-NEXT: v_writelane_b32 v1, s56, 28
+; GCN-NEXT: v_writelane_b32 v1, s57, 29
+; GCN-NEXT: v_writelane_b32 v1, s58, 30
+; GCN-NEXT: v_writelane_b32 v1, s59, 31
+; GCN-NEXT: v_writelane_b32 v1, s60, 32
+; GCN-NEXT: v_writelane_b32 v1, s61, 33
+; GCN-NEXT: v_writelane_b32 v1, s62, 34
+; GCN-NEXT: v_writelane_b32 v1, s63, 35
+; GCN-NEXT: v_writelane_b32 v1, s64, 36
+; GCN-NEXT: v_writelane_b32 v1, s65, 37
+; GCN-NEXT: v_writelane_b32 v1, s66, 38
+; GCN-NEXT: v_writelane_b32 v1, s67, 39
+; GCN-NEXT: v_writelane_b32 v1, s68, 40
+; GCN-NEXT: v_writelane_b32 v1, s69, 41
+; GCN-NEXT: v_writelane_b32 v1, s70, 42
+; GCN-NEXT: v_writelane_b32 v1, s71, 43
+; GCN-NEXT: v_writelane_b32 v1, s72, 44
+; GCN-NEXT: v_writelane_b32 v1, s73, 45
+; GCN-NEXT: v_writelane_b32 v1, s74, 46
+; GCN-NEXT: v_writelane_b32 v1, s75, 47
+; GCN-NEXT: v_writelane_b32 v1, s76, 48
+; GCN-NEXT: v_writelane_b32 v1, s77, 49
+; GCN-NEXT: v_writelane_b32 v1, s78, 50
+; GCN-NEXT: v_writelane_b32 v1, s79, 51
+; GCN-NEXT: v_writelane_b32 v1, s80, 52
+; GCN-NEXT: v_writelane_b32 v1, s81, 53
+; GCN-NEXT: v_writelane_b32 v1, s82, 54
+; GCN-NEXT: v_writelane_b32 v1, s83, 55
+; GCN-NEXT: v_writelane_b32 v1, s84, 56
+; GCN-NEXT: v_writelane_b32 v1, s85, 57
+; GCN-NEXT: v_writelane_b32 v1, s86, 58
+; GCN-NEXT: v_writelane_b32 v1, s87, 59
+; GCN-NEXT: v_writelane_b32 v1, s88, 60
+; GCN-NEXT: v_writelane_b32 v1, s89, 61
+; GCN-NEXT: v_writelane_b32 v1, s90, 62
+; GCN-NEXT: v_writelane_b32 v1, s91, 63
+; GCN-NEXT: v_writelane_b32 v2, s4, 0
+; GCN-NEXT: v_writelane_b32 v2, s5, 1
+; GCN-NEXT: v_writelane_b32 v2, s6, 2
+; GCN-NEXT: v_writelane_b32 v2, s7, 3
+; GCN-NEXT: v_writelane_b32 v2, s8, 4
+; GCN-NEXT: v_writelane_b32 v2, s9, 5
+; GCN-NEXT: v_writelane_b32 v2, s10, 6
+; GCN-NEXT: v_writelane_b32 v2, s11, 7
; GCN-NEXT: s_cbranch_scc1 BB0_2
; GCN-NEXT: ; %bb.1: ; %bb0
-; GCN-NEXT: v_readlane_b32 s0, v0, 1
-; GCN-NEXT: v_readlane_b32 s1, v0, 2
-; GCN-NEXT: v_readlane_b32 s2, v0, 3
-; GCN-NEXT: v_readlane_b32 s3, v0, 4
-; GCN-NEXT: v_readlane_b32 s4, v0, 5
-; GCN-NEXT: v_readlane_b32 s5, v0, 6
-; GCN-NEXT: v_readlane_b32 s6, v0, 7
-; GCN-NEXT: v_readlane_b32 s7, v0, 8
+; GCN-NEXT: v_readlane_b32 s0, v0, 0
+; GCN-NEXT: v_readlane_b32 s1, v0, 1
+; GCN-NEXT: v_readlane_b32 s2, v0, 2
+; GCN-NEXT: v_readlane_b32 s3, v0, 3
+; GCN-NEXT: v_readlane_b32 s4, v0, 4
+; GCN-NEXT: v_readlane_b32 s5, v0, 5
+; GCN-NEXT: v_readlane_b32 s6, v0, 6
+; GCN-NEXT: v_readlane_b32 s7, v0, 7
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v0, 57
-; GCN-NEXT: v_readlane_b32 s1, v0, 58
-; GCN-NEXT: v_readlane_b32 s2, v0, 59
-; GCN-NEXT: v_readlane_b32 s3, v0, 60
-; GCN-NEXT: v_readlane_b32 s4, v0, 61
-; GCN-NEXT: v_readlane_b32 s5, v0, 62
-; GCN-NEXT: v_readlane_b32 s6, v0, 63
-; GCN-NEXT: v_readlane_b32 s7, v1, 0
+; GCN-NEXT: v_readlane_b32 s0, v0, 56
+; GCN-NEXT: v_readlane_b32 s1, v0, 57
+; GCN-NEXT: v_readlane_b32 s2, v0, 58
+; GCN-NEXT: v_readlane_b32 s3, v0, 59
+; GCN-NEXT: v_readlane_b32 s4, v0, 60
+; GCN-NEXT: v_readlane_b32 s5, v0, 61
+; GCN-NEXT: v_readlane_b32 s6, v0, 62
+; GCN-NEXT: v_readlane_b32 s7, v0, 63
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v1, 1
-; GCN-NEXT: v_readlane_b32 s1, v1, 2
-; GCN-NEXT: v_readlane_b32 s2, v1, 3
-; GCN-NEXT: v_readlane_b32 s3, v1, 4
-; GCN-NEXT: v_readlane_b32 s4, v1, 5
-; GCN-NEXT: v_readlane_b32 s5, v1, 6
-; GCN-NEXT: v_readlane_b32 s6, v1, 7
-; GCN-NEXT: v_readlane_b32 s7, v1, 8
+; GCN-NEXT: v_readlane_b32 s0, v1, 0
+; GCN-NEXT: v_readlane_b32 s1, v1, 1
+; GCN-NEXT: v_readlane_b32 s2, v1, 2
+; GCN-NEXT: v_readlane_b32 s3, v1, 3
+; GCN-NEXT: v_readlane_b32 s4, v1, 4
+; GCN-NEXT: v_readlane_b32 s5, v1, 5
+; GCN-NEXT: v_readlane_b32 s6, v1, 6
+; GCN-NEXT: v_readlane_b32 s7, v1, 7
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v1, 9
-; GCN-NEXT: v_readlane_b32 s1, v1, 10
-; GCN-NEXT: v_readlane_b32 s2, v1, 11
-; GCN-NEXT: v_readlane_b32 s3, v1, 12
-; GCN-NEXT: v_readlane_b32 s4, v1, 13
-; GCN-NEXT: v_readlane_b32 s5, v1, 14
-; GCN-NEXT: v_readlane_b32 s6, v1, 15
-; GCN-NEXT: v_readlane_b32 s7, v1, 16
+; GCN-NEXT: v_readlane_b32 s0, v1, 8
+; GCN-NEXT: v_readlane_b32 s1, v1, 9
+; GCN-NEXT: v_readlane_b32 s2, v1, 10
+; GCN-NEXT: v_readlane_b32 s3, v1, 11
+; GCN-NEXT: v_readlane_b32 s4, v1, 12
+; GCN-NEXT: v_readlane_b32 s5, v1, 13
+; GCN-NEXT: v_readlane_b32 s6, v1, 14
+; GCN-NEXT: v_readlane_b32 s7, v1, 15
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v1, 17
-; GCN-NEXT: v_readlane_b32 s1, v1, 18
-; GCN-NEXT: v_readlane_b32 s2, v1, 19
-; GCN-NEXT: v_readlane_b32 s3, v1, 20
-; GCN-NEXT: v_readlane_b32 s4, v1, 21
-; GCN-NEXT: v_readlane_b32 s5, v1, 22
-; GCN-NEXT: v_readlane_b32 s6, v1, 23
-; GCN-NEXT: v_readlane_b32 s7, v1, 24
+; GCN-NEXT: v_readlane_b32 s0, v1, 16
+; GCN-NEXT: v_readlane_b32 s1, v1, 17
+; GCN-NEXT: v_readlane_b32 s2, v1, 18
+; GCN-NEXT: v_readlane_b32 s3, v1, 19
+; GCN-NEXT: v_readlane_b32 s4, v1, 20
+; GCN-NEXT: v_readlane_b32 s5, v1, 21
+; GCN-NEXT: v_readlane_b32 s6, v1, 22
+; GCN-NEXT: v_readlane_b32 s7, v1, 23
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v1, 25
-; GCN-NEXT: v_readlane_b32 s1, v1, 26
-; GCN-NEXT: v_readlane_b32 s2, v1, 27
-; GCN-NEXT: v_readlane_b32 s3, v1, 28
-; GCN-NEXT: v_readlane_b32 s4, v1, 29
-; GCN-NEXT: v_readlane_b32 s5, v1, 30
-; GCN-NEXT: v_readlane_b32 s6, v1, 31
-; GCN-NEXT: v_readlane_b32 s7, v1, 32
+; GCN-NEXT: v_readlane_b32 s0, v1, 24
+; GCN-NEXT: v_readlane_b32 s1, v1, 25
+; GCN-NEXT: v_readlane_b32 s2, v1, 26
+; GCN-NEXT: v_readlane_b32 s3, v1, 27
+; GCN-NEXT: v_readlane_b32 s4, v1, 28
+; GCN-NEXT: v_readlane_b32 s5, v1, 29
+; GCN-NEXT: v_readlane_b32 s6, v1, 30
+; GCN-NEXT: v_readlane_b32 s7, v1, 31
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v1, 33
-; GCN-NEXT: v_readlane_b32 s1, v1, 34
-; GCN-NEXT: v_readlane_b32 s2, v1, 35
-; GCN-NEXT: v_readlane_b32 s3, v1, 36
-; GCN-NEXT: v_readlane_b32 s4, v1, 37
-; GCN-NEXT: v_readlane_b32 s5, v1, 38
-; GCN-NEXT: v_readlane_b32 s6, v1, 39
-; GCN-NEXT: v_readlane_b32 s7, v1, 40
+; GCN-NEXT: v_readlane_b32 s0, v1, 32
+; GCN-NEXT: v_readlane_b32 s1, v1, 33
+; GCN-NEXT: v_readlane_b32 s2, v1, 34
+; GCN-NEXT: v_readlane_b32 s3, v1, 35
+; GCN-NEXT: v_readlane_b32 s4, v1, 36
+; GCN-NEXT: v_readlane_b32 s5, v1, 37
+; GCN-NEXT: v_readlane_b32 s6, v1, 38
+; GCN-NEXT: v_readlane_b32 s7, v1, 39
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v1, 41
-; GCN-NEXT: v_readlane_b32 s1, v1, 42
-; GCN-NEXT: v_readlane_b32 s2, v1, 43
-; GCN-NEXT: v_readlane_b32 s3, v1, 44
-; GCN-NEXT: v_readlane_b32 s4, v1, 45
-; GCN-NEXT: v_readlane_b32 s5, v1, 46
-; GCN-NEXT: v_readlane_b32 s6, v1, 47
-; GCN-NEXT: v_readlane_b32 s7, v1, 48
+; GCN-NEXT: v_readlane_b32 s0, v1, 40
+; GCN-NEXT: v_readlane_b32 s1, v1, 41
+; GCN-NEXT: v_readlane_b32 s2, v1, 42
+; GCN-NEXT: v_readlane_b32 s3, v1, 43
+; GCN-NEXT: v_readlane_b32 s4, v1, 44
+; GCN-NEXT: v_readlane_b32 s5, v1, 45
+; GCN-NEXT: v_readlane_b32 s6, v1, 46
+; GCN-NEXT: v_readlane_b32 s7, v1, 47
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v1, 49
-; GCN-NEXT: v_readlane_b32 s1, v1, 50
-; GCN-NEXT: v_readlane_b32 s2, v1, 51
-; GCN-NEXT: v_readlane_b32 s3, v1, 52
-; GCN-NEXT: v_readlane_b32 s4, v1, 53
-; GCN-NEXT: v_readlane_b32 s5, v1, 54
-; GCN-NEXT: v_readlane_b32 s6, v1, 55
-; GCN-NEXT: v_readlane_b32 s7, v1, 56
+; GCN-NEXT: v_readlane_b32 s0, v1, 48
+; GCN-NEXT: v_readlane_b32 s1, v1, 49
+; GCN-NEXT: v_readlane_b32 s2, v1, 50
+; GCN-NEXT: v_readlane_b32 s3, v1, 51
+; GCN-NEXT: v_readlane_b32 s4, v1, 52
+; GCN-NEXT: v_readlane_b32 s5, v1, 53
+; GCN-NEXT: v_readlane_b32 s6, v1, 54
+; GCN-NEXT: v_readlane_b32 s7, v1, 55
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v1, 57
-; GCN-NEXT: v_readlane_b32 s1, v1, 58
-; GCN-NEXT: v_readlane_b32 s2, v1, 59
-; GCN-NEXT: v_readlane_b32 s3, v1, 60
-; GCN-NEXT: v_readlane_b32 s4, v1, 61
-; GCN-NEXT: v_readlane_b32 s5, v1, 62
-; GCN-NEXT: v_readlane_b32 s6, v1, 63
-; GCN-NEXT: v_readlane_b32 s7, v2, 0
+; GCN-NEXT: v_readlane_b32 s0, v1, 56
+; GCN-NEXT: v_readlane_b32 s1, v1, 57
+; GCN-NEXT: v_readlane_b32 s2, v1, 58
+; GCN-NEXT: v_readlane_b32 s3, v1, 59
+; GCN-NEXT: v_readlane_b32 s4, v1, 60
+; GCN-NEXT: v_readlane_b32 s5, v1, 61
+; GCN-NEXT: v_readlane_b32 s6, v1, 62
+; GCN-NEXT: v_readlane_b32 s7, v1, 63
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v0, 9
-; GCN-NEXT: v_readlane_b32 s1, v0, 10
-; GCN-NEXT: v_readlane_b32 s2, v0, 11
-; GCN-NEXT: v_readlane_b32 s3, v0, 12
-; GCN-NEXT: v_readlane_b32 s4, v0, 13
-; GCN-NEXT: v_readlane_b32 s5, v0, 14
-; GCN-NEXT: v_readlane_b32 s6, v0, 15
-; GCN-NEXT: v_readlane_b32 s7, v0, 16
+; GCN-NEXT: v_readlane_b32 s0, v0, 8
+; GCN-NEXT: v_readlane_b32 s1, v0, 9
+; GCN-NEXT: v_readlane_b32 s2, v0, 10
+; GCN-NEXT: v_readlane_b32 s3, v0, 11
+; GCN-NEXT: v_readlane_b32 s4, v0, 12
+; GCN-NEXT: v_readlane_b32 s5, v0, 13
+; GCN-NEXT: v_readlane_b32 s6, v0, 14
+; GCN-NEXT: v_readlane_b32 s7, v0, 15
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v0, 17
-; GCN-NEXT: v_readlane_b32 s1, v0, 18
-; GCN-NEXT: v_readlane_b32 s2, v0, 19
-; GCN-NEXT: v_readlane_b32 s3, v0, 20
-; GCN-NEXT: v_readlane_b32 s4, v0, 21
-; GCN-NEXT: v_readlane_b32 s5, v0, 22
-; GCN-NEXT: v_readlane_b32 s6, v0, 23
-; GCN-NEXT: v_readlane_b32 s7, v0, 24
+; GCN-NEXT: v_readlane_b32 s0, v0, 16
+; GCN-NEXT: v_readlane_b32 s1, v0, 17
+; GCN-NEXT: v_readlane_b32 s2, v0, 18
+; GCN-NEXT: v_readlane_b32 s3, v0, 19
+; GCN-NEXT: v_readlane_b32 s4, v0, 20
+; GCN-NEXT: v_readlane_b32 s5, v0, 21
+; GCN-NEXT: v_readlane_b32 s6, v0, 22
+; GCN-NEXT: v_readlane_b32 s7, v0, 23
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v0, 25
-; GCN-NEXT: v_readlane_b32 s1, v0, 26
-; GCN-NEXT: v_readlane_b32 s2, v0, 27
-; GCN-NEXT: v_readlane_b32 s3, v0, 28
-; GCN-NEXT: v_readlane_b32 s4, v0, 29
-; GCN-NEXT: v_readlane_b32 s5, v0, 30
-; GCN-NEXT: v_readlane_b32 s6, v0, 31
-; GCN-NEXT: v_readlane_b32 s7, v0, 32
+; GCN-NEXT: v_readlane_b32 s0, v0, 24
+; GCN-NEXT: v_readlane_b32 s1, v0, 25
+; GCN-NEXT: v_readlane_b32 s2, v0, 26
+; GCN-NEXT: v_readlane_b32 s3, v0, 27
+; GCN-NEXT: v_readlane_b32 s4, v0, 28
+; GCN-NEXT: v_readlane_b32 s5, v0, 29
+; GCN-NEXT: v_readlane_b32 s6, v0, 30
+; GCN-NEXT: v_readlane_b32 s7, v0, 31
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v0, 33
-; GCN-NEXT: v_readlane_b32 s1, v0, 34
-; GCN-NEXT: v_readlane_b32 s2, v0, 35
-; GCN-NEXT: v_readlane_b32 s3, v0, 36
-; GCN-NEXT: v_readlane_b32 s4, v0, 37
-; GCN-NEXT: v_readlane_b32 s5, v0, 38
-; GCN-NEXT: v_readlane_b32 s6, v0, 39
-; GCN-NEXT: v_readlane_b32 s7, v0, 40
+; GCN-NEXT: v_readlane_b32 s0, v0, 32
+; GCN-NEXT: v_readlane_b32 s1, v0, 33
+; GCN-NEXT: v_readlane_b32 s2, v0, 34
+; GCN-NEXT: v_readlane_b32 s3, v0, 35
+; GCN-NEXT: v_readlane_b32 s4, v0, 36
+; GCN-NEXT: v_readlane_b32 s5, v0, 37
+; GCN-NEXT: v_readlane_b32 s6, v0, 38
+; GCN-NEXT: v_readlane_b32 s7, v0, 39
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v0, 41
-; GCN-NEXT: v_readlane_b32 s1, v0, 42
-; GCN-NEXT: v_readlane_b32 s2, v0, 43
-; GCN-NEXT: v_readlane_b32 s3, v0, 44
-; GCN-NEXT: v_readlane_b32 s4, v0, 45
-; GCN-NEXT: v_readlane_b32 s5, v0, 46
-; GCN-NEXT: v_readlane_b32 s6, v0, 47
-; GCN-NEXT: v_readlane_b32 s7, v0, 48
+; GCN-NEXT: v_readlane_b32 s0, v0, 40
+; GCN-NEXT: v_readlane_b32 s1, v0, 41
+; GCN-NEXT: v_readlane_b32 s2, v0, 42
+; GCN-NEXT: v_readlane_b32 s3, v0, 43
+; GCN-NEXT: v_readlane_b32 s4, v0, 44
+; GCN-NEXT: v_readlane_b32 s5, v0, 45
+; GCN-NEXT: v_readlane_b32 s6, v0, 46
+; GCN-NEXT: v_readlane_b32 s7, v0, 47
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v0, 49
-; GCN-NEXT: v_readlane_b32 s1, v0, 50
-; GCN-NEXT: v_readlane_b32 s2, v0, 51
-; GCN-NEXT: v_readlane_b32 s3, v0, 52
-; GCN-NEXT: v_readlane_b32 s4, v0, 53
-; GCN-NEXT: v_readlane_b32 s5, v0, 54
-; GCN-NEXT: v_readlane_b32 s6, v0, 55
-; GCN-NEXT: v_readlane_b32 s7, v0, 56
+; GCN-NEXT: v_readlane_b32 s0, v0, 48
+; GCN-NEXT: v_readlane_b32 s1, v0, 49
+; GCN-NEXT: v_readlane_b32 s2, v0, 50
+; GCN-NEXT: v_readlane_b32 s3, v0, 51
+; GCN-NEXT: v_readlane_b32 s4, v0, 52
+; GCN-NEXT: v_readlane_b32 s5, v0, 53
+; GCN-NEXT: v_readlane_b32 s6, v0, 54
+; GCN-NEXT: v_readlane_b32 s7, v0, 55
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v2, 1
-; GCN-NEXT: v_readlane_b32 s1, v2, 2
-; GCN-NEXT: v_readlane_b32 s2, v2, 3
-; GCN-NEXT: v_readlane_b32 s3, v2, 4
-; GCN-NEXT: v_readlane_b32 s4, v2, 5
-; GCN-NEXT: v_readlane_b32 s5, v2, 6
-; GCN-NEXT: v_readlane_b32 s6, v2, 7
-; GCN-NEXT: v_readlane_b32 s7, v2, 8
+; GCN-NEXT: v_readlane_b32 s0, v2, 0
+; GCN-NEXT: v_readlane_b32 s1, v2, 1
+; GCN-NEXT: v_readlane_b32 s2, v2, 2
+; GCN-NEXT: v_readlane_b32 s3, v2, 3
+; GCN-NEXT: v_readlane_b32 s4, v2, 4
+; GCN-NEXT: v_readlane_b32 s5, v2, 5
+; GCN-NEXT: v_readlane_b32 s6, v2, 6
+; GCN-NEXT: v_readlane_b32 s7, v2, 7
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND
@@ -444,195 +442,193 @@ ret:
define amdgpu_kernel void @split_sgpr_spill_2_vgprs(i32 addrspace(1)* %out, i32 %in) #1 {
; GCN-LABEL: split_sgpr_spill_2_vgprs:
; GCN: ; %bb.0:
-; GCN-NEXT: s_load_dword s0, s[0:1], 0xb
+; GCN-NEXT: s_load_dword s2, s[0:1], 0xb
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:19]
; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[36:51]
; GCN-NEXT: ;;#ASMEND
+; GCN-NEXT: v_writelane_b32 v0, s4, 0
+; GCN-NEXT: v_writelane_b32 v0, s5, 1
+; GCN-NEXT: v_writelane_b32 v0, s6, 2
+; GCN-NEXT: v_writelane_b32 v0, s7, 3
+; GCN-NEXT: v_writelane_b32 v0, s8, 4
+; GCN-NEXT: v_writelane_b32 v0, s9, 5
+; GCN-NEXT: v_writelane_b32 v0, s10, 6
+; GCN-NEXT: v_writelane_b32 v0, s11, 7
+; GCN-NEXT: v_writelane_b32 v0, s12, 8
+; GCN-NEXT: v_writelane_b32 v0, s13, 9
+; GCN-NEXT: v_writelane_b32 v0, s14, 10
+; GCN-NEXT: v_writelane_b32 v0, s15, 11
+; GCN-NEXT: v_writelane_b32 v0, s16, 12
+; GCN-NEXT: v_writelane_b32 v0, s17, 13
+; GCN-NEXT: v_writelane_b32 v0, s18, 14
+; GCN-NEXT: v_writelane_b32 v0, s19, 15
+; GCN-NEXT: ;;#ASMSTART
+; GCN-NEXT: ; def s[4:19]
+; GCN-NEXT: ;;#ASMEND
+; GCN-NEXT: v_writelane_b32 v0, s4, 16
+; GCN-NEXT: v_writelane_b32 v0, s5, 17
+; GCN-NEXT: v_writelane_b32 v0, s6, 18
+; GCN-NEXT: v_writelane_b32 v0, s7, 19
+; GCN-NEXT: v_writelane_b32 v0, s8, 20
+; GCN-NEXT: v_writelane_b32 v0, s9, 21
+; GCN-NEXT: v_writelane_b32 v0, s10, 22
+; GCN-NEXT: v_writelane_b32 v0, s11, 23
+; GCN-NEXT: v_writelane_b32 v0, s12, 24
+; GCN-NEXT: v_writelane_b32 v0, s13, 25
+; GCN-NEXT: v_writelane_b32 v0, s14, 26
+; GCN-NEXT: v_writelane_b32 v0, s15, 27
+; GCN-NEXT: v_writelane_b32 v0, s16, 28
+; GCN-NEXT: v_writelane_b32 v0, s17, 29
+; GCN-NEXT: v_writelane_b32 v0, s18, 30
+; GCN-NEXT: v_writelane_b32 v0, s19, 31
+; GCN-NEXT: ;;#ASMSTART
+; GCN-NEXT: ; def s[4:19]
+; GCN-NEXT: ;;#ASMEND
+; GCN-NEXT: ;;#ASMSTART
+; GCN-NEXT: ; def s[20:27]
+; GCN-NEXT: ;;#ASMEND
+; GCN-NEXT: ;;#ASMSTART
+; GCN-NEXT: ; def s[0:1]
+; GCN-NEXT: ;;#ASMEND
+; GCN-NEXT: s_mov_b32 s3, 0
; GCN-NEXT: s_waitcnt lgkmcnt(0)
-; GCN-NEXT: v_writelane_b32 v0, s0, 0
-; GCN-NEXT: v_writelane_b32 v0, s4, 1
-; GCN-NEXT: v_writelane_b32 v0, s5, 2
-; GCN-NEXT: v_writelane_b32 v0, s6, 3
-; GCN-NEXT: v_writelane_b32 v0, s7, 4
-; GCN-NEXT: v_writelane_b32 v0, s8, 5
-; GCN-NEXT: v_writelane_b32 v0, s9, 6
-; GCN-NEXT: v_writelane_b32 v0, s10, 7
-; GCN-NEXT: v_writelane_b32 v0, s11, 8
-; GCN-NEXT: v_writelane_b32 v0, s12, 9
-; GCN-NEXT: v_writelane_b32 v0, s13, 10
-; GCN-NEXT: v_writelane_b32 v0, s14, 11
-; GCN-NEXT: v_writelane_b32 v0, s15, 12
-; GCN-NEXT: v_writelane_b32 v0, s16, 13
-; GCN-NEXT: v_writelane_b32 v0, s17, 14
-; GCN-NEXT: v_writelane_b32 v0, s18, 15
-; GCN-NEXT: v_writelane_b32 v0, s19, 16
-; GCN-NEXT: ;;#ASMSTART
-; GCN-NEXT: ; def s[0:15]
-; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: ;;#ASMSTART
-; GCN-NEXT: ; def s[16:31]
-; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_writelane_b32 v0, s0, 17
-; GCN-NEXT: v_writelane_b32 v0, s1, 18
-; GCN-NEXT: v_writelane_b32 v0, s2, 19
-; GCN-NEXT: v_writelane_b32 v0, s3, 20
-; GCN-NEXT: v_writelane_b32 v0, s4, 21
-; GCN-NEXT: v_writelane_b32 v0, s5, 22
-; GCN-NEXT: v_writelane_b32 v0, s6, 23
-; GCN-NEXT: v_writelane_b32 v0, s7, 24
-; GCN-NEXT: v_writelane_b32 v0, s8, 25
-; GCN-NEXT: v_writelane_b32 v0, s9, 26
-; GCN-NEXT: v_writelane_b32 v0, s10, 27
-; GCN-NEXT: v_writelane_b32 v0, s11, 28
-; GCN-NEXT: v_writelane_b32 v0, s12, 29
-; GCN-NEXT: v_writelane_b32 v0, s13, 30
-; GCN-NEXT: v_writelane_b32 v0, s14, 31
-; GCN-NEXT: v_writelane_b32 v0, s15, 32
-; GCN-NEXT: ;;#ASMSTART
-; GCN-NEXT: ; def s[0:7]
-; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: ;;#ASMSTART
-; GCN-NEXT: ; def s[8:9]
-; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: s_mov_b32 s10, 0
-; GCN-NEXT: v_readlane_b32 s11, v0, 0
-; GCN-NEXT: s_cmp_lg_u32 s11, s10
-; GCN-NEXT: v_writelane_b32 v0, s36, 33
-; GCN-NEXT: v_writelane_b32 v0, s37, 34
-; GCN-NEXT: v_writelane_b32 v0, s38, 35
-; GCN-NEXT: v_writelane_b32 v0, s39, 36
-; GCN-NEXT: v_writelane_b32 v0, s40, 37
-; GCN-NEXT: v_writelane_b32 v0, s41, 38
-; GCN-NEXT: v_writelane_b32 v0, s42, 39
-; GCN-NEXT: v_writelane_b32 v0, s43, 40
-; GCN-NEXT: v_writelane_b32 v0, s44, 41
-; GCN-NEXT: v_writelane_b32 v0, s45, 42
-; GCN-NEXT: v_writelane_b32 v0, s46, 43
-; GCN-NEXT: v_writelane_b32 v0, s47, 44
-; GCN-NEXT: v_writelane_b32 v0, s48, 45
-; GCN-NEXT: v_writelane_b32 v0, s49, 46
-; GCN-NEXT: v_writelane_b32 v0, s50, 47
-; GCN-NEXT: v_writelane_b32 v0, s51, 48
-; GCN-NEXT: v_writelane_b32 v0, s16, 49
-; GCN-NEXT: v_writelane_b32 v0, s17, 50
-; GCN-NEXT: v_writelane_b32 v0, s18, 51
-; GCN-NEXT: v_writelane_b32 v0, s19, 52
-; GCN-NEXT: v_writelane_b32 v0, s20, 53
-; GCN-NEXT: v_writelane_b32 v0, s21, 54
-; GCN-NEXT: v_writelane_b32 v0, s22, 55
-; GCN-NEXT: v_writelane_b32 v0, s23, 56
-; GCN-NEXT: v_writelane_b32 v0, s24, 57
-; GCN-NEXT: v_writelane_b32 v0, s25, 58
-; GCN-NEXT: v_writelane_b32 v0, s26, 59
-; GCN-NEXT: v_writelane_b32 v0, s27, 60
-; GCN-NEXT: v_writelane_b32 v0, s28, 61
-; GCN-NEXT: v_writelane_b32 v0, s29, 62
-; GCN-NEXT: v_writelane_b32 v0, s30, 63
-; GCN-NEXT: v_writelane_b32 v1, s31, 0
-; GCN-NEXT: v_writelane_b32 v1, s0, 1
-; GCN-NEXT: v_writelane_b32 v1, s1, 2
-; GCN-NEXT: v_writelane_b32 v1, s2, 3
-; GCN-NEXT: v_writelane_b32 v1, s3, 4
-; GCN-NEXT: v_writelane_b32 v1, s4, 5
-; GCN-NEXT: v_writelane_b32 v1, s5, 6
-; GCN-NEXT: v_writelane_b32 v1, s6, 7
-; GCN-NEXT: v_writelane_b32 v1, s7, 8
-; GCN-NEXT: v_writelane_b32 v1, s8, 9
-; GCN-NEXT: v_writelane_b32 v1, s9, 10
+; GCN-NEXT: s_cmp_lg_u32 s2, s3
+; GCN-NEXT: v_writelane_b32 v0, s36, 32
+; GCN-NEXT: v_writelane_b32 v0, s37, 33
+; GCN-NEXT: v_writelane_b32 v0, s38, 34
+; GCN-NEXT: v_writelane_b32 v0, s39, 35
+; GCN-NEXT: v_writelane_b32 v0, s40, 36
+; GCN-NEXT: v_writelane_b32 v0, s41, 37
+; GCN-NEXT: v_writelane_b32 v0, s42, 38
+; GCN-NEXT: v_writelane_b32 v0, s43, 39
+; GCN-NEXT: v_writelane_b32 v0, s44, 40
+; GCN-NEXT: v_writelane_b32 v0, s45, 41
+; GCN-NEXT: v_writelane_b32 v0, s46, 42
+; GCN-NEXT: v_writelane_b32 v0, s47, 43
+; GCN-NEXT: v_writelane_b32 v0, s48, 44
+; GCN-NEXT: v_writelane_b32 v0, s49, 45
+; GCN-NEXT: v_writelane_b32 v0, s50, 46
+; GCN-NEXT: v_writelane_b32 v0, s51, 47
+; GCN-NEXT: v_writelane_b32 v0, s4, 48
+; GCN-NEXT: v_writelane_b32 v0, s5, 49
+; GCN-NEXT: v_writelane_b32 v0, s6, 50
+; GCN-NEXT: v_writelane_b32 v0, s7, 51
+; GCN-NEXT: v_writelane_b32 v0, s8, 52
+; GCN-NEXT: v_writelane_b32 v0, s9, 53
+; GCN-NEXT: v_writelane_b32 v0, s10, 54
+; GCN-NEXT: v_writelane_b32 v0, s11, 55
+; GCN-NEXT: v_writelane_b32 v0, s12, 56
+; GCN-NEXT: v_writelane_b32 v0, s13, 57
+; GCN-NEXT: v_writelane_b32 v0, s14, 58
+; GCN-NEXT: v_writelane_b32 v0, s15, 59
+; GCN-NEXT: v_writelane_b32 v0, s16, 60
+; GCN-NEXT: v_writelane_b32 v0, s17, 61
+; GCN-NEXT: v_writelane_b32 v0, s18, 62
+; GCN-NEXT: v_writelane_b32 v0, s19, 63
+; GCN-NEXT: v_writelane_b32 v1, s20, 0
+; GCN-NEXT: v_writelane_b32 v1, s21, 1
+; GCN-NEXT: v_writelane_b32 v1, s22, 2
+; GCN-NEXT: v_writelane_b32 v1, s23, 3
+; GCN-NEXT: v_writelane_b32 v1, s24, 4
+; GCN-NEXT: v_writelane_b32 v1, s25, 5
+; GCN-NEXT: v_writelane_b32 v1, s26, 6
+; GCN-NEXT: v_writelane_b32 v1, s27, 7
+; GCN-NEXT: v_writelane_b32 v1, s0, 8
+; GCN-NEXT: v_writelane_b32 v1, s1, 9
; GCN-NEXT: s_cbranch_scc1 BB1_2
; GCN-NEXT: ; %bb.1: ; %bb0
-; GCN-NEXT: v_readlane_b32 s0, v0, 1
-; GCN-NEXT: v_readlane_b32 s1, v0, 2
-; GCN-NEXT: v_readlane_b32 s2, v0, 3
-; GCN-NEXT: v_readlane_b32 s3, v0, 4
-; GCN-NEXT: v_readlane_b32 s4, v0, 5
-; GCN-NEXT: v_readlane_b32 s5, v0, 6
-; GCN-NEXT: v_readlane_b32 s6, v0, 7
-; GCN-NEXT: v_readlane_b32 s7, v0, 8
-; GCN-NEXT: v_readlane_b32 s8, v0, 9
-; GCN-NEXT: v_readlane_b32 s9, v0, 10
-; GCN-NEXT: v_readlane_b32 s10, v0, 11
-; GCN-NEXT: v_readlane_b32 s11, v0, 12
-; GCN-NEXT: v_readlane_b32 s12, v0, 13
-; GCN-NEXT: v_readlane_b32 s13, v0, 14
-; GCN-NEXT: v_readlane_b32 s14, v0, 15
-; GCN-NEXT: v_readlane_b32 s15, v0, 16
+; GCN-NEXT: v_readlane_b32 s0, v0, 0
+; GCN-NEXT: v_readlane_b32 s1, v0, 1
+; GCN-NEXT: v_readlane_b32 s2, v0, 2
+; GCN-NEXT: v_readlane_b32 s3, v0, 3
+; GCN-NEXT: v_readlane_b32 s4, v0, 4
+; GCN-NEXT: v_readlane_b32 s5, v0, 5
+; GCN-NEXT: v_readlane_b32 s6, v0, 6
+; GCN-NEXT: v_readlane_b32 s7, v0, 7
+; GCN-NEXT: v_readlane_b32 s8, v0, 8
+; GCN-NEXT: v_readlane_b32 s9, v0, 9
+; GCN-NEXT: v_readlane_b32 s10, v0, 10
+; GCN-NEXT: v_readlane_b32 s11, v0, 11
+; GCN-NEXT: v_readlane_b32 s12, v0, 12
+; GCN-NEXT: v_readlane_b32 s13, v0, 13
+; GCN-NEXT: v_readlane_b32 s14, v0, 14
+; GCN-NEXT: v_readlane_b32 s15, v0, 15
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:15]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v0, 33
-; GCN-NEXT: v_readlane_b32 s1, v0, 34
-; GCN-NEXT: v_readlane_b32 s2, v0, 35
-; GCN-NEXT: v_readlane_b32 s3, v0, 36
-; GCN-NEXT: v_readlane_b32 s4, v0, 37
-; GCN-NEXT: v_readlane_b32 s5, v0, 38
-; GCN-NEXT: v_readlane_b32 s6, v0, 39
-; GCN-NEXT: v_readlane_b32 s7, v0, 40
-; GCN-NEXT: v_readlane_b32 s8, v0, 41
-; GCN-NEXT: v_readlane_b32 s9, v0, 42
-; GCN-NEXT: v_readlane_b32 s10, v0, 43
-; GCN-NEXT: v_readlane_b32 s11, v0, 44
-; GCN-NEXT: v_readlane_b32 s12, v0, 45
-; GCN-NEXT: v_readlane_b32 s13, v0, 46
-; GCN-NEXT: v_readlane_b32 s14, v0, 47
-; GCN-NEXT: v_readlane_b32 s15, v0, 48
+; GCN-NEXT: v_readlane_b32 s0, v0, 32
+; GCN-NEXT: v_readlane_b32 s1, v0, 33
+; GCN-NEXT: v_readlane_b32 s2, v0, 34
+; GCN-NEXT: v_readlane_b32 s3, v0, 35
+; GCN-NEXT: v_readlane_b32 s4, v0, 36
+; GCN-NEXT: v_readlane_b32 s5, v0, 37
+; GCN-NEXT: v_readlane_b32 s6, v0, 38
+; GCN-NEXT: v_readlane_b32 s7, v0, 39
+; GCN-NEXT: v_readlane_b32 s8, v0, 40
+; GCN-NEXT: v_readlane_b32 s9, v0, 41
+; GCN-NEXT: v_readlane_b32 s10, v0, 42
+; GCN-NEXT: v_readlane_b32 s11, v0, 43
+; GCN-NEXT: v_readlane_b32 s12, v0, 44
+; GCN-NEXT: v_readlane_b32 s13, v0, 45
+; GCN-NEXT: v_readlane_b32 s14, v0, 46
+; GCN-NEXT: v_readlane_b32 s15, v0, 47
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:15]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v0, 17
-; GCN-NEXT: v_readlane_b32 s1, v0, 18
-; GCN-NEXT: v_readlane_b32 s2, v0, 19
-; GCN-NEXT: v_readlane_b32 s3, v0, 20
-; GCN-NEXT: v_readlane_b32 s4, v0, 21
-; GCN-NEXT: v_readlane_b32 s5, v0, 22
-; GCN-NEXT: v_readlane_b32 s6, v0, 23
-; GCN-NEXT: v_readlane_b32 s7, v0, 24
-; GCN-NEXT: v_readlane_b32 s8, v0, 25
-; GCN-NEXT: v_readlane_b32 s9, v0, 26
-; GCN-NEXT: v_readlane_b32 s10, v0, 27
-; GCN-NEXT: v_readlane_b32 s11, v0, 28
-; GCN-NEXT: v_readlane_b32 s12, v0, 29
-; GCN-NEXT: v_readlane_b32 s13, v0, 30
-; GCN-NEXT: v_readlane_b32 s14, v0, 31
-; GCN-NEXT: v_readlane_b32 s15, v0, 32
+; GCN-NEXT: v_readlane_b32 s0, v0, 16
+; GCN-NEXT: v_readlane_b32 s1, v0, 17
+; GCN-NEXT: v_readlane_b32 s2, v0, 18
+; GCN-NEXT: v_readlane_b32 s3, v0, 19
+; GCN-NEXT: v_readlane_b32 s4, v0, 20
+; GCN-NEXT: v_readlane_b32 s5, v0, 21
+; GCN-NEXT: v_readlane_b32 s6, v0, 22
+; GCN-NEXT: v_readlane_b32 s7, v0, 23
+; GCN-NEXT: v_readlane_b32 s8, v0, 24
+; GCN-NEXT: v_readlane_b32 s9, v0, 25
+; GCN-NEXT: v_readlane_b32 s10, v0, 26
+; GCN-NEXT: v_readlane_b32 s11, v0, 27
+; GCN-NEXT: v_readlane_b32 s12, v0, 28
+; GCN-NEXT: v_readlane_b32 s13, v0, 29
+; GCN-NEXT: v_readlane_b32 s14, v0, 30
+; GCN-NEXT: v_readlane_b32 s15, v0, 31
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:15]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v1, 1
-; GCN-NEXT: v_readlane_b32 s1, v1, 2
-; GCN-NEXT: v_readlane_b32 s2, v1, 3
-; GCN-NEXT: v_readlane_b32 s3, v1, 4
-; GCN-NEXT: v_readlane_b32 s4, v1, 5
-; GCN-NEXT: v_readlane_b32 s5, v1, 6
-; GCN-NEXT: v_readlane_b32 s6, v1, 7
-; GCN-NEXT: v_readlane_b32 s7, v1, 8
-; GCN-NEXT: ;;#ASMSTART
-; GCN-NEXT: ; use s[0:7]
-; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v1, 9
-; GCN-NEXT: v_readlane_b32 s1, v1, 10
-; GCN-NEXT: ;;#ASMSTART
-; GCN-NEXT: ; use s[0:1]
-; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v0, 49
-; GCN-NEXT: v_readlane_b32 s1, v0, 50
-; GCN-NEXT: v_readlane_b32 s2, v0, 51
-; GCN-NEXT: v_readlane_b32 s3, v0, 52
-; GCN-NEXT: v_readlane_b32 s4, v0, 53
-; GCN-NEXT: v_readlane_b32 s5, v0, 54
-; GCN-NEXT: v_readlane_b32 s6, v0, 55
-; GCN-NEXT: v_readlane_b32 s7, v0, 56
-; GCN-NEXT: v_readlane_b32 s8, v0, 57
-; GCN-NEXT: v_readlane_b32 s9, v0, 58
-; GCN-NEXT: v_readlane_b32 s10, v0, 59
-; GCN-NEXT: v_readlane_b32 s11, v0, 60
-; GCN-NEXT: v_readlane_b32 s12, v0, 61
-; GCN-NEXT: v_readlane_b32 s13, v0, 62
-; GCN-NEXT: v_readlane_b32 s14, v0, 63
-; GCN-NEXT: v_readlane_b32 s15, v1, 0
+; GCN-NEXT: v_readlane_b32 s16, v1, 0
+; GCN-NEXT: v_readlane_b32 s17, v1, 1
+; GCN-NEXT: v_readlane_b32 s18, v1, 2
+; GCN-NEXT: v_readlane_b32 s19, v1, 3
+; GCN-NEXT: v_readlane_b32 s20, v1, 4
+; GCN-NEXT: v_readlane_b32 s21, v1, 5
+; GCN-NEXT: v_readlane_b32 s22, v1, 6
+; GCN-NEXT: v_readlane_b32 s23, v1, 7
+; GCN-NEXT: ;;#ASMSTART
+; GCN-NEXT: ; use s[16:23]
+; GCN-NEXT: ;;#ASMEND
+; GCN-NEXT: v_readlane_b32 s24, v1, 8
+; GCN-NEXT: v_readlane_b32 s25, v1, 9
+; GCN-NEXT: ;;#ASMSTART
+; GCN-NEXT: ; use s[24:25]
+; GCN-NEXT: ;;#ASMEND
+; GCN-NEXT: v_readlane_b32 s0, v0, 48
+; GCN-NEXT: v_readlane_b32 s1, v0, 49
+; GCN-NEXT: v_readlane_b32 s2, v0, 50
+; GCN-NEXT: v_readlane_b32 s3, v0, 51
+; GCN-NEXT: v_readlane_b32 s4, v0, 52
+; GCN-NEXT: v_readlane_b32 s5, v0, 53
+; GCN-NEXT: v_readlane_b32 s6, v0, 54
+; GCN-NEXT: v_readlane_b32 s7, v0, 55
+; GCN-NEXT: v_readlane_b32 s8, v0, 56
+; GCN-NEXT: v_readlane_b32 s9, v0, 57
+; GCN-NEXT: v_readlane_b32 s10, v0, 58
+; GCN-NEXT: v_readlane_b32 s11, v0, 59
+; GCN-NEXT: v_readlane_b32 s12, v0, 60
+; GCN-NEXT: v_readlane_b32 s13, v0, 61
+; GCN-NEXT: v_readlane_b32 s14, v0, 62
+; GCN-NEXT: v_readlane_b32 s15, v0, 63
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:15]
; GCN-NEXT: ;;#ASMEND
@@ -667,13 +663,13 @@ ret:
define amdgpu_kernel void @no_vgprs_last_sgpr_spill(i32 addrspace(1)* %out, i32 %in) #1 {
; GCN-LABEL: no_vgprs_last_sgpr_spill:
; GCN: ; %bb.0:
-; GCN-NEXT: s_mov_b32 s56, SCRATCH_RSRC_DWORD0
-; GCN-NEXT: s_mov_b32 s57, SCRATCH_RSRC_DWORD1
-; GCN-NEXT: s_mov_b32 s58, -1
-; GCN-NEXT: s_mov_b32 s59, 0xe8f000
-; GCN-NEXT: s_add_u32 s56, s56, s3
-; GCN-NEXT: s_addc_u32 s57, s57, 0
-; GCN-NEXT: s_load_dword s0, s[0:1], 0xb
+; GCN-NEXT: s_mov_b32 s20, SCRATCH_RSRC_DWORD0
+; GCN-NEXT: s_mov_b32 s21, SCRATCH_RSRC_DWORD1
+; GCN-NEXT: s_mov_b32 s22, -1
+; GCN-NEXT: s_mov_b32 s23, 0xe8f000
+; GCN-NEXT: s_add_u32 s20, s20, s3
+; GCN-NEXT: s_addc_u32 s21, s21, 0
+; GCN-NEXT: s_load_dword s2, s[0:1], 0xb
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: ;;#ASMSTART
@@ -692,179 +688,177 @@ define amdgpu_kernel void @no_vgprs_last_sgpr_spill(i32 addrspace(1)* %out, i32
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[36:51]
; GCN-NEXT: ;;#ASMEND
+; GCN-NEXT: v_writelane_b32 v31, s4, 0
+; GCN-NEXT: v_writelane_b32 v31, s5, 1
+; GCN-NEXT: v_writelane_b32 v31, s6, 2
+; GCN-NEXT: v_writelane_b32 v31, s7, 3
+; GCN-NEXT: v_writelane_b32 v31, s8, 4
+; GCN-NEXT: v_writelane_b32 v31, s9, 5
+; GCN-NEXT: v_writelane_b32 v31, s10, 6
+; GCN-NEXT: v_writelane_b32 v31, s11, 7
+; GCN-NEXT: v_writelane_b32 v31, s12, 8
+; GCN-NEXT: v_writelane_b32 v31, s13, 9
+; GCN-NEXT: v_writelane_b32 v31, s14, 10
+; GCN-NEXT: v_writelane_b32 v31, s15, 11
+; GCN-NEXT: v_writelane_b32 v31, s16, 12
+; GCN-NEXT: v_writelane_b32 v31, s17, 13
+; GCN-NEXT: v_writelane_b32 v31, s18, 14
+; GCN-NEXT: v_writelane_b32 v31, s19, 15
+; GCN-NEXT: ;;#ASMSTART
+; GCN-NEXT: ; def s[4:19]
+; GCN-NEXT: ;;#ASMEND
+; GCN-NEXT: v_writelane_b32 v31, s4, 16
+; GCN-NEXT: v_writelane_b32 v31, s5, 17
+; GCN-NEXT: v_writelane_b32 v31, s6, 18
+; GCN-NEXT: v_writelane_b32 v31, s7, 19
+; GCN-NEXT: v_writelane_b32 v31, s8, 20
+; GCN-NEXT: v_writelane_b32 v31, s9, 21
+; GCN-NEXT: v_writelane_b32 v31, s10, 22
+; GCN-NEXT: v_writelane_b32 v31, s11, 23
+; GCN-NEXT: v_writelane_b32 v31, s12, 24
+; GCN-NEXT: v_writelane_b32 v31, s13, 25
+; GCN-NEXT: v_writelane_b32 v31, s14, 26
+; GCN-NEXT: v_writelane_b32 v31, s15, 27
+; GCN-NEXT: v_writelane_b32 v31, s16, 28
+; GCN-NEXT: v_writelane_b32 v31, s17, 29
+; GCN-NEXT: v_writelane_b32 v31, s18, 30
+; GCN-NEXT: v_writelane_b32 v31, s19, 31
+; GCN-NEXT: ;;#ASMSTART
+; GCN-NEXT: ; def s[4:19]
+; GCN-NEXT: ;;#ASMEND
+; GCN-NEXT: ;;#ASMSTART
+; GCN-NEXT: ; def s[0:1]
+; GCN-NEXT: ;;#ASMEND
+; GCN-NEXT: s_mov_b32 s3, 0
; GCN-NEXT: s_waitcnt lgkmcnt(0)
-; GCN-NEXT: v_writelane_b32 v31, s0, 0
-; GCN-NEXT: v_writelane_b32 v31, s4, 1
-; GCN-NEXT: v_writelane_b32 v31, s5, 2
-; GCN-NEXT: v_writelane_b32 v31, s6, 3
-; GCN-NEXT: v_writelane_b32 v31, s7, 4
-; GCN-NEXT: v_writelane_b32 v31, s8, 5
-; GCN-NEXT: v_writelane_b32 v31, s9, 6
-; GCN-NEXT: v_writelane_b32 v31, s10, 7
-; GCN-NEXT: v_writelane_b32 v31, s11, 8
-; GCN-NEXT: v_writelane_b32 v31, s12, 9
-; GCN-NEXT: v_writelane_b32 v31, s13, 10
-; GCN-NEXT: v_writelane_b32 v31, s14, 11
-; GCN-NEXT: v_writelane_b32 v31, s15, 12
-; GCN-NEXT: v_writelane_b32 v31, s16, 13
-; GCN-NEXT: v_writelane_b32 v31, s17, 14
-; GCN-NEXT: v_writelane_b32 v31, s18, 15
-; GCN-NEXT: v_writelane_b32 v31, s19, 16
-; GCN-NEXT: ;;#ASMSTART
-; GCN-NEXT: ; def s[0:15]
-; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: ;;#ASMSTART
-; GCN-NEXT: ; def s[16:31]
-; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: ;;#ASMSTART
-; GCN-NEXT: ; def s[34:35]
-; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: s_mov_b32 s33, 0
-; GCN-NEXT: v_readlane_b32 s52, v31, 0
-; GCN-NEXT: s_cmp_lg_u32 s52, s33
-; GCN-NEXT: v_writelane_b32 v31, s36, 17
-; GCN-NEXT: v_writelane_b32 v31, s37, 18
-; GCN-NEXT: v_writelane_b32 v31, s38, 19
-; GCN-NEXT: v_writelane_b32 v31, s39, 20
-; GCN-NEXT: v_writelane_b32 v31, s40, 21
-; GCN-NEXT: v_writelane_b32 v31, s41, 22
-; GCN-NEXT: v_writelane_b32 v31, s42, 23
-; GCN-NEXT: v_writelane_b32 v31, s43, 24
-; GCN-NEXT: v_writelane_b32 v31, s44, 25
-; GCN-NEXT: v_writelane_b32 v31, s45, 26
-; GCN-NEXT: v_writelane_b32 v31, s46, 27
-; GCN-NEXT: v_writelane_b32 v31, s47, 28
-; GCN-NEXT: v_writelane_b32 v31, s48, 29
-; GCN-NEXT: v_writelane_b32 v31, s49, 30
-; GCN-NEXT: v_writelane_b32 v31, s50, 31
-; GCN-NEXT: v_writelane_b32 v31, s51, 32
-; GCN-NEXT: v_writelane_b32 v31, s0, 33
-; GCN-NEXT: v_writelane_b32 v31, s1, 34
-; GCN-NEXT: v_writelane_b32 v31, s2, 35
-; GCN-NEXT: v_writelane_b32 v31, s3, 36
-; GCN-NEXT: v_writelane_b32 v31, s4, 37
-; GCN-NEXT: v_writelane_b32 v31, s5, 38
-; GCN-NEXT: v_writelane_b32 v31, s6, 39
-; GCN-NEXT: v_writelane_b32 v31, s7, 40
-; GCN-NEXT: v_writelane_b32 v31, s8, 41
-; GCN-NEXT: v_writelane_b32 v31, s9, 42
-; GCN-NEXT: v_writelane_b32 v31, s10, 43
-; GCN-NEXT: v_writelane_b32 v31, s11, 44
-; GCN-NEXT: v_writelane_b32 v31, s12, 45
-; GCN-NEXT: v_writelane_b32 v31, s13, 46
-; GCN-NEXT: v_writelane_b32 v31, s14, 47
-; GCN-NEXT: v_writelane_b32 v31, s15, 48
-; GCN-NEXT: buffer_store_dword v0, off, s[56:59], 0
-; GCN-NEXT: v_writelane_b32 v0, s16, 0
-; GCN-NEXT: v_writelane_b32 v0, s17, 1
-; GCN-NEXT: v_writelane_b32 v0, s18, 2
-; GCN-NEXT: v_writelane_b32 v0, s19, 3
-; GCN-NEXT: v_writelane_b32 v0, s20, 4
-; GCN-NEXT: v_writelane_b32 v0, s21, 5
-; GCN-NEXT: v_writelane_b32 v0, s22, 6
-; GCN-NEXT: v_writelane_b32 v0, s23, 7
-; GCN-NEXT: v_writelane_b32 v0, s24, 8
-; GCN-NEXT: v_writelane_b32 v0, s25, 9
-; GCN-NEXT: v_writelane_b32 v0, s26, 10
-; GCN-NEXT: v_writelane_b32 v0, s27, 11
-; GCN-NEXT: v_writelane_b32 v0, s28, 12
-; GCN-NEXT: v_writelane_b32 v0, s29, 13
-; GCN-NEXT: v_writelane_b32 v0, s30, 14
-; GCN-NEXT: v_writelane_b32 v0, s31, 15
-; GCN-NEXT: s_mov_b64 s[16:17], exec
-; GCN-NEXT: s_mov_b64 exec, 0xffff
-; GCN-NEXT: buffer_store_dword v0, off, s[56:59], 0 offset:4 ; 4-byte Folded Spill
-; GCN-NEXT: s_mov_b64 exec, s[16:17]
-; GCN-NEXT: v_writelane_b32 v31, s34, 49
-; GCN-NEXT: v_writelane_b32 v31, s35, 50
-; GCN-NEXT: buffer_load_dword v0, off, s[56:59], 0
+; GCN-NEXT: s_cmp_lg_u32 s2, s3
+; GCN-NEXT: v_writelane_b32 v31, s36, 32
+; GCN-NEXT: v_writelane_b32 v31, s37, 33
+; GCN-NEXT: v_writelane_b32 v31, s38, 34
+; GCN-NEXT: v_writelane_b32 v31, s39, 35
+; GCN-NEXT: v_writelane_b32 v31, s40, 36
+; GCN-NEXT: v_writelane_b32 v31, s41, 37
+; GCN-NEXT: v_writelane_b32 v31, s42, 38
+; GCN-NEXT: v_writelane_b32 v31, s43, 39
+; GCN-NEXT: v_writelane_b32 v31, s44, 40
+; GCN-NEXT: v_writelane_b32 v31, s45, 41
+; GCN-NEXT: v_writelane_b32 v31, s46, 42
+; GCN-NEXT: v_writelane_b32 v31, s47, 43
+; GCN-NEXT: v_writelane_b32 v31, s48, 44
+; GCN-NEXT: v_writelane_b32 v31, s49, 45
+; GCN-NEXT: v_writelane_b32 v31, s50, 46
+; GCN-NEXT: v_writelane_b32 v31, s51, 47
+; GCN-NEXT: v_writelane_b32 v31, s4, 48
+; GCN-NEXT: v_writelane_b32 v31, s5, 49
+; GCN-NEXT: v_writelane_b32 v31, s6, 50
+; GCN-NEXT: v_writelane_b32 v31, s7, 51
+; GCN-NEXT: v_writelane_b32 v31, s8, 52
+; GCN-NEXT: v_writelane_b32 v31, s9, 53
+; GCN-NEXT: v_writelane_b32 v31, s10, 54
+; GCN-NEXT: v_writelane_b32 v31, s11, 55
+; GCN-NEXT: v_writelane_b32 v31, s12, 56
+; GCN-NEXT: v_writelane_b32 v31, s13, 57
+; GCN-NEXT: v_writelane_b32 v31, s14, 58
+; GCN-NEXT: v_writelane_b32 v31, s15, 59
+; GCN-NEXT: v_writelane_b32 v31, s16, 60
+; GCN-NEXT: v_writelane_b32 v31, s17, 61
+; GCN-NEXT: v_writelane_b32 v31, s18, 62
+; GCN-NEXT: v_writelane_b32 v31, s19, 63
+; GCN-NEXT: buffer_store_dword v0, off, s[20:23], 0
+; GCN-NEXT: v_writelane_b32 v0, s0, 0
+; GCN-NEXT: v_writelane_b32 v0, s1, 1
+; GCN-NEXT: s_mov_b64 s[0:1], exec
+; GCN-NEXT: s_mov_b64 exec, 3
+; GCN-NEXT: buffer_store_dword v0, off, s[20:23], 0 offset:4 ; 4-byte Folded Spill
+; GCN-NEXT: s_mov_b64 exec, s[0:1]
+; GCN-NEXT: buffer_load_dword v0, off, s[20:23], 0
; GCN-NEXT: s_cbranch_scc1 BB2_2
; GCN-NEXT: ; %bb.1: ; %bb0
-; GCN-NEXT: v_readlane_b32 s0, v31, 1
-; GCN-NEXT: v_readlane_b32 s1, v31, 2
-; GCN-NEXT: v_readlane_b32 s2, v31, 3
-; GCN-NEXT: v_readlane_b32 s3, v31, 4
-; GCN-NEXT: v_readlane_b32 s4, v31, 5
-; GCN-NEXT: v_readlane_b32 s5, v31, 6
-; GCN-NEXT: v_readlane_b32 s6, v31, 7
-; GCN-NEXT: v_readlane_b32 s7, v31, 8
-; GCN-NEXT: v_readlane_b32 s8, v31, 9
-; GCN-NEXT: v_readlane_b32 s9, v31, 10
-; GCN-NEXT: v_readlane_b32 s10, v31, 11
-; GCN-NEXT: v_readlane_b32 s11, v31, 12
-; GCN-NEXT: v_readlane_b32 s12, v31, 13
-; GCN-NEXT: v_readlane_b32 s13, v31, 14
-; GCN-NEXT: v_readlane_b32 s14, v31, 15
-; GCN-NEXT: v_readlane_b32 s15, v31, 16
+; GCN-NEXT: v_readlane_b32 s0, v31, 0
+; GCN-NEXT: v_readlane_b32 s1, v31, 1
+; GCN-NEXT: v_readlane_b32 s2, v31, 2
+; GCN-NEXT: v_readlane_b32 s3, v31, 3
+; GCN-NEXT: v_readlane_b32 s4, v31, 4
+; GCN-NEXT: v_readlane_b32 s5, v31, 5
+; GCN-NEXT: v_readlane_b32 s6, v31, 6
+; GCN-NEXT: v_readlane_b32 s7, v31, 7
+; GCN-NEXT: v_readlane_b32 s8, v31, 8
+; GCN-NEXT: v_readlane_b32 s9, v31, 9
+; GCN-NEXT: v_readlane_b32 s10, v31, 10
+; GCN-NEXT: v_readlane_b32 s11, v31, 11
+; GCN-NEXT: v_readlane_b32 s12, v31, 12
+; GCN-NEXT: v_readlane_b32 s13, v31, 13
+; GCN-NEXT: v_readlane_b32 s14, v31, 14
+; GCN-NEXT: v_readlane_b32 s15, v31, 15
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:15]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v31, 17
-; GCN-NEXT: v_readlane_b32 s1, v31, 18
-; GCN-NEXT: v_readlane_b32 s2, v31, 19
-; GCN-NEXT: v_readlane_b32 s3, v31, 20
-; GCN-NEXT: v_readlane_b32 s4, v31, 21
-; GCN-NEXT: v_readlane_b32 s5, v31, 22
-; GCN-NEXT: v_readlane_b32 s6, v31, 23
-; GCN-NEXT: v_readlane_b32 s7, v31, 24
-; GCN-NEXT: v_readlane_b32 s8, v31, 25
-; GCN-NEXT: v_readlane_b32 s9, v31, 26
-; GCN-NEXT: v_readlane_b32 s10, v31, 27
-; GCN-NEXT: v_readlane_b32 s11, v31, 28
-; GCN-NEXT: v_readlane_b32 s12, v31, 29
-; GCN-NEXT: v_readlane_b32 s13, v31, 30
-; GCN-NEXT: v_readlane_b32 s14, v31, 31
-; GCN-NEXT: v_readlane_b32 s15, v31, 32
+; GCN-NEXT: v_readlane_b32 s0, v31, 32
+; GCN-NEXT: v_readlane_b32 s1, v31, 33
+; GCN-NEXT: v_readlane_b32 s2, v31, 34
+; GCN-NEXT: v_readlane_b32 s3, v31, 35
+; GCN-NEXT: v_readlane_b32 s4, v31, 36
+; GCN-NEXT: v_readlane_b32 s5, v31, 37
+; GCN-NEXT: v_readlane_b32 s6, v31, 38
+; GCN-NEXT: v_readlane_b32 s7, v31, 39
+; GCN-NEXT: v_readlane_b32 s8, v31, 40
+; GCN-NEXT: v_readlane_b32 s9, v31, 41
+; GCN-NEXT: v_readlane_b32 s10, v31, 42
+; GCN-NEXT: v_readlane_b32 s11, v31, 43
+; GCN-NEXT: v_readlane_b32 s12, v31, 44
+; GCN-NEXT: v_readlane_b32 s13, v31, 45
+; GCN-NEXT: v_readlane_b32 s14, v31, 46
+; GCN-NEXT: v_readlane_b32 s15, v31, 47
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:15]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v31, 33
-; GCN-NEXT: v_readlane_b32 s1, v31, 34
-; GCN-NEXT: v_readlane_b32 s2, v31, 35
-; GCN-NEXT: v_readlane_b32 s3, v31, 36
-; GCN-NEXT: v_readlane_b32 s4, v31, 37
-; GCN-NEXT: v_readlane_b32 s5, v31, 38
-; GCN-NEXT: v_readlane_b32 s6, v31, 39
-; GCN-NEXT: v_readlane_b32 s7, v31, 40
-; GCN-NEXT: v_readlane_b32 s8, v31, 41
-; GCN-NEXT: v_readlane_b32 s9, v31, 42
-; GCN-NEXT: v_readlane_b32 s10, v31, 43
-; GCN-NEXT: v_readlane_b32 s11, v31, 44
-; GCN-NEXT: v_readlane_b32 s12, v31, 45
-; GCN-NEXT: v_readlane_b32 s13, v31, 46
-; GCN-NEXT: v_readlane_b32 s14, v31, 47
-; GCN-NEXT: v_readlane_b32 s15, v31, 48
+; GCN-NEXT: v_readlane_b32 s0, v31, 16
+; GCN-NEXT: v_readlane_b32 s1, v31, 17
+; GCN-NEXT: v_readlane_b32 s2, v31, 18
+; GCN-NEXT: v_readlane_b32 s3, v31, 19
+; GCN-NEXT: v_readlane_b32 s4, v31, 20
+; GCN-NEXT: v_readlane_b32 s5, v31, 21
+; GCN-NEXT: v_readlane_b32 s6, v31, 22
+; GCN-NEXT: v_readlane_b32 s7, v31, 23
+; GCN-NEXT: v_readlane_b32 s8, v31, 24
+; GCN-NEXT: v_readlane_b32 s9, v31, 25
+; GCN-NEXT: v_readlane_b32 s10, v31, 26
+; GCN-NEXT: v_readlane_b32 s11, v31, 27
+; GCN-NEXT: v_readlane_b32 s12, v31, 28
+; GCN-NEXT: v_readlane_b32 s13, v31, 29
+; GCN-NEXT: v_readlane_b32 s14, v31, 30
+; GCN-NEXT: v_readlane_b32 s15, v31, 31
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:15]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: s_mov_b64 s[0:1], exec
-; GCN-NEXT: s_mov_b64 exec, 0xffff
-; GCN-NEXT: buffer_load_dword v0, off, s[56:59], 0 offset:4 ; 4-byte Folded Reload
-; GCN-NEXT: s_mov_b64 exec, s[0:1]
-; GCN-NEXT: s_waitcnt vmcnt(0)
-; GCN-NEXT: v_readlane_b32 s0, v0, 0
-; GCN-NEXT: v_readlane_b32 s1, v0, 1
-; GCN-NEXT: v_readlane_b32 s2, v0, 2
-; GCN-NEXT: v_readlane_b32 s3, v0, 3
-; GCN-NEXT: v_readlane_b32 s4, v0, 4
-; GCN-NEXT: v_readlane_b32 s5, v0, 5
-; GCN-NEXT: v_readlane_b32 s6, v0, 6
-; GCN-NEXT: v_readlane_b32 s7, v0, 7
-; GCN-NEXT: v_readlane_b32 s8, v0, 8
-; GCN-NEXT: v_readlane_b32 s9, v0, 9
-; GCN-NEXT: v_readlane_b32 s10, v0, 10
-; GCN-NEXT: v_readlane_b32 s11, v0, 11
-; GCN-NEXT: v_readlane_b32 s12, v0, 12
-; GCN-NEXT: v_readlane_b32 s13, v0, 13
-; GCN-NEXT: v_readlane_b32 s14, v0, 14
-; GCN-NEXT: v_readlane_b32 s15, v0, 15
+; GCN-NEXT: v_readlane_b32 s0, v31, 48
+; GCN-NEXT: v_readlane_b32 s1, v31, 49
+; GCN-NEXT: v_readlane_b32 s2, v31, 50
+; GCN-NEXT: v_readlane_b32 s3, v31, 51
+; GCN-NEXT: v_readlane_b32 s4, v31, 52
+; GCN-NEXT: v_readlane_b32 s5, v31, 53
+; GCN-NEXT: v_readlane_b32 s6, v31, 54
+; GCN-NEXT: v_readlane_b32 s7, v31, 55
+; GCN-NEXT: v_readlane_b32 s8, v31, 56
+; GCN-NEXT: v_readlane_b32 s9, v31, 57
+; GCN-NEXT: v_readlane_b32 s10, v31, 58
+; GCN-NEXT: v_readlane_b32 s11, v31, 59
+; GCN-NEXT: v_readlane_b32 s12, v31, 60
+; GCN-NEXT: v_readlane_b32 s13, v31, 61
+; GCN-NEXT: v_readlane_b32 s14, v31, 62
+; GCN-NEXT: v_readlane_b32 s15, v31, 63
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:15]
; GCN-NEXT: ;;#ASMEND
-; GCN-NEXT: v_readlane_b32 s0, v31, 49
-; GCN-NEXT: v_readlane_b32 s1, v31, 50
+; GCN-NEXT: s_mov_b64 s[16:17], exec
+; GCN-NEXT: s_mov_b64 exec, 3
+; GCN-NEXT: buffer_load_dword v0, off, s[20:23], 0 offset:4 ; 4-byte Folded Reload
+; GCN-NEXT: s_mov_b64 exec, s[16:17]
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_readlane_b32 s16, v0, 0
+; GCN-NEXT: v_readlane_b32 s17, v0, 1
; GCN-NEXT: ;;#ASMSTART
-; GCN-NEXT: ; use s[0:1]
+; GCN-NEXT: ; use s[16:17]
; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: BB2_2: ; %ret
; GCN-NEXT: s_endpgm
diff --git a/llvm/test/CodeGen/AMDGPU/shrink-add-sub-constant.ll b/llvm/test/CodeGen/AMDGPU/shrink-add-sub-constant.ll
index ff4a8296d8dd0..bf437cc5bb58a 100644
--- a/llvm/test/CodeGen/AMDGPU/shrink-add-sub-constant.ll
+++ b/llvm/test/CodeGen/AMDGPU/shrink-add-sub-constant.ll
@@ -1166,7 +1166,7 @@ define amdgpu_kernel void @v_test_v2i16_x_sub_7_64(<2 x i16> addrspace(1)* %out,
; GFX10-NEXT: v_add_co_u32_e64 v0, s0, s0, v2
; GFX10-NEXT: v_add_co_ci_u32_e64 v1, s0, s1, 0, s0
; GFX10-NEXT: s_waitcnt vmcnt(0)
-; GFX10-NEXT: v_pk_sub_i16 v2, v3, 7 op_sel_hi:[1,0]
+; GFX10-NEXT: v_pk_sub_i16 v2, v3, 0x400007
; GFX10-NEXT: global_store_dword v[0:1], v2, off
; GFX10-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
@@ -1250,7 +1250,7 @@ define amdgpu_kernel void @v_test_v2i16_x_sub_64_123(<2 x i16> addrspace(1)* %ou
; GFX10-NEXT: v_add_co_u32_e64 v0, s0, s0, v2
; GFX10-NEXT: v_add_co_ci_u32_e64 v1, s0, s1, 0, s0
; GFX10-NEXT: s_waitcnt vmcnt(0)
-; GFX10-NEXT: v_pk_sub_i16 v2, v3, 64 op_sel_hi:[1,0]
+; GFX10-NEXT: v_pk_sub_i16 v2, v3, 0x7b0040
; GFX10-NEXT: global_store_dword v[0:1], v2, off
; GFX10-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x()
diff --git a/llvm/test/CodeGen/AMDGPU/spill-m0.ll b/llvm/test/CodeGen/AMDGPU/spill-m0.ll
index 9b629a5f91110..a03318ead716c 100644
--- a/llvm/test/CodeGen/AMDGPU/spill-m0.ll
+++ b/llvm/test/CodeGen/AMDGPU/spill-m0.ll
@@ -77,101 +77,6 @@ endif: ; preds = %else, %if
ret void
}
-; Force save and restore of m0 during SMEM spill
-; GCN-LABEL: {{^}}m0_unavailable_spill:
-
-; GCN: ; def m0, 1
-
-; GCN: s_mov_b32 m0, s0
-; GCN: v_interp_mov_f32
-
-; GCN: ; clobber m0
-
-; TOSMEM: s_mov_b32 s2, m0
-; TOSMEM: s_add_u32 m0, s3, 0x100
-; TOSMEM-NEXT: s_buffer_store_dwordx2 s{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, m0 ; 8-byte Folded Spill
-; TOSMEM: s_mov_b32 m0, s2
-
-; TOSMEM: s_mov_b64 exec,
-; TOSMEM: s_cbranch_execz
-; TOSMEM: s_branch
-
-; TOSMEM: BB{{[0-9]+_[0-9]+}}:
-; TOSMEM: s_add_u32 m0, s3, 0x100
-; TOSMEM-NEXT: s_buffer_load_dwordx2 s{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, m0 ; 8-byte Folded Reload
-
-; GCN-NOT: v_readlane_b32 m0
-; GCN-NOT: s_buffer_store_dword m0
-; GCN-NOT: s_buffer_load_dword m0
-define amdgpu_kernel void @m0_unavailable_spill(i32 %m0.arg) #0 {
-main_body:
- %m0 = call i32 asm sideeffect "; def $0, 1", "={m0}"() #0
- %tmp = call float @llvm.amdgcn.interp.mov(i32 2, i32 0, i32 0, i32 %m0.arg)
- call void asm sideeffect "; clobber $0", "~{m0}"() #0
- %cmp = fcmp ueq float 0.000000e+00, %tmp
- br i1 %cmp, label %if, label %else
-
-if: ; preds = %main_body
- store volatile i32 8, i32 addrspace(1)* undef
- br label %endif
-
-else: ; preds = %main_body
- store volatile i32 11, i32 addrspace(1)* undef
- br label %endif
-
-endif:
- ret void
-}
-
-; GCN-LABEL: {{^}}restore_m0_lds:
-; TOSMEM: s_load_dwordx2 [[REG:s\[[0-9]+:[0-9]+\]]]
-; TOSMEM: s_cmp_eq_u32
-; FIXME: RegScavenger::isRegUsed() always returns true if m0 is reserved, so we have to save and restore it
-; FIXME-TOSMEM-NOT: m0
-; TOSMEM: s_add_u32 m0, s3, 0x100
-; TOSMEM: s_buffer_store_dword s{{[0-9]+}}, s[88:91], m0 ; 4-byte Folded Spill
-; FIXME-TOSMEM-NOT: m0
-; TOSMEM: s_add_u32 m0, s3, 0x200
-; TOSMEM: s_buffer_store_dwordx2 [[REG]], s[88:91], m0 ; 8-byte Folded Spill
-; FIXME-TOSMEM-NOT: m0
-; TOSMEM: s_cbranch_scc1
-
-; TOSMEM: s_mov_b32 m0, -1
-
-; TOSMEM: s_mov_b32 s2, m0
-; TOSMEM: s_add_u32 m0, s3, 0x200
-; TOSMEM: s_buffer_load_dwordx2 s{{\[[0-9]+:[0-9]+\]}}, s[88:91], m0 ; 8-byte Folded Reload
-; TOSMEM: s_mov_b32 m0, s2
-; TOSMEM: s_waitcnt lgkmcnt(0)
-
-; TOSMEM: ds_write_b64
-
-; FIXME-TOSMEM-NOT: m0
-; TOSMEM: s_add_u32 m0, s3, 0x100
-; TOSMEM: s_buffer_load_dword s2, s[88:91], m0 ; 4-byte Folded Reload
-; FIXME-TOSMEM-NOT: m0
-; TOSMEM: s_waitcnt lgkmcnt(0)
-; TOSMEM-NOT: m0
-; TOSMEM: s_mov_b32 m0, s2
-; TOSMEM: ; use m0
-
-; TOSMEM: s_dcache_wb
-; TOSMEM: s_endpgm
-define amdgpu_kernel void @restore_m0_lds(i32 %arg) {
- %m0 = call i32 asm sideeffect "s_mov_b32 m0, 0", "={m0}"() #0
- %sval = load volatile i64, i64 addrspace(4)* undef
- %cmp = icmp eq i32 %arg, 0
- br i1 %cmp, label %ret, label %bb
-
-bb:
- store volatile i64 %sval, i64 addrspace(3)* undef
- call void asm sideeffect "; use $0", "{m0}"(i32 %m0) #0
- br label %ret
-
-ret:
- ret void
-}
-
declare float @llvm.amdgcn.interp.mov(i32, i32, i32, i32) #1
declare void @llvm.amdgcn.exp.f32(i32, i32, float, float, float, float, i1, i1) #0
declare void @llvm.amdgcn.exp.compr.v2f16(i32, i32, <2 x half>, <2 x half>, i1, i1) #0
diff --git a/llvm/test/CodeGen/AMDGPU/wwm-reserved.ll b/llvm/test/CodeGen/AMDGPU/wwm-reserved.ll
index 1a48e76a241bb..e4beac77e1be2 100644
--- a/llvm/test/CodeGen/AMDGPU/wwm-reserved.ll
+++ b/llvm/test/CodeGen/AMDGPU/wwm-reserved.ll
@@ -94,10 +94,10 @@ define i32 @called(i32 %a) noinline {
; GFX9-LABEL: {{^}}call:
define amdgpu_kernel void @call(<4 x i32> inreg %tmp14, i32 inreg %arg) {
-; GFX9-O0: v_mov_b32_e32 v0, s0
+; GFX9-O0: v_mov_b32_e32 v0, s2
; GFX9-O3: v_mov_b32_e32 v2, s0
; GFX9-NEXT: s_not_b64 exec, exec
-; GFX9-O0-NEXT: v_mov_b32_e32 v0, s1
+; GFX9-O0-NEXT: v_mov_b32_e32 v0, s3
; GFX9-O3-NEXT: v_mov_b32_e32 v2, 0
; GFX9-NEXT: s_not_b64 exec, exec
%tmp107 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %arg, i32 0)
@@ -142,8 +142,8 @@ define amdgpu_kernel void @call_i64(<4 x i32> inreg %tmp14, i64 inreg %arg) {
; GFX9-O0: buffer_store_dword v1
; GFX9: s_swappc_b64
%tmp134 = call i64 @called_i64(i64 %tmp107)
-; GFX9-O0: buffer_load_dword v4
-; GFX9-O0: buffer_load_dword v5
+; GFX9-O0: buffer_load_dword v6
+; GFX9-O0: buffer_load_dword v7
%tmp136 = add i64 %tmp134, %tmp107
%tmp137 = tail call i64 @llvm.amdgcn.wwm.i64(i64 %tmp136)
%tmp138 = bitcast i64 %tmp137 to <2 x i32>
diff --git a/llvm/test/CodeGen/ARM/emutls.ll b/llvm/test/CodeGen/ARM/emutls.ll
index 4327086685e91..92b656d9ba095 100644
--- a/llvm/test/CodeGen/ARM/emutls.ll
+++ b/llvm/test/CodeGen/ARM/emutls.ll
@@ -238,7 +238,6 @@ entry:
; ARM32: .data{{$}}
; ARM32: .globl __emutls_v.i4
; ARM32-LABEL: __emutls_v.i4:
-; ARM32-NEXT: .L__emutls_v.i4$local:
; ARM32-NEXT: .long 4
; ARM32-NEXT: .long 4
; ARM32-NEXT: .long 0
@@ -246,7 +245,6 @@ entry:
; ARM32: .section .rodata,
; ARM32-LABEL: __emutls_t.i4:
-; ARM32-NEXT: .L__emutls_t.i4$local:
; ARM32-NEXT: .long 15
; ARM32-NOT: __emutls_v.i5:
diff --git a/llvm/test/CodeGen/ARM/fp16-args.ll b/llvm/test/CodeGen/ARM/fp16-args.ll
index 7ed1e883eef19..18bbcd12c768a 100644
--- a/llvm/test/CodeGen/ARM/fp16-args.ll
+++ b/llvm/test/CodeGen/ARM/fp16-args.ll
@@ -1,12 +1,12 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=armv7a--none-eabi -float-abi soft -mattr=+fp16 < %s | FileCheck %s --check-prefix=CHECK --check-prefix=SOFT
-; RUN: llc -mtriple=armv7a--none-eabi -float-abi hard -mattr=+fp16 < %s | FileCheck %s --check-prefix=CHECK --check-prefix=HARD
-; RUN: llc -mtriple=armv7a--none-eabi -float-abi soft -mattr=+fullfp16 < %s | FileCheck %s --check-prefix=CHECK --check-prefix=FULL-SOFT
-; RUN: llc -mtriple=armv7a--none-eabi -float-abi hard -mattr=+fullfp16 < %s | FileCheck %s --check-prefix=CHECK --check-prefix=FULL-HARD
-; RUN: llc -mtriple=armv7aeb--none-eabi -float-abi soft -mattr=+fp16 < %s | FileCheck %s --check-prefix=CHECK --check-prefix=SOFT
-; RUN: llc -mtriple=armv7aeb--none-eabi -float-abi hard -mattr=+fp16 < %s | FileCheck %s --check-prefix=CHECK --check-prefix=HARD
-; RUN: llc -mtriple=armv7aeb--none-eabi -float-abi soft -mattr=+fullfp16 < %s | FileCheck %s --check-prefix=CHECK --check-prefix=FULL-SOFT
-; RUN: llc -mtriple=armv7aeb--none-eabi -float-abi hard -mattr=+fullfp16 < %s | FileCheck %s --check-prefix=CHECK --check-prefix=FULL-HARD
+; RUN: llc -mtriple=armv7a--none-eabi -float-abi soft -mattr=+fp16 < %s | FileCheck %s --check-prefix=SOFT
+; RUN: llc -mtriple=armv7a--none-eabi -float-abi hard -mattr=+fp16 < %s | FileCheck %s --check-prefix=HARD
+; RUN: llc -mtriple=armv7a--none-eabi -float-abi soft -mattr=+fullfp16 < %s | FileCheck %s --check-prefix=FULL-SOFT --check-prefix=FULL-SOFT-LE
+; RUN: llc -mtriple=armv7a--none-eabi -float-abi hard -mattr=+fullfp16 < %s | FileCheck %s --check-prefix=FULL-HARD --check-prefix=FULL-HARD-LE
+; RUN: llc -mtriple=armv7aeb--none-eabi -float-abi soft -mattr=+fp16 < %s | FileCheck %s --check-prefix=SOFT
+; RUN: llc -mtriple=armv7aeb--none-eabi -float-abi hard -mattr=+fp16 < %s | FileCheck %s --check-prefix=HARD
+; RUN: llc -mtriple=armv7aeb--none-eabi -float-abi soft -mattr=+fullfp16 < %s | FileCheck %s --check-prefix=FULL-SOFT --check-prefix=FULL-SOFT-BE
+; RUN: llc -mtriple=armv7aeb--none-eabi -float-abi hard -mattr=+fullfp16 < %s | FileCheck %s --check-prefix=FULL-HARD --check-prefix=FULL-HARD-BE
define half @foo(half %a, half %b) {
; SOFT-LABEL: foo:
@@ -44,3 +44,76 @@ entry:
%0 = fadd half %a, %b
ret half %0
}
+
+define <4 x half> @foo_vec(<4 x half> %a) {
+; SOFT-LABEL: foo_vec:
+; SOFT: @ %bb.0: @ %entry
+; SOFT-NEXT: vmov s0, r3
+; SOFT-NEXT: vmov s2, r1
+; SOFT-NEXT: vcvtb.f32.f16 s0, s0
+; SOFT-NEXT: vmov s4, r0
+; SOFT-NEXT: vcvtb.f32.f16 s2, s2
+; SOFT-NEXT: vmov s6, r2
+; SOFT-NEXT: vcvtb.f32.f16 s4, s4
+; SOFT-NEXT: vcvtb.f32.f16 s6, s6
+; SOFT-NEXT: vadd.f32 s0, s0, s0
+; SOFT-NEXT: vadd.f32 s2, s2, s2
+; SOFT-NEXT: vcvtb.f16.f32 s0, s0
+; SOFT-NEXT: vadd.f32 s4, s4, s4
+; SOFT-NEXT: vcvtb.f16.f32 s2, s2
+; SOFT-NEXT: vadd.f32 s6, s6, s6
+; SOFT-NEXT: vcvtb.f16.f32 s4, s4
+; SOFT-NEXT: vcvtb.f16.f32 s6, s6
+; SOFT-NEXT: vmov r0, s4
+; SOFT-NEXT: vmov r1, s2
+; SOFT-NEXT: vmov r2, s6
+; SOFT-NEXT: vmov r3, s0
+; SOFT-NEXT: bx lr
+;
+; HARD-LABEL: foo_vec:
+; HARD: @ %bb.0: @ %entry
+; HARD-NEXT: vcvtb.f32.f16 s4, s3
+; HARD-NEXT: vcvtb.f32.f16 s2, s2
+; HARD-NEXT: vcvtb.f32.f16 s6, s1
+; HARD-NEXT: vcvtb.f32.f16 s0, s0
+; HARD-NEXT: vadd.f32 s2, s2, s2
+; HARD-NEXT: vadd.f32 s0, s0, s0
+; HARD-NEXT: vcvtb.f16.f32 s2, s2
+; HARD-NEXT: vadd.f32 s4, s4, s4
+; HARD-NEXT: vcvtb.f16.f32 s0, s0
+; HARD-NEXT: vadd.f32 s6, s6, s6
+; HARD-NEXT: vcvtb.f16.f32 s3, s4
+; HARD-NEXT: vcvtb.f16.f32 s1, s6
+; HARD-NEXT: bx lr
+;
+; FULL-SOFT-LE-LABEL: foo_vec:
+; FULL-SOFT-LE: @ %bb.0: @ %entry
+; FULL-SOFT-LE-NEXT: vmov d16, r0, r1
+; FULL-SOFT-LE-NEXT: vadd.f16 d16, d16, d16
+; FULL-SOFT-LE-NEXT: vmov r0, r1, d16
+; FULL-SOFT-LE-NEXT: bx lr
+;
+; FULL-HARD-LE-LABEL: foo_vec:
+; FULL-HARD-LE: @ %bb.0: @ %entry
+; FULL-HARD-LE-NEXT: vadd.f16 d0, d0, d0
+; FULL-HARD-LE-NEXT: bx lr
+;
+; FULL-SOFT-BE-LABEL: foo_vec:
+; FULL-SOFT-BE: @ %bb.0: @ %entry
+; FULL-SOFT-BE-NEXT: vmov d16, r1, r0
+; FULL-SOFT-BE-NEXT: vrev64.16 d16, d16
+; FULL-SOFT-BE-NEXT: vadd.f16 d16, d16, d16
+; FULL-SOFT-BE-NEXT: vrev64.16 d16, d16
+; FULL-SOFT-BE-NEXT: vmov r1, r0, d16
+; FULL-SOFT-BE-NEXT: bx lr
+;
+; FULL-HARD-BE-LABEL: foo_vec:
+; FULL-HARD-BE: @ %bb.0: @ %entry
+; FULL-HARD-BE-NEXT: vrev64.16 d16, d0
+; FULL-HARD-BE-NEXT: vadd.f16 d16, d16, d16
+; FULL-HARD-BE-NEXT: vrev64.16 d0, d16
+; FULL-HARD-BE-NEXT: bx lr
+entry:
+ %0 = fadd <4 x half> %a, %a
+ ret <4 x half> %0
+}
diff --git a/llvm/test/CodeGen/ARM/fp16-v3.ll b/llvm/test/CodeGen/ARM/fp16-v3.ll
index e84fee2c2e1b5..085503e80c7f2 100644
--- a/llvm/test/CodeGen/ARM/fp16-v3.ll
+++ b/llvm/test/CodeGen/ARM/fp16-v3.ll
@@ -28,9 +28,6 @@ define void @test_vec3(<3 x half>* %arr, i32 %i) #0 {
}
; CHECK-LABEL: test_bitcast:
-; CHECK: vcvtb.f16.f32
-; CHECK: vcvtb.f16.f32
-; CHECK: vcvtb.f16.f32
; CHECK: pkhbt
; CHECK: uxth
define void @test_bitcast(<3 x half> %inp, <3 x i16>* %arr) #0 {
diff --git a/llvm/test/CodeGen/ARM/legalize-bitcast.ll b/llvm/test/CodeGen/ARM/legalize-bitcast.ll
index 529775df5fd7d..478ff985bf475 100644
--- a/llvm/test/CodeGen/ARM/legalize-bitcast.ll
+++ b/llvm/test/CodeGen/ARM/legalize-bitcast.ll
@@ -49,9 +49,9 @@ define i16 @int_to_vec(i80 %in) {
; CHECK-NEXT: vmov.32 d16[0], r0
; CHECK-NEXT: @ implicit-def: $q9
; CHECK-NEXT: vmov.f64 d18, d16
-; CHECK-NEXT: vrev32.16 q8, q9
-; CHECK-NEXT: @ kill: def $d16 killed $d16 killed $q8
-; CHECK-NEXT: vmov.u16 r0, d16[0]
+; CHECK-NEXT: vrev32.16 q9, q9
+; CHECK-NEXT: @ kill: def $d18 killed $d18 killed $q9
+; CHECK-NEXT: vmov.u16 r0, d18[0]
; CHECK-NEXT: bx lr
%vec = bitcast i80 %in to <5 x i16>
%e0 = extractelement <5 x i16> %vec, i32 0
diff --git a/llvm/test/CodeGen/ARM/pr47454.ll b/llvm/test/CodeGen/ARM/pr47454.ll
new file mode 100644
index 0000000000000..d36a29c4e77ce
--- /dev/null
+++ b/llvm/test/CodeGen/ARM/pr47454.ll
@@ -0,0 +1,49 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=armv8-unknown-linux-unknown -mattr=-fp16 -O0 < %s | FileCheck %s
+
+declare fastcc half @getConstant()
+
+declare fastcc i1 @isEqual(half %0, half %1)
+
+define internal fastcc void @main() {
+; CHECK-LABEL: main:
+; CHECK: @ %bb.0: @ %Entry
+; CHECK-NEXT: push {r11, lr}
+; CHECK-NEXT: mov r11, sp
+; CHECK-NEXT: sub sp, sp, #16
+; CHECK-NEXT: mov r0, #31744
+; CHECK-NEXT: strh r0, [r11, #-2]
+; CHECK-NEXT: ldrh r0, [r11, #-2]
+; CHECK-NEXT: bl __gnu_h2f_ieee
+; CHECK-NEXT: vmov s0, r0
+; CHECK-NEXT: vstr s0, [sp, #8] @ 4-byte Spill
+; CHECK-NEXT: bl getConstant
+; CHECK-NEXT: vmov r0, s0
+; CHECK-NEXT: bl __gnu_h2f_ieee
+; CHECK-NEXT: vmov s0, r0
+; CHECK-NEXT: vmov r0, s0
+; CHECK-NEXT: bl __gnu_f2h_ieee
+; CHECK-NEXT: vldr s0, [sp, #8] @ 4-byte Reload
+; CHECK-NEXT: vmov r1, s0
+; CHECK-NEXT: str r0, [sp, #4] @ 4-byte Spill
+; CHECK-NEXT: mov r0, r1
+; CHECK-NEXT: bl __gnu_f2h_ieee
+; CHECK-NEXT: uxth r0, r0
+; CHECK-NEXT: vmov s0, r0
+; CHECK-NEXT: ldr r0, [sp, #4] @ 4-byte Reload
+; CHECK-NEXT: uxth r1, r0
+; CHECK-NEXT: vmov s1, r1
+; CHECK-NEXT: bl isEqual
+; CHECK-NEXT: mov sp, r11
+; CHECK-NEXT: pop {r11, pc}
+Entry:
+ ; First arg directly from constant
+ %const = alloca half, align 2
+ store half 0xH7C00, half* %const, align 2
+ %arg1 = load half, half* %const, align 2
+ ; Second arg from function return
+ %arg2 = call fastcc half @getConstant()
+ ; Arguments should have equivalent mangling
+ %result = call fastcc i1 @isEqual(half %arg1, half %arg2)
+ ret void
+}
diff --git a/llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/fptosi_and_fptoui.ll b/llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/fptosi_and_fptoui.ll
index a98c6eb9fd6cb..c63f24ea692ce 100644
--- a/llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/fptosi_and_fptoui.ll
+++ b/llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/fptosi_and_fptoui.ll
@@ -235,15 +235,15 @@ define i32 @f64tou32(double %a) {
; FP32-NEXT: mfc1 $1, $f0
; FP32-NEXT: lui $2, 16864
; FP32-NEXT: ori $3, $zero, 0
-; FP32-NEXT: mtc1 $3, $f0
-; FP32-NEXT: mtc1 $2, $f1
-; FP32-NEXT: sub.d $f2, $f12, $f0
-; FP32-NEXT: trunc.w.d $f2, $f2
-; FP32-NEXT: mfc1 $2, $f2
+; FP32-NEXT: mtc1 $3, $f2
+; FP32-NEXT: mtc1 $2, $f3
+; FP32-NEXT: sub.d $f4, $f12, $f2
+; FP32-NEXT: trunc.w.d $f0, $f4
+; FP32-NEXT: mfc1 $2, $f0
; FP32-NEXT: lui $3, 32768
; FP32-NEXT: xor $2, $2, $3
; FP32-NEXT: addiu $3, $zero, 1
-; FP32-NEXT: c.ult.d $f12, $f0
+; FP32-NEXT: c.ult.d $f12, $f2
; FP32-NEXT: movf $3, $zero, $fcc0
; FP32-NEXT: andi $3, $3, 1
; FP32-NEXT: movn $2, $1, $3
@@ -256,15 +256,15 @@ define i32 @f64tou32(double %a) {
; FP64-NEXT: mfc1 $1, $f0
; FP64-NEXT: lui $2, 16864
; FP64-NEXT: ori $3, $zero, 0
-; FP64-NEXT: mtc1 $3, $f0
-; FP64-NEXT: mthc1 $2, $f0
-; FP64-NEXT: sub.d $f1, $f12, $f0
-; FP64-NEXT: trunc.w.d $f1, $f1
-; FP64-NEXT: mfc1 $2, $f1
+; FP64-NEXT: mtc1 $3, $f1
+; FP64-NEXT: mthc1 $2, $f1
+; FP64-NEXT: sub.d $f2, $f12, $f1
+; FP64-NEXT: trunc.w.d $f0, $f2
+; FP64-NEXT: mfc1 $2, $f0
; FP64-NEXT: lui $3, 32768
; FP64-NEXT: xor $2, $2, $3
; FP64-NEXT: addiu $3, $zero, 1
-; FP64-NEXT: c.ult.d $f12, $f0
+; FP64-NEXT: c.ult.d $f12, $f1
; FP64-NEXT: movf $3, $zero, $fcc0
; FP64-NEXT: andi $3, $3, 1
; FP64-NEXT: movn $2, $1, $3
@@ -282,15 +282,15 @@ define zeroext i16 @f64tou16(double %a) {
; FP32-NEXT: mfc1 $1, $f0
; FP32-NEXT: lui $2, 16864
; FP32-NEXT: ori $3, $zero, 0
-; FP32-NEXT: mtc1 $3, $f0
-; FP32-NEXT: mtc1 $2, $f1
-; FP32-NEXT: sub.d $f2, $f12, $f0
-; FP32-NEXT: trunc.w.d $f2, $f2
-; FP32-NEXT: mfc1 $2, $f2
+; FP32-NEXT: mtc1 $3, $f2
+; FP32-NEXT: mtc1 $2, $f3
+; FP32-NEXT: sub.d $f4, $f12, $f2
+; FP32-NEXT: trunc.w.d $f0, $f4
+; FP32-NEXT: mfc1 $2, $f0
; FP32-NEXT: lui $3, 32768
; FP32-NEXT: xor $2, $2, $3
; FP32-NEXT: addiu $3, $zero, 1
-; FP32-NEXT: c.ult.d $f12, $f0
+; FP32-NEXT: c.ult.d $f12, $f2
; FP32-NEXT: movf $3, $zero, $fcc0
; FP32-NEXT: andi $3, $3, 1
; FP32-NEXT: movn $2, $1, $3
@@ -304,15 +304,15 @@ define zeroext i16 @f64tou16(double %a) {
; FP64-NEXT: mfc1 $1, $f0
; FP64-NEXT: lui $2, 16864
; FP64-NEXT: ori $3, $zero, 0
-; FP64-NEXT: mtc1 $3, $f0
-; FP64-NEXT: mthc1 $2, $f0
-; FP64-NEXT: sub.d $f1, $f12, $f0
-; FP64-NEXT: trunc.w.d $f1, $f1
-; FP64-NEXT: mfc1 $2, $f1
+; FP64-NEXT: mtc1 $3, $f1
+; FP64-NEXT: mthc1 $2, $f1
+; FP64-NEXT: sub.d $f2, $f12, $f1
+; FP64-NEXT: trunc.w.d $f0, $f2
+; FP64-NEXT: mfc1 $2, $f0
; FP64-NEXT: lui $3, 32768
; FP64-NEXT: xor $2, $2, $3
; FP64-NEXT: addiu $3, $zero, 1
-; FP64-NEXT: c.ult.d $f12, $f0
+; FP64-NEXT: c.ult.d $f12, $f1
; FP64-NEXT: movf $3, $zero, $fcc0
; FP64-NEXT: andi $3, $3, 1
; FP64-NEXT: movn $2, $1, $3
@@ -331,15 +331,15 @@ define zeroext i8 @f64tou8(double %a) {
; FP32-NEXT: mfc1 $1, $f0
; FP32-NEXT: lui $2, 16864
; FP32-NEXT: ori $3, $zero, 0
-; FP32-NEXT: mtc1 $3, $f0
-; FP32-NEXT: mtc1 $2, $f1
-; FP32-NEXT: sub.d $f2, $f12, $f0
-; FP32-NEXT: trunc.w.d $f2, $f2
-; FP32-NEXT: mfc1 $2, $f2
+; FP32-NEXT: mtc1 $3, $f2
+; FP32-NEXT: mtc1 $2, $f3
+; FP32-NEXT: sub.d $f4, $f12, $f2
+; FP32-NEXT: trunc.w.d $f0, $f4
+; FP32-NEXT: mfc1 $2, $f0
; FP32-NEXT: lui $3, 32768
; FP32-NEXT: xor $2, $2, $3
; FP32-NEXT: addiu $3, $zero, 1
-; FP32-NEXT: c.ult.d $f12, $f0
+; FP32-NEXT: c.ult.d $f12, $f2
; FP32-NEXT: movf $3, $zero, $fcc0
; FP32-NEXT: andi $3, $3, 1
; FP32-NEXT: movn $2, $1, $3
@@ -353,15 +353,15 @@ define zeroext i8 @f64tou8(double %a) {
; FP64-NEXT: mfc1 $1, $f0
; FP64-NEXT: lui $2, 16864
; FP64-NEXT: ori $3, $zero, 0
-; FP64-NEXT: mtc1 $3, $f0
-; FP64-NEXT: mthc1 $2, $f0
-; FP64-NEXT: sub.d $f1, $f12, $f0
-; FP64-NEXT: trunc.w.d $f1, $f1
-; FP64-NEXT: mfc1 $2, $f1
+; FP64-NEXT: mtc1 $3, $f1
+; FP64-NEXT: mthc1 $2, $f1
+; FP64-NEXT: sub.d $f2, $f12, $f1
+; FP64-NEXT: trunc.w.d $f0, $f2
+; FP64-NEXT: mfc1 $2, $f0
; FP64-NEXT: lui $3, 32768
; FP64-NEXT: xor $2, $2, $3
; FP64-NEXT: addiu $3, $zero, 1
-; FP64-NEXT: c.ult.d $f12, $f0
+; FP64-NEXT: c.ult.d $f12, $f1
; FP64-NEXT: movf $3, $zero, $fcc0
; FP64-NEXT: andi $3, $3, 1
; FP64-NEXT: movn $2, $1, $3
diff --git a/llvm/test/CodeGen/Mips/atomic-min-max.ll b/llvm/test/CodeGen/Mips/atomic-min-max.ll
index 646af650c00e7..a6200851940cd 100644
--- a/llvm/test/CodeGen/Mips/atomic-min-max.ll
+++ b/llvm/test/CodeGen/Mips/atomic-min-max.ll
@@ -1154,26 +1154,26 @@ define i16 @test_max_16(i16* nocapture %ptr, i16 signext %val) {
; MIPS64-NEXT: sll $2, $2, 3
; MIPS64-NEXT: ori $3, $zero, 65535
; MIPS64-NEXT: sllv $3, $3, $2
-; MIPS64-NEXT: nor $4, $zero, $3
+; MIPS64-NEXT: nor $6, $zero, $3
; MIPS64-NEXT: sllv $5, $5, $2
; MIPS64-NEXT: .LBB4_1: # %entry
; MIPS64-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64-NEXT: ll $7, 0($1)
-; MIPS64-NEXT: slt $10, $7, $5
-; MIPS64-NEXT: move $8, $7
-; MIPS64-NEXT: movn $8, $5, $10
-; MIPS64-NEXT: and $8, $8, $3
-; MIPS64-NEXT: and $9, $7, $4
-; MIPS64-NEXT: or $9, $9, $8
-; MIPS64-NEXT: sc $9, 0($1)
-; MIPS64-NEXT: beqz $9, .LBB4_1
+; MIPS64-NEXT: ll $8, 0($1)
+; MIPS64-NEXT: slt $11, $8, $5
+; MIPS64-NEXT: move $9, $8
+; MIPS64-NEXT: movn $9, $5, $11
+; MIPS64-NEXT: and $9, $9, $3
+; MIPS64-NEXT: and $10, $8, $6
+; MIPS64-NEXT: or $10, $10, $9
+; MIPS64-NEXT: sc $10, 0($1)
+; MIPS64-NEXT: beqz $10, .LBB4_1
; MIPS64-NEXT: nop
; MIPS64-NEXT: # %bb.2: # %entry
-; MIPS64-NEXT: and $6, $7, $3
-; MIPS64-NEXT: srlv $6, $6, $2
-; MIPS64-NEXT: seh $6, $6
+; MIPS64-NEXT: and $7, $8, $3
+; MIPS64-NEXT: srlv $7, $7, $2
+; MIPS64-NEXT: seh $7, $7
; MIPS64-NEXT: # %bb.3: # %entry
-; MIPS64-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64-NEXT: # %bb.4: # %entry
; MIPS64-NEXT: sync
; MIPS64-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -1194,26 +1194,26 @@ define i16 @test_max_16(i16* nocapture %ptr, i16 signext %val) {
; MIPS64R6-NEXT: sll $2, $2, 3
; MIPS64R6-NEXT: ori $3, $zero, 65535
; MIPS64R6-NEXT: sllv $3, $3, $2
-; MIPS64R6-NEXT: nor $4, $zero, $3
+; MIPS64R6-NEXT: nor $6, $zero, $3
; MIPS64R6-NEXT: sllv $5, $5, $2
; MIPS64R6-NEXT: .LBB4_1: # %entry
; MIPS64R6-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64R6-NEXT: ll $7, 0($1)
-; MIPS64R6-NEXT: slt $10, $7, $5
-; MIPS64R6-NEXT: seleqz $8, $7, $10
-; MIPS64R6-NEXT: selnez $10, $5, $10
-; MIPS64R6-NEXT: or $8, $8, $10
-; MIPS64R6-NEXT: and $8, $8, $3
-; MIPS64R6-NEXT: and $9, $7, $4
-; MIPS64R6-NEXT: or $9, $9, $8
-; MIPS64R6-NEXT: sc $9, 0($1)
-; MIPS64R6-NEXT: beqzc $9, .LBB4_1
+; MIPS64R6-NEXT: ll $8, 0($1)
+; MIPS64R6-NEXT: slt $11, $8, $5
+; MIPS64R6-NEXT: seleqz $9, $8, $11
+; MIPS64R6-NEXT: selnez $11, $5, $11
+; MIPS64R6-NEXT: or $9, $9, $11
+; MIPS64R6-NEXT: and $9, $9, $3
+; MIPS64R6-NEXT: and $10, $8, $6
+; MIPS64R6-NEXT: or $10, $10, $9
+; MIPS64R6-NEXT: sc $10, 0($1)
+; MIPS64R6-NEXT: beqzc $10, .LBB4_1
; MIPS64R6-NEXT: # %bb.2: # %entry
-; MIPS64R6-NEXT: and $6, $7, $3
-; MIPS64R6-NEXT: srlv $6, $6, $2
-; MIPS64R6-NEXT: seh $6, $6
+; MIPS64R6-NEXT: and $7, $8, $3
+; MIPS64R6-NEXT: srlv $7, $7, $2
+; MIPS64R6-NEXT: seh $7, $7
; MIPS64R6-NEXT: # %bb.3: # %entry
-; MIPS64R6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64R6-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64R6-NEXT: # %bb.4: # %entry
; MIPS64R6-NEXT: sync
; MIPS64R6-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -1232,28 +1232,28 @@ define i16 @test_max_16(i16* nocapture %ptr, i16 signext %val) {
; MIPS64EL-NEXT: sll $2, $2, 3
; MIPS64EL-NEXT: ori $3, $zero, 65535
; MIPS64EL-NEXT: sllv $3, $3, $2
-; MIPS64EL-NEXT: nor $4, $zero, $3
+; MIPS64EL-NEXT: nor $6, $zero, $3
; MIPS64EL-NEXT: sllv $5, $5, $2
; MIPS64EL-NEXT: .LBB4_1: # %entry
; MIPS64EL-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64EL-NEXT: ll $7, 0($1)
-; MIPS64EL-NEXT: and $7, $7, $3
-; MIPS64EL-NEXT: and $5, $5, $3
-; MIPS64EL-NEXT: slt $10, $7, $5
-; MIPS64EL-NEXT: move $8, $7
-; MIPS64EL-NEXT: movn $8, $5, $10
+; MIPS64EL-NEXT: ll $8, 0($1)
; MIPS64EL-NEXT: and $8, $8, $3
-; MIPS64EL-NEXT: and $9, $7, $4
-; MIPS64EL-NEXT: or $9, $9, $8
-; MIPS64EL-NEXT: sc $9, 0($1)
-; MIPS64EL-NEXT: beqz $9, .LBB4_1
+; MIPS64EL-NEXT: and $5, $5, $3
+; MIPS64EL-NEXT: slt $11, $8, $5
+; MIPS64EL-NEXT: move $9, $8
+; MIPS64EL-NEXT: movn $9, $5, $11
+; MIPS64EL-NEXT: and $9, $9, $3
+; MIPS64EL-NEXT: and $10, $8, $6
+; MIPS64EL-NEXT: or $10, $10, $9
+; MIPS64EL-NEXT: sc $10, 0($1)
+; MIPS64EL-NEXT: beqz $10, .LBB4_1
; MIPS64EL-NEXT: nop
; MIPS64EL-NEXT: # %bb.2: # %entry
-; MIPS64EL-NEXT: and $6, $7, $3
-; MIPS64EL-NEXT: srlv $6, $6, $2
-; MIPS64EL-NEXT: seh $6, $6
+; MIPS64EL-NEXT: and $7, $8, $3
+; MIPS64EL-NEXT: srlv $7, $7, $2
+; MIPS64EL-NEXT: seh $7, $7
; MIPS64EL-NEXT: # %bb.3: # %entry
-; MIPS64EL-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64EL-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64EL-NEXT: # %bb.4: # %entry
; MIPS64EL-NEXT: sync
; MIPS64EL-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -1273,28 +1273,28 @@ define i16 @test_max_16(i16* nocapture %ptr, i16 signext %val) {
; MIPS64ELR6-NEXT: sll $2, $2, 3
; MIPS64ELR6-NEXT: ori $3, $zero, 65535
; MIPS64ELR6-NEXT: sllv $3, $3, $2
-; MIPS64ELR6-NEXT: nor $4, $zero, $3
+; MIPS64ELR6-NEXT: nor $6, $zero, $3
; MIPS64ELR6-NEXT: sllv $5, $5, $2
; MIPS64ELR6-NEXT: .LBB4_1: # %entry
; MIPS64ELR6-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64ELR6-NEXT: ll $7, 0($1)
-; MIPS64ELR6-NEXT: and $7, $7, $3
-; MIPS64ELR6-NEXT: and $5, $5, $3
-; MIPS64ELR6-NEXT: slt $10, $7, $5
-; MIPS64ELR6-NEXT: seleqz $8, $7, $10
-; MIPS64ELR6-NEXT: selnez $10, $5, $10
-; MIPS64ELR6-NEXT: or $8, $8, $10
+; MIPS64ELR6-NEXT: ll $8, 0($1)
; MIPS64ELR6-NEXT: and $8, $8, $3
-; MIPS64ELR6-NEXT: and $9, $7, $4
-; MIPS64ELR6-NEXT: or $9, $9, $8
-; MIPS64ELR6-NEXT: sc $9, 0($1)
-; MIPS64ELR6-NEXT: beqzc $9, .LBB4_1
+; MIPS64ELR6-NEXT: and $5, $5, $3
+; MIPS64ELR6-NEXT: slt $11, $8, $5
+; MIPS64ELR6-NEXT: seleqz $9, $8, $11
+; MIPS64ELR6-NEXT: selnez $11, $5, $11
+; MIPS64ELR6-NEXT: or $9, $9, $11
+; MIPS64ELR6-NEXT: and $9, $9, $3
+; MIPS64ELR6-NEXT: and $10, $8, $6
+; MIPS64ELR6-NEXT: or $10, $10, $9
+; MIPS64ELR6-NEXT: sc $10, 0($1)
+; MIPS64ELR6-NEXT: beqzc $10, .LBB4_1
; MIPS64ELR6-NEXT: # %bb.2: # %entry
-; MIPS64ELR6-NEXT: and $6, $7, $3
-; MIPS64ELR6-NEXT: srlv $6, $6, $2
-; MIPS64ELR6-NEXT: seh $6, $6
+; MIPS64ELR6-NEXT: and $7, $8, $3
+; MIPS64ELR6-NEXT: srlv $7, $7, $2
+; MIPS64ELR6-NEXT: seh $7, $7
; MIPS64ELR6-NEXT: # %bb.3: # %entry
-; MIPS64ELR6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64ELR6-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64ELR6-NEXT: # %bb.4: # %entry
; MIPS64ELR6-NEXT: sync
; MIPS64ELR6-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -1635,26 +1635,26 @@ define i16 @test_min_16(i16* nocapture %ptr, i16 signext %val) {
; MIPS64-NEXT: sll $2, $2, 3
; MIPS64-NEXT: ori $3, $zero, 65535
; MIPS64-NEXT: sllv $3, $3, $2
-; MIPS64-NEXT: nor $4, $zero, $3
+; MIPS64-NEXT: nor $6, $zero, $3
; MIPS64-NEXT: sllv $5, $5, $2
; MIPS64-NEXT: .LBB5_1: # %entry
; MIPS64-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64-NEXT: ll $7, 0($1)
-; MIPS64-NEXT: slt $10, $7, $5
-; MIPS64-NEXT: move $8, $7
-; MIPS64-NEXT: movz $8, $5, $10
-; MIPS64-NEXT: and $8, $8, $3
-; MIPS64-NEXT: and $9, $7, $4
-; MIPS64-NEXT: or $9, $9, $8
-; MIPS64-NEXT: sc $9, 0($1)
-; MIPS64-NEXT: beqz $9, .LBB5_1
+; MIPS64-NEXT: ll $8, 0($1)
+; MIPS64-NEXT: slt $11, $8, $5
+; MIPS64-NEXT: move $9, $8
+; MIPS64-NEXT: movz $9, $5, $11
+; MIPS64-NEXT: and $9, $9, $3
+; MIPS64-NEXT: and $10, $8, $6
+; MIPS64-NEXT: or $10, $10, $9
+; MIPS64-NEXT: sc $10, 0($1)
+; MIPS64-NEXT: beqz $10, .LBB5_1
; MIPS64-NEXT: nop
; MIPS64-NEXT: # %bb.2: # %entry
-; MIPS64-NEXT: and $6, $7, $3
-; MIPS64-NEXT: srlv $6, $6, $2
-; MIPS64-NEXT: seh $6, $6
+; MIPS64-NEXT: and $7, $8, $3
+; MIPS64-NEXT: srlv $7, $7, $2
+; MIPS64-NEXT: seh $7, $7
; MIPS64-NEXT: # %bb.3: # %entry
-; MIPS64-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64-NEXT: # %bb.4: # %entry
; MIPS64-NEXT: sync
; MIPS64-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -1675,26 +1675,26 @@ define i16 @test_min_16(i16* nocapture %ptr, i16 signext %val) {
; MIPS64R6-NEXT: sll $2, $2, 3
; MIPS64R6-NEXT: ori $3, $zero, 65535
; MIPS64R6-NEXT: sllv $3, $3, $2
-; MIPS64R6-NEXT: nor $4, $zero, $3
+; MIPS64R6-NEXT: nor $6, $zero, $3
; MIPS64R6-NEXT: sllv $5, $5, $2
; MIPS64R6-NEXT: .LBB5_1: # %entry
; MIPS64R6-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64R6-NEXT: ll $7, 0($1)
-; MIPS64R6-NEXT: slt $10, $7, $5
-; MIPS64R6-NEXT: selnez $8, $7, $10
-; MIPS64R6-NEXT: seleqz $10, $5, $10
-; MIPS64R6-NEXT: or $8, $8, $10
-; MIPS64R6-NEXT: and $8, $8, $3
-; MIPS64R6-NEXT: and $9, $7, $4
-; MIPS64R6-NEXT: or $9, $9, $8
-; MIPS64R6-NEXT: sc $9, 0($1)
-; MIPS64R6-NEXT: beqzc $9, .LBB5_1
+; MIPS64R6-NEXT: ll $8, 0($1)
+; MIPS64R6-NEXT: slt $11, $8, $5
+; MIPS64R6-NEXT: selnez $9, $8, $11
+; MIPS64R6-NEXT: seleqz $11, $5, $11
+; MIPS64R6-NEXT: or $9, $9, $11
+; MIPS64R6-NEXT: and $9, $9, $3
+; MIPS64R6-NEXT: and $10, $8, $6
+; MIPS64R6-NEXT: or $10, $10, $9
+; MIPS64R6-NEXT: sc $10, 0($1)
+; MIPS64R6-NEXT: beqzc $10, .LBB5_1
; MIPS64R6-NEXT: # %bb.2: # %entry
-; MIPS64R6-NEXT: and $6, $7, $3
-; MIPS64R6-NEXT: srlv $6, $6, $2
-; MIPS64R6-NEXT: seh $6, $6
+; MIPS64R6-NEXT: and $7, $8, $3
+; MIPS64R6-NEXT: srlv $7, $7, $2
+; MIPS64R6-NEXT: seh $7, $7
; MIPS64R6-NEXT: # %bb.3: # %entry
-; MIPS64R6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64R6-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64R6-NEXT: # %bb.4: # %entry
; MIPS64R6-NEXT: sync
; MIPS64R6-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -1713,28 +1713,28 @@ define i16 @test_min_16(i16* nocapture %ptr, i16 signext %val) {
; MIPS64EL-NEXT: sll $2, $2, 3
; MIPS64EL-NEXT: ori $3, $zero, 65535
; MIPS64EL-NEXT: sllv $3, $3, $2
-; MIPS64EL-NEXT: nor $4, $zero, $3
+; MIPS64EL-NEXT: nor $6, $zero, $3
; MIPS64EL-NEXT: sllv $5, $5, $2
; MIPS64EL-NEXT: .LBB5_1: # %entry
; MIPS64EL-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64EL-NEXT: ll $7, 0($1)
-; MIPS64EL-NEXT: and $7, $7, $3
-; MIPS64EL-NEXT: and $5, $5, $3
-; MIPS64EL-NEXT: slt $10, $7, $5
-; MIPS64EL-NEXT: move $8, $7
-; MIPS64EL-NEXT: movz $8, $5, $10
+; MIPS64EL-NEXT: ll $8, 0($1)
; MIPS64EL-NEXT: and $8, $8, $3
-; MIPS64EL-NEXT: and $9, $7, $4
-; MIPS64EL-NEXT: or $9, $9, $8
-; MIPS64EL-NEXT: sc $9, 0($1)
-; MIPS64EL-NEXT: beqz $9, .LBB5_1
+; MIPS64EL-NEXT: and $5, $5, $3
+; MIPS64EL-NEXT: slt $11, $8, $5
+; MIPS64EL-NEXT: move $9, $8
+; MIPS64EL-NEXT: movz $9, $5, $11
+; MIPS64EL-NEXT: and $9, $9, $3
+; MIPS64EL-NEXT: and $10, $8, $6
+; MIPS64EL-NEXT: or $10, $10, $9
+; MIPS64EL-NEXT: sc $10, 0($1)
+; MIPS64EL-NEXT: beqz $10, .LBB5_1
; MIPS64EL-NEXT: nop
; MIPS64EL-NEXT: # %bb.2: # %entry
-; MIPS64EL-NEXT: and $6, $7, $3
-; MIPS64EL-NEXT: srlv $6, $6, $2
-; MIPS64EL-NEXT: seh $6, $6
+; MIPS64EL-NEXT: and $7, $8, $3
+; MIPS64EL-NEXT: srlv $7, $7, $2
+; MIPS64EL-NEXT: seh $7, $7
; MIPS64EL-NEXT: # %bb.3: # %entry
-; MIPS64EL-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64EL-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64EL-NEXT: # %bb.4: # %entry
; MIPS64EL-NEXT: sync
; MIPS64EL-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -1754,28 +1754,28 @@ define i16 @test_min_16(i16* nocapture %ptr, i16 signext %val) {
; MIPS64ELR6-NEXT: sll $2, $2, 3
; MIPS64ELR6-NEXT: ori $3, $zero, 65535
; MIPS64ELR6-NEXT: sllv $3, $3, $2
-; MIPS64ELR6-NEXT: nor $4, $zero, $3
+; MIPS64ELR6-NEXT: nor $6, $zero, $3
; MIPS64ELR6-NEXT: sllv $5, $5, $2
; MIPS64ELR6-NEXT: .LBB5_1: # %entry
; MIPS64ELR6-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64ELR6-NEXT: ll $7, 0($1)
-; MIPS64ELR6-NEXT: and $7, $7, $3
-; MIPS64ELR6-NEXT: and $5, $5, $3
-; MIPS64ELR6-NEXT: slt $10, $7, $5
-; MIPS64ELR6-NEXT: selnez $8, $7, $10
-; MIPS64ELR6-NEXT: seleqz $10, $5, $10
-; MIPS64ELR6-NEXT: or $8, $8, $10
+; MIPS64ELR6-NEXT: ll $8, 0($1)
; MIPS64ELR6-NEXT: and $8, $8, $3
-; MIPS64ELR6-NEXT: and $9, $7, $4
-; MIPS64ELR6-NEXT: or $9, $9, $8
-; MIPS64ELR6-NEXT: sc $9, 0($1)
-; MIPS64ELR6-NEXT: beqzc $9, .LBB5_1
+; MIPS64ELR6-NEXT: and $5, $5, $3
+; MIPS64ELR6-NEXT: slt $11, $8, $5
+; MIPS64ELR6-NEXT: selnez $9, $8, $11
+; MIPS64ELR6-NEXT: seleqz $11, $5, $11
+; MIPS64ELR6-NEXT: or $9, $9, $11
+; MIPS64ELR6-NEXT: and $9, $9, $3
+; MIPS64ELR6-NEXT: and $10, $8, $6
+; MIPS64ELR6-NEXT: or $10, $10, $9
+; MIPS64ELR6-NEXT: sc $10, 0($1)
+; MIPS64ELR6-NEXT: beqzc $10, .LBB5_1
; MIPS64ELR6-NEXT: # %bb.2: # %entry
-; MIPS64ELR6-NEXT: and $6, $7, $3
-; MIPS64ELR6-NEXT: srlv $6, $6, $2
-; MIPS64ELR6-NEXT: seh $6, $6
+; MIPS64ELR6-NEXT: and $7, $8, $3
+; MIPS64ELR6-NEXT: srlv $7, $7, $2
+; MIPS64ELR6-NEXT: seh $7, $7
; MIPS64ELR6-NEXT: # %bb.3: # %entry
-; MIPS64ELR6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64ELR6-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64ELR6-NEXT: # %bb.4: # %entry
; MIPS64ELR6-NEXT: sync
; MIPS64ELR6-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -2116,26 +2116,26 @@ define i16 @test_umax_16(i16* nocapture %ptr, i16 signext %val) {
; MIPS64-NEXT: sll $2, $2, 3
; MIPS64-NEXT: ori $3, $zero, 65535
; MIPS64-NEXT: sllv $3, $3, $2
-; MIPS64-NEXT: nor $4, $zero, $3
+; MIPS64-NEXT: nor $6, $zero, $3
; MIPS64-NEXT: sllv $5, $5, $2
; MIPS64-NEXT: .LBB6_1: # %entry
; MIPS64-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64-NEXT: ll $7, 0($1)
-; MIPS64-NEXT: sltu $10, $7, $5
-; MIPS64-NEXT: move $8, $7
-; MIPS64-NEXT: movn $8, $5, $10
-; MIPS64-NEXT: and $8, $8, $3
-; MIPS64-NEXT: and $9, $7, $4
-; MIPS64-NEXT: or $9, $9, $8
-; MIPS64-NEXT: sc $9, 0($1)
-; MIPS64-NEXT: beqz $9, .LBB6_1
+; MIPS64-NEXT: ll $8, 0($1)
+; MIPS64-NEXT: sltu $11, $8, $5
+; MIPS64-NEXT: move $9, $8
+; MIPS64-NEXT: movn $9, $5, $11
+; MIPS64-NEXT: and $9, $9, $3
+; MIPS64-NEXT: and $10, $8, $6
+; MIPS64-NEXT: or $10, $10, $9
+; MIPS64-NEXT: sc $10, 0($1)
+; MIPS64-NEXT: beqz $10, .LBB6_1
; MIPS64-NEXT: nop
; MIPS64-NEXT: # %bb.2: # %entry
-; MIPS64-NEXT: and $6, $7, $3
-; MIPS64-NEXT: srlv $6, $6, $2
-; MIPS64-NEXT: seh $6, $6
+; MIPS64-NEXT: and $7, $8, $3
+; MIPS64-NEXT: srlv $7, $7, $2
+; MIPS64-NEXT: seh $7, $7
; MIPS64-NEXT: # %bb.3: # %entry
-; MIPS64-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64-NEXT: # %bb.4: # %entry
; MIPS64-NEXT: sync
; MIPS64-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -2156,26 +2156,26 @@ define i16 @test_umax_16(i16* nocapture %ptr, i16 signext %val) {
; MIPS64R6-NEXT: sll $2, $2, 3
; MIPS64R6-NEXT: ori $3, $zero, 65535
; MIPS64R6-NEXT: sllv $3, $3, $2
-; MIPS64R6-NEXT: nor $4, $zero, $3
+; MIPS64R6-NEXT: nor $6, $zero, $3
; MIPS64R6-NEXT: sllv $5, $5, $2
; MIPS64R6-NEXT: .LBB6_1: # %entry
; MIPS64R6-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64R6-NEXT: ll $7, 0($1)
-; MIPS64R6-NEXT: sltu $10, $7, $5
-; MIPS64R6-NEXT: seleqz $8, $7, $10
-; MIPS64R6-NEXT: selnez $10, $5, $10
-; MIPS64R6-NEXT: or $8, $8, $10
-; MIPS64R6-NEXT: and $8, $8, $3
-; MIPS64R6-NEXT: and $9, $7, $4
-; MIPS64R6-NEXT: or $9, $9, $8
-; MIPS64R6-NEXT: sc $9, 0($1)
-; MIPS64R6-NEXT: beqzc $9, .LBB6_1
+; MIPS64R6-NEXT: ll $8, 0($1)
+; MIPS64R6-NEXT: sltu $11, $8, $5
+; MIPS64R6-NEXT: seleqz $9, $8, $11
+; MIPS64R6-NEXT: selnez $11, $5, $11
+; MIPS64R6-NEXT: or $9, $9, $11
+; MIPS64R6-NEXT: and $9, $9, $3
+; MIPS64R6-NEXT: and $10, $8, $6
+; MIPS64R6-NEXT: or $10, $10, $9
+; MIPS64R6-NEXT: sc $10, 0($1)
+; MIPS64R6-NEXT: beqzc $10, .LBB6_1
; MIPS64R6-NEXT: # %bb.2: # %entry
-; MIPS64R6-NEXT: and $6, $7, $3
-; MIPS64R6-NEXT: srlv $6, $6, $2
-; MIPS64R6-NEXT: seh $6, $6
+; MIPS64R6-NEXT: and $7, $8, $3
+; MIPS64R6-NEXT: srlv $7, $7, $2
+; MIPS64R6-NEXT: seh $7, $7
; MIPS64R6-NEXT: # %bb.3: # %entry
-; MIPS64R6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64R6-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64R6-NEXT: # %bb.4: # %entry
; MIPS64R6-NEXT: sync
; MIPS64R6-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -2194,28 +2194,28 @@ define i16 @test_umax_16(i16* nocapture %ptr, i16 signext %val) {
; MIPS64EL-NEXT: sll $2, $2, 3
; MIPS64EL-NEXT: ori $3, $zero, 65535
; MIPS64EL-NEXT: sllv $3, $3, $2
-; MIPS64EL-NEXT: nor $4, $zero, $3
+; MIPS64EL-NEXT: nor $6, $zero, $3
; MIPS64EL-NEXT: sllv $5, $5, $2
; MIPS64EL-NEXT: .LBB6_1: # %entry
; MIPS64EL-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64EL-NEXT: ll $7, 0($1)
-; MIPS64EL-NEXT: and $7, $7, $3
-; MIPS64EL-NEXT: and $5, $5, $3
-; MIPS64EL-NEXT: sltu $10, $7, $5
-; MIPS64EL-NEXT: move $8, $7
-; MIPS64EL-NEXT: movn $8, $5, $10
+; MIPS64EL-NEXT: ll $8, 0($1)
; MIPS64EL-NEXT: and $8, $8, $3
-; MIPS64EL-NEXT: and $9, $7, $4
-; MIPS64EL-NEXT: or $9, $9, $8
-; MIPS64EL-NEXT: sc $9, 0($1)
-; MIPS64EL-NEXT: beqz $9, .LBB6_1
+; MIPS64EL-NEXT: and $5, $5, $3
+; MIPS64EL-NEXT: sltu $11, $8, $5
+; MIPS64EL-NEXT: move $9, $8
+; MIPS64EL-NEXT: movn $9, $5, $11
+; MIPS64EL-NEXT: and $9, $9, $3
+; MIPS64EL-NEXT: and $10, $8, $6
+; MIPS64EL-NEXT: or $10, $10, $9
+; MIPS64EL-NEXT: sc $10, 0($1)
+; MIPS64EL-NEXT: beqz $10, .LBB6_1
; MIPS64EL-NEXT: nop
; MIPS64EL-NEXT: # %bb.2: # %entry
-; MIPS64EL-NEXT: and $6, $7, $3
-; MIPS64EL-NEXT: srlv $6, $6, $2
-; MIPS64EL-NEXT: seh $6, $6
+; MIPS64EL-NEXT: and $7, $8, $3
+; MIPS64EL-NEXT: srlv $7, $7, $2
+; MIPS64EL-NEXT: seh $7, $7
; MIPS64EL-NEXT: # %bb.3: # %entry
-; MIPS64EL-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64EL-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64EL-NEXT: # %bb.4: # %entry
; MIPS64EL-NEXT: sync
; MIPS64EL-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -2235,28 +2235,28 @@ define i16 @test_umax_16(i16* nocapture %ptr, i16 signext %val) {
; MIPS64ELR6-NEXT: sll $2, $2, 3
; MIPS64ELR6-NEXT: ori $3, $zero, 65535
; MIPS64ELR6-NEXT: sllv $3, $3, $2
-; MIPS64ELR6-NEXT: nor $4, $zero, $3
+; MIPS64ELR6-NEXT: nor $6, $zero, $3
; MIPS64ELR6-NEXT: sllv $5, $5, $2
; MIPS64ELR6-NEXT: .LBB6_1: # %entry
; MIPS64ELR6-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64ELR6-NEXT: ll $7, 0($1)
-; MIPS64ELR6-NEXT: and $7, $7, $3
-; MIPS64ELR6-NEXT: and $5, $5, $3
-; MIPS64ELR6-NEXT: sltu $10, $7, $5
-; MIPS64ELR6-NEXT: seleqz $8, $7, $10
-; MIPS64ELR6-NEXT: selnez $10, $5, $10
-; MIPS64ELR6-NEXT: or $8, $8, $10
+; MIPS64ELR6-NEXT: ll $8, 0($1)
; MIPS64ELR6-NEXT: and $8, $8, $3
-; MIPS64ELR6-NEXT: and $9, $7, $4
-; MIPS64ELR6-NEXT: or $9, $9, $8
-; MIPS64ELR6-NEXT: sc $9, 0($1)
-; MIPS64ELR6-NEXT: beqzc $9, .LBB6_1
+; MIPS64ELR6-NEXT: and $5, $5, $3
+; MIPS64ELR6-NEXT: sltu $11, $8, $5
+; MIPS64ELR6-NEXT: seleqz $9, $8, $11
+; MIPS64ELR6-NEXT: selnez $11, $5, $11
+; MIPS64ELR6-NEXT: or $9, $9, $11
+; MIPS64ELR6-NEXT: and $9, $9, $3
+; MIPS64ELR6-NEXT: and $10, $8, $6
+; MIPS64ELR6-NEXT: or $10, $10, $9
+; MIPS64ELR6-NEXT: sc $10, 0($1)
+; MIPS64ELR6-NEXT: beqzc $10, .LBB6_1
; MIPS64ELR6-NEXT: # %bb.2: # %entry
-; MIPS64ELR6-NEXT: and $6, $7, $3
-; MIPS64ELR6-NEXT: srlv $6, $6, $2
-; MIPS64ELR6-NEXT: seh $6, $6
+; MIPS64ELR6-NEXT: and $7, $8, $3
+; MIPS64ELR6-NEXT: srlv $7, $7, $2
+; MIPS64ELR6-NEXT: seh $7, $7
; MIPS64ELR6-NEXT: # %bb.3: # %entry
-; MIPS64ELR6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64ELR6-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64ELR6-NEXT: # %bb.4: # %entry
; MIPS64ELR6-NEXT: sync
; MIPS64ELR6-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -2597,26 +2597,26 @@ define i16 @test_umin_16(i16* nocapture %ptr, i16 signext %val) {
; MIPS64-NEXT: sll $2, $2, 3
; MIPS64-NEXT: ori $3, $zero, 65535
; MIPS64-NEXT: sllv $3, $3, $2
-; MIPS64-NEXT: nor $4, $zero, $3
+; MIPS64-NEXT: nor $6, $zero, $3
; MIPS64-NEXT: sllv $5, $5, $2
; MIPS64-NEXT: .LBB7_1: # %entry
; MIPS64-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64-NEXT: ll $7, 0($1)
-; MIPS64-NEXT: sltu $10, $7, $5
-; MIPS64-NEXT: move $8, $7
-; MIPS64-NEXT: movz $8, $5, $10
-; MIPS64-NEXT: and $8, $8, $3
-; MIPS64-NEXT: and $9, $7, $4
-; MIPS64-NEXT: or $9, $9, $8
-; MIPS64-NEXT: sc $9, 0($1)
-; MIPS64-NEXT: beqz $9, .LBB7_1
+; MIPS64-NEXT: ll $8, 0($1)
+; MIPS64-NEXT: sltu $11, $8, $5
+; MIPS64-NEXT: move $9, $8
+; MIPS64-NEXT: movz $9, $5, $11
+; MIPS64-NEXT: and $9, $9, $3
+; MIPS64-NEXT: and $10, $8, $6
+; MIPS64-NEXT: or $10, $10, $9
+; MIPS64-NEXT: sc $10, 0($1)
+; MIPS64-NEXT: beqz $10, .LBB7_1
; MIPS64-NEXT: nop
; MIPS64-NEXT: # %bb.2: # %entry
-; MIPS64-NEXT: and $6, $7, $3
-; MIPS64-NEXT: srlv $6, $6, $2
-; MIPS64-NEXT: seh $6, $6
+; MIPS64-NEXT: and $7, $8, $3
+; MIPS64-NEXT: srlv $7, $7, $2
+; MIPS64-NEXT: seh $7, $7
; MIPS64-NEXT: # %bb.3: # %entry
-; MIPS64-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64-NEXT: # %bb.4: # %entry
; MIPS64-NEXT: sync
; MIPS64-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -2637,26 +2637,26 @@ define i16 @test_umin_16(i16* nocapture %ptr, i16 signext %val) {
; MIPS64R6-NEXT: sll $2, $2, 3
; MIPS64R6-NEXT: ori $3, $zero, 65535
; MIPS64R6-NEXT: sllv $3, $3, $2
-; MIPS64R6-NEXT: nor $4, $zero, $3
+; MIPS64R6-NEXT: nor $6, $zero, $3
; MIPS64R6-NEXT: sllv $5, $5, $2
; MIPS64R6-NEXT: .LBB7_1: # %entry
; MIPS64R6-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64R6-NEXT: ll $7, 0($1)
-; MIPS64R6-NEXT: sltu $10, $7, $5
-; MIPS64R6-NEXT: selnez $8, $7, $10
-; MIPS64R6-NEXT: seleqz $10, $5, $10
-; MIPS64R6-NEXT: or $8, $8, $10
-; MIPS64R6-NEXT: and $8, $8, $3
-; MIPS64R6-NEXT: and $9, $7, $4
-; MIPS64R6-NEXT: or $9, $9, $8
-; MIPS64R6-NEXT: sc $9, 0($1)
-; MIPS64R6-NEXT: beqzc $9, .LBB7_1
+; MIPS64R6-NEXT: ll $8, 0($1)
+; MIPS64R6-NEXT: sltu $11, $8, $5
+; MIPS64R6-NEXT: selnez $9, $8, $11
+; MIPS64R6-NEXT: seleqz $11, $5, $11
+; MIPS64R6-NEXT: or $9, $9, $11
+; MIPS64R6-NEXT: and $9, $9, $3
+; MIPS64R6-NEXT: and $10, $8, $6
+; MIPS64R6-NEXT: or $10, $10, $9
+; MIPS64R6-NEXT: sc $10, 0($1)
+; MIPS64R6-NEXT: beqzc $10, .LBB7_1
; MIPS64R6-NEXT: # %bb.2: # %entry
-; MIPS64R6-NEXT: and $6, $7, $3
-; MIPS64R6-NEXT: srlv $6, $6, $2
-; MIPS64R6-NEXT: seh $6, $6
+; MIPS64R6-NEXT: and $7, $8, $3
+; MIPS64R6-NEXT: srlv $7, $7, $2
+; MIPS64R6-NEXT: seh $7, $7
; MIPS64R6-NEXT: # %bb.3: # %entry
-; MIPS64R6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64R6-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64R6-NEXT: # %bb.4: # %entry
; MIPS64R6-NEXT: sync
; MIPS64R6-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -2675,28 +2675,28 @@ define i16 @test_umin_16(i16* nocapture %ptr, i16 signext %val) {
; MIPS64EL-NEXT: sll $2, $2, 3
; MIPS64EL-NEXT: ori $3, $zero, 65535
; MIPS64EL-NEXT: sllv $3, $3, $2
-; MIPS64EL-NEXT: nor $4, $zero, $3
+; MIPS64EL-NEXT: nor $6, $zero, $3
; MIPS64EL-NEXT: sllv $5, $5, $2
; MIPS64EL-NEXT: .LBB7_1: # %entry
; MIPS64EL-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64EL-NEXT: ll $7, 0($1)
-; MIPS64EL-NEXT: and $7, $7, $3
-; MIPS64EL-NEXT: and $5, $5, $3
-; MIPS64EL-NEXT: sltu $10, $7, $5
-; MIPS64EL-NEXT: move $8, $7
-; MIPS64EL-NEXT: movz $8, $5, $10
+; MIPS64EL-NEXT: ll $8, 0($1)
; MIPS64EL-NEXT: and $8, $8, $3
-; MIPS64EL-NEXT: and $9, $7, $4
-; MIPS64EL-NEXT: or $9, $9, $8
-; MIPS64EL-NEXT: sc $9, 0($1)
-; MIPS64EL-NEXT: beqz $9, .LBB7_1
+; MIPS64EL-NEXT: and $5, $5, $3
+; MIPS64EL-NEXT: sltu $11, $8, $5
+; MIPS64EL-NEXT: move $9, $8
+; MIPS64EL-NEXT: movz $9, $5, $11
+; MIPS64EL-NEXT: and $9, $9, $3
+; MIPS64EL-NEXT: and $10, $8, $6
+; MIPS64EL-NEXT: or $10, $10, $9
+; MIPS64EL-NEXT: sc $10, 0($1)
+; MIPS64EL-NEXT: beqz $10, .LBB7_1
; MIPS64EL-NEXT: nop
; MIPS64EL-NEXT: # %bb.2: # %entry
-; MIPS64EL-NEXT: and $6, $7, $3
-; MIPS64EL-NEXT: srlv $6, $6, $2
-; MIPS64EL-NEXT: seh $6, $6
+; MIPS64EL-NEXT: and $7, $8, $3
+; MIPS64EL-NEXT: srlv $7, $7, $2
+; MIPS64EL-NEXT: seh $7, $7
; MIPS64EL-NEXT: # %bb.3: # %entry
-; MIPS64EL-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64EL-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64EL-NEXT: # %bb.4: # %entry
; MIPS64EL-NEXT: sync
; MIPS64EL-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -2716,28 +2716,28 @@ define i16 @test_umin_16(i16* nocapture %ptr, i16 signext %val) {
; MIPS64ELR6-NEXT: sll $2, $2, 3
; MIPS64ELR6-NEXT: ori $3, $zero, 65535
; MIPS64ELR6-NEXT: sllv $3, $3, $2
-; MIPS64ELR6-NEXT: nor $4, $zero, $3
+; MIPS64ELR6-NEXT: nor $6, $zero, $3
; MIPS64ELR6-NEXT: sllv $5, $5, $2
; MIPS64ELR6-NEXT: .LBB7_1: # %entry
; MIPS64ELR6-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64ELR6-NEXT: ll $7, 0($1)
-; MIPS64ELR6-NEXT: and $7, $7, $3
-; MIPS64ELR6-NEXT: and $5, $5, $3
-; MIPS64ELR6-NEXT: sltu $10, $7, $5
-; MIPS64ELR6-NEXT: selnez $8, $7, $10
-; MIPS64ELR6-NEXT: seleqz $10, $5, $10
-; MIPS64ELR6-NEXT: or $8, $8, $10
+; MIPS64ELR6-NEXT: ll $8, 0($1)
; MIPS64ELR6-NEXT: and $8, $8, $3
-; MIPS64ELR6-NEXT: and $9, $7, $4
-; MIPS64ELR6-NEXT: or $9, $9, $8
-; MIPS64ELR6-NEXT: sc $9, 0($1)
-; MIPS64ELR6-NEXT: beqzc $9, .LBB7_1
+; MIPS64ELR6-NEXT: and $5, $5, $3
+; MIPS64ELR6-NEXT: sltu $11, $8, $5
+; MIPS64ELR6-NEXT: selnez $9, $8, $11
+; MIPS64ELR6-NEXT: seleqz $11, $5, $11
+; MIPS64ELR6-NEXT: or $9, $9, $11
+; MIPS64ELR6-NEXT: and $9, $9, $3
+; MIPS64ELR6-NEXT: and $10, $8, $6
+; MIPS64ELR6-NEXT: or $10, $10, $9
+; MIPS64ELR6-NEXT: sc $10, 0($1)
+; MIPS64ELR6-NEXT: beqzc $10, .LBB7_1
; MIPS64ELR6-NEXT: # %bb.2: # %entry
-; MIPS64ELR6-NEXT: and $6, $7, $3
-; MIPS64ELR6-NEXT: srlv $6, $6, $2
-; MIPS64ELR6-NEXT: seh $6, $6
+; MIPS64ELR6-NEXT: and $7, $8, $3
+; MIPS64ELR6-NEXT: srlv $7, $7, $2
+; MIPS64ELR6-NEXT: seh $7, $7
; MIPS64ELR6-NEXT: # %bb.3: # %entry
-; MIPS64ELR6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64ELR6-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64ELR6-NEXT: # %bb.4: # %entry
; MIPS64ELR6-NEXT: sync
; MIPS64ELR6-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -3079,26 +3079,26 @@ define i8 @test_max_8(i8* nocapture %ptr, i8 signext %val) {
; MIPS64-NEXT: sll $2, $2, 3
; MIPS64-NEXT: ori $3, $zero, 255
; MIPS64-NEXT: sllv $3, $3, $2
-; MIPS64-NEXT: nor $4, $zero, $3
+; MIPS64-NEXT: nor $6, $zero, $3
; MIPS64-NEXT: sllv $5, $5, $2
; MIPS64-NEXT: .LBB8_1: # %entry
; MIPS64-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64-NEXT: ll $7, 0($1)
-; MIPS64-NEXT: slt $10, $7, $5
-; MIPS64-NEXT: move $8, $7
-; MIPS64-NEXT: movn $8, $5, $10
-; MIPS64-NEXT: and $8, $8, $3
-; MIPS64-NEXT: and $9, $7, $4
-; MIPS64-NEXT: or $9, $9, $8
-; MIPS64-NEXT: sc $9, 0($1)
-; MIPS64-NEXT: beqz $9, .LBB8_1
+; MIPS64-NEXT: ll $8, 0($1)
+; MIPS64-NEXT: slt $11, $8, $5
+; MIPS64-NEXT: move $9, $8
+; MIPS64-NEXT: movn $9, $5, $11
+; MIPS64-NEXT: and $9, $9, $3
+; MIPS64-NEXT: and $10, $8, $6
+; MIPS64-NEXT: or $10, $10, $9
+; MIPS64-NEXT: sc $10, 0($1)
+; MIPS64-NEXT: beqz $10, .LBB8_1
; MIPS64-NEXT: nop
; MIPS64-NEXT: # %bb.2: # %entry
-; MIPS64-NEXT: and $6, $7, $3
-; MIPS64-NEXT: srlv $6, $6, $2
-; MIPS64-NEXT: seh $6, $6
+; MIPS64-NEXT: and $7, $8, $3
+; MIPS64-NEXT: srlv $7, $7, $2
+; MIPS64-NEXT: seh $7, $7
; MIPS64-NEXT: # %bb.3: # %entry
-; MIPS64-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64-NEXT: # %bb.4: # %entry
; MIPS64-NEXT: sync
; MIPS64-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -3119,26 +3119,26 @@ define i8 @test_max_8(i8* nocapture %ptr, i8 signext %val) {
; MIPS64R6-NEXT: sll $2, $2, 3
; MIPS64R6-NEXT: ori $3, $zero, 255
; MIPS64R6-NEXT: sllv $3, $3, $2
-; MIPS64R6-NEXT: nor $4, $zero, $3
+; MIPS64R6-NEXT: nor $6, $zero, $3
; MIPS64R6-NEXT: sllv $5, $5, $2
; MIPS64R6-NEXT: .LBB8_1: # %entry
; MIPS64R6-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64R6-NEXT: ll $7, 0($1)
-; MIPS64R6-NEXT: slt $10, $7, $5
-; MIPS64R6-NEXT: seleqz $8, $7, $10
-; MIPS64R6-NEXT: selnez $10, $5, $10
-; MIPS64R6-NEXT: or $8, $8, $10
-; MIPS64R6-NEXT: and $8, $8, $3
-; MIPS64R6-NEXT: and $9, $7, $4
-; MIPS64R6-NEXT: or $9, $9, $8
-; MIPS64R6-NEXT: sc $9, 0($1)
-; MIPS64R6-NEXT: beqzc $9, .LBB8_1
+; MIPS64R6-NEXT: ll $8, 0($1)
+; MIPS64R6-NEXT: slt $11, $8, $5
+; MIPS64R6-NEXT: seleqz $9, $8, $11
+; MIPS64R6-NEXT: selnez $11, $5, $11
+; MIPS64R6-NEXT: or $9, $9, $11
+; MIPS64R6-NEXT: and $9, $9, $3
+; MIPS64R6-NEXT: and $10, $8, $6
+; MIPS64R6-NEXT: or $10, $10, $9
+; MIPS64R6-NEXT: sc $10, 0($1)
+; MIPS64R6-NEXT: beqzc $10, .LBB8_1
; MIPS64R6-NEXT: # %bb.2: # %entry
-; MIPS64R6-NEXT: and $6, $7, $3
-; MIPS64R6-NEXT: srlv $6, $6, $2
-; MIPS64R6-NEXT: seh $6, $6
+; MIPS64R6-NEXT: and $7, $8, $3
+; MIPS64R6-NEXT: srlv $7, $7, $2
+; MIPS64R6-NEXT: seh $7, $7
; MIPS64R6-NEXT: # %bb.3: # %entry
-; MIPS64R6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64R6-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64R6-NEXT: # %bb.4: # %entry
; MIPS64R6-NEXT: sync
; MIPS64R6-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -3157,28 +3157,28 @@ define i8 @test_max_8(i8* nocapture %ptr, i8 signext %val) {
; MIPS64EL-NEXT: sll $2, $2, 3
; MIPS64EL-NEXT: ori $3, $zero, 255
; MIPS64EL-NEXT: sllv $3, $3, $2
-; MIPS64EL-NEXT: nor $4, $zero, $3
+; MIPS64EL-NEXT: nor $6, $zero, $3
; MIPS64EL-NEXT: sllv $5, $5, $2
; MIPS64EL-NEXT: .LBB8_1: # %entry
; MIPS64EL-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64EL-NEXT: ll $7, 0($1)
-; MIPS64EL-NEXT: and $7, $7, $3
-; MIPS64EL-NEXT: and $5, $5, $3
-; MIPS64EL-NEXT: slt $10, $7, $5
-; MIPS64EL-NEXT: move $8, $7
-; MIPS64EL-NEXT: movn $8, $5, $10
+; MIPS64EL-NEXT: ll $8, 0($1)
; MIPS64EL-NEXT: and $8, $8, $3
-; MIPS64EL-NEXT: and $9, $7, $4
-; MIPS64EL-NEXT: or $9, $9, $8
-; MIPS64EL-NEXT: sc $9, 0($1)
-; MIPS64EL-NEXT: beqz $9, .LBB8_1
+; MIPS64EL-NEXT: and $5, $5, $3
+; MIPS64EL-NEXT: slt $11, $8, $5
+; MIPS64EL-NEXT: move $9, $8
+; MIPS64EL-NEXT: movn $9, $5, $11
+; MIPS64EL-NEXT: and $9, $9, $3
+; MIPS64EL-NEXT: and $10, $8, $6
+; MIPS64EL-NEXT: or $10, $10, $9
+; MIPS64EL-NEXT: sc $10, 0($1)
+; MIPS64EL-NEXT: beqz $10, .LBB8_1
; MIPS64EL-NEXT: nop
; MIPS64EL-NEXT: # %bb.2: # %entry
-; MIPS64EL-NEXT: and $6, $7, $3
-; MIPS64EL-NEXT: srlv $6, $6, $2
-; MIPS64EL-NEXT: seh $6, $6
+; MIPS64EL-NEXT: and $7, $8, $3
+; MIPS64EL-NEXT: srlv $7, $7, $2
+; MIPS64EL-NEXT: seh $7, $7
; MIPS64EL-NEXT: # %bb.3: # %entry
-; MIPS64EL-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64EL-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64EL-NEXT: # %bb.4: # %entry
; MIPS64EL-NEXT: sync
; MIPS64EL-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -3198,28 +3198,28 @@ define i8 @test_max_8(i8* nocapture %ptr, i8 signext %val) {
; MIPS64ELR6-NEXT: sll $2, $2, 3
; MIPS64ELR6-NEXT: ori $3, $zero, 255
; MIPS64ELR6-NEXT: sllv $3, $3, $2
-; MIPS64ELR6-NEXT: nor $4, $zero, $3
+; MIPS64ELR6-NEXT: nor $6, $zero, $3
; MIPS64ELR6-NEXT: sllv $5, $5, $2
; MIPS64ELR6-NEXT: .LBB8_1: # %entry
; MIPS64ELR6-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64ELR6-NEXT: ll $7, 0($1)
-; MIPS64ELR6-NEXT: and $7, $7, $3
-; MIPS64ELR6-NEXT: and $5, $5, $3
-; MIPS64ELR6-NEXT: slt $10, $7, $5
-; MIPS64ELR6-NEXT: seleqz $8, $7, $10
-; MIPS64ELR6-NEXT: selnez $10, $5, $10
-; MIPS64ELR6-NEXT: or $8, $8, $10
+; MIPS64ELR6-NEXT: ll $8, 0($1)
; MIPS64ELR6-NEXT: and $8, $8, $3
-; MIPS64ELR6-NEXT: and $9, $7, $4
-; MIPS64ELR6-NEXT: or $9, $9, $8
-; MIPS64ELR6-NEXT: sc $9, 0($1)
-; MIPS64ELR6-NEXT: beqzc $9, .LBB8_1
+; MIPS64ELR6-NEXT: and $5, $5, $3
+; MIPS64ELR6-NEXT: slt $11, $8, $5
+; MIPS64ELR6-NEXT: seleqz $9, $8, $11
+; MIPS64ELR6-NEXT: selnez $11, $5, $11
+; MIPS64ELR6-NEXT: or $9, $9, $11
+; MIPS64ELR6-NEXT: and $9, $9, $3
+; MIPS64ELR6-NEXT: and $10, $8, $6
+; MIPS64ELR6-NEXT: or $10, $10, $9
+; MIPS64ELR6-NEXT: sc $10, 0($1)
+; MIPS64ELR6-NEXT: beqzc $10, .LBB8_1
; MIPS64ELR6-NEXT: # %bb.2: # %entry
-; MIPS64ELR6-NEXT: and $6, $7, $3
-; MIPS64ELR6-NEXT: srlv $6, $6, $2
-; MIPS64ELR6-NEXT: seh $6, $6
+; MIPS64ELR6-NEXT: and $7, $8, $3
+; MIPS64ELR6-NEXT: srlv $7, $7, $2
+; MIPS64ELR6-NEXT: seh $7, $7
; MIPS64ELR6-NEXT: # %bb.3: # %entry
-; MIPS64ELR6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64ELR6-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64ELR6-NEXT: # %bb.4: # %entry
; MIPS64ELR6-NEXT: sync
; MIPS64ELR6-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -3560,26 +3560,26 @@ define i8 @test_min_8(i8* nocapture %ptr, i8 signext %val) {
; MIPS64-NEXT: sll $2, $2, 3
; MIPS64-NEXT: ori $3, $zero, 255
; MIPS64-NEXT: sllv $3, $3, $2
-; MIPS64-NEXT: nor $4, $zero, $3
+; MIPS64-NEXT: nor $6, $zero, $3
; MIPS64-NEXT: sllv $5, $5, $2
; MIPS64-NEXT: .LBB9_1: # %entry
; MIPS64-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64-NEXT: ll $7, 0($1)
-; MIPS64-NEXT: slt $10, $7, $5
-; MIPS64-NEXT: move $8, $7
-; MIPS64-NEXT: movz $8, $5, $10
-; MIPS64-NEXT: and $8, $8, $3
-; MIPS64-NEXT: and $9, $7, $4
-; MIPS64-NEXT: or $9, $9, $8
-; MIPS64-NEXT: sc $9, 0($1)
-; MIPS64-NEXT: beqz $9, .LBB9_1
+; MIPS64-NEXT: ll $8, 0($1)
+; MIPS64-NEXT: slt $11, $8, $5
+; MIPS64-NEXT: move $9, $8
+; MIPS64-NEXT: movz $9, $5, $11
+; MIPS64-NEXT: and $9, $9, $3
+; MIPS64-NEXT: and $10, $8, $6
+; MIPS64-NEXT: or $10, $10, $9
+; MIPS64-NEXT: sc $10, 0($1)
+; MIPS64-NEXT: beqz $10, .LBB9_1
; MIPS64-NEXT: nop
; MIPS64-NEXT: # %bb.2: # %entry
-; MIPS64-NEXT: and $6, $7, $3
-; MIPS64-NEXT: srlv $6, $6, $2
-; MIPS64-NEXT: seh $6, $6
+; MIPS64-NEXT: and $7, $8, $3
+; MIPS64-NEXT: srlv $7, $7, $2
+; MIPS64-NEXT: seh $7, $7
; MIPS64-NEXT: # %bb.3: # %entry
-; MIPS64-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64-NEXT: # %bb.4: # %entry
; MIPS64-NEXT: sync
; MIPS64-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -3600,26 +3600,26 @@ define i8 @test_min_8(i8* nocapture %ptr, i8 signext %val) {
; MIPS64R6-NEXT: sll $2, $2, 3
; MIPS64R6-NEXT: ori $3, $zero, 255
; MIPS64R6-NEXT: sllv $3, $3, $2
-; MIPS64R6-NEXT: nor $4, $zero, $3
+; MIPS64R6-NEXT: nor $6, $zero, $3
; MIPS64R6-NEXT: sllv $5, $5, $2
; MIPS64R6-NEXT: .LBB9_1: # %entry
; MIPS64R6-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64R6-NEXT: ll $7, 0($1)
-; MIPS64R6-NEXT: slt $10, $7, $5
-; MIPS64R6-NEXT: selnez $8, $7, $10
-; MIPS64R6-NEXT: seleqz $10, $5, $10
-; MIPS64R6-NEXT: or $8, $8, $10
-; MIPS64R6-NEXT: and $8, $8, $3
-; MIPS64R6-NEXT: and $9, $7, $4
-; MIPS64R6-NEXT: or $9, $9, $8
-; MIPS64R6-NEXT: sc $9, 0($1)
-; MIPS64R6-NEXT: beqzc $9, .LBB9_1
+; MIPS64R6-NEXT: ll $8, 0($1)
+; MIPS64R6-NEXT: slt $11, $8, $5
+; MIPS64R6-NEXT: selnez $9, $8, $11
+; MIPS64R6-NEXT: seleqz $11, $5, $11
+; MIPS64R6-NEXT: or $9, $9, $11
+; MIPS64R6-NEXT: and $9, $9, $3
+; MIPS64R6-NEXT: and $10, $8, $6
+; MIPS64R6-NEXT: or $10, $10, $9
+; MIPS64R6-NEXT: sc $10, 0($1)
+; MIPS64R6-NEXT: beqzc $10, .LBB9_1
; MIPS64R6-NEXT: # %bb.2: # %entry
-; MIPS64R6-NEXT: and $6, $7, $3
-; MIPS64R6-NEXT: srlv $6, $6, $2
-; MIPS64R6-NEXT: seh $6, $6
+; MIPS64R6-NEXT: and $7, $8, $3
+; MIPS64R6-NEXT: srlv $7, $7, $2
+; MIPS64R6-NEXT: seh $7, $7
; MIPS64R6-NEXT: # %bb.3: # %entry
-; MIPS64R6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64R6-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64R6-NEXT: # %bb.4: # %entry
; MIPS64R6-NEXT: sync
; MIPS64R6-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -3638,28 +3638,28 @@ define i8 @test_min_8(i8* nocapture %ptr, i8 signext %val) {
; MIPS64EL-NEXT: sll $2, $2, 3
; MIPS64EL-NEXT: ori $3, $zero, 255
; MIPS64EL-NEXT: sllv $3, $3, $2
-; MIPS64EL-NEXT: nor $4, $zero, $3
+; MIPS64EL-NEXT: nor $6, $zero, $3
; MIPS64EL-NEXT: sllv $5, $5, $2
; MIPS64EL-NEXT: .LBB9_1: # %entry
; MIPS64EL-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64EL-NEXT: ll $7, 0($1)
-; MIPS64EL-NEXT: and $7, $7, $3
-; MIPS64EL-NEXT: and $5, $5, $3
-; MIPS64EL-NEXT: slt $10, $7, $5
-; MIPS64EL-NEXT: move $8, $7
-; MIPS64EL-NEXT: movz $8, $5, $10
+; MIPS64EL-NEXT: ll $8, 0($1)
; MIPS64EL-NEXT: and $8, $8, $3
-; MIPS64EL-NEXT: and $9, $7, $4
-; MIPS64EL-NEXT: or $9, $9, $8
-; MIPS64EL-NEXT: sc $9, 0($1)
-; MIPS64EL-NEXT: beqz $9, .LBB9_1
+; MIPS64EL-NEXT: and $5, $5, $3
+; MIPS64EL-NEXT: slt $11, $8, $5
+; MIPS64EL-NEXT: move $9, $8
+; MIPS64EL-NEXT: movz $9, $5, $11
+; MIPS64EL-NEXT: and $9, $9, $3
+; MIPS64EL-NEXT: and $10, $8, $6
+; MIPS64EL-NEXT: or $10, $10, $9
+; MIPS64EL-NEXT: sc $10, 0($1)
+; MIPS64EL-NEXT: beqz $10, .LBB9_1
; MIPS64EL-NEXT: nop
; MIPS64EL-NEXT: # %bb.2: # %entry
-; MIPS64EL-NEXT: and $6, $7, $3
-; MIPS64EL-NEXT: srlv $6, $6, $2
-; MIPS64EL-NEXT: seh $6, $6
+; MIPS64EL-NEXT: and $7, $8, $3
+; MIPS64EL-NEXT: srlv $7, $7, $2
+; MIPS64EL-NEXT: seh $7, $7
; MIPS64EL-NEXT: # %bb.3: # %entry
-; MIPS64EL-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64EL-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64EL-NEXT: # %bb.4: # %entry
; MIPS64EL-NEXT: sync
; MIPS64EL-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -3679,28 +3679,28 @@ define i8 @test_min_8(i8* nocapture %ptr, i8 signext %val) {
; MIPS64ELR6-NEXT: sll $2, $2, 3
; MIPS64ELR6-NEXT: ori $3, $zero, 255
; MIPS64ELR6-NEXT: sllv $3, $3, $2
-; MIPS64ELR6-NEXT: nor $4, $zero, $3
+; MIPS64ELR6-NEXT: nor $6, $zero, $3
; MIPS64ELR6-NEXT: sllv $5, $5, $2
; MIPS64ELR6-NEXT: .LBB9_1: # %entry
; MIPS64ELR6-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64ELR6-NEXT: ll $7, 0($1)
-; MIPS64ELR6-NEXT: and $7, $7, $3
-; MIPS64ELR6-NEXT: and $5, $5, $3
-; MIPS64ELR6-NEXT: slt $10, $7, $5
-; MIPS64ELR6-NEXT: selnez $8, $7, $10
-; MIPS64ELR6-NEXT: seleqz $10, $5, $10
-; MIPS64ELR6-NEXT: or $8, $8, $10
+; MIPS64ELR6-NEXT: ll $8, 0($1)
; MIPS64ELR6-NEXT: and $8, $8, $3
-; MIPS64ELR6-NEXT: and $9, $7, $4
-; MIPS64ELR6-NEXT: or $9, $9, $8
-; MIPS64ELR6-NEXT: sc $9, 0($1)
-; MIPS64ELR6-NEXT: beqzc $9, .LBB9_1
+; MIPS64ELR6-NEXT: and $5, $5, $3
+; MIPS64ELR6-NEXT: slt $11, $8, $5
+; MIPS64ELR6-NEXT: selnez $9, $8, $11
+; MIPS64ELR6-NEXT: seleqz $11, $5, $11
+; MIPS64ELR6-NEXT: or $9, $9, $11
+; MIPS64ELR6-NEXT: and $9, $9, $3
+; MIPS64ELR6-NEXT: and $10, $8, $6
+; MIPS64ELR6-NEXT: or $10, $10, $9
+; MIPS64ELR6-NEXT: sc $10, 0($1)
+; MIPS64ELR6-NEXT: beqzc $10, .LBB9_1
; MIPS64ELR6-NEXT: # %bb.2: # %entry
-; MIPS64ELR6-NEXT: and $6, $7, $3
-; MIPS64ELR6-NEXT: srlv $6, $6, $2
-; MIPS64ELR6-NEXT: seh $6, $6
+; MIPS64ELR6-NEXT: and $7, $8, $3
+; MIPS64ELR6-NEXT: srlv $7, $7, $2
+; MIPS64ELR6-NEXT: seh $7, $7
; MIPS64ELR6-NEXT: # %bb.3: # %entry
-; MIPS64ELR6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64ELR6-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64ELR6-NEXT: # %bb.4: # %entry
; MIPS64ELR6-NEXT: sync
; MIPS64ELR6-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -4041,26 +4041,26 @@ define i8 @test_umax_8(i8* nocapture %ptr, i8 signext %val) {
; MIPS64-NEXT: sll $2, $2, 3
; MIPS64-NEXT: ori $3, $zero, 255
; MIPS64-NEXT: sllv $3, $3, $2
-; MIPS64-NEXT: nor $4, $zero, $3
+; MIPS64-NEXT: nor $6, $zero, $3
; MIPS64-NEXT: sllv $5, $5, $2
; MIPS64-NEXT: .LBB10_1: # %entry
; MIPS64-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64-NEXT: ll $7, 0($1)
-; MIPS64-NEXT: sltu $10, $7, $5
-; MIPS64-NEXT: move $8, $7
-; MIPS64-NEXT: movn $8, $5, $10
-; MIPS64-NEXT: and $8, $8, $3
-; MIPS64-NEXT: and $9, $7, $4
-; MIPS64-NEXT: or $9, $9, $8
-; MIPS64-NEXT: sc $9, 0($1)
-; MIPS64-NEXT: beqz $9, .LBB10_1
+; MIPS64-NEXT: ll $8, 0($1)
+; MIPS64-NEXT: sltu $11, $8, $5
+; MIPS64-NEXT: move $9, $8
+; MIPS64-NEXT: movn $9, $5, $11
+; MIPS64-NEXT: and $9, $9, $3
+; MIPS64-NEXT: and $10, $8, $6
+; MIPS64-NEXT: or $10, $10, $9
+; MIPS64-NEXT: sc $10, 0($1)
+; MIPS64-NEXT: beqz $10, .LBB10_1
; MIPS64-NEXT: nop
; MIPS64-NEXT: # %bb.2: # %entry
-; MIPS64-NEXT: and $6, $7, $3
-; MIPS64-NEXT: srlv $6, $6, $2
-; MIPS64-NEXT: seh $6, $6
+; MIPS64-NEXT: and $7, $8, $3
+; MIPS64-NEXT: srlv $7, $7, $2
+; MIPS64-NEXT: seh $7, $7
; MIPS64-NEXT: # %bb.3: # %entry
-; MIPS64-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64-NEXT: # %bb.4: # %entry
; MIPS64-NEXT: sync
; MIPS64-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -4081,26 +4081,26 @@ define i8 @test_umax_8(i8* nocapture %ptr, i8 signext %val) {
; MIPS64R6-NEXT: sll $2, $2, 3
; MIPS64R6-NEXT: ori $3, $zero, 255
; MIPS64R6-NEXT: sllv $3, $3, $2
-; MIPS64R6-NEXT: nor $4, $zero, $3
+; MIPS64R6-NEXT: nor $6, $zero, $3
; MIPS64R6-NEXT: sllv $5, $5, $2
; MIPS64R6-NEXT: .LBB10_1: # %entry
; MIPS64R6-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64R6-NEXT: ll $7, 0($1)
-; MIPS64R6-NEXT: sltu $10, $7, $5
-; MIPS64R6-NEXT: seleqz $8, $7, $10
-; MIPS64R6-NEXT: selnez $10, $5, $10
-; MIPS64R6-NEXT: or $8, $8, $10
-; MIPS64R6-NEXT: and $8, $8, $3
-; MIPS64R6-NEXT: and $9, $7, $4
-; MIPS64R6-NEXT: or $9, $9, $8
-; MIPS64R6-NEXT: sc $9, 0($1)
-; MIPS64R6-NEXT: beqzc $9, .LBB10_1
+; MIPS64R6-NEXT: ll $8, 0($1)
+; MIPS64R6-NEXT: sltu $11, $8, $5
+; MIPS64R6-NEXT: seleqz $9, $8, $11
+; MIPS64R6-NEXT: selnez $11, $5, $11
+; MIPS64R6-NEXT: or $9, $9, $11
+; MIPS64R6-NEXT: and $9, $9, $3
+; MIPS64R6-NEXT: and $10, $8, $6
+; MIPS64R6-NEXT: or $10, $10, $9
+; MIPS64R6-NEXT: sc $10, 0($1)
+; MIPS64R6-NEXT: beqzc $10, .LBB10_1
; MIPS64R6-NEXT: # %bb.2: # %entry
-; MIPS64R6-NEXT: and $6, $7, $3
-; MIPS64R6-NEXT: srlv $6, $6, $2
-; MIPS64R6-NEXT: seh $6, $6
+; MIPS64R6-NEXT: and $7, $8, $3
+; MIPS64R6-NEXT: srlv $7, $7, $2
+; MIPS64R6-NEXT: seh $7, $7
; MIPS64R6-NEXT: # %bb.3: # %entry
-; MIPS64R6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64R6-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64R6-NEXT: # %bb.4: # %entry
; MIPS64R6-NEXT: sync
; MIPS64R6-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -4119,28 +4119,28 @@ define i8 @test_umax_8(i8* nocapture %ptr, i8 signext %val) {
; MIPS64EL-NEXT: sll $2, $2, 3
; MIPS64EL-NEXT: ori $3, $zero, 255
; MIPS64EL-NEXT: sllv $3, $3, $2
-; MIPS64EL-NEXT: nor $4, $zero, $3
+; MIPS64EL-NEXT: nor $6, $zero, $3
; MIPS64EL-NEXT: sllv $5, $5, $2
; MIPS64EL-NEXT: .LBB10_1: # %entry
; MIPS64EL-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64EL-NEXT: ll $7, 0($1)
-; MIPS64EL-NEXT: and $7, $7, $3
-; MIPS64EL-NEXT: and $5, $5, $3
-; MIPS64EL-NEXT: sltu $10, $7, $5
-; MIPS64EL-NEXT: move $8, $7
-; MIPS64EL-NEXT: movn $8, $5, $10
+; MIPS64EL-NEXT: ll $8, 0($1)
; MIPS64EL-NEXT: and $8, $8, $3
-; MIPS64EL-NEXT: and $9, $7, $4
-; MIPS64EL-NEXT: or $9, $9, $8
-; MIPS64EL-NEXT: sc $9, 0($1)
-; MIPS64EL-NEXT: beqz $9, .LBB10_1
+; MIPS64EL-NEXT: and $5, $5, $3
+; MIPS64EL-NEXT: sltu $11, $8, $5
+; MIPS64EL-NEXT: move $9, $8
+; MIPS64EL-NEXT: movn $9, $5, $11
+; MIPS64EL-NEXT: and $9, $9, $3
+; MIPS64EL-NEXT: and $10, $8, $6
+; MIPS64EL-NEXT: or $10, $10, $9
+; MIPS64EL-NEXT: sc $10, 0($1)
+; MIPS64EL-NEXT: beqz $10, .LBB10_1
; MIPS64EL-NEXT: nop
; MIPS64EL-NEXT: # %bb.2: # %entry
-; MIPS64EL-NEXT: and $6, $7, $3
-; MIPS64EL-NEXT: srlv $6, $6, $2
-; MIPS64EL-NEXT: seh $6, $6
+; MIPS64EL-NEXT: and $7, $8, $3
+; MIPS64EL-NEXT: srlv $7, $7, $2
+; MIPS64EL-NEXT: seh $7, $7
; MIPS64EL-NEXT: # %bb.3: # %entry
-; MIPS64EL-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64EL-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64EL-NEXT: # %bb.4: # %entry
; MIPS64EL-NEXT: sync
; MIPS64EL-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -4160,28 +4160,28 @@ define i8 @test_umax_8(i8* nocapture %ptr, i8 signext %val) {
; MIPS64ELR6-NEXT: sll $2, $2, 3
; MIPS64ELR6-NEXT: ori $3, $zero, 255
; MIPS64ELR6-NEXT: sllv $3, $3, $2
-; MIPS64ELR6-NEXT: nor $4, $zero, $3
+; MIPS64ELR6-NEXT: nor $6, $zero, $3
; MIPS64ELR6-NEXT: sllv $5, $5, $2
; MIPS64ELR6-NEXT: .LBB10_1: # %entry
; MIPS64ELR6-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64ELR6-NEXT: ll $7, 0($1)
-; MIPS64ELR6-NEXT: and $7, $7, $3
-; MIPS64ELR6-NEXT: and $5, $5, $3
-; MIPS64ELR6-NEXT: sltu $10, $7, $5
-; MIPS64ELR6-NEXT: seleqz $8, $7, $10
-; MIPS64ELR6-NEXT: selnez $10, $5, $10
-; MIPS64ELR6-NEXT: or $8, $8, $10
+; MIPS64ELR6-NEXT: ll $8, 0($1)
; MIPS64ELR6-NEXT: and $8, $8, $3
-; MIPS64ELR6-NEXT: and $9, $7, $4
-; MIPS64ELR6-NEXT: or $9, $9, $8
-; MIPS64ELR6-NEXT: sc $9, 0($1)
-; MIPS64ELR6-NEXT: beqzc $9, .LBB10_1
+; MIPS64ELR6-NEXT: and $5, $5, $3
+; MIPS64ELR6-NEXT: sltu $11, $8, $5
+; MIPS64ELR6-NEXT: seleqz $9, $8, $11
+; MIPS64ELR6-NEXT: selnez $11, $5, $11
+; MIPS64ELR6-NEXT: or $9, $9, $11
+; MIPS64ELR6-NEXT: and $9, $9, $3
+; MIPS64ELR6-NEXT: and $10, $8, $6
+; MIPS64ELR6-NEXT: or $10, $10, $9
+; MIPS64ELR6-NEXT: sc $10, 0($1)
+; MIPS64ELR6-NEXT: beqzc $10, .LBB10_1
; MIPS64ELR6-NEXT: # %bb.2: # %entry
-; MIPS64ELR6-NEXT: and $6, $7, $3
-; MIPS64ELR6-NEXT: srlv $6, $6, $2
-; MIPS64ELR6-NEXT: seh $6, $6
+; MIPS64ELR6-NEXT: and $7, $8, $3
+; MIPS64ELR6-NEXT: srlv $7, $7, $2
+; MIPS64ELR6-NEXT: seh $7, $7
; MIPS64ELR6-NEXT: # %bb.3: # %entry
-; MIPS64ELR6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64ELR6-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64ELR6-NEXT: # %bb.4: # %entry
; MIPS64ELR6-NEXT: sync
; MIPS64ELR6-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -4522,26 +4522,26 @@ define i8 @test_umin_8(i8* nocapture %ptr, i8 signext %val) {
; MIPS64-NEXT: sll $2, $2, 3
; MIPS64-NEXT: ori $3, $zero, 255
; MIPS64-NEXT: sllv $3, $3, $2
-; MIPS64-NEXT: nor $4, $zero, $3
+; MIPS64-NEXT: nor $6, $zero, $3
; MIPS64-NEXT: sllv $5, $5, $2
; MIPS64-NEXT: .LBB11_1: # %entry
; MIPS64-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64-NEXT: ll $7, 0($1)
-; MIPS64-NEXT: sltu $10, $7, $5
-; MIPS64-NEXT: move $8, $7
-; MIPS64-NEXT: movz $8, $5, $10
-; MIPS64-NEXT: and $8, $8, $3
-; MIPS64-NEXT: and $9, $7, $4
-; MIPS64-NEXT: or $9, $9, $8
-; MIPS64-NEXT: sc $9, 0($1)
-; MIPS64-NEXT: beqz $9, .LBB11_1
+; MIPS64-NEXT: ll $8, 0($1)
+; MIPS64-NEXT: sltu $11, $8, $5
+; MIPS64-NEXT: move $9, $8
+; MIPS64-NEXT: movz $9, $5, $11
+; MIPS64-NEXT: and $9, $9, $3
+; MIPS64-NEXT: and $10, $8, $6
+; MIPS64-NEXT: or $10, $10, $9
+; MIPS64-NEXT: sc $10, 0($1)
+; MIPS64-NEXT: beqz $10, .LBB11_1
; MIPS64-NEXT: nop
; MIPS64-NEXT: # %bb.2: # %entry
-; MIPS64-NEXT: and $6, $7, $3
-; MIPS64-NEXT: srlv $6, $6, $2
-; MIPS64-NEXT: seh $6, $6
+; MIPS64-NEXT: and $7, $8, $3
+; MIPS64-NEXT: srlv $7, $7, $2
+; MIPS64-NEXT: seh $7, $7
; MIPS64-NEXT: # %bb.3: # %entry
-; MIPS64-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64-NEXT: # %bb.4: # %entry
; MIPS64-NEXT: sync
; MIPS64-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -4562,26 +4562,26 @@ define i8 @test_umin_8(i8* nocapture %ptr, i8 signext %val) {
; MIPS64R6-NEXT: sll $2, $2, 3
; MIPS64R6-NEXT: ori $3, $zero, 255
; MIPS64R6-NEXT: sllv $3, $3, $2
-; MIPS64R6-NEXT: nor $4, $zero, $3
+; MIPS64R6-NEXT: nor $6, $zero, $3
; MIPS64R6-NEXT: sllv $5, $5, $2
; MIPS64R6-NEXT: .LBB11_1: # %entry
; MIPS64R6-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64R6-NEXT: ll $7, 0($1)
-; MIPS64R6-NEXT: sltu $10, $7, $5
-; MIPS64R6-NEXT: selnez $8, $7, $10
-; MIPS64R6-NEXT: seleqz $10, $5, $10
-; MIPS64R6-NEXT: or $8, $8, $10
-; MIPS64R6-NEXT: and $8, $8, $3
-; MIPS64R6-NEXT: and $9, $7, $4
-; MIPS64R6-NEXT: or $9, $9, $8
-; MIPS64R6-NEXT: sc $9, 0($1)
-; MIPS64R6-NEXT: beqzc $9, .LBB11_1
+; MIPS64R6-NEXT: ll $8, 0($1)
+; MIPS64R6-NEXT: sltu $11, $8, $5
+; MIPS64R6-NEXT: selnez $9, $8, $11
+; MIPS64R6-NEXT: seleqz $11, $5, $11
+; MIPS64R6-NEXT: or $9, $9, $11
+; MIPS64R6-NEXT: and $9, $9, $3
+; MIPS64R6-NEXT: and $10, $8, $6
+; MIPS64R6-NEXT: or $10, $10, $9
+; MIPS64R6-NEXT: sc $10, 0($1)
+; MIPS64R6-NEXT: beqzc $10, .LBB11_1
; MIPS64R6-NEXT: # %bb.2: # %entry
-; MIPS64R6-NEXT: and $6, $7, $3
-; MIPS64R6-NEXT: srlv $6, $6, $2
-; MIPS64R6-NEXT: seh $6, $6
+; MIPS64R6-NEXT: and $7, $8, $3
+; MIPS64R6-NEXT: srlv $7, $7, $2
+; MIPS64R6-NEXT: seh $7, $7
; MIPS64R6-NEXT: # %bb.3: # %entry
-; MIPS64R6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64R6-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64R6-NEXT: # %bb.4: # %entry
; MIPS64R6-NEXT: sync
; MIPS64R6-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -4600,28 +4600,28 @@ define i8 @test_umin_8(i8* nocapture %ptr, i8 signext %val) {
; MIPS64EL-NEXT: sll $2, $2, 3
; MIPS64EL-NEXT: ori $3, $zero, 255
; MIPS64EL-NEXT: sllv $3, $3, $2
-; MIPS64EL-NEXT: nor $4, $zero, $3
+; MIPS64EL-NEXT: nor $6, $zero, $3
; MIPS64EL-NEXT: sllv $5, $5, $2
; MIPS64EL-NEXT: .LBB11_1: # %entry
; MIPS64EL-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64EL-NEXT: ll $7, 0($1)
-; MIPS64EL-NEXT: and $7, $7, $3
-; MIPS64EL-NEXT: and $5, $5, $3
-; MIPS64EL-NEXT: sltu $10, $7, $5
-; MIPS64EL-NEXT: move $8, $7
-; MIPS64EL-NEXT: movz $8, $5, $10
+; MIPS64EL-NEXT: ll $8, 0($1)
; MIPS64EL-NEXT: and $8, $8, $3
-; MIPS64EL-NEXT: and $9, $7, $4
-; MIPS64EL-NEXT: or $9, $9, $8
-; MIPS64EL-NEXT: sc $9, 0($1)
-; MIPS64EL-NEXT: beqz $9, .LBB11_1
+; MIPS64EL-NEXT: and $5, $5, $3
+; MIPS64EL-NEXT: sltu $11, $8, $5
+; MIPS64EL-NEXT: move $9, $8
+; MIPS64EL-NEXT: movz $9, $5, $11
+; MIPS64EL-NEXT: and $9, $9, $3
+; MIPS64EL-NEXT: and $10, $8, $6
+; MIPS64EL-NEXT: or $10, $10, $9
+; MIPS64EL-NEXT: sc $10, 0($1)
+; MIPS64EL-NEXT: beqz $10, .LBB11_1
; MIPS64EL-NEXT: nop
; MIPS64EL-NEXT: # %bb.2: # %entry
-; MIPS64EL-NEXT: and $6, $7, $3
-; MIPS64EL-NEXT: srlv $6, $6, $2
-; MIPS64EL-NEXT: seh $6, $6
+; MIPS64EL-NEXT: and $7, $8, $3
+; MIPS64EL-NEXT: srlv $7, $7, $2
+; MIPS64EL-NEXT: seh $7, $7
; MIPS64EL-NEXT: # %bb.3: # %entry
-; MIPS64EL-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64EL-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64EL-NEXT: # %bb.4: # %entry
; MIPS64EL-NEXT: sync
; MIPS64EL-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -4641,28 +4641,28 @@ define i8 @test_umin_8(i8* nocapture %ptr, i8 signext %val) {
; MIPS64ELR6-NEXT: sll $2, $2, 3
; MIPS64ELR6-NEXT: ori $3, $zero, 255
; MIPS64ELR6-NEXT: sllv $3, $3, $2
-; MIPS64ELR6-NEXT: nor $4, $zero, $3
+; MIPS64ELR6-NEXT: nor $6, $zero, $3
; MIPS64ELR6-NEXT: sllv $5, $5, $2
; MIPS64ELR6-NEXT: .LBB11_1: # %entry
; MIPS64ELR6-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64ELR6-NEXT: ll $7, 0($1)
-; MIPS64ELR6-NEXT: and $7, $7, $3
-; MIPS64ELR6-NEXT: and $5, $5, $3
-; MIPS64ELR6-NEXT: sltu $10, $7, $5
-; MIPS64ELR6-NEXT: selnez $8, $7, $10
-; MIPS64ELR6-NEXT: seleqz $10, $5, $10
-; MIPS64ELR6-NEXT: or $8, $8, $10
+; MIPS64ELR6-NEXT: ll $8, 0($1)
; MIPS64ELR6-NEXT: and $8, $8, $3
-; MIPS64ELR6-NEXT: and $9, $7, $4
-; MIPS64ELR6-NEXT: or $9, $9, $8
-; MIPS64ELR6-NEXT: sc $9, 0($1)
-; MIPS64ELR6-NEXT: beqzc $9, .LBB11_1
+; MIPS64ELR6-NEXT: and $5, $5, $3
+; MIPS64ELR6-NEXT: sltu $11, $8, $5
+; MIPS64ELR6-NEXT: selnez $9, $8, $11
+; MIPS64ELR6-NEXT: seleqz $11, $5, $11
+; MIPS64ELR6-NEXT: or $9, $9, $11
+; MIPS64ELR6-NEXT: and $9, $9, $3
+; MIPS64ELR6-NEXT: and $10, $8, $6
+; MIPS64ELR6-NEXT: or $10, $10, $9
+; MIPS64ELR6-NEXT: sc $10, 0($1)
+; MIPS64ELR6-NEXT: beqzc $10, .LBB11_1
; MIPS64ELR6-NEXT: # %bb.2: # %entry
-; MIPS64ELR6-NEXT: and $6, $7, $3
-; MIPS64ELR6-NEXT: srlv $6, $6, $2
-; MIPS64ELR6-NEXT: seh $6, $6
+; MIPS64ELR6-NEXT: and $7, $8, $3
+; MIPS64ELR6-NEXT: srlv $7, $7, $2
+; MIPS64ELR6-NEXT: seh $7, $7
; MIPS64ELR6-NEXT: # %bb.3: # %entry
-; MIPS64ELR6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64ELR6-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64ELR6-NEXT: # %bb.4: # %entry
; MIPS64ELR6-NEXT: sync
; MIPS64ELR6-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
diff --git a/llvm/test/CodeGen/Mips/atomic.ll b/llvm/test/CodeGen/Mips/atomic.ll
index 59ff83e4969cc..3846fda47b138 100644
--- a/llvm/test/CodeGen/Mips/atomic.ll
+++ b/llvm/test/CodeGen/Mips/atomic.ll
@@ -2559,28 +2559,28 @@ define signext i8 @AtomicLoadAdd8(i8 signext %incr) nounwind {
; MIPS64R6O0-NEXT: ld $1, %got_disp(y)($1)
; MIPS64R6O0-NEXT: daddiu $2, $zero, -4
; MIPS64R6O0-NEXT: and $2, $1, $2
-; MIPS64R6O0-NEXT: andi $1, $1, 3
-; MIPS64R6O0-NEXT: xori $1, $1, 3
-; MIPS64R6O0-NEXT: sll $1, $1, 3
-; MIPS64R6O0-NEXT: ori $3, $zero, 255
-; MIPS64R6O0-NEXT: sllv $3, $3, $1
-; MIPS64R6O0-NEXT: nor $5, $zero, $3
-; MIPS64R6O0-NEXT: sllv $4, $4, $1
+; MIPS64R6O0-NEXT: andi $3, $1, 3
+; MIPS64R6O0-NEXT: xori $3, $3, 3
+; MIPS64R6O0-NEXT: sll $3, $3, 3
+; MIPS64R6O0-NEXT: ori $5, $zero, 255
+; MIPS64R6O0-NEXT: sllv $5, $5, $3
+; MIPS64R6O0-NEXT: nor $6, $zero, $5
+; MIPS64R6O0-NEXT: sllv $4, $4, $3
; MIPS64R6O0-NEXT: .LBB8_1: # %entry
; MIPS64R6O0-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64R6O0-NEXT: ll $7, 0($2)
-; MIPS64R6O0-NEXT: addu $8, $7, $4
-; MIPS64R6O0-NEXT: and $8, $8, $3
-; MIPS64R6O0-NEXT: and $9, $7, $5
-; MIPS64R6O0-NEXT: or $9, $9, $8
-; MIPS64R6O0-NEXT: sc $9, 0($2)
-; MIPS64R6O0-NEXT: beqzc $9, .LBB8_1
+; MIPS64R6O0-NEXT: ll $8, 0($2)
+; MIPS64R6O0-NEXT: addu $9, $8, $4
+; MIPS64R6O0-NEXT: and $9, $9, $5
+; MIPS64R6O0-NEXT: and $10, $8, $6
+; MIPS64R6O0-NEXT: or $10, $10, $9
+; MIPS64R6O0-NEXT: sc $10, 0($2)
+; MIPS64R6O0-NEXT: beqzc $10, .LBB8_1
; MIPS64R6O0-NEXT: # %bb.2: # %entry
-; MIPS64R6O0-NEXT: and $6, $7, $3
-; MIPS64R6O0-NEXT: srlv $6, $6, $1
-; MIPS64R6O0-NEXT: seb $6, $6
+; MIPS64R6O0-NEXT: and $7, $8, $5
+; MIPS64R6O0-NEXT: srlv $7, $7, $3
+; MIPS64R6O0-NEXT: seb $7, $7
; MIPS64R6O0-NEXT: # %bb.3: # %entry
-; MIPS64R6O0-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64R6O0-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64R6O0-NEXT: # %bb.4: # %entry
; MIPS64R6O0-NEXT: lw $1, 12($sp) # 4-byte Folded Reload
; MIPS64R6O0-NEXT: seb $2, $1
@@ -3075,28 +3075,28 @@ define signext i8 @AtomicLoadSub8(i8 signext %incr) nounwind {
; MIPS64R6O0-NEXT: ld $1, %got_disp(y)($1)
; MIPS64R6O0-NEXT: daddiu $2, $zero, -4
; MIPS64R6O0-NEXT: and $2, $1, $2
-; MIPS64R6O0-NEXT: andi $1, $1, 3
-; MIPS64R6O0-NEXT: xori $1, $1, 3
-; MIPS64R6O0-NEXT: sll $1, $1, 3
-; MIPS64R6O0-NEXT: ori $3, $zero, 255
-; MIPS64R6O0-NEXT: sllv $3, $3, $1
-; MIPS64R6O0-NEXT: nor $5, $zero, $3
-; MIPS64R6O0-NEXT: sllv $4, $4, $1
+; MIPS64R6O0-NEXT: andi $3, $1, 3
+; MIPS64R6O0-NEXT: xori $3, $3, 3
+; MIPS64R6O0-NEXT: sll $3, $3, 3
+; MIPS64R6O0-NEXT: ori $5, $zero, 255
+; MIPS64R6O0-NEXT: sllv $5, $5, $3
+; MIPS64R6O0-NEXT: nor $6, $zero, $5
+; MIPS64R6O0-NEXT: sllv $4, $4, $3
; MIPS64R6O0-NEXT: .LBB9_1: # %entry
; MIPS64R6O0-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64R6O0-NEXT: ll $7, 0($2)
-; MIPS64R6O0-NEXT: subu $8, $7, $4
-; MIPS64R6O0-NEXT: and $8, $8, $3
-; MIPS64R6O0-NEXT: and $9, $7, $5
-; MIPS64R6O0-NEXT: or $9, $9, $8
-; MIPS64R6O0-NEXT: sc $9, 0($2)
-; MIPS64R6O0-NEXT: beqzc $9, .LBB9_1
+; MIPS64R6O0-NEXT: ll $8, 0($2)
+; MIPS64R6O0-NEXT: subu $9, $8, $4
+; MIPS64R6O0-NEXT: and $9, $9, $5
+; MIPS64R6O0-NEXT: and $10, $8, $6
+; MIPS64R6O0-NEXT: or $10, $10, $9
+; MIPS64R6O0-NEXT: sc $10, 0($2)
+; MIPS64R6O0-NEXT: beqzc $10, .LBB9_1
; MIPS64R6O0-NEXT: # %bb.2: # %entry
-; MIPS64R6O0-NEXT: and $6, $7, $3
-; MIPS64R6O0-NEXT: srlv $6, $6, $1
-; MIPS64R6O0-NEXT: seb $6, $6
+; MIPS64R6O0-NEXT: and $7, $8, $5
+; MIPS64R6O0-NEXT: srlv $7, $7, $3
+; MIPS64R6O0-NEXT: seb $7, $7
; MIPS64R6O0-NEXT: # %bb.3: # %entry
-; MIPS64R6O0-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64R6O0-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64R6O0-NEXT: # %bb.4: # %entry
; MIPS64R6O0-NEXT: lw $1, 12($sp) # 4-byte Folded Reload
; MIPS64R6O0-NEXT: seb $2, $1
@@ -3601,29 +3601,29 @@ define signext i8 @AtomicLoadNand8(i8 signext %incr) nounwind {
; MIPS64R6O0-NEXT: ld $1, %got_disp(y)($1)
; MIPS64R6O0-NEXT: daddiu $2, $zero, -4
; MIPS64R6O0-NEXT: and $2, $1, $2
-; MIPS64R6O0-NEXT: andi $1, $1, 3
-; MIPS64R6O0-NEXT: xori $1, $1, 3
-; MIPS64R6O0-NEXT: sll $1, $1, 3
-; MIPS64R6O0-NEXT: ori $3, $zero, 255
-; MIPS64R6O0-NEXT: sllv $3, $3, $1
-; MIPS64R6O0-NEXT: nor $5, $zero, $3
-; MIPS64R6O0-NEXT: sllv $4, $4, $1
+; MIPS64R6O0-NEXT: andi $3, $1, 3
+; MIPS64R6O0-NEXT: xori $3, $3, 3
+; MIPS64R6O0-NEXT: sll $3, $3, 3
+; MIPS64R6O0-NEXT: ori $5, $zero, 255
+; MIPS64R6O0-NEXT: sllv $5, $5, $3
+; MIPS64R6O0-NEXT: nor $6, $zero, $5
+; MIPS64R6O0-NEXT: sllv $4, $4, $3
; MIPS64R6O0-NEXT: .LBB10_1: # %entry
; MIPS64R6O0-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64R6O0-NEXT: ll $7, 0($2)
-; MIPS64R6O0-NEXT: and $8, $7, $4
-; MIPS64R6O0-NEXT: nor $8, $zero, $8
-; MIPS64R6O0-NEXT: and $8, $8, $3
-; MIPS64R6O0-NEXT: and $9, $7, $5
-; MIPS64R6O0-NEXT: or $9, $9, $8
-; MIPS64R6O0-NEXT: sc $9, 0($2)
-; MIPS64R6O0-NEXT: beqzc $9, .LBB10_1
+; MIPS64R6O0-NEXT: ll $8, 0($2)
+; MIPS64R6O0-NEXT: and $9, $8, $4
+; MIPS64R6O0-NEXT: nor $9, $zero, $9
+; MIPS64R6O0-NEXT: and $9, $9, $5
+; MIPS64R6O0-NEXT: and $10, $8, $6
+; MIPS64R6O0-NEXT: or $10, $10, $9
+; MIPS64R6O0-NEXT: sc $10, 0($2)
+; MIPS64R6O0-NEXT: beqzc $10, .LBB10_1
; MIPS64R6O0-NEXT: # %bb.2: # %entry
-; MIPS64R6O0-NEXT: and $6, $7, $3
-; MIPS64R6O0-NEXT: srlv $6, $6, $1
-; MIPS64R6O0-NEXT: seb $6, $6
+; MIPS64R6O0-NEXT: and $7, $8, $5
+; MIPS64R6O0-NEXT: srlv $7, $7, $3
+; MIPS64R6O0-NEXT: seb $7, $7
; MIPS64R6O0-NEXT: # %bb.3: # %entry
-; MIPS64R6O0-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64R6O0-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64R6O0-NEXT: # %bb.4: # %entry
; MIPS64R6O0-NEXT: lw $1, 12($sp) # 4-byte Folded Reload
; MIPS64R6O0-NEXT: seb $2, $1
@@ -4115,27 +4115,27 @@ define signext i8 @AtomicSwap8(i8 signext %newval) nounwind {
; MIPS64R6O0-NEXT: ld $1, %got_disp(y)($1)
; MIPS64R6O0-NEXT: daddiu $2, $zero, -4
; MIPS64R6O0-NEXT: and $2, $1, $2
-; MIPS64R6O0-NEXT: andi $1, $1, 3
-; MIPS64R6O0-NEXT: xori $1, $1, 3
-; MIPS64R6O0-NEXT: sll $1, $1, 3
-; MIPS64R6O0-NEXT: ori $3, $zero, 255
-; MIPS64R6O0-NEXT: sllv $3, $3, $1
-; MIPS64R6O0-NEXT: nor $5, $zero, $3
-; MIPS64R6O0-NEXT: sllv $4, $4, $1
+; MIPS64R6O0-NEXT: andi $3, $1, 3
+; MIPS64R6O0-NEXT: xori $3, $3, 3
+; MIPS64R6O0-NEXT: sll $3, $3, 3
+; MIPS64R6O0-NEXT: ori $5, $zero, 255
+; MIPS64R6O0-NEXT: sllv $5, $5, $3
+; MIPS64R6O0-NEXT: nor $6, $zero, $5
+; MIPS64R6O0-NEXT: sllv $4, $4, $3
; MIPS64R6O0-NEXT: .LBB11_1: # %entry
; MIPS64R6O0-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64R6O0-NEXT: ll $7, 0($2)
-; MIPS64R6O0-NEXT: and $8, $4, $3
-; MIPS64R6O0-NEXT: and $9, $7, $5
-; MIPS64R6O0-NEXT: or $9, $9, $8
-; MIPS64R6O0-NEXT: sc $9, 0($2)
-; MIPS64R6O0-NEXT: beqzc $9, .LBB11_1
+; MIPS64R6O0-NEXT: ll $8, 0($2)
+; MIPS64R6O0-NEXT: and $9, $4, $5
+; MIPS64R6O0-NEXT: and $10, $8, $6
+; MIPS64R6O0-NEXT: or $10, $10, $9
+; MIPS64R6O0-NEXT: sc $10, 0($2)
+; MIPS64R6O0-NEXT: beqzc $10, .LBB11_1
; MIPS64R6O0-NEXT: # %bb.2: # %entry
-; MIPS64R6O0-NEXT: and $6, $7, $3
-; MIPS64R6O0-NEXT: srlv $6, $6, $1
-; MIPS64R6O0-NEXT: seb $6, $6
+; MIPS64R6O0-NEXT: and $7, $8, $5
+; MIPS64R6O0-NEXT: srlv $7, $7, $3
+; MIPS64R6O0-NEXT: seb $7, $7
; MIPS64R6O0-NEXT: # %bb.3: # %entry
-; MIPS64R6O0-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64R6O0-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64R6O0-NEXT: # %bb.4: # %entry
; MIPS64R6O0-NEXT: lw $1, 12($sp) # 4-byte Folded Reload
; MIPS64R6O0-NEXT: seb $2, $1
@@ -4666,32 +4666,32 @@ define signext i8 @AtomicCmpSwap8(i8 signext %oldval, i8 signext %newval) nounwi
; MIPS64R6O0-NEXT: ld $1, %got_disp(y)($1)
; MIPS64R6O0-NEXT: daddiu $2, $zero, -4
; MIPS64R6O0-NEXT: and $2, $1, $2
-; MIPS64R6O0-NEXT: andi $1, $1, 3
-; MIPS64R6O0-NEXT: xori $1, $1, 3
-; MIPS64R6O0-NEXT: sll $1, $1, 3
-; MIPS64R6O0-NEXT: ori $3, $zero, 255
-; MIPS64R6O0-NEXT: sllv $3, $3, $1
-; MIPS64R6O0-NEXT: nor $6, $zero, $3
+; MIPS64R6O0-NEXT: andi $3, $1, 3
+; MIPS64R6O0-NEXT: xori $3, $3, 3
+; MIPS64R6O0-NEXT: sll $3, $3, 3
+; MIPS64R6O0-NEXT: ori $6, $zero, 255
+; MIPS64R6O0-NEXT: sllv $6, $6, $3
+; MIPS64R6O0-NEXT: nor $7, $zero, $6
; MIPS64R6O0-NEXT: andi $4, $4, 255
-; MIPS64R6O0-NEXT: sllv $4, $4, $1
+; MIPS64R6O0-NEXT: sllv $4, $4, $3
; MIPS64R6O0-NEXT: andi $5, $5, 255
-; MIPS64R6O0-NEXT: sllv $5, $5, $1
+; MIPS64R6O0-NEXT: sllv $5, $5, $3
; MIPS64R6O0-NEXT: .LBB12_1: # %entry
; MIPS64R6O0-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64R6O0-NEXT: ll $8, 0($2)
-; MIPS64R6O0-NEXT: and $9, $8, $3
-; MIPS64R6O0-NEXT: bnec $9, $4, .LBB12_3
+; MIPS64R6O0-NEXT: ll $9, 0($2)
+; MIPS64R6O0-NEXT: and $10, $9, $6
+; MIPS64R6O0-NEXT: bnec $10, $4, .LBB12_3
; MIPS64R6O0-NEXT: # %bb.2: # %entry
; MIPS64R6O0-NEXT: # in Loop: Header=BB12_1 Depth=1
-; MIPS64R6O0-NEXT: and $8, $8, $6
-; MIPS64R6O0-NEXT: or $8, $8, $5
-; MIPS64R6O0-NEXT: sc $8, 0($2)
-; MIPS64R6O0-NEXT: beqzc $8, .LBB12_1
+; MIPS64R6O0-NEXT: and $9, $9, $7
+; MIPS64R6O0-NEXT: or $9, $9, $5
+; MIPS64R6O0-NEXT: sc $9, 0($2)
+; MIPS64R6O0-NEXT: beqzc $9, .LBB12_1
; MIPS64R6O0-NEXT: .LBB12_3: # %entry
-; MIPS64R6O0-NEXT: srlv $7, $9, $1
-; MIPS64R6O0-NEXT: seb $7, $7
+; MIPS64R6O0-NEXT: srlv $8, $10, $3
+; MIPS64R6O0-NEXT: seb $8, $8
; MIPS64R6O0-NEXT: # %bb.4: # %entry
-; MIPS64R6O0-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
+; MIPS64R6O0-NEXT: sw $8, 12($sp) # 4-byte Folded Spill
; MIPS64R6O0-NEXT: # %bb.5: # %entry
; MIPS64R6O0-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
; MIPS64R6O0-NEXT: daddiu $sp, $sp, 16
@@ -5236,28 +5236,28 @@ define i1 @AtomicCmpSwapRes8(i8* %ptr, i8 signext %oldval, i8 signext %newval) n
; MIPS64R6O0-NEXT: sll $2, $2, 3
; MIPS64R6O0-NEXT: ori $3, $zero, 255
; MIPS64R6O0-NEXT: sllv $3, $3, $2
-; MIPS64R6O0-NEXT: nor $4, $zero, $3
-; MIPS64R6O0-NEXT: andi $7, $5, 255
-; MIPS64R6O0-NEXT: sllv $7, $7, $2
+; MIPS64R6O0-NEXT: nor $7, $zero, $3
+; MIPS64R6O0-NEXT: andi $8, $5, 255
+; MIPS64R6O0-NEXT: sllv $8, $8, $2
; MIPS64R6O0-NEXT: andi $6, $6, 255
; MIPS64R6O0-NEXT: sllv $6, $6, $2
; MIPS64R6O0-NEXT: .LBB13_1: # %entry
; MIPS64R6O0-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64R6O0-NEXT: ll $9, 0($1)
-; MIPS64R6O0-NEXT: and $10, $9, $3
-; MIPS64R6O0-NEXT: bnec $10, $7, .LBB13_3
+; MIPS64R6O0-NEXT: ll $10, 0($1)
+; MIPS64R6O0-NEXT: and $11, $10, $3
+; MIPS64R6O0-NEXT: bnec $11, $8, .LBB13_3
; MIPS64R6O0-NEXT: # %bb.2: # %entry
; MIPS64R6O0-NEXT: # in Loop: Header=BB13_1 Depth=1
-; MIPS64R6O0-NEXT: and $9, $9, $4
-; MIPS64R6O0-NEXT: or $9, $9, $6
-; MIPS64R6O0-NEXT: sc $9, 0($1)
-; MIPS64R6O0-NEXT: beqzc $9, .LBB13_1
+; MIPS64R6O0-NEXT: and $10, $10, $7
+; MIPS64R6O0-NEXT: or $10, $10, $6
+; MIPS64R6O0-NEXT: sc $10, 0($1)
+; MIPS64R6O0-NEXT: beqzc $10, .LBB13_1
; MIPS64R6O0-NEXT: .LBB13_3: # %entry
-; MIPS64R6O0-NEXT: srlv $8, $10, $2
-; MIPS64R6O0-NEXT: seb $8, $8
+; MIPS64R6O0-NEXT: srlv $9, $11, $2
+; MIPS64R6O0-NEXT: seb $9, $9
; MIPS64R6O0-NEXT: # %bb.4: # %entry
; MIPS64R6O0-NEXT: sw $5, 12($sp) # 4-byte Folded Spill
-; MIPS64R6O0-NEXT: sw $8, 8($sp) # 4-byte Folded Spill
+; MIPS64R6O0-NEXT: sw $9, 8($sp) # 4-byte Folded Spill
; MIPS64R6O0-NEXT: # %bb.5: # %entry
; MIPS64R6O0-NEXT: lw $1, 8($sp) # 4-byte Folded Reload
; MIPS64R6O0-NEXT: lw $2, 12($sp) # 4-byte Folded Reload
@@ -5775,28 +5775,28 @@ define signext i16 @AtomicLoadAdd16(i16 signext %incr) nounwind {
; MIPS64R6O0-NEXT: ld $1, %got_disp(z)($1)
; MIPS64R6O0-NEXT: daddiu $2, $zero, -4
; MIPS64R6O0-NEXT: and $2, $1, $2
-; MIPS64R6O0-NEXT: andi $1, $1, 3
-; MIPS64R6O0-NEXT: xori $1, $1, 2
-; MIPS64R6O0-NEXT: sll $1, $1, 3
-; MIPS64R6O0-NEXT: ori $3, $zero, 65535
-; MIPS64R6O0-NEXT: sllv $3, $3, $1
-; MIPS64R6O0-NEXT: nor $5, $zero, $3
-; MIPS64R6O0-NEXT: sllv $4, $4, $1
+; MIPS64R6O0-NEXT: andi $3, $1, 3
+; MIPS64R6O0-NEXT: xori $3, $3, 2
+; MIPS64R6O0-NEXT: sll $3, $3, 3
+; MIPS64R6O0-NEXT: ori $5, $zero, 65535
+; MIPS64R6O0-NEXT: sllv $5, $5, $3
+; MIPS64R6O0-NEXT: nor $6, $zero, $5
+; MIPS64R6O0-NEXT: sllv $4, $4, $3
; MIPS64R6O0-NEXT: .LBB14_1: # %entry
; MIPS64R6O0-NEXT: # =>This Inner Loop Header: Depth=1
-; MIPS64R6O0-NEXT: ll $7, 0($2)
-; MIPS64R6O0-NEXT: addu $8, $7, $4
-; MIPS64R6O0-NEXT: and $8, $8, $3
-; MIPS64R6O0-NEXT: and $9, $7, $5
-; MIPS64R6O0-NEXT: or $9, $9, $8
-; MIPS64R6O0-NEXT: sc $9, 0($2)
-; MIPS64R6O0-NEXT: beqzc $9, .LBB14_1
+; MIPS64R6O0-NEXT: ll $8, 0($2)
+; MIPS64R6O0-NEXT: addu $9, $8, $4
+; MIPS64R6O0-NEXT: and $9, $9, $5
+; MIPS64R6O0-NEXT: and $10, $8, $6
+; MIPS64R6O0-NEXT: or $10, $10, $9
+; MIPS64R6O0-NEXT: sc $10, 0($2)
+; MIPS64R6O0-NEXT: beqzc $10, .LBB14_1
; MIPS64R6O0-NEXT: # %bb.2: # %entry
-; MIPS64R6O0-NEXT: and $6, $7, $3
-; MIPS64R6O0-NEXT: srlv $6, $6, $1
-; MIPS64R6O0-NEXT: seh $6, $6
+; MIPS64R6O0-NEXT: and $7, $8, $5
+; MIPS64R6O0-NEXT: srlv $7, $7, $3
+; MIPS64R6O0-NEXT: seh $7, $7
; MIPS64R6O0-NEXT: # %bb.3: # %entry
-; MIPS64R6O0-NEXT: sw $6, 12($sp) # 4-byte Folded Spill
+; MIPS64R6O0-NEXT: sw $7, 12($sp) # 4-byte Folded Spill
; MIPS64R6O0-NEXT: # %bb.4: # %entry
; MIPS64R6O0-NEXT: lw $1, 12($sp) # 4-byte Folded Reload
; MIPS64R6O0-NEXT: seh $2, $1
@@ -6359,33 +6359,33 @@ define {i16, i1} @foo(i16* %addr, i16 %l, i16 %r, i16 %new) {
; MIPS64R6O0-NEXT: sll $3, $5, 0
; MIPS64R6O0-NEXT: addu $2, $3, $2
; MIPS64R6O0-NEXT: sync
-; MIPS64R6O0-NEXT: daddiu $3, $zero, -4
-; MIPS64R6O0-NEXT: and $3, $4, $3
-; MIPS64R6O0-NEXT: andi $4, $4, 3
-; MIPS64R6O0-NEXT: xori $4, $4, 2
-; MIPS64R6O0-NEXT: sll $4, $4, 3
+; MIPS64R6O0-NEXT: daddiu $8, $zero, -4
+; MIPS64R6O0-NEXT: and $8, $4, $8
+; MIPS64R6O0-NEXT: andi $3, $4, 3
+; MIPS64R6O0-NEXT: xori $3, $3, 2
+; MIPS64R6O0-NEXT: sll $3, $3, 3
; MIPS64R6O0-NEXT: ori $5, $zero, 65535
-; MIPS64R6O0-NEXT: sllv $5, $5, $4
+; MIPS64R6O0-NEXT: sllv $5, $5, $3
; MIPS64R6O0-NEXT: nor $6, $zero, $5
; MIPS64R6O0-NEXT: andi $7, $2, 65535
-; MIPS64R6O0-NEXT: sllv $7, $7, $4
+; MIPS64R6O0-NEXT: sllv $7, $7, $3
; MIPS64R6O0-NEXT: andi $1, $1, 65535
-; MIPS64R6O0-NEXT: sllv $1, $1, $4
+; MIPS64R6O0-NEXT: sllv $1, $1, $3
; MIPS64R6O0-NEXT: .LBB15_1: # =>This Inner Loop Header: Depth=1
-; MIPS64R6O0-NEXT: ll $9, 0($3)
-; MIPS64R6O0-NEXT: and $10, $9, $5
-; MIPS64R6O0-NEXT: bnec $10, $7, .LBB15_3
+; MIPS64R6O0-NEXT: ll $10, 0($8)
+; MIPS64R6O0-NEXT: and $11, $10, $5
+; MIPS64R6O0-NEXT: bnec $11, $7, .LBB15_3
; MIPS64R6O0-NEXT: # %bb.2: # in Loop: Header=BB15_1 Depth=1
-; MIPS64R6O0-NEXT: and $9, $9, $6
-; MIPS64R6O0-NEXT: or $9, $9, $1
-; MIPS64R6O0-NEXT: sc $9, 0($3)
-; MIPS64R6O0-NEXT: beqzc $9, .LBB15_1
+; MIPS64R6O0-NEXT: and $10, $10, $6
+; MIPS64R6O0-NEXT: or $10, $10, $1
+; MIPS64R6O0-NEXT: sc $10, 0($8)
+; MIPS64R6O0-NEXT: beqzc $10, .LBB15_1
; MIPS64R6O0-NEXT: .LBB15_3:
-; MIPS64R6O0-NEXT: srlv $8, $10, $4
-; MIPS64R6O0-NEXT: seh $8, $8
+; MIPS64R6O0-NEXT: srlv $9, $11, $3
+; MIPS64R6O0-NEXT: seh $9, $9
; MIPS64R6O0-NEXT: # %bb.4:
; MIPS64R6O0-NEXT: sw $2, 12($sp) # 4-byte Folded Spill
-; MIPS64R6O0-NEXT: sw $8, 8($sp) # 4-byte Folded Spill
+; MIPS64R6O0-NEXT: sw $9, 8($sp) # 4-byte Folded Spill
; MIPS64R6O0-NEXT: # %bb.5:
; MIPS64R6O0-NEXT: lw $1, 12($sp) # 4-byte Folded Reload
; MIPS64R6O0-NEXT: seh $2, $1
@@ -7145,8 +7145,8 @@ define i32 @zeroreg() nounwind {
; MIPS64R6O0-NEXT: sc $6, 0($1)
; MIPS64R6O0-NEXT: beqzc $6, .LBB17_1
; MIPS64R6O0-NEXT: .LBB17_3: # %entry
-; MIPS64R6O0-NEXT: xor $1, $5, $3
-; MIPS64R6O0-NEXT: sltiu $2, $1, 1
+; MIPS64R6O0-NEXT: xor $2, $5, $3
+; MIPS64R6O0-NEXT: sltiu $2, $2, 1
; MIPS64R6O0-NEXT: sync
; MIPS64R6O0-NEXT: jrc $ra
;
diff --git a/llvm/test/CodeGen/Mips/implicit-sret.ll b/llvm/test/CodeGen/Mips/implicit-sret.ll
index b9f6568e40c92..e86cec37d5100 100644
--- a/llvm/test/CodeGen/Mips/implicit-sret.ll
+++ b/llvm/test/CodeGen/Mips/implicit-sret.ll
@@ -48,8 +48,8 @@ define internal { i32, i128, i64 } @implicit_sret_impl() unnamed_addr nounwind {
; CHECK-NEXT: sd $zero, 8($4)
; CHECK-NEXT: daddiu $3, $zero, 30
; CHECK-NEXT: sd $3, 24($4)
-; CHECK-NEXT: addiu $3, $zero, 10
-; CHECK-NEXT: sw $3, 0($4)
+; CHECK-NEXT: addiu $5, $zero, 10
+; CHECK-NEXT: sw $5, 0($4)
; CHECK-NEXT: jr $ra
; CHECK-NEXT: nop
ret { i32, i128, i64 } { i32 10, i128 20, i64 30 }
@@ -70,12 +70,10 @@ define internal void @test2() unnamed_addr nounwind {
; CHECK-NEXT: lw $3, 4($sp)
; CHECK-NEXT: # implicit-def: $a0_64
; CHECK-NEXT: move $4, $3
-; CHECK-NEXT: # implicit-def: $v1_64
-; CHECK-NEXT: move $3, $2
-; CHECK-NEXT: # implicit-def: $v0_64
-; CHECK-NEXT: move $2, $1
-; CHECK-NEXT: move $5, $3
-; CHECK-NEXT: move $6, $2
+; CHECK-NEXT: # implicit-def: $a1_64
+; CHECK-NEXT: move $5, $2
+; CHECK-NEXT: # implicit-def: $a2_64
+; CHECK-NEXT: move $6, $1
; CHECK-NEXT: jal use_sret2
; CHECK-NEXT: nop
; CHECK-NEXT: ld $ra, 24($sp) # 8-byte Folded Reload
diff --git a/llvm/test/CodeGen/PowerPC/addegluecrash.ll b/llvm/test/CodeGen/PowerPC/addegluecrash.ll
index c38f377869f86..a1d9805458368 100644
--- a/llvm/test/CodeGen/PowerPC/addegluecrash.ll
+++ b/llvm/test/CodeGen/PowerPC/addegluecrash.ll
@@ -21,11 +21,11 @@ define void @bn_mul_comba8(i64* nocapture %r, i64* nocapture readonly %a, i64* n
; CHECK-NEXT: addze 5, 5
; CHECK-NEXT: add 4, 5, 4
; CHECK-NEXT: cmpld 7, 4, 5
-; CHECK-NEXT: mfocrf 4, 1
-; CHECK-NEXT: rlwinm 4, 4, 29, 31, 31
-; CHECK-NEXT: # implicit-def: $x5
-; CHECK-NEXT: mr 5, 4
-; CHECK-NEXT: clrldi 4, 5, 32
+; CHECK-NEXT: mfocrf 10, 1
+; CHECK-NEXT: rlwinm 10, 10, 29, 31, 31
+; CHECK-NEXT: # implicit-def: $x4
+; CHECK-NEXT: mr 4, 10
+; CHECK-NEXT: clrldi 4, 4, 32
; CHECK-NEXT: std 4, 0(3)
; CHECK-NEXT: blr
%1 = load i64, i64* %a, align 8
diff --git a/llvm/test/CodeGen/PowerPC/atomics-indexed.ll b/llvm/test/CodeGen/PowerPC/atomics-indexed.ll
index b4790adfd9088..cf7225a5fc200 100644
--- a/llvm/test/CodeGen/PowerPC/atomics-indexed.ll
+++ b/llvm/test/CodeGen/PowerPC/atomics-indexed.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=powerpc-unknown-linux-gnu -verify-machineinstrs -ppc-asm-full-reg-names | FileCheck %s --check-prefix=CHECK --check-prefix=PPC32
; FIXME: -verify-machineinstrs currently fail on ppc64 (mismatched register/instruction).
; This is already checked for in Atomics-64.ll
@@ -8,9 +9,25 @@
; Indexed version of loads
define i8 @load_x_i8_seq_cst([100000 x i8]* %mem) {
-; CHECK-LABEL: load_x_i8_seq_cst
-; CHECK: sync
-; CHECK: lbzx [[VAL:r[0-9]+]]
+; PPC32-LABEL: load_x_i8_seq_cst:
+; PPC32: # %bb.0:
+; PPC32-NEXT: lis r4, 1
+; PPC32-NEXT: sync
+; PPC32-NEXT: ori r4, r4, 24464
+; PPC32-NEXT: lbzx r3, r3, r4
+; PPC32-NEXT: lwsync
+; PPC32-NEXT: blr
+;
+; PPC64-LABEL: load_x_i8_seq_cst:
+; PPC64: # %bb.0:
+; PPC64-NEXT: lis r4, 1
+; PPC64-NEXT: sync
+; PPC64-NEXT: ori r4, r4, 24464
+; PPC64-NEXT: lbzx r3, r3, r4
+; PPC64-NEXT: cmpd cr7, r3, r3
+; PPC64-NEXT: bne- cr7, .+4
+; PPC64-NEXT: isync
+; PPC64-NEXT: blr
; CHECK-PPC32: lwsync
; CHECK-PPC64: cmpw [[CR:cr[0-9]+]], [[VAL]], [[VAL]]
; CHECK-PPC64: bne- [[CR]], .+4
@@ -20,8 +37,23 @@ define i8 @load_x_i8_seq_cst([100000 x i8]* %mem) {
ret i8 %val
}
define i16 @load_x_i16_acquire([100000 x i16]* %mem) {
-; CHECK-LABEL: load_x_i16_acquire
-; CHECK: lhzx [[VAL:r[0-9]+]]
+; PPC32-LABEL: load_x_i16_acquire:
+; PPC32: # %bb.0:
+; PPC32-NEXT: lis r4, 2
+; PPC32-NEXT: ori r4, r4, 48928
+; PPC32-NEXT: lhzx r3, r3, r4
+; PPC32-NEXT: lwsync
+; PPC32-NEXT: blr
+;
+; PPC64-LABEL: load_x_i16_acquire:
+; PPC64: # %bb.0:
+; PPC64-NEXT: lis r4, 2
+; PPC64-NEXT: ori r4, r4, 48928
+; PPC64-NEXT: lhzx r3, r3, r4
+; PPC64-NEXT: cmpd cr7, r3, r3
+; PPC64-NEXT: bne- cr7, .+4
+; PPC64-NEXT: isync
+; PPC64-NEXT: blr
; CHECK-PPC32: lwsync
; CHECK-PPC64: cmpw [[CR:cr[0-9]+]], [[VAL]], [[VAL]]
; CHECK-PPC64: bne- [[CR]], .+4
@@ -31,19 +63,39 @@ define i16 @load_x_i16_acquire([100000 x i16]* %mem) {
ret i16 %val
}
define i32 @load_x_i32_monotonic([100000 x i32]* %mem) {
-; CHECK-LABEL: load_x_i32_monotonic
-; CHECK: lwzx
-; CHECK-NOT: sync
+; CHECK-LABEL: load_x_i32_monotonic:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lis r4, 5
+; CHECK-NEXT: ori r4, r4, 32320
+; CHECK-NEXT: lwzx r3, r3, r4
+; CHECK-NEXT: blr
%ptr = getelementptr inbounds [100000 x i32], [100000 x i32]* %mem, i64 0, i64 90000
%val = load atomic i32, i32* %ptr monotonic, align 4
ret i32 %val
}
define i64 @load_x_i64_unordered([100000 x i64]* %mem) {
-; CHECK-LABEL: load_x_i64_unordered
-; PPC32: __sync_
-; PPC64-NOT: __sync_
-; PPC64: ldx
-; CHECK-NOT: sync
+; PPC32-LABEL: load_x_i64_unordered:
+; PPC32: # %bb.0:
+; PPC32-NEXT: mflr r0
+; PPC32-NEXT: stw r0, 4(r1)
+; PPC32-NEXT: stwu r1, -16(r1)
+; PPC32-NEXT: .cfi_def_cfa_offset 16
+; PPC32-NEXT: .cfi_offset lr, 4
+; PPC32-NEXT: addi r3, r3, -896
+; PPC32-NEXT: addis r3, r3, 11
+; PPC32-NEXT: li r4, 0
+; PPC32-NEXT: bl __atomic_load_8
+; PPC32-NEXT: lwz r0, 20(r1)
+; PPC32-NEXT: addi r1, r1, 16
+; PPC32-NEXT: mtlr r0
+; PPC32-NEXT: blr
+;
+; PPC64-LABEL: load_x_i64_unordered:
+; PPC64: # %bb.0:
+; PPC64-NEXT: lis r4, 10
+; PPC64-NEXT: ori r4, r4, 64640
+; PPC64-NEXT: ldx r3, r3, r4
+; PPC64-NEXT: blr
%ptr = getelementptr inbounds [100000 x i64], [100000 x i64]* %mem, i64 0, i64 90000
%val = load atomic i64, i64* %ptr unordered, align 8
ret i64 %val
@@ -51,35 +103,69 @@ define i64 @load_x_i64_unordered([100000 x i64]* %mem) {
; Indexed version of stores
define void @store_x_i8_seq_cst([100000 x i8]* %mem) {
-; CHECK-LABEL: store_x_i8_seq_cst
-; CHECK: sync
-; CHECK: stbx
+; CHECK-LABEL: store_x_i8_seq_cst:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lis r4, 1
+; CHECK-NEXT: ori r4, r4, 24464
+; CHECK-NEXT: li r5, 42
+; CHECK-NEXT: sync
+; CHECK-NEXT: stbx r5, r3, r4
+; CHECK-NEXT: blr
%ptr = getelementptr inbounds [100000 x i8], [100000 x i8]* %mem, i64 0, i64 90000
store atomic i8 42, i8* %ptr seq_cst, align 1
ret void
}
define void @store_x_i16_release([100000 x i16]* %mem) {
-; CHECK-LABEL: store_x_i16_release
-; CHECK: lwsync
-; CHECK: sthx
+; CHECK-LABEL: store_x_i16_release:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lis r4, 2
+; CHECK-NEXT: ori r4, r4, 48928
+; CHECK-NEXT: li r5, 42
+; CHECK-NEXT: lwsync
+; CHECK-NEXT: sthx r5, r3, r4
+; CHECK-NEXT: blr
%ptr = getelementptr inbounds [100000 x i16], [100000 x i16]* %mem, i64 0, i64 90000
store atomic i16 42, i16* %ptr release, align 2
ret void
}
define void @store_x_i32_monotonic([100000 x i32]* %mem) {
-; CHECK-LABEL: store_x_i32_monotonic
-; CHECK-NOT: sync
-; CHECK: stwx
+; CHECK-LABEL: store_x_i32_monotonic:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lis r4, 5
+; CHECK-NEXT: ori r4, r4, 32320
+; CHECK-NEXT: li r5, 42
+; CHECK-NEXT: stwx r5, r3, r4
+; CHECK-NEXT: blr
%ptr = getelementptr inbounds [100000 x i32], [100000 x i32]* %mem, i64 0, i64 90000
store atomic i32 42, i32* %ptr monotonic, align 4
ret void
}
define void @store_x_i64_unordered([100000 x i64]* %mem) {
-; CHECK-LABEL: store_x_i64_unordered
-; CHECK-NOT: sync
-; PPC32: __sync_
-; PPC64-NOT: __sync_
-; PPC64: stdx
+; PPC32-LABEL: store_x_i64_unordered:
+; PPC32: # %bb.0:
+; PPC32-NEXT: mflr r0
+; PPC32-NEXT: stw r0, 4(r1)
+; PPC32-NEXT: stwu r1, -16(r1)
+; PPC32-NEXT: .cfi_def_cfa_offset 16
+; PPC32-NEXT: .cfi_offset lr, 4
+; PPC32-NEXT: addi r3, r3, -896
+; PPC32-NEXT: addis r3, r3, 11
+; PPC32-NEXT: li r5, 0
+; PPC32-NEXT: li r6, 42
+; PPC32-NEXT: li r7, 0
+; PPC32-NEXT: bl __atomic_store_8
+; PPC32-NEXT: lwz r0, 20(r1)
+; PPC32-NEXT: addi r1, r1, 16
+; PPC32-NEXT: mtlr r0
+; PPC32-NEXT: blr
+;
+; PPC64-LABEL: store_x_i64_unordered:
+; PPC64: # %bb.0:
+; PPC64-NEXT: lis r4, 10
+; PPC64-NEXT: ori r4, r4, 64640
+; PPC64-NEXT: li r5, 42
+; PPC64-NEXT: stdx r5, r3, r4
+; PPC64-NEXT: blr
%ptr = getelementptr inbounds [100000 x i64], [100000 x i64]* %mem, i64 0, i64 90000
store atomic i64 42, i64* %ptr unordered, align 8
ret void
diff --git a/llvm/test/CodeGen/PowerPC/atomics.ll b/llvm/test/CodeGen/PowerPC/atomics.ll
index c964218cb60bf..008cd4c7157c1 100644
--- a/llvm/test/CodeGen/PowerPC/atomics.ll
+++ b/llvm/test/CodeGen/PowerPC/atomics.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --force-update
; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc-unknown-linux-gnu -verify-machineinstrs -ppc-asm-full-reg-names | FileCheck %s --check-prefix=CHECK --check-prefix=PPC32
; This is already checked for in Atomics-64.ll
; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -ppc-asm-full-reg-names | FileCheck %s --check-prefix=CHECK --check-prefix=PPC64
@@ -9,22 +10,35 @@
; We first check loads, for all sizes from i8 to i64.
; We also vary orderings to check for barriers.
define i8 @load_i8_unordered(i8* %mem) {
-; CHECK-LABEL: load_i8_unordered
-; CHECK: lbz
-; CHECK-NOT: sync
+; CHECK-LABEL: load_i8_unordered:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lbz r3, 0(r3)
+; CHECK-NEXT: blr
%val = load atomic i8, i8* %mem unordered, align 1
ret i8 %val
}
define i16 @load_i16_monotonic(i16* %mem) {
-; CHECK-LABEL: load_i16_monotonic
-; CHECK: lhz
-; CHECK-NOT: sync
+; CHECK-LABEL: load_i16_monotonic:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lhz r3, 0(r3)
+; CHECK-NEXT: blr
%val = load atomic i16, i16* %mem monotonic, align 2
ret i16 %val
}
define i32 @load_i32_acquire(i32* %mem) {
-; CHECK-LABEL: load_i32_acquire
-; CHECK: lwz [[VAL:r[0-9]+]]
+; PPC32-LABEL: load_i32_acquire:
+; PPC32: # %bb.0:
+; PPC32-NEXT: lwz r3, 0(r3)
+; PPC32-NEXT: lwsync
+; PPC32-NEXT: blr
+;
+; PPC64-LABEL: load_i32_acquire:
+; PPC64: # %bb.0:
+; PPC64-NEXT: lwz r3, 0(r3)
+; PPC64-NEXT: cmpd cr7, r3, r3
+; PPC64-NEXT: bne- cr7, .+4
+; PPC64-NEXT: isync
+; PPC64-NEXT: blr
%val = load atomic i32, i32* %mem acquire, align 4
; CHECK-PPC32: lwsync
; CHECK-PPC64: cmpw [[CR:cr[0-9]+]], [[VAL]], [[VAL]]
@@ -33,11 +47,28 @@ define i32 @load_i32_acquire(i32* %mem) {
ret i32 %val
}
define i64 @load_i64_seq_cst(i64* %mem) {
-; CHECK-LABEL: load_i64_seq_cst
-; CHECK: sync
-; PPC32: __sync_
-; PPC64-NOT: __sync_
-; PPC64: ld [[VAL:r[0-9]+]]
+; PPC32-LABEL: load_i64_seq_cst:
+; PPC32: # %bb.0:
+; PPC32-NEXT: mflr r0
+; PPC32-NEXT: stw r0, 4(r1)
+; PPC32-NEXT: stwu r1, -16(r1)
+; PPC32-NEXT: .cfi_def_cfa_offset 16
+; PPC32-NEXT: .cfi_offset lr, 4
+; PPC32-NEXT: li r4, 5
+; PPC32-NEXT: bl __atomic_load_8
+; PPC32-NEXT: lwz r0, 20(r1)
+; PPC32-NEXT: addi r1, r1, 16
+; PPC32-NEXT: mtlr r0
+; PPC32-NEXT: blr
+;
+; PPC64-LABEL: load_i64_seq_cst:
+; PPC64: # %bb.0:
+; PPC64-NEXT: sync
+; PPC64-NEXT: ld r3, 0(r3)
+; PPC64-NEXT: cmpd cr7, r3, r3
+; PPC64-NEXT: bne- cr7, .+4
+; PPC64-NEXT: isync
+; PPC64-NEXT: blr
%val = load atomic i64, i64* %mem seq_cst, align 8
; CHECK-PPC32: lwsync
; CHECK-PPC64: cmpw [[CR:cr[0-9]+]], [[VAL]], [[VAL]]
@@ -48,95 +79,401 @@ define i64 @load_i64_seq_cst(i64* %mem) {
; Stores
define void @store_i8_unordered(i8* %mem) {
-; CHECK-LABEL: store_i8_unordered
-; CHECK-NOT: sync
-; CHECK: stb
+; CHECK-LABEL: store_i8_unordered:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li r4, 42
+; CHECK-NEXT: stb r4, 0(r3)
+; CHECK-NEXT: blr
store atomic i8 42, i8* %mem unordered, align 1
ret void
}
define void @store_i16_monotonic(i16* %mem) {
-; CHECK-LABEL: store_i16_monotonic
-; CHECK-NOT: sync
-; CHECK: sth
+; CHECK-LABEL: store_i16_monotonic:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li r4, 42
+; CHECK-NEXT: sth r4, 0(r3)
+; CHECK-NEXT: blr
store atomic i16 42, i16* %mem monotonic, align 2
ret void
}
define void @store_i32_release(i32* %mem) {
-; CHECK-LABEL: store_i32_release
-; CHECK: lwsync
-; CHECK: stw
+; CHECK-LABEL: store_i32_release:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li r4, 42
+; CHECK-NEXT: lwsync
+; CHECK-NEXT: stw r4, 0(r3)
+; CHECK-NEXT: blr
store atomic i32 42, i32* %mem release, align 4
ret void
}
define void @store_i64_seq_cst(i64* %mem) {
-; CHECK-LABEL: store_i64_seq_cst
-; CHECK: sync
-; PPC32: __sync_
-; PPC64-NOT: __sync_
-; PPC64: std
+; PPC32-LABEL: store_i64_seq_cst:
+; PPC32: # %bb.0:
+; PPC32-NEXT: mflr r0
+; PPC32-NEXT: stw r0, 4(r1)
+; PPC32-NEXT: stwu r1, -16(r1)
+; PPC32-NEXT: .cfi_def_cfa_offset 16
+; PPC32-NEXT: .cfi_offset lr, 4
+; PPC32-NEXT: li r5, 0
+; PPC32-NEXT: li r6, 42
+; PPC32-NEXT: li r7, 5
+; PPC32-NEXT: bl __atomic_store_8
+; PPC32-NEXT: lwz r0, 20(r1)
+; PPC32-NEXT: addi r1, r1, 16
+; PPC32-NEXT: mtlr r0
+; PPC32-NEXT: blr
+;
+; PPC64-LABEL: store_i64_seq_cst:
+; PPC64: # %bb.0:
+; PPC64-NEXT: li r4, 42
+; PPC64-NEXT: sync
+; PPC64-NEXT: std r4, 0(r3)
+; PPC64-NEXT: blr
store atomic i64 42, i64* %mem seq_cst, align 8
ret void
}
; Atomic CmpXchg
define i8 @cas_strong_i8_sc_sc(i8* %mem) {
-; CHECK-LABEL: cas_strong_i8_sc_sc
-; CHECK: sync
+; PPC32-LABEL: cas_strong_i8_sc_sc:
+; PPC32: # %bb.0:
+; PPC32-NEXT: rlwinm r8, r3, 3, 27, 28
+; PPC32-NEXT: li r5, 1
+; PPC32-NEXT: li r6, 0
+; PPC32-NEXT: li r7, 255
+; PPC32-NEXT: rlwinm r4, r3, 0, 0, 29
+; PPC32-NEXT: xori r3, r8, 24
+; PPC32-NEXT: slw r5, r5, r3
+; PPC32-NEXT: slw r8, r6, r3
+; PPC32-NEXT: slw r6, r7, r3
+; PPC32-NEXT: and r7, r5, r6
+; PPC32-NEXT: and r8, r8, r6
+; PPC32-NEXT: sync
+; PPC32-NEXT: .LBB8_1:
+; PPC32-NEXT: lwarx r9, 0, r4
+; PPC32-NEXT: and r5, r9, r6
+; PPC32-NEXT: cmpw r5, r8
+; PPC32-NEXT: bne cr0, .LBB8_3
+; PPC32-NEXT: # %bb.2:
+; PPC32-NEXT: andc r9, r9, r6
+; PPC32-NEXT: or r9, r9, r7
+; PPC32-NEXT: stwcx. r9, 0, r4
+; PPC32-NEXT: bne cr0, .LBB8_1
+; PPC32-NEXT: b .LBB8_4
+; PPC32-NEXT: .LBB8_3:
+; PPC32-NEXT: stwcx. r9, 0, r4
+; PPC32-NEXT: .LBB8_4:
+; PPC32-NEXT: srw r3, r5, r3
+; PPC32-NEXT: lwsync
+; PPC32-NEXT: blr
+;
+; PPC64-LABEL: cas_strong_i8_sc_sc:
+; PPC64: # %bb.0:
+; PPC64-NEXT: rlwinm r8, r3, 3, 27, 28
+; PPC64-NEXT: li r5, 1
+; PPC64-NEXT: li r6, 0
+; PPC64-NEXT: li r7, 255
+; PPC64-NEXT: rldicr r4, r3, 0, 61
+; PPC64-NEXT: xori r3, r8, 24
+; PPC64-NEXT: slw r5, r5, r3
+; PPC64-NEXT: slw r8, r6, r3
+; PPC64-NEXT: slw r6, r7, r3
+; PPC64-NEXT: and r7, r5, r6
+; PPC64-NEXT: and r8, r8, r6
+; PPC64-NEXT: sync
+; PPC64-NEXT: .LBB8_1:
+; PPC64-NEXT: lwarx r9, 0, r4
+; PPC64-NEXT: and r5, r9, r6
+; PPC64-NEXT: cmpw r5, r8
+; PPC64-NEXT: bne cr0, .LBB8_3
+; PPC64-NEXT: # %bb.2:
+; PPC64-NEXT: andc r9, r9, r6
+; PPC64-NEXT: or r9, r9, r7
+; PPC64-NEXT: stwcx. r9, 0, r4
+; PPC64-NEXT: bne cr0, .LBB8_1
+; PPC64-NEXT: b .LBB8_4
+; PPC64-NEXT: .LBB8_3:
+; PPC64-NEXT: stwcx. r9, 0, r4
+; PPC64-NEXT: .LBB8_4:
+; PPC64-NEXT: srw r3, r5, r3
+; PPC64-NEXT: lwsync
+; PPC64-NEXT: blr
%val = cmpxchg i8* %mem, i8 0, i8 1 seq_cst seq_cst
-; CHECK: lwsync
%loaded = extractvalue { i8, i1} %val, 0
ret i8 %loaded
}
define i16 @cas_weak_i16_acquire_acquire(i16* %mem) {
-; CHECK-LABEL: cas_weak_i16_acquire_acquire
-;CHECK-NOT: sync
+; PPC32-LABEL: cas_weak_i16_acquire_acquire:
+; PPC32: # %bb.0:
+; PPC32-NEXT: li r6, 0
+; PPC32-NEXT: rlwinm r4, r3, 3, 27, 27
+; PPC32-NEXT: li r5, 1
+; PPC32-NEXT: ori r7, r6, 65535
+; PPC32-NEXT: xori r4, r4, 16
+; PPC32-NEXT: slw r8, r5, r4
+; PPC32-NEXT: slw r9, r6, r4
+; PPC32-NEXT: slw r5, r7, r4
+; PPC32-NEXT: rlwinm r3, r3, 0, 0, 29
+; PPC32-NEXT: and r6, r8, r5
+; PPC32-NEXT: and r8, r9, r5
+; PPC32-NEXT: .LBB9_1:
+; PPC32-NEXT: lwarx r9, 0, r3
+; PPC32-NEXT: and r7, r9, r5
+; PPC32-NEXT: cmpw r7, r8
+; PPC32-NEXT: bne cr0, .LBB9_3
+; PPC32-NEXT: # %bb.2:
+; PPC32-NEXT: andc r9, r9, r5
+; PPC32-NEXT: or r9, r9, r6
+; PPC32-NEXT: stwcx. r9, 0, r3
+; PPC32-NEXT: bne cr0, .LBB9_1
+; PPC32-NEXT: b .LBB9_4
+; PPC32-NEXT: .LBB9_3:
+; PPC32-NEXT: stwcx. r9, 0, r3
+; PPC32-NEXT: .LBB9_4:
+; PPC32-NEXT: srw r3, r7, r4
+; PPC32-NEXT: lwsync
+; PPC32-NEXT: blr
+;
+; PPC64-LABEL: cas_weak_i16_acquire_acquire:
+; PPC64: # %bb.0:
+; PPC64-NEXT: li r6, 0
+; PPC64-NEXT: rlwinm r4, r3, 3, 27, 27
+; PPC64-NEXT: li r5, 1
+; PPC64-NEXT: ori r7, r6, 65535
+; PPC64-NEXT: xori r4, r4, 16
+; PPC64-NEXT: slw r8, r5, r4
+; PPC64-NEXT: slw r9, r6, r4
+; PPC64-NEXT: slw r5, r7, r4
+; PPC64-NEXT: rldicr r3, r3, 0, 61
+; PPC64-NEXT: and r6, r8, r5
+; PPC64-NEXT: and r8, r9, r5
+; PPC64-NEXT: .LBB9_1:
+; PPC64-NEXT: lwarx r9, 0, r3
+; PPC64-NEXT: and r7, r9, r5
+; PPC64-NEXT: cmpw r7, r8
+; PPC64-NEXT: bne cr0, .LBB9_3
+; PPC64-NEXT: # %bb.2:
+; PPC64-NEXT: andc r9, r9, r5
+; PPC64-NEXT: or r9, r9, r6
+; PPC64-NEXT: stwcx. r9, 0, r3
+; PPC64-NEXT: bne cr0, .LBB9_1
+; PPC64-NEXT: b .LBB9_4
+; PPC64-NEXT: .LBB9_3:
+; PPC64-NEXT: stwcx. r9, 0, r3
+; PPC64-NEXT: .LBB9_4:
+; PPC64-NEXT: srw r3, r7, r4
+; PPC64-NEXT: lwsync
+; PPC64-NEXT: blr
%val = cmpxchg weak i16* %mem, i16 0, i16 1 acquire acquire
-; CHECK: lwsync
%loaded = extractvalue { i16, i1} %val, 0
ret i16 %loaded
}
define i32 @cas_strong_i32_acqrel_acquire(i32* %mem) {
-; CHECK-LABEL: cas_strong_i32_acqrel_acquire
-; CHECK: lwsync
+; CHECK-LABEL: cas_strong_i32_acqrel_acquire:
+; CHECK: # %bb.0:
+; CHECK-NEXT: li r5, 1
+; CHECK-NEXT: li r6, 0
+; CHECK-NEXT: lwsync
+; CHECK-NEXT: .LBB10_1:
+; CHECK-NEXT: lwarx r4, 0, r3
+; CHECK-NEXT: cmpw r6, r4
+; CHECK-NEXT: bne cr0, .LBB10_3
+; CHECK-NEXT: # %bb.2:
+; CHECK-NEXT: stwcx. r5, 0, r3
+; CHECK-NEXT: bne cr0, .LBB10_1
+; CHECK-NEXT: b .LBB10_4
+; CHECK-NEXT: .LBB10_3:
+; CHECK-NEXT: stwcx. r4, 0, r3
+; CHECK-NEXT: .LBB10_4:
+; CHECK-NEXT: mr r3, r4
+; CHECK-NEXT: lwsync
+; CHECK-NEXT: blr
%val = cmpxchg i32* %mem, i32 0, i32 1 acq_rel acquire
-; CHECK: lwsync
%loaded = extractvalue { i32, i1} %val, 0
ret i32 %loaded
}
define i64 @cas_weak_i64_release_monotonic(i64* %mem) {
-; CHECK-LABEL: cas_weak_i64_release_monotonic
-; CHECK: lwsync
+; PPC32-LABEL: cas_weak_i64_release_monotonic:
+; PPC32: # %bb.0:
+; PPC32-NEXT: mflr r0
+; PPC32-NEXT: stw r0, 4(r1)
+; PPC32-NEXT: stwu r1, -16(r1)
+; PPC32-NEXT: .cfi_def_cfa_offset 16
+; PPC32-NEXT: .cfi_offset lr, 4
+; PPC32-NEXT: li r4, 0
+; PPC32-NEXT: stw r4, 12(r1)
+; PPC32-NEXT: li r5, 0
+; PPC32-NEXT: stw r4, 8(r1)
+; PPC32-NEXT: addi r4, r1, 8
+; PPC32-NEXT: li r6, 1
+; PPC32-NEXT: li r7, 3
+; PPC32-NEXT: li r8, 0
+; PPC32-NEXT: bl __atomic_compare_exchange_8
+; PPC32-NEXT: lwz r4, 12(r1)
+; PPC32-NEXT: lwz r3, 8(r1)
+; PPC32-NEXT: lwz r0, 20(r1)
+; PPC32-NEXT: addi r1, r1, 16
+; PPC32-NEXT: mtlr r0
+; PPC32-NEXT: blr
+;
+; PPC64-LABEL: cas_weak_i64_release_monotonic:
+; PPC64: # %bb.0:
+; PPC64-NEXT: li r5, 1
+; PPC64-NEXT: li r6, 0
+; PPC64-NEXT: lwsync
+; PPC64-NEXT: .LBB11_1:
+; PPC64-NEXT: ldarx r4, 0, r3
+; PPC64-NEXT: cmpd r6, r4
+; PPC64-NEXT: bne cr0, .LBB11_4
+; PPC64-NEXT: # %bb.2:
+; PPC64-NEXT: stdcx. r5, 0, r3
+; PPC64-NEXT: bne cr0, .LBB11_1
+; PPC64-NEXT: # %bb.3:
+; PPC64-NEXT: mr r3, r4
+; PPC64-NEXT: blr
+; PPC64-NEXT: .LBB11_4:
+; PPC64-NEXT: stdcx. r4, 0, r3
+; PPC64-NEXT: mr r3, r4
+; PPC64-NEXT: blr
%val = cmpxchg weak i64* %mem, i64 0, i64 1 release monotonic
-; CHECK-NOT: [sync ]
%loaded = extractvalue { i64, i1} %val, 0
ret i64 %loaded
}
; AtomicRMW
define i8 @add_i8_monotonic(i8* %mem, i8 %operand) {
-; CHECK-LABEL: add_i8_monotonic
-; CHECK-NOT: sync
+; PPC32-LABEL: add_i8_monotonic:
+; PPC32: # %bb.0:
+; PPC32-NEXT: rlwinm r7, r3, 3, 27, 28
+; PPC32-NEXT: li r6, 255
+; PPC32-NEXT: rlwinm r5, r3, 0, 0, 29
+; PPC32-NEXT: xori r3, r7, 24
+; PPC32-NEXT: slw r4, r4, r3
+; PPC32-NEXT: slw r6, r6, r3
+; PPC32-NEXT: .LBB12_1:
+; PPC32-NEXT: lwarx r7, 0, r5
+; PPC32-NEXT: add r8, r4, r7
+; PPC32-NEXT: andc r9, r7, r6
+; PPC32-NEXT: and r8, r8, r6
+; PPC32-NEXT: or r8, r8, r9
+; PPC32-NEXT: stwcx. r8, 0, r5
+; PPC32-NEXT: bne cr0, .LBB12_1
+; PPC32-NEXT: # %bb.2:
+; PPC32-NEXT: srw r3, r7, r3
+; PPC32-NEXT: blr
+;
+; PPC64-LABEL: add_i8_monotonic:
+; PPC64: # %bb.0:
+; PPC64-NEXT: rlwinm r7, r3, 3, 27, 28
+; PPC64-NEXT: li r6, 255
+; PPC64-NEXT: rldicr r5, r3, 0, 61
+; PPC64-NEXT: xori r3, r7, 24
+; PPC64-NEXT: slw r4, r4, r3
+; PPC64-NEXT: slw r6, r6, r3
+; PPC64-NEXT: .LBB12_1:
+; PPC64-NEXT: lwarx r7, 0, r5
+; PPC64-NEXT: add r8, r4, r7
+; PPC64-NEXT: andc r9, r7, r6
+; PPC64-NEXT: and r8, r8, r6
+; PPC64-NEXT: or r8, r8, r9
+; PPC64-NEXT: stwcx. r8, 0, r5
+; PPC64-NEXT: bne cr0, .LBB12_1
+; PPC64-NEXT: # %bb.2:
+; PPC64-NEXT: srw r3, r7, r3
+; PPC64-NEXT: blr
%val = atomicrmw add i8* %mem, i8 %operand monotonic
ret i8 %val
}
define i16 @xor_i16_seq_cst(i16* %mem, i16 %operand) {
-; CHECK-LABEL: xor_i16_seq_cst
-; CHECK: sync
+; PPC32-LABEL: xor_i16_seq_cst:
+; PPC32: # %bb.0:
+; PPC32-NEXT: li r6, 0
+; PPC32-NEXT: rlwinm r7, r3, 3, 27, 27
+; PPC32-NEXT: rlwinm r5, r3, 0, 0, 29
+; PPC32-NEXT: ori r6, r6, 65535
+; PPC32-NEXT: xori r3, r7, 16
+; PPC32-NEXT: slw r4, r4, r3
+; PPC32-NEXT: slw r6, r6, r3
+; PPC32-NEXT: sync
+; PPC32-NEXT: .LBB13_1:
+; PPC32-NEXT: lwarx r7, 0, r5
+; PPC32-NEXT: xor r8, r4, r7
+; PPC32-NEXT: andc r9, r7, r6
+; PPC32-NEXT: and r8, r8, r6
+; PPC32-NEXT: or r8, r8, r9
+; PPC32-NEXT: stwcx. r8, 0, r5
+; PPC32-NEXT: bne cr0, .LBB13_1
+; PPC32-NEXT: # %bb.2:
+; PPC32-NEXT: srw r3, r7, r3
+; PPC32-NEXT: lwsync
+; PPC32-NEXT: blr
+;
+; PPC64-LABEL: xor_i16_seq_cst:
+; PPC64: # %bb.0:
+; PPC64-NEXT: li r6, 0
+; PPC64-NEXT: rlwinm r7, r3, 3, 27, 27
+; PPC64-NEXT: rldicr r5, r3, 0, 61
+; PPC64-NEXT: ori r6, r6, 65535
+; PPC64-NEXT: xori r3, r7, 16
+; PPC64-NEXT: slw r4, r4, r3
+; PPC64-NEXT: slw r6, r6, r3
+; PPC64-NEXT: sync
+; PPC64-NEXT: .LBB13_1:
+; PPC64-NEXT: lwarx r7, 0, r5
+; PPC64-NEXT: xor r8, r4, r7
+; PPC64-NEXT: andc r9, r7, r6
+; PPC64-NEXT: and r8, r8, r6
+; PPC64-NEXT: or r8, r8, r9
+; PPC64-NEXT: stwcx. r8, 0, r5
+; PPC64-NEXT: bne cr0, .LBB13_1
+; PPC64-NEXT: # %bb.2:
+; PPC64-NEXT: srw r3, r7, r3
+; PPC64-NEXT: lwsync
+; PPC64-NEXT: blr
%val = atomicrmw xor i16* %mem, i16 %operand seq_cst
-; CHECK: lwsync
ret i16 %val
}
define i32 @xchg_i32_acq_rel(i32* %mem, i32 %operand) {
-; CHECK-LABEL: xchg_i32_acq_rel
-; CHECK: lwsync
+; CHECK-LABEL: xchg_i32_acq_rel:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lwsync
+; CHECK-NEXT: .LBB14_1:
+; CHECK-NEXT: lwarx r5, 0, r3
+; CHECK-NEXT: stwcx. r4, 0, r3
+; CHECK-NEXT: bne cr0, .LBB14_1
+; CHECK-NEXT: # %bb.2:
+; CHECK-NEXT: mr r3, r5
+; CHECK-NEXT: lwsync
+; CHECK-NEXT: blr
%val = atomicrmw xchg i32* %mem, i32 %operand acq_rel
-; CHECK: lwsync
ret i32 %val
}
define i64 @and_i64_release(i64* %mem, i64 %operand) {
-; CHECK-LABEL: and_i64_release
-; CHECK: lwsync
+; PPC32-LABEL: and_i64_release:
+; PPC32: # %bb.0:
+; PPC32-NEXT: mflr r0
+; PPC32-NEXT: stw r0, 4(r1)
+; PPC32-NEXT: stwu r1, -16(r1)
+; PPC32-NEXT: .cfi_def_cfa_offset 16
+; PPC32-NEXT: .cfi_offset lr, 4
+; PPC32-NEXT: li r7, 3
+; PPC32-NEXT: bl __atomic_fetch_and_8
+; PPC32-NEXT: lwz r0, 20(r1)
+; PPC32-NEXT: addi r1, r1, 16
+; PPC32-NEXT: mtlr r0
+; PPC32-NEXT: blr
+;
+; PPC64-LABEL: and_i64_release:
+; PPC64: # %bb.0:
+; PPC64-NEXT: lwsync
+; PPC64-NEXT: .LBB15_1:
+; PPC64-NEXT: ldarx r5, 0, r3
+; PPC64-NEXT: and r6, r4, r5
+; PPC64-NEXT: stdcx. r6, 0, r3
+; PPC64-NEXT: bne cr0, .LBB15_1
+; PPC64-NEXT: # %bb.2:
+; PPC64-NEXT: mr r3, r5
+; PPC64-NEXT: blr
%val = atomicrmw and i64* %mem, i64 %operand release
-; CHECK-NOT: [sync ]
ret i64 %val
}
diff --git a/llvm/test/CodeGen/PowerPC/fneg.ll b/llvm/test/CodeGen/PowerPC/fneg.ll
index 328ffecd17624..aea34e216d644 100644
--- a/llvm/test/CodeGen/PowerPC/fneg.ll
+++ b/llvm/test/CodeGen/PowerPC/fneg.ll
@@ -39,3 +39,20 @@ define float @fma_fneg_fsub(float %x, float %y0, float %y1, float %z) {
%r = call float @llvm.fmuladd.f32(float %negx, float %negy, float %z)
ret float %r
}
+
+; Verify that we didn't hit an assertion for this case.
+define double @fneg_no_ice(float %x) {
+; CHECK-LABEL: fneg_no_ice:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lis r3, .LCPI3_0@ha
+; CHECK-NEXT: lfs f0, .LCPI3_0@l(r3)
+; CHECK-NEXT: fsubs f0, f0, f1
+; CHECK-NEXT: fmul f1, f0, f0
+; CHECK-NEXT: fmul f1, f0, f1
+; CHECK-NEXT: blr
+ %y = fsub fast float 1.0, %x
+ %e = fpext float %y to double
+ %e2 = fmul double %e, %e
+ %e3 = fmul double %e, %e2
+ ret double %e3
+}
diff --git a/llvm/test/CodeGen/PowerPC/jump-tables-collapse-rotate-remove-SrcMI.mir b/llvm/test/CodeGen/PowerPC/jump-tables-collapse-rotate-remove-SrcMI.mir
index 7c14e7750df90..2f7a85a111ebb 100644
--- a/llvm/test/CodeGen/PowerPC/jump-tables-collapse-rotate-remove-SrcMI.mir
+++ b/llvm/test/CodeGen/PowerPC/jump-tables-collapse-rotate-remove-SrcMI.mir
@@ -51,4 +51,4 @@ body: |
#
# CHECK-PASS-NOT: %2:g8rc = RLDICL killed %1, 0, 32
# CHECK-PASS-NOT: %3:g8rc = RLDICR %2, 2, 61
-# CHECK-PASS: %3:g8rc = RLDIC %1, 2, 30
+# CHECK-PASS: %3:g8rc = RLDIC killed %1, 2, 30
diff --git a/llvm/test/CodeGen/PowerPC/mi-peephole.mir b/llvm/test/CodeGen/PowerPC/mi-peephole.mir
index 8bf72461d5453..c7f41cd0bc4c9 100644
--- a/llvm/test/CodeGen/PowerPC/mi-peephole.mir
+++ b/llvm/test/CodeGen/PowerPC/mi-peephole.mir
@@ -31,7 +31,7 @@ body: |
; CHECK: bb.0.entry:
; CHECK: %1:g8rc = COPY $x4
; CHECK: %0:g8rc = COPY $x3
- ; CHECK: %3:g8rc = RLDIC %1, 2, 30
+ ; CHECK: %3:g8rc = RLDIC killed %1, 2, 30
; CHECK: $x3 = COPY %3
; CHECK: BLR8 implicit $lr8, implicit $rm, implicit $x3
...
diff --git a/llvm/test/CodeGen/PowerPC/popcount.ll b/llvm/test/CodeGen/PowerPC/popcount.ll
index fb20f1d3ee43b..170d3d77d0886 100644
--- a/llvm/test/CodeGen/PowerPC/popcount.ll
+++ b/llvm/test/CodeGen/PowerPC/popcount.ll
@@ -58,17 +58,17 @@ define <1 x i128> @popcount1x128(<1 x i128> %0) {
; CHECK-NEXT: # kill: def $f0 killed $f0 killed $vsl0
; CHECK-NEXT: mffprd 3, 0
; CHECK-NEXT: popcntd 3, 3
-; CHECK-NEXT: xxswapd 0, 34
-; CHECK-NEXT: # kill: def $f0 killed $f0 killed $vsl0
-; CHECK-NEXT: mffprd 4, 0
+; CHECK-NEXT: xxswapd 1, 34
+; CHECK-NEXT: # kill: def $f1 killed $f1 killed $vsl1
+; CHECK-NEXT: mffprd 4, 1
; CHECK-NEXT: popcntd 4, 4
; CHECK-NEXT: add 3, 4, 3
; CHECK-NEXT: mtfprd 0, 3
-; CHECK-NEXT: # kill: def $vsl0 killed $f0
+; CHECK-NEXT: fmr 2, 0
; CHECK-NEXT: li 3, 0
-; CHECK-NEXT: mtfprd 1, 3
-; CHECK-NEXT: # kill: def $vsl1 killed $f1
-; CHECK-NEXT: xxmrghd 34, 1, 0
+; CHECK-NEXT: mtfprd 0, 3
+; CHECK-NEXT: fmr 3, 0
+; CHECK-NEXT: xxmrghd 34, 3, 2
; CHECK-NEXT: blr
Entry:
%1 = tail call <1 x i128> @llvm.ctpop.v1.i128(<1 x i128> %0)
diff --git a/llvm/test/CodeGen/PowerPC/pr46923.ll b/llvm/test/CodeGen/PowerPC/pr46923.ll
new file mode 100644
index 0000000000000..3e9faa60422af
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/pr46923.ll
@@ -0,0 +1,29 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-unknown \
+; RUN: -ppc-asm-full-reg-names < %s | FileCheck %s
+
+@bar = external constant i64, align 8
+
+define i1 @foo() {
+; CHECK-LABEL: foo:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: li r3, 0
+; CHECK-NEXT: isel r3, 0, r3, 4*cr5+lt
+; CHECK-NEXT: blr
+entry:
+ br label %next
+
+next:
+ br i1 undef, label %true, label %false
+
+true:
+ br label %end
+
+false:
+ br label %end
+
+end:
+ %a = phi i1 [ icmp ugt (i64 0, i64 ptrtoint (i64* @bar to i64)), %true ],
+ [ icmp ugt (i64 0, i64 2), %false ]
+ ret i1 %a
+}
diff --git a/llvm/test/CodeGen/PowerPC/pr47373.ll b/llvm/test/CodeGen/PowerPC/pr47373.ll
new file mode 100644
index 0000000000000..d09a5fe8fb0b6
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/pr47373.ll
@@ -0,0 +1,180 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=powerpc64-unknown-freebsd13.0 -verify-machineinstrs \
+; RUN: -mcpu=ppc64 -ppc-asm-full-reg-names < %s | FileCheck %s
+@a = local_unnamed_addr global float* null, align 8
+
+; Function Attrs: nounwind
+define void @d() local_unnamed_addr #0 {
+; CHECK-LABEL: d:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: mflr r0
+; CHECK-NEXT: std r0, 16(r1)
+; CHECK-NEXT: stdu r1, -208(r1)
+; CHECK-NEXT: addis r3, r2, .LC0@toc@ha
+; CHECK-NEXT: std r29, 184(r1) # 8-byte Folded Spill
+; CHECK-NEXT: ld r3, .LC0@toc@l(r3)
+; CHECK-NEXT: std r30, 192(r1) # 8-byte Folded Spill
+; CHECK-NEXT: ld r29, 0(r3)
+; CHECK-NEXT: bl c
+; CHECK-NEXT: nop
+; CHECK-NEXT: mr r30, r3
+; CHECK-NEXT: bl b
+; CHECK-NEXT: nop
+; CHECK-NEXT: cmpwi r30, 1
+; CHECK-NEXT: blt cr0, .LBB0_9
+; CHECK-NEXT: # %bb.1: # %for.body.preheader
+; CHECK-NEXT: cmplwi r30, 4
+; CHECK-NEXT: clrldi r4, r30, 32
+; CHECK-NEXT: li r5, 0
+; CHECK-NEXT: blt cr0, .LBB0_7
+; CHECK-NEXT: # %bb.2: # %vector.memcheck
+; CHECK-NEXT: rldic r6, r30, 2, 30
+; CHECK-NEXT: add r7, r3, r6
+; CHECK-NEXT: cmpld r29, r7
+; CHECK-NEXT: add r6, r29, r6
+; CHECK-NEXT: bc 4, lt, .LBB0_4
+; CHECK-NEXT: # %bb.3: # %vector.memcheck
+; CHECK-NEXT: cmpld r3, r6
+; CHECK-NEXT: bc 12, lt, .LBB0_7
+; CHECK-NEXT: .LBB0_4: # %vector.ph
+; CHECK-NEXT: rlwinm r5, r4, 0, 0, 29
+; CHECK-NEXT: li r7, 15
+; CHECK-NEXT: addi r6, r5, -4
+; CHECK-NEXT: addi r8, r1, 144
+; CHECK-NEXT: rldicl r6, r6, 62, 2
+; CHECK-NEXT: addi r9, r1, 128
+; CHECK-NEXT: addi r6, r6, 1
+; CHECK-NEXT: addi r10, r1, 160
+; CHECK-NEXT: mtctr r6
+; CHECK-NEXT: li r6, 0
+; CHECK-NEXT: addi r11, r1, 112
+; CHECK-NEXT: .LBB0_5: # %vector.body
+; CHECK-NEXT: #
+; CHECK-NEXT: add r12, r3, r6
+; CHECK-NEXT: lvx v3, r3, r6
+; CHECK-NEXT: lvx v5, r12, r7
+; CHECK-NEXT: add r12, r29, r6
+; CHECK-NEXT: lvsl v2, r3, r6
+; CHECK-NEXT: vperm v2, v3, v5, v2
+; CHECK-NEXT: lvx v3, r29, r6
+; CHECK-NEXT: lvx v5, r12, r7
+; CHECK-NEXT: lvsl v4, r29, r6
+; CHECK-NEXT: stvx v2, 0, r8
+; CHECK-NEXT: vperm v2, v3, v5, v4
+; CHECK-NEXT: stvx v2, 0, r9
+; CHECK-NEXT: lfs f0, 156(r1)
+; CHECK-NEXT: lfs f1, 140(r1)
+; CHECK-NEXT: fdivs f0, f1, f0
+; CHECK-NEXT: lfs f1, 136(r1)
+; CHECK-NEXT: stfs f0, 172(r1)
+; CHECK-NEXT: lfs f0, 152(r1)
+; CHECK-NEXT: fdivs f0, f1, f0
+; CHECK-NEXT: lfs f1, 132(r1)
+; CHECK-NEXT: stfs f0, 168(r1)
+; CHECK-NEXT: lfs f0, 148(r1)
+; CHECK-NEXT: fdivs f0, f1, f0
+; CHECK-NEXT: lfs f1, 128(r1)
+; CHECK-NEXT: stfs f0, 164(r1)
+; CHECK-NEXT: lfs f0, 144(r1)
+; CHECK-NEXT: fdivs f0, f1, f0
+; CHECK-NEXT: stfs f0, 160(r1)
+; CHECK-NEXT: lvx v2, 0, r10
+; CHECK-NEXT: stvx v2, 0, r11
+; CHECK-NEXT: ld r0, 112(r1)
+; CHECK-NEXT: stdx r0, r29, r6
+; CHECK-NEXT: addi r6, r6, 16
+; CHECK-NEXT: ld r0, 120(r1)
+; CHECK-NEXT: std r0, 8(r12)
+; CHECK-NEXT: bdnz .LBB0_5
+; CHECK-NEXT: # %bb.6: # %middle.block
+; CHECK-NEXT: cmpld r5, r4
+; CHECK-NEXT: beq cr0, .LBB0_9
+; CHECK-NEXT: .LBB0_7: # %for.body.preheader18
+; CHECK-NEXT: sldi r6, r5, 2
+; CHECK-NEXT: sub r5, r4, r5
+; CHECK-NEXT: addi r6, r6, -4
+; CHECK-NEXT: add r3, r3, r6
+; CHECK-NEXT: add r4, r29, r6
+; CHECK-NEXT: mtctr r5
+; CHECK-NEXT: .LBB0_8: # %for.body
+; CHECK-NEXT: #
+; CHECK-NEXT: lfsu f0, 4(r4)
+; CHECK-NEXT: lfsu f1, 4(r3)
+; CHECK-NEXT: fdivs f0, f0, f1
+; CHECK-NEXT: stfs f0, 0(r4)
+; CHECK-NEXT: bdnz .LBB0_8
+; CHECK-NEXT: .LBB0_9: # %for.end
+; CHECK-NEXT: ld r30, 192(r1) # 8-byte Folded Reload
+; CHECK-NEXT: ld r29, 184(r1) # 8-byte Folded Reload
+; CHECK-NEXT: addi r1, r1, 208
+; CHECK-NEXT: ld r0, 16(r1)
+; CHECK-NEXT: mtlr r0
+; CHECK-NEXT: blr
+entry:
+ %0 = load float*, float** @a, align 8
+ %call = call signext i32 bitcast (i32 (...)* @c to i32 ()*)() #2
+ %call1 = call float* bitcast (float* (...)* @b to float* ()*)() #2
+ %cmp11 = icmp sgt i32 %call, 0
+ br i1 %cmp11, label %for.body.preheader, label %for.end
+
+for.body.preheader: ; preds = %entry
+ %wide.trip.count = zext i32 %call to i64
+ %min.iters.check = icmp ult i32 %call, 4
+ br i1 %min.iters.check, label %for.body.preheader18, label %vector.memcheck
+
+vector.memcheck: ; preds = %for.body.preheader
+ %scevgep = getelementptr float, float* %0, i64 %wide.trip.count
+ %scevgep15 = getelementptr float, float* %call1, i64 %wide.trip.count
+ %bound0 = icmp ult float* %0, %scevgep15
+ %bound1 = icmp ult float* %call1, %scevgep
+ %found.conflict = and i1 %bound0, %bound1
+ br i1 %found.conflict, label %for.body.preheader18, label %vector.ph
+
+vector.ph: ; preds = %vector.memcheck
+ %n.vec = and i64 %wide.trip.count, 4294967292
+ br label %vector.body
+
+vector.body: ; preds = %vector.body, %vector.ph
+ %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
+ %1 = getelementptr inbounds float, float* %call1, i64 %index
+ %2 = bitcast float* %1 to <4 x float>*
+ %wide.load = load <4 x float>, <4 x float>* %2, align 4
+ %3 = getelementptr inbounds float, float* %0, i64 %index
+ %4 = bitcast float* %3 to <4 x float>*
+ %wide.load17 = load <4 x float>, <4 x float>* %4, align 4
+ %5 = fdiv reassoc nsz arcp afn <4 x float> %wide.load17, %wide.load
+ %6 = bitcast float* %3 to <4 x float>*
+ store <4 x float> %5, <4 x float>* %6, align 4
+ %index.next = add i64 %index, 4
+ %7 = icmp eq i64 %index.next, %n.vec
+ br i1 %7, label %middle.block, label %vector.body
+
+middle.block: ; preds = %vector.body
+ %cmp.n = icmp eq i64 %n.vec, %wide.trip.count
+ br i1 %cmp.n, label %for.end, label %for.body.preheader18
+
+for.body.preheader18: ; preds = %middle.block, %vector.memcheck, %for.body.preheader
+ %indvars.iv.ph = phi i64 [ 0, %vector.memcheck ], [ 0, %for.body.preheader ], [ %n.vec, %middle.block ]
+ br label %for.body
+
+for.body: ; preds = %for.body.preheader18, %for.body
+ %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ %indvars.iv.ph, %for.body.preheader18 ]
+ %arrayidx = getelementptr inbounds float, float* %call1, i64 %indvars.iv
+ %8 = load float, float* %arrayidx, align 4
+ %arrayidx3 = getelementptr inbounds float, float* %0, i64 %indvars.iv
+ %9 = load float, float* %arrayidx3, align 4
+ %div = fdiv reassoc nsz arcp afn float %9, %8
+ store float %div, float* %arrayidx3, align 4
+ %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+ %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
+ br i1 %exitcond.not, label %for.end, label %for.body
+
+for.end: ; preds = %for.body, %middle.block, %entry
+ ret void
+}
+
+declare signext i32 @c(...) local_unnamed_addr #1
+
+declare float* @b(...) local_unnamed_addr #1
+
+attributes #0 = { nounwind }
diff --git a/llvm/test/CodeGen/PowerPC/setcc-vector.ll b/llvm/test/CodeGen/PowerPC/setcc-vector.ll
new file mode 100644
index 0000000000000..5917ccabf84ed
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/setcc-vector.ll
@@ -0,0 +1,49 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -verify-machineinstrs -mcpu=pwr9 -mtriple=powerpc64le-unknown-unknown \
+; RUN: -ppc-asm-full-reg-names < %s | FileCheck -check-prefixes=CHECK-PWR9 %s
+; RUN: llc -verify-machineinstrs -mcpu=pwr8 -mtriple=powerpc64le-unknown-unknown \
+; RUN: -ppc-asm-full-reg-names < %s | FileCheck -check-prefixes=CHECK-PWR8 %s
+; RUN: llc -verify-machineinstrs -mcpu=pwr7 -mtriple=powerpc64-unknown-unknown \
+; RUN: -ppc-asm-full-reg-names < %s | FileCheck -check-prefixes=CHECK-PWR7 %s
+
+define <1 x i64> @setcc_v1i128(<1 x i128> %a) {
+; CHECK-PWR9-LABEL: setcc_v1i128:
+; CHECK-PWR9: # %bb.0: # %entry
+; CHECK-PWR9-NEXT: mfvsrld r3, vs34
+; CHECK-PWR9-NEXT: cmpldi r3, 35708
+; CHECK-PWR9-NEXT: mfvsrd r3, vs34
+; CHECK-PWR9-NEXT: cmpdi cr1, r3, 0
+; CHECK-PWR9-NEXT: li r3, 1
+; CHECK-PWR9-NEXT: crnand 4*cr5+lt, 4*cr1+eq, lt
+; CHECK-PWR9-NEXT: isel r3, 0, r3, 4*cr5+lt
+; CHECK-PWR9-NEXT: blr
+;
+; CHECK-PWR8-LABEL: setcc_v1i128:
+; CHECK-PWR8: # %bb.0: # %entry
+; CHECK-PWR8-NEXT: xxswapd vs0, vs34
+; CHECK-PWR8-NEXT: mfvsrd r3, vs34
+; CHECK-PWR8-NEXT: cmpdi r3, 0
+; CHECK-PWR8-NEXT: li r3, 1
+; CHECK-PWR8-NEXT: mffprd r4, f0
+; CHECK-PWR8-NEXT: cmpldi cr1, r4, 35708
+; CHECK-PWR8-NEXT: crnand 4*cr5+lt, eq, 4*cr1+lt
+; CHECK-PWR8-NEXT: isel r3, 0, r3, 4*cr5+lt
+; CHECK-PWR8-NEXT: blr
+;
+; CHECK-PWR7-LABEL: setcc_v1i128:
+; CHECK-PWR7: # %bb.0: # %entry
+; CHECK-PWR7-NEXT: li r5, 0
+; CHECK-PWR7-NEXT: cntlzd r3, r3
+; CHECK-PWR7-NEXT: ori r5, r5, 35708
+; CHECK-PWR7-NEXT: rldicl r3, r3, 58, 63
+; CHECK-PWR7-NEXT: subc r5, r4, r5
+; CHECK-PWR7-NEXT: subfe r4, r4, r4
+; CHECK-PWR7-NEXT: neg r4, r4
+; CHECK-PWR7-NEXT: and r3, r3, r4
+; CHECK-PWR7-NEXT: blr
+entry:
+  %0 = icmp ult <1 x i128> %a, <i128 35708>
+ %1 = zext <1 x i1> %0 to <1 x i64>
+ ret <1 x i64> %1
+}
+
diff --git a/llvm/test/CodeGen/PowerPC/vsx.ll b/llvm/test/CodeGen/PowerPC/vsx.ll
index 4a78218262ca0..39469d63b9078 100644
--- a/llvm/test/CodeGen/PowerPC/vsx.ll
+++ b/llvm/test/CodeGen/PowerPC/vsx.ll
@@ -1548,8 +1548,8 @@ define <2 x i64> @test46(<2 x float> %a) {
; CHECK-FISL-NEXT: ld r3, -24(r1)
; CHECK-FISL-NEXT: std r3, -16(r1)
; CHECK-FISL-NEXT: addi r3, r1, -16
-; CHECK-FISL-NEXT: lxvd2x vs0, 0, r3
-; CHECK-FISL-NEXT: xxlor v2, vs0, vs0
+; CHECK-FISL-NEXT: lxvd2x vs1, 0, r3
+; CHECK-FISL-NEXT: xxlor v2, vs1, vs1
; CHECK-FISL-NEXT: blr
;
; CHECK-LE-LABEL: test46:
@@ -1616,8 +1616,8 @@ define <2 x i64> @test47(<2 x float> %a) {
; CHECK-FISL-NEXT: ld r3, -24(r1)
; CHECK-FISL-NEXT: std r3, -16(r1)
; CHECK-FISL-NEXT: addi r3, r1, -16
-; CHECK-FISL-NEXT: lxvd2x vs0, 0, r3
-; CHECK-FISL-NEXT: xxlor v2, vs0, vs0
+; CHECK-FISL-NEXT: lxvd2x vs1, 0, r3
+; CHECK-FISL-NEXT: xxlor v2, vs1, vs1
; CHECK-FISL-NEXT: blr
;
; CHECK-LE-LABEL: test47:
@@ -1859,13 +1859,13 @@ define <2 x i64> @test60(<2 x i64> %a, <2 x i64> %b) {
; CHECK-FISL-NEXT: stxvd2x v3, 0, r3
; CHECK-FISL-NEXT: addi r3, r1, -48
; CHECK-FISL-NEXT: stxvd2x v2, 0, r3
-; CHECK-FISL-NEXT: lwz r3, -20(r1)
-; CHECK-FISL-NEXT: ld r4, -40(r1)
-; CHECK-FISL-NEXT: sld r3, r4, r3
+; CHECK-FISL-NEXT: lwz r4, -20(r1)
+; CHECK-FISL-NEXT: ld r3, -40(r1)
+; CHECK-FISL-NEXT: sld r3, r3, r4
; CHECK-FISL-NEXT: std r3, -8(r1)
-; CHECK-FISL-NEXT: lwz r3, -28(r1)
-; CHECK-FISL-NEXT: ld r4, -48(r1)
-; CHECK-FISL-NEXT: sld r3, r4, r3
+; CHECK-FISL-NEXT: lwz r4, -28(r1)
+; CHECK-FISL-NEXT: ld r3, -48(r1)
+; CHECK-FISL-NEXT: sld r3, r3, r4
; CHECK-FISL-NEXT: std r3, -16(r1)
; CHECK-FISL-NEXT: addi r3, r1, -16
; CHECK-FISL-NEXT: lxvd2x vs0, 0, r3
@@ -1925,13 +1925,13 @@ define <2 x i64> @test61(<2 x i64> %a, <2 x i64> %b) {
; CHECK-FISL-NEXT: stxvd2x v3, 0, r3
; CHECK-FISL-NEXT: addi r3, r1, -48
; CHECK-FISL-NEXT: stxvd2x v2, 0, r3
-; CHECK-FISL-NEXT: lwz r3, -20(r1)
-; CHECK-FISL-NEXT: ld r4, -40(r1)
-; CHECK-FISL-NEXT: srd r3, r4, r3
+; CHECK-FISL-NEXT: lwz r4, -20(r1)
+; CHECK-FISL-NEXT: ld r3, -40(r1)
+; CHECK-FISL-NEXT: srd r3, r3, r4
; CHECK-FISL-NEXT: std r3, -8(r1)
-; CHECK-FISL-NEXT: lwz r3, -28(r1)
-; CHECK-FISL-NEXT: ld r4, -48(r1)
-; CHECK-FISL-NEXT: srd r3, r4, r3
+; CHECK-FISL-NEXT: lwz r4, -28(r1)
+; CHECK-FISL-NEXT: ld r3, -48(r1)
+; CHECK-FISL-NEXT: srd r3, r3, r4
; CHECK-FISL-NEXT: std r3, -16(r1)
; CHECK-FISL-NEXT: addi r3, r1, -16
; CHECK-FISL-NEXT: lxvd2x vs0, 0, r3
@@ -1991,13 +1991,13 @@ define <2 x i64> @test62(<2 x i64> %a, <2 x i64> %b) {
; CHECK-FISL-NEXT: stxvd2x v3, 0, r3
; CHECK-FISL-NEXT: addi r3, r1, -48
; CHECK-FISL-NEXT: stxvd2x v2, 0, r3
-; CHECK-FISL-NEXT: lwz r3, -20(r1)
-; CHECK-FISL-NEXT: ld r4, -40(r1)
-; CHECK-FISL-NEXT: srad r3, r4, r3
+; CHECK-FISL-NEXT: lwz r4, -20(r1)
+; CHECK-FISL-NEXT: ld r3, -40(r1)
+; CHECK-FISL-NEXT: srad r3, r3, r4
; CHECK-FISL-NEXT: std r3, -8(r1)
-; CHECK-FISL-NEXT: lwz r3, -28(r1)
-; CHECK-FISL-NEXT: ld r4, -48(r1)
-; CHECK-FISL-NEXT: srad r3, r4, r3
+; CHECK-FISL-NEXT: lwz r4, -28(r1)
+; CHECK-FISL-NEXT: ld r3, -48(r1)
+; CHECK-FISL-NEXT: srad r3, r3, r4
; CHECK-FISL-NEXT: std r3, -16(r1)
; CHECK-FISL-NEXT: addi r3, r1, -16
; CHECK-FISL-NEXT: lxvd2x vs0, 0, r3
@@ -2426,12 +2426,12 @@ define <2 x i32> @test80(i32 %v) {
; CHECK-FISL: # %bb.0:
; CHECK-FISL-NEXT: # kill: def $r3 killed $r3 killed $x3
; CHECK-FISL-NEXT: stw r3, -16(r1)
-; CHECK-FISL-NEXT: addi r3, r1, -16
-; CHECK-FISL-NEXT: lxvw4x vs0, 0, r3
+; CHECK-FISL-NEXT: addi r4, r1, -16
+; CHECK-FISL-NEXT: lxvw4x vs0, 0, r4
; CHECK-FISL-NEXT: xxspltw v2, vs0, 0
-; CHECK-FISL-NEXT: addis r3, r2, .LCPI65_0@toc@ha
-; CHECK-FISL-NEXT: addi r3, r3, .LCPI65_0@toc@l
-; CHECK-FISL-NEXT: lxvw4x v3, 0, r3
+; CHECK-FISL-NEXT: addis r4, r2, .LCPI65_0@toc@ha
+; CHECK-FISL-NEXT: addi r4, r4, .LCPI65_0@toc@l
+; CHECK-FISL-NEXT: lxvw4x v3, 0, r4
; CHECK-FISL-NEXT: vadduwm v2, v2, v3
; CHECK-FISL-NEXT: blr
;
diff --git a/llvm/test/CodeGen/SPARC/fp16-promote.ll b/llvm/test/CodeGen/SPARC/fp16-promote.ll
index 0c402430dadc1..9709322f48a57 100644
--- a/llvm/test/CodeGen/SPARC/fp16-promote.ll
+++ b/llvm/test/CodeGen/SPARC/fp16-promote.ll
@@ -182,11 +182,11 @@ define void @test_fptrunc_double(double %d, half* %p) nounwind {
; V8-UNOPT-NEXT: std %i4, [%fp+-8]
; V8-UNOPT-NEXT: ldd [%fp+-8], %f0
; V8-UNOPT-NEXT: std %f0, [%fp+-16]
-; V8-UNOPT-NEXT: ldd [%fp+-16], %i0
-; V8-UNOPT-NEXT: mov %i0, %i3
-; V8-UNOPT-NEXT: ! kill: def $i1 killed $i1 killed $i0_i1
-; V8-UNOPT-NEXT: mov %i3, %o0
-; V8-UNOPT-NEXT: mov %i1, %o1
+; V8-UNOPT-NEXT: ldd [%fp+-16], %i4
+; V8-UNOPT-NEXT: mov %i4, %i0
+; V8-UNOPT-NEXT: ! kill: def $i5 killed $i5 killed $i4_i5
+; V8-UNOPT-NEXT: mov %i0, %o0
+; V8-UNOPT-NEXT: mov %i5, %o1
; V8-UNOPT-NEXT: call __truncdfhf2
; V8-UNOPT-NEXT: st %i2, [%fp+-20]
; V8-UNOPT-NEXT: ld [%fp+-20], %i0 ! 4-byte Folded Reload
diff --git a/llvm/test/CodeGen/WebAssembly/pr47375.ll b/llvm/test/CodeGen/WebAssembly/pr47375.ll
new file mode 100644
index 0000000000000..4c04631f26b11
--- /dev/null
+++ b/llvm/test/CodeGen/WebAssembly/pr47375.ll
@@ -0,0 +1,36 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s | FileCheck %s
+
+target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
+target triple = "wasm32-unknown-unknown"
+
+; Regression test for pr47375, in which an assertion was triggering
+; because WebAssemblyTargetLowering::isVectorLoadExtDesirable was
+; improperly assuming the use of simple value types.
+
+define void @sext_vec() {
+; CHECK-LABEL: sext_vec:
+; CHECK: .functype sext_vec () -> ()
+; CHECK-NEXT: .local i32
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: local.get 0
+; CHECK-NEXT: i32.load8_u 0
+; CHECK-NEXT: local.set 0
+; CHECK-NEXT: local.get 0
+; CHECK-NEXT: i32.const 0
+; CHECK-NEXT: i32.store8 0
+; CHECK-NEXT: local.get 0
+; CHECK-NEXT: local.get 0
+; CHECK-NEXT: local.get 0
+; CHECK-NEXT: i32.const 7
+; CHECK-NEXT: i32.shl
+; CHECK-NEXT: i32.or
+; CHECK-NEXT: i32.const 7175
+; CHECK-NEXT: i32.and
+; CHECK-NEXT: i32.store16 0
+; CHECK-NEXT: # fallthrough-return
+ %L1 = load <2 x i3>, <2 x i3>* undef, align 2
+ %zext = zext <2 x i3> %L1 to <2 x i10>
+ store <2 x i10> %zext, <2 x i10>* undef, align 4
+ ret void
+}
diff --git a/llvm/test/CodeGen/X86/2008-03-12-ThreadLocalAlias.ll b/llvm/test/CodeGen/X86/2008-03-12-ThreadLocalAlias.ll
index 89d249c091786..2ca003e052aa6 100644
--- a/llvm/test/CodeGen/X86/2008-03-12-ThreadLocalAlias.ll
+++ b/llvm/test/CodeGen/X86/2008-03-12-ThreadLocalAlias.ll
@@ -12,7 +12,7 @@ target triple = "i386-pc-linux-gnu"
define i32 @foo() {
; CHECK-LABEL: foo:
-; CHECK: leal .L__libc_resp$local@TLSLDM
+; CHECK: leal __libc_resp@TLSLD
entry:
%retval = alloca i32 ; [#uses=1]
%"alloca point" = bitcast i32 0 to i32 ; [#uses=0]
@@ -27,7 +27,7 @@ return: ; preds = %entry
define i32 @bar() {
; CHECK-LABEL: bar:
-; CHECK: leal .L__libc_resp$local@TLSLDM
+; CHECK: leal __libc_resp@TLSLD
entry:
%retval = alloca i32 ; [#uses=1]
%"alloca point" = bitcast i32 0 to i32 ; [#uses=0]
diff --git a/llvm/test/CodeGen/X86/2009-04-14-IllegalRegs.ll b/llvm/test/CodeGen/X86/2009-04-14-IllegalRegs.ll
index b5635c7e0f067..48ad2a2c07770 100644
--- a/llvm/test/CodeGen/X86/2009-04-14-IllegalRegs.ll
+++ b/llvm/test/CodeGen/X86/2009-04-14-IllegalRegs.ll
@@ -8,34 +8,34 @@
define i32 @z() nounwind ssp {
; CHECK-LABEL: z:
; CHECK: ## %bb.0: ## %entry
+; CHECK-NEXT: pushl %ebx
; CHECK-NEXT: pushl %edi
; CHECK-NEXT: pushl %esi
-; CHECK-NEXT: subl $148, %esp
+; CHECK-NEXT: subl $144, %esp
; CHECK-NEXT: movl L___stack_chk_guard$non_lazy_ptr, %eax
; CHECK-NEXT: movl (%eax), %eax
; CHECK-NEXT: movl %eax, {{[0-9]+}}(%esp)
; CHECK-NEXT: movb $48, {{[0-9]+}}(%esp)
-; CHECK-NEXT: movb {{[0-9]+}}(%esp), %al
-; CHECK-NEXT: movb %al, {{[0-9]+}}(%esp)
+; CHECK-NEXT: movb {{[0-9]+}}(%esp), %cl
+; CHECK-NEXT: movb %cl, {{[0-9]+}}(%esp)
; CHECK-NEXT: movb $15, {{[0-9]+}}(%esp)
; CHECK-NEXT: movl %esp, %eax
-; CHECK-NEXT: movl $8, %ecx
-; CHECK-NEXT: leal {{[0-9]+}}(%esp), %edx
-; CHECK-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) ## 4-byte Spill
+; CHECK-NEXT: movl $8, %edx
+; CHECK-NEXT: leal {{[0-9]+}}(%esp), %esi
+; CHECK-NEXT: movl %edx, %ecx
; CHECK-NEXT: movl %eax, %edi
-; CHECK-NEXT: movl %edx, %esi
+; CHECK-NEXT: movl %esi, {{[-0-9]+}}(%e{{[sb]}}p) ## 4-byte Spill
; CHECK-NEXT: rep;movsl (%esi), %es:(%edi)
; CHECK-NEXT: movl %eax, %ecx
; CHECK-NEXT: addl $36, %ecx
-; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %esi ## 4-byte Reload
; CHECK-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) ## 4-byte Spill
-; CHECK-NEXT: movl %esi, %ecx
+; CHECK-NEXT: movl %edx, %ecx
; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %edi ## 4-byte Reload
-; CHECK-NEXT: movl %edx, %esi
+; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %esi ## 4-byte Reload
; CHECK-NEXT: rep;movsl (%esi), %es:(%edi)
-; CHECK-NEXT: movb {{[0-9]+}}(%esp), %cl
-; CHECK-NEXT: movb %cl, 32(%eax)
-; CHECK-NEXT: movb %cl, 68(%eax)
+; CHECK-NEXT: movb {{[0-9]+}}(%esp), %bl
+; CHECK-NEXT: movb %bl, 32(%eax)
+; CHECK-NEXT: movb %bl, 68(%eax)
; CHECK-NEXT: calll _f
; CHECK-NEXT: movl %eax, {{[0-9]+}}(%esp)
; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
@@ -50,9 +50,10 @@ define i32 @z() nounwind ssp {
; CHECK-NEXT: jne LBB0_3
; CHECK-NEXT: ## %bb.2: ## %SP_return
; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax ## 4-byte Reload
-; CHECK-NEXT: addl $148, %esp
+; CHECK-NEXT: addl $144, %esp
; CHECK-NEXT: popl %esi
; CHECK-NEXT: popl %edi
+; CHECK-NEXT: popl %ebx
; CHECK-NEXT: retl
; CHECK-NEXT: LBB0_3: ## %CallStackCheckFailBlk
; CHECK-NEXT: calll ___stack_chk_fail
diff --git a/llvm/test/CodeGen/X86/atomic-unordered.ll b/llvm/test/CodeGen/X86/atomic-unordered.ll
index 7a1f34c65c183..16fde4074ea0e 100644
--- a/llvm/test/CodeGen/X86/atomic-unordered.ll
+++ b/llvm/test/CodeGen/X86/atomic-unordered.ll
@@ -126,8 +126,8 @@ define void @narrow_writeback_and(i64* %ptr) {
; CHECK-O0-NEXT: movq (%rdi), %rax
; CHECK-O0-NEXT: # kill: def $eax killed $eax killed $rax
; CHECK-O0-NEXT: andl $-256, %eax
-; CHECK-O0-NEXT: # kill: def $rax killed $eax
-; CHECK-O0-NEXT: movq %rax, (%rdi)
+; CHECK-O0-NEXT: movl %eax, %ecx
+; CHECK-O0-NEXT: movq %rcx, (%rdi)
; CHECK-O0-NEXT: retq
;
; CHECK-O3-LABEL: narrow_writeback_and:
@@ -231,10 +231,10 @@ define i128 @load_i128(i128* %ptr) {
; CHECK-O0-NEXT: .cfi_def_cfa_offset 16
; CHECK-O0-NEXT: .cfi_offset %rbx, -16
; CHECK-O0-NEXT: xorl %eax, %eax
-; CHECK-O0-NEXT: # kill: def $rax killed $eax
-; CHECK-O0-NEXT: movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-O0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rdx # 8-byte Reload
-; CHECK-O0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rcx # 8-byte Reload
+; CHECK-O0-NEXT: movl %eax, %ecx
+; CHECK-O0-NEXT: movq %rcx, %rax
+; CHECK-O0-NEXT: movq %rcx, %rdx
+; CHECK-O0-NEXT: movq %rcx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
; CHECK-O0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rbx # 8-byte Reload
; CHECK-O0-NEXT: lock cmpxchg16b (%rdi)
; CHECK-O0-NEXT: popq %rbx
@@ -326,14 +326,14 @@ define i256 @load_i256(i256* %ptr) {
; CHECK-O0-NEXT: movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
; CHECK-O0-NEXT: callq __atomic_load
; CHECK-O0-NEXT: movq {{[0-9]+}}(%rsp), %rax
-; CHECK-O0-NEXT: movq {{[0-9]+}}(%rsp), %rcx
; CHECK-O0-NEXT: movq {{[0-9]+}}(%rsp), %rdx
; CHECK-O0-NEXT: movq {{[0-9]+}}(%rsp), %rsi
-; CHECK-O0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rdi # 8-byte Reload
-; CHECK-O0-NEXT: movq %rsi, 24(%rdi)
-; CHECK-O0-NEXT: movq %rdx, 16(%rdi)
-; CHECK-O0-NEXT: movq %rcx, 8(%rdi)
-; CHECK-O0-NEXT: movq %rax, (%rdi)
+; CHECK-O0-NEXT: movq {{[0-9]+}}(%rsp), %rdi
+; CHECK-O0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r9 # 8-byte Reload
+; CHECK-O0-NEXT: movq %rdi, 24(%r9)
+; CHECK-O0-NEXT: movq %rsi, 16(%r9)
+; CHECK-O0-NEXT: movq %rdx, 8(%r9)
+; CHECK-O0-NEXT: movq %rax, (%r9)
; CHECK-O0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rax # 8-byte Reload
; CHECK-O0-NEXT: addq $56, %rsp
; CHECK-O0-NEXT: .cfi_def_cfa_offset 8
@@ -831,8 +831,8 @@ define i64 @load_fold_udiv1(i64* %p) {
; CHECK-O0-NEXT: movq (%rdi), %rax
; CHECK-O0-NEXT: xorl %ecx, %ecx
; CHECK-O0-NEXT: movl %ecx, %edx
-; CHECK-O0-NEXT: movl $15, %ecx
-; CHECK-O0-NEXT: divq %rcx
+; CHECK-O0-NEXT: movl $15, %esi
+; CHECK-O0-NEXT: divq %rsi
; CHECK-O0-NEXT: retq
;
; CHECK-O3-CUR-LABEL: load_fold_udiv1:
@@ -1024,8 +1024,8 @@ define i64 @load_fold_urem1(i64* %p) {
; CHECK-O0-NEXT: movq (%rdi), %rax
; CHECK-O0-NEXT: xorl %ecx, %ecx
; CHECK-O0-NEXT: movl %ecx, %edx
-; CHECK-O0-NEXT: movl $15, %ecx
-; CHECK-O0-NEXT: divq %rcx
+; CHECK-O0-NEXT: movl $15, %esi
+; CHECK-O0-NEXT: divq %rsi
; CHECK-O0-NEXT: movq %rdx, %rax
; CHECK-O0-NEXT: retq
;
@@ -1475,9 +1475,9 @@ define i1 @load_fold_icmp3(i64* %p1, i64* %p2) {
; CHECK-O0-NEXT: movq (%rdi), %rax
; CHECK-O0-NEXT: movq (%rsi), %rcx
; CHECK-O0-NEXT: subq %rcx, %rax
-; CHECK-O0-NEXT: sete %cl
+; CHECK-O0-NEXT: sete %dl
; CHECK-O0-NEXT: movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-O0-NEXT: movb %cl, %al
+; CHECK-O0-NEXT: movb %dl, %al
; CHECK-O0-NEXT: retq
;
; CHECK-O3-CUR-LABEL: load_fold_icmp3:
@@ -2076,8 +2076,8 @@ define void @rmw_fold_and1(i64* %p, i64 %v) {
; CHECK-O0-NEXT: movq (%rdi), %rax
; CHECK-O0-NEXT: # kill: def $eax killed $eax killed $rax
; CHECK-O0-NEXT: andl $15, %eax
-; CHECK-O0-NEXT: # kill: def $rax killed $eax
-; CHECK-O0-NEXT: movq %rax, (%rdi)
+; CHECK-O0-NEXT: movl %eax, %ecx
+; CHECK-O0-NEXT: movq %rcx, (%rdi)
; CHECK-O0-NEXT: retq
;
; CHECK-O3-LABEL: rmw_fold_and1:
@@ -2541,8 +2541,9 @@ define i16 @load_i8_anyext_i16(i8* %ptr) {
; CHECK-O0-CUR-LABEL: load_i8_anyext_i16:
; CHECK-O0-CUR: # %bb.0:
; CHECK-O0-CUR-NEXT: movb (%rdi), %al
-; CHECK-O0-CUR-NEXT: movzbl %al, %eax
-; CHECK-O0-CUR-NEXT: # kill: def $ax killed $ax killed $eax
+; CHECK-O0-CUR-NEXT: movzbl %al, %ecx
+; CHECK-O0-CUR-NEXT: # kill: def $cx killed $cx killed $ecx
+; CHECK-O0-CUR-NEXT: movw %cx, %ax
; CHECK-O0-CUR-NEXT: retq
;
; CHECK-O3-CUR-LABEL: load_i8_anyext_i16:
@@ -2670,12 +2671,13 @@ define i16 @load_combine(i8* %p) {
; CHECK-O0: # %bb.0:
; CHECK-O0-NEXT: movb (%rdi), %al
; CHECK-O0-NEXT: movb 1(%rdi), %cl
-; CHECK-O0-NEXT: movzbl %al, %eax
-; CHECK-O0-NEXT: # kill: def $ax killed $ax killed $eax
-; CHECK-O0-NEXT: movzbl %cl, %ecx
-; CHECK-O0-NEXT: # kill: def $cx killed $cx killed $ecx
-; CHECK-O0-NEXT: shlw $8, %cx
-; CHECK-O0-NEXT: orw %cx, %ax
+; CHECK-O0-NEXT: movzbl %al, %edx
+; CHECK-O0-NEXT: # kill: def $dx killed $dx killed $edx
+; CHECK-O0-NEXT: movzbl %cl, %esi
+; CHECK-O0-NEXT: # kill: def $si killed $si killed $esi
+; CHECK-O0-NEXT: shlw $8, %si
+; CHECK-O0-NEXT: orw %si, %dx
+; CHECK-O0-NEXT: movw %dx, %ax
; CHECK-O0-NEXT: retq
;
; CHECK-O3-LABEL: load_combine:
diff --git a/llvm/test/CodeGen/X86/atomic32.ll b/llvm/test/CodeGen/X86/atomic32.ll
index 3fe5ef8311ce7..4fb03356f99f4 100644
--- a/llvm/test/CodeGen/X86/atomic32.ll
+++ b/llvm/test/CodeGen/X86/atomic32.ll
@@ -70,8 +70,8 @@ define void @atomic_fetch_and32() nounwind {
; X64-NEXT: movl %eax, %ecx
; X64-NEXT: andl $5, %ecx
; X64-NEXT: lock cmpxchgl %ecx, {{.*}}(%rip)
-; X64-NEXT: sete %cl
-; X64-NEXT: testb $1, %cl
+; X64-NEXT: sete %dl
+; X64-NEXT: testb $1, %dl
; X64-NEXT: movl %eax, %ecx
; X64-NEXT: movl %ecx, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
; X64-NEXT: movl %eax, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
@@ -94,8 +94,8 @@ define void @atomic_fetch_and32() nounwind {
; X86-NEXT: movl %eax, %ecx
; X86-NEXT: andl $5, %ecx
; X86-NEXT: lock cmpxchgl %ecx, sc32
-; X86-NEXT: sete %cl
-; X86-NEXT: testb $1, %cl
+; X86-NEXT: sete %dl
+; X86-NEXT: testb $1, %dl
; X86-NEXT: movl %eax, %ecx
; X86-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl %eax, (%esp) # 4-byte Spill
@@ -124,8 +124,8 @@ define void @atomic_fetch_or32() nounwind {
; X64-NEXT: movl %eax, %ecx
; X64-NEXT: orl $5, %ecx
; X64-NEXT: lock cmpxchgl %ecx, {{.*}}(%rip)
-; X64-NEXT: sete %cl
-; X64-NEXT: testb $1, %cl
+; X64-NEXT: sete %dl
+; X64-NEXT: testb $1, %dl
; X64-NEXT: movl %eax, %ecx
; X64-NEXT: movl %ecx, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
; X64-NEXT: movl %eax, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
@@ -148,8 +148,8 @@ define void @atomic_fetch_or32() nounwind {
; X86-NEXT: movl %eax, %ecx
; X86-NEXT: orl $5, %ecx
; X86-NEXT: lock cmpxchgl %ecx, sc32
-; X86-NEXT: sete %cl
-; X86-NEXT: testb $1, %cl
+; X86-NEXT: sete %dl
+; X86-NEXT: testb $1, %dl
; X86-NEXT: movl %eax, %ecx
; X86-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl %eax, (%esp) # 4-byte Spill
@@ -178,8 +178,8 @@ define void @atomic_fetch_xor32() nounwind {
; X64-NEXT: movl %eax, %ecx
; X64-NEXT: xorl $5, %ecx
; X64-NEXT: lock cmpxchgl %ecx, {{.*}}(%rip)
-; X64-NEXT: sete %cl
-; X64-NEXT: testb $1, %cl
+; X64-NEXT: sete %dl
+; X64-NEXT: testb $1, %dl
; X64-NEXT: movl %eax, %ecx
; X64-NEXT: movl %ecx, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
; X64-NEXT: movl %eax, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
@@ -202,8 +202,8 @@ define void @atomic_fetch_xor32() nounwind {
; X86-NEXT: movl %eax, %ecx
; X86-NEXT: xorl $5, %ecx
; X86-NEXT: lock cmpxchgl %ecx, sc32
-; X86-NEXT: sete %cl
-; X86-NEXT: testb $1, %cl
+; X86-NEXT: sete %dl
+; X86-NEXT: testb $1, %dl
; X86-NEXT: movl %eax, %ecx
; X86-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl %eax, (%esp) # 4-byte Spill
@@ -234,8 +234,8 @@ define void @atomic_fetch_nand32(i32 %x) nounwind {
; X64-NEXT: andl %edx, %ecx
; X64-NEXT: notl %ecx
; X64-NEXT: lock cmpxchgl %ecx, {{.*}}(%rip)
-; X64-NEXT: sete %cl
-; X64-NEXT: testb $1, %cl
+; X64-NEXT: sete %sil
+; X64-NEXT: testb $1, %sil
; X64-NEXT: movl %eax, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
; X64-NEXT: jne .LBB5_2
; X64-NEXT: jmp .LBB5_1
@@ -244,6 +244,7 @@ define void @atomic_fetch_nand32(i32 %x) nounwind {
;
; X86-LABEL: atomic_fetch_nand32:
; X86: # %bb.0:
+; X86-NEXT: pushl %ebx
; X86-NEXT: subl $8, %esp
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: movl sc32, %ecx
@@ -257,13 +258,14 @@ define void @atomic_fetch_nand32(i32 %x) nounwind {
; X86-NEXT: andl %edx, %ecx
; X86-NEXT: notl %ecx
; X86-NEXT: lock cmpxchgl %ecx, sc32
-; X86-NEXT: sete %cl
-; X86-NEXT: testb $1, %cl
+; X86-NEXT: sete %bl
+; X86-NEXT: testb $1, %bl
; X86-NEXT: movl %eax, (%esp) # 4-byte Spill
; X86-NEXT: jne .LBB5_2
; X86-NEXT: jmp .LBB5_1
; X86-NEXT: .LBB5_2: # %atomicrmw.end
; X86-NEXT: addl $8, %esp
+; X86-NEXT: popl %ebx
; X86-NEXT: retl
%t1 = atomicrmw nand i32* @sc32, i32 %x acquire
ret void
@@ -283,8 +285,8 @@ define void @atomic_fetch_max32(i32 %x) nounwind {
; X64-NEXT: subl %edx, %ecx
; X64-NEXT: cmovgel %eax, %edx
; X64-NEXT: lock cmpxchgl %edx, {{.*}}(%rip)
-; X64-NEXT: sete %dl
-; X64-NEXT: testb $1, %dl
+; X64-NEXT: sete %sil
+; X64-NEXT: testb $1, %sil
; X64-NEXT: movl %eax, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
; X64-NEXT: movl %ecx, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
; X64-NEXT: jne .LBB6_2
@@ -294,6 +296,7 @@ define void @atomic_fetch_max32(i32 %x) nounwind {
;
; X86-CMOV-LABEL: atomic_fetch_max32:
; X86-CMOV: # %bb.0:
+; X86-CMOV-NEXT: pushl %ebx
; X86-CMOV-NEXT: subl $12, %esp
; X86-CMOV-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-CMOV-NEXT: movl sc32, %ecx
@@ -307,18 +310,20 @@ define void @atomic_fetch_max32(i32 %x) nounwind {
; X86-CMOV-NEXT: subl %edx, %ecx
; X86-CMOV-NEXT: cmovgel %eax, %edx
; X86-CMOV-NEXT: lock cmpxchgl %edx, sc32
-; X86-CMOV-NEXT: sete %dl
-; X86-CMOV-NEXT: testb $1, %dl
+; X86-CMOV-NEXT: sete %bl
+; X86-CMOV-NEXT: testb $1, %bl
; X86-CMOV-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-CMOV-NEXT: movl %ecx, (%esp) # 4-byte Spill
; X86-CMOV-NEXT: jne .LBB6_2
; X86-CMOV-NEXT: jmp .LBB6_1
; X86-CMOV-NEXT: .LBB6_2: # %atomicrmw.end
; X86-CMOV-NEXT: addl $12, %esp
+; X86-CMOV-NEXT: popl %ebx
; X86-CMOV-NEXT: retl
;
; X86-NOCMOV-LABEL: atomic_fetch_max32:
; X86-NOCMOV: # %bb.0:
+; X86-NOCMOV-NEXT: pushl %ebx
; X86-NOCMOV-NEXT: pushl %esi
; X86-NOCMOV-NEXT: subl $20, %esp
; X86-NOCMOV-NEXT: movl {{[0-9]+}}(%esp), %eax
@@ -347,18 +352,20 @@ define void @atomic_fetch_max32(i32 %x) nounwind {
; X86-NOCMOV-NEXT: movl %ecx, %eax
; X86-NOCMOV-NEXT: movl (%esp), %edx # 4-byte Reload
; X86-NOCMOV-NEXT: lock cmpxchgl %edx, sc32
-; X86-NOCMOV-NEXT: sete %dl
-; X86-NOCMOV-NEXT: testb $1, %dl
+; X86-NOCMOV-NEXT: sete %bl
+; X86-NOCMOV-NEXT: testb $1, %bl
; X86-NOCMOV-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NOCMOV-NEXT: jne .LBB6_2
; X86-NOCMOV-NEXT: jmp .LBB6_1
; X86-NOCMOV-NEXT: .LBB6_2: # %atomicrmw.end
; X86-NOCMOV-NEXT: addl $20, %esp
; X86-NOCMOV-NEXT: popl %esi
+; X86-NOCMOV-NEXT: popl %ebx
; X86-NOCMOV-NEXT: retl
;
; X86-NOX87-LABEL: atomic_fetch_max32:
; X86-NOX87: # %bb.0:
+; X86-NOX87-NEXT: pushl %ebx
; X86-NOX87-NEXT: pushl %esi
; X86-NOX87-NEXT: subl $20, %esp
; X86-NOX87-NEXT: movl {{[0-9]+}}(%esp), %eax
@@ -387,14 +394,15 @@ define void @atomic_fetch_max32(i32 %x) nounwind {
; X86-NOX87-NEXT: movl %ecx, %eax
; X86-NOX87-NEXT: movl (%esp), %edx # 4-byte Reload
; X86-NOX87-NEXT: lock cmpxchgl %edx, sc32
-; X86-NOX87-NEXT: sete %dl
-; X86-NOX87-NEXT: testb $1, %dl
+; X86-NOX87-NEXT: sete %bl
+; X86-NOX87-NEXT: testb $1, %bl
; X86-NOX87-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NOX87-NEXT: jne .LBB6_2
; X86-NOX87-NEXT: jmp .LBB6_1
; X86-NOX87-NEXT: .LBB6_2: # %atomicrmw.end
; X86-NOX87-NEXT: addl $20, %esp
; X86-NOX87-NEXT: popl %esi
+; X86-NOX87-NEXT: popl %ebx
; X86-NOX87-NEXT: retl
%t1 = atomicrmw max i32* @sc32, i32 %x acquire
ret void
@@ -414,8 +422,8 @@ define void @atomic_fetch_min32(i32 %x) nounwind {
; X64-NEXT: subl %edx, %ecx
; X64-NEXT: cmovlel %eax, %edx
; X64-NEXT: lock cmpxchgl %edx, {{.*}}(%rip)
-; X64-NEXT: sete %dl
-; X64-NEXT: testb $1, %dl
+; X64-NEXT: sete %sil
+; X64-NEXT: testb $1, %sil
; X64-NEXT: movl %eax, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
; X64-NEXT: movl %ecx, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
; X64-NEXT: jne .LBB7_2
@@ -425,6 +433,7 @@ define void @atomic_fetch_min32(i32 %x) nounwind {
;
; X86-CMOV-LABEL: atomic_fetch_min32:
; X86-CMOV: # %bb.0:
+; X86-CMOV-NEXT: pushl %ebx
; X86-CMOV-NEXT: subl $12, %esp
; X86-CMOV-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-CMOV-NEXT: movl sc32, %ecx
@@ -438,18 +447,20 @@ define void @atomic_fetch_min32(i32 %x) nounwind {
; X86-CMOV-NEXT: subl %edx, %ecx
; X86-CMOV-NEXT: cmovlel %eax, %edx
; X86-CMOV-NEXT: lock cmpxchgl %edx, sc32
-; X86-CMOV-NEXT: sete %dl
-; X86-CMOV-NEXT: testb $1, %dl
+; X86-CMOV-NEXT: sete %bl
+; X86-CMOV-NEXT: testb $1, %bl
; X86-CMOV-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-CMOV-NEXT: movl %ecx, (%esp) # 4-byte Spill
; X86-CMOV-NEXT: jne .LBB7_2
; X86-CMOV-NEXT: jmp .LBB7_1
; X86-CMOV-NEXT: .LBB7_2: # %atomicrmw.end
; X86-CMOV-NEXT: addl $12, %esp
+; X86-CMOV-NEXT: popl %ebx
; X86-CMOV-NEXT: retl
;
; X86-NOCMOV-LABEL: atomic_fetch_min32:
; X86-NOCMOV: # %bb.0:
+; X86-NOCMOV-NEXT: pushl %ebx
; X86-NOCMOV-NEXT: pushl %esi
; X86-NOCMOV-NEXT: subl $20, %esp
; X86-NOCMOV-NEXT: movl {{[0-9]+}}(%esp), %eax
@@ -478,18 +489,20 @@ define void @atomic_fetch_min32(i32 %x) nounwind {
; X86-NOCMOV-NEXT: movl %ecx, %eax
; X86-NOCMOV-NEXT: movl (%esp), %edx # 4-byte Reload
; X86-NOCMOV-NEXT: lock cmpxchgl %edx, sc32
-; X86-NOCMOV-NEXT: sete %dl
-; X86-NOCMOV-NEXT: testb $1, %dl
+; X86-NOCMOV-NEXT: sete %bl
+; X86-NOCMOV-NEXT: testb $1, %bl
; X86-NOCMOV-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NOCMOV-NEXT: jne .LBB7_2
; X86-NOCMOV-NEXT: jmp .LBB7_1
; X86-NOCMOV-NEXT: .LBB7_2: # %atomicrmw.end
; X86-NOCMOV-NEXT: addl $20, %esp
; X86-NOCMOV-NEXT: popl %esi
+; X86-NOCMOV-NEXT: popl %ebx
; X86-NOCMOV-NEXT: retl
;
; X86-NOX87-LABEL: atomic_fetch_min32:
; X86-NOX87: # %bb.0:
+; X86-NOX87-NEXT: pushl %ebx
; X86-NOX87-NEXT: pushl %esi
; X86-NOX87-NEXT: subl $20, %esp
; X86-NOX87-NEXT: movl {{[0-9]+}}(%esp), %eax
@@ -518,14 +531,15 @@ define void @atomic_fetch_min32(i32 %x) nounwind {
; X86-NOX87-NEXT: movl %ecx, %eax
; X86-NOX87-NEXT: movl (%esp), %edx # 4-byte Reload
; X86-NOX87-NEXT: lock cmpxchgl %edx, sc32
-; X86-NOX87-NEXT: sete %dl
-; X86-NOX87-NEXT: testb $1, %dl
+; X86-NOX87-NEXT: sete %bl
+; X86-NOX87-NEXT: testb $1, %bl
; X86-NOX87-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NOX87-NEXT: jne .LBB7_2
; X86-NOX87-NEXT: jmp .LBB7_1
; X86-NOX87-NEXT: .LBB7_2: # %atomicrmw.end
; X86-NOX87-NEXT: addl $20, %esp
; X86-NOX87-NEXT: popl %esi
+; X86-NOX87-NEXT: popl %ebx
; X86-NOX87-NEXT: retl
%t1 = atomicrmw min i32* @sc32, i32 %x acquire
ret void
@@ -545,8 +559,8 @@ define void @atomic_fetch_umax32(i32 %x) nounwind {
; X64-NEXT: subl %edx, %ecx
; X64-NEXT: cmoval %eax, %edx
; X64-NEXT: lock cmpxchgl %edx, {{.*}}(%rip)
-; X64-NEXT: sete %dl
-; X64-NEXT: testb $1, %dl
+; X64-NEXT: sete %sil
+; X64-NEXT: testb $1, %sil
; X64-NEXT: movl %eax, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
; X64-NEXT: movl %ecx, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
; X64-NEXT: jne .LBB8_2
@@ -556,6 +570,7 @@ define void @atomic_fetch_umax32(i32 %x) nounwind {
;
; X86-CMOV-LABEL: atomic_fetch_umax32:
; X86-CMOV: # %bb.0:
+; X86-CMOV-NEXT: pushl %ebx
; X86-CMOV-NEXT: subl $12, %esp
; X86-CMOV-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-CMOV-NEXT: movl sc32, %ecx
@@ -569,18 +584,20 @@ define void @atomic_fetch_umax32(i32 %x) nounwind {
; X86-CMOV-NEXT: subl %edx, %ecx
; X86-CMOV-NEXT: cmoval %eax, %edx
; X86-CMOV-NEXT: lock cmpxchgl %edx, sc32
-; X86-CMOV-NEXT: sete %dl
-; X86-CMOV-NEXT: testb $1, %dl
+; X86-CMOV-NEXT: sete %bl
+; X86-CMOV-NEXT: testb $1, %bl
; X86-CMOV-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-CMOV-NEXT: movl %ecx, (%esp) # 4-byte Spill
; X86-CMOV-NEXT: jne .LBB8_2
; X86-CMOV-NEXT: jmp .LBB8_1
; X86-CMOV-NEXT: .LBB8_2: # %atomicrmw.end
; X86-CMOV-NEXT: addl $12, %esp
+; X86-CMOV-NEXT: popl %ebx
; X86-CMOV-NEXT: retl
;
; X86-NOCMOV-LABEL: atomic_fetch_umax32:
; X86-NOCMOV: # %bb.0:
+; X86-NOCMOV-NEXT: pushl %ebx
; X86-NOCMOV-NEXT: pushl %esi
; X86-NOCMOV-NEXT: subl $20, %esp
; X86-NOCMOV-NEXT: movl {{[0-9]+}}(%esp), %eax
@@ -609,18 +626,20 @@ define void @atomic_fetch_umax32(i32 %x) nounwind {
; X86-NOCMOV-NEXT: movl %ecx, %eax
; X86-NOCMOV-NEXT: movl (%esp), %edx # 4-byte Reload
; X86-NOCMOV-NEXT: lock cmpxchgl %edx, sc32
-; X86-NOCMOV-NEXT: sete %dl
-; X86-NOCMOV-NEXT: testb $1, %dl
+; X86-NOCMOV-NEXT: sete %bl
+; X86-NOCMOV-NEXT: testb $1, %bl
; X86-NOCMOV-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NOCMOV-NEXT: jne .LBB8_2
; X86-NOCMOV-NEXT: jmp .LBB8_1
; X86-NOCMOV-NEXT: .LBB8_2: # %atomicrmw.end
; X86-NOCMOV-NEXT: addl $20, %esp
; X86-NOCMOV-NEXT: popl %esi
+; X86-NOCMOV-NEXT: popl %ebx
; X86-NOCMOV-NEXT: retl
;
; X86-NOX87-LABEL: atomic_fetch_umax32:
; X86-NOX87: # %bb.0:
+; X86-NOX87-NEXT: pushl %ebx
; X86-NOX87-NEXT: pushl %esi
; X86-NOX87-NEXT: subl $20, %esp
; X86-NOX87-NEXT: movl {{[0-9]+}}(%esp), %eax
@@ -649,14 +668,15 @@ define void @atomic_fetch_umax32(i32 %x) nounwind {
; X86-NOX87-NEXT: movl %ecx, %eax
; X86-NOX87-NEXT: movl (%esp), %edx # 4-byte Reload
; X86-NOX87-NEXT: lock cmpxchgl %edx, sc32
-; X86-NOX87-NEXT: sete %dl
-; X86-NOX87-NEXT: testb $1, %dl
+; X86-NOX87-NEXT: sete %bl
+; X86-NOX87-NEXT: testb $1, %bl
; X86-NOX87-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NOX87-NEXT: jne .LBB8_2
; X86-NOX87-NEXT: jmp .LBB8_1
; X86-NOX87-NEXT: .LBB8_2: # %atomicrmw.end
; X86-NOX87-NEXT: addl $20, %esp
; X86-NOX87-NEXT: popl %esi
+; X86-NOX87-NEXT: popl %ebx
; X86-NOX87-NEXT: retl
%t1 = atomicrmw umax i32* @sc32, i32 %x acquire
ret void
@@ -676,8 +696,8 @@ define void @atomic_fetch_umin32(i32 %x) nounwind {
; X64-NEXT: subl %edx, %ecx
; X64-NEXT: cmovbel %eax, %edx
; X64-NEXT: lock cmpxchgl %edx, {{.*}}(%rip)
-; X64-NEXT: sete %dl
-; X64-NEXT: testb $1, %dl
+; X64-NEXT: sete %sil
+; X64-NEXT: testb $1, %sil
; X64-NEXT: movl %eax, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
; X64-NEXT: movl %ecx, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
; X64-NEXT: jne .LBB9_2
@@ -687,6 +707,7 @@ define void @atomic_fetch_umin32(i32 %x) nounwind {
;
; X86-CMOV-LABEL: atomic_fetch_umin32:
; X86-CMOV: # %bb.0:
+; X86-CMOV-NEXT: pushl %ebx
; X86-CMOV-NEXT: subl $12, %esp
; X86-CMOV-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-CMOV-NEXT: movl sc32, %ecx
@@ -700,18 +721,20 @@ define void @atomic_fetch_umin32(i32 %x) nounwind {
; X86-CMOV-NEXT: subl %edx, %ecx
; X86-CMOV-NEXT: cmovbel %eax, %edx
; X86-CMOV-NEXT: lock cmpxchgl %edx, sc32
-; X86-CMOV-NEXT: sete %dl
-; X86-CMOV-NEXT: testb $1, %dl
+; X86-CMOV-NEXT: sete %bl
+; X86-CMOV-NEXT: testb $1, %bl
; X86-CMOV-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-CMOV-NEXT: movl %ecx, (%esp) # 4-byte Spill
; X86-CMOV-NEXT: jne .LBB9_2
; X86-CMOV-NEXT: jmp .LBB9_1
; X86-CMOV-NEXT: .LBB9_2: # %atomicrmw.end
; X86-CMOV-NEXT: addl $12, %esp
+; X86-CMOV-NEXT: popl %ebx
; X86-CMOV-NEXT: retl
;
; X86-NOCMOV-LABEL: atomic_fetch_umin32:
; X86-NOCMOV: # %bb.0:
+; X86-NOCMOV-NEXT: pushl %ebx
; X86-NOCMOV-NEXT: pushl %esi
; X86-NOCMOV-NEXT: subl $20, %esp
; X86-NOCMOV-NEXT: movl {{[0-9]+}}(%esp), %eax
@@ -740,18 +763,20 @@ define void @atomic_fetch_umin32(i32 %x) nounwind {
; X86-NOCMOV-NEXT: movl %ecx, %eax
; X86-NOCMOV-NEXT: movl (%esp), %edx # 4-byte Reload
; X86-NOCMOV-NEXT: lock cmpxchgl %edx, sc32
-; X86-NOCMOV-NEXT: sete %dl
-; X86-NOCMOV-NEXT: testb $1, %dl
+; X86-NOCMOV-NEXT: sete %bl
+; X86-NOCMOV-NEXT: testb $1, %bl
; X86-NOCMOV-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NOCMOV-NEXT: jne .LBB9_2
; X86-NOCMOV-NEXT: jmp .LBB9_1
; X86-NOCMOV-NEXT: .LBB9_2: # %atomicrmw.end
; X86-NOCMOV-NEXT: addl $20, %esp
; X86-NOCMOV-NEXT: popl %esi
+; X86-NOCMOV-NEXT: popl %ebx
; X86-NOCMOV-NEXT: retl
;
; X86-NOX87-LABEL: atomic_fetch_umin32:
; X86-NOX87: # %bb.0:
+; X86-NOX87-NEXT: pushl %ebx
; X86-NOX87-NEXT: pushl %esi
; X86-NOX87-NEXT: subl $20, %esp
; X86-NOX87-NEXT: movl {{[0-9]+}}(%esp), %eax
@@ -780,14 +805,15 @@ define void @atomic_fetch_umin32(i32 %x) nounwind {
; X86-NOX87-NEXT: movl %ecx, %eax
; X86-NOX87-NEXT: movl (%esp), %edx # 4-byte Reload
; X86-NOX87-NEXT: lock cmpxchgl %edx, sc32
-; X86-NOX87-NEXT: sete %dl
-; X86-NOX87-NEXT: testb $1, %dl
+; X86-NOX87-NEXT: sete %bl
+; X86-NOX87-NEXT: testb $1, %bl
; X86-NOX87-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NOX87-NEXT: jne .LBB9_2
; X86-NOX87-NEXT: jmp .LBB9_1
; X86-NOX87-NEXT: .LBB9_2: # %atomicrmw.end
; X86-NOX87-NEXT: addl $20, %esp
; X86-NOX87-NEXT: popl %esi
+; X86-NOX87-NEXT: popl %ebx
; X86-NOX87-NEXT: retl
%t1 = atomicrmw umin i32* @sc32, i32 %x acquire
ret void
diff --git a/llvm/test/CodeGen/X86/atomic64.ll b/llvm/test/CodeGen/X86/atomic64.ll
index fe7635bdc3ff5..0149851ea4671 100644
--- a/llvm/test/CodeGen/X86/atomic64.ll
+++ b/llvm/test/CodeGen/X86/atomic64.ll
@@ -137,12 +137,12 @@ define void @atomic_fetch_and64() nounwind {
; X64-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rax # 8-byte Reload
; X64-NEXT: movl %eax, %ecx
; X64-NEXT: andl $5, %ecx
-; X64-NEXT: # kill: def $rcx killed $ecx
-; X64-NEXT: lock cmpxchgq %rcx, {{.*}}(%rip)
-; X64-NEXT: sete %cl
-; X64-NEXT: testb $1, %cl
-; X64-NEXT: movq %rax, %rcx
-; X64-NEXT: movq %rcx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
+; X64-NEXT: movl %ecx, %edx
+; X64-NEXT: lock cmpxchgq %rdx, {{.*}}(%rip)
+; X64-NEXT: sete %sil
+; X64-NEXT: testb $1, %sil
+; X64-NEXT: movq %rax, %rdx
+; X64-NEXT: movq %rdx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
; X64-NEXT: movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
; X64-NEXT: jne .LBB2_2
; X64-NEXT: jmp .LBB2_1
@@ -202,8 +202,8 @@ define void @atomic_fetch_or64() nounwind {
; X64-NEXT: movq %rax, %rcx
; X64-NEXT: orq $5, %rcx
; X64-NEXT: lock cmpxchgq %rcx, {{.*}}(%rip)
-; X64-NEXT: sete %cl
-; X64-NEXT: testb $1, %cl
+; X64-NEXT: sete %dl
+; X64-NEXT: testb $1, %dl
; X64-NEXT: movq %rax, %rcx
; X64-NEXT: movq %rcx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
; X64-NEXT: movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
@@ -265,8 +265,8 @@ define void @atomic_fetch_xor64() nounwind {
; X64-NEXT: movq %rax, %rcx
; X64-NEXT: xorq $5, %rcx
; X64-NEXT: lock cmpxchgq %rcx, {{.*}}(%rip)
-; X64-NEXT: sete %cl
-; X64-NEXT: testb $1, %cl
+; X64-NEXT: sete %dl
+; X64-NEXT: testb $1, %dl
; X64-NEXT: movq %rax, %rcx
; X64-NEXT: movq %rcx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
; X64-NEXT: movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
@@ -330,8 +330,8 @@ define void @atomic_fetch_nand64(i64 %x) nounwind {
; X64-NEXT: andq %rdx, %rcx
; X64-NEXT: notq %rcx
; X64-NEXT: lock cmpxchgq %rcx, {{.*}}(%rip)
-; X64-NEXT: sete %cl
-; X64-NEXT: testb $1, %cl
+; X64-NEXT: sete %sil
+; X64-NEXT: testb $1, %sil
; X64-NEXT: movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
; X64-NEXT: jne .LBB5_2
; X64-NEXT: jmp .LBB5_1
@@ -373,8 +373,8 @@ define void @atomic_fetch_max64(i64 %x) nounwind {
; X64-NEXT: subq %rdx, %rcx
; X64-NEXT: cmovgeq %rax, %rdx
; X64-NEXT: lock cmpxchgq %rdx, {{.*}}(%rip)
-; X64-NEXT: sete %dl
-; X64-NEXT: testb $1, %dl
+; X64-NEXT: sete %sil
+; X64-NEXT: testb $1, %sil
; X64-NEXT: movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
; X64-NEXT: movq %rcx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
; X64-NEXT: jne .LBB6_2
@@ -473,8 +473,8 @@ define void @atomic_fetch_min64(i64 %x) nounwind {
; X64-NEXT: subq %rdx, %rcx
; X64-NEXT: cmovleq %rax, %rdx
; X64-NEXT: lock cmpxchgq %rdx, {{.*}}(%rip)
-; X64-NEXT: sete %dl
-; X64-NEXT: testb $1, %dl
+; X64-NEXT: sete %sil
+; X64-NEXT: testb $1, %sil
; X64-NEXT: movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
; X64-NEXT: movq %rcx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
; X64-NEXT: jne .LBB7_2
@@ -571,8 +571,8 @@ define void @atomic_fetch_umax64(i64 %x) nounwind {
; X64-NEXT: subq %rdx, %rcx
; X64-NEXT: cmovaq %rax, %rdx
; X64-NEXT: lock cmpxchgq %rdx, {{.*}}(%rip)
-; X64-NEXT: sete %dl
-; X64-NEXT: testb $1, %dl
+; X64-NEXT: sete %sil
+; X64-NEXT: testb $1, %sil
; X64-NEXT: movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
; X64-NEXT: movq %rcx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
; X64-NEXT: jne .LBB8_2
@@ -669,8 +669,8 @@ define void @atomic_fetch_umin64(i64 %x) nounwind {
; X64-NEXT: subq %rdx, %rcx
; X64-NEXT: cmovbeq %rax, %rdx
; X64-NEXT: lock cmpxchgq %rdx, {{.*}}(%rip)
-; X64-NEXT: sete %dl
-; X64-NEXT: testb $1, %dl
+; X64-NEXT: sete %sil
+; X64-NEXT: testb $1, %sil
; X64-NEXT: movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
; X64-NEXT: movq %rcx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
; X64-NEXT: jne .LBB9_2
diff --git a/llvm/test/CodeGen/X86/avx-load-store.ll b/llvm/test/CodeGen/X86/avx-load-store.ll
index f448bfec2ec99..718449d7a771f 100644
--- a/llvm/test/CodeGen/X86/avx-load-store.ll
+++ b/llvm/test/CodeGen/X86/avx-load-store.ll
@@ -175,8 +175,8 @@ define void @double_save(<4 x i32> %A, <4 x i32> %B, <8 x i32>* %P) nounwind ssp
; CHECK_O0: # %bb.0:
; CHECK_O0-NEXT: # implicit-def: $ymm2
; CHECK_O0-NEXT: vmovaps %xmm0, %xmm2
-; CHECK_O0-NEXT: vinsertf128 $1, %xmm1, %ymm2, %ymm0
-; CHECK_O0-NEXT: vmovdqu %ymm0, (%rdi)
+; CHECK_O0-NEXT: vinsertf128 $1, %xmm1, %ymm2, %ymm2
+; CHECK_O0-NEXT: vmovdqu %ymm2, (%rdi)
; CHECK_O0-NEXT: vzeroupper
; CHECK_O0-NEXT: retq
  %Z = shufflevector <4 x i32>%A, <4 x i32>%B, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
@@ -197,8 +197,8 @@ define void @double_save_volatile(<4 x i32> %A, <4 x i32> %B, <8 x i32>* %P) nou
; CHECK_O0: # %bb.0:
; CHECK_O0-NEXT: # implicit-def: $ymm2
; CHECK_O0-NEXT: vmovaps %xmm0, %xmm2
-; CHECK_O0-NEXT: vinsertf128 $1, %xmm1, %ymm2, %ymm0
-; CHECK_O0-NEXT: vmovdqu %ymm0, (%rdi)
+; CHECK_O0-NEXT: vinsertf128 $1, %xmm1, %ymm2, %ymm2
+; CHECK_O0-NEXT: vmovdqu %ymm2, (%rdi)
; CHECK_O0-NEXT: vzeroupper
; CHECK_O0-NEXT: retq
  %Z = shufflevector <4 x i32>%A, <4 x i32>%B, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
@@ -239,10 +239,10 @@ define void @f_f() nounwind {
; CHECK_O0-NEXT: .LBB9_3: # %cif_mixed_test_all
; CHECK_O0-NEXT: vmovdqa {{.*#+}} xmm0 = [4294967295,0,0,0]
; CHECK_O0-NEXT: vmovdqa %xmm0, %xmm0
-; CHECK_O0-NEXT: # kill: def $ymm0 killed $xmm0
+; CHECK_O0-NEXT: vmovaps %xmm0, %xmm1
; CHECK_O0-NEXT: # implicit-def: $rax
-; CHECK_O0-NEXT: # implicit-def: $ymm1
-; CHECK_O0-NEXT: vmaskmovps %ymm1, %ymm0, (%rax)
+; CHECK_O0-NEXT: # implicit-def: $ymm2
+; CHECK_O0-NEXT: vmaskmovps %ymm2, %ymm1, (%rax)
; CHECK_O0-NEXT: .LBB9_4: # %cif_mixed_test_any_check
allocas:
br i1 undef, label %cif_mask_all, label %cif_mask_mixed
@@ -276,8 +276,8 @@ define void @add8i32(<8 x i32>* %ret, <8 x i32>* %bp) nounwind {
; CHECK_O0-NEXT: vmovdqu 16(%rsi), %xmm1
; CHECK_O0-NEXT: # implicit-def: $ymm2
; CHECK_O0-NEXT: vmovaps %xmm0, %xmm2
-; CHECK_O0-NEXT: vinsertf128 $1, %xmm1, %ymm2, %ymm0
-; CHECK_O0-NEXT: vmovdqu %ymm0, (%rdi)
+; CHECK_O0-NEXT: vinsertf128 $1, %xmm1, %ymm2, %ymm2
+; CHECK_O0-NEXT: vmovdqu %ymm2, (%rdi)
; CHECK_O0-NEXT: vzeroupper
; CHECK_O0-NEXT: retq
%b = load <8 x i32>, <8 x i32>* %bp, align 1
@@ -321,8 +321,8 @@ define void @add4i64a16(<4 x i64>* %ret, <4 x i64>* %bp) nounwind {
; CHECK_O0-NEXT: vmovdqa 16(%rsi), %xmm1
; CHECK_O0-NEXT: # implicit-def: $ymm2
; CHECK_O0-NEXT: vmovaps %xmm0, %xmm2
-; CHECK_O0-NEXT: vinsertf128 $1, %xmm1, %ymm2, %ymm0
-; CHECK_O0-NEXT: vmovdqu %ymm0, (%rdi)
+; CHECK_O0-NEXT: vinsertf128 $1, %xmm1, %ymm2, %ymm2
+; CHECK_O0-NEXT: vmovdqu %ymm2, (%rdi)
; CHECK_O0-NEXT: vzeroupper
; CHECK_O0-NEXT: retq
%b = load <4 x i64>, <4 x i64>* %bp, align 16
diff --git a/llvm/test/CodeGen/X86/avx512-mask-zext-bugfix.ll b/llvm/test/CodeGen/X86/avx512-mask-zext-bugfix.ll
index 186370ca675c7..c4e009d54ec7a 100755
--- a/llvm/test/CodeGen/X86/avx512-mask-zext-bugfix.ll
+++ b/llvm/test/CodeGen/X86/avx512-mask-zext-bugfix.ll
@@ -40,20 +40,22 @@ define void @test_xmm(i32 %shift, i32 %mulp, <2 x i64> %a,i8* %arraydecay,i8* %f
; CHECK-NEXT: vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 ## 16-byte Reload
; CHECK-NEXT: vpmovd2m %xmm0, %k0
; CHECK-NEXT: kmovq %k0, %k1
-; CHECK-NEXT: kmovd %k0, %ecx
-; CHECK-NEXT: ## kill: def $cl killed $cl killed $ecx
-; CHECK-NEXT: movzbl %cl, %ecx
-; CHECK-NEXT: ## kill: def $cx killed $cx killed $ecx
-; CHECK-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rdi ## 8-byte Reload
-; CHECK-NEXT: movl $4, %edx
-; CHECK-NEXT: movl %edx, %esi
+; CHECK-NEXT: kmovd %k0, %esi
+; CHECK-NEXT: ## kill: def $sil killed $sil killed $esi
+; CHECK-NEXT: movzbl %sil, %edi
+; CHECK-NEXT: ## kill: def $di killed $di killed $edi
+; CHECK-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rcx ## 8-byte Reload
+; CHECK-NEXT: movw %di, {{[-0-9]+}}(%r{{[sb]}}p) ## 2-byte Spill
+; CHECK-NEXT: movq %rcx, %rdi
+; CHECK-NEXT: movl $4, %r8d
+; CHECK-NEXT: movl %r8d, %esi
+; CHECK-NEXT: movl %r8d, %edx
; CHECK-NEXT: movl %eax, {{[-0-9]+}}(%r{{[sb]}}p) ## 4-byte Spill
; CHECK-NEXT: kmovw %k1, {{[-0-9]+}}(%r{{[sb]}}p) ## 2-byte Spill
-; CHECK-NEXT: movw %cx, {{[-0-9]+}}(%r{{[sb]}}p) ## 2-byte Spill
; CHECK-NEXT: callq _calc_expected_mask_val
; CHECK-NEXT: ## kill: def $ax killed $ax killed $rax
-; CHECK-NEXT: movw {{[-0-9]+}}(%r{{[sb]}}p), %cx ## 2-byte Reload
-; CHECK-NEXT: movzwl %cx, %edi
+; CHECK-NEXT: movw {{[-0-9]+}}(%r{{[sb]}}p), %r9w ## 2-byte Reload
+; CHECK-NEXT: movzwl %r9w, %edi
; CHECK-NEXT: movzwl %ax, %esi
; CHECK-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rdx ## 8-byte Reload
; CHECK-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rcx ## 8-byte Reload
diff --git a/llvm/test/CodeGen/X86/callbr-asm-phi-placement.ll b/llvm/test/CodeGen/X86/callbr-asm-phi-placement.ll
new file mode 100644
index 0000000000000..9bad6a7e0892f
--- /dev/null
+++ b/llvm/test/CodeGen/X86/callbr-asm-phi-placement.ll
@@ -0,0 +1,44 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu -verify-machineinstrs -O2 < %s | FileCheck %s
+
+;; https://bugs.llvm.org/PR47468
+
+;; PHI elimination should place copies BEFORE the inline asm, not
+;; after, even if the inline-asm uses as an input the same value as
+;; the PHI.
+
+declare void @foo(i8*)
+
+define void @test1(i8* %arg, i8** %mem) nounwind {
+; CHECK-LABEL: test1:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: pushq %r14
+; CHECK-NEXT: pushq %rbx
+; CHECK-NEXT: pushq %rax
+; CHECK-NEXT: movq %rsi, %r14
+; CHECK-NEXT: .Ltmp0: # Block address taken
+; CHECK-NEXT: .LBB0_1: # %loop
+; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT: movq (%r14), %rbx
+; CHECK-NEXT: callq foo
+; CHECK-NEXT: movq %rbx, %rdi
+; CHECK-NEXT: #APP
+; CHECK-NEXT: #NO_APP
+; CHECK-NEXT: # %bb.2: # %end
+; CHECK-NEXT: addq $8, %rsp
+; CHECK-NEXT: popq %rbx
+; CHECK-NEXT: popq %r14
+; CHECK-NEXT: retq
+entry:
+ br label %loop
+
+loop:
+ %a = phi i8* [ %arg, %entry ], [ %b, %loop ]
+ %b = load i8*, i8** %mem, align 8
+ call void @foo(i8* %a)
+ callbr void asm sideeffect "", "*m,X"(i8* %b, i8* blockaddress(@test1, %loop))
+ to label %end [label %loop]
+
+end:
+ ret void
+}
diff --git a/llvm/test/CodeGen/X86/crash-O0.ll b/llvm/test/CodeGen/X86/crash-O0.ll
index 9f9e5584d6f21..a93d3dd267b52 100644
--- a/llvm/test/CodeGen/X86/crash-O0.ll
+++ b/llvm/test/CodeGen/X86/crash-O0.ll
@@ -79,12 +79,11 @@ define i64 @addressModeWith32bitIndex(i32 %V) {
; CHECK-NEXT: movq %rsp, %rbp
; CHECK-NEXT: .cfi_def_cfa_register %rbp
; CHECK-NEXT: xorl %eax, %eax
-; CHECK-NEXT: ## kill: def $rax killed $eax
-; CHECK-NEXT: movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) ## 8-byte Spill
+; CHECK-NEXT: movl %eax, %ecx
+; CHECK-NEXT: movq %rcx, %rax
; CHECK-NEXT: cqto
-; CHECK-NEXT: movslq %edi, %rcx
-; CHECK-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rsi ## 8-byte Reload
-; CHECK-NEXT: idivq (%rsi,%rcx,8)
+; CHECK-NEXT: movslq %edi, %rsi
+; CHECK-NEXT: idivq (%rcx,%rsi,8)
; CHECK-NEXT: popq %rbp
; CHECK-NEXT: retq
%gep = getelementptr i64, i64* null, i32 %V
diff --git a/llvm/test/CodeGen/X86/extend-set-cc-uses-dbg.ll b/llvm/test/CodeGen/X86/extend-set-cc-uses-dbg.ll
index 664d9ded1e0e1..7d05a869be893 100644
--- a/llvm/test/CodeGen/X86/extend-set-cc-uses-dbg.ll
+++ b/llvm/test/CodeGen/X86/extend-set-cc-uses-dbg.ll
@@ -7,8 +7,8 @@ define void @foo(i32* %p) !dbg !4 {
bb:
%tmp = load i32, i32* %p, align 4, !dbg !7
; CHECK: $eax = MOV32rm killed {{.*}} $rdi, {{.*}} debug-location !7 :: (load 4 from %ir.p)
- ; CHECK-NEXT: $rax = KILL killed renamable $eax, debug-location !7
- ; CHECK-NEXT: $rcx = MOV64rr $rax, debug-location !7
+ ; CHECK-NEXT: $ecx = MOV32rr killed $eax, implicit-def $rcx, debug-location !7
+ ; CHECK-NEXT: $rdx = MOV64rr $rcx, debug-location !7
switch i32 %tmp, label %bb7 [
i32 0, label %bb1
diff --git a/llvm/test/CodeGen/X86/fast-isel-nontemporal.ll b/llvm/test/CodeGen/X86/fast-isel-nontemporal.ll
index 7fffa21f0d24d..5d7c83fa19d44 100644
--- a/llvm/test/CodeGen/X86/fast-isel-nontemporal.ll
+++ b/llvm/test/CodeGen/X86/fast-isel-nontemporal.ll
@@ -1013,11 +1013,11 @@ define <16 x float> @test_load_nt16xfloat(<16 x float>* nocapture %ptr) {
; AVX1-NEXT: vmovaps %xmm0, %xmm1
; AVX1-NEXT: vmovntdqa 16(%rdi), %xmm0
; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
-; AVX1-NEXT: vmovntdqa 32(%rdi), %xmm1
-; AVX1-NEXT: # implicit-def: $ymm2
-; AVX1-NEXT: vmovaps %xmm1, %xmm2
-; AVX1-NEXT: vmovntdqa 48(%rdi), %xmm1
-; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm2, %ymm1
+; AVX1-NEXT: vmovntdqa 32(%rdi), %xmm2
+; AVX1-NEXT: # implicit-def: $ymm1
+; AVX1-NEXT: vmovaps %xmm2, %xmm1
+; AVX1-NEXT: vmovntdqa 48(%rdi), %xmm2
+; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1
; AVX1-NEXT: retq
;
; AVX2-LABEL: test_load_nt16xfloat:
@@ -1067,11 +1067,11 @@ define <8 x double> @test_load_nt8xdouble(<8 x double>* nocapture %ptr) {
; AVX1-NEXT: vmovaps %xmm0, %xmm1
; AVX1-NEXT: vmovntdqa 16(%rdi), %xmm0
; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
-; AVX1-NEXT: vmovntdqa 32(%rdi), %xmm1
-; AVX1-NEXT: # implicit-def: $ymm2
-; AVX1-NEXT: vmovaps %xmm1, %xmm2
-; AVX1-NEXT: vmovntdqa 48(%rdi), %xmm1
-; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm2, %ymm1
+; AVX1-NEXT: vmovntdqa 32(%rdi), %xmm2
+; AVX1-NEXT: # implicit-def: $ymm1
+; AVX1-NEXT: vmovaps %xmm2, %xmm1
+; AVX1-NEXT: vmovntdqa 48(%rdi), %xmm2
+; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1
; AVX1-NEXT: retq
;
; AVX2-LABEL: test_load_nt8xdouble:
@@ -1121,11 +1121,11 @@ define <64 x i8> @test_load_nt64xi8(<64 x i8>* nocapture %ptr) {
; AVX1-NEXT: vmovaps %xmm0, %xmm1
; AVX1-NEXT: vmovntdqa 16(%rdi), %xmm0
; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
-; AVX1-NEXT: vmovntdqa 32(%rdi), %xmm1
-; AVX1-NEXT: # implicit-def: $ymm2
-; AVX1-NEXT: vmovaps %xmm1, %xmm2
-; AVX1-NEXT: vmovntdqa 48(%rdi), %xmm1
-; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm2, %ymm1
+; AVX1-NEXT: vmovntdqa 32(%rdi), %xmm2
+; AVX1-NEXT: # implicit-def: $ymm1
+; AVX1-NEXT: vmovaps %xmm2, %xmm1
+; AVX1-NEXT: vmovntdqa 48(%rdi), %xmm2
+; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1
; AVX1-NEXT: retq
;
; AVX2-LABEL: test_load_nt64xi8:
@@ -1175,11 +1175,11 @@ define <32 x i16> @test_load_nt32xi16(<32 x i16>* nocapture %ptr) {
; AVX1-NEXT: vmovaps %xmm0, %xmm1
; AVX1-NEXT: vmovntdqa 16(%rdi), %xmm0
; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
-; AVX1-NEXT: vmovntdqa 32(%rdi), %xmm1
-; AVX1-NEXT: # implicit-def: $ymm2
-; AVX1-NEXT: vmovaps %xmm1, %xmm2
-; AVX1-NEXT: vmovntdqa 48(%rdi), %xmm1
-; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm2, %ymm1
+; AVX1-NEXT: vmovntdqa 32(%rdi), %xmm2
+; AVX1-NEXT: # implicit-def: $ymm1
+; AVX1-NEXT: vmovaps %xmm2, %xmm1
+; AVX1-NEXT: vmovntdqa 48(%rdi), %xmm2
+; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1
; AVX1-NEXT: retq
;
; AVX2-LABEL: test_load_nt32xi16:
@@ -1229,11 +1229,11 @@ define <16 x i32> @test_load_nt16xi32(<16 x i32>* nocapture %ptr) {
; AVX1-NEXT: vmovaps %xmm0, %xmm1
; AVX1-NEXT: vmovntdqa 16(%rdi), %xmm0
; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
-; AVX1-NEXT: vmovntdqa 32(%rdi), %xmm1
-; AVX1-NEXT: # implicit-def: $ymm2
-; AVX1-NEXT: vmovaps %xmm1, %xmm2
-; AVX1-NEXT: vmovntdqa 48(%rdi), %xmm1
-; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm2, %ymm1
+; AVX1-NEXT: vmovntdqa 32(%rdi), %xmm2
+; AVX1-NEXT: # implicit-def: $ymm1
+; AVX1-NEXT: vmovaps %xmm2, %xmm1
+; AVX1-NEXT: vmovntdqa 48(%rdi), %xmm2
+; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1
; AVX1-NEXT: retq
;
; AVX2-LABEL: test_load_nt16xi32:
@@ -1283,11 +1283,11 @@ define <8 x i64> @test_load_nt8xi64(<8 x i64>* nocapture %ptr) {
; AVX1-NEXT: vmovaps %xmm0, %xmm1
; AVX1-NEXT: vmovntdqa 16(%rdi), %xmm0
; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
-; AVX1-NEXT: vmovntdqa 32(%rdi), %xmm1
-; AVX1-NEXT: # implicit-def: $ymm2
-; AVX1-NEXT: vmovaps %xmm1, %xmm2
-; AVX1-NEXT: vmovntdqa 48(%rdi), %xmm1
-; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm2, %ymm1
+; AVX1-NEXT: vmovntdqa 32(%rdi), %xmm2
+; AVX1-NEXT: # implicit-def: $ymm1
+; AVX1-NEXT: vmovaps %xmm2, %xmm1
+; AVX1-NEXT: vmovntdqa 48(%rdi), %xmm2
+; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1
; AVX1-NEXT: retq
;
; AVX2-LABEL: test_load_nt8xi64:
diff --git a/llvm/test/CodeGen/X86/linux-preemption.ll b/llvm/test/CodeGen/X86/linux-preemption.ll
index 49a7becf13432..15265f4019924 100644
--- a/llvm/test/CodeGen/X86/linux-preemption.ll
+++ b/llvm/test/CodeGen/X86/linux-preemption.ll
@@ -20,6 +20,14 @@ define i32* @get_strong_default_global() {
; STATIC: movl $strong_default_global, %eax
; CHECK32: movl strong_default_global@GOT(%eax), %eax
+@strong_hidden_global = hidden global i32 42
+define i32* @get_hidden_default_global() {
+ ret i32* @strong_hidden_global
+}
+; CHECK: leaq strong_hidden_global(%rip), %rax
+; STATIC: movl $strong_hidden_global, %eax
+; CHECK32: leal strong_hidden_global@GOTOFF(%eax), %eax
+
@weak_default_global = weak global i32 42
define i32* @get_weak_default_global() {
ret i32* @weak_default_global
@@ -96,6 +104,14 @@ define i32* @get_strong_default_alias() {
; STATIC: movl $strong_default_alias, %eax
; CHECK32: movl strong_default_alias@GOT(%eax), %eax
+@strong_hidden_alias = hidden alias i32, i32* @aliasee
+define i32* @get_strong_hidden_alias() {
+ ret i32* @strong_hidden_alias
+}
+; CHECK: leaq strong_hidden_alias(%rip), %rax
+; STATIC: movl $strong_hidden_alias, %eax
+; CHECK32: leal strong_hidden_alias@GOTOFF(%eax), %eax
+
@weak_default_alias = weak alias i32, i32* @aliasee
define i32* @get_weak_default_alias() {
ret i32* @weak_default_alias
@@ -149,6 +165,16 @@ define void()* @get_strong_default_function() {
; STATIC: movl $strong_default_function, %eax
; CHECK32: movl strong_default_function@GOT(%eax), %eax
+define hidden void @strong_hidden_function() {
+ ret void
+}
+define void()* @get_strong_hidden_function() {
+ ret void()* @strong_hidden_function
+}
+; CHECK: leaq strong_hidden_function(%rip), %rax
+; STATIC: movl $strong_hidden_function, %eax
+; CHECK32: leal strong_hidden_function@GOTOFF(%eax), %eax
+
define weak void @weak_default_function() {
ret void
}
@@ -234,6 +260,9 @@ define void()* @get_external_preemptable_function() {
; COMMON: .globl strong_default_alias
; COMMON-NEXT: .set strong_default_alias, aliasee
+; COMMON-NEXT: .globl strong_hidden_alias
+; COMMON-NEXT: .hidden strong_hidden_alias
+; COMMON-NEXT: .set strong_hidden_alias, aliasee
; COMMON-NEXT: .weak weak_default_alias
; COMMON-NEXT: .set weak_default_alias, aliasee
; COMMON-NEXT: .globl strong_local_alias
diff --git a/llvm/test/CodeGen/X86/lvi-hardening-loads.ll b/llvm/test/CodeGen/X86/lvi-hardening-loads.ll
index ff8276f6f1c22..e660f306ef75b 100644
--- a/llvm/test/CodeGen/X86/lvi-hardening-loads.ll
+++ b/llvm/test/CodeGen/X86/lvi-hardening-loads.ll
@@ -117,9 +117,9 @@ if.then: ; preds = %for.body
; X64-NOOPT-NEXT: lfence
; X64-NOOPT-NEXT: movq (%rax,%rcx,8), %rax
; X64-NOOPT-NEXT: lfence
-; X64-NOOPT-NEXT: movl (%rax), %eax
+; X64-NOOPT-NEXT: movl (%rax), %edx
; X64-NOOPT-NEXT: lfence
-; X64-NOOPT-NEXT: movl %eax, -{{[0-9]+}}(%rsp)
+; X64-NOOPT-NEXT: movl %edx, -{{[0-9]+}}(%rsp)
if.end: ; preds = %if.then, %for.body
br label %for.inc
diff --git a/llvm/test/CodeGen/X86/machine-cp-mask-reg.mir b/llvm/test/CodeGen/X86/machine-cp-mask-reg.mir
new file mode 100644
index 0000000000000..86a077e64764f
--- /dev/null
+++ b/llvm/test/CodeGen/X86/machine-cp-mask-reg.mir
@@ -0,0 +1,59 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skx -run-pass=machine-cp -o - | FileCheck %s
+
+# machine-cp previously hit an assertion while trying to determine whether the
+# k0->eax copy below could be combined with the k0->rax copy.
+
+--- |
+ ; ModuleID = 'test.ll'
+ source_filename = "test.ll"
+ target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
+
+ define i8 @foo(<64 x i8> %x, i64* %y, i64 %z) #0 {
+ %a = icmp eq <64 x i8> %x, zeroinitializer
+ %b = bitcast <64 x i1> %a to i64
+ %c = add i64 %b, %z
+ store i64 %c, i64* %y, align 8
+ %d = extractelement <64 x i1> %a, i32 0
+ %e = zext i1 %d to i8
+ ret i8 %e
+ }
+
+ attributes #0 = { "target-cpu"="skx" }
+
+...
+---
+name: foo
+alignment: 16
+tracksRegLiveness: true
+liveins:
+ - { reg: '$zmm0' }
+ - { reg: '$rdi' }
+ - { reg: '$rsi' }
+frameInfo:
+ maxAlignment: 1
+machineFunctionInfo: {}
+body: |
+ bb.0 (%ir-block.0):
+ liveins: $rdi, $rsi, $zmm0
+
+ ; CHECK-LABEL: name: foo
+ ; CHECK: liveins: $rdi, $rsi, $zmm0
+ ; CHECK: renamable $k0 = VPTESTNMBZrr killed renamable $zmm0, renamable $zmm0
+ ; CHECK: renamable $rax = COPY renamable $k0
+ ; CHECK: renamable $rsi = ADD64rr killed renamable $rsi, killed renamable $rax, implicit-def dead $eflags
+ ; CHECK: MOV64mr killed renamable $rdi, 1, $noreg, 0, $noreg, killed renamable $rsi :: (store 8 into %ir.y)
+ ; CHECK: renamable $eax = COPY killed renamable $k0
+ ; CHECK: renamable $al = AND8ri renamable $al, 1, implicit-def dead $eflags, implicit killed $eax, implicit-def $eax
+ ; CHECK: $al = KILL renamable $al, implicit killed $eax
+ ; CHECK: RET 0, $al
+ renamable $k0 = VPTESTNMBZrr killed renamable $zmm0, renamable $zmm0
+ renamable $rax = COPY renamable $k0
+ renamable $rsi = ADD64rr killed renamable $rsi, killed renamable $rax, implicit-def dead $eflags
+ MOV64mr killed renamable $rdi, 1, $noreg, 0, $noreg, killed renamable $rsi :: (store 8 into %ir.y)
+ renamable $eax = COPY killed renamable $k0
+ renamable $al = AND8ri renamable $al, 1, implicit-def dead $eflags, implicit killed $eax, implicit-def $eax
+ $al = KILL renamable $al, implicit killed $eax
+ RET 0, $al
+
+...
diff --git a/llvm/test/CodeGen/X86/masked_gather_scatter.ll b/llvm/test/CodeGen/X86/masked_gather_scatter.ll
index df3af4c246596..b654b2a579fca 100644
--- a/llvm/test/CodeGen/X86/masked_gather_scatter.ll
+++ b/llvm/test/CodeGen/X86/masked_gather_scatter.ll
@@ -3319,3 +3319,51 @@ define void @scatter_16i64_constant_indices(i32* %ptr, <16 x i1> %mask, <16 x i3
call void @llvm.masked.scatter.v16i32.v16p0i32(<16 x i32> %src0, <16 x i32*> %gep, i32 4, <16 x i1> %mask)
ret void
}
+
+%struct.foo = type { i8*, i64, i16, i16, i32 }
+
+; This used to make fast-isel generate bad copy instructions that would
+; trigger an error in copyPhysReg.
+define <8 x i64> @pr45906(<8 x %struct.foo*> %ptr) {
+; KNL_64-LABEL: pr45906:
+; KNL_64: # %bb.0: # %bb
+; KNL_64-NEXT: vpaddq {{.*}}(%rip){1to8}, %zmm0, %zmm1
+; KNL_64-NEXT: kxnorw %k0, %k0, %k1
+; KNL_64-NEXT: vpgatherqq (,%zmm1), %zmm0 {%k1}
+; KNL_64-NEXT: retq
+;
+; KNL_32-LABEL: pr45906:
+; KNL_32: # %bb.0: # %bb
+; KNL_32-NEXT: vpbroadcastd {{.*#+}} ymm1 = [4,4,4,4,4,4,4,4]
+; KNL_32-NEXT: vpaddd %ymm1, %ymm0, %ymm1
+; KNL_32-NEXT: kxnorw %k0, %k0, %k1
+; KNL_32-NEXT: vpgatherdq (,%ymm1), %zmm0 {%k1}
+; KNL_32-NEXT: retl
+;
+; SKX_SMALL-LABEL: pr45906:
+; SKX_SMALL: # %bb.0: # %bb
+; SKX_SMALL-NEXT: vpaddq {{.*}}(%rip){1to8}, %zmm0, %zmm1
+; SKX_SMALL-NEXT: kxnorw %k0, %k0, %k1
+; SKX_SMALL-NEXT: vpgatherqq (,%zmm1), %zmm0 {%k1}
+; SKX_SMALL-NEXT: retq
+;
+; SKX_LARGE-LABEL: pr45906:
+; SKX_LARGE: # %bb.0: # %bb
+; SKX_LARGE-NEXT: movabsq ${{\.LCPI.*}}, %rax
+; SKX_LARGE-NEXT: vpaddq (%rax){1to8}, %zmm0, %zmm1
+; SKX_LARGE-NEXT: kxnorw %k0, %k0, %k1
+; SKX_LARGE-NEXT: vpgatherqq (,%zmm1), %zmm0 {%k1}
+; SKX_LARGE-NEXT: retq
+;
+; SKX_32-LABEL: pr45906:
+; SKX_32: # %bb.0: # %bb
+; SKX_32-NEXT: vpaddd {{\.LCPI.*}}{1to8}, %ymm0, %ymm1
+; SKX_32-NEXT: kxnorw %k0, %k0, %k1
+; SKX_32-NEXT: vpgatherdq (,%ymm1), %zmm0 {%k1}
+; SKX_32-NEXT: retl
+bb:
+ %tmp = getelementptr inbounds %struct.foo, <8 x %struct.foo*> %ptr, i64 0, i32 1
+ %tmp1 = call <8 x i64> @llvm.masked.gather.v8i64.v8p0i64(<8 x i64*> %tmp, i32 8, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i64> undef)
+ ret <8 x i64> %tmp1
+}
+declare <8 x i64> @llvm.masked.gather.v8i64.v8p0i64(<8 x i64*>, i32, <8 x i1>, <8 x i64>)
diff --git a/llvm/test/CodeGen/X86/mixed-ptr-sizes.ll b/llvm/test/CodeGen/X86/mixed-ptr-sizes.ll
index ac55e1a1fc653..a1ad7f3c0f534 100644
--- a/llvm/test/CodeGen/X86/mixed-ptr-sizes.ll
+++ b/llvm/test/CodeGen/X86/mixed-ptr-sizes.ll
@@ -69,8 +69,8 @@ define dso_local void @test_zero_ext(%struct.Foo* %f, i32 addrspace(271)* %i) {
; CHECK-O0-LABEL: test_zero_ext:
; CHECK-O0: # %bb.0: # %entry
; CHECK-O0-NEXT: movl %edx, %eax
-; CHECK-O0-NEXT: # kill: def $rax killed $eax
-; CHECK-O0-NEXT: movq %rax, 8(%rcx)
+; CHECK-O0-NEXT: movl %eax, %r8d
+; CHECK-O0-NEXT: movq %r8, 8(%rcx)
; CHECK-O0-NEXT: jmp use_foo # TAILCALL
entry:
%0 = addrspacecast i32 addrspace(271)* %i to i32*
@@ -125,23 +125,19 @@ entry:
; Test that null can be passed as a 32-bit pointer.
define dso_local void @test_null_arg(%struct.Foo* %f) {
-; CHECK-LABEL: test_null_arg:
-; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: subq $40, %rsp
-; CHECK: xorl %edx, %edx
-; CHECK-NEXT: callq test_noop1
-; CHECK-NEXT: nop
-; CHECK-NEXT: addq $40, %rsp
-; CHECK-NEXT: retq
-;
-; CHECK-O0-LABEL: test_null_arg:
-; CHECK-O0: # %bb.0: # %entry
-; CHECK-O0-NEXT: subq $40, %rsp
-; CHECK-O0: xorl %edx, %edx
-; CHECK-O0-NEXT: callq test_noop1
-; CHECK-O0-NEXT: nop
-; CHECK-O0-NEXT: addq $40, %rsp
-; CHECK-O0-NEXT: retq
+; ALL-LABEL: test_null_arg:
+; ALL: # %bb.0: # %entry
+; ALL-NEXT: subq $40, %rsp
+; ALL-NEXT: .seh_stackalloc 40
+; ALL-NEXT: .seh_endprologue
+; ALL-NEXT: xorl %edx, %edx
+; ALL-NEXT: callq test_noop1
+; ALL-NEXT: nop
+; ALL-NEXT: addq $40, %rsp
+; ALL-NEXT: retq
+; ALL-NEXT: .seh_handlerdata
+; ALL-NEXT: .text
+; ALL-NEXT: .seh_endproc
entry:
call void @test_noop1(%struct.Foo* %f, i32 addrspace(270)* null)
ret void
@@ -177,8 +173,8 @@ define void @test_unrecognized2(%struct.Foo* %f, i32 addrspace(271)* %i) {
; CHECK-O0-LABEL: test_unrecognized2:
; CHECK-O0: # %bb.0: # %entry
; CHECK-O0-NEXT: movl %edx, %eax
-; CHECK-O0-NEXT: # kill: def $rax killed $eax
-; CHECK-O0-NEXT: movq %rax, 16(%rcx)
+; CHECK-O0-NEXT: movl %eax, %r8d
+; CHECK-O0-NEXT: movq %r8, 16(%rcx)
; CHECK-O0-NEXT: jmp use_foo # TAILCALL
entry:
%0 = addrspacecast i32 addrspace(271)* %i to i32 addrspace(9)*
@@ -189,16 +185,11 @@ entry:
}
define i32 @test_load_sptr32(i32 addrspace(270)* %i) {
-; CHECK-LABEL: test_load_sptr32:
-; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: movslq %ecx, %rax
-; CHECK-NEXT: movl (%rax), %eax
-; CHECK-NEXT: retq
-; CHECK-O0-LABEL: test_load_sptr32:
-; CHECK-O0: # %bb.0: # %entry
-; CHECK-O0-NEXT: movslq %ecx, %rax
-; CHECK-O0-NEXT: movl (%rax), %eax
-; CHECK-O0-NEXT: retq
+; ALL-LABEL: test_load_sptr32:
+; ALL: # %bb.0: # %entry
+; ALL-NEXT: movslq %ecx, %rax
+; ALL-NEXT: movl (%rax), %eax
+; ALL-NEXT: retq
entry:
%0 = load i32, i32 addrspace(270)* %i, align 4
ret i32 %0
@@ -210,11 +201,12 @@ define i32 @test_load_uptr32(i32 addrspace(271)* %i) {
; CHECK-NEXT: movl %ecx, %eax
; CHECK-NEXT: movl (%rax), %eax
; CHECK-NEXT: retq
+;
; CHECK-O0-LABEL: test_load_uptr32:
; CHECK-O0: # %bb.0: # %entry
; CHECK-O0-NEXT: movl %ecx, %eax
-; CHECK-O0-NEXT: # kill: def $rax killed $eax
-; CHECK-O0-NEXT: movl (%rax), %eax
+; CHECK-O0-NEXT: movl %eax, %edx
+; CHECK-O0-NEXT: movl (%rdx), %eax
; CHECK-O0-NEXT: retq
entry:
%0 = load i32, i32 addrspace(271)* %i, align 4
@@ -222,30 +214,21 @@ entry:
}
define i32 @test_load_ptr64(i32 addrspace(272)* %i) {
-; CHECK-LABEL: test_load_ptr64:
-; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: movl (%rcx), %eax
-; CHECK-NEXT: retq
-; CHECK-O0-LABEL: test_load_ptr64:
-; CHECK-O0: # %bb.0: # %entry
-; CHECK-O0-NEXT: movl (%rcx), %eax
-; CHECK-O0-NEXT: retq
+; ALL-LABEL: test_load_ptr64:
+; ALL: # %bb.0: # %entry
+; ALL-NEXT: movl (%rcx), %eax
+; ALL-NEXT: retq
entry:
%0 = load i32, i32 addrspace(272)* %i, align 8
ret i32 %0
}
define void @test_store_sptr32(i32 addrspace(270)* %s, i32 %i) {
-; CHECK-LABEL: test_store_sptr32:
-; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: movslq %ecx, %rax
-; CHECK-NEXT: movl %edx, (%rax)
-; CHECK-NEXT: retq
-; CHECK-O0-LABEL: test_store_sptr32:
-; CHECK-O0: # %bb.0: # %entry
-; CHECK-O0-NEXT: movslq %ecx, %rax
-; CHECK-O0-NEXT: movl %edx, (%rax)
-; CHECK-O0-NEXT: retq
+; ALL-LABEL: test_store_sptr32:
+; ALL: # %bb.0: # %entry
+; ALL-NEXT: movslq %ecx, %rax
+; ALL-NEXT: movl %edx, (%rax)
+; ALL-NEXT: retq
entry:
store i32 %i, i32 addrspace(270)* %s, align 4
ret void
@@ -257,11 +240,12 @@ define void @test_store_uptr32(i32 addrspace(271)* %s, i32 %i) {
; CHECK-NEXT: movl %ecx, %eax
; CHECK-NEXT: movl %edx, (%rax)
; CHECK-NEXT: retq
+;
; CHECK-O0-LABEL: test_store_uptr32:
; CHECK-O0: # %bb.0: # %entry
; CHECK-O0-NEXT: movl %ecx, %eax
-; CHECK-O0-NEXT: # kill: def $rax killed $eax
-; CHECK-O0-NEXT: movl %edx, (%rax)
+; CHECK-O0-NEXT: movl %eax, %r8d
+; CHECK-O0-NEXT: movl %edx, (%r8)
; CHECK-O0-NEXT: retq
entry:
store i32 %i, i32 addrspace(271)* %s, align 4
@@ -269,14 +253,10 @@ entry:
}
define void @test_store_ptr64(i32 addrspace(272)* %s, i32 %i) {
-; CHECK-LABEL: test_store_ptr64:
-; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: movl %edx, (%rcx)
-; CHECK-NEXT: retq
-; CHECK-O0-LABEL: test_store_ptr64:
-; CHECK-O0: # %bb.0: # %entry
-; CHECK-O0-NEXT: movl %edx, (%rcx)
-; CHECK-O0-NEXT: retq
+; ALL-LABEL: test_store_ptr64:
+; ALL: # %bb.0: # %entry
+; ALL-NEXT: movl %edx, (%rcx)
+; ALL-NEXT: retq
entry:
store i32 %i, i32 addrspace(272)* %s, align 8
ret void
diff --git a/llvm/test/CodeGen/X86/pr1489.ll b/llvm/test/CodeGen/X86/pr1489.ll
index d1148eecb0da9..6226ea6caf90f 100644
--- a/llvm/test/CodeGen/X86/pr1489.ll
+++ b/llvm/test/CodeGen/X86/pr1489.ll
@@ -16,9 +16,9 @@ define i32 @quux() nounwind {
; CHECK-NEXT: movl $1082126238, (%eax) ## imm = 0x407FEF9E
; CHECK-NEXT: calll _lrintf
; CHECK-NEXT: cmpl $1, %eax
-; CHECK-NEXT: setl %al
-; CHECK-NEXT: andb $1, %al
-; CHECK-NEXT: movzbl %al, %eax
+; CHECK-NEXT: setl %cl
+; CHECK-NEXT: andb $1, %cl
+; CHECK-NEXT: movzbl %cl, %eax
; CHECK-NEXT: addl $8, %esp
; CHECK-NEXT: popl %ebp
; CHECK-NEXT: retl
@@ -42,9 +42,9 @@ define i32 @foo() nounwind {
; CHECK-NEXT: movl $-1236950581, (%eax) ## imm = 0xB645A1CB
; CHECK-NEXT: calll _lrint
; CHECK-NEXT: cmpl $1, %eax
-; CHECK-NEXT: setl %al
-; CHECK-NEXT: andb $1, %al
-; CHECK-NEXT: movzbl %al, %eax
+; CHECK-NEXT: setl %cl
+; CHECK-NEXT: andb $1, %cl
+; CHECK-NEXT: movzbl %cl, %eax
; CHECK-NEXT: addl $8, %esp
; CHECK-NEXT: popl %ebp
; CHECK-NEXT: retl
@@ -67,9 +67,9 @@ define i32 @bar() nounwind {
; CHECK-NEXT: movl $1082126238, (%eax) ## imm = 0x407FEF9E
; CHECK-NEXT: calll _lrintf
; CHECK-NEXT: cmpl $1, %eax
-; CHECK-NEXT: setl %al
-; CHECK-NEXT: andb $1, %al
-; CHECK-NEXT: movzbl %al, %eax
+; CHECK-NEXT: setl %cl
+; CHECK-NEXT: andb $1, %cl
+; CHECK-NEXT: movzbl %cl, %eax
; CHECK-NEXT: addl $8, %esp
; CHECK-NEXT: popl %ebp
; CHECK-NEXT: retl
@@ -90,9 +90,9 @@ define i32 @baz() nounwind {
; CHECK-NEXT: movl $1082126238, (%eax) ## imm = 0x407FEF9E
; CHECK-NEXT: calll _lrintf
; CHECK-NEXT: cmpl $1, %eax
-; CHECK-NEXT: setl %al
-; CHECK-NEXT: andb $1, %al
-; CHECK-NEXT: movzbl %al, %eax
+; CHECK-NEXT: setl %cl
+; CHECK-NEXT: andb $1, %cl
+; CHECK-NEXT: movzbl %cl, %eax
; CHECK-NEXT: addl $8, %esp
; CHECK-NEXT: popl %ebp
; CHECK-NEXT: retl
diff --git a/llvm/test/CodeGen/X86/pr27591.ll b/llvm/test/CodeGen/X86/pr27591.ll
index 7455584ac698a..97ad6814f1926 100644
--- a/llvm/test/CodeGen/X86/pr27591.ll
+++ b/llvm/test/CodeGen/X86/pr27591.ll
@@ -9,9 +9,9 @@ define void @test1(i32 %x) #0 {
; CHECK-NEXT: pushq %rax
; CHECK-NEXT: cmpl $0, %edi
; CHECK-NEXT: setne %al
-; CHECK-NEXT: movzbl %al, %eax
-; CHECK-NEXT: andl $1, %eax
-; CHECK-NEXT: movl %eax, %edi
+; CHECK-NEXT: movzbl %al, %ecx
+; CHECK-NEXT: andl $1, %ecx
+; CHECK-NEXT: movl %ecx, %edi
; CHECK-NEXT: callq callee1
; CHECK-NEXT: popq %rax
; CHECK-NEXT: retq
@@ -27,10 +27,10 @@ define void @test2(i32 %x) #0 {
; CHECK-NEXT: pushq %rax
; CHECK-NEXT: cmpl $0, %edi
; CHECK-NEXT: setne %al
-; CHECK-NEXT: movzbl %al, %eax
-; CHECK-NEXT: andl $1, %eax
-; CHECK-NEXT: negl %eax
-; CHECK-NEXT: movl %eax, %edi
+; CHECK-NEXT: movzbl %al, %ecx
+; CHECK-NEXT: andl $1, %ecx
+; CHECK-NEXT: negl %ecx
+; CHECK-NEXT: movl %ecx, %edi
; CHECK-NEXT: callq callee2
; CHECK-NEXT: popq %rax
; CHECK-NEXT: retq
diff --git a/llvm/test/CodeGen/X86/pr30430.ll b/llvm/test/CodeGen/X86/pr30430.ll
index e524245daa112..4d40aa09eeab1 100644
--- a/llvm/test/CodeGen/X86/pr30430.ll
+++ b/llvm/test/CodeGen/X86/pr30430.ll
@@ -75,28 +75,28 @@ define <16 x float> @makefloat(float %f1, float %f2, float %f3, float %f4, float
; CHECK-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1,2],xmm2[0]
; CHECK-NEXT: # implicit-def: $ymm2
; CHECK-NEXT: vmovaps %xmm1, %xmm2
-; CHECK-NEXT: vinsertf128 $1, %xmm0, %ymm2, %ymm0
+; CHECK-NEXT: vinsertf128 $1, %xmm0, %ymm2, %ymm2
+; CHECK-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
+; CHECK-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
+; CHECK-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
+; CHECK-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0],xmm0[3]
+; CHECK-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
+; CHECK-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[0]
; CHECK-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; CHECK-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
-; CHECK-NEXT: vinsertps {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[2,3]
-; CHECK-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
-; CHECK-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm2[0],xmm1[3]
-; CHECK-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
-; CHECK-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1,2],xmm2[0]
-; CHECK-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
; CHECK-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero
-; CHECK-NEXT: vinsertps {{.*#+}} xmm2 = xmm3[0],xmm2[0],xmm3[2,3]
+; CHECK-NEXT: vinsertps {{.*#+}} xmm1 = xmm3[0],xmm1[0],xmm3[2,3]
; CHECK-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero
-; CHECK-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0,1],xmm3[0],xmm2[3]
+; CHECK-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm3[0],xmm1[3]
; CHECK-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero
-; CHECK-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0,1,2],xmm3[0]
+; CHECK-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1,2],xmm3[0]
; CHECK-NEXT: # implicit-def: $ymm3
-; CHECK-NEXT: vmovaps %xmm2, %xmm3
-; CHECK-NEXT: vinsertf128 $1, %xmm1, %ymm3, %ymm1
-; CHECK-NEXT: # implicit-def: $zmm2
-; CHECK-NEXT: vmovaps %ymm1, %ymm2
-; CHECK-NEXT: vinsertf64x4 $1, %ymm0, %zmm2, %zmm0
-; CHECK-NEXT: vmovaps %zmm0, {{[0-9]+}}(%rsp)
+; CHECK-NEXT: vmovaps %xmm1, %xmm3
+; CHECK-NEXT: vinsertf128 $1, %xmm0, %ymm3, %ymm3
+; CHECK-NEXT: # implicit-def: $zmm24
+; CHECK-NEXT: vmovaps %zmm3, %zmm24
+; CHECK-NEXT: vinsertf64x4 $1, %ymm2, %zmm24, %zmm24
+; CHECK-NEXT: vmovaps %zmm24, {{[0-9]+}}(%rsp)
; CHECK-NEXT: vmovaps {{[0-9]+}}(%rsp), %zmm0
; CHECK-NEXT: movq %rbp, %rsp
; CHECK-NEXT: popq %rbp
diff --git a/llvm/test/CodeGen/X86/pr30813.ll b/llvm/test/CodeGen/X86/pr30813.ll
index 7266c5bd8d015..e3e096bda6c28 100644
--- a/llvm/test/CodeGen/X86/pr30813.ll
+++ b/llvm/test/CodeGen/X86/pr30813.ll
@@ -1,8 +1,9 @@
; RUN: llc -mtriple=x86_64-linux-gnu -O0 %s -o - | FileCheck %s
; CHECK: patatino:
; CHECK: .cfi_startproc
-; CHECK: movzwl (%rax), %e[[REG0:[abcd]x]]
-; CHECK: movq %r[[REG0]], ({{%r[abcd]x}})
+; CHECK: movzwl (%rax), [[REG0:%e[abcd]x]]
+; CHECK: movl [[REG0]], %e[[REG1C:[abcd]]]x
+; CHECK: movq %r[[REG1C]]x, ({{%r[abcd]x}})
; CHECK: retq
define void @patatino() {
diff --git a/llvm/test/CodeGen/X86/pr32241.ll b/llvm/test/CodeGen/X86/pr32241.ll
index 1f3d273dfc416..6d628e6962eda 100644
--- a/llvm/test/CodeGen/X86/pr32241.ll
+++ b/llvm/test/CodeGen/X86/pr32241.ll
@@ -23,14 +23,14 @@ define i32 @_Z3foov() {
; CHECK-NEXT: .LBB0_2: # %lor.end
; CHECK-NEXT: movb {{[-0-9]+}}(%e{{[sb]}}p), %al # 1-byte Reload
; CHECK-NEXT: andb $1, %al
-; CHECK-NEXT: movzbl %al, %eax
-; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
-; CHECK-NEXT: cmpl %eax, %ecx
+; CHECK-NEXT: movzbl %al, %ecx
+; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Reload
+; CHECK-NEXT: cmpl %ecx, %edx
; CHECK-NEXT: setl %al
; CHECK-NEXT: andb $1, %al
-; CHECK-NEXT: movzbl %al, %eax
-; CHECK-NEXT: xorl $-1, %eax
-; CHECK-NEXT: cmpl $0, %eax
+; CHECK-NEXT: movzbl %al, %ecx
+; CHECK-NEXT: xorl $-1, %ecx
+; CHECK-NEXT: cmpl $0, %ecx
; CHECK-NEXT: movb $1, %al
; CHECK-NEXT: movb %al, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Spill
; CHECK-NEXT: jne .LBB0_4
@@ -42,9 +42,9 @@ define i32 @_Z3foov() {
; CHECK-NEXT: .LBB0_4: # %lor.end5
; CHECK-NEXT: movb {{[-0-9]+}}(%e{{[sb]}}p), %al # 1-byte Reload
; CHECK-NEXT: andb $1, %al
-; CHECK-NEXT: movzbl %al, %eax
-; CHECK-NEXT: # kill: def $ax killed $ax killed $eax
-; CHECK-NEXT: movw %ax, {{[0-9]+}}(%esp)
+; CHECK-NEXT: movzbl %al, %ecx
+; CHECK-NEXT: # kill: def $cx killed $cx killed $ecx
+; CHECK-NEXT: movw %cx, {{[0-9]+}}(%esp)
; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %eax
; CHECK-NEXT: addl $16, %esp
; CHECK-NEXT: .cfi_def_cfa_offset 4
diff --git a/llvm/test/CodeGen/X86/pr32284.ll b/llvm/test/CodeGen/X86/pr32284.ll
index 533473663d73b..a1041ab889c23 100644
--- a/llvm/test/CodeGen/X86/pr32284.ll
+++ b/llvm/test/CodeGen/X86/pr32284.ll
@@ -10,28 +10,28 @@ define void @foo() {
; X86-O0-LABEL: foo:
; X86-O0: # %bb.0: # %entry
; X86-O0-NEXT: xorl %eax, %eax
-; X86-O0-NEXT: # kill: def $rax killed $eax
-; X86-O0-NEXT: xorl %ecx, %ecx
+; X86-O0-NEXT: movl %eax, %ecx
+; X86-O0-NEXT: xorl %eax, %eax
; X86-O0-NEXT: movzbl c, %edx
-; X86-O0-NEXT: subl %edx, %ecx
-; X86-O0-NEXT: movslq %ecx, %rcx
-; X86-O0-NEXT: subq %rcx, %rax
-; X86-O0-NEXT: # kill: def $al killed $al killed $rax
-; X86-O0-NEXT: cmpb $0, %al
-; X86-O0-NEXT: setne %al
-; X86-O0-NEXT: andb $1, %al
-; X86-O0-NEXT: movb %al, -{{[0-9]+}}(%rsp)
+; X86-O0-NEXT: subl %edx, %eax
+; X86-O0-NEXT: movslq %eax, %rsi
+; X86-O0-NEXT: subq %rsi, %rcx
+; X86-O0-NEXT: # kill: def $cl killed $cl killed $rcx
+; X86-O0-NEXT: cmpb $0, %cl
+; X86-O0-NEXT: setne %cl
+; X86-O0-NEXT: andb $1, %cl
+; X86-O0-NEXT: movb %cl, -{{[0-9]+}}(%rsp)
; X86-O0-NEXT: cmpb $0, c
-; X86-O0-NEXT: setne %al
-; X86-O0-NEXT: xorb $-1, %al
-; X86-O0-NEXT: xorb $-1, %al
-; X86-O0-NEXT: andb $1, %al
-; X86-O0-NEXT: movzbl %al, %eax
-; X86-O0-NEXT: movzbl c, %ecx
-; X86-O0-NEXT: cmpl %ecx, %eax
-; X86-O0-NEXT: setle %al
-; X86-O0-NEXT: andb $1, %al
-; X86-O0-NEXT: movzbl %al, %eax
+; X86-O0-NEXT: setne %cl
+; X86-O0-NEXT: xorb $-1, %cl
+; X86-O0-NEXT: xorb $-1, %cl
+; X86-O0-NEXT: andb $1, %cl
+; X86-O0-NEXT: movzbl %cl, %eax
+; X86-O0-NEXT: movzbl c, %edx
+; X86-O0-NEXT: cmpl %edx, %eax
+; X86-O0-NEXT: setle %cl
+; X86-O0-NEXT: andb $1, %cl
+; X86-O0-NEXT: movzbl %cl, %eax
; X86-O0-NEXT: movl %eax, -{{[0-9]+}}(%rsp)
; X86-O0-NEXT: retq
;
@@ -63,13 +63,13 @@ define void @foo() {
; 686-O0-NEXT: xorb $-1, %al
; 686-O0-NEXT: xorb $-1, %al
; 686-O0-NEXT: andb $1, %al
-; 686-O0-NEXT: movzbl %al, %eax
-; 686-O0-NEXT: movzbl c, %ecx
-; 686-O0-NEXT: cmpl %ecx, %eax
+; 686-O0-NEXT: movzbl %al, %ecx
+; 686-O0-NEXT: movzbl c, %edx
+; 686-O0-NEXT: cmpl %edx, %ecx
; 686-O0-NEXT: setle %al
; 686-O0-NEXT: andb $1, %al
-; 686-O0-NEXT: movzbl %al, %eax
-; 686-O0-NEXT: movl %eax, (%esp)
+; 686-O0-NEXT: movzbl %al, %ecx
+; 686-O0-NEXT: movl %ecx, (%esp)
; 686-O0-NEXT: addl $8, %esp
; 686-O0-NEXT: .cfi_def_cfa_offset 4
; 686-O0-NEXT: retl
@@ -126,33 +126,33 @@ define void @f1() {
; X86-O0-NEXT: movabsq $8381627093, %rcx # imm = 0x1F3957AD5
; X86-O0-NEXT: addq %rcx, %rax
; X86-O0-NEXT: cmpq $0, %rax
-; X86-O0-NEXT: setne %al
-; X86-O0-NEXT: andb $1, %al
-; X86-O0-NEXT: movb %al, -{{[0-9]+}}(%rsp)
-; X86-O0-NEXT: movl var_5, %eax
-; X86-O0-NEXT: xorl $-1, %eax
-; X86-O0-NEXT: cmpl $0, %eax
-; X86-O0-NEXT: setne %al
-; X86-O0-NEXT: xorb $-1, %al
-; X86-O0-NEXT: andb $1, %al
-; X86-O0-NEXT: movzbl %al, %eax
-; X86-O0-NEXT: # kill: def $rax killed $eax
+; X86-O0-NEXT: setne %dl
+; X86-O0-NEXT: andb $1, %dl
+; X86-O0-NEXT: movb %dl, -{{[0-9]+}}(%rsp)
+; X86-O0-NEXT: movl var_5, %esi
+; X86-O0-NEXT: xorl $-1, %esi
+; X86-O0-NEXT: cmpl $0, %esi
+; X86-O0-NEXT: setne %dl
+; X86-O0-NEXT: xorb $-1, %dl
+; X86-O0-NEXT: andb $1, %dl
+; X86-O0-NEXT: movzbl %dl, %esi
+; X86-O0-NEXT: movl %esi, %eax
; X86-O0-NEXT: movslq var_5, %rcx
; X86-O0-NEXT: addq $7093, %rcx # imm = 0x1BB5
; X86-O0-NEXT: cmpq %rcx, %rax
-; X86-O0-NEXT: setg %al
-; X86-O0-NEXT: andb $1, %al
-; X86-O0-NEXT: movzbl %al, %eax
-; X86-O0-NEXT: # kill: def $rax killed $eax
+; X86-O0-NEXT: setg %dl
+; X86-O0-NEXT: andb $1, %dl
+; X86-O0-NEXT: movzbl %dl, %esi
+; X86-O0-NEXT: movl %esi, %eax
; X86-O0-NEXT: movq %rax, var_57
-; X86-O0-NEXT: movl var_5, %eax
-; X86-O0-NEXT: xorl $-1, %eax
-; X86-O0-NEXT: cmpl $0, %eax
-; X86-O0-NEXT: setne %al
-; X86-O0-NEXT: xorb $-1, %al
-; X86-O0-NEXT: andb $1, %al
-; X86-O0-NEXT: movzbl %al, %eax
-; X86-O0-NEXT: # kill: def $rax killed $eax
+; X86-O0-NEXT: movl var_5, %esi
+; X86-O0-NEXT: xorl $-1, %esi
+; X86-O0-NEXT: cmpl $0, %esi
+; X86-O0-NEXT: setne %dl
+; X86-O0-NEXT: xorb $-1, %dl
+; X86-O0-NEXT: andb $1, %dl
+; X86-O0-NEXT: movzbl %dl, %esi
+; X86-O0-NEXT: movl %esi, %eax
; X86-O0-NEXT: movq %rax, _ZN8struct_210member_2_0E
; X86-O0-NEXT: retq
;
@@ -178,17 +178,20 @@ define void @f1() {
;
; 686-O0-LABEL: f1:
; 686-O0: # %bb.0: # %entry
-; 686-O0-NEXT: pushl %ebx
+; 686-O0-NEXT: pushl %ebp
; 686-O0-NEXT: .cfi_def_cfa_offset 8
-; 686-O0-NEXT: pushl %edi
+; 686-O0-NEXT: pushl %ebx
; 686-O0-NEXT: .cfi_def_cfa_offset 12
-; 686-O0-NEXT: pushl %esi
+; 686-O0-NEXT: pushl %edi
; 686-O0-NEXT: .cfi_def_cfa_offset 16
+; 686-O0-NEXT: pushl %esi
+; 686-O0-NEXT: .cfi_def_cfa_offset 20
; 686-O0-NEXT: subl $1, %esp
-; 686-O0-NEXT: .cfi_def_cfa_offset 17
-; 686-O0-NEXT: .cfi_offset %esi, -16
-; 686-O0-NEXT: .cfi_offset %edi, -12
-; 686-O0-NEXT: .cfi_offset %ebx, -8
+; 686-O0-NEXT: .cfi_def_cfa_offset 21
+; 686-O0-NEXT: .cfi_offset %esi, -20
+; 686-O0-NEXT: .cfi_offset %edi, -16
+; 686-O0-NEXT: .cfi_offset %ebx, -12
+; 686-O0-NEXT: .cfi_offset %ebp, -8
; 686-O0-NEXT: movl var_5, %eax
; 686-O0-NEXT: movl %eax, %ecx
; 686-O0-NEXT: sarl $31, %ecx
@@ -214,16 +217,18 @@ define void @f1() {
; 686-O0-NEXT: movl var_5, %edi
; 686-O0-NEXT: subl $-1, %edi
; 686-O0-NEXT: sete %bl
-; 686-O0-NEXT: movzbl %bl, %ebx
-; 686-O0-NEXT: movl %ebx, _ZN8struct_210member_2_0E
+; 686-O0-NEXT: movzbl %bl, %ebp
+; 686-O0-NEXT: movl %ebp, _ZN8struct_210member_2_0E
; 686-O0-NEXT: movl $0, _ZN8struct_210member_2_0E+4
; 686-O0-NEXT: addl $1, %esp
-; 686-O0-NEXT: .cfi_def_cfa_offset 16
+; 686-O0-NEXT: .cfi_def_cfa_offset 20
; 686-O0-NEXT: popl %esi
-; 686-O0-NEXT: .cfi_def_cfa_offset 12
+; 686-O0-NEXT: .cfi_def_cfa_offset 16
; 686-O0-NEXT: popl %edi
-; 686-O0-NEXT: .cfi_def_cfa_offset 8
+; 686-O0-NEXT: .cfi_def_cfa_offset 12
; 686-O0-NEXT: popl %ebx
+; 686-O0-NEXT: .cfi_def_cfa_offset 8
+; 686-O0-NEXT: popl %ebp
; 686-O0-NEXT: .cfi_def_cfa_offset 4
; 686-O0-NEXT: retl
;
@@ -305,25 +310,25 @@ define void @f2() {
; X86-O0-NEXT: setne %cl
; X86-O0-NEXT: xorb $-1, %cl
; X86-O0-NEXT: andb $1, %cl
-; X86-O0-NEXT: movzbl %cl, %ecx
-; X86-O0-NEXT: xorl %ecx, %eax
+; X86-O0-NEXT: movzbl %cl, %edx
+; X86-O0-NEXT: xorl %edx, %eax
; X86-O0-NEXT: # kill: def $ax killed $ax killed $eax
; X86-O0-NEXT: movw %ax, -{{[0-9]+}}(%rsp)
-; X86-O0-NEXT: movzbl var_7, %eax
-; X86-O0-NEXT: # kill: def $ax killed $ax killed $eax
-; X86-O0-NEXT: cmpw $0, %ax
-; X86-O0-NEXT: setne %al
-; X86-O0-NEXT: xorb $-1, %al
-; X86-O0-NEXT: andb $1, %al
-; X86-O0-NEXT: movzbl %al, %eax
-; X86-O0-NEXT: movzbl var_7, %ecx
-; X86-O0-NEXT: cmpl %ecx, %eax
-; X86-O0-NEXT: sete %al
-; X86-O0-NEXT: andb $1, %al
-; X86-O0-NEXT: movzbl %al, %eax
-; X86-O0-NEXT: # kill: def $ax killed $ax killed $eax
-; X86-O0-NEXT: # implicit-def: $rcx
-; X86-O0-NEXT: movw %ax, (%rcx)
+; X86-O0-NEXT: movzbl var_7, %edx
+; X86-O0-NEXT: # kill: def $dx killed $dx killed $edx
+; X86-O0-NEXT: cmpw $0, %dx
+; X86-O0-NEXT: setne %cl
+; X86-O0-NEXT: xorb $-1, %cl
+; X86-O0-NEXT: andb $1, %cl
+; X86-O0-NEXT: movzbl %cl, %esi
+; X86-O0-NEXT: movzbl var_7, %edi
+; X86-O0-NEXT: cmpl %edi, %esi
+; X86-O0-NEXT: sete %cl
+; X86-O0-NEXT: andb $1, %cl
+; X86-O0-NEXT: movzbl %cl, %esi
+; X86-O0-NEXT: # kill: def $si killed $si killed $esi
+; X86-O0-NEXT: # implicit-def: $r8
+; X86-O0-NEXT: movw %si, (%r8)
; X86-O0-NEXT: retq
;
; X64-LABEL: f2:
@@ -345,33 +350,43 @@ define void @f2() {
;
; 686-O0-LABEL: f2:
; 686-O0: # %bb.0: # %entry
+; 686-O0-NEXT: pushl %edi
+; 686-O0-NEXT: .cfi_def_cfa_offset 8
+; 686-O0-NEXT: pushl %esi
+; 686-O0-NEXT: .cfi_def_cfa_offset 12
; 686-O0-NEXT: subl $2, %esp
-; 686-O0-NEXT: .cfi_def_cfa_offset 6
+; 686-O0-NEXT: .cfi_def_cfa_offset 14
+; 686-O0-NEXT: .cfi_offset %esi, -12
+; 686-O0-NEXT: .cfi_offset %edi, -8
; 686-O0-NEXT: movzbl var_7, %eax
; 686-O0-NEXT: cmpb $0, var_7
; 686-O0-NEXT: setne %cl
; 686-O0-NEXT: xorb $-1, %cl
; 686-O0-NEXT: andb $1, %cl
-; 686-O0-NEXT: movzbl %cl, %ecx
-; 686-O0-NEXT: xorl %ecx, %eax
+; 686-O0-NEXT: movzbl %cl, %edx
+; 686-O0-NEXT: xorl %edx, %eax
; 686-O0-NEXT: # kill: def $ax killed $ax killed $eax
; 686-O0-NEXT: movw %ax, (%esp)
-; 686-O0-NEXT: movzbl var_7, %eax
-; 686-O0-NEXT: # kill: def $ax killed $ax killed $eax
-; 686-O0-NEXT: cmpw $0, %ax
-; 686-O0-NEXT: setne %al
-; 686-O0-NEXT: xorb $-1, %al
-; 686-O0-NEXT: andb $1, %al
-; 686-O0-NEXT: movzbl %al, %eax
-; 686-O0-NEXT: movzbl var_7, %ecx
-; 686-O0-NEXT: cmpl %ecx, %eax
-; 686-O0-NEXT: sete %al
-; 686-O0-NEXT: andb $1, %al
-; 686-O0-NEXT: movzbl %al, %eax
-; 686-O0-NEXT: # kill: def $ax killed $ax killed $eax
-; 686-O0-NEXT: # implicit-def: $ecx
-; 686-O0-NEXT: movw %ax, (%ecx)
+; 686-O0-NEXT: movzbl var_7, %edx
+; 686-O0-NEXT: # kill: def $dx killed $dx killed $edx
+; 686-O0-NEXT: cmpw $0, %dx
+; 686-O0-NEXT: setne %cl
+; 686-O0-NEXT: xorb $-1, %cl
+; 686-O0-NEXT: andb $1, %cl
+; 686-O0-NEXT: movzbl %cl, %esi
+; 686-O0-NEXT: movzbl var_7, %edi
+; 686-O0-NEXT: cmpl %edi, %esi
+; 686-O0-NEXT: sete %cl
+; 686-O0-NEXT: andb $1, %cl
+; 686-O0-NEXT: movzbl %cl, %esi
+; 686-O0-NEXT: # kill: def $si killed $si killed $esi
+; 686-O0-NEXT: # implicit-def: $edi
+; 686-O0-NEXT: movw %si, (%edi)
; 686-O0-NEXT: addl $2, %esp
+; 686-O0-NEXT: .cfi_def_cfa_offset 12
+; 686-O0-NEXT: popl %esi
+; 686-O0-NEXT: .cfi_def_cfa_offset 8
+; 686-O0-NEXT: popl %edi
; 686-O0-NEXT: .cfi_def_cfa_offset 4
; 686-O0-NEXT: retl
;
@@ -431,35 +446,35 @@ define void @f3() #0 {
; X86-O0-NEXT: movl var_13, %eax
; X86-O0-NEXT: xorl $-1, %eax
; X86-O0-NEXT: movl %eax, %eax
-; X86-O0-NEXT: # kill: def $rax killed $eax
+; X86-O0-NEXT: movl %eax, %ecx
; X86-O0-NEXT: cmpl $0, var_13
-; X86-O0-NEXT: setne %cl
-; X86-O0-NEXT: xorb $-1, %cl
-; X86-O0-NEXT: andb $1, %cl
-; X86-O0-NEXT: movzbl %cl, %ecx
-; X86-O0-NEXT: # kill: def $rcx killed $ecx
-; X86-O0-NEXT: movl var_13, %edx
-; X86-O0-NEXT: xorl $-1, %edx
-; X86-O0-NEXT: xorl var_16, %edx
-; X86-O0-NEXT: movl %edx, %edx
-; X86-O0-NEXT: # kill: def $rdx killed $edx
-; X86-O0-NEXT: andq %rdx, %rcx
-; X86-O0-NEXT: orq %rcx, %rax
-; X86-O0-NEXT: movq %rax, -{{[0-9]+}}(%rsp)
+; X86-O0-NEXT: setne %dl
+; X86-O0-NEXT: xorb $-1, %dl
+; X86-O0-NEXT: andb $1, %dl
+; X86-O0-NEXT: movzbl %dl, %eax
+; X86-O0-NEXT: movl %eax, %esi
; X86-O0-NEXT: movl var_13, %eax
; X86-O0-NEXT: xorl $-1, %eax
+; X86-O0-NEXT: xorl var_16, %eax
; X86-O0-NEXT: movl %eax, %eax
-; X86-O0-NEXT: # kill: def $rax killed $eax
+; X86-O0-NEXT: movl %eax, %edi
+; X86-O0-NEXT: andq %rdi, %rsi
+; X86-O0-NEXT: orq %rsi, %rcx
+; X86-O0-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)
+; X86-O0-NEXT: movl var_13, %eax
+; X86-O0-NEXT: xorl $-1, %eax
+; X86-O0-NEXT: movl %eax, %eax
+; X86-O0-NEXT: movl %eax, %ecx
; X86-O0-NEXT: cmpl $0, var_13
-; X86-O0-NEXT: setne %cl
-; X86-O0-NEXT: xorb $-1, %cl
-; X86-O0-NEXT: andb $1, %cl
-; X86-O0-NEXT: movzbl %cl, %ecx
-; X86-O0-NEXT: # kill: def $rcx killed $ecx
-; X86-O0-NEXT: andq $0, %rcx
-; X86-O0-NEXT: orq %rcx, %rax
-; X86-O0-NEXT: # kill: def $eax killed $eax killed $rax
-; X86-O0-NEXT: movl %eax, var_46
+; X86-O0-NEXT: setne %dl
+; X86-O0-NEXT: xorb $-1, %dl
+; X86-O0-NEXT: andb $1, %dl
+; X86-O0-NEXT: movzbl %dl, %eax
+; X86-O0-NEXT: movl %eax, %esi
+; X86-O0-NEXT: andq $0, %rsi
+; X86-O0-NEXT: orq %rsi, %rcx
+; X86-O0-NEXT: # kill: def $ecx killed $ecx killed $rcx
+; X86-O0-NEXT: movl %ecx, var_46
; X86-O0-NEXT: retq
;
; X64-LABEL: f3:
@@ -484,28 +499,31 @@ define void @f3() #0 {
; 686-O0-NEXT: .cfi_offset %ebp, -8
; 686-O0-NEXT: movl %esp, %ebp
; 686-O0-NEXT: .cfi_def_cfa_register %ebp
+; 686-O0-NEXT: pushl %edi
; 686-O0-NEXT: pushl %esi
; 686-O0-NEXT: andl $-8, %esp
-; 686-O0-NEXT: subl $16, %esp
-; 686-O0-NEXT: .cfi_offset %esi, -12
+; 686-O0-NEXT: subl $8, %esp
+; 686-O0-NEXT: .cfi_offset %esi, -16
+; 686-O0-NEXT: .cfi_offset %edi, -12
; 686-O0-NEXT: movl var_13, %eax
; 686-O0-NEXT: movl %eax, %ecx
; 686-O0-NEXT: notl %ecx
; 686-O0-NEXT: testl %eax, %eax
-; 686-O0-NEXT: sete %al
-; 686-O0-NEXT: movzbl %al, %eax
-; 686-O0-NEXT: movl var_16, %edx
-; 686-O0-NEXT: movl %ecx, %esi
-; 686-O0-NEXT: xorl %edx, %esi
-; 686-O0-NEXT: andl %esi, %eax
+; 686-O0-NEXT: sete %dl
+; 686-O0-NEXT: movzbl %dl, %eax
+; 686-O0-NEXT: movl var_16, %esi
+; 686-O0-NEXT: movl %ecx, %edi
+; 686-O0-NEXT: xorl %esi, %edi
+; 686-O0-NEXT: andl %edi, %eax
; 686-O0-NEXT: orl %eax, %ecx
; 686-O0-NEXT: movl %ecx, (%esp)
; 686-O0-NEXT: movl $0, {{[0-9]+}}(%esp)
; 686-O0-NEXT: movl var_13, %eax
; 686-O0-NEXT: notl %eax
; 686-O0-NEXT: movl %eax, var_46
-; 686-O0-NEXT: leal -4(%ebp), %esp
+; 686-O0-NEXT: leal -8(%ebp), %esp
; 686-O0-NEXT: popl %esi
+; 686-O0-NEXT: popl %edi
; 686-O0-NEXT: popl %ebp
; 686-O0-NEXT: .cfi_def_cfa %esp, 4
; 686-O0-NEXT: retl
diff --git a/llvm/test/CodeGen/X86/pr32340.ll b/llvm/test/CodeGen/X86/pr32340.ll
index 98685b959f642..1e428ac7d83a6 100644
--- a/llvm/test/CodeGen/X86/pr32340.ll
+++ b/llvm/test/CodeGen/X86/pr32340.ll
@@ -14,37 +14,37 @@ define void @foo() {
; X64-LABEL: foo:
; X64: # %bb.0: # %entry
; X64-NEXT: xorl %eax, %eax
-; X64-NEXT: # kill: def $rax killed $eax
+; X64-NEXT: movl %eax, %ecx
; X64-NEXT: movw $0, var_825
-; X64-NEXT: movzwl var_32, %ecx
+; X64-NEXT: movzwl var_32, %eax
; X64-NEXT: movzwl var_901, %edx
-; X64-NEXT: movl %ecx, %esi
+; X64-NEXT: movl %eax, %esi
; X64-NEXT: xorl %edx, %esi
-; X64-NEXT: movl %ecx, %edx
+; X64-NEXT: movl %eax, %edx
; X64-NEXT: xorl %esi, %edx
-; X64-NEXT: addl %ecx, %edx
-; X64-NEXT: movslq %edx, %rcx
-; X64-NEXT: movq %rcx, var_826
-; X64-NEXT: movzwl var_32, %ecx
-; X64-NEXT: # kill: def $rcx killed $ecx
-; X64-NEXT: movzwl var_901, %edx
-; X64-NEXT: xorl $51981, %edx # imm = 0xCB0D
-; X64-NEXT: movslq %edx, %rdx
-; X64-NEXT: movabsq $-1142377792914660288, %rsi # imm = 0xF02575732E06E440
-; X64-NEXT: xorq %rsi, %rdx
-; X64-NEXT: movq %rcx, %rsi
-; X64-NEXT: xorq %rdx, %rsi
-; X64-NEXT: xorq $-1, %rsi
-; X64-NEXT: xorq %rsi, %rcx
-; X64-NEXT: movq %rcx, %rdx
-; X64-NEXT: orq var_57, %rdx
-; X64-NEXT: orq %rdx, %rcx
-; X64-NEXT: # kill: def $cx killed $cx killed $rcx
-; X64-NEXT: movw %cx, var_900
-; X64-NEXT: cmpq var_28, %rax
-; X64-NEXT: setne %al
-; X64-NEXT: andb $1, %al
-; X64-NEXT: movzbl %al, %eax
+; X64-NEXT: addl %eax, %edx
+; X64-NEXT: movslq %edx, %rdi
+; X64-NEXT: movq %rdi, var_826
+; X64-NEXT: movzwl var_32, %eax
+; X64-NEXT: movl %eax, %edi
+; X64-NEXT: movzwl var_901, %eax
+; X64-NEXT: xorl $51981, %eax # imm = 0xCB0D
+; X64-NEXT: movslq %eax, %r8
+; X64-NEXT: movabsq $-1142377792914660288, %r9 # imm = 0xF02575732E06E440
+; X64-NEXT: xorq %r9, %r8
+; X64-NEXT: movq %rdi, %r9
+; X64-NEXT: xorq %r8, %r9
+; X64-NEXT: xorq $-1, %r9
+; X64-NEXT: xorq %r9, %rdi
+; X64-NEXT: movq %rdi, %r8
+; X64-NEXT: orq var_57, %r8
+; X64-NEXT: orq %r8, %rdi
+; X64-NEXT: # kill: def $di killed $di killed $rdi
+; X64-NEXT: movw %di, var_900
+; X64-NEXT: cmpq var_28, %rcx
+; X64-NEXT: setne %r10b
+; X64-NEXT: andb $1, %r10b
+; X64-NEXT: movzbl %r10b, %eax
; X64-NEXT: # kill: def $ax killed $ax killed $eax
; X64-NEXT: movw %ax, var_827
; X64-NEXT: retq
diff --git a/llvm/test/CodeGen/X86/pr32345.ll b/llvm/test/CodeGen/X86/pr32345.ll
index 165e0292d4648..d5f7fde77f6d2 100644
--- a/llvm/test/CodeGen/X86/pr32345.ll
+++ b/llvm/test/CodeGen/X86/pr32345.ll
@@ -15,23 +15,23 @@ define void @foo() {
; X640-NEXT: xorl %ecx, %eax
; X640-NEXT: movzwl var_27, %ecx
; X640-NEXT: xorl %ecx, %eax
-; X640-NEXT: cltq
-; X640-NEXT: movq %rax, -{{[0-9]+}}(%rsp)
+; X640-NEXT: movslq %eax, %rdx
+; X640-NEXT: movq %rdx, -{{[0-9]+}}(%rsp)
; X640-NEXT: movzwl var_22, %eax
; X640-NEXT: movzwl var_27, %ecx
; X640-NEXT: xorl %ecx, %eax
; X640-NEXT: movzwl var_27, %ecx
; X640-NEXT: xorl %ecx, %eax
-; X640-NEXT: cltq
-; X640-NEXT: movzwl var_27, %ecx
-; X640-NEXT: subl $16610, %ecx # imm = 0x40E2
-; X640-NEXT: movl %ecx, %ecx
-; X640-NEXT: # kill: def $rcx killed $ecx
+; X640-NEXT: movslq %eax, %rdx
+; X640-NEXT: movzwl var_27, %eax
+; X640-NEXT: subl $16610, %eax # imm = 0x40E2
+; X640-NEXT: movl %eax, %eax
+; X640-NEXT: movl %eax, %ecx
; X640-NEXT: # kill: def $cl killed $rcx
-; X640-NEXT: sarq %cl, %rax
-; X640-NEXT: # kill: def $al killed $al killed $rax
-; X640-NEXT: # implicit-def: $rcx
-; X640-NEXT: movb %al, (%rcx)
+; X640-NEXT: sarq %cl, %rdx
+; X640-NEXT: # kill: def $dl killed $dl killed $rdx
+; X640-NEXT: # implicit-def: $rsi
+; X640-NEXT: movb %dl, (%rsi)
; X640-NEXT: retq
;
; 6860-LABEL: foo:
@@ -41,37 +41,43 @@ define void @foo() {
; 6860-NEXT: .cfi_offset %ebp, -8
; 6860-NEXT: movl %esp, %ebp
; 6860-NEXT: .cfi_def_cfa_register %ebp
+; 6860-NEXT: pushl %ebx
+; 6860-NEXT: pushl %edi
+; 6860-NEXT: pushl %esi
; 6860-NEXT: andl $-8, %esp
-; 6860-NEXT: subl $24, %esp
+; 6860-NEXT: subl $32, %esp
+; 6860-NEXT: .cfi_offset %esi, -20
+; 6860-NEXT: .cfi_offset %edi, -16
+; 6860-NEXT: .cfi_offset %ebx, -12
; 6860-NEXT: movw var_22, %ax
; 6860-NEXT: movzwl var_27, %ecx
; 6860-NEXT: movw %cx, %dx
; 6860-NEXT: xorw %dx, %ax
-; 6860-NEXT: # implicit-def: $edx
-; 6860-NEXT: movw %ax, %dx
-; 6860-NEXT: xorl %ecx, %edx
-; 6860-NEXT: # kill: def $dx killed $dx killed $edx
-; 6860-NEXT: movzwl %dx, %eax
-; 6860-NEXT: movl %eax, {{[0-9]+}}(%esp)
+; 6860-NEXT: # implicit-def: $esi
+; 6860-NEXT: movw %ax, %si
+; 6860-NEXT: xorl %ecx, %esi
+; 6860-NEXT: # kill: def $si killed $si killed $esi
+; 6860-NEXT: movzwl %si, %ecx
+; 6860-NEXT: movl %ecx, {{[0-9]+}}(%esp)
; 6860-NEXT: movl $0, {{[0-9]+}}(%esp)
; 6860-NEXT: movw var_22, %ax
; 6860-NEXT: movzwl var_27, %ecx
; 6860-NEXT: movw %cx, %dx
; 6860-NEXT: xorw %dx, %ax
-; 6860-NEXT: # implicit-def: $edx
-; 6860-NEXT: movw %ax, %dx
-; 6860-NEXT: xorl %ecx, %edx
-; 6860-NEXT: # kill: def $dx killed $dx killed $edx
-; 6860-NEXT: movzwl %dx, %eax
+; 6860-NEXT: # implicit-def: $edi
+; 6860-NEXT: movw %ax, %di
+; 6860-NEXT: xorl %ecx, %edi
+; 6860-NEXT: # kill: def $di killed $di killed $edi
+; 6860-NEXT: movzwl %di, %ebx
; 6860-NEXT: # kill: def $cl killed $cl killed $ecx
; 6860-NEXT: addb $30, %cl
-; 6860-NEXT: xorl %edx, %edx
+; 6860-NEXT: xorl %eax, %eax
; 6860-NEXT: movb %cl, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Spill
-; 6860-NEXT: shrdl %cl, %edx, %eax
+; 6860-NEXT: shrdl %cl, %eax, %ebx
; 6860-NEXT: movb {{[-0-9]+}}(%e{{[sb]}}p), %cl # 1-byte Reload
; 6860-NEXT: testb $32, %cl
+; 6860-NEXT: movl %ebx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; 6860-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
-; 6860-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; 6860-NEXT: jne .LBB0_2
; 6860-NEXT: # %bb.1: # %bb
; 6860-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
@@ -81,7 +87,10 @@ define void @foo() {
; 6860-NEXT: # kill: def $al killed $al killed $eax
; 6860-NEXT: # implicit-def: $ecx
; 6860-NEXT: movb %al, (%ecx)
-; 6860-NEXT: movl %ebp, %esp
+; 6860-NEXT: leal -12(%ebp), %esp
+; 6860-NEXT: popl %esi
+; 6860-NEXT: popl %edi
+; 6860-NEXT: popl %ebx
; 6860-NEXT: popl %ebp
; 6860-NEXT: .cfi_def_cfa %esp, 4
; 6860-NEXT: retl
diff --git a/llvm/test/CodeGen/X86/pr32451.ll b/llvm/test/CodeGen/X86/pr32451.ll
index 3b1997234ce55..4754d8e4cf6cb 100644
--- a/llvm/test/CodeGen/X86/pr32451.ll
+++ b/llvm/test/CodeGen/X86/pr32451.ll
@@ -9,24 +9,29 @@ target triple = "x86_64-unknown-linux-gnu"
define i8** @japi1_convert_690(i8**, i8***, i32) {
; CHECK-LABEL: japi1_convert_690:
; CHECK: # %bb.0: # %top
+; CHECK-NEXT: pushl %ebx
+; CHECK-NEXT: .cfi_def_cfa_offset 8
; CHECK-NEXT: subl $16, %esp
-; CHECK-NEXT: .cfi_def_cfa_offset 20
+; CHECK-NEXT: .cfi_def_cfa_offset 24
+; CHECK-NEXT: .cfi_offset %ebx, -8
; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
-; CHECK-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT: movl %eax, {{[0-9]+}}(%esp) # 4-byte Spill
; CHECK-NEXT: calll julia.gc_root_decl
-; CHECK-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT: movl %eax, {{[0-9]+}}(%esp) # 4-byte Spill
; CHECK-NEXT: calll jl_get_ptls_states
-; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx # 4-byte Reload
; CHECK-NEXT: movl 4(%ecx), %edx
-; CHECK-NEXT: movb (%edx), %dl
-; CHECK-NEXT: andb $1, %dl
-; CHECK-NEXT: movzbl %dl, %edx
+; CHECK-NEXT: movb (%edx), %bl
+; CHECK-NEXT: andb $1, %bl
+; CHECK-NEXT: movzbl %bl, %edx
; CHECK-NEXT: movl %edx, (%esp)
-; CHECK-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT: movl %eax, {{[0-9]+}}(%esp) # 4-byte Spill
; CHECK-NEXT: calll jl_box_int32
-; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx # 4-byte Reload
; CHECK-NEXT: movl %eax, (%ecx)
; CHECK-NEXT: addl $16, %esp
+; CHECK-NEXT: .cfi_def_cfa_offset 8
+; CHECK-NEXT: popl %ebx
; CHECK-NEXT: .cfi_def_cfa_offset 4
; CHECK-NEXT: retl
top:
diff --git a/llvm/test/CodeGen/X86/pr34592.ll b/llvm/test/CodeGen/X86/pr34592.ll
index 25b068c8fad6f..0f73036a4c6c9 100644
--- a/llvm/test/CodeGen/X86/pr34592.ll
+++ b/llvm/test/CodeGen/X86/pr34592.ll
@@ -10,7 +10,7 @@ define <16 x i64> @pluto(<16 x i64> %arg, <16 x i64> %arg1, <16 x i64> %arg2, <1
; CHECK-NEXT: movq %rsp, %rbp
; CHECK-NEXT: .cfi_def_cfa_register %rbp
; CHECK-NEXT: andq $-32, %rsp
-; CHECK-NEXT: subq $160, %rsp
+; CHECK-NEXT: subq $192, %rsp
; CHECK-NEXT: vmovaps 240(%rbp), %ymm8
; CHECK-NEXT: vmovaps 208(%rbp), %ymm9
; CHECK-NEXT: vmovaps 176(%rbp), %ymm10
@@ -27,14 +27,14 @@ define <16 x i64> @pluto(<16 x i64> %arg, <16 x i64> %arg1, <16 x i64> %arg2, <1
; CHECK-NEXT: vpalignr {{.*#+}} ymm2 = ymm2[8,9,10,11,12,13,14,15],ymm11[0,1,2,3,4,5,6,7],ymm2[24,25,26,27,28,29,30,31],ymm11[16,17,18,19,20,21,22,23]
; CHECK-NEXT: vpermq {{.*#+}} ymm2 = ymm2[2,3,2,0]
; CHECK-NEXT: vpblendd {{.*#+}} ymm0 = ymm2[0,1,2,3],ymm0[4,5],ymm2[6,7]
-; CHECK-NEXT: vmovaps %xmm7, %xmm2
-; CHECK-NEXT: vpslldq {{.*#+}} xmm2 = zero,zero,zero,zero,zero,zero,zero,zero,xmm2[0,1,2,3,4,5,6,7]
-; CHECK-NEXT: # implicit-def: $ymm9
-; CHECK-NEXT: vmovaps %xmm2, %xmm9
-; CHECK-NEXT: vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %ymm2 # 32-byte Reload
-; CHECK-NEXT: vpalignr {{.*#+}} ymm11 = ymm2[8,9,10,11,12,13,14,15],ymm5[0,1,2,3,4,5,6,7],ymm2[24,25,26,27,28,29,30,31],ymm5[16,17,18,19,20,21,22,23]
-; CHECK-NEXT: vpermq {{.*#+}} ymm11 = ymm11[0,1,0,3]
-; CHECK-NEXT: vpblendd {{.*#+}} ymm9 = ymm9[0,1,2,3],ymm11[4,5,6,7]
+; CHECK-NEXT: vmovaps %xmm7, %xmm9
+; CHECK-NEXT: vpslldq {{.*#+}} xmm9 = zero,zero,zero,zero,zero,zero,zero,zero,xmm9[0,1,2,3,4,5,6,7]
+; CHECK-NEXT: # implicit-def: $ymm2
+; CHECK-NEXT: vmovaps %xmm9, %xmm2
+; CHECK-NEXT: vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %ymm11 # 32-byte Reload
+; CHECK-NEXT: vpalignr {{.*#+}} ymm9 = ymm11[8,9,10,11,12,13,14,15],ymm5[0,1,2,3,4,5,6,7],ymm11[24,25,26,27,28,29,30,31],ymm5[16,17,18,19,20,21,22,23]
+; CHECK-NEXT: vpermq {{.*#+}} ymm9 = ymm9[0,1,0,3]
+; CHECK-NEXT: vpblendd {{.*#+}} ymm2 = ymm2[0,1,2,3],ymm9[4,5,6,7]
; CHECK-NEXT: vpblendd {{.*#+}} ymm8 = ymm7[0,1],ymm8[2,3],ymm7[4,5,6,7]
; CHECK-NEXT: vpermq {{.*#+}} ymm8 = ymm8[2,1,1,3]
; CHECK-NEXT: vpshufd {{.*#+}} ymm5 = ymm5[0,1,0,1,4,5,4,5]
@@ -43,11 +43,14 @@ define <16 x i64> @pluto(<16 x i64> %arg, <16 x i64> %arg1, <16 x i64> %arg2, <1
; CHECK-NEXT: vmovq {{.*#+}} xmm7 = xmm7[0],zero
; CHECK-NEXT: # implicit-def: $ymm8
; CHECK-NEXT: vmovaps %xmm7, %xmm8
-; CHECK-NEXT: vperm2i128 {{.*#+}} ymm2 = ymm8[0,1],ymm6[0,1]
+; CHECK-NEXT: vperm2i128 {{.*#+}} ymm6 = ymm8[0,1],ymm6[0,1]
; CHECK-NEXT: vmovaps %ymm1, {{[-0-9]+}}(%r{{[sb]}}p) # 32-byte Spill
; CHECK-NEXT: vmovaps %ymm5, %ymm1
+; CHECK-NEXT: vmovaps %ymm2, {{[-0-9]+}}(%r{{[sb]}}p) # 32-byte Spill
+; CHECK-NEXT: vmovaps %ymm6, %ymm2
+; CHECK-NEXT: vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %ymm5 # 32-byte Reload
; CHECK-NEXT: vmovaps %ymm3, (%rsp) # 32-byte Spill
-; CHECK-NEXT: vmovaps %ymm9, %ymm3
+; CHECK-NEXT: vmovaps %ymm5, %ymm3
; CHECK-NEXT: movq %rbp, %rsp
; CHECK-NEXT: popq %rbp
; CHECK-NEXT: .cfi_def_cfa %rsp, 8
diff --git a/llvm/test/CodeGen/X86/pr39733.ll b/llvm/test/CodeGen/X86/pr39733.ll
index 75f9dc51b85eb..4c7153852d22c 100644
--- a/llvm/test/CodeGen/X86/pr39733.ll
+++ b/llvm/test/CodeGen/X86/pr39733.ll
@@ -23,8 +23,8 @@ define void @test55() {
; CHECK-NEXT: vmovaps %xmm1, %xmm2
; CHECK-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
; CHECK-NEXT: vpmovsxwd %xmm0, %xmm0
-; CHECK-NEXT: vinsertf128 $1, %xmm0, %ymm2, %ymm0
-; CHECK-NEXT: vmovdqa %ymm0, (%rsp)
+; CHECK-NEXT: vinsertf128 $1, %xmm0, %ymm2, %ymm2
+; CHECK-NEXT: vmovdqa %ymm2, (%rsp)
; CHECK-NEXT: movq %rbp, %rsp
; CHECK-NEXT: popq %rbp
; CHECK-NEXT: .cfi_def_cfa %rsp, 8
diff --git a/llvm/test/CodeGen/X86/pr44749.ll b/llvm/test/CodeGen/X86/pr44749.ll
index 1012d8c723b13..d465009c7c38a 100644
--- a/llvm/test/CodeGen/X86/pr44749.ll
+++ b/llvm/test/CodeGen/X86/pr44749.ll
@@ -14,22 +14,20 @@ define i32 @a() {
; CHECK-NEXT: movsd %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) ## 8-byte Spill
; CHECK-NEXT: callq _b
; CHECK-NEXT: cvtsi2sd %eax, %xmm0
-; CHECK-NEXT: movq _calloc@{{.*}}(%rip), %rax
-; CHECK-NEXT: subq $-1, %rax
-; CHECK-NEXT: setne %cl
-; CHECK-NEXT: movzbl %cl, %ecx
-; CHECK-NEXT: ## kill: def $rcx killed $ecx
-; CHECK-NEXT: leaq {{.*}}(%rip), %rdx
+; CHECK-NEXT: movq _calloc@{{.*}}(%rip), %rcx
+; CHECK-NEXT: subq $-1, %rcx
+; CHECK-NEXT: setne %dl
+; CHECK-NEXT: movzbl %dl, %eax
+; CHECK-NEXT: movl %eax, %esi
+; CHECK-NEXT: leaq {{.*}}(%rip), %rdi
; CHECK-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
; CHECK-NEXT: ucomisd %xmm1, %xmm0
-; CHECK-NEXT: setae %cl
-; CHECK-NEXT: movzbl %cl, %ecx
-; CHECK-NEXT: ## kill: def $rcx killed $ecx
-; CHECK-NEXT: leaq {{.*}}(%rip), %rdx
+; CHECK-NEXT: setae %dl
+; CHECK-NEXT: movzbl %dl, %eax
+; CHECK-NEXT: movl %eax, %esi
+; CHECK-NEXT: leaq {{.*}}(%rip), %rdi
; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
-; CHECK-NEXT: cvttsd2si %xmm0, %ecx
-; CHECK-NEXT: movq %rax, (%rsp) ## 8-byte Spill
-; CHECK-NEXT: movl %ecx, %eax
+; CHECK-NEXT: cvttsd2si %xmm0, %eax
; CHECK-NEXT: addq $24, %rsp
; CHECK-NEXT: retq
entry:
diff --git a/llvm/test/CodeGen/X86/pr46877.ll b/llvm/test/CodeGen/X86/pr46877.ll
new file mode 100644
index 0000000000000..581b2d586fa0c
--- /dev/null
+++ b/llvm/test/CodeGen/X86/pr46877.ll
@@ -0,0 +1,416 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -O3 < %s -mcpu=haswell -mtriple=x86_64 | FileCheck %s
+
+; Verify that we are not exponentially increasing compile time.
+define void @tester(float %0, float %1, float %2, float %3, float %4, float %5, float %6, float %7, float %8, float %9, float %10, float %11, float %12, float %13, float %14, float %15, float %16, float %17, float %18, float %19, float %20, float %21, float %22, float %23, float %24, float %25, float %26, float %27, float %28, float %29, float %30, float %31, float %32, float %33, float %34, float %35, float %36, float %37, float %38, float %39, float %40, float %41, float %42, float %43, float %44, float %45, float %46, float %47, float %48, float %49, float %50, float %51, float %52, float %53, float %54, float %55, float %56, float %57, float %58, float %59, float %60, float %61, float %62, float %63, float %64, float %65, float %66, float %67, float %68, float %69, float %70, float %71, float %72, float %73, float %74, float %75, float %76, float %77, float %78, float %79, float* %80) {
+; CHECK-LABEL: tester:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: vmovaps %xmm3, %xmm15
+; CHECK-NEXT: vmovss {{.*#+}} xmm14 = mem[0],zero,zero,zero
+; CHECK-NEXT: vmovss {{.*#+}} xmm10 = mem[0],zero,zero,zero
+; CHECK-NEXT: vmovss {{.*#+}} xmm13 = mem[0],zero,zero,zero
+; CHECK-NEXT: vsubss %xmm1, %xmm0, %xmm12
+; CHECK-NEXT: vmulss %xmm2, %xmm1, %xmm3
+; CHECK-NEXT: vfmsub213ss {{.*#+}} xmm3 = (xmm15 * xmm3) - xmm0
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm5 = -(xmm12 * xmm5) + xmm0
+; CHECK-NEXT: vmulss %xmm5, %xmm4, %xmm2
+; CHECK-NEXT: vmulss %xmm2, %xmm3, %xmm3
+; CHECK-NEXT: vmulss %xmm6, %xmm12, %xmm2
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm2 = -(xmm7 * xmm2) + xmm0
+; CHECK-NEXT: vmulss %xmm3, %xmm2, %xmm5
+; CHECK-NEXT: vmulss %xmm0, %xmm13, %xmm2
+; CHECK-NEXT: vmovss %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT: vmulss %xmm2, %xmm10, %xmm2
+; CHECK-NEXT: vfnmadd132ss {{.*#+}} xmm2 = -(xmm2 * mem) + xmm0
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm7, %xmm3
+; CHECK-NEXT: vfnmadd132ss {{.*#+}} xmm3 = -(xmm3 * mem) + xmm0
+; CHECK-NEXT: vmulss %xmm3, %xmm2, %xmm2
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm0, %xmm3
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm3, %xmm4
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm4 = -(xmm14 * xmm4) + xmm0
+; CHECK-NEXT: vmulss %xmm4, %xmm5, %xmm4
+; CHECK-NEXT: vmovss {{.*#+}} xmm5 = mem[0],zero,zero,zero
+; CHECK-NEXT: vfnmadd132ss {{.*#+}} xmm5 = -(xmm5 * mem) + xmm0
+; CHECK-NEXT: vmulss %xmm5, %xmm2, %xmm2
+; CHECK-NEXT: vmovss {{.*#+}} xmm7 = mem[0],zero,zero,zero
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm7, %xmm5
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm5 = -(xmm10 * xmm5) + xmm0
+; CHECK-NEXT: vmulss %xmm5, %xmm4, %xmm4
+; CHECK-NEXT: vmovss {{.*#+}} xmm9 = mem[0],zero,zero,zero
+; CHECK-NEXT: vmulss %xmm0, %xmm9, %xmm6
+; CHECK-NEXT: vmovss %xmm6, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT: vmulss %xmm6, %xmm14, %xmm5
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm5 = -(xmm12 * xmm5) + xmm0
+; CHECK-NEXT: vmulss %xmm5, %xmm2, %xmm2
+; CHECK-NEXT: vmovss {{.*#+}} xmm5 = mem[0],zero,zero,zero
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm5 = -(xmm13 * xmm5) + xmm0
+; CHECK-NEXT: vmulss %xmm5, %xmm4, %xmm4
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm3, %xmm11
+; CHECK-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm3 = -(xmm11 * xmm3) + xmm0
+; CHECK-NEXT: vmulss %xmm3, %xmm2, %xmm2
+; CHECK-NEXT: vmulss %xmm2, %xmm4, %xmm2
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm2, %xmm2
+; CHECK-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm3 = -(xmm15 * xmm3) + xmm0
+; CHECK-NEXT: vmulss %xmm2, %xmm3, %xmm2
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm2, %xmm2
+; CHECK-NEXT: vmovss %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm1, %xmm4
+; CHECK-NEXT: vfnmadd132ss {{.*#+}} xmm4 = -(xmm4 * mem) + xmm0
+; CHECK-NEXT: vmovss {{.*#+}} xmm8 = mem[0],zero,zero,zero
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm8, %xmm6
+; CHECK-NEXT: vfnmadd132ss {{.*#+}} xmm6 = -(xmm6 * mem) + xmm0
+; CHECK-NEXT: vmulss %xmm6, %xmm4, %xmm4
+; CHECK-NEXT: vmulss %xmm4, %xmm2, %xmm2
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm2, %xmm2
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm2, %xmm2
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm2, %xmm2
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm10, %xmm4
+; CHECK-NEXT: vfnmadd132ss {{.*#+}} xmm4 = -(xmm4 * mem) + xmm0
+; CHECK-NEXT: vmulss %xmm2, %xmm4, %xmm2
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm2, %xmm2
+; CHECK-NEXT: vmovss {{.*#+}} xmm4 = mem[0],zero,zero,zero
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm4 = -(xmm1 * xmm4) + xmm0
+; CHECK-NEXT: vmovss {{.*#+}} xmm6 = mem[0],zero,zero,zero
+; CHECK-NEXT: vfnmadd132ss {{.*#+}} xmm6 = -(xmm6 * mem) + xmm0
+; CHECK-NEXT: vmulss %xmm6, %xmm4, %xmm4
+; CHECK-NEXT: vmulss %xmm4, %xmm2, %xmm2
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm2, %xmm2
+; CHECK-NEXT: vmovss {{.*#+}} xmm4 = mem[0],zero,zero,zero
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm9, %xmm1
+; CHECK-NEXT: vmovss %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm4 = -(xmm1 * xmm4) + xmm0
+; CHECK-NEXT: vmulss %xmm2, %xmm4, %xmm10
+; CHECK-NEXT: vmulss %xmm0, %xmm12, %xmm6
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm6, %xmm4
+; CHECK-NEXT: vfnmadd132ss {{.*#+}} xmm4 = -(xmm4 * mem) + xmm0
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm13, %xmm5
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm5 = -(xmm7 * xmm5) + xmm0
+; CHECK-NEXT: vmulss %xmm5, %xmm4, %xmm4
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm10, %xmm5
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm5, %xmm5
+; CHECK-NEXT: vmulss %xmm4, %xmm5, %xmm12
+; CHECK-NEXT: vmovss {{.*#+}} xmm5 = mem[0],zero,zero,zero
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm5 = -(xmm7 * xmm5) + xmm0
+; CHECK-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero
+; CHECK-NEXT: vmulss %xmm6, %xmm3, %xmm2
+; CHECK-NEXT: vmovss {{.*#+}} xmm10 = mem[0],zero,zero,zero
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm2 = -(xmm10 * xmm2) + xmm0
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm0, %xmm9
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm9, %xmm1
+; CHECK-NEXT: vfnmadd132ss {{.*#+}} xmm1 = -(xmm1 * mem) + xmm0
+; CHECK-NEXT: vmulss %xmm2, %xmm5, %xmm2
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm3, %xmm5
+; CHECK-NEXT: vfnmadd132ss {{.*#+}} xmm5 = -(xmm5 * mem) + xmm0
+; CHECK-NEXT: vmulss %xmm1, %xmm2, %xmm1
+; CHECK-NEXT: vmulss %xmm5, %xmm1, %xmm1
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm3, %xmm2
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm2 = -(xmm13 * xmm2) + xmm0
+; CHECK-NEXT: vmulss %xmm2, %xmm1, %xmm1
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm12, %xmm2
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm2, %xmm2
+; CHECK-NEXT: vmulss %xmm1, %xmm2, %xmm4
+; CHECK-NEXT: vmovss {{.*#+}} xmm13 = mem[0],zero,zero,zero
+; CHECK-NEXT: vmovss {{.*#+}} xmm5 = mem[0],zero,zero,zero
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm5, %xmm3
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm3 = -(xmm13 * xmm3) + xmm0
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm6, %xmm2
+; CHECK-NEXT: vfnmadd132ss {{.*#+}} xmm2 = -(xmm2 * mem) + xmm0
+; CHECK-NEXT: vmovss {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 4-byte Reload
+; CHECK-NEXT: # xmm1 = mem[0],zero,zero,zero
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm1, %xmm1
+; CHECK-NEXT: vmulss %xmm2, %xmm3, %xmm2
+; CHECK-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm1 = -(xmm3 * xmm1) + xmm0
+; CHECK-NEXT: vmulss %xmm1, %xmm2, %xmm1
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm4, %xmm2
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm2, %xmm2
+; CHECK-NEXT: vmulss %xmm1, %xmm2, %xmm1
+; CHECK-NEXT: vmovss {{[-0-9]+}}(%r{{[sb]}}p), %xmm12 # 4-byte Reload
+; CHECK-NEXT: # xmm12 = mem[0],zero,zero,zero
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm12, %xmm2
+; CHECK-NEXT: vfnmadd132ss {{.*#+}} xmm7 = -(xmm7 * mem) + xmm0
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm2 = -(xmm13 * xmm2) + xmm0
+; CHECK-NEXT: vmulss %xmm7, %xmm2, %xmm2
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm1, %xmm1
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm1, %xmm1
+; CHECK-NEXT: vfnmadd132ss {{.*#+}} xmm8 = -(xmm8 * mem) + xmm0
+; CHECK-NEXT: vmulss %xmm2, %xmm8, %xmm2
+; CHECK-NEXT: vmulss %xmm2, %xmm1, %xmm1
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm1, %xmm1
+; CHECK-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm2 = -(xmm15 * xmm2) + xmm0
+; CHECK-NEXT: vmulss %xmm1, %xmm2, %xmm1
+; CHECK-NEXT: vmulss %xmm0, %xmm5, %xmm2
+; CHECK-NEXT: vmulss %xmm3, %xmm2, %xmm2
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm2 = -(xmm10 * xmm2) + xmm0
+; CHECK-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm3 = -(xmm5 * xmm3) + xmm0
+; CHECK-NEXT: vmulss %xmm2, %xmm3, %xmm2
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm9, %xmm8
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm9, %xmm4
+; CHECK-NEXT: vfnmadd132ss {{.*#+}} xmm4 = -(xmm4 * mem) + xmm0
+; CHECK-NEXT: vmulss %xmm4, %xmm2, %xmm2
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm1, %xmm1
+; CHECK-NEXT: vmulss %xmm2, %xmm1, %xmm10
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm11 = -(xmm5 * xmm11) + xmm0
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm6, %xmm2
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm2 = -(xmm15 * xmm2) + xmm0
+; CHECK-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm1, %xmm4
+; CHECK-NEXT: vfnmadd132ss {{.*#+}} xmm4 = -(xmm4 * mem) + xmm0
+; CHECK-NEXT: vmulss %xmm2, %xmm11, %xmm2
+; CHECK-NEXT: vmulss %xmm4, %xmm2, %xmm2
+; CHECK-NEXT: vfnmadd132ss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm14 # 4-byte Folded Reload
+; CHECK-NEXT: # xmm14 = -(xmm14 * mem) + xmm0
+; CHECK-NEXT: vmulss %xmm2, %xmm14, %xmm9
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm0, %xmm2
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm2, %xmm11
+; CHECK-NEXT: vfnmadd132ss {{.*#+}} xmm11 = -(xmm11 * mem) + xmm0
+; CHECK-NEXT: vmovss {{.*#+}} xmm5 = mem[0],zero,zero,zero
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm5, %xmm7
+; CHECK-NEXT: vmulss {{[-0-9]+}}(%r{{[sb]}}p), %xmm5, %xmm5 # 4-byte Folded Reload
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm6, %xmm1
+; CHECK-NEXT: vmulss %xmm6, %xmm15, %xmm6
+; CHECK-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm6 = -(xmm3 * xmm6) + xmm0
+; CHECK-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm2, %xmm4
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm4 = -(xmm3 * xmm4) + xmm0
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm7 = -(xmm3 * xmm7) + xmm0
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm5 = -(xmm3 * xmm5) + xmm0
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm12, %xmm2
+; CHECK-NEXT: vmulss %xmm0, %xmm13, %xmm3
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm3, %xmm3
+; CHECK-NEXT: vmovss {{.*#+}} xmm12 = mem[0],zero,zero,zero
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm3 = -(xmm12 * xmm3) + xmm0
+; CHECK-NEXT: vfnmadd213ss {{.*#+}} xmm2 = -(xmm12 * xmm2) + xmm0
+; CHECK-NEXT: vfmsub213ss {{.*#+}} xmm1 = (xmm15 * xmm1) - xmm0
+; CHECK-NEXT: vfnmadd132ss {{.*#+}} xmm8 = -(xmm8 * mem) + xmm0
+; CHECK-NEXT: vmulss %xmm8, %xmm9, %xmm0
+; CHECK-NEXT: vmulss %xmm6, %xmm0, %xmm0
+; CHECK-NEXT: vmulss %xmm4, %xmm0, %xmm0
+; CHECK-NEXT: vmulss %xmm7, %xmm0, %xmm0
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm10, %xmm4
+; CHECK-NEXT: vmulss %xmm0, %xmm4, %xmm0
+; CHECK-NEXT: vmulss %xmm5, %xmm11, %xmm4
+; CHECK-NEXT: vmulss %xmm3, %xmm4, %xmm3
+; CHECK-NEXT: vmulss %xmm2, %xmm3, %xmm2
+; CHECK-NEXT: vmulss {{[0-9]+}}(%rsp), %xmm0, %xmm0
+; CHECK-NEXT: vmulss %xmm1, %xmm2, %xmm1
+; CHECK-NEXT: vmulss %xmm1, %xmm0, %xmm0
+; CHECK-NEXT: vmovss %xmm0, (%rdi)
+; CHECK-NEXT: retq
+entry:
+ %81 = fsub reassoc nsz contract float %0, %1
+ %82 = fmul reassoc nsz contract float %1, %2
+ %83 = fmul reassoc nsz contract float %3, %82
+ %84 = fsub reassoc nsz contract float %0, %83
+ %85 = fmul reassoc nsz contract float %84, %4
+ %86 = fmul reassoc nsz contract float %81, %5
+ %87 = fsub reassoc nsz contract float %0, %86
+ %88 = fmul reassoc nsz contract float %87, %85
+ %89 = fmul reassoc nsz contract float %81, %6
+ %90 = fmul reassoc nsz contract float %89, %7
+ %91 = fsub reassoc nsz contract float %0, %90
+ %92 = fmul reassoc nsz contract float %91, %88
+ %93 = fmul reassoc nsz contract float %8, %0
+ %94 = fmul reassoc nsz contract float %93, %9
+ %95 = fmul reassoc nsz contract float %94, %10
+ %96 = fsub reassoc nsz contract float %0, %95
+ %97 = fmul reassoc nsz contract float %96, %92
+ %98 = fmul reassoc nsz contract float %11, %7
+ %99 = fmul reassoc nsz contract float %98, %12
+ %100 = fsub reassoc nsz contract float %0, %99
+ %101 = fmul reassoc nsz contract float %100, %97
+ %102 = fmul reassoc nsz contract float %13, %0
+ %103 = fmul reassoc nsz contract float %102, %14
+ %104 = fmul reassoc nsz contract float %103, %15
+ %105 = fsub reassoc nsz contract float %0, %104
+ %106 = fmul reassoc nsz contract float %105, %101
+ %107 = fmul reassoc nsz contract float %16, %17
+ %108 = fsub reassoc nsz contract float %0, %107
+ %109 = fmul reassoc nsz contract float %108, %106
+ %110 = fmul reassoc nsz contract float %18, %19
+ %111 = fmul reassoc nsz contract float %110, %9
+ %112 = fsub reassoc nsz contract float %0, %111
+ %113 = fmul reassoc nsz contract float %112, %109
+ %114 = fmul reassoc nsz contract float %20, %0
+ %115 = fmul reassoc nsz contract float %114, %15
+ %116 = fmul reassoc nsz contract float %81, %115
+ %117 = fsub reassoc nsz contract float %0, %116
+ %118 = fmul reassoc nsz contract float %117, %113
+ %119 = fmul reassoc nsz contract float %8, %21
+ %120 = fsub reassoc nsz contract float %0, %119
+ %121 = fmul reassoc nsz contract float %120, %118
+ %122 = fmul reassoc nsz contract float %102, %22
+ %123 = fmul reassoc nsz contract float %122, %23
+ %124 = fsub reassoc nsz contract float %0, %123
+ %125 = fmul reassoc nsz contract float %124, %121
+ %126 = fmul reassoc nsz contract float %125, %24
+ %127 = fmul reassoc nsz contract float %3, %25
+ %128 = fsub reassoc nsz contract float %0, %127
+ %129 = fmul reassoc nsz contract float %128, %126
+ %130 = fmul reassoc nsz contract float %129, %26
+ %131 = fmul reassoc nsz contract float %27, %1
+ %132 = fmul reassoc nsz contract float %131, %28
+ %133 = fsub reassoc nsz contract float %0, %132
+ %134 = fmul reassoc nsz contract float %133, %130
+ %135 = fmul reassoc nsz contract float %29, %30
+ %136 = fmul reassoc nsz contract float %135, %31
+ %137 = fsub reassoc nsz contract float %0, %136
+ %138 = fmul reassoc nsz contract float %137, %134
+ %139 = fmul reassoc nsz contract float %138, %32
+ %140 = fmul reassoc nsz contract float %139, %33
+ %141 = fmul reassoc nsz contract float %140, %34
+ %142 = fmul reassoc nsz contract float %35, %9
+ %143 = fmul reassoc nsz contract float %142, %36
+ %144 = fsub reassoc nsz contract float %0, %143
+ %145 = fmul reassoc nsz contract float %144, %141
+ %146 = fmul reassoc nsz contract float %145, %37
+ %147 = fmul reassoc nsz contract float %1, %38
+ %148 = fsub reassoc nsz contract float %0, %147
+ %149 = fmul reassoc nsz contract float %148, %146
+ %150 = fmul reassoc nsz contract float %39, %40
+ %151 = fsub reassoc nsz contract float %0, %150
+ %152 = fmul reassoc nsz contract float %151, %149
+ %153 = fmul reassoc nsz contract float %152, %41
+ %154 = fmul reassoc nsz contract float %20, %42
+ %155 = fmul reassoc nsz contract float %154, %43
+ %156 = fsub reassoc nsz contract float %0, %155
+ %157 = fmul reassoc nsz contract float %156, %153
+ %158 = fmul reassoc nsz contract float %157, %44
+ %159 = fmul reassoc nsz contract float %158, %45
+ %160 = fmul reassoc nsz contract float %81, %0
+ %161 = fmul reassoc nsz contract float %160, %46
+ %162 = fmul reassoc nsz contract float %161, %14
+ %163 = fsub reassoc nsz contract float %0, %162
+ %164 = fmul reassoc nsz contract float %163, %159
+ %165 = fmul reassoc nsz contract float %8, %47
+ %166 = fmul reassoc nsz contract float %18, %165
+ %167 = fsub reassoc nsz contract float %0, %166
+ %168 = fmul reassoc nsz contract float %167, %164
+ %169 = fmul reassoc nsz contract float %168, %48
+ %170 = fmul reassoc nsz contract float %169, %49
+ %171 = fmul reassoc nsz contract float %18, %50
+ %172 = fsub reassoc nsz contract float %0, %171
+ %173 = fmul reassoc nsz contract float %172, %170
+ %174 = fmul reassoc nsz contract float %16, %160
+ %175 = fmul reassoc nsz contract float %174, %12
+ %176 = fsub reassoc nsz contract float %0, %175
+ %177 = fmul reassoc nsz contract float %176, %173
+ %178 = fmul reassoc nsz contract float %51, %0
+ %179 = fmul reassoc nsz contract float %178, %22
+ %180 = fmul reassoc nsz contract float %179, %52
+ %181 = fsub reassoc nsz contract float %0, %180
+ %182 = fmul reassoc nsz contract float %181, %177
+ %183 = fmul reassoc nsz contract float %27, %16
+ %184 = fmul reassoc nsz contract float %183, %53
+ %185 = fsub reassoc nsz contract float %0, %184
+ %186 = fmul reassoc nsz contract float %185, %182
+ %187 = fmul reassoc nsz contract float %16, %54
+ %188 = fmul reassoc nsz contract float %8, %187
+ %189 = fsub reassoc nsz contract float %0, %188
+ %190 = fmul reassoc nsz contract float %189, %186
+ %191 = fmul reassoc nsz contract float %190, %55
+ %192 = fmul reassoc nsz contract float %191, %56
+ %193 = fmul reassoc nsz contract float %57, %58
+ %194 = fmul reassoc nsz contract float %193, %59
+ %195 = fsub reassoc nsz contract float %0, %194
+ %196 = fmul reassoc nsz contract float %195, %192
+ %197 = fmul reassoc nsz contract float %13, %160
+ %198 = fmul reassoc nsz contract float %197, %36
+ %199 = fsub reassoc nsz contract float %0, %198
+ %200 = fmul reassoc nsz contract float %199, %196
+ %201 = fmul reassoc nsz contract float %93, %60
+ %202 = fmul reassoc nsz contract float %201, %61
+ %203 = fsub reassoc nsz contract float %0, %202
+ %204 = fmul reassoc nsz contract float %203, %200
+ %205 = fmul reassoc nsz contract float %204, %62
+ %206 = fmul reassoc nsz contract float %205, %63
+ %207 = fmul reassoc nsz contract float %114, %9
+ %208 = fmul reassoc nsz contract float %207, %59
+ %209 = fsub reassoc nsz contract float %0, %208
+ %210 = fmul reassoc nsz contract float %209, %206
+ %211 = fmul reassoc nsz contract float %18, %64
+ %212 = fsub reassoc nsz contract float %0, %211
+ %213 = fmul reassoc nsz contract float %212, %210
+ %214 = fmul reassoc nsz contract float %29, %65
+ %215 = fsub reassoc nsz contract float %0, %214
+ %216 = fmul reassoc nsz contract float %215, %213
+ %217 = fmul reassoc nsz contract float %216, %66
+ %218 = fmul reassoc nsz contract float %3, %67
+ %219 = fsub reassoc nsz contract float %0, %218
+ %220 = fmul reassoc nsz contract float %219, %217
+ %221 = fmul reassoc nsz contract float %220, %68
+ %222 = fmul reassoc nsz contract float %57, %69
+ %223 = fsub reassoc nsz contract float %0, %222
+ %224 = fmul reassoc nsz contract float %223, %221
+ %225 = fmul reassoc nsz contract float %57, %0
+ %226 = fmul reassoc nsz contract float %225, %61
+ %227 = fmul reassoc nsz contract float %226, %12
+ %228 = fsub reassoc nsz contract float %0, %227
+ %229 = fmul reassoc nsz contract float %228, %224
+ %230 = fmul reassoc nsz contract float %178, %70
+ %231 = fmul reassoc nsz contract float %230, %46
+ %232 = fsub reassoc nsz contract float %0, %231
+ %233 = fmul reassoc nsz contract float %232, %229
+ %234 = fmul reassoc nsz contract float %233, %71
+ %235 = fmul reassoc nsz contract float %57, %122
+ %236 = fsub reassoc nsz contract float %0, %235
+ %237 = fmul reassoc nsz contract float %236, %234
+ %238 = fmul reassoc nsz contract float %20, %160
+ %239 = fmul reassoc nsz contract float %3, %238
+ %240 = fsub reassoc nsz contract float %0, %239
+ %241 = fmul reassoc nsz contract float %240, %237
+ %242 = fmul reassoc nsz contract float %16, %72
+ %243 = fmul reassoc nsz contract float %242, %73
+ %244 = fsub reassoc nsz contract float %0, %243
+ %245 = fmul reassoc nsz contract float %244, %241
+ %246 = fmul reassoc nsz contract float %154, %15
+ %247 = fsub reassoc nsz contract float %0, %246
+ %248 = fmul reassoc nsz contract float %247, %245
+ %249 = fmul reassoc nsz contract float %178, %23
+ %250 = fmul reassoc nsz contract float %249, %74
+ %251 = fsub reassoc nsz contract float %0, %250
+ %252 = fmul reassoc nsz contract float %251, %248
+ %253 = fmul reassoc nsz contract float %3, %160
+ %254 = fmul reassoc nsz contract float %51, %253
+ %255 = fsub reassoc nsz contract float %0, %254
+ %256 = fmul reassoc nsz contract float %255, %252
+ %257 = fmul reassoc nsz contract float %13, %75
+ %258 = fmul reassoc nsz contract float %257, %51
+ %259 = fsub reassoc nsz contract float %0, %258
+ %260 = fmul reassoc nsz contract float %259, %256
+ %261 = fmul reassoc nsz contract float %8, %76
+ %262 = fmul reassoc nsz contract float %51, %261
+ %263 = fsub reassoc nsz contract float %0, %262
+ %264 = fmul reassoc nsz contract float %263, %260
+ %265 = fmul reassoc nsz contract float %264, %77
+ %266 = fmul reassoc nsz contract float %39, %0
+ %267 = fmul reassoc nsz contract float %266, %78
+ %268 = fmul reassoc nsz contract float %267, %14
+ %269 = fsub reassoc nsz contract float %0, %268
+ %270 = fmul reassoc nsz contract float %269, %265
+ %271 = fmul reassoc nsz contract float %1, %76
+ %272 = fmul reassoc nsz contract float %51, %271
+ %273 = fsub reassoc nsz contract float %0, %272
+ %274 = fmul reassoc nsz contract float %273, %270
+ %275 = fmul reassoc nsz contract float %0, %59
+ %276 = fmul reassoc nsz contract float %275, %79
+ %277 = fmul reassoc nsz contract float %276, %36
+ %278 = fsub reassoc nsz contract float %0, %277
+ %279 = fmul reassoc nsz contract float %278, %274
+ %280 = fmul reassoc nsz contract float %114, %22
+ %281 = fmul reassoc nsz contract float %280, %36
+ %282 = fsub reassoc nsz contract float %0, %281
+ %283 = fmul reassoc nsz contract float %282, %279
+ %284 = fmul reassoc nsz contract float %0, %43
+ %285 = fmul reassoc nsz contract float %284, %81
+ %286 = fmul reassoc nsz contract float %3, %285
+ %287 = fsub reassoc nsz contract float %0, %286
+ %288 = fmul reassoc nsz contract float %287, %283
+ store float %288, float* %80, align 4
+ ret void
+}
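A note on the `vfnmadd`/`vfmsub` assertions above: every one of them comes from the same contraction, where an `fmul` feeding an `fsub` carrying `reassoc nsz contract` flags is fused into a single negated FMA. A minimal C++ sketch of the scalar pattern (function name illustrative, not from the test):

```cpp
// With contraction enabled (e.g. -ffp-contract=fast and FMA available),
// c - a*b lowers to one vfnmadd213ss, i.e. -(a * b) + c computed with a
// single rounding step -- the shape asserted by the CHECK lines above.
float fnmadd_step(float a, float b, float c) {
  return c - a * b;
}
```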
diff --git a/llvm/test/CodeGen/X86/pr47000.ll b/llvm/test/CodeGen/X86/pr47000.ll
index 083aa780a07c2..922b6403cc4f4 100755
--- a/llvm/test/CodeGen/X86/pr47000.ll
+++ b/llvm/test/CodeGen/X86/pr47000.ll
@@ -12,47 +12,51 @@ define <4 x half> @doTheTestMod(<4 x half> %0, <4 x half> %1) nounwind {
; CHECK-NEXT: pushl %edi
; CHECK-NEXT: pushl %esi
; CHECK-NEXT: subl $124, %esp
-; CHECK-NEXT: movl 144(%esp), %eax
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
; CHECK-NEXT: movl %eax, %ecx
-; CHECK-NEXT: movw 176(%esp), %dx
-; CHECK-NEXT: movw 172(%esp), %si
-; CHECK-NEXT: movw 168(%esp), %di
-; CHECK-NEXT: movw 164(%esp), %bx
-; CHECK-NEXT: movw 160(%esp), %bp
+; CHECK-NEXT: movw {{[0-9]+}}(%esp), %dx
+; CHECK-NEXT: movw {{[0-9]+}}(%esp), %si
+; CHECK-NEXT: movw {{[0-9]+}}(%esp), %di
+; CHECK-NEXT: movw {{[0-9]+}}(%esp), %bx
+; CHECK-NEXT: movw {{[0-9]+}}(%esp), %bp
+; CHECK-NEXT: movw %dx, {{[-0-9]+}}(%e{{[sb]}}p) # 2-byte Spill
+; CHECK-NEXT: movw {{[0-9]+}}(%esp), %dx
+; CHECK-NEXT: movw %dx, {{[-0-9]+}}(%e{{[sb]}}p) # 2-byte Spill
+; CHECK-NEXT: movw {{[0-9]+}}(%esp), %dx
+; CHECK-NEXT: movw %dx, {{[-0-9]+}}(%e{{[sb]}}p) # 2-byte Spill
+; CHECK-NEXT: movw {{[0-9]+}}(%esp), %dx
+; CHECK-NEXT: movw %dx, {{[0-9]+}}(%esp)
+; CHECK-NEXT: movw {{[-0-9]+}}(%e{{[sb]}}p), %dx # 2-byte Reload
+; CHECK-NEXT: movw %dx, {{[0-9]+}}(%esp)
+; CHECK-NEXT: movw {{[-0-9]+}}(%e{{[sb]}}p), %dx # 2-byte Reload
+; CHECK-NEXT: movw %dx, {{[0-9]+}}(%esp)
+; CHECK-NEXT: movw %bp, {{[0-9]+}}(%esp)
+; CHECK-NEXT: movw {{[-0-9]+}}(%e{{[sb]}}p), %bp # 2-byte Reload
+; CHECK-NEXT: movw %bp, {{[0-9]+}}(%esp)
+; CHECK-NEXT: movw %si, {{[0-9]+}}(%esp)
+; CHECK-NEXT: movw %di, {{[0-9]+}}(%esp)
+; CHECK-NEXT: movw %bx, {{[0-9]+}}(%esp)
+; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %esi
+; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %edi
+; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %ebx
; CHECK-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
-; CHECK-NEXT: movw 156(%esp), %ax
-; CHECK-NEXT: movw %ax, {{[-0-9]+}}(%e{{[sb]}}p) # 2-byte Spill
-; CHECK-NEXT: movw 152(%esp), %ax
-; CHECK-NEXT: movw %ax, {{[-0-9]+}}(%e{{[sb]}}p) # 2-byte Spill
-; CHECK-NEXT: movw 148(%esp), %ax
-; CHECK-NEXT: movw %ax, 112(%esp)
-; CHECK-NEXT: movw {{[-0-9]+}}(%e{{[sb]}}p), %ax # 2-byte Reload
-; CHECK-NEXT: movw %ax, 114(%esp)
-; CHECK-NEXT: movw {{[-0-9]+}}(%e{{[sb]}}p), %ax # 2-byte Reload
-; CHECK-NEXT: movw %ax, 116(%esp)
-; CHECK-NEXT: movw %bp, 118(%esp)
-; CHECK-NEXT: movw %dx, 110(%esp)
-; CHECK-NEXT: movw %si, 108(%esp)
-; CHECK-NEXT: movw %di, 106(%esp)
-; CHECK-NEXT: movw %bx, 104(%esp)
-; CHECK-NEXT: movzwl 118(%esp), %edx
-; CHECK-NEXT: movzwl 116(%esp), %esi
-; CHECK-NEXT: movzwl 114(%esp), %edi
-; CHECK-NEXT: movzwl 112(%esp), %ebx
-; CHECK-NEXT: movzwl 110(%esp), %ebp
-; CHECK-NEXT: movzwl 108(%esp), %eax
+; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %eax
; CHECK-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
-; CHECK-NEXT: movzwl 106(%esp), %eax
+; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %eax
; CHECK-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
-; CHECK-NEXT: movzwl 104(%esp), %eax
+; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %eax
+; CHECK-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %eax
+; CHECK-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %eax
; CHECK-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; CHECK-NEXT: movl %esp, %eax
-; CHECK-NEXT: movl %ebx, (%eax)
; CHECK-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
-; CHECK-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
+; CHECK-NEXT: movl %ecx, (%eax)
; CHECK-NEXT: movl %esi, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; CHECK-NEXT: movl %edi, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
-; CHECK-NEXT: movl %ebp, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT: movl %ebx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; CHECK-NEXT: calll __gnu_h2f_ieee
; CHECK-NEXT: movl %esp, %eax
; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
@@ -68,58 +72,58 @@ define <4 x half> @doTheTestMod(<4 x half> %0, <4 x half> %1) nounwind {
; CHECK-NEXT: fstps (%eax)
; CHECK-NEXT: calll __gnu_f2h_ieee
; CHECK-NEXT: movl %esp, %ecx
-; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Reload
-; CHECK-NEXT: movl %edx, (%ecx)
+; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %esi # 4-byte Reload
+; CHECK-NEXT: movl %esi, (%ecx)
; CHECK-NEXT: movw %ax, {{[-0-9]+}}(%e{{[sb]}}p) # 2-byte Spill
; CHECK-NEXT: calll __gnu_h2f_ieee
-; CHECK-NEXT: movl %esp, %eax
-; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
-; CHECK-NEXT: movl %ecx, (%eax)
+; CHECK-NEXT: movl %esp, %ecx
+; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %esi # 4-byte Reload
+; CHECK-NEXT: movl %esi, (%ecx)
; CHECK-NEXT: fstpt {{[-0-9]+}}(%e{{[sb]}}p) # 10-byte Folded Spill
; CHECK-NEXT: calll __gnu_h2f_ieee
-; CHECK-NEXT: movl %esp, %eax
-; CHECK-NEXT: fstps 4(%eax)
+; CHECK-NEXT: movl %esp, %ecx
+; CHECK-NEXT: fstps 4(%ecx)
; CHECK-NEXT: fldt {{[-0-9]+}}(%e{{[sb]}}p) # 10-byte Folded Reload
-; CHECK-NEXT: fstps (%eax)
+; CHECK-NEXT: fstps (%ecx)
; CHECK-NEXT: calll fmodf
-; CHECK-NEXT: movl %esp, %eax
-; CHECK-NEXT: fstps (%eax)
+; CHECK-NEXT: movl %esp, %ecx
+; CHECK-NEXT: fstps (%ecx)
; CHECK-NEXT: calll __gnu_f2h_ieee
; CHECK-NEXT: movl %esp, %ecx
-; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Reload
-; CHECK-NEXT: movl %edx, (%ecx)
+; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %esi # 4-byte Reload
+; CHECK-NEXT: movl %esi, (%ecx)
; CHECK-NEXT: movw %ax, {{[-0-9]+}}(%e{{[sb]}}p) # 2-byte Spill
; CHECK-NEXT: calll __gnu_h2f_ieee
-; CHECK-NEXT: movl %esp, %eax
-; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
-; CHECK-NEXT: movl %ecx, (%eax)
+; CHECK-NEXT: movl %esp, %ecx
+; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %esi # 4-byte Reload
+; CHECK-NEXT: movl %esi, (%ecx)
; CHECK-NEXT: fstpt {{[-0-9]+}}(%e{{[sb]}}p) # 10-byte Folded Spill
; CHECK-NEXT: calll __gnu_h2f_ieee
-; CHECK-NEXT: movl %esp, %eax
-; CHECK-NEXT: fstps 4(%eax)
+; CHECK-NEXT: movl %esp, %ecx
+; CHECK-NEXT: fstps 4(%ecx)
; CHECK-NEXT: fldt {{[-0-9]+}}(%e{{[sb]}}p) # 10-byte Folded Reload
-; CHECK-NEXT: fstps (%eax)
+; CHECK-NEXT: fstps (%ecx)
; CHECK-NEXT: calll fmodf
-; CHECK-NEXT: movl %esp, %eax
-; CHECK-NEXT: fstps (%eax)
+; CHECK-NEXT: movl %esp, %ecx
+; CHECK-NEXT: fstps (%ecx)
; CHECK-NEXT: calll __gnu_f2h_ieee
; CHECK-NEXT: movl %esp, %ecx
-; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Reload
-; CHECK-NEXT: movl %edx, (%ecx)
+; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %esi # 4-byte Reload
+; CHECK-NEXT: movl %esi, (%ecx)
; CHECK-NEXT: movw %ax, {{[-0-9]+}}(%e{{[sb]}}p) # 2-byte Spill
; CHECK-NEXT: calll __gnu_h2f_ieee
-; CHECK-NEXT: movl %esp, %eax
-; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
-; CHECK-NEXT: movl %ecx, (%eax)
+; CHECK-NEXT: movl %esp, %ecx
+; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %esi # 4-byte Reload
+; CHECK-NEXT: movl %esi, (%ecx)
; CHECK-NEXT: fstpt {{[-0-9]+}}(%e{{[sb]}}p) # 10-byte Folded Spill
; CHECK-NEXT: calll __gnu_h2f_ieee
-; CHECK-NEXT: movl %esp, %eax
-; CHECK-NEXT: fstps 4(%eax)
+; CHECK-NEXT: movl %esp, %ecx
+; CHECK-NEXT: fstps 4(%ecx)
; CHECK-NEXT: fldt {{[-0-9]+}}(%e{{[sb]}}p) # 10-byte Folded Reload
-; CHECK-NEXT: fstps (%eax)
+; CHECK-NEXT: fstps (%ecx)
; CHECK-NEXT: calll fmodf
-; CHECK-NEXT: movl %esp, %eax
-; CHECK-NEXT: fstps (%eax)
+; CHECK-NEXT: movl %esp, %ecx
+; CHECK-NEXT: fstps (%ecx)
; CHECK-NEXT: calll __gnu_f2h_ieee
; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
; CHECK-NEXT: movw %ax, 6(%ecx)
@@ -127,9 +131,10 @@ define <4 x half> @doTheTestMod(<4 x half> %0, <4 x half> %1) nounwind {
; CHECK-NEXT: movw %ax, 4(%ecx)
; CHECK-NEXT: movw {{[-0-9]+}}(%e{{[sb]}}p), %dx # 2-byte Reload
; CHECK-NEXT: movw %dx, 2(%ecx)
-; CHECK-NEXT: movw {{[-0-9]+}}(%e{{[sb]}}p), %si # 2-byte Reload
-; CHECK-NEXT: movw %si, (%ecx)
-; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
+; CHECK-NEXT: movw {{[-0-9]+}}(%e{{[sb]}}p), %bp # 2-byte Reload
+; CHECK-NEXT: movw %bp, (%ecx)
+; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %esi # 4-byte Reload
+; CHECK-NEXT: movl %esi, %eax
; CHECK-NEXT: addl $124, %esp
; CHECK-NEXT: popl %esi
; CHECK-NEXT: popl %edi
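For orientation in the hunk above: i386 has no native `half` arithmetic, so each of the four lanes is widened with `__gnu_h2f_ieee`, run through `fmodf`, and narrowed back with `__gnu_f2h_ieee`; the regenerated CHECK lines mostly reshuffle which registers hold the spilled lanes. A hedged per-lane sketch, with the helper prototypes assumed from the calls in the test:

```cpp
#include <cmath>

// Assumed soft-float helper signatures; the 16-bit value is the raw IEEE
// half-precision bit pattern.
extern "C" float __gnu_h2f_ieee(unsigned short);
extern "C" unsigned short __gnu_f2h_ieee(float);

// What doTheTestMod computes, one lane at a time.
void halfModLanes(unsigned short r[4], const unsigned short a[4],
                  const unsigned short b[4]) {
  for (int i = 0; i < 4; ++i)
    r[i] = __gnu_f2h_ieee(std::fmod(__gnu_h2f_ieee(a[i]), __gnu_h2f_ieee(b[i])));
}
```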
diff --git a/llvm/test/CodeGen/X86/pr47517.ll b/llvm/test/CodeGen/X86/pr47517.ll
new file mode 100644
index 0000000000000..5672fbc69a41d
--- /dev/null
+++ b/llvm/test/CodeGen/X86/pr47517.ll
@@ -0,0 +1,28 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple x86_64 < %s | FileCheck %s
+
+; Ensure that an unused floating-point constant is correctly removed.
+define float @test(float %src, float* %p) {
+; CHECK-LABEL: test:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: movq $0, (%rdi)
+; CHECK-NEXT: xorps %xmm0, %xmm0
+; CHECK-NEXT: retq
+entry:
+ %a0 = getelementptr inbounds float, float* %p, i32 0
+ %a1 = getelementptr inbounds float, float* %p, i32 1
+ store float 0.000000e+00, float* %a0
+ store float 0.000000e+00, float* %a1
+ %zero = load float, float* %a0
+ %fmul1 = fmul fast float %zero, %src
+ %fadd1 = fadd fast float %fmul1, %zero
+ %fmul2 = fmul fast float %fadd1, 2.000000e+00
+ %fmul3 = fmul fast float %fmul2, %fmul2
+ %fmul4 = fmul fast float %fmul2, 2.000000e+00
+ %fadd2 = fadd fast float %fmul4, -3.000000e+00
+ %fmul5 = fmul fast float %fadd2, %fmul2
+ %fadd3 = fadd fast float %fmul2, %src
+ %fadd4 = fadd fast float %fadd3, %fmul5
+ %fmul6 = fmul fast float %fmul3, %fadd4
+ ret float %fmul6
+}
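Why everything folds: `%zero` reloads the `0.0` just stored through `%a0`, and under `fast` flags the chain collapses step by step — `%fmul1 = 0 * %src = 0`, `%fadd1 = 0`, `%fmul2 = 0`, `%fmul3 = 0`, and finally `%fmul6 = %fmul3 * %fadd4 = 0`. The function thus returns `0.0` and the `-3.000000e+00` constant is dead; the CHECK lines confirm the body reduces to the zero store plus `xorps`, with no constant-pool load left behind.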
diff --git a/llvm/test/CodeGen/X86/regalloc-fast-missing-live-out-spill.mir b/llvm/test/CodeGen/X86/regalloc-fast-missing-live-out-spill.mir
index 2821f00940ecf..0fe9f60897fd1 100644
--- a/llvm/test/CodeGen/X86/regalloc-fast-missing-live-out-spill.mir
+++ b/llvm/test/CodeGen/X86/regalloc-fast-missing-live-out-spill.mir
@@ -23,15 +23,15 @@ body: |
; CHECK: successors: %bb.3(0x80000000)
; CHECK: $rax = MOV64rm %stack.1, 1, $noreg, 0, $noreg :: (load 8 from %stack.1)
; CHECK: renamable $ecx = MOV32r0 implicit-def $eflags
- ; CHECK: renamable $rcx = SUBREG_TO_REG 0, killed renamable $ecx, %subreg.sub_32bit
+ ; CHECK: renamable $rdx = SUBREG_TO_REG 0, killed renamable $ecx, %subreg.sub_32bit
; CHECK: MOV64mi32 killed renamable $rax, 1, $noreg, 0, $noreg, 0 :: (volatile store 8)
- ; CHECK: MOV64mr %stack.0, 1, $noreg, 0, $noreg, killed $rcx :: (store 8 into %stack.0)
+ ; CHECK: MOV64mr %stack.0, 1, $noreg, 0, $noreg, killed $rdx :: (store 8 into %stack.0)
; CHECK: bb.3:
; CHECK: successors: %bb.2(0x40000000), %bb.1(0x40000000)
; CHECK: $rax = MOV64rm %stack.0, 1, $noreg, 0, $noreg :: (load 8 from %stack.0)
; CHECK: renamable $ecx = MOV32r0 implicit-def dead $eflags
- ; CHECK: renamable $rcx = SUBREG_TO_REG 0, killed renamable $ecx, %subreg.sub_32bit
- ; CHECK: MOV64mr %stack.1, 1, $noreg, 0, $noreg, killed $rcx :: (store 8 into %stack.1)
+ ; CHECK: renamable $rdx = SUBREG_TO_REG 0, killed renamable $ecx, %subreg.sub_32bit
+ ; CHECK: MOV64mr %stack.1, 1, $noreg, 0, $noreg, killed $rdx :: (store 8 into %stack.1)
; CHECK: JMP64r killed renamable $rax
bb.0:
liveins: $edi, $rsi
diff --git a/llvm/test/CodeGen/X86/semantic-interposition-comdat.ll b/llvm/test/CodeGen/X86/semantic-interposition-comdat.ll
index d0efd4d11c958..d11be2d6bd0c2 100644
--- a/llvm/test/CodeGen/X86/semantic-interposition-comdat.ll
+++ b/llvm/test/CodeGen/X86/semantic-interposition-comdat.ll
@@ -3,7 +3,7 @@
$comdat_func = comdat any
; CHECK-LABEL: func2:
-; CHECK-NEXT: .Lfunc2$local
+; CHECK-NOT: .Lfunc2$local
declare void @func()
diff --git a/llvm/test/CodeGen/X86/swift-return.ll b/llvm/test/CodeGen/X86/swift-return.ll
index 4934419055acd..c62e92f2cac55 100644
--- a/llvm/test/CodeGen/X86/swift-return.ll
+++ b/llvm/test/CodeGen/X86/swift-return.ll
@@ -28,10 +28,11 @@ define i16 @test(i32 %key) {
; CHECK-O0-NEXT: movl %edi, {{[0-9]+}}(%rsp)
; CHECK-O0-NEXT: movl {{[0-9]+}}(%rsp), %edi
; CHECK-O0-NEXT: callq gen
-; CHECK-O0-NEXT: cwtl
-; CHECK-O0-NEXT: movsbl %dl, %ecx
-; CHECK-O0-NEXT: addl %ecx, %eax
-; CHECK-O0-NEXT: # kill: def $ax killed $ax killed $eax
+; CHECK-O0-NEXT: movswl %ax, %ecx
+; CHECK-O0-NEXT: movsbl %dl, %esi
+; CHECK-O0-NEXT: addl %esi, %ecx
+; CHECK-O0-NEXT: # kill: def $cx killed $cx killed $ecx
+; CHECK-O0-NEXT: movw %cx, %ax
; CHECK-O0-NEXT: popq %rcx
; CHECK-O0-NEXT: .cfi_def_cfa_offset 8
; CHECK-O0-NEXT: retq
@@ -79,16 +80,16 @@ define i32 @test2(i32 %key) #0 {
; CHECK-O0-NEXT: movl {{[0-9]+}}(%rsp), %edi
; CHECK-O0-NEXT: movq %rsp, %rax
; CHECK-O0-NEXT: callq gen2
-; CHECK-O0-NEXT: movl {{[0-9]+}}(%rsp), %eax
; CHECK-O0-NEXT: movl {{[0-9]+}}(%rsp), %ecx
; CHECK-O0-NEXT: movl {{[0-9]+}}(%rsp), %edx
-; CHECK-O0-NEXT: movl (%rsp), %esi
-; CHECK-O0-NEXT: movl {{[0-9]+}}(%rsp), %edi
-; CHECK-O0-NEXT: addl %edi, %esi
-; CHECK-O0-NEXT: addl %edx, %esi
-; CHECK-O0-NEXT: addl %ecx, %esi
-; CHECK-O0-NEXT: addl %eax, %esi
-; CHECK-O0-NEXT: movl %esi, %eax
+; CHECK-O0-NEXT: movl {{[0-9]+}}(%rsp), %esi
+; CHECK-O0-NEXT: movl (%rsp), %edi
+; CHECK-O0-NEXT: movl {{[0-9]+}}(%rsp), %r8d
+; CHECK-O0-NEXT: addl %r8d, %edi
+; CHECK-O0-NEXT: addl %esi, %edi
+; CHECK-O0-NEXT: addl %edx, %edi
+; CHECK-O0-NEXT: addl %ecx, %edi
+; CHECK-O0-NEXT: movl %edi, %eax
; CHECK-O0-NEXT: addq $24, %rsp
; CHECK-O0-NEXT: .cfi_def_cfa_offset 8
; CHECK-O0-NEXT: retq
@@ -263,17 +264,17 @@ define void @consume_i1_ret() {
; CHECK-O0-NEXT: .cfi_def_cfa_offset 16
; CHECK-O0-NEXT: callq produce_i1_ret
; CHECK-O0-NEXT: andb $1, %al
-; CHECK-O0-NEXT: movzbl %al, %eax
-; CHECK-O0-NEXT: movl %eax, var
+; CHECK-O0-NEXT: movzbl %al, %esi
+; CHECK-O0-NEXT: movl %esi, var
; CHECK-O0-NEXT: andb $1, %dl
-; CHECK-O0-NEXT: movzbl %dl, %eax
-; CHECK-O0-NEXT: movl %eax, var
+; CHECK-O0-NEXT: movzbl %dl, %esi
+; CHECK-O0-NEXT: movl %esi, var
; CHECK-O0-NEXT: andb $1, %cl
-; CHECK-O0-NEXT: movzbl %cl, %eax
-; CHECK-O0-NEXT: movl %eax, var
+; CHECK-O0-NEXT: movzbl %cl, %esi
+; CHECK-O0-NEXT: movl %esi, var
; CHECK-O0-NEXT: andb $1, %r8b
-; CHECK-O0-NEXT: movzbl %r8b, %eax
-; CHECK-O0-NEXT: movl %eax, var
+; CHECK-O0-NEXT: movzbl %r8b, %esi
+; CHECK-O0-NEXT: movl %esi, var
; CHECK-O0-NEXT: popq %rax
; CHECK-O0-NEXT: .cfi_def_cfa_offset 8
; CHECK-O0-NEXT: retq
diff --git a/llvm/test/CodeGen/X86/swifterror.ll b/llvm/test/CodeGen/X86/swifterror.ll
index 1afae31b2b8d2..1388c61c18984 100644
--- a/llvm/test/CodeGen/X86/swifterror.ll
+++ b/llvm/test/CodeGen/X86/swifterror.ll
@@ -790,8 +790,8 @@ a:
; CHECK-O0-LABEL: testAssign4
; CHECK-O0: callq _foo2
; CHECK-O0: xorl %eax, %eax
-; CHECK-O0: ## kill: def $rax killed $eax
-; CHECK-O0: movq %rax, [[SLOT:[-a-z0-9\(\)\%]*]]
+; CHECK-O0: movl %eax, %ecx
+; CHECK-O0: movq %rcx, [[SLOT:[-a-z0-9\(\)\%]*]]
; CHECK-O0: movq [[SLOT]], %rax
; CHECK-O0: movq %rax, [[SLOT2:[-a-z0-9\(\)\%]*]]
; CHECK-O0: movq [[SLOT2]], %r12
diff --git a/llvm/test/CodeGen/X86/tailcallpic1.ll b/llvm/test/CodeGen/X86/tailcallpic1.ll
index 717cc1fddec93..ed101fcccd2db 100644
--- a/llvm/test/CodeGen/X86/tailcallpic1.ll
+++ b/llvm/test/CodeGen/X86/tailcallpic1.ll
@@ -12,5 +12,5 @@ define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
entry:
%tmp11 = tail call fastcc i32 @tailcallee( i32 %in1, i32 %in2, i32 %in1, i32 %in2 ) ; [#uses=1]
ret i32 %tmp11
-; CHECK: jmp .Ltailcallee$local
+; CHECK: jmp tailcallee
}
diff --git a/llvm/test/CodeGen/X86/tailcallpic3.ll b/llvm/test/CodeGen/X86/tailcallpic3.ll
index 13b160aae2f63..edc58052d82f6 100644
--- a/llvm/test/CodeGen/X86/tailcallpic3.ll
+++ b/llvm/test/CodeGen/X86/tailcallpic3.ll
@@ -16,7 +16,7 @@ entry:
ret void
}
; CHECK: tailcall_hidden:
-; CHECK: jmp .Ltailcallee_hidden$local
+; CHECK: jmp tailcallee_hidden
define internal void @tailcallee_internal() {
entry:
diff --git a/llvm/test/CodeGen/X86/tailccpic1.ll b/llvm/test/CodeGen/X86/tailccpic1.ll
index dbdc56aa61c74..de8f2219bc2f3 100644
--- a/llvm/test/CodeGen/X86/tailccpic1.ll
+++ b/llvm/test/CodeGen/X86/tailccpic1.ll
@@ -12,5 +12,5 @@ define tailcc i32 @tailcaller(i32 %in1, i32 %in2) {
entry:
%tmp11 = tail call tailcc i32 @tailcallee( i32 %in1, i32 %in2, i32 %in1, i32 %in2 ) ; [#uses=1]
ret i32 %tmp11
-; CHECK: jmp .Ltailcallee$local
+; CHECK: jmp tailcallee
}
diff --git a/llvm/test/DebugInfo/AArch64/dbg-sve-types.ll b/llvm/test/DebugInfo/AArch64/dbg-sve-types.ll
new file mode 100644
index 0000000000000..62b86f294861d
--- /dev/null
+++ b/llvm/test/DebugInfo/AArch64/dbg-sve-types.ll
@@ -0,0 +1,44 @@
+; Test that the debug info for the vector type is correctly codegenerated
+; when the DISubrange has no count but only an upper bound.
+; RUN: llc -mtriple aarch64 -mattr=+sve -filetype=obj -o %t %s
+; RUN: llvm-dwarfdump %t | FileCheck %s
+; RUN: rm %t
+
+; CHECK: {{.*}}: DW_TAG_subrange_type
+; CHECK-NEXT: DW_AT_type ({{.*}} "__ARRAY_SIZE_TYPE__")
+; CHECK-NEXT: DW_AT_upper_bound (DW_OP_lit8, DW_OP_bregx VG+0, DW_OP_mul, DW_OP_lit1, DW_OP_minus)
+
+define <vscale x 16 x i8> @test_svint8_t(<vscale x 16 x i8> returned %op1) !dbg !7 {
+entry:
+ call void @llvm.dbg.value(metadata <vscale x 16 x i8> %op1, metadata !19, metadata !DIExpression()), !dbg !20
+ ret <vscale x 16 x i8> %op1, !dbg !21
+}
+
+declare void @llvm.dbg.value(metadata, metadata, metadata)
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!3, !4, !5}
+!llvm.ident = !{!6}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 12.0.0", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, nameTableKind: None)
+!1 = !DIFile(filename: "dbg-sve-types.ll", directory: "")
+!2 = !{}
+!3 = !{i32 7, !"Dwarf Version", i32 4}
+!4 = !{i32 2, !"Debug Info Version", i32 3}
+!5 = !{i32 1, !"wchar_size", i32 4}
+!6 = !{!"clang version 12.0.0"}
+!7 = distinct !DISubprogram(name: "test_svint8_t", scope: !8, file: !8, line: 5, type: !9, scopeLine: 5, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !18)
+!8 = !DIFile(filename: "dbg-sve-types.ll", directory: "")
+!9 = !DISubroutineType(types: !10)
+!10 = !{!11, !11}
+!11 = !DIDerivedType(tag: DW_TAG_typedef, name: "svint8_t", file: !12, line: 32, baseType: !13)
+!12 = !DIFile(filename: "lib/clang/12.0.0/include/arm_sve.h", directory: "")
+!13 = !DIDerivedType(tag: DW_TAG_typedef, name: "__SVInt8_t", file: !1, baseType: !14)
+!14 = !DICompositeType(tag: DW_TAG_array_type, baseType: !15, flags: DIFlagVector, elements: !16)
+!15 = !DIBasicType(name: "signed char", size: 8, encoding: DW_ATE_signed_char)
+!16 = !{!17}
+!17 = !DISubrange(lowerBound: 0, upperBound: !DIExpression(DW_OP_constu, 8, DW_OP_bregx, 46, 0, DW_OP_mul, DW_OP_constu, 1, DW_OP_minus))
+!18 = !{!19}
+!19 = !DILocalVariable(name: "op1", arg: 1, scope: !7, file: !8, line: 5, type: !11)
+!20 = !DILocation(line: 0, scope: !7)
+!21 = !DILocation(line: 5, column: 39, scope: !7)
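The expected location expression is just the scalable element count spelled in DWARF: `svint8_t` is `<vscale x 16 x i8>`, whose lane count equals the vector length in bytes, i.e. `8 * VG`, where AArch64's `VG` pseudo-register holds the vector length in 64-bit granules. The upper bound is therefore `8 * VG - 1` — precisely `DW_OP_lit8, DW_OP_bregx VG+0, DW_OP_mul, DW_OP_lit1, DW_OP_minus` (register 46 in `!17` is `VG` in the AArch64 DWARF numbering).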
diff --git a/llvm/test/DebugInfo/X86/op_deref.ll b/llvm/test/DebugInfo/X86/op_deref.ll
index 1b49dc554f7ef..5de9976d6de2a 100644
--- a/llvm/test/DebugInfo/X86/op_deref.ll
+++ b/llvm/test/DebugInfo/X86/op_deref.ll
@@ -6,10 +6,10 @@
; RUN: | FileCheck %s -check-prefix=CHECK -check-prefix=DWARF3
; DWARF4: DW_AT_location [DW_FORM_sec_offset] (0x00000000
-; DWARF4-NEXT: {{.*}}: DW_OP_breg2 RCX+0, DW_OP_deref
+; DWARF4-NEXT: {{.*}}: DW_OP_breg1 RDX+0, DW_OP_deref
; DWARF3: DW_AT_location [DW_FORM_data4] (0x00000000
-; DWARF3-NEXT: {{.*}}: DW_OP_breg2 RCX+0, DW_OP_deref
+; DWARF3-NEXT: {{.*}}: DW_OP_breg1 RDX+0, DW_OP_deref
; CHECK-NOT: DW_TAG
; CHECK: DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000067] = "vla")
@@ -17,8 +17,8 @@
; Check the DEBUG_VALUE comments for good measure.
; RUN: llc -O0 -mtriple=x86_64-apple-darwin %s -o - -filetype=asm | FileCheck %s -check-prefix=ASM-CHECK
; vla should have a register-indirect address at one point.
-; ASM-CHECK: DEBUG_VALUE: vla <- [DW_OP_deref] [$rcx+0]
-; ASM-CHECK: DW_OP_breg2
+; ASM-CHECK: DEBUG_VALUE: vla <- [DW_OP_deref] [$rdx+0]
+; ASM-CHECK: DW_OP_breg1
; RUN: llvm-as %s -o - | llvm-dis - | FileCheck %s --check-prefix=PRETTY-PRINT
; PRETTY-PRINT: DIExpression(DW_OP_deref)
diff --git a/llvm/test/MC/AArch64/SVE/st1b.s b/llvm/test/MC/AArch64/SVE/st1b.s
index a6f766bdfd7cc..40b830709ead4 100644
--- a/llvm/test/MC/AArch64/SVE/st1b.s
+++ b/llvm/test/MC/AArch64/SVE/st1b.s
@@ -168,3 +168,27 @@ st1b { z31.d }, p7, [z31.d, #31]
// CHECK-ENCODING: [0xff,0xbf,0x5f,0xe4]
// CHECK-ERROR: instruction requires: sve
// CHECK-UNKNOWN: ff bf 5f e4
+
+st1b { z0.s }, p7, [z0.s, #0]
+// CHECK-INST: st1b { z0.s }, p7, [z0.s]
+// CHECK-ENCODING: [0x00,0xbc,0x60,0xe4]
+// CHECK-ERROR: instruction requires: sve
+// CHECK-UNKNOWN: 00 bc 60 e4
+
+st1b { z0.s }, p7, [z0.s]
+// CHECK-INST: st1b { z0.s }, p7, [z0.s]
+// CHECK-ENCODING: [0x00,0xbc,0x60,0xe4]
+// CHECK-ERROR: instruction requires: sve
+// CHECK-UNKNOWN: 00 bc 60 e4
+
+st1b { z0.d }, p7, [z0.d, #0]
+// CHECK-INST: st1b { z0.d }, p7, [z0.d]
+// CHECK-ENCODING: [0x00,0xbc,0x40,0xe4]
+// CHECK-ERROR: instruction requires: sve
+// CHECK-UNKNOWN: 00 bc 40 e4
+
+st1b { z0.d }, p7, [z0.d]
+// CHECK-INST: st1b { z0.d }, p7, [z0.d]
+// CHECK-ENCODING: [0x00,0xbc,0x40,0xe4]
+// CHECK-ERROR: instruction requires: sve
+// CHECK-UNKNOWN: 00 bc 40 e4
diff --git a/llvm/test/MC/AArch64/SVE/st1d.s b/llvm/test/MC/AArch64/SVE/st1d.s
index ba4a0e5be114b..a5a19e772b528 100644
--- a/llvm/test/MC/AArch64/SVE/st1d.s
+++ b/llvm/test/MC/AArch64/SVE/st1d.s
@@ -78,3 +78,15 @@ st1d { z31.d }, p7, [z31.d, #248]
// CHECK-ENCODING: [0xff,0xbf,0xdf,0xe5]
// CHECK-ERROR: instruction requires: sve
// CHECK-UNKNOWN: ff bf df e5
+
+st1d { z0.d }, p7, [z0.d, #0]
+// CHECK-INST: st1d { z0.d }, p7, [z0.d]
+// CHECK-ENCODING: [0x00,0xbc,0xc0,0xe5]
+// CHECK-ERROR: instruction requires: sve
+// CHECK-UNKNOWN: 00 bc c0 e5
+
+st1d { z0.d }, p7, [z0.d]
+// CHECK-INST: st1d { z0.d }, p7, [z0.d]
+// CHECK-ENCODING: [0x00,0xbc,0xc0,0xe5]
+// CHECK-ERROR: instruction requires: sve
+// CHECK-UNKNOWN: 00 bc c0 e5
diff --git a/llvm/test/MC/AArch64/SVE/st1h.s b/llvm/test/MC/AArch64/SVE/st1h.s
index cd6c20d83482e..fe22c52bb9bef 100644
--- a/llvm/test/MC/AArch64/SVE/st1h.s
+++ b/llvm/test/MC/AArch64/SVE/st1h.s
@@ -168,3 +168,27 @@ st1h { z31.d }, p7, [z31.d, #62]
// CHECK-ENCODING: [0xff,0xbf,0xdf,0xe4]
// CHECK-ERROR: instruction requires: sve
// CHECK-UNKNOWN: ff bf df e4
+
+st1h { z0.s }, p7, [z0.s, #0]
+// CHECK-INST: st1h { z0.s }, p7, [z0.s]
+// CHECK-ENCODING: [0x00,0xbc,0xe0,0xe4]
+// CHECK-ERROR: instruction requires: sve
+// CHECK-UNKNOWN: 00 bc e0 e4
+
+st1h { z0.s }, p7, [z0.s]
+// CHECK-INST: st1h { z0.s }, p7, [z0.s]
+// CHECK-ENCODING: [0x00,0xbc,0xe0,0xe4]
+// CHECK-ERROR: instruction requires: sve
+// CHECK-UNKNOWN: 00 bc e0 e4
+
+st1h { z0.d }, p7, [z0.d, #0]
+// CHECK-INST: st1h { z0.d }, p7, [z0.d]
+// CHECK-ENCODING: [0x00,0xbc,0xc0,0xe4]
+// CHECK-ERROR: instruction requires: sve
+// CHECK-UNKNOWN: 00 bc c0 e4
+
+st1h { z0.d }, p7, [z0.d]
+// CHECK-INST: st1h { z0.d }, p7, [z0.d]
+// CHECK-ENCODING: [0x00,0xbc,0xc0,0xe4]
+// CHECK-ERROR: instruction requires: sve
+// CHECK-UNKNOWN: 00 bc c0 e4
diff --git a/llvm/test/MC/AArch64/SVE/st1w.s b/llvm/test/MC/AArch64/SVE/st1w.s
index e20194f5747e9..5bbcd2e1ea0ff 100644
--- a/llvm/test/MC/AArch64/SVE/st1w.s
+++ b/llvm/test/MC/AArch64/SVE/st1w.s
@@ -138,3 +138,27 @@ st1w { z31.d }, p7, [z31.d, #124]
// CHECK-ENCODING: [0xff,0xbf,0x5f,0xe5]
// CHECK-ERROR: instruction requires: sve
// CHECK-UNKNOWN: ff bf 5f e5
+
+st1w { z0.s }, p7, [z0.s, #0]
+// CHECK-INST: st1w { z0.s }, p7, [z0.s]
+// CHECK-ENCODING: [0x00,0xbc,0x60,0xe5]
+// CHECK-ERROR: instruction requires: sve
+// CHECK-UNKNOWN: 00 bc 60 e5
+
+st1w { z0.s }, p7, [z0.s]
+// CHECK-INST: st1w { z0.s }, p7, [z0.s]
+// CHECK-ENCODING: [0x00,0xbc,0x60,0xe5]
+// CHECK-ERROR: instruction requires: sve
+// CHECK-UNKNOWN: 00 bc 60 e5
+
+st1w { z0.d }, p7, [z0.d, #0]
+// CHECK-INST: st1w { z0.d }, p7, [z0.d]
+// CHECK-ENCODING: [0x00,0xbc,0x40,0xe5]
+// CHECK-ERROR: instruction requires: sve
+// CHECK-UNKNOWN: 00 bc 40 e5
+
+st1w { z0.d }, p7, [z0.d]
+// CHECK-INST: st1w { z0.d }, p7, [z0.d]
+// CHECK-ENCODING: [0x00,0xbc,0x40,0xe5]
+// CHECK-ERROR: instruction requires: sve
+// CHECK-UNKNOWN: 00 bc 40 e5
diff --git a/llvm/test/Transforms/InstCombine/select.ll b/llvm/test/Transforms/InstCombine/select.ll
index 185ff838b8192..c23587b606ce7 100644
--- a/llvm/test/Transforms/InstCombine/select.ll
+++ b/llvm/test/Transforms/InstCombine/select.ll
@@ -2487,3 +2487,21 @@ define <2 x i32> @true_undef_vec(i1 %cond, <2 x i32> %x) {
%s = select i1 %cond, <2 x i32> undef, <2 x i32> %x
ret <2 x i32> %s
}
+
+; FIXME: This is a miscompile!
+define i32 @pr47322_more_poisonous_replacement(i32 %arg) {
+; CHECK-LABEL: @pr47322_more_poisonous_replacement(
+; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[ARG:%.*]], 0
+; CHECK-NEXT: [[TRAILING:%.*]] = call i32 @llvm.cttz.i32(i32 [[ARG]], i1 immarg true), [[RNG0:!range !.*]]
+; CHECK-NEXT: [[SHIFTED:%.*]] = lshr i32 [[ARG]], [[TRAILING]]
+; CHECK-NEXT: [[R1_SROA_0_1:%.*]] = select i1 [[CMP]], i32 0, i32 [[SHIFTED]]
+; CHECK-NEXT: ret i32 [[R1_SROA_0_1]]
+;
+ %cmp = icmp eq i32 %arg, 0
+ %trailing = call i32 @llvm.cttz.i32(i32 %arg, i1 immarg true)
+ %shifted = lshr i32 %arg, %trailing
+ %r1.sroa.0.1 = select i1 %cmp, i32 0, i32 %shifted
+ ret i32 %r1.sroa.0.1
+}
+
+declare i32 @llvm.cttz.i32(i32, i1 immarg)
diff --git a/llvm/test/Transforms/InstSimplify/select.ll b/llvm/test/Transforms/InstSimplify/select.ll
index b1264138a15ea..05fa46ca3f49c 100644
--- a/llvm/test/Transforms/InstSimplify/select.ll
+++ b/llvm/test/Transforms/InstSimplify/select.ll
@@ -919,3 +919,19 @@ define <2 x i32> @all_constant_true_undef_false_constexpr_vec() {
%s = select i1 ptrtoint (<2 x i32> ()* @all_constant_true_undef_false_constexpr_vec to i1), <2 x i32> undef, <2 x i32> <i32 ptrtoint (<2 x i32> ()* @all_constant_true_undef_false_constexpr_vec to i32), i32 ptrtoint (<2 x i32> ()* @all_constant_true_undef_false_constexpr_vec to i32)>
ret <2 x i32> %s
}
+
+define i32 @pr47322_more_poisonous_replacement(i32 %arg) {
+; CHECK-LABEL: @pr47322_more_poisonous_replacement(
+; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[ARG:%.*]], 0
+; CHECK-NEXT: [[TRAILING:%.*]] = call i32 @llvm.cttz.i32(i32 [[ARG]], i1 immarg true)
+; CHECK-NEXT: [[SHIFTED:%.*]] = lshr i32 [[ARG]], [[TRAILING]]
+; CHECK-NEXT: [[R1_SROA_0_1:%.*]] = select i1 [[CMP]], i32 0, i32 [[SHIFTED]]
+; CHECK-NEXT: ret i32 [[R1_SROA_0_1]]
+;
+ %cmp = icmp eq i32 %arg, 0
+ %trailing = call i32 @llvm.cttz.i32(i32 %arg, i1 immarg true)
+ %shifted = lshr i32 %arg, %trailing
+ %r1.sroa.0.1 = select i1 %cmp, i32 0, i32 %shifted
+ ret i32 %r1.sroa.0.1
+}
+declare i32 @llvm.cttz.i32(i32, i1 immarg)
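Both `pr47322_more_poisonous_replacement` tests pin down the same hazard: `@llvm.cttz.i32(%arg, i1 true)` is poison when `%arg` is zero, and the select exists to supply the defined value `0` for exactly that input, so folding the select down to `%shifted` would trade a well-defined result for a possibly-poison one (hence the FIXME on the InstCombine copy). A rough C++ analogue of the guarded form (function name illustrative):

```cpp
#include <cstdint>

// The select plays the role of this branch: it keeps the zero input away
// from the trailing-zero count, whose zero-is-undefined flavor mirrors
// cttz(..., i1 true).
uint32_t pr47322Guarded(uint32_t arg) {
  if (arg == 0)
    return 0;                       // defined result for the poison case
  return arg >> __builtin_ctz(arg); // __builtin_ctz(0) is undefined
}
```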
diff --git a/llvm/test/Transforms/OpenMP/deduplication.ll b/llvm/test/Transforms/OpenMP/deduplication.ll
index a25d980b1806f..9074b948cc3fe 100644
--- a/llvm/test/Transforms/OpenMP/deduplication.ll
+++ b/llvm/test/Transforms/OpenMP/deduplication.ll
@@ -5,21 +5,21 @@ target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16
%struct.ident_t = type { i32, i32, i32, i32, i8* }
-@0 = private unnamed_addr global %struct.ident_t { i32 0, i32 34, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str0, i32 0, i32 0) }, align 8
-@1 = private unnamed_addr global %struct.ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str1, i32 0, i32 0) }, align 8
-@2 = private unnamed_addr global %struct.ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str2, i32 0, i32 0) }, align 8
+@0 = private unnamed_addr constant %struct.ident_t { i32 0, i32 34, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str0, i32 0, i32 0) }, align 8
+@1 = private unnamed_addr constant %struct.ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str1, i32 0, i32 0) }, align 8
+@2 = private unnamed_addr constant %struct.ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str2, i32 0, i32 0) }, align 8
@.str0 = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1
@.str1 = private unnamed_addr constant [23 x i8] c";file001;loc0001;0;0;;\00", align 1
@.str2 = private unnamed_addr constant [23 x i8] c";file002;loc0002;0;0;;\00", align 1
; UTC_ARGS: --disable
-; CHECK-DAG: @0 = private unnamed_addr global %struct.ident_t { i32 0, i32 34, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str0, i32 0, i32 0) }, align 8
-; CHECK-DAG: @1 = private unnamed_addr global %struct.ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str1, i32 0, i32 0) }, align 8
-; CHECK-DAG: @2 = private unnamed_addr global %struct.ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str2, i32 0, i32 0) }, align 8
+; CHECK-DAG: @0 = private unnamed_addr constant %struct.ident_t { i32 0, i32 34, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str0, i32 0, i32 0) }, align 8
+; CHECK-DAG: @1 = private unnamed_addr constant %struct.ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str1, i32 0, i32 0) }, align 8
+; CHECK-DAG: @2 = private unnamed_addr constant %struct.ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str2, i32 0, i32 0) }, align 8
; CHECK-DAG: @.str0 = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1
; CHECK-DAG: @.str1 = private unnamed_addr constant [23 x i8] c";file001;loc0001;0;0;;\00", align 1
; CHECK-DAG: @.str2 = private unnamed_addr constant [23 x i8] c";file002;loc0002;0;0;;\00", align 1
-; CHECK-DAG: @3 = private unnamed_addr global %struct.ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str0, i32 0, i32 0) }, align 8
+; CHECK-DAG: @3 = private unnamed_addr constant %struct.ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str0, i32 0, i32 0) }, align 8
; UTC_ARGS: --enable
diff --git a/llvm/unittests/ADT/SmallPtrSetTest.cpp b/llvm/unittests/ADT/SmallPtrSetTest.cpp
index 3226fe615509c..eacd62ffc6ff0 100644
--- a/llvm/unittests/ADT/SmallPtrSetTest.cpp
+++ b/llvm/unittests/ADT/SmallPtrSetTest.cpp
@@ -313,8 +313,8 @@ TEST(SmallPtrSetTest, ConstTest) {
IntSet.insert(B);
EXPECT_EQ(IntSet.count(B), 1u);
EXPECT_EQ(IntSet.count(C), 1u);
- EXPECT_NE(IntSet.find(B), IntSet.end());
- EXPECT_NE(IntSet.find(C), IntSet.end());
+ EXPECT_TRUE(IntSet.contains(B));
+ EXPECT_TRUE(IntSet.contains(C));
}
// Verify that we automatically get the const version of PointerLikeTypeTraits
@@ -327,7 +327,7 @@ TEST(SmallPtrSetTest, ConstNonPtrTest) {
TestPair Pair(&A[0], 1);
IntSet.insert(Pair);
EXPECT_EQ(IntSet.count(Pair), 1u);
- EXPECT_NE(IntSet.find(Pair), IntSet.end());
+ EXPECT_TRUE(IntSet.contains(Pair));
}
// Test equality comparison.
@@ -367,3 +367,31 @@ TEST(SmallPtrSetTest, EqualityComparison) {
EXPECT_NE(c, e);
EXPECT_NE(e, d);
}
+
+TEST(SmallPtrSetTest, Contains) {
+ SmallPtrSet<int *, 4> Set;
+ int buf[4] = {0, 11, 22, 11};
+ EXPECT_FALSE(Set.contains(&buf[0]));
+ EXPECT_FALSE(Set.contains(&buf[1]));
+
+ Set.insert(&buf[0]);
+ Set.insert(&buf[1]);
+ EXPECT_TRUE(Set.contains(&buf[0]));
+ EXPECT_TRUE(Set.contains(&buf[1]));
+ EXPECT_FALSE(Set.contains(&buf[3]));
+
+ Set.insert(&buf[1]);
+ EXPECT_TRUE(Set.contains(&buf[0]));
+ EXPECT_TRUE(Set.contains(&buf[1]));
+ EXPECT_FALSE(Set.contains(&buf[3]));
+
+ Set.erase(&buf[1]);
+ EXPECT_TRUE(Set.contains(&buf[0]));
+ EXPECT_FALSE(Set.contains(&buf[1]));
+
+ Set.insert(&buf[1]);
+ Set.insert(&buf[2]);
+ EXPECT_TRUE(Set.contains(&buf[0]));
+ EXPECT_TRUE(Set.contains(&buf[1]));
+ EXPECT_TRUE(Set.contains(&buf[2]));
+}
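The unit-test updates above track the new `SmallPtrSet::contains` accessor, which expresses membership directly instead of the older `find(...) != end()` idiom. A minimal usage sketch:

```cpp
#include "llvm/ADT/SmallPtrSet.h"

bool hasPtr(const llvm::SmallPtrSet<int *, 4> &Set, int *P) {
  return Set.contains(P); // same answer as Set.find(P) != Set.end()
}
```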
diff --git a/llvm/unittests/IR/LegacyPassManagerTest.cpp b/llvm/unittests/IR/LegacyPassManagerTest.cpp
index 8dda94b1b0326..b7801b52481dd 100644
--- a/llvm/unittests/IR/LegacyPassManagerTest.cpp
+++ b/llvm/unittests/IR/LegacyPassManagerTest.cpp
@@ -680,7 +680,7 @@ namespace llvm {
ASSERT_EQ(M->getFunctionList().size(), 4U);
Function *F = M->getFunction("test2");
Function *SF = splitSimpleFunction(*F);
- CallInst::Create(F, "", &*SF->getEntryBlock().getFirstInsertionPt());
+ CallInst::Create(F, "", &SF->getEntryBlock());
ASSERT_EQ(M->getFunctionList().size(), 5U);
CGModifierPass *P = new CGModifierPass();
legacy::PassManager Passes;
diff --git a/openmp/docs/ReleaseNotes.rst b/openmp/docs/ReleaseNotes.rst
index b7f2ec42277e3..e09ef5f5b6382 100644
--- a/openmp/docs/ReleaseNotes.rst
+++ b/openmp/docs/ReleaseNotes.rst
@@ -5,11 +5,6 @@ openmp 11.0.0 Release Notes
.. contents::
:local:
-.. warning::
- These are in-progress notes for the upcoming LLVM 11.0.0 release.
- Release notes for previous releases can be found on
- `the Download Page |