Description
a bytewise copy of small but non-constant size with non-aliasing src/dest is transformed by LoopIdiomRecognize into an intrinsic memcpy. because the size is non-constant, neither InstCombine nor SelectionDAG transforms the small copy back into an appropriate series of loads and stores, so the intrinsic typically ends up as a call to memcpy. for small copies (<8 bytes, as a fairly unscientific threshold) the library call is much slower than doing the copy with a short loop or inlined instructions. for size-optimized code, at least on x86 targets, the library call is also just larger.
i noticed this in some Rust (godbolt), but it's pretty apparent with restrict arguments in C as well (clang godbolt).
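for reference, here's a minimal sketch of the pattern in C. it's an assumed reconstruction, not the exact source behind the godbolt link, and `copy_bytes` is just a hypothetical name: a plain bytewise loop over `restrict` pointers. per the behavior described above, LoopIdiomRecognize turns this loop into `llvm.memcpy` with size `n`, which then lowers to a `memcpy` call even if every caller only ever passes a small `n`.

```c
#include <stddef.h>

/* Hypothetical repro: bytewise copy between non-aliasing buffers.
   The loop has a non-constant trip count n, so the memcpy that
   LoopIdiomRecognize forms from it also has a non-constant size. */
void copy_bytes(unsigned char *restrict dst,
                const unsigned char *restrict src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

the callers in the case i care about only pass sizes under 8, but nothing in this function tells the optimizer that.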
it seems like handling dynamic-but-small-sized memcpy is just particularly tricky, so maybe there's not much we can do here. i didn't see an existing issue similar to this, at least...
i'm not very familiar with how symbolic information is retained in LLVM. ideally i could write something like `if (Size.isNotConstantButSmallerThan(16))` and decide to insert something better than a memcpy library call, but i can't tell whether the max trip count of the original loop is kept as a hint on the memcpy size later, or whether it's lost entirely because the size is non-constant.
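to make "something better than a memcpy library call" concrete, here's a source-level sketch of one common technique for bounded dynamic sizes: overlapping head/tail copies with constant-size `memcpy`s, which compilers typically lower to plain loads and stores. this is not what LLVM does today; `copy_upto16` and `copy_small_or_call` are hypothetical names, and the `n <= 16` branch stands in for the wished-for "non-constant but smaller than 16" check, which isn't an existing API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes (n <= 16) between non-aliasing
   buffers using only constant-size memcpy calls. Each size class does a
   head copy and a tail copy that may overlap, so every n in the class is
   covered with no loop and no variable-size library call. */
static void copy_upto16(unsigned char *restrict dst,
                        const unsigned char *restrict src, size_t n) {
    if (n >= 8) {             /* 8..16 bytes: two 8-byte moves */
        memcpy(dst, src, 8);
        memcpy(dst + n - 8, src + n - 8, 8);
    } else if (n >= 4) {      /* 4..7 bytes: two 4-byte moves */
        memcpy(dst, src, 4);
        memcpy(dst + n - 4, src + n - 4, 4);
    } else if (n >= 2) {      /* 2..3 bytes: two 2-byte moves */
        memcpy(dst, src, 2);
        memcpy(dst + n - 2, src + n - 2, 2);
    } else if (n == 1) {      /* single byte */
        dst[0] = src[0];
    }                         /* n == 0: nothing to copy */
}

/* Hypothetical dispatch: take the inline path when the size is known to
   be small, fall back to the library call otherwise. */
void copy_small_or_call(unsigned char *restrict dst,
                        const unsigned char *restrict src, size_t n) {
    if (n <= 16)
        copy_upto16(dst, src, n);
    else
        memcpy(dst, src, n);
}
```

the overlapping writes are fine because src and dst don't alias, so bytes written twice get the same value both times. if the upper bound were known at compile time, the runtime `n <= 16` check could go away entirely.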
even then, in some target-specific cases there are specific instruction sequences that are more profitable than a memcpy call; x86 FSRM (fast short `rep movsb`, already handled in x86 SelectionDAG) is the example i know. so i'm not sure it is always profitable to inline a small-but-dynamic-size memcpy as a generic sequence?
i also couldn't figure out whether a non-constant SDValue might still have range information associated with it, so that i could try something in X86SelectionDAGInfo.cpp. did i miss a detail, or is SelectionDAG too late in the pipeline to have range information? maybe an appropriate fix here would be a flag on memcpy that tells later passes we knew its max size is "small"? (and in that case, is "dynamic but with a low upper bound" something LLVM could determine in LoopIdiomRecognize when creating the memcpy in the first place?)
i was hoping to put together a patch to propose too, but as-is i have no idea what an appropriate change would be 😅 hopefully someone has a better idea?