
AMDGPU should not scalarize v2f16 / v2bf16 copysign #141931

@arsenm

Currently, copysign on two-element vectors of half is scalarized and produces this ugly expansion:

; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s

; s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; s_movk_i32 s4, 0x7fff
; v_bfi_b32 v2, s4, v0, v1
; v_lshrrev_b32_e32 v1, 16, v1
; v_lshrrev_b32_e32 v0, 16, v0
; v_bfi_b32 v0, s4, v0, v1
; s_mov_b32 s4, 0x5040100
; v_perm_b32 v0, v0, v2, s4
; s_setpc_b64 s[30:31]
define <2 x half> @copysign_v2f16(<2 x half> %a, <2 x half> %b) {
  %result = call <2 x half> @llvm.copysign.v2f16(<2 x half> %a, <2 x half> %b)
  ret <2 x half> %result
}

If I hack up the vector legalizer's logic, the default expansion finds a vector BFI:

With gfx803:

	s_mov_b32 s4, 0x7fff7fff
	v_bfi_b32 v0, s4, v0, v1

With gfx9+, it does worse:

	v_and_b32_e32 v1, 0x80008000, v1
	s_mov_b32 s4, 0x7fff7fff
	v_and_or_b32 v0, v0, s4, v1
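To make the comparison concrete, here is a minimal Python sketch of the bit manipulation involved. It assumes `v_bfi_b32 D, S0, S1, S2` computes `(S0 & S1) | (~S0 & S2)`, and the helper names are made up for illustration. It models the packed BFI form, the gfx9 and/and-or form, and the scalarized per-lane form, and checks that all three agree on a sample input.

```python
# Sketch only: models the 32-bit register contents of a packed <2 x half>.
MAGNITUDE_MASK = 0x7FFF7FFF  # low 15 bits of each 16-bit lane
SIGN_MASK = 0x80008000       # bit 15 of each 16-bit lane


def bfi(mask, src0, src1):
    """Bitwise select (assumed v_bfi_b32 semantics): src0 where mask is 1, else src1."""
    return ((mask & src0) | (~mask & src1)) & 0xFFFFFFFF


def copysign_v2f16_bfi(mag, sign):
    """Single packed BFI, as in the gfx803 output."""
    return bfi(MAGNITUDE_MASK, mag, sign)


def copysign_v2f16_and_or(mag, sign):
    """v_and_b32 followed by v_and_or_b32, as in the gfx9+ output."""
    masked_sign = sign & SIGN_MASK
    return ((mag & MAGNITUDE_MASK) | masked_sign) & 0xFFFFFFFF


def copysign_v2f16_scalarized(mag, sign):
    """Per-lane copysign followed by a repack, as in the current expansion."""
    lo = bfi(0x7FFF, mag & 0xFFFF, sign & 0xFFFF)
    hi = bfi(0x7FFF, mag >> 16, sign >> 16) & 0xFFFF
    return (hi << 16) | lo


# Lanes: lo = 1.0 (0x3C00), hi = -2.0 (0xC000); signs: lo = -0.0, hi = +0.0.
mag, sign = 0xC0003C00, 0x00008000
result = copysign_v2f16_bfi(mag, sign)
```

For this input, `result` holds -1.0 in the low lane and 2.0 in the high lane, and the other two helpers return the same bits.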

We can trivially extend the existing legal f16 copysign pattern to handle the 2-element case, matching the gfx8 output. Supporting the cases where the sign source is a different FP type takes a little more work.
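As a rough illustration of why the mixed-type case needs more than a single mask, the Python sketch below takes the sign from two f32 values while the magnitude is a packed pair of halves. The function names and the choice of f32 as the sign type are assumptions made for the example; this is not the lowering the backend would necessarily pick. The point is that the sign bit sits at bit 31 of each f32, so the low lane's sign has to be shifted into bit 15 before it can be merged.

```python
import struct


def f32_bits(x):
    """Raw IEEE-754 binary32 encoding of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def f16_bits(x):
    """Raw IEEE-754 binary16 encoding of x as an unsigned integer."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def pack_v2f16(lo, hi):
    """Pack two halves into one 32-bit value, element 0 in the low 16 bits."""
    return (f16_bits(hi) << 16) | f16_bits(lo)


def copysign_v2f16_from_v2f32(mag_packed, sign_lo_bits, sign_hi_bits):
    """Hypothetical helper: magnitude from packed halves, signs from two f32 values."""
    lo_sign = (sign_lo_bits >> 16) & 0x8000  # move f32 bit 31 down to bit 15
    hi_sign = sign_hi_bits & 0x80000000      # f32 bit 31 already lines up with the high lane
    return (mag_packed & 0x7FFF7FFF) | hi_sign | lo_sign


mag = pack_v2f16(1.0, 2.0)
mixed = copysign_v2f16_from_v2f32(mag, f32_bits(-5.0), f32_bits(3.0))
```

Here `mixed` encodes -1.0 in the low lane and 2.0 in the high lane. Compared with the same-type case, each lane needs its own sign extraction step before the merge.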
