Skip to content

Avx512f #912

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Sep 19, 2020
Merged

Avx512f #912

merged 24 commits into from
Sep 19, 2020

Conversation

minybot
Copy link
Contributor

@minybot minybot commented Sep 17, 2020

knot, kandn, kxornd, kmov
permute: epi32
extractf32x4_ps not mask and maskz
permute_f32x4
permute_f64x2
permute_i32x4
permute_i64x2
moveldup_ps
movehdup_ps
movedup_pd

@rust-highfive
Copy link

r? @Amanieu

(rust_highfive has picked a reviewer for you, use r? to override)

/// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=knot_mask16&expand=3233)
#[inline]
#[target_feature(enable = "avx512f")]
#[cfg_attr(all(test, not(target_os = "macos")), assert_instr(not))] // generate normal not code instead of knotw
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you special-casing macos here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In osX, it does not generate any "not" or "xor" instructions. It generates "vmovaps" tested in CI and osX with clang.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you post the fully assembly output you are getting on macOS? I find it extremely strange that it is behaving differently from Linux.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

osX Rust

#[target_feature(enable = "avx512f")]
unsafe fn avx512() {
let c = _mm512_knot(0b00000000_00000000);
}

fn main() {
unsafe { avx512(); }
}

   .loc    6 10538 0
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register %rbp
    subq    $16, %rsp
    movw    %di, -4(%rbp)

Ltmp21:
.loc 6 10539 15 prologue_end
movzwl %di, %edi
movl $65535, %esi
callq __ZN9core_arch9core_arch3x867avx512f11_mm512_kxor17hbdf54d0540836621E
movw %ax, -6(%rbp)
.loc 6 0 15 is_stmt 0
movw -6(%rbp), %ax
.loc 6 10539 5
movw %ax, -2(%rbp)
movw -2(%rbp), %ax
movw %ax, -8(%rbp)
.loc 6 0 5
movw -8(%rbp), %ax
.loc 6 10540 2 is_stmt 1
addq $16, %rsp
popq %rbp
retq

@minybot
Copy link
Contributor Author

minybot commented Sep 19, 2020

Clang on osX:
There is a knot now.

mask_0 = _mm512_int2mask( (1<<0) | (1<<1) | (1<<2) | (1<<3) | (1<<4) | (1<<5) | (1<<6) | (1<<7) | (1<<8) | (1<<9) | (1<<10) | (1<<11) | (1<<12) | (1<<13) | (1<<14) | (1<<15) );

mask_2 = _mm512_knot(mask_0);

%bb.0:

    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register %rbp
    andq    $-64, %rsp
    subq    $1152, %rsp             ## imm = 0x480
    xorl    %eax, %eax
    movl    $65535, 1128(%rsp)      ## imm = 0xFFFF
    movw    1128(%rsp), %cx
    movw    %cx, 30(%rsp)
    movw    30(%rsp), %cx
    movw    %cx, 1134(%rsp)
    kmovw   1134(%rsp), %k0
    knotw   %k0, %k0
    kmovw   %k0, %edx
                                    ## kill: def $dx killed $dx killed $edx
    movw    %dx, 26(%rsp)
    movq    %rbp, %rsp
    popq    %rbp
    retq
    .cfi_endproc
                                    ## -- End function

.subsections_via_symbols

@Amanieu
Copy link
Member

Amanieu commented Sep 19, 2020

For the k{not,and,xor, etc} function I would prefer if we used simple bitwise operations on integers rather than intrinsics. As a bonus it would allow the compiler to optimize these functions better.

We could then remove the mostly useless assert_instr on these intrinsics.

@minybot
Copy link
Contributor Author

minybot commented Sep 19, 2020

For the k{not,and,xor, etc} function I would prefer if we used simple bitwise operations on integers rather than intrinsics. As a bonus it would allow the compiler to optimize these functions better.

We could then remove the mostly useless assert_instr on these intrinsics.
Ok. I will update a new version use bitwise operations.

@Amanieu
Copy link
Member

Amanieu commented Sep 19, 2020

knot doesn't use a not instruction because it only inverts a 16-bit value. It uses xor 0xffff instead.

@minybot
Copy link
Contributor Author

minybot commented Sep 19, 2020

knot doesn't use a not instruction because it only inverts a 16-bit value. It uses xor 0xffff instead.
Yes, I just remove assert_instr on these intrinsics?

@Amanieu
Copy link
Member

Amanieu commented Sep 19, 2020

You can change it to xor. CI will complain if you have an intrinsic without assert_instr.

@@ -10524,7 +10524,7 @@ pub unsafe fn _mm512_kxor(a: __mmask16, b: __mmask16) -> __mmask16 {
/// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=knot_mask16&expand=3233)
#[inline]
#[target_feature(enable = "avx512f")]
#[cfg_attr(test, assert_instr(not))] // generate normal not code instead of knotw
//#[cfg_attr(test, assert_instr(xor))] // generate normal not code instead of knotw
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't leave commented-out code. Either keep the line with xor or delete it entirely.

@minybot
Copy link
Contributor Author

minybot commented Sep 19, 2020

For osX, it uses xor. For Linux, it uses not. So, I remove assert_instr test for _mm512_not?

@Amanieu
Copy link
Member

Amanieu commented Sep 19, 2020

Yes, in that case just remove the assert_instr. I think this happens because the darwin target optimizes for the core2 CPU while other targets optimize for the generic cpu.

@Amanieu Amanieu merged commit 20f9e2f into rust-lang:master Sep 19, 2020
@minybot minybot deleted the avx512f branch September 19, 2020 23:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants