AMDGPUTargetStreamer generates .kd symbols, breaking LTO requirement, may be discarded by --gc-sections #119479

@benvanik

The caching added in 3733ed6 by @MaskRay seems to have broken LTO and --gc-sections for certain use cases. Specifically, the change in lld/ELF/MarkLive.cpp to use the cached isExported value instead of calling includeInDynsym seems to cause sections to be dropped that should not be (or at least were not before the change): 3733ed6#diff-3c88c62d912008cc04f796b330a035ecda925645264eaef43185ad43991cb8e9L224

The AMDGPU target inserts special kernel descriptor object symbols that must be preserved in the final ELF for the runtime to load it. Each one has the name of an exported kernel plus a .kd suffix and is emitted by AMDGPUTargetELFStreamer::EmitAmdhsaKernelDescriptor. Prior to the referenced commit these symbols were present in the output; after it they are missing.

Reverting the mentioned line in MarkLive.cpp restores the original behavior. I'm not familiar with the codebase, but I suspect isExported is not yet initialized, or not safe to cache, at that location.

The following repro shows the issue (lld_lto_bug.c):

[[clang::amdgpu_kernel, gnu::visibility("protected")]] void some_kernel(int n) {
  //
}

compiled using

$ clang \
  -x c -std=c23 \
  -target amdgcn-amd-amdhsa -march=gfx1100 \
  -nogpulib \
  -fgpu-rdc \
  -fno-ident \
  -fvisibility=hidden \
  -O3 \
  lld_lto_bug.c \
  -c -emit-llvm -o lld_lto_bug.bc

or, since .bc files cannot be attached, the equivalent IR:

; ModuleID = 'lld_lto_bug.bc'
source_filename = "lld_lto_bug.c"
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
target triple = "amdgcn-amd-amdhsa"

@__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 500

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
define protected amdgpu_kernel void @some_kernel(i32 noundef %n) local_unnamed_addr #0 {
entry:
  ret void
}

attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx1100" "target-features"="+16-bit-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot10-insts,+dot12-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32" "uniform-work-group-size"="false" }

!llvm.module.flags = !{!0, !1, !2}

!0 = !{i32 1, !"amdhsa_code_object_version", i32 500}
!1 = !{i32 1, !"wchar_size", i32 4}
!2 = !{i32 8, !"PIC Level", i32 2}

Linking with LTO and gc-sections:

lld \
  -flavor gnu \
  -m elf64_amdgpu \
  -shared \
  -plugin-opt=mcpu=gfx1100 \
  -plugin-opt=O3 \
  --lto-CGO3 \
  --gc-sections \
  --print-gc-sections \
  --strip-debug \
  --discard-all \
  --discard-locals \
  -o lld_lto_bug.so \
  lld_lto_bug.bc

Before the commit, this prints the expected output (.rodata is not removed):

removing unused section lld_lto_bug_patched.so.lto.o:(.text)

After the commit, .rodata is removed as well, which is the regression:

removing unused section lld_lto_bug.so.lto.o:(.text)
removing unused section lld_lto_bug.so.lto.o:(.rodata)

This can be verified with llvm-readelf. Before the commit, .dynsym contains both the kernel and its descriptor:

Symbol table '.dynsym' contains 3 entries:
   Num:    Value          Size Type    Bind   Vis       Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   UND 
     1: 0000000000001500     4 FUNC    GLOBAL PROTECTED   7 some_kernel
     2: 0000000000000480    64 OBJECT  GLOBAL PROTECTED   6 some_kernel.kd

The some_kernel.kd OBJECT is what is required at runtime to use the ELF.

And after the commit, the some_kernel.kd entry is gone:

Symbol table '.dynsym' contains 2 entries:
   Num:    Value          Size Type    Bind   Vis       Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   UND 
     1: 0000000000001500     4 FUNC    GLOBAL PROTECTED   6 some_kernel
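
Both symptoms above could be checked mechanically, for example in a regression script. The following is only a minimal sketch: the function names rodata_discarded and missing_kd are hypothetical, and it assumes that every FUNC entry in the symbol table is a kernel that needs a matching .kd OBJECT, which holds for this repro but may not hold in general.

```shell
#!/bin/sh
# Hypothetical helpers; neither is part of lld or LLVM.

# Reads `--print-gc-sections` output on stdin.
# Exits 0 if any .rodata section was reported as removed.
rodata_discarded() {
  grep -q 'removing unused section .*:(\.rodata)$'
}

# Reads llvm-readelf symbol-table text on stdin and prints the name of
# every expected "<kernel>.kd" descriptor that is not in the table.
# Assumption: every FUNC symbol listed is a kernel entry point.
missing_kd() {
  awk '
    $4 == "FUNC"   { funcs[$NF] = 1 }   # e.g. some_kernel
    $4 == "OBJECT" { objs[$NF] = 1 }    # e.g. some_kernel.kd
    END {
      for (name in funcs)
        if (!((name ".kd") in objs))
          print name ".kd"
    }
  '
}
```

The linker output from the command above can be piped into rodata_discarded, and the output of llvm-readelf --dyn-syms lld_lto_bug.so into missing_kd. For the two tables shown here, missing_kd prints nothing for the first and some_kernel.kd for the second.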
