Description
| Field | Value |
| --- | --- |
| Bugzilla Link | 47729 |
| Version | trunk |
| OS | Solaris |
| CC | @efriedma-quic,@jrtc27,@jyknight,@jfbastien |
Extended Description
Several tests FAIL on Solaris/sparcv9 where long double is 128 bits:
Builtins-sparcv9-sunos :: addtf3_test.c
Builtins-sparcv9-sunos :: divtf3_test.c
Builtins-sparcv9-sunos :: extenddftf2_test.c
Builtins-sparcv9-sunos :: extendsftf2_test.c
Builtins-sparcv9-sunos :: floatditf_test.c
Builtins-sparcv9-sunos :: floatsitf_test.c
Builtins-sparcv9-sunos :: floattitf_test.c
Builtins-sparcv9-sunos :: floatunditf_test.c
Builtins-sparcv9-sunos :: floatunsitf_test.c
Builtins-sparcv9-sunos :: floatuntitf_test.c
Builtins-sparcv9-sunos :: multf3_test.c
Builtins-sparcv9-sunos :: subtf3_test.c
E.g. addtf3_test.c FAILs with
error in test__addtf3(36.40888825164657541977, 0.96444431369742592240) = 37.37333256534401470898, expected 37.37333256534400134216
The failure occurs neither in a 1-stage build with gcc nor in a Debug build.
Side-by-side debugging of addtf3.c.o compiled with clang -O vs. gcc -O
(everything else from a regular 2-stage clang build) showed that both
compilers produce the same result until the very end of __addtf3. The only
difference is in the final fromRep call, which can be reproduced with this
testcase:
$ cat fr.c
typedef long double fp_t;
typedef __uint128_t rep_t;
fp_t fromRep(rep_t x) {
const union {
fp_t f;
rep_t i;
} rep = {.i = x};
return rep.f;
}
gcc -m64 -O produces
fromRep:
add %sp, -144, %sp
stx %o0, [%sp+2175]
stx %o1, [%sp+2183]
ldd [%sp+2175], %f0
ldd [%sp+2183], %f2
jmp %o7+8
add %sp, 144, %sp
while clang yields
fromRep: ! @fromRep
! %bb.0: ! %entry
save %sp, -144, %sp
add %fp, 2031, %i2
or %i2, 8, %i2
stx %i0, [%fp+2031]
ldd [%fp+2031], %f0
ldd [%i2], %f2
stx %i1, [%i2]
ret
restore
The long double return value is supposed to be in %f0 and %f2. gcc handles
this just fine, and clang gets it right for %f0, too. However, clang loads
%f2 from a stack slot that has not been written yet (ldd [%i2], %f2) and only
afterwards stores the second half of the argument (%i1) there (stx %i1, [%i2]),
so %f2 ends up with uninitialized stack contents.
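The load/store ordering problem can also be checked at run time rather than by reading the assembly. What follows is only a minimal sketch, not part of the compiler-rt test suite: fromRep is copied from fr.c, while toRepViaMemcpy, roundTrips and the noinline attribute are additions made up for illustration. It assumes a target where __uint128_t exists and long double is no wider than 128 bits.

```c
#include <string.h>

typedef long double fp_t;
typedef __uint128_t rep_t;

/* Same body as in fr.c. The noinline attribute (a GCC/Clang extension) is an
   addition here so that the value really travels through the return
   registers instead of being folded into the caller. */
__attribute__((noinline)) fp_t fromRep(rep_t x) {
    const union {
        fp_t f;
        rep_t i;
    } rep = {.i = x};
    return rep.f;
}

/* Hypothetical helper: builds the representation with memcpy, so it does not
   depend on the union code path under test. */
static rep_t toRepViaMemcpy(fp_t v) {
    rep_t r = 0;
    memcpy(&r, &v, sizeof v);
    return r;
}

/* Returns 1 if v comes back from fromRep unchanged, 0 otherwise. */
int roundTrips(fp_t v) {
    return fromRep(toRepViaMemcpy(v)) == v;
}
```

With a correct compiler, roundTrips should return 1 for any non-NaN value, e.g. the expected sum 37.37333256534400134216L from the failing test. If the second half of the value is picked up from an unwritten stack slot as described above, the comparison would be expected to fail for values whose low 64 bits are significant, although whether it actually triggers depends on what happens to be in that slot.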
I don't have the slightest idea how to fix this codegen bug, but I have a
workaround patch (to be posted for reference shortly) that wraps the affected
functions in #pragma clang optimize off/on (nothing more than a hack to show
that this fixes all the failures above).
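For illustration, the shape of such a workaround applied to the fr.c testcase might look like the sketch below. This is not the actual patch; which functions get wrapped there is not shown here, and fromRep is used only because it is the function from the reduced testcase. #pragma clang optimize off/on is Clang-specific; other compilers are expected to ignore it, possibly with a warning.

```c
typedef long double fp_t;
typedef __uint128_t rep_t;

/* Clang only: functions defined between "off" and "on" are compiled without
   optimization, which avoids the optimized code path shown above. */
#pragma clang optimize off
fp_t fromRep(rep_t x) {
    const union {
        fp_t f;
        rep_t i;
    } rep = {.i = x};
    return rep.f;
}
#pragma clang optimize on
```

This only hides the miscompilation for the wrapped functions; other code that returns a long double built the same way would presumably still be affected.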