x86: Properly implement AMX-TILE load/store intrinsics
ldtilecfg and sttilecfg take a 512-byte memory block. With _tile_loadconfig implemented as extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _tile_loadconfig (const void *__config) { __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config))); } GCC sees: (parallel [ (asm_operands/v ("ldtilecfg %X0") ("") 0 [(mem/f/c:DI (plus:DI (reg/f:DI 77 virtual-stack-vars) (const_int -64 [0xffffffffffffffc0])) [1 MEM[(const void * *)&tile_data]+0 S8 A128])] [(asm_input:DI ("m"))] (clobber (reg:CC 17 flags))]) and the memory operand size is 1 byte. As the result, the rest of 511 bytes is ignored by GCC. Implement ldtilecfg and sttilecfg intrinsics with a pointer to XImode to honor the 512-byte memory block. gcc/ChangeLog: PR target/114098 * config/i386/amxtileintrin.h (_tile_loadconfig): Use __builtin_ia32_ldtilecfg. (_tile_storeconfig): Use __builtin_ia32_sttilecfg. * config/i386/i386-builtin.def (BDESC): Add __builtin_ia32_ldtilecfg and __builtin_ia32_sttilecfg. * config/i386/i386-expand.cc (ix86_expand_builtin): Handle IX86_BUILTIN_LDTILECFG and IX86_BUILTIN_STTILECFG. * config/i386/i386.md (ldtilecfg): New pattern. (sttilecfg): Likewise. gcc/testsuite/ChangeLog: PR target/114098 * gcc.target/i386/amxtile-4.c: New test.
Showing
- gcc/config/i386/amxtileintrin.h 2 additions, 2 deletionsgcc/config/i386/amxtileintrin.h
- gcc/config/i386/i386-builtin.def 4 additions, 0 deletionsgcc/config/i386/i386-builtin.def
- gcc/config/i386/i386-expand.cc 19 additions, 0 deletionsgcc/config/i386/i386-expand.cc
- gcc/config/i386/i386.md 24 additions, 0 deletionsgcc/config/i386/i386.md
- gcc/testsuite/gcc.target/i386/amxtile-4.c 52 additions, 0 deletionsgcc/testsuite/gcc.target/i386/amxtile-4.c
Loading
Please register or sign in to comment