arm: [MVE intrinsics] Improve vdupq_n implementation
This patch makes the non-predicated vdupq_n MVE intrinsics use
vec_duplicate rather than an unspec.  This enables the compiler to
generate better code sequences (for instance using vmov when
possible).

The patch renames the existing mve_vdup<mode> pattern into
@mve_vdupq_n<mode>, and removes the now useless
@mve_<mve_insn>q_n_f<mode> and @mve_<mve_insn>q_n_<supf><mode> ones.

As a side-effect, it needs to update the mve_unpredicated_insn
attributes in @mve_<mve_insn>q_m_n_<supf><mode> and
@mve_<mve_insn>q_m_n_f<mode>.

Using vec_duplicate means the compiler is now able to use vmov in the
tests with an immediate argument in vdupq_n_[su]{8,16,32}.c:
	vmov.i8 q0,#0x1

However, this is only possible when the immediate has a suitable value
(MVE encoding constraints, see imm_for_neon_mov_operand predicate).

Provided we adjust the cost computations in arm_rtx_costs_internal(),
when the immediate does not meet the vmov constraints, we now generate:
	mov r0, #imm
	vdup.xx q0,r0

or
	ldr r0, .L4
	vdup.32 q0,r0
in the f32 case (with 1.1 as immediate).

Without the cost adjustment, we would generate:
	vldr.64 d0, .L4
	vldr.64 d1, .L4+8
and an associated literal pool entry.

Regarding the testsuite updates:
--------------------------------
* The signed versions of vdupq_* tests lack a version with an
  immediate argument.  This patch adds them, similar to what we
  already have for vdupq_n_u*.c tests.

* Code generation for different immediate values is checked with the
  new tests this patch introduces.  Note there's no need for s8/u8
  tests because 8-bit immediates always comply with
  imm_for_neon_mov_operand.

* We can remove xfail from vcmp*f tests since we now generate:
	movw r3, #15462
	vcmp.f16 eq, q0, r3
  instead of the previous:
	vldr.64 d6, .L5
	vldr.64 d7, .L5+8
	vcmp.f16 eq, q0, q3

Tested on arm-linux-gnueabihf and arm-none-eabi with no regression.

2024-07-02  Jolen Li  <jolen.li@arm.com>
	    Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	* config/arm/arm-mve-builtins-base.cc (vdupq_impl): New class.
	(vdupq): Use new implementation.
	* config/arm/arm.cc (arm_rtx_costs_internal): Handle HFmode
	for COST_DOUBLE.  Update costing for CONST_VECTOR.
	* config/arm/arm_mve_builtins.def: Merge vdupq_n_f, vdupq_n_s
	and vdupq_n_u into vdupq_n.
	* config/arm/mve.md (mve_vdup<mode>): Rename into ...
	(@mve_vdup_n<mode>): ... this.
	(@mve_<mve_insn>q_n_f<mode>): Delete.
	(@mve_<mve_insn>q_n_<supf><mode>): Delete.
	(@mve_<mve_insn>q_m_n_<supf><mode>): Update mve_unpredicated_insn
	attribute.
	(@mve_<mve_insn>q_m_n_f<mode>): Likewise.

	gcc/testsuite/
	* gcc.target/arm/mve/intrinsics/vdupq_n_u8.c (foo1): Update
	expected code.
	* gcc.target/arm/mve/intrinsics/vdupq_n_u16.c (foo1): Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_u32.c (foo1): Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_s8.c: Add test with
	immediate argument.
	* gcc.target/arm/mve/intrinsics/vdupq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_f16.c (foo1): Update
	expected code.
	* gcc.target/arm/mve/intrinsics/vdupq_n_f32.c (foo1): Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c: Add test with
	immediate argument.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_f32-2.c: New test.
	* gcc.target/arm/mve/intrinsics/vdupq_n_s16-2.c: New test.
	* gcc.target/arm/mve/intrinsics/vdupq_n_s32-2.c: New test.
	* gcc.target/arm/mve/intrinsics/vdupq_n_u16-2.c: New test.
	* gcc.target/arm/mve/intrinsics/vdupq_n_u32-2.c: New test.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c: Remove xfail.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c: Likewise.
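
To illustrate the "suitable value" constraint mentioned in the message, here is a rough, portable C sketch of which replicated lane values a single vector move-immediate could plausibly encode. It is an assumption-laden model based on the Arm modified-immediate forms (I8, I16, I32, I64, F32, plus the inverted I16/I32 forms); it is not GCC's imm_for_neon_mov_operand, and every function name below is hypothetical. GCC's actual predicate may differ in detail for MVE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model only, NOT GCC's imm_for_neon_mov_operand.
   All function names in this block are hypothetical.  */

/* Replicate a lane value of the given width into a 32-bit pattern.  */
static uint32_t splat32(uint32_t value, int lane_bits)
{
    assert(lane_bits == 8 || lane_bits == 16 || lane_bits == 32);
    if (lane_bits == 8)
        return (value & 0xffu) * 0x01010101u;
    if (lane_bits == 16)
        return (value & 0xffffu) * 0x00010001u;
    return value;
}

/* I8 form: all four bytes identical.  */
static bool i8_form(uint32_t p)
{
    return p == (p & 0xffu) * 0x01010101u;
}

/* I16 forms: both halves identical, each 0x00XY or 0xXY00.  */
static bool i16_form(uint32_t p)
{
    uint32_t lo = p & 0xffffu;
    return lo == (p >> 16) && ((lo & 0xff00u) == 0 || (lo & 0x00ffu) == 0);
}

/* I32 forms: a single byte at any position, or 0x0000XYFF / 0x00XYFFFF.  */
static bool i32_form(uint32_t p)
{
    for (int shift = 0; shift < 32; shift += 8)
        if ((p & ~(0xffu << shift)) == 0)
            return true;
    return (p & 0xffff00ffu) == 0x000000ffu
        || (p & 0xff00ffffu) == 0x0000ffffu;
}

/* I64 form: every byte is 0x00 or 0xff.  */
static bool i64_form(uint32_t p)
{
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t b = (p >> shift) & 0xffu;
        if (b != 0x00u && b != 0xffu)
            return false;
    }
    return true;
}

/* F32 form: low 19 bits zero, bits 30..25 are 100000 or 011111.  */
static bool f32_form(uint32_t p)
{
    uint32_t exp_hi = (p >> 25) & 0x3fu;
    return (p & 0x7ffffu) == 0 && (exp_hi == 0x20u || exp_hi == 0x1fu);
}

/* True if the replicated value matches one of the modelled forms,
   either directly or bitwise-inverted (the I16/I32 "move not" forms).  */
bool fits_vector_move_immediate(uint32_t value, int lane_bits)
{
    uint32_t p = splat32(value, lane_bits);
    return i8_form(p) || i16_form(p) || i32_form(p)
        || i64_form(p) || f32_form(p)
        || i16_form(~p) || i32_form(~p);
}
```

Under this model, fits_vector_move_immediate(0x1, 8) holds, in line with the vmov.i8 q0,#0x1 sequence quoted above, whereas 15462 (0x3C66) as a 16-bit lane and 0x3F8CCCCD as a 32-bit lane do not, which is the situation where the message describes a scalar move followed by vdup (or a scalar operand to vcmp).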
Showing
- gcc/config/arm/arm-mve-builtins-base.cc: 54 additions, 1 deletion
- gcc/config/arm/arm.cc: 8 additions, 2 deletions
- gcc/config/arm/arm_mve_builtins.def: 1 addition, 3 deletions
- gcc/config/arm/mve.md: 7 additions, 34 deletions
- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c: 2 additions, 2 deletions
- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c: 2 additions, 2 deletions
- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c: 2 additions, 2 deletions
- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c: 2 additions, 2 deletions
- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c: 1 addition, 1 deletion
- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c: 2 additions, 2 deletions
- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c: 2 additions, 2 deletions
- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c: 2 additions, 2 deletions
- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c: 2 additions, 2 deletions
- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c: 2 additions, 2 deletions
- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c: 2 additions, 2 deletions
- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c: 2 additions, 2 deletions
- gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c: 17 additions, 1 deletion
- gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c: 17 additions, 1 deletion
- gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c: 17 additions, 1 deletion
- gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c: 2 additions, 1 deletion