Commit 74caf975 authored by Christophe Lyon

arm: [MVE intrinsics] Improve vdupq_n implementation

This patch makes the non-predicated vdupq_n MVE intrinsics use
vec_duplicate rather than an unspec.  This enables the compiler to
generate better code sequences (for instance using vmov when
possible).

The patch renames the existing mve_vdup<mode> pattern into
@mve_vdupq_n<mode>, and removes the now useless
@mve_<mve_insn>q_n_f<mode> and @mve_<mve_insn>q_n_<supf><mode> ones.

As a side-effect, it needs to update the mve_unpredicated_insn
attribute in @mve_<mve_insn>q_m_n_<supf><mode> and
@mve_<mve_insn>q_m_n_f<mode>.

Using vec_duplicate means the compiler is now able to use vmov in the
tests with an immediate argument in vdupq_n_[su]{8,16,32}.c:
	vmov.i8 q0,#0x1

However, this is only possible when the immediate has a suitable value
(MVE encoding constraints, see imm_for_neon_mov_operand predicate).
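To make the constraint concrete, here is a minimal sketch in C of a
simplified check for a 16-bit element value.  It is not GCC's
imm_for_neon_mov_operand predicate: the helper name is hypothetical, and
it only models the vmov.i8 and vmov.i16 immediate forms, ignoring other
encodings (such as inverted immediates) that the real predicate may also
accept.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of GCC): return true when a 16-bit
   element value, replicated across the vector, can be written with a
   vmov.i8 or vmov.i16 immediate.  Simplified model only.  */
static bool
splat16_fits_vmov_i8_or_i16 (uint16_t v)
{
  uint8_t lo = (uint8_t) (v & 0xff);
  uint8_t hi = (uint8_t) (v >> 8);

  /* Both bytes equal: the vector is a byte splat, so vmov.i8 #lo works.  */
  if (lo == hi)
    return true;

  /* vmov.i16 takes an 8-bit payload placed in either byte of the
     element: 0x00XY or 0xXY00.  */
  if (hi == 0 || lo == 0)
    return true;

  return false;
}
```

Under this model 0x0001 and 0x0100 are accepted, while 0x1234 is
rejected and would need the mov + vdup sequence instead.  An 8-bit
element value always fits, since vmov.i8 takes any 8-bit immediate.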

Provided we adjust the cost computations in arm_rtx_costs_internal(),
when the immediate does not meet the vmov constraints, we now generate:
	mov r0, #imm
	vdup.xx q0,r0

or
	ldr r0, .L4
	vdup.32 q0,r0
in the f32 case (with 1.1 as immediate).

Without the cost adjustment, we would generate:
	vldr.64	d0, .L4
	vldr.64	d1, .L4+8
and an associated literal pool entry.

Regarding the testsuite updates:
--------------------------------
* The signed versions of vdupq_* tests lack a version with an
immediate argument.  This patch adds them, similar to what we already
have for vdupq_n_u*.c tests.

* Code generation for different immediate values is checked with the
new tests this patch introduces.  Note there's no need for s8/u8 tests
because 8-bit immediates always comply with imm_for_neon_mov_operand.

* We can remove xfail from vcmp*f tests since we now generate:
	movw r3, #15462
	vcmp.f16 eq, q0, r3
instead of the previous:
	vldr.64 d6, .L5
	vldr.64 d7, .L5+8
	vcmp.f16 eq, q0, q3
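
The #15462 loaded by movw above is 0x3c66.  Assuming the vcmp*f tests
compare against 1.1 (as the f32 vdup case above does), this is the IEEE
binary16 bit pattern of 1.1.  The sketch below shows the conversion; the
helper name is hypothetical and it only handles values that are normal
numbers in half precision (no zero, subnormal, infinity, NaN or overflow
handling).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert a float to its IEEE binary16 bit
   pattern with round-to-nearest-even.  Only valid for inputs that are
   normal numbers in half precision.  */
static uint16_t
float_to_half_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);

  uint32_t sign = (u >> 16) & 0x8000u;
  /* Re-bias the exponent from binary32 (127) to binary16 (15).  */
  uint32_t exp = ((u >> 23) & 0xffu) - 127u + 15u;
  uint32_t mant = u & 0x7fffffu;

  /* Keep the top 10 mantissa bits; round on the 13 dropped bits.  */
  uint32_t half = (exp << 10) | (mant >> 13);
  uint32_t rem = mant & 0x1fffu;
  if (rem > 0x1000u || (rem == 0x1000u && (half & 1u)))
    half++;	/* A carry out of the mantissa bumps the exponent.  */

  return (uint16_t) (sign | half);
}
```

With this helper, float_to_half_bits (1.1f) yields 0x3c66, i.e. 15462,
which fits in a single movw.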

Tested on arm-linux-gnueabihf and arm-none-eabi with no regression.

2024-07-02  Jolen Li  <jolen.li@arm.com>
	    Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	* config/arm/arm-mve-builtins-base.cc (vdupq_impl): New class.
	(vdupq): Use new implementation.
	* config/arm/arm.cc (arm_rtx_costs_internal): Handle HFmode
	for COST_DOUBLE. Update costing for CONST_VECTOR.
	* config/arm/arm_mve_builtins.def: Merge vdupq_n_f, vdupq_n_s
	and vdupq_n_u into vdupq_n.
	* config/arm/mve.md (mve_vdup<mode>): Rename into ...
	(@mve_vdupq_n<mode>): ... this.
	(@mve_<mve_insn>q_n_f<mode>): Delete.
	(@mve_<mve_insn>q_n_<supf><mode>): Delete.
	(@mve_<mve_insn>q_m_n_<supf><mode>): Update mve_unpredicated_insn
	attribute.
	(@mve_<mve_insn>q_m_n_f<mode>): Likewise.

	gcc/testsuite/
	* gcc.target/arm/mve/intrinsics/vdupq_n_u8.c (foo1): Update
	expected code.
	* gcc.target/arm/mve/intrinsics/vdupq_n_u16.c (foo1): Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_u32.c (foo1): Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_s8.c: Add test with
	immediate argument.
	* gcc.target/arm/mve/intrinsics/vdupq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_f16.c (foo1): Update
	expected code.
	* gcc.target/arm/mve/intrinsics/vdupq_n_f32.c (foo1): Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c: Add test with
	immediate argument.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_f32-2.c: New test.
	* gcc.target/arm/mve/intrinsics/vdupq_n_s16-2.c: New test.
	* gcc.target/arm/mve/intrinsics/vdupq_n_s32-2.c: New test.
	* gcc.target/arm/mve/intrinsics/vdupq_n_u16-2.c: New test.
	* gcc.target/arm/mve/intrinsics/vdupq_n_u32-2.c: New test.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c: Remove xfail.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c: Likewise.
parent 79dae328
Showing changes with 146 additions and 67 deletions