SVE intrinsics: Fold division and multiplication by -1 to neg
Because a neg instruction has lower latency and higher throughput than
sdiv and mul, svdiv and svmul by -1 can be folded to svneg. For svdiv,
this is already implemented at the RTL level; for svmul, the
optimization was still missing.
This patch implements folding to svneg for both operations using the
gimple_folder. For svdiv, the transform is applied if the divisor is -1.
svmul is folded if either operand is -1. The fold distinguishes between
the predication types because svneg_m takes 3 arguments (argument 0 holds
the values for the inactive lanes), while svneg_x and svneg_z take only
2 arguments.
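The argument handling can be illustrated with a scalar model. This is only a sketch in portable C, not the code in aarch64-sve-builtins-base.cc: the names model_mul_m, model_neg_m, model_mul_z, model_neg_z, check_folds and the fixed LANES count are hypothetical. It assumes the usual ACLE semantics that inactive lanes of svmul_m take the value of the first operand and that inactive lanes of the _z forms are zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { LANES = 4 };  /* Hypothetical fixed lane count for the model.  */

/* Wrapping arithmetic via unsigned types, so that INT32_MIN does not
   trigger signed-overflow undefined behaviour in the model.  */
static int32_t wrap_neg (int32_t v) { return (int32_t) (0u - (uint32_t) v); }
static int32_t wrap_mul (int32_t a, int32_t b)
{
  return (int32_t) ((uint32_t) a * (uint32_t) b);
}

/* Model of svmul_m: inactive lanes keep the first operand.  */
static void model_mul_m (int32_t *out, const bool *pg,
                         const int32_t *op1, const int32_t *op2)
{
  for (int i = 0; i < LANES; i++)
    out[i] = pg[i] ? wrap_mul (op1[i], op2[i]) : op1[i];
}

/* Model of svneg_m: 3 arguments, argument 0 supplies the inactive lanes.  */
static void model_neg_m (int32_t *out, const int32_t *inactive,
                         const bool *pg, const int32_t *op)
{
  for (int i = 0; i < LANES; i++)
    out[i] = pg[i] ? wrap_neg (op[i]) : inactive[i];
}

/* Models of svmul_z and svneg_z: inactive lanes are zero.  */
static void model_mul_z (int32_t *out, const bool *pg,
                         const int32_t *op1, const int32_t *op2)
{
  for (int i = 0; i < LANES; i++)
    out[i] = pg[i] ? wrap_mul (op1[i], op2[i]) : 0;
}

static void model_neg_z (int32_t *out, const bool *pg, const int32_t *op)
{
  for (int i = 0; i < LANES; i++)
    out[i] = pg[i] ? wrap_neg (op[i]) : 0;
}

/* Check that each multiplication by -1 matches the negation it would be
   folded to in this model.  Returns 1 if all checks pass.  */
int check_folds (void)
{
  const bool pg[LANES] = { true, false, true, false };
  const int32_t x[LANES] = { 5, -7, INT32_MIN, 9 };
  const int32_t m1[LANES] = { -1, -1, -1, -1 };
  int32_t lhs[LANES], rhs[LANES];

  /* mul_m (pg, x, -1) == neg_m (x, pg, x).  */
  model_mul_m (lhs, pg, x, m1);
  model_neg_m (rhs, x, pg, x);
  assert (memcmp (lhs, rhs, sizeof lhs) == 0);

  /* mul_m (pg, -1, x) == neg_m (-1, pg, x): the first operand still
     provides the inactive lanes.  */
  model_mul_m (lhs, pg, m1, x);
  model_neg_m (rhs, m1, pg, x);
  assert (memcmp (lhs, rhs, sizeof lhs) == 0);

  /* mul_z (pg, x, -1) == neg_z (pg, x): only 2 arguments are needed.  */
  model_mul_z (lhs, pg, x, m1);
  model_neg_z (rhs, pg, x);
  assert (memcmp (lhs, rhs, sizeof lhs) == 0);

  return 1;
}
```

In this model, division by -1 with _m predication would follow the first pattern, since only the divisor can be the -1 operand. Scalar division is left out of the sketch because INT32_MIN / -1 is undefined in C.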
Tests were added or adjusted to check the produced assembly and runtime
tests were added to check correctness.
The patch was bootstrapped and regtested on aarch64-linux-gnu with no regressions.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
Fold division by -1 to svneg.
(svmul_impl::fold): Fold multiplication by -1 to svneg.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/div_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/mul_s16.c: Adjust expected outcome.
* gcc.target/aarch64/sve/acle/asm/mul_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/mul_s64.c: Adjust expected outcome.
* gcc.target/aarch64/sve/acle/asm/mul_s8.c: Likewise.
* gcc.target/aarch64/sve/div_const_run.c: New test.
* gcc.target/aarch64/sve/mul_const_run.c: Likewise.
Changed files:
- gcc/config/aarch64/aarch64-sve-builtins-base.cc: 62 additions, 11 deletions
- gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c: 59 additions, 0 deletions
- gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s16.c: 2 additions, 3 deletions
- gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s32.c: 43 additions, 3 deletions
- gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s64.c: 2 additions, 3 deletions
- gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s8.c: 3 additions, 4 deletions
- gcc/testsuite/gcc.target/aarch64/sve/div_const_run.c: 8 additions, 2 deletions
- gcc/testsuite/gcc.target/aarch64/sve/mul_const_run.c: 8 additions, 2 deletions