Commit f6fbc0d2 authored by Jennifer Schmitz

SVE intrinsics: Fold svsra with op1 all zeros to svlsr/svasr.


A common idiom in intrinsics loops is an unrolled loop of accumulate
intrinsics whose accumulator is initialized to zero at the start.
Propagating that initial zero into the first iteration of the loop
and simplifying the first accumulate instruction is a desirable
transformation that we should teach GCC.
This patch therefore folds svsra to svlsr/svasr when op1 is all zeros,
producing the lower-latency LSR/ASR instructions instead of USRA/SSRA.
The optimization is implemented in svsra_impl::fold.
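For reference, the core of the fold looks roughly like the condensed
sketch below. Helper names such as redirect_call and gp_type follow the
SVE builtins framework; treat the details as illustrative rather than
the verbatim committed code:

gimple *
svsra_impl::fold (gimple_folder &f) const
{
  /* Only fold if the accumulator (op1) is a zero constant.  */
  tree op1 = gimple_call_arg (f.call, 0);
  if (!integer_zerop (op1))
    return NULL;
  /* Redirect to svlsr for unsigned element types, svasr for signed.  */
  function_instance instance ("svlsr", functions::svlsr,
			      shapes::binary_uint_opt_n, MODE_n,
			      f.type_suffix_ids, GROUP_none, PRED_x);
  if (!f.type_suffix (0).unsigned_p)
    {
      instance.base_name = "svasr";
      instance.base = functions::svasr;
    }
  gcall *call = f.redirect_call (instance);
  /* Unlike svsra, svlsr/svasr are predicated, so supply an all-true
     governing predicate.  The shift amount may also need retyping:
     svsra takes a uint64_t immediate, whereas svlsr/svasr take a
     scalar of the element width.  */
  gimple_call_set_arg (call, 0, build_all_ones_cst (f.gp_type ()));
  return call;
}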

Tests were added to check the produced assembly for use of LSR/ASR.
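The checks follow the existing style of the asm tests in that
directory. For illustration, a case along the following lines (using
the suite's TEST_UNIFORM_Z macro; the test name here is made up)
expects a plain LSR when the first operand is zero:

/*
** sra_2_u32_zeroop1:
**	lsr	z0\.s, z1\.s, #2
**	ret
*/
TEST_UNIFORM_Z (sra_2_u32_zeroop1, svuint32_t,
		z0 = svsra_n_u32 (svdup_u32 (0), z1, 2),
		z0 = svsra (svdup_u32 (0), z1, 2))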

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/
	* config/aarch64/aarch64-sve-builtins-sve2.cc
	(svsra_impl::fold): Fold svsra to svlsr/svasr if op1 is all zeros.

gcc/testsuite/
	* gcc.target/aarch64/sve2/acle/asm/sra_s32.c: New test.
	* gcc.target/aarch64/sve2/acle/asm/sra_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/sra_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/sra_u64.c: Likewise.