Commit 9b2915d9 authored 5 months ago by Soumya AR

aarch64: Optimise calls to ldexp with SVE FSCALE instruction [PR111733]


This patch uses the FSCALE instruction provided by SVE to implement the
standard ldexp family of functions.

Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
following code:

float
test_ldexpf (float x, int i)
{
	return __builtin_ldexpf (x, i);
}

double
test_ldexp (double x, int i)
{
	return __builtin_ldexp (x, i);
}

GCC Output:

test_ldexpf:
	b ldexpf

test_ldexp:
	b ldexp

Since SVE provides an FSCALE instruction, we can use it to process scalar
floats by moving them into vector registers and issuing FSCALE on them,
similar to how LLVM lowers the ldexp builtin.

New Output:

test_ldexpf:
	fmov	s31, w0
	ptrue	p7.b, vl4
	fscale	z0.s, p7/m, z0.s, z31.s
	ret

test_ldexp:
	sxtw	x0, w0
	ptrue	p7.b, vl8
	fmov	d31, x0
	fscale	z0.d, p7/m, z0.d, z31.d
	ret
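
As a rough cross-check of what both sequences are expected to compute, here is
a minimal sketch in portable C. It is not part of the patch: the helper names
(model_fscale_f32, model_fscale_f64, sign_extend_exponent,
zero_extend_exponent) are hypothetical and only model a single active lane as
x * 2^i. It also illustrates why the double variant widens the int argument
with sxtw: the 64-bit lane is read as a signed integer, so a negative exponent
must keep its sign when widened.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical model of one active single-precision FSCALE lane:
   computes x * 2^scale.  The exponent lane is 32 bits wide, so an
   int argument can be used as-is.  */
static float
model_fscale_f32 (float x, int32_t scale)
{
  return ldexpf (x, scale);
}

/* Widen the exponent the way sxtw does: the sign bit is replicated
   into the upper 32 bits.  */
static int64_t
sign_extend_exponent (int i)
{
  return (int64_t) i;
}

/* What a 64-bit lane would hold if the upper 32 bits were simply
   cleared instead: a negative exponent becomes a huge positive one.  */
static int64_t
zero_extend_exponent (int i)
{
  return (int64_t) (uint32_t) i;
}

/* Hypothetical model of one active double-precision FSCALE lane.
   The int argument is sign-extended first, mirroring the sxtw in the
   new output above.  */
static double
model_fscale_f64 (double x, int i)
{
  int64_t scale = sign_extend_exponent (i);
  return ldexp (x, (int) scale);
}
```

With these helpers, model_fscale_f32 (1.5f, 4) gives 24.0f and
model_fscale_f64 (8.0, -3) gives 1.0, while zero_extend_exponent (-3) yields
4294967293 rather than -3, which is the value a 64-bit lane would see without
the sign extension.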

This is a revision of an earlier patch, and now uses the extended definition of
aarch64_ptrue_reg to generate predicate registers with the appropriate set bits.

The patch was bootstrapped and regtested on aarch64-linux-gnu with no
regressions.
OK for mainline?

Signed-off-by: Soumya AR <soumyaa@nvidia.com>

gcc/ChangeLog:

	PR target/111733
	* config/aarch64/aarch64-sve.md
	(ldexp<mode>3): Added a new pattern to match ldexp calls with scalar
	floating modes and expand to the existing pattern for FSCALE.
	* config/aarch64/iterators.md:
	(SVE_FULL_F_SCALAR): Added an iterator to match all FP SVE modes as well
	as their scalar equivalents.
	(VPRED): Extended the attribute to handle GPF_HF modes.
	* internal-fn.def (LDEXP): Changed macro to incorporate ldexpf16.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/fscale.c: New test.

parent 445d8bb6

Showing with 72 additions and 7 deletions
