aarch64: Optimise calls to ldexp with SVE FSCALE instruction [PR111733]
This patch uses the FSCALE instruction provided by SVE to implement the
standard ldexp family of functions.
Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
following code:
float
test_ldexpf (float x, int i)
{
  return __builtin_ldexpf (x, i);
}

double
test_ldexp (double x, int i)
{
  return __builtin_ldexp (x, i);
}
GCC Output:
test_ldexpf:
        b       ldexpf
test_ldexp:
        b       ldexp
SVE provides an FSCALE instruction, so scalar floats can be handled by moving
the operands into vector registers and executing FSCALE on them, similar to how
LLVM lowers the ldexp builtin.
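
For reference, ldexp (x, i) computes x * 2^i, which is the operation FSCALE
applies to each active lane. The sketch below is only an illustration in
portable C and is not part of the patch: scale_by_pow2f and
check_ldexpf_matches_reference are hypothetical helper names, and the
repeated doubling is exact only while intermediate values stay in the normal
range.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical reference helper (not part of the patch): computes
   x * 2^i by repeated doubling or halving.  Each step is exact as
   long as the intermediate values stay in the normal range.  */
static float
scale_by_pow2f (float x, int i)
{
  for (; i > 0; i--)
    x *= 2.0f;
  for (; i < 0; i++)
    x *= 0.5f;
  return x;
}

/* Self-check: ldexpf agrees with the reference for small exponents.  */
static void
check_ldexpf_matches_reference (void)
{
  const float x = 1.5f;
  for (int i = -8; i <= 8; i++)
    assert (ldexpf (x, i) == scale_by_pow2f (x, i));
}
```

Calling check_ldexpf_matches_reference () from a test driver should pass
silently on any conforming libm.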
New Output:
test_ldexpf:
        fmov    s31, w0
        ptrue   p7.b, vl4
        fscale  z0.s, p7/m, z0.s, z31.s
        ret
test_ldexp:
        sxtw    x0, w0
        ptrue   p7.b, vl8
        fmov    d31, x0
        fscale  z0.d, p7/m, z0.d, z31.d
        ret
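
The sxtw in the double variant is there because FSCALE reads a signed integer
exponent of the same width as the floating-point element, so the 32-bit int
argument has to be sign-extended to 64 bits first. A minimal sketch of the
difference in portable C follows; the helper names are hypothetical and only
illustrate why zero-extension would be wrong for negative exponents.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sign-extend a 32-bit exponent to 64 bits,
   which is what sxtw does before the value is moved to d31.  */
static int64_t
widen_exponent_signed (int32_t i)
{
  return (int64_t) i;
}

/* Hypothetical counter-example: zero-extension turns a negative
   exponent into a large positive one.  */
static int64_t
widen_exponent_zero (int32_t i)
{
  return (int64_t) (uint32_t) i;
}
```

For example, widen_exponent_signed (-3) stays -3, whereas
widen_exponent_zero (-3) yields 4294967293.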
This is a revision of an earlier patch, and now uses the extended definition of
aarch64_ptrue_reg to generate predicate registers with the appropriate set bits.
The patch was bootstrapped and regtested on aarch64-linux-gnu with no regressions.
OK for mainline?
Signed-off-by: Soumya AR <soumyaa@nvidia.com>

gcc/ChangeLog:

        PR target/111733
        * config/aarch64/aarch64-sve.md
        (ldexp<mode>3): Added a new pattern to match ldexp calls with scalar
        floating modes and expand to the existing pattern for FSCALE.
        * config/aarch64/iterators.md:
        (SVE_FULL_F_SCALAR): Added an iterator to match all FP SVE modes as well
        as their scalar equivalents.
        (VPRED): Extended the attribute to handle GPF_HF modes.
        * internal-fn.def (LDEXP): Changed macro to incorporate ldexpf16.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/sve/fscale.c: New test.

Showing 4 changed files:
- gcc/config/aarch64/aarch64-sve.md: 20 additions, 5 deletions
- gcc/config/aarch64/iterators.md: 5 additions, 1 deletion
- gcc/internal-fn.def: 1 addition, 1 deletion
- gcc/testsuite/gcc.target/aarch64/sve/fscale.c: 46 additions, 0 deletions