Skip to content
Snippets Groups Projects
Commit ade0a1ee authored by Jakub Jelinek's avatar Jakub Jelinek
Browse files

tree-vect-patterns: Improve __builtin_{clz,ctz,ffs}ll vectorization [PR109011]

For __builtin_popcountll tree-vect-patterns.cc has
vect_recog_popcount_pattern, which improves the vectorized code.
Without that the vectorization is always multi-type vectorization
in the loop (at least int and long long types) where we emit two
.POPCOUNT calls with long long arguments and int return value and then
widen to long long, so effectively after vectorization do the
V?DImode -> V?DImode popcount twice, then pack the result into V?SImode
and immediately unpack.

The following patch extends that handling to __builtin_{clz,ctz,ffs}ll
builtins as well (as long as there is an optab for them; more to come
laster).

x86 can do __builtin_popcountll with -mavx512vpopcntdq, __builtin_clzll
with -mavx512cd, ppc can do __builtin_popcountll and __builtin_clzll
with -mpower8-vector and __builtin_ctzll with -mpower9-vector, s390
can do __builtin_{popcount,clz,ctz}ll with -march=z13 -mzarch (i.e. VX).

2023-04-19  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/109011
	* tree-vect-patterns.cc (vect_recog_popcount_pattern): Rename to ...
	(vect_recog_popcount_clz_ctz_ffs_pattern): ... this.  Handle also
	CLZ, CTZ and FFS.  Remove vargs variable, use
	gimple_build_call_internal rather than gimple_build_call_internal_vec.
	(vect_vect_recog_func_ptrs): Adjust popcount entry.

	* gcc.dg/vect/pr109011-1.c: New test.
parent 76f44fbf
No related branches found
No related tags found
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment