Skip to content
Snippets Groups Projects
Commit 4db34072 authored by Kewen Lin's avatar Kewen Lin
Browse files

predcom: Enabled by loop vect at O2 [PR100794]

As PR100794 shows, in the current implementation PRE bypasses
some optimization to avoid introducing loop carried dependence
which stops loop vectorizer to vectorize the loop.  At -O2,
there is no downstream pass to re-catch this kind of opportunity
if loop vectorizer fails to vectorize that loop.

This patch follows Richi's suggestion in the PR, if predcom flag
isn't set and loop vectorization will enable predcom without any
unrolling implicitly.  The Power9 SPEC2017 evaluation showed it
can speed up 521.wrf_r 3.30% and 554.roms_r 1.08% at very-cheap
cost model, no remarkable impact at cheap cost model, the build
time and size impact is fine (see the PR for the details).

By the way, I tested another proposal to guard PRE not skip the
optimization for cheap and very-cheap vect cost models, the
evaluation results showed it's fine with very cheap cost model,
but it can degrade some bmks like 521.wrf_r -9.17% and
549.fotonik3d_r -2.07% etc.

Bootstrapped/regtested on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu.

gcc/ChangeLog:

	PR tree-optimization/100794
	* tree-predcom.c (tree_predictive_commoning_loop): Add parameter
	allow_unroll_p and only allow unrolling when it's true.
	(tree_predictive_commoning): Add parameter allow_unroll_p and
	adjust for it.
	(run_tree_predictive_commoning): Likewise.
	(pass_predcom::gate): Check flag_tree_loop_vectorize and
	global_options_set.x_flag_predictive_commoning.
	(pass_predcom::execute): Adjust for allow_unroll_p.

gcc/testsuite/ChangeLog:

	PR tree-optimization/100794
	* gcc.dg/tree-ssa/pr100794.c: New test.
parent 774686d4
No related branches found
No related tags found
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment