Also lower SLP grouped loads with just one consumer
This makes sure to produce interleaving schemes or load-lanes for single-element interleaving and for other permutes that would otherwise use more than three vectors.

It exposes the latent issue that single-element interleaving with large gaps can be inefficient: the mitigation in get_group_load_store_type doesn't trigger when we clear the load permutation.  It also exposes the fact that not all permutes can be lowered optimally in a vector-length-agnostic way, so I've added an exception to keep contiguous, aligned chunks of power-of-two size unlowered (unless we want load-lanes).  The optimal handling of load/store vectorization is going to continue to be a learning process.

	* tree-vect-slp.cc (vect_lower_load_permutations): Also process
	single-use grouped loads.  Avoid lowering contiguous aligned
	power-of-two sized chunks, as those are better handled by the
	vector-size-specific SLP code generation.
	* tree-vect-stmts.cc (get_group_load_store_type): Drop the
	unrelated requirement of a load permutation for the
	single-element interleaving limit.

	* gcc.dg/vect/slp-46.c: Remove XFAIL.
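The exception for contiguous aligned power-of-two chunks can be illustrated with a small predicate. This is a minimal sketch, not GCC's code: the helpers `contiguous_aligned_pow2_chunk_p` and `should_lower_load_p` are hypothetical, the load permutation is assumed to be a plain array of group element indices, and the lowering decision is simplified to only the chunk check plus the load-lanes override described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of GCC): return true when the load
   permutation PERM of length N selects a contiguous run of group
   elements whose length is a power of two and whose first index is a
   multiple of that length, e.g. {4, 5, 6, 7}.  */
static bool
contiguous_aligned_pow2_chunk_p (const unsigned *perm, size_t n)
{
  assert (n == 0 || perm != NULL);

  /* The chunk length must be a nonzero power of two.  */
  if (n == 0 || (n & (n - 1)) != 0)
    return false;

  /* The first index must be aligned to the chunk length.  */
  if (perm[0] % n != 0)
    return false;

  /* Every following lane must read the next consecutive element.  */
  for (size_t i = 1; i < n; ++i)
    if (perm[i] != perm[0] + i)
      return false;

  return true;
}

/* Hypothetical, simplified lowering decision: keep such chunks
   unlowered unless load-lanes is wanted; lower everything else.  */
static bool
should_lower_load_p (const unsigned *perm, size_t n, bool want_load_lanes)
{
  if (want_load_lanes)
    return true;
  return !contiguous_aligned_pow2_chunk_p (perm, n);
}
```

For example, {4, 5, 6, 7} qualifies and stays unlowered, while {2, 3, 4, 5} (misaligned start) and {0, 2, 4, 6} (not contiguous) would still be lowered under this sketch.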