Handle non-grouped stores as single-lane SLP
The following enables single-lane loop SLP discovery for non-grouped stores and adjusts vectorizable_store to properly handle those.

For gfortran.dg/vect/vect-8.f90 we vectorize one additional loop, not running into the "not falling back to strided accesses" bail-out. I have not investigated in detail.

There is a set of i386 target assembler test FAILs; gcc.target/i386/pr88531-2[bc].c in particular fail because the target cannot identify SLP emulated gathers (see another mail from me). Others need adjustment; I've only adjusted one with this patch. In particular, the gcc.target/i386/cond_op_fma_*-1.c FAILs are because we no longer fold a VEC_COND_EXPR during the region value-numbering we do after vectorization. We now code-generate a { 0.0, ... } constant in the VEC_COND_EXPR instead of having a separate statement which gets forwarded and then triggers folding. This leads to slightly different code generation. The solution is probably to use gimple_build when building stmts or, in this case, to directly emit .COND_FMA instead of .FMA and a VEC_COND_EXPR.

gcc.dg/vect/slp-19a.c mixes contiguous 8-lane SLP with a single-lane contiguous store from one lane of the 8-lane load, and we expect to use load-lanes for this reason, but the heuristic for forcing single-lane rediscovery as implemented doesn't trigger here because it treats both SLP instances separately. This FAILs on RISC-V.

gcc.dg/vect/slp-19c.c shows we fail to implement an interleaving scheme for group_size 12 (using the group_size 3 scheme to reduce to 4 lanes and then continuing with a pow2 scheme would work); we are also not considering load-lanes for the above reason, but aarch64 cannot do ld12. This FAILs on AARCH64 (the load requires three vectors) and x86_64.

gcc.dg/vect/slp-19c.c FAILs with variable-length vectors because of "SLP induction not supported for variable-length vectors".
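The group_size 12 scheme suggested above for gcc.dg/vect/slp-19c.c (a factor-3 split down to 4-lane groups, then power-of-two splits) can be modeled with scalar index arithmetic. This is a sketch under stated assumptions, not GCC's implementation: the helpers `extract_strided` and `deinterleave12` and the bound `MAX_N` are hypothetical, and vectorized code would perform each split with vector permutes rather than scalar loops.

```c
#include <stddef.h>

#define GROUP 12
#define MAX_N 16 /* hypothetical bound on the number of groups */

/* Hypothetical helper: copy every FACTOR-th element of SRC, starting at
   OFFSET, into DST.  SRC_LEN is assumed to be a multiple of FACTOR.  */
static void
extract_strided (const int *src, size_t src_len, size_t factor,
                 size_t offset, int *dst)
{
  for (size_t k = 0; k * factor + offset < src_len; ++k)
    dst[k] = src[k * factor + offset];
}

/* De-interleave N groups of 12 lanes so that out[l][i] == in[12*i + l],
   using only split factors 3 and 2 (requires n <= MAX_N).  */
static void
deinterleave12 (const int *in, size_t n, int out[GROUP][MAX_N])
{
  int s1[3][4 * MAX_N];    /* after the factor-3 split: three 4-lane groups */
  int s2[3][2][2 * MAX_N]; /* after the first even/odd split: 2-lane groups */

  for (size_t r = 0; r < 3; ++r)
    {
      /* s1[r][4*i + m] == in[12*i + 3*m + r].  */
      extract_strided (in, GROUP * n, 3, r, s1[r]);
      for (size_t p = 0; p < 2; ++p)
        {
          /* s2[r][p][2*i + m] == in[12*i + 3*(2*m + p) + r].  */
          extract_strided (s1[r], 4 * n, 2, p, s2[r][p]);
          for (size_t q = 0; q < 2; ++q)
            /* The final stream (r, p, q) holds lane 6*q + 3*p + r.  */
            extract_strided (s2[r][p], 2 * n, 2, q, out[6 * q + 3 * p + r]);
        }
    }
}
```

No stage needs a 12-way permute: each one splits by 3 or by 2, which is why reusing the group_size 3 scheme and then a pow2 scheme would be sufficient.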
gcc.target/aarch64/pr110449.c will FAIL because the (contested) optimization in r14-2367-g224fd59b2dc8a5 was only applied to loop-vect but not SLP vect. I'll leave it to target maintainers to either XFAIL (the optimization is bad) or remove the test.

	* tree-vect-slp.cc (vect_analyze_slp): Perform single-lane loop SLP discovery for non-grouped stores. Move check on the root for re-doing SLP analysis with a single lane for load/store-lanes earlier and make sure we are dealing with a grouped access.
	* tree-vect-stmts.cc (vectorizable_store): Always set vec_num for SLP.

	* gcc.dg/vect/O3-pr39675-2.c: Adjust expected number of SLP.
	* gcc.dg/vect/fast-math-vect-call-1.c: Likewise.
	* gcc.dg/vect/no-scevccp-slp-31.c: Likewise.
	* gcc.dg/vect/slp-12b.c: Likewise.
	* gcc.dg/vect/slp-12c.c: Likewise.
	* gcc.dg/vect/slp-19a.c: Likewise.
	* gcc.dg/vect/slp-19b.c: Likewise.
	* gcc.dg/vect/slp-4-big-array.c: Likewise.
	* gcc.dg/vect/slp-4.c: Likewise.
	* gcc.dg/vect/slp-5.c: Likewise.
	* gcc.dg/vect/slp-7.c: Likewise.
	* gcc.dg/vect/slp-perm-7.c: Likewise.
	* gcc.dg/vect/slp-37.c: Likewise.
	* gcc.dg/vect/fast-math-vect-call-2.c: Likewise.
	* gcc.dg/vect/slp-26.c: RISC-V can now SLP two instances.
	* gcc.dg/vect/vect-outer-slp-3.c: Disable vectorization of initialization loop.
	* gcc.dg/vect/slp-reduc-5.c: Likewise.
	* gcc.dg/vect/no-scevccp-outer-12.c: Un-XFAIL. SLP can handle inner loop inductions with multiple vector stmt copies.
	* gfortran.dg/vect/vect-8.f90: Adjust expected number of vectorized loops.
	* gcc.target/i386/vectorize1.c: Adjust what we scan for.
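To illustrate the distinction the vect_analyze_slp entry relies on, here are two loops in C. The store in `copy_scaled` writes one contiguous element per iteration and so is not part of an interleaving group; this is the kind of non-grouped store that now gets single-lane SLP discovery. The two stores in `store_pairs` are adjacent and form a store group of size 2. The function names are hypothetical, and whether either loop is actually vectorized depends on the target and options.

```c
#include <stddef.h>

/* Non-grouped store: each iteration writes the single element out[i].  */
void
copy_scaled (float *restrict out, const float *restrict in, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    out[i] = in[i] * 2.0f;
}

/* Grouped store: the stores to out[2*i] and out[2*i + 1] are adjacent and
   form one interleaving group of size 2.  */
void
store_pairs (float *restrict out, const float *restrict a,
             const float *restrict b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      out[2 * i] = a[i] + b[i];
      out[2 * i + 1] = a[i] - b[i];
    }
}
```

Compiling such loops with -O3 -fdump-tree-vect-details and reading the dump is one way to see which SLP instances are discovered for each store.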
- gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c: 1 addition, 1 deletion
- gcc/testsuite/gcc.dg/vect/fast-math-vect-call-1.c: 1 addition, 1 deletion
- gcc/testsuite/gcc.dg/vect/fast-math-vect-call-2.c: 1 addition, 1 deletion
- gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c: 1 addition, 2 deletions
- gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c: 3 additions, 2 deletions
- gcc/testsuite/gcc.dg/vect/slp-12b.c: 1 addition, 1 deletion
- gcc/testsuite/gcc.dg/vect/slp-12c.c: 1 addition, 1 deletion
- gcc/testsuite/gcc.dg/vect/slp-19a.c: 1 addition, 1 deletion
- gcc/testsuite/gcc.dg/vect/slp-19b.c: 1 addition, 1 deletion
- gcc/testsuite/gcc.dg/vect/slp-26.c: 2 additions, 1 deletion
- gcc/testsuite/gcc.dg/vect/slp-37.c: 1 addition, 1 deletion
- gcc/testsuite/gcc.dg/vect/slp-4-big-array.c: 1 addition, 1 deletion
- gcc/testsuite/gcc.dg/vect/slp-4.c: 1 addition, 1 deletion
- gcc/testsuite/gcc.dg/vect/slp-5.c: 1 addition, 1 deletion
- gcc/testsuite/gcc.dg/vect/slp-7.c: 2 additions, 2 deletions
- gcc/testsuite/gcc.dg/vect/slp-perm-7.c: 1 addition, 1 deletion
- gcc/testsuite/gcc.dg/vect/slp-reduc-5.c: 2 additions, 1 deletion
- gcc/testsuite/gcc.dg/vect/vect-outer-slp-3.c: 1 addition, 0 deletions
- gcc/testsuite/gcc.target/i386/vectorize1.c: 2 additions, 2 deletions
- gcc/testsuite/gfortran.dg/vect/vect-8.f90: 1 addition, 1 deletion