  1. Oct 10, 2023
    • MAINTAINERS: Add myself to write after approval · ddf17b6d
      Christoph Müllner authored
      
      Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
      
      ChangeLog:
      
      	* MAINTAINERS: Add myself.
    • RISC-V: Add VLS BOOL mode vcond_mask[PR111751] · 5255273e
      Juzhe-Zhong authored
      Richard's patch resolving PR111751: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=7c76c876e917a1f20a788f602cc78fff7d0a2a65
      
      causes ICEs in the RISC-V regression tests:
      
      FAIL: gcc.dg/torture/pr53144.c   -O2  (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:328)
      FAIL: gcc.dg/torture/pr53144.c   -O2  (test for excess errors)
      FAIL: gcc.dg/torture/pr53144.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:328)
      FAIL: gcc.dg/torture/pr53144.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
      FAIL: gcc.dg/torture/pr53144.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:328)
      FAIL: gcc.dg/torture/pr53144.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
      FAIL: gcc.dg/torture/pr53144.c   -O3 -g  (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:328)
      FAIL: gcc.dg/torture/pr53144.c   -O3 -g  (test for excess errors)
      
      A vcond_mask pattern for VLS BOOL modes is needed to fix this ICE.
      
      More details: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111751
      
      Tested and Committed.
      
      	PR target/111751
      
      gcc/ChangeLog:
      
      	* config/riscv/autovec.md: Add VLS BOOL modes.
    • tree-optimization/111751 - support 1024 bit vector constant reinterpretation · 70b5c698
      Richard Biener authored
      The following ups the limit in fold_view_convert_expr to handle
      1024bit vectors as used by GCN and RVV.  It also robustifies
      the handling in visit_reference_op_load to properly give up when
      constants cannot be re-interpreted.
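      
      The idea of re-interpreting a constant through a bounded byte buffer, and giving up
      when the value does not fit, can be sketched as follows. This is only an illustration
      in plain C, not the GCC code: reinterpret_bytes and REINTERPRET_BUF_BYTES are
      hypothetical names, and the 128-byte limit simply mirrors the buffer size named in
      this commit's ChangeLog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical limit: 128 bytes, i.e. room for one 1024-bit vector.  */
enum { REINTERPRET_BUF_BYTES = 128 };

/* Hypothetical helper, not a GCC function: copy the bytes of SRC into DST
   through a fixed-size staging buffer.  Returns 1 on success and 0 when
   the sizes differ or the value does not fit, so the caller can give up
   instead of using a partially converted result.  */
int
reinterpret_bytes (void *dst, size_t dst_size,
                   const void *src, size_t src_size)
{
  unsigned char buf[REINTERPRET_BUF_BYTES];

  if (src_size != dst_size || src_size > sizeof buf)
    return 0;

  memcpy (buf, src, src_size);
  memcpy (dst, buf, dst_size);
  return 1;
}

/* Small self-check of the sketch.  */
void
check_reinterpret_bytes (void)
{
  float f = 1.0f;
  uint32_t bits = 0;
  unsigned char big[256] = { 0 };
  unsigned char out[256];

  /* Assumes float is IEEE 754 binary32.  */
  assert (reinterpret_bytes (&bits, sizeof bits, &f, sizeof f));
  assert (bits == 0x3f800000u);

  /* 256 bytes exceed the staging buffer, so the helper gives up.  */
  assert (!reinterpret_bytes (out, sizeof out, big, sizeof big));
}
```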
      
      	PR tree-optimization/111751
      	* fold-const.cc (fold_view_convert_expr): Up the buffer size
      	to 128 bytes.
      	* tree-ssa-sccvn.cc (visit_reference_op_load): Special case
      	constants, giving up when re-interpretation to the target type
      	fails.
    • ada: Fix internal error on too large representation clause for small component · 2f150833
      Eric Botcazou authored
      This is a small bug present on strict-alignment platforms for questionable
      representation clauses.
      
      gcc/ada/
      
      	* gcc-interface/decl.cc (inline_status_for_subprog): Minor tweak.
      	(gnat_to_gnu_field): Try harder to get a packable form of the type
      	for a bitfield.
    • ada: Tweak internal subprogram in Ada.Directories · 42c46cfe
      Ronan Desplanques authored
      The purpose of this patch is to work around false-positive warnings
      emitted by GNAT SAS (also known as CodePeer). It does not change
      the behavior of the modified subprogram.
      
      gcc/ada/
      
      	* libgnat/a-direct.adb (Start_Search_Internal): Tweak subprogram
      	body.
    • ada: Remove superfluous setter procedure · 25c253e6
      Eric Botcazou authored
      It is only called once.
      
      gcc/ada/
      
      	* sem_util.ads (Set_Scope_Is_Transient): Delete.
      	* sem_util.adb (Set_Scope_Is_Transient): Likewise.
      	* exp_ch7.adb (Create_Transient_Scope): Set Is_Transient directly.
    • ada: Fix bad finalization of limited aggregate in conditional expression · e05e5d6b
      Eric Botcazou authored
      This happens when the conditional expression is immediately returned, for
      example in an expression function.
      
      gcc/ada/
      
      	* exp_aggr.adb (Is_Build_In_Place_Aggregate_Return): Return true
      	if the aggregate is a dependent expression of a conditional
      	expression being returned from a build-in-place function.
    • ada: Fix infinite loop with multiple limited with clauses · 6bd83c90
      Eric Botcazou authored
      This occurs when one of the types has an incomplete declaration in addition
      to its full declaration in its package. In this case AI05-129 says that the
      incomplete type is not part of the limited view of the package, i.e. only
      the full view is. Now, in the GNAT implementation, it's the opposite in the
      regular view of the package, i.e. the incomplete type is the visible one.
      
      That's why the implementation needs to also swap the types on the visibility
      chain while it is swapping the views when the clauses are either installed
      or removed. This works correctly for the installation, but does not for the
      removal, so this change rewrites the code doing the latter.
      
      gcc/ada/
      	PR ada/111434
      	* sem_ch10.adb (Replace): New procedure to replace an entity with
      	another on the homonym chain.
      	(Install_Limited_With_Clause): Rename Non_Lim_View to Typ for the
      	sake of consistency.  Call Replace to do the replacements and split
      	the code into the regular and the special cases.  Add debugging
      	output controlled by -gnatdi.
      	(Install_With_Clause): Print the Parent_With and Implicit_With flags
      	in the debugging output controlled by -gnatdi.
      	(Remove_Limited_With_Unit.Restore_Chain_For_Shadow (Shadow)): Rewrite
      	using a direct replacement of E4 by E2.   Call Replace to do the
      	replacements.  Add debugging output controlled by -gnatdi.
    • ada: Fix filesystem entry filtering · 34992e15
      Ronan Desplanques authored
      This patch fixes the behavior of Ada.Directories.Search when being
      requested to filter out regular files or directories. One of the
      configurations in which that behavior was incorrect was that when the
      caller requested only the regular and special files but not the
      directories, the directories would still be returned.
      
      gcc/ada/
      
      	* libgnat/a-direct.adb: Fix filesystem entry filtering.
    • ada: Tweak documentation comments · f71c6312
      Ronan Desplanques authored
      The concept of extended nodes was retired at the same time Gen_IL
      was introduced, but there was a reference to that concept left over
      in a comment. This patch removes that reference.
      
      Also, the description of the field Comes_From_Check_Or_Contract was
      incorrectly placed in a section for fields present in all nodes in
      sinfo.ads. This patch fixes this.
      
      gcc/ada/
      
      	* atree.ads, nlists.ads, types.ads: Remove references to extended
      	nodes. Fix typo.
      	* sinfo.ads: Likewise and fix position of
      	Comes_From_Check_Or_Contract description.
    • ada: Crash processing pragmas Compile_Time_Error and Compile_Time_Warning · 85a0ce90
      Javier Miranda authored
      gcc/ada/
      
      	* sem_attr.adb (Analyze_Attribute): Protect the frontend against
      	replacing 'Size by its static value if 'Size is not known at
      	compile time and we are processing pragmas Compile_Time_Warning or
      	Compile_Time_Error.
    • RISC-V: Add testcase for SCCVN optimization[PR111751] · a704603d
      Juzhe-Zhong authored
      Add testcase for PR111751 which has been fixed:
      https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632474.html
      
      	PR target/111751
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/autovec/pr111751.c: New test.
    • Fix missed CSE with a BLKmode entity · 7c76c876
      Richard Biener authored
      The following fixes fallout of r10-7145-g1dc00a8ec9aeba which made
      us cautious about CSEing a load to an object that has padding bits.
      The added check also triggers for BLKmode entities like STRING_CSTs
      but by definition a BLKmode entity does not have padding bits.
      
      	PR tree-optimization/111751
      	* tree-ssa-sccvn.cc (visit_reference_op_load): Exempt
      	BLKmode result from the padding bits check.
    • RISC-V Regression: Fix FAIL of bb-slp-pr65935.c for RVV · 4d230493
      Juzhe-Zhong authored
      Here is a comparison of the dump IR between ARM SVE and RVV:
      
      https://godbolt.org/z/zqess8Gss
      
      RVV emits one additional line in the dump:
      optimized: basic block part vectorized using 128 byte vectors
      because RVV has 1024-bit vectors.
      
      The codegen is reasonably good.
      
      However, GCN also has 1024-bit vectors, so this patch may cause
      this test to FAIL on the GCN port.
      
      GCN folks, could you check this patch on the GCN port for me?
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/vect/bb-slp-pr65935.c: Add vect1024 variant.
      	* lib/target-supports.exp: Ditto.
    • arc: Refurbish add.f combiner patterns · aaa5a531
      Claudiu Zissulescu authored
      
      Refurbish add compare patterns: use 'r' constraint, fix indentation,
      and fix pattern to match 'if (a+b) { ... }' constructions.
      
      gcc/
      
      	* config/arc/arc.cc (arc_select_cc_mode): Match NEG code with
      	the first operand.
      	* config/arc/arc.md (addsi_compare): Make pattern canonical.
      	(addsi_compare_2): Fix indentation, constraint letters.
      	(addsi_compare_3): Likewise.
      
      gcc/testsuite/
      
      	* gcc.target/arc/add_f-combine.c: New test.
      
      Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>
    • RISC-V: Add available vector size for RVV · 4ecb9b03
      Juzhe-Zhong authored
      For RVV, VLS modes are enabled according to TARGET_MIN_VLEN,
      from M1 to M8.
      
      For example, when TARGET_MIN_VLEN = 128 bits, we enable
      128/256/512/1024 bits VLS modes.
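      
      The arithmetic behind that example can be sketched in C as below. The helper
      vls_vector_sizes is a hypothetical name used only for illustration and is not part of
      GCC; it assumes the available sizes are simply the minimum vector length multiplied
      by the register group sizes 1, 2, 4 and 8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC: fill SIZES with the vector sizes
   in bits for register groups M1, M2, M4 and M8, given the minimum vector
   length MIN_VLEN in bits.  Returns the number of entries written.  */
size_t
vls_vector_sizes (unsigned min_vlen, unsigned sizes[4])
{
  size_t n = 0;

  /* LMUL doubles at each step: 1, 2, 4, 8.  */
  for (unsigned lmul = 1; lmul <= 8; lmul *= 2)
    sizes[n++] = min_vlen * lmul;

  return n;
}

/* Self-check for the TARGET_MIN_VLEN = 128 example from the text.  */
void
check_vls_vector_sizes (void)
{
  unsigned sizes[4];

  assert (vls_vector_sizes (128, sizes) == 4);
  assert (sizes[0] == 128 && sizes[1] == 256);
  assert (sizes[2] == 512 && sizes[3] == 1024);
}
```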
      
      This patch fixes the following FAILs:
      FAIL: gcc.dg/vect/bb-slp-subgroups-2.c -flto -ffat-lto-objects  scan-tree-dump-times slp2 "optimized: basic block" 2
      FAIL: gcc.dg/vect/bb-slp-subgroups-2.c scan-tree-dump-times slp2 "optimized: basic block" 2
      
      gcc/testsuite/ChangeLog:
      
      	* lib/target-supports.exp: Add 256/512/1024
    • Daily bump. · fb124f2a
      GCC Administrator authored
  2. Oct 09, 2023
    • Fixes for profile count/probability maintenance · cc503372
      Eugene Rozenfeld authored
      Verifier checks have recently been strengthened to check that
      all counts and probabilities are initialized. The checks fired
      during autoprofiledbootstrap build and this patch fixes them.
      
      Tested on x86_64-pc-linux-gnu.
      
      gcc/ChangeLog:
      	* auto-profile.cc (afdo_calculate_branch_prob): Fix count comparisons
      	* tree-vect-loop-manip.cc (vect_do_peeling): Guard against zero count
      	when scaling loop profile
    • analyzer: fix build with gcc < 6 · 08d0f840
      David Malcolm authored
      
      gcc/analyzer/ChangeLog:
      	* access-diagram.cc (boundaries::add): Explicitly state
      	"boundaries::" scope for "kind" enum.
      
      Signed-off-by: David Malcolm <dmalcolm@redhat.com>
    • Ensure float equivalences include + and - zero. · b0892b1f
      Andrew MacLeod authored
      A floating point equivalence may not properly reflect both signs of
      zero, so be pessimistic and ensure both signs are included.
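      
      Why the sign of zero matters for an equivalence can be shown with a small C sketch.
      It assumes IEEE 754 doubles; the function names are hypothetical and exist only for
      this illustration. Two values can compare equal and still behave differently, so
      knowing x == y does not let one value stand in for the other when one is +0.0 and
      the other is -0.0.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: nonzero when A and B compare equal yet can still
   be told apart by their sign bit, which is the +0.0 / -0.0 case.  */
int
equal_but_different_sign (double a, double b)
{
  return a == b && (signbit (a) != 0) != (signbit (b) != 0);
}

/* Self-check, assuming IEEE 754 arithmetic.  */
void
check_signed_zero (void)
{
  double pz = 0.0;
  double nz = -0.0;

  /* The comparison says the two zeros are equal ...  */
  assert (pz == nz);
  assert (equal_but_different_sign (pz, nz));

  /* ... but dividing by them gives infinities of opposite sign.  */
  assert (1.0 / pz > 0.0);
  assert (1.0 / nz < 0.0);
}
```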
      
      	PR tree-optimization/111694
      	gcc/
      	* gimple-range-cache.cc (ranger_cache::fill_block_cache): Adjust
      	equivalence range.
      	* value-relation.cc (adjust_equivalence_range): New.
      	* value-relation.h (adjust_equivalence_range): New prototype.
      
      	gcc/testsuite/
      	* gcc.dg/pr111694.c: New.
    • Remove unused get_identity_relation. · 5ee51119
      Andrew MacLeod authored
      Turns out we didn't need this as there are no unordered relations
      managed by the oracle.
      
      	* gimple-range-gori.cc (gori_compute::compute_operand1_range): Do
      	not call get_identity_relation.
      	(gori_compute::compute_operand2_range): Ditto.
      	* value-relation.cc (get_identity_relation): Remove.
      	* value-relation.h (get_identity_relation): Remove prototype.
    • RISC-V Regression test: Fix slp-perm-4.c FAIL for RVV · dae21448
      Juzhe-Zhong authored
      RVV vectorizes it with stride5 load_lanes.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/vect/slp-perm-4.c: Adapt test for stride5 load_lanes.
    • RISC-V Regression tests: Fix FAIL of pr97832* for RVV · e90eddde
      Juzhe-Zhong authored
      These cases are vectorized by vec_load_lanes with stride = 8 instead of SLP
      with -fno-vect-cost-model.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/vect/pr97832-2.c: Adapt dump check for target supports load_lanes with stride = 8.
      	* gcc.dg/vect/pr97832-3.c: Ditto.
      	* gcc.dg/vect/pr97832-4.c: Ditto.
    • RISC-V Regression test: Fix FAIL of slp-12a.c · 30b76f86
      Juzhe-Zhong authored
      This case is vectorized by stride8 load_lanes.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/vect/slp-12a.c: Adapt for stride 8 load_lanes.
    • RISC-V Regression test: Fix FAIL of slp-reduc-4.c for RVV · db20b83c
      Juzhe-Zhong authored
      RVV vectorizes this case with stride8 load_lanes.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/vect/slp-reduc-4.c: Adapt test for stride8 load_lanes.
    • RISC-V Regression test: Adapt SLP tests like ARM SVE · 79e6ea48
      Juzhe-Zhong authored
      Like ARM SVE, RVV is vectorizing these 2 cases in the same way.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/vect/slp-23.c: Add RVV like ARM SVE.
      	* gcc.dg/vect/slp-perm-10.c: Ditto.
    • RISC-V: Add initial pipeline description for an out-of-order core. · f8498436
      Robin Dapp authored
      This adds a pipeline description for a generic out-of-order core.
      Latency and units are not based on any real processor but more or less
      educated guesses at what such a processor would look like.
      
      In order to account for latency scaling by LMUL != 1, sched_adjust_cost
      is implemented.  It will scale an instruction's latency by its LMUL
      so an LMUL == 8 instruction will take 8 times the number of cycles
      the same instruction with LMUL == 1 would take.
      As this potentially causes very high latencies which, in turn, might
      lead to scheduling anomalies and a higher number of vsetvls emitted
      this feature is only enabled when specifying -madjust-lmul-cost.
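      
      The scaling rule described above can be sketched in C as follows. This is not the
      code of riscv_sched_adjust_cost: scale_latency_by_lmul and its parameters are
      hypothetical names, LMUL is modelled as a plain integer (1, 2, 4 or 8), and how
      fractional LMUL is treated is left out because the text does not say.

```c
#include <assert.h>

/* Hypothetical sketch, not the GCC scheduler hook: return the latency of
   an instruction after LMUL scaling.  BASE_LATENCY is the latency of the
   same instruction at LMUL == 1.  ADJUST_LMUL_COST models whether the
   -madjust-lmul-cost option was given; without it nothing is scaled.  */
int
scale_latency_by_lmul (int base_latency, int lmul, int adjust_lmul_cost)
{
  if (!adjust_lmul_cost || lmul <= 1)
    return base_latency;

  /* An LMUL == 8 instruction takes 8 times the LMUL == 1 latency.  */
  return base_latency * lmul;
}

/* Self-check of the rule stated in the commit message.  */
void
check_scale_latency_by_lmul (void)
{
  assert (scale_latency_by_lmul (3, 1, 1) == 3);
  assert (scale_latency_by_lmul (3, 8, 1) == 24);
  /* The feature is off unless explicitly requested.  */
  assert (scale_latency_by_lmul (3, 8, 0) == 3);
}
```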
      
      Additionally, in order to easily recognize pre-RA vsetvls this patch
      introduces an insn type vsetvl_pre which is used in sched_adjust_cost.
      
      In the future we might also want a latency adjustment similar to lmul
      for reductions, i.e. make the latency dependent on the type and its
      number of units.
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-cores.def (RISCV_TUNE): Add parameter.
      	* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
      	Add generic_ooo.
      	* config/riscv/riscv.cc (riscv_sched_adjust_cost): Implement
      	scheduler hook.
      	(TARGET_SCHED_ADJUST_COST): Define.
      	* config/riscv/riscv.md (no,yes"): Include generic-ooo.md
      	* config/riscv/riscv.opt: Add -madjust-lmul-cost.
      	* config/riscv/generic-ooo.md: New file.
      	* config/riscv/vector.md: Add vsetvl_pre.
    • RISC-V: Support movmisalign of RVV VLA modes · dee55cf5
      Juzhe-Zhong authored
      This patch fixes the following FAILs in the regression tests:
      FAIL: gcc.dg/vect/slp-perm-11.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 1
      FAIL: gcc.dg/vect/slp-perm-11.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
      FAIL: gcc.dg/vect/vect-bitfield-read-2.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
      FAIL: gcc.dg/vect/vect-bitfield-read-2.c scan-tree-dump-not optimized "Invalid sum"
      FAIL: gcc.dg/vect/vect-bitfield-read-4.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
      FAIL: gcc.dg/vect/vect-bitfield-read-4.c scan-tree-dump-not optimized "Invalid sum"
      FAIL: gcc.dg/vect/vect-bitfield-write-2.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
      FAIL: gcc.dg/vect/vect-bitfield-write-2.c scan-tree-dump-not optimized "Invalid sum"
      FAIL: gcc.dg/vect/vect-bitfield-write-3.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
      FAIL: gcc.dg/vect/vect-bitfield-write-3.c scan-tree-dump-not optimized "Invalid sum"
      
      Previously, I removed the movmisalign pattern to fix the execution FAILs in this commit:
      https://github.com/gcc-mirror/gcc/commit/f7bff24905a6959f85f866390db2fff1d6f95520
      
      At first I thought that RVV doesn't allow misaligned accesses, so I removed that pattern.
      However, after a deeper investigation, re-reading the RVV ISA and experimenting on SPIKE,
      I realized I was wrong.
      
      RVV ISA reference: https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-memory-alignment-constraints
      
      "If an element accessed by a vector memory instruction is not naturally aligned to the size of the element,
       either the element is transferred successfully or an address misaligned exception is raised on that element."
      
      So the RVV ISA does permit misaligned vector loads and stores.
      
      An experiment on SPIKE confirms this:
      
      [jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64  a.out
      bbl loader
      z  0000000000000000 ra 0000000000010158 sp 0000003ffffffb40 gp 0000000000012c48
      tp 0000000000000000 t0 00000000000110da t1 000000000000000f t2 0000000000000000
      s0 0000000000013460 s1 0000000000000000 a0 0000000000012ef5 a1 0000000000012018
      a2 0000000000012a71 a3 000000000000000d a4 0000000000000004 a5 0000000000012a71
      a6 0000000000012a71 a7 0000000000012018 s2 0000000000000000 s3 0000000000000000
      s4 0000000000000000 s5 0000000000000000 s6 0000000000000000 s7 0000000000000000
      s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
      t3 0000000000000000 t4 0000000000000000 t5 0000000000000000 t6 0000000000000000
      pc 0000000000010258 va/inst 00000000020660a7 sr 8000000200006620
      Store/AMO access fault!
      
      [jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --misaligned --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64  a.out
      bbl loader
      
      SPIKE passes the execution tests that previously *FAILED* when --misaligned is specified.
      
      So, to honor the RVV ISA spec, we should add the movmisalign pattern back, based on the investigation above,
      since it improves multiple vectorization tests and fixes dump FAILs.
      
      This patch adds TARGET_VECTOR_MISALIGN_SUPPORTED to decide whether we support the misalign pattern for VLA modes (it is enabled by default).
      
      Consider the following case:
      
      struct s {
          unsigned i : 31;
          char a : 4;
      };
      
      #define N 32
      #define ELT0 {0x7FFFFFFFUL, 0}
      #define ELT1 {0x7FFFFFFFUL, 1}
      #define ELT2 {0x7FFFFFFFUL, 2}
      #define ELT3 {0x7FFFFFFFUL, 3}
      #define RES 48
      struct s A[N]
        = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
            ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
            ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
            ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
      
      int __attribute__ ((noipa))
      f(struct s *ptr, unsigned n) {
          int res = 0;
          for (int i = 0; i < n; ++i)
            res += ptr[i].a;
          return res;
      }
      
      -O3 -S -fno-vect-cost-model (default strict-align):
      
      f:
      	mv	a4,a0
      	beq	a1,zero,.L9
      	addiw	a5,a1,-1
      	li	a3,14
      	vsetivli	zero,16,e64,m8,ta,ma
      	bleu	a5,a3,.L3
      	andi	a5,a0,127
      	bne	a5,zero,.L3
      	srliw	a3,a1,4
      	slli	a3,a3,7
      	li	a0,15
      	slli	a0,a0,32
      	add	a3,a3,a4
      	mv	a5,a4
      	li	a2,32
      	vmv.v.x	v16,a0
      	vsetvli	zero,zero,e32,m4,ta,ma
      	vmv.v.i	v4,0
      .L4:
      	vsetvli	zero,zero,e64,m8,ta,ma
      	vle64.v	v8,0(a5)
      	addi	a5,a5,128
      	vand.vv	v8,v8,v16
      	vsetvli	zero,zero,e32,m4,ta,ma
      	vnsrl.wx	v8,v8,a2
      	vadd.vv	v4,v4,v8
      	bne	a5,a3,.L4
      	li	a3,0
      	andi	a5,a1,15
      	vmv.s.x	v1,a3
      	andi	a3,a1,-16
      	vredsum.vs	v1,v4,v1
      	vmv.x.s	a0,v1
      	mv	a2,a0
      	beq	a5,zero,.L15
      	slli	a5,a3,3
      	add	a5,a4,a5
      	lw	a0,4(a5)
      	andi	a0,a0,15
      	addiw	a4,a3,1
      	addw	a0,a0,a2
      	bgeu	a4,a1,.L15
      	lw	a2,12(a5)
      	andi	a2,a2,15
      	addiw	a4,a3,2
      	addw	a0,a2,a0
      	bgeu	a4,a1,.L15
      	lw	a2,20(a5)
      	andi	a2,a2,15
      	addiw	a4,a3,3
      	addw	a0,a2,a0
      	bgeu	a4,a1,.L15
      	lw	a2,28(a5)
      	andi	a2,a2,15
      	addiw	a4,a3,4
      	addw	a0,a2,a0
      	bgeu	a4,a1,.L15
      	lw	a2,36(a5)
      	andi	a2,a2,15
      	addiw	a4,a3,5
      	addw	a0,a2,a0
      	bgeu	a4,a1,.L15
      	lw	a2,44(a5)
      	andi	a2,a2,15
      	addiw	a4,a3,6
      	addw	a0,a2,a0
      	bgeu	a4,a1,.L15
      	lw	a2,52(a5)
      	andi	a2,a2,15
      	addiw	a4,a3,7
      	addw	a0,a2,a0
      	bgeu	a4,a1,.L15
      	lw	a4,60(a5)
      	andi	a4,a4,15
      	addw	a4,a4,a0
      	addiw	a2,a3,8
      	mv	a0,a4
      	bgeu	a2,a1,.L15
      	lw	a0,68(a5)
      	andi	a0,a0,15
      	addiw	a2,a3,9
      	addw	a0,a0,a4
      	bgeu	a2,a1,.L15
      	lw	a2,76(a5)
      	andi	a2,a2,15
      	addiw	a4,a3,10
      	addw	a0,a2,a0
      	bgeu	a4,a1,.L15
      	lw	a2,84(a5)
      	andi	a2,a2,15
      	addiw	a4,a3,11
      	addw	a0,a2,a0
      	bgeu	a4,a1,.L15
      	lw	a2,92(a5)
      	andi	a2,a2,15
      	addiw	a4,a3,12
      	addw	a0,a2,a0
      	bgeu	a4,a1,.L15
      	lw	a2,100(a5)
      	andi	a2,a2,15
      	addiw	a4,a3,13
      	addw	a0,a2,a0
      	bgeu	a4,a1,.L15
      	lw	a4,108(a5)
      	andi	a4,a4,15
      	addiw	a3,a3,14
      	addw	a0,a4,a0
      	bgeu	a3,a1,.L15
      	lw	a5,116(a5)
      	andi	a5,a5,15
      	addw	a0,a5,a0
      	ret
      .L9:
      	li	a0,0
      .L15:
      	ret
      .L3:
      	mv	a5,a4
      	slli	a4,a1,32
      	srli	a1,a4,29
      	add	a1,a5,a1
      	li	a0,0
      .L7:
      	lw	a4,4(a5)
      	andi	a4,a4,15
      	addi	a5,a5,8
      	addw	a0,a4,a0
      	bne	a5,a1,.L7
      	ret
      
      -O3 -S -mno-strict-align -fno-vect-cost-model:
      
      f:
      	beq	a1,zero,.L4
      	slli	a1,a1,32
      	li	a5,15
      	vsetvli	a4,zero,e64,m1,ta,ma
      	slli	a5,a5,32
      	srli	a1,a1,32
      	li	a6,32
      	vmv.v.x	v3,a5
      	vsetvli	zero,zero,e32,mf2,ta,ma
      	vmv.v.i	v2,0
      .L3:
      	vsetvli	a5,a1,e64,m1,ta,ma
      	vle64.v	v1,0(a0)
      	vsetvli	a3,zero,e64,m1,ta,ma
      	slli	a2,a5,3
      	vand.vv	v1,v1,v3
      	sub	a1,a1,a5
      	vsetvli	zero,zero,e32,mf2,ta,ma
      	add	a0,a0,a2
      	vnsrl.wx	v1,v1,a6
      	vsetvli	zero,a5,e32,mf2,tu,ma
      	vadd.vv	v2,v2,v1
      	bne	a1,zero,.L3
      	li	a5,0
      	vsetvli	a3,zero,e32,mf2,ta,ma
      	vmv.s.x	v1,a5
      	vredsum.vs	v2,v2,v1
      	vmv.x.s	a0,v2
      	ret
      .L4:
      	li	a0,0
      	ret
      
      This improves the codegen for this case considerably.
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED): New macro.
      	* config/riscv/riscv.cc (riscv_support_vector_misalignment): Depend on movmisalign pattern.
      	* config/riscv/vector.md (movmisalign<mode>): New pattern.
    • THead: Fix missing CFI directives for th.sdd in prologue. · 578aa2f8
      Xianmiao Qu authored
      When generating CFI directives for the store-pair instruction,
      if we add two parallel REG_FRAME_RELATED_EXPR expr_lists like
        (expr_list:REG_FRAME_RELATED_EXPR (set (mem/c:DI (plus:DI (reg/f:DI 2 sp)
          (const_int 8 [0x8])) [1  S8 A64])
          (reg:DI 1 ra))
        (expr_list:REG_FRAME_RELATED_EXPR (set (mem/c:DI (reg/f:DI 2 sp) [1  S8 A64])
          (reg:DI 8 s0))
      only the first expr_list will be recognized by dwarf2out_frame_debug
      function. So, here we generate a SEQUENCE expression of REG_FRAME_RELATED_EXPR,
      which includes two sub-expressions of RTX_FRAME_RELATED_P. Then the
      dwarf2out_frame_debug_expr function will iterate through all the sub-expressions
      and generate the corresponding CFI directives.
      
      gcc/
      	* config/riscv/thead.cc (th_mempair_save_regs): Fix missing CFI
      	directives for store-pair instruction.
      
      gcc/testsuite/
      	* gcc.target/riscv/xtheadmempair-4.c: New test.
    • tree-optimization/111715 - improve TBAA for access paths with pun · 11b8cf16
      Richard Biener authored
      The following improves basic TBAA for access paths formed by
      C++ abstraction where we are able to combine a path from an
      address-taking operation with a path based on that access using
      a pun to avoid memory access semantics on the address-taking part.
      
      The trick is to identify the point the semantic memory access path
      starts which allows us to use the alias set of the outermost access
      instead of only that of the base of this path.
      
      	PR tree-optimization/111715
      	* alias.cc (reference_alias_ptr_type_1): When we have
      	a type-punning ref at the base search for the access
      	path part that's still semantically valid.
      
      	* gcc.dg/tree-ssa/ssa-fre-102.c: New testcase.
    • RISC-V: Refine bswap16 auto vectorization code gen · 841668aa
      Pan Li authored
      
      Update in v2
      
      * Remove emit helper functions.
      * Take expand_binop instead.
      
      Original log:
      
      This patch refines the code gen for bswap16.
      
      We will have VEC_PERM_EXPR after rtl expand when invoking
      __builtin_bswap. It will generate about 9 instructions in the
      loop as below, no matter whether it is bswap16, bswap32 or bswap64.
      
        .L2:
      1 vle16.v v4,0(a0)
      2 vmv.v.x v2,a7
      3 vand.vv v2,v6,v2
      4 slli    a2,a5,1
      5 vrgatherei16.vv v1,v4,v2
      6 sub     a4,a4,a5
      7 vse16.v v1,0(a3)
      8 add     a0,a0,a2
      9 add     a3,a3,a2
        bne     a4,zero,.L2
      
      But for bswap16 we can have even simpler code gen, which
      has only 7 instructions in the loop as below.
      
        .L5:
      1 vle8.v  v2,0(a5)
      2 addi    a5,a5,32
      3 vsrl.vi v4,v2,8
      4 vsll.vi v2,v2,8
      5 vor.vv  v4,v4,v2
      6 vse8.v  v4,0(a4)
      7 addi    a4,a4,32
        bne     a5,a6,.L5
      
      Unfortunately, this approach makes the number of insns in the loop grow to
      13 and 24 for bswap32 and bswap64. Thus, we refine the code
      gen for bswap16 only, and leave both bswap32 and bswap64
      as is.
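      
      The shift-and-or form that the shorter loop uses can be written in scalar C as the
      sketch below. The function names are hypothetical and only illustrate the identity;
      __builtin_bswap16 is the GCC/Clang builtin used as the reference in the self-check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte swap of one 16-bit value as two shifts and an or, matching the
   vsrl / vsll / vor sequence shown for the vectorized loop.  */
uint16_t
bswap16_shift_or (uint16_t x)
{
  return (uint16_t) ((x >> 8) | (x << 8));
}

/* Scalar loop of the kind the auto-vectorizer would see.  */
void
bswap16_array (uint16_t *dst, const uint16_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = bswap16_shift_or (src[i]);
}

/* Exhaustive self-check against the compiler builtin.  */
void
check_bswap16 (void)
{
  for (uint32_t v = 0; v <= 0xFFFFu; v++)
    assert (bswap16_shift_or ((uint16_t) v)
            == __builtin_bswap16 ((uint16_t) v));
}
```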
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-v.cc (shuffle_bswap_pattern): New func impl
      	for shuffle bswap.
      	(expand_vec_perm_const_1): Add handling for shuffle bswap pattern.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Adjust checker.
      	* gcc.target/riscv/rvv/autovec/unop/bswap16-0.c: New test.
      	* gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c: New test.
      	* gcc.target/riscv/rvv/autovec/vls/bswap16-0.c: New test.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
    • RISC-V Regression test: Fix FAIL of pr45752.c for RVV · 1543f3e3
      Juzhe-Zhong authored
      RVV uses load_lanes with stride = 5 to vectorize this case with -fno-vect-cost-model
      instead of SLP.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/vect/pr45752.c: Adapt dump check for target supports load_lanes with stride = 5.
    • testsuite: Fix vect_cond_arith_* dump checks for RVV. · 3f99b709
      Robin Dapp authored
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/vect/vect-cond-arith-2.c: Also match COND_LEN.
      	* gcc.dg/vect/vect-cond-arith-4.c: Ditto.
      	* gcc.dg/vect/vect-cond-arith-5.c: Ditto.
      	* gcc.dg/vect/vect-cond-arith-6.c: Ditto.
    • RISC-V Regression test: Fix FAIL of fast-math-slp-38.c for RVV · 784deda0
      Juzhe-Zhong authored
      Reference: https://godbolt.org/z/G9jzf5Grh
      
      RVV is able to vectorize this case using SLP. However, with -fno-vect-cost-model,
      RVV vectorizes it by vec_load_lanes with stride 6.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/vect/fast-math-slp-38.c: Add ! vect_strided6.
    • i386: Implement doubleword right shifts by 1 bit using s[ha]r+rcr. · 34d4168e
      Roger Sayle authored
      This patch tweaks the i386 back-end's ix86_split_ashr and ix86_split_lshr
      functions to implement doubleword right shifts by 1 bit, using a shift
      of the highpart that sets the carry flag followed by a rotate-carry-right
      (RCR) instruction on the lowpart.
      
      Conceptually this is similar to the recent left shift patch, but with two
      complicating factors.  The first is that although the RCR sequence is
      shorter, and is a ~3x performance improvement on AMD, my microbenchmarking
      shows it ~10% slower on Intel.  Hence this patch also introduces a new
      X86_TUNE_USE_RCR tuning parameter.  The second is that I believe this is
      the first time a "rotate-right-through-carry" and a right shift that sets
      the carry flag from the least significant bit has been modelled in GCC RTL
      (on a MODE_CC target).  For this I've used the i386 back-end's UNSPEC_CC_NE
      which seems appropriate.  Finally rcrsi2 and rcrdi2 are separate
      define_insns so that we can use their generator functions.
      
      For the pair of functions:
      unsigned __int128 foo(unsigned __int128 x) { return x >> 1; }
      __int128 bar(__int128 x) { return x >> 1; }
      
      with -O2 -march=znver4 we previously generated:
      
      foo:	movq    %rdi, %rax
              movq    %rsi, %rdx
              shrdq   $1, %rsi, %rax
              shrq    %rdx
              ret
      bar:	movq    %rdi, %rax
              movq    %rsi, %rdx
              shrdq   $1, %rsi, %rax
              sarq    %rdx
              ret
      
      with this patch we now generate:
      
      foo:	movq    %rsi, %rdx
              movq    %rdi, %rax
              shrq    %rdx
              rcrq    %rax
              ret
      bar:	movq    %rsi, %rdx
              movq    %rdi, %rax
              sarq    %rdx
              rcrq    %rax
              ret
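
The new sequence can be modelled in portable C as a rough sketch. The type `u128_parts` and the functions `lshr1_via_rcr`, `ashr1_via_rcr` and `rcr_selfcheck` are hypothetical names used only for illustration and are not part of the patch. The sketch assumes the doubleword is held as two 64-bit halves, with the high half shifted first so that its outgoing bit plays the role of the carry flag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a 128-bit value held as two 64-bit halves,
   mirroring the %rdx (high) and %rax (low) registers in the listings. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_parts;

/* Logical shift right by one bit, modelling "shrq %rdx; rcrq %rax". */
static u128_parts lshr1_via_rcr(u128_parts x)
{
    /* shrq sets the carry flag to the bit shifted out of the high half. */
    uint64_t carry = x.hi & 1u;
    u128_parts r;
    r.hi = x.hi >> 1;
    /* rcrq rotates the carry flag into the top bit of the low half. */
    r.lo = (x.lo >> 1) | (carry << 63);
    return r;
}

/* Arithmetic shift right by one bit, modelling "sarq %rdx; rcrq %rax". */
static u128_parts ashr1_via_rcr(u128_parts x)
{
    uint64_t carry = x.hi & 1u;
    /* sarq keeps the sign bit; replicate it without relying on
       implementation-defined signed shifts. */
    uint64_t sign = x.hi & UINT64_C(0x8000000000000000);
    u128_parts r;
    r.hi = (x.hi >> 1) | sign;
    r.lo = (x.lo >> 1) | (carry << 63);
    return r;
}

/* Small self-check against hand-computed values. */
static void rcr_selfcheck(void)
{
    u128_parts a = lshr1_via_rcr((u128_parts){ .lo = 0, .hi = 1 });
    assert(a.hi == 0 && a.lo == UINT64_C(0x8000000000000000));

    u128_parts b = ashr1_via_rcr((u128_parts){ .lo = 0, .hi = UINT64_MAX });
    assert(b.hi == UINT64_MAX && b.lo == UINT64_C(0x8000000000000000));
}
```

Calling `rcr_selfcheck()` from a test driver exercises both variants. The only difference between them is how the high half is shifted, which matches the shrq/sarq difference between foo and bar above.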
      
      2023-10-09  Roger Sayle  <roger@nextmovesoftware.com>
      
      gcc/ChangeLog
      	* config/i386/i386-expand.cc (ix86_split_ashr): Split shifts by
      	one into ashr[sd]i3_carry followed by rcr[sd]i2, if TARGET_USE_RCR
      	or -Oz.
      	(ix86_split_lshr): Likewise, split shifts by one bit into
      	lshr[sd]i3_carry followed by rcr[sd]i2, if TARGET_USE_RCR or -Oz.
      	* config/i386/i386.h (TARGET_USE_RCR): New backend macro.
      	* config/i386/i386.md (rcrsi2): New define_insn for rcrl.
      	(rcrdi2): New define_insn for rcrq.
      	(<anyshiftrt><mode>3_carry): New define_insn for right shifts that
      	set the carry flag from the least significant bit, modelled using
      	UNSPEC_CC_NE.
      	* config/i386/x86-tune.def (X86_TUNE_USE_RCR): New tuning parameter
      	controlling use of rcr 1 vs. shrd, which is significantly faster on
      	AMD processors.
      
      gcc/testsuite/ChangeLog
      	* gcc.target/i386/rcr-1.c: New 64-bit test case.
      	* gcc.target/i386/rcr-2.c: New 32-bit test case.
      34d4168e
    • Haochen Jiang's avatar
      Allow -mno-evex512 usage · 85bd47bf
      Haochen Jiang authored
      gcc/ChangeLog:
      
      	* config/i386/i386.opt: Allow -mno-evex512.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/noevex512-1.c: New test.
      	* gcc.target/i386/noevex512-2.c: Ditto.
      	* gcc.target/i386/noevex512-3.c: Ditto.
      85bd47bf
    • Haochen Jiang's avatar
      Support -mevex512 for AVX512FP16 intrins · 43b08ab2
      Haochen Jiang authored
      
      gcc/ChangeLog:
      
      	* config/i386/sse.md (V48H_AVX512VL): Add TARGET_EVEX512.
      	(VFH): Ditto.
      	(VF2H): Ditto.
      	(VFH_AVX512VL): Ditto.
      	(VHFBF): Ditto.
      	(VHF_AVX512VL): Ditto.
      	(VI2H_AVX512VL): Ditto.
      	(VI2F_256_512): Ditto.
      	(VF48_I1248): Remove unused iterator.
      	(VF48H_AVX512VL): Add TARGET_EVEX512.
      	(VF_AVX512): Remove unused iterator.
      	(REDUC_PLUS_MODE): Add TARGET_EVEX512.
      	(REDUC_SMINMAX_MODE): Ditto.
      	(FMAMODEM): Ditto.
      	(VFH_SF_AVX512VL): Ditto.
      	(VEC_PERM_AVX2): Ditto.
      
      Co-authored-by: Hu, Lin1 <lin1.hu@intel.com>
      43b08ab2
    • Haochen Jiang's avatar
      Support -mevex512 for... · b5490055
      Haochen Jiang authored
      Support -mevex512 for AVX512{IFMA,VBMI,VNNI,BF16,VPOPCNTDQ,VBMI2,BITALG,VP2INTERSECT},VAES,GFNI,VPCLMULQDQ intrins
      
      gcc/ChangeLog:
      
      	* config/i386/sse.md (VI1_AVX512VL): Add TARGET_EVEX512.
      	(VI8_FVL): Ditto.
      	(VI1_AVX512F): Ditto.
      	(VI1_AVX512VNNI): Ditto.
      	(VI1_AVX512VL_F): Ditto.
      	(VI12_VI48F_AVX512VL): Ditto.
      	(*avx512f_permvar_truncv32hiv32qi_1): Ditto.
      	(sdot_prod<mode>): Ditto.
      	(VEC_PERM_AVX2): Ditto.
      	(VPERMI2): Ditto.
      	(VPERMI2I): Ditto.
      	(vpmadd52<vpmadd52type>v8di): Ditto.
      	(usdot_prod<mode>): Ditto.
      	(vpdpbusd_v16si): Ditto.
      	(vpdpbusds_v16si): Ditto.
      	(vpdpwssd_v16si): Ditto.
      	(vpdpwssds_v16si): Ditto.
      	(VI48_AVX512VP2VL): Ditto.
      	(avx512vp2intersect_2intersectv16si): Ditto.
      	(VF_AVX512BF16VL): Ditto.
      	(VF1_AVX512_256): Ditto.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/pr90096.c: Adjust error message.
      
      Co-authored-by: Hu, Lin1 <lin1.hu@intel.com>
      b5490055
    • Haochen Jiang's avatar
      Support -mevex512 for AVX512BW intrins · 8e79b1b4
      Haochen Jiang authored
      gcc/ChangeLog:
      
      	* config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate):
      	Make sure there is EVEX512 enabled.
      	(ix86_expand_vecop_qihi2): Refuse V32QI->V32HI when no EVEX512.
      	* config/i386/i386.cc (ix86_hard_regno_mode_ok): Disable 64 bit mask
      	when !TARGET_EVEX512.
      	* config/i386/i386.md (avx512bw_512): New.
      	(SWI1248_AVX512BWDQ_64): Add TARGET_EVEX512.
      	(*zero_extendsidi2): Change isa to avx512bw_512.
      	(kmov_isa): Ditto.
      	(*anddi_1): Ditto.
      	(*andn<mode>_1): Change isa to kmov_isa.
      	(*<code><mode>_1): Ditto.
      	(*notxor<mode>_1): Ditto.
      	(*one_cmpl<mode>2_1): Ditto.
      	(*one_cmplsi2_1_zext): Change isa to avx512bw_512.
      	(*ashl<mode>3_1): Change isa to kmov_isa.
      	(*lshr<mode>3_1): Ditto.
      	* config/i386/sse.md (VI12HFBF_AVX512VL): Add TARGET_EVEX512.
      	(VI1248_AVX512VLBW): Ditto.
      	(VHFBF_AVX512VL): Ditto.
      	(VI): Ditto.
      	(VIHFBF): Ditto.
      	(VI_AVX2): Ditto.
      	(VI1_AVX512): Ditto.
      	(VI12_256_512_AVX512VL): Ditto.
      	(VI2_AVX2_AVX512BW): Ditto.
      	(VI2_AVX512VNNIBW): Ditto.
      	(VI2_AVX512VL): Ditto.
      	(VI2HFBF_AVX512VL): Ditto.
      	(VI8_AVX2_AVX512BW): Ditto.
      	(VIMAX_AVX2_AVX512BW): Ditto.
      	(VIMAX_AVX512VL): Ditto.
      	(VI12_AVX2_AVX512BW): Ditto.
      	(VI124_AVX2_24_AVX512F_1_AVX512BW): Ditto.
      	(VI248_AVX512VL): Ditto.
      	(VI248_AVX512VLBW): Ditto.
      	(VI248_AVX2_8_AVX512F_24_AVX512BW): Ditto.
      	(VI248_AVX512BW): Ditto.
      	(VI248_AVX512BW_AVX512VL): Ditto.
      	(VI248_512): Ditto.
      	(VI124_256_AVX512F_AVX512BW): Ditto.
      	(VI_AVX512BW): Ditto.
      	(VIHFBF_AVX512BW): Ditto.
      	(SWI1248_AVX512BWDQ): Ditto.
      	(SWI1248_AVX512BW): Ditto.
      	(SWI1248_AVX512BWDQ2): Ditto.
      	(*knotsi_1_zext): Ditto.
      	(define_split for zero_extend + not): Ditto.
      	(kunpckdi): Ditto.
      	(REDUC_SMINMAX_MODE): Ditto.
      	(VEC_EXTRACT_MODE): Ditto.
      	(*avx512bw_permvar_truncv16siv16hi_1): Ditto.
      	(*avx512bw_permvar_truncv16siv16hi_1_hf): Ditto.
      	(truncv32hiv32qi2): Ditto.
      	(avx512bw_<code>v32hiv32qi2): Ditto.
      	(avx512bw_<code>v32hiv32qi2_mask): Ditto.
      	(avx512bw_<code>v32hiv32qi2_mask_store): Ditto.
      	(usadv64qi): Ditto.
      	(VEC_PERM_AVX2): Ditto.
      	(AVX512ZEXTMASK): Ditto.
      	(SWI24_MASK): New.
      	(vec_pack_trunc_<mode>): Change iterator to SWI24_MASK.
      	(avx512bw_packsswb<mask_name>): Add TARGET_EVEX512.
      	(avx512bw_packssdw<mask_name>): Ditto.
      	(avx512bw_interleave_highv64qi<mask_name>): Ditto.
      	(avx512bw_interleave_lowv64qi<mask_name>): Ditto.
      	(<mask_codefor>avx512bw_pshuflwv32hi<mask_name>): Ditto.
      	(<mask_codefor>avx512bw_pshufhwv32hi<mask_name>): Ditto.
      	(vec_unpacks_lo_di): Ditto.
      	(SWI48x_MASK): New.
      	(vec_unpacks_hi_<mode>): Change iterator to SWI48x_MASK.
      	(avx512bw_umulhrswv32hi3<mask_name>): Add TARGET_EVEX512.
      	(VI1248_AVX512VL_AVX512BW): Ditto.
      	(avx512bw_<code>v32qiv32hi2<mask_name>): Ditto.
      	(*avx512bw_zero_extendv32qiv32hi2_1): Ditto.
      	(*avx512bw_zero_extendv32qiv32hi2_2): Ditto.
      	(<insn>v32qiv32hi2): Ditto.
      	(pbroadcast_evex_isa): Change isa attribute to avx512bw_512.
      	(VPERMI2): Add TARGET_EVEX512.
      	(VPERMI2I): Ditto.
      8e79b1b4
    • Haochen Jiang's avatar
      Support -mevex512 for AVX512DQ intrins · 1b248907
      Haochen Jiang authored
      gcc/ChangeLog:
      
      	* config/i386/i386-expand.cc (ix86_expand_sse2_mulvxdi3):
      	Add TARGET_EVEX512 for 512 bit usage.
      	* config/i386/i386.cc (standard_sse_constant_opcode): Ditto.
      	* config/i386/sse.md (VF1_VF2_AVX512DQ): Ditto.
      	(VF1_128_256VL): Ditto.
      	(VF2_AVX512VL): Ditto.
      	(VI8_256_512): Ditto.
      	(<mask_codefor>fixuns_trunc<mode><sseintvecmodelower>2<mask_name>):
      	Ditto.
      	(AVX512_VEC): Ditto.
      	(AVX512_VEC_2): Ditto.
      	(VI4F_BRCST32x2): Ditto.
      	(VI8F_BRCST64x2): Ditto.
      1b248907