  1. Jul 05, 2024
    • RISC-V: Use tu policy for first-element vec_set [PR115725]. · acc3b703
      Robin Dapp authored
      This patch changes the tail policy for vmv.s.x from ta to tu.
      By default the bug does not show up with qemu because qemu's
      current vmv.s.x implementation always uses the tail-undisturbed
      policy.  With a local qemu version that overwrites the tail
      with ones when the tail-agnostic policy is specified, the bug
      shows.
      
      gcc/ChangeLog:
      
      	* config/riscv/autovec.md: Add TU policy.
      	* config/riscv/riscv-protos.h (enum insn_type): Define
      	SCALAR_MOVE_MERGED_OP_TU.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/115725
      
      	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: Adjust
      	test expectation.
      	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: Ditto.
      	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: Ditto.
      	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: Ditto.
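      The effect of the two tail policies can be modelled in scalar code.
      The following is only a sketch: `vmv_s_x` and `TailPolicy` are
      hypothetical names, not GCC or qemu code, and the agnostic branch
      picks the "overwrite the tail with ones" behaviour that the local
      qemu build described above uses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class TailPolicy { Undisturbed, Agnostic };

// Hypothetical scalar model of vmv.s.x: write x into element 0 of vd.
// Under the agnostic policy this model overwrites every tail element with
// all-ones; under the undisturbed policy the tail keeps its old contents.
template <std::size_t N>
std::array<std::int32_t, N> vmv_s_x(std::array<std::int32_t, N> vd,
                                    std::int32_t x, TailPolicy policy)
{
  vd[0] = x;
  if (policy == TailPolicy::Agnostic)
    for (std::size_t i = 1; i < N; ++i)
      vd[i] = -1;
  return vd;
}
```

      A first-element vec_set must leave elements 1..N-1 intact, which in
      this model only the undisturbed policy guarantees.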
    • AVR: target/87376 - Use nop_general_operand for DImode inputs. · 23a09352
      Georg-Johann Lay authored
      The avr-dimode.md expanders have code like  emit_move_insn(acc_a, operands[1])
      where acc_a is a hard register and operands[1] might be a non-generic
      address-space memory reference.  Such loads may clobber hard regs since
      some of them are implemented as libgcc calls /and/ 64-bit moves are
      expanded as eight byte-moves, so that acc_a or acc_b might be clobbered
      by such a load.
      
      This patch simply denies non-generic address-space references by using
      nop_general_operand for all avr-dimode.md input predicates.
      With the patch, all memory loads that require library calls are issued
      before the expander codes from avr-dimode.md are run.
      
      	PR target/87376
      gcc/
      	* config/avr/avr-dimode.md: Use "nop_general_operand" instead
      	of "general_operand" as predicate for all input operands.
      
      gcc/testsuite/
      	* gcc.target/avr/torture/pr87376.c: New test.
    • libstdc++: Add dg-error for new -Wdelete-incomplete diagnostics [PR115747] · f63896ff
      Jonathan Wakely authored
      Since r15-1794-gbeb7a418aaef2e the -Wdelete-incomplete diagnostic is a
      permerror instead of a (suppressed in system headers) warning. Add
      dg-error directives.
      
      libstdc++-v3/ChangeLog:
      
      	PR c++/115747
      	* testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc:
      	Add dg-error for new C++26 diagnostics.
    • libstdc++: Use RAII in <bits/stl_uninitialized.h> · 6025256d
      Jonathan Wakely authored
      This adds an _UninitDestroyGuard class template, similar to
      ranges::_DestroyGuard used in <bits/ranges_uninitialized.h>. This allows
      us to remove all the try-catch blocks and rethrows, because any required
      cleanup gets done in the guard destructor.
      
      libstdc++-v3/ChangeLog:
      
      	* include/bits/stl_uninitialized.h (_UninitDestroyGuard): New
      	class template and partial specialization.
      	(__do_uninit_copy, __do_uninit_fill, __do_uninit_fill_n)
      	(__uninitialized_copy_a, __uninitialized_fill_a)
      	(__uninitialized_fill_n_a, __uninitialized_copy_move)
      	(__uninitialized_move_copy, __uninitialized_fill_move)
      	(__uninitialized_move_fill, __uninitialized_default_1)
      	(__uninitialized_default_n_a, __uninitialized_default_novalue_1)
      	(__uninitialized_default_novalue_n_1, __uninitialized_copy_n)
      	(__uninitialized_copy_n_pair): Use it.
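      The guard idea can be sketched as follows.  This is an illustration
      only: `DestroyGuard`, `uninit_copy`, `Tracked` and `live_count` are
      hypothetical names, not the libstdc++ `_UninitDestroyGuard`
      implementation.  The guard watches the caller's cursor and destroys
      everything constructed so far unless it is released.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>

// Hypothetical guard: destroys [first, *cur) on scope exit unless released.
template <typename T>
struct DestroyGuard
{
  T* first;
  T** cur;  // address of the caller's cursor; null once released

  DestroyGuard(T* f, T*& c) : first(f), cur(&c) {}
  ~DestroyGuard()
  {
    if (cur)
      for (T* p = first; p != *cur; ++p)
        p->~T();
  }
  void release() { cur = nullptr; }

  DestroyGuard(const DestroyGuard&) = delete;
  DestroyGuard& operator=(const DestroyGuard&) = delete;
};

// Copy [first, last) into raw storage at out, with no try/catch block.
template <typename In, typename T>
T* uninit_copy(In first, In last, T* out)
{
  T* cur = out;
  DestroyGuard<T> guard(out, cur);
  for (; first != last; ++first, ++cur)
    ::new (static_cast<void*>(cur)) T(*first);
  guard.release();  // success: the caller now owns the objects
  return cur;
}

// Demonstration type: counts live objects, throws when copying the value 3.
int live_count = 0;
struct Tracked
{
  int value;
  explicit Tracked(int v) : value(v) { ++live_count; }
  Tracked(const Tracked& o) : value(o.value)
  {
    if (value == 3)
      throw std::runtime_error("copy failed");
    ++live_count;
  }
  ~Tracked() { --live_count; }
};
```

      If a copy throws part-way through, the guard's destructor runs during
      unwinding and destroys the elements already constructed, so no
      rethrow is needed.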
    • libstdc++: Use memchr to optimize std::find [PR88545] · de19b516
      Jonathan Wakely authored
      This optimizes std::find to use memchr when searching for an integer in
      a range of bytes.
      
      libstdc++-v3/ChangeLog:
      
      	PR libstdc++/88545
      	PR libstdc++/115040
      	* include/bits/cpp_type_traits.h (__can_use_memchr_for_find):
      	New variable template.
      	* include/bits/ranges_util.h (__find_fn): Use memchr when
      	possible.
      	* include/bits/stl_algo.h (find): Likewise.
      	* testsuite/25_algorithms/find/bytes.cc: New test.
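      A minimal sketch of the technique, assuming a range of unsigned
      bytes and an int value to search for.  `find_byte` is a hypothetical
      helper, not the libstdc++ code.  Because memchr converts its int
      argument to unsigned char, a value that does not fit in a byte has
      to be rejected up front.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: find value in [first, last) using memchr.
// Returns last when the value is absent or cannot be a byte at all.
const unsigned char* find_byte(const unsigned char* first,
                               const unsigned char* last, int value)
{
  if (value < 0 || value > 255)
    return last;  // would otherwise match after truncation to a byte
  if (first == last)
    return last;
  const void* hit
    = std::memchr(first, value, static_cast<std::size_t>(last - first));
  return hit ? static_cast<const unsigned char*>(hit) : last;
}
```

      Without the range check, searching {1, 2, 3, 4} for 259 would report
      a match at the element 3, since 259 truncates to 3.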
    • AArch64: lower 2 reg TBL permutes with one zero register to 1 reg TBL. · 97fcfeac
      Tamar Christina authored
      When a two reg TBL is performed with one operand being a zero vector we can
      instead use a single reg TBL and map the indices for accessing the zero vector
      to an out of range constant.
      
      On AArch64 out of range indices into a TBL have the defined semantics of setting
      the element to zero.  Many uArches have a slower 2-reg TBL than 1-reg TBL.
      
      Before this change we had:
      
      typedef unsigned int v4si __attribute__ ((vector_size (16)));
      
      v4si f1 (v4si a)
      {
        v4si zeros = {0,0,0,0};
        return __builtin_shufflevector (a, zeros, 0, 5, 1, 6);
      }
      
      which generates:
      
      f1:
              mov     v30.16b, v0.16b
              movi    v31.4s, 0
              adrp    x0, .LC0
              ldr     q0, [x0, #:lo12:.LC0]
              tbl     v0.16b, {v30.16b - v31.16b}, v0.16b
              ret
      
      .LC0:
              .byte   0
              .byte   1
              .byte   2
              .byte   3
              .byte   20
              .byte   21
              .byte   22
              .byte   23
              .byte   4
              .byte   5
              .byte   6
              .byte   7
              .byte   24
              .byte   25
              .byte   26
              .byte   27
      
      and with the patch:
      
      f1:
              adrp    x0, .LC0
              ldr     q31, [x0, #:lo12:.LC0]
              tbl     v0.16b, {v0.16b}, v31.16b
              ret
      
      .LC0:
              .byte   0
              .byte   1
              .byte   2
              .byte   3
              .byte   -1
              .byte   -1
              .byte   -1
              .byte   -1
              .byte   4
              .byte   5
              .byte   6
              .byte   7
              .byte   -1
              .byte   -1
              .byte   -1
              .byte   -1
      
      This sequence is often generated by OpenMP and, aside from the
      strict performance impact of this change, it also gives better
      register allocation as we no longer have the consecutive
      register limitation.
      
      gcc/ChangeLog:
      
      	* config/aarch64/aarch64.cc (struct expand_vec_perm_d): Add zero_op0_p
      	and zero_op1_p.
      	(aarch64_evpc_tbl): Implement register value remapping.
      	(aarch64_vectorize_vec_perm_const): Detect if operand is a zero dup
      	before it's forced to a reg.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/aarch64/tbl_with_zero_1.c: New test.
      	* gcc.target/aarch64/tbl_with_zero_2.c: New test.
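      The index remapping can be checked with a scalar model of TBL.  This
      is a sketch under stated assumptions: `tbl1`, `tbl2` and
      `remap_zero_hi` are hypothetical helpers that model 16-byte table
      lookups, not the code in aarch64_evpc_tbl.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes = std::array<std::uint8_t, 16>;

// Scalar model of a one-register TBL: out-of-range indices yield zero.
Bytes tbl1(const Bytes& table, const Bytes& idx)
{
  Bytes out{};
  for (std::size_t i = 0; i < 16; ++i)
    out[i] = idx[i] < 16 ? table[idx[i]] : 0;
  return out;
}

// Scalar model of a two-register TBL over the concatenation {lo, hi}.
Bytes tbl2(const Bytes& lo, const Bytes& hi, const Bytes& idx)
{
  Bytes out{};
  for (std::size_t i = 0; i < 16; ++i)
    out[i] = idx[i] < 16 ? lo[idx[i]] : idx[i] < 32 ? hi[idx[i] - 16] : 0;
  return out;
}

// When the second table register is all zeros, every index selecting from
// it can become 0xff (the ".byte -1" entries in the listing above).
Bytes remap_zero_hi(Bytes idx)
{
  for (auto& b : idx)
    if (b >= 16)
      b = 0xff;
  return idx;
}
```

      With the .LC0 indices from the "before" listing and a zero second
      operand, the two-register lookup and the remapped one-register
      lookup give the same bytes.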
    • AArch64: remove aarch64_simd_vec_unpack<su>_lo_ · 6ff69810
      Tamar Christina authored
      The fix for PR18127 reworked the uxtl to zip optimization.
      In doing so it undid the changes in aarch64_simd_vec_unpack<su>_lo_, which now
      no longer matches aarch64_simd_vec_unpack<su>_hi_.  It still works because the
      RTL generated by aarch64_simd_vec_unpack<su>_lo_ overlaps with the general zero
      extend RTL and, because that one is listed before the lo pattern, recog picks
      it instead.
      
      This removes aarch64_simd_vec_unpack<su>_lo_.
      
      gcc/ChangeLog:
      
      	* config/aarch64/aarch64-simd.md
      	(aarch64_simd_vec_unpack<su>_lo_<mode>): Remove.
      	(vec_unpack<su>_lo_<mode>): Simplify.
      	* config/aarch64/aarch64.cc (aarch64_gen_shareable_zero): Update
      	comment.
    • middle-end: Add debug functions to dump dominator tree in dot format · ae07f62a
      Alex Coplan authored
      This adds debug functions to dump the dominator tree in dot format.
      There are two overloads: one which takes a FILE * and another which
      takes a const char *fname and wraps the first with fopen/fclose for
      convenience.
      
      gcc/ChangeLog:
      
      	* dominance.cc (dot_dominance_tree): New.
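      The two-overload shape described above could look roughly like this.
      It is a sketch, not the dominance.cc code: the immediate-dominator
      vector, `dom_to_dot` and `dump_dom_dot` are hypothetical stand-ins
      for GCC's internal data structures and function names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Build dot text from an immediate-dominator array: idom[i] is the
// immediate dominator of block i, and the entry block is its own idom.
std::string dom_to_dot(const std::vector<int>& idom)
{
  std::string out = "digraph {\n";
  for (std::size_t bb = 0; bb < idom.size(); ++bb)
    if (idom[bb] != static_cast<int>(bb))
      out += "  " + std::to_string(idom[bb]) + " -> "
             + std::to_string(bb) + ";\n";
  out += "}\n";
  return out;
}

// Overload taking an open stream.
void dump_dom_dot(std::FILE* f, const std::vector<int>& idom)
{
  std::fputs(dom_to_dot(idom).c_str(), f);
}

// Convenience overload: wraps the first with fopen/fclose.
// Returns false if the file could not be opened.
bool dump_dom_dot(const char* fname, const std::vector<int>& idom)
{
  std::FILE* f = std::fopen(fname, "w");
  if (!f)
    return false;
  dump_dom_dot(f, idom);
  std::fclose(f);
  return true;
}
```

      The resulting file can be rendered with Graphviz, for example with
      `dot -Tpng`.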
    • i386: Refactor ssedoublemode · 319d3956
      Hu, Lin1 authored
      ssedoublemode's double should mean double type, like SI -> DI.
      And we need to refactor some patterns with <ssedoublemode> instead of
      <ssedoublevecmode>.
      
      gcc/ChangeLog:
      
      	* config/i386/sse.md (ssedoublemode): Remove mappings to twice
      	the number of same-sized elements. Add mappings to the same
      	number of double-sized elements.
      	(define_split for vec_concat_minus_plus): Change mode_attr from
      	ssedoublemode to ssedoublevecmode.
      	(define_split for vec_concat_plus_minus): Ditto.
      	(<mask_codefor>avx512dq_shuf_<shuffletype>64x2_1<mask_name>):
      	Ditto.
      	(avx512f_shuf_<shuffletype>64x2_1<mask_name>): Ditto.
      	(avx512vl_shuf_<shuffletype>32x4_1<mask_name>): Ditto.
      	(avx512f_shuf_<shuffletype>32x4_1<mask_name>): Ditto.
    • MIPS: Support more cases with alien mode of SHF.DF · 320c2ed4
      YunQiang Su authored
      Currently, we only support the cases that strictly fit the instructions.
      For example, for V16QImode, we only support shuffles like
      (0<=N0, N1, N2, N3<=3 here)
      	N0,	N1,	N2,	N3
      	N0+4,	N1+4,	N2+4,	N3+4
      	N0+8,	N1+8,	N2+8,	N3+8
      	N0+12,	N1+12,	N2+12,	N3+12
      
      In fact we can support more cases by trying other SHF.DF
      instructions that do not strictly fit the mode.
      
      1) We can use SHF.H to support more cases for V16QImode:
      (M0/M1/M2/M3 are 0 or 2 or 4 or 6)
      	M0,	M0+1,	M1,	M1+1
      	M2,	M2+1,	M3,	M3+1
      	M0+8,	M0+9,	M1+8,	M1+9
      	M2+8,	M2+9,	M3+8,	M3+9
      
      2) We can use SHF.W to support some cases for V16QImode:
      (M0/M1/M2/M3 are 0 or 4 or 8 or 12)
      	M0,	M0+1,	M0+2,	M0+3
      	M1,	M1+1,	M1+2,	M1+3
      	M2,	M2+1,	M2+2,	M2+3
      	M3,	M3+1,	M3+2,	M3+3
      
      3) We can use SHF.W to support some cases for V8HImode:
      (M0/M1/M2/M3 are 0 or 2 or 4 or 6)
      	M0,	M0+1
      	M1,	M1+1
      	M2,	M2+1
      	M3,	M3+1
      
      4) We can also use SHF.W to swap the 2 parts of V2DF or V2DI.
      
      gcc
      	* config/mips/mips-protos.h: New function mips_msa_shf_i8.
      	* config/mips/mips-msa.md(MSA_WHB_W): Not used anymore;
      	(msa_shf_<msafmt_f>): Use mips_msa_shf_i8.
      	* config/mips/mips.cc(mips_const_vector_shuffle_set_p):
      	Support more cases by trying to use alien mode instructions;
      	(mips_msa_shf_i8): New function to get the correct MSA SHF
      	instruction and IMM.
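      Case 1 above (SHF.H for V16QImode) can be sketched as a selector
      check.  `shf_h_imm_for_v16qi` is a hypothetical helper, not
      mips_msa_shf_i8, and it assumes the SHF immediate holds one 2-bit
      source index per destination element within each group of four
      elements.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical checker: return the SHF.H immediate if the byte selector
// moves whole halfwords within each group of four halfwords, else -1.
int shf_h_imm_for_v16qi(const std::array<int, 16>& sel)
{
  int imm = 0;
  for (std::size_t h = 0; h < 4; ++h)  // halfword slot inside a group
    {
      int m = sel[2 * h];
      if (m < 0 || m > 6 || (m & 1))   // M must be 0, 2, 4 or 6
        return -1;
      imm |= (m / 2) << (2 * h);
    }
  for (std::size_t g = 0; g < 2; ++g)  // two groups of four halfwords
    for (std::size_t h = 0; h < 4; ++h)
      {
        int src_half = (imm >> (2 * h)) & 3;
        int base = static_cast<int>(8 * g) + 2 * src_half;
        if (sel[8 * g + 2 * h] != base || sel[8 * g + 2 * h + 1] != base + 1)
          return -1;
      }
  return imm;
}
```

      The first loop reads M0..M3 from the first group, and the second
      loop verifies that both groups follow the M, M+1 pattern listed in
      case 1.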
    • Testsuite/MIPS: Fix msa.c: test7_v2f64, test7_v4f32, test43_v2i64 · 33dfd679
      YunQiang Su authored
      BNEGI.W/D are used for test7_v2f64 and test7_v4f32 now.  It is
      an improvement since it saves an instruction.
      
      ILVR.D is used for test43_v2i64 now, instead of INSVE.D.
      
      gcc/testsuite
      	* gcc.target/mips/msa.c: Fix test7_v2f64, test7_v4f32 and
      	test43_v2i64.
    • MIPS/testsuite: Add -mfpxx to call-clobbered-1.c · e08ed5f1
      YunQiang Su authored
      The scan-assembler-times rules only fit for -mfp32 and -mfpxx.
      It fails if we are configured as FP64 by default, as it has
      one less sdc1/ldc1 pair.
      
      gcc/testsuite
      	* gcc.target/mips/call-clobbered-1.c: Add -mfpxx.
    • MIPS/testsuite: Fix umips-save-restore-1.c · f1437b96
      YunQiang Su authored
      With some recent optimizations, -O1/-O2/-O3 can achieve almost the same
      performance/size with stack load/store.  Thus lwm/swm will save/restore
      fewer callee-saved registers.  In fact only $16 is saved with swm.
      
      To be sure that this optimization does exist, let's add 2 more
      function calls, so that lwm/swm becomes much more profitable.
      
      If we add only one more, -O1 will still use stack load/store.
      
      gcc/testsuite
      	* gcc.target/mips/umips-save-restore-1.c: Be sure lwm/swm
      	are used for more callee-saved registers with 2 additional
      	function calls.
    • Support group size of three in SLP store permute lowering · 7eb8b657
      Richard Biener authored
      The following implements the group-size three scheme from
      vect_permute_store_chain in SLP grouped store permute lowering
      and extends it to power-of-two multiples of group size three.
      
      The scheme goes from vectors A, B and C to
      { A[0], B[0], C[0], A[1], B[1], C[1], ... } by first producing
      { A[0], B[0], X, A[1], B[1], X, ... } (with X arbitrary, but chosen
      to be A[n]) and then permuting in C[n] in the appropriate places.
      
      The extension replaces vector elements with a power-of-two number
      of lanes, which gives pairwise interleaving until the final
      three-input permutes happen.
      
      The last permute step could be seen as extending C to { C[0], C[0],
      C[0], ... } and then performing a blend.
      
      VLA archs will want to use store-lanes here I guess, I'm not sure
      if the three vector interleave operation is also available with
      a register source and destination and thus available for a shuffle.
      
      	* tree-vect-slp.cc (vect_build_slp_instance): Special case
      	three input permute with the same number of lanes in store
      	permute lowering.
      
      	* gcc.dg/vect/slp-53.c: New testcase.
      	* gcc.dg/vect/slp-54.c: New testcase.
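      The two-step scheme can be modelled on plain arrays.  This is a
      scalar sketch only: `interleave_ab` and `blend_c` are hypothetical
      names, and the real lowering emits vector permutes on fixed-width
      vectors rather than loops over the whole concatenated result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Step 1: interleave A and B, leaving every third slot as a placeholder X.
// X is arbitrary; A[i] is used here, as in the description above.
std::vector<int> interleave_ab(const std::vector<int>& a,
                               const std::vector<int>& b)
{
  assert(a.size() == b.size());
  std::vector<int> out(3 * a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    {
      out[3 * i] = a[i];
      out[3 * i + 1] = b[i];
      out[3 * i + 2] = a[i];  // placeholder, overwritten in step 2
    }
  return out;
}

// Step 2: blend C[i] into the placeholder slots.
std::vector<int> blend_c(std::vector<int> ab, const std::vector<int>& c)
{
  assert(ab.size() == 3 * c.size());
  for (std::size_t i = 0; i < c.size(); ++i)
    ab[3 * i + 2] = c[i];
  return ab;
}
```

      Applying step 2 to the output of step 1 gives
      { A[0], B[0], C[0], A[1], B[1], C[1], ... }.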
    • Daily bump. · 304b6464
      GCC Administrator authored
  2. Jul 04, 2024
    • analyzer: convert sm_context * to sm_context & · f8c130cd
      David Malcolm authored
      
      These are never nullptr and never change, so use a reference rather
      than a pointer.
      
      No functional change intended.
      
      gcc/analyzer/ChangeLog:
      	* diagnostic-manager.cc
      	(diagnostic_manager::add_events_for_eedge): Pass sm_ctxt by
      	reference.
      	* engine.cc (impl_region_model_context::on_condition): Likewise.
      	(impl_region_model_context::on_bounded_ranges): Likewise.
      	(impl_region_model_context::on_phi): Likewise.
      	(exploded_node::on_stmt): Likewise.
      	* sm-fd.cc: Update all uses of sm_context * to sm_context &.
      	* sm-file.cc: Likewise.
      	* sm-malloc.cc: Likewise.
      	* sm-pattern-test.cc: Likewise.
      	* sm-sensitive.cc: Likewise.
      	* sm-signal.cc: Likewise.
      	* sm-taint.cc: Likewise.
      	* sm.h: Likewise.
      	* varargs.cc: Likewise.
      
      gcc/testsuite/ChangeLog:
      	* gcc.dg/plugin/analyzer_gil_plugin.c: Update all uses of
      	sm_context * to sm_context &.
      
      Signed-off-by: David Malcolm <dmalcolm@redhat.com>
    • analyzer: handle <error.h> at -O0 [PR115724] · a6fdb1a2
      David Malcolm authored
      
      At -O0, glibc's:
      
      __extern_always_inline void
      error (int __status, int __errnum, const char *__format, ...)
      {
        if (__builtin_constant_p (__status) && __status != 0)
          __error_noreturn (__status, __errnum, __format, __builtin_va_arg_pack ());
        else
          __error_alias (__status, __errnum, __format, __builtin_va_arg_pack ());
      }
      
      becomes just:
      
      __extern_always_inline void
      error (int __status, int __errnum, const char *__format, ...)
      {
        if (0)
          __error_noreturn (__status, __errnum, __format, __builtin_va_arg_pack ());
        else
          __error_alias (__status, __errnum, __format, __builtin_va_arg_pack ());
      }
      
      and thus calls to "error" are calls to "__error_alias" by the
      time -fanalyzer "sees" them.
      
      Handle them with more special-casing in kf.cc.
      
      gcc/analyzer/ChangeLog:
      	PR analyzer/115724
      	* kf.cc (register_known_functions): Add __error_alias and
      	__error_at_line_alias.
      
      gcc/testsuite/ChangeLog:
      	PR analyzer/115724
      	* c-c++-common/analyzer/error-pr115724.c: New test.
      
      Signed-off-by: David Malcolm <dmalcolm@redhat.com>
    • [committed][RISC-V] Fix test expectations after recent late-combine changes · b611f396
      Jeff Law authored
      With the recent DCE related adjustment to late-combine the rvv/base/vcreate.c
      test no longer has those undesirable vmvNr statements.
      
      It's a bit unclear why this wasn't written as a scan-assembler-not and xfailed
      given the comment says we don't want to see vmvNr instructions.  I must have
      missed that during review.
      
      This patch adjusts the test to expect no vmvNr statements and if they're ever
      re-introduced, we'll get a nice unexpected failure.
      
      gcc/testsuite
      	* gcc.target/riscv/rvv/base/vcreate.c: Update expected output.
    • Skip 30_threads/future/members/poll.cc on hppa*-*-linux* · 46ffda9b
      John David Anglin authored
      hppa*-*-linux* lacks high resolution timer support. Timer resolution
      ranges from 1 to 10ms. As a result, a large number of iterations are
      needed for the wait_for_0 and ready loops. This causes the
      wait_until_sys_epoch and wait_until_steady_epoch loops to time out.
      There, the loop wait time is determined by the timer resolution.
      
      2024-07-04  John David Anglin  <danglin@gcc.gnu.org>
      
      libstdc++-v3/ChangeLog:
      	PR libstdc++/98678
      	* testsuite/30_threads/future/members/poll.cc: Skip on hppa*-*-linux*.
    • testsuite: Update test for PR115537 to use SVE. · adcfb4fb
      Tamar Christina authored
      The PR was about SVE codegen, but the testcase accidentally used neoverse-n1
      instead of neoverse-v1 as in the original report.
      
      This updates the tool options.
      
      gcc/testsuite/ChangeLog:
      
      	PR tree-optimization/115537
      	* gcc.dg/vect/pr115537.c: Update flag from neoverse-n1 to neoverse-v1.
    • c++ frontend: check for missing condition for novector [PR115623] · 84acbfbe
      Tamar Christina authored
      It looks like I forgot to check in the C++ frontend whether a condition exists
      for the loop being adorned with novector.  This causes a segfault because cond
      isn't expected to be null.
      
      This fixes it by ignoring the pragma when there's no loop condition,
      the same way we do in the C frontend.
      
      gcc/cp/ChangeLog:
      
      	PR c++/115623
      	* semantics.cc (finish_for_cond): Add check for C++ cond.
      
      gcc/testsuite/ChangeLog:
      
      	PR c++/115623
      	* g++.dg/vect/vect-novector-pragma_2.cc: New test.
    • arm: Use LDMIA/STMIA for thumb1 DI/DF loads/stores · 236d6fef
      Siarhei Volkau authored
      
      If the address register is dead after the load/store operation, it looks
      beneficial to use LDMIA/STMIA instead of a pair of LDR/STR instructions,
      at least if optimizing for size.
      
      gcc/ChangeLog:
      
      	* config/arm/arm.cc (thumb_load_double_from_address): Emit ldmia
      	when address reg rewritten by load.
      	* config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
      	(peephole2 to rewrite DI/DF store): New.
      	* config/arm/iterators.md (DIDF): New.
      
      gcc/testsuite:
      
      	* gcc.target/arm/thumb1-load-store-64bit.c: Add new test.
      
      Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
    • Aarch64, bugfix: Fix NEON bigendian addp intrinsic [PR114890] · 11049cdf
      Alfie Richards authored
      This change removes code that switches the operands in bigendian mode erroneously.
      This fixes the related test also.
      
      gcc/ChangeLog:
      
      	PR target/114890
      	* config/aarch64/aarch64-simd.md: Remove bigendian operand swap.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/114890
      	* gcc.target/aarch64/vector_intrinsics_asm.c: Remove xfail.
    • Aarch64: Add test for non-commutative SIMD intrinsic · 14c67938
      Alfie Richards authored
      This adds a test for non-commutative SIMD NEON intrinsics.
      Specifically addp is non-commutative and has a bug in the current big-endian implementation.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/aarch64/vector_intrinsics_asm.c: New test.
    • middle-end/115426 - wrong gimplification of "rm" asm output operand · a4bbdec2
      Richard Biener authored
      When the operand is gimplified to an extract of a register or a
      register we have to disallow memory as we otherwise fail to
      gimplify it properly.  Instead of
      
        __asm__("" : "=rm" __imag <r>);
      
      we want
      
        __asm__("" : "=rm" D.2772);
        _1 = REALPART_EXPR <r>;
        r = COMPLEX_EXPR <_1, D.2772>;
      
      otherwise SSA rewrite will fail and generate wrong code with 'r'
      left bare in the asm output.
      
      	PR middle-end/115426
      	* gimplify.cc (gimplify_asm_expr): Handle "rm" output
      	constraint gimplified to a register (operation).
      
      	* gcc.dg/pr115426.c: New testcase.
    • Use __builtin_cpu_supports instead of __get_cpuid_count. · 699087a1
      liuhongt authored
      gcc/testsuite/ChangeLog:
      
      	PR target/115748
      	* gcc.target/i386/avx512-check.h: Use __builtin_cpu_supports
      	instead of __get_cpuid_count.
    • i386: Add additional variant of bswaphisi2_lowpart peephole2. · 727f8b14
      Roger Sayle authored
      This patch adds an additional variation of the peephole2 used to convert
      bswaphisi2_lowpart into rotlhi3_1_slp, which converts xchgb %ah,%al into
      rotw if the flags register isn't live.  The motivating example is:
      
      void ext(int x);
      void foo(int x)
      {
        ext((x&~0xffff)|((x>>8)&0xff)|((x&0xff)<<8));
      }
      
      where GCC with -O2 currently produces:
      
      foo:	movl    %edi, %eax
              rolw    $8, %ax
              movl    %eax, %edi
              jmp     ext
      
      The issue is that the original xchgb (bswaphisi2_lowpart) can only be
      performed in "Q" registers that allow the %?h register to be used, so
      reload generates the above two movl.  However, it's later in peephole2
      where we see that CC_FLAGS can be clobbered, so we can use a rotate word,
      which is more forgiving with register allocations.  With the additional
      peephole2 proposed here, we now generate:
      
      foo:	rolw    $8, %di
              jmp     ext
      
      2024-07-04  Roger Sayle  <roger@nextmovesoftware.com>
      
      gcc/ChangeLog
      	* config/i386/i386.md (bswaphisi2_lowpart peephole2): New
      	peephole2 variant to eliminate register shuffling.
      
      gcc/testsuite/ChangeLog
      	* gcc.target/i386/xchg-4.c: New test case.
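      The equivalence that makes rolw usable here can be checked in plain
      code.  This is a sketch with hypothetical helper names, written on
      unsigned values to keep the shifts well defined: swapping the two
      low bytes is the same as rotating the low 16 bits by 8.

```cpp
#include <cassert>
#include <cstdint>

// The expression from the motivating example above.
std::uint32_t swap_low_bytes(std::uint32_t x)
{
  return (x & ~0xffffu) | ((x >> 8) & 0xff) | ((x & 0xff) << 8);
}

// The same result as a 16-bit rotate by 8 of the low half, which is what
// "rolw $8" computes; the upper 16 bits are left untouched.
std::uint32_t rotate_low_half(std::uint32_t x)
{
  std::uint16_t lo = static_cast<std::uint16_t>(x);
  lo = static_cast<std::uint16_t>((lo << 8) | (lo >> 8));
  return (x & ~0xffffu) | lo;
}
```

      Both functions map 0x12345678 to 0x12347856.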
    • [committed] Fix newlib build failure with rx as well as several dozen testsuite failures · 759f4abe
      Jeff Law authored
      The rx port has been failing to build newlib for a bit over a week.  I can't
      remember if it was the late-combine work or the IRA costing twiddle, regardless
      the real bug is in the rx backend.
      
      Basically dwarf2cfi is blowing up because of inconsistent state caused by the
      failure to mark a stack adjustment as frame related.  This instance in the
      epilogue looks like a simple goof.
      
      With the port building again, the testsuite would run and it showed a number of
      regressions, again related to CFI handling.  The common thread was a failure to
      mark a copy from FP to SP in the prologue as frame related.  The change which
      introduced this bug was supposed to just be changing promotions of vector types.
      It's unclear if Nick included the hunk accidentally or just goof'd on the
      logic.  Regardless it looks quite incorrect.
      
      Reverting that hunk fixes the regressions *and* fixes 94 pre-existing failures.
      
      The net is rx-elf is regression free and has moved forward in terms of its
      testsuite status.
      
      Pushing to the trunk momentarily.
      
      gcc/
      
      	* config/rx/rx.cc (rx_expand_prologue): Mark the copy from FP to SP
      	as frame related.
      	(rx_expand_epilogue): Mark the stack pointer adjustment as frame
      	related.
    • [APX PPX] Avoid generating unmatched pushp/popp in pro/epilogue · 8e72b1bb
      Hongyu Wang authored
      According to the APX spec, pushp/popp pairs should be matched,
      otherwise the PPX hint cannot take effect, which causes a performance loss.
      
      In ix86_expand_epilogue, there are several optimizations that may
      cause the epilogue to use mov to restore the regs. Check whether PPX was
      applied and prevent usage of mov/leave in the epilogue. Also do not use PPX
      for eh_return.
      
      gcc/ChangeLog:
      
      	* config/i386/i386.cc (ix86_expand_prologue): Set apx_ppx_used
      	flag in m.fs with TARGET_APX_PPX && !crtl->calls_eh_return.
      	(ix86_emit_save_regs): Emit ppx only when
      	TARGET_APX_PPX && !crtl->calls_eh_return.
      	(ix86_expand_epilogue): Don't restore reg using mov when
      	apx_ppx_used flag is true.
      	* config/i386/i386.h (struct machine_frame_state):
      	Add apx_ppx_used flag.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/apx-ppx-2.c: New test.
      	* gcc.target/i386/apx-ppx-3.c: Likewise.
    • c++: OVERLOAD in diagnostics · baac8f71
      Jason Merrill authored
      In modules we can get an OVERLOAD around a non-function, so let's tail
      recurse instead of falling through.  As a result we start printing the
      template header in this testcase.
      
      gcc/cp/ChangeLog:
      
      	* error.cc (dump_decl) [OVERLOAD]: Recurse on single case.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/warn/pr61945.C: Adjust diagnostic.
    • c++: CTAD and trait built-ins · 655fe94a
      Jason Merrill authored
      While poking at 101232 I noticed that we started trying to parse
      __is_invocable(_Fn, _Args...) as a functional cast to a CTAD placeholder
      type; we shouldn't consider CTAD for a template that shares a name (reserved
      for the implementation) with a built-in trait.
      
      gcc/cp/ChangeLog:
      
      	* pt.cc (ctad_template_p): Return false for trait names.
    • vect: Fix ICE caused by missing check for TREE_CODE == SSA_NAME · d1eeafe4
      Hu, Lin1 authored
      Need to check if the tree's code is SSA_NAME before SSA_NAME_RANGE_INFO.
      
      2024-07-03  Hu, Lin1 <lin1.hu@intel.com>
      	    Andrew Pinski <quic_apinski@quicinc.com>
      
      gcc/ChangeLog:
      
      	PR tree-optimization/115753
      	* tree-vect-stmts.cc (supportable_indirect_convert_operation): Add
      	TREE_CODE check before SSA_NAME_RANGE_INFO.
      
      gcc/testsuite/ChangeLog:
      
      	PR tree-optimization/115753
      	* gcc.dg/vect/pr115753-1.c: New test.
      	* gcc.dg/vect/pr115753-2.c: Ditto.
      	* gcc.dg/vect/pr115753-3.c: Ditto.
    • Daily bump. · 0720394a
      GCC Administrator authored
  3. Jul 03, 2024
    • [committed] Fix previously latent bug in reorg affecting cris port · e5f73853
      Jeff Law authored
      The late-combine patch has triggered a previously latent bug in reorg.
      
      Basically we have a sequence like this in the middle of reorg before we start
      relaxing delay slots (cris-elf, gcc.dg/torture/pr98289.c)
      
      > (insn 67 49 18 (sequence [
      >             (jump_insn 50 49 52 (set (pc)
      >                     (if_then_else (ne (reg:CC 19 ccr)
      >                             (const_int 0 [0]))
      >                         (label_ref:SI 30)
      >                         (pc))) "j.c":10:6 discrim 1 282 {*bnecc}
      >                  (expr_list:REG_DEAD (reg:CC 19 ccr)
      >                     (int_list:REG_BR_PROB 7 (nil)))
      >              -> 30)
      >             (insn/f 52 50 18 (set (mem:SI (reg/f:SI 14 sp) [1  S4 A8])
      >                     (reg:SI 16 srp)) 37 {*mov_tomemsi}
      >                  (nil))
      >         ]) "j.c":10:6 discrim 1 -1
      >      (nil))
      >
      > (note 18 67 54 [bb 3] NOTE_INSN_BASIC_BLOCK)
      >
      > (note 54 18 55 NOTE_INSN_EPILOGUE_BEG)
      >
      > (jump_insn 55 54 56 (return) "j.c":14:1 228 {*return_expanded}
      >      (nil)
      >  -> return)
      >
      > (barrier 56 55 43)
      >
      > (note 43 56 65 [bb 4] NOTE_INSN_BASIC_BLOCK)
      >
      > (note 65 43 30 NOTE_INSN_SWITCH_TEXT_SECTIONS)
      >
      > (code_label 30 65 8 5 6 (nil) [1 uses])
      >
      > (note 8 30 61 [bb 5] NOTE_INSN_BASIC_BLOCK)
      
      So at a high level the things to note are that insn 50 conditionally jumps
      around insn 55.  Second there's a SWITCH_TEXT_SECTIONS note between insn 50 and
      the target label for insn 50 (code_label 30).
      
      reorg sees the conditional jump around the unconditional jump/return and will
      invert the jump and retarget the original jump to an appropriate location.  In
      this case generating:
      
      > (insn 67 49 18 (sequence [
      >             (jump_insn 50 49 52 (set (pc)
      >                     (if_then_else (eq (reg:CC 19 ccr)
      >                             (const_int 0 [0]))
      >                         (label_ref:SI 68)
      >                         (pc))) "j.c":10:6 discrim 1 281 {*beqcc}
      >                  (expr_list:REG_DEAD (reg:CC 19 ccr)
      >                     (int_list:REG_BR_PROB 1073741831 (nil)))
      >              -> 68)
      >             (insn/s/f 52 50 18 (set (mem:SI (reg/f:SI 14 sp) [1  S4 A8])
      >                     (reg:SI 16 srp)) 37 {*mov_tomemsi}
      >                  (nil))
      >         ]) "j.c":10:6 discrim 1 -1
      >      (nil))
      >
      > (note 18 67 54 [bb 3] NOTE_INSN_BASIC_BLOCK)
      >
      > (note 54 18 43 NOTE_INSN_EPILOGUE_BEG)
      >
      > (note 43 54 65 [bb 4] NOTE_INSN_BASIC_BLOCK)
      >
      > (note 65 43 8 NOTE_INSN_SWITCH_TEXT_SECTIONS)
      >
      > (note 8 65 61 [bb 5] NOTE_INSN_BASIC_BLOCK)
      [ ... ]
      Where the new target of the jump is a return statement later in the IL.
      
      Note that we now have a SWITCH_TEXT_SECTIONS note that is not immediately
      preceded by a BARRIER.  That triggers an assertion in the dwarf2 code.  Removal
      of the BARRIER is inherent in this optimization.
      
      The fix is simple: we avoid this optimization when there's a
      SWITCH_TEXT_SECTIONS note between the conditional jump insn and its target.
      Thankfully we already have a routine to test for this in reorg, so we just need
      to call it appropriately.  The other approach would be to drop the note, which
      I considered and discarded.
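
      As a rough illustration of the check described above, here is a toy model in
      plain C.  It is only a sketch: the toy_* types and function names are
      hypothetical stand-ins, not GCC's RTL structures or the actual routine in
      reorg.cc.  It walks the insn chain between the conditional jump and its
      target and refuses the inversion if a section-switch note is found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for RTL insn kinds; these are NOT GCC's real data
   structures, only an assumption made for illustration.  */
enum toy_kind
{
  TOY_INSN,
  TOY_JUMP,
  TOY_BARRIER,
  TOY_LABEL,
  TOY_NOTE_BASIC_BLOCK,
  TOY_NOTE_SWITCH_TEXT_SECTIONS
};

struct toy_insn
{
  enum toy_kind kind;
  struct toy_insn *next;
};

/* Return true if a section-switch note lies strictly between BEG and END.  */
static bool
toy_switch_text_sections_between_p (const struct toy_insn *beg,
                                    const struct toy_insn *end)
{
  assert (beg != NULL);
  for (const struct toy_insn *p = beg->next; p != NULL && p != end; p = p->next)
    if (p->kind == TOY_NOTE_SWITCH_TEXT_SECTIONS)
      return true;
  return false;
}

/* The jump-around-jump inversion is allowed only when no section switch
   separates the conditional jump from its target label.  */
static bool
toy_may_invert_jump (const struct toy_insn *cond_jump,
                     const struct toy_insn *target_label)
{
  return !toy_switch_text_sections_between_p (cond_jump, target_label);
}
```

      With a chain shaped like the dump above (conditional jump, unconditional
      jump, barrier, section-switch note, label), toy_may_invert_jump returns
      false, so the BARRIER in front of the note is left in place.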
      
      We don't have great coverage for delay slot targets.  I've tested arc, cris,
      fr30, frv, h8, iq2000, microblaze, or1k, sh3, and visium in my tester as
      crosses without new regressions, fixing one regression along the way.
      Bootstrap & regression testing on sh4 and hppa will take considerably longer.
      
      gcc/
      
      	* reorg.cc (relax_delay_slots): Do not optimize a conditional
      	jump around an unconditional jump/return in the presence of
      	a text section switch.
      e5f73853
    • John David Anglin's avatar
      Revert "Delete MALLOC_ABI_ALIGNMENT define from pa32-linux.h" · ad2206d5
      John David Anglin authored
      This reverts commit 0ee3266b.
      ad2206d5
    • Harald Anlauf's avatar
      Fortran: fix associate with assumed-length character array [PR115700] · 7b7f2034
      Harald Anlauf authored
      gcc/fortran/ChangeLog:
      
      	PR fortran/115700
      	* trans-stmt.cc (trans_associate_var): When the associate target
      	is an array-valued character variable, the length is known at entry
      	of the associate block.  Move setting of string length of the
      	selector to the initialization part of the block.
      
      gcc/testsuite/ChangeLog:
      
      	PR fortran/115700
      	* gfortran.dg/associate_69.f90: New test.
      7b7f2034
    • Palmer Dabbelt's avatar
      RISC-V: Describe -march behavior for dependent extensions · 70f6bc39
      Palmer Dabbelt authored
      gcc/ChangeLog:
      
      	* doc/invoke.texi: Describe -march behavior for dependent extensions on
      	RISC-V.
      70f6bc39
    • Gianluca Guida's avatar
      RISC-V: Add support for Zabha extension · 7b2b2e3d
      Gianluca Guida authored
      The Zabha extension adds support for subword Zaamo ops.
      
      Extension: https://github.com/riscv/riscv-zabha.git
      Ratification: https://jira.riscv.org/browse/RVS-1685
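
      For illustration, the source-level operations these patterns target look
      like the sketch below.  The helper names are hypothetical;
      __atomic_fetch_add is the standard GCC builtin.  Whether a given build
      emits a single subword AMO (for example amoadd.b) instead of an lr/sc loop
      depends on zabha being in -march and on assembler support, which this
      sketch merely assumes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Atomically add DELTA to the byte at P and return the previous value.
   Hypothetical helper name, used only for this example.  */
static uint8_t
fetch_add_byte (uint8_t *p, uint8_t delta)
{
  assert (p != NULL);
  return __atomic_fetch_add (p, delta, __ATOMIC_RELAXED);
}

/* Same operation on a halfword.  */
static uint16_t
fetch_add_half (uint16_t *p, uint16_t delta)
{
  assert (p != NULL);
  return __atomic_fetch_add (p, delta, __ATOMIC_RELAXED);
}
```

      The arithmetic wraps modulo the operand width, as with any unsigned
      byte or halfword addition.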
      
      
      
      gcc/ChangeLog:
      
      	* common/config/riscv/riscv-common.cc
      	(riscv_subset_list::to_string): Skip zabha when not supported by
      	the assembler.
      	* config.in: Regenerate.
      	* config/riscv/arch-canonicalize: Make zabha imply zaamo.
      	* config/riscv/iterators.md (amobh): Add iterator for amo
      	byte/halfword.
      	* config/riscv/riscv.opt: Add zabha.
      	* config/riscv/sync.md (atomic_<atomic_optab><mode>): Add
      	subword atomic op pattern.
      	(zabha_atomic_fetch_<atomic_optab><mode>): Add subword
      	atomic_fetch op pattern.
      	(lrsc_atomic_fetch_<atomic_optab><mode>): Prefer zabha over lrsc
      	for subword atomic ops.
      	(zabha_atomic_exchange<mode>): Add subword atomic exchange
      	pattern.
      	(lrsc_atomic_exchange<mode>): Prefer zabha over lrsc for subword
      	atomic exchange ops.
      	* configure: Regenerate.
      	* configure.ac: Add zabha assembler check.
      	* doc/sourcebuild.texi: Add zabha documentation.
      
      gcc/testsuite/ChangeLog:
      
      	* lib/target-supports.exp: Add zabha testsuite infra support.
      	* gcc.target/riscv/amo/inline-atomics-1.c: Remove zabha to continue to
      	test the lr/sc subword patterns.
      	* gcc.target/riscv/amo/inline-atomics-2.c: Ditto.
      	* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acq-rel.c: Ditto.
      	* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acquire.c: Ditto.
      	* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-relaxed.c: Ditto.
      	* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-release.c: Ditto.
      	* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-seq-cst.c: Ditto.
      	* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-acq-rel.c: Ditto.
      	* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-acquire.c: Ditto.
      	* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-relaxed.c: Ditto.
      	* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-release.c: Ditto.
      	* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-seq-cst.c: Ditto.
      	* gcc.target/riscv/amo/zabha-all-amo-ops-char-run.c: New test.
      	* gcc.target/riscv/amo/zabha-all-amo-ops-short-run.c: New test.
      	* gcc.target/riscv/amo/zabha-rvwmo-all-amo-ops-char.c: New test.
      	* gcc.target/riscv/amo/zabha-rvwmo-all-amo-ops-short.c: New test.
      	* gcc.target/riscv/amo/zabha-rvwmo-amo-add-char.c: New test.
      	* gcc.target/riscv/amo/zabha-rvwmo-amo-add-short.c: New test.
      	* gcc.target/riscv/amo/zabha-ztso-amo-add-char.c: New test.
      	* gcc.target/riscv/amo/zabha-ztso-amo-add-short.c: New test.
      
      Co-Authored-By: Patrick O'Neill <patrick@rivosinc.com>
      Signed-Off-By: Gianluca Guida <gianluca@rivosinc.com>
      Tested-by: Andrea Parri <andrea@rivosinc.com>
      7b2b2e3d
    • Luis Silva's avatar
      [PATCH] ARC: Update gcc.target/arc/pr9001184797.c test · c41eb4c7
      Luis Silva authored
      ... to comply with new standards due to stricter analysis in
      the latest GCC versions.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/arc/pr9001184797.c: Fix compiler warnings.
      c41eb4c7
    • Pan Li's avatar
      RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW [PR115763] · de9254e2
      Pan Li authored
      
      According to the ISA, the zvfhmin sub-extension should only contain
      conversion insns.  Thus, vfmv insns that act on FP16 should not be
      emitted when only the zvfhmin option is given.

      This patch fixes that by splitting the pred_broadcast define_insn
      into zvfhmin and zvfh parts.  Given the example below:
      
      void test (_Float16 *dest, _Float16 bias) {
        dest[0] = bias;
        dest[1] = bias;
      }
      
      When compiled with -march=rv64gcv_zfh_zvfhmin:
      
      Before this patch:
      test:
        vsetivli        zero,2,e16,mf4,ta,ma
        vfmv.v.f        v1,fa0 // should not leverage vfmv for zvfhmin
        vse16.v v1,0(a0)
        ret
      
      After this patch:
      test:
        addi     sp,sp,-16
        fsh      fa0,14(sp)
        addi     a5,sp,14
        vsetivli zero,2,e16,mf4,ta,ma
        vlse16.v v1,0(a5),zero
        vse16.v  v1,0(a0)
        addi     sp,sp,16
        jr       ra
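
      The after-patch sequence broadcasts the scalar by spilling it to the stack
      and reloading it with a zero-stride vector load.  The following scalar C
      model sketches that idea.  The model_* names and the fixed 8-element
      register are assumptions made for illustration, and FP16 values are
      handled as raw 16-bit patterns so the code does not depend on _Float16
      support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a strided 16-bit vector load: element I is read from
   BASE + I * STRIDE_BYTES.  A stride of zero reads the same location
   for every element, which yields a broadcast.  */
static void
model_vlse16 (uint16_t *vreg, const unsigned char *base,
              ptrdiff_t stride_bytes, size_t vl)
{
  assert (vreg != NULL && base != NULL);
  for (size_t i = 0; i < vl; i++)
    memcpy (&vreg[i], base + (ptrdiff_t) i * stride_bytes, sizeof (uint16_t));
}

/* Model of the after-patch sequence: spill the scalar, do a zero-stride
   load, then a unit-stride store to DEST.  */
static void
model_broadcast_store (uint16_t *dest, uint16_t bias_bits, size_t vl)
{
  uint16_t slot = bias_bits;   /* models: fsh fa0,14(sp) */
  uint16_t v1[8];              /* assumed register capacity for the model */

  assert (dest != NULL && vl <= 8);
  /* models: vlse16.v v1,0(a5),zero */
  model_vlse16 (v1, (const unsigned char *) &slot, 0, vl);
  /* models: vse16.v v1,0(a0) */
  memcpy (dest, v1, vl * sizeof (uint16_t));
}
```

      Calling model_broadcast_store with vl set to 2 fills both destination
      elements with the same bit pattern, matching what the test function
      stores into dest[0] and dest[1].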
      
      	PR target/115763
      
      gcc/ChangeLog:
      
      	* config/riscv/vector.md (*pred_broadcast<mode>): Split into
      	zvfh and zvfhmin part.
      	(*pred_broadcast<mode>_zvfh): New define_insn for zvfh part.
      	(*pred_broadcast<mode>_zvfhmin): Ditto but for zvfhmin.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/base/scalar_move-5.c: Adjust asm check.
      	* gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.
      	* gcc.target/riscv/rvv/base/scalar_move-7.c: Ditto.
      	* gcc.target/riscv/rvv/base/scalar_move-8.c: Ditto.
      	* gcc.target/riscv/rvv/base/pr115763-1.c: New test.
      	* gcc.target/riscv/rvv/base/pr115763-2.c: New test.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
      de9254e2