  1. Jul 04, 2024
    • testsuite: Update test for PR115537 to use SVE. · adcfb4fb
      Tamar Christina authored
      The PR was about SVE codegen, but the testcase accidentally used neoverse-n1
      instead of neoverse-v1, which was the CPU in the original report.
      
      This updates the tool options.
      
      gcc/testsuite/ChangeLog:
      
      	PR tree-optimization/115537
      	* gcc.dg/vect/pr115537.c: Update flag from neoverse-n1 to neoverse-v1.
    • c++ frontend: check for missing condition for novector [PR115623] · 84acbfbe
      Tamar Christina authored
      It looks like I forgot to check in the C++ frontend whether a condition exists
      for the loop being adorned with novector.  This causes a segfault because cond
      isn't expected to be null.
      
      This fixes it by ignoring the pragma when there's no loop condition,
      the same way we do in the C frontend.
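      The problematic shape is simply a for loop with an empty condition under the
      pragma. A minimal C sketch of it follows; first_zero is a hypothetical helper,
      not the PR's testcase. Per the message, the C frontend already ignores
      novector when there is no condition, and after this fix the C++ frontend
      does the same instead of dereferencing a null cond.

```c
#include <stddef.h>

/* Hypothetical helper: index of the first zero element.  The for loop has
   no condition expression; the exit is the break inside the body.  */
size_t first_zero (const int *a)
{
  size_t i;
#pragma GCC novector
  for (i = 0; ; i++)
    if (a[i] == 0)
      break;
  return i;
}
```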
      
      gcc/cp/ChangeLog:
      
      	PR c++/115623
      	* semantics.cc (finish_for_cond): Add check for C++ cond.
      
      gcc/testsuite/ChangeLog:
      
      	PR c++/115623
      	* g++.dg/vect/vect-novector-pragma_2.cc: New test.
    • arm: Use LDMIA/STMIA for thumb1 DI/DF loads/stores · 236d6fef
      Siarhei Volkau authored
      
      If the address register is dead after the load/store operation, it looks
      beneficial to use LDMIA/STMIA instead of a pair of LDR/STR instructions,
      at least when optimizing for size.
      
      gcc/ChangeLog:
      
      	* config/arm/arm.cc (thumb_load_double_from_address): Emit ldmia
      	when address reg rewritten by load.
      	* config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
      	(peephole2 to rewrite DI/DF store): New.
      	* config/arm/iterators.md (DIDF): New.
      
      gcc/testsuite:
      
      	* gcc.target/arm/thumb1-load-store-64bit.c: Add new test.
      
      Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
    • Aarch64, bugfix: Fix NEON bigendian addp intrinsic [PR114890] · 11049cdf
      Alfie Richards authored
      This change removes code that erroneously swaps the operands in big-endian mode.
      This also fixes the related test.
      
      gcc/ChangeLog:
      
      	PR target/114890
      	* config/aarch64/aarch64-simd.md: Remove bigendian operand swap.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/114890
      	* gcc.target/aarch64/vector_intrinsics_asm.c: Remove xfail.
    • Aarch64: Add test for non-commutative SIMD intrinsic · 14c67938
      Alfie Richards authored
      This adds a test for non-commutative SIMD NEON intrinsics.
      Specifically addp is non-commutative and has a bug in the current big-endian implementation.
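      To see why swapping the operands is wrong, here is a plain-C reference model
      of a 4 x 32-bit pairwise add (addp_ref and i32x4 are illustrative names, not
      NEON types). Lanes 0-1 of the result hold pair sums from the first operand
      and lanes 2-3 hold pair sums from the second, so addp (a, b) and addp (b, a)
      place the sums in opposite halves of the result.

```c
#include <stdint.h>

/* Illustrative 4-lane vector of 32-bit integers.  */
typedef struct { int32_t v[4]; } i32x4;

/* Pairwise add: { a0+a1, a2+a3, b0+b1, b2+b3 }.  The operand order
   decides which half of the result each pair sum lands in.  */
static i32x4 addp_ref (i32x4 a, i32x4 b)
{
  i32x4 r;
  r.v[0] = a.v[0] + a.v[1];
  r.v[1] = a.v[2] + a.v[3];
  r.v[2] = b.v[0] + b.v[1];
  r.v[3] = b.v[2] + b.v[3];
  return r;
}
```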
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/aarch64/vector_intrinsics_asm.c: New test.
    • middle-end/115426 - wrong gimplification of "rm" asm output operand · a4bbdec2
      Richard Biener authored
      When the operand is gimplified to an extract of a register or a
      register we have to disallow memory as we otherwise fail to
      gimplify it properly.  Instead of
      
        __asm__("" : "=rm" __imag <r>);
      
      we want
      
        __asm__("" : "=rm" D.2772);
        _1 = REALPART_EXPR <r>;
        r = COMPLEX_EXPR <_1, D.2772>;
      
      otherwise SSA rewrite will fail and generate wrong code with 'r'
      left bare in the asm output.
      
      	PR middle-end/115426
      	* gimplify.cc (gimplify_asm_expr): Handle "rm" output
      	constraint gimplified to a register (operation).
      
      	* gcc.dg/pr115426.c: New testcase.
    • Use __builtin_cpu_supports instead of __get_cpuid_count. · 699087a1
      liuhongt authored
      gcc/testsuite/ChangeLog:
      
      	PR target/115748
      	* gcc.target/i386/avx512-check.h: Use __builtin_cpu_supports
      	instead of __get_cpuid_count.
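      For reference, a minimal sketch of a runtime AVX-512 check built on the GCC
      builtins __builtin_cpu_init and __builtin_cpu_supports rather than on raw
      CPUID leaves. The feature string "avx512f" and the function name are
      illustrative choices; the message does not say which features
      avx512-check.h tests.

```c
/* Returns 1 if the running x86 CPU reports AVX512F, else 0.
   On non-x86 targets the check is compiled out.  */
int cpu_has_avx512f (void)
{
#if defined (__x86_64__) || defined (__i386__)
  __builtin_cpu_init ();   /* needed if called before constructors run */
  return __builtin_cpu_supports ("avx512f") ? 1 : 0;
#else
  return 0;
#endif
}
```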
    • i386: Add additional variant of bswaphisi2_lowpart peephole2. · 727f8b14
      Roger Sayle authored
      This patch adds an additional variation of the peephole2 used to convert
      bswaphisi2_lowpart into rotlhi3_1_slp, which converts xchgb %ah,%al into
      rotw if the flags register isn't live.  The motivating example is:
      
      void ext(int x);
      void foo(int x)
      {
        ext((x&~0xffff)|((x>>8)&0xff)|((x&0xff)<<8));
      }
      
      where GCC with -O2 currently produces:
      
      foo:	movl    %edi, %eax
              rolw    $8, %ax
              movl    %eax, %edi
              jmp     ext
      
      The issue is that the original xchgb (bswaphisi2_lowpart) can only be
      performed in "Q" registers that allow the %?h register to be used, so
      reload generates the above two movl.  However, it's later in peephole2
      where we see that CC_FLAGS can be clobbered, so we can use a rotate word,
      which is more forgiving with register allocations.  With the additional
      peephole2 proposed here, we now generate:
      
      foo:	rolw    $8, %di
              jmp     ext
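      The motivating expression swaps the two low bytes of x and keeps the upper
      16 bits, which is exactly a 16-bit rotate of the low half by 8, i.e. what
      rolw $8 computes. A small sketch comparing the two forms; it uses uint32_t
      instead of the example's int to keep the shifts well defined, and the
      function names are hypothetical.

```c
#include <stdint.h>

/* The expression from the motivating example: swap bytes 0 and 1,
   keep bits 16..31 unchanged.  */
static uint32_t swap_low_bytes (uint32_t x)
{
  return (x & ~0xffffu) | ((x >> 8) & 0xff) | ((x & 0xff) << 8);
}

/* The same value as a rotate-left by 8 of the low 16-bit half.  */
static uint32_t rotate_low_half (uint32_t x)
{
  uint16_t lo = (uint16_t) x;
  uint16_t rot = (uint16_t) ((lo << 8) | (lo >> 8));
  return (x & 0xffff0000u) | rot;
}
```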
      
      2024-07-04  Roger Sayle  <roger@nextmovesoftware.com>
      
      gcc/ChangeLog
      	* config/i386/i386.md (bswaphisi2_lowpart peephole2): New
      	peephole2 variant to eliminate register shuffling.
      
      gcc/testsuite/ChangeLog
      	* gcc.target/i386/xchg-4.c: New test case.
    • [committed] Fix newlib build failure with rx as well as several dozen testsuite failures · 759f4abe
      Jeff Law authored
      The rx port has been failing to build newlib for a bit over a week.  I can't
      remember whether it was the late-combine work or the IRA costing twiddle;
      regardless, the real bug is in the rx backend.
      
      Basically dwarf2cfi is blowing up because of inconsistent state caused by the
      failure to mark a stack adjustment as frame related.  This instance in the
      epilogue looks like a simple goof.
      
      With the port building again, the testsuite would run and it showed a number of
      regressions, again related to CFI handling.  The common thread was a failure to
      mark a copy from FP to SP in the prologue as frame related.  The change which
      introduced this bug was supposed to just change promotions of vector types.
      It's unclear if Nick included the hunk accidentally or just goof'd on the
      logic.  Regardless it looks quite incorrect.
      
      Reverting that hunk fixes the regressions *and* fixes 94 pre-existing failures.
      
      The net is rx-elf is regression free and has moved forward in terms of its
      testsuite status.
      
      Pushing to the trunk momentarily.
      
      gcc/
      
      	* config/rx/rx.cc (rx_expand_prologue): Mark the copy from FP to SP
      	as frame related.
      	(rx_expand_epilogue): Mark the stack pointer adjustment as frame
      	related.
    • [APX PPX] Avoid generating unmatched pushp/popp in pro/epilogue · 8e72b1bb
      Hongyu Wang authored
      According to the APX spec, pushp/popp pairs should be matched;
      otherwise the PPX hint cannot take effect, causing a performance loss.
      
      In ix86_expand_epilogue, there are several optimizations that may
      cause the epilogue to use mov to restore the regs.  Check whether PPX is
      applied and, if so, prevent the use of mov/leave in the epilogue.  Also do
      not use PPX for eh_return.
      
      gcc/ChangeLog:
      
      	* config/i386/i386.cc (ix86_expand_prologue): Set apx_ppx_used
      	flag in m.fs with TARGET_APX_PPX && !crtl->calls_eh_return.
      	(ix86_emit_save_regs): Emit PPX only when
      	TARGET_APX_PPX && !crtl->calls_eh_return.
      	(ix86_expand_epilogue): Don't restore reg using mov when
      	apx_ppx_used flag is true.
      	* config/i386/i386.h (struct machine_frame_state):
      	Add apx_ppx_used flag.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/apx-ppx-2.c: New test.
      	* gcc.target/i386/apx-ppx-3.c: Likewise.
    • c++: OVERLOAD in diagnostics · baac8f71
      Jason Merrill authored
      In modules we can get an OVERLOAD around a non-function, so let's tail
      recurse instead of falling through.  As a result we start printing the
      template header in this testcase.
      
      gcc/cp/ChangeLog:
      
      	* error.cc (dump_decl) [OVERLOAD]: Recurse on single case.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/warn/pr61945.C: Adjust diagnostic.
    • c++: CTAD and trait built-ins · 655fe94a
      Jason Merrill authored
      While poking at 101232 I noticed that we started trying to parse
      __is_invocable(_Fn, _Args...) as a functional cast to a CTAD placeholder
      type; we shouldn't consider CTAD for a template that shares a name (reserved
      for the implementation) with a built-in trait.
      
      gcc/cp/ChangeLog:
      
      	* pt.cc (ctad_template_p): Return false for trait names.
    • vect: Fix ICE caused by missing check for TREE_CODE == SSA_NAME · d1eeafe4
      Hu, Lin1 authored
      Check that the tree's code is SSA_NAME before accessing SSA_NAME_RANGE_INFO.
      
      2024-07-03  Hu, Lin1 <lin1.hu@intel.com>
      	    Andrew Pinski <quic_apinski@quicinc.com>
      
      gcc/ChangeLog:
      
      	PR tree-optimization/115753
      	* tree-vect-stmts.cc (supportable_indirect_convert_operation): Add
      	TREE_CODE check before SSA_NAME_RANGE_INFO.
      
      gcc/testsuite/ChangeLog:
      
      	PR tree-optimization/115753
      	* gcc.dg/vect/pr115753-1.c: New test.
      	* gcc.dg/vect/pr115753-2.c: Ditto.
      	* gcc.dg/vect/pr115753-3.c: Ditto.
    • Daily bump. · 0720394a
      GCC Administrator authored
  2. Jul 03, 2024
    • [committed] Fix previously latent bug in reorg affecting cris port · e5f73853
      Jeff Law authored
      The late-combine patch has triggered a previously latent bug in reorg.
      
      Basically we have a sequence like this in the middle of reorg before we start
      relaxing delay slots (cris-elf, gcc.dg/torture/pr98289.c)
      
      > (insn 67 49 18 (sequence [
      >             (jump_insn 50 49 52 (set (pc)
      >                     (if_then_else (ne (reg:CC 19 ccr)
      >                             (const_int 0 [0]))
      >                         (label_ref:SI 30)
      >                         (pc))) "j.c":10:6 discrim 1 282 {*bnecc}
      >                  (expr_list:REG_DEAD (reg:CC 19 ccr)
      >                     (int_list:REG_BR_PROB 7 (nil)))
      >              -> 30)
      >             (insn/f 52 50 18 (set (mem:SI (reg/f:SI 14 sp) [1  S4 A8])
      >                     (reg:SI 16 srp)) 37 {*mov_tomemsi}
      >                  (nil))
      >         ]) "j.c":10:6 discrim 1 -1
      >      (nil))
      >
      > (note 18 67 54 [bb 3] NOTE_INSN_BASIC_BLOCK)
      >
      > (note 54 18 55 NOTE_INSN_EPILOGUE_BEG)
      >
      > (jump_insn 55 54 56 (return) "j.c":14:1 228 {*return_expanded}
      >      (nil)
      >  -> return)
      >
      > (barrier 56 55 43)
      >
      > (note 43 56 65 [bb 4] NOTE_INSN_BASIC_BLOCK)
      >
      > (note 65 43 30 NOTE_INSN_SWITCH_TEXT_SECTIONS)
      >
      > (code_label 30 65 8 5 6 (nil) [1 uses])
      >
      > (note 8 30 61 [bb 5] NOTE_INSN_BASIC_BLOCK)
      
      So at a high level, there are two things to note: first, insn 50 conditionally
      jumps around insn 55; second, there's a SWITCH_TEXT_SECTIONS note between
      insn 50 and the target label for insn 50 (code_label 30).
      
      reorg sees the conditional jump around the unconditional jump/return and will
      invert the jump and retarget the original jump to an appropriate location.  In
      this case generating:
      
      > (insn 67 49 18 (sequence [
      >             (jump_insn 50 49 52 (set (pc)
      >                     (if_then_else (eq (reg:CC 19 ccr)
      >                             (const_int 0 [0]))
      >                         (label_ref:SI 68)
      >                         (pc))) "j.c":10:6 discrim 1 281 {*beqcc}
      >                  (expr_list:REG_DEAD (reg:CC 19 ccr)
      >                     (int_list:REG_BR_PROB 1073741831 (nil)))
      >              -> 68)
      >             (insn/s/f 52 50 18 (set (mem:SI (reg/f:SI 14 sp) [1  S4 A8])
      >                     (reg:SI 16 srp)) 37 {*mov_tomemsi}
      >                  (nil))
      >         ]) "j.c":10:6 discrim 1 -1
      >      (nil))
      >
      > (note 18 67 54 [bb 3] NOTE_INSN_BASIC_BLOCK)
      >
      > (note 54 18 43 NOTE_INSN_EPILOGUE_BEG)
      >
      > (note 43 54 65 [bb 4] NOTE_INSN_BASIC_BLOCK)
      >
      > (note 65 43 8 NOTE_INSN_SWITCH_TEXT_SECTIONS)
      >
      > (note 8 65 61 [bb 5] NOTE_INSN_BASIC_BLOCK)
      [ ... ]
      Where the new target of the jump is a return statement later in the IL.
      
      Note that we now have a SWITCH_TEXT_SECTIONS note that is not immediately
      preceded by a BARRIER.  That triggers an assertion in the dwarf2 code.  Removal
      of the BARRIER is inherent in this optimization.
      
      The fix is simple: we avoid this optimization when there's a
      SWITCH_TEXT_SECTIONS note between the conditional jump insn and its target.
      Thankfully we already have a routine to test for this in reorg, so we just need
      to call it appropriately.  The other approach would be to drop the note, which I
      considered and discarded.
      
      We don't have great coverage for delay slot targets.  I've tested arc, cris,
      fr30, frv, h8, iq2000, microblaze, or1k, sh3 and visium in my tester as crosses
      without new regressions, fixing one regression along the way.   Bootstrap &
      regression testing on sh4 and hppa will take considerably longer.
      
      gcc/
      
      	* reorg.cc (relax_delay_slots): Do not optimize a conditional
      	jump around an unconditional jump/return in the presence of
      	a text section switch.
    • Revert "Delete MALLOC_ABI_ALIGNMENT define from pa32-linux.h" · ad2206d5
      John David Anglin authored
      This reverts commit 0ee3266b.
    • Fortran: fix associate with assumed-length character array [PR115700] · 7b7f2034
      Harald Anlauf authored
      gcc/fortran/ChangeLog:
      
      	PR fortran/115700
      	* trans-stmt.cc (trans_associate_var): When the associate target
      	is an array-valued character variable, the length is known at entry
      	of the associate block.  Move setting of string length of the
      	selector to the initialization part of the block.
      
      gcc/testsuite/ChangeLog:
      
      	PR fortran/115700
      	* gfortran.dg/associate_69.f90: New test.
    • RISC-V: Describe -march behavior for dependent extensions · 70f6bc39
      Palmer Dabbelt authored
      gcc/ChangeLog:
      
      	* doc/invoke.texi: Describe -march behavior for dependent extensions on
      	RISC-V.
    • RISC-V: Add support for Zabha extension · 7b2b2e3d
      Gianluca Guida authored
      The Zabha extension adds support for subword Zaamo ops.
      
      Extension: https://github.com/riscv/riscv-zabha.git
      Ratification: https://jira.riscv.org/browse/RVS-1685
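      At the source level, the operations Zabha targets are ordinary subword
      atomics, for example a byte-sized fetch-and-add through the GCC __atomic
      builtins. A sketch (bump_counter is a hypothetical name): without Zabha,
      RISC-V GCC uses the lrsc patterns, an LR/SC loop on the containing word;
      with zabha in -march, the new patterns should let it emit a single
      byte-sized AMO instead. The C semantics are the same either way.

```c
#include <stdint.h>

/* Atomically increment a byte counter and return its previous value.
   Arithmetic wraps modulo 256, as for any uint8_t.  */
uint8_t bump_counter (uint8_t *counter)
{
  return __atomic_fetch_add (counter, 1, __ATOMIC_SEQ_CST);
}
```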
      
      
      
      gcc/ChangeLog:
      
      	* common/config/riscv/riscv-common.cc
      	(riscv_subset_list::to_string): Skip zabha when not supported by
      	the assembler.
      	* config.in: Regenerate.
      	* config/riscv/arch-canonicalize: Make zabha imply zaamo.
      	* config/riscv/iterators.md (amobh): Add iterator for amo
      	byte/halfword.
      	* config/riscv/riscv.opt: Add zabha.
      	* config/riscv/sync.md (atomic_<atomic_optab><mode>): Add
      	subword atomic op pattern.
      	(zabha_atomic_fetch_<atomic_optab><mode>): Add subword
      	atomic_fetch op pattern.
      	(lrsc_atomic_fetch_<atomic_optab><mode>): Prefer zabha over lrsc
      	for subword atomic ops.
      	(zabha_atomic_exchange<mode>): Add subword atomic exchange
      	pattern.
      	(lrsc_atomic_exchange<mode>): Prefer zabha over lrsc for subword
      	atomic exchange ops.
      	* configure: Regenerate.
      	* configure.ac: Add zabha assembler check.
      	* doc/sourcebuild.texi: Add zabha documentation.
      
      gcc/testsuite/ChangeLog:
      
      	* lib/target-supports.exp: Add zabha testsuite infra support.
      	* gcc.target/riscv/amo/inline-atomics-1.c: Remove zabha to continue to
      	test the lr/sc subword patterns.
      	* gcc.target/riscv/amo/inline-atomics-2.c: Ditto.
      	* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acq-rel.c: Ditto.
      	* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acquire.c: Ditto.
      	* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-relaxed.c: Ditto.
      	* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-release.c: Ditto.
      	* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-seq-cst.c: Ditto.
      	* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-acq-rel.c: Ditto.
      	* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-acquire.c: Ditto.
      	* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-relaxed.c: Ditto.
      	* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-release.c: Ditto.
      	* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-seq-cst.c: Ditto.
      	* gcc.target/riscv/amo/zabha-all-amo-ops-char-run.c: New test.
      	* gcc.target/riscv/amo/zabha-all-amo-ops-short-run.c: New test.
      	* gcc.target/riscv/amo/zabha-rvwmo-all-amo-ops-char.c: New test.
      	* gcc.target/riscv/amo/zabha-rvwmo-all-amo-ops-short.c: New test.
      	* gcc.target/riscv/amo/zabha-rvwmo-amo-add-char.c: New test.
      	* gcc.target/riscv/amo/zabha-rvwmo-amo-add-short.c: New test.
      	* gcc.target/riscv/amo/zabha-ztso-amo-add-char.c: New test.
      	* gcc.target/riscv/amo/zabha-ztso-amo-add-short.c: New test.
      
      Co-Authored-By: Patrick O'Neill <patrick@rivosinc.com>
      Signed-Off-By: Gianluca Guida <gianluca@rivosinc.com>
      Tested-by: Andrea Parri <andrea@rivosinc.com>
    • [PATCH] ARC: Update gcc.target/arc/pr9001184797.c test · c41eb4c7
      Luis Silva authored
      ... to comply with new standards due to stricter analysis in
      the latest GCC versions.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/arc/pr9001184797.c: Fix compiler warnings.
    • RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW [PR115763] · de9254e2
      Pan Li authored
      
      According to the ISA, the zvfhmin sub-extension should only contain
      conversion instructions.  Thus, a vfmv instruction acting on FP16 should
      not be emitted when only the zvfhmin option is given.
      
      This patch fixes it by splitting the pred_broadcast define_insn
      into zvfhmin and zvfh parts.  Consider the following example:
      
      void test (_Float16 *dest, _Float16 bias) {
        dest[0] = bias;
        dest[1] = bias;
      }
      
      When compiled with -march=rv64gcv_zfh_zvfhmin:
      
      Before this patch:
      test:
        vsetivli        zero,2,e16,mf4,ta,ma
        vfmv.v.f        v1,fa0 // should not leverage vfmv for zvfhmin
        vse16.v v1,0(a0)
        ret
      
      After this patch:
      test:
        addi     sp,sp,-16
        fsh      fa0,14(sp)
        addi     a5,sp,14
        vsetivli zero,2,e16,mf4,ta,ma
        vlse16.v v1,0(a5),zero
        vse16.v  v1,0(a0)
        addi     sp,sp,16
        jr       ra
      
      	PR target/115763
      
      gcc/ChangeLog:
      
      	* config/riscv/vector.md (*pred_broadcast<mode>): Split into
      	zvfh and zvfhmin part.
      	(*pred_broadcast<mode>_zvfh): New define_insn for zvfh part.
      	(*pred_broadcast<mode>_zvfhmin): Ditto but for zvfhmin.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/base/scalar_move-5.c: Adjust asm check.
      	* gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.
      	* gcc.target/riscv/rvv/base/scalar_move-7.c: Ditto.
      	* gcc.target/riscv/rvv/base/scalar_move-8.c: Ditto.
      	* gcc.target/riscv/rvv/base/pr115763-1.c: New test.
      	* gcc.target/riscv/rvv/base/pr115763-2.c: New test.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
    • Match: Allow more types truncation for .SAT_TRUNC · 44c767c0
      Pan Li authored
      
      .SAT_TRUNC has an input and an output type, i.e. it converts from
      itype to otype where sizeof (otype) < sizeof (itype).  The
      previous patch only allowed sizeof (otype) == sizeof (itype) / 2,
      but we also have 1/4 and 1/8 truncations.
      
      This patch supports truncation between any types where
      sizeof (otype) < sizeof (itype).  The following truncations are
      covered:
      
      * uint64_t => uint8_t
      * uint64_t => uint16_t
      * uint64_t => uint32_t
      * uint32_t => uint8_t
      * uint32_t => uint16_t
      * uint16_t => uint8_t
      
      The following test suites pass with this patch:
      1. The rv64gcv fully regression tests.
      2. The rv64gcv build with glibc.
      3. The x86 bootstrap tests.
      4. The x86 fully regression tests.
      
      gcc/ChangeLog:
      
      	* match.pd: Allow truncation for any otype narrower than itype.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
    • Vect: Support IFN SAT_TRUNC for unsigned vector int · 8d2c460e
      Pan Li authored
      
      This patch supports .SAT_TRUNC for unsigned vector integers.
      Consider the following example code:
      
      Form 1
        #define VEC_DEF_SAT_U_TRUC_FMT_1(NT, WT)                             \
        void __attribute__((noinline))                                       \
        vec_sat_u_truc_##WT##_to_##NT##_fmt_1 (NT *x, WT *y, unsigned limit) \
        {                                                                    \
          for (unsigned i = 0; i < limit; i++)                               \
            {                                                                \
              bool overflow = y[i] > (WT)(NT)(-1);                           \
              x[i] = ((NT)y[i]) | (NT)-overflow;                             \
            }                                                                \
        }
      
      VEC_DEF_SAT_U_TRUC_FMT_1 (uint32_t, uint64_t)
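      The Form 1 macro relies on <stdint.h> and <stdbool.h>. Below is a
      self-contained version with the same body that compiles on its own. The
      trick is that overflow is 0 or 1, so (NT)-overflow is either 0 or all ones,
      and OR-ing it into the truncated value saturates it to NT's maximum.

```c
#include <stdbool.h>
#include <stdint.h>

/* Form 1 from the message, unchanged apart from the includes.  */
#define VEC_DEF_SAT_U_TRUC_FMT_1(NT, WT)                             \
void __attribute__((noinline))                                       \
vec_sat_u_truc_##WT##_to_##NT##_fmt_1 (NT *x, WT *y, unsigned limit) \
{                                                                    \
  for (unsigned i = 0; i < limit; i++)                               \
    {                                                                \
      bool overflow = y[i] > (WT)(NT)(-1);                           \
      x[i] = ((NT)y[i]) | (NT)-overflow;                             \
    }                                                                \
}

/* Defines vec_sat_u_truc_uint64_t_to_uint32_t_fmt_1.  */
VEC_DEF_SAT_U_TRUC_FMT_1 (uint32_t, uint64_t)
```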
      
      Before this patch:
      void vec_sat_u_truc_uint64_t_to_uint32_t_fmt_1 (uint32_t * x, uint64_t * y, unsigned int limit)
      {
        ...
        _51 = .SELECT_VL (ivtmp_49, POLY_INT_CST [2, 2]);
        ivtmp_35 = _51 * 8;
        vect__4.7_32 = .MASK_LEN_LOAD (vectp_y.5_34, 64B, { -1, ... }, _51, 0);
        mask_overflow_16.8_30 = vect__4.7_32 > { 4294967295, ... };
        vect__5.9_29 = (vector([2,2]) unsigned int) vect__4.7_32;
        vect__10.13_20 = .VCOND_MASK (mask_overflow_16.8_30, { 4294967295, ... }, vect__5.9_29);
        ivtmp_12 = _51 * 4;
        .MASK_LEN_STORE (vectp_x.14_11, 32B, { -1, ... }, _51, 0, vect__10.13_20);
        vectp_y.5_33 = vectp_y.5_34 + ivtmp_35;
        vectp_x.14_46 = vectp_x.14_11 + ivtmp_12;
        ivtmp_50 = ivtmp_49 - _51;
        if (ivtmp_50 != 0)
        ...
      }
      
      After this patch:
      void vec_sat_u_truc_uint64_t_to_uint32_t_fmt_1 (uint32_t * x, uint64_t * y, unsigned int limit)
      {
        ...
        _12 = .SELECT_VL (ivtmp_21, POLY_INT_CST [2, 2]);
        ivtmp_34 = _12 * 8;
        vect__4.7_31 = .MASK_LEN_LOAD (vectp_y.5_33, 64B, { -1, ... }, _12, 0);
        vect_patt_40.8_30 = .SAT_TRUNC (vect__4.7_31); // << .SAT_TRUNC
        ivtmp_29 = _12 * 4;
        .MASK_LEN_STORE (vectp_x.9_28, 32B, { -1, ... }, _12, 0, vect_patt_40.8_30);
        vectp_y.5_32 = vectp_y.5_33 + ivtmp_34;
        vectp_x.9_27 = vectp_x.9_28 + ivtmp_29;
        ivtmp_20 = ivtmp_21 - _12;
        if (ivtmp_20 != 0)
        ...
      }
      
      The following test suites pass with this patch:
      * The x86 bootstrap test.
      * The x86 fully regression test.
      * The rv64gcv fully regression tests.
      
      gcc/ChangeLog:
      
      	* tree-vect-patterns.cc (gimple_unsigned_integer_sat_trunc): Add
      	new decl generated by match.
      	(vect_recog_sat_trunc_pattern): Add new func impl to recog the
      	.SAT_TRUNC pattern.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
    • Remove redundant vector permute dump · 1dc20965
      Richard Biener authored
      The following removes redundant dumping in vect permute vectorization.
      
      	* tree-vect-slp.cc (vectorizable_slp_permutation_1): Remove
      	redundant dump.
    • [PATCH] match.pd: Fold x/sqrt(x) to sqrt(x) · 8dc5ad3c
      Jennifer Schmitz authored
      
      This patch adds a pattern in match.pd folding x/sqrt(x) to sqrt(x) for -funsafe-math-optimizations. Test cases were added for double, float, and long double.
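      The fold rests on the identity below, which holds only for finite positive x;
      the edge cases are why it is gated on -funsafe-math-optimizations. Under
      IEEE 754 semantics, x = ±0 gives 0/0 = NaN on the left but ±0 on the right,
      x = +∞ gives ∞/∞ = NaN versus +∞, and even for ordinary positive x the left
      side involves two roundings, so it can differ from sqrt(x) in the last bit.

```latex
\frac{x}{\sqrt{x}} = \frac{\sqrt{x}\,\sqrt{x}}{\sqrt{x}} = \sqrt{x}, \qquad 0 < x < \infty
```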
      
      The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
      Ok for mainline?
      
      Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
      
      gcc/
      
      	* match.pd: Fold x/sqrt(x) to sqrt(x).
      
      gcc/testsuite/
      
      	* gcc.dg/tree-ssa/sqrt_div.c: New test.
    • Deduplicate explicitly-sized types · 640f0f3e
      Alexandre Oliva authored
      When make_type_from_size is called with a biased type, for an entity
      that isn't explicitly biased, we may refrain from reusing the given
      type because it doesn't seem to match, and then proceed to create an
      exact copy of that type.
      
      Compute earlier the biased status of the expected type, early enough
      for the suitability check of the given type.  Modify for_biased
      instead of biased_p, so that biased_p remains with the given type's
      status for the comparison.
      
      Avoid creating unnecessary copies of types in make_type_from_size, by
      caching and reusing previously-created identical types, similarly to
      the caching of packable types.
      
      While at it, fix two vaguely related issues:
      
      - TYPE_DEBUG_TYPE's storage is shared with other sorts of references
      to types, so it shouldn't be accessed unless
      TYPE_CAN_HAVE_DEBUG_TYPE_P holds.
      
      - When we choose the narrower/packed variant of a type as the main
      debug info type, we fail to output its name if we fail to follow debug
      type for the TYPE_NAME decl type in modified_type_die.
      
      
      for  gcc/ada/ChangeLog
      
      	* gcc-interface/misc.cc (gnat_get_array_descr_info): Only follow
      	TYPE_DEBUG_TYPE if TYPE_CAN_HAVE_DEBUG_TYPE_P.
      	* gcc-interface/utils.cc (sized_type_hash): New struct.
      	(sized_type_hasher): New struct.
      	(sized_type_hash_table): New variable.
      	(init_gnat_utils): Allocate it.
      	(destroy_gnat_utils): Release it.
      	(sized_type_hasher::equal): New.
      	(hash_sized_type): New.
      	(canonicalize_sized_type): New.
      	(make_type_from_size): Use it to cache packed variants.  Fix
      	type reuse by combining biased_p and for_biased earlier.  Hold
      	the combination in for_biased, adjusting later uses.
      
      for  gcc/ChangeLog
      
      	* dwarf2out.cc (modified_type_die): Follow name's debug type.
      
      for  gcc/testsuite/ChangeLog
      
      	* gnat.dg/bias1.adb: Count occurrences of -7.*DW_AT_GNU_bias.
    • [debug] Avoid dropping bits from num/den in fixed-point types · 113c4826
      Alexandre Oliva authored
      We used to use an unsigned 128-bit type to hold the numerator and
      denominator used to represent the delta of a fixed-point type in debug
      information, but there are cases in which that was not enough, and
      more significant bits silently overflowed and got omitted from debug
      information.
      
      Introduce a mode in which UI_to_gnu selects a wide-enough unsigned
      type, and use that to convert numerator and denominator.  While at
      it, avoid exceeding the maximum precision for wide ints, and for
      available int modes, when selecting a type to represent very wide
      constants, falling back to 0/0 for unrepresentable fractions.
      
      
      for  gcc/ada/ChangeLog
      
      	* gcc-interface/cuintp.cc (UI_To_gnu): Add mode that selects a
      	wide enough unsigned type.  Fail if the constant exceeds the
      	representable numbers.
      	* gcc-interface/decl.cc (gnat_to_gnu_entity): Use it for
      	numerator and denominator of fixed-point types.  In case of
      	failure, fall back to an indeterminate fraction.
    • [i386] restore recompute to override opts after change [PR113719] · bf2fc0a2
      Alexandre Oliva authored
      The first patch for PR113719 regressed gcc.dg/ipa/iinline-attr.c on
      toolchains configured to --enable-frame-pointer, because the
      optimization node created within handle_optimize_attribute had
      flag_omit_frame_pointer incorrectly set, whereas
      default_optimization_node didn't.  With this difference,
      can_inline_edge_by_limits_p flagged an optimization mismatch and we
      refused to inline the function that had a redundant optimization flag
      into one that didn't, which is exactly what is tested for there.
      
      This patch restores the calls to ix86_default_align and
      ix86_recompute_optlev_based_flags that used to be, and ought to be,
      issued during TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE, but preserves the
      intent of the original change, of having those functions called at
      different spots within ix86_option_override_internal.  To that end,
      the remaining bits were refactored into a separate function, that was
      in turn adjusted to operate on explicitly-passed opts and opts_set,
      rather than going for their global counterparts.
      
      
      for  gcc/ChangeLog
      
      	PR target/113719
      	* config/i386/i386-options.cc
      	(ix86_override_options_after_change_1): Add opts and opts_set
      	parms, operate on them, after factoring out of...
      	(ix86_override_options_after_change): ... this.  Restore calls
      	of ix86_default_align and ix86_recompute_optlev_based_flags.
      	(ix86_option_override_internal): Call the factored-out bits.
    • aarch64: PR target/115475 Implement missing __ARM_FEATURE_SVE_BF16 macro · 6492c713
      Kyrylo Tkachov authored
      
      The ACLE requires __ARM_FEATURE_SVE_BF16 to be defined when SVE, BF16
      and the associated intrinsics are available.
      GCC does support the required intrinsics for TARGET_SVE_BF16, so define
      this macro too.
      
      Bootstrapped and tested on aarch64-none-linux-gnu.
      
      gcc/
      
      	PR target/115475
      	* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
      	Define __ARM_FEATURE_SVE_BF16 for TARGET_SVE_BF16.
      
      gcc/testsuite/
      
      	PR target/115475
      	* gcc.target/aarch64/acle/bf16_sve_feature.c: New test.
      
      Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
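With the macro defined, code can choose an SVE BF16 path at preprocessing time. The sketch below only reports whether the ACLE macro is visible; the function name sve_bf16_available is hypothetical, and actually taking the SVE BF16 path requires an AArch64 target with both SVE and BF16 enabled in -march.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__ARM_FEATURE_SVE_BF16)
/* ACLE: the SVE BF16 intrinsics from <arm_sve.h> may be used.  */
#include <arm_sve.h>
#endif

/* Hypothetical helper: true when the compiler advertises SVE BF16
   support through the ACLE feature macro.  */
static bool
sve_bf16_available (void)
{
#if defined(__ARM_FEATURE_SVE_BF16)
  return true;
#else
  return false;
#endif
}
```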
    • Kyrylo Tkachov's avatar
      aarch64: PR target/115457 Implement missing __ARM_FEATURE_BF16 macro · c1094213
      Kyrylo Tkachov authored
      
      The ACLE asks the user to test for __ARM_FEATURE_BF16 before using the
      <arm_bf16.h> header, but GCC doesn't define it.
      LLVM does, so this is an inconsistency between the two compilers.
      
      This patch defines that macro for TARGET_BF16_FP.
      Bootstrapped and tested on aarch64-none-linux-gnu.
      
      gcc/
      
      	PR target/115457
      	* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
      	Define __ARM_FEATURE_BF16 for TARGET_BF16_FP.
      
      gcc/testsuite/
      
      	PR target/115457
      	* gcc.target/aarch64/acle/bf16_feature.c: New test.
      
      Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
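The ACLE pattern the commit refers to, testing __ARM_FEATURE_BF16 before including <arm_bf16.h>, might look like the sketch below. The portable fallback is an assumption of this example, not part of the patch: it relies on bfloat16 being the upper 16 bits of an IEEE-754 binary32 value and converts by truncation, so results can differ in the last bit from a round-to-nearest conversion. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_FEATURE_BF16)
/* ACLE: only include the header once the feature macro says it is usable.  */
#include <arm_bf16.h>
#endif

/* Hypothetical fallback: bfloat16 keeps the sign, the 8 exponent bits and
   the top 7 mantissa bits of a binary32 value, i.e. its upper 16 bits.
   This truncates (rounds toward zero); NaN payloads confined to the low
   16 bits are not preserved.  */
static uint16_t
float_to_bf16_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return (uint16_t) (u >> 16);
}

/* Widening back to binary32 is exact: append 16 zero bits.  */
static float
bf16_bits_to_float (uint16_t b)
{
  uint32_t u = (uint32_t) b << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}
```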
    • Richard Biener's avatar
      Handle NULL stmt in SLP_TREE_SCALAR_STMTS · 03a810da
      Richard Biener authored
      The following starts to handle NULL elements in SLP_TREE_SCALAR_STMTS
      with the first candidate being the two-operator nodes where some
      lanes are do-not-care and also do not have a scalar stmt computing
      the result.  I originally added SLP_TREE_SCALAR_STMTS to two-operator
      nodes, but this exposed PR115764, so I've split that out.
      
      I have a patch to use NULL elements for loads from groups with gaps,
      where we get around not doing that by having a load permutation.
      
      	* tree-vect-slp.cc (bst_traits::hash): Handle NULL elements
      	in SLP_TREE_SCALAR_STMTS.
      	(vect_print_slp_tree): Likewise.
      	(vect_mark_slp_stmts): Likewise.
      	(vect_mark_slp_stmts_relevant): Likewise.
      	(vect_find_last_scalar_stmt_in_slp): Likewise.
      	(vect_bb_slp_mark_live_stmts): Likewise.
      	(vect_slp_prune_covered_roots): Likewise.
      	(vect_bb_partition_graph_r): Likewise.
      	(vect_remove_slp_scalar_calls): Likewise.
      	(vect_slp_gather_vectorized_scalar_stmts): Likewise.
      	(vect_bb_slp_scalar_cost): Likewise.
      	(vect_contains_pattern_stmt_p): Likewise.
      	(vect_slp_convert_to_external): Likewise.
      	(vect_find_first_scalar_stmt_in_slp): Likewise.
      	(vect_optimize_slp_pass::remove_redundant_permutations): Likewise.
      	(vect_slp_analyze_node_operations_1): Likewise.
      	(vect_schedule_slp_node): Likewise.
      	* tree-vect-stmts.cc (can_vectorize_live_stmts): Likewise.
      	(vectorizable_shift): Likewise.
      	* tree-vect-data-refs.cc (vect_slp_analyze_load_dependences):
      	Handle NULL elements in SLP_TREE_SCALAR_STMTS.
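The recurring change in the ChangeLog above, teaching each walker over SLP_TREE_SCALAR_STMTS to cope with NULL entries, boils down to skipping do-not-care lanes. A minimal sketch with hypothetical types (struct stmt_sketch, whose uid stands in for statement order) of what finding the last scalar stmt and hashing the lanes might look like once NULL lanes are allowed; it is not the tree-vect-slp.cc code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a scalar statement; a non-negative uid
   models its position in the basic block.  */
struct stmt_sketch
{
  int uid;
};

/* Return the latest statement among the lanes, skipping NULL
   (do-not-care) lanes, or NULL if every lane is NULL.  */
static const struct stmt_sketch *
find_last_scalar_stmt (const struct stmt_sketch *const *lanes, size_t n)
{
  const struct stmt_sketch *last = NULL;
  for (size_t i = 0; i < n; ++i)
    {
      if (!lanes[i])
        continue;  /* No scalar stmt computes this lane.  */
      if (!last || lanes[i]->uid > last->uid)
        last = lanes[i];
    }
  return last;
}

/* Hash the lanes without dereferencing NULL: a NULL lane contributes 0,
   a lane with uid u contributes u + 1.  */
static unsigned
hash_lanes (const struct stmt_sketch *const *lanes, size_t n)
{
  unsigned h = 0;
  for (size_t i = 0; i < n; ++i)
    h = h * 31u + (lanes[i] ? (unsigned) lanes[i]->uid + 1u : 0u);
  return h;
}
```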
    • Georg-Johann Lay's avatar
      AVR: target/98762 - Handle partial clobber in movqi output. · e9fb6efa
      Georg-Johann Lay authored
      	PR target/98762
      gcc/
      	* config/avr/avr.cc (avr_out_movqi_r_mr_reg_disp_tiny): Properly
      	restore the base register when it is partially clobbered.
      gcc/testsuite/
      	* gcc.target/avr/torture/pr98762.c: New test.
    • Tamar Christina's avatar
      ivopts: replace constant_multiple_of with aff_combination_constant_multiple_p [PR114932] · 735edbf1
      Tamar Christina authored
      constant_multiple_of currently implements a more limited version of
      aff_combination_constant_multiple_p.
      
      The only non-debug use of constant_multiple_of goes on to work with the
      values as affine trees.  There is scope for further optimization here:
      I believe that if constant_multiple_of returned the aff_tree after the
      conversion, get_computation_aff_1 could use it instead of manually
      creating the aff_tree.
      
      However, I think it makes sense to first commit this smaller change and
      then change things incrementally.
      
      gcc/ChangeLog:
      
      	PR tree-optimization/114932
      	* tree-ssa-loop-ivopts.cc (constant_multiple_of): Use
      	aff_combination_constant_multiple_p instead.
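What aff_combination_constant_multiple_p decides can be illustrated on a simplified affine form, offset + sum of coef[i] * x_i over a fixed list of variables: VAL is a constant multiple of DIV only if a single multiplier c works for the offset and for every coefficient. The sketch below uses plain long instead of poly_widest_int, a fixed variable count, and hypothetical names; it is not the tree-affine.cc code and ignores overflow (e.g. LONG_MIN / -1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NVARS 3  /* Hypothetical fixed number of variables.  */

/* Simplified affine combination: offset + sum of coef[i] * x_i.  */
struct aff_sketch
{
  long offset;
  long coef[NVARS];
};

/* Check VAL == DIV * c for the multiplier c shared across calls
   through *MULT_SET / *MULT.  */
static bool
multiple_p (long val, long div, bool *mult_set, long *mult)
{
  if (val == 0 && div == 0)
    return true;  /* Any c works, so do not pin it down.  */
  if (div == 0 || val % div != 0)
    return false;
  long c = val / div;  /* VAL == 0 with DIV != 0 gives c == 0.  */
  if (*mult_set && *mult != c)
    return false;
  *mult_set = true;
  *mult = c;
  return true;
}

/* True if VAL == DIV * c for one c; stores c in *MULT.  */
static bool
aff_constant_multiple_p (const struct aff_sketch *val,
                         const struct aff_sketch *div, long *mult)
{
  bool mult_set = false;
  if (!multiple_p (val->offset, div->offset, &mult_set, mult))
    return false;
  for (size_t i = 0; i < NVARS; ++i)
    if (!multiple_p (val->coef[i], div->coef[i], &mult_set, mult))
      return false;
  if (!mult_set)
    *mult = 0;  /* Both combinations are all zero; report c = 0.  */
  return true;
}
```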
    • Tamar Christina's avatar
      ivopts: fix wide_int_constant_multiple_p when VAL and DIV are 0. [PR114932] · 25127123
      Tamar Christina authored
      wide_int_constant_multiple_p checks whether, for two expressions a and
      b, there is a multiplier c such that a == b * c.
      
      This code, however, seems to think that there's no such c when a = 0
      and b = 0, which is of course wrong: every c satisfies 0 == 0 * c.
      
      This fixes it and also fixes the comment.
      
      gcc/ChangeLog:
      
      	PR tree-optimization/114932
      	* tree-affine.cc (wide_int_constant_multiple_p): Support 0 and 0 being
      	multiples.
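A simplified model of why the 0 and 0 case matters, using long instead of poly_widest_int and hypothetical function names. It assumes the pre-fix behaviour pinned the shared multiplier to 0 whenever VAL was 0; under that rule, checking 0 + 2*x against 0 + 1*x fails because the offsets force c = 0 while the coefficients need c = 2.

```c
#include <assert.h>
#include <stdbool.h>

/* Is there a c with VAL == DIV * c that agrees with the multiplier already
   recorded in *MULT_SET / *MULT?  FIXED selects the corrected handling of
   VAL == 0 && DIV == 0.  Overflow is ignored in this sketch.  */
static bool
const_multiple_p (long val, long div, bool fixed, bool *mult_set, long *mult)
{
  if (val == 0)
    {
      /* 0 == 0 * c for every c, so leave the multiplier unconstrained.  */
      if (fixed && div == 0)
        return true;
      /* Otherwise c must be 0 (the assumed old code also came here for
         DIV == 0).  */
      if (*mult_set && *mult != 0)
        return false;
      *mult_set = true;
      *mult = 0;
      return true;
    }
  if (div == 0 || val % div != 0)
    return false;
  long c = val / div;
  if (*mult_set && *mult != c)
    return false;
  *mult_set = true;
  *mult = c;
  return true;
}

/* Check an offset and one coefficient against a shared multiplier.  */
static bool
pair_multiple_p (long val_off, long val_coef, long div_off, long div_coef,
                 bool fixed, long *mult)
{
  bool mult_set = false;
  return const_multiple_p (val_off, div_off, fixed, &mult_set, mult)
         && const_multiple_p (val_coef, div_coef, fixed, &mult_set, mult);
}
```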
    • Richard Sandiford's avatar
      Give fast DCE a separate dirty flag · 47ea6bdd
      Richard Sandiford authored
      Thomas pointed out that we sometimes failed to eliminate some dead code
      (specifically clobbers of otherwise unused registers) on nvptx when
      late-combine is enabled.  This happens because:
      
      - combine is able to optimise the function in a way that exposes dead code.
        This leaves the df information in a "dirty" state.
      
      - late_combine calls df_analyze without DF_LR_RUN_DCE set.
        This updates the df information and clears the "dirty" state.
      
      - late_combine doesn't find any extra optimisations, and so leaves
        the df information up-to-date.
      
      - if_after_combine (ce2) calls df_analyze with DF_LR_RUN_DCE set.
        Because the df information is already up-to-date, fast DCE is
        not run.
      
      The upshot is that running late-combine has the effect of suppressing
      a DCE opportunity that would have been noticed without late_combine.
      
      I think this shows that we should track the state of the DCE separately
      from the LR problem.  Every pass updates the latter, but not all passes
      update the former.
      
      gcc/
      	* df.h (DF_LR_DCE): New df_problem_id.
      	(df_lr_dce): New macro.
      	* df-core.cc (rest_of_handle_df_finish): Check for a null free_fun.
      	* df-problems.cc (df_lr_finalize): Split out fast DCE handling to...
      	(df_lr_dce_finalize): ...this new function.
      	(problem_LR_DCE): New df_problem.
      	(df_lr_add_problem): Register LR_DCE rather than LR itself.
      	* dce.cc (fast_dce): Clear df_lr_dce->solutions_dirty.
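The pass sequence in the commit message can be modelled as a small state machine. In this sketch (all names hypothetical, not the df API), a single dirty flag lets late_combine's plain df_analyze hide the pending DCE from ce2, whereas a separate DCE flag, cleared only when fast DCE actually runs, keeps the opportunity alive.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the dataflow state.  */
struct df_model
{
  bool separate_dce_flag;  /* false: old scheme, true: new scheme.  */
  bool lr_dirty;           /* LR solution is out of date.  */
  bool dce_dirty;          /* Fast DCE may find something (new scheme).  */
  int dce_runs;            /* How often fast DCE actually ran.  */
};

/* A pass changed the insn stream (e.g. combine exposing dead code).  */
static void
df_model_changed (struct df_model *df)
{
  df->lr_dirty = true;
  df->dce_dirty = true;
}

/* Model of df_analyze, optionally with DF_LR_RUN_DCE set.  */
static void
df_model_analyze (struct df_model *df, bool run_dce)
{
  /* Old scheme: DCE only runs if LR itself is dirty.  */
  bool want_dce = df->separate_dce_flag ? df->dce_dirty : df->lr_dirty;
  if (run_dce && want_dce)
    {
      df->dce_runs++;
      df->dce_dirty = false;  /* Cf. fast_dce clearing its own flag.  */
    }
  df->lr_dirty = false;
}

/* combine -> late_combine (no DCE, no changes) -> ce2 (with DCE).  */
static int
simulate (bool separate_dce_flag)
{
  struct df_model df = { separate_dce_flag, false, false, 0 };
  df_model_changed (&df);         /* combine exposes dead code.  */
  df_model_analyze (&df, false);  /* late_combine: LR clean again.  */
  df_model_analyze (&df, true);   /* ce2 requests DCE.  */
  return df.dce_runs;
}
```

With one shared flag, simulate (false) never runs DCE; with a separate flag, simulate (true) runs it once in ce2.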
    • liuhongt's avatar
      Move runtime check into a separate function and guard it with target ("no-avx") · 239ad907
      liuhongt authored
      This patch avoids a SIGILL on non-AVX512 machines caused by kmovd being
      generated in the dynamic check.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/115748
      	* gcc.target/i386/avx512-check.h: Move runtime check into a
      	separate function and guard it with target ("no-avx").
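The idea can be sketched with GCC's target attribute and __builtin_cpu_supports: the CPU-feature test lives in its own function compiled without AVX, so the compiler cannot use AVX-512 instructions such as kmovd inside it, even when the rest of the file is built for AVX-512. This is a standalone illustration with a hypothetical function name, not the contents of avx512-check.h; it assumes an x86 target and a GCC-compatible compiler.

```c
#include <assert.h>

#if defined(__x86_64__) || defined(__i386__)
/* Compiled without AVX (and therefore without AVX-512), so it is safe to
   execute on any x86 CPU; noinline keeps it out of AVX-512 callers.  */
static int __attribute__ ((target ("no-avx"), noinline))
cpu_has_avx512f (void)
{
  __builtin_cpu_init ();
  /* __builtin_cpu_supports returns nonzero, not necessarily 1.  */
  return __builtin_cpu_supports ("avx512f") != 0;
}
#else
static int
cpu_has_avx512f (void)
{
  return 0;  /* Not an x86 target.  */
}
#endif
```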
    • Pan Li's avatar
      RISC-V: Fix asm check failure for truncated after SAT_SUB · ab3e3d2f
      Pan Li authored
      
      It seems that the asm check is incorrect for truncation after SAT_SUB:
      we should check for the vx form of vssubu instead of the vv form.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c:
      	Update vssubu check from vv to vx.
      	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c:
      	Ditto.
      	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c:
      	Ditto.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
    • Richard Biener's avatar
      tree-optimization/115764 - testcase for BB SLP issue · 2be2145f
      Richard Biener authored
      The following adds a testcase for a CSE issue in BB SLP two-operator
      handling that shows up when we make those nodes CSE-aware by providing
      SLP_TREE_SCALAR_STMTS for them.  This was reduced from 526.blender_r.
      
      	PR tree-optimization/115764
      	* gcc.dg/vect/bb-slp-76.c: New testcase.
    • Lewis Hyatt's avatar
      preprocessor: Create the parser before handling command-line includes [PR115312] · 038d64f6
      Lewis Hyatt authored
      Since r14-2893, we create a parser object in preprocess-only mode for the
      purpose of parsing #pragma while preprocessing. The parser object was
      formerly created after calling c_finish_options(), which leads to problems
      on platforms that don't use stdc-predef.h (such as MinGW, as reported in
      the PR). On such platforms, the call to c_finish_options() will process
      the first command-line-specified include file. If that includes a PCH, then
      c-ppoutput.cc will encounter a state it did not anticipate. Fix it by
      creating the parser prior to calling c_finish_options().
      
      gcc/c-family/ChangeLog:
      
      	PR pch/115312
      	* c-opts.cc (c_common_init): Call c_init_preprocess() before
      	c_finish_options() so that a parser is available to process any
      	includes specified on the command line.
      
      gcc/testsuite/ChangeLog:
      
      	PR pch/115312
      	* g++.dg/pch/pr115312.C: New test.
      	* g++.dg/pch/pr115312.Hs: New test.
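The ordering constraint behind this fix can be shown with a toy model: handling the first command-line include may need the parser (for #pragma or a PCH), so the parser has to exist before that processing begins. The names below are hypothetical stand-ins, not the c-opts.cc functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the preprocess-only parser object.  */
struct parser_model
{
  bool ready;
};

/* Hypothetical model of handling the first -include file.  Returns false
   for the unanticipated "no parser yet" state.  */
static bool
handle_cmdline_include (const struct parser_model *parser)
{
  return parser != NULL && parser->ready;
}

/* Model of c_common_init: PARSER_FIRST selects the fixed ordering.  */
static bool
common_init_model (bool parser_first)
{
  struct parser_model storage = { true };
  const struct parser_model *parser = NULL;

  if (parser_first)
    parser = &storage;                        /* Cf. c_init_preprocess ().  */
  bool ok = handle_cmdline_include (parser);  /* Cf. c_finish_options ().  */
  if (!parser_first)
    parser = &storage;                        /* Old position: too late.  */
  return ok;
}
```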
    • GCC Administrator's avatar
      Daily bump. · 75198248
      GCC Administrator authored