  1. Jun 06, 2024
    • nvptx, libgcc: Stub unwinding implementation · a29c5852
      Thomas Schwinge authored
      
      Adding stub '_Unwind_Backtrace', '_Unwind_GetIPInfo' functions is necessary
      for linking libbacktrace, as a normal (non-'LIBGFOR_MINIMAL') configuration
      of libgfortran wants to do, for example.
      
      The file 'libgcc/config/nvptx/unwind-nvptx.c' is copied from
      'libgcc/config/gcn/unwind-gcn.c'.
      
      libgcc/ChangeLog:
      
      	* config/nvptx/t-nvptx: Add unwind-nvptx.c.
      	* config/nvptx/unwind-nvptx.c: New file.
      
      Co-authored-by: Andrew Stubbs <ams@gcc.gnu.org>
    • nvptx offloading: Global constructor, destructor support, via nvptx-tools 'ld' · 5bbe5350
      Thomas Schwinge authored
      This extends commit d9c90c82
      "nvptx target: Global constructor, destructor support, via nvptx-tools 'ld'"
      for offloading.
      
      	libgcc/
      	* config/nvptx/gbl-ctors.c ["mgomp"]
      	(__do_global_ctors__entry__mgomp)
      	(__do_global_dtors__entry__mgomp): New.
      	[!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry):
      	New.
      	libgomp/
      	* plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New.
      	(nvptx_close_device, GOMP_OFFLOAD_load_image)
      	(GOMP_OFFLOAD_unload_image): Call it.
    • nvptx: Make 'nvptx_uniform_warp_check' fit for non-full-warp execution, via 'vote.all.pred' · b4e68dd9
      Thomas Schwinge authored
      For example, this allows for '-muniform-simt' code to be executed
      single-threaded, which currently fails (device-side 'trap'): the '0xffffffff'
      bitmask isn't correct if not all 32 threads of a warp are active.  The same
      issue/fix, I suppose but have not verified, would apply if we were to allow for
      OpenACC 'vector_length' smaller than 32, for example for OpenACC 'serial'.
      
      We use 'nvptx_uniform_warp_check' only for PTX ISA version less than 6.0.
      Otherwise we're using 'nvptx_warpsync', which emits 'bar.warp.sync 0xffffffff',
      which evidently appears to do the right thing.  (I've tested '-muniform-simt'
      code executing single-threaded.)
      
      The change that I proposed on 2022-12-15 was to emit PTX code to calculate
      '(1 << %ntid.x) - 1' as the actual bitmask to use instead of '0xffffffff'.
      This works, but the PTX JIT generates SASS code to do this computation.
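
      The bitmask computation described above can be sketched in C as follows.
      This is only an illustration of the arithmetic, not the PTX that was
      proposed; the helper names are hypothetical, and 'ntid_x' stands in for
      the value of '%ntid.x'.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: one bit set per active thread, that is
   (1 << ntid_x) - 1, for 1 <= ntid_x <= 32.  */
static uint32_t
active_lane_mask (unsigned ntid_x)
{
  /* Shifting a 32-bit value by 32 is undefined in C, so widen first.  */
  return (uint32_t) ((UINT64_C (1) << ntid_x) - 1);
}

/* Only a full warp of 32 threads yields the hard-coded 0xffffffff.  */
static void
check_lane_masks (void)
{
  assert (active_lane_mask (32) == UINT32_C (0xffffffff));
  assert (active_lane_mask (1) != UINT32_C (0xffffffff));
}
```

      With a single active thread the mask is 0x1, so a comparison against
      '0xffffffff' cannot succeed, which matches the failure described above.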
      
      In turn, this change now uses PTX 'vote.all.pred' -- which even simplifies
      the original code a little bit, see the following exemplary SASS 'diff' before
      vs. after this change:
      
          [...]
                    /*[...]*/                   SYNC                                                        (*"BRANCH_TARGETS .L_x_332"*)        }
            .L_x_332:
          -         /*[...]*/                   VOTE.ANY R9, PT, PT ;
          +         /*[...]*/                   VOTE.ALL P1, PT ;
          -         /*[...]*/                   ISETP.NE.U32.AND P1, PT, R9, -0x1, PT ;
          -         /*[...]*/              @!P1 BRA `(.L_x_333) ;
          +         /*[...]*/               @P1 BRA `(.L_x_333) ;
                    /*[...]*/                   BPT.TRAP 0x1 ;
            .L_x_333:
          -         /*[...]*/               @P1 EXIT ;
          +         /*[...]*/              @!P1 EXIT ;
          [...]
      
      	gcc/
      	* config/nvptx/nvptx.md (nvptx_uniform_warp_check): Make fit for
      	non-full-warp execution, via 'vote.all.pred'.
      	gcc/testsuite/
      	* gcc.target/nvptx/nvptx.exp
      	(check_effective_target_default_ptx_isa_version_at_least_6_0):
      	New.
      	* gcc.target/nvptx/uniform-simt-2.c: Adjust.
      	* gcc.target/nvptx/uniform-simt-5.c: New.
    • Clean up after newlib "nvptx: In offloading execution, map '_exit' to 'abort' [GCC PR85463]" · 395ac041
      Thomas Schwinge authored
      	PR target/85463
      	libgfortran/
      	* runtime/minimal.c [__nvptx__] (exit): Don't override.
      	libgomp/
      	* config/nvptx/error.c (exit): Don't override.
      	* testsuite/libgomp.oacc-fortran/error_stop-1.f: Update.
      	* testsuite/libgomp.oacc-fortran/error_stop-2.f: Likewise.
      	* testsuite/libgomp.oacc-fortran/error_stop-3.f: Likewise.
      	* testsuite/libgomp.oacc-fortran/stop-1.f: Likewise.
      	* testsuite/libgomp.oacc-fortran/stop-2.f: Likewise.
      	* testsuite/libgomp.oacc-fortran/stop-3.f: Likewise.
    • Vect: Support IFN SAT_SUB for unsigned vector int · 2d11de35
      Pan Li authored
      
      This patch adds support for .SAT_SUB on unsigned integer vectors.
      Consider the following example code:
      
      void
      vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
      {
        for (unsigned i = 0; i < n; i++)
          out[i] = (x[i] - y[i]) & (-(uint64_t)(x[i] >= y[i]));
      }
      
      Before this patch:
      void
      vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
      {
        ...
        _77 = .SELECT_VL (ivtmp_75, POLY_INT_CST [2, 2]);
        ivtmp_56 = _77 * 8;
        vect__4.7_59 = .MASK_LEN_LOAD (vectp_x.5_57, 64B, { -1, ... }, _77, 0);
        vect__6.10_63 = .MASK_LEN_LOAD (vectp_y.8_61, 64B, { -1, ... }, _77, 0);
      
        mask__7.11_64 = vect__4.7_59 >= vect__6.10_63;
        _66 = .COND_SUB (mask__7.11_64, vect__4.7_59, vect__6.10_63, { 0, ... });
      
        .MASK_LEN_STORE (vectp_out.15_71, 64B, { -1, ... }, _77, 0, _66);
        vectp_x.5_58 = vectp_x.5_57 + ivtmp_56;
        vectp_y.8_62 = vectp_y.8_61 + ivtmp_56;
        vectp_out.15_72 = vectp_out.15_71 + ivtmp_56;
        ivtmp_76 = ivtmp_75 - _77;
        ...
      }
      
      After this patch:
      void
      vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
      {
        ...
        _76 = .SELECT_VL (ivtmp_74, POLY_INT_CST [2, 2]);
        ivtmp_60 = _76 * 8;
        vect__4.7_63 = .MASK_LEN_LOAD (vectp_x.5_61, 64B, { -1, ... }, _76, 0);
        vect__6.10_67 = .MASK_LEN_LOAD (vectp_y.8_65, 64B, { -1, ... }, _76, 0);
      
        vect_patt_37.11_68 = .SAT_SUB (vect__4.7_63, vect__6.10_67);
      
        .MASK_LEN_STORE (vectp_out.12_70, 64B, { -1, ... }, _76, 0, vect_patt_37.11_68);
        vectp_x.5_62 = vectp_x.5_61 + ivtmp_60;
        vectp_y.8_66 = vectp_y.8_65 + ivtmp_60;
        vectp_out.12_71 = vectp_out.12_70 + ivtmp_60;
        ivtmp_75 = ivtmp_74 - _76;
        ...
      }
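
      As a scalar cross-check of the source pattern above, the following sketch
      compares the branchless form with a plain saturating subtraction.  The
      function names are hypothetical and only serve the illustration; this
      says nothing about how the vectorizer matches the pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless form from the example: the mask is all-ones when x >= y
   and zero otherwise.  */
static uint64_t
sat_sub_u64_branchless (uint64_t x, uint64_t y)
{
  return (x - y) & (-(uint64_t) (x >= y));
}

/* Reference form with an explicit comparison.  */
static uint64_t
sat_sub_u64_reference (uint64_t x, uint64_t y)
{
  return x >= y ? x - y : 0;
}

/* Compare both forms on a few boundary values.  */
static void
check_sat_sub (void)
{
  static const uint64_t v[] = { 0, 1, 2, UINT64_MAX - 1, UINT64_MAX };
  const unsigned n = sizeof v / sizeof v[0];

  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      assert (sat_sub_u64_branchless (v[i], v[j])
              == sat_sub_u64_reference (v[i], v[j]));
}
```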
      
      The following test suites pass with this patch:
      * The x86 bootstrap test.
      * The x86 full regression test.
      * The riscv full regression tests.
      
      gcc/ChangeLog:
      
      	* match.pd: Add new form for vector mode recog.
      	* tree-vect-patterns.cc (gimple_unsigned_integer_sat_sub): Add
      	new match func decl;
      	(vect_recog_build_binary_gimple_call): Extract helper func to
      	build gcall with given internal_fn.
      	(vect_recog_sat_sub_pattern): Add new func impl to recog .SAT_SUB.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
    • lto: Remove random_seed from section name. · 346f33e2
      Michal Jires authored
      This patch removes suffixes from section names during LTO linking.
      
      These suffixes were originally added for ld -r to work (PR lto/44992).
      They were added to all LTO object files, but are only useful before WPA.
      After that they waste space, and if kept random, make LTO caching impossible.
      
      Bootstrapped/regtested on x86_64-pc-linux-gnu
      
      gcc/ChangeLog:
      
      	* lto-streamer.cc (lto_get_section_name): Remove suffixes after WPA.
      
      gcc/lto/ChangeLog:
      
      	* lto-common.cc (lto_section_with_id): Don't load suffix during LTRANS.
    • lto: Skip flag OPT_fltrans_output_list_. · ca43678c
      Michal Jires authored
      Bootstrapped/regtested on x86_64-pc-linux-gnu
      
      gcc/ChangeLog:
      
      	* lto-opts.cc (lto_write_options): Skip OPT_fltrans_output_list_.
    • RISC-V: Regenerate opt urls. · 037fc4d1
      Robin Dapp authored
      I wasn't aware that I needed to regenerate the opt urls when
      adding an option.  This patch does that.
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv.opt.urls: Regenerate.
    • [APX CCMP] Support ccmp for float compare · 0b6cea87
      Hongyu Wang authored
      The ccmp insn itself doesn't support fp compares, but x86 has the fp comi
      insns, which set EFLAGS and can therefore provide the scc input to ccmp.
      Allow scalar fp compares in ix86_gen_ccmp_first, except for
      ORDERED/UNORDERED compares, which cannot be identified in ccmp.
      
      gcc/ChangeLog:
      
      	* config/i386/i386-expand.cc (ix86_gen_ccmp_first):
      	Add fp compare and check the allowed fp compare type.
      	(ix86_gen_ccmp_next): Adjust compare_code input to ccmp for
      	fp compare.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/apx-ccmp-1.c: Add test for fp compare.
      	* gcc.target/i386/apx-ccmp-2.c: Likewise.
    • [APX CCMP] Adjust strategy for selecting ccmp candidates · 23db8730
      Hongyu Wang authored
      For the general ccmp scenario, the tree sequence looks like
      
      _1 = (a < b)
      _2 = (c < d)
      _3 = _1 & _2
      
      The current ccmp expansion tries both orders of the compares _1 and _2,
      computes the costs (cost and cost2) of expanding _1 or _2 first, and then
      returns the sequence with the lower cost.
      
      It is possible that one expansion succeeds and the other fails.
      For example, x86 has int ccmp but not fp ccmp, so a combined fp and
      int comparison must be ordered such that the fp comparison happens
      first.  The costs are not meaningful for failed expansions.
      
      Check the expand_ccmp_next results ret and ret2, and return the valid one
      before comparing costs.
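
      The selection rule can be sketched as follows.  This is a simplified
      model under stated assumptions, not the code in ccmp.cc: 'struct
      expansion' and 'pick_expansion' are hypothetical names, and a NULL
      'insns' pointer stands in for a failed expansion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one expansion attempt: 'insns' is NULL when
   the expansion failed, and 'cost' is only meaningful otherwise.  */
struct expansion
{
  const char *insns;
  int cost;
};

/* Choose between two attempts.  A failed attempt never wins, and costs
   are compared only when both attempts succeeded.  */
static const struct expansion *
pick_expansion (const struct expansion *ret, const struct expansion *ret2)
{
  if (ret->insns == NULL)
    return ret2->insns != NULL ? ret2 : NULL;
  if (ret2->insns == NULL)
    return ret;
  return ret2->cost < ret->cost ? ret2 : ret;
}

/* A failed attempt with a small, meaningless cost must not be chosen.  */
static void
check_pick_expansion (void)
{
  struct expansion ok = { "fp-first", 8 };
  struct expansion failed = { NULL, 0 };

  assert (pick_expansion (&ok, &failed) == &ok);
  assert (pick_expansion (&failed, &ok) == &ok);
}
```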
      
      gcc/ChangeLog:
      
      	* ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of
      	expand_ccmp_next, returns the valid one first instead of
      	comparing cost.
    • [APX CCMP] Support APX CCMP · c989e59f
      Hongyu Wang authored
      The APX CCMP feature implements a conditional compare, which executes the
      compare only when EFLAGS matches a given condition.
      
      CCMP introduces a default flags value (dfv): when the conditional compare
      does not execute, it sets the flags directly according to dfv.
      
      The instruction looks like
      
      ccmpeq {dfv=sf,of,cf,zf}  %rax, %r16
      
      This instruction tests whether EFLAGS matches the condition code EQ.  If it
      does, it compares %rax and %r16 like a legacy cmp.  If not, EFLAGS is
      updated according to dfv, which here means SF, OF, CF and ZF are set.  PF
      is set according to CF in dfv, and AF is always cleared.
      
      The dfv part can be any combination of sf,of,cf,zf, such as {dfv=cf,zf},
      which sets only CF and ZF and clears the others, or {dfv=}, which clears
      all EFLAGS.
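
      The flag behaviour described above can be modelled in C as follows.  This
      is a sketch based only on the description in this message: the type and
      function names are hypothetical, and the operand order of the compare is
      simplified to "a - b" rather than following the assembler syntax.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the six status flags.  */
struct flags { bool cf, pf, af, zf, sf, of; };

/* dfv operand: which of SF, OF, CF, ZF to set when the compare is skipped.  */
struct dfv { bool sf, of, cf, zf; };

/* Flags of a legacy 64-bit compare, that is of computing a - b.  */
static struct flags
cmp64 (uint64_t a, uint64_t b)
{
  uint64_t r = a - b;
  uint8_t low = (uint8_t) r;
  struct flags f;

  low ^= low >> 4;
  low ^= low >> 2;
  low ^= low >> 1;
  f.pf = !(low & 1);                   /* even parity of the low byte */
  f.cf = a < b;                        /* unsigned borrow */
  f.af = ((a ^ b ^ r) >> 4) & 1;       /* borrow out of bit 3 */
  f.zf = r == 0;
  f.sf = r >> 63;
  f.of = ((a ^ b) & (a ^ r)) >> 63;    /* signed overflow */
  return f;
}

/* Model of 'ccmpeq {dfv=...}': compare if ZF is set on entry, otherwise
   load the flags from dfv.  */
static struct flags
ccmpeq (struct flags in, struct dfv dfv, uint64_t a, uint64_t b)
{
  struct flags f;

  if (in.zf)
    return cmp64 (a, b);
  f.sf = dfv.sf;
  f.of = dfv.of;
  f.cf = dfv.cf;
  f.zf = dfv.zf;
  f.pf = dfv.cf;                       /* PF follows CF from dfv */
  f.af = false;                        /* AF is always cleared */
  return f;
}
```

      For example, when ZF is clear on entry and the operand is {dfv=cf,zf},
      the model yields CF, ZF and PF set, with SF, OF and AF clear.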
      
      To enable CCMP, we implemented the target hooks TARGET_GEN_CCMP_FIRST and
      TARGET_GEN_CCMP_NEXT to reuse the current ccmp infrastructure.  We also
      extended the cstorem4 optab to support storing different CCmodes to fit
      the current ccmp infrastructure.
      
      gcc/ChangeLog:
      
      	* config/i386/i386-expand.cc (ix86_gen_ccmp_first): New function
      	that tests if the first compare can be generated.
      	(ix86_gen_ccmp_next): New function to emit a single compare and ccmp
      	sequence.
      	* config/i386/i386-opts.h (enum apx_features): Add apx_ccmp.
      	* config/i386/i386-protos.h (ix86_gen_ccmp_first): New proto
      	declare.
      	(ix86_gen_ccmp_next): Likewise.
      	(ix86_get_flags_cc): Likewise.
      	* config/i386/i386.cc (ix86_flags_cc): New enum.
      	(ix86_ccmp_dfv_mapping): New string array to map conditional
      	code to dfv.
      	(ix86_print_operand): Handle special dfv flag for CCMP.
      	(ix86_get_flags_cc): New function to return x86 CC enum.
      	(TARGET_GEN_CCMP_FIRST): Define.
      	(TARGET_GEN_CCMP_NEXT): Likewise.
      	* config/i386/i386.h (TARGET_APX_CCMP): Define.
      	* config/i386/i386.md (@ccmp<mode>): New define_insn to support
      	ccmp.
      	(UNSPEC_APX_DFV): New unspec for ccmp dfv.
      	(ALL_CC): New mode iterator.
      	(cstorecc4): Change to ...
      	(cstore<mode>4) ... this, use ALL_CC to loop through all
      	available CCmodes.
      	* config/i386/i386.opt (apx_ccmp): Add enum value for ccmp.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/apx-ccmp-1.c: New compile test.
      	* gcc.target/i386/apx-ccmp-2.c: New runtime test.
    • [APX] Adjust target-support check [PR 115341] · f46d54a2
      Hongyu Wang authored
      The current apxf target check does not specify the sub-features that the
      assembler supports, so with older binutils the check fails at the assembly
      stage for new APX features like NF, CCMP or CFCMOV.  Adjust the assembler
      check to cover all APX sub-features.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/115341
      	* lib/target-supports.exp (check_effective_target_apxf):
      	Check for all apx sub-features.
    • Allow single-lane SLP in-order reductions · 4653b682
      Richard Biener authored
      The single-lane case isn't different from non-SLP, no re-association
      implied.  But the transform stage cannot handle a conditional reduction
      op which isn't checked during analysis - this makes it work, exercised
      with a single-lane non-reduction-chain by gcc.target/i386/pr112464.c
      
      	* tree-vect-loop.cc (vectorizable_reduction): Allow
      	single-lane SLP in-order reductions.
      	(vectorize_fold_left_reduction): Handle SLP reduction with
      	conditional reduction op.
    • Add double reduction support for SLP vectorization · 2ee41ef7
      Richard Biener authored
      The following makes double reduction vectorization work when
      using (single-lane) SLP vectorization.
      
      	* tree-vect-loop.cc (vect_analyze_scalar_cycles_1): Queue
      	double reductions in LOOP_VINFO_REDUCTIONS.
      	(vect_create_epilog_for_reduction): Remove asserts disabling
      	SLP for double reductions.
      	(vectorizable_reduction): Analyze SLP double reductions
      	only once and start off the correct places.
      	* tree-vect-slp.cc (vect_get_and_check_slp_defs): Allow
      	vect_double_reduction_def.
      	(vect_build_slp_tree_2): Fix condition for the ignored
      	reduction initial values.
      	* tree-vect-stmts.cc (vect_analyze_stmt): Allow
      	vect_double_reduction_def.
    • Allow single-lane COND_REDUCTION vectorization · 202a9c8f
      Richard Biener authored
      The following enables single-lane COND_REDUCTION vectorization.
      
      	* tree-vect-loop.cc (vect_create_epilog_for_reduction):
      	Adjust for single-lane COND_REDUCTION SLP vectorization.
      	(vectorizable_reduction): Likewise.
      	(vect_transform_cycle_phi): Likewise.
    • Relax COND_EXPR reduction vectorization SLP restriction · 28edeb14
      Richard Biener authored
      Allow single-lane SLP, except in the case where we need to swap the arms.
      
      	* tree-vect-stmts.cc (vectorizable_condition): Allow
      	single-lane SLP, but not when we need to swap then and
      	else clause.
    • libgomp: Mark Loop transformation constructs as implemented in the implementation status · 6a6bab4b
      Jakub Jelinek authored
      The implementation has been committed in r15-1037.
      
      2024-06-06  Jakub Jelinek  <jakub@redhat.com>
      
      	* libgomp.texi (OpenMP 5.1 status): Mark Loop transformation constructs
      	as implemented.
    • MIPS: Need COSTS_N_INSNS in mips_insn_cost · edd90d6d
      YunQiang Su authored
      In mips_insn_cost, COSTS_N_INSNS is missing when we return the cost
      if count * ratio > 0.
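
      Why the missing scaling matters can be sketched as follows.  This is an
      illustration, not the code from mips.cc: the two helper functions are
      hypothetical, and the macro is reproduced here on the assumption that it
      matches the definition in GCC's rtl.h.

```c
#include <assert.h>

/* Assumed to match the definition in GCC's rtl.h: costs are expressed
   in units where one "average" instruction costs 4.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical model of the return value without the scaling.  */
static int
insn_cost_unscaled (int count, int ratio)
{
  return count * ratio;
}

/* Hypothetical model of the return value with the scaling applied.  */
static int
insn_cost_scaled (int count, int ratio)
{
  return COSTS_N_INSNS (count * ratio);
}

/* Unscaled, a two-instruction sequence looks cheaper than a single
   instruction costed elsewhere as COSTS_N_INSNS (1).  */
static void
check_cost_scaling (void)
{
  assert (insn_cost_unscaled (2, 1) < COSTS_N_INSNS (1));
  assert (insn_cost_scaled (2, 1) > COSTS_N_INSNS (1));
}
```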
      
      gcc
      	* config/mips/mips.cc (mips_insn_cost): Add missing COSTS_N_INSNS
      	to count.
    • Refine testcase for power10. · fcfce55c
      liuhongt authored
      For power10, there are 3 extra REG_EQUIV notes containing (fix:SI.  To
      avoid the failure, check that (fix:SI comes from the pattern, not from a
      note.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/115365
      	* gcc.dg/pr100927.c: Don't scan fix:SI from the note.
    • [libstdc++] add _GLIBCXX_CLANG to work around predefined __clang__ · 67be156f
      Alexandre Oliva authored
      A proprietary embedded operating system that uses clang as its primary
      compiler ships headers that require __clang__ to be defined.  Defining
      that macro causes libstdc++ to adopt workarounds that work for clang
      but that break for GCC.
      
      So, introduce a _GLIBCXX_CLANG macro, and a convention to test for it
      rather than for __clang__, so that a GCC variant that adds -D__clang__
      to satisfy system headers can also -D_GLIBCXX_CLANG=0 to avoid
      workarounds that are not meant for GCC.
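
      One way such a convention can be expressed with the preprocessor is
      sketched below.  The exact logic in bits/c++config is not quoted in this
      message, so the conditional block is an assumption, and the two helper
      functions are hypothetical.  The sketch is plain C so that it builds
      without libstdc++.

```c
#include <assert.h>

/* Honour an explicit setting first: a value of zero or less (for example
   -D_GLIBCXX_CLANG=0) turns the macro off.  Otherwise fall back to
   whatever __clang__ says.  */
#if defined (_GLIBCXX_CLANG)
# if (_GLIBCXX_CLANG + 0) <= 0
#  undef _GLIBCXX_CLANG
# endif
#elif defined (__clang__)
# define _GLIBCXX_CLANG __clang__
#endif

/* Library code then tests _GLIBCXX_CLANG rather than __clang__.  */
static int
clang_workarounds_enabled (void)
{
#ifdef _GLIBCXX_CLANG
  return 1;
#else
  return 0;
#endif
}

/* What the compiler itself claims, for comparison.  */
static int
compiler_claims_clang (void)
{
#ifdef __clang__
  return 1;
#else
  return 0;
#endif
}

/* With no explicit -D_GLIBCXX_CLANG setting, both answers agree.  */
static void
check_convention (void)
{
  assert (clang_workarounds_enabled () == compiler_claims_clang ());
}
```

      Building this with '-D__clang__ -D_GLIBCXX_CLANG=0' makes the two
      functions disagree, which is the case the commit is about.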
      
      I've left fast_float and ryu files alone, their tests for __clang__
      don't seem to be harmful for GCC, they don't include bits/c++config,
      and patching such third-party files would just make trouble for
      updating them without visible benefit.  pstl_config.h, though also
      imported, required adjustment.
      
      
      for  libstdc++-v3/ChangeLog
      
      	* include/bits/c++config (_GLIBCXX_CLANG): Define or undefine.
      	* include/bits/locale_facets_nonio.tcc: Test for it.
      	* include/bits/stl_bvector.h: Likewise.
      	* include/c_compatibility/stdatomic.h: Likewise.
      	* include/experimental/bits/simd.h: Likewise.
      	* include/experimental/bits/simd_builtin.h: Likewise.
      	* include/experimental/bits/simd_detail.h: Likewise.
      	* include/experimental/bits/simd_x86.h: Likewise.
      	* include/experimental/simd: Likewise.
      	* include/std/complex: Likewise.
      	* include/std/ranges: Likewise.
      	* include/std/variant: Likewise.
      	* include/pstl/pstl_config.h: Likewise.
    • Adjust rtx_cost for MEM to enable more simplification · 961dd0d6
      liuhongt authored
      For CONST_VECTOR_DUPLICATE_P in constant_pool, it is just broadcast or
      variants in ix86_vector_duplicate_simode_const.
      Adjust the cost to COSTS_N_INSNS (2) + speed which should be a little
      bit larger than broadcast.
      
      gcc/ChangeLog:
      	PR target/114428
      	* config/i386/i386.cc (ix86_rtx_costs): Adjust cost for
      	CONST_VECTOR_DUPLICATE_P in constant_pool.
      	* config/i386/i386-expand.cc (ix86_broadcast_from_constant):
      	Remove static.
      	* config/i386/i386-protos.h (ix86_broadcast_from_constant):
      	Declare.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/pr114428.c: New test.
    • Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode. · 7876cde2
      liuhongt authored
      When mask is ((1 << (prec - imm)) - 1), which is used to clear the upper
      bits of A, the expression can be simplified to LSHIFTRT.
      
      i.e., simplify
      (and:v8hi
        (ashiftrt:v8hi A 8)
        (const_vector 0xff x8))
      to
      (lshiftrt:v8hi A 8)
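
      The identity can be checked exhaustively for one 16-bit lane with the
      sketch below (prec = 16, imm = 8, so the mask is 0xff).  The function
      names are hypothetical.  Converting to int16_t and right-shifting a
      negative value are implementation-defined in C; the sketch assumes the
      two's complement, arithmetic-shift behaviour that GCC documents.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of (and (ashiftrt A 8) 0xff).  */
static uint16_t
ashiftrt_then_mask (uint16_t a)
{
  return (uint16_t) (((int16_t) a >> 8) & 0xff);
}

/* One lane of (lshiftrt A 8).  */
static uint16_t
lshiftrt (uint16_t a)
{
  return (uint16_t) (a >> 8);
}

/* The mask removes exactly the sign-extension bits, so both forms
   agree for every 16-bit input.  */
static void
check_all_lane_values (void)
{
  for (uint32_t a = 0; a <= UINT16_MAX; a++)
    assert (ashiftrt_then_mask ((uint16_t) a) == lshiftrt ((uint16_t) a));
}
```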
      
      gcc/ChangeLog:
      
      	PR target/114428
      	* simplify-rtx.cc
      	(simplify_context::simplify_binary_operation_1):
      	Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for
      	specific mask.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/pr114428-1.c: New test.
    • Daily bump. · 10cb3336
      GCC Administrator authored
  2. Jun 05, 2024
    • contrib: Fix spelling and capitalization in header-tools · 66fa2f10
      Jonathan Wakely authored
      contrib/header-tools/ChangeLog:
      
      	* README: Fix spelling and capitalization typos.
      	* gcc-order-headers: Fix spelling typo.
    • contrib: header-tools scripts updated to python3 · ac6fb0ff
      Sundeep KOKKONDA authored
      
      The scripts in contrib/header-tools/ are incompatible with python3.
      This updates them to use python3.
      
      contrib/header-tools/ChangeLog:
      
      	* count-headers: Adapt to Python 3.
      	* gcc-order-headers: Likewise.
      	* graph-header-logs: Likewise.
      	* graph-include-web: Likewise.
      	* headerutils.py: Likewise.
      	* included-by: Likewise.
      	* reduce-headers: Likewise.
      	* replace-header: Likewise.
      	* show-headers: Likewise.
      
      Signed-off-by: Sundeep KOKKONDA <sundeep.kokkonda@windriver.com>
    • check_GNU_style: Use raw strings. · 03e1a727
      Robin Dapp authored
      This silences some warnings when using check_GNU_style.
      
      contrib/ChangeLog:
      
      	* check_GNU_style_lib.py: Use raw strings for regexps.
    • RISC-V: Introduce -mvector-strict-align. · 68b0742a
      Robin Dapp authored
      This patch disables movmisalign by default and introduces
      the -mno-vector-strict-align option to override it and re-enable
      movmisalign.  For now, generic-ooo is the only uarch that supports
      misaligned vector access.
      
      The patch also adds a check_effective_target_riscv_v_misalign_ok to
      the testsuite which enables or disables the vector misalignment tests
      depending on whether the target under test can execute a misaligned
      vle32.
      
      Changes from v3:
       - Addressed Kito's comments.
       - Made -mscalar-strict-align a real alias.
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
      	Move from here...
      	* config/riscv/riscv.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
      	...to here and map to riscv_vector_unaligned_access_p.
      	* config/riscv/riscv.opt: Add -mvector-strict-align.
      	* config/riscv/riscv.cc (struct riscv_tune_param): Add
      	vector_unaligned_access.
      	(riscv_override_options_internal): Set
      	riscv_vector_unaligned_access_p.
      	* doc/invoke.texi: Document -mvector-strict-align.
      
      gcc/testsuite/ChangeLog:
      
      	* lib/target-supports.exp: Add
      	check_effective_target_riscv_v_misalign_ok.
      	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Add
      	-mno-vector-strict-align.
      	* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: Ditto.
      	* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: Ditto.
      	* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: Ditto.
      	* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-8.c: Ditto.
      	* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-9.c: Ditto.
      	* gcc.target/riscv/rvv/autovec/vls/misalign-1.c: Ditto.
    • AArch64: enable new predicate tuning for Neoverse cores. · 3eb9f6ea
      Tamar Christina authored
      This enables the new tuning flag for Neoverse V1, Neoverse V2 and Neoverse N2.
      It is kept off for generic codegen.
      
      Note that the tests specify +sve even though they are in aarch64-sve.exp:
      if the testsuite is run with an option that forces SVE off, e.g.
      -march=armv8-a+nosve, the intrinsics end up being disabled, because -march
      is preferred over -mcpu even though -mcpu comes later.
      
      This prevents the tests from failing in such runs.
      
      gcc/ChangeLog:
      
      	* config/aarch64/tuning_models/neoversen2.h (neoversen2_tunings): Add
      	AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.
      	* config/aarch64/tuning_models/neoversev1.h (neoversev1_tunings): Add
      	AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.
      	* config/aarch64/tuning_models/neoversev2.h (neoversev2_tunings): Add
      	AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/aarch64/sve/pred_clobber_1.c: New test.
      	* gcc.target/aarch64/sve/pred_clobber_2.c: New test.
      	* gcc.target/aarch64/sve/pred_clobber_3.c: New test.
      	* gcc.target/aarch64/sve/pred_clobber_4.c: New test.
    • AArch64: add new alternative with early clobber to patterns · 2de3bbde
      Tamar Christina authored
      This patch adds new alternatives to the patterns which are affected.  The new
      alternatives with the conditional early clobbers are added before the normal
      ones in order for LRA to prefer them in the event that we have enough free
      registers to accommodate them.
      
      In case register pressure is too high, the normal alternatives will be
      preferred before a reload is considered, as we would rather have the tie
      than a spill.
      
      Tests are in the next patch.
      
      gcc/ChangeLog:
      
      	* config/aarch64/aarch64-sve.md (and<mode>3,
      	@aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc,
      	*<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z,
      	*<nlogical><mode>3_cc, *<nlogical><mode>3_ptest,
      	aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc,
      	*<logical_nn><mode>3_ptest, @aarch64_pred_cmp<cmp_op><mode>,
      	*cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest,
      	@aarch64_pred_cmp<cmp_op><mode>_wide,
      	*aarch64_pred_cmp<cmp_op><mode>_wide_cc,
      	*aarch64_pred_cmp<cmp_op><mode>_wide_ptest, @aarch64_brk<brk_op>,
      	*aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest,
      	@aarch64_brk<brk_op>, *aarch64_brk<brk_op>_cc,
      	*aarch64_brk<brk_op>_ptest, aarch64_rdffr_z, *aarch64_rdffr_z_ptest,
      	*aarch64_rdffr_ptest, *aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Add
      	new early clobber alternative.
      	* config/aarch64/aarch64-sve2.md
      	(@aarch64_pred_<sve_int_op><mode>): Likewise.
    • AArch64: add new tuning param and attribute for enabling conditional early clobber · 35f17c68
      Tamar Christina authored
      This adds a new tuning parameter AARCH64_EXTRA_TUNE_AVOID_PRED_RMW for AArch64 to
      allow us to conditionally enable the early clobber alternatives based on the
      tuning models.
      
      gcc/ChangeLog:
      
      	* config/aarch64/aarch64-tuning-flags.def
      	(AVOID_PRED_RMW): New.
      	* config/aarch64/aarch64.h (TARGET_SVE_PRED_CLOBBER): New.
      	* config/aarch64/aarch64.md (pred_clobber): New.
      	(arch_enabled): Use it.
    • AArch64: convert several predicate patterns to new compact syntax · fd489889
      Tamar Christina authored
      This converts the single alternative patterns to the new compact syntax such
      that when I add the new alternatives it's clearer what's being changed.
      
      Note that this will spew out a bunch of warnings from geninsn as it'll warn that
      @ is useless for a single alternative pattern.  These are not fatal so won't
      break the build and are only temporary.
      
      No change in functionality is expected with this patch.
      
      gcc/ChangeLog:
      
      	* config/aarch64/aarch64-sve.md (and<mode>3,
      	@aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc,
      	*<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z,
      	*<nlogical><mode>3_cc, *<nlogical><mode>3_ptest,
      	aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc,
      	*<logical_nn><mode>3_ptest, *cmp<cmp_op><mode>_ptest,
      	@aarch64_pred_cmp<cmp_op><mode>_wide,
      	*aarch64_pred_cmp<cmp_op><mode>_wide_cc,
      	*aarch64_pred_cmp<cmp_op><mode>_wide_ptest, *aarch64_brk<brk_op>_cc,
      	*aarch64_brk<brk_op>_ptest, @aarch64_brk<brk_op>,
      	*aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest, aarch64_rdffr_z,
      	*aarch64_rdffr_z_ptest, *aarch64_rdffr_ptest, *aarch64_rdffr_z_cc,
      	*aarch64_rdffr_cc): Convert to compact syntax.
      	* config/aarch64/aarch64-sve2.md
      	(@aarch64_pred_<sve_int_op><mode>): Likewise.
    • openmp: OpenMP loop transformation support · 804c0f35
      Jakub Jelinek authored
      This patch is a largely rewritten version of the
      https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631764.html
      patch set, which I promised to adjust the way I'd like it but didn't
      get to until now.
      The previous series together in diffstat was
       176 files changed, 12107 insertions(+), 298 deletions(-)
      This patch is
       197 files changed, 10843 insertions(+), 212 deletions(-)
      and diff between the old series and new patch is
       268 files changed, 8053 insertions(+), 9231 deletions(-)
      
      Only the 5.1/5.2 tile/unroll constructs are supported; in various
      places some preparations for the other 6.0 loop transformation
      constructs (interchange/reverse/fuse) are done, but certainly
      not complete and not everywhere.  The important difference is that
      because tile/unroll partial map 1:1 the original loops to generated
      canonical loops and add another set of generated loops without canonical
      form inside of it, the tile/unroll partial constructs are terminal
      for the generated loop, one can't have some loops from the tile or
      unroll partial and some further loops from inside the body of that
      construct.
      The GENERIC representation attempts to match what the standard specifies,
      so there are separate OMP_TILE and OMP_UNROLL trees.  If for a particular
      loop in a loop nest of some OpenMP loop it awaits a generated loop from a
      nested loop, or if in OMP_LOOPXFORM_LOWERED OMP_TILE/UNROLL construct
      a generated loop has been moved to some surrounding construct, that
      particular loop is represented by all NULL_TREEs in the
      OMP_FOR_{INIT,COND,INCR,ORIG_DECLS} vector.
      The lowering of the loop transforming constructs is done at gimplification
      time, at the start of gimplify_omp_for.
      I think this way it is more maintainable over magic clauses with various
      loop depths on the other looping constructs or the magic OMP_LOOP_TRANS
      construct.
      Though, I admit I'm still undecided how to represent the OpenMP 6.0
      loop transformation case of say:
        #pragma omp for collapse (4)
        for (int i = 0; i < 32; ++i)
        #pragma omp interchange permutation (2, 1)
        #pragma omp reverse
        for (int j = 0; j < 32; ++j)
        #pragma omp reverse
        for (int k = 0; k < 32; ++k)
        for (int l = 0; l < 32; ++l)
          ;
      Surely the i loop would go to first vector elements of OMP_FOR_*
      of the work-sharing loop, then 2 loops are expecting generated loops
      from interchange which would be inside of the body.  But the innermost
      l loop isn't part of the interchange, so the question is where to
      put it.  One possibility is to have it in the 4th loop of the OMP_FOR,
      another possibility would be to add some artificial construct inside
      of the OMP_INTERCHANGE and 2 OMP_REVERSE bodies which would contain
      the inner loop(s), e.g. it could be OMP_INTERCHANGE without permutation
      clause or some artificial ones or whatever.
      
      I've recently raised various unclear things in the 5.1/5.2/TRs versions
      regarding loop transformations, in particular
      https://github.com/OpenMP/spec/issues/3908
      https://github.com/OpenMP/spec/issues/3909
      (sorry, private links unless you have OpenMP membership).  Until those
      are resolved, the patch emits a sorry when generated loops are mixed with
      non-rectangular loops (way too many questions need to be answered before
      that can be done) and similarly when non-perfectly nested loops are mixed
      with generated loops (again, it can be implemented somehow, but is way
      too unclear).  The second issue is mostly about data sharing, which is
      ambiguous.  The patch makes the artificial iterators of the loops
      effectively private in the associated constructs (more like local), but
      does nothing in particular for user iterators, so for now one needs to use
      explicit data sharing clauses on the non-loop transformation OpenMP looping
      constructs or surrounding parallel/task/target etc.
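
As a rough illustration of the two constructs this commit adds, here is a minimal sketch in C. The names `demo_tiled_sum`, `demo_unrolled_sum` and `DEMO_N` are hypothetical and do not come from the patch or its testsuite. It assumes a GCC that contains this commit and is invoked with -fopenmp; without -fopenmp the pragmas are ignored and the loops compute the same values.

```c
#include <assert.h>

#define DEMO_N 32

/* Tile a perfectly nested, rectangular 2-deep loop nest into 4x8 tiles.
   The transformation only changes iteration order, not the result.  */
static long demo_tiled_sum(void)
{
  long sum = 0;
  #pragma omp tile sizes(4, 8)
  for (int i = 0; i < DEMO_N; ++i)
    for (int j = 0; j < DEMO_N; ++j)
      sum += (long) i * DEMO_N + j;
  return sum;
}

/* Partially unroll a single loop by a factor of 4.  */
static long demo_unrolled_sum(void)
{
  long sum = 0;
  #pragma omp unroll partial(4)
  for (int i = 0; i < DEMO_N; ++i)
    sum += i;
  return sum;
}

/* The sums are independent of whether the pragmas are honored.  */
static void demo_loop_xform_self_check(void)
{
  assert(demo_tiled_sum() == 523776L);  /* 0 + 1 + ... + 1023 */
  assert(demo_unrolled_sum() == 496L);  /* 0 + 1 + ... + 31 */
}
```

The sketch deliberately keeps both loop nests rectangular and perfectly nested, since the message above says that mixing generated loops with non-rectangular or imperfectly nested loops is currently rejected with a sorry.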
      
      2024-06-05  Jakub Jelinek  <jakub@redhat.com>
      	    Frederik Harwath  <frederik@codesourcery.com>
      	    Sandra Loosemore  <sandra@codesourcery.com>
      
      gcc/
      	* tree.def (OMP_TILE, OMP_UNROLL): New tree codes.
      	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_PARTIAL,
      	OMP_CLAUSE_FULL and OMP_CLAUSE_SIZES.
      	* tree.h (OMP_LOOPXFORM_CHECK): Define.
      	(OMP_LOOPXFORM_LOWERED): Define.
      	(OMP_CLAUSE_PARTIAL_EXPR): Define.
      	(OMP_CLAUSE_SIZES_LIST): Define.
      	* tree.cc (omp_clause_num_ops, omp_clause_code_name): Add entries
      	for OMP_CLAUSE_{PARTIAL,FULL,SIZES}.
      	* tree-pretty-print.cc (dump_omp_clause): Handle
      	OMP_CLAUSE_{PARTIAL,FULL,SIZES}.
      	(dump_generic_node): Handle OMP_TILE and OMP_UNROLL.  Skip printing
      	loops with NULL OMP_FOR_INIT (node) vector element.
      	* gimplify.cc (is_gimple_stmt): Handle OMP_TILE and OMP_UNROLL.
      	(gimplify_omp_taskloop_expr): For SAVE_EXPR use gimplify_save_expr.
      	(gimplify_omp_loop_xform): New function.
      	(gimplify_omp_for): Call omp_maybe_apply_loop_xforms and if that
      	reshuffles what the passed pointer points to, retry or return GS_OK.
      	Handle OMP_TILE and OMP_UNROLL.
      	(gimplify_omp_loop): Call omp_maybe_apply_loop_xforms and if that
      	reshuffles what the passed pointer points to, return GS_OK.
      	(gimplify_expr): Handle OMP_TILE and OMP_UNROLL.
      	* omp-general.h (omp_loop_number_of_iterations,
      	omp_maybe_apply_loop_xforms): Declare.
      	* omp-general.cc (omp_adjust_for_condition): For LE_EXPR and GE_EXPR
      	with pointers, don't add/subtract one, but the size of what the
      	pointer points to.
      	(omp_loop_number_of_iterations, omp_apply_tile,
      	find_nested_loop_xform, omp_maybe_apply_loop_xforms): New functions.
      gcc/c-family/
      	* c-common.h (c_omp_find_generated_loop): Declare.
      	* c-gimplify.cc (c_genericize_control_stmt): Handle OMP_TILE and
      	OMP_UNROLL.
      	* c-omp.cc (c_finish_omp_for): Handle generated loops.
      	(c_omp_is_loop_iterator): Likewise.
      	(c_find_nested_loop_xform_r, c_omp_find_generated_loop): New
      	functions.
      	(c_omp_check_loop_iv): Handle generated loops.  For now sorry
      	on mixing non-rectangular loop with generated loops.
      	(c_omp_check_loop_binding_exprs): For now sorry on mixing
      	imperfect loops with generated loops.
      	(c_omp_directives): Uncomment tile and unroll entries.
      	* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_TILE and
      	PRAGMA_OMP_UNROLL, change PRAGMA_OMP__LAST_ to the latter.
      	(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_FULL and
      	PRAGMA_OMP_CLAUSE_PARTIAL.
      	* c-pragma.cc (omp_pragmas_simd): Add tile and unroll omp pragmas.
      gcc/c/
      	* c-parser.cc (c_parser_skip_std_attribute_spec_seq): New function.
      	(check_omp_intervening_code): Reject imperfectly nested tile.
      	(c_parser_compound_statement_nostart): If want_nested_loop, use
      	c_parser_omp_next_tokens_can_be_canon_loop instead of just checking
      	for RID_FOR keyword.
      	(c_parser_omp_clause_name): Handle full and partial clause names.
      	(c_parser_omp_clause_allocate): Remove spurious semicolon.
      	(c_parser_omp_clause_full, c_parser_omp_clause_partial): New
      	functions.
      	(c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_FULL and
      	PRAGMA_OMP_CLAUSE_PARTIAL.
      	(c_parser_omp_next_tokens_can_be_canon_loop): New function.
      	(c_parser_omp_loop_nest): Parse C23 attributes.  Handle tile/unroll
      	constructs.  Use c_parser_omp_next_tokens_can_be_canon_loop instead
      	of just checking for RID_FOR keyword.  Only add_stmt (body) if it is
      	non-NULL.
      	(c_parser_omp_for_loop): Rename tiling variable to oacc_tiling.  For
      	OMP_CLAUSE_SIZES set collapse to list length of OMP_CLAUSE_SIZES_LIST.
      	Use c_parser_omp_next_tokens_can_be_canon_loop instead of just
      	checking for RID_FOR keyword.  Remove spurious semicolon.  Don't call
      	c_omp_check_loop_binding_exprs if stmt is NULL.  Skip generated loops.
      	(c_parser_omp_tile_sizes, c_parser_omp_tile): New functions.
      	(OMP_UNROLL_CLAUSE_MASK): Define.
      	(c_parser_omp_unroll): New function.
      	(c_parser_omp_construct): Handle PRAGMA_OMP_TILE and
      	PRAGMA_OMP_UNROLL.
      	* c-typeck.cc (c_finish_omp_clauses): Adjust wording of some of the
      	conflicting clause diagnostic messages to include word clause.
      	Handle OMP_CLAUSE_{FULL,PARTIAL,SIZES} and diagnose full vs. partial
      	conflict.
      gcc/cp/
      	* cp-tree.h (dependent_omp_for_p): Add another tree argument.
      	* parser.cc (check_omp_intervening_code): Reject imperfectly nested
      	tile.
      	(cp_parser_statement_seq_opt): If want_nested_loop, use
      	cp_parser_next_tokens_can_be_canon_loop instead of just checking
      	for RID_FOR keyword.
      	(cp_parser_omp_clause_name): Handle full and partial clause names.
      	(cp_parser_omp_clause_full, cp_parser_omp_clause_partial): New
      	functions.
      	(cp_parser_omp_all_clauses): Formatting fix.  Handle
      	PRAGMA_OMP_CLAUSE_PARTIAL and PRAGMA_OMP_CLAUSE_FULL.
      	(cp_parser_next_tokens_can_be_canon_loop): New function.
      	(cp_parser_omp_loop_nest): Parse C++11 attributes.  Handle tile/unroll
      	constructs.  Use cp_parser_next_tokens_can_be_canon_loop instead
      	of just checking for RID_FOR keyword.  Only add_stmt
      	cp_parser_omp_loop_nest result if it is non-NULL.
      	(cp_parser_omp_for_loop): Rename tiling variable to oacc_tiling.  For
      	OMP_CLAUSE_SIZES set collapse to list length of OMP_CLAUSE_SIZES_LIST.
      	Use cp_parser_next_tokens_can_be_canon_loop instead of just
      	checking for RID_FOR keyword.  Remove spurious semicolon.  Don't call
      	c_omp_check_loop_binding_exprs if stmt is NULL.  Skip and/or handle
      	generated loops.  Remove spurious ()s around & operands.
      	(cp_parser_omp_tile_sizes, cp_parser_omp_tile): New functions.
      	(OMP_UNROLL_CLAUSE_MASK): Define.
      	(cp_parser_omp_unroll): New function.
      	(cp_parser_omp_construct): Handle PRAGMA_OMP_TILE and
      	PRAGMA_OMP_UNROLL.
      	(cp_parser_pragma): Likewise.
      	* semantics.cc (finish_omp_clauses): Don't call
      	fold_build_cleanup_point_expr for cases which obviously won't need it,
      	like checked INTEGER_CSTs.  Handle OMP_CLAUSE_{FULL,PARTIAL,SIZES}
      	and diagnose full vs. partial conflict.  Adjust wording of some of the
      	conflicting clause diagnostic messages to include word clause.
      	(finish_omp_for): Use decl equal to global_namespace as a marker for
      	generated loop.  Pass also body to dependent_omp_for_p.  Skip
      	generated loops.
      	(finish_omp_for_block): Skip generated loops.
      	* pt.cc (tsubst_omp_clauses): Handle OMP_CLAUSE_{FULL,PARTIAL,SIZES}.
      	(tsubst_stmt): Handle OMP_TILE and OMP_UNROLL.  Handle or skip
      	generated loops.
      	(dependent_omp_for_p): Add body argument.  If declv vector element
      	is NULL, find generated loop.
      	* cp-gimplify.cc (cp_gimplify_expr): Handle OMP_TILE and OMP_UNROLL.
      	(cp_fold_r): Likewise.
      	(cp_genericize_r): Likewise.  Skip generated loops.
      gcc/fortran/
      	* gfortran.h (enum gfc_statement): Add ST_OMP_UNROLL,
      	ST_OMP_END_UNROLL, ST_OMP_TILE and ST_OMP_END_TILE.
      	(struct gfc_omp_clauses): Add sizes_list, partial, full and erroneous
      	members.
      	(enum gfc_exec_op): Add EXEC_OMP_UNROLL and EXEC_OMP_TILE.
      	(gfc_expr_list_len): Declare.
      	* match.h (gfc_match_omp_tile, gfc_match_omp_unroll): Declare.
      	* openmp.cc (gfc_get_location): Declare.
      	(gfc_free_omp_clauses): Free sizes_list.
      	(match_oacc_expr_list): Rename to ...
      	(match_omp_oacc_expr_list): ... this.  Add is_omp argument and
      	change diagnostic wording if it is true.
      	(enum omp_mask2): Add OMP_CLAUSE_{FULL,PARTIAL,SIZES}.
      	(gfc_match_omp_clauses): Parse full, partial and sizes clauses.
      	(gfc_match_oacc_wait): Use match_omp_oacc_expr_list instead of
      	match_oacc_expr_list.
      	(OMP_UNROLL_CLAUSES, OMP_TILE_CLAUSES): Define.
      	(gfc_match_omp_tile, gfc_match_omp_unroll): New functions.
      	(resolve_omp_clauses): Diagnose full vs. partial clause conflict.
      	Resolve sizes clause arguments.
      	(find_nested_loop_in_chain): Use switch instead of series of ifs.
      	Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL.
      	(gfc_resolve_omp_do_blocks): Set omp_current_do_collapse to
      	list length of sizes_list if present.
      	(gfc_resolve_do_iterator): Return for EXEC_OMP_TILE or
      	EXEC_OMP_UNROLL.
      	(restructure_intervening_code): Remove spurious ()s around & operands.
      	(is_outer_iteration_variable): Handle EXEC_OMP_TILE and
      	EXEC_OMP_UNROLL.
      	(check_nested_loop_in_chain): Likewise.
      	(expr_is_invariant): Likewise.
      	(resolve_omp_do): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL.  Diagnose
      	tile without sizes clause.  Use sizes_list length for count if
      	non-NULL.  Set code->ext.omp_clauses->erroneous on loops where we've
      	reported diagnostics.  Sorry for mixing non-rectangular loops with
      	generated loops.
      	(omp_code_to_statement): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL.
      	(gfc_resolve_omp_directive): Likewise.
      	* parse.cc (decode_omp_directive): Parse end tile, end unroll, tile
      	and unroll.  Move nothing entry alphabetically.
      	(case_exec_markers): Add ST_OMP_TILE and ST_OMP_UNROLL.
      	(gfc_ascii_statement): Handle ST_OMP_END_TILE, ST_OMP_END_UNROLL,
      	ST_OMP_TILE and ST_OMP_UNROLL.
      	(parse_omp_do): Add nested argument.  Handle ST_OMP_TILE and
      	ST_OMP_UNROLL.
      	(parse_omp_structured_block): Adjust parse_omp_do caller.
      	(parse_executable): Likewise.  Handle ST_OMP_TILE and ST_OMP_UNROLL.
      	* resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_TILE and
      	EXEC_OMP_UNROLL.
      	(gfc_resolve_code): Likewise.
      	* st.cc (gfc_free_statement): Likewise.
      	* trans.cc (trans_code): Likewise.
      	* trans-openmp.cc (gfc_trans_omp_clauses): Handle full, partial and
      	sizes clauses.  Use tree_cons + nreverse instead of
      	temporary vector and build_tree_list_vec for tile_list handling.
      	(gfc_expr_list_len): New function.
      	(gfc_trans_omp_do): Rename tile to oacc_tile.  Handle sizes clause.
      	Don't assert code->op is EXEC_DO.  Handle EXEC_OMP_TILE and
      	EXEC_OMP_UNROLL.
      	(gfc_trans_omp_directive): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL.
      	* dump-parse-tree.cc (show_omp_clauses): Dump full, partial and
      	sizes clauses.
      	(show_omp_node): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL.
      	(show_code_node): Likewise.
      gcc/testsuite/
      	* c-c++-common/gomp/attrs-tile-1.c: New test.
      	* c-c++-common/gomp/attrs-tile-2.c: New test.
      	* c-c++-common/gomp/attrs-tile-3.c: New test.
      	* c-c++-common/gomp/attrs-tile-4.c: New test.
      	* c-c++-common/gomp/attrs-tile-5.c: New test.
      	* c-c++-common/gomp/attrs-tile-6.c: New test.
      	* c-c++-common/gomp/attrs-unroll-1.c: New test.
      	* c-c++-common/gomp/attrs-unroll-2.c: New test.
      	* c-c++-common/gomp/attrs-unroll-3.c: New test.
      	* c-c++-common/gomp/attrs-unroll-inner-1.c: New test.
      	* c-c++-common/gomp/attrs-unroll-inner-2.c: New test.
      	* c-c++-common/gomp/attrs-unroll-inner-3.c: New test.
      	* c-c++-common/gomp/attrs-unroll-inner-4.c: New test.
      	* c-c++-common/gomp/attrs-unroll-inner-5.c: New test.
      	* c-c++-common/gomp/imperfect-attributes.c: Adjust expected
      	diagnostics.
      	* c-c++-common/gomp/imperfect-loop-nest.c: New test.
      	* c-c++-common/gomp/ordered-5.c: New test.
      	* c-c++-common/gomp/scan-7.c: New test.
      	* c-c++-common/gomp/tile-1.c: New test.
      	* c-c++-common/gomp/tile-2.c: New test.
      	* c-c++-common/gomp/tile-3.c: New test.
      	* c-c++-common/gomp/tile-4.c: New test.
      	* c-c++-common/gomp/tile-5.c: New test.
      	* c-c++-common/gomp/tile-6.c: New test.
      	* c-c++-common/gomp/tile-7.c: New test.
      	* c-c++-common/gomp/tile-8.c: New test.
      	* c-c++-common/gomp/tile-9.c: New test.
      	* c-c++-common/gomp/tile-10.c: New test.
      	* c-c++-common/gomp/tile-11.c: New test.
      	* c-c++-common/gomp/tile-12.c: New test.
      	* c-c++-common/gomp/tile-13.c: New test.
      	* c-c++-common/gomp/tile-14.c: New test.
      	* c-c++-common/gomp/tile-15.c: New test.
      	* c-c++-common/gomp/unroll-1.c: New test.
      	* c-c++-common/gomp/unroll-2.c: New test.
      	* c-c++-common/gomp/unroll-3.c: New test.
      	* c-c++-common/gomp/unroll-4.c: New test.
      	* c-c++-common/gomp/unroll-5.c: New test.
      	* c-c++-common/gomp/unroll-6.c: New test.
      	* c-c++-common/gomp/unroll-7.c: New test.
      	* c-c++-common/gomp/unroll-8.c: New test.
      	* c-c++-common/gomp/unroll-9.c: New test.
      	* c-c++-common/gomp/unroll-inner-1.c: New test.
      	* c-c++-common/gomp/unroll-inner-2.c: New test.
      	* c-c++-common/gomp/unroll-inner-3.c: New test.
      	* c-c++-common/gomp/unroll-non-rect-1.c: New test.
      	* c-c++-common/gomp/unroll-non-rect-2.c: New test.
      	* c-c++-common/gomp/unroll-non-rect-3.c: New test.
      	* c-c++-common/gomp/unroll-simd-1.c: New test.
      	* gcc.dg/gomp/attrs-4.c: Adjust expected diagnostics.
      	* gcc.dg/gomp/for-1.c: Likewise.
      	* gcc.dg/gomp/for-11.c: Likewise.
      	* g++.dg/gomp/attrs-4.C: Likewise.
      	* g++.dg/gomp/for-1.C: Likewise.
      	* g++.dg/gomp/pr94512.C: Likewise.
      	* g++.dg/gomp/tile-1.C: New test.
      	* g++.dg/gomp/tile-2.C: New test.
      	* g++.dg/gomp/unroll-1.C: New test.
      	* g++.dg/gomp/unroll-2.C: New test.
      	* g++.dg/gomp/unroll-3.C: New test.
      	* gfortran.dg/gomp/inner-loops-1.f90: New test.
      	* gfortran.dg/gomp/inner-loops-2.f90: New test.
      	* gfortran.dg/gomp/pure-1.f90: Add tests for !$omp unroll
      	and !$omp tile.
      	* gfortran.dg/gomp/pure-2.f90: Remove those tests from here.
      	* gfortran.dg/gomp/scan-9.f90: New test.
      	* gfortran.dg/gomp/tile-1.f90: New test.
      	* gfortran.dg/gomp/tile-2.f90: New test.
      	* gfortran.dg/gomp/tile-3.f90: New test.
      	* gfortran.dg/gomp/tile-4.f90: New test.
      	* gfortran.dg/gomp/tile-5.f90: New test.
      	* gfortran.dg/gomp/tile-6.f90: New test.
      	* gfortran.dg/gomp/tile-7.f90: New test.
      	* gfortran.dg/gomp/tile-8.f90: New test.
      	* gfortran.dg/gomp/tile-9.f90: New test.
      	* gfortran.dg/gomp/tile-10.f90: New test.
      	* gfortran.dg/gomp/tile-imperfect-nest-1.f90: New test.
      	* gfortran.dg/gomp/tile-imperfect-nest-2.f90: New test.
      	* gfortran.dg/gomp/tile-inner-loops-1.f90: New test.
      	* gfortran.dg/gomp/tile-inner-loops-2.f90: New test.
      	* gfortran.dg/gomp/tile-inner-loops-3.f90: New test.
      	* gfortran.dg/gomp/tile-inner-loops-4.f90: New test.
      	* gfortran.dg/gomp/tile-inner-loops-5.f90: New test.
      	* gfortran.dg/gomp/tile-inner-loops-6.f90: New test.
      	* gfortran.dg/gomp/tile-inner-loops-7.f90: New test.
      	* gfortran.dg/gomp/tile-inner-loops-8.f90: New test.
      	* gfortran.dg/gomp/tile-non-rectangular-1.f90: New test.
      	* gfortran.dg/gomp/tile-non-rectangular-2.f90: New test.
      	* gfortran.dg/gomp/tile-non-rectangular-3.f90: New test.
      	* gfortran.dg/gomp/tile-unroll-1.f90: New test.
      	* gfortran.dg/gomp/tile-unroll-2.f90: New test.
      	* gfortran.dg/gomp/unroll-1.f90: New test.
      	* gfortran.dg/gomp/unroll-2.f90: New test.
      	* gfortran.dg/gomp/unroll-3.f90: New test.
      	* gfortran.dg/gomp/unroll-4.f90: New test.
      	* gfortran.dg/gomp/unroll-5.f90: New test.
      	* gfortran.dg/gomp/unroll-6.f90: New test.
      	* gfortran.dg/gomp/unroll-7.f90: New test.
      	* gfortran.dg/gomp/unroll-8.f90: New test.
      	* gfortran.dg/gomp/unroll-9.f90: New test.
      	* gfortran.dg/gomp/unroll-10.f90: New test.
      	* gfortran.dg/gomp/unroll-11.f90: New test.
      	* gfortran.dg/gomp/unroll-12.f90: New test.
      	* gfortran.dg/gomp/unroll-13.f90: New test.
      	* gfortran.dg/gomp/unroll-inner-loop-1.f90: New test.
      	* gfortran.dg/gomp/unroll-inner-loop-2.f90: New test.
      	* gfortran.dg/gomp/unroll-no-clause-1.f90: New test.
      	* gfortran.dg/gomp/unroll-non-rect-1.f90: New test.
      	* gfortran.dg/gomp/unroll-non-rect-2.f90: New test.
      	* gfortran.dg/gomp/unroll-simd-1.f90: New test.
      	* gfortran.dg/gomp/unroll-simd-2.f90: New test.
      	* gfortran.dg/gomp/unroll-simd-3.f90: New test.
      	* gfortran.dg/gomp/unroll-tile-1.f90: New test.
      	* gfortran.dg/gomp/unroll-tile-2.f90: New test.
      	* gfortran.dg/gomp/unroll-tile-inner-1.f90: New test.
      libgomp/
      	* testsuite/libgomp.c-c++-common/imperfect-transform-1.c: New test.
      	* testsuite/libgomp.c-c++-common/imperfect-transform-2.c: New test.
      	* testsuite/libgomp.c-c++-common/matrix-1.h: New test.
      	* testsuite/libgomp.c-c++-common/matrix-constant-iter.h: New test.
      	* testsuite/libgomp.c-c++-common/matrix-helper.h: New test.
      	* testsuite/libgomp.c-c++-common/matrix-no-directive-1.c: New test.
      	* testsuite/libgomp.c-c++-common/matrix-no-directive-unroll-full-1.c:
      	New test.
      	* testsuite/libgomp.c-c++-common/matrix-omp-distribute-parallel-for-1.c:
      	New test.
      	* testsuite/libgomp.c-c++-common/matrix-omp-for-1.c: New test.
      	* testsuite/libgomp.c-c++-common/matrix-omp-parallel-for-1.c: New test.
      	* testsuite/libgomp.c-c++-common/matrix-omp-parallel-masked-taskloop-1.c:
      	New test.
      	* testsuite/libgomp.c-c++-common/matrix-omp-parallel-masked-taskloop-simd-1.c:
      	New test.
      	* testsuite/libgomp.c-c++-common/matrix-omp-target-parallel-for-1.c:
      	New test.
      	* testsuite/libgomp.c-c++-common/matrix-omp-target-teams-distribute-parallel-for-1.c:
      	New test.
      	* testsuite/libgomp.c-c++-common/matrix-omp-taskloop-1.c: New test.
      	* testsuite/libgomp.c-c++-common/matrix-omp-teams-distribute-parallel-for-1.c:
      	New test.
      	* testsuite/libgomp.c-c++-common/matrix-simd-1.c: New test.
      	* testsuite/libgomp.c-c++-common/matrix-transform-variants-1.h:
      	New test.
      	* testsuite/libgomp.c-c++-common/target-imperfect-transform-1.c:
      	New test.
      	* testsuite/libgomp.c-c++-common/target-imperfect-transform-2.c:
      	New test.
      	* testsuite/libgomp.c-c++-common/unroll-1.c: New test.
      	* testsuite/libgomp.c-c++-common/unroll-non-rect-1.c: New test.
      	* testsuite/libgomp.c++/matrix-no-directive-unroll-full-1.C: New test.
      	* testsuite/libgomp.c++/tile-2.C: New test.
      	* testsuite/libgomp.c++/tile-3.C: New test.
      	* testsuite/libgomp.c++/unroll-1.C: New test.
      	* testsuite/libgomp.c++/unroll-2.C: New test.
      	* testsuite/libgomp.c++/unroll-full-tile.C: New test.
      	* testsuite/libgomp.fortran/imperfect-transform-1.f90: New test.
      	* testsuite/libgomp.fortran/imperfect-transform-2.f90: New test.
      	* testsuite/libgomp.fortran/inner-1.f90: New test.
      	* testsuite/libgomp.fortran/nested-fn.f90: New test.
      	* testsuite/libgomp.fortran/target-imperfect-transform-1.f90: New test.
      	* testsuite/libgomp.fortran/target-imperfect-transform-2.f90: New test.
      	* testsuite/libgomp.fortran/tile-1.f90: New test.
      	* testsuite/libgomp.fortran/tile-2.f90: New test.
      	* testsuite/libgomp.fortran/tile-unroll-1.f90: New test.
      	* testsuite/libgomp.fortran/tile-unroll-2.f90: New test.
      	* testsuite/libgomp.fortran/tile-unroll-3.f90: New test.
      	* testsuite/libgomp.fortran/tile-unroll-4.f90: New test.
      	* testsuite/libgomp.fortran/unroll-1.f90: New test.
      	* testsuite/libgomp.fortran/unroll-2.f90: New test.
      	* testsuite/libgomp.fortran/unroll-3.f90: New test.
      	* testsuite/libgomp.fortran/unroll-4.f90: New test.
      	* testsuite/libgomp.fortran/unroll-5.f90: New test.
      	* testsuite/libgomp.fortran/unroll-6.f90: New test.
      	* testsuite/libgomp.fortran/unroll-7a.f90: New test.
      	* testsuite/libgomp.fortran/unroll-7b.f90: New test.
      	* testsuite/libgomp.fortran/unroll-7c.f90: New test.
      	* testsuite/libgomp.fortran/unroll-7.f90: New test.
      	* testsuite/libgomp.fortran/unroll-8.f90: New test.
      	* testsuite/libgomp.fortran/unroll-simd-1.f90: New test.
      	* testsuite/libgomp.fortran/unroll-tile-1.f90: New test.
      	* testsuite/libgomp.fortran/unroll-tile-2.f90: New test.
      804c0f35
    • Wilco Dijkstra's avatar
      AArch64: Fix cpu features initialization [PR115342] · d7cbcfe7
      Wilco Dijkstra authored
      The CPU features initialization code uses CPUID registers (rather than
      HWCAP).  The equality comparisons it uses are incorrect: for example FEAT_SVE
      is not set if SVE2 is available.  Using HWCAPs for these is both simpler and
      correct.  The initialization must also be done atomically to avoid multiple
      threads causing corruption due to non-atomic RMW accesses to the global.
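
The two problems described above can be sketched in portable C. This is not the libgcc code: the feature bits, the 4-bit "field" value and every `demo_` name are hypothetical, and the sketch assumes an ID field whose value grows as extensions are added (1 = base feature, 2 = base feature plus extension).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; not the real libgcc layout.  */
#define DEMO_FEAT_BASE (UINT64_C(1) << 0)
#define DEMO_FEAT_EXT  (UINT64_C(1) << 1)
#define DEMO_FEAT_INIT (UINT64_C(1) << 63)

static _Atomic uint64_t demo_features;

/* Buggy check: a field value of 2 (base + extension) fails "== 1".  */
static bool demo_has_base_eq(unsigned field) { return field == 1; }

/* Fixed check: any value >= 1 implies the base feature.  */
static bool demo_has_base_ge(unsigned field) { return field >= 1; }

/* Build the complete mask in a local variable first.  */
static uint64_t demo_compute(unsigned field)
{
  uint64_t f = DEMO_FEAT_INIT;
  if (demo_has_base_ge(field))
    f |= DEMO_FEAT_BASE;
  if (field >= 2)
    f |= DEMO_FEAT_EXT;
  return f;
}

/* Publish the mask with one atomic store instead of a series of
   read-modify-write updates on the global.  Racing threads compute
   the same value, so a repeated store is harmless.  */
static uint64_t demo_init(unsigned field)
{
  uint64_t cur = atomic_load_explicit(&demo_features, memory_order_relaxed);
  if (cur == 0)
    {
      cur = demo_compute(field);
      atomic_store_explicit(&demo_features, cur, memory_order_relaxed);
    }
  return cur;
}

static void demo_cpu_features_self_check(void)
{
  assert(!demo_has_base_eq(2));  /* the equality test misses the base feature */
  assert(demo_has_base_ge(2));
  assert(demo_init(2) == (DEMO_FEAT_INIT | DEMO_FEAT_BASE | DEMO_FEAT_EXT));
}
```

The commit itself goes further and uses HWCAP bits where possible instead of decoding ID register fields; the sketch only shows why an equality comparison is fragile and why a single atomic store avoids torn updates.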
      
      libgcc:
      	PR target/115342
      	* config/aarch64/cpuinfo.c (__init_cpu_features_constructor):
      	Use HWCAP where possible.  Use atomic write for initialization.
      	Fix FEAT_PREDRES comparison.
      	(__init_cpu_features_resolver): Use atomic load for correct
      	initialization.
      	(__init_cpu_features): Likewise.
      d7cbcfe7
    • Wilco Dijkstra's avatar
      testsuite: Improve check-function-bodies · acdc9df3
      Wilco Dijkstra authored
      Improve check-function-bodies by allowing single-character function names.
      
      gcc/testsuite:
      	* lib/scanasm.exp (configure_check-function-bodies): Allow single-char
      	function names.
      acdc9df3
    • Kewen Lin's avatar
      darwin: Replace use of LONG_DOUBLE_TYPE_SIZE · 58ecd2eb
      Kewen Lin authored
      Joseph pointed out that "floating types should have their mode,
      not a poorly defined precision value" in the discussion[1].
      As he and Richi suggested, the existing macros
      {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
      hook mode_for_floating_type.  To prepare for that, this
      patch replaces the use of LONG_DOUBLE_TYPE_SIZE in darwin
      with TYPE_PRECISION of long_double_type_node.
      
      [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
      
      gcc/ChangeLog:
      
      	* config/darwin.cc (darwin_patch_builtins): Use TYPE_PRECISION of
      	long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.
      58ecd2eb
    • Kewen Lin's avatar
      fortran: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE · 37a48009
      Kewen Lin authored
      Joseph pointed out that "floating types should have their mode,
      not a poorly defined precision value" in the discussion[1].
      As he and Richi suggested, the existing macros
      {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
      hook mode_for_floating_type.  To prepare for that, this
      patch replaces the uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
      in fortran with TYPE_PRECISION of
      {float,{,long_}double}_type_node.
      
      [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
      
      gcc/fortran/ChangeLog:
      
      	* trans-intrinsic.cc (build_round_expr): Use TYPE_PRECISION of
      	long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.
      	* trans-types.cc (gfc_build_real_type): Use TYPE_PRECISION of
      	{float,double,long_double}_type_node to replace
      	{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE.
      37a48009
    • Kewen Lin's avatar
      d: Replace use of LONG_DOUBLE_TYPE_SIZE · b36461f1
      Kewen Lin authored
      Joseph pointed out that "floating types should have their mode,
      not a poorly defined precision value" in the discussion[1].
      As he and Richi suggested, the existing macros
      {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
      hook mode_for_floating_type.  To prepare for that, this
      patch removes the only use of LONG_DOUBLE_TYPE_SIZE
      in d.  Iain found that LONG_DOUBLE_TYPE_SIZE is poorly named
      and was used incorrectly before, so this patch follows his advice
      and uses int_size_in_bytes instead.
      
      [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
      
      
      
      Co-authored-by: Iain Buclaw <ibuclaw@gdcproject.org>
      
      gcc/d/ChangeLog:
      
      	* d-target.cc (Target::_init): Use int_size_in_bytes of
      	long_double_type_node to replace the expression with
      	LONG_DOUBLE_TYPE_SIZE for c.long_doublesize assignment.
      b36461f1
    • Kewen Lin's avatar
      ada: Replace use of LONG_DOUBLE_TYPE_SIZE · 6fa25aa9
      Kewen Lin authored
      Joseph pointed out that "floating types should have their mode,
      not a poorly defined precision value" in the discussion[1].
      As he and Richi suggested, the existing macros
      {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
      hook mode_for_floating_type.  To prepare for that, this
      patch replaces the use of LONG_DOUBLE_TYPE_SIZE in ada
      with TYPE_PRECISION of long_double_type_node.
      
      [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
      
      gcc/ada/ChangeLog:
      
      	* gcc-interface/decl.cc (gnat_to_gnu_entity): Use TYPE_PRECISION of
      	long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.
      6fa25aa9
    • Pan Li's avatar
      Internal-fn: Support new IFN SAT_SUB for unsigned scalar int · abe6d393
      Pan Li authored
      
      This patch adds the middle-end representation for the unsigned
      saturating sub, i.e. the result of the sub is set to the minimum (zero)
      on underflow.  It matches a pattern similar to the one below.
      
      SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y));
      
      For example for uint8_t, we have
      
      * SAT_SUB (255, 0)   => 255
      * SAT_SUB (1, 2)     => 0
      * SAT_SUB (254, 255) => 0
      * SAT_SUB (0, 255)   => 0
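
The four uint8_t cases above can be checked with a small sketch that instantiates the pattern with TYPE = uint8_t. The function names are hypothetical and only serve the illustration; the branchy variant is the reference the branchless form should agree with.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless form: (x >= y) is 0 or 1, so -(uint8_t)(x >= y) is either
   0 or an all-ones mask after integer promotion.  */
static uint8_t demo_sat_sub_u8(uint8_t x, uint8_t y)
{
  return (uint8_t) ((x - y) & (-(uint8_t) (x >= y)));
}

/* Reference form with an explicit comparison.  */
static uint8_t demo_sat_sub_u8_ref(uint8_t x, uint8_t y)
{
  return x >= y ? (uint8_t) (x - y) : 0;
}

/* Compare both forms over every pair of uint8_t inputs and return the
   number of disagreements.  */
static unsigned demo_sat_sub_u8_mismatches(void)
{
  unsigned mismatches = 0;
  for (unsigned x = 0; x <= UINT8_MAX; ++x)
    for (unsigned y = 0; y <= UINT8_MAX; ++y)
      if (demo_sat_sub_u8((uint8_t) x, (uint8_t) y)
          != demo_sat_sub_u8_ref((uint8_t) x, (uint8_t) y))
        ++mismatches;
  return mismatches;
}

static void demo_sat_sub_self_check(void)
{
  assert(demo_sat_sub_u8(255, 0) == 255);
  assert(demo_sat_sub_u8(1, 2) == 0);
  assert(demo_sat_sub_u8(254, 255) == 0);
  assert(demo_sat_sub_u8(0, 255) == 0);
  assert(demo_sat_sub_u8_mismatches() == 0);
}
```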
      
      Given the SAT_SUB below for uint64_t

      uint64_t sat_sub_u64 (uint64_t x, uint64_t y)
      {
        return (x - y) & (-(uint64_t)(x >= y));
      }
      
      Before this patch:
      uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
      {
        _Bool _1;
        long unsigned int _3;
        uint64_t _6;
      
      ;;   basic block 2, loop depth 0
      ;;    pred:       ENTRY
        _1 = x_4(D) >= y_5(D);
        _3 = x_4(D) - y_5(D);
        _6 = _1 ? _3 : 0;
        return _6;
      ;;    succ:       EXIT
      }
      
      After this patch:
      uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
      {
        uint64_t _6;
      
      ;;   basic block 2, loop depth 0
      ;;    pred:       ENTRY
        _6 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
        return _6;
      ;;    succ:       EXIT
      }
      
      The following tests were run for this patch:
      * The riscv full regression tests.
      * The x86 bootstrap tests.
      * The x86 full regression tests.
      
      	PR target/51492
      	PR target/112600
      
      gcc/ChangeLog:
      
      	* internal-fn.def (SAT_SUB): Add new IFN define for SAT_SUB.
      	* match.pd: Add new match for SAT_SUB.
      	* optabs.def (OPTAB_NL): Remove fixed-point for ussub/ssub.
      	* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_sub): Add
      	new decl for generated in match.pd.
      	(build_saturation_binary_arith_call): Add new helper function
      	to build the gimple call to binary SAT alu.
      	(match_saturation_arith): Rename from.
      	(match_unsigned_saturation_add): Rename to.
      	(match_unsigned_saturation_sub): Add new func to match the
      	unsigned sat sub.
      	(math_opts_dom_walker::after_dom_children): Add SAT_SUB matching
      	try when COND_EXPR.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
      abe6d393
    • Gerald Pfeifer's avatar
      doc: Streamline recommendation of GNU awk · 99314267
      Gerald Pfeifer authored
      GNU awk 3.1.5 was released in August 2005; no need to specify this in
      the context of "recent version".
      
      gcc:
      	PR other/69374
      	* doc/install.texi (Prerequisites): Drop reference to GNU awk
      	version 3.1.5. Remove fluff.
      99314267