  1. Sep 19, 2021
    • Fix PR bootstrap/102389: --with-build-config=bootstrap-lto is broken · 68aace44
      Andrew Pinski authored
      The problem here is that the lto-plugin now requires an NM that works
      with LTO, so we need to pass down NM just as we do for ranlib
      and ar.
      
      OK? Bootstrapped and tested with --with-build-config=bootstrap-lto on aarch64-linux-gnu.
      Note you need to use binutils 2.35 or later too due to https://sourceware.org/PR25355
      (I will submit another patch to improve the installation instructions too).
      
      config/ChangeLog:
      
      	PR bootstrap/102389
      	* bootstrap-lto-lean.mk: Handle NM like RANLIB and AR.
      	* bootstrap-lto.mk: Likewise.
    • Minor cleanups to forward threader. · 08900f28
      Aldy Hernandez authored
      Every time we allocate a threading edge we push it onto the path in a
      distinct step.  There's no need to do this in two steps, and avoiding
      this keeps us from exposing the internals of the registry.
      
      I've also done some tiny cleanups in thread_across_edge, most importantly
      removing the bitmap in favor of an auto_bitmap.
      
      There are no functional changes.
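
      The "allocate, then push" to "push_edge" change described above can be
      sketched roughly as follows.  This is only an illustration: the types
      thread_edge, edge_kind and path_registry are hypothetical stand-ins, not
      GCC's actual jt_path_registry or jump_thread_edge.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for GCC's edge and jump-thread edge types.
enum class edge_kind { start, copy_src_block, no_copy_src_block };

struct thread_edge {
  int edge_id;     // stand-in for GCC's `edge`
  edge_kind kind;
};

class path_registry {
public:
  using path = std::vector<thread_edge *>;

  // Allocate the edge and append it to the path in a single step, so
  // callers never see the registry's allocation scheme.
  void push_edge(path &p, int edge_id, edge_kind kind) {
    storage_.push_back(
        std::unique_ptr<thread_edge>(new thread_edge{edge_id, kind}));
    p.push_back(storage_.back().get());
  }

  std::size_t allocated() const { return storage_.size(); }

private:
  std::vector<std::unique_ptr<thread_edge>> storage_;  // owns every edge
};
```

      A caller then writes reg.push_edge (path, e, kind) instead of allocating
      an edge and pushing it onto the path separately.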
      
      gcc/ChangeLog:
      
      	* tree-ssa-threadbackward.c
      	(back_threader_registry::register_path): Use push_edge.
      	* tree-ssa-threadedge.c
      	(jump_threader::thread_around_empty_blocks): Same.
      	(jump_threader::thread_through_normal_block): Same.
      	(jump_threader::thread_across_edge): Same.  Also, use auto_bitmap.
      	Tidy up code.
      	* tree-ssa-threadupdate.c
      	(jt_path_registry::allocate_thread_edge): Remove.
      	(jt_path_registry::push_edge): New.
      	(dump_jump_thread_path): Make static.
      	* tree-ssa-threadupdate.h (allocate_thread_edge): Remove.
      	(push_edge): New.
    • Jit, testsuite: Amend expect processing to tolerate more platforms. · 124c354a
      Iain Sandoe authored
      
      The current 'fixed_host_execute' implementation fails on Darwin
      platforms for a number of reasons:
      
      1/ If the sub-process spawn fails (e.g. because of missing or mal-
         formed params); rather than reporting the fail output into the
         match stream, as indicated by the expect manual, it terminates
         the script.
      
       - We fix this by (a) checking that the executable is valid as well
         as existing and (b) putting the spawn into a catch block and
         reporting a failure.
      
      2/ There is no recovery path at all for a buffer-full case (and we
         do see buffer-full events with the default sizes).
      
       - Added by the patch here, however it is not as sophisticated as
         the methods used by dejagnu internally.  Here we set the process
         to be "nowait" and then close the connection - with the intent
         that this will terminate the spawned process.
      
      3/  The expect logic assumes that 'Totals:' is a valid indicator
          for the end of the spawned process output.  This is not true
          even for the default dejagnu header (there are a number of
          additional reporting lines after).  In addition to this, there
          are some tests that intentionally produce more output after
          the totals report (and there are tests that do not use that
          mechanism at all).
      
          The effect is that we might arrive at the "wait" for the spawned
          process to finish - but that process might not have completed
          all its output.  For Darwin, at least that causes a deadlock
          between expect and the spawnee - the latter is doing a non-
          cancellable write and the former is waiting for the latter to
          terminate.  For some reason this does not seem to affect Linux;
          perhaps the pty implementation there allows the write(s) to
          proceed even though there is no reader.
      
       -  This is fixed by modifying the loop termination condition to be
          either EOF (which will be the 'correct' condition) or a timeout
          which would represent an error either in the runtime or in the
          parsing of the output.  As added precautions, we only try to
          wait if there is a correctly-spawned process, and we are also
          specific about which process we are waiting for.
      
      4/  Darwin appears to have a bug in either the tcl or termios
          'cooking' code that occasionally inserts an additional CR char
          into the stream - thus '\n' => '\r\r\n' instead of '\r\n'. The
          original program output is correct (it only contains a single
          \n) - the additional character is being inserted somewhere in
          the translations applied before the output reaches expect.
      
          The logic of this expect implementation does not tolerate single
          \r or \n characters (it will fail with a timeout or buffer-full
          if that occurs).
      
       -  This is fixed by having a line-end match that is adjusted for
          Darwin.
      
      5/  The default buffer size does seem to be too small in some cases
          noting that GCC uses 10000 as the match buffer size and the
          default is 2000.
      
       -  Fixed by increasing the size to 8192.
      
      6/  There is a somewhat arbitrary dumping of output where we match
          ^$prefix\tSOMETHING... and then process the something.  This
          essentially allows the match to start at any place in the buffer
          following any collection of non-line-end chars.
      
       -  Fixed by amending the match for 'general' lines to accommodate
          these cases, and reporting such lines to the log.  At least this
          should allow debugging of any cases where output that should be
          recognized is being dropped.
      
      Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
      
      gcc/testsuite/ChangeLog:
      
      	* jit.dg/jit.exp (fixed_local_execute): Amend the match and
      	exit conditions to cater for more platforms.
    • Make dump_ranger routines externally visible. · 8d42a27d
      Aldy Hernandez authored
      There was an inline extern declaration for dump_ranger that was a bit of
      a hack.  I've removed it in favor of an actual prototype.  There are
      also some trivial changes to the dumping code in the path solver.
      
      gcc/ChangeLog:
      
      	* gimple-range-path.cc (path_range_query::path_range_query): Add
      	header.
      	(path_range_query::dump): Remove extern declaration of dump_ranger.
      	* gimple-range-trace.cc (dump_ranger): Add DEBUG_FUNCTION marker.
      	* gimple-range-trace.h (dump_ranger): Add prototype.
    • [PATCH] Factor out `find_a_program` helper around `find_a_file` · 5fee8a0a
      John Ericson authored
      gcc/
      	* gcc.c (find_a_program): New function, factored out of...
      	(find_a_file): Here.
      	(execute): Use find_a_program when looking for programs rather
      	than find_a_file.
    • [PATCH] avr: Add atmega324pb MCU · 16f97766
      Matwey V. Kornilov authored
      gcc/
      	* config/avr/avr-mcus.def: Add atmega324pb.
      	* doc/avr-mmcu.texi: Corresponding changes.
    • PR middle-end/88173: More constant folding of NaN comparisons. · e9e46864
      Roger Sayle authored
      This patch tackles PR middle-end/88173 where the order of operands in
      a comparison affects constant folding.  As diagnosed by Jason Merrill,
      "match.pd handles these comparisons very differently".  The history is
      that the middle end typically canonicalizes comparisons to place
      constants on the right, but when a comparison contains two constants
      we need to check/transform both constants, i.e. on both the left and the
      right.  Hence the added lines below duplicate for @0 the same transform
      applied a few lines above for @1.
      
      Whilst preparing the testcase, I noticed that this transformation is
      incorrectly disabled with -fsignaling-nans even when both operands are
      known not to be signaling NaNs, so I've corrected that and added a
      second test case.  Unfortunately, c-c++-common/pr57371-4.c then starts
      failing, as it doesn't distinguish QNaNs (which are quiet) from SNaNs
      (which signal), so this patch includes a minor tweak to the expected
      behaviour for QNaNs in that existing test.
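
      As a reminder of the semantics the folder has to preserve, here is a
      small sketch (not one of the new test cases) of comparisons with two
      constant operands, one of them a quiet NaN.  Under IEEE 754 the ordered
      comparisons and == are false and != is true, whichever side the NaN is
      on.  The function names are illustrative only.

```cpp
#include <cassert>
#include <limits>

// Both operands of every comparison are constants; one is a quiet NaN.
// The result must not depend on which side the NaN appears.
inline double qnan() { return std::numeric_limits<double>::quiet_NaN(); }

inline bool lt_nan_on_right() { return 1.0 < qnan(); }     // false
inline bool lt_nan_on_left()  { return qnan() < 1.0; }     // false
inline bool ge_nan_on_left()  { return qnan() >= 1.0; }    // false
inline bool eq_nan_on_left()  { return qnan() == 1.0; }    // false
inline bool ne_nan_on_left()  { return qnan() != 1.0; }    // true
inline bool ne_nan_on_right() { return 1.0 != qnan(); }    // true
```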
      
      2021-09-19  Roger Sayle <roger@nextmovesoftware.com>
      
      gcc/ChangeLog
      	PR middle-end/88173
      	* match.pd (cmp @0 REAL_CST@1): When @0 is also REAL_CST, apply
      	the same transformations as to @1.  For comparisons against NaN,
      	don't check HONOR_SNANS but confirm that neither operand is a
      	signaling NaN.
      
      gcc/testsuite/ChangeLog
      	PR middle-end/88173
      	* c-c++-common/pr57371-4.c: Tweak/correct test case for QNaNs.
      	* g++.dg/pr88173-1.C: New test case.
      	* g++.dg/pr88173-2.C: New test case.
    • [PATCH] Remove unused function make_unique_name. · 69337e74
      Benjamin Peterson authored
      gcc/
      	* attribs.c (make_unique_name): Delete.
      	* attribs.h (make_unique_name): Delete.
    • Fix middle-end/102395: reg_class having only NO_REGS and ALL_REGS. · 767c0982
      Andrew Pinski authored
      The simple fix is to just add to the assert that
      sclass and dclass are both greater than or equal to NO_REGS.
      NO_REGS is documented as the first register class so it should
      have the value of 0.
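
      A rough sketch of such a range check, for a hypothetical target with
      only the two register classes named in the subject.  The helper
      classes_in_range and the upper bound against LIM_REG_CLASSES are
      assumptions made for illustration, not the code in lra-constraints.c.

```cpp
#include <cassert>

// Hypothetical minimal target: only NO_REGS and ALL_REGS exist.
enum reg_class { NO_REGS, ALL_REGS, LIM_REG_CLASSES };

// NO_REGS is the first enumerator, so its value is 0.
static_assert(NO_REGS == 0, "NO_REGS must be the first register class");

// Illustrative check: both classes must be at least NO_REGS (the lower
// bound the commit adds) and below LIM_REG_CLASSES (assumed upper bound).
inline bool classes_in_range(int sclass, int dclass) {
  return sclass >= NO_REGS && sclass < LIM_REG_CLASSES
         && dclass >= NO_REGS && dclass < LIM_REG_CLASSES;
}
```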
      
      gcc/ChangeLog:
      
      	* lra-constraints.c (check_and_process_move): Assert
      	that dclass and sclass are greater than or equal to NO_REGS.
    • Daily bump. · cf74e7b5
      GCC Administrator authored
  2. Sep 18, 2021
    • openmp: Handle unconstrained and reproducible modifiers on order(concurrent) · e9d8fcab
      Jakub Jelinek authored
      This patch adds handling for unconstrained and reproducible modifiers on
      order(concurrent) clause.  For all static schedules (including auto and
      no schedule or dist_schedule clauses) I believe what we implement is
      reproducible, so the patch doesn't do much beyond recognizing those.
      Note, there is an OpenMP/spec issue that needs resolution on what
      should happen with the dynamic schedules (whether it should be an error
      to mix such clauses, or silently make it non-reproducible, and in which
      exact cases), so it might need some follow-up.
      
      Besides that, this patch allows order(concurrent) clause on the distribute
      construct which is something also added in OpenMP 5.1, and finally
      checks the newly added restriction that at most one order clause
      can appear on a construct.
      
      The allowing of order clause on distribute has a side-effect that
      order(concurrent) copyin(thrpriv) is no longer allowed on combined/composite
      constructs with distribute parallel for{, simd} in it; previously the
      order applied only to for/simd and so a threadprivate var could be seen
      in the construct, but now it also applies to distribute and so on the parallel
      we shouldn't refer to a threadprivate var.
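
      For illustration, the two modifiers are written in front of
      "concurrent" inside the order clause.  The sketch below is not taken
      from the new tests; the function names are made up, and it assumes a
      compiler that implements this OpenMP 5.1 syntax and is run with
      -fopenmp.  Without -fopenmp the pragmas are ignored and the loops run
      serially with the same result.

```cpp
#include <cassert>

// order(reproducible: concurrent): iterations may run in any order, and
// the iteration-to-thread mapping is the same across identical loops.
inline void scale_reproducible(const int *in, int *out, int n, int factor) {
  #pragma omp parallel for order(reproducible: concurrent)
  for (int i = 0; i < n; ++i)
    out[i] = in[i] * factor;
}

// order(unconstrained: concurrent): no such reproducibility is requested.
inline void scale_unconstrained(const int *in, int *out, int n, int factor) {
  #pragma omp parallel for order(unconstrained: concurrent)
  for (int i = 0; i < n; ++i)
    out[i] = in[i] * factor;
}
```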
      
      2021-09-18  Jakub Jelinek  <jakub@redhat.com>
      
      gcc/
      	* tree.h (OMP_CLAUSE_ORDER_UNCONSTRAINED): Define.
      	* tree-pretty-print.c (dump_omp_clause): Print unconstrained:
      	for OMP_CLAUSE_ORDER_UNCONSTRAINED.
      gcc/c-family/
      	* c-omp.c (c_omp_split_clauses): Split order clause also to
      	distribute construct.  Copy over OMP_CLAUSE_ORDER_UNCONSTRAINED.
      gcc/c/
      	* c-parser.c (c_parser_omp_clause_order): Parse unconstrained
      	and reproducible modifiers.
      	(OMP_DISTRIBUTE_CLAUSE_MASK): Add order clause.
      gcc/cp/
      	* parser.c (cp_parser_omp_clause_order): Parse unconstrained
      	and reproducible modifiers.
      	(OMP_DISTRIBUTE_CLAUSE_MASK): Add order clause.
      gcc/testsuite/
      	* c-c++-common/gomp/order-1.c (f2): Add tests for distribute
      	with order clause.
      	(f3): Remove.
      	* c-c++-common/gomp/order-2.c: Don't expect error for distribute
      	with order clause.
      	* c-c++-common/gomp/order-5.c: New test.
      	* c-c++-common/gomp/order-6.c: New test.
      	* c-c++-common/gomp/clause-dups-1.c (f1): Add tests for
      	duplicated order clause.
      	(f9): New function.
      	* c-c++-common/gomp/clauses-1.c (baz, bar): Don't mix copyin and
      	order(concurrent) clauses on the same composite construct combined
      	with distribute, instead split it into two tests, one without
      	copyin and one without order(concurrent).  Add order(concurrent)
      	clauses to {,{,target} teams} distribute.
      	* g++.dg/gomp/attrs-1.C (baz, bar): Likewise.
      	* g++.dg/gomp/attrs-2.C (baz, bar): Likewise.
    • Fix ICE in pass_rpad. · e666a0a2
      liuhongt authored
      Besides conversion instructions, pass_rpad also handles scalar
      sqrt/rsqrt/rcp/round instructions, whereas r12-3614 was only meant to
      handle conversion instructions, so fix it.
      
      gcc/ChangeLog:
      
      	* config/i386/i386-features.c (remove_partial_avx_dependency):
      	Restrict TARGET_USE_VECTOR_FP_CONVERTS and
      	TARGET_USE_VECTOR_CONVERTS to conversion instructions only.
    • openmp: Allow private or firstprivate arguments to default clause even for C/C++ · e5597f2a
      Jakub Jelinek authored
      OpenMP 5.1 allows default(private) or default(firstprivate) even in C/C++,
      but it behaves the same way as in Fortran only for variables not declared at
      namespace or file scope.  For the namespace/file scope variables it instead
      behaves as default(none).
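
      A minimal sketch of the rule, not one of the new tests: the block-scope
      variable offset gets firstprivate sharing implicitly, while the
      namespace-scope variable base_bias is treated as under default(none)
      and therefore has to be listed explicitly.  The names are illustrative,
      and the sketch assumes a compiler implementing this OpenMP 5.1 feature
      with -fopenmp; without -fopenmp the pragma is ignored and the result is
      the same.

```cpp
#include <cassert>

int base_bias = 1;  // namespace scope: must appear in an explicit clause

inline int sum_with_offset(int n) {
  int offset = 10;  // block scope: implicitly firstprivate
  int total = 0;
  #pragma omp parallel for default(firstprivate) shared(base_bias) \
      reduction(+ : total)
  for (int i = 0; i < n; ++i)
    total += i + offset + base_bias;
  return total;
}
```

      Dropping shared(base_bias) should then be diagnosed in the same way as
      it would be under default(none).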
      
      2021-09-18  Jakub Jelinek  <jakub@redhat.com>
      
      gcc/
      	* gimplify.c (omp_default_clause): For C/C++ default({,first}private),
      	if file/namespace scope variable doesn't have predetermined sharing,
      	treat it as if there was default(none).
      gcc/c/
      	* c-parser.c (c_parser_omp_clause_default): Handle private and
      	firstprivate arguments, adjust diagnostics on unknown argument.
      gcc/cp/
      	* parser.c (cp_parser_omp_clause_default): Handle private and
      	firstprivate arguments, adjust diagnostics on unknown argument.
      	* cp-gimplify.c (cxx_omp_finish_clause): Handle OMP_CLAUSE_PRIVATE.
      gcc/testsuite/
      	* c-c++-common/gomp/default-2.c: New test.
      	* c-c++-common/gomp/default-3.c: New test.
      	* g++.dg/gomp/default-1.C: New test.
      libgomp/
      	* testsuite/libgomp.c++/default-1.C: New test.
      	* testsuite/libgomp.c-c++-common/default-1.c: New test.
      	* libgomp.texi (OpenMP 5.1): Mark "private and firstprivate argument
      	to default clause in C and C++" as implemented.
    • AVX512FP16: Add testcase for scalar FMA instructions. · d07c750c
      liuhongt authored
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/avx512fp16-vfmaddXXXsh-1a.c: New test.
      	* gcc.target/i386/avx512fp16-vfmaddXXXsh-1b.c: Ditto.
      	* gcc.target/i386/avx512fp16-vfmsubXXXsh-1a.c: Ditto.
      	* gcc.target/i386/avx512fp16-vfmsubXXXsh-1b.c: Ditto.
      	* gcc.target/i386/avx512fp16-vfnmaddXXXsh-1a.c: Ditto.
      	* gcc.target/i386/avx512fp16-vfnmaddXXXsh-1b.c: Ditto.
      	* gcc.target/i386/avx512fp16-vfnmsubXXXsh-1a.c: Ditto.
      	* gcc.target/i386/avx512fp16-vfnmsubXXXsh-1b.c: Ditto.
    • AVX512FP16: Add scalar fma instructions. · 3c9de0a9
      liuhongt authored
      Add vfmadd[132,213,231]sh/vfnmadd[132,213,231]sh/
      vfmsub[132,213,231]sh/vfnmsub[132,213,231]sh.
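
      The four families differ only in the signs applied to the product and
      the addend.  The following scalar reference model shows this; it is a
      sketch in plain float using std::fma, not the new _sh intrinsics, which
      operate on the low FP16 element of a vector and need AVX512FP16
      hardware.  The 132/213/231 suffixes only select which operands are
      multiplied and which one is added.

```cpp
#include <cassert>
#include <cmath>

// Scalar reference model of the four FMA sign variants (float, not FP16).
inline float fmadd (float a, float b, float c) { return std::fma( a, b,  c); }  //  (a*b) + c
inline float fmsub (float a, float b, float c) { return std::fma( a, b, -c); }  //  (a*b) - c
inline float fnmadd(float a, float b, float c) { return std::fma(-a, b,  c); }  // -(a*b) + c
inline float fnmsub(float a, float b, float c) { return std::fma(-a, b, -c); }  // -(a*b) - c
```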
      
      gcc/ChangeLog:
      
      	* config/i386/avx512fp16intrin.h (_mm_fmadd_sh):
      	New intrinsic.
      	(_mm_mask_fmadd_sh): Likewise.
      	(_mm_mask3_fmadd_sh): Likewise.
      	(_mm_maskz_fmadd_sh): Likewise.
      	(_mm_fmadd_round_sh): Likewise.
      	(_mm_mask_fmadd_round_sh): Likewise.
      	(_mm_mask3_fmadd_round_sh): Likewise.
      	(_mm_maskz_fmadd_round_sh): Likewise.
      	(_mm_fnmadd_sh): Likewise.
      	(_mm_mask_fnmadd_sh): Likewise.
      	(_mm_mask3_fnmadd_sh): Likewise.
      	(_mm_maskz_fnmadd_sh): Likewise.
      	(_mm_fnmadd_round_sh): Likewise.
      	(_mm_mask_fnmadd_round_sh): Likewise.
      	(_mm_mask3_fnmadd_round_sh): Likewise.
      	(_mm_maskz_fnmadd_round_sh): Likewise.
      	(_mm_fmsub_sh): Likewise.
      	(_mm_mask_fmsub_sh): Likewise.
      	(_mm_mask3_fmsub_sh): Likewise.
      	(_mm_maskz_fmsub_sh): Likewise.
      	(_mm_fmsub_round_sh): Likewise.
      	(_mm_mask_fmsub_round_sh): Likewise.
      	(_mm_mask3_fmsub_round_sh): Likewise.
      	(_mm_maskz_fmsub_round_sh): Likewise.
      	(_mm_fnmsub_sh): Likewise.
      	(_mm_mask_fnmsub_sh): Likewise.
      	(_mm_mask3_fnmsub_sh): Likewise.
      	(_mm_maskz_fnmsub_sh): Likewise.
      	(_mm_fnmsub_round_sh): Likewise.
      	(_mm_mask_fnmsub_round_sh): Likewise.
      	(_mm_mask3_fnmsub_round_sh): Likewise.
      	(_mm_maskz_fnmsub_round_sh): Likewise.
      	* config/i386/i386-builtin-types.def
      	(V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT): New builtin type.
      	* config/i386/i386-builtin.def: Add new builtins.
      	* config/i386/i386-expand.c: Handle new builtin type.
      	* config/i386/sse.md (fmai_vmfmadd_<mode><round_name>):
      	Adjust to support FP16.
      	(fmai_vmfmsub_<mode><round_name>): Ditto.
      	(fmai_vmfnmadd_<mode><round_name>): Ditto.
      	(fmai_vmfnmsub_<mode><round_name>): Ditto.
      	(*fmai_fmadd_<mode>): Ditto.
      	(*fmai_fmsub_<mode>): Ditto.
      	(*fmai_fnmadd_<mode><round_name>): Ditto.
      	(*fmai_fnmsub_<mode><round_name>): Ditto.
      	(avx512f_vmfmadd_<mode>_mask<round_name>): Ditto.
      	(avx512f_vmfmadd_<mode>_mask3<round_name>): Ditto.
      	(avx512f_vmfmadd_<mode>_maskz<round_expand_name>): Ditto.
      	(avx512f_vmfmadd_<mode>_maskz_1<round_name>): Ditto.
      	(*avx512f_vmfmsub_<mode>_mask<round_name>): Ditto.
      	(avx512f_vmfmsub_<mode>_mask3<round_name>): Ditto.
      	(*avx512f_vmfmsub_<mode>_maskz_1<round_name>): Ditto.
      	(*avx512f_vmfnmsub_<mode>_mask<round_name>): Ditto.
      	(*avx512f_vmfnmsub_<mode>_mask3<round_name>): Ditto.
      	(*avx512f_vmfnmsub_<mode>_mask<round_name>): Ditto.
      	(*avx512f_vmfnmadd_<mode>_mask<round_name>): Renamed to ...
      	(avx512f_vmfnmadd_<mode>_mask<round_name>) ... this, and
      	adjust to support FP16.
      	(avx512f_vmfnmadd_<mode>_mask3<round_name>): Ditto.
      	(avx512f_vmfnmadd_<mode>_maskz_1<round_name>): Ditto.
      	(avx512f_vmfnmadd_<mode>_maskz<round_expand_name>): New
      	expander.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/avx-1.c: Add test for new builtins.
      	* gcc.target/i386/sse-13.c: Ditto.
      	* gcc.target/i386/sse-23.c: Ditto.
      	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
      	* gcc.target/i386/sse-22.c: Ditto.
    • AVX512FP16: Enable FP16 mask load/store. · 376d69f3
      H.J. Lu authored
      gcc/ChangeLog:
      
      	* config/i386/sse.md (avx512fmaskmodelower): Extend to support
      	HF modes.
      	(maskload<mode><avx512fmaskmodelower>): Ditto.
      	(maskstore<mode><avx512fmaskmodelower>): Ditto.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/avx512fp16-xorsign-1.c: New test.
    • AVX512FP16: Add testcase for fp16 bitwise operations. · ef6ab4ab
      liuhongt authored
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/avx512fp16-neg-1a.c: New test.
      	* gcc.target/i386/avx512fp16-neg-1b.c: Ditto.
      	* gcc.target/i386/avx512fp16-scalar-bitwise-1a.c: Ditto.
      	* gcc.target/i386/avx512fp16-scalar-bitwise-1b.c: Ditto.
      	* gcc.target/i386/avx512fp16-vector-bitwise-1a.c: Ditto.
      	* gcc.target/i386/avx512fp16-vector-bitwise-1b.c: Ditto.
      	* gcc.target/i386/avx512fp16vl-neg-1a.c: Ditto.
      	* gcc.target/i386/avx512fp16vl-neg-1b.c: Ditto.
    • AVX512FP16: Add scalar/vector bitwise operations, including · 75a97b59
      H.J. Lu authored
      1. FP16 vector xor/ior/and/andnot/abs/neg
      2. FP16 scalar abs/neg/copysign/xorsign
      
      gcc/ChangeLog:
      
      	* config/i386/i386-expand.c (ix86_expand_fp_absneg_operator):
      	Handle HFmode.
      	(ix86_expand_copysign): Ditto.
      	(ix86_expand_xorsign): Ditto.
      	* config/i386/i386.c (ix86_build_const_vector): Handle HF vector
      	modes.
      	(ix86_build_signbit_mask): Ditto.
      	(ix86_can_change_mode_class): Ditto.
      	* config/i386/i386.md
      	(SSEMODEF): Add HFmode.
      	(ssevecmodef): Ditto.
      	(<code>hf2): New define_expand.
      	(*<code>hf2_1): New define_insn_and_split.
      	(copysign<mode>): Extend to support HFmode under AVX512FP16.
      	(xorsign<mode>): Ditto.
      	* config/i386/sse.md (VFB): New mode iterator.
      	(VFB_128_256): Ditto.
      	(VFB_512): Ditto.
      	(sseintvecmode2): Support HF vector mode.
      	(<code><mode>2): Use new mode iterator.
      	(*<code><mode>2): Ditto.
      	(copysign<mode>3): Ditto.
      	(xorsign<mode>3): Ditto.
      	(<code><mode>3<mask_name>): Ditto.
      	(<code><mode>3<mask_name>): Ditto.
      	(<sse>_andnot<mode>3<mask_name>): Adjust for HF vector mode.
      	(<sse>_andnot<mode>3<mask_name>): Ditto.
      	(*<code><mode>3<mask_name>): Ditto.
      	(*<code><mode>3<mask_name>): Ditto.
    • AVX512FP16: Add testcase for fma instructions · 630a1249
      liuhongt authored
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/avx512fp16-vfmaddXXXph-1a.c: New test.
      	* gcc.target/i386/avx512fp16-vfmaddXXXph-1b.c: Ditto.
      	* gcc.target/i386/avx512fp16-vfmsubXXXph-1a.c: Ditto.
      	* gcc.target/i386/avx512fp16-vfmsubXXXph-1b.c: Ditto.
      	* gcc.target/i386/avx512fp16-vfnmaddXXXph-1a.c: Ditto.
      	* gcc.target/i386/avx512fp16-vfnmaddXXXph-1b.c: Ditto.
      	* gcc.target/i386/avx512fp16-vfnmsubXXXph-1a.c: Ditto.
      	* gcc.target/i386/avx512fp16-vfnmsubXXXph-1b.c: Ditto.
      	* gcc.target/i386/avx512fp16vl-vfmaddXXXph-1a.c: Ditto.
      	* gcc.target/i386/avx512fp16vl-vfmaddXXXph-1b.c: Ditto.
      	* gcc.target/i386/avx512fp16vl-vfmsubXXXph-1a.c: Ditto.
      	* gcc.target/i386/avx512fp16vl-vfmsubXXXph-1b.c: Ditto.
      	* gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1a.c: Ditto.
      	* gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1b.c: Ditto.
      	* gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1a.c: Ditto.
      	* gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1b.c: Ditto.
    • AVX512FP16: Add FP16 fma instructions. · ede1820d
      liuhongt authored
      Add vfmadd[132,213,231]ph/vfnmadd[132,213,231]ph/vfmsub[132,213,231]ph/
      vfnmsub[132,213,231]ph.
      
      gcc/ChangeLog:
      
      	* config/i386/avx512fp16intrin.h (_mm512_mask_fmadd_ph):
      	New intrinsic.
      	(_mm512_mask3_fmadd_ph): Likewise.
      	(_mm512_maskz_fmadd_ph): Likewise.
      	(_mm512_fmadd_round_ph): Likewise.
      	(_mm512_mask_fmadd_round_ph): Likewise.
      	(_mm512_mask3_fmadd_round_ph): Likewise.
      	(_mm512_maskz_fmadd_round_ph): Likewise.
      	(_mm512_fnmadd_ph): Likewise.
      	(_mm512_mask_fnmadd_ph): Likewise.
      	(_mm512_mask3_fnmadd_ph): Likewise.
      	(_mm512_maskz_fnmadd_ph): Likewise.
      	(_mm512_fnmadd_round_ph): Likewise.
      	(_mm512_mask_fnmadd_round_ph): Likewise.
      	(_mm512_mask3_fnmadd_round_ph): Likewise.
      	(_mm512_maskz_fnmadd_round_ph): Likewise.
      	(_mm512_fmsub_ph): Likewise.
      	(_mm512_mask_fmsub_ph): Likewise.
      	(_mm512_mask3_fmsub_ph): Likewise.
      	(_mm512_maskz_fmsub_ph): Likewise.
      	(_mm512_fmsub_round_ph): Likewise.
      	(_mm512_mask_fmsub_round_ph): Likewise.
      	(_mm512_mask3_fmsub_round_ph): Likewise.
      	(_mm512_maskz_fmsub_round_ph): Likewise.
      	(_mm512_fnmsub_ph): Likewise.
      	(_mm512_mask_fnmsub_ph): Likewise.
      	(_mm512_mask3_fnmsub_ph): Likewise.
      	(_mm512_maskz_fnmsub_ph): Likewise.
      	(_mm512_fnmsub_round_ph): Likewise.
      	(_mm512_mask_fnmsub_round_ph): Likewise.
      	(_mm512_mask3_fnmsub_round_ph): Likewise.
      	(_mm512_maskz_fnmsub_round_ph): Likewise.
      	* config/i386/avx512fp16vlintrin.h (_mm256_fmadd_ph):
      	New intrinsic.
      	(_mm256_mask_fmadd_ph): Likewise.
      	(_mm256_mask3_fmadd_ph): Likewise.
      	(_mm256_maskz_fmadd_ph): Likewise.
      	(_mm_fmadd_ph): Likewise.
      	(_mm_mask_fmadd_ph): Likewise.
      	(_mm_mask3_fmadd_ph): Likewise.
      	(_mm_maskz_fmadd_ph): Likewise.
      	(_mm256_fnmadd_ph): Likewise.
      	(_mm256_mask_fnmadd_ph): Likewise.
      	(_mm256_mask3_fnmadd_ph): Likewise.
      	(_mm256_maskz_fnmadd_ph): Likewise.
      	(_mm_fnmadd_ph): Likewise.
      	(_mm_mask_fnmadd_ph): Likewise.
      	(_mm_mask3_fnmadd_ph): Likewise.
      	(_mm_maskz_fnmadd_ph): Likewise.
      	(_mm256_fmsub_ph): Likewise.
      	(_mm256_mask_fmsub_ph): Likewise.
      	(_mm256_mask3_fmsub_ph): Likewise.
      	(_mm256_maskz_fmsub_ph): Likewise.
      	(_mm_fmsub_ph): Likewise.
      	(_mm_mask_fmsub_ph): Likewise.
      	(_mm_mask3_fmsub_ph): Likewise.
      	(_mm_maskz_fmsub_ph): Likewise.
      	(_mm256_fnmsub_ph): Likewise.
      	(_mm256_mask_fnmsub_ph): Likewise.
      	(_mm256_mask3_fnmsub_ph): Likewise.
      	(_mm256_maskz_fnmsub_ph): Likewise.
      	(_mm_fnmsub_ph): Likewise.
      	(_mm_mask_fnmsub_ph): Likewise.
      	(_mm_mask3_fnmsub_ph): Likewise.
      	(_mm_maskz_fnmsub_ph): Likewise.
      	* config/i386/i386-builtin.def: Add corresponding new builtins.
      	* config/i386/sse.md
      	(<avx512>_fmadd_<mode>_maskz<round_expand_name>): Adjust to
      	support HF vector modes.
      	(<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name><round_name>):
      	Ditto.
      	(*<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_1): Ditto.
      	(*<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_2): Ditto.
      	(*<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_3): Ditto.
      	(<avx512>_fmadd_<mode>_mask<round_name>): Ditto.
      	(<avx512>_fmadd_<mode>_mask3<round_name>): Ditto.
      	(<avx512>_fmsub_<mode>_maskz<round_expand_name>): Ditto.
      	(<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name><round_name>):
      	Ditto.
      	(*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_1): Ditto.
      	(*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_2): Ditto.
      	(*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_3): Ditto.
      	(<avx512>_fmsub_<mode>_mask<round_name>): Ditto.
      	(<avx512>_fmsub_<mode>_mask3<round_name>): Ditto.
      	(<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name><round_name>):
      	Ditto.
      	(*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_1): Ditto.
      	(*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_2): Ditto.
      	(*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_3): Ditto.
      	(<avx512>_fnmadd_<mode>_mask<round_name>): Ditto.
      	(<avx512>_fnmadd_<mode>_mask3<round_name>): Ditto.
      	(<avx512>_fnmsub_<mode>_maskz<round_expand_name>): Ditto.
      	(<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name><round_name>):
      	Ditto.
      	(*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_1): Ditto.
      	(*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_2): Ditto.
      	(*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_3): Ditto.
      	(<avx512>_fnmsub_<mode>_mask<round_name>): Ditto.
      	(<avx512>_fnmsub_<mode>_mask3<round_name>): Ditto.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/avx-1.c: Add test for new builtins.
      	* gcc.target/i386/sse-13.c: Ditto.
      	* gcc.target/i386/sse-23.c: Ditto.
      	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
      	* gcc.target/i386/sse-22.c: Ditto.
    • AVX512FP16: Add testcase for vfmaddsub[132,213,231]ph/vfmsubadd[132,213,231]ph. · b6c24eab
      liuhongt authored
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/avx512fp16-vfmaddsubXXXph-1a.c: New test.
      	* gcc.target/i386/avx512fp16-vfmaddsubXXXph-1b.c: Ditto.
      	* gcc.target/i386/avx512fp16-vfmsubaddXXXph-1a.c: Ditto.
      	* gcc.target/i386/avx512fp16-vfmsubaddXXXph-1b.c: Ditto.
      	* gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1a.c: Ditto.
      	* gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1b.c: Ditto.
      	* gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1a.c: Ditto.
      	* gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1b.c: Ditto.
    • AVX512FP16: Add vfmaddsub[132,213,231]ph/vfmsubadd[132,213,231]ph. · 1e685084
      liuhongt authored
      gcc/ChangeLog:
      
      	* config/i386/avx512fp16intrin.h (_mm512_fmaddsub_ph):
      	New intrinsic.
      	(_mm512_mask_fmaddsub_ph): Likewise.
      	(_mm512_mask3_fmaddsub_ph): Likewise.
      	(_mm512_maskz_fmaddsub_ph): Likewise.
      	(_mm512_fmaddsub_round_ph): Likewise.
      	(_mm512_mask_fmaddsub_round_ph): Likewise.
      	(_mm512_mask3_fmaddsub_round_ph): Likewise.
      	(_mm512_maskz_fmaddsub_round_ph): Likewise.
      	(_mm512_mask_fmsubadd_ph): Likewise.
      	(_mm512_mask3_fmsubadd_ph): Likewise.
      	(_mm512_maskz_fmsubadd_ph): Likewise.
      	(_mm512_fmsubadd_round_ph): Likewise.
      	(_mm512_mask_fmsubadd_round_ph): Likewise.
      	(_mm512_mask3_fmsubadd_round_ph): Likewise.
      	(_mm512_maskz_fmsubadd_round_ph): Likewise.
      	* config/i386/avx512fp16vlintrin.h (_mm256_fmaddsub_ph):
      	New intrinsic.
      	(_mm256_mask_fmaddsub_ph): Likewise.
      	(_mm256_mask3_fmaddsub_ph): Likewise.
      	(_mm256_maskz_fmaddsub_ph): Likewise.
      	(_mm_fmaddsub_ph): Likewise.
      	(_mm_mask_fmaddsub_ph): Likewise.
      	(_mm_mask3_fmaddsub_ph): Likewise.
      	(_mm_maskz_fmaddsub_ph): Likewise.
      	(_mm256_fmsubadd_ph): Likewise.
      	(_mm256_mask_fmsubadd_ph): Likewise.
      	(_mm256_mask3_fmsubadd_ph): Likewise.
      	(_mm256_maskz_fmsubadd_ph): Likewise.
      	(_mm_fmsubadd_ph): Likewise.
      	(_mm_mask_fmsubadd_ph): Likewise.
      	(_mm_mask3_fmsubadd_ph): Likewise.
      	(_mm_maskz_fmsubadd_ph): Likewise.
      	* config/i386/i386-builtin.def: Add corresponding new builtins.
      	* config/i386/sse.md (VFH_SF_AVX512VL): New mode iterator.
      	* (<avx512>_fmsubadd_<mode>_maskz<round_expand_name>): New expander.
      	* (<avx512>_fmaddsub_<mode>_maskz<round_expand_name>): Use
      	VFH_SF_AVX512VL.
      	* (<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name><round_name>):
      	Ditto.
      	* (<avx512>_fmaddsub_<mode>_mask<round_name>): Ditto.
      	* (<avx512>_fmaddsub_<mode>_mask3<round_name>): Ditto.
      	* (<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name><round_name>):
      	Ditto.
      	* (<avx512>_fmsubadd_<mode>_mask<round_name>): Ditto.
      	* (<avx512>_fmsubadd_<mode>_mask3<round_name>): Ditto.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/avx-1.c: Add test for new builtins.
      	* gcc.target/i386/sse-13.c: Ditto.
      	* gcc.target/i386/sse-23.c: Ditto.
      	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
      	* gcc.target/i386/sse-22.c: Ditto.
      1e685084
    • liuhongt's avatar
      Support embedded broadcast for AVX512FP16 instructions. · 7afcb534
      liuhongt authored
      gcc/ChangeLog:
      
      	PR target/87767
      	* config/i386/i386.c (ix86_print_operand): Handle
      	V8HF/V16HF/V32HFmode.
      	* config/i386/i386.h (VALID_BCST_MODE_P): Add HFmode.
      	* config/i386/sse.md (avx512bcst): Remove.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/avx512fp16-broadcast-1.c: New test.
      	* gcc.target/i386/avx512fp16-broadcast-2.c: New test.
      7afcb534
    • Jason Merrill's avatar
      c++: improve lookup of member-qualified names · 18b57c1d
      Jason Merrill authored
      I've been working on the resolution of CWG1835 by P1787, which among many
      other things clarified that a name after -> or . is looked up first in the
      class of the object expression even if it's dependent.  This patch does not
      make that change; this is a smaller change extracted from that work in
      progress to make the lookup in the object type work better in cases where
      unqualified lookup doesn't find anything.
      
      Basically, if we see "t.foo::" we know that looking up foo in t needs to
      find a type, so we build an implicit TYPENAME_TYPE for it.
      
      This also implements the change from P1787 to assume that a name followed by
      < in a type-only context names a template, since the less-than operator
      can't appear in a type context.  This makes some of the lines in dtor11.C
      work.
      
      I introduce the predicate 'dependentish_scope_p' for the case where the
      current instantiation has dependent bases, so even though we can perform
      name lookup, a lookup failure is not conclusive.
      
      gcc/cp/ChangeLog:
      
      	* cp-tree.h (dependentish_scope_p): Declare.
      	* pt.c (dependentish_scope_p): New.
      	* parser.c (cp_parser_lookup_name): Return a TYPENAME_TYPE
      	for lookup of a type in a dependent object.
      	(cp_parser_template_id): Handle TYPENAME_TYPE.
      	(cp_parser_template_name): If we're looking for a type,
      	a name followed by < names a template.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/template/dtor5.C: Adjust expected error.
      	* g++.dg/cpp23/lookup2.C: New test.
      	* g++.dg/template/dtor11.C: New test.
      18b57c1d
    • Jason Merrill's avatar
      c++: fix comment typo · 8618f9e5
      Jason Merrill authored
      gcc/cp/ChangeLog:
      
      	* cp-tree.h: Fix typo in LANG_FLAG list.
      8618f9e5
    • GCC Administrator's avatar
      Daily bump. · 0a4cb439
      GCC Administrator authored
      0a4cb439
  3. Sep 17, 2021
    • Martin Sebor's avatar
      Factor predicate analysis out of tree-ssa-uninit.c into its own module. · 94c12ffa
      Martin Sebor authored
      gcc/ChangeLog:
      
      	* Makefile.in (OBJS): Add gimple-predicate-analysis.o.
      	* tree-ssa-uninit.c (max_phi_args): Move to gimple-predicate-analysis.
      	(MASK_SET_BIT, MASK_TEST_BIT, MASK_EMPTY): Same.
      	(check_defs): Add comment.
      	(can_skip_redundant_opnd): Update comment.
      	(compute_uninit_opnds_pos): Adjust to namespace change.
      	(find_pdom): Move to gimple-predicate-analysis.cc.
      	(find_dom): Same.
      	(struct uninit_undef_val_t): New.
      	(is_non_loop_exit_postdominating): Move to gimple-predicate-analysis.cc.
      	(find_control_equiv_block): Same.
      	(MAX_NUM_CHAINS, MAX_CHAIN_LEN, MAX_POSTDOM_CHECK): Same.
      	(MAX_SWITCH_CASES): Same.
      	(compute_control_dep_chain): Same.
      	(find_uninit_use): Use predicate analyzer.
      	(struct pred_info): Move to gimple-predicate-analysis.
      	(convert_control_dep_chain_into_preds): Same.
      	(find_predicates): Same.
      	(collect_phi_def_edges): Same.
      	(warn_uninitialized_phi): Use predicate analyzer.
      	(find_def_preds): Move to gimple-predicate-analysis.
      	(dump_pred_info): Same.
      	(dump_pred_chain): Same.
      	(dump_predicates): Same.
      	(destroy_predicate_vecs): Remove.
      	(execute_late_warn_uninitialized): New.
      	(get_cmp_code): Move to gimple-predicate-analysis.
      	(is_value_included_in): Same.
      	(value_sat_pred_p): Same.
      	(find_matching_predicate_in_rest_chains): Same.
      	(is_use_properly_guarded): Same.
      	(prune_uninit_phi_opnds): Same.
      	(find_var_cmp_const): Same.
      	(use_pred_not_overlap_with_undef_path_pred): Same.
      	(pred_equal_p): Same.
      	(is_neq_relop_p): Same.
      	(is_neq_zero_form_p): Same.
      	(pred_expr_equal_p): Same.
      	(is_pred_expr_subset_of): Same.
      	(is_pred_chain_subset_of): Same.
      	(is_included_in): Same.
      	(is_superset_of): Same.
      	(pred_neg_p): Same.
      	(simplify_pred): Same.
      	(simplify_preds_2): Same.
      	(simplify_preds_3): Same.
      	(simplify_preds_4): Same.
      	(simplify_preds): Same.
      	(push_pred): Same.
      	(push_to_worklist): Same.
      	(get_pred_info_from_cmp): Same.
      	(is_degenerated_phi): Same.
      	(normalize_one_pred_1): Same.
      	(normalize_one_pred): Same.
      	(normalize_one_pred_chain): Same.
      	(normalize_preds): Same.
      	(can_one_predicate_be_invalidated_p): Same.
      	(can_chain_union_be_invalidated_p): Same.
      	(uninit_uses_cannot_happen): Same.
      	(pass_late_warn_uninitialized::execute): Define.
      	* gimple-predicate-analysis.cc: New file.
      	* gimple-predicate-analysis.h: New file.
      94c12ffa
    • Harald Anlauf's avatar
      Fortran - (large) arrays in the main shall be static · 51166eb2
      Harald Anlauf authored
      gcc/fortran/ChangeLog:
      
      	PR fortran/102366
      	* trans-decl.c (gfc_finish_var_decl): Disable the warning message
      	for variables moved from stack to static storage if they are
      	declared in the main, but allow the move to happen.
      
      gcc/testsuite/ChangeLog:
      
      	PR fortran/102366
      	* gfortran.dg/pr102366.f90: New test.
      51166eb2
    • Jonathan Wakely's avatar
      libstdc++: Add 'noexcept' to path::iterator members · 42eff613
      Jonathan Wakely authored
      
      All path::iterator operations are non-throwing.
      
      Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
      
      libstdc++-v3/ChangeLog:
      
      	* include/bits/fs_path.h (path::iterator): Add noexcept to all
      	member functions and friend functions.
      	(distance): Add noexcept.
      	(advance): Add noexcept and inline.
      	* include/experimental/bits/fs_path.h (path::iterator):
      	Add noexcept to all member functions.
      42eff613
    • Jonathan Wakely's avatar
      libstdc++: Fix last std::tuple constructor missing 'constexpr' [PR102270] · 1fa2c5a6
      Jonathan Wakely authored
      
      Also rename the test so it actually runs.
      
      Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
      
      libstdc++-v3/ChangeLog:
      
      	PR libstdc++/102270
      	* include/std/tuple (_Tuple_impl): Add constexpr to constructor
      	missed in previous patch.
      	* testsuite/20_util/tuple/cons/102270.C: Moved to...
      	* testsuite/20_util/tuple/cons/102270.cc: ...here.
      	* testsuite/util/testsuite_allocator.h (SimpleAllocator): Add
      	constexpr to constructor so it can be used for C++20 tests.
      1fa2c5a6
    • Julian Brown's avatar
      openacc: Remove unnecessary barriers (gimple worker partitioning/broadcast) · 2961ac45
      Julian Brown authored
      This is an optimisation for middle-end worker-partitioning support (used
      to support multiple workers on AMD GCN).  At present, barriers may be
      emitted in cases where they aren't needed and cannot be optimised away.
      This patch stops the extraneous barriers from being emitted in the
      first place.
      
      One exception to the above (where the barrier is still needed) is for
      predicated blocks of code that perform a write to gang-private shared
      memory from one worker.  We must execute a barrier before other workers
      read that shared memory location.
      
      gcc/
      	* config/gcn/gcn.c (gimple.h): Include.
      	(gcn_fork_join): Emit barrier for worker-level joins.
      	* omp-oacc-neuter-broadcast.cc (find_local_vars_to_propagate): Add
      	writes_gang_private bitmap parameter. Set bit for blocks
      	containing gang-private variable writes.
      	(worker_single_simple): Don't emit barrier after predicated block.
      	(worker_single_copy): Don't emit barrier if we're not broadcasting
      	anything and the block contains no gang-private writes.
      	(neuter_worker_single): Don't predicate blocks that only contain
      	NOPs or internal marker functions.  Pass has_gang_private_write
      	argument to worker_single_copy.
      	(oacc_do_neutering): Add writes_gang_private bitmap handling.
      2961ac45
    • Julian Brown's avatar
      openacc: Shared memory layout optimisation · 2a3f9f65
      Julian Brown authored
      This patch implements an algorithm to lay out local data-share (LDS)
      space.  It currently works for AMD GCN.  At the moment, LDS is used for
      three things:
      
        1. Gang-private variables
        2. Reduction temporaries (accumulators)
        3. Broadcasting for worker partitioning
      
      After the patch is applied, (2) and (3) are placed at preallocated
      locations in LDS, and (1) continues to be handled by the backend, as it
      was before this patch. LDS now looks like this:
      
        +--------------+ (gang-private size + 1024, = 1536)
        | free space   |
        |    ...       |
        | - - - - - - -|
        | worker bcast |
        +--------------+
        | reductions   |
        +--------------+ <<< -mgang-private-size=<number> (def. 512)
        | gang-private |
        |    vars      |
        +--------------+ (32)
        | low LDS vars |
        +--------------+ LDS base
      
      So, gang-private space is fixed at a constant amount at compile time
      (which can be increased with a command-line switch if necessary
      for some given code). The layout algorithm takes out a slice of the
      remainder of usable space for reduction vars, and uses the rest for
      worker partitioning.
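
The boundaries in the diagram above can be sketched as simple arithmetic. This is a minimal illustration, not the GCN backend's code: the struct, constants, and function name are hypothetical, and it assumes the diagram's reading that the gang-private size marks an address (default 512), that the lowest 32 bytes are reserved, and that 1024 further bytes are shared between reductions and worker broadcasts. How large the reduction slice is in practice is not stated here, so it is taken as a parameter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; the numbers mirror the diagram in the commit message.
constexpr uint32_t LOW_LDS_RESERVED = 32;   // "low LDS vars" region
constexpr uint32_t SHARED_EXTRA = 1024;     // space above the gang-private mark

struct lds_layout {
  uint32_t gang_private_lo;  // start of gang-private variables
  uint32_t gang_private_hi;  // end of gang-private vars, start of reductions
  uint32_t reduction_hi;     // end of reductions, start of worker broadcast
  uint32_t lds_limit;        // end of usable LDS
};

// Compute region boundaries for a given -mgang-private-size value and a
// given number of bytes set aside for reduction temporaries (assumed input).
inline lds_layout compute_lds_layout(uint32_t gang_private_size,
                                     uint32_t reduction_bytes) {
  assert(gang_private_size >= LOW_LDS_RESERVED);
  assert(reduction_bytes <= SHARED_EXTRA);
  lds_layout l;
  l.gang_private_lo = LOW_LDS_RESERVED;
  l.gang_private_hi = gang_private_size;
  l.reduction_hi = gang_private_size + reduction_bytes;
  l.lds_limit = gang_private_size + SHARED_EXTRA;
  return l;
}
```

With the default of 512 and, say, 64 bytes of reductions, worker broadcasts would get the range from 576 up to 1536.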
      
      The partitioning algorithm works as follows.
      
       1. An "adjacency" set is built up for each basic block that might
          do a broadcast. This is calculated by starting at each such block,
          and doing a recursive DFS walk over successors to find the next
          block (or blocks) that *also* does a broadcast
          (dfs_broadcast_reachable_1).
      
       2. The adjacency set is inverted to get adjacent predecessor blocks also.
      
       3. Blocks that will perform a broadcast are sorted by size of that
          broadcast: the biggest blocks are handled first.
      
       4. A splay tree structure is used to calculate the spans of LDS memory
          that are already allocated by the blocks adjacent to this one
          (merge_ranges{,_1}).
      
       5. The current block's broadcast space is allocated from the first free
          span not allocated in the splay tree structure calculated above
          (first_fit_range). This seems to work quite nicely and efficiently
          with the splay tree structure.
      
       6. Continue with the next-biggest broadcast block until we're done.
      
      In this way, "adjacent" broadcasts will not use the same piece of
      LDS memory.
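
Steps 3 to 6 above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in omp-oacc-neuter-broadcast.cc: the adjacency sets from steps 1 and 2 are taken as a ready-made, symmetric input, a sorted std::vector stands in for the splay tree, the function names and signatures are hypothetical, and no upper bound on LDS is checked.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct range { unsigned lo, hi; };  // half-open byte range [lo, hi)

// Return the lowest offset >= bounds_lo at which `size` bytes fit without
// overlapping any range in `used` (which may be unsorted and overlapping).
inline unsigned first_fit(std::vector<range> used, unsigned size,
                          unsigned bounds_lo) {
  std::sort(used.begin(), used.end(),
            [](const range &a, const range &b) { return a.lo < b.lo; });
  unsigned cursor = bounds_lo;
  for (const range &r : used) {
    if (r.lo >= cursor && r.lo - cursor >= size)
      break;                          // the gap before r is large enough
    cursor = std::max(cursor, r.hi);  // otherwise skip past r
  }
  return cursor;
}

// sizes[i]:    broadcast bytes needed by block i (0 means no broadcast).
// adjacent[i]: blocks whose broadcasts are "adjacent" to block i's.
// Returns the offset chosen for each block.
inline std::vector<unsigned> allocate_broadcasts(
    const std::vector<unsigned> &sizes,
    const std::vector<std::vector<std::size_t>> &adjacent,
    unsigned bounds_lo) {
  assert(sizes.size() == adjacent.size());
  const std::size_t n = sizes.size();

  // Step 3: handle the biggest broadcasts first.
  std::vector<std::size_t> order;
  for (std::size_t i = 0; i < n; ++i)
    if (sizes[i] > 0)
      order.push_back(i);
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return sizes[a] > sizes[b];
                   });

  std::vector<bool> placed(n, false);
  std::vector<unsigned> offset(n, 0);
  for (std::size_t b : order) {
    // Step 4: gather spans already taken by adjacent blocks.
    std::vector<range> used;
    for (std::size_t nb : adjacent[b])
      if (placed[nb])
        used.push_back({offset[nb], offset[nb] + sizes[nb]});
    // Step 5: take the first free span that fits.
    offset[b] = first_fit(used, sizes[b], bounds_lo);
    placed[b] = true;
  }
  return offset;
}
```

In this sketch, two blocks that are not adjacent may end up at the same offset, which is how the scheme saves space while keeping adjacent broadcasts apart.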
      
      PR96334 "openacc: Unshare reduction temporaries for GCN" got merged in:
      
      The GCN backend uses tree nodes like MEM((__lds TYPE *) <constant>)
      for reduction temporaries. Unlike e.g. var decls and SSA names, these
      nodes cannot be shared during gimplification, but are so in some
      circumstances. This is detected when appropriate --enable-checking
      options are used. This patch unshares such nodes when they are reused
      more than once.
      
      gcc/
      	* config/gcn/gcn-protos.h
      	(gcn_goacc_create_worker_broadcast_record): Update prototype.
      	* config/gcn/gcn-tree.c (gcn_goacc_get_worker_red_decl): Use
      	preallocated block of LDS memory.  Do not cache/share decls for
      	reduction temporaries between invocations.
      	(gcn_goacc_reduction_teardown): Unshare VAR on second use.
      	(gcn_goacc_create_worker_broadcast_record): Add OFFSET parameter
      	and return temporary LDS space at that offset.  Return pointer in
      	"sender" case.
      	* config/gcn/gcn.c (acc_lds_size, gang_private_hwm, lds_allocs):
      	New global vars.
      	(ACC_LDS_SIZE): Define as acc_lds_size.
      	(gcn_init_machine_status): Don't initialise lds_allocated,
      	lds_allocs, reduc_decls fields of machine function struct.
      	(gcn_option_override): Handle default size for gang-private
      	variables and -mgang-private-size option.
      	(gcn_expand_prologue): Use LDS_SIZE instead of LDS_SIZE-1 when
      	initialising M0_REG.
      	(gcn_shared_mem_layout): New function.
      	(gcn_print_lds_decl): Update comment. Use global lds_allocs map and
      	gang_private_hwm variable.
      	(TARGET_GOACC_SHARED_MEM_LAYOUT): Define target hook.
      	* config/gcn/gcn.h (machine_function): Remove lds_allocated,
      	lds_allocs, reduc_decls. Add reduction_base, reduction_limit.
      	* config/gcn/gcn.opt (gang_private_size_opt): New global.
      	(mgang-private-size=): New option.
      	* doc/tm.texi.in (TARGET_GOACC_SHARED_MEM_LAYOUT): Place
      	documentation hook.
      	* doc/tm.texi: Regenerate.
      	* omp-oacc-neuter-broadcast.cc (targhooks.h, diagnostic-core.h):
      	Add includes.
      	(build_sender_ref): Handle sender_decl being pointer.
      	(worker_single_copy): Add PLACEMENT and ISOLATE_BROADCASTS
      	parameters.  Pass placement argument to
      	create_worker_broadcast_record hook invocations.  Handle
      	sender_decl being pointer and isolate_broadcasts inserting extra
      	barriers.
      	(blk_offset_map_t): Add typedef.
      	(neuter_worker_single): Add BLK_OFFSET_MAP parameter.  Pass
      	preallocated range to worker_single_copy call.
      	(dfs_broadcast_reachable_1): New function.
      	(idx_decl_pair_t, used_range_vec_t): New typedefs.
      	(sort_size_descending): New function.
      	(addr_range): New class.
      	(splay_tree_compare_addr_range, splay_tree_free_key)
      	(first_fit_range, merge_ranges_1, merge_ranges): New functions.
      	(execute_omp_oacc_neuter_broadcast): Rename to...
      	(oacc_do_neutering): ... this.  Add BOUNDS_LO, BOUNDS_HI
      	parameters.  Arrange layout of shared memory for broadcast
      	operations.
      	(execute_omp_oacc_neuter_broadcast): New function.
      	(pass_omp_oacc_neuter_broadcast::gate): Remove num_workers==1
      	handling from here.  Enable pass for all OpenACC routines in order
      	to call shared memory-layout hook.
      	* target.def (create_worker_broadcast_record): Add OFFSET
      	parameter.
      	(shared_mem_layout): New hook.
      libgomp/
      	* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: Update.
      2a3f9f65
    • Julian Brown's avatar
      openacc: Turn off worker partitioning if num_workers==1 · 82792cc4
      Julian Brown authored
      
      This patch turns off the middle-end worker-partitioning support if the
      number of workers for an outlined offload function is one.  In that case,
      we do not need to perform the broadcasting/neutering code transformation.
      
      	gcc/
      	* omp-oacc-neuter-broadcast.cc
      	(pass_omp_oacc_neuter_broadcast::gate): Disable if num_workers is
      	1.
      	(execute_omp_oacc_neuter_broadcast): Adjust.
      
      Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
      82792cc4
    • Julian Brown's avatar
      Add 'libgomp.oacc-c-c++-common/broadcast-many.c' · 8251f90e
      Julian Brown authored
      libgomp/
      	* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: New test.
      8251f90e
    • Andrew MacLeod's avatar
      Provide a relation oracle for paths. · 534c5352
      Andrew MacLeod authored
      This provides a path_oracle class which can optionally be used in conjunction
      with another oracle to track relations on a path as it is walked.
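
The idea of layering path-local relations over another oracle can be sketched as below. This is a toy model, not GCC's interface: names are plain integers rather than SSA names, the relation set is reduced to three kinds, and every signature here is an assumption made for illustration.

```cpp
#include <cassert>
#include <map>
#include <utility>

namespace toy {

enum class relation { none, lt, eq, gt };
using name = int;  // stand-in for an SSA name

// Minimal query interface shared by all oracles in this sketch.
struct oracle {
  virtual ~oracle() = default;
  virtual relation query(name a, name b) const = 0;
};

// A root oracle holding a fixed table of known relations.
struct fixed_oracle : oracle {
  std::map<std::pair<name, name>, relation> facts;
  relation query(name a, name b) const override {
    auto it = facts.find(std::make_pair(a, b));
    return it == facts.end() ? relation::none : it->second;
  }
};

// Tracks relations seen along one path; falls back to an optional root.
class path_oracle : public oracle {
  const oracle *m_root;
  std::map<std::pair<name, name>, relation> m_path;

  static relation swapped(relation r) {
    if (r == relation::lt) return relation::gt;
    if (r == relation::gt) return relation::lt;
    return r;
  }

public:
  explicit path_oracle(const oracle *root = nullptr) : m_root(root) {}

  // Record "a R b" for the current path, and the mirrored "b R' a".
  void register_relation(name a, name b, relation r) {
    assert(r != relation::none);
    m_path[std::make_pair(a, b)] = r;
    m_path[std::make_pair(b, a)] = swapped(r);
  }

  // Path-local knowledge wins; otherwise ask the root oracle, if any.
  relation query(name a, name b) const override {
    auto it = m_path.find(std::make_pair(a, b));
    if (it != m_path.end())
      return it->second;
    return m_root ? m_root->query(a, b) : relation::none;
  }

  // Forget everything learned on the path; the root is untouched.
  void reset_path() { m_path.clear(); }
};

}  // namespace toy
```

Resetting the path discards only the relations registered while walking it, so the same object can be reused for the next candidate path.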
      
      	* value-relation.cc (class equiv_chain): Move to header file.
      	(path_oracle::path_oracle): New.
      	(path_oracle::~path_oracle): New.
      	(path_oracle::register_relation): New.
      	(path_oracle::query_relation): New.
      	(path_oracle::reset_path): New.
      	(path_oracle::dump): New.
      	* value-relation.h (class equiv_chain): Move to here.
      	(class path_oracle): New.
      534c5352
    • Andrew MacLeod's avatar
      Virtualize relation oracle and various cleanups. · 3674d8e6
      Andrew MacLeod authored
      Standardize equiv_oracle API onto the new relation_oracle virtual base, and
      then have dom_oracle inherit from that.
      equiv_set always returns an equivalency set now, never NULL.
      EQ_EXPR requires symmetry now.  Each SSA name must be in the other equiv set.
      Shuffle some routines around, simplify.
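
Two of the behaviours described above, a lookup that never returns NULL and equivalence that must hold in both directions, can be illustrated with a small stand-alone table. This is a sketch under assumptions: names are integers, sets are std::set rather than bitmaps, and the class and member names are hypothetical rather than GCC's.

```cpp
#include <cassert>
#include <map>
#include <set>

namespace toy {

class equiv_table {
  std::map<int, std::set<int>> m_sets;          // names with real equivalences
  mutable std::map<int, std::set<int>> m_self;  // cache of singleton sets

public:
  // Always returns a set: the registered one, or a cached {n} if none.
  const std::set<int> &equiv_set(int n) const {
    auto it = m_sets.find(n);
    if (it != m_sets.end())
      return it->second;
    auto self = m_self.find(n);
    if (self == m_self.end())
      self = m_self.emplace(n, std::set<int>{n}).first;
    return self->second;
  }

  // Merge the sets of a and b and store the result for every member,
  // so that each name appears in every other member's set.
  void register_equiv(int a, int b) {
    std::set<int> merged = equiv_set(a);
    const std::set<int> &sb = equiv_set(b);
    merged.insert(sb.begin(), sb.end());
    for (int n : merged)
      m_sets[n] = merged;
  }

  // Equality requires symmetry: each name must be in the other's set.
  bool equal_p(int a, int b) const {
    return equiv_set(a).count(b) != 0 && equiv_set(b).count(a) != 0;
  }
};

}  // namespace toy
```

Callers in this sketch never need a null check, because a name without equivalences still yields a set containing only itself.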
      
      	* gimple-range-cache.cc (ranger_cache::ranger_cache): Create a DOM
      	based oracle.
      	* gimple-range-fold.cc (fur_depend::register_relation): Use
      	register_stmt/edge routines.
      	* value-relation.cc (equiv_chain::find): Relocate from equiv_oracle.
      	(equiv_oracle::equiv_oracle): Create self equivalence cache.
      	(equiv_oracle::~equiv_oracle): Release same.
      	(equiv_oracle::equiv_set): Return entry from self equiv cache if there
      	are no equivalences.
      	(equiv_oracle::find_equiv_block): Move list find to equiv_chain.
      	(equiv_oracle::register_relation): Rename from register_equiv.
      	(relation_chain_head::find_relation): Relocate from dom_oracle.
      	(relation_oracle::register_stmt): New.
      	(relation_oracle::register_edge): New.
      	(dom_oracle::*): Rename from relation_oracle.
      	(dom_oracle::register_relation): Adjust to call equiv_oracle.
      	(dom_oracle::set_one_relation): Split from register_relation.
      	(dom_oracle::register_transitives): Consolidate 2 methods.
      	(dom_oracle::find_relation_block): Move core to relation_chain.
      	(dom_oracle::query_relation): Rename from find_relation_dom and adjust.
      	* value-relation.h (class relation_oracle): New pure virtual base.
      	(class equiv_oracle): Inherit from relation_oracle and adjust.
      	(class dom_oracle): Rename from old relation_oracle and adjust.
      3674d8e6
    • qing zhao's avatar
      testsuite: Fix gcc.target/i386/auto-init-* tests. · 896fec24
      qing zhao authored
      This set of tests failed on many different combinations of -march and -mtune.
      Some of them failed with -fstack-protector-all or -mno-sse, and the
      pattern matches also differ between lp64 and ia32.
      
      The reason for these failures is that the RTL or assembly level pattern
      matches are only valid for -march=x86-64 -mtune=generic.
      
      We restrict the testing to -march=x86-64 and -mtune=generic. We also
      add -fno-stack-protector or -msse for some of the test cases.
      
      gcc/testsuite/ChangeLog:
      
      2021-09-17  qing zhao  <qing.zhao@oracle.com>
      
      	* gcc.target/i386/auto-init-1.c: Restrict the testing only for
      	-march=x86-64 and -mtune=generic. Add -fno-stack-protector.
      	* gcc.target/i386/auto-init-2.c: Restrict the testing only for
      	-march=x86-64 and -mtune=generic -msse.
      	* gcc.target/i386/auto-init-3.c: Likewise.
      	* gcc.target/i386/auto-init-4.c: Likewise.
      	* gcc.target/i386/auto-init-5.c: Different pattern match for lp64 and
      	ia32.
      	* gcc.target/i386/auto-init-6.c: Restrict the testing only for
      	-march=x86-64 and -mtune=generic -msse. Add -fno-stack-protector.
      	* gcc.target/i386/auto-init-7.c: Likewise.
      	* gcc.target/i386/auto-init-8.c: Restrict the testing only for
      	-march=x86-64 and -mtune=generic -msse.
      	* gcc.target/i386/auto-init-padding-1.c: Likewise.
      	* gcc.target/i386/auto-init-padding-10.c: Likewise.
      	* gcc.target/i386/auto-init-padding-11.c: Likewise.
      	* gcc.target/i386/auto-init-padding-12.c: Likewise.
      	* gcc.target/i386/auto-init-padding-2.c: Likewise.
      	* gcc.target/i386/auto-init-padding-3.c: Restrict the testing only for
      	-march=x86-64. Different pattern match for lp64 and ia32.
      	* gcc.target/i386/auto-init-padding-4.c: Restrict the testing only for
      	-march=x86-64 and -mtune=generic -msse.
      	* gcc.target/i386/auto-init-padding-5.c: Likewise.
      	* gcc.target/i386/auto-init-padding-6.c: Likewise.
      	* gcc.target/i386/auto-init-padding-7.c: Restrict the testing only for
      	-march=x86-64 and -mtune=generic -msse. Add -fno-stack-protector.
      	* gcc.target/i386/auto-init-padding-8.c: Likewise.
      	* gcc.target/i386/auto-init-padding-9.c: Restrict the testing only for
      	-march=x86-64. Different pattern match for lp64 and ia32.
      896fec24
    • Martin Sebor's avatar
      Better handle MIN/MAX_EXPR of unrelated objects [PR102200]. · 31e924c5
      Martin Sebor authored
      Resolves:
      PR middle-end/102200 - ICE on a min of a decl and pointer in a loop
      
      gcc/ChangeLog:
      
      	PR middle-end/102200
      	* pointer-query.cc (access_ref::inform_access): Handle MIN/MAX_EXPR.
      	(handle_min_max_size): Change argument.  Store original SSA_NAME for
      	operands to potentially distinct (sub)objects.
      	(compute_objsize_r): Adjust call to the above.
      
      gcc/testsuite/ChangeLog:
      
      	PR middle-end/102200
      	* gcc.dg/Wstringop-overflow-62.c: Adjust text of an expected note.
      	* gcc.dg/Warray-bounds-89.c: New test.
      	* gcc.dg/Wstringop-overflow-74.c: New test.
      	* gcc.dg/Wstringop-overflow-75.c: New test.
      	* gcc.dg/Wstringop-overflow-76.c: New test.
      31e924c5
    • Bill Schmidt's avatar
      rs6000: Support for vectorizing built-in functions · 47e5052b
      Bill Schmidt authored
      This patch just duplicates a couple of functions and adjusts them to use the
      new builtin names.  There's no logical change otherwise.
      
      2021-09-17  Bill Schmidt  <wschmidt@linux.ibm.com>
      
      gcc/
      	* config/rs6000/rs6000.c (rs6000-builtins.h): New include.
      	(rs6000_new_builtin_vectorized_function): New function.
      	(rs6000_new_builtin_md_vectorized_function): Likewise.
      	(rs6000_builtin_vectorized_function): Call
      	rs6000_new_builtin_vectorized_function.
      	(rs6000_builtin_md_vectorized_function): Call
      	rs6000_new_builtin_md_vectorized_function.
      47e5052b
    • Bill Schmidt's avatar
      rs6000: Handle some recent MMA builtin changes · 6cba7d1d
      Bill Schmidt authored
      Peter Bergner recently added two new builtins __builtin_vsx_lxvp and
      __builtin_vsx_stxvp.  These happened to break a pattern in MMA builtins that
      I had been using to automate gimple folding of MMA builtins.  Previously,
      every MMA function that could be folded had an associated internal function
      that it was folded into.  The LXVP/STXVP builtins are just folded directly
      into memory operations.
      
      Instead of relying on this pattern, this patch adds a new attribute to
      builtins called "mmaint," which is set for all MMA builtins that have an
      associated internal builtin.  The naming convention that adds _INTERNAL to
      the builtin index name remains.
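
The role of such a flag can be sketched as a bit in a per-builtin attribute mask. This is only an illustration of the idea: the enum, the bit values, the struct, and the function names below are hypothetical and are not the generated rs6000 tables.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical attribute bits; the real bit assignments are not shown here.
enum bif_attr : uint32_t {
  attr_mma = 1u << 0,     // builtin belongs to the MMA family
  attr_mmaint = 1u << 1,  // builtin has an associated _INTERNAL builtin
};

struct bif_info {
  const char *name;
  uint32_t attrs;
};

inline bool is_mmaint(const bif_info &b) {
  return (b.attrs & attr_mmaint) != 0;
}

enum class fold_kind { none, internal_fn, memory_op };

// Decide how a builtin would be folded: MMA builtins carrying the flag go
// to their internal counterpart, the others straight to a memory operation.
inline fold_kind classify_fold(const bif_info &b) {
  if ((b.attrs & attr_mma) == 0)
    return fold_kind::none;
  return is_mmaint(b) ? fold_kind::internal_fn : fold_kind::memory_op;
}
```

Checking an explicit flag keeps the folding logic independent of whether every MMA builtin happens to have an internal counterpart.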
      
      The rest of the patch is just duplicating Peter's patch, using the new
      builtin infrastructure.
      
      2021-09-17  Bill Schmidt  <wschmidt@linux.ibm.com>
      
      gcc/
      	* config/rs6000/rs6000-builtin-new.def (ASSEMBLE_ACC): Add mmaint flag.
      	(ASSEMBLE_PAIR): Likewise.
      	(BUILD_ACC): Likewise.
      	(DISASSEMBLE_ACC): Likewise.
      	(DISASSEMBLE_PAIR): Likewise.
      	(PMXVBF16GER2): Likewise.
      	(PMXVBF16GER2NN): Likewise.
      	(PMXVBF16GER2NP): Likewise.
      	(PMXVBF16GER2PN): Likewise.
      	(PMXVBF16GER2PP): Likewise.
      	(PMXVF16GER2): Likewise.
      	(PMXVF16GER2NN): Likewise.
      	(PMXVF16GER2NP): Likewise.
      	(PMXVF16GER2PN): Likewise.
      	(PMXVF16GER2PP): Likewise.
      	(PMXVF32GER): Likewise.
      	(PMXVF32GERNN): Likewise.
      	(PMXVF32GERNP): Likewise.
      	(PMXVF32GERPN): Likewise.
      	(PMXVF32GERPP): Likewise.
      	(PMXVF64GER): Likewise.
      	(PMXVF64GERNN): Likewise.
      	(PMXVF64GERNP): Likewise.
      	(PMXVF64GERPN): Likewise.
      	(PMXVF64GERPP): Likewise.
      	(PMXVI16GER2): Likewise.
      	(PMXVI16GER2PP): Likewise.
      	(PMXVI16GER2S): Likewise.
      	(PMXVI16GER2SPP): Likewise.
      	(PMXVI4GER8): Likewise.
      	(PMXVI4GER8PP): Likewise.
      	(PMXVI8GER4): Likewise.
      	(PMXVI8GER4PP): Likewise.
      	(PMXVI8GER4SPP): Likewise.
      	(XVBF16GER2): Likewise.
      	(XVBF16GER2NN): Likewise.
      	(XVBF16GER2NP): Likewise.
      	(XVBF16GER2PN): Likewise.
      	(XVBF16GER2PP): Likewise.
      	(XVF16GER2): Likewise.
      	(XVF16GER2NN): Likewise.
      	(XVF16GER2NP): Likewise.
      	(XVF16GER2PN): Likewise.
      	(XVF16GER2PP): Likewise.
      	(XVF32GER): Likewise.
      	(XVF32GERNN): Likewise.
      	(XVF32GERNP): Likewise.
      	(XVF32GERPN): Likewise.
      	(XVF32GERPP): Likewise.
      	(XVF64GER): Likewise.
      	(XVF64GERNN): Likewise.
      	(XVF64GERNP): Likewise.
      	(XVF64GERPN): Likewise.
      	(XVF64GERPP): Likewise.
      	(XVI16GER2): Likewise.
      	(XVI16GER2PP): Likewise.
      	(XVI16GER2S): Likewise.
      	(XVI16GER2SPP): Likewise.
      	(XVI4GER8): Likewise.
      	(XVI4GER8PP): Likewise.
      	(XVI8GER4): Likewise.
      	(XVI8GER4PP): Likewise.
      	(XVI8GER4SPP): Likewise.
      	(XXMFACC): Likewise.
      	(XXMTACC): Likewise.
      	(XXSETACCZ): Likewise.
      	(ASSEMBLE_PAIR_V): Likewise.
      	(BUILD_PAIR): Likewise.
      	(DISASSEMBLE_PAIR_V): Likewise.
      	(LXVP): New.
      	(STXVP): New.
      	* config/rs6000/rs6000-call.c (rs6000_gimple_fold_new_mma_builtin):
      	Handle RS6000_BIF_LXVP and RS6000_BIF_STXVP.
      	* config/rs6000/rs6000-gen-builtins.c (attrinfo): Add ismmaint.
      	(parse_bif_attrs): Handle ismmaint.
      	(write_decls): Add bif_mmaint_bit and bif_is_mmaint.
      	(write_bif_static_init): Handle ismmaint.
      6cba7d1d